Incident runbook
During an outage, the server should not become a puzzle.
A Linux incident runbook gives maintainers a calm order of checks: confirm the symptom, inspect logs, check disk and services, restart safely, roll back if needed, protect data, and communicate clearly.
Incidents are not only technical failures. They are also information failures. When nobody knows which service runs the app, where logs live, how to restart safely, or who owns DNS, a small outage becomes a long one. A runbook does not eliminate incidents, but it reduces confusion.
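The facts and the order of checks described above can be written down in a small script. The following is a minimal sketch, not a definitive implementation: the `RunbookEntry` fields, the unit name `myapp.service`, and the contact address are hypothetical placeholders, and the script only performs read-only checks (disk usage and `systemctl is-active`). It assumes Python 3.9 or later.

```python
import shutil
import subprocess
from dataclasses import dataclass


@dataclass
class RunbookEntry:
    """Facts a maintainer should not have to guess during an outage."""
    service: str    # systemd unit that runs the app (hypothetical name)
    data_path: str  # path whose filesystem free space matters
    dns_owner: str  # who to contact about DNS (hypothetical contact)


def disk_used_percent(path: str) -> float:
    """Return the percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:  # some virtual filesystems report no size
        return 0.0
    return 100.0 * usage.used / usage.total


def service_state(unit: str) -> str:
    """Return what `systemctl is-active` prints, or 'unknown' if it cannot run."""
    try:
        result = subprocess.run(
            ["systemctl", "is-active", unit],
            capture_output=True, text=True, timeout=10, check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return "unknown"
    return result.stdout.strip() or "unknown"


def triage(entry: RunbookEntry) -> list[str]:
    """Run read-only checks in a fixed order and return one line per finding."""
    return [
        f"disk: {disk_used_percent(entry.data_path):.0f}% used on {entry.data_path}",
        f"service: {entry.service} is {service_state(entry.service)}",
        f"dns owner: {entry.dns_owner}",
    ]


if __name__ == "__main__":
    # Placeholder values; replace with the real facts for your server.
    example = RunbookEntry(
        service="myapp.service", data_path="/", dns_owner="ops@example.com"
    )
    for line in triage(example):
        print(line)
```

Restarts and rollbacks are deliberately left out of the sketch: they change state, so they are better kept as explicit, human-confirmed steps in the runbook.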
Official source note: The journalctl manual page documents how to query logs from the systemd journal, which is often part of investigating a Linux incident.
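As an illustration of a journal query during an incident, the sketch below builds and runs a `journalctl` command for one unit. The unit name `myapp.service`, the one-hour window, and the `err` priority threshold are assumptions chosen for the example; the options used (`-u`, `--since`, `-p`, `--no-pager`) are described in the journalctl manual page. It assumes Python 3.9 or later.

```python
import subprocess


def journal_command(unit: str, since: str = "1 hour ago", priority: str = "err") -> list[str]:
    """Build a journalctl command for recent entries at or above a priority."""
    return ["journalctl", "-u", unit, "--since", since, "-p", priority, "--no-pager"]


def recent_errors(unit: str) -> list[str]:
    """Return matching journal lines, or an empty list if journalctl cannot run."""
    try:
        result = subprocess.run(
            journal_command(unit),
            capture_output=True, text=True, timeout=30, check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return []
    return result.stdout.splitlines()


if __name__ == "__main__":
    # "myapp.service" is a placeholder unit name.
    for line in recent_errors("myapp.service"):
        print(line)
```

Keeping the command in a function makes it easy to paste the exact query into the runbook, so everyone on the incident reads the same slice of the logs.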



