A 56-page guide with 10 production debugging playbooks, severity frameworks, escalation templates, and everything your team needs to handle incidents with confidence.
Individual license $49 / Team license $79
These are the problems that turn small incidents into long outages.
Different people classify the same incident as P1, P2, or P3. Response time depends on who's on call, not on actual impact.
Your team re-discovers the same diagnostic commands during every incident because nobody wrote them down after the last one.
Engineers spend 30 minutes wondering if they should "bother" someone while the customer impact grows.
Postmortems get skipped or produce vague action items like "improve monitoring" that never get implemented.
No handoff process, no runbooks, no alert tuning. Your best engineers are exhausted and thinking about leaving.
Communication during incidents is inconsistent. Sometimes updates come every 15 minutes, sometimes there's silence for two hours.
56 pages of practical, copy-paste-ready frameworks. No theory fluff.
Who this is for, how to use it, and the entire framework summarized on one page so you can start immediately.
P1-P4 definitions, a decision matrix that removes guesswork, real-world war stories, and anti-patterns to avoid.
Step-by-step escalation paths per severity, the IC role, 4 communication templates, and 5 communication anti-patterns.
The core of the bundle. Step-by-step procedures for the 10 most common production failures. Copy-paste commands included.
How to run postmortems that produce real improvements. Includes a full 5 Whys walkthrough and a guide to building a postmortem library.
Rotation setup, shadow rotations for new engineers, handoff checklists, alert management, and building runbooks people actually use.
5 controlled experiments, game day planning, tool recommendations (free and commercial), a maturity model, and how to handle objections.
10 anti-patterns that hurt reliability - fixing without understanding, alert fatigue, knowledge silos, and more - with concrete fixes.
Each playbook follows the same structure: Symptoms, Immediate Triage (first 5 minutes), Diagnosis, Remediation, Prevention.
YAML and Markdown files you can drop into your workflow today. Customize for your team.
incident-declaration.yaml
Structured incident declaration with metadata, classification, impact, and response team.
postmortem-template.md
Full postmortem document: executive summary, timeline, 5 Whys, action items, lessons learned.
runbook-template.md
Blank runbook with triage, diagnosis, remediation, and escalation sections.
oncall-handoff.md
End-of-rotation handoff: active issues, known risks, alert status, recommendations.
stakeholder-comms.md
6 communication templates for every phase of an incident.
severity-reference.md
One-page quick reference. Print it and pin it next to your monitor.
chaos-experiment.yaml
Structured experiment planning: hypothesis, abort criteria, monitoring, results.
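For a sense of what a structured incident declaration can look like, here is a minimal sketch built around the four areas named above (metadata, classification, impact, response team). Every field name and value is an illustrative assumption, not the contents of the shipped incident-declaration.yaml.

```yaml
# Hypothetical sketch: field names and values are assumptions,
# not the actual contents of incident-declaration.yaml.
incident:
  metadata:
    id: INC-0001                          # placeholder identifier
    declared_at: "2024-01-01T00:00:00Z"   # placeholder timestamp
    declared_by: on-call-engineer         # placeholder name
  classification:
    severity: P2                          # one of P1-P4
    status: investigating
  impact:
    summary: "Checkout requests failing intermittently"
    customer_facing: true
  response_team:
    incident_commander: TBD
    responders: []
```

Keeping the declaration in plain YAML means it can sit in version control next to your runbooks and be filled in within the first minutes of an incident.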
This playbook is the product of years of handling production incidents across startups and enterprises.
One purchase. Lifetime access. Free updates.
A structured 3-page template for running postmortems that produce real improvements. Includes the 5 Whys framework, contributing factors checklist, and action item tracker.
No spam. Unsubscribe anytime. You'll also get notified about the upcoming DevSecOps Pipeline Starter Kit.
It is written for DevOps engineers, SREs, platform engineers, and engineering managers on small to mid-sized teams (5-100 engineers) who want a structured approach to incident management. It's also useful for freelance infrastructure consultants who need templates for client engagements.
The main guide is a professionally formatted PDF. The 7 templates are provided as individual YAML and Markdown files that you can drop directly into your workflow, version control, or documentation system.
The playbooks use common Linux commands, PostgreSQL/MySQL syntax, Kubernetes commands, and AWS CLI examples. The concepts apply universally - if you use GCP or Azure, the diagnostic approach is the same; only the specific commands differ. All templates are plain text and tool-agnostic.
No subscription. It is a one-time purchase with lifetime access and free updates. When new playbooks or templates are added, you get them at no extra cost.
The Individual License is for personal use. If you want to share with your team, the Team License ($79) covers unlimited team members and includes editable Google Docs and Notion versions.
If the playbook doesn't help your team handle incidents better, email hello@incidentplaybook.dev within 30 days for a full refund. No questions asked.