Incidents & severities
Understand how monitor and heartbeat failures become incidents, and how incident lifecycle states and severity levels work.
What is an incident?
An incident represents a service disruption that requires attention. Incidents are created automatically when a monitor or heartbeat exceeds its alert threshold, and resolved automatically when the service recovers. You can also create incidents manually for planned maintenance or issues not yet detected by monitors.
Incident detection pipeline
The diagram below shows how a monitor failure flows through the system until an incident is opened and an on-call notification is sent.
Figure: Incident detection pipeline, from probe failure to on-call notification.
Incident lifecycle
TRIGGERED → ACKNOWLEDGED → RESOLVED
| State | Meaning |
|---|---|
| Triggered | Monitor failed beyond threshold. Notifications sent to configured channels. |
| Acknowledged | A team member is working on the issue. Escalation pauses (if configured). |
| Resolved | Service recovered. Recovery notification sent. Uptime data updated. |
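The lifecycle can be read as a small state machine. The sketch below is illustrative only and is not the platform's implementation: `IncidentState`, `ALLOWED_TRANSITIONS`, and `transition` are hypothetical names, and the direct Triggered → Resolved transition is an assumption based on incidents resolving automatically when the service recovers, even if nobody has acknowledged them.

```python
from enum import Enum


class IncidentState(Enum):
    TRIGGERED = "triggered"
    ACKNOWLEDGED = "acknowledged"
    RESOLVED = "resolved"


# Assumed transitions: an incident can be acknowledged and then resolved,
# or resolved directly if the service recovers before anyone acknowledges it.
ALLOWED_TRANSITIONS = {
    IncidentState.TRIGGERED: {IncidentState.ACKNOWLEDGED, IncidentState.RESOLVED},
    IncidentState.ACKNOWLEDGED: {IncidentState.RESOLVED},
    IncidentState.RESOLVED: set(),  # terminal: a new failure opens a new incident
}


def transition(current: IncidentState, target: IncidentState) -> IncidentState:
    """Return the new state, or raise ValueError if the move is not allowed."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.name} to {target.name}")
    return target
```

Treating Resolved as terminal matches the rule that a later failure opens a new incident rather than reopening an old one.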
Example timeline
| Time | Event |
|---|---|
| 14:32 | Monitor DOWN: Production API (status_code=500) |
| 14:32 | Alert dispatched · Slack #ops, Email on-call |
| 14:34 | Acknowledged by Sarah (on-call) |
| 14:39 | Recovered · 7 min downtime |
Auto-resolution
When a monitor returns to a healthy state after a failure, the incident resolves automatically. A recovery notification is sent to the same channels that received the original alert. Incidents can also be resolved manually at any time.
Severities
Each incident has a severity that controls escalation timing and notification channels:
| Severity | Use case |
|---|---|
| Critical | Full outage — immediate page |
| High | Significant degradation — escalates within minutes |
| Medium | Partial failure — standard on-call routing |
| Low | Minor issue — no on-call page by default |
Severity thresholds are configured per monitor in Settings → Monitors.
Automatic incident triggers
An incident is triggered when:
- HTTP monitor returns a status outside the expected range (default: non-2xx)
- Monitor times out — no response within the configured timeout
- SSL certificate is expired or invalid
- Heartbeat has not pinged within interval + grace_period
- Response body does not contain the required keyword
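The trigger conditions above can be sketched as two small checks. This is a hedged illustration, not the platform's code: `CheckResult`, `http_failure_reason`, `heartbeat_overdue`, and the reason strings are hypothetical, and the order in which the conditions are evaluated is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class CheckResult:
    status_code: Optional[int]  # None when no response arrived
    elapsed_seconds: float
    body: str = ""
    ssl_valid: bool = True


def http_failure_reason(result, timeout_seconds, required_keyword=None,
                        expected_statuses=range(200, 300)):
    """Return a failure reason string, or None if the check is healthy."""
    if not result.ssl_valid:
        return "ssl_invalid"
    if result.status_code is None or result.elapsed_seconds > timeout_seconds:
        return "timeout"
    if result.status_code not in expected_statuses:  # default: non-2xx fails
        return "unexpected_status"
    if required_keyword is not None and required_keyword not in result.body:
        return "keyword_missing"
    return None


def heartbeat_overdue(last_ping: datetime, interval: timedelta,
                      grace_period: timedelta, now: datetime) -> bool:
    """True when no ping has arrived within interval + grace_period."""
    return now - last_ping > interval + grace_period
```

For example, a heartbeat with a 5-minute interval and a 1-minute grace period is considered overdue once more than 6 minutes have passed since the last ping.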
Manual incidents
Create a manual incident for situations not captured by monitors — for example, a database migration that causes degraded performance, or a known upstream provider issue.
- Go to Incidents in the sidebar.
- Click + Manual Incident.
- Select the affected monitor(s) and write a description.
- The incident appears on the status page immediately.
Incident notes
While working on an incident, add internal notes to track your investigation:
- Notes are internal only — not visible on the public status page.
- Markdown formatting is supported.
- Use notes for root-cause analysis, team communication, and post-mortem documentation.
Status page visibility
Active incidents appear automatically on your public status page. To add a customer-visible message:
- Open the incident.
- Fill in the Public message field.
- Post updates as the incident progresses.
Incident updates appear in chronological order on the status page.
Correlation: multiple monitors, one incident
When multiple monitors fail at the same time, the platform can group them into a single correlated incident rather than flooding your team with individual alerts. Configure this under Monitors → Correlation Groups.
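As a rough mental model, correlation can be pictured as grouping failures that share a correlation group and occur close together in time. The sketch below rests on assumptions that the documentation does not specify: the `correlate` function, the `group_of` mapping, and the 2-minute `window` are all hypothetical, and the platform's actual grouping rules may differ.

```python
from datetime import datetime, timedelta


def correlate(failures, group_of, window=timedelta(minutes=2)):
    """Group failures into incidents.

    failures: list of (monitor_name, failed_at) tuples.
    group_of: dict mapping monitor name to correlation group name.
    Monitors without a group each get their own incident.
    """
    incidents = []
    latest = {}  # group key -> most recent incident for that group
    for monitor, failed_at in sorted(failures, key=lambda f: f[1]):
        group = group_of.get(monitor, f"monitor:{monitor}")
        current = latest.get(group)
        if current is not None and failed_at - current["started_at"] <= window:
            # Failure falls inside the window: fold it into the open incident.
            if monitor not in current["monitors"]:
                current["monitors"].append(monitor)
        else:
            current = {"group": group, "monitors": [monitor], "started_at": failed_at}
            incidents.append(current)
            latest[group] = current
    return incidents
```

With two monitors in the same group failing 30 seconds apart, this produces one incident listing both monitors instead of two separate alerts.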
Alert threshold and deduplication
The alert threshold on a monitor controls how many consecutive failures are required before an incident opens. Once an incident is open, additional check failures for the same monitor do not open new incidents — they are deduplicated into the existing one.
A new incident can only open for a monitor after the previous incident for that monitor has been resolved.
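The threshold and deduplication rules can be illustrated with a small per-monitor tracker. This is a minimal sketch under stated assumptions: `MonitorAlertState` and its return values are hypothetical names, and it assumes a single healthy check is enough to resolve the open incident.

```python
class MonitorAlertState:
    """Tracks consecutive failures and the open incident for one monitor."""

    def __init__(self, alert_threshold: int):
        self.alert_threshold = alert_threshold
        self.consecutive_failures = 0
        self.incident_open = False

    def record_check(self, healthy: bool) -> str:
        """Return 'open', 'resolve', or 'none' for this check result."""
        if healthy:
            self.consecutive_failures = 0
            if self.incident_open:
                self.incident_open = False
                return "resolve"  # auto-resolution on recovery (assumed: one healthy check)
            return "none"

        self.consecutive_failures += 1
        if self.incident_open:
            return "none"  # deduplicated into the existing incident
        if self.consecutive_failures >= self.alert_threshold:
            self.incident_open = True
            return "open"
        return "none"  # still below the alert threshold
```

With a threshold of 3, the check sequence fail, fail, fail, fail, healthy yields one `open` on the third failure, nothing on the fourth, and one `resolve` on recovery.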
History retention
Resolved incidents are retained in the incident history according to your plan. You can filter the history by monitor, date range, or status from the Incidents view.