Heartbeats
Understand passive checks — how heartbeats monitor cron jobs, background workers, and scheduled tasks.
What is a heartbeat?
A heartbeat is a passive check. Instead of the platform actively probing your service, your service reports to the platform. At the end of each scheduled job run, your script sends an HTTP request (typically a POST) to a unique ping URL. The platform records the ping and resets the timer. If no further ping arrives within interval + grace_period, an incident is opened.
This inverted model is ideal for monitoring things that cannot be reached from outside your network: cron jobs, internal workers, batch processes, or any task that runs on a schedule.
How the timing works
T=0 Job starts
T=N Job completes successfully → sends ping
T=N+interval Platform expects next ping
T=N+interval+grace_period No ping received → incident opens
The grace period is an intentional buffer that absorbs variation in job runtime. Set it to at least 20–30% of the expected runtime variance.
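The timeline above reduces to two additions. The sketch below works through one example; the function name `deadlines` and the nightly-job values are illustrative assumptions, not part of the platform.

```python
from datetime import datetime, timedelta, timezone


def deadlines(last_ping: datetime, interval: timedelta, grace_period: timedelta):
    """Return (expected_at, incident_at) for the ping that should follow last_ping."""
    expected_at = last_ping + interval        # T = N + interval
    incident_at = expected_at + grace_period  # T = N + interval + grace_period
    return expected_at, incident_at


# Hypothetical nightly job: last ping at 02:10 UTC, 24 h interval, 30 min grace period.
last_ping = datetime(2024, 1, 1, 2, 10, tzinfo=timezone.utc)
expected_at, incident_at = deadlines(last_ping, timedelta(hours=24), timedelta(minutes=30))
print(expected_at.isoformat())  # 2024-01-02T02:10:00+00:00
print(incident_at.isoformat())  # 2024-01-02T02:40:00+00:00
```

In this example a ping that lands at 02:25 the next day arrives after the expected time but before the incident deadline, so no incident opens.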
Heartbeat statuses
| Status | Meaning |
|---|---|
| New | Created, no pings received yet |
| Up | Latest ping arrived within the interval |
| Late | Interval exceeded but within grace period — no alert yet |
| Down | Grace period exceeded — incident opened, alert sent |
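The status table can be read as a small decision function. The sketch below is an illustration only: the name `classify` is hypothetical, it assumes "Up" means the time since the last ping is still within the interval (with "Late" covering the grace window), and the inclusive boundaries are a guess rather than documented behavior.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def classify(last_ping: Optional[datetime], now: datetime,
             interval: timedelta, grace_period: timedelta) -> str:
    """Map the time since the last ping to one of the four statuses."""
    if last_ping is None:
        return "New"   # created, no pings received yet
    elapsed = now - last_ping
    if elapsed <= interval:
        return "Up"    # still inside the interval
    if elapsed <= interval + grace_period:
        return "Late"  # interval exceeded, grace period not yet used up
    return "Down"      # grace period exceeded


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
hour, grace = timedelta(hours=1), timedelta(minutes=10)
print(classify(now - timedelta(minutes=65), now, hour, grace))  # Late
```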
Ping URL format
Each heartbeat gets a unique URL of the form:
POST /api/v1/hb/{heartbeat_id}
Both GET and POST are accepted. The platform ignores any request body, and you can ignore the response body: a request reaching the URL is all that matters. A 200 OK response confirms the ping was recorded.
The ping URL is a secret. If it is ever exposed (committed to a public repo, logged in a public place), rotate it from the heartbeat detail page.
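One way to keep the ping URL out of source code and logs is to read it from the environment and mask the identifier before printing it. The sketch below uses only the Python standard library; the base URL, the `PING_URL` variable name, and the helper names are assumptions made for illustration.

```python
import os
import urllib.request


def build_ping_url(base_url: str, heartbeat_id: str) -> str:
    """Join an (assumed) base URL with the documented /api/v1/hb/{heartbeat_id} path."""
    return f"{base_url.rstrip('/')}/api/v1/hb/{heartbeat_id}"


def redact(ping_url: str) -> str:
    """Mask the final path segment so the secret ID never reaches a log line."""
    head, _, _ = ping_url.rpartition("/")
    return f"{head}/***"


def send_ping(ping_url: str, timeout: float = 5.0) -> bool:
    """POST an empty body; return True only when the platform answers 200 OK."""
    request = urllib.request.Request(ping_url, data=b"", method="POST")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except OSError:  # URLError, HTTPError and socket timeouts all derive from OSError
        return False


if __name__ == "__main__" and "PING_URL" in os.environ:
    url = os.environ["PING_URL"]  # injected by the scheduler, never hard-coded
    print(f"ping {redact(url)}: {'ok' if send_ping(url) else 'failed'}")
```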
AI Heartbeats (Premium)
On Premium plans, you can upgrade a heartbeat to AI mode. Instead of a fixed interval and grace period, the platform's ML model learns the natural cadence of your job from at least 7 days of historical pings — schedule, frequency, and typical payload size.
When a heartbeat deviates significantly from the learned pattern, you receive an early warning notification before the grace period expires. This is particularly useful for jobs with highly variable runtime (ML training jobs, large data exports) where a fixed interval would generate false positives.
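To build intuition for what "deviates from the learned pattern" could mean, here is a deliberately simple toy: it learns the typical gap between pings as a median and flags a gap that exceeds it by a multiple of the observed spread. This is not the platform's model, whose internals are not described here; the function names, the tolerance of 3, and the 5% floor are all assumptions.

```python
from statistics import median
from typing import Sequence, Tuple


def learned_gap(ping_times: Sequence[float]) -> Tuple[float, float]:
    """Return (median gap, median absolute deviation) between consecutive pings, in seconds."""
    gaps = [later - earlier for earlier, later in zip(ping_times, ping_times[1:])]
    typical = median(gaps)
    spread = median([abs(gap - typical) for gap in gaps])
    return typical, spread


def is_unusually_late(ping_times: Sequence[float], now: float, tolerance: float = 3.0) -> bool:
    """Flag when the time since the last ping exceeds the learned gap by a wide margin."""
    typical, spread = learned_gap(ping_times)
    margin = tolerance * max(spread, 0.05 * typical)  # floor keeps perfectly regular jobs from alerting on tiny delays
    return (now - ping_times[-1]) > typical + margin


# Hypothetical hourly job, timestamps in seconds.
history = [0.0, 3600.0, 7200.0, 10800.0]
print(is_unusually_late(history, now=10800.0 + 3700.0))  # False
print(is_unusually_late(history, now=10800.0 + 4200.0))  # True
```

A job with widely varying gaps produces a larger spread, so the same rule tolerates longer delays for it before warning.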
History retention
| Plan | Ping history |
|---|---|
| Free | 7 days |
| Standard | 30 days |
| Premium | 90 days |
| Enterprise | 1 year |
History is visible on the heartbeat detail page as a ping timeline chart and a log table.
Common patterns
Job success only
/path/to/job.sh && curl -sS -X POST "$PING_URL"
Separate success from failure
if /path/to/job.sh; then
    curl -sS -X POST "$PING_URL"
else
    echo "Job failed, skipping heartbeat ping" >&2
    exit 1
fi
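The same success-or-skip logic can be written in Python when the job is launched from a Python wrapper rather than a shell script. This is a minimal sketch: `run_and_ping` is a hypothetical helper, and the `ping` callable stands in for whatever function sends the HTTP request.

```python
import subprocess
import sys
from typing import Callable, Sequence


def run_and_ping(command: Sequence[str], ping: Callable[[], None]) -> int:
    """Run command and call ping() only when it exits with status 0."""
    result = subprocess.run(command)
    if result.returncode == 0:
        ping()
    else:
        print("Job failed, skipping heartbeat ping", file=sys.stderr)
    return result.returncode


# Usage with a stand-in ping that only records the call.
sent = []
exit_code = run_and_ping([sys.executable, "-c", "pass"], lambda: sent.append("ping"))
```

Returning the job's exit code lets the wrapper pass it on with `sys.exit`, so the scheduler still sees the failure.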
Python context manager
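import contextlib
import os
import requests

@contextlib.contextmanager
def heartbeat(ping_url: str):
    yield  # the wrapped job runs here
    # Reached only if the job raised no exception, so failures send no ping.
    requests.post(ping_url, timeout=5)

# Read the URL from the environment; Python does not expand "$PING_URL".
with heartbeat(os.environ["PING_URL"]):
    run_my_job()  # your job's entry point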
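The context manager above sends the ping once, so a brief network error after a successful job would surface as an exception or a missed heartbeat. One possible variation retries the ping a few times and never lets a ping failure mask the job's own result. The `send` callable, the retry count, and the delay are illustrative assumptions.

```python
import contextlib
import time
from typing import Callable


@contextlib.contextmanager
def heartbeat(send: Callable[[], None], attempts: int = 3, delay: float = 1.0):
    """Run the body, then try send() up to `attempts` times if the body succeeded."""
    yield  # an exception raised in the body propagates from here, so no ping is sent
    for attempt in range(1, attempts + 1):
        try:
            send()
            return
        except OSError:  # network errors and timeouts from urllib derive from OSError
            if attempt < attempts:
                time.sleep(delay)
    # All attempts failed: give up quietly and let the platform's timer report the gap.


# Usage with a stand-in sender that fails twice before succeeding.
calls = []

def flaky_send() -> None:
    calls.append("try")
    if len(calls) < 3:
        raise OSError("temporary network error")

with heartbeat(flaky_send, delay=0):
    pass  # the real job would run here
```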