
Get system health

GET /v1/durable/health
curl --request GET \
--url https://app.everruns.com/api/v1/durable/health
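For callers who are not using curl, here is a minimal Python sketch of the same request that uses only the standard library. Like the curl example, it sends no authentication header. If your deployment requires one, you would need to add it. The helper names (build_health_request, parse_health, fetch_health) are hypothetical and used only for illustration.

```python
import json
import urllib.request

# URL taken from the curl example; adjust it for your own deployment.
HEALTH_URL = "https://app.everruns.com/api/v1/durable/health"


def build_health_request(url: str = HEALTH_URL) -> urllib.request.Request:
    """Build a GET request equivalent to the curl example (no auth header)."""
    return urllib.request.Request(url, method="GET")


def parse_health(body: bytes) -> dict:
    """Decode a JSON health response body into a plain dict."""
    return json.loads(body.decode("utf-8"))


def fetch_health(url: str = HEALTH_URL, timeout: float = 5.0) -> dict:
    """Perform the request. urlopen raises urllib.error.HTTPError on non-2xx
    responses, such as an internal server error."""
    with urllib.request.urlopen(build_health_request(url), timeout=timeout) as resp:
        return parse_health(resp.read())
```

Typical use is `health = fetch_health()` followed by reading `health["status"]`. Use `health.get("event_delivery")` for the one optional field.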

System health

Media type application/json

System health response

object
active_workers
required

Number of workers in the running state, ready to claim tasks.

integer
claimed_tasks
required

Tasks currently claimed by a worker (gauge).

integer
completed_tasks
required

Cumulative count of tasks that completed successfully (monotonic counter).

integer
completed_workflows
required

Cumulative count of workflows that completed successfully (monotonic counter).

integer
current_load
required

Total tasks currently in flight across all workers.

integer
dlq_size
required

Size of the dead-letter queue (gauge). High values indicate stuck activities.

integer
event_delivery

Event-delivery backend in use: nats for distributed deployments, in_memory for single-instance deployments. Optional: older servers omit this field, so treat a missing or null value as unknown.

string | null
failed_tasks
required

Cumulative count of tasks that failed terminally or were sent to the DLQ (monotonic counter).

integer
failed_workflows
required

Cumulative count of workflows that ended in failure (monotonic counter).

integer
load_percentage
required

Percentage of capacity in use, computed as current_load / total_capacity * 100. Reported as 0.0 when no workers are registered, because total_capacity is then 0.

number format: double
pending_tasks
required

Tasks waiting to be claimed (gauge).

integer
pending_workflows
required

Workflows waiting to be claimed (gauge).

integer
running_workflows
required

Workflows currently executing (gauge).

integer
started_tasks
required

Cumulative count of tasks claimed at least once (monotonic counter).

integer
started_workflows
required

Cumulative count of workflows that started (monotonic counter).

integer
status
required

Aggregate system status: healthy, degraded, or unhealthy. Derived from worker availability, load, and queue depths.

string
total_capacity
required

Sum of max_concurrency across all workers (the upper bound on concurrent task execution).

integer
total_workers
required

Total number of registered workers, meaning workers that have sent a heartbeat within the current heartbeat window.

integer
workers_accepting
required

Number of workers currently accepting new task assignments. This is a subset of active_workers that excludes workers that are draining or applying backpressure.

integer
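Two of the fields above are defined in terms of others. load_percentage is current_load / total_capacity * 100, and workers_accepting is a subset of active_workers. The following is a small Python sketch of a client-side sanity check built on those two documented relations. The comparison tolerance is an assumption, since the server may round load_percentage. The function names are hypothetical.

```python
from typing import Any


def expected_load_percentage(current_load: int, total_capacity: int) -> float:
    """Recompute load_percentage as current_load / total_capacity * 100.

    total_capacity is the sum of max_concurrency over all workers, so it is 0
    when no workers are registered; that case returns 0.0 as documented.
    """
    if total_capacity <= 0:
        return 0.0
    return current_load / total_capacity * 100.0


def check_health_invariants(health: dict[str, Any], tolerance: float = 0.01) -> list[str]:
    """Return a description of each documented relation that does not hold."""
    problems: list[str] = []
    expected = expected_load_percentage(health["current_load"], health["total_capacity"])
    # Assumed tolerance: the server may round load_percentage.
    if abs(health["load_percentage"] - expected) > tolerance:
        problems.append(
            f"load_percentage={health['load_percentage']} but "
            f"current_load/total_capacity gives {expected:.2f}"
        )
    # workers_accepting is documented as a subset of active_workers.
    if health["workers_accepting"] > health["active_workers"]:
        problems.append("workers_accepting exceeds active_workers")
    return problems
```

The generated example that follows uses placeholder values, so this check would flag it: a load_percentage of 1 versus the expected 100.0 for a current_load of 1 and a total_capacity of 1.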
Example (generated from the schema; values are placeholders)
{
"active_workers": 1,
"claimed_tasks": 1,
"completed_tasks": 1,
"completed_workflows": 1,
"current_load": 1,
"dlq_size": 1,
"event_delivery": "example",
"failed_tasks": 1,
"failed_workflows": 1,
"load_percentage": 1,
"pending_tasks": 1,
"pending_workflows": 1,
"running_workflows": 1,
"started_tasks": 1,
"started_workflows": 1,
"status": "example",
"total_capacity": 1,
"total_workers": 1,
"workers_accepting": 1
}
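Gauge fields such as pending_tasks or dlq_size can be read directly from a single response. The fields marked "monotonic counter" are cumulative, so throughput comes from the difference between two snapshots divided by the time between them. The following Python sketch shows that calculation. It treats a negative delta as a counter reset, for example after a server restart. That reset behavior is an assumption, not something this page documents.

```python
from typing import Optional

# Fields documented above as monotonic counters.
COUNTERS = (
    "started_tasks", "completed_tasks", "failed_tasks",
    "started_workflows", "completed_workflows", "failed_workflows",
)


def counter_rates(prev: dict, curr: dict, elapsed_s: float) -> dict[str, Optional[float]]:
    """Per-second rate of each counter between two health snapshots.

    A negative delta is assumed to mean the counter was reset and is reported
    as None instead of a misleading negative rate.
    """
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    rates: dict[str, Optional[float]] = {}
    for name in COUNTERS:
        delta = curr[name] - prev[name]
        rates[name] = None if delta < 0 else delta / elapsed_s
    return rates
```

Polling the endpoint at a fixed interval and passing consecutive responses to `counter_rates` gives, for example, completed and failed tasks per second.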

Internal server error

Media type application/json

Standard error response.

The wire format follows RFC 9457 Problem Details. Every error response includes title and status, and may also include detail, code, allowed_actions, retry_after_seconds, instance, and type. Although the media type is listed above as application/json, the server rewrites the Content-Type header to application/problem+json, so clients should accept either.

object
allowed_actions

Recovery actions the caller can take next.

Array<object>

Agent-actionable recovery hint attached to an error response.

object
hint

Short, agent-readable hint (e.g. “Shorten ‘name’ to <= 200 chars.”).

string | null
href

Optional absolute or relative URL the caller may invoke directly.

string | null
operation_id

OpenAPI operationId the caller should invoke to recover.

string | null
rel
required

Link relation describing the action (e.g. retry, get-existing, unarchive, retry-later).

string
code

Stable, machine-readable error code (snake_case).

string | null
detail

Human-readable explanation specific to this occurrence.

string | null
instance

Request URI for this occurrence.

string | null
retry_after_seconds

Number of seconds the caller should wait before retrying; typically set on 429 and transient 503 responses.

integer | null format: int32
status
required

HTTP status code; mirrors the response status line.

integer format: int32
title
required

Short, human-readable summary of the problem (e.g. “Not Found”).

string
type

RFC 9457 problem type URI identifying the problem class. Optional; when it is absent, RFC 9457 specifies that the type is assumed to be “about:blank”.

string | null
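Because only title and status are required, a client has to tolerate every other member being absent or null. The following Python sketch maps the schema above onto dataclasses and accepts both media types. It fills in the RFC 9457 default of "about:blank" for a missing type. The class and function names are hypothetical.

```python
import json
from dataclasses import dataclass, field
from typing import Optional

PROBLEM_MEDIA_TYPES = ("application/problem+json", "application/json")


@dataclass
class AllowedAction:
    rel: str  # the only required member of an allowed action
    hint: Optional[str] = None
    href: Optional[str] = None
    operation_id: Optional[str] = None


@dataclass
class Problem:
    status: int
    title: str
    detail: Optional[str] = None
    code: Optional[str] = None
    instance: Optional[str] = None
    type: str = "about:blank"  # RFC 9457 default when "type" is absent
    retry_after_seconds: Optional[int] = None
    allowed_actions: list[AllowedAction] = field(default_factory=list)


def is_problem_response(content_type: str) -> bool:
    """Accept either media type, ignoring parameters such as charset."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in PROBLEM_MEDIA_TYPES


def parse_problem(body: bytes) -> Problem:
    data = json.loads(body)
    actions = [
        AllowedAction(
            rel=a["rel"],
            hint=a.get("hint"),
            href=a.get("href"),
            operation_id=a.get("operation_id"),
        )
        for a in data.get("allowed_actions") or []
    ]
    return Problem(
        status=data["status"],
        title=data["title"],
        detail=data.get("detail"),
        code=data.get("code"),
        instance=data.get("instance"),
        type=data.get("type") or "about:blank",  # null is treated like absent
        retry_after_seconds=data.get("retry_after_seconds"),
        allowed_actions=actions,
    )
```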
Example (generated from the schema; values are placeholders)
{
"allowed_actions": [
{
"hint": "example",
"href": "example",
"operation_id": "example",
"rel": "example"
}
],
"code": "example",
"detail": "example",
"instance": "example",
"retry_after_seconds": 1,
"status": 1,
"title": "example",
"type": "example"
}
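retry_after_seconds and the rel values in allowed_actions (for example, retry and retry-later) give a client enough information to decide whether and when to retry. The following Python sketch works on the decoded JSON dict. Its retry policy is an assumption, not documented server behavior. It treats every 429 and 503 as retryable, plus any error that offers a retry or retry-later action, and falls back to a caller-chosen default delay when retry_after_seconds is absent.

```python
from typing import Optional


def retry_delay(problem: dict, default_delay: float = 1.0) -> Optional[float]:
    """Seconds to wait before retrying, or None if the error does not look retryable.

    Assumed policy for this sketch: status 429 or 503, or an allowed action
    whose rel is "retry" or "retry-later", marks the error as retryable.
    """
    rels = {a.get("rel") for a in problem.get("allowed_actions") or []}
    retryable = problem.get("status") in (429, 503) or bool(rels & {"retry", "retry-later"})
    if not retryable:
        return None
    seconds = problem.get("retry_after_seconds")
    return float(seconds) if seconds is not None else default_delay
```

A caller would typically sleep for the returned delay and cap the total number of attempts. When the function returns None, the caller can inspect the remaining allowed_actions (such as get-existing) instead of retrying.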