
Enforce a budget

Budgets cap how much a session can spend on LLM calls. After every generation, Everruns debits the cost from any active budgets; when the balance reaches zero, the session stops. This guide creates and applies a budget.

For the design and enforcement semantics, see Budgets.

curl -X POST http://localhost:9300/api/v1/budgets \
  -H "Content-Type: application/json" \
  -d '{
    "scope": "session",
    "scope_id": "session_...",
    "currency": "usd",
    "limit": 10.00,
    "soft_limit": 8.00
  }'

When session spend exceeds soft_limit, the session pauses (status becomes paused) so a human can decide to top up or stop. When it hits limit, the session terminates.
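The two thresholds can be read as a simple decision on cumulative spend. The sketch below is only an illustration of that rule, not the Everruns implementation: the function name and the "active" and "terminated" status strings are assumptions (only "paused" appears in this guide), and it ignores sessions that were resumed after a pause.

```python
from typing import Optional


def session_status(spent: float, limit: float, soft_limit: Optional[float] = None) -> str:
    """Return the status implied by cumulative spend (illustrative only)."""
    # Hitting the hard limit ends the session.
    if spent >= limit:
        return "terminated"
    # Exceeding (strictly greater than) the soft limit pauses it.
    if soft_limit is not None and spent > soft_limit:
        return "paused"
    return "active"


print(session_status(8.50, limit=10.00, soft_limit=8.00))
```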

curl -X POST http://localhost:9300/api/v1/budgets \
  -H "Content-Type: application/json" \
  -d '{
    "scope": "agent",
    "scope_id": "agent_...",
    "currency": "tokens",
    "limit": 2000000
  }'

Token budgets are model-agnostic — they cap raw token usage regardless of which provider the session uses.

Currency   Unit                      Cost basis
usd        US dollars                Per-model pricing (input/output cost per million tokens)
tokens     Raw tokens                Direct count of input + output tokens
credits    1 credit = 1,000 tokens   Token count ÷ 1,000
Custom     Any string                Falls back to raw token count

USD budgets reflect real costs: $10 lasts much longer on GPT-4o than on Claude Opus.
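To make the cost bases concrete, here is a rough sketch of how one generation might be converted into a debit for each currency. The function name, the model names, and the prices are placeholders invented for illustration; real per-model pricing comes from the provider and is not specified in this guide.

```python
# Hypothetical USD prices per million tokens; real prices vary by model.
PRICES_USD_PER_MILLION = {
    "model-a": {"input": 2.50, "output": 10.00},
    "model-b": {"input": 15.00, "output": 75.00},
}


def debit_amount(currency: str, model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the amount one generation debits, in the budget's currency."""
    total_tokens = input_tokens + output_tokens
    if currency == "usd":
        price = PRICES_USD_PER_MILLION[model]
        return (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000
    if currency == "credits":
        return total_tokens / 1_000  # 1 credit = 1,000 tokens
    # "tokens" and any custom currency string fall back to the raw token count.
    return float(total_tokens)


for currency in ("usd", "tokens", "credits"):
    print(currency, debit_amount(currency, "model-a", 1_000_000, 100_000))
```

With these placeholder prices, the same 1.1 million tokens cost 3.50 on model-a and 22.50 on model-b in a usd budget, while a tokens budget is debited 1,100,000 in both cases.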

You can apply multiple budgets to a session at once. The most restrictive wins. A common pattern:

  • $10 USD session budget — caps dollar cost.
  • 2,000,000 tokens agent budget — caps total tokens regardless of pricing.

Both apply; whichever runs out first stops the session.
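The "most restrictive wins" behavior can be sketched as debiting every active budget after each call and stopping as soon as any one of them is exhausted. The `Budget` class, the budget names, and the per-call costs below are hypothetical and chosen only to illustrate the pattern.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    name: str          # illustrative label, not an Everruns field
    limit: float
    spent: float = 0.0

    @property
    def remaining(self) -> float:
        return self.limit - self.spent


def run_until_exhausted(budgets, cost_per_call, max_calls=10_000):
    """Debit every budget after each call; stop when any one runs out.

    cost_per_call maps budget name -> cost of one generation in that
    budget's currency. Returns (calls_made, names_of_exhausted_budgets).
    """
    for calls_made in range(1, max_calls + 1):
        for budget in budgets:
            budget.spent += cost_per_call[budget.name]
        exhausted = [b.name for b in budgets if b.remaining <= 0]
        if exhausted:
            return calls_made, exhausted
    return max_calls, []


budgets = [Budget("session-usd", 10.00), Budget("agent-tokens", 2_000_000)]
# Hypothetical workload: each generation costs $0.05 and 50,000 tokens.
calls, stopped_by = run_until_exhausted(
    budgets, {"session-usd": 0.05, "agent-tokens": 50_000}
)
print(calls, stopped_by)
```

In this made-up workload the token budget runs out after 40 calls, when only about $2 of the dollar budget has been spent.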

Budget thresholds emit events you can subscribe to:

# Assumes `client` and `session` were created earlier and that this runs inside an async function.
async for event in client.events.stream(session.id):
    if event.type == "budget.warning":
        print(f"Budget warning: {event.data}")
    elif event.type == "budget.paused":
        print("Session paused at soft limit")
    elif event.type == "budget.exhausted":
        print("Session stopped: budget exhausted")

budget.warning fires when 20% of the limit remains; budget.paused fires when spend crosses soft_limit; budget.exhausted fires when the balance reaches zero.
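One way to picture these thresholds is as edge-triggered checks on the spend before and after a debit. The sketch below is an assumption-laden illustration, not the server's logic: the function name is hypothetical, and the exact boundary handling (for example, whether the warning fires at exactly 20% remaining) is a guess.

```python
from typing import List, Optional


def budget_events(
    limit: float,
    spent_before: float,
    spent_after: float,
    soft_limit: Optional[float] = None,
    warning_fraction: float = 0.20,
) -> List[str]:
    """Return the events a single debit would trigger (illustrative only)."""
    events = []
    warn_at = limit * (1 - warning_fraction)  # spend level at which 20% remains
    if spent_before < warn_at <= spent_after:
        events.append("budget.warning")
    if soft_limit is not None and spent_before <= soft_limit < spent_after:
        events.append("budget.paused")
    if spent_before < limit <= spent_after:
        events.append("budget.exhausted")
    return events


# A 2,000,000-token budget with no soft limit: the warning level is 1,600,000.
print(budget_events(2_000_000, 1_550_000, 1_650_000))
```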

After a budget.paused event, you can:

  • Increase limit to give the session more headroom:

    curl -X PATCH http://localhost:9300/api/v1/budgets/$BUDGET_ID \
      -H "Content-Type: application/json" \
      -d '{ "limit": 20.00, "soft_limit": 16.00 }'
  • Or call the resume endpoint to continue against the existing limit (the next LLM call may push the budget over).

Budget checks run after each LLM call, not before, to avoid latency on the hot path. The last generation can slightly overshoot the limit — this is expected and by design. Treat budgets as cost caps, not hard cutoffs measured in single tokens.
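The overshoot is easy to see in a small simulation. This is a simplified model of post-call checking with made-up costs, not the actual enforcement code; `run_session` is a hypothetical helper.

```python
def run_session(limit, call_costs):
    """Run calls until the post-call check finds the budget exhausted.

    Returns (calls_made, total_spent).
    """
    spent = 0.0
    calls_made = 0
    for cost in call_costs:
        # The call runs first; there is no pre-flight budget check.
        spent += cost
        calls_made += 1
        # The check happens only after the call has been paid for.
        if spent >= limit:
            break
    return calls_made, spent


calls_made, spent = run_session(limit=10.0, call_costs=[4.0, 4.0, 4.0, 4.0])
print(calls_made, spent, spent - 10.0)
```

Here the third call starts with 2.00 of budget left and costs 4.00, so the session ends at 12.00 spent. In this model the overshoot is bounded by the cost of the final call.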