Recipe 06

Task budgets

max_tokens is a wall the model hits blind. A task budget is a fuel gauge it can see — and Fable plans around what's left.

The shape

Send the beta header task-budgets-2026-03-13. The budget covers the whole agentic loop: thinking, tool calls, and final output combined.

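import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# long_agentic_task: the user prompt string, defined elsewhere
response = client.beta.messages.create(
    betas=["task-budgets-2026-03-13"],  # opt in to the task-budget beta
    model="claude-fable-5",
    max_tokens=64000,  # hard cap on this single response
    thinking={"type": "adaptive"},
    output_config={
        "effort": "high",  # per-step depth
        "task_budget": {"type": "tokens", "total": 128000},  # whole-loop budget the model can see
    },
    messages=[{"role": "user", "content": long_agentic_task}],
)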

Budget vs. ceiling

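|                       | task_budget                         | max_tokens            |
|-----------------------|-------------------------------------|-----------------------|
| Model aware of it     | Yes: sees a running countdown       | No                    |
| Enforcement           | Suggestion: model self-moderates    | Hard per-response cap |
| Scope                 | The whole task loop                 | One response          |
| Behavior at the limit | Prioritizes and wraps up gracefully | Truncates mid-thought |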

Use both: a generous task_budget so Fable paces itself, and max_tokens as the hard backstop.
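One way to keep the two limits paired is a small builder that always sets both, plus a check for when the hard cap was what stopped a response. This is a minimal sketch, not part of any SDK: the helper names budgeted_params and hit_ceiling are hypothetical, the parameter shape is copied from the request above, and the check assumes the response reports stop_reason == "max_tokens" when the ceiling truncates output.

```python
from types import SimpleNamespace


def budgeted_params(task_budget: int, max_tokens: int) -> dict:
    """Build keyword arguments that carry both limits (hypothetical helper).

    task_budget is the soft, model-visible budget for the whole loop;
    max_tokens is the hard cap on a single response.
    """
    return {
        "betas": ["task-budgets-2026-03-13"],
        "max_tokens": max_tokens,
        "output_config": {
            "task_budget": {"type": "tokens", "total": task_budget},
        },
    }


def hit_ceiling(response) -> bool:
    """Return True if the hard cap, not the model's own pacing, ended the response.

    Assumes the response object exposes a stop_reason attribute.
    """
    return getattr(response, "stop_reason", None) == "max_tokens"


# Usage with a stand-in response object (no network call is made here).
params = budgeted_params(task_budget=128_000, max_tokens=64_000)
truncated = SimpleNamespace(stop_reason="max_tokens")
finished = SimpleNamespace(stop_reason="end_turn")
```

The returned dict can be merged into the request with **params alongside model and messages. If hit_ceiling fires often, that suggests the backstop is doing the pacing the budget was meant to do, and one of the two numbers may need revisiting.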

Choosing the number

  • Minimum is 20,000 tokens; below that the request errors.
  • Generous for open-ended work, tighter for latency-sensitive runs. Too tight and the model completes the task less thoroughly, citing the budget as its constraint.
  • Don't guess in production: run the workload once unbudgeted, measure with response.usage, then set the budget from data.
  • Per-step depth is still effort — the budget shapes the run's total spend, not how hard each step thinks.
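The measure-then-set advice can be sketched as a small calculation over the usage objects collected from one unbudgeted run. This is an illustrative helper under stated assumptions, not an official formula: budget_from_usage, the 25% headroom, and the rounding step are all hypothetical choices, and the sketch counts output_tokens only, since this recipe does not say whether input tokens also draw down the budget.

```python
import math
from types import SimpleNamespace

MIN_TASK_BUDGET = 20_000  # minimum stated in this recipe


def budget_from_usage(usages, headroom: float = 1.25, round_to: int = 1_000) -> int:
    """Suggest a task budget from the usage of one unbudgeted run.

    usages: iterable of objects with an output_tokens attribute, one per
            response in the agentic loop (e.g. each response.usage).
    headroom: multiplier applied to the measured spend (assumed value).
    round_to: round the result up to a multiple of this many tokens.
    """
    spent = sum(u.output_tokens for u in usages)
    padded = math.ceil(spent * headroom / round_to) * round_to
    # Never suggest less than the documented minimum.
    return max(MIN_TASK_BUDGET, padded)


# Stand-in usage objects for a three-response loop (made-up numbers).
measured = [
    SimpleNamespace(output_tokens=30_000),
    SimpleNamespace(output_tokens=42_000),
    SimpleNamespace(output_tokens=18_000),
]
suggested = budget_from_usage(measured)
```

Measuring several representative runs and feeding in the largest one is a reasonable variation when the workload varies a lot from task to task.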

Moral: tell the traveler how far the inn is, and they'll pace the horse themselves.
