From Opus 4.7 / 4.8
Fable 5 keeps the same API surface. Swap the model string and you're done:
response = client.messages.create(
model="claude-fable-5", # was "claude-opus-4-8"
max_tokens=16000,
thinking={"type": "adaptive"},
output_config={"effort": "high"},
messages=[...],
)
One new rule: on Opus 4.7/4.8 an explicit thinking: {"type": "disabled"} is accepted; on Fable it returns 400. If you were disabling thinking, omit the field entirely instead.
From Opus 4.6 or older
Apply these in order — each returns a clean 400 if you miss it:
| Remove | Replace with |
|---|---|
temperature / top_p / top_k | Nothing — steer with prompting |
thinking: {"type":"enabled","budget_tokens":N} | thinking: {"type":"adaptive"} + output_config.effort |
thinking: {"type":"disabled"} | Omit the thinking field |
| Assistant-turn prefill as last message | Structured outputs — see Recipe 03 |
Non-streaming max_tokens > ~16K | client.messages.stream(...) |
After it compiles: re-tune
- Effort. Fable's intelligence ceiling is higher, so don't reflexively run
xhigh. Sweepmedium/high/xhighon your own eval set — Fable took the top FrontierCode score at medium. - Token counts. Re-baseline with
client.messages.count_tokens()againstclaude-fable-5on representative prompts before reacting to cost dashboards. Don't apply a blanket multiplier. - Cache. Changing the model string invalidates your existing prompt cache — the first request on Fable writes it fresh. Expected, not a bug.
- Thinking display. If your product shows reasoning, request
thinking: {"type":"adaptive","display":"summarized"}— the default is omitted, which renders as a long silent pause.
Verify:
assert response.model.startswith("claude-fable-5") on one real request before rolling out.Moral: the migration is mechanical; the wins come from the re-tune.