Prompt caching — The Fable Cookbook

The one rule

Caching is a prefix match. The cache key is the exact bytes of the rendered prompt — tools, then system, then messages. A single byte changed anywhere invalidates everything after it. Get the ordering right (stable content first, volatile content last) and caching mostly works for free.
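To make the prefix rule concrete, here is a toy model. It is not the real cache, which works on the API's own rendering of the request. It treats a request as an ordered list of rendered segments (tools, then system, then messages) and counts how many leading segments two requests share. `render_segments` and `shared_prefix_len` are hypothetical helpers for illustration only.

```python
import json


def render_segments(tools, system, messages):
    """Toy rendering: one deterministic string per segment, in cache order."""
    segments = [json.dumps(tool, sort_keys=True) for tool in tools]
    segments.append(system)
    segments += [json.dumps(m, sort_keys=True) for m in messages]
    return segments


def shared_prefix_len(prev, curr):
    """Count leading segments that are byte-identical (the reusable prefix)."""
    n = 0
    for a, b in zip(prev, curr):
        if a != b:
            break  # first difference: nothing after this point can be reused
        n += 1
    return n


# Usage: appending a message keeps the earlier prefix; changing tools does not.
tools = [{"name": "search"}]
msgs = [{"role": "user", "content": "hi"}]
base = render_segments(tools, "You are helpful.", msgs)
later = render_segments(tools, "You are helpful.", msgs + [{"role": "user", "content": "more"}])
retooled = render_segments([{"name": "fetch"}], "You are helpful.", msgs)
print(shared_prefix_len(base, later))     # 3: tool, system, first message
print(shared_prefix_len(base, retooled))  # 0: tools render first
```

The asymmetry is the whole point: growth at the end is cheap, while any change at the front discards everything behind it.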

The basic move

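```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

LARGE_STABLE_SYSTEM_PROMPT = "..."  # placeholder: your long, unchanging instructions
question = "..."                    # placeholder: this turn's (volatile) user input

response = client.messages.create(
    model="claude-fable-5",
    max_tokens=16000,
    system=[{
        "type": "text",
        "text": LARGE_STABLE_SYSTEM_PROMPT,
        # Breakpoint: cache everything up to and including this block.
        # Default TTL is 5 minutes; use {"type": "ephemeral", "ttl": "1h"} for one hour.
        "cache_control": {"type": "ephemeral"},
    }],
    messages=[{"role": "user", "content": question}],
)
```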

For multi-turn agents, also put a breakpoint on the last content block of the newest turn. Prefixes cached on earlier turns remain valid, so cache hits accumulate as the conversation grows. A single request may carry at most 4 breakpoints.
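A minimal sketch of that bookkeeping, assuming messages are plain dicts shaped like the `messages=` argument in the basic example. `mark_newest_turn` is a hypothetical helper, not part of the SDK. It tags the final block of the newest message and keeps only the newest three message breakpoints, leaving the fourth slot for the system prompt; lower `max_message_breakpoints` if your tools also carry one.

```python
import copy


def mark_newest_turn(messages, max_message_breakpoints=3):
    """Return a copy of `messages` with a cache breakpoint on the newest turn.

    Keeps at most `max_message_breakpoints` markers across all messages,
    dropping the oldest first, so the request stays under the 4-breakpoint cap.
    """
    out = copy.deepcopy(messages)
    # Plain-string content cannot carry cache_control, so convert it to a text block.
    for msg in out:
        if isinstance(msg["content"], str):
            msg["content"] = [{"type": "text", "text": msg["content"]}]
    # Breakpoint on the last content block of the newest message.
    out[-1]["content"][-1]["cache_control"] = {"type": "ephemeral"}
    # Walk newest-to-oldest and strip markers beyond the budget.
    kept = 0
    for msg in reversed(out):
        for block in reversed(msg["content"]):
            if "cache_control" in block:
                kept += 1
                if kept > max_message_breakpoints:
                    del block["cache_control"]
    return out


# Usage: simulate six turns, re-marking before each request.
conversation = []
for turn in range(6):
    conversation.append({"role": "user", "content": f"question {turn}"})
    conversation = mark_newest_turn(conversation)
    # ... send `conversation` here, then append the model's reply:
    conversation.append({"role": "assistant", "content": f"answer {turn}"})

markers = sum(
    "cache_control" in block
    for msg in conversation
    if isinstance(msg["content"], list)
    for block in msg["content"]
)
print(markers)  # 3
```

Keeping the previous turn's marker (rather than only the newest) means the prefix written last turn is still an explicit breakpoint this turn, so it can be read back directly.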

Verify it's working

```python
print(response.usage.cache_creation_input_tokens)  # tokens written to cache (~1.25x base input)
print(response.usage.cache_read_input_tokens)      # tokens served from cache (~0.1x)
print(response.usage.input_tokens)                 # uncached tokens (full price)
```
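Printing raw counts gets tedious across many requests. The sketch below folds one `usage` object into a status and a hit ratio. `cache_report` is a hypothetical helper; it assumes total input is the sum of the three fields above and treats a missing cache field (the SDK types them as optional) as zero. A `"write"` status once is expected; `"write"` on every request you expected to share a prefix points to a silent invalidator, and `"none"` points to a missing breakpoint or a prefix below the minimum.

```python
from types import SimpleNamespace


def cache_report(usage):
    """Classify one response's usage block and compute its cache hit ratio."""
    read = usage.cache_read_input_tokens or 0        # may be None; treat as 0
    written = usage.cache_creation_input_tokens or 0
    uncached = usage.input_tokens
    total = read + written + uncached
    if read:
        status = "hit"
    elif written:
        status = "write"  # fine once; every time means the prefix keeps changing
    else:
        status = "none"   # no breakpoint, or prefix below the model's minimum
    hit_ratio = read / total if total else 0.0
    return {"status": status, "hit_ratio": round(hit_ratio, 3), "total_input": total}


# Usage with stand-in usage objects; real code would pass response.usage.
first = SimpleNamespace(cache_creation_input_tokens=3000, cache_read_input_tokens=0, input_tokens=50)
second = SimpleNamespace(cache_creation_input_tokens=0, cache_read_input_tokens=3000, input_tokens=60)
print(cache_report(first))   # {'status': 'write', 'hit_ratio': 0.0, 'total_input': 3050}
print(cache_report(second))  # {'status': 'hit', 'hit_ratio': 0.98, 'total_input': 3060}
```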

If `cache_read_input_tokens` stays at zero across requests you expect to share a prefix, look for a silent invalidator:

| Pattern | Why it kills the cache |
|---|---|
| `datetime.now()` in the system prompt | The prefix changes on every request |
| `json.dumps(d)` without `sort_keys=True` | Key order follows insertion order, so equal dicts can serialize to different bytes |
| Per-user IDs interpolated early in the prompt | Prefixes can't be shared across users |
| Tool set varies per request | Tools render first, so everything after them misses |
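One way to find which segment is drifting is to hash the segments that render before any message and log the digests per request; the first digest that differs between two requests you expected to match names the culprit. A minimal standard-library sketch: `prefix_fingerprints` and `first_drift` are hypothetical helpers, and they hash your own Python objects, which approximates but is not the API's internal rendering.

```python
import hashlib
import json


def _digest(obj):
    # Hash the JSON exactly as built (no key sorting), so any change in the
    # bytes you would send, including key order, yields a new digest.
    raw = json.dumps(obj, ensure_ascii=False)
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()[:12]


def prefix_fingerprints(tools, system):
    """Short digests of the two segments that render before any message."""
    return {"tools": _digest(tools), "system": _digest(system)}


def first_drift(prev, curr):
    """Name the earliest segment whose digest changed, or None if both match."""
    for segment in ("tools", "system"):  # render order: tools, then system
        if prev[segment] != curr[segment]:
            return segment
    return None


# Usage: the same config dict built in two insertion orders.
tools = [{"name": "search", "description": "Search the web."}]
cfg_a = {"tone": "formal", "lang": "en"}
cfg_b = {"lang": "en", "tone": "formal"}

unsorted = [prefix_fingerprints(tools, "Config: " + json.dumps(c)) for c in (cfg_a, cfg_b)]
stable = [prefix_fingerprints(tools, "Config: " + json.dumps(c, sort_keys=True)) for c in (cfg_a, cfg_b)]
print(first_drift(*unsorted))  # system
print(first_drift(*stable))    # None
```

The second comparison is the fix from the table in action: with `sort_keys=True`, equal dicts produce identical system-prompt bytes.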

Fable-specific: the minimum cacheable prefix on claude-fable-5 is 2,048 tokens. Shorter prefixes are silently not cached: there is no error, just `cache_creation_input_tokens: 0` on every response.
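Since a too-short prefix fails silently, a pre-flight check can catch it before the first real request. A minimal sketch, assuming the SDK's `client.messages.count_tokens(model=..., system=..., messages=...)` call returning an object with `input_tokens`; `system_prompt_is_cacheable` and `fake_client` are hypothetical helpers. The endpoint takes a messages list, so a one-word placeholder is counted too and the result runs slightly high; if your requests also send tools, pass them as well.

```python
from types import SimpleNamespace

MIN_CACHEABLE_TOKENS = 2048  # claude-fable-5 minimum, as stated above


def system_prompt_is_cacheable(client, model, system, minimum=MIN_CACHEABLE_TOKENS):
    """Rough check that a system-prompt breakpoint clears the minimum.

    The placeholder message inflates the count slightly, so values just
    above `minimum` should be treated as borderline.
    """
    count = client.messages.count_tokens(
        model=model,
        system=system,
        messages=[{"role": "user", "content": "hi"}],
    )
    return count.input_tokens >= minimum


# Offline usage with a stand-in client; real code would pass anthropic.Anthropic().
def fake_client(n_tokens):
    counter = SimpleNamespace(count_tokens=lambda **kwargs: SimpleNamespace(input_tokens=n_tokens))
    return SimpleNamespace(messages=counter)


print(system_prompt_is_cacheable(fake_client(3000), "claude-fable-5", "..."))  # True
print(system_prompt_is_cacheable(fake_client(900), "claude-fable-5", "..."))   # False
```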

Reads cost about 10% of base input and 5-minute-TTL writes cost 1.25×, so a single reuse already pays for the write: two requests cost 1.25 + 0.1 = 1.35× base input versus 2× uncached.
Switching models invalidates the cache, since entries are model-scoped; expect one fresh write after migrating.
Don't edit the system prompt mid-session; inject dynamic context later in messages instead.
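The break-even arithmetic generalizes to any number of requests. A small sketch, assuming the first request writes the cache and every later one reads it within the TTL; `relative_cost` is a hypothetical helper and the default multipliers are the approximate 5-minute-TTL figures quoted above.

```python
def relative_cost(n_requests, write_mult=1.25, read_mult=0.1):
    """Cost of a cached prefix over n requests, relative to sending it uncached."""
    uncached = n_requests * 1.0                         # every request pays 1x
    cached = write_mult + read_mult * (n_requests - 1)  # one write, then reads
    return cached / uncached


for n in (1, 2, 10):
    print(n, round(relative_cost(n), 3))
# 1 1.25
# 2 0.675
# 10 0.215
```

Only a prefix that is never reused costs more than it saves; from the second request on, the ratio drops quickly toward `read_mult`.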

Moral: the cache doesn't reward cleverness, it rewards stillness — freeze the front of your prompt.


Part of a bigger loop: this recipe is one piece of running Fable 5 unattended. The full system — constitution, separation of powers, earned-trust ledgers, budgets, and injection defense — is in Build an autonomous agent with Fable 5, the nine-layer field guide.
