Fable 5's debugging scores "dropped 70%" — but it's the router, not the model

Two days after Fable 5's return, a benchmark result set off a wave of "the redeployed model is broken" posts: on a TypeScript debugging suite, Fable 5's score fell from 86.2 to 25.9 — a roughly 70% collapse. Taken at face value, it looks like Anthropic shipped a crippled model to get it past the export-control review.
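The 70% figure is a relative drop, not a difference in points. A quick check of the arithmetic, using only the two scores quoted above (the function name is just for illustration):

```python
def relative_drop(before: float, after: float) -> float:
    """Fractional decline from `before` to `after` (0.5 means a 50% drop)."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before


# Scores quoted in the article for the TypeScript debugging suite.
drop = relative_drop(86.2, 25.9)
print(f"{drop:.1%}")  # 70.0%
```

In absolute terms the gap is 60.3 points; the headline reports it as a share of the original 86.2.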

The face value is wrong, and the real explanation is more useful to know.

What the number actually measures

The detail buried in the benchmark: of 12 TypeScript debugging tasks, only three ever reached Fable 5. The other nine were intercepted by the new safety classifier and rerouted to Claude Opus 4.8 — which then answered as Opus, at Opus's level, and got scored as if it were Fable. The "collapse" is mostly a chart of how often the classifier fired, not how well the model reasoned.
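To see why the blended score misleads, it helps to split results by the model that actually answered. The sketch below is a minimal illustration, not the benchmark's own tooling: the `TaskResult` record, the model labels, and every per-task score are assumptions made up so the example runs. Only the 3-versus-9 split comes from the reporting.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class TaskResult:
    task_id: str
    answered_by: str  # model that actually produced the answer
    score: float      # placeholder values below, NOT real benchmark data


def summarize_by_responder(results):
    """Group per-task scores by the model that actually answered."""
    scores = defaultdict(list)
    for r in results:
        scores[r.answered_by].append(r.score)
    return {
        model: {"tasks": len(vals), "mean_score": sum(vals) / len(vals)}
        for model, vals in scores.items()
    }


def reach_rate(results, model):
    """Share of tasks whose answer came from `model`."""
    if not results:
        raise ValueError("no results")
    return sum(r.answered_by == model for r in results) / len(results)


# 12 tasks: 3 reached Fable 5, 9 were rerouted to Opus 4.8 (counts from the
# article). The scores are invented purely for illustration.
results = (
    [TaskResult(f"ts-{i}", "fable-5", 80.0) for i in range(3)]
    + [TaskResult(f"ts-{i}", "opus-4.8", 10.0) for i in range(3, 12)]
)

blended = sum(r.score for r in results) / len(results)
print(reach_rate(results, "fable-5"))   # 0.25
print(blended)                          # 27.5
print(summarize_by_responder(results))
```

With these made-up inputs, the blended mean is dominated by the nine rerouted tasks, while the per-responder summary keeps the three Fable answers separate. That separation is the piece a single headline number hides.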

A broader read supports that. Arena.AI's human-preference testing across diverse prompts showed Fable 5 holding mostly steady against its pre-shutdown June version, with frontend-code performance staying inside the confidence interval. The one-line summary from the analyses: Fable 5 still performs like Fable 5 when prompts reach it. The problem is that security-adjacent coding work can be diverted before the model responds.

Why ordinary debugging trips it

The classifier was trained to catch the reported cyber-jailbreak, and it deliberately runs with a wide safety margin — it would rather block a benign request than miss a harmful one. The trouble is that routine debugging structurally resembles the thing it's watching for: words like "vulnerability," "exploit," or even "fix this" in security-adjacent code can read, to a classifier, like the framing that started this whole episode. So a legitimate bug hunt gets waved off to Opus with a notification.
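The "wide safety margin" trade-off can be pictured as a threshold on a risk score. The toy below is only a sketch of that general idea: nothing here describes how Anthropic's classifier is built, and the risk scores, labels, and thresholds are invented for illustration.

```python
def is_rerouted(risk_score: float, threshold: float) -> bool:
    """Reroute when the risk score meets or exceeds the threshold."""
    return risk_score >= threshold


def error_counts(samples, threshold):
    """Return (benign prompts rerouted, harmful prompts let through)."""
    false_positives = sum(
        1 for score, harmful in samples
        if not harmful and is_rerouted(score, threshold)
    )
    false_negatives = sum(
        1 for score, harmful in samples
        if harmful and not is_rerouted(score, threshold)
    )
    return false_positives, false_negatives


# (risk_score, is_actually_harmful): invented toy values, not measurements.
samples = [
    (0.10, False),  # neutral bug report
    (0.45, False),  # "fix this" in security-adjacent code
    (0.60, False),  # benign request that mentions a "vulnerability"
    (0.55, True),
    (0.90, True),
]

print(error_counts(samples, threshold=0.7))  # (0, 1)
print(error_counts(samples, threshold=0.4))  # (2, 0)
```

In this toy, lowering the threshold catches every harmful sample but also reroutes two benign ones. That is the shape of the trade the article describes: fewer misses, bought with more false positives on ordinary debugging.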

Anthropic has said it will keep refining the classifier over the coming weeks to reduce these false positives, but has not committed to a specific date.

What to do about it

Don't read the benchmark as model decay. If your task reaches Fable, you're getting Fable. The scores that scared you are measuring the gate, not the engine.
Watch which model finished the job. Fable notifies you on a handoff, so look for that notice; if a debugging session feels like Opus, it probably is.
Rephrase, don't rage. Neutral wording ("this function returns the wrong value on empty input") clears the classifier far more often than security-loaded phrasing ("find the exploit here").
For security-heavy work, plan for Opus. Pentest tooling and CVE analysis will trip the gate by design this week, so route them to Opus 4.8 deliberately rather than fighting the filter.
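The checklist above can be sketched as a small pre-flight helper. Everything in it is an assumption for illustration: the model identifiers are hypothetical, the term list simply echoes words mentioned in this article rather than the classifier's real vocabulary, and `answered_by` stands in for whatever handoff notice your client surfaces.

```python
import re

# Terms this article flags as security-loaded. Illustrative only; this is
# NOT the classifier's actual vocabulary.
SECURITY_TERMS = ("vulnerability", "exploit", "pentest", "cve")

# Hypothetical model identifiers; substitute whatever your client expects.
FABLE = "fable-5"
OPUS = "opus-4.8"


def loaded_terms(prompt):
    """Return the security-loaded terms that start a word in the prompt."""
    lowered = prompt.lower()
    return [t for t in SECURITY_TERMS if re.search(rf"\b{t}", lowered)]


def plan_request(prompt, is_security_task):
    """Pick a model up front and list wording worth rephrasing."""
    if is_security_task:
        # Real security work: go to Opus deliberately, no rephrasing needed.
        return {"model": OPUS, "rephrase": []}
    return {"model": FABLE, "rephrase": loaded_terms(prompt)}


def was_handed_off(requested, answered_by):
    """True when the answering model differs from the one requested."""
    return requested != answered_by


print(plan_request("find the exploit here", is_security_task=False))
# {'model': 'fable-5', 'rephrase': ['exploit']}
print(plan_request("this function returns the wrong value on empty input",
                   is_security_task=False))
# {'model': 'fable-5', 'rephrase': []}
print(was_handed_off(FABLE, OPUS))  # True
```

A keyword check cannot tell a benign bug hunt from real security work, which is why the caller declares `is_security_task` explicitly; the term scan only flags wording that might be worth neutralizing before sending.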

Moral: a redeployed model and an over-eager bouncer look identical from the scoreboard. Check who's actually answering before you write the eulogy.

Sources

Yellow.com — "A router problem, not model decay" · TechTimes — debugging scores drop 70% · benchmark data attributed to BridgeMind (TypeScript suite) and Arena.AI (human-preference); Anthropic's classifier detail in Redeploying Fable 5 (hosted at /press/anthropic.html).
