Two days after Fable 5's return, a benchmark result set off a wave of "the redeployed model is broken" posts: on a TypeScript debugging suite, Fable 5's score fell from 86.2 to 25.9 — a roughly 70% collapse. Taken at face value, it looks like Anthropic shipped a crippled model to get it past the export-control review.
The face value is wrong, and the real explanation is more useful to know.
What the number actually measures
The detail buried in the benchmark: of 12 TypeScript debugging tasks, only three ever reached Fable 5. The other nine were intercepted by the new safety classifier and rerouted to Claude Opus 4.8 — which then answered as Opus, at Opus's level, and got scored as if it were Fable. The "collapse" is mostly a chart of how often the classifier fired, not how well the model reasoned.
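The two headline figures are easy to check by hand. Here is a minimal Python sketch that uses only the numbers quoted above; the function and constant names are illustrative and do not come from any benchmark harness.

```python
# Figures quoted in the article above.
SCORE_BEFORE = 86.2
SCORE_AFTER = 25.9
TOTAL_TASKS = 12
TASKS_REACHING_FABLE = 3


def relative_drop(before: float, after: float) -> float:
    """Fractional decline from `before` to `after`."""
    return (before - after) / before


def rerouted_share(total: int, reached: int) -> float:
    """Fraction of tasks that never reached the intended model."""
    return (total - reached) / total


drop = relative_drop(SCORE_BEFORE, SCORE_AFTER)
rerouted = rerouted_share(TOTAL_TASKS, TASKS_REACHING_FABLE)

print(f"relative drop: {drop:.1%}")       # relative drop: 70.0%
print(f"rerouted share: {rerouted:.0%}")  # rerouted share: 75%
```

Three quarters of the suite was answered by a different model, so the drop in the headline score cannot be read as a measurement of Fable 5 alone.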
A broader read supports that. Arena.AI's human-preference testing across diverse prompts showed Fable 5 holding mostly steady against its pre-shutdown June version, with frontend-code performance staying inside the confidence interval. The one-line summary from the analyses: Fable 5 still performs like Fable 5 when prompts reach it. The problem is that security-adjacent coding work can be diverted before the model responds.
Why ordinary debugging trips it
The classifier was trained to catch the reported cyber-jailbreak, and it deliberately runs with a wide safety margin — it would rather block a benign request than miss a harmful one. The trouble is that routine debugging structurally resembles the thing it's watching for: words like "vulnerability," "exploit," or even "fix this" in security-adjacent code can read, to a classifier, like the framing that started this whole episode. So a legitimate bug hunt gets waved off to Opus with a notification.
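The trade-off behind a "wide safety margin" can be shown with a toy threshold. This is a sketch under loud assumptions: the risk scores below are invented for illustration, the thresholds are arbitrary, and nothing here reflects how Anthropic's classifier actually scores or routes prompts.

```python
# Made-up risk scores in [0, 1]; higher means "looks more suspicious".
# Illustrative only: these are NOT outputs of any real classifier.
BENIGN_SCORES = [0.10, 0.25, 0.40, 0.55, 0.65]
HARMFUL_SCORES = [0.60, 0.80, 0.95]


def count_errors(threshold, benign, harmful):
    """Return (false_positives, false_negatives) when scores >= threshold are blocked."""
    false_positives = sum(1 for score in benign if score >= threshold)
    false_negatives = sum(1 for score in harmful if score < threshold)
    return false_positives, false_negatives


# A cautious threshold blocks more, a permissive one blocks less.
cautious = count_errors(0.50, BENIGN_SCORES, HARMFUL_SCORES)
permissive = count_errors(0.70, BENIGN_SCORES, HARMFUL_SCORES)

print("cautious   (FP, FN):", cautious)    # (2, 0)
print("permissive (FP, FN):", permissive)  # (0, 1)
```

In this toy setup the cautious threshold misses nothing harmful but blocks two benign prompts, which is the same shape of trade-off the paragraph above describes.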
Anthropic has said it will keep refining the classifier over the coming weeks to reduce these false positives, but it has not committed to a specific date.
What to do about it
- Don't read the benchmark as model decay. If your task reaches Fable, you're getting Fable. The scores that scared you are measuring the gate, not the engine.
- Watch which model finished the job. Fable notifies you on a handoff, so look for that notice. If a debugging session feels like Opus, it probably is.
- Rephrase, don't rage. Neutral wording ("this function returns the wrong value on empty input") clears the classifier far more often than security-loaded phrasing ("find the exploit here").
- For security-heavy work, plan for Opus. Pentest tooling and CVE analysis will trip the gate by design this week, so route them to Opus 4.8 deliberately rather than fighting the filter.
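If you run your own evaluations, the first two points above amount to scoring each result by the model that actually answered. Here is a minimal sketch, assuming your harness can record that; the `answered_by` field, the model labels, and the sample results are all hypothetical placeholders, not real benchmark data.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class TaskResult:
    task_id: str
    answered_by: str  # hypothetical field: which model produced the answer
    passed: bool


def pass_rate_by_model(results):
    """Group results by responding model and return each group's pass rate."""
    grouped = defaultdict(list)
    for result in results:
        grouped[result.answered_by].append(1.0 if result.passed else 0.0)
    return {model: mean(values) for model, values in grouped.items()}


# Placeholder results, invented purely to exercise the function.
sample = [
    TaskResult("t1", "requested-model", True),
    TaskResult("t2", "requested-model", True),
    TaskResult("t3", "fallback-model", True),
    TaskResult("t4", "fallback-model", False),
]

rates = pass_rate_by_model(sample)
blended = mean(1.0 if r.passed else 0.0 for r in sample)

print(rates)    # {'requested-model': 1.0, 'fallback-model': 0.5}
print(blended)  # 0.75
```

The blended number hides the split. Reporting the per-model rates alongside the share of rerouted tasks keeps the gate and the engine separate.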
Moral: a degraded model and an over-eager bouncer look identical on the scoreboard. Check who's actually answering before you write the eulogy.
Sources
Yellow.com — "A router problem, not model decay" · TechTimes — debugging scores drop 70% · benchmark data attributed to BridgeMind (TypeScript suite) and Arena.AI (human-preference); Anthropic's classifier detail in Redeploying Fable 5 (hosted at /press/anthropic.html).