Software engineering
Stripe: months into days
The launch's flagship story. In a 50-million-line Ruby codebase, Fable finished a migration in one day that a team had scoped at two-plus months; Stripe says Fable compressed months of engineering into days overall.
Copy this: migrations are Fable's natural habitat — mechanical-but-judgment-laden work at scale, specified once up front. See the whole-task-brief pattern. Source: announcement.
Every: 91/100 on a human-grade exam
Every, a publication-and-products studio, runs an internal "Senior Engineer" coding exam. Fable 5 scored 91 of 100, close to the range of their actual human engineers. Opus 4.8, the previous flagship, scored 63. That 28-point jump in one generation is the clearest single number for "what changed."
Copy this: build your own senior-engineer exam from your real tickets — it tells you more than any public benchmark. Source: AI+ Founders coverage of Every's results.
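One way to start on that exam is a small scoring harness. The sketch below assumes each ticket can be reduced to a prompt plus an automated pass/fail check. `ExamItem`, `run_exam` and `ask_model` are hypothetical names invented for this illustration, not part of Every's setup or any vendor API, and the model call is left as a function you supply.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class ExamItem:
    """One exam question built from a real ticket (hypothetical structure)."""
    ticket_id: str
    prompt: str
    points: int
    check: Callable[[str], bool]  # returns True if the answer is acceptable


def run_exam(items: List[ExamItem],
             ask_model: Callable[[str], str]) -> Tuple[int, List[str]]:
    """Score a model out of 100 and list the tickets it failed.

    `ask_model` is whatever function sends a prompt to the model under
    test and returns its answer as text; it is an assumption of this sketch.
    """
    earned = 0
    total = 0
    failed: List[str] = []
    for item in items:
        total += item.points
        answer = ask_model(item.prompt)
        if item.check(answer):
            earned += item.points
        else:
            failed.append(item.ticket_id)
    # Normalise to a 0-100 scale so runs with different item sets compare.
    score = round(100 * earned / total) if total else 0
    return score, failed
```

Running the same item list against two models gives a like-for-like comparison, and the failed-ticket list shows where the gap is.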
GitLab: multi-day goal-directed runs
GitLab reports Fable 5 sustains multi-day, goal-directed runs with strong instruction adherence — corroborating the 12-hour-run reports from independent testing and moving "long-horizon" from marketing word to operating parameter.
Copy this: for runs past a few hours, pair a memory file with a task budget — notes keep it coherent, the budget keeps it honest. Sources: GitLab via launch coverage; Digital Today.
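The memory-file-plus-budget pairing can be sketched as a plain loop. This is a minimal illustration under stated assumptions: `run_with_budget` and the `step` callback are hypothetical names, the "memory file" is an ordinary text file of notes, and the budget is a cap on steps and wall-clock seconds. A real harness would likely budget tokens or spend instead.

```python
import time
from pathlib import Path
from typing import Callable, Tuple, Union


def run_with_budget(step: Callable[[str], Tuple[str, bool]],
                    notes_path: Union[str, Path],
                    max_steps: int,
                    max_seconds: float) -> str:
    """Run `step` repeatedly until it reports done or a budget runs out.

    `step` (hypothetical) receives the notes written so far and returns
    (new_note, done). Notes are appended to `notes_path` after every step,
    so a restarted run can pick up from the same file.
    """
    path = Path(notes_path)
    path.touch(exist_ok=True)
    start = time.monotonic()
    for _ in range(max_steps):
        # The time budget is checked between steps, not inside one.
        if time.monotonic() - start > max_seconds:
            return "time budget exhausted"
        notes = path.read_text(encoding="utf-8")
        new_note, done = step(notes)
        with path.open("a", encoding="utf-8") as f:
            f.write(new_note.rstrip("\n") + "\n")
        if done:
            return "done"
    return "step budget exhausted"
```

The notes file carries state between steps, and the two caps guarantee the loop ends even if the task never reports completion.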
The tooling vote: Copilot, Cursor, Cognition
Fable 5 went generally available in GitHub Copilot on day one. Cursor calls it state-of-the-art on CursorBench, opening "a class of long-horizon problems out of reach for earlier models." Cognition measured the top frontier score on FrontierCode — at medium effort. When the three most-watched coding harnesses adopt on day one, that's a case study in itself.
Copy this: if you live in Copilot or Cursor, you don't need the API to try Fable — switch the model picker and re-run last week's hardest task. Sources linked above.
Knowledge work
Hebbia: the 90% wall falls
Finance-AI firm Hebbia reports that Fable is the first model past 90% on its core analytics benchmark of complex, long-running analytical tasks, a 10-point jump over Opus, and that it took the top score on its Finance Benchmark for senior-level reasoning.
Copy this: document-heavy analysis with chart and table interpretation is now a Fable-class task; route it accordingly. Source: announcement.
IMC: trading analysis, nearly across the board
Trading firm IMC says Fable "aced their trading-analysis evaluations nearly across the board" — notable because trading evals punish confident wrongness harder than most domains.
Copy this: high-stakes analysis is exactly where Fable's price premium pays — wrong answers cost more than tokens. Source: announcement.
Harvey: serving lawyers on day one
Legal-AI platform Harvey shipped Fable 5 to its users immediately — a domain where errors get noticed by opposing counsel.
Copy this: if a legal-AI vendor trusts it day one, contract-review-shaped workloads are fair game. Source: Harvey's blog.
Notation Capital: four days of physics in 36 hours
"Strongest model on frontier physics research while using a third of the reasoning tokens. In 36 hours it got nearly to where GPT-5.5 landed after four days."
Copy this: token efficiency means the sticker price overstates the real cost — measure completed-task cost, not per-token cost. Source: announcement.
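The arithmetic behind that advice fits in a few lines. The sketch below assumes failed attempts are retried independently, so the expected number of attempts is 1 / success rate. Every price and success rate in it is an illustrative placeholder, not a published figure; only the "a third of the tokens" ratio echoes the quote above.

```python
def cost_per_completed_task(price_per_mtok: float,
                            tokens_per_attempt: int,
                            success_rate: float) -> float:
    """Expected spend to get one successful completion.

    Assumes independent retries, so expected attempts = 1 / success_rate.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    cost_per_attempt = price_per_mtok * tokens_per_attempt / 1_000_000
    return cost_per_attempt / success_rate


# Illustrative placeholder numbers only: model A costs twice as much per
# million tokens but uses a third of the tokens per attempt.
model_a = cost_per_completed_task(price_per_mtok=30.0,
                                  tokens_per_attempt=100_000,
                                  success_rate=0.9)
model_b = cost_per_completed_task(price_per_mtok=15.0,
                                  tokens_per_attempt=300_000,
                                  success_rate=0.9)
print(f"A: {model_a:.2f} per completed task, B: {model_b:.2f}")
```

With these placeholder inputs the model with the higher per-token price comes out cheaper per completed task, which is the comparison to run with your own measured token counts and success rates.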
Science (Mythos 5, with safeguards lifted)
These results come from vetted partners running Mythos 5 under the restricted-access program. They are included because they show the ceiling of the same underlying model:
- Drug design: ~10× acceleration on parts of the process; Mythos ran the full scientist's loop and 9 of 14 protein targets yielded strong candidates.
- Genomics: a week-long, largely autonomous run assembled single-cell data across 138 species and trained a model that beat a recent Science publication at 1/100th the size.
- Hypothesis generation: scientists preferred Mythos hypotheses ~80% of the time in blinded comparisons; one E. coli mechanism hypothesis was independently corroborated.
Source: announcement; details on our Mythos page.
Builders in the wild, week one
Day one on a real product
sheets.works published a day-one log of putting Fable on their spreadsheet product — the most honest genre of case study: a builder, a real codebase, first contact.
The demos that taught something
- A Mario-style game from one prompt — the single-shot ceiling, on video below.
- A solar-system simulation that predicted a real eclipse — physics from a prompt, checked against the sky.
- Factorio, played autonomously — community long-horizon testing; Anthropic's own version was Pokémon FireRed on vision alone.
Copy this: the demos all share one move — a complete, vivid brief in a single turn. The model does the rest. More on the video charts.
Moral: benchmarks tell you what a model can do; case studies tell you what it's for.