Case studies

What people are actually doing with Fable

Not benchmarks — work. Every entry is documented and sourced; each ends with the part you can copy.

Software engineering

Stripe: five months into days

The launch's flagship story. In a 50-million-line Ruby codebase, Fable finished in one day a migration that a team had scoped at two-plus months. Stripe says that, overall, Fable compressed months of engineering into days.

Copy this: migrations are Fable's natural habitat — mechanical-but-judgment-laden work at scale, specified once up front. See the whole-task-brief pattern. Source: announcement.

Every: 91/100 on a human-grade exam

Publication-and-products studio Every runs an internal "Senior Engineer" coding exam. Fable 5 scored 91 of 100, close to the range of Every's own human engineers. Opus 4.8, the previous flagship, scored 63. That 28-point jump in one generation is the clearest single number for "what changed."

Copy this: build your own senior-engineer exam from your real tickets — it tells you more than any public benchmark. Source: AI+ Founders coverage of Every's results.
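
A minimal sketch of how such an in-house exam could be scored, assuming each item is a real ticket graded by a human against a point cap. The `ExamItem` fields, the ticket IDs, and the point values are all hypothetical placeholders, not Every's rubric or results.

```python
from dataclasses import dataclass


@dataclass
class ExamItem:
    ticket_id: str   # hypothetical reference to one of your own tickets
    max_points: int  # weight you assign to this ticket
    awarded: int     # points a human grader gave the model's solution


def exam_score(items: list[ExamItem]) -> float:
    """Scale awarded points to a 0-100 score across all items."""
    total = sum(item.max_points for item in items)
    if total == 0:
        raise ValueError("exam has no points to award")
    # Clamp each item so a grading typo cannot exceed its cap.
    earned = sum(min(item.awarded, item.max_points) for item in items)
    return round(100 * earned / total, 1)


# Placeholder grades, for illustration only.
items = [
    ExamItem("TICKET-A", max_points=40, awarded=35),
    ExamItem("TICKET-B", max_points=35, awarded=28),
    ExamItem("TICKET-C", max_points=25, awarded=20),
]
score = exam_score(items)
print(score)  # 83.0
```

Grade the same items for every model you compare, and for your own engineers if you can, so the scores share one scale.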

GitLab: multi-day goal-directed runs

GitLab reports Fable 5 sustains multi-day, goal-directed runs with strong instruction adherence — corroborating the 12-hour-run reports from independent testing and moving "long-horizon" from marketing word to operating parameter.

Copy this: for runs past a few hours, pair a memory file with a task budget — notes keep it coherent, the budget keeps it honest. Sources: GitLab via launch coverage; Digital Today.
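
One way to picture the memory-file-plus-budget pairing, as a sketch under stated assumptions: `step` is a hypothetical callable standing in for one model turn, the notes live in a JSON file, and the budget is counted in tokens. None of these names come from Fable's tooling.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable

# Hypothetical signature: takes the notes so far, returns (new_note, tokens_used, done).
Step = Callable[[list[str]], tuple[str, int, bool]]


def run_with_budget(step: Step, memory_path: Path, token_budget: int) -> dict:
    """Repeat `step` until it reports done or the token budget is spent."""
    # Reload earlier notes so a restarted run picks up where it left off.
    notes = json.loads(memory_path.read_text()) if memory_path.exists() else []
    spent = 0
    done = False
    # The budget is checked before each step, so the final step may overshoot it.
    while not done and spent < token_budget:
        note, used, done = step(notes)
        spent += used
        notes.append(note)
        memory_path.write_text(json.dumps(notes))  # persist after every step
    return {"done": done, "tokens_spent": spent, "notes": notes}


def fake_step(notes: list[str]) -> tuple[str, int, bool]:
    """Stand-in for a model turn: needs five chunks, spends 400 tokens each."""
    n = len(notes) + 1
    return (f"finished chunk {n}", 400, n >= 5)


with tempfile.TemporaryDirectory() as tmp:
    result = run_with_budget(fake_step, Path(tmp) / "notes.json", token_budget=1000)

print(result["done"], result["tokens_spent"], len(result["notes"]))  # False 1200 3
```

Here the budget stops the run after three of five chunks. The notes file records exactly where it stopped, so the next run can resume with a fresh budget.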

The tooling vote: Copilot, Cursor, Cognition

Fable 5 went generally available in GitHub Copilot on day one. Cursor calls it state-of-the-art on CursorBench, opening "a class of long-horizon problems out of reach for earlier models." Cognition measured the top frontier score on FrontierCode — at medium effort. When the three most-watched coding harnesses adopt on day one, that's a case study in itself.

Copy this: if you live in Copilot or Cursor, you don't need the API to try Fable — switch the model picker and re-run last week's hardest task. Sources linked above.

Knowledge work

Hebbia: the 90% wall falls

Finance-AI firm Hebbia reports that Fable is the first model past 90% on its core analytics benchmark of complex, long-running analytical tasks, a 10-point jump over Opus. It also took the top score on Hebbia's Finance Benchmark for senior-level reasoning.

Copy this: document-heavy analysis with chart and table interpretation is now a Fable-class task; route it accordingly. Source: announcement.

IMC: trading analysis, nearly across the board

Trading firm IMC says Fable "aced their trading-analysis evaluations nearly across the board" — notable because trading evals punish confident wrongness harder than most domains.

Copy this: high-stakes analysis is exactly where Fable's price premium pays — wrong answers cost more than tokens. Source: announcement.

Harvey: serving lawyers on day one

Legal-AI platform Harvey shipped Fable 5 to its users immediately — a domain where errors get noticed by opposing counsel.

Copy this: if a legal-AI vendor trusts it on day one, contract-review-shaped workloads are fair game. Source: Harvey's blog.

Notation Capital: four days of physics in 36 hours

"Strongest model on frontier physics research while using a third of the reasoning tokens. In 36 hours it got nearly to where GPT-5.5 landed after four days."

Copy this: token efficiency means the sticker price overstates the real cost — measure completed-task cost, not per-token cost. Source: announcement.
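
The arithmetic behind that advice, as a rough sketch. Every number below is a made-up placeholder (prices, token counts, completion rates); only the shape of the comparison, a pricier model using a third of the tokens, mirrors the quote above.

```python
from dataclasses import dataclass


@dataclass
class RunStats:
    """Totals for one model over a batch of tasks (all values hypothetical)."""
    price_per_million_tokens: float  # assumed blended price, in dollars
    tokens_used: int                 # total tokens across every attempt
    tasks_attempted: int
    tasks_completed: int


def cost_per_completed_task(stats: RunStats) -> float:
    """Total spend divided by the tasks that actually finished."""
    if stats.tasks_completed == 0:
        return float("inf")  # spent money, completed nothing
    total_cost = stats.tokens_used / 1_000_000 * stats.price_per_million_tokens
    return total_cost / stats.tasks_completed


# Placeholder figures: double the per-token price, a third of the tokens.
premium = RunStats(price_per_million_tokens=30.0, tokens_used=2_000_000,
                   tasks_attempted=10, tasks_completed=9)
cheaper = RunStats(price_per_million_tokens=15.0, tokens_used=6_000_000,
                   tasks_attempted=10, tasks_completed=6)

print(f"{cost_per_completed_task(premium):.2f}")  # 6.67
print(f"{cost_per_completed_task(cheaper):.2f}")  # 15.00
```

With these placeholder inputs the model with the higher sticker price is the cheaper one per finished task. Your own numbers may point the other way, which is why it is worth measuring.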

Science (Mythos 5, with safeguards lifted)

These results were run by vetted partners under the restricted-access program. They are included because they show the ceiling of the same underlying model:

  • Drug design: ~10× acceleration on parts of the process; Mythos ran the full scientist's loop and 9 of 14 protein targets yielded strong candidates.
  • Genomics: a week-long, largely autonomous run assembled single-cell data across 138 species and trained a model that beat a recent Science publication at 1/100th the size.
  • Hypothesis generation: scientists preferred Mythos hypotheses ~80% of the time in blinded comparisons; one E. coli mechanism hypothesis was independently corroborated.

Source: announcement; details on our Mythos page.

Builders in the wild, week one

Day one on a real product

sheets.works published a day-one log of putting Fable on their spreadsheet product — the most honest genre of case study: a builder, a real codebase, first contact.

The demos that taught something

  • A Mario-style game from one prompt — the single-shot ceiling, on video below.
  • A solar-system simulation that predicted a real eclipse — physics from a prompt, checked against the sky.
  • Factorio, played autonomously — community long-horizon testing; Anthropic's own version was Pokémon FireRed on vision alone.

Copy this: the demos all share one move — a complete, vivid brief in a single turn. The model does the rest. More on the video charts.

Send us yours: shipped something real with Fable 5? If it's documented and verifiable, it belongs here — the bar is receipts, same as the evidence wall.

Moral: benchmarks tell you what a model can do; case studies tell you what it's for.