The evidence wall

Every claim, with its receipt

All launch-day numbers in one place, each traceable to a source. Third-party evals get added as they publish — claims without receipts don't get on the wall.

Figure: The master benchmark table from Anthropic's announcement, comparing Claude Fable and Mythos against other frontier models.

Software engineering

Claim | Source
50-million-line Ruby migration completed in one day (the team had scoped it at two months) | Stripe, via announcement
Highest frontier score on FrontierCode, at medium effort (chart, chart 2) | Cognition
"State of the art model on CursorBench" | Michael Truell, Cursor
Long-horizon autonomy "exceeded previous benchmarks" | Mario Rodriguez, GitHub
Highest score on ViBench end-to-end vibe coding, "nearly saturating base use cases" | Michele Catasta, Vibe

Knowledge work & research

Claim | Source
First model past 90% on the core analytics benchmark, 10 points over Opus; top of Finance Benchmark | Izzy Miller, Hebbia
"Aced trading-analysis evaluations nearly across the board" | IMC
Strongest on frontier physics while using a third of the reasoning tokens; 36 hours ≈ GPT-5.5's four days | Matthew Pines, Notation Capital

Vision, memory, autonomy

Claim | Source
Completed Pokémon FireRed using vision only, with a minimal harness | Announcement
Rebuilds web-app source code from screenshots; extracts precise values from scientific figures | Announcement
In Slay the Spire with file memory: improved 3× more than Opus 4.8 and reached the final act 3× more often | Announcement

Mythos research results

These results come from runs by vetted partners with safeguards lifted; see the Mythos page for context.

Claim | Source
~10× acceleration on aspects of drug design; 9 of 14 protein targets yielded strong candidates (chart) | Announcement
Hypotheses preferred over Opus-class models ~80% of the time in blinded comparisons; one was corroborated independently | Announcement
Genomics model produced in a week-long autonomous run beat a recent Science publication at 1/100th the size | Announcement
Beat dedicated protein language models on AAV prediction (chart) | Announcement

Reading the wall: everything above is launch-day material — Anthropic's own numbers and partner quotes from the announcement. Independent third-party evals get their own section here as they publish. If a claim has no receipt, it doesn't go up.