Day two brought the first non-Anthropic data points. BenchLM places Fable 5 at #2 of 123 models on its provisional leaderboard (96/100 overall), with top-tier coding and agentic-tool-use scores — and a notably weaker #18 placement on multimodal/grounded tasks (79), the first independent wrinkle in the vision story. Reported SWE-Bench Verified: 95.0%.
Digital Today reports sustained 12-hour autonomous runs — consistent with the long-horizon claims, and the kind of duration that matters more than any single score for agentic work. Tom's Hardware and VentureBeat round out the mainstream technical coverage.
Caveats apply: provisional leaderboards move, and methodologies vary. These results go on the evidence wall as independent-but-early. The week-long hands-on below remains the most thorough third-party test published so far.