The First Big Run — Post-Mortem & Playbook

The instruction was simple and ambitious: take ~78 sites that looked like generic AI templates and, in a single unattended overnight run, give each one a bespoke redesign, five custom themes with a picker, a 20-video page, a sources page, and an Agents-First score of 90+. The run was set to Fable 5. By morning, about 10 of 78 sites were live, the session limit had been hit, and the weekly cap sat at 41% used. This page is the autopsy.

None of what follows is a complaint about the model. Fable did exactly what it was asked: produce genuinely bespoke, high-craft design, one site at a time. The failure was one of scoping and orchestration — asking a premium, token-hungry process to run 78 times in one night without a budget, a cap, or a model that matched each sub-task. That's a planning problem, and planning problems are fixable.

01 What we set out to do

The spec, captured in NIGHT-PLAN-260610.md, applied to every one of the 78 checked sites:

Bespoke redesign — a unique visual identity per subject, explicitly not the dark-navy / Space Grotesk / card-grid AI look.
5 unique themes + a theme picker — a full CSS-variable theme system, persisted in localStorage, applied across every page.
/videos/ page — top 20 YouTube videos by view count, real IDs scraped (no fabrication), curated.
/sources/ page — 12–20 annotated external references.
Agents First 90–100, with AdSense, GA, promo-bar and JSON-LD preserved verbatim.

Each of those is, on its own, a reasonable ask. Stacked together and multiplied by 78, they describe one of the largest single batch jobs the network has ever attempted — and the prompt treated it as a single overnight task.

02 What actually happened

10 / 78: sites fully live by morning (6 done + 4 finished on recovery)
~231k: average tokens per design subagent
100%: session limit consumed, run killed mid-flight
41%: of the weekly all-models cap, spent on a fraction of the job

The sequence, reconstructed from the transcript:

List-building cost real tokens before any work began. Six or seven Explore agents read ~175 sites to judge which looked generic (148k, 96k, 112k tokens and more). Necessary, but a heavy down-payment.
Wave 1 ran four design agents at 208k, 251k, 191k and 276k tokens. Four sites, nearly a million tokens.
The session hit its limit mid-run. Fable's per-session ceiling — not just the weekly cap — was exhausted while four heavy agents ran concurrently. Agents died partway through.
Recovery thrashed. The four interrupted sites (austincast, austinfestivalcalendar, austinhangout, austinhomesearches) were relaunched repeatedly. austincast.com alone shows up across three "finish" waves at roughly 128k + 152k + 155k tokens — ~435k tokens for a single site that still wasn't confirmed done.
A coordination file got corrupted. NIGHT-QUEUE.md shows garbled, interleaved text — the signature of an Edit interrupted or racing against itself.

“The expensive part, the subagent redesigns, is already spent whether you stop or not.” That single sentence from the recovery session is the whole lesson: by the time you notice the budget problem, you've already paid for it.

The measured ledger from usage.html

The cost panel puts hard numbers on it: $177.04 of API-equivalent value (subscription work, not money spent), of which Fable 5 alone was $148.15 — 84%. Opus 4.8 was $27.30, Haiku $1.59, and Sonnet 0% — the cheap tier that should have done the bulk went completely unused. Output volume tells the same story: 1.2M Fable tokens vs 367k Opus vs 22k Haiku. Caching was not the problem — Fable read 42.8M tokens from cache against 100.7k of fresh input (>99% hit rate). The full breakdown, limit gauges, and the tooling's own contributing-factor diagnostics live on the usage ledger.
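The percentages in the ledger paragraph can be re-derived from the quoted figures. A minimal sketch, using only the numbers stated above; the variable names are illustrative and are not taken from usage.html:

```python
# API-equivalent dollar figures quoted in the ledger paragraph.
cost_usd = {"Fable 5": 148.15, "Opus 4.8": 27.30, "Haiku": 1.59, "Sonnet": 0.00}

total = sum(cost_usd.values())
share = {model: cost / total for model, cost in cost_usd.items()}

# Cache hit rate = cached reads / (cached reads + fresh input), in tokens.
cache_read_tokens = 42_800_000
fresh_input_tokens = 100_700
cache_hit_rate = cache_read_tokens / (cache_read_tokens + fresh_input_tokens)

print(f"total: ${total:.2f}")
for model, frac in share.items():
    print(f"{model}: {frac:.0%}")
print(f"cache hit rate: {cache_hit_rate:.2%}")
```

Running it reproduces the $177.04 total, the 84% Fable share, and a cache hit rate above 99%.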

03 What went right: stepping back gracefully

This is the part the transcript actually documents — and the reason it's worth keeping. The session you're reading is the recovery session; the first run is nested inside it. When the limit hit, the response wasn't panic or brute force. It was a clean, disciplined de-escalation that set a new course. Several things were done right, and they're worth naming as plainly as the mistakes.

The subscription line held

At the limit, the system offered the easy out: /usage-credits to finish what you're working on. Taking it would have meant pay-per-token billing — a direct violation of the network's first rule. Instead the run downshifted Fable 5 → Opus 4.8, exactly the prescribed move (“when usage limits run low, downshift models — never buy extra credits”). The budget ceiling was respected under pressure, which is the only time it matters.

Explicit tightening, stated out loud. The new course was set with one sentence — “resume operations but be more careful to conserve tokens” — and the recovery session immediately changed behavior: agents told to return one-line status, deploy and verify batched into single commands.
Finish what's already paid for, then stop. The recovery correctly identified that the four in-flight sites' design cost was already spent, so completing them (just the cheap push + verify remained) was the high-value move — while launching new waves was not. Pay for nothing extra; bank what's bought.
Stop on a safe seam. The chosen stopping point was between an agent's return and the next wave — never mid-write. That's what kept the finished work intact; the one file that did get corrupted, NIGHT-QUEUE.md, was the exception that proves the rule.
Leave a clean resume state. The other ~68 sites stayed marked PENDING in _night/STATUS-260610.md, droplet backups sat in /root/night-backups/, and the WORKLOG claim stayed accurate — so a later session can pick up exactly where this one stopped, redoing nothing.

In other words: the run mis-scoped the start, but it handled the stop about as well as a stop can be handled. A blown budget that ends in a recoverable, well-documented checkpoint is a far better outcome than one that ends in corrupted sites and no idea what finished. The course-correction is the template; the rest of this page is about not needing it next time.

04 The token math nobody ran first

Here is the calculation that should have happened before the run, not after. It's not complicated — which is exactly why skipping it hurts.

Component	Per site	× 78 sites
Design subagent (bespoke + 5 themes)	~230k	~17.9M
Videos + sources + AF file updates	~60k	~4.7M
Controller (pull / push / verify)	~25k	~2.0M
List-building + retries + thrash	—	~4M+
Estimated total (Opus-equivalent units)	—	~28M+
On Fable (≈2× burn rate)	—	~50–60M effective
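The table is a few lines of arithmetic. A sketch of it, assuming the per-site figures above are flat averages and that the "≈2×" Fable burn rate can be applied as a single multiplier (both are simplifications):

```python
SITES = 78

# Per-site estimates from the table, in tokens.
per_site = {
    "design subagent": 230_000,
    "videos + sources + AF updates": 60_000,
    "controller (pull / push / verify)": 25_000,
}
# One-off overhead from the table: list-building, retries, thrash ("~4M+").
overhead = 4_000_000

per_site_total = sum(per_site.values())
estimate = per_site_total * SITES + overhead

# Assumption: the table's "2x burn rate" treated as a flat multiplier.
FABLE_BURN = 2.0
fable_effective = estimate * FABLE_BURN

print(f"per site: {per_site_total:,} tokens")
print(f"total: {estimate / 1e6:.1f}M Opus-equivalent")
print(f"on Fable: {fable_effective / 1e6:.1f}M effective")
```

The result lands inside the table's ranges: a little over 28M Opus-equivalent tokens, and between 50M and 60M effective on Fable.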

The run could never have finished in one week

Roughly 10 sites' worth of work (plus list-building and retries) consumed 41% of the weekly all-models cap. Linear extrapolation puts the full 78-site run at ~300% of a single week's allowance — and that's on Fable, which burns the weekly limit twice as fast as Opus. The job as scoped was a 3-to-4-week project being asked to run in one night. The session limit didn't cause the failure; it just surfaced an impossibility that was baked into the scope from the start.

05 Root causes — what actually went wrong

1. No budget gate before launch

The single biggest miss. A two-minute estimate (sites × tokens-per-site ÷ weekly cap) would have shown the run was 3× too big. The fix is a hard rule: any batch over ~5 sites gets a token estimate and a stated stopping point before the first agent launches.
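One way to make that gate concrete is sketched below. The function name, its return fields, and the idea of expressing cost as a fraction of the weekly cap per site are all assumptions for illustration; the only measured input is that roughly 10 sites' worth of work used 41% of the cap. That figure includes list-building and retries, so it overstates the steady-state per-site cost, and the batch size here fills the cap with no headroom.

```python
import math

def budget_gate(n_sites: int, cap_fraction_per_site: float, max_unestimated: int = 5) -> dict:
    """Estimate weekly-cap usage for a batch and split it into weekly batches."""
    if cap_fraction_per_site <= 0:
        raise ValueError("cap_fraction_per_site must be positive")
    sites_per_week = max(1, math.floor(1 / cap_fraction_per_site))
    return {
        # The rule from the text: any batch over ~5 sites needs an estimate first.
        "needs_estimate": n_sites > max_unestimated,
        # Total cost as a multiple of one week's cap (linear extrapolation).
        "cap_multiple": n_sites * cap_fraction_per_site,
        "sites_per_week": sites_per_week,
        "weekly_batches": math.ceil(n_sites / sites_per_week),
    }

# Observed: roughly 10 sites' worth of work consumed 41% of the weekly cap.
observed = 0.41 / 10
plan = budget_gate(78, observed)
print(plan)
```

For 78 sites this reports a cap multiple of about 3.2 and four weekly batches, which matches the "3-to-4-week project" reading.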

2. Wrong model for most of the work

Fable's strength is bespoke creative design. But most of the per-site work isn't design; it's mechanical: scraping video JSON into a template, generating a sources list, updating llms.txt and sitemaps, pushing and verifying. Running all of that on Fable paid a 2× premium for templating that a cheaper model does just as well.

3. One agent did everything, at full cost, 78 times

Every site reinvented its own theme system from scratch at ~230k tokens. But sites cluster — 15 Austin guides, 11 tech/AI, 9 TV/media. Designing one excellent theme system per cluster and applying it mechanically would have turned ~17.9M tokens of design into maybe 2–3M, with more visual consistency, not less.

4. Concurrency burned the session cap, then restarts burned it again

Four heavy agents running at once drain the rolling session limit fast — and when the session dies mid-write, the restarted agent re-reads everything from zero. That's why one site cost 435k tokens. Interrupt-and-restart is the most expensive possible failure mode because nothing is checkpointed.

5. The cost was premium output at huge context — not cold cache

It's tempting to blame caching, but the ledger says otherwise: Fable read 42.8M tokens from cache against 100.7k of fresh input, a hit rate over 99%. The cache was warm and working. The spend went to output volume (1.2M generated tokens of bespoke HTML/CSS) and to what the tooling flagged as its top contributing factor: 78% of usage at >150k context, which is expensive even when cached. Generating premium design output 78 times, each in a large context, is the cost. The fix isn't "cache better"; it's "produce far less premium output by designing once and stamping many," plus running /clear between sites so context doesn't balloon.

06 How to do this better — the Opus + Fable division of labor

The right shape isn't "use Opus instead of Fable." It's "use each model where its cost and strength fit," and template aggressively so you pay the premium once, not 78 times.

Task	Model	Why
Orchestration, budgeting, pull/push/verify	Opus 4.8	Smart, cheaper burn, stays in one warm-cached session
One flagship theme system per cluster	Fable 5	Pay for genuine craft once; ~6–8 clusters, not 78 sites
Apply theme + rewrite pages per site	Sonnet 4.6	Mechanical transform of an existing design; no premium needed
Videos page from JSON, sources, AF files, sitemaps	Haiku / Sonnet	Pure templating — the cheapest model that can read JSON

The structural win: design once, stamp many

Instead of 78 independent 230k-token design sessions, you run ~7 Fable sessions (one bespoke theme system per cluster) during the Fable window, save them as reusable CSS/JS kits, and then let a cheap model stamp each site. The premium creative spend drops by roughly 80%, the network gets more visual coherence within clusters, and the whole job fits inside a sane weekly budget.
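A rough check of the session-count effect, counting only Fable design sessions at a flat ~230k tokens each. This is an assumption-heavy sketch: it does not model any extra cost of building a reusable kit over a single-site design, which is one reason the text's own figures (2–3M tokens, roughly 80%) are more conservative than the raw ratio.

```python
SITES = 78
CLUSTERS = 7               # "~7 Fable sessions", one per cluster
DESIGN_TOKENS = 230_000    # per bespoke design session, from the token table

per_site_design = SITES * DESIGN_TOKENS        # every site designed from scratch
per_cluster_design = CLUSTERS * DESIGN_TOKENS  # one kit per cluster

reduction = 1 - per_cluster_design / per_site_design
print(f"{per_site_design / 1e6:.2f}M -> {per_cluster_design / 1e6:.2f}M "
      f"({reduction:.0%} fewer premium design tokens)")
```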

Operational guardrails

Estimate, then cap. State a token budget and a site count up front. The Workflow harness supports a hard budget ceiling (budget.remaining()) — loop until it's nearly spent, then stop cleanly instead of getting killed.
Checkpoint per site. Mark each site DONE in the status tracker the instant it verifies live, so a restart resumes the list instead of re-reading finished work.
Serial waves sized to the session limit. Run 2–3 sites, check /usage, stop before the cap — never let the session die mid-write (that's what corrupted NIGHT-QUEUE.md).
One long warm session beats many cold ones. Keep the controller in a single session so prompt caching serves the repeated spec at 1/10 price.
Agents return one line, not a report. The recovery session adopted this — files-written + counts, nothing more. Do it from the start.
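The first three guardrails can be combined into one serial loop. The sketch below is hypothetical throughout: `run_batch`, `process_site`, and the JSON status file are illustrative stand-ins (the real tracker is a Markdown file, and the harness's own `budget.remaining()` is replaced here by a plain local counter). It shows only the shape: skip finished sites, stop between sites when the budget runs low, and write the tracker atomically so an interrupt cannot garble it.

```python
import json
from pathlib import Path

def run_batch(sites, process_site, budget_tokens, reserve_tokens, status_path):
    """Process sites serially; checkpoint each one; stop cleanly before the cap."""
    status_file = Path(status_path)
    status = json.loads(status_file.read_text()) if status_file.exists() else {}
    remaining = budget_tokens
    for site in sites:
        if status.get(site) == "DONE":
            continue  # finished in an earlier session; never redo it
        if remaining <= reserve_tokens:
            break     # stop on a seam between sites, not mid-write
        remaining -= process_site(site)  # hypothetical: returns tokens spent
        status[site] = "DONE"
        # Write to a temp file, then rename over the tracker, so an
        # interrupted write never leaves a half-written status file.
        tmp = status_file.with_suffix(".tmp")
        tmp.write_text(json.dumps(status, indent=2))
        tmp.replace(status_file)
    return status, remaining
```

Setting `reserve_tokens` to at least one site's expected cost means the loop stops before starting a site it cannot afford to finish. A later call with the same status file resumes at the first site not marked DONE.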

The recipe, start to finish

Budget gate. Sites × ~250k ÷ weekly cap. If it's >1 week, split it into weekly batches before launching anything.
Cluster the sites (Austin guides, tech/AI, TV/media, …) — ~7 groups, not 78 individuals.
Fable, once per cluster. Generate one flagship bespoke design + 5-theme system per cluster and save it as a reusable CSS/JS kit. This is the only premium spend.
Sonnet, per site. Stamp the cluster kit onto each site and rewrite its pages — mechanical transform, no premium needed.
Haiku, per site. Build the videos page from the scraped JSON, the sources page, and the AF files. Pure templating.
Opus controller, one warm session. Pull, push, verify, checkpoint each site DONE, watch /usage, stop on a clean seam before the cap.

Why it's dramatically cheaper: this run spent $148 of Fable on ~10 sites — extrapolated, the all-Fable approach is north of $1,100 API-equivalent for 78 sites and physically can't fit the weekly cap. The recipe pays Fable's premium ~7 times total (once per cluster) and runs the other ~71 stamps on Sonnet/Haiku — plausibly an 80%+ reduction in premium spend, with more visual consistency inside each cluster, and it fits inside a single week's allowance.
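The dollar extrapolation is a straight linear scale-up. A sketch, assuming the night's Fable spend divides evenly across the ~10 finished sites; that spend covers everything Fable did during the run, so the per-site figure is a rough one:

```python
FABLE_SPEND_USD = 148.15   # Fable 5 API-equivalent value from the ledger
SITES_DONE = 10            # "~10 sites"
SITES_TOTAL = 78

per_site_usd = FABLE_SPEND_USD / SITES_DONE
all_fable_usd = per_site_usd * SITES_TOTAL

print(f"per site: ${per_site_usd:.2f}")
print(f"all-Fable, {SITES_TOTAL} sites: ${all_fable_usd:,.2f}")
```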

07 The Fable deadline, and why everything changes on June 22

Fable 5 is included in the plan only through June 22, 2026.

After that date, continuing on Fable means switching to usage credits — pay-per-token billing.

This is not a minor scheduling note. The network runs on one hard rule above all others: subscription only — never API billing. The flat plan fee is the budget ceiling. Usage credits are exactly the pay-per-token model that rule exists to forbid. So the practical reading is blunt:

After June 22, Fable is off the table for big batch work

Not because it stops being good, but because using it at scale would mean buying credits — which the network's number-one rule prohibits. For large projects, the daily default returns to Opus 4.8, with Sonnet and Haiku for the mechanical bulk.

What that means for a project like this one:

Use the remaining Fable window for what only Fable should do. Between now and June 22, the highest-value Fable spend is building the reusable assets — the cluster theme systems, the flagship designs, the templates. Bank the craft now, while it's included.
Don't try to brute-force all 78 sites on Fable before the deadline. The token math says you can't anyway — it's 3× a weekly cap. Trying would just burn the cap and stall, exactly as the first run did.
Make the post-deadline plan Opus-shaped. Once the templates exist, applying them is mechanical work Opus and Sonnet handle comfortably, on the subscription, indefinitely. The deadline only bites if you've left the creative work undone when it arrives.
Watch the weekly cap as the real constraint. Even within the Fable window, the limiting resource isn't the calendar — it's the weekly allowance. Pace the cluster-design sessions across the days remaining rather than in one overnight push.

08 Sequence the rest by revenue potential

Ten sites are live and verified. The scorecard below adds a rev score (11–99): a heuristic read on each site's earning potential from niche ad CPC, commercial intent and traffic, not measured earnings (those need the AdSense Management API). The column isn't there to grade finished work; it's there to sequence the ~68 sites still pending so the highest-earning ones get rebuilt first.

Site	rev	Status	Sitemap	AF installed
austinhomesearches.com	91	✓ live · 20 vids · 5 themes · AF · ads	11	Jun 11 03:42
askemai.com	83	✓ live · 20 vids · 5 themes · AF · ads	20	Jun 10 18:23
wholetexas.com	67	✓ live · 20 vids · 5 themes · AF · ads	10	Jun 10 18:13
austin.com.co	64	✓ live · 20 vids · 5 themes · AF · ads	10	Jun 10 18:55
austincast.com	56	✓ live · 20 vids · 5 themes · AF · ads	10	Jun 11 02:58
austinfestivalcalendar.com	52	✓ live · 20 vids · 5 themes · AF · ads	10	Jun 11 03:01
austincoffeeshowdown.com	45	✓ live · 20 vids · 5 themes · AF · ads	11	Jun 10 18:23
austinlifestyles.com	41	✓ live · 20 vids · 5 themes · AF · ads	20	Jun 10 19:05
austinhangout.com	38	✓ live · 20 vids · 5 themes · AF · ads	12	Jun 11 03:00
austinrepeater.com	22	✓ live · 20 vids · 5 themes · AF · ads	11	Jun 10 18:13

rev is a heuristic (niche CPC × commercial intent × traffic reach), 11 lowest → 99 highest. Real per-site earnings require the AdSense Management API; see the usage ledger for what's measurable today.

Let revenue potential pick the model, too

The rev score plugs straight into the playbook's model mix. A 91-rev real-estate site earns back a bespoke Fable design; a 22-rev hobby site does not — it should get the cheap Sonnet/Haiku stamp of a cluster kit, never premium Fable time. Spend craft where it pays back: high-rev sites justify the premium, low-rev sites ride the template. Sequencing the remaining ~68 high-rev-first means the earning sites are live soonest and the expensive model is aimed only where the return justifies it. Front-load the high-CPC niches — real estate, AI/SaaS tools, travel and relocation — and batch the low-CPC hobby and hyper-local sites onto cheap template stamps at the end.
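The sequencing rule reduces to a sort plus a cutoff. In the sketch below the scores are the ten from the scorecard, used as sample data because the pending sites' scores are not listed here. The cutoff of 80 is a hypothetical value chosen for illustration; the text only says that a 91-rev site justifies premium design and a 22-rev site does not.

```python
# rev scores from the scorecard (already-live sites, used as sample data).
rev = {
    "austinhomesearches.com": 91,
    "askemai.com": 83,
    "wholetexas.com": 67,
    "austin.com.co": 64,
    "austincast.com": 56,
    "austinfestivalcalendar.com": 52,
    "austincoffeeshowdown.com": 45,
    "austinlifestyles.com": 41,
    "austinhangout.com": 38,
    "austinrepeater.com": 22,
}

PREMIUM_THRESHOLD = 80  # hypothetical cutoff, not a figure from the playbook

def plan(rev_scores, threshold=PREMIUM_THRESHOLD):
    """Order sites high-rev-first and tag each with a design tier."""
    ordered = sorted(rev_scores.items(), key=lambda kv: kv[1], reverse=True)
    return [
        (site, score, "bespoke Fable design" if score >= threshold else "cluster-kit stamp")
        for site, score in ordered
    ]

queue = plan(rev)
for site, score, tier in queue:
    print(f"{score:>3}  {site:<28} {tier}")
```

Moving the threshold up or down is the lever for how much premium time a batch is allowed to consume.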

09 The one-paragraph version

The first big run mis-scoped its start: a premium, design-grade process was asked to run 78 times in one night with no budget, no cap, the wrong model for most of the work, and no checkpoints — so it hit the session limit, thrashed on restarts, and finished an eighth of the job. But it handled the stop well — it held the subscription line instead of buying credits, downshifted Fable → Opus, finished only the work already paid for, and left a clean, resumable checkpoint. The fix for next time is to estimate first, design each cluster's look once on Fable while it's still included, stamp the individual sites with cheaper models, checkpoint every site, and pace the whole thing to the weekly cap instead of the clock. And because Fable leaves the plan on June 22, the smart move is to spend the remaining window banking reusable design, then run the mechanical rollout on Opus — on the subscription, where it belongs.

This analysis is a companion to firstfailure.html, the raw session transcript it draws on. Figures marked “~” are estimates reconstructed from per-agent token counts in that transcript and should be read as order-of-magnitude, not exact.