Claude Fable 5: What Developers Need to Know

Photo by Google DeepMind

On June 9, 2026, Anthropic shipped Claude Fable 5, the first model of the Claude 5 family and the first generally available member of a new Mythos-class tier that sits above Claude Opus in capability. I have been running it inside Claude Code since launch week, and it is the first model release in a while where the API surface changed enough that a careless model-string swap will break your production code with HTTP 400s.
This post is the guide I wish I had on day one: who should use Fable 5 versus Opus, Sonnet, and Haiku, what the cost math actually looks like at 10 dollars input and 50 dollars output per million tokens, which request parameters now return errors, and what always-on thinking means for your streaming UX.
Everything here is verified against Anthropic's launch announcement and the official model documentation, both linked in the sources box below. No leaked benchmarks, no speculation.
Claude Fable 5 (model ID claude-fable-5) is Anthropic's most intelligent generally available model. It is not an Opus update: it launches a new tier called Mythos-class, positioned above Opus. The headline specs are a 1-million-token context window as the default, up to 128K output tokens per request, and the same tokenizer as Opus 4.8, so token counts are roughly unchanged if you are migrating from Opus 4.7 or 4.8.
Anthropic's launch post says it plainly: Fable 5's capabilities exceed those of any model they have ever made generally available. Where it pulls away from Opus is long-horizon agentic work: single turns on hard tasks can run many minutes, and it reflects on and validates its own work at the highest effort settings.
What about Claude Mythos 5?
Claude Mythos 5 (claude-mythos-5) is the same underlying model offered without certain dual-use safety measures. It is available only to approved organizations, such as authorized cybersecurity partners and selected biomedical researchers, through Project Glasswing. For everyone else, Fable 5 is the model. Same pricing, same API surface.
Fable 5 costs exactly twice Opus 4.8 on both sides of the ledger. Here is the current first-party API lineup per million tokens:
| Model | Input per MTok | Output per MTok | Context window |
|---|---|---|---|
| Claude Fable 5 | 10 USD | 50 USD | 1M tokens, 128K max output |
| Claude Opus 4.8 | 5 USD | 25 USD | 1M tokens, 128K max output |
| Claude Sonnet 4.6 | 3 USD | 15 USD | 1M tokens, 64K max output |
| Claude Haiku 4.5 | 1 USD | 5 USD | 200K tokens, 64K max output |
The 2x multiplier understates the real delta because thinking is always on and Fable 5 thinks a lot. A long agentic coding session that consumed, say, 2 million input and 300K output tokens on Opus 4.8 (about 17.50 USD) would cost 35 USD on Fable 5 at identical token counts, but the counts will not stay identical: the model tends to reason more deeply per step, so output volume grows too. Budget for 2x to 3x the Opus 4.8 cost per task in practice, and measure your own workloads before committing a pipeline to it.
The flip side: on genuinely hard long-horizon tasks, Fable 5 often finishes in one turn what took Opus several review-and-retry cycles. When you count engineer time and failed-run tokens, the per-outcome cost can land lower. That is the honest framing: per-token it is the most expensive model Anthropic sells widely; per-completed-task it can be the cheapest.
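The arithmetic behind those numbers is easy to script. This sketch hard-codes the per-MTok prices from the table above and assumes identical token counts across models, which, as noted, understates Fable 5's real cost because it tends to produce more output. `PRICES_PER_MTOK` and `estimateCostUsd` are illustrative names, not part of any SDK.

```typescript
// Per-million-token prices (USD) from the first-party pricing table.
// Update these if the price list changes.
const PRICES_PER_MTOK: Record<string, { input: number; output: number }> = {
  "Fable 5": { input: 10, output: 50 },
  "Opus 4.8": { input: 5, output: 25 },
  "Sonnet 4.6": { input: 3, output: 15 },
  "Haiku 4.5": { input: 1, output: 5 },
};

// Cost of one workload at a given model's list price.
function estimateCostUsd(model: string, inputTokens: number, outputTokens: number): number {
  const price = PRICES_PER_MTOK[model];
  if (!price) throw new Error(`No price listed for ${model}`);
  return (inputTokens / 1_000_000) * price.input + (outputTokens / 1_000_000) * price.output;
}

// The example session: 2M input tokens, 300K output tokens.
const opus = estimateCostUsd("Opus 4.8", 2_000_000, 300_000);
const fable = estimateCostUsd("Fable 5", 2_000_000, 300_000);

// Same-token-count Fable cost, and the 3x end of the budgeting rule.
console.log(opus.toFixed(2), fable.toFixed(2), (opus * 3).toFixed(2)); // 17.50 35.00 52.50
```

Plug in token counts from your own usage logs rather than assuming the 2x ratio holds.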
After a few weeks of routing different workloads across the family, this is the split I have settled on for my own projects — the ERP backends, this portfolio, and the AI features I ship on the side:
Fable 5: the hardest 5 percent
Multi-hour autonomous refactors, overnight agentic runs, migrations across dozens of files, deep debugging where Opus loops. Give it the full task spec up front and let it run at high or xhigh effort.
Opus 4.8: the daily driver
Day-to-day coding, code review, complex reasoning at half the price. It remains Anthropic's recommended default for complex tasks, and it is the official fallback target when Fable 5's classifiers decline a request.
Sonnet 4.6: high-volume product features
User-facing AI features where latency and unit economics matter: summarization, extraction, chat. At 3 and 15 dollars per MTok with a 1M context window, it is the workhorse for production features.
Haiku 4.5: classification and glue
Routing, tagging, guardrail checks, autocomplete-grade tasks. At 1 and 5 dollars per MTok it is cheap enough to run on every request in a pipeline.
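One way to encode this split is a small lookup from workload class to model. This is a hypothetical sketch: the `Workload` categories mirror the four headings above, and the Sonnet and Haiku model ID strings are assumptions patterned on the `claude-fable-5` and `claude-opus-4-8` IDs used in this post, so verify them against the models documentation. The Fable 5 effort value follows the "high or xhigh" advice above.

```typescript
// Hypothetical workload classes mirroring the routing split above.
type Workload = "hardest" | "daily" | "product" | "glue";

interface Route {
  model: string;
  effort?: "low" | "medium" | "high" | "xhigh" | "max";
}

// Sonnet and Haiku IDs are assumptions; check them against the models docs.
const ROUTES: Record<Workload, Route> = {
  hardest: { model: "claude-fable-5", effort: "xhigh" }, // long-horizon agentic runs
  daily: { model: "claude-opus-4-8" }, // default for complex tasks
  product: { model: "claude-sonnet-4-6" }, // latency- and cost-sensitive features
  glue: { model: "claude-haiku-4-5" }, // routing, tagging, guardrail checks
};

function routeFor(workload: Workload): Route {
  return ROUTES[workload];
}
```

Keeping the mapping in one table makes it cheap to move a workload between tiers after you have measured it.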
Fable 5 removes more request parameters than any previous Claude release. If you lift an Opus 4.6-era request body and only swap the model string, expect HTTP 400 invalid_request_error. The exact failure modes:
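```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

// Each flagged parameter returns HTTP 400 invalid_request_error on claude-fable-5.
// The budget_tokens variant is commented out only because an object literal
// can hold a single `thinking` key; it fails just the same.
await client.messages.create({
  model: "claude-fable-5",
  max_tokens: 16000,
  temperature: 0.7, // 400: sampling params removed
  thinking: { type: "disabled" }, // 400: thinking cannot be disabled
  // thinking: { type: "enabled", budget_tokens: N }, // 400: budget_tokens removed
  messages: [{ role: "user", content: "..." }],
});
```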
```typescript
// The correct shape: omit the thinking parameter (Fable 5 always thinks)
// and control reasoning depth with effort instead.
await client.messages.create({
  model: "claude-fable-5",
  max_tokens: 16000,
  output_config: { effort: "high" }, // low | medium | high | xhigh | max
  messages: [{ role: "user", content: "..." }],
});
```

The biggest conceptual shift: Fable 5 always thinks. There is no toggle, and the raw chain of thought is never returned to you. By default, thinking blocks stream back with empty text, which means your UI shows a long, silent pause before the first visible token; on a hard task, that can be minutes of nothing.
The fix is the display option on the thinking parameter: request summarized and the API streams a readable summary of the reasoning so users see progress. Depth is controlled separately with the effort parameter inside output_config.
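```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

// Streaming UX: request a readable summary of the reasoning,
// otherwise thinking blocks arrive with empty text and your UI
// shows a long silent pause before the first visible token.
const stream = client.messages.stream({
  model: "claude-fable-5",
  max_tokens: 64000,
  thinking: { type: "adaptive", display: "summarized" },
  output_config: { effort: "xhigh" },
  messages: [{ role: "user", content: "Refactor the billing module..." }],
});

// Surface the reasoning summary and the answer text as they arrive.
for await (const event of stream) {
  if (event.type !== "content_block_delta") continue;
  if (event.delta.type === "thinking_delta") process.stdout.write(event.delta.thinking);
  else if (event.delta.type === "text_delta") process.stdout.write(event.delta.text);
}
```

Plan your timeouts around this too. A single Fable 5 request at xhigh effort on a real engineering task can legitimately run 10 to 15 minutes. Stream everything, raise your HTTP client timeouts, and design flows where the client checks in asynchronously rather than blocking a request-response cycle on one giant turn.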
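A check-in flow can be as simple as returning a job ID immediately and letting the client poll. This is a minimal in-memory sketch under stated assumptions: `JobStore` is a hypothetical class, the task is any function returning a `Promise<string>` (in practice, a wrapper around a streamed Fable 5 request), and a real deployment would persist jobs in a database or queue rather than a `Map`.

```typescript
// Hypothetical in-memory job store for long-running model calls.
type JobState =
  | { status: "running" }
  | { status: "done"; result: string }
  | { status: "failed"; error: string };

class JobStore {
  private jobs = new Map<string, JobState>();
  private nextId = 1;

  // Kick off the task without awaiting it and return an ID right away,
  // so no HTTP request has to stay open for the whole run.
  start(task: () => Promise<string>): string {
    const id = `job-${this.nextId++}`;
    this.jobs.set(id, { status: "running" });
    // Promise.resolve().then(task) also turns a synchronous throw into a rejection.
    Promise.resolve()
      .then(task)
      .then(
        (result) => this.jobs.set(id, { status: "done", result }),
        (err: unknown) => this.jobs.set(id, { status: "failed", error: String(err) }),
      );
    return id;
  }

  // Clients call this on their own schedule; undefined means an unknown ID.
  check(id: string): JobState | undefined {
    return this.jobs.get(id);
  }
}
```

In an HTTP service, a POST handler would call `start` and return the ID, and a GET handler would return `check(id)`.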
With budget_tokens gone, output_config effort is how you trade depth for cost and latency. There are five levels, from shallowest to deepest: low, medium, high, xhigh, and max.
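Because effort is now a closed set of strings rather than a numeric budget, it helps to validate it at your boundary and to have a deliberate way to step down when a task runs over budget. This sketch assumes the five levels are ordered shallowest to deepest as listed above; `isEffort` and `stepDown` are hypothetical helpers, not SDK functions.

```typescript
// The five effort levels, ordered shallowest to deepest (assumed ordering).
const EFFORT_LEVELS = ["low", "medium", "high", "xhigh", "max"] as const;
type Effort = (typeof EFFORT_LEVELS)[number];

// Validate untrusted input (config files, query params) before it reaches the API.
function isEffort(value: string): value is Effort {
  return (EFFORT_LEVELS as readonly string[]).includes(value);
}

// Trade depth for cost and latency one level at a time; "low" is the floor.
function stepDown(effort: Effort): Effort {
  const i = EFFORT_LEVELS.indexOf(effort);
  return EFFORT_LEVELS[Math.max(0, i - 1)];
}
```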
Fable 5 runs safety classifiers focused on research biology and most cybersecurity content. A declined request comes back as HTTP 200 with stop_reason "refusal", not as an HTTP error, so code that reads content unconditionally will break on refused requests.
If you work anywhere near security tooling, as I do, benign requests can occasionally trip false positives. Anthropic's answer is a beta server-side fallbacks parameter: name claude-opus-4-8 as a fallback and the API retries the declined request on Opus in the same round trip.
```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

// Beta: retry on Opus 4.8 server-side, in one round trip
const response = await client.beta.messages.create({
  model: "claude-fable-5",
  max_tokens: 16000,
  betas: ["server-side-fallback-2026-06-01"],
  fallbacks: [{ model: "claude-opus-4-8" }],
  messages: [{ role: "user", content: "..." }],
});

// Always branch on stop_reason before reading content
if (response.stop_reason === "refusal") {
  // pre-output refusal: empty content, not billed
  // mid-stream refusal: partial output billed; discard it
  console.warn("Request declined; discarding any partial output");
} else {
  for (const block of response.content) {
    if (block.type === "text") console.log(block.text);
  }
}
```

A pre-output refusal has an empty content array and is not billed at all. A mid-stream refusal bills the already-streamed output, and you should discard the partial response rather than treating it as complete.
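The two refusal cases call for different handling, so it is worth centralizing the decision. This sketch uses simplified local types rather than the SDK's, and it assumes a non-empty `content` array on a refusal indicates the mid-stream case (a pre-output refusal has empty content). `handleResponse` and the `Outcome` shape are hypothetical.

```typescript
// Simplified response shape; the real SDK types carry more fields.
interface ContentBlock {
  type: string;
  text?: string;
}

interface ModelResponse {
  stop_reason: string | null;
  content: ContentBlock[];
}

type Outcome =
  | { kind: "ok"; text: string }
  | { kind: "refused"; billed: boolean };

function handleResponse(res: ModelResponse): Outcome {
  if (res.stop_reason === "refusal") {
    // Empty content: refused before any output, not billed.
    // Non-empty content: refused mid-stream; the partial output was billed
    // but is discarded here, never returned as a complete answer.
    return { kind: "refused", billed: res.content.length > 0 };
  }
  // Normal completion: join the visible text blocks, skipping thinking blocks.
  const text = res.content
    .filter((b) => b.type === "text" && typeof b.text === "string")
    .map((b) => b.text as string)
    .join("");
  return { kind: "ok", text };
}
```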
Beyond the API mechanics, Fable 5 behaves differently enough that prompts tuned for Opus deserve a second look:
Sources and further reading
Fable 5 is not a drop-in upgrade and is not priced like one. Treat it as a new tier: keep Opus 4.8 as your default, route your genuinely hardest long-horizon work to Fable 5, and do the small amount of API homework before anything touches production: no sampling params, no disabled or budget_tokens thinking config, explicit refusal handling, and a check of the retention requirements.
My rule of thumb after launch month: if a task fits in one sitting, Opus 4.8. If I would hand it to a contractor for a day and review the result tomorrow, that is a Fable 5 job — full spec up front, xhigh effort, and let it run.