What is Claude Fable 5 and how does it differ from Claude Opus?

Claude Fable 5 (model ID claude-fable-5) is Anthropic's most intelligent generally available model and is not an Opus update — it launches a new tier called Mythos-class that sits above Opus in capability. It features a 1-million-token context window by default, up to 128K output tokens per request, and the same tokenizer as Opus 4.8. Its biggest advantage over Opus is long-horizon agentic work, where a single turn on a hard task can run many minutes while the model reflects on and validates its own output.

How much does Claude Fable 5 cost compared to Opus 4.8, and what should I budget for agentic tasks?

Fable 5 costs $10 per million input tokens and $50 per million output tokens — exactly twice Opus 4.8's $5 input and $25 output rates. However, because thinking is always on and the model reasons more deeply per step, output volume tends to grow beyond a simple 2x multiplier. The post recommends budgeting 2x to 3x per task in practice and measuring your own workloads before committing a pipeline to Fable 5.

What API parameters are removed in Fable 5 and why do I get HTTP 400 errors after swapping the model string?

Fable 5 removes temperature, top_p, and top_k (sampling parameters), rejects any thinking configuration that disables thinking or sets budget_tokens (thinking is always on), and drops assistant prefill support. If your org is configured for zero data retention, every request returns 400 regardless of payload validity, because Fable 5 requires 30-day data retention. The fix is to remove these parameters, switch JSON-forcing to structured outputs via output_config format, and check your org's retention settings before debugging the request body.

How does always-on thinking affect the streaming UX, and how can I show progress to users?

Because Fable 5 always thinks and never returns the raw chain of thought, the default behavior is thinking blocks streaming with empty text — causing a potentially minutes-long silent pause before the first visible token. The fix is to set the display option on the thinking parameter to summarized, which causes the API to stream a readable reasoning summary so users see progress. The post also advises lifting HTTP client timeouts and designing asynchronous check-in flows, since a single xhigh-effort request can legitimately run 10 to 15 minutes.

When should I use Fable 5 versus Opus 4.8, Sonnet 4.6, or Haiku 4.5?

The post recommends Fable 5 only for the hardest 5% of tasks: multi-hour autonomous refactors, overnight agentic runs, and deep debugging where Opus loops. Opus 4.8 remains the daily driver for complex coding and reasoning at half the price. Sonnet 4.6 suits high-volume user-facing features like summarization and chat at $3/$15 per MTok, while Haiku 4.5 covers routing, tagging, and guardrail checks at $1/$5 per MTok.

Claude Fable 5: What Developers Need to Know

On June 9, 2026, Anthropic shipped Claude Fable 5, the first model of the Claude 5 family and the first generally available member of a new Mythos-class tier that sits above Claude Opus in capability. I have been running it inside Claude Code since launch week, and it is the first model release in a while where the API surface changed enough that a careless model-string swap will break your production code with HTTP 400s.

This post is the guide I wish I had on day one: who should use Fable 5 versus Opus, Sonnet, and Haiku, what the cost math actually looks like at 10 dollars input and 50 dollars output per million tokens, which request parameters now return errors, and what always-on thinking means for your streaming UX.

Everything here is verified against Anthropic's launch announcement and the official model documentation, both linked in the sources box below. No leaked benchmarks, no speculation.

What Exactly Is Claude Fable 5?

Claude Fable 5 (model ID claude-fable-5) is Anthropic's most intelligent generally available model. It is not an Opus update: it launches a new tier called Mythos-class, positioned above Opus. The headline specs are a 1-million-token context window as the default, up to 128K output tokens per request, and the same tokenizer as Opus 4.8, so token counts are roughly unchanged if you are migrating from Opus 4.7 or 4.8.

Anthropic's launch post says it plainly: Fable 5's capabilities exceed those of any model they have ever made generally available. Where it pulls away from Opus is long-horizon agentic work: single turns on hard tasks can run many minutes, and it reflects on and validates its own work at the highest effort settings.

What about Claude Mythos 5?

Claude Mythos 5 (claude-mythos-5) is the same underlying model offered without certain dual-use safety measures. It is available only to approved organizations, such as authorized cybersecurity partners and selected biomedical researchers, through Project Glasswing. For everyone else, Fable 5 is the model. Same pricing, same API surface.

Pricing: The Cost Math vs Opus, Sonnet, and Haiku

Fable 5 costs exactly twice Opus 4.8 on both sides of the ledger. Here is the current first-party API lineup per million tokens:

Model	Input per MTok	Output per MTok	Context window / max output
Claude Fable 5	10 USD	50 USD	1M tokens, 128K max output
Claude Opus 4.8	5 USD	25 USD	1M tokens, 128K max output
Claude Sonnet 4.6	3 USD	15 USD	1M tokens, 64K max output
Claude Haiku 4.5	1 USD	5 USD	200K tokens, 64K max output

The 2x multiplier understates the real delta because thinking is always on and Fable 5 thinks a lot. A long agentic coding session that consumed, say, 2 million input and 300K output tokens on Opus 4.8 (about 17.50 USD) will not just double on Fable 5 — the model also tends to reason more deeply per step, so output volume grows too. Budget for 2x to 3x per task in practice and measure your own workloads before committing a pipeline to it.
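The arithmetic above is easy to script so you can rerun it on your own token logs. A minimal sketch using the per-MTok prices from the table; fableBudget and its outputGrowth multiplier are hypothetical illustrations of the "output volume grows too" effect, not anything from Anthropic's tooling, and the multiplier should come from your own measurements.

```typescript
// Published per-MTok prices (USD) from the pricing table.
type ModelId = "claude-fable-5" | "claude-opus-4-8";

const PRICES: Record<ModelId, { inputPerMTok: number; outputPerMTok: number }> = {
  "claude-fable-5": { inputPerMTok: 10, outputPerMTok: 50 },
  "claude-opus-4-8": { inputPerMTok: 5, outputPerMTok: 25 },
};

// Cost in USD for one run, given raw input and output token counts.
function runCost(model: ModelId, inputTokens: number, outputTokens: number): number {
  const p = PRICES[model];
  return (inputTokens / 1_000_000) * p.inputPerMTok + (outputTokens / 1_000_000) * p.outputPerMTok;
}

// Hypothetical budgeting helper: take an Opus run's token counts and scale the
// output side by an assumed growth factor for Fable 5's deeper per-step reasoning.
function fableBudget(inputTokens: number, opusOutputTokens: number, outputGrowth: number): number {
  return runCost("claude-fable-5", inputTokens, opusOutputTokens * outputGrowth);
}
```

For the session in the paragraph above, runCost("claude-opus-4-8", 2_000_000, 300_000) gives 17.5, fableBudget(2_000_000, 300_000, 1) gives 35 (the plain 2x), and an assumed outputGrowth of 2 gives 50, which is where the 2x to 3x range comes from.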

The flip side: on genuinely hard long-horizon tasks, Fable 5 often finishes in one turn what took Opus several review-and-retry cycles. When you count engineer time and failed-run tokens, the per-outcome cost can land lower. That is the honest framing: per-token it is the most expensive model Anthropic sells widely; per-completed-task it can be the cheapest.

Which Model Should You Use? A Practical Split

After a few weeks of routing different workloads across the family, this is the split I have settled on for my own projects — the ERP backends, this portfolio, and the AI features I ship on the side:

Fable 5: the hardest 5 percent

Multi-hour autonomous refactors, overnight agentic runs, migrations across dozens of files, deep debugging where Opus loops. Give it the full task spec up front and let it run at high or xhigh effort.

Opus 4.8: the daily driver

Day-to-day coding, code review, complex reasoning at half the price. It remains Anthropic's recommended default for complex tasks, and it is the official fallback target when Fable 5's classifiers decline a request.

Sonnet 4.6: high-volume product features

User-facing AI features where latency and unit economics matter: summarization, extraction, chat. At 3 and 15 dollars per MTok with a 1M context window, it is the workhorse for production features.

Haiku 4.5: classification and glue

Routing, tagging, guardrail checks, autocomplete-grade tasks. At 1 and 5 dollars per MTok it is cheap enough to run on every request in a pipeline.
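If you route programmatically, the split can be encoded as a small lookup. A minimal sketch: the Task fields and tier names are hypothetical, and while claude-fable-5 and claude-opus-4-8 are the IDs used in this post, the Sonnet and Haiku IDs below are my assumption and worth checking against Anthropic's model list.

```typescript
// The four tiers from the split above.
type Tier = "hardest" | "daily" | "product" | "glue";

// Sonnet and Haiku IDs are assumptions; verify them before use.
const MODEL_FOR_TIER: Record<Tier, string> = {
  hardest: "claude-fable-5",
  daily: "claude-opus-4-8",
  product: "claude-sonnet-4-6",
  glue: "claude-haiku-4-5",
};

// Hypothetical description of a unit of work.
interface Task {
  longHorizon: boolean;    // multi-hour refactor, overnight agentic run
  userFacing: boolean;     // latency and unit economics matter
  classification: boolean; // routing, tagging, guardrail checks
}

function tierFor(task: Task): Tier {
  if (task.classification) return "glue";
  if (task.longHorizon) return "hardest";
  if (task.userFacing) return "product";
  return "daily"; // Opus 4.8 stays the default
}

function modelFor(task: Task): string {
  return MODEL_FOR_TIER[tierFor(task)];
}
```

The default branch falls through to Opus 4.8 on purpose: a task has to positively qualify as long-horizon before it gets the more expensive model.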

API Migration Gotchas: Where the 400s Come From

Fable 5 removes more request parameters than any previous Claude release. If you lift an Opus 4.6-era request body and only swap the model string, expect HTTP 400 invalid_request_error. The exact failure modes:

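import Anthropic from "@anthropic-ai/sdk"

const client = new Anthropic() // reads ANTHROPIC_API_KEY from the environment

// All three of these return HTTP 400 on claude-fable-5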
await client.messages.create({
  model: "claude-fable-5",
  max_tokens: 16000,
  temperature: 0.7,                                  // 400 — sampling params removed
  thinking: { type: "disabled" },                    // 400 — thinking cannot be disabled
  // thinking: { type: "enabled", budget_tokens: N } // 400 — budget_tokens removed
  messages: [{ role: "user", content: "..." }],
})

// The correct shape: omit thinking entirely, control depth with effort
await client.messages.create({
  model: "claude-fable-5",
  max_tokens: 16000,
  output_config: { effort: "high" }, // low | medium | high | xhigh | max
  messages: [{ role: "user", content: "..." }],
})

Sampling parameters are gone. temperature, top_p, and top_k all return 400. Prompting is now the only way to steer output variance.
Thinking cannot be turned off or budgeted. Sending thinking type disabled or a budget_tokens value returns 400, because thinking is always on. Either omit the thinking parameter, or send type adaptive with a display option, as in the streaming example below.
Assistant prefill is not supported. If you were prefilling the assistant turn to force JSON, switch to structured outputs via output_config format.
Fable 5 requires 30-day data retention. If your org is configured for zero data retention, every single request returns 400 even with a perfectly valid payload — check retention settings before debugging your request body.
Token counts are roughly unchanged from Opus 4.7 and 4.8 because the tokenizer is the same. Coming from Opus 4.6, Sonnet, or Haiku, re-baseline with the count_tokens endpoint — the newer tokenizer produces roughly 30 percent more tokens on the same text.
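The body-level gotchas in that list can be caught before a request ever leaves your service. A minimal preflight sketch; RequestBody is a hypothetical, trimmed-down type covering only the fields inspected here, and the checks mirror the list above rather than any official validator.

```typescript
// Hypothetical slice of a Messages request body: only the fields we inspect.
interface RequestBody {
  model: string;
  max_tokens: number;
  messages: { role: "user" | "assistant"; content: unknown }[];
  temperature?: number;
  top_p?: number;
  top_k?: number;
  thinking?: { type: string; budget_tokens?: number; display?: string };
  [key: string]: unknown;
}

// Returns reasons the body would be rejected on claude-fable-5.
// An empty array means none of these checks fired.
function fable5Problems(body: RequestBody): string[] {
  const problems: string[] = [];
  for (const p of ["temperature", "top_p", "top_k"] as const) {
    if (body[p] !== undefined) problems.push(`${p}: sampling parameters are removed`);
  }
  if (body.thinking?.type === "disabled") {
    problems.push("thinking: cannot be disabled");
  }
  if (body.thinking?.budget_tokens !== undefined) {
    problems.push("thinking.budget_tokens: removed, use output_config.effort");
  }
  // A trailing assistant turn is a prefill.
  const last = body.messages[body.messages.length - 1];
  if (last?.role === "assistant") {
    problems.push("assistant prefill: not supported, use output_config.format");
  }
  return problems;
}
```

Note what this cannot see: the 30-day retention requirement is an org setting, so a clean result here does not rule out the zero-data-retention 400.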

Always-On Thinking and Your Streaming UX

The biggest conceptual shift: Fable 5 always thinks. There is no toggle, and the raw chain of thought is never returned to you. By default, thinking blocks stream back with empty text, which means your UI shows a long, silent pause before the first visible token — on a hard task, potentially minutes of nothing.

The fix is the display option on the thinking parameter: request summarized and the API streams a readable summary of the reasoning so users see progress. Depth is controlled separately with the effort parameter inside output_config.

// Streaming UX: request a readable summary of the reasoning,
// otherwise thinking blocks arrive with empty text and your UI
// shows a long silent pause before the first visible token.
const stream = client.messages.stream({
  model: "claude-fable-5",
  max_tokens: 64000,
  thinking: { type: "adaptive", display: "summarized" },
  output_config: { effort: "xhigh" },
  messages: [{ role: "user", content: "Refactor the billing module..." }],
})
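On the client side, the summarized reasoning and the answer text arrive as different delta types, so they can feed different parts of the UI. A minimal sketch of that routing; the StreamEvent type is a simplified, hand-written version of the streaming content_block_delta events (thinking_delta and text_delta), not the SDK's own types, and UiState is hypothetical.

```typescript
// Simplified stream events; the real SDK event types carry more fields.
type StreamEvent =
  | { type: "content_block_delta"; index: number; delta: { type: "thinking_delta"; thinking: string } }
  | { type: "content_block_delta"; index: number; delta: { type: "text_delta"; text: string } }
  | { type: "message_stop" };

interface UiState {
  progress: string[]; // reasoning-summary chunks for a "working..." panel
  answer: string;     // visible output text
}

// Summarized thinking goes to the progress panel, text goes to the answer.
// Empty thinking deltas (the default, unsummarized case) are ignored.
function applyEvent(state: UiState, event: StreamEvent): UiState {
  if (event.type !== "content_block_delta") return state;
  const d = event.delta;
  if (d.type === "thinking_delta") {
    return d.thinking ? { ...state, progress: [...state.progress, d.thinking] } : state;
  }
  return { ...state, answer: state.answer + d.text };
}
```

With the SDK you would feed each event from the stream into applyEvent as it arrives, for example inside a for await loop over the stream, and re-render from the returned state.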

Plan your timeouts around this too. A single Fable 5 request at xhigh effort on a real engineering task can legitimately run 10 to 15 minutes. Stream everything, lift your HTTP client timeouts, and design flows where the client checks in asynchronously rather than blocking a request-response cycle on one giant turn.
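One way to avoid blocking on a single turn is to start the model call in the background, hand the caller a job id immediately, and let it poll. A minimal in-memory sketch under that assumption; submitJob, checkJob, and the run callback are hypothetical names, and a real deployment would keep this state in a database or queue rather than a Map.

```typescript
type JobStatus =
  | { state: "running"; startedAt: number }
  | { state: "done"; result: string }
  | { state: "failed"; error: string };

const jobs = new Map<string, JobStatus>();
let nextId = 0;

// `run` stands in for the long Fable 5 call; wire it to your own client code.
// Returns at once with an id; the status is updated when the promise settles.
function submitJob(run: () => Promise<string>): string {
  const id = `job-${++nextId}`;
  jobs.set(id, { state: "running", startedAt: Date.now() });
  run().then(
    (result) => jobs.set(id, { state: "done", result }),
    (err: unknown) => jobs.set(id, { state: "failed", error: String(err) }),
  );
  return id;
}

// What a polling endpoint would call; undefined means an unknown id.
function checkJob(id: string): JobStatus | undefined {
  return jobs.get(id);
}
```

The HTTP handler that starts the work returns in milliseconds, and only the worker holding the long-lived streaming connection needs the lifted timeout.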

The Effort Parameter: Your Only Depth Dial

With budget_tokens gone, output_config effort is how you trade depth for cost and latency. Five levels:

low — routine work, sub-agents, latency-sensitive paths. Notably, low effort on Fable 5 often beats max effort on older models.
medium — cost-conscious production tasks that still need real reasoning.
high — the sensible default for intelligence-sensitive work.
xhigh — coding and agentic workloads; this is what Claude Code defaults to.
max — correctness-critical, latency-insensitive runs where you want maximum self-verification.
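If different call sites need different depths, it helps to make the choice explicit in one place. A minimal sketch mapping a hypothetical Workload profile onto the five levels above; the profile fields and the precedence order are my own illustration, and the only request field assumed is the output_config effort shown earlier.

```typescript
type Effort = "low" | "medium" | "high" | "xhigh" | "max";

// Hypothetical task profile.
interface Workload {
  latencySensitive: boolean;    // routine work, sub-agents, interactive paths
  costConscious: boolean;       // production task that still needs reasoning
  agenticOrCoding: boolean;     // coding and agentic loops
  correctnessCritical: boolean; // maximum self-verification wanted
}

function chooseEffort(w: Workload): Effort {
  if (w.correctnessCritical && !w.latencySensitive) return "max";
  if (w.latencySensitive) return "low";
  if (w.agenticOrCoding) return "xhigh";
  if (w.costConscious) return "medium";
  return "high"; // sensible default for intelligence-sensitive work
}

// The fragment to spread into a request body.
function effortConfig(w: Workload): { output_config: { effort: Effort } } {
  return { output_config: { effort: chooseEffort(w) } };
}
```

Keeping the mapping in one function also makes it easy to sweep levels, for example forcing low and medium on a sample of traffic to see whether the cheaper settings hold up.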

Refusals, Safety Classifiers, and Server-Side Fallbacks

Fable 5 runs safety classifiers focused on research biology and most cybersecurity content. A declined request comes back as HTTP 200 with stop_reason refusal — not an HTTP error — so code that reads content unconditionally will break on refused requests.

If you work anywhere near security tooling, as I do, benign requests can occasionally trip false positives. Anthropic's answer is a beta server-side fallbacks parameter: name claude-opus-4-8 as a fallback and the API retries the declined request on Opus in the same round trip.

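import Anthropic from "@anthropic-ai/sdk"

const client = new Anthropic() // reads ANTHROPIC_API_KEY from the environment

// Beta: retry on Opus 4.8 server-side, in one round trip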
const response = await client.beta.messages.create({
  model: "claude-fable-5",
  max_tokens: 16000,
  betas: ["server-side-fallback-2026-06-01"],
  fallbacks: [{ model: "claude-opus-4-8" }],
  messages: [{ role: "user", content: "..." }],
})

// Always branch on stop_reason before reading content
if (response.stop_reason === "refusal") {
  // pre-output refusal: empty content, not billed
  // mid-stream refusal: partial output billed — discard it
}

A pre-output refusal has an empty content array and is not billed at all. A mid-stream refusal bills the already-streamed output, and you should discard the partial response rather than treating it as complete.
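Those rules fit in one function that every caller goes through before reading content. A minimal sketch; MessageResponse is a hypothetical, trimmed-down response type, and telling the two refusal kinds apart by whether content is empty is an inference from the description above rather than a documented field.

```typescript
// Hypothetical slice of a Messages response; real responses carry more fields.
interface ContentBlock { type: string; text?: string }
interface MessageResponse {
  stop_reason: string | null;
  content: ContentBlock[];
}

type Outcome =
  | { kind: "ok"; text: string }
  | { kind: "refused"; billed: boolean };

// Check stop_reason first. Empty content on a refusal means it was declined
// before output (not billed); non-empty means mid-stream (partial output billed),
// and the partial text is deliberately not returned.
function interpret(res: MessageResponse): Outcome {
  if (res.stop_reason === "refusal") {
    return { kind: "refused", billed: res.content.length > 0 };
  }
  const text = res.content
    .filter((b) => b.type === "text")
    .map((b) => b.text ?? "")
    .join("");
  return { kind: "ok", text };
}
```

Because the refused branch never exposes text, downstream code cannot accidentally treat a truncated mid-stream answer as complete.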

Behavioral Shifts Worth Re-Tuning For

Beyond the API mechanics, Fable 5 behaves differently enough that prompts tuned for Opus deserve a second look:

It excels at parallel sub-agent delegation. Prior-model guardrails that suppress sub-agent spawning actively hurt it — tell it when delegation is desirable and let it fan out.
It benefits from a memory surface. Even a plain markdown file it can write learnings to measurably improves multi-session work. Tell it where the file lives and what format to use.
It prefers the full task spec up front. One well-specified opening turn beats a drip of clarifications — ambiguous, progressive prompting wastes tokens and reduces quality.
Over-prescriptive prompts written for older models can reduce its output quality. A/B your workloads with the step-by-step scaffolding removed; state the goal and constraints, not the procedure.
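Taken together, those four points suggest a system prompt that states the goal and constraints, grants delegation explicitly, and names the memory file. A minimal sketch; PromptOptions and the exact wording of each line are hypothetical illustrations, not an Anthropic template, and are worth A/B testing like any other prompt change.

```typescript
// Hypothetical options for assembling a Fable 5 system prompt.
interface PromptOptions {
  goal: string;
  constraints: string[];
  memoryFile?: string;     // e.g. a markdown file the model may append learnings to
  allowSubAgents: boolean;
}

// Goal and constraints up front, no step-by-step procedure.
function buildSystemPrompt(o: PromptOptions): string {
  const lines = [`Goal: ${o.goal}`, "Constraints:", ...o.constraints.map((c) => `- ${c}`)];
  if (o.allowSubAgents) {
    lines.push("Delegating independent subtasks to parallel sub-agents is encouraged.");
  }
  if (o.memoryFile) {
    lines.push(
      `Record durable learnings in ${o.memoryFile} as short markdown bullet points, and read it at the start of each session.`,
    );
  }
  return lines.join("\n");
}
```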

Migration Checklist

Swap the model string to claude-fable-5 and delete temperature, top_p, top_k, and any thinking configuration that disables thinking or sets budget_tokens.
Replace assistant prefills with structured outputs or system prompt instructions.
Confirm your org's data retention is at least 30 days — zero-data-retention orgs cannot use Fable 5 at all.
Add stop_reason refusal handling before reading content, and consider the server-side fallbacks beta with Opus 4.8 as the target.
Set thinking display to summarized if users watch the stream, sweep effort levels including low and medium for routine work, and re-baseline cost on your own traffic.

Sources and further reading

The Bottom Line

Fable 5 is not a drop-in upgrade and is not priced like one. Treat it as a new tier: keep Opus 4.8 as your default, route your genuinely hardest long-horizon work to Fable 5, and do the small amount of API homework — no sampling params, no thinking config, refusal handling, retention requirements — before anything touches production.

My rule of thumb after launch month: if a task fits in one sitting, Opus 4.8. If I would hand it to a contractor for a day and review the result tomorrow, that is a Fable 5 job — full spec up front, xhigh effort, and let it run.
