Power Userintermediate15 min readLast updated June 11, 2026

Claude Fable 5 Shipped. Here's What Changes in Claude Code.

Justin Bartak

Founder & Chief AI Officer, Orbyt

Building AI-native platforms for $383M+ in enterprise value

A working engineer's read on Claude Fable 5, released June 9, 2026. What the first Mythos-class model changes inside Claude Code, what $10/$50 plus a new tokenizer does to your real costs, and what happened when I ran its multi-agent workflows on my production SaaS this week. Free on Pro and Max plans through June 22.

TL;DR: Anthropic shipped Claude Fable 5 on June 9, 2026. Model ID claude-fable-5, first of the Claude 5 family and a new Mythos-class tier above Opus. $10 per million input tokens, $50 output, 1M token context window. I have run it inside Claude Code on my production SaaS since launch day. The headline is that it changes how you specify work, not just how smart the answers are. Free on Pro, Max, Team, and seat-based Enterprise plans through June 22.

I have been building Orbyt with Claude Code as my primary tool for the four months and 1,647 logged hours since its first commit in February 2026. This is not a benchmarks roundup. It is a working engineer's Claude Fable 5 review: the first generally available Mythos-class model run against a production repo. I will not print a number I cannot trace.

What we know on day one

The Claude Fable 5 release date is June 9, 2026. As of June 11, 2026, here is the verifiable record:

Released June 9, 2026, per the official announcement at anthropic.com/news/claude-fable-5-mythos-5 and the platform docs.
Model ID claude-fable-5. First of the new Claude 5 family, in a Mythos-class tier above Claude Opus. Anthropic's framing from the announcement: "a Mythos-class model that we've made safe for general use" and "state-of-the-art on nearly all tested benchmarks of AI capability." Their claim. I am quoting it, not measuring it.
Pricing: $10 per million input tokens, $50 per million output. Opus 4.8 is $5 and $25 for comparison. 1M token context window, and the maximum is also the default. 128K max output tokens.
Claude Mythos 5 (claude-mythos-5) is the same model, same pricing, same API surface, available only through Project Glasswing, successor to the invitation-only Mythos Preview. Fable 5 is the GA version with additional safety measures for dual-use capabilities.
GA platforms at launch: the Claude API, Claude Platform on AWS, Bedrock, Vertex AI, and Microsoft Foundry, plus every Claude Code surface.
Free window: included on Pro, Max, Team, and seat-based Enterprise plans through June 22. From June 23 it requires usage credits. TechCrunch reports Anthropic plans to restore it to subscriptions later.
Requires 30-day data retention. Not available under zero-data-retention agreements.

Same rule as my Opus 4.7 day-one read: I do not have benchmark numbers and I will not invent them.

What Fable 5 changes inside Claude Code

Fable 5 is not the default model on any account type. System defaults stay Opus 4.8 on Max, Team Premium, Enterprise pay-as-you-go, and API accounts, and Sonnet 4.6 on Pro, Team Standard, and Enterprise seats. Select it with /model fable, which saves it as your user default, or the best alias, which resolves to Fable 5 where your org has access and otherwise to the latest Opus.

Two gates first. You need Claude Code v2.1.170 or later, and zero-data-retention orgs see it omitted or disabled.

Thinking cannot be turned off. The session toggle, alwaysThinkingEnabled, and MAX_THINKING_TOKENS=0 have no effect; Fable 5 decides per step how much to think based on effort. On the API, omit the thinking parameter; an explicit disabled returns a 400.

One correction, because the launch-week chatter keeps getting it wrong: the default effort on Fable 5 in Claude Code is high, not xhigh. xhigh was the Opus 4.7 default. The source is code.claude.com/docs/en/model-config and the API effort docs, which add that lower effort settings on Fable 5 often exceed the xhigh or even max performance of previous models (their wording, lightly compressed). The menu runs low, medium, high, xhigh, and max; max is session-only with no token-spend cap. The /effort menu also offers ultracode, a Claude Code setting rather than an API effort level, which sends xhigh and orchestrates dynamic multi-agent workflows.

Anthropic's positioning in the Claude Code docs: "the most capable model in Claude Code, suited to tasks larger than a single sitting... sustains long autonomous sessions, investigates before acting, and verifies its work more often than smaller models." Their guidance: describe the outcome, not the steps. Hand it ambiguous problems. Skip the verification reminders.

One more constraint: there is no fast mode on Fable 5.

The 35-dimension audit Fable 5 ran clean

Fable 5 ran Orbyt's full 35-dimension audit in one session and every dimension came back A on the first pass. Context so the receipts mean something: Orbyt is a production job search CRM plus a paid salary-data API, built by one person with Claude Code. The repo's generated stats as of June 10: 418K lines of code, 12,248 total tests, 5 projects, 1,647 hours, 2,465 commits on main.

I turned the effort dial all the way up and gave it one instruction. One session executed the entire battery: build, types, lint, 10,253 fast tests (the unit and property suite, not the 12,248 total), 15 locales in sync, the iOS app, the Safari extension, 152 marketing pages, 69 smoke routes, 182 hand-coded links, the security battery, the locked API envelope, the SLO table, the stress floor, the SAST sweep.

Zero fixes. The only diff was two regenerated stats JSON files.

For three months the audit has been where the work was found: a cache bug corrupting production API responses, doorway pages, a trial-farming loophole, hardcoded model IDs that did not exist. This pass surfaced nothing. My entire quoted contribution was "commit and push."

A clean audit is evidence the system held, not proof the model got smarter. The Fable 5 part is one session sustaining the whole battery without drifting, stalling, or asking for hand-holding. That is the "tasks larger than a single sitting" positioning with a receipt instead of a slogan.

The Fable 5 multi-agent run that caught its own fabricated statistic

From one instruction, Fable 5 orchestrated a nineteen-agent content pipeline whose adversarial review layer caught a statistic another agent had fabricated. This is the run that changed how I think about the model.

I asked for one high-value guide post, optimized for search and AI engines, with simulated tier-1 review panels. Fable 5 spun up four panelist agents, composites of Google, Anthropic, OpenAI, and Perplexity review teams. Each searched the live web and proposed three topics.

All four independently proposed the same topic: the job offer. Three judge agents scored every candidate; the winner took 45.7 out of 50 because the site owns the exact dataset an offer guide needs, 3,445 roles across 81 cities. An architect turned it into a 15-section plan and five drafters wrote it in parallel: 8,500 words, 12 FAQ answers.

Here is the centerpiece. Four adversarial reviewers plus a fact-checker ran live searches against every statistic. Thirteen of fifteen claim families confirmed against primary sources, some to the dollar. One died: "ninety four percent of HR professionals," attributed to a survey three targeted searches could not find. A drafting agent had invented it, attribution and all, inside FAQ markup that AI engines quote verbatim. It was cut. My build log: "the penalty for getting caught is the whole strategy."

A voice editor then fixed the seams between drafting agents and caught a publisher-logo URL that had 404ed in four pages of structured data for a month.

The log's close: "One page. Ten thousand two hundred fifty three tests still green."

I have no token-spend figures for this run and will not invent them. The agent counts come from the repo's log, written in the same session as the work.

The lesson: multi-agent output is only as trustworthy as its adversarial layer. Fable 5 makes the orchestration dependable. It does not make verification optional.

What Fable 5 actually costs

The sticker is $10 input and $50 output per million tokens. Exactly 2x Opus 4.8's $5 and $25.

Here is the part most coverage skips. Fable 5 uses a new tokenizer, and per Anthropic's migration guidance, the same content tokenizes to roughly 30 percent more tokens than on Opus-tier models. Token budgets and max_tokens values measured on other models do not transfer.

So the arithmetic: 2x the price multiplied by roughly 1.3x the tokens is roughly 2.6x the real-world input cost per million tokens of content versus Opus 4.8. Output stays at the 2x sticker, but a more capable model doing longer autonomous runs also produces more output.

Do not trust my multiplier or anyone else's. The count_tokens endpoint returns both input_tokens (the new tokenizer, what you are billed) and input_tokens_prior_tokenizer, so you can measure the delta on your own prompts before switching.

Two mitigations: the 1M context window runs at standard pricing with no premium beyond 200K, and requests the classifiers refuse before any output are not billed.

The market is split on whether the price is worth it. TechCrunch framed enterprises as "growing critical of AI costs" and wrote that the price "might serve as a deterrent for widespread use." Rakuten is quoted in launch coverage saying "the extra thinking pays for itself."

How to drive a long-horizon model

The biggest structural shift in long-horizon agentic coding: single requests on hard tasks run for many minutes. A 15-minute single request is normal at higher effort. The skill is no longer babysitting turns. It is writing a spec good enough to leave alone.

The docs say it outright: prompts written for prior models are often too prescriptive and reduce Fable 5's output quality. State the goal, the constraints, and what done looks like. Let the model own the steps. Treat this section as the working Claude Fable 5 prompting guide.

So delete the old scaffolding when you migrate prompts. "First read these files, then summarize, then propose a plan, then wait for approval" is now an anti-pattern, and so are forced progress-update reminders.

Effort guidance from the docs, not vibes: the default is high and the right starting point. Sweep low and medium for routine work, since lower effort on Fable 5 often exceeds xhigh on prior models. Reserve max, which is session-only and unbounded. Ultracode is the multi-agent orchestration setting; the guide pipeline two sections up is what it looks like in practice.

And give it a memory surface. The model performs notably better when it can write learnings somewhere, even a plain markdown file. My lived version: Orbyt's CLAUDE.md is a living document updated in the same session as the work, and a large part of why one-instruction runs like the audit land clean.

Here is the spec template I now use for handing Fable 5 a long-horizon job. It works in any agentic tool.

GOAL
The outcome you want, stated once. Never the steps.

CONTEXT
Where to look. What the system is. Why this matters.

BOUNDARIES
Files and systems not to touch.
Decisions reserved for the human.

DONE WHEN
Verifiable completion criteria. Tests green. Typecheck clean.
Named behaviors that must hold.

REPORTING
Lead with the outcome.
Every claim must trace to a tool result from this session.
If something is unverified, say so explicitly.

Fill each block in plain sentences. The BOUNDARIES and DONE WHEN blocks do the most work.

The REPORTING block does real work. Grounded, evidence-backed progress claims are a documented prompt pattern for this model, and the fabricated statistic two sections up is what happens when verification is skipped entirely.

The audit was one turn. The guide pipeline was one instruction that became a workflow of nineteen agents. The leverage moved from prompt frequency to spec quality.

Refusals, the Opus fallback, and the security gray zone

The mechanism, from the docs: Fable 5 runs safety classifiers targeting research biology and most cybersecurity content. A declined request returns HTTP 200 with stop reason refusal, not an error, and includes a stop_details object naming the policy category (cyber or bio) when one applies; the field can be null, so branch on the stop reason, not the category. Pre-output refusals are not billed.

What Claude Code does about it matters more than the refusal. A flagged request is automatically rerun on the default Opus model, Opus 4.8 on the Anthropic API and Opus 4.7 on Claude Platform on AWS, and the session continues on Opus until you run /model fable again. That silent continuation is the trap. The first request of a session can even trip the classifier on workspace context alone, before you have asked anything. claude --safe-mode diagnoses this, and /config can switch the behavior to ask first.

Here is the gray zone. A normal SaaS codebase is full of security-adjacent code: auth flows, rate limiting, webhook signature verification, an audit battery with a SAST sweep and an exfiltration guard. That is exactly the territory the cyber classifier patrols.

My records contain no classifier fallback event yet. The full 35-dimension audit, security battery and SAST sweep included, ran to completion on Fable 5. Two days of running it against a codebase full of auth, rate-limit, and webhook-signature code has not tripped it for me yet, and the docs tell me to expect that it eventually will.

Scale context: TechCrunch, reporting Anthropic's data, says at least 95 percent of sessions run entirely on Fable 5, and an external bug bounty produced no universal jailbreaks in over 1,000 hours of testing.

The habit: glance at which model answered. If a security-flavored session quietly became Opus 4.8 an hour ago, /model fable brings you back.

Claude Fable 5 vs Opus 4.8: when each one wins

Claude Fable 5 vs Opus 4.8 for coding comes down to task shape: Fable 5 wins anything bigger than one sitting, Opus 4.8 wins routine, latency-sensitive, and review-heavy work at half the sticker price.

Model	Reach for it when	Why
Fable 5	Tasks larger than one sitting: multi-file refactors, overnight autonomous runs, hard debugging, ambiguous problems, multi-agent orchestration	Sustains long runs, investigates before acting, dependable parallel sub-agents
Opus 4.8	Routine edits, latency-sensitive work, code review harnesses, security-adjacent sessions that would trip the classifiers anyway	Half the sticker, no tokenizer premium
Sonnet 4.6	High-volume drafts, summarization, well-scoped feature work	Speed and cost
Haiku 4.5	Parsing, classification, labeling at scale	Cheapest per call

CodeRabbit's independent 105-EP code-review benchmark scored Fable 5 at 65 of 105 actionable EPs, just behind Opus 4.8 at 66. Their read matches mine: better at writing code than reviewing it. If review harnesses are your workload, staying on Opus 4.8 is defensible today.

My actual split this week: Fable 5 took the audit and the guide pipeline. The routine commits in the same git log, an env-var validation fix, type tightening, a webhook handler refactor, did not need it.

Use Opus 4.8 if your work is high-volume, latency-sensitive, or review-heavy. Use Fable 5 when the task is bigger than a sitting and the spec is solid. If you came here asking for the best AI coding model in 2026, that routing table is the honest answer: there is no single name, there is a dispatch decision.

What I cannot tell you yet

No benchmark figures beyond what Anthropic or named third parties published under their own names. The "80.3% SWE-Bench Pro" and "88.0% Terminal-Bench 2.1" numbers circulating in SEO roundups are not confirmed against an Anthropic primary source, so they do not appear here. I do not have those numbers and I will not invent them.

No formal side-by-side A/B against Opus 4.8 on identical tasks. Two days of heavy production use is a strong signal and a weak verdict. The advice from the Opus 4.7 post stands: run your own workload side by side for a week, then switch.

No measured per-session dollar costs from my repo. The 2.6x figure is arithmetic, not a measurement.

What this means for your career

The 2026 hiring question has not changed since the Opus 4.7 release: not "do you use AI" but "what is the highest-leverage thing you have built with AI." Fable 5 changes what a credible answer sounds like.

The new bar, concretely: you can write a one-turn spec that survives a 15-minute autonomous run, you can orchestrate multi-agent work and audit it, and you can defend a model-routing decision in dollars, tokenizer math included. The fabricated-statistic catch above is the interview story: a drafting agent invented a survey, an adversarial layer caught it, and you can explain why that layer existed before the mistake happened.

Adversarial verification of AI output is becoming a skill interviewers ask for by name. The engineer who knows models fabricate confidently, and builds the catch into the pipeline, is worth more than the engineer who just prompts well. It is cheap to claim and expensive to demonstrate. Demonstrate it.

Is Fable 5 worth it for solo developers? The argument fits in one line of numbers: 418K lines, 12,248 tests, 5 projects, 1 human. The ceiling for one AI-fluent person rises with each release, and that rising ceiling is the career case for building this skill now. It is also why an AI-native stack compounds instead of stalling.

Recruiter screens at AI-forward companies will ask about Fable 5 within the month. Have an answer that is more than the announcement copy, and run interview prep before the screen. I wrote the narrative companion to this piece on my personal site, Every Fable Has a Moral. Mine Has Data., if you want the same week through a different lens. This one is the field manual. If you are earlier in the curve, start with AI for engineers, take the AI Skills Assessment to place yourself, and check what AI-specific skills pay in the skills premium data.

Common questions

Is Claude Fable 5 worth it for everyday coding, or should I stay on Opus 4.8?

For routine edits, review harnesses, and latency-sensitive work, Opus 4.8 at half the sticker remains the right call. Claude Fable 5 earns its cost on tasks bigger than one sitting: long refactors, autonomous runs, multi-agent orchestration. It is free on Pro, Max, Team, and seat-based Enterprise plans through June 22, 2026, so test it on your own work now.

How do I use Claude Fable 5 in Claude Code?

Update Claude Code to v2.1.170 or later, then run /model fable to save it as your user default. The best alias also works where your org has access. Fable 5 is not the default on any plan; default effort is high, and thinking is always on.

How much does Claude Fable 5 actually cost once the new tokenizer is factored in?

The sticker is $10 input and $50 output per million tokens, twice Opus 4.8. The new tokenizer produces roughly 30 percent more tokens for the same content, so effective input cost is roughly 2.6x. Measure your delta with count_tokens, which returns counts under both tokenizers.

Why does Claude Fable 5 refuse my prompts and switch to Opus 4.8?

Safety classifiers target research biology and most cybersecurity content. In Claude Code, a flagged request automatically reruns on Opus 4.8 and the session stays there until you run /model fable again. claude --safe-mode diagnoses it. Anthropic's data, reported by TechCrunch, says at least 95 percent of sessions never fall back.

Is Claude Fable 5 better than GPT-5.5 for real coding work?

I have not run a formal comparison and will not pretend otherwise. My professional bias is Claude, because it is what I ship production code with. At the frontier tier, run your own workload against both for a week and decide on your evals, not launch-day takes.

Start this week. The free window closes June 22.

Four steps.

Update Claude Code to v2.1.170 or later and run /model fable.
Leave effort at high, the documented default.
Pick one task you would normally break into pieces, write it as one well-specified turn using the template above, and let it run.
Before moving API workloads, re-baseline tokens with count_tokens against claude-fable-5 and check the prior-tokenizer delta.

Fable 5 is free on Pro, Max, Team, and seat-based Enterprise plans only through June 22, 2026. From June 23 it draws usage credits. The free window is the cheapest A/B you will ever run on a frontier model.

The bottom line. Opus 4.7 was the release where I stopped second-guessing multi-step commands. Fable 5 is the release where the unit of work stops being the prompt and becomes the spec. That is a workflow change, not a feature.

Take the AI Skills Assessment to see where you sit against the 2026 bar. Browse the AI Skills Lab for the modules at your tier. Track your AI-augmented job search inside Orbyt. The bar moved again. Go.

Free Tools

Free Interview Prep

Get 5 AI-generated questions they'll likely ask and 3 smart questions to ask them. Tailored to the company and role.

Try it free

Free Resume Score

Paste your resume and a job description. Get an instant ATS match score with 3 specific fixes.

Score my resume

Share this guideX LinkedIn

Keep reading

Power User

Start your AI-powered job search

Track applications, tailor resumes with AI, and land your next role faster. Free to start, no credit card required.

Get started free