
CLAUDE SONNET 5:
OPUS-CLASS AGENCY, WITHOUT THE GATE

A Signal + Noise model breakdown on the workhorse that nearly caught Opus 4.8 — and why the careful builder Fable 5 boxed out finally has a daily driver.

Classification: Public
Author: Isaiah Steinfeld
Published: June 30, 2026
Series: Model Signals — 004

The Arc

EVERYONE SEES CHEAPER AGENTS. IT’S ALSO THE ANTI-FABLE.

The headline read is correct and small: Sonnet 5 is the most agentic Sonnet yet, lands close to Opus 4.8, and costs a lot less — the same “cheaper agents” framing OpenAI and Google are using. True. Not the interesting part.

The interesting part is the timing. Twenty-one days after Fable 5 shipped frontier capability locked behind a broad, conservative classifier, Anthropic shipped its structural inverse: a model with near-Opus agency and deliberately low cyber capability, so it never needs Fable’s gate. Same lab, opposite philosophy, three weeks apart. If Fable’s story was “the wrapper is the product,” Sonnet 5’s is “tune the model out of the danger zone and you don’t need a wrapper.” Two answers to the same governance question, and this is the one most builders will actually live in.

Bottom Line

The Verdict

Sonnet 5 is the new default workhorse, and for most agentic work it’s the correct default. Near-Opus performance on reasoning, tool use, coding, and computer use — at roughly 40% of Opus 4.8’s price — with measurably better safety behavior than the Sonnet it replaces.

The trade is explicit and intentional. Sonnet 5 is deliberately weak at cyber and still trails Opus 4.8 on the hardest accuracy-critical tasks. If you need peak reasoning on frontier-hard problems, or sanctioned cyber depth, reach for Opus. For the everyday 80% — multi-step agentic execution that used to demand a bigger model — this is now the one to reach for first.

The Signal

THE FABLE REFUGEE JUST GOT A DAILY DRIVER

The Non-Obvious Read

The careful builder Fable 5 kept demoting — the one whose defensive, hardening, auth-adjacent work tripped Fable’s broad cyber classifier and parked the whole session on Opus — is exactly who Sonnet 5 serves. It ships with only the standard Opus 4.7/4.8 cyber safeguards, a far narrower net than Fable’s, because its cyber capability is intentionally low. It won’t aggressively fall back on you, because it was never given the capability that trips Fable’s gate in the first place.

That’s the pattern worth naming across the two launches: Anthropic is now shipping capability and safety as separate product axes. Fable gated capability up and wrapped it. Sonnet tuned capability down and didn’t have to. For anyone doing responsible systems work, the second design is simply more usable — it answers the prompt instead of demoting itself mid-task.

You can watch the Sonnet line itself get de-cybered. Sonnet 4.5’s own launch pitch led with cybersecurity — agents that “autonomously patch vulnerabilities before exploitation.” Nine months later, Sonnet 5 is deliberately weak at cyber, ships with block-on cyber safeguards, and its AWS launch post drops cyber from the industry list entirely, pitching finance and automation instead. Same family, same publisher — cyber got re-tiered out of the mid-market.

The mechanism to understand is the effort dial: several tiers up to x-high (Anthropic’s post named only “xhigh”; OpenRouter lists five: low, medium, high, max, x-high). The cost-performance curve spans a wider range than Sonnet 4.6’s and overlaps Opus 4.8’s at the top, so you tune cost versus capability per request instead of swapping models. The catch: at the top of the dial, cost can exceed Opus 4.8’s at similar quality, a ceiling past which you should just use the bigger model.

Model Profile

THE SPECS THAT MATTER

Model Class

Sonnet (mid-tier)

Context Window

Max Output

128K

Intro Price (thru Aug 31)

$2 / $10 per M

Standard Price (Sep 1+)

$3 / $15 per M

Effort Levels

Multiple, up to x-high

Cyber Safeguards

Opus 4.7/4.8-class (default on)

API

claude-sonnet-5

Default model for Free and Pro; available to Max, Team, and Enterprise; live in Claude Code and the Claude Platform, and day-one across Bedrock, Vertex, and Microsoft Foundry, plus Cursor, VS Code, GitHub Copilot, and OpenRouter. Adaptive thinking is on by default; the temperature, top_p, and top_k sampling knobs are no longer supported.
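For harnesses that still pass sampling parameters, the switch can be sketched as a small request-rewriting step. This is a minimal sketch, assuming your harness holds each request as a plain dictionary before handing it to a client. The helper name migrate_request and the dictionary layout are hypothetical; only the two model strings and the three parameter names come from this breakdown.

```python
# Hypothetical helper, not part of any SDK: it rewrites a request dict
# before your harness passes it to whatever client it uses.
UNSUPPORTED_SAMPLING_KEYS = ("temperature", "top_p", "top_k")
OLD_MODEL = "claude-sonnet-4-6"
NEW_MODEL = "claude-sonnet-5"


def migrate_request(request: dict) -> tuple[dict, list[str]]:
    """Return a copy of `request` aimed at Sonnet 5, plus the keys dropped."""
    migrated = dict(request)  # shallow copy; the caller's dict is untouched
    dropped = [key for key in UNSUPPORTED_SAMPLING_KEYS if key in migrated]
    for key in dropped:
        del migrated[key]
    if migrated.get("model") == OLD_MODEL:
        migrated["model"] = NEW_MODEL
    return migrated, dropped


old = {"model": "claude-sonnet-4-6", "max_tokens": 1024, "temperature": 0.2}
new, dropped = migrate_request(old)
print(new)      # the rewritten request
print(dropped)  # log this so you know which configs relied on sampling knobs
```

Returning the dropped keys rather than silently deleting them is a deliberate choice: it tells you which existing configs were leaning on a knob that no longer exists.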

One thing the model card hides: Sonnet 5 isn’t one product experience across providers. On early OpenRouter telemetry, Vertex is fastest (~1s latency), Anthropic and the Claude Platform on AWS sit mid-pack, Azure is slowest, and Bedrock-US runs both pricier and slower. Treat the figures as a launch-week snapshot, but the takeaway holds: provider choice is part of the model decision.

By the numbers (Anthropic’s own figures)

Benchmark	Sonnet 4.6	Sonnet 5	Opus 4.8
SWE-bench Pro (agentic coding)	58.1%	63.2%	69.2%
Terminal-Bench 2.1	67.0%	80.4%	—
OSWorld-Verified (computer use)	78.5%	81.2%	—
Humanity’s Last Exam (w/ tools)	46.8%	57.4%	57.9%
GDPval-AA v2 (knowledge work)	—	1,618	1,615

The shape: Sonnet 5 beats its predecessor in every category where both were tested, closes roughly half the gap to Opus 4.8 on SWE-bench Pro, nearly matches it on Humanity’s Last Exam with tools, and edges it on the GDPval knowledge-work benchmark. Opus still leads on the hardest agentic coding.
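To make "how much of the gap" concrete, the arithmetic can be run on the two rows where all three models have a score. This is a sketch using only the table's figures; the function name gap_closed is ours, and the measure itself (share of the Sonnet 4.6 to Opus 4.8 distance that Sonnet 5 covers) is one reasonable way to read the table, not a metric Anthropic publishes.

```python
# (Sonnet 4.6, Sonnet 5, Opus 4.8) scores copied from the table above.
SCORES = {
    "SWE-bench Pro": (58.1, 63.2, 69.2),
    "Humanity's Last Exam (w/ tools)": (46.8, 57.4, 57.9),
}


def gap_closed(sonnet_46: float, sonnet_5: float, opus_48: float) -> float:
    """Share of the Sonnet 4.6 -> Opus 4.8 gap that Sonnet 5 covers."""
    return (sonnet_5 - sonnet_46) / (opus_48 - sonnet_46)


for name, (old, new, opus) in SCORES.items():
    print(f"{name}: {gap_closed(old, new, opus):.0%} of the gap closed")
# SWE-bench Pro: 46% of the gap closed
# Humanity's Last Exam (w/ tools): 95% of the gap closed
```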

Independent corroboration, not Anthropic grading its own homework: Artificial Analysis (via OpenRouter) ranks Sonnet 5 above roughly 93–96% of all models on its intelligence, coding, and agentic indices.

Editorial Note — The Tokenizer Catch

The sticker price isn’t the per-task price. Sonnet 5 uses the Opus 4.7 tokenizer, so the same text can map to ~1.0–1.35× more tokens. Anthropic set intro pricing so the switch is roughly cost-neutral through Aug 31 — but after that, the standard $3/$15 rate and the token multiplier both apply. Some secondary coverage claims there’s no tokenizer change or that it only affects Fable; Anthropic’s own launch footnote is explicit that it applies here. Model your real per-task cost, not the headline rate.
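The per-task math is simple enough to sketch. This assumes the multiplier applies uniformly to input and output tokens, which is a simplification: the quoted range is ~1.0 to 1.35× and your own text may land anywhere in it. The function name effective_price is ours.

```python
# Prices are USD per million tokens (input, output), from the spec list above.
INTRO = (2.00, 10.00)
STANDARD = (3.00, 15.00)


def effective_price(list_price: float, token_multiplier: float) -> float:
    """Cost of the text that previously fit in one million tokens."""
    return list_price * token_multiplier


for label, (inp, out) in (("intro", INTRO), ("standard", STANDARD)):
    for mult in (1.0, 1.35):  # low and high ends of the quoted range
        print(f"{label:8s} x{mult:.2f}: "
              f"${effective_price(inp, mult):.2f} in / "
              f"${effective_price(out, mult):.2f} out")
# intro    x1.00: $2.00 in / $10.00 out
# intro    x1.35: $2.70 in / $13.50 out
# standard x1.00: $3.00 in / $15.00 out
# standard x1.35: $4.05 in / $20.25 out
```

Measure the multiplier on a sample of your own prompts rather than assuming the worst case; the spread between the x1.00 and x1.35 rows is the whole story after Aug 31.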

Editorial Note — Cross-Vendor Comparisons

Anthropic published no official comparison to GPT-5.6 or Gemini 3.5 Pro (GPT-5.6 hadn’t shipped). The system card’s cross-vendor figures are against GPT-5.5 and Gemini 3.5 Flash: Sonnet 5 leads SWE-bench Pro (63.2 vs 58.6 / 55.1) but GPT-5.5 edges Terminal-Bench 2.1 (83.4 vs 80.4). Treat any Sonnet-5-vs-GPT-5.6 table you see elsewhere as unofficial extrapolation.

Assessment

WHAT’S GOOD AND WHAT ISN’T

What’s Good

Near-Opus agency at ~40% of the price. The best cost-performance in the Claude line for agentic work — and it edges Opus on the GDPval knowledge-work benchmark.

It finishes, and it self-checks. Partners report it completes multi-step tasks that stalled Sonnet 4.6 and verifies its own output unprompted — Zapier described a two-part Salesforce-plus-announcement job it ran end to end where the prior model stalled halfway.

Safer than its predecessor. Lower hallucination, sycophancy, and undesirable-behavior rates; better at refusing malicious requests and resisting prompt-injection hijacks. Lovable’s read: it refuses unsafe requests cleanly and consistently.

Drop-in, with a dial. Same tools and platform features as Sonnet 4.6 — swap the model string — plus per-request effort control to tune cost against quality.

It won’t box out careful builders. The narrow, Opus-class cyber net means defensive and hardening work gets answered, not demoted.

What’s Not Good

Still trails Opus 4.8 on the hardest tasks. Agentic coding is 63.2% vs 69.2%. The launch-day refrain on Hacker News is fair: if the task is genuinely hard, use the bigger model.

xhigh effort can cost more than Opus 4.8 at similar quality. Past a point on the dial, Sonnet stops being the cheaper option — escalate instead of paying more for less.

Migration & cost gotchas. Cost-neutral now, but the token multiplier plus standard $3/$15 pricing shifts the math after Aug 31 — budget for Sept 1. And the temperature/top_p/top_k sampling knobs are gone, which breaks some existing harness configs on the way over.

Structured output is fixed, except on one provider. Early OpenRouter telemetry shows structured-output error rates down to ~5% on Anthropic and ~4% on Azure (Fable ran 16–19% everywhere), but Claude Platform on AWS still posts ~17%. Production-grade JSON on most endpoints; validate and route with care on that one.

Safer than 4.6, not the safest Claude. It shows higher misaligned-behavior rates than Opus 4.8 and Mythos Preview on Anthropic’s audit — and it’s deliberately weak at cyber, so it isn’t your tool for sanctioned defensive-cyber depth (that’s Opus).
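On the structured-output caveat above, "validate and route with care" can be sketched as a validate-then-fall-through loop. This is a minimal sketch under stated assumptions: each provider is represented by a hypothetical callable that takes a prompt and returns raw text, and the stubs below stand in for real endpoints. Nothing here calls a real API.

```python
import json
from typing import Callable, Iterable, Optional


def first_valid_json(
    prompt: str,
    providers: Iterable[tuple[str, Callable[[str], str]]],
    required_keys: frozenset[str],
) -> Optional[tuple[str, dict]]:
    """Try providers in order; return (name, parsed) for the first reply that
    parses as a JSON object and contains every required key, else None."""
    for name, call in providers:
        raw = call(prompt)
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed output: fall through to the next provider
        if isinstance(parsed, dict) and required_keys.issubset(parsed):
            return name, parsed
    return None


# Stub providers for illustration only.
def flaky(prompt: str) -> str:
    return '{"status": "ok"'  # truncated JSON, the failure mode to guard against


def solid(prompt: str) -> str:
    return '{"status": "ok", "items": []}'


result = first_valid_json(
    "list open tickets as JSON",
    [("provider_a", flaky), ("provider_b", solid)],
    frozenset({"status", "items"}),
)
print(result)
```

Ordering the provider list by observed error rate keeps the common path cheap; the fallback only costs you on the bad responses.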

Where It Fits

USE-CASE FITNESS

Use Case	Fitness	Notes
Multi-step agentic execution	Strong	Browsers, terminals, long tool chains — the core pitch
Everyday coding / 4.6 upgrade	Strong	Drop-in; higher scores across the board
Cost-sensitive agents at scale	Strong	~40% of Opus pricing; caching and batch on top
Knowledge work / analysis	Strong	Edges Opus 4.8 on GDPval-AA v2
Defensive engineering / hardening	Daily Driver	The Fable refugee’s home base — won’t fall back
Frontier-hard reasoning / coding	Use Opus	Still trails Opus 4.8 on the hardest slice
Sanctioned cyber depth	Weak	Intentionally low cyber; route to Opus / Cyber Verification
xhigh-effort long tasks, cost-bound	Mixed	Can exceed Opus cost at the top of the dial

Operator Implications

WHAT THIS MEANS FOR YOU

For AI Product Teams & Routers

Sonnet 5 becomes the default mid-tier, and the effort dial absorbs some of your model-swapping: dial up within Sonnet before jumping to Opus. Set a hard rule at the top of the dial — past high effort, escalate to Opus 4.8. And route the defensive, hardening, and abuse-prevention traffic you deliberately kept away from Fable here: it carries only the narrow Opus-class cyber net, so it answers instead of falling back.
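The escalation rule above can be sketched as a small ladder. This is a sketch, assuming your router retries a task after a failed check and assuming the tier names low, medium, and high from the OpenRouter listing. The Opus identifier is a placeholder, since this breakdown does not give the Opus 4.8 model string, and how effort is actually passed to the API is left out on purpose.

```python
from typing import Optional

SONNET = "claude-sonnet-5"
OPUS = "OPUS_4_8_MODEL_ID"  # placeholder: substitute the real model string
EFFORT_LADDER = ("low", "medium", "high")  # capped at high, per the rule above


def next_attempt(model: str, effort: Optional[str]) -> Optional[tuple[str, Optional[str]]]:
    """Given the attempt that just failed, return the next (model, effort),
    or None when there is nowhere left to escalate."""
    if model != SONNET:
        return None  # already on Opus: stop and surface the failure
    step = EFFORT_LADDER.index(effort)
    if step + 1 < len(EFFORT_LADDER):
        return SONNET, EFFORT_LADDER[step + 1]  # dial up within Sonnet first
    return OPUS, None  # past high: switch models, leave Opus effort at default


attempt = (SONNET, "low")
while attempt is not None:
    print(attempt)
    attempt = next_attempt(*attempt)
```

The ladder deliberately never reaches max or x-high on Sonnet: that is the region where the cost can pass Opus 4.8 at similar quality.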

For Teams on Sonnet 4.6 (Cost-Sensitive)

It’s a near-cost-neutral drop-in through Aug 31: swap claude-sonnet-4-6 for claude-sonnet-5 and re-run your eval suite now, while the intro price holds. The offset in your favor: early OpenRouter data shows prompt caching pulls the effective input price from the $2 list rate toward a weighted ~$0.55 (about $0.40 on Anthropic’s own endpoint at a ~89% cache-hit rate), while output stays pinned near $10 and cache reads run $0.20/M. Reused context is cheap; output is the tax. Then budget for Sept 1, when standard $3/$15 plus the tokenizer multiplier lands.
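The ~$0.40 figure falls out of a simple blend of cached and uncached input. A minimal sketch, assuming every cache hit is billed at the cache-read rate and every miss at the list rate; it ignores any cache-write surcharge, so treat it as a floor rather than a bill. The function name is ours.

```python
def blended_input_price(list_price: float, cache_read_price: float,
                        hit_rate: float) -> float:
    """Effective USD per million input tokens at a given cache-hit rate."""
    return hit_rate * cache_read_price + (1.0 - hit_rate) * list_price


# Intro list rate $2.00/M, cache reads $0.20/M, ~89% hit rate (figures above).
price = blended_input_price(2.00, 0.20, 0.89)
print(f"${price:.2f} per M input tokens")  # $0.40 per M input tokens
```

Swap in your own hit rate before trusting the number: the blend is linear in it, so the saving scales directly with how much of each prompt is reused context.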

For the Fable Refugee (see Entry 003)

This is your daily driver. You give up peak cyber-adjacent reasoning, but you gain a model that treats a defensive prompt as a defensive prompt instead of demoting itself to Opus mid-session. Keep Opus 4.8 in the stack for the genuinely hard reasoning and for any sanctioned cyber work that needs reduced guardrails — but make Sonnet 5 the default you actually work in.

Launch-Day Pulse

IMPRESSED, PRICE-CONSCIOUS, A LITTLE RESTLESS

Same-day reaction is genuinely positive but not uncritical. The consensus is that this is a strong upgrade to the model most people already run daily — near-Opus capability, cleaner refusals, tasks that finish where 4.6 stalled — and that the value is obvious at the intro price. The open question everyone is circling is whether it still clears the bar at full $3/$15. Early traffic skews embedded and agentic, too — Descript leads on OpenRouter, ahead of Claude Code and agent runtimes — so the first wave is production tooling, not chat.

The skepticism is specific and worth respecting: value reads clearest at low and medium effort and thins out at high versus Opus 4.8; xhigh can cost more than Opus outright; and the honest builder’s rule — if it’s hard, use a bigger model — still applies. A vocal thread wants Fable back rather than a Sonnet update (Fable and Mythos access remains suspended under an export-control directive), while others note Sonnet is the bread-and-butter model and are glad to see it move.

The macro frame matters too: Sonnet 5 lands as the industry pulls back from runaway token spend and as Anthropic gears toward an expected IPO — the “cheaper agents” message is aimed at that moment, and mirrors OpenAI and Google.

The pulse in two lines: the capability jump over 4.6 is real and broad, and the safety gains are a genuine selling point. But the launch is judged as much on pricing durability as on capability: the intro price is a no-brainer and the standard price is the debate.

The Close

Signal / Noise / Action

Signal

The governance question has two answers now. Fable gated capability up and wrapped it; Sonnet 5 tuned capability down and skipped the wrapper. For most agentic work, “down to safe, cheap, and predictable” is the one you want.

Noise

“Is it better than Opus 4.8 or GPT-5.5?” It’s near-Opus and benchmark-dependent against the field. Not the point. It’s the new default workhorse, priced to be one.

Action

Swap claude-sonnet-4-6 → claude-sonnet-5 and re-run your evals now, while intro pricing makes it cost-neutral. Cap effort at high; escalate to Opus past that. If Fable kept demoting you, make this your default.