Claude Opus 4.8: sharper judgment, more honesty, longer autonomy

May 28, 2026 5 min read Singrey

Anthropic shipped Claude Opus 4.8 today: four times less likely to miss flaws in its own code, more honest, and able to work autonomously for longer.

Source verified

Anthropic released Claude Opus 4.8 today, May 28, 2026 — less than two months after the previous version. The one-line summary: it isn't faster, it's more honest. It tells you what it did and where it got stuck instead of papering over it.

What was announced

Anthropic framed Opus 4.8 around "sharper judgment, more honesty about its progress, and the ability to work independently for longer than its predecessors." The most concrete claim is on the coding side: the model is four times less likely than Opus 4.7 to let flaws slip through in the code it produces. Early testers also report it flags uncertainty more often and avoids unsupported claims.

Pricing is unchanged: $5 per million input tokens and $25 per million output tokens at standard rates. It's available via the `claude-opus-4-8` API, plus Amazon Bedrock and Vertex AI, with a 1M-token context window by default.

What changed

Three things stand out against the previous version.

First, effort control: on claude.ai and Cowork you now choose how much "thinking" Claude spends on a response, from Low to Max. Opus 4.8 defaults to high effort — you tune the balance between speed and quality yourself.

Second, Dynamic Workflows (in research preview): Claude writes orchestration scripts that spin up tens to hundreds of parallel subagents in one session, attacks the problem from independent angles, even deploys adversarial agents to refute its own findings, and iterates until answers converge. For now it's limited to the Enterprise/Team/Max tiers of Claude Code.

Third, fast mode: it runs at 2.5× speed. That ties directly into the point I made when I wrote about how token-based pricing is shifting — speed is now its own line item.

On benchmarks the picture is measured progress: it edges past Opus 4.7 on agentic coding, lands in the 82% range on OSWorld-Verified for computer use, and 84% on web navigation (Online-Mind2Web). Steady step, not a leap.

First impression

Let me be honest: the model shipped today and I haven't had time to benchmark it deeply on my own projects yet. So this is a first impression, not a finished test.

What caught my attention wasn't the benchmark numbers but the framing Anthropic chose: "more honest," "open about its progress," "misses fewer code flaws." For someone working solo, a model being 2% faster matters less than it telling you when it isn't sure. I wrote earlier about the dangerous case where AI silently produces wrong code — code that runs but is wrong, and nobody notices. Opus 4.8's whole pitch is reducing exactly that class of silent error. Whether it actually does, I'll see in my own code over the coming weeks.

Practical impact

For indie makers and solo developers, the most concrete win is effort control. Drop to Low for simple tasks without burning rate limits or budget; switch to Max for complex refactors and use the full brainpower. That's a practical lever.

Dynamic Workflows is more distant for most of us right now — enterprise tier, research preview. Hundreds of parallel agents make sense for large research or audit work, not a one-person project.

Stable pricing is good news. When I wrote about what I noticed working with Opus 4.7, the cost/value balance was my main theme; if 4.8 gives better judgment at the same price, there's little reason not to upgrade.

Limits and concerns

The benchmark gains are modest — the "four times fewer code flaws" headline is strong, but it's Anthropic's own measurement; independent tests will show how it holds up in real use. Dynamic Workflows being behind closed tiers means most individual users won't try it for a while.

There's also Mythos: Anthropic repeated that it plans to bring Mythos-class models, currently in limited cybersecurity deployment, to all customers "in the coming weeks." I covered this when I wrote about Mythos and its restricted-access model; the 4.8 announcement suggests that timeline is still live.

Bottom line

Opus 4.8 isn't a "changes everything" release; it's more of a maturation step. Same price, better judgment, fewer silent errors, and effort you can dial in. My plan: run it on my own codebase at high effort for a few days and see for myself whether it really misses fewer flaws. No need to rush the upgrade — but I see no reason to wait either.

A confession: the model I used to write this post is itself now 4.8. So in a way I'm live-testing the "more honest" claim — and for now, I like that it tells me where it isn't sure. The real verdict comes in a few days, from my own code.