
Anthropic Releases Claude Opus 4.7 — Narrowly Retaking the Frontier Lead

Opus 4.7 leads on SWE-bench and agentic coding, adds an xhigh reasoning tier and /ultrareview command, and ships safeguards against cyber misuse — same price as 4.6.

S5 Labs Team April 16, 2026

Anthropic launched Claude Opus 4.7 on April 16 as its most capable generally available model. It is a targeted upgrade over Opus 4.6 — not a new generation — but the gains where they matter most (software engineering, long-horizon agents, vision) are enough to nudge Anthropic back ahead of GPT-5.4 and Gemini 3.1 Pro on several headline benchmarks.

Pricing did not move: $5 per million input tokens and $25 per million output tokens, identical to Opus 4.6. The model is available across Claude products, the Anthropic API, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry from day one.

The Benchmark Picture

The coding story is the clearest. Anthropic concedes that its unreleased internal model “Mythos” remains more capable overall, but Opus 4.7 is the strongest generally available model on several key engineering benchmarks.

| Benchmark | Opus 4.7 | Opus 4.6 | GPT-5.4 | Gemini 3.1 Pro |
| --- | --- | --- | --- | --- |
| SWE-bench Verified | 87.6% | 80.8% | 80.6% | — |
| SWE-bench Pro | 64.3% | 53.4% | 57.7% | 54.2% |
| Terminal-Bench 2.0 | 69.4% | 65.4% | 75.1% | ~67% |
| ARC-AGI-2 | 77.1% | 76.1% | ~74% | — |
| MCP-Atlas (tool orchestration) | 77.3% | — | — | — |

Anthropic reports a 14% improvement over Opus 4.6 on complex multi-step workflows, achieved while using fewer tokens and producing roughly a third as many tool-call errors. Early-access partner Warp confirmed Opus 4.7 passed Terminal-Bench tasks that Opus 4.6 had failed — including a concurrency bug that tripped the prior model consistently.

GPT-5.4 still leads on raw terminal coding, and Opus 4.7's ARC-AGI-2 margin over the rest of the frontier is narrow. The honest read is that Opus 4.7 is the best general-purpose engineering model available today, but the leaderboard is close enough that the "winner" depends on the workload.

What’s Actually New

Four changes matter for developers building on the model.

xhigh reasoning tier. Opus 4.7 introduces an "extra high" effort level between high and max, giving finer control over the reasoning-vs-latency tradeoff on hard problems. For most agentic workflows, xhigh delivers most of max's quality with meaningfully lower latency.

Higher image resolution. Maximum image input jumped from ~1.15 megapixels (1,568 px on the long edge) to ~3.75 megapixels (2,576 px). For anyone doing document understanding, UI screenshot analysis, or chart reasoning, this is the largest practical improvement in the release — Opus 4.6 often required tiling strategies that Opus 4.7 handles natively.
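The higher ceiling simplifies preprocessing. A minimal sketch of the fit check, assuming the 2,576 px long-edge cap reported above (the exact enforcement rules are Anthropic's; only the long-edge figures come from this release note):

```python
def downscale_factor(width: int, height: int, max_long_edge: int = 2576) -> float:
    """Scale factor needed to fit an image under the long-edge cap.

    Returns 1.0 if the image already fits; values below 1.0 shrink it.
    2576 px is the cap reported for Opus 4.7 (vs 1568 px for Opus 4.6).
    """
    long_edge = max(width, height)
    return min(1.0, max_long_edge / long_edge)

# A 4K screenshot that needed tiling under the old 1568 px cap now
# fits with a mild downscale instead:
print(downscale_factor(3840, 2160))  # 2576/3840 ≈ 0.671
print(downscale_factor(2000, 1200))  # 1.0 — already under the cap
```

The practical consequence: many tiling pipelines built for Opus 4.6 can collapse to a single resize-and-send step.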

/ultrareview inside Claude Code. A new command in the Claude Code environment that runs a deeper static and semantic review of a change — more rigorous than the default review pass. Pair this with the release's lower tool-call error rate, and long-running code review becomes meaningfully more reliable.

Task budgets (beta). A new mechanism that lets developers cap how much reasoning Claude spends on a long-running task. Useful for bounding cost on open-ended agent runs where you want the model to stop thinking and ship.

Cyber Safeguards

The release ships with new safeguards that automatically detect and block requests indicating prohibited or high-risk cybersecurity uses. This follows directly from Anthropic’s work on Claude Mythos, its cybersecurity-specialized model — Mythos exposed what a dedicated cyber-capable model looks like, and Opus 4.7 is hardened against that capability being abused through the public model.

OpenAI landed on a similar calculus in its own April release: GPT-5.4-Cyber is gated behind a vetted access program rather than exposed in the base model. Both labs are now signaling that capability and access control are inseparable at the frontier.

Same Price, Better Model

Holding the price at $5/$25 per million input/output tokens is the quietest but most consequential part of the release. For teams already running Opus 4.6 in production, the switch is a drop-in: same cost envelope, more capability. For teams comparing against GPT-5.4 on cost, Opus 4.7 is now competitive on enough coding workloads to merit a real A/B.
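For the A/B math, the back-of-envelope is simple. A sketch at the $5/$25 per-million rates quoted above (the token volumes in the example are made up):

```python
def monthly_cost_usd(input_tokens: int, output_tokens: int,
                     in_price: float = 5.0, out_price: float = 25.0) -> float:
    """Cost in USD at per-million-token prices.

    Defaults match the quoted Opus 4.7 / 4.6 rates: $5 in, $25 out.
    """
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

# Hypothetical workload: 200M input + 40M output tokens per month
print(monthly_cost_usd(200_000_000, 40_000_000))  # 1000 + 1000 = 2000.0
```

Because the rates are unchanged from Opus 4.6, any existing spend forecast carries over unmodified; only quality metrics need re-measuring.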

The bigger picture: with MCP now under Linux Foundation governance and 97 million monthly SDK installs, Anthropic’s bet that the ecosystem would converge on its protocol has paid off. Opus 4.7’s best-in-class MCP-Atlas score is not coincidental — the model is tuned against the orchestration interface that most agentic tooling now speaks.

What to Watch

The interesting open question is Mythos. Anthropic has said publicly that it outperforms Opus 4.7 and is being used under controlled access for cybersecurity research. If a broader release follows, the “generally available frontier” resets again. Until then, Opus 4.7 is the most capable model most teams can actually buy — and at Opus 4.6 prices, the upgrade path is straightforward.

Sources: Anthropic — Introducing Claude Opus 4.7 · VentureBeat · AWS Bedrock release notes · 9to5Mac
