Claude Opus 4.8

Claude Opus 4.8 is Anthropic's upgrade to Opus 4.7 and its most capable general-access model at release. It improves on software engineering, agentic tool use, reasoning, computer use, and knowledge-work benchmarks while shipping at the same price as its predecessor ($5/$25 per million input/output tokens). Reported scores include SWE-Bench Verified (88.6%), SWE-Bench Pro (69.2%), Terminal-Bench 2.1 (74.6%), GPQA Diamond (93.6%), USAMO 2026 (96.7%), Humanity's Last Exam with tools (57.9%), OSWorld-Verified (83.4%), BrowseComp (84.3% single-agent, 88.5% multi-agent), MCP-Atlas (82.2%), and GDPval-AA (1,890 Elo).

The alignment assessment reports honesty improvements: a roughly four-fold drop in the rate at which the model lets flaws in its own code pass unremarked, a 17-fold drop relative to Sonnet 4.6 in dishonest agentic code summaries, and broadly improved adherence to Claude's constitution.

The model defaults to high effort and exposes new 'extra' (xhigh) and 'max' levels for harder problems. It launches alongside Claude Code dynamic workflows (parallel subagents that plan, execute, and verify codebase-scale migrations), effort control in claude.ai and Cowork, and a Messages API extension that accepts system entries inside the messages array, so harnesses can update instructions mid-task without breaking the prompt cache. Fast mode runs at 2.5× speed at $10/$50 per million input/output tokens, one third the price of fast mode on previous models.

The model is available across Claude products, through the Claude API as `claude-opus-4-8`, and on Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry.
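To make the pricing above concrete, here is a minimal sketch that estimates the cost of a single request from its token counts. Only the per-million-token rates come from the text above; the names `Pricing`, `STANDARD`, `FAST`, and `estimate_cost` are hypothetical helpers written for this illustration, and the sketch assumes flat per-token billing with no caching or batch discounts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """USD per one million tokens (hypothetical helper, not an official type)."""
    input_per_mtok: float
    output_per_mtok: float


# Rates quoted in the description above.
STANDARD = Pricing(input_per_mtok=5.0, output_per_mtok=25.0)
FAST = Pricing(input_per_mtok=10.0, output_per_mtok=50.0)


def estimate_cost(input_tokens: int, output_tokens: int, pricing: Pricing) -> float:
    """Return the estimated USD cost of one request.

    Assumes flat per-token billing; any caching or batch discounts are ignored.
    """
    total = (input_tokens * pricing.input_per_mtok
             + output_tokens * pricing.output_per_mtok)
    return total / 1_000_000


if __name__ == "__main__":
    # Example: a request with 200k input tokens and 20k output tokens.
    for name, pricing in (("standard", STANDARD), ("fast", FAST)):
        cost = estimate_cost(200_000, 20_000, pricing)
        print(f"{name}: ${cost:.2f}")
```

Under these assumptions, fast mode costs exactly twice the standard rate for the same token counts, so the trade-off is a doubling of price for the stated 2.5× speed.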

Benchmark results

| Benchmark | Score | Tags | Source |
| --- | --- | --- | --- |
| BrowseComp | 84.3% | self-reported | llm-stats |
| CharXiv-R | 89.9% | self-reported | llm-stats |
| CyberGym | 78.8% | self-reported | llm-stats |
| DeepSearchQA | 93.1% | self-reported | llm-stats |
| Finance Agent | 53.9% | self-reported | llm-stats |
| GDPval-AA | 1,890 | self-reported | llm-stats |
| GPQA | 93.6% | self-reported | llm-stats |
| Graphwalks BFS >128k | 68.1% | self-reported | llm-stats |
| Graphwalks parents >128k | 83.3% | self-reported | llm-stats |
| HealthBench Professional | 55.8% | self-reported | llm-stats |
| Humanity's Last Exam | 57.9% | self-reported | llm-stats |
| Include | 87.6% | self-reported | llm-stats |
| MCP Atlas | 82.2% | self-reported | llm-stats |
| OfficeQA Pro | 66.2% | self-reported | llm-stats |
| OSWorld-Verified | 83.4% | self-reported | llm-stats |
| ScreenSpot Pro | 87.9% | self-reported | llm-stats |
| SWE-Bench Multilingual | 84.4% | self-reported | llm-stats |
| SWE-Bench Multimodal | 38.4% | self-reported | llm-stats |
| SWE-Bench Pro | 69.2% | self-reported | llm-stats |
| SWE-Bench Verified | 88.6% | self-reported | llm-stats |
| Terminal-Bench 2.0 | 74.6% | self-reported | llm-stats |
| Toolathlon | 59.9% | self-reported | llm-stats |