Claude Opus 4.8
Claude Opus 4.8 is Anthropic's upgrade to Opus 4.7 and its most capable general-access model at release. It improves on software engineering, agentic tool use, reasoning, computer use, and knowledge-work benchmarks while shipping at the same price ($5/$25 per million input/output tokens).

Headline results include SWE-Bench Verified (88.6%), SWE-Bench Pro (69.2%), Terminal-Bench 2.1 (74.6%), GPQA Diamond (93.6%), USAMO 2026 (96.7%), Humanity's Last Exam with tools (57.9%), OSWorld-Verified (83.4%), BrowseComp (84.3% single-agent, 88.5% multi-agent), MCP-Atlas (82.2%), and GDPval-AA (1890 Elo).

The alignment assessment reports honesty improvements: a roughly four-fold drop in how often the model lets flaws in its own code pass unremarked, a 17-fold drop relative to Sonnet 4.6 in dishonest agentic code summaries, and broadly improved adherence to Claude's constitution.

The model defaults to high effort and exposes new 'extra' (xhigh) and 'max' levels for harder problems. It launches alongside three related features: Claude Code dynamic workflows (parallel subagents that plan, execute, and verify codebase-scale migrations), effort control in claude.ai and Cowork, and a Messages API extension that accepts system entries inside the messages array so that harnesses can update instructions mid-task without breaking the prompt cache.

Fast mode runs at 2.5× speed at $10/$50 per million input/output tokens, three times cheaper than fast mode on previous models.

The model is available across Claude products, through the Claude API as `claude-opus-4-8`, and on Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry.
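To make the quoted prices concrete, here is a minimal sketch of the per-request arithmetic. The helper `estimate_cost` and the `PRICES` table are hypothetical names introduced for illustration; the rates are the ones quoted above, and the sketch assumes flat per-token billing with no caching or batch discounts, which the description does not cover.

```python
# USD per million tokens, as quoted in the description above.
# Assumption: flat rates, no caching or batch discounts applied.
PRICES = {
    "standard": {"input": 5.00, "output": 25.00},
    "fast": {"input": 10.00, "output": 50.00},
}


def estimate_cost(input_tokens: int, output_tokens: int, mode: str = "standard") -> float:
    """Estimate the USD cost of one request (hypothetical helper)."""
    rates = PRICES[mode]  # raises KeyError for an unknown mode
    total = input_tokens * rates["input"] + output_tokens * rates["output"]
    return total / 1_000_000


# Example: 200,000 input tokens and 20,000 output tokens.
print(estimate_cost(200_000, 20_000))          # 1.5
print(estimate_cost(200_000, 20_000, "fast"))  # 3.0
```

At these rates fast mode costs exactly twice as much as standard mode for the same token counts, so the choice comes down to whether the 2.5× speed is worth doubling the bill.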
Benchmark results
| Benchmark | Score | Tags |
|---|---|---|
| BrowseComp | 84.3% | self-reported, llm-stats |
| CharXiv-R | 89.9% | self-reported, llm-stats |
| CyberGym | 78.8% | self-reported, llm-stats |
| DeepSearchQA | 93.1% | self-reported, llm-stats |
| Finance Agent | 53.9% | self-reported, llm-stats |
| GDPval-AA | 1,890 Elo | self-reported, llm-stats |
| GPQA | 93.6% | self-reported, llm-stats |
| Graphwalks BFS >128k | 68.1% | self-reported, llm-stats |
| Graphwalks parents >128k | 83.3% | self-reported, llm-stats |
| HealthBench Professional | 55.8% | self-reported, llm-stats |
| Humanity's Last Exam | 57.9% | self-reported, llm-stats |
| Include | 87.6% | self-reported, llm-stats |
| MCP-Atlas | 82.2% | self-reported, llm-stats |
| OfficeQA Pro | 66.2% | self-reported, llm-stats |
| OSWorld-Verified | 83.4% | self-reported, llm-stats |
| ScreenSpot Pro | 87.9% | self-reported, llm-stats |
| SWE-Bench Multilingual | 84.4% | self-reported, llm-stats |
| SWE-Bench Multimodal | 38.4% | self-reported, llm-stats |
| SWE-Bench Pro | 69.2% | self-reported, llm-stats |
| SWE-Bench Verified | 88.6% | self-reported, llm-stats |
| Terminal-Bench 2.0 | 74.6% | self-reported, llm-stats |
| Toolathlon | 59.9% | self-reported, llm-stats |