Claude Opus 4.6
Claude Opus 4.6 is Anthropic's most intelligent model. It improves on its predecessor's coding skills: it plans more carefully, sustains agentic tasks for longer, operates more reliably in larger codebases, and is better at code review and debugging. It is the first Opus-class model with a 1M-token context window (beta), 128K output tokens, and adaptive thinking. It also offers effort controls (low/medium/high/max) and context compaction for long-running tasks. It is state-of-the-art on Terminal-Bench 2.0, Humanity's Last Exam, GDPval-AA, and BrowseComp. Pricing: $5 per million input tokens and $25 per million output tokens.
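To make the listed pricing concrete, here is a minimal sketch of the arithmetic in Python. It assumes the $5/$25 per-million-token rates apply as flat rates to every request; any other pricing tiers or discounts are not covered here. The function name `estimate_cost_usd` and the example token counts are hypothetical, chosen only for illustration.

```python
# Rates taken from the description above (USD per million tokens).
INPUT_USD_PER_MTOK = 5.0
OUTPUT_USD_PER_MTOK = 25.0


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request, assuming flat per-token rates.

    Hypothetical helper: it ignores any pricing tiers or discounts
    that are not stated in the description.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total_micro_usd = (
        input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    )
    return total_micro_usd / 1_000_000


if __name__ == "__main__":
    # Example: a 200,000-token prompt with an 8,000-token reply.
    print(f"${estimate_cost_usd(200_000, 8_000):.2f}")  # prints "$1.20"
```

At these rates output tokens cost five times as much as input tokens, so a long generated reply can dominate the bill even when the prompt is much larger.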
Benchmark results
| Benchmark | Score | Tags |
|---|---|---|
| AIME 2025 | 99.8% | self-reported, llm-stats |
| ARC-AGI v2 | 68.8% | self-reported, llm-stats |
| BrowseComp | 84.0% | self-reported, llm-stats |
| CharXiv-R | 77.4% | self-reported, llm-stats |
| CyberGym | 73.8% | self-reported, llm-stats |
| DeepSearchQA | 91.3% | self-reported, llm-stats |
| FigQA | 78.3% | self-reported, llm-stats |
| Finance Agent | 60.7% | self-reported, llm-stats |
| GDPval-AA | 1,606 | self-reported, llm-stats |
| GPQA | 91.3% | self-reported, llm-stats |
| Graphwalks BFS >128k | 61.5% | self-reported, llm-stats |
| Graphwalks parents >128k | 95.4% | self-reported, llm-stats |
| Humanity's Last Exam | 53.1% | self-reported, llm-stats |
| MCP Atlas | 62.7% | self-reported, llm-stats |
| MMMLU | 91.1% | self-reported, llm-stats |
| MMMU-Pro | 77.3% | self-reported, llm-stats |
| MRCR v2 (8-needle) | 93.0% | self-reported, llm-stats |
| OpenRCA | 34.9% | self-reported, llm-stats |
| OSWorld | 72.7% | self-reported, llm-stats |
| SWE-bench Multilingual | 77.8% | self-reported, llm-stats |
| SWE-Bench Verified | 80.8% | self-reported, llm-stats |
| Tau2 Retail | 91.9% | self-reported, llm-stats |
| Tau2 Telecom | 99.3% | self-reported, llm-stats |
| Terminal-Bench 2.0 | 65.4% | self-reported, llm-stats |
| Vending-Bench 2 | 8,017.59 | self-reported, llm-stats |