Claude Opus 4
Claude Opus 4 is Anthropic's most powerful model in the Claude 4 family, and Anthropic describes it as the world's best coding model. It delivers sustained performance on complex, long-running tasks and agent workflows. Opus 4 excels at coding and advanced reasoning, and it can use tools (such as web search) during extended thinking. It also supports parallel tool execution and has improved memory capabilities.
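To make "parallel tool execution" concrete, here is a minimal client-side sketch of what it could look like when a single model turn requests several tools at once: the client runs them concurrently and returns one result per request. It does not call any real API. The tool functions (`get_weather`, `get_time`), the `run_tool_calls` helper, and the hand-written `example_calls` list are all hypothetical, and the dictionary shapes are only loosely modeled on tool-use request and result blocks.

```python
import asyncio

# Hypothetical local tools; names and return values are made up for illustration.
async def get_weather(city: str) -> str:
    await asyncio.sleep(0.01)  # stand-in for network latency
    return f"weather for {city}: sunny"

async def get_time(timezone: str) -> str:
    await asyncio.sleep(0.01)  # stand-in for network latency
    return f"time in {timezone}: 12:00"

# Registry mapping a requested tool name to the coroutine that implements it.
TOOLS = {"get_weather": get_weather, "get_time": get_time}

async def run_tool_calls(tool_calls: list[dict]) -> list[dict]:
    """Run every requested tool concurrently; pair each result with its request id."""
    async def run_one(call: dict) -> dict:
        output = await TOOLS[call["name"]](**call["input"])
        return {"type": "tool_result", "tool_use_id": call["id"], "content": output}

    # asyncio.gather returns results in the same order as the awaitables it was given.
    return list(await asyncio.gather(*(run_one(c) for c in tool_calls)))

# Hand-written stand-in for a model turn that requests two tools at once (assumed shape).
example_calls = [
    {"type": "tool_use", "id": "call_1", "name": "get_weather", "input": {"city": "Paris"}},
    {"type": "tool_use", "id": "call_2", "name": "get_time", "input": {"timezone": "UTC"}},
]

results = asyncio.run(run_tool_calls(example_calls))
```

Because the two calls overlap, total latency is roughly that of the slowest tool, not the sum of both. Each result carries the id of the request it answers, so the results can be matched back to their requests regardless of completion order.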
Benchmark results
| Benchmark | Score | Tags |
|---|---|---|
| AIME 2025 | 75.5% | self-reported llm-stats |
| ARC-AGI v2 | 8.6% | self-reported llm-stats |
| GPQA | 79.6% | self-reported llm-stats |
| MMMLU | 88.8% | self-reported llm-stats |
| MMMU (validation) | 76.5% | self-reported llm-stats |
| SWE-Bench Verified | 72.5% | self-reported llm-stats |
| TAU-bench Airline | 59.6% | self-reported llm-stats |
| TAU-bench Retail | 81.4% | self-reported llm-stats |
| Terminal-Bench | 39.2% | self-reported llm-stats |