Claude Opus 4.1
Claude Opus 4.1 is a hybrid reasoning model aimed at coding and AI agent workloads, with a 200K-token context window and up to 32K output tokens. It can respond instantly or use extended thinking, working through a problem step by step and exposing that reasoning as user-friendly summaries. It is designed for complex, multi-step coding and agentic tasks, and reports a self-reported 74.5% on SWE-bench Verified. It is also positioned for agentic search and research and for long-form writing. For large generation and refactoring projects, it adapts to a project's existing coding style.
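The two limits quoted above interact when sizing a request. The following is a minimal sketch of that arithmetic, not an official API: the helper `plan_request` and the `Budget` type are hypothetical names, the limits are taken as exactly 200,000 and 32,000 tokens (an assumption about how "200K" and "32K" are rounded), and it assumes input and output tokens share the same context window. Token counts are supplied by the caller.

```python
from dataclasses import dataclass

# Limits quoted in the description; exact values are an assumption about rounding.
CONTEXT_WINDOW_TOKENS = 200_000
MAX_OUTPUT_TOKENS = 32_000


@dataclass(frozen=True)
class Budget:
    fits: bool              # True if the requested output is allowed unchanged
    max_output_tokens: int  # largest output allowance this request can use


def plan_request(input_tokens: int, requested_output_tokens: int) -> Budget:
    """Hypothetical helper: clamp a requested output size to the stated limits.

    Assumes input and output tokens are counted against one shared window.
    """
    if input_tokens < 0 or requested_output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    # Space left in the window once the prompt is accounted for.
    remaining = max(CONTEXT_WINDOW_TOKENS - input_tokens, 0)
    allowed = min(requested_output_tokens, MAX_OUTPUT_TOKENS, remaining)
    return Budget(fits=(allowed == requested_output_tokens), max_output_tokens=allowed)
```

For example, under these assumptions a 190,000-token prompt leaves room for only 10,000 output tokens, so a request for the full 32,000 would be clamped.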
Benchmark results
| Benchmark | Score | Tags |
|---|---|---|
| AIME 2025 | 78.0% | self-reported, llm-stats |
| GPQA | 80.9% | self-reported, llm-stats |
| MMMLU | 89.5% | self-reported, llm-stats |
| MMMU (validation) | 77.1% | self-reported, llm-stats |
| SWE-bench Verified | 74.5% | self-reported, llm-stats |
| TAU-bench Airline | 56.0% | self-reported, llm-stats |
| TAU-bench Retail | 82.4% | self-reported, llm-stats |
| Terminal-Bench | 43.3% | self-reported, llm-stats |