Claude Opus 4.1

Claude Opus 4.1 is a hybrid reasoning model aimed at coding and AI agent workloads, with a 200K-token context window and support for up to 32K output tokens. It can answer instantly or use extended thinking, in which it works through a problem step by step and exposes that reasoning to the user as a summary. It reports 74.5% on SWE-bench Verified (self-reported) and is positioned for complex multi-step coding and agentic tasks, agentic search and research, and long-form writing. It also adapts to a project's existing coding style, which suits large generation and refactoring jobs.
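As a rough illustration of what the 200K context window and 32K output limit mean in practice, the sketch below works out how many tokens are left for the prompt once room for the reply is reserved. It assumes that input and output share a single context window and that "200K" and "32K" mean exactly 200,000 and 32,000 tokens. Neither point is stated in the description above, and the function names are hypothetical.

```python
# Figures taken from the description; treating them as exact round numbers
# is an assumption.
CONTEXT_WINDOW_TOKENS = 200_000  # "200K context window"
MAX_OUTPUT_TOKENS = 32_000       # "32K output tokens"


def max_input_tokens(reserved_output: int = MAX_OUTPUT_TOKENS) -> int:
    """Return the prompt budget left after reserving room for the reply.

    Assumes prompt and reply share one context window (an assumption,
    not something the description confirms).
    """
    if not 0 <= reserved_output <= MAX_OUTPUT_TOKENS:
        raise ValueError("reserved_output must be between 0 and MAX_OUTPUT_TOKENS")
    return CONTEXT_WINDOW_TOKENS - reserved_output


def fits(prompt_tokens: int, reserved_output: int = MAX_OUTPUT_TOKENS) -> bool:
    """Check whether a prompt of the given size leaves the reserved output room."""
    return 0 <= prompt_tokens <= max_input_tokens(reserved_output)


if __name__ == "__main__":
    print(max_input_tokens())                   # budget with the full 32K reserved
    print(fits(190_000))                        # too large with the full reservation
    print(fits(190_000, reserved_output=8_000)) # fits with a smaller reservation
```

Under these assumptions, reserving the full output allowance leaves 168,000 tokens for the prompt. Reserving less output frees up more room for input.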

Benchmark results

Benchmark            Score   Tags            Source
AIME 2025            78.0%   self-reported   llm-stats
GPQA                 80.9%   self-reported   llm-stats
MMMLU                89.5%   self-reported   llm-stats
MMMU (validation)    77.1%   self-reported   llm-stats
SWE-bench Verified   74.5%   self-reported   llm-stats
TAU-bench Airline    56.0%   self-reported   llm-stats
TAU-bench Retail     82.4%   self-reported   llm-stats
Terminal-Bench       43.3%   self-reported   llm-stats