Claude Opus 4

Claude Opus 4 is part of Anthropic's Claude 4 family. Anthropic describes it as its most powerful model and the world's best coding model. It delivers sustained performance on complex, long-running tasks and agent workflows. Opus 4 excels at coding and advanced reasoning, and it can use tools (such as web search) during extended thinking. It also supports parallel tool execution and has improved memory capabilities.

Benchmark results

Benchmark            Score    Tags            Source
AIME 2025            75.5%    self-reported   llm-stats
ARC-AGI v2           8.6%     self-reported   llm-stats
GPQA                 79.6%    self-reported   llm-stats
MMMLU                88.8%    self-reported   llm-stats
MMMU (validation)    76.5%    self-reported   llm-stats
SWE-Bench Verified   72.5%    self-reported   llm-stats
TAU-bench Airline    59.6%    self-reported   llm-stats
TAU-bench Retail     81.4%    self-reported   llm-stats
Terminal-Bench       39.2%    self-reported   llm-stats