Claude 3.7 Sonnet

Claude 3.7 Sonnet was the most intelligent Claude model at the time of its release and the first hybrid reasoning model on the market. It can produce either near-instant responses or extended, step-by-step thinking that is made visible to the user. It shows particularly strong improvements in coding and front-end web development.

Benchmark results

Benchmark           Score   Tags           Source
AIME 2024           80.0%   self-reported  llm-stats
AIME 2025           54.8%   self-reported  llm-stats
GPQA                84.8%   self-reported  llm-stats
IFEval              93.2%   self-reported  llm-stats
MATH-500            96.2%   self-reported  llm-stats
MMMLU               86.1%   self-reported  llm-stats
MMMU                75.0%   self-reported  llm-stats
SWE-Bench Verified  70.3%   self-reported  llm-stats
TAU-bench Airline   58.4%   self-reported  llm-stats
TAU-bench Retail    81.2%   self-reported  llm-stats
Terminal-Bench      35.2%   self-reported  llm-stats