Claude 3.7 Sonnet
Anthropic's most intelligent Claude model at release and the first hybrid reasoning model on the market. Claude 3.7 Sonnet can produce either near-instant responses or extended, step-by-step thinking that is made visible to the user. It shows particularly strong improvements in coding and front-end web development.
Benchmark results
| Benchmark | Score | Tags |
|---|---|---|
| AIME 2024 | 80.0% | self-reported, llm-stats |
| AIME 2025 | 54.8% | self-reported, llm-stats |
| GPQA | 84.8% | self-reported, llm-stats |
| IFEval | 93.2% | self-reported, llm-stats |
| MATH-500 | 96.2% | self-reported, llm-stats |
| MMMLU | 86.1% | self-reported, llm-stats |
| MMMU | 75.0% | self-reported, llm-stats |
| SWE-Bench Verified | 70.3% | self-reported, llm-stats |
| TAU-bench Airline | 58.4% | self-reported, llm-stats |
| TAU-bench Retail | 81.2% | self-reported, llm-stats |
| Terminal-Bench | 35.2% | self-reported, llm-stats |