Claude Sonnet 4.6
Claude Sonnet 4.6 is a full upgrade of the model's skills across coding, computer use, long-context reasoning, agent planning, knowledge work, and design. Users preferred Sonnet 4.6 over Sonnet 4.5 approximately 70% of the time. It is the first Sonnet-class model with a 1M-token context window (beta) and context compaction, and its computer use skills are a major improvement over prior Sonnet models. It is the default model on the Free and Pro plans. Pricing is $3 per million input tokens and $15 per million output tokens.
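To make the pricing concrete, here is a minimal sketch of a per-request cost estimate. The $3 and $15 rates come from the summary above. The helper name `estimate_cost_usd` is hypothetical, and the sketch assumes flat per-token rates with no discounts, surcharges, or tiering of any kind.

```python
# Rates from the summary above, in US dollars per million tokens.
INPUT_USD_PER_MTOK = 3.0
OUTPUT_USD_PER_MTOK = 15.0


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Hypothetical helper: flat-rate cost estimate for one request.

    Assumes both rates apply uniformly to every token, which is a
    simplification made for this sketch.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


if __name__ == "__main__":
    # Example: a 200,000-token prompt with a 10,000-token reply.
    print(f"${estimate_cost_usd(200_000, 10_000):.2f}")  # prints "$0.75"
```

Because output tokens cost five times as much as input tokens, a long reply can dominate the bill even when the prompt is much larger.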
Benchmark results
| Benchmark | Score | Tags |
|---|---|---|
| ARC-AGI v2 | 58.3% | self-reported llm-stats |
| BrowseComp | 74.7% | self-reported llm-stats |
| Finance Agent | 63.3% | self-reported llm-stats |
| GDPval-AA | 1,633 | self-reported llm-stats |
| GPQA | 89.9% | self-reported llm-stats |
| Humanity's Last Exam | 49.0% | self-reported llm-stats |
| MCP Atlas | 61.3% | self-reported llm-stats |
| MMMLU | 89.3% | self-reported llm-stats |
| MMMU-Pro | 75.6% | self-reported llm-stats |
| OSWorld | 72.5% | self-reported llm-stats |
| SWE-Bench Verified | 79.6% | self-reported llm-stats |
| Tau2 Retail | 91.7% | self-reported llm-stats |
| Tau2 Telecom | 97.9% | self-reported llm-stats |
| Terminal-Bench 2.0 | 59.1% | self-reported llm-stats |