Claude 3.5 Sonnet

Claude 3.5 Sonnet is an AI model with industry-leading software engineering skills. It excels at coding, planning, and problem-solving, with significant improvements on agentic coding and tool-use tasks. Its computer use capability, available in public beta, lets it operate computer interfaces the way a human user does.

Benchmark results

Benchmark                 Score   Tags           Source
AI2D                      94.7%   self-reported  llm-stats
BIG-Bench Hard            93.1%   self-reported  llm-stats
ChartQA                   90.8%   self-reported  llm-stats
DocVQA                    95.2%   self-reported  llm-stats
DROP                      87.1%   self-reported  llm-stats
GPQA                      67.2%   self-reported  llm-stats
GSM8k                     96.4%   self-reported  llm-stats
HumanEval                 93.7%   self-reported  llm-stats
MATH                      78.3%   self-reported  llm-stats
MathVista                 67.7%   self-reported  llm-stats
MGSM                      91.6%   self-reported  llm-stats
MMLU                      90.4%   self-reported  llm-stats
MMLU-Pro                  77.6%   self-reported  llm-stats
MMMU                      68.3%   self-reported  llm-stats
OSWorld Extended          22.0%   self-reported  llm-stats
OSWorld Screenshot-only   14.9%   self-reported  llm-stats
SWE-Bench Verified        49.0%   self-reported  llm-stats
TAU-bench Airline         46.0%   self-reported  llm-stats
TAU-bench Retail          69.2%   self-reported  llm-stats