Claude 3 Opus

Claude 3 Opus is Anthropic's most intelligent model, with best-in-market performance on highly complex tasks. It can navigate open-ended prompts and sight-unseen scenarios with remarkable fluency and human-like understanding, showing the outer limits of what's possible with generative AI.

Benchmark results

Benchmark        Score   Tags            Source
ARC-C            96.4%   self-reported   llm-stats
BIG-Bench Hard   86.8%   self-reported   llm-stats
DROP             83.1%   self-reported   llm-stats
GPQA             50.4%   self-reported   llm-stats
GSM8k            95.0%   self-reported   llm-stats
HellaSwag        95.4%   self-reported   llm-stats
HumanEval        84.9%   self-reported   llm-stats
MATH             60.1%   self-reported   llm-stats
MGSM             90.7%   self-reported   llm-stats
MMLU             86.8%   self-reported   llm-stats
MMLU-Pro         68.5%   self-reported   llm-stats