GPT-4

GPT-4 is a large multimodal model that accepts both image and text inputs and generates text outputs. OpenAI reports that it exhibits human-level performance on various professional and academic benchmarks, including the bar exam, the LSAT, and the SAT.

Benchmark results

Benchmark                        Score             Tags           Source
AI2 Reasoning Challenge (ARC)    96.3%             self-reported  llm-stats
DROP                             80.9 (F1)         self-reported  llm-stats
GPQA                             35.7%             self-reported  llm-stats
HellaSwag                        95.3%             self-reported  llm-stats
HumanEval                        67.0% (pass@1)    self-reported  llm-stats
LSAT                             88th percentile   self-reported  llm-stats
MATH                             42.0%             self-reported  llm-stats
MGSM                             74.5%             self-reported  llm-stats
MMLU                             86.4%             self-reported  llm-stats
SAT Math                         89th percentile   self-reported  llm-stats
Uniform Bar Exam                 90th percentile   self-reported  llm-stats
WinoGrande                       87.5%             self-reported  llm-stats

For the LSAT, SAT Math, and Uniform Bar Exam, the score is a percentile rank among human test takers, not a percentage of questions answered correctly. DROP is scored by F1, and HumanEval by the share of problems solved on the first attempt (pass@1). The remaining scores are accuracy.