GPT-4o

GPT-4o ('o' for 'omni') is a multimodal AI model that accepts text, audio, image, and video inputs and generates text, audio, and image outputs. It matches GPT-4 Turbo performance on English text and code, and improves on it in non-English languages, vision, and audio understanding.

Benchmark results

| Benchmark | Score | Tags | Source |
|---|---|---|---|
| ActivityNet | 61.9% | self-reported | llm-stats |
| AI2D | 94.2% | self-reported | llm-stats |
| Aider-Polyglot | 30.7% | self-reported | llm-stats |
| Aider-Polyglot Edit | 18.2% | self-reported | llm-stats |
| AIME 2024 | 13.1% | self-reported | llm-stats |
| ChartQA | 85.7% | self-reported | llm-stats |
| CharXiv-D | 85.3% | self-reported | llm-stats |
| CharXiv-R | 58.8% | self-reported | llm-stats |
| COLLIE | 61.0% | self-reported | llm-stats |
| ComplexFuncBench | 66.5% | self-reported | llm-stats |
| DocVQA | 92.8% | self-reported | llm-stats |
| EgoSchema | 72.2% | self-reported | llm-stats |
| ERQA | 35.2% | self-reported | llm-stats |
| GPQA | 70.1% | self-reported | llm-stats |
| Graphwalks BFS <128k | 41.7% | self-reported | llm-stats |
| Graphwalks parents <128k | 35.4% | self-reported | llm-stats |
| Humanity's Last Exam | 5.3% | self-reported | llm-stats |
| IFEval | 81.0% | self-reported | llm-stats |
| Internal API instruction following (hard) | 29.2% | self-reported | llm-stats |
| MathVista | 61.4% | self-reported | llm-stats |
| MMLU | 85.7% | self-reported | llm-stats |
| MMLU-Pro | 74.7% | self-reported | llm-stats |
| MMMLU | 81.4% | self-reported | llm-stats |
| MMMU | 72.2% | self-reported | llm-stats |
| MMMU-Pro | 59.9% | self-reported | llm-stats |
| Multi-Challenge | 40.3% | self-reported | llm-stats |
| Multi-IF | 60.9% | self-reported | llm-stats |
| MultiChallenge (o3-mini grader) | 39.9% | self-reported | llm-stats |
| OpenAI-MRCR: 2 needle 128k | 31.9% | self-reported | llm-stats |
| Scale MultiChallenge | 40.3% | self-reported | llm-stats |
| SimpleQA | 38.2% | self-reported | llm-stats |
| SWE-Bench Verified | 33.2% | self-reported | llm-stats |
| SWE-Lancer | 32.6% | self-reported | llm-stats |
| SWE-Lancer (IC-Diamond subset) | 12.4% | self-reported | llm-stats |
| TAU-bench Airline | 42.8% | self-reported | llm-stats |
| TAU-bench Retail | 60.3% | self-reported | llm-stats |
| Tau2 Airline | 45.5% | self-reported | llm-stats |
| Tau2 Retail | 63.4% | self-reported | llm-stats |
| Tau2 Telecom | 23.5% | self-reported | llm-stats |
| VideoMMMU | 61.2% | self-reported | llm-stats |