GPT-4o

GPT-4o ("o" for "omni") is a multimodal AI model that accepts text, audio, image, and video inputs and generates text, audio, and image outputs. It matches GPT-4 Turbo performance on English text and code, with improvements on non-English text, vision, and audio understanding.

Benchmark results

Benchmark    Score    Tags             Source
DROP         83.4%    self-reported    llm-stats
GPQA         53.6%    self-reported    llm-stats
HumanEval    90.2%    self-reported    llm-stats
MATH         76.6%    self-reported    llm-stats
MathVista    63.8%    self-reported    llm-stats
MGSM         90.5%    self-reported    llm-stats
MMLU         88.7%    self-reported    llm-stats
MMLU-Pro     72.6%    self-reported    llm-stats