GPT-4o
GPT-4o ("o" for "omni") is a multimodal AI model that accepts any combination of text, audio, image, and video inputs and generates text, audio, and image outputs. It matches GPT-4 Turbo performance on English text and code, and improves on it for non-English text and for vision and audio understanding.
Benchmark results
| Benchmark | Score | Reporting | Source |
|---|---|---|---|
| DROP | 83.4% | Self-reported | llm-stats |
| GPQA | 53.6% | Self-reported | llm-stats |
| HumanEval | 90.2% | Self-reported | llm-stats |
| MATH | 76.6% | Self-reported | llm-stats |
| MathVista | 63.8% | Self-reported | llm-stats |
| MGSM | 90.5% | Self-reported | llm-stats |
| MMLU | 88.7% | Self-reported | llm-stats |
| MMLU-Pro | 72.6% | Self-reported | llm-stats |