GPT-4.1
GPT-4.1 is an OpenAI flagship model that improves on GPT-4 Turbo in benchmark performance, speed, and cost-effectiveness.
Benchmark results
| Benchmark | Score | Tags |
|---|---|---|
| Aider-Polyglot | 51.6% | self-reported, llm-stats |
| Aider-Polyglot Edit | 52.9% | self-reported, llm-stats |
| AIME 2024 | 48.1% | self-reported, llm-stats |
| AIME 2025 | 46.4% | self-reported, llm-stats |
| CharXiv-D | 87.9% | self-reported, llm-stats |
| CharXiv-R | 56.7% | self-reported, llm-stats |
| COLLIE | 65.8% | self-reported, llm-stats |
| ComplexFuncBench | 65.5% | self-reported, llm-stats |
| GPQA | 66.3% | self-reported, llm-stats |
| Graphwalks BFS <128k | 61.7% | self-reported, llm-stats |
| Graphwalks BFS >128k | 19.0% | self-reported, llm-stats |
| Graphwalks parents <128k | 58.0% | self-reported, llm-stats |
| Graphwalks parents >128k | 25.0% | self-reported, llm-stats |
| HMMT 2025 | 28.9% | self-reported, llm-stats |
| Humanity's Last Exam | 5.4% | self-reported, llm-stats |
| IFEval | 87.4% | self-reported, llm-stats |
| Internal API instruction following (hard) | 49.1% | self-reported, llm-stats |
| MathVista | 72.2% | self-reported, llm-stats |
| MMLU | 90.2% | self-reported, llm-stats |
| MMMLU | 87.3% | self-reported, llm-stats |
| MMMU | 74.8% | self-reported, llm-stats |
| MultiChallenge | 38.3% | self-reported, llm-stats |
| Multi-IF | 70.8% | self-reported, llm-stats |
| MultiChallenge (o3-mini grader) | 46.2% | self-reported, llm-stats |
| OpenAI-MRCR: 2-needle, 128k | 57.2% | self-reported, llm-stats |
| OpenAI-MRCR: 2-needle, 1M | 46.3% | self-reported, llm-stats |
| SWE-Bench Verified | 54.6% | self-reported, llm-stats |
| TAU-bench Airline | 49.4% | self-reported, llm-stats |
| TAU-bench Retail | 68.0% | self-reported, llm-stats |
| Video-MME (long, no subtitles) | 72.0% | self-reported, llm-stats |
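Several retrieval benchmarks in the table are reported at two context lengths: Graphwalks below and above 128k tokens, and OpenAI-MRCR at 128k and 1M. The minimal Python sketch below uses only the scores copied from the table. For each pair, it computes how far the score falls at the longer setting and what share of the shorter-context score is retained. The helper name `long_context_drop` and the row pairing are illustrative choices, not part of the source data.

```python
# Self-reported scores (percent) copied from the table, as (shorter, longer) context pairs.
# The pairing and the helper below are illustrative, not from the source page.
CONTEXT_PAIRS = {
    "Graphwalks BFS (<128k vs >128k)": (61.7, 19.0),
    "Graphwalks parents (<128k vs >128k)": (58.0, 25.0),
    "OpenAI-MRCR 2-needle (128k vs 1M)": (57.2, 46.3),
}


def long_context_drop(short: float, long: float) -> tuple[float, float]:
    """Return (drop in percentage points, % of the short-context score retained), rounded to 0.1."""
    drop = round(short - long, 1)
    retained = round(100 * long / short, 1)
    return drop, retained


if __name__ == "__main__":
    for name, (short, long) in CONTEXT_PAIRS.items():
        drop, retained = long_context_drop(short, long)
        print(f"{name}: -{drop} pts, {retained}% retained")
```

Running it shows that Graphwalks BFS loses 42.7 points (30.8% retained), Graphwalks parents loses 33.0 points (43.1% retained), and OpenAI-MRCR loses 10.9 points (80.9% retained).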