GPT-4.1 mini
GPT-4.1 mini balances intelligence, speed, and cost. It is a significant leap in small-model performance, beating GPT-4o on many benchmarks while reducing latency and cost.
Benchmark results
| Benchmark | Score | Tags |
|---|---|---|
| Aider-Polyglot | 34.7% | self-reported llm-stats |
| Aider-Polyglot Edit | 31.6% | self-reported llm-stats |
| AIME 2024 | 49.6% | self-reported llm-stats |
| AIME 2025 | 40.2% | self-reported llm-stats |
| CharXiv-D | 88.4% | self-reported llm-stats |
| CharXiv-R | 56.8% | self-reported llm-stats |
| COLLIE | 54.6% | self-reported llm-stats |
| ComplexFuncBench | 49.3% | self-reported llm-stats |
| GPQA | 65.0% | self-reported llm-stats |
| Graphwalks BFS <128k | 61.7% | self-reported llm-stats |
| Graphwalks BFS >128k | 15.0% | self-reported llm-stats |
| Graphwalks parents <128k | 60.5% | self-reported llm-stats |
| Graphwalks parents >128k | 11.0% | self-reported llm-stats |
| HMMT 2025 | 35.0% | self-reported llm-stats |
| Humanity's Last Exam | 3.7% | self-reported llm-stats |
| IFEval | 84.1% | self-reported llm-stats |
| Internal API instruction following (hard) | 45.1% | self-reported llm-stats |
| MathVista | 73.1% | self-reported llm-stats |
| MMLU | 87.5% | self-reported llm-stats |
| MMMLU | 78.5% | self-reported llm-stats |
| MMMU | 72.7% | self-reported llm-stats |
| Multi-Challenge | 35.8% | self-reported llm-stats |
| Multi-IF | 67.0% | self-reported llm-stats |
| MultiChallenge (o3-mini grader) | 42.2% | self-reported llm-stats |
| OpenAI-MRCR: 2 needle 128k | 47.2% | self-reported llm-stats |
| OpenAI-MRCR: 2 needle 1M | 33.3% | self-reported llm-stats |
| SWE-Bench Verified | 23.6% | self-reported llm-stats |
| TAU-bench Airline | 36.0% | self-reported llm-stats |
| TAU-bench Retail | 55.8% | self-reported llm-stats |