o3-mini
A smaller variant of o3, expected to offer enhanced multimodal capabilities, improved reasoning, and more efficient resource use than previous models, while maintaining strong performance on core tasks.
Benchmark results
| Benchmark | Score | Tags |
|---|---|---|
| Aider-Polyglot | 66.7% | self-reported, llm-stats |
| Aider-Polyglot Edit | 60.4% | self-reported, llm-stats |
| AIME 2024 | 87.3% | self-reported, llm-stats |
| COLLIE | 98.7% | self-reported, llm-stats |
| ComplexFuncBench | 17.6% | self-reported, llm-stats |
| FrontierMath | 9.2% | self-reported, llm-stats |
| GPQA | 77.2% | self-reported, llm-stats |
| Graphwalks BFS <128k | 51.0% | self-reported, llm-stats |
| Graphwalks parents <128k | 58.3% | self-reported, llm-stats |
| IFEval | 93.9% | self-reported, llm-stats |
| Internal API instruction following (hard) | 50.0% | self-reported, llm-stats |
| LiveBench | 84.6% | self-reported, llm-stats |
| MATH | 97.9% | self-reported, llm-stats |
| MGSM | 92.0% | self-reported, llm-stats |
| MMLU | 86.9% | self-reported, llm-stats |
| Multi-Challenge | 39.9% | self-reported, llm-stats |
| Multi-IF | 79.5% | self-reported, llm-stats |
| MultiChallenge (o3-mini grader) | 50.2% | self-reported, llm-stats |
| Multilingual MMLU | 80.7% | self-reported, llm-stats |
| OpenAI-MRCR: 2 needle 128k | 18.7% | self-reported, llm-stats |
| SimpleQA | 15.0% | self-reported, llm-stats |
| SWE-Bench Verified | 49.3% | self-reported, llm-stats |
| SWE-Lancer | 18.0% | self-reported, llm-stats |
| SWE-Lancer (IC-Diamond subset) | 7.4% | self-reported, llm-stats |
| TAU-bench Airline | 32.4% | self-reported, llm-stats |
| TAU-bench Retail | 57.6% | self-reported, llm-stats |
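
For readers who want to work with these scores programmatically, the following is a minimal sketch of one way to parse a pipe-delimited table like the one above and rank benchmarks by score. The `BenchmarkRow` type, the `parse_benchmark_table` helper, and the `SAMPLE` excerpt are hypothetical names introduced for illustration; only the four sample values are taken from the table, and the parser assumes the benchmark name and percentage score are the first two columns.

```python
from typing import List, NamedTuple


class BenchmarkRow(NamedTuple):
    """One table row (hypothetical helper type, not part of any API)."""
    name: str
    score: float  # percentage, e.g. 87.3 for "87.3%"


def parse_benchmark_table(markdown: str) -> List[BenchmarkRow]:
    """Extract (name, score) pairs from a pipe-delimited table.

    Assumes the first column is the benchmark name and the second
    is a score ending in '%'. Header and separator rows are skipped
    because their second cell does not end in '%'.
    """
    rows: List[BenchmarkRow] = []
    for line in markdown.strip().splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) < 2:
            continue
        name, score = cells[0], cells[1]
        if not score.endswith("%"):
            continue
        rows.append(BenchmarkRow(name, float(score.rstrip("%"))))
    return rows


# A four-row excerpt of the table above, used as sample input.
SAMPLE = """
| Benchmark | Score |
|---|---|
| AIME 2024 | 87.3% |
| FrontierMath | 9.2% |
| MATH | 97.9% |
| SWE-Bench Verified | 49.3% |
"""

rows = parse_benchmark_table(SAMPLE)
ranked = sorted(rows, key=lambda r: r.score, reverse=True)

for row in ranked:
    print(f"{row.name:<20} {row.score:5.1f}%")
```

On this excerpt, MATH ranks first and FrontierMath last. Because the scores come from different benchmarks with different scales of difficulty, such a ranking only shows where the model scores high or low, not how the benchmarks compare with one another.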