GPT-4.1 mini

GPT-4.1 mini balances intelligence, speed, and cost. It is a significant leap in small-model performance, beating GPT-4o on many benchmarks while reducing latency and cost.

Benchmark results

| Benchmark | Score | Tags | Source |
|---|---|---|---|
| Aider-Polyglot | 34.7% | self-reported | llm-stats |
| Aider-Polyglot Edit | 31.6% | self-reported | llm-stats |
| AIME 2024 | 49.6% | self-reported | llm-stats |
| AIME 2025 | 40.2% | self-reported | llm-stats |
| CharXiv-D | 88.4% | self-reported | llm-stats |
| CharXiv-R | 56.8% | self-reported | llm-stats |
| COLLIE | 54.6% | self-reported | llm-stats |
| ComplexFuncBench | 49.3% | self-reported | llm-stats |
| GPQA | 65.0% | self-reported | llm-stats |
| Graphwalks BFS <128k | 61.7% | self-reported | llm-stats |
| Graphwalks BFS >128k | 15.0% | self-reported | llm-stats |
| Graphwalks parents <128k | 60.5% | self-reported | llm-stats |
| Graphwalks parents >128k | 11.0% | self-reported | llm-stats |
| HMMT 2025 | 35.0% | self-reported | llm-stats |
| Humanity's Last Exam | 3.7% | self-reported | llm-stats |
| IFEval | 84.1% | self-reported | llm-stats |
| Internal API instruction following (hard) | 45.1% | self-reported | llm-stats |
| MathVista | 73.1% | self-reported | llm-stats |
| MMLU | 87.5% | self-reported | llm-stats |
| MMMLU | 78.5% | self-reported | llm-stats |
| MMMU | 72.7% | self-reported | llm-stats |
| Multi-Challenge | 35.8% | self-reported | llm-stats |
| Multi-IF | 67.0% | self-reported | llm-stats |
| MultiChallenge (o3-mini grader) | 42.2% | self-reported | llm-stats |
| OpenAI-MRCR: 2 needle 128k | 47.2% | self-reported | llm-stats |
| OpenAI-MRCR: 2 needle 1M | 33.3% | self-reported | llm-stats |
| SWE-Bench Verified | 23.6% | self-reported | llm-stats |
| TAU-bench Airline | 36.0% | self-reported | llm-stats |
| TAU-bench Retail | 55.8% | self-reported | llm-stats |