Phi 4 Reasoning
Phi-4-reasoning is a state-of-the-art open-weight reasoning model fine-tuned from Phi-4 using two methods: supervised fine-tuning on a dataset of chain-of-thought traces, and reinforcement learning. It focuses on math, science, and coding skills.
Benchmark results
| Benchmark | Score | Tags |
|---|---|---|
| AIME 2024 | 75.3% | self-reported llm-stats |
| AIME 2025 | 62.9% | self-reported llm-stats |
| Arena Hard | 73.3% | self-reported llm-stats |
| FlenQA | 97.7% | self-reported llm-stats |
| GPQA | 65.8% | self-reported llm-stats |
| HumanEval+ | 92.9% | self-reported llm-stats |
| IFEval | 83.4% | self-reported llm-stats |
| LiveCodeBench | 53.8% | self-reported llm-stats |
| MMLU-Pro | 74.3% | self-reported llm-stats |
| OmniMath | 76.6% | self-reported llm-stats |
| PhiBench | 70.6% | self-reported llm-stats |