Phi 4 Reasoning Plus
Phi-4-reasoning-plus is a state-of-the-art open-weight reasoning model, fine-tuned from Phi-4 using supervised fine-tuning followed by reinforcement learning (RL). It focuses on math, science, and coding skills. The "plus" version achieves higher accuracy thanks to the additional RL training, but it may have higher latency because it tends to generate longer reasoning traces.
Benchmark results
| Benchmark | Score | Tags |
|---|---|---|
| AIME 2024 | 81.3% | self-reported llm-stats |
| AIME 2025 | 78.0% | self-reported llm-stats |
| Arena Hard | 79.0% | self-reported llm-stats |
| FlenQA | 97.9% | self-reported llm-stats |
| GPQA | 68.9% | self-reported llm-stats |
| HumanEval+ | 92.3% | self-reported llm-stats |
| IFEval | 84.9% | self-reported llm-stats |
| LiveCodeBench | 53.1% | self-reported llm-stats |
| MMLU-Pro | 76.0% | self-reported llm-stats |
| OmniMath | 81.9% | self-reported llm-stats |
| PhiBench | 74.2% | self-reported llm-stats |