Phi 4 Reasoning Plus

Phi-4-reasoning-plus is a state-of-the-art open-weight reasoning model fine-tuned from Phi-4 using supervised fine-tuning and reinforcement learning. It focuses on math, science, and coding skills. This 'plus' version achieves higher accuracy than Phi-4-reasoning through additional RL training, but may have higher latency.

Benchmark results

Benchmark       Score    Tags            Source
AIME 2024       81.3%    self-reported   llm-stats
AIME 2025       78.0%    self-reported   llm-stats
Arena Hard      79.0%    self-reported   llm-stats
FlenQA          97.9%    self-reported   llm-stats
GPQA            68.9%    self-reported   llm-stats
HumanEval+      92.3%    self-reported   llm-stats
IFEval          84.9%    self-reported   llm-stats
LiveCodeBench   53.1%    self-reported   llm-stats
MMLU-Pro        76.0%    self-reported   llm-stats
OmniMath        81.9%    self-reported   llm-stats
PhiBench        74.2%    self-reported   llm-stats