Phi 4 Reasoning

Phi-4-reasoning is a state-of-the-art open-weight reasoning model fine-tuned from Phi-4 through both supervised fine-tuning on a dataset of chain-of-thought traces and reinforcement learning. It focuses on math, science, and coding skills.

Benchmark results

Benchmark      Score  Tags           Source
AIME 2024      75.3%  self-reported  llm-stats
AIME 2025      62.9%  self-reported  llm-stats
Arena Hard     73.3%  self-reported  llm-stats
FlenQA         97.7%  self-reported  llm-stats
GPQA           65.8%  self-reported  llm-stats
HumanEval+     92.9%  self-reported  llm-stats
IFEval         83.4%  self-reported  llm-stats
LiveCodeBench  53.8%  self-reported  llm-stats
MMLU-Pro       74.3%  self-reported  llm-stats
OmniMath       76.6%  self-reported  llm-stats
PhiBench       70.6%  self-reported  llm-stats