Phi 4

Phi-4 is an open model built for advanced reasoning, coding, and knowledge tasks. It is trained on a blend of synthetic data, filtered web data, and academic texts, and uses supervised fine-tuning for precision, alignment, and safety.

Benchmark results

Benchmark    Score   Tags            Source
Arena Hard   75.4%   self-reported   llm-stats
DROP         75.5%   self-reported   llm-stats
GPQA         56.1%   self-reported   llm-stats
HumanEval    82.6%   self-reported   llm-stats
HumanEval+   82.8%   self-reported   llm-stats
IFEval       63.0%   self-reported   llm-stats
LiveBench    47.6%   self-reported   llm-stats
MATH         80.4%   self-reported   llm-stats
MGSM         80.6%   self-reported   llm-stats
MMLU         84.8%   self-reported   llm-stats
MMLU-Pro     70.4%   self-reported   llm-stats
PhiBench     56.2%   self-reported   llm-stats
SimpleQA     3.0%    self-reported   llm-stats