Phi 4
phi-4 is a state-of-the-art open model built for advanced reasoning, coding, and knowledge tasks. It is trained on a blend of synthetic data, filtered web data, and academic texts, then refined with supervised fine-tuning for precision, alignment, and safety.
Benchmark results
| Benchmark | Score | Tags |
|---|---|---|
| Arena Hard | 75.4% | self-reported, llm-stats |
| DROP | 75.5% | self-reported, llm-stats |
| GPQA | 56.1% | self-reported, llm-stats |
| HumanEval | 82.6% | self-reported, llm-stats |
| HumanEval+ | 82.8% | self-reported, llm-stats |
| IFEval | 63.0% | self-reported, llm-stats |
| LiveBench | 47.6% | self-reported, llm-stats |
| MATH | 80.4% | self-reported, llm-stats |
| MGSM | 80.6% | self-reported, llm-stats |
| MMLU | 84.8% | self-reported, llm-stats |
| MMLU-Pro | 70.4% | self-reported, llm-stats |
| PhiBench | 56.2% | self-reported, llm-stats |
| SimpleQA | 3.0% | self-reported, llm-stats |