Phi-3.5-MoE-instruct
Phi-3.5-MoE-instruct is a mixture-of-experts model with ~42B total parameters (6.6B active) and a 128K-token context window. It excels at reasoning, math, coding, and multilingual tasks, and outperforms larger dense models on many benchmarks. It underwent a thorough safety post-training process (SFT + DPO) and is released under the MIT license. The model suits scenarios that require both efficiency and high performance, particularly multilingual or reasoning-intensive tasks.
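The gap between total and active parameters comes from sparse routing: for each token, a router picks a few experts, and only those experts run. The sketch below is a toy illustration of top-k gating, not Phi-3.5-MoE's actual implementation. It assumes 16 experts with top-2 routing, and the `ToyExpert`, `top_k_gate`, and `moe_layer` names are hypothetical. For simplicity the router logits are passed in directly instead of being computed by a learned router from the token's hidden state.

```python
import math

NUM_EXPERTS = 16  # assumption: expert count chosen for illustration
TOP_K = 2         # assumption: experts activated per token


class ToyExpert:
    """Hypothetical expert: scales its input by a fixed factor and counts calls."""

    def __init__(self, scale):
        self.scale = scale
        self.calls = 0

    def __call__(self, x):
        self.calls += 1
        return [self.scale * v for v in x]


def top_k_gate(logits, k=TOP_K):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are a softmax over only the selected logits, so they sum to 1.
    """
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_layer(x, experts, router_logits, k=TOP_K):
    """Weighted sum of the k selected experts' outputs; other experts never run."""
    out = [0.0] * len(x)
    for idx, w in top_k_gate(router_logits, k):
        y = experts[idx](x)
        out = [o + w * yi for o, yi in zip(out, y)]
    return out


experts = [ToyExpert(i + 1) for i in range(NUM_EXPERTS)]
router_logits = [0.0] * NUM_EXPERTS
router_logits[3], router_logits[7] = 2.0, 1.0  # experts 3 and 7 score highest

out = moe_layer([1.0, 2.0], experts, router_logits)
print(sum(e.calls for e in experts))     # 2: only the selected experts ran
print(round(6.6 / 42 * 100, 1))          # 15.7: active share of total params (%)
```

Only 2 of the 16 toy experts execute per token, which is the same mechanism that lets a model with ~42B parameters run with roughly 6.6B (about 15.7%) active per token.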
Benchmark results
| Benchmark | Score | Tags |
|---|---|---|
| ARC-C | 91.0% | self-reported, llm-stats |
| Arena Hard | 37.9% | self-reported, llm-stats |
| BIG-Bench Hard | 79.1% | self-reported, llm-stats |
| BoolQ | 84.6% | self-reported, llm-stats |
| GovReport | 26.4% | self-reported, llm-stats |
| GPQA | 36.8% | self-reported, llm-stats |
| GSM8K | 88.7% | self-reported, llm-stats |
| HellaSwag | 83.8% | self-reported, llm-stats |
| HumanEval | 70.7% | self-reported, llm-stats |
| MATH | 59.5% | self-reported, llm-stats |
| MBPP | 80.8% | self-reported, llm-stats |
| MEGA MLQA | 65.3% | self-reported, llm-stats |
| MEGA TyDi QA | 67.1% | self-reported, llm-stats |
| MEGA UDPOS | 60.4% | self-reported, llm-stats |
| MEGA XCOPA | 76.6% | self-reported, llm-stats |
| MEGA XStoryCloze | 82.8% | self-reported, llm-stats |
| MGSM | 58.7% | self-reported, llm-stats |
| MMLU | 78.9% | self-reported, llm-stats |
| MMLU-Pro | 45.3% | self-reported, llm-stats |
| MMMLU | 69.9% | self-reported, llm-stats |
| OpenBookQA | 89.6% | self-reported, llm-stats |
| PIQA | 88.6% | self-reported, llm-stats |
| Qasper | 40.0% | self-reported, llm-stats |
| QMSum | 19.9% | self-reported, llm-stats |
| RepoQA | 85.0% | self-reported, llm-stats |
| RULER | 87.1% | self-reported, llm-stats |
| Social IQa | 78.0% | self-reported, llm-stats |
| SQuALITY | 24.1% | self-reported, llm-stats |
| SummScreenFD | 16.9% | self-reported, llm-stats |
| TruthfulQA | 77.5% | self-reported, llm-stats |
| Winogrande | 81.3% | self-reported, llm-stats |