Phi-3.5-MoE-instruct

Phi-3.5-MoE-instruct is a mixture-of-experts (MoE) model with ~42B total parameters, of which 6.6B are active per token, and a 128K-token context window. It performs strongly on reasoning, math, coding, and multilingual tasks, and outperforms larger dense models on many benchmarks. It underwent safety post-training with supervised fine-tuning (SFT) and direct preference optimization (DPO), and is released under the MIT license. The model suits scenarios that require both efficiency and high performance, particularly multilingual or reasoning-intensive tasks.
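
To make the gap between total and active parameters concrete, here is a minimal Python sketch. It computes the active share from the two figures quoted above, and it shows a generic top-k expert router of the kind mixture-of-experts layers commonly use. The function names, the example logits, and the choice of k = 2 are illustrative assumptions, not details of this model's actual router.

```python
import math

# Figures taken from the description above (in billions of parameters).
TOTAL_PARAMS_B = 42.0
ACTIVE_PARAMS_B = 6.6


def active_fraction(active_b, total_b):
    """Share of all parameters that take part in processing one token."""
    return active_b / total_b


def top_k_route(router_logits, k):
    """Toy router (hypothetical, for illustration only).

    Picks the k highest-scoring experts and softmax-normalises their
    scores so the selected weights sum to 1.
    Returns a list of (expert_index, weight) pairs, best expert first.
    """
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    top = ranked[:k]
    peak = max(router_logits[i] for i in top)  # subtract max for stability
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


if __name__ == "__main__":
    share = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
    print(f"active share of parameters: {share:.1%}")
    # Made-up logits for four experts; only the two best are used.
    print(top_k_route([0.1, 2.0, -1.0, 1.5], k=2))
```

With the quoted figures, roughly 16% of the parameters are used for any one token, which is why the per-token compute cost is closer to that of a much smaller dense model.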

Benchmark results

Benchmark         Score   Tags           Source
ARC-C             91.0%   self-reported  llm-stats
Arena Hard        37.9%   self-reported  llm-stats
BIG-Bench Hard    79.1%   self-reported  llm-stats
BoolQ             84.6%   self-reported  llm-stats
GovReport         26.4%   self-reported  llm-stats
GPQA              36.8%   self-reported  llm-stats
GSM8k             88.7%   self-reported  llm-stats
HellaSwag         83.8%   self-reported  llm-stats
HumanEval         70.7%   self-reported  llm-stats
MATH              59.5%   self-reported  llm-stats
MBPP              80.8%   self-reported  llm-stats
MEGA MLQA         65.3%   self-reported  llm-stats
MEGA TyDi QA      67.1%   self-reported  llm-stats
MEGA UDPOS        60.4%   self-reported  llm-stats
MEGA XCOPA        76.6%   self-reported  llm-stats
MEGA XStoryCloze  82.8%   self-reported  llm-stats
MGSM              58.7%   self-reported  llm-stats
MMLU              78.9%   self-reported  llm-stats
MMLU-Pro          45.3%   self-reported  llm-stats
MMMLU             69.9%   self-reported  llm-stats
OpenBookQA        89.6%   self-reported  llm-stats
PIQA              88.6%   self-reported  llm-stats
Qasper            40.0%   self-reported  llm-stats
QMSum             19.9%   self-reported  llm-stats
RepoQA            85.0%   self-reported  llm-stats
RULER             87.1%   self-reported  llm-stats
Social IQa        78.0%   self-reported  llm-stats
SQuALITY          24.1%   self-reported  llm-stats
SummScreenFD      16.9%   self-reported  llm-stats
TruthfulQA        77.5%   self-reported  llm-stats
Winogrande        81.3%   self-reported  llm-stats