Phi-3.5-mini-instruct

Phi-3.5-mini-instruct is a 3.8B-parameter model that supports a context length of up to 128K tokens and offers improved multilingual capabilities across more than 20 languages. It underwent additional training and safety post-training to improve instruction following, reasoning, math, and code generation. It is suited to memory- or latency-constrained environments and is released under the MIT license.
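
To make the memory-constraint claim concrete, the following rough sketch estimates how much storage the weights alone would need at a few numeric precisions. Only the 3.8B parameter count comes from the description above; the list of precisions, the helper name `weight_memory_gb`, and the convention 1 GB = 1e9 bytes are assumptions chosen for illustration. The estimate ignores activations, the KV cache, and runtime overhead, so real usage will be higher.

```python
# Back-of-the-envelope estimate of weight storage only (not total runtime memory).
PARAMS = 3.8e9  # parameter count stated in the model description

# Bytes per parameter for some common storage precisions (illustrative assumption).
BYTES_PER_PARAM = {
    "fp32": 4.0,
    "fp16/bf16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Return approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


for precision, nbytes in BYTES_PER_PARAM.items():
    print(f"{precision:>10}: {weight_memory_gb(PARAMS, nbytes):.1f} GB")
```

Under these assumptions the weights take roughly 15.2 GB at 32-bit, 7.6 GB at 16-bit, 3.8 GB at 8-bit, and 1.9 GB at 4-bit precision.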

Benchmark results

Benchmark         Score   Tags            Source
ARC-C             84.6%   self-reported   llm-stats
Arena Hard        37.0%   self-reported   llm-stats
BIG-Bench Hard    69.0%   self-reported   llm-stats
BoolQ             78.0%   self-reported   llm-stats
GovReport         25.9%   self-reported   llm-stats
GPQA              30.4%   self-reported   llm-stats
GSM8k             86.2%   self-reported   llm-stats
HellaSwag         69.4%   self-reported   llm-stats
HumanEval         62.8%   self-reported   llm-stats
MATH              48.5%   self-reported   llm-stats
MBPP              69.6%   self-reported   llm-stats
MEGA MLQA         61.7%   self-reported   llm-stats
MEGA TyDi QA      62.2%   self-reported   llm-stats
MEGA UDPOS        46.5%   self-reported   llm-stats
MEGA XCOPA        63.1%   self-reported   llm-stats
MEGA XStoryCloze  73.5%   self-reported   llm-stats
MGSM              47.9%   self-reported   llm-stats
MMLU              69.0%   self-reported   llm-stats
MMLU-Pro          47.4%   self-reported   llm-stats
MMMLU             55.4%   self-reported   llm-stats
OpenBookQA        79.2%   self-reported   llm-stats
PIQA              81.0%   self-reported   llm-stats
Qasper            41.9%   self-reported   llm-stats
QMSum             21.3%   self-reported   llm-stats
RepoQA            77.0%   self-reported   llm-stats
RULER             84.1%   self-reported   llm-stats
Social IQa        74.7%   self-reported   llm-stats
SQuALITY          24.3%   self-reported   llm-stats
SummScreenFD      16.0%   self-reported   llm-stats
TruthfulQA        64.0%   self-reported   llm-stats
Winogrande        68.5%   self-reported   llm-stats