Phi-3.5-mini-instruct
Phi-3.5-mini-instruct is a 3.8B-parameter model that supports a context window of up to 128K tokens and offers improved multilingual capabilities across more than 20 languages. Additional training and safety post-training improved its instruction following, reasoning, math, and code generation. It is well suited to memory- or latency-constrained environments and is released under the MIT license.
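To make the memory claim concrete, here is a rough back-of-the-envelope sketch of how much storage the weights alone would need at common numeric precisions. It uses only the 3.8B parameter count stated above. The precision list and the helper name `weight_memory_gb` are illustrative assumptions, and the estimate deliberately ignores the KV cache, activations, and runtime overhead, which grow with context length.

```python
# Weight-only memory estimate for a 3.8B-parameter model.
# Assumption: every parameter is stored at the same precision; KV cache,
# activations, and framework overhead are NOT included.

PARAMS = 3.8e9  # parameter count taken from the description above

# Bytes per parameter for common storage formats (illustrative selection).
BYTES_PER_PARAM = {
    "fp32": 4.0,
    "fp16/bf16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Return weight storage in gigabytes, using 1 GB = 1e9 bytes."""
    return params * bytes_per_param / 1e9


if __name__ == "__main__":
    for name, size in BYTES_PER_PARAM.items():
        print(f"{name:>9}: {weight_memory_gb(PARAMS, size):.1f} GB")
```

Under these assumptions, half-precision weights come to roughly 7.6 GB. Real deployments need additional headroom, especially when using long contexts.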
Benchmark results
| Benchmark | Score | Tags |
|---|---|---|
| ARC-C | 84.6% | self-reported, llm-stats |
| Arena Hard | 37.0% | self-reported, llm-stats |
| BIG-Bench Hard | 69.0% | self-reported, llm-stats |
| BoolQ | 78.0% | self-reported, llm-stats |
| GovReport | 25.9% | self-reported, llm-stats |
| GPQA | 30.4% | self-reported, llm-stats |
| GSM8k | 86.2% | self-reported, llm-stats |
| HellaSwag | 69.4% | self-reported, llm-stats |
| HumanEval | 62.8% | self-reported, llm-stats |
| MATH | 48.5% | self-reported, llm-stats |
| MBPP | 69.6% | self-reported, llm-stats |
| MEGA MLQA | 61.7% | self-reported, llm-stats |
| MEGA TyDi QA | 62.2% | self-reported, llm-stats |
| MEGA UDPOS | 46.5% | self-reported, llm-stats |
| MEGA XCOPA | 63.1% | self-reported, llm-stats |
| MEGA XStoryCloze | 73.5% | self-reported, llm-stats |
| MGSM | 47.9% | self-reported, llm-stats |
| MMLU | 69.0% | self-reported, llm-stats |
| MMLU-Pro | 47.4% | self-reported, llm-stats |
| MMMLU | 55.4% | self-reported, llm-stats |
| OpenBookQA | 79.2% | self-reported, llm-stats |
| PIQA | 81.0% | self-reported, llm-stats |
| Qasper | 41.9% | self-reported, llm-stats |
| QMSum | 21.3% | self-reported, llm-stats |
| RepoQA | 77.0% | self-reported, llm-stats |
| RULER | 84.1% | self-reported, llm-stats |
| Social IQa | 74.7% | self-reported, llm-stats |
| SQuALITY | 24.3% | self-reported, llm-stats |
| SummScreenFD | 16.0% | self-reported, llm-stats |
| TruthfulQA | 64.0% | self-reported, llm-stats |
| Winogrande | 68.5% | self-reported, llm-stats |