Qwen2.5 14B Instruct
Qwen2.5-14B-Instruct is an instruction-tuned 14.7B parameter language model, part of the Qwen2.5 series. It features significant improvements in instruction following, long text generation (8K+ tokens), structured data understanding, and JSON output generation. The model supports a 128K token context length and multilingual capabilities across 29+ languages including Chinese, English, French, Spanish, and more.
Benchmark results
| Benchmark | Score | Tags | Source |
|---|---|---|---|
| ARC-C | 67.3% | self-reported llm-stats | link → |
| BBH | 78.2% | self-reported llm-stats | link → |
| GPQA | 45.5% | self-reported llm-stats | link → |
| GSM8k | 94.8% | self-reported llm-stats | link → |
| HumanEval | 83.5% | self-reported llm-stats | link → |
| HumanEval+ | 51.2% | self-reported llm-stats | link → |
| MATH | 80.0% | self-reported llm-stats | link → |
| MBPP | 82.0% | self-reported llm-stats | link → |
| MBPP+ | 63.2% | self-reported llm-stats | link → |
| MMLU | 79.7% | self-reported llm-stats | link → |
| MMLU-Pro | 63.7% | self-reported llm-stats | link → |
| MMLU-Redux | 80.0% | self-reported llm-stats | link → |
| MMLU-STEM | 76.4% | self-reported llm-stats | link → |
| MultiPL-E | 72.8% | self-reported llm-stats | link → |
| TheoremQA | 43.0% | self-reported llm-stats | link → |
| TruthfulQA | 58.4% | self-reported llm-stats | link → |