Qwen2.5 32B Instruct

Qwen2.5-32B-Instruct is an instruction-tuned, 32-billion-parameter language model in the Qwen2.5 series. It is designed to follow instructions, generate long texts (over 8K tokens), understand structured data (e.g., tables), and produce structured outputs, especially JSON. The model supports over 29 languages.
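Because the model is described as producing structured outputs such as JSON, a caller will usually want to check that a reply actually parses before using it. The following is a minimal sketch of that check using only the Python standard library. The function name `extract_json` and the sample reply text are hypothetical illustrations, not part of any Qwen API, and the reply is hard-coded rather than generated by the model.

```python
import json


def extract_json(reply: str):
    """Return the first JSON object found in a model reply, or None.

    Assumption: the reply may wrap the object in surrounding prose,
    so we look for the first opening brace and decode from there.
    """
    start = reply.find("{")
    if start == -1:
        return None
    try:
        # raw_decode parses one JSON value and ignores any trailing text.
        obj, _end = json.JSONDecoder().raw_decode(reply[start:])
    except json.JSONDecodeError:
        return None
    return obj


# Hypothetical reply text, standing in for real model output.
sample_reply = 'Here is the result: {"benchmark": "GSM8k", "score": 95.9} Done.'
record = extract_json(sample_reply)
print(record)
```

Returning `None` instead of raising keeps the retry decision with the caller, which may prefer to re-prompt the model when a reply does not contain valid JSON.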

Benchmark results

Benchmark    Score   Tags           Source
ARC-C        70.4%   self-reported  llm-stats
BBH          84.5%   self-reported  llm-stats
GPQA         49.5%   self-reported  llm-stats
GSM8k        95.9%   self-reported  llm-stats
HellaSwag    85.2%   self-reported  llm-stats
HumanEval    88.4%   self-reported  llm-stats
HumanEval+   52.4%   self-reported  llm-stats
MATH         83.1%   self-reported  llm-stats
MBPP         84.0%   self-reported  llm-stats
MBPP+        67.2%   self-reported  llm-stats
MMLU         83.3%   self-reported  llm-stats
MMLU-Pro     69.0%   self-reported  llm-stats
MMLU-Redux   83.9%   self-reported  llm-stats
MMLU-STEM    80.9%   self-reported  llm-stats
MultiPL-E    75.4%   self-reported  llm-stats
TheoremQA    44.1%   self-reported  llm-stats
TruthfulQA   57.8%   self-reported  llm-stats
Winogrande   82.0%   self-reported  llm-stats