Qwen2.5-Coder 7B Instruct

Qwen2.5-Coder is a specialized coding model trained on 5.5 trillion tokens of code data. It supports 92 programming languages and a 128K-token context window. It is strongest at code generation, completion, and repair, while keeping solid performance on math and general tasks. It also performs well on multi-language programming tasks and code reasoning.

Benchmark results

Benchmark            Score  Tags           Source
Aider                55.6%  self-reported  llm-stats
ARC-C                60.9%  self-reported  llm-stats
BigCodeBench         41.0%  self-reported  llm-stats
CRUXEval-Input-CoT   56.5%  self-reported  llm-stats
CRUXEval-Output-CoT  56.0%  self-reported  llm-stats
GSM8k                83.9%  self-reported  llm-stats
HellaSwag            76.8%  self-reported  llm-stats
HumanEval            88.4%  self-reported  llm-stats
LiveCodeBench        18.2%  self-reported  llm-stats
MATH                 46.6%  self-reported  llm-stats
MBPP                 83.5%  self-reported  llm-stats
MMLU                 67.6%  self-reported  llm-stats
MMLU-Base            68.0%  self-reported  llm-stats
MMLU-Pro             40.1%  self-reported  llm-stats
MMLU-Redux           66.6%  self-reported  llm-stats
STEM                 34.0%  self-reported  llm-stats
TheoremQA            34.0%  self-reported  llm-stats
TruthfulQA           50.6%  self-reported  llm-stats
Winogrande           72.9%  self-reported  llm-stats
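
Each row in the table above follows the same pattern (benchmark name, percentage score, tag, source), so the results can be loaded into records for sorting or filtering. The following is a minimal sketch, not part of the original listing: the `BenchmarkResult` class and `parse_row` helper are hypothetical names introduced for illustration, and only a few rows are copied in as sample data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkResult:
    """Hypothetical record type for one row of the benchmark table."""
    name: str
    score: float  # percentage value, e.g. 88.4 for "88.4%"
    tag: str


def parse_row(row: str) -> BenchmarkResult:
    """Parse a whitespace-separated row: name, score ending in '%', tag.

    Any trailing fields (such as the source column) are ignored.
    """
    name, score, tag = row.split()[:3]
    return BenchmarkResult(name, float(score.rstrip("%")), tag)


# A few rows copied from the table; the remaining rows parse the same way.
ROWS = [
    "MBPP 83.5% self-reported",
    "LiveCodeBench 18.2% self-reported",
    "HumanEval 88.4% self-reported",
    "BigCodeBench 41.0% self-reported",
]

# Sort from highest to lowest score.
results = sorted((parse_row(r) for r in ROWS), key=lambda r: r.score, reverse=True)

for r in results:
    print(f"{r.name:<15} {r.score:5.1f}%  ({r.tag})")
```

Because every score in the table is tagged self-reported, keeping the tag on each record makes it easy to separate these figures from independently verified ones if such rows are added later.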