Qwen2.5-Coder 7B Instruct
Qwen2.5-Coder is a code-specialized model trained on 5.5 trillion tokens of code data. It supports 92 programming languages and a 128K-token context window. It targets code generation, completion, and repair while retaining strong performance on math and general tasks, and it performs well on multi-language programming tasks and code reasoning.
Benchmark results
| Benchmark | Score | Tags |
|---|---|---|
| Aider | 55.6% | self-reported, llm-stats |
| ARC-C | 60.9% | self-reported, llm-stats |
| BigCodeBench | 41.0% | self-reported, llm-stats |
| CRUXEval-Input-CoT | 56.5% | self-reported, llm-stats |
| CRUXEval-Output-CoT | 56.0% | self-reported, llm-stats |
| GSM8k | 83.9% | self-reported, llm-stats |
| HellaSwag | 76.8% | self-reported, llm-stats |
| HumanEval | 88.4% | self-reported, llm-stats |
| LiveCodeBench | 18.2% | self-reported, llm-stats |
| MATH | 46.6% | self-reported, llm-stats |
| MBPP | 83.5% | self-reported, llm-stats |
| MMLU | 67.6% | self-reported, llm-stats |
| MMLU-Base | 68.0% | self-reported, llm-stats |
| MMLU-Pro | 40.1% | self-reported, llm-stats |
| MMLU-Redux | 66.6% | self-reported, llm-stats |
| STEM | 34.0% | self-reported, llm-stats |
| TheoremQA | 34.0% | self-reported, llm-stats |
| TruthfulQA | 50.6% | self-reported, llm-stats |
| Winogrande | 72.9% | self-reported, llm-stats |
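
For readers who want to work with these numbers programmatically, the following is a minimal sketch that ranks the self-reported scores from the table and averages one subset. The grouping into "code" benchmarks is an assumption made here for illustration and is not part of the source. The helper names `ranked` and `group_mean` are hypothetical. An average across different benchmarks is only a rough summary, since the benchmarks are not directly comparable.

```python
from statistics import mean

# Scores in percent, copied from the table above (all self-reported).
SCORES = {
    "Aider": 55.6,
    "ARC-C": 60.9,
    "BigCodeBench": 41.0,
    "CRUXEval-Input-CoT": 56.5,
    "CRUXEval-Output-CoT": 56.0,
    "GSM8k": 83.9,
    "HellaSwag": 76.8,
    "HumanEval": 88.4,
    "LiveCodeBench": 18.2,
    "MATH": 46.6,
    "MBPP": 83.5,
    "MMLU": 67.6,
    "MMLU-Base": 68.0,
    "MMLU-Pro": 40.1,
    "MMLU-Redux": 66.6,
    "STEM": 34.0,
    "TheoremQA": 34.0,
    "TruthfulQA": 50.6,
    "Winogrande": 72.9,
}

# Assumption: this grouping is illustrative only; the source does not
# categorize the benchmarks.
CODE_BENCHMARKS = {
    "Aider",
    "BigCodeBench",
    "CRUXEval-Input-CoT",
    "CRUXEval-Output-CoT",
    "HumanEval",
    "LiveCodeBench",
    "MBPP",
}


def ranked(scores):
    """Return (benchmark, score) pairs sorted from highest to lowest score."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def group_mean(scores, names):
    """Return the unweighted mean score over the named benchmarks."""
    return mean(scores[name] for name in names)


if __name__ == "__main__":
    for name, score in ranked(SCORES):
        label = "code" if name in CODE_BENCHMARKS else "other"
        print(f"{name:<22}{score:5.1f}%  {label}")
    print(f"Mean over assumed code group: {group_mean(SCORES, CODE_BENCHMARKS):.1f}%")
```

Running the script prints every benchmark in descending score order, followed by the unweighted mean of the assumed code group.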