DeepSeek-V2.5
DeepSeek-V2.5 is an upgraded model that merges DeepSeek-V2-Chat and DeepSeek-Coder-V2-Instruct, integrating the general and coding abilities of the two. It aligns better with human preferences and has been optimized in several areas, including writing and instruction following.
Benchmark results
| Benchmark | Score | Tags |
|---|---|---|
| Aider | 72.2% | self-reported llm-stats |
| AlignBench | 80.4% | self-reported llm-stats |
| AlpacaEval 2.0 | 50.5% | self-reported llm-stats |
| Arena Hard | 76.2% | self-reported llm-stats |
| BBH | 84.3% | self-reported llm-stats |
| DS-Arena-Code | 63.1% | self-reported llm-stats |
| DS-FIM-Eval | 78.3% | self-reported llm-stats |
| GSM8k | 95.1% | self-reported llm-stats |
| HumanEval | 89.0% | self-reported llm-stats |
| HumanEval-Mul | 73.8% | self-reported llm-stats |
| LiveCodeBench(01-09) | 41.8% | self-reported llm-stats |
| MATH | 74.7% | self-reported llm-stats |
| MMLU | 80.4% | self-reported llm-stats |
| MT-Bench | 0.902 | self-reported llm-stats |
| SWE-Bench Verified | 16.8% | self-reported llm-stats |
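
The Score column mixes percentage values with one bare decimal (MT-Bench, 0.902), so the values need a common scale before they can be compared or averaged programmatically. The following is a minimal sketch of one way to do that, assuming the bare decimal is already a fraction on a 0 to 1 scale; the source does not state this, and the helper name `parse_score` is hypothetical.

```python
def parse_score(text: str) -> float:
    """Convert a score string from the table into a fraction.

    Assumption (not confirmed by the source): a value without a
    percent sign, such as the MT-Bench entry, is already a fraction
    on a 0-1 scale and is returned unchanged.
    """
    text = text.strip()
    if text.endswith("%"):
        # Percentage entry: drop the sign and rescale to 0-1.
        return float(text[:-1]) / 100.0
    return float(text)


# A few rows copied from the table above, as raw strings.
raw_scores = {
    "HumanEval": "89.0%",
    "MT-Bench": "0.902",
    "SWE-Bench Verified": "16.8%",
}

normalized = {name: parse_score(raw) for name, raw in raw_scores.items()}

for name, value in normalized.items():
    print(f"{name}: {value:.3f}")
```

Keeping the raw strings and converting at read time leaves the table's original formatting untouched, which matters if the assumption about the MT-Bench scale turns out to be wrong.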