DeepSeek-V2.5

DeepSeek-V2.5 is an upgraded model that merges DeepSeek-V2-Chat and DeepSeek-Coder-V2-Instruct, integrating the general and coding abilities of the two. It aligns better with human preferences and has been optimized in several areas, including writing and instruction following.

Benchmark results

Benchmark             Score   Tags           Source
Aider                 72.2%   self-reported  llm-stats
AlignBench            80.4%   self-reported  llm-stats
AlpacaEval 2.0        50.5%   self-reported  llm-stats
Arena Hard            76.2%   self-reported  llm-stats
BBH                   84.3%   self-reported  llm-stats
DS-Arena-Code         63.1%   self-reported  llm-stats
DS-FIM-Eval           78.3%   self-reported  llm-stats
GSM8k                 95.1%   self-reported  llm-stats
HumanEval             89.0%   self-reported  llm-stats
HumanEval-Mul         73.8%   self-reported  llm-stats
LiveCodeBench(01-09)  41.8%   self-reported  llm-stats
MATH                  74.7%   self-reported  llm-stats
MMLU                  80.4%   self-reported  llm-stats
MT-Bench              0.902   self-reported  llm-stats
SWE-Bench Verified    16.8%   self-reported  llm-stats