DeepSeek-V3

DeepSeek-V3 is a Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated per token. It uses Multi-head Latent Attention (MLA), auxiliary-loss-free load balancing, and a multi-token prediction training objective. It was pre-trained on 14.8T tokens and reports strong results on reasoning, math, and code tasks.
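To put the two parameter counts above in proportion, the short sketch below computes the share of the model that is active for any single token. It uses only the 671B and 37B figures from the description; the function and constant names are illustrative, not part of any DeepSeek tooling.

```python
# Parameter counts as stated in the model description (in billions).
TOTAL_PARAMS_B = 671
ACTIVE_PARAMS_B = 37


def active_fraction(active_b: float, total_b: float) -> float:
    """Return the share of parameters used per token, as a value in (0, 1]."""
    if total_b <= 0:
        raise ValueError("total parameter count must be positive")
    return active_b / total_b


fraction = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
print(f"Active per token: {fraction:.1%} of all parameters")
# prints "Active per token: 5.5% of all parameters"
```

Roughly one parameter in eighteen takes part in each token's forward pass. Per-token compute therefore tracks the 37B figure, while the memory needed to hold the weights tracks the 671B figure.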

Benchmark results

Benchmark             Score   Tags           Source
Aider-Polyglot        49.6%   self-reported  llm-stats
Aider-Polyglot Edit   79.7%   self-reported  llm-stats
AIME 2024             39.2%   self-reported  llm-stats
C-Eval                86.5%   self-reported  llm-stats
CLUEWSC               90.9%   self-reported  llm-stats
CNMO 2024             43.2%   self-reported  llm-stats
CSimpleQA             64.8%   self-reported  llm-stats
DROP                  91.6%   self-reported  llm-stats
FRAMES                73.3%   self-reported  llm-stats
GPQA                  59.1%   self-reported  llm-stats
HumanEval-Mul         82.6%   self-reported  llm-stats
IFEval                86.1%   self-reported  llm-stats
LiveCodeBench         37.6%   self-reported  llm-stats
LongBench v2          48.7%   self-reported  llm-stats
MATH-500              90.2%   self-reported  llm-stats
MMLU                  88.5%   self-reported  llm-stats
MMLU-Pro              75.9%   self-reported  llm-stats
MMLU-Redux            89.1%   self-reported  llm-stats
SimpleQA              24.9%   self-reported  llm-stats
SWE-Bench Verified    42.0%   self-reported  llm-stats