DeepSeek-V3

A powerful Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated for each token. It uses Multi-head Latent Attention (MLA) and an auxiliary-loss-free load-balancing strategy, and it is trained with a multi-token prediction objective.
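To make "auxiliary-loss-free load balancing" more concrete, here is a minimal sketch that assumes the bias-adjusted top-k routing scheme described in the DeepSeek-V3 technical report. Each expert carries a bias that is added to its affinity score only when deciding which experts a token is sent to. The gate weights still come from the unbiased scores. After each step, the bias of an overloaded expert is lowered by a step size gamma and the bias of an underloaded expert is raised by the same amount. The function names, the gamma value, the expert count, and the synthetic affinities are all hypothetical. Node-limited routing and the other production details are omitted.

```python
import math
import random


def route_tokens(scores, bias, k):
    """Pick top-k experts per token using bias-adjusted scores (hypothetical helper).

    scores: per-token lists of sigmoid affinities, one value per expert
    bias:   per-expert bias, used ONLY to decide which experts are selected
    Returns one {expert_index: gate_weight} dict per token.
    """
    routes = []
    for s in scores:
        # Rank experts by affinity + bias; the bias steers selection only.
        ranked = sorted(range(len(s)), key=lambda i: s[i] + bias[i], reverse=True)
        chosen = ranked[:k]
        # Gate weights use the ORIGINAL affinities, normalised over the chosen experts.
        total = sum(s[i] for i in chosen)
        routes.append({i: s[i] / total for i in chosen})
    return routes


def update_bias(bias, routes, gamma):
    """Lower the bias of overloaded experts and raise it for underloaded ones."""
    load = [0] * len(bias)
    for r in routes:
        for i in r:
            load[i] += 1
    mean_load = sum(load) / len(load)
    new_bias = []
    for b, l in zip(bias, load):
        if l > mean_load:
            new_bias.append(b - gamma)  # overloaded: make it less attractive
        elif l < mean_load:
            new_bias.append(b + gamma)  # underloaded: make it more attractive
        else:
            new_bias.append(b)
    return new_bias, load


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical setup: 8 experts, top-2 routing, 64 tokens, with experts 0 and 1
# favoured by every token (a +2.0 logit boost) so routing starts out unbalanced.
random.seed(0)
num_experts, k, gamma = 8, 2, 0.05
tokens = [
    [sigmoid(random.gauss(0.0, 1.0) + (2.0 if i < 2 else 0.0)) for i in range(num_experts)]
    for _ in range(64)
]

bias = [0.0] * num_experts
for step in range(50):
    routes = route_tokens(tokens, bias, k)
    bias, load = update_bias(bias, routes, gamma)

print("per-expert load:", load)
print("per-expert bias:", [round(b, 2) for b in bias])
```

The favoured experts end up with lower biases than the others, which spreads tokens across all eight experts without adding a balancing term to the training loss. That is the property the "auxiliary-loss-free" label refers to.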

Benchmark              Score
DROP                   91.6%
CLUEWSC                90.9%
MATH-500               90.2%
MMLU-Redux             89.1%
MMLU                   88.5%
C-Eval                 86.5%
IFEval                 86.1%
HumanEval-Mul          82.6%
Aider-Polyglot Edit    79.7%
MMLU-Pro               75.9%
FRAMES                 73.3%
CSimpleQA              64.8%
GPQA                   59.1%
Aider-Polyglot         49.6%
LongBench v2           48.7%
CNMO 2024              43.2%
SWE-Bench Verified     42.0%
AIME 2024              39.2%
LiveCodeBench          37.6%
SimpleQA               24.9%