DeepSeek-V3 0324

A Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated per token. It features Multi-head Latent Attention (MLA), auxiliary-loss-free load balancing, and multi-token prediction training.
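
The gap between total and activated parameters comes from sparse routing: each token is sent to only a few experts, so most expert weights sit idle for that token. The toy sketch below illustrates the idea with generic top-k softmax gating. It is not the model's actual router; the expert count, the scores, and the names `top_k_route` and `moe_forward` are hypothetical and chosen only for illustration.

```python
import math

def top_k_route(scores, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    # Softmax over only the selected scores so the weights sum to 1.
    m = max(scores[i] for i in top)
    exps = [math.exp(scores[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

def moe_forward(x, experts, router_scores, k):
    """Combine the outputs of the k selected experts; the others are never called."""
    return sum(w * experts[i](x) for i, w in top_k_route(router_scores, k))

# Hypothetical toy setup: 8 experts (each just scales its input), 2 active per token.
experts = [lambda x, s=s: s * x for s in range(1, 9)]
router_scores = [0.1, 2.0, -1.0, 0.5, 3.0, 0.0, -0.5, 1.0]

routes = top_k_route(router_scores, k=2)
output = moe_forward(1.0, experts, router_scores, k=2)

# Share of parameters active per token, from the figures quoted above.
active_fraction = 37 / 671
print(routes, output, f"{active_fraction:.1%}")
```

With the quoted figures, roughly 5.5% of the parameters take part in processing any single token.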

Benchmark results:

MATH-500: 94.0%
MMLU-Pro: 81.2%
GPQA: 68.4%
AIME 2024: 59.4%
LiveCodeBench: 49.2%