DeepSeek-V3 0324

DeepSeek-V3 0324 is a Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated for each token. Its architecture uses Multi-head Latent Attention (MLA) and auxiliary-loss-free load balancing, and it was trained with a multi-token prediction objective. The model was pre-trained on 14.8T tokens and performs strongly on reasoning, math, and code tasks.
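Only about 5.5% of the weights (37B of 671B) take part in each token's forward pass, because a router sends each token to a small subset of experts. The toy sketch below shows top-k routing with auxiliary-loss-free balancing, following the general scheme described for DeepSeek-V3 as we understand it. A per-expert bias is added to the affinity scores only when choosing experts, the gate weights come from the unbiased scores, and the bias is nudged down for overloaded experts and up for underloaded ones. This is a pure-Python illustration, not DeepSeek's implementation. The function names, the expert count, k, and the step size `gamma` are all hypothetical.

```python
import math
import random


def sigmoid(x):
    """Logistic function, used here to turn raw router logits into affinity scores."""
    return 1.0 / (1.0 + math.exp(-x))


def route_tokens(affinities, bias, k):
    """Pick k experts per token by biased score; gate with the unbiased scores.

    affinities: one list of per-expert affinity scores per token.
    bias: per-expert offsets that affect expert SELECTION only, never the gate weights.
    Returns one (expert_ids, gate_weights) pair per token.
    """
    routes = []
    for scores in affinities:
        # Rank experts by affinity + bias and keep the top k.
        chosen = sorted(range(len(scores)),
                        key=lambda e: scores[e] + bias[e], reverse=True)[:k]
        # Gate weights use the original affinities, normalized over the chosen experts.
        total = sum(scores[e] for e in chosen)
        routes.append((chosen, [scores[e] / total for e in chosen]))
    return routes


def expert_load(routes, num_experts):
    """Count how many tokens were sent to each expert."""
    load = [0] * num_experts
    for chosen, _ in routes:
        for e in chosen:
            load[e] += 1
    return load


def update_bias(bias, routes, gamma):
    """Lower the bias of overloaded experts by gamma and raise underloaded ones by gamma."""
    load = expert_load(routes, len(bias))
    mean_load = sum(load) / len(load)
    new_bias = []
    for b, n in zip(bias, load):
        if n > mean_load:
            new_bias.append(b - gamma)  # too busy: make it less attractive
        elif n < mean_load:
            new_bias.append(b + gamma)  # idle: make it more attractive
        else:
            new_bias.append(b)          # exactly at the mean: leave unchanged
    return new_bias


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical toy sizes, far smaller than the real model.
    num_experts, k, gamma = 8, 2, 0.01
    bias = [0.0] * num_experts
    for step in range(50):
        # Skewed affinities so that low-index experts are favored before balancing.
        batch = [[sigmoid(random.gauss(1.0 - 0.2 * e, 1.0)) for e in range(num_experts)]
                 for _ in range(64)]
        routes = route_tokens(batch, bias, k)
        bias = update_bias(bias, routes, gamma)
    print("per-expert load:", expert_load(routes, num_experts))
    print("bias:", [round(b, 3) for b in bias])
```

Because the bias only changes which experts are selected, balancing does not distort the gate weights that scale each expert's output. In this skewed toy setup the bias should tend to drift negative for the favored low-index experts, though the exact values depend on the random seed.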

Benchmark results

Benchmark      Score   Tags           Source
AIME 2024      59.4%   self-reported  llm-stats
GPQA           68.4%   self-reported  llm-stats
LiveCodeBench  49.2%   self-reported  llm-stats
MATH-500       94.0%   self-reported  llm-stats
MMLU-Pro       81.2%   self-reported  llm-stats