DeepSeek-V4-Flash-Max

DeepSeek-V4-Flash-Max is the maximum-reasoning-effort mode of DeepSeek-V4-Flash, a 284B-parameter Mixture-of-Experts (MoE) model with 13B activated parameters and a 1M-token context window. It shares the V4 series' hybrid attention architecture (Compressed Sparse Attention combined with Heavily Compressed Attention), Manifold-Constrained Hyper-Connections, and the Muon optimizer. Given a larger thinking budget, V4-Flash-Max delivers reasoning performance comparable to V4-Pro while operating at a fraction of the parameter scale. The model is pre-trained on more than 32T tokens and post-trained with a two-stage paradigm: domain-specific expert cultivation followed by on-policy distillation.
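To put the "fraction of the parameter scale" claim in numbers, here is a minimal Python sketch that computes the share of parameters activated per token from the two figures quoted above (284B total, 13B activated). The estimate of roughly 2 FLOPs per active parameter per token is a common rule of thumb for matmul-dominated forward passes, not a figure published for this model, and it ignores attention cost, which can matter at long context lengths. All function and constant names are illustrative.

```python
# Figures quoted in the model description above.
TOTAL_PARAMS = 284e9   # total parameters in the MoE model
ACTIVE_PARAMS = 13e9   # parameters activated per token


def active_fraction(active: float, total: float) -> float:
    """Return the share of parameters used for each token."""
    if total <= 0 or not 0 <= active <= total:
        raise ValueError("expected total > 0 and 0 <= active <= total")
    return active / total


def approx_forward_flops_per_token(active: float) -> float:
    """Rough estimate of forward-pass FLOPs per token.

    Assumption: about 2 FLOPs per active parameter (one multiply and one add),
    with attention cost excluded.
    """
    return 2.0 * active


if __name__ == "__main__":
    frac = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
    flops = approx_forward_flops_per_token(ACTIVE_PARAMS)
    print(f"Active share per token: {frac:.1%}")
    print(f"Approx. forward FLOPs per token: {flops:.2e}")
```

With the quoted figures, the active share comes out to about 4.6% of the total parameters per token.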

Benchmark results

Benchmark               Score    Tags           Source
BrowseComp              73.2%    self-reported  llm-stats
CodeForces              100.0%   self-reported  llm-stats
CorpusQA 1M             60.5%    self-reported  llm-stats
CSimpleQA               78.9%    self-reported  llm-stats
GDPval-AA               1,395    self-reported  llm-stats
GPQA                    88.1%    self-reported  llm-stats
HMMT Feb 26             94.8%    self-reported  llm-stats
Humanity's Last Exam    45.1%    self-reported  llm-stats
IMO-AnswerBench         88.4%    self-reported  llm-stats
LiveCodeBench           91.6%    self-reported  llm-stats
MathArena Apex          85.7%    self-reported  llm-stats
MCP Atlas               69.0%    self-reported  llm-stats
MMLU-Pro                86.2%    self-reported  llm-stats
MRCR 1M                 78.7%    self-reported  llm-stats
SimpleQA                34.1%    self-reported  llm-stats
SWE-bench Multilingual  73.3%    self-reported  llm-stats
SWE-Bench Pro           52.6%    self-reported  llm-stats
SWE-Bench Verified      79.0%    self-reported  llm-stats
Terminal-Bench 2.0      56.9%    self-reported  llm-stats
Toolathlon              47.8%    self-reported  llm-stats
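
The Score column mixes scales: most entries are percentages, while GDPval-AA is reported as 1,395 with no percent sign, so averaging the column directly would be misleading. The following is a minimal Python sketch, using a subset of the rows above, that parses the scores and keeps the two scales apart. The `Result` class and `parse_score` helper are hypothetical names introduced for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    benchmark: str
    score: float
    is_percent: bool


def parse_score(benchmark: str, raw: str) -> Result:
    """Parse a score string such as '73.2%' or '1,395' into a Result."""
    text = raw.strip()
    is_percent = text.endswith("%")
    # Drop the percent sign and thousands separators before converting.
    number = float(text.rstrip("%").replace(",", ""))
    return Result(benchmark, number, is_percent)


# A subset of rows copied from the benchmark table above.
ROWS = [
    ("BrowseComp", "73.2%"),
    ("GDPval-AA", "1,395"),
    ("GPQA", "88.1%"),
    ("SimpleQA", "34.1%"),
    ("SWE-Bench Verified", "79.0%"),
]

results = [parse_score(name, raw) for name, raw in ROWS]
percent_results = [r for r in results if r.is_percent]
other_results = [r for r in results if not r.is_percent]

if __name__ == "__main__":
    # List percentage scores from highest to lowest.
    for r in sorted(percent_results, key=lambda r: r.score, reverse=True):
        print(f"{r.benchmark:<20} {r.score:5.1f}%")
    # Report non-percentage scores separately.
    for r in other_results:
        print(f"{r.benchmark:<20} {r.score:,.0f} (not a percentage)")
```

Keeping an explicit `is_percent` flag on each record makes it harder to combine scores from different scales by accident.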