DeepSeek-V3.1

DeepSeek-V3.1 is a hybrid model that supports both thinking and non-thinking modes, selected through different chat templates. It is built on DeepSeek-V3.1-Base, which went through a two-phase long-context extension (32K phase: 630B tokens; 128K phase: 209B tokens), and has 671B total parameters, of which 37B are activated. Key improvements are better tool calling from post-training optimization, higher thinking efficiency (answer quality comparable to DeepSeek-R1-0528 with faster responses), and the UE8M0 FP8 scale data format for model weights and activations. The model performs well on both reasoning tasks (thinking mode) and practical applications (non-thinking mode), and is particularly strong in code agent tasks, math competitions, and search-based problem solving.
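To make the "UE8M0 FP8 scale data format" more concrete, here is a minimal sketch of how a power-of-two scale can be stored in a single byte. It assumes UE8M0 follows the E8M0 scale encoding from the OCP Microscaling (MX) specification: an unsigned 8-bit exponent with no mantissa, a bias of 127, and the code 0xFF reserved for NaN. The function names `encode_ue8m0` and `decode_ue8m0` are hypothetical, and rounding the scale up to the next power of two is a choice made for this sketch. How DeepSeek-V3.1 actually computes and applies its scales is not described here.

```python
import math

E8M0_BIAS = 127   # exponent bias of the E8M0 scale format (OCP MX spec)
E8M0_NAN = 0xFF   # the all-ones code is reserved for NaN

def encode_ue8m0(scale: float) -> int:
    """Round a positive scale UP to a power of two and return its 8-bit code.

    Rounding up is an assumption of this sketch: it keeps values divided by
    the scale from exceeding the range they had with the exact scale.
    """
    if not (scale > 0.0) or math.isinf(scale):
        raise ValueError("scale must be a positive finite number")
    # frexp gives scale = mantissa * 2**exp with 0.5 <= mantissa < 1.
    mantissa, exp = math.frexp(scale)
    # An exact power of two has mantissa 0.5; otherwise round up to 2**exp.
    exponent = exp - 1 if mantissa == 0.5 else exp
    # Codes 0..254 cover exponents -127..127; clamp anything outside.
    exponent = max(-E8M0_BIAS, min(E8M0_BIAS, exponent))
    return exponent + E8M0_BIAS

def decode_ue8m0(code: int) -> float:
    """Turn an 8-bit UE8M0 code back into the power-of-two scale it encodes."""
    if not 0 <= code <= 0xFF:
        raise ValueError("code must fit in one unsigned byte")
    if code == E8M0_NAN:
        return math.nan
    return 2.0 ** (code - E8M0_BIAS)

if __name__ == "__main__":
    for s in (0.5, 1.0, 3.0):
        code = encode_ue8m0(s)
        print(f"scale={s} -> code={code} -> decoded={decode_ue8m0(code)}")
```

Because the format has no mantissa bits, every representable scale is an exact power of two, so applying or removing the scale only shifts the exponent of the scaled value.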

Benchmark results

Benchmark	Score	Tags
Aider-Polyglot	68.4%	self-reported llm-stats
AIME 2024	66.3%	self-reported llm-stats
AIME 2025	49.8%	self-reported llm-stats
BrowseComp	30.0%	self-reported llm-stats
BrowseComp-zh	49.2%	self-reported llm-stats
CodeForces	69.7%	self-reported llm-stats
GPQA	74.9%	self-reported llm-stats
HMMT 2025	33.5%	self-reported llm-stats
Humanity's Last Exam	15.9%	self-reported llm-stats
LiveCodeBench	56.4%	self-reported llm-stats
MMLU-Pro	83.7%	self-reported llm-stats
MMLU-Redux	91.8%	self-reported llm-stats
SimpleQA	93.4%	self-reported llm-stats
SWE-bench Multilingual	54.5%	self-reported llm-stats
SWE-Bench Verified	66.0%	self-reported llm-stats
Terminal-Bench	31.3%	self-reported llm-stats