DeepSeek-R1-Zero

DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrates remarkable reasoning performance. Through RL, numerous powerful and interesting reasoning behaviors emerged naturally in DeepSeek-R1-Zero. However, the model suffers from endless repetition, poor readability, and language mixing. To address these issues and further enhance reasoning performance, we introduce DeepSeek-R1, which incorporates cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks.

Benchmark results

Benchmark      Score   Tags           Source
AIME 2024      86.7%   self-reported  llm-stats
GPQA           73.3%   self-reported  llm-stats
LiveCodeBench  50.0%   self-reported  llm-stats
MATH-500       95.9%   self-reported  llm-stats