DeepSeek-R1-0528
DeepSeek-R1-0528 is the May 28, 2025 release of DeepSeek's reasoning model. It reasons through a problem before giving its final answer (thinking mode), and it serves as a comparison baseline for newer models such as DeepSeek-V3.1. It is strongest on complex reasoning, mathematical problem-solving, and code generation (for example, 91.4% on AIME 2024 and 73.3% on LiveCodeBench). It is much weaker on agentic tasks such as Terminal-Bench (5.7%) and BrowseComp (8.9%).
Benchmark results
| Benchmark | Score | Tags |
|---|---|---|
| Aider-Polyglot | 71.6% | self-reported, llm-stats |
| AIME 2024 | 91.4% | self-reported, llm-stats |
| AIME 2025 | 87.5% | self-reported, llm-stats |
| BrowseComp | 8.9% | self-reported, llm-stats |
| BrowseComp-zh | 35.7% | self-reported, llm-stats |
| Codeforces | 64.3% | self-reported, llm-stats |
| GPQA | 81.0% | self-reported, llm-stats |
| HMMT 2025 | 79.4% | self-reported, llm-stats |
| Humanity's Last Exam | 17.7% | self-reported, llm-stats |
| LiveCodeBench | 73.3% | self-reported, llm-stats |
| MMLU-Pro | 85.0% | self-reported, llm-stats |
| MMLU-Redux | 93.4% | self-reported, llm-stats |
| SimpleQA | 92.3% | self-reported, llm-stats |
| SWE-bench Multilingual | 30.5% | self-reported, llm-stats |
| SWE-bench Verified | 44.6% | self-reported, llm-stats |
| Terminal-Bench | 5.7% | self-reported, llm-stats |