DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Abstract
We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrates remarkable reasoning capabilities. Through RL, DeepSeek-R1-Zero naturally emerges with numerous powerful and intriguing reasoning behaviors. However, it encounters challenges such as poor readability and language mixing. To address these issues and further enhance reasoning performance, we introduce DeepSeek-R1, which incorporates multi-stage training and cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1-1217 on reasoning tasks. To support the research community, we open-source DeepSeek-R1-Zero, DeepSeek-R1, and six dense models (1.5B, 7B, 8B, 14B, 32B, 70B) distilled from DeepSeek-R1 based on Qwen and Llama.
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| mathematical-reasoning-on-aime24 | DeepSeek-R1 | Acc: 79.8 |
| multi-task-language-understanding-on-mmlu | DeepSeek-R1 (671B) | Average (%): 87.5 |
| question-answering-on-newsqa | DeepSeek-R1 | EM: 80.57, F1: 86.13 |