a month ago

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

DeepSeek-AI, Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Ruoyu Zhang, Runxin Xu, Qihao Zhu, Shirong Ma, Peiyi Wang, and 190 more

Abstract

We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrates remarkable reasoning capabilities. Through RL, DeepSeek-R1-Zero naturally emerges with numerous powerful and intriguing reasoning behaviors. However, it encounters challenges such as poor readability, and language mixing. To address these issues and further enhance reasoning performance, we introduce DeepSeek-R1, which incorporates multi-stage training and cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1-1217 on reasoning tasks. To support the research community, we open-source DeepSeek-R1-Zero, DeepSeek-R1, and six dense models (1.5B, 7B, 8B, 14B, 32B, 70B) distilled from DeepSeek-R1 based on Qwen and Llama.

Code Repositories

deepseek-ai/deepseek-r1	Official	Mentioned in GitHub
turningpoint-ai/visualthinker-r1-zero	pytorch	Mentioned in GitHub
vlm-rl/ocean-r1	pytorch	Mentioned in GitHub
zhaoolee/garss	pytorch	Mentioned in GitHub

Benchmarks

Benchmark	Methodology	Metrics
mathematical-reasoning-on-aime24	DeepSeek-r1	Acc: 79.8
multi-task-language-understanding-on-mmlu	ds-r1(671b)	Average (%): 87.5
question-answering-on-newsqa	deepseek-r1	EM: 80.57 F1: 86.13

