a month ago

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Abstract

We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrates remarkable reasoning capabilities. Through RL, DeepSeek-R1-Zero naturally emerges with numerous powerful and intriguing reasoning behaviors. However, it encounters challenges such as poor readability and language mixing. To address these issues and further enhance reasoning performance, we introduce DeepSeek-R1, which incorporates multi-stage training and cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1-1217 on reasoning tasks. To support the research community, we open-source DeepSeek-R1-Zero, DeepSeek-R1, and six dense models (1.5B, 7B, 8B, 14B, 32B, 70B) distilled from DeepSeek-R1 based on Qwen and Llama.

Code Repositories

deepseek-ai/deepseek-r1 (Official, mentioned in GitHub)
turningpoint-ai/visualthinker-r1-zero (PyTorch, mentioned in GitHub)
vlm-rl/ocean-r1 (PyTorch, mentioned in GitHub)
zhaoolee/garss (PyTorch, mentioned in GitHub)

Benchmarks

Benchmark                                   Methodology    Metrics
mathematical-reasoning-on-aime24            DeepSeek-r1    Acc: 79.8
multi-task-language-understanding-on-mmlu   ds-r1(671b)    Average (%): 87.5
question-answering-on-newsqa                deepseek-r1    EM: 80.57, F1: 86.13
