DeepSeek-AI, Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Ruoyu Zhang, Runxin Xu, Qihao Zhu, Shirong Ma, Peiyi Wang, Xiao Bi, Xiaokang Zhang, Xingkai Yu, Yu Wu, Z. F. Wu, Zhibin Gou, Zhihong Shao, Zhuoshu Li, Ziyi Gao, Aixin Liu, Bing Xue, Bingxuan Wang, Bochao Wu, Bei Feng, Chengda Lu, Chenggang Zhao, Chengqi Deng, Chenyu Zhang, Chong Ruan, Damai Dai, Deli Chen, Dongjie Ji, Erhang Li, Fangyun Lin, Fucong Dai, Fuli Luo, Guangbo Hao, Guanting Chen, Guowei Li, H. Zhang, Han Bao, Hanwei Xu, Haocheng Wang, Honghui Ding, Huajian Xin, Huazuo Gao, Hui Qu, Hui Li, Jianzhong Guo, Jiashi Li, Jiawei Wang, Jingchang Chen, Jingyang Yuan, Junjie Qiu, Junlong Li, J. L. Cai, Jiaqi Ni, Jian Liang, Jin Chen, Kai Dong, Kai Hu, Kaige Gao, Kang Guan, Kexin Huang, Kuai Yu, Lean Wang, Lecong Zhang, Liang Zhao, Litong Wang, Liyue Zhang, Lei Xu, Leyi Xia, Mingchuan Zhang, Minghua Zhang, Minghui Tang, Meng Li, Miaojun Wang, Mingming Li, Ning Tian, Panpan Huang, Peng Zhang, Qiancheng Wang, Qinyu Chen, Qiushi Du, Ruiqi Ge, Ruisong Zhang, Ruizhe Pan, Runji Wang, R. J. Chen, R. L. Jin, Ruyi Chen, Shanghao Lu, Shangyan Zhou, Shanhuang Chen, Shengfeng Ye, Shiyu Wang, Shuiping Yu, Shunfeng Zhou, Shuting Pan, S. S. Li, Shuang Zhou, Shaoqing Wu, Shengfeng Ye, Tao Yun, Tian Pei, Tianyu Sun, T. Wang, Wangding Zeng, Wanjia Zhao, Wen Liu, Wenfeng Liang, Wenjun Gao, Wenqin Yu, Wentao Zhang, W. L. Xiao, Wei An, Xiaodong Liu, Xiaohan Wang, Xiaokang Chen, Xiaotao Nie, Xin Cheng, Xin Liu, Xin Xie, Xingchao Liu, Xinyu Yang, Xinyuan Li, Xuecheng Su, Xuheng Lin, X. Q. Li, Xiangyue Jin, Xiaojin Shen, Xiaosha Chen, Xiaowen Sun, Xiaoxiang Wang, Xinnan Song, Xinyi Zhou, Xianzu Wang, Xinxia Shan, Y. K. Li, Y. Q. Wang, Y. X. Wei, Yang Zhang, Yanhong Xu, Yao Li, Yao Zhao, Yaofeng Sun, Yaohui Wang, Yi Yu, Yichao Zhang, Yifan Shi, Yiliang Xiong, Ying He, Yishi Piao, Yisong Wang, Yixuan Tan, Yiyang Ma, Yiyuan Liu, Yongqiang Guo, Yuan Ou, Yuduan Wang, Yue Gong, Yuheng Zou, Yujia He, Yunfan Xiong, Yuxiang Luo, Yuxiang You, Yuxuan Liu, Yuyang Zhou, Y. X. Zhu, Yanhong Xu, Yanping Huang, Yaohui Li, Yi Zheng, Yuchen Zhu, Yunxian Ma, Ying Tang, Yukun Zha, Yuting Yan, Z. Z. Ren, Zehui Ren, Zhangli Sha, Zhe Fu, Zhean Xu, Zhenda Xie, Zhengyan Zhang, Zhewen Hao, Zhicheng Ma, Zhigang Yan, Zhiyu Wu, Zihui Gu, Zijia Zhu, Zijun Liu, Zilin Li, Ziwei Xie, Ziyang Song, Zizheng Pan, Zhen Huang, Zhipeng Xu, Zhongyu Zhang, Zhen Zhang

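Abstract
We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero is trained with large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, and it demonstrates remarkable reasoning capabilities. During RL training, numerous powerful and intriguing reasoning behaviors emerge naturally in DeepSeek-R1-Zero. However, the model also faces challenges such as poor readability and language mixing. To address these issues and further improve reasoning performance, we introduce DeepSeek-R1, which incorporates multi-stage training and cold-start data before RL. On reasoning tasks, DeepSeek-R1 achieves performance comparable to OpenAI-o1-1217. To support the research community, we open-source DeepSeek-R1-Zero, DeepSeek-R1, and six dense models (1.5B, 7B, 8B, 14B, 32B, 70B) built on Qwen and Llama, all distilled from DeepSeek-R1.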
Code Repositories
deepseek-ai/deepseek-r1
Official
Mentioned in GitHub
turningpoint-ai/visualthinker-r1-zero
pytorch
Mentioned in GitHub
vlm-rl/ocean-r1
pytorch
Mentioned in GitHub
zhaoolee/garss
pytorch
Mentioned in GitHub
Benchmarks
| Benchmark | Method | Metric |
|---|---|---|
| mathematical-reasoning-on-aime24 | DeepSeek-r1 | Acc: 79.8 |
| multi-task-language-understanding-on-mmlu | ds-r1(671b) | Average (%): 87.5 |
| question-answering-on-newsqa | deepseek-r1 | EM: 80.57 F1: 86.13 |