DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search

Abstract
We introduce DeepSeek-Prover-V1.5, an open-source language model designed for theorem proving in Lean 4, which enhances DeepSeek-Prover-V1 by optimizing both training and inference processes. Pre-trained on DeepSeekMath-Base with specialization in formal mathematical languages, the model undergoes supervised fine-tuning using an enhanced formal theorem proving dataset derived from DeepSeek-Prover-V1. Further refinement is achieved through reinforcement learning from proof assistant feedback (RLPAF). Beyond the single-pass whole-proof generation approach of DeepSeek-Prover-V1, we propose RMaxTS, a variant of Monte-Carlo tree search that employs an intrinsic-reward-driven exploration strategy to generate diverse proof paths. DeepSeek-Prover-V1.5 demonstrates significant improvements over DeepSeek-Prover-V1, achieving new state-of-the-art results on the test set of the high school level miniF2F benchmark (63.5%) and the undergraduate level ProofNet benchmark (25.3%).
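The abstract describes RMaxTS only at a high level. As a rough illustration of what "intrinsic-reward-driven exploration" can mean, the sketch below grafts an RMax-style novelty bonus (reward 1 the first time a state is discovered, 0 afterwards) onto a plain UCT loop over a toy state space. The helper names (`expand_candidates`, `is_proof_complete`) and the toy successor/terminal functions are placeholders for illustration only, not the authors' implementation, which operates on Lean 4 proof states with verifier feedback.

```python
# Minimal sketch of RMax-style intrinsic rewards in Monte-Carlo tree search.
# Assumption: intrinsic reward = 1 for reaching a previously unseen state,
# 0 otherwise, combined with a sparse extrinsic "proof found" reward.
# The state space here is a toy stand-in, not Lean 4 proof states.
import math
import random


class Node:
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value = 0.0  # running sum of (extrinsic + intrinsic) rewards


def ucb_score(child, parent_visits, c=1.4):
    """Standard UCT score: mean backed-up reward plus exploration term."""
    if child.visits == 0:
        return float("inf")
    return child.value / child.visits + c * math.sqrt(
        math.log(parent_visits) / child.visits
    )


def expand_candidates(state):
    """Toy successor function standing in for next-proof-step generation.
    States are sorted tuples so different paths can reach the same state,
    which is what makes the novelty bonus informative."""
    return [tuple(sorted(state + (i,))) for i in range(3)]


def is_proof_complete(state):
    """Toy terminal check standing in for Lean accepting a complete proof."""
    return len(state) >= 4 and sum(state) % 5 == 0


def rmax_tree_search(iterations=200):
    root = Node(state=())
    seen_states = {root.state}  # discovered states; new ones earn a bonus

    for _ in range(iterations):
        # 1. Selection: walk down by UCT until reaching a leaf.
        node = root
        while node.children:
            node = max(node.children, key=lambda ch: ucb_score(ch, node.visits))

        # 2. Expansion: add successors of the leaf.
        for next_state in expand_candidates(node.state):
            node.children.append(Node(next_state, parent=node))

        # 3. Reward: pick one child; intrinsic bonus for a novel state,
        #    sparse extrinsic reward when the (toy) "proof" completes.
        child = random.choice(node.children)
        intrinsic = 1.0 if child.state not in seen_states else 0.0
        seen_states.add(child.state)
        extrinsic = 1.0 if is_proof_complete(child.state) else 0.0
        reward = extrinsic + intrinsic

        # 4. Backpropagation: propagate the combined reward to the root.
        while child is not None:
            child.visits += 1
            child.value += reward
            child = child.parent

    return root


if __name__ == "__main__":
    root = rmax_tree_search()
    print("root visits:", root.visits)
```

The novelty bonus keeps the search expanding unvisited branches even when extrinsic rewards are sparse, which is the role the abstract attributes to intrinsic rewards in generating diverse proof paths.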
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| automated-theorem-proving-on-minif2f-test | DeepSeek-Prover-V1.5 | ITP: Lean; Pass@32: 50.0%; Pass@64: 50.7%; cumulative: 63.5% |
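A brief gloss on the metric names above (a common reading of Pass@$K$ for theorem proving benchmarks, not the paper's formal evaluation protocol): under a budget of $K$ sampled whole-proof attempts per theorem, a theorem counts as solved if the Lean verifier accepts at least one attempt, and the score is the fraction of solved theorems; "cumulative" refers to the highest rate reported across the paper's generation settings. Here $\mathcal{T}$ and the indicator notation are my own.

$$
\text{Pass@}K \;=\; \frac{1}{|\mathcal{T}|}\sum_{t \in \mathcal{T}} \mathbf{1}\big[\exists\, i \le K \text{ such that Lean accepts the } i\text{-th sampled proof of } t\big]
$$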