DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search

🔊

💬 Ask

DeepSeek-Prover-V1.5 is an open-source language model (LM) developed by DeepSeek-AI for theorem proving in Lean 4. It significantly improves upon its predecessor, DeepSeek-Prover-V1, by incorporating several key advancements, including an enhanced training regime and an innovative Monte-Carlo tree search (MCTS) algorithm. These improvements result in a new state-of-the-art performance on both the miniF2F (high school level) and ProofNet (undergraduate level) benchmarks, achieving pass rates of 63.5% and 25.3%, respectively.

The new model tackles the challenges of formal theorem proving, where LMs must not only comprehend formal mathematical languages but also align their reasoning with rigorous specifications. Prior work on formal theorem provers using LMs typically focuses on two strategies:

Proof-step generation: This approach generates individual proof steps, verifying each step using a formal verifier and relying on search techniques to construct a complete proof.
Whole-proof generation: This strategy aims to generate an entire proof code directly, requiring less communication but presenting challenges in navigating long-horizon sequences and handling potential compounding errors.

DeepSeek-Prover-V1.5 takes a unified approach by combining the strengths of both strategies. Its key innovations include:

Truncate-and-resume mechanism: The model initially generates a complete proof code, and if an error is detected, it truncates the code at the first error message and uses the successfully generated portion as a prompt to generate the next proof segment. This allows the model to incorporate intermediate tactic states into its proof generation, mitigating the issue of compounding errors.
Monte-Carlo Tree Search (MCTS) with RMaxTS: The model employs MCTS with a novel reward-free exploration strategy called RMaxTS. This strategy assigns intrinsic rewards to the model whenever it expands the search tree, encouraging exploration and the discovery of diverse solution candidates.

In addition to these key algorithmic innovations, DeepSeek-Prover-V1.5 also incorporates several enhancements in its training process:

Pre-training: The model is pre-trained on a large corpus of formal mathematical languages, including Lean, Isabelle, and Metamath, to enhance its reasoning abilities in formal mathematics.
Supervised fine-tuning: The model is fine-tuned on an enhanced dataset of Lean 4 code completion, where intermediate tactic state information is incorporated to support the truncate-and-resume mechanism.
Reinforcement learning from proof assistant feedback (RLPAF): The model is further refined using reinforcement learning, where verification feedback from the Lean 4 prover serves as the reward signal. This aligns the model’s outputs with the formal specifications of the verification system.

DeepSeek-Prover-V1.5 is a significant advancement in formal theorem proving, pushing the boundaries of what is possible with LMs. Its unified approach to proof generation, combined with its innovative MCTS algorithm and enhanced training regime, makes it a powerful tool for tackling complex formal mathematics problems. It is expected to have a significant impact on future research in the field.

AI Papers Reader

Personalized digests of latest AI research

DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search

Chat about this paper