AI Papers Reader

Personalized digests of latest AI research

AI Learns to Prove Math Theorems by Breaking Down Complex Problems

Tencent AI Lab researchers have developed a novel AI framework that significantly advances automated theorem proving by decoupling high-level reasoning from low-level proof generation. This approach has successfully solved five challenging problems from the International Mathematical Olympiad (IMO) that had previously stumped open-source provers.

For decades, mathematicians and computer scientists have sought to create AI systems capable of proving complex mathematical theorems. While recent advancements in Large Language Models (LLMs) have shown promise, a significant gap has persisted: LLMs excel at generating intuitive, informal mathematical explanations, but struggle to produce rigorous, machine-verifiable formal proofs. In contrast, traditional Automated Theorem Proving (ATP) systems excel at formal verification but lack the sophisticated reasoning capabilities of LLMs.

This new research, published by a team at Tencent AI Lab, addresses this “reasoning-proving gap” by introducing a “decoupled” framework. The core idea is to separate the process into two distinct, specialized AI models: a powerful “Reasoner” and an efficient “Prover.”

The Reasoner: This LLM-based model is tasked with understanding the core mathematical problem and devising a high-level strategy. It does this by generating a series of intermediate, formally stated mathematical steps, known as “lemmas” or “subgoals.” For instance, when faced with a complex equation, the Reasoner might first deduce key identities, then identify an underlying structure (like an additive property), and subsequently propose a general form for the solution. This mirrors how human mathematicians break down a difficult problem into smaller, manageable parts.

The Prover: This specialized model takes the lemmas generated by the Reasoner and attempts to verify each one formally in a proof assistant such as Lean 4. Any lemma that cannot be proven is discarded. The Prover then uses the verified lemmas as building blocks to construct the final formal proof of the original theorem.
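The Reasoner–Prover loop can be sketched in a few lines of Python. This is a toy illustration of the decoupled control flow only; the function names (`generate_lemmas`, `try_prove`, `prove_theorem`) are hypothetical stand-ins, not the paper's actual API, and the bodies are stubs in place of the real LLM and proof-assistant calls.

```python
# Minimal sketch of the decoupled Reasoner-Prover loop described above.
# All names are hypothetical stand-ins; bodies are stubs, not the paper's code.

def generate_lemmas(theorem: str) -> list[str]:
    """Reasoner: propose formally stated subgoals for the theorem (stubbed)."""
    return [f"lemma_{i} for {theorem}" for i in range(3)]

def try_prove(lemma: str) -> bool:
    """Prover: attempt to verify one lemma in a proof assistant (stubbed)."""
    return "lemma_1" not in lemma  # pretend lemma_1 fails verification

def prove_theorem(theorem: str) -> list[str]:
    """Keep only verified lemmas; unproven ones are simply discarded."""
    candidates = generate_lemmas(theorem)
    verified = [lem for lem in candidates if try_prove(lem)]
    # The verified lemmas become building blocks for the final formal proof.
    return verified

print(prove_theorem("imo_2019_p1"))  # two of the three stub lemmas survive
```

The key design point is the one the quote below emphasizes: reasoning (lemma generation) and proving (verification) live in separate components, so a failed proof attempt discards only that lemma rather than derailing the overall strategy.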

“We argue that by tightly coupling reasoning and proving within a single model, current state-of-the-art provers inadvertently suppress deep reasoning in favor of shallow, tactic-based strategies,” the researchers explain. “Our decoupled framework liberates the model’s full reasoning potential and bypasses the pitfalls of end-to-end training.”

A Real-World Example: IMO 2019 Problem 1

Consider IMO 2019 Problem 1, which asks for all functions $f: \mathbb{Z} \to \mathbb{Z}$ satisfying, for all integers $a$ and $b$, the functional equation $f(2a) + 2f(b) = f(f(a+b))$.
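In a Lean 4 setting like the one the Prover targets, the statement might be formalized roughly as follows. This is an illustrative sketch, not the paper's actual formalization; the theorem name `imo2019_p1` is hypothetical, and `sorry` marks the proof obligation left open.

```lean
-- Hypothetical sketch of the statement; not taken from the paper's code.
theorem imo2019_p1 (f : Int → Int)
    (h : ∀ a b : Int, f (2 * a) + 2 * f b = f (f (a + b))) :
    (∀ x, f x = 0) ∨ ∃ c : Int, ∀ x, f x = 2 * x + c := by
  sorry
```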

The Reasoner in this new framework would break down the problem as follows:

  1. Identify properties: It might deduce that $f(f(x)) = 2f(x) + f(0)$ and $f(2x) = 2f(x) - f(0)$ are key identities.
  2. Uncover structure: Combining these, it deduces an additive property: $f(x+y) = f(x) + f(y) - f(0)$.
  3. Characterize solution: From this, it infers that the function must be of the form $f(x) = cx + f(0)$.
  4. Constrain parameters: Plugging this form back into the original equation narrows the slope $c$ down to either 0 or 2.
  5. Verify solutions: Finally, it confirms that the only solutions are $f(x) = 0$ (in the slope-0 case, the constant is forced to vanish) and $f(x) = 2x + d$ for an arbitrary integer constant $d$.
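The conclusion of the steps above can be spot-checked numerically. The sketch below (illustrative only, not part of the paper) confirms that both claimed solution families satisfy $f(2a) + 2f(b) = f(f(a+b))$ on a small grid of integers, while a non-solution such as $f(x) = x$ fails:

```python
# Spot-check the claimed solutions of f(2a) + 2 f(b) = f(f(a + b))
# over a small grid of integers. Illustrative only.

def satisfies(f, lo=-5, hi=5):
    """True if f satisfies the functional equation for all a, b in [lo, hi]."""
    return all(f(2 * a) + 2 * f(b) == f(f(a + b))
               for a in range(lo, hi + 1)
               for b in range(lo, hi + 1))

assert satisfies(lambda x: 0)          # f(x) = 0
assert satisfies(lambda x: 2 * x + 7)  # f(x) = 2x + d, for any integer d
assert not satisfies(lambda x: x)      # slope 1 is ruled out
```

A finite check like this is of course no substitute for a formal proof; in the framework described here, that rigor is exactly what the Prover supplies via Lean 4.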

This structured approach, the paper argues, contrasts with traditional provers that might resort to brute-force exploration of equation instances without uncovering the underlying mathematical insights.

The framework’s success on five challenging IMO problems—IMO 2000 Problem 2, IMO 2005 Problem 3, IMO 2011 Problem 3, IMO 2019 Problem 1, and IMO 2020 Problem 2—demonstrates a significant step forward in AI’s ability to tackle complex mathematical challenges. The researchers are making their dataset of generated and verified lemmas publicly available to foster further research in this area.