A New Model Uses Process Supervision to Enhance AI Reasoning
Large language models (LLMs) have shown impressive capabilities in reasoning, but their reasoning steps can be incomplete or inaccurate. Often, LLMs skip over important logical steps because they mimic “logical leaps” common in their pre-training data. To address this issue, researchers at Johns Hopkins University have developed a new model called RATIONALYST, which aims to improve AI reasoning by supervising the reasoning process through explicit guidance from extracted “implicit rationales.”
Imagine, for example, you’re training an AI model on the sentence “Harry used magic outside of the school of Hogwarts to inflate Aunt Marge… He is punished to attend a disciplinary hearing at the Ministry of Magic…”. While this passage never explicitly states the rule “When someone breaks the rule, he will be punished!”, a human reader infers it effortlessly, and it is exactly this kind of unstated information that can be used to improve AI reasoning. RATIONALYST is designed to extract and incorporate these implicit rationales, thereby enhancing the model’s ability to follow a logical chain of thought.
The paper highlights the fact that existing LLMs often struggle to surface these implicit connections. To tackle this, RATIONALYST employs a three-stage process:
- Rationale extraction: Using a pre-trained language model, RATIONALYST extracts implicit rationales from large unlabelled datasets like The Pile, a massive dataset of text and code. Irrelevant or unhelpful rationales are filtered out to keep the quality of the extracted data high.
- Rationale prediction: RATIONALYST is trained to predict these implicit rationales based on the preceding text. This helps the model understand the underlying logic connecting different parts of a text.
- Process supervision: During inference, RATIONALYST generates a rationale at each step of the reasoning chain and uses it to supervise a reasoning agent (such as another LLM), guiding it towards more accurate conclusions (a minimal sketch of the overall pipeline follows this list).
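To make the pipeline more concrete, here is a minimal Python sketch of how the extraction/filtering stage and the inference-time supervision stage could fit together. The `LanguageModel` interface, the function names, and the log-probability filtering heuristic are illustrative assumptions rather than the paper's exact implementation; the middle stage, fine-tuning RATIONALYST on the (context, rationale) pairs that survive filtering, is ordinary supervised fine-tuning and is omitted here.

```python
from typing import List, Protocol


class LanguageModel(Protocol):
    """Hypothetical minimal LM interface (assumed for illustration):
    any API that can generate text and score a continuation will do."""

    def generate(self, prompt: str) -> str: ...
    def logprob(self, continuation: str, given: str) -> float: ...


# --- Stage 1: rationale extraction and filtering --------------------------
def extract_rationale(lm: LanguageModel, prev_text: str, next_text: str) -> str:
    """Ask an LM to verbalise the unstated reasoning linking two adjacent chunks."""
    prompt = (
        f"Text so far:\n{prev_text}\n\n"
        f"Next part:\n{next_text}\n\n"
        "State, in one sentence, the implicit rationale connecting them:"
    )
    return lm.generate(prompt)


def keep_rationale(
    lm: LanguageModel, prev_text: str, rationale: str, next_text: str, margin: float = 0.0
) -> bool:
    """Filter: keep a rationale only if conditioning on it makes the actual
    continuation more likely than the preceding context alone does."""
    with_rationale = lm.logprob(next_text, given=prev_text + "\n" + rationale)
    without_rationale = lm.logprob(next_text, given=prev_text)
    return (with_rationale - without_rationale) > margin


# --- Stage 3: process supervision at inference ----------------------------
def pick_next_step(
    rationalyst: LanguageModel, context: str, candidate_steps: List[str]
) -> str:
    """Predict a rationale from the partial reasoning trace, then return the
    candidate next step that is most probable given that rationale."""
    rationale = rationalyst.generate(context)
    scored = [
        (rationalyst.logprob(step, given=context + "\n" + rationale), step)
        for step in candidate_steps
    ]
    return max(scored)[1]
```

The sketch captures the core idea: a rationale has to earn its keep twice, first during data construction, where it is kept only if it helps predict the original text, and again at inference, where it steers the reasoning agent toward the candidate step it best supports.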
The researchers tested RATIONALYST on several reasoning tasks, including mathematical, commonsense, scientific, and logical reasoning. Their results showed that RATIONALYST consistently improved reasoning accuracy, by an average of 3.9% across these diverse tasks, outperforming both much larger models like GPT-4 and similarly sized models fine-tuned on matching training sets.
This breakthrough research highlights the potential of “process supervision” for enhancing AI reasoning. By utilizing implicit rationales, RATIONALYST not only improves the accuracy of AI reasoning but also enhances the model’s ability to provide human-understandable explanations for its reasoning process. This development holds promise for building AI systems that can reason more effectively and transparently, paving the way for more robust and reliable AI applications across various domains.