Language Models Get a 'Self-Steering' Upgrade, Outperforming Even GPT-4
A new method called DISCIPL is enabling language models to “self-steer,” letting them tackle complex tasks with greater efficiency and accuracy than ever before. The key innovation: DISCIPL uses one language model to plan how another language model should approach a problem.
Think of it like this: you’re planning a road trip. Instead of just hitting the road and hoping for the best (like traditional language models), DISCIPL first uses a “Planner” language model to create a detailed itinerary – what routes to take, where to stop, and what to see. Then, a “Follower” language model executes that plan, one step at a time.
The significance? This separation of planning and execution allows for much more complex reasoning and controlled output.
Here’s a Concrete Example:
Imagine asking a language model to generate a sentence that:
- Has exactly 18 words.
- Has the 4th word as “Glasgow,” the 8th as “in,” and the 11th as “and.”
Traditional language models struggle with this kind of constrained generation. But DISCIPL can excel by:
- Planning: The Planner language model creates a program that breaks down the task: first generating three words, then forcing “Glasgow,” then generating three more words, then forcing “in,” and so on.
- Execution: The Follower language model follows this program, generating words according to the plan. If a word doesn’t fit the plan’s requirements, it’s immediately corrected.
DISCIPL can use a variety of different plans, allowing it to be very flexible. For example, a planner could generate code that masks words to make sure a sentence’s words have the correct length. This gives the Follower language model guidance that lets it succeed on previously intractable tasks.
Key Advantages of DISCIPL:
- Improved Accuracy: In tasks involving constrained generation (like writing sentences with specific word counts), DISCIPL significantly outperforms even large models like GPT-4.
- Increased Efficiency: By using smaller “Follower” language models guided by the “Planner”, DISCIPL can achieve comparable or superior results with far less computational cost.
- Automation: The process of defining the inference procedure is automated by the Planner language model, reducing the need for manual engineering.
- Parallelization: The “Follower” language models can execute in parallel, further speeding up the reasoning process.
Looking Ahead:
The researchers are excited about the potential for “self-steering” language models in other areas, such as mathematical reasoning and tasks with “soft” constraints (where guidelines are preferred but not strictly enforced). They also envision a “recursive” setup where a single language model takes on both the Planner and Follower roles, continuously improving its reasoning abilities.
DISCIPL represents a major step towards more reliable, efficient, and adaptable language models. By separating planning from execution, it unlocks new possibilities for AI to tackle complex challenges.
Chat about this paper
To chat about this paper, you'll need a free Gemini API key from Google AI Studio.
Your API key will be stored securely in your browser's local storage.