LLMs Get Better at Following Complex Instructions
Large language models (LLMs) are getting increasingly good at following instructions. But as the tasks we ask LLMs to perform become more complex, their ability to follow intricate instructions with multiple constraints has lagged behind. Researchers at Tongyi Lab have tackled this problem with a new benchmark and an improved training method called IOPO (Input-Output Preference Optimization). Their work was published in a paper titled “IOPO: Empowering LLMs with Complex Instruction Following via Input-Output Preference Optimization.”
The team first created a new benchmark called TRACE, which contains 120,000 training examples and 1,000 evaluation examples. These examples are carefully crafted with complex instructions that combine multiple constraints, such as the following (a sketch of what such an instruction might look like follows the list):
- Content Constraint: The answer should be objective facts, not opinions.
- Output Format Constraint: The answer should be in JSON format.
- Numerical Constraint: The answer should include no more than 1,000 words.
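To make this concrete, here is a minimal sketch of what a multi-constraint instruction of this kind might look like when represented as data. The field names and the specific task are illustrative assumptions, not TRACE's actual schema.

```python
# A hypothetical multi-constraint instruction in the style of TRACE.
# Field names ("task", "constraints", ...) are illustrative, not the
# benchmark's actual schema.
example = {
    "task": "Summarize the attached product review.",
    "constraints": [
        {"type": "Content Constraint",
         "text": "The answer should be objective facts, not opinions."},
        {"type": "Output Format Constraint",
         "text": "The answer should be in JSON format."},
        {"type": "Numerical Constraint",
         "text": "The answer should include no more than 1,000 words."},
    ],
}

# A model is judged on whether its response satisfies every constraint at
# once, not just on whether the summary itself is reasonable.
prompt = example["task"] + "\n" + "\n".join(
    f"- {c['text']}" for c in example["constraints"])
print(prompt)
```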
TRACE pushes LLMs to perform much more nuanced tasks than traditional instruction-following benchmarks. It provides a more accurate way to evaluate the ability of LLMs to understand and execute intricate instructions.
Next, the researchers developed IOPO. Unlike other instruction-following training methods, IOPO goes beyond checking whether the output response matches a given instruction: it also examines the input instruction itself to uncover the fine-grained constraints within. By learning from preferences over both instructions and responses, the model is pushed to understand and optimize for these complex constraints.
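The sketch below shows one way a DPO-style objective could be extended with an input-preference term in the spirit of IOPO. The pairing scheme, helper names, and two-term structure here are assumptions for illustration; the paper's exact loss may differ.

```python
# A minimal sketch of a preference loss over both outputs and inputs,
# assuming per-sequence log-probabilities from a policy and a frozen
# reference model. Not the paper's exact formulation.
import torch
import torch.nn.functional as F

def preference_term(policy_chosen_lp, policy_rejected_lp,
                    ref_chosen_lp, ref_rejected_lp, beta=0.1):
    """DPO-style term: reward the policy for ranking the chosen sequence
    above the rejected one, relative to the reference model."""
    chosen_ratio = policy_chosen_lp - ref_chosen_lp
    rejected_ratio = policy_rejected_lp - ref_rejected_lp
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio))

def iopo_style_loss(lp):
    """`lp` holds sequence log-probs for (instruction, response) pairs built
    from two instructions x1, x2 and two responses y1, y2, where (x1, y1)
    is the matched, preferred pair. Keys: 'pi_x1y1', 'pi_x1y2', 'pi_x2y1'
    and their 'ref_*' counterparts."""
    # Output preference: given instruction x1, response y1 beats y2.
    out_term = preference_term(lp['pi_x1y1'], lp['pi_x1y2'],
                               lp['ref_x1y1'], lp['ref_x1y2'])
    # Input preference: response y1 should be more probable under the
    # instruction it was written for (x1) than under a perturbed one (x2).
    in_term = preference_term(lp['pi_x1y1'], lp['pi_x2y1'],
                              lp['ref_x1y1'], lp['ref_x2y1'])
    return (out_term + in_term).mean()

# Toy usage with made-up per-sequence log-probabilities.
lp = {k: torch.tensor([-12.0, -15.0]) for k in
      ['pi_x1y1', 'pi_x1y2', 'pi_x2y1', 'ref_x1y1', 'ref_x1y2', 'ref_x2y1']}
lp['pi_x1y1'] = torch.tensor([-10.0, -11.0])  # policy favors the matched pair
print(iopo_style_loss(lp))
```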
The researchers compared IOPO to other popular instruction-following training methods, namely SFT (Supervised Fine-Tuning) and DPO (Direct Preference Optimization). Their experiments showed that IOPO significantly outperformed both, with average accuracy gains of 7.22% over SFT and 2.66% over DPO across in-domain and out-of-domain evaluations.
The researchers also conducted ablation studies, removing either input or output preferences from IOPO. These studies showed that both input and output preferences are crucial for the model’s ability to accurately follow complex instructions.
Overall, the work by the Tongyi Lab team represents a significant advancement in the field of LLM instruction following. Their research will help us better understand and optimize the capabilities of LLMs for complex tasks. This is essential for LLMs to effectively function as helpful assistants in a wide range of applications.