LLMs can now teach themselves to be better judges
Recent advances in artificial intelligence (AI) rely increasingly on large language models (LLMs) for tasks such as generating creative content, translating languages, and writing code. Evaluating the quality of these models, however, remains a significant challenge.
Traditionally, evaluating LLMs has relied heavily on human preference judgments. These judgments are typically collected by asking human annotators to compare pairs of responses generated by the LLM and indicate which one is better. This approach is often costly and time-consuming, especially for complex tasks. Furthermore, human judgments can become outdated as LLMs improve over time.
A new paper published by Meta AI researchers proposes a novel method to train “self-taught evaluators” that do not require any human annotations. The method leverages synthetic preference data generated by the LLM itself, improving the evaluator’s performance over time.
The key idea is to iteratively train an LLM-as-a-Judge to produce judgments on pairs of contrasting LLM responses. These judgments are used to fine-tune the LLM-as-a-Judge, leading to a continuous improvement in its ability to discriminate between good and bad responses.
The researchers propose a three-step process for generating synthetic preference data:
- Instruction selection: The process begins by selecting a challenging set of instructions from a pool of unlabeled data, chosen to cover a wide range of skills such as reasoning, coding, and safety.
- Response pair construction: For each instruction, the LLM generates two contrasting responses: a preferred response to the original instruction, and a “rejected” response produced from a deliberately modified version of the instruction. Because the pair is constructed this way, the expected ordering is known without any human labeling.
- Judgment annotation: The LLM-as-a-Judge is then prompted to produce chain-of-thought reasoning followed by a final verdict indicating which response is preferred. Judgments that agree with the known ordering are retained and used to fine-tune the LLM-as-a-Judge for the next iteration (a sketch of one iteration of this pipeline follows the list).
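To make the three steps concrete, here is a minimal sketch of one iteration of the pipeline in Python, assuming a generic `generate(prompt)` helper that calls the current model. The prompt wordings, the `build_pair`/`judge`/`collect_training_examples` names, and the “Verdict: A/B” format are illustrative assumptions, not the paper’s exact templates.

```python
import random
import re


def generate(prompt: str) -> str:
    """Placeholder for a call to the current LLM (e.g. an inference API or local model)."""
    raise NotImplementedError


def build_pair(instruction: str) -> tuple[str, str]:
    """Step 2: construct a preferred/rejected response pair with a known ordering."""
    chosen = generate(f"Respond to the following instruction:\n{instruction}")
    # Perturb the instruction so that a response to it is plausibly worse
    # for the *original* instruction.
    noisy_instruction = generate(
        f"Write a similar but subtly different instruction to:\n{instruction}"
    )
    rejected = generate(f"Respond to the following instruction:\n{noisy_instruction}")
    return chosen, rejected


def judge(instruction: str, resp_a: str, resp_b: str) -> tuple[str, str]:
    """Step 3: ask for chain-of-thought reasoning plus a final verdict."""
    reasoning = generate(
        "You are an impartial judge. Compare the two responses to the instruction, "
        "explain your reasoning step by step, and end with 'Verdict: A' or 'Verdict: B'.\n"
        f"Instruction: {instruction}\nResponse A: {resp_a}\nResponse B: {resp_b}"
    )
    match = re.search(r"Verdict:\s*([AB])", reasoning)
    verdict = match.group(1) if match else ""
    return reasoning, verdict


def collect_training_examples(instructions: list[str]) -> list[dict]:
    """Keep only judgments that agree with the known ordering (rejection sampling)."""
    examples = []
    for instruction in instructions:
        chosen, rejected = build_pair(instruction)
        # Randomize presentation order so the judge is not rewarded for position bias.
        if random.random() < 0.5:
            resp_a, resp_b, correct = chosen, rejected, "A"
        else:
            resp_a, resp_b, correct = rejected, chosen, "B"
        reasoning, verdict = judge(instruction, resp_a, resp_b)
        if verdict == correct:
            examples.append({"instruction": instruction, "judgment": reasoning})
    return examples  # fine-tune the judge on these examples, then repeat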
The researchers evaluated their self-taught evaluator on standard benchmarks, including RewardBench and MT-Bench. Despite being trained with no labeled preference data, the self-taught evaluator significantly outperforms the seed LLM-as-a-Judge and reaches performance comparable to top reward models trained on human annotations.
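For context, evaluators like this are typically scored by pairwise accuracy: on a benchmark of human-labeled preference pairs, the judge gets credit whenever it prefers the response that humans marked as better. A minimal sketch, assuming a `judge_prefers_first` wrapper around the judge model:

```python
def judge_prefers_first(instruction: str, first: str, second: str) -> bool:
    """Placeholder: True if the LLM-as-a-Judge prefers `first` over `second`."""
    raise NotImplementedError


def pairwise_accuracy(benchmark: list[dict]) -> float:
    """Each benchmark item has 'instruction', 'chosen', and 'rejected' fields."""
    correct = sum(
        judge_prefers_first(ex["instruction"], ex["chosen"], ex["rejected"])
        for ex in benchmark
    )
    return correct / len(benchmark)
```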
This method offers a promising way to address the challenges of training and evaluating LLMs without relying on expensive human annotations. This breakthrough could lead to significant advancements in the development of more reliable and efficient AI systems.
Here are some concrete examples of how the self-taught evaluator can be used:
- Imagine you want to train an LLM to write code. You could provide the LLM with a set of coding instructions and ask it to generate a solution for each one. The self-taught evaluator could then compare each generated solution against a reference solution (or against another candidate) and judge which is better, providing a feedback signal the code-writing model can learn from (see the sketch after this list).
- You could also use the self-taught evaluator to train an LLM to write creative stories. You could provide the LLM with prompts for different genres; the LLM would generate a story for each prompt, and the self-taught evaluator would compare the generated stories and provide feedback on which ones are more creative, engaging, and well-written.
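For the coding example above, here is a minimal sketch of how one might prompt a trained judge to compare two candidate solutions to the same task. The prompt wording, the `call_judge` helper, and the verdict format are assumptions for illustration, not the paper’s template.

```python
import re


def call_judge(prompt: str) -> str:
    """Placeholder for a call to the fine-tuned LLM-as-a-Judge."""
    raise NotImplementedError


def compare_code(instruction: str, candidate_a: str, candidate_b: str) -> str:
    """Ask the judge which candidate solution better satisfies the coding task."""
    prompt = (
        "You are reviewing two candidate solutions to a coding task. "
        "Judge them on correctness, clarity, and adherence to the task, "
        "explain your reasoning, then end with 'Verdict: A' or 'Verdict: B'.\n"
        f"Task: {instruction}\n\n"
        f"Solution A:\n{candidate_a}\n\n"
        f"Solution B:\n{candidate_b}"
    )
    reasoning = call_judge(prompt)
    match = re.search(r"Verdict:\s*([AB])", reasoning)
    return match.group(1) if match else "no verdict"  # use as a feedback signal
```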
The self-taught evaluator is a significant step toward more capable and reliable AI systems. It demonstrates that LLMs can learn to judge and improve their own outputs without human-annotated preference data, pointing to a future where AI systems can be trained more efficiently and effectively.