COFFEE-GYM: A New Environment for Training Better Code Editing Feedback Models
Large language models (LLMs) are becoming increasingly powerful tools for assisting human programmers. But while they can generate impressive code, they often make errors. A key challenge in improving LLMs is teaching them to recognize and fix their own mistakes.
One approach is to use natural language feedback: essentially, having an LLM explain what is wrong with a piece of code and how to correct it. However, existing methods for training LLMs to provide effective feedback are limited: they often rely on expensive, closed-source LLMs such as GPT-4, or lack a reliable way to evaluate whether the feedback actually improves the code.
A research team from Yonsei University and Seoul National University has developed a new environment called COFFEE-GYM, designed to address these limitations. COFFEE-GYM provides a comprehensive toolkit for training LLMs to provide accurate and helpful feedback on code editing.
COFFEE-GYM consists of two main components:
- COFFEE: This dataset includes human-written code edit traces, which record the steps programmers took to fix errors in code, along with feedback from other programmers. The dataset is designed to be diverse, incorporating problems of various difficulties and encompassing a wide range of errors.
- COFFEEEVAL: This reward function measures the helpfulness of feedback by its effect on the quality of the edited code: the code revised under a given piece of feedback is run against unit tests to check whether it actually works, as sketched below. This test-based signal is more reliable than prior methods, which typically relied on subjective judgments of feedback quality rather than on its measured effect on the edited code.
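
To make the idea concrete, here is a minimal sketch (not the authors' implementation) of how a COFFEEEVAL-style reward could be computed: the edited code produced under a given piece of feedback is run against the problem's unit tests, and the pass rate serves as the reward. The function names and the (stdin, expected-output) test format are assumptions for illustration.

```python
import subprocess
import sys


def run_unit_tests(code: str, tests: list[tuple[str, str]], timeout: float = 2.0) -> float:
    """Return the fraction of (stdin, expected-stdout) test cases the program passes."""
    if not tests:
        return 0.0
    passed = 0
    for stdin_text, expected in tests:
        try:
            result = subprocess.run(
                [sys.executable, "-c", code],
                input=stdin_text,
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            continue  # a hanging or looping program counts as a failed test
        if result.returncode == 0 and result.stdout.strip() == expected.strip():
            passed += 1
    return passed / len(tests)


def coffeeeval_style_reward(edited_code: str, tests: list[tuple[str, str]]) -> float:
    """Score a piece of feedback by the test pass rate of the code edited under it."""
    return run_unit_tests(edited_code, tests)


# Example: feedback that leads to a correct edit earns the full reward of 1.0.
fixed_code = "a, b = map(int, input().split())\nprint(a + b)"
print(coffeeeval_style_reward(fixed_code, [("1 2", "3"), ("10 5", "15")]))
```

The key design point is that the reward comes from executing the edited program, not from asking another model (or a human) whether the feedback "sounds" helpful.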
To evaluate COFFEE-GYM’s effectiveness, the research team conducted a series of experiments. They found that:
- COFFEE-GYM can train feedback models that perform comparably to closed-source LLMs. These feedback models improved the code-editing ability of open-source code LLMs, helping them produce higher-quality corrected code.
- COFFEEEVAL provides more accurate reward signals than prior methods. This means that it can more reliably identify and promote feedback that is truly helpful in improving code.
The team also explored a variety of training strategies within COFFEE-GYM, including supervised fine-tuning and reinforcement learning. They found that reinforcement learning, particularly using the PPO algorithm, was the most effective method for training feedback models.
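
As a rough illustration of how a test-based reward can drive reinforcement learning of a feedback model, the sketch below assumes the Hugging Face `trl` PPO interface (whose exact API varies by version). The model name, the `dataloader`, the batch format, and the `edit_with_feedback` helper are placeholders for illustration, not details from the paper; `coffeeeval_style_reward` is the sketch defined above.

```python
import torch
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer

# Placeholder base model; the paper trains open-source code LLMs as feedback models.
config = PPOConfig(model_name="feedback-model-base", batch_size=8, mini_batch_size=2)
tokenizer = AutoTokenizer.from_pretrained(config.model_name)
model = AutoModelForCausalLMWithValueHead.from_pretrained(config.model_name)
ppo_trainer = PPOTrainer(config, model, ref_model=None, tokenizer=tokenizer)

for batch in dataloader:  # placeholder: batches of {"prompt": wrong code + problem, "tests": unit tests}
    # 1. The feedback model generates natural-language feedback for each wrong solution.
    query_tensors = [tokenizer.encode(p, return_tensors="pt").squeeze(0) for p in batch["prompt"]]
    response_tensors = ppo_trainer.generate(query_tensors, max_new_tokens=128)
    feedback_texts = tokenizer.batch_decode(response_tensors, skip_special_tokens=True)

    # 2. A separate editor model (hypothetical edit_with_feedback) revises the code
    #    under each feedback text; the edit is scored by its unit-test pass rate.
    edited_codes = edit_with_feedback(batch["prompt"], feedback_texts)
    rewards = [torch.tensor(coffeeeval_style_reward(code, tests))
               for code, tests in zip(edited_codes, batch["tests"])]

    # 3. PPO updates the feedback model toward feedback that yields passing edits.
    ppo_trainer.step(query_tensors, response_tensors, rewards)
```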
COFFEE-GYM offers a promising new approach for training LLMs to provide effective feedback on code editing. By combining a comprehensive dataset of human-written code edits with a robust, test-based evaluation methodology, this research makes significant progress toward more powerful tools for assisting human programmers.
The research team has made the COFFEE dataset and the trained feedback model publicly available, enabling others to further explore and build upon their work. This opens exciting opportunities for researchers and developers to create more advanced code editing tools, potentially leading to a future where LLMs can help programmers write more efficient and reliable code.