COFFEE-GYM: A New Environment for Training Better Code Editing Feedback Models

Large language models (LLMs) are becoming increasingly powerful tools for assisting human programmers. But while they can generate impressive code, they often make errors. A key challenge in improving LLMs is teaching them to recognize and fix their own mistakes.

One approach is to use natural language feedback, that is, to have an LLM describe in plain language how to correct a piece of code. However, existing methods for training LLMs to provide such feedback are limited: they often rely on expensive, closed-source LLMs like GPT-4, or lack a rigorous way to evaluate whether the feedback actually improves the edited code.
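
As a concrete illustration, here is a hypothetical example of the kind of (wrong code, feedback, corrected code) triple this approach revolves around. The field names and the toy bug are made up for illustration and are not taken from the paper.

```python
# A hypothetical (wrong code, feedback, corrected code) triple.
# Field names are illustrative only, not the paper's schema.
example = {
    "wrong_code": "def solution(x):\n    return x - 1",
    "feedback": (
        "The task asks for the successor of x, so the function should "
        "return x + 1 rather than x - 1."
    ),
    "correct_code": "def solution(x):\n    return x + 1",
}
print(example["feedback"])
```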

A research team from Yonsei University and Seoul National University has developed a new environment called COFFEE-GYM, designed to address these limitations. COFFEE-GYM is a comprehensive toolkit for training LLMs to give accurate and helpful feedback on code editing.

COFFEE-GYM consists of two main components: COFFEE, a dataset of human-written code edit traces annotated with feedback, and COFFEEEVAL, an evaluation method that measures how helpful a piece of feedback is by running the code revised under that feedback against unit tests.

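The sketch below illustrates that evaluation idea, under the assumption that feedback quality is scored by the unit-test pass rate of the revised code. The helper names (toy_editor, pass_rate, feedback_reward) and the rule-based editor are stand-ins for illustration; a real setup would call an LLM editor instead.

```python
# Hedged sketch: score feedback by how well the code revised under it passes unit tests.
# `toy_editor`, `pass_rate`, and `feedback_reward` are illustrative names, not the paper's API.

def toy_editor(code: str, feedback: str) -> str:
    """Stand-in for an LLM editor that rewrites `code` according to `feedback`."""
    if "x + 1" in feedback:
        return code.replace("x - 1", "x + 1")
    return code

def pass_rate(code: str, tests: list[tuple[int, int]]) -> float:
    """Fraction of (input, expected) pairs on which solution(x) from `code` is correct."""
    ns: dict = {}
    try:
        exec(code, ns)
        return sum(ns["solution"](x) == y for x, y in tests) / len(tests)
    except Exception:
        return 0.0

def feedback_reward(buggy_code: str, feedback: str, tests: list[tuple[int, int]]) -> float:
    """Helpfulness of `feedback` = pass rate of the code revised under it."""
    return pass_rate(toy_editor(buggy_code, feedback), tests)

buggy = "def solution(x):\n    return x - 1"
tests = [(1, 2), (5, 6)]
print(feedback_reward(buggy, "Return x + 1 instead of x - 1.", tests))   # 1.0, helpful
print(feedback_reward(buggy, "Rename the variable for clarity.", tests))  # 0.0, unhelpful
```
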
To evaluate COFFEE-GYM’s effectiveness, the research team conducted a series of experiments. They found that feedback models trained in COFFEE-GYM produced markedly more helpful feedback than open-source baselines, guiding code edits that approach the quality of those steered by closed-source models such as GPT-4.

The team also explored a variety of training strategies within COFFEE-GYM, including supervised fine-tuning and reinforcement learning. They found that reinforcement learning, particularly the PPO algorithm with the unit-test-based evaluation serving as the reward signal, was the most effective method for training feedback models.
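
To illustrate how such a reward can drive reinforcement learning, here is a toy REINFORCE-style policy-gradient update (deliberately much simpler than PPO, and not the authors' code) over a handful of hand-written candidates. For brevity, the candidates are revised programs themselves; in the actual setup the policy would produce feedback and a separate editor model would produce the revision.

```python
# Toy REINFORCE-style sketch: a categorical "policy" over hand-written candidates
# is nudged toward the one whose revised code passes the unit tests.
import math
import random

candidates = [
    "def solution(x):\n    return x - 1",  # keeps the original bug
    "def solution(x):\n    return x + 1",  # the correct fix
    "def solution(x):\n    return x * 2",  # a different bug
]
tests = [(1, 2), (5, 6)]  # (input, expected) pairs

def pass_rate(code: str) -> float:
    """Fraction of tests that solution(x) from `code` gets right."""
    ns: dict = {}
    try:
        exec(code, ns)
        return sum(ns["solution"](x) == y for x, y in tests) / len(tests)
    except Exception:
        return 0.0

logits = [0.0] * len(candidates)  # parameters of the toy policy
lr = 0.5
for _ in range(200):
    probs = [math.exp(l) for l in logits]
    total = sum(probs)
    probs = [p / total for p in probs]
    i = random.choices(range(len(candidates)), weights=probs)[0]  # sample an action
    reward = pass_rate(candidates[i])                             # test-based reward
    # REINFORCE: move logits along the gradient of log-prob of the sampled action
    for j in range(len(logits)):
        logits[j] += lr * reward * ((1.0 if j == i else 0.0) - probs[j])

print(max(zip(logits, candidates))[1])  # the policy concentrates on the correct fix
```

The same principle scales up to an LLM feedback policy, where PPO adds the clipping and value-function machinery needed for stable training.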

COFFEE-GYM offers a promising new approach for training LLMs to provide effective feedback on code editing. By leveraging a comprehensive dataset of human-written code and a robust evaluation methodology, this research makes significant progress towards developing more powerful tools for assisting human programmers.

The research team has made the COFFEE dataset and the trained feedback model publicly available, enabling others to further explore and build upon their work. This opens exciting opportunities for researchers and developers to create more advanced code editing tools, potentially leading to a future where LLMs can help programmers write more efficient and reliable code.