AI Papers Reader

Personalized digests of latest AI research

View on GitHub

Closing the Last Mile of Code Generation with MGDebugger

Large language models (LLMs) have made significant strides in code generation, but the generated code often contains subtle errors that require human intervention to fix. Existing LLM-based debugging systems treat programs as monolithic units, failing to address bugs at various levels of granularity.

A research paper published in the arXiv preprint repository introduces MGDebugger, a hierarchical code debugger that decomposes problematic code into a tree structure of subfunctions. Each level represents a specific granularity of error, ranging from low-level syntax errors to high-level algorithmic flaws. During debugging, MGDebugger analyzes each subfunction and iteratively resolves bugs in a bottom-up manner.

For example, imagine you have a code snippet that aims to find the focus of a parabola. The code may contain a simple error in the formula for calculating the y-coordinate of the focus. MGDebugger would first decompose the code into subfunctions, one for calculating the x-coordinate and one for calculating the y-coordinate. Then, it would analyze each subfunction separately, identifying the error in the y-coordinate calculation and providing a fix.

The paper demonstrates that MGDebugger significantly outperforms existing debugging methods, achieving an 18.9% improvement in accuracy over seed generations in HumanEval and a 97.6% repair success rate in HumanEval-Fix. This improvement is attributed to the effectiveness of MGDebugger’s hierarchical debugging strategy, which allows it to address bugs across different categories and difficulty levels.

This research provides a promising solution to the “last mile” challenge in code generation, paving the way for more reliable and robust AI-assisted coding tools. By tackling the issue of subtle errors in a systematic and hierarchical manner, MGDebugger helps to bridge the gap between LLM-generated code and human-level code quality.