AI Papers Reader

Personalized digests of latest AI research

Can Language Models Replace Programmers? REPOCOD Says ‘Not Yet’

Large language models (LLMs) are becoming increasingly popular for code generation. However, it’s not yet clear whether they can truly replace human programmers. To answer this question, researchers have created a new code generation benchmark called REPOCOD.

REPOCOD consists of 980 real-world code completion tasks collected from 11 popular GitHub projects. These tasks are significantly more challenging than those in existing benchmarks such as HumanEval and MBPP, because they require LLMs to generate code that depends on other functions, files, and classes within the project. For example, a REPOCOD task may require an LLM to complete a function that calls other functions defined in the same file or in another file of the same project.

Researchers evaluated 10 LLMs on REPOCOD. None achieved more than 30% pass@1, meaning that no model's first generated solution passed all of the test cases for more than 30% of the tasks. This suggests that LLMs still have a long way to go before they can reliably replace human programmers.
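To make the pass@1 metric concrete, here is a minimal sketch of the unbiased pass@k estimator introduced alongside HumanEval. Whether REPOCOD computes its scores in exactly this way is an assumption; the function names and the sample numbers are hypothetical and chosen only for illustration.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one task.

    n: number of solutions sampled for the task
    c: number of those solutions that pass every test case
    k: number of attempts the metric allows
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples: any k-subset contains a pass.
        return 1.0
    # 1 minus the probability that all k drawn samples are failures.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over tasks, given (n, c) pairs per task."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical data: four tasks, one sample each, only the first passes.
example_results = [(1, 1), (1, 0), (1, 0), (1, 0)]
example_score = benchmark_pass_at_k(example_results, k=1)
```

With k = 1 the estimator reduces to the fraction of sampled solutions that pass, so a benchmark score below 30% means fewer than three in ten tasks are solved on the first attempt.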

In addition, the researchers found that the choice of retrieval approach can have a significant impact on LLM performance. Sparse retrieval, which ranks repository functions by lexical (keyword) overlap with the query using methods such as BM25, yielded worse results than dense retrieval, which ranks functions by the similarity of their learned embedding vectors to the query. One plausible explanation is that keyword matching can miss relevant context that shares few tokens with the target function.
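As a rough illustration of sparse retrieval, the sketch below ranks candidate functions against a query with a simplified BM25 score in pure Python. It is not the benchmark's actual retrieval pipeline: the tokenizer, the parameter defaults, and the three sample functions are all assumptions made up for this example.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and split on non-alphanumerics, so "load_config"
    # becomes ["load", "config"].
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_rank(query: str, documents: list[str],
              k1: float = 1.5, b: float = 0.75) -> list[int]:
    """Return document indices sorted from most to least relevant."""
    if not documents:
        return []
    tokenized = [tokenize(d) for d in documents]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs or 1.0

    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    query_terms = set(tokenize(query))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return sorted(range(n_docs), key=lambda i: scores[i], reverse=True)


# Hypothetical repository functions used as retrieval candidates.
candidates = [
    "def load_config(path): return parse_yaml(read_file(path))",
    "def save_model(model, path): write_bytes(path, serialize(model))",
    "def fetch_settings(location): return decode(open_resource(location))",
]
query = "def get_config_value(path, key): # read a key from the config file"
ranking = bm25_rank(query, candidates)
```

In this toy corpus the third candidate is conceptually related to the query (settings versus config) but shares almost no tokens with it, so a purely lexical ranker places it last. A dense retriever would instead compare embedding vectors, which requires an embedding model and is therefore left out of this sketch.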

These results show that LLMs cannot yet reliably generate code for complex, real-world projects. A likely reason is that they still struggle to understand the dependencies and interactions between different parts of a codebase.

However, the researchers are optimistic about the future of LLMs for code generation. They believe that as LLMs continue to improve, they will eventually be able to overcome these challenges and become valuable tools for programmers. In the meantime, REPOCOD provides a valuable benchmark for researchers to evaluate the progress of LLMs in this area.