
MUTAGREP: A New Approach to Code Generation That Outperforms Traditional Methods

Large Language Models (LLMs) are increasingly used to generate code, especially when leveraging existing code repositories. However, providing these LLMs with the vast amount of code in a repository is difficult. Simply feeding the entire repository into the LLM’s context window is often wasteful, as most tasks require only a small fraction of the available code, and studies have shown that LLMs struggle to effectively utilize long context windows.

To address these limitations, a team of researchers has developed MUTAGREP (Mutation-guided Grounded Repository Plan Search), a novel approach that mimics how human programmers navigate large codebases. Instead of overwhelming the LLM with the entire repository, MUTAGREP helps the LLM decompose a user’s coding request into a series of smaller, more manageable steps.

How MUTAGREP Works:

MUTAGREP operates by searching for a plan to fulfill a user’s request. This plan consists of a sequence of natural language steps, each grounded in specific symbols (functions, classes, etc.) from the code repository. Here’s a breakdown:

  1. Decomposition: The user’s request is broken down into a step-by-step plan expressed in natural language. For example, if a user asks to “Implement a CQL agent in Acme library”, MUTAGREP might generate steps like “Import necessary modules for CQL,” “Initialize the random seed,” and “Create a demonstration dataset.”

  2. Symbol Retrieval: For each step, MUTAGREP identifies relevant symbols from the codebase using a “symbol retriever”. This grounding ensures that each step is actually implementable using the available code. For instance, for the step “Import necessary modules for CQL,” MUTAGREP might retrieve symbols like CQLLearner, get_tfds_dataset, and CQLLearner.get_variables from the Acme library.

  3. Plan Search: MUTAGREP uses a neural tree search to explore the space of possible plans. It starts with an initial plan and iteratively mutates it, adding, removing, or modifying steps. A scoring function guides the search, favoring plans that are complete, concise, and faithful to the original query. (A toy sketch of this loop follows the list.)
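To make the loop concrete, here is a minimal, self-contained Python sketch of mutation-guided plan search. Every name in it (PlanStep, retrieve_symbols, score_plan, mutate, plan_search) is hypothetical, and the lexical retriever, heuristic scorer, and greedy hill-climbing loop are toy stand-ins: the paper grounds steps with a learned symbol retriever, mutates and scores plans with LLMs, and runs a best-first tree search rather than the simple accept-if-better loop shown here.

```python
import random
from dataclasses import dataclass, field

# Toy symbol index standing in for a real repository's functions and classes.
SYMBOL_INDEX = {
    "CQLLearner": "learner implementing conservative q-learning cql",
    "get_tfds_dataset": "load a demonstration dataset from tensorflow datasets",
    "CQLLearner.get_variables": "fetch network variables from the cql learner",
}

@dataclass
class PlanStep:
    intent: str                                    # natural-language step
    symbols: list = field(default_factory=list)    # repo symbols grounding it

def retrieve_symbols(intent: str, k: int = 2) -> list:
    """Ground a step by lexical overlap with symbol descriptions.
    (The paper uses a learned symbol retriever; this is a toy stand-in.)"""
    words = set(intent.lower().split())
    ranked = sorted(SYMBOL_INDEX,
                    key=lambda s: -len(words & set(SYMBOL_INDEX[s].split())))
    return ranked[:k]

def score_plan(plan: list, query: str) -> float:
    """Toy score: reward grounded steps and word overlap with the query,
    penalize length. (MUTAGREP scores plans with an LLM judge instead.)"""
    grounded = sum(1 for step in plan if step.symbols)
    plan_words = {w for step in plan for w in step.intent.lower().split()}
    faithful = len(plan_words & set(query.lower().split()))
    return grounded + faithful - 0.1 * len(plan)

def mutate(plan: list, candidate_intents: list) -> list:
    """Randomly add, drop, or rewrite one step, re-grounding any new step."""
    new_plan = list(plan)
    op = random.choice(["add", "drop", "edit"]) if new_plan else "add"
    if op == "add":
        intent = random.choice(candidate_intents)
        new_plan.append(PlanStep(intent, retrieve_symbols(intent)))
    elif op == "drop":
        new_plan.pop(random.randrange(len(new_plan)))
    else:
        intent = random.choice(candidate_intents)
        new_plan[random.randrange(len(new_plan))] = PlanStep(
            intent, retrieve_symbols(intent))
    return new_plan

def plan_search(query: str, candidate_intents: list, iters: int = 50) -> list:
    """Greedy mutation search: keep a mutation only if it scores higher."""
    best = [PlanStep(query, retrieve_symbols(query))]
    for _ in range(iters):
        candidate = mutate(best, candidate_intents)
        if score_plan(candidate, query) > score_plan(best, query):
            best = candidate
    return best

if __name__ == "__main__":
    intents = [
        "import modules needed for cql",
        "create a demonstration dataset",
        "fetch network variables from the learner",
    ]
    for step in plan_search("implement a cql agent", intents):
        print(step.intent, "->", step.symbols)
```

The key design idea this sketch preserves is that mutation and scoring only ever touch the natural-language plan and its grounded symbols, never executed code, which is what keeps the search cheap relative to generate-and-run approaches.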

Concrete Example:

Imagine you want to write code to “Fine-tune a Vision Transformer for an imbalanced medical image dataset.” MUTAGREP might produce the following plan:

  • Step 1: Load pretrained ViT-B/16 model. (Symbols: ViTForImageClassification, from_pretrained())
  • Step 2: Freeze encoder blocks except for last 2 layers. (Symbols: set_trainable_layers(), ViTEncoder, freeze_parameters())
  • Step 3: Train with weighted cross entropy for imbalance. (Symbols: WeightedCrossEntropyLoss, compute_class_weights(), backward(), step())

This detailed plan, along with the relevant symbols, provides a clear roadmap for any coding system to generate the desired code.
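To show what that hand-off could look like, the sketch below serializes the example plan into a single prompt string for a downstream coding model. The prompt template and the idea of flattening the plan this way are illustrative assumptions; the paper does not prescribe this exact format.

```python
# Illustrative only: one way to serialize a grounded plan into a coder prompt.
# The steps and symbols come from the example above; the template is assumed.
plan = [
    ("Load pretrained ViT-B/16 model.",
     ["ViTForImageClassification", "from_pretrained()"]),
    ("Freeze encoder blocks except for the last 2 layers.",
     ["set_trainable_layers()", "ViTEncoder", "freeze_parameters()"]),
    ("Train with weighted cross entropy for imbalance.",
     ["WeightedCrossEntropyLoss", "compute_class_weights()",
      "backward()", "step()"]),
]

lines = ["Task: fine-tune a Vision Transformer on an imbalanced "
         "medical image dataset.",
         "Follow this plan, using the listed repository symbols:"]
for i, (intent, symbols) in enumerate(plan, 1):
    lines.append(f"{i}. {intent}  (symbols: {', '.join(symbols)})")
prompt = "\n".join(lines)
print(prompt)
```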

Key Advantages:

  • Reduced Context: MUTAGREP significantly reduces the amount of code that must fit in the LLM’s context window. The researchers found that their plans use less than 5% of a 128K-token context window for GPT-4o.
  • Improved Performance: By providing a well-structured plan, MUTAGREP matches the coding performance achieved by giving GPT-4o the entire code repository in its context window. Furthermore, plans produced by MUTAGREP allowed smaller LLMs, such as Qwen 2.5, to match GPT-4o’s performance.
  • Progress on Difficult Tasks: MUTAGREP enables progress on the hardest tasks within the LongCodeArena benchmark, where even state-of-the-art models struggle.

MUTAGREP presents a promising new direction for code generation, enabling more efficient and effective utilization of LLMs and large code repositories. The project page is available at zaidkhan.me/MutaGReP.