AI Papers Reader

Personalized digests of the latest AI research


SWERANK: A New Approach to Software Issue Localization Using Code Ranking

Software issue localization, the process of identifying the relevant code locations for a given bug report or feature request, is a critical but often time-consuming task in software development. A new research paper introduces SWERANK, a novel framework that tackles this challenge using a retrieve-and-rerank approach, achieving state-of-the-art performance with significantly lower costs compared to existing methods.

Traditionally, software issue localization has been approached in two main ways: agent-based systems and code ranking models. Agent-based systems leverage large language models (LLMs) to interact with the codebase, using commands like “read-file” or “grep” to explore and identify relevant code. While powerful, these systems are computationally expensive and can be brittle, as they rely on multiple rounds of interaction with large models. Code ranking models, on the other hand, directly rank candidate code snippets based on their relevance to the issue description. However, these models often struggle with the verbose and failure-descriptive nature of issue localization queries.

SWERANK bridges this gap by employing a two-stage architecture:

  1. SWERANKEMBED: A bi-encoder embedding model that serves as a code retriever. It generates dense vector representations for both GitHub issues and code functions, allowing for efficient similarity-based retrieval (a minimal sketch of this stage appears after the list).
  2. SWERANKLLM: An instruction-tuned LLM that acts as a code reranker. It takes the top-K candidates retrieved by SWERANKEMBED and refines their ranking based on the issue description.
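To make the retrieval stage concrete, here is a minimal sketch of how a bi-encoder retriever of this kind could be used. It is not the paper's implementation: the model name `all-MiniLM-L6-v2` is a generic stand-in for SWERANKEMBED, the toy corpus is invented, and the reranking step is left as a comment.

```python
# Sketch of the retrieve stage: embed the issue and all candidate functions
# with the same bi-encoder, then keep the top-K most similar functions.
# Assumption: "all-MiniLM-L6-v2" is a stand-in for SWERANKEMBED, and
# `functions` is a toy corpus rather than a parsed repository.
from sentence_transformers import SentenceTransformer, util

retriever = SentenceTransformer("all-MiniLM-L6-v2")

issue = "The application crashes when trying to upload a file larger than 10MB."
functions = {
    "upload_handler.save_file": "def save_file(stream): ...",
    "limits.check_size": "def check_size(nbytes, max_mb=10): ...",
    "auth.login": "def login(user, password): ...",
}

# Encode the issue and every candidate function as dense vectors.
issue_emb = retriever.encode(issue, convert_to_tensor=True)
func_embs = retriever.encode(list(functions.values()), convert_to_tensor=True)

# Cosine similarity gives one score per function; keep the top-K candidates.
scores = util.cos_sim(issue_emb, func_embs)[0]
top_k = scores.topk(k=2)
candidates = [list(functions.keys())[i] for i in top_k.indices]
print(candidates)

# These top-K candidates would then be handed to the LLM reranker (stage 2).
```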

To train SWERANK, the researchers created SWELOC, a large-scale dataset of real-world issue descriptions paired with corresponding code modifications, curated from public GitHub repositories. This dataset is crucial because it captures the unique characteristics of issue localization queries, which are often more verbose and focus on erroneous behavior rather than desired functionality.
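The paper's exact curation pipeline is not reproduced here, but the core idea of pairing an issue with the functions its fixing commit touched can be sketched roughly as follows. The `commits` structure, its field names, and the negative-sampling choice are illustrative assumptions, not the actual SWELOC construction.

```python
# Rough sketch of SWELOC-style data construction: for each issue closed by a
# commit, treat the functions modified by that commit as positives and other
# functions from the same repository as negatives for contrastive training.
# The input structure and field names are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class LocalizationExample:
    query: str             # issue title + body
    positive: str          # source of a function changed by the fix
    negatives: list[str]   # sources of unchanged functions from the same repo

def build_examples(commits):
    examples = []
    for commit in commits:
        issue_text = commit["issue_title"] + "\n" + commit["issue_body"]
        changed = set(commit["changed_functions"])
        unchanged = [f for f in commit["all_functions"] if f not in changed]
        for func in changed:
            examples.append(
                LocalizationExample(
                    query=issue_text,
                    positive=commit["function_sources"][func],
                    negatives=[commit["function_sources"][f] for f in unchanged[:10]],
                )
            )
    return examples
```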

Example: Imagine a bug report stating, “The application crashes when trying to upload a file larger than 10MB.” SWERANKEMBED would initially retrieve a set of code functions that are semantically related to “file upload” and “size limits.” Then, SWERANKLLM would rerank these candidates, giving higher scores to functions that specifically handle large file sizes and error conditions related to file uploads.
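To illustrate the rerank stage, here is one way an instruction-tuned LLM could be prompted to reorder the retrieved candidates. The prompt wording, the `call_llm` placeholder, and the identifier-per-line output format are assumptions for illustration, not the paper's actual reranking interface.

```python
# Sketch of a listwise LLM reranker: show the issue plus the retrieved
# candidates and ask the model to list identifiers from most to least
# relevant. `call_llm` is a placeholder for whatever LLM client is used.
def build_rerank_prompt(issue: str, candidates: dict[str, str]) -> str:
    blocks = [f"[{name}]\n{source}" for name, source in candidates.items()]
    return (
        "You are locating the code responsible for a reported issue.\n"
        f"Issue:\n{issue}\n\n"
        "Candidate functions:\n" + "\n\n".join(blocks) + "\n\n"
        "List the candidate identifiers from most to least relevant, one per line."
    )

def rerank(issue: str, candidates: dict[str, str], call_llm) -> list[str]:
    response = call_llm(build_rerank_prompt(issue, candidates))
    ranked = [line.strip() for line in response.splitlines() if line.strip() in candidates]
    # Fall back to the retriever's order for anything the model omitted.
    return ranked + [name for name in candidates if name not in ranked]
```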

The researchers evaluated SWERANK on established benchmarks like SWE-Bench-Lite and LocBench, demonstrating that it outperforms both prior ranking models and costly agent-based systems using closed-source LLMs like Claude-3.5. Notably, SWERANK, built on open-source models, achieves a considerably better performance-to-cost ratio. For instance, the paper shows that SWERANKEMBED and SWERANKLLM provide superior accuracy compared to agent-based systems like LocAgent while drastically reducing the cost per instance from $0.66 to just a few cents.

The paper also highlights the utility of the SWELOC dataset by showing that it improves the localization performance of various existing retriever and reranker models when used for fine-tuning. This makes SWELOC a valuable resource for the community.

In conclusion, SWERANK offers a practical and cost-effective solution for software issue localization by framing it as a specialized ranking task. By leveraging a carefully designed retrieve-and-rerank architecture and training on a dedicated dataset, SWERANK provides a significant advancement in automated software engineering.