a month ago


Abstract

The issue localization task aims to identify the locations in a software repository that require modification given a natural language issue description. This task is fundamental yet challenging in automated software engineering due to the semantic gap between issue descriptions and source code implementations. This gap manifests as two mismatches: (1) symptom-to-cause mismatches, where descriptions do not explicitly reveal underlying root causes; and (2) one-to-many mismatches, where a single issue corresponds to multiple interdependent code entities. To address these two mismatches, we propose GraphLocator, an approach that mitigates symptom-to-cause mismatches through causal structure discovery and resolves one-to-many mismatches via dynamic issue disentangling. The key artifact is the causal issue graph (CIG), in which vertices represent discovered sub-issues along with their associated code entities, and edges encode the causal dependencies between them. The workflow of GraphLocator consists of two phases, symptom vertices locating and dynamic CIG discovering: it first identifies symptom locations on the repository graph, then dynamically expands the CIG by iteratively reasoning over neighboring vertices. Experiments on three real-world datasets demonstrate the effectiveness of GraphLocator: (1) Compared with baselines, GraphLocator achieves more accurate localization, with average improvements of +19.49% in function-level recall and +11.89% in precision. (2) GraphLocator outperforms baselines in both symptom-to-cause and one-to-many mismatch scenarios, achieving recall improvements of +16.44% and +19.18% and precision improvements of +7.78% and +13.23%, respectively. (3) The CIG generated by GraphLocator yields the highest relative improvement, resulting in a 28.74% increase in performance on the downstream resolving task.

One-sentence Summary

Peking University and ByteDance propose GraphLocator, an LLM-based approach that addresses symptom-to-cause and one-to-many mismatches in issue localization through a causal issue graph (CIG) that enables dynamic disentangling and multi-hop causal reasoning, achieving average improvements of +19.49% in function-level recall and +11.89% in precision over baselines while improving downstream issue resolution by 28.74%.

Key Contributions

Issue localization faces a fundamental challenge due to two key mismatches: symptom-to-cause, where issue descriptions reveal symptoms rather than root causes, and one-to-many, where a single issue requires changes across multiple interdependent code entities, both exacerbating the semantic gap between natural language and code.
GraphLocator introduces a causal issue graph (CIG) to model sub-issues and their causal dependencies, leveraging an agentic workflow on a repository dependency fractal structure (RDFS) to dynamically discover and disentangle complex issue structures through iterative abductive reasoning.
Evaluated on three real-world Python and Java datasets, GraphLocator achieves average improvements of +19.49% in function-level recall and +11.89% in precision over baselines, with significant gains in both mismatch scenarios and a 28.74% performance boost on downstream resolution tasks due to its disentangled causal structure.

Introduction

The issue localization task in software engineering involves mapping natural language issue descriptions to the code locations that need modification, a critical step in automated debugging and maintenance. This task is challenging due to two key mismatches: symptom-to-cause, where issues describe observable symptoms rather than root causes and therefore require multi-hop reasoning; and one-to-many, where a single issue spans multiple interdependent code entities. Prior approaches, both embedding-based and LLM-based, struggle with these mismatches: embedding methods lack structural awareness, while LLM-based methods often rely on superficial relevance and fail to model causal dependencies or maintain coherent reasoning chains. The authors propose GraphLocator, an LLM-driven approach that constructs a causal issue graph (CIG) to represent sub-issues and their causal relationships. By leveraging an agentic workflow on a repository dependency fractal structure (RDFS), GraphLocator dynamically discovers causal paths and disentangles complex issues through iterative abductive reasoning. This enables accurate localization across both mismatch types, with average gains of +19.49% in function-level recall and +11.89% in precision over baselines, and a 28.74% improvement in downstream issue resolution performance.

Dataset

The dataset comprises three publicly available benchmarks covering Python and Java, each containing GitHub issues with issue descriptions, commit versions, and diff-based fix patches for evaluation.
SWE-bench Lite (Python): 300 issues from 11 large-scale Python projects, selected from the full SWE-bench dataset to balance evaluation cost and quality; excludes issues with non-textual content like images or hyperlinks.
LocBench (Python): 559 issues from 164 Python repositories, including diverse issue types such as bug reports, feature requests, security issues, and performance problems; one issue from the inaccessible repository NCSU-High-Powered-Rocketry-Club/AirbrakesV2 was removed.
Multi-SWE-bench (Java): 128 issues from 9 open-source Java projects, constructed via a rigorous pipeline involving repository selection based on stars and runnability, followed by dual annotation and cross-review to ensure high-quality, human-verified ground truth.
Ground-truth locations are re-extracted using the RDFS (Repository Dependency Fractal Structure) at three levels: file-level (file paths), module-level (enclosing class, enum, or interface), and function-level (containing function/method), ensuring fine-grained and consistent supervision.
The model is trained and evaluated using a mixture of these datasets, with training splits derived from the full issue collections and balanced mixture ratios applied to reflect diverse issue types and programming languages.
All datasets are processed to extract clean, text-based inputs and structured metadata, with diff patches used to define ground-truth locations at multiple granularities. No image or external content is included in the final training or evaluation data.

Method

The authors leverage a hybrid, LLM-driven framework named GraphLocator for function-level issue localization, which operates through two distinct phases: symptom vertices locating and dynamic CIG discovering. The overall architecture is designed to address symptom-to-cause and one-to-many mismatches by explicitly modeling causal dependencies within a repository's structural context. The framework begins by constructing a Repository Dependency Fractal Structure (RDFS), a heterogeneous attributed graph that captures both hierarchical and dependency relationships among code entities across four distinct layers of granularity. This RDFS serves as the foundational knowledge base for the entire process.

In the first phase, GraphLocator employs a SearchAgent, an LLM-driven agent, to identify symptom vertices within the RDFS that correspond to the issue description. This agent iteratively invokes a set of graph-executable search tools to autonomously locate relevant code entities. The search tool set includes search_vertex, which retrieves vertices based on name and type constraints with support for wildcard matching and fuzzy string search, and search_edge, which identifies edges between vertices based on specified relational constraints, enabling the discovery of dependencies. The agent uses prompt-based semantic reasoning to filter the top-k candidates, ensuring only the most semantically aligned vertices are retained. This phase concludes when the finish tool is invoked, returning the set of symptom vertices to the next phase.

The RDFS itself is constructed by parsing the codebase into an AST using tree-sitter, extracting code entities, and establishing a hierarchical skeleton with HasMember edges. Static analysis is then used to identify semantic dependencies, such as imports and function calls, which are encoded as ImportedBy, ExtendedBy, and UsedBy edges. A lazy-loading strategy is employed to build the graph incrementally, ensuring efficiency and facilitating updates.
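The two search tools described above might look roughly like the following sketch. The tool names search_vertex and search_edge and the edge labels come from the text; everything else is an assumption for illustration: the `RepoGraph`, `Vertex`, and `Edge` classes, the parameter names, the edge direction (source is used by target), and the use of the standard-library `fnmatch` and `difflib` modules for wildcard and fuzzy matching. The graph is built by hand here rather than by tree-sitter.

```python
from dataclasses import dataclass
from difflib import get_close_matches
from fnmatch import fnmatchcase

@dataclass(frozen=True)
class Vertex:
    name: str
    kind: str  # e.g. "file", "class", "function"

@dataclass(frozen=True)
class Edge:
    src: str
    dst: str
    relation: str  # "HasMember", "ImportedBy", "ExtendedBy", or "UsedBy"

class RepoGraph:
    """Toy stand-in for the repository graph (hypothetical structure)."""

    def __init__(self, vertices, edges):
        self.vertices = {v.name: v for v in vertices}
        self.edges = list(edges)

    def search_vertex(self, pattern, kind=None, top_k=5, cutoff=0.6):
        """Wildcard match on the name first; fall back to fuzzy matching."""
        names = [v.name for v in self.vertices.values() if kind is None or v.kind == kind]
        matched = [n for n in names if fnmatchcase(n, pattern)]
        if not matched:
            matched = get_close_matches(pattern, names, n=top_k, cutoff=cutoff)
        return [self.vertices[n] for n in matched[:top_k]]

    def search_edge(self, src=None, dst=None, relation=None):
        """Return edges that satisfy every constraint that is not None."""
        return [
            e for e in self.edges
            if (src is None or e.src == src)
            and (dst is None or e.dst == dst)
            and (relation is None or e.relation == relation)
        ]

graph = RepoGraph(
    vertices=[
        Vertex("cart.py", "file"),
        Vertex("Cart", "class"),
        Vertex("Cart.total", "function"),
        Vertex("format_price", "function"),
    ],
    edges=[
        Edge("cart.py", "Cart", "HasMember"),
        Edge("Cart", "Cart.total", "HasMember"),
        Edge("format_price", "Cart.total", "UsedBy"),  # assumed direction: src is used by dst
    ],
)

wildcard_hits = graph.search_vertex("Cart.*", kind="function")
fuzzy_hits = graph.search_vertex("format_prise")  # misspelled on purpose
users = graph.search_edge(dst="Cart.total", relation="UsedBy")
print(wildcard_hits, fuzzy_hits, users)
```

In the approach described here, an LLM agent decides which of these calls to issue and then filters the returned candidates semantically; the sketch covers only the graph-side lookup.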

In the second phase, GraphLocator incrementally constructs a Causal Issue Graph (CIG) starting from the symptom vertices identified in the first phase. The CIG is a directed graph where vertices represent sub-issues derived from the issue description and grounded to RDFS vertices, and edges represent probabilistic causal dependencies between these sub-issues. The construction process is guided by a priority-driven expansion strategy that ensures causal coherence. At each step, the sub-issue with the highest priority score, defined as $\Psi(x) = 1 - \prod_{(x, y) \in \mathcal{Y}} \left(1 - \psi(x, y)\right)$, is selected for expansion. This score reflects the potential of a sub-issue to causally influence other issues, with $\psi(x, y)$ being the LLM-estimated probability that $x$ causes $y$.

The expansion is performed by a CausalAgent, which uses graph-guided abductive reasoning. Given a sub-issue to expand, the agent identifies its direct causal candidates as neighboring code vertices in the RDFS that have not yet been visited. A structured prompt is then constructed for the LLM, containing the issue description, the serialized CIG from the previous iteration in Mermaid format, the target sub-issue, and the list of newly observed nodes. This prompt enables the LLM to evaluate whether each candidate constitutes a plausible cause or intermediary, thereby updating the CIG with new sub-issues and causal edges.

This iterative process continues until a maximum number of turns is reached, resulting in a final CIG that explicitly models the causal chain from symptoms to root causes. The final output of the framework is the set of code entities that require modification, derived from the RDFS vertices that are grounded to the sub-issues in the final CIG.
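The priority score has the form of a noisy-OR: it equals the probability that $x$ causes at least one of its effects if the edge probabilities are treated as independent. A minimal sketch of the selection step follows. The dictionary `psi`, the sub-issue identifiers, and the helper names `priority` and `next_to_expand` are hypothetical, and the probabilities are made-up values standing in for LLM estimates.

```python
import math

def priority(x, psi):
    """Psi(x) = 1 - prod(1 - psi(x, y)) over the causal edges (x, y) leaving x."""
    return 1.0 - math.prod(1.0 - p for (cause, _effect), p in psi.items() if cause == x)

def next_to_expand(frontier, psi):
    """Pick the unexpanded sub-issue with the highest priority score."""
    return max(frontier, key=lambda x: priority(x, psi))

# (cause, effect) -> estimated probability that cause leads to effect.
psi = {
    ("B", "A"): 0.5,
    ("B", "C"): 0.5,
    ("D", "A"): 0.7,
}

print(priority("B", psi))  # 1 - 0.5 * 0.5 = 0.75
print(next_to_expand(["A", "B", "D"], psi))  # "B": two moderate edges outrank one edge of 0.7
```

A sub-issue with no outgoing causal edges gets a score of 0 because the empty product is 1. How the symptom vertices are scored before any edges exist is not specified in this summary, so the sketch leaves that case open.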

Experiment

RQ1 (Effectiveness): GraphLocator outperforms baselines across Python (SWE-bench Lite, LocBench) and Java (Multi-SWE-bench Java) datasets, achieving up to +21.33% F1-score at function level with GPT-4o and +16.79% with Claude-3.5. It significantly improves precision (e.g., nearly 4× higher than LocAgent at function level with Claude-3.5) by leveraging causal graph-guided reasoning, while maintaining high recall. On SWE-bench Lite and LocBench, it improves F1 by 15.68–21.32% at function level over SWERank-Large, demonstrating the superiority of causal reasoning over similarity-based retrieval.
RQ2 (Generalizability): GraphLocator maintains superior performance under increasing task complexity. On symptom-to-cause distance, it achieves the highest recall (95.59% at distance 0) and consistently outperforms baselines, especially in long-distance cases. On multi-function issues, it shows the most robust balance between recall and precision, outperforming Agentless and LocAgent, which degrade sharply with increasing function count, and CoSIL, which sacrifices recall for precision.
RQ3 (Ablation): Removing any key component (search_vertex, search_edge, the priority queue, or CIG guidance) significantly degrades performance. Function-level F1 drops from 30.91% to 14.23% without search_vertex, and precision and recall decline without CIG guidance or priority-driven expansion, confirming each component’s critical role in effective localization.
RQ4 (Cost): GraphLocator achieves efficient graph construction, outperforming LocAgent in time when repositories exceed 20 functions due to lazy loading. It reduces token consumption by 52.9% (GPT-4o) and 32.3% (Claude-3.5) compared to LocAgent, cutting cost by up to 43.5%, while maintaining strong accuracy.
Downstream Impact: Integrating GraphLocator into Trae Agent and Agentless improves issue resolved rates, with Mermaid-serialized CIGs providing the highest gains (e.g., +5.67% on SWE-bench Lite), confirming that structured, causal context enhances downstream reasoning.

The authors evaluate GraphLocator's impact on downstream issue resolving by integrating it with two frameworks, Agentless and Trae Agent, and measuring the resolved rate on SWE-bench Lite and Multi-SWE-bench Java. Results show that GraphLocator consistently improves resolved rates compared to baselines, with the highest improvement observed when using the structured CIG representation, indicating that explicit causal relationships enhance the ability of LLMs to generate correct fixes.
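To make the structured CIG representation concrete, the sketch below serializes a small causal issue graph into Mermaid flowchart syntax. The exact schema the authors use is not given in this summary, so the node label format, the probability edge labels, the function name `cig_to_mermaid`, and the sample sub-issues are all assumptions.

```python
def cig_to_mermaid(sub_issues, causal_edges):
    """Serialize a CIG as a Mermaid flowchart (hypothetical schema).

    sub_issues:   node id -> (sub-issue description, grounded code entity)
    causal_edges: (cause id, effect id) -> estimated causal probability
    Descriptions must not contain double quotes in this simple sketch.
    """
    lines = ["graph TD"]
    for node_id, (description, entity) in sub_issues.items():
        lines.append(f'    {node_id}["{description} ({entity})"]')
    for (cause, effect), prob in causal_edges.items():
        lines.append(f"    {cause} -->|{prob:.2f}| {effect}")
    return "\n".join(lines)

sub_issues = {
    "s1": ("Total shows wrong currency", "Cart.total"),
    "s2": ("Formatter ignores locale", "format_price"),
}
causal_edges = {("s2", "s1"): 0.8}

mermaid_text = cig_to_mermaid(sub_issues, causal_edges)
print(mermaid_text)
```

The resulting text can be placed directly in a prompt, which is one plausible way a downstream resolving agent could receive the causal context.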

Results show that GraphLocator consistently outperforms all baseline methods across both Python and Java datasets, achieving the highest F1 scores and maintaining superior precision and recall as symptom-to-cause distance increases and the number of ground-truth functions grows. The performance gap widens at finer granularities, indicating that GraphLocator's graph-guided causal reasoning is particularly effective in complex, multi-step localization tasks.

Results show that GraphLocator consistently outperforms all baseline approaches across all datasets and granularity levels, achieving the highest F1 scores at file, module, and function levels. It significantly improves precision, particularly at the function level, while maintaining high recall, demonstrating the effectiveness of its graph-guided causal reasoning.
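For reference, precision, recall, and F1 for a single issue can be computed from the predicted and ground-truth location sets as in the sketch below. Whether the reported figures are averaged per issue or pooled across issues is not stated in this summary, so the function `precision_recall_f1` and the example sets are illustrative only.

```python
def precision_recall_f1(predicted: set[str], ground_truth: set[str]) -> tuple[float, float, float]:
    """Set-based precision, recall, and F1 for one issue at one granularity."""
    true_positives = len(predicted & ground_truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

# Four functions predicted, three actually modified by the fix, two in common.
predicted = {"cart.py:Cart.total", "cart.py:helper", "cart.py:Cart.clear", "util.py:log"}
ground_truth = {"cart.py:Cart.total", "cart.py:helper", "fmt.py:format_price"}

precision, recall, f1 = precision_recall_f1(predicted, ground_truth)
print(precision, recall, f1)  # 0.5, 2/3, 4/7
```

The same function applies at file and module level by passing file paths or class identifiers instead of function identifiers.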

Results show that GraphLocator achieves significantly lower token consumption and cost compared to LocAgent, using 52.9% fewer input tokens and reducing cost by 43.5% under GPT-4o, while maintaining competitive performance. It also consumes fewer tokens than Agentless and CoSIL, demonstrating a favorable trade-off between efficiency and accuracy.

The authors use an ablation study to evaluate the contribution of each component in GraphLocator. Results show that removing any key component, such as search_vertex, search_edge, the priority queue, or CIG-guided reasoning, significantly degrades performance across all levels of granularity, indicating that each component is essential for effective issue localization.


Wei Liu Chao Peng Pengfei Gao Aofan Liu Wei Zhang Haiyan Zhao Zhi Jin


GraphLocator: Graph-guided Causal Reasoning for Issue Localization
