Reasoning with Structures for Large Language Models

Jiawei Han

Siebel School of Computing and Data Science

University of Illinois at Urbana-Champaign

August 3, 2025


Outline

  • Reasoning with knowledge structures
  • StructRAG: Boosting Knowledge Intensive Reasoning with Hybrid Information
  • KARE: A Knowledge Aware Reasoning-Enhanced Framework
  • RepoGraph: Enhancing AI Software Engineering with Repository-level Coding Graph
  • SARG: Structure-Augmented Reasoning Generation
  • Aspect-based Reasoning Structure Extraction
  • Sequence-based Reasoning Structure Extraction
  • Looking forward: Multi-Structure-Augmented Reasoning Generation


Why Is Theme-Specific Knowledge Graph a Critical Structure?

  • LLMs: Power and limitations
    • LLMs are trained on massive general data
    • But specific problem solving often requires knowledge that is deep and current
    • Theme-specific knowledge should be obtained by task-specific retrieval
    • Will RAG (retrieval-augmented generation) be sufficient for complex reasoning/problem solving?
  • Structures can help problem solving
    • Structures have long helped human learning, understanding, reasoning, and discovery
    • General KGs can be too general to help solve specific problems
    • We need theme-specific and task-specific structures/graphs
  • Research questions
    • How can we construct theme-specific and task-specific graphs automatically?
    • How can we use such graphs efficiently for structure-augmented LLM reasoning?


Empowering LLMs: Prompting, Fine-Tuning, RAG & Structuring

    • Prompt Engineering
      • Requires little model modification and little external knowledge, focusing on harnessing the capabilities of LLMs themselves
    • Fine-tuning: Involves further training of the model
    • RAG: Integrates external knowledge
    • Active Research Directions
      • Structured Retrieval
      • RAG + Structuring
      • Fine-tuning + RAG + Structuring

Figures adapted from Y. Gao et al., "Retrieval-Augmented Generation for Large Language Models: A Survey", arXiv:2312.10997.

O. Ovadia et al. (2023), "Fine-tuning or Retrieval? Comparing Knowledge Injection in LLMs", arXiv:2312.05934.

Open questions: Retrieval and structuring? Fine-tuning + retrieving + structuring?


A Retrieving-Structuring-Reasoning Framework

[Framework diagram: Retrieving (query/task-guided, theme-focused information retrieval over text & multimodal data and a general KB, yielding selected, distilled, relevant documents) → Structuring (task-specific structure mining and graph construction into multiple theme- or function-specific knowledge graphs, e.g., causal graphs, event structures, aspect graphs) → Reasoning (task- and structure-based augmentation for LLM generation, answering the user query/task with quality reasoning)]


StructRAG: Motivation and Methodology

Li et al., "StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information", ICLR 2025.

  • How can we better leverage LLMs to transform scattered information into various structured formats?
  • Hybrid information structuring mechanism: different tasks require different knowledge structure representations for more precise reasoning
  • Hybrid Structure Router: selects the optimal structure type from five candidate structure types
  • Scattered Knowledge Structurizer: extracts the textual knowledge scattered across raw documents and reconstructs it in the chosen format
  • Structured Knowledge Utilizer: an LLM-based knowledge utilizer that facilitates question decomposition, precise knowledge extraction, and final answer inference
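As a toy illustration of the router step, the sketch below stands in for the LLM-based Hybrid Structure Router with an invented keyword heuristic; the five candidate type names follow StructRAG, but the cue lists are hypothetical:

```python
# Hypothetical keyword-based stand-in for StructRAG's LLM router.
STRUCTURE_TYPES = ["table", "graph", "algorithm", "catalogue", "chunk"]

def route(question: str) -> str:
    """Pick a structure type from keyword cues (toy heuristic, not the paper's LLM router)."""
    cues = {
        "table": ["compare", "statistics", "how many"],
        "graph": ["relationship", "connected", "depends"],
        "algorithm": ["steps", "procedure", "how to"],
        "catalogue": ["enumerate", "overview", "taxonomy"],
    }
    q = question.lower()
    for stype, words in cues.items():
        if any(w in q for w in words):
            return stype
    return "chunk"  # fall back to plain retrieved chunks

assert route("Compare revenue statistics across the three firms") == "table"
assert route("What is the relationship between inflation and unemployment") == "graph"
```

The real router scores all five types with an LLM; this sketch only shows the decision interface.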


StructRAG: Experiments and Analyses


Do We Need Knowledge Graphs for LLM Reasoning?

  • KARE: Knowledge Aware Reasoning-Enhanced Framework
    • Improve healthcare predictions with retrieval and LLM reasoning
    • Integrate knowledge graph (KG) community-level retrieval with LLM reasoning to enhance healthcare predictions
  • Three steps
    • Medical concept knowledge graph construction and indexing
      • A dense medical knowledge structuring approach enables accurate retrieval of relevant information
    • Patient context construction and augmentation
      • A dynamic knowledge retrieval mechanism enriches patient contexts with focused, multi-faceted medical insights
    • Reasoning-enhanced precise healthcare prediction
      • A reasoning-enhanced prediction framework leverages these enriched contexts to produce both accurate and interpretable clinical predictions

Pengcheng Jiang, Cao Xiao, Minhao Jiang, Parminder Bhatia, Taha Kass-Hout, Jimeng Sun, Jiawei Han, "Reasoning-Enhanced Healthcare Predictions with Knowledge Graph Community Retrieval", Int. Conf. on Learning Representations (ICLR 2025)


KARE: The General Framework


Step 1: Medical Concept Knowledge Graph Construction and Indexing (1)

  • Constructs a comprehensive medical concept knowledge graph by integrating information from multiple sources, organizing it into a hierarchical community structure
    • Allows for the generation of community summaries that facilitate precise knowledge retrieval
  • For each medical concept c_i in the EHR system, extract a c_i-specific KG G_{c_i} = (V_{c_i}, E_{c_i}) from 3 sources:
    • Biomedical KG (e.g., UMLS)
    • Biomedical corpus (e.g., PubMed)
    • LLMs: Prompt the LLM to identify relationships among the concepts that are helpful for clinical prediction
      • Allow the LLM to add intermediate relationships between two concepts
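A minimal sketch of merging triples from the three sources into one concept-keyed graph; the function name, edge schema, and example triples below are illustrative, not KARE's actual code:

```python
from collections import defaultdict

def build_concept_kg(triples_by_source):
    """Union (head, relation, tail) triples from all sources into one
    edge set per head concept, keeping per-edge provenance."""
    kg = defaultdict(set)
    for source, triples in triples_by_source.items():
        for h, r, t in triples:
            kg[h].add((r, t, source))
    return kg

sources = {
    "UMLS": [("hypertension", "risk_factor_for", "stroke")],
    "PubMed": [("hypertension", "treated_by", "lisinopril")],
    "LLM": [("lisinopril", "may_cause", "cough")],  # LLM-suggested intermediate link
}
kg = build_concept_kg(sources)
assert ("risk_factor_for", "stroke", "UMLS") in kg["hypertension"]
```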


Step 2: Patient Context Construction and Augmentation

  • Base Context Construction: For a patient p, construct a base context: (1) task description, (2) the patient’s conditions, procedures, and medications, and (3) similar patients: one has the same label as patient p and the other has a different label
  • Context Augmentation: Enrich p’s base context with relevant info from the KG and select the most relevant summaries for context augmentation

[Figure: patient base context and its KG-based augmentation, ensuring the augmented context includes the most relevant and diverse info from the KG, tailored to the patient's specific conditions and the prediction task]
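A rough sketch of the augmentation step, using word-overlap (Jaccard) similarity as a cheap stand-in for KARE's retrieval over community summaries; all names and example texts here are invented:

```python
def augment_context(patient_terms, summaries, k=2):
    """Rank KG community summaries by Jaccard overlap with the patient's
    concepts and keep the top-k for context augmentation (toy scorer)."""
    p = set(t.lower() for t in patient_terms)
    def jaccard(s):
        w = set(s.lower().split())
        return len(p & w) / len(p | w)
    return sorted(summaries, key=jaccard, reverse=True)[:k]

summaries = [
    "community: hypertension and stroke risk factors",
    "community: bone fracture and orthopedic procedures",
]
top = augment_context(["hypertension", "stroke"], summaries, k=1)
assert top == [summaries[0]]
```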


Step 3: Reasoning-Enhanced Precise Healthcare Prediction

  • Training Sample Generation: Generate reasoning chains in a unified format for each patient p and task τ
    • Given (1) the task description, (2) the augmented patient context, and (3) the corresponding ground-truth label
    • The LLM generates K reasoning chains along with confidence levels
    • We select the reasoning chain with the highest confidence, ensuring that only the most reliable explanations are used
  • Multitask-based Fine-tuning and Prediction
    • We fine-tune a relatively small local LLM (7B) to perform both reasoning chain generation and label prediction
    • The model is trained using (i) task description and (ii) the augmented patient context, with a prepended instruction
    • Prediction: Given a new patient and task, we provide the appropriate instruction to the fine-tuned model to generate the reasoning chain or predict the label
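The chain-selection rule above can be sketched in a few lines; the dictionary schema is hypothetical:

```python
def select_chain(chains):
    """Of the K generated chains, keep only the one with the highest
    self-reported confidence (toy version of the selection step)."""
    return max(chains, key=lambda c: c["confidence"])

chains = [
    {"text": "chain A", "confidence": 0.62},
    {"text": "chain B", "confidence": 0.91},
    {"text": "chain C", "confidence": 0.75},
]
assert select_chain(chains)["text"] == "chain B"
```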


Experiment Setting: Task, Data and Metrics

  • Tasks: EHR-based prediction
    • Mortality Prediction: Estimates the mortality outcome for the next visit (the patient’s survival status during visit x_t)
    • Readmission Prediction: Predicts whether the patient will be readmitted within σ days (σ = 15 in this study)
  • Datasets: Use the publicly available MIMIC-III (v1.4) and MIMIC-IV (v2.0) EHR datasets
    • Use PyHealth (Yang et al., 2023a) for preprocessing, …
  • Evaluation Metrics: Four key metrics:
    • Accuracy: Overall correct predictions across both outcomes
    • Macro-F1: A balanced measure, crucial for the imbalanced datasets
    • Sensitivity: Model’s ability to identify patients at risk of mortality or readmission
    • Specificity: Identify patients unlikely to experience these outcomes, helping avoid unnecessary measures


Performance Comparison on MIMIC-III Dataset

Results are averaged over multiple runs. Asterisk (∗): metrics that are important for handling imbalanced datasets.


RepoGraph: Background and Motivation

  • Real-world software engineering often extends beyond single functions or self-contained code files:
    • navigating complex, structured code bases
    • understanding intricate dependencies between code files
    • ensuring that changes integrate seamlessly without introducing new issues

Ouyang et al., "RepoGraph: Enhancing AI Software Engineering with Repository-level Coding Graph", ICLR 2025

A perfect testbed for retrieval and structuring (RAS) in the engineering domain!


RepoGraph: Methodology

  • Graph construction comprises three steps: (1) code line parsing using static analysis tools; (2) project-dependent relation filtering; and (3) graph organization
  • RepoGraph integrates with both procedural and agent frameworks, making it versatile
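A toy sketch of repository-level graph construction, using Python's ast module to link each function definition to the names it calls; RepoGraph's actual construction is line-level and uses dedicated static analysis tools, so treat this only as an illustration:

```python
import ast

def build_repo_graph(files):
    """Map each function definition to the set of function names it calls
    (a coarse, file-level caricature of a repository code graph)."""
    edges = {}
    for path, src in files.items():
        tree = ast.parse(src)
        for node in ast.walk(tree):
            if isinstance(node, ast.FunctionDef):
                calls = {n.func.id for n in ast.walk(node)
                         if isinstance(n, ast.Call) and isinstance(n.func, ast.Name)}
                edges[f"{path}:{node.name}"] = calls
    return edges

files = {"a.py": "def f():\n    return g()\n\ndef g():\n    return 1\n"}
graph = build_repo_graph(files)
assert graph["a.py:f"] == {"g"}
```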


RepoGraph: Experiments and Analyses

  • RepoGraph brings consistent performance gains across all combinations of frameworks and LLM backbones
  • The gain from RepoGraph is slightly larger for procedural frameworks than for agent-based ones
  • The gain from RepoGraph does not come at higher cost
  • The context included by RepoGraph is comprehensive
  • Nodes and edges grow exponentially as k increases, and flattening the graph increases the token count: a trade-off between context comprehensiveness and the LLM's ability to process it

Recall improves at all granularities; the improvement at finer granularities is relatively smaller.

  • RepoGraph also brings significant benefit to open-source LLMs on traditional coding tasks


Why SARG: Structure-Augmented Reasoning Generation?

  • Standard RAG treats evidence as flat context, lacking the structure required to model true causal dependencies
  • Classical causal inference focuses on statistical associations and is not well equipped to extract causal narratives from unstructured, implicit text across documents
  • Structure-Augmented Reasoning Generation applies zero-shot triple extraction and theme-aware graph chaining to the retrieved text, enabling structured multi-hop inference
  • Given a domain-specific corpus, it constructs a DAG of ⟨cause, relation, effect⟩ triples and uses forward/backward chaining to guide structured answer generation
  • Experiments on two real-world domains: Bitcoin price (BP) & Gaucher disease (GD)
    • SARG outperforms standard RAG and zero-shot LLMs on chain similarity, information density, lexical diversity, LLM-as-a-Judge, and human evaluations
  • Explicitly modeling causal structure enables LLMs to generate more accurate and interpretable responses, especially in specialized domains where flat retrieval fails

Jash Parekh, Pengcheng Jiang, Jiawei Han, "Structured Multi-Hop Augmented Reasoning Generation", arXiv:2508


The SARG Framework

  • SARG: Structuring and Construction of Reasoning Graphs for LLM Reasoning Generation

From a domain-specific dataset, an LLM extracts zero-shot causal triples, which are structured into a DAG. Given a query, the system identifies semantic matches, performs forward or backward traversal to extract causal chains, and generates a justification-based answer using an LLM.


Zero-Shot Extraction of Causal Triples

  • SARG uses GPT-4o to extract both explicit and implicit ⟨cause, relation, effect⟩ triples from unstructured text, without requiring labeled training data
    • Cause refers to an entity, event, or action that triggers an outcome, even if the causal connection is not explicitly stated
    • Relation is a causal verb or phrase (e.g., caused, led to, resulted in, triggered, influenced), or an inferred connection that is understood contextually
    • Effect represents the resulting entity, event, or action, regardless of whether the causal relationship is directly stated in the text

Zero-Shot Causal Triple Extraction

  • G represents the constructed knowledge graph
  • paths capture multi-hop inference trajectories


SARG Methodology: Graph-Based Multi-Hop Reasoning

  • Entity Extraction and Graph Construction
    • For each document d_i ∈ D, we extract a set of knowledge triples T_i using zero-shot prompting:
      • Each triple t = ⟨c, r, e⟩ ∈ T_i represents a directed relationship where c and e are entities and r is the relation type
      • The complete set of extracted triples forms a directed knowledge graph G
  • Entity Clustering: Cluster semantically similar entities with SentenceBERT embeddings
    • Reduces graph fragmentation and improves reasoning-chain connectivity
  • Semantic Node Matching: To identify relevant starting points for graph traversal, perform semantic matching between query terms and graph nodes
  • Graph-Based Multi-Hop Traversal
    • Direction classification: {forward, backward, bi-directional}
    • Path discovery (by depth-first traversal)
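The multi-hop traversal can be illustrated with a plain depth-first search over a triple graph; the example graph, relation labels, and function name are invented for illustration:

```python
def find_chains(graph, start, end, max_hops=4):
    """Depth-first search for cause -> effect paths in a triple graph.
    graph maps a node to a list of (relation, neighbor) pairs."""
    chains, stack = [], [(start, [start])]
    while stack:
        node, path = stack.pop()
        if node == end:
            chains.append(path)
            continue
        if len(path) > max_hops:
            continue
        for _rel, nxt in graph.get(node, []):
            if nxt not in path:  # avoid cycles
                stack.append((nxt, path + [nxt]))
    return chains

g = {"rate hike": [("reduces", "liquidity")],
     "liquidity": [("drives", "bitcoin price")]}
assert find_chains(g, "rate hike", "bitcoin price") == [["rate hike", "liquidity", "bitcoin price"]]
```

Forward chaining starts from the matched cause node; backward chaining would run the same search over reversed edges.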


SARG: Chain Ranking and LLM-Guided Answer Generation

  • Chain Ranking and Selection: All reasoning chains are scored using semantic similarity between the original query and the aggregated evidence
    • This semantic scoring ensures that selected chains are both on-topic and coherent in supporting the user’s question
    • The top-k chains are selected for final generation
  • LLM-Based Answer Generation: Synthesize the selected reasoning chains into a coherent response
    • Evidence Compilation: For each selected chain c, compile supporting evidence by retrieving the original text snippets that produced each triple, creating structured evidence packages E(c) with source traceability
    • Prompt Construction: P_gen = P_inst ⊕ P_query ⊕ P_chains ⊕ P_evidence
      • where ⊕ denotes concatenation, combining task instructions, the original query, serialized reasoning chains, and compiled evidence
  • Response Generation: The final response is generated as r = LLM_gen(P_gen)
    • The model is instructed to maintain logical coherence with the provided reasoning chains while producing natural, human-readable text
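A minimal sketch of chain ranking and prompt assembly, with word overlap standing in for the embedding-based semantic similarity and plain string concatenation for ⊕; all names are illustrative:

```python
def rank_chains(query, chains, k=2):
    """Score each chain by word overlap with the query (a cheap stand-in
    for semantic similarity) and keep the top-k."""
    q = set(query.lower().split())
    def score(chain):
        words = set(" ".join(chain).lower().split())
        return len(q & words) / max(len(q | words), 1)
    return sorted(chains, key=score, reverse=True)[:k]

def build_prompt(instructions, query, chains, evidence):
    """P_gen = P_inst + P_query + P_chains + P_evidence (concatenation)."""
    serialized = "\n".join(" -> ".join(c) for c in chains)
    return "\n\n".join([instructions, query, serialized, evidence])

chains = [["rate hike", "liquidity", "bitcoin price"], ["halving", "supply"]]
top = rank_chains("why did the bitcoin price fall after the rate hike", chains, k=1)
assert top == [chains[0]]
prompt = build_prompt("Answer with justification.", "why did the price fall?", top, "snippet 1; snippet 2")
assert "rate hike -> liquidity -> bitcoin price" in prompt
```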


LLM-Powered Output Generation with Justification

  • Case study comparing SARG with expert-annotated triples for answering a biomedical question
  • SARG successfully reconstructs a multi-hop pathway that the human-annotated KG fails to recover

SARG’s key advantages:

  • Plug-and-Play Compatibility: integration into any existing RAG pipeline
  • Interpretability: graph and chain selection provides clear reasoning visibility
  • Domain Adaptability: zero-shot triple extraction; no domain-specific training


Performance Comparison: SARG vs. RAG vs. Zero-Shot

  • BERTScore (Chain similarity): semantic similarity using contextual embeddings from RoBERTa-large
  • Conciseness (Info. density): Ratio of content words to total words, scaled by the inverse log of response length
  • FactCC (Factual Consistency): Ensure that generated answers remain grounded in the retrieved corpus

Data Statistics: BP (Bitcoin Price) and GD (Gaucher Disease)

Automatic Evaluation: SARG vs. RAG vs. Zero-Shot


Evaluation of Summarization Quality by LLM and Human

  • LLM-as-a-Judge: A blinded LLM-as-a-Judge evaluation
    • Using a panel of four LLMs: GPT-4, GPT-4o, LLaMA 3.1-8B-Instruct, and Mistral-7B-Instruct
    • Each judge model was prompted with a fixed template and shown anonymized answers from all systems. Models were asked to select the best response based on accuracy, interpretability, and conciseness
    • Use majority voting to determine the preferred answer
  • Human Evaluation: Aggregate the preferences of three independent reviewers for each form to assess performance

Accuracy on a HotPotQA hard-100 subset

Human evaluation results showing preferred responses across BP and GD datasets. #s indicate votes received out of total questions evaluated.


Aspect-based Reasoning Structure Extraction

  • Reasoning structures often need to incorporate multiple aspects
    • Scientific claims are often nuanced: they do not have a clear “yes” or “no” answer
      • E.g., “The Pfizer vaccine is better than Moderna”
    • Need to break down such claims into specific aspects
  • Identify which aspects have been explored within a scientific corpus, and which have scientific consensus behind them (or lack thereof)

Priyanka Kargupta, Runchu Tian, Jiawei Han, "Beyond True or False: Retrieval-Augmented Hierarchical Analysis of Nuanced Claims", ACL 2025


Framework of ClaimSpect: Hierarchical Aspect Discovery

  • Coarse-Grained Aspect Discovery: Directly prompt LLM to generate the coarse-grained aspects
    • Ex. Vaccine → Efficacy, Safety, Immunogenicity, Cost+Accessibility, Manufacturing+Distribution
  • Find relevant, diverse, specific keywords for each aspect by label-guided retrieval
    • Ex. Safety → adverse effects, anaphylaxis, immune response, …
  • Corpus Segment Ranking: Find relevant, diverse, specific chunks with discriminative ranking
  • Sub-aspect Discovery: Use highly ranked corpus chunks to discover sub-aspects
    • Ex. Safety → safety for children, safety for the elderly, …
  • Hierarchical Segment Classification:
    • Ex. TELEClass (WWW’25)
  • Perspective Discovery: Determine the scientific consensus (or lack thereof) behind a given aspect
    • Ex. Stance detection with statistics
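The perspective-discovery step can be caricatured as a stance tally per aspect; the 70% consensus threshold below is a hypothetical choice for illustration, not ClaimSpect's:

```python
from collections import Counter

def consensus(stances, threshold=0.7):
    """Summarize per-aspect stance labels into a consensus estimate:
    return the majority stance if it clears the threshold, else report
    the lack of consensus."""
    tally = Counter(stances)
    total = sum(tally.values())
    label, n = tally.most_common(1)[0]
    return label if n / total >= threshold else "no consensus"

assert consensus(["support"] * 8 + ["oppose"] * 2) == "support"
assert consensus(["support", "oppose", "oppose", "support"]) == "no consensus"
```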


ClaimSPECT: Performance Comparison

  • Comparison between ClaimSpect and all baselines
  • Pairwise comparisons between all methods for each dataset

Incon (Inconsistent): when the positions of the methods are flipped in the prompt, the opposite conclusion is drawn

Dataset statistics in experiments

Prompt for generating nuanced claims

Task: Generate 10 nuanced and diverse claims based on this corpus. The claims should adhere to the following criteria:

Diversity: The claims should be sufficiently varied

Complexity: The claims should be complex and controversial (and not necessarily true) …

Research Feasibility: The claims should not be too specific and should pertain to topics ...

Concision: The claims should be concise and focused in one short sentence

Completeness: The claims should be complete and not require additional context to understand.

Output: Provide the claims as a list.


ClaimSPECT: Case Study

  • The perspectives mapped to the root node are informative, providing justification behind each stance.
  • ClaimSpect maps segments to each perspective → can identify the original paper sources and ultimately provide a corpus-specific estimate of the consensus


Synergizing Unsupervised Episode Detection with LLMs

  • Episodes are the most interpretable granularity for evolving events
    • Most event detection and analysis is either too coarse-grained or too fine-grained
  • Existing methods focus on document-level key events or phrase-level actions; as events evolve, humans typically comprehend them at the episode level

Priyanka Kargupta, Yunyi Zhang, Yizhu Jiao, Siru Ouyang, Jiawei Han, "Synergizing Unsupervised Episode Detection with LLMs for Large-Scale News Events", ACL 2025


Challenges of Mining Unsupervised Episodes with LLM

  • Challenge 1: No clear timestamps
    • Episodes occur within articles and lack explicit temporal markers; their order is implied by narrative sequence
    • Writers tend to naturally partition articles by episode (keeping each episode’s content together)
  • Challenge 2: Semantically diverse
    • Actions within an episode may look very different
  • Challenge 3: Articles only cover partial events
    • Need to merge overlapping episode fragments to reconstruct full episodes
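Challenge 3's fragment merging can be sketched as a greedy overlap merge; the term sets and the min_overlap threshold are invented for illustration:

```python
def merge_fragments(fragments, min_overlap=2):
    """Greedily merge episode fragments (sets of salient terms) that share
    at least `min_overlap` terms, reconstructing fuller episodes."""
    episodes = []
    for frag in fragments:
        for ep in episodes:
            if len(ep & frag) >= min_overlap:
                ep |= frag  # absorb the overlapping fragment
                break
        else:
            episodes.append(set(frag))  # start a new episode
    return episodes

frags = [{"march", "protest", "june"},
         {"protest", "june", "arrests"},
         {"bill", "withdrawal"}]
eps = merge_fragments(frags)
assert len(eps) == 2
```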


EpiMine: Unsupervised Episode Detection

  • EpiMine: term mining, segment-level partitioning, LLM-enhanced episode estimation, and classification


EpiMine: Experiments and Performance Comparison

EpiMine: Data Statistics

Results are averaged across each theme (the mean # of episodes that EpiMine identifies per theme is in parentheses). Results are computed on each key event corpus using the top-5 documents for each detected episode. We run EpiMine 10 times and report the average of each measure.


EpiMine: Case Study

Gold and detected episodes (a max. of five are included for brevity) for the “2019 Hong Kong Legislative Protests” key event


Looking forward: Graph Mining & Structure-Guided LLM Generation

[Framework diagram: Retrieving (query/task-guided, theme-focused information retrieval over text & multimodal data and a general KB, yielding selected, distilled, relevant documents) → Structuring (task-specific structure mining and graph construction into multiple theme- or function-specific knowledge graphs, e.g., causal graphs, event structures, aspect graphs) → Reasoning (task- and structure-based augmentation for LLM generation, answering the user query/task with quality reasoning)]

Data mining could be an important step for LLMs!


References for Part 4: “Reasoning with Structures for LLMs”

  • Pengcheng Jiang, Cao Xiao, Minhao Jiang, Parminder Bhatia, Taha Kass-Hout, Jimeng Sun, Jiawei Han, "Reasoning-Enhanced Healthcare Predictions with Knowledge Graph Community Retrieval", ICLR’2025
  • Priyanka Kargupta, Yunyi Zhang, Yizhu Jiao, Siru Ouyang, Jiawei Han, "Synergizing Unsupervised Episode Detection with LLMs for Large-Scale News Events", ACL 2025
  • Priyanka Kargupta, Runchu Tian, Jiawei Han, "Beyond True or False: Retrieval-Augmented Hierarchical Analysis of Nuanced Claims", ACL 2025
  • Zhuoqun Li, Xuanang Chen, Haiyang Yu, Hongyu Lin, Yaojie Lu, Qiaoyu Tang, Fei Huang, Xianpei Han, Le Sun, Yongbin Li, "StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information", ICLR 2025
  • Jash Parekh, Pengcheng Jiang, Jiawei Han, "CC-RAG: Structured Multi-Hop Reasoning via Theme-Based Causal Graphs", arXiv:2506.08364
  • Siru Ouyang, Wenhao Yu, Kaixin Ma, Zilin Xiao, Zhihan Zhang, Mengzhao Jia, Jiawei Han, Hongming Zhang, Dong Yu, "RepoGraph: Enhancing AI Software Engineering with Repository-level Code Graph", ICLR 2025
