1 of 9

GraphRAG for Exploring Heterogeneous Medical Knowledge

Giuseppe Futia, PhD


2 of 9

Our journey

Understanding
Medical Domain Challenges, Data Integration Strategies, Technological Solutions

Building
Ontology Integration, Information Extraction, Data Enrichment, Virtualized Access

Retrieving
GraphRAG Agent: from General Medical Knowledge to Virtualized Patient Details

https://github.com/giuseppefutia/cdl2025


3 of 9

Medical Domain - Data Integration Challenges

Medical data comes from diverse, disconnected sources, each capturing a different aspect of patient health and clinical practice.

Because these sources differ in format, structure, and purpose, integrating and analyzing them is both challenging and essential.

Heterogeneous medical data:

  • Structured information: EHR data capturing diagnoses, lab results, medications, vital signs, and procedures
  • Unstructured content: Research publications and reports containing diverse medical information
  • Semantic data sources: Biomedical vocabularies and ontologies such as ICD-10, SNOMED, and UMLS that provide standardized concepts
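A minimal sketch of why shared identifiers matter for integration, assuming a toy triple representation rather than the schema used in this talk: one record from each source type is converted into (subject, predicate, object) triples keyed on the same ICD-10 concept, so a single lookup reaches both a patient and a publication. All record fields, IDs, and predicate names (`P001`, `PUB42`, `hasDiagnosis`, `mentions`) are hypothetical; E11 is the ICD-10 code for type 2 diabetes mellitus.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical inputs, one per source type (illustrative values only).
ontology_concepts = {"ICD10:E11": "Type 2 diabetes mellitus"}
ehr_rows = [{"patient_id": "P001", "diagnosis_code": "E11"}]
publication_mentions = [{"doc_id": "PUB42", "linked_code": "E11"}]


def to_triples() -> list[Triple]:
    triples: list[Triple] = []
    # Semantic source: each ontology concept becomes a labeled node.
    for concept_id, label in ontology_concepts.items():
        triples.append(Triple(concept_id, "label", label))
    # Structured source: raw EHR codes are prefixed so they match concept IDs.
    for row in ehr_rows:
        triples.append(
            Triple(row["patient_id"], "hasDiagnosis", "ICD10:" + row["diagnosis_code"])
        )
    # Unstructured source: mentions assumed to be already linked to a code.
    for mention in publication_mentions:
        triples.append(
            Triple(mention["doc_id"], "mentions", "ICD10:" + mention["linked_code"])
        )
    return triples


def neighbors(triples: list[Triple], concept_id: str) -> list[str]:
    """Return every subject that points at the given concept."""
    return sorted({t.subject for t in triples if t.obj == concept_id})
```

`neighbors(to_triples(), "ICD10:E11")` returns `['P001', 'PUB42']`. Here normalization is a trivial prefix; with real data, linking raw codes and free-text mentions to ontology concepts is exactly the ontology mapping and information extraction problem the Building slides address.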


4 of 9

Medical Domain - Technological Challenges

Strict privacy requirements prevent the use of external services or third-party models, so the infrastructure must be fully self-contained.

Sensitive medical data must remain on-premise, and systems must ensure secure access without duplicating information across storage layers.

Key features:

  • Privacy-centric architecture: Data is processed locally, with no exposure to external systems
  • Data virtualization: Instead of replicating sensitive data across pipelines, virtualized access layers reduce redundancy and minimize privacy risks
  • Secure environments: Isolated execution environments allow analytical workflows while keeping raw data protected and contained
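The data virtualization principle can be illustrated with a minimal sketch that uses an in-memory SQLite database as a hypothetical stand-in for an on-premise clinical store. The `VirtualView` class (a name invented here) stores only a query; every `fetch` runs against the source, so no copy of the sensitive rows is kept and new records are visible immediately. A real virtualization engine would also translate graph queries into source queries and enforce access control, which this sketch omits.

```python
import sqlite3

# Hypothetical stand-in for an on-premise clinical database.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE labs (patient_id TEXT, test TEXT, value REAL)")
source.executemany(
    "INSERT INTO labs VALUES (?, ?, ?)",
    [("P001", "HbA1c", 7.9), ("P002", "HbA1c", 5.4)],
)


class VirtualView:
    """Hypothetical virtualized view: it holds a query, never the data."""

    def __init__(self, conn: sqlite3.Connection, sql: str):
        self.conn = conn
        self.sql = sql

    def fetch(self, **params):
        # Run the query against the source at request time; nothing is cached.
        return self.conn.execute(self.sql, params).fetchall()


high_hba1c = VirtualView(
    source,
    "SELECT patient_id, value FROM labs "
    "WHERE test = 'HbA1c' AND value >= :threshold ORDER BY patient_id",
)
```

`high_hba1c.fetch(threshold=6.5)` returns `[('P001', 7.9)]`. After a new HbA1c row of 8.1 for `P003` is inserted into `labs`, the same call also returns it, with no refresh or re-export step.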


5 of 9

Building - Knowledge Graph-based Architecture

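[Figure: the Knowledge Graph is built from three source types, each processed with an LLM: Biomedical ontologies (Ontology Integration), Publications and Reports (Information Extraction), and structured Clinical Data (Data Enrichment, Virtualization). Everything runs on premises, with a local LLM providing Chat and Embedding models.]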


6 of 9

Ontology Mapping and Information Extraction - A General Framework

[Figure: Input Data → Entity Recognition (not always required) → Annotated Data with Entities → Candidate Selection → Multiple Candidates for Each Entity → Candidate Disambiguation → Disambiguated Entities, with Information Background as an additional input.]
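The three stages of the framework can be sketched in a few lines under strong simplifying assumptions: a hypothetical two-concept vocabulary, dictionary lookup standing in for entity recognition (the architecture slide suggests LLM-assisted steps instead), synonym matching for candidate selection, and keyword overlap with the surrounding text as a crude form of information background for disambiguation. "MS" is used because it is genuinely ambiguous in clinical text (multiple sclerosis vs. mitral stenosis); the `DEMO:` concept IDs are made up.

```python
import re

# Hypothetical mini-vocabulary: concept ID -> synonyms and context keywords.
VOCAB = {
    "DEMO:multiple_sclerosis": {
        "synonyms": {"ms", "multiple sclerosis"},
        "context": {"demyelination", "neurology", "mri", "lesions"},
    },
    "DEMO:mitral_stenosis": {
        "synonyms": {"ms", "mitral stenosis"},
        "context": {"valve", "murmur", "echocardiogram", "cardiology"},
    },
}


def recognize(text: str) -> list[str]:
    """Entity recognition stand-in: known surface forms found in the text."""
    lowered = text.lower()
    forms = {s for concept in VOCAB.values() for s in concept["synonyms"]}
    return sorted(f for f in forms if re.search(rf"\b{re.escape(f)}\b", lowered))


def select_candidates(mention: str) -> list[str]:
    """Candidate selection: every concept that lists the mention as a synonym."""
    return sorted(cid for cid, c in VOCAB.items() if mention in c["synonyms"])


def disambiguate(candidates: list[str], text: str) -> str:
    """Pick the candidate whose context keywords overlap most with the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return max(candidates, key=lambda cid: len(VOCAB[cid]["context"] & words))


def link(text: str) -> dict[str, str]:
    return {m: disambiguate(select_candidates(m), text) for m in recognize(text)}
```

`link("Echocardiogram confirms MS with a diastolic murmur.")` returns `{'ms': 'DEMO:mitral_stenosis'}`, while `link("MRI shows new lesions consistent with MS.")` returns `{'ms': 'DEMO:multiple_sclerosis'}`. When the input is already a list of terms, as in ontology-to-ontology mapping, `recognize` can be skipped, which is why entity recognition is marked as not always required. Ties in `disambiguate` fall to the alphabetically first candidate, so a real system would likely need a confidence threshold instead.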


7 of 9

Retrieving - Knowledge Graph-based Architecture

[Figure: the GraphRAG Agent, backed by an on-premises LLM (Chat, Embedding), reaches the Knowledge Graph through Direct Access and Clinical Data through Virtualized Access.]
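A hedged sketch of the two access paths the agent chooses between: tools are plain functions, and a regular expression on an assumed patient-ID format (`P` followed by digits) stands in for the LLM's tool-selection step. The tool names and routing rule are hypothetical; in an LLM-driven agent the model would pick the tool from the tool descriptions instead.

```python
import re
from typing import Callable


# Hypothetical tools: real versions would query the knowledge graph directly
# or go through the virtualization layer over clinical data.
def query_knowledge_graph(question: str) -> str:
    return f"KG answer for: {question}"


def query_virtualized_clinical_data(question: str) -> str:
    return f"Virtualized answer for: {question}"


TOOLS: dict[str, Callable[[str], str]] = {
    "direct_access": query_knowledge_graph,
    "virtualized_access": query_virtualized_clinical_data,
}

PATIENT_ID = re.compile(r"\bP\d{3,}\b")  # assumed patient ID format


def choose_tool(question: str) -> str:
    """Rule-based stand-in for the LLM's tool choice."""
    return "virtualized_access" if PATIENT_ID.search(question) else "direct_access"


def answer(question: str) -> tuple[str, str]:
    tool = choose_tool(question)
    return tool, TOOLS[tool](question)
```

`answer("What is the latest HbA1c of patient P001?")` is routed to `virtualized_access`, while a general question such as "Which drugs treat type 2 diabetes?" goes to `direct_access`. Keeping the paths as separate tools means patient-level data is only reached through the virtualization layer.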


8 of 9

Retrieving - Knowledge Graph-based Architecture

[Figure: two GraphRAG Agent workflows.
  • Exploring Ontology-based Knowledge: Query Generation, EXPLAIN Test, Relation Correction, Query Diagnostics, Query Execution, Answer Generation
  • Exploring Virtualized Patient Data (analyzing patient data through ontologies): Runtime Data Materialization, Virtualized Data Analysis, Ontology Knowledge Analysis, Ontology-based Answer on Patient Data]
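The query-repair steps can be sketched as follows, assuming a Cypher-based graph store (the EXPLAIN Test suggests one such as Neo4j, where prefixing a query with `EXPLAIN` returns the execution plan without running the query). This sketch skips the database and covers only schema-level diagnostics and relation correction: relationship types in a generated query are checked against a hypothetical schema, and unknown ones are replaced with the closest valid name via `difflib.get_close_matches`. The schema, the regex, and the 0.6 cutoff are assumptions.

```python
import difflib
import re

# Hypothetical schema: relationship types the knowledge graph actually has.
SCHEMA_RELATIONS = {"HAS_DIAGNOSIS", "TREATED_WITH", "SUBCLASS_OF"}

# Matches the start of a relationship pattern such as "[:TYPE" or "[r:TYPE".
REL_PATTERN = re.compile(r"\[(\w*):(\w+)")


def diagnose(query: str) -> list[str]:
    """Query diagnostics: relationship types missing from the schema."""
    return [rel for _, rel in REL_PATTERN.findall(query) if rel not in SCHEMA_RELATIONS]


def correct_relations(query: str) -> str:
    """Relation correction: swap unknown types for the closest schema match."""

    def fix(match: re.Match) -> str:
        var, rel = match.group(1), match.group(2)
        if rel not in SCHEMA_RELATIONS:
            close = difflib.get_close_matches(rel, SCHEMA_RELATIONS, n=1, cutoff=0.6)
            if close:
                rel = close[0]
        return f"[{var}:{rel}"

    return REL_PATTERN.sub(fix, query)


def validate_and_repair(query: str, max_rounds: int = 2) -> str:
    for _ in range(max_rounds):
        if not diagnose(query):
            return query
        query = correct_relations(query)
    remaining = diagnose(query)
    if remaining:
        raise ValueError(f"Unresolved relationship types: {remaining}")
    return query
```

For a generated query `MATCH (p:Patient)-[:HAS_DIAGNOSE]->(d:Disease) RETURN p, d`, `validate_and_repair` returns the same query with `HAS_DIAGNOSIS`. A query using an unrelated type such as `FOO` raises `ValueError`, which is the point where an agent could send the diagnostics back to the LLM for regeneration before any query execution.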


9 of 9

Thank You!

Discount code (45% off all Manning products) for the Connected Data London 2025 conference: connectdl25
