1 of 24

Bringing KGs Together For Space Life Sciences

Presented By:

HBISS 3/12/2026

Amanda M. Saravia-Butler, Ph.D.

NASA AI4LS & OSDR

Contractor: Amentum

Peter Rose, Ph.D.

UCSD, Proto-OKN

2 of 24

What is a Knowledge Graph (KG)?

Definition

A Knowledge Graph represents information as a network of entities and relationships.

Nodes (entities): genes, proteins, diseases

Edges (relationships): connections between entities

Example

Gene → associated with → Disease

Gene → encodes → Protein

Protein → participates in → Pathway

Drug → targets → Protein

SPOKE KG: https://spoke.ucsf.edu/

Knowledge graphs can integrate diverse biomedical data into a single connected structure that can be queried, visualized, and analyzed.
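As a minimal sketch, the example triples on this slide can be held in a small in-memory structure and traversed. The function names `build_graph` and `reachable` are illustrative only; a production KG such as SPOKE lives in a graph database rather than a Python dictionary.

```python
from collections import defaultdict

# Triples copied from the slide: (subject, relationship, object).
TRIPLES = [
    ("Gene", "associated with", "Disease"),
    ("Gene", "encodes", "Protein"),
    ("Protein", "participates in", "Pathway"),
    ("Drug", "targets", "Protein"),
]


def build_graph(triples):
    """Index outgoing edges by their source node."""
    graph = defaultdict(list)
    for subject, relationship, obj in triples:
        graph[subject].append((relationship, obj))
    return graph


def reachable(graph, start):
    """Return every node reachable from `start` by following edges forward."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for _, target in graph.get(node, []):
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


graph = build_graph(TRIPLES)
print(sorted(reachable(graph, "Drug")))
```

Starting from Drug, the traversal follows "targets" to Protein and then "participates in" to Pathway, which is the kind of multi-hop question a connected structure makes easy to ask.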

3 of 24

KG with Properties

Nodes and edges have properties grounded by ontologies

4 of 24

KGs for NSF Proto-OKN Project

  • SPOKE-GeneLab - Making space biology data accessible to AI applications
    • Integrates gene expression, DNA methylation, and metagenomics data across organisms and tissues
    • Enables comparisons of spaceflight samples with various control conditions (e.g., ground control)
    • Facilitates cross-species mapping to human genes for biological interpretation
    • Supports federated analysis with SPOKE to explore health effects of spaceflight

  • SPOKE-OKN - Connecting space biology to human health data
    • Subset of nodes and relationships extracted from the full SPOKE KG
    • Integrates Social Determinants of Health linked to disease prevalence
    • Captures environmental contaminant locations and exposure data
    • Includes U.S. geospatial hierarchy from state → county → city → ZIP code
    • Includes gene-disease relationships

5 of 24

NASA OSDR to KG

  • OSDR: Repository for spaceflight-related omics data (https://osdr.nasa.gov/)
  • OSDR API: Dataset selection and download (https://visualization.osdr.nasa.gov/biodata/api/)
  • Custom Pipeline: Data preparation and upload into KG, incremental updates (https://github.com/BaranziniLab/spoke_genelab)
  • GeneLab KG (Neo4j Graph DB): Data query (Cypher)

6 of 24

Currently Supported Omics Data from OSDR

7 of 24

GeneLab KG Schema v0.3.1

  • 190 K nodes
  • 50 M relationships
  • Ontology-Mapper: Anatomical structures and cell types are mapped to UBERON and CL ontologies
  • Ortholog-Mapper: Model organism genes are mapped to human genes
  • Schema regions: GeneLab, SPOKE entry points
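The ortholog-mapping step can be pictured with a small lookup, shown here as a rough sketch. The table `MOUSE_TO_HUMAN` and the function `map_to_human` are hypothetical and are not the Ortholog-Mapper's actual interface; a real mapper would load pairs from an orthology resource and would likely key on stable gene identifiers rather than symbols.

```python
# Illustrative mouse-to-human ortholog pairs (hard-coded for the sketch only).
MOUSE_TO_HUMAN = {
    "Trp53": "TP53",
    "Mstn": "MSTN",
    "Fos": "FOS",
}


def map_to_human(gene_symbols, ortholog_table):
    """Split input genes into those with a human ortholog and those without."""
    mapped, unmapped = {}, []
    for symbol in gene_symbols:
        human = ortholog_table.get(symbol)
        if human is None:
            unmapped.append(symbol)
        else:
            mapped[symbol] = human
    return mapped, unmapped


# "HypotheticalGene1" is a made-up symbol standing in for a gene with no mapping.
mapped, unmapped = map_to_human(
    ["Trp53", "Mstn", "HypotheticalGene1"], MOUSE_TO_HUMAN
)
print(mapped)
print(unmapped)
```

Keeping the unmapped genes in a separate list makes it visible how much of a dataset drops out before it can be linked to human-centric resources.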

8 of 24

GeneLab KG Node and Relationship Metadata

MetaNodes and MetaRelationships

  • Define KG schema
  • Define node and relationship properties
  • Self-defining KG -> AI ready
  • Provides context to LLM

Example: MetaNode for Assay

Example: MetaRelationship for MEASURED_DIFFERENTIAL_EXPRESSION
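One way to picture how a self-describing schema provides context to an LLM is to render the metadata as plain text for a prompt. The sketch below is an assumption-heavy illustration: only the `Assay` label and the `MEASURED_DIFFERENTIAL_EXPRESSION` relationship name come from this slide, while the property names, the `Gene` target, and the edge direction are hypothetical placeholders.

```python
# Hypothetical metadata records; property names and types are placeholders.
META_NODES = [
    {"label": "Assay", "properties": {"identifier": "string", "technology": "string"}},
    {"label": "Gene", "properties": {"identifier": "string", "symbol": "string"}},
]
META_RELATIONSHIPS = [
    {
        "type": "MEASURED_DIFFERENTIAL_EXPRESSION",
        "source": "Assay",   # assumed direction: Assay -> Gene
        "target": "Gene",
        "properties": {"log2fc": "float", "adj_p_value": "float"},
    },
]


def schema_context(meta_nodes, meta_relationships):
    """Render schema metadata as text that can be prepended to an LLM prompt."""
    lines = ["Node types:"]
    for node in meta_nodes:
        props = ", ".join(f"{k}: {v}" for k, v in node["properties"].items())
        lines.append(f"- {node['label']} ({props})")
    lines.append("Relationship types:")
    for rel in meta_relationships:
        props = ", ".join(f"{k}: {v}" for k, v in rel["properties"].items())
        lines.append(
            f"- ({rel['source']})-[:{rel['type']} {{{props}}}]->({rel['target']})"
        )
    return "\n".join(lines)


context = schema_context(META_NODES, META_RELATIONSHIPS)
print(context)
```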

9 of 24

GeneLab KG Cypher Query: Gene Methylation and Downregulation

This query finds datasets where a gene is differentially hypermethylated in the promoter region and downregulated for a Space Flight vs. Ground Control comparison.
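A query of this shape might be assembled as in the sketch below. It is not the query shown on the slide: apart from `Assay` and `MEASURED_DIFFERENTIAL_EXPRESSION`, every label, relationship name, and property (`MEASURED_DIFFERENTIAL_METHYLATION`, `region`, `methylation_diff`, `log2fc`, `factor_1`, `factor_2`, `symbol`, `identifier`) is a hypothetical placeholder that would need to be checked against the actual GeneLab KG schema.

```python
# Cypher text with parameters ($gene, $region, ...) instead of inlined values.
# All schema names except Assay and MEASURED_DIFFERENTIAL_EXPRESSION are assumed.
QUERY = """
MATCH (a:Assay)-[m:MEASURED_DIFFERENTIAL_METHYLATION]->(g:Gene),
      (a)-[e:MEASURED_DIFFERENTIAL_EXPRESSION]->(g)
WHERE g.symbol = $gene
  AND m.region = $region
  AND m.methylation_diff > 0
  AND e.log2fc < 0
  AND a.factor_1 = $condition
  AND a.factor_2 = $control
RETURN a.identifier AS assay, g.symbol AS gene,
       m.methylation_diff AS methylation_diff, e.log2fc AS log2fc
"""


def methylation_downregulation_query(gene_symbol):
    """Return the Cypher text and parameters for one gene of interest."""
    params = {
        "gene": gene_symbol,
        "region": "promoter",
        "condition": "Space Flight",
        "control": "Ground Control",
    }
    return QUERY.strip(), params


query, params = methylation_downregulation_query("TP53")
print(query)
print(params)
```

Passing values as parameters rather than formatting them into the string keeps the query reusable; with the official Neo4j Python driver the pair could be handed to `session.run(query, params)`.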

10 of 24

SPOKE-GeneLab Composite KG Schema

SPOKE

GeneLab

Proxy nodes with corresponding identifiers

Composite KG used in MSV GUI

https://pmc.ncbi.nlm.nih.gov/articles/PMC7828077/

11 of 24

KG Query and Connection Considerations

  • Issues
    • SPOKE-GeneLab Composite KG is only available on MSV GUI
      • This is not publicly available
    • Physically connecting the GeneLab KG to external KGs would have to be done separately for every KG we want to connect to, and the resulting larger KG would need to be regenerated whenever one of its subgraphs is updated
    • Most users do not know how to write Cypher queries, which makes the GeneLab Neo4j KG inaccessible to them

12 of 24

KG Query and Connection Considerations

  • Possible Solutions
    • Create MCP server to query the GeneLab KG using natural language
    • Find (or create) MCP servers for publicly available KGs that have overlapping nodes with GeneLab KG
    • Short-term: Test natural language queries specifying which KG-MCP server to use, update MCP server(s) as needed
    • Long-term: Create larger MCP server for space life sciences with instructions for how to connect all relevant KGs using common nodes
      • Benefits:
        • Will only need to host/maintain the GeneLab KG, GeneLab KG MCP, and the larger space life sciences (SLS) MCP server
        • The SLS MCP server will always pull information from the current KG
        • Users will not have to specify which KG to call information from
        • The SLS MCP server can be registered in PyPI
          • Publicly available connectors can be created for each LLM
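The idea of connecting KGs through common nodes can be sketched as a set intersection over each KG's node types. The inventories below are hypothetical placeholders ("External KG A" and "External KG B" are not real KGs, and the node-type names are assumed), so the output only illustrates the mechanism.

```python
# Hypothetical node-type inventories; real ones would come from each KG's schema.
KG_NODE_TYPES = {
    "GeneLab KG": {"Gene", "Anatomy", "CellType", "Assay"},
    "External KG A": {"Gene", "Disease", "Compound"},
    "External KG B": {"Gene", "Disease", "Anatomy", "Phenotype"},
}


def common_node_types(home, inventories):
    """For each other KG, list the node types it shares with the home KG."""
    home_types = inventories[home]
    return {
        name: sorted(home_types & types)
        for name, types in inventories.items()
        if name != home
    }


links = common_node_types("GeneLab KG", KG_NODE_TYPES)
for kg, shared in links.items():
    print(kg, "->", shared)
```

Shared types such as Gene are the candidate join points: a result from one KG can be carried into a query against another by matching identifiers on those nodes, without physically merging the graphs.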

13 of 24

MCP (Model Context Protocol)

Open standard for connecting tools and data to LLMs

https://modelcontextprotocol.io/

Diagram labels: ChatGPT, Claude, …; AI Assistant + Tools; MCP; Web; query/prompt; response
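MCP messages are JSON-RPC 2.0, and a client asks a server to run a tool with the `tools/call` method. The sketch below builds such a request by hand to show its shape; the tool name `query` and its `query` argument are hypothetical, since each server defines its own tools.

```python
import json


def tool_call_request(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 request asking an MCP server to run one tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical tool name and argument; a real server advertises its own tools.
request = tool_call_request(
    1, "query", {"query": "MATCH (n) RETURN count(n) AS nodes"}
)
print(json.dumps(request, indent=2))
```

In practice an MCP client such as Claude Desktop generates these messages itself; the user only writes the natural-language prompt.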

14 of 24

MCP-GeneLab Server for Querying GeneLab KG

  • MCP Client: Claude Desktop (Sonnet 4.6, Opus 4.6)
  • MCP Server: mcp-genelab (https://github.com/sbl-sdsc/mcp-genelab)
  • GeneLab KG: Neo4j Property Graph (https://github.com/BaranziniLab/spoke_genelab)
  • Flow: query/prompt → MCP Client → MCP Server → Cypher query → GeneLab KG → response

15 of 24

Public KGs To Consider: SPOKE-OKN

https://frink.renci.org/kg-stats/spoke-okn/

16 of 24

Public KGs To Consider: Prime KG (Harvard)

https://zitniklab.hms.harvard.edu/projects/PrimeKG/

17 of 24

Public KGs To Consider: Monarch KG (Multi-Institute)

https://monarchinitiative.org/kg/about

18 of 24

MCP Servers for Querying External KGs

MCP Client: Claude Desktop (query/prompt → response)

  • SPOKE-OKN KG (Frink)
    • MCP Server: mcp-proto-okn (https://github.com/sbl-sdsc/mcp-proto-okn)
    • Query language: SPARQL
    • KG: https://frink.renci.org/kg-stats/spoke-okn/

  • Prime KG (Neo4j)
    • MCP Server: mcp-harvard-primeKG (https://github.com/asaravia-butler/MCP_Harvard_PrimeKG)
    • Query language: Cypher
    • KG: https://zitniklab.hms.harvard.edu/projects/PrimeKG/

  • Monarch KG (Neo4j)
    • MCP Server: mcp-monarchKG (https://github.com/asaravia-butler/MCP_MonarchKG)
    • Query language: Cypher
    • KG: https://monarchinitiative.org/kg/about
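The three external servers on this slide can be summarized in a small registry, sketched below. The KG names, backends, server names, and query languages are taken from the slide; the `KGServer` class and `servers_by_language` function are illustrative and not part of any of the listed repositories.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KGServer:
    kg: str              # knowledge graph name
    backend: str         # where the KG is hosted or stored
    mcp_server: str      # MCP server that fronts the KG
    query_language: str  # language the server sends to the KG


SERVERS = [
    KGServer("SPOKE-OKN KG", "Frink", "mcp-proto-okn", "SPARQL"),
    KGServer("Prime KG", "Neo4j", "mcp-harvard-primeKG", "Cypher"),
    KGServer("Monarch KG", "Neo4j", "mcp-monarchKG", "Cypher"),
]


def servers_by_language(servers):
    """Group MCP server names by the query language they emit."""
    grouped = {}
    for server in servers:
        grouped.setdefault(server.query_language, []).append(server.mcp_server)
    return grouped


print(servers_by_language(SERVERS))
```

The grouping makes one practical point visible: a combined server has to speak both SPARQL and Cypher, because the underlying KGs are not all property graphs.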

19 of 24

Space Life Sciences Integrated KGs

20 of 24

Space Life Sciences MCP Server for Querying Multiple KGs

  • MCP Client: Claude Desktop (Sonnet 4.5, Opus 4.5)
  • MCP Server: mcp-SLS-KGs (https://github.com/asaravia-butler/MCP_Space_Life_Science_KGs)
  • Integrated SLS KGs
  • Flow: query/prompt → MCP Client → MCP Server → Cypher query → Integrated SLS KGs → response

21 of 24

Implementation Plan

Create publicly available connectors to the GeneLab KG MCP server and the SLS MCP server that can be connected to your favorite LLM.

genelabkg

URL (from OSDR)

22 of 24

Implementation Plan

Once the GeneLab KG MCP server and the SLS MCP server connectors are added, they can be activated in your favorite LLM to enable queries.

SLSkg

23 of 24

DEMOs

GeneLab KG + PubMed (OSD-244)

GeneLab KG + Monarch KG + PubMed (OSD-48)

GeneLab KG + SPOKE-OKN KG (OSD-161)

GeneLab KG + SPOKE-OKN KG + Monarch KG + PubMed (OSD-267)

24 of 24

ACKNOWLEDGEMENTS

SPOKE TEAM

PI: Sergio Baranzini (UC San Francisco)

  • Peter Rose (UCSD)
  • Sui Huang (ISB)
  • Scooter Morris (UCSF)
  • Angela Rizk-Jackson (UCSF)
  • Yongmei Shi (UCSF)
  • Sam Gebre (NASA)
  • Amanda Saravia-Butler (NASA)
  • Kirill Grigorev (NASA)
  • Aenor Sawyer (UCSF/NASA)
  • Charlotte Nelson (Mate Bioservices)

FRINK TEAM

  • Christopher Bizon (RENCI)
  • Jim Balhoff (RENCI)

FUNDING

  • NSF Award #2333819
  • NASA Science Mission Directorate
  • NASA Biological and Physical Sciences