1 of 18

The GA4GH Phenopacket Schema

A computable representation of clinical data

Monica Munoz-Torres, PhD, Jules Jacobsen, PhD, and Peter Robinson, MD, MS, on behalf of all our coauthors.

Associate Professor, Department of Biomedical Informatics

Translational and Integrative Sciences Lab, Center for Health Artificial Intelligence

University of Colorado Anschutz Medical Campus

GA4GH

Bioinformatics Open Source Conference at Intelligent Systems for Molecular Biology | 13 July 2022

monarchinitiative.org | @monimunozto | These slides: bit.ly/pp-ismb22

2 of 18

Global Alliance for Genomics & Health (GA4GH)

Aims to accelerate progress in genomic science and human health by developing standards and framing policy for responsible genomic and health-related data sharing.

3 of 18

Standard exchange formats existed for genome sequences but not for phenotypes

M. Munoz-Torres. BOSC at ISMB 2022.

Figure: established exchange formats for genes (VCF, GFF, BED) alongside the new format for phenotypes (PXF, the Phenopacket).

We needed a standard way to share case-level phenotypic information: not free text, a candidate diagnosis used as a proxy, or full EHR data exported as PDF.

4 of 18

Phenopackets improve phenotype description

  • Diagnostics
  • Mechanism Discovery
  • Integration with environmental health data


How severe are these?

Are some more severe than others?

When were they first observed?

Were they NOT observed?

How are these linked to a patient?

What about the parents and siblings?

5 of 18

A Community Effort

Requirements and specifications were established with a community of researchers and clinicians.

The schema underwent rigorous peer review and a product approval process.

Version 1.0 was released in 2019. Version 2.0 was developed on the basis of community feedback and expands the data model to better represent temporality, medical actions, and quantitative measures.


6 of 18

The Phenopacket Schema

  • Provides sufficient, shareable information about the data outside the EHR, enabling capture of structured data at the point of care that can be shared with other labs or used for computational analysis in clinical or research settings.
  • Provides a complete computational object model of the data rather than a relational table schema (e.g., JSON objects rather than TSV files).
  • Supports exchange of computable, longitudinal case-level phenotypic information.
  • Supports diagnosis of, and research on, all types of disease.
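To make the object-model point concrete, here is a minimal Python sketch, assuming a heavily trimmed packet written as a plain dict whose camelCase keys are modelled on the JSON form. The `to_tsv` helper is hypothetical: it flattens the packet into one TSV row per phenotypic feature, and the subject columns have to be repeated on every row, which the nested object avoids.

```python
import csv
import io

# A hypothetical, heavily trimmed phenopacket as a nested dict.
# Keys follow the camelCase JSON style; the terms are illustrative only.
phenopacket = {
    "id": "example-packet-1",
    "subject": {"id": "patient:0", "sex": "MALE"},
    "phenotypicFeatures": [
        {"type": {"id": "HP:0001945", "label": "Fever"}},
        {"type": {"id": "HP:0002094", "label": "Dyspnea"}},
        {"type": {"id": "HP:0000458", "label": "Anosmia"}, "excluded": True},
    ],
}


def to_tsv(packet: dict) -> str:
    """Flatten the nested object into one TSV row per phenotypic feature.

    The subject columns must be repeated on every row: the duplication
    that a nested object model avoids.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["subject_id", "sex", "hpo_id", "hpo_label", "excluded"])
    subject = packet["subject"]
    for feature in packet.get("phenotypicFeatures", []):
        writer.writerow([
            subject["id"],
            subject.get("sex", ""),
            feature["type"]["id"],
            feature["type"]["label"],
            str(feature.get("excluded", False)).lower(),
        ])
    return buffer.getvalue()


print(to_tsv(phenopacket))
```

A JSON phenopacket stores the subject once and nests the features beneath it; adding onset or severity to one feature adds keys to that feature rather than new columns to every row.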


7 of 18

Schema Definition

Formally defined using Google’s Protocol Buffers version 3 (protobuf3): https://developers.google.com/protocol-buffers

  • protobuf3 is language-neutral and allows faster (de)serialization than text-based formats such as XML and JSON. It is also simpler to use, because data types are validated automatically.
  • protobuf schemas can be compiled into many different language implementations (e.g., Python, Java, Rust, JavaScript), allowing efficient object transfer between services without developers needing to write their own implementation.
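Protobuf-generated classes, and `google.protobuf.json_format.Parse` for JSON input, reject values the schema does not declare, such as an unknown enum name. The sketch below imitates that behaviour with the Python standard library only; it is not generated protobuf code, and the `Sex` value names are our assumption about the v2 enum.

```python
from enum import Enum


class Sex(Enum):
    # Assumed to mirror the values declared for Individual.sex in the schema.
    UNKNOWN_SEX = 0
    FEMALE = 1
    MALE = 2
    OTHER_SEX = 3


def parse_sex(value: str) -> Sex:
    """Convert a JSON string to the enum, failing loudly on undeclared
    values, roughly as a protobuf-generated parser would."""
    try:
        return Sex[value]
    except KeyError:
        raise ValueError(f"invalid Sex value: {value!r}") from None


print(parse_sex("MALE"))  # Sex.MALE
try:
    parse_sex("M")  # a free-text shortcut that the enum does not declare
except ValueError as err:
    print(err)
```

With generated code, this check comes for free in every target language, which is the practical benefit the bullets above describe.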


8 of 18

Phenopacket Schema Overview


9 of 18

What’s in a Phenopacket?

  • Individual
  • Phenotypic Features
  • Measurements
  • Biosamples
  • Interpretation
  • Diseases
  • Medical Actions
  • Files
  • Meta Data


https://phenopacket-schema.readthedocs.io/en/latest/phenopacket.html
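As an illustration, the elements listed above map onto top-level keys of a packet. This sketch assumes the camelCase JSON names shown in `TOP_LEVEL_FIELDS` (the individual appears under `subject`); `missing_sections` is a hypothetical helper that reports which sections a packet leaves out. The contents of `metaData` are omitted.

```python
from typing import List

# Slide-level element names mapped to the JSON keys we assume they use.
TOP_LEVEL_FIELDS = {
    "Individual": "subject",
    "Phenotypic Features": "phenotypicFeatures",
    "Measurements": "measurements",
    "Biosamples": "biosamples",
    "Interpretation": "interpretations",
    "Diseases": "diseases",
    "Medical Actions": "medicalActions",
    "Files": "files",
    "Meta Data": "metaData",
}


def missing_sections(packet: dict) -> List[str]:
    """Return the slide-level names of elements absent from a packet."""
    return [name for name, key in TOP_LEVEL_FIELDS.items() if key not in packet]


packet = {
    "id": "example-packet-1",  # hypothetical identifier
    "subject": {"id": "patient:0"},
    "phenotypicFeatures": [],
    "metaData": {},  # contents omitted in this sketch
}
print(missing_sections(packet))
```

Because most sections are optional, a packet can start small (subject, features, metadata) and grow as more data become available.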

10 of 18

Individual

The patient’s identifier, date of birth, age (or age range), sex, and vital status: whether alive or deceased and, if deceased, the time and cause of death.


```yaml
individual:
  id: "patient:0"
  dateOfBirth: "1937-03-01T00:00:00Z"
  sex: "MALE"
  vitalStatus:
    status: "DECEASED"
    timeOfDeath:
      timestamp: "2019-10-06T10:54:20.021Z"
    causeOfDeath:
      id: "NCIT:C36263"
      label: "Metastatic Malignant Neoplasm"
```
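Once parsed, the same record can be handled as ordinary data. This minimal Python sketch, assuming the YAML above has already been loaded into a dict, derives the age at death from `dateOfBirth` and `vitalStatus.timeOfDeath.timestamp`; `age_at_death` is a hypothetical helper that uses simple calendar arithmetic in UTC.

```python
from datetime import datetime


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp; 'Z' is rewritten as '+00:00' because
    datetime.fromisoformat only accepts 'Z' from Python 3.11 onward."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def age_at_death(individual: dict) -> int:
    """Whole years between dateOfBirth and the recorded time of death."""
    born = parse_timestamp(individual["dateOfBirth"])
    died = parse_timestamp(individual["vitalStatus"]["timeOfDeath"]["timestamp"])
    # Subtract one year if the birthday had not yet come round that year.
    had_birthday = (died.month, died.day) >= (born.month, born.day)
    return died.year - born.year - (0 if had_birthday else 1)


individual = {
    "id": "patient:0",
    "dateOfBirth": "1937-03-01T00:00:00Z",
    "sex": "MALE",
    "vitalStatus": {
        "status": "DECEASED",
        "timeOfDeath": {"timestamp": "2019-10-06T10:54:20.021Z"},
        "causeOfDeath": {"id": "NCIT:C36263", "label": "Metastatic Malignant Neoplasm"},
    },
}
print(age_at_death(individual))  # 82
```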

11 of 18

Phenotypic Features

Typically, characteristics that are more descriptive than quantifiable, such as ‘anosmia’, ‘fever’, and ‘dyspnea’.
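A sketch, assuming dict-shaped features with camelCase keys: each feature is an ontology term (here from HPO), and optional `severity`, `onset`, and `excluded` fields record how severe it is, when it was first observed, and whether it was explicitly NOT observed. The HPO identifiers are given as we recall them and should be checked against the current release; `split_by_status` is hypothetical.

```python
from typing import Dict, List

# Hypothetical phenotypic features coded with HPO terms.
features = [
    {
        "type": {"id": "HP:0001945", "label": "Fever"},
        "severity": {"id": "HP:0012828", "label": "Severe"},
        "onset": {"age": {"iso8601duration": "P45Y"}},  # first observed at 45
    },
    {
        "type": {"id": "HP:0002094", "label": "Dyspnea"},
        "severity": {"id": "HP:0012825", "label": "Mild"},
    },
    # An explicitly excluded feature: looked for, and NOT observed.
    {"type": {"id": "HP:0000458", "label": "Anosmia"}, "excluded": True},
]


def split_by_status(feats: List[dict]) -> Dict[str, List[str]]:
    """Group feature labels into observed and explicitly excluded."""
    result: Dict[str, List[str]] = {"observed": [], "excluded": []}
    for feature in feats:
        key = "excluded" if feature.get("excluded", False) else "observed"
        result[key].append(feature["type"]["label"])
    return result


print(split_by_status(features))
```

Recording an excluded feature is different from omitting it: omission means "unknown", while exclusion means the feature was checked for and absent.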


12 of 18

Measurements

Quantitative and qualitative (yes/no, red/white/blue…) descriptions of a patient or biosample, with temporality (timestamp, time range, age, age range, ontology term)
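A sketch of the two kinds of value, assuming dict-shaped measurements: one quantitative (a body temperature with a UCUM unit) and one qualitative (an ontology class). The `EXAMPLE:` identifiers are placeholders, the `LOINC:` and `UCUM:` prefixes are assumptions, and `describe` is a hypothetical helper that also reports which kind of time element was recorded.

```python
# Hypothetical measurements: one quantitative, one qualitative.
measurements = [
    {
        "assay": {"id": "LOINC:8310-5", "label": "Body temperature"},
        "value": {
            "quantity": {
                "unit": {"id": "UCUM:Cel", "label": "degree Celsius"},
                "value": 38.5,
            }
        },
        "timeObserved": {"timestamp": "2021-02-01T08:00:00Z"},
    },
    {
        "assay": {"id": "EXAMPLE:0001", "label": "hypothetical yes/no assay"},
        "value": {"ontologyClass": {"id": "EXAMPLE:0002", "label": "Positive"}},
        "timeObserved": {"age": {"iso8601duration": "P45Y"}},
    },
]


def describe(measurement: dict) -> str:
    """Render a quantitative or qualitative value, plus the time kind used."""
    value = measurement["value"]
    if "quantity" in value:
        q = value["quantity"]
        text = f"{q['value']} {q['unit']['label']}"
    else:
        text = value["ontologyClass"]["label"]
    # A time element holds one kind of time (timestamp, age, ...); report it.
    kind = next(iter(measurement.get("timeObserved", {})), "unspecified")
    return f"{measurement['assay']['label']}: {text} [{kind}]"


for m in measurements:
    print(describe(m))
```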


13 of 18

Medical Actions

Covers pharmaceutical treatments, surgical procedures, radiation therapy, and therapeutic regimens.
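Each medical action holds one kind of action. The sketch below assumes the four JSON keys in `ACTION_KINDS` behave like a protobuf `oneof`, so a valid record carries exactly one of them; the records use placeholder `EXAMPLE:` identifiers, and `action_kind` is hypothetical.

```python
# Assumed keys for the four kinds of medical action, treated as a "oneof".
ACTION_KINDS = ("procedure", "treatment", "radiationTherapy", "therapeuticRegimen")


def action_kind(action: dict) -> str:
    """Return which kind of action a record holds; reject zero or several."""
    present = [kind for kind in ACTION_KINDS if kind in action]
    if len(present) != 1:
        raise ValueError(f"expected exactly one action kind, found {present}")
    return present[0]


# Hypothetical records; identifiers are placeholders, not real ontology codes.
actions = [
    {"treatment": {"agent": {"id": "EXAMPLE:drug1", "label": "hypothetical drug"}}},
    {"procedure": {"code": {"id": "EXAMPLE:proc1", "label": "hypothetical surgery"}}},
]
print([action_kind(a) for a in actions])
```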


14 of 18

Phenopackets and other clinical data standards


Variation Representation Specification (VRS)

  • Molecular variation
  • Precision and expressivity
  • Data-driven design
  • + VRSATILE
  • Wagner Lab et al.


15 of 18

Documentation, Repo, Users, & Use Cases


16 of 18

Phenotype data exchange in the biomedical ecosystem


Phenopackets can improve the speed and accuracy of diagnosis as well as treatment effectiveness

17 of 18

State-of-the-art of patient phenotyping


18 of 18

Thank you!

The GA4GH Phenopacket Modeling Consortium

Julius O. B. Jacobsen, Michael Baudis, Gareth S. Baynam, Jacques S. Beckmann, Sergi Beltran, Orion J. Buske, Tiffany J. Callahan, Christopher G. Chute, Mélanie Courtot, Daniel Danis, Olivier Elemento, Andrea Essenwanger, Robert R. Freimuth, Michael A. Gargano, Tudor Groza, Ada Hamosh, Nomi L. Harris, Rajaram Kaliyaperumal, Kevin C. Kent Lloyd, Aly Khalifa, Peter M. Krawitz, Sebastian Köhler, Brian J. Laraway, Heikki Lehväslaiho, Leslie Matalonga, Julie A. McMurry, Alejandro Metke-Jimenez, Christopher J. Mungall, Monica C. Munoz-Torres, Soichi Ogishima, Anastasios Papakonstantinou, Davide Piscia, Nikolas Pontikos, Núria Queralt-Rosinach, Marco Roos, Julian Sass, Paul N. Schofield, Dominik Seelow, Anastasios Siapos, Damian Smedley, Lindsay D. Smith, Robin Steinhaus, Jagadish Chandrabose Sundaramurthi, Emilia M. Swietlik, Sylvia Thun, Nicole A. Vasilevsky, Alex H. Wagner, Jeremy L. Warner, Claus Weiland, Melissa A. Haendel and Peter N. Robinson.

JOBJ, MCMT*, CGC, TG, AH, NLH, JAM, CJM, DS, NAV, MAH*, and PNR* are funded by NIH at NHGRI RM1 HG010860, OD R24OD011883, and *NLM 75N97019P00280.
