1 of 39

Using ontologies to standardize rare disease data collection

Nicole Vasilevsky

University of Colorado Anschutz Medical Campus

June 15, 2022

2 of 39

Overview

  • Overview of biomedical ontologies
  • How ontologies can be used for standardizing and integrating data and downstream analyses
  • Why you should contribute to ontology development efforts
  • How to contribute to ontologies

3 of 39

Documentation: tislab.org/ontologycontributor

4 of 39

Overview of biomedical ontologies

5 of 39

Prevailing clinical diagnostic pipelines leverage only a tiny fraction of the available data

6 of 39

What is an ontology?

What: Ontologies are systematic representations of knowledge that can be used to integrate and analyze large amounts of heterogeneous data

How: Defining entities and the relationships between them in a way that allows computational logical reasoning

DOI: 10.1056/NEJMra1615014

Image credit: https://kyndi.com/blog/creating-knowledge-and-maximizing-the-value-of-data-with-ontologies/

7 of 39

OBO Foundry: a community of ontologists committed to a shared set of principles to build open biomedical ontologies.

http://obofoundry.org/

8 of 39

https://www.ebi.ac.uk/ols/index

9 of 39

Which is the right ontology to use?

  • Start by selecting the appropriate ontology, then search, restricting your search to that ontology
  • We recommend using ontologies that are open and interoperable; OBO Foundry ontologies are a good place to start
  • Make an informed decision about which ontology to use
  • If the ontology you want to use does not have the term you need, make a term request to that ontology

Image credit: https://solutionsreview.com/business-process-management/why-is-process-improvement-so-important/
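To make the advice "pick the ontology first, then restrict the search to it" concrete, here is a minimal Python sketch that builds an ontology-restricted search request for the Ontology Lookup Service (OLS). The endpoint path (`/ols4/api/search`), the `q` and `ontology` query parameters, and the Solr-style `{"response": {"docs": [...]}}` response shape are assumptions based on the public OLS REST API, so check the current OLS documentation before relying on them. `ols_search_url` and `search_terms` are hypothetical helper names.

```python
import json
from typing import List, Tuple
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: OLS REST search endpoint; verify against the current OLS API docs.
OLS_SEARCH = "https://www.ebi.ac.uk/ols4/api/search"


def ols_search_url(query: str, ontology: str) -> str:
    """Build a search URL restricted to one ontology, e.g. 'mondo' or 'hp'."""
    params = {"q": query, "ontology": ontology.lower()}
    return f"{OLS_SEARCH}?{urlencode(params)}"


def search_terms(query: str, ontology: str) -> List[Tuple[str, str]]:
    """Return (ID, label) pairs; assumes a Solr-style {"response": {"docs": [...]}} body."""
    with urlopen(ols_search_url(query, ontology)) as resp:
        docs = json.load(resp)["response"]["docs"]
    return [(d.get("obo_id", ""), d.get("label", "")) for d in docs]
```

For example, `ols_search_url("Wilms tumor", "MONDO")` yields `https://www.ebi.ac.uk/ols4/api/search?q=Wilms+tumor&ontology=mondo`; only `search_terms` makes a network call. Keeping URL construction separate makes the restriction to a single ontology easy to inspect.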

10 of 39

How ontologies can be used for standardizing and integrating data and downstream analyses

11 of 39


What is the Monarch Knowledge Graph?

  • monarchinitiative.org
  • Integrates cross-species genotype-phenotype data
  • Uses OBO Foundry ontologies

doi.org/10.1093/nar/gkz997

12 of 39

Human Phenotype Ontology (HPO)

Over 15,000 phenotype terms

13 of 39


Ontologies are used in disease diagnostics

  • Integrating data across species
  • Fuzzy matching
  • Rare disease diagnostics
  • Making inferences about humans

doi.org/10.1093/nar/gkz997

14 of 39

Differential diagnosis with similar but non-matching phenotypes is difficult

Legend: Perfect Match, Fuzzy Match, No Match

  • Two patients with mutations in KMT2A
  • Not the same variant
  • A phenotype similarity metric helped inform the diagnosis of the same disease
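The idea that a phenotype similarity metric can relate patients whose phenotypes do not match exactly can be illustrated with a minimal Python sketch. It uses Jaccard similarity over each profile's terms plus all of their ancestors, a simplified stand-in rather than the metric used in the KMT2A case; production tools typically also weight terms by information content. The toy hierarchy and its labels are hypothetical, not real HPO content.

```python
from typing import List, Set

# Toy phenotype hierarchy (child -> parents). Labels are hypothetical, not real HPO terms.
PARENTS = {
    "phenotypic abnormality": [],
    "abnormality of limbs": ["phenotypic abnormality"],
    "abnormal hand morphology": ["abnormality of limbs"],
    "short fingers": ["abnormal hand morphology"],
    "broad thumb": ["abnormal hand morphology"],
    "abnormality of the nervous system": ["phenotypic abnormality"],
    "intellectual disability": ["abnormality of the nervous system"],
}


def with_ancestors(term: str) -> Set[str]:
    """Return the term together with every ancestor reachable via PARENTS."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, []))
    return seen


def expand(profile: List[str]) -> Set[str]:
    """Union of the ancestor-closed sets for every term in a patient profile."""
    expanded: Set[str] = set()
    for term in profile:
        expanded |= with_ancestors(term)
    return expanded


def exact_overlap(a: List[str], b: List[str]) -> int:
    """Number of identical terms: all that a perfect-match-only comparison sees."""
    return len(set(a) & set(b))


def fuzzy_similarity(a: List[str], b: List[str]) -> float:
    """Jaccard similarity of the ancestor-closed profiles (0.0 to 1.0)."""
    ea, eb = expand(a), expand(b)
    union = ea | eb
    return len(ea & eb) / len(union) if union else 0.0


patient_1 = ["short fingers"]
patient_2 = ["broad thumb"]
patient_3 = ["intellectual disability"]
print(exact_overlap(patient_1, patient_2))               # 0
print(round(fuzzy_similarity(patient_1, patient_2), 2))  # 0.6
print(round(fuzzy_similarity(patient_1, patient_3), 2))  # 0.17
```

The two hand phenotypes share no term exactly, yet score 0.6 because they meet at `abnormal hand morphology`; the unrelated profile shares only the root term.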

15 of 39

Mondo Disease Ontology

  • Over 20,000 disease terms
  • 17 source terminologies

https://www.medrxiv.org/content/10.1101/2022.04.13.22273750v3

16 of 39

Aligning disease knowledge across sources

Mondo aggregates synonyms and provides semantic mappings to source ontologies

17 of 39

How many rare diseases are there?

Figure: Overlap and unique rare disease concepts in 5 selected knowledge sources

  • Only 333 disease concepts are shared by all five sources
  • Many diseases are in only one source
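Counting shared and source-specific concepts only works once each source's diseases have been mapped to common identifiers (for example through Mondo's mappings). Assuming that mapping step is already done, a minimal Python sketch of the set arithmetic behind such an overlap comparison follows; the source names and concept IDs are hypothetical toy data, not the five sources in the study.

```python
from functools import reduce
from typing import Dict, Set

# Hypothetical toy data: the set of (already mapped) disease concept IDs in each source.
sources = {
    "source_a": {"D1", "D2", "D3", "D4"},
    "source_b": {"D1", "D2", "D5"},
    "source_c": {"D1", "D2", "D3"},
    "source_d": {"D1", "D6"},
    "source_e": {"D1", "D2", "D7"},
}


def shared_by_all(sources: Dict[str, Set[str]]) -> Set[str]:
    """Concepts present in every source (the intersection of all sets)."""
    return reduce(set.intersection, sources.values())


def unique_per_source(sources: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Concepts that appear in exactly one source, keyed by that source."""
    unique = {}
    for name, concepts in sources.items():
        others = set().union(*(c for n, c in sources.items() if n != name))
        unique[name] = concepts - others
    return unique


print(sorted(shared_by_all(sources)))           # ['D1']
print(unique_per_source(sources)["source_c"])  # set()
```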

18 of 39

Why you should contribute to ontology development efforts

19 of 39

Ontology development is a community effort

Image credit: Nomi Harris, Monarch Initiative

20 of 39

Ontologies are continuously iterated upon and improved

tislab.org/ontologycontributor

21 of 39

How to contribute to ontologies

22 of 39

How to contribute to ontologies

  • GitHub issue tracker: report bugs, request new terms, request changes, etc.
  • Edit the ontology file: make changes on a branch and open a pull request (advanced)
  • Join the discussion: comment on tickets or on the discussion board
  • Join the conversation: attend ontology calls

23 of 39

GitHub

  • Online platform for hosting software development projects
  • Provides version control using Git
  • Widely used in the ontology development community to:
      • store ontology files
      • provide version control
      • track issues

Sign up for free at: www.github.com

24 of 39

Ontology issue trackers


A tracker is a place to put a formal ontology request

Trackers have long been used in the software community for keeping track of bugs, feature requests, etc.

Advantages:

  • Open
  • Documentation
  • Community can comment

Tracker IDs can be referenced in ontology metadata, such as in an editor note or definition annotation

25 of 39

How to request new terms (Mondo IDs) & changes

  1. On the GitHub tracker, click New issue
  2. Pick the appropriate template
  3. Fill in the information requested below each header of the template
  4. Please include:
    1. A definition in the proper format
    2. Sources/cross-references for synonyms
    3. Your ORCID or the URL for your ClinGen working group
    4. Any additional comments at the end
  5. Nicole will automatically be tagged
  6. If you have any additional questions or the ticket is high priority, please email Nicole or comment on the ticket (Nicole will be emailed)

26 of 39

Synonym types

https://oboacademy.github.io/obook/reference/synonyms-obo/

Scope

  • Exact: an exact match. E.g. hereditary Wilms' tumor, exact synonym: familial Wilms' tumor
  • Related: a word or phrase that has been used synonymously with the primary term name in the literature, but the usage is not strictly correct. E.g. AGAT deficiency, related synonym: disorder of glycine amidinotransferase activity
  • Narrow: a more specific term. E.g. asthma, narrow synonym: exercise-induced asthma
  • Broad: a more general term. E.g. autoimmune hepatitis, broad synonym: autoimmune liver disease

Type

  • Excluded: some synonyms are annotated with EXCLUDE, e.g. "NOS" (not otherwise specified) synonyms. It is useful to have these in the edit version, but they are filtered out on release.
  • Deprecated: we may also mark synonyms with DEPRECATED, e.g. all occurrences of "mental retardation" should be "intellectual disability".
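In the OBO flat-file format, each synonym is written on a `synonym:` line giving the quoted text, the scope (EXACT, BROAD, NARROW or RELATED), an optional synonym type, and a bracketed list of cross-references. A minimal Python sketch that parses such lines and drops EXCLUDE-typed synonyms, mirroring the release filtering described for excluded synonyms, follows. Treat it as illustrative: it does not cover every corner of the OBO grammar, and the type token `EXCLUDE`, the example lines and the `SOURCE:0001` cross-reference are hypothetical, so check how your ontology actually declares its synonym types (`synonymtypedef`).

```python
import re
from typing import List, NamedTuple, Optional

# synonym: "<text>" <SCOPE> [<optional TYPE>] [<xref>, <xref>, ...]
SYNONYM_LINE = re.compile(
    r'^synonym:\s*"((?:[^"\\]|\\.)*)"'   # quoted synonym text (escaped quotes allowed)
    r'\s+(EXACT|BROAD|NARROW|RELATED)'   # scope
    r'(?:\s+([A-Za-z][\w:-]*))?'         # optional synonym type (name is an assumption)
    r'\s*\[([^\]]*)\]'                   # cross-reference list
)


class Synonym(NamedTuple):
    text: str
    scope: str
    syn_type: Optional[str]
    xrefs: List[str]


def parse_synonym(line: str) -> Optional[Synonym]:
    """Parse one OBO synonym line; return None for lines that are not synonyms."""
    match = SYNONYM_LINE.match(line.strip())
    if match is None:
        return None
    text, scope, syn_type, xrefs = match.groups()
    refs = [x.strip() for x in xrefs.split(",") if x.strip()]
    return Synonym(text.replace('\\"', '"'), scope, syn_type, refs)


def release_synonyms(lines: List[str]) -> List[Synonym]:
    """Keep parsed synonyms except those typed EXCLUDE, which are filtered on release."""
    parsed = (parse_synonym(line) for line in lines)
    return [s for s in parsed if s is not None and s.syn_type != "EXCLUDE"]


lines = [
    'synonym: "familial Wilms\' tumor" EXACT []',
    'synonym: "autoimmune liver disease" BROAD [SOURCE:0001]',
    'synonym: "asthma NOS" EXACT EXCLUDE []',
]
for s in release_synonyms(lines):
    print(s.scope, s.text, s.xrefs)
# EXACT familial Wilms' tumor []
# BROAD autoimmune liver disease ['SOURCE:0001']
```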

27 of 39

Requesting changes to an OBO ontology

  1. Request a new term (or changes) on GitHub
  2. A curator adds the term to the ontology and creates a pull request (PR)
  3. The pull request undergoes review and is merged; changes are added to the ontology-edit.obo file
  4. The term is available in the next release (release timing varies between ontologies)
  5. OLS is updated approximately 7 days after release

Community and expert advice: YOU!

28 of 39

Application of ontologies in RDCA-DAP

29 of 39

Use of ontologies in the RDCA-DAP

  • Ontologies are used to standardize disparate data types
  • Standardized data is FAIR:
    • Findable
    • Accessible
    • Interoperable
    • Reusable
  • Enables better search:
    • Can search on ontology terms
    • Synonym search
    • Layperson synonyms
  • Computational analysis:
    • Semantic similarity analysis
  • New hypotheses
  • Derive new knowledge

RDCA-DAP Workflow

  1. Disparate data types:
    • Clinical trial data
    • Genomic data
    • Imaging data
    • etc.
  2. Data standardization using ontologies
  3. Standardized data is accessible in a cloud interface
  4. Actionable rare disease drug development solutions

30 of 39

Summary

  • Overview of biomedical ontologies: structured knowledge covering a specific domain
  • Ontologies can be used to standardize and integrate data, e.g. phenotype, disease, and genotype data for rare diseases
  • Why you should contribute to ontologies: they are a community resource that needs expertise from various areas
  • How you should contribute to ontologies: GitHub (open a free account)

31 of 39

Acknowledgements

  • Chris Mungall, Lawrence Berkeley National Lab
  • Melissa Haendel, University of Colorado
  • David Osumi-Sutherland, European Bioinformatics Institute
  • Nico Matentzoglu, Semanticly
  • Anne Thessen, University of Colorado

This content is available at: https://oboacademy.github.io/obook/

32 of 39

Thanks!

You can find me at:

nicole@tislab.org

@n_vasilevsky

Documentation:

https://oboacademy.github.io/obook/pathways/ontolgoy-contributor-c-path/

33 of 39

10% of the US population has a rare disease

34 of 39

80% of rare disease cases are genetic

35 of 39

Phenotype to disease annotations can be used for rare disease diagnoses

Disease to phenotype annotations

155,624 rare disease-phenotype annotations

136,268 common disease-phenotype annotations

36 of 39

Phenotype-driven Exome Analysis

Validated for the most difficult GEL diagnoses; top candidate correct in 67% of cases and executes in under 1 minute.

Exomiser v10.0, March 2018: bit.ly/exomiser-10

37 of 39

Recommendations for GitHub tickets/new term requests

General Recommendations:

  1. New term requests should not match existing terms or synonyms
  2. Write a concise definition in the definition field. More info about writing definitions is here
  3. Synonyms: please provide a source/cross-reference
  4. Check OMIM for child classes
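The first recommendation can be partly automated before opening a ticket. A minimal Python sketch, assuming you already have the candidate terms' labels and synonyms (for example from an ontology search), normalizes case, apostrophes, hyphens and spacing and reports any term whose label or synonym collides with the proposed name. The term IDs and normalization rules are hypothetical simplifications; a curator still needs to judge near-synonyms that differ in wording.

```python
import re
from typing import Dict, List

# Hypothetical existing terms: ID -> [label, synonym, ...]. IDs are placeholders, not Mondo IDs.
EXISTING = {
    "EX:0001": ["hereditary Wilms' tumor", "familial Wilms' tumor"],
    "EX:0002": ["asthma", "exercise-induced asthma"],
}


def normalize(name: str) -> str:
    """Lowercase, drop apostrophes, and collapse other punctuation/whitespace to single spaces.

    Non-ASCII letters are treated as separators here; a fuller check would fold accents.
    """
    name = name.lower().replace("'", "").replace("\u2019", "")
    return re.sub(r"[^a-z0-9]+", " ", name).strip()


def find_conflicts(proposed: str, terms: Dict[str, List[str]]) -> List[str]:
    """Return IDs of terms whose label or any synonym matches the proposed name."""
    key = normalize(proposed)
    return [tid for tid, names in terms.items() if any(normalize(n) == key for n in names)]


print(find_conflicts("Familial Wilms tumor", EXISTING))     # ['EX:0001']
print(find_conflicts("exercise induced asthma", EXISTING))  # ['EX:0002']
print(find_conflicts("childhood asthma", EXISTING))         # []
```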

We appreciate your contributions to extending and improving our ontologies

Formatting:

  • Preferred term labels should be lowercase (unless it is a proper name or abbreviation)
  • Write the request below the prompts on the template so the Markdown formatting displays properly
  • Synonyms should be lowercase (with the exceptions above)
  • Definition source: if from PubMed, please use the format PMID:XXXXXX (no space)
  • Include the Mondo ID and label for the parent term
  • List the child terms with Mondo ID and label in a bulleted list
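A few of these formatting rules are mechanical enough to check before submitting. A minimal Python sketch, with a hypothetical function name, flags labels or synonyms containing capital letters (for a human to confirm they are proper names or abbreviations, which are allowed) and PubMed sources not written as `PMID:` followed directly by digits. The PMIDs in the example are placeholders, not real references.

```python
import re
from typing import List

PMID_FORMAT = re.compile(r"PMID:\d+")


def review_request(label: str, synonyms: List[str], sources: List[str]) -> List[str]:
    """Return human-readable warnings; an empty list means no formatting issues were found."""
    warnings = []
    for name in [label, *synonyms]:
        if name != name.lower():
            # Capitals are allowed for proper names and abbreviations, so only flag for review.
            warnings.append(f"check capitalization: {name}")
    for src in sources:
        if src.upper().startswith("PMID") and not PMID_FORMAT.fullmatch(src):
            warnings.append(f"use PMID:XXXXXX with no space: {src}")
    return warnings


print(review_request("AGAT deficiency",
                     ["disorder of glycine amidinotransferase activity"],
                     ["PMID: 12345"]))
# ['check capitalization: AGAT deficiency', 'use PMID:XXXXXX with no space: PMID: 12345']
print(review_request("asthma", [], ["PMID:12345"]))  # []
```

The capitalization warning for "AGAT deficiency" is expected: AGAT is an abbreviation, so the capitals are fine, which is why the check only flags rather than rejects.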

38 of 39

Writing Ontology Definitions

https://philpapers.org/archive/SEPGFW.pdf

39 of 39

Term search and request workflow

Adapted from: https://douroucouli.wordpress.com/2021/07/03/how-select-and-request-terms-from-ontologies/

Workflow diagram with four actors: You, Portal, GitHub, Curator

  • You search the portal using search strings; the portal returns a term list
  • Examine the list and confirm the terms are analogous to what you need, checking definitions and parent ontologies; if so, this is the term you are looking for
  • If there are no results: search for similar terms, examine the sibling terms list, and check the parent ontology
  • Search GitHub for relevant discussion; GitHub returns tickets, which you examine
  • Post a term request
  • The curator reads the ticket, makes the new term locally, assigns a new term ID, and makes a new ontology release
  • You check the new term ID and acknowledge