1 of 53

Enhancing the Unified Phenotype Ontology to Support Cross-Species Phenotype Interoperability

4th uPheno Workshop, Biocuration, 5th April 2025

Organizers: Susan Bello, Arwa Ibrahim

2 of 53

Welcome to the 4th Unified Phenotype Ontology workshop!

  • Welcome
  • Introduction to uPheno (Nico Matentzoglu)
  • uPheno Patterns - What are they and how do you use them (Sue Bello)
  • Introduction of Challenges for the Workshop
    • Making the Upper Level Structure Work
    • Identify Structure and Integration Issues
  • Breakout Groups to Work on the Challenges
  • Wrap up of Breakout Group Results

3 of 53

uPheno in 2025 - Overview and Vision

4 of 53

An extreme simplification of the phenotype data workflow

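[Figure: workflow diagram. Stages: Observation, Coding/Curation, Integration/Use. Clinical track: Clinician, Patient, Documentation, Clinical coder, EHR. Research track: Researcher, Specimen, Documentation, Biocurator, Scientific database. Coding resources: Code system; Ontology or vocabulary. Integration step: "?"]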

More: https://obophenotype.github.io/upheno/reference/phenotype-data/#shape

5 of 53

Inconsistent language within the community

Source: Mammalian Phenotype Ontology

https://www.informatics.jax.org/vocab/mp_ontology/MP:0000274

macrocardia

enlarged heart

Image source: https://commons.wikimedia.org/wiki/File:Human_Heart.png

6 of 53

More inconsistent language across communities

Mouse community

Zebrafish community

MP:0000274

enlarged heart

ZFA:0000114, heart

PATO:0000586, increased size

Image source: https://commons.wikimedia.org/wiki/File:Human_Heart.png

7 of 53

Even more inconsistencies across the translational divide

Cardiomegaly

Mouse domain

Clinical domain

Zebrafish domain

Image source: https://commons.wikimedia.org/wiki/File:Human_Heart.png

8 of 53

Despite ontologies, the inconsistent terminology problem got worse

[Figure labels: ZFA/ZP, MP, DPO, WPO, HP, OMIA, VT, FYPO, APO, SNOMED, WB, PB, FB, OMIA, MGI, RGD, ZFIN, SGD, IMPC, OMIM, …ICD10, QTLdb, HPOA, EHR, XPO, XenBase]

9 of 53

Phenotype core concepts 101

Characteristic

“Bearer”

Trait / biological attribute

phenotypic effect / phenotype

disease != phenotype

measurement != trait

https://obophenotype.github.io/upheno/reference/core-concepts/

10 of 53

Not only did ontologies proliferate, but so did curation practices

  • Different levels of standardisation
  • Quantitative vs qualitative
  • Post-coordinated vs pre-coordinated

More: https://obophenotype.github.io/upheno/reference/phenotype-data/#shape

11 of 53

An extreme simplification of the phenotype data workflow

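[Figure: workflow diagram. Stages: Observation, Coding/Curation, Integration. Clinical track: Clinician, Patient, Documentation, Clinical coder, EHR. Research track: Researcher, Specimen, Documentation, Biocurator, Scientific database. Coding resources: Code system; Ontology or vocabulary. Integration step: "?"]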

12 of 53

An extreme simplification of the phenotype data workflow

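[Figure: workflow diagram. Stages: Observation, Coding/Curation, Integration. Clinical track: Clinician, Patient, Documentation, Clinical coder, EHR. Research track: Researcher, Specimen, Documentation, Biocurator, Scientific database. Coding resources: Code system; Ontology or vocabulary. Integration step: The Unified Phenotype Ontology]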

13 of 53

The Unified Phenotype Ontology (uPheno)

  • Integrates species-specific phenotype ontologies using ontology design patterns
  • Currently focussed on vertebrates, but with some integration for yeast and slime molds
  • Provides a computational framework for comparing phenotypes across species

14 of 53

The triumph of the Entity-Quality (EQ) pattern (Washington, 2009)

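[Figure: EQ decomposition. Quality: decreased amount. Entity: lysine, part of blood. Human phenotype: Hypolysinemia. Mouse phenotype: decreased circulating lysine level. Each phenotype is shown as equal (=) to the same Entity-Quality combination.]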
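The idea on this slide can be sketched in a few lines of Python: if two species-specific terms decompose into the same Entity-Quality combination, they can be grouped together. This is only an illustration. The `EQ` tuple, its field names, and the `group_by_eq` helper are hypothetical, and plain labels stand in for real ontology identifiers.

```python
from collections import defaultdict
from typing import NamedTuple


class EQ(NamedTuple):
    """Hypothetical, simplified Entity-Quality record (labels instead of IDs)."""
    quality: str   # the characteristic, e.g. a PATO-style label
    entity: str    # the thing that bears the quality
    location: str  # the structure the entity is part of


# Labels are taken from the slide; the data structure is illustrative.
annotations = [
    ("Human phenotype", "Hypolysinemia",
     EQ("decreased amount", "lysine", "blood")),
    ("Mouse phenotype", "decreased circulating lysine level",
     EQ("decreased amount", "lysine", "blood")),
]


def group_by_eq(rows):
    """Group (source, label) pairs that share an identical EQ."""
    groups = defaultdict(list)
    for source, label, eq in rows:
        groups[eq].append((source, label))
    return dict(groups)


groups = group_by_eq(annotations)
for eq, members in groups.items():
    print(f"{eq.quality} of {eq.entity} (part of {eq.location}):")
    for source, label in members:
        print(f"  {source}: {label}")
```

Because both labels map to the same tuple, they land in a single group, which is the property cross-species integration relies on.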

15 of 53

https://obophenotype.github.io/upheno/reference/data-integration/

16 of 53

uPheno is an ontology… and more

  • Post-composed annotations can easily be turned into fully classified pre-composed ontologies

PMID:35317743

[Figure: the post-composed expression abnormal (Modifier) + decreased amount (Characteristic) + lysine, part of blood (Entity) corresponds to the pre-composed class “decreased level of lysine in blood” (UPHENO:0034327). Same for ZP.]

17 of 53

… but most importantly, it is a community that facilitates collective growth and development of shared best practices

We need a pattern for an abnormal biological process in a location.

More than 5 community members have rated this pattern as finished.

What is the best way to share the mappings?

How can I build a pattern driven ontology?

How can we support international phenotyping?

18 of 53

Participating ontologies

| Ontology | Taxon | Term Count |
|---|---|---|
| Mammalian Phenotype Ontology (MP) | Mammalia | 14,206 (v2024-08-08) |
| Human Phenotype Ontology (HPO) | Homo sapiens | 18,987 (v2024-08-13) |
| Zebrafish Phenotype Ontology (ZP) | Danio rerio | 47,443 (v2024-04-18) |
| C. elegans Phenotype Ontology (WBPhenotype) | Nematoda | 2,649 (v2024-06-05) |
| Drosophila Phenotype Ontology (DPO) | Drosophilidae | 253 (v2024-04-25) |
| Dictyostelium Phenotype Ontology (DDPHENO) | Dictyostelium discoideum | 1,017 (v2023-08-26) |
| Planarian Phenotype Ontology (PLANP) | Planaria | 647 (v2020-03-28) |
| Xenopus Phenotype Ontology (XPO) | Xenopus | 20,340 (v2024-04-18) |
| Fission Yeast Phenotype Ontology (FYPO) | Schizosaccharomyces pombe | 8,056 (v2024-08-01) |
| Pathogen-host interaction phenotype ontology (PHIPO) | General | 1,104 (v2024-04-04) |
| Molecular glyco-phenotype ontology (MGPO) | General | 120 (v2024-04-18) |
| Ascomycete Phenotype Ontology (APO) | Ascomycota | 342 (v2024-04-26) |

19 of 53

The three core use cases

  1. Group phenotype data across species
  2. Map between taxon-specific phenotypes and traits
  3. Enable phenotypic profile comparisons across species (e.g. for diagnostics)


20 of 53

Where are we now, where are we going?

  • uPheno has been deployed on OLS and in other applications for browsing and retrieval
  • uPheno identifiers are finally stable and usable
  • Publication out: PMID:39345458
  • In progress:
    • The hierarchy is still a bit wonky in places (today!)
    • Some auto-generated classes are not very useful
    • The integration with species specific phenotype ontologies is not as good as it could be
    • (decide what to do about the scope)

21 of 53

uPheno Patterns - What are they and how do you use them

22 of 53

Useful Links

Workshop channel in the ISB slack

#biocuration2025_upheno_workshop

23 of 53

uPheno Patterns

What are they and how do you use them

  • Logical definitions (also called EQs)
    • Machine readable definitions of phenotype terms built up using terms from other ontologies and connected by OBO Relations Ontology (RO) terms
    • Example: enlarged heart (MP:0000274)
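As a rough illustration of what such a logical definition might look like for enlarged heart, the sketch below assembles a Manchester-syntax-style expression as a string. The PATO ID for 'increased size' appears on an earlier slide; the UBERON ID for heart, the PATO ID for 'abnormal', and the exact shape of the expression are assumptions and should be checked against the actual MP axiom. The helper names are hypothetical.

```python
# Terms referenced by the definition. PATO:0000586 comes from the slides;
# the other two IDs are assumptions to verify against the source ontologies.
TERMS = {
    "increased size": "PATO:0000586",
    "heart": "UBERON:0000948",
    "abnormal": "PATO:0000460",
}


def eq_expression(quality, entity, modifier="abnormal"):
    """Build an EQ-style class expression from labels (assumed shape)."""
    return (
        f"'has part' some ('{quality}' "
        f"and ('characteristic of' some '{entity}') "
        f"and ('has modifier' some '{modifier}'))"
    )


def ids_used(*labels):
    """Look up the identifier for each label used in an expression."""
    return [TERMS[label] for label in labels]


expression = eq_expression("increased size", "heart")
print("MP:0000274 (enlarged heart) EquivalentTo:")
print("  " + expression)
print("IDs:", ids_used("increased size", "heart", "abnormal"))
```

The point is that the definition is built entirely from terms in other ontologies (PATO, an anatomy ontology) joined by relations, which is what makes it machine readable.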

24 of 53

uPheno Patterns

  • Variability in EQs makes it harder to integrate terms across ontologies
    • Different developers may pick different PATO, anatomy, or GO terms when creating an EQ
  • Manually updating EQs when the agreed way of constructing an EQ changes is time-consuming and error-prone
  • A shared pattern lets us
    • Keep EQs consistent within and across ontologies
    • Run updates using ROBOT, either on demand or as part of an ontology build pipeline like those in ODK
    • Update all terms that use a pattern by updating the pattern file
  • Pattern directory - https://github.com/obophenotype/upheno/tree/master/src/patterns/dosdp-patterns#pattern-directory
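The benefit of a shared pattern can be sketched with plain Python string templates. This is not the DOSDP file format or tooling, only a stand-in for the idea: one template plus a table of fillers produces every term, so editing the template once updates them all. The pattern name, the `EX:` term IDs, and the `apply_pattern` helper are hypothetical.

```python
# A stand-in for a pattern file: one label template and one definition template.
PATTERN = {
    "name": "increasedSizeOfAnatomicalEntity (illustrative)",
    "label": "increased size of {entity}",
    "equivalent_to": (
        "'has part' some ('increased size' "
        "and ('characteristic of' some '{entity}') "
        "and ('has modifier' some 'abnormal'))"
    ),
}

# One row per term that uses the pattern (term IDs are made up).
FILLERS = [
    {"term_id": "EX:0000001", "entity": "heart"},
    {"term_id": "EX:0000002", "entity": "liver"},
]


def apply_pattern(pattern, fillers):
    """Generate a label and logical definition for every filler row."""
    generated = {}
    for row in fillers:
        generated[row["term_id"]] = {
            "label": pattern["label"].format(entity=row["entity"]),
            "equivalent_to": pattern["equivalent_to"].format(entity=row["entity"]),
        }
    return generated


terms = apply_pattern(PATTERN, FILLERS)

# Changing the pattern once changes every term built from it.
revised_pattern = dict(PATTERN, label="enlarged {entity}")
revised_terms = apply_pattern(revised_pattern, FILLERS)

for term_id, term in revised_terms.items():
    print(term_id, "|", term["label"], "|", term["equivalent_to"])
```

In a real build the template would live in a pattern file and the filler table in a per-pattern data file, with the regeneration step run by the ontology pipeline.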

25 of 53

Useful background: Osumi-Sutherland, D., Courtot, M., Balhoff, J. et al. Dead simple OWL design patterns. J Biomed Semant 8, 18 (2017). https://doi.org/10.1186/s13326-017-0126-0

26 of 53

27 of 53

‘Characteristic of’ vs ‘Characteristic of part of’

  • Prolonged discussions of when to use each of these relations
  • ‘Characteristic of part of’
    • Using this relation causes child terms to be inferred down the tree(s) of the other ontology
    • Example ‘morphology (PATO) and characteristic of part of (RO) some heart (UBERON)’
      • Will include as child terms abnormal morphology of
        • Cardiac valve (anatomy)
        • Cardiomyocyte (Cell ontology)
        • Mitochondrion located in Cardiomyocyte (GO cellular component combined with anatomy)
      • We have agreed that use of ‘Characteristic of part of’ should be restricted to
        • Morphology
        • Physiology
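The difference between the two relations can be illustrated with a toy part-of hierarchy. The sketch below is a simplification under stated assumptions: a real reasoner works over OWL axioms, whereas here a small dictionary stands in for parthood and the function names are hypothetical. The anatomical examples are the ones listed above.

```python
# Toy part-of links (child -> whole). A real ontology has many more relations.
PART_OF = {
    "cardiac valve": "heart",
    "cardiomyocyte": "heart",
    "mitochondrion located in cardiomyocyte": "cardiomyocyte",
    "heart": "circulatory system",
}


def is_part_of(entity, whole):
    """True if `entity` is `whole` or reaches it by following part-of links."""
    current = entity
    while current is not None:
        if current == whole:
            return True
        current = PART_OF.get(current)
    return False


def inferred_children(whole, relation):
    """Entities whose phenotype terms would be grouped under a term about `whole`."""
    if relation == "characteristic of part of":
        return sorted(e for e in PART_OF if e != whole and is_part_of(e, whole))
    # Plain 'characteristic of' does not follow parthood. This toy model has
    # no is-a hierarchy, so nothing else is pulled in.
    return []


print(inferred_children("heart", "characteristic of part of"))
print(inferred_children("heart", "characteristic of"))
```

With 'characteristic of part of', a term about heart morphology also collects terms about the valve, the cardiomyocyte, and the mitochondrion within it, which is why its use is restricted to cases where that broad grouping is wanted.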

28 of 53

29 of 53

30 of 53

31 of 53

32 of 53

Workshop Challenges

33 of 53

Workshop Challenges

  1. Improve the upper level structure of the ontology

  2. Identify integration and organization issues

34 of 53

Challenge 1: Improve the upper level structure of the ontology

  • These terms are the first 2-3 layers in the ontology

  • Want these to help users browsing the ontology to intuitively find the correct branch for their terms of interest

  • Often used for grouping annotations for display

35 of 53

Upper Level Structure

36 of 53

Upper Level Structure

Preferred Root

High level/header/grouping terms

37 of 53

Upper Level Structure

Preferred Root

1st level of division

2nd level of division

38 of 53

Single species example

Source: MGI

39 of 53

Single species example

Source: Monarch Initiative

40 of 53

Multi-species example

Source: MGI - HMDC tool

41 of 53

Multi-species example (not phenotype)

Source: Alliance of Genome Resources

42 of 53

Possible Approach

  • Use specific subset tags to create multiple high level structures
  • Allows customized sets of high level terms to coexist to serve multiple different use cases
    • Users would need to agree on a set of terms and submit a GitHub ticket to get these added
  • For this to work, the terms need to be present and have the correct set of descendants
  • We still want to avoid having redundant high level terms and we want terms to have the correct relationships to each other
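A minimal sketch of the subset-tag idea: each term carries zero or more tags, and each use case pulls out its own set of high level terms by filtering on one tag. The tag names (`display_slim_a`, `display_slim_b`), most of the term labels, and the `high_level_terms` helper are hypothetical and only meant to show how several header sets could coexist.

```python
# Each term lists the subset tags it has been given (all values illustrative).
TERMS = [
    {"label": "cardiovascular system phenotype",
     "subsets": {"display_slim_a", "display_slim_b"}},
    {"label": "nervous system phenotype", "subsets": {"display_slim_a"}},
    {"label": "cell phenotype", "subsets": {"display_slim_b"}},
    {"label": "enlarged heart", "subsets": set()},
]


def high_level_terms(terms, subset_tag):
    """Return the labels of all terms tagged with the given subset."""
    return sorted(t["label"] for t in terms if subset_tag in t["subsets"])


print(high_level_terms(TERMS, "display_slim_a"))
print(high_level_terms(TERMS, "display_slim_b"))
```

One term can belong to several subsets, so different groups can share header terms without duplicating them in the ontology.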

43 of 53

Challenge 1 Questions

  • In the re-ordered header version, are there redundant top level terms?

  • In the re-ordered header version, are there missing top level terms?

44 of 53

Example

Why are all these absent terms directly under cell phenotype and not grouped in more useful ways?

45 of 53

Example

Why isn’t vocal organ phenotype a child of craniofacial/craniocervical phenotype?

46 of 53

Challenge 2: Identify integration and organization issues

  • The various species specific ontologies have been integrated using the patterns and lexical mappings
    • Not all terms have logical definitions; some complex phenotypes may be very hard or impossible to define logically
    • Integration relies on anatomy integration in Uberon
  • This integration is not perfect and we need help identifying areas where errors have occurred

47 of 53

Incorrect parent-child relationship example

48 of 53

49 of 53

Failure to group similar terms example 1

  • EQs both use GO ‘ossification’ but have differences in how they are constructed
  • Ideally these would be grouped under a uPheno term for ‘abnormal bone ossification’

50 of 53

Failure to group similar terms example 2

  • Both terms use the same EQ
  • Failure seems to be that the integration does not integrate abnormalities in a GO process that are part of the system

51 of 53

Break Out Sessions

  • The main goal today is to identify as many problems as possible with uPheno and to brainstorm ideas for solutions
  • Every breakout group should prepare a short report to be presented at 4:30 to the whole group
  • Please call out any issues you see with uPheno, independent of your respective tasks

Please Add Notes to Shared Docs and Create Issues on the uPheno GitHub Repo

52 of 53

Summary From Breakouts

  • Impossible to browse as is (virtual and in-person): can’t find your way through the maze, can’t find the label you want
    • Need better search support
    • Need much better synonyms
    • Need better organization of search results
  • Classification philosophy in the plant world: the upper level is designed to meet use cases
    • uPheno is currently built using the structures of other ontologies
  • Can we get measures of how closely related some pairs of terms are?
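One simple family of measures for the last question compares the ancestor sets of two terms. The sketch below computes a Jaccard score over a toy hierarchy. It is an example of the general technique, not something the breakout groups agreed on; the hierarchy and most term labels are made up for illustration.

```python
# Toy is-a hierarchy (term -> set of direct parents); labels are illustrative.
PARENTS = {
    "enlarged heart": {"abnormal heart size"},
    "small heart": {"abnormal heart size"},
    "abnormal heart size": {"abnormal heart morphology"},
    "abnormal heart morphology": {"phenotype"},
    "abnormal liver morphology": {"phenotype"},
    "phenotype": set(),
}


def ancestors(term):
    """Return the term itself plus everything reachable via parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, ()))
    return seen


def jaccard(term_a, term_b):
    """Shared ancestors divided by all ancestors of either term."""
    anc_a, anc_b = ancestors(term_a), ancestors(term_b)
    return len(anc_a & anc_b) / len(anc_a | anc_b)


print(jaccard("enlarged heart", "small heart"))
print(jaccard("enlarged heart", "abnormal liver morphology"))
```

Scores range from 0 to 1, and sibling terms score higher than terms that only share the root. The result depends heavily on how well the hierarchy is organized, so it also reflects the structural issues raised above.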

53 of 53

Keeping the Momentum Going

  • Monthly editors meeting - contact Sarah (sarah@tislab.org) to receive an invitation to join
  • Join the Phenotype-Ontologies Slack to keep in contact https://join.slack.com/t/phenotype-ontologies/shared_invite/zt-31bgamfz8-P~T5AIZMe4pH64OnwMd5cw
  • Do you have a phenotype ontology you would like to see integrated into uPheno? Let us know on Slack!
  • Do you have a use case you’d like to present? Let us know on Slack!