1 of 25

Computational identification of disease models through cross-species phenotype comparison

Diego A. Pava, Pilar Cacheiro, Damian Smedley

2 of 25

What is IMPC?

Created with BioRender.com

Mendelian diseases

Hepatic steatosis MP:0002628

IMPC’s objectives:

  • Mouse KO catalogue
  • Elucidate function of all mouse protein coding genes
  • Find new models for human disease

Standardised phenotyping

  • Initiative to characterise the function of every protein coding gene

    • Produces resources for external research:

              • KO mouse lines
              • Mouse data

3 of 25

QMUL and IMPC

Our role:

  • Analyse the data generated by the IMPC pipeline
  • Predict human diseases associated with a gene
  • Aid in finding novel disease models
  • Implement the PhenoDigm algorithm

4 of 25

PhenoDigm and application within IMPC

  • Phenotype comparisons for DIsease Genes and Models (PhenoDigm)
  • PhenoDigm gives a quantitative measure of how well our mouse models recapitulate the clinical features of a disease [1]
  • Ontologies: Mammalian Phenotype Ontology (MP) and Human Phenotype Ontology (HPO)
  • Disease HPO terms are aligned with MP terms to produce a percentage similarity score

[1] Smedley et al. 2013, Database

PhenoDigm

5 of 25

Example

[Figure: Hepatic steatosis (HP:0001397; fatty liver) and Hepatic steatosis (MP:0002628) in a disease model, linked by the PhenoDigm score]

[Figure: Phenotype matching using orthologous human disease genes. MP annotations + HPO annotations → PD algorithm → Predicted disease model]

6 of 25

Pipeline application overview

PhenoDigm implementation within IMPC

7 of 25

Dissemination methods

  • IMPC website - disease section

  • Shiny App (soon to be incorporated into the IMPC website)
    • PhenoDigm scores for pipeline genes
    • Information available:
      • Focus on known disease genes
      • Novel gene discovery

Pilar Cacheiro

8 of 25

Why is this resource important?

9 of 25

10 of 25

IMPC disease models across diverse biological systems

| Biological system | Disease gene | Human Mendelian disease | Relevant human phenotype | Overlapping mouse phenotype |
|---|---|---|---|---|
| Bone | SCARF2 | Van Den Ende-Gupta Syndrome | Long metacarpals | Increased length of long bones |
| Cardiovascular | LMNA | Cardiomyopathy Dilated 1a | Dilated cardiomyopathy | Increased heart weight |
| Craniofacial | MSX1 | Orofacial Cleft 5 | Cleft palate | Cleft palate |
| Embryo | PSPH | Phosphoserine Phosphatase Deficiency | Intrauterine growth retardation | Abnormal embryo size |
| Growth/Body size | GHRHR | Isolated Growth Hormone Deficiency, Type Ib | Short stature | Decreased body length |
| Hearing | SLC52A2 | Brown-Vialetto-Van Laere Syndrome 2 | Sensorineural hearing impairment | Increased or absent threshold for auditory brainstem response |
| Hematopoietic | GP9 | Bernard-Soulier Syndrome | Thrombocytopenia | Thrombocytopenia |
| Metabolism | KCNJ11 | Diabetes Mellitus, Noninsulin-Dependent | Type II diabetes mellitus | Impaired glucose tolerance |
| Muscle | COL6A2 | Bethlem Myopathy | Distal muscle weakness | Decreased grip strength |
| Neurological | GOSR2 | Epilepsy, Progressive Myoclonic, 6 | Difficulty walking | Abnormal gait |
| Reproductive System | RNF216 | Gordon Holmes Syndrome | Infertility | Male infertility |
| Retina | BBS5 | Bardet-Biedl Syndrome 5 | Rod-cone dystrophy | Abnormal retina morphology |

11 of 25

Bardet-Biedl syndrome-5 and BBS5

Autosomal recessive ciliopathies caused by mutations in one of 19 genes forming the BBSome

Bbs5 mice recapitulate retinal dystrophy and obesity as well as exhibiting additional features such as impaired glucose homeostasis

12 of 25

KDELR2 – gene discovery using IMPC data

  • Osteogenesis imperfecta (MIM:166200)
  • Neurodevelopmental features
  • Dysmorphic features

  • Incompletely penetrant preweaning lethality
  • Bone structural abnormalities in early adult
  • Decreased size
  • Abnormalities in head shape and size
  • Facial dysmorphology
  • Abnormal embryonic body wall structure

Efthymiou, S et al., 2021 Am J Med Genet A

13 of 25

Limitations

  • Certain disorders/physiological systems do not score well
    • Certain human phenotypes are hard to capture
    • Some assays in mice do not capture human phenotypes

  • Abnormal phenotype availability:
    • Disorders in humans have more annotations compared to mice

  • Limitations of matching embryo/prenatal phenotypes in humans

14 of 25

Pipeline

15 of 25

PhenoDigm – IMPC pipeline implementation

    • Download
    • Build
    • Owltools
    • Score
    • Solr (export a solr core for IMPC website)

16 of 25

Download

  • Downloads data resources needed by the pipeline

  • Data dependencies:
    • Ontologies (hp and mp in .obo format)
    • IMPC statistical results, phenotype annotations
    • JAX misc genotype-phenotype associations (MGI)
    • OMIM disease annotations (morbidmap, mimTitles, mim2gene)
    • Orphanet disease annotations
    • HGNC human gene identifiers
    • HPO genotype-phenotype associations (phenotype.hpoa)
    • BioMart human-mouse gene mappings from Ensembl (currently broken; fix expected in April/May)

  • Ontology dependencies: catalog.xml (formerly catalog-v001.xml):
    • Ontologies in .owl format:
      • uPheno
      • MP
      • HP
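A quick check that the download step left the expected files in place can catch broken sources early. The sketch below is only illustrative: the helper `missing_resources` is hypothetical, the list covers just a few of the dependencies named above, and the `.txt` extensions on the OMIM files are an assumption.

```python
from pathlib import Path

# File names taken from the slide; the ".txt" extensions on the
# OMIM files are assumed, and the list is deliberately incomplete.
REQUIRED_FILES = [
    "hp.obo",
    "mp.obo",
    "phenotype.hpoa",
    "morbidmap.txt",
    "mimTitles.txt",
    "mim2gene.txt",
]


def missing_resources(data_dir, required=REQUIRED_FILES):
    """Return the required files that are absent or empty in data_dir."""
    base = Path(data_dir)
    missing = []
    for name in required:
        path = base / name
        # An empty file usually means the download failed silently.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing
```

Running `missing_resources("downloads")` before the build step would list anything that still needs fetching (the directory name is a placeholder).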

17 of 25

Build

  • Builds and populates a SQL database

  • Loads data:
    • Genes & orthologues (from HGNC, MGI, Ensembl)

    • Ontologies:
      • Human Phenotype Ontology and Mammalian Phenotype Ontology
      • Parses elements in hp.obo and mp.obo, terms and synonyms

    • Diseases:
      • Known disease genes from OMIM and Orphanet

    • Models:
      • IMPC and MGI models
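As a rough illustration of the ontology-loading part of the build step, the sketch below parses `[Term]` stanzas from OBO text (id, name, synonyms) and stores them in SQLite. The table names, the function names and the sample stanza are assumptions made for this example, not the pipeline's actual schema.

```python
import sqlite3

# Illustrative OBO fragment; a real run would read hp.obo or mp.obo.
SAMPLE_OBO = """format-version: 1.2

[Term]
id: HP:0001397
name: Hepatic steatosis
synonym: "Fatty liver" EXACT []

[Typedef]
id: part_of
name: part of
"""


def parse_obo_terms(text):
    """Return a list of dicts (id, name, synonyms) for each [Term] stanza."""
    terms = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas are kept.
            current = None
            if line == "[Term]":
                current = {"id": None, "name": None, "synonyms": []}
                terms.append(current)
        elif current is not None and ": " in line:
            tag, value = line.split(": ", 1)
            if tag == "id":
                current["id"] = value
            elif tag == "name":
                current["name"] = value
            elif tag == "synonym":
                # Synonym text sits between the first pair of quotes.
                parts = value.split('"')
                if len(parts) >= 3:
                    current["synonyms"].append(parts[1])
    return [t for t in terms if t["id"]]


def load_terms(conn, terms):
    """Create hypothetical term/synonym tables and insert the parsed terms."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ontology_term (id TEXT PRIMARY KEY, name TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ontology_synonym (id TEXT, synonym TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO ontology_term VALUES (?, ?)",
        [(t["id"], t["name"]) for t in terms],
    )
    conn.executemany(
        "INSERT INTO ontology_synonym VALUES (?, ?)",
        [(t["id"], s) for t in terms for s in t["synonyms"]],
    )
    conn.commit()
```

For example, `load_terms(sqlite3.connect(":memory:"), parse_obo_terms(SAMPLE_OBO))` fills both tables from the sample fragment.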

18 of 25

Owltools semantic similarity

  • Commands to generate support ontologies
  • owltools --catalog-xml catalog.xml mp-importer.owl --merge-imports-closure --load-instances Mm-gene-to-phenotype-O.txt --load-labels Mm-gene-labels.txt --merge-support-ontologies -o Mm-all.owl
  • owltools --catalog-xml catalog.xml hp-importer.owl --merge-imports-closure --load-instances Hs-disease-to-phenotype-O.txt --load-labels Hs-disease-labels.txt --merge-support-ontologies -o Hs-all.owl

  • These commands are very similar to those used in the Exomiser build
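Because the mouse and human commands differ only in their file arguments, they can be generated from one template. The sketch below builds the argument list using the flags exactly as they appear on the slide; the wrapper functions `support_ontology_cmd` and `run_if_available` are hypothetical helpers, not part of owltools or the pipeline.

```python
import shutil
import subprocess


def support_ontology_cmd(importer, instances, labels, output, catalog="catalog.xml"):
    """Build the owltools argument list for one species."""
    return [
        "owltools",
        "--catalog-xml", catalog,
        importer,
        "--merge-imports-closure",
        "--load-instances", instances,
        "--load-labels", labels,
        "--merge-support-ontologies",
        "-o", output,
    ]


def run_if_available(cmd):
    """Run the command only if the executable is on PATH; otherwise return None."""
    if shutil.which(cmd[0]) is None:
        return None
    return subprocess.run(cmd, check=True)


MM_CMD = support_ontology_cmd(
    "mp-importer.owl", "Mm-gene-to-phenotype-O.txt", "Mm-gene-labels.txt", "Mm-all.owl"
)
HS_CMD = support_ontology_cmd(
    "hp-importer.owl", "Hs-disease-to-phenotype-O.txt", "Hs-disease-labels.txt", "Hs-all.owl"
)

if __name__ == "__main__":
    # Print the commands instead of running them, so they can be inspected first.
    print(" ".join(MM_CMD))
    print(" ".join(HS_CMD))
```

Passing arguments as a list avoids shell quoting problems and makes stray typographic dashes in flags easier to spot.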

19 of 25

Owltools – set up support ontologies

Inputs: catalog.xml, mp-importer.owl, Mm-gene-to-phenotype-0.txt, Mm-gene-labels.txt → Owltools → Mm-all.owl

Same process for the hp ontology: hp-importer.owl, Hs-disease-to-phenotype-0, and Hs-disease-labels.txt → Owltools → Hs-all.owl

This is mp.obo in owl format

20 of 25

Owltools – semantic similarity calculation using support ontologies

  • 3 main files:
    • Hs-all.owl
    • Mm-all.owl
    • mp_hp-align-equiv.owl → mapping

  • Command:
    • owltools Hs-all.owl Mm-all.owl mp_hp-align-equiv.owl --merge-support-ontologies --sim-save-phenodigm-class-scores -m 2.5 -x MP,HP -a owltools-cache-hp-mp.txt

  • Calculates the Jaccard index (simJ) between a pair of phenotype terms and the information content (IC) of their least common subsumer (LCS)

Output columns: Query term | Match term | simJ | IC | LCS
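To make the three quantities concrete, here is a toy calculation on a five-term ontology. Everything in it is an assumption for illustration: the term names, the annotation frequencies, the use of log base 2 for IC, and the choice of the common ancestor with the highest IC as the LCS. It is not the owltools implementation.

```python
import math

# Toy ontology: each term maps to its direct parents (hypothetical names).
PARENTS = {
    "root": [],
    "liver_phenotype": ["root"],
    "hepatic_steatosis": ["liver_phenotype"],
    "hepatomegaly": ["liver_phenotype"],
    "heart_phenotype": ["root"],
}

# Hypothetical fraction of annotated entities carrying each term
# (including annotations to its descendants).
FREQ = {
    "root": 1.0,
    "liver_phenotype": 0.25,
    "hepatic_steatosis": 0.05,
    "hepatomegaly": 0.1,
    "heart_phenotype": 0.5,
}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def ic(term):
    """Information content: rarer terms are more informative."""
    return math.log2(1.0 / FREQ[term])


def sim_j(a, b):
    """Jaccard index of the two ancestor sets."""
    anc_a, anc_b = ancestors(a), ancestors(b)
    return len(anc_a & anc_b) / len(anc_a | anc_b)


def lcs(a, b):
    """Most informative common ancestor of the two terms."""
    return max(ancestors(a) & ancestors(b), key=ic)
```

In this toy setup, two liver terms share "liver_phenotype" as their LCS, while a liver term and a heart term only share the uninformative root.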

21 of 25

Score

Output columns: Query (disease) | Match (gene) | Avg norm | Avg raw | Max norm | Max raw | Query phenotype | Match phenotype
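A simplified sketch of how the raw and normalised columns could be derived, following the description in Smedley et al. 2013: each phenotype pair is scored as sqrt(simJ × IC), the best match per query phenotype is kept, and the maximum and average are expressed as percentages of an ideal self-match. This version is one-directional (disease to model only), and the function names and input structures are assumptions for illustration, not the pipeline's code.

```python
import math


def pair_score(sim_j, ic):
    """Score one phenotype pair as sqrt(simJ * IC of the LCS)."""
    return math.sqrt(sim_j * ic)


def summarise(best_matches, query_ic):
    """Summarise a disease-to-model comparison.

    best_matches: {query_term: (simJ, IC of LCS)} for the best-scoring
                  model phenotype of each query phenotype.
    query_ic:     {query_term: IC}, used to score the ideal self-match.
    """
    raw = [pair_score(*best_matches[term]) for term in query_ic]
    # Ideal case: every query phenotype matches itself (simJ = 1).
    ideal = [pair_score(1.0, query_ic[term]) for term in query_ic]

    max_raw = max(raw)
    avg_raw = sum(raw) / len(raw)
    max_norm = 100.0 * max_raw / max(ideal)
    avg_norm = 100.0 * avg_raw / (sum(ideal) / len(ideal))
    return {
        "avg_norm": avg_norm,
        "avg_raw": avg_raw,
        "max_norm": max_norm,
        "max_raw": max_raw,
        # Combined percentage score: mean of the two normalised values.
        "score": (avg_norm + max_norm) / 2.0,
    }
```

With placeholder inputs such as `summarise({"q1": (1.0, 4.0), "q2": (0.25, 4.0)}, {"q1": 4.0, "q2": 9.0})`, the first phenotype matches perfectly and the second only partially, so both normalised values fall below 100.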

22 of 25

Packaging and deployment

  • The SQL database → solr core → EBI → New data release

23 of 25

Implementation limitations

  • The pipeline relies on many separate data sources and overlaps with the Exomiser build

  • Semantic similarity is calculated with an old, local copy of owltools on QM’s HPC cluster. If it breaks or is deprecated, the pipeline will need major changes; we might need to switch to semsimian

24 of 25

Acknowledgements

  • Damian Smedley
  • Pilar Cacheiro
  • Tomasz Konopka
  • International Mouse Phenotyping Consortium

25 of 25

Owltools: Exomiser vs IMPC pipeline

Exomiser:

  • owltools --catalog-xml upheno/catalog-v001.xml mp.owl hp.owl zp.owl Mm_gene_phenotype.txt Hs_disease_phenotype.txt Dr_gene_phenotype.txt --merge-imports-closure --load-instances Mm_gene_phenotype.txt --load-labels Mm_gene_labels.txt --merge-support-ontologies -o Mus_musculus-all.owl

  • owltools --catalog-xml upheno/catalog-v001.xml mp.owl hp.owl zp.owl Mm_gene_phenotype.txt Hs_disease_phenotype.txt Dr_gene_phenotype.txt --merge-imports-closure --load-instances Hs_disease_phenotype.txt --load-labels Hs_disease_labels.txt --merge-support-ontologies -o Homo_sapiens-all.owl

  • owltools Homo_sapiens-all.owl --merge-import-closure --remove-disjoints --remove-equivalent-to-nothing-axioms -o Homo_sapiens-all-merged.owl

  • owltools Mus_musculus-all.owl --merge-import-closure --remove-disjoints --remove-equivalent-to-nothing-axioms -o Mus_musculus-all-merged.owl

  • OWLTOOLS_MEMORY=80G owltools Mus_musculus-all-merged.owl Homo_sapiens-all-merged.owl upheno/hp-mp/mp_hp-align-equiv.owl --merge-support-ontologies --sim-save-phenodigm-class-scores -m 2.5 -x HP,MP -a hp-mp-phenodigm-cache.txt

IMPC:

  • owltools --catalog-xml catalog.xml mp-importer.owl --merge-imports-closure --load-instances Mm-gene-to-phenotype-O.txt --load-labels Mm-gene-labels.txt --merge-support-ontologies -o Mm-all.owl
  • owltools --catalog-xml catalog.xml hp-importer.owl --merge-imports-closure --load-instances Hs-disease-to-phenotype-O.txt --load-labels Hs-disease-labels.txt --merge-support-ontologies -o Hs-all.owl
  • owltools Hs-all.owl Mm-all.owl mp_hp-align-equiv.owl --merge-support-ontologies --sim-save-phenodigm-class-scores -m 2.5 -x MP,HP -a owltools-cache-hp-mp.txt