1 of 15

A Simple Standard for Ontological Mappings 2023: Updates on data model, collaborations and tooling (OM 2023, Athens, Greece)

Anita Caron (EMBL-EBI), Benjamin Gyori (Harvard Medical School), Cassia Trojahn (Universite Toulouse 2), Charles Tapley Hoyt (Northeastern University), Chris Mungall (LBNL), Damien Goutte-Gattat (FlyBase), Emily Hartley (C-Path), Harshad Hegde (LBNL), Huanyu Li (Linköping University), Hyeongsik Kim (Bosch), Ian Braun (C-Path), James McLaughlin (EMBL-EBI), Nicolas Matentzoglu (Semanticly), Nicole Vasilevsky (CU Anschutz), Nomi Harris (LBNL), Sven Hertling (University of Mannheim)

* SSSOM can be pronounced “sessom”

https://w3id.org/sssom

Biomedical Data

Open SSSOM*!

I don’t think he is pronouncing it right…

2 of 15

What are entity mappings?

“Friedreich's Ataxia”

OMOP:441554

Entities are symbols, such as codes in a terminology, classes in an ontology, permissible values in a data model, identifiers in a database, or simply strings in a text field, that are intended to refer to a real-world thing.

3 of 15

The anatomy of a semantic entity mapping

are insufficient

SUBJECT

subject_id: EFO:10000070
subject_label: Adenofibroma

PREDICATE

predicate_id: skos:exactMatch

OBJECT

object_id: MONDO:0006071
object_label: adenofibroma

JUSTIFICATION

mapping_justification: semapv:LexicalMatching
subject_match_field: rdfs:label
object_match_field: oio:hasExactSynonym
match_string: adenofibroma
mapping_date: 2022-12-13
reviewer_id: orcid:0000-0002-7356-1779
mapping_tool: wikidata:Q64360017
confidence: 0.8
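The record on this slide can be written down as a flat key-value structure. Below is a minimal sketch in Python that holds the slide's fields in a dict and serialises them as one tab-separated row. The helper name `to_tsv` is hypothetical and only the standard library is used. For real work the project's own tooling (sssom-py) is the better choice.

```python
import csv
import io

# Field names and values are copied from the slide above.
mapping = {
    "subject_id": "EFO:10000070",
    "subject_label": "Adenofibroma",
    "predicate_id": "skos:exactMatch",
    "object_id": "MONDO:0006071",
    "object_label": "adenofibroma",
    "mapping_justification": "semapv:LexicalMatching",
    "subject_match_field": "rdfs:label",
    "object_match_field": "oio:hasExactSynonym",
    "match_string": "adenofibroma",
    "mapping_date": "2022-12-13",
    "reviewer_id": "orcid:0000-0002-7356-1779",
    "mapping_tool": "wikidata:Q64360017",
    "confidence": 0.8,
}


def to_tsv(rows):
    """Serialise a list of mapping dicts as tab-separated text (header plus rows)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0]), delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


print(to_tsv([mapping]))
```

The subject, predicate and object columns alone would fit in three cells. The remaining columns carry the justification that lets a reader judge whether to trust the mapping.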

4 of 15

Recap: What problem do we solve?

Provide a simple, spreadsheet-based format to facilitate widespread adoption

#mapping_set_id: https://w3id.org/sssom/commons/mouse-human/mappings/mp_hp_mgi_all.sssom.tsv
#mapping_set_title: All mappings of MP terms to HPO terms generated by MGI
#mapping_set_description: "Consolidated list of all HPO to MP mappings done by MGI…."
#creator_id:
#  - orcid:0000-0003-4606-0597
#  - ror:021sy4w91
#license: https://creativecommons.org/licenses/by/4.0/
#object_source: obo:hp
#subject_source: obo:mp
#curie_map:
#  HP: http://purl.obolibrary.org/obo/HP_
#  MP: http://purl.obolibrary.org/obo/MP_

Mapping Table
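A file in this layout has two parts: a block of `#`-prefixed metadata lines followed by an ordinary tab-separated table. The sketch below separates the two using only the Python standard library. The function name `split_sssom_tsv`, the example URL and the single table row are made-up placeholders, and the metadata is returned as raw text rather than parsed. It is meant as an illustration, not a replacement for sssom-py.

```python
import csv
import io

# Placeholder file content; the table row is invented for illustration only.
SAMPLE = (
    "#mapping_set_id: https://example.org/mappings/demo.sssom.tsv\n"
    "#license: https://creativecommons.org/licenses/by/4.0/\n"
    "#curie_map:\n"
    "#  HP: http://purl.obolibrary.org/obo/HP_\n"
    "#  MP: http://purl.obolibrary.org/obo/MP_\n"
    "subject_id\tpredicate_id\tobject_id\tmapping_justification\n"
    "MP:0000001\tskos:exactMatch\tHP:0000001\tsemapv:LexicalMatching\n"
)


def split_sssom_tsv(text):
    """Return (metadata_text, rows): the '#' block with the marker stripped,
    and the table as a list of dicts keyed by column name."""
    meta_lines, table_lines = [], []
    for line in text.splitlines():
        if line.startswith("#"):
            meta_lines.append(line[1:])  # drop the leading '#', keep indentation
        elif line.strip():
            table_lines.append(line)
    reader = csv.DictReader(io.StringIO("\n".join(table_lines)), delimiter="\t")
    return "\n".join(meta_lines), list(reader)


metadata, rows = split_sssom_tsv(SAMPLE)
print(metadata)
print(rows[0]["subject_id"], rows[0]["predicate_id"], rows[0]["object_id"])
```

The stripped metadata block is plain YAML, so it can be passed to any YAML parser afterwards.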

5 of 15

Recap: What problem do we solve?

Document rich mapping justifications to facilitate well-informed re-use decisions across use cases

Example: :A skos:exactMatch :B, supported by two justifications.

mapping_justification: Lexical matching
subject_match_field: rdfs:label
object_match_field: skos:prefLabel
match_string: “C”
subject_preprocessing: semapv:CaseNormalization
mapping_tool: lexmatch

mapping_justification: Manual mapping curation
author_id: orcid:123
confidence: 0.7

Other examples of justifications:

  • Lexical similarity
  • Semantic similarity threshold
  • Mapping chaining

https://mapping-commons.github.io/semantic-mapping-vocabulary/
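When the same subject, predicate and object are supported by more than one justification, a simple way to hold this in a table is one row per justification. The sketch below assumes that layout and groups rows back into one entry per mapping. The function name `justifications_by_mapping` is hypothetical, and the identifier `semapv:ManualMappingCuration` is assumed to be the vocabulary term behind the slide's "Manual mapping curation" label.

```python
from collections import defaultdict

# Two rows for the same mapping, one per justification (values from the slide).
rows = [
    {
        "subject_id": ":A",
        "predicate_id": "skos:exactMatch",
        "object_id": ":B",
        "mapping_justification": "semapv:LexicalMatching",
        "match_string": "C",
        "mapping_tool": "lexmatch",
    },
    {
        "subject_id": ":A",
        "predicate_id": "skos:exactMatch",
        "object_id": ":B",
        "mapping_justification": "semapv:ManualMappingCuration",
        "author_id": "orcid:123",
        "confidence": 0.7,
    },
]


def justifications_by_mapping(rows):
    """Group justification identifiers by (subject, predicate, object)."""
    grouped = defaultdict(list)
    for row in rows:
        key = (row["subject_id"], row["predicate_id"], row["object_id"])
        grouped[key].append(row["mapping_justification"])
    return dict(grouped)


for triple, justifications in justifications_by_mapping(rows).items():
    print(triple, justifications)
```

A consumer can then decide, per use case, whether a purely lexical justification is enough or whether a manual curation row is required.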

6 of 15

Recap: What problem do we solve?

A well-defined data model for mappings and justifications

Rich YAML schema powered by LinkML

ShEx shapes for validating RDF

JSON Schema

Markdown docs

https://w3id.org/sssom/spec
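A defined data model makes mechanical checks possible. The sketch below is a hand-written stand-in for schema validation, not the generated JSON Schema or ShEx shapes. The list of required fields and the 0 to 1 range for `confidence` are assumptions to be verified against the spec, and `check_mapping` is a hypothetical helper.

```python
# Assumed minimal set of required fields; verify against https://w3id.org/sssom/spec
REQUIRED_FIELDS = ("subject_id", "predicate_id", "object_id", "mapping_justification")


def check_mapping(row):
    """Return a list of human-readable problems found in one mapping record."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not row.get(field):
            problems.append(f"missing required field: {field}")
    confidence = row.get("confidence")
    if confidence is not None:
        try:
            value = float(confidence)
        except (TypeError, ValueError):
            problems.append("confidence is not a number")
        else:
            if not 0.0 <= value <= 1.0:
                problems.append("confidence outside [0, 1]")
    return problems


good = {
    "subject_id": "EFO:10000070",
    "predicate_id": "skos:exactMatch",
    "object_id": "MONDO:0006071",
    "mapping_justification": "semapv:LexicalMatching",
    "confidence": 0.8,
}
bad = {"subject_id": "EFO:10000070", "object_id": "MONDO:0006071", "confidence": 1.7}

print(check_mapping(good))
print(check_mapping(bad))
```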

7 of 15

Recap: What problem do we solve?

Promoting the creation of interoperable FAIR mapping registries

  • Increased organisation-level mapping commons development
  • Initial template for mapping registries is under development
  • Advanced metadata model

Figure: the mapping sets m1.sssom.tsv, m2.sssom.tsv, m3.sssom.tsv and b.sssom.tsv are collected in a Registry, which enables shared QC, automatic reconciliation and collaborative curation (for example, flagging a "Wrong mapping!").
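One kind of shared QC a registry could run is a cross-set consistency check. The sketch below is a hypothetical heuristic, not a described SSSOM feature: it flags subjects that different mapping sets map with skos:exactMatch to different objects in the same target namespace. All identifiers are placeholders; the file names are taken from the slide.

```python
from collections import defaultdict


def find_conflicts(mapping_sets):
    """mapping_sets: dict of set name -> list of mapping dicts.
    Returns {(subject_id, object_prefix): sorted [(object_id, set_name), ...]}
    for subjects with more than one distinct exact-match object per prefix."""
    targets = defaultdict(set)
    for name, rows in mapping_sets.items():
        for row in rows:
            if row["predicate_id"] != "skos:exactMatch":
                continue
            prefix = row["object_id"].split(":", 1)[0]
            targets[(row["subject_id"], prefix)].add((row["object_id"], name))
    conflicts = {}
    for key, hits in targets.items():
        if len({object_id for object_id, _ in hits}) > 1:
            conflicts[key] = sorted(hits)
    return conflicts


registry = {
    "m1.sssom.tsv": [
        {"subject_id": "A:1", "predicate_id": "skos:exactMatch", "object_id": "B:1"}
    ],
    "m2.sssom.tsv": [
        {"subject_id": "A:1", "predicate_id": "skos:exactMatch", "object_id": "B:2"}
    ],
    "m3.sssom.tsv": [
        {"subject_id": "A:2", "predicate_id": "skos:exactMatch", "object_id": "B:3"}
    ],
}

print(find_conflicts(registry))
```

A flagged pair is only a candidate for review. Deciding which of the two mappings is wrong remains a curation task.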

8 of 15

Updates 2023: SSSOM Model and Documentation

  • Relatively few changes to model! Version 1?
  • New documentation:
    • Page listing all SSSOM talks
    • Reference for “chaining rules”
    • New tutorials
  • New elements:
    • mapping_set_title (a human-readable title for the mapping set)
    • issue_tracker (issue tracker to be used for reporting problems)
    • issue_tracker_item (a field to track ongoing discussions about a mapping)
    • curation_rule (next slide)

9 of 15

Curation rules: Capture a (potentially) complex (set of) condition(s) executed by an agent (usually human) that led to the establishment of a mapping.

Example: :A skos:exactMatch :B

mapping_justification: Manual mapping curation
author_id: orcid:123
confidence: 0.7
curation_rule: DISEASE_MAPPING_COMMONS_RULES:MPR3

DISEASE_MAPPING_COMMONS_RULES:MPR3: “Two diseases are considered exact matches if they share both phenotypic presentation and genetic underpinnings.”
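Because a curation rule is referenced by identifier, a consumer can resolve it to its text when reviewing a mapping. The sketch below assumes a simple in-memory lookup table and a single rule per record. `CURATION_RULES` and `explain` are hypothetical names, and `semapv:ManualMappingCuration` is assumed to be the identifier behind the slide's "Manual mapping curation" label.

```python
# Hypothetical rule catalogue; the one entry is quoted from the slide.
CURATION_RULES = {
    "DISEASE_MAPPING_COMMONS_RULES:MPR3": (
        "Two diseases are considered exact matches if they share both "
        "phenotypic presentation and genetic underpinnings."
    ),
}


def explain(row, rules=CURATION_RULES):
    """Render one mapping together with the text of its curation rule."""
    rule_id = row.get("curation_rule")
    rule_text = rules.get(rule_id, "(no rule text on file)")
    triple = f"{row['subject_id']} {row['predicate_id']} {row['object_id']}"
    return f"{triple} [{rule_id}]: {rule_text}"


row = {
    "subject_id": ":A",
    "predicate_id": "skos:exactMatch",
    "object_id": ":B",
    "mapping_justification": "semapv:ManualMappingCuration",
    "author_id": "orcid:123",
    "confidence": 0.7,
    "curation_rule": "DISEASE_MAPPING_COMMONS_RULES:MPR3",
}

print(explain(row))
```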

10 of 15

Updates 2023: Tooling

11 of 15

Updates 2023: Tooling (OxO 2)

https://github.com/EBISPOT/oxo2

Open Mapping Justification widget

Open Mapping page

Open Mappings page

12 of 15

Updates 2023: SSSOM @ OAEI

  • SSSOM will be introduced at OAEI in stages.
  • 2023: adding SSSOM as an optional output format
    • Matchers can output not only in the Alignment format (XML) but also in SSSOM CSV format
    • MELT evaluation client
      • supports SSSOM as a matcher output
      • allows conversion from the Alignment API format to SSSOM and back
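To give a feel for what such a conversion involves, here is a minimal sketch that reads cells from an Alignment-format document and emits SSSOM-style rows. It is not MELT's implementation. The example ontology IRIs are invented, only the "=" relation is handled, and the choice of `semapv:UnspecifiedMatching` as the justification for matcher output is an assumption.

```python
import xml.etree.ElementTree as ET

# Namespaces assumed for the Alignment format and RDF.
ALIGN_NS = "http://knowledgeweb.semanticweb.org/heterogeneity/alignment"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Invented example input with a single correspondence.
SAMPLE = f"""<rdf:RDF xmlns="{ALIGN_NS}" xmlns:rdf="{RDF_NS}">
  <Alignment>
    <map>
      <Cell>
        <entity1 rdf:resource="http://example.org/onto1#Paper"/>
        <entity2 rdf:resource="http://example.org/onto2#Article"/>
        <relation>=</relation>
        <measure rdf:datatype="http://www.w3.org/2001/XMLSchema#float">0.9</measure>
      </Cell>
    </map>
  </Alignment>
</rdf:RDF>
"""


def alignment_to_sssom_rows(xml_text):
    """Convert each Cell of an Alignment document into an SSSOM-style dict."""
    root = ET.fromstring(xml_text)
    rows = []
    for cell in root.iter(f"{{{ALIGN_NS}}}Cell"):
        relation = cell.findtext(f"{{{ALIGN_NS}}}relation", default="=").strip()
        if relation != "=":
            # Simplification: other relation symbols are not translated here.
            raise ValueError(f"unsupported relation: {relation!r}")
        rows.append(
            {
                "subject_id": cell.find(f"{{{ALIGN_NS}}}entity1").get(f"{{{RDF_NS}}}resource"),
                "predicate_id": "skos:exactMatch",
                "object_id": cell.find(f"{{{ALIGN_NS}}}entity2").get(f"{{{RDF_NS}}}resource"),
                "mapping_justification": "semapv:UnspecifiedMatching",
                "confidence": float(cell.findtext(f"{{{ALIGN_NS}}}measure", default="1.0")),
            }
        )
    return rows


for row in alignment_to_sssom_rows(SAMPLE):
    print(row)
```

The Alignment format carries little beyond the entity pair, relation and measure, so the justification column has to be filled with a default or supplied by the matcher.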

13 of 15

Updates 2023: User Radar (selected)

14 of 15

Discussion 2023: What about other types of entity mappings?

Type 1: lexical token - identifier (adenofibroma - MONDO:0006071)

Type 2: identifier - identifier (EFO:1000070 - MONDO:0006071)

Type 3: complex (Hypertensive heart disease without congestive heart failure - Hypertensive heart disease AND Not (modifies) Congestive heart failure)

SSSOM 2023 Workshop: The Limits of SSSOM.
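The complex case differs from the other two in that one side of the mapping is an expression rather than a single entity. The sketch below shows one hypothetical way to hold such an expression as a small tree in Python. The class names `Term`, `Not` and `And` are invented for illustration and are not part of SSSOM.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Term:
    label: str


@dataclass(frozen=True)
class Not:
    operand: "Expr"


@dataclass(frozen=True)
class And:
    left: "Expr"
    right: "Expr"


Expr = Union[Term, Not, And]


def render(expr):
    """Render an expression tree as a fully parenthesised string."""
    if isinstance(expr, Term):
        return expr.label
    if isinstance(expr, Not):
        return f"NOT ({render(expr.operand)})"
    if isinstance(expr, And):
        return f"({render(expr.left)}) AND ({render(expr.right)})"
    raise TypeError(f"unexpected node: {expr!r}")


# Object side of the slide's example mapping.
complex_object = And(
    Term("Hypertensive heart disease"),
    Not(Term("Congestive heart failure")),
)

print("Hypertensive heart disease without congestive heart failure")
print("  maps to:", render(complex_object))
```

A flat subject, predicate, object row has no obvious column for such a tree, which is the limitation the slide points to.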

15 of 15

Acknowledgements (SSSOM Work)

Funding

Phenomics First (NIH / NHGRI #1RM1HG010860-01): Spec, Mondo integration, sssom-py CLI

Monarch (NIH / OD #5R24OD011883): Cross-species mappings, outreach, knowledge graph integration

Bosch Gift to LBNL: sssom-py IO, testing, converters, tutorials

DARPA: Young Faculty Award W911NF2010255 (PI: Benjamin M. Gyori)

Community contributions: https://w3id.org/sssom

Core Team, alphabetical order (https://github.com/orgs/mapping-commons/teams/sssom-core)

  • Alex H. Wagner (Nationwide Children's Hospital)
  • Anita Caron (EMBL-EBI)
  • Cassia Trojahn (Universite Toulouse 2)
  • Charlie Hoyt (Harvard Medical School)
  • Chris Mungall (LBNL)
  • Damien Goutte-Gattat (FlyBase)
  • David Osumi-Sutherland (Sanger Centre)
  • Emily Hartley (C-Path)
  • Ernesto Jimenez-Ruiz (City, Univ. of London)
  • Harshad Hegde (LBNL)
  • Henriette Harmse (EMBL-EBI)
  • Hyeongsik Kim (Bosch)
  • Ian Braun (C-Path)
  • Ian Harrow (Pistoia Alliance)
  • James McLaughlin (EMBL-EBI)
  • Jim Balhoff (RENCI)
  • John Graybeal (Stanford)
  • Melissa Haendel (CU Anschutz)
  • Nicolas Matentzoglu (EMBL-EBI)
  • Nicole Vasilevsky (C-Path)
  • Nomi Harris (LBNL)
  • Núria Queralt Rosinach (Leiden University)
  • Simon Jupp (SciBite)
  • Sophie Aubin (INRAE)
  • Thomas Liener (Pistoia Alliance)
  • Tiffany Callahan (IBM Research)
  • Tim Putman (CU Anschutz)
  • Vinicius de Souza (EMBL-EBI)
  • William Duncan (UFlorida)
  • ...many more contributors, see publication

Database (Oxford), Volume 2022, baac035, https://doi.org/10.1093/database/baac035

*recent joiners highlighted in bold