1 of 21

Closing the gap between effective biocuration and meaningful ontology development

Annual General Meeting of the International Society for Biocuration, 18th October 2023

Nicolas Matentzoglu

2 of 21

The role of ontologies in Biocuration

  • Standardization and Consistency
  • Data Integration and Interoperability
  • Semantic Search and Querying
  • Enhanced Data Discovery and Interpretation

3 of 21

The role of ontologies in Biocuration

MP:0009906

  • Enlarged tongue
  • Megaloglossia
  • Megaglossia
  • Macroglossia
  • Increased tongue size
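
A minimal sketch of how a curation tool might resolve the free-text labels above to the single identifier. The lookup table contains only the labels shown on this slide; the function names `build_index` and `normalize` and the case-insensitive matching are assumptions made for illustration, not part of any particular tool.

```python
# Labels from the slide, all pointing at the same MP term.
SYNONYMS = {
    "MP:0009906": [
        "Enlarged tongue",
        "Megaloglossia",
        "Megaglossia",
        "Macroglossia",
        "Increased tongue size",
    ],
}


def build_index(synonyms):
    """Map every lower-cased label to its term identifier."""
    index = {}
    for term_id, labels in synonyms.items():
        for label in labels:
            index[label.casefold()] = term_id
    return index


INDEX = build_index(SYNONYMS)


def normalize(text):
    """Return the term ID for a free-text label, or None if it is unknown."""
    return INDEX.get(text.strip().casefold())


print(normalize("macroglossia"))
print(normalize("small tongue"))
```

Whatever wording a curator or submitter uses, the stored annotation is the same identifier, which is the point of the "Standardization and Consistency" bullet.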

4 of 21

The role of ontologies in Biocuration

MP:0009906

MP:0009906

5 of 21

The role of ontologies in Biocuration

What genes are associated with abnormal tongue morphology?

  • abnormal tongue morphology (MP:0000762)
    • increased tongue size (MP:0009906)
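
The question on this slide can be sketched as a query that also returns genes annotated to more specific terms. Only the two MP identifiers come from the slide; the direct "is a" edge between them is a simplification, and the gene names and the identifier `MP:9999999` are made-up placeholders.

```python
# Toy "is a" edges (child -> parents). The direct link is a simplification.
PARENTS = {"MP:0009906": ["MP:0000762"]}

# Hypothetical gene-to-phenotype annotations for illustration only.
ANNOTATIONS = {
    "geneA": ["MP:0009906"],   # annotated to the specific term
    "geneB": ["MP:0000762"],   # annotated to the general term
    "geneC": ["MP:9999999"],   # unrelated placeholder term
}


def ancestors(term, parents):
    """Collect all terms reachable by following parent edges upwards."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def genes_for(term, annotations, parents):
    """Genes annotated to the term itself or to any of its descendants."""
    hits = []
    for gene, terms in annotations.items():
        if any(t == term or term in ancestors(t, parents) for t in terms):
            hits.append(gene)
    return sorted(hits)


print(genes_for("MP:0000762", ANNOTATIONS, PARENTS))
```

A plain string match on "abnormal tongue morphology" would miss `geneA`; following the hierarchy is what makes the query semantic.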

6 of 21

The role of ontologies in Biocuration

Example of phenotypic profile matching, see doi: 10.1002/cphg.92
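
One simple way to compare two phenotypic profiles is to expand each profile with its ancestor terms and compute the Jaccard overlap. This is a sketch of that general idea, not necessarily the method used in the cited paper; the `EX:` identifiers and the tiny hierarchy are invented for the example.

```python
# Invented hierarchy (child -> parents) for illustration only.
PARENTS = {
    "EX:big_tongue": ["EX:abnormal_tongue"],
    "EX:small_tongue": ["EX:abnormal_tongue"],
    "EX:abnormal_tongue": ["EX:abnormal_mouth"],
    "EX:abnormal_mouth": [],
    "EX:abnormal_ear": [],
}


def closure(terms):
    """Return the given terms together with all of their ancestors."""
    result = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in result:
            result.add(term)
            stack.extend(PARENTS.get(term, []))
    return result


def jaccard(profile_a, profile_b):
    """Jaccard similarity of two profiles after ancestor expansion."""
    a, b = closure(profile_a), closure(profile_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


print(jaccard({"EX:big_tongue"}, {"EX:small_tongue"}))
print(jaccard({"EX:big_tongue"}, {"EX:abnormal_ear"}))
```

The two tongue phenotypes share no exact term, yet they score above zero because they share ancestors. That partial credit is what an ontology adds over exact matching.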

7 of 21

Mutual dependence

Database: needs to deal with changes to the ontology, needs updates to terms

Ontology: needs feedback and integrated data to be truly valuable

8 of 21

The Gap

Database curator: We need 5000 new terms for protein level measurements in urine, e.g. “glucose level in urine”.

Ontology developer: Ehm, by when?

Database curator: ASAP.

9 of 21

The Gap

Ontology developer: We are going to obsolete 500 classes.

Database curator: Ahhhh, by when?

Ontology developer: In 4 weeks.
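
On the database side, absorbing a batch of obsoleted classes could look roughly like the sketch below. It assumes the ontology release provides a mapping from each obsolete identifier to a replacement, or to nothing when no replacement exists; the `EX:` identifiers, record names, and the function name `migrate` are hypothetical.

```python
def migrate(annotations, obsoleted):
    """Split annotations into updated ones and ones needing manual review.

    annotations: list of (record, term_id) pairs
    obsoleted:   dict of obsolete term_id -> replacement term_id or None
    """
    updated, needs_review = [], []
    for record, term in annotations:
        if term not in obsoleted:
            updated.append((record, term))             # term is still valid
        elif obsoleted[term] is not None:
            updated.append((record, obsoleted[term]))  # swap in the replacement
        else:
            needs_review.append((record, term))        # no replacement given
    return updated, needs_review


# Hypothetical data for illustration only.
annotations = [("rec1", "EX:0000001"), ("rec2", "EX:0000002"), ("rec3", "EX:0000003")]
obsoleted = {"EX:0000002": "EX:0000010", "EX:0000003": None}

updated, needs_review = migrate(annotations, obsoleted)
print(updated)
print(needs_review)
```

Only the annotations without a replacement need a curator's attention, which is why the lead time and the replacement information matter so much to databases.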

10 of 21

Closing the gap

  1. Technical: Build tools that support community curation
  2. Socio-technical: Organise events to “community curate”
  3. Sociological: Raise awareness and build community

11 of 21

What can we do to make direct contributions easier?

  • Indirect contributions are unstructured communications that need to be manually processed by ontology editors
  • Direct contributions can be immediately translated into edits to the ontology
    • Options for making direct contributions easier in 2023:
      • GitHub
      • Templates and tables
      • Change languages

12 of 21

Push ontology curation as far down the expert hierarchy as possible

Domain Experts

  • labels
  • reference publications
  • synonyms

All Biocurators

  • description (note this is not yet “definition”)
  • suggest cross-references
  • suggest parents

Ontology Engineers

  • Patterns for logical definitions
  • Upper-level alignment

Ontology Developers

  • formal definition
  • semantic mappings
  • cross-references
  • entity references
    • related entities
    • parents in ontology

Diagram axes: availability; required familiarity with ontology engineering

Thank you Sue Bello for advising!

13 of 21

The amazingness of standardised open ontology development systems

Generate a standard git repository:

  • editors file
  • release files
  • imports

Social workflows:

  • Issues
  • Pull requests

CI/CD:

  • Automatic testing
  • Automatic updates
  • Diffs

Executable workflows:

  • Imports, releases
  • Testing

doi: 10.1093/database/baac087

At least 104 ontologies use ODK!

14 of 21

Design Patterns and spreadsheets

  • Decouple the curation of entities from the actual logical representation
  • Systems:
    • ROBOT template
    • DOSDP
    • LinkML
  • Stay tuned:
    • Nanobot
    • Data Harmonizer

defined_class | cargo   | membrane        | start                  | end
GO:0098713    | leucine | plasma membrane | extracellular membrane | cytosol
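
The decoupling idea can be sketched as follows: curators fill in table rows, and a pattern written once by an ontology engineer turns each row into a term. The column names and the row come from this slide. The label and definition patterns are invented for illustration and are not the actual GO pattern, and this sketch does not use the ROBOT, DOSDP, or LinkML tooling itself.

```python
import csv
import io

# Tab-separated table with the columns and the row shown on the slide.
TABLE = (
    "defined_class\tcargo\tmembrane\tstart\tend\n"
    "GO:0098713\tleucine\tplasma membrane\textracellular membrane\tcytosol\n"
)

# Hypothetical text patterns; a real design pattern would also carry
# a logical (OWL) definition.
LABEL_PATTERN = "{cargo} transport across {membrane}"
DEFINITION_PATTERN = (
    "The directed movement of {cargo} from {start} to {end} across the {membrane}."
)


def expand(table_text):
    """Yield one generated term per table row."""
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    for row in reader:
        yield {
            "id": row["defined_class"],
            "label": LABEL_PATTERN.format(**row),
            "definition": DEFINITION_PATTERN.format(**row),
        }


terms = list(expand(TABLE))
for term in terms:
    print(term["id"], "|", term["label"])
    print(term["definition"])
```

Adding a new term then means adding a row, which a domain expert can do in a spreadsheet without touching the logical representation.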

15 of 21

Change languages and widgets in curation interfaces

1

2

3

Embed curation widgets directly in ontology browsers!

Any proposal opens an issue on the ontology's issue tracker!

The issue gets automatically translated into a pull request.

https://incatools.github.io/kgcl/
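
To illustrate why a change language makes the issue-to-pull-request step automatable, here is a sketch that parses a small command into a structured change. The two command shapes below are a simplified, hypothetical syntax invented for this example; they are not the actual KGCL grammar, which is documented at the link above.

```python
import re
from dataclasses import dataclass


@dataclass
class Change:
    action: str
    term_id: str
    value: str


# Hypothetical command shapes for illustration only.
PATTERNS = [
    ("add_synonym",
     re.compile(r"^add synonym '(?P<value>[^']+)' to (?P<term_id>\w+:\d+)$")),
    ("rename",
     re.compile(r"^rename (?P<term_id>\w+:\d+) to '(?P<value>[^']+)'$")),
]


def parse(command):
    """Turn one command line into a Change, or raise ValueError."""
    text = command.strip()
    for action, pattern in PATTERNS:
        match = pattern.match(text)
        if match:
            return Change(action, match.group("term_id"), match.group("value"))
    raise ValueError(f"Unrecognised change command: {command!r}")


print(parse("add synonym 'big tongue' to MP:0009906"))
```

Because the proposal is machine-readable, a bot can apply it to the editors file and open a pull request, leaving the ontology editor to review rather than retype.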

16 of 21

Using hackathons with domain experts as “sprints” to curate sections of the ontology

“Based on the survey results it seems that most people did not contribute because no one had asked them.” (R. Mazumder)

The Socio-technological side…

From [isb-biocuration] Examples of successful community curation models for databases? (Fri, 6 Oct)

17 of 21

OBO Academy: Training materials for bio-ontologists

Running online seminar series and ontology trainings to increase confidence

https://bit.ly/obo-academy

Seed funding from:

ISB also has a similar, great resource: https://www.biocuration.org/dissemination/biocuration-training-materials/

18 of 21

Raising awareness of FAIR, open data and ontologies and their impact on the world

  • Open Ontologies and FAIR Biocuration are fundamental to Open Science and play a central role in battling the problems of our world at a global scale
    • Rare Disease
    • Climate Change
    • Ethical AI
  • We need to have more conversations about how much of a catalyst closing the gap between database curation and ontology development will be for tackling these issues.

“[w]e neither have staff dedicated to this initiative only, nor any specific funding. We are all intrinsically motivated to improve the situation” (R. Giessmann)

19 of 21

Thanks to Sabrina Toro, Sue Bello, Nicole Vasilevsky, Zoe Pendlington, Ray Stefancsik, Chris Mungall for your help researching for this talk. Mistakes are all my own.

Thank you,

and the amazing

The triangle of success (slide adapted from Chris Mungall, “Pistoia 2023.10.11 - Open Ontologies in the Biomedical Domain”):

  • Open Ontology (General Concepts)
  • Open FAIR data annotated using ontologies
  • Open community of contributors and users

20 of 21

Community curation: strategies for eliciting engagement

“[w]e neither have staff dedicated to this initiative only, nor any specific funding. We are all intrinsically motivated to improve the situation” (R. Giessmann)

“The DisProt team are also developing Apicuron [...] to provide community curators recognition for contributions at ORCID.” (V. Wood)

“Many people will curate if they understand the benefits, and it is made as easy as possible to do. Communication is key.” (V. Wood)

“some funding, awards, travel fellowships, authorship in publications.” (Taner Z. Sen)

“Based on the survey results it seems that most people did not contribute because no one had asked them.” (R. Mazumder)

From [isb-biocuration] Examples of successful community curation models for databases? (Fri, 6 Oct)

21 of 21

Do we really need a term for “abnormally increased levels of Athenian particulate matter in the lung”? - Shared semantic schemas

  • Post-coordination has existed for a very long time in Biocuration
  • We have come a long way since 2015: DOSDP patterns, LinkML

  • Modifier: abnormal
  • Characteristic: increased amount
  • Entity: particulate matter
  • located in: lung
  • originating from: Athens

This is a stupid example, but do we really need terms for abnormally increased or decreased levels for the entirety of ChEBI? Or UniProt?
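
A shared semantic schema for post-coordination could be sketched as a small record type whose slots are filled at annotation time, so that no pre-composed term has to be minted. The slot names follow the diagram on this slide; the class name `Phenotype` and the way the label is assembled are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Phenotype:
    modifier: str
    characteristic: str
    entity: str
    located_in: Optional[str] = None
    originating_from: Optional[str] = None

    def label(self):
        """Assemble a readable label from whichever slots are filled."""
        text = f"{self.modifier} {self.characteristic} of {self.entity}"
        if self.originating_from:
            text += f" originating from {self.originating_from}"
        if self.located_in:
            text += f" located in {self.located_in}"
        return text


athens = Phenotype(
    modifier="abnormal",
    characteristic="increased amount",
    entity="particulate matter",
    located_in="lung",
    originating_from="Athens",
)
print(athens.label())
```

In practice the slot values would be ontology identifiers (for example from ChEBI or UniProt for the entity) rather than strings, and the same schema would then cover every such combination without a dedicated term for each.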