1 of 36

Advanced OBO Ontology Toolbox

Nicolas Matentzoglu @ ICBO 2023 Ontology Training

2 of 36

Managing the ontology life cycle

3 of 36

The goal of ontology engineering

Building a knowledge base or classification that captures all concepts relevant to a domain or use case, and their relations to other domains, in a scientifically accurate and logically consistent way.

4 of 36

Logical axioms in ontologies relate entities from diverse ontologies

[Figure: Hypolysinemia = “decreased amount” (Quality, from PATO) of “lysine” (Entity, from ChEBI), “part of” “blood”]

5 of 36

How do I get terms from external ontologies to re-use them?

[Figure: the terms “decreased amount”, “lysine” and “blood”, with a question mark]

6 of 36

How do I make sure that whenever I make a change, I didn’t break anything?

[Figure: a developer next to the terms “decreased amount”, “lysine” and “blood”, with a question mark]

7 of 36

How do I make sure all users can access the ontology in a standardised, FAIR manner?

[Figure: the terms “decreased amount”, “lysine” and “blood” combined in hpo.owl, shown to a user]

8 of 36

The ODK is a toolbox and ontology life-cycle management system

The ODK image bundles a toolbox (OAK, make, bash, ROBOT, dosdp-tools, reasoners, fastobo) and a workflow system:

docker pull obolibrary/odkfull

Icon from www.flaticon.com

9 of 36

Workflow: Dependency management

[Figure: hp-edit.owl imports uberon_import.owl, which provides the term “blood”]

  • Repeatable workflows (“refresh-uberon”)
  • Always stay current with the axiomatisation of upstream sources
  • Manage small import modules using use-case-optimised computational extraction techniques, instead of ad hoc term imports or importing the whole ontology.
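Why computationally extracted modules stay small can be illustrated with a toy extraction that keeps only the seed terms an ontology actually uses plus their is_a ancestors. This is a minimal sketch, loosely in the spirit of MIREOT-style closure, over a hypothetical hierarchy; it is not the ROBOT extraction methods the ODK actually runs.

```python
"""Toy import-module extraction: seed terms plus their is_a ancestors.

Illustrative only: the hierarchy below is hypothetical (not Uberon's
actual axioms), and ODK delegates real extraction to ROBOT.
"""
from collections import deque

# Hypothetical upstream ontology: child -> set of direct is_a parents.
UPSTREAM = {
    "blood": {"haemolymphatic fluid"},
    "haemolymphatic fluid": {"body fluid"},
    "body fluid": {"anatomical entity"},
    "anatomical entity": set(),
    "heart": {"organ"},
    "organ": {"anatomical entity"},
    "liver": {"organ"},
}


def extract_module(ontology, seeds):
    """Return the seeds plus every ancestor reachable via is_a."""
    module = set()
    queue = deque(seeds)
    while queue:
        term = queue.popleft()
        if term in module:
            continue  # already added; avoid revisiting shared ancestors
        module.add(term)
        queue.extend(ontology.get(term, ()))
    return module


print(sorted(extract_module(UPSTREAM, ["blood"])))
# ['anatomical entity', 'blood', 'body fluid', 'haemolymphatic fluid']
```

Unrelated branches (here "heart", "organ", "liver") never enter the module, which is what keeps an import file like uberon_import.owl far smaller than the full upstream ontology.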

10 of 36

Continuous Integration Testing

Workflow: the developer edits locally and makes a pull request; the CI system (GitHub Actions) then runs the ODK checks.

  • No more broken ontologies on “main”
  • No more fear that you might “break stuff”
  • Rich set of checks:
    • OWL profile checking
    • ROBOT report (including many best-practice checks)
    • Customisable with SPARQL-based unit testing
    • Logical consistency
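The idea behind SPARQL-based unit tests can be sketched without a triple store: each check returns "violation rows", and the build fails if any check returns anything (the same convention ROBOT's verify command uses for SELECT queries). The triples, EX: identifiers and the two rules below are hypothetical; only IAO:0000115 (the OBO "definition" property) is a real identifier.

```python
"""Toy CI quality checks in the spirit of SPARQL-based unit tests."""
from collections import Counter

# Hypothetical triples (subject, predicate, object); EX: IDs are made up.
TRIPLES = [
    ("EX:1", "rdf:type", "owl:Class"),
    ("EX:1", "rdfs:label", "term one"),
    ("EX:1", "IAO:0000115", "A single definition."),
    ("EX:2", "rdf:type", "owl:Class"),
    ("EX:2", "rdfs:label", "term two"),
    ("EX:2", "IAO:0000115", "First definition."),
    ("EX:2", "IAO:0000115", "Second definition."),
    ("EX:3", "rdf:type", "owl:Class"),
]


def declared_classes(triples):
    """Subjects typed as owl:Class."""
    return {s for s, p, o in triples if p == "rdf:type" and o == "owl:Class"}


def missing_label(triples):
    """Violation rows: classes without any rdfs:label."""
    labelled = {s for s, p, _ in triples if p == "rdfs:label"}
    return sorted(declared_classes(triples) - labelled)


def multiple_definitions(triples):
    """Violation rows: subjects with more than one definition."""
    counts = Counter(s for s, p, _ in triples if p == "IAO:0000115")
    return sorted(s for s, n in counts.items() if n > 1)


CHECKS = {"missing_label": missing_label, "multiple_definitions": multiple_definitions}


def run_checks(triples):
    """Return only the checks that found violations (empty dict means CI passes)."""
    results = {name: check(triples) for name, check in CHECKS.items()}
    return {name: rows for name, rows in results.items() if rows}


failures = run_checks(TRIPLES)
for name, rows in failures.items():
    print(f"FAIL {name}: {rows}")
print("CI status:", "failed" if failures else "passed")
```

Keeping each rule as a separate, named query makes the CI log point directly at the offending terms, which is what lets editors change things without fear of silently breaking the ontology.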

11 of 36

Workflow: Release pipeline

  • Repeatable workflows (“prepare_release”)
  • Robust pipelines to generate standard release files reliably and quickly
  • Following OBO best practices:
    • Standard serialisations
    • Standard release variants (base, full)
    • Versioning
    • Clear separation of the editors' and release environments
  • Fully customisable (release variants, serialisations, base IRIs)

[Figure: hp-edit.owl is released as variants (full, base, subsets), each in several serialisations, e.g. hp-full.owl, hp-full.obo, hp-full.json]
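The combination of release variants and serialisations, plus versioning, can be sketched as plain string generation. This assumes the common OBO convention of versioned PURLs of the form http://purl.obolibrary.org/obo/<id>/releases/<YYYY-MM-DD>/<file>; the helper names and the release date are hypothetical, and a given ontology's ODK configuration decides which variants and formats it actually publishes.

```python
"""Sketch: release file names (variant x serialisation) and version IRIs."""
from datetime import date
from itertools import product

PURL_BASE = "http://purl.obolibrary.org/obo"


def release_files(ont_id, variants, formats):
    """One file per (variant, serialisation) pair, e.g. hp-full.obo."""
    return [f"{ont_id}-{variant}.{fmt}" for variant, fmt in product(variants, formats)]


def version_iri(ont_id, release_date, filename):
    """Dated, immutable location of one released file (assumed OBO pattern)."""
    return f"{PURL_BASE}/{ont_id}/releases/{release_date.isoformat()}/{filename}"


files = release_files("hp", ["full", "base"], ["owl", "obo", "json"])
print(files)
# ['hp-full.owl', 'hp-full.obo', 'hp-full.json', 'hp-base.owl', 'hp-base.obo', 'hp-base.json']
print(version_iri("hp", date(2023, 8, 1), "hp-full.owl"))
# http://purl.obolibrary.org/obo/hp/releases/2023-08-01/hp-full.owl
```

Generating these names mechanically, rather than by hand, is part of what makes a release pipeline repeatable.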

12 of 36

Overview

Generated standard git repository:

  • Editors' file
  • Release files
  • Imports

Social workflows:

  • Issues
  • Pull requests

CI/CD:

  • Automatic testing
  • Automatic updates
  • Diffs

Executable workflows:

  • Imports, releases
  • Testing
  • Uses ODK toolbox

13 of 36

Acknowledgements

Core team

  • @matentzn Nicolas Matentzoglu (Semanticly)
  • @gouttegd Damien Goutte-Gattat (FlyBase)
  • @anitacaron Anita Caron (EMBL-EBI)
  • @balhoff Jim Balhoff (RENCI)
  • @cmungall Chris Mungall (LBNL)
  • @dosumis David Osumi-Sutherland (EMBL-EBI)
  • @ehartley Emily Hartley (Critical Path Institute)
  • @hkir-dev Huseyin Kir (EMBL-EBI)
  • @shawntanzk Shawn Tan (Novo Nordisk)
  • @ubyndr Ismail Ugur Bayindir (EMBL-EBI)
  • @StroemPhi Philip Strömert (NFDI4Chem)

Funding:

Office of the Director, National Institutes of Health (R24-OD011883); National Human Genome Research Institute, ‘Phenomics First’ (RM1HG010860); National Institute of Mental Health (1RF1MH123220-01); National Heart, Lung, and Blood Institute 5U01HG009453-03; UK Biotechnology and Biological Sciences Research Council/US National Science Foundation Directorate of Biological Sciences (BBSRC-NSF/BIO BB/T014008/1); The Wellcome Trust, ‘Virtual Fly Brain’ (105023MA); Director, Office of Science, Office of Basic Energy Sciences, of the US Department of Energy (DE-AC02-05CH11231 to C.J.M.); European Molecular Biology Laboratory - European Bioinformatics Institute core funds

Paper:

https://doi.org/10.1093/database/baac087

At least 104 ontologies use the ODK, according to the latest count!

14 of 36

ROBOT

The Swiss Army Knife of the Ontology Engineer

https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-3002-3

15 of 36

Ontology Access Kit (OAK)

ICBO Ontology Training 2023

16 of 36

OAK: A Python library for ontology access

Command Line Interface: For everyone!!

Modular Python packages: For developers and data scientists

50 multi-option commands

17 of 36

What can you do with OAK?

Basic Ontology Lookup

  • lookup/search
  • definitions, aliases
  • relationships

Graph-oriented Operations

  • Ancestors, descendants
  • Visualization

Text and NLP

  • Annotate text
  • Ontology Matching

Ontology Associations

  • E.g. gene-to-term associations

Plugins for:

  • Fast graph operations
  • OWL reasoning
  • Embedding

OWL-oriented Operations*

  • Filter axioms
  • [limited functionality]

Validation

  • Structural validation

Change

  • Diffs
  • Apply changes

Semantic similarity

  • Jaccard
  • IC-based
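Two of the operations listed above, ancestor/descendant traversal and semantic similarity, can be sketched on a toy is_a hierarchy. This is not OAK's implementation: it assumes a hypothetical, simplified cell hierarchy, uses Jaccard similarity of reflexive ancestor sets, and uses a Resnik-style score (information content of the most informative common ancestor) with a simple corpus-free IC, the negative log of the fraction of terms a class subsumes.

```python
"""Toy semantic similarity over a hypothetical is_a hierarchy."""
import math

# Hypothetical, simplified hierarchy: child -> direct is_a parents.
IS_A = {
    "T cell": {"lymphocyte"},
    "B cell": {"lymphocyte"},
    "lymphocyte": {"leukocyte"},
    "monocyte": {"leukocyte"},
    "leukocyte": {"cell"},
    "cell": set(),
}


def ancestors(term, graph=IS_A):
    """Reflexive ancestor closure of term."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(graph.get(t, ()))
    return seen


def descendants(term, graph=IS_A):
    """Reflexive descendant closure of term."""
    return {t for t in graph if term in ancestors(t, graph)}


def jaccard(a, b):
    """|shared ancestors| / |all ancestors of either term|."""
    x, y = ancestors(a), ancestors(b)
    return len(x & y) / len(x | y)


def information_content(term):
    """IC = -log(fraction of all terms subsumed by term); the root scores 0."""
    return -math.log(len(descendants(term)) / len(IS_A))


def resnik(a, b):
    """IC of the most informative common ancestor."""
    return max(information_content(t) for t in ancestors(a) & ancestors(b))


print(jaccard("T cell", "B cell"), jaccard("T cell", "monocyte"))
print(resnik("T cell", "B cell"), resnik("T cell", "monocyte"))
```

Both scores rank T cell closer to B cell than to monocyte, because the two lymphocytes share a more specific common ancestor.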

18 of 36

Ontology Subgraph Visualization

Customizable via a JSON stylesheet

runoak -i cl.db viz -p i,p 'memory T-cell'

Compact representation of OWL TBox axioms (e.g. existential restrictions)
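Conceptually, the viz command walks upwards from a term along the selected predicates and draws the resulting subgraph. The sketch below is not OAK's code: it assumes that `i` and `p` in `-p i,p` stand for is_a and part_of, uses hypothetical edges (not real CL axioms), and prints Graphviz DOT instead of an image.

```python
"""Toy predicate-filtered ancestor subgraph, printed as Graphviz DOT."""

# Hypothetical edges (subject, predicate, object); not real CL axioms.
EDGES = [
    ("memory T cell", "is_a", "T cell"),
    ("T cell", "is_a", "lymphocyte"),
    ("lymphocyte", "part_of", "immune system"),
    ("T cell", "develops_from", "thymocyte"),
]

# Assumed meaning of the predicate shorthands used in `-p i,p`.
SHORTHANDS = {"i": "is_a", "p": "part_of"}


def ancestor_edges(start, edges, predicates):
    """Collect edges reachable upwards from start via the given predicates."""
    result, frontier, seen = [], [start], {start}
    while frontier:
        node = frontier.pop()
        for s, p, o in edges:
            if s == node and p in predicates:
                result.append((s, p, o))
                if o not in seen:
                    seen.add(o)
                    frontier.append(o)
    return result


def to_dot(edges):
    """Render edges as a Graphviz DOT digraph."""
    lines = ["digraph G {"]
    lines += [f'  "{s}" -> "{o}" [label="{p}"];' for s, p, o in edges]
    lines.append("}")
    return "\n".join(lines)


predicates = {SHORTHANDS[code] for code in "i,p".split(",")}
subgraph = ancestor_edges("memory T cell", EDGES, predicates)
print(to_dot(subgraph))
```

The develops_from edge is dropped because its predicate was not selected, which is how the predicate filter keeps the picture compact.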

19 of 36

On becoming an Open Science Engineer

ICBO 2023 - OBO Tutorial - Nicolas Matentzoglu

20 of 36

Open Science and Open Data are catalysts for tackling global problems

  • Problems like rare disease, pandemics and climate change require global coordination
  • Open data helps address these issues, but open licenses are only part of the answer…
  • …the really tough part is the standardisation of data to enable their integration
    • Common vocabularies
    • Semantic data models

Non-standardised data without explicit semantics is not really “open”, even if it is publicly available.

21 of 36

Ontologies play a central role in Open Science

  • Standardisation of data and “global data integration”
    • Scientific databases
    • Medical records
  • Capturing domain knowledge with well-defined semantics
    • Is the heart a part of the cardiovascular system?
    • Is Friedreich's ataxia a genetic disease?
  • Data analysis and aggregation
    • Which patients had a disease of the cardiovascular system?
    • Which patients had a genetic disease?
  • Data validation
    • Does it make sense that patient X (female) may have prostate cancer?
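The aggregation and validation questions above both reduce to walking a hierarchy. This minimal sketch uses a hypothetical, simplified disease hierarchy, hypothetical patient records and a hypothetical sex-specific rule; real pipelines would use an ontology such as MONDO and far richer rules.

```python
"""Toy ontology-backed data aggregation and validation."""

# Hypothetical, simplified disease hierarchy: term -> direct is_a parents.
DISEASE_IS_A = {
    "myocardial infarction": {"heart disease"},
    "heart disease": {"cardiovascular system disease"},
    "cardiovascular system disease": {"disease"},
    "prostate cancer": {"prostate disease"},
    "prostate disease": {"disease"},
    "Friedreich ataxia": {"genetic disease"},
    "genetic disease": {"disease"},
    "disease": set(),
}

# Hypothetical patient records: (id, recorded sex, set of diagnoses).
PATIENTS = [
    ("P1", "female", {"myocardial infarction"}),
    ("P2", "male", {"prostate cancer"}),
    ("P3", "female", {"prostate cancer"}),
    ("P4", "male", {"Friedreich ataxia"}),
]

# Hypothetical validation rule: terms under this class apply only to this sex.
SEX_SPECIFIC = {"prostate disease": "male"}


def closure(term):
    """Term plus all of its is_a ancestors."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(DISEASE_IS_A.get(t, ()))
    return seen


def patients_with(category):
    """Aggregation: patients with any diagnosis classified under category."""
    return [pid for pid, _, dx in PATIENTS if any(category in closure(d) for d in dx)]


def sex_mismatches():
    """Validation: (patient, diagnosis) pairs that contradict a sex-specific rule."""
    problems = []
    for pid, sex, dx in PATIENTS:
        for d in sorted(dx):
            for category, allowed in SEX_SPECIFIC.items():
                if category in closure(d) and sex != allowed:
                    problems.append((pid, d))
    return problems


print(patients_with("cardiovascular system disease"))  # ['P1']
print(patients_with("genetic disease"))  # ['P4']
print(sex_mismatches())  # [('P3', 'prostate cancer')]
```

No record mentions "cardiovascular system disease" directly; patient P1 is found only because the ontology classifies myocardial infarction under it.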

22 of 36

Which are the top five technologies that will facilitate global open data integration in the next 5 to 10 years? (Answer by ChatGPT, using the GPT-4 model)

23 of 36

To fulfill all of these roles ontologies must be community-driven

2009

24 of 36

To fulfill all of these roles ontologies must be community-driven

2016

25 of 36

To fulfill all of these roles ontologies must be community-driven*

2022

* incomplete list of contributors

26 of 36

Coordinating across sources is very difficult

I need these terms.

These terms do not make sense.

I would be hesitant to classify Interstitium as an organ.

27 of 36

The Open Science Engineer

The Open Science Engineer contributes to the collection and standardisation of publicly available scientific knowledge through curation, community-coordination and data, ontology and software engineering.

28 of 36

3 Basic Practices for Community Coordination

  1. Practice of Collaboration
  2. Practice of Upstream Fixing
  3. Practice of No-ownership

29 of 36

Practice of Collaboration

  • Be positive and generous with appreciation and attribution.
    • Do not close GitHub issues on people without explaining exactly how they are addressed.
    • If people make pull requests, do not “redo” them. Amend them, and let contributors be part of the git history of your project.
    • Be overly generous with likes. If you like something, like it. 👍
  • Continuously improve Open Science documentation.
  • Always strive to reduce work for other members of the community.
    • Use clear language in pull requests and issue comments.
    • Provide well-formatted and detailed issues/pull requests.
  • Promote open communication (less Slack, more GitHub).

30 of 36

Open Science Projects are heavily interlinked

[Figure: interlinked projects, including RO, OMO, PHENIO, EFO and the GWAS Catalog]

31 of 36

Practice of Upstream Fixing

  • The key to maximising your impact is to push any fixes as far upstream as possible.
  • When you experience a problem, always report it to the immediate source. If you can, report it as far upstream as possible.
  • In a perfect world, provide a fix in the form of a pull request.
  • Master the art of “drive-by curation”

32 of 36

Practice of No-Ownership

  • Get involved in other people's issue trackers.
  • See your issues and pull requests through to the end (don’t drop the ball, no one will do your work for you).
  • Feel empowered to nudge reviewers until they tell you not to.
  • Find review buddies (this is really helpful to organise community work).
  • Be proactive... and brave.
    • Reduce your fear of breaking the ontology.
    • Reduce your fear of getting a pull request rejected.
  • Reduce other people's fear of breaking the ontology by adding more QC checks.

33 of 36

Tools that all aspiring Open Science Engineers with a focus on semantics should know

  1. ROBOT: the undeniable Swiss Army Knife for Ontology Engineers
  2. Ontology Access Kit (OAK): Extracting information from ontologies
  3. Protégé: Manual ontology editing
  4. Ontology Lookup Service (OLS): Term browser
  5. Ontology Development Kit (ODK): Ontology Lifecycle Management
  6. Text Editor Workflows (regex, search-replace)
  7. Basic Shell Scripting, GNU make and UNIX CLI pipelines
  8. LinkML: Building Semantic Data Models
  9. SPARQL: Querying Ontologies and Knowledge Graphs
  10. From tables to ontologies: ROBOT templates and DOSDP template workflows

https://oboacademy.github.io/obook/reference/semantic-engineering-toolbox/

Prompt Engineering?
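The "from tables to ontologies" idea (item 10) is that each spreadsheet row becomes one term. The sketch below is neither ROBOT template nor DOSDP syntax; it only illustrates the one-row-per-term principle by rendering a hypothetical CSV table (made-up EX: IDs and column names) as OBO-format [Term] stanzas.

```python
"""Sketch: render a hypothetical CSV table as OBO-format [Term] stanzas."""
import csv
import io

# Hypothetical input table; IDs, labels and column names are made up.
TABLE = """id,label,parent,definition
EX:0000001,example parent term,,A parent term used for illustration.
EX:0000002,example child term,EX:0000001,A child term used for illustration.
"""


def row_to_stanza(row):
    """Render one table row as an OBO [Term] stanza."""
    lines = ["[Term]", f"id: {row['id']}", f"name: {row['label']}"]
    if row["definition"]:
        text = row["definition"].replace('"', '\\"')  # escape quotes inside def
        lines.append(f'def: "{text}" []')
    if row["parent"]:
        lines.append(f"is_a: {row['parent']}")
    return "\n".join(lines)


def table_to_obo(table_text):
    """Convert every row of the CSV text into a stanza, blank-line separated."""
    reader = csv.DictReader(io.StringIO(table_text))
    return "\n\n".join(row_to_stanza(row) for row in reader)


print(table_to_obo(TABLE))
```

Keeping the table as the source of truth means new terms are added by adding rows, while the axioms stay uniform across all of them.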

34 of 36

First steps to becoming an effective Open Science Engineer in Biomedical Ontologies

  1. Make sure you have both a GitHub and a Stack Overflow account
  2. Start by liking and up-voting issue comments and answers to questions relevant to you
  3. Complete a basic ontology contributor training
  4. Make a pull request on an OBO ontology
  5. Master the ontology tool belt!
  6. Join an Open Data project today and offer your help: the OBO Foundry, or any ontology you find interesting!

35 of 36

Today was a good start…

36 of 36

Join the community to tackle the global challenges of our time!

Thank you!