1 of 53

OBO Tools and Workflows

ICBO 2017

2 of 53

Ontology development in the bad old days

Decide to build ontology

3 of 53

Ontology development in the bad old days

Decide to build ontology

Start from scratch

4 of 53

Ontology development in the bad old days

Decide to build ontology

Start from scratch

Save locally

Edit in IDE

My Hard Drive

My awesome ontology v1.owl

5 of 53

Ontology development in the bad old days

Decide to build ontology

Start from scratch

Save locally

Edit in IDE

My Hard Drive

My awesome ontology v7FINALFINAL.owl

6 of 53

Ontology development in the bad old days

My Institution FTP server

My Hard Drive

Decide to build ontology

Start from scratch

Save locally

My awesome ontology v7FINALFINAL.owl

Share with the world

My awesome ontology v7FINALFINAL.owl

Edit in IDE

7 of 53

Problems with this workflow

  • No version control
  • Single-developer only
  • No quality control procedures
  • No reuse of existing ontologies
  • No automation of repetitive or error-prone tasks
  • No persistent URLs for ontology releases
  • No documentation
  • ...

8 of 53

Alternatives

  • Approach 1: Integrated web-based solution
    • Examples
      • WebProtege
      • Wikidata
      • Semantic Mediawikis
    • Advantages:
      • Easier for maintainer of ontology
    • Disadvantages
      • Vary with each solution, but generally less control/power
  • Approach 2: Treat ontology development more like software engineering
    • Use of hosted version control (e.g. GitHub)
    • Compilation, unit tests, etc.
    • We will focus on this second approach, then return to a discussion of both

9 of 53

OBO Workflows

  • Intention not to be overly prescriptive
  • Community-based exploration of different solutions
  • Evolving best practices
  • Reuse of software engineering idioms, best practice and workflows where appropriate
  • Modular: mix and match with other tools (e.g. Tawny)
  • Core tools
    • Protege
    • ROBOT
    • OWLTools
    • ontoanimals
    • dosdp-tools

10 of 53

Command line tools for ontology processing

OWLTools

  • History 2008->?
    • Evolved organically
  • Many commands
    • 299
    • Ultimate kitchen sink
  • OORT:
    • OBO Ontology Release Tool

ROBOT

  • Designed to replace the OWLTools command runner
    • Much leaner, cleaner
  • 18 commands
    • Reason
    • Filter
    • Diff
    • Mirror
    • Template
    • Merge
    • Annotate
    • Convert

11 of 53

The ontology starter kit (OSK)

git clone git@github.com:INCATools/ontology-starter-kit.git
cd ontology-starter-kit
./seed-my-ontology-repo.pl -d po ro pato -u cmungall -t "Triffid Behavior ontology" triffo

cd target/triffid-behavior-ontology
git remote add origin git@github.com:cmungall/triffid-behavior-ontology.git
git push -u origin master

  • Ideal for new projects
  • For existing projects, easy to steal from

12 of 53

First step: Use a hosted version control system

  • Multiple files to track
    • Source + Derived
    • Ontology(s) + Documentation + Metadata
  • A VCS will allow you to manage changes to all these files effectively
  • Hosted VCSs provide many advantages
  • Examples
    • GitHub
    • GitLab
    • Bitbucket

13 of 53

Anatomy of an ontology project: triffo

  • ‘Canonical’ folder layout
  • Analogous to software project layouts
  • We recommend following same layout
    • Even if you think you don’t need everything yet
    • We’re open to other layouts
    • OSK will provide this for you

14 of 53

Anatomy of an ontology project: the README

15 of 53

Welcome contributors with a CONTRIBUTING.md

16 of 53

Your ‘source code’

17 of 53

Ontology source

18 of 53

OBO Practices for editing ontology source

  • Identifiers
  • Metadata and lexical conventions
    • E.g. every class must have a unique rdfs:label and a unique definition
    • Metadata vocabularies: OboInOwl, IAO, rdfs, dc
  • Logical Axioms
    • Use of BFO and RO
    • Document design decisions and design patterns!
  • Reuse
    • Import modules
    • Axiomatize using external ontology classes where appropriate

19 of 53

OBO Annotations Plugin

  • Availability
  • Capabilities
    • Easy editing of standard annotations
    • Automatic SOP for
      • Obsoletion of classes
      • Merging of classes (coming soon)
  • Other OBO-friendly plugins and new Protege features coming soon...

20 of 53

Documentation is vital

  • Necessary for maintainability
  • Inline documentation
    • Annotation assertions on ontology classes
      • For users
      • For developers
  • Documentation external to ontology
    • Various options
      • Read the Docs
      • GitHub wiki
      • External wiki
      • Google Docs
    • Recommendation:
      • Keep it in markdown!

21 of 53

Reuse: Importing parts of external ontologies

22 of 53

ROBOT will extract modules from OBO ontologies

robot extract -i po.owl -T imports/po_terms.txt --method BOT -O http://purl.obolibrary.org/obo/triffo/imports/po_import.owl -o imports/po_import.owl
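
The same extraction can be repeated for each imported ontology. The following is a minimal sketch, assuming every import keeps its term list in imports/<ont>_terms.txt and that the source ontology has already been downloaded as <ont>.owl. The helper name extract_cmd is hypothetical, and it only prints each command so it can be inspected before being run.

```shell
#!/bin/sh
# Base IRI for import modules, taken from the command above.
BASE=http://purl.obolibrary.org/obo/triffo/imports

# extract_cmd (hypothetical helper): print the robot extract call for one ontology.
extract_cmd() {
  ont=$1
  echo "robot extract -i $ont.owl -T imports/${ont}_terms.txt --method BOT -O $BASE/${ont}_import.owl -o imports/${ont}_import.owl"
}

# po, ro and pato are the dependencies named in the starter kit example (-d).
for ont in po ro pato; do
  extract_cmd "$ont"
done
```

Piping the output to sh would execute the three commands once ROBOT and the source files are in place.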

23 of 53

The problem of cyclic dependencies

24 of 53

Managing files with version control

  • Tips:
    • Commit early, commit often
    • Use standard conventions, e.g. GitHub
      • Reference tickets, optionally close with commit
        • E.g. git commit -m 'NTR: venomous stinger, fixes #32'
  • Git/GitHub flows
    • Different ones to choose from
    • Consider technical skills of editors, don’t overcomplicate
      • But avoid committing on master
    • Borrow. E.g.: http://go-ontology.readthedocs.io/en/latest/DailyWorkflow.html
    • GitHub web interface good for smaller files (e.g. dosdp TSVs, Tawny programs)
  • Meaningful diffs are your friend!
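
The ticket-reference convention can be checked mechanically, for example from a commit-msg hook. This is only a sketch: the helper name references_ticket is hypothetical, and it assumes a ticket reference always looks like a hash sign followed by digits.

```shell
#!/bin/sh
# references_ticket (hypothetical helper): succeed if the message mentions
# an issue number such as "#32".
references_ticket() {
  printf '%s\n' "$1" | grep -Eq '#[0-9]+'
}

# Example message taken from the tip above.
msg='NTR: venomous stinger, fixes #32'
if references_ticket "$msg"; then
  echo "ok: commit message references a ticket"
else
  echo "warning: no ticket reference in commit message" >&2
fi
```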

25 of 53

Meaningful diffs are your friend: avoid spurious diffs

  • Bad diffs are largely a thing of the past
  • If your source is OWL...
    • Choose format wisely
      • Functional or Manchester work well
      • Some groups stick with OBO format for maximum readability (but beware issues)
      • Note this is for source. Releases will be canonical RDF serialization
    • Use OWLAPI (latest 4.x or 5.x)
      • E.g. Protege, ROBOT, OWLTools
        • Latest versions!
      • This ensures canonical ordering in serialization, minimizing spurious diffs
    • Consider: https://github.com/ShahimEssaid/git-owl-tools
  • Otherwise…
    • Tawny programs, DOSDP YAML and TSVs, HOWL, and ROBOT templates all play nicely

26 of 53

https://github.com/ShahimEssaid/git-owl-tools

Can also plug in robot diff or your favorite diff tool

27 of 53

But OBO format still works best…

28 of 53

Compilation of ontology source

  • Steps
    • Collect various source files
      • Foo-edit.owl
      • Local imports
      • Additional module compilation (dosdp-TSVs, tawny files, ad-hoc OWL files)
    • Validate
      • Is the ontology logically well-formed? Consistent, coherent
      • Is it structurally and lexically well formed?
    • Reason
      • ‘Compiling’ additional axioms, e.g. direct subClassOf axioms
      • Removal of redundancy
  • Tooling
    • ROBOT + Makefiles

29 of 53

Reasoning with robot

  • Basic reasoning command:
    • robot reason -r elk -i myont-edit.owl -o myont.owl
    • Checks for coherency first, fails fast
    • Asserts direct superclasses
  • Extensions
    • Materialize
      • Useful if we have GCIs
    • Relax
    • Reduce
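
ROBOT commands can be chained so that the reasoning extensions run in a single invocation. The sketch below assumes the file names from the basic command, and it leaves out materialize, which needs further options. Here relax adds the SubClassOf axioms implied by equivalence axioms, and reduce removes redundant SubClassOf axioms.

```shell
#!/bin/sh
# File names follow the basic reasoning command above.
EDIT=myont-edit.owl
OUT=myont.owl

# One chained ROBOT invocation: reason, then relax, then reduce.
# Each command hands its result to the next, so only the first needs
# --input and only the last needs --output.
CMD="robot reason --reasoner ELK --input $EDIT relax reduce --reasoner ELK --output $OUT"

# Run only when robot is installed and the edit file exists; otherwise
# just show the command that would be run.
if command -v robot >/dev/null 2>&1 && [ -f "$EDIT" ]; then
  $CMD
else
  echo "$CMD"
fi
```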

30 of 53

Standard SPARQL for reporting and validation

  • SPARQL
    • W3C standard query language
    • Useful for
      • Reports
      • Lint checks
    • Declarative
    • Easy to execute locally
      • ROBOT
      • Protege
    • Easy to execute remotely
      • E.g. on the Ontobee triplestore
  • Under consideration
    • ShEx or SHACL

Sparql folder

Shapes?
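
As an illustration of a lint check, the sketch below writes one query file into a scratch copy of the sparql folder. The file name duplicate-label.sparql and the query itself are assumptions rather than part of any standard set. The query looks for labels shared by two different classes.

```shell
#!/bin/sh
# Write the query into a scratch copy of the sparql folder.
SPARQL_DIR=$(mktemp -d)/sparql
mkdir -p "$SPARQL_DIR"

# Lint query: report any label shared by two different named classes.
# STR(?c1) < STR(?c2) lists each offending pair once and skips self-pairs.
cat > "$SPARQL_DIR/duplicate-label.sparql" <<'EOF'
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>

SELECT DISTINCT ?label ?c1 ?c2 WHERE {
  ?c1 a owl:Class ; rdfs:label ?label .
  ?c2 a owl:Class ; rdfs:label ?label .
  FILTER (isIRI(?c1) && isIRI(?c2) && STR(?c1) < STR(?c2))
}
EOF

# With ROBOT installed, a TSV report could be produced like this:
echo "robot query --input triffo.owl --query $SPARQL_DIR/duplicate-label.sparql duplicate-label.tsv"
```

An empty report means the check passed. Any rows name the classes that need a new label.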

31 of 53

32 of 53

Makefiles

  • Typing the same thing on the command line is tedious
  • Make automatically manages dependencies for you
  • A makefile is an executable recipe for building targets by running commands

[Diagram: ROBOT build pipeline starting from triffo-edit.owl. Labels: robot reason (triffo.owl), robot convert (triffo.obo), robot filter (triffo-lite.owl), robot validate (ok? / fail), robot query over .sparql files (TSV reports, ok? / fail)]
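
A minimal sketch of such a recipe follows, written out from a shell script so it can be tried in a scratch directory. It covers only the reason and convert steps, and the ROBOT variable and target names are assumptions based on the triffo example. The rules use make's one-line form (target: prerequisite ; command).

```shell
#!/bin/sh
# Write a small Makefile into a scratch directory.
BUILD_DIR=$(mktemp -d)

# The quoted 'EOF' keeps $< and $@ literal so that make, not the shell,
# expands them to the prerequisite and the target.
cat > "$BUILD_DIR/Makefile" <<'EOF'
ROBOT = robot

all: triffo.owl triffo.obo

# Reason over the edit file to produce the release OWL.
triffo.owl: triffo-edit.owl ; $(ROBOT) reason -r elk -i $< -o $@

# Convert the reasoned ontology to OBO format.
triffo.obo: triffo.owl ; $(ROBOT) convert -i $< -o $@
EOF

cat "$BUILD_DIR/Makefile"
```

Because triffo.obo depends on triffo.owl, asking make for triffo.obo reruns the reasoning step only when triffo-edit.owl has changed.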

33 of 53

34 of 53

35 of 53

36 of 53

Continuous Integration

  • Travis
    • Integrated with GitHub
    • Recommended: use with Pull Requests
  • Other systems, e.g. Jenkins
    • Good for integration tests
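
A CI job passes or fails on the exit status of the commands it runs, so a lint report can gate a pull request. The sketch below assumes each report is a TSV file with one header line, followed by one row per violation. The helper name check_report and the sample rows are made up for illustration.

```shell
#!/bin/sh
# check_report (hypothetical helper): succeed only if the TSV report has
# no non-empty rows after its header line.
check_report() {
  awk 'NR > 1 && NF { found = 1 } END { exit found }' "$1"
}

# Demonstration with two scratch reports (made-up sample rows).
clean=$(mktemp)
dirty=$(mktemp)
printf 'label\tc1\tc2\n' > "$clean"
printf 'label\tc1\tc2\nstinger\tTRIFFO:1\tTRIFFO:2\n' > "$dirty"

check_report "$clean" && echo "clean report: ok"
check_report "$dirty" || echo "dirty report: fail"
```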

37 of 53

Example travis file

38 of 53

Example travis file

  • Coming soon: Docker image

39 of 53

40 of 53

41 of 53

Building different artefacts

  • Subsets
    • Class subsets
      • OWLTools
    • Axiomatic subsets
      • ROBOT
  • Conversion
    • robot convert
    • OWL (canonical RDF/XML)
    • OBO Format (legacy)
    • OBO JSON

42 of 53

OBO JSON

  • Spec:
  • Python library

43 of 53

Registering with OBO

  • http://obofoundry.org

44 of 53

45 of 53

  • OBO website
    • Runs on GitHub Pages
    • YAML drives Jekyll templates

46 of 53

  • OBO PURL Server
    • Runs on an Amazon micro instance
    • YAML compiles down to Apache conf

47 of 53

Release Management

  • Creating a release
    • make release
      • ROBOT will auto-annotate a versionIRI
    • Semi-automatically generate a CHANGES/Changelog file
      • E.g. robot diff
  • Make a release on GitHub
    • Commit derived files
    • Name release using versionIRI
      • E.g. v2017-09-13
    • Versioned PURLs automatically available!
  • If files are too big for GitHub
    • Variety of other solutions, e.g. using S3
      • NEW: we recommend osf.io
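
The date-based naming can be sketched as follows. The IRI pattern (ontology name, then releases, then the date) is an assumption modelled on the import PURL shown earlier, and the helper name version_iri is hypothetical. The script only prints the annotate command rather than running it.

```shell
#!/bin/sh
# version_iri (hypothetical helper): build a dated versionIRI for a release.
version_iri() {
  ont=$1
  day=$2
  echo "http://purl.obolibrary.org/obo/$ont/releases/$day/$ont.owl"
}

RELEASE_DATE=2017-09-13
IRI=$(version_iri triffo "$RELEASE_DATE")
TAG="v$RELEASE_DATE"

echo "versionIRI: $IRI"
echo "release tag: $TAG"
# The annotation step itself could then look like:
echo "robot annotate --input triffo.owl --version-iri $IRI --output triffo.owl"
```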

48 of 53

49 of 53

50 of 53

Generated by owljs (deprecated)

51 of 53

Working with user community

  • GitHub tracker
  • Need for a simpler interface
    • Integrate with INCA Forms, Webulous?

52 of 53

Putting it all together: the ontology starter kit

  • Generates
    • Directory layout
    • Makefile
    • Imports
    • .travis.yml
    • README, LICENSE, CONTRIBUTING, etc
  • Makes OBO metadata
    • Markdown/YAML
    • PURL YAML

git clone git@github.com:INCATools/ontology-starter-kit.git
cd ontology-starter-kit
./seed-my-ontology-repo.pl -d po ro pato -u cmungall -t "Triffid Behavior ontology" triffo

cd target/triffid-behavior-ontology
git remote add origin git@github.com:cmungall/triffid-behavior-ontology.git
git push -u origin master

53 of 53

Acknowledgments

  • James Overton
  • Heiko Dietze
  • Eric Douglass
  • David Osumi-Sutherland
  • Matt Horridge
  • Jim Balhoff