MIREOT - Minimal information to reference external ontology terms

Editors: Alan Ruttenberg, Melanie Courtot   Contributors: Bill Bug, Daniel Schober, Philippe Rocca-Serra, Allyson Lister, James Malone

Abstract

Developing an ontology covering all aspects of biomedical investigations is an ambitious project on its own. Several implementation choices have been made to ensure interoperability with other resources. These include the use of BFO as the upper-level ontology and membership in the OBO Foundry (http://www.obofoundry.org/). Such membership means OBI abides by OBO Foundry design principles, among which is a commitment to orthogonality, or partitioning of the term space, with other OBO Foundry ontologies.

Current members of the OBO Foundry cover elements of biological and clinical investigations, to varying degrees. OBI aims to avoid duplication of effort, and instead develops ways to import pre-existing knowledge into the ontology.

While the Web Ontology Language (OWL), the format used to create OBI, provides a mechanism to import ontologies [example], this mechanism is unsuitable for OBI. Firstly, given the current state of editing tools and the issues they have working with large ontologies, direct OWL imports are not practical for day-to-day development. Secondly, the other ontologies used by OBI are under active development and may not be aligned with the OBI development approach. Importing such ontologies as a whole could lead to inconsistencies or unintended inferences.

We propose a method that allows selective use of classes from external ontologies that are of direct interest to OBI. In examining OBO Foundry ontologies, it was found that specific terms are fairly stable in what they are intended (and observed) to denote, even as the containing ontology is reorganized. In this report we propose a mechanism and minimal information standard for importing required terms into an ontology. We also suggest an implementation for this mechanism, scripts for automation of the import process and finally some ideas for future work and extensions.

Status of this document

This is an authors' draft with no official standing. This document has been prepared as part of the activities of the OBI Consortium in response to needs expressed by members of the group. Please send comments to the authors at mcourtot@gmail.com and alanruttenberg@gmail.com.

This document is in a state of flux with revisions occurring frequently.

Method

Though the quickest way to import a class and its annotations into OBI is to import the ontology containing it, current limitations in tools and reasoners can make such a solution impractical on a day-to-day basis. For example, the OWL representation of the NCBI Taxonomy is 250 megabytes and contains tens of thousands of classes. Most OWL tools can neither load nor reason over an ontology this large. A second alternative would be to copy a fragment from the external resource file into obi.owl. However, there are several issues with this strategy:

To cope with these requirements, we devised a more involved, but more flexible, mechanism.

The minimal information needed to reference an external class is the ontology URI and the term's URI. These data should be fairly stable, and can be used to unambiguously reference the external class from within OBI. The minimal information needed to integrate this class into OBI is its position in the hierarchy, i.e. which OBI class the imported class is a subclass of.

We therefore decided to store this minimal set (the ontology URI and term URI of the mapped class, plus its OBI superclass) in a file called external.owl.

We also want to provide extra information about our imported classes, such as the label of the imported class, its definition, or any other information deemed necessary. We map this extra information into the OBI annotation property set: for example, the rdfs:label of an oboInOwl:Definition instance is mapped to the value of the obi:definition property. To keep this information up to date, we decided to store it in a separate file that can be removed on a regular basis and rebuilt by a script based on external.owl.

On the issue of inference, we note that in the current state, correct inference using the external classes is only guaranteed if the full ontologies are imported. We expect to provide an option in the OBI distribution that replaces external.owl with imports.owl, a file of import statements generated by extracting the ontology URIs mentioned in external.owl. Other options are possible, for instance having the script that generates additional information use software that extracts a "module"

To implement this, we modified the obi.owl and obi.repository files. The two additional files, external.owl and externalDerived.owl, are directly imported by obi.owl (as opposed to importing only external.owl, which would itself import externalDerived.owl) to maintain the ability to relocate the whole ontology by editing a single file.

We also created a new annotation property, OBI_0000283 ("imported from"), to record the ontology URI. This property has been added to the AnnotationProperty.owl file.
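The imports.owl option mentioned earlier in this section could be generated along the following lines. This is only a sketch under stated assumptions: we assume that OBI_0000283 sits in the namespace http://purl.obofoundry.org/obo/ (the namespace used for OBI class URIs in this report), and the layout of the generated imports file, as well as the function names, are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL = "http://www.w3.org/2002/07/owl#"
# Assumption: OBI_0000283 ("imported from") lives in the OBI namespace.
OBI = "http://purl.obofoundry.org/obo/"

# A cut-down external.owl holding the cell entry used in this report.
EXTERNAL_OWL = """<rdf:RDF xmlns="http://purl.obofoundry.org/obo/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="http://purl.org/obo/owl/CL#CL_0000000">
    <rdfs:subClassOf rdf:resource="http://purl.obofoundry.org/obo/OBI_0000141"/>
    <OBI_0000283 rdf:resource="http://purl.org/obo/owl/CL"/>
  </owl:Class>
</rdf:RDF>"""


def source_ontologies(external_xml):
    """Return the distinct ontology URIs recorded with OBI_0000283."""
    root = ET.fromstring(external_xml)
    uris = {el.get(f"{{{RDF}}}resource") for el in root.iter(f"{{{OBI}}}OBI_0000283")}
    return sorted(uri for uri in uris if uri)


def imports_owl(ontology_uris):
    """Serialize one owl:imports statement per source ontology (hypothetical layout)."""
    ET.register_namespace("rdf", RDF)
    ET.register_namespace("owl", OWL)
    root = ET.Element(f"{{{RDF}}}RDF")
    ontology = ET.SubElement(root, f"{{{OWL}}}Ontology", {f"{{{RDF}}}about": ""})
    for uri in ontology_uris:
        ET.SubElement(ontology, f"{{{OWL}}}imports", {f"{{{RDF}}}resource": uri})
    return ET.tostring(root, encoding="unicode")


uris = source_ontologies(EXTERNAL_OWL)
imports_xml = imports_owl(uris)
print(imports_xml)
```

Collecting the URIs into a set first means each source ontology is imported once, however many of its terms appear in external.owl.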

Figure 1: architecture implemented

Use case I - cell

In the biomaterial branch, we currently have the term cell. This term is already defined by the Cell Type Ontology (http://obofoundry.org/cgi-bin/detail.cgi?id=cell), which is part of the OBO Foundry effort, and we would like to use the cell class as defined by this resource.

The following is the declaration of the cell class in the file http://purl.org/obo/owl/CL:

  <owl:Class rdf:about="http://purl.org/obo/owl/CL#CL_0000000">
    <rdfs:label xml:lang="en">cell</rdfs:label>
    <oboInOwl:hasDefinition>
      <oboInOwl:Definition>
        <rdfs:label xml:lang="en">Anatomical structure that has as its parts a maximally connected cell compartment surrounded by a plasma membrane.</rdfs:label>
        <oboInOwl:hasDbXref>
          <oboInOwl:DbXref>
            <rdfs:label>CARO:mah</rdfs:label>
            <oboInOwl:hasURI rdf:datatype="http://www.w3.org/2001/XMLSchema#anyURI">http://purl.org/obo/owl/CARO#CARO_mah</oboInOwl:hasURI>
          </oboInOwl:DbXref>
        </oboInOwl:hasDbXref>
      </oboInOwl:Definition>
    </oboInOwl:hasDefinition>
    <oboInOwl:hasOBONamespace>cell</oboInOwl:hasOBONamespace>
    <oboInOwl:hasDbXref>
      <oboInOwl:DbXref>
        <rdfs:label>FMA:68646</rdfs:label>
        <oboInOwl:hasURI rdf:datatype="http://www.w3.org/2001/XMLSchema#anyURI">http://purl.org/obo/owl/FMA#FMA_68646</oboInOwl:hasURI>
      </oboInOwl:DbXref>
    </oboInOwl:hasDbXref>
  </owl:Class>

1/ external.owl

We add the minimal information to the external.owl file, i.e. the ontology URI and term URI of the mapped class, and its position in the OBI hierarchy.

  <owl:Class rdf:about="http://purl.org/obo/owl/CL#CL_0000000">
    <rdfs:subClassOf rdf:resource="http://purl.obofoundry.org/obo/OBI_0000141"/>
    <OBI_0000283 rdf:resource="http://purl.org/obo/owl/CL"/>
  </owl:Class>

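The above contains the information about the term URI of the mapped class (rdf:about), its OBI superclass (rdfs:subClassOf) and the URI of the ontology it is imported from (OBI_0000283).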

We also declare the cell namespace in the header of the external.owl file, by adding xmlns:cell="http://purl.org/obo/owl/CL#".

The external.owl file can be edited manually, or created via script.
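As an illustration of the scripted route, the following minimal sketch builds one external.owl entry from the three pieces of minimal information. The function name and the "obi" prefix are hypothetical, and we assume that OBI_0000283 belongs to the namespace http://purl.obofoundry.org/obo/; this is not the script used by OBI.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
# Assumption: namespace of the "imported from" annotation property.
OBI = "http://purl.obofoundry.org/obo/"

# Register readable prefixes so the serialized XML does not use ns0, ns1, ...
for prefix, uri in (("rdf", RDF), ("rdfs", RDFS), ("owl", OWL), ("obi", OBI)):
    ET.register_namespace(prefix, uri)


def external_entry(term_uri, ontology_uri, obi_superclass_uri):
    """Build the owl:Class element holding the minimal information for one term."""
    cls = ET.Element(f"{{{OWL}}}Class", {f"{{{RDF}}}about": term_uri})
    # Position of the imported class in the OBI hierarchy.
    ET.SubElement(cls, f"{{{RDFS}}}subClassOf", {f"{{{RDF}}}resource": obi_superclass_uri})
    # "imported from": URI of the source ontology.
    ET.SubElement(cls, f"{{{OBI}}}OBI_0000283", {f"{{{RDF}}}resource": ontology_uri})
    return cls


entry = external_entry(
    term_uri="http://purl.org/obo/owl/CL#CL_0000000",
    ontology_uri="http://purl.org/obo/owl/CL",
    obi_superclass_uri="http://purl.obofoundry.org/obo/OBI_0000141",
)
entry_xml = ET.tostring(entry, encoding="unicode")
print(entry_xml)
```

The serialized element carries its own namespace declarations; a real script would append it to the rdf:RDF root of external.owl rather than print it.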

2/ externalDerived.owl

In the cell example, we also wanted to record the preferred term (e.g. cell), the definition of the term (e.g. Anatomical structure that has as its parts a maximally connected cell compartment surrounded by a plasma membrane.) and the label to be used (e.g. cell).

Per the described mechanism, this additional minimal information is stored in the externalDerived.owl file.

  <rdf:Description rdf:about="http://purl.org/obo/owl/CL#CL_0000000">
    <rdfs:label rdf:datatype="http://www.w3.org/2001/XMLSchema#string">cell</rdfs:label>
    <OBI_0000288 xml:lang="en">cell</OBI_0000288>
    <OBI_0000291 xml:lang="en">Anatomical structure that has as its parts a maximally connected cell compartment surrounded by a plasma membrane.</OBI_0000291>
  </rdf:Description>

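The above contains the information about the label (rdfs:label), the preferred term (OBI_0000288) and the definition (OBI_0000291) of the imported class.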

This file will be updated on a regular basis. The proposed mechanism is that a script reads the external.owl file, without modifying it, to learn which terms to fetch and from where. Using that data, the script then connects to the remote ontologies and updates the additional information in the externalDerived.owl file.

Once this script exists (it is still to be created), the externalDerived.owl file itself should never be edited manually: it is generated by the script, which reads external.owl and writes externalDerived.owl.
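The derivation step could look roughly like the sketch below, which maps the label and definition found in the source ontology onto OBI annotation properties. It works on an in-memory copy of the CL declaration rather than connecting to a remote ontology. The function name is hypothetical, and the mapping of OBI_0000288 to the preferred term and OBI_0000291 to the definition, as well as the OBI namespace, are assumptions read off the example above.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
OBOINOWL = "http://www.geneontology.org/formats/oboInOwl#"
OBI = "http://purl.obofoundry.org/obo/"  # assumption: OBI annotation namespace
NS = {"rdf": RDF, "rdfs": RDFS, "owl": OWL, "oboInOwl": OBOINOWL}

# Stand-in for the remote ontology: a trimmed copy of the CL declaration.
SOURCE_XML = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:oboInOwl="http://www.geneontology.org/formats/oboInOwl#">
  <owl:Class rdf:about="http://purl.org/obo/owl/CL#CL_0000000">
    <rdfs:label xml:lang="en">cell</rdfs:label>
    <oboInOwl:hasDefinition>
      <oboInOwl:Definition>
        <rdfs:label xml:lang="en">Anatomical structure that has as its parts a maximally connected cell compartment surrounded by a plasma membrane.</rdfs:label>
      </oboInOwl:Definition>
    </oboInOwl:hasDefinition>
  </owl:Class>
</rdf:RDF>"""


def derive_entry(term_uri, source_xml):
    """Build the rdf:Description for term_uri from the source ontology."""
    root = ET.fromstring(source_xml)
    for cls in root.findall("owl:Class", NS):
        if cls.get(f"{{{RDF}}}about") == term_uri:
            break
    else:
        raise KeyError(f"{term_uri} not found in source ontology")
    label = cls.findtext("rdfs:label", namespaces=NS)
    definition = cls.findtext(
        "oboInOwl:hasDefinition/oboInOwl:Definition/rdfs:label", namespaces=NS
    )
    desc = ET.Element(f"{{{RDF}}}Description", {f"{{{RDF}}}about": term_uri})
    ET.SubElement(desc, f"{{{RDFS}}}label").text = label
    ET.SubElement(desc, f"{{{OBI}}}OBI_0000288").text = label       # preferred term (assumed)
    ET.SubElement(desc, f"{{{OBI}}}OBI_0000291").text = definition  # definition (assumed)
    return desc


desc = derive_entry("http://purl.org/obo/owl/CL#CL_0000000", SOURCE_XML)
print(ET.tostring(desc, encoding="unicode"))
```

Because the entry is rebuilt from the source each time, any manual change to externalDerived.owl is overwritten on the next run, which matches the behaviour described above.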

Remark: the MIREOT mechanism provides a way to map towards an external resource, the cell class in the above example. However, no check is enforced that a cell class did not already exist in OBI. The curator who imports a class has to decide whether a pre-existing OBI class needs to be deprecated.

Use case II - taxonomic information

We expect that in most cases we will want to import information as described in use case I above, i.e. a simple mapping towards an external class. However, in some cases we might want or need more than that, and the MIREOT mechanism has been devised to be flexible and to allow other types of information to be stored as deemed necessary.

Consider the scenario in which we have two experiments, one in human and one in mouse. The files are annotated with the classes human and mouse from OBI, which are in turn mapped from the NCBI taxonomy.

We can easily imagine that somebody would want to run a query of the form "give me all experiments in mammals". In this case, we would need to know that human and mouse are (possibly indirect) subclasses of mammals in the NCBI taxonomy. Therefore, when mapping towards an NCBI term, we decided to also retrieve all its superclasses, up to the root of the NCBI taxonomy. As per the mechanism described above, the mapped class (e.g. human) is defined in external.owl, whereas the additional information about the human class (i.e. its superclasses) is stored in externalDerived.owl.
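The reasoning behind retrieving the superclasses can be sketched with a toy hierarchy. The parent map below is a deliberately simplified, hypothetical stand-in for the NCBI taxonomy (the real lineage has many more intermediate classes), and the experiment names are invented for illustration only.

```python
def superclass_chain(term, parent_of):
    """Return all superclasses of term, nearest first, up to the root."""
    chain = []
    seen = {term}
    current = term
    while current in parent_of:
        current = parent_of[current]
        if current in seen:
            raise ValueError(f"cycle detected at {current}")
        seen.add(current)
        chain.append(current)
    return chain


# Hypothetical, heavily abbreviated hierarchy (child -> direct superclass).
PARENT_OF = {
    "human": "primates",
    "primates": "mammals",
    "mouse": "rodents",
    "rodents": "mammals",
    "mammals": "root",
}

# Hypothetical annotations: experiment -> taxon class used to annotate it.
EXPERIMENTS = {"experiment 1": "human", "experiment 2": "mouse"}


def in_taxon(term, ancestor):
    """True if term is the ancestor itself or one of its (indirect) subclasses."""
    return term == ancestor or ancestor in superclass_chain(term, PARENT_OF)


mammal_experiments = sorted(
    name for name, taxon in EXPERIMENTS.items() if in_taxon(taxon, "mammals")
)
print(superclass_chain("human", PARENT_OF))
print(mammal_experiments)
```

Without the stored chain, neither experiment could be matched to "mammals", since the annotations only mention human and mouse.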

This use case is still being worked on, and this section will be updated accordingly.

At the moment:

To test, go to http://sparql.neurocommons.org:8890/nsparql/

This query gets all subclass/superclass pairs starting at Mus (tax:_10088) and their labels.

prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix tax: <http://purl.org/obo/owl/NCBITaxon#NCBITaxon>
select ?parent ?super ?parentlabel ?superlabel
from <http://purl.org/science/graph/taxon>
where
  {
    { graph <http://purl.org/science/graph/taxon/inferred>
        { tax:_10088 rdfs:subClassOf ?super. }
      { ?super rdfs:subClassOf ?parent.
        ?super rdfs:label ?superlabel.
        ?parent rdfs:label ?parentlabel
      }
    }
    UNION
    {
      ?super rdfs:subClassOf ?parent.
      ?super rdfs:label ?superlabel.
      ?parent rdfs:label ?parentlabel
      FILTER (?super = tax:_10088)
    }
  }

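For construction of externalDerived.owl we can use the following construct template in place of the select clause, keeping the same where clause (the variable names are those bound in the where clause):

construct
  { ?super rdfs:subClassOf ?parent.
    ?super rdfs:label ?superlabel.
  }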

Note: We need the union to pick up the bottom-most subClassOf relation.

To include more properties in externalDerived.owl, we add the corresponding patterns to both the query and the construct template.

Results

The desired class cell has been imported from the Cell Type Ontology, and appears as a subclass of material entity in the OBI ontology. Its definition, preferred term and label are imported directly from the Cell Type Ontology and mapped into our own set of annotation properties; this is transparent to the user. We do not modify the imported values: if this is done accidentally, the change will be erased the next time externalDerived.owl is automatically updated.

The annotation property "imported from" denotes the origin of the class.

We can add extra restrictions, annotations or other axioms to these imported classes: for example, in OBI, cell is the bearer of the reagent role or the specimen role. OBI-specific elements such as this additional restriction should be stored in the OWL file of the branch importing the class.

However, we do not add a curation status, definition editor or definition source annotation: the term is imported directly from the external resource, with the status and definition created by that resource. (If these are available in the external resource, we could map them into OBI annotation properties.)

Figure 2: screenshot of the Protege editor open on the cell class

Updating a 'Mireoted' identifier:

An imported element may need to be replaced or updated. In this case, external.owl should be modified using the procedure described above. However, the ontology should also be inspected to make sure that all references to the old identifier or label (e.g. in restrictions and range definitions) are updated as well. Currently, this check has to be done manually (via an editor or Perl one-liners). This is a critical point, as failing to do so would cause resolution errors as well as reasoning failures.

We will be working on providing tools performing automated checks and updates. 
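Until such tools exist, a check of this kind might be sketched as follows. The function names, the sample fragment and the replacement identifier CL_9999999 are all hypothetical and serve only to illustrate the idea of locating and rewriting references to an old identifier.

```python
import re


def _uri_pattern(uri):
    # Match the URI only when it is not followed by further identifier
    # characters, so that CL_0000000 does not match inside CL_00000001.
    return re.compile(re.escape(uri) + r"(?![A-Za-z0-9_])")


def find_references(owl_text, old_uri):
    """Return (line number, line) pairs that still mention old_uri."""
    pattern = _uri_pattern(old_uri)
    return [
        (number, line.strip())
        for number, line in enumerate(owl_text.splitlines(), start=1)
        if pattern.search(line)
    ]


def replace_references(owl_text, old_uri, new_uri):
    """Rewrite every reference to old_uri so that it points at new_uri."""
    return _uri_pattern(old_uri).sub(lambda match: new_uri, owl_text)


# Hypothetical fragment of a branch file that refers to imported classes.
SAMPLE = """<owl:Restriction>
  <owl:someValuesFrom rdf:resource="http://purl.org/obo/owl/CL#CL_0000000"/>
  <owl:someValuesFrom rdf:resource="http://purl.org/obo/owl/CL#CL_00000001"/>
</owl:Restriction>"""

OLD = "http://purl.org/obo/owl/CL#CL_0000000"
NEW = "http://purl.org/obo/owl/CL#CL_9999999"  # hypothetical replacement identifier

hits = find_references(SAMPLE, OLD)
updated = replace_references(SAMPLE, OLD, NEW)
print(hits)
```

A text-level scan like this only finds URIs written out in full; references that rely on namespace prefixes or XML entities would need a proper RDF parser.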

Future work

We need to automate the whole process. So far, we have had to adjust the files manually.

General remarks

We limit the mapping process to OBO Foundry ontologies.