OBI ID Policy


Authors: Melanie Courtot, Alan Ruttenberg, Bill Bug and the OBI Consortium


Executive Summary



Note: although we have designed the URI scheme so that individual terms can eventually return fragments of the ontology, for the coming release the only URIs that will return OBI content are http://purl.obofoundry.org/obo/obi.owl, which returns the entire merged OBI file, and dated URIs such as http://purl.obofoundry.org/obo/2008-03-05/obi.owl, which return a specific version tagged by date. See "Future Work" below.


Background


As part of the release process, we want to produce files that have homogeneous identifiers (IDs) and stable URIs.


Like the rest of the OBO ontologies, we want to use PURL-based [1] URIs, because they can be redirected to a different URL should we want to change hosts, etc. (The current OBI URIs are based on SourceForge.)


Regarding the form of the URI itself, everyone who has expressed an opinion agrees that we should give all entities that we define (classes, relations, and instances) IDs, and use labels for the human-readable version.



Chris (Mungall) has suggested that perhaps we should adopt the current form and then have OBO change as a whole. We would rather not: this is OBI's initial release, and we would like to reach a stable form from the start, if possible.

 

Three options proposed


Option 1: Current OBO practice

 Example URI: http://purl.obofoundry.org/obo/owl/OBI#OBI_0010000 

Pro: Current OBO practice.
Cons: "owl" in the name doesn't make sense. The "#" (hash) doesn't scale if we want each term to resolve to just a small piece of OWL about that term (which is desirable), because web servers never receive the part of the URI after the "#". So an HTTP GET of any OBI URI returns the whole ontology, which we expect to become rather large over time; this is undesirable behavior.
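To illustrate why the "#" form causes this, the sketch below uses Python's standard urllib.parse.urldefrag to split each URI into the part an HTTP client sends to the server and the fragment it keeps to itself. The second term ID (OBI_0010001) is an arbitrary example, not a real OBI term.

```python
from urllib.parse import urldefrag

# Two different terms written in the option 1 (hash) and option 3 (slash) forms.
# OBI_0010001 is an arbitrary example ID.
hash_uris = [
    "http://purl.obofoundry.org/obo/owl/OBI#OBI_0010000",
    "http://purl.obofoundry.org/obo/owl/OBI#OBI_0010001",
]
slash_uris = [
    "http://purl.obofoundry.org/obo/OBI_0010000",
    "http://purl.obofoundry.org/obo/OBI_0010001",
]

def request_target(uri: str) -> str:
    """Return the part of the URI that an HTTP client sends to the server.

    The fragment (everything after '#') stays on the client side, so it
    cannot influence what the server returns.
    """
    url, _fragment = urldefrag(uri)
    return url

hash_targets = {request_target(u) for u in hash_uris}
slash_targets = {request_target(u) for u in slash_uris}

print(hash_targets)   # a single target: the whole ontology document
print(slash_targets)  # two targets: one per term
```

Both hash URIs collapse to the same request, so the server has no way of knowing which term was asked for; the slash URIs stay distinct.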

Option 2: Somewhat conservative change
 Example URI: http://purl.obofoundry.org/obo/OBI/OBI_0010000

Pro: Using "/" instead of "#" removes the problem described above, and there is no "owl" in the name.
Cons: More verbose than necessary: "OBI" appears twice in the URI for classes.

Option 3: Removing the extra OBI

 Example URI: http://purl.obofoundry.org/obo/OBI_0010000 

Pro: As short as is sensible. 
Cons: It is unclear how to handle terms that don't have OBI IDs, such as relations, instances, and classes that we might want to keep named.

CURRENT STATUS - approved by OBI

All agree on:
- using PURL-based URIs.
- using IDs for everything from now on.
- option 3 (http://purl.obofoundry.org/obo/OBI_0010000).
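Under option 3, a term URI is simply the shared base followed by the ID. The sketch below assumes IDs are "OBI_" followed by seven digits, as in the examples in this document; the digit count is an assumption rather than something this policy states, and obi_id and obi_uri are hypothetical helper names.

```python
import re

OBO_BASE = "http://purl.obofoundry.org/obo/"
# Assumption: "OBI_" plus exactly seven digits, matching the examples above.
OBI_ID = re.compile(r"OBI_\d{7}")

def obi_id(number: int) -> str:
    """Format a number as an OBI ID, zero-padded to seven digits."""
    if not 0 <= number <= 9_999_999:
        raise ValueError(f"ID number out of range: {number}")
    return f"OBI_{number:07d}"

def obi_uri(term_id: str) -> str:
    """Build the option 3 URI for an OBI ID such as 'OBI_0010000'."""
    if not OBI_ID.fullmatch(term_id):
        raise ValueError(f"not an OBI ID: {term_id!r}")
    return OBO_BASE + term_id

print(obi_uri(obi_id(10000)))  # http://purl.obofoundry.org/obo/OBI_0010000
```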


Basing our URIs on a domain we control / use of DNS
Jonathan Rees, working with Alan at Science Commons, suggests that we might want to use our own domain name and have the DNS [2] point to purl.org, to protect ourselves against a future time when purl.org might not be as reliable.

This means we should allocate a host, for instance in the obofoundry.org domain. It should be a new host name so that it can be redirected at the DNS level; that way resolution takes no extra time, and no dedicated servers are needed to handle lookups.

Note: this means that somebody who wants to access OBI will be directed by the DNS server straight to the right place; the request won't first go to the general OBO Foundry system, which would then have to redirect it.

The proposed host name (agreed upon by this document's authors and Chris Mungall) is purl.obofoundry.org, which gives term URIs such as http://purl.obofoundry.org/obo/OBI_0100102.


MAQ (Melanie Asked Questions)

The questions below assume we adopt the form http://purl.obofoundry.org/obo/OBI_0100102 for all our IDs. They are based in part on my (long) list of questions, to which Alan (very patiently) responded.


How to reference an OBI term (for example when annotating a file)?
Use http://purl.obofoundry.org/obo/OBI_0100102 instead of http://obi.sourceforge.net/ontology/OBI.owl#OBI_0100102 
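For existing annotation files, this substitution is a prefix rewrite. A minimal sketch, assuming every old term URI has exactly the SourceForge prefix shown above; migrate_uri is a hypothetical helper, not part of any OBI tooling. It deliberately leaves named entities (such as annotation properties without an OBI ID) untouched, since those receive new IDs and cannot be migrated by string rewriting alone.

```python
import re

OLD_PREFIX = "http://obi.sourceforge.net/ontology/OBI.owl#"
NEW_BASE = "http://purl.obofoundry.org/obo/"
# Only local names that are already OBI IDs can be rewritten mechanically.
OBI_LOCAL = re.compile(r"OBI_\d+")

def migrate_uri(uri: str) -> str:
    """Rewrite an old SourceForge OBI term URI to the PURL form.

    Any other URI is returned unchanged, so the function can be applied
    to every URI in an annotation file.
    """
    if uri.startswith(OLD_PREFIX):
        local = uri[len(OLD_PREFIX):]
        if OBI_LOCAL.fullmatch(local):
            return NEW_BASE + local
    return uri

print(migrate_uri("http://obi.sourceforge.net/ontology/OBI.owl#OBI_0100102"))
# http://purl.obofoundry.org/obo/OBI_0100102
```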

How to browse the whole ontology file?

Go to http://purl.obofoundry.org/obo/obi.owl and you will get the whole OBI.owl file.

What about our xml:base?
This tells the parser what to use as the default prefix for the string values of rdf:about, rdf:resource, and rdf:ID. Note that rdf:ID also gets a "#" prepended, so we never use rdf:ID; we use rdf:about instead.
As it happens, the base has no effect in our case, because we always write rdf:about with a full URI. (We use the full URI because of a bug in Protege that forces the xml:base to be the same as the ontology URI, i.e. the location of the ontology.) Just in case, we set xml:base to http://purl.obofoundry.org/obo/, so if we write rdf:about="OBI_0123456" the URI is http://purl.obofoundry.org/obo/OBI_0123456.
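RDF/XML resolves these attribute values against xml:base using standard URI reference resolution, and rdf:ID="x" behaves like rdf:about="#x". Python's urllib.parse.urljoin implements the same resolution rules, so it can illustrate the three cases above. This is a sketch of the resolution rules only, not of how Protege or any particular parser is implemented.

```python
from urllib.parse import urljoin

XML_BASE = "http://purl.obofoundry.org/obo/"

# rdf:about="OBI_0123456": a relative reference, resolved against the base.
about_relative = urljoin(XML_BASE, "OBI_0123456")

# rdf:ID="OBI_0123456" is treated like rdf:about="#OBI_0123456",
# so the name ends up in a fragment after a "#".
from_rdf_id = urljoin(XML_BASE, "#OBI_0123456")

# rdf:about with a full URI: the base has no effect.
about_absolute = urljoin(XML_BASE, "http://purl.obofoundry.org/obo/OBI_0123456")

print(about_relative)  # http://purl.obofoundry.org/obo/OBI_0123456
print(from_rdf_id)     # http://purl.obofoundry.org/obo/#OBI_0123456
print(about_absolute)  # http://purl.obofoundry.org/obo/OBI_0123456
```

The rdf:ID case shows why we avoid it: it would produce a hash URI, which is exactly the form option 3 rejects.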
What is our default namespace?
The default namespace (xmlns in the file) defines the default prefix for XML tags.
It is analogous to xml:base, but for XML tags rather than attribute values.
So to be able to write <OBI_0000285 ...> to add a property value, we need to set the default namespace to http://purl.obofoundry.org/obo/, so that the parser knows the property name is http://purl.obofoundry.org/obo/OBI_0000285.
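The effect of the default namespace can be checked with Python's xml.etree.ElementTree, which reports a namespaced tag as "{namespace}local". In RDF/XML the property URI is the namespace URI followed by the local name. The fragment below is hypothetical (the property value is made up); it only shows how the unprefixed <OBI_0000285> tag picks up the OBO base.

```python
import xml.etree.ElementTree as ET

OBO_NS = "http://purl.obofoundry.org/obo/"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical fragment: the default namespace is the OBO base,
# so <OBI_0000285> needs no prefix.
document = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns="{OBO_NS}">
  <rdf:Description rdf:about="{OBO_NS}OBI_0100102">
    <OBI_0000285>example value</OBI_0000285>
  </rdf:Description>
</rdf:RDF>"""

def property_uri(tag: str) -> str:
    """Turn ElementTree's '{namespace}local' tag into an RDF property URI."""
    namespace, local = tag[1:].split("}", 1)
    return namespace + local

root = ET.fromstring(document)
description = root.find(f"{{{RDF_NS}}}Description")
prop = description[0]
print(property_uri(prop.tag))  # http://purl.obofoundry.org/obo/OBI_0000285
```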
And the ontology URI?

This is the address on the web, and the name of the ontology. It will be http://purl.obofoundry.org/obo/obi.owl in our case.
Can I still use something in the form of http://purl.obofoundry.org/obo/obi.owl#OBI_0100102?

No. That is not the term's URI: it would not point to the correct fragment of the file, and it would not be correct to use it to annotate files.


Where to get the latest version of OBI?
The "latest version" ontology URI would fetch from the central repository of OBI ontologies (e.g. http://purl.obofoundry.org/obo/obi.owl). But the terms identify separate locations, e.g. http://purl.obofoundry.org/obo/OBI_0123456.

We would likely prototype the responses to these URIs by redirecting them to a server at Science Commons, where we could develop the software that returns bite-sized chunks for each term. Once it is developed, we could move this service wherever we deem appropriate. (If possible, we can redirect from the PURL on the pattern http://purl.obofoundry.org/obo/OBI; otherwise we can script a redirection for each ID.)
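The two redirection strategies can be compared in a small sketch. The target server URL is a made-up placeholder standing in for the Science Commons prototype, and both functions are hypothetical: real PURL servers are configured through their own interfaces rather than programmed like this.

```python
from typing import Dict, Iterable, Optional

PURL_BASE = "http://purl.obofoundry.org/obo/"
# Hypothetical placeholder for the prototype term server.
TERM_SERVER = "http://term-server.example.org/obi/"

def redirect_by_prefix(uri: str) -> Optional[str]:
    """Strategy 1: one rule forwards every URI under .../obo/OBI.

    The match is case-sensitive, so .../obo/obi.owl is not captured.
    """
    if uri.startswith(PURL_BASE + "OBI"):
        return TERM_SERVER + uri[len(PURL_BASE):]
    return None

def build_redirect_table(term_ids: Iterable[str]) -> Dict[str, str]:
    """Strategy 2 (fallback): one scripted redirect entry per ID."""
    return {PURL_BASE + t: TERM_SERVER + t for t in term_ids}

print(redirect_by_prefix(PURL_BASE + "OBI_0123456"))
# http://term-server.example.org/obi/OBI_0123456
table = build_redirect_table(["OBI_0123456", "OBI_0100102"])
print(len(table))  # 2
```

The prefix rule needs no maintenance as new IDs are minted, whereas the per-ID table has to be regenerated for every release; that is why the prefix approach is preferred when the PURL server supports it.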

We also propose to mint URIs for each released version of the OBI ontology. For example, http://purl.obofoundry.org/obo/2008-03-05/obi.owl: people could choose to import specific versions of the ontology to preserve stability if they need to.
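Both the "latest version" URI and the dated release URIs follow directly from the examples above. A minimal sketch, assuming the date segment is always an ISO date (YYYY-MM-DD) as in http://purl.obofoundry.org/obo/2008-03-05/obi.owl; release_uri is a hypothetical helper name.

```python
from datetime import date
from typing import Optional

PURL_BASE = "http://purl.obofoundry.org/obo/"

def release_uri(release_date: Optional[date] = None) -> str:
    """Return the URI of a dated release, or the latest-version URI if no date is given."""
    if release_date is None:
        return PURL_BASE + "obi.owl"
    return f"{PURL_BASE}{release_date.isoformat()}/obi.owl"

print(release_uri())                  # http://purl.obofoundry.org/obo/obi.owl
print(release_uri(date(2008, 3, 5)))  # http://purl.obofoundry.org/obo/2008-03-05/obi.owl
```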
When querying for a specific term, would users also get something other than just the class?

They could. We could also include, for example, the told (asserted) superclasses, or whatever else we think useful, as long as:
 a) the import brings in the rest of the semantics that is needed (although most browsers don't actually need the full semantics), and
 b) we don't say anything that conflicts with something in the ontology.
We could also add extra informative content, such as a property that holds the OBO-format text.

What happens to elements that don't currently have an ID (annotation property, relations...)?

<owl:AnnotationProperty rdf:ID="alternative_term_citation">
<definition>
formal citation of the source of the alternative_definition, e.g. identifier in external
database to indicate / attribute source(s) for the definition. Free text indicate / attribute source(s)
for the definition. EXAMPLE: Author Name, URI, MeSH Term C04, PUBMED ID, Wiki uri on 31.01.2007

</definition>
</owl:AnnotationProperty>

would become

<owl:AnnotationProperty rdf:about="OBI_6786789">
<rdfs:label>alternative_term_citation</rdfs:label>
<definition>formal citation of the source of the alternative_definition, e.g. identifier in external
database to indicate / attribute source(s) for the definition. Free text indicate / attribute source(s) for the
definition. EXAMPLE: Author Name, URI, MeSH Term C04, PUBMED ID, Wiki uri on 31.01.2007

</definition>
</owl:AnnotationProperty>

Note that we added the rdfs:label to keep the name of the property, and that we use rdf:about instead of rdf:ID.
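This transformation is mechanical enough to script. A minimal sketch using Python's xml.etree.ElementTree, assuming new IDs come from some allocator, represented here by a hypothetical dictionary mapping old names to new IDs (OBI_6786789 is the illustrative ID from the example above). It only rewrites the element it is given: other children such as the definition are left as they are, and references to the old name elsewhere in the file are not updated. The resulting rdf:about is relative and relies on the xml:base described earlier.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"

source = f"""<owl:AnnotationProperty xmlns:owl="{OWL}" xmlns:rdf="{RDF}"
    rdf:ID="alternative_term_citation"/>"""

# Hypothetical allocation of new IDs; in practice these would be minted.
new_ids = {"alternative_term_citation": "OBI_6786789"}

def assign_id(elem: ET.Element, ids: dict) -> None:
    """Replace rdf:ID with rdf:about=<new ID> and keep the old name as rdfs:label."""
    old_name = elem.attrib.pop(f"{{{RDF}}}ID")
    elem.set(f"{{{RDF}}}about", ids[old_name])
    label = ET.Element(f"{{{RDFS}}}label")
    label.text = old_name
    elem.insert(0, label)  # put the label first, before any existing children

elem = ET.fromstring(source)
assign_id(elem, new_ids)
print(elem.get(f"{{{RDF}}}about"))        # OBI_6786789
print(elem.find(f"{{{RDFS}}}label").text)  # alternative_term_citation
```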

Does assigning IDs to everything have any consequence?
It shouldn't, modulo tool bugs. Chris Mungall also seemed to think this was OK, which is reassuring.

What about the classes we import from ontologies that still use the traditional OBO format?

Nothing changes for classes we are importing, they keep their ID and URIs.

e.g. <owl:Class rdf:about="http://purl.org/obo/owl/CL#CL_0000236">
<rdfs:label xml:lang="en">B cell</rdfs:label>
...
will stay as it is.
Later we will try to get CL to use a similar scheme: http://purl.obofoundry.org/obo/CL_0000236
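If CL (or another OBO ontology) adopted the same scheme, existing URIs could be mapped mechanically. This is a speculative sketch: it assumes old URIs always have the form http://purl.org/obo/owl/PREFIX#PREFIX_digits, as in the CL example above, and that the new form is simply the OBO base followed by the ID. to_new_scheme is a hypothetical helper name.

```python
import re

# Current OBO form, e.g. http://purl.org/obo/owl/CL#CL_0000236.
# The backreference \1 requires the ID prefix to repeat the ontology name.
OLD_OBO = re.compile(r"http://purl\.org/obo/owl/([A-Za-z]+)#(\1_\d+)")
NEW_BASE = "http://purl.obofoundry.org/obo/"

def to_new_scheme(uri: str) -> str:
    """Map an old-style OBO OWL URI to the proposed form; leave other URIs as they are."""
    match = OLD_OBO.fullmatch(uri)
    return NEW_BASE + match.group(2) if match else uri

print(to_new_scheme("http://purl.org/obo/owl/CL#CL_0000236"))
# http://purl.obofoundry.org/obo/CL_0000236
```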


Sample file

Alan generated an archive (a .tgz file) containing curation status and new IDs for everyone to test:
http://groups.google.com/group/obi-developer/web/newids_curation_status.tgz
Note that this file uses the host thing.obofoundry.org, which we later decided to change to purl.obofoundry.org on the suggestion of Chris Mungall.

The file also prototypes using a curation_status widget in Protege:

Note that


Future Work
In the future we will likely provide a better way of "browsing" the ontology, i.e. somebody accessing http://purl.obofoundry.org/obo/OBI_0100102 would get only the class OBI_0100102. That response would still be an official "ontology" with an import statement for the full ontology, so that we don't mislead anyone into thinking the term's semantics can be understood without reference to the full ontology.
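A per-term response of this kind might look like the document generated below. It is a sketch under several assumptions that this policy has not decided: the small ontology is named by the document itself (rdf:about=""), it imports the latest-version URI, and it carries only the class and its label. term_document is a hypothetical helper, and "example label" is a placeholder rather than the real label of OBI_0100102.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
OBO_BASE = "http://purl.obofoundry.org/obo/"
FULL_ONTOLOGY = OBO_BASE + "obi.owl"

# Use the conventional prefixes when serializing.
ET.register_namespace("rdf", RDF)
ET.register_namespace("rdfs", RDFS)
ET.register_namespace("owl", OWL)

def term_document(term_id: str, label: str) -> str:
    """Build a minimal RDF/XML ontology holding one class plus an import of the full OBI."""
    root = ET.Element(f"{{{RDF}}}RDF")
    # rdf:about="" names this document itself as the (small) ontology.
    onto = ET.SubElement(root, f"{{{OWL}}}Ontology", {f"{{{RDF}}}about": ""})
    # The import signals that the full semantics live in the whole ontology.
    ET.SubElement(onto, f"{{{OWL}}}imports", {f"{{{RDF}}}resource": FULL_ONTOLOGY})
    cls = ET.SubElement(root, f"{{{OWL}}}Class", {f"{{{RDF}}}about": OBO_BASE + term_id})
    lab = ET.SubElement(cls, f"{{{RDFS}}}label")
    lab.text = label
    return ET.tostring(root, encoding="unicode")

print(term_document("OBI_0100102", "example label"))
```

Other content discussed in the MAQ (told superclasses, OBO-format text) could be added as further children of the class element, subject to the same rule of not contradicting the full ontology.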


Related discussions


http://groups.google.com/group/obi-developer/browse_thread/thread/178768c384f4e9c6?hl=en

http://groups.google.com/group/obi-developer/browse_thread/thread/e226710bcc7cd50e?hl=en

http://groups.google.com/group/obi-developer/browse_thread/thread/d83deb3313911cd6?hl=en

http://groups.google.com/group/obi-developer/browse_thread/thread/40096a117a966339/87e1faafa44444b3#87e1faafa44444b3

http://groups.google.com/group/obi-developer/browse_thread/thread/ee60ca4a56440de5/1169f7dabf191bf5#1169f7dabf191bf5



References


[1] http://purl.org/, A PURL is a Persistent Uniform Resource Locator. Functionally, a PURL is a URL. However, instead of pointing directly to the location of an Internet resource, a PURL points to an intermediate resolution service. The PURL resolution service associates the PURL with the actual URL and returns that URL to the client. The client can then complete the URL transaction in the normal fashion. In Web parlance, this is a standard HTTP redirect.


[2] http://en.wikipedia.org/wiki/Domain_name_system, The Domain Name System (DNS) associates various sorts of information with so-called domain names; most importantly, it serves as the "phone book" for the Internet by translating human-readable computer hostnames, e.g. www.example.com, into the IP addresses, e.g. 208.77.188.166, that networking equipment needs to deliver information.



Acknowledgments


Thanks to Jonathan Rees and Chris Mungall for their help.