Bioschemas: Marking up biodiversity websites to improve data discovery and web-scale integration
* Wimmics: AI in bridging social semantics and formal semantics on the Web
Franck MICHEL*
Workshop on Data Standards & Common Language
NFDI 4 Biodiversity
Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France
Semantic markup for web pages
Schema.org: semantic markup for resources on the internet
Collaborative community project founded in 2011 by Google, Microsoft, Yahoo and Yandex
Define a common vocabulary to markup resources on the internet
schema.org
What we are talking about: types (797)
What we can say about those things: properties (1453)
Bioschemas: schema.org extension for Life Sciences
Community initiative built on top of Schema.org
Aim
Help search engines understand and index webpages
Improve resources discoverability and interoperability
Approach
Reuse/extend Schema.org for life sciences
Keep it simple (no complex domain ontology)
Provide guidelines on how to markup resources
Support software
Specification
Data model
Minimum information
Controlled vocabularies
Cardinality
Documentation
Examples
New (properties | types)
Bioschemas: schema.org extension for Life Sciences
Released terms
Terms in draft status
Taxon
Type: http://schema.org/Taxon
Profile: https://bioschemas.org/profiles/Taxon provides usage recommendations
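To illustrate what a profile's "minimum information" rule can mean in practice, here is a minimal Python sketch that reports which expected properties are absent from a piece of Taxon markup. The property list `MINIMUM` and the function name `missing_minimum` are assumptions made for this illustration; the authoritative list is the one published in the Taxon profile itself.

```python
# Hypothetical list of minimum properties, for illustration only.
# Check the published Taxon profile for the authoritative list.
MINIMUM = ("@context", "@type", "@id", "dct:conformsTo", "name", "taxonRank")


def missing_minimum(markup, required=MINIMUM):
    """Return the required properties that are absent or empty in `markup`.

    `markup` is a JSON-LD document already parsed into a Python dict.
    """
    return [
        prop
        for prop in required
        if prop not in markup or markup[prop] in (None, "", [], {})
    ]


if __name__ == "__main__":
    incomplete = {"@type": "Taxon", "name": "Delphinapterus leucas (Pallas, 1776)"}
    print(missing_minimum(incomplete))
```

A check of this kind only looks at the presence of keys. It does not validate cardinality or controlled vocabularies, which a profile may also constrain.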
Example: webpage about taxon Delphinapterus leucas
<script type="application/ld+json">
{
  "@context": [
    "http://schema.org",
    { "dct": "http://purl.org/dc/terms/" }
  ],
  "@type": "Taxon",
  "@id": "60932",
  "dct:conformsTo": {
    "@id": "https://bioschemas.org/profiles/Taxon/0.6-RELEASE",
    "@type": "CreativeWork"
  },
  "name": "Delphinapterus leucas (Pallas, 1776)",
  "taxonRank": "species"
}
</script>
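A site that renders its pages from a database could generate this markup programmatically. The following Python sketch builds the same minimal Taxon markup from three values and wraps it in a script element. The function name `taxon_markup` and its parameters are hypothetical, chosen for this illustration only.

```python
import json


def taxon_markup(taxon_id, name, rank, profile_version="0.6-RELEASE"):
    """Return a <script> element embedding minimal Taxon markup as JSON-LD."""
    doc = {
        "@context": ["http://schema.org", {"dct": "http://purl.org/dc/terms/"}],
        "@type": "Taxon",
        "@id": taxon_id,
        "dct:conformsTo": {
            "@id": f"https://bioschemas.org/profiles/Taxon/{profile_version}",
            "@type": "CreativeWork",
        },
        "name": name,
        "taxonRank": rank,
    }
    body = json.dumps(doc, indent=2, ensure_ascii=False)
    # Escape "</" so that no string value can close the script element early.
    # "\/" is a valid JSON escape for "/", so the JSON stays equivalent.
    body = body.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


if __name__ == "__main__":
    print(taxon_markup("60932", "Delphinapterus leucas (Pallas, 1776)", "species"))
```

Serializing with `json.dumps` rather than string templates avoids the missing-comma and quoting mistakes that are easy to make when writing JSON-LD by hand.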
Example markup of a page about taxon Delphinapterus leucas
<script type="application/ld+json">
{
  "@context": [
    "http://schema.org",
    {
      "dct": "http://purl.org/dc/terms/",
      "dwc": "http://rs.tdwg.org/dwc/terms/",
      "dwc:vernacularName": { "@container": "@language" }
    }
  ],
  "@type": "Taxon",
  "@id": "60932",
  "dct:conformsTo": {
    "@id": "https://bioschemas.org/profiles/Taxon/0.6-RELEASE",
    "@type": "CreativeWork"
  },
  "name": "Delphinapterus leucas (Pallas, 1776)",
  "taxonRank": [ "species", { "@id": "http://www.wikidata.org/entity/Q7432" } ],
  "additionalType": "dwc:Taxon",
  "alternateName": [ "Balaena albicans Muller, 1776", "Beluga catodon Gray, 1846" ],
  "dwc:vernacularName": [
    { "@language": "en", "@value": "Beluga Whale" },
    { "@language": "fr", "@value": "Bélouga" }
  ],
...
Example markup of a page about taxon Delphinapterus leucas
...
  "parentTaxon": {
    "@type": "Taxon",
    "name": "Delphinapterus Lacépède, 1804",
    "mainEntityOfPage": "https://inpn.mnhn.fr/espece/cd_nom/191588?lg=en",
    "taxonRank": "genus"
  },
  "image": "https://inpn.mnhn.fr/photos/uploads/webtofs/inpn/3/181473.jpg",
  "sameAs": [
    "http://doris.ffessm.fr/Especes/Delphinapterus-leucas-Beluga-868",
    "http://www.marinespecies.org/aphia.php?p=taxdetails&id=137115",
    "http://www.iucnredlist.org/details/6335"
  ],
  "identifier": [
    {
      "@type": "PropertyValue",
      "name": "WoRMS id",
      "propertyID": "http://www.wikidata.org/entity/P850",
      "value": "137115"
    }
  ]
}
</script>
TaxonName
Taxon 0.7-DRAFT
Type: https://bioschemas.org/types/TaxonName meant to become http://schema.org/TaxonName
Profile: https://bioschemas.org/profiles/TaxonName provides usage recommendations
BioSample
Type: https://bioschemas.org/types/BioSample meant to become http://schema.org/BioSample
Profile: https://bioschemas.org/profiles/BioSample provides usage recommendations
Live deployments
Photo: https://www.flickr.com/photos/35034363287@N01/2284904309
Bioschemas: profiles & deployments
Released Profiles
Picture: Carole Goble, Turing Lecture 2018
100+ deployments, 100M webpages
Taxon/TaxonName deployments
Leisure sea fishing legislation.
PSB Int. for Plant Phenotype Analysis
Profiles for researchers, organizations, journals, publishers…
Why do (early) deployments matter?
Searching, aggregating, exploiting Bioschemas markup
Exploiting Bioschemas markup
Community Registry: IDPcentral
name
description...
Protein
FAIR community registry of Bioschemas metadata
name
description...
SequenceAnnotation
rangeStart
rangeEnd...
SequenceRange
BMUSE
Slide by Alasdair Gray. (Bio)schemas: Making (not only life sciences) data resources more Interoperable and Discoverable on the Web. NFDI InfraTalk, 2022.
BMUSE: Bioschemas Markup Scraper and Extractor
Slide by Alasdair Gray. (Bio)schemas: Making (not only life sciences) data resources more Interoperable and Discoverable on the Web. NFDI InfraTalk, 2022.
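The core idea of harvesting markup can be sketched in a few lines of Python using only the standard library: find every script element of type application/ld+json in a page and parse its content. This is an illustrative sketch and not BMUSE's actual implementation; the names `JsonLdExtractor` and `extract_jsonld` are hypothetical. It only sees markup present in the static HTML, so markup injected by JavaScript at load time would be missed.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed content of <script type="application/ld+json"> elements."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        script_type = dict(attrs).get("type") or ""
        if tag == "script" and script_type.lower() == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so accumulate it.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.documents.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed markup rather than abort the harvest


def extract_jsonld(html):
    """Return the list of JSON-LD documents embedded in an HTML string."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.documents


if __name__ == "__main__":
    page = (
        '<html><head><script type="application/ld+json">'
        '{"@type": "Taxon", "name": "Delphinapterus leucas (Pallas, 1776)"}'
        "</script></head><body></body></html>"
    )
    print(extract_jsonld(page))
```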
From page-centric data to concept-centric knowledge
Harvested markup is page-centric.
But multiple sites/pages may represent the same concept, each site using its own IDs.
Slide by Alasdair Gray. (Bio)schemas: Making (not only life sciences) data resources more Interoperable and Discoverable on the Web. NFDI InfraTalk, 2022.
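One simple way to move from pages to concepts is to group harvested records that point to a common URI through `sameAs`. The Python sketch below does this with a union-find structure. It is an illustration under strong assumptions: URIs are compared as exact strings, only `@id` and `sameAs` are considered, and the `site-*.example` pages are hypothetical. The function name `group_by_same_as` is also made up for this example.

```python
def as_list(value):
    """Normalize a JSON-LD value that may be absent, a single item or a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def group_by_same_as(records):
    """Group page records that are linked, directly or transitively, by sameAs.

    Each record is a dict with an '@id' and an optional 'sameAs'.
    Returns a list of sets of page '@id' values, one set per concept.
    """
    parent = {}

    def find(uri):
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # path compression
            uri = parent[uri]
        return uri

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for record in records:
        find(record["@id"])
        for uri in as_list(record.get("sameAs")):
            union(record["@id"], uri)

    groups = {}
    for record in records:
        groups.setdefault(find(record["@id"]), set()).add(record["@id"])
    return list(groups.values())


if __name__ == "__main__":
    worms = "http://www.marinespecies.org/aphia.php?p=taxdetails&id=137115"
    pages = [
        {"@id": "https://site-a.example/taxon/60932", "sameAs": [worms]},
        {"@id": "https://site-b.example/species/beluga", "sameAs": worms},
        {"@id": "https://site-c.example/taxon/1"},
    ]
    for group in group_by_same_as(pages):
        print(sorted(group))
```

In real data, the same resource is often referenced with small URI variations (http versus https, trailing slashes), so exact string matching would likely need a normalization step first.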
Next steps
Bioschemas work on biodiversity
Currently:
Taxon, TaxonName: links to DwC terms
BioSample, Dataset…
Future
Occurrence: links to DwC occurrences?
Specimen: links to ABCD, openDS, MIDS?
Traits: links to trait ontologies?
…
Marking up biodiversity resources… at scale
GBIF, EoL, CoL, iDigBio, DiSSCo…
Museum collections,
Literature (BHL, Plazi…),
Citizen science platforms,
Independent institutions,
Associations,
Grey literature…
Take-aways
Marking up webpages
Let’s have search engines do the job for us!
Not the magic bullet
https://bioschemas.org/
https://github.com/BioSchemas/specifications/wiki
Questions?