1 of 25

Bioschemas: Marking up biodiversity websites to improve data discovery and web-scale integration

* Wimmics: AI in bridging social semantics and formal semantics on the Web

TDWG Webinar, 2021-03-10

Franck MICHEL*

Bioschemas Community http://bioschemas.org/people/

Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France

2 of 25

Semantic markup for web pages


3 of 25


4 of 25

Schema.org: semantic markup for resources on the internet

Collaborative community project founded in 2011 by Google, Microsoft, Yahoo and Yandex

Define a common vocabulary to mark up resources on the internet

  • Structured data makes resources understandable to search engines
  • Improve ranking, discoverability
  • Provide informative summarizations

Markup formats: Microdata, RDFa, Microformats



6 of 25

Schema.org: semantic markup for resources on the internet

What we are talking about: types (778)

What we can say about those things: properties (1369)


7 of 25

How to share your biodiversity data?

[Diagram: sharing options placed on an axis from simple to sophisticated: flat files, webpages, Web API / Linked Data KG, and the integrative approach (GBIF, EoL, iDigBio…)]



9 of 25

Bioschemas: schema.org extension for Life Sciences

Community initiative built on top of Schema.org

Aim

Help search engines understand and index webpages

Improve resource discoverability and interoperability

Approach

Reuse/extend Schema.org for life sciences

Keep it simple (no complex domain ontology)

Provide guidelines on how to mark up resources

      • Minimum/recommended/optional properties
      • Link to other vocabularies & domain ontologies

Flexibility: recommendations, not constraints

Support software

Specification

Data model

Minimum information

Controlled vocabularies

Cardinality

Documentation

Examples

New (properties | types)
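
The minimum/recommended/optional levels listed above lend themselves to a simple automated check. The following Python sketch is illustrative only: the `TAXON_PROFILE` dictionary, the function names, and the assignment of properties to levels are assumptions made for the example, not the content of the published Bioschemas Taxon profile.

```python
# Hypothetical profile: the property-to-level assignment below is an
# assumption for illustration, not the published Bioschemas Taxon profile.
TAXON_PROFILE = {
    "minimum": ["name", "taxonRank"],
    "recommended": ["parentTaxon", "alternateName"],
    "optional": ["image", "sameAs"],
}


def check_against_profile(markup, profile):
    """Return, for each level, the profile properties absent from the markup."""
    return {
        level: [prop for prop in props if prop not in markup]
        for level, props in profile.items()
    }


def is_compliant(markup, profile):
    """Treat only missing minimum properties as a failure.

    Recommended and optional properties are reported by
    check_against_profile() but never enforced, in line with
    "recommendations, not constraints".
    """
    return not check_against_profile(markup, profile)["minimum"]


if __name__ == "__main__":
    markup = {
        "@context": "http://schema.org",
        "@type": "Taxon",
        "name": "Delphinapterus leucas (Pallas, 1776)",
        "taxonRank": "species",
    }
    print(check_against_profile(markup, TAXON_PROFILE))
    print(is_compliant(markup, TAXON_PROFILE))
```

A check of this kind only reports on the presence of properties; it says nothing about whether the values follow the profile's controlled vocabularies or cardinality.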


10 of 25

Bioschemas: schema.org extension for Life Sciences

Currently defined terms

  • ChemicalSubstance
  • DataCatalog
  • Dataset
  • Gene
  • MolecularEntity
  • Protein
  • Sample
  • Taxon

More terms to come

  • BioSample
  • ComputationalTool
  • ComputationalWorkflow
  • LabProtocol
  • Phenotype
  • ProteinStructure
  • RNA
  • TaxonName


11 of 25

Taxon

Type: https://bioschemas.org/types/Taxon, meant to become http://schema.org/Taxon

Profile: https://bioschemas.org/profiles/Taxon, which provides usage recommendations

dwc:vernacularName


12 of 25

Example markup of a page about taxon Delphinapterus leucas

<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "Taxon",
  "name": "Delphinapterus leucas (Pallas, 1776)",
  "taxonRank": "species"
}
</script>


13 of 25

Example markup of a page about taxon Delphinapterus leucas

<script type="application/ld+json">
{
  "@context": [
    "http://schema.org",
    {
      "dwc": "http://rs.tdwg.org/dwc/terms/",
      "dwc:vernacularName": { "@container": "@language" }
    }
  ],
  "@type": "Taxon",
  "additionalType": "dwc:Taxon",
  "taxonRank": [ "species", "http://www.wikidata.org/entity/Q7432" ],
  "name": "Delphinapterus leucas (Pallas, 1776)",
  "alternateName": [ "Balaena albicans Muller, 1776", "Beluga catodon Gray, 1846" ],
  "dwc:vernacularName": [
    { "@language": "en", "@value": "Beluga Whale" },
    { "@language": "fr", "@value": "Bélouga" }
  ],
  "parentTaxon": {
    "@type": "Taxon",
    "name": "Delphinapterus Lacépède, 1804",
    "mainEntityOfPage": "https://inpn.mnhn.fr/espece/cd_nom/191588?lg=en",
    "taxonRank": "genus"
  },
  "image": "https://inpn.mnhn.fr/photos/uploads/webtofs/inpn/3/181473.jpg"
}
</script>
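
On a site with many taxon pages, markup like this would normally be generated from a database rather than written by hand. The Python sketch below shows one possible way to do that with the standard library only. The function names `taxon_markup` and `to_script_tag` and their parameters are hypothetical; the property names and the `@context` are taken from the example above.

```python
import json


def taxon_markup(name, rank, vernacular_names=None, synonyms=None, parent=None):
    """Build a Taxon JSON-LD dictionary using the properties shown on the slide.

    vernacular_names: dict mapping a language code to a common name.
    synonyms: iterable of synonym strings (mapped to alternateName).
    parent: an already-built Taxon dictionary for the parent taxon.
    """
    markup = {
        "@context": [
            "http://schema.org",
            {
                "dwc": "http://rs.tdwg.org/dwc/terms/",
                "dwc:vernacularName": {"@container": "@language"},
            },
        ],
        "@type": "Taxon",
        "additionalType": "dwc:Taxon",
        "name": name,
        "taxonRank": rank,
    }
    if synonyms:
        markup["alternateName"] = list(synonyms)
    if vernacular_names:
        markup["dwc:vernacularName"] = [
            {"@language": lang, "@value": value}
            for lang, value in vernacular_names.items()
        ]
    if parent:
        markup["parentTaxon"] = parent
    return markup


def to_script_tag(markup):
    """Serialise the markup and wrap it in a JSON-LD script element."""
    body = json.dumps(markup, indent=2, ensure_ascii=False)
    # Escape "</" so a string value can never close the script element early;
    # "\/" is a valid JSON escape for "/".
    body = body.replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"


if __name__ == "__main__":
    genus = {"@type": "Taxon", "name": "Delphinapterus Lacépède, 1804",
             "taxonRank": "genus"}
    beluga = taxon_markup(
        "Delphinapterus leucas (Pallas, 1776)",
        "species",
        vernacular_names={"en": "Beluga Whale", "fr": "Bélouga"},
        synonyms=["Balaena albicans Muller, 1776", "Beluga catodon Gray, 1846"],
        parent=genus,
    )
    print(to_script_tag(beluga))
```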


14 of 25

Example markup of a page about taxon Delphinapterus leucas

<script type="application/ld+json">
{
  ...
  "sameAs": [
    "http://doris.ffessm.fr/Especes/Delphinapterus-leucas-Beluga-868",
    "http://www.marinespecies.org/aphia.php?p=taxdetails&id=137115",
    "http://www.iucnredlist.org/details/6335"
  ],
  "identifier": [
    "60932",
    {
      "@type": "PropertyValue",
      "name": "WoRMS id",
      "propertyID": "http://www.wikidata.org/entity/P850",
      "value": "137115"
    }
  ]
}
</script>
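
On the consuming side, a crawler has to pull such blocks back out of a page before it can use the identifiers. The sketch below is a minimal illustration using only Python's standard `html.parser` and `json` modules. The names `JsonLdExtractor`, `extract_jsonld` and `identifiers` are hypothetical, and a production harvester would more likely rely on a dedicated JSON-LD library.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed content of every application/ld+json script element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.documents.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip blocks that are not valid JSON


def extract_jsonld(html):
    """Return the list of JSON-LD documents embedded in an HTML string."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.documents


def identifiers(doc):
    """Normalise the 'identifier' property to (property_id, value) pairs.

    Plain strings carry no property id, so they are paired with None.
    """
    raw = doc.get("identifier", [])
    if not isinstance(raw, list):
        raw = [raw]
    pairs = []
    for item in raw:
        if isinstance(item, dict):
            pairs.append((item.get("propertyID"), item.get("value")))
        else:
            pairs.append((None, item))
    return pairs
```

Pairing each value with its `propertyID` is what lets a consumer tell a WoRMS id apart from an unqualified local identifier such as "60932".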


15 of 25

What about name registries such as IPNI, ZooBank, MycoBank?

Photo: https://commons.wikimedia.org/wiki/File:Name_label.JPG


16 of 25

TaxonName

Taxon


17 of 25

Example markup of a page about name Delphinapterus leucas

Taxon with TaxonName

<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "Taxon",
  "taxonRank": "species",
  "scientificName": {
    "@type": "TaxonName",
    "name": "Delphinapterus leucas",
    "author": "(Pallas, 1776)",
    "taxonRank": "species"
  },
  "alternateScientificName": [
    {
      "@type": "TaxonName",
      "name": "Balaena albicans",
      "author": "Muller, 1776",
      "taxonRank": "species"
    },
    {
      "@type": "TaxonName",
      "name": "Beluga catodon",
      "author": "Gray, 1846",
      "taxonRank": "species"
    }
  ]
}
</script>

TaxonName alone

<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "TaxonName",
  "name": "Delphinapterus leucas",
  "author": "(Pallas, 1776)",
  "taxonRank": "species"
}
</script>


18 of 25

Live deployments

Photo: https://www.flickr.com/photos/35034363287@N01/2284904309


19 of 25

Early deployment at NMNH Paris

180,000+ pages marked up with Taxon & TaxonName types


20 of 25

Early deployments

  • NMNH Paris: Taxon & TaxonName, 180K pages
  • GBIF: Taxon & TaxonName, 3M pages
  • Scholia (scientific bibliographic information based on Wikidata): Taxon, 2.7M pages
  • PIPPA (PSB Int. for Plant Phenotype Analysis): Taxon, BioChemEntity
  • OpaleSurfCasting.net (French leisure sea fishing legislation): Taxon

Why do early deployments matter?

  • A way for the community to show its interest in having these terms
  • Necessary for Schema.org to endorse new types
  • First step to foster novel applications (chicken & egg)


21 of 25

Next steps


22 of 25

Bioschemas work on biodiversity

Currently:

  • Taxon, TaxonName: links to DwC terms

Future:

  • Specimen: links to ABCD, openDS, MIDS?
  • Traits: links to traits ontologies?
  • Occurrence: links to DwC occurrences?


23 of 25

Marking up biodiversity resources… at scale

GBIF, EoL, CoL, iDigBio, DiSSCo…

Museum collections,

Literature (BHL, Plazi…),

Citizen science platforms,

Independent institutions,

Associations,


24 of 25

Take-aways

Marking up webpages

Let’s have search engines do the job for us!

    • Connect pieces of data at web scale
    • First step for data integration is discovery
    • Dataset search engines
    • What about a Species Search Engine?
    • Increases data visibility and discoverability
    • Relatively inexpensive
    • Connect unconnected pieces of data, e.g. “grey literature”

Not the magic bullet

    • Name discrepancies
    • Compliance with nomenclature
    • How to name taxonomic ranks
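
To make the "Species Search Engine" idea and the name-discrepancy caveat concrete, here is a toy Python sketch that groups harvested pages by the name found in their Taxon markup. Everything in it is an assumption for illustration: the `name_key` and `build_index` functions, the normalisation rule, and the `example.org` URLs are hypothetical, and a real index would need proper name reconciliation.

```python
from collections import defaultdict


def name_key(markup):
    """Pick a name to index on: a nested TaxonName if present, else 'name'.

    The key is lower-cased with whitespace collapsed; no attempt is made
    to strip authorship or resolve synonyms.
    """
    sci = markup.get("scientificName")
    if isinstance(sci, dict) and "name" in sci:
        name = sci["name"]
    else:
        name = markup.get("name", "")
    return " ".join(name.lower().split())


def build_index(pages):
    """Group page URLs by name key. `pages` is an iterable of (url, markup)."""
    index = defaultdict(set)
    for url, markup in pages:
        if markup.get("@type") == "Taxon":
            key = name_key(markup)
            if key:
                index[key].add(url)
    return index


if __name__ == "__main__":
    pages = [
        ("https://example.org/a", {
            "@type": "Taxon",
            "scientificName": {"@type": "TaxonName",
                               "name": "Delphinapterus leucas"},
        }),
        ("https://example.org/b", {
            "@type": "Taxon",
            "scientificName": {"@type": "TaxonName",
                               "name": "Delphinapterus  leucas"},
        }),
        ("https://example.org/c", {
            "@type": "Taxon",
            "name": "Delphinapterus leucas (Pallas, 1776)",
        }),
    ]
    for key, urls in sorted(build_index(pages).items()):
        print(key, sorted(urls))
```

Pages a and b end up under the same key, while page c, which embeds the authorship in `name`, lands under a different one. This is the kind of discrepancy that separating the name from its authorship in TaxonName helps to reduce.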


25 of 25

https://bioschemas.org/
https://github.com/BioSchemas/specifications/wiki

Questions?
