Making (not only life sciences) data resources more Interoperable and Discoverable on the Web
Alasdair J.G. Gray
Bioschemas Steering Council Chair
Heriot-Watt University – ELIXIR-UK
NFDI InfraTalk
4 April 2022
(Bio)schemas:
Bioschemas: Markup for the Life Sciences
Picture: Carole Goble, Turing Lecture 2018
Schema.org: Enhanced Search Results
Picture: Carole Goble, Turing Lecture 2018
Global, lightweight vocabulary of terms
What we can say about those things
What we are talking about
Google Search
http://bioschemas.org
Google Search
Oct 2020
Nov 2021
Google Dataset Search (Nov 2021)
https://datasetsearch.research.google.com
https://www.blog.google/products/search/making-it-easier-discover-datasets/
Datasets
Schema definition:
Google Dataset Profile
Other profiles: Events, Jobs, ...
https://developers.google.com/search/docs/data-types/dataset
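To make the Dataset markup concrete, here is a minimal sketch of a Schema.org Dataset description serialised as JSON-LD and wrapped in the script element that crawlers read. The dataset name, description, URL and keywords are hypothetical placeholders, and the property set is only an illustrative subset, not the full Google Dataset profile.

```python
import json


def dataset_jsonld(name, description, url, keywords):
    """Build a small Schema.org Dataset description as a JSON-LD dict."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "keywords": list(keywords),
    }


def as_script_tag(markup):
    """Wrap JSON-LD in the script element embedded in a web page."""
    body = json.dumps(markup, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'


# All values below are hypothetical placeholders.
example = dataset_jsonld(
    name="Example life science dataset",
    description="A placeholder description of what the dataset contains.",
    url="https://example.org/datasets/1",
    keywords=["example", "life sciences"],
)

print(as_script_tag(example))
```

The resulting snippet would be placed in the HTML of the dataset's landing page, alongside the human-readable content.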
Bioschemas: Schema.org for the life sciences
Profile over Schema.org model + Bioschemas extensions
Layer of constraints + documentation
Data model
Minimum information
Controlled vocabularies
Cardinality
Documentation
New (properties | types)
Bioschemas Profile
Data model
Marginality (Minimum | Recommended | Optional)
Controlled vocabularies
Cardinality (ONE | MANY)
Documentation
Examples
New types and properties
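The profile idea can be sketched as a small check of markup against marginality and cardinality constraints. The toy profile below is hypothetical and is not one of the released Bioschemas profiles; it only illustrates how Minimum, Recommended and Optional properties and ONE or MANY cardinality might be enforced.

```python
# Hypothetical toy profile: property -> (marginality, cardinality).
PROFILE = {
    "name": ("Minimum", "ONE"),
    "description": ("Minimum", "ONE"),
    "keywords": ("Recommended", "MANY"),
    "license": ("Optional", "ONE"),
}


def check(markup, profile):
    """Return a list of (severity, message) pairs for profile violations."""
    problems = []
    for prop, (marginality, cardinality) in profile.items():
        value = markup.get(prop)
        if value is None:
            # Missing Minimum properties are errors, Recommended ones warnings,
            # Optional ones are silently accepted.
            if marginality == "Minimum":
                problems.append(("error", f"missing minimum property '{prop}'"))
            elif marginality == "Recommended":
                problems.append(("warning", f"missing recommended property '{prop}'"))
            continue
        # A ONE-cardinality property must not carry several values.
        if cardinality == "ONE" and isinstance(value, list) and len(value) > 1:
            problems.append(("error", f"'{prop}' allows ONE value, found {len(value)}"))
    return problems


markup = {"@type": "Dataset", "name": "Example", "description": ["a", "b"]}
for severity, message in check(markup, PROFILE):
    print(severity, message)
```

Treating a missing Recommended property as a warning rather than an error mirrors the intent of the marginality levels: only Minimum properties are required for the markup to be considered compliant.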
Bioschemas: Lightweight semantics
Findable
Accessible
Interoperable
Reusable
Bioschemas Community
bioschemas.org/liveDeploys
23 Types
37 Profiles
83 Sites
Over 80M Pages
162 Profile deployments
61 ELIXIR deployments
Markup
Live Deployment List: bioschemas.org/liveDeploys
Existing Deployed Markup
MolecularEntity profile
ChemicalSubstance Profile: https://bioschemas.org/profiles/ChemicalSubstance
MolecularEntity Profile: https://bioschemas.org/profiles/MolecularEntity
Adding Bioschemas to a Resource
http://tiny.cc/bs-live-deploy
Profile Creation Process
Mapping
Profile
Use cases
Mockup
Adoption
Testing
Application
Bioschemas: Profiles & Deployments
Released Profiles
Picture: Carole Goble, Turing Lecture 2018
100+ Deployments: Logos of some sites with Bioschemas markup
Exploiting Bioschemas Markup
Specialised Search: TeSS
29 November 2018
Bioschemas Course:
Bioschemas CourseInstance:
Bioschemas TrainingMaterial:
toxicology
No need for custom APIs
No concept merging
Bioschemas Profile for Workflows
270+ workflow management systems
Bioschemas profile
- Minimum Information for Registering a Computational Workflow
- Creators, inputs, outputs, WfMS type, …
Working with WfMS providers to extract and add Bioschemas markup
Workflow Registry for discovery
Workflow Package for exchange & portability
Search
Mark-up, Validation
https://bioschemas.org/profiles/ComputationalWorkflow/
Community Registry: IDPcentral
Protein
No need for custom APIs
FAIR community registry of Bioschemas metadata
SequenceAnnotation
SequenceRange
Concept merging
IDP Data Sources
Bioschemas Markup for IDP
Not shown:
Legend
Red: Schema.org
Blue: Pending Schema.org
Green: Bioschemas
BMUSE: Bioschemas Markup Scraper and Extractor
Harvested Markup
No need for custom APIs
Concept merging
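Harvesting markup without a custom API comes down to reading the JSON-LD embedded in each page. The sketch below uses only the Python standard library and is not BMUSE's actual implementation; the page content is a hypothetical placeholder.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the contents of <script type="application/ld+json"> elements."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.blocks.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed markup rather than abort the harvest


def extract_jsonld(html):
    """Return every parseable JSON-LD block found in an HTML string."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks


# Hypothetical page with one embedded markup block.
PAGE = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Protein", "name": "Example protein"}
</script>
</head><body><p>Human-readable content</p></body></html>"""

for block in extract_jsonld(PAGE):
    print(block["@type"], block["name"])
```

This static approach only sees markup present in the served HTML; pages that inject markup with JavaScript would need a rendering step before extraction.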
Identity Reconciliation
Concept merging
IDP-KG: Merging Entries Options
Concept merging
Keep all values with provenance of source
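The "keep all values with provenance" option can be sketched as follows: entries that share an identifier are merged into one entity, and every value is stored together with the source it came from. The source names, identifier and values are hypothetical, and this is an illustration rather than the IDP-KG implementation.

```python
from collections import defaultdict


def merge_entries(records):
    """Merge (source, entity_id, property, value) records per entity.

    No value is discarded: each one is kept with the source that supplied it,
    so conflicting values remain visible and traceable.
    """
    merged = defaultdict(lambda: defaultdict(list))
    for source, entity_id, prop, value in records:
        merged[entity_id][prop].append({"value": value, "source": source})
    # Convert to plain dicts so lookups of unknown keys do not create entries.
    return {entity: dict(props) for entity, props in merged.items()}


# Hypothetical harvested records from two sources describing the same protein.
RECORDS = [
    ("source-a", "protein:X1", "name", "Example protein"),
    ("source-b", "protein:X1", "name", "Example protein (isoform 1)"),
    ("source-b", "protein:X1", "taxon", "Homo sapiens"),
]

merged = merge_entries(RECORDS)
for entry in merged["protein:X1"]["name"]:
    print(entry["source"], "->", entry["value"])
```

Keeping both names instead of picking one defers the choice to the consumer of the data, who can filter by source when needed.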
Data Verification
IDP-KG Interfaces
SPARQL Endpoint
Annotation count per protein
REST API
Querying IDP-KG
Count by type
Protein information
Annotation count per protein
Annotation information
Annotation count by term code
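As an illustration of the "count by type" query, the sketch below shows a generic SPARQL query of that shape together with an equivalent computation in plain Python over an in-memory list of triples. The triples and the `ex:` identifiers are hypothetical, and the query text is an assumption about what such a query could look like, not the one used on the IDP-KG endpoint.

```python
from collections import Counter

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Generic SPARQL for counting entities per type (assumed form, for illustration).
COUNT_BY_TYPE_SPARQL = """
SELECT ?type (COUNT(?s) AS ?n)
WHERE { ?s a ?type }
GROUP BY ?type
"""


def count_by_type(triples):
    """Count how many rdf:type statements point at each type."""
    return Counter(obj for _subj, pred, obj in triples if pred == RDF_TYPE)


# Hypothetical triples standing in for a harvested knowledge graph.
TRIPLES = [
    ("ex:p1", RDF_TYPE, "ex:Protein"),
    ("ex:p2", RDF_TYPE, "ex:Protein"),
    ("ex:a1", RDF_TYPE, "ex:SequenceAnnotation"),
    ("ex:p1", "ex:name", "Example protein"),
]

for type_name, n in count_by_type(TRIPLES).items():
    print(type_name, n)
```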
Bioschemas Data Harvest
Project 29:
#29_bioschemas
Alban Gaignard, Leyla Jael Garcia Castro, Alasdair J. G. Gray / Online
Pages harvested: 413,748
Sites harvested: 25 (partially)
https://swel.macs.hw.ac.uk/data/
Global Research Graph: OpenAIRE
No need for custom APIs
Concept merging
Terminology transformation
Markup needs to have:
EOSC-ENHANCE Bioschemas use case
VALIDATE
REGISTER
Added-value services
Slide credit: Paolo Manghi, ISTI, CNR, Pisa, Italy
Bioschemas
FAIRsharing
Harvesting & Mapping
Connecting TeSS and bio.tools
Work by Alban Gaignard (ELIXIR-FR)
2019
Automated Data Curation
Bioschemas DataCatalog:
Data Exchange: Without an API
MarRef → BioSamples
https://github.com/EBIBioSamples/bioschemas_marref_demo/blob/master/Summary.md
Bioschemas Scraper
Rich snippet generation
Summary
Bioschemas: Markup for the Life Sciences
Picture: Carole Goble, Turing Lecture 2018
Bioschemas
What?
How?
Approach can be followed by other domains!
Future Challenges
Acknowledgements: http://bioschemas.org/people
Join Bioschemas: http://bioschemas.org/howtojoin/
bioschemas.org
@bioschemas
github.com/bioschemas
Bioschemas Community Call
4th Monday of the month
17:00 CET, 25 April 2022
tiny.cc/bs-slack