Data models, visualizations and Shape Expressions for RDF
Combining RDFShape & Cytoscape
Jose Emilio Labra Gayo
WESO Research Group
University of Oviedo, Spain
Andra Waagmeester
Micelio
Eric Prud'hommeaux
Micelio, Janeiro Digital
RDF & linked data
The good things…
Flexible and extensible
Powerful query language: SPARQL
Interoperability & integration
Lots of success stories
…and the challenges
Too much flexibility? Inconsistencies
Is it really easy to query? Where to start?
Are two data models compatible?
…but there is still a lot of work to do
Long term goal: Increase quality and usability of RDF data
Validating and describing RDF data
ShEx - Shape Expressions
Human-readable & machine processable
Syntax inspired by SPARQL, Turtle
Semantics inspired by RELAX NG
Recently adopted by Wikidata/Wikibase
Free online HTML version: http://book.validatingrdf.com
<User> {
  :name xsd:string ;
  :age xsd:integer ? ;
  :enrolledIn @<Course> + ;
  :knows @<User> * ;
}
<Course> {
  :subject xsd:string + ;
  :students @<User> {1,20}
}
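The cardinality markers in the <User> shape (no marker = exactly one, ? = zero or one, + = one or more, * = zero or more) can be illustrated with a small sketch. This is not a ShEx implementation: it only counts triples per predicate and ignores datatypes and shape references. The sample nodes (:alice, :bob, :algebra) and the helper name cardinality_errors are made up for illustration.

```python
from collections import defaultdict

# (min, max) bounds read off the <User> shape; None means unbounded.
USER_SHAPE = {
    ":name": (1, 1),           # no marker: exactly one
    ":age": (0, 1),            # ?
    ":enrolledIn": (1, None),  # +
    ":knows": (0, None),       # *
}

def cardinality_errors(node, triples, shape):
    """Return one message per predicate whose triple count is out of bounds."""
    counts = defaultdict(int)
    for s, p, _o in triples:
        if s == node:
            counts[p] += 1
    errors = []
    for pred, (lo, hi) in shape.items():
        n = counts[pred]
        if n < lo or (hi is not None and n > hi):
            upper = "*" if hi is None else hi
            errors.append(f"{pred}: found {n}, expected {lo}..{upper}")
    return errors

# Hypothetical sample data, written as (subject, predicate, object) tuples.
triples = [
    (":alice", ":name", "Alice"),
    (":alice", ":enrolledIn", ":algebra"),
    (":bob", ":name", "Bob"),
    (":bob", ":age", 30),
    (":bob", ":age", 31),
]

print(cardinality_errors(":alice", triples, USER_SHAPE))
print(cardinality_errors(":bob", triples, USER_SHAPE))
```

Here :alice satisfies every bound, while :bob has two :age values and no :enrolledIn, so both are reported. A real ShEx validator would additionally check xsd:string / xsd:integer and follow the @<Course> and @<User> references.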
RDFShape
Online RDF playground: http://rdfshape.weso.es/
It can be used to:
Get information about RDF data and endpoints
Query RDF data and endpoints with SPARQL
Validate RDF data and endpoints with ShEx and SHACL
Inspect and visualize ShEx schemas
Convert between different formats
Tasks for RDF validation at RDFShape
Workflow diagram (labels recovered from the slide)
Input: RDF data or SPARQL endpoint
Steps labelled in the diagram: select nodes, derive schema, refine, validate, visualize, improve
Components: ShEx schema, Shape map, Schema visualization, RDFShape
Schema derived:
PREFIX efo: <http://www.ebi.ac.uk/efo/>
PREFIX obo: <http://purl.obolibrary.org/obo/>
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX dcterms: <http://purl.org/dc/terms/>
umcu:variant {
  dcterms:isPartOf IRI ;
  wdp:P644 xsd:int ;
  wdp:P645 xsd:int ;
  wdp:P2576 LITERAL ; # P2576 = UCSC Genome
  wdp:P3331 LITERAL ; # P3331 = HGVS nomenclature
  wdp:P3343 umcu:gene ; # P3343 = variant of
}
umcu:chromosome {
  rdf:type [wd:Q37748] ;
  dcterms:identifier LITERAL ;
}
. . .
Schema after refinement:
PREFIX efo: <http://www.ebi.ac.uk/efo/>
PREFIX obo: <http://purl.obolibrary.org/obo/>
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX dcterms: <http://purl.org/dc/terms/>
umcu:variant {
  rdf:type obo:SO_0001060 ; # sequence variant
  dcterms:isPartOf umcu:chromosome ;
  wdp:P644 xsd:int ;
  wdp:P645 xsd:int ;
  wdp:P2576 LITERAL ; # P2576 = UCSC Genome
  wdp:P3331 LITERAL ; # P3331 = HGVS nomenclature
  wdp:P3343 umcu:gene ; # P3343 = variant of
}
umcu:chromosome {
  rdf:type [wd:Q37748] ;
  dcterms:identifier LITERAL ;
}
. . .
Legend
RDF node selected
RDF node that conforms with schema
RDF node that doesn't conform
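The "derive schema" step in this workflow can be pictured with a rough sketch: look at the triples of the selected nodes and record, per predicate, whether its values are IRIs, typed literals, or plain literals. This is an assumption about the general idea only, not RDFShape's actual inference algorithm. The node names (ex:v1, ex:v2, ex:chr1, ex:chr2), the literal values, and the tuple encoding of objects are hypothetical, and cardinalities are not inferred.

```python
def derive_shape(nodes, triples):
    """Infer one value constraint per predicate from the selected nodes' triples.

    Objects are encoded as (kind, value, datatype) with kind "iri" or "literal".
    """
    constraints = {}
    for s, p, o in triples:
        if s not in nodes:
            continue
        kind = "IRI" if o[0] == "iri" else (o[2] or "LITERAL")
        # If the values seen for a predicate disagree, fall back to "." (any value).
        constraints[p] = kind if constraints.get(p, kind) == kind else "."
    return constraints

def to_shex(label, constraints):
    """Render the inferred constraints as a ShEx-like shape body."""
    body = "\n".join(f"  {p} {c} ;" for p, c in sorted(constraints.items()))
    return f"{label} {{\n{body}\n}}"

# Hypothetical data for two selected variant nodes.
triples = [
    ("ex:v1", "dcterms:isPartOf", ("iri", "ex:chr1", None)),
    ("ex:v1", "wdp:P644", ("literal", "100", "xsd:int")),
    ("ex:v1", "wdp:P3331", ("literal", "example", None)),
    ("ex:v2", "dcterms:isPartOf", ("iri", "ex:chr2", None)),
    ("ex:v2", "wdp:P644", ("literal", "200", "xsd:int")),
]

derived = derive_shape({"ex:v1", "ex:v2"}, triples)
print(to_shex("umcu:variant", derived))
```

A derived constraint such as dcterms:isPartOf IRI is deliberately loose; the "refine" step is where a modeller would tighten it by hand, for example to a reference to umcu:chromosome as in the refined schema.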
Some thoughts and goals
RDF validation is increasingly being adopted
Towards a Shapes ecosystem
Domain experts like UML diagrams and visualizations
Tool usability is still primitive
Goal: Develop better tools to help RDF data modellers
Proposal to Biohackathon’19
RDFShape & Cytoscape integration
(Diagram: the workflow from "Tasks for RDF validation at RDFShape", repeated with Cytoscape added.)
Goals at Biohackathon’19
RDFShape & Cytoscape
Separate Client/Server code
Client: React app https://github.com/labra/rdfshape-client
Server: Scala (http4s): https://github.com/labra/rdfshape
Visualization of RDF data and validation results
Suggestions/use cases for RDF validation
Learn from users
Organizing a ShEx-athon?