1 of 9

Data models, visualizations and Shape Expressions for RDF

Combining RDFShape & Cytoscape

Jose Emilio Labra Gayo

WESO Research Group

University of Oviedo, Spain

Andra Waagmeester

Micelio

Eric Prud'hommeaux

Micelio, Janeiro Digital

2 of 9

RDF & linked data

The good things…

Flexible and extensible

Powerful query language: SPARQL

Interoperability & integration

Lots of success stories

…and the challenges

Too much flexibility? It can lead to inconsistencies

Is it really easy to query? Where to start?

Are two data models compatible?

…but there is still a lot of work to do

Long-term goal: increase the quality and usability of RDF data

3 of 9

Validating and describing RDF data

ShEx - Shape Expressions

Human-readable and machine-processable

Syntax inspired by SPARQL and Turtle

Semantics inspired by RELAX NG

Recently adopted by Wikidata/Wikibase

Free online HTML version of the book "Validating RDF Data": http://book.validatingrdf.com

<User> {
  :name        xsd:string       ;
  :age         xsd:integer ?    ;
  :enrolledIn  @<Course> +      ;
  :knows       @<User> *        ;
}
<Course> {
  :subject     xsd:string +     ;
  :students    @<User> {1,20}
}
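To make the cardinalities in this schema concrete, here is a minimal Python sketch of checking nodes against `<User>` and `<Course>`. It is not how RDFShape or any ShEx engine is implemented: the schema is hand-encoded as a dictionary, IRIs are plain strings, literals are wrapped in a hypothetical `Lit` class whose Python type stands in for the XSD datatype, and a recursive reference such as `:knows @<User>` is assumed to hold while that same node/shape pair is already being checked.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Lit:
    """A literal value (hypothetical helper); IRIs are plain strings like ":alice"."""
    value: object


# (predicate, value test, min, max); max=None means unbounded.
Constraint = Tuple[str, str, int, Optional[int]]
Triple = Tuple[str, str, object]

# The <User> and <Course> shapes above, encoded by hand.
SCHEMA: Dict[str, List[Constraint]] = {
    "User": [
        (":name", "xsd:string", 1, 1),
        (":age", "xsd:integer", 0, 1),        # ?
        (":enrolledIn", "@Course", 1, None),  # +
        (":knows", "@User", 0, None),         # *
    ],
    "Course": [
        (":subject", "xsd:string", 1, None),  # +
        (":students", "@User", 1, 20),        # {1,20}
    ],
}


def value_ok(graph: Set[Triple], value: object, test: str,
             assumed: FrozenSet[Tuple[str, str]]) -> bool:
    """Check one object value against a datatype or a shape reference."""
    if test == "xsd:string":
        return isinstance(value, Lit) and isinstance(value.value, str)
    if test == "xsd:integer":
        return (isinstance(value, Lit) and isinstance(value.value, int)
                and not isinstance(value.value, bool))
    # "@Shape": the value must be an IRI that conforms to that shape.
    return isinstance(value, str) and conforms(graph, value, test[1:], assumed)


def conforms(graph: Set[Triple], node: str, shape: str,
             assumed: FrozenSet[Tuple[str, str]] = frozenset()) -> bool:
    """Check every triple constraint of `shape` on `node`.

    Predicates not mentioned in the shape are ignored (shapes are open).
    A (node, shape) pair already being checked is assumed to hold, so
    recursive references such as :knows @<User> terminate.
    """
    if (node, shape) in assumed:
        return True
    assumed = assumed | {(node, shape)}
    for pred, test, lo, hi in SCHEMA[shape]:
        values = [o for (s, p, o) in graph if s == node and p == pred]
        if len(values) < lo or (hi is not None and len(values) > hi):
            return False
        if not all(value_ok(graph, v, test, assumed) for v in values):
            return False
    return True


# Hypothetical sample data.
graph: Set[Triple] = {
    (":alice", ":name", Lit("Alice")),
    (":alice", ":age", Lit(30)),
    (":alice", ":enrolledIn", ":rdf101"),
    (":alice", ":knows", ":bob"),
    (":bob", ":name", Lit("Bob")),
    (":bob", ":enrolledIn", ":rdf101"),
    (":rdf101", ":subject", Lit("RDF")),
    (":rdf101", ":students", ":alice"),
    (":rdf101", ":students", ":bob"),
    (":carol", ":name", Lit("Carol")),  # no :enrolledIn arc
}

if __name__ == "__main__":
    for node in (":alice", ":bob", ":carol"):
        print(node, "conforms to <User>:", conforms(graph, node, "User"))
```

With this sample graph, `:alice` and `:bob` conform to `<User>`, while `:carol` does not, because the `+` on `:enrolledIn` requires at least one such arc.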

4 of 9

RDFShape

Online RDF playground: http://rdfshape.weso.es/

It can be used to:

Get information about RDF data and endpoints

Query RDF data and endpoints with SPARQL

Validate RDF data and endpoints with ShEx and SHACL

Get information about ShEx schemas and visualize them

Convert between different formats

5 of 9

Tasks for RDF validation at RDFShape

Workflow (diagram):

RDF data or SPARQL endpoint → select nodes → derive schema

Derived schema → refine → ShEx schema, shown in the schema visualization

ShEx schema + shape map → validate in RDFShape → visualize the results → improve the schema

Derived schema:

PREFIX efo: <http://www.ebi.ac.uk/efo/>
PREFIX obo: <http://purl.obolibrary.org/obo/>
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX dcterms: <http://purl.org/dc/terms/>

umcu:variant {
  dcterms:isPartOf IRI ;
  wdp:P644 xsd:int ;
  wdp:P645 xsd:int ;
  wdp:P2576 LITERAL ;              # P2576 = UCSC Genome
  wdp:P3331 LITERAL ;              # P3331 = HGVS nomenclature
  wdp:P3343 @umcu:gene ;           # P3343 = variant of
}

umcu:chromosome {
  rdf:type [wd:Q37748] ;
  dcterms:identifier LITERAL ;
}

Refined schema:

PREFIX efo: <http://www.ebi.ac.uk/efo/>
PREFIX obo: <http://purl.obolibrary.org/obo/>
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX dcterms: <http://purl.org/dc/terms/>

umcu:variant {
  rdf:type [obo:SO_0001060] ;      # sequence variant
  dcterms:isPartOf @umcu:chromosome ;
  wdp:P644 xsd:int ;
  wdp:P645 xsd:int ;
  wdp:P2576 LITERAL ;              # P2576 = UCSC Genome
  wdp:P3331 LITERAL ;              # P3331 = HGVS nomenclature
  wdp:P3343 @umcu:gene ;           # P3343 = variant of
}

umcu:chromosome {
  rdf:type [wd:Q37748] ;
  dcterms:identifier LITERAL ;
}

. . .

Legend of the validation results:

RDF node selected

RDF node that conforms to the schema

RDF node that doesn't conform
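The "derive schema" step can be illustrated with a small Python sketch that infers one shape from the triples of the selected nodes. This is an assumed, simplified inference, not RDFShape's algorithm: every predicate used by a selected node becomes a triple constraint; its value is `IRI`, a single datatype, `LITERAL` or `.` (any node), depending on the observed objects; and the cardinality comes from the minimum and maximum counts per node, with any maximum above 1 generalized to unbounded. `derive_shape`, `cardinality`, `Lit` and the sample triples are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple, Union


@dataclass(frozen=True)
class Lit:
    """A literal with its datatype (hypothetical helper); IRIs are plain strings."""
    value: str
    datatype: str


Term = Union[str, Lit]
Triple = Tuple[str, str, Term]


def cardinality(lo: int, hi: int) -> str:
    """Map observed min/max counts to a ShEx cardinality suffix."""
    if hi <= 1:
        return "" if lo == 1 else " ?"
    return " +" if lo >= 1 else " *"  # any max above 1 becomes unbounded


def derive_shape(label: str, graph: Set[Triple], nodes: List[str]) -> str:
    # For each selected node, collect the objects of each outgoing predicate.
    per_node: Dict[str, Dict[str, List[Term]]] = {n: defaultdict(list) for n in nodes}
    for s, p, o in graph:
        if s in per_node:
            per_node[s][p].append(o)
    preds = sorted({p for by_pred in per_node.values() for p in by_pred})
    lines = [f"{label} {{"]
    for p in preds:
        objs = [o for n in nodes for o in per_node[n].get(p, [])]
        kinds = {o.datatype if isinstance(o, Lit) else "IRI" for o in objs}
        if len(kinds) == 1:
            value = kinds.pop()
        elif all(isinstance(o, Lit) for o in objs):
            value = "LITERAL"  # literals with mixed datatypes
        else:
            value = "."        # mix of IRIs and literals: any node
        counts = [len(per_node[n].get(p, [])) for n in nodes]
        lines.append(f"  {p} {value}{cardinality(min(counts), max(counts))} ;")
    lines.append("}")
    return "\n".join(lines)


# Hypothetical sample triples for two selected variant nodes.
graph: Set[Triple] = {
    ("umcu:v1", "dcterms:isPartOf", "umcu:chr1"),
    ("umcu:v1", "wdp:P644", Lit("100", "xsd:int")),
    ("umcu:v1", "wdp:P3331", Lit("hgvs-1", "xsd:string")),
    ("umcu:v2", "dcterms:isPartOf", "umcu:chr2"),
    ("umcu:v2", "wdp:P644", Lit("200", "xsd:int")),
}

if __name__ == "__main__":
    print(derive_shape("umcu:variant", graph, ["umcu:v1", "umcu:v2"]))
```

For the two sample variants this prints a shape containing `dcterms:isPartOf IRI ;`, `wdp:P3331 xsd:string ? ;` and `wdp:P644 xsd:int ;`. As in the derived schema above, a generic `IRI` constraint is what a modeller would then refine into a shape reference such as `@umcu:chromosome`.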

6 of 9

Some thoughts and goals

RDF validation is increasingly being adopted

Towards a Shapes ecosystem

Domain experts like UML diagrams and visualizations

Tool usability is still primitive

Goal: develop better tools to help RDF data modellers

7 of 9

Proposal for Biohackathon’19

RDFShape & Cytoscape integration

Diagram: the same RDFShape validation workflow as on slide 5 (select nodes, derive schema, refine, validate, visualize, improve), with Cytoscape added to it
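One possible way to feed a ShEx schema to Cytoscape is to translate it into the elements JSON that Cytoscape.js accepts, where each element has a `data` object with an `id`, and edges also carry `source` and `target`. The Python sketch below assumes a simplified, hand-written encoding of the refined schema (shape label mapped to predicate/value pairs): shapes become nodes, shape references (`@...`) become labelled edges, and the remaining constraints are kept in a hypothetical `attributes` field for a UML-like display. The function name and that field are illustrative, not part of RDFShape or Cytoscape.

```python
import json
from typing import Dict, List, Tuple

# Hypothetical, simplified encoding of the refined schema:
# shape label -> list of (predicate, value expression).
Schema = Dict[str, List[Tuple[str, str]]]

SCHEMA: Schema = {
    "umcu:variant": [
        ("rdf:type", "[obo:SO_0001060]"),
        ("dcterms:isPartOf", "@umcu:chromosome"),
        ("wdp:P644", "xsd:int"),
        ("wdp:P645", "xsd:int"),
        ("wdp:P3343", "@umcu:gene"),
    ],
    "umcu:chromosome": [
        ("rdf:type", "[wd:Q37748]"),
        ("dcterms:identifier", "LITERAL"),
    ],
}


def schema_to_cytoscape(schema: Schema) -> List[dict]:
    """Shapes become nodes, shape references (@...) become edges, and all
    other constraints are kept as UML-like attribute lines on the node."""
    attributes: Dict[str, List[str]] = {}
    edges: List[dict] = []
    for shape, constraints in schema.items():
        attrs = attributes.setdefault(shape, [])
        for pred, value in constraints:
            if value.startswith("@"):
                target = value[1:]
                # The target shape may not be defined here (e.g. umcu:gene).
                attributes.setdefault(target, [])
                edges.append({"data": {"id": f"{shape}|{pred}|{target}",
                                       "source": shape, "target": target,
                                       "label": pred}})
            else:
                attrs.append(f"{pred} {value}")
    nodes = [{"data": {"id": s, "label": s, "attributes": a}}
             for s, a in attributes.items()]
    return nodes + edges


if __name__ == "__main__":
    print(json.dumps(schema_to_cytoscape(SCHEMA), indent=2))
```

In Cytoscape.js, labels are not drawn by default; a style such as `label: data(label)` on the `node` and `edge` selectors would show the shape and predicate names.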

8 of 9

Goals at Biohackathon’19

RDFShape & Cytoscape

Separate client and server code

Client: React app: https://github.com/labra/rdfshape-client

Server: Scala (http4s): https://github.com/labra/rdfshape

Visualization of RDF data and validation results

Suggestions/use cases for RDF validation

Learn from users

Organizing a ShEx-athon?
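For the goal of visualizing validation results, a similar sketch can turn RDF data plus a result shape map into Cytoscape.js elements, tagging each validated node with a class that matches the legend on slide 5. The result map is assumed to be a Python dict from `(node, shape)` to a boolean; the class names `conforms` and `nonconforms`, the function name, and the IRI-only triples are assumptions for illustration.

```python
import json
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]            # IRI-only triples, for simplicity
ResultMap = Dict[Tuple[str, str], bool]  # (node, shape) -> conforms?


def results_to_cytoscape(graph: Set[Triple], results: ResultMap) -> List[dict]:
    """Turn RDF triples plus validation results into Cytoscape.js elements.

    Nodes in the result map get the class "conforms" or "nonconforms" (a node
    failing any of its shapes counts as non-conforming); other nodes are
    drawn as context without a class.
    """
    status: Dict[str, str] = {}
    for (node, _shape), ok in results.items():
        if not ok:
            status[node] = "nonconforms"
        else:
            status.setdefault(node, "conforms")
    node_ids = sorted({s for s, _, _ in graph} | {o for _, _, o in graph} | set(status))
    elements: List[dict] = []
    for n in node_ids:
        element: dict = {"data": {"id": n, "label": n}}
        if n in status:
            element["classes"] = status[n]
        elements.append(element)
    for i, (s, p, o) in enumerate(sorted(graph)):
        elements.append({"data": {"id": f"e{i}", "source": s, "target": o, "label": p}})
    return elements


# Hypothetical sample data and results.
graph: Set[Triple] = {
    (":alice", ":enrolledIn", ":rdf101"),
    (":alice", ":knows", ":bob"),
    (":bob", ":knows", ":alice"),
}
results: ResultMap = {(":alice", "<User>"): True, (":bob", "<User>"): False}

if __name__ == "__main__":
    print(json.dumps(results_to_cytoscape(graph, results), indent=2))
```

A Cytoscape.js stylesheet could then color these nodes through the class selectors `.conforms` and `.nonconforms`. Marking a node as non-conforming when it fails any of its shapes is only one possible policy for shape maps that list a node more than once.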
