LINKED OPEN DATA
DANIELE FUSI - TIZIANA MANCINELLI
VENICE CENTRE FOR DIGITAL AND PUBLIC HUMANITIES
JULY 8, 2020
Strand #1 - Part 1
Introduction
From the Web of Hypertexts to the Web of Data
Towards a Global Data Representation
[Diagram: three stages of the web]
web 1.0 (hypertexts): global data presentation
HTML, CSS and JS are the pillars of the web.
HTML is a presentational markup, designed to mark the structure of hypertexts.
Hypertexts target human readers and "talk" to them about any content: web 1.0 is a web of hypertexts for humans.
semantic markup: XML (representation) → XSLT (transformation) → HTML, CSS, JS (presentation)
With semantic markup, content is independent of and separate from its presentation, which is generated by software.
web 2.0 (applications): presentation generated from content
A similar separation applies in a wider context: content lives in a separate backend (DB, XML, ...), whatever its storage technologies.
Software systems interact with users to get data on their behalf, and present it in real time in a GUI, according to user interaction.
This is a web of applications, where only the topmost layer of each system targets humans. It is much smarter: everything responds to specific user requests in real time, with presentations tailored to users' interactions and preferences, like using a desktop app. Yet it is smarter by virtue of these apps and their data, rather than of the infrastructure.
web 3.0 (data): global data representation
The goal is to bring the separation of content and presentation found in web applications into the web infrastructure itself, by providing a global data representation rather than a presentation through hypertexts; i.e. to publish data directly, with globally standard models. The result is a web of data targeting machines: a sort of world-wide, uniformly modeled database, where anyone can publish their own data at any time.
Mining Data from its Text Presentation
the human language is ambiguous
human languages are different
Homer
data in traditional pages is informal and lacks structure: we just have a name ("Homer") in a text. The machine knows nothing about what a poet or a cartoon is; it just relies on words, even if in a very smart and powerful way.
Representation: Structured Data
URI
Deployment requirements for more structured data
Tim Berners-Lee, 2010 - https://www.w3.org/2011/gld/wiki/5_Star_Linked_Data
Data Distribution: Tabular Data Sample
list of persons:
ID    | name         | birthDate | birthPlace
156   | Marco Polo   | 1254      | Venice
1201  | Niccolò Polo | 1230      | Venice
368   | Maffeo Polo  | 1230      | Venice
metadata (the column headers): meaning of each cell in a row of data
rows: data records, each row is a person
Distributing by Rows
[Figure: the sample table split by rows between server A and server B; each server stores some of the rows, together with its own copy of the column headers (ID, name, birthDate, birthPlace).]
duplicate metadata: to define the meaning of each column in rows
Distributing by Columns
[Figure: the sample table split by columns between server A and server B; each server stores some of the columns, together with its own copy of the ID column (156, 1201, 368).]
duplicate IDs: to link columns to the same row (person)
Distributing by Columns and Rows (Cells)
[Figure: the sample table split into single cells between server A and server B; each cell is stored together with its row ID and its column header.]
duplicate both IDs and metadata
this is the most atomic data distribution, and is the approach taken by SemWeb using statement-like constructs known as triples
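As a rough illustration of this cell-level distribution, the sketch below splits the sample table into self-describing cells, each carrying its row ID and its column name. The function name `table_to_cells` and the in-memory layout are assumptions made for the example, not part of the slides.

```python
# Sample table from the slides: each row is a person, keyed by its ID.
COLUMNS = ["name", "birthDate", "birthPlace"]
ROWS = {
    156: ["Marco Polo", 1254, "Venice"],
    1201: ["Niccolò Polo", 1230, "Venice"],
    368: ["Maffeo Polo", 1230, "Venice"],
}


def table_to_cells(rows, columns):
    """Split a table into cells of the form (row ID, column name, value)."""
    return [
        (row_id, column, value)
        for row_id, values in rows.items()
        for column, value in zip(columns, values)
    ]


cells = table_to_cells(ROWS, COLUMNS)
for cell in cells:
    print(cell)
```

Each cell repeats both the ID and the column name, so it can be stored on any server and still be understood on its own; this is the shape that a triple generalizes.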
Semantic Web Data Modeling: RDF
Resource Description Framework
https://www.w3.org/RDF/
http://www.w3.org/1999/02/22-rdf-syntax-ns#
Modeling Data in RDF: Triples
[Figure: the table cell (ID 156, name, "Marco Polo") rewritten as a triple]
subject (S): http://dbpedia.org/resource/Marco_Polo
predicate (P): http://www.w3.org/2000/01/rdf-schema#label
object (O): "Marco Polo"@it (type = string, language = Italian)
any data is expressed by triples: e.g. Marco-Polo...
this implies having vocabularies that define all the resources used as S, P, or O, each with its own URI
A World of Vocabularies
Publishing Data on the Web
web 1.0/2.0: publishing data presentations (however generated) as hypertexts/GUIs for humans
anyone can publish pages in a web site (URL), which gets hyperlinked to other sites. Pages are identified by URLs.
web 3.0: publishing data representations as triples for machines
anyone can publish data as triples (URI), using vocabularies (vocabulary A, vocabulary B) and triples from other ontologies; these get merged into the global data graph. Concepts are identified by URIs.
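A minimal sketch of what merging means here, assuming two hypothetical publishers (`source_a` and `source_b`) and modeling triples as plain Python tuples, with literals as (text, language) pairs. Because every resource is named by a global URI, merging reduces to a set union.

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
BIRTH_PLACE = "http://dbpedia.org/ontology/birthPlace"
MARCO_POLO = "http://dbpedia.org/resource/Marco_Polo"
VENICE = "http://dbpedia.org/resource/Venice"

# Two hypothetical publishers, each exposing its own set of triples.
source_a = {
    (MARCO_POLO, RDFS_LABEL, ("Marco Polo", "it")),
    (MARCO_POLO, BIRTH_PLACE, VENICE),
}
source_b = {
    (VENICE, RDFS_LABEL, ("Venezia", "it")),
    (VENICE, RDFS_LABEL, ("Venedig", "de")),
    (MARCO_POLO, BIRTH_PLACE, VENICE),  # the same statement, published twice
}

# Merging is a set union: identical triples collapse into one.
merged = source_a | source_b

# Statements from both sources now share nodes, so they can be chained.
birth_place_labels = [
    label
    for s, p, place in merged
    if s == MARCO_POLO and p == BIRTH_PLACE
    for s2, p2, label in merged
    if s2 == place and p2 == RDFS_LABEL
]
print(sorted(birth_place_labels))
```

The label of Marco Polo's birth place comes from a different source than the birth place statement itself, yet the two connect through the shared URI of Venice.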
Assumptions for a Global World
[Figure: statements spread over several layers (servers). One layer states "Marco Polo has father Niccolò Polo", another states "Marco Polo has father Maffeo Polo"; the server of layer 2 may be down; the same Marco Polo is identified by different URIs, such as abc.com/mp and def.edu/h/156.]
Sample Graph
[Figure: a graph of statements about Marco Polo (X), each part identified by its full URI]
X is a person:
<http://dbpedia.org/resource/Marco_Polo> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person>
X is an explorer of Asia:
<http://dbpedia.org/resource/Marco_Polo> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/resource/Category:Explorers_of_Asia>
X has name "Marco Polo" (Italian), "Поло, Марко" (Russian):
<http://dbpedia.org/resource/Marco_Polo> <http://www.w3.org/2000/01/rdf-schema#label> "Marco Polo"@it, "Поло, Марко"@ru
X was born in 1254:
<http://dbpedia.org/resource/Marco_Polo> <http://dbpedia.org/ontology/birthDate> 1254
X was born in a place:
<http://dbpedia.org/resource/Marco_Polo> <http://dbpedia.org/ontology/birthPlace> <http://dbpedia.org/resource/Venice>
this place has name "Venezia" (Italian), "Venedig" (German):
<http://dbpedia.org/resource/Venice> <http://www.w3.org/2000/01/rdf-schema#label> "Venezia"@it, "Venedig"@de
... URIs are verbose!
everything is a triple, and each part in it has its own URI (except for literals)
Sample Graph
[Figure: the same graph, with every URI shortened to a qname]
dbr:Marco_Polo rdf:type foaf:Person
dbr:Marco_Polo rdf:type dbc:Explorers_of_Asia
dbr:Marco_Polo rdfs:label "Marco Polo"@it, "Поло, Марко"@ru
dbr:Marco_Polo dbo:birthDate 1254
dbr:Marco_Polo dbo:birthPlace dbr:Venice
dbr:Venice rdfs:label "Venezia"@it, "Venedig"@de
foaf: http://xmlns.com/foaf/0.1/
rdfs: http://www.w3.org/2000/01/rdf-schema#
dbo: http://dbpedia.org/ontology/
dbc: http://dbpedia.org/resource/Category:
rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#
dbr: http://dbpedia.org/resource/
shorten URIs by replacing their first portion with an arbitrarily chosen prefix (qnames)
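The prefix mechanism is plain string substitution, as the following sketch suggests. The helper names `expand` and `shorten` are assumptions for the example; the dbc: namespace is written so that it ends in "Category:", which is what makes dbc:Explorers_of_Asia expand to the full category URI.

```python
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dbo": "http://dbpedia.org/ontology/",
    "dbr": "http://dbpedia.org/resource/",
    "dbc": "http://dbpedia.org/resource/Category:",
}


def expand(qname, prefixes=PREFIXES):
    """Turn a qname such as 'dbr:Venice' back into its full URI."""
    prefix, _, local_name = qname.partition(":")
    return prefixes[prefix] + local_name


def shorten(uri, prefixes=PREFIXES):
    """Turn a full URI into a qname, or return it unchanged if no prefix fits."""
    matches = [(ns, prefix) for prefix, ns in prefixes.items() if uri.startswith(ns)]
    if not matches:
        return uri
    # Prefer the longest namespace, so that dbc: wins over dbr: for categories.
    namespace, prefix = max(matches, key=lambda match: len(match[0]))
    return f"{prefix}:{uri[len(namespace):]}"


print(expand("dbr:Marco_Polo"))
print(shorten("http://dbpedia.org/resource/Category:Explorers_of_Asia"))
```

The prefix itself carries no meaning: any other label bound to the same namespace would denote exactly the same URIs.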
More Nodes
[Figure: the sample graph extended with more nodes (dbr:Venetian_Lagoon, dbr:Ferdinand_Magellan, dbr:Giulia_Lama, dbr:Alpi_Eagles, dbr:Anthony_Quinn), connected to the existing ones through further properties (dbo:nearestCity, dbo:deathPlace, dbo:hubAirport, dbo:birthPlace, rdf:type, rdfs:label).]
The power of Linking Data: the value of the network is greater than the sum of its parts
jumping across nodes up to even remotely connected things, derived from different ontologies, all merged into the same, global data graph
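Jumping across nodes can be sketched as a breadth-first walk over the triples. In the example below the first two triples come from the sample graph, while the third is an assumed pairing of nodes and property taken from the "More Nodes" figure; the function name `reachable` is also an assumption.

```python
from collections import deque

TRIPLES = [
    ("dbr:Marco_Polo", "rdf:type", "foaf:Person"),
    ("dbr:Marco_Polo", "dbo:birthPlace", "dbr:Venice"),
    # Assumption: how these two nodes and this property pair up in the figure.
    ("dbr:Venetian_Lagoon", "dbo:nearestCity", "dbr:Venice"),
]


def reachable(start, triples, max_hops):
    """Return {node: hops} for every node within max_hops of start.

    Edge direction is ignored: a link can be followed from either end.
    """
    adjacency = {}
    for subject, _predicate, obj in triples:
        adjacency.setdefault(subject, set()).add(obj)
        adjacency.setdefault(obj, set()).add(subject)

    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in adjacency.get(node, ()):
            if neighbour not in hops:
                hops[neighbour] = hops[node] + 1
                queue.append(neighbour)
    return hops


two_hops = reachable("dbr:Marco_Polo", TRIPLES, max_hops=2)
print(two_hops)
```

Starting from Marco Polo, the lagoon is reached in two hops through Venice, even though no statement mentions the two together.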
Serializing Data
Serialization: Turtle - Tokens
Literals
"123"^^xsd:integer
"1254-06-28"^^xsd:date
QNames
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
@prefix foaf: <http://xmlns.com/foaf/0.1/>.
@prefix dbr: <http://dbpedia.org/resource/>.
dbr:Marco_Polo rdf:type foaf:Person.
Abbreviations
dbr:Marco_Polo a foaf:Person.
Serialization: Turtle - Statements
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix foaf: <http://xmlns.com/foaf/0.1/>.
@prefix dbr: <http://dbpedia.org/resource/>.
dbr:Marco_Polo rdfs:label "Marco Polo"@it;
    rdfs:label "Поло, Марко"@ru;
    a foaf:Person.
3 triples, shared subject
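The shared-subject abbreviation can be reproduced mechanically. The sketch below is a simplified writer, not a full Turtle serializer: it assumes the triple parts are already valid Turtle tokens, and the function name `to_turtle` is made up for the example.

```python
def to_turtle(triples):
    """Group triples by subject and join predicate-object pairs with ';'."""
    by_subject = {}
    for subject, predicate, obj in triples:
        by_subject.setdefault(subject, []).append((predicate, obj))

    statements = []
    for subject, pairs in by_subject.items():
        body = ";\n    ".join(f"{predicate} {obj}" for predicate, obj in pairs)
        statements.append(f"{subject} {body}.")
    return "\n".join(statements)


triples = [
    ("dbr:Marco_Polo", "rdfs:label", '"Marco Polo"@it'),
    ("dbr:Marco_Polo", "rdfs:label", '"Поло, Марко"@ru'),
    ("dbr:Marco_Polo", "a", "foaf:Person"),
]
turtle = to_turtle(triples)
print(turtle)
```

The subject is written once, each `;` introduces a further predicate-object pair for it, and the final `.` closes the statement.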
Serialization: Turtle - BNodes
bnode as S
[] a foaf:Person;
    foaf:name "Ted".
there is an otherwise unknown person named Ted
bnode as O
dbr:Marco_Polo x:hasMistress [
    a foaf:Person
].
Marco Polo had a mistress, and we don't know anything else about her
Turtle Sample
[Figure: the sample graph about Marco Polo and Venice shown in the previous slides, serialized below]
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix xsd: <http://www.w3.org/2001/XMLSchema#>.
@prefix foaf: <http://xmlns.com/foaf/0.1/>.
@prefix dbo: <http://dbpedia.org/ontology/>.
@prefix dbr: <http://dbpedia.org/resource/>.
@prefix dbc: <http://dbpedia.org/resource/Category:>.
dbr:Marco_Polo rdfs:label "Marco Polo"@it;
    rdfs:label "Поло, Марко"@ru;
    a foaf:Person;
    dbo:birthDate "1254-01-01"^^xsd:date;
    dbo:birthPlace dbr:Venice;
    a dbc:Explorers_of_Asia.
dbr:Venice rdfs:label "Venezia"@it,
    "Venedig"@de.
preamble: the @prefix declarations
statements: 8 triples, each made of Subject, Predicate, Object
LOD Lab
LAB: Adding and Presenting Data
[Diagram: semantically marked text in XML (representation) → XSLT (transformation) → HTML, CSS, JS (presentation)]
1. add RDF data to XML TEI about persons and places
   1A. manually write Turtle triples from realia.csv
   1B. convert Turtle into RDF/XML and paste it
2. apply XSLT transformation
3. examine the result
Materials
[Diagram: workflow across the lab materials]
realia.csv → (manual input) → Turtle → (convert) → RDF/XML → (paste) → TEI.xml → (transform with LOD.xsl) → result: TEI.html, with LOD.css, indigo-pink.css and realia-list.js
LAB Steps
1. Adding Other Namespaces
<TEI xmlns="http://www.tei-c.org/ns/1.0"
xmlns:tei="http://www.tei-c.org/ns/1.0"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
xmlns:foaf="http://xmlns.com/foaf/0.1/"
xmlns:dbo="http://dbpedia.org/ontology/">
2. Adding RDF/XML to TEI Header
<xenoData>
  <rdf:RDF>
    <rdf:Description tei:ref="#MP" rdf:about="http://dbpedia.org/resource/Marco_Polo">
      <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person" />
      <rdfs:label xml:lang="en">Marco Polo</rdfs:label>
    </rdf:Description>
    <rdf:Description tei:ref="#NP" rdf:about="http://dbpedia.org/resource/Niccolò_and_Maffeo_Polo">
      <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person" />
      <rdfs:label xml:lang="en">Niccolò Polo</rdfs:label>
    </rdf:Description>
  </rdf:RDF>
</xenoData>
RDF container for all persons and places
each person/place is a rdf:Description
Finding DBPedia Resources
see triples in various formats
Some Person Resources (realia.csv)
TEI ID as found in header's profileDesc
the corresponding DBPedia resource URI (minus the prefix)
3. Adding Persons: Turtle
http://dbpedia.org/resource/Marco_Polo → dbr:Marco_Polo
http://www.w3.org/1999/02/22-rdf-syntax-ns#type → a
http://xmlns.com/foaf/0.1/Person → foaf:Person
http://www.w3.org/2000/01/rdf-schema#label → rdfs:label
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix dbr: <http://dbpedia.org/resource/>.
@prefix foaf: <http://xmlns.com/foaf/0.1/>.
dbr:Marco_Polo a foaf:Person;
    rdfs:label "Marco Polo"@en.
Converting Formats
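In the lab the conversion from Turtle to RDF/XML is done with a converter tool. Purely as an illustration of what the converted output looks like, the sketch below builds the rdf:Description element for one person with Python's standard xml.etree.ElementTree. The function name `describe` and its parameters are assumptions; the tei:ref attribute follows the lab's convention of pointing to the TEI ID.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
TEI = "http://www.tei-c.org/ns/1.0"
XML = "http://www.w3.org/XML/1998/namespace"

# Bind the usual prefixes so that the serialized output is readable.
for prefix, uri in (("rdf", RDF), ("rdfs", RDFS), ("tei", TEI)):
    ET.register_namespace(prefix, uri)


def describe(tei_id, about, type_uri, label, lang="en"):
    """Build an rdf:Description with a type and a language-tagged label."""
    description = ET.Element(
        f"{{{RDF}}}Description",
        {f"{{{TEI}}}ref": f"#{tei_id}", f"{{{RDF}}}about": about},
    )
    ET.SubElement(description, f"{{{RDF}}}type", {f"{{{RDF}}}resource": type_uri})
    label_element = ET.SubElement(
        description, f"{{{RDFS}}}label", {f"{{{XML}}}lang": lang}
    )
    label_element.text = label
    return description


element = describe(
    "MP",
    "http://dbpedia.org/resource/Marco_Polo",
    "http://xmlns.com/foaf/0.1/Person",
    "Marco Polo",
)
xml_text = ET.tostring(element, encoding="unicode")
print(xml_text)
```

Unlike the fragment pasted into the TEI header, the printed element declares its own namespaces, because here it is serialized as a standalone root.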
Adding Persons – RDF/XML Result
<rdf:Description
    tei:ref="#MP"
    rdf:about="http://dbpedia.org/resource/Marco_Polo">
  <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
  <rdfs:label xml:lang="en">Marco Polo</rdfs:label>
</rdf:Description>
Some Places Data (realia.csv)
4. Adding Places
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix dbo: <http://dbpedia.org/ontology/>.
@prefix dbr: <http://dbpedia.org/resource/>.
dbr:Venice a dbo:Place;
    rdfs:label "Venice"@en.
<dbo:Place
    tei:ref="#Venice"
    rdf:about="http://dbpedia.org/resource/Venice">
  <rdfs:label xml:lang="en">Venice</rdfs:label>
</dbo:Place>
as for persons, the only difference being the resource type, a dbo:Place rather than a foaf:Person
5. Transforming into Presentation
Presentation Result
Quick Glance
Final Result: HTML, CSS, JS from XML
each text color corresponds to a different TEI element
line and column numbers in gray
drop-letters
names and places connected to RDF have tips
fully working web apps for persons and places
Web app: Persons and Places From DBPedia
Web app: Feeding with Document's Data
<app-realia-list
    type="person"
    json-list='[{ "uri": "http://dbpedia.org/resource/Marco_Polo", "label": "Marco Polo" },
                { "uri": "http://dbpedia.org/resource/Kublai_Khan", "label": "Kublai Khan" }]'>
</app-realia-list>
persons URIs and names
contains persons or places
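In the lab this tag is produced by the XSLT transformation. As a language-neutral illustration of the one delicate point, embedding JSON in an HTML attribute, the sketch below generates an equivalent tag in Python. The element and attribute names come from the slide; the function name `realia_list_tag` is an assumption.

```python
import html
import json

PERSONS = [
    {"uri": "http://dbpedia.org/resource/Marco_Polo", "label": "Marco Polo"},
    {"uri": "http://dbpedia.org/resource/Kublai_Khan", "label": "Kublai Khan"},
]


def realia_list_tag(kind, items):
    """Render an app-realia-list tag whose json-list attribute holds the items."""
    payload = json.dumps(items, ensure_ascii=False)
    # Escape quotes and markup characters so the JSON survives inside
    # a double-quoted attribute; the browser unescapes them on reading.
    attribute = html.escape(payload, quote=True)
    return (
        f'<app-realia-list type="{html.escape(kind, quote=True)}" '
        f'json-list="{attribute}"></app-realia-list>'
    )


tag = realia_list_tag("person", PERSONS)
print(tag)
```

Generating the JSON with a serializer rather than by hand avoids slips such as an unbalanced quote around a URI, which would make the whole list unparseable.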
Querying LOD Data
label(s)
birth and death
calculated age
abstract in selected language
link to Wikipedia
linked image
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT DISTINCT (dbr:Marco_Polo AS ?person) ?name
  ?birth_date ?birth_place ?birth_place_label
  ?death_date ?death_place ?death_place_label
  ?topic ?depiction ?abstract
WHERE {
  dbr:Marco_Polo a foaf:Person;
    foaf:name ?name.
  OPTIONAL {
    dbr:Marco_Polo dbo:birthDate ?birth_date;
      dbo:deathDate ?death_date;
      foaf:isPrimaryTopicOf ?topic;
      foaf:depiction ?depiction;
      dbo:abstract ?abstract.
    dbr:Marco_Polo dbo:birthPlace ?birth_place.
    ?birth_place rdfs:label ?birth_place_label.
    dbr:Marco_Polo dbo:deathPlace ?death_place.
    ?death_place rdfs:label ?death_place_label.
    # filters kept inside OPTIONAL, so that missing data does not discard the person
    FILTER(lang(?birth_place_label) = "en")
    FILTER(lang(?death_place_label) = "en")
  }
}
SPARQL Query
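A query like this is sent to a SPARQL endpoint over HTTP. The sketch below only prepares such a request, without sending it, for DBpedia's public endpoint; it uses a shortened query for brevity, and the function name `build_request` is an assumption. The `query` parameter is the one defined by the SPARQL protocol, and JSON results are requested through the Accept header.

```python
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import Request

ENDPOINT = "https://dbpedia.org/sparql"  # DBpedia's public SPARQL endpoint

# A shortened variant of the query, for illustration only.
QUERY = """\
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT DISTINCT ?name ?birth_date
WHERE {
  dbr:Marco_Polo a foaf:Person;
    foaf:name ?name.
  OPTIONAL { dbr:Marco_Polo dbo:birthDate ?birth_date. }
}
"""


def build_request(query, endpoint=ENDPOINT):
    """Prepare a GET request carrying the query, asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_request(QUERY)
print(request.full_url)

# Round trip: the query can be recovered intact from the encoded URL.
decoded_query = parse_qs(urlsplit(request.full_url).query)["query"][0]
```

Passing the prepared request to `urllib.request.urlopen` would execute it; that step is left out here so that the example runs without network access.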
Querying LOD Data
[Figure: the query annotated against the graph pattern it matches around Marco Polo]
SELECT: pick these data, as specified below
rdf:type foaf:Person: must be a person
foaf:name: having a name ("Marco Polo"@en)
OPTIONAL: if present, we also want: birth/death date (dbo:birthDate 1254-1-1, dbo:deathDate 1324-1-1), wiki page (foaf:isPrimaryTopicOf, en.Wikipedia.org...), default picture (foaf:depiction, commons.wikimedia...), abstract (dbo:abstract; not filtered by language, get all the abstracts)...
... birth place and its label (dbo:birthPlace dbr:Republic_of_Venice, rdfs:label "Republic of Venice"@en)...
... death place and its label (dbo:deathPlace, rdfs:label)...
FILTER: only English labels for places
Inspecting the Query
SPARQL
results in a table
execute
Data Presentation Logic
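One piece of presentation logic mentioned earlier is the calculated age, which is derived from the queried dates rather than stored. A minimal sketch, assuming one result row in the standard SPARQL JSON results layout and the birth and death dates shown in the slides (normalized to ISO form); the function name `age_at_death` is hypothetical.

```python
from datetime import date

# One hypothetical result row ("binding"), in SPARQL JSON results layout.
BINDING = {
    "name": {"type": "literal", "xml:lang": "en", "value": "Marco Polo"},
    "birth_date": {
        "type": "literal",
        "datatype": "http://www.w3.org/2001/XMLSchema#date",
        "value": "1254-01-01",
    },
    "death_date": {
        "type": "literal",
        "datatype": "http://www.w3.org/2001/XMLSchema#date",
        "value": "1324-01-01",
    },
}


def age_at_death(binding):
    """Return the age in whole years, or None when a date is missing."""
    if "birth_date" not in binding or "death_date" not in binding:
        return None  # optional data: the query may not have found it
    birth = date.fromisoformat(binding["birth_date"]["value"])
    death = date.fromisoformat(binding["death_date"]["value"])
    years = death.year - birth.year
    # Subtract one year if the birthday had not yet come in the death year.
    if (death.month, death.day) < (birth.month, birth.day):
        years -= 1
    return years


age = age_at_death(BINDING)
print(BINDING["name"]["value"], age)
```

Since both dates are optional in the query, the function returns None instead of failing when either one is absent.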
Data and Services Aggregation