Introduction to Linked Data
Jakub Klímek
This work is licensed under a Creative Commons Attribution 4.0 International License.
Regular data ~ not linked
ID | Name | Stars |
1235 | Joaquin Phoenix | Joker |
1234 | Robert De Niro | Joker |
... | | |
Hey, look at 1234, he’s cool!
Where and how do I get data about 1234?
Which Joker, the 5.6 or the 9.0 one?
[{
  "id": "1234",
  "id2": "az-11",
  "name": {
    "en": "Joker"
  },
  "rating": 9.0
}, {
  "id": "1235",
  "id2": "yt-18",
  "name": {
    "en": "Joker"
  },
  "rating": 5.6
}]
Stars in, or stars as?
Or this 1234?
This 1234?
Issues with regular, non-linked data
e.g. CSV, JSON, XML, Excel files...
ID | Name | Stars |
1235 | Joaquin Phoenix | Joker |
1234 | Robert De Niro | Joker |
... | | |
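The ambiguity of the bare identifier 1234 can be sketched in a few lines of Python. This is only an illustration, assuming the actor table and the JSON movie list come from two independent sources; the variable names and the `example.org` URIs are hypothetical, and only the IDs, names and ratings are taken from the slide.

```python
# Two independent datasets that both use the local identifier "1234".
actors = {
    "1235": {"name": "Joaquin Phoenix", "stars": "Joker"},
    "1234": {"name": "Robert De Niro", "stars": "Joker"},
}
movies = {
    "1234": {"name": "Joker", "rating": 9.0},
    "1235": {"name": "Joker", "rating": 5.6},
}


def lookup_local(identifier):
    """Return every record that a bare local ID could refer to."""
    hits = []
    for dataset_name, dataset in (("actors", actors), ("movies", movies)):
        if identifier in dataset:
            hits.append((dataset_name, dataset[identifier]["name"]))
    return hits


# The bare ID is ambiguous: it matches a person and a film.
ambiguous = lookup_local("1234")

# Hypothetical global identifiers (assumption, not from the slide):
# putting each ID into a namespace owned by its publisher removes the clash.
global_ids = {
    "http://example.org/actor/1234": actors["1234"],
    "http://example.org/movie/1234": movies["1234"],
}
```

With local IDs the reader has to guess which dataset was meant; with global identifiers the name itself says where the data lives.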
Issues => Additional requirements on data
e.g. CSV, JSON, XML, ...
Is there such a system?
The World Wide Web
Shared global space of documents
Built on top of several simple principles:
There are two kinds of applications working in this space of documents:
[Figure: web browsers and search engines access HTML documents over HTTP]
The World Wide Web - what can we do with it?
The World Wide Web - 30 years, Sir Tim Berners-Lee
30th Anniversary of the World Wide Web @ CERN (March 2019)
From the Web to Linked Data
Web of Documents
Lots of information about Prague in the Web of Documents. Problems:
Prague budget
Basic info about Prague
Prague public contracts
Demography of Prague
EU funded projects in Prague
Web of Documents
Queries for which there is information:
Prague budget
Basic info about Prague
Prague public contracts
Demography of Prague
EU funded projects in Prague
The World Wide Web
[Figure: web browsers and search engines access HTML documents over HTTP; Databases A, B, C and D sit behind the documents]
The World Wide Web
Different APIs provide machine-readable data for further processing in so-called mash-up applications.
Also built on several simple principles:
[Figure: mash-up apps access Databases A, B, C and D over HTTP, each through its own proprietary data API (A, B, C, D)]
Social network silos
Problems with data on the current Web
Web of Documents | Current Web IS NOT Web of Data! |
URLs as unique global identifiers of documents | no unique global identifiers of things |
HTML as a format for publishing documents | many formats for publishing data (XML, JSON, CSV, XLS, ...) |
HTTP for localization and accessing documents by their URLs | HTTP for localization of APIs and accessing them (REST) [but not for localization of things and accessing their data] |
hyperlinks between documents | none of the current formats lets us link related entities |
Can we apply the principles of the Web to data?
Linked Data ~ the Web of Data
Principles, “best practices” for publishing and linking data about entities on the Web.
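The principles of Linked Data
1. Use URIs as names for things
2. Use HTTP URIs so that people can look up those names
3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL)
4. Include links to other URIs, so that they can discover more things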
Web of Documents without the first two principles
Web pages without URLs
ID | Name | Stars |
1235 | Joaquin Phoenix | Joker |
1234 | Robert De Niro | Joker |
... | | |
Things as first-class citizens
Project CZ.2.16/2.1.00/22189
Contract DIL/23/07/007302/2010
Prague City
Prague Council
Prague Demography
Prague Budget
1. + 2. Use HTTP URIs as names for things
Project CZ.2.16/2.1.00/22189
praha.eu (Prague)
http://praha.eu/contract/7302
http://praha.eu/council
http://praha.eu/city
1. + 2. Use HTTP URIs as names for things
Project CZ.2.16/2.1.00/22189
praha.eu (Prague)
http://praha.eu/contract/7302
http://praha.eu/council
http://praha.eu/city
mfcr.cz (Ministry of Finance)
http://mfcr.cz/prague/budget
http://mfcr.cz/prague
1. + 2. Use HTTP URIs as names for things
Project CZ.2.16/2.1.00/22189
praha.eu (Prague)
http://praha.eu/contract/7302
http://praha.eu/council
http://praha.eu/city
mfcr.cz (Ministry of Finance)
http://mfcr.cz/prague/budget
http://mfcr.cz/prague
risy.cz (Regional Information Service)
http://risy.cz/location/prague
http://risy.cz/contract/22189-01
http://risy.cz/project/22189
czso.cz (Czech Statistical Office)
http://registry.czso.cz/prague
http://czso.cz/prague
http://czso.cz/prague/demogstat
The principles of Linked Data
Web of Documents without the third principle
Web pages are in many formats, not only HTML
Thanks for the URI of your web page
We all know this: how often do you click a link and a PDF, Word, or Excel file opens instead of a web page?
RDF - Resource Description Framework - idea
RDF - graph-based data model - a set of triples
A triple describes a relation as:
subject predicate object
2004 & 2014 W3C Recommendations
Triples are written in one of the RDF notations (syntaxes):
RDF/XML, RDFa, N-Triples, Turtle, JSON-LD, N-Quads, TriG
Example: Jakub Klímek studied at Charles University.
subject: Jakub Klímek
predicate: studied at
object: Charles University
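The subject, predicate, object structure can be modelled directly in Python. The sketch below is only an illustration of the data model, not an RDF library; the `Triple` class and its field names are assumptions made for this example.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One statement: subject, predicate, object (hypothetical helper type)."""
    subject: str
    predicate: str
    obj: str


# The example sentence from the slide as a single triple.
statement = Triple("Jakub Klímek", "studied at", "Charles University")

# An RDF graph is a set of triples, so stating the same fact twice
# does not change the graph.
graph = {
    statement,
    Triple("Jakub Klímek", "studied at", "Charles University"),
}
```

Using a set rather than a list mirrors the "set of triples" wording: order and repetition carry no meaning.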
RDF - Resource Description Framework - data
http://praha.eu/contract/7302 -- Name --> "Playground Revitalization"
http://praha.eu/contract/7302 -- Contracting Authority --> http://praha.eu/council
http://praha.eu/contract/7302 -- Estimated end date --> "31.8.2011"
RDF - Resource Description Framework - URLs
http://praha.eu/contract/7302 -- http://purl.org/dc/terms/title --> "Playground Revitalization"
http://praha.eu/contract/7302 -- http://purl.org/procurement/public-contracts#contractingAuthority --> http://praha.eu/council
http://praha.eu/contract/7302 -- http://purl.org/procurement/public-contracts#estimatedEndDate --> "31.8.2011"
RDF - Resource Description Framework - prefixes
http://praha.eu/contract/7302 -- dcterms:title --> "Playground Revitalization"
http://praha.eu/contract/7302 -- pc:contractingAuthority --> http://praha.eu/council
http://praha.eu/contract/7302 -- pc:estimatedEndDate --> "31.8.2011"
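Prefixes are plain shorthand: a prefixed name expands to a full URI by string concatenation. A minimal sketch in Python, assuming the two namespaces shown on the previous slide; the `expand` function and the `PREFIXES` table are hypothetical helpers written for this example.

```python
# Namespace URIs as they appear in the full-URI version of the slide.
PREFIXES = {
    "dcterms": "http://purl.org/dc/terms/",
    "pc": "http://purl.org/procurement/public-contracts#",
}


def expand(name, prefixes=PREFIXES):
    """Expand a prefixed name such as 'dcterms:title' to a full URI."""
    prefix, separator, local_part = name.partition(":")
    if separator and prefix in prefixes:
        return prefixes[prefix] + local_part
    # Already a full URI, or an unknown prefix: leave it unchanged.
    return name


title_uri = expand("dcterms:title")
authority_uri = expand("pc:contractingAuthority")
```

A full URI such as `http://praha.eu/council` passes through untouched, because `http` is not a declared prefix.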
RDF - Resource Description Framework - entities
Client
Playground Revitalization
Authority: Prague
Delivery date: 31.8.2011
Price: 28 444 000 CZK
...
http://praha.eu/contract/7302 -- dcterms:title --> "Playground Revitalization"
http://praha.eu/contract/7302 -- pc:contractingAuthority --> http://praha.eu/council
http://praha.eu/contract/7302 -- pc:estimatedEndDate --> "31.8.2011"
http://praha.eu/contract/7302 -- pc:agreedPrice --> http://praha.eu/contract/7302/price
http://praha.eu/contract/7302/price -- gr:hasCurrency --> "CZK"
http://praha.eu/contract/7302/price -- gr:hasCurrencyValue --> "28444000"
RDF - serializations
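The contract in Turtle:
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix pc: <http://purl.org/procurement/public-contracts#> .
@prefix gr: <http://purl.org/goodrelations/v1#> .

<http://praha.eu/contract/7302>
    dcterms:title "Playground Revitalization" ;
    pc:estimatedEndDate "31.8.2011" ;
    pc:agreedPrice <http://praha.eu/contract/7302/price> ;
    pc:contractingAuthority <http://praha.eu/council> .

<http://praha.eu/contract/7302/price>
    gr:hasCurrency "CZK" ;
    gr:hasCurrencyValue "28444000" .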
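To show that a serialization is just one way of writing the same triples down, here is a rough Python sketch that prints two of the contract's triples in N-Triples, the simplest syntax in the list. It is not a complete serializer: treating every value that starts with `http://` or `https://` as a URI is an assumption made for brevity, and the helper names are hypothetical.

```python
def term(value):
    """Format one term: URIs in angle brackets, everything else as a literal."""
    if value.startswith(("http://", "https://")):
        return f"<{value}>"
    # Escape backslashes and double quotes inside literals.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_ntriples(triples):
    """Write one triple per line, each terminated by ' .'."""
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples)


contract = "http://praha.eu/contract/7302"
triples = [
    (contract, "http://purl.org/dc/terms/title", "Playground Revitalization"),
    (
        contract,
        "http://purl.org/procurement/public-contracts#contractingAuthority",
        "http://praha.eu/council",
    ),
]
ntriples = to_ntriples(triples)
```

N-Triples spells every URI out in full, which is why Turtle with its prefixes is usually easier to read.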
Technical detour: HTTP Accept header and URIs
[Figure: a web browser and applications both request http://esfcr.cz/.../projekt/CZ10421016300169 from esfcr.cz over HTTP; the browser receives HTML, the applications receive RDF]
<http://esfcr.cz/data/projekt/CZ10421016300169>
    esf:nazev "INNOSTART" ;
    esf:registracni_cislo "CZ.1.04/2.1.01/63.00169" ;
    esf:castka "4711681" ;
    esf:realizace_od "2011-06-01" ;
    esf:realizace_do "2013-03-31" ;
    esf:realizator <http://esfcr.cz/.../25438352> ;
    esf:partner <http://esfcr.cz/.../25438352> ;
    esf:kontaktni_osoba <http://esfcr.cz/.../8541274571> ;
    esf:region <http://esfcr.cz/.../ustecky> .
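The difference between the browser and the application is only the HTTP Accept header they send for the same URI. A minimal sketch using Python's standard library, which builds the two requests without sending them. The `example.org` URI is a placeholder, since the real project URI on the slide is elided, and `build_request` is a hypothetical helper.

```python
import urllib.request


def build_request(uri, media_type):
    """Build (but do not send) a GET request asking for one representation."""
    return urllib.request.Request(uri, headers={"Accept": media_type})


# Placeholder URI (assumption); any dereferenceable entity URI would do.
uri = "http://example.org/project/123"

# A browser asks for a human-readable page ...
browser_request = build_request(uri, "text/html")
# ... while an application asks for RDF, here in the Turtle syntax.
app_request = build_request(uri, "text/turtle")

# Passing app_request to urllib.request.urlopen() would send it.
```

Whether a given server honours the header depends on how it was set up; the sketch only shows the client side.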
The principles of Linked Data
Web of Documents without the fourth principle
Web pages without links to other pages
4. Include links to other URIs (provide context)
praha.eu (Prague)
http://praha.eu/contract/7302
http://praha.eu/council
http://praha.eu/city
mfcr.cz (Ministry of Finance)
http://mfcr.cz/prague/budget
http://mfcr.cz/prague
risy.cz (Regional Information Service)
http://risy.cz/location/prague
http://risy.cz/contract/22189-01
http://risy.cz/project/22189
czso.cz (Czech Statistical Office)
http://registry.czso.cz/prague
http://czso.cz/prague
http://czso.cz/prague/demogstat
4. Include links to other URIs (provide context)
praha.eu (Prague)
http://praha.eu/contract/7302
http://praha.eu/council
http://praha.eu/city
mfcr.cz (Ministry of Finance)
http://mfcr.cz/prague/budget
http://mfcr.cz/prague
risy.cz (Regional Information Service)
http://risy.cz/location/prague
http://risy.cz/contract/22189-01
http://risy.cz/project/22189
czso.cz (Czech Statistical Office)
http://registry.czso.cz/prague
http://czso.cz/prague
http://czso.cz/prague/demogstat
Links between the datasets: a:fundedBy, b:hasBudget, c:hasBeneficiary, d:hasDemography
4. Include links to other URIs (provide context)
praha.eu (Prague)
http://praha.eu/contract/7302
http://praha.eu/council
http://praha.eu/city
mfcr.cz (Ministry of Finance)
http://mfcr.cz/prague/budget
http://mfcr.cz/prague
risy.cz (Regional Information Service)
http://risy.cz/location/prague
http://risy.cz/contract/22189-01
http://risy.cz/project/22189
czso.cz (Czech Statistical Office)
http://registry.czso.cz/prague
http://czso.cz/prague
http://czso.cz/prague/demogstat
Links between the datasets: a:fundedBy, b:hasBudget, c:hasBeneficiary, d:hasDemography
Identity links: owl:sameAs (two links)
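owl:sameAs states that two URIs name the same thing, so a consumer can merge what different publishers say about it. The Python sketch below groups URIs connected by such links. Which URIs the two owl:sameAs links on the slide actually connect cannot be read from the text, so the pairs used here are an assumption, and `same_as_groups` is a hypothetical helper.

```python
def same_as_groups(links):
    """Group URIs connected by owl:sameAs links (a small union-find)."""
    parent = {}

    def find(uri):
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # path halving
            uri = parent[uri]
        return uri

    for left, right in links:
        parent[find(left)] = find(right)

    groups = {}
    for uri in parent:
        groups.setdefault(find(uri), set()).add(uri)
    return list(groups.values())


# Assumed links: three publishers each mint their own URI for Prague.
links = [
    ("http://praha.eu/city", "http://mfcr.cz/prague"),
    ("http://praha.eu/city", "http://czso.cz/prague"),
]
groups = same_as_groups(links)
```

Under these assumed links all three URIs fall into one group, so budget and demography data can be joined on the city.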
The Web of Data
http://praha.eu/contract/7302
http://praha.eu/council
http://praha.eu/city
http://mfcr.cz/prague/budget
http://mfcr.cz/prague
http://risy.cz/location/prague
http://risy.cz/contract/22189-01
http://risy.cz/project/22189
http://registry.czso.cz/prague
http://czso.cz/prague
http://czso.cz/prague/demogstat
Links between the datasets: a:fundedBy, b:hasBudget, c:hasBeneficiary, d:hasDemography
Identity links: owl:sameAs (two links)
Web of documents vs. Web of data (Linked Data)
Web of documents | Linked Data |
HTML as document publication format | RDF as a data publication format |
URL as a unique global document identifier | URL as a unique global entity identifier |
HTTP protocol for accessing documents using their URL | HTTP protocol for accessing data about entities using their URL |
Links to other documents | Links to other entities |
| vocabularies – standards for common data representation |
Linked Data = Technical interoperability solution
HTTP API
FTP
SOAP / WSDL
TriG dump
IRI dereference
SPARQL
#LD
Web of Data
If the data from those websites were published as Linked Data, answering such queries would take only one step:
1. Write the query answering the question
(in the worst case, a preceding "download the data" step would be needed if federated querying were not supported)
Prague budget
Basic info about Prague
Prague public contracts
Demography of Prague
EU funded projects in Prague
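The single "write the query" step can be pictured as pattern matching over triples. In practice this would be a SPARQL query; the Python sketch below only imitates one triple pattern, with `None` as a wildcard. The `match` function is a hypothetical helper, and the graph holds just the contract triples shown earlier.

```python
def match(graph, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        (ts, tp, to)
        for ts, tp, to in graph
        if s in (None, ts) and p in (None, tp) and o in (None, to)
    ]


graph = [
    ("http://praha.eu/contract/7302", "dcterms:title", "Playground Revitalization"),
    ("http://praha.eu/contract/7302", "pc:contractingAuthority", "http://praha.eu/council"),
    ("http://praha.eu/contract/7302", "pc:estimatedEndDate", "31.8.2011"),
]

# "Which contracts have the Prague council as contracting authority?"
contracts = [
    subject
    for subject, _, _ in match(
        graph, p="pc:contractingAuthority", o="http://praha.eu/council"
    )
]
```

Because every publisher uses URIs and shared vocabularies, the same pattern works on triples from any of the sources once they are loaded into one graph.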
The Web of Data - will it be successful?
The LOD cloud - 2007 - 12 datasets
The LOD cloud - 2008 - 32 datasets
The LOD cloud - 2009 - 89 datasets
The LOD cloud - 2010 - 203 datasets
The LOD cloud - 2011 - 295 datasets
The LOD cloud - 2014 - 570 datasets
The LOD cloud - 2017 - 1146 datasets
The LOD cloud - 2018 - 1229 datasets
The LOD cloud - 2019 - 1239 datasets
The LOD cloud - 2023
Vocabularies in the Web of Data
http://praha.eu/contract/7302 -- http://purl.org/dc/terms/title --> "Playground Revitalization"
http://praha.eu/contract/7302 -- http://purl.org/procurement/public-contracts#contractingAuthority --> http://praha.eu/council
http://praha.eu/contract/7302 -- http://purl.org/procurement/public-contracts#estimatedEndDate --> "31.8.2011"
http://praha.eu/contract/7302 -- rdf:type --> pc:Contract (the object is a class)
Linked Open Vocabularies
Catalog of vocabularies used on the Web of Data
Basic rule - vocabulary reuse
https://lov.linkeddata.es/dataset/lov/
LD is a prerequisite. Data Quality is what is missing
Linked Data + Core Vocabularies
Data Quality
How to start with Linked Data
How to publish Linked Data
Triplestore implementations
Enterprise grade
Data transformation tools
Linked Enterprise Data, Knowledge Graphs
Application of LD in industry = Knowledge Graphs
What next? Re-decentralization of the Web
Solid - Social Linked Data
Open Data
Open data - 5-star classification
Legal and technical maturity of data