1 of 69

Introduction to Linked Data

Jakub Klímek

2 of 69

Regular data ~ not linked

ID   | Name            | Stars
-----|-----------------|------
1235 | Joaquin Phoenix | Joker
1234 | Robert De Niro  | Joker
...

Hey, look at 1234, he’s cool!

Where and how do I get data about 1234?

Which Joker, the 5.6 or the 9.0 one?

[{
  "id": "1234",
  "id2": "az-11",
  "name": {
    "en": "Joker"
  },
  "rating": 9.0
}, {
  "id": "1235",
  "id2": "yt-18",
  "name": {
    "en": "Joker"
  },
  "rating": 5.6
}]

Stars in, or stars as?

Or this 1234?

This 1234?

3 of 69

Issues with regular, non-linked data

e.g. CSV, JSON, XML, Excel files...

  • Ambiguous identification of entities in data
    • Person with ID aaa1234 in the file /data/temp/people.json on my laptop
    • Another person with ID aaa1234 in the XML file on this CD
  • Low findability and accessibility of data
    • Get data about person aaa1234 => Go to my laptop, open the folder, load/open the file, search/query
  • No contextual information
    • Person aaa1234 lives in Prague. I want to know more about Prague. Where and how do I get the information?

ID   | Name            | Stars
-----|-----------------|------
1235 | Joaquin Phoenix | Joker
1234 | Robert De Niro  | Joker
...

4 of 69

Issues => Additional requirements on data

e.g. CSV, JSON, XML, ...

  • Identification of entities in data
    • Global
    • Unique
  • Findability and accessibility of data
    • Find data based on the identification
    • Access it in a single format
  • Contextual information
    • When I access information, I want to know where and how to find more
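The identification requirement can be illustrated with a minimal sketch. It assumes two hypothetical sources that both use the local ID aaa1234; the record contents and the example.org base URLs are invented for illustration and do not come from the slides.

```python
# Two independent sources that both use the local ID "aaa1234" (hypothetical data).
laptop_people = {"aaa1234": {"name": "Alice", "lives_in": "Prague"}}
cd_people = {"aaa1234": {"name": "Bob", "lives_in": "Brno"}}

# Naive merge by local ID: the second record silently overwrites the first.
merged_by_local_id = {**laptop_people, **cd_people}


def globalize(records, base_url):
    """Re-key records by a global identifier built from a source-specific base URL."""
    return {base_url + local_id: record for local_id, record in records.items()}


# Hypothetical base URLs; each publisher controls its own namespace.
merged_by_global_id = {
    **globalize(laptop_people, "https://laptop.example.org/people/"),
    **globalize(cd_people, "https://cd.example.org/people/"),
}

print(len(merged_by_local_id))   # number of records left after the local-ID merge
print(len(merged_by_global_id))  # number of records left after the global-ID merge
```

With global identifiers, merging datasets no longer risks collisions between unrelated entities that happen to share a local ID.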


Is there such a system?

5 of 69

The World Wide Web

Shared global space of documents

Built on top of several simple principles:

  1. HTML as a format for publishing documents
  2. URLs as unique global identifiers of documents
  3. HTTP for locating and accessing documents by their URLs
  4. hyperlinks between documents

There are two kinds of applications working in this space of documents:

  • web browsers (locating and browsing documents through hyperlinks)
  • search engines (indexing and full text searching of documents)
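As a rough illustration of how both kinds of applications rely on hyperlinks, the sketch below extracts links from an HTML document and resolves them against the document's URL. The page content and the example.org / example.com URLs are hypothetical; only Python's standard library is used.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects absolute URLs of all <a href="..."> links in a document."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the URL of the current document.
                self.links.append(urljoin(self.base_url, href))


# Hypothetical page; a browser or crawler would fetch it over HTTP first.
page = (
    '<html><body>'
    '<a href="/budget">Budget</a> '
    '<a href="https://example.com/eu">EU projects</a>'
    '</body></html>'
)

collector = LinkCollector("https://example.org/prague")
collector.feed(page)
print(collector.links)
```

A browser follows one of these links on a click; a search engine crawler follows all of them to discover documents whose URLs it did not know before.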

[Figure: a web browser and a search engine accessing HTML documents over HTTP]

6 of 69

The World Wide Web - what can we do with it?

  • Publish human-readable documents
  • Everyone can view them in their browser
    • if they know the URL
  • + links
    • To documents with yet unknown URL
    • From other documents
    • From catalogs
  • Full-text search, keyword search

[Figure: a web browser and a search engine accessing HTML documents over HTTP]

7 of 69

The World Wide Web - 30 years, Sir Tim Berners-Lee


8 of 69

From the Web to Linked Data


9 of 69

Web of Documents

Lots of information about Prague in the Web of Documents. Problems:

  • Encoded in documents distributed across the Web
  • Documents are intended for humans, not computers
  • Documents about Prague or related things are not linked
  • Computers are not able to process data about Prague published on the Web

[Figure: separate web pages: Prague budget, Basic info about Prague, Prague public contracts, Demography of Prague, EU funded projects in Prague]

10 of 69

Web of Documents

Queries for which the information exists:

  • Top 100 suppliers of Prague with headquarters outside of the Prague region.
  • Money spent in Prague on new children's playgrounds in the last 5 years, per child.
  • Organizations in Prague funded by EU structural funds and their top 100 suppliers.

[Figure: separate web pages: Prague budget, Basic info about Prague, Prague public contracts, Demography of Prague, EU funded projects in Prague]

11 of 69

The World Wide Web

  • Typically, there are underlying databases
  • From which human-readable documents are generated

  • ...and then scraped by users who want to query the data

[Figure: HTML documents generated from Databases A, B, C and D; a web browser and a search engine access them over HTTP]

12 of 69

The World Wide Web

Various APIs provide machine-readable data for further processing in so-called mash-up applications.

Also built on several simple principles:

  • XML/JSON as formats for publishing data
  • HTTP protocol for transferring data between APIs and applications
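A minimal sketch of what a mash-up application has to do is shown below. The two JSON payloads, their field names, and the numbers are hypothetical stand-ins for responses of two proprietary APIs; in practice they would be fetched over HTTP.

```python
import json

# Hypothetical responses from two proprietary APIs describing the same city.
api_a_response = '{"city_id": 42, "city_name": "Prague", "budget_czk": 1000}'
api_b_response = '{"data": [{"town": "Prague", "inhabitants": 500}]}'


def budget_per_inhabitant(raw_a, raw_b):
    """Combine two API responses that share no identifiers and no schema."""
    a = json.loads(raw_a)
    b = json.loads(raw_b)
    # Each API has its own structure and key names, so the join is hand-written.
    # With no shared identifier, the only option is to match on the name string.
    population_by_name = {row["town"]: row["inhabitants"] for row in b["data"]}
    return a["budget_czk"] / population_by_name[a["city_name"]]


print(budget_per_inhabitant(api_a_response, api_b_response))
```

Every new API added to the mash-up needs another piece of custom glue code like this, which is the cost of proprietary formats and local identifiers.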

[Figure: mash-up apps accessing Databases A, B, C and D over HTTP through Proprietary Data APIs A, B, C and D]

13 of 69

Social network silos


14 of 69

Problems with data on the current Web

Web of Documents | Current Web IS NOT Web of Data!
-----------------|--------------------------------
URLs as unique global identifiers of documents | no unique global identifiers of things
HTML as a format for publishing documents | many formats for publishing data (XML, JSON, CSV, XLS, ...)
HTTP for locating and accessing documents by their URLs | HTTP for locating and accessing APIs (REST), but not for locating things and accessing their data
hyperlinks between documents | none of the current formats enables us to link related entities

Can we apply the principles of the Web to data?

15 of 69

Linked Data ~ the Web of Data

Principles, “best practices” for publishing and linking data about entities on the Web.

  • Application of the proven principles of Web of Documents to data
  • 2 main goals
    • Machine-readable and understandable data (based on the Semantic Web)
    • Providing context to data (via links to other data)


16 of 69

The principles of Linked Data

  1. Use URIs as names for things.
  2. Use HTTP URIs so that people can look up those names.
  3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).
  4. Include links to other URIs so that they can discover more things.


17 of 69

Web of Documents without the first two principles

Web pages without URLs

  • What web page are you talking about?
  • Where do I get the web page you are talking about?
  • How do I get the web page you are talking about?
  • ...

ID   | Name            | Stars
-----|-----------------|------
1235 | Joaquin Phoenix | Joker
1234 | Robert De Niro  | Joker
...

18 of 69

Things as first-class citizens

  • Project CZ.2.16/2.1.00/22189
  • Prague City
  • Prague Council
  • Prague Demography
  • Prague Budget
  • Contract DIL/23/07/007302/2010

19 of 69

1. + 2. Use HTTP URIs as names for things

Project CZ.2.16/2.1.00/22189

praha.eu (Prague)
  • http://praha.eu/contract/7302
  • http://praha.eu/council
  • http://praha.eu/city

20 of 69

1. + 2. Use HTTP URIs as names for things

Project CZ.2.16/2.1.00/22189

praha.eu (Prague)
  • http://praha.eu/contract/7302
  • http://praha.eu/council
  • http://praha.eu/city

mfcr.cz (Ministry of Finance)
  • http://mfcr.cz/prague/budget
  • http://mfcr.cz/prague

21 of 69

1. + 2. Use HTTP URIs as names for things

Project CZ.2.16/2.1.00/22189

praha.eu (Prague)
  • http://praha.eu/contract/7302
  • http://praha.eu/council
  • http://praha.eu/city

mfcr.cz (Ministry of Finance)
  • http://mfcr.cz/prague/budget
  • http://mfcr.cz/prague

risy.cz (Regional Information Service)
  • http://risy.cz/location/prague
  • http://risy.cz/contract/22189-01
  • http://risy.cz/project/22189

czso.cz (Czech Statistical Office)
  • http://registry.czso.cz/prague
  • http://czso.cz/prague
  • http://czso.cz/prague/demogstat

22 of 69

The principles of Linked Data

  • Use URIs as names for things.
  • Use HTTP URIs so that people can look up those names.
  • When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).
  • Include links to other URIs so that they can discover more things.

23 of 69

Web of Documents without the third principle

Web pages are in many formats, not only HTML

Thanks for the URI of your web page

  • In what language/format is your page?
  • Which software supports your language for pages?
  • How many browsers do you have?

… we all know this: how often do you click a link and a PDF, Word or Excel file opens instead?

24 of 69

RDF - Resource Description Framework - idea

RDF - graph based data model - a set of triples

Triple describes a relation as:

subject predicate object

2004 & 2014 W3C Recommendations

Triples are written in one of RDF notations / syntaxes:

RDF/XML, RDFa, N-Triples, Turtle, JSON-LD, N-Quads, TriG

Example: Jakub Klímek studied at Charles University.

  • subject: Jakub Klímek
  • predicate: studied at
  • object: Charles University
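The triple model can be sketched in a few lines of Python. This is only an illustration of the idea of a graph as a set of triples, using plain strings instead of URIs; the second triple and the helper function are added for the example and are not from the slides.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    object: str


# An RDF graph is a set of triples, so adding the same triple twice changes nothing.
graph = {
    Triple("Jakub Klímek", "studied at", "Charles University"),
    Triple("Charles University", "located in", "Prague"),  # illustrative extra triple
}
graph.add(Triple("Jakub Klímek", "studied at", "Charles University"))


def objects_of(graph, subject, predicate):
    """Return all objects connected to the subject by the given predicate."""
    return {t.object for t in graph if t.subject == subject and t.predicate == predicate}


print(len(graph))
print(objects_of(graph, "Jakub Klímek", "studied at"))
```

Because the object of one triple can be the subject of another, the set of triples forms a graph that can be traversed from entity to entity.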

25 of 69

RDF - Resource Description Framework - data


http://praha.eu/contract/7302
  • Name → Playground Revitalization
  • Contracting Authority → http://praha.eu/council
  • Estimated end date → 31.8.2011

26 of 69

RDF - Resource Description Framework - URLs


http://praha.eu/contract/7302
  • http://purl.org/dc/terms/title → Playground Revitalization
  • http://purl.org/procurement/public-contracts#contractingAuthority → http://praha.eu/council
  • http://purl.org/procurement/public-contracts#estimatedEndDate → 31.8.2011

27 of 69

RDF - Resource Description Framework - prefixes


http://praha.eu/contract/7302
  • dcterms:title → Playground Revitalization
  • pc:contractingAuthority → http://praha.eu/council
  • pc:estimatedEndDate → 31.8.2011
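Prefixes are only a shorthand for full URIs. The sketch below shows the expansion and its inverse for the two namespaces that appear in the full-URL version of this example; the function names are made up for this illustration.

```python
# Namespaces taken from the full predicate URLs shown in the previous example.
PREFIXES = {
    "dcterms": "http://purl.org/dc/terms/",
    "pc": "http://purl.org/procurement/public-contracts#",
}


def expand(name, prefixes=PREFIXES):
    """Turn a prefixed name such as 'dcterms:title' into a full URI."""
    prefix, separator, local_name = name.partition(":")
    if separator and prefix in prefixes:
        return prefixes[prefix] + local_name
    return name  # not a known prefix, e.g. already a full http:// URI


def shorten(uri, prefixes=PREFIXES):
    """Turn a full URI into a prefixed name if a matching namespace is known."""
    for prefix, namespace in prefixes.items():
        if uri.startswith(namespace):
            return prefix + ":" + uri[len(namespace):]
    return uri


print(expand("pc:contractingAuthority"))
print(shorten("http://purl.org/dc/terms/title"))
```

The prefixed and the full form identify exactly the same predicate; only the notation differs.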

28 of 69

RDF - Resource Description Framework - entities


Client

Playground Revitalization
  • Authority: Prague
  • Delivery date: 31.8.2011
  • Price: 28 444 000 CZK
  • ...

http://praha.eu/contract/7302
  • dcterms:title → Playground Revitalization
  • pc:contractingAuthority → http://praha.eu/council
  • pc:estimatedEndDate → 31.8.2011
  • pc:agreedPrice → http://praha.eu/contract/7302/price

http://praha.eu/contract/7302/price
  • gr:hasCurrency → CZK
  • gr:hasCurrencyValue → 28444000

29 of 69

RDF - serializations


Client

Playground Revitalization
  • Authority: Prague
  • Delivery date: 31.8.2011
  • Price: 28 444 000 CZK
  • ...

http://praha.eu/contract/7302

@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix pc: <http://purl.org/procurement/public-contracts#> .
@prefix gr: <http://purl.org/goodrelations/v1#> .

<http://praha.eu/contract/7302>
    dcterms:title "Playground Revitalization" ;
    pc:estimatedEndDate "31.8.2011" ;
    pc:agreedPrice <http://praha.eu/contract/7302/price> ;
    pc:contractingAuthority <http://praha.eu/council> .

<http://praha.eu/contract/7302/price>
    gr:hasCurrency "CZK" ;
    gr:hasCurrencyValue "28444000" .
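To make the idea of a serialization concrete, here is a minimal sketch that writes triples in N-Triples, the simplest of the listed syntaxes: one triple per line, full URIs in angle brackets. It is a simplification under stated assumptions: any value starting with http:// or https:// is treated as a URI, everything else becomes a plain string literal, and only backslashes and double quotes are escaped. A real application would use an RDF library instead.

```python
def term(value):
    """Render one term: URIs in angle brackets, anything else as a plain literal."""
    if value.startswith(("http://", "https://")):
        return "<" + value + ">"
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples, one triple per line."""
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples)


PC = "http://purl.org/procurement/public-contracts#"
contract = "http://praha.eu/contract/7302"

triples = [
    (contract, "http://purl.org/dc/terms/title", "Playground Revitalization"),
    (contract, PC + "contractingAuthority", "http://praha.eu/council"),
]

print(to_ntriples(triples))
```

The same triples could be written in Turtle, JSON-LD or RDF/XML; the data model stays the same and only the notation changes.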

30 of 69

Technical detour: HTTP Accept header and URIs

Web browser → HTTP (HTML) → esfcr.cz

http://esfcr.cz/.../projekt/CZ10421016300169

Applications → HTTP (RDF) → esfcr.cz

http://esfcr.cz/.../projekt/CZ10421016300169

<http://esfcr.cz/data/projekt/CZ10421016300169>
    esf:nazev "INNOSTART" ;
    esf:registracni_cislo "CZ.1.04/2.1.01/63.00169" ;
    esf:castka "4711681" ;
    esf:realizace_od "2011-06-01" ;
    esf:realizace_do "2013-03-31" ;
    esf:realizator <http://esfcr.cz/.../25438352> ;
    esf:partner <http://esfcr.cz/.../25438352> ;
    esf:kontaktni_osoba <http://esfcr.cz/.../8541274571> ;
    esf:region <http://esfcr.cz/.../ustecky> .
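The mechanism behind this detour is HTTP content negotiation: the client states the format it wants in the Accept header, and the server picks a representation of the same URI accordingly. The sketch below prepares two such requests without sending them and simulates a very simplified server-side decision. The example.org URI and both function names are hypothetical, and the decision logic ignores q-values and wildcards, which real content negotiation handles.

```python
import urllib.request

# Hypothetical URI; the project URI on the slide is elided and is not reproduced here.
THING_URI = "http://example.org/data/project/1"


def build_request(uri, media_type):
    """Prepare (but do not send) a GET request asking for one representation."""
    return urllib.request.Request(uri, headers={"Accept": media_type})


def choose_representation(accept_header):
    """Simplified server side: serve Turtle if it is requested, HTML otherwise."""
    requested = [part.split(";")[0].strip() for part in accept_header.split(",")]
    return "text/turtle" if "text/turtle" in requested else "text/html"


browser_request = build_request(THING_URI, "text/html,application/xhtml+xml")
app_request = build_request(THING_URI, "text/turtle")

print(choose_representation(browser_request.get_header("Accept")))
print(choose_representation(app_request.get_header("Accept")))
```

Both clients use the same URI; only the Accept header differs, so humans get a web page and applications get RDF.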

31 of 69

The principles of Linked Data

  • Use URIs as names for things.
  • Use HTTP URIs so that people can look up those names.
  • When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).
  • Include links to other URIs so that they can discover more things.

32 of 69

Web of Documents without the fourth principle

Web pages without links to other pages

  • Imagine a Wiki page with no links
  • Where do I get more information about the things mentioned in the article?
  • Is the thing mentioned in the article really the one I think it is?


33 of 69

4. Include links to other URIs (provide context)

praha.eu (Prague)
  • http://praha.eu/contract/7302
  • http://praha.eu/city
  • http://praha.eu/council

mfcr.cz (Ministry of Finance)
  • http://mfcr.cz/prague/budget
  • http://mfcr.cz/prague

risy.cz (Regional Information Service)
  • http://risy.cz/location/prague
  • http://risy.cz/contract/22189-01
  • http://risy.cz/project/22189

czso.cz (Czech Statistical Office)
  • http://registry.czso.cz/prague
  • http://czso.cz/prague
  • http://czso.cz/prague/demogstat

34 of 69

4. Include links to other URIs (provide context)

praha.eu (Prague)
  • http://praha.eu/contract/7302
  • http://praha.eu/city
  • http://praha.eu/council

mfcr.cz (Ministry of Finance)
  • http://mfcr.cz/prague/budget
  • http://mfcr.cz/prague

risy.cz (Regional Information Service)
  • http://risy.cz/location/prague
  • http://risy.cz/contract/22189-01
  • http://risy.cz/project/22189

czso.cz (Czech Statistical Office)
  • http://registry.czso.cz/prague
  • http://czso.cz/prague
  • http://czso.cz/prague/demogstat

Links between the entities: a:fundedBy, b:hasBudget, c:hasBeneficiary, d:hasDemography

35 of 69

4. Include links to other URIs (provide context)

praha.eu (Prague)
  • http://praha.eu/contract/7302
  • http://praha.eu/city
  • http://praha.eu/council

mfcr.cz (Ministry of Finance)
  • http://mfcr.cz/prague/budget
  • http://mfcr.cz/prague

risy.cz (Regional Information Service)
  • http://risy.cz/location/prague
  • http://risy.cz/contract/22189-01
  • http://risy.cz/project/22189

czso.cz (Czech Statistical Office)
  • http://registry.czso.cz/prague
  • http://czso.cz/prague
  • http://czso.cz/prague/demogstat

Links between the entities: a:fundedBy, b:hasBudget, c:hasBeneficiary, d:hasDemography, owl:sameAs (twice)
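The role of owl:sameAs can be sketched as follows: URIs connected by owl:sameAs denote the same thing, so statements made about any of them can be combined. Which URIs the diagram actually connects is not recoverable from the slide, so the links below are assumptions made for illustration, as are the function names. The grouping uses a small union-find structure.

```python
SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Assumed links for illustration; the slide does not spell out which URIs are connected.
triples = [
    ("http://praha.eu/city", SAME_AS, "http://mfcr.cz/prague"),
    ("http://mfcr.cz/prague", SAME_AS, "http://czso.cz/prague"),
    ("http://mfcr.cz/prague", "b:hasBudget", "http://mfcr.cz/prague/budget"),
    ("http://czso.cz/prague", "d:hasDemography", "http://czso.cz/prague/demogstat"),
]


def same_as_groups(triples):
    """Group URIs that are connected, directly or transitively, by owl:sameAs."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for s, p, o in triples:
        if p == SAME_AS:
            parent[find(s)] = find(o)

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())


def describe(uri, triples):
    """Collect statements about a URI and about everything declared the same as it."""
    aliases = next((g for g in same_as_groups(triples) if uri in g), {uri})
    return {(p, o) for s, p, o in triples if s in aliases and p != SAME_AS}


print(describe("http://praha.eu/city", triples))
```

Starting from the URI of one publisher, a consumer reaches the budget and the demography published by two others, which is the context the fourth principle is meant to provide.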

36 of 69

The Web of Data

  • http://praha.eu/contract/7302
  • http://praha.eu/city
  • http://praha.eu/council
  • http://mfcr.cz/prague/budget
  • http://mfcr.cz/prague
  • http://risy.cz/location/prague
  • http://risy.cz/contract/22189-01
  • http://risy.cz/project/22189
  • http://registry.czso.cz/prague
  • http://czso.cz/prague
  • http://czso.cz/prague/demogstat

Links between the entities: a:fundedBy, b:hasBudget, c:hasBeneficiary, d:hasDemography, owl:sameAs (twice)

37 of 69

Web of documents vs. Web of data (Linked Data)

Web of documents | Linked Data
-----------------|------------
HTML as document publication format | RDF as a data publication format
URL as a unique global document identifier | URL as a unique global entity identifier
HTTP protocol for accessing documents using their URL | HTTP protocol for accessing data about entities using their URL
Links to other documents | Links to other entities
| vocabularies – standards for common data representation

38 of 69

Linked Data = Technical interoperability solution

[Figure labels: HTTP API, FTP, SOAP/WSDL, TriG dump, IRI dereference, SPARQL, #LD]

39 of 69

Web of Data

If the data from those web sites were published as Linked Data, answering queries such as

  • Money spent in Prague on new children's playgrounds in the last 5 years, per child.

would take only one step:

  1. Write the query answering the question

(in the worst case, a “download data” step would come first if federated querying were not supported)
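A rough sketch of why one query suffices: once all publishers use triples and shared identifiers, their data can simply be put together and matched by pattern. The example below uses plain Python instead of SPARQL to stay self-contained. The ex: predicates, the second contract and the number of children are invented for illustration; only the contract URI and the price of 28444000 come from the earlier example.

```python
# Hypothetical triples, as if merged from two publishers.
triples = [
    ("http://praha.eu/contract/7302", "ex:subject", "playground"),
    ("http://praha.eu/contract/7302", "ex:priceCZK", 28444000),
    ("http://praha.eu/contract/7303", "ex:subject", "road"),
    ("http://praha.eu/contract/7303", "ex:priceCZK", 5000000),
    ("http://czso.cz/prague", "ex:children", 200000),
]


def find_triples(triples, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# Contracts about playgrounds, then their prices, then the number of children.
playground_contracts = {s for s, _, _ in find_triples(triples, p="ex:subject", o="playground")}
spent = sum(o for s, _, o in find_triples(triples, p="ex:priceCZK") if s in playground_contracts)
children = find_triples(triples, s="http://czso.cz/prague", p="ex:children")[0][2]

print(spent / children)  # CZK spent on playgrounds per child
```

In practice the same pattern matching would be expressed as a single SPARQL query, and a date filter would restrict the contracts to the last 5 years.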

[Figure: separate web pages: Prague budget, Basic info about Prague, Prague public contracts, Demography of Prague, EU funded projects in Prague]

40 of 69

The Web of Data - will it be successful?


41 of 69

The LOD cloud - 2007 - 12 datasets


42 of 69

The LOD cloud - 2008 - 32 datasets


43 of 69

The LOD cloud - 2009 - 89 datasets


44 of 69

The LOD cloud - 2010 - 203 datasets


45 of 69

The LOD cloud - 2011 - 295 datasets


46 of 69

The LOD cloud - 2014 - 570 datasets


47 of 69

The LOD cloud - 2017 - 1146 datasets


48 of 69

The LOD cloud - 2018 - 1229 datasets


49 of 69

The LOD cloud - 2019 - 1239 datasets


50 of 69

The LOD cloud - 2023


51 of 69

Vocabularies in the Web of Data

Web of documents | Linked Data
-----------------|------------
HTML as document publication format | RDF as a data publication format
URL as a unique global document identifier | URL as a unique global entity identifier
HTTP protocol for accessing documents using their URL | HTTP protocol for accessing data about entities using their URL
Links to other documents | Links to other entities
| vocabularies – standards for common data representation

52 of 69

Vocabularies in the Web of Data

  • Documents specifying the meaning of predicates

  • http://purl.org/dc/terms/title
    • Name of a document
  • http://purl.org/procurement/public-contracts#contractingAuthority
    • Contracting Authority
  • http://purl.org/procurement/public-contracts#estimatedEndDate
    • Estimated end date

http://praha.eu/contract/7302
  • http://purl.org/dc/terms/title → Playground Revitalization
  • http://purl.org/procurement/public-contracts#contractingAuthority → http://praha.eu/council
  • http://purl.org/procurement/public-contracts#estimatedEndDate → 31.8.2011

53 of 69

Vocabularies in the Web of Data

  • Documents specifying the meaning of predicates
  • Documents specifying the meaning of classes
  • rdf:type
    • predicate saying an entity (subject) is of a type (object)
    • = entity belongs to a class (set)
  • pc:Contract
    • Class (set) of all public contracts

http://praha.eu/contract/7302 rdf:type pc:Contract

(the object, pc:Contract, is a class)
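Class membership via rdf:type can be sketched with plain tuples. The full URI of pc:Contract is obtained by expanding the pc: prefix used earlier in the slides; the title triple for the council and the two helper functions are hypothetical additions for the example.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCTERMS_TITLE = "http://purl.org/dc/terms/title"
PC_CONTRACT = "http://purl.org/procurement/public-contracts#Contract"

triples = [
    ("http://praha.eu/contract/7302", RDF_TYPE, PC_CONTRACT),
    ("http://praha.eu/contract/7302", DCTERMS_TITLE, "Playground Revitalization"),
    ("http://praha.eu/council", DCTERMS_TITLE, "Prague Council"),  # hypothetical triple
]


def instances_of(triples, cls):
    """All entities (subjects) declared to belong to the given class."""
    return {s for s, p, o in triples if p == RDF_TYPE and o == cls}


def types_of(triples, entity):
    """All classes the given entity is declared to belong to."""
    return {o for s, p, o in triples if s == entity and p == RDF_TYPE}


print(instances_of(triples, PC_CONTRACT))
print(types_of(triples, "http://praha.eu/council"))
```

Because the class itself is identified by a URI defined in a shared vocabulary, contracts from different publishers can be found with the same lookup.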

54 of 69

Linked Open Vocabularies

Catalog of vocabularies used on the Web of Data

Basic rule - vocabulary reuse

  • Schema.org
  • Dublin Core Vocabulary
  • Data Cube Vocabulary
  • Simple Knowledge Organization System (SKOS)


https://lov.linkeddata.es/dataset/lov/

55 of 69

LD is a prerequisite. Data Quality is what is missing


Linked Data + Core Vocabularies

Data Quality

56 of 69

How to start with Linked Data

  1. Analysis of your data
    • What data do we have?
    • What can we publish?
    • What do we want to publish?
    • How is the data interconnected inside?
    • How is the data connected to other data?
  2. Structured domain description
    • Description of structure and semantics of the data
    • e.g. UML class diagrams
    • and detailed documentation


57 of 69

How to start with Linked Data

  • Ontology (vocabulary) design
    • Which existing vocabularies/ontologies cover our domain model?
    • Design vocabularies/ontologies for the parts not covered
    • Mapping of new concepts to existing ones
  • IRI patterns
    • Which domain?
    • Division of namespaces
    • Patterns in paths
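The IRI pattern decisions can be captured in a small helper, sketched below. The domain data.example.org, the split into resource/ and vocabulary/ namespaces, and the path patterns are all hypothetical choices made for this illustration, not a recommendation from the slides.

```python
from urllib.parse import quote

BASE = "https://data.example.org/"  # hypothetical domain

# Hypothetical patterns: one namespace for data, one for the vocabulary.
IRI_PATTERNS = {
    "contract": BASE + "resource/contract/{id}",
    "price": BASE + "resource/contract/{id}/price",
    "vocabulary": BASE + "vocabulary/{id}",
}


def mint_iri(kind, local_id):
    """Build an IRI for an entity of the given kind from its local identifier."""
    # Percent-encode the local ID so characters such as "/" stay inside one path segment.
    return IRI_PATTERNS[kind].format(id=quote(str(local_id), safe=""))


print(mint_iri("contract", 7302))
print(mint_iri("contract", "DIL/23/07/007302/2010"))
```

Keeping the patterns in one place makes it easier to stay consistent across export scripts and to document the scheme for data consumers.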


58 of 69

How to start with Linked Data

  • Data export
    • Scripts for transformation of data to the target format using vocabularies
    • Linking of data to existing data in the LOD cloud
  • Data publishing
  • Registration in catalogs
  • Applications using the data
    • Can be done by someone else
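A data export script can be as simple as the following sketch, which turns rows of the table from the beginning of the deck into triples. The CSV column names, the data.example.org base IRI and the starsIn predicate are hypothetical; https://schema.org/name is reused from Schema.org, in line with the vocabulary reuse rule.

```python
import csv
import io

# The table from the start of the deck, as CSV (column names are assumptions).
CSV_DATA = """id,name,stars_in
1235,Joaquin Phoenix,Joker
1234,Robert De Niro,Joker
"""

BASE = "https://data.example.org/resource/"                 # hypothetical IRI base
NAME = "https://schema.org/name"                            # reused vocabulary term
STARS_IN = "https://data.example.org/vocabulary/starsIn"    # hypothetical predicate


def rows_to_triples(csv_text):
    """Transform each CSV row into triples about one person."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        person = BASE + "person/" + row["id"]  # local ID becomes a global identifier
        triples.append((person, NAME, row["name"]))
        # The movie is kept as a plain title here; linking it to a specific
        # movie entity would need an identifier that the table does not contain.
        triples.append((person, STARS_IN, row["stars_in"]))
    return triples


for triple in rows_to_triples(CSV_DATA):
    print(triple)
```

The resulting triples can then be serialized in any RDF syntax and linked to existing datasets.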


59 of 69

How to publish Linked Data


  • SPARQL Endpoint
    • Web service
    • Endpoint to a database
    • Allows querying the dataset
    • https://data.gov.cz/sparql
  • Something in between
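Querying a SPARQL endpoint is an ordinary HTTP request: under the SPARQL 1.1 Protocol, the query text can be sent as the query parameter of a GET request. The sketch below only builds such a request for the endpoint named above and does not send it; the query itself is a generic example and the function name is made up for this illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "https://data.gov.cz/sparql"  # endpoint named on the slide

# Generic example query: any ten triples from the dataset.
QUERY = """
SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 10
"""


def build_sparql_request(endpoint, query):
    """Prepare a SPARQL query as an HTTP GET request asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(ENDPOINT, QUERY)
print(request.full_url)

# Sending it requires network access, e.g.:
#   import urllib.request
#   with urllib.request.urlopen(request) as response:
#       print(response.read().decode("utf-8"))
```

Because the endpoint is a plain web service, any HTTP client can query it without a vendor-specific database driver.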

60 of 69

Triplestore implementations

Enterprise grade

Open-Source


61 of 69

Data transformation tools

  • Open-Source
    • LinkedPipes ETL


62 of 69

Linked Enterprise Data, Knowledge Graphs

  • Web of Documents inside an enterprise ~ Intranet
  • Web of Data inside an enterprise ~ Intranet of Data
  • The same LOD principles applied in an enterprise
  • Publishers ~ systems producing data
  • Consumers ~ systems consuming data
  • In addition
    • Ability to integrate data from the LOD cloud


63 of 69

Application of LD in industry = Knowledge Graphs


64 of 69

Application of LD in industry = Knowledge Graphs


65 of 69

Application of LD in industry = Knowledge Graphs


66 of 69

What next? Re-decentralization of Web


67 of 69

Solid - Social Linked Data

  • current project of Tim Berners-Lee
  • social linked data
  • set of conventions and tools for building decentralized social applications based on Linked Data principles
  • Users have freedom to choose
    • where their data resides and
    • who is allowed to access it
  • Inrupt
    • Company created by Tim Berners-Lee to bring Solid to production
    • Enterprise Solid Server


68 of 69

Open Data


69 of 69

Open data - 5 star classification

Legal and technical maturity of data
