2 of 30

RDF is…

…to the Web of Data what HTML is to the Web of Documents
…(relatively) simple: everything is just triples
…a data model: it is not a file format!
…a logical formalism: it has a formal semantics
…more than XML or JSON: XML and JSON have a tree-based model, RDF has a graph-based model
…a Web standard: a W3C recommendation¹

¹https://www.w3.org/TR/rdf11-concepts/

3 of 30

A lingua franca for the Web of Data

Some data models are more suitable for some applications
However, having a common model for data exchange:

reduces entropy (less surprise, less energy used in processing information)
ensures interoperability

If everyone exchanges data in a uniform model, energy can be put into:

Integrating data from different sources (merge operation, term disambiguation)
Writing processors
Validating processors
Running processors

Internally, an application may map RDF to a local data model, and vice-versa

4 of 30

RDF basics

The core features of RDF are:

Identify things (resources)
Express relations between things (properties)

Additional important features are:

Assign data values to things (literals)
Organise things in categories (i.e., classes or types)
Add simple knowledge about categories and relations

There are other, less important features (discussed later)

5 of 30

Identify things

RDF is used to describe resources
A resource may be anything (a real or imaginary entity, abstract or concrete)
To describe a resource, it must be named or identified
On the Web, the identification mechanism must be uniform at Web scale: an identifier must identify the same thing everywhere on the Web
RDF uses Internationalized Resource Identifiers or IRIs (RFC 3987)

6 of 30

Internationalized Resource Identifiers

IRIs generalise URIs (Uniform Resource Identifiers, RFC 3986) by allowing non-ASCII Unicode characters
IRIs and URIs identify things but may be used as locators (i.e., as URLs) at the same time
Examples:

urn:ietf:rfc:3987
svn://yadiyada.foo.bar/
mailto:antoine.zimmermann@emse.fr
ftp://ftp.liris.fr/#meta
http://en.wikipedia.org/wiki/User:Wikiuser100
https://w3id.org/people/az/me
http://dbpedia.org/resource/Saint-Étienne

Reminder: to shorten notation, we use CURIEs; e.g., the prefix rdf: stands for http://www.w3.org/1999/02/22-rdf-syntax-ns#, so rdf:type is short for http://www.w3.org/1999/02/22-rdf-syntax-ns#type
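RFC 3987 also defines how an IRI maps to a URI: non-ASCII characters are encoded in UTF-8 and percent-escaped. A minimal Python sketch of that mapping, using the standard library's `urllib.parse.quote`; the function name `iri_to_uri` is ours, and the sketch skips details such as internationalised domain names.

```python
from urllib.parse import quote

# Reserved URI characters (RFC 3986) plus '%', so that the IRI structure and
# any existing percent-escapes are left untouched.
_KEEP = ":/?#[]@!$&'()*+,;=%"

def iri_to_uri(iri: str) -> str:
    """Percent-encode non-ASCII characters (as UTF-8 bytes) and any other
    character that is not allowed in a URI. Simplified sketch: no IDNA
    conversion of host names, and the IRI is assumed to be well formed."""
    return quote(iri, safe=_KEEP)

print(iri_to_uri("http://dbpedia.org/resource/Saint-Étienne"))
# http://dbpedia.org/resource/Saint-%C3%89tienne
print(iri_to_uri("mailto:antoine.zimmermann@emse.fr"))
# mailto:antoine.zimmermann@emse.fr
```

ASCII-only IRIs, like most of the examples above, are already URIs and come out unchanged.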

7 of 30

How to choose an IRI for something?

If possible, reuse an existing IRI from an authoritative source, e.g.:

from a national library for books (Library of Congress, BNF, BNL, DNB)
from a government website for a ministry
from a well known knowledge base like DBpedia or Wikidata

If not, make your own IRI:

use HTTP IRIs
use a namespace under your control
Cool URIs don’t change¹

Refer to the guide on Cool URIs for the Semantic Web²

¹W3C style guide for hypertext: https://www.w3.org/Provider/Style/URI

²W3C Interest group note: https://www.w3.org/TR/cooluris/

8 of 30

Relate things

RDF can express binary relations between things, such as “Laura loves Helmut”, “Steven works for Google Inc.”, “EMSE is located in Saint-Étienne”
This is written as a triple:

(subject, predicate, object)

where subject and object identify the resources in the relationship, and predicate identifies the relation
The predicate in an RDF triple is always an IRI

Example:

(http://example.org/data/Laura,      ← subject
 http://social.relations.com/loves,  ← predicate
 http://example.org/data/Helmut)     ← object
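A triple can be modelled directly as a 3-tuple. The Python sketch below is one possible encoding, assuming hypothetical wrapper types `IRI`, `BlankNode` and `Literal`; `make_triple` rejects a predicate that is not an IRI and, as in RDF 1.1, a literal in subject position.

```python
from typing import NamedTuple, Union

class IRI(str):
    """An IRI, kept as a plain string (hypothetical wrapper type)."""

class BlankNode(str):
    """A blank node, kept as a local label (hypothetical wrapper type)."""

class Literal(NamedTuple):
    """A literal reduced to lexical form + datatype IRI for this sketch."""
    lexical: str
    datatype: IRI

Term = Union[IRI, BlankNode, Literal]

class Triple(NamedTuple):
    subject: Term
    predicate: IRI
    object: Term

def make_triple(subject: Term, predicate: Term, obj: Term) -> Triple:
    """Build a triple, enforcing the position constraints of RDF 1.1."""
    if not isinstance(predicate, IRI):
        raise TypeError("the predicate of an RDF triple must be an IRI")
    if isinstance(subject, Literal):
        raise TypeError("a literal cannot be the subject of an RDF triple")
    return Triple(subject, predicate, obj)

laura_loves_helmut = make_triple(
    IRI("http://example.org/data/Laura"),
    IRI("http://social.relations.com/loves"),
    IRI("http://example.org/data/Helmut"),
)
print(laura_loves_helmut.predicate)  # http://social.relations.com/loves
```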

9 of 30

Data values

Like everything else, a data value (number, string, date) is a resource
A specific data value can be identified with a literal, a character string that represents the value
Every literal is typed such that its string representation can be interpreted as the correct value
E.g., "42" represents the number forty-two if its datatype is decimal integer, but sixty-six if its datatype is hexadecimal integer
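The role of the datatype can be made concrete as a lookup from datatype IRI to a function turning the lexical form into a value. In this Python sketch `xsd:integer` is a real XML Schema datatype, but `http://example.org/datatypes#hexInteger` is a hypothetical IRI invented for the hexadecimal case, since XML Schema defines no hexadecimal integer datatype.

```python
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"
# Hypothetical datatype IRI, made up for this illustration only.
EX_HEX_INTEGER = "http://example.org/datatypes#hexInteger"

# Each datatype IRI maps a lexical form (a string) to the value it denotes.
LEXICAL_TO_VALUE = {
    XSD_INTEGER: lambda lexical: int(lexical, 10),
    EX_HEX_INTEGER: lambda lexical: int(lexical, 16),
}

def value_of(lexical: str, datatype: str) -> int:
    """Interpret a literal's lexical form according to its datatype IRI."""
    return LEXICAL_TO_VALUE[datatype](lexical)

print(value_of("42", XSD_INTEGER))     # 42
print(value_of("42", EX_HEX_INTEGER))  # 66
```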

10 of 30

RDF literals

An RDF literal has 2 or 3 components which are:

A lexical form, which is a Unicode string
A datatype IRI that can be any IRI
When the datatype IRI is rdf:langString, there is a language tag which is a BCP 47 tag

Usually, we use standard datatype IRIs from the xsd: namespace (XML Schema Datatypes) and the rdf: namespace
We will write literals "lexical form"^^datatypeIRI and when it is an rdf:langString, "lexical form"@langTag
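The two or three components listed above can be sketched as a small Python value type that checks the rule "a language tag is present exactly when the datatype is rdf:langString" and prints literals in the notation used here. The class is hypothetical; it does not escape quotes inside the lexical form and compares language tags case-sensitively, which is a simplification.

```python
from dataclasses import dataclass
from typing import Optional

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XSD = "http://www.w3.org/2001/XMLSchema#"

@dataclass(frozen=True)
class Literal:
    lexical: str                # the lexical form
    datatype: str               # the datatype IRI
    lang: Optional[str] = None  # language tag, only for rdf:langString

    def __post_init__(self):
        is_lang_string = self.datatype == RDF + "langString"
        if is_lang_string and not self.lang:
            raise ValueError("rdf:langString literals need a language tag")
        if not is_lang_string and self.lang is not None:
            raise ValueError("only rdf:langString literals carry a language tag")

    def __str__(self):
        if self.lang is not None:
            return f'"{self.lexical}"@{self.lang}'
        return f'"{self.lexical}"^^<{self.datatype}>'

print(Literal("42", XSD + "integer"))             # "42"^^<http://www.w3.org/2001/XMLSchema#integer>
print(Literal("chat", RDF + "langString", "fr"))  # "chat"@fr
```

Because the language tag is part of the literal, "chat"@fr and "chat"@en are two different literals.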

Examples:

"42"^^xsd:integer
"THX 1138"^^xsd:string
"chat"@fr, "chat"@en
"<p>The <em>beautiful</em> literal!</p>"^^rdf:HTML

11 of 30

Unidentified resources

RDF can describe entities that are known to exist but whose identity is unknown (or is irrelevant/unimportant)
E.g., a book has at least one author, but the author may not be known
The existence of a thing can be indicated in the subject or object position of a triple with a blank node
E.g. “something is in my bag”

There may be multiple things in my bag, but the blank node does not identify any of them. It just indicates the existence of something in it.

Figure: an unlabelled blank node with an :inside edge to :myBag

12 of 30

Typical use of blank nodes: n-ary relations

RDF can only represent binary relations, in the form of RDF triples.
However, any n-ary relation can be encoded as a set of binary relations

Figure: :St-Étienne has two :population links, each to its own blank node; one blank node has :value 174 082 and :year 2020, the other has :value 171 057 and :year 2015
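The ternary facts behind the figure (city, year, population value) can be turned into binary triples by introducing one blank node per census result. A minimal Python sketch, assuming terms are plain strings in CURIE form and that labels starting with `_:` denote blank nodes; `encode_population` and `new_blank_node` are hypothetical helpers.

```python
from itertools import count

_fresh = count(1)

def new_blank_node() -> str:
    """Return a fresh blank node label; '_:' marks blank nodes in this sketch."""
    return f"_:b{next(_fresh)}"

def encode_population(city, records):
    """Encode ternary facts (city, year, value) as binary triples.

    Each record gets its own blank node standing for one census result,
    linked to the city by :population and carrying :year and :value.
    """
    triples = []
    for year, value in records:
        node = new_blank_node()
        triples += [(city, ":population", node),
                    (node, ":year", year),
                    (node, ":value", value)]
    return triples

for triple in encode_population(":St-Étienne", [(2020, 174082), (2015, 171057)]):
    print(triple)
```

Using one blank node per record keeps each year paired with its own value; attaching :year and :value directly to :St-Étienne would lose that pairing.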

13 of 30

RDF Serialisation (concrete syntaxes)

RDF is a data model with an abstract syntax; to be stored or exchanged, it must be encoded in a concrete file format
There are several formats for RDF:

N-Triples: one line = one triple (easy to parse)
Turtle: compact and readable notation
RDF/XML: XML-based format
JSON-LD: JSON-based format
RDFa: encoding RDF inside Web pages

In this tutorial, we will focus on the Turtle syntax
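N-Triples is the simplest of these formats to produce. Assuming terms are plain Python values (absolute IRIs as strings, blank nodes as `_:label` strings, typed literals as `(lexical, datatype)` tuples), an N-Triples writer can be sketched as below; the helper names are hypothetical and language-tagged literals are not handled.

```python
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"

def term_to_nt(term) -> str:
    """Serialise one term in N-Triples syntax."""
    if isinstance(term, tuple):
        lexical, datatype = term
        # Escape the characters that cannot appear raw inside "...".
        escaped = (lexical.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
        if datatype == XSD_STRING:
            return f'"{escaped}"'  # xsd:string may be left implicit
        return f'"{escaped}"^^<{datatype}>'
    if term.startswith("_:"):
        return term  # blank node label
    return f"<{term}>"  # absolute IRI

def to_ntriples(triples) -> str:
    """One triple per line, each terminated by ' .'."""
    return "".join(" ".join(map(term_to_nt, t)) + " .\n" for t in triples)

print(to_ntriples([
    ("http://example.org/data/Laura",
     "http://social.relations.com/loves",
     "http://example.org/data/Helmut"),
]), end="")
```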

14 of 30

The Turtle syntax (1)

Full IRIs:

<http://www.example.com/test#this>

A simple triple:

<http://www.example.com/test#this>
    <http://relations.example.com/in>
    <http://www.example.com/test#box> .

Abbreviated IRIs (declare prefixes at the beginning of the file):

# This is a comment
@prefix ex: <http://www.example.com/test#> . # end dot!
PREFIX rel: <http://relations.example.com/> # alternative notation (no dot!)
# spaces, newlines, tabs separate elements of a triple:
ex:this rel:in ex:box . # dot ends statement
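Prefix expansion is plain string concatenation: the namespace IRI followed by the local name. The regex-based Python sketch below (hypothetical helpers `parse_prefixes` and `expand_curie`) reads both declaration forms shown above; it is not a Turtle parser and would be fooled by declaration-like text inside comments or string literals.

```python
import re

# Matches '@prefix p: <iri> .' and 'PREFIX p: <iri>'. IGNORECASE is a
# simplification: strictly, only the SPARQL-style PREFIX is case-insensitive.
_DECL = re.compile(r"(?:@prefix|PREFIX)\s+([A-Za-z][\w.-]*)?:\s*<([^>]*)>",
                   re.IGNORECASE)

def parse_prefixes(turtle: str) -> dict:
    """Map each declared prefix name ('' for the empty prefix) to its IRI."""
    return {name: iri for name, iri in _DECL.findall(turtle)}

def expand_curie(curie: str, prefixes: dict) -> str:
    """Replace 'prefix:local' by the namespace IRI followed by 'local'."""
    prefix, _, local = curie.partition(":")
    if prefix not in prefixes:
        raise KeyError(f"undeclared prefix {prefix!r}")
    return prefixes[prefix] + local

doc = """# This is a comment
@prefix ex: <http://www.example.com/test#> . # end dot!
PREFIX rel: <http://relations.example.com/> # alternative notation (no dot!)
ex:this rel:in ex:box . # dot ends statement"""

prefixes = parse_prefixes(doc)
print(expand_curie("ex:this", prefixes))  # http://www.example.com/test#this
print(expand_curie("rel:in", prefixes))   # http://relations.example.com/in
```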

15 of 30

The Turtle syntax (2)

Literals:

ex:this rel:date "2019-09-13"^^xsd:date . # typed literal
ex:this rel:name "this"@en . # language-tagged literal
ex:this rel:code "TX32" . # xsd:string can be omitted
ex:this rel:number 42 . # xsd:integer (no quotes)
ex:this rel:sizeInMeters 3.75 . # xsd:decimal (use a dot)
ex:this rel:isGood true . # xsd:boolean
ex:this rel:isBoring false . # xsd:boolean
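The unquoted forms above follow fixed token shapes in the Turtle grammar. A hedged Python sketch of how a parser could pick the implied XSD datatype; the patterns paraphrase the Turtle INTEGER, DECIMAL and DOUBLE productions (the exponent form, for xsd:double, is not shown on this slide) plus the two boolean keywords, and the helper name is hypothetical.

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"

# Token shapes, tried in order; each must match the whole token.
_SHORTHANDS = [
    (re.compile(r"[+-]?[0-9]+"), "integer"),
    (re.compile(r"[+-]?[0-9]*\.[0-9]+"), "decimal"),
    (re.compile(r"[+-]?(?:[0-9]+\.[0-9]*|\.[0-9]+|[0-9]+)[eE][+-]?[0-9]+"), "double"),
    (re.compile(r"true|false"), "boolean"),
]

def shorthand_datatype(token: str) -> str:
    """Return the datatype IRI implied by an unquoted Turtle literal."""
    for pattern, name in _SHORTHANDS:
        if pattern.fullmatch(token):
            return XSD + name
    raise ValueError(f"not an abbreviated literal: {token!r}")

for token in ["42", "3.75", "true", "1.5e3"]:
    print(token, shorthand_datatype(token))
```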

Blanknodes:

[] rel:in ex:box .
_:b1 rel:in ex:box . # a blank node identifier…
ex:me rel:likes _:b1 . # ...allows reusing the same blank node

16 of 30

The Turtle syntax (3)

Repeat the same subject and predicate:

ex:box rel:contains ex:this .
ex:box rel:contains ex:that .
# can be written
ex:box rel:contains ex:this, ex:that . # comma

Repeat subject:

ex:this rel:date "2019-09-13"^^xsd:date;
    rel:name "this"@en; # new lines are optional
    rel:code "TX32";
    rel:nextTo ex:that, ex:thoot, ex:thus .
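The `;` and `,` abbreviations are purely syntactic: each (predicate, objects) group expands into one triple per object, all sharing the subject. A Python sketch with terms kept as the Turtle strings above; the helper name is hypothetical.

```python
def expand_predicate_object_list(subject, pairs):
    """Turn subject + [(predicate, [objects...]), ...] into plain triples.

    In Turtle, ';' starts a new (predicate, objects) pair for the same
    subject and ',' adds another object for the same predicate.
    """
    return [(subject, predicate, obj)
            for predicate, objects in pairs
            for obj in objects]

triples = expand_predicate_object_list("ex:this", [
    ("rel:date", ['"2019-09-13"^^xsd:date']),
    ("rel:name", ['"this"@en']),
    ("rel:code", ['"TX32"']),
    ("rel:nextTo", ["ex:that", "ex:thoot", "ex:thus"]),
])
print(len(triples))  # 6
```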

17 of 30

The Turtle syntax (4)

More on blank nodes:

# assume prefixes are declared
ex:johnDoe rel:worksFor [
    a ex:University; # the IRI rdf:type can be replaced by 'a'
    rel:name "Berkeley";
    rel:locatedIn ex:California
] .

is the same as:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
# other prefixes must be declared
ex:johnDoe rel:worksFor _:bnode .
_:bnode rdf:type ex:University . # 'a' and 'rdf:type' represent the same IRI
_:bnode rel:name "Berkeley" .
_:bnode rel:locatedIn ex:California .
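A `[ ... ]` block is shorthand for a freshly generated blank node. A Python sketch, assuming string terms and generated labels of the hypothetical form `_:genidN`; it also maps the keyword `a` to the full rdf:type IRI.

```python
from itertools import count

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
_fresh = count()

def expand_brackets(pairs):
    """Expand '[ p1 o1; p2 o2; ... ]': mint a fresh blank node and return it
    with one triple per (predicate, object) pair. 'a' stands for rdf:type."""
    node = f"_:genid{next(_fresh)}"
    triples = [(node, RDF_TYPE if p == "a" else p, o) for p, o in pairs]
    return node, triples

node, inner = expand_brackets([("a", "ex:University"),
                               ("rel:name", '"Berkeley"'),
                               ("rel:locatedIn", "ex:California")])
# The bracket itself stands in object position of the outer triple.
triples = [("ex:johnDoe", "rel:worksFor", node)] + inner
for t in triples:
    print(t)
```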

18 of 30

The Turtle syntax (5)

Declaring a base IRI:

@base <http://example.com/base/> . # ends with dot
BASE <http://example.com/base/> # alternative syntax (no dot!)
# prefixes must be declared
<bob> a vocab:Person; # relative IRI
    rel:knows <claire> .
BASE <http://example.org/base2#> # base can be redefined
<bob> rel:knows <http://example.com/base/bob> . # different bobs

is the same as:

<http://example.com/base/bob> a vocab:Person;
    rel:knows <http://example.com/base/claire> .
<http://example.org/bob> # the base's fragment and last path segment are dropped
    rel:knows <http://example.com/base/bob> .

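Turtle resolves relative IRIs against the base following RFC 3986, which Python's `urllib.parse.urljoin` closely implements. The sketch shows that a base ending in `/` keeps its whole path, while for a base such as `http://example.org/base2#` the fragment and the last path segment are discarded, so `<bob>` and `<#bob>` resolve to different IRIs.

```python
from urllib.parse import urljoin

base = "http://example.com/base/"
print(urljoin(base, "bob"))     # http://example.com/base/bob
print(urljoin(base, "claire"))  # http://example.com/base/claire

base2 = "http://example.org/base2#"
print(urljoin(base2, "bob"))    # http://example.org/bob
print(urljoin(base2, "#bob"))   # http://example.org/base2#bob
```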
19 of 30

RDF Vocabularies

An RDF vocabulary is a set of IRIs meant to be used in a certain way
The IRIs are usually declared to be of certain types (e.g., Property or Class)
The intended meaning and use is usually documented in a readable specification
E.g. {:part-of, :capital, :population, :value} is an RDF vocabulary where the terms are intended to be used in predicate positions and, e.g., :population should relate a place to a node denoting a population census, etc.

20 of 30

The RDFS vocabulary (1)

The RDF standard defines a generic vocabulary (the RDFS vocabulary) with, notably:

rdfs:label is a property that relates a thing to a short natural language name

:St-Étienne rdfs:label "Saint-Étienne"@fr .

rdfs:comment is a property that relates a thing to some text about the thing

:St-Étienne rdfs:comment "A city in France."@en .

rdf:type is a property that relates a thing to its type or class

:St-Étienne rdf:type :City . # equivalent to :St-Étienne a :City

21 of 30

The RDFS vocabulary (2)

The RDFS vocabulary can also be used to declare vocabularies:

rdf:Property is the class of properties (properties are the terms usually used in predicate position)

:part-of a rdf:Property . # :part-of rdf:type rdf:Property

rdfs:Class is the class of classes (classes are usually used in object position of triples having rdf:type as predicate)

:City a rdfs:Class .
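Once a vocabulary is declared with these terms, its classes and properties can be found by looking for rdf:type triples. A Python sketch over a set of tuples (CURIE strings for the local terms, full IRIs for the rdf: and rdfs: terms, a `(text, language)` tuple for the label); `instances_of` is a hypothetical helper that reads only explicitly stated types and performs no RDFS inference.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

graph = {
    (":part-of", RDF + "type", RDF + "Property"),
    (":City", RDF + "type", RDFS + "Class"),
    (":St-Étienne", RDF + "type", ":City"),
    (":St-Étienne", RDFS + "label", ("Saint-Étienne", "fr")),
}

def instances_of(graph, cls):
    """Subjects s such that (s, rdf:type, cls) is stated in the graph."""
    return {s for s, p, o in graph if p == RDF + "type" and o == cls}

print(instances_of(graph, RDFS + "Class"))   # {':City'}
print(instances_of(graph, RDF + "Property")) # {':part-of'}
```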

22 of 30

Working with multiple RDF graphs

Various operations can be done on multiple graphs:

Check (syntactic) equivalence
Check subgraph relation
Merge 2 graphs (or more) into 1
Store multiple graphs in one file or data structure

We will also see more operations on graphs in the next lectures

23 of 30

Isomorphism As Equivalence Relation

We employ isomorphism to check whether two RDF graphs are structurally equivalent
Are these two graphs the same?

@prefix : <http://example.org/doc.ttl#> .
:St-Étienne :capital :Loire .
:St-Étienne :label "Saint-Étienne"@fr .

@prefix : <http://example.org/doc.ttl#> .
:St-Étienne :label "Saint-Étienne"@fr .
:St-Étienne :capital :Loire .

Figure: the two documents drawn as graphs; in each, :St-Étienne is linked to :Loire by :capital and to "Saint-Étienne" by :label
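When neither graph contains blank nodes, an isomorphism must map every IRI and literal to itself, so checking equivalence reduces to comparing the two sets of triples. A Python sketch with a toy line reader (`read_triples`, hypothetical and not a Turtle parser) that assumes both documents declare the same prefix, so comparing the abbreviated terms is enough.

```python
def read_triples(text):
    """Read lines 'subject predicate object .' whose terms contain no spaces.
    Prefix declarations are skipped; a toy reader for this sketch only."""
    triples = set()
    for line in text.strip().splitlines():
        line = line.strip()
        if not line or line.startswith("@prefix"):
            continue
        subject, predicate, obj = line.rstrip(" .").split()
        triples.add((subject, predicate, obj))
    return triples

doc1 = """@prefix : <http://example.org/doc.ttl#> .
:St-Étienne :capital :Loire .
:St-Étienne :label "Saint-Étienne"@fr ."""

doc2 = """@prefix : <http://example.org/doc.ttl#> .
:St-Étienne :label "Saint-Étienne"@fr .
:St-Étienne :capital :Loire ."""

# No blank nodes: the graphs are isomorphic exactly when the sets are equal,
# whatever the order of the lines in the files.
print(read_triples(doc1) == read_triples(doc2))  # True
```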

24 of 30

Isomorphism As Equivalence Relation

We employ isomorphism to check whether two RDF graphs are structurally equivalent
Are these two graphs the same?

@prefix : <http://example.org/doc.ttl#> .
:St-Étienne :population _:b1 .
_:b1 :value 174082 .
_:b1 :year 2020 .

@prefix : <http://example.org/doc.ttl#> .
:St-Étienne :population _:node1 .
_:node1 :year 2020 .
_:node1 :value 174082 .

Figure: the two documents drawn as graphs; in each, :St-Étienne is linked by :population to a blank node that has :value 174 082 and :year 2020
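With blank nodes, the check must search for a bijection between the blank nodes of the two graphs that turns one set of triples into the other, while IRIs and literals map to themselves. The brute-force Python sketch below tries every such bijection; it assumes blank nodes are strings starting with `_:` and is only practical for tiny graphs, since it enumerates all permutations of the blank nodes.

```python
from itertools import permutations

def is_blank(term) -> bool:
    return isinstance(term, str) and term.startswith("_:")

def blank_nodes(graph) -> list:
    return sorted({term for triple in graph for term in triple if is_blank(term)})

def isomorphic(g1: set, g2: set) -> bool:
    """Brute-force RDF graph isomorphism: look for a blank node bijection
    mapping g1 exactly onto g2. Tries up to n! mappings for n blank nodes."""
    b1, b2 = blank_nodes(g1), blank_nodes(g2)
    if len(g1) != len(g2) or len(b1) != len(b2):
        return False
    for image in permutations(b2):
        mapping = dict(zip(b1, image))
        renamed = {tuple(mapping.get(t, t) for t in triple) for triple in g1}
        if renamed == g2:
            return True
    return False

g1 = {(":St-Étienne", ":population", "_:b1"),
      ("_:b1", ":value", 174082),
      ("_:b1", ":year", 2020)}
g2 = {(":St-Étienne", ":population", "_:node1"),
      ("_:node1", ":year", 2020),
      ("_:node1", ":value", 174082)}
print(isomorphic(g1, g2))  # True
```

The factorial search is one reason the complexity results discussed next matter for RDF tooling.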

25 of 30

Graph and subgraph isomorphism complexity

From graph theory, we know that the graph isomorphism problem (GIP):

belongs to the GI complexity class (solvable in quasi-polynomial time $2^{O((\log n)^c)}$ for some constant $c$)
but it is not known whether GIP is in P, is NP-complete, or lies strictly between the two

However, the subgraph isomorphism problem (SGIP) is NP-complete
The RDF graph isomorphism problem can be polynomially reduced to GIP
The RDF subgraph isomorphism problem can be polynomially reduced to SGIP

26 of 30

Merging RDF graphs

When merging two graphs, the blank nodes of the two graphs must first be made distinct, so that a blank node of one graph is never confused with a blank node of the other

Figure: a first graph (:St-Étienne :population a blank node with :value 174 082 and :year 2020) and a second graph (:St-Étienne :population a blank node with :value 171 057 and :year 2015) merge into one graph in which :St-Étienne has two distinct :population blank nodes
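Renaming blank nodes apart before taking the union can be sketched as follows. Assumptions: graphs are Python sets of tuples, blank nodes are strings starting with `_:`, and the renaming scheme (prefixing labels with `g0_`, `g1_`, ...) is a hypothetical choice; any scheme that yields disjoint labels would do. The usage shows why this matters: a plain set union of two graphs that both use the label `_:b1` fuses the two census results into one blank node.

```python
def rename_blank(term, graph_index: int):
    """Prefix blank node labels with 'g<index>_'; leave IRIs and literals alone."""
    if isinstance(term, str) and term.startswith("_:"):
        return f"_:g{graph_index}_{term[2:]}"
    return term

def merge(*graphs) -> set:
    """Union of the graphs after renaming their blank nodes apart."""
    merged = set()
    for index, graph in enumerate(graphs):
        merged |= {tuple(rename_blank(t, index) for t in triple) for triple in graph}
    return merged

census_2020 = {(":St-Étienne", ":population", "_:b1"),
               ("_:b1", ":value", 174082), ("_:b1", ":year", 2020)}
census_2015 = {(":St-Étienne", ":population", "_:b1"),
               ("_:b1", ":value", 171057), ("_:b1", ":year", 2015)}

print(len(census_2020 | census_2015))        # 5: one blank node with two years
print(len(merge(census_2020, census_2015)))  # 6: two distinct blank nodes
```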

27 of 30

Encoding and storing multiple graphs

Multiple RDF graphs can be stored in multiple files
However, it is often convenient to work with a single data structure or file encoding multiple graphs
An RDF dataset is an abstract structure for doing this:
An RDF dataset comprises:

a distinguished RDF graph called the default graph
a finite set of named graphs (n,G) where n is an IRI or a blank node called the graph name and G is an RDF graph, such that no name appears more than once
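An RDF dataset can be sketched in Python as a default graph plus a dictionary of named graphs; keying the dictionary by graph name enforces the "no name appears more than once" constraint by construction. The class, its method and the graph name `http://example.org/graphs/census2020` are hypothetical, and triples are plain tuples.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple

Triple = Tuple[object, object, object]  # (subject, predicate, object)

@dataclass
class Dataset:
    """One default graph plus named graphs keyed by their name
    (an IRI or a '_:label' blank node), so names are unique."""
    default: Set[Triple] = field(default_factory=set)
    named: Dict[str, Set[Triple]] = field(default_factory=dict)

    def add(self, triple: Triple, graph_name: Optional[str] = None) -> None:
        """Add a triple to the default graph, or to the named graph graph_name."""
        if graph_name is None:
            self.default.add(triple)
        else:
            self.named.setdefault(graph_name, set()).add(triple)

GRAPH = "http://example.org/graphs/census2020"  # hypothetical graph name
ds = Dataset()
ds.add((":St-Étienne", ":population", "_:b1"), GRAPH)
ds.add(("_:b1", ":value", 174082), GRAPH)
ds.add((GRAPH, ":year", 2020))  # default graph: a statement about the named graph
print(len(ds.named), len(ds.named[GRAPH]), len(ds.default))  # 1 2 1
```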

28 of 30

Formats for encoding RDF datasets

RDF datasets can be serialised in these formats:

N-Quads: generalises N-Triples
TriG: generalises Turtle
JSON-LD: natively supports RDF datasets

In addition, triplestores (RDF databases) generally use RDF datasets as their internal data model
Multiple graphs can be queried in triplestores
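N-Quads extends an N-Triples line with an optional fourth term, the graph name; triples of the default graph are written without it. A Python sketch assuming string IRIs, `_:label` blank nodes and `(lexical, datatype)` tuples for typed literals; `nq_term` and `to_nquads` are hypothetical helpers and language-tagged literals are not handled.

```python
def nq_term(term) -> str:
    """Serialise one term: (lexical, datatype) tuples are typed literals,
    '_:label' strings are blank nodes, other strings are absolute IRIs."""
    if isinstance(term, tuple):
        lexical, datatype = term
        escaped = (lexical.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
        return f'"{escaped}"^^<{datatype}>'
    return term if term.startswith("_:") else f"<{term}>"

def to_nquads(default_graph, named_graphs) -> str:
    """One statement per line; only named-graph statements get a 4th term."""
    lines = [" ".join(map(nq_term, t)) + " ." for t in default_graph]
    for name, graph in named_graphs.items():
        lines += [" ".join(map(nq_term, t)) + f" {nq_term(name)} ." for t in graph]
    return "\n".join(lines) + "\n"

XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"
print(to_nquads(
    [("http://example.org/s", "http://example.org/p", "http://example.org/o")],
    {"http://example.org/g": [("_:b1", "http://example.org/value", ("174082", XSD_INTEGER))]},
), end="")
```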

29 of 30

Summary

RDF is a web standard for representing knowledge graphs
RDF relies on a world-wide uniform identification mechanism called IRIs
RDF can represent data values, using literals
RDF can describe resources that are known to exist but are not identified
By using HTTP IRIs, RDF can implement Linked Data
RDF has various data formats that allow one:

to encode RDF graphs in files
to conform to other data models like XML (RDF/XML) and JSON (JSON-LD)
to embed RDF graphs in web pages (RDFa, JSON-LD)
to encode multiple RDF graphs at the same time

30 of 30

Licensing and acknowledgements

These slides were prepared by Antoine Zimmermann

Some parts are inspired by the slides by Tobias Käfer, Andreas Harth, and Lars Heling under Creative Commons Attribution 4.0 available at https://ai4industry.sciencesconf.org/data/Distributed_Knowledge_Graphs_Pt._2_RDF.pdf

These slides are themselves provided under the same license: CC-BY 4.0 https://creativecommons.org/licenses/by/4.0/