The Resource Description Framework (RDF)

AI4Industry Summer School, July 2023

Antoine Zimmermann, EMSE

RDF is…

  • …to the Web of Data what HTML is to the Web of Documents
  • …(relatively) simple: everything is just triples
  • …a data model: it is not a file format!
  • …a logical formalism: it has a formal semantics
  • …more than XML or JSON: XML and JSON have a tree-based model, RDF has a graph-based model
  • …a Web standard: a W3C Recommendation [1]

[1] https://www.w3.org/TR/rdf11-concepts/

A lingua franca for the Web of Data

  • Some data models are more suitable for some applications
  • However, having a common model for data exchange:
    • reduces entropy (less surprise, less energy used in processing information)
    • ensures interoperability
  • If everyone exchanges data in a uniform model, energy can be put into:
    • Integrating data from different sources (merge operation, term disambiguation)
    • Writing processors
    • Validating processors
    • Running processors
  • Internally, an application may map RDF to a local data model, and vice versa

RDF basics

  • The core features of RDF are:
    • Identify things (resources)
    • Express relations between things (properties)
  • Additional important features are:
    • Assign data values to things (literals)
    • Organise things in categories (i.e., classes or types)
    • Add simple knowledge about categories and relations
  • There are other, less important features (discussed later)

Identify things

  • RDF is used to describe resources
  • A resource may be anything (a real or imaginary entity, abstract or concrete)
  • To describe a resource, it must be named or identified
  • On the Web, the identification mechanism must be uniform at Web scale: an identifier must identify the same thing everywhere on the Web
  • RDF uses Internationalized Resource Identifiers or IRIs (RFC 3987)

Internationalized Resource Identifiers

  • IRIs generalise URIs (Uniform Resource Identifiers, RFC 3986) by allowing Unicode characters beyond ASCII
  • IRIs and URIs identify things but may be used as locators (i.e., as URLs) at the same time
  • Examples:
    • urn:ietf:rfc:3987
    • svn://yadiyada.foo.bar/
    • mailto:antoine.zimmermann@emse.fr
    • ftp://ftp.liris.fr/#meta
    • http://en.wikipedia.org/wiki/User:Wikiuser100
    • https://w3id.org/people/az/me
    • http://dbpedia.org/resource/Saint-Étienne
  • Reminder: to shorten notations, we use CURIEs, e.g., the prefix rdf: stands for http://www.w3.org/1999/02/22-rdf-syntax-ns#
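
CURIE expansion can be sketched in a few lines of Python. This is only an illustration: the `PREFIXES` table and the `expand_curie` helper are hypothetical names, not part of any RDF library, and only the rdf: namespace is taken from the slide.

```python
# Illustrative prefix table (an assumption for this sketch).
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}

def expand_curie(curie: str, prefixes: dict = PREFIXES) -> str:
    """Replace the prefix of a CURIE such as 'rdf:type' by its namespace IRI."""
    prefix, _, local_name = curie.partition(":")
    if prefix not in prefixes:
        raise KeyError(f"undeclared prefix: {prefix!r}")
    return prefixes[prefix] + local_name

print(expand_curie("rdf:type"))
```

The expansion is plain string concatenation, which is why a namespace IRI usually ends with `#` or `/`.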

How to choose an IRI for something?

  • If possible, reuse an existing IRI from an authoritative source, e.g.:
    • from a national library for books (Library of Congress, BNF, BNL, DNB)
    • from a government website for a ministry
    • from a well-known knowledge base like DBpedia or Wikidata
  • If not, make your own IRI:
    • use HTTP IRIs
    • use a namespace under your control
    • Cool URIs don’t change [1]
  • Refer to the guide on Cool URIs for the Semantic Web [2]

[1] W3C style guide for hypertext: https://www.w3.org/Provider/Style/URI

[2] W3C Interest Group Note: https://www.w3.org/TR/cooluris/

Relate things

  • RDF can express binary relations between things, such as “Laura loves Helmut”, “Steven works for Google Inc.”, “EMSE is located in Saint-Étienne”
  • This is written as a triple:

(subject, predicate, object)

  • where subject and object identify the resources in the relationship, and predicate identifies the relation
  • The predicate in an RDF triple is always an IRI

Example:

(http://example.org/data/Laura,        [subject]
 http://social.relations.com/loves,    [predicate]
 http://example.org/data/Helmut)       [object]
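
As a rough sketch, a triple can be modelled in Python as a named tuple and a graph as a set of such tuples. The `Triple` class is an assumption made for illustration, not an API of any RDF library.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    subject: str
    predicate: str
    object: str

laura_loves_helmut = Triple(
    subject="http://example.org/data/Laura",
    predicate="http://social.relations.com/loves",
    object="http://example.org/data/Helmut",
)

# An RDF graph is a set of triples: no order, no duplicates.
graph = {laura_loves_helmut, laura_loves_helmut}
print(len(graph))  # 1
```

Using a set mirrors the abstract model: stating the same triple twice does not change the graph.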

Data values

  • Like everything else, a data value (number, string, date) is a resource
  • A specific data value can be identified with a literal, a character string that represents the value
  • Every literal is typed such that its string representation can be interpreted as the correct value
  • E.g., "42" represents the number forty-two if it is of type decimal integer, but represents sixty-six if it is a hexadecimal integer
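
The "42" example can be checked directly in Python. The two datatype names below are hypothetical labels chosen for this sketch; they are not standard datatype IRIs.

```python
# Hypothetical datatype names mapped to parsing functions.
PARSERS = {
    "decimalInteger": lambda lexical: int(lexical, 10),
    "hexadecimalInteger": lambda lexical: int(lexical, 16),
}

def literal_value(lexical_form: str, datatype: str) -> int:
    """Interpret a lexical form according to its datatype."""
    return PARSERS[datatype](lexical_form)

print(literal_value("42", "decimalInteger"))      # 42
print(literal_value("42", "hexadecimalInteger"))  # 66
```

The same string yields two different values, so the datatype is part of the literal's identity.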

RDF literals

  • An RDF literal has 2 or 3 components which are:
    • A lexical form which is a Unicode string
    • A datatype IRI that can be any IRI
    • When the datatype IRI is rdf:langString, there is a language tag which is a BCP 47 tag
  • Usually, we use standard datatype IRIs from the xsd: namespace (XML Schema Datatypes) and the rdf: namespace
  • We will write literals as "lexical form"^^datatypeIRI and, when the datatype is rdf:langString, as "lexical form"@langTag
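
The components listed above could be captured by a small data class. This is a sketch under assumptions: the `Literal` class is hypothetical, defaults the datatype to xsd:string, and does not escape quotes inside the lexical form.

```python
from dataclasses import dataclass
from typing import Optional

XSD = "http://www.w3.org/2001/XMLSchema#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

@dataclass(frozen=True)
class Literal:
    lexical_form: str
    datatype: str = XSD + "string"
    language: Optional[str] = None

    def __post_init__(self):
        # A language tag is present exactly when the datatype is rdf:langString.
        if (self.language is not None) != (self.datatype == RDF + "langString"):
            raise ValueError("language tag and rdf:langString must go together")

    def __str__(self):
        if self.language is not None:
            return f'"{self.lexical_form}"@{self.language}'
        return f'"{self.lexical_form}"^^<{self.datatype}>'

answer = Literal("42", XSD + "integer")
cat_fr = Literal("chat", RDF + "langString", "fr")
cat_en = Literal("chat", RDF + "langString", "en")
print(answer)
print(cat_fr)
```

Two literals with the same lexical form but different language tags, such as `cat_fr` and `cat_en`, are distinct terms.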

Examples:

"42"^^xsd:integer
"THX 1138"^^xsd:string
"chat"@fr, "chat"@en
"<p>The <em>beautiful</em> literal!</p>"^^rdf:HTML

Unidentified resources

  • RDF can describe entities that are known to exist but whose identity is unknown (or is irrelevant/unimportant)
  • E.g., a book has at least one author, but they may not be known
  • The existence of a thing can be indicated in the subject or object position of a triple with a blank node
  • E.g. “something is in my bag”

  • There may be multiple things in my bag, but the blank node does not identify any of them. It just indicates the existence of something in it.

[Figure: a blank node linked to :myBag by the property :inside.]

Typical use of blank nodes: n-ary relations

  • RDF can only represent binary relations, in the form of RDF triples.
  • However, any n-ary relation can be encoded as a set of binary relations

[Figure: :St-Étienne has two :population edges, each pointing to a blank node; one blank node has :value 174 082 and :year 2020, the other has :value 171 057 and :year 2015.]
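
The encoding of the ternary relation (place, value, year) can be sketched as follows. The helper names `fresh_bnode` and `population_record` are hypothetical, and triples are plain Python tuples.

```python
from itertools import count

_bnode_ids = count(1)

def fresh_bnode() -> str:
    """Return a new blank node label such as '_:b1'."""
    return f"_:b{next(_bnode_ids)}"

def population_record(place: str, value: int, year: int) -> set:
    """Encode the ternary relation (place, value, year) as three triples."""
    record = fresh_bnode()
    return {
        (place, ":population", record),
        (record, ":value", value),
        (record, ":year", year),
    }

graph = (population_record(":St-Étienne", 174082, 2020)
         | population_record(":St-Étienne", 171057, 2015))
for triple in sorted(graph, key=str):
    print(triple)
```

Each call creates its own blank node, so the value and year of one census stay attached to each other.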

RDF Serialisation (concrete syntaxes)

  • RDF is a data model with an abstract syntax but it must be encoded in a file format
  • There are several formats for RDF:
    • N-Triples: one line = one triple (easy to parse)
    • Turtle: compact and readable notation
    • RDF/XML: XML-based format
    • JSON-LD: JSON-based format
    • RDFa: encoding RDF inside Web pages
  • In this tutorial, we will focus on the Turtle syntax
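
To illustrate the "one line = one triple" idea, here is a minimal N-Triples writer. It is a sketch that assumes terms are either full IRIs or blank node labels starting with `_:`; literals and character escaping are deliberately left out.

```python
def nt_term(term: str) -> str:
    """Render an IRI or a blank node label in N-Triples syntax."""
    return term if term.startswith("_:") else f"<{term}>"

def to_ntriples(triples) -> str:
    """Write one triple per line, each terminated by ' .'."""
    lines = [f"{nt_term(s)} {nt_term(p)} {nt_term(o)} ."
             for s, p, o in sorted(triples)]
    return "\n".join(lines) + "\n"

graph = {
    ("http://www.example.com/test#this",
     "http://relations.example.com/in",
     "http://www.example.com/test#box"),
    ("_:b1",
     "http://relations.example.com/in",
     "http://www.example.com/test#box"),
}
print(to_ntriples(graph), end="")
```

Sorting is not required by the format; it only makes the output reproducible.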

The Turtle syntax (1)

  • Full IRIs:

<http://www.example.com/test#this>

  • A simple triple:

<http://www.example.com/test#this> <http://relations.example.com/in> <http://www.example.com/test#box> .

  • Abbreviated IRIs (declare prefixes at the beginning of the file):

# This is a comment
@prefix ex: <http://www.example.com/test#> . # end dot!
PREFIX rel: <http://relations.example.com/> # alternative notation (no dot!)
# spaces, newlines, tabs separate elements of a triple:
ex:this rel:in ex:box . # dot ends statement

The Turtle syntax (2)

  • Literals:

ex:this rel:date "2019-09-13"^^xsd:date . # normal literal
ex:this rel:name "this"@en . # language-tagged literal
ex:this rel:code "TX32" . # xsd:string can be omitted
ex:this rel:number 42 . # xsd:integer (no quotes)
ex:this rel:sizeInMeters 3.75 . # xsd:decimal (use a dot)
ex:this rel:isGood true . # xsd:boolean
ex:this rel:isBoring false . # xsd:boolean

  • Blank nodes:

[] rel:in ex:box .
_:b1 rel:in ex:box . # a blank node identifier...
ex:me rel:likes _:b1 . # ...allows reusing the same blank node

The Turtle syntax (3)

  • Repeat the same subject and predicate:

ex:box rel:contains ex:this .
ex:box rel:contains ex:that .
# can be written
ex:box rel:contains ex:this, ex:that . # comma

  • Repeat subject:

ex:this rel:date "2019-09-13"^^xsd:date;
    rel:name "this"@en; # new lines are optional
    rel:code "TX32";
    rel:nextTo ex:that, ex:thoot, ex:thus .
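
The comma and semicolon abbreviations amount to grouping triples by subject, then by predicate. A possible sketch, assuming each term is already a valid Turtle token (the function name `abbreviate` is hypothetical):

```python
from collections import defaultdict

def abbreviate(triples) -> str:
    """Group triples by subject (';') and by predicate (',')."""
    grouped = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        grouped[s][p].append(o)
    statements = []
    for s, predicates in grouped.items():
        parts = [f"{p} {', '.join(objects)}" for p, objects in predicates.items()]
        statements.append(f"{s} " + " ;\n    ".join(parts) + " .")
    return "\n".join(statements)

print(abbreviate([
    ("ex:box", "rel:contains", "ex:this"),
    ("ex:box", "rel:contains", "ex:that"),
    ("ex:this", "rel:code", '"TX32"'),
    ("ex:this", "rel:nextTo", "ex:that"),
]))
```

The abbreviated and the expanded forms describe exactly the same set of triples; only the notation is shorter.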

The Turtle syntax (4)

  • More on blank nodes:

# assume prefixes are declared
ex:johnDoe rel:worksFor [
    a ex:University; # the IRI rdf:type can be replaced by 'a'
    rel:name "Berkeley";
    rel:locatedIn ex:California
] .

  • is the same as:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
# other prefixes must be declared
ex:johnDoe rel:worksFor _:bnode .
_:bnode rdf:type ex:University . # 'a' and 'rdf:type' represent the same IRI
_:bnode rel:name "Berkeley" .
_:bnode rel:locatedIn ex:California .

The Turtle syntax (5)

  • Declaring a base IRI:

@base <http://example.com/base/> . # ends with dot
BASE <http://example.com/base/> # alternative syntax (no dot!)
# prefixes must be declared
<bob> a vocab:Person; # relative IRI
    rel:knows <claire> .
BASE <http://example.org/base2/> # base can be redefined
<bob> rel:knows <http://example.com/base/bob> . # different bobs

  • is the same as:

<http://example.com/base/bob> a vocab:Person;
    rel:knows <http://example.com/base/claire> .
<http://example.org/base2/bob>
    rel:knows <http://example.com/base/bob> .
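
Relative IRIs are resolved against the base with the RFC 3986 algorithm. For HTTP IRIs, Python's standard `urllib.parse.urljoin` follows the same rules, so it can serve as a quick check (a sketch, not a full IRI implementation; the last base IRI below is made up for illustration):

```python
from urllib.parse import urljoin

base = "http://example.com/base/"
print(urljoin(base, "bob"))     # http://example.com/base/bob
print(urljoin(base, "claire"))  # http://example.com/base/claire

# An absolute IRI is left unchanged, whatever the base is.
print(urljoin("http://example.org/other/", "http://example.com/base/bob"))

# The fragment of a base is ignored and its last path segment is replaced.
print(urljoin("http://example.org/doc#section", "bob"))  # http://example.org/bob
```

Because of the last rule, a base IRI meant to act as a namespace for relative names should end with `/` rather than `#`.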

RDF Vocabularies

  • An RDF vocabulary is a set of IRIs meant to be used in a certain way
  • The IRIs are usually declared to be of certain types (e.g., Property or Class)
  • The intended meaning and use is usually documented in a readable specification
  • E.g. {:part-of, :capital, :population, :value} is an RDF vocabulary where the terms are intended to be used in predicate positions and, e.g., :population should relate a place to a node denoting a population census, etc.

The RDFS vocabulary (1)

The RDF standard defines a generic vocabulary (the RDFS vocabulary) with, notably:

  • rdfs:label is a property that relates a thing to a short natural language name

:St-Étienne rdfs:label "Saint-Étienne"@fr .

  • rdfs:comment is a property that relates a thing to some text about the thing

:St-Étienne rdfs:comment "A city in France."@en .

  • rdf:type is a property that relates a thing to its type or class

:St-Étienne rdf:type :City . # equivalent to :St-Étienne a :City

The RDFS vocabulary (2)

The RDFS vocabulary can also be used to declare vocabularies:

  • rdf:Property is the class of properties (properties are the terms usually used in predicate position)

:part-of a rdf:Property . # :part-of rdf:type rdf:Property

  • rdfs:Class is the class of classes (classes are usually used in object position of triples having rdf:type as predicate)

:City a rdfs:Class .

Working with multiple RDF graphs

  • Various operations can be done on multiple graphs:
    • Check (syntactic) equivalence
    • Check subgraph relation
    • Merge 2 graphs (or more) into 1
    • Store multiple graphs in one file or data structure
  • We will also see more operations on graphs in the next lectures

Isomorphism As Equivalence Relation

  • We employ isomorphism to check whether two RDF graphs are structurally equivalent
  • Are those two graphs the same?

@prefix : <http://example.org/doc.ttl#> .
:St-Étienne :capital :Loire .
:St-Étienne :label "Saint-Étienne"@fr .

@prefix : <http://example.org/doc.ttl#> .
:St-Étienne :label "Saint-Étienne"@fr .
:St-Étienne :capital :Loire .

[Figure: both documents are drawn as the same graph, in which :St-Étienne is linked to :Loire by :capital and to the literal "Saint-Étienne" by :label.]

Isomorphism As Equivalence Relation

  • We employ isomorphism to check whether two RDF graphs are structurally equivalent
  • Are those two graphs the same?

@prefix : <http://example.org/doc.ttl#> .
:St-Étienne :population _:b1 .
_:b1 :value 174082 .
_:b1 :year 2020 .

@prefix : <http://example.org/doc.ttl#> .
:St-Étienne :population _:node1 .
_:node1 :year 2020 .
_:node1 :value 174082 .

[Figure: the two graphs drawn side by side; in each, :St-Étienne is linked by :population to a blank node that has :value 174 082 and :year 2020.]
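
A naive isomorphism test tries every one-to-one renaming of blank nodes from one graph to the other. The sketch below assumes graphs are sets of tuples and blank nodes are strings starting with `_:`; it is a brute-force search with factorial cost, only usable on tiny graphs, and is not the algorithm used by real RDF tools.

```python
from itertools import permutations

def is_bnode(term) -> bool:
    return isinstance(term, str) and term.startswith("_:")

def bnodes(graph) -> list:
    return sorted({t for s, _, o in graph for t in (s, o) if is_bnode(t)})

def isomorphic(g1: set, g2: set) -> bool:
    """Brute-force search for a blank node bijection mapping g1 onto g2."""
    b1, b2 = bnodes(g1), bnodes(g2)
    if len(g1) != len(g2) or len(b1) != len(b2):
        return False
    for candidate in permutations(b2):
        mapping = dict(zip(b1, candidate))
        renamed = {(mapping.get(s, s), p, mapping.get(o, o)) for s, p, o in g1}
        if renamed == g2:
            return True
    return False

g1 = {(":St-Étienne", ":population", "_:b1"),
      ("_:b1", ":value", 174082), ("_:b1", ":year", 2020)}
g2 = {(":St-Étienne", ":population", "_:node1"),
      ("_:node1", ":year", 2020), ("_:node1", ":value", 174082)}
print(g1 == g2)            # False: the blank node labels differ
print(isomorphic(g1, g2))  # True: renaming _:b1 to _:node1 maps g1 onto g2
```

Blank node labels are local to a document, which is why plain set equality is too strict here.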

Graph and subgraph isomorphism complexity

  • From graph theory, we know that the graph isomorphism problem (GIP):
    • belongs to the GI complexity class (solvable in quasi-polynomial time 2^O((log n)^c))
    • but it is not known whether GI = P or GI = NP or P < GI < NP
  • However, the subgraph isomorphism problem (SGIP) is NP-complete
  • The RDF graph isomorphism problem can be polynomially reduced to GIP
  • The RDF subgraph isomorphism problem can be polynomially reduced to SGIP

Merging RDF graphs

  • When merging 2 graphs, the blank nodes of the 2 graphs must be made distinct

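[Figure: graph 1 (:St-Étienne :population a blank node with :value 174 082 and :year 2020) + graph 2 (:St-Étienne :population a blank node with :value 171 057 and :year 2015) = a merged graph in which :St-Étienne has two :population edges to two distinct blank nodes, one per census record.]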
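
One way to keep blank nodes distinct is to rename them per source graph before taking the union. This is a sketch under the assumption that graphs are sets of tuples and blank nodes are strings starting with `_:`; the helper names are hypothetical.

```python
def is_bnode(term) -> bool:
    return isinstance(term, str) and term.startswith("_:")

def standardise_apart(graph: set, suffix: str) -> set:
    """Rename every blank node by appending a suffix unique to this graph."""
    def rename(term):
        return f"{term}_{suffix}" if is_bnode(term) else term
    return {(rename(s), p, rename(o)) for s, p, o in graph}

def merge(*graphs: set) -> set:
    """Union of the graphs after making their blank nodes distinct."""
    merged = set()
    for index, graph in enumerate(graphs):
        merged |= standardise_apart(graph, f"g{index}")
    return merged

g2020 = {(":St-Étienne", ":population", "_:b1"),
         ("_:b1", ":value", 174082), ("_:b1", ":year", 2020)}
g2015 = {(":St-Étienne", ":population", "_:b1"),
         ("_:b1", ":value", 171057), ("_:b1", ":year", 2015)}

print(len(g2020 | g2015))       # 5: naive union conflates the two records
print(len(merge(g2020, g2015))) # 6: each record keeps its own blank node
```

With the naive union, the single node `_:b1` would carry two values and two years, and it would no longer be possible to tell which value belongs to which year.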

Encoding and storing multiple graphs

  • Multiple RDF graphs can be stored in multiple files
  • However, it is often convenient to work with a single data structure or file encoding multiple graphs
  • An RDF dataset is an abstract structure for doing this:
  • An RDF dataset comprises:
    • a distinguished RDF graph called the default graph
    • a finite set of named graphs (n,G) where n is an IRI or a blank node called the graph name and G is an RDF graph, such that no name appears more than once
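
The definition above maps naturally onto a default graph plus a dictionary keyed by graph name. The `Dataset` class below is a hypothetical sketch, with graphs as sets of tuples, and the graph names are made-up examples.

```python
from dataclasses import dataclass, field

@dataclass
class Dataset:
    default_graph: set = field(default_factory=set)
    named_graphs: dict = field(default_factory=dict)  # graph name -> graph

    def add_named_graph(self, name: str, graph: set) -> None:
        # No graph name may appear more than once in a dataset.
        if name in self.named_graphs:
            raise ValueError(f"graph name already used: {name}")
        self.named_graphs[name] = graph

    def quads(self):
        """Yield (s, p, o, graph name); None stands for the default graph."""
        for s, p, o in self.default_graph:
            yield (s, p, o, None)
        for name, graph in self.named_graphs.items():
            for s, p, o in graph:
                yield (s, p, o, name)

dataset = Dataset()
dataset.default_graph.add((":St-Étienne", ":capital", ":Loire"))
dataset.add_named_graph("http://example.org/census2020",
                        {(":St-Étienne", ":population", "_:b1")})
print(len(list(dataset.quads())))  # 2
```

Flattening a dataset into quads, as `quads()` does, is the idea behind quad-based serialisations: a triple plus an optional graph name.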

Formats for encoding RDF datasets

  • RDF datasets can be serialised in these formats:
    • N-Quads: generalises N-Triples
    • TriG: generalises Turtle
    • JSON-LD: natively supports RDF datasets
  • In addition, all triplestores (RDF databases) use RDF datasets as their internal data structure
  • Multiple graphs can be queried in triplestores

Summary

  • RDF is a web standard for representing knowledge graphs
  • RDF relies on a world-wide uniform identification mechanism called IRIs
  • RDF can represent data values, using literals
  • RDF can describe resources that are known to exist but are not identified
  • By using HTTP IRIs, RDF can implement Linked Data
  • RDF has various data formats that allow one:
    • to encode RDF graphs in files
    • to conform to other data models like XML (RDF/XML) and JSON (JSON-LD)
    • to embed RDF graphs in web pages (RDFa, JSON-LD)
    • to encode multiple RDF graphs at the same time

Licensing and acknowledgements

These slides were prepared by Antoine Zimmermann

Some parts are inspired by the slides by Tobias Käfer, Andreas Harth, and Lars Heling under Creative Commons Attribution 4.0 available at https://ai4industry.sciencesconf.org/data/Distributed_Knowledge_Graphs_Pt._2_RDF.pdf

These slides are themselves provided under the same license: CC-BY 4.0 https://creativecommons.org/licenses/by/4.0/
