Hierarchical data formats: XPath, XSLT

Jakub Klímek

XML Path Language - XPath

XPath - example

<?xml version="1.0" encoding="UTF-8"?>
<catalog>
  <title xml:lang="en">My catalog</title>
  <title xml:lang="cs">Můj katalog</title>
  <description xml:lang="en">This is my dummy catalog</description>
  <description xml:lang="cs">Toto je můj falešný katalog</description>
  <contact-point>
    <name xml:lang="en">John Doe</name>
    <e-mail>mailto:john@doe.org</e-mail>
  </contact-point>
  <datasets>
    <dataset>
      <title xml:lang="en">Bikesharing in Brno</title>
      <title xml:lang="cs">Sdílení kol v Brně</title>
      <distributions>
        <distribution>
          <media-type>application/xml</media-type>
          <downloadURL>http://brno.cz/myfile.xml</downloadURL>
        </distribution>
        <distribution>
          <accessService>
            <endpointURL>https://brno.cz/myAPI</endpointURL>
            <title xml:lang="en">My API</title>
          </accessService>
        </distribution>
      </distributions>
    </dataset>
    <dataset>
      <title xml:lang="en">Bikesharing in Prague</title>
      <title xml:lang="cs">Sdílení kol v Praze</title>
      <distributions>
        <distribution>
          <title xml:lang="en">CSV</title>
          <media-type>text/csv</media-type>
          <downloadURL>http://praha.eu/myfile.csv</downloadURL>
        </distribution>
      </distributions>
    </dataset>
  </datasets>
</catalog>

/catalog/datasets/dataset/title

  • Element: title
  • Element: title
  • Element: title
  • Element: title

/catalog/datasets/dataset/title/text()

  • Text: Bikesharing in Brno
  • Text: Sdílení kol v Brně
  • Text: Bikesharing in Prague
  • Text: Sdílení kol v Praze

/catalog/datasets/dataset/title[@xml:lang="en"]/text()

  • Text: Bikesharing in Brno
  • Text: Bikesharing in Prague
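
The three queries can be tried out with a short script. This is a minimal sketch, assuming Python's standard xml.etree.ElementTree module, which implements only a subset of XPath: paths are evaluated relative to the element they are called on, text() is not available (the .text property is used instead), and the xml prefix has to be mapped explicitly. The CATALOG string is an abbreviated copy of the document above.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the catalog above (assumption: only the parts the queries touch).
CATALOG = """<catalog>
  <title xml:lang="en">My catalog</title>
  <title xml:lang="cs">Můj katalog</title>
  <datasets>
    <dataset>
      <title xml:lang="en">Bikesharing in Brno</title>
      <title xml:lang="cs">Sdílení kol v Brně</title>
    </dataset>
    <dataset>
      <title xml:lang="en">Bikesharing in Prague</title>
      <title xml:lang="cs">Sdílení kol v Praze</title>
    </dataset>
  </datasets>
</catalog>"""

# The xml prefix is predefined in XML, but ElementTree needs the mapping spelled out.
NS = {"xml": "http://www.w3.org/XML/1998/namespace"}

catalog = ET.fromstring(CATALOG)  # the <catalog> element

# /catalog/datasets/dataset/title
titles = catalog.findall("datasets/dataset/title")

# /catalog/datasets/dataset/title/text()
all_texts = [t.text for t in titles]

# /catalog/datasets/dataset/title[@xml:lang="en"]/text()
en_titles = catalog.findall("datasets/dataset/title[@xml:lang='en']", NS)
en_texts = [t.text for t in en_titles]

print(len(titles))
print(all_texts)
print(en_texts)
```

A full XPath engine would accept the absolute paths exactly as written on the slide; the relative form here is only a consequence of the ElementTree subset.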

XPath - specifications

  • XML Path Language (XPath) 1.0
    • W3C Recommendation 1999
    • what we will cover mostly
  • XML Path Language (XPath) 2.0 (Second Edition)
    • W3C Recommendation 2010
    • most widely implemented
  • XML Path Language (XPath) 3.1
    • W3C Recommendation 2017

XML Infoset and XPath Data Model

XML Information Set (Second Edition)

  • W3C Recommendation (Second Edition) 2004
  • Set of definitions for referring to information in well-formed XML documents
  • Created with no particular processing language in mind

XQuery and XPath Data Model 3.1

  • W3C Recommendation 2017
    • For XPath 2.0 2001
  • Based on XML Infoset
  • Created for use in XPath, XSLT and XQuery
  • Includes support for information coming from XML Schemas
    • e.g. types of text values

[Figure: example data model tree with a document node, the root element <coffeeShops>, the elements <coffeeShop>, <name>, <employees>, <employee> and <age>, an attribute number, and the text nodes "Place" and 2]

XPath Data Model

The catalog document from the XPath example, viewed as a tree of nodes (partial):

document (root)
  element <catalog>
    element <title>
      attribute xml:lang = "en"
      text "My catalog"
    element <title>
      attribute xml:lang = "cs"
      text "Můj katalog"
    element <description>
    element <datasets>
      element <dataset>
        element <title>
          attribute xml:lang = "en"
          text "Bikesharing in Brno"
        element <title>
          attribute xml:lang = "cs"
          text "Sdílení kol v Brně"
        element <distributions>
          element <distribution>
      element <dataset>

XPath Data Model

XPath types of nodes

  • document (root)
  • element
  • attribute
  • text
  • comment
  • namespace
  • processing instruction

XPath basic examples

XPath - example

/

Selects the document (root) node.

XPath - example

/catalog

Selects the <catalog> element, the child of the document node.

XPath - absolute path

/catalog/datasets

Absolute path: /step1/.../stepN

XPath - result set of nodes

/catalog/datasets/dataset

The result is a set of nodes (in no explicit order).

XPath - result set of nodes

/catalog/datasets/dataset/title

XPath - access function

/catalog/datasets/dataset/title/text()

Access function text(): selects the text nodes that are children of each <title>. Despite the function-call syntax, XPath defines text() as a node test.

XPath - attribute

/catalog/datasets/dataset/title/@xml:lang

@attribute: selects the attribute of that name, here the xml:lang attribute of each <title>.

XPath - predicate

/catalog/datasets/dataset/title[@xml:lang="en"]/text()

[predicate]: a logical expression; only the nodes for which it holds are kept.

XPath - relative path

title[@xml:lang="en"]/text()

Relative path: step1/.../stepN

Needs a starting point
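
The dependence on the starting point can be seen by evaluating the same relative path from two different context nodes. This is a minimal sketch, assuming Python's standard xml.etree.ElementTree (an XPath subset); DOC is a trimmed copy of the catalog and NS is the prefix mapping that ElementTree requires.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the catalog (assumption: one catalog title, one dataset title).
DOC = """<catalog>
  <title xml:lang="en">My catalog</title>
  <datasets>
    <dataset>
      <title xml:lang="en">Bikesharing in Brno</title>
    </dataset>
  </datasets>
</catalog>"""

NS = {"xml": "http://www.w3.org/XML/1998/namespace"}
PATH = "title[@xml:lang='en']"  # relative path: no leading slash

catalog = ET.fromstring(DOC)                 # context node 1: <catalog>
dataset = catalog.find("datasets/dataset")   # context node 2: <dataset>

# Same path, different starting points, different results.
from_catalog = [t.text for t in catalog.findall(PATH, NS)]
from_dataset = [t.text for t in dataset.findall(PATH, NS)]

print(from_catalog)
print(from_dataset)
```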

XPath - axes - child

/catalog/datasets

/child::catalog/child::datasets

The default axis, child, can be omitted.

XPath path step: axis::node-test [predicate1] ... [predicateN]

XPath - axes - child

/catalog/child::*

/catalog/*

child::* selects only element children. Attributes are not children of their element, and text nodes, although they are children, do not match *.

XPath - axes - descendant

/catalog/descendant::title

XPath - axes - descendant

/catalog/descendant::*

XPath - axes - attribute

/catalog/descendant::*/attribute::*

/catalog/descendant::*/@*

XPath - axes - preceding-sibling

/catalog/title/preceding-sibling::title/text()

  • Text: My catalog

XPath - axes - descendant-or-self

/catalog/descendant-or-self::title

/catalog//title
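
The difference between a child step and a descendant search can be checked on a small document. This is a minimal sketch, assuming Python's standard xml.etree.ElementTree, where ".//title" plays the role of /catalog//title; DOC is a trimmed copy of the catalog.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the catalog (assumption: one dataset with one access service).
DOC = """<catalog>
  <title xml:lang="en">My catalog</title>
  <datasets>
    <dataset>
      <title xml:lang="en">Bikesharing in Brno</title>
      <distributions>
        <distribution>
          <accessService>
            <title xml:lang="en">My API</title>
          </accessService>
        </distribution>
      </distributions>
    </dataset>
  </datasets>
</catalog>"""

catalog = ET.fromstring(DOC)

# /catalog/title: only the title elements that are children of <catalog>
children = [t.text for t in catalog.findall("title")]

# /catalog//title: title elements at any depth below <catalog>, in document order
descendants = [t.text for t in catalog.findall(".//title")]

print(children)
print(descendants)
```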

XPath - axes - self, parent

Starting in dataset:

  • . is short for self::node() and selects the dataset itself
  • .. is short for parent::node() and selects its parent, datasets

XPath - document order

Nodes are ordered according to the position of the start tags of elements in the document.

[Figure: the catalog tree with its nodes labeled by their document-order numbers]

XPath - all axes

  • ancestor
  • descendant
  • following
  • preceding
  • following-sibling
  • preceding-sibling
  • child
  • attribute
  • namespace
  • self
  • parent

XPath - name()

/catalog/datasets/name()

  • datasets (xs:string)

XPath - position()

/catalog/datasets/dataset/position()

  • 1 (xs:integer)
  • 2 (xs:integer)

XPath - position()

/catalog/datasets/dataset[position() = 2]

/catalog/datasets/dataset[2]

XPath - last()

/catalog/datasets/dataset[last()]

/catalog/datasets/dataset[position() = last()]
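
Positional predicates can be tried out as follows. This is a minimal sketch, assuming Python's standard xml.etree.ElementTree, which supports the abbreviated forms [2] and [last()] but not position() = ...; DOC is a trimmed copy of the catalog with its two datasets.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the catalog (assumption: titles only).
DOC = """<catalog>
  <datasets>
    <dataset><title xml:lang="en">Bikesharing in Brno</title></dataset>
    <dataset><title xml:lang="en">Bikesharing in Prague</title></dataset>
  </datasets>
</catalog>"""

catalog = ET.fromstring(DOC)

# /catalog/datasets/dataset[1]: positions start at 1, not 0
first = catalog.find("datasets/dataset[1]")

# /catalog/datasets/dataset[2]
second = catalog.find("datasets/dataset[2]")

# /catalog/datasets/dataset[last()]
last = catalog.find("datasets/dataset[last()]")

print(first.find("title").text)
print(second.find("title").text)
print(second is last)  # with two datasets, [2] and [last()] select the same node
```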

Axes

  • self
  • child
  • descendant
  • parent
  • ancestor
  • attribute
  • following(-sibling)
  • preceding(-sibling)

Node tests

  • name: node with a particular name
  • *: node with an arbitrary name
  • node(): any node
  • text(): any text node

Abbreviations

  • / stands for /child::
  • /@ stands for /attribute::
  • /. stands for /self::node()
  • /.. stands for /parent::node()
  • // stands for /descendant-or-self::node()/

Functions

  • position(): position of the context node in the sequence being processed
  • last(): position of the last node, i.e. the size of that sequence
  • count(): number of nodes in its argument
  • normalize-space(): normalization of white space
  • name(): name of a node

XPath - common errors

“Select car rental companies in Hawaii which offer at least one cabrio.”

Wrong (returns cars):

/rental[state="Hawaii"]/offer/car[type="cabrio"]

Correct:

/rental[state="Hawaii" and offer/car[type="cabrio"]]
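
The difference between the two expressions can be observed on sample data. This is a minimal sketch, assuming Python's standard xml.etree.ElementTree; the <rentals> wrapper, the <name> elements and the three sample companies are hypothetical. ElementTree supports neither "and" nor nested paths inside a predicate, so the second condition of the correct query is checked in Python.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; only state, offer, car and type come from the slide.
DOC = """<rentals>
  <rental>
    <name>A</name><state>Hawaii</state>
    <offer><car><type>cabrio</type></car></offer>
  </rental>
  <rental>
    <name>B</name><state>Hawaii</state>
    <offer><car><type>suv</type></car></offer>
  </rental>
  <rental>
    <name>C</name><state>Texas</state>
    <offer><car><type>cabrio</type></car></offer>
  </rental>
</rentals>"""

root = ET.fromstring(DOC)

# Wrong: the last step of the path is car, so cars are returned.
wrong = root.findall("rental[state='Hawaii']/offer/car[type='cabrio']")
wrong_tags = [e.tag for e in wrong]

# Correct: both conditions filter the rental, and the rental itself is returned.
correct = [
    r for r in root.findall("rental[state='Hawaii']")
    if r.find("offer/car[type='cabrio']") is not None
]
correct_names = [r.find("name").text for r in correct]

print(wrong_tags)
print(correct_names)
```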

XPath - common errors

“Select the last section in the book.”

Wrong (returns the last section in each chapter / section):

//section[last()]

which is short for

/descendant-or-self::node()/section[last()]

Correct:

/descendant::section[last()]

<book>
  <chapter>
    <section></section>
  </chapter>
  <chapter>
    <section>
      <section></section>
      <section>
        <section></section>
      </section>
    </section>
    <section></section>
  </chapter>
</book>
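
Both readings of "last" can be compared on the book above. This is a minimal sketch, assuming Python's standard xml.etree.ElementTree; the id attributes are hypothetical and added only to tell the sections apart. ElementTree has no descendant:: axis, so iter() stands in for /descendant::section.

```python
import xml.etree.ElementTree as ET

# The book from the slide; the id attributes are an assumption added for readability.
BOOK = """<book>
  <chapter>
    <section id="s1"/>
  </chapter>
  <chapter>
    <section id="s2">
      <section id="s3"/>
      <section id="s4">
        <section id="s5"/>
      </section>
    </section>
    <section id="s6"/>
  </chapter>
</book>"""

book = ET.fromstring(BOOK)

# //section[last()]: last() is evaluated per parent, so several sections match.
per_parent = [s.get("id") for s in book.findall(".//section[last()]")]

# /descendant::section[last()]: all sections form one list and its last item is taken.
overall_last = list(book.iter("section"))[-1].get("id")

print(per_parent)    # ['s1', 's4', 's5', 's6']
print(overall_last)  # s6
```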

Some XPath 2.0 features

  • result is a sequence (ordered)
    • ("a", 2, "c")[3] results in "c"
    • (1 to 10)[7] results in 7
  • conditional expressions
    • if (count(//dataset) > 1) then "Datasets" else "Dataset"
  • quantified expressions
    • //dataset[some $title in title satisfies $title/@xml:lang="en"]
  • for loops
    • for $dataset in //dataset return count($dataset//distribution)
  • comments
    • (:comment:)

Some XPath 3.0 and 3.1 features

  • mapping operator !
    • //dataset ! count(descendant::distribution) results in
      • 2
      • 1
  • string concatenation operator ||
    • //dataset[1]/title[@xml:lang="en"] || //dataset[1]/title[@xml:lang="cs"]
    • equivalent to concat(//dataset[1]/title[@xml:lang="en"], //dataset[1]/title[@xml:lang="cs"])
  • function chaining with the arrow operator => (to avoid deep nesting)
    • upper-case(/descendant::dataset[1]/title[1])
    • /descendant::dataset[1]/title[1] => upper-case()

XSL Transformations - XSLT

XSLT - example

Input: the catalog document from the XPath example.

Output of the XSLT transformation:

<html xmlns:fn="http://www.w3.org/2005/xpath-functions" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <head>
    <title>My catalog</title>
  </head>
  <body>
    <h1>My catalog</h1>
    <h2>Bikesharing in Brno</h2>
    <p>Number of distributions: 2</p>
    <h2>Bikesharing in Prague</h2>
    <p>Number of distributions: 1</p>
  </body>
</html>

XSLT - example

XSLT stylesheet that produces the HTML output from the catalog document:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:fn="http://www.w3.org/2005/xpath-functions">
  <xsl:output method="html" encoding="UTF-8" indent="yes"/>
  <xsl:template match="catalog">
    <html>
      <head>
        <title>
          <xsl:value-of select="title[@xml:lang='en']"/>
        </title>
      </head>
      <body>
        <h1>
          <xsl:value-of select="title[@xml:lang='en']"/>
        </h1>
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>
  <xsl:template match="dataset">
    <h2>
      <xsl:value-of select="title[@xml:lang='en']"/>
    </h2>
    <p>Number of distributions: <xsl:value-of select="count(descendant::distribution)"/></p>
  </xsl:template>
  <xsl:template match="text()">
    <xsl:apply-templates/>
  </xsl:template>
</xsl:stylesheet>

XSLT - Specifications

  • XSL Transformations (XSLT) Version 1.0
    • W3C Recommendation, 1999
    • what we will cover mostly
  • XSL Transformations (XSLT) Version 2.0
    • W3C Recommendation, 2007
    • most widely implemented
  • XSL Transformations (XSLT) Version 3.0
    • W3C Recommendation, 2017

XSLT principles - stylesheet, template, processor

Input

  • one or more XML documents

Output

  • one or more text files
    • XML, HTML
    • RDF Turtle
    • TXT
    • ...

XSLT stylesheet

  • is an XML document
  • stylesheet root element
  • set of templates

XSLT template

  • matches part of input XML document using XPath expressions
  • produces output text

XSLT processor

  • goes through an input XML document
  • tries to match templates

XSLT - empty stylesheet

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:fn="http://www.w3.org/2005/xpath-functions">
  <xsl:output method="html" encoding="UTF-8" indent="yes" />
</xsl:stylesheet>

version attribute - version of XSLT used

xsl:output - specifies the output behavior of the XSLT processor

  • method
    • html, xhtml, xml - produces well-formed documents
    • text - pure text output
  • indent
    • yes - generates correct indentation for xml, html
    • no - only explicitly generated whitespace included in output

XSLT - first template

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:fn="http://www.w3.org/2005/xpath-functions">
  <xsl:output method="html" encoding="UTF-8" indent="yes" />
  <xsl:template match="catalog">
    <html>
      <head>
        <title>
          <xsl:value-of select="title[@xml:lang='en']"/>
        </title>
      </head>
      <body>
        <h1>
          <xsl:value-of select="title[@xml:lang='en']"/>
        </h1>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>

match - contains an XPath expression (pattern) that a node needs to match for the template to be applied

xsl:template

  • content goes to the output
    • here, we generate the HTML stub
  • xsl: elements get processed

e.g. xsl:value-of

  • select attribute contains an XPath expression
  • result of the expression replaces the xsl:value-of element in the output

XSLT - first template

Output:

<html xmlns:fn="http://www.w3.org/2005/xpath-functions" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    <title>My catalog</title>
  </head>
  <body>
    <h1>My catalog</h1>
  </body>
</html>

  • output is indented
  • whitespace is normalized
  • encoding indicated in the head

XSLT - second template

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:fn="http://www.w3.org/2005/xpath-functions">
  <xsl:output method="html" encoding="UTF-8" indent="yes" />
  <xsl:template match="catalog">...</xsl:template>
  <xsl:template match="dataset">
    <h2>
      <xsl:value-of select="title[@xml:lang='en']"/>
    </h2>
    <p>Number of distributions: <xsl:value-of select="count(descendant::distribution)"/></p>
  </xsl:template>
</xsl:stylesheet>

Output:

<html xmlns:fn="http://www.w3.org/2005/xpath-functions" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    <title>My catalog</title>
  </head>
  <body>
    <h1>My catalog</h1>
  </body>
</html>

Nothing new in the output… why?

XSLT - apply templates

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:fn="http://www.w3.org/2005/xpath-functions">
  <xsl:output method="html" encoding="UTF-8" indent="yes" />
  <xsl:template match="catalog">
    <html>
      <head>
        <title>
          <xsl:value-of select="title[@xml:lang='en']"/>
        </title>
      </head>
      <body>
        <h1>
          <xsl:value-of select="title[@xml:lang='en']"/>
        </h1>
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>
  <xsl:template match="dataset">...</xsl:template>
</xsl:stylesheet>

Output:

<html xmlns:fn="http://www.w3.org/2005/xpath-functions" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    <title>My catalog</title>
  </head>
  <body>
    <h1>My catalog</h1>
    My catalog
    Můj katalog
    This is my dummy catalog
    Toto je můj falešný katalog
    John Doe
    mailto:john@doe.org
    <h2>Bikesharing in Brno</h2><p>Number of distributions: 2</p>
    <h2>Bikesharing in Prague</h2><p>Number of distributions: 1</p>
  </body>
</html>

Unexpected: the text of the other children of catalog is copied to the output as well.

XSLT - implicit templates

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:fn="http://www.w3.org/2005/xpath-functions">
  <xsl:template match="*|/">
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="text()|@*">
    <xsl:value-of select="."/>
  </xsl:template>
  <xsl:template match="processing-instruction()|comment()"/>
</xsl:stylesheet>

  • Present implicitly - need to be overridden to change the behavior
  • Cause the text of elements (and of attributes, when they are selected) to be copied to the output
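
The effect of the implicit templates on elements and text nodes can be imitated outside of XSLT. This is a minimal sketch in Python's standard xml.etree.ElementTree, not an XSLT processor; builtin_rules is a hypothetical helper name, and attributes, comments and processing instructions are left out.

```python
import xml.etree.ElementTree as ET

def builtin_rules(elem):
    """Imitate the implicit rules: recurse into elements, copy text nodes."""
    out = [elem.text or ""]               # text node before the first child: copied
    for child in elem:
        out.append(builtin_rules(child))  # match="*": apply templates to the children
        out.append(child.tail or "")      # text node following the child: copied
    return "".join(out)

# Fragment of the catalog; no explicit templates exist for any of its elements.
doc = ET.fromstring(
    "<contact-point>"
    "<name>John Doe</name>"
    "<e-mail>mailto:john@doe.org</e-mail>"
    "</contact-point>"
)

result = builtin_rules(doc)
print(result)
```

All markup disappears and only the concatenated text remains, which is the kind of stray text that shows up when apply-templates reaches elements without a matching template.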

XSLT - implicit templates

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:fn="http://www.w3.org/2005/xpath-functions">
  <xsl:output method="html" encoding="UTF-8" indent="yes" />
  <xsl:template match="catalog">
    <html>
      <head>
        <title>
          <xsl:value-of select="title[@xml:lang='en']"/>
        </title>
      </head>
      <body>
        <h1>
          <xsl:value-of select="title[@xml:lang='en']"/>
        </h1>
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>
  <xsl:template match="dataset">
    <h2>
      <xsl:value-of select="title[@xml:lang='en']"/>
    </h2>
    <p>Number of distributions: <xsl:value-of select="count(descendant::distribution)"/></p>
  </xsl:template>
  <xsl:template match="text()"/>
</xsl:stylesheet>

Output:

<html xmlns:fn="http://www.w3.org/2005/xpath-functions" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <head>
    <title>My catalog</title>
  </head>
  <body>
    <h1>My catalog</h1>
    <h2>Bikesharing in Brno</h2>
    <p>Number of distributions: 2</p>
    <h2>Bikesharing in Prague</h2>
    <p>Number of distributions: 1</p>
  </body>
</html>

The empty template for text() overrides the implicit template.

XSLT - apply templates - select which

<?xml version="1.0" encoding="UTF-8"?>

<xsl:stylesheet version="2.0"

xmlns:xsl="http://www.w3.org/1999/XSL/Transform"

xmlns:xs="http://www.w3.org/2001/XMLSchema"

xmlns:fn="http://www.w3.org/2005/xpath-functions">

<xsl:output method="html" encoding="UTF-8" indent="yes" />

<xsl:template match="catalog">

<html>

<head>

<title>

<xsl:value-of select="title[@xml:lang='en']"/>

</title>

</head>

<body>

<h1>

<xsl:value-of select="title[@xml:lang='en']"/>

</h1>

<xsl:apply-templates select="datasets/dataset"/>

</body>

</html>

</xsl:template>

<xsl:template match="dataset">

<h2>

<xsl:value-of select="title[@xml:lang='en']"/>

</h2>

<p>Number of distributions: <xsl:value-of select="count(descendant::distribution)"/>

</p>

</xsl:template>

<xsl:template match="text()"/>

</xsl:stylesheet>

49

<html xmlns:fn="http://www.w3.org/2005/xpath-functions" xmlns:xs="http://www.w3.org/2001/XMLSchema">

<head>

<title>My catalog</title>

</head>

<body>

<h1>My catalog</h1>

<h2>Bikesharing in Brno</h2>

<p>Number of distributions: 2</p>

<h2>Bikesharing in Prague</h2>

<p>Number of distributions: 1</p>

</body>

</html>

select attribute: XPath expression selecting the nodes to which the templates will be applied next.
Default: child::node()
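The select attribute can also be combined with xsl:sort to control the processing order. A minimal sketch, assuming the catalog document from the XPath example as input; the text output format is chosen here only for brevity.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text" encoding="UTF-8"/>

  <xsl:template match="catalog">
    <!-- Only dataset elements are selected; the titles, descriptions and
         whitespace text nodes of catalog are never visited -->
    <xsl:apply-templates select="datasets/dataset">
      <!-- Process the selected datasets ordered by their English title -->
      <xsl:sort select="title[@xml:lang='en']"/>
    </xsl:apply-templates>
  </xsl:template>

  <xsl:template match="dataset">
    <xsl:value-of select="title[@xml:lang='en']"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

Because no other nodes are selected, the empty text() template is not needed in this variant.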


XSLT - named templates and parameters

<?xml version="1.0" encoding="UTF-8"?>

<xsl:stylesheet version="2.0"

xmlns:xsl="http://www.w3.org/1999/XSL/Transform"

xmlns:xs="http://www.w3.org/2001/XMLSchema"

xmlns:fn="http://www.w3.org/2005/xpath-functions">

<xsl:output method="html" encoding="UTF-8" indent="yes" />

<xsl:template match="catalog">

<html>

<head>

<xsl:call-template name="processTitle">

<xsl:with-param name="element">title</xsl:with-param>

<xsl:with-param name="lang">cs</xsl:with-param>

</xsl:call-template>

</head>

<body>

<xsl:call-template name="processTitle">

<xsl:with-param name="element">h1</xsl:with-param>

<xsl:with-param name="lang">en</xsl:with-param>

</xsl:call-template>

<xsl:apply-templates select="datasets/dataset"/>

</body>

</html>

</xsl:template>

<xsl:template match="dataset">

<xsl:call-template name="processTitle">

<xsl:with-param name="element">h2</xsl:with-param>

<xsl:with-param name="lang">en</xsl:with-param>

</xsl:call-template>

<p>Number of distributions: <xsl:value-of select="count(descendant::distribution)"/>

</p>

</xsl:template>

<xsl:template match="text()"/>

<xsl:template name="processTitle">

<xsl:param name="element" required="yes"/>

<xsl:param name="lang" required="yes"/>

<xsl:element name="{$element}">

<xsl:value-of select="title[@xml:lang=$lang]"/>

</xsl:element>

</xsl:template>

</xsl:stylesheet>

Named templates

  • name attribute instead of match attribute
  • accept parameters
    • xsl:param - definition in named template
    • $variable - access to the parameter or variable value in XPath
    • {$variable} - access to the value inside attribute values (attribute value template)
  • called using xsl:call-template
    • does not change the current node, the called template sees the same context
    • xsl:with-param - values passed when calling

xsl:element

  • creates an element on the output
  • name can be constant or {$variable}
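Parameters can also carry default values, which are used when the caller passes no xsl:with-param. A minimal sketch in XSLT 1.0, assuming the catalog document from the XPath example as input; the defaults 'h2' and 'en' are illustrative choices, not part of the stylesheet above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html" encoding="UTF-8" indent="yes"/>

  <xsl:template match="/">
    <body>
      <xsl:apply-templates select="catalog/datasets/dataset"/>
    </body>
  </xsl:template>

  <xsl:template match="dataset">
    <!-- No parameters passed: the defaults 'h2' and 'en' apply -->
    <xsl:call-template name="processTitle"/>
    <!-- Override one parameter, keep the default for the other -->
    <xsl:call-template name="processTitle">
      <xsl:with-param name="lang" select="'cs'"/>
    </xsl:call-template>
  </xsl:template>

  <xsl:template name="processTitle">
    <!-- Default values (assumptions for this sketch) -->
    <xsl:param name="element" select="'h2'"/>
    <xsl:param name="lang" select="'en'"/>
    <!-- The current node is still the dataset that called the template -->
    <xsl:element name="{$element}">
      <xsl:value-of select="title[@xml:lang=$lang]"/>
    </xsl:element>
  </xsl:template>

</xsl:stylesheet>
```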


XSLT - global variables, modes, if

<?xml version="1.0" encoding="UTF-8"?>

<xsl:stylesheet version="2.0"

xmlns:xsl="http://www.w3.org/1999/XSL/Transform"

xmlns:xs="http://www.w3.org/2001/XMLSchema"

xmlns:fn="http://www.w3.org/2005/xpath-functions">

<xsl:output method="html" encoding="UTF-8" indent="yes" />

<xsl:variable name="lang">en</xsl:variable>

<xsl:template match="catalog">

<html>

<head>

<xsl:apply-templates mode="head"/>

</head>

<body>

<xsl:apply-templates mode="catalog"/>

</body>

</html>

</xsl:template>

<xsl:template match="dataset" mode="head"/>

<xsl:template match="dataset" mode="catalog">

<xsl:apply-templates mode="dataset"/>

<p>Number of distributions: <xsl:value-of select="count(descendant::distribution)"/>

</p>

</xsl:template>

<xsl:template match="title" mode="head">

<xsl:if test="@xml:lang=$lang">

<xsl:element name="title">

<xsl:value-of select="text()"/>

</xsl:element>

</xsl:if>

</xsl:template>

<xsl:template match="title" mode="catalog">

<xsl:if test="@xml:lang=$lang">

<xsl:element name="h1">

<xsl:value-of select="text()"/>

</xsl:element>

</xsl:if>

</xsl:template>

<xsl:template match="text()" mode="#all"/>

</xsl:stylesheet>

Global variable

  • defined in the xsl:stylesheet root element using xsl:variable
  • accessible in the whole stylesheet
    • e.g. $lang

Mode

  • ability to process the same nodes in different ways
    • different templates with the same match
  • specified in xsl:apply-templates
  • used in unnamed xsl:template
    • #all matches all modes
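The stylesheet above applies templates in mode dataset but defines no template for that mode, so dataset titles fall through to the built-in rules and are suppressed by the text() template. A hypothetical template for that mode could look like the sketch below; the match pattern dataset/title and the h2 element are assumptions, and the catalog document from the XPath example is assumed as input.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html" encoding="UTF-8" indent="yes"/>

  <xsl:variable name="lang">en</xsl:variable>

  <xsl:template match="/">
    <body>
      <xsl:apply-templates select="catalog/datasets/dataset" mode="catalog"/>
    </body>
  </xsl:template>

  <xsl:template match="dataset" mode="catalog">
    <xsl:apply-templates mode="dataset"/>
  </xsl:template>

  <!-- Hypothetical addition: only titles that are direct children of
       dataset match, so titles nested in distributions stay suppressed -->
  <xsl:template match="dataset/title" mode="dataset">
    <xsl:if test="@xml:lang=$lang">
      <h2><xsl:value-of select="text()"/></h2>
    </xsl:if>
  </xsl:template>

  <xsl:template match="text()" mode="#all"/>

</xsl:stylesheet>
```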


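If

  • xsl:if - the content is processed only when the XPath expression in the test attribute evaluates to true
  • there is no else branch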


Some remaining XSLT 1.0 features

"Switch"
<xsl:choose>
  <xsl:when test='$level=1'>
    <xsl:number format="i"/>
  </xsl:when>
  <xsl:when test='$level=2'>
    <xsl:number format="a"/>
  </xsl:when>
  <xsl:otherwise>
    <xsl:number format="1"/>
  </xsl:otherwise>
</xsl:choose>

For each
<xsl:for-each select="item">
  <xsl:sort select="."/>
  <p>
    <xsl:number value="position()" format="1. "/>
    <xsl:value-of select="."/>
  </p>
</xsl:for-each>

Include - XML-based inclusion

<xsl:stylesheet version="1.0"

xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:include href="article.xsl"/>

Import - templates in the importing stylesheet take precedence over imported templates

<xsl:stylesheet version="1.0"

xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:import href="article.xsl"/>
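The precedence rule can be used to wrap an imported template instead of replacing it. A minimal sketch, assuming article.xsl exists and handles a hypothetical para element; both are assumptions for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- xsl:import must precede all other top-level elements -->
  <xsl:import href="article.xsl"/>

  <!-- Takes precedence over any template for para in article.xsl
       (para is a hypothetical element name) -->
  <xsl:template match="para">
    <div class="para">
      <!-- Invoke the overridden template from the imported stylesheet,
           or the built-in rule if article.xsl has none for para -->
      <xsl:apply-imports/>
    </div>
  </xsl:template>

</xsl:stylesheet>
```

With xsl:include the same para template would instead compete with the included one at equal import precedence, and xsl:apply-imports would not reach it.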


Some XSLT 2.0 and 3.0 features

Grouping of data
<xsl:for-each-group select="cities/city" group-by="@country">
  <tr>
    <td><xsl:value-of select="@country"/></td>
    <td>
      <xsl:value-of select="current-group()/@name" separator=", "/>
    </td>
    <td><xsl:value-of select="sum(current-group()/@pop)"/></td>
  </tr>
</xsl:for-each-group>
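To see the grouping at work, the fragment can be embedded in a complete stylesheet. The sketch below assumes a small cities document as input; the sample data in the comment is hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html" encoding="UTF-8" indent="yes"/>

  <!-- Assumed input (hypothetical sample data):
       <cities>
         <city name="Milano"  country="Italia"      pop="5"/>
         <city name="Paris"   country="France"      pop="7"/>
         <city name="München" country="Deutschland" pop="4"/>
         <city name="Lyon"    country="France"      pop="2"/>
         <city name="Venezia" country="Italia"      pop="1"/>
       </cities> -->

  <xsl:template match="/">
    <table>
      <!-- One group per distinct country, in order of first appearance -->
      <xsl:for-each-group select="cities/city" group-by="@country">
        <tr>
          <!-- The context item is the first city of the current group -->
          <td><xsl:value-of select="@country"/></td>
          <td><xsl:value-of select="current-group()/@name" separator=", "/></td>
          <td><xsl:value-of select="sum(current-group()/@pop)"/></td>
        </tr>
      </xsl:for-each-group>
    </table>
  </xsl:template>

</xsl:stylesheet>
```

For the assumed input this yields three rows: Italia (Milano, Venezia, 6), France (Paris, Lyon, 9) and Deutschland (München, 4).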

Multiple output documents
<xsl:result-document href="foo.html">
  <!-- add instructions to generate document content here -->
</xsl:result-document>

Regular expressions
<!-- This example transforms dates of the form "12/8/2003" into ISO 8601 standard form: "2003-12-08". -->
<xsl:analyze-string select="$date" regex="([0-9]+)/([0-9]+)/([0-9]{{4}})">
  <xsl:matching-substring>
    <xsl:number value="regex-group(3)" format="0001"/>
    <xsl:text>-</xsl:text>
    <xsl:number value="regex-group(1)" format="01"/>
    <xsl:text>-</xsl:text>
    <xsl:number value="regex-group(2)" format="01"/>
  </xsl:matching-substring>
</xsl:analyze-string>

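Streaming
<!-- xsl:source-document with streamable="yes" is the XSLT 3.0 Recommendation form; working drafts used xsl:stream -->
<xsl:template match="/">
  <xsl:source-document href="books.xml" streamable="yes">
    <xsl:iterate select="/books/book">
      <xsl:result-document href="{concat('book', position(),'.xml')}">
        <xsl:copy-of select="."/>
      </xsl:result-document>
      <xsl:next-iteration/>
    </xsl:iterate>
  </xsl:source-document>
</xsl:template>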

Higher-order functions
<xsl:value-of select="$f1(2)"/>
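The call $f1(2) presumes that $f1 holds a function item. A minimal sketch of how such a variable could be defined in XSLT 3.0; the function body (doubling the argument) is an assumption, and the processor must support higher-order functions.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text" encoding="UTF-8"/>

  <!-- An inline (anonymous) function stored in a variable;
       the body is an illustrative assumption -->
  <xsl:variable name="f1" select="function($x) { $x * 2 }"/>

  <xsl:template match="/">
    <!-- Dynamic function call: outputs 4 -->
    <xsl:value-of select="$f1(2)"/>
  </xsl:template>

</xsl:stylesheet>
```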

Text processing: CSV, JSON, … on input
<xsl:variable name="header" select="tokenize(unparsed-text-lines($csv)[1], $sep)"/>


Examples


XSLT Example - IANA registry - generating HTML

<?xml version='1.0' encoding='UTF-8'?>

<?xml-stylesheet type="text/xsl" href="media-types.xsl"?>

<?oxygen RNGSchema="media-types.rng" type="xml"?>

<registry xmlns="http://www.iana.org/assignments" id="media-types">

<title>Media Types</title>

<category>Multipurpose Internet Mail Extensions (MIME) and Media Types</category>

<updated>2021-03-10</updated>

<registration_rule>Expert Review for Vendor and Personal Trees.</registration_rule>

<expert>Ned Freed, Alexey Melnikov, Murray Kucherawy (backup)</expert>

<xref type="rfc" data="rfc6838"/>

<xref type="rfc" data="rfc4855"/>

...


The xml-stylesheet processing instruction links to the XSLT stylesheet that transforms the XML to HTML.


XSLT Example - Generating RDF Turtle

<?xml version="1.0" encoding="UTF-8"?>

<xsl:stylesheet version="2.0"

xmlns:xsl="http://www.w3.org/1999/XSL/Transform"

xmlns:xs="http://www.w3.org/2001/XMLSchema"

xmlns:fn="http://www.w3.org/2005/xpath-functions">

<xsl:output method="text" encoding="UTF-8" />

<xsl:variable name="prefix">https://ex.org/resource/</xsl:variable>

<xsl:variable name="catalogIRI" select="concat($prefix, 'Catalog')"/>

<xsl:template match="catalog">

@prefix dcat: &lt;http://www.w3.org/ns/dcat#&gt; .

@prefix dcterms: &lt;http://purl.org/dc/terms/&gt; .

&lt;<xsl:value-of select="$catalogIRI"/>&gt; a dcat:Catalog .

<xsl:apply-templates>

<xsl:with-param name="currentIRI" select="$catalogIRI"/>

</xsl:apply-templates>

</xsl:template>

<xsl:template match="dataset">

<xsl:variable name="datasetIRI" select="concat($prefix, 'dataset/', fn:position())"/>

&lt;<xsl:value-of select="$catalogIRI"/>&gt; dcat:dataset &lt;<xsl:value-of select="$datasetIRI"/>&gt; .

&lt;<xsl:value-of select="$datasetIRI"/>&gt; a dcat:Dataset .

<xsl:apply-templates select="title">

<xsl:with-param name="currentIRI" select="$datasetIRI"/>

</xsl:apply-templates>

</xsl:template>

<xsl:template match="title">

<xsl:param name="currentIRI"/>

&lt;<xsl:value-of select="$currentIRI"/>&gt; dcterms:title &quot;<xsl:value-of select="text()"/>&quot;@<xsl:value-of select="@xml:lang"/> .

</xsl:template>

<xsl:template match="text()" mode="#all"/>

</xsl:stylesheet>

@prefix dcat: <http://www.w3.org/ns/dcat#> .

@prefix dcterms: <http://purl.org/dc/terms/> .

<https://ex.org/resource/Catalog> a dcat:Catalog .

<https://ex.org/resource/Catalog> dcterms:title "My catalog"@en .

<https://ex.org/resource/Catalog> dcterms:title "Můj katalog"@cs .

<https://ex.org/resource/Catalog> dcat:dataset <https://ex.org/resource/dataset/2> .

<https://ex.org/resource/dataset/2> a dcat:Dataset .

<https://ex.org/resource/dataset/2> dcterms:title "Bikesharing in Brno"@en .

<https://ex.org/resource/dataset/2> dcterms:title "Sdílení kol v Brně"@cs .

<https://ex.org/resource/Catalog> dcat:dataset <https://ex.org/resource/dataset/4> .

<https://ex.org/resource/dataset/4> a dcat:Dataset .

<https://ex.org/resource/dataset/4> dcterms:title "Bikesharing in Prague"@en .

<https://ex.org/resource/dataset/4> dcterms:title "Sdílení kol v Praze"@cs .
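The dataset IRIs in the output above end in 2 and 4 rather than 1 and 2, most likely because position() counts the whitespace text nodes between the dataset elements, which the default apply-templates also selects. A minimal sketch of one way to get consecutive numbers, assuming the same catalog input; it prints only the dataset IRIs.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text" encoding="UTF-8"/>

  <!-- Drop whitespace-only text nodes from the input tree, so that
       position() counts only the dataset elements -->
  <xsl:strip-space elements="*"/>

  <xsl:variable name="prefix">https://ex.org/resource/</xsl:variable>

  <xsl:template match="dataset">
    <xsl:value-of select="concat($prefix, 'dataset/', position())"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <xsl:template match="text()"/>

</xsl:stylesheet>
```

Alternatives are selecting the elements explicitly with select="datasets/dataset", or computing the number as count(preceding-sibling::dataset) + 1, which does not depend on how the template was invoked.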


Literature

Jiří Kosek - XML pro každého (2004!) - https://www.kosek.cz/xml/index.html (in Czech)
