1 of 68

Modern Data Formats: Introduction

Jakub Klímek

2 of 68

Some developers think mainly in terms of apps...


My computer

My app

My database

My app

My data

Data as part of an app

3 of 68

In public administration...


Computer

App

Database

App

Data

Give me the data, I want to build another app

No, you only paid for an app

Data as part of an app

App

Possible vendor lock-in

4 of 68

Suddenly, the app needs to work with another app


His computer

His app

His database

His app

His data

Her computer

Her app

Her database

Her app

Her data

I will send you that

I will send you this

5 of 68

Suddenly, the app needs to work with another app


I will send you that

I will send you this

6 of 68

Data independent of applications


My computer

My app

My database

My app

Data

My computer

My app

My database

My app

7 of 68

Data independent of applications


Data

8 of 68

OK, data is independent. More problems?

Use of improper formats for a given use case

  • e.g. tabular data in hierarchical data format (XML, JSON)

Not following specifications

  • errors or unnecessary work when data is shared
  • whose fault is it?
    • the producer?
    • the consumer?
  • ultimately, who will pay for mitigating the issue?


9 of 68

Point of this course

  1. overview of data formats, specifications, tools and use cases
  2. ability to choose a proper data format for a given use case
  3. thinking about data independently of applications and at various levels of abstraction
    • conceptual level - what the data is about
    • logical level - how the data is structured using a given technology/format
    • physical level - what the files look like in storage


10 of 68

Conceptual view of data


11 of 68

Conceptual domain model

Independent of any particular technology or representation. Answers:

What real world entities are described?

What are their properties?

How are they connected?

A conceptual model can be discussed with non-IT personnel.


12 of 68

Example: National Open Data Catalog


13 of 68

Example: National Open Data Catalog - Dataset


14 of 68

Conceptual domain model - UML Class diagrams

Class: Catalog

This is saying: “There are things in the real world of type Catalog.”


15 of 68

Conceptual domain model - UML Class diagrams

Class: Catalog
Attributes: title, description, homepage

This is saying: “Each instance of a catalog has a title, description and a homepage.”


16 of 68

Conceptual domain model - UML Class diagrams

Class: Catalog
Attributes: title, description, homepage

Class: Contact point
Attributes: name, e-mail

This is saying: “There are contact points, each has a name and an e-mail.”


17 of 68

Conceptual domain model - UML Class diagrams

Class: Catalog
Attributes: title, description, homepage

Class: Contact point
Attributes: name, e-mail

Association: contact point

  • Association end 1: Catalog, 0..*
  • Association end 2: Contact point, 0..1

This is saying:

  • An instance of Catalog is connected to at most one instance of Contact point
  • An instance of Contact point is connected to any number of instances of Catalog, possibly none


18 of 68

Conceptual domain model - UML Class diagrams

Class: Dataset
Attributes: title, description

Association: dataset

  • Association end 1: Catalog, 0..*
  • Association end 2: Dataset, 0..*

This is saying:

  • An instance of a Catalog may or may not be connected to an arbitrary number of instances of Dataset
  • An instance of a Dataset may or may not be connected to an arbitrary number of instances of Catalog
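The multiplicities above can be mirrored in code once a technology is chosen. The following is a minimal sketch, assuming Python dataclasses as the target; the class and attribute names follow the slides, while the sample values are purely illustrative. Note that this is already a logical-level representation, not the conceptual model itself.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ContactPoint:
    name: str
    email: str


@dataclass
class Dataset:
    title: str
    description: str


@dataclass
class Catalog:
    title: str
    description: str
    homepage: str
    # 0..1: a catalog has at most one contact point
    contact_point: Optional[ContactPoint] = None
    # 0..*: a catalog lists any number of datasets, possibly none
    datasets: List[Dataset] = field(default_factory=list)


# Illustrative instances (values are made up for this sketch)
contact = ContactPoint(name="Open data team", email="opendata@example.org")
catalog = Catalog(
    title="Example catalog",
    description="Illustrative catalog instance",
    homepage="https://example.org/",
    contact_point=contact,
)
catalog.datasets.append(Dataset(title="Example dataset", description="Illustrative"))
print(len(catalog.datasets))
```

The many-to-many side (a dataset listed in several catalogs) is simply the same Dataset object appearing in more than one `datasets` list.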


19 of 68

Conceptual domain model - UML Class diagrams

Attributes of Dataset:

  • keywords [0..*], specification [0..*]
  • documentation
  • spatial resolution, temporal resolution

Association: part of

  • Association end 1: Dataset, 0..1
  • Association end 2: Dataset, 0..*

Association: contact point

  • Association end 1: Contact point, 0..1
  • Association end 2: Dataset, 0..*


20 of 68

Conceptual domain model - UML Class diagrams

Associations:

  • theme
  • frequency
  • item from RÚIAN
  • spatial coverage
  • EuroVoc classification
  • Time interval


21 of 68

Conceptual domain model - UML Class diagrams

Association: distribution

  • Association end 1: Dataset, 1..1
  • Association end 2: Distribution, 0..*


22 of 68

Conceptual domain model - UML Class diagrams

Association: media type

  • Association end 1: Distribution, 0..*
  • Association end 2: Media type, 1..1

Association: package format

  • Association end 1: Distribution, 0..*
  • Association end 2: Media type, 0..1

Association: compression format

  • Association end 1: Distribution, 0..*
  • Association end 2: Media type, 0..1


23 of 68

Conceptual domain model - UML Class diagrams


24 of 68

Conceptual domain model - UML Class diagrams


25 of 68

Conceptual domain model - UML Class diagrams

DCAT-AP-CZ

  • Czech specification of how to represent data catalogs
    • 2021
  • Based on DCAT-AP 2.1.1
    • European application profile
    • European Commission, 2022
  • Based on DCAT 2
    • W3C Recommendation, 2020

Representations in:

RDF, JSON, CSV


26 of 68

Data models vs. data formats vs. data schemas


27 of 68

Data models - logical view of data - graphs

Resource Description Framework (RDF) model

Labeled Property Graph (LPG) model

[Figure: the same coffee-shop data drawn as two graphs. RDF graph: nodes identified by IRIs (https://...) with literal values such as "Coffee shop" and "Ann"; node labels Ann, Dan, Steve, John, Place and V60. Labeled property graph: four :Person nodes (name: John, age: 42; name: Steve, age: 24; name: Dan, age: 87; name: Ann, age: 16), a :CoffeeShop node and a :BrewingMethod node (name: V60, duration: 3 minutes), connected by KNOWS, EMPLOYS, VISITS and SERVES relationships, some carrying a since property (2021-03-01, 2010-12-31, 2020-03-16).]

28 of 68

Data models - logical view of data - hierarchies/trees

Document Object Model (DOM)

JSON (both format and model)

[Figure: the coffee-shop data drawn as two trees. DOM tree node labels: document, root <coffeeShops>, attribute number, element <coffeeShop>, element <name> with text "Place", element <employees>, element <employee>, elements <name> and <age>. JSON tree node labels: array, object, keys name and employees, values "Place" and "Ann".]

29 of 68

Data models - logical view of data - tables

Relational model

name   age  knows
Ann    16
John   42   Steve
Steve  24
Dan    87   Steve

name   employee
Place  Ann
Place  John

30 of 68

Data formats - physical view of data

Graph

  • RDF graph model
    • Text-based: N-Triples, N-Quads, Turtle, TriG, RDF/XML, JSON-LD, RDFa
    • Binary: HDT
  • Property graph
    • CSV, JSON, GraphML, Cypher Script

Hierarchical

  • DOM
    • XML, HTML
  • JSON
    • JSON, XML


How data using a certain data model is serialized into files / sent over network

Relational

  • CSV, SQL dump
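To make the difference between a data model and a data format concrete, the sketch below serializes the same two records (values taken from the relational example in these slides) once as CSV and once as JSON, using only the Python standard library. The variable names are illustrative assumptions.

```python
import csv
import io
import json

people = [
    {"name": "Ann", "age": 16},
    {"name": "John", "age": 42},
]

# Tabular data written in the CSV format
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["name", "age"], lineterminator="\n")
writer.writeheader()
writer.writerows(people)
csv_text = buffer.getvalue()

# The same records written in the JSON format
json_text = json.dumps(people)

print(csv_text)
print(json_text)
```

The in-memory records stay the same; only the physical representation changes.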

31 of 68

Data schemas

Annotations and constraints applicable to instances of data formats, allowing the data to be better described and validated

CSV

  • Schema language
    • CSV on the Web

RDF

  • Schema language
    • SHACL
    • ShEx

JSON

  • Schema language
    • JSON Schema

XML

  • Schema languages
    • DTD
    • XML Schema
    • Relax NG
    • Schematron


32 of 68

Specific data formats using meta-formats

CSV, JSON, XML, … sometimes called meta-formats

They serve as “host” formats for use-case specific formats

Data schemas used to define these specific formats

JSON

  • GeoJSON

CSV

  • General Transit Feed Specification (GTFS)
  • Example of Prague public transport

XML

  • SVG, Atom, RSS 2.0, Office Open XML (.docx, .xlsx), …

RDF

  • DCAT, Schema.org
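As a small illustration of a specific format hosted in a meta-format, the sketch below reads a GeoJSON Feature with a generic JSON parser. The coordinates and the property value are made-up sample data; the structure (type, geometry, properties, longitude before latitude) follows GeoJSON.

```python
import json

# A GeoJSON Feature with a Point geometry; coordinates are [longitude, latitude].
feature_text = """
{
  "type": "Feature",
  "geometry": {"type": "Point", "coordinates": [14.42, 50.09]},
  "properties": {"name": "Prague"}
}
"""

# Any JSON parser can read the file...
feature = json.loads(feature_text)

# ...but what the keys and the coordinate order mean is defined by GeoJSON.
longitude, latitude = feature["geometry"]["coordinates"]
print(feature["type"], longitude, latitude)
```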


33 of 68

Generic data format properties

  • open vs. closed
  • machine-readability
  • binary vs. text-based


34 of 68

Open format?


35 of 68

Open vs. closed formats

Open

Specification available on the Web, freely accessible to anyone, with no limitation on its usage.

  • Meta-formats e.g.: XML, JSON, CSV, RDF
  • Specific formats
    • SVG, GeoJSON, ...

Closed

  • specification not accessible
  • need for payment for access to specification
  • need for registration
  • need for certification of library/application claiming compatibility

Examples

  • railML.org
    • XML based
    • need for certification


36 of 68

Machine-readable format

“All files are machine-readable, because all files are, in the end, read by machines”

The second part is true, but that is not what machine readability is about.

37 of 68

Machine-readable format

“Open formats like CSV, XML, JSON or RDF Turtle and Excel .xlsx files are machine readable”


38 of 68

Machine-readable format - CSV, JSON

,,,,,,,,,,,,

Back to TOC,,,,,,,,,,,,

r2 : R2. Do you have permanent residence in Brno?,,,,,,,,,,,,

,%,count,,,,,,,,,,

Yes,89.1%,1385,,,,,,,,,,

No,10.9%,169,,,,,,,,,,

TOTAL,100.0%,1554,,,,,,,,,,

"Total sample, Weight: Weight, base n = 1554",,,,,,,,,,,,

,,,,,,,,,,,,

Back to TOC,,,,,,,,,,,,

r3 : R3. For how long have you lived in Brno? ,,,,,,,,,,,,

,%,count,,,,,,,,,,

"ico","nazev","udaje","vymazDatum","zapisDatum"

"3571092","Nadace RK CARE","[{hlavicka=Spisová značka;zapisDatum=2014-11-20;hodnotaText=N 521/KSBR;udajTyp={kod=SPIS_ZN;nazev=spisová značka};spisZn={soud={kod=KSBR;nazev=Krajský soud v Brně};oddil=N;vlozka=521}}, {hlavicka=Název;zapisDatum=2014-11-20;hodnotaText=Nadace RK CARE;udajTyp={kod=NAZEV;nazev=název}}, {hlavicka=Sídlo;zapisDatum=2014-11-20;udajTyp={kod=SIDLO;nazev=sídlo};adresa={statNazev=Česká republika;obec=Lipůvka;castObce=Lipůvka;cisloPo=385;psc=67922;okres=Blansko}}, {hlavicka=Identifikační číslo;zapisDatum=2014-11-20;hodnotaText=3571092;udajTyp={kod=ICO;nazev=identifikační číslo}}, {hlavicka=Právní forma;zapisDatum=2014-11-20;hodnotaText=nad;udajTyp={kod=PRAVNI_FORMA;nazev=právní forma};pravniForma={kod=nad;nazev=Nadace;zkratka=nad}}, {hlavicka=Účel nadace;zapisDatum=2014-11-20;udajTyp={kod=UCEL_SUBJEKTU_SEKC


39 of 68

Machine-readable format - XML

<?xml version="1.0" encoding="UTF-8"?>

<PvsRejstrikData rejstrik="" operace="1" xmlns="http://portal.gov.cz/portal/xsd/PvsRejstrikData">

<TYPE>datová sada</TYPE>

<NAZEV>Smlouvy SŽDC 2017</NAZEV>

<POPIS>Uzavřené smlouvy organizace Správa železniční a dopravní cesty (resort dopravy) v roce 2017</POPIS>

<HOMEPAGE></HOMEPAGE>

<PERIODICITY></PERIODICITY>

<SPATIAL_TYPE></SPATIAL_TYPE>

<SPATIAL_TYPE_TXT></SPATIAL_TYPE_TXT>

<SPATIAL_CODE></SPATIAL_CODE>

<SPATIAL_CODE_TXT>Česká republika</SPATIAL_CODE_TXT>

<THEME></THEME>

<THEME_TXT>-</THEME_TXT>

<KEYWORDS>smlouva</KEYWORDS>

<STAV>zpracováno 2017-03-29 15:54:05</STAV>

<PROBLEMY></PROBLEMY>

<x-priloha MimeTyp="application/xml" Jmeno="data.xml">PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0iVVRGLTgiIHN0YW5kYWxvbmU9Im5vIj8+CjxkYXRhc2V0IHhtbG5zPSJodHRwOi8vcG9ydGFsLmdvdi5jei9wb3J0YWwveHNkL1B2c1JlanN0cmlrRGF0YSIgSUQ9IiIgb3BlcmFjZT0iMSI+CiAgPHRpdGxlPlNtbG91dnkgU8W9REMgMjAxNzwvdGl0bGU+CiAgPGRlc2NyaXB0aW9uPlV6YXbFmWVuw6kgc21sb3V2eSBvcmdhbml6YWNlIFNwcsOhdmEgxb5lbGV6bmnEjW7DrSBhIGRvcHJhdm7DrSBjZXN0eSAocmVzb3J0IGRvcHJhdnkpIHYgcm9jZSAyMDE3PC9kZXNjcmlwdGlvbj4KICA8YWNjcnVhbFBlcmlvZGljaXR5PlIvUDFNPC9hY2NydWFsUGVyaW9kaWNpdHk+CiAgPHNwYXRpYWw+CiAgICA8dHlwZT5TVDwvdHlwZT4KICAgIDxub3RhdGlvbj4xPC9ub3RhdGlvbj4KICA8L3NwYXRpYWw+CiAgPHRlbXBvcmFsPgogICAgPHN0YXJ0RGF0ZT4yMDE3LTAxLTAxPC9zdGFydERhdGU+CiAgICA8ZW5kRGF0ZT4yMDE3LTEyLTMxPC9lbmREYXRlPgogIDwvdGVtcG9yYWw+CiAgPGtleXdvcmQ+c21sb3V2YTwva2V5d29yZD4KICA8ZGlzdHJpYnV0aW9uPgogICAgPGFjY2Vzc1VSTD5odHRwOi8vd3d3Lm1kY3IuY3ovTURDUi9tZWRpYS9vdGV2cmVuYWRhdGEvc21sb3V2eS8yMDE3L3NtbG91dnlfc3pkY18yMDE3LmNzdjwvYWNjZXNzVVJMPgogICAgPGRvd25sb2FkVVJMPmh0dHA6Ly93d3cubWRjci5jei9NRENSL21lZGlhL290ZXZyZW5hZGF0YS9zbWxvdXZ5LzIwMTcvc21sb3V2eV9zemRjXzIwMTcuY3N2PC9kb3dubG9hZFVSTD4KICAgIDxmb3JtYXQ+dGV4dC9jc3Y8L2Zvcm1hdD4KICAgIDxsaWNlbnNlPmh0dHBzOi8vcG9ydGFsLmdvdi5jei9wb3J0YWwvb3N0YXRuaS92b2xueS1wcmlzdHVwLWstZHMuaHRtbDwvbGljZW5zZT4KICA8L2Rpc3RyaWJ1dGlvbj4KPC9kYXRhc2V0Pgo=</x-priloha>

</PvsRejstrikData>
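The x-priloha element above carries the actual record as a Base64-encoded attachment, so an XML parser alone cannot reach it. A minimal sketch of the extra decoding step a consumer has to add, using only the first 48 characters of the payload shown above:

```python
import base64

# First 48 characters of the <x-priloha> payload from the example above.
payload_prefix = "PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0iVVRGLTgi"

# Base64 text -> bytes -> UTF-8 text
decoded = base64.b64decode(payload_prefix).decode("utf-8")
print(decoded)
```

The decoded text is the beginning of another XML document, which then has to be parsed separately.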


40 of 68

Machine-readable format - XLSX


41 of 68

Machine-readable format

Machine readability is not a property of a format

  • Depends on the form of a particular data instance
  • Says whether the data is easily processed by appropriate applications
  • For example
    • tabular data: structured as table: rows, columns and cells well formatted, easily processed
    • textual data: individual characters easily accessible, i.e. without OCR
    • ...
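As an illustration of the tabular case, the sketch below feeds two CSV texts to the same standard parser. The first is a tidied version of the survey data from the earlier CSV example (the column name `answer` is an assumption, since the original header cell is empty); the second is an abbreviated fragment of the file as published.

```python
import csv
import io

tidy = "answer,count\nYes,1385\nNo,169\n"
report_style = (
    "Back to TOC,,,\n"
    "r2 : R2. Do you have permanent residence in Brno?,,,\n"
    ",%,count,\n"
    "Yes,89.1%,1385,\n"
)

# Well-formed table: one header row, then one record per line.
rows = list(csv.DictReader(io.StringIO(tidy)))
total = sum(int(row["count"]) for row in rows)

# Report-style export: the first line is taken as the header, so the
# keys are navigation text and empty strings, not column names.
report_rows = list(csv.DictReader(io.StringIO(report_style)))

print(total)
print(list(report_rows[0].keys()))
```

Both inputs are syntactically valid CSV; only the first can be processed without ad-hoc cleanup.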


42 of 68

Binary vs. text based formats

“Binary format means that the file is stored as 1s and 0s”

-- a student at a recent state exam

This is, of course, also true of text-based file formats.

43 of 68

Binary vs. text based formats

Binary files

  • Their structure may be defined at the level of individual bits
  • a.k.a. “non-text” file
  • Not readable by text editors
  • Viewable by hex editors

Text-based files

  • Contain text
  • Typically structured as characters on lines
  • Viewable by text editors
  • Also viewable by hex editors
  • Text is encoded into 1s and 0s using character encoding


44 of 68

Text-based formats - character encoding - US-ASCII

Character encoding - representation of characters as binary sequences (numbers)

US-ASCII using 7 bits to represent 1 character
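A short sketch of what "representation of characters as numbers" means in practice. The sample text is arbitrary.

```python
text = "Hi"

# Each US-ASCII character maps to a number below 128, so it fits into 7 bits.
encoded = text.encode("ascii")
codes = list(encoded)
bits = [format(code, "07b") for code in codes]

print(codes)
print(bits)
```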


Original by: User:Vanessaezekowitz, CC BY-SA 3.0, via Wikimedia Commons

45 of 68

Text-based formats - newline representations

  • CR - carriage return - \r
  • LF - line feed - \n - Unix/Linux, macOS
  • CR LF - both of them - \r\n - Windows
  • See all variants (Wikipedia)
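Consumers of text-based data usually have to accept all three conventions. A minimal sketch of splitting and normalizing a text with mixed line endings; the sample string is made up.

```python
mixed = "first\r\nsecond\nthird\rfourth"

# str.splitlines() treats CR, LF and CR LF as line boundaries.
lines = mixed.splitlines()

# Normalize to LF; CR LF is replaced first so it does not become two newlines.
normalized = mixed.replace("\r\n", "\n").replace("\r", "\n")

print(lines)
print(repr(normalized))
```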



47 of 68

Text-based formats - character encoding - UTF-8

1 to 4 bytes represent one character

1-byte sequences are identical to US-ASCII

frequently used non-ASCII characters, e.g. Czech letters with diacritics, use 2 bytes

emojis use 4 bytes

Number of bytes  First code point  Last code point  Byte 1    Byte 2    Byte 3    Byte 4
1                U+0000            U+007F           0xxxxxxx
2                U+0080            U+07FF           110xxxxx  10xxxxxx
3                U+0800            U+FFFF           1110xxxx  10xxxxxx  10xxxxxx
4                U+10000           U+10FFFF         11110xxx  10xxxxxx  10xxxxxx  10xxxxxx
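The byte counts from the table can be checked directly. A minimal sketch with one sample character per row of the table; the choice of characters is illustrative.

```python
samples = ["A", "č", "€", "💩"]

# Number of UTF-8 bytes needed for each sample character
utf8_lengths = {char: len(char.encode("utf-8")) for char in samples}

for char in samples:
    encoded = char.encode("utf-8")
    # Code point, byte count and the bytes in hexadecimal
    print(f"U+{ord(char):04X}", len(encoded), encoded.hex(" "))
```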

48 of 68

Character encoding - BOM - Byte order mark

Magic number at the beginning of a text file

Indicates

  • Unicode encoding
  • encoding type
    • UTF-8 - EF BB BF
    • UTF-16 BE - FE FF
    • UTF-16 LE - FF FE
    • UTF-32 BE - 00 00 FE FF
    • more...
  • byte order (endianness) for multi-byte encodings

Most data formats use UTF-8 without BOM

  • since other UTF variants are rarely used
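When a UTF-8 file does start with a BOM, the reader has to strip it explicitly. A minimal sketch using Python's standard codecs; the sample content is made up.

```python
import codecs

raw = codecs.BOM_UTF8 + "data".encode("utf-8")

# Plain "utf-8" keeps the BOM as the character U+FEFF at the start of the text.
with_bom = raw.decode("utf-8")

# "utf-8-sig" strips a leading BOM if one is present.
without_bom = raw.decode("utf-8-sig")

print(codecs.BOM_UTF8.hex(" "))
print(repr(with_bom), repr(without_bom))
```

A leftover U+FEFF is a common cause of a first CSV column name or JSON document that unexpectedly fails to match.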


49 of 68

Character encoding - other encodings

  • Mac OS Roman
  • KOI8-R, KOI8-U, KOI7
  • MIK
  • ISCII
  • TSCII
  • VISCII
  • JIS X 0208 is a widely deployed standard for Japanese character encoding that has several encoding forms.
    • Shift JIS (Microsoft Code page 932 is a dialect of Shift_JIS)
    • EUC-JP
    • ISO-2022-JP
  • JIS X 0213 is an extended version of JIS X 0208.
  • Chinese Guobiao
    • GB 2312
    • GBK (Microsoft Code page 936)
    • GB 18030
  • Taiwan Big5 (a more famous variant is Microsoft Code page 950)
    • Hong Kong HKSCS
  • Korean
    • KS X 1001 is a Korean double-byte character encoding standard
    • EUC-KR
    • ISO-2022-KR
  • Unicode (and subsets thereof, such as the 16-bit 'Basic Multilingual Plane')
    • UTF-8
    • UTF-16
    • UTF-32
  • ANSEL or ISO/IEC 6937


In Czechia, from legacy systems mainly

  • iso-8859-2 (Latin 2)
  • windows-1250
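The two legacy encodings are similar enough to be confused and different enough to corrupt text. A minimal sketch, assuming a Czech sample sentence chosen for illustration:

```python
text = "Příliš žluťoučký kůň"

iso_bytes = text.encode("iso-8859-2")
win_bytes = text.encode("windows-1250")

# The encodings agree on many Czech letters but not on all of them, e.g. "š".
print("š".encode("iso-8859-2").hex(), "š".encode("windows-1250").hex())

# Decoding with the wrong encoding raises no error here; it silently
# produces different characters.
garbled = iso_bytes.decode("windows-1250")
print(garbled)
```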

50 of 68

Standardization


51 of 68

Standards - for data formats and other things

Why do we need standards?

  • Interoperability, naturally
  • But also business
    • So it is clear who is doing something wrong...
    • … and who will pay to fix it


52 of 68

Interoperability is costly. For each dataset:

  • 1 specification created
  • each provider needs to learn the specification
  • each provider needs to adjust their data publication process
  • each consumer learns 1 specification to process all data

[Figure: providers and consumers all working with a single shared specification (Spec)]

53 of 68

Low interoperability is even costlier! For each dataset:

  • each provider creates their own specification
  • each provider needs to learn the specification
  • each provider needs to adjust their data publication process
  • each consumer learns all specifications to process all data

[Figure: providers and consumers working with many separate specifications (Spec)]

54 of 68

Internet Engineering Task Force - IETF

Open standards organization

  • founded 1986
  • initially supported by the US federal government
  • now under ISOC
  • participants are volunteers

IETF Working Groups

  • topic, chairperson, charter, focus, deadline
  • open to all

Internet Engineering Steering Group (IESG)

  • final technical review of Internet standards


55 of 68

Internet Society - ISOC

American non-profit

  • founded 1992
  • to provide leadership in Internet-related standards, education, access, and policy
  • deals mainly with political issues
  • standards are created by the Internet Engineering Task Force (IETF), to which ISOC is related
  • RFC - Request for Comments
    • some become Internet Standards

“to promote the open development, evolution, and use of the Internet for the benefit of all people throughout the world”


56 of 68

World Wide Web Consortium - W3C

International standards organization for the WWW

  • founded 1994
  • by Tim Berners-Lee - inventor of the Web
  • issues Recommendations
    • e.g. HTML, CSS, RDF, XML, ...

Specification maturation process

  1. Working draft (WD)
  2. Candidate recommendation (CR)
  3. Proposed recommendation (PR)
  4. W3C recommendation (REC)

Membership

  • paid and must be approved
    • universities, non-profits, businesses, governments, individuals


57 of 68

Internet Corporation for Assigned Names and Numbers - ICANN

Standards organization

  • founded 1998
  • manages IANA - Internet Assigned Numbers Authority
    • IPv4 and IPv6 address space management
    • autonomous system number allocation
    • root zone management in DNS
    • media types


58 of 68

MIME-Type, Media-type

Multipurpose Internet Mail Extensions (MIME) type

The list: Media Types

Managed by the Internet Assigned Numbers Authority (IANA)

Examples:

  • text/html
  • text/xml
  • application/xml
  • application/soap+xml
    • + suffix - specifies serialization - e.g. +xml, +json, +zip
  • application/vnd.openxmlformats-officedocument.wordprocessingml.document
    • vnd. - vendor tree, for publicly available products, e.g. Microsoft Office
  • text/x-turtle
    • x- & x. - should not be used - experimental, deprecated, local, etc.
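The naming conventions above are regular enough to be picked apart with plain string operations. A minimal sketch; the function name `describe_media_type` and the shape of the returned dictionary are assumptions made for this example, not part of any standard library.

```python
def describe_media_type(media_type: str) -> dict:
    """Split a media type into top-level type, subtype, tree prefix and suffix."""
    top_level, subtype = media_type.split("/", 1)

    # The structured syntax suffix follows the last "+", if there is one.
    base, plus, suffix = subtype.rpartition("+")
    if not plus:
        base, suffix = subtype, ""

    # Tree prefixes mentioned on the slide: vendor (vnd.) and unregistered (x., x-).
    tree = ""
    for prefix in ("vnd.", "x.", "x-"):
        if base.startswith(prefix):
            tree = prefix
            break

    return {"type": top_level, "subtype": subtype, "tree": tree, "suffix": suffix}


for example in ("application/soap+xml", "text/x-turtle", "text/html"):
    print(example, describe_media_type(example))
```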


59 of 68

Ecma International

Standards organization

  • founded 1961
  • membership-based
    • IT companies, IT trade associations, universities, foundations and public institutions
  • rebranded in 1994 from European Computer Manufacturers Association (ECMA)
  • HQ: Geneva, Switzerland

Examples:

  • ECMA-262 – ECMAScript Language Specification (based on JavaScript)
  • ECMA-334 – C# Language Specification
  • ECMA-376 – Office Open XML
  • ECMA-404 – JSON


60 of 68

RFC 2119 - Key words for use in RFCs to Indicate Requirement Levels

MUST, REQUIRED, SHALL

  • an absolute requirement of the specification

MUST NOT, SHALL NOT

  • an absolute prohibition of the specification

SHOULD, RECOMMENDED

  • there may exist valid reasons in particular circumstances to ignore a particular item
  • full implications must be understood and carefully weighed before choosing a different course

SHOULD NOT, NOT RECOMMENDED

  • there may exist valid reasons in particular circumstances when the particular behavior is acceptable or even useful
  • full implications should be understood and the case carefully weighed before implementing any behavior described with this label


61 of 68

RFC 5234 - Augmented Backus-Naur Form (ABNF)

Example

fragment    = *( pchar / "/" / "?" )
pchar       = unreserved / pct-encoded / sub-delims / ":" / "@"
pct-encoded = "%" HEXDIG HEXDIG
unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"
sub-delims  = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
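An ABNF grammar of this kind translates almost mechanically into a regular expression. The sketch below is one possible hand translation of the rules above; the constant and function names are assumptions made for this example.

```python
import re

# One regular expression fragment per ABNF rule
UNRESERVED = r"[A-Za-z0-9\-._~]"     # ALPHA / DIGIT / "-" / "." / "_" / "~"
PCT_ENCODED = r"%[0-9A-Fa-f]{2}"     # "%" HEXDIG HEXDIG
SUB_DELIMS = r"[!$&'()*+,;=]"
PCHAR = rf"(?:{UNRESERVED}|{PCT_ENCODED}|{SUB_DELIMS}|[:@])"

# fragment = *( pchar / "/" / "?" )
FRAGMENT = re.compile(rf"(?:{PCHAR}|[/?])*")


def is_valid_fragment(text: str) -> bool:
    """Return True if the whole text matches the fragment rule."""
    return FRAGMENT.fullmatch(text) is not None


for candidate in ("nose", "a/b?c=%20", "a b", "%2"):
    print(repr(candidate), is_valid_fragment(candidate))
```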


62 of 68

Identifiers


63 of 68

URI, URL, IRI, URN

URI - Uniform Resource Identifier - RFC 3986
URN - Uniform Resource Name - RFC 8141, IANA URN namespace registry
URL - Uniform Resource Locator - RFC 3986
IRI - Internationalized Resource Identifier - RFC 3987

    foo://example.com:8042/over/there?name=ferret#nose
    \_/   \______________/\_________/ \_________/ \__/
     |           |            |            |        |
  scheme     authority       path        query   fragment
     |   _____________________|__
    / \ /                        \
    urn:example:animal:ferret:nose
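The components in the diagram can be extracted with a standard URI parser. A minimal sketch using `urllib.parse.urlsplit` from the Python standard library, which calls the authority component `netloc`:

```python
from urllib.parse import urlsplit

parts = urlsplit("foo://example.com:8042/over/there?name=ferret#nose")
print(parts.scheme, parts.netloc, parts.path, parts.query, parts.fragment)
print(parts.hostname, parts.port)

# A URN has a scheme and a path, but no authority.
urn = urlsplit("urn:example:animal:ferret:nose")
print(urn.scheme, urn.netloc, urn.path)
```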


64 of 68

RFC 3986 - Uniform Resource Identifier - examples

  • ftp://ftp.is.co.za/rfc/rfc1808.txt
  • http://www.ietf.org/rfc/rfc2396.txt
  • ldap://[2001:db8::7]/c=GB?objectClass?one
  • mailto:John.Doe@example.com
  • news:comp.infosystems.www.servers.unix
  • tel:+1-816-555-1212
  • telnet://192.0.2.16:80/
  • urn:oasis:names:specification:docbook:dtd:xml:4.1.2


65 of 68

RFC 3987 - IRI - Internationalized Resource Identifier

Examples

Percent-encoding

  • For some usages only URIs are acceptable
    • e.g. HTTP
  • IRIs are encoded into URIs
  • each byte is represented as '%' and two hexadecimal digits
  • e.g. 💩 => %F0%9F%92%A9
    • emojis are 4 bytes in UTF-8
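Percent-encoding can be reproduced with the Python standard library. A minimal sketch; the IRI path used in the second half is a made-up example.

```python
from urllib.parse import quote, unquote

# The character is first encoded as UTF-8, then each byte is written as %XX.
encoded = quote("💩")
decoded = unquote(encoded)
print(encoded, decoded)

# Hypothetical IRI path; "/" is kept as-is by default.
iri_path = "/háčky/čárky"
uri_path = quote(iri_path)
print(uri_path)
```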

The same examples of IRIs percent-encoded into URIs


66 of 68

RFC 3492 - Punycode

IRIs are not to be confused with IDNs (internationalized domain names):

  • e.g. https://www.háčkyčárky.cz = https://www.xn--hkyrky-ptac70bc.cz/
  • even less readable than percent-encoding
  • punycoded name is used with DNS
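The conversion between the readable and the punycoded form can be tried with Python's built-in `idna` codec. This is a sketch under the assumption that the codec's IDNA 2003 rules are sufficient for this domain name; newer IDNA versions may treat some other names differently.

```python
domain = "háčkyčárky.cz"

# The "idna" codec punycodes each non-ASCII label and adds the xn-- prefix.
ascii_domain = domain.encode("idna").decode("ascii")

# Decoding with the same codec restores the readable form.
round_trip = ascii_domain.encode("ascii").decode("idna")

print(ascii_domain)
print(round_trip)
```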


67 of 68

Data types


68 of 68

Common data types in text-based structured data formats

  • boolean
    • true
    • false
  • number
    • integer
      • 42
    • decimal
      • 42.42
    • float/double
      • 4.2e2
  • date - ISO-8601-compliant
    • YYYY-MM-DD
    • 2021-03-01
  • time
    • HH:MM:SS.sss
    • 10:40:00
  • dateTime
    • YYYY-MM-DDTHH:MM:SS.sss
    • 2021-03-01T10:40:00
  • time zones
    • 2021-03-01T10:40:00+02:00
    • 2021-03-01-02:00
    • 2021-03-01Z

The same data types are used in all common formats - RDF syntaxes, XML, JSON, CSV

Based on the XML Schema data type system
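The lexical forms listed above map directly onto typed values in most programming languages. A minimal sketch of parsing them with the Python standard library; the variable names are illustrative.

```python
from datetime import date, datetime, time, timedelta
from decimal import Decimal

# The lexical forms "true"/"false" differ from Python's True/False.
flag = {"true": True, "false": False}["true"]

integer = int("42")
decimal_value = Decimal("42.42")   # exact decimal value, no binary rounding
floating = float("4.2e2")

day = date.fromisoformat("2021-03-01")
moment = time.fromisoformat("10:40:00")
local = datetime.fromisoformat("2021-03-01T10:40:00")
zoned = datetime.fromisoformat("2021-03-01T10:40:00+02:00")

# Only the value with an explicit offset knows its time zone.
print(local.tzinfo, zoned.utcoffset())
```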
