1 of 44

DOREMUS

a Graph of Interlinked Musical Works

Pasquale Lisena

EURECOM, France

M. Achichi, P. Lisena, K. Todorov, R. Troncy, J. Delahousse

2 of 44

Which works did Mozart compose before the age of 10?

How many works were composed and performed for the first time in the same city?

Which composers conducted their own works in a performance during the last decade?

3 of 44

Music knowledge graph

metadata about artists, works, performances, scores

Tools for converting and interlinking

used for building the knowledge graph; open-source, reusable

4 of 44

Music is complex

5 of 44

CLASSICAL VS POP

POP: "For pop music the experience of the music is really defined by the recording."

CLASSICAL: "When it comes to classical music, on the other hand, it's much more about the composition itself, because even though the interpretation can vary in various subtle ways […]"

M. Lasar (2011). Digging into Pandora’s Music Genome with musicologist Nolan Gasser. https://arstechnica.com/tech-policy/2011/01/digging-into-pandoras-music-genome-with-musicologist-nolan-gasser/

6 of 44

POP | CLASSICAL
Track-based | Work-based
60 years of history | A thousand years: from Gregorian chant to a work written last Tuesday
Songs | Multi-movement works
Major, minor | Polyphonic, homophonic, monophonic

7 of 44

8 of 44

Music archives have very detailed knowledge

PROBLEMS

  • Multiple formats
  • No interoperability
  • Need for discovering overlapping knowledge
  • Information encoded as free text
  • Not always publicly accessible

APPROACH

Semantic Web!

9 of 44

Improve music description to foster music exchange and reuse

Travel to the heart of the musical archives in France’s greatest institutions

Connect sources, multiply usage, enrich user experience

10 of 44

Building the DOREMUS graph

DATA MODELING

DATA CONVERSION: marc2rdf, string2vocabulary, ...custom converters

DATA LINKING: legato

LINK VALIDATION

11 of 44

The DOREMUS Model

  • Music-specific extension of FRBRoo
  • Dynamic: it is made up of autonomous modules that can be combined
  • Relies on Linked Data principles (everything is a URI; RDF model)

FRBRoo = FRBR (bibliographic records) + CIDOC CRM (museum information)

DATA MODELING

Choffé, Pierre, and Françoise Leresche. DOREMUS: connecting sources, enriching catalogues and user experience. In 24th IFLA World Library and Information Congress. 2016.

12 of 44

The building blocks: Work-Expression-Event

F14 Work → R3 is realized in → F22 Expression

F28 Expression Creation → R17 created → F22 Expression

F28 Expression Creation → R19 created a realization of → F14 Work

DATA MODELING
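The Work-Expression-Event building blocks can be sketched as plain triples. This is a minimal illustration with no RDF library; the node identifiers (`ex:work1`, `ex:expr1`, `ex:creation1`) are hypothetical, and the property names are simplified spellings of the slide labels, not the model's actual URIs.

```python
# Minimal sketch of the Work-Expression-Event pattern as plain triples.
# Node identifiers ("ex:...") are hypothetical; property names are simplified
# spellings of the slide labels, not the official DOREMUS/FRBRoo URIs.
from typing import List, Tuple

Triple = Tuple[str, str, str]


def work_expression_event(work: str, expression: str, creation: str) -> List[Triple]:
    """Triples linking a work, its expression and the event that created it."""
    return [
        (work, "rdf:type", "F14_Work"),
        (expression, "rdf:type", "F22_Expression"),
        (creation, "rdf:type", "F28_Expression_Creation"),
        (work, "R3_is_realized_in", expression),
        (creation, "R17_created", expression),
        (creation, "R19_created_a_realization_of", work),
    ]


def objects(graph: List[Triple], subject: str, predicate: str) -> List[str]:
    """Every o such that (subject, predicate, o) is in the graph."""
    return [o for s, p, o in graph if s == subject and p == predicate]


def subjects(graph: List[Triple], predicate: str, obj: str) -> List[str]:
    """Every s such that (s, predicate, obj) is in the graph."""
    return [s for s, p, o in graph if p == predicate and o == obj]


graph = work_expression_event("ex:work1", "ex:expr1", "ex:creation1")
# The creation event carries the composition date (e.g. 1796 for the cello sonata).
graph.append(("ex:creation1", "P4_has_time_span", "1796"))

# Which expression realizes the work, and when was it created?
expr = objects(graph, "ex:work1", "R3_is_realized_in")[0]
event = subjects(graph, "R17_created", expr)[0]
print(expr, objects(graph, event, "P4_has_time_span"))  # ex:expr1 ['1796']
```

Keeping the creation as a separate event node is what lets dates, places and people attach to the act of composing rather than to the abstract work.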

13 of 44

Example: Beethoven, Sonata for cello and piano no. 1, Op. 5 no. 1

  • F14 Work → R3 is realized in → F22 Expression
  • F22 Expression → P102 has title → "Sonate pour violoncelle et piano no 1"@fr (also "Sonates", "Sonata in F")
  • F22 Expression → U12 has genre → Sonata (sonata@it, sonate@fr, klaviersonate@de)
  • F22 Expression → U11 has key → F Major (F Dur@de, Fa majeur@fr, Fa maggiore@it, Fa mayor@es)
  • F22 Expression → U17 has opus statement → M2 Opus Statement: 5, 1
  • F22 Expression → U13 has casting → M6 Casting, with two M23 Casting Details:
    • U2 foresees mop → Piano (Pianoforte@it, Fortepian@pl); U30 quantity → 1
    • U2 foresees mop → Cello (Violoncello@it, Violoncelle@fr); U30 quantity → 1
  • F28 Expression Creation → R17 created → F22 Expression; R19 created a realization of → F14 Work; P4 has time span → 1796
  • F28 Expression Creation → P9 consists of → E7 Activity
    • P14 carried out by → Ludwig van Beethoven (also "Ludwig von Beethoven"; P98 born 1770, P100 died 1827)
    • U31 had function of type → composer (compositeur@fr, compositore@it)
  • U5 had premiere → M42 Performed Expression Creation: P4 has time span → 1796, P7 took place at → Berlin; M43 Performed Expression → U54 is performed expression of → F22 Expression
  • U4 had princeps publication → F30 Publication Event: P4 has time span → 1797, P7 took place at → Vienna; F24 Publication Expression → P165 incorporates → F22 Expression
  • F15 Complex Work → R10 has member → F14 Work, M44 Performed Work, F19 Publication Work; U38 has descriptive expression → F22 Expression

14 of 44

F22 Expression → U13 has casting → M6 Casting, with two M23 Casting Details:

  • U2 foresees mop → Piano (Pianoforte@it, Fortepian@pl); U30 quantity → 1
  • U2 foresees mop → Cello (Violoncello@it, Violoncelle@fr); U30 quantity → 1

15 of 44

Controlled Vocabularies for Music Metadata

GENRES (interlinked): Diabolo, IAML, Itema3, Redomi, RAMEAU

Medium of performance (interlinked): MIMO, Itema3, IAML, Diabolo, RAMEAU, Redomi

Also: musical keys, modes, catalogues, derivation types, functions

more available at http://data.doremus.org/vocabularies

23 families of vocabularies · 11,000+ concepts · 610 links between terms

published at ISMIR 2018

16 of 44

Dealing with different formats

Works: INTERMARC

Scores: INTERMARC

Discs: INTERMARC

Works: UNIMARC

Scores: INTERMARC

Performances: XML

Works - Recordings - Scores

3 different XML sources

A pre-digital archive format at Radio France

DATA CONVERSION

17 of 44

Source datasets

Works | Scores | Concerts | Discs
62 550 (XML) | 9 154 (XML) | 340 609 (XML) | 9 500 (XML)
6 846 (UNIMARC) | 30 319 (UNIMARC) | 5 164 (XML) | 8 602 (XML)
135 940 (INTERMARC) | 89 184 (INTERMARC) | - | -

18 of 44

Source datasets

DATASET: Works, Scores, Concerts, Discs

Repertoire: classical works, jazz improvisation, ethnic/world/traditional music

19 of 44

MARC FILE

001 FRBNF139081882FR
100 $313891295$w.0..b.....$aBeethoven$mLudwig van$d1770-1827
144 $w....b.fre.$aSonates$bPiano$pOp. 27, no 2$tDo dièse mineur

Field 144: LANG (fre, in $w) · TITLE ($a Sonates) · MOP ($b Piano) · OPUS ($p Op. 27, no 2) · KEY ($t Do dièse mineur, i.e. C sharp minor)

"MARC must die" (Roy Tennant, 2002)

DATA CONVERSION

20 of 44

marc2rdf

MARC PARSER

  • Parsing of the file
  • Interpretation of the fields
  • Graph generation

Inputs: MARC files, mapping rules

DATA CONVERSION
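The first two parser steps (parsing the file, interpreting the fields) can be sketched on the Beethoven record from the MARC slide. This is not marc2rdf's code: the subfield-to-meaning table simply follows the LANG / TITLE / MOP / OPUS / KEY annotations, and reading the language at positions 6 to 8 of `$w` is an assumption inferred from this single example.

```python
# Minimal sketch: split MARC lines into subfields, then interpret field 144.
# Not marc2rdf's implementation; the language position inside $w is an
# assumption based on the single example record.
import re
from typing import Dict, List, Tuple

SUBFIELD = re.compile(r"\$(.)([^$]*)")  # "$" + one-character code + value

# Meaning of the 144 subfields, following the slide's annotations.
FIELD_144 = {"a": "title", "b": "mop", "p": "opus", "t": "key"}


def parse_field(line: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Split '144 $aX$bY' into ('144', [('a', 'X'), ('b', 'Y')])."""
    tag, _, rest = line.partition(" ")
    return tag, SUBFIELD.findall(rest)


def interpret_144(subfields: List[Tuple[str, str]]) -> Dict[str, str]:
    """Map the subfields of a 144 (uniform title) field to named units of information."""
    info: Dict[str, str] = {}
    for code, value in subfields:
        if code == "w" and len(value) >= 9:
            info["lang"] = value[6:9]  # assumed position of the language code
        elif code in FIELD_144:
            info[FIELD_144[code]] = value
    return info


record = [
    "001 FRBNF139081882FR",
    "100 $313891295$w.0..b.....$aBeethoven$mLudwig van$d1770-1827",
    "144 $w....b.fre.$aSonates$bPiano$pOp. 27, no 2$tDo dièse mineur",
]
fields = dict(parse_field(line) for line in record)
print(interpret_144(fields["144"]))
# {'lang': 'fre', 'title': 'Sonates', 'mop': 'Piano', 'opus': 'Op. 27, no 2', 'key': 'Do dièse mineur'}
```

The third step, graph generation, would then turn each named unit into triples following the mapping rules.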

21 of 44

144 $w....b.fre.$aSonates$bPiano$pOp. 27, no 2$tDo dièse mineur

MAPPING RULES

  • UNIT OF INFORMATION: F22 Expression: Opus Number
  • PATH: F22 Self-Contained Expression → U17 has opus statement → M2 Opus Statement → [U42 has opus number → M12 Opus Number] + [U43 has opus subnumber → M13 Opus Subnumber]
  • INTERMARC BNF: TUM: 144 $p, chain of digits / TUM: 144 $p, chain of digits before the comma
  • TRANSFER RULE: Remove the abbreviation "Op." before the number
  • EXAMPLE: 144 $pOp. 352 --> M12 = 352; 144 $pOp. 27, no 2 --> M12 = 27, M13 = 2

DATA CONVERSION
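The opus-number transfer rule can be sketched as a small function: drop the "Op." abbreviation, keep the digits before the comma as the M12 Opus Number and the number after "no" as the M13 Opus Subnumber. The regular expression is an assumption that covers the two example values only (it does not handle, for instance, "op. posth." or letter suffixes).

```python
# Sketch of the opus transfer rule for INTERMARC 144 $p. The regex is an
# assumption covering the slide's two examples, not the BnF rule verbatim.
import re
from typing import Dict

OPUS = re.compile(r"^\s*Op\.\s*(\d+)\s*(?:,\s*no\.?\s*(\d+))?", re.IGNORECASE)


def opus_statement(subfield_p: str) -> Dict[str, str]:
    """Return {'M12': number} or {'M12': number, 'M13': subnumber}; {} if no match."""
    match = OPUS.match(subfield_p)
    if match is None:
        return {}
    statement = {"M12": match.group(1)}  # opus number: digits before the comma
    if match.group(2) is not None:
        statement["M13"] = match.group(2)  # opus subnumber: digits after "no"
    return statement


print(opus_statement("Op. 352"))       # {'M12': '352'}
print(opus_statement("Op. 27, no 2"))  # {'M12': '27', 'M13': '2'}
```

Values are kept as strings, as in the EXAMPLE column; converting them to integers would be a separate modelling choice.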

22 of 44

marc2rdf: MARC PARSER → FREE TEXT INTERPRETER

Inputs: MARC files, vocabularies

"1st performance in Moscow, December 29, 1956, by Mstislav Rostropovich on cello and A. Dedukhin on piano"

  • Extracting information from the text through empirical rules
  • Disambiguation of vocabulary terms and artists

DATA CONVERSION
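One "empirical rule" for premiere notes like the one above might look as follows. The pattern and the output keys are hypothetical, and the actual free text interpreter may rely on different rules. The extracted instrument strings would then go through STRING 2 VOCABULARY, and the names through artist disambiguation.

```python
# Sketch of a single hand-written rule for premiere notes. The pattern and
# output keys are hypothetical, not the free text interpreter's actual rules.
import re
from datetime import datetime
from typing import Optional

PREMIERE = re.compile(
    r"1st performance in (?P<place>[^,]+), "
    r"(?P<date>[A-Z][a-z]+ \d{1,2}, \d{4}),?\s*"
    r"by (?P<performers>.+?)\.?$"
)
PERFORMER = re.compile(r"(?P<name>.+?) on (?P<instrument>\w+)")


def interpret_premiere(note: str) -> Optional[dict]:
    """Extract place, date and (performer, instrument) pairs, or None if the rule does not apply."""
    match = PREMIERE.search(note)
    if match is None:
        return None
    performers = []
    for part in match.group("performers").split(" and "):
        performer = PERFORMER.fullmatch(part.strip())
        if performer is not None:
            performers.append((performer.group("name"), performer.group("instrument")))
    return {
        "place": match.group("place"),
        "date": datetime.strptime(match.group("date"), "%B %d, %Y").date(),
        "performers": performers,
    }


note = ("1st performance in Moscow, December 29, 1956, "
        "by Mstislav Rostropovich on cello and A. Dedukhin on piano")
result = interpret_premiere(note)
print(result["place"], result["date"])  # Moscow 1956-12-29
print(result["performers"])  # [('Mstislav Rostropovich', 'cello'), ('A. Dedukhin', 'piano')]
```

Rules of this kind are brittle by design: each one covers a recurring phrasing, and notes that match no rule are simply left as free text.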

23 of 44

marc2rdf: MARC PARSER → FREE TEXT INTERPRETER → STRING 2 VOCABULARY

Inputs: MARC files, vocabularies

  • Replace labels with URIs from controlled vocabularies, e.g. “Violoncelle”@fr → URI of the cello concept

DATA CONVERSION

24 of 44

STRING 2 VOCABULARY

  • Match against a family of vocabularies
    • “Soprano”@it is matched against the vocabularies of its family: MIMO, IAML, DIABOLO, ITEMA3, REDOMI, RAMEAU
    • “C Major”@en is matched in the KEY family, not in GENRE: vocabulary:key/c
  • 2 passes
    • Exact label + language
    • Exact label, any language
  • Correction of editorial mistakes

DATA CONVERSION
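The two-pass lookup can be sketched as follows, restricted to one family of vocabularies. The vocabulary content and the `ex:` URIs are hypothetical. Comparing labels after case-folding and whitespace normalisation is an assumption about what "exact label" tolerates.

```python
# Sketch of a two-pass label-to-URI lookup within one vocabulary family.
# Vocabulary content and "ex:" URIs are hypothetical; the normalisation
# (case-folding, collapsed whitespace) is an assumption.
from typing import Dict, List, Optional, Tuple

# family -> concept URI -> list of (label, language) pairs
VOCABULARIES: Dict[str, Dict[str, List[Tuple[str, str]]]] = {
    "mop": {
        "ex:mop/cello": [("violoncelle", "fr"), ("violoncello", "it"), ("cello", "en")],
        "ex:mop/piano": [("piano", "fr"), ("pianoforte", "it"), ("fortepian", "pl")],
    },
    "key": {"ex:key/c-major": [("C Major", "en"), ("Do majeur", "fr"), ("C-Dur", "de")]},
    "genre": {"ex:genre/sonata": [("sonata", "en"), ("sonate", "fr"), ("sonata", "it")]},
}


def normalize(label: str) -> str:
    return " ".join(label.split()).casefold()


def string2uri(label: str, lang: str, family: str) -> Optional[str]:
    """Return the URI of the concept in `family` whose label matches, or None."""
    concepts = VOCABULARIES.get(family, {})
    target = normalize(label)
    # Pass 1: exact label in the same language.
    for uri, labels in concepts.items():
        if any(normalize(text) == target and text_lang == lang for text, text_lang in labels):
            return uri
    # Pass 2: exact label in any language.
    for uri, labels in concepts.items():
        if any(normalize(text) == target for text, _ in labels):
            return uri
    return None


print(string2uri("Violoncelle", "fr", "mop"))  # ex:mop/cello (pass 1)
print(string2uri("Violoncello", "fr", "mop"))  # ex:mop/cello (pass 2: the label is Italian)
print(string2uri("C Major", "en", "genre"))    # None: a key is not looked up among genres
```

Running the language-aware pass first presumably matters when one string labels different concepts in different languages: the concept labelled in the record's own language wins.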

25 of 44

INTERMARC → marc2rdf → GRAPH BNF

UNIMARC → marc2rdf → GRAPH PHILHARMONIE

EUTERPE XML → euterpe converter → GRAPH EUTERPE

ITEMA3 XML → itema3 converter → GRAPH ITEMA3

DIABOLO XML → diabolo converter → GRAPH DIABOLO

with STRING 2 VOCABULARY in the conversion chain

DATA CONVERSION

26 of 44

PROPERTY | GRAPH BNF | GRAPH PHILHARMONIE
URI | http://data.doremus.org/expression/d72301f0-0aba-3ba6-93e5-c4efbee9c6ea | http://data.doremus.org/expression/37932fbc-fef3-3edb-9fae-1eec9b4be01d
TITLE | “Quasi una fantasia” | “Sonata quasi una fantasia”
COMPOSER | Beethoven | Beethoven
ORDER NUM | 14 | 14
OPUS | 27 n. 2 | 27, no 2
GENRE | sonata | sonata, romantic music
CASTING | piano | piano (1)
KEY | C sharp minor | C sharp minor
1st PUB | ? | 1802, Vienna
PREMIERE | ? | ?

Link: sameAs

27 of 44

DATA LINKING

Challenges

  • Not all works have values for all properties (lack of attributes)
  • Similar values do not necessarily imply a match, e.g. Beethoven’s Sonata no. 1, Sonata no. 2, Sonata no. 3
  • Lexical, semantic, transliteration, orthographic mismatches

On the left: Beethoven.

On the right: (the same) Beethoven.
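A record comparison that copes with the first two challenges can be sketched like this: missing values are skipped, and a disagreement on a discriminating property (order number, opus, key) rules a pair out however similar the rest is. The field names, the digit-based normalisation, the discriminating set and the `min_shared` threshold are all assumptions for illustration; this is not Legato's scoring.

```python
# Sketch of a comparison robust to missing attributes and to "similar but
# different" works. Field names, normalisation and thresholds are hypothetical.
import re
from typing import Dict, Optional

NUMERIC = {"order_num", "opus"}
DISCRIMINATING = {"order_num", "opus", "key"}

Record = Dict[str, Optional[str]]


def norm(field: str, value: str):
    """Compare numbering fields by their digits only ("27 n. 2" == "27, no 2")."""
    if field in NUMERIC:
        return tuple(re.findall(r"\d+", value))
    return " ".join(value.split()).casefold()


def match_score(a: Record, b: Record, min_shared: int = 3) -> float:
    # Only properties present in BOTH records are compared.
    shared = [f for f in a if a[f] is not None and b.get(f) is not None]
    if len(shared) < min_shared:
        return 0.0  # too little evidence either way
    agree = [f for f in shared if norm(f, a[f]) == norm(f, b[f])]
    if any(f in DISCRIMINATING and f not in agree for f in shared):
        return 0.0  # e.g. Sonata no. 1 vs Sonata no. 2
    return len(agree) / len(shared)


left = {"composer": "Beethoven", "order_num": "14", "opus": "27 n. 2",
        "key": "C sharp minor", "first_pub": None}
right = {"composer": "Beethoven", "order_num": "14", "opus": "27, no 2",
         "key": "C sharp minor", "first_pub": "1802, Vienna"}
sonata1 = {"composer": "Beethoven", "genre": "sonata", "order_num": "1", "casting": "piano"}
sonata2 = {"composer": "Beethoven", "genre": "sonata", "order_num": "2", "casting": "piano"}

print(match_score(left, right))        # 1.0 (first_pub is missing on the left, so skipped)
print(match_score(sonata1, sonata2))   # 0.0 (order numbers differ)
```

The values are modelled on the Beethoven sonata comparison; lexical and transliteration mismatches (the third challenge) would need fuzzier string comparison than this.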

28 of 44

DATA LINKING

First Linking: Composer + Catalogue

Wolfgang Amadeus Mozart, Eine kleine Nachtmusik K 525

sameAs

Wolfgang Amadeus Mozart, Serenade No. 13 in G major KV 525
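A first linking pass on composer + catalogue number can be sketched as an index lookup. The regex only knows the Köchel abbreviations K and KV (treated as aliases). The record IDs are hypothetical, and composers are compared as plain strings where real data would use composer URIs.

```python
# Sketch of linking on (composer, catalogue number). Only K/KV are handled;
# record IDs are hypothetical and composers are plain strings here.
import re
from typing import Dict, List, Optional, Tuple

CATALOGUE = re.compile(r"\b(KV|K)\.?\s*(\d+[a-z]?)\b")
ALIASES = {"K": "K", "KV": "K"}  # both abbreviations denote the Köchel catalogue

Record = Tuple[str, str, str]  # (id, composer, title)
Key = Tuple[str, Tuple[str, str]]


def catalogue_key(title: str) -> Optional[Tuple[str, str]]:
    """Normalised (catalogue, number) found in a title, e.g. ('K', '525')."""
    match = CATALOGUE.search(title)
    if match is None:
        return None
    return ALIASES[match.group(1)], match.group(2)


def first_linking(left: List[Record], right: List[Record]) -> List[Tuple[str, str, str]]:
    # Index the right-hand records by (composer, catalogue number).
    index: Dict[Key, List[str]] = {}
    for rid, composer, title in right:
        key = catalogue_key(title)
        if key is not None:
            index.setdefault((composer, key), []).append(rid)
    # Each left-hand record is linked to every right-hand record with the same key.
    links = []
    for lid, composer, title in left:
        key = catalogue_key(title)
        if key is not None:
            for rid in index.get((composer, key), []):
                links.append((lid, "owl:sameAs", rid))
    return links


left = [("a:1", "Wolfgang Amadeus Mozart", "Eine kleine Nachtmusik K 525"),
        ("a:2", "Wolfgang Amadeus Mozart", "Symphony No. 40 K. 550")]
right = [("b:7", "Wolfgang Amadeus Mozart", "Serenade No. 13 in G major KV 525")]
print(first_linking(left, right))  # [('a:1', 'owl:sameAs', 'b:7')]
```

Catalogue numbers are a strong identifier within one composer's output, which is why such a pass can produce links before any title similarity is computed.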

29 of 44

DATA LINKING

Legato

New linking system

Existing data linking systems were not satisfactory

30 of 44

DATA LINKING

Works to be compared are grouped by composer.
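Grouping by composer works as a blocking step: only works that share a composer are ever compared. In the sketch below only the grouping comes from the slide. The token-set Jaccard similarity and the 0.5 threshold are stand-ins chosen for illustration, not Legato's actual similarity measure.

```python
# Sketch of blocking by composer before pairwise comparison. The Jaccard
# similarity and threshold are illustrative stand-ins, not Legato's method.
import re
from collections import defaultdict
from itertools import product
from typing import Dict, List, Set, Tuple

Work = Tuple[str, str, str]  # (id, composer, title)


def tokens(text: str) -> Set[str]:
    return set(re.findall(r"\w+", text.casefold()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0


def block_by_composer(works: List[Work]) -> Dict[str, List[Work]]:
    blocks: Dict[str, List[Work]] = defaultdict(list)
    for work in works:
        blocks[work[1]].append(work)
    return blocks


def candidate_links(source_a: List[Work], source_b: List[Work], threshold: float = 0.5):
    """Compare only works sharing a composer; return sorted (id_a, id_b, score) triples."""
    blocks_a, blocks_b = block_by_composer(source_a), block_by_composer(source_b)
    links = []
    for composer in blocks_a.keys() & blocks_b.keys():
        for (id_a, _, title_a), (id_b, _, title_b) in product(blocks_a[composer], blocks_b[composer]):
            score = jaccard(tokens(title_a), tokens(title_b))
            if score >= threshold:
                links.append((id_a, id_b, round(score, 2)))
    return sorted(links)


a = [("a1", "Beethoven", "Sonata quasi una fantasia op. 27 no. 2"),
     ("a2", "Beethoven", "Sonata op. 2 no. 1"),
     ("a3", "Mozart", "Serenade no. 13 K 525")]
b = [("b1", "Beethoven", "Quasi una fantasia, sonata, op. 27, no. 2"),
     ("b2", "Brahms", "Serenade no. 1")]
print(candidate_links(a, b))  # [('a1', 'b1', 1.0)]
```

Here only two Beethoven pairs are scored instead of all six cross-source pairs; the Mozart and Brahms works are never compared because their composers have no counterpart in the other source.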

31 of 44

DATA LINKING

32 of 44

DATA LINKING

Legato's results at the OAEI 2017 campaign

DOREMUS: Heterogeneities Task, False Positive Trap

SPIMBENCH: sandbox, mainbox

33 of 44

LINK VALIDATION

  • TRIANGLE → certain links
  • SINGLE LINK → confidence score + experts’ validation
  • MISSING LINK → inferred if validated by experts
  • CONFLICT → removed after experts’ check
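The four link patterns can be detected on the undirected sameAs graph between entities of different sources. In this sketch, entity IDs are hypothetical and carry their source graph as a prefix (`bnf:1`). Pairing each pattern with an action (certain, expert validation, inference, removal) is our reading of the slide.

```python
# Sketch of detecting TRIANGLE, MISSING LINK, CONFLICT and SINGLE LINK patterns
# among sameAs links between graphs. IDs and the prefix convention are hypothetical.
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple


def graph_of(entity: str) -> str:
    """Hypothetical convention: the source graph is the ID prefix before ':'."""
    return entity.split(":", 1)[0]


def classify(links: Iterable[Tuple[str, str]]) -> Dict[str, Set[tuple]]:
    pairs = [tuple(sorted(link)) for link in links]
    adj: Dict[str, Set[str]] = defaultdict(set)
    for x, y in pairs:
        adj[x].add(y)
        adj[y].add(x)
    found: Dict[str, Set[tuple]] = {
        "triangle": set(), "missing_link": set(), "conflict": set(), "single_link": set()
    }
    for entity, neighbours in list(adj.items()):
        by_graph: Dict[str, Set[str]] = defaultdict(set)
        for other in neighbours:
            by_graph[graph_of(other)].add(other)
        for same_graph in by_graph.values():
            if len(same_graph) > 1:  # linked to two entities of the same graph
                found["conflict"].add((entity, tuple(sorted(same_graph))))
        for x, y in combinations(sorted(neighbours), 2):
            if graph_of(x) == graph_of(y):
                continue  # already reported as a conflict
            if y in adj[x]:
                found["triangle"].add(tuple(sorted((entity, x, y))))
            else:
                found["missing_link"].add((x, y))  # x-entity-y exists, x-y does not
    for x, y in pairs:
        if len(adj[x]) == 1 and len(adj[y]) == 1:  # isolated pair
            found["single_link"].add((x, y))
    return found


links = [("bnf:1", "phil:1"), ("phil:1", "rf:1"), ("bnf:1", "rf:1"),  # triangle
         ("bnf:2", "phil:2"), ("phil:2", "rf:2"),                      # bnf:2-rf:2 missing
         ("bnf:3", "phil:3"), ("bnf:3", "phil:4"),                     # conflict
         ("bnf:5", "rf:5")]                                            # single link
result = classify(links)
print(result["missing_link"])  # {('bnf:2', 'rf:2')}
```

Only the triangle needs no human input; the other three patterns produce the candidates that experts confirm, infer or remove.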

34 of 44

What is in the Knowledge Graph?

89,872 persons (composers, performers, …)

18,075 corporate bodies (orchestras, choruses, publishers, …)

357,451 musical works (16k components, 4k derived works)

193,412 concerts and studio recordings

469,131 performed works

3,833 foreseen concerts

31,296 publications

48,006 scores

35 of 44

Future Work

  • More interlinking with MusicBrainz
  • Internal interlinking of performances
  • Create bridges with other communities (musicologists, streaming services, …)

Applications

  • Explorative Search Engine
  • KG-Based Recommender System

36 of 44

GitHub page

converters, interlinking tool, data dumps, ...

github.com/DOREMUS-ANR/

OVERTURE

discover DOREMUS data

overture.doremus.org

DOREMUS website

www.doremus.org

CHATBOT

Q&A system for classical music

chatbot.doremus.org

THIS PRESENTATION

https://goo.gl/1UmKnV

pasquale.lisena@eurecom.fr

37 of 44

Persons

291,421 in the whole graph · 89,872 active*

By source: euterpe 9,269 · diabolo 1,503 · itema3 9,040 · philharmonie 8,419 · bnf 19,881 · bnf bib 54,675

By role: dedicatees 1,479 · subjects 529 · composers 21,626 · conductors 7,830 · performers 3,583 · text authors 13,242

* with 1 or more compositions, performances, dedications, ...

38 of 44

Corporate Bodies

45,743 in the whole graph · 18,075 active*

By source: euterpe 1,001 · diabolo 0 · itema3 39 · philharmonie 1,603 · bnf 855 · bnf bib 14,657

By role: dedicatees 6 · subjects 7 · orchestras + ensembles 517 · choruses 192 · publishers 6,099 · producers 2,194

* with 1 or more compositions, performances, dedications, ...

39 of 44

Works

Source | F15 (Complex Work) | F14 (Work) | F22 (Expression)
euterpe | - | 10,587 | 10,587
diabolo | 9,343 | 12,344 | 12,344
itema3 | - | 15,016 | 15,016
philharmonie | 5,762 | 14,527 | 14,875
bnf | 135,749 | 134,973 | 134,973
bnf bib | 245,069 | 223,357 | 279,641

420,733 expressions (including movements) · 357,451 complex works

40 of 44

Works

16,132 components*

4,619 arrangements · 293 transcriptions · 43 orchestrations · 4,884 derivations in total

* movements, parts, acts, selections (excerpts) ...

420,733 expressions (including movements) · 357,451 complex works

41 of 44

Performances

193,065 concerts (performances) · 5,702 converted from specific records

469,131 interpretations of 288,298 distinct works

Source | F31 (Performance) | M43 (Performed Expression)
diabolo | 2,294 | 2,294
itema3 | 2,296 | 12,602
philharmonie | 7,107 | 47,119
bnf | 14,115 | 15,221
bnf bib | 165,225 | 387,519

42 of 44

Foreseen Concerts

3,833 concerts · 13,520 interpretations of 10,759 distinct works

Source | M26 | F25 > F22
euterpe | 3,833 | 13,520

17 artistic seasons · 281 cycles · 33 festivals

43 of 44

Recordings

397,597 recordings · 15,267 supports · 198,693 publications

Source | F26 | F4 | F3
itema3 | 2,296 | 2,842 | -
philharmonie | 3,406 | 11,681 | -
bnf bib | 392,020 | 744 | 199,339

44 of 44

Scores

31,296 publications · 48,006 scores · 44,668 distinct works

Source | F24 (Publication Expression) | F24 > F22
bnf bib | 31,296 | 48,006