1 of 132

Version Identification in the 20s

Furkan Yesiler

Christopher Tralie

Joan Serrà

2 of 132

Speakers

Furkan Yesiler

Education

  • PhD Student - Music Technology Group, UPF
  • MSc in Sound and Music Computing, UPF
  • BSc in Computer Engineering, Koc University
  • BSc in Industrial Engineering, Koc University

Experience

  • Visiting Research Engineer, BMAT
  • M&A Advisory Intern, Alpacar Associates
  • Management Consulting Intern, Accenture

Research interests

Music recognition, Music information retrieval, Machine learning, Metric learning, Music similarity, Model compression


F. Yesiler, C. Tralie, & J. Serrà. Version Identification in the 20s. ISMIR2020

3 of 132

Speakers

Christopher Tralie

Education

  • PhD - Duke University
  • MSc in Electrical and Computer Engineering, Duke University
  • BSc in Electrical Engineering with Certificate in Computer Science, Princeton University

Experience

  • Assistant Professor, Ursinus College
  • Postdoctoral Fellow, Johns Hopkins University
  • Postdoctoral Associate, Duke University

Research interests

Music information retrieval, topological data analysis, time series analysis, applied geometry


4 of 132

Speakers

Joan Serrà

Education

  • PhD - Music Technology Group, UPF
  • MSc in Information, Communication and Audiovisual Technologies, UPF
  • BSc in Electrical and Electronics Engineering, URL
  • BSc in Telecommunications Engineering, URL

Experience

  • Staff Researcher, Dolby Laboratories
  • Researcher, Telefonica
  • Postdoctoral Researcher, IIIA-CSIC
  • Visiting Researcher, MPIPKS & MPII

Research interests

Machine learning, Generative models, Time series, Speech processing, Recommender systems, Music information retrieval


5 of 132

Acknowledgments

  • Thanks for their input
    • Rachel Bittner
    • Dogac Basaran
    • Erling Wold
    • Guillaume Doras

  • Thanks to Emilia Gómez

  • F.Y. is supported by MIP-Frontiers (MSCA Grant No. 765068)


6 of 132

Outline

  1. Introduction
  2. Techniques for Version Identification
  3. Examples of Version Identification Systems
  4. Complementary Ideas
  5. Current Challenges & Future Directions

7 of 132

What is not covered in this tutorial

  • Enhanced input representations
    • Intervalgram, pitch bihistogram, cochlearPCP, …

  • Musicological considerations

  • Fingerprinting / hashing for live version identification

  • Classical music version identification


8 of 132

1. Introduction

9 of 132

“Songs need new voices to sing them in places they’ve never been sung in order to stay alive.”

Emmylou Harris

10 of 132

Defining versions

“Different renditions of the same underlying musical piece”


12 of 132

Why not “cover songs”?

  • Vague definition
  • Historical/economic connotations
  • Wide range of versions


13 of 132

Types of versions

Version types, placed in the original figure along two axes (commonness and difficulty in identification):

  • Remaster
  • Instrumental
  • Demo
  • Cross-genre
  • Acoustic
  • Mashup
  • Medley
  • Live
  • Standard
  • Remix
  • Quotation
  • Karaoke
  • Parody

14 of 132

Types of versions

  • Cross-genre
  • Instrumental
  • Mashup
  • Acoustic
  • Demo

15 of 132

Modifiable characteristics

  • Timbre
    • Production techniques
    • Instrumentation
  • Tempo
  • Timing
  • Key / root note / transposition
  • Structure
  • Harmonization
  • Lyrics / language
  • Recording / background noise


16 of 132

Task overview

Definition

“Automatic detection of recordings that correspond to the same underlying musical piece”

Applications

  • Digital rights management
  • Organization/navigation of large music collections
  • Set list identification
  • ...


17 of 132

Task overview

First papers / Precursors

  • Foote, ARTHUR: Retrieving Orchestral Music by Long-Term Structure. ISMIR 2000
  • Yang, Music Database Retrieval Based on Spectral Similarity. ISMIR 2001


18 of 132

Version identification overview

Popularity


19 of 132

Datasets

Publicly-available datasets

  • covers80 (2007)
    • 164 songs in 80 cliques

  • SecondHandSongs dataset (2011)
    • Subset of the Million Song Dataset
    • Training: +12k songs
    • Evaluation: +5k songs

  • YoutubeCovers (2015)
    • 350 songs in 50 cliques

  • Covers1000 (2017)
    • 1k songs in 395 cliques

  • SHS100K (2018)
    • Training: +97k songs
    • Evaluation: +10k songs

  • SHS5+ and SHS4- (2019)
    • Training: +62k songs in +7k cliques
    • Evaluation: +48k songs in +19k cliques

  • Da-TACOS (2019)
    • Training: +97k songs in +17k cliques
    • Evaluation: 15k songs in 3k cliques


20 of 132

SecondHandSongs.com

VI researchers’ best friend


21 of 132

SecondHandSongs.com

  • Founded in early 2003 by Bastien De Zutter, Mathieu De Zutter and Denis Monsieur
  • 27 active editors
  • +100k originals, +800k versions


22 of 132

MIREX

  • “Audio Cover Song Identification” task
  • Since 2006
  • Evaluation set
    • 30 cliques with 11 songs each
    • 670 noise songs


23 of 132

Evaluation metrics

  • Similarity-based evaluation
  • Mostly adopted from MIREX


24 of 132

Evaluation metrics

  • Mean average precision

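Worked example for one query, with relevant items (R) at positions 1, 2, and 5 of the sorted list:

  Position k   Item   Prec@k   rel@k   Prec@k * rel@k
  1            R      1/1      1       1/1 * 1
  2            R      2/2      1       2/2 * 1
  3                   2/3      0       2/3 * 0
  4                   2/4      0       2/4 * 0
  5            R      3/5      1       3/5 * 1
  6                   3/6      0       3/6 * 0
  7                   3/7      0       3/7 * 0

  Average precision = (sum of the last column) / 3 = (1 + 1 + 3/5) / 3 ≈ 0.87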
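
The average precision of the example query (relevant items at positions 1, 2, and 5 of seven results) can be reproduced with a short sketch. The function name `average_precision` is illustrative and not taken from any evaluation toolkit, and the sketch assumes that every relevant item for the query appears somewhere in the ranked list. Mean average precision is then the mean of this value over all queries.

```python
def average_precision(relevance):
    """Average precision for one query.

    `relevance` is a list of 0/1 flags in ranked order (1 = relevant item).
    Assumption: all relevant items for the query appear in the list.
    """
    hits = 0
    precision_sum = 0.0
    for k, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / k  # Prec@k, counted only at relevant positions
    return precision_sum / hits if hits else 0.0


# Relevant items at positions 1, 2, and 5: (1/1 + 2/2 + 3/5) / 3
ap = average_precision([1, 1, 0, 0, 1, 0, 0])
print(round(ap, 4))
```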


25 of 132

Evaluation metrics

  • Mean average precision
  • Mean rank of the first relevant item

Example: three queries whose first relevant item (R) appears at rank 1, 2, and 5 (rankrel1 = 1, 2, and 5, respectively).

26 of 132

Evaluation metrics

  • Mean average precision
  • Mean rank of the first relevant item
  • Mean reciprocal rank


27 of 132

Evaluation metrics

  • Mean average precision
  • Mean rank of the first relevant item
  • Mean reciprocal rank
  • Number of relevant items @1
  • Number of relevant items @10
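
The rank-based metrics in this list can be sketched in a few lines. This is a minimal illustration, not the MIREX evaluation code: the helper names are hypothetical, each query is assumed to be a 0/1 relevance list in ranked order, and averaging the top-k counts over queries is an assumption about how the "@k" numbers are aggregated.

```python
def rank_of_first_relevant(relevance):
    """1-based rank of the first relevant item, or None if there is none."""
    for k, rel in enumerate(relevance, start=1):
        if rel:
            return k
    return None


def summarize(queries, k=10):
    """Return (mean rank of first relevant item, mean reciprocal rank,
    mean number of relevant items in the top k)."""
    ranks = [rank_of_first_relevant(q) for q in queries]
    found = [r for r in ranks if r is not None]
    mr1 = sum(found) / len(found)                      # only queries with a hit
    mrr = sum(1.0 / r for r in found) / len(queries)   # misses contribute 0
    top_k = sum(sum(q[:k]) for q in queries) / len(queries)
    return mr1, mrr, top_k


queries = [
    [1, 1, 1, 0, 0],  # first relevant item at rank 1
    [0, 1, 1, 0, 0],  # rank 2
    [0, 0, 0, 0, 1],  # rank 5
]
mr1, mrr, top1 = summarize(queries, k=1)
print(mr1, mrr, top1)
```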


28 of 132

2. Techniques for Version Identification

29 of 132

Techniques

Recall the principal invariances we want to deal with:

  • Timbre
  • Tempo
  • Timing
  • Transposition
  • Structure
  • Harmonization
  • Lyrics / language
  • Noise


30 of 132

Techniques

Features

Emphasis on pitch/tonal features


31 of 132

Techniques

Features

  • Melody


References

Marolt, A mid-level melody-based representation for calculating audio similarity. ISMIR 2006.

Salamon et al., Melody, bass line, and harmony representations for music version identification. AdMIRe 2012.

Doras & Peeters. Cover detection using dominant melody embeddings. ISMIR 2019.


32 of 132

Techniques

Features

  • Melody
  • Chord progressions


References

Bello, Audio-based cover song retrieval using approximate chord sequences: testing shifts, gaps, swaps and beats. ISMIR 2007.

Khadkevich & Omologo, Large-scale cover song identification using chord profiles. ISMIR 2013.


33 of 132

Techniques

Features

  • Melody
  • Chord progressions
  • Chroma/Pitch Class Profiles


References

Ellis & Poliner, Identifying ‘cover songs’ with chroma features and dynamic programming beat tracking. ICASSP 2007.

Serrà et al., Cross recurrence quantification for cover song identification. NJP 2009.

Bertin-Mahieux & Ellis, Large-scale cover song recognition using the 2D Fourier transform magnitude. ISMIR 2012.

Silva et al., SiMPle: assessing music similarity using subsequences joins. ISMIR 2016.


34 of 132

Techniques

Features

  • Melody
  • Chord progressions
  • Chroma/Pitch Class Profiles
  • Features’ self-similarity


Reference

Tralie, Early MFCC and HPCP fusion for robust cover song identification. ISMIR 2017.


35 of 132

Techniques

Transposition invariance


36 of 132

Techniques

Transposition invariance

  • Key estimation + Transpose to CMaj/Amin


Reference

Gómez & Herrera, The song remains the same: identifying versions of the same pieces using tonal descriptors. ISMIR 2006.


37 of 132

Techniques

Transposition invariance

  • Key estimation + Transpose to CMaj/Amin
  • Relative feature encoding


References

Sailer & Dressler, Finding cover songs by melodic similarity. MIREX 2006.

Ahonen & Lemström, Identifying cover songs using normalized compression distance. MML 2008.

Examples of relative encodings: pitch intervals (+4, -4, +2, +2, ...) and mean subtraction (melody - mean(melody)).
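
Both relative encodings can be checked on a toy melody. The sketch below uses hypothetical MIDI pitch values and shows that the interval sequence and the mean-subtracted melody are unchanged when the melody is transposed by a constant number of semitones.

```python
import numpy as np

melody = np.array([60, 64, 60, 62, 64])   # hypothetical MIDI pitches
transposed = melody + 3                    # same melody, three semitones higher

intervals = np.diff(melody)                # successive differences: +4, -4, +2, +2
centered = melody - melody.mean()          # melody - mean(melody)

# Both encodings are identical for the transposed melody.
same_intervals = np.array_equal(intervals, np.diff(transposed))
same_centered = np.allclose(centered, transposed - transposed.mean())
print(same_intervals, same_centered)
```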


38 of 132

Techniques

Transposition invariance

  • Key estimation + Transpose to CMaj/Amin
  • Relative feature encoding
  • Considering all transpositions


References

Ellis & Poliner, Identifying ‘cover songs’ with chroma features and dynamic programming beat tracking. ICASSP 2007.

Khadkevich & Omologo, Large-scale cover song identification using chord profiles. ISMIR 2013.

[Figure: the original chroma representation and its 11 transpositions, +1 to +11 semitones]
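
Considering all transpositions amounts to comparing one recording against the 12 circular shifts of the other along the pitch-class axis and keeping the best match. The sketch below is a minimal illustration with random data standing in for chroma features; the function name is hypothetical, both inputs are assumed to have the same length, and Euclidean distance is used only for simplicity.

```python
import numpy as np


def min_distance_over_transpositions(chroma_a, chroma_b):
    """Smallest Euclidean distance between chroma_a and the 12 circular
    shifts of chroma_b along the pitch-class axis (axis 0)."""
    distances = [
        np.linalg.norm(chroma_a - np.roll(chroma_b, shift, axis=0))
        for shift in range(12)
    ]
    best_shift = int(np.argmin(distances))
    return distances[best_shift], best_shift


rng = np.random.default_rng(0)
chroma = rng.random((12, 100))          # hypothetical 12 x T chroma matrix
version = np.roll(chroma, 5, axis=0)    # same content, transposed by 5 semitones
dist, shift = min_distance_over_transpositions(chroma, version)
print(dist, shift)
```

The cost is a 12-fold increase in comparisons, which is the motivation for the cheaper alternatives on the following slides.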


39 of 132

Techniques

Transposition invariance

  • Key estimation + Transpose to CMaj/Amin
  • Relative feature encoding
  • Considering all transpositions
  • Considering optimal transposition indices (OTIs)


Reference

Serrà et al., Transposing chroma representations to a common key. USRMMO 2008.


40 of 132

Techniques

Transposition invariance

  • Key estimation + Transpose to CMaj/Amin
  • Relative feature encoding
  • Considering all transpositions
  • Considering optimal transposition indices (OTIs)
  • Fourier transform (2D)


References

Bertin-Mahieux & Ellis, Large-scale cover song recognition using the 2D Fourier transform magnitude. ISMIR 2012.

Humphrey et al., Data driven and discriminative projections for large-scale cover song identification. ISMIR 2013.
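
The invariance behind the 2D Fourier transform magnitude can be verified numerically: a circular shift along either axis changes only the phase of the transform, so the magnitude is unchanged. The sketch below uses a random matrix as a stand-in for a chroma patch; treating the time shift as circular is an idealization, since real patches shift with new content entering at the edges.

```python
import numpy as np

rng = np.random.default_rng(0)
patch = rng.random((12, 75))    # hypothetical 12 x T chroma patch

# Transpose by 3 semitones and shift by 10 frames (both circular).
shifted = np.roll(patch, shift=(3, 10), axis=(0, 1))

mag_original = np.abs(np.fft.fft2(patch))
mag_shifted = np.abs(np.fft.fft2(shifted))
print(np.allclose(mag_original, mag_shifted))
```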


41 of 132

Techniques

Transposition invariance

  • Key estimation + Transpose to CMaj/Amin
  • Relative feature encoding
  • Considering all transpositions
  • Considering optimal transposition indices (OTIs)
  • Fourier transform (2D)
  • Convolutional kernels


References

Stamenovic, Towards cover song detection with siamese convolutional neural networks. ICML 2017.

Doras & Peeters, Cover detection using dominant melody embeddings. ISMIR 2019.

Doras & Peeters, A Prototypical Triplet Loss for Cover Detection. ICASSP 2020.

Image Reference

https://www.cc.gatech.edu/~san37/post/dlhc-cnn/


42 of 132

Techniques

Transposition invariance

  • Key estimation + Transpose to CMaj/Amin
  • Relative feature encoding
  • Considering all transpositions
  • Considering optimal transposition indices (OTIs)
  • Fourier transform (2D)
  • Convolutional kernels
  • Key-invariant convolution module


References

Xu et al., Key-invariant convolutional neural network toward efficient cover song identification. ICME 2018.

Yesiler et al., Accurate and scalable version identification using musically-motivated embeddings. ICASSP 2020.

[Figure: key-invariant convolution module. A 23 x T input passes through a conv layer with 12 x W kernels, giving 12 x T`; a 12 x 1 max-pool then gives 1 x T`.]

43 of 132

Techniques

Tempo/timing invariance


44 of 132

Techniques

Tempo/timing invariance

  • Beat-synchronous features


Reference

Ellis & Poliner, Identifying ‘cover songs’ with chroma features and dynamic programming beat tracking. ICASSP 2007.


45 of 132

Techniques

Tempo/timing invariance

  • Beat-synchronous features
  • Fourier transform (2D)


References

Bertin-Mahieux & Ellis, Large-scale cover song recognition using the 2D Fourier transform magnitude. ISMIR 2012.

Humphrey et al., Data driven and discriminative projections for large-scale cover song identification. ISMIR 2013.


46 of 132

Techniques

Tempo/timing invariance

  • Beat-synchronous features
  • Fourier transform (2D)
  • Alignment techniques (e.g., dynamic time warping)


References

Gómez & Herrera, The song remains the same: identifying versions of the same pieces using tonal descriptors. ISMIR 2006.

Serrà et al., Chroma binary similarity and local alignment applied to cover song identification. IEEE-TASLP, 2008.
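
Dynamic time warping can be sketched as a small dynamic program. This is a textbook global DTW with Euclidean frame distances, given only to illustrate why alignment absorbs tempo differences; it is not the specific procedure of either reference, and the function name is hypothetical.

```python
import numpy as np


def dtw_cost(x, y):
    """Total cost of the best DTW alignment between feature sequences
    x (N x D) and y (M x D), using Euclidean frame distances."""
    n, m = len(x), len(y)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(x[i - 1] - y[j - 1])
            # Extend the cheapest of the three allowed predecessor paths.
            acc[i, j] = d + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return acc[n, m]


slow = np.array([[0.0], [0.0], [1.0], [1.0], [2.0], [2.0]])  # half tempo
fast = np.array([[0.0], [1.0], [2.0]])
other = np.array([[2.0], [1.0], [0.0]])                       # different content

print(dtw_cost(slow, fast), dtw_cost(slow, other))
```

The half-tempo sequence aligns with the fast one at zero cost, while the reversed sequence does not. The quadratic cost per pair is the main drawback of alignment-based comparison.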


47 of 132

Techniques

Tempo/timing invariance

  • Beat-synchronous features
  • Fourier transform (2D)
  • Alignment techniques (e.g., dynamic time warping)
  • Strided convolutions and pooling


References

Stamenovic, Towards cover song detection with siamese convolutional neural networks. ICML 2017.

Doras & Peeters, Cover detection using dominant melody embeddings. ISMIR 2019.

Doras & Peeters, A Prototypical Triplet Loss for Cover Detection. ICASSP 2020.

[Figure: 2 x 2 max-pooling of a 4 x 4 input. With stride 2 the output is 2 x 2 (values 4, 1, 8, 9); with stride 1 the output is 3 x 3.]
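
The effect of the stride on max-pooling can be reproduced with NumPy. The 4 x 4 input below is an illustrative matrix (its layout is an assumption, not a value-for-value copy of the slide), and `max_pool_2x2` is a hypothetical helper. A larger stride reduces the output resolution, which is one way small timing differences end up in the same pooled cell.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def max_pool_2x2(x, stride):
    """2 x 2 max-pooling of a 2-D array with the given stride."""
    windows = sliding_window_view(x, (2, 2))       # every 2 x 2 window
    return windows[::stride, ::stride].max(axis=(2, 3))


x = np.array([[2, 4, 1, 1],
              [1, 2, 1, 1],
              [3, 4, 7, 6],
              [8, 6, 9, 2]])

pooled_stride2 = max_pool_2x2(x, stride=2)   # 2 x 2 output
pooled_stride1 = max_pool_2x2(x, stride=1)   # 3 x 3 output
print(pooled_stride2)
print(pooled_stride1)
```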


48 of 132

Techniques

Tempo/timing invariance

  • Beat-synchronous features
  • Fourier transform (2D)
  • Alignment techniques (e.g., dynamic time warping)
  • Strided convolutions and pooling
  • Recurrent kernels (RNN, GRU, LSTM, …)


Reference

Ye et al., Supervised Deep Hashing for Highly Efficient Cover Song Detection. MIPR 2019.

[Figure: unrolled recurrent network with inputs x1 ... xn, hidden states h0 ... hn, and outputs z1 ... zn]

49 of 132

Techniques

Tempo/timing invariance

  • Beat-synchronous features
  • Fourier transform (2D)
  • Alignment techniques (e.g., dynamic time warping)
  • Strided convolutions and pooling
  • Recurrent kernels (RNN, GRU, LSTM, …)
  • Dilated temporal pyramid convolutional kernels


Reference

Jiang et al., Learn a robust representation for cover song identification via aggregating local and global music temporal context. ICME 2020.

Image Reference

https://www.nature.com/articles/s41598-018-24304-3


50 of 132

Techniques

Structure invariance


51 of 132

Techniques

Structure invariance

  • Section segmentation/repeated section


References

Marolt, A mid-level melody-based representation for calculating audio similarity. ISMIR 2006.

Gómez et al., Automatic tonal analysis from music summaries for version identification. AES 2006.


52 of 132

Techniques

Structure invariance

  • Section segmentation/repeated section
  • Thumbnailing


References

Grosche et al., Towards cover group thumbnailing. ACM-MM 2013.

Van Balen et al., Cognition-inspired descriptors for scalable cover song retrieval. ISMIR 2014.

Silva et al., Summarizing and comparing music data and its application on cover song identification. ISMIR 2018.


53 of 132

Techniques

Structure invariance

  • Section segmentation/repeated section
  • Thumbnailing
  • Shingling


References

Müller et al., Audio matching via chroma-based statistical features. ISMIR 2005.

Casey & Slaney, Song intersection by approximate nearest neighbor search. ISMIR 2006.


54 of 132

Techniques

Structure invariance

  • Section segmentation/repeated section
  • Thumbnailing
  • Shingling
  • Local alignment techniques


References

Serrà et al., Cross recurrence quantification for cover song identification. NJP 2009.

Chen et al., Fusing similarity functions for cover song identification. MTAP 2018.


55 of 132

Techniques

Structure invariance

  • Section segmentation/repeated section
  • Thumbnailing
  • Shingling
  • Local alignment techniques
  • Convolutional kernels


References

Stamenovic, Towards cover song detection with siamese convolutional neural networks. ICML 2017.

Doras & Peeters, Cover detection using dominant melody embeddings. ISMIR 2019.

Doras & Peeters, A Prototypical Triplet Loss for Cover Detection. ICASSP 2020.

Image Reference

https://www.cc.gatech.edu/~san37/post/dlhc-cnn/


56 of 132

Techniques

Structure invariance

  • Section segmentation/repeated section
  • Thumbnailing
  • Shingling
  • Local alignment techniques
  • Convolutional kernels
  • Global pooling operations (max-pool, mean-pool, ...)


References

Xu et al., Key-invariant convolutional neural network toward efficient cover song identification. ICME 2018.

Yu et al., Temporal pyramid pooling convolutional neural network for cover song identification. IJCAI 2019.

Doras & Peeters, Cover detection using dominant melody embeddings. ISMIR 2019.

[Figure: on the same 4 x 4 input, max-pooling gives a 2 x 2 output (values 4, 1, 8, 9), whereas global max-pooling gives a single value (9).]

57 of 132

Techniques

Structure invariance

  • Section segmentation/repeated section
  • Thumbnailing
  • Shingling
  • Local alignment techniques
  • Convolutional kernels
  • Global pooling operations (max-pool, mean-pool, ...)
  • Multi-channel attention


[Figure: multi-channel attention over T` time frames: (1) split the feature map in half and apply a softmax, (2) take the dot product.]

References

Serrà et al., Towards a universal neural network encoder for time series. CCIA 2018.

Yesiler et al., Accurate and scalable version identification using musically-motivated embeddings. ICASSP 2020.


58 of 132

Techniques

Structure invariance

  • Section segmentation/repeated section
  • Thumbnailing
  • Shingling
  • Local alignment techniques
  • Convolutional kernels
  • Global pooling operations (max-pool, mean-pool, ...)
  • Multi-channel attention
  • Temporal self-attention


Reference

Jiang et al., Learn a robust representation for cover song identification via aggregating local and global music temporal context. ICME 2020.

Image References

https://www.youtube.com/watch?v=5T38-2J5CcY

https://ai.googleblog.com/2017/08/transformer-novel-neural-network.html


59 of 132

Techniques

Learning from data


60 of 132

Techniques

Learning from data

  • Autoencoders


References

Stamenovic, Identifying cover songs using deep neural networks. 2015

Fang et al., Deep feature learning for cover song identification. MTA 2017

[Figure: Input → Encoder → Embedding → Decoder → Reconstructed Input]

61 of 132

Techniques

Learning from data

  • Autoencoders
  • Classification-based training


References

Xu et al., Key-invariant convolutional neural network toward efficient cover song identification. ICME 2018.

Yu et al., Temporal pyramid pooling convolutional neural network for cover song identification. IJCAI 2019.

Yu et al., Learning a representation for cover song identification using convolutional neural network. ICASSP 2020.

[Figure: Input → Feature extractor → FC → FC → Class likelihood, with the embedding taken from an intermediate layer]

62 of 132

Techniques

Learning from data

  • Autoencoders
  • Classification-based training
  • Similarity-based training
    • Metric learning


References

Stamenovic, Towards cover song detection with siamese convolutional neural networks. ICML 2017.

Qi et al., Triplet convolutional network for music version identification. LNCS 2018.

Doras & Peeters, Cover detection using dominant melody embeddings. ISMIR 2019.

Yesiler et al., Accurate and scalable version identification using musically-motivated embeddings. ICASSP 2020.

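[Figure: Input → Feature extractor → FC → Embedding. Training minimizes the distance between embeddings of versions of the same piece and maximizes the distance between embeddings of different pieces.]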
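
The "minimize distance / maximize distance" objective is often written as a triplet loss. The sketch below is one common formulation, offered as an illustration rather than the exact loss of any referenced system; the margin value and the 2-D embeddings are made up.

```python
import numpy as np


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge loss that is zero once the negative is at least `margin`
    farther from the anchor than the positive."""
    d_pos = np.linalg.norm(anchor - positive)   # distance to a version of the same piece
    d_neg = np.linalg.norm(anchor - negative)   # distance to a different piece
    return max(0.0, d_pos - d_neg + margin)


anchor = np.array([0.0, 0.0])
positive = np.array([0.1, 0.0])   # hypothetical embedding of another version
negative = np.array([3.0, 0.0])   # hypothetical embedding of an unrelated song

well_separated = triplet_loss(anchor, positive, negative)
violated = triplet_loss(anchor, negative, positive)   # roles swapped
print(well_separated, violated)
```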


63 of 132

Techniques

Data augmentation


64 of 132

Techniques

Data augmentation

  • Pitch transposition


Reference

Yesiler et al., Accurate and scalable version identification using musically-motivated embeddings. ICASSP 2020.

[Figure: the original input and its transpositions, +1 to +11 semitones]

65 of 132

Techniques

Data augmentation

  • Pitch transposition
  • Time stretching


References

Yesiler et al., Accurate and scalable version identification using musically-motivated embeddings. ICASSP 2020.

Jiang et al., Learn a robust representation for cover song identification via aggregating local and global music temporal context. ICME 2020.

[Figure: the original feature sequence and time-stretched copies (x0.7 and x1.2)]

66 of 132

Techniques

Data augmentation

  • Pitch transposition
  • Time stretching
  • Time warping


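Time warping operations, applied here to the second time frame (column) of an example feature matrix:

  Original     Silence      Duplicate      Remove
  0 5 4 8      0 0 4 8      0 5 5 4 8      0 4 8
  1 8 5 9      1 0 5 9      1 8 8 5 9      1 5 9
  5 3 5 7      5 0 5 7      5 3 3 5 7      5 5 7
  5 6 8 7      5 0 8 7      5 6 6 8 7      5 8 7
  1 0 9 6      1 0 9 6      1 0 0 9 6      1 9 6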

Reference

Yesiler et al., Accurate and scalable version identification using musically-motivated embeddings. ICASSP 2020.
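
The three time warping operations (silence, duplicate, remove) can be sketched with NumPy on a small feature matrix whose columns are time frames. The helper names are hypothetical, and the frame index is fixed here for clarity; in training, the affected frames would presumably be chosen at random.

```python
import numpy as np

features = np.array([[0, 5, 4, 8],
                     [1, 8, 5, 9],
                     [5, 3, 5, 7],
                     [5, 6, 8, 7],
                     [1, 0, 9, 6]])   # rows: feature bins, columns: time frames


def silence(x, t):
    """Set time frame t to zero."""
    out = x.copy()
    out[:, t] = 0
    return out


def duplicate(x, t):
    """Repeat time frame t once."""
    return np.insert(x, t, x[:, t], axis=1)


def remove(x, t):
    """Drop time frame t."""
    return np.delete(x, t, axis=1)


print(silence(features, 1))
print(duplicate(features, 1))
print(remove(features, 1))
```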


67 of 132

Techniques

Data augmentation

  • Pitch transposition
  • Time stretching
  • Time warping
  • Input patch sampling


References

Yu et al., Temporal pyramid pooling convolutional neural network for cover song identification. IJCAI 2019.

Yesiler et al., Accurate and scalable version identification using musically-motivated embeddings. ICASSP 2020.

Jiang et al., Learn a robust representation for cover song identification via aggregating local and global music temporal context. ICME 2020.


68 of 132

3. Examples of Version Identification Systems

69 of 132

Example Systems: Alignment-based


70 of 132

Example Systems: Alignment-based

Overview

  • Feature extraction: HPCP
  • Transposition invariance: OTI
  • State-space embeddings
  • Cross-recurrence plot
  • Local alignment: Qmax


Reference

Serrà et al., Cross recurrence quantification for cover song identification. NJP 2009.


71 of 132

Example Systems: Alignment-based

Feature extraction: Harmonic pitch class profiles (HPCPs)


Reference

Gómez, Tonal description of music audio signals. PhD Thesis, 2006.


72 of 132

Example Systems: Alignment-based

Transposition invariance: Optimal transposition index (OTI)


Reference

Serrà et al., Transposing chroma representations to a common key. USRMMO 2008.
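
The idea of an optimal transposition index can be sketched as follows: summarize each recording by a time-averaged pitch class vector, then pick the circular shift that maximizes the dot product between the two summaries. This is a minimal illustration with random data in place of HPCPs; the function name and the omission of any normalization are simplifying assumptions, and the reference above gives the exact definition.

```python
import numpy as np


def optimal_transposition_index(chroma_a, chroma_b):
    """Circular shift of chroma_b (12 x T) that best matches chroma_a,
    judged by the dot product of time-averaged pitch class vectors."""
    global_a = chroma_a.mean(axis=1)
    global_b = chroma_b.mean(axis=1)
    scores = [np.dot(global_a, np.roll(global_b, shift)) for shift in range(12)]
    return int(np.argmax(scores))


rng = np.random.default_rng(0)
chroma = rng.random((12, 200))            # hypothetical 12 x T features
version = np.roll(chroma, -4, axis=0)     # same content in another key

oti = optimal_transposition_index(chroma, version)
aligned = np.roll(version, oti, axis=0)   # transpose once, then compare
print(oti, np.allclose(aligned, chroma))
```

Only one transposition is then passed to the expensive alignment step, instead of all 12.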


73 of 132

Example Systems: Alignment-based

Frame-level similarities

  • Adding context with state-space embeddings


Reference

Serrà et al., Cross recurrence quantification for cover song identification. NJP 2009.


74 of 132

Example Systems: Alignment-based

Frame-level similarities

  • Adding context with state-space embeddings

  • Pairwise binary similarity with a cross recurrence plot


Reference

Serrà et al., Cross recurrence quantification for cover song identification. NJP 2009.
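
The two frame-level steps can be sketched together: a delay (state-space) embedding stacks several frames into one vector to add temporal context, and a cross recurrence plot marks pairs of embedded frames that are mutually close. The parameter values (`m`, `tau`, `kappa`) and the quantile-based thresholding below are illustrative assumptions, not the settings of the reference.

```python
import numpy as np


def delay_embed(features, m=3, tau=1):
    """Stack m frames spaced tau apart: (T x D) -> (T - (m - 1) * tau) x (m * D)."""
    n = features.shape[0] - (m - 1) * tau
    return np.hstack([features[k * tau : k * tau + n] for k in range(m)])


def cross_recurrence_plot(x, y, kappa=0.1):
    """Binary matrix with R[i, j] = 1 when x[i] and y[j] are mutually among
    the closest kappa fraction of each other's neighbours."""
    dist = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=2)
    row_thresh = np.quantile(dist, kappa, axis=1, keepdims=True)
    col_thresh = np.quantile(dist, kappa, axis=0, keepdims=True)
    return ((dist <= row_thresh) & (dist <= col_thresh)).astype(int)


rng = np.random.default_rng(0)
frames = rng.random((50, 12))                 # hypothetical T x 12 features
embedded = delay_embed(frames, m=3, tau=2)    # 46 x 36
crp = cross_recurrence_plot(embedded, embedded)
print(embedded.shape, crp.shape)
```

Comparing a sequence with itself yields a fully marked main diagonal; for two versions of a piece, the matching passages show up as diagonal-like traces.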


76 of 132

Example Systems: Alignment-based

Local alignment with Qmax


Reference

Serrà et al., Cross recurrence quantification for cover song identification. NJP 2009.
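
A Qmax-style local alignment can be sketched as a cumulative matrix that grows along diagonal-like traces of ones in a binary cross recurrence plot and is penalized when a trace is interrupted. The sketch below is a simplified reading of that idea: the penalty values, the two-cell zero border, and the function name are assumptions for illustration, and the reference above gives the exact definition.

```python
import numpy as np


def qmax(crp, gamma_onset=5.0, gamma_ext=0.5):
    """Largest value of a cumulative matrix Q built over a binary
    cross recurrence plot (assumed penalties, see lead-in)."""
    n, m = crp.shape
    r = np.zeros((n + 2, m + 2))
    r[2:, 2:] = crp                      # zero border so i - 2 and j - 2 exist
    q = np.zeros((n + 2, m + 2))
    for i in range(2, n + 2):
        for j in range(2, m + 2):
            # Predecessors allow slopes of 1, 2, and 1/2 (tempo deviations).
            steps = [(i - 1, j - 1), (i - 2, j - 1), (i - 1, j - 2)]
            if r[i, j] == 1:
                q[i, j] = max(q[a, b] for a, b in steps) + 1
            else:
                # Starting a gap costs gamma_onset, extending one costs gamma_ext.
                q[i, j] = max(
                    [0.0]
                    + [
                        q[a, b] - (gamma_onset if r[a, b] == 1 else gamma_ext)
                        for a, b in steps
                    ]
                )
    return q.max()


print(qmax(np.eye(6)), qmax(np.zeros((4, 4))))
```

Because only the best local trace counts, shared sections are rewarded even when the two recordings differ in overall structure.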


78 of 132

Example Systems: Embedding-based


Reference

Yesiler et al., Accurate and scalable version identification using musically-motivated embeddings. ICASSP 2020.


79 of 132

Example Systems: Embedding-based

Overview

  • Feature extraction: cremaPCP
  • Input data augmentation
  • Key-invariant convolution module
  • Dilated convolutions
  • Summarizing the temporal content
  • Standardizing embedding components
  • Metric learning


Reference

Yesiler et al., Accurate and scalable version identification using musically-motivated embeddings. ICASSP 2020.

[Figure: system pipeline. Input → Data augmentation → Transposition invariance module → Expanding the receptive field → Summarizing the temporal content → Creating & standardizing embeddings → Distance metric → Distance value d]

80 of 132

Example Systems: Embedding-based

Feature extraction: cremaPCP

  • Intermediate output of a chord estimation model
  • Improved performance vs HPCP


References

McFee & Bello, Structured training for large vocabulary chord recognition. ISMIR 2017.

Yesiler et al., Accurate and scalable version identification using musically-motivated embeddings. ICASSP 2020.

[Figure: cremaPCP and HPCP representations, shown side by side]

81 of 132

Example Systems: Embedding-based

Data augmentation

  • Increased robustness against modifiable characteristics
  • Pitch transposition
  • Time stretching & warping
  • Input patch sampling


Reference

Yesiler et al., Accurate and scalable version identification using musically-motivated embeddings. ICASSP 2020.

[Figure: the original input after each augmentation: pitch transposition, time stretch & warping, and input patch sampling]

82 of 132

Example Systems: Embedding-based

Transposition invariance: Key-invariant convolution module

82

References

Xu et al., Key-invariant convolutional neural network toward efficient cover song identification. ICME 2018.

Yesiler et al., Accurate and scalable version identification using musically-motivated embeddings. ICASSP 2020.

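[Figure: key-invariant convolution module. A 12 x T crema-PCP (T = 1800) is duplicated and concatenated along the pitch axis and the last row is removed, giving 23 x T. A conv layer with 12 x W kernels yields 12 x T` per filter, and a 12 x 1 max-pool reduces this to 1 x T`.]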
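
A minimal sketch of why this construction gives transposition invariance, assuming a single 12 x W kernel with no bias or non-linearity (the real module uses many learned filters). The function names and toy data are hypothetical. Sliding the kernel over the 23-row input evaluates it at all 12 circular pitch shifts, and the max-pool keeps the best one, so transposing the input only permutes the values being pooled.

```python
def key_invariant_response(chroma, kernel):
    """chroma: 12 x T, kernel: 12 x W. Returns one value per time position."""
    stacked = (chroma + chroma)[:23]          # duplicate & concatenate, drop last row
    n_frames, width = len(chroma[0]), len(kernel[0])
    output = []
    for t in range(n_frames - width + 1):
        per_shift = []
        for shift in range(12):               # 12 vertical positions of the kernel
            per_shift.append(sum(
                kernel[p][w] * stacked[shift + p][t + w]
                for p in range(12) for w in range(width)
            ))
        output.append(max(per_shift))         # 12 x 1 max-pool over the shifts
    return output

def transpose_chroma(chroma, semitones):
    """Circular shift of the pitch-class rows."""
    return [chroma[(p - semitones) % 12] for p in range(12)]

# Deterministic toy data (integers, so the comparison below is exact).
chroma = [[(3 * p + 5 * t + p * t) % 7 for t in range(6)] for p in range(12)]
kernel = [[(p + 2 * w) % 3 for w in range(2)] for p in range(12)]
reference = key_invariant_response(chroma, kernel)
same_for_all_keys = all(
    key_invariant_response(transpose_chroma(chroma, k), kernel) == reference
    for k in range(12)
)
```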


83 of 132

Example Systems: Embedding-based

Expanding the receptive field

  • Adding more temporal context with dilated convolutions

83

Reference

Yesiler et al., Accurate and scalable version identification using musically-motivated embeddings. ICASSP 2020.

Key-invariant convolution module (input: 23 x T), followed by:

  • Conv layer (kernel size: 5, dilation: 1, filters: 256): higher level non-linearity without expanding the receptive field
  • Conv layer (kernel size: 5, dilation: 20, filters: 256): expanding the receptive field with dilation
  • Conv layer (kernel size: 5, dilation: 1, filters: 256): higher level non-linearity without expanding the receptive field
  • Conv layer (kernel size: 5, dilation: 13, filters: 512): expanding the receptive field with dilation

Output shape: B x 512 x T` (512 channels, T` overlapping 30 s time frames)
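
To see how much temporal context the dilations add, the sketch below computes the receptive field of the four layers listed on this slide. It assumes stride 1 and no pooling between them, and it counts frames at the input of the first of these four layers, so the width W of the key-invariant convolution is not included.

```python
def receptive_field(layers):
    """Receptive field (in frames) of stacked stride-1 1-D convolutions.

    layers: iterable of (kernel_size, dilation) pairs.
    """
    field = 1
    for kernel_size, dilation in layers:
        field += (kernel_size - 1) * dilation
    return field

# (kernel size, dilation) of the four layers listed on the slide.
slide_layers = [(5, 1), (5, 20), (5, 1), (5, 13)]
without_dilation = [(5, 1)] * 4

print(receptive_field(slide_layers))      # 141
print(receptive_field(without_dilation))  # 17
```

Under these assumptions, the two dilated layers account for 132 of the 140 added frames.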

Image Reference

https://www.nature.com/articles/s41598-018-24304-3


84 of 132

Example Systems: Embedding-based

Summarizing the temporal content

  • The goal: having C features per-song (output shape of B x C) regardless of the song duration
    • We don’t know which part of the song is more important
    • We can deal with songs with varying durations
  • The network should figure out which time steps are important, and how important they are
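
One way to obtain a fixed-size summary with learned time weights is sketched below: half of the channels act as scores, a softmax over time turns each score row into weights, and the other half is averaged with those weights. Which half plays which role, and the function names, are assumptions for illustration.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of numbers."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(features):
    """features: 2C rows of length T`. Returns C values, whatever T` is."""
    half = len(features) // 2
    scores, values = features[:half], features[half:]
    pooled = []
    for score_row, value_row in zip(scores, values):
        weights = softmax(score_row)                     # importance of each time step
        pooled.append(sum(w * v for w, v in zip(weights, value_row)))
    return pooled

# Flat scores weight every time step equally, so the result is a plain mean.
short_song = [[0.0, 0.0, 0.0, 0.0], [1.0, 2.0, 3.0, 6.0]]
long_song = [[0.0] * 9, [float(t) for t in range(9)]]
short_summary = attention_pool(short_song)
long_summary = attention_pool(long_song)
```

Both summaries have the same length even though the inputs differ in duration.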

84

Reference

Yesiler et al., Accurate and scalable version identification using musically-motivated embeddings. ICASSP 2020.

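[Figure: temporal summarization. The 512 x T` tensor is split in half into two 256 x T` tensors; a channel-wise softmax is applied to one half, and a channel-wise dot product with the other half gives a 256 x 1 summary.]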


85 of 132

Example Systems: Embedding-based

Standardizing the embedding components

  • Important for easier metric learning
  • To control the range of values of each embedding component
  • To make sure that all the dimensions of the latent space are utilized similarly
    • Due to being centered around 0 and having similar spreads
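
A minimal sketch of the standardization effect, assuming training-mode batch statistics only: each embedding dimension is shifted to zero mean and scaled to unit variance across the batch. A real batch normalization layer also keeps running statistics for inference and usually has a learnable scale and shift, which are left out here.

```python
import math
import statistics

def standardize(batch, eps=1e-5):
    """batch: B rows of d values. Zero mean, unit variance per dimension."""
    columns = list(zip(*batch))
    means = [statistics.mean(c) for c in columns]
    scales = [math.sqrt(statistics.pvariance(c) + eps) for c in columns]
    return [[(x - m) / s for x, m, s in zip(row, means, scales)] for row in batch]

# Two dimensions with very different ranges (hypothetical values).
batch = [[1.0, 100.0], [2.0, 300.0], [3.0, 200.0]]
standardized = standardize(batch)
```

After this step both dimensions contribute on a comparable scale to any distance computed between embeddings.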

85

Reference

Yesiler et al., Accurate and scalable version identification using musically-motivated embeddings. ICASSP 2020.

[Figure: Transposition invariance module (input: 23 x T), Expanding the receptive field, Summarizing the temporal content, Fully-connected layer (input: B x 256, output: B x d), Batch Normalization (input: B x d, output: B x d), giving an embedding of size d per song]


86 of 132

Example Systems: Embedding-based

Similarity-based training

  • Metric learning with triplet loss
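
For reference, the standard triplet loss with margin m can be written as max(0, d(anchor, positive) - d(anchor, negative) + m). The sketch below uses Euclidean distance as an assumption; the slide does not state the distance function or the triplet mining strategy.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin):
    """Zero once the negative is farther than the positive by at least margin."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

anchor, positive = [0.0, 0.0], [3.0, 4.0]       # distance 5
far_negative = [6.0, 8.0]                       # distance 10, already separated
near_negative = [0.0, 1.0]                      # distance 1, violates the margin

satisfied = triplet_loss(anchor, positive, far_negative, margin=1.0)
violated = triplet_loss(anchor, positive, near_negative, margin=1.0)
```

Only triplets that violate the margin produce a gradient, which is why the choice of triplets matters in practice.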

86

Reference

Yesiler et al., Accurate and scalable version identification using musically-motivated embeddings. ICASSP 2020.

[Figure: anchor, positive, and negative embeddings shown in two panels, with margin m]


87 of 132

Alignment vs. Embedding: Considerations

  • Learning vs rule-based
  • Storage requirements
  • Retrieval speed
  • Interpretability

87


88 of 132

Complementary

Ideas

88

4


89 of 132

Motivation

  • Chroma often works fine...

89

Toto “Africa”


90 of 132

Motivation

  • ...but sometimes it doesn’t!

  • And even within chroma, there are many ways to compute similarity
  • No free lunch with a particular feature choice/algorithm!

90

Run D.M.C. “Tricky”

Zappa Drum Covers


91 of 132

Motivation

  • Some of the best-performing algorithms do not scale well
    • Leverage weaker systems to prune in multiple stages
  • Networks of songs can improve weaker similarity measures

91

Reference

Yesiler et al., Da-TACOS: A dataset for cover song identification and understanding. ISMIR 2019.


92 of 132

Ensemble Systems

  • Fusing beat-synchronous chroma features computed at different tempo levels
  • Training SVM-based classifier

92

Reference

Ravuri & Ellis, Cover song detection: from high scores to general classification. ICASSP 2010.


93 of 132

Ensemble Systems

  • Similar to fusion in Ravuri, but with HPCP/Melody/Bass Line Chroma

93

Reference

Salamon et al., Melody, bass line, and harmony representations for music version identification. AdMIRe 2012.


94 of 132

Ensemble Systems / Early Fusion

  • Use NMF to separate melody/accompaniment, CSM for each
  • Late fusion at the distance level
  • Early fusion on CSMs

94

Reference

Foucard et al., Multimodal similarity between musical streams for cover version detection. ICASSP 2010.


95 of 132

Ensemble Systems

  • Normalize N distance measures from each feature to [0, 1], treat each as a dimension in N-dimensional Euclidean space
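
A small sketch of this idea, under two assumptions that are not stated on the slide: min-max scaling is used for the normalization, and the fused score of a candidate is the length of its N-dimensional vector (its Euclidean distance from the origin). The feature names and values are made up.

```python
import math

def min_max(values):
    """Scale a list of distances to the range [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0] * len(values)
    return [(v - low) / (high - low) for v in values]

def fuse_distances(per_feature):
    """per_feature: N lists, each holding the distance to every candidate."""
    normalized = [min_max(d) for d in per_feature]
    # Each candidate becomes a point in N-dimensional space.
    return [math.sqrt(sum(x * x for x in point)) for point in zip(*normalized)]

hpcp_distances = [0.0, 5.0, 10.0]      # hypothetical, one value per candidate
melody_distances = [2.0, 2.0, 4.0]
fused = fuse_distances([hpcp_distances, melody_distances])
```

A candidate that is close under every feature stays near the origin, while a large distance in any one feature pushes it away.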

95

Reference

Degani et al., A heuristic for distance fusion in cover song identification. WIAMIS 2013.


96 of 132

Ensemble Systems

  • Instead of training classifiers, use rank aggregation
    • Can use simple rules such as median/mean aggregation
    • Automatically scale invariant!
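
The sketch below illustrates mean and median rank aggregation on made-up distances from three hypothetical systems with very different scales. Converting to ranks first is what makes the result independent of each system's units.

```python
import statistics

def to_ranks(distances):
    """Rank 1 = smallest distance. Ties are broken by position."""
    order = sorted(range(len(distances)), key=lambda i: distances[i])
    ranks = [0] * len(distances)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks

def aggregate(rank_lists, rule):
    """Combine the ranks each candidate received, e.g. with mean or median."""
    return [rule(candidate_ranks) for candidate_ranks in zip(*rank_lists)]

# Distances from one query to four candidates (hypothetical values).
system_a = [0.10, 0.40, 0.35, 0.90]
system_b = [12.0, 30.0, 55.0, 80.0]
system_c = [3.0, 1.0, 2.0, 4.0]

ranks = [to_ranks(d) for d in (system_a, system_b, system_c)]
mean_ranks = aggregate(ranks, statistics.mean)
median_ranks = aggregate(ranks, statistics.median)
```

Rescaling any system's distances by a positive constant leaves its ranks, and therefore the aggregate, unchanged.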

96

Reference

Osmalskyj et al., Enhancing cover song identification with hierarchical rank aggregation. ISMIR 2016.

[Figure: worked example of mean and median rank aggregation]


97 of 132

Ensemble Systems

  • Instead of training classifiers, use rank aggregation
    • Kemeny-optimal rankings minimize the sum of Kendall tau distances to all input rankings
    • Local Kemenization applies greedy swaps to reach a local minimum
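
A toy sketch of these ideas on four items. The Kendall tau distance is taken here as the fraction of item pairs that two rankings order differently. The brute-force search is only feasible for a handful of items, and the greedy adjacent-swap routine is a simplified reading of local Kemenization, not necessarily the exact procedure of the referenced paper.

```python
from itertools import combinations, permutations

def discordant_pairs(rank_a, rank_b):
    """Number of item pairs that the two rankings order differently."""
    pos_a = {item: i for i, item in enumerate(rank_a)}
    pos_b = {item: i for i, item in enumerate(rank_b)}
    return sum(
        1 for x, y in combinations(rank_a, 2)
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0
    )

def kendall_tau(rank_a, rank_b):
    """Discordant pairs divided by the total number of pairs."""
    n = len(rank_a)
    return discordant_pairs(rank_a, rank_b) / (n * (n - 1) // 2)

def total_cost(candidate, rankings):
    return sum(discordant_pairs(candidate, r) for r in rankings)

def kemeny_optimal(rankings):
    """Exhaustive search over all permutations (tiny inputs only)."""
    return min(permutations(rankings[0]), key=lambda c: total_cost(c, rankings))

def local_kemenization(initial, rankings):
    """Swap adjacent items while doing so lowers the total cost."""
    current = list(initial)
    improved = True
    while improved:
        improved = False
        for i in range(len(current) - 1):
            swapped = current[:]
            swapped[i], swapped[i + 1] = swapped[i + 1], swapped[i]
            if total_cost(swapped, rankings) < total_cost(current, rankings):
                current, improved = swapped, True
    return current

rankings = ["ABCD", "ABDC", "BACD"]            # three hypothetical systems
consensus = kemeny_optimal(rankings)
local = local_kemenization("DCBA", rankings)
```

With four items there are six pairs, so `kendall_tau("ABCD", "DABC")` has three discordant pairs out of six.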

97

Reference

Osmalskyj et al., Enhancing cover song identification with hierarchical rank aggregation. ISMIR 2016.

[Figure: Kendall tau example marking concordant and discordant pairs; value shown: 3/6]


98 of 132

Ensemble Systems

  • Instead of training classifiers, use rank aggregation
    • Kemeny-optimal rankings minimize the sum of Kendall tau distances to all input rankings
    • Local Kemenization applies greedy swaps to reach a local minimum

98

Reference

Osmalskyj et al., Enhancing cover song identification with hierarchical rank aggregation. ISMIR 2016.

[Figure: Kendall tau example marking concordant and discordant pairs; value shown: 2/6]


99 of 132

Ensemble Systems

  • Rank aggregation results on the Second Hand Songs (SHS) subset of the Million Song Dataset

99

Reference

Osmalskyj et al., Enhancing cover song identification with hierarchical rank aggregation. ISMIR 2016.


100 of 132

Ensemble Systems / Clique Enhancement

  • Joint community structure can denoise clusters and fuse features
    • Step 1: Normalize each distance measure based on joint neighborhood sizes
    • Step 2: Joint random walk between features
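
The two steps can be sketched in a heavily simplified form. This is not the full similarity network fusion algorithm from the references: the kernels here are plain row-normalized similarity matrices, the neighborhood size k and iteration count are arbitrary, and the two toy matrices are made up. Each view is repeatedly diffused through the other view's current state using its own k-nearest-neighbor graph, and the results are averaged.

```python
def row_normalize(matrix):
    """Make every row sum to 1."""
    return [[v / sum(row) for v in row] for row in matrix]

def knn_kernel(similarity, k):
    """Keep each row's k largest similarities (self included), then normalize."""
    sparse = []
    for row in similarity:
        keep = set(sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k])
        sparse.append([v if j in keep else 0.0 for j, v in enumerate(row)])
    return row_normalize(sparse)

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(matrix):
    return [list(col) for col in zip(*matrix)]

def fuse(sim_a, sim_b, k=2, iterations=5):
    """Simplified cross-diffusion between two similarity matrices."""
    p_a, p_b = row_normalize(sim_a), row_normalize(sim_b)      # step 1
    s_a, s_b = knn_kernel(sim_a, k), knn_kernel(sim_b, k)
    for _ in range(iterations):                                # step 2
        next_a = row_normalize(matmul(matmul(s_a, p_b), transpose(s_a)))
        next_b = row_normalize(matmul(matmul(s_b, p_a), transpose(s_b)))
        p_a, p_b = next_a, next_b
    return [[(x + y) / 2 for x, y in zip(ra, rb)] for ra, rb in zip(p_a, p_b)]

# Four songs forming two cliques {0, 1} and {2, 3}; the second view is noisier.
sim_a = [[1.0, 0.9, 0.1, 0.1], [0.9, 1.0, 0.1, 0.1],
         [0.1, 0.1, 1.0, 0.9], [0.1, 0.1, 0.9, 1.0]]
sim_b = [[1.0, 0.6, 0.3, 0.1], [0.6, 1.0, 0.1, 0.3],
         [0.3, 0.1, 1.0, 0.6], [0.1, 0.3, 0.6, 1.0]]
fused = fuse(sim_a, sim_b)
```

In this toy case the fused matrix keeps the clique structure that both views agree on.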

100

References

Wang et al., Similarity network fusion for aggregating data types on a genomic scale. NM 2013.

Wang et al., Unsupervised metric fusion by cross diffusion. CVPR 2012.


101 of 132

Ensemble Systems

  • Use similarity network fusion (SNF) to fuse the all-pairs scores from two different alignment schemes on HPCP
  • Example of individual pair shown below

101

References

Chen et al., Fusing similarity functions for cover song identification. MTA 2018.

Wang et al., Similarity network fusion for aggregating data types on a genomic scale. NM 2013.

Wang et al., Unsupervised metric fusion by cross diffusion. CVPR 2012.

[Figure: example pair scored with QMax and DMax]


102 of 132

Early Fusion

  • Can also use at the beat/frame level within a song!
  • Now possible to fuse pitch and non-pitch features

102

References

Tralie, Early MFCC and HPCP fusion for robust cover song identification. ISMIR 2017.

Tralie & McFee, Enhanced hierarchical music structure annotations via feature level similarity fusion. ICASSP 2019.


103 of 132

Early Fusion

  • To extend to cross-similarity matrices, find SSM between concatenated songs
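
A minimal sketch of the concatenation trick, with the fusion step itself left out: the self-similarity matrix (SSM) of the two songs placed back to back contains the cross-similarity matrix as its upper-right block. Euclidean distance and the tiny two-dimensional frames are assumptions for illustration.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def distance_matrix(frames_a, frames_b):
    """Pairwise distances between two sequences of feature frames."""
    return [[euclidean(x, y) for y in frames_b] for x in frames_a]

def cross_block(song_a, song_b):
    """SSM of the concatenated songs, and its A-versus-B block."""
    joint = song_a + song_b
    ssm = distance_matrix(joint, joint)
    n_a = len(song_a)
    csm = [row[n_a:] for row in ssm[:n_a]]     # rows from A, columns from B
    return ssm, csm

# Hypothetical feature frames: 3 frames for song A, 2 for song B.
song_a = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
song_b = [[0.0, 1.0], [1.0, 1.0]]
ssm, csm = cross_block(song_a, song_b)
```

Any fusion method defined on SSMs can then be applied to the joint matrices of several features, and the fused cross block read off afterwards.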

103

Reference

Tralie, Early MFCC and HPCP fusion for robust cover song identification. ISMIR 2017.


104 of 132

Early Fusion

  • Fusing features cleans up diagonals in cross-similarity matrices

104

Reference

Tralie, Early MFCC and HPCP fusion for robust cover song identification. ISMIR 2017.


105 of 132

Early Fusion

  • Fusing features cleans up diagonals in cross-similarity matrices

  • Leading to higher alignment scores

  • Still possible to do late SNF on full network of songs after obtaining these scores

105

Reference

Tralie, Early MFCC and HPCP fusion for robust cover song identification. ISMIR 2017.


106 of 132

Pruning

  • Downside of SNF at the frame level: it must be run between the query song and every reference song (very slow!)
  • Instead, consider structure-based features built on SNF of each song's own SSMs

106

References

Tralie & McFee, Enhanced hierarchical music structure annotations via feature level similarity fusion. ICASSP 2019.

Yesiler et al., Da-TACOS: A dataset for cover song identification and understanding. ISMIR 2019.


107 of 132

Pruning

  • Weak rejectors based on features such as
    • 2D Fourier transform magnitude (2D FTM)
    • Rough structure similarity measures

107

References

Cai et al., Two-layer large-scale cover song identification system based on music structure segmentation. IEEE MMSP 2016.


108 of 132

Pruning

  • Weak rejectors based on features such as
    • Duration, tempo
    • Bag of words chroma
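
The pruning idea can be sketched as a two-stage search: a cheap weak rejector shortlists candidates, and the costly measure is run only on the shortlist. The song records, both distance functions, and the shortlist size are hypothetical stand-ins.

```python
def prune_then_rank(query, references, cheap_distance, costly_distance, keep):
    """Shortlist with a cheap measure, then rank the shortlist with a costly one."""
    shortlist = sorted(references, key=lambda ref: cheap_distance(query, ref))[:keep]
    return sorted(shortlist, key=lambda ref: costly_distance(query, ref))

references = [
    {"id": "ref1", "duration": 200, "signature": [1, 0, 1]},
    {"id": "ref2", "duration": 210, "signature": [1, 1, 1]},
    {"id": "ref3", "duration": 600, "signature": [1, 1, 0]},
    {"id": "ref4", "duration": 190, "signature": [0, 0, 1]},
]
query = {"id": "q", "duration": 205, "signature": [1, 1, 1]}

costly_calls = []

def duration_gap(a, b):
    """Weak rejector: difference in duration (seconds)."""
    return abs(a["duration"] - b["duration"])

def signature_distance(a, b):
    """Stand-in for an expensive alignment-based measure."""
    costly_calls.append(b["id"])
    return sum(x != y for x, y in zip(a["signature"], b["signature"]))

ranked = prune_then_rank(query, references, duration_gap, signature_distance, keep=2)
```

The costly measure runs on two of the four references here. The trade-off is that a true version with an unusual duration would be rejected in the first stage.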

108

References

Osmalskyj et al., Efficient database pruning for large-scale cover song recognition. ICASSP 2013.


109 of 132

Pruning

  • Weak rejectors based on features such as
    • TF-IDF on song titles and lyrics
    • Good performance on predominantly English datasets
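
A small sketch of title matching with TF-IDF and cosine similarity. The tokenizer, the smoothed IDF formula, and the three example titles are assumptions for illustration, not the configuration used in the referenced work.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase and keep runs of letters or digits."""
    return re.findall(r"[a-z0-9]+", text.lower())

def tfidf_vectors(texts):
    """One sparse TF-IDF vector (term -> weight) per text."""
    docs = [Counter(tokenize(t)) for t in texts]
    doc_freq = Counter(term for doc in docs for term in doc)
    n_docs = len(docs)
    idf = {term: math.log((1 + n_docs) / (1 + df)) + 1.0
           for term, df in doc_freq.items()}
    return [{term: count * idf[term] for term, count in doc.items()} for doc in docs]

def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0

titles = ["Africa", "Africa (acoustic cover)", "Tricky"]
vectors = tfidf_vectors(titles)
same_work = cosine(vectors[0], vectors[1])
different_work = cosine(vectors[0], vectors[2])
```

Titles that share no terms score exactly zero, which makes this a cheap rejector, although it depends on metadata being available and in a shared language.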

109

Reference

Correya et al., Large-scale cover song detection in digital music libraries using metadata, lyrics and audio features. ISMIR 2018.


110 of 132

Clique Enhancement

  • Create a feature space in which each dimension is a distance from the query song to a song in a predefined “core set” of songs
  • Can do metric learning on true covers to improve embedding
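
The core-set idea can be illustrated with a toy stand-in: every song is described by its distances to a fixed set of core songs. Here each song is reduced to a single number and the pairwise distance is an absolute difference, both purely hypothetical; in practice the pairwise distance would come from a version identification system.

```python
import math

def core_set_features(song, core_set, distance):
    """Describe a song by its distance to every song in the core set."""
    return [distance(song, core) for core in core_set]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def toy_distance(a, b):
    """Hypothetical stand-in for a pairwise version-similarity measure."""
    return abs(a - b)

core_set = [0.0, 5.0, 10.0]
version_a, version_b, unrelated = 2.0, 2.5, 9.0

feat_a = core_set_features(version_a, core_set, toy_distance)
feat_b = core_set_features(version_b, core_set, toy_distance)
feat_other = core_set_features(unrelated, core_set, toy_distance)
```

Versions of the same work relate to the core songs in similar ways, so their feature vectors end up close together, and a learned metric can then be trained on top of this space.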

110

Reference

Heo et al., Cover song identification with metric learning using distance as a feature. ISMIR 2017.


111 of 132

Clique Enhancement

  • Community structure can denoise clusters

111

References

Serrà et al., Characterization and exploitation of community structure in cover song networks. PRL 2012.

Serrà et al., Unsupervised detection of cover song sets: accuracy improvement and original identification. ISMIR 2009.


112 of 132

Clique Enhancement

  • Community structure can denoise clusters

112

References

Serrà et al., Characterization and exploitation of community structure in cover song networks. PRL 2012.

Serrà et al., Unsupervised detection of cover song sets: accuracy improvement and original identification. ISMIR 2009.


113 of 132

Clique Enhancement

  • Possible to use measures of centrality to determine original version
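
One simple centrality measure, shown as a sketch: within a detected clique, pick the song with the smallest total distance to the other members. The distances are made up, and the referenced papers may use different centrality measures.

```python
def most_central(clique, distance):
    """Member of the clique with the smallest total distance to the others."""
    return min(
        clique,
        key=lambda song: sum(distance[song][other] for other in clique if other != song),
    )

# Hypothetical pairwise distances inside one clique of versions.
distance = {
    "original": {"cover1": 0.2, "cover2": 0.3},
    "cover1": {"original": 0.2, "cover2": 0.6},
    "cover2": {"original": 0.3, "cover1": 0.6},
}
guess = most_central(["original", "cover1", "cover2"], distance)
```

The intuition is that covers are usually derived from the original, so they tend to resemble it more than they resemble each other.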

113

References

Serrà et al., Characterization and exploitation of community structure in cover song networks. PRL 2012.

Serrà et al., Unsupervised detection of cover song sets: accuracy improvement and original identification. ISMIR 2009.


114 of 132

Current Challenges &

Future Directions

114

5


115 of 132

Open issues

Task definition

  • MIR definition of work != Legal definition of work

115


116 of 132

Open issues

Task definition

  • MIR definition of work != Legal definition of work
  • Evaluation metrics don’t reflect practical needs
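
For context, the usual ranking metrics can be computed as sketched below, assuming their common definitions: MAP as the mean of average precision over queries, MR1 as the mean rank of the first correct version, MRR as the mean reciprocal of that rank, and Top1 as the share of queries with a correct version at rank 1. Each input list marks, in ranked order, whether a retrieved item is a true version (made-up data).

```python
def average_precision(relevance):
    """Average of precision values at the ranks of the true versions."""
    hits, precision_sum = 0, 0.0
    for rank, is_version in enumerate(relevance, start=1):
        if is_version:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0

def first_hit_rank(relevance):
    """Rank of the first true version in the list."""
    for rank, is_version in enumerate(relevance, start=1):
        if is_version:
            return rank
    raise ValueError("no true version in the ranked list")

def summarize(ranked_lists):
    first = [first_hit_rank(r) for r in ranked_lists]
    n = len(ranked_lists)
    return {
        "MAP": sum(average_precision(r) for r in ranked_lists) / n,
        "MR1": sum(first) / n,
        "MRR": sum(1 / f for f in first) / n,
        "Top1": sum(f == 1 for f in first) / n,
    }

results = summarize([[1, 0, 1, 0], [0, 1, 0, 0]])
```

All four scores depend only on the order of the results. None of them involves a decision threshold, which a deployed system needs in order to report a match or stay silent.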

116

[Figure: ranking metrics (MAP, MR1, MRR, Top1) shown alongside detection outcomes (TP, FP, FN)]


117 of 132

Open issues

Task definition

  • MIR definition of work != Legal definition of work
  • Evaluation metrics don’t reflect practical needs
  • Focus on Western music cultures

117


118 of 132

Open issues

Improving systems

  • Incorporating more descriptors (rhythm, lyrics, …)

118


119 of 132

Open issues

Improving systems

  • Incorporating more descriptors (rhythm, lyrics, …)
  • End-to-end learning

119

[Figure: two branches, each with blocks labeled Input, Encoder, and an embedding of size d]


120 of 132

Open issues

Improving systems

  • Incorporating more descriptors (rhythm, lyrics, …)
  • End-to-end learning
  • More focus on post-processing

120


121 of 132

Open issues

Evaluation data

  • More testing on edge cases (cross-genre, a cappella, …)

121


122 of 132

Open issues

Evaluation data

  • More testing on edge cases (cross-genre, a cappella, …)
  • Wider range of genres

122

Image Reference

http://www.icrex.fi/popular-music-genres-world/


123 of 132

Open issues

Evaluation data

  • More testing on edge cases (cross-genre, a cappella, …)
  • Wider range of genres
  • Industry-scale datasets

123


124 of 132

Open issues

Improving scalability

  • Reporting performance metrics and run times

124

Image Reference

https://programmingblah.com/Big-O-Notation-Part-2/


125 of 132

Open issues

Improving scalability

  • Reporting performance metrics and run times
  • Scaling down the amount of training data for learning

125

Image Reference

https://towardsdatascience.com/the-quiet-semi-supervised-revolution-edec1e9ad8c


126 of 132

Open issues

Emphasis on application scenarios

  • Working with short queries

126


127 of 132

Open issues

Emphasis on application scenarios

  • Working with short queries
  • Streaming VI

127


128 of 132

Open issues

Emphasis on application scenarios

  • Working with short queries
  • Streaming VI
  • Plagiarism detection

128


129 of 132

Open issues

Emphasis on application scenarios

  • Working with short queries
  • Streaming VI
  • Plagiarism detection
  • Tracing back the origins of musical phrases

129

Image Reference

https://www.researchgate.net/publication/277677598_Unsupervised_analysis_of_similarities_between_musicians_and_musical_genres_using_spectrograms


130 of 132

Final Words

130


131 of 132

There is nothing that says a great song cannot be interpreted at any time in any way.

Phil Ramone

131


132 of 132

Q&A

132
