Version Identification in the 20s
Furkan Yesiler
Christopher Tralie
Joan Serrà
Speakers
Furkan Yesiler
Education
Experience
Research interests
Music recognition, Music information retrieval, Machine learning, Metric learning, Music similarity, Model compression
F. Yesiler, C. Tralie, & J. Serrà. Version Identification in the 20s. ISMIR2020
Speakers
Christopher Tralie
Education
Experience
Research interests
Music information retrieval, topological data analysis, time series analysis, applied geometry
Speakers
Joan Serrà
Education
Experience
Research interests
Machine learning, Generative models, Time series, Speech processing, Recommender systems, Music information retrieval
Acknowledgments
Outline
1. Introduction
2. Techniques for Version Identification
3. Examples of Version Identification Systems
4. Complementary Ideas
5. Current Challenges & Future Directions
What is not covered in this tutorial
1. Introduction
“Songs need new voices to sing them in places they’ve never been sung in order to stay alive.”
Emmylou Harris
Defining versions
“Different renditions of the same underlying musical piece”
Why not “cover songs”?
Types of versions
[Figure: version types placed along two axes, "Difficulty in identification" and "Commonness": Remaster, Instrumental, Demo, Cross-genre, Acoustic, Mashup, Medley, Live, Standard, Remix, Quotation, Karaoke, Parody]
Types of versions
Cross-genre
Instrumental
Mashup
Acoustic
Demo
Modifiable characteristics
Task overview
Definition
“Automatic detection of recordings that correspond to the same underlying musical piece”
Applications
Task overview
First papers / Precursors
Version identification overview
Popularity
Datasets
Publicly-available datasets
SecondHandSongs.com
VI researchers’ best friend
MIREX
Evaluation metrics
Evaluation metrics
Sorted items (R = relevant item):

k   item   Prec@k   rel@k   Prec@k * rel@k
1   R      1/1      1       1/1 * 1
2   R      2/2      1       2/2 * 1
3   -      2/3      0       2/3 * 0
4   -      2/4      0       2/4 * 0
5   R      3/5      1       3/5 * 1
6   -      3/6      0       3/6 * 0
7   -      3/7      0       3/7 * 0

AP = (1/1 * 1 + 2/2 * 1 + 2/3 * 0 + 2/4 * 0 + 3/5 * 1 + 3/6 * 0 + 3/7 * 0) / 3
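The average precision arithmetic on this slide can be reproduced with a few lines of Python. This is a minimal sketch: the function name `average_precision` is illustrative, and it assumes every relevant item appears somewhere in the ranked list, so the number of hits equals the number of relevant items (3 here).

```python
def average_precision(relevance):
    """Average precision for one query.

    `relevance` is the ranked result list as 0/1 flags (1 = relevant item).
    Assumption: all relevant items appear in the list.
    """
    hits = 0
    total = 0.0
    for k, rel in enumerate(relevance, start=1):
        hits += rel
        total += (hits / k) * rel  # Prec@k * rel@k
    return total / hits if hits else 0.0


# Ranked list from the slide: relevant items at ranks 1, 2 and 5.
ranking = [1, 1, 0, 0, 1, 0, 0]
ap = average_precision(ranking)  # (1/1 + 2/2 + 3/5) / 3
print(round(ap, 4))
```

Mean average precision (MAP) is then the mean of this value over all queries.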
Evaluation metrics
[Figure: three ranked result lists whose first relevant item (R) appears at rank 1, rank 2, and rank 5: rankrel1=1, rankrel1=2, rankrel1=5]
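The rank of the first relevant item per query (1, 2 and 5 in this example) is the quantity behind rank-based metrics such as the mean rank of the first relevant item (MR1) and the mean reciprocal rank (MRR). A minimal sketch follows; the three ranked lists are hypothetical and only chosen so that their first relevant items fall at ranks 1, 2 and 5.

```python
def first_relevant_rank(relevance):
    """1-based rank of the first relevant item in a 0/1 ranked list."""
    for k, rel in enumerate(relevance, start=1):
        if rel:
            return k
    raise ValueError("no relevant item in the ranked list")


# Hypothetical ranked lists (1 = relevant), one per query.
queries = [
    [1, 1, 1],        # first relevant item at rank 1
    [0, 1, 1],        # rank 2
    [0, 0, 0, 0, 1],  # rank 5
]
ranks = [first_relevant_rank(q) for q in queries]
mr1 = sum(ranks) / len(ranks)                 # lower is better
mrr = sum(1 / r for r in ranks) / len(ranks)  # higher is better
print(ranks, round(mr1, 3), round(mrr, 3))
```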
2. Techniques for Version Identification
Techniques
Recall the principal invariances we want to deal with:
Timbre | Tempo | Timing | Key | Structure | Harmony | Lyrics | Noise
Techniques
Features
Emphasis on pitch/tonal features
Techniques
Features
References
Marolt, A mid-level melody-based representation for calculating audio similarity. ISMIR 2006.
Salamon et al., Melody, bass line, and harmony representations for music version identification. AdMIRe 2012.
Doras & Peeters, Cover detection using dominant melody embeddings. ISMIR 2019.
Techniques
Features
References
Bello, Audio-based cover song retrieval using approximate chord sequences: testing shifts, gaps, swaps and beats. ISMIR 2007.
Khadkevich & Omologo, Large-scale cover song identification using chord profiles. ISMIR 2013.
Techniques
Features
References
Ellis & Poliner, Identifying ‘cover songs’ with chroma features and dynamic programming beat tracking. ICASSP 2007.
Serrà et al., Cross recurrence quantification for cover song identification. NJP 2009.
Bertin-Mahieux & Ellis, Large-scale cover song recognition using the 2D Fourier transform magnitude. ISMIR 2012.
Silva et al., SiMPle: assessing music similarity using subsequences joins. ISMIR 2016.
Techniques
Features
Reference
Tralie, Early MFCC and HPCP fusion for robust cover song identification. ISMIR 2017.
Techniques
Transposition invariance
Reference
Gómez & Herrera, The song remains the same: identifying versions of the same pieces using tonal descriptors. ISMIR 2006.
Techniques
Transposition invariance
References
Sailer & Dressler, Finding cover songs by melodic similarity. MIREX 2006.
Ahonen & Lemström, Identifying cover songs using normalized compression distance. MML 2008.
+4, -4, +2, +2, ...
melody-mean(melody)
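Both labels on this slide describe ways to make a melody representation independent of key: the sequence of pitch differences (+4, -4, +2, +2, ...) and the melody minus its own mean. A minimal sketch, assuming the melody is given as a list of MIDI-like pitch numbers (the example values are hypothetical):

```python
def intervals(melody):
    """Successive pitch differences; unchanged by transposition."""
    return [b - a for a, b in zip(melody, melody[1:])]


def center(melody):
    """Melody minus its mean pitch; also unchanged by transposition."""
    mean = sum(melody) / len(melody)
    return [p - mean for p in melody]


melody = [60, 64, 60, 62, 64]         # hypothetical pitch sequence
transposed = [p + 3 for p in melody]  # same melody, three semitones up

print(intervals(melody))  # [4, -4, 2, 2]
print(intervals(melody) == intervals(transposed))
print(center(melody) == center(transposed))
```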
Techniques
Transposition invariance
References
Ellis & Poliner, Identifying ‘cover songs’ with chroma features and dynamic programming beat tracking. ICASSP 2007.
Khadkevich & Omologo, Large-scale cover song identification using chord profiles. ISMIR 2013.
[Figure: the original and its 11 transpositions (+1 to +11)]
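One way to use all 12 transpositions (the original plus +1 to +11) is to compare a query with a reference under every circular shift and keep the best match. The sketch below assumes 12-bin chroma frames and a plain Euclidean distance; neither choice is specified on the slide, and the helper names are illustrative.

```python
def transpose(chroma, shift):
    """Circularly shift every 12-bin frame upward by `shift` bins."""
    return [frame[-shift:] + frame[:-shift] if shift else list(frame)
            for frame in chroma]


def distance(a, b):
    """Euclidean distance between two equally sized chroma sequences."""
    return sum((x - y) ** 2
               for fa, fb in zip(a, b)
               for x, y in zip(fa, fb)) ** 0.5


def transposition_invariant_distance(query, reference):
    """Smallest distance over all 12 transpositions of the query."""
    return min(distance(transpose(query, s), reference) for s in range(12))


def one_hot(i):
    return [1.0 if j == i else 0.0 for j in range(12)]


reference = [one_hot(0), one_hot(4), one_hot(7)]
query = transpose(reference, 5)  # same content, shifted by 5 bins

print(distance(query, reference) > 0)
print(transposition_invariant_distance(query, reference))
```

Trying every shift costs 12 comparisons per pair, which motivates cheaper alternatives such as estimating a single best transposition first.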
Techniques
Transposition invariance
Reference
Serrà et al., Transposing chroma representations to a common key. USRMMO 2008.
Techniques
Transposition invariance
References
Bertin-Mahieux & Ellis, Large-scale cover song recognition using the 2D Fourier transform magnitude. ISMIR 2012.
Humphrey et al., Data driven and discriminative projections for large-scale cover song identification. ISMIR 2013.
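The 2D Fourier transform magnitude discards phase, so a circular shift along the pitch-class axis (a transposition) leaves it unchanged. The sketch below checks this on a toy patch with a naive pure-Python 2-D DFT; the patch size and values are hypothetical, and a real system would work on 12-bin chroma patches.

```python
import cmath


def dft2_magnitude(patch):
    """Magnitude of the 2-D discrete Fourier transform (naive O(N^2) version)."""
    rows, cols = len(patch), len(patch[0])
    out = []
    for u in range(rows):
        out_row = []
        for v in range(cols):
            acc = 0j
            for r in range(rows):
                for c in range(cols):
                    angle = -2 * cmath.pi * (u * r / rows + v * c / cols)
                    acc += patch[r][c] * cmath.exp(1j * angle)
            out_row.append(abs(acc))
        out.append(out_row)
    return out


# Toy patch: rows stand for pitch classes, columns for time frames.
patch = [
    [0.0, 1.0, 0.5],
    [2.0, 0.0, 0.0],
    [0.0, 0.3, 1.0],
    [1.0, 0.0, 0.7],
]
shifted = patch[-1:] + patch[:-1]  # circular shift by one pitch class

a, b = dft2_magnitude(patch), dft2_magnitude(shifted)
max_diff = max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
print(max_diff < 1e-9)
```

In practice the same magnitude would be computed with an FFT routine such as `numpy.fft.fft2` followed by `numpy.abs`.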
Techniques
Transposition invariance
References
Stamenovic, Towards cover song detection with siamese convolutional neural networks. ICML 2017.
Doras & Peeters, Cover detection using dominant melody embeddings. ISMIR 2019.
Doras & Peeters, A Prototypical Triplet Loss for Cover Detection. ICASSP 2020.
Image Reference
https://www.cc.gatech.edu/~san37/post/dlhc-cnn/
Techniques
Transposition invariance
References
Xu et al., Key-invariant convolutional neural network toward efficient cover song identification. ICME 2018.
Yesiler et al., Accurate and scalable version identification using musically-motivated embeddings. ICASSP 2020.
[Figure: a 23 x T input passes through a conv layer (12 x W kernel) to give a 12 x T` map, then through a max-pool (12 x 1) to give a 1 x T` output]
Techniques
Tempo/timing invariance
Reference
Ellis & Poliner, Identifying ‘cover songs’ with chroma features and dynamic programming beat tracking. ICASSP 2007.
Techniques
Tempo/timing invariance
References
Bertin-Mahieux & Ellis, Large-scale cover song recognition using the 2D Fourier transform magnitude. ISMIR 2012.
Humphrey et al., Data driven and discriminative projections for large-scale cover song identification. ISMIR 2013.
Techniques
Tempo/timing invariance
References
Gómez & Herrera, The song remains the same: identifying versions of the same pieces using tonal descriptors. ISMIR 2006.
Serrà et al., Chroma binary similarity and local alignment applied to cover song identification. IEEE-TASLP, 2008.
Techniques
Tempo/timing invariance
References
Stamenovic, Towards cover song detection with siamese convolutional neural networks. ICML 2017.
Doras & Peeters, Cover detection using dominant melody embeddings. ISMIR 2019.
Doras & Peeters, A Prototypical Triplet Loss for Cover Detection. ICASSP 2020.
[Figure: a numeric example comparing max-pooling with stride 2 and max-pooling with stride 1]
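Max-pooling keeps the largest value in each window, so small shifts of a feature inside the window do not change the output; the stride controls how much the output is downsampled. A minimal sketch with a 2 x 2 window follows. The grid values are illustrative and not claimed to reproduce the slide's figure.

```python
def max_pool2d(matrix, size, stride):
    """Max-pooling over size x size windows, moved by `stride` in both axes."""
    rows, cols = len(matrix), len(matrix[0])
    return [
        [
            max(matrix[r + dr][c + dc]
                for dr in range(size) for dc in range(size))
            for c in range(0, cols - size + 1, stride)
        ]
        for r in range(0, rows - size + 1, stride)
    ]


# Illustrative 4 x 4 feature map.
grid = [
    [2, 4, 1, 2],
    [1, 1, 1, 1],
    [3, 4, 8, 6],
    [7, 6, 9, 2],
]

print(max_pool2d(grid, size=2, stride=2))  # [[4, 2], [7, 9]]
print(max_pool2d(grid, size=2, stride=1))  # [[4, 4, 2], [4, 8, 8], [7, 9, 9]]
```

Stride 2 halves each dimension, while stride 1 keeps almost the full resolution and only smooths over local timing differences.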
Techniques
Tempo/timing invariance
Reference
Ye et al., Supervised Deep Hashing for Highly Efficient Cover Song Detection. MIPR 2019.
[Figure: unrolled recurrent network with inputs x1 ... xn, hidden states h0 ... hn, and z1 ... zn]
Techniques
Tempo/timing invariance
Reference
Jiang et al., Learn a robust representation for cover song identification via aggregating local and global music temporal context. ICME 2020.
Image Reference
https://www.nature.com/articles/s41598-018-24304-3
Techniques
Structure invariance
References
Marolt, A mid-level melody-based representation for calculating audio similarity. ISMIR 2006.
Gómez et al., Automatic tonal analysis from music summaries for version identification. AES 2006.
Techniques
Structure invariance
References
Grosche et al., Towards cover group thumbnailing. ACM-MM 2013.
Van Balen et al., Cognition-inspired descriptors for scalable cover song retrieval. ISMIR 2014.
Silva et al., Summarizing and comparing music data and its application on cover song identification. ISMIR 2018.
Techniques
Structure invariance
References
Müller et al., Audio matching via chroma-based statistical features. ISMIR 2005.
Casey & Slaney, Song intersection by approximate nearest neighbor search. ISMIR 2006.
Techniques
Structure invariance
References
Serrà et al., Cross recurrence quantification for cover song identification. NJP 2009.
Chen et al., Fusing similarity functions for cover song identification. MTA 2018.
Techniques
Structure invariance
References
Stamenovic, Towards cover song detection with siamese convolutional neural networks. ICML 2017.
Doras & Peeters, Cover detection using dominant melody embeddings. ISMIR 2019.
Doras & Peeters, A Prototypical Triplet Loss for Cover Detection. ICASSP 2020.
Image Reference
https://www.cc.gatech.edu/~san37/post/dlhc-cnn/
Techniques
Structure invariance
References
Xu et al., Key-invariant convolutional neural network toward efficient cover song identification. ICME 2018.
Yu et al., Temporal pyramid pooling convolutional neural network for cover song identification. IJCAI 2019.
Doras & Peeters, Cover detection using dominant melody embeddings. ISMIR 2019.
[Figure: a numeric example comparing max-pooling with global max-pooling]
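Global max-pooling reduces each channel to a single value over the whole time axis, so the output does not depend on where in the song a pattern occurs or on how long the song is. A minimal sketch with hypothetical feature values, where the second input reorders the sections of the first and repeats one frame:

```python
def global_max_pool(feature_map):
    """feature_map: list of channels, each a list over time.

    Returns one value per channel, regardless of the number of frames.
    """
    return [max(channel) for channel in feature_map]


# Hypothetical 2-channel feature map with 6 time frames.
original = [
    [2, 4, 1, 2, 9, 3],
    [1, 1, 8, 6, 0, 5],
]
# Same content with the two halves swapped and the first frame repeated.
reordered = [ch[3:] + ch[:3] + ch[:1] for ch in original]

print(global_max_pool(original))   # [9, 8]
print(global_max_pool(reordered))  # [9, 8]
```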
Techniques
Structure invariance
[Figure: temporal summarization that reduces T` time frames to 1; labels: Softmax, Split in half, Dot product]
References
Serrà et al., Towards a universal neural network encoder for time series. CCIA 2018.
Yesiler et al., Accurate and scalable version identification using musically-motivated embeddings. ICASSP 2020.
Techniques
Structure invariance
Reference
Jiang et al., Learn a robust representation for cover song identification via aggregating local and global music temporal context. ICME 2020.
Image References
https://www.youtube.com/watch?v=5T38-2J5CcY
https://ai.googleblog.com/2017/08/transformer-novel-neural-network.html
Techniques
Learning from data
References
Stamenovic, Identifying cover songs using deep neural networks. 2015.
Fang et al., Deep feature learning for cover song identification. MTA 2017.
[Figure: Input -> Encoder -> Embedding -> Decoder -> Reconstructed Input]
Techniques
Learning from data
References
Xu et al., Key-invariant convolutional neural network toward efficient cover song identification. ICME 2018.
Yu et al., Temporal pyramid pooling convolutional neural network for cover song identification. IJCAI 2019.
Yu et al., Learning a representation for cover song identification using convolutional neural network. ICASSP 2020.
[Figure: Input -> Feature extractor -> FC -> FC -> Class likelihood, with an intermediate Embedding]
Techniques
Learning from data
References
Stamenovic, Towards cover song detection with siamese convolutional neural networks. ICML 2017.
Qi et al., Triplet convolutional network for music version identification. LNCS 2018.
Doras & Peeters, Cover detection using dominant melody embeddings. ISMIR 2019.
Yesiler et al., Accurate and scalable version identification using musically-motivated embeddings. ICASSP 2020.
[Figure: Input -> Feature extractor -> FC -> Embedding; embeddings are trained with "Minimize distance" and "Maximize distance" objectives]
Techniques
Data augmentation
Reference
Yesiler et al., Accurate and scalable version identification using musically-motivated embeddings. ICASSP 2020.
[Figure: the original input and its transpositions +1 to +11]
Techniques
Data augmentation
References
Yesiler et al., Accurate and scalable version identification using musically-motivated embeddings. ICASSP 2020.
Jiang et al., Learn a robust representation for cover song identification via aggregating local and global music temporal context. ICME 2020.
[Figure: the original input and time-stretched copies (x0.7, x1.2)]
Techniques
Data augmentation
Original:
0 5 4 8
1 8 5 9
5 3 5 7
5 6 8 7
1 0 9 6

Silence:
0 0 4 8
1 0 5 9
5 0 5 7
5 0 8 7
1 0 9 6

Duplicate:
0 5 5 4 8
1 8 8 5 9
5 3 3 5 7
5 6 6 8 7
1 0 0 9 6

Remove:
0 4 8
1 5 9
5 5 7
5 8 7
1 9 6
Reference
Yesiler et al., Accurate and scalable version identification using musically-motivated embeddings. ICASSP 2020.
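The Silence, Duplicate and Remove matrices on this slide all act on the second column of the Original matrix, which suggests that columns are time frames. A minimal sketch of the three operations under that assumption; the function names and the choice of frame index are illustrative.

```python
def silence(patch, t):
    """Set time frame (column) t to zero."""
    return [row[:t] + [0] + row[t + 1:] for row in patch]


def duplicate(patch, t):
    """Repeat time frame t once."""
    return [row[:t + 1] + [row[t]] + row[t + 1:] for row in patch]


def remove(patch, t):
    """Drop time frame t."""
    return [row[:t] + row[t + 1:] for row in patch]


# "Original" matrix from the slide (rows: features, columns: time frames).
original = [
    [0, 5, 4, 8],
    [1, 8, 5, 9],
    [5, 3, 5, 7],
    [5, 6, 8, 7],
    [1, 0, 9, 6],
]

for augmented in (silence(original, 1), duplicate(original, 1), remove(original, 1)):
    print(augmented[0])
```

During training, the frame index would typically be drawn at random for each example rather than fixed.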
Techniques
Data augmentation
References
Yu et al., Temporal pyramid pooling convolutional neural network for cover song identification. IJCAI 2019.
Yesiler et al., Accurate and scalable version identification using musically-motivated embeddings. ICASSP 2020.
Jiang et al., Learn a robust representation for cover song identification via aggregating local and global music temporal context. ICME 2020.
3. Examples of Version Identification Systems
Example Systems: Alignment-based
Overview
Reference
Serrà et al., Cross recurrence quantification for cover song identification. NJP 2009.
Example Systems: Alignment-based
Feature extraction: Harmonic pitch class profiles (HPCPs)
Reference
Gómez, Tonal description of music audio signals. PhD Thesis, 2006.
Example Systems: Alignment-based
Transposition invariance: Optimal transposition index (OTI)
Reference
Serrà et al., Transposing chroma representations to a common key. USRMMO 2008.
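The optimal transposition index can be sketched as the circular shift that best aligns the global pitch-class profiles of two recordings. The code below is a minimal sketch under that reading: it sums 12-bin frames over time, scores each of the 12 shifts with a dot product, and returns the best one. The helper names and example frames are hypothetical, and details such as profile normalization should be checked against the cited paper.

```python
def global_profile(chroma):
    """Sum a chroma sequence (list of 12-bin frames) over time."""
    return [sum(frame[b] for frame in chroma) for b in range(12)]


def roll(vec, shift):
    """Circularly shift a list upward by `shift` positions."""
    shift %= len(vec)
    return vec[-shift:] + vec[:-shift] if shift else list(vec)


def optimal_transposition_index(chroma_a, chroma_b):
    """Shift of chroma_b (0..11) that maximizes similarity to chroma_a."""
    ga, gb = global_profile(chroma_a), global_profile(chroma_b)
    scores = [sum(x * y for x, y in zip(ga, roll(gb, s))) for s in range(12)]
    return max(range(12), key=scores.__getitem__)


# Hypothetical two-frame chroma sequence and a copy transposed down by 3 bins.
chroma_a = [
    [2, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0],
]
chroma_b = [roll(frame, 9) for frame in chroma_a]

oti = optimal_transposition_index(chroma_a, chroma_b)
print(oti)  # 3
aligned_b = [roll(frame, oti) for frame in chroma_b]
print(aligned_b == chroma_a)
```

Once the index is known, only one transposed copy has to be compared instead of all 12.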
Example Systems: Alignment-based
Frame-level similarities
Reference
Serrà et al., Cross recurrence quantification for cover song identification. NJP 2009.
Example Systems: Alignment-based
Local alignment with Qmax
Reference
Serrà et al., Cross recurrence quantification for cover song identification. NJP 2009.
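A Qmax-style score can be sketched as a dynamic program over a binary cross recurrence plot: matching cells extend the best of three diagonal-like predecessors, and non-matching cells apply a penalty so that short disruptions do not end an alignment. The recursion below is a paraphrase of the one in the cited paper, and the penalty values `gamma_o` and `gamma_e` as well as the zero initialization of the first two rows and columns are assumptions that should be checked against it.

```python
def qmax(crp, gamma_o=5.0, gamma_e=0.5):
    """Length-like score of the best local alignment in a binary matrix `crp`.

    gamma_o: penalty for a disruption onset (assumed value).
    gamma_e: penalty for a disruption extension (assumed value).
    """
    n, m = len(crp), len(crp[0])
    q = [[0.0] * m for _ in range(n)]
    best = 0.0
    for i in range(2, n):
        for j in range(2, m):
            steps = [(i - 1, j - 1), (i - 2, j - 1), (i - 1, j - 2)]
            if crp[i][j]:
                q[i][j] = max(q[a][b] for a, b in steps) + 1
            else:
                q[i][j] = max(
                    [0.0]
                    + [q[a][b] - (gamma_o if crp[a][b] else gamma_e)
                       for a, b in steps]
                )
            best = max(best, q[i][j])
    return best


# Toy plot: two identical 6-frame sequences give a full main diagonal.
diagonal = [[1 if i == j else 0 for j in range(6)] for i in range(6)]
empty = [[0] * 6 for _ in range(6)]

print(qmax(diagonal))  # 4.0 (the first two rows and columns stay at zero)
print(qmax(empty))     # 0.0
```

Allowing the (i-2, j-1) and (i-1, j-2) predecessors lets the alignment follow paths that are not exactly diagonal, which is how tempo differences are tolerated.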
Example Systems: Embedding-based
Overview
Reference
Yesiler et al., Accurate and scalable version identification using musically-motivated embeddings. ICASSP 2020.
[Figure: Input -> Data augmentation -> Transposition invariance module -> Expanding the receptive field -> Summarizing the temporal content -> Creating & standardizing embeddings (size d) -> Distance metric -> Distance value]
Example Systems: Embedding-based
Feature extraction: cremaPCP
References
McFee & Bello, Structured training for large vocabulary chord recognition. ISMIR 2017.
Yesiler et al., Accurate and scalable version identification using musically-motivated embeddings. ICASSP 2020.
[Figure: cremaPCP and HPCP representations]
Example Systems: Embedding-based
Data augmentation
Reference
Yesiler et al., Accurate and scalable version identification using musically-motivated embeddings. ICASSP 2020.
[Figure: Original -> Pitch transposition -> Time stretch & warping -> Input patch]
Example Systems: Embedding-based
Transposition invariance: Key-invariant convolution module
References
Xu et al., Key-invariant convolutional neural network toward efficient cover song identification. ICME 2018.
Yesiler et al., Accurate and scalable version identification using musically-motivated embeddings. ICASSP 2020.
[Figure: the 12 x T crema-PCP (T = 1800) is duplicated and concatenated and its last row is removed, giving a 23 x T input; a conv layer (12 x W kernel) gives a 12 x T` map, and a max-pool (12 x 1) gives a 1 x T` output]
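The shapes on this slide can be traced in plain Python: a 12 x T input is stacked with itself and cut to 23 rows, a 12 x W kernel then fits at 12 vertical positions (one per transposition), and the maximum over those 12 responses is the same for any circular shift of the input. The sketch below uses a single hand-written integer kernel and hypothetical input values; the actual module uses learned filters, many channels and non-linearities, none of which are modeled here.

```python
def key_invariant_response(pcp, kernel):
    """pcp: 12 x T (pitch classes by time); kernel: 12 x W.

    Returns a list of length T` = T - W + 1.
    """
    stacked = (pcp + pcp)[:23]  # duplicate & concatenate, remove the last row
    width = len(kernel[0])
    t_out = len(pcp[0]) - width + 1
    out = []
    for t in range(t_out):
        responses = [
            sum(kernel[i][w] * stacked[k + i][t + w]
                for i in range(12) for w in range(width))
            for k in range(12)  # 12 vertical positions of the kernel
        ]
        out.append(max(responses))  # max-pool over the 12 x 1 axis
    return out


# Hypothetical integer-valued input (12 x 6) and kernel (12 x 2).
pcp = [[(3 * p + 5 * t) % 7 for t in range(6)] for p in range(12)]
kernel = [[(i + 2 * w) % 3 - 1 for w in range(2)] for i in range(12)]
transposed = pcp[-4:] + pcp[:-4]  # circular shift by 4 pitch classes

print(key_invariant_response(pcp, kernel) == key_invariant_response(transposed, kernel))
```

A transposition only permutes the 12 responses, which is why taking their maximum removes the dependence on key.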
Example Systems: Embedding-based
Expanding the receptive field
Reference
Yesiler et al., Accurate and scalable version identification using musically-motivated embeddings. ICASSP 2020.
Key-invariant convolution module (input: 23 x T)
Conv layer (kernel size: 5, dilation: 1, filters: 256): higher level non-linearity without expanding the receptive field
Conv layer (kernel size: 5, dilation: 20, filters: 256): expanding the receptive field with dilation
Conv layer (kernel size: 5, dilation: 1, filters: 256): higher level non-linearity without expanding the receptive field
Conv layer (kernel size: 5, dilation: 13, filters: 512): expanding the receptive field with dilation
Output shape: B x 512 x T` (512 channels, T` overlapping 30s-time frames)
Image Reference
https://www.nature.com/articles/s41598-018-24304-3
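The effect of dilation on the receptive field can be checked with a short calculation. For stacked stride-1 convolutions, each layer adds (kernel size - 1) x dilation frames. The sketch below applies this to the four layers listed on this slide; it assumes stride 1 along time and ignores the width W of the key-invariant convolution module that precedes them.

```python
def receptive_field(layers):
    """Receptive field, in input frames, of stacked stride-1 convolutions.

    `layers` is a list of (kernel_size, dilation) pairs.
    """
    field = 1
    for kernel_size, dilation in layers:
        field += (kernel_size - 1) * dilation
    return field


# (kernel size, dilation) of the four conv layers on this slide.
layers = [(5, 1), (5, 20), (5, 1), (5, 13)]

for depth in range(1, len(layers) + 1):
    print(depth, receptive_field(layers[:depth]))
print(receptive_field(layers))  # 141
```

The dilation-1 layers add only 4 frames each, while the two dilated layers account for 80 and 52 frames.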
Example Systems: Embedding-based
Summarizing the temporal content
Reference
Yesiler et al., Accurate and scalable version identification using musically-motivated embeddings. ICASSP 2020.
[Figure: the feature map is split in half into two 256 x T` maps; a channel-wise softmax and a channel-wise dot product reduce them to a 256 x 1 vector]
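The three operations named on this slide can be sketched as an attention-like pooling: split the channels in half, turn one half into per-channel weights over time with a softmax, and take the channel-wise dot product with the other half. Which half supplies the weights is an assumption in this sketch (the first half is used), and the toy map has 4 channels instead of 512.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of numbers."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(feature_map):
    """feature_map: 2C channels x T` frames -> list of C values."""
    half = len(feature_map) // 2
    weight_rows, feature_rows = feature_map[:half], feature_map[half:]
    return [
        sum(a * f for a, f in zip(softmax(w_row), f_row))
        for w_row, f_row in zip(weight_rows, feature_rows)
    ]


# Toy map: 4 channels x 3 frames. Zero weights give a plain temporal mean.
toy = [
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
    [3.0, 6.0, 9.0],
    [1.0, 1.0, 1.0],
]
pooled = attention_pool(toy)
print(len(pooled), [round(v, 6) for v in pooled])
```

The output length depends only on the number of channels, so recordings of any duration map to a vector of the same size.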
Example Systems: Embedding-based
Standardizing the embedding components
Reference
Yesiler et al., Accurate and scalable version identification using musically-motivated embeddings. ICASSP 2020.
Transposition invariance module -> Expanding the receptive field -> Summarizing the temporal content
Fully-connected layer (input: B x 256, output: B x d)
Batch Normalization (input: B x d, output: B x d)
Example Systems: Embedding-based
Similarity-based training
Reference
Yesiler et al., Accurate and scalable version identification using musically-motivated embeddings. ICASSP 2020.
[Figure: anchor, positive, and negative embeddings shown in two configurations, with margin m]
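Training with anchors, positives, negatives and a margin m is commonly expressed as a triplet loss: the anchor should be closer to the positive (another version of the same piece) than to the negative by at least m. A minimal sketch, assuming Euclidean distance between embeddings; the distance choice and the example vectors are illustrative, not taken from the slide.

```python
def euclidean(a, b):
    """Euclidean distance between two embeddings of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def triplet_loss(anchor, positive, negative, margin):
    """Zero once the negative is farther than the positive by at least `margin`."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


anchor = [0.0, 0.0]
positive = [3.0, 4.0]        # distance 5 from the anchor
far_negative = [6.0, 8.0]    # distance 10: margin satisfied
near_negative = [0.0, 4.0]   # distance 4: closer than the positive

print(triplet_loss(anchor, positive, far_negative, margin=1.0))   # 0.0
print(triplet_loss(anchor, positive, near_negative, margin=1.0))  # 2.0
```

Only triplets with a non-zero loss contribute gradients, which is why the selection of triplets during training matters.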
Alignment vs. Embedding: Considerations
4. Complementary Ideas
Motivation
Toto “Africa”
Run D.M.C. “Tricky”
Zappa Drum Covers
Reference
Yesiler et al., Da-TACOS: A dataset for cover song identification and understanding. ISMIR 2019.
Ensemble Systems
Reference
Ravuri & Ellis, Cover song detection: from high scores to general classification. ICASSP 2010.
Ensemble Systems
Reference
Salamon et al., Melody, bass line, and harmony representations for music version identification. AdMIRe 2012.
Ensemble Systems / Early Fusion
Reference
Foucard et al., Multimodal similarity between musical streams for cover version detection. ICASSP 2010.
Ensemble Systems
Reference
Degani et al., A heuristic for distance fusion in cover song identification. WIAMIS 2013.
Ensemble Systems
Reference
Osmalskyj et al., Enhancing cover song identification with hierarchical rank aggregation. ISMIR 2016.
[Figure: rank aggregation example that combines the ranks from several systems using their mean and their median]
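Mean and median rank aggregation can be sketched as follows: each system ranks the same candidates, the per-candidate ranks are combined with the chosen rule, and the candidates are re-sorted by the combined value. The three systems and their ranks below are hypothetical and do not reproduce the numbers in the slide's figure.

```python
from statistics import mean, median


def aggregate_ranks(rank_lists, rule):
    """rank_lists: one dict per system mapping candidate -> rank (1 = best).

    Returns the candidates sorted by their aggregated rank.
    """
    candidates = list(rank_lists[0])
    scores = {c: rule([ranks[c] for ranks in rank_lists]) for c in candidates}
    return sorted(candidates, key=lambda c: scores[c])


# Hypothetical ranks of four candidates from three systems.
systems = [
    {"a": 1, "b": 2, "c": 3, "d": 4},
    {"a": 1, "b": 4, "c": 3, "d": 2},
    {"a": 2, "b": 3, "c": 1, "d": 4},
]

print(aggregate_ranks(systems, mean))    # ['a', 'c', 'b', 'd']
print(aggregate_ranks(systems, median))  # ['a', 'b', 'c', 'd']
```

The two rules can disagree, as they do here for candidates b and c, because the median ignores a single outlying rank while the mean does not.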
Ensemble Systems
Reference
Osmalskyj et al., Enhancing cover song identification with hierarchical rank aggregation. ISMIR 2016.
[Figure: concordant and discordant pairs between rankings: 3/6]
Ensemble Systems
Reference
Osmalskyj et al., Enhancing cover song identification with hierarchical rank aggregation. ISMIR 2016.
[Figure: concordant and discordant pairs between rankings: 2/6]
Ensemble Systems / Clique Enhancement
References
Wang et al., Similarity network fusion for aggregating data types on a genomic scale. NM 2013.
Wang et al., Unsupervised metric fusion by cross diffusion. CVPR 2012.
Ensemble Systems
References
Chen et al., Fusing similarity functions for cover song identification. MTA 2018.
Wang et al., Similarity network fusion for aggregating data types on a genomic scale. NM 2013.
Wang et al., Unsupervised metric fusion by cross diffusion. CVPR 2012.
QMax
DMax
Early Fusion
References
Tralie, Early MFCC and HPCP fusion for robust cover song identification. ISMIR 2017.
Tralie & McFee, Enhanced hierarchical music structure annotations via feature level similarity fusion. ICASSP 2019.
Early Fusion
Reference
Tralie, Early MFCC and HPCP fusion for robust cover song identification. ISMIR 2017.
Pruning
References
Tralie & McFee, Enhanced hierarchical music structure annotations via feature level similarity fusion. ICASSP 2019.
Yesiler et al., Da-TACOS: A dataset for cover song identification and understanding. ISMIR 2019.
Pruning
References
Cai et al., Two-layer large-scale cover song identification system based on music structure segmentation. IEEE MMSP 2016.
Pruning
References
Osmalskyj et al., Efficient database pruning for large-scale cover song recognition. ICASSP 2013.
Pruning
Reference
Correya et al., Large-scale cover song detection in digital music libraries using metadata, lyrics and audio features. ISMIR 2018.
Clique Enhancement
Reference
Heo et al., Cover song identification with metric learning using distance as a feature. ISMIR 2017.
Clique Enhancement
References
Serrà et al., Characterization and exploitation of community structure in cover song networks. PRL 2012.
Serrà et al., Unsupervised detection of cover song sets: accuracy improvement and original identification. ISMIR 2009.
5. Current Challenges & Future Directions
Open issues
Task definition
MAP, MR1, MRR, Top1
TP, FP, FN
Open issues
Improving systems
[Figure: two encoders, each mapping an Input to an embedding of size d]
Open issues
Evaluation data
Image Reference
http://www.icrex.fi/popular-music-genres-world/
Open issues
Improving scalability
Image Reference
https://programmingblah.com/Big-O-Notation-Part-2/
Image Reference
https://towardsdatascience.com/the-quiet-semi-supervised-revolution-edec1e9ad8c
Open issues
Emphasis on application scenarios
Image Reference
https://www.researchgate.net/publication/277677598_Unsupervised_analysis_of_similarities_between_musicians_and_musical_genres_using_spectrograms
Final Words
“There is nothing that says a great song cannot be interpreted at any time in any way.”
Phil Ramone
Q&A