Invariant Audio Prints for Music Indexing and Alignment
Rémi Mignot¹, Geoffroy Peeters²
¹ STMS Lab – IRCAM, Sorbonne Université, CNRS (UMR-9912), Paris, France
² LTCI - Télécom Paris, Institut Polytechnique de Paris, Palaiseau, France
CBMI 2024
21st International Conference on Content-based Multimedia Indexing, September 18-20, Reykjavik, Iceland
Introduction
Find the “reference song” in a music catalog, based on the signal content of a given audio excerpt
[Diagram: query → catalog / database / table → output: reference song + metadata]
→ Robustness needed to time stretching, pitch shifting, noise addition, distortion, audio effects, and different instruments (for alignment)
Different instruments & pitch shifting
Find the time mapping between two occurrences of the same music (e.g., covers)
Audio examples: Original vs. Modified tempo; Original vs. Degraded
R. Mignot & G. Peeters, “Invariant Audio Prints for Music Indexing and Alignment,” CBMI 2024
Method overview
Derivation of codes that are:
✓ robust to transformations / degradations, and
✓ relevant to the musical content (unlike spectrogram peak-pair methods)
[Diagram: query audio signal x(t) → (1) HD-Key: high-dimensional descriptors f_n → (2) Reduction: reduced descriptors v_n → (3) Hashing: hash codes h_n → Search in the hash table: h*_n (found reference) → (4) Time alignment. Output reference: metadata, time position, stretch factor, time mapping]
Briefly presented here:
(1) High-dimensional audio keys (1056 dimensions) → design of audio descriptors robust to some transformations,
(2) Dimension reduction (40 dimensions) → learning of a linear projection robust to degradations,
(3) Hashing → hash codes tolerant to bit corruption (LSH-based),
(4) Time alignment → DTW-based alignment to estimate the time mapping.
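The hashing and search steps can be illustrated with a generic sketch. It assumes random-hyperplane LSH (one bit per sign of a random projection), a hash table mapping each code to (song, frame) pairs, tolerance to one corrupted bit by also probing every code at Hamming distance 1, and a vote on the time offset between reference and query frames to recover the time position. The code length `N_BITS` and all function names (`make_hasher`, `build_table`, `probes`, `search`) are hypothetical; this is not the paper's exact hashing scheme.

```python
import itertools
from collections import Counter, defaultdict

import numpy as np

N_BITS = 16  # hypothetical code length, not taken from the paper


def make_hasher(dim, n_bits=N_BITS, seed=0):
    """Random-hyperplane LSH: bit k is the sign of a random projection of v_n."""
    planes = np.random.default_rng(seed).normal(size=(n_bits, dim))
    weights = 1 << np.arange(n_bits)  # packs the bits into one integer code

    def hash_frames(v):
        bits = (v @ planes.T > 0).astype(np.int64)  # (n_frames, n_bits)
        return [int(c) for c in bits @ weights]

    return hash_frames


def build_table(catalog, hash_frames):
    """Map each code h_n to the (song_id, frame index) pairs that produced it."""
    table = defaultdict(list)
    for song_id, feats in catalog.items():
        for t, code in enumerate(hash_frames(feats)):
            table[code].append((song_id, t))
    return table


def probes(code, n_bits=N_BITS, radius=1):
    """Yield the code itself and every code within Hamming distance `radius`."""
    yield code
    for r in range(1, radius + 1):
        for positions in itertools.combinations(range(n_bits), r):
            flipped = code
            for p in positions:
                flipped ^= 1 << p
            yield flipped


def search(query_feats, table, hash_frames):
    """Vote for (song_id, time offset); return ((song_id, offset), votes) or None."""
    votes = Counter()
    for tq, code in enumerate(hash_frames(query_feats)):
        for probe in probes(code):
            for song_id, tr in table.get(probe, ()):
                votes[(song_id, tr - tq)] += 1
    best = votes.most_common(1)
    return best[0] if best else None


# Usage on synthetic 40-dimensional "reduced descriptors" v_n.
rng = np.random.default_rng(42)
catalog = {name: rng.normal(size=(200, 40)) for name in ("a", "b", "c")}
hasher = make_hasher(dim=40)
table = build_table(catalog, hasher)
query = catalog["b"][50:100] + 0.01 * rng.normal(size=(50, 40))  # slightly degraded excerpt
result = search(query, table, hasher)
print(result)  # the top vote should be for ("b", 50)
```

Probing the Hamming-1 neighbours multiplies the lookups by `1 + N_BITS` but makes a single flipped bit harmless; the offset vote then plays the role of the "time position" of the found reference.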
High-dimensional Audio Keys (1)
→ The descriptors are robust by design to pitch and time changes, noise, and filtering.
→ dimension 1056…
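As a generic illustration of "robust by design" (not the paper's 1056-dimensional HD-Key), the sketch below takes the magnitude of the 2-D DFT of a gain-normalized chroma patch. Because a DFT magnitude ignores circular shifts, the descriptor is unchanged by transpositions (rotations of the 12 pitch classes) and by circular time shifts; the gain normalization removes a global level change. It does not handle tempo changes (time stretching). The function name `shift_invariant_key` and the random "chroma patch" are hypothetical.

```python
import numpy as np


def shift_invariant_key(chroma_patch, eps=1e-12):
    """Hypothetical descriptor: |2-D DFT| of a gain-normalized (12, T) chroma patch.

    The DFT magnitude ignores circular shifts along both axes, i.e. rotations of
    the pitch-class axis (transpositions) and circular time shifts.
    """
    patch = chroma_patch / (np.linalg.norm(chroma_patch) + eps)  # removes global gain
    return np.abs(np.fft.fft2(patch)).ravel()


rng = np.random.default_rng(1)
patch = rng.random((12, 32))                     # stand-in for a chroma patch
transposed = np.roll(patch, 3, axis=0)           # +3 semitones (circular in pitch class)
modified = 2.5 * np.roll(transposed, 5, axis=1)  # circular time shift + gain change

k_orig = shift_invariant_key(patch)
k_mod = shift_invariant_key(modified)
k_other = shift_invariant_key(rng.random((12, 32)))
print(np.allclose(k_orig, k_mod))    # True: same key despite transposition, shift, gain
print(np.allclose(k_orig, k_other))  # False: a different patch gives a different key
```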
Robust dimension reduction (2)
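[Figure: reduction chain x(t) → HD-Key → OMPCA → ICCR → LDA → ICA → HT → v_n; dimensions along the chain: 1056, 1026, 80, 80, 40, 40. Second branch (for learning): x_i(t) → audio effects → HD-Key → OMPCA → ICCR → LDA → ICA]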
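The chain above includes an LDA stage. A minimal sketch of how an LDA-style projection can be made robust to degradations, assuming (as an illustration, not as the paper's exact procedure) that each clean frame and its copies degraded by audio effects form one class: the within-class scatter then measures the variation caused by the effects, and the projection keeps the directions where content varies much more than the degradations. The names `learn_robust_projection`, `scatter_ratio`, the regularization `reg`, and the synthetic data are hypothetical; the other stages (OMPCA, ICCR, ICA, HT) are not reproduced.

```python
import numpy as np


def learn_robust_projection(clean, degraded, out_dim, reg=1e-6):
    """LDA-style projection where each clean frame and its degraded copies form one class.

    clean: (n, d) descriptors; degraded: list of (n, d) arrays, row i being the
    descriptor of the same frame after an audio effect. Returns W of shape (d, out_dim).
    """
    groups = np.stack([clean, *degraded], axis=1)            # (n, k, d)
    n, k, d = groups.shape
    means = groups.mean(axis=1)                               # class means, (n, d)
    dev_w = groups - means[:, None, :]                        # deviations due to effects
    s_w = np.einsum("nki,nkj->ij", dev_w, dev_w) / (n * k)    # within-class scatter
    dev_b = means - means.mean(axis=0)
    s_b = dev_b.T @ dev_b / n                                 # between-class scatter
    # Generalized eigenproblem S_b w = lambda S_w w (S_w regularized for stability).
    evals, evecs = np.linalg.eig(np.linalg.solve(s_w + reg * np.eye(d), s_b))
    order = np.argsort(evals.real)[::-1][:out_dim]
    w = evecs[:, order].real
    return w / np.linalg.norm(w, axis=0)


def scatter_ratio(clean, degraded, w):
    """Within-class / between-class variance after projecting with w (lower is better)."""
    groups = np.stack([clean, *degraded], axis=1) @ w        # (n, k, m)
    means = groups.mean(axis=1)
    within = ((groups - means[:, None, :]) ** 2).sum(axis=2).mean()
    between = ((means - means.mean(axis=0)) ** 2).sum(axis=1).mean()
    return within / between


# Synthetic data: "content" lives in dims 0-1, the "degradations" mostly hit dims 2-5.
rng = np.random.default_rng(0)
n, d = 300, 6
clean = np.hstack([3.0 * rng.normal(size=(n, 2)), 0.1 * rng.normal(size=(n, 4))])
noise_scale = np.array([0.05, 0.05, 2.0, 2.0, 2.0, 2.0])
degraded = [clean + noise_scale * rng.normal(size=(n, d)) for _ in range(2)]

w = learn_robust_projection(clean, degraded, out_dim=2)
print(scatter_ratio(clean, degraded, np.eye(d)))  # raw descriptors
print(scatter_ratio(clean, degraded, w))          # much lower after projection
```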
Two experiments (see the paper)
Original medley
Transformed medley
+ realigned medley (right channel)
Original synthesized MIDI song
Transformed MIDI song
+ realigned MIDI (right channel)
Bonus experiment
Acoustic guitar + voice cover
“Little Wing” by Corey Heuvel
Tempo: ~60 BPM, with accelerations/decelerations,
some longer transitions,
a rather different score,
but the same structure (at the beginning)
Original recording
“Little Wing” (Jimi Hendrix)
Tempo: ~70 BPM
Bonus experiment
left channel: unchanged cover sound
right channel: synchronized original recording
all channels: synchronized original recording
Remark: the cover has longer transitions, e.g., between the 2nd verse and the solo, at 1:40
→ there, the original is strongly stretched.
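Step (4) of the method is DTW-based. The sketch below is textbook DTW (steps (1, 0), (0, 1), (1, 1), Euclidean local cost), not the paper's exact variant; the function name `dtw_path` and the synthetic features are hypothetical. The slope of the resulting path is the local stretch factor between the two versions: for a uniform 1.5x stretch it stays near 1.5, while a strongly stretched passage, such as a longer transition in a cover, shows up as a locally steeper segment. If the cover (~60 BPM) and the original (~70 BPM) played the same number of beats, the average slope would be roughly 70/60 ≈ 1.17.

```python
import numpy as np


def dtw_path(a, b):
    """Classic DTW between feature sequences a (n, d) and b (m, d).

    Returns the optimal warping path as a list of (i, j) frame pairs, from
    (0, 0) to (n - 1, m - 1), using steps (1, 0), (0, 1) and (1, 1).
    """
    n, m = len(a), len(b)
    cost = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)  # local distances
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    # Backtrack from the end, always moving to the cheapest predecessor.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        i, j = min([(i - 1, j - 1), (i - 1, j), (i, j - 1)], key=lambda s: acc[s])
        path.append((i - 1, j - 1))
    return path[::-1]


def features(t):
    """Hypothetical 2-D features: a periodic component plus a time ramp."""
    return np.stack([np.sin(6 * np.pi * t), t], axis=1)


# A reference and a version of it played 1.5 times slower (uniform time stretch).
t_ref = np.linspace(0.0, 1.0, 100)
t_query = np.linspace(0.0, 1.0, 150)
path = dtw_path(features(t_ref), features(t_query))

ref_idx, query_idx = np.array(path).T
stretch = np.polyfit(ref_idx, query_idx, 1)[0]  # slope of the time mapping
print(path[0], path[-1], round(stretch, 2))
```

The quadratic loops keep the sketch readable; for full songs, a vectorized or constrained DTW would be used instead.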
THANK YOU
For more details: