
1 of 25

Machine Learning for Music

Artist Similarity Mapping via Dimensionality Reduction

Michael Rossetti

https://github.com/s2t2/ml-music-2023/

2 of 25

Background

3 of 25

Inspiration

Spotify AI provides customized song recommendations

Let's use machine learning to determine which artists are most similar to each other, based on patterns in their song data.

4 of 25

Research Plan

Objectives and Constraints

Goals / Objectives:

Produce a two-dimensional mapping of artists, using audio data from YouTube
Visually observe which artists are most similar (i.e. closest on the map)
Provide a recommendation list of similar artists

Assumptions / Constraints:

Use unsupervised machine learning methods only
Focus on a small number of songs and artists to start (because we can only download so much data from YouTube, and because we can only see so many points on a plot)

5 of 25

Methods

6 of 25

Step 1: Downloading Audio from YouTube

Collect music videos for a variety of artists (7 to 10 songs per artist)

Using Pytube package in Python: https://pytube.io/
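A minimal sketch of the download step, assuming Pytube's `YouTube` class and its audio-only stream filter. The function names, the `audio` output directory, and the `artist__videoid` file naming scheme are hypothetical, chosen only for illustration.

```python
from pathlib import Path


def audio_filename(artist_name: str, video_id: str, extension: str) -> str:
    """Build a file name such as 'miles_davis__abc123.mp4' (hypothetical naming scheme)."""
    return f"{artist_name}__{video_id}.{extension}"


def download_audio(video_url: str, artist_name: str, out_dir: str = "audio") -> Path:
    """Download the first audio-only stream of a YouTube video using Pytube."""
    # Imported lazily so the naming helper above works without Pytube installed.
    from pytube import YouTube

    video = YouTube(video_url)
    stream = video.streams.filter(only_audio=True).first()
    if stream is None:
        raise ValueError(f"No audio-only stream found for {video_url}")

    # stream.subtype is the container type reported by Pytube (e.g. "mp4" or "webm").
    saved_path = stream.download(
        output_path=out_dir,
        filename=audio_filename(artist_name, video.video_id, stream.subtype),
    )
    return Path(saved_path)
```

Calling `download_audio(url, "miles_davis")` once per video URL would build up one folder of audio files, with the artist name recoverable from each file name.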

7 of 25

YouTube Audio Dataset v1

Contains 206 videos, by 24 artists, spanning over 15 hours

artist_name	video_count	total_minutes
miles_davis	8	59.0
john_coltrane	7	51.5
taylor_swift	12	50.0
coldplay	12	49.5
ariana_grande	12	46.0
chris_stapleton	12	46.0
jay_z	11	46.0
bruce_springsteen	10	43.5
john_mayer	10	41.5
pink_floyd	7	41.5
beethoven	7	37.5
adele	8	37.0
tupac	8	35.0
bach	8	35.0
dr_dre	8	34.5
maggie_rogers	9	34.0
led_zeppelin	7	33.5
alicia_keys	8	32.0
rihanna	7	31.5
jason_aldean	8	27.0
john_legend	7	27.0
andrea_bocelli	7	26.0
ac_dc	6	24.5
frank_sinatra	7	20.0

8 of 25

Step 2: Cutting Tracks

Choose uniform track length of 3 or 30 seconds

Final remainder discarded (variable length)

Using Librosa package in Python: https://librosa.org/

[Diagram: an audio file of variable full length is cut into Track 1, Track 2, and Track 3, each s seconds long.]
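The cutting step can be sketched with plain NumPy slicing. This is an illustration under assumptions, not the project's actual code: the function name `cut_tracks` is hypothetical, and a silent synthetic signal stands in for audio that would normally come from `librosa.load`.

```python
import numpy as np


def cut_tracks(audio: np.ndarray, sample_rate: int, track_seconds: float) -> np.ndarray:
    """Split a mono signal into equal-length tracks, dropping the final remainder."""
    samples_per_track = int(round(track_seconds * sample_rate))
    n_tracks = len(audio) // samples_per_track
    # Keep only the samples that fill whole tracks; the leftover tail is discarded.
    usable = audio[: n_tracks * samples_per_track]
    return usable.reshape(n_tracks, samples_per_track)


# Synthetic stand-in for a loaded audio file: 10 seconds at 22,050 Hz.
sample_rate = 22050
audio = np.zeros(10 * sample_rate, dtype=np.float32)

tracks = cut_tracks(audio, sample_rate, track_seconds=3)
print(tracks.shape)  # (3, 66150): three full tracks, the last 1 second is discarded
```

Each row of the result is one track, so later steps can loop over rows and encode them one at a time.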

9 of 25

Step 3: Encoding Audio Features

Choose number of MFCCs between 8 and 20 (e.g. 13)

Using Librosa package in Python: https://librosa.org/

[Diagram: Track 1, Track 2, Track 3, . . . , Track n (each s seconds long) are converted into feature encodings.]

The MFCC for each track is a matrix where rows represent time (track length) and there is a column for each coefficient. So we have to summarize over the time dimension to get one set of values for the entire track. We take the mean and variance of each MFCC over time, arriving at two columns per MFCC.

FEATURES: 'tempo', 'chroma_stft_mean', 'chroma_stft_var', 'rms_mean', 'rms_var', 'spectral_centroid_mean', 'spectral_centroid_var', 'spectral_bandwidth_mean', 'spectral_bandwidth_var', 'spectral_rolloff_mean', 'spectral_rolloff_var', 'zero_crossing_rate_mean', 'zero_crossing_rate_var', 'tonnetz_mean', 'tonnetz_var', 'mfcc_1_mean', 'mfcc_1_var', 'mfcc_2_mean', 'mfcc_2_var', 'mfcc_3_mean', 'mfcc_3_var', 'mfcc_4_mean', 'mfcc_4_var', 'mfcc_5_mean', 'mfcc_5_var', 'mfcc_6_mean', 'mfcc_6_var', 'mfcc_7_mean', 'mfcc_7_var', 'mfcc_8_mean', 'mfcc_8_var', 'mfcc_9_mean', 'mfcc_9_var', 'mfcc_10_mean', 'mfcc_10_var', 'mfcc_11_mean', 'mfcc_11_var', 'mfcc_12_mean', 'mfcc_12_var', 'mfcc_13_mean', 'mfcc_13_var'
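The mean and variance summary for the MFCC columns can be sketched as follows. The function names are hypothetical, and the use of population variance (NumPy's default) is an assumption. Librosa returns MFCCs with coefficients as rows, so the sketch transposes the matrix to match the rows-as-time layout described above.

```python
import numpy as np


def summarize_mfcc(mfcc: np.ndarray) -> dict:
    """Collapse a (time, coefficient) MFCC matrix into mean and variance features."""
    features = {}
    for i in range(mfcc.shape[1]):
        features[f"mfcc_{i + 1}_mean"] = float(mfcc[:, i].mean())
        features[f"mfcc_{i + 1}_var"] = float(mfcc[:, i].var())  # population variance
    return features


def encode_track(track: np.ndarray, sample_rate: int, n_mfcc: int = 13) -> dict:
    """Compute MFCCs for one track with Librosa, then summarize them over time."""
    # Imported lazily so summarize_mfcc can be used without Librosa installed.
    import librosa

    # librosa returns shape (n_mfcc, n_frames); transpose so that rows represent time.
    mfcc = librosa.feature.mfcc(y=track, sr=sample_rate, n_mfcc=n_mfcc).T
    return summarize_mfcc(mfcc)


# Tiny hand-made matrix: 3 time frames (rows) by 2 coefficients (columns).
demo = np.array([[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]])
print(summarize_mfcc(demo))
```

The other features in the list (chroma, RMS, spectral centroid, and so on) could be summarized the same way, one mean and one variance per feature.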

10 of 25

Step 4: Dimensionality Reduction

Conduct PCA using 2 or 3 components

[Diagram: feature encodings are reduced to "embeddings".]

Using Scikit-learn package in Python: https://scikit-learn.org
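A minimal sketch of the PCA step with Scikit-learn, using random numbers as a stand-in for the real feature encodings. Standardizing the features first is an assumption (the slides do not say whether scaling was applied); it keeps large-valued features from dominating the components.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the feature encodings: 50 tracks by 41 features.
rng = np.random.default_rng(seed=0)
features = rng.normal(size=(50, 41))

# Scale each feature to zero mean and unit variance (assumed preprocessing step).
scaled = StandardScaler().fit_transform(features)

# Reduce the 41 features to 2 components, giving one (x, y) point per track.
pca = PCA(n_components=2, random_state=0)
embeddings = pca.fit_transform(scaled)

print(embeddings.shape)  # (50, 2)
print(pca.explained_variance_ratio_)  # share of variance captured by each component
```

Setting `n_components=3` instead would give the three-dimensional variant mentioned above.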

11 of 25

Dimensionality Reduction Techniques

Principal Component Analysis (PCA)

Benefits: interpretable / explainable
Drawbacks: affected by outliers

T-distributed Stochastic Neighbor Embedding (T-SNE)

Benefits: more robust to outliers
Drawbacks: less interpretable / explainable

Uniform Manifold Approximation & Projection (UMAP)

Benefits: fast, scalable, may perform better on complex data
Drawbacks: less interpretable / explainable

etc.
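For comparison with PCA, here is a hedged sketch of T-SNE using Scikit-learn's `TSNE` class on two synthetic clusters. The perplexity value and the PCA initialization are illustrative choices, not settings taken from the project. UMAP lives in the separate `umap-learn` package and offers a similar `fit_transform` interface, so it is not shown here.

```python
import numpy as np
from sklearn.manifold import TSNE

# Two synthetic clusters of 20 "tracks" each, in a 41-dimensional feature space.
rng = np.random.default_rng(seed=0)
cluster_a = rng.normal(loc=0.0, size=(20, 41))
cluster_b = rng.normal(loc=5.0, size=(20, 41))
features = np.vstack([cluster_a, cluster_b])

# Perplexity must be smaller than the number of samples (40 here).
tsne = TSNE(n_components=2, perplexity=10, init="pca", random_state=0)
tsne_embeddings = tsne.fit_transform(features)

print(tsne_embeddings.shape)  # (40, 2)
```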

12 of 25

Step 5: Plotting the Embeddings

Inspect the results to see which tracks and artists are most similar

Using Plotly package in Python: https://plotly.com/python

reduced "embeddings"

13 of 25

Step 6: Plotting the Centroids

Characterize the center point for each track or artist

Using Plotly package in Python: https://plotly.com/python

Embedding centroids (mean X and Y for all tracks)
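The centroid calculation, and the way it could feed a list of similar artists, can be sketched with the standard library alone. The function names and the sample points are hypothetical, and ranking by Euclidean distance between centroids is an assumption about how "closest on the map" would be measured.

```python
from collections import defaultdict
from math import dist


def artist_centroids(points):
    """Average the (x, y) embeddings of each artist's tracks."""
    grouped = defaultdict(list)
    for artist, x, y in points:
        grouped[artist].append((x, y))
    return {
        artist: (
            sum(x for x, _ in coords) / len(coords),
            sum(y for _, y in coords) / len(coords),
        )
        for artist, coords in grouped.items()
    }


def most_similar(artist, centroids, top_n=3):
    """Rank the other artists by Euclidean distance between centroids."""
    others = [
        (other, dist(centroids[artist], centroid))
        for other, centroid in centroids.items()
        if other != artist
    ]
    return sorted(others, key=lambda pair: pair[1])[:top_n]


# Hypothetical track embeddings: (artist_name, x, y).
points = [
    ("bach", 0.0, 0.0), ("bach", 2.0, 0.0),
    ("beethoven", 1.0, 1.0), ("beethoven", 3.0, 1.0),
    ("ac_dc", 9.0, 9.0), ("ac_dc", 11.0, 9.0),
]
centroids = artist_centroids(points)
print(centroids["bach"])  # (1.0, 0.0)
print(most_similar("bach", centroids, top_n=1))
```

In this toy data the nearest centroid to `bach` is `beethoven`, which is the kind of result a recommendation list would surface.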

14 of 25

Results / Demo!

15 of 25

Results

30 second tracks: PCA

16 of 25

Results

30 second tracks: T-SNE

17 of 25

Results

30 second tracks: UMAP

18 of 25

Results

3 second tracks: PCA

19 of 25

Results

3 second tracks: T-SNE

20 of 25

Results

3 second tracks: UMAP

21 of 25

Conclusions

These methods can be used by music platforms to provide artist recommendations
Dimensionality reduction, when applied to audio features such as MFCCs, can detect which artists are related to each other
These methods, due to their unsupervised nature, can provide recommendations for new artists that may emerge over time, and may reduce manual effort and costs associated with human-provided recommendations

22 of 25

Thank you!

23 of 25

Extra Slides / Appendix

24 of 25

Contact

Presentation by: Michael Rossetti

Contact: Email | LinkedIn | GitHub

Interests:

Machine Learning (AI/ML)
Unsupervised Learning
Deep Learning
Music Information Retrieval

25 of 25

PCA

Tuning Number of Components