1 of 25

Machine Learning for Music

Artist Similarity Mapping via Dimensionality Reduction

2 of 25

Background

3 of 25

Inspiration

Spotify AI provides customized song recommendations

Let's use machine learning to determine which artists are most similar to each other, based on patterns in their song data.

4 of 25

Research Plan

Objectives and Constraints

Goals / Objectives:

  1. Produce a two-dimensional mapping of artists, using audio data from YouTube
  2. Visually observe which artists are most similar (i.e., closest on the map)
  3. Provide a recommendation list of similar artists

Assumptions / Constraints:

  • Use unsupervised machine learning methods only
  • Focus on a small number of songs and artists to start (because we can only download so much data from YouTube, and because we can only see so many points on a plot)

5 of 25

Methods

6 of 25

Step 1: Downloading Audio from YouTube

Collect music videos for a variety of artists (7 to 10 songs per artist)

Using Pytube package in Python: https://pytube.io/
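A minimal sketch of this step, assuming Pytube's `YouTube` class and its `streams.filter(only_audio=True)` query. The `artist_slug` and `download_audio` helpers and the `audio/<artist>/` folder layout are hypothetical and not taken from the presentation. Pytube is imported inside the function so the naming helper can run without it.

```python
import re
from pathlib import Path


def artist_slug(name: str) -> str:
    """Turn a display name into a lowercase, underscore-separated folder name."""
    return re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")


def download_audio(video_url: str, artist_name: str, out_dir: str = "audio") -> str:
    """Download the first audio-only stream of one video (needs network access)."""
    from pytube import YouTube  # lazy import; assumes `pip install pytube`

    artist_dir = Path(out_dir) / artist_slug(artist_name)
    artist_dir.mkdir(parents=True, exist_ok=True)

    stream = YouTube(video_url).streams.filter(only_audio=True).first()
    if stream is None:
        raise ValueError(f"No audio-only stream found for {video_url}")
    return stream.download(output_path=str(artist_dir))


print(artist_slug("Miles Davis"))
```

Calling `download_audio` once per video URL would fill one folder per artist, which keeps the artist label attached to every file in later steps.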

7 of 25

YouTube Audio Dataset v1

Contains 206 videos, by 24 artists, spanning over 15 hours

artist_name         video_count   total_minutes
miles_davis         8             59.0
john_coltrane       7             51.5
taylor_swift        12            50.0
coldplay            12            49.5
ariana_grande       12            46.0
chris_stapleton     12            46.0
jay_z               11            46.0
bruce_springsteen   10            43.5
john_mayer          10            41.5
pink_floyd          7             41.5
beethoven           7             37.5
adele               8             37.0
tupac               8             35.0
bach                8             35.0
dr_dre              8             34.5
maggie_rogers       9             34.0
led_zeppelin        7             33.5
alicia_keys         8             32.0
rihanna             7             31.5
jason_aldean        8             27.0
john_legend         7             27.0
andrea_bocelli      7             26.0
ac_dc               6             24.5
frank_sinatra       7             20.0
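As a quick consistency check, the per-artist figures listed above can be summed and compared with the headline numbers (206 videos, 24 artists, over 15 hours). This is only a sketch; the tuple layout is an illustrative choice.

```python
# (artist_name, video_count, total_minutes), copied from the dataset listing
rows = [
    ("miles_davis", 8, 59.0), ("john_coltrane", 7, 51.5), ("taylor_swift", 12, 50.0),
    ("coldplay", 12, 49.5), ("ariana_grande", 12, 46.0), ("chris_stapleton", 12, 46.0),
    ("jay_z", 11, 46.0), ("bruce_springsteen", 10, 43.5), ("john_mayer", 10, 41.5),
    ("pink_floyd", 7, 41.5), ("beethoven", 7, 37.5), ("adele", 8, 37.0),
    ("tupac", 8, 35.0), ("bach", 8, 35.0), ("dr_dre", 8, 34.5),
    ("maggie_rogers", 9, 34.0), ("led_zeppelin", 7, 33.5), ("alicia_keys", 8, 32.0),
    ("rihanna", 7, 31.5), ("jason_aldean", 8, 27.0), ("john_legend", 7, 27.0),
    ("andrea_bocelli", 7, 26.0), ("ac_dc", 6, 24.5), ("frank_sinatra", 7, 20.0),
]

n_artists = len(rows)
total_videos = sum(count for _, count, _ in rows)
total_minutes = sum(minutes for _, _, minutes in rows)
total_hours = total_minutes / 60

print(n_artists, total_videos, total_minutes, total_hours)
```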

8 of 25

Step 2: Cutting Tracks

Choose uniform track length of 3 or 30 seconds

Final remainder discarded (variable length)

Using Librosa package in Python: https://librosa.org/

Diagram: Audio File (variable full length) → Track 1, Track 2, Track 3 (each of length s seconds)
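The cutting step can be sketched with NumPy alone. The `cut_tracks` helper is hypothetical, and the silent signal below stands in for a real file; with Librosa, the samples would typically come from `librosa.load`, which resamples to 22050 Hz mono by default.

```python
import numpy as np


def cut_tracks(audio: np.ndarray, sample_rate: int, track_seconds: float) -> np.ndarray:
    """Split a mono signal into equal-length tracks, dropping the final remainder."""
    samples_per_track = int(track_seconds * sample_rate)
    n_tracks = len(audio) // samples_per_track
    usable = audio[: n_tracks * samples_per_track]  # discard the variable-length tail
    return usable.reshape(n_tracks, samples_per_track)


# Synthetic stand-in for: audio, sample_rate = librosa.load("some_file.mp3")
# ("some_file.mp3" is a placeholder path)
sample_rate = 22050
audio = np.zeros(sample_rate * 100)  # 100 seconds of silence

tracks = cut_tracks(audio, sample_rate, track_seconds=30)
print(tracks.shape)  # three full 30-second tracks; the last 10 seconds are dropped
```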

9 of 25

Step 3: Encoding Audio Features

Choose number of MFCCs between 8 and 20 (e.g. 13)

Using Librosa package in Python: https://librosa.org/

Diagram: Track 1, Track 2, Track 3, . . . , Track n (each of length s seconds) → feature encodings

10 of 25

Step 4: Dimensionality Reduction

Conduct PCA using 2 or 3 components

Diagram: feature encodings → reduced "embeddings"

Using Scikit-learn package in Python: https://scikit-learn.org
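A minimal sketch of this step with scikit-learn's `PCA`. The random matrix is a placeholder for the real (n_tracks, n_mfcc) encodings, and standardizing the features first is an assumption rather than something the slides state.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
features = rng.normal(size=(60, 13))  # placeholder for 60 tracks x 13 MFCCs

# Assumption: put every coefficient on the same scale before PCA.
scaled = StandardScaler().fit_transform(features)

pca = PCA(n_components=2)
embeddings = pca.fit_transform(scaled)  # shape: (n_tracks, 2)

print(embeddings.shape)
print(pca.explained_variance_ratio_)  # share of variance kept by each component
```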

11 of 25

Dimensionality Reduction Techniques

  • Principal Component Analysis (PCA)
    • Benefits: interpretable / explainable
    • Drawbacks: affected by outliers
  • t-distributed Stochastic Neighbor Embedding (t-SNE)
    • Benefits: more robust to outliers
    • Drawbacks: less interpretable / explainable
  • Uniform Manifold Approximation & Projection (UMAP)
    • Benefits: fast, scalable, may perform better on complex data
    • Drawbacks: less interpretable / explainable
  • etc.
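The techniques above share a similar `fit_transform` interface, so they can be swapped with little code. The sketch below runs PCA and t-SNE from scikit-learn on placeholder data; the perplexity value is an arbitrary illustrative choice. UMAP is left as a comment because it lives in the separate `umap-learn` package.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
features = rng.normal(size=(60, 13))  # placeholder for real track encodings

reducers = {
    "pca": PCA(n_components=2),
    # perplexity must be smaller than the number of samples
    "tsne": TSNE(n_components=2, perplexity=10, random_state=0),
}
# With umap-learn installed, one could add:
#   import umap
#   reducers["umap"] = umap.UMAP(n_components=2)

embeddings = {name: reducer.fit_transform(features) for name, reducer in reducers.items()}
for name, points in embeddings.items():
    print(name, points.shape)
```

One concrete side of the interpretability difference: a fitted `PCA` exposes `components_`, the weight of each input feature in each axis, while scikit-learn's `TSNE` offers no `transform` method for projecting new tracks onto an existing map.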

12 of 25

Step 5: Plotting the Embeddings

Inspect the results to see which tracks and artists are most similar

Using Plotly package in Python: https://plotly.com/python

reduced "embeddings"

13 of 25

Step 6: Plotting the Centroids

Characterize the center point for each track or artist

Using Plotly package in Python: https://plotly.com/python

embedding centroids (mean X and Y for all tracks)
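Computing a centroid amounts to a group-wise mean. A minimal pandas sketch with made-up coordinates (the column names are assumptions):

```python
import pandas as pd

# Made-up track embeddings for two artists.
embeddings = pd.DataFrame({
    "artist_name": ["bach", "bach", "tupac", "tupac"],
    "x": [1.0, 3.0, -2.0, -4.0],
    "y": [0.0, 2.0, 5.0, 7.0],
})

# Mean X and Y over all tracks that belong to the same artist.
centroids = embeddings.groupby("artist_name")[["x", "y"]].mean()
print(centroids)
```

The resulting frame has one row per artist and can be plotted in the same way as the individual tracks.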

14 of 25

Results / Demo!

  1. Methods Demo Notebook
  2. Results Repository:
    • length_3_mfcc_13/pca_2.html
    • length_30_mfcc_13/pca_2_centroids.html
    • length_3_mfcc_13/pca_3_centroids.html

15 of 25

Results

30 second tracks: PCA

16 of 25

Results

30 second tracks: t-SNE

17 of 25

Results

30 second tracks: UMAP

18 of 25

Results

3 second tracks: PCA

19 of 25

Results

3 second tracks: t-SNE

20 of 25

Results

3 second tracks: UMAP

21 of 25

Conclusions

  • These methods can be used by music platforms to provide artist recommendations
  • Dimensionality reduction, when applied to audio features such as MFCCs, can detect which artists are related to each other
  • These methods, due to their unsupervised nature, can provide recommendations for new artists that may emerge over time, and may reduce manual effort and costs associated with human-provided recommendations
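One way such recommendations might be derived is to rank artists by the distance between their centroids on the map. The sketch below assumes Euclidean distance; the `recommend` helper, the artist labels, and the coordinates are all hypothetical.

```python
import numpy as np


def recommend(centroids: dict, artist: str, k: int = 3) -> list:
    """Return the k artists whose centroids lie closest to the given artist's."""
    target = np.asarray(centroids[artist])
    distances = {
        other: float(np.linalg.norm(np.asarray(point) - target))
        for other, point in centroids.items()
        if other != artist
    }
    return sorted(distances, key=distances.get)[:k]


# Made-up centroid coordinates, for illustration only.
centroids = {
    "artist_a": (0.0, 0.0),
    "artist_b": (1.0, 0.0),
    "artist_c": (0.0, 3.0),
    "artist_d": (5.0, 5.0),
}

print(recommend(centroids, "artist_a", k=2))  # ['artist_b', 'artist_c']
```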

22 of 25

Thank you!

23 of 25

Extra Slides / Appendix

24 of 25

Contact

Presentation by: Michael Rossetti

Contact: Email | LinkedIn | GitHub

Interests:

  • Machine Learning (AI/ML)
  • Unsupervised Learning
  • Deep Learning
  • Music Information Retrieval

25 of 25

PCA

Tuning Number of Components
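A common way to tune the number of components is to look at the cumulative explained variance. The sketch below uses placeholder data, and the 90% threshold is an arbitrary assumption, not a value from the presentation.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
features = rng.normal(size=(60, 13))  # placeholder for real track encodings

pca = PCA().fit(features)  # keep all components
cumulative = np.cumsum(pca.explained_variance_ratio_)

# Smallest number of components whose cumulative share reaches the threshold.
threshold = 0.90  # assumed target
n_components = int(np.searchsorted(cumulative, threshold) + 1)

for i, share in enumerate(cumulative, start=1):
    print(f"{i:2d} components: {share:.3f}")
print("components needed:", n_components)
```

For a two- or three-dimensional map the component count is fixed by the plot, so this check mainly indicates how much variance the map leaves out.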