Machine Learning for Music
Artist Similarity Mapping via Dimensionality Reduction
Background
Inspiration
Spotify AI provides customized song recommendations
Let's use machine learning to determine which artists are most similar to each other, based on patterns in their song data.
Research Plan
Objectives and Constraints
Goals / Objectives:
Assumptions / Constraints:
Methods
Step 1: Downloading Audio from YouTube
Collect music videos for a variety of artists (target of 7 to 10 songs per artist; the collected dataset ranges from 6 to 12)
Using Pytube package in Python: https://pytube.io/
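A minimal sketch of the download step, assuming Pytube's `YouTube` class and its audio-only stream filter. The function name `download_audio` and the `audio` output directory are illustrative assumptions, not taken from the slides.

```python
def download_audio(video_url, output_dir="audio"):
    """Download the first audio-only stream of one video and return the saved path."""
    # Imported lazily so this module can be loaded without Pytube installed.
    from pytube import YouTube

    video = YouTube(video_url)
    # Keep only streams that carry audio without video, then take the first match.
    stream = video.streams.filter(only_audio=True).first()
    if stream is None:
        raise ValueError(f"no audio-only stream found for {video_url}")
    return stream.download(output_path=output_dir)
```

Calling `download_audio(url)` once per video URL would build up the dataset; how the URLs were gathered per artist is not stated in the slides.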
YouTube Audio Dataset v1
Contains 206 videos, by 24 artists, spanning over 15 hours
artist_name | video_count | total_minutes |
miles_davis | 8 | 59.0 |
john_coltrane | 7 | 51.5 |
taylor_swift | 12 | 50.0 |
coldplay | 12 | 49.5 |
ariana_grande | 12 | 46.0 |
chris_stapleton | 12 | 46.0 |
jay_z | 11 | 46.0 |
bruce_springsteen | 10 | 43.5 |
john_mayer | 10 | 41.5 |
pink_floyd | 7 | 41.5 |
beethoven | 7 | 37.5 |
adele | 8 | 37.0 |
tupac | 8 | 35.0 |
bach | 8 | 35.0 |
dr_dre | 8 | 34.5 |
maggie_rogers | 9 | 34.0 |
led_zeppelin | 7 | 33.5 |
alicia_keys | 8 | 32.0 |
rihanna | 7 | 31.5 |
jason_aldean | 8 | 27.0 |
john_legend | 7 | 27.0 |
andrea_bocelli | 7 | 26.0 |
ac_dc | 6 | 24.5 |
frank_sinatra | 7 | 20.0 |
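The dataset totals quoted above can be checked directly from the per-artist rows. This is a small sketch using only the values in the table; the variable names are illustrative.

```python
# (artist_name, video_count, total_minutes), copied from the table above.
DATASET = [
    ("miles_davis", 8, 59.0), ("john_coltrane", 7, 51.5),
    ("taylor_swift", 12, 50.0), ("coldplay", 12, 49.5),
    ("ariana_grande", 12, 46.0), ("chris_stapleton", 12, 46.0),
    ("jay_z", 11, 46.0), ("bruce_springsteen", 10, 43.5),
    ("john_mayer", 10, 41.5), ("pink_floyd", 7, 41.5),
    ("beethoven", 7, 37.5), ("adele", 8, 37.0),
    ("tupac", 8, 35.0), ("bach", 8, 35.0),
    ("dr_dre", 8, 34.5), ("maggie_rogers", 9, 34.0),
    ("led_zeppelin", 7, 33.5), ("alicia_keys", 8, 32.0),
    ("rihanna", 7, 31.5), ("jason_aldean", 8, 27.0),
    ("john_legend", 7, 27.0), ("andrea_bocelli", 7, 26.0),
    ("ac_dc", 6, 24.5), ("frank_sinatra", 7, 20.0),
]

n_artists = len(DATASET)
n_videos = sum(count for _, count, _ in DATASET)
total_minutes = sum(minutes for _, _, minutes in DATASET)
total_hours = total_minutes / 60

print(f"{n_artists} artists, {n_videos} videos, {total_hours:.2f} hours")
# 24 artists, 206 videos, 15.15 hours
```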
Step 2: Cutting Tracks
Choose uniform track length of 3 or 30 seconds
Final remainder discarded (variable length)
Using Librosa package in Python: https://librosa.org/
Diagram: an audio file (variable full length) is cut into Track 1, Track 2, Track 3, ... (each s seconds long)
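The cutting step can be sketched as plain slicing. This assumes the samples and sample rate come from something like `y, sr = librosa.load(path)`; the function name `cut_tracks` is hypothetical, and the toy input below stands in for real audio.

```python
def cut_tracks(samples, sample_rate, track_seconds):
    """Split a 1-D sequence of audio samples into equal-length tracks.

    The final partial track (the remainder) is discarded.
    """
    track_len = int(sample_rate * track_seconds)  # samples per track
    if track_len <= 0:
        raise ValueError("track length must be at least one sample")
    n_tracks = len(samples) // track_len
    return [samples[i * track_len:(i + 1) * track_len] for i in range(n_tracks)]


# Toy example: 10 "samples" at 2 samples per second, cut into 2-second tracks.
tracks = cut_tracks(list(range(10)), sample_rate=2, track_seconds=2)
print(tracks)  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```

The same slicing works on the NumPy array that Librosa returns, since it only relies on `len` and slice indexing.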
Step 3: Encoding Audio Features
Choose a number of MFCCs (Mel-frequency cepstral coefficients) between 8 and 20 (e.g. 13)
Using Librosa package in Python: https://librosa.org/
Diagram: Track 1, Track 2, Track 3, ..., Track n (each s seconds long) are each converted to feature encodings
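A hedged sketch of the encoding step, using `librosa.feature.mfcc`. The slides do not say how the per-frame MFCC matrix becomes a single encoding per track; averaging over time frames is an assumption made here for illustration, and both function names are hypothetical.

```python
import numpy as np


def summarize_mfcc(mfcc):
    """Collapse an (n_mfcc, n_frames) matrix into one vector by averaging over frames."""
    return np.asarray(mfcc).mean(axis=1)


def encode_track(samples, sample_rate, n_mfcc=13):
    """Return one feature vector of length n_mfcc for a single track."""
    # Imported lazily so summarize_mfcc can be used without Librosa installed.
    import librosa

    # Shape (n_mfcc, n_frames): one column of coefficients per short time frame.
    mfcc = librosa.feature.mfcc(y=samples, sr=sample_rate, n_mfcc=n_mfcc)
    return summarize_mfcc(mfcc)
```

Stacking the vectors from `encode_track` for every track would give a matrix with one row per track and `n_mfcc` columns, which is the input shape the dimensionality reduction step expects.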
Step 4: Dimensionality Reduction
Conduct PCA using 2 or 3 components
Diagram: feature encodings are reduced to low-dimensional "embeddings"
Using Scikit-learn package in Python: https://scikit-learn.org
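A minimal sketch of the reduction step with Scikit-learn's `PCA`. The feature matrix here is random stand-in data (100 tracks by 13 features), not the real encodings, and the variable names are illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA

# Stand-in data: 100 tracks x 13 MFCC features (random, for illustration only).
rng = np.random.default_rng(0)
features = rng.normal(size=(100, 13))

# Project each 13-dimensional encoding down to 2 components.
pca = PCA(n_components=2)
embeddings = pca.fit_transform(features)  # shape: (100, 2)

print(embeddings.shape)
print(pca.explained_variance_ratio_)
```

Setting `n_components=3` instead would produce embeddings suitable for a 3D plot.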
Dimensionality Reduction Techniques: PCA, t-SNE, UMAP
Step 5: Plotting the Embeddings
Inspect the results to see which tracks and artists are most similar
Using Plotly package in Python: https://plotly.com/python
reduced "embeddings"
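One possible way to draw the embeddings with Plotly is one scatter trace per artist, so each artist gets its own color and legend entry. The function names and axis titles below are assumptions for illustration; the slides do not show the plotting code.

```python
from collections import defaultdict


def group_by_artist(artists, embeddings):
    """Map each artist name to the list of (x, y) points for that artist's tracks."""
    groups = defaultdict(list)
    for artist, (x, y) in zip(artists, embeddings):
        groups[artist].append((x, y))
    return dict(groups)


def plot_embeddings(artists, embeddings):
    """Build a scatter plot with one trace per artist."""
    # Imported lazily so group_by_artist works without Plotly installed.
    import plotly.graph_objects as go

    fig = go.Figure()
    for artist, points in group_by_artist(artists, embeddings).items():
        xs = [p[0] for p in points]
        ys = [p[1] for p in points]
        fig.add_trace(go.Scatter(x=xs, y=ys, mode="markers", name=artist))
    fig.update_layout(xaxis_title="Component 1", yaxis_title="Component 2")
    return fig
```

Calling `plot_embeddings(artists, embeddings).show()` would open the interactive figure, where `artists` is a list of labels aligned row by row with the 2D embeddings.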
Step 6: Plotting the Centroids
Characterize the center point for each track or artist
Using Plotly package in Python: https://plotly.com/python
Embedding centroids (mean X and Y for all tracks)
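The centroid calculation is a per-artist mean of the X and Y coordinates. A small sketch follows; the function name, artist labels, and coordinates are made up for illustration.

```python
from collections import defaultdict


def artist_centroids(artists, embeddings):
    """Return {artist: (mean_x, mean_y)} over all of that artist's track embeddings."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # running x sum, y sum, track count
    for artist, (x, y) in zip(artists, embeddings):
        entry = sums[artist]
        entry[0] += x
        entry[1] += y
        entry[2] += 1
    return {artist: (sx / n, sy / n) for artist, (sx, sy, n) in sums.items()}


# Toy input: two tracks for artist_a, one for artist_b.
centroids = artist_centroids(
    ["artist_a", "artist_a", "artist_b"],
    [(0.0, 2.0), (2.0, 4.0), (5.0, 5.0)],
)
print(centroids)  # {'artist_a': (1.0, 3.0), 'artist_b': (5.0, 5.0)}
```

Artists whose centroids sit close together in the plot are the ones the embeddings treat as most similar.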
Results / Demo!
Results
30-second tracks: PCA
30-second tracks: t-SNE
30-second tracks: UMAP
3-second tracks: PCA
3-second tracks: t-SNE
3-second tracks: UMAP
Conclusions
Thank you!
Extra Slides / Appendix
Contact
Presentation by: Michael Rossetti
Contact: Email | LinkedIn | GitHub
Interests:
PCA
Tuning Number of Components
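One common way to tune the number of components, sketched here as an assumption since the slide content is not included, is to look at the cumulative share of variance explained as components are added. The data is random stand-in input, not the real encodings.

```python
import numpy as np
from sklearn.decomposition import PCA

# Stand-in data: 200 tracks x 13 MFCC features (random, for illustration only).
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 13))

# Fit with all components, then accumulate the share of variance each one explains.
pca = PCA().fit(features)
cumulative = np.cumsum(pca.explained_variance_ratio_)

for k, share in enumerate(cumulative, start=1):
    print(f"{k:2d} components: {share:.1%} of variance")
```

With real encodings, the point where the cumulative curve flattens suggests how many components are worth keeping; 2 or 3 remain the practical choices for plotting.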