Music Recommendation System
B20204 - Jayantakumar
What is a recommendation system?
A recommender system is an information-filtering tool that predicts which products a user will like and, based on those predictions, recommends a few of them to the user.
Examples include Netflix for movies, Amazon for products, and Spotify for music. Other kinds include news recommendation and advertisement recommendation.
So how can we model a recommendation system? As always, the answer lies in two things: first, the philosophy of similarity, and second, the underlying mathematics.
Philosophies and math!
Given a system of users and their preferences with respect to the items, we have two ways to approach things.
How do you get user preferences?
Mathematica
The Utility matrix
The utility matrix is an n × m matrix, where n is the number of users and m is the number of items.
The entry U_ij contains the rating of the j-th item by the i-th user.
| User/Item | Biriyani | Paneer Tikka | Gulab Jamun | Ice Cream |
| User 1    | 5        |              | 4           |           |
| User 2    |          | 3            |             | 2         |
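As a minimal sketch, the table above can be held in Python as a list of (user, item, rating) triples and expanded into an n × m grid. The function name `build_utility_matrix` and the use of `None` for an unknown rating are illustrative choices, not something taken from the project code.

```python
# Ratings from the table above as (user, item, rating) triples.
ratings = [
    ("User 1", "Biriyani", 5),
    ("User 1", "Gulab Jamun", 4),
    ("User 2", "Paneer Tikka", 3),
    ("User 2", "Ice Cream", 2),
]

users = ["User 1", "User 2"]
items = ["Biriyani", "Paneer Tikka", "Gulab Jamun", "Ice Cream"]


def build_utility_matrix(ratings, users, items):
    """Return an n x m list of lists; None marks an unknown rating (assumption)."""
    row = {user: i for i, user in enumerate(users)}
    col = {item: j for j, item in enumerate(items)}
    matrix = [[None] * len(items) for _ in users]
    for user, item, rating in ratings:
        matrix[row[user]][col[item]] = rating
    return matrix


U = build_utility_matrix(ratings, users, items)
```

Most entries of a real utility matrix are unknown, so in practice the triples form is what gets stored; the dense grid is only for illustration.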
What is our objective?
Our objective is to predict the missing values of U_ij for a given user i. If we can come up with a way to fill in the entries of this matrix, we have a way to recommend items to that user.
In collaborative filtering, we fill in the matrix by looking at the entries of other users and finding users with similar taste.
We use matrix factorisation: we approximate the utility matrix as the product of two smaller factor matrices, then repeatedly re-tune the factors to minimise the root mean square error (RMSE) between their product and the known entries. This gives us an iterative heuristic for filling in the utility matrix.
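Alternating least squares model
This is exactly the idea behind the alternating least squares (ALS) model: hold one factor matrix fixed, solve a least-squares problem for the other, then swap roles and repeat.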
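To make the alternating idea concrete, here is a small NumPy sketch of ALS for explicit ratings. It is not the Spark implementation used in this project; the function names `als` and `rmse`, the toy matrix, the convention that 0 means "not rated", and the values of `rank`, `reg` and `iters` are all assumptions made for illustration.

```python
import numpy as np


def als(R, mask, rank=2, reg=0.1, iters=20, seed=0):
    """Factorise R ~ X @ Y.T using only the entries where mask is True."""
    rng = np.random.default_rng(seed)
    n, m = R.shape
    X = rng.normal(scale=0.1, size=(n, rank))  # user factors
    Y = rng.normal(scale=0.1, size=(m, rank))  # item factors
    eye = np.eye(rank)
    for _ in range(iters):
        # Fix Y and solve a small ridge regression for every user.
        for i in range(n):
            Yo = Y[mask[i]]
            X[i] = np.linalg.solve(Yo.T @ Yo + reg * eye, Yo.T @ R[i, mask[i]])
        # Fix X and solve a small ridge regression for every item.
        for j in range(m):
            Xo = X[mask[:, j]]
            Y[j] = np.linalg.solve(Xo.T @ Xo + reg * eye, Xo.T @ R[mask[:, j], j])
    return X, Y


def rmse(R, mask, X, Y):
    """Root mean square error over the observed entries only."""
    err = (R - X @ Y.T)[mask]
    return float(np.sqrt(np.mean(err ** 2)))


# Toy utility matrix; 0 stands for "not rated" (assumption for this sketch).
R = np.array([[5, 0, 4, 0],
              [0, 3, 0, 2],
              [4, 0, 5, 1]], dtype=float)
mask = R > 0
X, Y = als(R, mask)
predicted = X @ Y.T  # every entry is now filled in, including the unknown ones
```

Each inner solve is an ordinary regularised least-squares problem, which is why the method can be parallelised across users and items.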
Scribed by Haoming Li, Bangzheng He, Michael Lublin, Yonathan Perez.
What have we done?
We used the Echo Nest Taste Profile Subset dataset.
How do we decide the hyperparameters?
We simply ran a cross-validator over 16 different models and picked the one with the lowest RMSE.
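The selection step can be sketched in plain Python as a grid search that keeps the candidate with the lowest RMSE. The grid below (4 ranks × 4 regularisation values, giving 16 candidates) is hypothetical, and `toy_score` is a stand-in so the sketch runs on its own; a real scorer would train ALS on the training folds and return the RMSE on the held-out fold.

```python
from itertools import product


def grid_search(param_grid, fit_and_score):
    """Try every combination in param_grid; return (best_params, best_rmse).

    fit_and_score(params) is expected to train a model and return its
    validation RMSE (lower is better).
    """
    names = list(param_grid)
    best_params, best_rmse = None, float("inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = fit_and_score(params)
        if score < best_rmse:
            best_params, best_rmse = params, score
    return best_params, best_rmse


# Hypothetical grid: 4 ranks x 4 regularisation strengths = 16 candidates.
param_grid = {"rank": [5, 10, 20, 40], "reg": [0.01, 0.05, 0.1, 0.5]}


def toy_score(params):
    """Stand-in for 'train and evaluate'; not a real RMSE."""
    return abs(params["rank"] - 20) / 20 + abs(params["reg"] - 0.1)


best_params, best_rmse = grid_search(param_grid, toy_score)
```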
For our model the configuration that we got was
We made a bad decision
Using the raw play count directly as the rating biases the model towards recommending overly popular songs.
To avoid this trap, we normalise the play count of every song by its total listening count.
This improved our results by a huge margin!
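One reading of this normalisation, sketched in plain Python: divide each user's play count for a song by that song's total play count across all users. The function name `normalise_play_counts` and the sample data are made up for illustration, and the project code may normalise differently.

```python
from collections import defaultdict


def normalise_play_counts(plays):
    """plays: list of (user_id, song_id, play_count) triples.

    Returns triples whose third field is the play count divided by the
    song's total play count over all users (one possible normalisation).
    """
    totals = defaultdict(int)
    for _, song_id, count in plays:
        totals[song_id] += count
    return [(user, song, count / totals[song]) for user, song, count in plays]


# Made-up data: one very popular song and one niche song.
plays = [
    ("u1", "popular_song", 50),
    ("u2", "popular_song", 150),
    ("u1", "niche_song", 3),
    ("u2", "niche_song", 1),
]
normalised = normalise_play_counts(plays)
```

After normalisation, u1's 3 plays of the niche song (0.75) outweigh their 50 plays of the popular song (0.25), so sheer popularity no longer dominates the rating.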
An Example result
Top 10 recommendations by the model
+-----------+-----------+----------+--------------------+
| songId| userId| rating| songTitle|
+-----------+-----------+----------+--------------------+
| 618915878|-2125095252|0.46518123| Por quererte|
| -344769466|-2125095252|0.39064926| What|
|-1092166790|-2125095252|0.32093266| Besame|
| 174191907|-2125095252|0.27977067| Remember Me|
| -582761021|-2125095252|0.26700243| Under Pressure|
|-1407064031|-2125095252|0.26247233| Dile al amor|
| 1580252322|-2125095252|0.24478583|Guerrilla Monsoon...|
| 2109294155|-2125095252|0.24304876| My Little Red Book|
|-1375191242|-2125095252|0.23820923|Don't Cry (Original)|
| -265443618|-2125095252|0.21124649| Intermission 1|
+-----------+-----------+----------+--------------------+
Actual play counts of the user in the real world
+-----------+-----------+---------+--------------------+
| songId| userId|playCount| songTitle|
+-----------+-----------+---------+--------------------+
| 618915878|-2125095252| 10| Por quererte|
| 1200624010|-2125095252| 6| Solo Dejate Amar|
| -344769466|-2125095252| 6| What|
|-1092166790|-2125095252| 6| Besame|
| 1580252322|-2125095252| 5|Guerrilla Monsoon...|
|-1407064031|-2125095252| 5| Dile al amor|
|-2024781075|-2125095252| 5|The Christmas Son...|
| 623026281|-2125095252| 4| Fruta Fresca|
| -569916139|-2125095252| 3| Fique Em Silencio|
| -14492029|-2125095252| 3|Ojalá que llueva ...|
+-----------+-----------+---------+--------------------+
Artist recommendation
The strategy used
Output Comparison
+-----------+----------+------------------+-------------+
| userId| rating| ArtistId| ArtistName|
+-----------+----------+------------------+-------------+
|-2130942721| 0.9124969|ARHAEXZ1187FB54F3C| N.E.R.D.|
|-2130942721| 0.8111042|ARBCZ031187B9B9CB9| Colossal|
|-2130942721|0.76041245|ARUJZFJ1187B9B135F| Shania Twain|
|-2130942721| 0.7055794|ARFCWSZ123526A0AFD|Justin Bieber|
|-2130942721| 0.7019551|AREHKZU122E5C4FC81| Mike Posner|
+-----------+----------+------------------+-------------+
Artists recommended by the recommendation system
+------------------+-----------+--------------+---------------+
| ArtistId| userId|sum(playCount)| ArtistName|
+------------------+-----------+--------------+---------------+
|AREHKZU122E5C4FC81|-2130942721| 7| Mike Posner|
|ARBCZ031187B9B9CB9|-2130942721| 6| Colossal|
|ARZO9UQ1187FB4D261|-2130942721| 3|Alliance Ethnik|
|ARUJZFJ1187B9B135F|-2130942721| 2| Shania Twain|
|ARHAEXZ1187FB54F3C|-2130942721| 1| N.E.R.D.|
|ARNSECX11E2835DB52|-2130942721| 1| Amity in fame|
|ARFCWSZ123526A0AFD|-2130942721| 1| Justin Bieber|
+------------------+-----------+--------------+---------------+
User's actual artists, ordered by play count
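The `sum(playCount)` column suggests the actual-artists table comes from summing a user's song play counts per artist. A minimal sketch of that aggregation in plain Python, with made-up song and artist identifiers (the project itself presumably does this with a Spark group-by, which is an assumption):

```python
from collections import Counter


def artist_play_counts(song_plays, song_to_artist):
    """Sum one user's song play counts per artist, highest total first.

    song_plays: list of (song_id, play_count) pairs for a single user.
    song_to_artist: dict mapping song_id to artist_id.
    """
    totals = Counter()
    for song_id, count in song_plays:
        totals[song_to_artist[song_id]] += count
    return totals.most_common()


# Hypothetical identifiers, not taken from the dataset.
song_plays = [("s1", 4), ("s2", 3), ("s3", 6)]
song_to_artist = {"s1": "artist_a", "s2": "artist_a", "s3": "artist_b"}
by_artist = artist_play_counts(song_plays, song_to_artist)
```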
Demo Time
Jayantakumar - b20204@students.iitmandi.ac.in
Complete set available at https://github.com/jayantakumar/MusicRecommendationSystem
Shortcomings and future work
The way forward, and questions that came up for me!
Resources that were extremely useful and that I recommend you look up!
How did Spotify use Spark and ALS in its recommendation system back in 2014?
Music Recommendations at Scale with Spark - Christopher Johnson (Spotify)
Need to implement the math yourself? This is a beautiful article on ALS by Victor.
https://medium.com/radon-dev/als-implicit-collaborative-filtering-5ed653ba39fe
Spark and ALS: how do you go about using them?
Need a quick intro to the Million Song Dataset? Look right here!
A Closer Look at the Million Song Dataset | by Jeremy | Modeling Music | Medium
Chapter on collaborative filtering recommender systems (J. Ben Schafer, Dan Frankowski, Jon Herlocker & Shilad Sen):
https://link.springer.com/chapter/10.1007/978-3-540-72079-9_9
Some more links if you are interested in the way forward and in handling big data and music
Tons of learnings
Thank you! Feel free to ask any questions 😀