2 of 22

What is a recommendation system?

A recommender system is an information-filtering tool that predicts which products a user will like and, based on those predictions, recommends a few products to that user.

Examples include Netflix for movie recommendations, Amazon for product recommendations, and Spotify for music. Other types include news recommendation, advertisement recommendation, etc.

So how can we model a recommendation system? As always, the answer lies in two things: first, the philosophy of similarity, and second, the underlying mathematics.

3 of 22

Philosophies and math!

Given a system of users and their preferences for items, we have two ways to approach things:

Recommend based on similarity between users - Collaborative
Recommend based on similarity between items - Content-based

How do you get user preferences?

Explicit - ask users for ratings
Implicit - infer ratings from metadata (for example, play counts)

5 of 22

The Utility matrix

The utility matrix is an n × m matrix, where n is the number of users and m is the number of items.

The entry U_ij contains the rating of the j-th item by the i-th user.

User/Item	Biriyani	Paneer Tikka	Gulab Jamun	Ice Cream
User 1	5		4
User 2		3		2
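As a small illustration, the example table can be built from a list of explicit ratings. This is only a sketch: the function name `build_utility_matrix` is made up here, missing ratings are marked with `None`, and the placement of the four ratings assumes the blanks in the table fall where its column gaps suggest.

```python
# Items and users taken from the example table; None marks a missing rating.
items = ["Biriyani", "Paneer Tikka", "Gulab Jamun", "Ice Cream"]
users = ["User 1", "User 2"]

# Explicit ratings as (user, item, rating) triples.
ratings = [
    ("User 1", "Biriyani", 5),
    ("User 1", "Gulab Jamun", 4),
    ("User 2", "Paneer Tikka", 3),
    ("User 2", "Ice Cream", 2),
]


def build_utility_matrix(users, items, ratings):
    """Return an n x m list of lists; entry [i][j] is user i's rating of item j."""
    row = {user: i for i, user in enumerate(users)}
    col = {item: j for j, item in enumerate(items)}
    matrix = [[None] * len(items) for _ in users]
    for user, item, rating in ratings:
        matrix[row[user]][col[item]] = rating
    return matrix


U = build_utility_matrix(users, items, ratings)
```

The `None` entries are exactly the values a recommender has to predict.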

6 of 22

What is our Objective?

Our objective is to predict the missing values of U_ij for a given user i. If we can find a way to fill in the entries of this matrix, we have a way to recommend items to that user.

In collaborative filtering, we fill in the matrix by looking at the entries of other users and finding users with similar taste.

We use matrix factorisation: approximate the utility matrix as the product of two smaller factor matrices, then repeatedly retune the factors to minimise the root mean square error (RMSE) between their product and the known ratings. This gives an iterative heuristic for filling in the utility matrix.
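Written out, one common way to state this is the following. The notation is introduced here and is not from the slides: P (n × k) and Q (m × k) are the user and item factor matrices with rows p_i and q_j, k is the number of latent factors, and K is the set of (i, j) pairs for which a rating is known.

```latex
\hat{U}_{ij} = p_i^{\top} q_j,
\qquad
\mathrm{RMSE}(P, Q) =
\sqrt{\frac{1}{|K|} \sum_{(i,j) \in K} \left( U_{ij} - p_i^{\top} q_j \right)^{2}}
```

Only the known entries enter the error; the unknown entries are then filled with the predictions \hat{U}_{ij}.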

7 of 22

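Alternating least squares model

This is exactly the idea behind the alternating least squares (ALS) model: hold one factor matrix fixed, solve a least-squares problem for the other, then swap roles and repeat.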

8 of 22

Scribed by Haoming Li, Bangzheng He, Michael Lublin, Yonathan Perez.

9 of 22

What have we done?

We used the Echo Nest Taste Profile Subset dataset.

The dataset contains real user play counts from undisclosed partners, with all songs already matched to the Million Song Dataset (MSD).
The Echo Nest Taste Profile Subset is the official user data collection for the Million Song Dataset, available at: http://millionsongdataset.com/tasteprofile
Our first idea was to use the play count as an implicit rating by the user.

10 of 22

How do we decide the hyperparameters?

We ran a cross-validator over 16 candidate models and picked the one with the lowest RMSE.
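The selection logic can be sketched in plain Python as below. This is not the Spark cross-validator itself: `grid_search` and `evaluate` are hypothetical names, `evaluate` stands for whatever routine trains a model and returns its validation RMSE (averaged over folds if cross-validating), and the 4 × 4 grid values are placeholders chosen only to give 16 combinations.

```python
from itertools import product


def grid_search(param_grid, evaluate):
    """Try every combination in param_grid; return (best_params, best_rmse)."""
    names = list(param_grid)
    best_params, best_rmse = None, float("inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)  # validation RMSE for this configuration
        if score < best_rmse:
            best_params, best_rmse = params, score
    return best_params, best_rmse


# Hypothetical 4 x 4 grid giving 16 candidate models.
param_grid = {"rank": [5, 10, 20, 40], "reg": [0.01, 0.05, 0.1, 0.5]}
n_models = len(list(product(*param_grid.values())))
```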

For our model the configuration that we got was

11 of 22

We made a bad decision

Using the raw play count directly as the rating biases us towards recommending overly popular songs.

To avoid this trap, we normalise the play count of every song by its total listening count.

This improved our results by a huge margin!
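One reading of this normalisation, sketched in plain Python: divide each (user, song) play count by that song's total play count across all users. The function name `normalise_play_counts` and the toy data are made up for illustration; the project itself did this step in PySpark.

```python
from collections import defaultdict


def normalise_play_counts(plays):
    """plays: list of (user, song, count). Returns (user, song, count / song_total)."""
    song_totals = defaultdict(int)
    for _, song, count in plays:
        song_totals[song] += count
    return [(user, song, count / song_totals[song]) for user, song, count in plays]


# Made-up toy data: song "a" is far more popular overall than song "b".
plays = [("u1", "a", 90), ("u2", "a", 10), ("u1", "b", 1), ("u2", "b", 3)]
normalised = normalise_play_counts(plays)
```

After normalisation, u2's three plays of the niche song "b" count for more (0.75) than u2's ten plays of the popular song "a" (0.1), which is the intended correction for popularity bias.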

12 of 22

An Example result

Top 10 recommendations by the model (columns: song ID, user ID, predicted rating, song title)

+-----------+-----------+----------+--------------------+
|  618915878|-2125095252|0.46518123|        Por quererte|
| -344769466|-2125095252|0.39064926|                What|
|-1092166790|-2125095252|0.32093266|              Besame|
|  174191907|-2125095252|0.27977067|         Remember Me|
| -582761021|-2125095252|0.26700243|      Under Pressure|
|-1407064031|-2125095252|0.26247233|        Dile al amor|
| 1580252322|-2125095252|0.24478583|Guerrilla Monsoon...|
| 2109294155|-2125095252|0.24304876|  My Little Red Book|
|-1375191242|-2125095252|0.23820923|Don't Cry (Original)|
| -265443618|-2125095252|0.21124649|      Intermission 1|
+-----------+-----------+----------+--------------------+

Actual play counts of the same user (columns: song ID, user ID, play count, song title)

+-----------+-----------+---------+--------------------+
|  618915878|-2125095252|       10|        Por quererte|
| 1200624010|-2125095252|        6|    Solo Dejate Amar|
| -344769466|-2125095252|        6|                What|
|-1092166790|-2125095252|        6|              Besame|
| 1580252322|-2125095252|        5|Guerrilla Monsoon...|
|-1407064031|-2125095252|        5|        Dile al amor|
|-2024781075|-2125095252|        5|The Christmas Son...|
|  623026281|-2125095252|        4|        Fruta Fresca|
| -569916139|-2125095252|        3|   Fique Em Silencio|
|  -14492029|-2125095252|        3|Ojalá que llueva ...|
+-----------+-----------+---------+--------------------+
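One simple way to compare the two tables is to count how many recommended songs also appear among the user's most-played songs. The sketch below uses the song IDs exactly as printed in the tables; `overlap_at_k` is a name chosen here, and the result assumes the second table lists the user's ten most-played songs.

```python
# Song IDs in the order shown in the recommendation table.
recommended = [618915878, -344769466, -1092166790, 174191907, -582761021,
               -1407064031, 1580252322, 2109294155, -1375191242, -265443618]

# Song IDs in the order shown in the play-count table.
actual = [618915878, 1200624010, -344769466, -1092166790, 1580252322,
          -1407064031, -2024781075, 623026281, -569916139, -14492029]


def overlap_at_k(recommended, actual, k=10):
    """Return the recommended songs (top k) that the user actually played, and their share."""
    played = set(actual)
    hits = [song for song in recommended[:k] if song in played]
    return hits, len(hits) / k


hits, fraction = overlap_at_k(recommended, actual)
```

For this user, five of the ten recommendations (Por quererte, What, Besame, Dile al amor and Guerrilla Monsoon...) also appear in the play-count table.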

13 of 22

Artist recommendation

14 of 22

The strategy used

We followed a process very similar to that of song recommendation.
Used PySpark to preprocess the data to get the relations between artists and users.
A few groupBy operations and a little filtering bring us to the utility matrix of the artist recommendation system.
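The aggregation step can be illustrated in plain Python, standing in for the PySpark groupBy. Everything named here is hypothetical: `artist_play_counts`, the `song_to_artist` lookup, the `min_plays` filter and the toy records are illustrative only, and the filtering actually applied in the project is not specified in these slides.

```python
from collections import defaultdict


def artist_play_counts(plays, song_to_artist, min_plays=1):
    """Sum per-song play counts into per-(user, artist) play counts."""
    totals = defaultdict(int)
    for user, song, count in plays:
        artist = song_to_artist.get(song)
        if artist is None:  # filter: skip songs with no known artist
            continue
        totals[(user, artist)] += count
    # Filter: drop user-artist pairs below the minimum play count.
    return {key: total for key, total in totals.items() if total >= min_plays}


# Made-up toy records: (user, song, play count) and a song -> artist lookup.
plays = [("u1", "s1", 3), ("u1", "s2", 4), ("u2", "s1", 1), ("u2", "s3", 5)]
song_to_artist = {"s1": "artist_a", "s2": "artist_a"}
counts = artist_play_counts(plays, song_to_artist)
```

The resulting (user, artist) counts play the same role as the (user, song) counts did in the song recommender.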

16 of 22

Output Comparison

+-----------+----------+------------------+-------------+
|-2130942721| 0.9124969|ARHAEXZ1187FB54F3C|     N.E.R.D.|
|-2130942721| 0.8111042|ARBCZ031187B9B9CB9|     Colossal|
|-2130942721|0.76041245|ARUJZFJ1187B9B135F| Shania Twain|
|-2130942721| 0.7055794|ARFCWSZ123526A0AFD|Justin Bieber|
|-2130942721| 0.7019551|AREHKZU122E5C4FC81|  Mike Posner|
+-----------+----------+------------------+-------------+

Artists recommended by the recommendation system (columns: user ID, predicted rating, artist ID, artist name)

+------------------+-----------+--------------+---------------+
|AREHKZU122E5C4FC81|-2130942721|             7|    Mike Posner|
|ARBCZ031187B9B9CB9|-2130942721|             6|       Colossal|
|ARZO9UQ1187FB4D261|-2130942721|             3|Alliance Ethnik|
|ARUJZFJ1187B9B135F|-2130942721|             2|   Shania Twain|
|ARHAEXZ1187FB54F3C|-2130942721|             1|       N.E.R.D.|
|ARNSECX11E2835DB52|-2130942721|             1|  Amity in fame|
|ARFCWSZ123526A0AFD|-2130942721|             1|  Justin Bieber|
+------------------+-----------+--------------+---------------+

The user's actual artists, ordered by play count (columns: artist ID, user ID, play count, artist name)

17 of 22

Demo Time

Jayantakumar (b20204@students.iitmandi.ac.in)

Complete set available at https://github.com/jayantakumar/MusicRecommendationSystem

18 of 22

Shortcomings and Future

We tried to do it in Flink.
Flink is great for data streams, but was a pain to set up.
Flink had an ALS package in earlier versions, but the ML packages have since moved into a separate repo and are not included in the normal installation.
We went as far as manually compiling the old ML libraries, and succeeded to some extent, but the many interdependencies between new and old versions meant we could not get it working.
But there were a lot of learnings from the process nevertheless. 😀
So how can we improve this further?

19 of 22

Way forward and questions that came up for me!

What we have built is an extremely simple recommendation system that uses matrix factorisation and a very crude rating model: in our case, the play count itself.
Building a novel recommendation and similarity feature set: we can go deeper in developing a rating system that takes into account not only the play count but also the play time, regional and seasonal trends, and other meta factors to come up with a better recommendation system.
Heterogeneous systems are the way forward (leading firms like Spotify and Amazon are going down that path).
Podcasts: how do you deal with them? They are a different type of content and are not replayed often, so how do you model them implicitly?
Implement a clustering-based recommender: Spotify makes playlists by mood, genre and region, and has a system that recommends the right group to each user. This needs content metadata.
Doing the reverse: item-based collaborative filtering (see the paper "Item-based collaborative filtering recommendation algorithms" from the University of Minnesota) and curated playlists.

20 of 22

Resources that were extremely useful and recommended for you to look up!!

How did Spotify use Spark and ALS in its recommendation system back in 2014?

Music Recommendations at Scale with Spark - Christopher Johnson (Spotify)

Need to implement the math yourself? This is a beautiful article on ALS by Victor.

https://medium.com/radon-dev/als-implicit-collaborative-filtering-5ed653ba39fe

Spark and ALS, and how to go about using them:

https://medium.com/rahasak/collaborative-filtering-based-book-recommendation-system-with-spark-ml-and-scala-1e5980ceba5e

Need a quick intro to the Million Song Dataset? Look right here!

A Closer Look at the Million Song Dataset | by Jeremy | Modeling Music | Medium

Chapter on collaborative filtering recommendation systems (J. Ben Schafer, Dan Frankowski, Jon Herlocker & Shilad Sen):

https://link.springer.com/chapter/10.1007/978-3-540-72079-9_9

21 of 22

Some more links if you are interested in the way forward and in handling big data and music

Improving music recommendation by incorporating social influence - Jinpeng Chen, Pinguang Ying & Ming Zou (recommendation systems on top of social networks)

Improving music recommendation by incorporating social influence | SpringerLink

Spotify and the democratisation of music - Thomas Hodgson (part of my term paper; a great read on the social dynamics associated with developing a large-scale recommendation system)
Writing a Big Data history of music - Stephen Rose, Sandra Tuppen, Loukia Drosopoulou (quantitative analysis of ethnomusicology and how it reveals cultural transitions)
