Recommendation System
CMSC 320 - Introduction to Data Science, 2025
RECOMMENDER SYSTEMS: THE TASK
A user plays a Justin Bieber song.
What should we recommend next?
"Recommendation Engines" and "Recommendation Systems" are terms that are often used interchangeably.
What is a Recommender System?
Algorithms that recommend products or services that users are likely to consume, based on their preferences, behavior, or past interactions.
Goal: Answer "Will person X like product Y?"
Examples
...pretty much everything
Netflix Prize (2006):
An open competition held by Netflix to improve the accuracy of its movie recommendation algorithm. It began in 2006 and used a dataset of movie ratings by users. Participants were tasked with developing an algorithm that could accurately predict user ratings for movies.
https://www.kaggle.com/datasets/netflix-inc/netflix-prize-data
HOW RECOMMENDER SYSTEMS WORK
AN EXAMPLE: Build a MOVIE RECOMMENDATION system
How to Measure User Preference? How do I know if someone liked something (products, movies etc.)?
Data needed: What will our data be?
What will our label be?
Explicit Feedback (direct user input): Thumbs up/down, star ratings (e.g., 1-5 stars), written reviews.
Implicit Feedback (indirect behavior): Watch time (% of movie watched), re-watches (repeated views), purchases/saves.
Features (x): Movie attributes (genre, director, actors, runtime), vectorized in some way.
For each movie, we can compute these attributes and transform the movie into a vector. Example for Inception:
Action, Sci-Fi, Nolan, DiCaprio → [1, 1, 1, 1]
Labels (y): Ratings from the users. Create a class label (liked / didn't like) based on the user ratings.
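The vectorization and labeling steps above can be sketched in a few lines of Python. This is only an illustration: the vocabulary order, the helper names `vectorize` and `to_label`, and the 4-star cutoff are assumptions, not something the slides specify.

```python
# Assumed vocabulary: one slot per attribute value, in a fixed order.
VOCABULARY = ["Action", "Sci-Fi", "Nolan", "DiCaprio"]


def vectorize(attributes, vocabulary=VOCABULARY):
    """Multi-hot encoding: 1 if the movie has the attribute, else 0."""
    present = set(attributes)
    return [1 if term in present else 0 for term in vocabulary]


def to_label(rating, threshold=4):
    """Map a 1-5 star rating to liked (1) / didn't like (0).

    The cutoff of 4 stars is an assumption made for this example.
    """
    return 1 if rating >= threshold else 0


inception = vectorize(["Action", "Sci-Fi", "Nolan", "DiCaprio"])
print(inception)                 # [1, 1, 1, 1]
print(to_label(5), to_label(2))  # 1 0
```

A movie that shares only some attributes gets a partly filled vector, for example `vectorize(["Action"])` gives `[1, 0, 0, 0]`.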
OUR ALGORITHM
We can train a supervised learning model (e.g., logistic regression, random forest) using features like:
<Runtime, Genre, Budget, Actors, Director> → Thumbs_Up (Yes/No)
Once trained, when we get a new item with features:
<Runtime, Genre, Budget, Actors, Director>
the model predicts whether the user would give it a thumbs up (i.e., find it relevant).
This is characteristic of a content-based recommendation system (NEXT TOPIC) using a supervised learning approach, where we use item features and past user feedback to make personalized predictions.
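As a rough sketch of this supervised setup, the block below trains a tiny logistic regression by hand (standard library only) on one user's thumbs up/down history and then scores unseen items. The four genre flags, the toy training data, and the function names are all hypothetical; a real system would use richer features (runtime, budget, actors, director) and a library implementation.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.5, epochs=500):
    """Fit weights and a bias with batch gradient descent on the log loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            for j, xj in enumerate(xi):
                grad_w[j] += (p - yi) * xj
            grad_b += p - yi
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_thumbs_up(x, w, b):
    """True if the predicted probability of a thumbs up is at least 0.5."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5


# Hypothetical features: [is_action, is_scifi, is_romance, is_comedy]
X_train = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
y_train = [1, 1, 0, 0]  # 1 = thumbs up, 0 = thumbs down

w, b = train_logistic(X_train, y_train)
print(predict_thumbs_up([1, 1, 0, 0], w, b))  # unseen action/sci-fi movie
print(predict_thumbs_up([0, 0, 1, 1], w, b))  # unseen romantic comedy
```

The model is fit per user here, which is why the predictions are personalized: a different user's feedback would produce different weights.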
PARADIGMS OF RECOMMENDER SYSTEMS
Content-based approaches (use prior information about users and/or items)
Collaborative approaches (rely solely on the user-item interaction matrix)
Hybrid approaches (combine multiple recommendation approaches)
RECOMMENDATION SYSTEM
CONTENT-BASED RECOMMENDATION
Goal: Predict what a user will like based on their past likes and item features.
(A popular and widely used approach to providing personalized recommendations to users.)
What it uses: The features of the item (e.g., genre, category, keywords).
How it works: Recommends items similar to what the user liked before.
CONTENT-BASED RECOMMENDATION: Leveraging Item Features and User Profiles
GOAL: Recommend items to a user that are similar to items that they have previously interacted with (liked before).
Recommend items to customer x similar to previous items rated highly by x.
Movie recommendations:
Recommend movies with same actor(s), director, genre, …
Websites, blogs, news
Recommend other sites with similar types or words
CONTENT-BASED RECOMMENDATION
Content-based recommendation REQUIRES information about the items.
What do we need:
The Task:
PROCESS OF CONTENT-BASED RECOMMENDATION
Key Point: "Show users items like what they’ve already enjoyed."
EXAMPLE: CONTENT-BASED RECOMMENDATION
If a user likes action movies starring Jackie Chan, this algorithm may recommend other movies with the same characteristics (action genre, Jackie Chan in the cast).
User Likes → [Action, Jackie Chan] → Recommends "Rush Hour" (Action, Jackie Chan)
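One possible way to implement "recommend items like what the user already enjoyed" is to encode the user's likes and every item over the same features and rank items by cosine similarity to the user profile. The sketch below assumes a three-feature vocabulary and a made-up mini catalog; the names `catalog`, `user_profile`, and `recommend` are illustrative.

```python
import math

FEATURES = ["Action", "Romance", "Jackie Chan"]

# Hypothetical mini catalog, each title encoded over FEATURES.
catalog = {
    "Rush Hour": [1, 0, 1],
    "Mission: Impossible": [1, 0, 0],
    "The Notebook": [0, 1, 0],
}

# User likes [Action, Jackie Chan], encoded over the same FEATURES.
user_profile = [1, 0, 1]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(profile, items, top_n=1):
    """Return the top_n titles most similar to the user profile."""
    ranked = sorted(items, key=lambda t: cosine(profile, items[t]), reverse=True)
    return ranked[:top_n]


print(recommend(user_profile, catalog))  # ['Rush Hour']
```

No other users' data is involved, which is the defining property of the content-based approach.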
Pros and Cons: Content Based Systems
Pros:
No Need for Other Users: Works independently of other users’ data.
Personalized: Recommends items based on unique user tastes.
Supports New/Unpopular Items: Can recommend niche items.
Explainable: Easy to justify recommendations based on item features.
Privacy-Friendly: Doesn’t require aggregating data across users.
Cons:
Feature Selection is Hard: Choosing meaningful item features is challenging.
Cold Start for Users: Struggles with new users and sparse histories.
Limited Diversity: Recommends similar items repeatedly; may create a "filter bubble" (only suggests items too similar to past likes).
Missed Opportunities: Ignores items outside the user’s known preferences. Cannot recommend items lacking sufficient features (e.g., new movies with no tags).
Maintenance Overhead: Needs retraining as user tastes evolve or new items are added.
(2) COLLABORATIVE FILTERING-BASED RECOMMENDATION SYSTEM
COLLABORATIVE FILTERING
What to recommend?
“Find similar users and recommend items that they like”!
Insight: People who are similar may enjoy similar things.
Remember: unlike content-based recommendation (which uses additional information or features about users and/or items), collaborative filtering does NOT require any information/features about the items.
In collaborative filtering, the system recommends products to a user on the basis of the preferences of other users with similar tastes.
Example: If you are listening to music on Spotify, music liked by other users with similar taste is likely to be suggested to you.
COLLABORATIVE FILTERING: TYPES
Two types:
User-based: Finds users similar to the target user and recommends items those similar users liked. Calculates similarity between users based on their item ratings or interactions.
Item-based: Finds items similar to the items the target user liked and recommends those similar items. Calculates similarity between items based on users’ ratings or interactions.
User-based nearest-neighbor collaborative filtering:
Recommendations are made based on the preferences and behaviors of similar users (users ‘N’ with preferences similar to those of a target user ‘I’).
Idea: “Users who had similar tastes in the past will continue to have similar tastes in the future.”
Example 1: If User A and User B have similar preferences, John (User A) receives recommendations for movies that Harry (User B) has enjoyed.
Example 2: John (User A) receives recommendations for energy drinks, inspired by Harry’s (User B) choices, as they both share a fondness for fitness-related items like pie and protein salad.
Question: How do we find people who are similar?
OVERVIEW OF THE STEPS: USER-BASED COLLABORATIVE FILTERING
Refine and Update based on new user feedback over time.
(Users → Match → Predict → Recommend)
Introducing the concept of a “Utility Matrix” or “User-Item Matrix”
We have two entities: N = set of users, M = set of items
A utility matrix is the matrix that captures the interactions between the users in N and the items in M
Utility function u: N × M → R
LET’S SEE AN EXAMPLE WITH STEPS
Problem Statement: Which movie should I recommend to my users?
Step: Build a User-Item Matrix
Key idea: Past similar preferences help predict future ones.
Users rate movies from 1 (awful) to 5 (loved it)—not every movie is rated.
User      | HP1 | HP2 | HP3 | TW | SW1 | SW2 | SW3 |
Alice (A) | 4   |     |     | 5  | 1   |     |     |
Ben (B)   | 5   | 5   | 4   |    |     |     |     |
Clif (C)  |     | 2   | 1   | 2  | 4   | 5   |     |
David (D) | 1   |     | 1   | 1  | 1   | 5   |     |
*HP = Harry Potter
*TW = Twilight
*SW = Star Wars
Q: What should I recommend today to the user Alice?
Figure: User-Item interaction matrix, where movie ratings are in the range [1-5]
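A sparse user-item matrix like this one is often stored as a dictionary of dictionaries, so that unrated movies simply have no entry. The sketch below is one possible representation; the names `ratings`, `u`, and `unrated` are illustrative choices.

```python
ITEMS = ["HP1", "HP2", "HP3", "TW", "SW1", "SW2", "SW3"]

# User-item matrix from the example: only rated movies are stored.
ratings = {
    "Alice": {"HP1": 4, "TW": 5, "SW1": 1},
    "Ben": {"HP1": 5, "HP2": 5, "HP3": 4},
    "Clif": {"HP2": 2, "HP3": 1, "TW": 2, "SW1": 4, "SW2": 5},
    "David": {"HP1": 1, "HP3": 1, "TW": 1, "SW1": 1, "SW2": 5},
}


def u(user, item):
    """Utility function u(user, item): the rating, or None if unrated."""
    return ratings[user].get(item)


def unrated(user):
    """Movies the user has not rated yet: the candidates to recommend."""
    return [item for item in ITEMS if item not in ratings[user]]


print(u("Alice", "TW"))   # 5
print(unrated("Alice"))   # ['HP2', 'HP3', 'SW2', 'SW3']
```

The recommendation question for Alice then becomes: which of her unrated movies would she rate highest?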
VECTORIZE USERS: USER-ITEM MATRIX
Rating vector for Alice: her row of the matrix, (4, -, -, 5, 1, -, -), where - marks an unrated movie.
Next question: Which user (Ben, Clif, or David) is most similar to Alice?
Step: Calculate the similarity between users based on their preferences and select a neighborhood of the most similar users.
Let’s say the size of the neighborhood is |N| = 2 (we will consider the two most similar users).
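Once a similarity score to the target user is available for every other user, selecting the neighborhood is a top-k lookup. The sketch below uses `heapq.nlargest` from the standard library; the scores are the cosine similarities to Alice that these slides quote for Ben, Clif, and David, used here only as sample input.

```python
import heapq

# Sample input: similarity of each user to Alice (cosine values from the slides).
similarity_to_alice = {"Ben": 0.38, "Clif": 0.31, "David": 0.286}


def nearest_neighbors(similarities, k=2):
    """Return the k users with the highest similarity scores."""
    return heapq.nlargest(k, similarities, key=similarities.get)


print(nearest_neighbors(similarity_to_alice))  # ['Ben', 'Clif']
```

Any similarity measure can supply the scores; only the ranking matters for this step.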
Find Similar Users
Question: What is our intuition about sim(A,B) vs. sim(A,C) vs. sim(A,D) from the matrix above?
Intuition: sim(A,B) > sim(A,C). Alice and Ben both rated HP1 highly, while Alice and Clif disagree on the movies they share (Alice rated TW 5 and SW1 1; Clif rated TW 2 and SW1 4).
HOW DO WE FIND SIMILARITY? SOME WAYS
OPTION 1: JACCARD SIMILARITY
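sim(A,B) = |r_A ∩ r_B| / |r_A ∪ r_B| = 1/5 = 0.2
sim(A,C) = 2/6 = 0.33
sim(A,D) = 3/5 = 0.6
Here r_X is the set of movies rated by user X. Problem: Jaccard ignores the rating values, so it gives sim(A,B) < sim(A,C), which contradicts our intuition.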
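The Jaccard numbers above can be reproduced with Python sets. This is a minimal sketch; the `ratings` dictionary restates the example matrix and `jaccard` is an illustrative helper name.

```python
ratings = {
    "Alice": {"HP1": 4, "TW": 5, "SW1": 1},
    "Ben": {"HP1": 5, "HP2": 5, "HP3": 4},
    "Clif": {"HP2": 2, "HP3": 1, "TW": 2, "SW1": 4, "SW2": 5},
    "David": {"HP1": 1, "HP3": 1, "TW": 1, "SW1": 1, "SW2": 5},
}


def jaccard(a, b):
    """|rated by both| / |rated by either|; rating values are ignored."""
    items_a, items_b = set(a), set(b)
    union = items_a | items_b
    return len(items_a & items_b) / len(union) if union else 0.0


for other in ("Ben", "Clif", "David"):
    print(other, round(jaccard(ratings["Alice"], ratings[other]), 2))
# Ben 0.2
# Clif 0.33
# David 0.6
```

Only the sets of rated movies enter the computation, so a 1-star and a 5-star rating of the same movie count as agreement.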
OPTION 2: COSINE SIMILARITY
Think of “empty” as 0
sim(A,B) = 0.38
sim(A,C) = 0.305 ≈ 0.31
sim(A,D) = 0.286 (we will ignore D in the rest of the calculations, just for simplicity)
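The cosine values can be checked with a short script that fills unrated movies with 0 and then applies the usual dot-product formula. The helper names `to_vector` and `cosine` are illustrative.

```python
import math

ITEMS = ["HP1", "HP2", "HP3", "TW", "SW1", "SW2", "SW3"]
ratings = {
    "Alice": {"HP1": 4, "TW": 5, "SW1": 1},
    "Ben": {"HP1": 5, "HP2": 5, "HP3": 4},
    "Clif": {"HP2": 2, "HP3": 1, "TW": 2, "SW1": 4, "SW2": 5},
    "David": {"HP1": 1, "HP3": 1, "TW": 1, "SW1": 1, "SW2": 5},
}


def to_vector(user_ratings):
    """Dense rating vector over ITEMS, with unrated movies set to 0."""
    return [user_ratings.get(item, 0) for item in ITEMS]


def cosine(u, v):
    """dot(u, v) / (|u| * |v|), or 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


alice = to_vector(ratings["Alice"])
sims = {name: cosine(alice, to_vector(ratings[name])) for name in ("Ben", "Clif", "David")}
print({name: round(value, 2) for name, value in sims.items()})
# {'Ben': 0.38, 'Clif': 0.31, 'David': 0.29}
```

Filling gaps with 0 treats "not rated" as if it were a very low rating, which is why the gap between Ben and Clif is so small here.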
Try:
OPTION 3: CENTERED COSINE (Pearson Correlation)
Step A: Calculate the mean of each row (over the rated items only).
Alice (A) → row mean 10/3
Ben (B) → row mean 14/3
Clif (C) → row mean 14/5
Step B: Normalize the actual ratings by subtracting the row mean from each of the user's rated items.
User      | HP1 | HP2  | HP3  | TW   | SW1  | SW2  | SW3 |
Alice (A) | 2/3 |      |      | 5/3  | -7/3 |      |     |
Ben (B)   | 1/3 | 1/3  | -2/3 |      |      |      |     |
Clif (C)  |     | -4/5 | -9/5 | -4/5 | 6/5  | 11/5 |     |
Apply CENTERED COSINE (Pearson Correlation)
Calculate: the cosine similarity between the centered rows, treating unrated entries as 0.
Observe: Now A and C are (correctly) much further apart than A and B.
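The centered cosine calculation for this matrix can be sketched as follows. The sketch assumes that, after centering, unrated entries are left at 0 (that is, treated as an average rating); the helper names are illustrative.

```python
import math

ITEMS = ["HP1", "HP2", "HP3", "TW", "SW1", "SW2", "SW3"]
ratings = {
    "Alice": {"HP1": 4, "TW": 5, "SW1": 1},
    "Ben": {"HP1": 5, "HP2": 5, "HP3": 4},
    "Clif": {"HP2": 2, "HP3": 1, "TW": 2, "SW1": 4, "SW2": 5},
}


def centered_vector(user_ratings):
    """Subtract the user's mean from rated items; unrated items stay 0."""
    mean = sum(user_ratings.values()) / len(user_ratings)
    return [user_ratings[i] - mean if i in user_ratings else 0.0 for i in ITEMS]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centered_cosine(a, b):
    """Cosine similarity of the mean-centered rating vectors."""
    return cosine(centered_vector(a), centered_vector(b))


sim_ab = centered_cosine(ratings["Alice"], ratings["Ben"])
sim_ac = centered_cosine(ratings["Alice"], ratings["Clif"])
print(round(sim_ab, 3), round(sim_ac, 3))
```

With centering, a low rating becomes a negative entry, so users who disagree on shared movies end up with a negative similarity instead of a small positive one.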
Step: Predict Ratings and Recommendations
Leveraging KNN for Rating Prediction
NEXT: Predict the user's rating for an item i as the weighted average of the ratings given to i by the k nearest neighbors, weighted by similarity.
PREDICT AND RECOMMEND
Given: the similarities S(A,B) and S(A,C) between Alice and her two nearest neighbors, Ben and Clif.
Predict: Which movie should we recommend to User A next, HP2 or HP3?
Prediction for HP2: P(A,HP2) = [S(A,B)*5 + S(A,C)*2] / [S(A,B) + S(A,C)] = 1.42
Prediction for HP3: P(A,HP3) = [S(A,B)*4 + S(A,C)*1] / [S(A,B) + S(A,C)] = 0.42
So, we will recommend HP2 to Alice next
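The weighted-average prediction can be written as a small function. In this sketch the weights are assumed to be the cosine similarities quoted earlier (0.38 for Ben, 0.31 for Clif); the printed predictions depend on which similarity measure supplies the weights, so they need not match the figures on the slide, but the resulting recommendation is the same.

```python
# Ratings of Alice's two nearest neighbors (from the user-item matrix).
neighbor_ratings = {
    "Ben": {"HP1": 5, "HP2": 5, "HP3": 4},
    "Clif": {"HP2": 2, "HP3": 1, "TW": 2, "SW1": 4, "SW2": 5},
}
# Assumed weights: cosine similarities to Alice.
similarity_to_alice = {"Ben": 0.38, "Clif": 0.31}


def predict_rating(item, similarities, ratings):
    """Similarity-weighted average of the neighbors' ratings for item."""
    numerator = denominator = 0.0
    for user, sim in similarities.items():
        if item in ratings[user]:  # skip neighbors who have not rated it
            numerator += sim * ratings[user][item]
            denominator += sim
    return numerator / denominator if denominator else None


candidates = ["HP2", "HP3"]
predictions = {
    item: predict_rating(item, similarity_to_alice, neighbor_ratings)
    for item in candidates
}
best = max(predictions, key=predictions.get)
print(predictions)
print("Recommend:", best)
```

Ranking the candidates by predicted rating and taking the top one gives the recommendation for Alice.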