1 of 50

Recommendation System

  • Fardina Fathmiul Alam

CMSC 320 - Introduction to Data Science

2025

2 of 50

OUTLINE

  • What are Recommender Systems?
  • Content Based Filtering
  • Collaborative Filtering

3 of 50

RECOMMENDER SYSTEMS: THE TASK

A user plays a Justin Bieber song.

What should we recommend next?

"Recommendation Engines" and "Recommendation Systems" are terms that are often used interchangeably.

4 of 50

What is a Recommender System?

Algorithms that recommend products or services a user is likely to consume, based on their preferences, behavior, or past interactions.

Goal: Answer "Will person X like product Y?"

5 of 50

Examples

  • Netflix (movie/show recommendations)
  • Amazon (product recommendations)
  • Spotify (music recommendations)
  • YouTube (video suggestions)
  • Facebook/Instagram (friend or content suggestions)

...pretty much everything

6 of 50

Netflix Prize Competition (2006):

Netflix held an open competition, beginning in 2006, to improve its movie recommendation algorithm. Using a dataset of users' movie ratings, participants were tasked with developing an algorithm that could accurately predict user ratings for movies.

https://www.kaggle.com/datasets/netflix-inc/netflix-prize-data

7 of 50

HOW RECOMMENDER SYSTEMS WORK

  1. Input: User data (ratings, preferences) + item features.
  2. Process: Compute relevance scores (e.g., similarity, predictions).
  3. Output: Ranked list of items (context-aware, diverse).

8 of 50

AN EXAMPLE: Build a MOVIE RECOMMENDATION system

How do we measure user preference? How do we know if someone liked something (products, movies, etc.)?

Data needed: What will our data be?

What will our labels be?

Explicit Feedback (Direct user input) Thumbs up/down, Star ratings (e.g., 1-5 stars), Written reviews.

Implicit Feedback (Indirect behavior): Watch time (% of movie watched), Re-watches (repeated views), Purchases/saves.

Features (x): Movie attributes (genre, director, actors, runtime), vectorized in some way.

For a movie, we can compute these attributes and transform the movie into a vector. Example for Inception:

Action, Sci-Fi, Nolan, DiCaprio → [1, 1, 1, 1]

Labels (y): Ratings from the users. Create a class label (liked / didn't like) based on user ratings.
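The featurization and labeling above can be sketched in Python; the attribute list and the "liked" threshold here are illustrative assumptions, not a fixed scheme:

```python
# One way to turn a movie into a feature vector: binary flags for the
# attributes we care about. The attribute list is hypothetical.
ATTRIBUTES = ["action", "sci-fi", "nolan", "dicaprio"]

def featurize(movie_tags):
    """Map a set of tags to a binary vector over ATTRIBUTES."""
    return [1 if a in movie_tags else 0 for a in ATTRIBUTES]

# Example for Inception:
inception = featurize({"action", "sci-fi", "nolan", "dicaprio"})
print(inception)  # [1, 1, 1, 1]

# Label: collapse a 1-5 star rating into liked / didn't like.
def liked(stars, threshold=4):
    """Return 1 (liked) if the rating meets the threshold, else 0."""
    return 1 if stars >= threshold else 0
```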

9 of 50

OUR ALGORITHM

We can train a supervised learning model (e.g., logistic regression, random forest) using features like:

<Runtime, Genre, Budget, Actors, Director> → Thumbs_Up (Yes/No)

Once trained, when we get a new item with features:

<Runtime, Genre, Budget, Actors, Director>

the model predicts whether the user would give it a thumbs up (i.e., find it relevant).

This is characteristic of a content-based recommendation system (NEXT TOPIC) using a supervised learning approach, where we use item features and past user feedback to make personalized predictions.
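A minimal sketch of this supervised setup, using scikit-learn's logistic regression; the feature vectors, movie names, and thumbs-up labels below are made up for illustration:

```python
# Sketch of the supervised approach: item features in, thumbs-up out.
from sklearn.linear_model import LogisticRegression

# Rows: <action, sci-fi, nolan, dicaprio> for movies this user has rated.
X = [[1, 1, 1, 1],   # Inception          -> thumbs up
     [1, 0, 1, 0],   # Dunkirk            -> thumbs up
     [0, 0, 0, 0],   # romantic comedy    -> thumbs down
     [0, 0, 0, 1],   # drama w/ DiCaprio  -> thumbs down
     [1, 1, 0, 0]]   # sci-fi action film -> thumbs up
y = [1, 1, 0, 0, 1]

model = LogisticRegression().fit(X, y)

# A new item arrives; predict whether the user will give it a thumbs up.
new_movie = [[1, 1, 1, 0]]   # action, sci-fi, Nolan, no DiCaprio
print(model.predict(new_movie))
```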

10 of 50

PARADIGMS OF RECOMMENDER SYSTEMS

  • Content based approaches
  • Collaborative approaches
  • Hybrid systems, and more.

11 of 50

PARADIGMS OF RECOMMENDER SYSTEMS

Content based approaches (Use prior information about users and/or items)

      • Techniques include regression or classification models.
      • Recommendations are based on the attributes or characteristics of items and the preferences of users.

12 of 50

PARADIGMS OF RECOMMENDER SYSTEMS

Collaborative approaches (Solely rely on the user-item interaction matrix.)

      • Techniques include user-user, item-item, and matrix factorization.
      • User-User: Recommendations based on similar users' preferences.
      • Item-Item: Recommendations based on similar items.
      • Matrix Factorization: Decompose the user-item interaction matrix into lower-dimensional matrices to capture latent factors.

13 of 50

PARADIGMS OF RECOMMENDER SYSTEMS

Hybrid approaches (Combine multiple recommendation approaches.)

      • Leverage the strengths of both content-based and collaborative methods.

14 of 50

(1) CONTENT-BASED RECOMMENDATION SYSTEM

15 of 50

CONTENT-BASED RECOMMENDATION

Goal: Predict what a user will like based on their past likes and item features.

(A popular and widely used approach for providing personalized recommendations.)

Requires:

  • Item information (e.g., genre, director)
  • User profile (preferences)

What it uses: The features of the item (e.g., genre, category, keywords).

How it works: Recommends items similar to what the user liked before.

16 of 50

CONTENT-BASED RECOMMENDATION: Leveraging Item Features and User Profiles

GOAL: Recommend items to a user that are similar to items that they have previously interacted with (liked before).

Recommend items to customer x similar to previous items rated highly by x.

Movie recommendations:

Recommend movies with same actor(s), director, genre, …

Websites, blogs, news

Recommend other sites with similar types or words

17 of 50

CONTENT-BASED RECOMMENDATION

Content-based recommendation REQUIRES information about the items.

What we need:

  • Some information about the available items, such as the genre (the "content")
  • Some sort of user profile describing what the user likes (the preferences)

The Task:

  • Learn user preferences
  • Locate/recommend items that are "similar" to the user preferences.

18 of 50

PROCESS OF CONTENT-BASED RECOMMENDATION

  1. Featurize Items
    • Convert each item into a vector (e.g., genre, director, actors).
  2. Calculate Similarity
    • Compare vectors using cosine similarity (or other metrics - Euclidean Distance, Manhattan Distance, Jaccard Distance etc.)
  3. Learn User Preferences
    • Build a taste profile from features of items the user liked in the past.
  4. Recommend
    • Suggest items with high similarity to the user’s profile.

Key Point: "Show users items like what they’ve already enjoyed."
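The four steps above can be sketched end to end; the item vectors, the liked-movie history, and building the taste profile by averaging are illustrative assumptions, with cosine similarity as the metric:

```python
import math

# Step 1: featurize items. Hypothetical binary vectors over
# [action, sci-fi, jackie_chan, romance].
items = {
    "Rush Hour":    [1, 0, 1, 0],
    "The Notebook": [0, 0, 0, 1],
    "The Matrix":   [1, 1, 0, 0],
}

# Step 2: a similarity metric (cosine similarity here).
def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Step 3: taste profile = average of the vectors of items the user liked.
liked = [[1, 0, 1, 0], [1, 1, 1, 0]]   # two action / Jackie Chan movies
profile = [sum(col) / len(liked) for col in zip(*liked)]

# Step 4: recommend unseen items most similar to the profile.
ranked = sorted(items, key=lambda m: cosine(profile, items[m]), reverse=True)
print(ranked[0])  # most similar item to this user's taste profile
```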

19 of 50

EXAMPLE: CONTENT-BASED RECOMMENDATION

If a set of users likes action movies starring Jackie Chan, this algorithm may recommend movies with the characteristics below:

  • Having a genre of action
  • Having Jackie Chan in the cast

User Likes → [Action, Jackie Chan] → Recommends "Rush Hour" (Action, Jackie Chan)

20 of 50

Pros and Cons: Content Based Systems

Pros:

  • No need for other users: works independently of other users' data.
  • Personalized: recommends items based on unique user tastes.
  • Supports new/unpopular items: can recommend niche items.
  • Explainable: easy to justify recommendations based on item features.
  • Privacy-friendly: doesn't require aggregating data across users.

Cons:

  • Feature selection is hard: choosing meaningful item features is challenging.
  • Cold start for users: struggles with new users and sparse histories.
  • Limited diversity: recommends similar items repeatedly and may create a "filter bubble" (only suggests items too similar to past likes).
  • Missed opportunities: ignores items outside the user's known preferences; cannot recommend items lacking sufficient features (e.g., new movies with no tags).
  • Maintenance overhead: needs retraining as user tastes evolve or new items are added.

21 of 50

(2) COLLABORATIVE FILTERING-BASED RECOMMENDATION SYSTEM

22 of 50

COLLABORATIVE FILTERING

What to recommend?

“Find similar users and recommend items that they like!”

  • Use other people’s ratings to help rank and recommend items for a given user.

Insight: People who are similar tend to enjoy similar things.

23 of 50

COLLABORATIVE FILTERING

Remember: unlike content-based recommendation (which uses additional information or features about users and/or items), collaborative filtering does NOT require any information/features about the items.

24 of 50

COLLABORATIVE FILTERING

In collaborative filtering, a recommendation system recommends products to a user based on the preferences of other users with similar tastes.

If you are listening to music on Spotify, it is likely that music liked by other users with similar taste will be suggested to you.

25 of 50

COLLABORATIVE FILTERING: TYPES

Two types:

  1. User-based nearest-neighbor collaborative filtering
  2. Item-based collaborative filtering

User-based: Finds users similar to the target user and recommends items those similar users liked. (Calculate similarity between users based on their item ratings or interactions.)

Item-based: Finds items similar to the items the target user liked and recommends those similar items. (Calculate similarity between items based on users’ ratings or interactions.)

26 of 50

COLLABORATIVE FILTERING: TYPES

User-based nearest-neighbor collaborative filtering:

Recommendations are made based on the preferences and behaviors of a set of users with preferences similar to those of a target user.

Idea: “users who have similar tastes in the past will continue to have similar tastes in the future”.

27 of 50

COLLABORATIVE FILTERING: TYPES

User-based nearest-neighbor collaborative filtering:

Example: If User A and User B have similar preferences,

  • Items liked or recommended by User B BUT NOT YET SEEN by User A might be suggested to User A.

“John (User A) receives recommendations for movies that Harry (User B) has enjoyed, as they share similar movie preferences.”

28 of 50

COLLABORATIVE FILTERING: TYPES

User-based nearest-neighbor collaborative filtering:

Example 2: John (User A) receives recommendations for energy drinks, inspired by Harry's (User B) choices, as they both share a fondness for fitness-related items like pie and protein salad.

Question: How do we find people who are similar?

29 of 50

OVERVIEW OF THE STEPS: USER-BASED COLLABORATIVE FILTERING

  • Collect User Interaction Data (e.g., ratings, views, likes)
  • Build a User-Item Matrix to represent interactions (who liked what).
  • Calculate Similarity between users based on preferences.
  • Select Neighborhood of most similar users
  • Predict Ratings for unseen items using neighbors’ data
  • Recommend Top-N Items with highest predicted ratings

  • Refine and Update based on new user feedback over time.

(Users → Match → Predict → Recommend)

30 of 50

Introducing concept of “Utility Matrix” or “User-Item Matrix”

We have two entities: N = set of Users , M = set of Items

A utility matrix is the matrix that captures interactions between N users and M items

Utility function u: N × M → R

  • R = set of ratings
  • R is a totally ordered set
  • e.g., 1-5 stars, real number in [0,1]

31 of 50

LET’S SEE AN EXAMPLE WITH STEPS

Problem Statement: Which movie should I recommend to my users?

32 of 50

Step: Build a User-Item Matrix

33 of 50

Key idea: Past similar preferences help predict future ones.

Users rate movies from 1 (awful) to 5 (loved it)—not every movie is rated.

           HP1   HP2   HP3   TW    SW1   SW2   SW3
Alice       4                5     1
Ben         5     5     4
Clif (C)          2     1    2     4     5
David (D)   1     1          1     1           5

*HP = Harry Potter, *TW = Twilight, *SW = Star Wars

Q: What should I recommend today to the user Alice?

Figure: User-Item Interaction Matrix, where movie ratings are in [1-5]
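The matrix above can be held as a pandas DataFrame, with NaN marking unrated cells; the placement of a few of the ratings not shared with Alice is inferred from the similarity computations on later slides:

```python
import numpy as np
import pandas as pd

# The ratings from the slide; NaN means "not rated".
cols = ["HP1", "HP2", "HP3", "TW", "SW1", "SW2", "SW3"]
ratings = pd.DataFrame(
    [[4, np.nan, np.nan, 5, 1, np.nan, np.nan],        # Alice
     [5, 5, 4, np.nan, np.nan, np.nan, np.nan],        # Ben
     [np.nan, 2, 1, 2, 4, 5, np.nan],                  # Clif
     [1, 1, np.nan, 1, 1, np.nan, 5]],                 # David
    index=["Alice", "Ben", "Clif", "David"], columns=cols)

print(ratings.loc["Alice"].dropna())  # Alice's rating vector: HP1=4, TW=5, SW1=1
```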

34 of 50

VECTORIZE USERS: USER-ITEM MATRIX

  • Users = Rows in the utility matrix
  • Similar users have similar rating patterns (vectors)
    • IDEA: Two users are similar if their vectors are similar!

Figure: User-Item Interaction Matrix (same as on the previous slide), ratings in [1-5]

35 of 50

VECTORIZE USERS: USER-ITEM MATRIX

  • Consider users x and y with rating vectors rx and ry.

Rating vector for Alice: (4, _, _, 5, 1, _, _)

Next question: Which user (Ben, Clif, or David) is most similar to Alice?

Figure: User-Item Interaction Matrix (same as on the previous slide), ratings in [1-5]


Let’s say the size of the neighborhood is |N| = 2 (we will consider the two most similar users).

36 of 50

Step : Calculate Similarity between users based on preferences and Select Neighborhood of most similar users.

37 of 50

Find Similar Users

  • Consider users x and y with rating vectors rx and ry.
  • We need a similarity measure sim(x, y).

Question: What is your intuition about sim(A,B) vs. sim(A,C) vs. sim(A,D) from the matrix above?

(User-Item Interaction Matrix repeated from the earlier slide.)

38 of 50

Find Similar Users: our intuition?

(User-Item Interaction Matrix repeated from the earlier slide.)

  • Consider users x and y with rating vectors rx and ry.
  • We need a similarity measure sim(x, y).

Intuition from above: sim(A,B) > sim(A,C)

39 of 50

HOW DO WE FIND SIMILARITY?

  • Jaccard Coefficient/ Similarity
  • Cosine Similarity

40 of 50

HOW DO WE FIND SIMILARITY?: SOME WAYS

41 of 50

OPTION 1: JACCARD SIMILARITY

sim(A,B) = |rA ∩ rB| / |rA ∪ rB| = 1/5 = 0.2

sim(A,C) = 2/6 ≈ 0.33

sim(A,D) = 3/5 = 0.6

  • Result: sim(A,B) < sim(A,C) < sim(A,D)
  • Problem: Ignores rating values
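These Jaccard values can be checked with a few lines of Python, using only the sets of items each user rated (item sets follow the user-item matrix from the earlier slide):

```python
# Jaccard ignores the rating values: only WHICH items each user rated.
users = {
    "A": {"HP1", "TW", "SW1"},
    "B": {"HP1", "HP2", "HP3"},
    "C": {"HP2", "HP3", "TW", "SW1", "SW2"},
    "D": {"HP1", "HP2", "TW", "SW1", "SW3"},
}

def jaccard(x, y):
    """|intersection| / |union| of the two users' item sets."""
    return len(users[x] & users[y]) / len(users[x] | users[y])

print(jaccard("A", "B"))  # 1/5 = 0.2
print(jaccard("A", "C"))  # 2/6 ≈ 0.33
print(jaccard("A", "D"))  # 3/5 = 0.6
```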

42 of 50

OPTION 2: COSINE SIMILARITY

Think of “empty” cells as 0.

sim(A,B) = 0.38
sim(A,C) = 0.305
sim(A,D) = 0.286 (we will ignore D in the rest of the calculations, just for simplicity)

44 of 50

OPTION 2: COSINE SIMILARITY

sim(A,B) = 0.38, sim(A,C) = 0.31

  • sim(A,B) > sim(A,C), but not by much
  • Problem: treats missing ratings as negative
    • If a user hasn't rated an item, the system effectively assumes the user dislikes it.

Try:

    • Centered Cosine Similarity
    • Pearson Correlation Coefficient works well too
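A sketch of plain cosine similarity with empty cells treated as 0, reproducing the numbers above from the user-item matrix:

```python
import math

def cosine(u, v):
    """Cosine similarity between two rating vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Columns: HP1, HP2, HP3, TW, SW1, SW2, SW3; missing ratings become 0.
A = [4, 0, 0, 5, 1, 0, 0]
B = [5, 5, 4, 0, 0, 0, 0]
C = [0, 2, 1, 2, 4, 5, 0]
D = [1, 1, 0, 1, 1, 0, 5]

print(cosine(A, B))  # ≈ 0.38
print(cosine(A, C))  # ≈ 0.305
print(cosine(A, D))  # ≈ 0.286
```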

45 of 50

OPTION 3: CENTERED COSINE (Pearson Correlation)

Step A: Calculate the mean of each individual row:

  • Alice: row mean 10/3
  • Ben: row mean 14/3
  • Clif: row mean 14/5

Step B: Normalize the actual ratings by subtracting each row's mean from each of that user's rated items:

           HP1    HP2    HP3    TW     SW1    SW2    SW3
Alice      2/3                  5/3    -7/3
Ben        1/3    1/3    -2/3
Clif (C)          -4/5   -9/5   -4/5   6/5    11/5

46 of 50

Apply CENTERED COSINE (Pearson Correlation)

Calculate:

  • sim(A,B) = cos(rA, rB) = 0.09
  • sim(A,C) = cos(rA, rC) = -0.56

Observe:

  • Sim(A, B) > Sim(A, C): A and B are more similar; A and C are dissimilar (Sim = -0.56).
  • Centering removes bias from tough/easy raters.
  • Missing ratings become 0 (no contribution).
  • Like Pearson correlation:
    • 0 = no similarity
    • Negative = opposite preferences

Now A and C are (correctly) way further apart than A and B
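A sketch of the mean-centering step plus cosine similarity, reproducing sim(A,B) ≈ 0.09 from the slide:

```python
import math

def center(row):
    """Subtract the user's mean rating (over rated items only); unrated -> 0."""
    rated = [x for x in row if x is not None]
    mean = sum(rated) / len(rated)
    return [(x - mean) if x is not None else 0.0 for x in row]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Columns: HP1, HP2, HP3, TW, SW1, SW2, SW3; None = not rated.
A = center([4, None, None, 5, 1, None, None])   # row mean 10/3
B = center([5, 5, 4, None, None, None, None])   # row mean 14/3

print(round(cosine(A, B), 2))  # ≈ 0.09
```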

47 of 50

Step: Predict Ratings and Recommendations

48 of 50

Leveraging KNN for Rating Prediction

  • Given a user, identify their k most similar users (use cosine similarity or Pearson correlation etc.) - DONE ALREADY

NEXT:

Predict the user's rating for an item i based on the weighted average of the ratings of those k nearest neighbors.

    • The ratings from more similar users (those with higher similarity scores) will have more influence.

49 of 50

PREDICT AND RECOMMEND

Given,

  • sim(A,B) = 0.09
  • sim(A,C) = -0.56

Predict: Which movie should we recommend to user A next, HP2 or HP3?

Predict the user's rating as the similarity-weighted average of the neighbors' ratings:

Prediction for HP2: P(A,HP2) = [sim(A,B)·5 + sim(A,C)·2] / [sim(A,B) + sim(A,C)] = 1.42

Similarly: P(A,HP3) = [sim(A,B)·4 + sim(A,C)·1] / [sim(A,B) + sim(A,C)] = 0.42

So, we will recommend HP2 to Alice next.
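The two predictions can be checked with a small weighted-average sketch, plugging in the similarity values from this slide as-is (in practice, one would usually restrict the neighborhood to positively correlated neighbors):

```python
# Weighted-average rating prediction using the slide's similarity scores.
sims = {"B": 0.09, "C": -0.56}   # similarity of each neighbor to Alice
hp2 = {"B": 5, "C": 2}           # neighbors' ratings for HP2
hp3 = {"B": 4, "C": 1}           # neighbors' ratings for HP3

def predict(neighbor_ratings):
    """Similarity-weighted average of the neighbors' ratings."""
    num = sum(sims[u] * r for u, r in neighbor_ratings.items())
    den = sum(sims[u] for u in neighbor_ratings)
    return num / den

print(predict(hp2))  # ≈ 1.42
print(predict(hp3))  # ≈ 0.42
```

Since the predicted rating for HP2 is higher, HP2 is recommended to Alice next.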

50 of 50

End of Slides

Next Class:

  • Item-based collaborative filtering
  • Evaluation of RS
  • More RS