1 of 50

Recommendation System

  • Fardina Fathmiul Alam

CMSC 320 - Introduction to Data Science

2025

2 of 50

OUTLINE

  • What are Recommender Systems?
  • Content Based Filtering
  • Collaborative Filtering

3 of 50

RECOMMENDER SYSTEMS: THE TASK

A user plays a Justin Bieber song.

What should we recommend next?

"Recommendation Engines" and "Recommendation Systems" are terms that are often used interchangeably.

4 of 50

What is a Recommender System?

Algorithms that recommend products or services a user is likely to consume, based on their preferences, behavior, or past interactions.

Goal: Answer "Will person X like product Y?"

5 of 50

Examples

  • Netflix (movie/show recommendations)
  • Amazon (product recommendations)
  • Spotify (music recommendations)
  • YouTube (video suggestions)
  • Facebook/Instagram (friend or content suggestions)

...pretty much everything

6 of 50

Netflix Prize Competition (2006):

Netflix held an open competition, beginning in 2006, to improve its movie recommendation algorithm. Using a dataset of users' movie ratings, participants were tasked with developing an algorithm that could accurately predict user ratings for movies.

https://www.kaggle.com/datasets/netflix-inc/netflix-prize-data

7 of 50

HOW RECOMMENDER SYSTEMS WORK

  1. Input: User data (ratings, preferences) + item features.
  2. Process: Compute relevance scores (e.g., similarity, predictions).
  3. Output: Ranked list of items (context-aware, diverse).

8 of 50

AN EXAMPLE: Build a MOVIE RECOMMENDATION system

How do we measure user preference? How do we know if someone liked something (products, movies, etc.)?

Data needed: What will our data be?

What will our labels be?

Explicit Feedback (Direct user input) Thumbs up/down, Star ratings (e.g., 1-5 stars), Written reviews.

Implicit Feedback (Indirect behavior): Watch time (% of movie watched), Re-watches (repeated views), Purchases/saves.

Features (x): Movie attributes (genre, director, actors, runtime), vectorized in some way.

For a movie, we can compute these attributes and transform the movie into a vector. Example for Inception:

Action, Sci-Fi, Nolan, DiCaprio → [1, 1, 1, 1]

Labels (y): Ratings from the users. Create a class label (liked / didn't like) based on user ratings.
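The featurization and labeling above can be sketched in Python; the attribute list and the "liked" threshold here are illustrative assumptions, not a fixed scheme:

```python
# One way to turn a movie into a feature vector: binary flags for the
# attributes we care about. The attribute list is hypothetical.
ATTRIBUTES = ["action", "sci-fi", "nolan", "dicaprio"]

def featurize(movie_tags):
    """Map a set of tags to a binary vector over ATTRIBUTES."""
    return [1 if a in movie_tags else 0 for a in ATTRIBUTES]

# Example for Inception:
inception = featurize({"action", "sci-fi", "nolan", "dicaprio"})
print(inception)  # [1, 1, 1, 1]

# Label: collapse a 1-5 star rating into liked / didn't like.
def liked(stars, threshold=4):
    """Return 1 (liked) if the rating meets the threshold, else 0."""
    return 1 if stars >= threshold else 0
```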

9 of 50

OUR ALGORITHM

We can train a supervised learning model (e.g., logistic regression, random forest) using features like:

<Runtime, Genre, Budget, Actors, Director> → Thumbs_Up (Yes/No)

Once trained, when we get a new item with features:

<Runtime, Genre, Budget, Actors, Director>

the model predicts whether the user would give it a thumbs up (i.e., find it relevant).

This is characteristic of a content-based recommendation system (NEXT TOPIC) using a supervised learning approach, where we use item features and past user feedback to make personalized predictions.
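A minimal sketch of this supervised setup, using scikit-learn's logistic regression; the feature vectors, movie names, and thumbs-up labels below are made up for illustration:

```python
# Sketch of the supervised approach: item features in, thumbs-up out.
from sklearn.linear_model import LogisticRegression

# Rows: <action, sci-fi, nolan, dicaprio> for movies this user has rated.
X = [[1, 1, 1, 1],   # Inception          -> thumbs up
     [1, 0, 1, 0],   # Dunkirk            -> thumbs up
     [0, 0, 0, 0],   # romantic comedy    -> thumbs down
     [0, 0, 0, 1],   # drama w/ DiCaprio  -> thumbs down
     [1, 1, 0, 0]]   # sci-fi action film -> thumbs up
y = [1, 1, 0, 0, 1]

model = LogisticRegression().fit(X, y)

# A new item arrives; predict whether the user will give it a thumbs up.
new_movie = [[1, 1, 1, 0]]   # action, sci-fi, Nolan, no DiCaprio
print(model.predict(new_movie))
```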

10 of 50

PARADIGMS OF RECOMMENDER SYSTEMS

  • Content based approaches
  • Collaborative approaches
  • Hybrid systems, and more.

11 of 50

PARADIGMS OF RECOMMENDER SYSTEMS

Content based approaches (Use prior information about users and/or items)

      • Techniques include regression or classification models.
      • Recommendations are based on the attributes or characteristics of items and the preferences of users.

12 of 50

PARADIGMS OF RECOMMENDER SYSTEMS

Collaborative approaches (Solely rely on the user-item interaction matrix.)

      • Techniques include user-user, item-item, and matrix factorization.
      • User-User: Recommendations based on similar users' preferences.
      • Item-Item: Recommendations based on similar items.
      • Matrix Factorization: Decompose the user-item interaction matrix into lower-dimensional matrices to capture latent factors.

13 of 50

PARADIGMS OF RECOMMENDER SYSTEMS

Hybrid approaches (Combine multiple recommendation approaches.)

      • Leverage the strengths of both content-based and collaborative methods.

14 of 50

(1) CONTENT-BASED RECOMMENDATION SYSTEM

15 of 50

CONTENT-BASED RECOMMENDATION

Goal: Predict what a user will like based on their past likes and item features.

(A popular and widely used approach for providing personalized recommendations.)

Requires:

  • Item information (e.g., genre, director)
  • User profile (preferences)

What it uses: The features of the item (e.g., genre, category, keywords).

How it works: Recommends items similar to what the user liked before.

16 of 50

CONTENT-BASED RECOMMENDATION: Leveraging Item Features and User Profiles

GOAL: Recommend items to a user that are similar to items that they have previously interacted with (liked before).

Recommend items to customer x similar to previous items rated highly by x.

Movie recommendations:

Recommend movies with same actor(s), director, genre, …

Websites, blogs, news

Recommend other sites with similar types or words

17 of 50

CONTENT-BASED RECOMMENDATION

Content-based recommendation REQUIRES information about the items.

What we need:

  • Some information about the available items, such as the genre (the "content")
  • Some sort of user profile describing what the user likes (the preferences)

The Task:

  • Learn user preferences
  • Locate/recommend items that are "similar" to the user preferences.

18 of 50

PROCESS OF CONTENT-BASED RECOMMENDATION

  1. Featurize Items
    • Convert each item into a vector (e.g., genre, director, actors).
  2. Calculate Similarity
    • Compare vectors using cosine similarity (or other metrics - Euclidean Distance, Manhattan Distance, Jaccard Distance etc.)
  3. Learn User Preferences
    • Build a taste profile from features of items the user liked in the past.
  4. Recommend
    • Suggest items with high similarity to the user’s profile.

Key Point: "Show users items like what they’ve already enjoyed."
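The four steps above can be sketched end to end; the item vectors, the liked-movie history, and building the taste profile by averaging are illustrative assumptions, with cosine similarity as the metric:

```python
import math

# Step 1: featurize items. Hypothetical binary vectors over
# [action, sci-fi, jackie_chan, romance].
items = {
    "Rush Hour":    [1, 0, 1, 0],
    "The Notebook": [0, 0, 0, 1],
    "The Matrix":   [1, 1, 0, 0],
}

# Step 2: a similarity metric (cosine similarity here).
def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Step 3: taste profile = average of the vectors of items the user liked.
liked = [[1, 0, 1, 0], [1, 1, 1, 0]]   # two action / Jackie Chan movies
profile = [sum(col) / len(liked) for col in zip(*liked)]

# Step 4: recommend unseen items most similar to the profile.
ranked = sorted(items, key=lambda m: cosine(profile, items[m]), reverse=True)
print(ranked[0])  # most similar item to this user's taste profile
```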

19 of 50

EXAMPLE: CONTENT-BASED RECOMMENDATION

If a set of users likes action movies starring Jackie Chan, this algorithm may recommend movies with the characteristics below:

  • Having a genre of action
  • Having Jackie Chan in the cast

User Likes → [Action, Jackie Chan] → Recommends "Rush Hour" (Action, Jackie Chan)

20 of 50

Pros and Cons: Content Based Systems

Pros:

  • No need for other users: works independently of other users' data.
  • Personalized: recommends items based on unique user tastes.
  • Supports new/unpopular items: can recommend niche items.
  • Explainable: easy to justify recommendations based on item features.
  • Privacy-friendly: doesn't require aggregating data across users.

Cons:

  • Feature selection is hard: choosing meaningful item features is challenging.
  • Cold start for users: struggles with new users and sparse histories.
  • Limited diversity: recommends similar items repeatedly and may create a "filter bubble" (only suggests items too similar to past likes).
  • Missed opportunities: ignores items outside the user's known preferences; cannot recommend items lacking sufficient features (e.g., new movies with no tags).
  • Maintenance overhead: needs retraining as user tastes evolve or new items are added.

21 of 50

(2) COLLABORATIVE FILTERING-BASED RECOMMENDATION SYSTEM

22 of 50

COLLABORATIVE FILTERING

What to recommend?

“Find similar users and recommend items that they like!”

  • Use other people’s ratings to help rank and recommend items for a given user.

Insight: People who are similar tend to enjoy similar things.

23 of 50

COLLABORATIVE FILTERING

Remember: unlike content-based recommendation (which uses additional information or features about users and/or items), collaborative filtering does NOT require any information/features about the items.

24 of 50

COLLABORATIVE FILTERING

In collaborative filtering, a recommendation system recommends products to a user based on the preferences of other users with similar tastes.

If you are listening to music on Spotify, it is likely that music liked by other users with similar taste will be suggested to you.

25 of 50

COLLABORATIVE FILTERING: TYPES

Two types:

  1. User-based nearest-neighbor collaborative filtering
  2. Item-based collaborative filtering

User-based: Finds users similar to the target user and recommends items those similar users liked. (Calculate similarity between users based on their item ratings or interactions.)

Item-based: Finds items similar to the items the target user liked and recommends those similar items. (Calculate similarity between items based on users’ ratings or interactions.)

26 of 50

COLLABORATIVE FILTERING: TYPES

User-based nearest-neighbor collaborative filtering:

Recommendations are made based on the preferences and behaviors of a set of users with preferences similar to those of a target user.

Idea: “users who have similar tastes in the past will continue to have similar tastes in the future”.

27 of 50

COLLABORATIVE FILTERING: TYPES

User-based nearest-neighbor collaborative filtering:

Example: If User A and User B have similar preferences,

  • Items liked or recommended by User B BUT NOT YET SEEN by User A might be suggested to User A.

“John (User A) receives recommendations for movies that Harry (User B) has enjoyed, as they share similar movie preferences.”

28 of 50

COLLABORATIVE FILTERING: TYPES

User-based nearest-neighbor collaborative filtering:

Example 2: John (User A) receives recommendations for energy drinks, inspired by Harry's (User B) choices, as they both share a fondness for fitness-related items like pie and protein salad.

Question: How do we find people who are similar?

29 of 50

OVERVIEW OF THE STEPS: USER-BASED COLLABORATIVE FILTERING

  • Collect User Interaction Data (e.g., ratings, views, likes)
  • Build a User-Item Matrix to represent interactions (who liked what).
  • Calculate Similarity between users based on preferences.
  • Select Neighborhood of most similar users
  • Predict Ratings for unseen items using neighbors’ data
  • Recommend Top-N Items with highest predicted ratings

  • Refine and Update based on new user feedback over time.

(Users → Match → Predict → Recommend)

30 of 50

Introducing concept of “Utility Matrix” or “User-Item Matrix”

We have two entities: N = set of Users , M = set of Items

A utility matrix is the matrix that captures interactions between N users and M items

Utility function u: N × M → R

  • R = set of ratings
  • R is a totally ordered set
  • e.g., 1-5 stars, real number in [0,1]

31 of 50

LET’S SEE AN EXAMPLE WITH STEPS

Problem Statement: Which movie should I recommend to my users?

32 of 50

Step: Build a User-Item Matrix

33 of 50

Key idea: Past similar preferences help predict future ones.

Users rate movies from 1 (awful) to 5 (loved it)—not every movie is rated.

           HP1   HP2   HP3   TW    SW1   SW2   SW3
Alice       4                5     1
Ben         5     5     4
Clif (C)          2     1    2     4     5
David (D)   1     1          1     1           5

*HP = Harry Potter, *TW = Twilight, *SW = Star Wars

Q: What should I recommend today to the user Alice?

Figure: User-Item Interaction Matrix, where movie ratings are in [1-5]
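The matrix above can be held as a pandas DataFrame, with NaN marking unrated cells; the placement of a few of the ratings not shared with Alice is inferred from the similarity computations on later slides:

```python
import numpy as np
import pandas as pd

# The ratings from the slide; NaN means "not rated".
cols = ["HP1", "HP2", "HP3", "TW", "SW1", "SW2", "SW3"]
ratings = pd.DataFrame(
    [[4, np.nan, np.nan, 5, 1, np.nan, np.nan],        # Alice
     [5, 5, 4, np.nan, np.nan, np.nan, np.nan],        # Ben
     [np.nan, 2, 1, 2, 4, 5, np.nan],                  # Clif
     [1, 1, np.nan, 1, 1, np.nan, 5]],                 # David
    index=["Alice", "Ben", "Clif", "David"], columns=cols)

print(ratings.loc["Alice"].dropna())  # Alice's rating vector: HP1=4, TW=5, SW1=1
```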

34 of 50

VECTORIZE USERS: USER-ITEM MATRIX

  • Users = Rows in the utility matrix
  • Similar users have similar rating patterns (vectors)
    • IDEA: Two users are similar if their vectors are similar!

Figure: User-Item Interaction Matrix (same as on the previous slide), ratings in [1-5]

35 of 50

VECTORIZE USERS: USER-ITEM MATRIX

  • Consider users x and y with rating vectors rx and ry.

Rating vector for Alice: (4, _, _, 5, 1, _, _)

Next question: Which user (Ben, Clif, or David) is most similar to Alice?

Figure: User-Item Interaction Matrix (same as on the previous slide), ratings in [1-5]


Let’s say the size of the neighborhood is |N| = 2 (we will consider the two most similar users).

36 of 50

Step : Calculate Similarity between users based on preferences and Select Neighborhood of most similar users.

37 of 50

Find Similar Users

  • Consider users x and y with rating vectors rx and ry.
  • We need a similarity measure sim(x, y).

Question: What is your intuition about sim(A,B) vs. sim(A,C) vs. sim(A,D) from the matrix above?

(User-Item Interaction Matrix repeated from the earlier slide.)

38 of 50

Find Similar Users: our intuition?

(User-Item Interaction Matrix repeated from the earlier slide.)

  • Consider users x and y with rating vectors rx and ry.
  • We need a similarity measure sim(x, y).

Intuition from above: sim(A,B) > sim(A,C)

39 of 50

HOW DO WE FIND SIMILARITY?

  • Jaccard Coefficient/ Similarity
  • Cosine Similarity

40 of 50

HOW DO WE FIND SIMILARITY?: SOME WAYS

41 of 50

OPTION 1: JACCARD SIMILARITY

sim(A,B) = |rA ∩ rB| / |rA ∪ rB| = 1/5 = 0.2

sim(A,C) = 2/6 ≈ 0.33

sim(A,D) = 3/5 = 0.6

  • Result: sim(A,B) < sim(A,C) < sim(A,D)
  • Problem: Ignores rating values
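These Jaccard values can be checked with a few lines of Python, using only the sets of items each user rated (item sets follow the user-item matrix from the earlier slide):

```python
# Jaccard ignores the rating values: only WHICH items each user rated.
users = {
    "A": {"HP1", "TW", "SW1"},
    "B": {"HP1", "HP2", "HP3"},
    "C": {"HP2", "HP3", "TW", "SW1", "SW2"},
    "D": {"HP1", "HP2", "TW", "SW1", "SW3"},
}

def jaccard(x, y):
    """|intersection| / |union| of the two users' item sets."""
    return len(users[x] & users[y]) / len(users[x] | users[y])

print(jaccard("A", "B"))  # 1/5 = 0.2
print(jaccard("A", "C"))  # 2/6 ≈ 0.33
print(jaccard("A", "D"))  # 3/5 = 0.6
```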

42 of 50

OPTION 2: COSINE SIMILARITY

Think of “empty” cells as 0.

sim(A,B) = 0.38
sim(A,C) = 0.305
sim(A,D) = 0.286 (we will ignore D in the rest of the calculations, just for simplicity)

44 of 50

OPTION 2: COSINE SIMILARITY

sim(A,B) = 0.38, sim(A,C) = 0.31

  • sim(A,B) > sim(A,C), but not by much
  • Problem: treats missing ratings as negative
    • If a user hasn't rated an item, the system effectively assumes the user dislikes it.

Try:

    • Centered Cosine Similarity
    • Pearson Correlation Coefficient works well too
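A sketch of plain cosine similarity with empty cells treated as 0, reproducing the numbers above from the user-item matrix:

```python
import math

def cosine(u, v):
    """Cosine similarity between two rating vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Columns: HP1, HP2, HP3, TW, SW1, SW2, SW3; missing ratings become 0.
A = [4, 0, 0, 5, 1, 0, 0]
B = [5, 5, 4, 0, 0, 0, 0]
C = [0, 2, 1, 2, 4, 5, 0]
D = [1, 1, 0, 1, 1, 0, 5]

print(cosine(A, B))  # ≈ 0.38
print(cosine(A, C))  # ≈ 0.305
print(cosine(A, D))  # ≈ 0.286
```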

45 of 50

OPTION 3: CENTERED COSINE (Pearson Correlation)

Step A: Calculate the mean of each individual row:

  • Alice: row mean 10/3
  • Ben: row mean 14/3
  • Clif: row mean 14/5

Step B: Normalize the actual ratings by subtracting each row's mean from each of that user's rated items:

           HP1    HP2    HP3    TW     SW1    SW2    SW3
Alice      2/3                  5/3    -7/3
Ben        1/3    1/3    -2/3
Clif (C)          -4/5   -9/5   -4/5   6/5    11/5

46 of 50

Apply CENTERED COSINE (Pearson Correlation)

Calculate:

  • sim(A,B) = cos(rA, rB) = 0.09
  • sim(A,C) = cos(rA, rC) = -0.56

Observe:

  • Sim(A, B) > Sim(A, C): A and B are more similar; A and C are dissimilar (Sim = -0.56).
  • Centering removes bias from tough/easy raters.
  • Missing ratings become 0 (no contribution).
  • Like Pearson correlation:
    • 0 = no similarity
    • Negative = opposite preferences

Now A and C are (correctly) way further apart than A and B
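A sketch of the mean-centering step plus cosine similarity, reproducing sim(A,B) ≈ 0.09 from the slide:

```python
import math

def center(row):
    """Subtract the user's mean rating (over rated items only); unrated -> 0."""
    rated = [x for x in row if x is not None]
    mean = sum(rated) / len(rated)
    return [(x - mean) if x is not None else 0.0 for x in row]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Columns: HP1, HP2, HP3, TW, SW1, SW2, SW3; None = not rated.
A = center([4, None, None, 5, 1, None, None])   # row mean 10/3
B = center([5, 5, 4, None, None, None, None])   # row mean 14/3

print(round(cosine(A, B), 2))  # ≈ 0.09
```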

47 of 50

Step: Predict Ratings and Recommendations

48 of 50

Leveraging KNN for Rating Prediction

  • Given a user, identify their k most similar users (use cosine similarity or Pearson correlation etc.) - DONE ALREADY

NEXT:

Predict the user's rating for an item i based on the weighted average of the ratings of those k nearest neighbors.

    • The ratings from more similar users (those with higher similarity scores) will have more influence.

49 of 50

PREDICT AND RECOMMEND

Given,

  • sim(A,B) = 0.09
  • sim(A,C) = -0.56

Predict: Which movie should we recommend to user A next, HP2 or HP3?

Predict the user's rating as the similarity-weighted average of the neighbors' ratings:

Prediction for HP2: P(A,HP2) = [sim(A,B)·5 + sim(A,C)·2] / [sim(A,B) + sim(A,C)] = 1.42

Similarly: P(A,HP3) = [sim(A,B)·4 + sim(A,C)·1] / [sim(A,B) + sim(A,C)] = 0.42

So, we will recommend HP2 to Alice next.
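The two predictions can be checked with a small weighted-average sketch, plugging in the similarity values from this slide as-is (in practice, one would usually restrict the neighborhood to positively correlated neighbors):

```python
# Weighted-average rating prediction using the slide's similarity scores.
sims = {"B": 0.09, "C": -0.56}   # similarity of each neighbor to Alice
hp2 = {"B": 5, "C": 2}           # neighbors' ratings for HP2
hp3 = {"B": 4, "C": 1}           # neighbors' ratings for HP3

def predict(neighbor_ratings):
    """Similarity-weighted average of the neighbors' ratings."""
    num = sum(sims[u] * r for u, r in neighbor_ratings.items())
    den = sum(sims[u] for u in neighbor_ratings)
    return num / den

print(predict(hp2))  # ≈ 1.42
print(predict(hp3))  # ≈ 0.42
```

Since the predicted rating for HP2 is higher, HP2 is recommended to Alice next.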

50 of 50

End of Slides

Next Class:

  • Item-based collaborative filtering
  • Evaluation of RS
  • More RS