1 of 11

Predicting Familiarity in Recommender Systems

Group 7

Tavvi Taijala, Haiwei Ma, Wei Chen

2 of 11

Problem

RQ: How does familiarity impact the recommender system experience?

Do users prefer to be recommended more familiar items?
Do users prefer to be recommended less familiar items?
Does this preference vary between users?
Does this preference vary over time?

To answer this question, we need a way to manipulate familiarity experimentally.

Our approach was to build a model that could predict familiarity (accurately).

3 of 11

Dataset

Collected a gold standard familiarity dataset on MovieLens users.

1,000 MovieLens users
40,000 (user, movie, familiarity) data points

Familiarity was measured on a 6-point ordinal scale, condensed to a 3-point scale.

6 Classes:

1	Never heard of
2	Only heard the name
3	Heard a little about
4	Heard some about
5	Heard a lot about
6	Seen it

3 Classes:

Low	Med	High
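A minimal sketch of condensing the 6-point scale to three classes. The slide does not state where the scale was cut, so the default cut points below (1-2 Low, 3-5 Med, 6 High) are an assumption for illustration only, and `condense` is a hypothetical helper name.

```python
from typing import Sequence

# Labels of the 6-point scale, as listed on the slide.
LABELS_6 = {
    1: "Never heard of",
    2: "Only heard the name",
    3: "Heard a little about",
    4: "Heard some about",
    5: "Heard a lot about",
    6: "Seen it",
}


def condense(rating: int, cuts: Sequence[int] = (2, 5)) -> str:
    """Map a 1..6 familiarity rating to "Low", "Med", or "High".

    `cuts` holds the highest rating still counted as Low and the highest
    rating still counted as Med. The default (2, 5) is an assumed split,
    not the one used for the dataset.
    """
    if rating not in LABELS_6:
        raise ValueError(f"rating must be in 1..6, got {rating}")
    low_max, med_max = cuts
    if rating <= low_max:
        return "Low"
    if rating <= med_max:
        return "Med"
    return "High"


if __name__ == "__main__":
    for r in sorted(LABELS_6):
        print(r, LABELS_6[r], "->", condense(r))
```

Passing a different `cuts` pair, for example `condense(3, cuts=(3, 5))`, lets the same helper try other splits.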

4 of 11

Approach

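[Diagram: the full MovieLens rating dataset (user, movie) is fed to BPR, which produces user features and movie features. These features and their interaction terms, looked up for the (user, movie) pairs in the gold standard familiarity dataset, are passed to a classifier that outputs a Low / Med / High familiarity prediction. The label "100" appears twice next to the feature blocks.]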
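A minimal sketch of how one classifier input row could be assembled from the pieces in the diagram. It assumes 100-dimensional user and movie vectors and assumes the interaction terms are the element-wise product of the two; the slide confirms neither detail, and `build_features` is a hypothetical name.

```python
import random
from typing import List, Sequence


def build_features(user_vec: Sequence[float], movie_vec: Sequence[float]) -> List[float]:
    """Concatenate user features, movie features, and interaction terms.

    Interaction terms are taken to be the element-wise product of the two
    latent vectors (an assumption, not stated in the deck).
    """
    if len(user_vec) != len(movie_vec):
        raise ValueError("user and movie vectors must have the same length")
    interaction = [u * m for u, m in zip(user_vec, movie_vec)]
    return list(user_vec) + list(movie_vec) + interaction


if __name__ == "__main__":
    rng = random.Random(0)
    # Stand-ins for latent factors learned by a matrix factorization model.
    user = [rng.gauss(0.0, 1.0) for _ in range(100)]
    movie = [rng.gauss(0.0, 1.0) for _ in range(100)]
    row = build_features(user, movie)
    print(len(row))  # 300 values per (user, movie) pair under these assumptions
```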

5 of 11

Models and Algorithms

Matrix Factorization Models

BPR
AdaptiveBPR
WRMF
FunkSVD

Classification Models

OrdRec
AdaBoost
LogitBoost
RandomForest
Ordinal Random Forest
Deep Neural Networks
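For readers unfamiliar with BPR, listed above among the matrix factorization models, here is a minimal sketch of one stochastic gradient step of the standard BPR objective for a (user, preferred item, other item) triple. The learning rate, regularization strength, and function name are illustrative assumptions, not the settings used in this project.

```python
import math
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def bpr_step(
    p_u: Sequence[float],
    q_i: Sequence[float],
    q_j: Sequence[float],
    lr: float = 0.05,
    reg: float = 0.01,
) -> Tuple[List[float], List[float], List[float]]:
    """One gradient-ascent step on ln(sigmoid(x_uij)) minus L2 regularization.

    x_uij = p_u . q_i - p_u . q_j, where item i is the one the user
    interacted with and item j is a sampled item they did not.
    """
    x_uij = dot(p_u, q_i) - dot(p_u, q_j)
    s = 1.0 / (1.0 + math.exp(x_uij))  # sigmoid(-x_uij), the gradient weight
    new_p = [p + lr * (s * (a - b) - reg * p) for p, a, b in zip(p_u, q_i, q_j)]
    new_qi = [a + lr * (s * p - reg * a) for p, a in zip(p_u, q_i)]
    new_qj = [b + lr * (-s * p - reg * b) for p, b in zip(p_u, q_j)]
    return new_p, new_qi, new_qj


if __name__ == "__main__":
    p, qi, qj = [0.1, 0.1], [0.1, 0.0], [0.0, 0.1]
    before = dot(p, qi) - dot(p, qj)
    p, qi, qj = bpr_step(p, qi, qj)
    after = dot(p, qi) - dot(p, qj)
    print(before, after)  # the margin between item i and item j should grow
```

The user and item vectors left after many such steps are the kind of latent features a downstream classifier can consume.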

6 of 11

K-fold Cross Validation Results

Algorithm	Parameters	Accuracy
Ordinal Regressor (serves as a baseline)		0.511
AdaBoost	100 trees	0.598
LogitBoost	100 trees	0.5976
Random Forest	100 trees	0.60674
Ordinal Random Forest	100 trees	0.623
Deep Neural Network Classifier	3 layers [10,20,10]	0.625532
Deep Neural Network Classifier	4 layers [10,20,20,10]	0.632290
Deep Neural Network Classifier	5 layers [40,20,20,20,10]	0.7602594

7 of 11

Random Forest Vs Ordinal Random Forest

Accuracy

Non-ordinal random forest: 0.60674
Ordinal random forest: 0.61454

Confusion matrices

Non-ordinal random forest:

predict/true	1	2	3
1	13789	4947	2473
2	1762	3912	2199
3	809	3521	6538

Ordinal random forest:

predict/true	1	2	3
1	13795	4848	2581
2	1510	4425	2298
3	1055	3107	6331
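A short sketch for recomputing accuracy from the two confusion matrices above. Rows are read as predicted classes and columns as true classes, following the "predict/true" header. Treating per-class accuracy as the fraction of each true class predicted correctly is an assumption about how that metric was defined.

```python
from typing import List

Matrix = List[List[int]]


def accuracy(cm: Matrix) -> float:
    """Overall accuracy: diagonal count divided by the total count."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[k][k] for k in range(len(cm)))
    return correct / total


def per_class_accuracy(cm: Matrix) -> List[float]:
    """Fraction of each true class (column) that was predicted correctly."""
    n = len(cm)
    result = []
    for k in range(n):
        column_total = sum(cm[r][k] for r in range(n))
        result.append(cm[k][k] / column_total)
    return result


# Matrices copied from the slide, in the order the slide labels them.
non_ordinal_rf = [
    [13789, 4947, 2473],
    [1762, 3912, 2199],
    [809, 3521, 6538],
]
ordinal_rf = [
    [13795, 4848, 2581],
    [1510, 4425, 2298],
    [1055, 3107, 6331],
]

if __name__ == "__main__":
    for name, cm in [("non-ordinal", non_ordinal_rf), ("ordinal", ordinal_rf)]:
        per_class = [round(v, 4) for v in per_class_accuracy(cm)]
        print(name, round(accuracy(cm), 5), per_class)
```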

8 of 11

Experimental Setup

Evaluation Metrics

Accuracy
Per-class accuracy
Confusion matrices

Crossfold Validation Approaches

K folds
K stratified folds
K out-of-sample folds
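One possible reading of the out-of-sample approach, sketched below, is that every user's data points fall into a single fold, so each test fold contains only users the model never saw during training. This interpretation is an assumption, and `user_folds` is a hypothetical helper rather than the project's code.

```python
import random
from typing import List, Tuple

Row = Tuple[int, int, int]  # (user, movie, familiarity)


def user_folds(rows: List[Row], k: int, seed: int = 0) -> List[List[Row]]:
    """Split rows into k folds so that no user appears in more than one fold."""
    users = sorted({row[0] for row in rows})
    random.Random(seed).shuffle(users)
    # Deal the shuffled users round-robin into folds.
    fold_of = {user: index % k for index, user in enumerate(users)}
    folds: List[List[Row]] = [[] for _ in range(k)]
    for row in rows:
        folds[fold_of[row[0]]].append(row)
    return folds


if __name__ == "__main__":
    # Toy data: 10 users, 4 movies each, made-up familiarity classes.
    rows = [(u, m, (u + m) % 3 + 1) for u in range(10) for m in range(4)]
    for index, fold in enumerate(user_folds(rows, k=5)):
        print(index, sorted({row[0] for row in fold}))
```

Holding out one fold and training on the rest then measures how well the model predicts familiarity for users it has no labels for.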

9 of 11

Out of Sample Validation Results

Algorithm	Parameters	Mean (Std Dev) Accuracy
Baseline	Popularity	47.65% (1.36%)
OrdRec		54.97% (1.92%)
Random Forest	300 Trees	53.03% (0.88%)
AdaBoost	300 Trees	53.53% (1.51%)
LogitBoost	300 Trees	56.55% (1.10%)
Neural Network	4 Layers	57.35% (2.22%)
Neural Network	5 Layers	57.92% (1.07%)
Neural Network	6 Layers	57.34% (2.40%)

10 of 11

Next Steps

Neural Networks

Pretraining
Ordinal Classification

Synthetic Training Data

11 of 11

Thanks!