Predicting Familiarity in Recommender Systems
Group 7
Tavvi Taijala, Haiwei Ma, Wei Chen
Problem
RQ: How does familiarity impact the recommender system experience?
To answer this question, we need a way to manipulate familiarity experimentally.
Our approach was to build a model that could accurately predict familiarity.
Dataset
Collected a gold standard familiarity dataset on MovieLens users.
Familiarity was measured on a 6-point ordinal scale, condensed to a 3-point scale.
6 classes:
1 = Never heard of it
2 = Only heard the name
3 = Heard a little about it
4 = Heard some about it
5 = Heard a lot about it
6 = Seen it
3 classes:
Low = 1-2, Med = 3-4, High = 5-6
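A minimal sketch of the condensing step, assuming responses are stored as the integers 1-6 (the function name and layout are illustrative, not the project's actual code):

```python
def condense_familiarity(response):
    """Map a 6-point familiarity response (1-6) onto the 3-point scale.

    1-2 ("Never heard of it", "Only heard the name")  -> "Low"
    3-4 ("Heard a little about", "Heard some about")  -> "Med"
    5-6 ("Heard a lot about", "Seen it")              -> "High"
    """
    if response <= 2:
        return "Low"
    if response <= 4:
        return "Med"
    return "High"
```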
Approach
Pipeline:
BPR trained on the full MovieLens rating dataset learns 100-dimensional user features and 100-dimensional movie features.
For each (user, movie) pair, the user features, movie features, and 100 interaction terms form the classifier input.
A classifier trained on the gold-standard familiarity dataset outputs a Low / Med / High familiarity prediction.
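A minimal sketch of the classifier input, assuming the interaction terms are the element-wise product of the BPR user and movie factors (one common construction; the project's exact definition may differ):

```python
import numpy as np

# Hypothetical 100-d latent factors, standing in for vectors learned by
# BPR on the full MovieLens rating dataset.
user_factors = np.random.rand(100)
movie_factors = np.random.rand(100)

def build_features(u, m):
    """Stack user factors, movie factors, and their element-wise product
    into a single 300-d feature vector for the familiarity classifier."""
    return np.concatenate([u, m, u * m])

x = build_features(user_factors, movie_factors)  # shape: (300,)
```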
Models and Algorithms
Matrix Factorization Models
Classification Models (accuracy):
Ordinal Regressor (baseline): 0.511
AdaBoost (100 trees): 0.598
LogitBoost (100 trees): 0.598
Random Forest (100 trees): 0.607
Deep Neural Network Classifier:
  3 layers [10, 20, 10]: 0.626
  4 layers [10, 20, 20, 10]: 0.632
  5 layers [40, 20, 20, 20, 10]: 0.760
Ordinal Random Forest (100 trees): 0.623
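A sketch of how the tree-based classifiers might be fit and scored with scikit-learn; the feature matrix here is a random placeholder, and the real hyperparameters come from the project pipeline:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Placeholder data standing in for the 300-d gold-standard feature matrix
# and the condensed 3-class familiarity labels.
X = np.random.rand(1000, 300)
y = np.random.choice(["Low", "Med", "High"], size=1000)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

rf = RandomForestClassifier(n_estimators=100).fit(X_train, y_train)
print("Random Forest:", accuracy_score(y_test, rf.predict(X_test)))

ada = AdaBoostClassifier(n_estimators=100).fit(X_train, y_train)
print("AdaBoost:", accuracy_score(y_test, ada.predict(X_test)))
```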
K-Fold Cross-Validation Results
Random Forest vs. Ordinal Random Forest

Random Forest:
predicted \ true | 1 | 2 | 3 |
1 | 13789 | 4947 | 2473 |
2 | 1762 | 3912 | 2199 |
3 | 809 | 3521 | 6538 |

Ordinal Random Forest:
predicted \ true | 1 | 2 | 3 |
1 | 13795 | 4848 | 2581 |
2 | 1510 | 4425 | 2298 |
3 | 1055 | 3107 | 6331 |
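A sketch of producing a confusion matrix like the ones above with k-fold cross-validation (scikit-learn used for illustration; note that its convention puts true classes on rows and predictions on columns, the transpose of the tables above):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import confusion_matrix

# Placeholder data standing in for the real features and 1/2/3 labels.
X = np.random.rand(1000, 300)
y = np.random.randint(1, 4, size=1000)

pred = cross_val_predict(RandomForestClassifier(n_estimators=100), X, y, cv=5)
print(confusion_matrix(y, pred, labels=[1, 2, 3]))
```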
Experimental Setup
Evaluation Metrics
Cross-Fold Validation Approaches
Out-of-Sample Validation Results
Algorithm | Parameters | Mean (Std Dev) Accuracy |
Baseline | Popularity | 47.65% (1.36%) |
OrdRec | | 54.97% (1.92%) |
Random Forest | 300 Trees | 53.03% (0.88%) |
AdaBoost | 300 Trees | 53.53% (1.51%) |
LogitBoost | 300 Trees | 56.55% (1.10%) |
Neural Network | 4 Layers | 57.35% (2.22%) |
Neural Network | 5 Layers | 57.92% (1.07%) |
Neural Network | 6 Layers | 57.34% (2.40%) |
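A sketch of how the mean and standard deviation of accuracy could be computed over held-out folds (placeholder data; the real features, labels, and fold scheme come from the project):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X = np.random.rand(1000, 300)                             # placeholder features
y = np.random.choice(["Low", "Med", "High"], size=1000)   # placeholder labels

scores = cross_val_score(RandomForestClassifier(n_estimators=300), X, y,
                         cv=5, scoring="accuracy")
print(f"Mean (Std Dev) Accuracy: {scores.mean():.2%} ({scores.std():.2%})")
```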
Next Steps
Neural Networks
Synthetic Training Data
Thanks!