1 of 18

RNN-Based Recommender Systems

Paper Study: Personal Recommendation Using Deep Recurrent Neural Networks in NetEase

03/10/2019

Christine Chen

2 of 18

How Are RNNs Used in Recommender Systems?

  • To produce language/speech/music features
    • Similar to using CNNs to produce image features
  • To provide short-term recommendations (✓)
    • Based on users’ short-term activity
    • Useful when the scenario involves sessions
  • In this paper, the scenario is e-commerce
    • Each session consists of a sequence of webpage visits
    • Each webpage is a state
    • Want to predict what the user will buy in this session

3 of 18

Long-Term vs. Short-Term

Long-Term

  • Collaborative Filtering
  • Predicts what a user will eventually buy
  • Uses all historical data
  • Represents a user’s personal interests/preferences
  • Output: A user’s rating for each item

Short-Term

  • RNN
  • Predicts what a user will immediately buy
  • Uses data from this session
  • Represents a user’s current need and intent
  • Output: The probability a user will buy each item

Hybrid

4 of 18

FNN for Collaborative Filtering

Input: matrix of all users’ purchase history of all items

Output: probability of a user buying each item
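As a minimal sketch of the idea only (sizes are toy values and the weights are random for illustration; in the paper they would be learned from the purchase matrix):

```python
import numpy as np

rng = np.random.default_rng(0)
N_ITEMS, HIDDEN = 6, 4   # toy sizes, not the paper's

# Weights would be learned from the purchase matrix; random here for illustration.
W1 = rng.normal(scale=0.1, size=(HIDDEN, N_ITEMS))
W2 = rng.normal(scale=0.1, size=(N_ITEMS, HIDDEN))

def fnn_cf(purchase_row):
    """One user's row of the purchase matrix -> per-item buy probabilities."""
    h = np.maximum(0.0, W1 @ purchase_row)   # ReLU hidden layer
    logits = W2 @ h
    return 1.0 / (1.0 + np.exp(-logits))     # sigmoid per item

probs = fnn_cf(np.array([1.0, 0.0, 1.0, 0.0, 0.0, 1.0]))  # user bought items 0, 2, 5
```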

5 of 18

Basic RNN

The hidden state is updated through an activation function; the output is a vector of probabilities.
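A minimal recurrent step matching this sketch, with tanh as the activation function and a softmax producing the vector of probabilities (all names and sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
IN_DIM, HID, OUT = 5, 4, 3   # toy sizes

U = rng.normal(scale=0.1, size=(HID, IN_DIM))  # input weights
W = rng.normal(scale=0.1, size=(HID, HID))     # recurrent weights
V = rng.normal(scale=0.1, size=(OUT, HID))     # output weights

def rnn_step(h_prev, x):
    h = np.tanh(U @ x + W @ h_prev)        # activation function
    z = V @ h
    y = np.exp(z - z.max())
    return h, y / y.sum()                  # softmax -> vector of probabilities

h = np.zeros(HID)
for x in rng.normal(size=(3, IN_DIM)):     # a toy sequence of three inputs
    h, y = rnn_step(h, x)
```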

6 of 18

RNN for Sessions

  • L hidden layers, each with E neurons
  • Input: M webpages; output: N items
  • N < E < M
  • Same output as the FNN ⇒ the two models can be combined
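One way these dimensions could fit together, as an illustrative sketch (the paper's exact wiring between the stacked layers may differ):

```python
import numpy as np

rng = np.random.default_rng(2)
M, E, N, L = 10, 6, 3, 2   # toy sizes with N < E < M; L hidden layers

Win  = rng.normal(scale=0.1, size=(E, M))                          # webpage one-hot -> layer 1
Wrec = [rng.normal(scale=0.1, size=(E, E)) for _ in range(L)]      # recurrence per layer
Wup  = [rng.normal(scale=0.1, size=(E, E)) for _ in range(L - 1)]  # layer l -> layer l+1
Wout = rng.normal(scale=0.1, size=(N, E))                          # last layer -> N items

def session_step(hs, page_onehot):
    """One RNN step: hs is the list of L hidden states, page_onehot the current webpage."""
    new_hs, below = [], Win @ page_onehot
    for l in range(L):
        h = np.tanh(below + Wrec[l] @ hs[l])
        new_hs.append(h)
        if l < L - 1:
            below = Wup[l] @ h
    z = Wout @ new_hs[-1]
    p = np.exp(z - z.max())
    return new_hs, p / p.sum()   # same output shape as the FNN: N buy probabilities

hs = [np.zeros(E) for _ in range(L)]
page = np.zeros(M)
page[4] = 1.0                    # user visits webpage 4 (one-hot state)
hs, p = session_step(hs, page)
```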

7 of 18

FNN + RNN Combined Model
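Since both models output an N-dimensional vector of per-item buy probabilities, they can be merged. The paper trains the combination jointly; as a minimal illustration only, a convex mix of the two outputs:

```python
import numpy as np

ALPHA = 0.5   # mixing weight (hypothetical; the paper learns the combination jointly)

def combine(p_fnn, p_rnn, alpha=ALPHA):
    """Blend long-term (FNN/CF) and short-term (RNN) per-item buy probabilities."""
    return alpha * p_fnn + (1 - alpha) * p_rnn

p = combine(np.array([0.2, 0.5, 0.3]), np.array([0.6, 0.2, 0.2]))
```
A convex combination of two probability vectors is again a probability vector, so no renormalization is needed.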

8 of 18

History State (1/2)

  • In practice, we cannot maintain an infinite number of states

⇒ Keep only N states

  • N is a hyperparameter
    • Trade-off between accuracy and computation overhead

9 of 18

History State (2/2)

  • Training
    • Use only the last N states
  • Prediction
    • Maintain a sliding window of the N latest states
    • The oldest state is removed and aggregated into the history state
  • History state is computed from the evicted state vectors, aged over time:
    • pt: vector representing one state (page)
    • γ: aging factor
    • θ(pt): bias, the time the user spent on page t
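A sketch of the sliding window plus aged history state, under the assumption that each evicted page vector, weighted by its dwell-time bias, is folded into a γ-aged running sum (the paper's exact formula may differ):

```python
from collections import deque

import numpy as np

N = 3        # sliding-window size (hyperparameter)
GAMMA = 0.8  # aging factor (hypothetical value)

window = deque(maxlen=N)   # the N latest states
history = np.zeros(4)      # aggregated history state (toy 4-dim page vectors)

def observe(page_vec, dwell_bias):
    """Add a new state; if the window is full, fold the evicted state into history."""
    global history
    if len(window) == N:
        old_vec, old_bias = window[0]           # oldest state, about to be evicted
        history = GAMMA * history + old_bias * old_vec
    window.append((page_vec, dwell_bias))       # deque drops the oldest automatically

rng = np.random.default_rng(3)
for t in range(5):                              # five page visits -> two evictions
    observe(rng.normal(size=4), dwell_bias=1.0 + t)
```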

10 of 18

System Architecture

(MongoDB)

11 of 18

Implementation Details

  • Caffe
  • Optimization: SGD
  • Activation: ReLU
  • Scripts generate model code from the given architecture hyperparameters

12 of 18

Genetic Algorithms for Hyperparameter Tuning

  • Each chromosome represents a configuration of hyperparameters (w, l, h, a1, a2, … , aL, …)
  • Crossover
    • Exchanges and merges values of two chromosomes
  • Mutation
    • Randomly updates some genes of a chromosome
  • Converges to a local optimum
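A toy GA with crossover and mutation over a hypothetical hyperparameter chromosome; the fitness function here is a stand-in for the real objective (a trained model's validation score):

```python
import random

random.seed(0)

def random_chrom():
    # Hypothetical chromosome: (hidden width w, depth l, log10 learning rate).
    return [random.randint(16, 256), random.randint(1, 4), random.uniform(-4, -1)]

def fitness(c):
    # Stand-in for validation accuracy; peaks at w=128, l=2, lr=1e-2.5.
    w, l, lr = c
    return -abs(w - 128) / 128 - abs(l - 2) - abs(lr + 2.5)

def crossover(a, b):
    # Exchange/merge gene values of two chromosomes.
    return [random.choice(genes) for genes in zip(a, b)]

def mutate(c):
    # Randomly update one gene of a chromosome.
    c = c[:]
    i = random.randrange(3)
    c[i] = random_chrom()[i]
    return c

pop = [random_chrom() for _ in range(20)]
for _ in range(50):                      # iterate: selection, crossover, mutation
    pop.sort(key=fitness, reverse=True)
    parents = pop[:10]                   # keep the fittest half
    pop = parents + [mutate(crossover(random.choice(parents), random.choice(parents)))
                     for _ in range(10)]
best = max(pop, key=fitness)             # converges to a local optimum
```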

13 of 18

Experiment: Reference

  • Alternating Least Squares (ALS), a popular collaborative filtering algorithm

ALS alone performs very poorly!

14 of 18

Experiment: Batch Size

15 of 18

Experiment: Combining with FNN

16 of 18

Experiment: Top-K Results

  • The item recommendation section can accommodate multiple items
  • Recommend K items, and then compute accuracy based on whether any of the K items was purchased
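A small sketch of top-K accuracy as described above (item indices and probabilities are made up):

```python
import numpy as np

def topk_hit(probs, purchased_item, k):
    """True if the purchased item is among the k highest-probability items."""
    return purchased_item in np.argsort(probs)[::-1][:k]

def topk_accuracy(prob_rows, purchases, k):
    """Fraction of sessions whose purchased item appears in the top-k list."""
    hits = [topk_hit(p, item, k) for p, item in zip(prob_rows, purchases)]
    return sum(hits) / len(hits)

probs = np.array([[0.1, 0.4, 0.05, 0.3, 0.15],    # session 1: top-2 items are 1, 3
                  [0.7, 0.15, 0.05, 0.06, 0.04]]) # session 2: top-2 items are 0, 1
acc = topk_accuracy(probs, purchases=[3, 2], k=2) # session 1 hits, session 2 misses
```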

17 of 18

Experiment: Using History State

18 of 18

Experiment: Convergence Rate