1 of 18

RNN-Based Recommender Systems

Paper Study: Personal Recommendation Using Deep Recurrent Neural Networks in NetEase

03/10/2019

Christine Chen

2 of 18

How Are RNNs Used in Recommender Systems?

  • To produce language/speech/music features
    • Similar to using CNNs to produce image features
  • To provide short-term recommendations (✓)
    • Based on users’ short-term activity
    • Useful when the scenario involves sessions
  • In this paper, the scenario is e-commerce
    • Each session consists of a sequence of webpage visits
    • Each webpage is a state
    • Want to predict what the user will buy in this session

3 of 18

Long-Term vs. Short-Term

Long-Term

  • Collaborative Filtering
  • Predicts what a user will eventually buy
  • Uses all historical data
  • Represents a user’s personal interests/preferences
  • Output: A user’s rating for each item

Short-Term

  • RNN
  • Predicts what a user will immediately buy
  • Uses data from this session
  • Represents a user’s current need and intent
  • Output: The probability a user will buy each item

Hybrid

4 of 18

FNN for Collaborative Filtering

Input: matrix of all users’ purchase history of all items

Output: probability of a user buying each item
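As a minimal sketch of the idea only (sizes are toy values and the weights are random for illustration; in the paper they would be learned from the purchase matrix):

```python
import numpy as np

rng = np.random.default_rng(0)
N_ITEMS, HIDDEN = 6, 4   # toy sizes, not the paper's

# Weights would be learned from the purchase matrix; random here for illustration.
W1 = rng.normal(scale=0.1, size=(HIDDEN, N_ITEMS))
W2 = rng.normal(scale=0.1, size=(N_ITEMS, HIDDEN))

def fnn_cf(purchase_row):
    """One user's row of the purchase matrix -> per-item buy probabilities."""
    h = np.maximum(0.0, W1 @ purchase_row)   # ReLU hidden layer
    logits = W2 @ h
    return 1.0 / (1.0 + np.exp(-logits))     # sigmoid per item

probs = fnn_cf(np.array([1.0, 0.0, 1.0, 0.0, 0.0, 1.0]))  # user bought items 0, 2, 5
```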

5 of 18

Basic RNN

The hidden state is updated through an activation function; the output is a vector of probabilities.
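A minimal recurrent step matching this sketch, with tanh as the activation function and a softmax producing the vector of probabilities (all names and sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
IN_DIM, HID, OUT = 5, 4, 3   # toy sizes

U = rng.normal(scale=0.1, size=(HID, IN_DIM))  # input weights
W = rng.normal(scale=0.1, size=(HID, HID))     # recurrent weights
V = rng.normal(scale=0.1, size=(OUT, HID))     # output weights

def rnn_step(h_prev, x):
    h = np.tanh(U @ x + W @ h_prev)        # activation function
    z = V @ h
    y = np.exp(z - z.max())
    return h, y / y.sum()                  # softmax -> vector of probabilities

h = np.zeros(HID)
for x in rng.normal(size=(3, IN_DIM)):     # a toy sequence of three inputs
    h, y = rnn_step(h, x)
```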

6 of 18

RNN for Sessions

  • L hidden layers, each with E neurons
  • Input: M webpages; output: N items
  • N < E < M
  • Same output as the FNN ⇒ the two models can be combined
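One way these dimensions could fit together, as an illustrative sketch (the paper's exact wiring between the stacked layers may differ):

```python
import numpy as np

rng = np.random.default_rng(2)
M, E, N, L = 10, 6, 3, 2   # toy sizes with N < E < M; L hidden layers

Win  = rng.normal(scale=0.1, size=(E, M))                          # webpage one-hot -> layer 1
Wrec = [rng.normal(scale=0.1, size=(E, E)) for _ in range(L)]      # recurrence per layer
Wup  = [rng.normal(scale=0.1, size=(E, E)) for _ in range(L - 1)]  # layer l -> layer l+1
Wout = rng.normal(scale=0.1, size=(N, E))                          # last layer -> N items

def session_step(hs, page_onehot):
    """One RNN step: hs is the list of L hidden states, page_onehot the current webpage."""
    new_hs, below = [], Win @ page_onehot
    for l in range(L):
        h = np.tanh(below + Wrec[l] @ hs[l])
        new_hs.append(h)
        if l < L - 1:
            below = Wup[l] @ h
    z = Wout @ new_hs[-1]
    p = np.exp(z - z.max())
    return new_hs, p / p.sum()   # same output shape as the FNN: N buy probabilities

hs = [np.zeros(E) for _ in range(L)]
page = np.zeros(M)
page[4] = 1.0                    # user visits webpage 4 (one-hot state)
hs, p = session_step(hs, page)
```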

7 of 18

FNN + RNN Combined Model
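Since both models output an N-dimensional vector of per-item buy probabilities, they can be merged. The paper trains the combination jointly; as a minimal illustration only, a convex mix of the two outputs:

```python
import numpy as np

ALPHA = 0.5   # mixing weight (hypothetical; the paper learns the combination jointly)

def combine(p_fnn, p_rnn, alpha=ALPHA):
    """Blend long-term (FNN/CF) and short-term (RNN) per-item buy probabilities."""
    return alpha * p_fnn + (1 - alpha) * p_rnn

p = combine(np.array([0.2, 0.5, 0.3]), np.array([0.6, 0.2, 0.2]))
```
A convex combination of two probability vectors is again a probability vector, so no renormalization is needed.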

8 of 18

History State (1/2)

  • In practice, we cannot maintain an infinite number of states

⇒ Keep only N states

  • N is a hyperparameter
    • Trade-off between accuracy and computation overhead

9 of 18

History State (2/2)

  • Training
    • Use only the last N states
  • Prediction
    • Maintain a sliding window of the N latest states
    • The oldest state is removed and aggregated into the history state
  • History state is computed from the evicted state vectors, aged over time:
    • pt: vector representing one state (page)
    • γ: aging factor
    • θ(pt): bias, the time the user spent on page t
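A sketch of the sliding window plus aged history state, under the assumption that each evicted page vector, weighted by its dwell-time bias, is folded into a γ-aged running sum (the paper's exact formula may differ):

```python
from collections import deque

import numpy as np

N = 3        # sliding-window size (hyperparameter)
GAMMA = 0.8  # aging factor (hypothetical value)

window = deque(maxlen=N)   # the N latest states
history = np.zeros(4)      # aggregated history state (toy 4-dim page vectors)

def observe(page_vec, dwell_bias):
    """Add a new state; if the window is full, fold the evicted state into history."""
    global history
    if len(window) == N:
        old_vec, old_bias = window[0]           # oldest state, about to be evicted
        history = GAMMA * history + old_bias * old_vec
    window.append((page_vec, dwell_bias))       # deque drops the oldest automatically

rng = np.random.default_rng(3)
for t in range(5):                              # five page visits -> two evictions
    observe(rng.normal(size=4), dwell_bias=1.0 + t)
```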

10 of 18

System Architecture

(MongoDB)

11 of 18

Implementation Details

  • Caffe
  • Optimization: SGD
  • Activation: ReLU
  • Scripts generate model code from the given architecture hyperparameters

12 of 18

Genetic Algorithms for Hyperparameter Tuning

  • Each chromosome represents a configuration of hyperparameters (w, l, h, a1, a2, … , aL, …)
  • Crossover
    • Exchanges and merges values of two chromosomes
  • Mutation
    • Randomly updates some genes of a chromosome
  • Converges to a local optimum
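A toy GA with crossover and mutation over a hypothetical hyperparameter chromosome; the fitness function here is a stand-in for the real objective (a trained model's validation score):

```python
import random

random.seed(0)

def random_chrom():
    # Hypothetical chromosome: (hidden width w, depth l, log10 learning rate).
    return [random.randint(16, 256), random.randint(1, 4), random.uniform(-4, -1)]

def fitness(c):
    # Stand-in for validation accuracy; peaks at w=128, l=2, lr=1e-2.5.
    w, l, lr = c
    return -abs(w - 128) / 128 - abs(l - 2) - abs(lr + 2.5)

def crossover(a, b):
    # Exchange/merge gene values of two chromosomes.
    return [random.choice(genes) for genes in zip(a, b)]

def mutate(c):
    # Randomly update one gene of a chromosome.
    c = c[:]
    i = random.randrange(3)
    c[i] = random_chrom()[i]
    return c

pop = [random_chrom() for _ in range(20)]
for _ in range(50):                      # iterate: selection, crossover, mutation
    pop.sort(key=fitness, reverse=True)
    parents = pop[:10]                   # keep the fittest half
    pop = parents + [mutate(crossover(random.choice(parents), random.choice(parents)))
                     for _ in range(10)]
best = max(pop, key=fitness)             # converges to a local optimum
```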

13 of 18

Experiment: Reference

  • Alternating Least Squares (ALS), a popular collaborative filtering algorithm

ALS alone performs very poorly!

14 of 18

Experiment: Batch Size

15 of 18

Experiment: Combining with FNN

16 of 18

Experiment: Top-K Results

  • The item recommendation section can accommodate multiple items
  • Recommend K items, and then compute accuracy based on whether any of the K items was purchased
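A small sketch of top-K accuracy as described above (item indices and probabilities are made up):

```python
import numpy as np

def topk_hit(probs, purchased_item, k):
    """True if the purchased item is among the k highest-probability items."""
    return purchased_item in np.argsort(probs)[::-1][:k]

def topk_accuracy(prob_rows, purchases, k):
    """Fraction of sessions whose purchased item appears in the top-k list."""
    hits = [topk_hit(p, item, k) for p, item in zip(prob_rows, purchases)]
    return sum(hits) / len(hits)

probs = np.array([[0.1, 0.4, 0.05, 0.3, 0.15],    # session 1: top-2 items are 1, 3
                  [0.7, 0.15, 0.05, 0.06, 0.04]]) # session 2: top-2 items are 0, 1
acc = topk_accuracy(probs, purchases=[3, 2], k=2) # session 1 hits, session 2 misses
```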

17 of 18

Experiment: Using History State

18 of 18

Experiment: Convergence Rate