1 of 47

DEEP REINFORCEMENT LEARNING APPROACHES FOR MULTI-OBJECTIVE PROBLEM IN RECOMMENDATION SYSTEM

Master by research in Intelligent System�Supervisor: Assoc. Prof. Dr. Nurfadhlina Mohd. Sharef

Co-Supervisor: Assoc. Prof. Dr. Razali Yaakob

Dr. Khairul Azhar Kasmiran

Presentation byEE YEO KEAT

GS56054

VIVA VOCE

2 of 47

Outline

  • Introduction
  • Research Problem, Objective, Contribution
  • Literature Review
  • Research Methodology
  • Proposed Approach & Experiment Result
  • Discussion
  • Conclusion

3 of 47

  • Evaluation metrics for recommendation system: precision, novelty, and diversity.
  • Dealing with problem involves more than one constraint, optimization is necessary.

Introduction

 

  • Two categories of MO optimization (Gunantara, 2018; Weck, 2004): scalarization method and pareto method.

Method

Equation

Scalarization: transform all MO functions into single scalar fitness function by summed up with varying assigned weights.

Pareto: all the solution vectors are assembled by non-dominated solutions and dominated solutions in objective space.

4 of 47

Research Problem, Objective, Contribution

Problem Statement

Objective

Contribution

Most of the existing MORSs applied CF combined with EC and only focus on dual-objective (Cui et al., 2016; Geng et al., 2015; S. Wang et al., 2016).

CF method has cold start issue and vulnerable to highly sparse data. (Golbeck & Kuter, 2009; Hyup et al., 2003; Manouselis et al., 2010)

It also must relied on EC for optimization task despite the EC is suffering premature convergence issue. (Abbas et al., 2015;Horvath & Carvalho, 2016)

Furthermore, the combination has difficulty to incorporate side features which provide more advantages in learning.

  1. To propose a MORS using DRL framework that optimize precision, novelty, and diversity concurrently.
  2. To propose the DRL algorithms based on DQN approach and recurrent enhanced DQN in order to incorporate user latent and sequential rating features as additional side feature input.
  1. The proposed MORS that using DRL framework exhibited the ability of optimization in the three conflicting metrics. It surpassed the benchmark work in terms of novelty and diversity metrics, as a trade-off, precision is affected.
  2. The proposed MORS that incorporate user latent input earned better precision, novelty, and diversity in recommendation. The DRL agent that captured sequential rating data achieved better diversity compared with other algorithms.

5 of 47

MORS Approaches

Precision

Novelty

Diversity

Strengths

Weaknesses

CF + NNIA (Geng et al., 2015)

Yes

No

Yes

Able to search near-optimal solution from large population.

  • Compulsory to rely on rating predictor to fill up the sparse rating matrix before optimization process.
  • Premature convergence issue.
  • Difficult to achieve good density of points.
  • Ignore the valuable side information such as user latent features or sequential rating data.

CF + NSGA (Zuo et al., 2015)

Yes

No

Yes

CF + Decomposition-based MOEA (MOEA/D) (S. Wang et al., 2016)

Yes

Yes

No

Lower computational complexity compared to NSGA method.

CF + PMOEA (Cui et al., 2016)

Yes

Yes

Yes

Handle three objective optimization.

Literature Review

6 of 47

Literature Review

No

DRL Approaches

Application

References

1.

DQN based approach associated with scalarized methods extended sensor’s lifetime by three times longer

Internet-of-Things (IoT) devices

Hribar et al., 2019

2.

DRL technique surpassed the EC in term of pareto optimality on MO optimization between power controlling and voltage stability

Power management

Liao et al., 2010

3.

DRL approach is achieved more promising results in MO optimization problem compared to the genetic algorithms in aspects of the solution convergence, sparsity data handling and computing time.

Multi-objective travelling salesman problem (MOTSP)

K. Li et al., 2020

4.

DQN algorithm managed to correlated equilibrium solutions of workflow scheduling and the performance surpassed the EC algorithms included NSGA-II

Multi-workflows controlling over infrastructure-as-a-service clouds

Y. Wang et al., 2019

Recently some DRL algorithms that designed for solving MO sequential decision-making problems in various fields have been presented.

7 of 47

Hypothesis

Literature

References

Incorporate user latent input will benefit the agent learn more context about user-item relationship and generate better result.

By including these additional features as latent states, the quality of personalized recommendation potentially to be improved since different users may have significantly diverse tastes and behaviours

S. Chen et al., 2018;

Lei & Li, 2019

Learning sequential rating data will enhance agent to learn the trend of the data then generate better recommendation result.

One of the scholar highlighted that the order of past rating data will affect the current ratings significantly, and is thus potentially to elicit more accurate predictions

Lee et al., 2016

Hypothesis

8 of 47

Research Methodology

9 of 47

Proposed Approach

DQNMORS

10 of 47

Hyperparameter

Experiment Values

Learning Rate

0.01, 0.001, 0.0001

Discount Factor

0.1, 0.3, 0.6, 0.9

Minimum Epsilon

0.1, 0.5, 0.9

Training Epoch Number

Max. 60 - scalarization method

Max. 15 - Pareto method

Analysis on average value and standard deviation of precision, novelty, diversity for 10 sample users

Recommended movieID for all users

Compute Precision, Novelty and diversity for each recommendation list

Input State

Optimum learning rate, discount factor, minimum epsilon, training epoch for algorithm

User features

Movie-rating features

User 1

[131, 366, 184, 203, 49, 257, 121, 99, 654, 237]

User 2

[121, 99, 143, 184, 366, 203, 257, 131, 237, 49]

User N

[257, 366, 121, 49, 99, 131, 143, 237, 184, 203]

User 1

Metrics

[131, 366, 184, 203, 49, 257, 121, 99, 654, 237]

p = 0.500, n = 5.653,

d = 6.823

User 2

Metrics

[121, 99, 143, 184, 366, 203, 257, 131, 237, 49]

p = 0.800, n = 2.590,

d = 3.651

User N

Metrics

[257, 366, 121, 49, 99, 131, 143, 237, 18, 203]

p = 0.300, n = 4.662,

d = 5.216

Pareto method

Scalarization method

Tuning

Optimization

Experiment for DQNMORS

11 of 47

Experiment ID

Objective

Metrics

Benchmark

Exp_DQNMORS_1

To identify optimum values for hyperparameters in algorithm

Mean and standard deviation of Precision, Novelty, and Diversity

-

Exp_DQNMORS_2

To investigate the effect of user latent input on the performance of DQNMORS

Mean of Precision, Novelty, and Diversity

No user latent input

Exp_DQNMORS_3

To identify which optimization method contributes better in performance

Mean of Precision, Novelty, and Diversity

Scalarization method

Experiment for DQNMORS

12 of 47

Hyperparameter

DQNMORS_ws_m_u

DQNMORS_pf_m_u

DQNMORS_pf_m

Input Feature

User Feature

User Feature

Movie-rating

Optimization

Weighted Sum

Pareto Filtering

Pareto Filtering

Learning Rate

0.0001

Discount Factor

0.10

Epsilon

3.0

Min. Epsilon

0.5

Epsilon Decay Rate

0.90

0.70

Optimum Epoch Number

50

10

Finalized algorithm settings and optimum hyperparameter values

Exp_DQNMORS_1: Results

13 of 47

Average metrics values of recommendation results obtained by DQNMORS_pf_m_u and DQNMORS_pf_m for of 10 sample users.

Findings:

  • DQNMORS_pf_m_u achieved better result in terms of precision (19.80%), novelty (20.46%), diversity (1.60% ).
  • Included user latent in input state provide agent with more information to learn the interest of user.
  • The hypothesis of incorporating user latent input will benefit the agent in recommendation is accepted.

Exp_DQNMORS_2: Results

Effect of user latent input

User Feature

Movie-rating Feature

User Attributes (Age, Gender, Occupation, Zip code), Rated Movie ID, Rating

User ID, Rated Movie ID, Rating

DQNMORS_pf_m_u

DQNMORS_pf_m

Compare

14 of 47

Average metrics values of recommendation results obtained by DQNMORS_ws_m_u approach and DQNMORS_pf_m_u approach for of 10 sample users

Findings:

  • DQNMORS_pf_m_u surpassed the DQNMORS_ws_m_u in terms of average precision and novelty by 13.04% and 2.56% respectively, but 8.42% lower average diversity.
  • Pareto method is selected as optimization method and it is used in the subsequence experiments.

Exp_DQNMORS_3 – results Scalarize VS Pareto

DQNMORS_ws_m_u

DQNMORS_pf_m_u

Compare

weighted sum method

Pareto method

15 of 47

recDQNMORS

Proposed Approach

16 of 47

RESEARCH METHODOLOGY

17 of 47

Experiment ID

Objective

Metrics

Benchmark

Exp_recDQNMORS_1

To identify optimum values for parameters in algorithm

Mean and standard deviation of Precision, Novelty, and Diversity

-

Exp_recDQNMORS_2

To investigate the influential of learning sequential input data

Mean of Precision, Novelty, and Diversity

DQNMORS

Exp_recDQNMORS_3

To investigate the effect of user latent input on the performance of recDQNMORS

Mean of Precision, Novelty, and Diversity

recDQNMORS with no user latent input

Experiment for recDQNMORS

18 of 47

Hyperparameter

recDQNMORS_pf_m_u

recDQNMORS_pf_m

Input Feature

User Feature

Movie-rating Feature

Optimization

Pareto Filtering

Learning Rate

0.0001

Discount Factor

0.9

Epsilon

3.0

Min. Epsilon

0.5

Epsilon Decay Rate

0.7

Optimum Epoch Number

10

Finalized algorithm settings and optimum hyperparameter values

Exp_recDQNMORS_1: Results

19 of 47

Average metrics values of recommendation results obtained by DQNMORS_pf_m_u approach and recDQNMORS_pf_m_u approach for of 10 sample users.

Findings:

  • recDQNMORS_pf_m_u (with LSTM layer that taking sequence of rated movies into account) shows precision was lower than DQNMORS_pf_m_u by 17.57% and novelty by 4.68%.
  • However, recDQNMORS_pf_m_u is surpassed the opponent by 2.66% in diversity.
  • Result implied that considering past ratings information in sequence using LSTM layer does not contribute to improvement.

Exp_recDQNMORS_2 – learning sequential data result

recDQNMORS_pf_m_u

DQNMORS_pf_m_u

Compare

with LSTM layer

without LSTM

20 of 47

The reason for this rather contradictory result on effect of learning sequential data may due following:

  • Deficient representation of the input state information learned by recDQNMORS_pf_m_u.
  • The direct transition information of the user-movie rating across the timestamp may not represent trend about the users’ dynamic preferences.
  • Only short temporal information is insufficient. Based on a research of RS, it will be better to combine short and long temporal in learning.
  • For the occasion in movie recommendation, the changing of user-movie rating information did not provide the useful representation to the agent.

DISCUSSION

21 of 47

Average metrics values of recommendation results obtained by recDQNMORS_pf_m_u and recDQNMORS_pf_m for of 10 sample users.

Findings:

  • recDQNMORS_pf_m_u outperformed the recDQNMORS_pf_m in precision and diversity by 29.38%, and 8.36%, but lower than that by 17.65%.
  • This outcome support to previous finding and hypothesis which user latent feature make agent perform better.

Exp_recDQNMORS_3 – Effect of user latent input result

User Feature

Movie-rating Feature

User Attributes (Age, Gender, Occupation, Zip code), Rated Movie ID, Rating

User ID, Rated Movie ID, Rating

recDQNMORS_pf_m_u

recDQNMORS_pf_m

Compare

22 of 47

The average of mean, minimum, and maximum metrics values from all the algorithms on 10 sample users.

PMOEA vs DQNMORS vs recDQNMORS

23 of 47

PMOEA vs DQNMORS vs recDQNMORS

24 of 47

PMOEA vs DQNMORS vs recDQNMORS

25 of 47

  • Despite PMOEA+CF_User has good performance in precision, but it has rather weak performance in term of novelty and diversity.
  • All of the proposed DRL based approaches have higher average of mean novelty than PMOEA+CF_User, as well as in all sample users.
  • In average of maximum novelty, all of the proposed DRL approaches outperformed all of the PMOEA based approaches. As overview, the proposed DQN based approaches has better novelty results compared to PMOEA based approaches.
  • Both DQNMORS and recDQNMORS approaches have most striking result in diversity of recommendation list as both outperformed all of the PMOEA approaches in all users on average mean diversity.
  • Both PMOEA+CF_Item and recDQNMORS_pf_m has lowest mean precision when compared among its variants, but they have highest mean novelty among its variant respectively. It is in good agreement to the conflict relationship between novelty with precision (S. Wang et al. 2016)

DISCUSSION

26 of 47

  • Overall, there is no perfect algorithm that championed all metric together at the same time. For instance, regardless of DQNMORS_ws_m_u scored highest diversity value among the algorithms, it is not highest at precision or novelty, same as PMOEA+CF_User which achieved highest precision, but it is not the best in other metrics.
  • By summarized the metrics value, the PMOEA + CF_User is outperformed others on precision, and the recDQNMORS_pf_m is excelled in novelty, whereas DQNMORS_ws_m_u achieved top in diversity.
  • The performance of DQNMORS and recDQNMORS approaches on novelty and diversity are better than PMOEA approaches by sacrificing a certain degree of precision.
  • The precision was lower than best record from benchmark and is certainly room for improvement. However, the DQNMORS_ws_m_u and DQNMORS_pf_m_u are greatly potential to perform better based on the average of mean and maximum precision score.
  • This study has identified additional further evidence that DRL approaches have substantial competence in MORS problem.

DISCUSSION

27 of 47

The DRL approaches has comparable results with the benchmark PMOEA. With the advantages of exploration-exploitation by DRL agent, the proposed method able to avoid perplexed at local minimal and achieved better novelty and diversity. All the objectives of the research have been achieved.

As for suggestion to future works:

  • Perform study on additional feature information such as movie title, genre, with effective embedding technique.
  • Allow dynamic changes of learning rate across training.
  • The proposed DQN based approaches still have room of improvement for increase the precision while maintain good track on other evaluation metrics.

Conclusions

28 of 47

Thank You

29 of 47

  1. Gunantara, N. (2018). A review of multi-objective optimization : Methods and its applications. Cogent Engineering, 5(1), 1–16. https://doi.org/10.1080/23311916.2018.1502242
  2. Weck, O. L. de. (2004). Multiobjective Optimization: History and Promise. Proceedings of 3rd China-Japan- Korea Joint Symposium on Optimization of Structural and Mechanical Systems. https://doi.org/10.1109/TEVC.2009.2017515
  3. Cui, L., Ou, P., Fu, X., Wen, Z., & Lu, N. (2016). A novel multi-objective evolutionary algorithm for recommendation systems. J. Parallel Distrib. Comput. https://doi.org/10.1016/j.jpdc.2016.10.014
  4. Geng, B., Li, L., Jiao, L., Gong, M., Cai, Q., & Wu, Y. (2015). NNIA-RS : A multi-objective optimization based recommender system. Physica A, xxxx. https://doi.org/10.1016/j.physa.2015.01.007
  5. Wang, S., Gong, M., Li, H., & Yang, J. (2016). Multi-objective optimization for long tail recommendation. Knowledge-Based Systems, 104, 145–155. https://doi.org/10.1016/j.knosys.2016.04.018

References

30 of 47

  1. Golbeck, J., & Kuter, U. (2009). Trust Metrics in Recommender Systems. Computing with Social Trust, 169–181. https://doi.org/10.1007/978-1-84800-356-9
  2. Hyup, T., Joo, K., & Han, I. (2003). The collaborative filtering recommendation based on SOM cluster-indexing CBR. Expert Systems with Applications, 25, 413–423. https://doi.org/10.1016/S0957-4174(03)00067-8
  3. Manouselis, N., Verbert, K., Drachsler, H., & Santos, O. C. (2010). Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions. RecSys’10 - Proceedings of the 4th ACM Conference on Recommender Systems, 17(6), 377. https://doi.org/10.1145/1864708.1864797
  4. Abbas, A., Zhang, L., & Khan, S. U. (2015). A survey on context-aware recommender systems based on computational intelligence techniques. Computing. https://doi.org/10.1007/s00607-015-0448-7
  5. Horvath, T., & Carvalho, A. C. P. L. F. de. (2016). Evolutionary computing in recommender systems : a review of recent research. Natural Computing, 16, 441–462. https://doi.org/10.1007/s11047-016-9540-y

References

31 of 47

  1. Zuo, Y., Gong, M., Zeng, J., & Jiao, L. (2015). Personalized Recommendation Based on Evolutionary Multi-Objective Optimization. IEEE Computational Intelligence Magazine, 10(1), 52–62. https://doi.org/10.1109/MCI.2014.2369894
  2. Hribar, J., Marinescu, A., Ropokis, G. A., & Dasilva, L. A. (2019). Using Deep Q-learning To Prolong the Lifetime of Correlated Internet of Things Devices. 2019 IEEE International Conference on Communications Workshops (ICC Workshops). https://doi.org/10.1109/ICCW.2019.8756759
  3. Liao, H. L., Member, S., Wu, Q. H., Member, S., & Jiang, L. (2010). Multi-Objective Optimization by Reinforcement Learning For Power System Dispatch and Voltage Stability. 2010 IEEE PES Innovative Smart Grid Technologies Conference Europe (ISGT Europe), 1–8. https://doi.org/10.1109/ISGTEUROPE.2010.5638914
  4. Wang, Y., Liu, H., Zheng, W., Xia, Y., Li, Y., Chen, P., Guo, K., & Xie, H. (2019). Multi-Objective Workflow Scheduling With Reinforcement Learning. IEEE Access, 7, 39974–39982. https://doi.org/10.1109/ACCESS.2019.2902846
  5. Liu, F., Tang, R., Li, X., Zhang, W., Ye, Y., Chen, H., Guo, H., & Zhang, Y. (2018). Deep Reinforcement Learning based Recommendation with Explicit User-Item Interactions Modeling. CoRR, abs/1810.1.

References

32 of 47

  1. Lee, J., Oh, B., Yang, J., & Park, U. (2016). RLCF : A Collaborative Filtering Approach Based on Reinforcement Learning With Sequential Ratings. Intelligent Automation & Soft Computing, 8587(October), 1–6. https://doi.org/10.1080/10798587.2016.1231510

References

33 of 47

APPENDICES

34 of 47

Average metrics values and its standard deviation of 10 sample users’ recommendation list against different learning rate used by DQNMORS_ws_m_u and DQNMORS_pf_m_u

Exp_DQNMORS_1 – tuning learning rate

35 of 47

Throughout this experiment, the α = 0.0001 allowed agent perform better and it is selected as optimum value for both DQNMORS algorithms because the performance in precision and diversity is better compared to larger learning rate.

Exp_DQNMORS_1 – tuning learning rate

36 of 47

Average metrics values and its standard deviation of 10 sample users’ recommendation list against different discount factor used by DQNMORS_ws_m_u and DQNMORS_pf_m_u

Exp_DQNMORS_1 – tuning discount factor

37 of 47

Therefore, through this experiment, the γ = 0.1 is selected as the optimum value for discount factor hyperparameter.

Exp_DQNMORS_1 – tuning discount factor

38 of 47

Average metrics values and its standard deviation of 10 sample users’ recommendation list against different minimum epsilon used by DQNMORS_ws_m_u and DQNMORS_pf_m_u

Exp_DQNMORS_1 – tuning minimum epsilon

39 of 47

In summary, the εmin = 0.5 is considered as optimum value of minimum epsilon because it provides ample exploration rate to agent meanwhile maintain exploitation.

Exp_DQNMORS_1 – tuning minimum epsilon

40 of 47

Average of cumulative average for total reward obtained by agents in DQNMORS_ws_m_u and DQNMORS_pf_m_u approaches

Based on the results, the DQNMORS_ws_m_u agent achieved saturated performance before epoch 50 as shown by the average of cumulative average of total reward.

For DQNMORS_pf_m_u, the cumulative average of reward mostly shows slightly decreased after epoch 10. The agent is considered reached optimum performance around epoch 10 and further training considered be trivial. For that reason, the best epoch number for DQNMORS_ws_m_u and DQNMORS_pf_m_u is set to 50 and 10 respectively.

Exp_DQNMORS_1 – tuning epoch

41 of 47

Average metrics values and its standard deviation of 10 sample users’ recommendation list against different learning rate in recDQNMORS_pf_m_u.

By taking consideration of overall performance, the α = 0.0001 is selected as optimum value for learning rate hyperparameter

Exp_recDQNMORS_1 – Tuning learning rate

42 of 47

Average metrics values and its standard deviation of 10 sample users’ recommendation list against different discount factor in recDQNMORS_pf_m_u

The γ = 0.9 is selected as optimum value for the discount factor hyperparameter

Exp_recDQNMORS_1 – Tuning discount factor

43 of 47

Average metrics values and its standard deviation of 10 sample users’ recommendation list against different minimum epsilon value in recDQNMORS_pf_m_u.

the agent with εmin = 0.5 shows the better results and it decided as optimum value of minimum epsilon value for recDQNMORS_pf_m_u

Exp_recDQNMORS_1 – Tuning minimum epsilon

44 of 47

Average of cumulative average of total reward obtained by recDQNMORS agent

The optimum epoch for recDQNMORS_pf_m_u algorithm is 10 epochs under the optimum hyperparameter setting.

Exp_recDQNMORS_1 – Tuning training epoch

45 of 47

Average

PMOEA + ProbS

PMOEA + CF_User

PMOEA + CF_Item

DQNMORS_ws_m_u

DQNMORS_pf_m_u

DQNMORS_pf_m

recDQNMORS_pf_m_u

recDQNMORS_pf_m

Mean Precision

0.418

0.500

0.368

0.191

0.216

0.174

0.178

0.126

Mean Novelty

1.957

2.307

3.104

3.086

3.165

2.517

3.016

3.558

Mean Diversity

2.925

2.833

2.427

5.756

5.271

5.187

5.412

4.959

The result for average of mean metrics values by algorithms.

46 of 47

Average

PMOEA + ProbS

PMOEA + CF_User

PMOEA + CF_Item

DQNMORS_ws_m_u

DQNMORS_pf_m_u

DQNMORS_pf_m

recDQNMORS_pf_m_u

recDQNMORS_pf_m

Maximum Precision

0.580

0.650

0.560

0.500

0.460

0.370

0.350

0.310

Maximum Novelty

2.386

2.681

3.602

4.489

4.159

3.927

4.784

5.422

Maximum Diversity

3.784

3.638

3.493

8.229

7.646

7.678

8.504

8.123

The result for average of maximum metrics values by algorithms.

47 of 47

The result for average of minimum metrics values by algorithms

Average

PMOEA + ProbS

PMOEA + CF_User

PMOEA + CF_Item

DQNMORS_ws_m_u

DQNMORS_pf_m_u

DQNMORS_pf_m

recDQNMORS_pf_m_u

recDQNMORS_pf_m

Minimum Precision

0.230

0.330

0.170

0.030

0.050

0.020

0.040

0.010

Minimum Novelty

1.689

2.051

2.686

1.768

1.615

1.871

2.002

2.558

Minimum Diversity

1.944

1.879

1.432

3.232

3.432

2.602

2.556

2.364