1 of 47

DEEP REINFORCEMENT LEARNING APPROACHES FOR MULTI-OBJECTIVE PROBLEM IN RECOMMENDATION SYSTEM

Master by research in Intelligent System�Supervisor: Assoc. Prof. Dr. Nurfadhlina Mohd. Sharef

Co-Supervisor: Assoc. Prof. Dr. Razali Yaakob

Dr. Khairul Azhar Kasmiran

Presentation by�EE YEO KEAT

GS56054

VIVA VOCE

2 of 47

Outline

Introduction
Research Problem, Objective, Contribution
Literature Review
Research Methodology
Proposed Approach & Experiment Result
Discussion
Conclusion

3 of 47

Evaluation metrics for recommendation system: precision, novelty, and diversity.
Dealing with problem involves more than one constraint, optimization is necessary.

Introduction

Two categories of MO optimization (Gunantara, 2018; Weck, 2004): scalarization method and pareto method.

Method	Equation
Scalarization: transform all MO functions into single scalar fitness function by summed up with varying assigned weights.
Pareto: all the solution vectors are assembled by non-dominated solutions and dominated solutions in objective space.

4 of 47

Research Problem, Objective, Contribution

Problem Statement	Objective	Contribution
Most of the existing MORSs applied CF combined with EC and only focus on dual-objective (Cui et al., 2016; Geng et al., 2015; S. Wang et al., 2016). CF method has cold start issue and vulnerable to highly sparse data. (Golbeck & Kuter, 2009; Hyup et al., 2003; Manouselis et al., 2010) It also must relied on EC for optimization task despite the EC is suffering premature convergence issue. (Abbas et al., 2015;Horvath & Carvalho, 2016) Furthermore, the combination has difficulty to incorporate side features which provide more advantages in learning.	To propose a MORS using DRL framework that optimize precision, novelty, and diversity concurrently. To propose the DRL algorithms based on DQN approach and recurrent enhanced DQN in order to incorporate user latent and sequential rating features as additional side feature input.	The proposed MORS that using DRL framework exhibited the ability of optimization in the three conflicting metrics. It surpassed the benchmark work in terms of novelty and diversity metrics, as a trade-off, precision is affected. The proposed MORS that incorporate user latent input earned better precision, novelty, and diversity in recommendation. The DRL agent that captured sequential rating data achieved better diversity compared with other algorithms.

5 of 47

MORS Approaches	Precision	Novelty	Diversity	Strengths	Weaknesses
CF + NNIA (Geng et al., 2015)	Yes	No	Yes	Able to search near-optimal solution from large population.	Compulsory to rely on rating predictor to fill up the sparse rating matrix before optimization process. Premature convergence issue. Difficult to achieve good density of points. Ignore the valuable side information such as user latent features or sequential rating data.
CF + NSGA (Zuo et al., 2015)	Yes	No	Yes	Able to search near-optimal solution from large population.
CF + Decomposition-based MOEA (MOEA/D) (S. Wang et al., 2016)	Yes	Yes	No	Lower computational complexity compared to NSGA method.
CF + PMOEA (Cui et al., 2016)	Yes	Yes	Yes	Handle three objective optimization.

Literature Review

6 of 47

Literature Review

No	DRL Approaches	Application	References
1.	DQN based approach associated with scalarized methods extended sensor’s lifetime by three times longer	Internet-of-Things (IoT) devices	Hribar et al., 2019
2.	DRL technique surpassed the EC in term of pareto optimality on MO optimization between power controlling and voltage stability	Power management	Liao et al., 2010
3.	DRL approach is achieved more promising results in MO optimization problem compared to the genetic algorithms in aspects of the solution convergence, sparsity data handling and computing time.	Multi-objective travelling salesman problem (MOTSP)	K. Li et al., 2020
4.	DQN algorithm managed to correlated equilibrium solutions of workflow scheduling and the performance surpassed the EC algorithms included NSGA-II	Multi-workflows controlling over infrastructure-as-a-service clouds	Y. Wang et al., 2019

Recently some DRL algorithms that designed for solving MO sequential decision-making problems in various fields have been presented.

7 of 47

Hypothesis	Literature	References
Incorporate user latent input will benefit the agent learn more context about user-item relationship and generate better result.	By including these additional features as latent states, the quality of personalized recommendation potentially to be improved since different users may have significantly diverse tastes and behaviours	S. Chen et al., 2018; Lei & Li, 2019
Learning sequential rating data will enhance agent to learn the trend of the data then generate better recommendation result.	One of the scholar highlighted that the order of past rating data will affect the current ratings significantly, and is thus potentially to elicit more accurate predictions	Lee et al., 2016

Hypothesis

8 of 47

Research Methodology

9 of 47

Proposed Approach

DQNMORS

10 of 47

Hyperparameter	Experiment Values
Learning Rate	0.01, 0.001, 0.0001
Discount Factor	0.1, 0.3, 0.6, 0.9
Minimum Epsilon	0.1, 0.5, 0.9
Training Epoch Number	Max. 60 - scalarization method Max. 15 - Pareto method

Analysis on average value and standard deviation of precision, novelty, diversity for 10 sample users

Recommended movieID for all users

Compute Precision, Novelty and diversity for each recommendation list

Input State

Optimum learning rate, discount factor, minimum epsilon, training epoch for algorithm

User features

Movie-rating features

User 1
[131, 366, 184, 203, 49, 257, 121, 99, 654, 237]
User 2
[121, 99, 143, 184, 366, 203, 257, 131, 237, 49]
…
User N
[257, 366, 121, 49, 99, 131, 143, 237, 184, 203]

User 1	Metrics
[131, 366, 184, 203, 49, 257, 121, 99, 654, 237]	p = 0.500, n = 5.653, d = 6.823
User 2	Metrics
[121, 99, 143, 184, 366, 203, 257, 131, 237, 49]	p = 0.800, n = 2.590, d = 3.651
…
User N	Metrics
[257, 366, 121, 49, 99, 131, 143, 237, 18, 203]	p = 0.300, n = 4.662, d = 5.216

Pareto method

Scalarization method

Tuning

Optimization

Experiment for DQNMORS

11 of 47

Experiment ID	Objective	Metrics	Benchmark
Exp_DQNMORS_1	To identify optimum values for hyperparameters in algorithm	Mean and standard deviation of Precision, Novelty, and Diversity	-
Exp_DQNMORS_2	To investigate the effect of user latent input on the performance of DQNMORS	Mean of Precision, Novelty, and Diversity	No user latent input
Exp_DQNMORS_3	To identify which optimization method contributes better in performance	Mean of Precision, Novelty, and Diversity	Scalarization method

Experiment for DQNMORS

12 of 47

Hyperparameter	DQNMORS_ws_m_u	DQNMORS_pf_m_u	DQNMORS_pf_m
Input Feature	User Feature	User Feature	Movie-rating
Optimization	Weighted Sum	Pareto Filtering	Pareto Filtering
Learning Rate	0.0001
Discount Factor	0.10
Epsilon	3.0
Min. Epsilon	0.5
Epsilon Decay Rate	0.90	0.70
Optimum Epoch Number	50	10

Finalized algorithm settings and optimum hyperparameter values

Exp_DQNMORS_1: Results

13 of 47

Average metrics values of recommendation results obtained by DQNMORS_pf_m_u and DQNMORS_pf_m for of 10 sample users.

Findings:

DQNMORS_pf_m_u achieved better result in terms of precision (19.80%), novelty (20.46%), diversity (1.60% ).
Included user latent in input state provide agent with more information to learn the interest of user.
The hypothesis of incorporating user latent input will benefit the agent in recommendation is accepted.

Exp_DQNMORS_2: Results

Effect of user latent input

User Feature	Movie-rating Feature
User Attributes (Age, Gender, Occupation, Zip code), Rated Movie ID, Rating	User ID, Rated Movie ID, Rating

DQNMORS_pf_m_u	DQNMORS_pf_m

Compare

14 of 47

Average metrics values of recommendation results obtained by DQNMORS_ws_m_u approach and DQNMORS_pf_m_u approach for of 10 sample users

Findings:

DQNMORS_pf_m_u surpassed the DQNMORS_ws_m_u in terms of average precision and novelty by 13.04% and 2.56% respectively, but 8.42% lower average diversity.
Pareto method is selected as optimization method and it is used in the subsequence experiments.

Exp_DQNMORS_3 – results Scalarize VS Pareto

DQNMORS_ws_m_u	DQNMORS_pf_m_u

Compare

weighted sum method

Pareto method

15 of 47

recDQNMORS

Proposed Approach

16 of 47

RESEARCH METHODOLOGY

17 of 47

Experiment ID	Objective	Metrics	Benchmark
Exp_recDQNMORS_1	To identify optimum values for parameters in algorithm	Mean and standard deviation of Precision, Novelty, and Diversity	-
Exp_recDQNMORS_2	To investigate the influential of learning sequential input data	Mean of Precision, Novelty, and Diversity	DQNMORS
Exp_recDQNMORS_3	To investigate the effect of user latent input on the performance of recDQNMORS	Mean of Precision, Novelty, and Diversity	recDQNMORS with no user latent input

Experiment for recDQNMORS

18 of 47

Hyperparameter	recDQNMORS_pf_m_u	recDQNMORS_pf_m
Input Feature	User Feature	Movie-rating Feature
Optimization	Pareto Filtering
Learning Rate	0.0001
Discount Factor	0.9
Epsilon	3.0
Min. Epsilon	0.5
Epsilon Decay Rate	0.7
Optimum Epoch Number	10

Finalized algorithm settings and optimum hyperparameter values

Exp_recDQNMORS_1: Results

19 of 47

Average metrics values of recommendation results obtained by DQNMORS_pf_m_u approach and recDQNMORS_pf_m_u approach for of 10 sample users.

Findings:

recDQNMORS_pf_m_u (with LSTM layer that taking sequence of rated movies into account) shows precision was lower than DQNMORS_pf_m_u by 17.57% and novelty by 4.68%.
However, recDQNMORS_pf_m_u is surpassed the opponent by 2.66% in diversity.
Result implied that considering past ratings information in sequence using LSTM layer does not contribute to improvement.

Exp_recDQNMORS_2 – learning sequential data result

recDQNMORS_pf_m_u	DQNMORS_pf_m_u

Compare

with LSTM layer

without LSTM

20 of 47

The reason for this rather contradictory result on effect of learning sequential data may due following:

Deficient representation of the input state information learned by recDQNMORS_pf_m_u.
The direct transition information of the user-movie rating across the timestamp may not represent trend about the users’ dynamic preferences.
Only short temporal information is insufficient. Based on a research of RS, it will be better to combine short and long temporal in learning.
For the occasion in movie recommendation, the changing of user-movie rating information did not provide the useful representation to the agent.

DISCUSSION

21 of 47

Average metrics values of recommendation results obtained by recDQNMORS_pf_m_u and recDQNMORS_pf_m for of 10 sample users.

Findings:

recDQNMORS_pf_m_u outperformed the recDQNMORS_pf_m in precision and diversity by 29.38%, and 8.36%, but lower than that by 17.65%.
This outcome support to previous finding and hypothesis which user latent feature make agent perform better.

Exp_recDQNMORS_3 – Effect of user latent input result

User Feature	Movie-rating Feature
User Attributes (Age, Gender, Occupation, Zip code), Rated Movie ID, Rating	User ID, Rated Movie ID, Rating

recDQNMORS_pf_m_u	recDQNMORS_pf_m

Compare

22 of 47

The average of mean, minimum, and maximum metrics values from all the algorithms on 10 sample users.

PMOEA vs DQNMORS vs recDQNMORS

23 of 47

PMOEA vs DQNMORS vs recDQNMORS

24 of 47

PMOEA vs DQNMORS vs recDQNMORS

25 of 47

Despite PMOEA+CF_User has good performance in precision, but it has rather weak performance in term of novelty and diversity.
All of the proposed DRL based approaches have higher average of mean novelty than PMOEA+CF_User, as well as in all sample users.
In average of maximum novelty, all of the proposed DRL approaches outperformed all of the PMOEA based approaches. As overview, the proposed DQN based approaches has better novelty results compared to PMOEA based approaches.
Both DQNMORS and recDQNMORS approaches have most striking result in diversity of recommendation list as both outperformed all of the PMOEA approaches in all users on average mean diversity.
Both PMOEA+CF_Item and recDQNMORS_pf_m has lowest mean precision when compared among its variants, but they have highest mean novelty among its variant respectively. It is in good agreement to the conflict relationship between novelty with precision (S. Wang et al. 2016)

DISCUSSION

26 of 47

Overall, there is no perfect algorithm that championed all metric together at the same time. For instance, regardless of DQNMORS_ws_m_u scored highest diversity value among the algorithms, it is not highest at precision or novelty, same as PMOEA+CF_User which achieved highest precision, but it is not the best in other metrics.
By summarized the metrics value, the PMOEA + CF_User is outperformed others on precision, and the recDQNMORS_pf_m is excelled in novelty, whereas DQNMORS_ws_m_u achieved top in diversity.
The performance of DQNMORS and recDQNMORS approaches on novelty and diversity are better than PMOEA approaches by sacrificing a certain degree of precision.
The precision was lower than best record from benchmark and is certainly room for improvement. However, the DQNMORS_ws_m_u and DQNMORS_pf_m_u are greatly potential to perform better based on the average of mean and maximum precision score.
This study has identified additional further evidence that DRL approaches have substantial competence in MORS problem.

DISCUSSION

27 of 47

The DRL approaches has comparable results with the benchmark PMOEA. With the advantages of exploration-exploitation by DRL agent, the proposed method able to avoid perplexed at local minimal and achieved better novelty and diversity. All the objectives of the research have been achieved.

As for suggestion to future works:

Perform study on additional feature information such as movie title, genre, with effective embedding technique.
Allow dynamic changes of learning rate across training.
The proposed DQN based approaches still have room of improvement for increase the precision while maintain good track on other evaluation metrics.

Conclusions

29 of 47

Gunantara, N. (2018). A review of multi-objective optimization : Methods and its applications. Cogent Engineering, 5(1), 1–16. https://doi.org/10.1080/23311916.2018.1502242
Weck, O. L. de. (2004). Multiobjective Optimization: History and Promise. Proceedings of 3rd China-Japan- Korea Joint Symposium on Optimization of Structural and Mechanical Systems. https://doi.org/10.1109/TEVC.2009.2017515
Cui, L., Ou, P., Fu, X., Wen, Z., & Lu, N. (2016). A novel multi-objective evolutionary algorithm for recommendation systems. J. Parallel Distrib. Comput. https://doi.org/10.1016/j.jpdc.2016.10.014
Geng, B., Li, L., Jiao, L., Gong, M., Cai, Q., & Wu, Y. (2015). NNIA-RS : A multi-objective optimization based recommender system. Physica A, xxxx. https://doi.org/10.1016/j.physa.2015.01.007
Wang, S., Gong, M., Li, H., & Yang, J. (2016). Multi-objective optimization for long tail recommendation. Knowledge-Based Systems, 104, 145–155. https://doi.org/10.1016/j.knosys.2016.04.018

References

30 of 47

Golbeck, J., & Kuter, U. (2009). Trust Metrics in Recommender Systems. Computing with Social Trust, 169–181. https://doi.org/10.1007/978-1-84800-356-9
Hyup, T., Joo, K., & Han, I. (2003). The collaborative filtering recommendation based on SOM cluster-indexing CBR. Expert Systems with Applications, 25, 413–423. https://doi.org/10.1016/S0957-4174(03)00067-8
Manouselis, N., Verbert, K., Drachsler, H., & Santos, O. C. (2010). Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions. RecSys’10 - Proceedings of the 4th ACM Conference on Recommender Systems, 17(6), 377. https://doi.org/10.1145/1864708.1864797
Abbas, A., Zhang, L., & Khan, S. U. (2015). A survey on context-aware recommender systems based on computational intelligence techniques. Computing. https://doi.org/10.1007/s00607-015-0448-7
Horvath, T., & Carvalho, A. C. P. L. F. de. (2016). Evolutionary computing in recommender systems : a review of recent research. Natural Computing, 16, 441–462. https://doi.org/10.1007/s11047-016-9540-y

References

31 of 47

Zuo, Y., Gong, M., Zeng, J., & Jiao, L. (2015). Personalized Recommendation Based on Evolutionary Multi-Objective Optimization. IEEE Computational Intelligence Magazine, 10(1), 52–62. https://doi.org/10.1109/MCI.2014.2369894
Hribar, J., Marinescu, A., Ropokis, G. A., & Dasilva, L. A. (2019). Using Deep Q-learning To Prolong the Lifetime of Correlated Internet of Things Devices. 2019 IEEE International Conference on Communications Workshops (ICC Workshops). https://doi.org/10.1109/ICCW.2019.8756759
Liao, H. L., Member, S., Wu, Q. H., Member, S., & Jiang, L. (2010). Multi-Objective Optimization by Reinforcement Learning For Power System Dispatch and Voltage Stability. 2010 IEEE PES Innovative Smart Grid Technologies Conference Europe (ISGT Europe), 1–8. https://doi.org/10.1109/ISGTEUROPE.2010.5638914
Wang, Y., Liu, H., Zheng, W., Xia, Y., Li, Y., Chen, P., Guo, K., & Xie, H. (2019). Multi-Objective Workflow Scheduling With Reinforcement Learning. IEEE Access, 7, 39974–39982. https://doi.org/10.1109/ACCESS.2019.2902846
Liu, F., Tang, R., Li, X., Zhang, W., Ye, Y., Chen, H., Guo, H., & Zhang, Y. (2018). Deep Reinforcement Learning based Recommendation with Explicit User-Item Interactions Modeling. CoRR, abs/1810.1.

References

32 of 47

Lee, J., Oh, B., Yang, J., & Park, U. (2016). RLCF : A Collaborative Filtering Approach Based on Reinforcement Learning With Sequential Ratings. Intelligent Automation & Soft Computing, 8587(October), 1–6. https://doi.org/10.1080/10798587.2016.1231510

References

34 of 47

Average metrics values and its standard deviation of 10 sample users’ recommendation list against different learning rate used by DQNMORS_ws_m_u and DQNMORS_pf_m_u

Exp_DQNMORS_1 – tuning learning rate

35 of 47

Throughout this experiment, the α = 0.0001 allowed agent perform better and it is selected as optimum value for both DQNMORS algorithms because the performance in precision and diversity is better compared to larger learning rate.

Exp_DQNMORS_1 – tuning learning rate

36 of 47

Average metrics values and its standard deviation of 10 sample users’ recommendation list against different discount factor used by DQNMORS_ws_m_u and DQNMORS_pf_m_u

Exp_DQNMORS_1 – tuning discount factor

37 of 47

Therefore, through this experiment, the γ = 0.1 is selected as the optimum value for discount factor hyperparameter.

Exp_DQNMORS_1 – tuning discount factor

38 of 47

Average metrics values and its standard deviation of 10 sample users’ recommendation list against different minimum epsilon used by DQNMORS_ws_m_u and DQNMORS_pf_m_u

Exp_DQNMORS_1 – tuning minimum epsilon

39 of 47

In summary, the ε_min = 0.5 is considered as optimum value of minimum epsilon because it provides ample exploration rate to agent meanwhile maintain exploitation.

Exp_DQNMORS_1 – tuning minimum epsilon

40 of 47

Average of cumulative average for total reward obtained by agents in DQNMORS_ws_m_u and DQNMORS_pf_m_u approaches

Based on the results, the DQNMORS_ws_m_u agent achieved saturated performance before epoch 50 as shown by the average of cumulative average of total reward.

For DQNMORS_pf_m_u, the cumulative average of reward mostly shows slightly decreased after epoch 10. The agent is considered reached optimum performance around epoch 10 and further training considered be trivial. For that reason, the best epoch number for DQNMORS_ws_m_u and DQNMORS_pf_m_u is set to 50 and 10 respectively.

Exp_DQNMORS_1 – tuning epoch

41 of 47

Average metrics values and its standard deviation of 10 sample users’ recommendation list against different learning rate in recDQNMORS_pf_m_u.

By taking consideration of overall performance, the α = 0.0001 is selected as optimum value for learning rate hyperparameter

Exp_recDQNMORS_1 – Tuning learning rate

42 of 47

Average metrics values and its standard deviation of 10 sample users’ recommendation list against different discount factor in recDQNMORS_pf_m_u

The γ = 0.9 is selected as optimum value for the discount factor hyperparameter

Exp_recDQNMORS_1 – Tuning discount factor

43 of 47

Average metrics values and its standard deviation of 10 sample users’ recommendation list against different minimum epsilon value in recDQNMORS_pf_m_u.

the agent with εmin = 0.5 shows the better results and it decided as optimum value of minimum epsilon value for recDQNMORS_pf_m_u

Exp_recDQNMORS_1 – Tuning minimum epsilon

44 of 47

Average of cumulative average of total reward obtained by recDQNMORS agent

The optimum epoch for recDQNMORS_pf_m_u algorithm is 10 epochs under the optimum hyperparameter setting.

Exp_recDQNMORS_1 – Tuning training epoch

45 of 47

Average	PMOEA + ProbS	PMOEA + CF_User	PMOEA + CF_Item	DQNMORS_ws_m_u	DQNMORS_pf_m_u	DQNMORS_pf_m	recDQNMORS_pf_m_u	recDQNMORS_pf_m
Mean Precision	0.418	0.500	0.368	0.191	0.216	0.174	0.178	0.126
Mean Novelty	1.957	2.307	3.104	3.086	3.165	2.517	3.016	3.558
Mean Diversity	2.925	2.833	2.427	5.756	5.271	5.187	5.412	4.959

The result for average of mean metrics values by algorithms.

46 of 47

Average	PMOEA + ProbS	PMOEA + CF_User	PMOEA + CF_Item	DQNMORS_ws_m_u	DQNMORS_pf_m_u	DQNMORS_pf_m	recDQNMORS_pf_m_u	recDQNMORS_pf_m
Maximum Precision	0.580	0.650	0.560	0.500	0.460	0.370	0.350	0.310
Maximum Novelty	2.386	2.681	3.602	4.489	4.159	3.927	4.784	5.422
Maximum Diversity	3.784	3.638	3.493	8.229	7.646	7.678	8.504	8.123

The result for average of maximum metrics values by algorithms.

47 of 47

The result for average of minimum metrics values by algorithms

Average	PMOEA + ProbS	PMOEA + CF_User	PMOEA + CF_Item	DQNMORS_ws_m_u	DQNMORS_pf_m_u	DQNMORS_pf_m	recDQNMORS_pf_m_u	recDQNMORS_pf_m
Minimum Precision	0.230	0.330	0.170	0.030	0.050	0.020	0.040	0.010
Minimum Novelty	1.689	2.051	2.686	1.768	1.615	1.871	2.002	2.558
Minimum Diversity	1.944	1.879	1.432	3.232	3.432	2.602	2.556	2.364

1 of 47

2 of 47

3 of 47

4 of 47

5 of 47

6 of 47

7 of 47

8 of 47

9 of 47

10 of 47

11 of 47

12 of 47

13 of 47

14 of 47

15 of 47

16 of 47

17 of 47

18 of 47

19 of 47

20 of 47

21 of 47

22 of 47

23 of 47

24 of 47

25 of 47

26 of 47

27 of 47

28 of 47

29 of 47

30 of 47

31 of 47

32 of 47

33 of 47

34 of 47

35 of 47

36 of 47

37 of 47

38 of 47

39 of 47

40 of 47

41 of 47

42 of 47

43 of 47

44 of 47

45 of 47

46 of 47

47 of 47