DEEP REINFORCEMENT LEARNING APPROACHES FOR MULTI-OBJECTIVE PROBLEM IN RECOMMENDATION SYSTEM
Master by research in Intelligent System�Supervisor: Assoc. Prof. Dr. Nurfadhlina Mohd. Sharef
Co-Supervisor: Assoc. Prof. Dr. Razali Yaakob
Dr. Khairul Azhar Kasmiran
Presentation by�EE YEO KEAT
GS56054
VIVA VOCE
Outline
Introduction
Method | Equation |
Scalarization: transform all MO functions into single scalar fitness function by summed up with varying assigned weights. | |
Pareto: all the solution vectors are assembled by non-dominated solutions and dominated solutions in objective space. | |
Research Problem, Objective, Contribution
Problem Statement | Objective | Contribution |
Most of the existing MORSs applied CF combined with EC and only focus on dual-objective (Cui et al., 2016; Geng et al., 2015; S. Wang et al., 2016). CF method has cold start issue and vulnerable to highly sparse data. (Golbeck & Kuter, 2009; Hyup et al., 2003; Manouselis et al., 2010) It also must relied on EC for optimization task despite the EC is suffering premature convergence issue. (Abbas et al., 2015;Horvath & Carvalho, 2016) Furthermore, the combination has difficulty to incorporate side features which provide more advantages in learning. |
|
|
MORS Approaches | Precision | Novelty | Diversity | Strengths | Weaknesses |
CF + NNIA (Geng et al., 2015) | Yes | No | Yes | Able to search near-optimal solution from large population. |
|
CF + NSGA (Zuo et al., 2015) | Yes | No | Yes | ||
CF + Decomposition-based MOEA (MOEA/D) (S. Wang et al., 2016) | Yes | Yes | No | Lower computational complexity compared to NSGA method. | |
CF + PMOEA (Cui et al., 2016) | Yes | Yes | Yes | Handle three objective optimization. |
Literature Review
Literature Review
No | DRL Approaches | Application | References |
1. | DQN based approach associated with scalarized methods extended sensor’s lifetime by three times longer | Internet-of-Things (IoT) devices | Hribar et al., 2019 |
2. | DRL technique surpassed the EC in term of pareto optimality on MO optimization between power controlling and voltage stability | Power management | Liao et al., 2010 |
3. | DRL approach is achieved more promising results in MO optimization problem compared to the genetic algorithms in aspects of the solution convergence, sparsity data handling and computing time. | Multi-objective travelling salesman problem (MOTSP) | K. Li et al., 2020 |
4. | DQN algorithm managed to correlated equilibrium solutions of workflow scheduling and the performance surpassed the EC algorithms included NSGA-II | Multi-workflows controlling over infrastructure-as-a-service clouds | Y. Wang et al., 2019 |
Recently some DRL algorithms that designed for solving MO sequential decision-making problems in various fields have been presented.
Hypothesis | Literature | References |
Incorporate user latent input will benefit the agent learn more context about user-item relationship and generate better result. | By including these additional features as latent states, the quality of personalized recommendation potentially to be improved since different users may have significantly diverse tastes and behaviours | S. Chen et al., 2018; Lei & Li, 2019 |
Learning sequential rating data will enhance agent to learn the trend of the data then generate better recommendation result. | One of the scholar highlighted that the order of past rating data will affect the current ratings significantly, and is thus potentially to elicit more accurate predictions | Lee et al., 2016 |
Hypothesis
Research Methodology
Proposed Approach
DQNMORS
Hyperparameter | Experiment Values |
Learning Rate | 0.01, 0.001, 0.0001 |
Discount Factor | 0.1, 0.3, 0.6, 0.9 |
Minimum Epsilon | 0.1, 0.5, 0.9 |
Training Epoch Number | Max. 60 - scalarization method Max. 15 - Pareto method |
Analysis on average value and standard deviation of precision, novelty, diversity for 10 sample users
Recommended movieID for all users
Compute Precision, Novelty and diversity for each recommendation list
Input State
Optimum learning rate, discount factor, minimum epsilon, training epoch for algorithm
User features
Movie-rating features
User 1 |
[131, 366, 184, 203, 49, 257, 121, 99, 654, 237] |
User 2 |
[121, 99, 143, 184, 366, 203, 257, 131, 237, 49] |
… |
User N |
[257, 366, 121, 49, 99, 131, 143, 237, 184, 203] |
User 1 | Metrics |
[131, 366, 184, 203, 49, 257, 121, 99, 654, 237] | p = 0.500, n = 5.653, d = 6.823 |
User 2 | Metrics |
[121, 99, 143, 184, 366, 203, 257, 131, 237, 49] | p = 0.800, n = 2.590, d = 3.651 |
… | |
User N | Metrics |
[257, 366, 121, 49, 99, 131, 143, 237, 18, 203] | p = 0.300, n = 4.662, d = 5.216 |
Pareto method
Scalarization method
Tuning
Optimization
Experiment for DQNMORS
Experiment ID | Objective | Metrics | Benchmark |
Exp_DQNMORS_1 | To identify optimum values for hyperparameters in algorithm | Mean and standard deviation of Precision, Novelty, and Diversity | - |
Exp_DQNMORS_2 | To investigate the effect of user latent input on the performance of DQNMORS | Mean of Precision, Novelty, and Diversity | No user latent input |
Exp_DQNMORS_3 | To identify which optimization method contributes better in performance | Mean of Precision, Novelty, and Diversity | Scalarization method |
Experiment for DQNMORS
Hyperparameter | DQNMORS_ws_m_u | DQNMORS_pf_m_u | DQNMORS_pf_m |
Input Feature | User Feature | User Feature | Movie-rating |
Optimization | Weighted Sum | Pareto Filtering | Pareto Filtering |
Learning Rate | 0.0001 | ||
Discount Factor | 0.10 | ||
Epsilon | 3.0 | ||
Min. Epsilon | 0.5 | ||
Epsilon Decay Rate | 0.90 | 0.70 | |
Optimum Epoch Number | 50 | 10 | |
Finalized algorithm settings and optimum hyperparameter values
Exp_DQNMORS_1: Results
Average metrics values of recommendation results obtained by DQNMORS_pf_m_u and DQNMORS_pf_m for of 10 sample users.
Findings:
Exp_DQNMORS_2: Results
Effect of user latent input
User Feature | Movie-rating Feature |
User Attributes (Age, Gender, Occupation, Zip code), Rated Movie ID, Rating | User ID, Rated Movie ID, Rating |
DQNMORS_pf_m_u | DQNMORS_pf_m |
Compare
Average metrics values of recommendation results obtained by DQNMORS_ws_m_u approach and DQNMORS_pf_m_u approach for of 10 sample users
Findings:
Exp_DQNMORS_3 – results Scalarize VS Pareto
DQNMORS_ws_m_u | DQNMORS_pf_m_u |
Compare
weighted sum method
Pareto method
recDQNMORS
Proposed Approach
RESEARCH METHODOLOGY
Experiment ID | Objective | Metrics | Benchmark |
Exp_recDQNMORS_1 | To identify optimum values for parameters in algorithm | Mean and standard deviation of Precision, Novelty, and Diversity | - |
Exp_recDQNMORS_2 | To investigate the influential of learning sequential input data | Mean of Precision, Novelty, and Diversity | DQNMORS |
Exp_recDQNMORS_3 | To investigate the effect of user latent input on the performance of recDQNMORS | Mean of Precision, Novelty, and Diversity | recDQNMORS with no user latent input |
Experiment for recDQNMORS
Hyperparameter | recDQNMORS_pf_m_u | recDQNMORS_pf_m |
Input Feature | User Feature | Movie-rating Feature |
Optimization | Pareto Filtering | |
Learning Rate | 0.0001 | |
Discount Factor | 0.9 | |
Epsilon | 3.0 | |
Min. Epsilon | 0.5 | |
Epsilon Decay Rate | 0.7 | |
Optimum Epoch Number | 10 | |
Finalized algorithm settings and optimum hyperparameter values
Exp_recDQNMORS_1: Results
Average metrics values of recommendation results obtained by DQNMORS_pf_m_u approach and recDQNMORS_pf_m_u approach for of 10 sample users.
Findings:
Exp_recDQNMORS_2 – learning sequential data result
recDQNMORS_pf_m_u | DQNMORS_pf_m_u |
Compare
with LSTM layer
without LSTM
The reason for this rather contradictory result on effect of learning sequential data may due following:
DISCUSSION
Average metrics values of recommendation results obtained by recDQNMORS_pf_m_u and recDQNMORS_pf_m for of 10 sample users.
Findings:
Exp_recDQNMORS_3 – Effect of user latent input result
User Feature | Movie-rating Feature |
User Attributes (Age, Gender, Occupation, Zip code), Rated Movie ID, Rating | User ID, Rated Movie ID, Rating |
recDQNMORS_pf_m_u | recDQNMORS_pf_m |
Compare
The average of mean, minimum, and maximum metrics values from all the algorithms on 10 sample users.
PMOEA vs DQNMORS vs recDQNMORS
PMOEA vs DQNMORS vs recDQNMORS
PMOEA vs DQNMORS vs recDQNMORS
DISCUSSION
DISCUSSION
The DRL approaches has comparable results with the benchmark PMOEA. With the advantages of exploration-exploitation by DRL agent, the proposed method able to avoid perplexed at local minimal and achieved better novelty and diversity. All the objectives of the research have been achieved.
As for suggestion to future works:
Conclusions
Thank You
References
References
References
References
APPENDICES
Average metrics values and its standard deviation of 10 sample users’ recommendation list against different learning rate used by DQNMORS_ws_m_u and DQNMORS_pf_m_u
Exp_DQNMORS_1 – tuning learning rate
Throughout this experiment, the α = 0.0001 allowed agent perform better and it is selected as optimum value for both DQNMORS algorithms because the performance in precision and diversity is better compared to larger learning rate.
Exp_DQNMORS_1 – tuning learning rate
Average metrics values and its standard deviation of 10 sample users’ recommendation list against different discount factor used by DQNMORS_ws_m_u and DQNMORS_pf_m_u
Exp_DQNMORS_1 – tuning discount factor
Therefore, through this experiment, the γ = 0.1 is selected as the optimum value for discount factor hyperparameter.
Exp_DQNMORS_1 – tuning discount factor
Average metrics values and its standard deviation of 10 sample users’ recommendation list against different minimum epsilon used by DQNMORS_ws_m_u and DQNMORS_pf_m_u
Exp_DQNMORS_1 – tuning minimum epsilon
In summary, the εmin = 0.5 is considered as optimum value of minimum epsilon because it provides ample exploration rate to agent meanwhile maintain exploitation.
Exp_DQNMORS_1 – tuning minimum epsilon
Average of cumulative average for total reward obtained by agents in DQNMORS_ws_m_u and DQNMORS_pf_m_u approaches
Based on the results, the DQNMORS_ws_m_u agent achieved saturated performance before epoch 50 as shown by the average of cumulative average of total reward.
For DQNMORS_pf_m_u, the cumulative average of reward mostly shows slightly decreased after epoch 10. The agent is considered reached optimum performance around epoch 10 and further training considered be trivial. For that reason, the best epoch number for DQNMORS_ws_m_u and DQNMORS_pf_m_u is set to 50 and 10 respectively.
Exp_DQNMORS_1 – tuning epoch
Average metrics values and its standard deviation of 10 sample users’ recommendation list against different learning rate in recDQNMORS_pf_m_u.
By taking consideration of overall performance, the α = 0.0001 is selected as optimum value for learning rate hyperparameter
Exp_recDQNMORS_1 – Tuning learning rate
Average metrics values and its standard deviation of 10 sample users’ recommendation list against different discount factor in recDQNMORS_pf_m_u
The γ = 0.9 is selected as optimum value for the discount factor hyperparameter
Exp_recDQNMORS_1 – Tuning discount factor
Average metrics values and its standard deviation of 10 sample users’ recommendation list against different minimum epsilon value in recDQNMORS_pf_m_u.
the agent with εmin = 0.5 shows the better results and it decided as optimum value of minimum epsilon value for recDQNMORS_pf_m_u
Exp_recDQNMORS_1 – Tuning minimum epsilon
Average of cumulative average of total reward obtained by recDQNMORS agent
The optimum epoch for recDQNMORS_pf_m_u algorithm is 10 epochs under the optimum hyperparameter setting.
Exp_recDQNMORS_1 – Tuning training epoch
Average | PMOEA + ProbS | PMOEA + CF_User | PMOEA + CF_Item | DQNMORS_ws_m_u | DQNMORS_pf_m_u | DQNMORS_pf_m | recDQNMORS_pf_m_u | recDQNMORS_pf_m |
Mean Precision | 0.418 | 0.500 | 0.368 | 0.191 | 0.216 | 0.174 | 0.178 | 0.126 |
Mean Novelty | 1.957 | 2.307 | 3.104 | 3.086 | 3.165 | 2.517 | 3.016 | 3.558 |
Mean Diversity | 2.925 | 2.833 | 2.427 | 5.756 | 5.271 | 5.187 | 5.412 | 4.959 |
The result for average of mean metrics values by algorithms.
Average | PMOEA + ProbS | PMOEA + CF_User | PMOEA + CF_Item | DQNMORS_ws_m_u | DQNMORS_pf_m_u | DQNMORS_pf_m | recDQNMORS_pf_m_u | recDQNMORS_pf_m |
Maximum Precision | 0.580 | 0.650 | 0.560 | 0.500 | 0.460 | 0.370 | 0.350 | 0.310 |
Maximum Novelty | 2.386 | 2.681 | 3.602 | 4.489 | 4.159 | 3.927 | 4.784 | 5.422 |
Maximum Diversity | 3.784 | 3.638 | 3.493 | 8.229 | 7.646 | 7.678 | 8.504 | 8.123 |
The result for average of maximum metrics values by algorithms.
The result for average of minimum metrics values by algorithms
Average | PMOEA + ProbS | PMOEA + CF_User | PMOEA + CF_Item | DQNMORS_ws_m_u | DQNMORS_pf_m_u | DQNMORS_pf_m | recDQNMORS_pf_m_u | recDQNMORS_pf_m |
Minimum Precision | 0.230 | 0.330 | 0.170 | 0.030 | 0.050 | 0.020 | 0.040 | 0.010 |
Minimum Novelty | 1.689 | 2.051 | 2.686 | 1.768 | 1.615 | 1.871 | 2.002 | 2.558 |
Minimum Diversity | 1.944 | 1.879 | 1.432 | 3.232 | 3.432 | 2.602 | 2.556 | 2.364 |