1 of 27

Comparison between LQR and DQN for Cartpole

UNIST RML-Seok Ju Lee


CONTENTS

Dynamics of Cartpole

01

Control Cartpole using LQR

02

Control Cartpole using DQN

03

Comparison of Result

04

Future Plans

05


  1. Systems of Cartpole
  2. Dynamics of Cartpole
  3. State Space for Cartpole

Dynamics of Cartpole

01


1. System of Cartpole

[Figure: the cartpole system, a cart with a pole hinged on top; friction is neglected]


2. Dynamics of Cartpole
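The equations on this slide did not survive extraction. A common form, assuming a frictionless cart of mass $M$, a point-mass pole $m$ at distance $l$ from the pivot, pole angle $\theta$ measured from upright, and applied force $F$, is:

```latex
\begin{aligned}
(M + m)\,\ddot{x} + m l \ddot{\theta}\cos\theta - m l \dot{\theta}^{2}\sin\theta &= F \\
l\,\ddot{\theta} + \ddot{x}\cos\theta - g\sin\theta &= 0
\end{aligned}
```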

 

 

 


3. State Space for Cartpole

 

We linearized the nonlinear cartpole system so that it can be controlled with LQR.

The linearization is taken about the pendulum's upward (inverted) equilibrium point.
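The matrices themselves were lost in extraction. A sketch of the linearized model, assuming a frictionless cart of mass $M$ and a point-mass pole $m$ at distance $l$, with state $\mathbf{x} = [x,\ \dot{x},\ \theta,\ \dot{\theta}]^\top$, input $u = F$, and $\theta$ measured from upright:

```latex
\dot{\mathbf{x}} = A\mathbf{x} + Bu, \qquad
A = \begin{bmatrix}
0 & 1 & 0 & 0 \\
0 & 0 & -\dfrac{mg}{M} & 0 \\
0 & 0 & 0 & 1 \\
0 & 0 & \dfrac{(M+m)g}{Ml} & 0
\end{bmatrix}, \qquad
B = \begin{bmatrix} 0 \\[2pt] \dfrac{1}{M} \\[2pt] 0 \\[2pt] -\dfrac{1}{Ml} \end{bmatrix}
```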

 


  1. Introduction to LQR
  2. Design the LQR controller for Cartpole
  3. Simulation Result of Cartpole using LQR

Control Cartpole using LQR

02


1. Introduction to LQR

 

 

 

 

 

[Block diagram: the reference r minus the feedback signal at a summing junction (+/−) gives the error e, which drives the controller]


 

State space: $\dot{x} = Ax + Bu$

Cost: $J = \int_{0}^{\infty} \left( x^\top Q x + u^\top R u \right) dt$

The goal of the LQR controller is to minimize this cost.

The optimal gain is found by iteratively solving the algebraic Riccati equation (ARE) until $K$ converges.
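Written out, the continuous-time ARE and the resulting feedback law are:

```latex
A^\top P + PA - PBR^{-1}B^\top P + Q = 0, \qquad
K = R^{-1}B^\top P, \qquad u = -Kx
```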

 


2. Design the LQR controller for Cartpole
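The design details on this slide (the chosen Q, R, and the resulting gain) were not preserved. As a sketch of the procedure, the snippet below discretizes an assumed linearized cartpole model (M = 1 kg, m = 0.1 kg, l = 0.5 m are illustrative values, not necessarily those used here) and iterates the Riccati recursion until the gain K converges, using only the standard library.

```python
# Illustrative LQR design for a linearized cartpole (assumed parameters).
# Matrices are plain nested lists so no external packages are needed.

M, m, l, g = 1.0, 0.1, 0.5, 9.8   # cart mass, pole mass, pole length (assumed)
dt = 0.02                          # discretization step

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def mat_add(A, B):
    return [[A[i][j] + B[i][j] for j in range(len(A[0]))] for i in range(len(A))]

def mat_sub(A, B):
    return [[A[i][j] - B[i][j] for j in range(len(A[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

# Continuous-time linearization about the upright equilibrium,
# state = [x, x_dot, theta, theta_dot], input = force on the cart.
Ac = [[0, 1, 0, 0],
      [0, 0, -m * g / M, 0],
      [0, 0, 0, 1],
      [0, 0, (M + m) * g / (M * l), 0]]
Bc = [[0], [1 / M], [0], [-1 / (M * l)]]

# Euler discretization: Ad = I + dt*Ac, Bd = dt*Bc.
I4 = [[float(i == j) for j in range(4)] for i in range(4)]
Ad = mat_add(I4, [[dt * v for v in row] for row in Ac])
Bd = [[dt * v for v in row] for row in Bc]

Q = I4        # state weight (assumed)
R = 1.0       # input weight (assumed; scalar, since the input is scalar)

# Iterate the discrete Riccati recursion until the gain K converges.
P = [row[:] for row in Q]
K = [[0.0] * 4]
for _ in range(100000):
    BtP = mat_mul(transpose(Bd), P)                   # B'P, a 1x4 row
    s = R + mat_mul(BtP, Bd)[0][0]                    # scalar R + B'PB
    K_new = [[v / s for v in mat_mul(BtP, Ad)[0]]]    # (R + B'PB)^-1 B'PA
    AtP = mat_mul(transpose(Ad), P)
    P_new = mat_sub(mat_add(Q, mat_mul(AtP, Ad)),
                    mat_mul(mat_mul(AtP, Bd), K_new))
    diff = max(abs(P_new[i][j] - P[i][j]) for i in range(4) for j in range(4))
    P, K = P_new, K_new
    if diff < 1e-9:
        break

# Simulate the closed loop x_{k+1} = (Ad - Bd K) x_k from a small angle offset.
x = [[0.0], [0.0], [0.1], [0.0]]
for _ in range(5000):                 # 100 s of simulated time
    u = -mat_mul(K, x)[0][0]
    x = mat_add(mat_mul(Ad, x), [[row[0] * u] for row in Bd])

print([round(v[0], 6) for v in x])    # state driven near the origin
```

In practice a library routine (for example SciPy's Riccati solvers) would replace the hand-written iteration; the loop above just makes the "iterate until K converges" step explicit.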

 

 

 


 

 


3. Simulation Result of Cartpole using LQR

[Simulation: cartpole under LQR control vs. no input]


  1. Introduction to DQN
  2. Design the DQN for Cartpole
  3. Simulation Result of Cartpole using DQN

Control Cartpole using DQN

03


1. Introduction to DQN

[Diagram: a deep Q-network takes the state as input and outputs a Q-value for each action (Q-value of Action 1, Action 2, ..., Action N)]


Reference: Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., and Riedmiller, M. (2013). Playing Atari with Deep Reinforcement Learning. arXiv:1312.5602 [cs.LG]. DeepMind Technologies.

Pseudo Code of DQN
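The pseudocode itself was lost in extraction; in outline, the algorithm from the cited Mnih et al. (2013) paper is:

```text
Initialize replay memory D to capacity N
Initialize action-value network Q with random weights
for episode = 1, M do
    observe initial state s_1
    for t = 1, T do
        with probability ε select a random action a_t,
        otherwise select a_t = argmax_a Q(s_t, a)
        execute a_t, observe reward r_t and next state s_{t+1}
        store transition (s_t, a_t, r_t, s_{t+1}) in D
        sample a random minibatch of transitions (s_j, a_j, r_j, s_{j+1}) from D
        set y_j = r_j                            if s_{j+1} is terminal
                = r_j + γ max_a' Q(s_{j+1}, a')  otherwise
        perform a gradient step on (y_j - Q(s_j, a_j))^2
    end for
end for
```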


2. Design the DQN for Cartpole

Replaybuffer

Dqn_learn

Dqn_main

Dqn_play
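The code of these four modules is not shown. As a minimal sketch, a Replaybuffer component typically looks like the following (the class and method names here are illustrative, not the actual implementation):

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity store of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity):
        # deque(maxlen=...) silently discards the oldest transitions when full
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # uniform random minibatch, unzipped into parallel tuples
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)

# usage sketch: store a few cartpole-style transitions, then draw a minibatch
buf = ReplayBuffer(capacity=50)
for i in range(10):
    buf.push([0.0, 0.0, 0.0, 0.0], i % 2, 1.0, [0.0, 0.0, 0.0, 0.0], False)
states, actions, rewards, next_states, dones = buf.sample(4)
```

Sampling uniformly from this buffer is what breaks the correlation between consecutive transitions during learning.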


Discount rate: how much confidence is placed in future rewards

Batch size: the number of data samples drawn in each training batch

Buffer size: the capacity of the replay buffer that stores experience

Learning rate: how much confidence is placed in the current experience

Target network weights: a delayed copy of the Q-network's weights, used when computing learning targets
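Concretely, these settings might be collected as follows (the values are illustrative assumptions, not the ones used in this work):

```python
# Illustrative DQN hyperparameters; values are assumptions, not from the slides.
hyperparams = {
    "gamma": 0.99,           # discount rate: confidence placed in future rewards
    "batch_size": 64,        # samples drawn from the buffer per training batch
    "buffer_size": 100_000,  # capacity of the replay buffer storing experience
    "learning_rate": 1e-3,   # how strongly each update trusts current experience
    "target_update": 1000,   # steps between syncing the target network's weights
}
```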

 


3. Simulation Result of Cartpole using DQN

[Simulation: before learning vs. after learning (episode 500)]


  • This graph shows how the reward changes as the number of episodes increases.

  • With the maximum reward capped at 500, the reward is small at first and then grows until the maximum is reached.

  • The values obtained in this graph are saved as experiences.


  • This graph shows the changes in the pole angle.

  • Unlike LQR control, DQN does not converge exactly to zero even once the system is stabilized, because it does not consider the dynamics at all and relies purely on experience.

  • Unlike LQR control, DQN keeps the angle close to zero from the very beginning.


  1. Advantage/Disadvantage of LQR
  2. Advantage/Disadvantage of DQN

Comparison of Result

04


1. Advantage/Disadvantage of LQR

Advantage

  • For a simple system, the optimal control input can be obtained by adjusting the gain.

  • Through the choice of Q and R, it is possible to decide whether the input or the state should be weighted more heavily.

  • Unlike PID control, a conventional output-feedback controller, LQR is a state-feedback controller, so its gain is obtained directly from the system matrices (A, B) rather than by trial and error.

Disadvantage

  • The computation becomes difficult when the dynamics are complex.

  • The difficulty of designing the controller grows with the state dimension. For example, in a 3D system the state can include the x, y, z positions, the x, y, z linear velocities, the roll, pitch, and yaw angles, and their angular velocities, which makes the computation very complicated. In such cases it is hard to solve the ARE, and applying LQR is not easy.


2. Advantage/Disadvantage of DQN

Advantage

  • Unlike LQR, DQN is driven purely by learning, so there is no need to model the surrounding environment; the problem can be solved without any understanding of the dynamics or kinematics.

  • As the earlier simulation results show, once training is complete, DQN reaches the target value almost from the beginning of an episode.

Disadvantage

  • In the cartpole example this was not a problem because little data was used, but in general DQN relies on a replay memory, which requires a large memory space and learns from old data.

  • As the simulation results show, because it relies solely on learning without considering the dynamics, the angle keeps oscillating slightly around zero.


  1. Implementation for Double Pendulum on Cart

  • Using DQN, control complicated systems whose dynamics cannot be solved analytically.

Future Plans

05


THANK YOU