Comparison between LQR and DQN for Cartpole
UNIST RML-Seok Ju Lee
CONTENTS
01  Dynamics of Cartpole
02  Control Cartpole using LQR
03  Control Cartpole using DQN
04  Comparison of Results
05  Future Plans
01  Dynamics of Cartpole
1. System of Cartpole
[Figure: diagram of the cart-pole system (cart and pole); friction is neglected]
2. Dynamics of Cartpole
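The equations of motion on this slide were not recoverable from the export. Assuming the standard frictionless cart-pole model (cart mass $M$, point-mass pole $m$ at distance $l$, input force $F$, angle $\theta$ measured from the upright position), the dynamics take the form:

```latex
\begin{aligned}
(M+m)\ddot{x} + m l \ddot{\theta}\cos\theta - m l \dot{\theta}^2 \sin\theta &= F \\
m l \ddot{x}\cos\theta + m l^2 \ddot{\theta} - m g l \sin\theta &= 0
\end{aligned}
```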
3. State Space for Cartpole
We linearized the nonlinear cart-pole system so that it can be controlled with LQR.
The linearization is taken about the upright equilibrium of the pendulum.
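The state-space matrices on the slide were not recoverable. Linearizing the frictionless point-mass-pole dynamics about the upright equilibrium ($\sin\theta \approx \theta$, $\cos\theta \approx 1$, $\dot{\theta}^2 \approx 0$), with state $\mathbf{x} = [x,\ \dot{x},\ \theta,\ \dot{\theta}]^{T}$ and input $u = F$, gives a model of the form $\dot{\mathbf{x}} = A\mathbf{x} + Bu$ with:

```latex
A = \begin{bmatrix}
0 & 1 & 0 & 0 \\
0 & 0 & -\dfrac{mg}{M} & 0 \\
0 & 0 & 0 & 1 \\
0 & 0 & \dfrac{(M+m)g}{Ml} & 0
\end{bmatrix},
\qquad
B = \begin{bmatrix}
0 \\ \dfrac{1}{M} \\ 0 \\ -\dfrac{1}{Ml}
\end{bmatrix}
```

The positive coefficient on $\theta$ in the last row of $A$ reflects the instability of the upright equilibrium.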
02  Control Cartpole using LQR
1. Introduction to LQR
[Figure: feedback block diagram; the reference r is compared with the output to form the error e, which drives the controller]
The LQR problem is defined by a state-space model and a quadratic cost.
The goal of the LQR controller is to minimize the cost.
The gain K is obtained by iterating the solution of the ARE until K converges.
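The state-space and cost equations on this slide were images that did not survive the export; the standard continuous-time LQR formulation they refer to is:

```latex
\begin{aligned}
\text{State space:}\quad & \dot{x} = Ax + Bu \\
\text{Cost:}\quad & J = \int_{0}^{\infty} \left( x^{T} Q x + u^{T} R u \right) dt \\
\text{ARE:}\quad & A^{T}P + PA - PBR^{-1}B^{T}P + Q = 0 \\
\text{Gain:}\quad & K = R^{-1}B^{T}P, \qquad u = -Kx
\end{aligned}
```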
2. Design the LQR controller for Cartpole
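The controller-design details on this slide were not recoverable, so here is a minimal sketch of the approach described: linearize about the upright equilibrium and iterate the Riccati equation until the gain K converges. The physical parameters, weights Q and R, and time step are assumptions, not values from the slides.

```python
import numpy as np

# Assumed cart-pole parameters (not from the slides): cart mass M,
# pole mass m, pole length l, gravity g.
M, m, l, g = 1.0, 0.1, 0.5, 9.81

# Linearized dynamics about the upright equilibrium,
# state = [x, x_dot, theta, theta_dot], input u = F.
A = np.array([[0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, -m * g / M, 0.0],
              [0.0, 0.0, 0.0, 1.0],
              [0.0, 0.0, (M + m) * g / (M * l), 0.0]])
B = np.array([[0.0], [1.0 / M], [0.0], [-1.0 / (M * l)]])

# Euler discretization (assumption: dt is small enough for this model)
dt = 0.02
Ad = np.eye(4) + A * dt
Bd = B * dt

Q = np.diag([10.0, 1.0, 10.0, 1.0])  # state weights (assumed)
R = np.array([[0.1]])                # input weight (assumed)

# Iterate the Riccati difference equation until P (and hence K) converges
P = Q.copy()
for _ in range(10000):
    K = np.linalg.solve(R + Bd.T @ P @ Bd, Bd.T @ P @ Ad)
    P_next = Q + Ad.T @ P @ (Ad - Bd @ K)
    if np.max(np.abs(P_next - P)) < 1e-10:
        P = P_next
        break
    P = P_next

K = np.linalg.solve(R + Bd.T @ P @ Bd, Bd.T @ P @ Ad)
# The converged gain stabilizes the discretized system: all closed-loop
# eigenvalues of Ad - Bd @ K lie inside the unit circle.
eigs = np.linalg.eigvals(Ad - Bd @ K)
```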
3. Simulation Result of Cartpole using LQR
[Figure: simulation of the cart-pole with LQR control vs. with no input]
03  Control Cartpole using DQN
1. Introduction to DQN
[Figure: DQN network; the state is fed into a deep Q network, which outputs one Q-value per action (Q-value Action 1 ... Q-value Action N)]
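The network diagram above can be sketched as a small forward pass: a state vector goes in, one Q-value per discrete action comes out, and the greedy action is the argmax. The layer sizes and weights here are hypothetical, chosen only to illustrate the state-to-Q-values mapping.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer sizes: 4-dim cart-pole state -> 64 hidden units -> 2 actions
W1 = rng.normal(0.0, 0.1, (64, 4)); b1 = np.zeros(64)
W2 = rng.normal(0.0, 0.1, (2, 64)); b2 = np.zeros(2)

def q_values(state):
    """Forward pass: state in, one Q-value per discrete action out."""
    h = np.maximum(0.0, W1 @ state + b1)  # ReLU hidden layer
    return W2 @ h + b2

state = np.array([0.0, 0.0, 0.05, 0.0])  # [x, x_dot, theta, theta_dot]
q = q_values(state)
action = int(np.argmax(q))  # greedy action selection
```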
Reference: Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., and Riedmiller, M. (2013). Playing Atari with Deep Reinforcement Learning. arXiv:1312.5602 [cs.LG]. DeepMind Technologies.
Pseudo Code of DQN
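The pseudocode image on this slide did not survive the export. In outline, the algorithm from the referenced paper is:

```
Initialize replay buffer D with capacity N
Initialize Q-network with random weights θ
for episode = 1 .. M:
    observe initial state s
    for t = 1 .. T:
        with probability ε select a random action a,
            otherwise a = argmax_a Q(s, a; θ)
        execute a, observe reward r and next state s'
        store transition (s, a, r, s') in D
        sample a random minibatch of transitions (s_j, a_j, r_j, s'_j) from D
        y_j = r_j                               if s'_j is terminal
        y_j = r_j + γ max_a' Q(s'_j, a'; θ)     otherwise
        take a gradient step on (y_j - Q(s_j, a_j; θ))^2 with respect to θ
```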
2. Design the DQN for Cartpole
The implementation is organized into four modules: Replaybuffer, Dqn_learn, Dqn_main, and Dqn_play.
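The internals of the Replaybuffer module are not shown on the slides; a minimal sketch of what such a module typically contains (structure assumed, not the author's code) is:

```python
import random
from collections import deque

class ReplayBuffer:
    """Minimal sketch of a DQN replay buffer (structure assumed)."""

    def __init__(self, capacity):
        # deque with maxlen drops the oldest experience when the buffer is full
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random minibatch, as in standard DQN
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)

buf = ReplayBuffer(capacity=1000)
for i in range(50):
    buf.push(i, 0, 1.0, i + 1, False)
batch = buf.sample(8)
```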
Hyperparameters of the DQN:
- Discount rate (γ): confidence in future rewards
- Batch size: the number of samples drawn from the buffer per update
- Buffer size: the capacity of the replay buffer that stores experience
- Learning rate: how strongly the current experience updates the network
- τ: the weight used when updating the target network
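The slides name these hyperparameters but their values were not recoverable. A configuration sketch with commonly used CartPole DQN values (the numbers are assumptions, not the author's settings):

```python
# Hypothetical hyperparameter values; the slides list the parameters
# but the actual numbers were not recoverable from the export.
config = {
    "gamma": 0.99,           # discount rate: confidence in future rewards
    "batch_size": 64,        # samples drawn from the replay buffer per update
    "buffer_size": 100_000,  # replay buffer capacity
    "lr": 1e-3,              # learning rate
    "tau": 0.005,            # target-network update weight
}
```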
3. Simulation Result of Cartpole using DQN
[Figure: cart-pole behavior before learning vs. after learning (episode 500)]
04  Comparison of Results
1. Advantages/Disadvantages of LQR
Advantages
Disadvantages
2. Advantages/Disadvantages of DQN
Advantages
Disadvantages
05  Future Plans
THANK YOU