Lecture 14: Continuous Q-Learning
Sookyung Kim
Tentative Schedule
Online Video Lecture
RECAP: Q-learning
Transitions, and a′ doesn't have to be sampled from the current policy: off-policy RL
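A minimal tabular sketch of this off-policy update (the transition may come from any behavior policy; `alpha` and `gamma` are assumed hyperparameters, not values from the slides):

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One off-policy Q-learning step: the bootstrap uses max_a' Q(s', a'),
    so the transition (s, a, r, s') need not come from the current policy."""
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])
    return Q
```

Because the max in the target plays the role of the next action, the same update works on transitions gathered by any exploration policy.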
Recap: Problems of the online Q-learning algorithm
(1) Correlated Samples
(2) Target changes as Q changes
Recap: Correlated samples in online Q-learning - Experience replay buffer
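A minimal sketch of such a buffer (fixed capacity with uniform sampling is the standard design; all names here are illustrative):

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size buffer of transitions; sampling minibatches uniformly
    at random breaks the temporal correlation between consecutive steps."""
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest transitions drop out

    def add(self, s, a, r, s_next, done):
        self.buffer.append((s, a, r, s_next, done))

    def sample(self, batch_size):
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)
```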
Recap - Target Changes as Q Changes: Target Network
Target (Bellman)
Prediction
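One common way to keep the TD target quasi-stationary is to hold a slowly updated copy of the parameters; a Polyak-averaging sketch (the hard copy-every-N-steps variant is equally common; `tau` is an assumed hyperparameter):

```python
def polyak_update(target_params, online_params, tau=0.005):
    """theta_target <- tau * theta_online + (1 - tau) * theta_target.
    The slow copy is used only when computing TD targets, so the
    regression target no longer shifts at every gradient step."""
    return [tau * w + (1.0 - tau) * wt
            for wt, w in zip(target_params, online_params)]
```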
Q-Learning with Continuous Actions
What's the problem with continuous actions?
Optimization after discretization of the continuous action space
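The simplest workaround is to evaluate Q on a finite grid (or random samples) of candidate actions and take the best; a 1-D sketch (grid bounds and sample count are assumptions), which scales poorly as the action dimension grows:

```python
import numpy as np

def max_q_discretized(q_fn, low=-1.0, high=1.0, num_samples=101):
    """Approximate max_a Q(s, a) over a continuous interval by exhaustive
    evaluation on an evenly spaced grid of candidate actions."""
    actions = np.linspace(low, high, num_samples)
    values = q_fn(actions)
    best = int(np.argmax(values))
    return actions[best], values[best]
```

In d action dimensions a grid of this kind needs `num_samples**d` evaluations, which is why purely discretization-based maximization breaks down quickly.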
Use a function class that is easy to optimize
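One such class (as in NAF, Normalized Advantage Functions) makes Q quadratic in the action, so the maximizer is available in closed form; a sketch with assumed shapes:

```python
import numpy as np

def quadratic_q(a, mu, P, v):
    """Q(s, a) = -0.5 (a - mu)^T P (a - mu) + V(s), P positive definite.
    Then argmax_a Q(s, a) = mu and max_a Q(s, a) = V(s), in closed form,
    so no inner optimization over actions is ever needed."""
    d = a - mu
    return -0.5 * d @ P @ d + v
```

Here `mu`, `P`, and `v` would each be outputs of a network evaluated at the state s; the price of the closed-form max is that Q is restricted to be unimodal in a.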
Learn an approximate maximizer
e.g., NFQCA, TD3, SAC
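The common idea in this family is to train a second network mu_theta(s) to approximate argmax_a Q(s, a), updating it by the chain rule dQ/da · dmu/dtheta; a toy 1-D sketch with an assumed quadratic critic:

```python
def actor_step(theta, s, a_star, lr=0.1):
    """One deterministic-policy-gradient step on a toy problem.
    Actor: mu_theta(s) = theta * s.
    Critic (assumed for illustration): Q(s, a) = -(a - a_star)**2.
    Chain rule: dQ/dtheta = (dQ/da) * (dmu/dtheta) = -2 * (mu - a_star) * s."""
    a = theta * s
    dq_da = -2.0 * (a - a_star)   # critic gradient w.r.t. the action
    dmu_dtheta = s                # actor-output gradient w.r.t. theta
    return theta + lr * dq_da * dmu_dtheta
```

Repeated steps push mu_theta(s) toward the critic's maximizing action, which is exactly the role the actor plays in DDPG-style methods.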
Summary of DDPG
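Putting the pieces together, DDPG's TD target uses the target actor to supply the (approximate) maximizing action and the target critic to score it; a sketch with assumed callables:

```python
def ddpg_target(r, s_next, done, mu_target, q_target, gamma=0.99):
    """DDPG TD target: y = r + gamma * (1 - done) * Q_target(s', mu_target(s')).
    The target actor replaces the max over a continuous action set, and the
    slowly updated target networks keep y quasi-stationary during training."""
    a_next = mu_target(s_next)
    return r + gamma * (1.0 - done) * q_target(s_next, a_next)
```

The critic then regresses Q(s, a) toward y on minibatches drawn from the replay buffer, while the actor is updated by the deterministic policy gradient shown earlier.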