Lecture 7: Deep Deterministic Policy Gradient (DDPG) Method

Instructor: Ercan Atam

Institute for Data Science & Artificial Intelligence

Course: DSAI 642 - Advanced Reinforcement Learning

List of contents for this lecture

  • The need for the DDPG method

  • The intuition behind the DDPG method

  • The math behind the DDPG method

  • Pseudocode of the DDPG method

  • Advantages/disadvantages of the DDPG method

Relevant readings/videos for this lecture

(Some slides are modified/improved versions from here)

(very good and detailed lecture on DDPG!)

  • Chapter 12 of Miguel Morales, “Grokking Deep Reinforcement Learning”, Manning, 2020

What is the DDPG method?

 

The intuition behind the DDPG method

Generalizing DQN to continuous actions (1)

Generalizing DQN to continuous actions (2)

From DQN to DDPG

The Q-Learning side of the DDPG method (1)

The Q-Learning side of the DDPG method (2)
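
As a minimal sketch of the Q-learning side, the snippet below computes the usual DDPG critic target y = r + gamma * (1 - done) * Q_target(s', mu_target(s')) and the soft (Polyak) update of target-network weights. The function names, the plain-list representation of weights, and the values of gamma and tau are illustrative assumptions, not the lecture's notation or code.

```python
def td_target(reward, done, next_q, gamma=0.99):
    """Critic regression target.

    next_q is assumed to be Q_target(s', mu_target(s')), i.e. the target
    critic evaluated at the action chosen by the target actor.
    """
    return reward + gamma * (1.0 - float(done)) * next_q


def soft_update(target_params, online_params, tau=0.005):
    """Polyak averaging: target <- tau * online + (1 - tau) * target."""
    return [tau * w + (1.0 - tau) * w_t
            for w, w_t in zip(online_params, target_params)]


# Hypothetical numbers, only to exercise the two functions.
y = td_target(reward=1.0, done=False, next_q=2.0, gamma=0.9)
new_target = soft_update([0.0, 1.0], [1.0, 1.0], tau=0.1)
```

With a small tau the target networks trail the online networks slowly, which keeps the regression target from shifting too quickly between updates.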

The policy learning side of DDPG (1)

The policy learning side of DDPG (2)

The policy learning side of DDPG (3)
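
The policy side can be illustrated with a toy sketch of the deterministic policy gradient, where the actor parameter is moved along dQ/da * dmu/dtheta (chain rule through the critic). Everything here is an assumption made for illustration: a known quadratic critic Q(s, a) = -(a - s)^2 stands in for a learned one, and the actor is the one-parameter linear policy mu_theta(s) = theta * s.

```python
def dq_da(s, a):
    """Gradient of the toy critic Q(s, a) = -(a - s)**2 with respect to a."""
    return -2.0 * (a - s)


def actor_step(theta, states, lr=0.1):
    """One gradient-ascent step on the mean of Q(s, mu_theta(s)).

    For mu_theta(s) = theta * s we have dmu/dtheta = s, so the
    per-sample gradient is dQ/da evaluated at a = mu_theta(s), times s.
    """
    grad = sum(dq_da(s, theta * s) * s for s in states) / len(states)
    return theta + lr * grad


theta = 0.0
states = [0.5, 1.0, -1.0, 2.0]  # hypothetical minibatch of states
for _ in range(200):
    theta = actor_step(theta, states)
```

In this toy problem the best action is a = s, so theta approaches 1; in DDPG the same chain-rule step is applied with neural networks for both the actor and the critic.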

Exploration-Exploitation in the DDPG method
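
Because the DDPG policy is deterministic, exploration is typically obtained by adding noise to the actor's output during training. The sketch below uses zero-mean Gaussian noise followed by clipping to the action bounds; this is a simple, commonly used substitute for the Ornstein-Uhlenbeck noise of the original DDPG paper, and may differ from the choice made in the lecture. The function name and the parameters sigma, low and high are hypothetical.

```python
import random


def explore(action, sigma, low, high, rng=random):
    """Return the deterministic action perturbed by Gaussian noise.

    The result is clipped to [low, high] so that it remains a valid
    action for an environment with bounded continuous actions.
    """
    noisy = action + rng.gauss(0.0, sigma)
    return max(low, min(high, noisy))


rng = random.Random(0)  # seeded generator for reproducibility
samples = [explore(0.9, sigma=0.2, low=-1.0, high=1.0, rng=rng)
           for _ in range(1000)]
greedy = explore(0.9, sigma=0.0, low=-1.0, high=1.0, rng=rng)
```

Setting sigma to zero recovers the pure exploitation action, which is what would be used at evaluation time.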

DDPG algorithm

DDPG algorithm explained visually

Advantages (+s) and disadvantages (-s) of the DDPG method

Summary

References (utilized for preparation of lecture notes or MATLAB code)
