1 of 35

Lecture 2: Introduction to deep reinforcement learning

Instructor: Ercan Atam

Institute for Data Science & Artificial Intelligence

Course: DSAI 642 - Advanced Reinforcement Learning

2 of 35

List of contents for this lecture

Recap of value functions and policy

Classification of deep RL algorithms

Deep learning for RL

Deep RL versus supervised learning

3 of 35

Relevant readings for this lecture

Chapter 1 of Laura Graesser and Wah Loon Keng, “Foundations of Deep Reinforcement Learning: Theory and Practice in Python”, Addison-Wesley Professional, 2019.

4 of 35

Value functions

5 of 35

What do we try to learn in RL?

6 of 35

What is deep reinforcement learning?

7 of 35

Families of deep reinforcement learning algorithms

Correspondingly, there are three major families of deep reinforcement learning algorithms:

policy-based,
value-based,
model-based,

which learn policies, value functions, and models, respectively.

Note: There are also “combined methods” in which agents learn more than one of these functions, for instance a policy and a value function, or a value function and a model.

8 of 35

Deep reinforcement learning algorithm examples

The figure on this slide gives an overview of the major deep reinforcement learning algorithms in each family and how they are related. Its groupings are:

Model-based:
-iLQR: Iterative Linear Quadratic Regulator
-MPC: Model Predictive Control
-MCTS: Monte Carlo Tree Search

Value-based:
-Deep SARSA
-DQN: Deep Q-Networks
-Double DQN
-DQN + Prioritized Experience Replay

Policy-based:
-REINFORCE

Combined methods (Value and Policy):
-Actor-Critic: A2C, A3C, GAE
-TRPO: Trust Region Policy Optimization
-PPO: Proximal Policy Optimization
-SAC: Soft Actor-Critic

Combined methods (Model + Value and/or Policy):
-Dyna-Q/Dyna-AC
-AlphaZero
-I2A: Imagination-Augmented Agents
-VPN: Value Prediction Networks

9 of 35

Classification of deep reinforcement learning algorithms

10 of 35

Value-based algorithms

-Deep SARSA
-DQN: Deep Q-Networks
-Double DQN
-DQN + Prioritized Experience Replay

11 of 35

Policy-based algorithms

-REINFORCE

12 of 35

Value-based versus policy-based algorithms

13 of 35

Model-based algorithms (1)

-iLQR: Iterative Linear Quadratic Regulator
-MPC: Model Predictive Control
-MCTS: Monte Carlo Tree Search

14 of 35

Model-based algorithms (2)

15 of 35

Model-based algorithms (3)

16 of 35

Hybrid (combined) algorithms (1)

Combined methods (Value and Policy):
-Actor-Critic: A2C, A3C, GAE
-TRPO: Trust Region Policy Optimization
-PPO: Proximal Policy Optimization
-SAC: Soft Actor-Critic

Combined methods (Model + Value and/or Policy):
-Dyna-Q/Dyna-AC
-AlphaZero
-I2A: Imagination-Augmented Agents
-VPN: Value Prediction Networks

17 of 35

Hybrid (combined) algorithms (2)

18 of 35

On-policy versus off-policy

A final important distinction between deep reinforcement learning algorithms is whether they are on-policy or off-policy. This affects how training iterations make use of data.
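The difference in data usage can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `collect` is a hypothetical stand-in for environment interaction that only tags each transition with the version of the policy that produced it, and no actual learning takes place. The on-policy loop trains only on data from the current policy and then discards it; the off-policy loop keeps transitions in a replay memory and samples from data generated by older policies as well.

```python
import random
from collections import deque


def collect(policy_version, n_steps):
    """Hypothetical stand-in for environment interaction.

    Each transition only records which policy version generated it.
    """
    return [{"policy_version": policy_version, "step": t} for t in range(n_steps)]


def train_on_policy(n_iterations, n_steps):
    """Each iteration uses only data gathered by the current policy."""
    used = []
    for version in range(n_iterations):
        batch = collect(version, n_steps)  # fresh data from the current policy
        used.append({t["policy_version"] for t in batch})
        # A policy update would happen here; afterwards the batch is discarded.
    return used


def train_off_policy(n_iterations, n_steps, capacity, batch_size, seed=0):
    """Each iteration samples from a memory that also holds older data."""
    rng = random.Random(seed)
    replay = deque(maxlen=capacity)  # oldest transitions are evicted first
    used = []
    for version in range(n_iterations):
        replay.extend(collect(version, n_steps))
        batch = rng.sample(list(replay), min(batch_size, len(replay)))
        used.append({t["policy_version"] for t in batch})
        # An update would happen here; the data stays available for reuse.
    return used


on_policy_usage = train_on_policy(n_iterations=3, n_steps=4)
off_policy_usage = train_off_policy(n_iterations=3, n_steps=4, capacity=100, batch_size=8)
```

In this sketch, every on-policy batch contains a single policy version, while later off-policy batches mix versions, which is the sense in which off-policy methods can reuse data.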

19 of 35

On-policy deep reinforcement learning algorithms

20 of 35

Off-policy deep reinforcement learning algorithms

21 of 35

Deep learning for reinforcement learning

Artificial neural networks were first combined with reinforcement learning to great effect in 1991, when Gerald Tesauro trained a neural network using reinforcement learning to play master-level backgammon.

However, it was not until 2015, when DeepMind achieved human-level performance on many Atari games, that neural networks became widely adopted in this field as the underlying function approximation technique.

Since then, all of the major breakthroughs in reinforcement learning have used artificial neural networks to approximate functions.

22 of 35

Recap: ANNs and their training procedure (1)

23 of 35

Recap: ANNs and their training procedure (2)

24 of 35

How is deep learning used in reinforcement learning? (1)

A) Policy parametrization

[Figure: an ANN used to parametrize the policy]
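As a minimal sketch of policy parametrization, assuming a discrete action space and a small fully connected network (the layer sizes, the tanh activation, the example state, and the helper names `init_mlp` and `policy_probs` are all illustrative choices, not taken from the slides), the network maps a state to a probability distribution over actions, from which an action is sampled:

```python
import numpy as np


def init_mlp(sizes, rng):
    """Return a list of (weights, bias) pairs for a fully connected network."""
    return [
        (rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]


def policy_probs(params, state):
    """Forward pass: state -> action probabilities of the parametrized policy."""
    h = state
    for W, b in params[:-1]:
        h = np.tanh(h @ W + b)  # hidden layers
    W, b = params[-1]
    logits = h @ W + b
    logits = logits - logits.max()  # shift for numerical stability
    exp = np.exp(logits)
    return exp / exp.sum()  # softmax over actions


rng = np.random.default_rng(0)
params = init_mlp([4, 16, 2], rng)  # assumed: 4 state features, 2 actions
state = np.array([0.1, -0.2, 0.05, 0.0])  # made-up example state
probs = policy_probs(params, state)
action = rng.choice(len(probs), p=probs)  # sample an action from the policy
```

The network weights play the role of the policy parameters; training would adjust them, which this sketch does not attempt.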

25 of 35

How is deep learning used in reinforcement learning? (2)

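B) Action-value parametrization

[Figure: two alternative ANN architectures for the action-value function, joined by "OR"; the second is marked "Less efficient. Why?"]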
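Two common ways to parametrize an action-value function with a network can be sketched as follows. This is an illustration under assumptions, not a reconstruction of the slide's figure: the sizes, the tanh activation, the one-hot action encoding, and the function names are all hypothetical. The first network takes a state and returns one Q-value per action; the second takes a state-action pair and returns a single Q-value.

```python
import numpy as np

N_STATE, N_ACTIONS, HIDDEN = 4, 3, 16  # assumed sizes for illustration
rng = np.random.default_rng(0)


def layer(n_in, n_out):
    """Randomly initialized weights and zero bias for one dense layer."""
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)


# Architecture 1: state in, one Q-value per action out.
W1, b1 = layer(N_STATE, HIDDEN)
W2, b2 = layer(HIDDEN, N_ACTIONS)


def q_all_actions(state):
    return np.tanh(state @ W1 + b1) @ W2 + b2  # shape (N_ACTIONS,)


# Architecture 2: (state, one-hot action) in, a single Q-value out.
V1, c1 = layer(N_STATE + N_ACTIONS, HIDDEN)
V2, c2 = layer(HIDDEN, 1)


def q_single(state, action):
    x = np.concatenate([state, np.eye(N_ACTIONS)[action]])
    return (np.tanh(x @ V1 + c1) @ V2 + c2)[0]  # scalar


state = np.array([0.1, -0.2, 0.05, 0.0])  # made-up example state

# Greedy action with architecture 1: a single forward pass.
greedy_1 = int(np.argmax(q_all_actions(state)))

# Greedy action with architecture 2: one forward pass per action.
q_values_2 = [q_single(state, a) for a in range(N_ACTIONS)]
greedy_2 = int(np.argmax(q_values_2))
```

Under these assumptions, selecting a greedy action needs one forward pass with the first architecture and `N_ACTIONS` passes with the second, which is one plausible reading of the efficiency question posed on the slide.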

26 of 35

Challenges of using ANN-based function approximation in RL (1)

27 of 35

Challenges of using ANN-based function approximation in RL (2)

28 of 35

Challenges of using ANN-based function approximation in RL (3)

29 of 35

Challenges of using ANN-based function approximation in RL (4)

30 of 35

Deep reinforcement learning versus supervised learning

31 of 35

DRL versus SL: lack of an oracle (1)

32 of 35

DRL versus SL: lack of an oracle (2)

33 of 35

DRL versus SL: sparsity of feedback

34 of 35

DRL versus SL: data generation

35 of 35

References (utilized for preparation of lecture notes or Matlab code)

Laura Graesser and Wah Loon Keng, “Foundations of Deep Reinforcement Learning: Theory and Practice in Python”, Addison-Wesley Professional, 2019.
Richard S. Sutton and Andrew G. Barto, “Reinforcement Learning: An Introduction”, Second Edition, MIT Press, Cambridge, MA, 2018.
