1 of 35

Lecture 2: Introduction to deep reinforcement learning

Instructor: Ercan Atam

Institute for Data Science & Artificial Intelligence

Course: DSAI 642 - Advanced Reinforcement Learning

2 of 35

List of contents for this lecture

  • Recap for value functions and policy

  • Classification of deep RL algorithms

  • Deep learning for RL

  • Deep RL versus supervised learning

3 of 35

Relevant readings for this lecture

  • Chapter 1 of Laura Graesser and Wah Loon Keng, “Foundations of Deep Reinforcement Learning: Theory and Practice in Python”, Addison-Wesley Professional, 2019.

4 of 35

Value functions

5 of 35

What do we try to learn in RL?

6 of 35

What is deep reinforcement learning?

7 of 35

Families of deep reinforcement learning algorithms

Correspondingly, there are three major families of deep reinforcement learning algorithms:

    • policy-based,
    • value-based,
    • model-based,

which learn policies, value functions, and models, respectively.

Note: There are also “combined methods” in which agents learn more than one of these functions, for instance a policy and a value function, or a value function and a model.
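To make the three families concrete, here is a minimal sketch of what each kind of agent learns and how it would pick an action. The toy problem (two states, two actions) and all tables and function names are hypothetical, chosen only for illustration; in deep RL each table would be replaced by a neural network.

```python
import random

ACTIONS = [0, 1]  # hypothetical toy problem: states {0, 1}, actions {0, 1}

# Policy-based: the agent learns a policy pi(a | s) directly.
policy = {0: [0.9, 0.1], 1: [0.2, 0.8]}  # action probabilities per state

def act_with_policy(state, rng):
    """Sample an action from the learned action distribution."""
    return rng.choices(ACTIONS, weights=policy[state], k=1)[0]

# Value-based: the agent learns Q(s, a) and derives its behaviour from it.
q_values = {0: [1.0, 0.5], 1: [0.0, 2.0]}

def act_with_q(state):
    """Act greedily with respect to the learned action values."""
    return max(ACTIONS, key=lambda a: q_values[state][a])

# Model-based: the agent learns (or is given) a model of the environment
# and plans with it. Here: deterministic next state and a reward on arrival.
model = {(0, 0): 0, (0, 1): 1, (1, 0): 0, (1, 1): 1}
reward_on_arrival = {0: 0.0, 1: 1.0}

def act_with_model(state):
    """One-step lookahead: pick the action whose predicted outcome is best."""
    return max(ACTIONS, key=lambda a: reward_on_arrival[model[(state, a)]])
```

A combined method would simply hold more than one of these objects at once, for example both `policy` and `q_values`.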

8 of 35

Deep reinforcement learning algorithm examples

The figure below gives an overview of the major deep reinforcement learning algorithms in each family and how they are related.

[Figure: families of deep reinforcement learning algorithms]

  • Model-based
    • iLQR: Iterative Linear Quadratic Regulator
    • MPC: Model Predictive Control
    • MCTS: Monte Carlo Tree Search

  • Value-based
    • Deep SARSA
    • DQN: Deep Q-Networks
    • Double DQN
    • DQN + Prioritized Experience Replay

  • Policy-based
    • REINFORCE

  • Combined methods: Value and Policy
    • Actor-Critic: A2C, A3C, GAE
    • TRPO: Trust Region Policy Optimization
    • PPO: Proximal Policy Optimization
    • SAC: Soft Actor-Critic

  • Combined methods: Model + Value and/or Policy
    • Dyna-Q / Dyna-AC
    • AlphaZero
    • I2A: Imagination-Augmented Agents
    • VPN: Value Prediction Networks

9 of 35

Classification of deep reinforcement learning algorithms

10 of 35

Value-based algorithms

  • Deep SARSA
  • DQN: Deep Q-Networks
  • Double DQN
  • DQN + Prioritized Experience Replay

11 of 35

Policy-based algorithms

  • REINFORCE

12 of 35

Value-based versus policy-based algorithms

13 of 35

Model-based algorithms (1)

  • iLQR: Iterative Linear Quadratic Regulator
  • MPC: Model Predictive Control
  • MCTS: Monte Carlo Tree Search

14 of 35

Model-based algorithms (2)

15 of 35

Model-based algorithms (3)

16 of 35

Hybrid (combined) algorithms (1)

  • Combined methods: Value and Policy
    • Actor-Critic: A2C, A3C, GAE
    • TRPO: Trust Region Policy Optimization
    • PPO: Proximal Policy Optimization
    • SAC: Soft Actor-Critic

  • Combined methods: Model + Value and/or Policy
    • Dyna-Q / Dyna-AC
    • AlphaZero
    • I2A: Imagination-Augmented Agents
    • VPN: Value Prediction Networks

17 of 35

Hybrid (combined) algorithms (2)

18 of 35

On-policy versus Off-policy

A final important distinction between deep reinforcement learning algorithms is whether they are on-policy or off-policy. This affects how training iterations make use of data.
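The difference in data usage can be sketched as follows. This is a simplified illustration, not a full training loop: `collect` is a hypothetical stand-in for environment interaction, and the actual policy update is omitted. The sketch only records which policy versions generated the data used in each training iteration.

```python
import random
from collections import deque

def collect(policy_version, n):
    """Hypothetical stand-in for running the current policy in the environment:
    returns n transitions tagged with the policy version that produced them."""
    return [{"policy_version": policy_version, "step": i} for i in range(n)]

def on_policy_training(iterations, batch_size):
    """On-policy: each update uses only data from the current policy,
    and that data is discarded afterwards."""
    used = []
    for version in range(iterations):
        data = collect(version, batch_size)          # fresh data every iteration
        used.append([t["policy_version"] for t in data])
        # ... update the policy with `data` here, then throw `data` away ...
    return used

def off_policy_training(iterations, batch_size, capacity, rng):
    """Off-policy: data is stored in a replay memory and may be reused,
    even though it was generated by older versions of the policy."""
    replay = deque(maxlen=capacity)
    used = []
    for version in range(iterations):
        replay.extend(collect(version, batch_size))  # keep old data around
        minibatch = rng.sample(list(replay), batch_size)
        used.append(sorted({t["policy_version"] for t in minibatch}))
        # ... update the value function / policy with `minibatch` here ...
    return used
```

For example, `on_policy_training(3, 4)` uses data from version 0 only in iteration 0, version 1 only in iteration 1, and so on, whereas the minibatches in `off_policy_training` may mix data from any earlier version.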

19 of 35

On-policy deep reinforcement learning algorithms

20 of 35

Off-policy deep reinforcement learning algorithms

21 of 35

Deep learning for reinforcement learning

  • Artificial neural networks were first combined with reinforcement learning to great effect in 1991, when Gerald Tesauro trained a neural network using reinforcement learning to play master-level backgammon.
  • However, it was not until 2015, when DeepMind achieved human-level performance on many Atari games, that neural networks became widely adopted in this field as the underlying function approximation technique.
  • Since then, the major breakthroughs in reinforcement learning have used artificial neural networks to approximate functions.

22 of 35

Recap: ANNs and their training procedure (1)

23 of 35

Recap: ANNs and their training procedure (2)

24 of 35

How is deep learning used in reinforcement learning? (1)

A-) Policy parametrization

[Figure: two ANN block diagrams]
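A minimal sketch of policy parametrization, assuming a discrete action space: a small ANN maps the state to a probability for each action, and the agent samples from that distribution. The layer sizes, the tanh hidden layer, the random (untrained) weights, and all names are assumptions made for illustration. Only the standard library is used, so the arithmetic is written out by hand.

```python
import math
import random

def init_layer(n_out, n_in, rng):
    """Random weights and zero biases for one fully connected layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def linear(layer, x):
    """Affine map W x + b."""
    weights, bias = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

def softmax(logits):
    """Numerically stable softmax: turns logits into probabilities."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def policy_network(params, state):
    """pi_theta(. | s): state in, one probability per action out."""
    hidden = [math.tanh(h) for h in linear(params[0], state)]
    return softmax(linear(params[1], hidden))

rng = random.Random(0)
params = [init_layer(8, 4, rng), init_layer(2, 8, rng)]  # 4-dim state, 2 actions

state = [0.1, -0.2, 0.05, 0.0]
probs = policy_network(params, state)
action = rng.choices(range(len(probs)), weights=probs, k=1)[0]  # sample a ~ pi_theta(. | s)
```

Training would then adjust `params` so that actions leading to higher return become more probable. For continuous actions, a common alternative is to let the network output the parameters of a distribution (for example, a mean and a standard deviation) instead of one probability per action.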

25 of 35

How is deep learning used in reinforcement learning? (2)

B-) Action-value parametrization

OR

(Less efficient. Why?)
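One possible way to explore the question above, assuming the two options are (1) a network that takes the state and outputs one Q-value per action, and (2) a network that takes a state-action pair and outputs a single Q-value. The sketch counts how many forward passes greedy action selection needs under each option. Network sizes, the one-hot action encoding, the untrained random weights, and all names are assumptions for illustration.

```python
import math
import random

STATE_DIM, N_ACTIONS, HIDDEN = 4, 3, 8
forward_passes = {"count": 0}  # bookkeeping only, to compare the two options

def init_layer(n_out, n_in, rng):
    """Random weights and zero biases for one fully connected layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def linear(layer, x):
    weights, bias = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

def mlp(params, x):
    """One forward pass through a two-layer network with a tanh hidden layer."""
    forward_passes["count"] += 1
    hidden = [math.tanh(h) for h in linear(params[0], x)]
    return linear(params[1], hidden)

rng = random.Random(0)
# Option 1: state in, Q(s, a) for every action out.
net_all = [init_layer(HIDDEN, STATE_DIM, rng), init_layer(N_ACTIONS, HIDDEN, rng)]
# Option 2: (state, one-hot action) in, a single Q(s, a) out.
net_single = [init_layer(HIDDEN, STATE_DIM + N_ACTIONS, rng), init_layer(1, HIDDEN, rng)]

def greedy_from_all_actions_net(state):
    q = mlp(net_all, state)                      # a single forward pass
    return max(range(N_ACTIONS), key=lambda a: q[a])

def greedy_from_single_value_net(state):
    def q(action):
        one_hot = [1.0 if i == action else 0.0 for i in range(N_ACTIONS)]
        return mlp(net_single, state + one_hot)[0]
    return max(range(N_ACTIONS), key=q)          # one forward pass per action

state = [0.1, -0.2, 0.05, 0.0]
forward_passes["count"] = 0
a1 = greedy_from_all_actions_net(state)
passes_option_1 = forward_passes["count"]
forward_passes["count"] = 0
a2 = greedy_from_single_value_net(state)
passes_option_2 = forward_passes["count"]
```

Under these assumptions, option 1 needs one forward pass to find the greedy action, while option 2 needs one pass per action, so its cost grows with the number of actions.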

26 of 35

Challenges of using ANN-based function approximation in RL (1)

27 of 35

Challenges of using ANN-based function approximation in RL (2)

28 of 35

Challenges of using ANN-based function approximation in RL (3)

29 of 35

Challenges of using ANN-based function approximation in RL (4)

30 of 35

Deep reinforcement learning versus supervised learning

31 of 35

DRL versus SL: lack of an oracle (1)

32 of 35

DRL versus SL: lack of an oracle (2)

33 of 35

DRL versus SL: sparsity of feedback

34 of 35

DRL versus SL: data generation

35 of 35

References (utilized for preparation of lecture notes or Matlab code)

  • Laura Graesser and Wah Loon Keng, “Foundations of Deep Reinforcement Learning: Theory and Practice in Python”, Addison-Wesley Professional, 2019.
  • Richard S. Sutton and Andrew G. Barto, “Reinforcement Learning: An Introduction”, Second Edition, MIT Press, Cambridge, MA, 2018.
