Lecture 2: Introduction to deep reinforcement learning
Instructor: Ercan Atam
Institute for Data Science & Artificial Intelligence
Course: DSAI 642- Advanced Reinforcement Learning
List of contents for this lecture
Relevant readings for this lecture
Theory and Practice in Python”, Addison-Wesley Professional, 2019.
Value functions
What do we try to learn in RL?
What is deep reinforcement learning?
Families of deep reinforcement learning algorithms
Correspondingly, there are three major families of deep reinforcement learning algorithms: policy-based, value-based, and model-based methods, which learn policies, value functions, and models, respectively.
Note: There are also “combined methods” in which agents learn more than one of these functions, for instance a policy and a value function, or a value function and a model.
Deep reinforcement learning algorithm examples
The figure below gives an overview of the major deep reinforcement learning algorithms in each family and how they are related.
Value-based
-Deep SARSA
-DQN: Deep Q-Networks
-Double DQN
-DQN + Prioritized Experience Replay
Policy-based
-REINFORCE
Model-based
-iLQR: Iterative Linear Quadratic Regulator
-MPC: Model Predictive Control
-MCTS: Monte Carlo Tree Search
Combined methods: value and policy
-Actor-Critic: A2C, A3C, GAE
-TRPO: Trust Region Policy Optimization
-PPO: Proximal Policy Optimization
-SAC: Soft Actor-Critic
Combined methods: model + value and/or policy
-Dyna-Q/Dyna-AC
-AlphaZero
-I2A: Imagination-Augmented Agents
-VPN: Value Prediction Networks
Classification of deep reinforcement learning algorithms
Value-based algorithms
-Deep SARSA
-DQN: Deep Q-Networks
-Double DQN
-DQN + Prioritized Experience Replay
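To make the difference between DQN and Double DQN concrete, here is a minimal sketch of the bootstrapped target each one computes for a single transition. It assumes a discrete action space, a discount factor `gamma`, and next-state Q-value vectors from an online network and a target network; the hard-coded numbers stand in for network outputs, and the function names are hypothetical.

```python
from typing import Sequence


def argmax(values: Sequence[float]) -> int:
    """Index of the largest entry (the first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def dqn_target(reward: float, done: bool, q_next_target: Sequence[float],
               gamma: float = 0.99) -> float:
    """DQN: the target network both selects and evaluates the next action."""
    if done:
        return reward  # no bootstrapping from a terminal state
    return reward + gamma * max(q_next_target)


def double_dqn_target(reward: float, done: bool, q_next_online: Sequence[float],
                      q_next_target: Sequence[float], gamma: float = 0.99) -> float:
    """Double DQN: the online network selects the action, the target network evaluates it."""
    if done:
        return reward
    best_action = argmax(q_next_online)
    return reward + gamma * q_next_target[best_action]


# Hypothetical Q-values for the next state s' (three actions).
q_online = [1.0, 2.5, 0.3]
q_target = [1.2, 0.8, 3.0]
print(dqn_target(1.0, False, q_target, gamma=0.9))                   # 1 + 0.9 * 3.0, about 3.7
print(double_dqn_target(1.0, False, q_online, q_target, gamma=0.9))  # 1 + 0.9 * 0.8, about 1.72
```

When the same estimates are used both to pick and to score the maximizing action, estimation noise tends to push the max upward; decoupling the two roles, as Double DQN does, is meant to reduce this overestimation.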
Policy-based algorithms
-REINFORCE
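A minimal sketch of the REINFORCE idea, assuming a toy policy that ignores the state: pi(a) = softmax(theta)[a] over a few discrete actions. After one episode, each log-probability gradient is weighted by the return-to-go G_t and the parameters move by gradient ascent. The function names are hypothetical, and the gamma**t factor of the textbook form is dropped, a common simplification.

```python
import math
from typing import List


def returns_to_go(rewards: List[float], gamma: float) -> List[float]:
    """G_t = r_t + gamma * G_{t+1}, computed backwards over one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_update(theta: List[float], actions: List[int], rewards: List[float],
                     gamma: float = 0.99, lr: float = 0.1) -> List[float]:
    """One REINFORCE step for a toy state-independent policy pi(a) = softmax(theta)[a].

    For this policy, grad_theta log pi(a) = onehot(a) - softmax(theta).
    """
    probs = softmax(theta)
    grad = [0.0] * len(theta)
    for a, g in zip(actions, returns_to_go(rewards, gamma)):
        for i in range(len(theta)):
            indicator = 1.0 if i == a else 0.0
            grad[i] += g * (indicator - probs[i])  # G_t * grad log pi(a_t)
    return [th + lr * gi for th, gi in zip(theta, grad)]  # gradient ascent


# One hypothetical two-step episode: action 0 earns nothing, action 1 earns reward 1.
theta = reinforce_update([0.0, 0.0], actions=[0, 1], rewards=[0.0, 1.0], gamma=0.5)
print(softmax(theta))  # action 1 now has probability above 0.5
```

In a deep RL setting the logits would come from a policy network evaluated at each visited state, and the same update is usually written as minimizing the loss -sum_t G_t log pi(a_t | s_t) with an automatic-differentiation library.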
Value-based versus policy-based algorithms
Model-based algorithms (1)
-iLQR: Iterative Linear Quadratic Regulator
-MPC: Model Predictive Control
-MCTS: Monte Carlo Tree Search
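As a sketch of what "using a model" means for planning, here is MPC in its simplest random-shooting form: sample candidate action sequences, simulate each with the model, execute only the first action of the cheapest one, then re-plan at the next step. The 1-D dynamics x' = x + u, the quadratic cost, and all names and parameter values are hypothetical choices for illustration; iLQR and MCTS plan with a model in different ways.

```python
import random
from typing import Callable, Optional


def random_shooting_mpc(x0: float,
                        model: Callable[[float, float], float],
                        cost: Callable[[float, float], float],
                        horizon: int = 5,
                        n_candidates: int = 200,
                        u_max: float = 1.0,
                        rng: Optional[random.Random] = None) -> float:
    """Plan with a known model and return only the first action of the best plan."""
    rng = rng or random.Random(0)
    best_cost, best_first_action = float("inf"), 0.0
    for _ in range(n_candidates):
        plan = [rng.uniform(-u_max, u_max) for _ in range(horizon)]
        x, total = x0, 0.0
        for u in plan:              # simulate the candidate plan with the model
            total += cost(x, u)
            x = model(x, u)
        if total < best_cost:
            best_cost, best_first_action = total, plan[0]
    return best_first_action        # receding horizon: execute one step, then re-plan


def model(x: float, u: float) -> float:   # hypothetical known dynamics
    return x + u


def cost(x: float, u: float) -> float:    # hypothetical quadratic cost
    return x ** 2 + 0.1 * u ** 2


x = 3.0
for step in range(5):
    u = random_shooting_mpc(x, model, cost, rng=random.Random(step))
    x = model(x, u)  # with a real system this would be an environment step
print(x)  # typically much closer to 0 than the start state 3.0
```

In model-based deep RL the hand-written `model` would typically be replaced by a learned network, and random shooting by a stronger optimizer.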
Model-based algorithms (2)
Model-based algorithms (3)
Hybrid (combined) algorithms (1)
Value and policy:
-Actor-Critic: A2C, A3C, GAE
-TRPO: Trust Region Policy Optimization
-PPO: Proximal Policy Optimization
-SAC: Soft Actor-Critic
Model + value and/or policy:
-Dyna-Q/Dyna-AC
-AlphaZero
-I2A: Imagination-Augmented Agents
-VPN: Value Prediction Networks
Hybrid (combined) algorithms (2)
On-policy versus Off-policy
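A final important distinction between deep reinforcement learning algorithms is whether they are on-policy or off-policy. This affects how training iterations make use of data: an on-policy algorithm learns only from data generated by the current policy, whereas an off-policy algorithm can also learn from data generated by other (for example, older) policies.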
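A minimal sketch of how this difference shows up in the data structures, assuming transitions stored as (state, action, reward, next_state, done) tuples. The class names are hypothetical: an on-policy memory is emptied after each update, while an off-policy replay buffer keeps transitions from earlier policies and samples them repeatedly.

```python
import random
from collections import deque
from typing import Deque, List, Tuple

Transition = Tuple[int, int, float, int, bool]  # (state, action, reward, next_state, done)


class OnPolicyMemory:
    """Holds only transitions collected by the current policy.

    The data is discarded after one training update, because once the policy
    changes those samples no longer come from the policy being improved.
    """

    def __init__(self) -> None:
        self.transitions: List[Transition] = []

    def add(self, t: Transition) -> None:
        self.transitions.append(t)

    def sample_and_clear(self) -> List[Transition]:
        batch, self.transitions = self.transitions, []
        return batch


class ReplayBuffer:
    """Off-policy memory: transitions from older policies are kept and reused."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        self.buffer: Deque[Transition] = deque(maxlen=capacity)  # oldest entries drop out
        self.rng = random.Random(seed)

    def add(self, t: Transition) -> None:
        self.buffer.append(t)

    def sample(self, batch_size: int) -> List[Transition]:
        return self.rng.sample(list(self.buffer), batch_size)


on_mem, replay = OnPolicyMemory(), ReplayBuffer(capacity=1000)
for step in range(10):
    t = (step, 0, 1.0, step + 1, False)  # hypothetical transition
    on_mem.add(t)
    replay.add(t)
batch = on_mem.sample_and_clear()                  # all 10 transitions, used once
print(len(batch), len(on_mem.transitions))         # 10 0
print(len(replay.sample(4)), len(replay.buffer))   # 4 10
```

Reusing old transitions is what typically makes off-policy methods more sample-efficient, at the cost of learning from data whose action distribution differs from the current policy.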
On-policy deep reinforcement learning algorithms
Off-policy deep reinforcement learning algorithms
Deep learning for reinforcement learning
Recap: ANNs and their training procedure (1)
Recap: ANNs and their training procedure (2)
How is deep learning used in reinforcement learning? (1)
A) Policy parametrization
[Figure: the policy represented by an ANN (two ANN diagrams)]
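Assuming a discrete action space, a policy network maps a state to a probability distribution over actions through a softmax output layer, and the agent samples its action from that distribution. The sketch below is a pure-Python forward pass with random, untrained weights; the class name, layer sizes, and tanh hidden layer are illustrative assumptions.

```python
import math
import random
from typing import List


class SoftmaxPolicyNet:
    """Tiny MLP policy pi_theta(a|s) for a discrete action space (hypothetical sizes)."""

    def __init__(self, state_dim: int, hidden_dim: int, n_actions: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w1 = [[rng.gauss(0.0, 0.5) for _ in range(state_dim)] for _ in range(hidden_dim)]
        self.b1 = [0.0] * hidden_dim
        self.w2 = [[rng.gauss(0.0, 0.5) for _ in range(hidden_dim)] for _ in range(n_actions)]
        self.b2 = [0.0] * n_actions
        self.rng = rng

    def action_probs(self, state: List[float]) -> List[float]:
        hidden = [math.tanh(sum(w * s for w, s in zip(row, state)) + b)
                  for row, b in zip(self.w1, self.b1)]
        logits = [sum(w * h for w, h in zip(row, hidden)) + b
                  for row, b in zip(self.w2, self.b2)]
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(z - m) for z in logits]
        total = sum(exps)
        return [e / total for e in exps]

    def act(self, state: List[float]) -> int:
        """Sample a ~ pi_theta(.|s)."""
        probs = self.action_probs(state)
        return self.rng.choices(range(len(probs)), weights=probs, k=1)[0]


policy = SoftmaxPolicyNet(state_dim=4, hidden_dim=16, n_actions=2)
s = [0.1, -0.2, 0.05, 0.0]     # hypothetical 4-dimensional state
print(policy.action_probs(s))  # two probabilities summing to 1
print(policy.act(s))           # 0 or 1
```

For continuous actions, a common alternative is for the network to output the mean (and often the log standard deviation) of a Gaussian instead of softmax logits.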
How is deep learning used in reinforcement learning? (2)
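B) Action-value parametrization
[Figure: two alternative ANN architectures for the action-value function, separated by "OR"; one is annotated "(Less efficient. Why?)"]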
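Assuming a discrete action space, an ANN can represent Q(s, a) in two ways: it takes only the state and outputs one Q-value per action, or it takes the state together with an encoded (here one-hot) action and outputs a single Q-value. The sketch below counts the forward passes each design needs to select the greedy action; the one-layer linear "network" and all names are hypothetical stand-ins for a real ANN.

```python
import random
from typing import List


class LinearNet:
    """Hypothetical one-layer network, outputs = W x + b, that counts its forward passes."""

    def __init__(self, in_dim: int, out_dim: int, seed: int) -> None:
        rng = random.Random(seed)
        self.w = [[rng.uniform(-1.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]
        self.b = [0.0] * out_dim
        self.forward_passes = 0

    def __call__(self, x: List[float]) -> List[float]:
        self.forward_passes += 1
        return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(self.w, self.b)]


def one_hot(a: int, n: int) -> List[float]:
    return [1.0 if i == a else 0.0 for i in range(n)]


STATE_DIM, N_ACTIONS = 3, 4
q_all = LinearNet(STATE_DIM, N_ACTIONS, seed=0)     # design 1: s -> [Q(s, a) for every a]
q_sa = LinearNet(STATE_DIM + N_ACTIONS, 1, seed=1)  # design 2: (s, a) -> Q(s, a)


def greedy_all(s: List[float]) -> int:
    q = q_all(s)  # a single forward pass scores every action
    return max(range(N_ACTIONS), key=lambda a: q[a])


def greedy_sa(s: List[float]) -> int:
    q = [q_sa(s + one_hot(a, N_ACTIONS))[0] for a in range(N_ACTIONS)]  # one pass per action
    return max(range(N_ACTIONS), key=lambda a: q[a])


s = [0.5, -1.0, 0.2]
greedy_all(s)
greedy_sa(s)
print(q_all.forward_passes, q_sa.forward_passes)  # 1 4
```

With n actions, the (s, a)-input design needs n forward passes for every greedy action selection and for every max over next-state actions in a Q-learning target, versus one pass for the all-actions design; this is one common answer to the question posed on the slide.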
Challenges of using ANN-based function approximation in RL (1)
Challenges of using ANN-based function approximation in RL (2)
Challenges of using ANN-based function approximation in RL (3)
Challenges of using ANN-based function approximation in RL (4)
Deep reinforcement learning versus supervised learning
DRL versus SL: lack of an oracle (1)
DRL versus SL: lack of an oracle (2)
DRL versus SL: sparsity of feedback
DRL versus SL: data generation
References (utilized for preparation of lecture notes or Matlab code)