1 of 24

Machine Learning II

Value Methods in RL

Ken Q. Pu, Associate Professor in Computer Science

Faculty of Science, Ontario Tech University

2 of 24

Basic Definitions

3 of 24

Basic Definitions

4 of 24

Basic Definitions

5 of 24

Return
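
For reference, the standard discounted return that this slide title refers to can be written as below. The slide body was not preserved, so the notation is an assumption: $r_t$ is the reward at step $t$, $\gamma \in [0, 1)$ is the discount factor, and $G_t$ is the return from step $t$.

```latex
G_t = \sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1} = r_{t+1} + \gamma G_{t+1}
```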

6 of 24

Expected Return of a Policy & Optimal Policy

7 of 24

Evaluating States and Actions With Value Functions

8 of 24

Evaluating States and Actions With Value Functions

9 of 24

Check Your Understanding

10 of 24

From Optimal Action-Value Function To Policy
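
The usual way to read a policy off an optimal action-value function is to act greedily with respect to it. The symbols $Q^*$ and $\pi^*$ are assumed notation here, since the slide body was not preserved.

```latex
\pi^*(s) = \arg\max_{a} Q^*(s, a)
```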

11 of 24

Bellman Equations for On Policy Value Functions

12 of 24

Bellman Equations for Optimal Value Functions
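
For reference, the textbook form of the Bellman optimality equations is given below. The notation is an assumption rather than the slide's own: $P(s' \mid s, a)$ is the transition probability, $R(s, a, s')$ the reward, and $\gamma$ the discount factor.

```latex
V^*(s) = \max_{a} \sum_{s'} P(s' \mid s, a) \left[ R(s, a, s') + \gamma V^*(s') \right]

Q^*(s, a) = \sum_{s'} P(s' \mid s, a) \left[ R(s, a, s') + \gamma \max_{a'} Q^*(s', a') \right]
```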

13 of 24

Value Iteration

14 of 24

Value Iteration
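
A minimal sketch of tabular value iteration, assuming a small MDP whose transition model is fully known. The function name `value_iteration`, the data layout (`P[s][a]` as a list of `(probability, next_state)` pairs, `R[s][a]` as an expected reward), and the two-state example MDP are all hypothetical and chosen for illustration; they are not taken from the slides.

```python
def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iters=10_000):
    """Tabular value iteration.

    P[s][a] is a list of (probability, next_state) pairs (assumed layout).
    R[s][a] is the expected immediate reward for taking action a in state s.
    Returns the value estimates and a greedy policy (one action per state).
    """
    n_states = len(P)
    V = [0.0] * n_states

    def q_value(s, a, values):
        # One-step lookahead: immediate reward plus discounted expected value.
        return R[s][a] + gamma * sum(p * values[s2] for p, s2 in P[s][a])

    for _ in range(max_iters):
        # Bellman optimality backup applied to every state.
        V_new = [
            max(q_value(s, a, V) for a in range(len(P[s])))
            for s in range(n_states)
        ]
        delta = max(abs(new - old) for new, old in zip(V_new, V))
        V = V_new
        if delta < tol:
            break

    # Greedy policy with respect to the final value estimates.
    policy = [
        max(range(len(P[s])), key=lambda a: q_value(s, a, V))
        for s in range(n_states)
    ]
    return V, policy


# Hypothetical MDP: state 1 is absorbing with zero reward.
# In state 0, action 0 stays put (reward 0) and action 1 moves to state 1 (reward 1).
P = [
    [[(1.0, 0)], [(1.0, 1)]],
    [[(1.0, 1)], [(1.0, 1)]],
]
R = [[0.0, 1.0], [0.0, 0.0]]
V, policy = value_iteration(P, R, gamma=0.9)
```

In this toy problem the only reward is collected by leaving state 0, so the greedy policy picks action 1 there.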

15 of 24

Temporal Difference Learning (TD)
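
A minimal sketch of the one-step TD(0) update for state values, assuming a tabular value estimate stored in a dictionary. The helper name `td0_update`, the step size `alpha`, and the example transition are hypothetical.

```python
def td0_update(V, s, r, s_next, done, alpha=0.1, gamma=0.9):
    """Apply one TD(0) update to V[s] in place and return the TD error."""
    # Bootstrapped target: observed reward plus discounted estimate of the next state.
    target = r + (0.0 if done else gamma * V[s_next])
    td_error = target - V[s]
    V[s] += alpha * td_error
    return td_error


# Hypothetical transition: from state "A" to state "B" with reward 0.5.
V = {"A": 0.0, "B": 1.0}
error = td0_update(V, "A", 0.5, "B", done=False, alpha=0.1, gamma=0.9)
```

Unlike value iteration, this update needs only a sampled transition and no transition model.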

16 of 24

Q-Learning

17 of 24

Q-Learning
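
A minimal sketch of tabular Q-learning with epsilon-greedy exploration. The environment is a hypothetical four-state chain invented for this example (move left or right, reward 1 on reaching the last state); the function name `q_learning` and all hyperparameter values are assumptions, not values from the slides.

```python
import random


def q_learning(n_states=4, episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning on a hypothetical chain; returns the Q-table."""
    rng = random.Random(seed)
    goal = n_states - 1
    Q = [[0.0, 0.0] for _ in range(n_states)]  # actions: 0 = left, 1 = right

    def step(s, a):
        # Deterministic chain dynamics; the episode ends at the goal state.
        s_next = max(s - 1, 0) if a == 0 else s + 1
        done = s_next == goal
        return s_next, (1.0 if done else 0.0), done

    def choose_action(s):
        # Epsilon-greedy with random tie-breaking among the best actions.
        if rng.random() < epsilon:
            return rng.randrange(2)
        best = max(Q[s])
        return rng.choice([a for a in range(2) if Q[s][a] == best])

    for _ in range(episodes):
        s = 0
        for _ in range(100):  # cap on episode length
            a = choose_action(s)
            s_next, r, done = step(s, a)
            # Off-policy target: bootstrap from the greedy value of the next state.
            target = r if done else r + gamma * max(Q[s_next])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s_next
            if done:
                break
    return Q


Q = q_learning()
greedy_policy = [max(range(2), key=lambda a: Q[s][a]) for s in range(3)]
```

The target uses the maximum over next-state actions regardless of which action the behaviour policy takes next, which is what makes the update off-policy.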

18 of 24

Deep Q-Learning Network (DQN)

19 of 24

DQN

20 of 24

DQN

21 of 24

DQN: Experience Buffer & Experience Replay
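
A minimal sketch of an experience buffer with uniform random replay, assuming transitions are stored as `(state, action, reward, next_state, done)` tuples. The class name `ReplayBuffer`, the `Transition` record, and the capacity used below are hypothetical.

```python
import random
from collections import deque, namedtuple

Transition = namedtuple(
    "Transition", ["state", "action", "reward", "next_state", "done"]
)


class ReplayBuffer:
    """Fixed-capacity buffer that discards the oldest transition when full."""

    def __init__(self, capacity, seed=None):
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append(Transition(state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement breaks up temporal correlation
        # between consecutive transitions.
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


# Push five transitions into a buffer of capacity three; only the latest three remain.
buffer = ReplayBuffer(capacity=3, seed=0)
for t in range(5):
    buffer.push(state=t, action=0, reward=0.0, next_state=t + 1, done=False)
batch = buffer.sample(2)
```

In a DQN training loop, each sampled batch would be used to compute TD targets for a gradient step on the Q-network.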

22 of 24

DQN: Target and Policy DQN

23 of 24

Atari DQN 2013

24 of 24

More About DQN
