Deep Reinforcement Learning
B. Ravindran
Reconfigurable and Intelligent Systems Engineering (RISE) Group
Department of Computer Science and Engineering
Robert Bosch Centre for Data Science and Artificial Intelligence (RBC-DSAI)
Indian Institute of Technology Madras
Need for Generalization
Need for Deep RL
Value Function Approximation
Linear Q-learning
TD Error
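The equations on these slides were not captured in the extracted text. For reference, a standard form of linear Q-learning and its TD error, assumed here to match what the slide shows, is:

\[
\hat{q}(s,a,\mathbf{w}) = \mathbf{w}^{\top}\phi(s,a), \qquad
\delta_t = r_{t+1} + \gamma \max_{a'} \hat{q}(s_{t+1},a',\mathbf{w}) - \hat{q}(s_t,a_t,\mathbf{w}), \qquad
\mathbf{w} \leftarrow \mathbf{w} + \alpha\,\delta_t\,\phi(s_t,a_t)
\]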
We can exploit the fact that the Q-values of nearby states usually do not differ much.
Assume the task is to learn a policy on a large gridworld.
Naive method:-
Divide the grid into smaller 10x10 blocks and treat each block as one aggregated state.
This causes abrupt changes in the Q-values of states that lie on block boundaries.
Coarse coding:-
Uses overlapping receptive fields, which avoids abrupt changes in state values and smoothens the Q-values as the agent moves from one cluster to another.
Issue:-
The number of 'ON' bits used to represent a state is not uniform across states.
Tile coding:-
A systematic form of coarse coding.
The number of 'ON' bits equals the number of tilings used, so every state activates the same number of features (see the sketch below).
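A minimal sketch of tile coding for a two-dimensional state (e.g. a gridworld position scaled to [0, 1]); the grid size, number of tilings and offsets below are illustrative assumptions, not values from the slides.

def tile_features(x, y, n_tilings=4, tiles_per_dim=10, low=0.0, high=1.0):
    """Return the indices of the 'ON' tiles: exactly one per tiling."""
    tile_width = (high - low) / tiles_per_dim
    active = []
    for t in range(n_tilings):
        # Each tiling is shifted by a fraction of the tile width.
        offset = t * tile_width / n_tilings
        ix = min(int((x - low + offset) / tile_width), tiles_per_dim)
        iy = min(int((y - low + offset) / tile_width), tiles_per_dim)
        tiles_per_tiling = (tiles_per_dim + 1) ** 2
        active.append(t * tiles_per_tiling + ix * (tiles_per_dim + 1) + iy)
    return active  # the number of 'ON' bits equals n_tilings for every state

# A linear approximator then sums the weights of the active tiles:
# q_sa = sum(w[a][i] for i in tile_features(x, y))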
Tile and Coarse coding
CMAC (Cerebellar Model Articulation Controller):-
1. Proposed by James Albus in 1975.
2. A form of coarse coding.
3. Each feature has a value of 1 inside k square regions and 0 elsewhere.
4. A hash function is used to ensure that the k squares are pseudo-randomly scattered over the input space.
5. Implemented so that exactly c features are active for any given input.
6. The hash function makes generalization even better.
7. Typically used when the input is high-dimensional.
Radial Basis Functions:-
1. The output of a radial basis function depends on the distance between the input and some fixed centre c, e.g. a Gaussian exp(-||s - c||^2 / 2σ^2).
2. A weighted sum of many radial basis functions can be used to approximate Q(s,a), as in the sketch below.
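A minimal sketch of a Gaussian-RBF approximator for Q(s,a); the centres, width and weight layout are illustrative assumptions.

import numpy as np

class RBFQApproximator:
    def __init__(self, centres, sigma, n_actions):
        self.centres = np.asarray(centres, dtype=float)     # shape: (n_features, state_dim)
        self.sigma = sigma
        self.w = np.zeros((n_actions, len(self.centres)))   # one weight vector per action

    def features(self, s):
        # Gaussian RBF activations: the closer the state is to a centre, the larger the activation.
        d2 = np.sum((self.centres - np.asarray(s, dtype=float)) ** 2, axis=1)
        return np.exp(-d2 / (2.0 * self.sigma ** 2))

    def q(self, s, a):
        # Q(s, a) is a weighted sum of the radial basis functions.
        return float(self.w[a] @ self.features(s))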
Additional Linear Approximators
Non-Linear Function Approximator
1. Linear function approximators are restrictive: they can only model functions that are linear in the features, although basis expansion does allow functions that are non-linear in the original input space.
2. Non-linear approximators can model complex functions and are very powerful.
3. The features are learnt on the fly rather than hand-coded, as is the case with tile and coarse coding.
4. Can generalize to unseen states.
Disadvantage:-
Requires a lot of data and compute.
Human-Level Backgammon Player: TD-Gammon (Tesauro 92, 94, 95)
Beat the best human player in 1995.
Learnt completely by self-play.
New moves not recorded by humans in centuries of play.
What about the features?
Deep Q-Learning
[Figure: the DQN architecture from the Nature 2015 paper. The convolutional layers perform feature learning; the final fully connected layer is effectively a linear function approximator over the learnt features.]
Source: Deep Q Networks, Nature 2015
Q-Network Learning
Divergence is an issue since the current network is used to decide its own target: the bootstrapped target moves with every update.
To remove correlations, we build a dataset of transitions from the agent's experience.
Experiences are sampled from this dataset; the target network weights w^- are kept frozen (with periodic updates) to address non-stationarity.
Replay Memory
Freeze Target Network
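A minimal sketch of how these two ideas fit together in the DQN update (PyTorch is used here; the buffer layout, network API and hyper-parameters are illustrative assumptions rather than the paper's exact settings).

import random
import torch
import torch.nn as nn

def dqn_update(q_net, target_net, optimizer, replay, batch_size=32, gamma=0.99):
    # Sample uncorrelated transitions from the replay memory.
    states, actions, rewards, next_states, dones = zip(*random.sample(replay, batch_size))
    states      = torch.stack([torch.as_tensor(s, dtype=torch.float32) for s in states])
    next_states = torch.stack([torch.as_tensor(s, dtype=torch.float32) for s in next_states])
    actions = torch.as_tensor(actions, dtype=torch.int64)
    rewards = torch.as_tensor(rewards, dtype=torch.float32)
    dones   = torch.as_tensor(dones, dtype=torch.float32)

    # Q(s, a; w) for the actions actually taken.
    q_sa = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)

    # Target uses the frozen network w^-: r + gamma * max_a' Q(s', a'; w^-).
    with torch.no_grad():
        target = rewards + gamma * (1 - dones) * target_net(next_states).max(dim=1).values

    loss = nn.SmoothL1Loss()(q_sa, target)   # Huber loss, as in the Nature DQN
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Periodically refresh the frozen targets:
# target_net.load_state_dict(q_net.state_dict())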
Architecture:-
1. A set of convolutional layers acts as the feature extractor.
2. These features are then passed through a series of fully connected layers.
3. The output layer has |A| nodes, one per action, each giving the Q-value of that action.
The network is trained using the Huber loss rather than the regular least-squares loss, which makes the updates robust to large TD errors.
A standard form of the Huber loss is given below.
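For reference, a standard form of the Huber loss on the TD error e with threshold δ (δ = 1 corresponds to the error clipping used in the Nature DQN paper; the slide's exact notation may differ):

\[
L_{\delta}(e) =
\begin{cases}
\tfrac{1}{2}e^{2} & \text{if } |e| \le \delta,\\
\delta\left(|e| - \tfrac{1}{2}\delta\right) & \text{otherwise.}
\end{cases}
\]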
Deep Q-Networks
Maximization Bias
A Q-learning agent is trained on a simple two-state MDP with states A and B.
Taking right in state A always gives a reward of 0 and moves the agent to a terminal state.
Taking left in state A leads the agent to state B. In state B the agent has N actions, but every action gives a reward sampled from N(-0.1, 1) and leads to a terminal state.
Even in this simple environment, the agent needs many samples to figure out the right policy.
The Q-learning agent tends to overestimate the Q-values.
Due to the high variance of the rewards from B, some actions will have positive sample estimates even though the true expected reward of every action is -0.1. Hence, when max_a Q(B,a) is taken, there is a high chance that some action appears to have a positive value.
So during the initial stages of training the agent goes left, believing that state B will yield a positive reward and that taking left in state A is therefore beneficial.
As learning progresses the Q-values stabilise and the left action is no longer preferred in state A.
As mentioned earlier, this behaviour is mainly attributed to the max_a' Q(s',a') term in the target, where s' is the state reached from (s,a). A small numerical illustration follows.
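A quick numerical illustration of this bias (a sketch with illustrative numbers, not part of the slides): with 10 actions and a handful of samples each, the maximum of the sample means is usually positive even though every true mean is -0.1.

import numpy as np

rng = np.random.default_rng(0)
n_actions, n_samples, true_mean = 10, 5, -0.1

# Sample-mean estimate of Q(B, a) for each action, from N(-0.1, 1) rewards.
estimates = rng.normal(true_mean, 1.0, size=(n_actions, n_samples)).mean(axis=1)

print(estimates.max())   # typically well above 0: the max over noisy estimates is biased upwards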
Double Q Learning
Double Q-Learning was introduced to handle maximization bias.
Problem:-
The same samples are used both to decide which action is best and to estimate the value of that action.
Solution:-
1. Two separate estimators of the Q-value, Q_A and Q_B, are maintained.
2. At every step, one of Q_A or Q_B is chosen (e.g. at random) and updated.
3. To update Q_A, the maximizing action is selected with Q_A but its value is taken from Q_B, and vice versa (see the sketch below).
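A minimal tabular sketch of the double Q-learning update (the dict-based tables, learning rate and random coin flip are illustrative assumptions).

import random
from collections import defaultdict

Q_A = defaultdict(float)
Q_B = defaultdict(float)

def double_q_update(s, a, r, s_next, actions, alpha=0.1, gamma=0.99, terminal=False):
    if random.random() < 0.5:
        # Update Q_A: pick the greedy action with Q_A, evaluate it with Q_B.
        a_star = max(actions, key=lambda ap: Q_A[(s_next, ap)])
        target = r if terminal else r + gamma * Q_B[(s_next, a_star)]
        Q_A[(s, a)] += alpha * (target - Q_A[(s, a)])
    else:
        # Symmetric update for Q_B.
        a_star = max(actions, key=lambda ap: Q_B[(s_next, ap)])
        target = r if terminal else r + gamma * Q_A[(s_next, a_star)]
        Q_B[(s, a)] += alpha * (target - Q_B[(s, a)])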
Deep Double Q-Learning
Instead of two independent estimators, two versions of a single Q estimator are used.
The online network Q is used to select the greedy action, while the target network Q' is used to evaluate it.
Updating Q':-
1. Hard update: periodically copy the parameters of Q into Q'.
2. Soft update: Polyak averaging, Q' ← τQ + (1 − τ)Q' with a small τ (see the sketch below).
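A sketch of the two update schemes in PyTorch-style code (the value of tau is an illustrative assumption; values around 0.005 are common).

import torch

def hard_update(q_net, target_net):
    # Copy the online parameters into the target network.
    target_net.load_state_dict(q_net.state_dict())

def polyak_update(q_net, target_net, tau=0.005):
    # Slowly blend the online parameters into the target network.
    with torch.no_grad():
        for p, p_targ in zip(q_net.parameters(), target_net.parameters()):
            p_targ.mul_(1.0 - tau).add_(tau * p)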
Dueling DQN
Motivation:-
1. In many states it is not necessary to estimate the value of every action choice.
E.g.:- the Enduro game in Atari.
The objective is to pass cars without collision.
Dueling Architecture:-
1. The value stream attends to both the horizon of the track and the score.
2. The advantage stream pays attention only when there are cars immediately ahead; only in those states does it matter whether the agent moves left or right.
Estimates the Q-value using two streams: the advantage A(s,a) and the state value V(s).
Advantage:- measures how much better an action is than the average action in that state, A(s,a) = Q(s,a) − V(s).
By explicitly separating the two estimators, the dueling architecture can learn which states are (or are not) valuable, without having to learn the effect of each action in each state.
Aggregation layer:-
1. A separate aggregation layer is required to combine V(s) and A(s,a).
2. Simple addition does not work; it runs into the issue of identifiability.
Issue of unidentifiability:-
1. Given only Q(s,a), it is impossible to recover V(s) and A(s,a) uniquely.
Overcome:-
1. The average advantage over actions is subtracted, i.e. Q(s,a) = V(s) + A(s,a) − (1/|A|) Σ_a' A(s,a') (see the sketch below).
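A minimal sketch of the dueling head with the averaging aggregation (the layer sizes are illustrative assumptions).

import torch
import torch.nn as nn

class DuelingHead(nn.Module):
    def __init__(self, feature_dim, n_actions, hidden=128):
        super().__init__()
        self.value = nn.Sequential(nn.Linear(feature_dim, hidden), nn.ReLU(),
                                   nn.Linear(hidden, 1))
        self.advantage = nn.Sequential(nn.Linear(feature_dim, hidden), nn.ReLU(),
                                       nn.Linear(hidden, n_actions))

    def forward(self, features):
        v = self.value(features)         # V(s), shape (batch, 1)
        a = self.advantage(features)     # A(s, a), shape (batch, |A|)
        # Subtract the mean advantage to resolve the identifiability issue.
        return v + a - a.mean(dim=1, keepdim=True)   # Q(s, a)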
Policy Gradient Theorem
[The slide states the policy gradient theorem; the equations themselves appeared as figures.]
Q^π(s,a) can be approximated using the return, as we did in Monte Carlo policy gradient.
Alternatively, we can use the Q-value itself, learnt via a TD procedure (e.g. SARSA)!
But the value parameterization should then be compatible with the policy parameterization.
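For reference, a standard statement of the policy gradient theorem and the compatibility condition referred to above (the slide's own notation may differ):

\[
\nabla_{\theta} J(\theta) = \mathbb{E}_{\pi_{\theta}}\!\left[\nabla_{\theta}\log \pi_{\theta}(a \mid s)\, Q^{\pi_{\theta}}(s,a)\right],
\qquad
\nabla_{w} Q_{w}(s,a) = \nabla_{\theta}\log \pi_{\theta}(a \mid s)
\]

The second equation is the compatible function approximation condition; together with fitting Q_w by minimizing the mean-squared error, it guarantees that the gradient computed with Q_w equals the true policy gradient.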
Actor-Critic
Policy gradient algorithms try to increase the probability of trajectories that are rewarding. A simple update rule that achieves this scales the gradient of the log-probability of each action by the return that follows it (the REINFORCE update).
Issues with this rule:-
1. The updates have high variance.
2. When the cumulative reward is 0, there is no signal to tell which actions were good and which were bad.
Solution:-
1. A baseline is introduced, which reduces the variance of the update without biasing the gradient.
Different choices of baseline can be considered, leading to different versions of the actor-critic algorithm (see below).
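In standard notation (not verbatim from the slides), with a state-dependent baseline b(s) the update becomes:

\[
\theta \leftarrow \theta + \alpha\, \nabla_{\theta}\log \pi_{\theta}(a_t \mid s_t)\,\big(G_t - b(s_t)\big)
\]

Choosing b(s) = V^π(s) and estimating the return with bootstrapping gives the advantage actor-critic family discussed next.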
Advantage Actor-Critic (A2C)
The most popular variant of the actor-critic algorithm.
The advantage function is used to decide whether an action was good or bad.
The advantage function tells us how much better a given action is than the action taken on average in that state: A(s,a) = Q(s,a) − V(s).
We do not need two separate neural networks to obtain A(s,a): the TD error r + γV(s') − V(s) is itself an estimate of the advantage, so a single value network suffices (see the sketch below).
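A minimal sketch of estimating the advantage from a single value network via the TD error (the network API and discount value are illustrative assumptions).

import torch

def advantage_estimate(value_net, s, r, s_next, done, gamma=0.99):
    with torch.no_grad():
        v_s      = value_net(s)
        v_s_next = torch.zeros_like(v_s) if done else value_net(s_next)
    # The TD error  r + gamma * V(s') - V(s)  is an estimate of A(s, a)
    # for the action actually taken in s.
    return r + gamma * v_s_next - v_s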
Asynchronous Advantage Actor-Critic (A3C)
Makes asynchronous parameter updates.
A parameter server maintains a global copy of the network parameters.
Multiple workers are created, each interacting with its own copy of the environment.
Each worker computes gradients locally and sends them to the parameter server, which applies them to the global network.
The updated global network weights are then sent back to the workers.
This greatly reduces the real (wall-clock) time needed to train the agent.
Are asynchronous updates really needed? No; a synchronous variant (A2C) that batches the workers' experience works just as well.
(Another look at) Advantage Actor Critic
Objective functions: the critic is trained on an n-step truncated, corrected return (the sum of rewards along the trajectory, corrected with a bootstrapped value at the end); empirically, n = 20 works best.
Asynchronous Advantage Actor Critic
The same n-step truncated, corrected return (n = 20) is used, and the advantage function serves as the critic's signal.
Compare to REINFORCE
In Williams' REINFORCE terminology, the subtracted term is the reinforcement baseline and the ∇_θ log π_θ(a|s) term is the characteristic eligibility.
Deterministic Policy Gradient
[These slides present the deterministic policy gradient theorem through equations that appeared as figures.]
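For reference, the deterministic policy gradient theorem that these slides build on can be written as follows (an assumption that this matches the slides' notation, for a deterministic policy μ_θ and state distribution ρ):

\[
\nabla_{\theta} J(\theta) = \mathbb{E}_{s \sim \rho}\!\left[\nabla_{\theta}\mu_{\theta}(s)\, \nabla_{a} Q^{\mu}(s,a)\big|_{a=\mu_{\theta}(s)}\right]
\]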
Deep Deterministic Policy Gradient (DDPG)
Deep Deterministic Policy Gradient: Algorithm
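A minimal sketch of one DDPG update step (PyTorch; the replay-buffer batch layout, the critic signature Q(s, a), and the hyper-parameters are illustrative assumptions, not the slides' exact algorithm).

import torch
import torch.nn.functional as F

def ddpg_update(actor, critic, actor_targ, critic_targ,
                actor_opt, critic_opt, batch, gamma=0.99, tau=0.005):
    s, a, r, s2, done = batch   # tensors sampled from a replay buffer

    # Critic: regress Q(s, a) towards r + gamma * Q'(s', mu'(s')).
    with torch.no_grad():
        target = r + gamma * (1 - done) * critic_targ(s2, actor_targ(s2)).squeeze(-1)
    critic_loss = F.mse_loss(critic(s, a).squeeze(-1), target)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Actor: follow the deterministic policy gradient, i.e. maximize Q(s, mu(s)).
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()

    # Polyak-average the target networks.
    with torch.no_grad():
        for net, net_targ in ((actor, actor_targ), (critic, critic_targ)):
            for p, p_targ in zip(net.parameters(), net_targ.parameters()):
                p_targ.mul_(1 - tau).add_(tau * p)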
Summing up