1 of 14

LEARNING REPRESENTATIONS IN DEEP REINFORCEMENT LEARNING

Dr. Nicolò Botteghi¹, Dr. Mannes Poel², and Prof. Dr. Christoph Brune¹

ICT OPEN 2022

¹ Mathematics of Imaging and AI, ² Data Management and Biometrics, University of Twente

2 of 14

Deep Reinforcement Learning

  • Learning from interacting with the world 🡪 Deep Reinforcement Learning (DRL) [1] (the interaction loop is sketched below)

  • Control of dynamical systems using data-driven methods

  • Challenge 🡪 data-driven interactive AI suffers from sample inefficiency and long training times


[1] Richard S. Sutton and Andrew G. Barto. Reinforcement learning: an introduction. Adaptive computation and machine learning series. The MIT Press, Cambridge, Massachusetts, second edition, 2018.

[Figure: the reinforcement learning loop, in which the agent sends actions to the environment and receives observations and rewards.]
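The interaction loop in the figure above can be written in a few lines of code. A minimal sketch, assuming the Gymnasium API and the CartPole task purely as an illustrative stand-in for the environments considered in this work:

    # Minimal agent-environment interaction loop (Gymnasium API, illustrative only).
    import gymnasium as gym

    env = gym.make("CartPole-v1")              # any environment exposing the standard API
    observation, info = env.reset(seed=0)

    for _ in range(1000):
        action = env.action_space.sample()     # a learned policy would replace this line
        observation, reward, terminated, truncated, info = env.step(action)
        if terminated or truncated:            # episode finished: start a new one
            observation, info = env.reset()
    env.close()

The reinforcement learning problem is precisely to replace the random action choice above with a policy that maximizes the cumulative reward collected through this loop [1].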

3 of 14

Representation Learning


How can we reduce the need for data and speed up learning by improving generalization?

  • Learning representations of the data

  • Learning behaviors based on learned representations

  • Incorporating prior knowledge

[Figure: the agent-environment loop (action, observation, reward), in which the agent internally maps observations to a latent state and selects actions through a latent action.]

4 of 14

Proposed Method

[Figure: overview of the proposed method: the agent-environment loop (action, observation, reward) with learned latent states and latent actions inside the agent.]
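As a rough illustration of the scheme in the figure, the sketch below shows the kind of models involved: an observation encoder producing latent states, a state-conditioned action encoder producing latent actions, and a transition model operating entirely in the latent space. This is a hypothetical PyTorch sketch with assumed layer sizes, not the exact architecture or losses of [3]:

    # Hypothetical latent-space models (assumed sizes, not the exact networks of [3]).
    import torch
    import torch.nn as nn

    class ObservationEncoder(nn.Module):
        """Maps a (flattened) observation to a low-dimensional latent state z."""
        def __init__(self, obs_dim, latent_dim):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(),
                                     nn.Linear(256, latent_dim))

        def forward(self, obs):
            return self.net(obs)

    class ActionEncoder(nn.Module):
        """Maps an action (one-hot), conditioned on the latent state, to a latent action."""
        def __init__(self, n_actions, latent_dim, latent_action_dim):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(n_actions + latent_dim, 128), nn.ReLU(),
                                     nn.Linear(128, latent_action_dim))

        def forward(self, action_onehot, z):
            return self.net(torch.cat([action_onehot, z], dim=-1))

    class LatentTransition(nn.Module):
        """Predicts the next latent state from the current latent state and latent action."""
        def __init__(self, latent_dim, latent_action_dim):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(latent_dim + latent_action_dim, 128), nn.ReLU(),
                                     nn.Linear(128, latent_dim))

        def forward(self, z, z_a):
            return z + self.net(torch.cat([z, z_a], dim=-1))   # residual prediction of the next z

Roughly, [3] trains models of this kind jointly, with losses derived from the MDP homomorphism conditions introduced on the next slide, so that rewards and transitions are preserved in the latent space.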

5 of 14

MDP Homomorphism

[4] Balaraman Ravindran and Andrew G. Barto. Symmetries and model minimization in Markov decision processes. 2001.

[5] Carles Gelada, Saurabh Kumar, Jacob Buckman, Ofir Nachum, and Marc G. Bellemare. DeepMDP: Learning continuous latent space models for representation learning. 2019.

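For reference, the standard definition from [4], also used in [5] and [6]: an MDP homomorphism h = (φ, {ψ_s}) from an MDP M = (S, A, T, R) to an abstract MDP M̄ = (S̄, Ā, T̄, R̄) is a tuple of surjective maps φ: S → S̄ on states and ψ_s: A → Ā on actions (possibly state-dependent) such that, for all s, s' ∈ S and a ∈ A,

    \bar{R}(\phi(s), \psi_s(a)) = R(s, a)
    \bar{T}(\phi(s') \mid \phi(s), \psi_s(a)) = \sum_{s'' \in \phi^{-1}(\phi(s'))} T(s'' \mid s, a)

i.e., the reward and the aggregated transition probabilities are preserved in the abstract MDP. Learning the maps φ and ψ_s from data is what yields the latent states and latent actions used by the agent.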

6 of 14

High-Dimensional Experiments

We tested our method on two widely studied control problems [6], [7]:

  • Grid-world using high-dimensional observations (RGB images)

  • Robot navigation using high-dimensional observations (RGB images)

  • Both problems have discrete action spaces that may be high-dimensional (a toy grid-world sketch follows below)
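As a toy illustration of the first setting (hypothetical code, not the environment used in our experiments), a grid-world whose observation is a small RGB image and whose actions are discrete moves can be written as:

    # Toy grid-world with RGB image observations and 4 discrete actions (illustrative only).
    import numpy as np

    class GridWorld:
        def __init__(self, size=14):
            self.size = size
            self.pos = np.array([0, 0])                   # agent starts in a corner
            self.goal = np.array([size - 1, size - 1])    # goal in the opposite corner

        def observe(self):
            # Render the grid as a (size, size, 3) RGB image: goal green, agent red.
            img = np.zeros((self.size, self.size, 3), dtype=np.uint8)
            img[tuple(self.goal)] = (0, 255, 0)
            img[tuple(self.pos)] = (255, 0, 0)
            return img

        def step(self, action):
            # Discrete actions: 0 = up, 1 = down, 2 = left, 3 = right.
            moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]
            self.pos = np.clip(self.pos + moves[action], 0, self.size - 1)
            reached = np.array_equal(self.pos, self.goal)
            reward = 1.0 if reached else -0.01            # sparse goal reward, small step cost
            return self.observe(), reward, reached

The agent never observes its (row, column) position directly, only the image, which is what makes the observation high-dimensional relative to the underlying two-dimensional state.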


[6] Elise van der Pol, Thomas Kipf, Frans A. Oliehoek, and Max Welling. Plannable approximations to MDP homomorphisms: Equivariance under actions. AAMAS, 2020.

[7] Rico Jonschkowski and Oliver Brock. Learning state representations with robotic priors. Autonomous Robots, 39(3):407-428, 2015.

7 of 14

Training Efficiency

In both problems, the latent policy trained with TD3 [9] outperforms the following baselines in terms of training speed and sample efficiency (a rough sketch of the two setups follows the list):

  • an end-to-end DRL policy mapping observations directly to actions 🡪 Deep Q-Network (DQN) [8]

  • a DRL policy relying only on the state representation, mapping latent states to actions 🡪 DQN on latent states
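A rough sketch of how these policies consume their inputs at decision time (hypothetical code; the experiments use TD3 [9] for the latent policy and DQN [8] for the baselines, and the exact networks are not reproduced here):

    # Hypothetical decision-time interfaces of the compared policies.
    import torch

    @torch.no_grad()
    def act_latent(observation, encoder, actor):
        """Latent policy: observation -> latent state -> action (TD3-style actor)."""
        z = encoder(observation)              # learned low-dimensional latent state
        return actor(z)                       # the actor never sees the raw observation

    @torch.no_grad()
    def act_end_to_end(observation, q_network):
        """End-to-end DQN baseline: greedy action directly from the RGB observation."""
        q_values = q_network(observation)     # one Q-value per discrete action
        return q_values.argmax(dim=-1)

Because the latent policy only operates on low-dimensional latent states rather than raw images, fewer environment samples are needed to learn a good behavior, which is consistent with the sample-efficiency gains shown in the learning curves below.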

[Plots: learning curves for the 14x14 maze and for the robot navigation task with 8 actions.]

[8] Volodymyr Mnih, et al. Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602, 2013.

[9] Scott Fujimoto, Herke van Hoof, and David Meger. Addressing function approximation error in actor-critic methods. International Conference on Machine Learning (ICML), PMLR, 2018.

8 of 14

State and Action Representations

Learning state and action representations provides [2], [3]:

  • interpretability

  • generalization and robustness

  • sample efficiency

[Figures: visualizations of the learned state representation and of the learned action representation.]

[2] Nicolò Botteghi. Robotics deep reinforcement learning with loose prior knowledge. PhD thesis, University of Twente, Netherlands, October 2021.

[3] Nicolò Botteghi, Mannes Poel, Beril Sirmacek, and Christoph Brune. Low-dimensional state and action representation learning with MDP homomorphism metrics. 2021.


9 of 14


Impact and Future Work

Our current and future work includes:

  • Uncertainty quantification with stochastic models

  • From spatial to spatio-temporal abstractions

  • From vector to graph and set-based representations

10 of 14

Conclusion

Learning representations of states and actions is a key ingredient for data-driven control of dynamical systems

Learning representations allows Deep Reinforcement Learning algorithms to improve their:

  • interpretability

  • generalization and robustness

  • sample efficiency


11 of 14

Thank you for your attention!


12 of 14

References

[1] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. Adaptive Computation and Machine Learning series. The MIT Press, Cambridge, Massachusetts, second edition, 2018.

[2] Nicolò Botteghi. Robotics Deep Reinforcement Learning with Loose Prior Knowledge. PhD thesis, University of Twente, Netherlands, October 2021.

[3] Nicolò Botteghi, Mannes Poel, Beril Sirmacek, and Christoph Brune. Low-dimensional state and action representation learning with MDP homomorphism metrics. 2021.

[4] Balaraman Ravindran and Andrew G. Barto. Symmetries and model minimization in Markov decision processes. 2001.

[5] Carles Gelada, Saurabh Kumar, Jacob Buckman, Ofir Nachum, and Marc G. Bellemare. DeepMDP: Learning continuous latent space models for representation learning. 2019.

[6] Elise van der Pol, Thomas Kipf, Frans A. Oliehoek, and Max Welling. Plannable approximations to MDP homomorphisms: Equivariance under actions. AAMAS, 2020.

[7] Rico Jonschkowski and Oliver Brock. Learning state representations with robotic priors. Autonomous Robots, 39(3):407-428, 2015.

[8] Volodymyr Mnih, et al. Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602, 2013.

[9] Scott Fujimoto, Herke van Hoof, and David Meger. Addressing function approximation error in actor-critic methods. International Conference on Machine Learning (ICML), PMLR, 2018.

[10] Manu Kalia, et al. Deep learning of normal form autoencoders for universal, parameter-dependent dynamics. 1st NeurIPS Workshop on Interpretable Inductive Biases and Physically Structured Learning, 2020.




13 of 14

Impact and Future Work


[Figure: temporal abstraction over a state-action graph: states S0-S4 connected by primitive actions A0 and A1 are grouped under temporal abstractions TA0, TA1, and TA2 (S: state, A: action, TA: temporal abstraction).]

Our current and future work includes:

  • Uncertainty quantification with stochastic models

  • From spatial to spatio-temporal abstractions

  • From vector to graph and set-based representations

14 of 14

Guidelines for the presentation

  • Engage 🡪 tell a story (something simple and intuitive about learning from interaction)
  • What is the problem?
  • What is the solution?
  • Why should I care to listen?
  • Who am I? Brief bio (?)
  • What are you going to tell? Structure of the presentation (10 minutes only, so keep it compact)
  • Proposed framework (MDP homomorphism + scheme)
  • Applications + results
  • Conclusions
