1 of 14

LEARNING REPRESENTATIONS IN DEEP REINFORCEMENT LEARNING

Dr. Nicolò Botteghi¹, Dr. Mannes Poel², and Prof. Dr. Christoph Brune¹

ICT OPEN 2022

¹ Mathematics of Imaging and AI, ² Data Management and Biometrics, University of Twente

2 of 14

Deep Reinforcement Learning

  • Learning from interacting with the world 🡪 Deep Reinforcement Learning (DRL) [1] (the interaction loop is sketched below)

  • Control of dynamical systems using data-driven methods

  • Challenge 🡪 data-driven interactive AI suffers from sample inefficiency and long training times


[1] Richard S. Sutton and Andrew G. Barto. Reinforcement learning: an introduction. Adaptive computation and machine learning series. The MIT Press, Cambridge, Massachusetts, second edition, 2018.

[Figure: the reinforcement learning loop, in which the agent sends actions to the environment and receives observations and rewards.]
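The interaction loop in the figure above can be written in a few lines of code. A minimal sketch, assuming the Gymnasium API and the CartPole task purely as an illustrative stand-in for the environments considered in this work:

    # Minimal agent-environment interaction loop (Gymnasium API, illustrative only).
    import gymnasium as gym

    env = gym.make("CartPole-v1")              # any environment exposing the standard API
    observation, info = env.reset(seed=0)

    for _ in range(1000):
        action = env.action_space.sample()     # a learned policy would replace this line
        observation, reward, terminated, truncated, info = env.step(action)
        if terminated or truncated:            # episode finished: start a new one
            observation, info = env.reset()
    env.close()

The reinforcement learning problem is precisely to replace the random action choice above with a policy that maximizes the cumulative reward collected through this loop [1].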

3 of 14

Representation Learning


How can we reduce the need for data and speed up learning by improving generalization?

  • Learning representations of the data

  • Learning behaviors based on learned representations

  • Incorporating prior knowledge

[Figure: the agent-environment loop (action, observation, reward), in which the agent internally maps observations to a latent state and selects actions through a latent action.]

4 of 14

Proposed Method

[Figure: overview of the proposed method: the agent-environment loop (action, observation, reward) with learned latent states and latent actions inside the agent.]
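As a rough illustration of the scheme in the figure, the sketch below shows the kind of models involved: an observation encoder producing latent states, a state-conditioned action encoder producing latent actions, and a transition model operating entirely in the latent space. This is a hypothetical PyTorch sketch with assumed layer sizes, not the exact architecture or losses of [3]:

    # Hypothetical latent-space models (assumed sizes, not the exact networks of [3]).
    import torch
    import torch.nn as nn

    class ObservationEncoder(nn.Module):
        """Maps a (flattened) observation to a low-dimensional latent state z."""
        def __init__(self, obs_dim, latent_dim):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(),
                                     nn.Linear(256, latent_dim))

        def forward(self, obs):
            return self.net(obs)

    class ActionEncoder(nn.Module):
        """Maps an action (one-hot), conditioned on the latent state, to a latent action."""
        def __init__(self, n_actions, latent_dim, latent_action_dim):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(n_actions + latent_dim, 128), nn.ReLU(),
                                     nn.Linear(128, latent_action_dim))

        def forward(self, action_onehot, z):
            return self.net(torch.cat([action_onehot, z], dim=-1))

    class LatentTransition(nn.Module):
        """Predicts the next latent state from the current latent state and latent action."""
        def __init__(self, latent_dim, latent_action_dim):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(latent_dim + latent_action_dim, 128), nn.ReLU(),
                                     nn.Linear(128, latent_dim))

        def forward(self, z, z_a):
            return z + self.net(torch.cat([z, z_a], dim=-1))   # residual prediction of the next z

Roughly, [3] trains models of this kind jointly, with losses derived from the MDP homomorphism conditions introduced on the next slide, so that rewards and transitions are preserved in the latent space.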

5 of 14

MDP Homomorphism

[4] Balaraman Ravindran and Andrew G. Barto. Symmetries and model minimization in Markov decision processes. 2001.

[5] Carles Gelada, Saurabh Kumar, Jacob Buckman, Ofir Nachum, and Marc G. Bellemare. DeepMDP: Learning continuous latent space models for representation learning. 2019.

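For reference, the standard definition from [4], also used in [5] and [6]: an MDP homomorphism h = (φ, {ψ_s}) from an MDP M = (S, A, T, R) to an abstract MDP M̄ = (S̄, Ā, T̄, R̄) is a tuple of surjective maps φ: S → S̄ on states and ψ_s: A → Ā on actions (possibly state-dependent) such that, for all s, s' ∈ S and a ∈ A,

    \bar{R}(\phi(s), \psi_s(a)) = R(s, a)
    \bar{T}(\phi(s') \mid \phi(s), \psi_s(a)) = \sum_{s'' \in \phi^{-1}(\phi(s'))} T(s'' \mid s, a)

i.e., the reward and the aggregated transition probabilities are preserved in the abstract MDP. Learning the maps φ and ψ_s from data is what yields the latent states and latent actions used by the agent.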

6 of 14

High-Dimensional Experiments

We tested our method on two widely studied control problems [6], [7]:

  • Grid-world using high-dimensional observations (RGB images)

  • Robot navigation using high-dimensional observations (RGB images)

  • Both problems have discrete action spaces that may be high-dimensional (a toy grid-world sketch follows below)
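As a toy illustration of the first setting (hypothetical code, not the environment used in our experiments), a grid-world whose observation is a small RGB image and whose actions are discrete moves can be written as:

    # Toy grid-world with RGB image observations and 4 discrete actions (illustrative only).
    import numpy as np

    class GridWorld:
        def __init__(self, size=14):
            self.size = size
            self.pos = np.array([0, 0])                   # agent starts in a corner
            self.goal = np.array([size - 1, size - 1])    # goal in the opposite corner

        def observe(self):
            # Render the grid as a (size, size, 3) RGB image: goal green, agent red.
            img = np.zeros((self.size, self.size, 3), dtype=np.uint8)
            img[tuple(self.goal)] = (0, 255, 0)
            img[tuple(self.pos)] = (255, 0, 0)
            return img

        def step(self, action):
            # Discrete actions: 0 = up, 1 = down, 2 = left, 3 = right.
            moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]
            self.pos = np.clip(self.pos + moves[action], 0, self.size - 1)
            reached = np.array_equal(self.pos, self.goal)
            reward = 1.0 if reached else -0.01            # sparse goal reward, small step cost
            return self.observe(), reward, reached

The agent never observes its (row, column) position directly, only the image, which is what makes the observation high-dimensional relative to the underlying two-dimensional state.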


[6] Elise van der Pol, Thomas Kipf, Frans A. Oliehoek, and Max Welling. Plannable approximations to MDP homomorphisms: Equivariance under actions. AAMAS, 2020.

[7] Rico Jonschkowski and Oliver Brock. Learning state representations with robotic priors. Autonomous Robots, 39(3):407-428, 2015.

7 of 14

Training Efficiency

In both problems, the latent policy trained with TD3 [9] outperforms the following baselines in terms of training speed and sample efficiency (a rough sketch of the two setups follows the list):

  • an end-to-end DRL policy mapping observations directly to actions 🡪 Deep Q-Network (DQN) [8]

  • a DRL policy relying only on the state representation, mapping latent states to actions 🡪 DQN on latent states
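A rough sketch of how these policies consume their inputs at decision time (hypothetical code; the experiments use TD3 [9] for the latent policy and DQN [8] for the baselines, and the exact networks are not reproduced here):

    # Hypothetical decision-time interfaces of the compared policies.
    import torch

    @torch.no_grad()
    def act_latent(observation, encoder, actor):
        """Latent policy: observation -> latent state -> action (TD3-style actor)."""
        z = encoder(observation)              # learned low-dimensional latent state
        return actor(z)                       # the actor never sees the raw observation

    @torch.no_grad()
    def act_end_to_end(observation, q_network):
        """End-to-end DQN baseline: greedy action directly from the RGB observation."""
        q_values = q_network(observation)     # one Q-value per discrete action
        return q_values.argmax(dim=-1)

Because the latent policy only operates on low-dimensional latent states rather than raw images, fewer environment samples are needed to learn a good behavior, which is consistent with the sample-efficiency gains shown in the learning curves below.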

[Plots: learning curves for the 14x14 maze and for the robot navigation task with 8 actions.]

[8] Volodymyr Mnih, et al. Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602, 2013.

[9] Scott Fujimoto, Herke van Hoof, and David Meger. Addressing function approximation error in actor-critic methods. International Conference on Machine Learning (ICML), PMLR, 2018.

8 of 14

State and Action Representations

Learning state and action representations provides [2], [3]:

  • interpretability

  • generalization and robustness

  • sample efficiency

[Figures: visualizations of the learned state representation and of the learned action representation.]

[2] Nicolò Botteghi. Robotics deep reinforcement learning with loose prior knowledge. PhD thesis, University of Twente, Netherlands, October 2021.

[3] Nicolò Botteghi, Mannes Poel, Beril Sirmacek, and Christoph Brune. Low-dimensional state and action representation learning with MDP homomorphism metrics. 2021.


9 of 14


Impact and Future Work

Our current and future work includes:

  • Uncertainty quantification with stochastic models

  • From spatial to spatio-temporal abstractions

  • From vector to graph and set-based representations

10 of 14

Conclusion

Learning representations of states and actions is a key ingredient for data-driven control of dynamical systems

Learning representations allows Deep Reinforcement Learning algorithms to improve their:

  • interpretability

  • generalization and robustness

  • sample efficiency


11 of 14

Thank you for your attention!


12 of 14

References

[1] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. Adaptive Computation and Machine Learning series. The MIT Press, Cambridge, Massachusetts, second edition, 2018.

[2] Nicolò Botteghi. Robotics Deep Reinforcement Learning with Loose Prior Knowledge. PhD thesis, University of Twente, Netherlands, October 2021.

[3] Nicolò Botteghi, Mannes Poel, Beril Sirmacek, and Christoph Brune. Low-dimensional state and action representation learning with MDP homomorphism metrics. 2021.

[4] Balaraman Ravindran and Andrew G. Barto. Symmetries and model minimization in Markov decision processes. 2001.

[5] Carles Gelada, Saurabh Kumar, Jacob Buckman, Ofir Nachum, and Marc G. Bellemare. DeepMDP: Learning continuous latent space models for representation learning. 2019.

[6] Elise van der Pol, Thomas Kipf, Frans A. Oliehoek, and Max Welling. Plannable approximations to MDP homomorphisms: Equivariance under actions. AAMAS, 2020.

[7] Rico Jonschkowski and Oliver Brock. Learning state representations with robotic priors. Autonomous Robots, 39(3):407-428, 2015.

[8] Volodymyr Mnih, et al. Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602, 2013.

[9] Scott Fujimoto, Herke van Hoof, and David Meger. Addressing function approximation error in actor-critic methods. International Conference on Machine Learning (ICML), PMLR, 2018.

[10] Manu Kalia, et al. Deep learning of normal form autoencoders for universal, parameter-dependent dynamics. 1st NeurIPS Workshop on Interpretable Inductive Biases and Physically Structured Learning, 2020.




13 of 14

Impact and Future Work


[Figure: temporal abstraction over a state-action graph: states S0-S4 connected by primitive actions A0 and A1 are grouped under temporal abstractions TA0, TA1, and TA2 (S: state, A: action, TA: temporal abstraction).]

Our current and future work includes:

  • Uncertainty quantification with stochastic models

  • From spatial to spatio-temporal abstractions

  • From vector to graph and set-based representations

14 of 14

Guidelines for the presentation

  • Engage 🡪 tell a story (something simple and intuitive about learning from interaction)
  • What is the problem?
  • What is the solution?
  • Why should I care to listen?
  • Who am I? Brief bio (?)
  • What are you going to tell? Structure of the presentation (10 minutes only, so keep it compact)
  • Proposed framework (MDP homomorphism + scheme)
  • Applications + results
  • Conclusions
