LEARNING REPRESENTATIONS IN DEEP REINFORCEMENT LEARNING
Dr. Nicolò Botteghi¹, Dr. Mannes Poel², and Prof. Dr. Christoph Brune¹
ICT OPEN 2022
¹ Mathematics of Imaging and AI, ² Data Management and Biometrics, University of Twente
Deep Reinforcement Learning
[1] Richard S. Sutton and Andrew G. Barto. Reinforcement learning: an introduction. Adaptive computation and machine learning series. The MIT Press, Cambridge, Massachusetts, second edition, 2018.
[Figure: the reinforcement learning interaction loop, in which the agent sends an action to the environment and receives an observation and a reward in return.]
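As an illustration of this interaction loop, the sketch below runs a random agent in a Gym-style environment. The environment name and the classic four-value step API are assumptions made for the example, not part of the poster.

```python
import gym

# Minimal sketch of the agent-environment loop: at every step the agent picks
# an action, and the environment answers with an observation and a reward [1].
# "CartPole-v1" and the classic Gym step API are illustrative assumptions.
env = gym.make("CartPole-v1")
observation = env.reset()

done = False
while not done:
    action = env.action_space.sample()  # stand-in for a learned policy
    observation, reward, done, info = env.step(action)

env.close()
```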
Representation Learning
How can we reduce the amount of data needed and speed up learning by improving generalization?
[Figure: the reinforcement learning loop extended with a representation-learning module that maps observations to a latent state and actions to a latent action.]
Proposed Method
[Figure: the proposed architecture, in which observations are encoded into a latent state, actions into a latent action, and the agent's policy operates on these latent representations of the environment.]
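A minimal sketch of how the two encoders in such an architecture could look in PyTorch. The fully connected layers, the dimensions, and the choice to condition the action encoder on the latent state are illustrative assumptions, not the exact design from the poster; for image observations the state encoder would typically be convolutional.

```python
import torch
import torch.nn as nn

class StateEncoder(nn.Module):
    """Maps a high-dimensional observation to a low-dimensional latent state."""
    def __init__(self, obs_dim: int, latent_state_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, latent_state_dim),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

class ActionEncoder(nn.Module):
    """Maps an action, given the latent state, to a latent action."""
    def __init__(self, latent_state_dim: int, action_dim: int, latent_action_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_state_dim + action_dim, 256), nn.ReLU(),
            nn.Linear(256, latent_action_dim),
        )

    def forward(self, z: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([z, action], dim=-1))
```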
MDP Homomorphism
[4] Balaraman Ravindran and Andrew G. Barto. Symmetries and model minimization in Markov decision processes, 2001.
[5] Carles Gelada, Saurabh Kumar, Jacob Buckman, Ofir Nachum, and Marc G. Bellemare. DeepMDP: Learning continuous latent space models for representation learning, 2019.
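An MDP homomorphism [4] maps an MDP onto a smaller abstract MDP whose transition and reward structure mirrors the original; in the deep setting [3], [5], [6] this is typically enforced by training latent transition and reward models. The sketch below uses simple MSE objectives as an illustration, not the exact losses or metrics used in the poster.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentTransitionModel(nn.Module):
    """Predicts the next latent state from the current latent state and latent action."""
    def __init__(self, latent_state_dim: int, latent_action_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_state_dim + latent_action_dim, 256), nn.ReLU(),
            nn.Linear(256, latent_state_dim),
        )

    def forward(self, z: torch.Tensor, z_a: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([z, z_a], dim=-1))

class LatentRewardModel(nn.Module):
    """Predicts the immediate reward from the current latent state and latent action."""
    def __init__(self, latent_state_dim: int, latent_action_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_state_dim + latent_action_dim, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, z: torch.Tensor, z_a: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([z, z_a], dim=-1))

def homomorphism_losses(transition_model, reward_model, z, z_a, z_next, reward):
    """MSE losses encouraging the latent MDP to mirror the true transitions and rewards."""
    transition_loss = F.mse_loss(transition_model(z, z_a), z_next)
    reward_loss = F.mse_loss(reward_model(z, z_a).squeeze(-1), reward)
    return transition_loss, reward_loss
```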
High-Dimensional Experiments
We tested our method on two widely studied control problems [6], [7]: a 14x14 maze navigation task and a robot with 8 actions.
[6] Elise van der Pol, Thomas Kipf, Frans A. Oliehoek, and Max Welling. Plannable approximations to MDP homomorphisms: Equivariance under actions. AAMAS, 2020.
[7] Rico Jonschkowski and Oliver Brock. Learning state representations with robotic priors. Autonomous Robots, 39(3):407-428, 2015.
Training Efficiency
In both cases, the latent policy trained with TD3 [9] outperforms the baseline approaches in both training speed and sample efficiency:
[Figure: learning curves for the Maze 14x14 and Robot 8 actions tasks.]
[8] Volodymyr Mnih et al. Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602, 2013.
[9] Scott Fujimoto, Herke van Hoof, and David Meger. Addressing function approximation error in actor-critic methods. International Conference on Machine Learning, PMLR, 2018.
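To illustrate what "latent policy" means in practice, the sketch below defines a TD3-style deterministic actor [9] that takes the latent state rather than the raw observation as input and outputs a continuous (latent) action. The layer sizes are assumptions, and the TD3 critics and update rules are omitted.

```python
import torch
import torch.nn as nn

class LatentActor(nn.Module):
    """TD3-style deterministic actor operating on latent states instead of raw observations."""
    def __init__(self, latent_state_dim: int, action_dim: int, max_action: float = 1.0):
        super().__init__()
        self.max_action = max_action
        self.net = nn.Sequential(
            nn.Linear(latent_state_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, action_dim), nn.Tanh(),
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # Actions are squashed to the range [-max_action, max_action].
        return self.max_action * self.net(z)
```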
State and Action Representations
Learning state and action representations [2], [3] produces low-dimensional latent spaces that can be inspected directly:
[Figure: visualizations of the learned state representation and action representation.]
[2] Nicolò Botteghi. Robotics deep reinforcement learning with loose prior knowledge. PhD thesis, University of Twente, Netherlands, October 2021.
[3] Nicolò Botteghi, Mannes Poel, Beril Sirmacek, and Christoph Brune. Low-dimensional state and action representation learning with MDP homomorphism metrics, 2021.
Impact and Future Work
Our current and future work includes learning temporal abstractions of actions (see the supplementary slide after the references).
Conclusion
Learning representations of states and actions is a key ingredient for data-driven control of dynamical systems.
Learning representations allows Deep Reinforcement Learning algorithms to achieve higher sample efficiency, faster training, and better generalization.
Thank you for your attention!
References
[1] Richard S. Sutton and Andrew G. Barto. Reinforcement learning: an introduction. Adaptive Computation and Machine Learning series. The MIT Press, Cambridge, Massachusetts, second edition, 2018.
[2] Nicolò Botteghi. Robotics deep reinforcement learning with loose prior knowledge. PhD thesis, University of Twente, Netherlands, October 2021.
[3] Nicolò Botteghi, Mannes Poel, Beril Sirmacek, and Christoph Brune. Low-dimensional state and action representation learning with MDP homomorphism metrics, 2021.
[4] Balaraman Ravindran and Andrew G. Barto. Symmetries and model minimization in Markov decision processes, 2001.
[5] Carles Gelada, Saurabh Kumar, Jacob Buckman, Ofir Nachum, and Marc G. Bellemare. DeepMDP: Learning continuous latent space models for representation learning, 2019.
[6] Elise van der Pol, Thomas Kipf, Frans A. Oliehoek, and Max Welling. Plannable approximations to MDP homomorphisms: Equivariance under actions. AAMAS, 2020.
[7] Rico Jonschkowski and Oliver Brock. Learning state representations with robotic priors. Autonomous Robots, 39(3):407-428, 2015.
[8] Volodymyr Mnih et al. Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602, 2013.
[9] Scott Fujimoto, Herke van Hoof, and David Meger. Addressing function approximation error in actor-critic methods. International Conference on Machine Learning, PMLR, 2018.
[10] Manu Kalia et al. Deep learning of normal form autoencoders for universal, parameter-dependent dynamics. 1st NeurIPS Workshop on Interpretable Inductive Biases and Physically Structured Learning, 2020.
Impact and Future Work (supplementary)
Example of a direction we are exploring: temporal abstraction of actions.
[Figure: states S0-S4 connected by primitive actions A0 and A1, with temporal abstractions TA0, TA1, and TA2 spanning several transitions. Legend: S = state, A = action, TA = temporal abstraction.]
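A small sketch of the idea behind temporal abstraction: a macro-action is executed as a fixed sequence of primitive actions. The groupings and the Gym-style step API below are hypothetical and purely illustrative.

```python
# Hypothetical temporal abstractions: each macro-action (TA) is a fixed
# sequence of primitive actions (A0, A1), as in the diagram above.
temporal_abstractions = {
    "TA0": ["A0", "A1"],
    "TA1": ["A1", "A1", "A0"],
    "TA2": ["A0"],
}

def execute_abstraction(env, primitive_actions):
    """Apply the primitive actions of one temporal abstraction in sequence."""
    total_reward = 0.0
    observation, done = None, False
    for action in primitive_actions:
        observation, reward, done, info = env.step(action)
        total_reward += reward
        if done:
            break
    return observation, total_reward, done
```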