Unsupervised State Representation Learning in Atari
Ankesh Anand*, Evan Racah*, Sherjil Ozair*,
Yoshua Bengio, Marc-Alexandre Côté, R Devon Hjelm
Key Points
State Representation Learning
Goal: Encode high-dimensional observations into a latent space that captures the underlying generative factors of the environment (a sketch of such an encoder follows below)
Supervised -> Self-supervised / Unsupervised Learning
(Alison Gopnik, Andrew N. Meltzoff and Patricia K. Kuhl, 1999)
(Linda Smith and Michael Gasser, 2005)
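As a concrete picture of the encoding goal above, here is a minimal sketch of a convolutional encoder that maps a preprocessed Atari frame to a low-dimensional latent vector. The class name, layer sizes, and the 84x84 grayscale input are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class FrameEncoder(nn.Module):
    """Maps a preprocessed Atari frame (1 x 84 x 84, grayscale) to a latent vector."""
    def __init__(self, latent_dim=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
        )
        self.fc = nn.Linear(64 * 7 * 7, latent_dim)

    def forward(self, x):
        # x: (batch, 1, 84, 84), pixel values scaled to [0, 1]
        h = self.conv(x)
        return self.fc(h.flatten(start_dim=1))
```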
Illustrative Example
Representation learning in humans does not seem to operate in pixel space.
Drawing a one-dollar bill from memory vs. from a reference (Epstein, 2016)
Contrastive Unsupervised Representation Learning
Arora et al. (CURL) ICML'19
Poole et al. (MI Bounds) ICML'19
Hjelm et al. (Deep InfoMax) ICLR'19
van den Oord et al. (CPC) 2018
Nature of RL environments
Can we exploit the inherent temporal structure to learn representations?
The contrastive task
Temporal InfoMax
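Concretely, the temporal contrastive task can be cast as an InfoNCE-style classification: the feature for frame x_t should score highest against its own next frame x_{t+1}, with the other frames in the batch acting as negatives. A minimal sketch, assuming an encoder like the FrameEncoder above (function name and dot-product score are illustrative, not the paper's exact code):

```python
import torch
import torch.nn.functional as F

def temporal_infonce_loss(encoder, x_t, x_tp1):
    """InfoNCE over time: each x_t should score highest with its own x_{t+1}.

    x_t, x_tp1: batches of frames at times t and t+1, shape (B, 1, 84, 84).
    The other items in the batch serve as negatives.
    """
    z_t = encoder(x_t)        # (B, D)
    z_tp1 = encoder(x_tp1)    # (B, D)
    logits = z_t @ z_tp1.t()  # (B, B) pairwise scores
    targets = torch.arange(len(x_t), device=logits.device)
    return F.cross_entropy(logits, targets)  # diagonal entries are the positives
```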
Temporal InfoMax is not enough
Ozair et al. (2019)
Spatio-Temporal DeepInfoMax (ST-DIM)
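ST-DIM addresses the "temporal InfoMax is not enough" problem by adding spatial structure to the objective: global features at one timestep are contrasted against local (per-patch) convolutional features at the adjacent timestep (global-local), and local features are contrasted against local features at the same spatial location (local-local), so every spatial location must carry temporally predictive information rather than a few easy global cues. A simplified sketch: the paper's score functions are bilinear, but a plain dot product is used here for brevity, and the tensor shapes are illustrative.

```python
import torch
import torch.nn.functional as F

def st_dim_loss(global_t, global_tp1, local_t, local_tp1):
    """Spatio-Temporal DeepInfoMax objective (simplified sketch).

    global_*: (B, D) global features; local_*: (B, D, H, W) local feature maps.
    Negatives for each spatial location come from the other items in the batch.
    """
    B, D, H, W = local_t.shape
    targets = torch.arange(B, device=global_tp1.device)
    loss_gl, loss_ll = 0.0, 0.0
    for i in range(H):
        for j in range(W):
            l_t = local_t[:, :, i, j]      # (B, D)
            l_tp1 = local_tp1[:, :, i, j]  # (B, D)
            # global-local: global feature at t+1 vs local feature at t
            loss_gl = loss_gl + F.cross_entropy(global_tp1 @ l_t.t(), targets)
            # local-local: local feature at t+1 vs local feature at t, same location
            loss_ll = loss_ll + F.cross_entropy(l_tp1 @ l_t.t(), targets)
    return (loss_gl + loss_ll) / (H * W)
```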
Evaluating Representations
Atari Annotated RAM Interface (AARI)
Categorization of State Variables
Agent Localization
Small Object Localization
Other Localization
Score/Clock/Lives/Display
Miscellaneous
facing direction
brick existence (binary)
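AARI exposes ground-truth state variables (agent position, small-object position, score, lives, etc.) by annotating byte offsets in each game's RAM. A hedged sketch of how such labels could be attached to each step via a gym-style wrapper; the wrapper name, the info-dict key, the RAM offsets, and the older 4-tuple gym step API are illustrative assumptions, not necessarily the released interface.

```python
import gym

class LabeledRAMWrapper(gym.Wrapper):
    """Illustrative AARI-style wrapper: attach RAM-derived state labels to info."""
    # Byte offsets into Atari RAM for a few Pong variables (illustrative values).
    RAM_MAP = {"player_y": 51, "ball_x": 49, "ball_y": 54}

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        ram = self.env.unwrapped.ale.getRAM()
        info["labels"] = {name: int(ram[addr]) for name, addr in self.RAM_MAP.items()}
        return obs, reward, done, info

env = LabeledRAMWrapper(gym.make("PongNoFrameskip-v4"))
obs = env.reset()
obs, reward, done, info = env.step(env.action_space.sample())
print(info["labels"])  # e.g. {'player_y': ..., 'ball_x': ..., 'ball_y': ...}
```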
State Variable Breakdown
Evaluation Using Probing
Alain & Bengio (2017)
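Probing in the sense of Alain & Bengio: freeze the encoder, fit a linear classifier on its features to predict each AARI state variable, and report probe accuracy/F1 as a measure of how linearly accessible that factor is. A minimal sketch, assuming a frozen encoder like the one above; function and variable names are illustrative, and a held-out split (omitted here for brevity) should be used for reported numbers.

```python
import torch
import torch.nn as nn

def train_linear_probe(encoder, frames, labels, num_classes, epochs=20):
    """Fit a linear classifier on frozen features to predict one state variable.

    frames: (N, 1, 84, 84) observations; labels: (N,) integer values of the
    target RAM variable (e.g. agent x-position treated as 256 classes).
    """
    encoder.eval()
    with torch.no_grad():
        feats = encoder(frames)               # frozen features, (N, D)
    probe = nn.Linear(feats.shape[1], num_classes)
    opt = torch.optim.Adam(probe.parameters(), lr=3e-4)
    for _ in range(epochs):
        loss = nn.functional.cross_entropy(probe(feats), labels)
        opt.zero_grad()
        loss.backward()
        opt.step()
    # Training-set accuracy only; evaluate on held-out data in practice.
    acc = (probe(feats).argmax(dim=1) == labels).float().mean().item()
    return probe, acc
```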
Training Details
Baselines
Results
Categorical Breakdown
Easy-to-Exploit Features
Future Directions