1 of 24

Max Chiswick

Daniel Braun

Joe Kwon

and

Jack Koch & Lauro Langosco

Understanding RL agents using generative visualization

Lee Sharkey (presenting)

2 of 24

Input → Hidden activations → Output

“Feature Visualization”; Distill; Olah et al. 2017

3 of 24

“An Overview of Early Vision in InceptionV1”; Distill; Olah et al. 2020

“Curve Circuits”; Distill; Cammarata et al. 2021

Input → Hidden activations → Output, with gradients flowing back toward the input
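A minimal sketch of the technique on these slides, feature visualization by activation maximization, in the spirit of Olah et al. 2017. The model, layer, and channel below are illustrative stand-ins, not the actual InceptionV1 setup:

```python
# Minimal sketch of feature visualization: ascend the gradient of a chosen
# hidden activation with respect to the input image. Model/channel are stand-ins.
import torch

torch.manual_seed(0)
model = torch.nn.Sequential(       # stand-in for a pretrained vision network
    torch.nn.Conv2d(3, 8, 3, padding=1),
    torch.nn.ReLU(),
)

image = torch.randn(1, 3, 64, 64, requires_grad=True)  # start from noise
opt = torch.optim.Adam([image], lr=0.05)

for _ in range(200):
    opt.zero_grad()
    activations = model(image)
    loss = -activations[0, 2].mean()  # ascend channel 2's mean activation
    loss.backward()                   # gradients flow back to the input image
    opt.step()
```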

4 of 24

What humans see

“Quantifying generalization in RL”; Cobbe et al. 2019

What the agent actually sees

Training curve of our agent

5 of 24

Feedforward: Input → Hidden activations → Output

RNN: Input → Hidden activations → Output, with hidden activations carried forward across timesteps
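A minimal sketch of why recurrence matters for visualization: in a recurrent agent, the hidden state at time t summarizes all earlier frames, so a single input frame cannot by itself explain an activation. The GRU policy below is a placeholder, not the agent from the talk:

```python
# One step of a sketched recurrent agent (GRU policy). Sizes and the
# 4-action head are illustrative.
import torch

gru = torch.nn.GRUCell(input_size=32, hidden_size=16)
policy_head = torch.nn.Linear(16, 4)   # e.g. 4 discrete actions

h = torch.zeros(1, 16)                 # hidden state carried across steps
for t in range(10):
    obs_features = torch.randn(1, 32)  # stand-in for the encoded frame at time t
    h = gru(obs_features, h)           # h now summarizes frames 0..t
    logits = policy_head(h)            # the action depends on the whole history via h
```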

6 of 24

Recurrent agent: Hidden activations → Output → Environment → Input → Hidden activations (a loop through the environment)

7 of 24

“Causal Analysis of Agent Behavior for AI Safety”; Déletang et al. 2021

8 of 24


9 of 24


10 of 24

Maximally exciting images for neurons in an agent (Seaquest)

Rupprecht et al. 2017

11 of 24

A recap

Feature visualization is a challenge for RL agents because:

  1. Agents coordinate actions across time using memory

  • Agents may use the environment itself as memory, but the environment is not differentiable

  • Feature visualizations may not produce realistic images or image sequences, which destroys any memory encoded in the environment

12 of 24

Our solution:

Learn a differentiable generator for realistic agent-environment sequences

Generative model
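A hedged sketch of this solution: fit a VAE-style generative model to recorded rollouts so that whole observation sequences become a differentiable function of a latent vector. All sizes and architecture details below are illustrative stand-ins, not the exact model from the talk:

```python
# Hedged sketch: a sequence VAE over recorded rollouts. T, D, Z and the
# linear encoder/decoder are stand-ins for the real architecture.
import torch
import torch.nn as nn

T, D, Z = 20, 64, 8  # sequence length, per-frame feature dim, latent dim

encoder = nn.Sequential(nn.Flatten(), nn.Linear(T * D, 2 * Z))   # -> (mu, logvar)
decoder = nn.Sequential(nn.Linear(Z, T * D), nn.Unflatten(1, (T, D)))
opt = torch.optim.Adam([*encoder.parameters(), *decoder.parameters()], lr=1e-3)

def vae_step(seq):
    """One training step on a batch of rollout features, shape (batch, T, D)."""
    mu, logvar = encoder(seq).chunk(2, dim=1)
    z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization trick
    recon = decoder(z)
    recon_loss = ((recon - seq) ** 2).mean()                    # reconstruction term
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).mean()  # KL to N(0, I)
    loss = recon_loss + kl
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

rollouts = torch.randn(16, T, D)  # stand-in for a batch of recorded rollouts
for _ in range(5):
    vae_step(rollouts)
```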

13 of 24

Architecture diagram


14 of 24

Reconstruction vs. ground truth

15 of 24

Generated using random actions

Ground truth (used for initialization)

16 of 24

Latent vector 1

Latent vector 2

Latent vector 3

Latent vector 4

17 of 24

Optimizing for high/low value sequences

Samples that maximize/minimize the value output

Value output throughout optimization

Maximized

Minimized
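A sketch of this optimization: freeze the generator and the agent, and ascend (or descend) the agent's value output with respect to the latent vector that generates the sequence. `decoder` and `value_fn` below are placeholder stand-ins, not the real models:

```python
# Sketch: optimize only the latent vector so the generated sequence
# maximizes or minimizes the agent's value output.
import torch

T, D, Z = 20, 64, 8
decoder = torch.nn.Sequential(torch.nn.Linear(Z, T * D), torch.nn.Unflatten(1, (T, D)))
value_fn = torch.nn.Linear(D, 1)   # stand-in for the agent's value head

z = torch.randn(1, Z, requires_grad=True)
opt = torch.optim.Adam([z], lr=0.05)  # only z is updated; the models stay fixed

sign = -1.0  # -1.0 to maximize value, +1.0 to minimize it
for _ in range(100):
    opt.zero_grad()
    seq = decoder(z)                    # generated observation sequence, (1, T, D)
    values = value_fn(seq).squeeze(-1)  # value estimate at each timestep, (1, T)
    loss = sign * values.mean()
    loss.backward()
    opt.step()
```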

18 of 24

Optimizing for increasing/decreasing value

Samples that maximize/minimize the difference between the value output in the 2nd and 1st halves of the sequence

Difference in value between the 2nd and 1st halves of the sequence throughout optimization

Increasing value

Decreasing value
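The objective on this slide, sketched with stand-in tensors: rather than the mean value, optimize the difference between the mean value in the second and first halves of the sequence, so the generated rollout shows value rising or falling:

```python
# Sketched objective: difference between mean value in the 2nd and 1st halves.
import torch

values = torch.randn(1, 20)  # stand-in for per-timestep value outputs, (batch, T)
T = values.shape[1]
value_diff = values[:, T // 2:].mean() - values[:, :T // 2].mean()
loss = -value_diff  # descend this for increasing value; flip the sign for decreasing
```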

19 of 24

Optimizing for specific action sequences

% actions correct
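Sketched objective for this slide, with stand-in tensors: score generated sequences by the cross-entropy between the agent's action logits and a chosen target action sequence, and track the "% actions correct" metric shown on the slide:

```python
# Sketched objective: match the agent's per-step actions to a target sequence.
import torch
import torch.nn.functional as F

logits = torch.randn(20, 4, requires_grad=True)  # agent action logits, (T, n_actions)
target_actions = torch.randint(0, 4, (20,))      # desired action at each step

loss = F.cross_entropy(logits, target_actions)   # minimized w.r.t. the latent
pct_correct = (logits.argmax(dim=1) == target_actions).float().mean() * 100
```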

20 of 24

Optimizing for particular agent hidden states

Samples that maximize/minimize the activation of neuron 2

Activation of neuron 2 throughout optimization

Maximized

Minimized
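Same optimization loop as for value, but here the sketched target is the activation of one chosen hidden neuron (neuron 2); the tensor below is a stand-in for the agent's hidden states:

```python
# Sketched objective: mean activation of one chosen hidden neuron.
import torch

hidden = torch.randn(20, 16, requires_grad=True)  # stand-in hidden states, (T, hidden_dim)
neuron_activation = hidden[:, 2].mean()           # neuron 2 averaged over time
loss = -neuron_activation                         # descend to maximize; flip sign to minimize
```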

21 of 24

22 of 24

23 of 24

Hidden states projected onto the top 3 principal components, colored by cluster identity
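A sketch of this analysis using scikit-learn: cluster the agent's hidden states, then project them onto their top 3 principal components for a 3D scatter plot colored by cluster. The data below is a random stand-in:

```python
# Sketch: cluster hidden states, then project onto top 3 principal components.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

hidden_states = np.random.randn(500, 16)                      # stand-in hidden states
clusters = KMeans(n_clusters=4, n_init=10).fit_predict(hidden_states)
projected = PCA(n_components=3).fit_transform(hidden_states)  # (500, 3)
# Scatter-plot `projected` in 3D, colored by `clusters`, to reproduce the figure.
```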

24 of 24

Max Chiswick

Daniel Braun

Joe Kwon

and

Jack Koch & Lauro Langosco

Thanks! Questions?

Lee Sharkey