Max Chiswick
Daniel Braun
Joe Kwon
and
Jack Koch & Lauro Langosco
Understanding RL agents using generative visualization
Lee Sharkey (presenting)
Input Hidden activations Output
“Feature Visualization”; Distill; Olah et al. 2017
“An Overview of Early Vision in InceptionV1”; Distill; Olah et al. 2020
“Curve Circuits”; Distill; Cammarata et al. 2021
Input Hidden activations Output
gradients
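A minimal sketch of gradient-based feature visualization, assuming a single toy tanh unit in place of a real network: the input is repeatedly nudged along the gradient of the chosen activation. The names `neuron`, `grad_wrt_input`, and `visualize`, and all weights, are hypothetical and chosen only for illustration.

```python
import math

def neuron(x, w, b):
    """Toy hidden unit (hypothetical): tanh of a weighted sum of the input."""
    return math.tanh(sum(wi * xi for wi, xi in zip(w, x)) + b)

def grad_wrt_input(x, w, b):
    """Analytic gradient of the activation with respect to each input element."""
    a = neuron(x, w, b)
    return [(1.0 - a * a) * wi for wi in w]

def visualize(w, b, x0, steps=100, lr=0.1):
    """Gradient ascent on the input, keeping each 'pixel' clipped to [0, 1]."""
    x = list(x0)
    for _ in range(steps):
        g = grad_wrt_input(x, w, b)
        x = [min(1.0, max(0.0, xi + lr * gi)) for xi, gi in zip(x, g)]
    return x

# Assumed toy weights and a mid-grey starting input.
w = [1.5, -2.0, 0.5]
b = 0.0
x0 = [0.5, 0.5, 0.5]
x_opt = visualize(w, b, x0)
```

In a real network the analytic gradient would be replaced by automatic differentiation through all layers; the loop itself stays the same.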
What humans see
“Quantifying Generalization in Reinforcement Learning”; Cobbe et al. 2019
What the agent actually sees
Training curve of our agent
Feedforward: Input, Hidden activations, Output
RNN: Input, Hidden activations, Output
Recurrent agent: Environment, Output / Input, Hidden activations
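The recurrent-agent loop can be sketched as follows, under heavy simplifying assumptions: a scalar observation, a scalar hidden state, and a one-line environment. `agent_step`, `env_step`, and `rollout` are hypothetical stand-ins, not the agent or environment used in this work. The point is only that each output feeds back into the next input through the environment while the hidden state persists.

```python
import math

def agent_step(obs, hidden):
    """Hypothetical recurrent policy: update the hidden state, then pick an action."""
    hidden = math.tanh(0.5 * hidden + obs)
    action = 1 if hidden > 0 else -1
    return action, hidden

def env_step(state, action):
    """Hypothetical environment: the action moves the state; the observation is the new state."""
    state = state + 0.1 * action
    return state, state

def rollout(steps=5, state=-0.25, hidden=0.0):
    """Closed loop: each action shapes the next observation, and the hidden state carries over."""
    obs = state
    trajectory = []
    for _ in range(steps):
        action, hidden = agent_step(obs, hidden)
        state, obs = env_step(state, action)
        trajectory.append((obs, action, hidden))
    return trajectory

traj = rollout()
```

Because later inputs depend on earlier outputs, a single input frame cannot be optimized in isolation the way it can for a feedforward image classifier.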
“Causal Analysis of Agent Behavior for AI Safety”; Déletang et al. 2021
Rupprecht et al. 2019
Seaquest
Maximally exciting images for neurons in an agent
A recap
Feature visualization is a challenge for RL agents because:
Our solution:
Learn a differentiable generator for realistic agent-environment sequences
Generative model
Architecture diagram
sample
Reconstruction Ground Truth
Generated using random actions
Ground truth (used for initialization)
Latent vector 1
Latent vector 2
Latent vector 3
Latent vector 4
Optimizing for high/low value sequences
Samples that maximize/minimize value output
Value output throughout optimization
Maximized
Minimized
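A minimal sketch of optimizing a latent vector for high or low value output, assuming a linear map as the "generator" and a tanh unit as the "value head". The matrices `A` and `W` and the function names are hypothetical; the real generator and agent are deep networks differentiated automatically.

```python
import math

# Hypothetical stand-ins: a linear "generator" and a tanh "value head".
A = [[1.0, 0.0], [0.5, -1.0], [0.0, 2.0]]   # maps a 2-d latent to a 3-d observation
W = [0.8, -0.3, 0.4]                        # value-head weights

def generate(z):
    """Decode a latent vector into an observation."""
    return [sum(a * zj for a, zj in zip(row, z)) for row in A]

def value(z):
    """Value output of the toy agent on the generated observation."""
    x = generate(z)
    return math.tanh(sum(wi * xi for wi, xi in zip(W, x)))

def value_grad(z):
    """Chain rule through value head and generator: dV/dz = (1 - V^2) * A^T W."""
    v = value(z)
    at_w = [sum(A[i][j] * W[i] for i in range(len(A))) for j in range(len(z))]
    return [(1.0 - v * v) * g for g in at_w]

def optimize_latent(z0, sign, steps=50, lr=0.2):
    """sign=+1 maximizes the value output, sign=-1 minimizes it."""
    z = list(z0)
    history = [value(z)]
    for _ in range(steps):
        g = value_grad(z)
        z = [zj + sign * lr * gj for zj, gj in zip(z, g)]
        history.append(value(z))
    return z, history

z0 = [0.1, -0.1]
_, hist_max = optimize_latent(z0, +1)
_, hist_min = optimize_latent(z0, -1)
```

The two `history` lists play the role of the value-output curves tracked throughout optimization.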
Optimizing for increasing/decreasing value
Samples that maximize/minimize the difference in value output between the 2nd and 1st halves of the sequence
Difference of values between 2nd and 1st half of the sequence throughout optimization
Increasing value
Decreasing value
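One way to write the increasing/decreasing-value objective, as a sketch: take the value outputs over a sequence and subtract the first-half mean from the second-half mean. Using the mean (rather than, say, the sum) is an assumption, and `half_difference` is a hypothetical name.

```python
def half_difference(values):
    """Mean value over the second half of a sequence minus the mean over the first half."""
    if len(values) < 2:
        raise ValueError("need at least two timesteps")
    mid = len(values) // 2
    first, second = values[:mid], values[mid:]
    return sum(second) / len(second) - sum(first) / len(first)

# Made-up value traces for illustration only.
rising = [0.1, 0.2, 0.6, 0.9]
falling = list(reversed(rising))
```

Maximizing this quantity favors sequences whose value rises over time; minimizing it favors sequences whose value falls.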
Optimizing for specific action sequences
% actions correct
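A sketch of how a "% actions correct" score could be computed, assuming the agent's action at each timestep is the argmax of its action logits and that one target action is specified per timestep. The function name and the example logits are hypothetical.

```python
def percent_actions_correct(logits_per_step, target_actions):
    """Percentage of timesteps where the argmax action equals the target action."""
    if len(logits_per_step) != len(target_actions):
        raise ValueError("need one target action per timestep")
    hits = 0
    for logits, target in zip(logits_per_step, target_actions):
        predicted = max(range(len(logits)), key=lambda i: logits[i])
        hits += predicted == target
    return 100.0 * hits / len(target_actions)

# Made-up logits for a 4-step sequence with 3 possible actions.
logits = [[2.0, 0.1, -1.0], [0.0, 0.5, 0.2], [0.3, 0.1, 0.9], [1.0, 1.5, 0.0]]
targets = [0, 1, 2, 0]
score = percent_actions_correct(logits, targets)
```

Here three of the four steps match the target sequence, so `score` is 75.0.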
Optimizing for particular agent hidden states
Samples that maximize/minimize activation of neuron 2
Optimization of activation of neuron 2
Maximized
Minimized
Hidden states projected onto top 3 principal components, coloured by cluster identity
Max Chiswick
Daniel Braun
Joe Kwon
and
Jack Koch & Lauro Langosco
Thanks! Questions?
Lee Sharkey