A Divergence Minimization Perspective on Imitation Learning Methods
Shane Shixiang Gu
Seyed Kamyar Seyed Ghasemipour
University of Toronto
Vector Institute
Richard Zemel
University of Toronto
Vector Institute
Motivation
Background: [Ho & Ermon, 2016]
Max-Ent IRL: matching the policy's state-action marginal ρ^π(s, a) to the expert's ρ^exp(s, a)
Curious Similarity
Let’s Build Intuition
Reverse KL!
Natural Question: Other f-Divergences?
Mode-Seeking
Mode-Covering
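To make the mode-seeking vs. mode-covering distinction concrete, here is a minimal numerical sketch. It is not taken from the talk: the bimodal target, the single-Gaussian model family, and the grid search are all illustrative assumptions. It fits one Gaussian q to a two-mode target p, once by minimizing the forward KL(p || q) and once by minimizing the reverse KL(q || p).

```python
import numpy as np

def log_normal(x, mu, sigma):
    """Log-density of N(mu, sigma^2), computed analytically to avoid log(0)."""
    return -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2.0 * np.pi))

# Integration grid (assumption: [-10, 10] covers essentially all of the mass of p).
x = np.linspace(-10.0, 10.0, 2001)
dx = x[1] - x[0]

# Hypothetical bimodal target: equal mixture of N(-3, 0.5^2) and N(3, 0.5^2).
log_p = np.logaddexp(np.log(0.5) + log_normal(x, -3.0, 0.5),
                     np.log(0.5) + log_normal(x, 3.0, 0.5))
p = np.exp(log_p)

def forward_kl(mu, sigma):
    """KL(p || q): expectation under the target p (mode-covering)."""
    return np.sum(p * (log_p - log_normal(x, mu, sigma))) * dx

def reverse_kl(mu, sigma):
    """KL(q || p): expectation under the model q (mode-seeking)."""
    log_q = log_normal(x, mu, sigma)
    return np.sum(np.exp(log_q) * (log_q - log_p)) * dx

def grid_argmin(objective):
    """Brute-force search over a small (mu, sigma) grid."""
    candidates = [(mu, sigma)
                  for mu in np.linspace(-4.0, 4.0, 33)
                  for sigma in np.linspace(0.3, 4.0, 38)]
    return min(candidates, key=lambda c: objective(*c))

mu_fwd, sigma_fwd = grid_argmin(forward_kl)
mu_rev, sigma_rev = grid_argmin(reverse_kl)
print("forward KL fit:", mu_fwd, sigma_fwd)
print("reverse KL fit:", mu_rev, sigma_rev)
```

Under these assumptions the forward-KL fit should sit between the two modes with a large standard deviation (covering both), while the reverse-KL fit should lock onto a single mode with a small standard deviation. Which of the two modes it picks is arbitrary by symmetry.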
f-MAX: IRL with f-Div
SKSG, Gu, Zemel, NeurIPS 2019
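For reference, the standard f-divergence definition and its variational lower bound can be written out as below. The notation is an assumption, since the slide's own symbols did not survive extraction: ρ^exp and ρ^π stand for the expert and policy state-action marginals, T is a learned critic, and f* is the convex conjugate of f. The two alternating steps are a sketch of how an adversarial scheme can be built on this bound, not a transcription of the slide.

```latex
% f-divergence for convex f with f(1) = 0
D_f(P \,\|\, Q) = \mathbb{E}_{x \sim Q}\!\left[ f\!\left( \frac{P(x)}{Q(x)} \right) \right]

% Variational lower bound, with f^* the convex conjugate of f
D_f(P \,\|\, Q) \;\ge\; \sup_{T} \; \mathbb{E}_{x \sim P}[T(x)] - \mathbb{E}_{x \sim Q}[f^*(T(x))]

% Setting P = \rho^{\mathrm{exp}}, Q = \rho^{\pi} and alternating (assumed notation):
\text{critic step:} \quad \max_{T} \; \mathbb{E}_{(s,a) \sim \rho^{\mathrm{exp}}}[T(s,a)] - \mathbb{E}_{(s,a) \sim \rho^{\pi}}[f^*(T(s,a))]

\text{policy step:} \quad \max_{\pi} \; \mathbb{E}_{(s,a) \sim \rho^{\pi}}[f^*(T(s,a))]
```

The policy step follows from minimizing the bound over π: only the second term depends on the policy, so f*(T(s, a)) plays the role of a reward.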
Different Behaviours
Mode-Seeking
Mode-Covering
AIRL: Rev. KL
GAIL: JS
FAIRL: Forw. KL
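The pairing of methods with divergences can be checked on a toy example. The sketch below assumes a discriminator of the form D = sigmoid(h) with expert samples labelled 1, so that at the optimum h = log(ρ^exp / ρ^π). The function name `reward_from_logit`, the three-state distributions, and the specific reward conventions (logit for AIRL, log D for GAIL, exp(h)·(−h) for FAIRL) are assumptions for illustration and should be checked against the paper before use.

```python
import numpy as np

def reward_from_logit(h, kind):
    """Map a discriminator logit h to a policy reward (hypothetical helper)."""
    h = np.asarray(h, dtype=float)
    if kind == "airl":    # reward = log D - log(1 - D) = h
        return h
    if kind == "gail":    # reward = log D = log sigmoid(h), one common convention
        return -np.logaddexp(0.0, -h)
    if kind == "fairl":   # reward = exp(h) * (-h)
        return np.exp(h) * (-h)
    raise ValueError(f"unknown reward kind: {kind}")

# Toy marginals over three state-action pairs (made-up numbers).
rho_exp = np.array([0.7, 0.2, 0.1])
rho_pi = np.array([0.3, 0.3, 0.4])

# Logit of the optimal discriminator for these two distributions.
h_opt = np.log(rho_exp / rho_pi)

kl_pi_exp = np.sum(rho_pi * np.log(rho_pi / rho_exp))   # reverse KL(pi || exp)
kl_exp_pi = np.sum(rho_exp * np.log(rho_exp / rho_pi))  # forward KL(exp || pi)

# Expected reward under the policy's own marginal.
airl_return = np.sum(rho_pi * reward_from_logit(h_opt, "airl"))
fairl_return = np.sum(rho_pi * reward_from_logit(h_opt, "fairl"))

print("AIRL return: ", airl_return, " -reverse KL:", -kl_pi_exp)
print("FAIRL return:", fairl_return, " -forward KL:", -kl_exp_pi)
```

With the optimal logit plugged in, the expected AIRL-style reward equals the negative reverse KL and the expected FAIRL-style reward equals the negative forward KL, which is the sense in which each reward targets its divergence.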
Quick Breather
Max-Ent IRL: matching the policy's state-action marginal ρ^π(s, a) to the expert's ρ^exp(s, a)
IRL vs. BC
AIRL
GAIL
Standard BC
Empirical Evaluation
Pure State-Marginal Matching (SMM)?
Figure from Lee et al. 2019
Adversarial SMM
SMM
AIRL without actions!
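A minimal sketch of what "without actions" can mean in practice: the discriminator sees only states, and its logit is used as the reward. Everything here is an illustrative assumption (1-D Gaussian state samples, a logistic-regression discriminator trained by plain gradient descent, the name `state_reward`); it is not the setup used in the talk's experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical state samples: target states ~ N(2, 1), policy states ~ N(0, 1).
expert_states = rng.normal(2.0, 1.0, size=2000)
policy_states = rng.normal(0.0, 1.0, size=2000)

states = np.concatenate([expert_states, policy_states])
labels = np.concatenate([np.ones(2000), np.zeros(2000)])  # 1 = expert, 0 = policy

# State-only logistic discriminator: logit(s) = w * s + b (no action input).
w, b = 0.0, 0.0
lr = 0.5
for _ in range(2000):
    logits = w * states + b
    probs = 1.0 / (1.0 + np.exp(-logits))
    # Gradient of the mean binary cross-entropy loss.
    w -= lr * np.mean((probs - labels) * states)
    b -= lr * np.mean(probs - labels)

def state_reward(s):
    """AIRL-style reward from the state-only discriminator: the logit itself."""
    return w * s + b

print("learned w, b:", w, b)
print("reward at s=2:", state_reward(2.0), " reward at s=0:", state_reward(0.0))
```

For these two Gaussians the true log density ratio is 2s − 2, so the learned logit should land near w ≈ 2, b ≈ −2 and reward states that the target distribution visits more often than the policy does.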
Adversarial SMM
Random Policy
SMM Policy
Contributions
Ho & Ermon 2016
f-MAX
AIRL
GAIL
Thank you for listening!