
A Divergence Minimization Perspective on Imitation Learning Methods

Shane Shixiang Gu

Google

Seyed Kamyar Seyed Ghasemipour

University of Toronto

Vector Institute

Richard Zemel

University of Toronto

Vector Institute


Motivation

  • Imitation Learning:
    • Behavioural Cloning (BC)
    • Inverse RL (IRL)
  • Recent IRL methods:
    • Significantly outperform BC
    • Derived from diverse perspectives
  • Goals:
    • Build a unifying perspective
    • Understand IRL vs. BC
    • Solve new problems


Background: [Ho & Ermon, 2016]

  • Direct Policy Search!
  • GAIL: Optimize $\min_{\pi}\; D_{\mathrm{JS}}\big(\rho^{\pi}(s,a) \,\|\, \rho^{\mathrm{exp}}(s,a)\big) - \lambda H(\pi)$

Max-Ent IRL $\iff$ Matching $\rho^{\pi}(s,a)$ to $\rho^{\mathrm{exp}}(s,a)$
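For reference, a sketch of the adversarial (GAN-style) objective GAIL actually trains, in my notation; $D$ is a binary discriminator over state-action pairs and $\lambda H(\pi)$ the causal-entropy regularizer from Ho & Ermon (2016):

\[
\min_{\pi}\,\max_{D}\;\;
\mathbb{E}_{(s,a)\sim\rho^{\mathrm{exp}}}\!\left[\log D(s,a)\right]
+ \mathbb{E}_{(s,a)\sim\rho^{\pi}}\!\left[\log\big(1 - D(s,a)\big)\right]
- \lambda H(\pi)
\]

At the optimal discriminator this saddle point reduces (up to constants) to the JS-divergence objective above.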


Curious Similarity

  • AIRL [Fu et al. 2017]: derived from a different perspective (recovering a reward function), yet its adversarial training objective looks strikingly similar to GAIL's
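The similarity, roughly (my summary; sign conventions vary across implementations): both train a binary discriminator $D(s,a)$ with the same cross-entropy loss (AIRL additionally gives $D$ a particular parameterization), and they differ mainly in the reward handed to the policy,

\[
r_{\mathrm{GAIL}}(s,a) \;=\; \log D(s,a) \quad\text{(or } -\log(1-D(s,a))\text{)},
\qquad
r_{\mathrm{AIRL}}(s,a) \;=\; \log D(s,a) \;-\; \log\big(1 - D(s,a)\big).
\]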


Let’s Build Intuition

  • AIRL [Fu et al. 2017]: the policy objective reduces to $\min_{\pi}\, \mathrm{KL}\big(\rho^{\pi}(s,a)\,\|\,\rho^{\mathrm{exp}}(s,a)\big)$

Reverse KL!
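A quick sketch of why (standard GAN algebra, not the authors' exact derivation): with the discriminator trained to separate expert from policy state-action pairs, its optimum is

\[
D^{*}(s,a) \;=\; \frac{\rho^{\mathrm{exp}}(s,a)}{\rho^{\mathrm{exp}}(s,a) + \rho^{\pi}(s,a)},
\qquad
\log D^{*}(s,a) - \log\big(1 - D^{*}(s,a)\big) \;=\; \log\frac{\rho^{\mathrm{exp}}(s,a)}{\rho^{\pi}(s,a)} ,
\]

so taking this log-ratio as the policy's reward, as AIRL does, gives

\[
\max_{\pi}\;\mathbb{E}_{(s,a)\sim\rho^{\pi}}\!\left[\log\frac{\rho^{\mathrm{exp}}(s,a)}{\rho^{\pi}(s,a)}\right]
\;=\;
\min_{\pi}\;\mathrm{KL}\big(\rho^{\pi}(s,a)\,\|\,\rho^{\mathrm{exp}}(s,a)\big).
\]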


Natural Question: Other f-Divergences?

  • GAIL: JS; AIRL: Reverse KL
  • f-Div Examples: Reverse KL, JS, Forward KL, etc.
  • Why care? Induce different learned behaviours

[Figure: mode-seeking vs. mode-covering fits to a multi-modal target distribution]
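For completeness, the family in question (standard definitions, my notation): for a convex $f$ with $f(1)=0$,

\[
D_{f}\big(P\,\|\,Q\big) \;=\; \int q(x)\, f\!\left(\frac{p(x)}{q(x)}\right) dx ,
\]

where forward KL corresponds to $f(u) = u\log u$, reverse KL to $f(u) = -\log u$, and JS to its own generator; different choices of $f$ reward the policy differently and hence induce different learned behaviours.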


f-MAX: IRL with f-Div

  1. Objective: $\min_{\pi}\; D_{f}\big(\rho^{\mathrm{exp}}(s,a)\,\|\,\rho^{\pi}(s,a)\big)$

  2. Estimator: variational lower bound on $D_{f}$ with a learned critic $T_{\omega}(s,a)$

  3. Rewrite as RL: for fixed $T_{\omega}$, the policy maximizes $\mathbb{E}_{\rho^{\pi}}\big[f^{*}\big(T_{\omega}(s,a)\big)\big]$

  4. Policy Gradient: optimize the policy with any policy-gradient / RL algorithm
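Putting these together (a sketch using the standard variational lower bound on f-divergences, in my notation; $f^{*}$ is the convex conjugate of $f$):

\[
\min_{\pi}\,\max_{\omega}\;\;
\mathbb{E}_{(s,a)\sim\rho^{\mathrm{exp}}}\big[T_{\omega}(s,a)\big]
\;-\;
\mathbb{E}_{(s,a)\sim\rho^{\pi}}\big[f^{*}\big(T_{\omega}(s,a)\big)\big] ,
\]

so with $T_{\omega}$ held fixed the inner problem for the policy is a standard RL problem with reward $r(s,a) = f^{*}\big(T_{\omega}(s,a)\big)$.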

SKSG, Gu, Zemel, NeurIPS 2019


Different Behaviours

  • x-axis: log density ratio
  • y-axis: induced “reward”

[Figure: induced reward plotted against the log density ratio for AIRL (Reverse KL, mode-seeking), GAIL (JS), and FAIRL (Forward KL, mode-covering)]
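A small illustration of two of these curves (my own sketch, not the authors' figure), assuming the optimal-discriminator identity $D = \sigma(\ell)$ with $\ell = \log(\rho^{\mathrm{exp}}/\rho^{\pi})$ and the common $-\log(1-D)$ reward convention for GAIL; the FAIRL curve is omitted here:

import numpy as np
import matplotlib.pyplot as plt

# l is the log density ratio log(rho_exp / rho_pi); at the optimal
# discriminator D = sigmoid(l), so:
#   AIRL reward  log D - log(1 - D) = l            (unbounded below)
#   GAIL reward  -log(1 - D)        = softplus(l)  (bounded below by 0)
l = np.linspace(-5.0, 5.0, 200)
r_airl = l
r_gail = np.log1p(np.exp(l))  # softplus, numerically fine on this range

plt.plot(l, r_airl, label="AIRL: l")
plt.plot(l, r_gail, label="GAIL: softplus(l)")
plt.xlabel("log density ratio l")
plt.ylabel("induced reward")
plt.legend()
plt.show()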


Quick Breather

  • We showed how to derive IRL with any f-divergence
  • Goals:
    • Understand IRL vs. BC
    • Solve new problems

Max-Ent IRL $\iff$ Matching $\rho^{\pi}(s,a)$ to $\rho^{\mathrm{exp}}(s,a)$


IRL vs. BC

  • Hypothesis 1: Matching joint state-action marginals $\rho(s,a)$ vs. matching only conditional action distributions $\pi(a|s)$
  • Hypothesis 2: Mode-seeking vs. mode-covering divergences

  • AIRL: Reverse KL between $\rho^{\pi}(s,a)$ and $\rho^{\mathrm{exp}}(s,a)$ (mode-seeking)
  • GAIL: JS between $\rho^{\pi}(s,a)$ and $\rho^{\mathrm{exp}}(s,a)$
  • Standard BC: Forward KL between $\pi(a|s)$ and $\pi^{\mathrm{exp}}(a|s)$ on expert states (mode-covering)
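Written out (standard forms, my notation), the two objectives being contrasted are roughly

\[
\text{BC:}\quad
\max_{\theta}\; \mathbb{E}_{(s,a)\sim\rho^{\mathrm{exp}}}\big[\log \pi_{\theta}(a|s)\big]
\;\;\Longleftrightarrow\;\;
\min_{\theta}\; \mathbb{E}_{s\sim\rho^{\mathrm{exp}}(s)}\Big[\mathrm{KL}\big(\pi^{\mathrm{exp}}(\cdot|s)\,\|\,\pi_{\theta}(\cdot|s)\big)\Big],
\]

\[
\text{IRL (f-MAX):}\quad
\min_{\theta}\; D_{f}\big(\rho^{\mathrm{exp}}(s,a)\,\|\,\rho^{\pi_{\theta}}(s,a)\big),
\]

i.e. BC only matches conditional action distributions on expert states, while IRL also has to match where the policy goes.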


Empirical Evaluation

  • Compare: BC, reverse KL (AIRL), forward KL (FAIRL)
    • Controlling for the divergence used (FAIRL and BC both minimize a forward KL)
  • Results: AIRL = FAIRL >> BC

    • Hypothesis 1: additional state-marginal matching matters!

    • Hypothesis 2: Inconclusive in these domains


Pure State-Marginal Matching (SMM)?

  • Train policy such that its state marginal matches a target: $\rho^{\pi}(s) = p^{\mathrm{target}}(s)$
  • No need for expert demos or tuned reward functions
  • Used in [Hazan et al., 2018; Lee et al., 2019] to train exploration policies

[Figure from Lee et al., 2019]


Adversarial SMM

SMM $=$ $\min_{\pi}\, \mathrm{KL}\big(\rho^{\pi}(s)\,\|\,p^{\mathrm{target}}(s)\big)$ $=$ AIRL without actions!
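A minimal sketch of this idea (assumed class and function names, not the authors' code): a state-only discriminator trained to separate target-distribution states from policy states; for a sigmoid discriminator its logit equals the AIRL-style reward $\log D(s) - \log(1-D(s))$, an estimate of $\log p^{\mathrm{target}}(s) - \log \rho^{\pi}(s)$.

import torch
import torch.nn as nn

class StateDiscriminator(nn.Module):
    # Classifies states: label 1 for target-distribution samples, 0 for policy samples.
    def __init__(self, state_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, states: torch.Tensor) -> torch.Tensor:
        return self.net(states).squeeze(-1)  # logit of D(s)

def discriminator_loss(disc, target_states, policy_states):
    # Binary cross-entropy with target states as class 1 and policy states as class 0.
    bce = nn.functional.binary_cross_entropy_with_logits
    logits_t, logits_p = disc(target_states), disc(policy_states)
    return (bce(logits_t, torch.ones_like(logits_t))
            + bce(logits_p, torch.zeros_like(logits_p)))

def smm_reward(disc, states):
    # State-only AIRL-style reward: the logit is log D(s) - log(1 - D(s)).
    with torch.no_grad():
        return disc(states)

Maximizing this reward (together with an entropy term, as in Max-Ent IRL) pushes $\rho^{\pi}(s)$ toward $p^{\mathrm{target}}(s)$, i.e. it descends the KL objective above; no expert actions (or any actions) are ever needed.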

[Figures: qualitative comparison of a Random Policy and the learned SMM Policy]


Contributions

  • Unify recent IRL approaches
    • Analyzing behaviour of IRL methods

  • Use this intuition for:
    • Understanding IRL vs. BC
    • Developing new solutions to other problems

[Diagram: relationship between the Ho & Ermon (2016) framework, f-MAX, AIRL, and GAIL]


Thank you for listening!