1 of 26

Presenter (15 mins)

Seth Karten skarten@cs.cmu.edu

2 of 26

Dynamics-Aware Unsupervised Discovery of Skills

Archit Sharma∗, Shixiang Gu, Sergey Levine, Vikash Kumar, Karol Hausman

ICLR 2020

This paper had so many typos ☹

3 of 26

Problem Statement & Motivation

4 of 26

Related Work – End-to-End Hierarchical RL

  • A high-level controller operates at the time-scale of walking steps, while a low-level controller outputs target joint angles
  • BUT changes in the low-level controller's dynamics impact high-level controller behavior, impairing learning

(Peng et al., 2017)

5 of 26

Related Work – Model-based RL

  • Probabilistic ensembles with trajectory sampling (PETS)
  • Much lower sample complexity than model-free methods
  • Able to match model-free asymptotic performance in recent work

(Chua et al., 2018a)

6 of 26

Related Work – Diversity is All You Need (DIAYN)

7 of 26

Mutual Information for Unsupervised Skill Discovery

8 of 26

Mutual Information for Unsupervised Skill Discovery

  • Variational lower bound on the mutual information, since the KL divergence is non-negative
  • The intractable transition distribution p is replaced with a variational approximation q
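Spelled out (a sketch consistent with the paper's derivation; q_φ denotes the variational approximation to the skill-conditioned transition distribution):

```latex
I(s'; z \mid s)
  = \mathbb{E}_{z,s,s'}\!\left[\log \frac{p(s' \mid s, z)}{p(s' \mid s)}\right]
  \;\geq\; \mathbb{E}_{z,s,s'}\!\left[\log \frac{q_\phi(s' \mid s, z)}{p(s' \mid s)}\right],
```

where the gap is the expected KL divergence E_{z,s}[ D_KL( p(s'|s,z) ‖ q_φ(s'|s,z) ) ] ≥ 0.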

9 of 26

Mutual Information for Unsupervised Skill Discovery

  • The denominator p(s'|s) is also intractable, so approximate it by marginalizing the variational approximation q over skills sampled from the prior p(z)
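Concretely, sampling L skills from the prior gives the intrinsic reward (a sketch following the paper's notation):

```latex
r_z(s, s') \;=\; \log \frac{q_\phi(s' \mid s, z)}
                           {\frac{1}{L}\sum_{i=1}^{L} q_\phi(s' \mid s, z_i)},
\qquad z_i \sim p(z),
```

which approximates log q_φ(s'|s,z) − log p(s'|s), the lower bound on the mutual information.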

10 of 26

Planning Using Skill Dynamics

Using MPPI (Model Predictive Path Integral)
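A minimal MPPI sketch over a continuous skill space. Here `dynamics` stands in for the learned skill-dynamics model q(s'|s,z) and `reward` for the task reward used only at planning time; all names and the noise/iteration settings are illustrative, not from the authors' code:

```python
import numpy as np

def mppi_plan(dynamics, reward, s0, horizon=10, n_samples=64,
              n_iters=5, lam=1.0, z_dim=1, seed=0):
    """MPPI over skill sequences (illustrative sketch).

    dynamics(s, z): stand-in for the learned skill-dynamics model q(s'|s,z)
    reward(s):      task reward, evaluated only at planning time
    """
    rng = np.random.default_rng(seed)
    mean = np.zeros((horizon, z_dim))            # mean skill sequence
    for _ in range(n_iters):
        noise = rng.normal(scale=0.5, size=(n_samples, horizon, z_dim))
        plans = mean[None] + noise               # perturbed skill sequences
        returns = np.zeros(n_samples)
        for k in range(n_samples):
            s = s0
            for t in range(horizon):
                s = dynamics(s, plans[k, t])     # roll out with the learned model
                returns[k] += reward(s)
        # Path-integral update: exponentially weight plans by return
        w = np.exp((returns - returns.max()) / lam)
        w /= w.sum()
        mean = (w[:, None, None] * plans).sum(axis=0)
    return mean  # execute mean[0] in the environment, then re-plan (MPC-style)
```

In the paper each skill is held for several environment steps and re-planning happens repeatedly; this sketch compresses that into one skill per planning step.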

11 of 26

Mujoco Environments: Humanoid, Ant, Half-Cheetah

12 of 26

Experiment 1: Continuous Skill Spaces Allow Interpolation

13 of 26

Experiment 2: Lower Variance Primitives

14 of 26

Experiment 3: DADS vs. Model-Based RL

  • Random-MBRL – model trained on randomly collected trajectories
  • Weak-oracle MBRL – trajectories generated by the planner with a randomly sampled goal
  • Strong-oracle MBRL – trajectories generated by the planner to reach a specific goal

15 of 26

Experiment 4: Meta-controller vs MPPI controller

16 of 26

Strengths

17 of 26

Strength 1

  • Modularly learned transitions (skills) can be used with any controller (planner)

18 of 26

Strength 2

  • Interesting way to combine model-free training with model-based planning

19 of 26

Strength 3

  • Decoupling skill learning from task planning has the potential for generalization

20 of 26

Weaknesses

21 of 26

Weakness 1

  • Modularity only works if the learned skills span the task space
  • More skills may need to be learned for future tasks

22 of 26

Weakness 2

  • Unclear how short or long a learnable motion primitive can be

23 of 26

Weakness 3

  • There is no guarantee that the learned skills are useful
  • Many learned skills may look “cool” without being useful for downstream tasks

24 of 26

TL;DR/Summary: Key Insights

  • HRL
    • First, learn skills in a model-free manner
    • Then, use a model-based planner to solve the downstream task
    • Combines the asymptotic-performance benefits of model-free RL with the sample-complexity benefits of model-based RL
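The model-free phase optimizes an intrinsic reward computed from the learned skill-dynamics model. A minimal sketch, assuming a callable density `q(s, z, s_next)`; the function name and signature are illustrative, not the authors' code:

```python
import numpy as np

def intrinsic_reward(q, s, z, s_next, prior_skills):
    """DADS-style intrinsic reward: log q(s'|s,z) minus the log of the
    marginal over L skills sampled from the prior (plus log L)."""
    L = len(prior_skills)
    num = q(s, z, s_next)                               # likelihood under chosen skill
    denom = sum(q(s, zi, s_next) for zi in prior_skills)  # marginal over sampled skills
    return np.log(num) - np.log(denom) + np.log(L)
```

Transitions that are predictable under the chosen skill but unlikely under other skills score highest, which is what makes the learned skills both predictable and diverse.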

25 of 26

Q&A (1 min)

26 of 26

5 Discussion Points – send to TA, don’t include in slides

  1. Are all diverse skills worth learning?
  2. Is there a basis of skills that span any task?
  3. Is a two-step (skill, then planning) methodology scalable?
  4. Can the number of skills be learned rather than fixed as a hyperparameter?
  5. Would variable skill duration length improve performance?