1 of 26

Presenter (15 mins)

Seth Karten skarten@cs.cmu.edu

2 of 26

Dynamics-Aware Unsupervised Discovery of Skills

Archit Sharma∗, Shixiang Gu, Sergey Levine, Vikash Kumar, Karol Hausman

ICLR 2020

This paper had so many typos ☹

3 of 26

Problem Statement & Motivation

4 of 26

Related Work – End-to-End Hierarchical RL

  • A high-level controller operates at the time-scale of walking steps, while a low-level controller outputs target joint angles
  • BUT changes in the low-level controller's dynamics impact high-level controller behavior, impairing learning

(Peng et al., 2017)

5 of 26

Related Work – Model-based RL

  • Probabilistic ensembles with trajectory sampling (PETS)
  • Much lower sample complexity than model-free methods
  • Able to match model-free asymptotic performance in recent work

(Chua et al., 2018a)

6 of 26

Related Work – Diversity is All You Need (DIAYN)

7 of 26

Mutual Information for Unsupervised Skill Discovery

8 of 26

Mutual Information for Unsupervised Skill Discovery

  • Variational lower bound on the mutual information, since the KL divergence is non-negative
  • The intractable transition distribution p is replaced with a variational approximation q
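Spelled out (a sketch consistent with the paper's derivation; q_φ denotes the variational approximation to the skill-conditioned transition distribution):

```latex
I(s'; z \mid s)
  = \mathbb{E}_{z,s,s'}\!\left[\log \frac{p(s' \mid s, z)}{p(s' \mid s)}\right]
  \;\geq\; \mathbb{E}_{z,s,s'}\!\left[\log \frac{q_\phi(s' \mid s, z)}{p(s' \mid s)}\right],
```

where the gap is the expected KL divergence E_{z,s}[ D_KL( p(s'|s,z) ‖ q_φ(s'|s,z) ) ] ≥ 0.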

9 of 26

Mutual Information for Unsupervised Skill Discovery

  • The denominator p(s'|s) is also intractable, so approximate it by marginalizing the variational approximation q over skills sampled from the prior p(z)
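Concretely, sampling L skills from the prior gives the intrinsic reward (a sketch following the paper's notation):

```latex
r_z(s, s') \;=\; \log \frac{q_\phi(s' \mid s, z)}
                           {\frac{1}{L}\sum_{i=1}^{L} q_\phi(s' \mid s, z_i)},
\qquad z_i \sim p(z),
```

which approximates log q_φ(s'|s,z) − log p(s'|s), the lower bound on the mutual information.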

10 of 26

Planning Using Skill Dynamics

Using MPPI (Model Predictive Path Integral)
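A minimal MPPI sketch over a continuous skill space. Here `dynamics` stands in for the learned skill-dynamics model q(s'|s,z) and `reward` for the task reward used only at planning time; all names and the noise/iteration settings are illustrative, not from the authors' code:

```python
import numpy as np

def mppi_plan(dynamics, reward, s0, horizon=10, n_samples=64,
              n_iters=5, lam=1.0, z_dim=1, seed=0):
    """MPPI over skill sequences (illustrative sketch).

    dynamics(s, z): stand-in for the learned skill-dynamics model q(s'|s,z)
    reward(s):      task reward, evaluated only at planning time
    """
    rng = np.random.default_rng(seed)
    mean = np.zeros((horizon, z_dim))            # mean skill sequence
    for _ in range(n_iters):
        noise = rng.normal(scale=0.5, size=(n_samples, horizon, z_dim))
        plans = mean[None] + noise               # perturbed skill sequences
        returns = np.zeros(n_samples)
        for k in range(n_samples):
            s = s0
            for t in range(horizon):
                s = dynamics(s, plans[k, t])     # roll out with the learned model
                returns[k] += reward(s)
        # Path-integral update: exponentially weight plans by return
        w = np.exp((returns - returns.max()) / lam)
        w /= w.sum()
        mean = (w[:, None, None] * plans).sum(axis=0)
    return mean  # execute mean[0] in the environment, then re-plan (MPC-style)
```

In the paper each skill is held for several environment steps and re-planning happens repeatedly; this sketch compresses that into one skill per planning step.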

11 of 26

Mujoco Environments: Humanoid, Ant, Half-Cheetah

12 of 26

Experiment 1: Continuous Skill Spaces Allow Interpolation

13 of 26

Experiment 2: Lower Variance Primitives

14 of 26

Experiment 3: DADS vs. Model-Based RL

  • Random-MBRL – model trained on randomly collected trajectories
  • Weak-oracle MBRL – trajectories generated by the planner with a randomly sampled goal
  • Strong-oracle MBRL – trajectories generated by the planner to reach a specific goal

15 of 26

Experiment 4: Meta-controller vs MPPI controller

16 of 26

Strengths

17 of 26

Strength 1

  • Modularly learned transitions (skills) can be used with any controller (planner)

18 of 26

Strength 2

  • Interesting way to combine model-free training with model-based planning

19 of 26

Strength 3

  • Decoupling skill learning from task planning has the potential for generalization

20 of 26

Weaknesses

21 of 26

Weakness 1

  • Modularity only works if the learned skills span the task space
  • More skills may need to be learned for future tasks

22 of 26

Weakness 2

  • Unclear how short or long a learnable motion primitive can be

23 of 26

Weakness 3

  • There is no guarantee that the learned skills are useful
  • Many learned skills may look “cool” without being useful for downstream tasks

24 of 26

TL;DR/Summary: Key Insights

  • HRL
    • First, learn skills in a model-free manner
    • Then, use a model-based planner to solve the downstream task
    • Combines the asymptotic-performance benefits of model-free RL with the sample-complexity benefits of model-based RL
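The model-free phase optimizes an intrinsic reward computed from the learned skill-dynamics model. A minimal sketch, assuming a callable density `q(s, z, s_next)`; the function name and signature are illustrative, not the authors' code:

```python
import numpy as np

def intrinsic_reward(q, s, z, s_next, prior_skills):
    """DADS-style intrinsic reward: log q(s'|s,z) minus the log of the
    marginal over L skills sampled from the prior (plus log L)."""
    L = len(prior_skills)
    num = q(s, z, s_next)                               # likelihood under chosen skill
    denom = sum(q(s, zi, s_next) for zi in prior_skills)  # marginal over sampled skills
    return np.log(num) - np.log(denom) + np.log(L)
```

Transitions that are predictable under the chosen skill but unlikely under other skills score highest, which is what makes the learned skills both predictable and diverse.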

25 of 26

Q&A (1 min)

26 of 26

5 Discussion Points – send to TA, don’t include in slides

  1. Are all diverse skills worth learning?
  2. Is there a basis of skills that span any task?
  3. Is a two-step (skill, then planning) methodology scalable?
  4. Can the number of skills be learned rather than fixed as a hyperparameter?
  5. Would variable skill duration length improve performance?