1 of 32

Challenge: Robots find it difficult to learn in novel long-horizon environments with sparse rewards.

Proposal: Learning faster through expert intervention.

The Sticky Mittens Experiment

  • Infants too young to grasp wear “sticky mittens” and play with Velcro toys.
  • After the mittens are removed, they show better manipulation skills than other infants.

Woah, I can grasp now?!

Exposure to consequences of grasping before learning to grasp!

Option Template: Sticky Mitten in RL

Option Templates allow a robot to explore the consequences of a skill or option before learning its policy.

 

 

Learning with Option Templates in Fetch & Stack

Learning with Option Templates in Google Research Football

Learning with Option Templates in 2D (Mine)Craft (see our paper).

Three orders of magnitude improvement!

Two orders of magnitude improvement!

Exploring with Sticky Mittens: Reinforcement Learning with Expert Intervention via Option Templates

Kaustubh Sridhar, Souradeep Dutta, Osbert Bastani, Edgar Dobriban, James Weimer, Insup Lee, Julia Parish-Morris

2 of 32

The dirty laundry: the elegant option template is sometimes inelegant in practice.

Sim2Real: Teleportation

Real: Robot Arm with Mittens

Real: Suboptimal, simple, slow planners/controllers can be used for teleportation.

But using sticky mittens to decouple learning components is widely applicable!

Sneak peek into sticky mittens (simple differentiable programs) for representation learning:

Maintain ball possession option template


3 of 32

Detailed Presentation -- Exploring with Sticky Mittens: Reinforcement Learning with Expert Interventions via Option Templates

PRECISE Center

School of Engineering and Applied Science

University of Pennsylvania

December 9, 2022

with Souradeep Dutta, Osbert Bastani, Edgar Dobriban, James Weimer, Insup Lee, Julia Parish-Morris

Conference on Robot Learning (CoRL), 2022


4 of 32

Challenge: Robots find it difficult to learn in novel long-horizon environments with sparse rewards.

Proposal: Learning faster through expert intervention.


5 of 32

“… The researchers placed the mittens on infants too young to actually grasp objects, but the mittens allowed the infants to snag Velcro-fitted toys merely by swiping at them. In comparisons with infants who hadn't used the mittens, found the psychologists, those who had used the mitten subsequently showed more sophisticated abilities to explore objects …”

Woah, I can grasp now?!


6 of 32

a

Actions.


7 of 32

Option.

 



9 of 32

Option Template = Option without Policy.

 

 

Option Templates

The option template allows an agent to explore the consequences of an option before learning it.
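A minimal Python sketch of this distinction, assuming the usual view of an option as an initiation condition, a policy, and a termination condition. All class, field, and function names are hypothetical, and `teleport` stands in for whatever expert intervention moves the agent to a state where the option has terminated.

```python
from dataclasses import dataclass
from typing import Callable, Optional

State = tuple   # hypothetical state type, e.g. (position, holding_block)
Action = int    # hypothetical primitive action type


@dataclass
class OptionTemplate:
    """An option without a policy: only where it may start and when it ends."""
    name: str
    can_start: Callable[[State], bool]   # initiation condition
    is_done: Callable[[State], bool]     # termination condition


@dataclass
class Option(OptionTemplate):
    """A full option: the template plus a (later learned) low-level policy."""
    policy: Optional[Callable[[State], Action]] = None


def explore_with_template(template: OptionTemplate,
                          state: State,
                          teleport: Callable[[State], State]) -> State:
    """Experience an option's outcome before its policy exists.

    `teleport` is an assumed expert intervention that returns a state in
    which the template's termination condition holds.
    """
    if not template.can_start(state):
        raise ValueError(f"{template.name} cannot start in {state}")
    next_state = teleport(state)
    assert template.is_done(next_state), "expert must satisfy termination"
    return next_state


# Toy usage: the "grasp" template ends once the block is held.
grasp = OptionTemplate(
    name="grasp",
    can_start=lambda s: not s[1],
    is_done=lambda s: s[1],
)
after = explore_with_template(grasp, (0, False), lambda s: (s[0], True))
print(after)  # (0, True)
```

The agent can plan over `grasp` right away; filling in `Option.policy` is deferred until the high-level behaviour is worth the effort.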



11 of 32

Option Template Learning in Fetch and Stack.

Start

  • Place Green Block
  • Place Yellow Block
  • Place Blue Block
  • Place Violet Block

4 blocks stacked in the correct order

Each placement decomposes into lower-level option templates (shown for Block 3):

  • Reach Block 3
  • Pick Block 3 and Reach Goal
  • Release Block and Lift Gripper



13 of 32

Option Template Learning in Fetch and Stack.

Hierarchy (top to bottom):

  • Fetch and Stack
  • Place Green Block, Place Yellow Block, Place Blue Block, Place Violet Block
  • Reach Block, Pick and Reach Goal, Release and Lift Gripper
  • Environment Actions

Option Templates Guided RL (Ours)

Hierarchical RL (baseline)

Reward inherited from env.

Reward = termination condition of option template.
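One way to read the two reward notes above, sketched in Python. It assumes the high-level policy over option templates keeps the environment reward unchanged, while each option's own policy is trained on a sparse reward that fires when its template's termination condition holds. The function names and the dictionary state are hypothetical.

```python
from typing import Callable

State = dict  # hypothetical state: block name -> True once placed


def option_reward(is_done: Callable[[State], bool], next_state: State) -> float:
    """Low-level reward: 1.0 exactly when the template's termination holds."""
    return 1.0 if is_done(next_state) else 0.0


def high_level_reward(env_reward: float) -> float:
    """High-level reward: passed through unchanged from the environment."""
    return env_reward


def green_placed(state: State) -> bool:
    """Hypothetical termination condition for the "Place Green Block" template."""
    return state.get("green", False)


print(option_reward(green_placed, {"green": False}))  # 0.0
print(option_reward(green_placed, {"green": True}))   # 1.0
```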


14 of 32

Option Template Learning in Fetch and Stack.

Comparison of steps to solve environment

Method                                             3 blocks      4 blocks
Learning with demonstrations (Nair et al., 2017)   3.5 x 10^8    8 x 10^8
Hierarchical RL (bottom-up)                        Only learns to place 1 block.
Option Templates (top-down)                        4.5 x 10^5    6.0 x 10^5
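The size of the gap can be checked with a short calculation. This sketch reads the reported counts as powers of ten (3.5 x 10^8, 8 x 10^8, 4.5 x 10^5, 6.0 x 10^5); the dictionary layout and function name are illustrative only.

```python
import math

# Steps to solve, as reported in the comparison above.
steps = {
    "3 blocks": {"demonstrations": 3.5e8, "option_templates": 4.5e5},
    "4 blocks": {"demonstrations": 8e8, "option_templates": 6.0e5},
}


def orders_of_magnitude(baseline: float, ours: float) -> float:
    """Base-10 logarithm of the ratio between two step counts."""
    return math.log10(baseline / ours)


for task, s in steps.items():
    gain = orders_of_magnitude(s["demonstrations"], s["option_templates"])
    print(f"{task}: {gain:.2f} orders of magnitude fewer steps")
```

Both ratios come out at roughly three orders of magnitude, in line with the headline claim for Fetch and Stack.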


15 of 32

Option Template Learning in Fetch and Stack.

HAC = Hierarchical Actor-Critic


16 of 32

Complicated multi-agent planning task!

Google Research Football


17 of 32

Actions (for all eleven players on a team!):

  • 8 movement directions
  • high pass
  • short pass
  • long pass
  • shoot, dribble, sprint, and other actions…

State information includes positions, velocities, and one-hot encodings of our team and opponents.

Reward of 1 when a goal is scored!

Google Research Football


18 of 32

Option Template Learning in Google Research Football

Start: Kick off. Opponent team in possession.

  • Defend
  • Attack and Score Goal (once our team has the ball): Maintain ball possession, Charge to the Goal, Shoot

Goal = Score Goal!


19 of 32

Option Template Learning in Google Research Football

Hierarchy (top to bottom):

  • Win Game
  • Defend, Attack and Score Goal
  • Defend, Maintain Ball Possession, Charge Goal, Shoot
  • Environment Actions

Option Templates Guided RL (Ours)

Hierarchical RL (baseline)


20 of 32

Similar (easy) or better (medium, hard) performance with two orders of magnitude fewer steps.

Option Template Learning in Google Research Football


21 of 32

Option Template Learning in Google Research Football



23 of 32

Option Templates Guided RL (top-down)

Traditional Hierarchical RL, e.g. option-value iteration (bottom-up)

*Primitive actions = {up, down, left, right, USE}

Option Template Learning in Craft

Two orders of magnitude fewer steps.

Top-down learning outperforms bottom-up (option-value iteration).
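A toy sketch of the top-down phase, written as tabular Q-learning over option templates. It is not the method from the paper: the task borrows the block names from the Fetch and Stack slides, `teleport` is an assumed expert intervention that produces each template's outcome in a single high-level step, and every name and hyperparameter is hypothetical.

```python
import random

CORRECT_ORDER = ["green", "yellow", "blue", "violet"]  # hypothetical target order
TEMPLATES = list(CORRECT_ORDER)                          # one "place" template per block


def teleport(num_placed: int, template: str):
    """Assumed expert intervention: jump straight to the template's outcome.

    Returns (next_num_placed, reward, done). Placing the wrong block ends the
    episode with no reward; the environment pays 1 only for the full stack.
    """
    if template != CORRECT_ORDER[num_placed]:
        return num_placed, 0.0, True
    num_placed += 1
    done = num_placed == len(CORRECT_ORDER)
    return num_placed, (1.0 if done else 0.0), done


def learn_high_level(episodes: int = 20000, alpha: float = 0.5,
                     gamma: float = 0.9, seed: int = 0):
    """Tabular Q-learning over templates with a uniformly random behavior policy."""
    rng = random.Random(seed)
    q = [{t: 0.0 for t in TEMPLATES} for _ in CORRECT_ORDER]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            template = rng.choice(TEMPLATES)
            nxt, reward, done = teleport(state, template)
            target = reward if done else reward + gamma * max(q[nxt].values())
            q[state][template] += alpha * (target - q[state][template])
            state = nxt
    return q


q_table = learn_high_level()
plan = [max(q_s, key=q_s.get) for q_s in q_table]
print(plan)
```

Because each template costs one high-level step, the sparse reward is reached after at most four decisions. A bottom-up learner would first have to master every low-level skill from primitive actions before seeing any reward for the full task.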


24 of 32

Sneak Peek -- Fake it till you Learn it: Program Guided Representation Learning

with Neelay Velingker, Ziyang Li, Souradeep Dutta, Osbert Bastani, Mayur Naik, Insup Lee


25 of 32

Standard RL: High Sample Complexity

Representations: (x, y) coordinates of agent and goal.

Neural Network

Simple differentiable (possibly sub-optimal) Program =

RL Problem: Map images to actions.

Sticky Mittens in Representation Learning
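The pipeline on this slide (image, then neural network, then coordinate representations, then a simple program that picks actions) can be sketched in plain Python. Everything here is an assumption for illustration: the character-grid "image", the hand-written `encode` standing in for the neural network, and the softmax-over-distances `program`. The program on the slide itself is not recoverable from the text.

```python
import math

# Hypothetical action set for a grid world: name -> (dx, dy).
MOVES = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}


def encode(image):
    """Stand-in for the neural network: image -> (agent_xy, goal_xy).

    Here the "image" is a grid of characters and the coordinates are read off
    directly; in the setting on the slide this mapping would be learned.
    """
    agent = goal = None
    for y, row in enumerate(image):
        for x, cell in enumerate(row):
            if cell == "A":
                agent = (float(x), float(y))
            elif cell == "G":
                goal = (float(x), float(y))
    return agent, goal


def program(agent, goal, temperature=1.0):
    """Simple, possibly sub-optimal program: softly prefer moves toward the goal.

    The output is a softmax over negative squared distances, so it is a smooth
    function of the coordinates and gradients could flow back to an encoder.
    """
    scores = {}
    for name, (dx, dy) in MOVES.items():
        nx, ny = agent[0] + dx, agent[1] + dy
        scores[name] = -((nx - goal[0]) ** 2 + (ny - goal[1]) ** 2) / temperature
    top = max(scores.values())
    exps = {name: math.exp(s - top) for name, s in scores.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}


image = ["A.G",
         "...",
         "..."]
probs = program(*encode(image))
print(max(probs, key=probs.get))  # right
```

The program ignores walls and other obstacles, which is what makes it simple and possibly sub-optimal. Its role is to give the representation a useful training signal, not to solve the task.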


26 of 32

Standard RL: High Sample Complexity

Representations: (x, y) coordinates of agent and goal.

Neural Network

RL Problem: Map images to actions.

Sticky Mittens in Representation Learning

Optimal Policy


27 of 32

In the shape of Linear Bellman Completeness,

But, with given program (sticky mitten),

Problem Formulation: Programmatic Bellman Completeness
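As background, linear Bellman completeness for a feature map \phi is commonly stated as below. The symbols (\phi, \theta, r, \gamma, P) follow the usual convention and are not taken from the slide. The programmatic variant presumably replaces the linear function class with functions computed by the given program over learned representations, which is an assumption here.

```latex
\forall\, \theta \in \mathbb{R}^d \;\; \exists\, \theta' \in \mathbb{R}^d
\ \text{such that}\ \forall (s,a):\quad
\phi(s,a)^\top \theta'
  \;=\; r(s,a) + \gamma\, \mathbb{E}_{s' \sim P(\cdot \mid s,a)}
  \Big[ \max_{a'} \phi(s',a')^\top \theta \Big]
```

In words: applying the Bellman backup to any function in the class yields another function in the same class.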


28 of 32

Pacman Case Study



30 of 32

Pacman Case Study

                        Pure Neural DQN    Program Guided Representation Learning
#Training Episodes      50K                50
Testing Success Rate    84.90%             99.40%
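A quick check of what these figures imply, reading "50K" as 50,000 episodes. The variable names are illustrative only.

```python
# Figures reported in the Pacman comparison above.
dqn = {"episodes": 50_000, "success": 84.90}
program_guided = {"episodes": 50, "success": 99.40}

episode_ratio = dqn["episodes"] // program_guided["episodes"]
success_gain = round(program_guided["success"] - dqn["success"], 2)

print(f"{episode_ratio}x fewer training episodes")      # 1000x fewer training episodes
print(f"+{success_gain} points of test success rate")   # +14.5 points of test success rate
```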



32 of 32

  • Exciting extensions into representation learning for Google Research Football, CARLA autonomous driving, and Atari games.

  • A proof of learning the induced representations.

  • A future towards sticky mittens for decoupling intertwined and challenging components of learning!

Further-In-the-Future Work
