Challenge: Robots find it difficult to learn in novel long-horizon environments with sparse rewards.
Proposal: Learning faster through expert intervention.
The Sticky Mittens Experiment
Woah, I can grasp now?!
Exposure to consequences of grasping before learning to grasp!
Option Template: Sticky Mitten in RL
Option Templates allow a robot to explore the consequences of a skill or option before learning its policy.
Learning with Option Templates in Fetch & Stack
Learning with Option Templates in Google Research Football
Learning with Option Templates in 2D (Mine)Craft: see our paper.
Three-order-of-magnitude improvement!
Two-order-of-magnitude improvement!
Exploring with Sticky Mittens: Reinforcement Learning with Expert Intervention via Option Templates
Kaustubh Sridhar, Souradeep Dutta, Osbert Bastani, Edgar Dobriban, James Weimer, Insup Lee, Julia Parish-Morris
The Dirty Laundry: the elegant Option Template is sometimes inelegant in practice.
Sim2Real: Teleportation
Real: Robot Arm with Mittens
Real: Suboptimal, Simple, Slow Planners/Controllers can be used for teleportation.
But using sticky mittens to decouple learning components is widely applicable!
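The slides note that simple, slow, suboptimal planners or controllers can perform the teleportation on a real robot. As a minimal sketch of that idea, assuming a hypothetical grid world with four moves, breadth-first search can serve as such a planner: it returns a sequence of moves to the nearest state that satisfies an option template's termination condition. The function `bfs_teleport` and its parameters are illustrative, not the paper's code.

```python
from collections import deque
from typing import Callable, Dict, List, Optional, Set, Tuple

State = Tuple[int, int]

# Hypothetical grid moves (x, y); "down" increases y.
MOVES: Dict[str, Tuple[int, int]] = {
    "up": (0, -1),
    "down": (0, 1),
    "left": (-1, 0),
    "right": (1, 0),
}


def bfs_teleport(start: State,
                 is_terminal: Callable[[State], bool],
                 walls: Set[State],
                 width: int,
                 height: int) -> Optional[List[str]]:
    """Return moves from `start` to the nearest state satisfying `is_terminal`
    (the template's termination condition), or None if none is reachable."""
    frontier = deque([start])
    parent: Dict[State, Optional[Tuple[State, str]]] = {start: None}
    while frontier:
        s = frontier.popleft()
        if is_terminal(s):
            # Walk parent pointers back to the start to recover the plan.
            plan: List[str] = []
            while parent[s] is not None:
                prev, move = parent[s]
                plan.append(move)
                s = prev
            return plan[::-1]
        for move, (dx, dy) in MOVES.items():
            nxt = (s[0] + dx, s[1] + dy)
            if (0 <= nxt[0] < width and 0 <= nxt[1] < height
                    and nxt not in walls and nxt not in parent):
                parent[nxt] = (s, move)
                frontier.append(nxt)
    return None


# Usage: 3x2 grid with a wall at (1, 0); the template terminates at (2, 0).
print(bfs_teleport((0, 0), lambda s: s == (2, 0), walls={(1, 0)}, width=3, height=2))
# ['down', 'right', 'right', 'up']
```

BFS is used here only because it is short and easy to verify; any planner or controller that reliably reaches the termination set could play the same role, which is why the expert is allowed to be slow or suboptimal.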
Sneak Peek into sticky mittens (simple differentiable programs) for representation learning:
Maintain ball possession option template
Detailed Presentation -- Exploring with Sticky Mittens: Reinforcement Learning with Expert Interventions via Option Templates
PRECISE Center
School of Engineering and Applied Science
University of Pennsylvania
December 9, 2022
with Souradeep Dutta, Osbert Bastani, Edgar Dobriban, James Weimer, Insup Lee, Julia Parish-Morris
Conference on Robot Learning (CoRL), 2022
Challenge: Robots find it difficult to learn in novel long-horizon environments with sparse rewards.
Proposal: Learning faster through expert intervention.
“….. The researchers placed the mittens on infants too young to actually grasp objects, but the mittens allowed the infants to snag Velcro-fitted toys merely by swiping at them. In comparisons with infants who hadn't used the mittens, found the psychologists, those who had used the mitten subsequently showed more sophisticated abilities to explore objects… “
Woah, I can grasp now?!
Actions.
Option.
Option Template = Option without Policy.
Option Templates
The option template allows an agent to explore the consequences of an option before learning it.
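In the standard options framework, an option consists of an initiation condition, an intra-option policy, and a termination condition. A minimal Python sketch of "option template = option without policy", assuming a hypothetical `teleport` callable that plays the sticky mitten by moving the agent into a state satisfying the termination condition; the class and field names are illustrative, not the paper's code.

```python
from dataclasses import dataclass
from typing import Callable, Hashable, Optional

State = Hashable  # hypothetical: any hashable state representation


@dataclass
class Option:
    """Standard option: initiation condition, intra-option policy, termination condition."""
    can_start: Callable[[State], bool]
    policy: Callable[[State], int]
    is_done: Callable[[State], bool]


@dataclass
class OptionTemplate:
    """Option template = option without a policy.

    `teleport` is the sticky mitten: a hypothetical expert intervention
    (simulator reset or simple planner) that moves the agent into a state
    satisfying `is_done`, so the agent experiences the option's outcome
    before any policy for it has been learned.
    """
    can_start: Callable[[State], bool]
    is_done: Callable[[State], bool]
    teleport: Callable[[State], State]

    def execute(self, s: State) -> Optional[State]:
        """Return the post-option state, or None if the option cannot start in s."""
        if not self.can_start(s):
            return None
        s_next = self.teleport(s)
        if not self.is_done(s_next):
            raise ValueError("teleport did not reach a terminal state of the template")
        return s_next

    def with_policy(self, policy: Callable[[State], int]) -> Option:
        """Once a policy has been learned, the template becomes an ordinary option."""
        return Option(self.can_start, policy, self.is_done)


# Usage: a toy 'grasp' template on states (gripper_x, holding_block).
grasp = OptionTemplate(
    can_start=lambda s: not s[1],
    is_done=lambda s: s[1],
    teleport=lambda s: (s[0], True),
)
print(grasp.execute((3, False)))  # (3, True)
print(grasp.execute((3, True)))   # None
```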
Option Template Learning in Fetch and Stack.
Start → Place Green Block → Place Yellow Block → Place Blue Block → Place Violet Block → 4 blocks stacked in the correct order
Sub-options of a block placement (shown for block 3): Reach Block 3 → Pick Block 3 and Reach Goal → Release Block and Lift Gripper
Option Template Learning in Fetch and Stack.
Fetch and Stack → {Place Green Block, Place Yellow Block, Place Blue Block, Place Violet Block} → {Reach Block, Pick and Reach Goal, Release and lift gripper} → Environment Actions
Compared: Option Templates Guided RL (Ours) vs. Hierarchical RL (baseline)
Reward inherited from the environment.
Reward = termination condition of the option template.
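Using the option template's termination condition as the reward turns each option into a small, well-posed learning problem. A minimal sketch under simplifying assumptions: a hypothetical 1D corridor stands in for the arm (think "Reach Block"), and value iteration on a known model stands in for whatever RL learner is used in practice; only the reward definition reflects the slide.

```python
from typing import Callable, Dict, List

ACTIONS = (-1, +1)  # move left / right along a hypothetical 1D corridor


def step(s: int, a: int, n_states: int) -> int:
    """Deterministic corridor dynamics; the walls clamp the position."""
    return min(max(s + a, 0), n_states - 1)


def template_reward(is_done: Callable[[int], bool], s_next: int) -> float:
    """Reward = the option template's termination condition (1 if reached, else 0)."""
    return 1.0 if is_done(s_next) else 0.0


def option_values(n_states: int, is_done: Callable[[int], bool],
                  gamma: float = 0.9, iters: int = 100) -> List[float]:
    """Value iteration that uses only the template-derived reward."""
    v = [0.0] * n_states
    for _ in range(iters):
        v = [0.0 if is_done(s) else  # option has terminated: no further reward
             max(template_reward(is_done, step(s, a, n_states))
                 + gamma * v[step(s, a, n_states)] for a in ACTIONS)
             for s in range(n_states)]
    return v


def greedy_policy(v: List[float], is_done: Callable[[int], bool],
                  gamma: float = 0.9) -> Dict[int, int]:
    """Greedy action in every non-terminal state."""
    n = len(v)
    return {s: max(ACTIONS, key=lambda a: template_reward(is_done, step(s, a, n))
                   + gamma * v[step(s, a, n)])
            for s in range(n) if not is_done(s)}


def at_block(s: int) -> bool:
    """Hypothetical 'Reach Block' termination condition: the block is at cell 4."""
    return s == 4


print(greedy_policy(option_values(6, at_block), at_block))
# {0: 1, 1: 1, 2: 1, 3: 1, 5: -1}
```

The learned policy moves toward the block from both sides even though the environment's own (sparse, end-of-task) reward never appears.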
Option Template Learning in Fetch and Stack.
Comparison of steps to solve environment | 3 blocks | 4 blocks |
Learning with demonstrations (Nair et al., 2017) | 3.5 × 10^8 | 8 × 10^8 |
Hierarchical RL (bottom-up) | Only learns to place 1 block. | |
Option Templates (top-down) | 4.5 × 10^5 | 6.0 × 10^5 |
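The speedup over the demonstration baseline follows directly from the table: divide the baseline's step count by the option-template step count. A short check, with the numbers copied from the table and the names `STEPS` and `speedups` chosen for illustration:

```python
import math
from typing import Dict

# Steps to solve Fetch & Stack, copied from the table above.
STEPS: Dict[str, Dict[str, float]] = {
    "3 blocks": {"demonstrations": 3.5e8, "option_templates": 4.5e5},
    "4 blocks": {"demonstrations": 8.0e8, "option_templates": 6.0e5},
}


def speedups(steps: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Ratio of baseline steps to option-template steps for each setting."""
    return {k: v["demonstrations"] / v["option_templates"] for k, v in steps.items()}


for setting, ratio in speedups(STEPS).items():
    print(f"{setting}: {ratio:.0f}x fewer steps (10^{math.log10(ratio):.1f})")
# 3 blocks: 778x fewer steps (10^2.9)
# 4 blocks: 1333x fewer steps (10^3.1)
```

Both ratios round to roughly three orders of magnitude.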
Option Template Learning in Fetch and Stack.
HAC = Hierarchical Actor-Critic
Complicated multi-agent planning task!
Google Research Football
Actions: 8 movement directions, high pass, short pass, long pass, shoot, dribble, sprint, and other actions, for all eleven players on a team!
State information includes positions, velocities, and one-hot encodings of our team and the opponents.
Reward of 1 when a goal is scored!
Google Research Football
Start (kick off; opponent team in possession) → Defend → Attack and Score Goal (our team has the ball) → Goal = Score Goal!
Attack and Score Goal: Maintain ball possession → Charge to the Goal → Shoot
Option Template Learning in Google Research Football
Win Game → {Defend, Attack and Score Goal} → {Defend, Maintain Ball Possession, Charge Goal, Shoot} → Environment Actions
Compared: Option Templates Guided RL (Ours) vs. Hierarchical RL (baseline)
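One way to see the difference between the two approaches is the order in which levels of the hierarchy get learned. A minimal sketch, assuming each node is trained to completion before the next one starts (a simplification) and using a hypothetical dictionary encoding of the soccer hierarchy that covers only the attack branch's sub-options: top-down learning visits the tree breadth-first, because option templates let the high-level policy train before any low-level policy exists, whereas bottom-up learning needs a post-order traversal. Names such as `HIERARCHY` and `top_down_order` are illustrative.

```python
from collections import deque
from typing import Dict, List

# Hypothetical encoding of the soccer hierarchy (parent -> child options).
# Leaves sit directly on top of the environment's primitive actions.
HIERARCHY: Dict[str, List[str]] = {
    "Win Game": ["Defend", "Attack and Score Goal"],
    "Attack and Score Goal": ["Maintain Ball Possession", "Charge Goal", "Shoot"],
}


def top_down_order(tree: Dict[str, List[str]], root: str) -> List[str]:
    """Breadth-first: the highest level is learned first, with its children
    executed as option templates (teleported), then each level below in turn."""
    order: List[str] = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        queue.extend(tree.get(node, []))
    return order


def bottom_up_order(tree: Dict[str, List[str]], root: str) -> List[str]:
    """Post-order: a parent can only be trained after all its child skills exist."""
    order: List[str] = []

    def visit(node: str) -> None:
        for child in tree.get(node, []):
            visit(child)
        order.append(node)

    visit(root)
    return order


print(top_down_order(HIERARCHY, "Win Game"))
print(bottom_up_order(HIERARCHY, "Win Game"))
```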
Option Template Learning in Google Research Football
Similar (easy) or better (medium, hard) performance with two orders of magnitude fewer steps.
Option Template Learning in Google Research Football
Option Templates Guided RL (top-down) vs. Traditional Hierarchical RL, e.g. Option-value iteration (bottom-up)
*Primitive actions = {up, down, left, right, USE}
Option Template Learning in Craft
Two orders of magnitude fewer steps.
Top-down learning outperforms bottom-up learning (option-value iteration).
Sneak Peek -- Fake it till you Learn it: Program Guided Representation Learning
with Neelay Velingker, Ziyang Li, Souradeep Dutta, Osbert Bastani, Mayur Naik, Insup Lee
RL Problem: Map images to actions.
Standard RL (images → Neural Network → actions): High Sample Complexity
Representations: (x, y) – coordinates of agent and goal.
Simple differentiable (possibly sub-optimal) Program = sticky mitten
Sticky Mittens in Representation Learning
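The slide pairs a neural network that extracts (x, y) coordinates of the agent and goal from images with a simple, differentiable, possibly sub-optimal program that acts on those coordinates. A minimal sketch of such a program, assuming a hypothetical four-move navigation task; `program_policy` and its distance-based scoring are illustrative and not the program used in this work. A finite-difference check stands in for autodiff to show that the output responds smoothly to the coordinates.

```python
import math
from typing import Dict, Tuple

# Hypothetical move set for a 2D navigation task (not taken from the slides).
MOVES: Dict[str, Tuple[float, float]] = {
    "up": (0.0, 1.0),
    "down": (0.0, -1.0),
    "left": (-1.0, 0.0),
    "right": (1.0, 0.0),
}


def program_policy(agent: Tuple[float, float], goal: Tuple[float, float],
                   temperature: float = 1.0) -> Dict[str, float]:
    """Hypothetical 'sticky mitten' program: score each move by the negative
    squared distance to the goal after taking it, then apply a softmax.

    The output is smooth in the input coordinates, so when (agent, goal) are
    predicted by a neural network from images, an autodiff framework could
    backpropagate a loss on these probabilities into that network.
    """
    scores = {}
    for name, (dx, dy) in MOVES.items():
        nx, ny = agent[0] + dx, agent[1] + dy
        scores[name] = -((nx - goal[0]) ** 2 + (ny - goal[1]) ** 2) / temperature
    shift = max(scores.values())  # subtracted for numerical stability only
    exps = {k: math.exp(v - shift) for k, v in scores.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


def d_prob_d_goal_x(agent, goal, move, eps=1e-5):
    """Central finite difference: how P(move) changes as the goal shifts in x."""
    hi = program_policy(agent, (goal[0] + eps, goal[1]))[move]
    lo = program_policy(agent, (goal[0] - eps, goal[1]))[move]
    return (hi - lo) / (2 * eps)


probs = program_policy((0.0, 0.0), (3.0, 0.0))
print(max(probs, key=probs.get))                             # right
print(d_prob_d_goal_x((0.0, 0.0), (3.0, 0.0), "right") > 0)  # True
```

Because the program already encodes "move toward the goal", the network only has to learn where the agent and goal are, which is the decoupling the sticky mitten provides.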
Optimal Policy
In the form of Linear Bellman Completeness,
but with a given program (the sticky mitten),
Problem Formulation: Programmatic Bellman Completeness
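For reference, the standard discounted form of linear Bellman completeness that the formulation is modeled on is given below, assuming a feature map φ: S × A → R^d, reward r, discount γ, and transition kernel P (in the literature θ is usually restricted to a bounded set). It states that applying the Bellman backup to any linear Q-function yields another linear Q-function in the same features. This is the standard definition, not the programmatic version.

```latex
\forall\, \theta \in \mathbb{R}^d \;\; \exists\, \theta' \in \mathbb{R}^d : \quad
\phi(s,a)^\top \theta'
  \;=\; r(s,a) + \gamma\, \mathbb{E}_{s' \sim P(\cdot \mid s,a)}
        \Big[ \max_{a'} \phi(s',a')^\top \theta \Big]
\qquad \forall\, (s,a) \in \mathcal{S} \times \mathcal{A}
```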
Pacman Case Study
| Pure Neural DQN | Program Guided Representation Learning |
#Training Episodes | 50K | 50 |
Testing Success Rate | 84.90% | 99.40% |
Further-In-the-Future Work