Lab 5: How to Train Your Dog!
CS 123
modified from Jaden’s original slides
Let’s make Pupper walk!
Check out Jaden’s work: https://lgpl-gaits.github.io/
Sim2Real Locomotion Policy Learning
Lab Overview
Although this lab needs little to no coding, it is a very complicated lab in practice. Make sure to start early!
Google Colab Setup
(check lab document for concrete instructions)
Training Pupper to Walk with MuJoCo XLA (MJX)
What is MJX?
How We Train Pupper
Rewards in RL
Goal: optimize the reward
reward
state
action
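"Optimize the reward" is commonly read as maximizing the discounted sum of rewards collected over an episode. The sketch below illustrates that quantity under that assumption; the function name and the discount factor `gamma` are illustrative and do not come from the lab code.

```python
def discounted_return(rewards, gamma=0.99):
    """Return r_0 + gamma * r_1 + gamma^2 * r_2 + ... for one episode."""
    total = 0.0
    # Accumulate from the last step backwards: total = r_t + gamma * total.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


# Three steps with reward 1.0 each and a (deliberately small) gamma of 0.5.
episode_rewards = [1.0, 1.0, 1.0]
print(discounted_return(episode_rewards, gamma=0.5))  # 1 + 0.5 + 0.25 = 1.75
```

A smaller `gamma` makes the agent care more about immediate reward; a value near 1 makes it care about the whole episode.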
What’s a good reward…
Well… the reward is a function of the state and action at time t
state/observation: 12-dof motor positions + velocities, roll/pitch/yaw, base velocity / orientation, foot contact forces / positions, etc.
action: normalized 12-dof motor position commands
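One common way to turn a normalized action into motor position commands is to scale it and add it to a nominal standing pose. The sketch below assumes that convention; `action_scale`, `default_pose`, and the function name are hypothetical and may differ from what the lab's environment actually does.

```python
NUM_JOINTS = 12  # 12-dof: 3 motors per leg, 4 legs


def action_to_joint_targets(action, default_pose, action_scale=0.3):
    """Map a normalized action in [-1, 1] to joint position targets (radians).

    Assumed convention: target = default_pose + action_scale * action.
    """
    if len(action) != NUM_JOINTS or len(default_pose) != NUM_JOINTS:
        raise ValueError("expected 12 values for action and default_pose")
    targets = []
    for a, q0 in zip(action, default_pose):
        a = max(-1.0, min(1.0, a))  # clip the policy output to [-1, 1]
        targets.append(q0 + action_scale * a)
    return targets


default_pose = [0.0, 0.6, -1.2] * 4  # hypothetical standing pose
zero_action = [0.0] * NUM_JOINTS
print(action_to_joint_targets(zero_action, default_pose))  # same as default_pose
```

With this convention a zero action holds the nominal pose, so an untrained policy starts out close to standing.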
We want Pupper to follow a velocity command
Is a velocity-following reward sufficient?
In practice… NO
Why???
Teaching Pupper to walk is like teaching a toddler
What happens when you only use a velocity command reward…
Command: 0.05 m/s
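One plausible way a velocity-only reward goes wrong can be seen with a small calculation. The sketch below assumes an exponential tracking reward, exp(-error² / sigma), which is a common choice in legged-robot RL but is not confirmed to be the form used in this lab; `sigma` and the function name are illustrative.

```python
import math


def velocity_tracking_reward(commanded_vx, actual_vx, sigma=0.25):
    """Reward in (0, 1] that peaks when the actual velocity matches the command."""
    error = commanded_vx - actual_vx
    return math.exp(-(error ** 2) / sigma)


command = 0.05  # m/s, as on the slide
walking = velocity_tracking_reward(command, actual_vx=0.05)   # perfect tracking
standing = velocity_tracking_reward(command, actual_vx=0.0)   # not moving at all

print(f"walking:  {walking:.3f}")   # 1.000
print(f"standing: {standing:.3f}")  # 0.990
```

Under these assumptions, standing still already earns about 99% of the maximum reward for a 0.05 m/s command, and the reward says nothing about how the robot moves, so the optimizer has little reason to discover a proper gait.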
Can we just teach a baby to walk by giving candy when it goes forward??
Need to give it auxiliary tasks…
Shooting a basketball
Auxiliary rewards
Guide the gradient toward the correct optimum
Pupper needs auxiliary rewards too
How to encourage Pupper to walk with the correct gait?
Linear combination of differentiable rewards: r = w_1 r_1 + w_2 r_2 + … + w_n r_n
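A linear combination of rewards can be sketched as a weighted sum over named terms. The term names and weights below are hypothetical placeholders chosen for illustration; the actual terms used in the lab are defined in its rewards.py.

```python
def total_reward(terms, weights):
    """Weighted sum of reward terms: sum over i of w_i * r_i."""
    missing = set(terms) - set(weights)
    if missing:
        raise KeyError(f"no weight given for: {sorted(missing)}")
    return sum(weights[name] * value for name, value in terms.items())


# Hypothetical per-step reward terms and weights.
# A negative weight turns a term into a penalty.
weights = {"velocity_tracking": 1.0, "upright_posture": 0.5, "energy_penalty": -0.1}
terms = {"velocity_tracking": 0.8, "upright_posture": 0.9, "energy_penalty": 2.0}

print(total_reward(terms, weights))  # 0.8 + 0.45 - 0.2, about 1.05
```

Tuning then comes down to adjusting the weights: too small and an auxiliary term is ignored, too large and it drowns out the velocity-following objective.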
reward definitions: https://cs123-stanford.readthedocs.io/en/latest/_static/rewards.py
RL Workflow
Domain randomization
System identification is never perfect…
Which terms to randomize?
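Domain randomization typically means resampling physical parameters at the start of every training episode so the policy cannot overfit to one imperfect model. The sketch below shows only the sampling step; the parameter names and ranges are hypothetical and are not the lab's answer to which terms to randomize.

```python
import random

# Hypothetical parameters and ranges (low, high), for illustration only.
RANDOMIZATION_RANGES = {
    "friction": (0.6, 1.4),
    "body_mass_scale": (0.9, 1.1),
    "motor_kp_scale": (0.8, 1.2),
}


def sample_physics_params(rng, ranges=RANDOMIZATION_RANGES):
    """Draw one value per parameter, uniformly within its range."""
    return {name: rng.uniform(low, high) for name, (low, high) in ranges.items()}


rng = random.Random(0)  # seeded so runs are repeatable
for episode in range(3):
    params = sample_physics_params(rng)
    print(episode, params)  # a different simulated "robot" each episode
```

Wider ranges make the policy more robust to modeling error but also make the learning problem harder, so the ranges are themselves something to tune.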
Units are in meters.
Policy deployment
Run on Pupper: python3 deploy.py
Press different buttons on the controller to try out different walking gaits!
Challenge: Train the most agile policy (teaser for optional lab 2)
Optional Lab Release:
Quick Tips
Safety
General
General safety