Assistive Gym
Spring 2020
Lecture 4
Zackory Erickson
Assistive Gym
An open source physics simulation framework for assistive robotics
Assistive Gym
PyBullet physics engine
OpenAI Gym interface
Real-time simulation: train in the cloud, test on a laptop
PyBullet
An open source Python wrapper for the Bullet physics engine
Rigid body dynamics and position-based soft body dynamics (cloth, rope, etc.)
CPU and GPU rendering support
Quickstart Guide: https://bit.ly/36HKWyd
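A minimal PyBullet warm-up, separate from the quickstart guide linked above: connect to the physics server, load a model, and step the simulation. The model files shown ship with the pybullet_data package.

import pybullet as p
import pybullet_data

physics_client = p.connect(p.GUI)  # or p.DIRECT for headless simulation
p.setAdditionalSearchPath(pybullet_data.getDataPath())
p.setGravity(0, 0, -9.81)
plane_id = p.loadURDF('plane.urdf')
robot_id = p.loadURDF('r2d2.urdf', basePosition=[0, 0, 0.5])
for _ in range(240):  # simulate one second at the default 240 Hz timestep
    p.stepSimulation()
p.disconnect()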
Where is Bullet Physics Used?
Films
Games
PyBullet in Robotics
Erwin Coumans - Google Robotics (2014—)
Several PyBullet + robotics references at https://pybullet.org
S. James et al. “Sim-to-Real via Sim-to-Sim: Data-efficient Robotic Grasping via Randomized-to-Canonical Adaptation Networks,” CVPR, 2019.
A. Singla et al. “Realizing Learned Quadruped Locomotion Behaviors through Kinematic Motion Primitives,” ICRA, 2019.
J. Tan et al. “Sim-to-Real: Learning Agile Locomotion For Quadruped Robots,” RSS, 2018.
OpenAI Gym
Atari Breakout
Pendulum
Google DeepMind
Fetch Robot
PyBullet Roboschool
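All of these expose the same Gym interface that Assistive Gym adopts. A minimal sketch with the Pendulum task shown above, using the 2020-era Gym API:

import gym

env = gym.make('Pendulum-v0')
observation = env.reset()
for _ in range(200):
    action = env.action_space.sample()  # random torque on the pendulum
    observation, reward, done, info = env.step(action)
    if done:
        observation = env.reset()
env.close()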
Why Physics Simulation?
Safely learn
Parallelize data collection
Simulate many humans
Key Features of Assistive Gym
Environments
PR2
Jaco
Baxter
Sawyer
Human Model
Male and female models
40 controllable joints
Head
Torso
Waist
Arms / hands
Legs / feet
Robot Base Positioning
Joint-limit-weighted kinematic isotropy (JLWKI)
High manipulability near task goals (see the sketch below)
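A sketch of what JLWKI computes: kinematic isotropy of a joint-limit-weighted Jacobian. The weighting function below is a simple illustrative penalty, not the exact form used in the Assistive Gym paper.

import numpy as np

def jlwki(J, q, q_min, q_max):
    # J: (m x n) end effector Jacobian; q, q_min, q_max: (n,) joint state/limits
    # Weight in [0, 1] that falls to 0 as a joint approaches either limit
    w = 4 * (q - q_min) * (q_max - q) / (q_max - q_min) ** 2
    Jw = J * w  # scale each Jacobian column by its joint's weight
    m = J.shape[0]
    JJt = Jw @ Jw.T
    # Kinematic isotropy: geometric mean over arithmetic mean of JJ^T's spectrum
    return np.linalg.det(JJt) ** (1.0 / m) / (np.trace(JJt) / m)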
Human Preferences
Combined reward: r(s) = rR(s) + rH(s)
rR(s): robot reward from the environment for task progress
rH(s): human preference reward, covering physical and mental preferences
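In code, the combined reward might look like the sketch below. The weight vector omega and the specific penalty terms are illustrative assumptions, not the paper's exact formulation.

import numpy as np

def human_preference_reward(penalties, omega):
    # penalties: e.g. high end effector velocities, large contact forces, spills
    return -np.dot(omega, penalties)

def total_reward(r_robot, r_human):
    # r(s) = rR(s) + rH(s)
    return r_robot + r_human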
What can a robot learn with a static human?
How?
Assistive Gym Examples
Random Actions
import gym, assistive_gym

# Create a feeding environment with a Jaco arm and a human in the loop
env = gym.make('FeedingJacoHuman-v0')
env.render()
observation = env.reset()

while True:
    env.render()
    action = env.action_space.sample()  # Sample a random action
    observation, reward, done, info = env.step(action)
Teleoperation
Open teleop_example.py
import numpy as np
import pybullet as p

# Solve IK for a target end effector pose (link index 8 of the robot)
target_joint_positions = p.calculateInverseKinematics(env.robot, 8, position, orientation)
joint_positions, joint_velocities, joint_torques = env.get_motor_joint_states(env.robot)
# Simple proportional controller toward the IK solution
joint_action = (np.array(target_joint_positions) - np.array(joint_positions)) * 10
observation, reward, done, info = env.step(joint_action)
Reinforcement Learning
PPO - Proximal Policy Optimization
[Diagram: agent-environment loop. The agent receives an observation and reward from the environment and responds with an action.]
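At the core of PPO is a clipped surrogate objective. A minimal sketch of generic PPO, not the specific implementation bundled with Assistive Gym:

import numpy as np

def ppo_clip_loss(ratio, advantages, eps=0.2):
    # ratio: pi_theta(a|s) / pi_theta_old(a|s) for each sampled transition
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantages
    # Pessimistic (min) bound keeps policy updates close to the old policy
    return -np.mean(np.minimum(unclipped, clipped))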
What about active human motion?
Co-optimization
[Diagram: co-optimization loop. The environment sends observations and rewards to two agents, robot and human, and each agent sends back its own action.]
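A sketch of stepping a co-optimization environment, assuming the *Human-v0 variants take a single action vector that concatenates the robot's and human's joint actions. The action dimensions below are illustrative.

import gym, assistive_gym
import numpy as np

env = gym.make('FeedingJacoHuman-v0')
env.render()
observation = env.reset()
robot_action = np.zeros(7)   # 7-DoF Jaco arm (illustrative)
human_action = np.zeros(4)   # active human joints for this task (illustrative)
observation, reward, done, info = env.step(np.concatenate([robot_action, human_action]))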
Can Assistive Gym help train real robots?
Let’s Run Pretrained Policies
https://github.com/Healthcare-Robotics/assistive-gym/wiki
Let’s Train a New Control Policy
Training Command:
python3 -m ppo.train --env-name "ScratchItchJaco-v0" --num-env-steps 100000 --save-dir ./trained_models_new/
Now, evaluate the trained policy:
python3 -m ppo.enjoy --env-name "ScratchItchJaco-v0" --load-dir trained_models_new/ppo
We can use the --load-policy argument to continue training an existing policy.
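For example, assuming --load-policy takes the path to a saved policy file (check the repository for the exact argument format):

python3 -m ppo.train --env-name "ScratchItchJaco-v0" --num-env-steps 100000 --save-dir ./trained_models_new/ --load-policy ./trained_models_new/ppo/ScratchItchJaco-v0.pt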
Basic Code Structure
class NewTaskEnv(AssistiveEnv):
    def __init__(self, robot_type='pr2', human_control=False):
        # Set up the robot, human model, and task objects
        ...

    def step(self, action):
        # Apply the action, advance the physics, and compute the reward
        ...

    def _get_obs(self, forces, forces_human):
        # Build observation vectors for the robot (and human, if controlled)
        ...

    def reset(self):
        # Reset the simulation state and randomize the task
        ...
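To make the new task available through gym.make(), it also needs to be registered with Gym. A sketch using standard Gym registration; the id and entry_point values here are illustrative:

from gym.envs.registration import register

register(
    id='NewTaskPR2-v0',
    entry_point='assistive_gym:NewTaskEnv',
    max_episode_steps=200,
)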