1 of 56

Presenter (15 mins)

Himangi Mittal <hmittal@andrew.cmu.edu>

2 of 56

Learning a Unified Policy for Whole-Body Control of Manipulation and Locomotion

Zipeng Fu*, Xuxin Cheng*, Deepak Pathak

CoRL 2022 (Oral)

3 of 56

Problem Statement/Motivation

Legged-only robots have achieved impressive performance in the last decade in challenging outdoor and indoor terrain.

Kumar, Ashish, et al. "RMA: Rapid motor adaptation for legged robots." arXiv preprint arXiv:2107.04034 (2021).

4 of 56

Problem Statement/Motivation

However, legged-only robots have strong limitations in what they can achieve.

Pick and place objects

Pressing button

Everyday tasks require some form of manipulation.

5 of 56

Attaching an arm can significantly increase the ability of legged robots to perform mobile manipulation tasks

6 of 56

Attaching an arm can significantly increase the ability of legged robots to perform mobile manipulation tasks

Not an easy task!

7 of 56

Challenges (1)

High-DoF control

  • Robot shown in the Figure has 19 degrees of freedom.
  • Control is dynamic and continuous which leads to an exponentially large search space.

8 of 56

Challenges (2)

Conflicting policies

  • Training is prone to learning only one of locomotion or manipulation well.

9 of 56

Challenges (3)

Dependency

  • Performance of manipulation is bounded until the legs can adapt
  • Coordination is needed between arms and legs
  • Error propagation between modules

10 of 56

Challenges (4)

Cost ($$$$$) and Hardware

Spot Arm from Boston Dynamics

(has pre-designed controllers, cannot be changed)

ANYmal robot with a custom arm (ANYBotics)

11 of 56

Challenges (4)

Cost ($$$$$) and Hardware

Spot Arm from Boston Dynamics

(has pre-designed controllers, cannot be changed)

ANYmal robot with a custom arm (ANYBotics)

Expensive ($100K)!!

12 of 56

Related Work (1)

Go Fetch! - Dynamic Grasps using Boston Dynamics Spot with External Robotic Arm (Simon Zimmermann, Roi Poranne, Stelian Coros)

13 of 56

Related Work (2)

Combining Learning-based Locomotion Policy with Model-based Manipulation for Legged Mobile Manipulators (Yuntao Ma, Farbod Farshidian, Takahiro Miki, Joonho Lee, Marco Hutter)

14 of 56

Related Work (2)

ALMA - Articulated Locomotion and Manipulation for a Torque-Controllable Robot (C. Dario Bellicoso, Koen Krämer, Markus Stäuble, Dhionis Sako, Fabian Jenelten, Marko Bjelonic, Marco Hutter)

15 of 56

Related Work (3)

RMA: Rapid Motor Adaptation for Legged Robots

(Ashish Kumar, Zipeng Fu, Deepak Pathak, Jitendra Malik)

Has a similar online real-time adaptation module which works on a diverse set of environment configurations.

16 of 56

Contributions and Proposed Solution

  • Unified policy (π) to control and coordinate both legs and arm

17 of 56

Contributions and Proposed Solution

  • Unified policy (π) to control and coordinate both legs and arm

Input

  • Current base state
  • Arm state
  • Leg State
  • Last Action
  • Base Velocity Command
  • End-effector position and orientation command

18 of 56

Contributions and Proposed Solution

  • Unified policy (π) to control and coordinate both legs and arm

Input

  • Current base state
  • Arm state
  • Leg State
  • Last Action
  • Base Velocity Command
  • End-effector position and orientation command

19 of 56

Contributions and Proposed Solution

  • Unified policy (π) to control and coordinate both legs and arm

Input

  • Current base state
  • Arm state
  • Leg State
  • Last Action
  • Base Velocity Command
  • End-effector position and orientation command

Output

  • Target arm joint position
  • Target leg joint position
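
The input and output lists above can be read as a flat observation vector mapped to joint-position targets. A minimal Python sketch of that interface follows. The field sizes, the ordering of arm targets before leg targets, and the names `Observation` and `split_action` are illustrative assumptions, not taken from the paper. The joint counts (12 leg, 6 arm) follow the hardware described in this deck, with the gripper left out for simplicity.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

NUM_ARM_JOINTS = 6   # WidowX 250s arm joints (gripper omitted; assumption)
NUM_LEG_JOINTS = 12  # Unitree Go1 actuated leg joints


@dataclass
class Observation:
    """Hypothetical container for the six policy inputs listed on the slide."""
    base_state: List[float]
    arm_state: List[float]
    leg_state: List[float]
    last_action: List[float]
    base_velocity_cmd: List[float]
    ee_pose_cmd: List[float]  # end-effector position and orientation command

    def flatten(self) -> List[float]:
        """Concatenate all inputs into one vector for the policy network."""
        return (self.base_state + self.arm_state + self.leg_state
                + self.last_action + self.base_velocity_cmd + self.ee_pose_cmd)


def split_action(action: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Split one policy output into arm and leg joint-position targets.

    The arm-first ordering is an assumption made for this sketch.
    """
    expected = NUM_ARM_JOINTS + NUM_LEG_JOINTS
    if len(action) != expected:
        raise ValueError(f"expected {expected} targets, got {len(action)}")
    return list(action[:NUM_ARM_JOINTS]), list(action[NUM_ARM_JOINTS:])
```

The point of the sketch is that a single vector goes in and a single vector comes out, so arm and leg targets are produced jointly rather than by two separate controllers.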

20 of 56

Contributions and Proposed Solution

  • Unified policy (π) to control and coordinate both legs and arm.
  • Advantage Mixing for locomotion and manipulation.

21 of 56

Contributions and Proposed Solution

  • Unified policy (π) to control and coordinate both legs and arm.
  • Advantage Mixing for locomotion and manipulation.
  • Regularized Online Adaptation for Sim-to-Real Transfer.

22 of 56

Hardware setup

The robot platform comprises a Unitree Go1 quadruped with 12 actuatable DoFs and a robot arm, the 6-DoF Interbotix WidowX 250s with a parallel gripper. We mount the arm on top of the quadruped. The RealSense D435 provides RGB visual information and is mounted close to the gripper of the WidowX. Power for both the Go1 and the WidowX is provided by the Go1’s onboard battery. Neural network inference is also done onboard the Go1. Our robot system uses only onboard computation and power, so it is fully untethered.

23 of 56

24 of 56

25 of 56

26 of 56

27 of 56

28 of 56

29 of 56

30 of 56

31 of 56

Advantage Mixing for locomotion and manipulation

  • The advantage function can be considered an inductive bias.
  • Mathematically, it is the difference between the Q-value of a given state-action pair and the value function of the state: $A(s, a) = Q(s, a) - V(s)$.

$\pi$: policy

$a^{\text{leg}}$: target leg joint position

$s$: state

$A^{\text{loco}}$: advantage function of locomotion

$A^{\text{manip}}$: advantage function of manipulation

$a^{\text{arm}}$: target arm joint position

$\beta$: curriculum parameter linearly increasing from 0 to 1
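
One way to write the objective that this legend describes is $J(\theta_\pi) = \frac{1}{|D|} \sum_{(s_t, a_t) \in D} \left[ \log \pi(a_t^{\text{arm}} \mid s_t)\,(A^{\text{manip}} + \beta A^{\text{loco}}) + \log \pi(a_t^{\text{leg}} \mid s_t)\,(\beta A^{\text{manip}} + A^{\text{loco}}) \right]$, where $\beta$ rises linearly from 0 to 1. This is a reconstruction from the listed terms and should be checked against the paper. The Python sketch below evaluates that expression on plain lists. The function names and the `mix_steps` parameter are hypothetical, and it uses a plain policy-gradient form rather than whatever surrogate loss the authors' training code applies.

```python
from typing import Sequence, Tuple


def beta_schedule(step: int, mix_steps: int) -> float:
    """Linear curriculum: 0 at step 0, reaching 1 at `mix_steps` and staying there."""
    if mix_steps <= 0:
        return 1.0
    return min(1.0, max(0.0, step / mix_steps))


def mixed_advantages(adv_manip: float, adv_loco: float,
                     beta: float) -> Tuple[float, float]:
    """Return the weights applied to the arm and leg log-probabilities."""
    arm_weight = adv_manip + beta * adv_loco  # arm mostly follows manipulation
    leg_weight = beta * adv_manip + adv_loco  # legs mostly follow locomotion
    return arm_weight, leg_weight


def mixed_objective(logp_arm: Sequence[float], logp_leg: Sequence[float],
                    adv_manip: Sequence[float], adv_loco: Sequence[float],
                    beta: float) -> float:
    """Average the mixed policy-gradient objective over a batch of samples."""
    if not logp_arm:
        raise ValueError("batch must not be empty")
    total = 0.0
    for lp_a, lp_l, a_m, a_l in zip(logp_arm, logp_leg, adv_manip, adv_loco):
        w_arm, w_leg = mixed_advantages(a_m, a_l, beta)
        total += lp_a * w_arm + lp_l * w_leg
    return total / len(logp_arm)
```

With `beta = 0` the arm is credited only with the manipulation advantage and the legs only with the locomotion advantage. With `beta = 1` both action groups receive the sum of the two advantages.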

32 of 56

Regularized Online Adaptation for Sim-to-Real Transfer

$\theta_\pi$: parameters of the policy

$\theta_\mu$: parameters of the encoder

$\theta_\phi$: parameters of the adaptation module

$\mu$: encoder

$\text{sg}[\cdot]$: stop-gradient

$\phi$: adaptation module
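
A loss consistent with this legend is $L(\theta_\pi, \theta_\mu, \theta_\phi) = -J(\theta_\pi, \theta_\mu) + \lambda \lVert z^\mu - \text{sg}[z^\phi] \rVert + \lVert \text{sg}[z^\mu] - z^\phi \rVert$, where $z^\mu$ is the encoder's latent, $z^\phi$ is the adaptation module's estimate of it, and $\lambda$ is a regularization weight. This is a reconstruction to be checked against the paper. The Python sketch below shows what the stop-gradient does: each regularization term sends gradient to only one of the two latents. It uses squared distances for simple gradients, which is an assumption, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length latent vectors."""
    if len(a) != len(b):
        raise ValueError("latent vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def roa_loss(rl_objective: float, z_mu: Sequence[float],
             z_phi: Sequence[float], lam: float) -> float:
    """Forward value of the loss; stop-gradient does not change forward values."""
    dist = squared_distance(z_mu, z_phi)
    return -rl_objective + lam * dist + dist


def roa_latent_grads(z_mu: Sequence[float], z_phi: Sequence[float],
                     lam: float) -> Tuple[List[float], List[float]]:
    """Gradients of the two regularization terms with respect to each latent.

    The first term treats z_phi as a constant, so it only pulls z_mu toward
    z_phi (scaled by lam). The second term treats z_mu as a constant, so it
    only pulls z_phi toward z_mu. The gradient that z_mu also receives from
    the RL objective through the policy is not modeled here.
    """
    if len(z_mu) != len(z_phi):
        raise ValueError("latent vectors must have the same length")
    grad_z_mu = [2.0 * lam * (m - p) for m, p in zip(z_mu, z_phi)]
    grad_z_phi = [2.0 * (p - m) for m, p in zip(z_mu, z_phi)]
    return grad_z_mu, grad_z_phi
```

In an autodiff framework the same effect would come from the framework's own stop-gradient or detach operation rather than hand-written gradients.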

33 of 56

Evaluation Questions

  • Does the unified policy improve over separate policies for the arm and legs? If so, how?
  • How does Advantage Mixing help in learning the unified policy?
  • How does Regularized Online Adaptation perform compared with other Sim2Real methods?

34 of 56

Baselines and Metrics

Baselines

  • Separate policies for legs and the arm
  • One uncoordinated policy
  • Rapid Motor Adaptation (RMA)
  • Expert policy
  • Domain Randomization

Metrics

  • Survival percentage
  • Base Acceleration
  • Velocity Error
  • EE error
  • Total Energy
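
The slide names the metrics without defining them. As a rough illustration, the Python sketch below computes two of them under assumed definitions: survival percentage as the share of episodes that run to the full episode length, and velocity or EE error as the mean L1 distance between commanded and measured values. These definitions and the function names are assumptions and may differ from the paper's.

```python
from typing import Sequence


def survival_percentage(episode_lengths: Sequence[int], max_length: int) -> float:
    """Percentage of episodes that lasted the full `max_length` steps (assumed definition)."""
    if not episode_lengths:
        raise ValueError("need at least one episode")
    survived = sum(1 for n in episode_lengths if n >= max_length)
    return 100.0 * survived / len(episode_lengths)


def mean_l1_error(commanded: Sequence[Sequence[float]],
                  measured: Sequence[Sequence[float]]) -> float:
    """Mean per-step L1 distance between commanded and measured vectors.

    Usable for base-velocity error or end-effector error, depending on
    which command/measurement pairs are passed in.
    """
    if not commanded or len(commanded) != len(measured):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for cmd, obs in zip(commanded, measured):
        total += sum(abs(c - o) for c, o in zip(cmd, obs))
    return total / len(commanded)
```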

35 of 56

Dataset (Simulation Environment)

36 of 56

Results (1) (Simulation Environment)

  • Unified policy outperforms separate and uncoordinated policies

37 of 56

Results (2) (Simulation Environment)

  • Unified policy increases whole-body coordination

38 of 56

Results (3) (Simulation Environment)

  • Advantage Mixing helps in learning the unified policy

39 of 56

Results (4) (Simulation Environment)

  • Robust out-of-distribution performance of proposed Regularized Online Adaptation

40 of 56

Results (1) (Real-World Environment)

  • Unified policy enables whole-body coordination, where both the leg joints and the arm joints help in reaching

41 of 56

Dataset (Real-World Environment)

42 of 56

Results (2) (Real-World Environment)

  • Vision-guided tracking

43 of 56

Results (2) (Real-World Environment)

  • Vision-guided tracking

44 of 56

Results (3) (Real-World Environment)

  • Open-loop Control from Demonstration

45 of 56

Results (Videos)

46 of 56

Strengths

47 of 56

Strength 1

Both the real-world and simulation experiments demonstrate that the proposed learning-based method is able to perform well on complex tasks that require coordination of arms and legs.

48 of 56

Strength 2

Technical strengths

  • The framework incorporates a sim-to-real transfer module based on regularized online adaptation, which makes the proposed work more robust.
  • A unified policy for the whole-body legged robot helps control and coordinate both the legs and the arm.
  • Advantage mixing is a simple and effective technique that introduces an inductive bias for manipulation and locomotion.

49 of 56

Strength 3

Proposes a low-cost hardware setup for academic research labs. Reduces the cost from $100K to $6K.

50 of 56

Weaknesses

51 of 56

Weakness 1

Generalization of the method to other types of object interactions:

  • Soft object manipulation
  • Grasping in occluded scenes

52 of 56

Weakness 2

High input dimension: although it does not impact training time, training is prone to error if multiple dimensions carry external noise.

53 of 56

Weakness 3

Adaptation of the online module to changing dynamics when a few degrees of freedom are lost

  • If one of the joints breaks, can the robot still adapt to the situation?
  • New constraints in the environment, such as oily or slippery surfaces.

54 of 56

TL;DR/Summary: Key Insights

  • Compared to legged-only robots, an attached arm enhances the ability of the legged robot to perform everyday manipulation tasks.
  • This work proposes a unified whole-body control framework that helps coordinate the legs and the arm of a robot.
  • A learned adaptation strategy is shown that can be used to transfer knowledge from the simulator to the real world.
  • Finally, this work contributes a low-cost hardware setup (~$6K) for academic research labs.

55 of 56

Q&A (1 min)

56 of 56

5 Discussion Points – send to TA, don’t include in slides

Sim2Real Table Tennis

  1. What type of human behavior can be practically modeled with the i-S2R approach?
  2. Can we use 2 humans with ego-centric cameras on them to get data for human behavior modeling instead of iteratively interacting with the robot?
  3. Can the robot handle adversarial human interaction? The human picks a new adversarial behavior in every iteration (The human plays cooperatively in the actual paper setting)

Unified Whole-Body Control of Manipulation and Locomotion

4. If a legged robot has two arms: (a) What additional challenges would need to be addressed? (b) Are there any added benefits of adding extra arms? If so, what are they?

5. What are the different scenarios which can be tested to analyze the robustness of the robot (for example, slippery/snowy/rough terrains, how much weight can the arm hold and the robot can still maintain its pose and move)?