1 of 27

Excavation of Fragmented Rocks with Multi-modal Model-based Reinforcement Learning

Yifan Zhu, Liyang Wang, and Liangjun Zhang

2 of 27

Zhang et al.

3 of 27

Why excavator automation?

  • Excavator automation reduces cost and improves safety
  • While many focus on the excavation of soil [1,2,3], relatively few have worked on fragmented rocks [4,5]
    • Interactions between the bucket and rocks are extremely complex

[1] L. Zhang, J. Zhao, P. Long, L. Wang, L. Qian, F. Lu, X. Song, and D. Manocha, “An autonomous excavator system for material loading tasks,” Science Robotics, vol. 6, no. 55, 2021.

[2] R. J. Sandzimier and H. H. Asada, “A Data-Driven Approach to Prediction and Optimal Bucket-Filling Control for Autonomous Excavators,” IEEE Robotics and Automation Letters, vol. 5, no. 2, pp. 2682–2689, 2020.

[3] Y. Yang, P. Long, X. Song, J. Pan, and L. Zhang, “Optimization-based framework for excavation trajectory generation,” IEEE Robotics and Automation Letters, vol. 6, no. 2, pp. 1479–1486, 2021.

[4] O. Azulay and A. Shapiro, “Wheel Loader Scooping Controller Using Deep Reinforcement Learning,” IEEE Access, vol. 9, pp. 24145–24154, 2021.

[5] S. Dadhich, U. Bodin, F. Sandin, and U. Andersson, “Machine learning approach to automatic bucket loading,” in 24th Mediterranean Conference on Control and Automation, 2016.

4 of 27

How do human experts excavate fragmented rocks?

  • Human operators use several modalities of information
    • Visual, tactile, proprioceptive, auditory
  • Continuously adapt to the terrain
  • Use strategies like “wiggling” to avoid getting stuck

5 of 27

Can we equip automated excavators with capabilities similar to those of human operators?

6 of 27

Fragmented Rocks Excavation

  • Contributions:
    • A multi-modal model-based reinforcement learning (MBRL) approach to fragmented rock excavation
      • Excavation domain knowledge is encoded in a discrete set of primitive motions
      • A dynamics model based on a recurrent neural network (RNN), learned from a small amount of real-world data
      • A model-predictive controller (MPC) for closed-loop planning
    • Significantly outperforms manually tuned excavation strategies

7 of 27

Problem Definition

[Figure labels: Penetration, Dragging, Closing & Lifting]

  • Plan discrete actions to track a global reference trajectory while avoiding getting stuck

8 of 27

Experiment Setup

  • RGB camera
  • Franka Panda robot arm equipped with a digging bucket
  • Wood blocks ranging from 1 to 4 in emulate fragmented rocks
  • Robot “jams” when excessive external force is detected

9 of 27

Action Space

  • Discrete action space: 9 actions
    • Cartesian movements of 1.5 cm along x, y, z
    • Rotate the bucket by 8 degrees
    • Wiggle in the x direction
  • Robot joint impedance control at 100 Nm/rad
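
One way to write this action set down in code is sketched below. The slide gives only the count (9) and the three kinds of motion; the split into six signed translations, two signed rotations, and one wiggle is an assumption that happens to total 9. The names `Action`, `Pose`, and `apply_action` are hypothetical, and treating the wiggle as leaving the nominal pose unchanged is also an assumption.

```python
from dataclasses import dataclass
from enum import Enum

STEP_M = 0.015   # 1.5 cm Cartesian step (from the slide)
ROT_DEG = 8.0    # 8 degree bucket rotation (from the slide)


class Action(Enum):
    # Assumed breakdown: 6 translations + 2 rotations + 1 wiggle = 9 actions.
    X_POS = 0
    X_NEG = 1
    Y_POS = 2
    Y_NEG = 3
    Z_POS = 4
    Z_NEG = 5
    ROT_POS = 6
    ROT_NEG = 7
    WIGGLE_X = 8


@dataclass(frozen=True)
class Pose:
    x: float
    y: float
    z: float
    bucket_deg: float


# Nominal pose change (dx, dy, dz, d_angle) for each action.
_DELTAS = {
    Action.X_POS: (STEP_M, 0.0, 0.0, 0.0),
    Action.X_NEG: (-STEP_M, 0.0, 0.0, 0.0),
    Action.Y_POS: (0.0, STEP_M, 0.0, 0.0),
    Action.Y_NEG: (0.0, -STEP_M, 0.0, 0.0),
    Action.Z_POS: (0.0, 0.0, STEP_M, 0.0),
    Action.Z_NEG: (0.0, 0.0, -STEP_M, 0.0),
    Action.ROT_POS: (0.0, 0.0, 0.0, ROT_DEG),
    Action.ROT_NEG: (0.0, 0.0, 0.0, -ROT_DEG),
    Action.WIGGLE_X: (0.0, 0.0, 0.0, 0.0),  # assumed to return to the same pose
}


def apply_action(pose: Pose, action: Action) -> Pose:
    """Return the commanded pose after one primitive action."""
    dx, dy, dz, da = _DELTAS[action]
    return Pose(pose.x + dx, pose.y + dy, pose.z + dz, pose.bucket_deg + da)
```

This gives only the commanded target; under impedance control the pose actually reached depends on contact with the blocks, which is what the learned dynamics model has to predict.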

10 of 27

Method

Offline

  • Data collection with random actions and scripted policies
  • Multi-modal dynamics learning

Online

  • Model-predictive controller for reference trajectory tracking

12 of 27

Method – Multi-modal Dynamics Learning

  • Excavation dynamics is sequential in nature
  • Use a history of past states to predict future states
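
A minimal sketch of how one recorded trajectory could be cut into (history, future) training pairs for such a sequence model. The function name `make_windows` and the treatment of each state as an opaque item are assumptions for illustration; the slides do not describe the actual data pipeline.

```python
from typing import List, Sequence, Tuple


def make_windows(states: Sequence, history: int, horizon: int) -> List[Tuple[list, list]]:
    """Split one trajectory into (past states, future states) pairs."""
    if history < 1 or horizon < 1:
        raise ValueError("history and horizon must be positive")
    pairs = []
    # t is the index of the first state the model is asked to predict.
    for t in range(history, len(states) - horizon + 1):
        past = list(states[t - history:t])
        future = list(states[t:t + horizon])
        pairs.append((past, future))
    return pairs


# Toy usage: integers stand in for multi-modal states.
pairs = make_windows(list(range(10)), history=3, horizon=2)
```

In a full pipeline each pair would also carry the actions taken over the prediction window, since the predicted states depend on them.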

 

 

[Figure: dynamics model architecture. Labels: RGB Image, CNN, Pose, Force, GRU Encoder, Action, GRU Decoder, Attention, predicted Pose and Force. Layer sizes shown: [4 x 512], [3 x 512], [3076 x 1024], [10 x 1024], [1024 x 4], [1024 x 3]]

14 of 27

Method – Planning

  • Model-predictive control (MPC)
    • Compute controls for a horizon k, apply the first control, and recompute controls
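
The receding-horizon loop can be sketched as follows. This is a toy illustration, not the presented controller: `step` stands in for the learned dynamics model, `cost` for the planner's cost function, and the candidate sequences are simply enumerated. All names are hypothetical.

```python
import itertools


def plan_first_action(state, actions, step, cost, k):
    """Score every length-k action sequence and return the first action of the cheapest."""
    best_seq, best_cost = None, float("inf")
    for seq in itertools.product(actions, repeat=k):
        s, total = state, 0.0
        for a in seq:
            s = step(s, a)      # roll the model forward one step
            total += cost(s)    # accumulate cost along the predicted rollout
        if total < best_cost:
            best_seq, best_cost = seq, total
    return best_seq[0]


def run_mpc(state, actions, step, cost, k, n_steps):
    """Plan over horizon k, apply only the first control, then replan."""
    visited = [state]
    for _ in range(n_steps):
        a = plan_first_action(state, actions, step, cost, k)
        state = step(state, a)  # on a robot: execute the action and observe the new state
        visited.append(state)
    return visited


# Toy usage: 1-D position, unit moves, target position 3.
path = run_mpc(0, (-1, 0, 1), lambda s, a: s + a, lambda s: abs(s - 3), k=2, n_steps=5)
```

Replanning after every applied control is what makes the scheme closed-loop: prediction errors from one step are corrected by the next observation.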

  • Cost function:

[Figure: plot of cost versus contact force]

15 of 27

Method – Planning

    • Brute-force search
      • Practicality depends on the horizon: 9 actions give 9^k candidate sequences
    • Random shooting (RS)
    • Monte-Carlo Tree Search (MCTS)
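
For comparison with exhaustive search, random shooting can be sketched as below: sample a fixed number of action sequences, score each with the model, and keep the first action of the best one. With 9 actions and k = 5, exhaustive search would score 9^5 = 59,049 sequences, while random shooting scores only `n_samples`. The function and parameter names are hypothetical, and `step` and `cost` are toy stand-ins for the learned model and the planner's cost.

```python
import random


def random_shooting(state, actions, step, cost, k, n_samples, rng):
    """Return the first action of the cheapest of n_samples random length-k sequences."""
    best_action, best_cost = None, float("inf")
    for _ in range(n_samples):
        seq = [rng.choice(actions) for _ in range(k)]
        s, total = state, 0.0
        for a in seq:
            s = step(s, a)
            total += cost(s)
        if total < best_cost:
            best_action, best_cost = seq[0], total
    return best_action


# Toy usage: 1-D position, unit moves, target position 3.
rng = random.Random(0)
first = random_shooting(0, (-1, 0, 1), lambda s, a: s + a,
                        lambda s: abs(s - 3), k=3, n_samples=200, rng=rng)
```

The trade-off is that the result is only as good as the best sampled sequence, so the sample budget has to grow with the horizon and the size of the action set.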

https://en.wikipedia.org/wiki/Monte_Carlo_tree_search

16 of 27

Experiments – Dynamics

  • Data collection
    • Random controls and scripted policies: only about 300 trajectories, with about 100 actions per trajectory

  • Dynamics learning
    • Feed-forward NN baseline (FF)
      • Prediction horizon = 5
      • 3 models, for history lengths 1, 3, and 5
    • Loss function: L2 loss
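
A small sketch of an L2 loss over a multi-step prediction, read here as the mean squared error across all predicted time steps and state dimensions. Whether the actual training loss is averaged or summed, and how the modalities are weighted, is not stated on the slide; the function name `l2_loss` is hypothetical.

```python
def l2_loss(predicted, target):
    """Mean squared error over every time step and state dimension."""
    if len(predicted) != len(target) or not predicted:
        raise ValueError("predicted and target need the same non-zero number of steps")
    total, count = 0.0, 0
    for p_step, t_step in zip(predicted, target):
        if len(p_step) != len(t_step):
            raise ValueError("state dimensions differ")
        for p, t in zip(p_step, t_step):
            total += (p - t) ** 2
            count += 1
    return total / count


# Toy usage: a 2-step prediction of a 2-dimensional state.
loss = l2_loss([[0.0, 0.0], [1.0, 1.0]], [[0.0, 0.0], [1.0, 3.0]])
```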

17 of 27

Experiments – RNN vs Baseline

Dynamics prediction error with different history and prediction lengths

18 of 27

Experiments – Dynamics Ablation Studies

  • History length = 7
  • Prediction horizon = 7

20 of 27

Experiments – Planner

  • Baseline planners
    • 1) Move a reference point along the path and select the action that most reduces the tracking error (“follow”)
    • 2) Follow + wiggling (“closed-loop”)
    • 3) Plan an open-loop trajectory, wiggling every 2 controls (“open-loop”)
  • 10 trials on each of the 3 reference paths (shallow/medium/deep)
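
The greedy “follow” baseline can be sketched as a one-step search: among the available moves, pick the one whose commanded position lands closest to the current reference point. This simplified version considers only the six translations and ignores bucket rotation; the move names and the `follow_action` function are hypothetical.

```python
import math

STEP_M = 0.015  # 1.5 cm step, as in the action space

# Hypothetical names for the six translation primitives.
MOVES = {
    "+x": (STEP_M, 0.0, 0.0),
    "-x": (-STEP_M, 0.0, 0.0),
    "+y": (0.0, STEP_M, 0.0),
    "-y": (0.0, -STEP_M, 0.0),
    "+z": (0.0, 0.0, STEP_M),
    "-z": (0.0, 0.0, -STEP_M),
}


def follow_action(position, reference_point, moves=MOVES):
    """Pick the move that leaves the smallest distance to the reference point."""
    def error_after(name):
        dx, dy, dz = moves[name]
        nxt = (position[0] + dx, position[1] + dy, position[2] + dz)
        return math.dist(nxt, reference_point)
    return min(moves, key=error_after)


# Toy usage: the reference point lies ahead along x, then below along z.
a1 = follow_action((0.0, 0.0, 0.0), (0.10, 0.0, 0.0))
a2 = follow_action((0.0, 0.0, 0.0), (0.0, 0.0, -0.05))
```

Because it looks only one step ahead and uses no force information, a policy like this has no way to anticipate a jam, which is the gap the model-based planners aim to close.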

22 of 27

Experiments – Planner

[Video, playback speed x30]

Method: MCTS, k = 5

Reference Trajectory: Deep

Result: jammed

Result: successfully excavated objects

23 of 27

Failure case

24 of 27

Conclusion and Future Work

  • A model-based reinforcement learning approach for rock excavation with a small amount of real-world data
  • The tactile modality is more informative than vision

Future work:

  • Experiments on excavators
  • Model the inherently stochastic dynamics
  • Combine with a global planner
