1 of 27

Excavation of Fragmented Rocks with Multi-modal Model-based Reinforcement Learning

Yifan Zhu, Liyang Wang, and Liangjun Zhang

2 of 27

Zhang et al.

Why excavator automation?

Excavator automation reduces cost and improves safety

3 of 27

While many works focus on the excavation of soil [1,2,3], relatively few address fragmented rocks [4,5]

Interactions between the bucket and rocks are extremely complex

[1] L. Zhang, J. Zhao, P. Long, L. Wang, L. Qian, F. Lu, X. Song, and D. Manocha, “An autonomous excavator system for material loading tasks,” Science Robotics, vol. 6, no. 55, 2021.

[2] R. J. Sandzimier and H. H. Asada, “A data-driven approach to prediction and optimal bucket-filling control for autonomous excavators,” IEEE Robotics and Automation Letters, vol. 5, no. 2, pp. 2682–2689, 2020.

[3] Y. Yang, P. Long, X. Song, J. Pan, and L. Zhang, “Optimization-based framework for excavation trajectory generation,” IEEE Robotics and Automation Letters, vol. 6, no. 2, pp. 1479–1486, 2021.

[4] O. Azulay and A. Shapiro, “Wheel loader scooping controller using deep reinforcement learning,” IEEE Access, vol. 9, pp. 24145–24154, 2021.

[5] S. Dadhich, U. Bodin, F. Sandin, and U. Andersson, “Machine learning approach to automatic bucket loading,” in 24th Mediterranean Conference on Control and Automation, 2016.

4 of 27

How do human experts excavate fragmented rocks?

Human operators use several modalities of information

Visual, tactile, proprioceptive, auditory

Continuously adapt to the terrain
Use strategies like “wiggling” to avoid getting stuck

5 of 27

Can we equip automated excavators with capabilities similar to those of human operators?

6 of 27

Fragmented Rocks Excavation

Contributions:

A multi-modal model-based reinforcement learning (MBRL) approach to fragmented rocks excavation

Excavation domain knowledge is encoded into a discrete set of motion primitives
A dynamics model based on a recurrent neural network (RNN), learned from a small amount of real-world data
A model-predictive controller (MPC) for closed-loop planning

Significantly outperforms manually tuned excavation strategies

7 of 27

Problem Definition

[Figure: excavation phases, labeled Penetration, Dragging, and Closing & Lifting]

Plan discrete actions to track a global reference trajectory while avoiding getting stuck

8 of 27

Experiment Setup

RGB Camera

Franka Panda robot arm equipped with digging bucket

Wood blocks ranging from 1 to 4 inches in size emulate fragmented rocks

Robot “jams” when excessive external force is detected

9 of 27

Action Space

Discrete action space: 9 actions

Cartesian movements of 1.5 cm along x, y, or z
Rotate bucket by 8 degrees
Wiggle in x direction

Robot joint impedance control at 100 Nm/rad
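
One way to enumerate the nine actions is sketched below. The slide does not list them individually, so the split into six signed Cartesian steps, two signed bucket rotations, and one wiggle is an assumption. The names `Action`, `ACTIONS`, and `apply_action`, the `(x, y, z, pitch)` pose layout, and modeling a wiggle as zero net displacement are also illustrative choices, not the authors' implementation.

```python
import math
from dataclasses import dataclass

STEP_M = 0.015                # 1.5 cm Cartesian step (from the slide)
ROT_RAD = math.radians(8.0)   # 8 degree bucket rotation (from the slide)


@dataclass(frozen=True)
class Action:
    """One motion primitive (hypothetical representation)."""
    name: str
    dx: float = 0.0
    dy: float = 0.0
    dz: float = 0.0
    dpitch: float = 0.0
    wiggle: bool = False


# Assumed decomposition: 6 translations + 2 rotations + 1 wiggle = 9 actions.
ACTIONS = (
    Action("+x", dx=STEP_M), Action("-x", dx=-STEP_M),
    Action("+y", dy=STEP_M), Action("-y", dy=-STEP_M),
    Action("+z", dz=STEP_M), Action("-z", dz=-STEP_M),
    Action("rot+", dpitch=ROT_RAD), Action("rot-", dpitch=-ROT_RAD),
    Action("wiggle_x", wiggle=True),
)


def apply_action(pose, action):
    """Return the commanded (x, y, z, pitch) pose after one action.

    A wiggle is treated as a back-and-forth motion with no net displacement.
    """
    x, y, z, pitch = pose
    return (x + action.dx, y + action.dy, z + action.dz, pitch + action.dpitch)
```

Keeping the primitives in a flat tuple lets a planner refer to each action by its index, which is convenient for enumerating or sampling action sequences.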

[Figure: coordinate axes x, y, z]

10 of 27

Method

Offline

Data collection with random actions and scripted policies
Multi-modal dynamics learning

Online

Model-predictive controller for reference trajectory tracking


12 of 27

Method – Multi-modal Dynamics Learning

Excavation dynamics is sequential in nature
Use a history of past states to predict future states

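[Architecture diagram: an RGB image (through a CNN), pose, and force are fed to a GRU encoder; a GRU decoder with attention takes actions and outputs predicted force and pose. Layer dimensions shown: [4 x 512], [3 x 512], [3076 x 1024], [1024 x 4], [1024 x 3], [10 x 1024]]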
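
The encoder-decoder idea can be illustrated with a small NumPy sketch: a GRU encodes the observation history, and a second GRU, conditioned on the planned actions and attending over the encoder states, predicts future observations. This is a toy with random, untrained weights and a small hidden size. The image branch is omitted, and treating the observation as a 7-D vector (3-D pose plus 4-D force) is an assumption read off the diagram labels. `GRUCell` and `Seq2SeqDynamics` are hypothetical names.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class GRUCell:
    """Minimal bias-free GRU cell with random (untrained) weights."""

    def __init__(self, in_dim, hid_dim, rng):
        s = 1.0 / np.sqrt(hid_dim)
        shape = (in_dim + hid_dim, hid_dim)
        self.hid_dim = hid_dim
        self.Wz = rng.uniform(-s, s, shape)  # update gate weights
        self.Wr = rng.uniform(-s, s, shape)  # reset gate weights
        self.Wh = rng.uniform(-s, s, shape)  # candidate state weights

    def __call__(self, x, h):
        xh = np.concatenate([x, h])
        z = sigmoid(xh @ self.Wz)
        r = sigmoid(xh @ self.Wr)
        h_tilde = np.tanh(np.concatenate([x, r * h]) @ self.Wh)
        return (1.0 - z) * h + z * h_tilde


class Seq2SeqDynamics:
    """History of observations + future actions -> future observations."""

    def __init__(self, obs_dim, n_actions, hid_dim=32, seed=0):
        rng = np.random.default_rng(seed)
        self.n_actions = n_actions
        self.encoder = GRUCell(obs_dim, hid_dim, rng)
        self.decoder = GRUCell(n_actions, hid_dim, rng)
        self.W_out = rng.uniform(-0.1, 0.1, (2 * hid_dim, obs_dim))

    def predict(self, history, actions):
        """history: (H, obs_dim) array; actions: sequence of action indices."""
        h = np.zeros(self.encoder.hid_dim)
        enc_states = []
        for obs in history:                      # encode the past
            h = self.encoder(obs, h)
            enc_states.append(h)
        enc_states = np.stack(enc_states)        # (H, hid_dim)

        preds = []
        for a in actions:                        # decode one step per action
            one_hot = np.eye(self.n_actions)[a]
            h = self.decoder(one_hot, h)
            scores = enc_states @ h              # dot-product attention
            weights = np.exp(scores - scores.max())
            weights /= weights.sum()
            context = weights @ enc_states
            preds.append(np.concatenate([h, context]) @ self.W_out)
        return np.stack(preds)                   # (len(actions), obs_dim)


model = Seq2SeqDynamics(obs_dim=7, n_actions=9)
future = model.predict(np.zeros((5, 7)), [0, 4, 8])
```

Here `future` has one predicted observation per planned action, which is the interface a planner needs to score candidate action sequences.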

14 of 27

Method – Planning

Model-predictive control (MPC)

Compute controls for a horizon k, apply the first control, and recompute controls

Cost function:

[Plot: cost as a function of contact force]
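
The receding-horizon loop described above can be sketched on a toy problem. The 1-D model, the distance-to-target cost, and the names `rollout_cost`, `mpc_step`, and `run_mpc` are stand-ins for illustration only; the actual cost function on this slide is not recoverable from the text and is not reproduced here.

```python
from itertools import product


def rollout_cost(model, cost_fn, state, actions):
    """Simulate `actions` from `state` and sum the per-step costs."""
    total = 0.0
    for a in actions:
        state = model(state, a)
        total += cost_fn(state)
    return total


def mpc_step(model, cost_fn, state, action_set, horizon):
    """Plan over `horizon` steps and return only the first action."""
    best = min(
        product(action_set, repeat=horizon),
        key=lambda seq: rollout_cost(model, cost_fn, state, seq),
    )
    return best[0]


def run_mpc(model, cost_fn, state, action_set, horizon, n_steps):
    """Closed loop: plan, apply the first control, observe, replan."""
    trajectory = [state]
    for _ in range(n_steps):
        a = mpc_step(model, cost_fn, state, action_set, horizon)
        state = model(state, a)  # stand-in for executing on the robot
        trajectory.append(state)
    return trajectory


# Toy example: 1-D position, unit steps, cost = distance to target 3.
toy_model = lambda s, a: s + a
toy_cost = lambda s: abs(s - 3)
path = run_mpc(toy_model, toy_cost, 0, (-1, 0, 1), horizon=3, n_steps=5)
print(path)  # [0, 1, 2, 3, 3, 3]
```

On the real system the learned dynamics model would take the place of `toy_model` for planning, while the state after each step would come from sensor measurements rather than from the model.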

15 of 27

Method – Planning

Brute-force search

Depending on the horizon

Random shooting (RS)
Monte-Carlo Tree Search (MCTS)

https://en.wikipedia.org/wiki/Monte_Carlo_tree_search
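
Why the search method depends on the horizon: with 9 discrete actions, exhaustive enumeration covers 9^k sequences, i.e. 729 for k = 3, 59,049 for k = 5, and 4,782,969 for k = 7. The sketch below shows random shooting as a sampling alternative on a toy problem. The function names, the sample budget, and the toy model are assumptions; MCTS is not sketched here.

```python
import random


def n_sequences(n_actions, horizon):
    """Number of action sequences a brute-force search must evaluate."""
    return n_actions ** horizon


def random_shooting(model, cost_fn, state, action_set, horizon,
                    n_samples, seed=0):
    """Sample random action sequences and keep the cheapest one."""
    rng = random.Random(seed)
    best_seq, best_cost = None, float("inf")
    for _ in range(n_samples):
        seq = tuple(rng.choice(action_set) for _ in range(horizon))
        s, cost = state, 0.0
        for a in seq:               # roll the sequence out through the model
            s = model(s, a)
            cost += cost_fn(s)
        if cost < best_cost:
            best_seq, best_cost = seq, cost
    return best_seq, best_cost


# Toy example: 1-D position, unit steps, cost = distance to target 3.
seq, cost = random_shooting(lambda s, a: s + a, lambda s: abs(s - 3),
                            0, (-1, 0, 1), horizon=3, n_samples=200)
```

Random shooting trades optimality for a fixed evaluation budget: the returned sequence is the best among the samples, not necessarily the best overall.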

16 of 27

Experiments - Dynamics

Data collection

Random controls and scripted policies: only 300 trajectories, with about 100 actions per trajectory

Dynamics learning

Feed-forward NN baseline (FF)

Prediction horizon = 5
3 models, for history lengths 1, 3, and 5

Cost function: L2 loss
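
For scale, 300 trajectories of about 100 actions each give roughly 30,000 transitions. A minimal sketch of how such trajectories could be cut into (history, future) training pairs and scored with an L2 loss follows. The windowing scheme and the names `make_windows` and `l2_loss` are assumptions, actions are left out for brevity, and the exact form of the loss (here, squared L2 error averaged over the horizon) is not specified on the slide.

```python
import numpy as np


def make_windows(traj, history_len, horizon):
    """Split one (T, D) trajectory into (history, future) pairs.

    Each pair holds `history_len` past states and the `horizon` states
    that follow them.
    """
    pairs = []
    for t in range(history_len, len(traj) - horizon + 1):
        pairs.append((traj[t - history_len:t], traj[t:t + horizon]))
    return pairs


def l2_loss(pred, target):
    """Squared L2 error per step, averaged over the prediction horizon."""
    return float(np.mean(np.sum((pred - target) ** 2, axis=-1)))


# Toy trajectory: 100 steps of a 2-D state.
traj = np.arange(200, dtype=float).reshape(100, 2)
pairs = make_windows(traj, history_len=5, horizon=5)
history, future = pairs[0]
```

A 100-step trajectory with history length 5 and prediction horizon 5 yields 91 overlapping windows, so even a small dataset produces many training pairs.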

17 of 27

Experiments – RNN vs Baseline

Dynamics prediction error with different history and prediction lengths

18 of 27

Experiments – Dynamics Ablation Studies

History length = 7
Prediction horizon = 7

20 of 27

Experiments – Planner

Baseline planners

1) Track a moving reference point by selecting the action that most reduces tracking error (“follow”)
2) Follow plus wiggling (“closed-loop”)
3) Plan an open-loop trajectory, wiggling every 2 controls (“open-loop”)

Test 10 trials on each of the 3 reference paths (shallow/medium/deep)
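
The “follow” baseline can be read as a greedy one-step rule, sketched below for translations only. The action table `TRANSLATIONS`, the function name `follow_action`, and the use of Euclidean distance as the tracking error are assumptions; how the reference point advances and when the wiggling variants trigger are not specified on the slide and are not modeled.

```python
import math

STEP_M = 0.015  # 1.5 cm step, as in the action space

# Hypothetical translation-only action table: name -> (dx, dy, dz).
TRANSLATIONS = {
    "+x": (STEP_M, 0.0, 0.0), "-x": (-STEP_M, 0.0, 0.0),
    "+y": (0.0, STEP_M, 0.0), "-y": (0.0, -STEP_M, 0.0),
    "+z": (0.0, 0.0, STEP_M), "-z": (0.0, 0.0, -STEP_M),
}


def follow_action(position, reference_point, actions=TRANSLATIONS):
    """Greedily pick the action that leaves the smallest tracking error."""
    def error_after(delta):
        nxt = tuple(p + d for p, d in zip(position, delta))
        return math.dist(nxt, reference_point)

    return min(actions, key=lambda name: error_after(actions[name]))


choice = follow_action((0.0, 0.0, 0.0), (0.10, 0.0, 0.0))
```

Because this rule looks only one step ahead and ignores contact forces, it has no way to anticipate a jam.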

22 of 27

Experiments – Planner

[Video, x30 speed]

Method: MCTS, k = 5

Reference Trajectory: Deep

Result: jammed

Result: successfully excavated objects

23 of 27

Failure case

24 of 27

Conclusion and Future Work

A model-based reinforcement learning approach for rock excavation with a small amount of real-world data
The tactile modality is more informative than vision

Future work:

Experiments on excavators
Account for the inherently stochastic dynamics
Combine with a global planner
