Excavation of Fragmented Rocks with Multi-modal Model-based Reinforcement Learning
Yifan Zhu, Liyang Wang, and Liangjun Zhang
Zhang et al.
Why excavator automation?
How do human experts excavate fragmented rocks?
Can we equip automated excavators with capabilities similar to those of human operators?
Excavation of Fragmented Rocks
Problem Definition
Excavation phases: Penetration, Dragging, Closing & Lifting
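To make the three phases concrete, the sketch below builds a scripted bucket-tip reference path with one segment per phase and converts it into per-step displacement actions. It is a minimal illustration under stated assumptions, not the authors' trajectory generator: the coordinate frame, the interpretation of actions as (dx, dy, dz) displacements, and the parameters `depth`, `drag`, `lift`, and `n` are all hypothetical, and bucket rotation during closing is not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Waypoint:
    pos: tuple  # hypothetical bucket-tip (x, y, z) in metres; z < 0 is below the pile surface
    phase: str


def scripted_reference(depth=0.05, drag=0.15, lift=0.10, n=5):
    """Hypothetical three-phase reference path with n waypoints per phase."""
    traj = []
    for i in range(1, n + 1):  # Penetration: move straight down into the pile
        traj.append(Waypoint((0.0, 0.0, -depth * i / n), "penetration"))
    for i in range(1, n + 1):  # Dragging: pull horizontally at constant depth
        traj.append(Waypoint((drag * i / n, 0.0, -depth), "dragging"))
    for i in range(1, n + 1):  # Closing & lifting: rotation omitted, only the lift is modelled
        traj.append(Waypoint((drag, 0.0, -depth + (depth + lift) * i / n), "closing_lifting"))
    return traj


def to_actions(traj, start=(0.0, 0.0, 0.0)):
    """Turn consecutive waypoints into (dx, dy, dz) displacement actions."""
    actions, prev = [], start
    for wp in traj:
        actions.append(tuple(b - a for a, b in zip(prev, wp.pos)))
        prev = wp.pos
    return actions
```

For example, `to_actions(scripted_reference(depth=0.08))` yields a deeper scripted path. Parameterizing by depth is only one plausible way to vary reference trajectories.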
Experiment Setup
RGB camera
Franka Panda robot arm equipped with a digging bucket
Wood blocks ranging from 1 to 4 in. in size emulate fragmented rocks
Action Space
Axes: x, y, z
Method
Offline: data collection with random actions and scripted policies; multi-modal dynamics learning
Online: model-predictive controller for reference trajectory tracking
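The offline data-collection step can be sketched as a loop that records (state, action, next state) transitions while mixing scripted actions with random ones. This is a hedged illustration only: per-step mixing with probability `p_random`, the step bound `max_step`, the pose-only state, and the toy kinematic `toy_step` environment are hypothetical stand-ins; the real setup also records RGB images and forces from the robot.

```python
import random


def random_action(rng, max_step=0.01):
    """Uniform random (dx, dy, dz) displacement, bounded by a hypothetical max_step."""
    return tuple(rng.uniform(-max_step, max_step) for _ in range(3))


def collect(env_step, start_pose, scripted_actions, p_random=0.3, seed=0):
    """Record (pose, action, next_pose) tuples, replacing each scripted
    action by a random one with probability p_random."""
    rng = random.Random(seed)
    pose, dataset = start_pose, []
    for scripted in scripted_actions:
        action = random_action(rng) if rng.random() < p_random else scripted
        next_pose = env_step(pose, action)
        dataset.append((pose, action, next_pose))
        pose = next_pose
    return dataset


def toy_step(pose, action):
    """Hypothetical stand-in for the robot: the bucket moves exactly by the action."""
    return tuple(p + a for p, a in zip(pose, action))
```

The resulting tuples are the kind of supervised pairs a learned dynamics model would be trained on.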
Method – Multi-modal Dynamics Learning
(Diagram: RGB image → CNN; pose and force → GRU encoder; action → GRU decoder with attention; predicted force and pose)
Layer sizes shown: [4 x 512], [3 x 512], [3076 x 1024], [1024 x 4], [1024 x 3], [10 x 1024]
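A minimal NumPy sketch of this kind of architecture follows. A GRU encoder runs over the pose and force history together with an image feature. A GRU decoder then consumes future actions and attends over the encoder states with dot-product attention, predicting pose and force at each step. Several assumptions apply. The CNN is replaced by a precomputed image feature vector. All dimensions are hypothetical and do not reproduce the layer sizes in the diagram. The weights are random and untrained, and the way modalities are concatenated is illustrative.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class GRUCell:
    """Minimal GRU cell with random (untrained) weights."""

    def __init__(self, in_dim, hid_dim, rng):
        s = 1.0 / np.sqrt(hid_dim)
        # Index 0: update gate z, 1: reset gate r, 2: candidate state n.
        self.W = rng.uniform(-s, s, (3, hid_dim, in_dim))
        self.U = rng.uniform(-s, s, (3, hid_dim, hid_dim))
        self.b = np.zeros((3, hid_dim))

    def __call__(self, x, h):
        z = sigmoid(self.W[0] @ x + self.U[0] @ h + self.b[0])
        r = sigmoid(self.W[1] @ x + self.U[1] @ h + self.b[1])
        n = np.tanh(self.W[2] @ x + self.U[2] @ (r * h) + self.b[2])
        return (1 - z) * n + z * h


def attend(query, keys):
    """Dot-product attention: softmax(keys @ query) weighted sum of keys."""
    scores = keys @ query
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ keys, w


class MultiModalDynamics:
    """Hypothetical encoder-decoder; all sizes are illustrative assumptions."""

    def __init__(self, img_dim=32, pose_dim=3, force_dim=3, act_dim=3, hid=64, seed=0):
        rng = np.random.default_rng(seed)
        self.hid, self.pose_dim = hid, pose_dim
        self.W_img = rng.normal(0, 0.1, (hid, img_dim))  # stand-in for the CNN head
        self.enc = GRUCell(pose_dim + force_dim + hid, hid, rng)
        self.dec = GRUCell(act_dim + hid, hid, rng)
        self.W_out = rng.normal(0, 0.1, (pose_dim + force_dim, 2 * hid))

    def predict(self, img_feat, poses, forces, future_actions):
        img = np.tanh(self.W_img @ img_feat)
        h, states = np.zeros(self.hid), []
        for p, f in zip(poses, forces):  # encode the observed history
            h = self.enc(np.concatenate([p, f, img]), h)
            states.append(h)
        keys, preds = np.stack(states), []
        for a in future_actions:  # decode one step per future action
            ctx, _ = attend(h, keys)
            h = self.dec(np.concatenate([a, ctx]), h)
            preds.append(self.W_out @ np.concatenate([h, ctx]))
        preds = np.stack(preds)
        return preds[:, : self.pose_dim], preds[:, self.pose_dim :]
```

Attention lets each decoding step weight the most relevant part of the contact history, which is one motivation for pairing it with a recurrent encoder.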
Method – Planning
(Diagram labels: contact force, cost)
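One way a planning cost could combine reference tracking with contact force is sketched below. It uses a squared pose error to the reference plus a penalty on predicted force magnitude above a threshold, for example to discourage jamming. The slide only names "contact force" and "cost". The functional form, the weights `w_track` and `w_force`, and `force_limit` are hypothetical assumptions.

```python
import math


def trajectory_cost(pred_poses, ref_poses, pred_forces,
                    force_limit=20.0, w_track=1.0, w_force=0.1):
    """Hypothetical cost: squared tracking error plus a hinge-squared penalty
    on contact-force magnitude above force_limit."""
    cost = 0.0
    for p, r, f in zip(pred_poses, ref_poses, pred_forces):
        cost += w_track * sum((pi - ri) ** 2 for pi, ri in zip(p, r))
        excess = max(0.0, math.hypot(*f) - force_limit)  # zero below the limit
        cost += w_force * excess ** 2
    return cost
```

A model-predictive planner would evaluate this cost on trajectories predicted by the learned dynamics model and prefer action sequences with lower cost.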
Method – Planning
Monte Carlo tree search (MCTS): https://en.wikipedia.org/wiki/Monte_Carlo_tree_search
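For readers unfamiliar with MCTS, here is a generic UCT sketch over a discrete action set with a deterministic model. It runs selection, expansion, random rollout, and backpropagation of the episode return, then returns the most-visited first action. This is a textbook variant, not necessarily the one used in the deck. `model(state, action) -> (next_state, reward)` is a hypothetical stand-in for the learned dynamics plus cost (reward = -cost). The exploration constant `c`, `n_iter`, and the toy problem are assumptions. The meaning of k in the planner experiments is not given here.

```python
import math
import random


class Node:
    """Search-tree node; `reward` is the reward of the edge leading into it."""

    def __init__(self, state, depth, reward=0.0):
        self.state, self.depth, self.reward = state, depth, reward
        self.children = {}  # action -> Node
        self.visits = 0
        self.total = 0.0  # sum of episode returns backed up through this node


def ucb(parent, child, c):
    mean = child.total / child.visits
    return mean + c * math.sqrt(math.log(parent.visits) / child.visits)


def mcts(root_state, actions, model, horizon, n_iter=500, c=1.4, seed=0):
    rng = random.Random(seed)
    root = Node(root_state, 0)
    for _ in range(n_iter):
        node, path, ret = root, [root], 0.0
        while node.depth < horizon:  # selection and expansion
            untried = [a for a in actions if a not in node.children]
            if untried:
                a = rng.choice(untried)
                nxt, r = model(node.state, a)
                node.children[a] = Node(nxt, node.depth + 1, r)
                node = node.children[a]
                path.append(node)
                ret += node.reward
                break
            parent = node
            node = max(parent.children.values(), key=lambda ch: ucb(parent, ch, c))
            path.append(node)
            ret += node.reward
        state, depth = node.state, node.depth
        while depth < horizon:  # simulation: random rollout to the horizon
            state, r = model(state, rng.choice(actions))
            ret += r
            depth += 1
        for n in path:  # backpropagation of the full return
            n.visits += 1
            n.total += ret
    return max(root.children, key=lambda a: root.children[a].visits)
```

Backing up the full episode return adds the same prefix reward to every sibling, so comparisons between siblings are unaffected.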
Experiments – Dynamics
Experiments – RNN vs. Baseline
Dynamics prediction error for different history and prediction lengths
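A sketch of how a multi-step prediction error can be computed for a given history length and prediction length follows. It slides a window along a recorded trajectory, feeds `history` observed poses and the next `horizon` actions to a predictor, and averages the Euclidean error over all predicted steps. The RNN and the baseline from the slide are not reproduced. `hold_last` (repeat the last pose) and `integrate_actions` (toy kinematic model) are hypothetical predictors, and the indexing assumes `poses[i + 1]` results from `actions[i]`.

```python
import math


def rollout_error(poses, actions, predict, history, horizon):
    """Mean Euclidean pose error over all windows of `history` observed
    poses followed by `horizon` predicted steps."""
    errors = []
    for t in range(history, len(poses) - horizon + 1):
        past = poses[t - history:t]
        preds = predict(past, actions[t - 1:t - 1 + horizon])
        targets = poses[t:t + horizon]
        errors.extend(math.dist(p, q) for p, q in zip(preds, targets))
    return sum(errors) / len(errors)


def hold_last(past, future_actions):
    """Hypothetical baseline: predict that the pose does not change."""
    return [past[-1]] * len(future_actions)


def integrate_actions(past, future_actions):
    """Hypothetical toy model: add each displacement action to the last pose."""
    p, out = past[-1], []
    for a in future_actions:
        p = tuple(x + d for x, d in zip(p, a))
        out.append(p)
    return out
```

Sweeping `history` and `horizon` over a grid and recording `rollout_error` for each predictor gives an error table of the kind this slide describes. Errors typically grow with prediction length because mistakes compound over the rollout.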
Experiments – Dynamics Ablation Studies
Experiments – Planner
Video speed: ×30
Method: MCTS, k = 5
Reference Trajectory: Deep
Result: jammed
Result: successfully excavated objects
Failure case
Conclusion and Future Work
Future work: