Helping robots to help humans
Luca Minciullo, Lifehikes
DB-GAN: Boosting Object Recognition Under Strong Lighting Conditions
Luca Minciullo*, Fabian Manhardt*, Kei Yoshikawa, Sven Meier, Federico Tombari and Norimasa Kobori
Toyota Motor Europe, Technical University of Munich, Woven Core
*equal contribution
Motivation
Existing works
Difference of Gaussians filters
Lighting is removed together with most texture information (a minimal sketch follows this list)
EnlightenGAN [1]
Dark -> Bright or vice versa
None of these images is generated to explicitly maximise detection accuracy
RetinexNet [2]
Color constancy
DeepUPE [3]
GAN-based image enhancement
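The Difference-of-Gaussians baseline mentioned above can be sketched as follows; this is only an illustration using OpenCV, and the sigma values here are arbitrary rather than the settings used in any of the compared works.

```python
import cv2
import numpy as np

def difference_of_gaussians(image_bgr, sigma_small=1.0, sigma_large=3.0):
    """Band-pass an image by subtracting two Gaussian blurs.

    Low-frequency illumination is largely suppressed, but so is most
    texture outside the retained frequency band, which is the drawback
    noted above.
    """
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY).astype(np.float32)
    fine = cv2.GaussianBlur(gray, (0, 0), sigma_small)
    coarse = cv2.GaussianBlur(gray, (0, 0), sigma_large)
    dog = fine - coarse
    # Rescale to [0, 255] so the result can be visualised or fed to a detector.
    dog = cv2.normalize(dog, None, 0, 255, cv2.NORM_MINMAX)
    return dog.astype(np.uint8)
```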
Training Data
Background images are patches from the PHOS dataset* (15 static scenes under 15 different lighting conditions)
The object is rendered with a random pose, and synthetic lighting is applied to the rendered object (see the sketch below)
*https://sites.google.com/site/vonikakis/datasets
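A rough sketch of this data-generation idea is given below. The function name, array shapes, and the simple gain-based lighting change are illustrative assumptions; the actual renderer and lighting model used for DB-GAN are not specified on the slide.

```python
import numpy as np

def compose_training_pair(phos_patch, rendered_rgba, gain_range=(0.3, 1.8)):
    """Paste a rendered object (RGBA, random pose) onto a PHOS background patch.

    Returns an (input, target) pair: the input gets a random synthetic lighting
    gain on the object region, the target keeps the evenly lit rendering.
    Both arrays are assumed to be HxWx3 / HxWx4 uint8 at the same resolution.
    """
    rgb = rendered_rgba[..., :3].astype(np.float32)
    alpha = rendered_rgba[..., 3:4].astype(np.float32) / 255.0
    bg = phos_patch.astype(np.float32)

    gain = np.random.uniform(*gain_range)            # crude synthetic lighting change
    relit = np.clip(rgb * gain, 0, 255)

    net_input = alpha * relit + (1.0 - alpha) * bg   # object under altered lighting
    target = alpha * rgb + (1.0 - alpha) * bg        # evenly lit reference
    return net_input.astype(np.uint8), target.astype(np.uint8)
```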
Input
Output
Our approach
Quantitative Results: experiments on the test sets from two BOP datasets*
SSD with | Toyota Light mAP | TUD Light mAP
DoG | 0.20 | 0.36
EnlightenGAN [1] | 0.29 | 0.43
RetinexNet [2] | 0.28 | 0.62
DeepUPE [3] | 0.29 | 0.47
Baseline | 0.27 | 0.18
DB-GAN | 0.72 | 0.66
2D Object Detection Results
SSD with | Toyota Light mAP w/o ICP | Toyota Light mAP w/ ICP | TUD Light mAP w/o ICP | TUD Light mAP w/ ICP
DoG | 0.35 | 0.37 | 0.14 | 0.19
EnlightenGAN [1] | 0.30 | 0.34 | 0.157 | 0.21
RetinexNet [2] | 0.32 | 0.36 | 0.13 | 0.19
DeepUPE [3] | 0.34 | 0.38 | 0.12 | 0.18
Baseline | 0.23 | 0.32 | 0.159 | 0.155
DB-GAN | 0.42 | 0.44 | 0.164 | 0.25
6D Object Pose Estimation Results
*https://bop.felk.cvut.cz/home/
Losses Used | mAP
L1 | 0.55
+ Perceptual | 0.67
+ Global Discriminator | 0.66
+ Local Discriminator | 0.60
+ SSD Loss | 0.72
Loss Ablation Study: 2D detection on Toyota Light
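Reading the rows cumulatively, the full DB-GAN generator objective combines all five terms. A hedged sketch of that combination is given below; the weights λ are placeholders rather than values taken from the slide or paper.

$$
\mathcal{L}_G \;=\; \lambda_{1}\,\mathcal{L}_{1} \;+\; \lambda_{p}\,\mathcal{L}_{\mathrm{perc}} \;+\; \lambda_{g}\,\mathcal{L}_{\mathrm{adv}}^{\mathrm{global}} \;+\; \lambda_{l}\,\mathcal{L}_{\mathrm{adv}}^{\mathrm{local}} \;+\; \lambda_{s}\,\mathcal{L}_{\mathrm{SSD}}
$$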
Qualitative Results (2D)
Qualitative Results (6D)
EnlightenGAN
RetinexNet
DeepUPE
DB-GAN
Toyota TrueBlue dataset
Results: Toyota TrueBlue
Method | mAP
Baseline | 0.39
Baseline + color augmentation | 0.54
DB-GAN | 0.73
Baseline
DB-GAN
DB-GAN LIVE CAMERA DEMO
Diagram: live camera feed, SSD-GAN output, and plain SSD output shown side by side (detection with and without DB-GAN preprocessing)
DemoGrasp: Few-Shot Learning for Robotic Grasping with Human Demonstration
Pengyuan Wang*, Fabian Manhardt*, Luca Minciullo, Lorenzo Garattoni, Sven Meier, Nassir Navab and Benjamin Busam
Toyota Motor Europe, Technical University of Munich, Woven Core
*equal contribution
The robotic grasping problem
[1] https://www.toyota-global.com/innovation/partner_robot
YCB dataset [2]
DenseFusion [3]
[2] B. Calli, A. Singh, J. Bruce, A. Walsman, K. Konolige, S. Srinivasa, P. Abbeel, A. M. Dollar. Yale-CMU-Berkeley dataset for robotic manipulation research, IJRR, 2017.
[3] C. Wang, D. Xu, Y. Zhu, R. Martín-Martín, C. Lu, L. Fei-Fei, S. Savarese. DenseFusion: 6D object pose estimation by iterative dense fusion, CVPR, 2019.
Existing solutions: Model-based Grasping
New objects outside the dataset?
...etc.
Model-based Grasping
Dex-Net [4]
[4] J. Mahler, J. Liang, S. Niyaz, M. Laskey, R. Doan, X. Liu, J. Aparicio Ojea, K. Goldberg. Dex-Net 2.0: Deep learning to plan robust grasps with synthetic point clouds and analytic grasp metrics, arXiv preprint, 2017.
[5] A. Mousavian, C. Eppner, D. Fox. 6-DOF GraspNet: Variational grasp generation for object manipulation, ICCV, 2019.
Existing solutions: Model-free Grasping
Idea
Method overview
Segmentation of Hand and Object
Hand-Object Interaction
[7] K. He, G. Gkioxari, P. Dollár, R. Girshick. Mask R-CNN, ICCV, 2017.
Segmentation masks of hand and object
Point cloud extracted from the fused TSDF volume (see the fusion sketch below)
[8] R. Newcombe, S. Izadi, O. Hilliges, D. Molyneaux, D. Kim, A. Davison, P. Kohli, J. Shotton, S. Hodges, A. Fitzgibbon. KinectFusion: Real-Time Dense Surface Mapping and Tracking, ISMAR, 2011
Method: Learning phase
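A minimal sketch of the masked TSDF fusion step described above, assuming Open3D's TSDF integration as a stand-in for the KinectFusion-style pipeline [8]. The variable names, voxel size, and camera intrinsics are illustrative assumptions, not the paper's settings.

```python
import numpy as np
import open3d as o3d

volume = o3d.pipelines.integration.ScalableTSDFVolume(
    voxel_length=0.004,   # 4 mm voxels (assumed value)
    sdf_trunc=0.02,
    color_type=o3d.pipelines.integration.TSDFVolumeColorType.RGB8)

intrinsic = o3d.camera.PinholeCameraIntrinsic(
    o3d.camera.PinholeCameraIntrinsicParameters.PrimeSenseDefault)

# Hypothetical per-frame data from the demonstration: color (uint8 HxWx3),
# depth in metres (float32 HxW), a boolean hand+object mask from Mask R-CNN [7],
# and the camera-to-world pose as a 4x4 matrix.
frames = []

for color, depth, mask, cam_pose in frames:
    depth = np.where(mask, depth, 0.0).astype(np.float32)  # drop background pixels
    rgbd = o3d.geometry.RGBDImage.create_from_color_and_depth(
        o3d.geometry.Image(color), o3d.geometry.Image(depth),
        depth_scale=1.0, depth_trunc=1.0, convert_rgb_to_intensity=False)
    volume.integrate(rgbd, intrinsic, np.linalg.inv(cam_pose))

hand_object_pcd = volume.extract_point_cloud()
```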
Hand Pose Alignment
[11] Y. Hasson, G. Varol, D. Tzionas, I. Kalevatykh, M. Black, I. Laptev, C. Schmid. Learning joint reconstruction of hands and manipulated objects, CVPR, 2019.
Hand mesh aligned with the fused point cloud (see the alignment sketch below)
Method: hand-object interaction (I)
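One way to sketch the alignment step, assuming the hand mesh predicted from RGB [11] is refined against the fused point cloud with point-to-plane ICP in Open3D. The hand_mesh and scene_pcd variables are placeholders, and DemoGrasp's actual alignment procedure may differ.

```python
import open3d as o3d

def refine_hand_alignment(hand_mesh, scene_pcd, max_dist=0.01):
    """Refine an initial hand-mesh pose against the fused demonstration cloud."""
    hand_pcd = hand_mesh.sample_points_uniformly(number_of_points=5000)
    hand_pcd.estimate_normals()
    scene_pcd.estimate_normals()
    result = o3d.pipelines.registration.registration_icp(
        hand_pcd, scene_pcd, max_dist,
        estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPlane())
    hand_mesh.transform(result.transformation)   # apply the refined pose
    return hand_mesh, result.fitness
```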
Additional Object Shape Completion
[9] O. Ronneberger, P. Fischer, T. Brox. U-Net: Convolutional networks for biomedical image segmentation, MICCAI, 2015.
[10] T.-Y. Lin, P. Goyal, R. Girshick, K. He, P. Dollár. Focal loss for dense object detection, ICCV, 2017.
Focal loss [10] is used as the loss function, where Pos and Neg denote occupied and empty voxels and γ is set to 2 (a sketch follows below)
Method: hand-object interaction (II)
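A small PyTorch sketch of the voxel-occupancy focal loss as described: binary occupied/empty targets with γ = 2. The α-balancing term from [10] is omitted here since the slide does not mention it.

```python
import torch
import torch.nn.functional as F

def voxel_focal_loss(logits, targets, gamma=2.0):
    """Focal loss over voxel occupancy.

    logits, targets: tensors of shape (B, 1, D, H, W); targets are 1.0 for
    occupied (Pos) and 0.0 for empty (Neg) voxels.
    """
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)   # probability of the true class
    return (((1 - p_t) ** gamma) * ce).mean()     # down-weight easy voxels
```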
Object Point Cloud Registration
[12] H. Deng, T. Birdal, S. Ilic. PPF-FoldNet: Unsupervised learning of rotation invariant 3D local descriptors, ECCV, 2018.
Point pair feature: selected points m1 and m2, difference vector d, normals n1 and n2 [12] (see the formula below)
Method: match object mesh to the scene
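The point pair feature referenced above is the standard four-dimensional descriptor built from the quantities listed on the slide:

$$
\mathbf{F}(m_1, m_2) \;=\; \bigl(\lVert \mathbf{d} \rVert_2,\; \angle(\mathbf{n}_1, \mathbf{d}),\; \angle(\mathbf{n}_2, \mathbf{d}),\; \angle(\mathbf{n}_1, \mathbf{n}_2)\bigr), \qquad \mathbf{d} = m_2 - m_1 .
$$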
Gripper grasp instruction derived from the demonstrated hand pose (see the sketch below)
Method: Grasping Instruction Retrieval
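A minimal sketch of how a demonstrated grasp can be re-targeted to the gripper once the object pose is known in the test scene. The pose names are hypothetical 4x4 homogeneous matrices; this illustrates the idea, not DemoGrasp's exact formulation.

```python
import numpy as np

def transfer_grasp(T_obj_demo, T_grasp_demo, T_obj_test):
    """Map the demonstrated gripper/hand pose into the test scene.

    Express the demonstrated grasp in the object frame, then re-apply it
    using the object pose estimated at test time.
    """
    T_grasp_in_obj = np.linalg.inv(T_obj_demo) @ T_grasp_demo
    return T_obj_test @ T_grasp_in_obj
```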
Simulation Setup
Test objects: Shampoo, Drill, Hole Punch, Cookie Box
Evaluation
Evaluation Metric
Evaluation result
Real World Evaluation
Scene setup
Test objects
Evaluation
Evaluation Metric
Evaluation result
Q&A