1 of 29

Helping robots to help humans

Luca Minciullo, Lifehikes

2 of 29

DB-GAN: Boosting Object Recognition Under Strong Lighting Conditions

Luca Minciullo*, Fabian Manhardt*, Kei Yoshikawa, Sven Meier, Federico Tombari and Norimasa Kobori

Toyota Motor Europe, Technical University of Munich, Woven Core

*equal contribution

3 of 29

Motivation

  • Lighting conditions
    • vary over time
    • affect recognition accuracy
  • Acquiring a dataset with the required lighting variation is impractical.
  • Indoor performance is affected more than we realize

4 of 29

Existing works

  • Difference of Gaussians (DoG) filters
    • Lighting is removed together with most texture information
  • EnlightenGAN [1]
    • Dark -> bright, or vice versa
  • RetinexNet [2]
    • Color constancy
  • DeepUPE [3]
    • GAN-based image enhancement

None of these methods generates images that explicitly maximise detection accuracy.

5 of 29

Training Data

Background images are patches from the PHOS dataset* (15 static scenes under 15 different lighting conditions).

The object is rendered with a random pose, and synthetic lighting is applied to the rendered object.

*https://sites.google.com/site/vonikakis/datasets

[Figure: example input and output training images]
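A minimal sketch of how such an (input, target) pair could be assembled, assuming a pre-rendered object crop with an alpha channel and a PHOS background patch; the random gain/gamma perturbation stands in for the synthetic lighting, and all names here are illustrative rather than the authors' exact pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

def compose_training_pair(background, object_rgba):
    """Composite a rendered object onto a PHOS background patch and return
    (input, target): the input gets a random synthetic lighting change,
    the target keeps the neutral lighting.

    background:  HxWx3 float array in [0, 1] (PHOS patch)
    object_rgba: HxWx4 float array in [0, 1] (rendered object + alpha)
    """
    rgb, alpha = object_rgba[..., :3], object_rgba[..., 3:]
    # Neutral composite = the lighting-normalized image the generator should reproduce.
    target = alpha * rgb + (1.0 - alpha) * background

    # Synthetic lighting on the rendered object: random gain and gamma
    # (a simple stand-in for the lighting augmentation described on the slide).
    gain = rng.uniform(0.4, 1.8)
    gamma = rng.uniform(0.5, 2.0)
    lit_rgb = np.clip(gain * rgb ** gamma, 0.0, 1.0)
    inp = alpha * lit_rgb + (1.0 - alpha) * background
    return inp, target

# Usage with dummy data (stand-ins for a real PHOS patch and render):
bg = rng.uniform(size=(128, 128, 3))
obj = rng.uniform(size=(128, 128, 4))
x, y = compose_training_pair(bg, obj)
```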

6 of 29

Our approach

  • Based on the Pix2Pix encoder-decoder architecture
  • Joint learning of image lighting normalization and object detection
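A minimal PyTorch-style sketch of how the joint objective could be assembled, assuming a pix2pix-style generator G, a discriminator D, a frozen feature network for the perceptual term, and a detection loss coming from an SSD head; the module stubs, weights, and exact loss terms are illustrative, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Tiny stand-ins for the real networks (illustrative only).
G = nn.Sequential(nn.Conv2d(3, 3, 3, padding=1))            # pix2pix-style generator
D = nn.Sequential(nn.Conv2d(3, 1, 4, stride=2, padding=1))  # global discriminator
feat = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1))         # feature net for the perceptual loss

def generator_loss(x_lit, y_neutral, det_loss, w=(1.0, 1.0, 0.1, 1.0)):
    """Joint objective: L1 + perceptual + adversarial + SSD detection loss.

    x_lit:     input image under strong lighting (N, 3, H, W)
    y_neutral: lighting-normalized target image  (N, 3, H, W)
    det_loss:  detection loss computed by the SSD head on G(x_lit)
    """
    y_hat = G(x_lit)
    l1 = F.l1_loss(y_hat, y_neutral)
    perc = F.l1_loss(feat(y_hat), feat(y_neutral))
    d_out = D(y_hat)
    adv = F.binary_cross_entropy_with_logits(d_out, torch.ones_like(d_out))  # fool D
    return w[0] * l1 + w[1] * perc + w[2] * adv + w[3] * det_loss

# Dummy forward pass to show the shapes involved.
x = torch.rand(2, 3, 64, 64)
y = torch.rand(2, 3, 64, 64)
loss = generator_loss(x, y, det_loss=torch.tensor(0.5))
loss.backward()
```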

7 of 29

Quantitative Results: Experiments on the test sets from two BOP datasets*

2D Object Detection Results

SSD with          | Toyota Light mAP | TUD Light mAP
DoG               | 0.20             | 0.36
EnlightenGAN [1]  | 0.29             | 0.43
RetinexNet [2]    | 0.28             | 0.62
DeepUPE [3]       | 0.29             | 0.47
baseline          | 0.27             | 0.18
DB-GAN            | 0.72             | 0.66

6D Object Pose Estimation Results

SSD with          | Toyota Light mAP w/o ICP | Toyota Light mAP w/ ICP | TUD Light mAP w/o ICP | TUD Light mAP w/ ICP
DoG               | 0.35                     | 0.37                    | 0.14                  | 0.19
EnlightenGAN [1]  | 0.30                     | 0.34                    | 0.157                 | 0.21
RetinexNet [2]    | 0.32                     | 0.36                    | 0.13                  | 0.19
DeepUPE [3]       | 0.34                     | 0.38                    | 0.12                  | 0.18
baseline          | 0.23                     | 0.32                    | 0.159                 | 0.155
DB-GAN            | 0.42                     | 0.44                    | 0.164                 | 0.25

*https://bop.felk.cvut.cz/home/

Loss Ablation Study: 2D detection on Toyota Light

Losses Used             | mAP
L1                      | 0.55
+ Perceptual            | 0.67
+ Global Discriminator  | 0.66
+ Local Discriminator   | 0.60
+ SSD Loss              | 0.72

8 of 29

Qualitative Results (2D)

9 of 29

Qualitative Results (6D)

[Figure: qualitative 6D pose comparison of EnlightenGAN, RetinexNet, DeepUPE and DB-GAN]

10 of 29

Toyota TrueBlue dataset

  • 11 scenes
  • 11 different color temperatures per scene
  • 5 objects
    • 2D bounding box annotations available
    • 3D object models also available
  • Checkerboard in the scene

11 of 29

Results: Toyota TrueBlue

Method                        | mAP
Baseline                      | 0.39
Baseline + color augmentation | 0.54
DB-GAN                        | 0.73

[Figure: qualitative comparison of Baseline vs. DB-GAN detections]

12 of 29

DB-GAN LIVE CAMERA DEMO

[Diagram: live camera feed, comparing the SSD output on raw frames with the SSD output on DB-GAN-processed frames]

13 of 29

DemoGrasp: Few-Shot Learning for Robotic Grasping with Human Demonstration

Pengyuan Wang*, Fabian Manhardt*, Luca Minciullo, Lorenzo Garattoni, Sven Meier, Nassir Navab and Benjamin Busam

Toyota Motor Europe, Technical University of Munich, Woven Core

*equal contribution

14 of 29

The robotic grasping problem

  • Layman definition
    • A robot can grasp an object if it can pick it up from a surface, hold it for some time and release the object safely in another location.
  • Motivation
    • Pick & place
    • Manufacturing/automation
    • Assistive robotics
  • Problem
    • How/where should the robot place its gripper for a given object?
    • How can it acquire this ability for new objects?

[1] https://www.toyota-global.com/innovation/partner_robot

15 of 29

Existing solutions: Model-based Grasping

  • 3D object models (CAD)
  • Grasping points are pre-assigned to the object by a technical person
  • The problem is solved by a 6D pose estimation solution: once the model is fitted to the scene, the grasping points can be retrieved for free

[Figures: YCB dataset [2]; DenseFusion [3]]

[2] B. Calli, A. Singh, J. Bruce, A. Walsman, K. Konolige, S. Srinivasa, P. Abbeel, A. M. Dollar. Yale-CMU-Berkeley dataset for robotic manipulation research, IJRR, 2017.

[3] C. Wang, D. Xu, Y. Zhu, R. Martín-Martín, C. Lu, L. Fei-Fei, S. Savarese. DenseFusion: 6D object pose estimation by iterative dense fusion, CVPR, 2019.

16 of 29

Model-based Grasping

New objects outside the dataset?

  • Collect a CAD model (3D scanner)
  • Assign grasping points manually
  • Training data needs to be collected for each object or generated by simulation
  • The pose estimation model needs to be re-trained for any new object

17 of 29

Existing solutions: Model-free Grasping

  • Dex-Net [4], GraspNet [5]
  • Grasp point proposals from neural networks
  • A large amount of real training data is needed
  • As we will see, the accuracy of this type of solution is far from ideal

[Figure: Dex-Net [4]]

[4] J. Mahler, J. Liang, S. Niyaz, M. Laskey, R. Doan, X. Liu, J. Aparicio Ojea, K. Goldberg. Dex-Net 2.0: Deep learning to plan robust grasps with synthetic point clouds and analytic grasp metrics, arXiv preprint, 2017.

[5] A. Mousavian, C. Eppner, D. Fox. 6-DOF GraspNet: Variational grasp generation for object manipulation, ICCV, 2019.

18 of 29


Idea

  • Human demonstration
    • a person shows the robot how to grasp a new object
  • Robot learns to transfer the demonstration to a behaviour it can perform and reproduce
  • Pros:
    • No expert knowledge needed
    • No object scanning
    • No re-training
  • Cons:
    • Inference was not real-time

19 of 29


Method overview

  • Learning Phase
    • Collect an RGB-D sequence
    • Segment both hand and object
    • Fuse the sequence into a hand-object mesh
  • Hand-Object Interaction
    • Separate the hand mesh from the object mesh
    • Perform shape completion on the object mesh
    • Fit a hand model to the hand mesh
    • Grasping points are defined based on the fingers' locations
  • Grasping
    • Load an RGB-D test scene
    • Match the object mesh to the scene
    • The hand mesh is matched as a result
    • Retrieve the grasping points

20 of 29

Method: Learning phase

Segmentation of Hand and Object

  • Mask R-CNN [7]
  • Binary cross-entropy for each class
  • Prevents inter-class competition (see the loss sketch below)

Hand-Object Interaction

  • Simultaneously track the hand and object together
  • The hand and object are reconstructed in a TSDF volume following KinectFusion [8]

[Figures: segmentation masks of hand and object; point cloud extracted from the fused TSDF volume]

[7] K. He, G. Gkioxari, P. Dollár, R. Girshick. Mask R-CNN, ICCV, 2017.

[8] R. Newcombe, S. Izadi, O. Hilliges, D. Molyneaux, D. Kim, A. Davison, P. Kohli, J. Shotton, S. Hodges, A. Fitzgibbon. KinectFusion: Real-Time Dense Surface Mapping and Tracking, ISMAR, 2011.
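A minimal PyTorch sketch of the per-class binary cross-entropy idea (an independent sigmoid per class instead of a softmax across classes, so the hand and object masks do not compete); the tensor shapes and names are illustrative, not the exact Mask R-CNN head.

```python
import torch
import torch.nn.functional as F

# Per-pixel mask logits for 2 classes (hand, object): (N, C, H, W).
logits = torch.randn(1, 2, 28, 28)
# Binary ground-truth masks, one channel per class; a pixel may belong
# to both the hand and the (occluded) object without competing.
targets = torch.randint(0, 2, (1, 2, 28, 28)).float()

# Independent sigmoid + BCE per class: no softmax across classes,
# hence no inter-class competition.
loss = F.binary_cross_entropy_with_logits(logits, targets)

# Contrast: softmax cross-entropy would force each pixel to pick a
# single winning class.
# ce_loss = F.cross_entropy(logits, targets.argmax(dim=1))
```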

21 of 29

Method: Hand-Object Interaction (I)

Hand Pose Alignment

  • Hand mesh predicted from the RGB image leveraging [11]
  • Point cloud extracted from the partial hand reconstruction
  • ICP to tightly align both together (see the sketch below)

[Figure: hand mesh aligned with the fused point cloud]

[11] Y. Hasson, G. Varol, D. Tzionas, I. Kalevatykh, M. Black, I. Laptev, C. Schmid. Learning joint reconstruction of hands and manipulated objects, CVPR, 2019.
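A minimal sketch of this ICP alignment step using Open3D (the library choice and parameter values are assumptions; the paper does not specify them), aligning the predicted hand mesh, sampled to a point cloud, against the partial hand reconstruction.

```python
import numpy as np
import open3d as o3d

def align_hand_mesh(hand_mesh, partial_hand_points, voxel=0.005):
    """Rigidly align a predicted hand mesh to the partially reconstructed
    hand point cloud with point-to-point ICP.

    hand_mesh:           o3d.geometry.TriangleMesh (e.g. a MANO-style hand mesh)
    partial_hand_points: (N, 3) numpy array from the fused TSDF volume
    """
    # Sample the mesh surface so both sides are point clouds.
    source = hand_mesh.sample_points_uniformly(number_of_points=2048)
    target = o3d.geometry.PointCloud()
    target.points = o3d.utility.Vector3dVector(partial_hand_points)

    source = source.voxel_down_sample(voxel)
    target = target.voxel_down_sample(voxel)

    result = o3d.pipelines.registration.registration_icp(
        source, target,
        0.02,                # max correspondence distance: 2 cm search radius
        np.eye(4),           # initial transform: assume rough pre-alignment
        o3d.pipelines.registration.TransformationEstimationPointToPoint())
    return result.transformation   # 4x4 rigid transform
```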

22 of 29

Method: Hand-Object Interaction (II)

Additional Object Shape Completion

  • 3D U-Net [9] with skip connections
  • Fused TSDF volume as input
  • Predicts 64x64x64 voxels with binary classification scores (object vs. no object)
  • Focal loss [10] as the loss function, where Pos and Neg represent occupied and empty voxels and γ is set to 2 (see the formula below)

[9] O. Ronneberger, P. Fischer, T. Brox. U-Net: Convolutional networks for biomedical image segmentation, MICCAI, 2015.

[10] T.-Y. Lin, P. Goyal, R. Girshick, K. He, P. Dollár. Focal loss for dense object detection, ICCV, 2017.
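The focal-loss formula itself did not survive the slide export; the standard form from [10], specialised to occupied (Pos) and empty (Neg) voxels with γ = 2, reads roughly as follows (a reconstruction, so the weighting may differ slightly from the paper's exact variant):

```latex
L = -\sum_{i \in \mathrm{Pos}} (1 - p_i)^{\gamma} \log p_i
    \;-\; \sum_{j \in \mathrm{Neg}} p_j^{\gamma} \log (1 - p_j),
\qquad \gamma = 2,
```

where p_i is the predicted occupancy probability of voxel i.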

23 of 29

Method: Match Object Mesh to the Scene

Object Point Cloud Registration

  • PPF-FoldNet [12]
    • Collect point pair features in local patches (see the sketch below)
    • Generate encoded descriptors for the surrounding point pair features
  • Registration of the point cloud to the scene using RANSAC

[Figure: point pair feature between selected points m1 and m2, with difference vector d and normals n1, n2 [12]]

[12] H. Deng, T. Birdal, S. Ilic. PPF-FoldNet: Unsupervised learning of rotation invariant 3D local descriptors, ECCV, 2018.
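A minimal numpy sketch of the 4D point pair feature illustrated in the figure above, built from the two points m1, m2, the difference vector d, and the normals n1, n2; this is the classic PPF formulation used by PPF-based descriptors such as [12], written here for illustration only.

```python
import numpy as np

def angle(u, v):
    """Unsigned angle between two 3D vectors, in radians."""
    u = u / np.linalg.norm(u)
    v = v / np.linalg.norm(v)
    return np.arccos(np.clip(np.dot(u, v), -1.0, 1.0))

def point_pair_feature(m1, n1, m2, n2):
    """4D feature F(m1, m2) = (||d||, ang(n1, d), ang(n2, d), ang(n1, n2)),
    with d = m2 - m1."""
    d = m2 - m1
    return np.array([np.linalg.norm(d), angle(n1, d), angle(n2, d), angle(n1, n2)])

# Example: two surface points with their normals.
f = point_pair_feature(
    np.array([0.0, 0.0, 0.0]), np.array([0.0, 0.0, 1.0]),
    np.array([0.1, 0.0, 0.0]), np.array([0.0, 1.0, 0.0]))
print(f)   # [distance, angle(n1, d), angle(n2, d), angle(n1, n2)]
```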

24 of 29

Method: Grasping Instruction Retrieval

  • 6D robotic gripper pose from the hand mesh
  • Grasp point from the midpoint between the index finger and thumb locations
  • Grasp direction from the wrist and the grasp point (a sketch follows below)

[Figure: gripper grasp instruction derived from the hand pose]
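A minimal numpy sketch of how the grasp point and approach direction could be derived from three hand keypoints (thumb tip, index tip, wrist); the exact keypoints and the gripper orientation convention are assumptions, not the paper's precise formulation.

```python
import numpy as np

def grasp_from_hand(thumb_tip, index_tip, wrist):
    """Derive a grasp point and approach direction from hand keypoints.

    Returns (grasp_point, approach_dir): the grasp point is the midpoint
    between thumb and index tip; the approach direction points from the
    wrist towards the grasp point.
    """
    grasp_point = 0.5 * (thumb_tip + index_tip)
    approach = grasp_point - wrist
    approach_dir = approach / np.linalg.norm(approach)
    return grasp_point, approach_dir

# Example with made-up keypoints (metres).
p, d = grasp_from_hand(np.array([ 0.02, 0.00, 0.30]),
                       np.array([-0.02, 0.00, 0.30]),
                       np.array([ 0.00, -0.10, 0.20]))
```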

25 of 29

Evaluation

Simulation Setup

  • Human Support Robot (HSR)
  • Four test objects: shampoo, drill, hole punch, cookie box
  • Simulation scene in Gazebo
  • Objects placed on a table with random position and rotation

26 of 29

Evaluation

Evaluation Metric

  • 15 trials for each object
  • 6DoF GraspNet [5] as the baseline
  • Success if the robot grasps the object and holds it for 5 seconds without dropping it

[Figure: evaluation results]

27 of 29

Evaluation

Real World Evaluation

  • Similar setup to the synthetic evaluation

[Figures: scene setup; test objects]

28 of 29

Evaluation

Evaluation Metric

  • For each object, a learning sequence is recorded from the side and from the top
  • 9 trials per learning sequence, 18 trials per object
  • 72 trials overall

[Figure: evaluation results]

29 of 29

Q&A