1 of 26

Robotic Grasp Detection

2 of 26

The Cornell Grasping Dataset

3 of 26

The Cornell Grasping Dataset

4 of 26

height

width

cos(2θ)

sin(2θ)

How To Grasp...

(x, y)

5 of 26

Convolutional, 64 filters, 5x5 size

Convolutional, 128 filters, 3x3 size

Convolutional, 128 filters, 3x3 size

Convolutional, 128 filters, 3x3 size

Convolutional, 256 filters, 3x3 size

Fully Connected, 512 Outputs, Dropout = .5

Fully Connected, 512 Outputs, Dropout = .5

Fully Connected, 6 Outputs

Architecture: Direct Regression to Grasps

(x, y)

height

width

cos(2θ)

sin(2θ)

Big Assumption:

1 grasp per image

6 of 26

Algorithm

Image-wise split accuracy

Object-wise split accuracy

Time per image

2-stage sliding window SVM, static features

60.5%

58.3%

Unknown

2-stage sliding window, deep features

73.9%

75.6%

13.5 sec

Deep CNN Regression

85.1%

84.5%

76 ms

Direct Regression to Grasps Works!

7 of 26

Average Grasps Are Awesome!!!

8 of 26

Average Grasps Are Awesome!!! (Right up until they’re not….)

9 of 26

Average Grasps Are Awesome!!! (Right up until they’re not….)

10 of 26

New System, Predict Local Bounding Boxes.

11 of 26

New System, Predict Local Bounding Boxes.

12 of 26

New System, Predict Local Bounding Boxes.

13 of 26

New System, Predict Local Bounding Boxes.

14 of 26

New System, Predict Local Bounding Boxes.

15 of 26

Convolutional, 64 filters, 5x5 size

Convolutional, 128 filters, 3x3 size

Convolutional, 128 filters, 3x3 size

Convolutional, 128 filters, 3x3 size

Convolutional, 256 filters, 3x3 size

Fully Connected, Output NxNx7 Grid

Fully Connected, 512 Outputs, Dropout = .5

Architecture: Direct Regression to Grasps

New Assumption:

1 grasp per NxN patch

16 of 26

Output = Grasps + Weights

Grasp Coordinates

Heatmap Of Grasp Probability

17 of 26

Learning

Only back-propagate error for the ground-truth grasps.

Back-propagate error for full heatmap

18 of 26

Examples

19 of 26

Examples

20 of 26

Examples

21 of 26

Examples

22 of 26

Examples

23 of 26

Examples

24 of 26

Examples

25 of 26

Examples

26 of 26

Algorithm

Image-wise split accuracy

Object-wise split accuracy

Time per image

2-stage sliding window SVM, static features

60.5%

58.3%

Unknown

2-stage sliding window, deep features

73.9%

75.6%

13.5 sec

Deep CNN Regression

85.1%

84.5%

76 ms

Grassroots Detection

88.2%

88.6%

76 ms

Grassroots Works Better!!!