1 of 69

Improving Robot Success Detection using Static Object Data

Rosario Scalise, Jesse Thomason, Yonatan Bisk, Siddhartha Srinivasa

4 of 69

sensor stream → classification of outcome

5 of 69

Is apple in bowl?

[images: t=0, t=15]

yes

7 of 69

however... sensors are noisy

8 of 69

[images: pre-manipulation, post-manipulation]

11 of 69

sensor stream → classification of outcome

12 of 69

sensor stream + static object information → classification of outcome

13 of 69

sensor stream + size → classification of outcome

14 of 69

sensor stream + size + shape → classification of outcome

15 of 69

sensor stream + size + shape + object-relationships → classification of outcome
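One way to picture combining a sensor stream with static object information is plain feature concatenation followed by a classifier. The sketch below is only an illustration under that assumption: the function names, the toy feature values, and the linear scorer are hypothetical and are not taken from the talk, which does not specify how the inputs are fused.

```python
from typing import List, Sequence


def fuse_features(sensor: Sequence[float], static_obj: Sequence[float]) -> List[float]:
    """Concatenate per-trial sensor features with static object features."""
    return list(sensor) + list(static_obj)


def linear_score(features: Sequence[float], weights: Sequence[float], bias: float = 0.0) -> float:
    """Score a fused feature vector with a (hypothetical) linear classifier."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return sum(f * w for f, w in zip(features, weights)) + bias


# Made-up toy values, for illustration only.
sensor_features = [0.2, 0.7]         # stand-in for features pooled from the sensor stream
static_features = [0.05, 0.12, 1.0]  # stand-in for size, shape, and relationship features

fused = fuse_features(sensor_features, static_features)
score = linear_score(fused, [1.0, -1.0, 0.0, 0.0, 2.0])
predicted_positive = score > 0.0     # outcome decision from the fused vector
```

With concatenation, the static features are identical across trials of the same object pair, while the sensor features vary from trial to trial.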

16 of 69

Grasped Object: O_G

Target Object: O_T

17 of 69

What is the observed outcome?

O_G ON O_T ?  Y / N

O_G IN O_T ?  Y / N

18 of 69

Classify this outcome using egocentric RGBD sensor modalities.

O_G ON O_T ?  Y / N

O_G IN O_T ?  Y / N

19 of 69

Our Domain: The YCB Objects

23 of 69

Dataset format:

Input: object pair ( O_G , O_T )

Output: GT labels for O_G ON O_T ? and O_G IN O_T ?

24 of 69

Dataset format:

Input: object pair ( [image of O_G] , [image of O_T] )

Output: GT labels, e.g. O_G ON O_T ? YES and O_G IN O_T ? NO
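The input/output format above could be represented in code roughly as follows. This is a minimal sketch, not the repository's actual schema: the class name, field names, placeholder object names, and the use of booleans for the labels are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class PairExample:
    """One dataset example: an object pair plus its ground-truth labels."""
    grasped: str  # O_G, the grasped object (hypothetical name field)
    target: str   # O_T, the target object (hypothetical name field)
    on: bool      # answer to "O_G ON O_T ?"
    in_: bool     # answer to "O_G IN O_T ?" ("in" is a Python keyword)

    def as_input_output(self) -> Tuple[Tuple[str, str], Dict[str, bool]]:
        """Split the example into (input pair, output labels)."""
        return (self.grasped, self.target), {"on": self.on, "in": self.in_}


# Placeholder object names; the labels mirror the YES / NO example above.
example = PairExample(grasped="object_a", target="object_b", on=True, in_=False)
pair, labels = example.as_input_output()
```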

26 of 69

Robot Pairs

195 object pairs

X 5 trials each

= 955 examples

> 50 operator hours for this dataset!

27 of 69

Auxiliary Data from Human Judgement

28 of 69

Auxiliary Data from Human Judgement

Front, Back, Topdown, Left, Right

34 of 69

Auxiliary Data from Human Judgement

on?  in?    yes  no

on?  in?    yes  no

>3 annotations per object pair
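With several annotations per object pair, the individual judgements have to be reduced to a single label somehow. The slides do not say how this was done; the sketch below assumes a simple majority vote with an explicit tie marker, and the function name and label strings are hypothetical.

```python
from collections import Counter
from typing import Iterable


def majority_label(annotations: Iterable[str]) -> str:
    """Return the most common annotation, or "tie" if the top two counts match."""
    ranked = Counter(annotations).most_common()
    if not ranked:
        raise ValueError("need at least one annotation")
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "tie"
    return ranked[0][0]


on_label = majority_label(["yes", "yes", "no", "yes"])  # clear majority
in_label = majority_label(["yes", "no"])                # evenly split votes
```

Returning an explicit "tie" instead of picking arbitrarily keeps ambiguous pairs visible, so they can be re-annotated or handled as their own case.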

35 of 69

Auxiliary Data from Human Judgement

on?  in?    yes  no

on?  in?    yes  no

All Pairs vs. Robot Pairs

37 of 69

Auxiliary Data from Human Judgement

“long yellow food”

“curved fruit”

“portable tasty snack”

9 referring expressions per object

38 of 69

Models

39 of 69

Accuracy on Test Fold

                              IN          ON
Baseline (majority class):    .32 ± .00   .36 ± .00
Baseline (random):            .49 ± .06   .50 ± .06
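A majority-class baseline and a "mean ± spread" summary can be computed as in the sketch below. It assumes the majority class is chosen on the training labels and scored on held-out labels, and that the ± value is a sample standard deviation over repeated runs or folds; the slides do not state either detail, and the function names and toy labels are hypothetical.

```python
from collections import Counter
from statistics import mean, stdev
from typing import Sequence, Tuple


def majority_class_accuracy(train_labels: Sequence[str], test_labels: Sequence[str]) -> float:
    """Always predict the most common training label; score it on the test labels."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return sum(1 for y in test_labels if y == majority) / len(test_labels)


def summarize(accuracies: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation over runs (needs at least two values)."""
    return mean(accuracies), stdev(accuracies)


# Toy labels: "no" is the training majority but covers only half of the test set.
acc = majority_class_accuracy(["no", "no", "yes"], ["yes", "yes", "no", "no"])
avg, spread = summarize([0.5, 0.5, 0.5])
```

Because the majority class comes from the training split, this baseline can score below chance when the test split has a different label balance.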

40 of 69

Egocentric RGBD sensor stream baseline

46 of 69

Accuracy on Test Fold

                              IN          ON
Baseline (majority class):    .32 ± .00   .36 ± .00
Baseline (random):            .49 ± .06   .50 ± .06
Egocentric RGBD:              .77 ± .05   .53 ± .10

47 of 69

RGBD + Static Object Data

57 of 69

Ego Classification: On? NO

Ego + Obj Data Classification: On? YES

58 of 69

Accuracy on Test Fold

                              IN          ON
Baseline (majority class):    .32 ± .00   .36 ± .00
Baseline (random):            .49 ± .06   .50 ± .06
Ego RGBD:                     .77 ± .05   .53 ± .10
Ego RGBD + Object Data:       .74 ± .07   .59 ± .08

59 of 69

RGBD + Static Object Data

1. Pre-trained on ‘All Pairs’

60 of 69

RGBD + Static Object Data

2. Then trained on ‘Robot Pairs’
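The two-stage schedule (pre-train on ‘All Pairs’, then train on ‘Robot Pairs’) can be sketched with a tiny logistic-regression model whose weights carry over between stages. This is a toy illustration only: the model, the made-up data, the learning rates, and all names here are assumptions, not the architecture or training setup used in the talk.

```python
import math
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], int]  # (features, binary label)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train(weights: Sequence[float], data: Sequence[Example], epochs: int, lr: float) -> List[float]:
    """Run SGD on logistic loss, starting from the given weights."""
    w = list(weights)
    for _ in range(epochs):
        for x, y in data:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            for i, xi in enumerate(x):
                w[i] += lr * (y - p) * xi
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    return int(sigmoid(sum(wi * xi for wi, xi in zip(w, x))) >= 0.5)


# Made-up data; the last feature is a constant 1.0 acting as a bias term.
all_pairs = [([1.0, 0.0, 1.0], 1), ([0.0, 1.0, 1.0], 0),
             ([1.0, 1.0, 1.0], 1), ([0.0, 0.0, 1.0], 0)]
robot_pairs = [([0.9, 0.1, 1.0], 1), ([0.1, 0.8, 1.0], 0)]

pretrained = train([0.0, 0.0, 0.0], all_pairs, epochs=200, lr=0.5)  # stage 1
finetuned = train(pretrained, robot_pairs, epochs=50, lr=0.1)       # stage 2
```

The only coupling between the stages is the initial weight vector: stage 2 starts from the stage 1 result instead of from zeros.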

63 of 69

Ego Classification: In? NO

Ego + Pretrained Obj: In? YES

64 of 69

Accuracy on Test Fold

                              IN          ON
Baseline (majority class):    .32 ± .00   .36 ± .00
Baseline (random):            .49 ± .06   .50 ± .06
Ego RGBD:                     .77 ± .05   .53 ± .10
Ego RGBD + Object Data:       .74 ± .07   .59 ± .08
Ego RGBD + Pre-trained Obj:   .77 ± .05   .59 ± .06

65 of 69

In summary...

66 of 69

+ object data

69 of 69

Improving Robot Success Detection using Static Object Data

Rosario Scalise, Jesse Thomason, Yonatan Bisk, Siddhartha Srinivasa

Data + Code Repository: https://github.com/thomason-jesse/YCBLanguage