Predictive and Generative Neural Networks for Object Functionality

Ruizhen Hu, Zihao Yan, Jingwen Zhang, Oliver van Kaick, Ariel Shamir, Hao Zhang, Hui Huang

CMPT 895

Aryan Mikaeili

Motivation

  • Predict functionality of 3D objects
  • Generate and synthesize interaction context for 3D objects
    • Based on functionality
    • Like humans can do
  • Do all of this in a data-driven, end-to-end fashion

Introduction|Method|Results|Conclusion

Introduction

  • Three separate networks
    • fSIM-Net: similarity-based functionality prediction
    • iGEN-Net: generates an interaction context given a 3D object and a functionality label
    • iSEG-Net: segments a scene based on its interactions with the central object


Functionality prediction

  • Map scenes and objects to a latent functionality space
    • Use this encoding for classification/retrieval
    • Easy for scenes
      • Functionality of the central object is determined by its context
      • Use a simple encoder
    • Hard for objects
      • An object can have multiple functionalities
      • We want to predict a distribution for each object
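The idea of a shared latent functionality space can be sketched with a toy example. The random-projection "encoder", grid size, and latent dimension below are made-up stand-ins for the learned 3D convolutional encoders in the paper; the sketch only illustrates embedding voxelized shapes into a common space and retrieving by similarity.

```python
import numpy as np

# Hypothetical toy encoder: flatten a voxel grid and apply a fixed random
# linear projection to obtain a latent "functionality" code. The real
# networks use learned 3D convolutions; this only illustrates the idea.
rng = np.random.default_rng(0)

def encode(voxels, proj):
    """Map a binary voxel grid to a unit-norm latent vector."""
    z = voxels.reshape(-1).astype(float) @ proj
    return z / (np.linalg.norm(z) + 1e-9)

def retrieve(query_z, db_z):
    """Return database indices sorted by cosine similarity to the query."""
    sims = db_z @ query_z
    return np.argsort(-sims)

# 8^3 voxel grids, 16-dim latent space (sizes are illustrative only)
proj = rng.normal(size=(8 * 8 * 8, 16))
db = [rng.random((8, 8, 8)) > 0.5 for _ in range(5)]
db_z = np.stack([encode(v, proj) for v in db])

query = db[2]  # querying with a database shape should rank itself first
order = retrieve(encode(query, proj), db_z)
print(order[0])  # → 2
```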

[Figure: functionality space]

Functionality prediction (cont’d)

[Figure: predicted functionality distributions, e.g. sitting vs. pushing]

7 of 27

Fsim-Net

  •  

6

 

Introduction|Method|Results|Conclusion

fSIM-Net (cont’d)

  • The network works in the object-to-scene direction, but there can also be a scene-to-object direction
    • The former can be used for scoring scenes based on a query object
    • The latter can be used for scoring objects based on a query scene
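Once objects and scenes share a latent space, both query directions reduce to the same distance computation. The 2D embeddings below are made up for illustration; in the paper they would come from the two branches of fSIM-Net.

```python
import numpy as np

# Sketch of scoring in both directions within one shared embedding space.
# All vectors here are illustrative placeholders, not learned codes.
def scores(query_z, candidates_z):
    """Negative Euclidean distance: higher = more functionally similar."""
    return -np.linalg.norm(candidates_z - query_z, axis=1)

# Object-to-scene: rank candidate scenes for a query object
object_z = np.array([0.9, 0.1])            # query object code
scene_zs = np.array([[1.0, 0.0],           # scene 0: close to the object
                     [0.0, 1.0]])          # scene 1: far from the object
print(np.argmax(scores(object_z, scene_zs)))  # → 0

# Scene-to-object: rank candidate objects for a query scene, same machinery
scene_z = np.array([0.0, 1.0])
object_zs = np.array([[0.8, 0.2], [0.1, 0.9]])
print(np.argmax(scores(scene_z, object_zs)))  # → 1
```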



iGEN-Net: generating context for isolated objects

[Figure: iGEN-Net overview]

iGEN-Net: generating context for isolated objects (cont’d)

  • Inputs
    • an isolated object
    • a label to determine the context

iGEN-Net: generating context for isolated objects (cont’d)

  • Encode both inputs separately, then concatenate the two codes

iGEN-Net: generating context for isolated objects (cont’d)

  • Use a decoder to generate context

iGEN-Net: generating context for isolated objects (cont’d)

  • Use a spatial transformer to scale and translate the object
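The scale/translate step can be sketched as a minimal nearest-neighbour resampling of the voxel grid. The parameter names (`scale`, `shift`) and values are illustrative; in the paper these parameters are predicted by a learned spatial transformer module.

```python
import numpy as np

# Toy "spatial transformer" for voxels: resample a binary grid under
# v_out(x) = v_in((x - shift) / scale), with nearest-neighbour lookup.
def transform_voxels(vox, scale, shift):
    out = np.zeros_like(vox)
    idx = np.indices(vox.shape).reshape(3, -1).T      # all output coords
    src = np.round((idx - shift) / scale).astype(int) # source coords
    ok = np.all((src >= 0) & (src < np.array(vox.shape)), axis=1)
    out[tuple(idx[ok].T)] = vox[tuple(src[ok].T)]     # copy valid voxels
    return out

vox = np.zeros((8, 8, 8), dtype=int)
vox[0:2, 0:2, 0:2] = 1                      # small cube at the origin
moved = transform_voxels(vox, scale=1.0, shift=np.array([4, 4, 4]))
print(moved[4:6, 4:6, 4:6].sum())           # → 8 (cube translated intact)
```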

iGEN-Net: generating context for isolated objects (cont’d)

  • Generate the final scene
  • Loss
    • cross-entropy between ground-truth and synthesized voxels
    • L2 norm of the scaling/translation values
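The two training terms above can be sketched on made-up arrays: a voxel-wise binary cross-entropy between the ground-truth context and the synthesized occupancy probabilities, plus an L2 penalty on the spatial transformer's scaling/translation parameters. The arrays and the 0.1 weighting factor are illustrative, not the paper's values.

```python
import numpy as np

def voxel_bce(target, pred, eps=1e-7):
    """Mean binary cross-entropy over voxels."""
    pred = np.clip(pred, eps, 1 - eps)
    return -np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred))

def st_l2(params):
    """L2 penalty on scaling/translation parameters."""
    return np.sum(params ** 2)

target = np.array([1.0, 0.0, 1.0, 0.0])   # ground-truth context voxels
pred = np.array([0.9, 0.1, 0.8, 0.2])     # synthesized probabilities
params = np.array([0.1, -0.05, 0.0])      # scale/translation offsets

loss = voxel_bce(target, pred) + 0.1 * st_l2(params)  # 0.1: made-up weight
print(round(float(loss), 3))
```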

iGEN-Net: generating context for isolated objects – results

[Figure: example generated contexts]

iSEG-Net: interaction-based segmentation

  • Input
    • Scene including the central object and surrounding context
    • Functionality label (to disambiguate objects with multiple functionalities)
  • Output
    • Type of interaction with the central object for each context voxel
    • 18 interaction types, including supported, supporting, on-the-side, typing, holding, etc.
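The output above can be viewed as a per-voxel distribution over interaction types, with the argmax giving each context voxel its label. The 4-type list, grid size, and random logits below are made up for illustration (the paper uses 18 types).

```python
import numpy as np

# Per-voxel interaction labeling on a toy 4^3 scene with 4 made-up types.
types = ["supported", "supporting", "on-the-side", "typing"]
logits = np.random.default_rng(1).normal(size=(4, 4, 4, len(types)))

# Softmax over the type axis → one distribution per context voxel
probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
labels = probs.argmax(axis=-1)   # most likely interaction type per voxel

print(labels.shape)  # → (4, 4, 4)
```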



iSEG-Net: interaction-based segmentation – post-processing

  • Segmentation smoothing
    • Remove small connected components
  • Scene refinement
    • Outputs of the networks are voxels
    • High-resolution meshes would be preferable
    • Find the closest mesh in the dataset for each connected component
      • Distance measured as L2 distance in the feature space of a classifier
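The small-component removal step can be sketched as a 6-connected flood fill over a binary voxel segment, dropping components below a threshold. The `min_size` value is a made-up parameter; the paper does not specify its threshold here.

```python
import numpy as np
from collections import deque

def remove_small_components(vox, min_size):
    """Keep only 6-connected components of at least min_size voxels."""
    out = np.zeros_like(vox)
    seen = np.zeros(vox.shape, dtype=bool)
    nbrs = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    for start in zip(*np.nonzero(vox)):
        if seen[start]:
            continue
        comp, queue = [start], deque([start])
        seen[start] = True
        while queue:                          # BFS over occupied neighbours
            x, y, z = queue.popleft()
            for dx, dy, dz in nbrs:
                p = (x + dx, y + dy, z + dz)
                if all(0 <= p[i] < vox.shape[i] for i in range(3)) \
                        and vox[p] and not seen[p]:
                    seen[p] = True
                    comp.append(p)
                    queue.append(p)
        if len(comp) >= min_size:             # keep only large components
            for p in comp:
                out[p] = 1
    return out

vox = np.zeros((8, 8, 8), dtype=int)
vox[0:3, 0:3, 0:3] = 1                        # large component: 27 voxels
vox[6, 6, 6] = 1                              # isolated speck: 1 voxel
cleaned = remove_small_components(vox, min_size=5)
print(cleaned.sum())                          # → 27 (speck removed)
```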


iSEG-Net: interaction-based segmentation – final result

[Figure: final refined scenes]

Results of fSIM-Net for retrieval

[Figure: retrieval results]

Results of fSIM-Net for retrieval – Comparison with prior work

  • Scene-to-object direction
    • Compared with ICON


Results of fSIM-Net for retrieval – Comparison with prior work (cont’d)

  • Scene-to-scene direction
    • Compared with ICON, Siamese, and triplet networks
      • Siamese: feed pairs of data to the network; bring positive pairs close in the latent space and push negative pairs apart
      • Triplet: similar to Siamese, but uses triplets of two positive samples and one negative sample
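The triplet objective just described can be sketched in a few lines: pull the two positives together and push the negative beyond a margin. The margin value and the 2D embeddings are illustrative placeholders.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge on (anchor–positive) vs (anchor–negative) distances."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

anchor = np.array([0.0, 0.0])
positive = np.array([0.1, 0.0])            # functionally similar
negative = np.array([3.0, 0.0])            # functionally different

print(triplet_loss(anchor, positive, negative))  # → 0.0 (already separated)
```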


Results of fSIM-Net for classification

[Figure: classification results]

Results of iGEN-Net

[Figure: iGEN-Net results]

Results of iSEG-Net

[Figure: iSEG-Net results]

Conclusion

  • First attempt at functional analysis of 3D shapes using deep learning
    • Understand -> generate -> refine
    • Comparable to or better than previous hand-crafted methods, with fewer pre-processing requirements
  • Limitations
    • Generated contexts lack complexity and diversity
