Predictive and Generative Neural Networks for Object Functionality

Ruizhen Hu, Zihao Yan, Jingwen Zhang, Oliver van Kaick, Ariel Shamir, Hao Zhang, Hui Huang

CMPT 895

Aryan Mikaeili

Motivation

  • Predict functionality of 3D objects
  • Generate and synthesize interaction context for 3D objects
    • Based on functionality
    • Like humans can do
  • Do all of this in a data-driven, end-to-end fashion

Introduction|Method|Results|Conclusion

Introduction

  • Three separate networks
    • fSIM-Net: similarity-based functionality prediction
    • iGEN-Net: generates an interaction context given a 3D object and a functionality label
    • iSEG-Net: segments a scene based on its interactions with the central object


Functionality prediction

  • Map scenes and objects to a latent functionality space
    • Use this encoding for classification/retrieval
    • Easy for scenes
      • Functionality of the central object is determined by its context
      • Use a simple encoder
    • Hard for objects
      • An object can have multiple functionalities
      • We want to predict a distribution for each object
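The idea of a shared latent functionality space can be sketched with a toy example. The random-projection "encoder", grid size, and latent dimension below are made-up stand-ins for the learned 3D convolutional encoders in the paper; the sketch only illustrates embedding voxelized shapes into a common space and retrieving by similarity.

```python
import numpy as np

# Hypothetical toy encoder: flatten a voxel grid and apply a fixed random
# linear projection to obtain a latent "functionality" code. The real
# networks use learned 3D convolutions; this only illustrates the idea.
rng = np.random.default_rng(0)

def encode(voxels, proj):
    """Map a binary voxel grid to a unit-norm latent vector."""
    z = voxels.reshape(-1).astype(float) @ proj
    return z / (np.linalg.norm(z) + 1e-9)

def retrieve(query_z, db_z):
    """Return database indices sorted by cosine similarity to the query."""
    sims = db_z @ query_z
    return np.argsort(-sims)

# 8^3 voxel grids, 16-dim latent space (sizes are illustrative only)
proj = rng.normal(size=(8 * 8 * 8, 16))
db = [rng.random((8, 8, 8)) > 0.5 for _ in range(5)]
db_z = np.stack([encode(v, proj) for v in db])

query = db[2]  # querying with a database shape should rank itself first
order = retrieve(encode(query, proj), db_z)
print(order[0])  # → 2
```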

[Figure: functionality space]

Functionality prediction (cont’d)

[Figure: predicted functionality distributions, e.g. sitting vs. pushing]

7 of 27

Fsim-Net

  •  

6

 

Introduction|Method|Results|Conclusion

fSIM-Net (cont’d)

  • The network works in the object-to-scene direction, but there can also be a scene-to-object direction
    • The former can be used for scoring scenes based on a query object
    • The latter can be used for scoring objects based on a query scene
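Once objects and scenes share a latent space, both query directions reduce to the same distance computation. The 2D embeddings below are made up for illustration; in the paper they would come from the two branches of fSIM-Net.

```python
import numpy as np

# Sketch of scoring in both directions within one shared embedding space.
# All vectors here are illustrative placeholders, not learned codes.
def scores(query_z, candidates_z):
    """Negative Euclidean distance: higher = more functionally similar."""
    return -np.linalg.norm(candidates_z - query_z, axis=1)

# Object-to-scene: rank candidate scenes for a query object
object_z = np.array([0.9, 0.1])            # query object code
scene_zs = np.array([[1.0, 0.0],           # scene 0: close to the object
                     [0.0, 1.0]])          # scene 1: far from the object
print(np.argmax(scores(object_z, scene_zs)))  # → 0

# Scene-to-object: rank candidate objects for a query scene, same machinery
scene_z = np.array([0.0, 1.0])
object_zs = np.array([[0.8, 0.2], [0.1, 0.9]])
print(np.argmax(scores(scene_z, object_zs)))  # → 1
```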



iGEN-Net: generating context for isolated objects

[Figure: iGEN-Net overview]

iGEN-Net: generating context for isolated objects (cont’d)

  • Inputs
    • an isolated object
    • a label to determine the context

iGEN-Net: generating context for isolated objects (cont’d)

  • Encode both inputs separately, then concatenate the two codes

iGEN-Net: generating context for isolated objects (cont’d)

  • Use a decoder to generate context

iGEN-Net: generating context for isolated objects (cont’d)

  • Use a spatial transformer to scale and translate the object
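The scale/translate step can be sketched as a minimal nearest-neighbour resampling of the voxel grid. The parameter names (`scale`, `shift`) and values are illustrative; in the paper these parameters are predicted by a learned spatial transformer module.

```python
import numpy as np

# Toy "spatial transformer" for voxels: resample a binary grid under
# v_out(x) = v_in((x - shift) / scale), with nearest-neighbour lookup.
def transform_voxels(vox, scale, shift):
    out = np.zeros_like(vox)
    idx = np.indices(vox.shape).reshape(3, -1).T      # all output coords
    src = np.round((idx - shift) / scale).astype(int) # source coords
    ok = np.all((src >= 0) & (src < np.array(vox.shape)), axis=1)
    out[tuple(idx[ok].T)] = vox[tuple(src[ok].T)]     # copy valid voxels
    return out

vox = np.zeros((8, 8, 8), dtype=int)
vox[0:2, 0:2, 0:2] = 1                      # small cube at the origin
moved = transform_voxels(vox, scale=1.0, shift=np.array([4, 4, 4]))
print(moved[4:6, 4:6, 4:6].sum())           # → 8 (cube translated intact)
```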

iGEN-Net: generating context for isolated objects (cont’d)

  • Generate the final scene
  • Loss
    • cross-entropy between ground-truth and synthesized voxels
    • L2 norm of the scaling/translation values
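The two training terms above can be sketched on made-up arrays: a voxel-wise binary cross-entropy between the ground-truth context and the synthesized occupancy probabilities, plus an L2 penalty on the spatial transformer's scaling/translation parameters. The arrays and the 0.1 weighting factor are illustrative, not the paper's values.

```python
import numpy as np

def voxel_bce(target, pred, eps=1e-7):
    """Mean binary cross-entropy over voxels."""
    pred = np.clip(pred, eps, 1 - eps)
    return -np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred))

def st_l2(params):
    """L2 penalty on scaling/translation parameters."""
    return np.sum(params ** 2)

target = np.array([1.0, 0.0, 1.0, 0.0])   # ground-truth context voxels
pred = np.array([0.9, 0.1, 0.8, 0.2])     # synthesized probabilities
params = np.array([0.1, -0.05, 0.0])      # scale/translation offsets

loss = voxel_bce(target, pred) + 0.1 * st_l2(params)  # 0.1: made-up weight
print(round(float(loss), 3))
```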

iGEN-Net: generating context for isolated objects – results

[Figure: example generated contexts]

iSEG-Net: interaction-based segmentation

  • Input
    • Scene including the central object and surrounding context
    • Functionality label (to disambiguate objects with multiple functionalities)
  • Output
    • Type of interaction with the central object for each context voxel
    • 18 interaction types, including supported, supporting, on-the-side, typing, holding, etc.
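The output above can be viewed as a per-voxel distribution over interaction types, with the argmax giving each context voxel its label. The 4-type list, grid size, and random logits below are made up for illustration (the paper uses 18 types).

```python
import numpy as np

# Per-voxel interaction labeling on a toy 4^3 scene with 4 made-up types.
types = ["supported", "supporting", "on-the-side", "typing"]
logits = np.random.default_rng(1).normal(size=(4, 4, 4, len(types)))

# Softmax over the type axis → one distribution per context voxel
probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
labels = probs.argmax(axis=-1)   # most likely interaction type per voxel

print(labels.shape)  # → (4, 4, 4)
```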



iSEG-Net: interaction-based segmentation – post-processing

  • Segmentation smoothing
    • Remove small connected components
  • Scene refinement
    • Outputs of the networks are voxels
    • High-resolution meshes would be preferable
    • Find the closest mesh in the dataset for each connected component
      • Distance measured as L2 distance in the feature space of a classifier
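The small-component removal step can be sketched as a 6-connected flood fill over a binary voxel segment, dropping components below a threshold. The `min_size` value is a made-up parameter; the paper does not specify its threshold here.

```python
import numpy as np
from collections import deque

def remove_small_components(vox, min_size):
    """Keep only 6-connected components of at least min_size voxels."""
    out = np.zeros_like(vox)
    seen = np.zeros(vox.shape, dtype=bool)
    nbrs = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    for start in zip(*np.nonzero(vox)):
        if seen[start]:
            continue
        comp, queue = [start], deque([start])
        seen[start] = True
        while queue:                          # BFS over occupied neighbours
            x, y, z = queue.popleft()
            for dx, dy, dz in nbrs:
                p = (x + dx, y + dy, z + dz)
                if all(0 <= p[i] < vox.shape[i] for i in range(3)) \
                        and vox[p] and not seen[p]:
                    seen[p] = True
                    comp.append(p)
                    queue.append(p)
        if len(comp) >= min_size:             # keep only large components
            for p in comp:
                out[p] = 1
    return out

vox = np.zeros((8, 8, 8), dtype=int)
vox[0:3, 0:3, 0:3] = 1                        # large component: 27 voxels
vox[6, 6, 6] = 1                              # isolated speck: 1 voxel
cleaned = remove_small_components(vox, min_size=5)
print(cleaned.sum())                          # → 27 (speck removed)
```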


iSEG-Net: interaction-based segmentation – final result

[Figure: final refined scenes]

Results of fSIM-Net for retrieval

[Figure: retrieval results]

Results of fSIM-Net for retrieval – Comparison with prior work

  • Scene-to-object direction
    • Compared with ICON


Results of fSIM-Net for retrieval – Comparison with prior work (cont’d)

  • Scene-to-scene direction
    • Compared with ICON, Siamese, and triplet networks
      • Siamese: feed pairs of data to the network; bring positive pairs close in the latent space and push negative pairs apart
      • Triplet: similar to Siamese, but uses triplets of two positive samples and one negative sample
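The triplet objective just described can be sketched in a few lines: pull the two positives together and push the negative beyond a margin. The margin value and the 2D embeddings are illustrative placeholders.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge on (anchor–positive) vs (anchor–negative) distances."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

anchor = np.array([0.0, 0.0])
positive = np.array([0.1, 0.0])            # functionally similar
negative = np.array([3.0, 0.0])            # functionally different

print(triplet_loss(anchor, positive, negative))  # → 0.0 (already separated)
```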


Results of fSIM-Net for classification

[Figure: classification results]

Results of iGEN-Net

[Figure: iGEN-Net results]

Results of iSEG-Net

[Figure: iSEG-Net results]

Conclusion

  • First attempt at functional analysis of 3D shapes using deep learning
    • Understand -> generate -> refine
    • Comparable to or better than previous hand-crafted methods, with fewer pre-processing requirements
  • Limitations
    • Generated contexts lack complexity and diversity
