1 of 51

3D-R2N2

Christopher B. Choy, Danfei Xu*, JunYoung Gwak*, Kevin Chen, Silvio Savarese

A unified approach for single and multi-view 3D object reconstruction

Computational Vision & Geometry Lab

2 of 51

3D Reconstruction

Navigation
Robot interaction, manipulation
3D object prototyping
3D printing


3 of 51

3D Reconstruction

Depth-based methods [Eigen et al., Saxena et al., etc.]
Model-based methods [Kar et al., Aubry et al., Choy et al., etc.]
Structure from Motion (SfM) [Haming et al., Fuentes-Pacheco et al.]
Multi-view stereo (MVS) [Seitz et al., Anwar et al., etc.]


4 of 51

Reconstruction

Lambertian and non-uniform albedo

Non-reflective
Rich in non-homogeneous textures

Dense viewpoints (small baseline)


However, these reconstruction methods work only when several assumptions hold.

First, the viewpoint change between images should not be too drastic. For instance, if the viewpoint change exceeds a certain threshold, the reconstruction fails.

Second, the object's surface has to be Lambertian, meaning non-reflective, with non-uniform albedo. In practice, shiny objects with no texture, like this car, are difficult to reconstruct.

Rapid and automatic 3D object prototyping has become a game-changing innovation in many applications related to e-commerce, visualization, and architecture, to name a few. This trend has been boosted now that 3D printing is a democratized technology and 3D acquisition methods are accurate and efficient~\cite{choi2016large}.

Moreover, the trend is also coupled with the diffusion of large-scale repositories of 3D object models such as ShapeNet~\cite{shapenet}.

Script:

In this project, we propose a model called the 3D recurrent reconstruction neural network, or 3D-R2N2. Our network unifies both single- and multi-view reconstruction from an arbitrary number of input RGB images, which can be useful for a variety of applications including robotics, visualization, and detection. The output of the network is a voxel grid of probabilities which can be thresholded to produce a binary occupancy grid.
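As a rough illustration of the thresholding step mentioned in the script, the sketch below turns a grid of occupancy probabilities into a binary occupancy grid with NumPy. The 0.5 threshold, the tiny 2x2x2 grid, and the function name `to_occupancy` are assumptions made for illustration; they are not values taken from the talk.

```python
import numpy as np

def to_occupancy(prob_grid, threshold=0.5):
    """Return a boolean grid: True where the probability exceeds the threshold."""
    prob_grid = np.asarray(prob_grid, dtype=float)
    return prob_grid > threshold

# Hypothetical 2x2x2 grid of predicted occupancy probabilities.
probs = np.array([[[0.9, 0.1], [0.4, 0.6]],
                  [[0.2, 0.8], [0.5, 0.7]]])

occupancy = to_occupancy(probs)      # assumed threshold of 0.5
num_occupied = int(occupancy.sum())  # count of voxels marked as occupied
print(occupancy.shape, num_occupied)
```

Raising the threshold keeps only the voxels the network is most confident about; lowering it fills in more of the shape at the cost of more false positives.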

5 of 51

Reconstruction

without assumptions


6 of 51

Reconstruction

Shape Prior / Data-driven method
Single View Reconstruction (Kar et al., Aubry et al., etc)

Multi View Reconstruction (Bao et al.)

General lighting condition

without assumptions

Localize parts
Boundaries

Dense viewpoints


7 of 51

3D-Recurrent Reconstruction Neural Network

Data Driven Reconstruction

[Saxena et al., Hoiem et al., Vicente et al., Kar et al.]

Recurrent Neural Network

Sequence of images
Probabilistic voxel occupancy map

3D-Convolutional LSTM

Locality
Regularize connection
Selective Update


8 of 51

Recurrent Neural Network

[Christopher Olah] Understanding LSTM Networks, http://colah.github.io/posts/2015-08-Understanding-LSTMs/


9 of 51

Long Short Term Memory

[Christopher Olah] Understanding LSTM Networks, http://colah.github.io/posts/2015-08-Understanding-LSTMs/


10 of 51


11 of 51

3D Convolutional LSTM


12 of 51


13 of 51


14 of 51


15 of 51


16 of 51

3D Convolutional LSTM

Left front top


17 of 51

3D Convolutional LSTM

Locality
Regularize connection
Selective Update


18 of 51


19 of 51

3D Convolutional LSTM

Front/Side

Back


20 of 51


21 of 51


22 of 51

Training

ShapeNet

50k CAD models
Render from arbitrary views
Random number of images w/ random order
Random background, translation

Voxel-wise cross entropy loss
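To make the loss on this slide concrete, here is a minimal NumPy sketch of a voxel-wise binary cross-entropy between predicted occupancy probabilities and a ground-truth occupancy grid. The function name `voxel_cross_entropy`, the clipping constant `eps`, and the choice to average rather than sum over voxels are assumptions for illustration, not details confirmed by the slides.

```python
import numpy as np

def voxel_cross_entropy(pred, target, eps=1e-7):
    """Mean binary cross-entropy over all voxels.

    pred:   predicted occupancy probabilities in [0, 1]
    target: ground-truth occupancy (0 or 1), same shape as pred
    """
    # Clip to avoid log(0) for fully confident predictions.
    pred = np.clip(np.asarray(pred, dtype=float), eps, 1.0 - eps)
    target = np.asarray(target, dtype=float)
    per_voxel = -(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred))
    return float(per_voxel.mean())

# Hypothetical 4x4x4 ground truth: a 2x2x2 occupied cube in the middle.
target = np.zeros((4, 4, 4))
target[1:3, 1:3, 1:3] = 1.0

good = np.where(target == 1.0, 0.9, 0.1)  # confident and correct everywhere
bad = np.full(target.shape, 0.5)          # uninformative prediction

loss_good = voxel_cross_entropy(good, target)
loss_bad = voxel_cross_entropy(bad, target)
print(loss_good < loss_bad)
```

Because the loss is computed per voxel, every cell of the output grid receives its own gradient signal, regardless of how many input images were fed to the network.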


23 of 51

Experiment I: Multi View Stereo vs Ours

Experiment II: eBay Image Reconstruction

Experiment III: PASCAL 3D Single-View Reconstruction

Experiment IV: ShapeNet Multi-View Reconstruction


24 of 51

Experiments

Experiment I: Multi View Stereo vs Ours


25 of 51

SfM Limitations

Texture level
Number of views


26 of 51


27 of 51


28 of 51

[Figure: MVS vs. Ours comparison. Labels: Resolution; Input, Output; values 20, 30, 40]


29 of 51

Experiment II: eBay Image Reconstruction


30 of 51


31 of 51


32 of 51


33 of 51

Experiment III: PASCAL 3D Single-View Reconstruction


34 of 51

Training

PASCAL 3D+

Augmented PASCAL images with 3D CAD models
3D Intersection over Union
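For reference, 3D Intersection over Union on voxel grids can be sketched as below: the predicted probabilities are thresholded, then the number of voxels occupied in both grids is divided by the number occupied in either. The function name `voxel_iou`, the 0.5 threshold, and the convention of returning 1.0 when both grids are empty are assumptions for illustration.

```python
import numpy as np

def voxel_iou(pred_prob, target, threshold=0.5):
    """Intersection over Union between a thresholded prediction and ground truth."""
    pred = np.asarray(pred_prob, dtype=float) > threshold
    gt = np.asarray(target).astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both grids empty: treated as a perfect match (assumed convention)
    intersection = np.logical_and(pred, gt).sum()
    return float(intersection / union)

# Hypothetical 4x4x4 grids: two 2x2x2 cubes offset by one voxel along the last axis.
gt = np.zeros((4, 4, 4))
gt[0:2, 0:2, 0:2] = 1.0
pred = np.zeros((4, 4, 4))
pred[0:2, 0:2, 1:3] = 1.0

iou = voxel_iou(pred, gt)  # intersection = 4 voxels, union = 12 voxels
print(round(iou, 3))
```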


35 of 51


36 of 51


37 of 51


38 of 51


39 of 51

Experiment IV: ShapeNet Multi-View Reconstruction


40 of 51


41 of 51


42 of 51


43 of 51

Thank you


44 of 51

More details: http://arxiv.org/abs/1604.00449

Code: https://github.com/chrischoy/3D-R2N2


45 of 51


46 of 51

3D Convolutional LSTM


47 of 51

Results

Multi-view stereo tends to fail when

Viewpoints are sparsely positioned
Objects are textureless
Too few views are available

Fig. 8 from paper


48 of 51


49 of 51


50 of 51


51 of 51

3D-Convolutional LSTM

4D tensor
No output gate
3D Convolution
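One way to read the three bullets above is as an LSTM update with the output gate removed and the hidden-state term replaced by a 3D convolution. The equations below are a sketch under that reading, not a transcription of the slide: the symbols are assumptions, with \mathcal{T}(x_t) standing for the encoded feature of the t-th input image, * for 3D convolution over the hidden-state grid, \odot for element-wise multiplication, and h_t, s_t for the hidden and memory states stored as 4D tensors (three spatial axes plus one feature axis).

```latex
\begin{aligned}
f_t &= \sigma\bigl(W_f \mathcal{T}(x_t) + U_f * h_{t-1} + b_f\bigr) \\
i_t &= \sigma\bigl(W_i \mathcal{T}(x_t) + U_i * h_{t-1} + b_i\bigr) \\
s_t &= f_t \odot s_{t-1} + i_t \odot \tanh\bigl(W_s \mathcal{T}(x_t) + U_s * h_{t-1} + b_s\bigr) \\
h_t &= \tanh(s_t)
\end{aligned}
```

With no output gate, the hidden state is simply the squashed memory state, which removes one set of gate parameters from the unit.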
