1 of 60

History of Neural Radiance Fields

CS 598 LAZ

Albert Zhai, Yilin Yang, Tianhang Cheng

3 of 60

Popularity of Neural Radiance Field (NeRF)

  • Currently one of the hottest topics in computer vision
  • Over 500 arXiv papers in the last 12 months

4 of 60

NeRF: a method for multi-view 3D reconstruction

  • Reconstruct the 3D shape of a scene given multiple images
  • Many different options for 3D shape representation
    • Point clouds, voxels, meshes, etc.
  • A multi-view reconstruction method needs to choose a representation and provide an algorithm for estimating it from images

https://miro.medium.com/v2/resize:fit:1400/0*l233ieenj80ogEOa.png

5 of 60

History (before NeRF)

  • 3D Reconstruction From Multiple Views
    • Early Photography and Photosculpture 1850
    • visual hull
    • space carving
  • Lightfield
    • Plenoptic function
    • Lightfield Rendering
    • Lightfield Camera

6 of 60

3D Reconstruction From Multiple Views —— Early Photography and Photosculpture 1850

Photosculpture, a mechanical 19th century NeRF

The resulting raw sculpture made of wood slices

Image: https://neuralradiancefields.io/history-of-neural-radiance-fields/

7 of 60

3D Reconstruction From Multiple Views —— Early Photography and Photosculpture 1850

8 of 60

3D Reconstruction From Multiple Views —— Triangulation

9 of 60

3D Reconstruction From Multiple Views —— visual hull

Figure source: J. C. Perez-Cortes et al., A System for In-Line 3D Inspection without Hidden Surfaces, Sensors, 2018

Silhouette cone

Silhouette

The intersection is the visual hull.

10 of 60

3D Reconstruction From Multiple Views —— space carving

K. N. Kutulakos and S. M. Seitz, A Theory of Shape by Space Carving, ICCV 1999

Imagine the object is inside a volume….

Algorithm:

  1. Initialize a volume of voxels that contains the object.
  2. Pick a voxel on the surface of the current volume.
  3. Project it into every image plane. If it falls outside the silhouette in any view, remove it.
  4. Repeat steps 2 and 3 until no more voxels are removed.
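The carving loop above can be sketched in a few lines of NumPy. This is a simplified illustration, not the algorithm from the paper: it tests every voxel in one pass instead of visiting surface voxels, it uses the silhouette test described on this slide, and the two orthographic `projectors` are hypothetical stand-ins for calibrated cameras.

```python
import numpy as np

def carve(centers, silhouettes, projectors):
    """Return a boolean mask of the voxels that survive carving.

    centers:     (N, 3) integer voxel coordinates
    silhouettes: list of (H, W) boolean masks, one per view
    projectors:  list of functions mapping (N, 3) points to (N, 2) integer
                 pixel coordinates (row, col); assumed, not from the slides
    """
    keep = np.ones(len(centers), dtype=bool)
    for mask, project in zip(silhouettes, projectors):
        rc = project(centers)
        rows, cols = rc[:, 0], rc[:, 1]
        in_image = (rows >= 0) & (rows < mask.shape[0]) & (cols >= 0) & (cols < mask.shape[1])
        in_silhouette = np.zeros(len(centers), dtype=bool)
        in_silhouette[in_image] = mask[rows[in_image], cols[in_image]]
        keep &= in_silhouette  # remove voxels that fall outside this silhouette
    return keep

# Toy scene: an 8x8x8 grid seen by two orthographic cameras.
n = 8
idx = np.stack(np.meshgrid(np.arange(n), np.arange(n), np.arange(n), indexing="ij"), axis=-1).reshape(-1, 3)
square = np.zeros((n, n), dtype=bool)
square[2:6, 2:6] = True            # each view sees a 4x4 square
along_z = lambda p: p[:, [0, 1]]   # looking down z: image axes are (x, y)
along_x = lambda p: p[:, [1, 2]]   # looking down x: image axes are (y, z)
hull = carve(idx, [square, square], [along_z, along_x])
print(int(hull.sum()))  # 64: the 4x4x4 block consistent with both silhouettes
```

With only two views the result is the intersection of two silhouette cones, so adding views can only shrink the hull further.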

11 of 60

Lightfield

Image: https://medium.com/@dc.aihub/3d-reconstruction-with-stereo-images-part-1-camera-calibration-d86f750a1ade

Maybe we don’t actually need to do the reconstruction…

12 of 60

Lightfield —— The Plenoptic Function

The full 8-dimensional plenoptic function describes light transport, treating light as waves

Proposed by: Gabriel Lippmann

Image: https://neuralradiancefields.io/history-of-neural-radiance-fields/

Cons:

  • 8D: Too much data to capture

  • Hard to track

13 of 60

Lightfield —— The Plenoptic Function

5D Plenoptic Function

Figure by Leonard McMillan

14 of 60

Lightfield —— Two-plane light fields

4D light field representation

M. Levoy and P. Hanrahan. Light field rendering. SIGGRAPH 1996

Nonplanar planes

camera plane

Image plane
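As a rough illustration of the two-plane idea, a ray can be reduced to four numbers by intersecting it with two parallel planes. The sketch below assumes the camera plane is z = 0 and the image plane is z = 1, and names the intersections (u, v) and (s, t); the plane positions and the function name are choices made here, not taken from the slides.

```python
import numpy as np

def ray_to_uvst(origin, direction, z_cam=0.0, z_img=1.0):
    """Intersect a ray with the planes z = z_cam and z = z_img."""
    origin = np.asarray(origin, dtype=float)
    direction = np.asarray(direction, dtype=float)
    if direction[2] == 0:
        raise ValueError("ray is parallel to the two planes")
    # Ray parameter at which the ray reaches each plane.
    k_cam = (z_cam - origin[2]) / direction[2]
    k_img = (z_img - origin[2]) / direction[2]
    u, v = (origin + k_cam * direction)[:2]
    s, t = (origin + k_img * direction)[:2]
    return np.array([u, v, s, t])

coords = ray_to_uvst(origin=[0.0, 0.0, -1.0], direction=[0.5, 0.0, 1.0])
print(coords)  # the 4D light field coordinates of this ray
```

Rendering a novel view then amounts to computing these four coordinates for each pixel's ray and looking up (and interpolating) the stored radiance.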

15 of 60

Lightfield —— Light field Rendering

M. Levoy and P. Hanrahan. Light field rendering. SIGGRAPH 1996

Camera plane

Image plane

16 of 60

Lightfield —— Light field Rendering

Novel view rendering

17 of 60

Lightfield —— Light field Rendering

18 of 60

Lightfield —— Lightfield Camera

Lightfield Camera

Conventional Camera

19 of 60

Lightfield —— Lightfield Camera

20 of 60

Lightfield —— Lightfield Camera

21 of 60

Lightfield —— Lightfield Camera

22 of 60

What is a Neural Radiance Field (NeRF)?

  • A 3D scene representation that models geometry and appearance

23 of 60

What is a Neural Radiance Field (NeRF)?

  • A 3D scene representation that models geometry and appearance
  • 5D continuous function that maps spatial location and viewing direction to density and color
    • Parameterized by a neural network (e.g. MLP)
    • Flexible, can represent arbitrary topology and resolution
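To make the "function parameterized by an MLP" idea concrete, here is a minimal NumPy sketch with random, untrained weights. It is an illustration under stated assumptions rather than the NeRF architecture: the layer sizes are arbitrary, positional encoding is omitted, and the viewing direction is passed as a 3D unit vector instead of two angles. Density depends only on position, while color also sees the viewing direction.

```python
import numpy as np

rng = np.random.default_rng(0)
HIDDEN = 32  # arbitrary width chosen for this sketch
W_pos = rng.normal(scale=0.1, size=(3, HIDDEN))
w_sigma = rng.normal(scale=0.1, size=HIDDEN)
W_rgb = rng.normal(scale=0.1, size=(HIDDEN + 3, 3))

def radiance_field(xyz, view_dir):
    """Map (N, 3) positions and (N, 3) unit directions to density and color."""
    h = np.maximum(xyz @ W_pos, 0.0)                # position features (ReLU)
    sigma = np.maximum(h @ w_sigma, 0.0)            # density >= 0, position only
    rgb_in = np.concatenate([h, view_dir], axis=-1)
    rgb = 1.0 / (1.0 + np.exp(-(rgb_in @ W_rgb)))   # view-dependent color in (0, 1)
    return sigma, rgb

xyz = rng.uniform(-1, 1, size=(5, 3))
view_dir = np.tile([0.0, 0.0, 1.0], (5, 1))
sigma, rgb = radiance_field(xyz, view_dir)
print(sigma.shape, rgb.shape)  # (5,) (5, 3)
```

Because the function accepts any continuous coordinate, it can be queried at whatever resolution the renderer needs.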

24 of 60

Fitting a NeRF to images via differentiable rendering

  • Rendering: the process of generating an image from a scene representation

3D Scene Representation (NeRF)

Rendering

25 of 60

Fitting a NeRF to images via differentiable rendering

  • Rendering: the process of generating an image from a scene representation
  • If the rendering function is differentiable, then we can define a loss function on the rendering outputs and minimize it w.r.t. our scene representation using gradient descent
    • Test-time optimization

3D Scene Representation (NeRF)

Rendering

Rendering Loss Minimization

26 of 60

Fitting a NeRF to images via differentiable rendering

  • Rendering: volume rendering
  • Loss: squared error between rendered and true pixel colors
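A toy version of this fitting loop is sketched below. To stay short it makes strong assumptions that NeRF does not: the per-ray rendering weights are treated as known and fixed, only the colors of four scene cells are optimized, and the gradient is written by hand instead of coming from automatic differentiation. All names and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_rays, n_cells = 64, 4

# Assumed-known rendering weights: how much each cell contributes to each ray.
weights = rng.dirichlet(np.ones(n_cells), size=n_rays)
true_colors = rng.uniform(size=(n_cells, 3))
target = weights @ true_colors                  # "observed" pixel colors

colors = np.full((n_cells, 3), 0.5)             # initial guess for the scene
lr = 1.0
for step in range(500):
    residual = weights @ colors - target        # rendered minus true pixels
    loss = np.sum(residual ** 2) / n_rays       # squared-error rendering loss
    grad = 2.0 * weights.T @ residual / n_rays  # d(loss) / d(colors)
    colors -= lr * grad                         # gradient descent step

print(loss)
```

The same pattern carries over to the full method: render, compare against the captured pixels, and push the gradient back into the scene parameters.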

Mildenhall et al., NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis, ECCV 2020

27 of 60

Volume Rendering

  • Developed in the 1980s to render clouds, fog, flames, etc.
  • Estimate the expected color of a pixel by sampling colors along the ray
  • For ray 𝐫(𝑡) = 𝐨 + 𝑡𝐝:
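For reference, the discretized estimator used in Mildenhall et al. (ECCV 2020) can be written as follows. The symbols σ_i (density at sample i), δ_i (distance between adjacent samples), c_i (color at sample i) and N (number of samples) follow the paper's notation and are not defined elsewhere on this slide.

```latex
\hat{C}(\mathbf{r}) = \sum_{i=1}^{N} T_i \, \alpha_i \, \mathbf{c}_i,
\qquad
\alpha_i = 1 - \exp(-\sigma_i \delta_i),
\qquad
T_i = \prod_{j=1}^{i-1} (1 - \alpha_j)
```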

𝛼𝑖 is the probability that there is a particle in segment i

𝑇𝑖 is the probability that there are no blocking particles
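A minimal NumPy sketch of this estimator for a single ray, assuming per-sample densities `sigmas`, colors `colors`, and segment lengths `deltas` are already available (in NeRF they would come from querying the network); the variable names are chosen here for illustration.

```python
import numpy as np

def render_ray(sigmas, colors, deltas):
    """Composite per-sample colors along one ray, front to back."""
    alphas = 1.0 - np.exp(-sigmas * deltas)                         # chance of hitting a particle in segment i
    trans = np.concatenate([[1.0], np.cumprod(1.0 - alphas)[:-1]])  # chance that nothing blocks segment i
    weights = trans * alphas
    return weights @ colors, weights

# Empty space for two samples, then a dense red sample in front of a green one.
sigmas = np.array([0.0, 0.0, 50.0, 50.0])
colors = np.array([[0.0, 0.0, 1.0],
                   [0.0, 0.0, 1.0],
                   [1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])
deltas = np.full(4, 0.25)
pixel, weights = render_ray(sigmas, colors, deltas)
print(pixel)  # almost pure red: the green sample is occluded
```

The same weights can be applied to the sample distances instead of the colors to obtain an expected depth for the ray.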

29 of 60

Results: Novel View Synthesis and Depth Estimation

Mildenhall et al., NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis, ECCV 2020

31 of 60

Summary: What is NeRF?

  • 5D continuous function that maps spatial location and viewing direction to density and color
    • Represents scene geometry and appearance
  • Can be fit to a set of posed images using differentiable volume rendering
  • High quality novel view synthesis and depth estimation

32 of 60

Was NeRF (ECCV ’20) the first to use these ideas?

Answer: No

Neural Implicit Representations

  • Occupancy Networks (CVPR ’19)
  • DeepSDF (CVPR ’19)

Differentiable Rendering of Implicit Representations

  • Differentiable Volumetric Rendering (CVPR ’20)
  • Scene Representation Networks (NeurIPS ’19)

Mescheder et al., Occupancy networks: Learning 3D reconstruction in function space, CVPR 2019

Park et al., DeepSDF: Learning continuous signed distance functions for shape representation, CVPR 2019

Niemeyer et al., Differentiable volumetric rendering: Learning implicit 3D representations without 3D supervision, CVPR 2020

Sitzmann et al., Scene Representation Networks: Continuous 3D-Structure-Aware Neural Scene Representations, NeurIPS 2019

33 of 60

NeRF is slow

Training: ~1 day

Inference: ~1 minute

34 of 60

NeRF is ambiguous

35 of 60

Improve Speed and Quality

  • Use a different 3D representation
    • SDF, voxel, grid
    • Level of detail
  • Use a pretrained network to reduce ambiguity
    • Diffusion model
    • Normal/depth network

36 of 60

Different Representation —— SDF

NeRF: MV-images -> density, color

NeuS: MV-images -> SDF -> density, color

An SDF gives the signed shortest distance to the surface
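The SDF-to-opacity step can be sketched as follows. The example uses an analytic sphere SDF in place of a learned network and a fixed sharpness `s` (learned in NeuS); the opacity formula follows the discrete form given in the NeuS paper, and everything else is an assumption made for illustration.

```python
import numpy as np

def sphere_sdf(points, radius=1.0):
    """Signed distance to a sphere at the origin: negative inside, positive outside."""
    return np.linalg.norm(points, axis=-1) - radius

def neus_alpha(sdf_vals, s=20.0):
    """Opacity of each segment between consecutive samples, from SDF values."""
    cdf = 1.0 / (1.0 + np.exp(-s * sdf_vals))  # logistic sigmoid of the scaled SDF
    return np.clip((cdf[:-1] - cdf[1:]) / (cdf[:-1] + 1e-10), 0.0, 1.0)

# March a ray from z = -3 toward +z; it enters the unit sphere at distance 2.
t = np.linspace(0.0, 4.0, 41)
points = np.array([0.0, 0.0, -3.0]) + t[:, None] * np.array([0.0, 0.0, 1.0])
alphas = neus_alpha(sphere_sdf(points))
trans = np.concatenate([[1.0], np.cumprod(1.0 - alphas)[:-1]])
weights = trans * alphas
depth = np.sum(weights * 0.5 * (t[:-1] + t[1:]))  # expected hit distance
print(depth)
```

The rendering weights concentrate around the zero crossing of the SDF, which is why the surface can be extracted cleanly afterwards.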

37 of 60

Different Representation —— SDF

38 of 60

Different Representation —— Voxel

NeRF: MV-images -> density, color

DVGO: MV-images -> query feature from voxel -> density, color
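The "query feature from voxel" step is essentially trilinear interpolation of values stored at grid vertices. Below is a small NumPy sketch; the grid layout `(X, Y, Z, C)` and the function name are assumptions made here, and DVGO's subsequent density activation and small color network are left out.

```python
import numpy as np

def trilinear(grid, pts):
    """Interpolate a (X, Y, Z, C) vertex grid at (N, 3) points given in grid units."""
    max_idx = np.array(grid.shape[:3]) - 1
    pts = np.clip(pts, 0.0, max_idx)
    lo = np.minimum(np.floor(pts).astype(int), max_idx - 1)  # lower corner of the cell
    frac = pts - lo                                          # position inside the cell
    out = np.zeros((len(pts), grid.shape[3]))
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                wx = frac[:, 0] if dx else 1.0 - frac[:, 0]
                wy = frac[:, 1] if dy else 1.0 - frac[:, 1]
                wz = frac[:, 2] if dz else 1.0 - frac[:, 2]
                corner = grid[lo[:, 0] + dx, lo[:, 1] + dy, lo[:, 2] + dz]
                out += (wx * wy * wz)[:, None] * corner
    return out

# A grid holding the linear function x + 2y + 3z, which interpolation reproduces exactly.
x, y, z = np.meshgrid(np.arange(4), np.arange(4), np.arange(4), indexing="ij")
grid = (x + 2 * y + 3 * z)[..., None].astype(float)
vals = trilinear(grid, np.array([[0.5, 1.25, 2.75], [3.0, 3.0, 3.0]]))
print(vals.ravel())  # [11.25 18.  ]
```

A lookup like this replaces most of the MLP evaluation per sample, which is where the speed-up comes from.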

40 of 60

Different Representation —— Grid

NeRF: MV-images -> density, color

TensoRF: MV-images -> query feature from grid -> density, color
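The grid in TensoRF is stored in factorized form rather than densely. The sketch below illustrates only the CP-style factorization (a sum of outer products of per-axis vectors) with integer indices; the paper also describes a vector-matrix variant and interpolates between entries, and the rank and resolution here are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
R, n = 4, 32  # rank and grid resolution, chosen for illustration
vx, vy, vz = (rng.normal(size=(R, n)) for _ in range(3))

def query(i, j, k):
    """Value of the factorized grid at integer index (i, j, k)."""
    return np.sum(vx[:, i] * vy[:, j] * vz[:, k])

# Materialize the dense grid only to check the factorized lookup.
dense = np.einsum("ri,rj,rk->ijk", vx, vy, vz)
print(np.isclose(query(3, 7, 11), dense[3, 7, 11]))  # True
print(3 * R * n, "parameters instead of", n ** 3)
```

The factorized form needs 384 numbers here instead of 32768, which is the memory saving that motivates the representation.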

41 of 60

Different Representation —— Grid

NeRF: coordinate -> density, color

TensoRF: MV-images -> query feature from grid -> density, color

42 of 60

Different Representation —— LoD

NeRF: MV-images -> density, color

Neuralangelo: MV-images -> query feature from different level -> density, color
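Querying features "from different levels" can be illustrated with a stack of grids at increasing resolution whose interpolated features are concatenated. The sketch below is a 1D dense stand-in: Neuralangelo itself uses hash-encoded 3D grids, and the `active_levels` switch is a simplified take on enabling finer levels progressively during training.

```python
import numpy as np

def lerp_lookup(table, x):
    """Linearly interpolate a (R, C) feature table at x in [0, 1]."""
    pos = x * (len(table) - 1)
    lo = min(int(np.floor(pos)), len(table) - 2)
    w = pos - lo
    return (1.0 - w) * table[lo] + w * table[lo + 1]

def multires_features(tables, x, active_levels):
    """Concatenate features from all levels, zeroing levels not yet enabled."""
    feats = [lerp_lookup(tb, x) if lvl < active_levels else np.zeros(tb.shape[1])
             for lvl, tb in enumerate(tables)]
    return np.concatenate(feats)

rng = np.random.default_rng(0)
resolutions = [4, 8, 16, 32]                              # coarse to fine
tables = [rng.normal(size=(r, 2)) for r in resolutions]   # 2 features per level
coarse = multires_features(tables, 0.3, active_levels=2)
full = multires_features(tables, 0.3, active_levels=4)
print(coarse.shape, full.shape)  # (8,) (8,)
```

Coarse levels capture the overall shape first; finer levels add detail once they are switched on.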

43 of 60

Different Representation —— LoD

44 of 60

Different Representation —— LoD

45 of 60

Different Representation —— LoD

46 of 60

Pretrained Network —— Depth / Normal

NeRF: MV-images -> density, color

MonoSDF: MV-images -> density, color, depth, normal
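Monocular depth networks predict depth only up to an unknown scale and shift, so a depth cue has to be aligned before it is compared with rendered depth. The sketch below solves for that scale `w` and shift `q` by least squares, in the spirit of the depth loss described in MonoSDF; the function name and the toy values are assumptions.

```python
import numpy as np

def aligned_depth_loss(rendered, mono):
    """Squared error after the best scale/shift alignment of rendered to monocular depth."""
    A = np.stack([rendered, np.ones_like(rendered)], axis=1)
    (w, q), *_ = np.linalg.lstsq(A, mono, rcond=None)  # minimize ||w * rendered + q - mono||
    loss = np.mean((w * rendered + q - mono) ** 2)
    return loss, w, q

rendered = np.array([1.0, 2.0, 3.0, 4.0])
mono = 0.5 * rendered + 2.0        # same geometry, different scale and shift
loss, w, q = aligned_depth_loss(rendered, mono)
print(loss, w, q)
```

A cue that agrees with the rendered depth up to scale and shift incurs essentially no penalty, while a cue with a different shape does.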

https://niujinshuchong.github.io/monosdf/

47 of 60

Pretrained Network —— Diffusion model

Single Image + relative pose (R, T) -> Novel view Image

48 of 60

Pretrained Network —— Diffusion model

49 of 60

Expanding the applications of vanilla NeRF

  • Time-variant scenes
  • Semantics
  • Anime Head
  • BSDF texture
  • Physical Property
  • Robot Action
  • Hide message

50 of 60

New application —— Dynamic/deformable NeRF

NeRF: MV-images -> density, color

D-NeRF: MV-video + time -> displacement -> density, color

Learn a time-dependent displacement field
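The displacement idea can be sketched with hand-written functions standing in for the two learned networks: one maps a point observed at time t back into a canonical frame, and the other evaluates a static canonical scene there. The specific displacement and the spherical canonical scene below are made up for illustration.

```python
import numpy as np

def displacement(x, t):
    """Hypothetical deformation: offset that maps a point at time t to the canonical frame."""
    return np.array([0.5 * t, 0.0, 0.0]) * np.ones_like(x)

def canonical_density(x):
    """Static canonical scene: a solid unit sphere at the origin."""
    return np.where(np.linalg.norm(x, axis=-1) < 1.0, 10.0, 0.0)

def dynamic_density(x, t):
    x = np.asarray(x, dtype=float)
    return canonical_density(x + displacement(x, t))

print(dynamic_density([[0.0, 0.0, 0.0]], t=0.0))   # inside the sphere at t = 0
print(dynamic_density([[0.0, 0.0, 0.0]], t=4.0))   # the sphere has moved away by t = 4
print(dynamic_density([[-1.0, 0.0, 0.0]], t=2.0))  # the sphere is centered here at t = 2
```

Only the displacement depends on time, so a single canonical scene is shared across all frames.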

Pumarola et al., D-NeRF: Neural Radiance Fields for Dynamic Scenes, CVPR 2021, https://arxiv.org/pdf/2011.13961.pdf

51 of 60

New application —— Dynamic/deformable NeRF

Learn a time-dependent displacement field

Pumarola et al., D-NeRF: Neural Radiance Fields for Dynamic Scenes, CVPR 2021, https://arxiv.org/pdf/2011.13961.pdf

52 of 60

New application —— Semantics

NeRF: MV-images -> RGBA

Semantic-NeRF: MV-semantics -> semantics vector -> semantics

Zhi, S., Laidlow, T., Leutenegger, S., & Davison, A. J. (2021). In-place scene labelling and understanding with implicit scene representation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 15838-15847).

53 of 60

New application —— Anime Head

NeRF: MV-images -> RGBA

PAniC-3D: Single image -> anime head model

Chen, S., Zhang, K., Shi, Y., Wang, H., Zhu, Y., Song, G., ... & Zwicker, M. (2023). PAniC-3D: Stylized Single-view 3D Reconstruction from Portraits of Anime Characters. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 21068-21077).

54 of 60

New application —— complex texture

NeRF: MV-images -> RGBA

InvRender: MV-images -> diffuse, roughness, metallic

Zhang, Y., Sun, J., He, X., Fu, H., Jia, R., & Zhou, X. (2022). Modeling indirect illumination for inverse rendering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 18643-18652).

55 of 60

New application —— Edit with Language

CLIP-NeRF: Text + MV-images -> edited NeRF

Wang, C., Chai, M., He, M., Chen, D., & Liao, J. (2022). CLIP-NeRF: Text-and-image driven manipulation of neural radiance fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 3835-3844), https://arxiv.org/abs/2112.05139

56 of 60

New application —— Physical Property

NeRF: MV-images -> RGBA

PAC-NeRF: MV-images -> Elasticity, hardness, viscosity, etc.

Li, X., Qiao, Y. L., Chen, P. Y., Jatavallabhula, K. M., Lin, M., Jiang, C., & Gan, C. (2023). PAC-NeRF: Physics augmented continuum neural radiance fields for geometry-agnostic system identification. arXiv preprint arXiv:2303.05512.

57 of 60

New application —— Robot Action

Multiview-Video -> state feature -> robot action

Li, Y., Li, S., Sitzmann, V., Agrawal, P., & Torralba, A. (2022, January). 3d neural scene representations for visuomotor control. In Conference on Robot Learning (pp. 112-123). PMLR.

58 of 60

New application —— Hide message

CopyRNeRF: Multiview-Image + message -> NeRF

Luo, Z., Guo, Q., Cheung, K. C., See, S., & Wan, R. (2023). CopyRNeRF: Protecting the CopyRight of Neural Radiance Fields. arXiv preprint arXiv:2307.11526.

59 of 60

Discussion: will NeRF stand the test of time?

Advantages:

  • Highest quality 3D geometry and novel view synthesis to date
  • Leverages GPU compute
  • Progress has been rapid so far

Disadvantages:

  • Training is slow
  • Requires dense capture of scene
  • Not supported by current graphics pipelines

60 of 60

References

  1. K. N. Kutulakos and S. M. Seitz, A Theory of Shape by Space Carving, ICCV 1999, https://www.cs.toronto.edu/~kyros/pubs/00.ijcv.carve.pdf
  2. A. Laurentini, The visual hull concept for silhouette-based image understanding, IEEE TPAMI 1994, https://ieeexplore.ieee.org/document/273735
  3. M. Levoy and P. Hanrahan, Light field rendering, SIGGRAPH 1996, https://graphics.stanford.edu/papers/light/
  4. Mescheder et al., Occupancy networks: Learning 3D reconstruction in function space, CVPR 2019, https://arxiv.org/pdf/1812.03828.pdf
  5. Park et al., DeepSDF: Learning continuous signed distance functions for shape representation, CVPR 2019, https://arxiv.org/pdf/1901.05103.pdf
  6. Niemeyer et al., Differentiable volumetric rendering: Learning implicit 3D representations without 3D supervision, CVPR 2020, https://arxiv.org/pdf/1912.07372.pdf
  7. Sitzmann et al., Scene Representation Networks: Continuous 3D-Structure-Aware Neural Scene Representations, NeurIPS 2019, https://arxiv.org/pdf/1906.01618.pdf
  8. Mildenhall et al., NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis, ECCV 2020, https://arxiv.org/pdf/2003.08934.pdf
  9. Wang et al., NeuS: Learning Neural Implicit Surfaces by Volume Rendering for Multi-view Reconstruction, NeurIPS 2021, https://arxiv.org/pdf/2106.10689.pdf
  10. Sun et al., Direct Voxel Grid Optimization: Super-fast Convergence for Radiance Fields Reconstruction, CVPR 2022, https://arxiv.org/pdf/2111.11215.pdf
  11. Chen et al., TensoRF: Tensorial Radiance Fields, ECCV 2022, https://arxiv.org/pdf/2203.09517.pdf
  12. Li et al., Neuralangelo: High-Fidelity Neural Surface Reconstruction, CVPR 2023, https://arxiv.org/pdf/2306.03092.pdf
  13. Pumarola et al., D-NeRF: Neural Radiance Fields for Dynamic Scenes, CVPR 2021, https://arxiv.org/pdf/2011.13961.pdf
  14. Li et al., PAC-NeRF: Physics Augmented Continuum Neural Radiance Fields for Geometry-Agnostic System Identification, ICLR 2023, https://arxiv.org/pdf/2303.05512.pdf
  15. Luo et al., CopyRNeRF: Protecting the CopyRight of Neural Radiance Fields, ICCV 2023, https://arxiv.org/pdf/2307.11526.pdf
  16. Zhi et al., In-Place Scene Labelling and Understanding with Implicit Scene Representation, ICCV 2021, https://arxiv.org/pdf/2103.15875.pdf
  17. Yu et al., MonoSDF: Exploring Monocular Geometric Cues for Neural Implicit Surface Reconstruction, NeurIPS 2022, https://niujinshuchong.github.io/monosdf/
  18. Liu et al., Zero-1-to-3: Zero-shot One Image to 3D Object, arXiv preprint arXiv:2303.11328, 2023, https://arxiv.org/abs/2303.11328
  19. Chen et al., PAniC-3D: Stylized Single-view 3D Reconstruction from Portraits of Anime Characters, CVPR 2023, https://arxiv.org/pdf/2303.14587.pdf
  20. Zhang et al., Modeling Indirect Illumination for Inverse Rendering, CVPR 2022, https://openaccess.thecvf.com/content/CVPR2022/papers/Zhang_Modeling_Indirect_Illumination_for_Inverse_Rendering_CVPR_2022_paper.pdf
  21. Wang et al., CLIP-NeRF: Text-and-Image Driven Manipulation of Neural Radiance Fields, CVPR 2022, https://arxiv.org/abs/2112.05139
  22. Li et al., 3D Neural Scene Representations for Visuomotor Control, Conference on Robot Learning, PMLR, 2022