1 of 60

History of Neural Radiance Fields

CS 598 LAZ

Albert Zhai, Yilin Yang, Tianhang Cheng

2 of 60

Popularity of Neural Radiance Field (NeRF)

3 of 60

Popularity of Neural Radiance Field (NeRF)

Currently one of the hottest topics in computer vision
Over 500 arXiv papers in the last 12 months:

4 of 60

NeRF: a method for multi-view 3D reconstruction

Reconstruct the 3D shape of a scene given multiple images
Many different options for 3D shape representation

Point clouds, voxels, meshes, etc.

A multi-view reconstruction method needs to choose a representation and provide an algorithm for estimating it from images

https://miro.medium.com/v2/resize:fit:1400/0*l233ieenj80ogEOa.png

5 of 60

History (before NeRF)

3D Reconstruction From Multiple Views

Early Photography and Photosculpture 1850
visual hull
space carving

Lightfield

Plenoptic function
Lightfield Rendering
Lightfield Camera

6 of 60

3D Reconstruction From Multiple Views —— Early Photography and Photosculpture 1850

Photosculpture, a mechanical 19th century NeRF

The resulting raw sculpture made of wood slices

Image: https://neuralradiancefields.io/history-of-neural-radiance-fields/

7 of 60

https://www.youtube.com/watch?v=jS_rcwG9mxU

3D Reconstruction From Multiple Views —— Early Photography and Photosculpture 1850

8 of 60

3D Reconstruction From Multiple Views —— Triangulation

Figure source: https://towardsai.net/p/machine-learning/introduction-14

9 of 60

3D Reconstruction From Multiple Views —— visual hull

Figure source: JC Perez-Cortes et al. A System for In-Line 3D Inspection without Hidden Surfaces Sensors, 2018

A. Laurentini, The visual hull concept for silhouette-based image understanding TPAMI 1994

Silhouette cone

Silhouette

The intersection is the visual hull.

10 of 60

3D Reconstruction From Multiple Views —— space carving

K. N. Kutulakos and S. M. Seitz, A Theory of Shape by Space Carving, ICCV 1999

Imagine the object is inside a volume...

Algorithm:

1. Initialize a volume.

2. Pick a voxel on the surface.

3. Project it into every image plane. If it falls outside the silhouette in any view, remove it.

4. Repeat steps 2 and 3 until convergence.
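
The silhouette test in the steps above can be sketched in NumPy. This is a minimal one-pass version that tests every voxel rather than iterating over surface voxels, and it uses only the silhouette check described on this slide. The function name `carve`, the 3x4 projection matrices, and the boolean masks are illustrative assumptions, not part of the original paper.

```python
import numpy as np

def carve(voxels, cameras, silhouettes):
    """Keep only voxels whose projection lands inside every silhouette.

    voxels:      (N, 3) array of voxel centers in world coordinates.
    cameras:     list of 3x4 projection matrices (assumed known).
    silhouettes: list of 2D boolean masks, True where the object is seen.
    Assumes no voxel projects with a zero homogeneous coordinate.
    """
    keep = np.ones(len(voxels), dtype=bool)
    homog = np.hstack([voxels, np.ones((len(voxels), 1))])   # (N, 4)
    for P, mask in zip(cameras, silhouettes):
        proj = homog @ P.T                                   # homogeneous pixels
        u = np.round(proj[:, 0] / proj[:, 2]).astype(int)    # column index
        v = np.round(proj[:, 1] / proj[:, 2]).astype(int)    # row index
        h, w = mask.shape
        inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
        hit = np.zeros(len(voxels), dtype=bool)
        hit[inside] = mask[v[inside], u[inside]]
        keep &= hit              # carve away voxels outside this silhouette
    return voxels[keep]
```

The voxels that survive every view approximate the visual hull of the object.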

11 of 60

Lightfield

Image: https://medium.com/@dc.aihub/3d-reconstruction-with-stereo-images-part-1-camera-calibration-d86f750a1ade

Maybe we don’t actually need to do the reconstruction…

12 of 60

Lightfield —— The Plenoptic Function

The full 8-dimensional plenoptic function describes light transport with light modeled as waves

Proposed by: Gabriel Lippmann

Image: https://neuralradiancefields.io/history-of-neural-radiance-fields/

Cons:

8D: Too much data to capture

Hard to track

13 of 60

Lightfield —— The Plenoptic Function

5D Plenoptic Function

Figure by Leonard McMillan

14 of 60

Lightfield —— Two-plane light fields

4D light field representation

M. Levoy and P. Hanrahan. Light field rendering. SIGGRAPH 1996

Nonplanar planes

camera plane

Image plane
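
As a rough illustration of the two-plane parameterization, the sketch below turns a ray into its four light field coordinates: where it crosses the camera plane and where it crosses the image plane. The plane positions (z = 0 and z = 1) and the name `ray_to_two_plane` are assumptions made for this example.

```python
import numpy as np

def ray_to_two_plane(origin, direction, z_cam=0.0, z_img=1.0):
    """Return the (u, v, s, t) coordinates of a ray.

    (u, v) is where the ray crosses the camera plane z = z_cam and
    (s, t) is where it crosses the image plane z = z_img.
    Assumes the ray is not parallel to the planes (direction[2] != 0).
    """
    origin = np.asarray(origin, dtype=float)
    direction = np.asarray(direction, dtype=float)
    t_cam = (z_cam - origin[2]) / direction[2]   # ray parameter at camera plane
    t_img = (z_img - origin[2]) / direction[2]   # ray parameter at image plane
    u, v = (origin + t_cam * direction)[:2]
    s, t = (origin + t_img * direction)[:2]
    return np.array([u, v, s, t])
```

Every ray through both planes gets exactly one 4D coordinate, which is why the light field in free space needs four dimensions rather than five.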

15 of 60

Lightfield —— Light field Rendering

M. Levoy and P. Hanrahan. Light field rendering. SIGGRAPH 1996

Camera plane

Image plane

16 of 60

Lightfield —— Light field Rendering

Novel view rendering

17 of 60

Lightfield —— Light field Rendering

Figure source: https://cs.brown.edu/courses/csci1290/labs/lab_lightfields/images/cameraarrays.pn

18 of 60

Lightfield —— Lightfield Camera

Lightfield Camera

Figure source: M. Levoy https://graphics.stanford.edu/courses/cs178-13/lectures/lightfields-02may13.pdf

Conventional Camera

19 of 60

Lightfield —— Lightfield Camera

Figure source: M. Levoy https://graphics.stanford.edu/courses/cs178-13/lectures/lightfields-02may13.pdf

20 of 60

Lightfield —— Lightfield Camera

Figures source: M. Levoy https://graphics.stanford.edu/courses/cs178-13/lectures/lightfields-02may13.pdf

21 of 60

Lightfield —— Lightfield Camera

Figure source: https://news.smugmug.com/what-is-depth-of-field-and-how-you-can-master-it-d82632ddf455

22 of 60

What is a Neural Radiance Field (NeRF)?

A 3D scene representation that models geometry and appearance

23 of 60

What is a Neural Radiance Field (NeRF)?

A 3D scene representation that models geometry and appearance
5D continuous function that maps spatial location and viewing direction to density and color

Parameterized by a neural network (e.g. MLP)
Flexible, can represent arbitrary topology and resolution
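
To make the "5D function parameterized by an MLP" idea concrete, here is a toy sketch with random, untrained weights. The layer sizes and the name `radiance_field` are arbitrary assumptions. The actual NeRF model is much larger, uses positional encoding, and predicts density from position alone.

```python
import numpy as np

rng = np.random.default_rng(0)

# Randomly initialized weights of a tiny two-layer MLP (sizes are arbitrary).
W1 = rng.normal(scale=0.5, size=(5, 32))
b1 = np.zeros(32)
W2 = rng.normal(scale=0.5, size=(32, 4))
b2 = np.zeros(4)

def radiance_field(xyz, view_dir):
    """Map 3D points and 2D viewing directions (theta, phi) to (density, rgb)."""
    inp = np.concatenate([xyz, view_dir], axis=-1)   # (N, 5) input
    hidden = np.maximum(inp @ W1 + b1, 0.0)          # ReLU hidden layer
    out = hidden @ W2 + b2                           # (N, 4) raw outputs
    density = np.maximum(out[..., 0], 0.0)           # density is non-negative
    rgb = 1.0 / (1.0 + np.exp(-out[..., 1:]))        # colors squashed into (0, 1)
    return density, rgb
```

Because the function is continuous, it can be queried at any point and direction, with no fixed grid resolution.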

Image: https://www.matthewtancik.com/nerf

24 of 60

Fitting a NeRF to images via differentiable rendering

Rendering: the process of generating an image from a scene representation

Image: https://www.cs.cornell.edu/courses/cs5670/2022sp/lectures/lec22_nerf_for_web.pdf

3D Scene Representation (NeRF)

Rendering

25 of 60

Fitting a NeRF to images via differentiable rendering

Rendering: the process of generating an image from a scene representation
If the rendering function is differentiable, then we can define a loss function on the rendering outputs and minimize it w.r.t. our scene representation using gradient descent

Test-time optimization
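
A toy version of this loop is sketched below, assuming a deliberately simple renderer in which each pixel is a fixed weighted sum of unknown sample colors. The weights, sizes, step size, and iteration count are all illustrative assumptions. The point is only that a differentiable `render` lets gradient descent on a pixel loss recover the scene parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
num_rays, num_samples = 64, 8

# Fixed, known rendering weights per ray (rows sum to 1): a stand-in renderer.
weights = rng.dirichlet(np.ones(num_samples), size=num_rays)   # (64, 8)

def render(colors):
    """Differentiable toy renderer: each pixel is a weighted sum of sample colors."""
    return weights @ colors                                     # (64, 3)

true_colors = rng.uniform(size=(num_samples, 3))
target_pixels = render(true_colors)          # plays the role of observed images

colors = np.full((num_samples, 3), 0.5)      # scene parameters to optimize
losses = []
for _ in range(500):
    residual = render(colors) - target_pixels
    losses.append(np.mean(np.sum(residual ** 2, axis=1)))   # squared-error loss
    grad = 2.0 * weights.T @ residual / num_rays             # d(loss)/d(colors)
    colors -= 0.5 * grad                                     # gradient descent step
```

In a real NeRF the parameters are network weights and the gradient comes from automatic differentiation, but the structure of the loop is the same.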

Image: https://www.cs.cornell.edu/courses/cs5670/2022sp/lectures/lec22_nerf_for_web.pdf

3D Scene Representation (NeRF)

Rendering

Rendering Loss Minimization

26 of 60

Fitting a NeRF to images via differentiable rendering

Rendering: volume rendering
Loss: squared error between rendered and true pixel colors

Mildenhall et al., NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis, ECCV 2020

27 of 60

Volume Rendering

Developed in the 1980s to render clouds, fog, flames, etc.
Estimate expected color of a pixel by sampling colors along ray
For ray 𝐫(𝑡) = 𝐨 + 𝑡𝐝:

Image: https://www.cs.cornell.edu/courses/cs5670/2022sp/lectures/lec22_nerf_for_web.pdf

𝛼_𝑖 is the probability that there is a particle in segment i

𝑇_𝑖 is the probability that there are no blocking particles
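
The quantities 𝛼_𝑖 and 𝑇_𝑖 combine into the standard quadrature used in the NeRF paper: 𝛼_𝑖 = 1 − exp(−𝜎_𝑖 𝛿_𝑖), 𝑇_𝑖 = ∏_{j<i} (1 − 𝛼_j), and pixel color = Σ_i 𝑇_𝑖 𝛼_𝑖 𝐜_𝑖. Below is a minimal NumPy sketch of that sum for a single ray. The function name `composite_ray` and the use of segment midpoints for depth are choices made for this example.

```python
import numpy as np

def composite_ray(sigmas, colors, t_vals):
    """Numerical volume rendering along one ray.

    sigmas: (N,) densities per segment.
    colors: (N, 3) RGB per segment.
    t_vals: (N + 1,) segment boundaries along the ray.
    """
    deltas = np.diff(t_vals)                        # segment lengths delta_i
    alphas = 1.0 - np.exp(-sigmas * deltas)         # alpha_i
    # T_i = prod_{j < i} (1 - alpha_j), with T for the first segment equal to 1.
    trans = np.concatenate([[1.0], np.cumprod(1.0 - alphas)[:-1]])
    weights = trans * alphas                        # contribution of each segment
    pixel = weights @ colors                        # expected color
    mids = 0.5 * (t_vals[:-1] + t_vals[1:])
    depth = weights @ mids                          # expected depth
    return pixel, depth, weights
```

The same weights give both the rendered color and a depth estimate, which is how a NeRF produces depth maps without depth supervision.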

29 of 60

Results: Novel View Synthesis and Depth Estimation

Mildenhall et al., NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis, ECCV 2020

31 of 60

Summary: What is NeRF?

5D continuous function that maps spatial location and viewing direction to density and color

Represents scene geometry and appearance

Can be fit to a set of posed images using differentiable volume rendering
High quality novel view synthesis and depth estimation

32 of 60

Was NeRF (ECCV '20) the first to use these ideas?

Answer: No

Neural Implicit Representations

Occupancy Networks (CVPR '19)
DeepSDF (CVPR '19)

Differentiable Rendering of Implicit Representations

Differentiable Volumetric Rendering (CVPR '20)
Scene Representation Networks (NeurIPS '19)

Mescheder et al., Occupancy networks: Learning 3D reconstruction in function space, CVPR 2019

Park et al., DeepSDF: Learning continuous signed distance functions for shape representation, CVPR 2019

Niemeyer et al., Differentiable volumetric rendering: Learning implicit 3D representations without 3D supervision, CVPR 2020

Sitzmann et al., Scene Representation Networks: Continuous 3D-Structure-Aware Neural Scene Representation, NeurIPS 2019

33 of 60

NeRF is slow

Training: ~1 day

Inference: ~1 minute

34 of 60

NeRF is ambiguous

35 of 60

Improve Speed and Quality

Use a different 3D representation

SDF, Voxel, Grid
Level of Detail

Use a pretrained network to reduce ambiguity

Diffusion model
Normal/depth network

36 of 60

Different Representation —— SDF

NeRF: MV-images -> density, color

NeuS: MV-images -> SDF -> density, color

SDF gives the shortest distance to the surface
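
As a small example of what an SDF returns, here is the analytic signed distance function of a sphere. The function name and default arguments are illustrative. In NeuS the SDF is instead predicted by a network and then converted to density for volume rendering.

```python
import numpy as np

def sphere_sdf(points, center=(0.0, 0.0, 0.0), radius=1.0):
    """Signed distance to a sphere: negative inside, zero on the surface, positive outside."""
    offsets = np.asarray(points, dtype=float) - np.asarray(center, dtype=float)
    return np.linalg.norm(offsets, axis=-1) - radius
```

The surface is the zero level set, so geometry can be extracted directly from the sign change.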

37 of 60

Different Representation —— SDF

38 of 60

Different Representation —— Voxel

NeRF: MV-images -> density, color

DVGO: MV-images -> query feature from voxel -> density, color
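
"Query feature from voxel" usually means interpolating features stored at grid vertices. The sketch below shows plain trilinear interpolation in NumPy. The grid layout (X, Y, Z, C) and the name `query_voxel_grid` are assumptions for illustration, not DVGO's actual code.

```python
import numpy as np

def query_voxel_grid(grid, point):
    """Trilinearly interpolate a feature grid at a continuous point.

    grid:  (X, Y, Z, C) array of per-vertex features, each of X, Y, Z >= 2.
    point: (3,) position in grid-index coordinates, within [0, size - 1] per axis.
    """
    point = np.asarray(point, dtype=float)
    max_idx = np.array(grid.shape[:3]) - 1
    lo = np.clip(np.floor(point).astype(int), 0, max_idx - 1)   # lower corner
    frac = point - lo                                            # offsets in [0, 1]
    feat = np.zeros(grid.shape[3])
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                # Weight of this corner is the product of per-axis weights.
                w = ((frac[0] if dx else 1 - frac[0]) *
                     (frac[1] if dy else 1 - frac[1]) *
                     (frac[2] if dz else 1 - frac[2]))
                feat += w * grid[lo[0] + dx, lo[1] + dy, lo[2] + dz]
    return feat
```

A lookup like this replaces most of the MLP evaluation, which is where much of the speed-up of grid-based methods comes from.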

40 of 60

Different Representation —— Grid

NeRF: MV-images -> density, color

TensoRF: MV-images -> query feature from grid -> density, color
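
The idea behind a factorized grid can be sketched with a CP-style decomposition, in which a 3D grid is stored as a sum of outer products of 1D vectors. The rank, grid size, and random factors below are placeholders for learned values. TensoRF itself also describes a vector-matrix variant that is not shown here.

```python
import numpy as np

rng = np.random.default_rng(0)
rank, size = 4, 16

# One set of 1D factor vectors per axis (random placeholders for learned values).
vx = rng.normal(size=(rank, size))
vy = rng.normal(size=(rank, size))
vz = rng.normal(size=(rank, size))

def query_cp(i, j, k):
    """Value of the factorized grid at integer cell (i, j, k)."""
    return np.sum(vx[:, i] * vy[:, j] * vz[:, k])

# Dense grid built only for comparison; the factors alone need
# 3 * rank * size numbers instead of size ** 3.
dense = np.einsum("ri,rj,rk->ijk", vx, vy, vz)
```

With these sizes the factors hold 192 numbers versus 4096 for the dense grid, and a query never needs the dense grid at all.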

41 of 60

Different Representation —— Grid

NeRF: coordinate -> density, color

TensoRF: MV-images -> query feature from grid -> density, color

42 of 60

Different Representation —— LoD

NeRF: MV-images -> density, color

Neuralangelo: MV-images -> query feature from different level -> density, color
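
"Query feature from different level" can be sketched as looking up one feature per resolution level and concatenating them. The version below uses small dense grids and nearest-cell lookup for brevity, with an `active_levels` argument to mimic coarse-to-fine training. The resolutions, the feature size, and that argument are assumptions for this example. Neuralangelo itself uses multi-resolution hash grids with interpolation.

```python
import numpy as np

rng = np.random.default_rng(0)
resolutions = [4, 8, 16]          # coarse to fine (arbitrary choices)
feat_dim = 2
# One dense feature grid per level; random placeholders for learned features.
levels = [rng.normal(size=(r, r, r, feat_dim)) for r in resolutions]

def query_multires(point, active_levels=None):
    """Concatenate the nearest-cell feature from each level for a point in [0, 1)^3."""
    point = np.asarray(point, dtype=float)
    n_active = len(levels) if active_levels is None else active_levels
    feats = []
    for lvl, (res, grid) in enumerate(zip(resolutions, levels)):
        i, j, k = np.minimum((point * res).astype(int), res - 1)
        f = grid[i, j, k]
        # Coarse-to-fine: levels not yet enabled contribute zeros.
        feats.append(f if lvl < n_active else np.zeros_like(f))
    return np.concatenate(feats)
```

Coarse levels capture the overall shape, and finer levels add detail once they are enabled.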

43 of 60

Different Representation —— LoD

44 of 60

Different Representation —— LoD

45 of 60

Different Representation —— LoD

46 of 60

Pretrained Network —— Depth / Normal

NeRF: MV-images -> density, color

MonoSDF: MV-images -> density, color, depth, normal

https://niujinshuchong.github.io/monosdf/

47 of 60

Pretrained Network —— Diffusion model

Single Image + relative pose (R, T) -> Novel view Image

48 of 60

Pretrained Network —— Diffusion model

49 of 60

Expand the application of vanilla NeRF

Time-variant scenes
Semantics
Anime Head
BSDF texture
Physical Property
Robot Action
Hide message

50 of 60

New application —— Dynamic/deformable NeRF

NeRF: MV-images -> density, color

D-NeRF: MV-video + time -> displacement -> density, color

Learn a time-dependent displacement field
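
The displacement idea can be sketched as follows: a point observed at time t is shifted into a shared canonical frame, and a static field is queried there. Both functions below are hypothetical hand-written stand-ins for the learned networks, chosen so that the behavior is easy to check.

```python
import numpy as np

def displacement(xyz, t):
    """Hypothetical stand-in for the learned displacement: a shift along x that grows with t."""
    shift = np.zeros_like(xyz)
    shift[..., 0] = 0.1 * t
    return shift

def canonical_density(xyz):
    """Hypothetical canonical scene: a unit-density ball of radius 0.5 at the origin."""
    return (np.linalg.norm(xyz, axis=-1) < 0.5).astype(float)

def dynamic_density(xyz, t):
    """Warp the query point into the canonical frame, then query the static field."""
    return canonical_density(xyz + displacement(xyz, t))
```

Only the displacement depends on time, so a single canonical scene is shared across all frames.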

Pumarola et al., D-NeRF: Neural Radiance Fields for Dynamic Scenes, CVPR 2021, https://arxiv.org/pdf/2011.13961.pdf

51 of 60

New application —— Dynamic/deformable NeRF

Learn a time-dependent displacement field

Pumarola et al., D-NeRF: Neural Radiance Fields for Dynamic Scenes, CVPR 2021, https://arxiv.org/pdf/2011.13961.pdf

52 of 60

New application —— Semantics

NeRF: MV-images -> RGBA

Semantic-NeRF: MV-semantics -> semantics vector -> semantics

Zhi, S., Laidlow, T., Leutenegger, S., & Davison, A. J. (2021). In-place scene labelling and understanding with implicit scene representation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 15838-15847).

53 of 60

New application —— Anime Head

NeRF: MV-images -> RGBA

PAniC-3D: Single image -> anime head model

Chen, S., Zhang, K., Shi, Y., Wang, H., Zhu, Y., Song, G., ... & Zwicker, M. (2023). PAniC-3D: Stylized Single-view 3D Reconstruction from Portraits of Anime Characters. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 21068-21077).

54 of 60

New application —— complex texture

NeRF: MV-images -> RGBA

InvRender: MV-images -> diffuse, roughness, metallic

Zhang, Y., Sun, J., He, X., Fu, H., Jia, R., & Zhou, X. (2022). Modeling indirect illumination for inverse rendering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 18643-18652).

55 of 60

New application —— Edit with Language

CLIP-NeRF: Text + MV-images -> edited NeRF

Wang, C., Chai, M., He, M., Chen, D., & Liao, J. (2022). CLIP-NeRF: Text-and-image driven manipulation of neural radiance fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 3835-3844), https://arxiv.org/abs/2112.05139

56 of 60

New application —— Physical Property

NeRF: MV-images -> RGBA

PAC-NeRF: MV-images -> Elasticity, hardness, viscosity, etc.

Li, X., Qiao, Y. L., Chen, P. Y., Jatavallabhula, K. M., Lin, M., Jiang, C., & Gan, C. (2023). PAC-NeRF: Physics augmented continuum neural radiance fields for geometry-agnostic system identification. ICLR 2023, arXiv:2303.05512.

57 of 60

New application —— Robot Action

Multiview-Video -> state feature -> robot action

Li, Y., Li, S., Sitzmann, V., Agrawal, P., & Torralba, A. (2022, January). 3D neural scene representations for visuomotor control. In Conference on Robot Learning (pp. 112-123). PMLR.

58 of 60

New application —— Hide message

CopyRNeRF: Multiview-Image + message -> NeRF

Luo, Z., Guo, Q., Cheung, K. C., See, S., & Wan, R. (2023). CopyRNeRF: Protecting the CopyRight of Neural Radiance Fields. ICCV 2023, arXiv:2307.11526.

59 of 60

Discussion: will NeRF stand the test of time?

Advantages:

Highest quality 3D geometry and novel view synthesis to date
Leverages GPU compute
Progress has been rapid so far

Disadvantages:

Training is slow
Requires dense capture of scene
Not supported by current graphics pipelines

60 of 60

References

K. N. Kutulakos and S. M. Seitz, A Theory of Shape by Space Carving, ICCV 1999, https://www.cs.toronto.edu/~kyros/pubs/00.ijcv.carve.pdf
A. Laurentini, The visual hull concept for silhouette-based image understanding, TPAMI 1994, https://ieeexplore.ieee.org/document/273735
M. Levoy and P. Hanrahan, Light field rendering, SIGGRAPH 1996, https://graphics.stanford.edu/papers/light/
Mescheder et al., Occupancy networks: Learning 3D reconstruction in function space, CVPR 2019, https://arxiv.org/pdf/1812.03828.pdf
Park et al., DeepSDF: Learning continuous signed distance functions for shape representation, CVPR 2019, https://arxiv.org/pdf/1901.05103.pdf
Niemeyer et al., Differentiable volumetric rendering: Learning implicit 3D representations without 3D supervision, CVPR 2020, https://arxiv.org/pdf/1912.07372.pdf
Sitzmann et al., Scene Representation Networks: Continuous 3D-Structure-Aware Neural Scene Representation, NeurIPS 2019, https://arxiv.org/pdf/1906.01618.pdf
Mildenhall et al., NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis, ECCV 2020, https://arxiv.org/pdf/2003.08934.pdf
Wang et al., NeuS: Learning Neural Implicit Surfaces by Volume Rendering for Multi-view Reconstruction, NeurIPS 2021, https://arxiv.org/pdf/2106.10689.pdf
Sun et al., Direct Voxel Grid Optimization: Super-fast Convergence for Radiance Fields Reconstruction, CVPR 2022, https://arxiv.org/pdf/2111.11215.pdf
Chen et al., TensoRF: Tensorial Radiance Fields, ECCV 2022, https://arxiv.org/pdf/2203.09517.pdf
Li et al., Neuralangelo: High-Fidelity Neural Surface Reconstruction, CVPR 2023, https://arxiv.org/pdf/2306.03092.pdf
Pumarola et al., D-NeRF: Neural Radiance Fields for Dynamic Scenes, CVPR 2021, https://arxiv.org/pdf/2011.13961.pdf
Li et al., PAC-NeRF: Physics Augmented Continuum Neural Radiance Fields for Geometry-Agnostic System Identification, ICLR 2023, https://arxiv.org/pdf/2303.05512.pdf
Luo et al., CopyRNeRF: Protecting the CopyRight of Neural Radiance Fields, ICCV 2023, https://arxiv.org/pdf/2307.11526.pdf
Zhi et al., In-Place Scene Labelling and Understanding with Implicit Scene Representation, ICCV 2021, https://arxiv.org/pdf/2103.15875.pdf
Yu et al., MonoSDF: Exploring Monocular Geometric Cues for Neural Implicit Surface Reconstruction, NeurIPS 2022, https://niujinshuchong.github.io/monosdf/
Liu et al., Zero-1-to-3: Zero-shot One Image to 3D Object, arXiv preprint arXiv:2303.11328, 2023, https://arxiv.org/abs/2303.11328
Chen et al., PAniC-3D: Stylized Single-view 3D Reconstruction from Portraits of Anime Characters, CVPR 2023, https://arxiv.org/pdf/2303.14587.pdf
Zhang et al., Modeling Indirect Illumination for Inverse Rendering, CVPR 2022, https://openaccess.thecvf.com/content/CVPR2022/papers/Zhang_Modeling_Indirect_Illumination_for_Inverse_Rendering_CVPR_2022_paper.pdf
Wang et al., CLIP-NeRF: Text-and-Image Driven Manipulation of Neural Radiance Fields, CVPR 2022, https://arxiv.org/abs/2112.05139
Li et al., 3D Neural Scene Representations for Visuomotor Control, Conference on Robot Learning (PMLR), 2022, pp. 112-123