MonoNeRF: Learning a Generalizable Dynamic Radiance Field from Monocular Videos
Published in ICCV 2023
We propose a radiance field that can generalize across multiple dynamic scenes
Fengrui Tian1, Shaoyi Du1, Yueqi Duan2
1College of Artificial Intelligence, Xi’an Jiaotong University
2Department of Electrical Engineering, Tsinghua University
Code: https://github.com/tianfr/MonoNeRF
Introduction – NeRF
NeRF pipeline
Novel view synthesis of orchid
Implicit radiance field
[1] Mildenhall, B., Srinivasan, P. P., Tancik, M., Barron, J. T., Ramamoorthi, R., & Ng, R., NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis. In ECCV, 2020.
Vanilla NeRF cannot render dynamic scenes
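For reference, below is a minimal PyTorch sketch of the volume rendering step at the heart of NeRF [1]: per-sample densities and colors along each camera ray are alpha-composited into pixel colors. The tensor names and shapes are illustrative, not taken from any particular codebase.

```python
import torch

def volume_render(sigma, rgb, z_vals):
    """Alpha-composite per-sample density/color along each ray (NeRF-style).

    sigma:  (N_rays, N_samples)    volume density at each sample
    rgb:    (N_rays, N_samples, 3) color at each sample
    z_vals: (N_rays, N_samples)    depth of each sample along the ray
    """
    # Distance between adjacent samples; pad the last interval.
    dists = z_vals[..., 1:] - z_vals[..., :-1]
    dists = torch.cat([dists, torch.full_like(dists[..., :1], 1e10)], dim=-1)

    alpha = 1.0 - torch.exp(-sigma * dists)              # opacity of each segment
    trans = torch.cumprod(1.0 - alpha + 1e-10, dim=-1)   # accumulated transmittance
    trans = torch.cat([torch.ones_like(trans[..., :1]), trans[..., :-1]], dim=-1)
    weights = alpha * trans                              # contribution of each sample

    return (weights[..., None] * rgb).sum(dim=-2)        # (N_rays, 3) pixel colors
```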
Introduction – NeRF in Dynamic Scenes
[1] Li, Z., Niklaus, S., Snavely, N., & Wang, O., Neural Scene Flow Fields for Space-Time View Synthesis of Dynamic Scenes. In CVPR, 2021. https://doi.org/10.1109/cvpr46437.2021.00643
[2] Video link: https://www.bilibili.com/video/BV1US4y1G7Qu/
Novel view synthesis of dynamic scenes from monocular videos
Bullet time in movie production and sports events
Space-time interpolation
Challenge – Ambiguity from Monocular Video
Only a single 2D video frame and its 2D optical flow are available at any timestamp
No precise 3D information
Which one?
Multiview
Monocular
Previous Works – Use Positions
They break the ambiguity with point positions, which do not transfer across scenes
Can we learn a dynamic radiance field that generalizes to multiple scenes?
Two scenes are mixed together because positions do not transfer across scenes
Our Solution – MonoNeRF
2D video frames and optical flows form a pair of complementary constraints for jointly estimating 3D point information and point trajectories.
2D video frames – spatial constraint
Optical flows – temporal constraint
Our Solution – MonoNeRF
[Pipeline diagram: Monocular Video → Encoder → Frame-wise Feature + Flow Field → Flow-based Feature Aggregation (breaks the ambiguity) → NeRF → Rendering → Dynamic Scene; the output is supervised by the 2D Video Frame (spatial constraint) and the Optical Flow (temporal constraint)]
Pipeline – Build Flow Field
[Diagram: Monocular Video → Encoder → Frame-wise Feature + Flow Field]
The flow field is a velocity field over position and time; integrating it gives each point's trajectory.
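A sketch of how a velocity field can be integrated into point trajectories, assuming an MLP velocity field v(x, t); the network size and the simple Euler integrator are illustrative choices, not necessarily the paper's exact scheme.

```python
import torch
import torch.nn as nn

class VelocityField(nn.Module):
    """Illustrative MLP that predicts a 3D velocity for a point at time t."""
    def __init__(self, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(4, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),
        )

    def forward(self, x, t):
        # x: (N, 3) positions, t: (N, 1) timestamps
        return self.mlp(torch.cat([x, t], dim=-1))

def integrate_trajectory(field, x0, t0, t1, n_steps=16):
    """Euler-integrate the velocity field to move points from time t0 to t1."""
    x, t = x0, t0
    dt = (t1 - t0) / n_steps
    for _ in range(n_steps):
        x = x + field(x, t) * dt
        t = t + dt
    return x  # positions of the same points at time t1
```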
Pipeline – Sample Point Features
[Diagram: features for each 3D point are sampled from the frame-wise features along the flow field via flow-based feature aggregation, which breaks the ambiguity]
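A sketch of what flow-based feature aggregation could look like: each 3D sample point is warped to nearby frames along its trajectory (reusing integrate_trajectory from the sketch above), projected into each frame by a hypothetical project_fn, bilinearly sampled from the frame-wise feature maps, and averaged. The projection helper and the averaging rule are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def aggregate_point_features(pts, ts, feat_maps, frame_ts, field, project_fn):
    """Gather features for 3D points by following their trajectories into frames.

    pts:       (N, 3) sample points at query time ts (scalar tensor)
    feat_maps: (T, C, H, W) frame-wise feature maps from the encoder
    frame_ts:  (T,) timestamp of each frame
    field:     velocity field used by integrate_trajectory (see sketch above)
    project_fn: camera projection, (N, 3) -> (N, 2) in [-1, 1] image coords
    """
    gathered = []
    for i, t_frame in enumerate(frame_ts):
        # Move each point from the query time to this frame's time.
        warped = integrate_trajectory(
            field, pts, ts.expand(len(pts), 1), t_frame.expand(len(pts), 1))
        uv = project_fn(warped).view(1, 1, -1, 2)            # (1, 1, N, 2)
        feat = F.grid_sample(feat_maps[i:i + 1], uv,
                             align_corners=True)             # (1, C, 1, N)
        gathered.append(feat[0, :, 0].t())                   # (N, C)
    return torch.stack(gathered).mean(dim=0)                 # average over frames
```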
Pipeline – Render Dynamic Scenes
[Diagram: the aggregated point features condition the NeRF, which renders the dynamic scene]
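A sketch of a radiance field conditioned on the aggregated point features rather than on raw positions, which is what allows transfer across scenes; the layer sizes and heads are illustrative. Its per-sample density and color would feed the volume_render sketch above.

```python
import torch.nn as nn

class FeatureConditionedNeRF(nn.Module):
    """Illustrative radiance field conditioned on aggregated point features
    instead of raw positional encodings, so it can transfer across scenes."""
    def __init__(self, feat_dim=128, hidden=256):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.sigma_head = nn.Linear(hidden, 1)                    # volume density
        self.rgb_head = nn.Sequential(nn.Linear(hidden, 3), nn.Sigmoid())

    def forward(self, point_feat):
        h = self.trunk(point_feat)                                # (N, hidden)
        return self.sigma_head(h).relu(), self.rgb_head(h)        # sigma, rgb
```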
Pipeline – Spatial-Temporal Constraint
[Diagram: the rendered dynamic scene is supervised by the 2D video frame (spatial constraint) and the optical flow (temporal constraint)]
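A sketch of how the two constraints could be combined into one training objective: a photometric (spatial) term against the 2D video frame plus a flow-consistency (temporal) term against the precomputed optical flow. All inputs are assumed to be torch tensors; the loss weight and the exact form of the flow term are illustrative.

```python
def spatial_temporal_loss(rendered_rgb, gt_rgb, induced_flow, gt_flow, w_flow=0.1):
    """Spatial constraint: rendered pixels match the 2D video frame.
    Temporal constraint: the 2D flow induced by the point trajectories
    matches the precomputed optical flow. Weight w_flow is illustrative."""
    spatial = ((rendered_rgb - gt_rgb) ** 2).mean()    # photometric MSE
    temporal = (induced_flow - gt_flow).abs().mean()   # flow L1 consistency
    return spatial + w_flow * temporal
```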
Traditional Setting – Single Scene
[1] Mildenhall, B., Srinivasan, P. P., Tancik, M., Barron, J. T., Ramamoorthi, R., & Ng, R., NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis. In ECCV, 2020.
[2] Qiao, Y.-L., Gao, A., & Lin, M. C., NeuPhysics: Editable Neural Geometry and Physics from Monocular Videos. In NeurIPS, 2022.
MonoNeRF outperforms other state-of-the-art methods on the traditional novel view synthesis task.
Application #1 – Video Stream
Training Frames
Unseen Frames
Novel view synthesis on unseen frames containing new foreground motions
[1] Gao, C., Saraf, A., Kopf, J., & Huang, J.-B., Dynamic View Synthesis from Dynamic Monocular Video. In ICCV, 2021.
Application #1 – Video Stream
Ours
DynNeRF
MonoNeRF can transfer to new motions, whereas DynNeRF only interpolates within the training frames.
Free-viewpoint video rendering
Application #2 – General Radiance Field
Multiple Monocular Videos
MonoNeRF
General Dynamic Radiance Field
A general dynamic radiance field of multiple scenes
Application #2 – General Radiance Field
Ours – Balloon2
Ours – Umbrella
[1] Gao, C., Saraf, A., Kopf, J., & Huang, J.-B., Dynamic View Synthesis from Dynamic Monocular Video. In ICCV, 2021.
DynNeRF
MonoNeRF can learn a general dynamic radiance field from multiple scenes, whereas other methods fail to distinguish between scenes.
Application #3 – Novel Scene Adaptation
Novel Scene Adaptation
Training Scene
Novel Scene
Finetuning on novel scenes for 500 steps (about 10 minutes)
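A sketch of such a short adaptation run, assuming a pretrained model with a hypothetical training_step loss hook; the step count follows the slides, while the optimizer and learning rate are illustrative.

```python
import torch

def finetune(model, novel_scene_loader, steps=500, lr=5e-4):
    """Adapt a pretrained MonoNeRF-style model to a novel scene with a short
    finetuning run (step count from the slides; optimizer settings illustrative)."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    it = iter(novel_scene_loader)
    for _ in range(steps):
        try:
            batch = next(it)
        except StopIteration:           # restart the loader on a short video
            it = iter(novel_scene_loader)
            batch = next(it)
        loss = model.training_step(batch)  # hypothetical loss hook
        opt.zero_grad()
        loss.backward()
        opt.step()
```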
Application #3 – Novel Scene Adaptation
[1] Gao, C., Saraf, A., Kopf, J., & Huang, J.-B., Dynamic View Synthesis from Dynamic Monocular Video. In ICCV, 2021.
Ours (10-minute finetuning)
DynNeRF (10-minute finetuning)
MonoNeRF can transfer to a novel scene with 10 minutes of finetuning, whereas other methods need about a day to train from scratch.
Free-viewpoint video rendering
Application #4 – Editing
Change Background
Change Background + Flip Foreground
Change Background + Scale Foreground
Move Foreground
Code: https://github.com/tianfr/MonoNeRF
Future Work #1 – Finetuning
MonoNeRF still needs 500-step finetuning on novel scenes due to limited training data and computational resources
[1] Gao, C., Saraf, A., Kopf, J., & Huang, J.-B., Dynamic View Synthesis from Dynamic Monocular Video. In ICCV, 2021.
Ours (500 Steps)
DynNeRF (500 Steps)
Future Work #2 – Flow Generalization Ability
Limited generalization ability of flow estimation
500-step finetuning
In Summary
MonoNeRF: Learning a Generalizable Dynamic Radiance Field from Monocular Videos
Published in ICCV 2023
Fengrui Tian, Shaoyi Du, Yueqi Duan
Code: https://github.com/tianfr/MonoNeRF
Arxiv: https://arxiv.org/abs/2212.13056
MonoNeRF: Learning a Generalizable Dynamic Radiance Field from Monocular Videos
Thank You