RT-NeRF: Real-Time On-Device Neural Radiance Fields Towards Immersive AR/VR Rendering
Chaojian Li, Sixu Li, Yang Zhao, Wenbo Zhu, and Yingyan (Celine) Lin
Georgia Institute of Technology
Efficient and Intelligent Computing Lab
NeRF as a Tool to Generate Novel Views
Video source: youtu.be/HfJpQCBTqZs
Inputs: Sparsely sampled views
Outputs: Images of any new view
SOTA Efficient NeRF’s Pipeline: How Does It Work?
Video source: [Mildenhall et al., ECCV’20]
Real-Time NeRF Is Increasingly Demanded
Virtual Meetings
Metaverse
Autonomous Driving
Simulation
Source: shorturl.at/gCFMW
Source: shorturl.at/kmnvZ
Source: shorturl.at/fGUY7
…
SOTA Efficient NeRF vs. Desired Real-Time NeRF
Techniques: NeRF [Mildenhall et al., ECCV’20], FastNeRF [Garbin et al., ICCV’21], TensoRF [Chen et al., ECCV’22], Our RT-NeRF
Memory cost: 6 GB
Throughput: 30 FPS is desired, but the SOTA efficient NeRF can only achieve 0.01 FPS [Chen et al., ECCV’22]
Contribution 1: Analyze the Efficiency Bottlenecks
Source: [Mildenhall et al., ECCV’20]
[Figure: percentage breakdown (0%–100%) over the three rendering steps]
1) Map pixels to rays
2) Query the features of points along the rays
3) Render pixels’ colors
Contribution 2: Identify Two Key Bottlenecks
Bottleneck 1
Bottleneck 2
Zoom-in Bottleneck 1 (Locate Pre-Existing Points)
1) If zero: skip the following steps
2) If non-zero: continue the following steps
We identify the corresponding cause: irregular accesses to the occupancy grid, because rays can come from any direction
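A minimal NumPy sketch of the occupancy check described above. It assumes a dense boolean occupancy grid over an axis-aligned box and uniformly spaced samples along each ray. The function names (`sample_points`, `filter_by_occupancy`) are hypothetical and do not come from RT-NeRF or TensoRF. Each sampled point is mapped to a voxel index and looked up on its own, so rays arriving from arbitrary directions touch the grid in a scattered order.

```python
import numpy as np


def sample_points(origin, direction, near, far, n_samples):
    """Uniformly sample points along origin + t * direction for t in [near, far]."""
    t = np.linspace(near, far, n_samples)
    return origin[None, :] + t[:, None] * direction[None, :]


def filter_by_occupancy(points, occupancy, grid_min, voxel_size):
    """Keep points whose voxel is non-zero; the others skip the following steps."""
    idx = np.floor((points - grid_min) / voxel_size).astype(int)
    # Points outside the grid are treated as empty space.
    inside = np.all((idx >= 0) & (idx < np.array(occupancy.shape)), axis=1)
    keep = np.zeros(len(points), dtype=bool)
    ii = idx[inside]
    # One lookup per point; which voxels are touched depends on the ray direction.
    keep[inside] = occupancy[ii[:, 0], ii[:, 1], ii[:, 2]]
    return points[keep]


occupancy = np.zeros((4, 4, 4), dtype=bool)
occupancy[1, 1, 1] = True  # a single occupied voxel (hypothetical scene)
pts = sample_points(np.array([1.5, 1.5, -1.0]), np.array([0.0, 0.0, 1.0]), 0.05, 5.95, 60)
kept = filter_by_occupancy(pts, occupancy, grid_min=np.zeros(3), voxel_size=1.0)
print(len(kept))  # 10 of the 60 samples fall inside voxel (1, 1, 1)
```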
Zoom-in Bottleneck 2 (Compute Points’ Embeddings)
[Figure: the embedding grid is decomposed into matrix-vector pairs; each pair is combined by scalar multiplication (×), and the products are summed (+)]
We identify the corresponding cause: the sparse decomposed embedding grid is treated as a dense one, i.e., its sparsity is not leveraged.
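To make the cost argument concrete, here is a simplified sketch of an embedding computed from matrix-vector pairs, loosely following TensoRF’s vector-matrix (VM) decomposition. Several parts are our own simplifications. The sketch evaluates a single scalar at integer grid points, whereas the real model interpolates between grid points. The dict-of-non-zeros storage is an illustrative stand-in for a sparse format. All names are hypothetical. The dense version multiplies every entry. The sparse version fetches and multiplies only the non-zero matrix entries, and both give the same result.

```python
import numpy as np


def vm_embedding_dense(idx, vecs, mats):
    """Sum of R matrix-vector products per axis at grid index idx = (i, j, k).

    vecs = (vx, vy, vz), each (R, N); mats = (m_yz, m_xz, m_xy), each (R, N, N).
    Every product is computed, even when the matrix entry is zero.
    """
    i, j, k = idx
    vx, vy, vz = vecs
    m_yz, m_xz, m_xy = mats
    return float(np.sum(vx[:, i] * m_yz[:, j, k] + vy[:, j] * m_xz[:, i, k] + vz[:, k] * m_xy[:, i, j]))


def to_sparse(mat):
    """Keep only the non-zero entries of an (R, N, N) stack as {(r, a, b): value}."""
    return {tuple(int(v) for v in rab): float(mat[tuple(rab)]) for rab in np.argwhere(mat != 0)}


def vm_embedding_sparse(idx, vecs, sparse_mats, rank):
    """Same sum, but zero matrix entries are never fetched or multiplied."""
    i, j, k = idx
    vx, vy, vz = vecs
    s_yz, s_xz, s_xy = sparse_mats
    total = 0.0
    for r in range(rank):
        terms = (
            (vx[r, i], s_yz, (r, j, k)),
            (vy[r, j], s_xz, (r, i, k)),
            (vz[r, k], s_xy, (r, i, j)),
        )
        for vec_val, entries, key in terms:
            if key in entries:
                total += vec_val * entries[key]
    return total


rng = np.random.default_rng(0)
R, N = 4, 8
vecs = tuple(rng.standard_normal((R, N)) for _ in range(3))
# Roughly 80% of matrix entries are zeroed (hypothetical sparsity level).
mats = tuple(rng.standard_normal((R, N, N)) * (rng.random((R, N, N)) > 0.8) for _ in range(3))
sparse = tuple(to_sparse(m) for m in mats)
dense_val = vm_embedding_dense((2, 5, 1), vecs, mats)
sparse_val = vm_embedding_sparse((2, 5, 1), vecs, sparse, R)
print(np.isclose(dense_val, sparse_val))  # True
```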
Overview of the Proposed RT-NeRF
To Alleviate Bottleneck 1 (Locate the Pre-Existing Points): Propose a New Efficient Rendering Pipeline (only query points in the cube)
To Alleviate Bottleneck 2 (Compute Points’ Embeddings): Propose a Hybrid Sparse Encoding Scheme & Bi-Direction Trees (bitmap-based for denser matrices, coordinate-based for sparser ones)
RT-NeRF’s Algorithm Contribution
Contribution 3: Efficient Rendering Pipeline
Existing rendering pipeline [Chen et al., ECCV’22]: irregular accesses …
Our proposed efficient rendering pipeline: only query points in the cube; regular accesses …
[Figure: stored non-zero cubes’ locations]
Our Proposed Efficient Rendering Pipeline: Motivation
Efficient Rendering Pipeline: Implementation
Loop over all sampled points to locate the blue one
Irregular point cloud in the 3D space
Regular grid in the 2D plane
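The contrast between the two pipelines can be sketched as follows. This is a simplified illustration under our own assumptions, not RT-NeRF’s implementation, and the helper names are hypothetical. The point-first loop samples every ray and probes a dense occupancy grid once per point, which produces irregular accesses. The cube-first loop walks a stored list of non-zero cube locations in order, which produces regular accesses, and intersects each cube with the rays using the standard slab test. Only points inside the hit cubes would then need to be queried. Unit-size cubes on an integer lattice are assumed.

```python
import numpy as np


def ray_box_interval(origin, direction, box_min, box_max):
    """Slab test: return (t_near, t_far) where the ray overlaps the box, or None on a miss."""
    d = np.where(direction == 0.0, 1e-12, direction)  # avoid division by zero
    t0 = (box_min - origin) / d
    t1 = (box_max - origin) / d
    t_near = np.max(np.minimum(t0, t1))
    t_far = np.min(np.maximum(t0, t1))
    if t_far >= max(t_near, 0.0):
        return float(t_near), float(t_far)
    return None


def point_first_hits(rays, occupancy, n_samples=100, t_max=6.0):
    """Point-first: sample along every ray and probe the dense grid per point."""
    hits = {}
    for r, (o, d) in enumerate(rays):
        for t in np.linspace(0.0, t_max, n_samples):
            v = np.floor(o + t * d).astype(int)  # unit-size voxels
            if np.all(v >= 0) and np.all(v < occupancy.shape) and occupancy[tuple(v)]:
                hits.setdefault(r, set()).add(tuple(int(c) for c in v))
    return hits


def cube_first_hits(rays, cube_locs):
    """Cube-first: walk the stored non-zero cube locations sequentially."""
    hits = {}
    for loc in cube_locs:
        box_min = np.asarray(loc, dtype=float)
        for r, (o, d) in enumerate(rays):
            if ray_box_interval(o, d, box_min, box_min + 1.0) is not None:
                hits.setdefault(r, set()).add(tuple(loc))
    return hits


occupancy = np.zeros((4, 4, 4), dtype=bool)
cube_locs = [(1, 1, 1), (2, 3, 0)]  # hypothetical non-zero cubes
for loc in cube_locs:
    occupancy[loc] = True
rays = [(np.array([1.5, 1.5, -1.0]), np.array([0.0, 0.0, 1.0])),
        (np.array([3.5, 0.5, -1.0]), np.array([0.0, 0.0, 1.0]))]
print(point_first_hits(rays, occupancy))  # {0: {(1, 1, 1)}}
print(cube_first_hits(rays, cube_locs))   # {0: {(1, 1, 1)}}
```

Both loops find the same hits here. The difference lies in the memory access pattern: the first probes the grid at positions dictated by each ray, and the second reads the cube list front to back.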
RT-NeRF’s Hardware Contributions
Contribution 4: Hybrid Sparse Encoding
Our Proposed Hybrid Sparse Encoding: Motivation
A uniform sparse encoding/decoding scheme is suboptimal
Hybrid Sparse Encoding: Design Targets
Encoding Scheme | Storage Size (↓) | Decoding Throughput (↑) | Resource Utilization (↑) |
Bitmap-based | 🌟🌟🌟 | 🌟 | 🌟🌟🌟 |
Coordinate-based | 🌟🌟🌟 | 🌟🌟🌟 | 🌟 |
Our proposed | 🌟🌟🌟 | 🌟🌟🌟 | 🌟🌟🌟 |
Vanilla Hybrid Sparse Encoding & Decoding Implementation
1) If sparsity ratio < 80%: bitmap-based encoding & decoding
Problem: low decoding throughput due to location-dependent, varying decoding latencies
2) If sparsity ratio ≥ 80%: coordinate-based encoding & decoding
Problem: low decoding resource utilization when matrices are too sparse
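The vanilla switching rule can be sketched in a few lines. The sketch takes the sparsity ratio to be the fraction of zero elements and uses the 80% threshold from the slide. The two encoders are deliberately minimal, and their dictionary layout is a hypothetical format chosen for illustration only.

```python
import numpy as np

SPARSITY_THRESHOLD = 0.8  # 80% switch point from the slide


def sparsity_ratio(mat):
    """Fraction of zero elements in the matrix."""
    return float(np.mean(mat == 0))


def encode_bitmap(mat):
    """Bitmap-based: 1-bit map of non-zero positions plus the non-zero values (row-major)."""
    return {"scheme": "bitmap", "bitmap": (mat != 0).astype(np.uint8), "values": mat[mat != 0]}


def encode_coordinate(mat):
    """Coordinate-based: one (row, col, value) triple per non-zero element."""
    rows, cols = np.nonzero(mat)
    return {"scheme": "coordinate",
            "coords": [(int(r), int(c), float(mat[r, c])) for r, c in zip(rows, cols)]}


def hybrid_encode(mat):
    """Vanilla hybrid: bitmap for denser matrices, coordinates for sparser ones."""
    if sparsity_ratio(mat) < SPARSITY_THRESHOLD:
        return encode_bitmap(mat)
    return encode_coordinate(mat)


denser = np.array([[5, 0, 0, 2], [0, 0, 0, 0], [0, 7, 3, 0], [1, 0, 0, 4]], dtype=float)
sparser = np.zeros((8, 8))
sparser[1, 2] = sparser[6, 4] = 1.0
print(hybrid_encode(denser)["scheme"])   # bitmap (sparsity 10/16 = 62.5%)
print(hybrid_encode(sparser)["scheme"])  # coordinate (sparsity 62/64 ≈ 96.9%)
```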
Low Decoding Throughput For Bitmap-based Decoding
Various decoding latencies 🡪 Low decoding throughput
3 cycles for decoding*
8 cycles for decoding*
*Assuming an adder tree w/ 7 adders
Underutilization When Decoding Sparse Matrices
Wasted hardware resources 🡪 underutilization
Contribution 5: Improved Bitmap-based Scheme to Boost Throughput
Encoding: sparse bitmap matrix + non-zero element array + matrix row pointer vector
Bitmap encoding: 1-bit binary metadata per element (1 = non-zero, 0 = zero)
Matrix row pointer vector: addresses of the 1st non-zero element of each row, e.g., [0, 4, 6, 10, 15, 17, 20, 24]
Decoding with an index control unit and a 1-bit adder tree:
Cycle 1: Check whether the bitmap matrix element is 1 or 0
Cycle 2: Sum up the 1-bit bitmap vector, then add the target row pointer value (e.g., 1 + 6 = 7, the target element location)
Cycle 3: Fetch the target non-zero element
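The three decoding steps can be modeled in software as follows. This is a functional sketch, not a cycle-accurate model of the hardware. The 4 × 4 matrix is our own example; it is not the 8 × 8 matrix from the slide. The row pointer vector is built as a running count of non-zeros per row. A lookup then costs one bitmap check, one short popcount plus addition, and one fetch from the non-zero array.

```python
import numpy as np


def encode_bitmap_rowptr(mat):
    """Bitmap matrix + non-zero element array + row pointer vector.

    row_ptr[r] is the address (in the non-zero array) of the 1st non-zero element of row r.
    """
    bitmap = (mat != 0).astype(np.uint8)
    values = mat[mat != 0]  # row-major order
    counts = bitmap.sum(axis=1)
    row_ptr = np.concatenate(([0], np.cumsum(counts)[:-1]))
    return bitmap, values, row_ptr


def decode_element(bitmap, values, row_ptr, r, c):
    """Functional model of the three decoding steps."""
    # Step 1: check the bitmap element; 0 means the element is zero.
    if bitmap[r, c] == 0:
        return 0.0
    # Step 2: sum the 1-bit entries before column c (an adder tree in hardware),
    # then add the row pointer value to get the target element location.
    location = int(bitmap[r, :c].sum()) + int(row_ptr[r])
    # Step 3: fetch the target non-zero element.
    return float(values[location])


mat = np.array([[5, 0, 0, 2],
                [0, 0, 0, 0],
                [0, 7, 3, 0],
                [1, 0, 0, 4]], dtype=float)
bitmap, values, row_ptr = encode_bitmap_rowptr(mat)
print(row_ptr.tolist())                               # [0, 2, 2, 4]
print(decode_element(bitmap, values, row_ptr, 2, 2))  # 3.0 (location = 1 + 2 = 3)
```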
Contribution 6: Bi-Direction Trees to Boost Utilization
[Figure: an 8 × 8 sparse matrix (rows and columns indexed 0–7)]
Mode 1: Adder Sub-tree A + Adder Sub-tree B (compute points’ embeddings)
Mode 2: Adder Sub-tree A + Search Sub-tree B
Legend: search path, adder path, shared path, leaf node, trunk node
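The sketch below is a behavioral model of the two modes, built on our own assumptions about how they could be expressed in software. The same binary tree of 2-input nodes either reduces values bottom-up (adder direction) or routes a search key top-down (search direction). The sketch does not describe the hardware design. The mode numbering follows the slide, but the function names and the linearized search keys are hypothetical.

```python
def adder_subtree(values):
    """Adder direction (leaves -> trunk): pairwise sums, one tree level per step.

    len(values) must be a power of two.
    """
    level = list(values)
    while len(level) > 1:
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
    return level[0]


def search_subtree(sorted_keys, target):
    """Search direction (trunk -> leaves): each trunk node compares the target with the
    largest key in its left child and routes left or right. Returns a leaf index or None."""
    if not sorted_keys:
        return None
    lo, hi = 0, len(sorted_keys)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if target <= sorted_keys[mid - 1]:
            hi = mid
        else:
            lo = mid
    return lo if sorted_keys[lo] == target else None


def bi_direction_tree(mode, leaves_a, leaves_b, target=None):
    """Mode 1: both sub-trees add. Mode 2: sub-tree A adds, sub-tree B searches."""
    if mode == 1:
        return adder_subtree(leaves_a), adder_subtree(leaves_b)
    if mode == 2:
        return adder_subtree(leaves_a), search_subtree(leaves_b, target)
    raise ValueError(f"unknown mode: {mode}")


# Mode 1: two partial sums, e.g., of products for an embedding.
print(bi_direction_tree(1, [0.5, 1.0, 0.25, 0.25], [2.0, 0.0, 1.0, 1.0]))  # (2.0, 4.0)
# Mode 2: sub-tree B looks up hypothetical linearized coordinate keys.
print(bi_direction_tree(2, [1, 0, 1, 1], [3, 9, 17, 42], target=17))      # (3, 2)
```

Both modes reuse one tree shape, which reflects the utilization idea: when a matrix is too sparse for the adder path to be kept busy, half of the tree is repurposed for searching.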
Bi-Direction Trees: Reconfigurable Implementation
Evaluation Setup
Baseline platforms: Jetson Nano, ICARUS [Rao et al., SIGGRAPH Asia’22], RTX 2080Ti, Tesla 2080Ti, Threadripper 3970x
RT-NeRF’s Speedup Over Baselines
Compare with edge devices on different datasets (real-time target: 30 FPS)
RT-NeRF’s Energy Efficiency Over Baselines
Compare with edge devices on different datasets
Summary
Our RT-NeRF framework has delivered the first real-time neural rendering solution suited for edge applications
This work is supported by the National Science Foundation (NSF) through the CCRI program and the NSF RTML program
Hybrid Sparse Encoding: How to Encode Sparse Matrices?
[Figure: an 8 × 8 sparse matrix (x and y indexed 0–7)]
Non-zero element coordinates: (x,y) = (2,1), (0,3), (2,6), (3,7), (6,0), (5,1), (6,5), (4,6)
Search tree: store the coordinates in the leaves, then search the tree to locate the target non-zero element
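The coordinate-based lookup can be sketched with the eight coordinates listed above. This sketch rests on several assumptions. The leaves are ordered lexicographically by (x, y), which the slide does not specify. A binary search (Python’s `bisect_left`) stands in for walking the search tree. The stored values are placeholders. A coordinate missing from the leaves is decoded as zero.

```python
from bisect import bisect_left

# Non-zero coordinates from the slide; the stored values are hypothetical placeholders.
COORDS = [(2, 1), (0, 3), (2, 6), (3, 7), (6, 0), (5, 1), (6, 5), (4, 6)]
VALUES = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]


def build_search_leaves(coords, values):
    """Sort the (x, y) coordinates so they can sit in the leaves of a binary search tree."""
    pairs = sorted(zip(coords, values))
    return [c for c, _ in pairs], [v for _, v in pairs]


def lookup(leaf_coords, leaf_values, target):
    """Binary search over the sorted leaves; a missing coordinate means the element is zero."""
    i = bisect_left(leaf_coords, target)
    if i < len(leaf_coords) and leaf_coords[i] == target:
        return leaf_values[i]
    return 0.0


leaf_coords, leaf_values = build_search_leaves(COORDS, VALUES)
print(leaf_coords)  # [(0, 3), (2, 1), (2, 6), (3, 7), (4, 6), (5, 1), (6, 0), (6, 5)]
print(lookup(leaf_coords, leaf_values, (3, 7)))  # 0.4
print(lookup(leaf_coords, leaf_values, (1, 1)))  # 0.0 (not stored, so zero)
```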
Hybrid Sparse Encoding: How to Decode Sparse Matrices?