Instant-3D: Instant Neural Radiance Field Training Towards On-Device AR/VR 3D Reconstruction
Sixu Li*, Chaojian Li*, Wenbo Zhu, Boyang Yu, Yang Zhao,
Cheng Wan, Haoran You, Huihong Shi, and Yingyan (Celine) Lin
Georgia Institute of Technology
The 50th International Symposium on
Computer Architecture (ISCA 2023)
Have you ever used 3D reconstruction to create something in the digital world?
[Source: https://jonbarron.info/mipnerf360]
Background: What is 3D Reconstruction?
[Figure: 3D Reconstruction. Source: https://jonbarron.info/mipnerf360]
3D Reconstruction Demand has Surged
[Source: y2u.be/TX9qSaGXFyg]
[Source: y2u.be/afdnbXXbBTg]
[Source: y2u.be/XXXMrD7aWNs]
Virtual Telepresence
Metaverse
Rescue Robots
[Source: shorturl.at/aDHY6]
On-Device 3D Reconstruction: Not Yet Possible
[Müller et al., SIGGRAPH’22]
Contribution 1: Identify the Bottleneck
Contribution 2: Instant-3D Algorithm
Norm. Grid Size of Density Branch | Norm. Grid Size of Color Branch | Avg. Training Runtime (s)* | Avg. Test PSNR/Accuracy** |
1 | 1 | 72 | 26.0 |
0.25 | 1 | 65 (↓ 9.7%) | 25.4 |
1 | 0.25 | 63 (↓ 12.5%) | 26.0 |
* Training time is measured on an edge GPU [Source: shorturl.at/rFMS0]
** PSNR/accuracy is measured on the NeRF-Synthetic dataset [Mildenhall et al., ECCV’20]
[Slide callouts: Color Branch, Density Branch, Winning Configuration 👍]
Norm. Update Freq. of Density Branch | Norm. Update Freq. of Color Branch | Avg. Training Runtime (s)* | Avg. Test PSNR/Accuracy** |
1 | 1 | 72 | 26.0 |
0.5 | 1 | 67 (↓ 6.9%) | 24.3 |
1 | 0.5 | 65 (↓ 9.7%) | 25.9 |
* Training time is measured on an edge GPU [Source: shorturl.at/rFMS0]
** PSNR/accuracy is measured on the NeRF-Synthetic dataset [Mildenhall et al., ECCV’20]
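The runtime reductions quoted in the two tables above can be re-derived from the reported runtimes. The snippet below is only an arithmetic check; the helper name `reduction_pct` and the dictionary layout are illustrative choices, not something taken from the slides.

```python
# Baseline: both branches at normalized grid size / update frequency 1.
BASELINE_RUNTIME_S = 72

# Avg. training runtimes (seconds) copied from the two tables.
# Keys are (knob, density setting, color setting).
RUNTIMES_S = {
    ("grid size", 0.25, 1): 65,
    ("grid size", 1, 0.25): 63,
    ("update freq.", 0.5, 1): 67,
    ("update freq.", 1, 0.5): 65,
}


def reduction_pct(runtime_s, baseline_s=BASELINE_RUNTIME_S):
    """Relative runtime reduction versus the baseline, in percent."""
    return 100.0 * (baseline_s - runtime_s) / baseline_s


for (knob, density, color), runtime_s in RUNTIMES_S.items():
    print(f"{knob}: density={density}, color={color} -> "
          f"{reduction_pct(runtime_s):.1f}% faster")
```

In both tables, shrinking or slowing the color branch gives a runtime reduction at least as large as doing the same to the density branch, with little or no PSNR loss.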
[Slide callouts: Color Branch, Density Branch, Winning Configuration 👍]
Density branch: larger grid size, more frequent weight updates
Color branch: smaller grid size, less frequent weight updates
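A minimal sketch of how such a decoupled configuration could be expressed is shown below. The class `BranchConfig`, the functions `should_update` and `count_updates`, and the idea of combining the 0.25 grid size with the 0.5 update frequency in a single configuration are all assumptions made for illustration; the tables evaluate the two knobs separately.

```python
from dataclasses import dataclass


@dataclass
class BranchConfig:
    """Hypothetical per-branch settings, normalized to the baseline (1.0)."""
    name: str
    norm_grid_size: float
    norm_update_freq: float


def should_update(branch: BranchConfig, iteration: int) -> bool:
    """Update every 1/norm_update_freq iterations (assumed schedule)."""
    period = round(1 / branch.norm_update_freq)
    return iteration % period == 0


def count_updates(branch: BranchConfig, num_iterations: int) -> int:
    """Number of weight updates the branch receives over a training run."""
    return sum(should_update(branch, i) for i in range(num_iterations))


# Density keeps the full grid and is updated every iteration;
# color uses a smaller grid and is updated every other iteration.
density = BranchConfig("density", norm_grid_size=1.0, norm_update_freq=1.0)
color = BranchConfig("color", norm_grid_size=0.25, norm_update_freq=0.5)

for branch in (density, color):
    print(branch.name, count_updates(branch, 100))
```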
Contribution 3: Instant-3D Accelerator
[Figure: memory addresses of the accesses. Avg. inter-group distance: 60,000; avg. intra-group distance: 2 (correlated with their positions in the 3D grid)]
Instant-3D Accelerator: Overview
Observations
Low utilization in read requests to SRAM (idle banks)
Frequent write requests to the same address
Different model sizes for different applications
Proposed Techniques
Pre-fetches & executes memory accesses
Accumulates requests into necessary ones
Fuses processing cores for flexibility
Instant-3D Acc.: Feed-Forward Read Mapper
[Figure: memory addresses of read requests. Avg. inter-group distance: 60,000; avg. intra-group distance: 2]
[Figure: SRAM bank to be accessed at each timestep. Idle banks cause low utilization]
Feed-Forward Read Mapper (FRM) Unit
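The effect of idle banks can be illustrated with a toy model. This is not the FRM's actual microarchitecture: the bank mapping `address % num_banks`, the function names, and the look-ahead window are assumptions chosen only to show why issuing reads strictly in order leaves banks idle, while looking ahead over pending reads can fill them.

```python
def in_order_timesteps(addresses, num_banks):
    """Issue reads strictly in order; stop a timestep at the first bank conflict."""
    steps, i = 0, 0
    while i < len(addresses):
        busy = set()
        while i < len(addresses) and addresses[i] % num_banks not in busy:
            busy.add(addresses[i] % num_banks)
            i += 1
        steps += 1
    return steps


def lookahead_timesteps(addresses, num_banks, window):
    """Each timestep, scan the first `window` pending reads and issue one per free bank."""
    pending = list(addresses)
    steps = 0
    while pending:
        busy, remaining = set(), []
        for pos, addr in enumerate(pending):
            bank = addr % num_banks
            if pos < window and bank not in busy:
                busy.add(bank)          # this read is served in the current timestep
            else:
                remaining.append(addr)  # conflict or outside the window: wait
        pending = remaining
        steps += 1
    return steps


reads = [0, 4, 1, 5, 2, 6, 3, 7]  # consecutive pairs collide on the same bank
for name, steps in (("in-order", in_order_timesteps(reads, 4)),
                    ("look-ahead", lookahead_timesteps(reads, 4, window=8))):
    print(name, "timesteps:", steps, "utilization:", len(reads) / (steps * 4))
```

Reordering is safe in this sketch because the requests are reads; the write path has different constraints.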
Instant-3D Acc.: Back-Prop. Update Merger
[Figure: memory address vs. timestep over a sliding window of 1,000 consecutive accesses]
Temporal Locality
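Temporal locality means that several gradient updates to the same address arrive close together, so they can be accumulated before anything is written back. The sketch below shows that general idea with a small buffer; the function name `merge_updates`, the buffer size, and the oldest-first eviction policy are assumptions, not the BUM design described in the talk.

```python
from collections import OrderedDict


def merge_updates(updates, buffer_entries):
    """Accumulate (address, gradient) updates in a small buffer.

    Returns the list of (address, accumulated_gradient) writes that would
    actually be sent to SRAM. Eviction is oldest-first (an assumption).
    """
    buffer = OrderedDict()
    writes = []
    for addr, grad in updates:
        if addr in buffer:
            buffer[addr] += grad  # same address seen recently: merge in place
        else:
            if len(buffer) == buffer_entries:
                writes.append(buffer.popitem(last=False))  # evict the oldest entry
            buffer[addr] = grad
    writes.extend(buffer.items())  # flush what is left at the end
    return writes


updates = [(10, 1.0), (10, 0.5), (42, 2.0), (10, 0.25), (42, 1.0)]
print(merge_updates(updates, buffer_entries=4))
print(merge_updates(updates, buffer_entries=1))
```

With enough buffer entries, five update requests collapse into two writes; with a single entry, the interleaved addresses defeat most of the merging.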
Instant-3D Acc.: Multi-Core Fusion Scheme
Different Grid Sizes in Instant-3D Algorithm
Different Grid Sizes in Scenes with Varying Scales
Evaluation Setup
[Mildenhall et al., ECCV’20]
[Courteaux et al., MMSys’22]
[Dai et al., CVPR’17]
Device | Jetson Nano | Jetson TX2 | Xavier NX | Instant-3D (Ours) |
SRAM | 2.5 MB | 5 MB | 11 MB | 1.5 MB |
Area | 118 mm² | N/A | 350 mm² | 6.8 mm² |
Frequency | 0.9 GHz | 1.4 GHz | 1.1 GHz | 0.8 GHz |
Instant-3D’s Speedup Over Baselines
Instant-NGP on Edge GPU
Our Instant-3D
[Müller et al., SIGGRAPH’22]
Key Insights in Instant-3D
Our Instant-3D has delivered the first instant on-device Neural Radiance Fields (NeRF)-based 3D reconstruction
Instant-3D Algorithm
Observation: Different quality sensitivities on color and density branches
Allocate different model complexities for color/density
Instant-3D Accelerator
Observation: Predictable memory access patterns during dominant steps
Dedicated design to reorganize memory accesses and fuse cores
This work was supported by the NSF through the CCF program and CoCoSys, one of the seven centers in JUMP 2.0, a Semiconductor Research Corporation (SRC) program sponsored by DARPA.
FRM & BUM Overheads
Speedup Breakdown
Compare with NeRF Inference Accelerator
Why On-The-Fly 3D Reconstruction
[Dai et al., CVPR’17]
Implementation Details
What If the Hash Table Size Is Larger?
[Figure: the table is split into two 50% halves; each half is applied to the same 3D points (Array A)]
Step 1: Load the first 50% of the table and process the input
Step 2: Load the remaining 50% of the table and process the same input
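The two steps can be sketched as a two-pass lookup in which only half of the table is resident at any time. The function name `lookup_in_two_passes` and the plain Python lists standing in for on-chip storage are illustrative assumptions.

```python
def lookup_in_two_passes(table, indices):
    """Resolve table lookups while holding only half of the table at a time."""
    half = len(table) // 2
    results = [None] * len(indices)
    for start, end in ((0, half), (half, len(table))):
        loaded = table[start:end]  # stands in for the half currently loaded on chip
        for i, idx in enumerate(indices):  # the same input is processed in both passes
            if start <= idx < end:
                results[i] = loaded[idx - start]
    return results


table = list(range(100, 108))  # toy 8-entry table
indices = [7, 0, 3, 4]         # toy lookups derived from the input points
print(lookup_in_two_passes(table, indices))
```

The cost of this scheme is that the input is traversed twice, once per loaded half.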
Why Temporal Locality Exists
Only Compare with Commercial Devices
Device | Jetson Nano | Jetson TX2 | Xavier NX | Instant-3D (Ours) |
SRAM | 2.5 MB | 5 MB | 11 MB | 1.5 MB |
Area | 118 mm² | N/A | 350 mm² | 6.8 mm² |
Frequency | 0.9 GHz | 1.4 GHz | 1.1 GHz | 0.8 GHz |
SOTA Efficient Algorithm: How Does it Work?
[Mildenhall et al., ECCV’20]
[Müller et al., SIGGRAPH’22]
Reason for the Observed Memory Access Pattern
In the same cell
Access different addresses randomly
[Müller et al., SIGGRAPH’22]
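For reference, the spatial hash published in the cited Instant-NGP paper XORs the integer vertex coordinates after multiplying each by a per-axis prime, then takes the result modulo the table size. The sketch below applies it to the eight vertices of one grid cell; the function names and the table size of 2^19 are assumptions (a power-of-two table size is assumed so that Python's unbounded integers give the same low bits as 32-bit arithmetic).

```python
# Per-axis primes from the Instant-NGP paper; the first one is 1.
PRIMES = (1, 2654435761, 805459861)


def spatial_hash(x, y, z, table_size):
    """Hash integer grid coordinates into [0, table_size)."""
    return ((x * PRIMES[0]) ^ (y * PRIMES[1]) ^ (z * PRIMES[2])) % table_size


def cell_vertex_addresses(x, y, z, table_size):
    """Addresses of the 8 corner vertices of the cell whose lowest corner is (x, y, z)."""
    return [spatial_hash(x + dx, y + dy, z + dz, table_size)
            for dz in (0, 1) for dy in (0, 1) for dx in (0, 1)]


addrs = cell_vertex_addresses(4, 7, 9, 2**19)
for i in range(0, 8, 2):
    # Each printed pair differs only in x; pairs differ from each other in y and/or z.
    print(addrs[i], addrs[i + 1])
```

Because the x-axis prime is 1, two vertices that differ only in x tend to land on nearby addresses, while a step in y or z typically jumps to an unrelated part of the table.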
Why Not Directly Update SRAM Frequently?
We Already Have GPUs, Why Accelerator?
A Good Time for Building an Accelerator?