1 of 70

Instant-3D: Instant Neural Radiance Field Training Towards On-Device AR/VR 3D Reconstruction

Sixu Li*, Chaojian Li*, Wenbo Zhu, Boyang Yu, Yang Zhao,

Cheng Wan, Haoran You, Huihong Shi, and Yingyan (Celine) Lin

Georgia Institute of Technology

The 50th International Symposium on

Computer Architecture (ISCA 2023)

2 of 70

Have you ever used 3D reconstruction to create something in the digital world?

[Source: https://jonbarron.info/mipnerf360]

3 of 70

Background: What is 3D Reconstruction?

[Figure: sparsely sampled input images → 3D Reconstruction → rendered novel views]

[Source: https://jonbarron.info/mipnerf360]

  • Input: Sparsely sampled images
  • Output: 2D images from any new view of the same scene

4 of 70

3D Reconstruction Demand has Surged

 

5 of 70

3D Reconstruction Demand has Surged

Virtual Telepresence · Metaverse · Rescue Robots

[Sources: y2u.be/TX9qSaGXFyg, y2u.be/afdnbXXbBTg, y2u.be/XXXMrD7aWNs]

 

  • Estimated market value: > $1.8 billion by 2028

[Source: shorturl.at/aDHY6]

6 of 70

On-Device 3D Reconstruction: Not Yet Possible 

  • Instant on-device 3D reconstruction is highly desirable

7 of 70

On-Device 3D Reconstruction: Not Yet Possible 

 

[Müller et al., SIGGRAPH’22]

8 of 70

Contribution 1: Identify the Bottleneck

 


15 of 70

Contribution 2: Instant-3D Algorithm

  • We leverage the fact that the reconstruction quality has different sensitivities to the color and density branches

16 of 70

Contribution 2: Instant-3D Algorithm

  • We leverage the fact that the reconstruction quality has different sensitivities to the color and density branches

17 of 70

Contribution 2: Instant-3D Algorithm

  • Experiments on the two branches w/ different grid sizes

| Norm. Grid Size of Density Branch | Norm. Grid Size of Color Branch | Avg. Training Runtime (s)* | Avg. Test PSNR/Accuracy** |
| 1    | 1    | 72           | 26.0 |
| 0.25 | 1    | 65 (↓ 9.7%)  | 25.4 |
| 1    | 0.25 | 63 (↓ 12.5%) | 26.0 | 👍 Winning configuration

* Training time is measured on an edge GPU [Source: shorturl.at/rFMS0]
** PSNR/accuracy is measured on the NeRF-Synthetic dataset [Mildenhall et al., ECCV’20]

[Figure: the color and density branches of the model, with the winning configuration highlighted]

18 of 70

Contribution 2: Instant-3D Algorithm

  • Experiments on the two branches w/ different update freq.

| Norm. Update Freq. of Density Branch | Norm. Update Freq. of Color Branch | Avg. Training Runtime (s)* | Avg. Test PSNR/Accuracy** |
| 1   | 1   | 72          | 26.0 |
| 0.5 | 1   | 67 (↓ 6.9%) | 24.3 |
| 1   | 0.5 | 65 (↓ 9.7%) | 25.9 | 👍 Winning configuration

* Training time is measured on an edge GPU [Source: shorturl.at/rFMS0]
** PSNR/accuracy is measured on the NeRF-Synthetic dataset [Mildenhall et al., ECCV’20]

[Figure: the color and density branches of the model, with the winning configuration highlighted]

19 of 70

Contribution 2: Instant-3D Algorithm

  • We leverage the fact that the reconstruction quality has different sensitivities to the color and density branches
  • Our Instant-3D algorithm therefore allocates different model complexities to the two branches (see the sketch below):
    • Density branch: larger grid size, more frequent weight updates
    • Color branch: smaller grid size, less frequent weight updates
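As a concrete illustration of this asymmetric allocation, here is a minimal PyTorch-style sketch (not the authors' implementation); the grid resolutions, update interval, and toy loss are assumed values for illustration only.

```python
# Hypothetical sketch of Instant-3D's asymmetric allocation: the density branch
# gets the larger grid and is updated every step, while the color branch gets a
# smaller grid and is updated less frequently. All sizes are placeholders.
import torch

DENSITY_RES, COLOR_RES = 64, 32          # density branch: larger grid
COLOR_UPDATE_EVERY = 2                   # color branch: less frequent updates

density_grid = torch.nn.Embedding(DENSITY_RES ** 3, 2)
color_grid = torch.nn.Embedding(COLOR_RES ** 3, 2)

opt_density = torch.optim.Adam(density_grid.parameters(), lr=1e-2)
opt_color = torch.optim.Adam(color_grid.parameters(), lr=1e-2)

def toy_loss(dg, cg):
    # Stand-in for volume rendering + photometric loss against training views.
    idx_d = torch.randint(0, dg.num_embeddings, (256,))
    idx_c = torch.randint(0, cg.num_embeddings, (256,))
    return dg(idx_d).square().mean() + cg(idx_c).square().mean()

for step in range(100):
    loss = toy_loss(density_grid, color_grid)
    loss.backward()
    opt_density.step()                   # density branch: update every iteration
    opt_density.zero_grad()
    if step % COLOR_UPDATE_EVERY == 0:   # color branch: update only every 2nd iteration
        opt_color.step()
        opt_color.zero_grad()
```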

20 of 70

Contribution 3: Instant-3D Accelerator

  • We observe: the memory access pattern during embedding grid interpolation is predictable

21 of 70

Contribution 3: Instant-3D Accelerator

  • We observe: the memory access pattern during embedding grid interpolation is predictable

 

 

 

 

 

 

[Figure: memory addresses of embedding-grid reads, which correlate with their positions in the 3D grid; avg. inter-group distance: 60,000; avg. intra-group distance: 2]
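To make this observation concrete, below is a small, self-contained profiling sketch (not from the paper) that measures how far apart addresses are within a burst of grid lookups versus across bursts; the synthetic trace and the notion of a "burst" are assumptions for illustration.

```python
# Hypothetical profiling sketch: given a trace of grid-lookup addresses emitted
# in bursts, measure how far apart addresses are within a burst vs. across bursts.
import random

def avg_intra_inter(bursts):
    intra, inter = [], []
    for b in bursts:
        intra += [abs(b[i + 1] - b[i]) for i in range(len(b) - 1)]
    for b0, b1 in zip(bursts, bursts[1:]):
        inter.append(abs(b1[0] - b0[-1]))
    return sum(intra) / len(intra), sum(inter) / len(inter)

# Synthetic trace: each burst touches nearby addresses; bursts land far apart.
bursts = []
for _ in range(100):
    base = random.randrange(0, 2**20)
    bursts.append([base + 2 * i for i in range(8)])

intra, inter = avg_intra_inter(bursts)
print(f"avg intra-burst distance: {intra:.1f}, avg inter-burst distance: {inter:.1f}")
```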

22 of 70

Contribution 3: Instant-3D Accelerator

  • Our Instant-3D accelerator reorganizes memory accesses to reduce data movement
  • We observe: the memory access pattern during embedding grid interpolation is predictable

23 of 70

Instant-3D Accelerator: Overview

  • The Instant-3D accelerator leverages the properties of the Instant-3D algorithm and our observations on its memory access patterns

Observation 1: Low utilization in read requests to SRAM (idle banks)

[Figure: SRAM bank activity over timesteps, showing idle banks]

24 of 70

Instant-3D Accelerator: Overview

  • The Instant-3D accelerator leverages the properties of the Instant-3D algorithm and our observations on its memory access patterns

Observation 1: Low utilization in read requests to SRAM
Observation 2: Frequent write requests to the same memory address

[Figure: memory addresses of write requests over time]

25 of 70

Instant-3D Accelerator: Overview

  • The Instant-3D accelerator leverages the properties of the Instant-3D algorithm and our observations on its memory access patterns

Observation 1: Low utilization in read requests to SRAM
Observation 2: Frequent write requests to the same memory address
Observation 3: Different model sizes for different applications

26 of 70

Instant-3D Accelerator: Overview

  • The Instant-3D accelerator further reduces the dominant memory accesses based on these observations

| Observation | Proposed Technique |
| Low utilization in read requests to SRAM | Pre-fetch & pre-execute memory accesses (Feed-Forward Read Mapper) |
| Frequent write requests to the same address | Accumulate requests into only the necessary ones (Back-Propagation Update Merger) |
| Different model sizes for different applications | Fuse processing cores for flexibility (Multi-Core Fusion) |

27 of 70

Instant-3D Acc.: Feed-Forward Read Mapper

  • We observe:
    • The large inter-group distance results in low bandwidth utilization when accessing multi-bank SRAM

28 of 70

Instant-3D Acc.: Feed-Forward Read Mapper

 

 

 

 

 

 

 

 

[Figure: memory addresses of grid reads and the SRAM banks they access; avg. inter-group distance: 60,000; avg. intra-group distance: 2; idle banks cause low utilization]

  • We observe:
    • The large inter-group distance results in low bandwidth utilization when accessing multi-bank SRAM

29 of 70

Instant-3D Acc.: Feed-Forward Read Mapper

  • We observe:
    • The large inter-group distance results in low bandwidth utilization when accessing multi-bank SRAM
    • Sequential read requests access different SRAM banks

[Figure: SRAM bank to be accessed at each timestep]

30 of 70

Instant-3D Acc.: Feed-Forward Read Mapper

  • We observe:
    • The large inter-group distance results in low bandwidth utilization when accessing multi-bank SRAM
    • Sequential read requests access different SRAM banks

  • Our Feed-Forward Read Mapper (FRM) pre-fetches and pre-executes memory accesses whenever idle banks exist (a behavioral sketch follows)

[Figure: Feed-Forward Read Mapper (FRM) unit]
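A minimal behavioral sketch of the FRM idea, assuming a simple bank-conflict model (this is an illustration, not the RTL): when the in-order read stream would leave SRAM banks idle in a cycle, look ahead in the request queue and issue pending reads that target those idle banks.

```python
# Behavioral sketch of a feed-forward read mapper (illustrative, not the RTL).
# Each pending read targets one SRAM bank; per cycle at most one read can be
# issued per bank. The mapper looks ahead in the queue and "pre-executes" reads
# whose banks would otherwise sit idle this cycle.
from collections import deque

NUM_BANKS = 4

def schedule(reads, lookahead=8):
    queue, cycles = deque(reads), 0
    while queue:
        busy, issued = set(), []
        # Scan up to `lookahead` pending reads; issue each one whose bank is free.
        for r in list(queue)[:lookahead]:
            bank = r % NUM_BANKS
            if bank not in busy:
                busy.add(bank)
                issued.append(r)
        for r in issued:
            queue.remove(r)
        cycles += 1
    return cycles

# Worst case for a naive in-order scheduler: many consecutive reads hit bank 0.
trace = [0, 4, 8, 1, 12, 16, 2, 3] * 10
print("cycles with look-ahead mapping:", schedule(trace))
print("cycles issuing strictly in order:", schedule(trace, lookahead=1))
```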

31 of 70

Instant-3D Acc.: Back-Prop. Update Merger

  • We observe: there exist frequent memory write requests to the same address during the back-propagation process

32 of 70

Instant-3D Acc.: Back-Prop. Update Merger

  • We observe: there exist frequent memory write requests to the same address during the back-propagation process

[Figure: write requests within a sliding window of 1,000 continuous accesses]

33 of 70

Instant-3D Acc.: Back-Prop. Update Merger

  • We observe: there exist frequent memory write requests to the same address during the back-propagation process

[Figure: memory addresses of write requests over time, exhibiting temporal locality]

34 of 70

Instant-3D Acc.: Back-Prop. Update Merger

  • We observe: there exist frequent memory write requests to the same address during the back-propagation process

  • Our Back-Propagation Update Merger (BUM) accumulates write requests in a buffer so that only the necessary SRAM accesses are issued (a behavioral sketch follows)
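A minimal behavioral sketch of the BUM idea (an illustration, not the RTL): gradient writes to the same hash-table address are merged in a small buffer, so only one SRAM update per unique address is needed when the buffer is flushed; the buffer capacity and request trace are assumed values.

```python
# Behavioral sketch of a back-propagation update merger (illustrative, not the RTL):
# gradient write requests to the same address are accumulated in a small buffer,
# so only one SRAM update per unique address is issued when the buffer is flushed.
def merge_updates(write_requests, buffer_capacity=16):
    buffer, sram_writes = {}, 0
    for addr, grad in write_requests:
        if addr in buffer:
            buffer[addr] += grad            # merged in the buffer, no SRAM access
        else:
            if len(buffer) == buffer_capacity:
                sram_writes += len(buffer)  # flush: one SRAM update per entry
                buffer.clear()
            buffer[addr] = grad
    sram_writes += len(buffer)              # final flush
    return sram_writes

reqs = [(a % 8, 0.1) for a in range(1000)]  # heavy reuse of 8 addresses
print("SRAM writes with merging:", merge_updates(reqs))
print("SRAM writes without merging:", len(reqs))
```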


39 of 70

Instant-3D Acc.: Multi-Core Fusion Scheme

  • We observe: different grid sizes are necessary to accommodate the Instant-3D algorithm and scenes of varying scales

40 of 70

Instant-3D Acc.: Multi-Core Fusion Scheme

  • We observe: different grid sizes are necessary to accommodate the Instant-3D algorithm and scenes of varying scales

Different Grid Sizes in Instant-3D Algorithm

41 of 70

Instant-3D Acc.: Multi-Core Fusion Scheme

  • We observe: different grid sizes are necessary to accommodate the Instant-3D algorithm and scenes of varying scales

Different Grid Sizes in Instant-3D Algorithm

Different Grid Sizes in Scenes with Varying Scales

42 of 70

Instant-3D Acc.: Multi-Core Fusion Scheme

  • We observe: different grid sizes are necessary to accommodate the Instant-3D algorithm and scenes of varying scales
  • Our Multi-Core Fusion Scheme fuses processing cores, each built for a fixed grid size, to support flexible grid sizes (a routing sketch follows)
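An illustrative sketch of how fusion could be modeled (an assumption about the scheme, not the actual hardware): each core owns a fixed-size slice of grid storage, and a larger grid is supported by fusing several cores and routing each lookup to the core that owns its slice.

```python
# Illustrative sketch of multi-core fusion: each core owns a fixed-size slice of
# hash-table storage; a larger grid is supported by fusing several cores and
# routing each lookup to the core that owns its slice. Capacity is an assumed value.
CORE_TABLE_ENTRIES = 2**14   # fixed per-core capacity (assumed)

def cores_needed(total_entries):
    return -(-total_entries // CORE_TABLE_ENTRIES)   # ceiling division

def route(address, total_entries):
    """Return (core_id, local_address) for a lookup into the fused table."""
    assert address < total_entries
    return address // CORE_TABLE_ENTRIES, address % CORE_TABLE_ENTRIES

total = 3 * CORE_TABLE_ENTRIES + 123    # a grid larger than one core can hold
print("fused cores:", cores_needed(total))
print("lookup 40000 ->", route(40000, total))
```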

43 of 70

Instant-3D Acc.: Multi-Core Fusion Scheme

  • Our Multi-Core Fusion Scheme fuses processing cores with a fixed grid size to support flexible grid sizes

 


47 of 70

Evaluation Setup

  • Consider 13 scenes from 3 datasets
    • Commonly-used NeRF-Synthetic dataset [Mildenhall et al., ECCV’20]
    • Large-scale SILVR dataset [Courteaux et al., MMSys’22]
    • Real-world-captured ScanNet dataset [Dai et al., CVPR’17]

48 of 70

Evaluation Setup

  • Consider 13 scenes from 3 datasets
    • Commonly-used NeRF-Synthetic dataset [Mildenhall et al., ECCV’20]
    • Large-scale SILVR dataset [Courteaux et al., MMSys’22]
    • Real-world-captured ScanNet dataset [Dai et al., CVPR’17]
  • Benchmark against 3 baselines with various computing resources

| Device    | Jetson Nano | Jetson TX2 | Xavier NX | Instant-3D (Ours) |
| SRAM      | 2.5 MB      | 5 MB       | 11 MB     | 1.5 MB            |
| Area      | 118 mm²     | N/A        | 350 mm²   | 6.8 mm²           |
| Frequency | 0.9 GHz     | 1.4 GHz    | 1.1 GHz   | 0.8 GHz           |

49 of 70

Instant-3D’s Speedup Over Baselines

 

[Figure: training runtime of Instant-NGP [Müller et al., SIGGRAPH’22] on an edge GPU vs. our Instant-3D]

50 of 70

Key Insights in Instant-3D

Instant-3D Algorithm
  • Observation: different quality sensitivities on the color and density branches
  • Solution: allocate different model complexities for color/density

Instant-3D Accelerator
  • Observation: predictable memory access patterns during the dominant steps
  • Solution: dedicated design to reorganize memory accesses and fuse cores

51 of 70

Key Insights in Instant-3D

Our Instant-3D has delivered the first instant on-device Neural Radiance Fields (NeRF)-based 3D reconstruction

  • Instant-3D Algorithm: allocate different model complexities for color/density
  • Instant-3D Accelerator: dedicated design to reorganize memory accesses and fuse cores

52 of 70

Instant-3D: Instant Neural Radiance Field Training Towards On-Device AR/VR 3D Reconstruction

The 50th International Symposium on

Computer Architecture (ISCA 2023)

This work was supported by the NSF through the CCF program and CoCoSys, one of the seven centers in JUMP 2.0, a Semiconductor Research Corporation (SRC) program sponsored by DARPA.

Sixu Li*, Chaojian Li*, Wenbo Zhu, Boyang Yu, Yang Zhao,

Cheng Wan, Haoran You, Huihong Shi, and Yingyan (Celine) Lin

Georgia Institute of Technology

53 of 70

FRM & BUM Overheads

  • As shown in Fig. 15b, the FRM and BUM units take 25% of the area and energy but provide a 3.1× speedup.

54 of 70

Speedup Breakdown

55 of 70

Compare with NeRF Inference Accelerator

56 of 70

Why On-The-Fly 3D Reconstruction

  • Although offloading to the cloud is possible, it may cause undesired latency and privacy concerns
  • On-device 3D reconstruction offers
    • Smaller communication volume (a 20 MB reconstructed model vs. 120 MB of JPEG images)
    • Enhanced privacy
    • An alternative to offloading when the internet is unstable or unavailable

[Dai et al., CVPR’17]

57 of 70

Implementation Details

  • Technology Node: 28nm HPC+
  • Corner: TT 25℃
  • Instant-3D accelerator is implemented in RTL
  • Toolchain:
    • Synthesis: Synopsys Design Compiler
    • Place & Route: Cadence Innovus
  • IP used: DesignWare Floating Point Units

58 of 70

What If the Hash Table Size is Larger

  • For hash tables larger than the on-chip SRAM, the required processing time grows with the table size
  • For example, a 2 MB hash table needs to be processed in two passes (sketched below):
    • Step 1: load the first 50% of the table into SRAM and process the input 3D points
    • Step 2: load the remaining 50% of the table and process the same input again
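A minimal sketch of this two-pass scheme (illustrative only; the SRAM capacity and data layout are assumed):

```python
# Illustrative sketch of the two-pass scheme for a hash table larger than the
# on-chip SRAM: load one half of the table, process all sampled points that hit
# that half, then load the other half and process the same points again.
SRAM_ENTRIES = 1 << 16                       # assumed on-chip capacity (entries)

def process(points, table, chunks=2):
    chunk = len(table) // chunks
    results = [0.0] * len(points)
    for c in range(chunks):                  # load one chunk into SRAM at a time
        lo, hi = c * chunk, (c + 1) * chunk
        on_chip = table[lo:hi]
        for i, addr in enumerate(points):    # reuse the same input points
            if lo <= addr < hi:
                results[i] = on_chip[addr - lo]
    return results

table = [float(i) for i in range(2 * SRAM_ENTRIES)]   # table twice the SRAM capacity
points = [5, SRAM_ENTRIES + 7, 123]
print(process(points, table))
```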

59 of 70

Why Temporal Locality Exists

  • Different training rays can pass through the same sampled point, corresponding to the same memory address

60 of 70

Only Compare with Commercial Devices

  • We have tried our best to ensure fairness in terms of computing resources

| Device    | Jetson Nano | Jetson TX2 | Xavier NX | Instant-3D (Ours) |
| SRAM      | 2.5 MB      | 5 MB       | 11 MB     | 1.5 MB            |
| Area      | 118 mm²     | N/A        | 350 mm²   | 6.8 mm²           |
| Frequency | 0.9 GHz     | 1.4 GHz    | 1.1 GHz   | 0.8 GHz           |

61 of 70

Only Compare with Commercial Devices

  • We have tried our best to ensure fairness in terms of computing resources
  • As this is the first accelerator of its kind, we can only compare with the best commercial devices

62 of 70

Only Compare with Commercial Devices

  • We have tried our best to ensure fairness in terms of computing resources
  • As this is the first accelerator of its kind, we can only compare with the best commercial devices
  • We believe our Instant-3D can serve as a starting point for multiple communities' efforts to build the next 3D reconstruction accelerators

63 of 70

SOTA Efficient Algorithm: How Does it Work?

  • Volume rendering 1) casts a ray from the camera origin through each pixel and 2) aggregates the queried features of the points sampled along the ray (see the formulation below)

[Mildenhall et al., ECCV’20]
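For reference, the standard volume-rendering formulation from NeRF [Mildenhall et al., ECCV’20], which aggregates the densities σ_i and colors c_i of the samples along a ray with spacing δ_i:

```latex
\hat{C}(\mathbf{r}) = \sum_{i=1}^{N} T_i \,\bigl(1 - e^{-\sigma_i \delta_i}\bigr)\,\mathbf{c}_i,
\qquad
T_i = \exp\!\Bigl(-\textstyle\sum_{j=1}^{i-1} \sigma_j \delta_j\Bigr)
```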

64 of 70

SOTA Efficient Algorithm: How Does it Work?

 

[Mildenhall et al., ECCV’20]

65 of 70

SOTA Efficient Algorithm: How Does it Work?

  • The queried features are generated by an MLP that takes the point's viewing direction and an embedding fetched from a hash table as input (a simplified sketch follows)

[Müller et al., SIGGRAPH’22]
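A simplified sketch of this query path (hash-grid embedding plus a small MLP). It collapses the real pipeline (trilinear interpolation of eight corner embeddings across multiple resolutions, with separate density and color networks) into a single lookup and a single MLP for brevity; the table size, feature width, and layer sizes are assumed values.

```python
# Simplified sketch of an Instant-NGP-style query: hash a 3D grid vertex into a
# feature table, then feed the gathered feature (plus view direction) through a
# small MLP. Sizes are illustrative, not the paper's exact configuration.
import torch

TABLE_SIZE, FEAT_DIM = 2**14, 2
table = torch.nn.Embedding(TABLE_SIZE, FEAT_DIM)
mlp = torch.nn.Sequential(
    torch.nn.Linear(FEAT_DIM + 3, 64), torch.nn.ReLU(), torch.nn.Linear(64, 4)
)

PRIMES = (1, 2654435761, 805459861)   # spatial-hash primes from Müller et al.

def hash_vertex(ix, iy, iz):
    return ((ix * PRIMES[0]) ^ (iy * PRIMES[1]) ^ (iz * PRIMES[2])) % TABLE_SIZE

def query(vertex, view_dir):
    feat = table(torch.tensor([hash_vertex(*vertex)]))          # gather embedding
    out = mlp(torch.cat([feat, view_dir.unsqueeze(0)], dim=-1)) # predict density + color
    return out[..., 0], out[..., 1:]

density, color = query((12, 5, 9), torch.tensor([0.0, 0.0, 1.0]))
print(density.shape, color.shape)
```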

66 of 70

Reason for the Observed Memory Access Pattern

  • We observe:
    • The large inter-group distance results in low bandwidth utilization when accessing multi-bank SRAM
    • Sequential read requests access different SRAM banks

  • These observations are unique to the SOTA efficient algorithm because of its random hash mapping: the corners of the same grid cell are accessed at scattered, effectively random addresses (see the sketch below)

[Müller et al., SIGGRAPH’22]
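As an illustration of this scattering, the sketch below hashes the eight corners of one grid cell with an Instant-NGP-style spatial hash (the primes follow Müller et al.; the table size is an assumed value) and prints the resulting, widely spread table addresses.

```python
# Illustrative check of why corner lookups scatter: hash the eight corners of a
# single grid cell with an Instant-NGP-style spatial hash and print how far
# apart the resulting table addresses are.
TABLE_SIZE = 2**19                       # assumed table size
PRIMES = (1, 2654435761, 805459861)

def spatial_hash(ix, iy, iz):
    return ((ix * PRIMES[0]) ^ (iy * PRIMES[1]) ^ (iz * PRIMES[2])) % TABLE_SIZE

cell = (40, 17, 92)                      # one cell; its 8 corners are neighbors in 3D
corners = [(cell[0] + dx, cell[1] + dy, cell[2] + dz)
           for dx in (0, 1) for dy in (0, 1) for dz in (0, 1)]
addrs = sorted(spatial_hash(*c) for c in corners)
print(addrs)                             # spatially adjacent corners, scattered addresses
```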

67 of 70

Why Not Directly Update SRAM Frequently?

  • SRAMs are needed for large memory such as hash tables
    • SRAMs need 1 clock to read, 1 clock to write
  • If we directly access SRAM, we need 3 clocks
    • 1 for read, 1 for computation, 1 for write
  • If we utilize register-based buffers
    • Using combinational logic, read, compute, and write can be completed in a single cycle

68 of 70

We Already Have GPUs, Why Accelerator?

  • GPUs cannot satisfy the requirement of instant (< 5 seconds) 3D reconstruction on edge devices

69 of 70

We Already Have GPUs, Why Accelerator?

  • GPUs cannot satisfy the requirement of instant (< 5 seconds) 3D reconstruction on edge devices
  • It is difficult for GPUs to perform the precise, fine-grained memory accesses required by our proposed Instant-3D

70 of 70

A Good Time for Building Accelerator?

  • 3D reconstruction is a next-technology disruptor, calling for joint efforts from both the algorithm and hardware communities
  • Instant on-device NeRF training enables multiple applications
  • The SOTA efficient algorithm has attracted 340+ citations and 11k+ GitHub stars within one year and is widely adopted
  • Our efforts contribute to this critical stage of efficient NeRF development