
AI Upscaling with Super Resolution CNNs on FPGAs and ASICs

Fast Machine Learning for Science Workshop 2023
Imperial College London
09/27/2023


Super Resolution

  • Super resolution (SR): Techniques aimed at enhancing the resolution and quality of an image
  • CNN-based SR has become increasingly popular in recent years
    • Typical pipeline: feature extraction, upsample, add, depth-to-space (sketched after this list)
  • Video streaming has become more data intensive

(SD → HD → 2K → 4K → 8K!)

  • One possible solution: Transmit lower resolution image, upscale on consumer device
  • SR upscaling challenges
    • Large, three-dimensional inputs
    • Computationally expensive and slow on CPU
  • FPGA/ASIC-based solution
    • Computation and streaming can be parallelized
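
The slides do not spell out the network, but the model names in the results tables (base7_qkeras_*) point to an ABPN/base7-style design. The Keras sketch below shows how feature extraction, the channel-repeat upsample, the residual add, depth-to-space, and the final clip fit together; the layer count, channel width, and input size are illustrative assumptions, not the authors' exact model.

```python
import tensorflow as tf
from tensorflow.keras import layers

SCALE = 3  # 680x216 -> 2040x648

def build_sr_model(h=216, w=680, channels=28, n_feature_convs=4):
    """Illustrative base7-style SR network; sizes are assumptions."""
    inp = layers.Input(shape=(h, w, 3))
    # Feature extraction: a stack of 3x3 convolutions
    x = inp
    for _ in range(n_feature_convs):
        x = layers.Conv2D(channels, 3, padding="same", activation="relu")(x)
    # Project to SCALE^2 * 3 channels so depth-to-space can form the HR image
    x = layers.Conv2D(3 * SCALE**2, 3, padding="same")(x)
    # "Upsample, add": repeat the LR input SCALE^2 times along the channel
    # axis (the anchor) and add it back as a residual
    anchor = layers.Concatenate()([inp] * SCALE**2)
    x = layers.Add()([x, anchor])
    # Depth-to-space rearranges the 27 channels into a 3x-taller, 3x-wider image
    x = layers.Lambda(lambda t: tf.nn.depth_to_space(t, SCALE))(x)
    # Clip to the valid 8-bit pixel range
    out = layers.Lambda(lambda t: tf.clip_by_value(t, 0.0, 255.0))(x)
    return tf.keras.Model(inp, out)
```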

SR on FPGAs and ASICs

  • Quantization-aware training with QKeras (example after this list)
  • Batch normalization
  • Model exploration with AutoQKeras
    • Minimize bit operations (BOPs) to conserve resources
  • Heterogeneous quantization
  • Implement lambda layers (up-sample, depth-to-space) with the hls4ml Extensions API (outline below)
    • Output has 9× the pixels of the input (3× per dimension)
  • FIFO depth optimization
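
As a concrete example of the quantization bullets above, a QKeras layer can replace a Keras Conv2D with quantized weights, biases, and activations. The bit widths below mirror rows of the tables that follow; the layer shapes are illustrative assumptions.

```python
import tensorflow as tf
from qkeras import QConv2D, QActivation, quantized_bits, quantized_relu

inp = tf.keras.layers.Input(shape=(216, 680, 3))
# Homogeneous 4-bit weights and biases, as in the base7_qkeras_4b rows
x = QConv2D(28, (3, 3), padding="same",
            kernel_quantizer=quantized_bits(4, 0, alpha=1),
            bias_quantizer=quantized_bits(4, 0, alpha=1))(inp)
x = QActivation(quantized_relu(4))(x)
# Heterogeneous quantization assigns a different width per layer (2-6 bits
# in heterogeneous_1); AutoQKeras searches these widths to minimize BOPs,
# which scale roughly with MACs * weight_bits * activation_bits per layer.
y = QConv2D(28, (3, 3), padding="same",
            kernel_quantizer=quantized_bits(2, 0, alpha=1),
            bias_quantizer=quantized_bits(2, 0, alpha=1))(x)
```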

[Figure: output stages of the network (up-sample → depth-to-space → clip), upscaling a 680×216 input to 2040×648]
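
The up-sample and depth-to-space lambda layers have no stock hls4ml ingestion, which is why the Extensions API is needed. Below is a schematic outline of that registration flow for depth-to-space, following the pattern in the hls4ml extension documentation; the class and function names here are illustrative, not the authors' code, and the backend HLS function template and C++ implementation are elided.

```python
import tensorflow as tf
import hls4ml
from hls4ml.model.layers import Layer, register_layer
from hls4ml.converters.keras_to_hls import parse_default_keras_layer

class KDepthToSpace(tf.keras.layers.Layer):
    """Keras-side custom layer so the converter can recognize it by name."""
    def __init__(self, block_size=3, **kwargs):
        super().__init__(**kwargs)
        self.block_size = block_size
    def call(self, x):
        return tf.nn.depth_to_space(x, self.block_size)
    def get_config(self):
        config = super().get_config()
        config["block_size"] = self.block_size
        return config

class HDepthToSpace(Layer):
    """hls4ml IR node: output has block_size^2 x the input pixels (9x for 3x SR)."""
    def initialize(self):
        h, w, c = self.get_input_variable().shape
        bs = self.get_attr("block_size")
        self.add_output_variable([h * bs, w * bs, c // (bs * bs)],
                                 ["OUT_HEIGHT", "OUT_WIDTH", "N_CHAN"])

def parse_depth_to_space(keras_layer, input_names, input_shapes, data_reader):
    layer = parse_default_keras_layer(keras_layer, input_names)
    layer["block_size"] = keras_layer["config"]["block_size"]
    n, h, w, c = input_shapes[0]
    bs = layer["block_size"]
    return layer, [n, h * bs, w * bs, c // (bs * bs)]

# Make the new layer known to the converter; an HLS function template and
# C++ kernel must also be registered with the backend before synthesis.
register_layer("HDepthToSpace", HDepthToSpace)
hls4ml.converters.register_keras_layer_handler("KDepthToSpace", parse_depth_to_space)
```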


XCVU9P Resource Capacity

  • LUTs: 1,182,240
    • Usable for Logic: 1,182,240
    • Usable as Memory: 591,840
  • Flip Flops: 2,364,480
  • BRAMs: 4,320
  • DSPs: 6,840

  • All models quantized to more than 4 bits achieve >30 dB PSNR

Model Size vs. Performance (Logic Synthesis)

| Model | Bits | BOPs | BRAMs | LUT (logic) | FF | LUT (memory) | MAE (HLS) | PSNR, dB (HLS) |
|---|---|---|---|---|---|---|---|---|
| base7_qkeras_8b | 8 | 100% (1.08e+18) | 100% (384.5) | 100% (40,474) | 100% (59,884) | 100% (46,021) | 4.70 | 30.97 |
| base7_qkeras_7b | 7 | 85.4% (9.22e+17) | 100% (384.5) | 132% (53,244) | 87.7% (52,536) | 103% (47,329) | 4.75 | 30.91 |
| base7_qkeras_6b | 6 | 72.2% (7.80e+17) | 85.4% (328.5) | 120% (48,676) | 75.5% (45,188) | 95.1% (43,766) | 4.84 | 30.75 |
| base7_qkeras_5b | 5 | 60.6% (6.54e+17) | 78.1% (300.5) | 108% (43,594) | 63.1% (37,840) | 87.7% (40,344) | 5.11 | 30.23 |
| heterogeneous_1 | 2-6 | 60.4% (6.52e+17) | 78.1% (300.5) | 104% (41,996) | 55.0% (32,956) | 83.7% (38,529) | 5.13 | 30.18 |
| base7_qkeras_4b | 4 | 50.4% (5.44e+17) | 63.6% (244.5) | 95.5% (38,638) | 50.9% (30,492) | 80.1% (36,884) | 5.33 | 29.83 |
| heterogeneous_2 | 2-5 | 48.9% (5.28e+17) | 78.1% (300.5) | 93.1% (37,709) | 46.8% (28,028) | 77.4% (35,621) | 5.31 | 29.80 |
| base7_qkeras_3b | 3 | 41.7% (4.50e+17) | 63.6% (244.5) | 83.1% (33,635) | 38.6% (23,144) | 72.6% (33,431) | 6.51 | 27.48 |
| base7_qkeras_2b | 2 | 34.5% (3.73e+17) | 49.0% (188.5) | 75.1% (30,416) | 26.4% (15,796) | 65.1% (29,985) | 5.82 | 29.02 |

Percentage values are relative to the 8b implementation (absolute values in parentheses).
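
For reference, the PSNR column is the standard image-quality metric derived from the mean squared error between the upscaled output and the ground-truth HR frame. A minimal implementation, assuming 8-bit pixels (peak value 255):

```python
import numpy as np

def psnr(pred, target, max_val=255.0):
    """Peak signal-to-noise ratio in dB; higher is better (~30 dB here)."""
    mse = np.mean((pred.astype(np.float64) - target.astype(np.float64)) ** 2)
    return 20.0 * np.log10(max_val) - 10.0 * np.log10(mse)
```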



  • All models quantized to more than 2 bits achieve >30 dB PSNR

Model Size vs. Performance (Logic Synthesis) w/ Batch Normalization

| Model | Bits | BOPs | BRAMs | LUT (logic) | FF | LUT (memory) | MAE (HLS) | PSNR, dB (HLS) |
|---|---|---|---|---|---|---|---|---|
| base7_qkeras_8b | 8 | 100% (1.08e+18) | 100% (384.5) | 100% (49,916) | 100% (59,884) | 100% (56,300) | 4.90 | 30.66 |
| base7_qkeras_7b | 7 | 85.4% (9.22e+17) | 100% (384.5) | 120.6% (60,222) | 87.7% (52,536) | 100.5% (56,576) | 4.96 | 30.56 |
| base7_qkeras_6b | 6 | 72.2% (7.80e+17) | 85.4% (328.5) | 116.8% (58,302) | 75.5% (45,188) | 95.4% (53,724) | 4.97 | 30.57 |
| base7_qkeras_5b | 5 | 60.6% (6.54e+17) | 78.1% (300.5) | 106.5% (53,180) | 63.1% (37,840) | 89.0% (50,133) | 4.99 | 30.54 |
| heterogeneous_1 | 2-6 | 60.4% (6.52e+17) | 78.1% (300.5) | 103.5% (51,654) | 55.0% (32,956) | 85.6% (48,207) | 5.06 | 30.37 |
| base7_qkeras_4b | 4 | 50.4% (5.44e+17) | 63.6% (244.5) | 96.5% (48,187) | 50.9% (30,492) | 82.6% (46,506) | 5.16 | 30.20 |
| heterogeneous_2 | 2-5 | 48.9% (5.28e+17) | 78.1% (300.5) | 95.2% (47,528) | 46.8% (28,028) | 80.4% (45,263) | 5.11 | 30.25 |
| base7_qkeras_3b | 3 | 41.7% (4.50e+17) | 63.6% (244.5) | 76.8% (38,340) | 38.6% (23,144) | 65.7% (36,998) | 5.24 | 30.06 |
| base7_qkeras_2b | 2 | 34.5% (3.73e+17) | 49.0% (188.5) | 72.4% (36,130) | 26.4% (15,796) | 59.1% (33,260) | 5.64 | 29.40 |

Percentage values are relative to the 8b implementation (absolute values in parentheses).
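
The slides do not state how batch normalization is folded for inference. One common QKeras pattern consistent with these results is the fused QConv2DBatchnorm layer, which trains conv and BN together and folds the BN scale and shift into the quantized weights so hls4ml sees a single layer. A minimal sketch, with assumed shapes and bit widths:

```python
import tensorflow as tf
from qkeras import QConv2DBatchnorm, quantized_bits

inp = tf.keras.layers.Input(shape=(216, 680, 3))
# Conv + batch norm trained together; the BN parameters are folded into the
# quantized kernel/bias at inference, so no separate BN layer reaches hls4ml.
x = QConv2DBatchnorm(28, (3, 3), padding="same",
                     kernel_quantizer=quantized_bits(8, 0, alpha=1),
                     bias_quantizer=quantized_bits(8, 0, alpha=1))(inp)
```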


SR on FPGAs and ASICs

[Figure: 680 × 216 LR input (Python) beside two 3×-upscaled 2040 × 648 HR outputs, one from RTL simulation and one from the test set]


Future Work


  • Continue work on the ASIC implementation with industry partners
    • Initial testing on an Alveo accelerator card was successful


Thanks!

The Team: Giuseppe Di Guglielmo, Jovan Mitrevski, Ben Hawks, Javier Campos, Nhan Tran, Jules Muhizi, Ryan Forelli, David Burnette