Fast Machine Learning for Science Workshop 2023
Imperial College London
09/27/2023
AI Upscaling with Super Resolution CNNs on FPGAs and ASICs
Super Resolution
(SD → HD → 2K → 4K → 8K!)
[Figure: model block diagram with blocks labeled Upsample (twice), Depth-to-space, and Clip]
SR on FPGAs and ASICs
Input (LR): 680 × 216 pixels
Output (HR, 3X): 2040 × 648 pixels
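The depth-to-space and clip stages named in the block diagram can be sketched in NumPy as follows. This is a minimal illustration, not the deck's HLS implementation: it assumes a height × width × channels layout, the channel ordering used by TensorFlow's `depth_to_space`, and an 8-bit output range of 0 to 255. The function names `depth_to_space` and `clip_to_uint8` are hypothetical.

```python
import numpy as np


def depth_to_space(x, r):
    """Rearrange (H, W, C*r*r) features into an (H*r, W*r, C) image.

    Assumed ordering: out[h*r + i, w*r + j, c] = x[h, w, (i*r + j)*C + c].
    """
    h, w, depth = x.shape
    if depth % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c = depth // (r * r)
    x = x.reshape(h, w, r, r, c)      # split depth into (i, j, c)
    x = x.transpose(0, 2, 1, 3, 4)    # reorder to (h, i, w, j, c)
    return x.reshape(h * r, w * r, c)


def clip_to_uint8(x):
    """Clip to the assumed 8-bit pixel range and convert to uint8."""
    return np.clip(x, 0, 255).round().astype(np.uint8)


# A 680 x 216 input with 3 * 3 * 3 = 27 feature channels becomes a
# 2040 x 648 RGB image after 3X depth-to-space.
features = np.zeros((216, 680, 27), dtype=np.float32)
image = clip_to_uint8(depth_to_space(features, 3))
print(image.shape)
```

Depth-to-space only moves data, so the 3X enlargement itself needs no multiplications; the arithmetic cost sits in the convolutions that produce the 27 channels.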
XCVU9P Resource Capacity
Model Size vs. Performance (Logic Synthesis)
Model | Bits | BOPs, relative to 8b (absolute) | BRAMs, relative to 8b (absolute) | LUTs as logic, relative to 8b (absolute) | LUTs as memory, relative to 8b (absolute) | FFs, relative to 8b (absolute) | MAE (HLS) | PSNR in dB (HLS) |
base7_qkeras_8b | 8 | 100% (1.08e+18) | 100% (384.5) | 100% (40,474) | 100% (59,884) | 100% (46,021) | 4.70 | 30.97 |
base7_qkeras_7b | 7 | 85.4% (9.22e+17) | 100% (384.5) | 132% (53,244) | 87.7% (52,536) | 103% (47,329) | 4.75 | 30.91 |
base7_qkeras_6b | 6 | 72.2% (7.80e+17) | 85.4% (328.5) | 120% (48,676) | 75.5% (45,188) | 95.1% (43,766) | 4.84 | 30.75 |
base7_qkeras_5b | 5 | 60.6% (6.54e+17) | 78.1% (300.5) | 108% (43,594) | 63.1% (37,840) | 87.7% (40,344) | 5.11 | 30.23 |
heterogeneous_1 | 2-6 | 60.4% (6.52e+17) | 78.1% (300.5) | 104% (41,996) | 55.0% (32,956) | 83.7% (38,529) | 5.13 | 30.18 |
base7_qkeras_4b | 4 | 50.4% (5.44e+17) | 63.6% (244.5) | 95.5% (38,638) | 50.9% (30,492) | 80.1% (36,884) | 5.33 | 29.83 |
heterogeneous_2 | 2-5 | 48.9% (5.28e+17) | 78.1% (300.5) | 93.1% (37,709) | 46.8% (28,028) | 77.4% (35,621) | 5.31 | 29.80 |
base7_qkeras_3b | 3 | 41.7% (4.50e+17) | 63.6% (244.5) | 83.1% (33,635) | 38.6% (23,144) | 72.6% (33,431) | 6.51 | 27.48 |
base7_qkeras_2b | 2 | 34.5% (3.73e+17) | 49.0% (188.5) | 75.1% (30,416) | 26.4% (15,796) | 65.1% (29,985) | 5.82 | 29.02 |
Percentage values are relative to 8b implementation
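The MAE and PSNR columns above compare the HLS output against a reference image. The deck does not state how they were computed, so the following is only a sketch under the usual definitions, assuming 8-bit pixels (peak value 255) and metrics taken over all pixels and channels. The function names are hypothetical.

```python
import numpy as np


def mae(reference, test):
    """Mean absolute error between two images of the same shape."""
    ref = np.asarray(reference, dtype=np.float64)
    out = np.asarray(test, dtype=np.float64)
    return float(np.mean(np.abs(ref - out)))


def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE)."""
    ref = np.asarray(reference, dtype=np.float64)
    out = np.asarray(test, dtype=np.float64)
    mse = float(np.mean((ref - out) ** 2))
    if mse == 0.0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(peak ** 2 / mse))
```

Because PSNR depends on the squared error while MAE depends on the absolute error, the two columns need not rank the models identically.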
Model Size vs. Performance (Logic Synthesis) w/ Batch Normalization
Model | Bits | BOPs, relative to 8b (absolute) | BRAMs, relative to 8b (absolute) | LUTs as logic, relative to 8b (absolute) | LUTs as memory, relative to 8b (absolute) | FFs, relative to 8b (absolute) | MAE (HLS) | PSNR in dB (HLS) |
base7_qkeras_8b | 8 | 100% (1.08e+18) | 100% (384.5) | 100% (49,916) | 100% (59,884) | 100% (56,300) | 4.90 | 30.66 |
base7_qkeras_7b | 7 | 85.4% (9.22e+17) | 100% (384.5) | 120.6% (60,222) | 87.7% (52,536) | 100.5% (56,576) | 4.96 | 30.56 |
base7_qkeras_6b | 6 | 72.2% (7.80e+17) | 85.4% (328.5) | 116.8% (58,302) | 75.5% (45,188) | 95.4% (53,724) | 4.97 | 30.57 |
base7_qkeras_5b | 5 | 60.6% (6.54e+17) | 78.1% (300.5) | 106.5% (53,180) | 63.1% (37,840) | 89.0% (50,133) | 4.99 | 30.54 |
heterogeneous_1 | 2-6 | 60.4% (6.52e+17) | 78.1% (300.5) | 103.5% (51,654) | 55.0% (32,956) | 85.6% (48,207) | 5.06 | 30.37 |
base7_qkeras_4b | 4 | 50.4% (5.44e+17) | 63.6% (244.5) | 96.5% (48,187) | 50.9% (30,492) | 82.6% (46,506) | 5.16 | 30.20 |
heterogeneous_2 | 2-5 | 48.9% (5.28e+17) | 78.1% (300.5) | 95.2% (47,528) | 46.8% (28,028) | 80.4% (45,263) | 5.11 | 30.25 |
base7_qkeras_3b | 3 | 41.7% (4.50e+17) | 63.6% (244.5) | 76.8% (38,340) | 38.6% (23,144) | 65.7% (36,998) | 5.24 | 30.06 |
base7_qkeras_2b | 2 | 34.5% (3.73e+17) | 49.0% (188.5) | 72.4% (36,130) | 26.4% (15,796) | 59.1% (33,260) | 5.64 | 29.40 |
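The tables report bit operations (BOPs) without saying how they were counted. One commonly used estimate for a convolutional layer with n input channels, m output channels, k × k kernels, b_a-bit activations, and b_w-bit weights is m·n·k²·(b_a·b_w + b_a + b_w + log2(n·k²)) per output position. The sketch below assumes that definition, which may differ from the one used for these tables; the function names and the layer shape in the usage lines are hypothetical. The second helper reproduces the "relative to 8b" percentages from the absolute values.

```python
import math


def conv_bops(n_in, n_out, kernel, bits_act, bits_weight, out_positions=1):
    """Estimate BOPs for one conv layer (assumed definition, see lead-in)."""
    macs = n_out * n_in * kernel * kernel
    per_mac = (bits_act * bits_weight + bits_act + bits_weight
               + math.log2(n_in * kernel * kernel))
    return out_positions * macs * per_mac


def relative_to_baseline(value, baseline):
    """Express a value as a percentage of the 8b baseline."""
    return 100.0 * value / baseline


# Hypothetical 3x3 layer, 3 -> 16 channels, at 8 bits and at 4 bits.
print(conv_bops(3, 16, 3, 8, 8), conv_bops(3, 16, 3, 4, 4))

# Table check: 9.22e+17 BOPs against the 8b baseline of 1.08e+18.
print(round(relative_to_baseline(9.22e17, 1.08e18), 1))
```

Under this estimate the product term b_a·b_w dominates at higher precision, so the saving from a narrower bit width depends on whether weights, activations, or both are reduced.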
SR on FPGAs and ASICs
[Figure: LR input (680 × 216 pixels) and two 3X HR images (2040 × 648 pixels each), labeled "From RTL simulation" and "From the test set" / "Python"]
Future Work
[Figure: two 3X HR images (2040 × 648 pixels each), labeled "From RTL simulation" and "From the test set" / "Python"]
Thanks!
The Team: Giuseppe Di Guglielmo, Jovan Mitrevski, Ben Hawks, Javier Campos, Nhan Tran, Jules Muhizi, Ryan Forelli, David Burnette