Predicting Post-Route QoR Estimates for HLS Designs using Machine Learning
Pingakshya Goswami
Speaker Info
Name: Pingakshya Goswami
PhD Student, Department of Electrical Engineering
University of Texas at Dallas
Research Interest:
High Level Synthesis FPGA Design Flow
Flow: Specification (C/C++/SystemC) → Compile (LLVM IR) → Allocation → Scheduling → Binding → RTL Generation (Verilog) → Logic Synthesis → Tech Mapping → Optimization → Placement → Routing
Functional Unit Library
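int a, b, c, d;   /* inputs */
int p, q;         /* results */
int main() {
    p = a * b;        /* first multiplication */
    q = p * (c + d);  /* addition feeding a second multiplication */
    return 0;
}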
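The scheduling step of the flow can be illustrated on the three operations in the snippet above. The following is a minimal sketch of as-soon-as-possible (ASAP) scheduling, assuming every operation takes one control step and that resources are unlimited; the operation names (`mul1`, `add`, `mul2`) and the function `asap_schedule` are hypothetical and are not taken from any HLS tool.

```python
def asap_schedule(deps):
    """Return the earliest control step for each operation.

    deps maps an operation name to the list of operations it depends on.
    Assumption: unit latency per operation and no resource limits.
    """
    step = {}

    def visit(op):
        if op not in step:
            # An operation starts one step after its latest predecessor.
            step[op] = max((visit(p) + 1 for p in deps[op]), default=0)
        return step[op]

    for op in deps:
        visit(op)
    return step


# Data-flow graph of: p = a * b; q = p * (c + d);
DFG = {
    "mul1": [],               # a * b
    "add": [],                # c + d
    "mul2": ["mul1", "add"],  # p * (c + d)
}

schedule = asap_schedule(DFG)
```

Under these assumptions `mul1` and `add` share the first step and `mul2` follows in the next one. Limiting the number of multipliers would not change this particular schedule, since the two multiplications are already in different steps.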
Pareto Optimal Points
Design Space Exploration
High Level Synthesis
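As a rough illustration of what "Pareto optimal points" means in design space exploration, the sketch below filters a set of (area, latency) pairs down to the points that no other point beats on both metrics. The function name `pareto_front` and the sample numbers are hypothetical, and both metrics are assumed to be minimized.

```python
def pareto_front(points):
    """Return the non-dominated (area, latency) points, both minimized."""
    front = []
    best_latency = float("inf")
    # After sorting by area, a point is non-dominated only if it
    # improves on the best latency seen among smaller-area points.
    for area, latency in sorted(points):
        if latency < best_latency:
            front.append((area, latency))
            best_latency = latency
    return front


# Hypothetical design points: (area, latency)
designs = [(3, 4), (1, 5), (4, 1), (2, 3)]
front = pareto_front(designs)
```

Here the point (3, 4) is dropped because (2, 3) is better on both area and latency.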
Characteristics and Implementation Time of ADPCM
FPGA Used: Xilinx Zynq UltraScale+ XCZU7EV
[Bar chart: run time in minutes of ADPCM at each physical design stage (C-Synthesis/HLS, Logic Synthesis, Place and Route). Bar values shown: 5.77, 3.85, 3316.]
Total time ≈ 3500 mins
[Figure: functions in adpcm: adpcm_main, reset, encode, decode, quantl, uppol1, uppol2, upzero, logscl, filtep, scalel, filtez]
Design Name | # of loops | # of arrays | # of functions | Design Space |
adpcm | 12 | 9 | 11 | > 100,000 |
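To see why the design space grows past 100,000 points so quickly, the sketch below multiplies the number of choices per tunable construct. The option counts are assumptions for illustration only (two choices per loop, such as pipelined or not, and two per array, such as partitioned or not); the slide does not state which pragmas or how many settings were considered.

```python
from math import prod


def design_space_size(options_per_knob):
    """Number of configurations when every knob is chosen independently."""
    return prod(options_per_knob)


# adpcm has 12 loops and 9 arrays (from the table above).
# Assumed: 2 hypothetical settings for each loop and each array.
loop_options = [2] * 12
array_options = [2] * 9
size = design_space_size(loop_options + array_options)
```

Even with only two settings per loop and array, and ignoring the 11 functions, this already yields about two million combinations.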
Quality of Results of a Design
Vivado HLS vs Post Route Values
Current State of the Art
Fast & Accurate HLS Predict
Contributions
Drawbacks
Pyramid HLS Predict
Contributions
Drawbacks
J. Zhao, L. Feng, S. Sinha, W. Zhang, Y. Liang, and B. He, "COMBA: A comprehensive model-based analysis framework for high level synthesis of real applications," 2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).
H. M. Makrani, F. Farahmand, H. Sayadi, S. Bondi, S. Dinakarrao, H. Homayoun, and S. Rafatirad, "Pyramid: Machine Learning Framework to Estimate the Optimal Timing and Resource Usage of a High-Level Synthesis Design," arXiv, 2019.
S. Dai, Y. Zhou, H. Zhang, E. Ustun, E. F. Y. Young, and Z. Zhang, "Fast and Accurate Estimation of Quality of Results in High-Level Synthesis with Machine Learning," Field Programmable Custom Computing Machines (FCCM), 2018.
Ironman
Contributions
Drawbacks
“#pragma HLS allocation instance=mul”.
N. Wu, Y. Xie, and C. Hao, "IRONMAN: GNN-assisted Design Space Exploration in HLS using Reinforcement Learning," Proceedings of the 2021 on Great Lakes Symposium on VLSI, 2021.
Comparison with Existing Works
Reference | Predicted: Resource | Predicted: Latency | Predicted: Clock Period | C-synthesis Required? | Feature Source | Labels |
Fast and Accurate | ✔️ | ❌ | ✔️ | Yes | Synthesis Files | Post Route |
Pyramid | ✔️ | ❌ | ✔️ | Yes | Synthesis Files | Post Route |
COMBA | ❌ | ✔️ | ❌ | No | Analytical | C-synthesis |
Ironman | ✔️ | ❌ | ✔️ | Yes | Scheduled DFG | Post Route |
This Work | ✔️ | ✔️ | ✔️ | No | C++/LLVM IR | Post Route |
Problem Statement
Given a dataset of synthesizable C/C++ HLS code with all pragma information, is it possible to build a machine-learning model that predicts the post-route clock period, latency, and resource requirement of a design without synthesizing it?
Overview of LLVM
C/C++ → Clang Frontend parser → IR Code → Optimizer → IR Code → Tech Specific Backend
Backend outputs: FPGA HLS Input, Arm assembly, X86 assembly, GPU specific codes
Proposed Flow
Feature Analysis
Source Name | Example Features | Number of Features |
High Level C/C++ Code | | 13 |
IR File | | 44 |
Control Flow Graph | | 6 |
Callgraph | | 6 |
Total | | 69 |
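One plausible kind of IR-file feature is a count of each instruction opcode. The sketch below shows how such counts could be pulled from textual LLVM IR with a regular expression. This is an assumption for illustration: the table's example features did not survive in this copy, so the actual feature set is unknown, and `count_opcodes` and the sample IR are hypothetical.

```python
import re
from collections import Counter

# Matches an indented instruction line, with or without a "%name = " result,
# and captures the opcode that follows.
_INSTRUCTION = re.compile(r"^[ \t]+(?:%[\w.]+\s*=\s*)?([a-z]+)", re.MULTILINE)


def count_opcodes(ir_text):
    """Count instruction opcodes in textual LLVM IR (simplified parsing)."""
    return Counter(_INSTRUCTION.findall(ir_text))


# Hand-written IR for: p = a * b; q = p * (c + d);
SAMPLE_IR = """\
define i32 @main() {
entry:
  %0 = load i32, ptr @a
  %1 = load i32, ptr @b
  %mul = mul nsw i32 %0, %1
  store i32 %mul, ptr @p
  %2 = load i32, ptr @c
  %3 = load i32, ptr @d
  %add = add nsw i32 %2, %3
  %mul1 = mul nsw i32 %mul, %add
  store i32 %mul1, ptr @q
  ret i32 0
}
"""

opcode_counts = count_opcodes(SAMPLE_IR)
```

The regular expression only handles simple instruction lines. A production flow would more likely walk the IR through LLVM's own APIs or a pass.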
Graphs in LLVM
Callgraph
Control Flow Graph
Data Flow Graph
Study of Feature Importance
Tool Used: xgboost plot_importance
Selection of Training/Test Data
Model Used | Clock Period | Latency | LUT |
Vivado HLS | 102% | NA | 553% |
Linear Regression | NP | NP | NP |
MLP Regression | 9.69% | 16.58% | 44.14% |
Random Forest | 7.98% | 18.25% | 16.28% |
Gradient Boost | 6.29% | 10.22% | 10.32% |
Design of Experiments
Analysis of Results
Prediction error (%) per design
Design Name | Clock Period: This Work | Clock Period: Vivado HLS | Latency: This Work | Resource: This Work | Resource: Vivado HLS |
adpcm | 7.75 | 198.09 | 17.54 | 18.94 | 529.27 |
ave8 | 7.28 | 182.14 | 16.41 | 7.82 | 73.18 |
matmul | 7.92 | 67.15 | 11.12 | 8.37 | 560.78 |
sobel | 0.6 | 163.23 | 1.76 | 0.94 | 584.47 |
dfadd | 6.33 | 87.88 | 4.3 | 2.04 | 98.58 |
dfdiv | 0.19 | 26.19 | NA | 0.03 | 114.38 |
dfsin | 2.78 | 39.39 | NA | 3.78 | 167.21 |
aes | 9.44 | 103.46 | NA | 20.78 | 911.25 |
blowfish | 14.35 | 36.57 | NA | 30.25 | 491.91 |
Average | 6.29 | 100.45 | 10.22 | 10.32 | 392.33 |
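The Average row appears to be a plain mean over the designs that have a value, with NA entries left out. The sketch below reproduces that calculation for two columns under that assumption; `column_average` is a hypothetical helper, and NA is represented as `None`.

```python
def column_average(values):
    """Mean of the entries that are present; None stands for NA."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present)


# "This Work" columns from the table above, in row order (adpcm ... blowfish).
CLOCK_PERIOD_ERR = [7.75, 7.28, 7.92, 0.6, 6.33, 0.19, 2.78, 9.44, 14.35]
LATENCY_ERR = [17.54, 16.41, 11.12, 1.76, 4.3, None, None, None, None]

clock_avg = column_average(CLOCK_PERIOD_ERR)
latency_avg = column_average(LATENCY_ERR)
```

Both results land within rounding of the reported averages (6.29 and 10.22), which supports reading the latency average as covering only the five designs with a latency value.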
Analysis of Results
Frequency (MHz) | Clock Period (ns): Actual | Clock Period (ns): Predicted | Latency (clock cycles): Actual | Latency (clock cycles): Predicted | LUT: Actual | LUT: Predicted |
100 | 7.74 | 7.44 | 20104 | 21659 | 2782 | 3244 |
125 | 6.64 | 7.41 | 23604 | 21677 | 2774 | 3244 |
150 | 6.64 | 6.29 | 28254 | 26916 | 2772 | 3244 |
175 | 4.55 | 4.49 | 34004 | 40351 | 2772 | 2605 |
200 | 4.55 | 4.60 | 43154 | 41332 | 2588 | 2735 |
225 | 4.55 | 4.60 | 43204 | 41332 | 2636 | 2735 |
300 | 3.00 | 3.11 | 55954 | 43381 | 2740 | 2749 |
500 | 3.00 | 3.45 | 111654 | 47566 | 4443 | 5095 |
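The actual and predicted columns can be turned into percentage errors directly. The sketch below does this for the clock period column, assuming the error metric is the absolute percentage error relative to the actual value. The slides do not state the exact metric, so this is an assumption, and the function names are hypothetical.

```python
def abs_pct_errors(actual, predicted):
    """Absolute percentage error of each prediction relative to the actual value."""
    return [abs(p - a) / a * 100.0 for a, p in zip(actual, predicted)]


def mean_abs_pct_error(actual, predicted):
    """Mean of the per-row absolute percentage errors."""
    errors = abs_pct_errors(actual, predicted)
    return sum(errors) / len(errors)


# Clock period (ns) at 100, 125, 150, 175, 200, 225, 300, 500 MHz (table above).
ACTUAL_CLOCK_NS = [7.74, 6.64, 6.64, 4.55, 4.55, 4.55, 3.00, 3.00]
PREDICTED_CLOCK_NS = [7.44, 7.41, 6.29, 4.49, 4.60, 4.60, 3.11, 3.45]

clock_errors = abs_pct_errors(ACTUAL_CLOCK_NS, PREDICTED_CLOCK_NS)
clock_mape = mean_abs_pct_error(ACTUAL_CLOCK_NS, PREDICTED_CLOCK_NS)
```

The same two functions apply unchanged to the latency and LUT columns.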
Analysis of Results
Device | Clock Period: Validation | Clock Period: Test | Latency: Validation | Latency: Test | LUT: Validation | LUT: Test |
Zynq 7000 | 5.55 | 7.75 | 20.24 | 17.54 | 19.02 | 18.94 |
Virtex 7 | 4.11 | 6.50 | 18.36 | 17.10 | 12.36 | 17.89 |
Kintex 7 | 5.51 | 5.06 | 17.69 | 19.50 | 14.14 | 11.70 |
Summary and Conclusion
Contact