Learning Generalizable Program and Architecture Representations for Performance Modeling
SC 2024
Atlanta, GA
Lingda Li, Thomas Flynn, Adolfy Hoisie
Brookhaven National Laboratory
Computer Architecture Modeling and Simulation
Methodology | Speed | Accuracy | Generalizability |
Analytical Modeling | Fast | Low | High |
 | | High | Low |
Discrete Event Simulation | Slow | Variable | High |
Emulation | Medium | Variable | Medium |
Machine Learning (ML)-based Modeling | Fast | High | Low |
ML-based Simulation | Medium | Variable | Medium |
Goal: explore better tradeoff using ML, especially on generalizability
PerfVec (this work) | Fast? | High? | High? |
Generalizable Performance Modeling
PerfVec learns the separation automatically.
Learning Separation
Key idea 1: use two ML models, independent of each other, to capture the performance impacts of the program and the microarchitecture, respectively
Focus of this work
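A minimal sketch of why the separation matters: once the program and the microarchitecture each have their own vector, any program can be paired with any microarchitecture without retraining. The dot-product combiner, the function name `predict_time`, and all vector values are assumptions for illustration, not taken from the slides.

```python
from typing import Dict, List, Sequence, Tuple


def predict_time(program_rep: Sequence[float], uarch_rep: Sequence[float]) -> float:
    """Combine two independently produced vectors into one prediction.

    The dot product is an assumed combiner chosen for illustration.
    """
    if len(program_rep) != len(uarch_rep):
        raise ValueError("representations must have the same dimension")
    return sum(p * u for p, u in zip(program_rep, uarch_rep))


# Hypothetical 3-dimensional representations (made-up values).
programs: Dict[str, List[float]] = {
    "prog_a": [2.0, 1.0, 0.0],
    "prog_b": [0.5, 0.0, 4.0],
}
uarchs: Dict[str, List[float]] = {
    "core_x": [1.0, 3.0, 0.5],
    "core_y": [2.0, 1.0, 1.0],
}

# Every (program, microarchitecture) pair is predicted from the same vectors.
predictions: Dict[Tuple[str, str], float] = {
    (p_name, u_name): predict_time(p_rep, u_rep)
    for p_name, p_rep in programs.items()
    for u_name, u_rep in uarchs.items()
}
```

With P programs and M microarchitectures, only P + M representations are needed to cover all P × M combinations.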
Learning Program Representation
Input | Level of Detail | Limitation |
Profiling info (e.g., performance counters) | Low | |
Static program info (e.g., control flow graph) | Medium | |
Instruction execution trace | High | |
Key idea 2: 1) learn the representations of individual instructions and 2) compose a program representation from those of all its executed instructions (divide-and-conquer)
Learning Instruction Representation
Factor | Microarchitecture Independent Feature |
Own properties | Static properties (e.g., operation type); dynamic behavior (e.g., branch direction); reuse distance; branch entropy |
Co-running instructions | Co-running instructions’ properties |
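As an example of a microarchitecture-independent feature, reuse distance can be computed from the address stream alone. The sketch below uses the common definition (number of distinct addresses touched between two accesses to the same address) and a simple LRU stack; the granularity PerfVec uses (byte address vs. cache line) is not stated on the slide, so the function name `reuse_distances` and the toy trace are assumptions.

```python
from typing import Hashable, List, Optional, Sequence


def reuse_distances(addresses: Sequence[Hashable]) -> List[Optional[int]]:
    """Return the reuse distance of every access in the trace.

    None marks a first-time access (no earlier use, i.e. infinite distance).
    """
    stack: List[Hashable] = []  # LRU stack, most recently used at the end
    distances: List[Optional[int]] = []
    for addr in addresses:
        if addr in stack:
            pos = stack.index(addr)
            # Entries above pos are the distinct addresses seen since last use.
            distances.append(len(stack) - 1 - pos)
            stack.pop(pos)
        else:
            distances.append(None)
        stack.append(addr)
    return distances


trace = ["A", "B", "C", "A", "B", "B"]
print(reuse_distances(trace))  # [None, None, None, 2, 2, 0]
```

The list-based stack costs time linear in the number of distinct addresses per access; tree-based variants are usually preferred for long traces.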
A sequence model (e.g., LSTM, Transformer)
Refer to the paper for how to compose the program representation
Generalizable across programs
Program traces are different sequences of instructions from the same set.
Figure: the instruction representation model maps the features of instruction i and of its c preceding instructions (i-1, …, i-c) to instruction i’s representation; the representations of instructions 1 through n are then composed into the program representation.
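The flow on this slide can be sketched as: build a window of c preceding instructions for each instruction, run a model on each window, then compose the outputs. Two pieces here are assumptions: `toy_instruction_model` is a hand-written stand-in for the sequence model (LSTM or Transformer), and element-wise summation is an assumed composition, since the slides defer that detail to the paper.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def context_windows(features: Sequence[Vector], c: int) -> List[List[Vector]]:
    """For instruction i, return [f_i, f_(i-1), ..., f_(i-c)], zero-padded."""
    if not features:
        return []
    pad = [0.0] * len(features[0])
    return [
        [features[i - k] if i - k >= 0 else pad for k in range(c + 1)]
        for i in range(len(features))
    ]


def toy_instruction_model(window: List[Vector]) -> Vector:
    """Stand-in for the sequence model: older instructions get less weight."""
    width = len(window[0])
    return [sum(row[d] / (k + 1) for k, row in enumerate(window)) for d in range(width)]


def program_representation(
    features: Sequence[Vector],
    c: int,
    model: Callable[[List[Vector]], Vector],
) -> Vector:
    """Compose instruction representations by summation (an assumption)."""
    reps = [model(w) for w in context_windows(features, c)]
    return [sum(column) for column in zip(*reps)]


trace_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # made-up features
print(program_representation(trace_features, c=1, model=toy_instruction_model))
# [2.5, 2.5]
```

Because the same model is applied to every window, a new program only needs its own trace, not a new model.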
PerfVec Training
Train representations of sampled microarchitectures jointly
Key idea 3: train the instruction representation model to predict instruction latencies on randomly selected microarchitectures for generalizability
Replace the program representation model
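A toy sketch of the joint training idea: each step draws a random instruction and a random microarchitecture, predicts that instruction's latency, and updates both the instruction representation weights and that microarchitecture's vector. Everything concrete here is an assumption: the linear representation model (the real one is a sequence model), the dot-product predictor, the synthetic latencies (standing in for simulator-measured ones), and the hyperparameters.

```python
import random

random.seed(0)
F, D, K, N = 3, 2, 3, 40  # feature dim, representation dim, #uarchs, #instructions


def rand_matrix(rows, cols, lo, hi):
    return [[random.uniform(lo, hi) for _ in range(cols)] for _ in range(rows)]


def represent(W, x):
    """Linear instruction representation: W @ x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


# Synthetic ground truth (assumption): latencies generated by hidden W_true, M_true.
features = rand_matrix(N, F, 0.0, 1.0)
W_true = rand_matrix(D, F, 0.0, 1.0)
M_true = rand_matrix(K, D, 0.5, 1.5)
latency = [[dot(represent(W_true, x), m) for m in M_true] for x in features]

# Trainable parameters: shared instruction model W, one vector per microarchitecture.
W = rand_matrix(D, F, 0.0, 0.5)
M = rand_matrix(K, D, 0.0, 0.5)


def mean_squared_error():
    total = 0.0
    for x, row in zip(features, latency):
        rep = represent(W, x)
        for m, y in zip(M, row):
            total += (dot(rep, m) - y) ** 2
    return total / (N * K)


initial_loss = mean_squared_error()
lr = 0.05
for _ in range(5000):
    n = random.randrange(N)  # random instruction
    k = random.randrange(K)  # randomly selected microarchitecture
    x = features[n]
    rep = represent(W, x)
    err = dot(rep, M[k]) - latency[n][k]
    old_m = list(M[k])  # use pre-update values for the gradient of W
    for d in range(D):
        M[k][d] -= lr * err * rep[d]
        for f in range(F):
            W[d][f] -= lr * err * old_m[d] * x[f]
final_loss = mean_squared_error()
print(initial_loss, final_loss)
```

Sharing `W` across all sampled microarchitectures is what pushes the instruction representation to stay microarchitecture independent in this sketch.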
Data Acquisition & Model Architecture
Generalizability Evaluation
Prediction error range against gem5 simulation across all microarchitectures
Unseen microarchitectures
Unseen programs
The trained model generalizes well to unseen programs and microarchitectures.
PerfVec Use Cases
DSE Procedure
Frozen during training
A 2-layer MLP
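One way to read this slide, sketched under assumptions: the program representation comes from the pretrained, frozen model, and a 2-layer MLP maps design parameters to a microarchitecture representation, so every candidate design can be scored without simulation. The wiring, the weights (`W1`, `B1`, `W2`, `B2`), the design parameters, and the dot-product predictor are all hypothetical values for illustration.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(p * q for p, q in zip(a, b))


def linear(weights: List[Vector], bias: Vector, x: Sequence[float]) -> Vector:
    return [dot(row, x) + b for row, b in zip(weights, bias)]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


# Hypothetical "already trained" 2-layer MLP parameters.
W1, B1 = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
W2, B2 = [[-0.5, 0.0], [0.0, -0.25]], [4.0, 2.0]

# Program representation from the frozen pretrained model (made-up values).
FROZEN_PROGRAM_REP: Vector = [3.0, 1.0]


def uarch_representation(design: Sequence[float]) -> Vector:
    """2-layer MLP: design parameters -> microarchitecture representation."""
    hidden = relu(linear(W1, B1, design))
    return linear(W2, B2, hidden)


def predicted_time(design: Sequence[float]) -> float:
    return dot(FROZEN_PROGRAM_REP, uarch_representation(design))


# Hypothetical design points, e.g. (issue width, cache size).
candidates: List[Tuple[float, float]] = [(1.0, 1.0), (2.0, 1.0), (2.0, 4.0), (4.0, 2.0)]
best = min(candidates, key=predicted_time)
print(best, predicted_time(best))  # (4.0, 2.0) 7.5
```

Only the small MLP would need training in this reading, which is consistent with a low model-construction overhead.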
DSE Results
Overhead: the time to construct models (hours)
Quality: how close the selected design is to the optimal design.
Lower is better
Method | Overhead | Quality |
ASPLOS06 | 150 | 4.4% |
MICRO07 | 84 | 4.7% |
DAC16 | 170 | 3.6% |
PerfVec | 11 | 3.6% |
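To make the two columns concrete, the sketch below computes a relative gap to the optimal design and the overhead ratios implied by the table. The gap formula in `quality_gap` is an assumed reading of "how close the selected design is to the optimal design"; the overhead hours are copied from the table.

```python
def quality_gap(selected_cost: float, optimal_cost: float) -> float:
    """Relative gap between the selected and the optimal design (lower is better).

    Assumed definition: (selected - optimal) / optimal.
    """
    return (selected_cost - optimal_cost) / optimal_cost


# Model construction overhead in hours, taken from the table.
overhead_hours = {"ASPLOS06": 150, "MICRO07": 84, "DAC16": 170, "PerfVec": 11}

# How many times more overhead each prior method needs than PerfVec.
overhead_ratio = {
    name: hours / overhead_hours["PerfVec"]
    for name, hours in overhead_hours.items()
    if name != "PerfVec"
}

for name, ratio in overhead_ratio.items():
    print(f"{name}: {ratio:.1f}x the overhead of PerfVec")
```

By the table's numbers, PerfVec matches the best reported quality (3.6%) at roughly 7.6x to 15.5x lower overhead.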
PerfVec Summary