G-MAP: a Graph Neural Network approach
for Memory Access Prediction
G Abhiram
Guided by: Prof. Viktor Prasanna,
Pengmiao Zhang, PhD at Data Science Lab
P-Group
Outline
Introduction: Motivation
Background
Source:
Dubois, Michel, Murali Annavaram, and Per Stenström. Parallel Computer Organization and Design. Cambridge University Press, 2012.
[Figure: the Processor-DRAM performance gap ("Moore's Law"): processor speed has grown far faster than memory speed, with approximate memory access latencies for the L1, L2, and L3 caches — this gap motivates data prefetching.]
[Figure: modern workloads (graph analytics, neural networks) mapped onto processors (CPU, GPU, TPU, FPGA).]
Problem Definition
Input: a sequence of N history block addresses
Xt = {x1, x2, x3, ..., xN}
Output: an unordered set of 'k' future deltas
(a delta is the difference between two consecutive block addresses)
Problem to solve: predict the deltas of upcoming accesses from the address history

| 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 |
(upper bits: Page Address; lower bits: Block Index)
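To make the notion of a delta concrete, here is a minimal sketch (with made-up block addresses) of how deltas are derived from an address history:

```python
def block_deltas(addresses):
    """Deltas are the differences between consecutive block addresses.

    A predicted delta d means the block at (current address + d) is
    expected to be accessed soon, so it can be prefetched.
    """
    return [b - a for a, b in zip(addresses, addresses[1:])]

# Toy history of block addresses (hypothetical values).
history = [100, 101, 104, 105, 108]
print(block_deltas(history))  # [1, 3, 1, 3]
```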
Why Graph Neural Networks?
Limitations of existing methods:
- Rule-based prefetching: low accuracy and adaptability
- ML-based prefetchers: fail to capture the graph-like structure of memory accesses
G-MAP: proposed pipeline
Step 1: Address Segmentation

Raw address bits: | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | (Page Address + Block Index)
Segmentation: group the bits into fixed-width chunks, each represented as a small integer
Segmented Address: | 1 | 2 | 5 | 3 |

Why do this?
✔ avoids class explosion (millions of unique memory addresses)
✔ avoids tokenization (mapping a word to numerical data, which requires extra data)
✔ saves storage space
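A minimal sketch of the segmentation idea, assuming 4-bit segments purely for illustration (the actual segment width is a design choice of the pipeline):

```python
def segment_address(addr, seg_bits=4, n_segments=4):
    """Split an address into n_segments chunks of seg_bits bits each
    (widths here are illustrative, not the exact choice in G-MAP).

    Instead of one class per unique address (millions of classes),
    each position can only take 2**seg_bits distinct values.
    """
    mask = (1 << seg_bits) - 1
    segs = [(addr >> (seg_bits * i)) & mask for i in range(n_segments)]
    return segs[::-1]  # most-significant segment first

# 0x1253 = 0b0001_0010_0101_0011 -> segments 1, 2, 5, 3
print(segment_address(0x1253))  # [1, 2, 5, 3]
```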
Step 2: Mem2Graph
The central contribution of G-MAP:
| ID | Page | Offset |
|----|------|--------|
| 1  | A    | 5      |
| 2  | B    | 3      |
| 3  | C    | 1      |
| 4  | B    | 6      |
| 5  | A    | 2      |
| 6  | B    | 2      |
| 7  | C    | 2      |
| 8  | C    | 5      |
| 9  | C    | 7      |
[Figure: the segmented memory access sequence (Input) is mapped onto a graph (Graph Mapping), from which future accesses are predicted (Predict).]
Temporal Locality:
Linking successive memory accesses
Spatial Locality:
Linking a given access to the next access with the same page address
Attributes (or features) of each node:
The segmented address vector!
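The temporal and spatial linking rules above can be sketched as follows; this is an illustrative reconstruction, not the exact implementation:

```python
def mem2graph(accesses):
    """Build edge lists from a segmented access sequence.

    accesses: list of (page, offset) tuples, in access order.
    Temporal edges link each access to its successor; spatial edges
    link an access to the *next* access with the same page address.
    """
    temporal = [(i, i + 1) for i in range(len(accesses) - 1)]
    spatial = []
    for i, (page, _) in enumerate(accesses):
        for j in range(i + 1, len(accesses)):
            if accesses[j][0] == page:
                spatial.append((i, j))
                break  # only the next same-page access, not all of them
    return temporal, spatial

# First five rows of the example table (0-indexed nodes).
seq = [("A", 5), ("B", 3), ("C", 1), ("B", 6), ("A", 2)]
temporal, spatial = mem2graph(seq)
print(temporal)  # [(0, 1), (1, 2), (2, 3), (3, 4)]
print(spatial)   # [(0, 4), (1, 3)]
```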
Step 3: GNN-Based Prediction
GNN models used for prediction:
Output: the GNN produces node embeddings, which a readout function aggregates into a single vector that is then passed through a linear layer
(pipeline: GNN → Readout → Linear)
Readout functions:
scatter_sum, scatter_max, scatter_mean
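For intuition, here is a pure-Python stand-in for these readout functions on a single graph (the real pipeline would use the torch_scatter ops named above over batched node embeddings):

```python
def readout(node_vectors, mode="sum"):
    """Aggregate per-node embedding vectors into one graph-level vector.

    Simplified stand-in for scatter_sum / scatter_max / scatter_mean,
    assuming all nodes belong to a single graph.
    """
    dims = list(zip(*node_vectors))  # transpose: one tuple per feature dim
    if mode == "sum":
        return [sum(d) for d in dims]
    if mode == "max":
        return [max(d) for d in dims]
    if mode == "mean":
        return [sum(d) / len(node_vectors) for d in dims]
    raise ValueError(f"unknown readout mode: {mode}")

nodes = [[1.0, 2.0], [3.0, 4.0], [5.0, 0.0]]
print(readout(nodes, "sum"))   # [9.0, 6.0]
print(readout(nodes, "max"))   # [5.0, 4.0]
print(readout(nodes, "mean"))  # [3.0, 2.0]
```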
Step 4: Delta Bitmaps

Delta Bitmap Output: | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 |
| ID | Page | Offset |
|----|------|--------|
| 1  | A    | 5      |
| 2  | B    | 3      |
| 3  | C    | 1      |
| 4  | B    | 6      |
| 5  | A    | 2      |
| 6  | B    | 2      |
| 7  | C    | 2      |
| 8  | C    | 5      |
| 9  | C    | 7      |
Input → Predict
Predicted Deltas: 3, 5 (bits 3 and 5 are set in the bitmap)
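The bitmap encoding can be sketched as follows, assuming an 8-wide bitmap with 1-indexed delta positions to match the figure:

```python
def deltas_to_bitmap(deltas, width=8):
    """Encode a set of predicted deltas as a fixed-width bitmap.

    Bit i (1-indexed, matching the slide) is set when delta i is
    predicted; the width of 8 is illustrative.
    """
    bits = [0] * width
    for d in deltas:
        if 1 <= d <= width:
            bits[d - 1] = 1
    return bits

print(deltas_to_bitmap({3, 5}))  # [0, 0, 1, 0, 1, 0, 0, 0]
```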
Metrics and Results
Metric used: F1-score, the harmonic mean of precision and recall
We evaluated our approach on SPEC06 and SPEC17 traces, using the bwaves, milc, lbm, astar, and sphinx applications
Gated GNNs outperform the other models, performing best with sum as the readout function!
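For reference, a quick sketch of how the F1-score follows from precision and recall:

```python
def f1_score(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(f1_score(0.5, 1.0))  # 0.6666666666666666
```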
Problem 2: Memory Access Prediction for Secondary Storage
- Data prefetching for cache memory: a problem that has been addressed using various learning algorithms
- Can the same theory be extended to ‘data prefetching for a secondary storage system’?
- How can we capture the complex block correlations in the storage system?
Ideas surveyed
- one class for each of the Top-K most frequently occurring LBA deltas
- a dummy class for all other infrequent LBA deltas (no-prefetch)
Example:
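A hypothetical sketch of this Top-K labeling scheme (the delta values and K below are made up for illustration):

```python
from collections import Counter

def build_delta_classes(lba_deltas, k=2):
    """Label each LBA delta with a class id.

    The Top-K most frequent deltas get classes 0..k-1; every other
    (infrequent) delta falls into a dummy 'no-prefetch' class k.
    """
    top = [d for d, _ in Counter(lba_deltas).most_common(k)]
    label = {d: i for i, d in enumerate(top)}
    no_prefetch = k  # the dummy class id
    return [label.get(d, no_prefetch) for d in lba_deltas]

# Delta 1 occurs 4 times, delta 4 twice; 8 and 97 are infrequent.
deltas = [1, 1, 4, 1, 8, 4, 97, 1]
print(build_delta_classes(deltas, k=2))  # [0, 0, 1, 0, 2, 1, 2, 0]
```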
Future Directions
ii) a heterogeneous system
Main References/Papers Surveyed
Prefetching:
Thanks for listening!
Special thanks to my guides Prof. Prasanna, Pengmiao, Prof. Raghavendra, and Andy for making this an extremely enriching experience!