G-MAP: a Graph Neural Network approach
for Memory Access Prediction
G Abhiram
Guided by: Prof. Viktor Prasanna,
Pengmiao Zhang, PhD at Data Science Lab
P-Group
Outline
Introduction: Motivation
Background
Source:
Dubois, Michel, Murali Annavaram, and Per Stenström. Parallel Computer Organization and Design. Cambridge University Press, 2012.
[Figure: the Processor-DRAM performance gap ("Moore's Law"): processor speed has grown far faster than memory speed, with approximate memory access latencies for the L1, L2, and L3 caches — this gap motivates data prefetching.]
[Figure: modern workloads (graph analytics, neural networks) mapped onto processors (CPU, GPU, TPU, FPGA).]
Problem Definition
Input: a sequence of N history block addresses
Xt = {x1, x2, x3, ..., xN}
Output: an unordered set of 'k' future deltas
(a delta is the difference between two consecutive block addresses)
Problem to solve: predict the deltas of upcoming accesses from the address history

| 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 |
(upper bits: Page Address; lower bits: Block Index)
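To make the notion of a delta concrete, here is a minimal sketch (with made-up block addresses) of how deltas are derived from an address history:

```python
def block_deltas(addresses):
    """Deltas are the differences between consecutive block addresses.

    A predicted delta d means the block at (current address + d) is
    expected to be accessed soon, so it can be prefetched.
    """
    return [b - a for a, b in zip(addresses, addresses[1:])]

# Toy history of block addresses (hypothetical values).
history = [100, 101, 104, 105, 108]
print(block_deltas(history))  # [1, 3, 1, 3]
```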
Why Graph Neural Networks?
Limitations of existing methods:
- Rule-based prefetching: low accuracy and adaptability
- ML-based prefetchers: fail to capture the graph-like structure of memory accesses
G-MAP: proposed pipeline
Step 1: Address Segmentation

Raw address bits: | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | (Page Address + Block Index)
Segmentation: group the bits into fixed-width chunks, each represented as a small integer
Segmented Address: | 1 | 2 | 5 | 3 |

Why do this?
✔ avoids class explosion (millions of unique memory addresses)
✔ avoids tokenization (mapping a word to numerical data, which requires extra data)
✔ saves storage space
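A minimal sketch of the segmentation idea, assuming 4-bit segments purely for illustration (the actual segment width is a design choice of the pipeline):

```python
def segment_address(addr, seg_bits=4, n_segments=4):
    """Split an address into n_segments chunks of seg_bits bits each
    (widths here are illustrative, not the exact choice in G-MAP).

    Instead of one class per unique address (millions of classes),
    each position can only take 2**seg_bits distinct values.
    """
    mask = (1 << seg_bits) - 1
    segs = [(addr >> (seg_bits * i)) & mask for i in range(n_segments)]
    return segs[::-1]  # most-significant segment first

# 0x1253 = 0b0001_0010_0101_0011 -> segments 1, 2, 5, 3
print(segment_address(0x1253))  # [1, 2, 5, 3]
```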
Step 2: Mem2Graph
The central contribution of G-MAP:
| ID | Page | Offset |
|----|------|--------|
| 1  | A    | 5      |
| 2  | B    | 3      |
| 3  | C    | 1      |
| 4  | B    | 6      |
| 5  | A    | 2      |
| 6  | B    | 2      |
| 7  | C    | 2      |
| 8  | C    | 5      |
| 9  | C    | 7      |
[Figure: the segmented memory access sequence (Input) is mapped onto a graph (Graph Mapping), from which future accesses are predicted (Predict).]
Temporal Locality:
Linking successive memory accesses
Spatial Locality:
Linking a given access to the next access with the same page address
Attributes (or features) of each node:
The segmented address vector!
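The temporal and spatial linking rules above can be sketched as follows; this is an illustrative reconstruction, not the exact implementation:

```python
def mem2graph(accesses):
    """Build edge lists from a segmented access sequence.

    accesses: list of (page, offset) tuples, in access order.
    Temporal edges link each access to its successor; spatial edges
    link an access to the *next* access with the same page address.
    """
    temporal = [(i, i + 1) for i in range(len(accesses) - 1)]
    spatial = []
    for i, (page, _) in enumerate(accesses):
        for j in range(i + 1, len(accesses)):
            if accesses[j][0] == page:
                spatial.append((i, j))
                break  # only the next same-page access, not all of them
    return temporal, spatial

# First five rows of the example table (0-indexed nodes).
seq = [("A", 5), ("B", 3), ("C", 1), ("B", 6), ("A", 2)]
temporal, spatial = mem2graph(seq)
print(temporal)  # [(0, 1), (1, 2), (2, 3), (3, 4)]
print(spatial)   # [(0, 4), (1, 3)]
```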
Step 3: GNN-Based Prediction
GNN models used for prediction:
Output: the GNN produces node embeddings, which a readout function aggregates into a single vector that is then passed through a linear layer
(pipeline: GNN → Readout → Linear)
Readout functions:
scatter_sum, scatter_max, scatter_mean
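For intuition, here is a pure-Python stand-in for these readout functions on a single graph (the real pipeline would use the torch_scatter ops named above over batched node embeddings):

```python
def readout(node_vectors, mode="sum"):
    """Aggregate per-node embedding vectors into one graph-level vector.

    Simplified stand-in for scatter_sum / scatter_max / scatter_mean,
    assuming all nodes belong to a single graph.
    """
    dims = list(zip(*node_vectors))  # transpose: one tuple per feature dim
    if mode == "sum":
        return [sum(d) for d in dims]
    if mode == "max":
        return [max(d) for d in dims]
    if mode == "mean":
        return [sum(d) / len(node_vectors) for d in dims]
    raise ValueError(f"unknown readout mode: {mode}")

nodes = [[1.0, 2.0], [3.0, 4.0], [5.0, 0.0]]
print(readout(nodes, "sum"))   # [9.0, 6.0]
print(readout(nodes, "max"))   # [5.0, 4.0]
print(readout(nodes, "mean"))  # [3.0, 2.0]
```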
Step 4: Delta Bitmaps

Delta Bitmap Output: | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 |
| ID | Page | Offset |
|----|------|--------|
| 1  | A    | 5      |
| 2  | B    | 3      |
| 3  | C    | 1      |
| 4  | B    | 6      |
| 5  | A    | 2      |
| 6  | B    | 2      |
| 7  | C    | 2      |
| 8  | C    | 5      |
| 9  | C    | 7      |
Input → Predict
Predicted Deltas: 3, 5 (bits 3 and 5 are set in the bitmap)
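The bitmap encoding can be sketched as follows, assuming an 8-wide bitmap with 1-indexed delta positions to match the figure:

```python
def deltas_to_bitmap(deltas, width=8):
    """Encode a set of predicted deltas as a fixed-width bitmap.

    Bit i (1-indexed, matching the slide) is set when delta i is
    predicted; the width of 8 is illustrative.
    """
    bits = [0] * width
    for d in deltas:
        if 1 <= d <= width:
            bits[d - 1] = 1
    return bits

print(deltas_to_bitmap({3, 5}))  # [0, 0, 1, 0, 1, 0, 0, 0]
```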
Metrics and Results
Metric used: F1-score, the harmonic mean of precision and recall
We evaluated our approach on SPEC06 and SPEC17 traces, using the bwaves, milc, lbm, astar, and sphinx applications
Gated GNNs outperform the other models, performing best with sum as the readout function!
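For reference, a quick sketch of how the F1-score follows from precision and recall:

```python
def f1_score(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(f1_score(0.5, 1.0))  # 0.6666666666666666
```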
Problem 2: Memory Access Prediction for Secondary Storage
- Data prefetching for cache memory: a problem that has been addressed using various learning algorithms
- Can the same theory be extended to ‘data prefetching for a secondary storage system’?
- How can we capture the complex block correlations in the storage system?
Ideas surveyed
- one class for each of the Top-K most frequently occurring LBA deltas
- a dummy class for all other infrequent LBA deltas (no-prefetch)
Example:
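A hypothetical sketch of this Top-K labeling scheme (the delta values and K below are made up for illustration):

```python
from collections import Counter

def build_delta_classes(lba_deltas, k=2):
    """Label each LBA delta with a class id.

    The Top-K most frequent deltas get classes 0..k-1; every other
    (infrequent) delta falls into a dummy 'no-prefetch' class k.
    """
    top = [d for d, _ in Counter(lba_deltas).most_common(k)]
    label = {d: i for i, d in enumerate(top)}
    no_prefetch = k  # the dummy class id
    return [label.get(d, no_prefetch) for d in lba_deltas]

# Delta 1 occurs 4 times, delta 4 twice; 8 and 97 are infrequent.
deltas = [1, 1, 4, 1, 8, 4, 97, 1]
print(build_delta_classes(deltas, k=2))  # [0, 0, 1, 0, 2, 1, 2, 0]
```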
Future Directions
ii) a heterogeneous system
Main References/Papers Surveyed
Prefetching:
Thanks for listening!
Special thanks to my guides Prof. Prasanna, Pengmiao, Prof. Raghavendra, and Andy for making this an extremely enriching experience!