CS-773 Paper Presentation
Filter Caching for free: The untapped potential of the Store Buffer
Prajeeth.S
Team(#4) Spectredown
190050117@iitb.ac.in
Plan:
Store Queue (SQ):
Store Buffer (SB):
Scope for Improvements in SQ/SB
Optimal SQ/SB has a much higher hit ratio
Filter Cache: Overview
SQ/SB as Filter Cache?
Reducing L1/TLB accesses using SQ/SB
Store Buffer Cache (SBC)
SBC Synonyms
SBC Coherence: Naive Approaches
SBC Coherence: Optimization 1
SBC Coherence: Leading to Optimization 2
SBC Coherence: Optimization 2
SQ, SB, SBC/SBC (Assuming 2 colors)
Black epoch, write: dirty data from the SB is written to L1 with the black dirty bit set and moved to the SBC
Black epoch, invalidation/downgrade of non-black data: no effect on the SBC
Black epoch, invalidation/downgrade of black data: flush all black entries in the SBC and change the epoch to red
Red epoch, write: dirty data from the SB is written to L1 with the red dirty bit set and moved to the SBC
Red epoch, invalidation/downgrade of black data: no effect on the SBC
Red epoch, invalidation/downgrade of red data: flush all red entries from the SBC and change the epoch to black
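The two-color epoch steps can be sketched as a small software model. This is only an illustrative toy, not the paper's hardware design: the class name `ColoredSBC`, its methods, and the dictionaries standing in for the L1 dirty-color bits and the SBC are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class ColoredSBC:
    """Toy model of the two-color epoch scheme (hypothetical names throughout)."""
    epoch: str = "black"                            # current epoch color
    l1_color: dict = field(default_factory=dict)    # line address -> dirty color in L1
    sbc: dict = field(default_factory=dict)         # line address -> (data, color)

    def commit_store(self, addr, data):
        """A store leaves the SB: mark the L1 line with the epoch color, copy it to the SBC."""
        self.l1_color[addr] = self.epoch
        self.sbc[addr] = (data, self.epoch)

    def invalidate_or_downgrade(self, addr):
        """Handle a coherence request for an L1 line.

        Returns True if the SBC had to be flushed, False if it was unaffected.
        """
        color = self.l1_color.pop(addr, None)       # the line loses its dirty color
        if color != self.epoch:
            # Clean line, or dirtied in the other epoch: no effect on the SBC.
            return False
        # Line carries the current epoch color: flush every SBC entry of that
        # color, then switch to the other color.
        self.sbc = {a: (d, c) for a, (d, c) in self.sbc.items() if c != self.epoch}
        self.epoch = "red" if self.epoch == "black" else "black"
        return True

    def load(self, addr):
        """Return the SBC copy of addr, or None on an SBC miss."""
        entry = self.sbc.get(addr)
        return entry[0] if entry is not None else None
```

In this model a single invalidation of current-color data empties the SBC, while requests for lines of the other color or for clean lines leave it untouched, which mirrors the six cases listed above.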
SBC Coherence: Optimization 2 Extensions
SBC Coherence: flash-reset vs. 3 colors
7 colors, 15 colors, and flash reset are all almost optimal; 3 colors are good enough
Predicting hits: Memory Dependence Predictor
Modern systems can predict which cache level holds the requested data.
This prediction can be used to reduce the latency added by the filter cache.
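One way to picture how a hit prediction could steer a load is the sketch below. It is an assumption-laden toy: the function `lookup`, the dictionaries standing in for the SQ/SB/SBC and the L1, and the counter names are all hypothetical, and the policy shown (probe only the filter on a predicted hit, probe both on a predicted miss, replay to L1 on a misprediction) is one plausible reading of the slide rather than a confirmed description of the paper's design.

```python
from collections import Counter


def lookup(addr, predicted_hit, store_buffer, l1, stats):
    """Return the value for addr and count L1 probes in stats (hypothetical model)."""
    if predicted_hit:
        # Predicted hit: probe only the SQ/SB/SBC and skip the L1/TLB access.
        if addr in store_buffer:
            stats["l1_probes_saved"] += 1
            return store_buffer[addr]
        # Misprediction: replay the load to L1, which costs an extra step.
        stats["replays"] += 1
        stats["l1_probes"] += 1
        return l1.get(addr)
    # Predicted miss: probe both structures, as a conventional core would.
    stats["l1_probes"] += 1
    if addr in store_buffer:
        return store_buffer[addr]       # the youngest store value still wins
    return l1.get(addr)


# Example use with made-up contents.
stats = Counter()
store_buffer = {0x10: 7}
l1 = {0x10: 1, 0x20: 5}
value = lookup(0x10, True, store_buffer, l1, stats)
```

The counters make the trade-off visible: correct hit predictions save L1 probes, while wrong ones add replays.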
Results: With Predictor
Memory dependence predictor reduces hit ratio
Results: Energy Savings
Dynamic Energy Savings
Parallel Workloads
Results: IPC
IPC Improvements
Best and Worst cases
Read Locality (Y/N) | Predictor Accuracy (Y/N) | Energy      | Performance
Y                   | Y                        | Improvement | Improvement
Y                   | N                        | Same        | Same
N                   | Y                        | Same        | Same
N                   | N                        | Same        | May Reduce
Points to Discuss: