Instruction-Level Parallelism and Its Exploitation (Part 1)
Chapter 3
Appendix C & H
Outline
Prof. Iyad Jafar
Reading Assignment
Instruction-Level Parallelism: Concepts and Challenges
ILP Concepts and Challenges
CPI_Pipeline = CPI_Ideal + stall cycles per instruction due to hazards
Stall cycles due to hazards = structural stalls + data hazard stalls + control stalls
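The CPI decomposition above can be checked with a small calculation. This is a minimal Python sketch; the function name `pipeline_cpi` and the stall values are assumptions chosen for illustration, not figures from the lecture.

```python
def pipeline_cpi(ideal_cpi, structural_stalls, data_stalls, control_stalls):
    """Pipeline CPI = ideal CPI + average stall cycles per instruction."""
    return ideal_cpi + structural_stalls + data_stalls + control_stalls

# Hypothetical per-instruction stall averages (illustrative only).
cpi = pipeline_cpi(ideal_cpi=1.0,
                   structural_stalls=0.05,
                   data_stalls=0.30,
                   control_stalls=0.15)
print(f"Pipeline CPI = {cpi:.2f}")
```

Any technique that lowers one of the three stall terms moves the pipeline CPI closer to the ideal CPI.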
Techniques for Improving ILP
Data Dependences and Hazards
Name Dependence
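Antidependence (the ld writes x2, which the earlier add reads):
add x3, x2, x4
ld x2, 10(x5)
Output dependence (both instructions write x2):
add x2, x3, x4
ld x2, 10(x5)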
Data Hazards
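RAW (read after write): the add reads x2, which the ld writes
ld x2, 10(x5)
add x3, x2, x4
WAR (write after read): the ld writes x2, which the add reads
add x3, x2, x4
ld x2, 10(x5)
WAW (write after write): both instructions write x2
add x2, x3, x4
ld x2, 10(x5)
RAR (read after read): both instructions only read x2, so this is not a hazard
add x3, x2, x4
ld x5, 10(x2)
WAW and WAR hazards do not occur in the basic in-order pipeline; they arise with out-of-order execution.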
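The four instruction pairs above can be classified mechanically by comparing the registers each instruction reads and writes. The following Python sketch is illustrative only: the `(dest, sources)` tuple encoding and the name `classify_hazards` are assumptions, and memory dependences are ignored.

```python
def classify_hazards(first, second):
    """Classify register hazards between two instructions in program order.

    Each instruction is a (dest, sources) pair: dest is the register
    written (or None), sources is the set of registers read.
    """
    dest1, srcs1 = first
    dest2, srcs2 = second
    hazards = []
    if dest1 is not None and dest1 in srcs2:
        hazards.append("RAW")  # second reads what first writes
    if dest2 is not None and dest2 in srcs1:
        hazards.append("WAR")  # second writes what first reads
    if dest1 is not None and dest1 == dest2:
        hazards.append("WAW")  # both write the same register
    return hazards

ld_x2 = ("x2", {"x5"})          # ld  x2, 10(x5)
add_x3 = ("x3", {"x2", "x4"})   # add x3, x2, x4
add_x2 = ("x2", {"x3", "x4"})   # add x2, x3, x4
ld_x5 = ("x5", {"x2"})          # ld  x5, 10(x2)

print(classify_hazards(ld_x2, add_x3))   # RAW pair
print(classify_hazards(add_x3, ld_x2))   # WAR pair
print(classify_hazards(add_x2, ld_x2))   # WAW pair
print(classify_hazards(add_x3, ld_x5))   # RAR pair: empty list, no hazard
```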
Control Hazards
For: add x3, x2, x4
ld x2, 10(x5)
bne x2, x0, For
sub x1, x10, x18
Basic Compiler Techniques for Exposing ILP
Basic Compiler Techniques for ILP
Scheduling
Let’s analyze the performance
Scheduling
No scheduling
8 issue cycles per element
With scheduling
7 issue cycles per element
3 cycles are actually needed to process an element
2 stall cycles
2 cycles loop overhead per element?
Reduce loop overhead by unrolling the loop
Move ?
Loop Unrolling
Loop Unrolling
Unrolled loop without scheduling
4 elements are processed per iteration
27 cycles / 4 elements = 6.75 cycles per element
13 stall cycles!
12 cycles actually needed
2 cycles overhead / 4 elements
Better performance!
Schedule the loop
Note: number of live registers vs. original loop
Register pressure!
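The unrolling transformation itself can be sketched at the source level. The lecture's actual loop body does not appear in the text, so this Python sketch assumes a loop that adds a scalar to every array element; the function names are hypothetical. Each of the four copies would need its own temporary registers in the compiled code, which is where the extra register pressure comes from.

```python
def add_scalar_rolled(x, s):
    """Original loop: one element per iteration, overhead paid per element."""
    for i in range(len(x)):
        x[i] += s

def add_scalar_unrolled4(x, s):
    """Unrolled by 4: loop overhead is paid once per four elements.

    Assumes len(x) is a multiple of 4 (an assumption of this sketch).
    """
    assert len(x) % 4 == 0
    for i in range(0, len(x), 4):
        x[i] += s
        x[i + 1] += s
        x[i + 2] += s
        x[i + 3] += s

a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
b = list(a)
add_scalar_rolled(a, 0.5)
add_scalar_unrolled4(b, 0.5)
print(a == b)  # both versions compute the same result
```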
Loop Unrolling
4 elements are processed per iteration
14 cycles / 4 elements = 3.5 cycles per element
12 cycles actually needed
2 cycles overhead / 4 elements
0 stall cycles
Limitations
Unrolled loop with scheduling
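The cycle counts quoted for the four loop versions can be compared directly. This short Python sketch uses only the numbers from the slides; the dictionary layout and variable names are assumptions for illustration.

```python
# (total cycles per iteration, elements processed per iteration)
versions = {
    "no scheduling": (8, 1),
    "with scheduling": (7, 1),
    "unrolled x4": (27, 4),
    "unrolled x4 + scheduled": (14, 4),
}

cycles_per_element = {
    name: cycles / elements for name, (cycles, elements) in versions.items()
}
for name, cpe in cycles_per_element.items():
    print(f"{name:26s} {cpe:.2f} cycles/element")

# Breakdown checks: needed cycles + loop overhead + stalls = total cycles.
scheduled_total = 3 + 2 + 2             # single-element scheduled loop
unrolled_total = 12 + 2 + 13            # unrolled, not scheduled
unrolled_scheduled_total = 12 + 2 + 0   # unrolled and scheduled

speedup = (cycles_per_element["no scheduling"]
           / cycles_per_element["unrolled x4 + scheduled"])
print(f"speedup over the original loop: {speedup:.2f}x")
```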
Strip Mining
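When the trip count n is not a multiple of the unroll factor k, the loop can be split into two loops: one that runs the original body n mod k times and one that runs the unrolled body n / k times. The Python sketch below illustrates this under assumptions: the scalar-add loop body, the unroll factor of 4, and the function name are all hypothetical.

```python
K = 4  # unroll factor (an assumption of this sketch)

def add_scalar_strip_mined(x, s):
    """Add s to every element of x using a remainder loop plus an unrolled loop."""
    n = len(x)
    remainder = n % K

    # First loop: original body, executed n mod K times.
    for i in range(remainder):
        x[i] += s

    # Second loop: unrolled body, executed n // K times.
    for i in range(remainder, n, K):
        x[i] += s
        x[i + 1] += s
        x[i + 2] += s
        x[i + 3] += s
    return x

print(add_scalar_strip_mined(list(range(10)), 1))
```

With 10 elements, the first loop handles 2 elements and the unrolled loop runs twice.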
Advanced Branch Prediction Techniques
Advanced Branch Prediction
2-bit Predictor
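A 2-bit predictor is commonly implemented as a saturating counter, so a single misprediction does not flip a strongly biased prediction. The following is a minimal Python sketch under that assumption; the class and function names are hypothetical, and the branch pattern is made up for illustration.

```python
class TwoBitPredictor:
    """2-bit saturating counter: states 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self, state=0):
        self.state = state

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)

def count_mispredictions(outcomes, predictor):
    """Run the predictor over a list of branch outcomes and count misses."""
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses

# Illustrative loop branch: taken 9 times, then not taken; loop runs twice.
outcomes = ([True] * 9 + [False]) * 2
misses = count_mispredictions(outcomes, TwoBitPredictor(state=3))
print(misses)
```

Starting in the strongly-taken state, the only misses are the two loop exits; the exit does not disturb the prediction for the next run of the loop.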
Correlating predictors (two-level)
Correlating predictors (two-level)
Shift register
Correlating predictors (two-level)
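A two-level (m, 2) correlating predictor can be sketched as follows: an m-bit global history shift register selects one of 2^m two-bit counters inside the entry chosen by the low bits of the branch address. This Python code is an illustrative sketch, not the lecture's design; the class name, table sizes, and initial counter values are assumptions.

```python
class CorrelatingPredictor:
    """(m, 2) predictor sketch with a global history shift register."""

    def __init__(self, history_bits=2, pc_index_bits=4):
        self.history_mask = (1 << history_bits) - 1
        self.pc_mask = (1 << pc_index_bits) - 1
        self.history = 0  # newest outcome is shifted into bit 0
        # One row per PC index; each row holds 2**m two-bit counters.
        self.table = [[1] * (1 << history_bits)
                      for _ in range(1 << pc_index_bits)]

    def _row(self, pc):
        return self.table[(pc >> 2) & self.pc_mask]

    def predict(self, pc):
        return self._row(pc)[self.history] >= 2

    def update(self, pc, taken):
        row = self._row(pc)
        counter = row[self.history]
        row[self.history] = min(3, counter + 1) if taken else max(0, counter - 1)
        # Shift the outcome into the global history register.
        self.history = ((self.history << 1) | int(taken)) & self.history_mask

# Illustrative alternating branch (taken, not taken, ...) at a made-up address.
predictor = CorrelatingPredictor()
outcomes = [True, False] * 20
late_misses = 0
for step, taken in enumerate(outcomes):
    if step >= 20 and predictor.predict(0x40) != taken:
        late_misses += 1
    predictor.update(0x40, taken)
print(late_misses)
```

An alternating branch defeats a single 2-bit counter, but here each history value gets its own counter, so the pattern is learned after a short warm-up.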
G-share Predictor
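In a gshare predictor, the table of 2-bit counters is indexed by the XOR of the branch address bits and the global history. A minimal Python sketch follows; the class name, the 10-bit index width, and the test pattern are assumptions for illustration.

```python
class GsharePredictor:
    """gshare sketch: index = (PC bits) XOR (global history)."""

    def __init__(self, index_bits=10):
        self.mask = (1 << index_bits) - 1
        self.history = 0
        self.counters = [1] * (1 << index_bits)  # 2-bit saturating counters

    def _index(self, pc):
        return ((pc >> 2) ^ self.history) & self.mask

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        counter = self.counters[i]
        self.counters[i] = min(3, counter + 1) if taken else max(0, counter - 1)
        self.history = ((self.history << 1) | int(taken)) & self.mask

# Illustrative branch with a repeating taken, taken, not-taken pattern.
predictor = GsharePredictor()
outcomes = [True, True, False] * 100
late_misses = 0
for step, taken in enumerate(outcomes):
    if step >= 150 and predictor.predict(0x1000) != taken:
        late_misses += 1
    predictor.update(0x1000, taken)
print(late_misses)
```

Hashing the address with the history lets a single table capture both which branch is being predicted and the recent outcome pattern.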
Tournament Predictors
Tournament Predictors
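A tournament predictor combines two component predictors and uses a selector counter per branch to choose whichever has been more accurate. The Python sketch below is a simplification under stated assumptions: the "local" component is a plain per-address 2-bit counter rather than a local-history predictor, and all names and table sizes are hypothetical.

```python
def bump(counter, up):
    """Saturating 2-bit counter update."""
    return min(3, counter + 1) if up else max(0, counter - 1)

class TournamentPredictor:
    def __init__(self, index_bits=8):
        size = 1 << index_bits
        self.mask = size - 1
        self.history = 0
        self.local = [1] * size         # indexed by branch address
        self.global_table = [1] * size  # indexed by global history
        self.chooser = [1] * size       # >= 2 means "use the global component"

    def _components(self, pc):
        p = (pc >> 2) & self.mask
        local_pred = self.local[p] >= 2
        global_pred = self.global_table[self.history] >= 2
        return p, local_pred, global_pred

    def predict(self, pc):
        p, local_pred, global_pred = self._components(pc)
        return global_pred if self.chooser[p] >= 2 else local_pred

    def update(self, pc, taken):
        p, local_pred, global_pred = self._components(pc)
        # Train the chooser only when the two components disagree.
        if local_pred != global_pred:
            self.chooser[p] = bump(self.chooser[p], global_pred == taken)
        self.local[p] = bump(self.local[p], taken)
        self.global_table[self.history] = bump(self.global_table[self.history], taken)
        self.history = ((self.history << 1) | int(taken)) & self.mask

# Illustrative alternating branch: the global component learns it,
# the simple local counter does not, so the chooser settles on global.
predictor = TournamentPredictor()
outcomes = [True, False] * 100
late_misses = 0
for step, taken in enumerate(outcomes):
    if step >= 100 and predictor.predict(0x2000) != taken:
        late_misses += 1
    predictor.update(0x2000, taken)
print(late_misses)
```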
Tagged Hybrid Predictors (Best as of 2017)
Tagged Hybrid Predictor
The Evolution of Intel Core i7 Branch Predictor
The Evolution of Intel Core i7 Branch Predictor