Instruction-Level Parallelism and Its Exploitation (Part 2)
Chapter 3
Appendix H
Outline
Prof. Iyad Jafar
Dynamic Scheduling
Introduction
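fsub.d is stalled for no reason! It depends on neither fdiv.d nor fadd.d, yet in-order issue holds it behind fadd.d, which waits for f0 from fdiv.d.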
fdiv.d f0,f2,f4
fadd.d f10,f0,f8
fsub.d f12,f8,f14
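A minimal Python sketch of why this stall is unnecessary, assuming each instruction is reduced to a destination register and a tuple of source registers (the tuple layout and the helper name `raw_producers` are hypothetical). It lists, for each instruction, the earlier instructions it has a true (RAW) dependence on.

```python
# Each entry: (assembly text, destination register, source registers)
PROGRAM = [
    ("fdiv.d f0,f2,f4", "f0", ("f2", "f4")),
    ("fadd.d f10,f0,f8", "f10", ("f0", "f8")),
    ("fsub.d f12,f8,f14", "f12", ("f8", "f14")),
]


def raw_producers(program, k):
    """Return indices of earlier instructions whose result instruction k reads (RAW)."""
    _, _, sources = program[k]
    return [j for j in range(k) if program[j][1] in sources]


for k, (text, _, _) in enumerate(PROGRAM):
    print(f"{text:<20} depends on {raw_producers(PROGRAM, k)}")
```

Only fadd.d depends on fdiv.d (through f0); fsub.d depends on neither, so a dynamically scheduled pipeline could start it as soon as its own operands are ready.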
Dynamic Scheduling
Idea and Challenges
fdiv.d f0,f2,f4
fmul.d f6,f0,f8
fadd.d f0,f10,f14

fdiv.d f0,f2,f4
fadd.d f6,f0,f8
fsd f6,0(x1)
fsub.d f8,f10,f14
fmul.d f6,f10,f8
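The challenge with executing out of order is that name dependences become hazards. A small Python sketch, assuming the same hypothetical (text, destination, sources) encoding, classifies every register hazard in the five-instruction sequence (fdiv.d through fmul.d f6,f10,f8). The check is pairwise and conservative: it does not test whether an intervening write hides a dependence.

```python
# Each entry: (assembly text, destination register or None, source registers)
PROGRAM = [
    ("fdiv.d f0,f2,f4", "f0", ("f2", "f4")),
    ("fadd.d f6,f0,f8", "f6", ("f0", "f8")),
    ("fsd f6,0(x1)", None, ("f6", "x1")),    # a store writes memory, not a register
    ("fsub.d f8,f10,f14", "f8", ("f10", "f14")),
    ("fmul.d f6,f10,f8", "f6", ("f10", "f8")),
]


def register_hazards(program):
    """List (earlier, later, kind, register) for every RAW, WAR and WAW pair."""
    found = []
    for i, (_, dest_i, srcs_i) in enumerate(program):
        for j in range(i + 1, len(program)):
            _, dest_j, srcs_j = program[j]
            if dest_i is not None and dest_i in srcs_j:
                found.append((i, j, "RAW", dest_i))  # true dependence
            if dest_j is not None and dest_j in srcs_i:
                found.append((i, j, "WAR", dest_j))  # antidependence
            if dest_i is not None and dest_i == dest_j:
                found.append((i, j, "WAW", dest_i))  # output dependence
    return found


for i, j, kind, reg in register_hazards(PROGRAM):
    print(f"{kind} on {reg}: {PROGRAM[i][0]} -> {PROGRAM[j][0]}")
```

Besides the true dependences, it reports a WAR on f8 (fadd.d reads, fsub.d writes), a WAR on f6 (fsd reads, fmul.d writes) and a WAW on f6 (fadd.d and fmul.d): if fsub.d or fmul.d finished early, they would corrupt values the earlier instructions still need.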
Idea and Challenges
fdiv.d f0,f2,f4
fadd.d S,f0,f8
fsd S,0(x1)
fsub.d T,f10,f14
fmul.d f6,f10,T

fdiv.d f0,f2,f4
fadd.d f6,f0,f8
fsd f6,0(x1)
fsub.d f8,f10,f14
fmul.d f6,f10,f8
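The renamed listing does this by hand: fadd.d writes S and fsub.d writes T, so fsd reads S and fmul.d reads T. A minimal Python sketch of the same idea applied mechanically, assuming every destination gets a fresh name (the names p0, p1, ... and the function `rename` are hypothetical; in Tomasulo's algorithm the reservation-station tags play this role).

```python
from itertools import count

# Each entry: (mnemonic, destination register or None, source registers)
PROGRAM = [
    ("fdiv.d", "f0", ("f2", "f4")),
    ("fadd.d", "f6", ("f0", "f8")),
    ("fsd", None, ("f6", "x1")),
    ("fsub.d", "f8", ("f10", "f14")),
    ("fmul.d", "f6", ("f10", "f8")),
]


def rename(program):
    """Give every destination a fresh name and rewrite sources to the latest name."""
    fresh = (f"p{n}" for n in count())  # hypothetical rename-register names
    latest = {}                         # architectural register -> current name
    renamed = []
    for op, dest, srcs in program:
        # Read sources before updating the map, so an instruction that reads and
        # writes the same register still sees the older value.
        new_srcs = tuple(latest.get(r, r) for r in srcs)
        new_dest = None
        if dest is not None:
            new_dest = next(fresh)
            latest[dest] = new_dest
        renamed.append((op, new_dest, new_srcs))
    return renamed, latest


renamed, final_map = rename(PROGRAM)
for entry in renamed:
    print(entry)
print(final_map)
```

After renaming every destination is unique, so no WAR or WAW hazard remains; only the true dependences (fdiv.d to fadd.d, fadd.d to fsd, fsub.d to fmul.d) are left. The final map records which name holds each architectural register at the end (f6 maps to the fmul.d result), much like the register status table does.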
Tomasulo algorithm
Tomasulo algorithm
Tomasulo algorithm
Dynamic Scheduling - Tomasulo
Tomasulo Basic Steps
Dynamic Scheduling - Tomasulo
Dynamic Scheduling - Tomasulo
In all cases, the assumption is that no instruction may begin execution until all branches that precede it in program order have completed!
Dynamic Scheduling - Tomasulo
Notes
Fields in Reservation Stations
Op | Qj | Qk | Vj | Vk | A | Busy |
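A minimal Python sketch of one reservation station, assuming the usual meaning of these fields in Tomasulo's algorithm but using None instead of 0 for "no pending producer". The station names (Add1, Load2, ...) and the method names `ready` and `snoop` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReservationStation:
    name: str                   # e.g. "Add1", "Mult1", "Load1"
    busy: bool = False          # Busy: station and its functional unit are occupied
    op: Optional[str] = None    # Op: operation to perform on Vj and Vk
    qj: Optional[str] = None    # Qj: station producing the 1st operand (None = available)
    qk: Optional[str] = None    # Qk: station producing the 2nd operand (None = available)
    vj: Optional[float] = None  # Vj: 1st operand value, valid only when qj is None
    vk: Optional[float] = None  # Vk: 2nd operand value, valid only when qk is None
    a: Optional[int] = None     # A: immediate, later the effective address (loads/stores)

    def ready(self) -> bool:
        """Execution may start once the station is in use and no operand is pending."""
        return self.busy and self.qj is None and self.qk is None

    def snoop(self, tag: str, value: float) -> None:
        """Capture a result that station `tag` broadcasts on the common data bus."""
        if self.qj == tag:
            self.vj, self.qj = value, None
        if self.qk == tag:
            self.vk, self.qk = value, None


# fsub.d f8,f2,f6 with f6 already available and f2 still being loaded by Load2
add1 = ReservationStation("Add1", busy=True, op="fsub.d", qj="Load2", vk=1.5)
print(add1.ready())  # False: still waiting for Load2
add1.snoop("Load2", 4.0)
print(add1.ready(), add1.vj)
```

Keeping the value (V) and the producer tag (Q) side by side is what lets a station wait on a result without naming the architectural register at all.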
Fields in Register File
Qi | Register Content |
| … |
| … |
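A minimal Python sketch of how Qi ties the register file to the reservation stations during the Issue and Write Result steps. Assumptions (all hypothetical): stations are plain dicts, `qi` is a dict where a missing register means "value is in the register file" (instead of Qi = 0), the named station is already free, only register operands are handled, and the Execute step is omitted.

```python
def issue(name, op, dest, src1, src2, stations, regs, qi):
    """Issue step: place an FP operation in the free station `name`.

    `qi` is the register status table (register -> producing station);
    a register missing from `qi` holds its value in `regs`.
    """
    st = stations[name]
    for v_key, q_key, src in (("vj", "qj", src1), ("vk", "qk", src2)):
        producer = qi.get(src)
        if producer is None:             # operand available: copy the value now
            st[v_key], st[q_key] = regs[src], None
        else:                            # operand pending: remember its producer
            st[v_key], st[q_key] = None, producer
    st["busy"], st["op"] = True, op
    qi[dest] = name                      # later readers of dest now wait on `name`


def write_result(name, value, stations, regs, qi):
    """Write Result step: broadcast `value` from station `name` on the CDB."""
    for st in stations.values():
        if st.get("qj") == name:
            st["vj"], st["qj"] = value, None
        if st.get("qk") == name:
            st["vk"], st["qk"] = value, None
    for reg, producer in list(qi.items()):
        if producer == name:             # only registers whose latest producer is `name`
            regs[reg] = value
            del qi[reg]
    stations[name] = {"busy": False}     # free the station


regs = {"f2": 8.0, "f4": 2.0, "f8": 1.0, "f10": 5.0, "f14": 3.0}
qi = {}
stations = {n: {"busy": False} for n in ("Add1", "Add2", "Mult1")}

issue("Mult1", "fdiv.d", "f0", "f2", "f4", stations, regs, qi)
issue("Add1", "fadd.d", "f6", "f0", "f8", stations, regs, qi)    # f0 pending on Mult1
issue("Add2", "fsub.d", "f8", "f10", "f14", stations, regs, qi)  # WAR on f8 is harmless
write_result("Mult1", 4.0, stations, regs, qi)
print(stations["Add1"], qi)
```

Because Add1 copied the old f8 at issue, the later fsub.d can retarget f8 to Add2 without disturbing it; this is how the register status table removes WAR and WAW hazards.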
Tomasulo Algorithm
Tomasulo Algorithm
Example 1
Operation | Execute Latency (cycles) |
LOAD | 2 |
FP ADD | 2 |
MULTIPLY | 6 |
DIVIDE | 12 |
fld f6,32(x2)
fld f2,44(x3)
fmul.d f0,f2,f4
fsub.d f8,f2,f6
fdiv.d f10,f0,f6
fadd.d f6,f8,f2
Dynamic Scheduling - Tomasulo
Instruction | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | … | 24 | 25 |
fld f6,32(x2) | I | A | M | W | | | | | | | | | | | | | | |
fld f2,44(x3) | | I | A | M | W | | | | | | | | | | | | | |
fmul.d f0,f2,f4 | | | I | | | | | | | | | | | | | | | |
fsub.d f8,f2,f6 | | | | I | | | | | | | | | | | | | | |
fdiv.d f10,f0,f6 | | | | | I | | | | | | | | | | | | | |
fadd.d f6,f8,f2 | | | | | | I | | | | | | | | | | | | |
Compare with a pipelined implementation that has no dynamic scheduling!
Example 1
Example 1
Example 2
Operation | Execute Latency (cycles) |
LOAD | 1 |
FP ADD | 2 |
MULTIPLY | 6 |
DIVIDE | 12 |
fld f6,32(x2)
fld f2,44(x3)
fmul.d f0,f2,f4
fsub.d f8,f2,f6
fdiv.d f10,f0,f6
fadd.d f6,f8,f2
Example 2
Instruction | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | … | 24 | 25 | 26 |
fld f6,32(x2) | I | A | M | W | | | | | | | | | | | | | | |
fld f2,44(x3) | | I | A | M | W | | | | | | | | | | | | | |
fmul.d f0,f2,f4 | | | I | S | S | E | E | E | E | E | E | W | | | | | | |
fsub.d f8,f2,f6 | | | | I | S | E | E | W | | | | | | | | | | |
fdiv.d f10,f0,f6 | | | | | I | S | S | S | S | S | S | S | E | E | … | E | W | |
fadd.d f6,f8,f2 | | | | | | I | S | S | E | E | W | | | | | | | |
Compare with a pipelined implementation that has no dynamic scheduling!
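The Example 2 timeline can be reproduced with a simple cycle-count model. This is a sketch under stated assumptions, reading I = issue, A = address calculation, M = memory access, S = waiting in a reservation station, E = execute, W = write result (our reading of the table's letters). The model itself is hypothetical: one issue per cycle in program order, a free station and functional unit for every instruction, no common-data-bus conflicts, load base registers always ready, and a waiting instruction starts executing the cycle after its producer writes.

```python
# Execute latencies (cycles) from the Example 2 table
LATENCY = {"fld": 1, "fadd.d": 2, "fsub.d": 2, "fmul.d": 6, "fdiv.d": 12}

# (mnemonic, destination, FP source registers)
PROGRAM = [
    ("fld", "f6", ()),
    ("fld", "f2", ()),
    ("fmul.d", "f0", ("f2", "f4")),
    ("fsub.d", "f8", ("f2", "f6")),
    ("fdiv.d", "f10", ("f0", "f6")),
    ("fadd.d", "f6", ("f8", "f2")),
]


def tomasulo_timing(program):
    """Return (issue, first_exec, last_exec, write) cycles for each instruction.

    For loads, first_exec is the address cycle (A) and last_exec the last
    memory cycle (M).
    """
    write_cycle = {}  # register -> write cycle of its latest producer so far
    rows = []
    for n, (op, dest, srcs) in enumerate(program):
        issue = n + 1
        if op == "fld":
            first = issue + 1             # A: effective address
            last = first + LATENCY[op]    # M: memory access
        else:
            # Operands ready at issue, or one cycle after the producer's write
            first = max([issue + 1] + [write_cycle.get(r, 0) + 1 for r in srcs])
            last = first + LATENCY[op] - 1
        write = last + 1
        write_cycle[dest] = write         # dest now names this instruction's result
        rows.append((issue, first, last, write))
    return rows


for (op, dest, _), row in zip(PROGRAM, tomasulo_timing(PROGRAM)):
    print(f"{op:<7}{dest:<4}", row)
```

The sources are looked up before `write_cycle[dest]` is updated, so fdiv.d reads the f6 produced by the first load rather than the later fadd.d; this mirrors how issue-time tagging renames away the WAR hazard on f6.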
Example 2
Example 3
Hardware-Based Speculation
Introduction
Extending Tomasulo
Reorder Buffer
Reorder Buffer
HW-Based Speculation
Execution Steps
Execution Steps
Example 4
fld f6,32(x2)
fld f2,44(x3)
fmul.d f0,f2,f4
fsub.d f8,f2,f6
fdiv.d f10,f0,f6
fadd.d f6,f8,f2
Example 4
Instruction | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | … | 24 | 25 | 26 | 27 | 28 |
fld f6,32(x2) | F | I | A | M | W | C | | | | | | | | | | | | | | | |
fld f2,44(x3) | | F | I | A | M | W | C | | | | | | | | | | | | | | |
fmul.d f0,f2,f4 | | | F | I | S | S | E | E | E | E | E | E | W | C | | | | | | | |
fsub.d f8,f2,f6 | | | | F | I | S | E | E | W | - | - | - | - | - | C | | | | | | |
fdiv.d f10,f0,f6 | | | | | F | I | S | S | S | S | S | S | S | E | E | … | E | E | W | C | |
fadd.d f6,f8,f2 | | | | | | F | I | S | S | E | E | W | - | - | - | - | - | - | - | - | C |
Compare with a pipelined implementation that has no dynamic scheduling!
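The commit (C) column of Example 4 follows a simple rule, which this Python sketch checks. It reads F as fetch, C as commit and "-" as cycles an entry waits in the reorder buffer for earlier instructions to commit (our reading of the table). Under the hypothetical model of at most one commit per cycle, an entry commits one cycle after its write result, but never before the entry ahead of it has committed. The W cycles are taken from the Example 4 table.

```python
def commit_cycles(write_cycles):
    """In-order commit: each entry commits after its write-result cycle and
    after the previous entry in the reorder buffer has committed
    (hypothetical model: at most one commit per cycle)."""
    commits = []
    previous = 0
    for write in write_cycles:
        commit = max(write + 1, previous + 1)
        commits.append(commit)
        previous = commit
    return commits


# Write-result (W) cycles from the Example 4 table, in program order:
# fld, fld, fmul.d, fsub.d, fdiv.d, fadd.d
EXAMPLE4_WRITES = [5, 6, 13, 9, 26, 12]
print(commit_cycles(EXAMPLE4_WRITES))
```

fsub.d and fadd.d finish early but must wait behind fmul.d and fdiv.d respectively. That wait is what keeps the register file and memory in program order, so a mispredicted branch can be undone by flushing the uncommitted entries.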
Example 4
[Figure: reorder buffer contents, with the head entry marked]
HW-Based Speculation