CS 61C
Discussion 7: Pipelining
John Yang
Summer 2019
Announcements
Today’s Goal
Pipelining
A nifty technique for increasing the throughput of the processor datapath
Pipelining Motivation
Pipelining
In a setting with a CPU processing RISC-V code...
Pipelining
To make this work, we require more hardware, specifically more registers! These registers store the intermediate values between each stage.
Instruction Fetch
Decode
Execute
Memory
Write
Hazards
The transition from single cycle to pipelined CPU designs introduces problems with regards to correctly executing instructions in order
Structural Hazards
Two instructions in different stages attempt to use the same hardware simultaneously
Solutions
Registers
Memory
Clock
Write Data
Read Data
Data Hazards
Instruction relies on register with a new value that hasn’t been written back to the register by the time of retrieval
Example
add x9, x8, x7
add x10, x9, x8
Inst 1 writes to x9 after Inst 2 reads (Bad!)
| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
Inst 1 | IF | ID | EX | MEM | WB | | | | |
Inst 2 | | IF | ID | EX | MEM | WB | | | |
x9 value retrieved
x9 modified
Data Hazards
First Solution: More Stalling! Insert NOPs that delays the next instruction until the register is altered. The necessity of the delay is realized in the decode stage
Example
add x9, x8, x7
add x10, x9, x8
Delay causes it to be almost sequential!
| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
Inst 1 | IF | ID | EX | MEM | WB | | | | |
Inst 2 | | IF | ID | ID | ID | ID | EX | MEM | WB |
x9 value retrieved
x9 modified
Data Hazards
Second Solution: First Solution + Double Pumping (recall Structural Hazards). Read/Write accomplished in one clock cycle
Example
add x9, x8, x7
add x10, x9, x8
One fewer NOP isn’t much...
| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
Inst 1 | IF | ID | EX | MEM | WB | | | | |
Inst 2 | | IF | ID | ID | ID | EX | MEM | WB | |
x9 value retrieved
x9 modified
Data Hazards
Third Solution (The Best!): Forward the result from the Execute stage to the next instruction, specifically right before the result is actually used.
Example
add x9, x8, x7
add x10, x9, x8
| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
Inst 1 | IF | ID | EX | MEM | WB | | | | |
Inst 2 | | IF | ID | EX | MEM | WB | | | |
x9 value retrieved
x9 modified
Forward here!
x9 register actually used!
Implementing the hardware for this is quite complicated, not something we’ll discuss
Data Hazards
Edge Case! For the “lw” instruction, the value is only ready by the MEM, not EX stage, so there’s an unavoidable NOP for data hazards starting with “lw”
Example
lw x9, 0(x5)
add x10, x9, x8
| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
Inst 1 | IF | ID | EX | MEM | WB | | | | |
Inst 2 | | IF | ID | ID | EX | WB | | | |
x9 value retrieved
x9 modified
Forward here!
x9 register actually used!
Control Hazards
For branch + jump instructions, we don’t know if the next instruction address is PC+4 or PC+Imm until the ALU stage.
Example
loop: beq x9, x8, end
add x10, x9, x8
| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
Inst 1 | IF | ID | EX | MEM | WB | | | | |
Inst 2 | | IF | ID | EX | MEM | WB | | | |
Next Addr. Known
Is add or the instruction at end next?
Control Hazards
Solutions
Example
loop: beq x9, x8, end
add x10, x9, x8
| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
Inst 1 | IF | ID | EX | MEM | WB | | | | |
Next Addr. Known