
CS 61C

Discussion 7: Pipelining

John Yang

Summer 2019


Announcements

  • Midterm 2 next Monday!
  • Check In: tinyurl.com/john61c
  • Homework 5 due Wednesday
  • Project 3 due July 31st

Today’s Goals

  • Learn how to improve the throughput of the single-cycle datapath design with the help of pipelining.
  • Understand the kinds of obstacles and challenges that arise in a pipelined design and how they can interfere with correct execution.


Pipelining

A nifty technique for increasing the throughput of the processor datapath


Pipelining Motivation

  • Currently, in a single-cycle datapath setting, we wait for a single instruction to run to completion (all 5 stages) before processing the next instruction.
    • Think utilization! Only 1 of the 5 stages is active at any time => 20% utilization isn’t great! (See the worked example after this list.)
  • What if, instead of waiting for an instruction to finish completely, we passed the next instruction into the datapath as soon as the previous instruction finishes the IF stage?
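
As a rough worked example of the payoff (assuming all five stages take the same time t and ignoring hazards), consider 1000 instructions:

Single cycle: 1000 instructions × 5t per instruction = 5000t
Pipelined:    the first instruction finishes after 5t, then one finishes every t => (1000 + 4) × t ≈ 1004t

That is roughly a 5x improvement in throughput, even though each individual instruction still takes 5t of latency.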


Pipelining

In a setting with a CPU processing RISC-V code, successive instructions overlap in the datapath: while one instruction is being decoded, the next is already being fetched, so every stage can be working on a different instruction in each cycle.


Pipelining

To make this work, we require more hardware, specifically more registers! These pipeline registers sit between consecutive stages and store the intermediate values produced by each stage (they are commonly called IF/ID, ID/EX, EX/MEM, and MEM/WB).

The five stages: Instruction Fetch (IF) → Instruction Decode (ID) → Execute (EX) → Memory Access (MEM) → Writeback (WB)


Hazards

The transition from single-cycle to pipelined CPU designs introduces problems with executing instructions correctly and in order


Structural Hazards

Two instructions in different stages attempt to use the same piece of hardware simultaneously

  • “Hardware” here usually refers to the Register File and Memory
  • e.g. “lw” in Writeback and “add” in Instruction Decode both use the Register File (a concrete sketch follows below)

Solutions

  • Stall by inserting NOPs (no-operation instructions), which slows the pipeline down
  • Add more hardware + architecture! (More hardware can always resolve a structural hazard)

Registers

  • Writes occur on Rising Edge of Clock
  • Reads occur on Falling Edge of Clock
  • A.k.a. “Double Pumping”

Memory

  • Split Memory into IMEM (instruction memory) and DMEM (data memory), exactly what the original pipelined datapath did!
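
As a concrete sketch of the Register File case (the extra instructions are made up for illustration): without double pumping, instruction 1 and instruction 4 both need the Register File in cycle 5.

lw  x9, 0(x5)       # instruction 1: writes x9 to the Register File in WB (cycle 5)
add x20, x21, x22   # instruction 2
add x23, x24, x25   # instruction 3
add x10, x6, x7     # instruction 4: reads its operands from the Register File in ID (cycle 5)

With double pumping, the write happens in the first half of cycle 5 and the read in the second half, so both instructions are served.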



Data Hazards

An instruction relies on a register whose new value has not yet been written back to the register file by the time the instruction reads it

Example

add x9, x8, x7

add x10, x9, x8

Inst 1 writes to x9 after Inst 2 reads it (Bad!)

Cycle     1    2    3    4    5    6    7    8    9
Inst 1    IF   ID   EX   MEM  WB
Inst 2         IF   ID   EX   MEM  WB

x9 value retrieved: Inst 2’s ID stage (cycle 3)
x9 modified: Inst 1’s WB stage (cycle 5)


Data Hazards

First Solution: More Stalling! Insert NOPs that delay the next instruction until the register has been written. The need for the delay is detected in the Decode stage. (A code sketch of the inserted NOPs follows the diagram below.)

Example

add x9, x8, x7

add x10, x9, x8

Delay causes it to be almost sequential!

Cycle     1    2    3    4    5    6    7    8    9
Inst 1    IF   ID   EX   MEM  WB
Inst 2         IF   ID   ID   ID   ID   EX   MEM  WB

x9 modified: Inst 1’s WB stage (cycle 5)
x9 value retrieved: Inst 2’s final ID stage (cycle 6), after the write has completed
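
In code, the stall behaves as if NOPs were inserted between the two instructions (a sketch; in the real datapath the hardware simply holds Inst 2 in the Decode stage rather than the assembler emitting literal nop instructions):

add x9, x8, x7
nop                 # bubble: wait for x9 to be written
nop                 # bubble
nop                 # bubble
add x10, x9, x8     # now reads the updated x9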


Data Hazards

Second Solution: First Solution + Double Pumping (recall Structural Hazards). A write and a read to the same register can now happen within one clock cycle, so one fewer stall is needed (sketched below).

Example

add x9, x8, x7

add x10, x9, x8

One fewer NOP isn’t much...

Cycle     1    2    3    4    5    6    7    8    9
Inst 1    IF   ID   EX   MEM  WB
Inst 2         IF   ID   ID   ID   EX   MEM  WB

x9 modified: Inst 1’s WB stage (first half of cycle 5)
x9 value retrieved: Inst 2’s final ID stage (second half of cycle 5)
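
Mirroring the earlier NOP sketch (again an illustration, not literal assembler output), double pumping removes one bubble because the write in the first half of cycle 5 and the read in the second half share the cycle:

add x9, x8, x7
nop                 # bubble
nop                 # bubble
add x10, x9, x8     # reads x9 in the same cycle Inst 1 writes it back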


Data Hazards

Third Solution (The Best!): Forward the result from the Execute stage to the next instruction, specifically right before the result is actually used.

Example

add x9, x8, x7

add x10, x9, x8

Cycle     1    2    3    4    5    6    7    8    9
Inst 1    IF   ID   EX   MEM  WB
Inst 2         IF   ID   EX   MEM  WB

x9 value retrieved: Inst 2’s ID stage (cycle 3)
x9 modified: Inst 1’s WB stage (cycle 5)
Forward here: Inst 1’s result at the end of EX (cycle 3) is passed to Inst 2’s EX stage (cycle 4), where x9 is actually used

Implementing the hardware for this is quite complicated, not something we’ll discuss
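
As a sketch of where forwarded values come from in a chain of dependent instructions (standard two-path forwarding assumed; the third instruction is made up for illustration):

add x9, x8, x7      # x9 is produced at the end of EX (cycle 3)
add x10, x9, x8     # needs x9 in EX (cycle 4): forwarded from the EX/MEM pipeline register
add x11, x9, x10    # needs x9 and x10 in EX (cycle 5): x10 from EX/MEM, x9 from MEM/WB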


Data Hazards

Edge Case! For the “lw” instruction, the loaded value is only ready after the MEM stage, not the EX stage, so there is one unavoidable stall (NOP) for data hazards that start with “lw”. (A reordering trick that hides this stall is sketched after the diagram below.)

Example

lw x9, 0(x5)

add x10, x9, x8

Cycle     1    2    3    4    5    6    7    8    9
Inst 1    IF   ID   EX   MEM  WB
Inst 2         IF   ID   ID   EX   MEM  WB

x9 value retrieved: Inst 2’s ID stage
x9 modified: the loaded value is only available at the end of Inst 1’s MEM stage (cycle 4)
Forward here: from the end of MEM (cycle 4) into Inst 2’s EX stage (cycle 5), where x9 is actually used, after one unavoidable stall cycle
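
One common way to hide that unavoidable stall is to reorder an independent instruction into the gap (a sketch; it assumes the surrounding code contains an instruction, made up here, that does not depend on x9):

lw  x9, 0(x5)
add x11, x6, x7     # independent instruction fills the stall slot
add x10, x9, x8     # x9 forwarded from the end of MEM; no bubble needed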


Control Hazards

For branch + jump instructions, we don’t know if the next instruction address is PC+4 or PC+Imm until the ALU stage.

  • Execute Stage: ALU calculates PC+Imm
  • Instruction Fetch requires address to retrieve next instruction

Example

loop: beq x9, x8, end

add x10, x9, x8

Cycle     1    2    3    4    5    6    7    8    9
Inst 1    IF   ID   EX   MEM  WB
Inst 2         IF   ID   EX   MEM  WB

Next Addr. Known: at the end of Inst 1’s EX stage (cycle 3)
Is add or the instruction at end the next instruction? Inst 2 is fetched in cycle 2, before we know.


Control Hazards

Solutions

  • More stalling! Again, not desirable because more NOPs = more delay
  • Strategically reorder instructions to eliminate the hazard! (Code optimization in the compiler)
  • Branch Prediction: We guess / default to one outcome (see the sketch after this list)
    • If correct, keep executing! We’re good to go.
    • If incorrect, we have to flush / reset the pipeline, then proceed
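
For instance, under a simple “predict not taken” policy (a sketch; the exact flush mechanism depends on the datapath, and the instructions after the branch are made up for illustration):

loop: beq  x9, x8, end    # branch outcome known at the end of EX (cycle 3)
      add  x10, x9, x8    # fetched in cycle 2, assuming the branch is not taken
      sub  x11, x10, x8   # fetched in cycle 3, also speculative
end:  addi x12, x0, 1     # if the branch was actually taken, the add and sub are flushed
                          # (turned into bubbles) and fetch restarts here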

Example

loop: beq x9, x8, end

add x10, x9, x8

Cycle     1    2    3    4    5    6    7    8    9
Inst 1    IF   ID   EX   MEM  WB

Next Addr. Known: at the end of Inst 1’s EX stage (cycle 3)