Computer Architecture

Chapter 8: Pipeline

Outline

Introduction

  • Defining Pipelining
  • Pipelining Instructions

Hazards

  • Structural hazards
  • Data Hazards
  • Control Hazards

What is Pipelining?

A way of speeding up the execution of instructions: computer architecture relies on pipelines to process multiple instructions faster.

Key idea: overlap the execution of multiple instructions.

The Laundry Analogy

If we do laundry sequentially, each load must finish washing, drying, and folding before the next load begins.

To pipeline, we overlap the tasks: start washing the next load while the previous load is drying.

What is Pipelining? => Overlap Tasks

  • Pipelining processes multiple tasks simultaneously: a new task starts before earlier ones complete.
  • Pipelining does not reduce the latency of a single instruction, but it increases the throughput of the whole instruction stream.
  • The pipeline rate is limited by the slowest stage.
  • The potential speedup is at most the number of pipeline stages.
  • If the stage lengths are not equal, throughput suffers, and the time to "fill" the pipeline and to "drain" it reduces the speedup.
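These latency and throughput points can be checked with simple arithmetic. A minimal sketch, assuming five equal 200 ps stages (illustrative numbers, not taken from a specific figure):

```python
# Latency vs. throughput for a pipeline with assumed stage times (ps).
stage_times = [200, 200, 200, 200, 200]  # five equal stages (assumed)

latency = sum(stage_times)   # one instruction still takes 1000 ps
cycle = max(stage_times)     # rate is limited by the slowest stage

def total_time(n):
    # fill the pipeline for the first result, then one result per cycle
    return latency + (n - 1) * cycle

print(total_time(1))    # 1000 ps: latency of one instruction is unchanged
print(total_time(100))  # 20800 ps, vs 100 * 1000 ps without pipelining
```

With many instructions the fill time is amortized and throughput approaches one result per cycle.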


Pipelining a Digital System

Key idea: break the big computation up into pieces

Separate each piece with a pipeline register

Non-pipelined: 1 operation finishes every 1 ns

Pipelined: 1 operation finishes every 200 ps


Limitations of Pipeline

Pipelining increases throughput, but not latency

  • One operation completes every 200 ps,
  • BUT a single computation still takes 1 ns

Computations must be divisible into stages of similar size.

Pipeline registers add overhead.
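The overhead point can be made concrete. A sketch with assumed uneven stage delays and an assumed 20 ps register delay (illustrative values, not from the slide):

```python
# Clock period is set by the slowest stage plus the pipeline register.
stage_delays = [150, 300, 200, 250, 100]  # uneven stage delays in ps (assumed)
reg_overhead = 20                         # pipeline register delay in ps (assumed)

cycle = max(stage_delays) + reg_overhead  # 320 ps
unpipelined = sum(stage_delays)           # 1000 ps for the whole computation

speedup_limit = unpipelined / cycle       # 3.125: well below the 5 stages
print(cycle, speedup_limit)
```

Even with five stages, uneven stage sizes plus register overhead cap the speedup at roughly 3x here.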


Process in Pipeline

Recall the 5 steps in instruction execution:

1) Instruction fetch (IF) - Fetch the next instruction from main memory or the cache.

2) Instruction decode and register read (ID) - Decode the instruction and read its source operands from the register file.

3) Execute operation or calculate address (EX) - The ALU performs the operation specified by the opcode and operands, or calculates a memory address.

4) Memory access (MEM) - Access memory to read or write data.

5) Write result into register (WB) - Write the result of the processing back to the register file.

In a Single-Cycle Processor, all five steps are executed in a single clock cycle, with dedicated hardware performing each step.
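When these five steps become pipeline stages, each instruction advances one stage per cycle. A toy sketch of that timing (stage names from the list above):

```python
# Which stage does an instruction occupy in a given cycle? In an ideal
# pipeline, instruction i enters IF in cycle i and advances one stage per cycle.
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def stage_of(i, cycle):
    """Stage of instruction i (0-based issue order) in `cycle`, or None."""
    s = cycle - i
    return STAGES[s] if 0 <= s < len(STAGES) else None

print(stage_of(0, 0))  # IF
print(stage_of(0, 4))  # WB
print(stage_of(2, 2))  # IF: instruction 2 is fetched while instruction 0 executes
```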


Hardware for Single-Cycle Processor


Basic Pipeline for MIPS


Basic Pipelined Processor


Stage 1: Instruction Fetch (IF)

Fetch an instruction from memory every cycle

  • Use PC to index memory
  • Increment PC (assume no branches for now)

Write state to the pipeline register (IF/ID)

– The next stage will read this pipeline register
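As a sketch, the IF stage can be written as a function. The memory contents and IF/ID field names here are illustrative, not actual hardware signal names:

```python
# IF stage: read instruction memory at PC, increment PC (no branches yet),
# and latch the results into the IF/ID pipeline register.
def fetch(pc, imem):
    if_id = {"instr": imem[pc], "pc_plus_1": pc + 1}  # written to IF/ID
    return if_id, pc + 1                              # updated PC

imem = ["lw", "sw", "add", "sub"]   # toy instruction memory
if_id, pc = fetch(0, imem)
print(if_id["instr"], pc)   # lw 1
```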


Stage 2: Instruction Decode (ID)

Decodes opcode bits

– Set up Control signals for later stages

Read input operands from register file

– Specified by decoded instruction bits

Write state to the pipeline register (ID/EX)

  • Opcode
  • Register contents
  • PC+1 (even though decode didn’t use it)
  • Control signals for opcode and destination register


Stage 3: Execution (EX)

Perform ALU operations

  • Calculate result of instruction
    1. Control signals select operation
    2. Contents of regA used as one input
    3. Either regB or constant offset used as second input
  • Calculate PC-relative branch target; PC+1+(constant offset)


Write state to the pipeline register (EX/Mem)

  • ALU result, contents of regB, and PC+1+offset
  • Control signals for opcode and destination register

Stage 4: Memory (MEM)

Perform data cache access

  • ALU result contains address for LD (LOAD) or ST (STORE)
  • Opcode bits control R/W and enable signals

Write state to the pipeline register (Mem/WB)

  • ALU result and Loaded data
  • Control signals for opcode and destination register

Stage 5: Write-back (WB)

Write the result to the register file (if required)

  • Write Loaded data to destination register for LD
  • Write ALU result to destination register for arithmetic instruction
  • Opcode bits control register write enable signal

Single-Cycle vs Pipelining

From the following instruction processing figure, if the CPU processes the instruction 3 times, find the speedup of the pipelined operation over the non-pipelined operation. Note: the pipelined operation has a register delay of 0.3 ns per instruction.
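Since the figure is not reproduced here, the per-stage time below (0.5 ns) is an assumed value; only the 0.3 ns register delay and the 3 repetitions come from the slide. The calculation pattern itself is the standard one:

```python
# Non-pipelined vs pipelined time for n instructions on a 5-stage datapath.
n = 3          # instructions (from the slide)
stages = 5
t_stage = 0.5  # ns per stage -- assumed; read the real value off the figure
t_reg = 0.3    # ns pipeline register delay (from the slide)

t_single = n * stages * t_stage            # 7.5 ns, no overlap
t_cycle = t_stage + t_reg                  # 0.8 ns pipelined clock
t_pipe = (stages + (n - 1)) * t_cycle      # 5.6 ns: fill once, then 1 per cycle
print(round(t_single / t_pipe, 2))         # speedup 1.34 with these numbers
```

With only 3 instructions, fill time and register overhead keep the speedup far below the stage count; it grows toward t_stage-limited throughput as n increases.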


Pipeline Hazards

Instructions interfere with each other => hazards

- Example 1: different instructions may need the same piece of hardware (e.g., memory) in the same clock cycle

- Example 2: an instruction may require a result produced by an earlier instruction that is not yet complete


Pipeline Hazards

A hazard arises when one instruction cannot immediately follow another.

Types of hazards:

  1. Structural hazards: two different instructions use the same hardware in the same cycle
  2. Data hazards: an instruction depends on the result of a prior instruction still in the pipeline
  3. Control hazards: pipelining of branches and other instructions that change the PC

Hazards can always be resolved by waiting.

Structural Hazards

Two or more instructions attempt to use the same resource at the same time

Example: Single Memory for instructions and data

  • Accessed by IF stage
  • Accessed at same time by MEM stage

Solutions

  • Delay the second access by one clock cycle, OR
  • Provide separate memories for instructions & data

=> Real pipelined processors have separate caches
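A sketch of where the conflict lands in time, assuming the 5-stage timing (instruction i does IF in cycle i and reaches MEM in cycle i + 3):

```python
# With one shared memory, a load/store's MEM access collides with the IF
# of the instruction fetched three cycles later.
def memory_conflicts(n_instructions, mem_instr_indices):
    conflicts = []
    for i in mem_instr_indices:
        fetching = i + 3               # instruction doing IF during i's MEM
        if fetching < n_instructions:
            conflicts.append((i, fetching))
    return conflicts

# Instruction 0 is a load: its MEM (cycle 3) collides with IF of instruction 3.
print(memory_conflicts(5, [0]))   # [(0, 3)]
```

Separate instruction and data memories remove every such conflict, which is why real pipelined processors use split caches.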

Pipelined Example

Executing Multiple Instructions

Consider the following instruction sequence:

  lw  $r0, 10($r1)
  sw  $r3, 20($r4)
  add $r5, $r6, $r7
  sub $r8, $r9, $r10
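The cycle-by-cycle progress of this sequence can be reproduced as a text diagram; a sketch for the four instructions above:

```python
# Print a multicycle diagram: one row per instruction, one column per cycle.
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def diagram(names):
    cycles = len(names) + len(STAGES) - 1   # 8 cycles for 4 instructions
    rows = []
    for i, name in enumerate(names):
        cells = [STAGES[c - i] if 0 <= c - i < len(STAGES) else "."
                 for c in range(cycles)]
        rows.append(f"{name:<4}" + " ".join(f"{s:<3}" for s in cells))
    return rows

for row in diagram(["lw", "sw", "add", "sub"]):
    print(row)   # each row starts one cycle later: a staircase of stages
```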

Executing Multiple Instructions

Clock cycles 1 through 8 (figures omitted): each cycle a new instruction enters IF while every instruction already in the pipeline advances one stage.

Multicycle Diagram

Stall on Structural Hazards


Solutions for Structural Hazards

Stall

  • low cost, simple
  • use only for rare cases, since stalling hurts performance

Pipeline the hardware resource (separate resource)

  • useful for multi-cycle resources
  • good performance
  • complex

Replicate the resource

  • good performance
  • increases cost (and possibly interconnect delay)
  • useful for cheap or divisible resources


Data Hazards

Data hazards occur when data is used before it is ready

Example: The use of the result of the SUB instruction in the next three instructions causes a data hazard, since the register $2 is not written until after those instructions read it.
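A sketch that finds such RAW hazards in a register-transfer list. The concrete instructions mirror the $2 example above (each tuple is destination, then source registers); the 2-instruction window assumes no forwarding and a register file that can be written and then read in the same cycle:

```python
# Find RAW hazards: a later instruction reads a register that an earlier
# instruction has not yet written back.
code = [
    ("$2",  ["$1", "$3"]),   # sub $2,  $1, $3   -- writes $2
    ("$12", ["$2", "$5"]),   # and $12, $2, $5   -- reads $2 too soon
    ("$13", ["$6", "$2"]),   # or  $13, $6, $2   -- reads $2 too soon
    ("$14", ["$2", "$2"]),   # add $14, $2, $2
]

def raw_hazards(code, window=2):
    """Readers within `window` instructions of the writer hazard."""
    found = []
    for w, (dest, _) in enumerate(code):
        for r in range(w + 1, min(w + 1 + window, len(code))):
            if dest in code[r][1]:
                found.append((w, r, dest))
    return found

print(raw_hazards(code))   # [(0, 1, '$2'), (0, 2, '$2')]
```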

Data Hazards

  • Read After Write (RAW)
  • Write After Read (WAR)
  • Write After Write (WAW)

WAR and WAW hazards can't happen in the MIPS 5-stage pipeline, because all instructions take the same 5 stages in the same order.

Solutions for Data Hazards

Stalling

Forwarding

– connect the new value directly to the next stage

Reordering


Stall on Data Hazards


Data Hazards - Forwarding

Key idea: connect the new value directly to the next stage. The dependent instruction still reads s0 from the register file, but ignores that value in favor of the new result.


Data Hazards – Stall & Forwarding

A STALL is still required for a load, because the data is available only after the MEM stage. The MIPS architecture calls this a delayed load.
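A sketch of that rule: with forwarding in place, only the load-use case still stalls. The opcode names and tuple fields are illustrative:

```python
# Decide whether a one-cycle stall is needed between adjacent instructions.
# ALU results forward EX -> EX in time; load data arrives only after MEM.
def needs_stall(producer, consumer):
    op, dest, _ = producer
    _, _, srcs = consumer
    return op == "lw" and dest in srcs   # load-use: stall one cycle

print(needs_stall(("lw", "$2", ["$1"]), ("add", "$4", ["$2", "$3"])))         # True
print(needs_stall(("sub", "$2", ["$1", "$3"]), ("add", "$4", ["$2", "$3"])))  # False
```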


Control Hazards

A control hazard occurs when we need to determine the destination of a branch and cannot fetch any new instructions until that destination is known.

A branch is either

  • Branch Taken: The program jumps to a new address (target address).

  • Branch Not Taken: The program continues sequentially (next instruction in memory).


Control Hazards

When a branch is taken, flush the three instructions already fetched after it.


Solutions for Control Hazard

Stall

- stop loading instructions until the branch result is available

Predict

  • assume an outcome and continue fetching (undo if the prediction is wrong)
  • lose cycles only on a misprediction

Delayed branch

- the architecture specifies that the instruction immediately following a branch is always executed
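As one illustration of the "predict" option above, here is a 1-bit predictor that assumes each branch repeats its last outcome; the 3-cycle misprediction penalty is an assumed figure:

```python
# 1-bit branch predictor: predict the branch's last outcome and count the
# cycles lost to mispredictions.
def run_predictor(outcomes, penalty=3, initial=False):
    prediction, wrong = initial, 0
    for taken in outcomes:
        if taken != prediction:
            wrong += 1
        prediction = taken            # remember the last outcome
    return wrong, wrong * penalty

# A loop branch taken 9 times, then falling through: mispredicts twice,
# on the first iteration and on the loop exit.
print(run_predictor([True] * 9 + [False]))   # (2, 6)
```

This shows why prediction pays off: cycles are lost only on the two mispredictions, not on every branch.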


Control Hazards - Stall


Control Hazard - Prediction

If prediction is correct


Control Hazard - Prediction

If prediction is wrong => lose cycles


Control Hazard - Delayed Branches

Delayed branches – the compiler rearranges code to place an independent instruction after every branch (in the delay slot).