1 of 60

William Stallings, Computer Organization and Architecture, 7th Edition

Chapter 12

CPU Structure and Function

2 of 60

CPU Structure

  • CPU must:
    • Fetch instructions: Reading an instruction from the main memory (or the cache).
    • Interpret instructions: Analyze (decode) the instruction to determine the opcode and where the operands are
    • Fetch data: Get the operand values (from registers or memory)
    • Process data: Execute the instruction
    • Write data: Store the result (in register/memory)
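
A minimal C sketch of the cycle above, using a toy machine invented for the illustration (16-bit words, a 4-bit opcode, a 12-bit address field, a single accumulator); the opcodes and word layout are assumptions, not any real instruction set.

#include <stdio.h>

/* Toy machine: 16-bit words, upper 4 bits = opcode, lower 12 bits = address. */
enum { OP_HALT = 0, OP_LOAD = 1, OP_ADD = 2, OP_STORE = 3 };   /* made-up opcodes */

int main(void)
{
    unsigned short mem[4096] = {0};
    unsigned short pc = 0, ir = 0, ac = 0;   /* program counter, instruction reg., accumulator */

    /* Tiny program: AC = mem[100] + mem[101]; mem[102] = AC. */
    mem[0] = (OP_LOAD  << 12) | 100;
    mem[1] = (OP_ADD   << 12) | 101;
    mem[2] = (OP_STORE << 12) | 102;
    mem[3] = (OP_HALT  << 12);
    mem[100] = 7; mem[101] = 5;

    for (;;) {
        ir = mem[pc++];                      /* fetch instruction, advance PC        */
        unsigned op   = ir >> 12;            /* interpret: extract opcode            */
        unsigned addr = ir & 0x0FFF;         /*            and operand address       */
        if (op == OP_HALT) break;
        unsigned short data = mem[addr];     /* fetch data (operand value)           */
        switch (op) {                        /* process data                         */
        case OP_LOAD:  ac = data;        break;
        case OP_ADD:   ac = ac + data;   break;
        case OP_STORE: mem[addr] = ac;   break;   /* write data (store the result)   */
        }
    }
    printf("mem[102] = %u\n", (unsigned)mem[102]);   /* prints 12 */
    return 0;
}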

3 of 60

CPU With Systems Bus

4 of 60

CPU Internal Structure

5 of 60

Registers

  • CPU must have some working space (temporary storage)
  • Called registers
  • Number and function vary between processor designs
  • One of the major design decisions
  • Top level of memory hierarchy

6 of 60

User Visible Registers

  • General Purpose (e.g. accumulator)
  • Data (registers that hold operand values only)
  • Address (e.g. segment pointers, index registers, stack pointer)
  • Condition Codes (flags)

7 of 60

General Purpose Registers (1)

  • May be true general purpose
  • May be restricted
  • May be used for data or addressing
  • Data
    • Accumulator
  • Addressing
    • Segment

8 of 60

General Purpose Registers (2)

  • Make them general purpose
    • Increase flexibility and programmer options
    • Increase instruction size & complexity
  • Make them specialized
    • Smaller (faster) instructions
    • Less flexibility
  • Remember
    • You can write an instruction without operands if the operands are known implicitly (like ADD on an accumulator machine)
    • You have to write the operands explicitly if they are not known implicitly (like ADD A,B)
    • Writing operands explicitly always increases the instruction length.

9 of 60

How Many GP Registers?

  • Typically between 8 and 32
  • Fewer registers mean more memory references
  • More registers do not noticeably reduce memory references and take up processor real estate
  • See also RISC

10 of 60

How big?

  • Large enough to hold full address
  • Large enough to hold full word
  • Often possible to combine two data registers to hold a double-length value
    • e.g. in C (see the sketch below):
    • double a;         (64-bit floating-point value)
    • long long int a;  (64-bit integer value)
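
A small C illustration of the point above; the sizes printed depend on the compiler and target, so the comment about needing a register pair describes the common 32-bit (ILP32) case and is an assumption.

#include <stdio.h>

int main(void)
{
    /* On a typical 32-bit (ILP32) target, long fits in one register while     */
    /* double and long long need a register pair; exact sizes are              */
    /* implementation-defined, so just print what this compiler uses.          */
    printf("sizeof(long)      = %zu bytes\n", sizeof(long));
    printf("sizeof(double)    = %zu bytes\n", sizeof(double));
    printf("sizeof(long long) = %zu bytes\n", sizeof(long long));
    return 0;
}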

11 of 60

Condition Code Registers (Flags)

  • Sets of individual bits
    • e.g. result of last operation was zero
  • Can be read (implicitly) by programs
    • e.g. Jump if zero
  • Cannot (usually) be set explicitly by programs

12 of 60

Control & Status Registers

  • Program Counter
  • Instruction Decoding Register
  • Memory Address Register
  • Memory Buffer Register

  • Revision: what do these all do?

13 of 60

Program Status Word

  • A set of bits
  • Includes the Condition Codes:
    • Sign (SF): 1 if the last result was negative, otherwise 0
    • Zero (ZF): 1 if the last result was 0, otherwise 0
    • Carry (CF): 1 if the last operation produced a carry, otherwise 0
    • Equal (EF): 1 if the compared values were equal, otherwise 0
    • Overflow (OF): 1 if the last operation overflowed, otherwise 0
  • Interrupt enable/disable (IF): interrupts are disabled when IF = 0
  • Supervisor mode bit

🡺 These flags are accessible from low-level programming languages (e.g. assembly); the sketch below shows how they could be computed.
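
A hedged C sketch of how the condition-code bits above could be derived for an 8-bit addition; real processors set these flags in hardware, and the flag names simply follow the slide rather than any particular instruction set manual.

#include <stdio.h>

int main(void)
{
    signed char a = 100, b = 50;                      /* 8-bit operands                   */
    unsigned wide = (unsigned char)a + (unsigned char)b;
    signed char result = (signed char)wide;           /* two's-complement wrap assumed    */

    int SF = (result < 0);                            /* sign of the result               */
    int ZF = (result == 0);                           /* result is zero                   */
    int CF = (wide > 0xFF);                           /* carry out of bit 7               */
    int OF = ((a < 0) == (b < 0)) && ((result < 0) != (a < 0));  /* signed overflow       */
    int EF = (a == b);                                /* operands compared equal          */

    printf("SF=%d ZF=%d CF=%d OF=%d EF=%d\n", SF, ZF, CF, OF, EF);
    return 0;
}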

14 of 60

Supervisor Mode

  • Intel ring zero
  • Kernel mode
  • Allows privileged instructions to execute
  • Used by operating system
  • Not available to user programs

15 of 60

Other Registers

  • May have registers pointing to:
    • Process control blocks (see O/S)
    • Interrupt Vectors (see O/S)

  • N.B. CPU design and operating system design are closely linked

16 of 60

Example Register Organizations

17 of 60

Instruction Cycle

  • Revision
  • Stallings Chapter 3

18 of 60

Indirect Cycle

  • May require memory access to fetch operands
  • Indirect addressing requires more memory accesses
  • Can be thought of as additional instruction subcycle

19 of 60

Instruction Cycle with Indirect

20 of 60

Instruction Cycle State Diagram

21 of 60

Data Flow (Instruction Fetch)

  • Depends on CPU design
  • In general:

  • Fetch
    • PC contains address of next instruction
    • Address moved to MAR
    • Address placed on address bus
    • Control unit requests memory read
    • Result placed on data bus, copied to MBR, then to IR
    • Meanwhile PC is incremented (by the instruction length) to point to the next instruction
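
The fetch data flow above can be mimicked in a few lines of C, with the PC, MAR, MBR and IR modelled as plain variables and memory as an array; the bus transfers and control-unit signals are only implied by the comments, and the instruction words are made up.

#include <stdio.h>

#define MEM_SIZE 16

unsigned mem[MEM_SIZE] = { 0x1064, 0x2065, 0x3066 };   /* a few made-up instruction words */
unsigned PC = 0, MAR, MBR, IR;

/* One instruction fetch: PC -> MAR -> address bus -> memory read -> MBR -> IR. */
void fetch(void)
{
    MAR = PC;            /* address of next instruction moved to MAR            */
    MBR = mem[MAR];      /* control unit requests memory read; result -> MBR    */
    IR  = MBR;           /* instruction copied from MBR into IR                 */
    PC  = PC + 1;        /* meanwhile PC is advanced to the next instruction    */
}

int main(void)
{
    fetch();
    printf("IR = 0x%04X, PC = %u\n", IR, PC);   /* IR = 0x1064, PC = 1 */
    return 0;
}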

22 of 60

Data Flow (Data Fetch)

  • IR is examined
  • If indirect addressing, indirect cycle is performed
    • Rightmost N bits of MBR transferred to MAR
    • Control unit requests memory read
    • Result (address of operand) moved to MBR
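
The indirect cycle can be sketched the same way; the 12-bit address field and the memory contents are assumptions made for the illustration.

#include <stdio.h>

unsigned mem[16] = { [0] = 0x2005, [5] = 9 };   /* word 0's address field points at word 5 */
unsigned MAR, MBR;

int main(void)
{
    MBR = mem[0];                 /* instruction word already fetched into MBR            */
    MAR = MBR & 0x0FFF;           /* rightmost N bits (assumed 12) of MBR moved to MAR    */
    MBR = mem[MAR];               /* memory read: MBR now holds the operand's address     */
    printf("operand address = %u\n", MBR);       /* prints 9 */
    return 0;
}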

23 of 60

Data Flow (Fetch Diagram)

24 of 60

Data Flow (Indirect Diagram)

25 of 60

Data Flow (Execute)

  • May take many forms
  • Depends on instruction being executed
  • May include
    • Memory read/write
    • Input/Output
    • Register transfers
    • ALU operations

26 of 60

Interrupt sequence

27 of 60

Data Flow (Interrupt)

  • Simple
  • Predictable
  • Current PC saved to allow resumption after interrupt
  • Contents of PC copied to MBR
  • Address of a special memory location (e.g. the top of the stack, given by the stack pointer) loaded into MAR
  • MBR written to memory
  • PC loaded with address of interrupt handling routine
  • Next instruction (first of interrupt handler) can be fetched
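
A sketch of the interrupt data flow above, again with registers as plain variables; the downward-growing stack and the handler address are assumptions made for the illustration.

#include <stdio.h>

unsigned mem[256];
unsigned PC = 0x42, MAR, MBR, SP = 0x80;
const unsigned HANDLER_ADDR = 0xA0;      /* assumed address of the interrupt handler */

void take_interrupt(void)
{
    MBR = PC;               /* contents of PC copied to MBR                        */
    SP  = SP - 1;           /* assumed: stack grows downwards                      */
    MAR = SP;               /* special memory location (top of stack) -> MAR       */
    mem[MAR] = MBR;         /* MBR written to memory (old PC saved)                */
    PC  = HANDLER_ADDR;     /* PC loaded with address of the interrupt handler     */
}

int main(void)
{
    take_interrupt();
    printf("saved PC = 0x%02X at mem[0x%02X], new PC = 0x%02X\n", mem[MAR], MAR, PC);
    return 0;
}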

28 of 60

Data Flow (Interrupt Diagram)

29 of 60

Prefetch

  • The fetch stage accesses main memory (or the cache)
  • Execution usually does not access main memory
  • Can fetch next instruction during execution of current instruction
  • Called instruction prefetch

30 of 60

Improved Performance

  • But performance is not doubled:
    • Fetch is usually shorter than execution
      • Prefetch more than one instruction?
    • Any jump or branch means that the prefetched instructions are not the required instructions
  • Add more stages to improve performance

  • Example 1: straight-line code, prefetch works perfectly

    INC R1
    ADD (100)
    SUB R3
    DIV (200)
    LOAD R6

  • Example 2: transfer of control reduces prefetch benefits (the instruction prefetched after JMP XYZ is discarded)

    XYZ: MOVE R1,-3
         ADD (100)
         ISZ R1
         JMP XYZ
         DIV (200)
         LOAD R6
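
A rough C sketch of why prefetch helps straight-line code more than code with branches, under made-up timing assumptions (fetch = 1 time unit, execute = 2, and each taken branch forces one extra fetch); the numbers are illustrative only.

#include <stdio.h>

int main(void)
{
    const int F = 1, E = 2;        /* assumed: fetch takes 1 unit, execute takes 2  */
    const int n = 6;               /* instructions executed                         */
    const int taken_branches = 1;  /* e.g. JMP XYZ above is taken once              */

    int no_prefetch   = n * (F + E);
    int with_prefetch = F + n * E              /* fetch overlapped with execution   */
                      + taken_branches * F;    /* refetch after each taken branch   */

    printf("no prefetch:   %d time units\n", no_prefetch);
    printf("with prefetch: %d time units\n", with_prefetch);
    return 0;
}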

31 of 60

Pipelining

  • Fetch instruction
  • Decode instruction
  • Calculate operand addresses (i.e. effective addresses, EAs)
  • Fetch operands
  • Execute instruction
  • Write result

  • Overlap these operations

  • Example: straight-line code

    INC R1
    ADD (100)
    SUB R3
    DIV (200)

  • Example: unconditional branch

    Lab:  INC R1
          ADD (100)
          JMP Lab
          DIV (200)

  • Example: conditional branches

    Lab:  MOV R1,50
          BRE R1,R2,Lab
    Lab2: SUB X,Y
          BRZ Lab2
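
The benefit of a k-stage pipeline can be quantified with the usual ideal timing from Stallings, Chapter 12: n instructions take k + (n - 1) cycles instead of n × k, so the speedup is nk / (k + n - 1) when there are no branches or other hazards. A short C illustration:

#include <stdio.h>

int main(void)
{
    const int k = 6;                            /* pipeline stages (as in the six-stage pipeline) */
    const int cases[] = { 1, 10, 100, 1000 };   /* numbers of instructions                        */

    for (int i = 0; i < (int)(sizeof cases / sizeof cases[0]); i++) {
        int n = cases[i];
        double cycles_pipelined = k + (n - 1);  /* ideal, hazard-free pipeline   */
        double cycles_serial    = (double)n * k;
        printf("n = %4d: speedup = %.2f\n", n, cycles_serial / cycles_pipelined);
    }
    return 0;                                   /* speedup approaches k as n grows */
}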

32 of 60

Two Stage Instruction Pipeline

33 of 60

Timing Diagram for Instruction Pipeline Operation

34 of 60

The Effect of a Conditional Branch on Instruction Pipeline Operation

35 of 60

The Effect of an Unconditional Branch on the Instruction Pipeline Operation

36 of 60

Six Stage Instruction Pipeline

37 of 60

Alternative Pipeline Depiction

38 of 60

Speedup Factors with Instruction Pipelining

39 of 60

Dealing with Branches

  • Multiple Streams
  • Prefetch Branch Target
  • Loop buffer
  • Branch prediction
  • Delayed branching

40 of 60

Multiple Streams

  • Have two pipelines
  • Prefetch each branch into a separate pipeline
  • Use appropriate pipeline

  • Leads to bus & register contention
  • Multiple branches lead to further pipelines being needed

41 of 60

Prefetch Branch Target

  • Target of branch is prefetched in addition to instructions following branch
  • Keep target until branch is executed
  • Used by IBM 360/91

42 of 60

Loop Buffer

  • Very fast memory
  • Maintained by fetch stage of pipeline
  • Check buffer before fetching from memory
  • Very good for small loops or jumps
  • c.f. cache
  • Used by CRAY-1

A small loop such as the one below can be held entirely in the loop buffer:

for (int i = 0; i < 10; i++)
{
    ...
}

43 of 60

Loop Buffer Diagram

44 of 60

Branch Prediction (1)

  • Predict never taken
    • Assume that jump will not happen
    • Always fetch next instruction
    • 68020 & VAX 11/780
    • VAX will not prefetch after branch if a page fault would result (O/S v CPU design)
  • Predict always taken
    • Assume that jump will happen
    • Always fetch target instruction

45 of 60

Branch Prediction (2)

  • Predict by Opcode
    • Some instructions are more likely to result in a jump than others
    • Can get up to 75% success
  • Taken/Not taken switch
    • Based on previous history (see the sketch below)
    • Good for loops
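
A minimal C sketch of a taken/not-taken switch with two bits of history (a saturating counter), the mechanism behind the branch prediction state diagram later in the chapter; the sequence of branch outcomes is invented for the illustration.

#include <stdio.h>

/* 2-bit saturating counter: 0,1 predict not taken; 2,3 predict taken. */
static int counter = 2;                      /* start weakly "taken" (an assumption) */

int predict(void) { return counter >= 2; }

void update(int taken)
{
    if (taken)  { if (counter < 3) counter++; }
    else        { if (counter > 0) counter--; }
}

int main(void)
{
    /* Invented outcome pattern of one branch: a loop taken 7 times, then it exits. */
    int outcomes[] = { 1, 1, 1, 1, 1, 1, 1, 0 };
    int total = (int)(sizeof outcomes / sizeof outcomes[0]);
    int correct = 0;

    for (int i = 0; i < total; i++) {
        if (predict() == outcomes[i]) correct++;
        update(outcomes[i]);                 /* remember what actually happened */
    }
    printf("correct predictions: %d of %d\n", correct, total);   /* 7 of 8 */
    return 0;
}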

46 of 60

Branch Prediction (3)

  • Delayed Branch
    • Do not take the jump until you have to
    • Rearrange instructions so that useful work is done while the branch resolves

47 of 60

Branch Prediction Flowchart

48 of 60

Branch Prediction State Diagram

49 of 60

Dealing With Branches

50 of 60

Intel 80486 Pipelining

  • Fetch
    • From cache or external memory
    • Put in one of two 16-byte prefetch buffers
    • Fill buffer with new data as soon as old data consumed
    • Average 5 instructions fetched per load
    • Independent of other stages to keep buffers full
  • Decode stage 1
    • Opcode & address-mode info
    • At most first 3 bytes of instruction
    • Can direct D2 stage to get rest of instruction
  • Decode stage 2
    • Expand opcode into control signals
    • Computation of complex address modes
  • Execute
    • ALU operations, cache access, register update
  • Writeback
    • Update registers & flags
    • Results sent to cache & bus interface write buffers

51 of 60

80486 Instruction Pipeline Examples

52 of 60

Pentium 4 Registers

53 of 60

EFLAGS Register

54 of 60

Control Registers

55 of 60

MMX Register Mapping

  • MMX uses several 64-bit data types
  • Uses 3-bit register address fields
    • 8 registers
  • No MMX-specific registers
    • Aliased to the lower 64 bits of the existing floating-point registers

56 of 60

Mapping of MMX Registers to Floating-Point Registers

57 of 60

Pentium Interrupt Processing

  • Interrupts
    • Maskable
    • Nonmaskable
  • Exceptions
    • Processor detected
    • Programmed
  • Interrupt vector table
    • Each interrupt type assigned a number
    • Index into the vector table (see the sketch below)
    • 256 × 32-bit interrupt vectors
  • 5 priority classes
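
A hedged C sketch of the vector-table idea: the interrupt number indexes a table of handler addresses. The 256-entry size matches the slide; the handler functions and dispatch code are invented and do not show how the Pentium hardware actually transfers control (which goes through gate descriptors).

#include <stdio.h>

#define NUM_VECTORS 256

typedef void (*handler_t)(void);

static void divide_error_handler(void) { printf("divide error\n"); }   /* invented handler */
static void default_handler(void)      { printf("unhandled interrupt\n"); }

static handler_t vector_table[NUM_VECTORS];

int main(void)
{
    for (int i = 0; i < NUM_VECTORS; i++)       /* fill the table with a default entry      */
        vector_table[i] = default_handler;
    vector_table[0] = divide_error_handler;     /* vector 0 is divide error on the Pentium  */

    int interrupt_number = 0;                   /* pretend interrupt 0 was raised           */
    vector_table[interrupt_number]();           /* index into the table and dispatch        */
    return 0;
}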

58 of 60

PowerPC User Visible Registers

59 of 60

PowerPC Register Formats

60 of 60

Foreground Reading

  • Processor examples
  • Stallings Chapter 12
  • Manufacturer web sites & specs