1 of 60

William Stallings, Computer Organization and Architecture, 7th Edition

Chapter 12

CPU Structure and Function

2 of 60

CPU Structure

  • CPU must:
    • Fetch instructions: Reading an instruction from the main memory (or the cache).
    • Interpret instructions: Analyze (decode) the instruction to determine the opcode and where the operands are
    • Fetch data: Get the operand values (from registers or memory)
    • Process data: Execute the instruction
    • Write data: Store the result (in register/memory)
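
A minimal C sketch of the cycle above, using a toy machine invented for the illustration (16-bit words, a 4-bit opcode, a 12-bit address field, a single accumulator); the opcodes and word layout are assumptions, not any real instruction set.

#include <stdio.h>

/* Toy machine: 16-bit words, upper 4 bits = opcode, lower 12 bits = address. */
enum { OP_HALT = 0, OP_LOAD = 1, OP_ADD = 2, OP_STORE = 3 };   /* made-up opcodes */

int main(void)
{
    unsigned short mem[4096] = {0};
    unsigned short pc = 0, ir = 0, ac = 0;   /* program counter, instruction reg., accumulator */

    /* Tiny program: AC = mem[100] + mem[101]; mem[102] = AC. */
    mem[0] = (OP_LOAD  << 12) | 100;
    mem[1] = (OP_ADD   << 12) | 101;
    mem[2] = (OP_STORE << 12) | 102;
    mem[3] = (OP_HALT  << 12);
    mem[100] = 7; mem[101] = 5;

    for (;;) {
        ir = mem[pc++];                      /* fetch instruction, advance PC        */
        unsigned op   = ir >> 12;            /* interpret: extract opcode            */
        unsigned addr = ir & 0x0FFF;         /*            and operand address       */
        if (op == OP_HALT) break;
        unsigned short data = mem[addr];     /* fetch data (operand value)           */
        switch (op) {                        /* process data                         */
        case OP_LOAD:  ac = data;        break;
        case OP_ADD:   ac = ac + data;   break;
        case OP_STORE: mem[addr] = ac;   break;   /* write data (store the result)   */
        }
    }
    printf("mem[102] = %u\n", (unsigned)mem[102]);   /* prints 12 */
    return 0;
}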

3 of 60

CPU With Systems Bus

4 of 60

CPU Internal Structure

5 of 60

Registers

  • CPU must have some working space (temporary storage)
  • Called registers
  • Number and function vary between processor designs
  • One of the major design decisions
  • Top level of memory hierarchy

6 of 60

User Visible Registers

  • General Purpose (e.g. accumulator)
  • Data (registers that hold operand values only)
  • Address (e.g. segment pointers, index registers, stack pointer)
  • Condition Codes (flags)

7 of 60

General Purpose Registers (1)

  • May be true general purpose
  • May be restricted
  • May be used for data or addressing
  • Data
    • Accumulator
  • Addressing
    • Segment

8 of 60

General Purpose Registers (2)

  • Make them general purpose
    • Increase flexibility and programmer options
    • Increase instruction size & complexity
  • Make them specialized
    • Smaller (faster) instructions
    • Less flexibility
  • Remember
    • You can write an instruction without operands if the operands are known implicitly (like ADD on an accumulator machine)
    • You have to write the operands explicitly if they are not known implicitly (like ADD A,B)
    • Writing operands explicitly always increases the instruction length.

9 of 60

How Many GP Registers?

  • Typically between 8 and 32
  • Fewer registers mean more memory references
  • More registers do not noticeably reduce memory references and take up processor real estate
  • See also RISC

10 of 60

How big?

  • Large enough to hold full address
  • Large enough to hold full word
  • Often possible to combine two data registers to hold a double-length value
    • e.g. in C (see the sketch below):
    • double a;         (64-bit floating-point value)
    • long long int a;  (64-bit integer value)
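
A small C illustration of the point above; the sizes printed depend on the compiler and target, so the comment about needing a register pair describes the common 32-bit (ILP32) case and is an assumption.

#include <stdio.h>

int main(void)
{
    /* On a typical 32-bit (ILP32) target, long fits in one register while     */
    /* double and long long need a register pair; exact sizes are              */
    /* implementation-defined, so just print what this compiler uses.          */
    printf("sizeof(long)      = %zu bytes\n", sizeof(long));
    printf("sizeof(double)    = %zu bytes\n", sizeof(double));
    printf("sizeof(long long) = %zu bytes\n", sizeof(long long));
    return 0;
}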

11 of 60

Condition Code Registers (Flags)

  • Sets of individual bits
    • e.g. result of last operation was zero
  • Can be read (implicitly) by programs
    • e.g. Jump if zero
  • Cannot (usually) be set explicitly by programs

12 of 60

Control & Status Registers

  • Program Counter
  • Instruction Decoding Register
  • Memory Address Register
  • Memory Buffer Register

  • Revision: what do these all do?

13 of 60

Program Status Word

  • A set of bits
  • Includes the Condition Codes:
    • Sign (SF): 1 if the last result was negative, otherwise 0
    • Zero (ZF): 1 if the last result was 0, otherwise 0
    • Carry (CF): 1 if the last operation produced a carry, otherwise 0
    • Equal (EF): 1 if the compared values were equal, otherwise 0
    • Overflow (OF): 1 if the last operation overflowed, otherwise 0
  • Interrupt enable/disable (IF): interrupts are disabled when IF = 0
  • Supervisor mode bit

🡺 These flags are accessible from low-level programming languages (e.g. assembly); the sketch below shows how they could be computed.
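
A hedged C sketch of how the condition-code bits above could be derived for an 8-bit addition; real processors set these flags in hardware, and the flag names simply follow the slide rather than any particular instruction set manual.

#include <stdio.h>

int main(void)
{
    signed char a = 100, b = 50;                      /* 8-bit operands                   */
    unsigned wide = (unsigned char)a + (unsigned char)b;
    signed char result = (signed char)wide;           /* two's-complement wrap assumed    */

    int SF = (result < 0);                            /* sign of the result               */
    int ZF = (result == 0);                           /* result is zero                   */
    int CF = (wide > 0xFF);                           /* carry out of bit 7               */
    int OF = ((a < 0) == (b < 0)) && ((result < 0) != (a < 0));  /* signed overflow       */
    int EF = (a == b);                                /* operands compared equal          */

    printf("SF=%d ZF=%d CF=%d OF=%d EF=%d\n", SF, ZF, CF, OF, EF);
    return 0;
}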

14 of 60

Supervisor Mode

  • Intel ring zero
  • Kernel mode
  • Allows privileged instructions to execute
  • Used by operating system
  • Not available to user programs

15 of 60

Other Registers

  • May have registers pointing to:
    • Process control blocks (see O/S)
    • Interrupt Vectors (see O/S)

  • N.B. CPU design and operating system design are closely linked

16 of 60

Example Register Organizations

17 of 60

Instruction Cycle

  • Revision
  • Stallings Chapter 3

18 of 60

Indirect Cycle

  • May require memory access to fetch operands
  • Indirect addressing requires more memory accesses
  • Can be thought of as additional instruction subcycle

19 of 60

Instruction Cycle with Indirect

20 of 60

Instruction Cycle State Diagram

21 of 60

Data Flow (Instruction Fetch)

  • Depends on CPU design
  • In general:

  • Fetch
    • PC contains address of next instruction
    • Address moved to MAR
    • Address placed on address bus
    • Control unit requests memory read
    • Result placed on data bus, copied to MBR, then to IR
    • Meanwhile PC is incremented (by the instruction length) to point to the next instruction
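
The fetch data flow above can be mimicked in a few lines of C, with the PC, MAR, MBR and IR modelled as plain variables and memory as an array; the bus transfers and control-unit signals are only implied by the comments, and the instruction words are made up.

#include <stdio.h>

#define MEM_SIZE 16

unsigned mem[MEM_SIZE] = { 0x1064, 0x2065, 0x3066 };   /* a few made-up instruction words */
unsigned PC = 0, MAR, MBR, IR;

/* One instruction fetch: PC -> MAR -> address bus -> memory read -> MBR -> IR. */
void fetch(void)
{
    MAR = PC;            /* address of next instruction moved to MAR            */
    MBR = mem[MAR];      /* control unit requests memory read; result -> MBR    */
    IR  = MBR;           /* instruction copied from MBR into IR                 */
    PC  = PC + 1;        /* meanwhile PC is advanced to the next instruction    */
}

int main(void)
{
    fetch();
    printf("IR = 0x%04X, PC = %u\n", IR, PC);   /* IR = 0x1064, PC = 1 */
    return 0;
}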

22 of 60

Data Flow (Data Fetch)

  • IR is examined
  • If indirect addressing, indirect cycle is performed
    • Rightmost N bits of MBR transferred to MAR
    • Control unit requests memory read
    • Result (address of operand) moved to MBR
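
The indirect cycle can be sketched the same way; the 12-bit address field and the memory contents are assumptions made for the illustration.

#include <stdio.h>

unsigned mem[16] = { [0] = 0x2005, [5] = 9 };   /* word 0's address field points at word 5 */
unsigned MAR, MBR;

int main(void)
{
    MBR = mem[0];                 /* instruction word already fetched into MBR            */
    MAR = MBR & 0x0FFF;           /* rightmost N bits (assumed 12) of MBR moved to MAR    */
    MBR = mem[MAR];               /* memory read: MBR now holds the operand's address     */
    printf("operand address = %u\n", MBR);       /* prints 9 */
    return 0;
}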

23 of 60

Data Flow (Fetch Diagram)

24 of 60

Data Flow (Indirect Diagram)

25 of 60

Data Flow (Execute)

  • May take many forms
  • Depends on instruction being executed
  • May include
    • Memory read/write
    • Input/Output
    • Register transfers
    • ALU operations

26 of 60

Interrupt sequence

27 of 60

Data Flow (Interrupt)

  • Simple
  • Predictable
  • Current PC saved to allow resumption after interrupt
  • Contents of PC copied to MBR
  • Address of a special memory location (e.g. the top of the stack, given by the stack pointer) loaded into MAR
  • MBR written to memory
  • PC loaded with address of interrupt handling routine
  • Next instruction (first of interrupt handler) can be fetched
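
A sketch of the interrupt data flow above, again with registers as plain variables; the downward-growing stack and the handler address are assumptions made for the illustration.

#include <stdio.h>

unsigned mem[256];
unsigned PC = 0x42, MAR, MBR, SP = 0x80;
const unsigned HANDLER_ADDR = 0xA0;      /* assumed address of the interrupt handler */

void take_interrupt(void)
{
    MBR = PC;               /* contents of PC copied to MBR                        */
    SP  = SP - 1;           /* assumed: stack grows downwards                      */
    MAR = SP;               /* special memory location (top of stack) -> MAR       */
    mem[MAR] = MBR;         /* MBR written to memory (old PC saved)                */
    PC  = HANDLER_ADDR;     /* PC loaded with address of the interrupt handler     */
}

int main(void)
{
    take_interrupt();
    printf("saved PC = 0x%02X at mem[0x%02X], new PC = 0x%02X\n", mem[MAR], MAR, PC);
    return 0;
}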

28 of 60

Data Flow (Interrupt Diagram)

29 of 60

Prefetch

  • The fetch stage accesses main memory (or the cache)
  • Execution usually does not access main memory
  • Can fetch next instruction during execution of current instruction
  • Called instruction prefetch

30 of 60

Improved Performance

  • But performance is not doubled:
    • Fetch is usually shorter than execution
      • Prefetch more than one instruction?
    • Any jump or branch means that the prefetched instructions are not the required instructions
  • Add more stages to improve performance

  • Example 1: straight-line code, prefetch works perfectly

    INC R1
    ADD (100)
    SUB R3
    DIV (200)
    LOAD R6

  • Example 2: transfer of control reduces prefetch benefits (the instruction prefetched after JMP XYZ is discarded)

    XYZ: MOVE R1,-3
         ADD (100)
         ISZ R1
         JMP XYZ
         DIV (200)
         LOAD R6
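
A rough C sketch of why prefetch helps straight-line code more than code with branches, under made-up timing assumptions (fetch = 1 time unit, execute = 2, and each taken branch forces one extra fetch); the numbers are illustrative only.

#include <stdio.h>

int main(void)
{
    const int F = 1, E = 2;        /* assumed: fetch takes 1 unit, execute takes 2  */
    const int n = 6;               /* instructions executed                         */
    const int taken_branches = 1;  /* e.g. JMP XYZ above is taken once              */

    int no_prefetch   = n * (F + E);
    int with_prefetch = F + n * E              /* fetch overlapped with execution   */
                      + taken_branches * F;    /* refetch after each taken branch   */

    printf("no prefetch:   %d time units\n", no_prefetch);
    printf("with prefetch: %d time units\n", with_prefetch);
    return 0;
}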

31 of 60

Pipelining

  • Fetch instruction
  • Decode instruction
  • Calculate operand addresses (i.e. effective addresses, EAs)
  • Fetch operands
  • Execute instruction
  • Write result

  • Overlap these operations

  • Example: straight-line code

    INC R1
    ADD (100)
    SUB R3
    DIV (200)

  • Example: unconditional branch

    Lab:  INC R1
          ADD (100)
          JMP Lab
          DIV (200)

  • Example: conditional branches

    Lab:  MOV R1,50
          BRE R1,R2,Lab
    Lab2: SUB X,Y
          BRZ Lab2
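
The benefit of a k-stage pipeline can be quantified with the usual ideal timing from Stallings, Chapter 12: n instructions take k + (n - 1) cycles instead of n × k, so the speedup is nk / (k + n - 1) when there are no branches or other hazards. A short C illustration:

#include <stdio.h>

int main(void)
{
    const int k = 6;                            /* pipeline stages (as in the six-stage pipeline) */
    const int cases[] = { 1, 10, 100, 1000 };   /* numbers of instructions                        */

    for (int i = 0; i < (int)(sizeof cases / sizeof cases[0]); i++) {
        int n = cases[i];
        double cycles_pipelined = k + (n - 1);  /* ideal, hazard-free pipeline   */
        double cycles_serial    = (double)n * k;
        printf("n = %4d: speedup = %.2f\n", n, cycles_serial / cycles_pipelined);
    }
    return 0;                                   /* speedup approaches k as n grows */
}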

32 of 60

Two Stage Instruction Pipeline

33 of 60

Timing Diagram for Instruction Pipeline Operation

34 of 60

The Effect of a Conditional Branch on Instruction Pipeline Operation

35 of 60

The Effect of an Unconditional Branch on the Instruction Pipeline Operation

36 of 60

Six Stage Instruction Pipeline

37 of 60

Alternative Pipeline Depiction

38 of 60

Speedup Factors with Instruction Pipelining

39 of 60

Dealing with Branches

  • Multiple Streams
  • Prefetch Branch Target
  • Loop buffer
  • Branch prediction
  • Delayed branching

40 of 60

Multiple Streams

  • Have two pipelines
  • Prefetch each branch into a separate pipeline
  • Use appropriate pipeline

  • Leads to bus & register contention
  • Multiple branches lead to further pipelines being needed

41 of 60

Prefetch Branch Target

  • Target of branch is prefetched in addition to instructions following branch
  • Keep target until branch is executed
  • Used by IBM 360/91

42 of 60

Loop Buffer

  • Very fast memory
  • Maintained by fetch stage of pipeline
  • Check buffer before fetching from memory
  • Very good for small loops or jumps
  • c.f. cache
  • Used by CRAY-1

A small loop such as the one below can be held entirely in the loop buffer:

for (int i = 0; i < 10; i++)
{
    ...
}

43 of 60

Loop Buffer Diagram

44 of 60

Branch Prediction (1)

  • Predict never taken
    • Assume that jump will not happen
    • Always fetch next instruction
    • 68020 & VAX 11/780
    • VAX will not prefetch after branch if a page fault would result (O/S v CPU design)
  • Predict always taken
    • Assume that jump will happen
    • Always fetch target instruction

45 of 60

Branch Prediction (2)

  • Predict by Opcode
    • Some instructions are more likely to result in a jump than others
    • Can get up to 75% success
  • Taken/Not taken switch
    • Based on previous history (see the sketch below)
    • Good for loops
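
A minimal C sketch of a taken/not-taken switch with two bits of history (a saturating counter), the mechanism behind the branch prediction state diagram later in the chapter; the sequence of branch outcomes is invented for the illustration.

#include <stdio.h>

/* 2-bit saturating counter: 0,1 predict not taken; 2,3 predict taken. */
static int counter = 2;                      /* start weakly "taken" (an assumption) */

int predict(void) { return counter >= 2; }

void update(int taken)
{
    if (taken)  { if (counter < 3) counter++; }
    else        { if (counter > 0) counter--; }
}

int main(void)
{
    /* Invented outcome pattern of one branch: a loop taken 7 times, then it exits. */
    int outcomes[] = { 1, 1, 1, 1, 1, 1, 1, 0 };
    int total = (int)(sizeof outcomes / sizeof outcomes[0]);
    int correct = 0;

    for (int i = 0; i < total; i++) {
        if (predict() == outcomes[i]) correct++;
        update(outcomes[i]);                 /* remember what actually happened */
    }
    printf("correct predictions: %d of %d\n", correct, total);   /* 7 of 8 */
    return 0;
}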

46 of 60

Branch Prediction (3)

  • Delayed Branch
    • Do not take the jump until you have to
    • Rearrange instructions so that useful work is done while the branch resolves

47 of 60

Branch Prediction Flowchart

48 of 60

Branch Prediction State Diagram

49 of 60

Dealing With Branches

50 of 60

Intel 80486 Pipelining

  • Fetch
    • From cache or external memory
    • Put in one of two 16-byte prefetch buffers
    • Fill buffer with new data as soon as old data consumed
    • Average 5 instructions fetched per load
    • Independent of other stages to keep buffers full
  • Decode stage 1
    • Opcode & address-mode info
    • At most first 3 bytes of instruction
    • Can direct D2 stage to get rest of instruction
  • Decode stage 2
    • Expand opcode into control signals
    • Computation of complex address modes
  • Execute
    • ALU operations, cache access, register update
  • Writeback
    • Update registers & flags
    • Results sent to cache & bus interface write buffers

51 of 60

80486 Instruction Pipeline Examples

52 of 60

Pentium 4 Registers

53 of 60

EFLAGS Register

54 of 60

Control Registers

55 of 60

MMX Register Mapping

  • MMX uses several 64-bit data types
  • Uses 3-bit register address fields
    • 8 registers
  • No MMX-specific registers
    • Aliased to the lower 64 bits of the existing floating-point registers

56 of 60

Mapping of MMX Registers to Floating-Point Registers

57 of 60

Pentium Interrupt Processing

  • Interrupts
    • Maskable
    • Nonmaskable
  • Exceptions
    • Processor detected
    • Programmed
  • Interrupt vector table
    • Each interrupt type assigned a number
    • Index into the vector table (see the sketch below)
    • 256 × 32-bit interrupt vectors
  • 5 priority classes
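
A hedged C sketch of the vector-table idea: the interrupt number indexes a table of handler addresses. The 256-entry size matches the slide; the handler functions and dispatch code are invented and do not show how the Pentium hardware actually transfers control (which goes through gate descriptors).

#include <stdio.h>

#define NUM_VECTORS 256

typedef void (*handler_t)(void);

static void divide_error_handler(void) { printf("divide error\n"); }   /* invented handler */
static void default_handler(void)      { printf("unhandled interrupt\n"); }

static handler_t vector_table[NUM_VECTORS];

int main(void)
{
    for (int i = 0; i < NUM_VECTORS; i++)       /* fill the table with a default entry      */
        vector_table[i] = default_handler;
    vector_table[0] = divide_error_handler;     /* vector 0 is divide error on the Pentium  */

    int interrupt_number = 0;                   /* pretend interrupt 0 was raised           */
    vector_table[interrupt_number]();           /* index into the table and dispatch        */
    return 0;
}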

58 of 60

PowerPC User Visible Registers

59 of 60

PowerPC Register Formats

60 of 60

Foreground Reading

  • Processor examples
  • Stallings Chapter 12
  • Manufacturer web sites & specs