1 of 61

CS232 Computer Organization

Week 11.

Computer Arch & Machine Language

Prepared by: Dr. Jun Yuan

Portions credit:

Prof. Nisan & Schocken (www.nand2tetris.org)

Prof. Krste Asanovic

Prof. David Lu

1

2 of 61

Agenda

  • Machine languages
  • Basic elements
  • The Hack computer from nand game
  • Build the CPU controller

2

3 of 61

Agenda

  • Machine languages
  • Basic elements
  • The Hack computer from nand game
  • Build the CPU controller

3

4 of 61

Review: Stored program concept

ALU

Computer System

Memory

CPU

registers

output

intput

5 of 61

Stored program concept

ALU

Computer System

program

Memory

CPU

registers

output

intput

data

6 of 61

Stored program concept

ALU

Computer System

Memory

CPU

registers

0

1

2

...

1100101010010101

1100100101100111

0011001010101011

...

n

n+1

n+2

...

instructions

data

output

intput

0101110011100110

1011000101010100

1110001011111100

...

7 of 61

Machine language

ALU

Computer System

Memory

CPU

registers

0

1

2

...

1100101010010101

1100100101100111

0011001010101011

...

n

n+1

n+2

...

instructions

data

output

intput

0101110011100110

1011000101010100

1110001011111100

...

8 of 61

Machine language

ALU

Computer System

Memory

CPU

registers

0

1

2

...

1100101010010101

1100100101100111

0011001010101011

...

n

n+1

n+2

...

instructions

data

output

intput

0101110011100110

1011000101010100

1110001011111100

...

current instruction

9 of 61

Machine language

ALU

Computer System

Memory

CPU

registers

0

1

2

...

1100101010010101

1100100101100111

0011001010101011

...

n

n+1

n+2

...

instructions

data

output

intput

0101110011100110

1011000101010100

1110001011111100

...

Handling instructions:

current instruction

10 of 61

Machine language

ALU

Computer System

Memory

CPU

registers

0

1

2

...

1100101010010101

1100100101100111

0011001010101011

...

n

n+1

n+2

...

instructions

data

output

intput

0101110011100110

1011000101010100

1110001011111100

...

Handling instructions:

  • 1011 means “addition”

operation

current instruction

11 of 61

Machine language

ALU

Computer System

Memory

CPU

registers

0

1

2

...

1100101010010101

1100100101100111

0011001010101011

...

n

n+1

n+2

...

instructions

data

output

intput

0101110011100110

1011000101010100

1110001011111100

...

Handling instructions:

  • 1011 means “addition”
  • 000101010100 means “operate on memory address 340”

operation

addressing

current instruction

12 of 61

Machine language

ALU

Computer System

0101110011100110

1011000101010100

1110001011111100

...

Memory

CPU

registers

0

1

2

...

1100101010010101

1100100101100111

0011001010101011

...

n

n+1

n+2

...

instructions

data

Handling instructions:

  • 1011 means “addition”
  • 000101010100 means “operate on memory address 340”
  • Next we have to execute the instruction at address 2

output

intput

operation

addressing

control

current instruction

13 of 61

Compilation

0101111100111100

1010101010101010

1101011010101010

1001101010010101

1101010010101010

1110010100100100

0011001010010101

1100100111000100

1100011001100101

0010111001010101

...

machine language

load and�execute

high-level program

compile

while (n < 100) {

sum += arr[i];

n++

}

14 of 61

Mnemonics

Interpretation 1:

  • The symbolic form add R3 R2 doesn’t really exist
  • It is just a convenient mnemonic that can be used to present machine language instructions to humans

Instruction:

1011000011000010

add

R3

R2

sample instruction

15 of 61

Mnemonics

Interpretation 2:

  • Allow humans to write symbolic machine language instructions,�using assembly language
  • Use an assembler program to translate the symbolic code into binary form.

Instruction:

1011000011000010

add

R3

R2

sample instruction

16 of 61

Symbols

The assembler will resolve the symbol index into a specific address.

add 1, Mem[129]

Assembly:

add 1, index

Instruction:

1011000110000001

add

1

Mem[129]

Friendlier syntax:�we assume that index stands for Mem[129]

17 of 61

Agenda

  • Machine languages
  • Basic elements
  • The Hack computer from nand game
  • Build the CPU controller

17

18 of 61

Machine language

  • Specification of the hardware/software interface:
    • What supported operations?
    • What do they operate on?
    • How is the program controlled?
  • Usually in close correspondence to the hardware architecture
    • But not necessarily so

19 of 61

Machine operations

  • Usually correspond to the operations that the hardware is designed to support:
    • Arithmetic operations: add, subtract, …
    • Logical operations: and, or, …
    • Flow control: “goto instruction n

“if (condition) then goto instruction n

  • Differences between machine languages:
    • Instruction set richness (division? bulk copy? …)
    • Data types (word width, floating point…).

20 of 61

Addressing

ALU

Computer System

0101110011100110

1011000101010100

1110001011111100

...

Memory

CPU

registers

0

1

2

...

1100101010010101

1100100101100111

0011001010101011

...

n

n+1

n+2

...

instructions

data

current instruction

output

intput

How does the language allow us to specify on which data the instruction should operate?

21 of 61

Memory hierarchy

  • Accessing a memory location is expensive
    • Need to supply a long address
    • Getting the memory contents into the CPU takes time

22 of 61

Latency numbers every programmer should know

L1 cache reference ......................... 0.5 ns

Branch mispredict ............................ 5 ns

L2 cache reference ........................... 7 ns

Mutex lock/unlock ........................... 25 ns

Main memory reference ...................... 100 ns

Compress 1K bytes with Zippy ............. 3,000 ns = 3 µs

Send 2K bytes over 1 Gbps network ....... 20,000 ns = 20 µs

SSD random read ........................ 150,000 ns = 150 µs

Read 1 MB sequentially from memory ..... 250,000 ns = 250 µs

Round trip within same datacenter ...... 500,000 ns = 0.5 ms

Read 1 MB sequentially from SSD ..... 1,000,000 ns = 1 ms

Disk seek ........................... 10,000,000 ns = 10 ms

Read 1 MB sequentially from disk .... 20,000,000 ns = 20 ms

Send packet CA->Netherlands->CA .... 150,000,000 ns = 150 ms

23 of 61

Memory hierarchy

ALU

CPU

Registers

Main Memory

Cache Memory

Disk Memory

more storage space, slower access time

  • Accessing a memory location is expensive:
    • Need to supply a long address
    • Getting the memory contents into the CPU takes time
  • Solution: memory hierarchy:

24 of 61

Memory hierarchy

24

  • SRAM: on chip. much faster, much more expensive, 4-6 transistors per unit

  • DRAM: off chip, slower than SRAM, much cheaper than RAM, 1 transistor per unit

  • Secondary storage: non-volatile. For instance, rotating disk (HDD)

25 of 61

Random Access vs Sequential Access...

25

26 of 61

Latency numbers every programmer should know

L1 cache reference ......................... 0.5 ns

Branch mispredict ............................ 5 ns

L2 cache reference ........................... 7 ns

Mutex lock/unlock ........................... 25 ns

Main memory reference ...................... 100 ns

Compress 1K bytes with Zippy ............. 3,000 ns = 3 µs

Send 2K bytes over 1 Gbps network ....... 20,000 ns = 20 µs

SSD random read ........................ 150,000 ns = 150 µs

Read 1 MB sequentially from memory ..... 250,000 ns = 250 µs

Round trip within same datacenter ...... 500,000 ns = 0.5 ms

Read 1 MB sequentially from SSD ..... 1,000,000 ns = 1 ms

Disk seek ........................... 10,000,000 ns = 10 ms

Read 1 MB sequentially from disk .... 20,000,000 ns = 20 ms

Send packet CA->Netherlands->CA .... 150,000,000 ns = 150 ms

27 of 61

Registers

32767

ALU

CPU

Registers

Memory

0

1

2

. . .

136

137

138

. . .

  • The CPU typically contains a few, easily accessed, registers
  • Their number and functions are a central part of the machine language

28 of 61

Registers

32767

ALU

CPU

Registers

Memory

0

1

2

. . .

136

137

138

. . .

R1

R2

  • The CPU typically contains a few, easily accessed, registers
  • Their number and functions are a central part of the machine language

Data registers:

29 of 61

Registers

32767

ALU

CPU

Registers

Memory

0

1

2

. . .

136

137

138

. . .

R1

R2

Data registers:

add R1, R2

  • The CPU typically contains a few, easily accessed, registers
  • Their number and functions are a central part of the machine language

30 of 61

Registers

32767

ALU

CPU

10

25

Registers

Memory

0

1

2

. . .

136

137

138

. . .

R1

R2

before

Data registers:

add R1, R2

  • The CPU typically contains a few, easily accessed, registers
  • Their number and functions are a central part of the machine language

31 of 61

Registers

32767

ALU

CPU

10

35

Registers

Memory

0

1

2

. . .

136

137

138

. . .

R1

R2

after

Data registers:

add R1, R2

  • The CPU typically contains a few, easily accessed, registers
  • Their number and functions are a central part of the machine language

32 of 61

Registers

32767

ALU

CPU

Registers

Memory

0

1

2

. . .

136

137

138

. . .

R1

R2

Data registers:

add R1, R2

Address registers:

store R1, @A

A

  • The CPU typically contains a few, easily accessed, registers
  • Their number and functions are a central part of the machine language

33 of 61

Registers

32767

ALU

CPU

77

137

Registers

Memory

0

1

2

. . .

136

137

138

. . .

R1

R2

Data registers:

add R1, R2

Address registers:

store R1, @A

A

before

  • The CPU typically contains a few, easily accessed, registers
  • Their number and functions are a central part of the machine language

34 of 61

Registers

32767

ALU

CPU

77

137

Registers

Memory

0

1

2

. . .

77

136

137

138

. . .

R1

R2

Data registers:

add R1, R2

Address registers:

store R1, @A

A

after

  • The CPU typically contains a few, easily accessed, registers
  • Their number and functions are a central part of the machine language

35 of 61

Addressing modes

Register

add R1, R2 // R2 ← R2 + R1

Direct

add R1, M[200] // Mem[200] ← Mem[200] + R1

Indirect

add R1, @A // Mem[A] ← Mem[A] + R1

Immediate

add 73, R1 // R1 ← R1 + 73

36 of 61

Flow control

ALU

Computer System

0101110011100110

1011000101010100

1110001011111100

...

Memory

CPU

registers

0

1

2

...

1100101010010101

1100100101100111

0011001010101011

...

n

n+1

n+2

...

instructions

data

current instruction

output

intput

How does the language allow us to decide, and specify,�which instruction to process next?

37 of 61

Flow control

  • Usually the CPU executes machine instructions in sequence

38 of 61

Flow control

  • Usually the CPU executes machine instructions in sequence
  • Sometimes we need to “jump” unconditionally to another location,�e.g. in order to implement a loop:

101:

102:

103:

...

...

156:

load R1,0

add 1, R1

...

// do something with R1 value

...

jmp 102 // goto 102

Example:

load R1,0

LOOP:

add 1, R1

...

// do something with R1 value

...

jmp LOOP // goto loop

Symbolic version:

39 of 61

Flow control

  • Usually the CPU executes machine instructions in sequence
  • Sometimes we need to “jump” unconditionally to another location,�e.g. in order to implement a loop
  • Sometimes we need to jump only if some condition is met:

Example:

jgt R1, 0, CONT // if R1>0 jump to CONT

sub R1, 0, R1 // R1 ← (0 - R1)

CONT:

...

// Do something with positive R1

40 of 61

Agenda

  • Machine languages
  • Basic elements
  • The Hack computer from nand game
  • Build the CPU controller

40

41 of 61

Hack computer: hardware

A 16-bit machine consisting of:

  • Data memory (RAM): a sequence of 16-bit registers:� RAM[0], RAM[1], RAM[2],…
  • Instruction memory (ROM): a sequence of 16-bit registers:� ROM[0], ROM[1], ROM[2],…
  • Central Processing Unit (CPU): performs 16-bit instructions
  • Instruction bus / data bus / address buses.

instructions

data out

data in

CPU

instruction

memory

data

memory

42 of 61

Harvard Model?

42

  • Why does Hack use a Harvard Model???
    • What is the difference between Princeton and Harvard model?
    • CPU cycle: fetch and execute

43 of 61

Computer architecture

43

program

Memory

CPU

data

Registers

ALU

intput

output

44 of 61

Computer architecture

44

program

Memory

CPU

data

Registers

control bus

address bus

data bus

ALU

45 of 61

Basic CPU loop

Repeat:

  • Fetch an instruction from the program memory
  • Execute the instruction.

45

46 of 61

Fetching

  • Put the location of the next instruction in the Memory address input
  • Get the instruction code by reading the contents at that Memory location

46

Program Counter

instruction

program

Memory

data

Memory

address input

Memory output

  • Default: PC++
  • Jump: PC = address

47 of 61

Executing

  • The instruction code specifies “what to do”
    • Which arithmetic or logical instruction to execute
    • Which memory address to access (for read / write)
    • If / where to jump

47

different subsets of the instruction bits control different aspects of the operation

  • Executing the instruction involves:
    • accessing registers
    • and / or:
    • accessing the data memory.

48 of 61

Computer architecture

48

program

Memory

CPU

data

Registers

control bus

address bus

data bus

ALU

49 of 61

Fetch – Execute

49

program

Memory

data

ALU

50 of 61

Fetch – Execute

50

program

Memory

data

data address

instruction address

control bus

address bus

address bus

(data flows not shown, to minimize clutter)

data

instruction

ALU

51 of 61

Fetch – Execute clash

51

program

Memory

data

data address

instruction address

control bus

address bus

address bus

data

instruction

If the Memory is one address space:

This scheme will not work:

  • one Memory input
  • one Memory output

ALU

52 of 61

Fetch – Execute clash

52

program

Memory

data

data address

instruction address

Memory address input

control bus

address bus

Memory output

address bus

If the Memory is one address space:

This scheme will not work:

  • one Memory input
  • one Memory output

53 of 61

Solution: multiplex

53

program

Memory

data

data address

instruction address

Memory address input

control bus

address bus

Memory output

address bus

mux

54 of 61

Solution: multiplex

54

program

Memory

data

data address

mux

instruction address

Memory address input

control bus

address bus

Memory output

address bus

data, when executing

instruction, when fetching

fetch /

execute

bit

55 of 61

Solution: multiplex, using an instruction register

55

program

Memory

data

ALU

instruction register

data address

fetch /

execute

bit

mux

instruction

instruction address

Memory address input

control bus

address bus

Memory output

address bus

data, when executing

load on fetch

56 of 61

Solution: multiplex, using an instruction register

56

program

Memory

data

ALU

instruction register

data address

fetch /

execute

bit

mux

instruction

instruction address

Memory address input

control bus

address bus

Memory output

address bus

data, when executing

load on fetch

instruction

57 of 61

Solution: multiplex, using an instruction register

57

program

Memory

data

ALU

instruction register

data address

fetch /

execute

bit

mux

instruction

instruction address

Memory address input

control bus

address bus

Memory output

address bus

(data flows not shown, to minimize clutter)

load on fetch

instruction

data

58 of 61

Why Harvard Architecture?

Two physically separate memory units:

    • Instruction memory
    • Data memory
  • Advantage:
    • Complication avoided
  • Disadvantage:
    • Two memory chips instead of one
    • The size of the two chips is fixed.

58

Each can be addressed and manipulated

separately, and simultaneously

59 of 61

Hack computer: software

Hack machine language:

    • 16-bit A-instructions
    • 16-bit C-instructions

Hack program = sequence of instructions written in the� Hack machine language

RAM

ROM

instructions

data out

data in

CPU

60 of 61

Hack computer: control

RAM

ROM

instructions

data out

data in

CPU

Control:

    • The ROM is loaded with a Hack program
    • The reset button is pushed
    • The program starts running

reset

61 of 61

Hack computer: registers

RAM

ROM

instructions

data out

data in

CPU

The Hack machine language recognizes three 16-bit registers:

  • D: used to store data
  • A: used to store data / address the memory
  • M: represents the currently addressed memory register: M = RAM[A] => *A

M register

A register

D register