1 of 44

Operating Systems and C�Fall 2022�4. Interpreter

2 of 44

Outline

  1. Instruction Set Architecture (ISA)
  2. Logic Design
  3. Moore’s Law and Dennard Scaling

3 of 44

Instruction Set Architecture

Assembly Language View

Processor state

Registers, memory, …

Instructions

addq, pushq, ret, …

How instructions are encoded as bytes

Layer of Abstraction

Above: how to program machine

Processor executes instructions in a sequence

Below: what needs to be built

Use variety of tricks to make it run fast

E.g., execute multiple instructions simultaneously

ISA

Compiler

OS

CPU

Design

Circuit

Design

Chip

Layout

Application

Program

4 of 44

ARM ISA

5 of 44

Intel x86-64 & Extensions

https://en.wikichip.org/wiki/x86

Little endian�variable length

instruction set

6 of 44

Relevance for Data Science

7 of 44

CISC Instruction Sets

Complex Instruction Set Computer

e.g., IA32

Stack-oriented instruction set

Use stack to pass arguments, save program counter

Explicit push and pop instructions

Arithmetic instructions can access memory

addq %rax, 12(%rbx,%rcx,8)

requires memory read and write

Complex address calculation

Condition codes

Set as side effect of arithmetic and logical instructions

Philosophy

Add instructions to perform “typical” programming tasks

8 of 44

RISC Instruction Sets

Reduced Instruction Set Computer

Internal project at IBM, later popularized by Hennessy and Patterson

ARM, RISC-V

Fewer, simpler instructions

Might take more to get given task done

Can execute them with small and fast hardware

Register-oriented instruction set

Many more registers

Use for arguments, return pointer, temporaries

Only load and store instructions can access memory

Similar to X86-64 mov

No Condition codes

Test instructions return 0/1 in register

9 of 44

RISC V

https://www.youtube.com/watch?v=Rl2NFy9Xe-0

10 of 44

CISC vs. RISC

Original Debate

Strong opinions!

CISC proponents---easy for compiler, fewer code bytes

RISC proponents---better for optimizing compilers, can make run fast with simple chip design

Current Status

For desktop processors, choice of ISA not a technical issue

With enough hardware, can make anything run fast

Code compatibility more important

x86-64 adopted many RISC features

More registers; use them for argument passing

For embedded processors, RISC makes sense

Smaller, cheaper, less power

Most cell phones use ARM processor

11 of 44

Relevance for Data Science

12 of 44

Reflections on ISA

13 of 44

Logic Design => Digital Design

  • Everything expressed in terms of values 0 and 1
  • Communication
    • Low or high voltage on wire
  • Computation
    • Compute Boolean functions
  • Storage
    • Store bits of information

14 of 44

Digital Signals

Use voltage thresholds to extract discrete values from continuous signal

Simplest version: 1-bit signal

Either high range (1) or low range (0)

With guard range between them

Not strongly affected by noise or low quality circuit elements

Can make circuits simple, small, and fast

Voltage

Time

0

1

0

15 of 44

Looking Forward

Solid-state qubits

A qubit is a two-state (or two-level) quantum-mechanical system.

The state space of a two-level quantum mechanical system (qubit)�can be represented as a Riemann sphere.

16 of 44

Looking Back

ENIAC, U.Penn, 1945

Analytical Engine, C.Babbage, 1837,

17 of 44

ENIAC Architecture

  • Representation of digits by electric pulses
  • Base 10 (data bus)
    • 10 wires for data
    • 1 wire for sign
  • Hardwired functional units (add/multiply)
    • Vacuum tubes as basic component
    • Programming with flip-flops, switches and cables

18 of 44

Von Neumann Architecture

https://ieeexplore.ieee.org/document/238389

19 of 44

Computing with Logic Gates

    • Outputs are Boolean functions of inputs
    • Respond continuously to changes in inputs
      • With some, small delay

Voltage

Time

a

b

Rising Delay

Falling Delay

a && b

20 of 44

Combinational Circuits

Acyclic Network of Logic Gates

Vertices are logic gates; edges are signal transport

Continously responds to changes on primary inputs

Primary outputs become (after some delay) Boolean functions of primary inputs

Acyclic Network

Primary

Inputs

Primary

Outputs

21 of 44

Bit Equality

Bit equal

a

b

eq

bool eq = (a&&b)||(!a&&!b)

22 of 44

Word Equality

64-bit word size

HDL representation

Equality operation

Generates Boolean value

b63

Bit equal

a63

eq63

b62

Bit equal

a62

eq62

b1

Bit equal

a1

eq1

b0

Bit equal

a0

eq0

Eq

=

B

A

Eq

Word-Level Representation

bool Eq = (A == B)

23 of 44

Bit-Level Multiplexor

Control signal s

Data signals a and b

Output a when s=1, b when s=0

Bit MUX

b

s

a

out

bool out = (s&&a)||(!s&&b)

24 of 44

Arithmetic Logic Unit

Combinational logic

        • Continuously responding to inputs

Control signal selects function computed

        • Corresponding to 4 arithmetic/logical operations
        • Overflow, Carry, Zero flags: OF, CF, ZF

OF

ZF

CF

OF

ZF

CF

OF

ZF

CF

OF

ZF

CF

A

L

U

Y

X

X + Y

0

A

L

U

Y

X

X - Y

1

A

L

U

Y

X

X & Y

2

A

L

U

Y

X

X ^ Y

3

A

B

A

B

A

B

A

B

25 of 44

Storing bits

Boolean logic and acyclic circuit of logic gates used to represent arithmetic operations

Can logic gates be used to store bits?

Yes! With cyclic circuits of logic gates.

26 of 44

Storing 1 Bit

Vin

V1

V2

27 of 44

Storing 1 Bit (cont.)

Stable 0

Stable 1

Metastable

Vin

V1

V2

Vin

V1

V2

Vin = V2

28 of 44

Physical Analogy

Stable 0

Stable 1

Metastable

.

Stable left

.

Stable right

.

Metastable

29 of 44

Storing 1 Bit (cont.)

Bistable Element

Q+

Q–

q

!q

q = 0 or 1

A

When A is 0

Q+ = q = 1� Q- = !q = 0

Q+

Q–

q

!q

q = 0 or 1

When no input

Q+ = q = 1� Q- = !q = 0

30 of 44

Storing 1 Bit (cont.)

Bistable Element

Q+

Q–

q

!q

q = 0 or 1

A

When A is 1

Q+ = q = 0� Q- = !q = 1

Q+

Q–

q

!q

q = 0 or 1

When no input

Q+ = q = 0� Q- = !q = 1

31 of 44

Storage Element: Latch

Resetting

1

0

1

0

0

1

Setting

0

1

0

1

1

0

Storing

0

0

!q

q

q

!q

Bistable element

Q+

Q–

q

!q

q = 0 or 1

Q+

Q–

R

S

R-S Latch

32 of 44

Relevance for Data Science

33 of 44

FPGAs

Configurable Logic Block:

  • Lookup Tables (LUT)
    • Truth table
    • Up to 6 inputs
  • Storage Elements
    • Flip-flop or latch
    • Control signals

+ Multiplexers, Carry Logic, Distributed RAM, Shift register

34 of 44

Programming FPGAs

https://www.xilinx.com/products/design-tools/vitis.html

35 of 44

FPGA Synthesis Process

VHDL program

36 of 44

Hardware Description Language

Very simple hardware description language

Can only express limited aspects of hardware operation

Data Types

bool: Boolean

      • a, b, c, …

int: words

      • A, B, C, …
      • Does not specify word size---bytes, 64-bit words, …

Statements

bool a = bool-expr ;

int A = int-expr ;

37 of 44

HDL Operations

Classify by type of value returned

Boolean Expressions

Logic Operations

      • a && b, a || b, !a

Word Comparisons

      • A == B, A != B, A < B, A <= B, A >= B, A > B

Set Membership

      • A in { B, C, D }
        • Same as A == B || A == C || A == D

Word Expressions

Case expressions

      • [ a : A; b : B; c : C ]
      • Evaluate test expressions a, b, c, … in sequence
      • Return word expression A, B, C, … for first successful test

38 of 44

FPGA Synthesis Process

Functional Verification

39 of 44

Logic Design Summary

Computation

Performed by combinational logic

Computes Boolean functions

Continuously reacts to input changes (clock)

Storage elements combined into

Random-access memories

        • Hold multiple words
        • Possible multiple read or write ports
        • Read word when address input changes
        • Write word as clock rises

40 of 44

Moore’s Law and Dennard Scaling

Gordon Moore (1965): The number of transistors in a dense integrated circuit doubles approximately every two years.

Robert Dennard (1974): As transistors get smaller their power density stays constant

41 of 44

Consequence

42 of 44

The situation is getting more complex

https://newsroom.intel.com/wp-content/uploads/sites/11/2019/11/intel-oneapi-info.pdf

43 of 44

Diverse Cores

44 of 44

Take-Aways

Choice of CPU based on ISA + extensions (security, performance).

FPGA-based accelerators programmed as acyclic networks of logic gates.

Trends lead to diverse cores (Dennard scaling).