Operating Systems and C�Fall 2022�4. Interpreter
Outline
Instruction Set Architecture
Assembly Language View
Processor state
Registers, memory, …
Instructions
addq, pushq, ret, …
How instructions are encoded as bytes
Layer of Abstraction
Above: how to program machine
Processor executes instructions in a sequence
Below: what needs to be built
Use variety of tricks to make it run fast
E.g., execute multiple instructions simultaneously
ISA
Compiler
OS
CPU
Design
Circuit
Design
Chip
Layout
Application
Program
ARM ISA
Intel x86-64 & Extensions
https://en.wikichip.org/wiki/x86
Little endian�variable length
instruction set
Relevance for Data Science
CISC Instruction Sets
Complex Instruction Set Computer
e.g., IA32
Stack-oriented instruction set
Use stack to pass arguments, save program counter
Explicit push and pop instructions
Arithmetic instructions can access memory
addq %rax, 12(%rbx,%rcx,8)
requires memory read and write
Complex address calculation
Condition codes
Set as side effect of arithmetic and logical instructions
Philosophy
Add instructions to perform “typical” programming tasks
RISC Instruction Sets
Reduced Instruction Set Computer
Internal project at IBM, later popularized by Hennessy and Patterson
ARM, RISC-V
Fewer, simpler instructions
Might take more to get given task done
Can execute them with small and fast hardware
Register-oriented instruction set
Many more registers
Use for arguments, return pointer, temporaries
Only load and store instructions can access memory
Similar to X86-64 mov
No Condition codes
Test instructions return 0/1 in register
RISC V
https://www.youtube.com/watch?v=Rl2NFy9Xe-0
CISC vs. RISC
Original Debate
Strong opinions!
CISC proponents---easy for compiler, fewer code bytes
RISC proponents---better for optimizing compilers, can make run fast with simple chip design
Current Status
For desktop processors, choice of ISA not a technical issue
With enough hardware, can make anything run fast
Code compatibility more important
x86-64 adopted many RISC features
More registers; use them for argument passing
For embedded processors, RISC makes sense
Smaller, cheaper, less power
Most cell phones use ARM processor
Relevance for Data Science
Reflections on ISA
Logic Design => Digital Design
Digital Signals
Use voltage thresholds to extract discrete values from continuous signal
Simplest version: 1-bit signal
Either high range (1) or low range (0)
With guard range between them
Not strongly affected by noise or low quality circuit elements
Can make circuits simple, small, and fast
Voltage
Time
0
1
0
Looking Forward
Solid-state qubits
A qubit is a two-state (or two-level) quantum-mechanical system.
The state space of a two-level quantum mechanical system (qubit)�can be represented as a Riemann sphere.
Looking Back
ENIAC, U.Penn, 1945
Analytical Engine, C.Babbage, 1837,
ENIAC Architecture
Von Neumann Architecture
https://ieeexplore.ieee.org/document/238389
Computing with Logic Gates
Voltage
Time
a
b
Rising Delay
Falling Delay
a && b
Combinational Circuits
Acyclic Network of Logic Gates
Vertices are logic gates; edges are signal transport
Continously responds to changes on primary inputs
Primary outputs become (after some delay) Boolean functions of primary inputs
Acyclic Network
Primary
Inputs
Primary
Outputs
Bit Equality
Bit equal
a
b
eq
bool eq = (a&&b)||(!a&&!b)
Word Equality
64-bit word size
HDL representation
Equality operation
Generates Boolean value
b63
Bit equal
a63
eq63
b62
Bit equal
a62
eq62
b1
Bit equal
a1
eq1
b0
Bit equal
a0
eq0
Eq
=
B
A
Eq
Word-Level Representation
bool Eq = (A == B)
Bit-Level Multiplexor
Control signal s
Data signals a and b
Output a when s=1, b when s=0
Bit MUX
b
s
a
out
bool out = (s&&a)||(!s&&b)
Arithmetic Logic Unit
Combinational logic
Control signal selects function computed
OF
ZF
CF
OF
ZF
CF
OF
ZF
CF
OF
ZF
CF
A
L
U
Y
X
X + Y
0
A
L
U
Y
X
X - Y
1
A
L
U
Y
X
X & Y
2
A
L
U
Y
X
X ^ Y
3
A
B
A
B
A
B
A
B
Storing bits
Boolean logic and acyclic circuit of logic gates used to represent arithmetic operations
Can logic gates be used to store bits?
Yes! With cyclic circuits of logic gates.
Storing 1 Bit
Vin
V1
V2
Storing 1 Bit (cont.)
Stable 0
Stable 1
Metastable
Vin
V1
V2
Vin
V1
V2
Vin = V2
Physical Analogy
Stable 0
Stable 1
Metastable
.
Stable left
.
Stable right
.
Metastable
Storing 1 Bit (cont.)
Bistable Element
Q+
Q–
q
!q
q = 0 or 1
A
When A is 0
Q+ = q = 1� Q- = !q = 0
Q+
Q–
q
!q
q = 0 or 1
When no input
Q+ = q = 1� Q- = !q = 0
Storing 1 Bit (cont.)
Bistable Element
Q+
Q–
q
!q
q = 0 or 1
A
When A is 1
Q+ = q = 0� Q- = !q = 1
Q+
Q–
q
!q
q = 0 or 1
When no input
Q+ = q = 0� Q- = !q = 1
Storage Element: Latch
Resetting
1
0
1
0
0
1
Setting
0
1
0
1
1
0
Storing
0
0
!q
q
q
!q
Bistable element
Q+
Q–
q
!q
q = 0 or 1
Q+
Q–
R
S
R-S Latch
Relevance for Data Science
FPGAs
Configurable Logic Block:
+ Multiplexers, Carry Logic, Distributed RAM, Shift register
Programming FPGAs
https://www.xilinx.com/products/design-tools/vitis.html
FPGA Synthesis Process
VHDL program
Hardware Description Language
Very simple hardware description language
Can only express limited aspects of hardware operation
Data Types
bool: Boolean
int: words
Statements
bool a = bool-expr ;
int A = int-expr ;
HDL Operations
Classify by type of value returned
Boolean Expressions
Logic Operations
Word Comparisons
Set Membership
Word Expressions
Case expressions
FPGA Synthesis Process
Functional Verification
Logic Design Summary
Computation
Performed by combinational logic
Computes Boolean functions
Continuously reacts to input changes (clock)
Storage elements combined into
Random-access memories
Moore’s Law and Dennard Scaling
Gordon Moore (1965): The number of transistors in a dense integrated circuit doubles approximately every two years.
Robert Dennard (1974): As transistors get smaller their power density stays constant
Consequence
The situation is getting more complex
https://newsroom.intel.com/wp-content/uploads/sites/11/2019/11/intel-oneapi-info.pdf
Diverse Cores
Take-Aways
Choice of CPU based on ISA + extensions (security, performance).
FPGA-based accelerators programmed as acyclic networks of logic gates.
Trends lead to diverse cores (Dennard scaling).