1 of 27

Hello, my name is Burak.

It's nice to meet you!

2 of 27

A bit about me

Education:

University of Texas at Austin

Class of 2022

Major GPA: 3.85

GPA: 3.65

Hobbies: Gaming (4X/TBS/RTS), Outreach/Mentoring (UT IEEE RAS), Software/Hardware design (website, 3D engine in pure C and … ), Photo-editing (see the picture to the right)
Lived in Texas for ~90% of my life, about 5% in Pennsylvania, and 2.5% in Turkey.

3 of 27

Some of my previous work experience:

AMD (SLT & ATE Intern)

Created task automation scripts and test code, and operated 93k testers (v93k, hp93k…), the ‘hand-test’, and the GS1 handler.

Silicon Labs (‘Global Ops Intern’)

Was in the foundry group but mostly did software/tool development. Worked on a joint project between the CAD and foundry teams to make better use of available data.

Vast (SWE Intern)

‘Wore many hats’ while here. Did ‘full-stack’ web development (from servers to HTML), data mining, app development, and advertising.

Summer Advanced Research Camp (TA)

Taught other students how to conduct and present their own research while working on my own project with guidance from UT professors.

Elk Electric (Apprentice Electrician)

Created wiring diagrams and installed lighting and power

4 of 27

A Custom MCU & ISA Design

Or: How I learned to let go of small things like ‘value proposition’, ‘opportunity cost’ and ‘reason’ in order to embrace the transistor.

5 of 27

Overview of MCU Design

Custom 8-bit RISC CMOS Design

To be implemented with discrete parts.

Version 0.1 had only 1060 transistors.

Current version (V0.3) has 2494 transistors.

2 General Purpose Registers

2 Accumulator-like Registers

1 Hidden Swap Register

1 status register

No Hardware Stack Pointer or Interrupts!

6 of 27

Typical Module implementation

All registers and persistent gates in the design are implemented using D latches* instead of the more typical master-slave D flip-flop, because a master-slave D flip-flop requires more transistors**.
All logic circuits (ADD, AND, etc.) evaluate continuously but are gated from outputting onto the bus.

** - Transistor counts are given as per-bit/base. Master-slave D flip-flop: 36/4. D latch: 20/2.
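
To make the per-bit/base figures concrete, here is a rough sketch of the transistor count for a single register. It assumes the total is (per-bit count × width) + base, which is one reading of the footnote, and it assumes an 8-bit register width; the function name is made up for illustration.

```python
def register_transistors(bits: int, per_bit: int, base: int) -> int:
    """Estimate transistors for one register.

    Assumption: total = per_bit * bits + base, which is one reading of
    the per-bit/base notation on the slide.
    """
    return bits * per_bit + base


WIDTH = 8  # assumed register width for an 8-bit design

flip_flop_total = register_transistors(WIDTH, per_bit=36, base=4)
latch_total = register_transistors(WIDTH, per_bit=20, base=2)
saved = flip_flop_total - latch_total

print(f"Master-slave D flip-flop register: {flip_flop_total} transistors")
print(f"D latch register: {latch_total} transistors")
print(f"Saved per register: {saved} transistors")
```

Under these assumptions the D latch version saves 130 transistors per 8-bit register.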

7 of 27

Overview of ISA

All instructions are uniform length
No Store operation exists*
5 Addressing/Input Modes

Register (Addressing/Input)
PC + Offset (Addressing/Input**)
PC + Register (Addressing)
Immediate (Input)
Implicit (Input)

Implicit input mode uses A and/or B as implied
Destination and Source fields are in reasonable positions to be used in non-EEPROM state control

Instruction	Sub-Op	Description	Bits
MOV		Dest <- Imm	00DR 0Imm
MOV		Dest <- SRC	00DR 10SR
SWP		Dest <-> SRC	00DR 11SR
ALU	NOTB	Dest <- Impl	01DR 0000
ALU	ADD	Dest <- Impl	01DR 0001
ALU	SUB	Dest <- Impl	01DR 0010
ALU	AND	Dest <- Impl	01DR 0011
ALU	OR	Dest <- Impl	01DR 0100
ALU	PC	Dest <- Impl	01DR 0101
ALU	IOG	Dest <- IO	01DR 0110
ALU	IOS	IO <- SRC	0100 11SR
ALU	BNK	BNK <- SRC	0101 11SR
LD		Dest<-MEM[PC+Imm]	10DR 0Imm
LDR		Dest<-MEM[SRC]	10DR 10SR
LDP		Dest<-MEM[PC+SRC]	10DR 11SR
JMP		PC <- PC+Imm	111I mmmm
JMPR	JMPR(n/z/p)	PC <- SRC	1100 NZSR
JMPP	JMPI(n/z/p)	PC <- PC+SRC	1101 NZSR

* - except IO and SWP
** - PC only
DR – Dest, Imm – Immediate, SR – Source Register, NZ – NZP bits, SWP – Swap, IO – Input Output
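
The bit patterns in the table can be turned into a small decoder. The sketch below is not the actual state-control logic. It assumes the field widths implied by the 8-bit patterns: DR and SR are 2 bits each, Imm is 3 bits for MOV/LD and 5 bits for JMP, and NZ is 2 bits. The function name, the returned field names, and the "UNDEF" result for unlisted patterns are made up for illustration. Immediates are returned as raw unsigned fields, since the table does not say whether they are signed.

```python
ALU_SUBOPS = ["NOTB", "ADD", "SUB", "AND", "OR", "PC", "IOG"]


def decode(byte: int):
    """Decode one 8-bit instruction into (mnemonic, fields).

    Field widths are assumptions read off the table's bit patterns.
    """
    if not 0 <= byte <= 0xFF:
        raise ValueError("instruction must fit in 8 bits")

    top2 = byte >> 6            # bits 7..6: instruction group
    dr = (byte >> 4) & 0b11     # bits 5..4: destination register
    mode = (byte >> 2) & 0b11   # bits 3..2: input mode, or NZ for jumps
    sr = byte & 0b11            # bits 1..0: source register
    imm3 = byte & 0b111         # bits 2..0: short immediate

    if top2 == 0b00:
        if mode < 0b10:         # bit 3 clear: immediate form
            return ("MOV", {"dr": dr, "imm": imm3})
        if mode == 0b10:
            return ("MOV", {"dr": dr, "sr": sr})
        return ("SWP", {"dr": dr, "sr": sr})

    if top2 == 0b01:
        low = byte & 0b1111
        if low < len(ALU_SUBOPS):
            return (ALU_SUBOPS[low], {"dr": dr})
        if mode == 0b11 and dr == 0b00:
            return ("IOS", {"sr": sr})
        if mode == 0b11 and dr == 0b01:
            return ("BNK", {"sr": sr})
        return ("UNDEF", {})    # pattern not listed in the table

    if top2 == 0b10:
        if mode < 0b10:
            return ("LD", {"dr": dr, "imm": imm3})
        if mode == 0b10:
            return ("LDR", {"dr": dr, "sr": sr})
        return ("LDP", {"dr": dr, "sr": sr})

    # top2 == 0b11: jumps
    if byte & 0b00100000:       # 111I mmmm
        return ("JMP", {"imm": byte & 0b11111})
    name = "JMPP" if byte & 0b00010000 else "JMPR"
    return (name, {"nz": mode, "sr": sr})


print(decode(0b00010101))  # MOV with an immediate
print(decode(0b11010110))  # JMPP with NZ bits and a source register
```

Because DR and SR sit in the same bit positions across every form that uses them, the decoder can extract them once, before it knows which instruction it is looking at.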

8 of 27

Typical Instruction Implementation

Takes a lot of cycles (0)!
JMPPp S1
Fetch: Get the instruction from memory into the Instruction Register.
Decode: Internally decode what it must do.
Execute: Use loaded values to perform the instruction.
Store: Store the result back into another register.

9 of 27

Typical Instruction Implementation

Takes a lot of cycles (1)!
JMPPp S1

PC

0

PC+1

The ALU has an increment control line.

PC+1

10 of 27

Typical Instruction Implementation

Takes a lot of cycles (2)!
JMPPp S1

The PC is an SR latch, so it cannot be read and written at the same time.

PC+1

11 of 27

Typical Instruction Implementation

Takes a lot of cycles (3)!
JMPPp S1

MEM[PC+1]

Jumps have a condition, and the SC checks it now.

12 of 27

Typical Instruction Implementation

Takes a lot of cycles (4)!
JMPPp S1

A

Since A and B are not just input buffers, long swap sequences are possible.

13 of 27

Typical Instruction Implementation

Takes a lot of cycles (5)!
JMPPp S1

S1

14 of 27

Typical Instruction Implementation

Takes a lot of cycles (6)!
JMPPp S1

A

15 of 27

Typical Instruction Implementation

Takes a lot of cycles (7)!
JMPPp S1

S1

PC

PC+S1

16 of 27

Typical Instruction Implementation

Takes a lot of cycles (8)!
JMPPp S1

PC+S1

17 of 27

Typical Instruction Implementation

Takes a lot of cycles (9)!
JMPPp S1

S1

18 of 27

Typical Instruction Implementation

Takes a lot of cycles (10)!
JMPPp S1

A

19 of 27

Typical Instruction Implementation

Takes a lot of cycles (11)!
JMPPp S1

S1

20 of 27

Power Consumption

Worst-case power consumption of a VLSI chip can be estimated by P = V²CF.

The transistors used here have a capacitance of 46 picofarads each.

However, this figure does not account for the capacitance of the traces. That capacitance is equal to t*(kε₀A/d) and is approx. 1 pF per transistor using generous numbers.

Given: FR4-Standard Tg 140C has a dielectric constant k of 4.58. A = l*w, with w = 3.556*10^-4 m (14 mil) and l_average = 0.1 m. d = 0.0016 m. kε₀A/d = 0.90125 pF.
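
The parallel-plate estimate can be checked with a few lines of Python. This sketch uses the values given on the slide and a rounded ε₀ of 8.854 × 10⁻¹² F/m; the function name is made up for illustration.

```python
EPSILON_0 = 8.854e-12  # vacuum permittivity in F/m (rounded)


def trace_capacitance(k: float, length_m: float, width_m: float, gap_m: float) -> float:
    """Parallel-plate estimate C = k * e0 * A / d, with A = length * width."""
    area = length_m * width_m
    return k * EPSILON_0 * area / gap_m


# Values from the slide: k = 4.58, 14 mil trace width, 0.1 m average
# trace length, 0.0016 m dielectric thickness.
c_trace = trace_capacitance(k=4.58, length_m=0.1, width_m=3.556e-4, gap_m=0.0016)

print(f"Trace capacitance: {c_trace * 1e12:.5f} pF")
```

The result is about 0.901 pF, which is what the slide rounds up to 1 pF per transistor.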

With this approximation, at a clock of 10 MHz we would see power usage of 2.93 watts.
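
The P = V²CF estimate can be reproduced as a short calculation. The slide does not state the supply voltage, so the 5 V used below is an assumption, and the function name is made up for illustration. The transistor count (2494) and the per-transistor capacitance (46 pF plus roughly 1 pF of trace capacitance) come from the slides.

```python
def dynamic_power(voltage: float, capacitance: float, frequency: float) -> float:
    """Worst-case switching power, P = V^2 * C * F."""
    return voltage ** 2 * capacitance * frequency


TRANSISTORS = 2494                    # V0.3 transistor count
C_PER_TRANSISTOR = 46e-12 + 1e-12     # device + approx. trace capacitance, in F
ASSUMED_VDD = 5.0                     # assumption: supply voltage is not on the slide

total_c = TRANSISTORS * C_PER_TRANSISTOR

p_10mhz = dynamic_power(ASSUMED_VDD, total_c, 10e6)
p_1mhz = dynamic_power(ASSUMED_VDD, total_c, 1e6)

print(f"Total switched capacitance: {total_c * 1e9:.2f} nF")
print(f"Power at 10 MHz: {p_10mhz:.2f} W")
print(f"Power at 1 MHz: {p_1mhz:.2f} W")
```

With the assumed 5 V supply, the formula gives about 29.3 W at 10 MHz and about 2.93 W at 1 MHz, so the result is very sensitive to which voltage and clock are plugged in.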

21 of 27

Value Proposition

This Custom PCB MCU
4 Registers
5 Input/Addressing Modes
64k addressable memory
18 Instructions
2494 transistors (CMOS design)
No Interrupts
Price of approx. $724 per unit as of v.3

MOS 6502 Knock-Off
3 Registers
13 Input/Addressing Modes
64k addressable memory
56 instructions
3,510 transistors (NMOS design)
Interrupts
Bulk Price of <$0.08 per unit if you buy a couple thousand

22 of 27

Future development plans

End

Evaluate ISA From user perspective

ISA Improvement possible

Generate gate level update for MCU

Evaluate compatibility at transistor level

Find if it is Functional

Create Verilog models to verify functionality

Do Bugs exist?

Simulate full circuit at electrical level

Do Bugs exist?

Layout all components and verify compliance with schematics

23 of 27

Questions?

Bonus slides ahead!

24 of 27

Inspiration: MOnSter 6502

25 of 27

Concept Drawing (V.3)

26 of 27

Change in Layout concept in upcoming v.4

30 cm × 30 cm PCB

Cost: ~200 USD

10 × (10 cm × 10 cm) PCBs

Cost: ~8 USD

Using a stackable design poses additional challenges: it will adversely affect the amount of capacitance and inductance and increase the layout effort, but it significantly reduces cost and increases room.

Similar in concept to the design of the TM4C dev board.

27 of 27

Improvements in V.4

D-latch PC
Add overflow flags
Unify IO Ports (Also a second IO port would be good.)
Reset handling
Store Op

Would involve redesigning the memory architecture so that some ROM is ‘on chip’ and some RAM is on a bus, to maintain the idea of the MCU.

Improve or remove immediate.
Make JMPP implicitly use A register