1 of 27

Hello, my name is Burak.

It's nice to meet you!

2 of 27

A bit about me

Education:

University of Texas at Austin

Class of 2022

Major GPA: 3.85

GPA: 3.65

  • Hobbies: Gaming (4X/TBS/RTS), Outreach/Mentoring (UT IEEE RAS), Software/Hardware design (Website, 3D engine in pure C and … ), Photo-editing (see the picture to the right)
  • Lived in Texas for ~90% of my life, about 5% in Pennsylvania, 2.5% in Turkey.

3 of 27

Some of my previous work experience:

AMD (SLT & ATE Intern)

    • Created task-automation scripts, wrote test code, and operated 93k (v93k, hp93k…) testers, the ‘hand-test’, and the GS1 handler.

Silicon Labs (‘Global Ops Intern’)

    • Was in the foundry group but mostly did software/tool development. Worked on a joint project between the CAD and foundry teams to make better use of available data.

Vast (SWE Intern)

    • ‘Wore many hats’ while here. Did ‘full-stack’ web development (from servers to HTML), data mining, app development, and advertising.

Summer Advanced Research Camp (TA)

    • Taught other students how to conduct and present their own research while also working on my own project with guidance from UT professors.

Elk Electric (Apprentice Electrician)

    • Created wiring diagrams and installed lighting and power

4 of 27

A Custom MCU & ISA Design

Or: How I learned to let go of small things like ‘value proposition’, ‘opportunity cost’ and ‘reason’ in order to embrace the transistor.

5 of 27

Overview of MCU Design

Custom 8-bit RISC CMOS Design

To be implemented with discrete parts.

Version 0.1 had only 1060 transistors.

Current (V0.3) transistor count: 2494

2 General Purpose Registers

2 Accumulator-like Registers

1 Hidden Swap Register

1 status register

No Hardware Stack Pointer or Interrupts!

6 of 27

Typical Module implementation

  • All registers and persistent gates in the design are implemented using D latches* instead of the more typical master-slave D flip-flop, because a master-slave D flip-flop requires more transistors**.
  • All logic circuits (ADD, AND, etc.) evaluate continuously but are gated from outputting onto the bus.

** Values given as per-bit/base transistor counts. Master-slave D flip-flop: 36/4. D latch: 20/2.
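As a rough illustration of the saving, the sketch below totals the transistors for one 8-bit register under each scheme. It assumes that "per-bit/base" means a cost paid once per bit plus a one-time cost per register; the slide does not spell this out, and the function name `register_transistors` is made up for this example.

```python
def register_transistors(bits, per_bit, base):
    """Transistors for one register, ASSUMING cost = bits * per_bit + base."""
    return bits * per_bit + base


WIDTH = 8  # the design is 8-bit

# Per-bit/base figures taken from the footnote on this slide.
master_slave = register_transistors(WIDTH, per_bit=36, base=4)
d_latch = register_transistors(WIDTH, per_bit=20, base=2)
saving = master_slave - d_latch

print(master_slave, d_latch, saving)
```

Under that reading, each 8-bit register built from D latches needs 162 transistors instead of 292, a saving of 130 per register.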

7 of 27

Overview of ISA

  • All instructions are uniform length
  • No Store operation exists*
  • 5 Addressing/Input Modes
    • Register (Addressing/Input)
    • PC + Offset (Addressing/Input**)
    • PC + Register (Addressing)
    • Immediate (Input)
    • Implicit (Input)
  • Implicit input mode uses A and/or B as the implied inputs
  • Destination and source fields are in reasonable positions to be used in non-EEPROM state control

Instruction   Sub-Op        Description            Bits
MOV           -             Dest <- Imm            00DR 0Imm
MOV           -             Dest <- SRC            00DR 10SR
SWP           -             Dest <-> SRC           00DR 11SR
ALU           NOTB          Dest <- Impl           01DR 0000
ALU           ADD           Dest <- Impl           01DR 0001
ALU           SUB           Dest <- Impl           01DR 0010
ALU           AND           Dest <- Impl           01DR 0011
ALU           OR            Dest <- Impl           01DR 0100
ALU           PC            Dest <- Impl           01DR 0101
ALU           IOG           Dest <- IO             01DR 0110
ALU           IOS           IO <- SRC              0100 11SR
ALU           BNK           BNK <- SRC             0101 11SR
LD            -             Dest <- MEM[PC+Imm]    10DR 0Imm
LDR           -             Dest <- MEM[SRC]       10DR 10SR
LDP           -             Dest <- MEM[PC+SRC]    10DR 11SR
JMP           -             PC <- PC+Imm           111I mmmm
JMPR          JMPR(n/z/p)   PC <- SRC              1100 NZSR
JMPP          JMPI(n/z/p)   PC <- PC+SRC           1101 NZSR

* Except IO and SWP. ** PC only.
DR: Dest, Imm: Immediate, SR: Source Register, NZ: NZP bits, SWP: Swap, IO: Input/Output
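The bit patterns in the table can be turned into a small decoder. The sketch below is only an illustration, not the actual state-control logic. It assumes the patterns are read MSB-first exactly as printed, treats every pattern not in the table as undefined, and makes no assumption about whether immediates are signed. The names `decode` and `ALU_SUB_OPS` are made up for this example.

```python
# Sub-ops of the ALU group, indexed by the low nibble (0000 .. 0110).
ALU_SUB_OPS = ("NOTB", "ADD", "SUB", "AND", "OR", "PC", "IOG")


def decode(byte):
    """Decode one 8-bit instruction into (mnemonic, fields)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("instruction must fit in 8 bits")
    group = byte >> 6        # top two bits select the instruction group
    dr = (byte >> 4) & 0b11  # DR field (bits 5-4)
    sr = byte & 0b11         # SR field (bits 1-0)

    if group in (0b00, 0b10):
        # MOV/MOV/SWP and LD/LDR/LDP share the same low-nibble layout.
        names = ("MOV", "MOV", "SWP") if group == 0b00 else ("LD", "LDR", "LDP")
        if not byte & 0b1000:            # xxDR 0Imm
            return names[0], {"dr": dr, "imm": byte & 0b111}
        if not byte & 0b0100:            # xxDR 10SR
            return names[1], {"dr": dr, "sr": sr}
        return names[2], {"dr": dr, "sr": sr}  # xxDR 11SR

    if group == 0b01:
        low = byte & 0b1111
        if low >> 2 == 0b11:             # 0100 11SR is IOS, 0101 11SR is BNK
            if dr == 0b00:
                return "IOS", {"sr": sr}
            if dr == 0b01:
                return "BNK", {"sr": sr}
            return "UNDEFINED", {}
        if low < len(ALU_SUB_OPS):       # 01DR 0000 .. 01DR 0110
            return ALU_SUB_OPS[low], {"dr": dr}
        return "UNDEFINED", {}

    # group 0b11: jumps
    if byte & 0b100000:                  # 111I mmmm, 5-bit immediate
        return "JMP", {"imm": byte & 0b11111}
    name = "JMPP" if byte & 0b10000 else "JMPR"  # 1101 NZSR / 1100 NZSR
    return name, {"nz": (byte >> 2) & 0b11, "sr": sr}


print(decode(0b00011010))  # 00DR 10SR with DR=01, SR=10
print(decode(0b11010110))  # 1101 NZSR with NZ=01, SR=10
```

The two calls decode to a register-to-register MOV and a JMPP respectively. How register numbers map onto the physical registers is not stated in the table, so the fields are returned as raw integers.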

8 of 27

Typical Instruction Implementation

  • Takes a lot of cycles (0)!
  • JMPPp S1
  • Fetch : get the instruction from memory into the Instruction Register.
  • Decode : Internally decode what it must do
  • Execute : Use loaded values for performing instruction
  • Store : store the result back into another register.

9 of 27

Typical Instruction Implementation

  • Takes a lot of cycles (1)!
  • JMPPp S1
  • Diagram labels: PC, 0, PC+1. Note: the ALU has an increment control line.

10 of 27

Typical Instruction Implementation

  • Takes a lot of cycles (2)!
  • JMPPp S1
  • Diagram labels: PC+1. Note: the PC is an SR latch, so it cannot be read and written at the same time.

11 of 27

Typical Instruction Implementation

  • Takes a lot of cycles (3)!
  • JMPPp S1
  • Diagram labels: MEM[PC+1]. Note: jumps have a condition, which the SC checks at this point.

12 of 27

Typical Instruction Implementation

  • Takes a lot of cycles (4)!
  • JMPPp S1
  • Diagram labels: A. Note: since A and B are not just input buffers, long swap sequences are possible.

13 of 27

Typical Instruction Implementation

  • Takes a lot of cycles (5)!
  • JMPPp S1
  • Diagram labels: S1.

14 of 27

Typical Instruction Implementation

  • Takes a lot of cycles (6)!
  • JMPPp S1
  • Diagram labels: A.

15 of 27

Typical Instruction Implementation

  • Takes a lot of cycles (7)!
  • JMPPp S1
  • Diagram labels: S1, PC, PC+S1.

16 of 27

Typical Instruction Implementation

  • Takes a lot of cycles (8)!
  • JMPPp S1
  • Diagram labels: PC+S1.

17 of 27

Typical Instruction Implementation

  • Takes a lot of cycles (9)!
  • JMPPp S1
  • Diagram labels: S1.

18 of 27

Typical Instruction Implementation

  • Takes a lot of cycles (10)!
  • JMPPp S1
  • Diagram labels: A.

19 of 27

Typical Instruction Implementation

  • Takes a lot of cycles (11)!
  • JMPPp S1
  • Diagram labels: S1.

20 of 27

Power Consumption

  • Worst-case power consumption of a chip can be estimated by $P = V^2 C f$ for VLSI chips.
    • The transistors used here have a capacitance of 46 pF each.
  • However, trace capacitance is not accounted for in that figure. It is equal to $t \cdot (k \varepsilon_0 A / d)$ and is approx. 1 pF per transistor using generous numbers.
    • Given: FR4-Standard Tg 140C has a dielectric constant $k$ of 4.58; $A = l \cdot w$ with $w = 3.556 \times 10^{-4}$ m (14 mil) and $l_{average} = 0.1$ m; $d = 0.0016$ m. Then $k \varepsilon_0 A / d = 0.90125$ pF.
  • With this approximation, at a clock of 10 MHz we would see power usage of 2.93 W.
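The estimate above can be reproduced with a few lines of arithmetic. This is a sketch under stated assumptions: the slide does not give the supply voltage, so the 5 V used here is purely an assumption, as is the choice to multiply the per-transistor capacitance (46 pF plus the rounded 1 pF of trace capacitance) by the V0.3 transistor count of 2494. The function names are made up for this example.

```python
EPSILON_0 = 8.854e-12  # vacuum permittivity in F/m


def trace_capacitance(k, length_m, width_m, thickness_m):
    """Parallel-plate estimate k * e0 * A / d for one trace, with A = l * w."""
    return k * EPSILON_0 * (length_m * width_m) / thickness_m


def worst_case_power(volts, cap_per_transistor, transistors, freq_hz):
    """P = V^2 * C * f, with C summed over all transistors."""
    return volts ** 2 * cap_per_transistor * transistors * freq_hz


# Board values from the slide: k = 4.58, l = 0.1 m, w = 14 mil, d = 1.6 mm.
c_trace = trace_capacitance(4.58, 0.1, 3.556e-4, 0.0016)

C_PER_TRANSISTOR = 46e-12 + 1e-12  # 46 pF device + ~1 pF trace (slide's rounding)
N_TRANSISTORS = 2494               # V0.3 transistor count
V_SUPPLY = 5.0                     # ASSUMPTION: supply voltage is not on the slide

p_1mhz = worst_case_power(V_SUPPLY, C_PER_TRANSISTOR, N_TRANSISTORS, 1e6)
p_10mhz = worst_case_power(V_SUPPLY, C_PER_TRANSISTOR, N_TRANSISTORS, 10e6)

print(f"trace capacitance: {c_trace * 1e12:.5f} pF")
print(f"power at 1 MHz:  {p_1mhz:.2f} W")
print(f"power at 10 MHz: {p_10mhz:.2f} W")
```

The trace capacitance comes out at about 0.901 pF, matching the slide. With the assumed 5 V supply, the formula gives roughly 2.93 W at 1 MHz and roughly 29.3 W at 10 MHz, so the stated clock and power figures are worth checking against the voltage actually used.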

21 of 27

Value Proposition

  • This Custom PCB MCU
    • 4 Registers
    • 5 Input/Addressing Modes
    • 64k addressable memory
    • 18 Instructions
    • 2494 transistors (CMOS design)
    • No Interrupts
    • Price of approx. $724 per unit as of version V0.3
  • MOS 6502 Knock-Off
    • 3 Registers
    • 13 Input/Addressing Modes
    • 64k addressable memory
    • 56 Instructions
    • 3,510 transistors (NMOS design)
    • Interrupts
    • Bulk price of <$0.08 per unit if you buy a couple thousand

22 of 27

Future development plans

Flowchart steps:

  • Evaluate ISA from user perspective
  • ISA improvement possible?
  • Generate gate-level update for MCU
  • Evaluate compatibility at transistor level
  • Find if it is functional
  • Create Verilog models to verify functionality
  • Do bugs exist?
  • Simulate full circuit at electrical level
  • Do bugs exist?
  • Lay out all components and verify compliance with schematics
  • End

23 of 27

Questions?

Bonus slides ahead!

24 of 27

Inspiration: MOnSter 6502

25 of 27

Concept Drawing (V0.3)

26 of 27

Change in layout concept in upcoming V0.4

30cm * 30cm PCB

Cost: ~200 USD

10*10cm*10cm PCB

Cost: ~8 USD

Using a stackable design poses additional challenges: it will have negative effects on capacitance and inductance and increases layout effort, but it significantly reduces cost and increases room.

Similar in concept to the design of the TM4C dev board.

27 of 27

Improvements in V0.4

  • D-latch PC
  • Add overflow flags
  • Unify IO Ports (Also a second IO port would be good.)
  • Reset handling
  • Store Op
    • Would involve redesigning the memory architecture so that some ROM is ‘on chip’ and some RAM is on a bus, to maintain the idea of an MCU.
  • Improve or remove immediate.
  • Make JMPP implicitly use A register