EE360N: Computer Architecture Lecture #3

Department of Electrical and Computer Engineering

The University of Texas at Austin October 9, 2008

Disclaimer: The contents of this document are scribe notes for The University of Texas at Austin EE360N Fall 2008, Computer Architecture* The notes capture the class discussion and may contain erroneous and unverified information and comments.


Microcoded Machines


Lecture #3: 09/08/2008

Lecturer: Derek Chiou

Scribe: Manny Alvarez, Anuj Arora



  1. Recap of Last Lecture and Outline of This Lecture


ISAs depend on the underlying micro-architecture of the computer. Microcoded machines are easy to understand, implement, and add instructions to. It is also easy to implement ISAs on top of microcoded machines. However, a down side to the ease of adding instructions to a microcoded machine is that it often results in a proliferation of instructions,making the ISAs too complex. For example, the VAX polynomial solve instruction may have seemed like a good idea at the time it was created because it was easy to create and someone had a need for it, but it quickly became a liability. To maintain software backward compatibility, instructions cannot disappear from an instruction set, implying that instruction sets tend to never shrink in the number of instructions.

This lecture builds more upon the control structure of the microcode engine and improving microcoded machine performance.


  1. Changing Instruction Sets


It is difficult and expensive for a company to change ISAs, but sometimes it is advantageous to do so to adapt to technology changes and to enable better microarchitectures. For example, Digital Equipment Corporation (DEC or Digital) switched instruction sets from the VAX instruction set, that was a very complex ISA to the Alpha instruction set, that was much simpler and thus easier to implement efficiently.The VAX instruction set was originally implemented as a microcoded machine and grew instructions like the polynomial solve instruction, making it difficult to implement efficiently as time went on and competitors grew faster On the other hand, machines implemented with the Alpha ISA were so fast they could sometimes simulate the x86 ISA as fast as an x86 processor could run.


Apple is another example of switching ISAs not once, but twice. Apple switched from the Motorola 68K ISA to PowerPC to get IBM’s and Motorola’s help against WinTel. It then switched to Intel x86 processors when it became clear that the PowerPC could not keep up from a price/performance perspective to Intel.



3. Microcode:


    1. Where Does Microcode Live?


The microcode is stored in the Control unit. The Control unit “executes” the microcode and issues control signals to all components of the datapath as well as components within itself. The microcode is essentially a program, encoded as a finite state machine, that interprets and executes ISA instructions by controlling the datapath, cycle-by-cycle, appropriately.


Before executing every instruction, the microcode must first fetch the instruction. It does so by first reading the instruction from memory, using the address contained in the Program Counter (PC). In the LC-3b microarchitecture (not in the simplified microarchitecture we use in lecture), while the PC is being loaded into the MAR so the instruction can be read from memory, it is simultaneously incremented by two in preparation to fetch the next instruction. Note that, in the case of a branch instruction, the PC needs to be adjusted again. Once the instruction is read from memory, it is then loaded into the IR (Instruction Register) to be interpreted.


    1. Adding Realism: Multiple ADDs How to Implement?


the LC-3b ISA, has two different ADD instructions. One adds two registers (RegA+RegB) and the other adds a register to an immediate value specified in immediate field the instruction itself (RegA + immed) (we use the term “register” here to mean the value contained in the register.) Bit 5 of the ADD instruction is used to decide which ADD instruction it is. This can be implemented by adding a MUX between the B register on the ALU and the bus. Bit 5 is then used as a selection bit for the MUX to select either the immediate value or the second register value to be loaded into the B register of the ALU to be added to register A.


    1. Branch (BRn LABEL)?


There are a set of bits called condition codes that are sent from the ALU to the Control unit to be latched. These three bits tell the microcode if the last number to go through the ALU was greater than zero, less than zero, or equal to zero. Using these bits (called condition codes), the microcode can do the right thing when it interprets branch instructions such as the BRn (branch if negative) instruction. The state machine (as specified by the microcode) can specify whether to use those condition code bits to help decide the next state in the state machine. Thus, in the case of the BRn instruction, if the bit indicating the previous qualifying ALU operation was negative, the microcode precedes to a state that will extract the immediate from the instruction (the immediate value was generated by the assembler based on the address of the BRn instruction and the label that the branch is jumping to) and increment the PC with that immediate.





    1. Implementing Microcode: Microcode Store


There are many ways to implement microcode. One way is to implement the microcode completely in logic. Another way, which is the way we use in class, is to implement the microcode in a microcode memory. Conceptually, every control signal in the datapath has a corresponding column in the microcode. Each row of microcode represents a single state in the microcode state machine, meaning all of the datapath control signals for that particular state are specified by each row of the microcode. For example, if data is being moved from memory to the A register of the ALU, the microcode bits for the memory output and the register A of the ALU’s input would be asserted and all others would be deasserted.


Of course, logic can still be used to assist and/or compress the microcode in the microcode memory. For example, we know that only one bus device can be driving the bus at any given time. If there are four bus devices that are attached to the bus, only one should be allowed to drive the bus at any given time. The microcode could have four columns, one for each bus device, each of which specifies whether that bus device’s tri-state to the bus is enabled or not. If the microcode is correct, that is, only one of the bus device’s tristate is enabled in any particular microcode state (meaning that in any particular row of microcode only one of those signals is asserted), there are no problems. However, it is possible to make a mistake and assert more than one of those tristate enable bits, in which case more than one of the bus devices would be driving the bus, leading to problems. You could literally burn up your machine that way.


An alternative would be to use two bits to indicate which one of the four bus devices should drive the bus at each microcode state. Doing that would require logic (a decoder) to go from the two indicator bits to a “one-hot” representation of those two bits (where one signal is asserted and the other three are deasserted.) Using such a scheme guarantees that only one bus device is driving the bus at any time, thus eliminating the potential for burning up your machine (at least in that case.)


There are also additional bits in each row of microcode to help determine the next state. For example, if the microcode has a total of 64 states, 6 bits in the microcode might be reserved to determine the next microcode state. Of course, whether those 6 bits are used as the next state depends on whether the microcode will conditionally branch or not.



    1. More about Microcode Branching



A branch instruction is one where the PC (or uPC) is set differently based on the condition that the branch instruction depends on.


Note that there is a difference between an ISA branch and a microcode branch. In order to implement an ISA-level branch, the microcode needs to execute a conditional branch itself to conditionally change the PC depending on the value of the condition code registers.


Any branch instruction (ISA or microcode) can be implemented in two ways:


  1. Having two branch destinations: This implementation is more powerful and gives the user a flexibility to branch to two different locations depending on whether the branch condition is met or not.

If <conditional code> goto branch1 else goto branch2


  1. Having one branch destination and fall through: This implementation allows the user to branch to a specific location if the condition is met else the PC is incremented to the instruction following the branch instruction in the program.

If <conditional code> goto branch1 else got next_inst


Obviously, if for each goto location, additional space in the ISA instruction/microcode instruction is required to store that goto location.


If certain bits in the data path field are not used in micro-ops, they can instead be made use of to store the additional next states when they are needed. These are known as the dual semantic fields. Of course, this means that the microcode instructions that need an additional next field cannot use those bits to also mean the other meaning of those bits.


  1. Importance of Microcode:


Microcoding is important as it ensures correctness of implementation and saves hardware resources. Microcoding gives us the flexibility of implementing the ISA in software and thus has all advantages of using software over hardware. If we compare the number of bits required to save a program with a specific number of instructions, it is seen that this number is considerably smaller when the instructions are implemented using microcode. This can be observed from the following two equations:














Thus, each instance of an instruction in the program is really a subroutine call to a microcode routine that will interpret that instruction.


  1. Length of Instructions:


The length of instructions can be variable as long as we have information in the opcode to determine what the length of the instruction is. Consider two instructions with a 4 bit and a 10 bit opcode. In order for these to be implemented correctly, we should be able to tell in the first four bits of each opcode whether the next six bits are needed or not. However, reserving one out the first four bits for this purpose leads to inefficiency. Thus we can use one or two states out of the available 16 states (a four bit opcode can represent 16 states as 2­4 = 16) to implement the same behavior in a more efficient manner.


  1. Improving the performance:


The following optimizations can be made in order to improve the performance of a system.



    1. More Register Bandwidth


This allows us to provide the values to both registers A and B of the ALU simultaneously from the register file as compared to the having a single register file with a single read port where the registers A and B received values one after the other. What this requires is a register file that can read two values in one cycle, or a two-ported register file. Two ways to implement two ported register files are as follows:


  1. Having 2 register files, each with one read port and one write each. Each write is performed to both register files, thus ensuring that both register files contain precisely the same information at all times. Thus, either register can be read to get the desired data, effectively creating the equivalent of a two read port register file.



  1. By having a single register file with two read ports and one write port.




The size of a register file is proportional to the square of the number of ports it has. Consider the first implementation, we have 2 register files with 2 ports each. Each 2 ported register file takes 22 = 4 units of area compared to a single ported register file. Thus, the size of both register files is 8 times bigger than a single ported register file. The size of the register file in the second implementation is proportional to 32 or 9 units as it has 3 ports.


Three input muxes are shown to steer the appropriate value to registers A and B. Two input muxes, connecting one of the inputs to the bus and the other to one of the two register files can also be used. Doing so, however, may eliminate every possible opportunity to optimize since it is less general. For example, if A and MAR are attached to the same register read port and B and MDR are attached to the other, it is possible that A and MDR want the value of R0 and MAR and B want the value of R1. With two-input muxes, that cannot be done in one cycle.


    1. Letting Next Instruction Fetch occur while the current instruction is executing


An extra ALU dedicated to the computation of PC allows the next PC to be computed while the current instruction is being executed. An additional memory for storing the instructions, called the instruction memory, allows the next instruction to be fetched while the previous instruction is being executed. This is a simple form of prefetching the next instruction.



The changes are shown below:




  1. Improvements in ISA


The ISAs have changed over the years owing to new requirements and technologies. These changes have been motivated by a need of higher performance, lower component/transistor costs and the availability of better compilers.

* Copyright 2008 Manny Alvarez, Anuj Arora, and Derek Chiou, all rights reserved. This work may be reproduced and redistributed, in whole or in part, without prior written permission, provided all copies cite the original source of the document including the names of the copyright holders and “The University of Texas at Austin EE360N Fall 2008, Computer Architecture”.