1 of 58

MIPS Intro

COMP1521 23T2: lec01+02

2 of 58

How do our programs execute?

In COMP1[59]11:

  • We run a compiler (dcc?)
  • ./hello
  • profit ??

What’s going on here? What’s even in hello?

3 of 58

Abiram’s show and tell: hard disk drives

Long-term, non-volatile storage

  • non-volatile means that data is preserved when power shuts off
  • relatively cheap (1TB = 1024GB of storage for under $50)

This is where we typically save files! *

  • eg. photos, videos, documents, C code, League of Legends

Okay - so my compiler spat out hello and saved it to my hard drive - what next?

* hard disk drives are much less common these days - replaced by SSDs which are functionally equivalent

5 of 58

Abiram’s show and tell: RAM (or ‘memory’)

  • A program needs to be ‘in memory’ in order for it to run
    • ‘memory’ typically refers to RAM
    • Communicating between the CPU and drives is too slow
  • RAM is just a massive 1D array which we divide into sections
    • An address is really just an ‘index’ into that array
  • RAM is volatile (flushed when it loses power)

6 of 58

Abiram’s show and tell: RAM (or ‘memory’)

  • hello contains information on how to set up memory
    • What instructions does the CPU need to follow?
    • What strings do we need loaded into memory?
    • Variables take up room!
      • Global variables are relevant in this course!
      • What about local variables?
    • Where do we put malloced memory?

7 of 58

Abiram’s show and tell: the CPU

  • We have instructions in RAM!
  • The CPU can fetch an instruction from memory
    • An instruction consists of an operator, and zero or more operands

8 of 58

But wait…

We’ve discussed memory and storage drives as being a place to store things.

  • But how is information actually stored?
  • Computers are really just massive circuits
    • Can think of electricity as being off or on
    • 0 or 1 - this is a base-2 system
  • All data on a computer is represented as binary behind-the-scenes
    • This will become incredibly important in Week 5

9 of 58

Abiram’s show and tell: the CPU

  • We have instructions in RAM!
  • The CPU can fetch an instruction from memory
  • Circuitry within the CPU decodes the instruction to determine what to do
  • The CPU then executes that instruction, before moving on to fetch the next instruction!

10 of 58

Inside a CPU

11 of 58

What can instructions do?

  • Computations: eg. add, subtract, multiply, divide, bitwise (Week 5), …
  • Load/store: no point having memory if we can’t modify it or read it
  • Branch: jump to execute different instructions
    • Can’t have logic (eg. if statements) if our program continues linearly
  • System calls: call-a-friend for help - more on this soon

and more!

12 of 58

A day in the life of a CPU - as C code

    int program_counter = START_ADDRESS;

    while (1) {
        // Fetch an instruction from memory
        int instruction = memory[program_counter];

        // Move to the next instruction
        program_counter++;

        // Execute the next instruction
        execute(instruction, &program_counter);
        // ^ note: some instructions may
        // modify the program counter
    }

13 of 58

Writing instructions ourselves

In this course we will be writing CPU instructions ourselves instead of making a compiler do it.

Why might we do this?

  • Optimising code for performance
    • Fewer instructions = faster to execute = saving picoseconds!
  • Sometimes it’s necessary
    • eg. writing code to interact directly with a device (i.e. drivers)
  • Form a better understanding of how a compiled program executes
    • Primary reason in this course
    • Can be helpful when debugging
    • Also handy to identify security vulnerabilities and exploit binaries (see: COMP6447)

14 of 58

Assembly

Instructions are really just 0s and 1s

  • Would be a pain to read/write literal instructions
  • Instead, we use assembly language to form a human-readable representation of each instruction
    • Each instruction we write in assembly language typically represents a single CPU instruction
    • An assembler translates this to binary CPU instructions

15 of 58

A sample instruction + assembly

00100001000010010000000000001100

addi $t1, $t0, 12

16 of 58

Assembly

  • may also add some niceties such as constants
  • give us a way to configure other memory sections (for global variables, strings, etc.)
  • give us labels - a way to name points in memory without having to deal with addresses

17 of 58

Instruction sets

  • Different types of CPUs may speak different ‘languages’
    • That is, they understand a different set of instructions
    • Some instruction sets may be more complex than others
    • Influenced by different design choices
  • Some examples include x86, ARM, PowerPC, RISC-V
  • In COMP1521, we learn the MIPS instruction set architecture
    • Relatively simple and well-known architecture
      • Once used everywhere from consoles to supercomputers
      • Still sometimes used in routers, TVs
    • Lots of learning resources available
    • Good stepping stone if you need to branch out to other ISAs

18 of 58

“But I don’t have a MIPS CPU!”

We can’t run our MIPS instructions on our x86-64/ARM CPUs.

Instead, we use an emulator called mipsy:

  • recreates the behaviour of a real MIPS CPU
    • written by Zac* (past course admin, now graduated and lecturing COMP6991)
    • can optionally download and run on your own machine: https://github.com/insou22/mipsy/
    • comes with a command-line interface to run in your terminal
  • mipsy_web builds on top of mipsy and runs entirely in your browser
    • written by Shrey* and linked on course website: https://cgi.cse.unsw.edu.au/~cs1521/mipsy
  • vscode extension
    • written by Xavier 🎉 - can download the ‘mipsy editor features’ extension

* some contributions from Josh Harcombe, Dylan Brotherston and me :)

19 of 58

When will he shut up and actually write a MIPS program?

20 of 58

soon™

two more things to cover.

21 of 58

Registers

  • memory is fast, but not fast enough
  • still physically separate from the rest of the CPU

The CPU has a small amount of storage on the chip itself:

  • cache: not covered in COMP1521, keeps copies of frequently accessed memory
  • registers:
    • 32 general-purpose registers (32 bits each, same size as a typical C integer)
    • floating point registers used for non-integer arithmetic, not covered in COMP1521
    • Hi/Lo are special registers used for mult/div - not too important in this course
    • program counter keeps track of which instruction to fetch and execute next
      • modified by branch/jump instructions

22 of 58

Registers

Almost all of our computations happen between registers!

Want to multiply 2 and 3 and store the result?

Load 2 and 3 into registers:

    li $t0, 2
    li $t1, 3

And store the result:

    mul $t2, $t0, $t1

23 of 58

Registers

Registers are denoted by a $ and can be referred to using a number ($0…$31) or by symbolic names ($zero…$ra)

$zero ($0) is special!

  • Always has the value 0 -> attempts to change it have no effect

$ra ($31) is also special!

  • Directly affected by two instructions we use in Week 3

24 of 58

Registers

Technically, we could use the other 30 registers however we please, but there are some conventions we have to follow - these will be discussed in next week’s tutes + Week 3 lectures.

25 of 58

Relevant registers (for now)

  • $t0 to $t9 are free real estate - can use however we want
  • Will also need $v0, $a0, $ra for certain things at the moment
  • Should not need to use any other registers (yet)
    • We will cover the other registers when we talk about functions in Week 3

26 of 58

System calls

Our programs are useless!

Let’s go back and look at the types of instructions mentioned earlier:

27 of 58

What can instructions do?

  • Computations: eg. add, subtract, multiply, divide, bitwise (Week 5), …
  • Load/store: no point having memory if we can’t modify it or read it
  • Branch: jump to execute different instructions
    • Can’t have logic (eg. if statements) if our program continues linearly
  • Move: copy values between registers
  • System calls: call-a-friend for help 👀

and more!

28 of 58

System calls

  • None of the instructions we have access to can interact with the outside world (eg. printing, scanning)
  • Instead, we request the operating system to perform these tasks for us - this process is called a system call
    • The operating system can access privileged instructions on the CPU (eg. communicating to other devices)
    • mipsy simulates a very basic operating system
    • Will explore real system calls and their raison d’etre in the second half of the course

29 of 58

Common mipsy syscalls

We won’t use syscalls 8, 12 much in COMP1521 - most input will be integers.

30 of 58

Other mipsy syscalls - seldom used

Probably not needed for COMP1521 - except maybe challenge exercises/provided code.

31 of 58

The system call workflow

  • We specify which system call we want in $v0
    • eg. print_int is syscall 1:

        li $v0, 1

  • We specify arguments (if any)

        li $a0, 42

  • We transfer execution to the operating system
    • The OS will fulfil our request if it looks sane

        syscall

  • Some syscalls may return a value - check syscall table

32 of 58

MIPS and mipsy documentation

Literally your best friend (it’ll even be there for you in the exam 🥺)

33 of 58

Lecture chat

  • Place to ask questions/make comments in the lecture (mostly) anonymously, if you like
    • Can deanonymise if the need arises - please follow UNSW Code of Conduct
    • Don’t spam
    • Supports Discord Markdown!
  • Mild shitposting is fine, in moderation
  • Don’t make me blacklist you >:(

34 of 58

Lecture chat

35 of 58

Recap of lec01

  • Exploring different types of storage/memory
  • RAM contains everything a program needs in a given moment
  • Instructions!
  • Assembly language!
  • Registers!
  • System calls!

37 of 58

Finally, we can write hello world.

38 of 58

DISCLAIMER:

Code written in lectures may not necessarily have the best style!

  • Lecture code is meant to be quick and dirty, to demonstrate a concept
  • Will quickly overview good style soon, but refer to your tutor, tut solutions, lab solutions

39 of 58

li vs la vs move

  • li (load immediate) is for immediate, fixed values that you need to load into a register with an instruction
  • la (load address) is for loading fixed addresses into a register
    • remember, labels really just represent addresses!
  • move is for copying values between two registers

40 of 58

Syntax overview

Assembly language programs contain:

  • Assembly instructions, each on their own line
    • These are generally a 1:1 mapping from assembly instructions to real CPU instructions
    • However, assemblers also provide pseudo-instructions for convenience
    • Some of these assembly pseudo-instructions turn into 2-3 real CPU instructions
      • li is an example - ask why on the forum if curious!
  • Labels … appended with :
  • Comments … starting with a #
  • Directives … symbol beginning with .
  • Constant definitions - like #defines in C:

MAX_NUMBERS = 256

41 of 58

Style

  • We generally don’t indent to show structure
    • i.e. no indenting within loops, if statements, etc.
  • Instead:
    • don’t indent labels
    • indent instructions by one step
    • have equivalent C code as inline comments
  • Huge recommendation: indent with 8-wide tabs
    • Ask on forum if anyone wants my vscode config

42 of 58

Simplified C

Translating C code directly to MIPS is not fun

Pro strat - simplify your C code and then translate it:

  • Map down to ‘simplified’ C
    • Simplified C is generally written so that each line of C code maps to one MIPS instruction
    • Compile your simplified C and make sure it still works as expected
    • Translate each line of simplified C to MIPS
    • Profit!!

43 of 58

MIPS Control

COMP1521 23T2: lec02

44 of 58

So far…

All of our programs so far have implemented fixed, predictable behaviour.

  • Execute linearly - that is, we always go down to the next instruction

However, what if we want to implement logic in our code?

  • If statements, where we may not always execute the same code, depending on a condition
  • For/while loops, where we may want to repeat the same instructions?

if/else and loops don’t exist in MIPS - we have to use branching to implement these ourselves

45 of 58

Branch/jump instructions

  • Allows you to transfer the flow of execution to a different instruction conditionally
    • except b, which is unconditional
  • Also j, jal, jalr, jr - unconditional jump instructions which we will talk about in MIPS Functions
  • Can replace with a constant in mipsy

46 of 58

In other words

A lot of these branch instructions are of the form:

“if condition is true, jump to instruction”

How do we implement this for our simplified C code?

47 of 58

COMP1511 staff hate this one simple trick!

In C, goto allows jumping to any arbitrary point within a program - as long as we define a label - meaning we can effectively yeet around within a program however we wish.

48 of 58

Simplifying if, if/else:

print_if_even, odd_even

49 of 58

goto is cool for simplification!

but don’t use it in your actual C programs.

  • goto makes programs more difficult to read
  • goto makes it hard for compilers to optimise code, resulting in slower programs
  • In general, do not use goto without good reason!
    • Typically only kernel/embedded programmers use goto

50 of 58

More complex conditionals: || and soft serve machines

    if (milk_age > 48 ||
        milk_level < 10) {
        printf("Replace milk\n");
    } else {
        printf("Milk okay!\n");
    }

    printf("Done!\n");

Simplified with goto:

    if (milk_age > 48) goto milk_replace;
    if (milk_level < 10) goto milk_replace;

    printf("Milk okay!\n");
    goto milk_replace__end;

    milk_replace:
    printf("Replace milk\n");

    milk_replace__end:
    printf("Done!\n");

51 of 58

More complex conditionals: &&

    if (x >= 0 && x <= 100) {
        // in bounds
    } else {
        // out of bounds
    }

    return 0;

Invert the condition to use || (De Morgan’s Law)

    if (x < 0 || x > 100) {
        // out of bounds
    } else {
        // in bounds
    }

    return 0;

52 of 58

More complex conditionals: &&

    if (x < 0 || x > 100) {
        // out of bounds
    } else {
        // in bounds
    }

    return 0;

Split into separate conditionals:

    if (x < 0) goto x_out_of_bounds;
    if (x > 100) goto x_out_of_bounds;

    // in bounds
    goto epilogue;

    x_out_of_bounds:
    // out of bounds

    epilogue:
    return 0;

53 of 58

Simplifying loop structures

  • for loops should be broken down to while loops
  • while loops should be broken down into if/goto

General structure:

  • loop init
  • loop condition (do we need to exit the loop?)
  • loop body
  • loop step
  • loop end

Use labels to show structure!

54 of 58

Counting to 10

    for (int i = 0; i < 10; i++) {
        printf("%d\n", i);
    }

As a while loop:

    int i = 0;
    while (i < 10) {
        printf("%d\n", i);
        i++;
    }

55 of 58

Counting to 10

    int i = 0;
    while (i < 10) {
        printf("%d\n", i);
        i++;
    }

Simplified with if/goto:

    loop_i_to_10__init:;
    int i = 0;

    loop_i_to_10__cond:
    if (i >= 10) goto loop_i_to_10__end;

    loop_i_to_10__body:
    printf("%d", i);
    putchar('\n');

    loop_i_to_10__step:
    i++;
    goto loop_i_to_10__cond;

    loop_i_to_10__end:
    // ...

56 of 58

Simplifying for loops:

sum_100_squares

57 of 58

Sidenote: C break/continue

break can be used in a loop to completely exit the loop.

The loop condition here makes this look like an infinite loop:

    while (1) {
        int c = getchar();
        if (c == EOF) break;
    }

but break means it’s possible for the loop to be exited.

In simplified C/MIPS, a break is really just equivalent to going to the loop’s end label.

Avoid writing C code with break where possible.

58 of 58

Sidenote: C break/continue

continue can be used to proceed to the next iteration of a for loop.

This would be a (terrible) way to print even numbers:

    for (int i = 0; i < 10; i++) {
        if (i % 2 != 0) continue;
        printf("%d\n", i);
    }

In simplified C/MIPS, a continue is really just equivalent to going to the loop’s step label.

Avoid writing C code with continue where possible.