A Variable is a Named Memory Cell
What is a variable in computer programming? How do computers remember and think when they're just a bunch of electric currents on a chip? How are numbers and words stored in the memory cells of a computer? In this chapter, we'll answer these questions and look at computing from the bottom up.
Electric Currents
A computer is really just electric currents on a chip. The random access memory (RAM) in the computer is just a bunch of electric currents. But we don't want to actually twiddle the electric currents when we program. Instead, levels of abstraction allow us to program in high-level languages.
The first abstraction is: if electricity is running through a circuit, it is a 1. If no electricity is running through it, we call that a 0. Everything-- operating systems, programs, videos-- is built from sequences of these 0s and 1s. This is why binary numbers and powers of 2 (e.g., 64) are fundamental to computing.
The state of each such circuit is called a bit. A bit holds either a 0 or a 1. These 0s and 1s form the basis of binary arithmetic, which we'll discuss in another chapter. In a nutshell, all data is really just a sequence of bits, e.g., 00011010.
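As a small illustration of reading a bit sequence as a number, the following Python sketch interprets the sample sequence 00011010 as an unsigned binary value and converts it back. The variable names are chosen only for this example.

```python
# Interpret the bit sequence from the text as an unsigned binary number.
bits = "00011010"
value = int(bits, 2)            # 16 + 8 + 2
print(value)                    # 26

# Going the other way: format the integer as an 8-bit pattern.
pattern = format(value, "08b")
print(pattern)                  # 00011010

# n bits can represent 2**n distinct patterns, hence the powers of 2.
patterns_in_6_bits = 2 ** 6
print(patterns_in_6_bits)       # 64
```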
Now humans aren't so great with 0s and 1s-- not much would get done if we had to communicate at this level. Much of the history of computer science has involved abstraction-- hiding the details of the 0/1 machine, and allowing humans to talk to a computer in something closer to natural language.
High-Level Programming Languages
Java, Python, C and Ruby are examples of high-level programming languages. High-level programming languages allow people to use symbolic names instead of just 0s and 1s.
Translators-- compilers and interpreters-- do the dirty work of converting high-level commands into the bits that the computer understands.
A translator is a computer program itself, with the special job of facilitating the execution of other computer programs. When new hardware is created, translators are among the first programs developed after an operating system. Because C is a relatively low-level language, a C compiler is often the first translator developed.
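To get a feel for translation, Python's built-in compile function can be asked to translate one line of source text. This is only a sketch: the standard Python interpreter (CPython) produces bytecode for its own virtual machine rather than raw hardware instructions, and the file name "<sketch>" is just a placeholder label.

```python
# Translate one high-level statement into a lower-level form.
source = "principal = 345"
code = compile(source, "<sketch>", "exec")

# The translated instructions are a sequence of bytes.
print(type(code.co_code))       # <class 'bytes'>

# The symbolic name used in the source is recorded alongside them.
print(code.co_names)            # ('principal',)

# Running the translated code performs the assignment.
namespace = {}
exec(code, namespace)
print(namespace["principal"])   # 345
```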
A Variable is a Named Memory Cell
A computer’s memory is a contiguous sequence of slots, or memory cells. Most memory systems are byte-addressable: a numeric address is assigned to each 8-bit slot. So address 0 refers to bits 0-7, address 1 refers to bits 8-15, and address 50 refers to bits 400-407. All types of data-- whole numbers, floating point numbers, strings of characters, boolean values-- can be stored in memory cells. Each data type is allocated some number of bytes for storage.
One of the key abstractions that a programming language provides is symbolic names for memory cells. Instead of code such as:
mov 345, 8762
which moves the value 345 into memory cell 8762, a programmer can use variables to represent memory cells. The following high-level code assigns a value to a variable 'principal':
principal = 345
'principal' is really just a name for some memory cell, e.g., 8762, but as programmers we don't really care where in memory the data is stored, only that we can get to it with the name 'principal'. We call it a variable because the value in the cell can change. The above command is called an assignment, and is read 'the variable principal is assigned 345'.
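The idea of a name standing for an address can be sketched in Python with a bytearray playing the role of byte-addressable memory. Everything here is an assumption made for illustration: the memory size, the helper functions bit_range, store and load, the 4-byte cell width, and the choice of address 8762. A real language system manages this mapping internally.

```python
MEMORY_SIZE = 10000
memory = bytearray(MEMORY_SIZE)        # each element is one 8-bit cell

def bit_range(address):
    """Return the first and last bit numbers covered by a byte address."""
    first = address * 8
    return first, first + 7

# Hypothetical mapping from variable name to address.
symbol_table = {"principal": 8762}

def store(name, value, size=4):
    """Write an integer into the cells that the name refers to."""
    address = symbol_table[name]
    memory[address:address + size] = value.to_bytes(size, "big")

def load(name, size=4):
    """Read the integer back from the cells that the name refers to."""
    address = symbol_table[name]
    return int.from_bytes(memory[address:address + size], "big")

store("principal", 345)                # like: mov 345, 8762
print(bit_range(50))                   # (400, 407)
print(load("principal"))               # 345
```

The caller only ever writes 'principal'; the address appears in one place, the symbol table.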
Data Types
High-level languages also allow programmers to work with data types such as integers, floating-point numbers, and symbolic characters, as opposed to 0s and 1s. When a programmer refers to a whole number (integer) as in the principal example, the system sets up a 32-bit memory cell to store the number. Of course the number is really just 0s and 1s (well, really just electric currents). But those bits are interpreted as representing a whole number. Note that the same bit sequence could be interpreted as an integer, floating point number, string of symbolic characters, or another data type.
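The point that one bit sequence can be read in several ways can be sketched with Python's standard struct module. The four byte values below are arbitrary, chosen only because they also happen to be printable ASCII characters.

```python
import struct

# Four bytes (32 bits) of raw data.
raw = bytes([0x41, 0x42, 0x43, 0x44])

# Read as a 32-bit big-endian integer.
as_integer = struct.unpack(">i", raw)[0]
print(as_integer)               # 1094861636

# Read as a 32-bit big-endian floating point number (roughly 12.14).
as_float = struct.unpack(">f", raw)[0]
print(as_float)

# Read as four one-byte ASCII characters.
as_text = raw.decode("ascii")
print(as_text)                  # ABCD
```

The bits never change; only the interpretation applied to them does.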
Here are the most common data types and the number of bytes used to represent them:
- A character is a symbol on the keyboard. A character is often stored in 2 bytes (for example, in Java), though plain ASCII text needs only 1 byte per character. ASCII and Unicode are standards for how symbols are mapped to numbers.
- A string is a sequence of characters, e.g., 'cat'. 2 bytes are stored for each character in the string, along with an end-of-string character (0).
- An integer is a whole number. An integer is generally stored in 4 bytes, or 32 bits.
- A floating point number is a number with a decimal point, e.g., 34.56. A floating point number is stored in 8 bytes.
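The sizes in this list can be checked with a short Python sketch. It uses the struct module's fixed-width formats and a 2-byte text encoding (UTF-16) as stand-ins for the layouts described above. This is an assumption for illustration: Python's own int and str objects are stored differently and carry extra bookkeeping.

```python
import struct

# Fixed-width layouts matching the list above.
int_size = struct.calcsize("<i")               # 32-bit integer
float_size = struct.calcsize("<d")             # 64-bit floating point number
char_size = len("c".encode("utf-16-le"))       # one 2-byte character

# 'cat' at 2 bytes per character, plus a 2-byte end-of-string marker.
string_size = len("cat".encode("utf-16-le")) + char_size

print(int_size, float_size, char_size, string_size)   # 4 8 2 8
```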
Assignment Statements
A common program statement is an assignment statement like the following:
principal=10000
This means "put the number 10000 in the memory cell named 'principal'". Beneath the hood, the system keeps track of a mapping between variable name and memory cell. For instance, 'principal' might be mapped to the address 8342. Then the assignment statement above would mean:
put 10000 in memory cell 8342
Though it is important to understand what is really going on, with high-level languages the programmer need not be concerned with the actual address of a variable.
The = in an assignment statement does not mean equals. It means 'is assigned'. We say 'the variable principal is assigned 10000'. A statement like:
principal=principal + 1
is one that can mystify beginning programmers. It means: 'the memory cell principal is assigned its current value plus one.'
The expression on the right-hand side of the assignment operator ‘=’ is evaluated first, and the result is then placed in the memory cell denoted by the left-side variable. In our sample, we first evaluate the right-hand side, 'principal + 1'. Part of evaluating is fetching the value of variables. In this case, the current value of principal is 10000. So we evaluate the right-hand side expression as 10000+1=10001. This value is then placed back into the variable principal.
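The two steps described above can be written out separately in Python. The intermediate name right_hand_value is introduced only for this sketch; in the one-line form the interpreter holds that value internally.

```python
principal = 10000

# Step 1: evaluate the right-hand side, fetching the current value of principal.
right_hand_value = principal + 1

# Step 2: place the result in the cell named on the left-hand side.
principal = right_hand_value

print(principal)   # 10001
```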
Here's a simplified example of what happens when a program is executed:
Program Code
principal=10000
interestRate = 0.2
oneYearReturn=principal*interestRate
Memory
| address | value |
| 0 | 10000 |
| 4 | 0.2 |
| 12 | 2000 |
Symbol Table
| name | address |
| principal | 0 |
| interestRate | 4 |
| oneYearReturn | 12 |
The program puts data in contiguous memory. So the variable 'principal' is at address 0 and it takes 4 bytes because it is an integer. 'interestRate' is placed at address 4 and takes 8 bytes because it is a floating point number. This places oneYearReturn at address 12.
Note that the programmer who wrote the interest rate program would never see the symbol table--it is a data structure of the Python interpreter, which handles the mapping between symbolic variable names and actual memory addresses.
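The trace above can be reproduced with a toy allocator. This is a sketch of the chapter's simplified model, not of how the real Python interpreter manages memory. The class name ToyMachine, its assign method, and the size table are all hypothetical, with sizes taken from the data type list earlier in the chapter.

```python
class ToyMachine:
    """Simplified model: variables are laid out contiguously from address 0."""

    SIZES = {int: 4, float: 8}   # bytes per data type, as in the chapter

    def __init__(self):
        self.symbol_table = {}   # name -> address
        self.memory = {}         # address -> value
        self.next_free = 0       # next unused address

    def assign(self, name, value):
        # A new variable gets the next free address.
        if name not in self.symbol_table:
            self.symbol_table[name] = self.next_free
            self.next_free += self.SIZES[type(value)]
        self.memory[self.symbol_table[name]] = value

    def fetch(self, name):
        return self.memory[self.symbol_table[name]]


machine = ToyMachine()
machine.assign("principal", 10000)
machine.assign("interestRate", 0.2)
machine.assign("oneYearReturn",
               machine.fetch("principal") * machine.fetch("interestRate"))

print(machine.symbol_table)
# {'principal': 0, 'interestRate': 4, 'oneYearReturn': 12}
```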
Problems
1. Using the 'trace' shown above as an example, show the main memory and symbol table after execution of the following:
name="joe"
x= 92
y = x+77
2. The main memory shown is too simplistic. In a real program, variable storage would not begin at address 0. How is memory for real computer applications organized?