Unit-1

Digital Computers and Arithmetic: Historical perspective and von Neumann computers, Fixed and floating-point representation of numbers, Addition and Subtraction, Multiplication and Division algorithms, Floating-point arithmetic operations.

What you’ll learn

How is data represented in a computer?
What are fixed-point and floating-point number representations?
How are signed numbers represented?
What is the IEEE 754 standard?

krraju.in

Digital Computers and Numbers

Numbers can be represented in multiple number base systems.

Decimal (base 10) system uses digits 0 to 9.
Computers use the binary (base 2) system because their digital components operate in two states, on and off. The basic unit of information is the bit (binary digit), which is either 0 or 1.

Binary Coded Decimal: each decimal digit is represented by four binary bits, 0000 to 1001.

In computing, hexadecimal (base 16) or octal (base 8) number systems are used to represent binary numbers in a compact form.

Octal: 0-7 (Binary Coded Octal 000 to 111)
Hexadecimal: 0-9,A-F (Binary Coded Hexadecimal 0000 to 1111)
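
As a small illustration of these number systems, the sketch below prints the decimal value 14 in binary, octal, hexadecimal and BCD. The helper name `to_bcd` is made up for this example; it simply encodes each decimal digit as a 4-bit group.

```python
def to_bcd(n: int) -> str:
    """Encode each decimal digit of a non-negative integer as a 4-bit group."""
    return " ".join(format(int(digit), "04b") for digit in str(n))


n = 14
binary = format(n, "b")    # base 2
octal = format(n, "o")     # base 8
hexa = format(n, "X")      # base 16
bcd = to_bcd(n)            # one 4-bit group per decimal digit

print(binary)  # 1110
print(octal)   # 16
print(hexa)    # E
print(bcd)     # 0001 0100
```

Note that BCD needs eight bits for 14, while pure binary needs only four.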

Representation of Numbers

Number
    Fixed-point
        Unsigned: (14) 0000 1110
        Signed
            Signed-magnitude: (+14) 0000 1110, (-14) 1000 1110
            Signed-1’s complement: (+14) 0000 1110, (-14) 1111 0001
            Signed-2’s complement: (+14) 0000 1110, (-14) 1111 0010
    Floating point
        Mantissa (fraction): 01001110, i.e. (.1001110)
        Exponent: 000100

Fixed-point Representation

For integers, the radix point is fixed and assumed to be to the right of the rightmost digit.

As the radix point is fixed, the number system is referred to as fixed point number system.

Advantage

Consumes fewer computing resources, and arithmetic operations are easy to perform

Disadvantage

Relatively limited range of values

Fixed-point representation is convenient for representing numbers with bounded orders of magnitude.
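
The trade-off above can be sketched with an 8-bit unsigned word that reserves 4 bits for the fraction. The word length, the split and the names `to_fixed` and `from_fixed` are assumptions chosen for illustration only. Arithmetic is plain integer arithmetic, but the largest representable value is only 15.9375.

```python
WORD_BITS = 8   # assumed word length
FRAC_BITS = 4   # assumed number of bits to the right of the binary point


def to_fixed(x: float) -> int:
    """Scale x by 2^FRAC_BITS and store it as an unsigned integer."""
    raw = round(x * (1 << FRAC_BITS))
    if not 0 <= raw < (1 << WORD_BITS):
        raise OverflowError(f"{x} is outside the representable range")
    return raw


def from_fixed(raw: int) -> float:
    """Undo the scaling to recover the real value."""
    return raw / (1 << FRAC_BITS)


a = to_fixed(2.5)     # bit pattern 0010 1000
b = to_fixed(1.25)    # bit pattern 0001 0100
total = a + b         # ordinary integer addition, the point does not move

print(format(a, "08b"), format(b, "08b"))  # 00101000 00010100
print(from_fixed(total))                   # 3.75
print(from_fixed((1 << WORD_BITS) - 1))    # 15.9375, the largest value
```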

Unsigned Numbers

Fixed-point (integer) numbers are represented in signed and unsigned forms.

Unsigned representation

Used to represent non-negative numbers (positive numbers and zero).
Has an implied binary point between the integer and fraction bits, analogous to the decimal point.

Signed Numbers

Signed representation

The sign bit (0 for +ve and 1 for -ve) is placed in the leftmost position of the number

Positive number: Sign bit is 0 and the remaining bits hold the magnitude

Negative number: Sign bit is 1 and the rest of the number is represented in one of the following ways

Signed-magnitude: The magnitude of the number, with the sign bit set to 1
Signed-1’s complement: 1’s complement of its positive value
Signed-2’s complement: 2’s complement of its positive value

There is only one way to represent a +ve number, but three different ways to represent a -ve number

Signed Magnitude Representation

For an n-bit number the leftmost bit is the sign bit (0 for a +ve and 1 for a -ve) and the remaining n-1 bits represent the magnitude.

e.g., (8-bit words): +14 = 0 000 1110 and -14 = 1 000 1110
Range: -(2^(n-1) - 1) to +(2^(n-1) - 1)

Easy to obtain the magnitude and the negative of a number.
Problems:

The sign must be considered during arithmetic operations

It makes arithmetic difficult because the binary sum of a number and its sign-magnitude negative is not zero.

The dual representation of zero (-0 and +0)
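
A minimal sketch of 8-bit sign-magnitude encoding, showing why plain binary addition does not work on this form. The function names `to_sign_magnitude` and `bits` are illustrative, not taken from any library.

```python
def to_sign_magnitude(x: int, n: int = 8) -> int:
    """Return the n-bit sign-magnitude pattern of x as an unsigned integer."""
    limit = (1 << (n - 1)) - 1          # largest magnitude: 2^(n-1) - 1
    if abs(x) > limit:
        raise OverflowError(f"{x} does not fit in {n}-bit sign-magnitude")
    sign = 1 if x < 0 else 0
    return (sign << (n - 1)) | abs(x)


def bits(pattern: int, n: int = 8) -> str:
    """Format a pattern as an n-character binary string."""
    return format(pattern, f"0{n}b")


plus = to_sign_magnitude(14)
minus = to_sign_magnitude(-14)
naive_sum = (plus + minus) & 0xFF       # plain binary addition, carry dropped

print(bits(plus))       # 00001110
print(bits(minus))      # 10001110
print(bits(naive_sum))  # 10011100, not zero
```

Adding the two patterns bit by bit gives 1001 1100 instead of zero, so the hardware has to inspect the signs before deciding whether to add or subtract the magnitudes.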

One’s Complement Representation

Binary case of the diminished radix complement (like the 9’s complement for base 10 numbers).
Negative numbers are represented by bit-by-bit complementation of the (positive) magnitude (the process of negation).

e.g., (8-bit words): +14 = 0 000 1110 and -14 = 1 111 0001

Magnitude can be obtained by

keeping the bits unchanged (for positive numbers, MSB = 0)
complementing every bit (for negative numbers, MSB = 1)

Still have a dual representation for zero (all zeros and all ones)
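
The negation and magnitude rules can be sketched as follows for 8-bit words. The names `ones_complement` and `magnitude` are hypothetical helpers written for this example.

```python
def ones_complement(x: int, n: int = 8) -> int:
    """Return the n-bit 1's complement pattern of x as an unsigned integer."""
    mask = (1 << n) - 1
    if abs(x) > (1 << (n - 1)) - 1:
        raise OverflowError(f"{x} does not fit in {n}-bit 1's complement")
    # Positive values are stored as is; negative values flip every bit.
    return x if x >= 0 else (~(-x)) & mask


def magnitude(pattern: int, n: int = 8) -> int:
    """Recover the magnitude: complement the pattern only if the MSB is 1."""
    mask = (1 << n) - 1
    return (~pattern) & mask if pattern >> (n - 1) else pattern


neg14 = ones_complement(-14)
print(format(neg14, "08b"))        # 11110001
print(magnitude(neg14))            # 14
print(magnitude(0b00000000))       # 0  (positive zero)
print(magnitude(0b11111111))       # 0  (negative zero)
```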

Two’s Complement Representation

Given the representation for +X, the representation for –X is found by taking the 1s complement of +X and adding 1

e.g., (8-bit words): +14 = 0 000 1110 and -14 = 1 111 0010
Negating a number and finding its magnitude take an extra step (complement, then add 1)
Converting between two word lengths

e.g., converting an 8-bit number into a 16-bit format requires sign extension: the sign bit is copied into every added bit position, so all bits in the extension take the value of the old sign bit.

+14 = 0 000 1110 (8 bit) and 0 000 0000 0000 1110 (16 bit)

-14 = 1 111 0010 (8 bit) and 1 111 1111 1111 0010 (16 bit)
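
Both steps, negation by "complement and add 1" and sign extension from 8 to 16 bits, can be sketched as below. The function names `negate` and `sign_extend` are illustrative only.

```python
def negate(pattern: int, n: int = 8) -> int:
    """2's complement negation: take the 1's complement, then add 1."""
    mask = (1 << n) - 1
    return ((~pattern & mask) + 1) & mask


def sign_extend(pattern: int, old: int = 8, new: int = 16) -> int:
    """Copy the old sign bit into every added high-order bit position."""
    sign = (pattern >> (old - 1)) & 1
    if sign:
        extension = ((1 << new) - 1) & ~((1 << old) - 1)  # ones above the old word
        pattern |= extension
    return pattern


plus14 = 0b00001110
minus14 = negate(plus14)

print(format(minus14, "08b"))               # 11110010
print(format(sign_extend(plus14), "016b"))  # 0000000000001110
print(format(sign_extend(minus14), "016b")) # 1111111111110010
print(format(negate(minus14), "08b"))       # 00001110, negating twice restores +14
```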

An 8-Bit Number Representation

Number	Representation	Example
Sign-Magnitude
x = 0	0 or 128	+0 (0000 0000), -0 (1000 0000)
0 < x ≤ 127	x	77 (0100 1101)
-127 ≤ x < 0	128 + |x|	-56 (1011 1000)
1’s Complement
x = 0	0 or 255	+0 (0000 0000), -0 (1111 1111)
0 < x ≤ 127	x	77 (0100 1101)
-127 ≤ x < 0	255 - |x|	-56 (1100 0111)
2’s Complement
x = 0	0	0 (0000 0000)
0 < x ≤ 127	x	77 (0100 1101)
-128 ≤ x < 0	256 - |x|	-56 (1100 1000)
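
Read in the other direction, the table says that one 8-bit pattern stands for different values depending on the representation. The sketch below inverts the three formulas (128 + |x|, 255 - |x|, 256 - |x|); the function name `decode` and the dictionary keys are chosen for this example.

```python
def decode(pattern: int, n: int = 8) -> dict:
    """Interpret an n-bit pattern under the three signed representations."""
    negative = pattern >> (n - 1)        # MSB is the sign bit
    half = 1 << (n - 1)                  # 128 for n = 8
    full = 1 << n                        # 256 for n = 8
    if not negative:
        return {"sign-magnitude": pattern,
                "ones-complement": pattern,
                "twos-complement": pattern}
    return {"sign-magnitude": -(pattern - half),
            "ones-complement": -((full - 1) - pattern),
            "twos-complement": -(full - pattern)}


result = decode(0b11001000)
print(result)
# {'sign-magnitude': -72, 'ones-complement': -55, 'twos-complement': -56}
```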

Comparison of Complement Systems

Operation	Carry from the MSB	1’s Complement: then perform	2’s Complement: then perform	Sign bit of the result
Add	0	(Result is in complement form); complement to convert to sign magnitude form	(Result is in complement form); complement to convert to sign magnitude form	1
Add	1	(Result is in sign magnitude form); add 1 to the LSB of the result	(Result is in sign magnitude form); neglect the carry	0
Left shift	-	Copy sign bit into the LSB	Insert 0 into the LSB	Sign bit = MSB of magnitude
Right shift	-	Copy sign bit into the MSB of the magnitude	Copy sign bit into the MSB of the magnitude	Sign bit unchanged
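
The two Add rows can be sketched for 8-bit words as follows. This is an illustration under the assumption that the operands are already encoded; `add_ones` and `add_twos` are made-up names. In 1's complement a carry out of the MSB is added back into the LSB (end-around carry), while in 2's complement it is simply discarded.

```python
MASK = 0xFF  # 8-bit word


def add_ones(a: int, b: int) -> int:
    """1's complement addition with end-around carry."""
    total = a + b
    if total > MASK:                    # carry out of the MSB
        total = (total & MASK) + 1      # add the carry to the LSB
    return total & MASK


def add_twos(a: int, b: int) -> int:
    """2's complement addition: the carry out of the MSB is neglected."""
    return (a + b) & MASK


# 14 + (-5): carry is 1, the result is already in sign magnitude form (+9)
r1 = add_ones(0b00001110, 0b11111010)
r2 = add_twos(0b00001110, 0b11111011)
print(format(r1, "08b"), format(r2, "08b"))  # 00001001 00001001

# 5 + (-14): carry is 0, the result is in complement form (-9)
r3 = add_ones(0b00000101, 0b11110001)
r4 = add_twos(0b00000101, 0b11110010)
print(format(r3, "08b"), format(r4, "08b"))  # 11110110 11110111
```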

Which Representation is Better?

Design considerations

Speed of arithmetic (addition, multiplication)
Speed of conversion
Range of values that can be represented

Two’s complement is the most common method of representing signed integers on computers, and more generally, fixed point binary values.

Floating-point Representation

Real number representation format that does not fix the number of digits before and after the binary (decimal) point.

Represents a wide range of numbers, from very small to very large values, which is useful for scientific and engineering applications.

During computation the position of the binary point floats, i.e., it is adjusted automatically, hence the name floating point representation.

This representation is not exact: most real numbers must be rounded to the nearest representable value, which causes rounding errors.

The spacing between adjacent representable values grows with the magnitude of the number, so absolute precision decreases for large values, which can lead to loss of accuracy.
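
Both effects can be observed with Python's built-in `float`, which is assumed here to be an IEEE 754 double (true on common platforms). `math.ulp` requires Python 3.9 or later and returns the gap between a value and the next larger representable value.

```python
import math

# Rounding error: 0.1, 0.2 and 0.3 have no exact binary representation.
exact = (0.1 + 0.2 == 0.3)
close = math.isclose(0.1 + 0.2, 0.3)
print(exact, close)                 # False True

# Spacing between neighbouring values grows with magnitude.
gap_small = math.ulp(1.0)
gap_large = math.ulp(1e16)
print(gap_small)                    # 2.220446049250313e-16
print(gap_large)                    # 2.0

# With a gap of 2.0, adding 1.0 to 1e16 is lost in rounding.
absorbed = (1e16 + 1.0 == 1e16)
print(absorbed)                     # True
```

This is why floating-point results are usually compared with a tolerance rather than with `==`.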

Floating-point Numbers

Floating point numbers have a sign, a mantissa (m) or significand (S), a radix (r) or base (b), and an exponent (e)

Analogous to scientific notation.
The number of integer and fractional bits is not restricted to a constant.

The floating point representation of a number has two parts

First part: Signed fixed point number called mantissa.
Second part: Designates the position of the binary (decimal) point and is called the exponent.
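
As a worked example, the mantissa 01001110 and exponent 000100 can be decoded as the fraction .1001110 times 2^4. The sketch assumes that each field is a sign bit followed by magnitude bits and that the base is 2; the function name `decode_float` is hypothetical.

```python
def decode_float(mantissa_bits: str, exponent_bits: str) -> float:
    """Decode a (mantissa, exponent) pair.

    Assumed layout: each field is a sign bit followed by magnitude bits,
    and the binary point of the mantissa lies right after its sign bit.
    """
    m_sign = -1 if mantissa_bits[0] == "1" else 1
    fraction = int(mantissa_bits[1:], 2) / (1 << (len(mantissa_bits) - 1))
    e_sign = -1 if exponent_bits[0] == "1" else 1
    exponent = e_sign * int(exponent_bits[1:], 2)
    return m_sign * fraction * 2.0 ** exponent


value = decode_float("01001110", "000100")
print(value)  # 9.75, i.e. binary 1001.11
```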

Expressible Numbers

Overflow and underflow happen when a value is out of the expressible range. If the magnitude of the value is too large to be represented, it is overflow; if the magnitude is too small (too close to zero), it is underflow.
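
Both conditions can be seen with Python's `float`, assumed here to be an IEEE 754 double. In this sketch overflow produces infinity and underflow produces zero; other systems may signal an error instead.

```python
import sys

largest = sys.float_info.max          # largest finite double
smallest_normal = sys.float_info.min  # smallest positive normalized double

overflowed = largest * 10             # too large in magnitude
underflowed = 5e-324 / 2              # too close to zero to be represented

print(largest)          # 1.7976931348623157e+308
print(smallest_normal)  # 2.2250738585072014e-308
print(overflowed)       # inf
print(underflowed)      # 0.0
```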

Biased Representation

An integer representation that shifts the bit patterns so that they look just like unsigned numbers but can also represent negative values.

Exponents are signed values in order to represent both tiny and huge values.
Storing them in two’s complement, the usual representation for signed values, would make comparison harder.

A biased exponent makes bitwise comparison of two floating-point numbers easier, because a larger exponent always has a larger unsigned bit pattern.
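
The comparison advantage can be checked on IEEE 754 single-precision values, used here as an assumed concrete format. For positive numbers, sorting by the raw 32-bit pattern (treated as an unsigned integer) gives the same order as sorting by value. `float32_bits` is a helper written for this example.

```python
import struct


def float32_bits(x: float) -> int:
    """Return the 32-bit single-precision pattern of x as an unsigned integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


values = [0.001, 0.5, 1.0, 3.0, 1e10]          # increasing positive values
patterns = [float32_bits(v) for v in values]

same_order = (patterns == sorted(patterns))
print(same_order)                              # True
print(float32_bits(0.5) < float32_bits(1.0))   # True, compared as integers
```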

Biased Exponent

A constant value (called bias) is added to the true exponent. This allows the exponent to be represented as an unsigned integer.

For IEEE single-precision floats, this value is 127.

An exponent of zero means that 127 is stored in the exponent field.

Biased exponents are used in computer hardware and software to represent floating-point numbers efficiently and accurately.
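
The bias of 127 can be verified by pulling the 8-bit exponent field out of a single-precision number. The helper name `exponent_field` is made up for this sketch; the field position (bits 23 to 30) follows the IEEE 754 single-precision layout.

```python
import struct

BIAS = 127  # bias of IEEE 754 single precision


def exponent_field(x: float) -> int:
    """Return the stored (biased) 8-bit exponent of x in single precision."""
    pattern = struct.unpack(">I", struct.pack(">f", x))[0]
    return (pattern >> 23) & 0xFF


for x in (1.0, 8.0, 0.25):
    stored = exponent_field(x)
    print(x, stored, stored - BIAS)
# 1.0 127 0
# 8.0 130 3
# 0.25 125 -2
```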

Implied Base

The base is not represented in the format.

The base is assumed to be 2.
In some systems the base is assumed to be 4 or 16 to increase the range of numbers that can be represented.

By increasing the implied base or the number of bits in the exponent field we can increase the range of numbers that can be represented.

Density of Floating-point numbers

Normalization

A number in scientific notation with no leading 0s is called a normalised number (e.g., 1.0 × 10^-8 or 1.0 × 2^-3; non-normalised form: 10.0 × 10^-9)

Simplifies the exchange of data
Simplifies the arithmetic algorithms to know that the numbers will always be in this form
Increases the accuracy of numbers that can be stored in a word

Maximum possible number of significant digits.

Unnecessary leading 0 is replaced by another significant digit to the right of the decimal (binary) point

Normalized Number

Floating point numbers are usually normalized. The exponent is adjusted so that the leading bit (MSB) of the mantissa is 1

±0.1bbb…b×2^±e

Where b is either 0 or 1
Leftmost bit of mantissa is always 1

Since this bit is always 1, it need not be stored; it is implicit.

Zero cannot be normalized; it is represented by all 0’s in the mantissa and exponent

By increasing the number of bits in the mantissa field we can increase the precision of a number.

Roughly doubling the mantissa bits gives double precision numbers (IEEE 754 uses 23 fraction bits for single precision and 52 for double precision)
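
Python's `math.frexp` returns a number in exactly this ±0.1bbb…b × 2^e form: a mantissa whose magnitude lies in [0.5, 1) and an integer exponent. The wrapper `normalize` and its 7-bit fraction width are assumptions made for display purposes; extra fraction bits are simply truncated.

```python
import math


def normalize(x: float, frac_bits: int = 7):
    """Split x into (mantissa, exponent, fraction bit string)."""
    mantissa, exponent = math.frexp(x)   # x == mantissa * 2**exponent
    scaled = int(abs(mantissa) * (1 << frac_bits))  # keep frac_bits bits
    return mantissa, exponent, format(scaled, f"0{frac_bits}b")


m, e, fraction = normalize(9.75)
print(m, e, fraction)       # 0.609375 4 1001110

zero = math.frexp(0.0)
print(zero)                 # (0.0, 0): zero has no leading 1 to normalize
```

For 9.75 the fraction always starts with 1, which is the bit a format with an implicit leading bit would not store.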

Normalization Example: Exponentiation to the base 2

Structured Computer Organization by A.S.Tanenbaum, 6th ed. P.677

Normalization Example: Exponentiation to the base 16

Structured Computer Organization by A.S.Tanenbaum, 6th ed. P.677

Precision and Range

In floating point numbers, the mantissa holds the significant digits of the number. The mantissa, together with the exponent, determines the precision and range.

Increasing the mantissa bits will increase the precision of the number.
Increasing the exponent bits will increase the range of the number.

Precision extension means converting a floating point number from one precision to another, from a smaller number of bits to a larger number of bits, while maintaining the same value of the number.
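
The effect of mantissa and exponent width can be sketched by comparing IEEE 754 single and double precision, used here as assumed concrete formats. The helper `to_float32` rounds a Python float (a double) to single precision and widens it back.

```python
import struct


def to_float32(x: float) -> float:
    """Round x to single precision and return the result as a double."""
    return struct.unpack(">f", struct.pack(">f", x))[0]


single = to_float32(0.1)
print(single)                        # 0.10000000149011612
print(single == 0.1)                 # False: fewer mantissa bits, less precision
print(to_float32(single) == single)  # True: widening kept the value unchanged

# Fewer exponent bits mean a smaller range: 1e39 fits a double but not a single.
try:
    struct.pack(">f", 1e39)
    fits = True
except OverflowError:
    fits = False
print(fits)                          # False
```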

Floating Number Systems

Field	Proposed standard	IBM System 360, 370	Burroughs B-5500	Control Data 6000, 7000 series
Bits used	0-31	0-31	1-47	0-59
Radix	2	16	8	8
Radix point	Before the first bit (with assumed 1 to left)	Before the first digit	After the last digit	After the last digit
Mantissa sign position	0	0	1	0
Mantissa value position	9-31	8-31	9-47	12-59
Mantissa representation	Sign magnitude, fractional, normalized with most significant bit assumed	Sign magnitude	Sign magnitude	One’s complement of the entire word
Exponent sign position	-	1	2	1
Exponent value position	1-8	1-7	3-8	1-11
Exponent representation	Value +127 (a non zero number must have a nonzero representation)	Value + 64	Sign magnitude	Value +1024 if >=0; value +1023 if <0
Exponent range	-126 to 127	-64 to +63	-63 to +63	-1023 to +1023

(Introduction to Computer Architecture by H.S.Stone, 2nd ed. 1988, p.76)
