3 of 48

Motivation: Performance

09.09.2021

https://www.quora.com/What-is-the-best-way-to-declare-binary-matrices-in-C

Data Representation is crucial for data science, e.g.,

How to represent a binary matrix? (bits ⇒ ops bit-level)

4 of 48

Motivation: Security

09.09.2021

Similar to code found in FreeBSD’s implementation of getpeername
There are legions of smart people trying to find vulnerabilities in programs

expects type size_t (unsigned), �given an int. what happens? problem?

5 of 48

Outline

Two’s complement
Integer Arithmetic
Bit Manipulation

09.09.2021

we need a deeper understand of how data is represented.

6 of 48

Two issues

Finite representation

There is a limit to the number of integers (put differently, a max value) that can be represented on a fixed number of bytes

Representing positive and negative integers

Sign and values must be represented

More than one way to skin a cat.�What’s a sensible way?

· 6

09.09.2021

one where mathematical operations make sense!

7 of 48

Spoiler: Algebra: Ring (Finite Field) - Unsigned

Modular / Clock arithmetic.�https://stackoverflow.com/questions/7221409/is-unsigned-integer-subtraction-defined-behavior

photo: https://dncsite.wordpress.com/tag/twos-complement/

(9)

(10)

(11)

(12)

(13)

(14)

(15)

(8)

parentheses:�how we interpret �the bits.

8 of 48

Spoiler: Algebra: Ring (Finite Field) - Signed

Modular / Clock arithmetic.�https://stackoverflow.com/questions/7221409/is-unsigned-integer-subtraction-defined-behavior

photo: https://dncsite.wordpress.com/tag/twos-complement/

parentheses:�how we interpret �the bits.

why: because then, if you keep incrementing, you get larger numbers.

note: binary addition overflowed, but correct value got stored.

note: binary addition of 3 data-bits overflowed, thus overwriting the sign bit

9 of 48

Integer Types in C

Signed vs. unsigned

Number of bytes

1B: char
2B: short
4B: int
4B (32 bits) or 8B (64 bits): long
8B: long long

· 9

09.09.2021

10 of 48

Sequences of bits

Shorts in C are coded on 2B. Used for examples in these slides.

2B = 16 bits,

or 4 Hexadecimals (prefixed with Ox)

�Operations on sequences of bits:

bitwise operations: and (&), or (|), not (~), xor (^), shift (<<, >>)
(interpreted as Booleans) 0 is false; anything but 0 is true�logical operations: and (&&), or (||), not (!)
(interpreted as integers) �arithmetic operations: +, -, *, /

· 10

09.09.2021

see book on dec-bin, bin-dec,

dec-hex, hex-dec,�bin-hex, hex-bin

or:�onlinetoolz.net

hex: compact representation of bits.

11 of 48

Bitwise Operations: Examples

0x0000+1 = 0x0001; 0x000E+1 = 0x000F; �0x000F+1 = 0x0010
1 << 15 = 0x8000; 1 << 8 = 0x0100; �1 << 3 = 0x0008
0xEFFF+1 = 0xF000; 0x24BC+1 = 0x24BD
~0x0000 = 0xFFFF; ~0x0001 = 0xFFFE
X+~X = 0xFFFF ; X&~X = 0

· 11

09.09.2021

Bitwise on cos

https://github.com/mellowcandle/bitwise

12 of 48

Signed vs. Unsigned Interpretation

· 12

09.09.2021

Unsigned

Signed (two’s complement)

Sign

Bit

A sequence of bits is interpreted differently�depending on whether the integer type �is unsigned or signed.

0xA000

40960

-24576

https://onlinetoolz.net/unsigned-signed

same bits; different interpretations as numbers.

-x is not just

bits of x w/ a 1 in sign

how much each bit contributes to value

13 of 48

Identical Interpretation?

book has interesting results on �when bits have identical unsigned & signed interpretation.�( outside range, conversion either adds or subtracts 2^w )

14 of 48

Integer Types in C

$ vi /usr/include/limits.h

· 14

09.09.2021

0x0000

0xFFFF

0x8000

0x7FFF

1 bit reserved for sign

there’s 1 more�neg than pos

15 of 48

Signed Integers

Two’s Complement. Representation on N bytes

Positive numbers:

Sign bit is 0
Binary representation of value on (8*N)-1 bits

Negative numbers:

Binary representation of corresponding positive value on 8*N bits
Invert all digits (0 becomes 1; 1 becomes 0)
Add one

· 15

09.09.2021

short int x = 15213;

short int y = -15213;

o.O why?

yup, that checks out. but why?

given 0s and 1s, �how do I interpret it?

16 of 48

Why this works?

Substracting from 00000000 or substracting from 100000000 is the same for all practical purpose (as the borrowing is carried forward beyond the 8th bit).

So, in 2’s complement -x is represented by

2⁸ – x = 100000000 - x = 11111111 + 1 - x = (11111111 - x) + 1

As we have seen (now with 8 bits):

~x + x = 11111111 => (11111111 - x) = ~x

In two’s complement on w bits

-x is represented as 2^w – x

=> ~x + 1

· 16

09.09.2021

EXAMPLE WITH 8 bits

recall: ~x is x with bits flipped

in other words: -x is�“what must I add to x to get 0?”�(answer: ~x + 1)

17 of 48

Two’s complement

What does 0xFFFFFFFF represent?

How about 0x80000000?

· 17

09.09.2021

18 of 48

Two’s complement

What does 0xFFFFFFFF represent?

How about 0x80000000?

· 18

09.09.2021

hint

19 of 48

Bit extension - adding a ring

(9)

(10)

(11)

(12)

(13)

(14)

(15)

(8)

(9)

(10)

(11)

(12)

(13)

(14)

(15)

(8)

1 in front

0 in front

20 of 48

Bit extension - adding a ring

(9)

(10)

(11)

(12)

(13)

(14)

(15)

(8)

0 in front

1 in front

…

why this way: storing �4-bit value in 5-bit space�⇒ old bits are unchanged!�(new 5th bit is just 0)

21 of 48

Sign extension - adding a ring (bit extension, signed)

1 in front

0 in front

22 of 48

Sign extension - adding a ring (bit extension, signed)

1 in front�(new neg)

0 in front�(new pos)

1 in front�(old neg)

0 in front�(old pos)

23 of 48

Sign extension - adding a ring (bit extension, signed)

…

new pos

new neg

even more positive, �even more negative

this helps explain�the following slide.�(analogy: � arithmetic right shift)

1 in front�(old neg)

0 in front�(old pos)

why this way: storing �4-bit value in 5-bit space�⇒ old bits are unchanged!�(5th bit is 0 if pos, 1 if neg)

24 of 48

Sign Extension

Make k copies of sign bit:

X ′ = x_w_–1,…, x_w_–1, x_w_–1, x_w_–2,…, x₀

· 24

09.09.2021

• • •

X ′

• • •

k copies of Most Significant Bit

Most Significant Bit

Expanding: Converting from smaller to larger integer data type

the overall value of negative numbers does not change.

25 of 48

Sign Extension

· 25

09.09.2021

C automatically performs sign extension

short int x = 15213;

int ix = (int) x;

short int y = -15213;

int iy = (int) y;

Decimal

Hex

Binary

15213

3B 6D

00111011 01101101

15213

00 00 3B 6D

00000000 00000000 00111011 01101101

-15213

C4 93

11000100 10010011

-15213

FF FF C4 93

11111111 11111111 11000100 10010011

easy in C!

(one of the nice properties of �two’s complement)

26 of 48

Sign Extension

Expanding (e.g., short int to int)

Unsigned: zeros added (on left)

Signed: sign extension (as shown on previous slides)

Both yield expected result

Truncating (e.g., unsigned to unsigned short)

Unsigned/signed: bits are truncated

Result reinterpreted

Unsigned: mod operation

Signed: depends on bit pattern (large negative number might be truncated to positive number)

· 26

09.09.2021

27 of 48

Why do we care?

In C type system,

If there is a mix of signed and unsigned values in an expression, then signed values are implicitly cast to unsigned:

The bit pattern is maintained
But re-interpreted!!
Can have unexpected effects => adding or subtracting 2^N

· 27

09.09.2021

28 of 48

Security, Revisited

09.09.2021

29 of 48

memcpy

· 29

09.09.2021

From GNU glibc manual /Appendix A – Language Features /Important Data Types:

30 of 48

Security - Woops

09.09.2021

-528 in two’s complement:

0xFFFFFDF0

Reinterpreted as unsigned within

memcpy: 4294966768 (decimal)

31 of 48

Why do we care?

· 31

09.09.2021

Always be mindful/careful

of unsigned integers!

32 of 48

Outline

Two’s complement
Integer Arithmetic
Bit Manipulation

09.09.2021

33 of 48

Integer Arithmetic

How to deal with finite representation?

Adding two integers encoded on w bytes

Should take w+1 bytes

How to encode the sum on w bytes?

No magic!! The sum will overflow

· 33

09.09.2021

story time (Java standard library)

34 of 48

Unsigned Addition

· 34

09.09.2021

• • •

u + v

• • •

True Sum: w+1 bits

Operands: w bits

Discard Carry: w bits

UAdd_w(u , v)

UAdd_w(u , v) is u+v mod 2^w

overflow

35 of 48

Signed Addition

· 35

09.09.2021

TAdd and UAdd have Identical Bit-Level Behavior

Signed vs. unsigned addition in C:

int s, t, u, v;

s = (int) ((unsigned) u + (unsigned) v);

t = u + v

Will give s == t

• • •

u + v

• • •

True Sum: w+1 bits

Operands: w bits

Discard Carry: w bits

TAdd_w(u , v)

overflow

36 of 48

Signed Addition

· 36

09.09.2021

Beware:�This is undefined�behavior in C!

37 of 48

Unsigned Multiplication in C

· 37

09.09.2021

Standard Multiplication Function

Ignores high order w bits

Implements Modular Arithmetic

UMult_w(u , v) = u · v mod 2^w

• • •

u · v

• • •

True Product: 2*w bits

Operands: w bits

Discard w bits: w bits

UMult_w(u , v)

• • •

38 of 48

· 38

09.09.2021

Signed Multiplication in C

Standard Multiplication Function

Ignores high order w bits

Some of which are different for signed vs. unsigned multiplication

Lower bits are the same

• • •

u · v

• • •

True Product: 2*w bits

Operands: w bits

Discard w bits: w bits

TMult_w(u , v)

• • •

39 of 48

Power-of-2 Multiply with Shift

· 39

09.09.2021

Operation

u << k gives u * 2^k

Both signed and unsigned

Examples

u << 3 == u * 8

u << 5 - u << 3 == u * 24

• • •

•••

2^k

u · 2^k

True Product: w+k bits

Operands: w bits

Discard k bits: w bits

UMult_w(u , 2^k)

•••

• • •

•••

TMult_w(u , 2^k)

•••

40 of 48

Outline

Two’s complement
Integer Arithmetic
Bit Manipulation

09.09.2021

41 of 48

Bit Representation

TCP Header

· 41

09.09.2021

Is Ack flag set?

Is any TCP flag set?

Is a single flag set?

How many TCP flags are set?

to answer these, �you need to work with bitwise representations.

small puzzles; �what the assignment is about

42 of 48

Sequence of bits and boolean

00000000 interpreted as False
Byte containing at least a 1, e.g.,�00100010 interpreted as True
X is a sequence of bit

X interpreted as Boolean expression (X != 0)
!X interpreted as Boolean expression (X == 0)

Beware difference between:

Bitwise operations applied on sequence of bits�&, |, ~, ^, >>, <<
Logical operations applied on Booleans�&&, ||, !

· 42

09.09.2021

EXAMPLE WITH 8 bits

43 of 48

TCP Problem

Consider the TCP flags encoded on 8 bits as X.

How do you test whether the Ack flag (0x10) is set?

How do you test whether any flag is set?

How do you test if a single flag is set?

How do you count the number of flags set?

· 43

09.09.2021

44 of 48

Answers

How do you test whether the Ack flag (0x10) is set?

(X & 0x10 )

=> interpreted as false if flag not set, true otherwise

How do you test whether any flag is set?

(X)� => interpreted as false if no flag is set , true otherwise

How do you test if a single flag is set?

((X & (X-1)) == 0) && (X != 0)

=> (X & (X-1)) transforms the rightmost 1-bit into a 0-bit�

· 44

09.09.2021

small puzzles! �some hate it; we like it (beauty)

45 of 48

Answers

How do you count the number of flags set?

Divide and conquer

Trivial for 2 bits

0+0 -> 00

0+1 -> 01

1+0 -> 01

1+1 -> 10

· 45

09.09.2021

46 of 48

Answers

· 46

09.09.2021

x = (x & 0x5555) + ((x >> 1) & 0x5555)

x = (x & 0x3333) + ((x >> 2) & 0x3333)

x = (x & 0x0F0F) + ((x >> 4) & 0x0F0F)

x = (x & 0x00FF) + ((x >> 8) & 0x00FF)

& 0x5555

& 0x3333

& 0x0F0F

x is a short: 2B

Masks

Bitwise�operations�in parallel

& 0x00FF

47 of 48

Motivation: Performance

09.09.2021

https://www.quora.com/What-is-the-best-way-to-declare-binary-matrices-in-C

Fast binary matrix operations

48 of 48

Take Aways

Two complements used to represent signed integers

Two complements procedure is:

Write positive value in binary
Invert all digits
Add 1

Beware implicit cast to unsigned in C!!

Bit manipulation enables efficient operations. �It is a rich algorithmic playground.

· 48

09.09.2021

security

performance