Operating Systems and C�Fall 2022�2. Representing Data
· 1
09.09.2021
Operating Systems and C�Fall 2022�2. Representing Data
· 2
09.09.2021
why do we bother looking into this?
Motivation: Performance
3
09.09.2021
https://www.quora.com/What-is-the-best-way-to-declare-binary-matrices-in-C
Data Representation is crucial for data science, e.g.,
How to represent a binary matrix? (bits ⇒ ops bit-level)
Motivation: Security
4
09.09.2021
expects type size_t (unsigned), �given an int. what happens? problem?
Outline
5
09.09.2021
we need a deeper understand of how data is represented.
Two issues
More than one way to skin a cat.�What’s a sensible way?
· 6
09.09.2021
one where mathematical operations make sense!
Spoiler: Algebra: Ring (Finite Field) - Unsigned
Modular / Clock arithmetic.�https://stackoverflow.com/questions/7221409/is-unsigned-integer-subtraction-defined-behavior
7
photo: https://dncsite.wordpress.com/tag/twos-complement/
(9)
(10)
(11)
(12)
(13)
(14)
(15)
(8)
parentheses:�how we interpret �the bits.
Spoiler: Algebra: Ring (Finite Field) - Signed
Modular / Clock arithmetic.�https://stackoverflow.com/questions/7221409/is-unsigned-integer-subtraction-defined-behavior
8
photo: https://dncsite.wordpress.com/tag/twos-complement/
parentheses:�how we interpret �the bits.
why: because then, if you keep incrementing, you get larger numbers.
note: binary addition overflowed, but correct value got stored.
note: binary addition of 3 data-bits overflowed, thus overwriting the sign bit
Integer Types in C
· 9
09.09.2021
Sequences of bits
Shorts in C are coded on 2B. Used for examples in these slides.
2B = 16 bits,
or 4 Hexadecimals (prefixed with Ox)
�Operations on sequences of bits:
· 10
09.09.2021
see book on dec-bin, bin-dec,
dec-hex, hex-dec,�bin-hex, hex-bin
or:�onlinetoolz.net
hex: compact representation of bits.
Bitwise Operations: Examples
· 11
09.09.2021
Bitwise on cos
https://github.com/mellowcandle/bitwise
Signed vs. Unsigned Interpretation
· 12
09.09.2021
Unsigned
Signed (two’s complement)
Sign
Bit
A sequence of bits is interpreted differently�depending on whether the integer type �is unsigned or signed.
0xA000
40960
-24576
https://onlinetoolz.net/unsigned-signed
same bits; different interpretations as numbers.
-x is not just
bits of x w/ a 1 in sign
how much each bit contributes to value
Identical Interpretation?
13
book has interesting results on �when bits have identical unsigned & signed interpretation.�( outside range, conversion either adds or subtracts 2w )
Integer Types in C
$ vi /usr/include/limits.h
· 14
09.09.2021
0x0000
0xFFFF
0x8000
0x7FFF
1 bit reserved for sign
there’s 1 more�neg than pos
Signed Integers
Two’s Complement. Representation on N bytes
· 15
09.09.2021
short int x = 15213;
short int y = -15213;
o.O why?
yup, that checks out. but why?
given 0s and 1s, �how do I interpret it?
Why this works?
Substracting from 00000000 or substracting from 100000000 is the same for all practical purpose (as the borrowing is carried forward beyond the 8th bit).
So, in 2’s complement -x is represented by
28 – x = 100000000 - x = 11111111 + 1 - x = (11111111 - x) + 1
As we have seen (now with 8 bits):
~x + x = 11111111 => (11111111 - x) = ~x
In two’s complement on w bits
-x is represented as 2w – x
=> ~x + 1
· 16
09.09.2021
EXAMPLE WITH 8 bits
recall: ~x is x with bits flipped
in other words: -x is�“what must I add to x to get 0?”�(answer: ~x + 1)
Two’s complement
What does 0xFFFFFFFF represent?
How about 0x80000000?
· 17
09.09.2021
Two’s complement
What does 0xFFFFFFFF represent?
How about 0x80000000?
· 18
09.09.2021
hint
Bit extension - adding a ring
19
(9)
(10)
(11)
(12)
(13)
(14)
(15)
(8)
(9)
(10)
(11)
(12)
(13)
(14)
(15)
(8)
1 in front
0 in front
Bit extension - adding a ring
(9)
(10)
(11)
(12)
(13)
(14)
(15)
(8)
0 in front
1 in front
…
why this way: storing �4-bit value in 5-bit space�⇒ old bits are unchanged!�(new 5th bit is just 0)
Sign extension - adding a ring (bit extension, signed)
1 in front
0 in front
Sign extension - adding a ring (bit extension, signed)
1 in front�(new neg)
0 in front�(new pos)
1 in front�(old neg)
0 in front�(old pos)
Sign extension - adding a ring (bit extension, signed)
23
…
new pos
new neg
even more positive, �even more negative
this helps explain�the following slide.�(analogy: � arithmetic right shift)
1 in front�(old neg)
0 in front�(old pos)
why this way: storing �4-bit value in 5-bit space�⇒ old bits are unchanged!�(5th bit is 0 if pos, 1 if neg)
Sign Extension
Make k copies of sign bit:
X ′ = xw–1 ,…, xw–1 , xw–1 , xw–2 ,…, x0
· 24
09.09.2021
• • •
X
X ′
• • •
• • •
• • •
w
w
k
k copies of Most Significant Bit
Most Significant Bit
Expanding: Converting from smaller to larger integer data type
the overall value of negative numbers does not change.
Sign Extension
· 25
09.09.2021
C automatically performs sign extension
short int x = 15213;
int ix = (int) x;
short int y = -15213;
int iy = (int) y;
Decimal
Hex
Binary
x
15213
3B 6D
00111011 01101101
ix
15213
00 00 3B 6D
00000000 00000000 00111011 01101101
y
-15213
C4 93
11000100 10010011
iy
-15213
FF FF C4 93
11111111 11111111 11000100 10010011
easy in C!
(one of the nice properties of �two’s complement)
Sign Extension
Expanding (e.g., short int to int)
Unsigned: zeros added (on left)
Signed: sign extension (as shown on previous slides)
Both yield expected result
Truncating (e.g., unsigned to unsigned short)
Unsigned/signed: bits are truncated
Result reinterpreted
Unsigned: mod operation
Signed: depends on bit pattern (large negative number might be truncated to positive number)
· 26
09.09.2021
Why do we care?
In C type system,
If there is a mix of signed and unsigned values in an expression, then signed values are implicitly cast to unsigned:
· 27
09.09.2021
Security, Revisited
28
09.09.2021
memcpy
· 29
09.09.2021
From GNU glibc manual /Appendix A – Language Features /Important Data Types:
Security - Woops
30
09.09.2021
-528 in two’s complement:
0xFFFFFDF0
Reinterpreted as unsigned within
memcpy: 4294966768 (decimal)
Why do we care?
· 31
09.09.2021
Always be mindful/careful
of unsigned integers!
Outline
32
09.09.2021
Integer Arithmetic
How to deal with finite representation?
Adding two integers encoded on w bytes
Should take w+1 bytes
How to encode the sum on w bytes?
No magic!! The sum will overflow
· 33
09.09.2021
story time (Java standard library)
Unsigned Addition
· 34
09.09.2021
• • •
• • •
u
v
+
• • •
u + v
• • •
True Sum: w+1 bits
Operands: w bits
Discard Carry: w bits
UAddw(u , v)
UAddw(u , v) is u+v mod 2w
overflow
Signed Addition
· 35
09.09.2021
TAdd and UAdd have Identical Bit-Level Behavior
Signed vs. unsigned addition in C:
int s, t, u, v;
s = (int) ((unsigned) u + (unsigned) v);
t = u + v
Will give s == t
• • •
• • •
u
v
+
• • •
u + v
• • •
True Sum: w+1 bits
Operands: w bits
Discard Carry: w bits
TAddw(u , v)
overflow
Signed Addition
· 36
09.09.2021
Beware:�This is undefined�behavior in C!
Unsigned Multiplication in C
· 37
09.09.2021
Standard Multiplication Function
Ignores high order w bits
Implements Modular Arithmetic
UMultw(u , v) = u · v mod 2w
• • •
• • •
u
v
*
• • •
u · v
• • •
True Product: 2*w bits
Operands: w bits
Discard w bits: w bits
UMultw(u , v)
• • •
· 38
09.09.2021
Signed Multiplication in C
Standard Multiplication Function
Ignores high order w bits
Some of which are different for signed vs. unsigned multiplication
Lower bits are the same
• • •
• • •
u
v
*
• • •
u · v
• • •
True Product: 2*w bits
Operands: w bits
Discard w bits: w bits
TMultw(u , v)
• • •
Power-of-2 Multiply with Shift
· 39
09.09.2021
Operation
u << k gives u * 2k
Both signed and unsigned
Examples
u << 3 == u * 8
u << 5 - u << 3 == u * 24
• • •
0
0
1
0
0
0
•••
u
2k
*
u · 2k
True Product: w+k bits
Operands: w bits
Discard k bits: w bits
UMultw(u , 2k)
•••
k
• • •
0
0
0
•••
TMultw(u , 2k)
0
0
0
•••
•••
Outline
40
09.09.2021
Bit Representation
TCP Header
· 41
09.09.2021
Is Ack flag set?
Is any TCP flag set?
Is a single flag set?
How many TCP flags are set?
to answer these, �you need to work with bitwise representations.
small puzzles; �what the assignment is about
Sequence of bits and boolean
Beware difference between:
· 42
09.09.2021
EXAMPLE WITH 8 bits
TCP Problem
Consider the TCP flags encoded on 8 bits as X.
How do you test whether the Ack flag (0x10) is set?
How do you test whether any flag is set?
How do you test if a single flag is set?
How do you count the number of flags set?
· 43
09.09.2021
Answers
(X & 0x10 )
=> interpreted as false if flag not set, true otherwise
(X)� => interpreted as false if no flag is set , true otherwise
((X & (X-1)) == 0) && (X != 0)
=> (X & (X-1)) transforms the rightmost 1-bit into a 0-bit�
· 44
09.09.2021
small puzzles! �some hate it; we like it (beauty)
Answers
How do you count the number of flags set?
Divide and conquer
Trivial for 2 bits
0+0 -> 00
0+1 -> 01
1+0 -> 01
1+1 -> 10
· 45
09.09.2021
Answers
· 46
09.09.2021
x = (x & 0x5555) + ((x >> 1) & 0x5555)
x = (x & 0x3333) + ((x >> 2) & 0x3333)
x = (x & 0x0F0F) + ((x >> 4) & 0x0F0F)
x = (x & 0x00FF) + ((x >> 8) & 0x00FF)
& 0x5555
& 0x3333
& 0x0F0F
x is a short: 2B
Masks
Bitwise�operations�in parallel
& 0x00FF
Motivation: Performance
47
09.09.2021
https://www.quora.com/What-is-the-best-way-to-declare-binary-matrices-in-C
Fast binary matrix operations
Take Aways
Two complements used to represent signed integers
Two complements procedure is:
Beware implicit cast to unsigned in C!!
Bit manipulation enables efficient operations. �It is a rich algorithmic playground.
· 48
09.09.2021
security
performance