Extending AddressSanitizer support for C++ collections
@ WarCon 2022
By Disconnect3d & Tacet
1
# about us
Dominik 'Disconnect3d' Czarnota
Tacet
We both work for Trail of Bits
and play CTFs with the justCatTheFish team
2
AddressSanitizer 101
3
What is AddressSanitizer?
From https://github.com/google/sanitizers/wiki/AddressSanitizer:
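AddressSanitizer (aka ASan) is a memory error detector for C/C++. It finds use after free, heap, stack and global buffer overflows, use after return, use after scope, initialization order bugs and memory leaks.
The average slowdown of the instrumented program is ~2x.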
4
ASan is really just two parts
Sources (*.c, *.cpp, *.h, *.hxx etc.)
-> Compiler instrumentation (LLVM, GCC, MSVC) with -fsanitize=address
-> Instrumented binary linked to the ASan dynamic library
6
The dynamic lib implements modified functions like malloc etc.
ASan compiled program memory layout
7
[Figure: the process address space split into application memory and shadow memory. Picture inspired by https://medium.com/@jjuou2/advanced-debugging-and-the-address-sanitizer-8d6232127f53 ;)]
Shadow memory mapping
8
Application memory: a sequence of 8-byte blocks
Shadow memory: f1 f1 f1 00 06 f2 f2 f2
In practice, each shadow byte represents 8 process bytes, and its value encodes which of those bytes may be accessed.
Shadow byte legend:
Addressable: 00 (all 8 bytes are accessible)
Partially addressable: 01 02 03 04 05 06 07
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Example
17
void inc(uint64_t* ptr) {
    *ptr += 1;
}
Compiler flags: -O3
inc(unsigned long*):
    add QWORD PTR [rdi], 1
    ret
Compiler flags: -O3 -fsanitize=address
inc(unsigned long*):
    mov rax, rdi
    shr rax, 3
    cmp BYTE PTR [rax+2147450880], 0
    jne .L7
    add QWORD PTR [rdi], 1
    ret
.L7:
    push rax
    call __asan_report_load8
In pseudocode:
if (shadow_memory[ptr >> 3] != 0)
    __asan_report_load8()
else
    *ptr += 1
18
$ clang++ -fsanitize=address main.cpp && ./a.out
#include <cstdint>

void inc(uint64_t* ptr) {
    *ptr += 1;
}
int main(int argc, char* argv[]) {
    uint64_t x = argc;
    inc(&x + 1);  // one element past x: out of bounds
}
23
$ clang++ -fsanitize=address main.cpp && ./a.out
==8226==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffd6aa7f328 at pc 0x55c4d0921928 bp 0x7ffd6aa7f2f0 sp 0x7ffd6aa7f2e0
WRITE of size 8 at 0x7ffd6aa7f328 thread T0
#0 0x55c4d0921927 in inc(unsigned long*) main.cpp:6
#1 0x55c4d0921927 in main main.cpp:11
#2 0x7f1e94219c86 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21c86)
#3 0x55c4d0921979 in _start (/home/dc/a.out+0x979)
Address 0x7ffd6aa7f328 is located in stack of thread T0 at offset 40 in frame
#0 0x55c4d092182f in main main.cpp:9
This frame has 1 object(s):
[32, 40) 'x' <== Memory access at offset 40 overflows this variable
SUMMARY: AddressSanitizer: stack-buffer-overflow main.cpp:6 in inc(unsigned long*)
Shadow bytes around the buggy address:
0x10002d547e40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x10002d547e50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x10002d547e60: f1 f1 f1 f1 00[f2]f2 f2 00 00 00 00 00 00 00 00
0x10002d547e70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 => the 8 bytes of variable x are accessible
f2 => the 8 bytes after x are not (stack mid redzone)
25
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
But ASan can't detect all invalid accesses
(let's see some of its limitations)
27
1. An out-of-bounds access may hit other valid data
30
void inc(uint64_t* ptr) {
    *ptr += 1;
}
int main(int argc, char* argv[]) {
    uint64_t x = argc;
-   inc(&x + 1);
+   inc(&x + 6);
}
Shadow bytes:
0x10002d547e40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x10002d547e50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x10002d547e60: f1 f1 f1 f1 00 f2 f2 f2 00 00 00 00 00 00 00 00
0x10002d547e70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x10002d547e80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
The variable x is described by the 00 byte right after the four f1 bytes in row 0x10002d547e60.
The access at x+6*8 lands six shadow bytes later, past the f2 redzone, on another 00 byte.
This won't be detected
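1.5. An out-of-bounds access may hit "unexpected" invalid data
35
int main(int argc, char* argv[]) {
    char* ptr = argv[0];
    char* heap = new char[3];
    // idx is chosen so that ptr[idx] is the same address as heap[4]
    size_t idx = (size_t)(heap - argv[0] + 4);
    ptr[idx];  // out of bounds relative to argv[0], yet reported as a heap overflow
}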
SUMMARY: AddressSanitizer: heap-buffer-overflow (a.out:x86_64+0x100003f3c) in main+0xec
Shadow bytes around the buggy address:
0x1c03ffffffe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x1c03fffffff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x1c0400000000: fa fa fd fd fa fa 00 00 fa fa 00 02 fa fa 00 fa
=>0x1c0400000010: fa fa[03]fa fa fa fa fa fa fa fa fa fa fa fa fa
0x1c0400000020: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend:
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
2. ASan cannot poison prefixes and works only if your allocator returns 8-byte-aligned* pointers
37
Process memory: an 8-byte aligned block in which only the first 6 bytes are in use
Shadow memory: f1 f1 f1 06 f2 f2 f2 f2
Shadow byte legend (relevant entries):
Partially addressable: 01 02 03 04 05 06 07
Stack left redzone: f1
Stack mid redzone: f2
2. ASan cannot poison prefixes and works only if your allocator returns 8-byte-aligned* pointers
39
This information cannot be encoded with ASan today (with the default granularity; otherwise ASan would use much more memory).
Process memory: an 8-byte aligned block in which the bytes in use do not start at the beginning of the block
Shadow memory: f1 f1 f1 ?? f2 f2 f2 f2
A partially addressable shadow value (01 to 07) can only mark the first k bytes of a block as accessible, so there is no value to put in place of ??.
3. Detecting "container overflows"
(or: understanding how allocated memory is used, but good luck with that in the general case)
40
3. Detecting "container overflows"
struct cstring {
    size_t size;
    size_t capacity;
    char* data;
};
cstring s{0, 8, new char[8]};
// Fill in string
memcpy(s.data, "ABCDE", 6);
s.size = 6;
// Access the string
foo(s.data[6]);  // index 6 is past size but still within capacity
45
data: A B C D E \0 _ _ (size=6, capacity=8)
… which ASan does not detect,
because it's "valid" allocated memory
But actually…
this problem is solved for std::vector!
(Thx to Google!)
But not for std::string and std::deque
And this is what we researched & implemented in libc++ (LLVM) and libstdc++ (GCC)
46
A bit on container overflow history
49
…and here comes our research in 2022
(which hopefully will be merged? :P)
58
Our work:
extending ASan support for std::string and std::deque & fuzzing projects with it
59
Started with merging the libc++ patch
60
so we implemented std::string and std::deque sanitizations from scratch for both libc++ and libstdc++
61
std::string (code from llvm-14.0.5/libcxx)
Has two representations: short and long
66
struct __long {
    size_type __cap_;
    size_type __size_;
    pointer __data_;
};
struct __short {
    union {
        unsigned char __size_;
        value_type __lx;
    };
    value_type __data_[__min_cap];
};
Btw: which representation is in use is encoded in one of the bits of __size_ or __cap_ (libc++)
let's see how ASan can deal with it
67
std::string (code from llvm-14.0.5/libcxx)
Has two representations: short and long
Short string: ASan can't detect an OOB access between __data_[size] and the end of the object (sizeof(string))
Long string: ASan can't detect an OOB access between __data_[size] and __data_[__cap_ - 1]
69
short std::string layout: size byte | usable data | out of bounds (rest of the object)
long std::string layout (__data_ buffer of __cap_ bytes): usable data | out of bounds (rest of the buffer)
So let's see a quick demo ;)
70
Current ASan in llvm/libc++
73
#include <cstdio>
#include <string>
int main() {
    std::string s{"abc"};
    // Deliberately reads past s.size() to see what ASan reports
    for (int i = 0; i < 8; ++i) {
        printf("i=%d, s[i]=", i);
        printf("'%c' (%d)\n", s[i], s[i]);
    }
}
$ clang++ -stdlib=libc++ -fsanitize=address -std=c++20 ./main.cpp && ./a.out
i=0, s[i]='a' (97)
i=1, s[i]='b' (98)
i=2, s[i]='c' (99)
i=3, s[i]='' (0)
i=4, s[i]='�' (127)
i=5, s[i]='' (0)
i=6, s[i]='' (0)
i=7, s[i]='�' (-109)
Our ASan in llvm/libc++
76
$ clang++ -stdlib=libc++ -fsanitize=address -std=c++20 ./main.cpp && ./a.out
i=0, s[i]='a' (97)
i=1, s[i]='b' (98)
i=2, s[i]='c' (99)
i=3, s[i]='' (0)
=================================================================
==11==ERROR: AddressSanitizer: container-overflow on address 0x7ffc3adc0a25 at pc 0x0000004dc9c7 bp 0x7ffc3adc09f0 sp 0x7ffc3adc09e8
READ of size 1 at 0x7ffc3adc0a25 thread T0
#0 0x4dc9c6 in main (/code/a.out+0x4dc9c6)
#1 0x7f810d563d8f (/lib/x86_64-linux-gnu/libc.so.6+0x29d8f)
#2 0x7f810d563e3f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x29e3f)
#3 0x41d344 in _start (/code/a.out+0x41d344)
Address 0x7ffc3adc0a25 is located in stack of thread T0 at offset 37 in frame
#0 0x4dc86f in main (/code/a.out+0x4dc86f)
This frame has 1 object(s):
[32, 56) 's' <== Memory access at offset 37 is inside this variable
Our ASan in llvm/libc++
78
HINT: if you don't care about these errors you may set ASAN_OPTIONS=detect_container_overflow=0.
If you suspect a false positive see also: https://github.com/google/sanitizers/wiki/AddressSanitizerContainerOverflow.
SUMMARY: AddressSanitizer: container-overflow (/code/a.out+0x4dc9c6) in main
Shadow bytes around the buggy address:
0x1000075b00f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x1000075b0100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x1000075b0110: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x1000075b0120: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x1000075b0130: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x1000075b0140: f1 f1 f1 f1[05]fc fc f3 f3 f3 f3 f3 00 00 00 00
0x1000075b0150: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x1000075b0160: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x1000075b0170: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x1000075b0180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x1000075b0190: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Container overflow: fc
(...)
std::deque
79
std::deque
Deque == double ended queue
82
Our ASan can detect those accesses
BUT…
Up to 7 bytes before the first used element (data[0]) may go undetected, because shadow memory describes 8-byte-aligned blocks and can only mark a prefix of a block as accessible
std::deque example
83
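#include <cstdint>
#include <deque>
int main() {
    std::deque<uint64_t> d;
    d.push_back(1);
    d.push_back(2);
    uint64_t* first = &d[0];  // pointer to the first element
    d.pop_front();            // the element is gone, its block stays allocated
    return *first;            // reads the popped element
}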
std::deque - standard llvm/libc++
$ clang++ -stdlib=libc++ -fsanitize=address \
-std=c++20 ./main.cpp && ./a.out
$ ./a.out
<no output :(>
87
std::deque - our llvm/libc++
$ clang++ -stdlib=libc++ -fsanitize=address -std=c++20 ./main.cpp && ./a.out
==11==ERROR: AddressSanitizer: container-overflow on address 0x621000000100 at pc 0x0000004dca7d bp 0x7ffc56b18410 sp 0x7ffc56b18408
READ of size 8 at 0x621000000100 thread T0
#0 0x4dca7c in main (/code/a.out+0x4dca7c)
#1 0x7fdcca4a2d8f (/lib/x86_64-linux-gnu/libc.so.6+0x29d8f)
#2 0x7fdcca4a2e3f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x29e3f)
#3 0x41d374 in _start (/code/a.out+0x41d374)
0x621000000100 is located 0 bytes inside of 4096-byte region [0x621000000100,0x621000001100)
allocated by thread T0 here:
#0 0x4da29d in operator new(unsigned long) /llvm-project/compiler-rt/lib/asan/asan_new_delete.cpp:95:3
#1 0x4e5b94 in void* std::__1::__libcpp_operator_new<unsigned long>(unsigned long) (/code/a.out+0x4e5b94)
(...)
88
std::deque - our llvm/libc++
Shadow bytes around the buggy address:
0x0c427fff7fd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c427fff7fe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c427fff7ff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c427fff8000: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c427fff8010: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
=>0x0c427fff8020:[fc]00 fc fc fc fc fc fc fc fc fc fc fc fc fc fc
0x0c427fff8030: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
0x0c427fff8040: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
0x0c427fff8050: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
0x0c427fff8060: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
0x0c427fff8070: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
89
Fuzzing with our ASan sanitizations of string and deque
Some problems
Screenshot of https://compiler-rt.llvm.org/
compiler-rt
INTERCEPTOR(void*, malloc, uptr size) {
  if (DlsymAlloc::Use())
    return DlsymAlloc::Allocate(size);
  ENSURE_ASAN_INITED();
  GET_STACK_TRACE_MALLOC;
  return asan_malloc(size, &stack);
}
Code from llvm-14.x, asan_malloc_linux.cpp file
Where did the idea for the research come from?
… from an audit
Research origin
#include <algorithm>
#include <cctype>
#include <iostream>
#include <string>
int main() {
    char text[] = "abcd\0XXX";
    std::string bar{"ABCD"};
    // We explicitly pass size of text so the string will contain the embedded null bytes
    std::string foo{text, sizeof(text)};
    // Bug: the three-iterator std::equal assumes bar is at least as long as foo
    auto result = std::equal(foo.begin(), foo.end(), bar.begin(), [](char a, char b) {
        std::cout << "Comparing '" << a << "' with '" << b << "'" << std::endl;
        return std::tolower(a) == std::tolower(b);
    });
    std::cout << foo << "==" << bar << " => " << result << std::endl;
}
$ g++ equals.cpp -std=c++2a -fsanitize=address
$ ./a.out
Comparing 'a' with 'A'
Comparing 'b' with 'B'
Comparing 'c' with 'C'
Comparing 'd' with 'D'
Comparing '' with ''
Comparing 'X' with ''
abcdXXX==ABCD => 0
Out of bounds access not detected
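The missed read comes from the three-iterator overload of std::equal, which assumes the second range is at least as long as the first. A minimal sketch of a comparison that respects both bounds, using the four-iterator overload (available since C++14). The helper name iequal is hypothetical:

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Case-insensitive comparison that never reads past the end of either string.
// The four-iterator std::equal returns false when the lengths differ.
bool iequal(const std::string& lhs, const std::string& rhs) {
    return std::equal(lhs.begin(), lhs.end(), rhs.begin(), rhs.end(),
                      [](unsigned char a, unsigned char b) {
                          return std::tolower(a) == std::tolower(b);
                      });
}
```

With this helper, comparing the 9-byte foo from the slide against bar returns false without touching memory outside bar's contents.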
Tricky things or further limitations
std::string (code from llvm-14.0.5/libcxx)
Has two representations: short and long
libc++ also has _LIBCPP_ABI_ALTERNATE_STRING_LAYOUT, which orders the fields differently for potential performance gains ¯\_(ツ)_/¯
Default layout:
struct __long {
    size_type __cap_;
    size_type __size_;
    pointer __data_;
};
struct __short {
    union {
        unsigned char __size_;
        value_type __lx;
    };
    value_type __data_[__min_cap];
};
Alternate layout:
struct __long {
    pointer __data_;
    size_type __size_;
    size_type __cap_;
};
struct __short {
    value_type __data_[__min_cap];
    struct : __padding<value_type> {
        unsigned char __size_;
    };
};
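How the two representations can be told apart from a single byte, as a hedged sketch. It assumes the default layout on a little-endian target, where (to our understanding of llvm-14 libc++) the lowest bit of the first byte marks a long string and a short string stores its size shifted left by one. The helper names are hypothetical and simplified, not libc++ code:

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: bit 0 of the first byte is the "long" flag,
// bits 1..7 hold the size of a short string.
constexpr std::uint8_t kLongFlag = 0x01;

constexpr bool is_long(std::uint8_t first_byte) {
    return (first_byte & kLongFlag) != 0;
}

constexpr std::uint8_t encode_short_size(std::size_t size) {
    return static_cast<std::uint8_t>(size << 1);
}

constexpr std::size_t decode_short_size(std::uint8_t first_byte) {
    return first_byte >> 1;
}

// For a 24-byte string object with char elements (typical 64-bit target),
// one byte is metadata and one holds the terminating NUL: 22 usable characters.
constexpr std::size_t kObjectSize    = 24;
constexpr std::size_t kShortCapacity = kObjectSize - 1 /* size byte */ - 1 /* NUL */;
```

For example, a short string of length 5 would store 10 in its size byte, and decode_short_size(10) gives 5 back.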
Short String Optimization - Metadata byte
Default layout (metadata, the size byte, comes first)
struct __short {
    union {
        unsigned char __size_;
        value_type __lx;
    };
    value_type __data_[__min_cap];
};
[Diagram: short std::string split into size byte, usable data, out of bounds]
Alternate layout (metadata, the size byte, comes last)
struct __short {
    value_type __data_[__min_cap];
    struct : __padding<value_type> {
        unsigned char __size_;
    };
};
But we can't encode accessible suffixes in shadow memory (only the first k bytes of an 8-byte block can be marked accessible)
So what can we do?
[Diagram: shadow bytes of a short string, with the blocks labelled Content and Metadata]
Possible shadow bytes encoding: 00 02 00 (every byte accessible)
Our shadow bytes encoding: 00 02 fc
In our impl we poison it with "fc" aka "container overflow"
So how do we access it?
#define _LIBCPP_STRING_INTERNAL_MEMORY_ACCESS __attribute__((no_sanitize("address")))
_LIBCPP_INLINE_VISIBILITY
_LIBCPP_STRING_INTERNAL_MEMORY_ACCESS
size_type __get_short_size() const _NOEXCEPT
    {return __r_.first().__s.__size_ >> 1;}
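The same idea outside of libc++, as a hedged sketch: one 8-byte block of an object is poisoned by hand, and a size byte stored in that block is read through a function excluded from instrumentation. ASAN_POISON_MEMORY_REGION and ASAN_UNPOISON_MEMORY_REGION come from <sanitizer/asan_interface.h>; the struct, the DEMO_* macros and the function names are made up for illustration. Without -fsanitize=address the poisoning macros are no-ops here.

```cpp
#include <cstdint>
#include <cstring>

// Detect ASan under both GCC (__SANITIZE_ADDRESS__) and Clang (__has_feature).
#if defined(__SANITIZE_ADDRESS__)
#  define DEMO_ASAN 1
#elif defined(__has_feature)
#  if __has_feature(address_sanitizer)
#    define DEMO_ASAN 1
#  endif
#endif

#ifdef DEMO_ASAN
#  include <sanitizer/asan_interface.h>
#  define DEMO_POISON(p, n)   ASAN_POISON_MEMORY_REGION((p), (n))
#  define DEMO_UNPOISON(p, n) ASAN_UNPOISON_MEMORY_REGION((p), (n))
#else
#  define DEMO_POISON(p, n)   ((void)(p), (void)(n))
#  define DEMO_UNPOISON(p, n) ((void)(p), (void)(n))
#endif

// Hypothetical 24-byte buffer: 23 data bytes followed by one size byte,
// aligned so that it covers exactly three ASan blocks.
struct alignas(8) demo_short {
    char data[23];
    unsigned char size;
};

// Reads the size byte even while its block is poisoned.
__attribute__((no_sanitize("address")))
unsigned read_size(const demo_short& s) { return s.size; }

// Stores a short text and poisons the last block, which holds the size byte.
// The caller must keep len <= 16 so the text stays within the first two blocks.
void store(demo_short& s, const char* text, unsigned char len) {
    DEMO_UNPOISON(&s, sizeof(s));
    std::memcpy(s.data, text, len);
    s.size = len;
    DEMO_POISON(s.data + 16, 8);
}

// Makes the whole object accessible again, e.g. before it is destroyed.
void release(demo_short& s) { DEMO_UNPOISON(&s, sizeof(s)); }
```

After store(s, "hello", 5), an instrumented read of s.size would be reported by ASan, while read_size(s) still returns 5. Only whole, aligned 8-byte blocks are poisoned here, which is why the struct is declared with alignas(8).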
Alignment - poisoning object memory
(Object size and address.)
Short String Annotations
Only when:
Alignment - why is it important?
struct cs_buff {
    char data[8];
    uint8_t size;
    uint8_t capacity() {return 8;}
};
Memory: data, then &size: 9 bytes in total
More than one ASan block, but less than two!
void bar() {
    cs_buff s[1]{{"ABCDE", 6}};
    //…
}
It’s aligned!
First block (s[0].data) is the data buffer
Second block has the size byte (&s[0].size) and 7 more bytes
Shadow memory: 06 01
void bar() {
    cs_buff s[2]{{"ABCDE", 6}, {"ABCDE", 6}};
    //…
}
It’s still aligned!
First block: same as before.
Second block: SSO case, first 7 bytes in use.
Third block: the first byte is from the buffer… and the second byte is in use (size).
s[0].data: A B C D E \0    &s[0].size: 6
s[1].data: A B C D E \0    &s[1].size: 6
Shadow memory: 06 07 ??
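Why the third block has no valid encoding can be checked mechanically. A small model, assuming the usual ASan rule that a shadow value k in 01..07 marks only the first k bytes of an 8-byte block as accessible. The names below are illustrative and not part of ASan:

```cpp
#include <cstddef>

constexpr int kUnencodable = -1;

// Shadow value for one 8-byte block, given which of its bytes should be
// accessible: 0 for all eight, k for exactly the first k, or kUnencodable when
// the accessible bytes do not form a prefix. A fully poisoned block is
// modelled as 0xfc (container overflow).
constexpr int encode_block(const bool (&ok)[8]) {
    std::size_t prefix = 0;
    while (prefix < 8 && ok[prefix]) ++prefix;
    for (std::size_t i = prefix; i < 8; ++i)
        if (ok[i]) return kUnencodable;
    if (prefix == 8) return 0x00;
    if (prefix == 0) return 0xfc;
    return static_cast<int>(prefix);
}

// Desired shadow value of block `block` for an array of `count` 9-byte cs_buff
// objects (8 data bytes + 1 size byte), each holding `used` data bytes.
// Bytes past the end of the array are treated as not accessible.
constexpr int shadow_for(std::size_t block, std::size_t count, std::size_t used) {
    bool ok[8] = {};
    for (std::size_t i = 0; i < 8; ++i) {
        const std::size_t offset = block * 8 + i;
        const std::size_t index = offset / 9;
        const std::size_t field = offset % 9;
        ok[i] = index < count && (field == 8 || field < used);
    }
    return encode_block(ok);
}
```

In this model shadow_for(0, 2, 6) and shadow_for(1, 2, 6) give 06 and 07, while shadow_for(2, 2, 6) yields kUnencodable: the block needs its first byte poisoned and its second byte accessible. The single-element case, shadow_for(0, 1, 6) and shadow_for(1, 1, 6), gives 06 and 01.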
You can see that problem when:
Thank you for listening!
Do you have any questions?
To sum up "Extending AddressSanitizer to support C++ collections"
Draft / mini plan
Who covers which slides
Agree on how we pronounce "ASan"
Say something about ToB at the beginning, because few people know it
Maybe an outline for the presentation
Sanitizer memory layout: it would be nice to add that ASan changes the memory layout