Evolution of stack trace capture with BPF
Andrii Nakryiko
BPF_MAP_TYPE_STACK_TRACE
struct {
__uint(type, BPF_MAP_TYPE_STACK_TRACE);
__uint(max_entries, MAX_STACK_TRACE_CNT);
__type(key, u32);
__type(value, u64[PERF_MAX_STACK_DEPTH]);
} stacks SEC(".maps");
int bpf_get_stackid(void *ctx, void *map, __u64 flags);
BPF side
id = bpf_get_stackid(ctx, &stacks, BPF_F_USER_STACK);
if (id < 0) {
/* failure */
}
sample.ustack_id = id;
bpf_perf_event_output(ctx, …, &sample, sizeof(sample));
User space side
u64 addrs[PERF_MAX_STACK_DEPTH];
err = bpf_map_lookup_elem(map_fd, &sample.ustack_id, &addrs);
if (err) {
/* error handling */
}
/* first N elements of addrs[] contain captured addresses */
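One way to recover N on the user space side is sketched below. It assumes the kernel zero-fills the unused tail of the value on lookup and that no captured address is 0. `count_frames` and `MAX_DEPTH` are hypothetical names that are not in the original slides; the value 127 mirrors the kernel's default PERF_MAX_STACK_DEPTH.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: mirrors the kernel's default PERF_MAX_STACK_DEPTH. */
#define MAX_DEPTH 127

/*
 * Hypothetical helper: returns the number of leading non-zero entries in
 * addrs[], i.e. N, assuming the unused tail of the array is zero-filled.
 */
static size_t count_frames(const uint64_t *addrs, size_t max_depth)
{
	size_t n = 0;

	while (n < max_depth && addrs[n] != 0)
		n++;
	return n;
}
```

After a successful lookup, `count_frames(addrs, PERF_MAX_STACK_DEPTH)` would give the number of entries worth symbolizing.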
Build ID support
#define BPF_BUILD_ID_SIZE 20
struct bpf_stack_build_id {
__s32 status;
unsigned char build_id[BPF_BUILD_ID_SIZE];
union {
__u64 offset;
__u64 ip;
};
};
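A minimal user space sketch of how such entries might be rendered follows. The struct and status constants are local mirrors of the kernel UAPI definitions (statuses 0, 1 and 2 correspond to BPF_STACK_BUILD_ID_EMPTY, BPF_STACK_BUILD_ID_VALID and BPF_STACK_BUILD_ID_IP), so the example builds without kernel headers. `format_entry` and the output format are hypothetical and not part of the original slides.

```c
#include <stdint.h>
#include <stdio.h>

#define BUILD_ID_SIZE 20	/* same value as BPF_BUILD_ID_SIZE */

/* Local mirror of the status values from the kernel UAPI header. */
enum { ENTRY_EMPTY = 0, ENTRY_VALID = 1, ENTRY_IP = 2 };

/* Local mirror of struct bpf_stack_build_id. */
struct build_id_entry {
	int32_t status;
	unsigned char build_id[BUILD_ID_SIZE];
	union {
		uint64_t offset;
		uint64_t ip;
	};
};

/*
 * Hypothetical helper: formats one entry as "<hex build id>+0x<offset>" when
 * the build ID was resolved, or as a raw "0x<ip>" when only an address is
 * available. Returns 0 for an empty entry (end of stack), 1 otherwise.
 */
static int format_entry(const struct build_id_entry *e, char *out, size_t sz)
{
	size_t pos = 0;
	int i;

	switch (e->status) {
	case ENTRY_VALID:
		/* Two hex digits per build ID byte, while room remains. */
		for (i = 0; i < BUILD_ID_SIZE && pos + 2 < sz; i++)
			pos += snprintf(out + pos, sz - pos, "%02x", e->build_id[i]);
		snprintf(out + pos, sz - pos, "+0x%llx",
			 (unsigned long long)e->offset);
		return 1;
	case ENTRY_IP:
		snprintf(out, sz, "0x%llx", (unsigned long long)e->ip);
		return 1;
	default:
		if (sz)
			out[0] = '\0';
		return 0;
	}
}
```

The point of the build ID plus file offset pair is that symbolization can happen later, even on another machine, as long as a binary with the same build ID is available.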
Quirks of STACK_TRACE API
Implementation: the good
Specialized hash map implementation.
Stack deduplication can save space.
Design favors space efficiency and performance.
Does not support hash collisions.
Implementation: the bad
Hash collisions are pretty frequent and unavoidable!
Hash collision handling and tradeoffs controlled through flags:
Implementation: the ugly
Choice between two bad options:
Our production never uses BPF_F_REUSE_STACKID!
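The two options can be illustrated with a toy user space model. This is a sketch, not the kernel implementation: every name in it is hypothetical, the hash is supplied by the caller, and `REUSE_STACKID` merely stands in for BPF_F_REUSE_STACKID. It assumes one bucket per ID with no chaining, so two different stacks that land in the same bucket cannot coexist.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define N_BUCKETS 4	/* toy size, power of two */
#define DEPTH 4		/* toy stack depth */

enum { REUSE_STACKID = 1 };	/* stand-in for BPF_F_REUSE_STACKID */

struct bucket {
	int used;
	uint32_t hash;
	uint64_t ips[DEPTH];
};

static struct bucket buckets[N_BUCKETS];

/* Toy model: the stack ID is simply the bucket index derived from the hash. */
static int toy_get_stackid(uint32_t hash, const uint64_t *ips, int flags)
{
	int id = hash & (N_BUCKETS - 1);
	struct bucket *b = &buckets[id];

	if (b->used && b->hash == hash && !memcmp(b->ips, ips, sizeof(b->ips)))
		return id;		/* same stack seen before: deduplicated */
	if (b->used && !(flags & REUSE_STACKID))
		return -EEXIST;		/* bad option 1: the new stack is lost */

	/*
	 * Empty bucket, or bad option 2: overwrite the old stack, so any
	 * previously handed out ID now silently refers to a different stack.
	 */
	b->used = 1;
	b->hash = hash;
	memcpy(b->ips, ips, sizeof(b->ips));
	return id;
}
```

Without the flag a colliding stack is dropped with -EEXIST. With it, samples already emitted with the old ID would resolve to the wrong stack, which is presumably why it is avoided in production.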
Making it work in practice
"Double buffering" approach:
Cons:
Observations from production
CPU profiling didn't benefit much from deduplication of stacks.
Stack traces are pretty unique, overall.
Let users manage memory.
Evolution: bpf_get_stack()
int bpf_get_stack(void *ctx, void *buf, __u32 size, __u64 flags);
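With bpf_get_stack() the stack can travel inline with the sample instead of being looked up by ID. The helper returns the number of bytes copied into buf on success and a negative error on failure, so the consumer can derive the frame count from that value. The sketch below shows a possible user space side; the `struct sample` layout, `MAX_DEPTH` and `sample_frame_cnt` are hypothetical and not taken from the original slides.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_DEPTH 127	/* assumption: same depth as PERF_MAX_STACK_DEPTH */

/* Hypothetical sample layout: the stack is stored inline with the event. */
struct sample {
	int32_t pid;
	int32_t ustack_sz;	/* bpf_get_stack() result: bytes copied or -errno */
	uint64_t ustack[MAX_DEPTH];
};

/* Returns the number of captured frames, or 0 if the capture failed. */
static size_t sample_frame_cnt(const struct sample *s)
{
	if (s->ustack_sz <= 0)
		return 0;
	return (size_t)s->ustack_sz / sizeof(s->ustack[0]);
}
```

This trades deduplication for simplicity: every sample carries its own copy of the stack, and there is no shared map to collide in or to clean up.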
Are we done yet?
Not quite.
There are still problems.
Synchronous API: assumptions
Synchronous API: consequences
Synchronous API: limitations
We need a new API
This time, asynchronous!
Asynchronous API: kernel stacks
Asynchronous API: user stacks
API design: overview
API design: deduplication
Opinion: it does not seem worth the bother.
API design: notifications
How to notify the user that a stack trace is ready?
API design: customization
[0] https://lore.kernel.org/all/20240508212605.4012172-3-andrii@kernel.org/
Thank you!