Cache Pathology Report of Linux System Calls
LINUX.CONF.AU, 21-25 January 2019, Christchurch, NZ
The Linux of Things | #LCA2019 | @linuxconfau
Xi Yang
Confluent and Australian National University
yangxi.github.io
Tail latency
A 400-millisecond delay decreased searches per user by 0.59%. [Jake Brutlag, Google]
Each tail request has a unique tale
Powerful abstractions hide root causes of tail latency
[Figure: Client, Server, Server]
Helpful signals are isolated and lost
SHIM: a high-frequency continuous profiler (LCA 2016)
[Figure: IPC sampled at three rates: CPU perf counters at 1 kHz, 100 kHz, and SHIM at 10 MHz]
SHIM
One write system call
Tracing by continuous sampling
[Figure: four channels; events E1, E2, E3 and E1, E2, E3, E4]
while (1)
    for each channel
        if changed
            push_to_stream()
Correlated profiling stream
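The sampling loop on this slide can be sketched as a small Python toy. This is only an illustration of the "poll every channel, push on change" idea, not SHIM's implementation: the names `sample_channels` and `replay`, the tick counter standing in for a timestamp, and the sample data are all assumptions made for the example.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Sample = Tuple[int, str, int]  # (tick, channel name, new value)


def sample_channels(channels: Dict[str, Callable[[], int]], ticks: int) -> List[Sample]:
    """Poll every channel once per tick; record a sample only when its value changed."""
    last: Dict[str, int] = {}
    stream: List[Sample] = []
    for tick in range(ticks):                        # bounded stand-in for while (1)
        for name, read in channels.items():          # for each channel
            value = read()
            if last.get(name) != value:              # if changed
                stream.append((tick, name, value))   # push_to_stream()
                last[name] = value
    return stream


def replay(values: Iterable[int]) -> Callable[[], int]:
    """Hypothetical channel reader that replays a fixed list of values."""
    it = iter(values)
    return lambda: next(it)


# Two made-up channels observed for four ticks.
channels = {"syscall": replay([0, 1, 1, 0]), "ipc": replay([2, 2, 3, 3])}
stream = sample_channels(channels, ticks=4)
for sample in stream:
    print(sample)
```

Because every sample carries the tick at which it was observed, events from different channels land in one ordered stream and can be compared directly.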
Example signals and their connections
Kernel activities of a normal Kafka produce request
The two write system calls (14856 cycles + 9736 cycles) account for about 17% of the request's processing cycles
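As a quick check of the numbers on this slide, the arithmetic can be written out. The two cycle counts and the 17% share come from the slide; the total request cost is not stated there, so `implied_request_cycles` is only a back-calculated estimate.

```python
# Cycle counts of the two write system calls, as given on the slide.
write_cycles = [14856, 9736]
total_write_cycles = sum(write_cycles)

# The slide states the writes are "about 17%" of the request.
write_share = 0.17

# Back-calculated estimate; the slide does not give the request total.
implied_request_cycles = total_write_cycles / write_share

print(total_write_cycles)
print(round(implied_request_cycles))
```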
One normal Lucene request
Xmit delay, JVM GC delay
JVM GC
Nagle’s algorithm
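The slide names Nagle's algorithm as one source of transmit delay. The slides do not say how the delay was addressed; a common mitigation in general, shown here only as a hedged sketch, is to set the standard `TCP_NODELAY` socket option so small writes are sent without waiting to be coalesced. The helper name `make_nodelay_socket` is an assumption for the example.

```python
import socket


def make_nodelay_socket() -> socket.socket:
    """Create a TCP socket with Nagle's algorithm disabled via TCP_NODELAY."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


sock = make_nodelay_socket()
# A non-zero value means the option is enabled on this socket.
nodelay_enabled = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
sock.close()
print(nodelay_enabled)
```

Disabling Nagle's algorithm trades more small packets for lower per-write latency, so whether it helps depends on the workload.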
Long page faults after deep sleep
Page fault
Diagnosing slow page faults with ftrace and trace-cmd
RESOURCE_STALLS:SB:k / UNHALTED_CORE_CYCLES:k = 73%
trace-cmd record -p function_graph -g '__do_page_fault' --max-graph-depth 5
Correlate SHIM streams with ftrace records
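One way to picture the correlation step is a timestamp join: for each ftrace record, look up the most recent SHIM sample at or before it. The sketch below assumes both streams are sorted and share one clock; the tuple layouts, the `correlate` helper, and the sample data are hypothetical and do not reflect the actual SHIM or ftrace record formats.

```python
import bisect
from typing import List, Optional, Tuple

ShimSample = Tuple[int, str]     # (timestamp, signal state), hypothetical layout
FtraceRecord = Tuple[int, str]   # (timestamp, function name), hypothetical layout


def correlate(shim: List[ShimSample],
              ftrace: List[FtraceRecord]) -> List[Tuple[int, str, Optional[str]]]:
    """Attach to each ftrace record the latest SHIM sample at or before its timestamp."""
    shim_times = [t for t, _ in shim]  # assumed sorted ascending
    joined = []
    for t, func in ftrace:
        i = bisect.bisect_right(shim_times, t) - 1
        signal = shim[i][1] if i >= 0 else None  # None: no SHIM sample yet
        joined.append((t, func, signal))
    return joined


# Made-up data on a shared clock.
shim_stream = [(100, "user"), (250, "page_fault"), (900, "user")]
ftrace_records = [(50, "early_fn"), (260, "__do_page_fault"), (950, "late_fn")]
joined = correlate(shim_stream, ftrace_records)
for row in joined:
    print(row)
```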
Questions?
Google Protocol Buffers
The 25 μs life of address_book.SerializeToOstream(&output)
Sampling at 5 MHz, every 608 cycles
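The figures on this slide imply two further numbers, worked out below. The sampling rate, the cycle interval, and the 25 μs duration come from the slide; the clock frequency and the sample count are derived from them rather than stated in the talk.

```python
SAMPLE_RATE_HZ = 5_000_000   # 5 MHz, from the slide
CYCLES_PER_SAMPLE = 608      # from the slide
CALL_DURATION_US = 25        # from the slide

# One sample every 608 cycles at 5 MHz implies this core clock rate.
implied_clock_hz = SAMPLE_RATE_HZ * CYCLES_PER_SAMPLE

# Number of samples that fall inside one 25-microsecond call.
samples_per_call = CALL_DURATION_US * SAMPLE_RATE_HZ // 1_000_000

print(implied_clock_hz)
print(samples_per_call)
```

So a single serialization call is covered by roughly a hundred samples on a core running at about 3 GHz.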
Thank You!