Tales of 60+ billion requests a day
Java optimizations @ PocketMath
About
Liu Dapeng
Java Tech Lead
PocketMath
World's largest, 100% self-serve, mobile demand-side platform (DSP) for programmatic mobile.
What is OpenRTB (Real Time Bidding)
Exchange: ad needed, do you want it?
PocketMath: yes! I will pay $0.01
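The exchange above can be read as a simple question and answer: the exchange offers an ad slot, and the bidder either names a price or declines. Below is a minimal sketch of that decision, not the OpenRTB message format. The class `RtbSketch`, the method `decide`, and the idea of a floor price are assumptions made for illustration only.

```java
import java.util.OptionalDouble;

public class RtbSketch {
    // Hypothetical decision rule: bid our maximum price if it clears the
    // exchange's floor price, otherwise answer "no bid" (empty result).
    static OptionalDouble decide(double floorPriceUsd, double maxPriceUsd) {
        if (maxPriceUsd < floorPriceUsd) {
            return OptionalDouble.empty(); // no bid
        }
        return OptionalDouble.of(maxPriceUsd);
    }

    public static void main(String[] args) {
        // "ad needed, do you want it?" -> "yes! I will pay $0.01"
        System.out.println(decide(0.005, 0.01));
        // Floor is higher than what we are willing to pay: decline.
        System.out.println(decide(0.05, 0.01));
    }
}
```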
Challenges
Exchange
Thousands of advertisers
Throughput 65b req/day, 750k req/sec
Latency 100~200 ms, round trip
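The two throughput figures are consistent with each other, which a quick calculation shows. The sketch below only divides the daily volume by the number of seconds in a day; it gives the average rate, so real peaks would presumably be higher. The class and method names are illustrative.

```java
public class ThroughputMath {
    static final long SECONDS_PER_DAY = 24L * 60 * 60; // 86,400

    // Average requests per second for a given daily request volume.
    static double averagePerSecond(long requestsPerDay) {
        return (double) requestsPerDay / SECONDS_PER_DAY;
    }

    public static void main(String[] args) {
        long daily = 65_000_000_000L; // 65b req/day from the slide
        System.out.printf("%.0f req/sec on average%n", averagePerSecond(daily));
    }
}
```

65 billion requests spread evenly over a day comes to roughly 752k req/sec, which matches the 750k figure.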
ELB (Elastic Load Balancer)
Transport Layer LB
AJP
99th percentile latency: 100ms -> 20ms
<Connector port="1443"
           scheme="https"
           secure="true"
           SSLEnabled="true"
           SSLCACertificateFile="ca.crt"
           SSLCertificateFile="domain.crt"
           SSLCertificateKeyFile="domain.key"
           ... />
<Listener className="org.apache.catalina.core.AprLifecycleListener"
          SSLEngine="on" />
<Connector port="8080"
           protocol="HTTP/1.1"
           maxConnections="12000"
           maxThreads="128"
           ... />
Party!
ExecutorService executor = ...
try {
    future = executor.submit(new Task(...));
    response = future.get(TIMEOUT, MILLISECONDS);
    return response;
} catch (TimeoutException e) {
    return NO_BID;
}
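For readers who want to run the pattern above, here is a self-contained sketch. It assumes a `String` response and a `NO_BID` string constant, both placeholders that are not in the original, and it also answers no-bid when the task fails or the caller is interrupted, which the slide does not cover.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimeoutBid {
    static final String NO_BID = "NO_BID"; // hypothetical placeholder response

    // Hand the task to the pool and wait at most timeoutMs for its result.
    static String bid(ExecutorService executor, Callable<String> task, long timeoutMs) {
        Future<String> future = executor.submit(task);
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return NO_BID; // as on the slide: the task itself is not cancelled
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return NO_BID;
        } catch (ExecutionException e) {
            return NO_BID; // the task threw an exception
        }
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newFixedThreadPool(2);
        try {
            System.out.println(bid(executor, () -> "BID 0.01", 50));
            System.out.println(bid(executor, () -> { Thread.sleep(500); return "late"; }, 50));
        } finally {
            executor.shutdownNow();
        }
    }
}
```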
Servlet Threads
Worker Threads
Timeout
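One way to read the servlet-thread / worker-thread picture: when the servlet thread times out and answers no-bid, the task it submitted is still queued or running on the worker side, so late work piles up behind it. The sketch below reproduces that effect under simplifying assumptions that are mine, not the talk's: a single worker thread and tasks that never finish in time. All names are illustrative.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BacklogDemo {
    // Submit `requests` tasks that cannot finish in time and report how many
    // are still waiting in the worker queue after every caller has timed out.
    static int queuedAfterTimeouts(int requests, long timeoutMs) {
        ThreadPoolExecutor workers = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        CountDownLatch release = new CountDownLatch(1); // keeps every task "slow"
        try {
            for (int i = 0; i < requests; i++) {
                Future<String> future = workers.submit(() -> { release.await(); return "bid"; });
                try {
                    future.get(timeoutMs, TimeUnit.MILLISECONDS);
                } catch (TimeoutException e) {
                    // The caller would answer NO_BID here; the task stays in the pool.
                } catch (InterruptedException | ExecutionException e) {
                    throw new IllegalStateException(e);
                }
            }
            return workers.getQueue().size();
        } finally {
            release.countDown();
            workers.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // One task occupies the single worker; the rest are still queued.
        System.out.println("still queued: " + queuedAfterTimeouts(5, 10));
    }
}
```

With five requests and one worker, four tasks remain queued even though all five callers have already given up.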
Processing Pipeline
Stage1
Stage2
Stage3
Stage4
return new Task(...).call();
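Calling the task directly keeps the whole pipeline on the servlet thread, so there is no queue and no hand-off. The slides do not show how the time limit is enforced once `future.get(TIMEOUT, ...)` is gone. The sketch below assumes a deadline check between stages; that check, and the names `InlinePipeline`, `run`, and `NO_BID`, are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

public class InlinePipeline {
    static final String NO_BID = "NO_BID"; // hypothetical placeholder response

    // Run every stage on the calling thread and give up with NO_BID as soon as
    // the time budget is exhausted (checked before each stage).
    static String run(String request, List<UnaryOperator<String>> stages, long budgetNanos) {
        long deadline = System.nanoTime() + budgetNanos;
        String value = request;
        for (UnaryOperator<String> stage : stages) {
            if (System.nanoTime() - deadline > 0) {
                return NO_BID;
            }
            value = stage.apply(value);
        }
        return value;
    }

    public static void main(String[] args) {
        List<UnaryOperator<String>> stages = new ArrayList<>();
        stages.add(s -> s + " > stage1");
        stages.add(s -> s + " > stage2");
        System.out.println(run("request", stages, TimeUnitNanos.MS_100));
    }

    // 100 ms expressed in nanoseconds, used only by the demo above.
    static final class TimeUnitNanos {
        static final long MS_100 = 100_000_000L;
    }
}
```

A check between stages cannot interrupt a stage that is already running, so this only bounds latency if each stage is short.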
99th percentile latency: 20ms -> <10ms
Server Fever
Server CPU Usage
JVM Heap
Young Gen
Old Gen
New JVM flag
-XX:NewSize=13G
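To see how the heap is actually split between generations after changing a sizing flag, the standard `java.lang.management` API can list the heap memory pools from inside the JVM. This is a sketch for inspection only. Pool names depend on the garbage collector in use, so the output will differ between setups.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

public class HeapPools {
    // One summary line per heap memory pool (young and old generation spaces).
    static List<String> heapPoolSummaries() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip metaspace, code cache, and other non-heap pools
            }
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // pool is no longer valid
            }
            long max = usage.getMax(); // -1 when the maximum is undefined
            String maxText = max < 0 ? "undefined" : (max >> 20) + " MB";
            lines.add(String.format("%s: used=%d MB, max=%s",
                    pool.getName(), usage.getUsed() >> 20, maxText));
        }
        return lines;
    }

    public static void main(String[] args) {
        heapPoolSummaries().forEach(System.out::println);
    }
}
```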
Panadol.sh
$ jcmd org.apache.catalina.startup.Bootstrap GC.run
After the daily dose of panadol
Traffic is on the rise
30b+ -> 40b+
A New Problem
100 requests
50 responses in time
20 requests now
Internet
Exchange
PocketMath
Observation
When in doubt, read the doc!
Keep Alive Has A Counter!
<Connector port="8080"
...
maxKeepAliveRequests="-1"
... />
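The keep-alive counter matters because Tomcat closes a connection after `maxKeepAliveRequests` requests (100 by default according to the Tomcat documentation; `-1` removes the limit), and each close forces the exchange to open a new connection. The sketch below is a back-of-the-envelope model, not Tomcat code. It assumes requests are sent one after another on a single connection at a time.

```java
public class KeepAliveMath {
    // Connections a client must open to send `requests` sequential requests
    // when the server closes each connection after `maxKeepAliveRequests`
    // requests. A negative limit means the server never closes the connection.
    static long connectionsNeeded(long requests, int maxKeepAliveRequests) {
        if (requests <= 0) {
            return 0;
        }
        if (maxKeepAliveRequests < 0) {
            return 1;
        }
        int perConnection = Math.max(1, maxKeepAliveRequests); // assumption: treat 0 like 1
        return (requests + perConnection - 1) / perConnection; // ceiling division
    }

    public static void main(String[] args) {
        long requests = 1_000_000;
        System.out.println("limit 100: " + connectionsNeeded(requests, 100));
        System.out.println("limit -1 : " + connectionsNeeded(requests, -1));
    }
}
```

Under this model, a million requests need 10,000 connections with a limit of 100 and a single connection with the limit removed.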
Time to Party?
Traffic is on the rise
40b+ -> 60b+
100 requests
50 responses in time
20 requests now
Internet
Exchange
PocketMath
$ netstat -s | grep 'connections established'
The unusual log
Total time for which application threads were stopped: 0.0043330 seconds
Total time for which application threads were stopped: 4 seconds
-XX:+PrintGCApplicationStoppedTime
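Long pauses like the 4-second one are easy to miss among thousands of millisecond entries. A small scanner over the log can flag them. The sketch below matches the line format shown above. The 1-second threshold and the class name are arbitrary choices for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StoppedTimeScan {
    private static final Pattern STOPPED = Pattern.compile(
            "Total time for which application threads were stopped: "
            + "([0-9]+(?:\\.[0-9]+)?) seconds");

    // Returns the pause length in seconds, or -1 if the line is not a
    // stopped-time record.
    static double stoppedSeconds(String line) {
        Matcher m = STOPPED.matcher(line);
        return m.find() ? Double.parseDouble(m.group(1)) : -1.0;
    }

    public static void main(String[] args) {
        String[] lines = {
            "Total time for which application threads were stopped: 0.0043330 seconds",
            "Total time for which application threads were stopped: 4 seconds",
        };
        double thresholdSeconds = 1.0; // hypothetical alert threshold
        for (String line : lines) {
            double pause = stoppedSeconds(line);
            if (pause >= thresholdSeconds) {
                System.out.println("long pause: " + pause + " s");
            }
        }
    }
}
```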
WTH is a Safepoint
Safepoint cont.
-XX:+PrintSafepointStatistics
http://psy-lob-saw.blogspot.sg/2015/12/safepoints.html
grep -E "^[0-9]+\\.[0-9]+" catalina.log
The Culprit
Instruction
Repo
Compiler
Engine
Requests
Quick Experiment
fix
The Matrix
Metric | Before | After | Change |
Daily requests | 30b | 60b | 2X |
Bidding servers | ~60 | 47 | 78% |
99% latency | 100ms | single-digit ms | 10~20X |

Auctions | Auction timeouts | Transfer errors |
2,104,721,951 | 0.0% (6,300) | 0.0% (5,764) |
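The numbers in the tables can be cross-checked with a few divisions. The sketch below derives the per-server load and the error rates from the figures shown. It takes "~60" as exactly 60 servers, which is an approximation.

```java
public class MatrixMath {
    // Daily requests handled by each server, assuming an even spread.
    static double perServer(double dailyRequests, int servers) {
        return dailyRequests / servers;
    }

    // `part` as a percentage of `total`.
    static double percent(long part, long total) {
        return 100.0 * part / total;
    }

    public static void main(String[] args) {
        double before = perServer(30e9, 60); // ~60 servers taken as 60
        double after = perServer(60e9, 47);
        System.out.printf("servers kept: %.0f%%%n", percent(47, 60));
        System.out.printf("per-server load: %.2fx%n", after / before);
        System.out.printf("auction timeouts: %.4f%%%n", percent(6_300, 2_104_721_951L));
        System.out.printf("transfer errors: %.4f%%%n", percent(5_764, 2_104_721_951L));
    }
}
```

Each server ends up handling about 2.5 times the daily load it did before, and both error rates are about 0.0003% of auctions.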
Fast and Furious