building twitter’s next-gen
ALERTING SYSTEM
dan sotolongo
@sortalongo
megan kanne
@megankanne
justin nguyen
@justanguyen
#talk-megan-justin-dan
OBSERVABILITY AT TWITTER
scalable
robust
realtime
estwo on flickr
Collection Agent 1
Ingestion Service
Storage (Manhattan)
Timeseries DB Query Engine
Temporal Indexing Service
Alerting Service
Visualization Service
Alerting Cmd Line Tools
Service 1
Timeseries Cmd Line Tools
Collection Agent n
Service n
….
OLD ALERTING SYSTEM
successes and challenges
metrics written per minute
Nov 2013: 300M
June 2016: 4.3B (14x)
25K alerts per minute
3M alert monitors per minute
making alerts
alert >
rule >
monitor
the good:
the bad:
alerts
dashboards
alerts
dashboards
being on call
Zone1
Zone2
Zone1
Zone2
THE SOLUTION
improve configurations
reduce loss of visibility
taspicsvns on flickr
making alerts: simplicity
chart
alert
server-side validator
follows best practices
queries not expensive
...
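A minimal sketch of what a server-side validator along these lines might look like. Everything here is an assumption made for illustration: the `AlertConfig` fields, the wildcard count used as a stand-in for query cost, the `MAX_WILDCARDS` limit, and the placeholder query strings are hypothetical and do not describe Twitter's actual checks.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical limit: wildcards are used here as a crude proxy for query cost.
MAX_WILDCARDS = 2


@dataclass
class AlertConfig:
    """Hypothetical shape of a submitted alert configuration."""
    name: str
    query: str
    runbook: str = ""
    contact: str = ""


def validate(config: AlertConfig) -> List[str]:
    """Return a list of problems; an empty list means the config is accepted."""
    errors: List[str] = []
    # "follows best practices": assumed to mean a runbook and a contact are set.
    if not config.runbook:
        errors.append("missing runbook")
    if not config.contact:
        errors.append("missing contact")
    # "queries not expensive": assumed proxy, count wildcards in the query text.
    if config.query.count("*") > MAX_WILDCARDS:
        errors.append("query may be too expensive")
    return errors
```

Rejecting a config at submission time, rather than at evaluation time, is what keeps a bad alert from ever reaching the on-call engineer.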
on call: reliability
old
zone 1
zone 2
new
zone 1
zone 2
t0
node 1
state: ok
timestamp: t0
t1
node 1
t2
node 2
state: ok
last evaluation: t0
t1
t2
evaluate t1 & t2
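The handoff shown above can be sketched roughly as follows: node 1 stores the state and the timestamp of its last evaluation, and when node 2 takes over it evaluates every interval that was missed. This is only an illustration under stated assumptions: timestamps are modeled as integer ticks, and `AlertState`, `missed_intervals`, and `catch_up` are hypothetical names, not the real service's API.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class AlertState:
    """Hypothetical persisted state for one alert, readable by any node."""
    state: str
    last_evaluation: int  # tick of the last interval that was evaluated


def missed_intervals(last_evaluation: int, now: int) -> List[int]:
    """Ticks after last_evaluation, up to and including now."""
    return list(range(last_evaluation + 1, now + 1))


def catch_up(stored: AlertState, now: int,
             evaluate: Callable[[int], str]) -> AlertState:
    """Evaluate each missed interval, oldest first, and return the new state."""
    state = stored.state
    last = stored.last_evaluation
    for tick in missed_intervals(stored.last_evaluation, now):
        state = evaluate(tick)
        last = tick
    return AlertState(state=state, last_evaluation=last)
```

With a stored state of `AlertState("ok", 0)` and `now=2`, the evaluator is called for ticks 1 and 2, which matches "evaluate t1 & t2" on the slide.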
time to detect in minutes
old: 2.5
new: 1.75 (30% lower)
alerting service
alerting service
alert scheduler
alert runner
start
alert source
zookeeper
configs
configs
timeseries db
alerting api
alert source
zone 1
zone 2
timeseries db
storage (Manhattan)
current state
eval
snooze
react
record
history recorder
notifier
stop?
alert evaluator
snoozed?
...
balancer
shards
shards
balancer
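One way to read the alert runner steps in the diagram (eval, snooze, react, record) is as a single loop iteration. The sketch below is an assumption-laden illustration: the `AlertRunner` class, its callables, and the in-memory lists standing in for the notifier and history recorder are hypothetical, and the real service's interfaces are not shown in the slides.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class AlertRunner:
    """Hypothetical single-alert runner following eval -> snooze -> react -> record."""
    evaluate: Callable[[], str]      # stands in for the alert evaluator
    is_snoozed: Callable[[], bool]   # stands in for the "snoozed?" check
    current_state: str = "ok"
    notifications: List[str] = field(default_factory=list)         # notifier stand-in
    history: List[Tuple[str, bool]] = field(default_factory=list)  # history recorder stand-in

    def run_once(self) -> str:
        new_state = self.evaluate()                  # eval
        snoozed = self.is_snoozed()                  # snooze
        if new_state != self.current_state and not snoozed:
            # react: notify only on a state change that is not snoozed
            self.notifications.append(f"{self.current_state} -> {new_state}")
        self.history.append((new_state, snoozed))    # record
        self.current_state = new_state
        return new_state
```

In this sketch a snoozed alert is still evaluated and recorded, so the history stays complete while notifications are suppressed. Whether the real system behaves this way is not stated in the slides.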
human reasoning
INTEGRATION
TESTING
bring together signals
postsumptio on flickr
CONTEXT
global context
(twitter)
peer context
(dependencies)
local context
(changes in my system)
runbook
contact
EMPOWER
HUMANS
elaine_macc on flickr
LESSONS LEARNED
seldonscott on flickr
distributed systems
Requirements
Challenges
alerting system distribution
Work is split into shards
Distributed Systems Design
engineering principles
(small)
END
sharding rulesets
[diagram: one Ruleset containing two Rules, each Rule expanding into several Fanouts]
sharding rulesets
[diagram: two Rulesets, each containing two Rules that expand into several Fanouts]
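A rough sketch of splitting work into shards with the ruleset as the unit of assignment, so that a ruleset's rules and fanouts stay together. Hash-based placement is an assumption chosen for illustration; the slides do not say how the balancer actually assigns shards, and `shard_for` and `assign` are hypothetical names.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List


def shard_for(ruleset_id: str, num_shards: int) -> int:
    """Map a ruleset to a shard with a stable hash (assumed placement scheme)."""
    digest = hashlib.sha256(ruleset_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def assign(ruleset_ids: Iterable[str], num_shards: int) -> Dict[int, List[str]]:
    """Group rulesets by shard; every rule in a ruleset lands on the same shard."""
    shards: Dict[int, List[str]] = defaultdict(list)
    for ruleset_id in ruleset_ids:
        shards[shard_for(ruleset_id, num_shards)].append(ruleset_id)
    return dict(shards)
```

A stable hash means any node can compute the same assignment without coordination, but a ruleset with a very large fanout can still make one shard much heavier than the others.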
distributed systems design
?
So many vows…
No matter what you do,
you’re forsaking one vow or the other.
users
Support
Collaboration
user support
Sisyphus, Marcell Jankovics
user support: front line
Interaction points
Helping out
user support: second line
User guides
Documentation lives with code (in a monorepo)
user support: third line
user collaboration
Migrations:
user collaboration
NEW!
Peter Trevelyan, Shifting Lines
thanks to
Ian Brown
Jonathan Cao
Hao Huang
Aras Saulys
Ning Wang
Si Wang
Mike Moreno
Caitie McCaffrey
Anthony Asta
JC Martin
Ryan O’Neill
Steven Parkes
Jacob Reiff
Yann Ramin
Michael Suzuki
Franklin Hu
Cory Watson
QUESTIONS?
if this sounds cool, come talk to us:
Justin Nguyen @jnguyen @justanguyen
Megan Kanne @megan @megankanne
Dan Sotolongo @sortalongo_ @sortalongo
#talk-megan-justin-dan