Scaling Graphite at Criteo
FOSDEM 2018 - “Not yet another talk about Prometheus”
Me
Corentin Chary
Twitter: @iksaif
Mail: c.chary@criteo.com
Graphite
Big
Storing time series in Cassandra and querying them from Graphite
Graphite
Carbon - The metric ingestion daemon
host123.cpu0.user 100.5 1517651768
<metric> <value> <timestamp>
c-cache
disk
c-relay
graphite-web - UI and API
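Carbon's plaintext protocol is simple enough that any client can speak it: write one `<metric> <value> <timestamp>` line per data point to the listener's TCP port. A minimal sketch (host and port 2003 are the conventional defaults, not taken from these slides):

```python
import socket
import time


def format_metric(path, value, timestamp=None):
    """Build one line of carbon's plaintext protocol:
    <metric> <value> <timestamp>\n"""
    if timestamp is None:
        timestamp = int(time.time())
    return "%s %s %d\n" % (path, value, timestamp)


def send_metric(path, value, host="localhost", port=2003):
    """Send a single data point to a carbon-cache or carbon-relay over TCP."""
    line = format_metric(path, value)
    with socket.create_connection((host, port)) as sock:
        sock.sendall(line.encode("ascii"))
```

Equivalent to `echo "host123.cpu0.user 100.5 $(date +%s)" | nc localhost 2003`.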
Our usage
architecture overview
Applications
(TCP)
carbon-relay (dc local)
carbon-relay
carbon-cache
graphite
API + UI
grafana
(UDP)
in-memory
persisted
(UDP)
DC 1
architecture overview (r=2)
Applications
carbon-relay
(UDP)
in-memory
(UDP)
DC 1
DC 2
current tools are improvable
solved problems?
BigGraphite
decisions, decisions
Target architecture overview
metrics
carbon-relay
carbon-cache
grafana
Cassandra
graphite
API + UI
carbon-cache
Cassandra
graphite
API + UI
Cassandra
plug’n’play
carbon (carbon.py)
update(uptime.nodeA, [now(), 42])
Graphite-Web (graphite.py)
find(uptime.*) -> [uptime.nodeA]
fetch(uptime.nodeA, now()-60, now())
Slightly more complicated than that…
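The plugin surface really is just those three calls: carbon pushes points with `update()`, graphite-web resolves globs with `find()` and reads windows with `fetch()`. A toy in-memory stand-in for the database layer (names mirror the slides; this is not the real biggraphite API):

```python
import fnmatch
import time


class InMemoryDatabase(object):
    """Toy database backend implementing the three entry points the
    slides describe. The real implementation adds caching, metadata,
    retention handling, etc."""

    def __init__(self):
        self._points = {}  # metric name -> list of (timestamp, value)

    # carbon side (carbon.py): called for each incoming data point
    def update(self, metric, point):
        self._points.setdefault(metric, []).append(point)

    # graphite-web side (graphite.py): glob-style metric name lookup
    def find(self, pattern):
        return [name for name in self._points if fnmatch.fnmatch(name, pattern)]

    # graphite-web side: read points inside a time window
    def fetch(self, metric, start, end):
        return [(ts, v) for ts, v in self._points.get(metric, [])
                if start <= ts <= end]


db = InMemoryDatabase()
now = int(time.time())
db.update("uptime.nodeA", (now, 42))
db.find("uptime.*")                      # -> ["uptime.nodeA"]
db.fetch("uptime.nodeA", now - 60, now)  # -> [(now, 42)]
```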
storing time series in Cassandra
<Cassandra>
Storing data in Cassandra
store( row, col, val )
H = hash( row )
Node = get_owner( H )
send( Node, (row, col, val) )
naïve schema
CREATE TABLE points (
  metric text,  -- Metric name
  time bigint,  -- Value timestamp
  value double, -- Point value
  PRIMARY KEY ((metric), time)
) WITH CLUSTERING ORDER BY (time DESC);
time sharding schema
CREATE TABLE IF NOT EXISTS %(table)s (
  metric uuid,           -- Metric UUID (actual name stored as metadata)
  time_start_ms bigint,  -- Lower time bound for this row
  offset smallint,       -- time_start_ms + offset * precision = timestamp
  value double,          -- Value for the point
  count int,             -- If value is a sum, divide by count to get the avg
  PRIMARY KEY ((metric, time_start_ms), offset)
) WITH CLUSTERING ORDER BY (offset DESC)
  AND default_time_to_live = %(default_time_to_live)d;
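The invariant in the `offset` comment can be turned into a small sharding function: pick the row a timestamp belongs to, then its position inside that row. A sketch assuming rows are aligned on multiples of the row span (the real BigGraphite layout may align rows differently); the stage parameters (60s precision, 2880 points per row) match the `datapoints_2880p_60s` table shown later:

```python
def shard(timestamp, precision=60, points_per_row=2880):
    """Map a timestamp (seconds) to the (time_start_ms, offset) pair used
    by the time-sharded schema, so that
    time_start_ms + offset * precision * 1000 recovers the point's time."""
    row_span = precision * points_per_row            # seconds covered by one row
    time_start = (timestamp // row_span) * row_span  # row lower bound
    offset = (timestamp - time_start) // precision   # point position in the row
    return time_start * 1000, offset
```

Sharding on `(metric, time_start_ms)` keeps partitions bounded in size and lets whole rows expire together via the TTL.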
demo (sort of)
cqlsh> select * from biggraphite.datapoints_2880p_60s limit 5;
 metric                               | time_start_ms | offset | count | value
--------------------------------------+---------------+--------+-------+-------
 7dfa0696-2d52-5d35-9cc9-114f5dccc1e4 | 1475040000000 |   1999 |     1 |  2019
 7dfa0696-2d52-5d35-9cc9-114f5dccc1e4 | 1475040000000 |   1998 |     1 |  2035
 7dfa0696-2d52-5d35-9cc9-114f5dccc1e4 | 1475040000000 |   1997 |     1 |  2031
 7dfa0696-2d52-5d35-9cc9-114f5dccc1e4 | 1475040000000 |   1996 |     1 |  2028
 7dfa0696-2d52-5d35-9cc9-114f5dccc1e4 | 1475040000000 |   1995 |     1 |  2028
(5 rows)
Partition key: (metric, time_start_ms) / Clustering key: offset / Values: value, count
Finding Nemo
we’re feeling SASI
do you even query?
</Cassandra>
And then?
BIGGEST GRAPHITE CLUSTER IN THE MULTIVERSE! (or not)
How to use it?
$ pip install biggraphite
$ bgutil syncdb  # create tables
$ bg-import-whisper /opt/graphite/storage/whisper  # import data

STORAGE_FINDERS = ['biggraphite.plugins.graphite.Finder']  # graphite-web

BG_DRIVER = cassandra  # carbon

Voilà! And you can use *both* whisper and biggraphite during the migration.
links of (potential) interest
Roadmap?
More Slides
Monitoring your monitoring!
Aggregation
Downsampling / Rollups / Resolutions / Retentions / ...
roll what? hmmm?
60s:8d (stage0)
1h:30d
1d:1y
sum(points)
sum(points)
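Each rollup step above sums the finer points into a coarser bucket; keeping `(sum, count)` pairs, as in the schema's `value`/`count` columns, lets readers recompute the average later (avg = sum / count). A sketch of one downsampling pass (function name and shapes are illustrative, not the real biggraphite code):

```python
def downsample(points, precision):
    """Roll points up into buckets of `precision` seconds, keeping
    (sum, count) per bucket so any stage can still derive the average.
    `points` is a list of (timestamp, value) pairs."""
    buckets = {}
    for ts, value in points:
        bucket_ts = (ts // precision) * precision
        total, count = buckets.get(bucket_ts, (0.0, 0))
        buckets[bucket_ts] = (total + value, count + 1)
    return sorted((ts, s, c) for ts, (s, c) in buckets.items())


# One hour of 60s points rolled up into a single 1h bucket:
points = [(t, 1.0) for t in range(0, 3600, 60)]
downsample(points, 3600)  # -> [(0, 60.0, 60)]
```

The same pass applied again (1h -> 1d) just sums the sums and counts, which is why sum/count is the aggregate stored rather than a precomputed average.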
What about aggregation?