1 of 27

What it is, what is new, and what’s coming

Prometheus

Julien Pivotto and Richard “RichiH” Hartmann

2 of 27

Prometheus

What it is


3 of 27

Prometheus 101

Inspired by Google's Borgmon
Time series database
int64 millisecond timestamp, float64 value
Over 1000 community-created instrumentation & exporters
Metrics, not logs
Dashboarding via Grafana


4 of 27

Main selling points

Highly dynamic, built-in service discovery
No hierarchical model, n-dimensional label set
PromQL: for processing, graphing, alerting, and export
Simple operation
Highly efficient
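A minimal sketch of the label model, assuming a series is identified by a metric name plus a flat set of labels. The `Series` type and the `select` helper are hypothetical names for illustration, not Prometheus APIs; the values are taken from the exposition example in this deck.

```python
from typing import Dict, List, NamedTuple


class Series(NamedTuple):
    name: str
    labels: Dict[str, str]
    value: float


def select(series: List[Series], name: str, **matchers: str) -> List[Series]:
    """Return series with the given name whose labels equal every matcher."""
    return [
        s for s in series
        if s.name == name
        and all(s.labels.get(k) == v for k, v in matchers.items())
    ]


SERIES = [
    Series("http_requests_total", {"env": "prod", "method": "post", "code": "200"}, 1027),
    Series("http_requests_total", {"env": "prod", "method": "get", "code": "200"}, 20),
    Series("http_requests_total", {"env": "test", "method": "post", "code": "200"}, 372),
]

# No hierarchy: the same data can be sliced along any label dimension.
prod_total = sum(s.value for s in select(SERIES, "http_requests_total", env="prod"))
post_total = sum(s.value for s in select(SERIES, "http_requests_total", method="post"))
```

Slicing by `env` or by `method` needs no change to how the data is stored, which is the point of a label set over a fixed hierarchy.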


5 of 27

Main selling points

Prometheus is a pull-based system
Black-box monitoring: Looking at a service from the outside (Does the server answer HTTP requests?)
White-box monitoring: Instrumenting code from the inside (How much time does this subroutine take?)
Every service should have its own metrics endpoint
Hard API commitments within major versions
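A rough sketch of the pull model using only the Python standard library: the service exposes a `/metrics` endpoint and a scraper fetches it over HTTP. The metric name `demo_requests_total`, the `MetricsHandler` class, and the `scrape` helper are assumptions made for this example; a real service would normally use an official client library instead.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class MetricsHandler(BaseHTTPRequestHandler):
    requests_total = 0  # hypothetical application counter

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = f"demo_requests_total {self.requests_total}\n".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def scrape(host: str, port: int) -> str:
    """Pull the current metrics, as a monitoring server would."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", "/metrics")
        return conn.getresponse().read().decode()
    finally:
        conn.close()


server = HTTPServer(("127.0.0.1", 0), MetricsHandler)  # port 0: OS picks a free port
threading.Thread(target=server.serve_forever, daemon=True).start()

MetricsHandler.requests_total = 3  # the service does some work
scraped = scrape("127.0.0.1", server.server_port)

server.shutdown()
server.server_close()
```

The service only has to answer a GET; when and how often to collect is decided by whoever scrapes.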


6 of 27

Time series

Time series are recorded values that change over time
Individual events are usually merged into counters and/or histograms
Changing values are recorded as gauges
Typical examples

Access rates to a web server (counter)
Temperatures in a data center (gauge)
Service latency (histograms)
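To make the three types concrete, here is a simplified sketch. The `Counter`, `Gauge`, and `Histogram` classes below are hypothetical stand-ins written for this example, not the official client library, and the bucket bounds are arbitrary.

```python
from bisect import bisect_left


class Counter:
    """Event count that only ever goes up."""

    def __init__(self):
        self.value = 0.0

    def inc(self, amount=1.0):
        if amount < 0:
            raise ValueError("counters only go up")
        self.value += amount


class Gauge:
    """Current value that can go up and down."""

    def __init__(self):
        self.value = 0.0

    def set(self, value):
        self.value = float(value)


class Histogram:
    """Observations counted into buckets with inclusive upper bounds."""

    def __init__(self, bounds):
        self.bounds = sorted(bounds)
        self.counts = [0] * (len(self.bounds) + 1)  # last slot is +Inf
        self.sum = 0.0

    def observe(self, value):
        # bisect_left finds the first bound >= value, i.e. the "le" bucket.
        self.counts[bisect_left(self.bounds, value)] += 1
        self.sum += value

    def cumulative(self):
        """Cumulative counts per bucket, ending with the +Inf bucket."""
        total, out = 0, []
        for count in self.counts:
            total += count
            out.append(total)
        return out


requests = Counter()          # access rate to a web server
for _ in range(3):
    requests.inc()

temperature = Gauge()         # temperature in a data center
temperature.set(21.5)

latency = Histogram([0.1, 0.5, 1.0])  # service latency in seconds
for seconds in (0.05, 0.3, 0.5, 2.0):
    latency.observe(seconds)
```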


7 of 27

Super easy to emit, parse & read

http_requests_total{env="prod",method="post",code="200"} 1027

http_requests_total{env="prod",method="post",code="400"} 3

http_requests_total{env="prod",method="post",code="500"} 12

http_requests_total{env="prod",method="get",code="200"} 20

http_requests_total{env="test",method="post",code="200"} 372

http_requests_total{env="test",method="post",code="400"} 75
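How easy is "easy to parse"? A small sketch, assuming the simple case shown on this slide: one sample per line, no escaped characters in label values, and no trailing timestamp. `parse_sample` is a hypothetical helper, not part of any Prometheus library.

```python
import re

LINE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'   # metric name
    r'(?:\{(?P<labels>.*)\})?'               # optional {label="value",...}
    r'\s+(?P<value>\S+)$'                    # sample value
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_sample(line):
    """Split one exposition line into (name, labels, value)."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a sample line: {line!r}")
    labels = dict(LABEL_RE.findall(match.group("labels") or ""))
    return match.group("name"), labels, float(match.group("value"))


name, labels, value = parse_sample(
    'http_requests_total{env="prod",method="post",code="200"} 1027'
)
```

Two regular expressions cover the lines above; the full format also allows comments, escapes, and timestamps, which this sketch ignores.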


8 of 27

Scale

Kubernetes is ~Borg
Prometheus is ~Borgmon, but with Monarch APIs
Google couldn't have run Borg without Borgmon (and Omega and Monarch)
Kubernetes & Prometheus are designed and written with each other in mind


9 of 27

Scale

2,500,000+ samples/second/instance
60,000+ samples/second/core
16 bytes/sample compressed to 1.36 bytes/sample

The highest we saw in production on a single Prometheus instance was 125,000,000 active time series at once!
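Where the 16 bytes come from, as a sketch: one sample is a 64-bit timestamp plus a 64-bit float. The layout below (little-endian, signed millisecond timestamp) is an assumption chosen for illustration and is not the on-disk format; the 1.36 bytes/sample figure is the one quoted on this slide.

```python
import struct


def encode_sample(timestamp_ms: int, value: float) -> bytes:
    """Pack one raw, uncompressed sample: int64 timestamp + float64 value."""
    return struct.pack("<qd", timestamp_ms, value)


def decode_sample(raw: bytes):
    """Inverse of encode_sample."""
    return struct.unpack("<qd", raw)


raw = encode_sample(1_600_000_000_000, 1027.0)
raw_size = len(raw)

# Ratio between the raw size and the compressed size quoted above.
compression_ratio = raw_size / 1.36
```

That is roughly a twelvefold reduction per sample between the raw layout and the compressed figure.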


10 of 27

Long-term storage

Two long-term storage solutions have Prometheus-team members working on them

Thanos

Historically easier to run, but slower
Scales storage horizontally

Cortex

Easy to run these days
Scales storage, ingester, and querier horizontally

Both converge on tech again; I have annoyed people with “Corthanos” for years


11 of 27

Prometheus

What is new?


12 of 27

Service discoveries

In the last year, we added 6 new service discoveries

DigitalOcean
Scaleway
Hetzner
Eureka
Docker
Docker Swarm


13 of 27

Basic Authentication / TLS

Prometheus has gained support for TLS/basic auth server side
A new “exporter toolkit” has been created for the Go exporters
TLS/Basic auth is being added to more and more exporters


14 of 27

PromQL

New functions, like last_over_time
@ modifier

rate(container_cpu_usage[5m]) and topk(4, rate(container_cpu_usage[5m] @ end()))

Negative offsets

up offset -5m

Composite durations, e.g. 1h30m.
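What a composite duration means, as a sketch: each `<number><unit>` part is converted and the parts are summed. `parse_duration_ms` is a hypothetical helper, not the PromQL parser; it assumes a day is 24h, a week 7d, and a year 365d, and it does not enforce any ordering of units.

```python
import re

UNIT_MS = {
    "ms": 1,
    "s": 1_000,
    "m": 60_000,
    "h": 3_600_000,
    "d": 86_400_000,
    "w": 604_800_000,
    "y": 31_536_000_000,
}
# "ms" is listed before the single letters so that "5ms" is not read as "5m".
TOKEN_RE = re.compile(r"(\d+)(ms|[smhdwy])")
FULL_RE = re.compile(r"(?:\d+(?:ms|[smhdwy]))+")


def parse_duration_ms(text: str) -> int:
    """Convert a duration such as '1h30m' into milliseconds."""
    if not FULL_RE.fullmatch(text):
        raise ValueError(f"not a duration: {text!r}")
    return sum(int(number) * UNIT_MS[unit] for number, unit in TOKEN_RE.findall(text))


ninety_minutes = parse_duration_ms("1h30m")
five_minutes = parse_duration_ms("5m")
```

So `1h30m` and `90m` describe the same range.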


15 of 27

Remote Write receiver

Prometheus can receive metrics from Remote Write
Writing from one Prometheus server to another
Enables new use cases, like Prometheus “on the edge”


16 of 27

UI

Prometheus has switched to the React UI by default
New fully-featured editor, with label autocompletion and snippets
Dark theme


17 of 27

Exemplars

http_request_seconds{le="5.0"} 9036.32 # {trace_id="KOO5S4vxi0o"} 0.67

Attach external data to metrics (trace ID)
Easily jump from metrics to traces
Grafana supports them
Trace & span ID format taken from W3C tracing specs
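A sketch of how the exemplar rides along with the sample: everything after ` # ` is a second, smaller record with its own labels and value. `split_exemplar` is a hypothetical helper written for this example; it assumes no ` # ` sequence or escaped quotes inside label values.

```python
import re

LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def split_exemplar(line: str):
    """Return (sample_text, exemplar) where exemplar is a dict or None."""
    sample, separator, exemplar = line.partition(" # ")
    if not separator:
        return sample.strip(), None
    labels_part, _, rest = exemplar.strip().partition("} ")
    fields = rest.split()
    return sample.strip(), {
        "labels": dict(LABEL_RE.findall(labels_part)),
        "value": float(fields[0]),
        # An exemplar may carry its own timestamp after the value.
        "timestamp": float(fields[1]) if len(fields) > 1 else None,
    }


sample_text, exemplar = split_exemplar(
    'http_request_seconds{le="5.0"} 9036.32 # {trace_id="KOO5S4vxi0o"} 0.67'
)
trace_id = exemplar["labels"]["trace_id"]
```

The `trace_id` is what a dashboard can use to jump from the metric to the matching trace.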


18 of 27

Alertmanager

Time-based muting

Do not send alerts on weekends or outside business hours
Controlled per route

Negative matchers

Silence alerts that do not match certain labels
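The logic behind both features, as a sketch. This is not Alertmanager code or configuration: `outside_business_hours` and `matches` are hypothetical helpers, the 09:00 to 17:00 window is an arbitrary assumption, and treating a missing label as an empty string is an assumption of this example.

```python
from datetime import datetime, time


def outside_business_hours(now: datetime, start=time(9, 0), end=time(17, 0)) -> bool:
    """True on Saturday or Sunday, or outside [start, end) on a weekday."""
    if now.weekday() >= 5:  # Monday is 0, so 5 and 6 are the weekend
        return True
    return not (start <= now.time() < end)


def matches(labels: dict, matchers: list) -> bool:
    """Check labels against (name, op, value) matchers; op is '=' or '!='."""
    for name, op, value in matchers:
        actual = labels.get(name, "")  # assumption: missing label == ""
        if op == "=" and actual != value:
            return False
        if op == "!=" and actual == value:
            return False
    return True


saturday_noon = datetime(2021, 5, 8, 12, 0)
wednesday_morning = datetime(2021, 5, 5, 10, 0)

# A negative matcher: applies to everything that is NOT the test environment.
not_test = [("env", "!=", "test")]
prod_alert = {"alertname": "HighLatency", "env": "prod"}
test_alert = {"alertname": "HighLatency", "env": "test"}
```

A route would mute notifications while `outside_business_hours` is true, and a silence built from `not_test` would cover `prod_alert` but leave `test_alert` alone.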


19 of 27

Prometheus

What is coming


20 of 27

Aggressively open

Historically, Prometheus has been conservative even with features marked EXPERIMENTAL

We treated them as stable

Revisiting a lot of old assumptions, and enabling more use cases
Make our code more modular, easier to re-use
Lots of work on Agents and data pipelines; support all deployment and operating models in upstream https://github.com/prometheus
Mixins out of the box
