Prometheus

What it is, what is new, and what’s coming

Julien Pivotto and Richard “RichiH” Hartmann

Prometheus

What it is

Prometheus 101

  • Inspired by Google's Borgmon
  • Time series database
  • int64 millisecond timestamps, float64 values
  • Over 1,000 community-created instrumentation libraries & exporters
  • Metrics, not logs
  • Dashboarding via Grafana
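A sketch of this data model, assuming a toy in-memory representation (the `Sample` and `Series` names are hypothetical, not Prometheus's Go types): a series is identified by its full label set, with the metric name stored as the `__name__` label, and each raw sample is an int64 millisecond timestamp plus a float64 value, 16 bytes before compression.

```python
import struct
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Sample:
    timestamp_ms: int  # milliseconds since the Unix epoch, signed 64-bit
    value: float       # IEEE 754 double precision (float64)

    def to_bytes(self) -> bytes:
        # Uncompressed little-endian encoding: 8-byte int64 + 8-byte float64.
        return struct.pack("<qd", self.timestamp_ms, self.value)


@dataclass
class Series:
    # A series' identity is its complete label set, including "__name__".
    labels: frozenset
    samples: list = field(default_factory=list)

    def append(self, sample: Sample) -> None:
        # Append-only: new samples are added at the end of the series.
        self.samples.append(sample)


up = Series(labels=frozenset({("__name__", "up"), ("job", "node")}))
up.append(Sample(timestamp_ms=1_620_000_000_000, value=1.0))
print(len(up.samples[0].to_bytes()))  # 16
```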

Main selling points

  • Highly dynamic, built-in service discovery
  • No hierarchical model, n-dimensional label set
  • PromQL: for processing, graphing, alerting, and export
  • Simple operation
  • Highly efficient

Main selling points

  • Prometheus is a pull-based system
  • Black-box monitoring: looking at a service from the outside (does the server respond to HTTP requests?)
  • White-box monitoring: instrumenting code from the inside (how much time does this subroutine take?)
  • Every service should have its own metrics endpoint
  • Hard API commitments within major versions
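To make "pull-based" and "its own metrics endpoint" concrete, here is a stdlib-only sketch of a service exposing `/metrics` in the text format, plus the plain HTTP GET a scraper performs. The handler, metric name, and port handling are hypothetical; real services would use an official client library rather than formatting the text by hand.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical application state: one counter, guarded by a lock because
# ThreadingHTTPServer handles each request in its own thread.
_lock = threading.Lock()
_requests_total = {"get": 0}


class AppHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/metrics":
            with _lock:
                count = _requests_total["get"]
            body = (
                "# HELP app_requests_total Requests handled by the app.\n"
                "# TYPE app_requests_total counter\n"
                f'app_requests_total{{method="get"}} {count}\n'
            )
            self._reply(body, "text/plain; version=0.0.4; charset=utf-8")
        else:
            # Any other path is the "application"; count the request.
            with _lock:
                _requests_total["get"] += 1
            self._reply("hello\n", "text/plain; charset=utf-8")

    def _reply(self, body: str, content_type: str) -> None:
        payload = body.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, format, *args):
        pass  # silence per-request logging in this example


def serve(port: int = 0) -> ThreadingHTTPServer:
    """Run the app in a background thread; port 0 lets the OS pick a free port."""
    server = ThreadingHTTPServer(("127.0.0.1", port), AppHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def scrape(url: str) -> str:
    """What a pull-based scraper does on every interval: a plain HTTP GET."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return response.read().decode("utf-8")
```

The service only has to answer GETs; deciding when and how often to collect stays with the Prometheus server.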

Time series

  • Time series are recorded values that change over time
  • Individual events are usually merged into counters and/or histograms
  • Changing values are recorded as gauges
  • Typical examples
    • Access rates to a web server (counter)
    • Temperatures in a data center (gauge)
    • Service latency (histogram)
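A sketch of the three metric types above, assuming a single-threaded, in-memory toy (the class names are hypothetical; real services use an official client library). Histogram buckets are cumulative: each `le` ("less than or equal") bucket also counts every observation in the smaller buckets, and a final `+Inf` bucket holds the total.

```python
import bisect
import math


class Counter:
    """Monotonically increasing value, e.g. requests served."""

    def __init__(self):
        self.value = 0.0

    def inc(self, amount: float = 1.0) -> None:
        if amount < 0:
            raise ValueError("counters can only go up")
        self.value += amount


class Gauge:
    """Value that can go up and down, e.g. a temperature."""

    def __init__(self):
        self.value = 0.0

    def set(self, value: float) -> None:
        self.value = value


class Histogram:
    """Counts observations into buckets keyed by their upper bound (le)."""

    def __init__(self, bounds):
        self.bounds = sorted(bounds) + [math.inf]
        self.counts = [0] * len(self.bounds)  # per-bucket, not yet cumulative
        self.sum = 0.0
        self.count = 0

    def observe(self, value: float) -> None:
        # First bucket whose upper bound is >= value.
        self.counts[bisect.bisect_left(self.bounds, value)] += 1
        self.sum += value
        self.count += 1

    def cumulative(self):
        """(le, count) pairs as exposed; each includes all smaller buckets."""
        total, out = 0, []
        for bound, n in zip(self.bounds, self.counts):
            total += n
            out.append((bound, total))
        return out


latency = Histogram([0.1, 0.5, 1.0])
for seconds in (0.05, 0.3, 0.3, 2.0):
    latency.observe(seconds)
print(latency.cumulative())  # [(0.1, 1), (0.5, 3), (1.0, 3), (inf, 4)]
```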

Super easy to emit, parse & read

http_requests_total{env="prod",method="post",code="200"} 1027
http_requests_total{env="prod",method="post",code="400"} 3
http_requests_total{env="prod",method="post",code="500"} 12
http_requests_total{env="prod",method="get",code="200"} 20
http_requests_total{env="test",method="post",code="200"} 372
http_requests_total{env="test",method="post",code="400"} 75
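Because each line is just `name{labels} value`, a few lines of code can parse it. A stdlib-only sketch (it ignores timestamps, escaped quotes, and `#` comment lines, and the helper names are hypothetical) that sums the samples above per `env`, roughly what `sum by (env) (http_requests_total)` returns at one instant:

```python
import re
from collections import defaultdict

# name{labels} value; timestamps and escaped quotes are not handled.
LINE = re.compile(r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)$')
LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_line(line: str):
    """Return (metric name, label dict, value) for one sample line."""
    m = LINE.match(line.strip())
    if m is None:
        raise ValueError(f"not a sample line: {line!r}")
    labels = dict(LABEL.findall(m.group("labels") or ""))
    return m.group("name"), labels, float(m.group("value"))


def sum_by(lines, label: str) -> dict:
    """Sum sample values grouped by one label, skipping blanks and comments."""
    totals = defaultdict(float)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        _, labels, value = parse_line(line)
        totals[labels.get(label, "")] += value
    return dict(totals)


EXPOSITION = """\
http_requests_total{env="prod",method="post",code="200"} 1027
http_requests_total{env="prod",method="post",code="400"} 3
http_requests_total{env="prod",method="post",code="500"} 12
http_requests_total{env="prod",method="get",code="200"} 20
http_requests_total{env="test",method="post",code="200"} 372
http_requests_total{env="test",method="post",code="400"} 75
"""
print(sum_by(EXPOSITION.splitlines(), "env"))  # {'prod': 1062.0, 'test': 447.0}
```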

Scale

  • Kubernetes is ~Borg
  • Prometheus is ~Borgmon, but with Monarch APIs
  • Google couldn't have run Borg without Borgmon (and Omega and Monarch)
  • Kubernetes & Prometheus are designed and written with each other in mind

Scale

  • 2,500,000+ samples/second/instance
  • 60,000+ samples/second/core
  • 16 bytes/sample compressed to 1.36 bytes/sample

The highest number we have seen in production on a single Prometheus instance was 125,000,000 active time series at once!
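A back-of-the-envelope check on these figures, assuming the 16 raw bytes are an 8-byte timestamp plus an 8-byte value and that one instance ingests at the quoted peak rate all day; index, WAL, and compaction overhead are ignored, so real disk usage will differ.

```python
RAW_BYTES_PER_SAMPLE = 16           # 8-byte int64 timestamp + 8-byte float64 value
COMPRESSED_BYTES_PER_SAMPLE = 1.36  # figure quoted above
INGEST_RATE = 2_500_000             # samples/second on one instance (quoted above)
SECONDS_PER_DAY = 86_400

ratio = RAW_BYTES_PER_SAMPLE / COMPRESSED_BYTES_PER_SAMPLE
bytes_per_day = INGEST_RATE * SECONDS_PER_DAY * COMPRESSED_BYTES_PER_SAMPLE

print(f"compression ratio: {ratio:.1f}x")            # compression ratio: 11.8x
print(f"disk per day: {bytes_per_day / 1e9:.0f} GB")  # disk per day: 294 GB
```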

Long-term storage

  • Two long-term storage solutions have Prometheus-team members working on them
    • Thanos
      • Historically easier to run, but slower
      • Scales storage horizontally
    • Cortex
      • Easy to run these days
      • Scales storage, ingester, and querier horizontally
  • Both are converging technically again; I have annoyed people with “Corthanos” for years

Prometheus

What is new?

Service discoveries

  • In the last year, we added six new service discoveries
    • DigitalOcean
    • Scaleway
    • Hetzner
    • Eureka
    • Docker
    • Docker Swarm

Basic Authentication / TLS

  • Prometheus has gained server-side support for TLS and basic auth
  • A new “exporter toolkit” has been created for the Go exporters
  • TLS and basic auth are being added to more and more exporters
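On the client side, scraping or querying a server with TLS and basic auth enabled is plain HTTPS with an `Authorization: Basic` header (RFC 7617). A minimal stdlib-only sketch; the URL, credentials, and CA file are placeholders, and the helper names are hypothetical.

```python
import base64
import ssl
import urllib.request
from typing import Optional


def build_request(url: str, username: str, password: str) -> urllib.request.Request:
    """GET request carrying HTTP Basic credentials (RFC 7617)."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def fetch(request: urllib.request.Request, ca_file: Optional[str] = None) -> str:
    """Send the request over TLS, verifying the server against ca_file if given."""
    context = ssl.create_default_context(cafile=ca_file)
    with urllib.request.urlopen(request, context=context, timeout=10) as response:
        return response.read().decode("utf-8")


# Placeholder host and credentials.
req = build_request("https://prometheus.example.com:9090/metrics", "admin", "s3cret")
```

Base64 is an encoding, not encryption, which is why basic auth only makes sense together with TLS.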

PromQL

  • New functions, like last_over_time
  • @ modifier
    • rate(container_cpu_usage[5m]) and topk(4, rate(container_cpu_usage[5m] @ end()))
  • Negative offsets
    • up offset -5m
  • Composite durations, e.g. 1h30m.
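A minimal sketch of how composite durations and `offset` fit together. The unit list (y, w, d, h, m, s, ms, largest first, each at most once) reflects our understanding of PromQL durations, and the function names are hypothetical. `x offset d` reads data from `d` before the evaluation time, so a negative offset such as `-5m` looks 5 minutes ahead.

```python
import re

# Duration units, largest first, in milliseconds. Each unit may appear at most
# once and only in this order (e.g. "1h30m", but not "30m1h").
UNITS_MS = [
    ("y", 365 * 24 * 60 * 60 * 1000),
    ("w", 7 * 24 * 60 * 60 * 1000),
    ("d", 24 * 60 * 60 * 1000),
    ("h", 60 * 60 * 1000),
    ("m", 60 * 1000),
    ("s", 1000),
    ("ms", 1),
]
DURATION = re.compile("^" + "".join(rf"(?:(\d+){unit})?" for unit, _ in UNITS_MS) + "$")


def parse_duration_ms(text: str) -> int:
    """Parse a composite duration such as "1h30m" into milliseconds."""
    match = DURATION.match(text)
    if not text or match is None:
        raise ValueError(f"invalid duration: {text!r}")
    return sum(int(n) * factor for n, (_, factor) in zip(match.groups(), UNITS_MS) if n)


def evaluation_time_ms(query_time_ms: int, offset: str) -> int:
    """Time at which `x offset <offset>` reads x; negative offsets look ahead."""
    if offset.startswith("-"):
        return query_time_ms + parse_duration_ms(offset[1:])
    return query_time_ms - parse_duration_ms(offset)


print(parse_duration_ms("1h30m"))  # 5400000
```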

Remote Write receiver

  • Prometheus can receive metrics via Remote Write
  • Writing from one Prometheus server to another
  • Enables new use cases, like Prometheus “on the edge”

UI

  • Prometheus has switched to the React UI by default
  • New fully-featured editor with label autocompletion and snippets
  • Dark theme

Exemplars

http_request_seconds{le="5.0"} 9036.32 # {trace_id="KOO5S4vxi0o"} 0.67

  • Attach external data to metrics (trace ID)
  • Easily jump from metrics to traces
  • Grafana supports them
  • Trace & span ID format taken from W3C tracing specs
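A simplified sketch of splitting a sample from its exemplar: everything after ` # ` is the exemplar's label set, value, and optional timestamp. It assumes no ` # ` appears inside a label value and no escaped quotes, both of which a real OpenMetrics parser must handle; the names are hypothetical.

```python
import re
from typing import NamedTuple, Optional

_SAMPLE = re.compile(r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)$')
_EXEMPLAR = re.compile(r'^\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)(?:\s+(?P<ts>\S+))?$')
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


class Exemplar(NamedTuple):
    labels: dict                # e.g. {"trace_id": "..."}
    value: float                # the observation this exemplar stands for
    timestamp: Optional[float]  # optional in the exposition format


class ParsedLine(NamedTuple):
    name: str
    labels: dict
    value: float
    exemplar: Optional[Exemplar]


def parse_line(line: str) -> ParsedLine:
    # Simplification: assumes " # " never occurs inside a label value.
    sample_part, _, exemplar_part = line.strip().partition(" # ")
    m = _SAMPLE.match(sample_part.strip())
    if m is None:
        raise ValueError(f"not a sample line: {line!r}")
    exemplar = None
    if exemplar_part:
        e = _EXEMPLAR.match(exemplar_part.strip())
        if e is None:
            raise ValueError(f"malformed exemplar: {exemplar_part!r}")
        ts = float(e.group("ts")) if e.group("ts") else None
        exemplar = Exemplar(dict(_LABEL.findall(e.group("labels"))), float(e.group("value")), ts)
    labels = dict(_LABEL.findall(m.group("labels") or ""))
    return ParsedLine(m.group("name"), labels, float(m.group("value")), exemplar)


line = parse_line('http_request_seconds{le="5.0"} 9036.32 # {trace_id="KOO5S4vxi0o"} 0.67')
print(line.exemplar.labels["trace_id"])  # KOO5S4vxi0o
```

The extracted `trace_id` is what a dashboard would use to link from the metric to the matching trace.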

Alertmanager

  • Time-based muting
    • Do not send alerts on weekends or outside business hours
    • Controlled per route
  • Negative matchers
    • Silence alerts that do not match certain labels
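A sketch of both semantics, assuming naive local timestamps and hypothetical Monday to Friday, 09:00 to 17:00 business hours (in Alertmanager these are configured per route in YAML, not coded). The matchers follow the Prometheus conventions as we understand them: `!=` and `!~` are the negative forms, regexes are fully anchored, and a missing label compares as the empty string.

```python
import re
from datetime import datetime, time

# Hypothetical business hours for illustration.
BUSINESS_DAYS = range(0, 5)  # Monday=0 ... Friday=4
BUSINESS_START, BUSINESS_END = time(9, 0), time(17, 0)


def is_muted(now: datetime) -> bool:
    """True when `now` falls on a weekend or outside business hours."""
    in_hours = BUSINESS_START <= now.time() < BUSINESS_END
    return now.weekday() not in BUSINESS_DAYS or not in_hours


def matches(labels: dict, name: str, op: str, value: str) -> bool:
    """Evaluate one label matcher; a missing label compares as ""."""
    actual = labels.get(name, "")
    if op == "=":
        return actual == value
    if op == "!=":
        return actual != value
    if op == "=~":
        return re.fullmatch(value, actual) is not None  # fully anchored regex
    if op == "!~":
        return re.fullmatch(value, actual) is None
    raise ValueError(f"unknown matcher operator: {op}")


# A silence with env!="prod" applies to this alert.
alert = {"alertname": "HighLatency", "env": "staging"}
print(matches(alert, "env", "!=", "prod"))  # True
```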

Prometheus

What is coming

Aggressively open

  • Historically, Prometheus has been conservative even with features marked EXPERIMENTAL
    • We treated them as stable
  • Revisiting a lot of old assumptions, and enabling more use cases
  • Make our code more modular, easier to re-use
  • Lots of work on agents and data pipelines, to support all deployment and operating models upstream in https://github.com/prometheus
  • Mixins out of the box

Aggressively open

Imitation is the sincerest form of flattery

Oscar Wilde

CNCF End User Survey on Observability

Working towards full compatibility (Mar 2021)

Tests, compliance, and compatibility

Share your thoughts!
