1 of 10

Prometheus operator

Madushan Nishantha - CMS

jlmadushan@gmail.com

2 of 10

Operators?

  • A pattern for encoding domain-specific knowledge in a Kubernetes-native way
  • Written by domain experts for a specific piece of software
  • CoreOS created the Operator Framework (github.com/operator-framework)
  • Check out more operators on OperatorHub (operatorhub.io)

3 of 10

Prometheus operator

  • End-to-end Prometheus monitoring solution built by CoreOS
  • Built on top of the operator framework
  • Uses CRDs (custom resource definitions) to store configuration as code
  • Provides live configuration reload
  • Check out the code (github.com/coreos/prometheus-operator)

4 of 10

Prometheus operator CRDs

  • Alertmanager
    • alertmanagers.monitoring.coreos.com
    • manages Alertmanager deployment and config
  • Prometheus
    • prometheuses.monitoring.coreos.com
    • manages Prometheus deployment and config
  • PrometheusRule
    • prometheusrules.monitoring.coreos.com
    • alerting rules
  • ServiceMonitor
    • servicemonitors.monitoring.coreos.com
    • scrape config/targets
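
As a rough illustration of how alerting rules are stored in a CRD, here is a minimal PrometheusRule sketch. The resource name, the `role` label, and the alert itself are hypothetical; the labels have to match whatever ruleSelector your Prometheus resource is configured with.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-rules          # hypothetical name
  labels:
    role: alert-rules          # assumed label; must match the Prometheus ruleSelector
spec:
  groups:
    - name: example.rules
      rules:
        # Fires when a scrape target has been unreachable for 5 minutes
        - alert: TargetDown
          expr: up == 0
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "Target {{ $labels.instance }} is down"
```

The operator watches for resources like this and regenerates the rule files that Prometheus loads, which is what makes the live reload possible.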

5 of 10

How to install?

  • Multiple methods

  • Helm <3
    • https://git.io/fjcjq
  • Kube-prometheus
    • https://git.io/fjcjY
    • uses jsonnet
  • Plain Kubernetes manifests
    • Applied manually
  • Requires Kubernetes >= 1.10
  • We’re using Helm

6 of 10

Helm values

  • Minimal helm values file
    • https://git.io/fjCU8
  • Enables the following components
    • prometheus-operator
    • Prometheus (HA)
    • Alertmanager (HA)
    • Grafana + related dashboards
    • metrics exporters for Kubernetes system components (apiserver, etc.)
    • metrics exporters for Kubernetes nodes (node-exporter)
    • basic Alertmanager and alerting config
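
A values file enabling these components might look roughly like the sketch below. The key names are recalled from the stable/prometheus-operator chart and may differ between chart versions, and the replica counts are assumptions; treat the linked values file as the authoritative version.

```yaml
prometheusOperator:
  enabled: true

prometheus:
  prometheusSpec:
    replicas: 2                # assumed HA replica count

alertmanager:
  alertmanagerSpec:
    replicas: 2                # assumed HA replica count

grafana:
  enabled: true
  defaultDashboardsEnabled: true

# Exporters / scrape targets for cluster components and nodes
kubeApiServer:
  enabled: true
nodeExporter:
  enabled: true

# Default alerting rules shipped with the chart
defaultRules:
  create: true
```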

7 of 10

Target discovery

  • Done using ServiceMonitors
  • Most official Helm charts have built-in ServiceMonitors
    • e.g. prometheus-pushgateway / nginx-ingress
  • The pod/service must expose a Prometheus-style metrics endpoint
  • Sample ServiceMonitor
    • https://git.io/fjCUB
    • make sure to add the label “release” with exactly the same value as your Prometheus release name from Helm
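
For reference, a ServiceMonitor along these lines could look like the sketch below. The names, namespaces, port name, and the release value `prometheus-operator` are all assumptions for illustration; substitute the values from your own Helm release and Service.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app                  # hypothetical name
  namespace: monitoring              # assumed namespace
  labels:
    release: prometheus-operator     # must equal your Prometheus Helm release name (assumed value)
spec:
  # Select the Service(s) to scrape by label
  selector:
    matchLabels:
      app: example-app               # hypothetical label on the target Service
  # Namespace(s) where those Services live
  namespaceSelector:
    matchNames:
      - default
  endpoints:
    - port: metrics                  # name of the Service port exposing metrics (assumed)
      path: /metrics
      interval: 30s
```

If the target does not show up in Prometheus, a mismatched “release” label is the first thing to check.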

8 of 10

Data retention

  • Prometheus has no official way to retain or back up old data
  • Two ways to work around this
  • persistent volume
    • enough for most deployments
    • Sample https://git.io/fjCUg
  • Thanos
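
For the persistent volume option, the Helm values could be sketched as follows. The retention period, storage class, and size are placeholder assumptions, and the key names are recalled from the stable/prometheus-operator chart, so check them against the linked sample.

```yaml
prometheus:
  prometheusSpec:
    retention: 30d                   # assumed retention period
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: standard # assumed storage class
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 50Gi          # assumed size; estimate from ingest rate and retention
```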

9 of 10

Thanos

  • Data retention layer built for Prometheus
  • Backs up data to object storage such as S3
  • Provides a global query API that can be added to Grafana, etc.
  • Can be used with prometheus-operator
    • operator helm values: https://git.io/fjCUa
    • thanos components: https://git.io/fjCUV
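
As a rough sketch of the operator side, the Thanos sidecar can be pointed at a Secret holding the object storage config. The Secret name, key, bucket, and endpoint below are hypothetical, credentials are left out, and field names may vary with the operator and Thanos versions, so the linked files remain the reference.

```yaml
# Helm values for prometheus-operator (sketch)
prometheus:
  prometheusSpec:
    thanos:
      objectStorageConfig:
        name: thanos-objstore-config   # hypothetical Secret name
        key: objstore.yml              # key inside that Secret
---
# Contents of objstore.yml stored in the Secret (sketch, credentials omitted)
type: S3
config:
  bucket: example-metrics-bucket       # hypothetical bucket
  endpoint: s3.us-east-1.amazonaws.com # assumed endpoint
```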

10 of 10

Done

Thank You!