Ashwin Sriram

What’s New in Prometheus Operator

About Me

  • Maintainer at Prometheus Operator (Docs and Website)
  • GSoC’24 @Prometheus-Operator
  • Student, IIT BHU
  • Upcoming Software Engineer @Deutsche Bank

Looking Back at Sept 2023: Goals

  • DaemonSet Mode for Prometheus Agent
  • Status Subresource
  • Shard Autoscaling
  • Scrape Classes

Shard Autoscaling

  • Automatic scaling of Prometheus shards with HorizontalPodAutoscalers (HPAs), based on resource or custom metrics (proposal)
  • Uses the scale subresource to change the shard count
  • Graceful shutdown of removed shards:
    • Agent mode: flush remote-write queues before exiting
    • Server mode: retain pods temporarily to avoid data loss
  • Limitation: PVCs are not automatically deleted after the retention period expires (#6833)
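To make the HPA part concrete: an HPA pointed at the scale subresource computes the desired shard count with the standard Kubernetes rule desired = ceil(current × currentMetric / targetMetric), and leaves the count unchanged when the ratio is within a tolerance (0.1 by default). The Go sketch below reproduces only that arithmetic, as a simplified illustration; `desiredShards`, its min/max clamping parameters, and the sample numbers are hypothetical, and the real controller's stabilization windows and scaling policies are not modeled.

```go
package main

import (
	"fmt"
	"math"
)

// desiredShards applies the Kubernetes HPA scaling rule to a Prometheus
// shard count (hypothetical helper, for illustration only):
//
//	desired = ceil(current * currentMetric / targetMetric)
//
// Ratios within `tolerance` of 1.0 leave the shard count unchanged, and the
// result is clamped to [minShards, maxShards]. Stabilization windows and
// scaling policies of the real HPA controller are not modeled.
func desiredShards(current int32, currentMetric, targetMetric, tolerance float64, minShards, maxShards int32) int32 {
	ratio := currentMetric / targetMetric
	desired := current
	if math.Abs(ratio-1.0) > tolerance {
		desired = int32(math.Ceil(float64(current) * ratio))
	}
	if desired < minShards {
		desired = minShards
	}
	if desired > maxShards {
		desired = maxShards
	}
	return desired
}

func main() {
	// 2 shards averaging 1.5x the target metric: scale out.
	fmt.Println(desiredShards(2, 1.5, 1.0, 0.1, 1, 10)) // 3
	// 4 shards at 1.05x the target: inside the 10% tolerance, no change.
	fmt.Println(desiredShards(4, 1.05, 1.0, 0.1, 1, 10)) // 4
	// 4 shards at 0.4x the target: scale in.
	fmt.Println(desiredShards(4, 0.4, 1.0, 0.1, 1, 10)) // 2
}
```

Whenever the result is lower than the current count, the shards being removed are the ones that go through the graceful shutdown listed above.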

Scrape Classes

  • Lets admins define common configuration settings once and apply them to scrape resources (ServiceMonitors, PodMonitors, etc.), either by class name or through a default class
  • Loosely similar to StorageClass in Kubernetes
  • Reduces repetition of complex configuration across scrape resources
  • Example: scraping Pods in an Istio mesh with strict mTLS
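A rough mental model of how a scrape class gets applied: a resource selects a class by name (or falls back to the default class), fields the resource sets itself take precedence over the class, and the class relabelings are placed before the resource's own. The Go sketch below encodes that model under those assumptions; every type here is a simplified, hypothetical stand-in rather than the operator's API types, and the Istio certificate paths are assumed mount points.

```go
package main

import (
	"errors"
	"fmt"
)

// TLSConfig is a simplified, hypothetical stand-in for the operator's TLS settings.
type TLSConfig struct {
	CAFile, CertFile, KeyFile string
}

// ScrapeClass holds settings shared by every scrape resource that uses it.
type ScrapeClass struct {
	Name        string
	Default     bool
	TLS         TLSConfig
	Relabelings []string // simplified: one string per relabel rule
}

// ScrapeResource stands in for a ServiceMonitor, PodMonitor, etc.
type ScrapeResource struct {
	Name        string
	ScrapeClass string // empty: fall back to the default class, if any
	TLS         TLSConfig
	Relabelings []string
}

// pick returns the resource's own value when set, otherwise the class value.
func pick(classValue, override string) string {
	if override != "" {
		return override
	}
	return classValue
}

// effectiveConfig merges the selected class into a resource: fields set on
// the resource win, and class relabelings run before the resource's own.
func effectiveConfig(classes []ScrapeClass, r ScrapeResource) (TLSConfig, []string, error) {
	var class *ScrapeClass
	for i := range classes {
		c := &classes[i]
		if (r.ScrapeClass != "" && c.Name == r.ScrapeClass) || (r.ScrapeClass == "" && c.Default) {
			class = c
			break
		}
	}
	if class == nil {
		if r.ScrapeClass != "" {
			return TLSConfig{}, nil, errors.New("unknown scrape class: " + r.ScrapeClass)
		}
		return r.TLS, r.Relabelings, nil // no class applies
	}
	tls := TLSConfig{
		CAFile:   pick(class.TLS.CAFile, r.TLS.CAFile),
		CertFile: pick(class.TLS.CertFile, r.TLS.CertFile),
		KeyFile:  pick(class.TLS.KeyFile, r.TLS.KeyFile),
	}
	relabelings := append(append([]string{}, class.Relabelings...), r.Relabelings...)
	return tls, relabelings, nil
}

func main() {
	// Hypothetical default class for an Istio mesh with strict mTLS; the
	// certificate paths are assumed mount points for Istio's certificates.
	istio := ScrapeClass{
		Name:    "istio-mtls",
		Default: true,
		TLS: TLSConfig{
			CAFile:   "/etc/prom-certs/root-cert.pem",
			CertFile: "/etc/prom-certs/cert-chain.pem",
			KeyFile:  "/etc/prom-certs/key.pem",
		},
		Relabelings: []string{"class:add-mesh-label"},
	}
	// The resource repeats none of the TLS settings; it inherits them.
	sm := ScrapeResource{Name: "my-app", Relabelings: []string{"resource:drop-debug"}}

	tls, relabelings, err := effectiveConfig([]ScrapeClass{istio}, sm)
	if err != nil {
		panic(err)
	}
	fmt.Println(tls.CAFile)  // /etc/prom-certs/root-cert.pem
	fmt.Println(relabelings) // [class:add-mesh-label resource:drop-debug]
}
```

Here the resource carries no TLS settings of its own yet ends up with the full mTLS configuration, which is the repetition scrape classes aim to remove.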

Poctl

  • CLI for managing Prometheus Operator resources
  • Aims to make deployment, troubleshooting, and validation easier
  • Example:
    • `poctl create servicemonitor [flags]` -> creates a ServiceMonitor object in the Kubernetes cluster
    • Flags such as -n (namespace), -p (port), and -s (service name)
  • EXPERIMENTAL: feedback needed
  • GitHub
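What `poctl create servicemonitor` ends up creating is an ordinary ServiceMonitor. The Go sketch below only illustrates how the three flags could map onto such a manifest; it is not poctl's code. The `app: <service>` selector label, treating `-p` as the Service port name, the flag defaults, and printing JSON instead of applying it to a cluster are all assumptions (poctl may, for instance, derive the selector from the Service's own labels).

```go
package main

import (
	"encoding/json"
	"flag"
	"fmt"
	"os"
)

// Minimal subset of the ServiceMonitor schema (monitoring.coreos.com/v1).
type ServiceMonitor struct {
	APIVersion string   `json:"apiVersion"`
	Kind       string   `json:"kind"`
	Metadata   Metadata `json:"metadata"`
	Spec       Spec     `json:"spec"`
}

type Metadata struct {
	Name      string `json:"name"`
	Namespace string `json:"namespace"`
}

type Spec struct {
	Selector  Selector   `json:"selector"`
	Endpoints []Endpoint `json:"endpoints"`
}

type Selector struct {
	MatchLabels map[string]string `json:"matchLabels"`
}

type Endpoint struct {
	Port string `json:"port"` // name of the Service port to scrape
}

// buildServiceMonitor is a hypothetical helper mapping the -s, -n and -p
// values onto a ServiceMonitor. The "app" selector label is an assumption.
func buildServiceMonitor(service, namespace, port string) ServiceMonitor {
	return ServiceMonitor{
		APIVersion: "monitoring.coreos.com/v1",
		Kind:       "ServiceMonitor",
		Metadata:   Metadata{Name: service, Namespace: namespace},
		Spec: Spec{
			Selector:  Selector{MatchLabels: map[string]string{"app": service}},
			Endpoints: []Endpoint{{Port: port}},
		},
	}
}

func main() {
	// Same short flags as on the slide; the defaults are made up.
	namespace := flag.String("n", "default", "namespace")
	port := flag.String("p", "metrics", "name of the Service port to scrape (assumed)")
	service := flag.String("s", "", "service name")
	flag.Parse()
	if *service == "" {
		fmt.Fprintln(os.Stderr, "-s (service name) is required")
		os.Exit(2)
	}

	out, err := json.MarshalIndent(buildServiceMonitor(*service, *namespace, *port), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

For example, `go run . -s my-app -n monitoring -p web` would print a ServiceMonitor named `my-app` in `monitoring` that scrapes the Service port named `web`.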

Other New Additions

  • Enhanced service discovery: ScrapeConfig now supports 22 service discovery mechanisms
  • Prometheus 3.0 compatible ✅
  • Support for Remote-Write 2.0
  • Remote write support for ThanosRuler (#7444)
  • Cluster mTLS configuration for Alertmanager (#7149)

What to Expect in the Future

  • Zone-aware sharding (#6437)
  • Remote Write CRD (proposal)
  • ScrapeConfig graduating to v1beta1 🚀
  • DaemonSet mode
  • Status subresource for configuration resources (ServiceMonitors, PodMonitors, etc.)

DaemonSet Mode for Prometheus Agent

Status Subresource

  • Prometheus Operator does not reflect the status of objects such as ServiceMonitors and PodMonitors
  • Example: how many targets are being scraped?
  • Proposal for a status subresource
  • GSoC mentee: Yash Kumar Patel

Great!! How do I get involved?

Say Hi 👋 on Slack: #prometheus-operator-dev

Drop in!! 📅 Office hour meetings (biweekly on Mondays, 11:00 UTC)

Start by exploring the docs 📖 and help me improve them as well 😅

Open a PR! Flex your skills 💪💪 GitHub

Thank You