1 of 104

The Key to Go Efficiency is Just a Few Go Runtime Metrics Away!

Arianna Vespri, Independent SWE

Bartek Płotka, Senior SWE at Google

GopherCon UK; London; 16.08.2024

feat.

vesari

bwplotka

2 of 104

Wheel of Misfortune

3 of 104

Wheel of Misfortune: Incident!

5 of 104

Wheel of Misfortune: Resolving Incident

7 of 104

Wheel of Misfortune: We could have this data!

11 of 104

Wheel of Misfortune: Context cancellation?

14 of 104

Learning!

  • Always cancel what has to be cancelled (:
  • Defer in loop == trouble
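
A minimal sketch of both learnings, assuming a hypothetical `process` helper that stands in for real per-item work. A `defer cancel()` written directly in a loop body only runs when the enclosing function returns, so every iteration's context and timer stays alive until then. Wrapping one iteration in its own function makes the defer fire per iteration.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// process is a hypothetical stand-in for real work that honours ctx.
func process(ctx context.Context, item int) error {
	return ctx.Err() // nil while the context is still live
}

// processAll gives every item its own timeout and releases it right away.
// It returns how many contexts were already cancelled when the loop finished.
func processAll(items []int) (int, error) {
	ctxs := make([]context.Context, 0, len(items))
	for _, item := range items {
		// The closure scopes the defer to a single iteration. A bare
		// `defer cancel()` here would pile up until processAll returns.
		err := func() error {
			ctx, cancel := context.WithTimeout(context.Background(), time.Second)
			defer cancel()
			ctxs = append(ctxs, ctx)
			return process(ctx, item)
		}()
		if err != nil {
			return 0, err
		}
	}
	cancelled := 0
	for _, ctx := range ctxs {
		if ctx.Err() != nil {
			cancelled++
		}
	}
	return cancelled, nil
}

func main() {
	n, err := processAll([]int{1, 2, 3})
	fmt.Println(n, err) // 3 <nil>
}
```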

Wheel of Misfortune: Context cancellation?

15 of 104

Wheel of Misfortune: We could have this data!

16 of 104

Agenda

Should Devs Know How to Monitor Go Runtime?

How Do I Collect Important Metrics?

What Metrics to Collect? How to Act on Them?

18 of 104

Who are we?

Arianna Vespri

  • Go developer with 6 years of experience
  • OSS Contributor of Prometheus client_golang
  • Background as music industry professional
  • Synthesizers, history of art and gardening

21 of 104

Who are we?

Bartłomiej Płotka

Senior Software Engineer @ Google

  • Tech Lead for Google Cloud Managed Service for Prometheus
  • OSS Maintainer e.g. Prometheus, client_golang, Thanos and more (mostly Go libs/projects)
  • Tech Lead for CNCF TAG Observability
  • Efficient Go book author: https://www.bwplotka.dev/book
  • Motorcycling!

22 of 104

Yesterday we talked about Go Runtime

23 of 104

Go Runtime

28 of 104

Should Devs Know How to Monitor Go Runtime?

29 of 104

“Bunch of useless stuff”

# HELP brokenapp_operation_latency_seconds Tracks the latencies for calls.

# TYPE brokenapp_operation_latency_seconds histogram

brokenapp_operation_latency_seconds_bucket{le="0.01"} 0

brokenapp_operation_latency_seconds_bucket{le="0.05"} 1755

brokenapp_operation_latency_seconds_bucket{le="0.1"} 1755

brokenapp_operation_latency_seconds_bucket{le="0.3"} 1755

brokenapp_operation_latency_seconds_bucket{le="0.6"} 1755

brokenapp_operation_latency_seconds_bucket{le="1"} 1755

brokenapp_operation_latency_seconds_bucket{le="3"} 1755

brokenapp_operation_latency_seconds_bucket{le="6"} 1755

brokenapp_operation_latency_seconds_bucket{le="9"} 1755

brokenapp_operation_latency_seconds_bucket{le="20"} 1755

brokenapp_operation_latency_seconds_bucket{le="+Inf"} 1755

brokenapp_operation_latency_seconds_sum 29.664131651999906

brokenapp_operation_latency_seconds_count 1755

# HELP go_build_info Build information about the main Go module.

# TYPE go_build_info gauge

go_build_info{checksum="",path="github.com/prometheus/client_golang/tutorials/runtime/wheelofmisfortune",version="(devel)"} 1

# HELP go_gc_duration_seconds A summary of the wall-time pause (stop-the-world) duration in garbage collection cycles.

# TYPE go_gc_duration_seconds summary

go_gc_duration_seconds{quantile="0"} 4.8028e-05

go_gc_duration_seconds{quantile="0.25"} 7.5145e-05

go_gc_duration_seconds{quantile="0.5"} 8.5528e-05

go_gc_duration_seconds{quantile="0.75"} 9.9265e-05

go_gc_duration_seconds{quantile="1"} 0.000184435

go_gc_duration_seconds_sum 0.002283468

go_gc_duration_seconds_count 25

# HELP go_gc_gogc_percent Heap size target percentage configured by the user, otherwise 100. This value is set by the GOGC environment variable, and the runtime/debug.SetGCPercent function.

# TYPE go_gc_gogc_percent gauge

go_gc_gogc_percent 100

# HELP go_gc_gomemlimit_bytes Go runtime memory limit configured by the user, otherwise math.MaxInt64. This value is set by the GOMEMLIMIT environment variable, and the runtime/debug.SetMemoryLimit function.

# TYPE go_gc_gomemlimit_bytes gauge

go_gc_gomemlimit_bytes 9.223372036854776e+18

# HELP go_goroutines Number of goroutines that currently exist.

# TYPE go_goroutines gauge

go_goroutines 11

# HELP go_info Information about the Go environment.

# TYPE go_info gauge

go_info{version="go1.22.6"} 1

# HELP go_memstats_alloc_bytes Number of bytes allocated and currently in use.

# TYPE go_memstats_alloc_bytes gauge

go_memstats_alloc_bytes 3.670424e+06

# HELP go_memstats_alloc_bytes_total Total number of bytes allocated until now, even if released already.

# TYPE go_memstats_alloc_bytes_total counter

go_memstats_alloc_bytes_total 5.173828e+07

# HELP go_memstats_buck_hash_sys_bytes Number of bytes used by the profiling bucket hash table.

# TYPE go_memstats_buck_hash_sys_bytes gauge

go_memstats_buck_hash_sys_bytes 1.455208e+06

# HELP go_memstats_frees_total Total number of heap objects frees.

# TYPE go_memstats_frees_total counter

go_memstats_frees_total 158618

# HELP go_memstats_gc_sys_bytes Number of bytes used for garbage collection system metadata.

# TYPE go_memstats_gc_sys_bytes gauge

go_memstats_gc_sys_bytes 2.707048e+06

# HELP go_memstats_heap_alloc_bytes Number of heap bytes allocated and currently in use.

# TYPE go_memstats_heap_alloc_bytes gauge

go_memstats_heap_alloc_bytes 3.670424e+06

# HELP go_memstats_heap_idle_bytes Number of heap bytes waiting to be used.

# TYPE go_memstats_heap_idle_bytes gauge

go_memstats_heap_idle_bytes 6.26688e+06

# HELP go_memstats_heap_inuse_bytes Number of heap bytes that are in use.

# TYPE go_memstats_heap_inuse_bytes gauge

go_memstats_heap_inuse_bytes 5.382144e+06

# HELP go_memstats_heap_objects Number of currently allocated objects.

# TYPE go_memstats_heap_objects gauge

go_memstats_heap_objects 10157

# HELP go_memstats_heap_released_bytes Number of heap bytes released to OS.

# TYPE go_memstats_heap_released_bytes gauge

go_memstats_heap_released_bytes 3.178496e+06

# HELP go_memstats_heap_sys_bytes Number of heap bytes obtained from system.

# TYPE go_memstats_heap_sys_bytes gauge

go_memstats_heap_sys_bytes 1.1649024e+07

# HELP go_memstats_last_gc_time_seconds Number of seconds since 1970 of last garbage collection.

# TYPE go_memstats_last_gc_time_seconds gauge

go_memstats_last_gc_time_seconds 1.7234457369405274e+09

# HELP go_memstats_lookups_total Total number of pointer lookups.

# TYPE go_memstats_lookups_total counter

go_memstats_lookups_total 0

# HELP go_memstats_mallocs_total Total number of heap objects allocated, both live and gc-ed. Semantically a counter version for go_memstats_heap_objects gauge.

# TYPE go_memstats_mallocs_total counter

go_memstats_mallocs_total 168775

# HELP go_memstats_mcache_inuse_bytes Number of bytes in use by mcache structures.

# TYPE go_memstats_mcache_inuse_bytes gauge

go_memstats_mcache_inuse_bytes 4800

# HELP go_memstats_mcache_sys_bytes Number of bytes used for mcache structures obtained from system.

# TYPE go_memstats_mcache_sys_bytes gauge

go_memstats_mcache_sys_bytes 15600

# HELP go_memstats_mspan_inuse_bytes Number of bytes in use by mspan structures.

# TYPE go_memstats_mspan_inuse_bytes gauge

go_memstats_mspan_inuse_bytes 109600

# HELP go_memstats_mspan_sys_bytes Number of bytes used for mspan structures obtained from system.

# TYPE go_memstats_mspan_sys_bytes gauge

go_memstats_mspan_sys_bytes 114240

# HELP go_memstats_next_gc_bytes Number of heap bytes when next garbage collection will take place.

# TYPE go_memstats_next_gc_bytes gauge

go_memstats_next_gc_bytes 7.092992e+06

# HELP go_memstats_other_sys_bytes Number of bytes used for other system allocations.

# TYPE go_memstats_other_sys_bytes gauge

go_memstats_other_sys_bytes 1.08724e+06

# HELP go_memstats_stack_inuse_bytes Number of bytes in use by the stack allocator.

# TYPE go_memstats_stack_inuse_bytes gauge

go_memstats_stack_inuse_bytes 884736

# HELP go_memstats_stack_sys_bytes Number of bytes obtained from system for stack allocator.

# TYPE go_memstats_stack_sys_bytes gauge

go_memstats_stack_sys_bytes 884736

# HELP go_memstats_sys_bytes Number of bytes obtained from system.

# TYPE go_memstats_sys_bytes gauge

go_memstats_sys_bytes 1.7913096e+07

# HELP go_sched_gomaxprocs_threads The current runtime.GOMAXPROCS setting, or the number of operating system threads that can execute user-level Go code simultaneously.

# TYPE go_sched_gomaxprocs_threads gauge

go_sched_gomaxprocs_threads 4

# HELP go_sched_latencies_seconds Distribution of the time goroutines have spent in the scheduler in a runnable state before actually running. Bucket counts increase monotonically.

# TYPE go_sched_latencies_seconds histogram

go_sched_latencies_seconds_bucket{le="6.399999999999999e-08"} 1065

go_sched_latencies_seconds_bucket{le="6.399999999999999e-07"} 1109

go_sched_latencies_seconds_bucket{le="7.167999999999999e-06"} 1512

go_sched_latencies_seconds_bucket{le="8.191999999999999e-05"} 2100

go_sched_latencies_seconds_bucket{le="0.0009175039999999999"} 2433

go_sched_latencies_seconds_bucket{le="0.010485759999999998"} 2435

go_sched_latencies_seconds_bucket{le="0.11744051199999998"} 2435

go_sched_latencies_seconds_bucket{le="+Inf"} 2435

go_sched_latencies_seconds_sum 0.033589888

go_sched_latencies_seconds_count 2435

# HELP go_threads Number of OS threads created.

# TYPE go_threads gauge

go_threads 10

# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.

# TYPE process_cpu_seconds_total counter

process_cpu_seconds_total 30.34

# HELP process_max_fds Maximum number of open file descriptors.

# TYPE process_max_fds gauge

process_max_fds 1.048576e+06

# HELP process_network_receive_bytes_total Number of bytes received by the process over the network.

# TYPE process_network_receive_bytes_total counter

process_network_receive_bytes_total 67306

# HELP process_network_transmit_bytes_total Number of bytes sent by the process over the network.

# TYPE process_network_transmit_bytes_total counter

process_network_transmit_bytes_total 309783

# HELP process_open_fds Number of open file descriptors.

# TYPE process_open_fds gauge

process_open_fds 10

# HELP process_resident_memory_bytes Resident memory size in bytes.

# TYPE process_resident_memory_bytes gauge

process_resident_memory_bytes 1.6089088e+07

# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.

# TYPE process_start_time_seconds gauge

process_start_time_seconds 1.72344400053e+09

# HELP process_virtual_memory_bytes Virtual memory size in bytes.

# TYPE process_virtual_memory_bytes gauge

process_virtual_memory_bytes 1.264934912e+09

# HELP process_virtual_memory_max_bytes Maximum amount of virtual memory available in bytes.

# TYPE process_virtual_memory_max_bytes gauge

process_virtual_memory_max_bytes 1.8446744073709552e+19

30 of 104

Devs vs Ops: Monitoring is a common responsibility!

[Diagram: ops and devs meet at MONITORING (features vs reliability)]

35 of 104

Why Should I Monitor the Go Runtime?

  • GC tuning
  • Understanding your code’s allocations
  • Understanding your code’s concurrency
  • Advanced: Improving Go Runtime code itself!

36 of 104

But… Profiling Solves this, no?

37 of 104

How Do I Collect Runtime Metrics?

38 of 104

Go Runtime

How to collect runtime statistics from my app?

39 of 104

Prometheus Go Collector

41 of 104

Prometheus Go Collector

[Screenshot: the beginning of the /metrics output shown on slide 29, cut off mid-line]

HTTP /metrics

(OpenMetrics or Prometheus format)

45 of 104

Prometheus Go Collector: Default Metrics

48 of 104

Gaps?

That’s it? What about…?

  • Detailed GC statistics?
  • Detailed memory statistics?
  • Detailed scheduler statistics?
  • How many Go-to-C (cgo) calls?
  • Currently configured GOGC, GOMEMLIMIT?
  • Mutex wait times?
  • GODEBUG behaviours?

50 of 104

Welcome runtime/metrics!

…and more useful metrics, evolving with every Go version!
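
A small sketch of reading everything runtime/metrics offers, using only the standard library. The helper name `readAll` is ours. The set of metrics returned depends on the Go version the program is built with.

```go
package main

import (
	"fmt"
	"runtime/metrics"
)

// readAll samples every metric supported by the current Go version.
func readAll() []metrics.Sample {
	descs := metrics.All()
	samples := make([]metrics.Sample, len(descs))
	for i, d := range descs {
		samples[i].Name = d.Name
	}
	metrics.Read(samples)
	return samples
}

func main() {
	for _, s := range readAll() {
		switch s.Value.Kind() {
		case metrics.KindUint64:
			fmt.Printf("%s = %d\n", s.Name, s.Value.Uint64())
		case metrics.KindFloat64:
			fmt.Printf("%s = %g\n", s.Name, s.Value.Float64())
		case metrics.KindFloat64Histogram:
			h := s.Value.Float64Histogram()
			fmt.Printf("%s = histogram with %d buckets\n", s.Name, len(h.Counts))
		}
	}
}
```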

51 of 104

Prometheus Go Collector: runtime/metrics

53 of 104

Prometheus Go Collector: runtime/metrics

Are those valid Prometheus metric names?
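
Names such as /gc/heap/allocs:bytes contain "/" and "-", which classic (non-UTF-8) Prometheus metric names do not allow, and ":" is reserved for recording rules. The sketch below shows the kind of renaming needed. `promName` is a simplified, hypothetical helper and not the exact algorithm client_golang uses.

```go
package main

import (
	"fmt"
	"strings"
)

// promName maps a runtime/metrics name ("/path/to/metric:unit") to a
// Prometheus-style name. Simplified for illustration only.
func promName(runtimeName string, cumulative bool) (string, bool) {
	path, unit, ok := strings.Cut(runtimeName, ":")
	if !ok || !strings.HasPrefix(path, "/") {
		return "", false
	}
	r := strings.NewReplacer("/", "_", "-", "_")
	name := "go" + r.Replace(path) + "_" + r.Replace(unit)
	if cumulative {
		name += "_total" // cumulative metrics become Prometheus counters
	}
	return name, true
}

func main() {
	for _, m := range []struct {
		name       string
		cumulative bool
	}{
		{"/gc/heap/allocs:bytes", true},
		{"/sched/gomaxprocs:threads", false},
	} {
		n, _ := promName(m.name, m.cumulative)
		fmt.Println(m.name, "->", n)
	}
	// /gc/heap/allocs:bytes -> go_gc_heap_allocs_bytes_total
	// /sched/gomaxprocs:threads -> go_sched_gomaxprocs_threads
}
```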

55 of 104

Tip

The “_total” suffix immediately tells you a metric is cumulative, e.g. counting bytes since the program start!

Prometheus Go Collector: runtime/metrics

56 of 104

All metrics

59 of 104

Memstats metrics are calculated from runtime/metrics

# HELP go_memstats_alloc_bytes Number of bytes allocated in heap and currently in use. Equals to /memory/classes/heap/objects:bytes.

# TYPE go_memstats_alloc_bytes gauge

go_memstats_alloc_bytes 10241231

# HELP go_memstats_alloc_bytes_total Total number of bytes allocated in heap until now, even if released already. Equals to /gc/heap/allocs:bytes.

# TYPE go_memstats_alloc_bytes_total counter

go_memstats_alloc_bytes_total 410241587

61 of 104

Tip

Every metric exposed across a fleet adds $ to the monitoring bill (less than other o11y signals, but still).

All metrics - not ideal by default

62 of 104

Prometheus Go Collector: Pick Your Metrics

64 of 104

Prometheus Go Collector: Pick Your Metrics

Wait… But what metrics should I pick?

65 of 104

Recommended Metrics

66 of 104

Recommended runtime/metrics

67 of 104

Result: Enhanced Go Collector Default!

68 of 104

Warning

Renaming metrics is not trivial!

(it affects learning resources and downstream automation like alerting, recording rules, dashboards, self-healing, etc.)

Result: Enhanced Go Collector Default!

69 of 104

Result: Enhanced Go Collector Default!

Default should be as close to Go team recommendations as possible!

70 of 104

What Metrics to Collect? How to Act on Them?

71 of 104

What did I just deploy?

72 of 104

Go Version Information

go_info

Metric for: Go build info e.g. version of Go.

Collect: to keep track of Go versions in your applications.

Act: by upgrading the Go version used to build your application.

73 of 104

GOMAXPROCS

go_sched_gomaxprocs_threads

Metric for: GOMAXPROCS setting (/sched/gomaxprocs:threads)

Collect: to check how much parallelism is available to an application.

Act: by adjusting GOMAXPROCS to optimize concurrency and CPU resource utilization.
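
A short sketch of where this value comes from, using only the standard library. The helper name `gomaxprocsFromMetrics` is ours. Comparing the setting with the CPUs actually available to the process (for example a container CPU limit, which this sketch does not read) is one common use.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/metrics"
)

// gomaxprocsFromMetrics reads /sched/gomaxprocs:threads via runtime/metrics.
func gomaxprocsFromMetrics() (uint64, bool) {
	s := []metrics.Sample{{Name: "/sched/gomaxprocs:threads"}}
	metrics.Read(s)
	if s[0].Value.Kind() != metrics.KindUint64 {
		return 0, false // metric not supported by this Go version
	}
	return s[0].Value.Uint64(), true
}

func main() {
	// GOMAXPROCS(0) queries the setting without changing it.
	fmt.Println("GOMAXPROCS:", runtime.GOMAXPROCS(0), "CPUs visible:", runtime.NumCPU())
	if v, ok := gomaxprocsFromMetrics(); ok {
		fmt.Println("/sched/gomaxprocs:threads:", v)
	}
}
```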

77 of 104

GOGC

GOGC = 100: the heap may grow by +100% over the live heap before the next GC cycle.

go_gc_gogc_percent

Metric for: GOGC (/gc/gogc:percent).

Collect: to understand GC behaviour.

Act: by tuning GOGC.
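
A rough sketch of the arithmetic behind GOGC, assuming 10 MiB of live heap after the last GC. `heapGoal` is a hypothetical helper: the real runtime also accounts for GC roots (stacks, globals) and for GOMEMLIMIT, which are ignored here.

```go
package main

import "fmt"

// heapGoal approximates the heap size that triggers the next GC cycle:
// the live heap plus GOGC percent of it. gogc must be >= 0.
func heapGoal(liveHeapBytes uint64, gogc int) uint64 {
	return liveHeapBytes + liveHeapBytes*uint64(gogc)/100
}

func main() {
	const live = 10 << 20 // assumed: 10 MiB live heap
	for _, gogc := range []int{50, 100, 120} {
		fmt.Printf("GOGC=%d -> next GC at ~%d MiB\n", gogc, heapGoal(live, gogc)>>20)
	}
	// GOGC=50 -> next GC at ~15 MiB
	// GOGC=100 -> next GC at ~20 MiB
	// GOGC=120 -> next GC at ~22 MiB
}
```

A lower GOGC means a lower goal, so GC runs more often and spends more CPU. A higher GOGC trades memory for fewer cycles.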

78 of 104

GOGC

GOGC = 50: +50% (more frequent GC)

GOGC = 120: +120% (less frequent GC)

79 of 104

GOGC

[Diagram: GOGC set to 70, 50 and 100]

82 of 104

GOMEMLIMIT

[Diagram: GOGC = tooHighValue lets the heap grow by + tooHighValue%]

go_gc_gomemlimit_bytes

Metric for: GOMEMLIMIT (/gc/gomemlimit:bytes).

Collect: to track the soft memory limit of the runtime

Act: by tuning GOMEMLIMIT
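
A minimal sketch of reading and setting the soft limit from code with runtime/debug.SetMemoryLimit. The 90% headroom ratio, the 512 MiB container limit and the helper names are assumptions for illustration. Real code would read the container limit from cgroups.

```go
package main

import (
	"fmt"
	"math"
	"runtime/debug"
)

// limitFor returns a soft limit that leaves headroom below a container's
// hard limit. The 90% ratio is an assumption, not a recommendation.
func limitFor(containerLimitBytes int64) int64 {
	return containerLimitBytes / 10 * 9
}

// applyLimit sets the soft limit and returns the value the runtime now reports.
func applyLimit(containerLimitBytes int64) int64 {
	debug.SetMemoryLimit(limitFor(containerLimitBytes))
	// A negative argument reads the current limit without changing it.
	return debug.SetMemoryLimit(-1)
}

func main() {
	if debug.SetMemoryLimit(-1) == math.MaxInt64 {
		fmt.Println("no GOMEMLIMIT configured")
	}
	// Hypothetical 512 MiB container limit.
	fmt.Println("soft limit now:", applyLimit(512<<20), "bytes")
}
```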

84 of 104

GOMEMLIMIT

[Diagram: GOMEMLIMIT with heap growth of + okValue%, in containers]

86 of 104

GOMEMLIMIT

Too low GOMEMLIMIT: thrashing!!

87 of 104

GOMEMLIMIT

[Diagram: AUTOGOMEMLIMIT with limits of 70 MiB, 50 MiB and 100 MiB]

88 of 104

Is My Concurrency Healthy?

89 of 104

Number of goroutines

go_goroutines

Metric for: the number of currently existing goroutines (/sched/goroutines:goroutines)

Collect: in most circumstances, especially in highly concurrent applications; also to troubleshoot OOM kills.

Act: by correcting/refactoring the concurrency patterns in your code.
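
A tiny sketch of what a goroutine leak looks like from the runtime's point of view. `leak` and `grewBy` are hypothetical helpers. runtime.NumGoroutine reports the same count that go_goroutines exposes.

```go
package main

import (
	"fmt"
	"runtime"
)

// leak starts n goroutines that block forever on a channel nobody writes to.
func leak(n int) {
	for i := 0; i < n; i++ {
		ch := make(chan struct{})
		go func() { <-ch }()
	}
}

// grewBy reports how many goroutines exist beyond the given baseline.
func grewBy(baseline int) int {
	return runtime.NumGoroutine() - baseline
}

func main() {
	base := runtime.NumGoroutine()
	leak(100)
	fmt.Println("goroutines added and never finished:", grewBy(base))
}
```

On a dashboard, a leak like this shows up as a goroutine count that climbs and never comes back down.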

91 of 104

How much time goroutines wait for scheduling

go_sched_latencies_seconds

Metric for: the distribution of time the goroutines spend runnable but not running (/sched/latencies:seconds).

Collect: when you need visibility into scheduling latency under overall system load, or when you suspect uneven load.

Act: by optimizing your code’s concurrency patterns.
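
A sketch of turning the scheduler latency histogram into a single number, using only the standard library. `quantile` is our own helper and returns a bucket upper bound, so the result is an estimate limited by bucket resolution.

```go
package main

import (
	"fmt"
	"math"
	"runtime/metrics"
)

// quantile returns the upper edge of the first bucket at which the
// cumulative count reaches q (0 < q <= 1). buckets has len(counts)+1 edges.
func quantile(counts []uint64, buckets []float64, q float64) float64 {
	var total uint64
	for _, c := range counts {
		total += c
	}
	if total == 0 {
		return 0
	}
	target := uint64(math.Ceil(q * float64(total)))
	var seen uint64
	for i, c := range counts {
		seen += c
		if seen >= target {
			return buckets[i+1]
		}
	}
	return buckets[len(buckets)-1]
}

// schedLatencies reads /sched/latencies:seconds, or returns nil if unsupported.
func schedLatencies() *metrics.Float64Histogram {
	s := []metrics.Sample{{Name: "/sched/latencies:seconds"}}
	metrics.Read(s)
	if s[0].Value.Kind() != metrics.KindFloat64Histogram {
		return nil
	}
	return s[0].Value.Float64Histogram()
}

func main() {
	h := schedLatencies()
	if h == nil {
		fmt.Println("metric not supported by this Go version")
		return
	}
	fmt.Printf("p99 scheduling latency <= %gs\n", quantile(h.Counts, h.Buckets, 0.99))
}
```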

92 of 104

Is My Memory OK?

93 of 104

In-use Heap Allocated Memory

go_memstats_alloc_bytes or go_memstats_heap_alloc_bytes

Metric for: the number of heap bytes allocated and currently in use (/memory/classes/heap/objects:bytes)

Collect: to plan for memory usage and in case of degraded performance or crashes.

Act: by eliminating possible memory leaks and optimizing GC configuration.

95 of 104

Total Heap Allocated Memory

go_memstats_alloc_bytes_total

Metric for: number of heap bytes allocated since the program start, even if released already (/gc/heap/allocs:bytes)

Collect: to understand allocation patterns and GC resource cost impact

Act: by debugging memory leaks, fine-tuning GC configuration

~30 KB/s allocated!

96 of 104

In-use & Total Heap Allocated Objects

go_memstats_mallocs_total

Metric for: the total number of heap objects allocated, semantically a counter version for go_memstats_heap_objects (/gc/heap/allocs:objects + /gc/heap/tiny/allocs:objects).

Collect: to understand GC behaviour and memory allocation cost impact; identify memory leaks

Act: by tuning GC

go_memstats_heap_objects

Metric for: the number of allocated heap objects (/gc/heap/objects:objects)

Collect: to identify memory leaks and understand memory footprint

Act: by refactoring code, tuning GC settings

Objects have different sizes.
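
A worked example relating the object metrics, using the values from the /metrics output on slide 29. Live objects are allocations minus frees. Dividing in-use heap bytes by live objects gives only an average, since objects have different sizes. The helper names are ours.

```go
package main

import "fmt"

// liveObjects derives the number of live heap objects from the two cumulative counters.
func liveObjects(mallocsTotal, freesTotal uint64) uint64 {
	return mallocsTotal - freesTotal
}

// avgObjectSize is the mean size in bytes of a live heap object.
func avgObjectSize(heapAllocBytes, objects uint64) uint64 {
	if objects == 0 {
		return 0
	}
	return heapAllocBytes / objects
}

func main() {
	// go_memstats_mallocs_total and go_memstats_frees_total from slide 29.
	objs := liveObjects(168775, 158618)
	fmt.Println(objs) // 10157, matching go_memstats_heap_objects

	// go_memstats_heap_alloc_bytes from slide 29.
	fmt.Println(avgObjectSize(3670424, objs)) // 361
}
```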

97 of 104

In-use Stack Allocated Memory

go_memstats_stack_sys_bytes

Metric for: the memory allocated from the heap that is reserved for stack space, whether or not it is currently in use (/memory/classes/heap/stacks:bytes + /memory/classes/os-stacks:bytes).

Collect: to detect excessive use of stack memory.

Act: by optimizing code (concurrency, function calls, etc.).

Each goroutine has its own stack.

98 of 104

OS Allocated Memory vs What Runtime Uses

go_memstats_sys_bytes

go_memstats_heap_released_bytes

Metrics for: respectively, the number of bytes obtained from the system, i.e. the total Go runtime memory footprint (/memory/classes/total:bytes), and the memory that is completely free and has been returned to the underlying system (/memory/classes/heap/released:bytes).

Collect: to assess the overall resource demand.

Act: by optimizing memory usage through GOMEMLIMIT tuning.

99 of 104

OS Allocated Memory vs What Runtime Uses

/memory/classes/total:bytes - /memory/classes/heap/released:bytes = physical memory (according to Runtime)
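
A sketch of that subtraction, first with the values from the /metrics output on slide 29 (go_memstats_sys_bytes and go_memstats_heap_released_bytes), then against the live process. The helper names are ours. The result is the runtime's own estimate, so it will not exactly match what the OS reports as resident memory.

```go
package main

import (
	"fmt"
	"runtime/metrics"
)

// estimatePhysical subtracts memory already returned to the OS from the
// total memory mapped by the runtime.
func estimatePhysical(totalBytes, releasedBytes uint64) uint64 {
	if releasedBytes > totalBytes {
		return 0
	}
	return totalBytes - releasedBytes
}

// readUint64 reads one uint64 runtime metric, reporting false if unsupported.
func readUint64(name string) (uint64, bool) {
	s := []metrics.Sample{{Name: name}}
	metrics.Read(s)
	if s[0].Value.Kind() != metrics.KindUint64 {
		return 0, false
	}
	return s[0].Value.Uint64(), true
}

func main() {
	// Values from slide 29.
	fmt.Println(estimatePhysical(17913096, 3178496)) // 14734600

	total, ok1 := readUint64("/memory/classes/total:bytes")
	released, ok2 := readUint64("/memory/classes/heap/released:bytes")
	if ok1 && ok2 {
		fmt.Println("this process:", estimatePhysical(total, released), "bytes")
	}
}
```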

100 of 104

Garbage Collection Bytes Target

How many bytes for the next GC?

go_memstats_next_gc_bytes

Metric for: the number of heap bytes at which the next GC will take place (/gc/heap/goal:bytes)

Collect: to check on the balance between memory usage and application performance

Act: by tuning GC settings

101 of 104

Summary

102 of 104

Learnings

  • Go Runtime recommended metrics are handy!
    • Leverage OSS tools and knowledge for those!
  • Don’t over-collect observability data.
  • Make sure metrics are stable and well known.
  • Go to Prometheus/client_golang OSS and give us feedback!

Have fun making your code more efficient, faster and cheaper!

103 of 104

Thank You! Questions?

Arianna Vespri, Independent SWE

Bartek Płotka, Senior SWE at Google

Links

Kudos to (hidden heroes)

104 of 104

Bonus: Let’s get our hands dirty!

Demo Time!
