The Key to Go Efficiency is Just a Few Go Runtime Metrics Away!
Arianna Vespri, Independent SWE
Bartek Płotka, Senior SWE at Google
GopherCon UK; London; 16.08.2024
feat.
vesari
bwplotka
Wheel of Misfortune
Wheel of Misfortune: Incident!
Wheel of Misfortune: Resolving Incident
Wheel of Misfortune: We could have this data!
Wheel of Misfortune: Context cancellation?
Learning!
Wheel of Misfortune: We could have this data!
Agenda
Should Devs Know How to Monitor Go Runtime?
How Do I Collect Important Metrics?
What Metrics to Collect? How to Act on Them?
Who are we?
Arianna Vespri
Bartłomiej Płotka
Senior Software Engineer @ Google
Yesterday we talked about Go Runtime
Go Runtime
Should Devs Know How to Monitor Go Runtime?
“Bunch of useless stuff”
# HELP brokenapp_operation_latency_seconds Tracks the latencies for calls.
# TYPE brokenapp_operation_latency_seconds histogram
brokenapp_operation_latency_seconds_bucket{le="0.01"} 0
brokenapp_operation_latency_seconds_bucket{le="0.05"} 1755
brokenapp_operation_latency_seconds_bucket{le="0.1"} 1755
brokenapp_operation_latency_seconds_bucket{le="0.3"} 1755
brokenapp_operation_latency_seconds_bucket{le="0.6"} 1755
brokenapp_operation_latency_seconds_bucket{le="1"} 1755
brokenapp_operation_latency_seconds_bucket{le="3"} 1755
brokenapp_operation_latency_seconds_bucket{le="6"} 1755
brokenapp_operation_latency_seconds_bucket{le="9"} 1755
brokenapp_operation_latency_seconds_bucket{le="20"} 1755
brokenapp_operation_latency_seconds_bucket{le="+Inf"} 1755
brokenapp_operation_latency_seconds_sum 29.664131651999906
brokenapp_operation_latency_seconds_count 1755
# HELP go_build_info Build information about the main Go module.
# TYPE go_build_info gauge
go_build_info{checksum="",path="github.com/prometheus/client_golang/tutorials/runtime/wheelofmisfortune",version="(devel)"} 1
# HELP go_gc_duration_seconds A summary of the wall-time pause (stop-the-world) duration in garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 4.8028e-05
go_gc_duration_seconds{quantile="0.25"} 7.5145e-05
go_gc_duration_seconds{quantile="0.5"} 8.5528e-05
go_gc_duration_seconds{quantile="0.75"} 9.9265e-05
go_gc_duration_seconds{quantile="1"} 0.000184435
go_gc_duration_seconds_sum 0.002283468
go_gc_duration_seconds_count 25
# HELP go_gc_gogc_percent Heap size target percentage configured by the user, otherwise 100. This value is set by the GOGC environment variable, and the runtime/debug.SetGCPercent function.
# TYPE go_gc_gogc_percent gauge
go_gc_gogc_percent 100
# HELP go_gc_gomemlimit_bytes Go runtime memory limit configured by the user, otherwise math.MaxInt64. This value is set by the GOMEMLIMIT environment variable, and the runtime/debug.SetMemoryLimit function.
# TYPE go_gc_gomemlimit_bytes gauge
go_gc_gomemlimit_bytes 9.223372036854776e+18
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 11
# HELP go_info Information about the Go environment.
# TYPE go_info gauge
go_info{version="go1.22.6"} 1
# HELP go_memstats_alloc_bytes Number of bytes allocated and currently in use.
# TYPE go_memstats_alloc_bytes gauge
go_memstats_alloc_bytes 3.670424e+06
# HELP go_memstats_alloc_bytes_total Total number of bytes allocated until now, even if released already.
# TYPE go_memstats_alloc_bytes_total counter
go_memstats_alloc_bytes_total 5.173828e+07
# HELP go_memstats_buck_hash_sys_bytes Number of bytes used by the profiling bucket hash table.
# TYPE go_memstats_buck_hash_sys_bytes gauge
go_memstats_buck_hash_sys_bytes 1.455208e+06
# HELP go_memstats_frees_total Total number of heap objects frees.
# TYPE go_memstats_frees_total counter
go_memstats_frees_total 158618
# HELP go_memstats_gc_sys_bytes Number of bytes used for garbage collection system metadata.
# TYPE go_memstats_gc_sys_bytes gauge
go_memstats_gc_sys_bytes 2.707048e+06
# HELP go_memstats_heap_alloc_bytes Number of heap bytes allocated and currently in use.
# TYPE go_memstats_heap_alloc_bytes gauge
go_memstats_heap_alloc_bytes 3.670424e+06
# HELP go_memstats_heap_idle_bytes Number of heap bytes waiting to be used.
# TYPE go_memstats_heap_idle_bytes gauge
go_memstats_heap_idle_bytes 6.26688e+06
# HELP go_memstats_heap_inuse_bytes Number of heap bytes that are in use.
# TYPE go_memstats_heap_inuse_bytes gauge
go_memstats_heap_inuse_bytes 5.382144e+06
# HELP go_memstats_heap_objects Number of currently allocated objects.
# TYPE go_memstats_heap_objects gauge
go_memstats_heap_objects 10157
# HELP go_memstats_heap_released_bytes Number of heap bytes released to OS.
# TYPE go_memstats_heap_released_bytes gauge
go_memstats_heap_released_bytes 3.178496e+06
# HELP go_memstats_heap_sys_bytes Number of heap bytes obtained from system.
# TYPE go_memstats_heap_sys_bytes gauge
go_memstats_heap_sys_bytes 1.1649024e+07
# HELP go_memstats_last_gc_time_seconds Number of seconds since 1970 of last garbage collection.
# TYPE go_memstats_last_gc_time_seconds gauge
go_memstats_last_gc_time_seconds 1.7234457369405274e+09
# HELP go_memstats_lookups_total Total number of pointer lookups.
# TYPE go_memstats_lookups_total counter
go_memstats_lookups_total 0
# HELP go_memstats_mallocs_total Total number of heap objects allocated, both live and gc-ed. Semantically a counter version for go_memstats_heap_objects gauge.
# TYPE go_memstats_mallocs_total counter
go_memstats_mallocs_total 168775
# HELP go_memstats_mcache_inuse_bytes Number of bytes in use by mcache structures.
# TYPE go_memstats_mcache_inuse_bytes gauge
go_memstats_mcache_inuse_bytes 4800
# HELP go_memstats_mcache_sys_bytes Number of bytes used for mcache structures obtained from system.
# TYPE go_memstats_mcache_sys_bytes gauge
go_memstats_mcache_sys_bytes 15600
# HELP go_memstats_mspan_inuse_bytes Number of bytes in use by mspan structures.
# TYPE go_memstats_mspan_inuse_bytes gauge
go_memstats_mspan_inuse_bytes 109600
# HELP go_memstats_mspan_sys_bytes Number of bytes used for mspan structures obtained from system.
# TYPE go_memstats_mspan_sys_bytes gauge
go_memstats_mspan_sys_bytes 114240
# HELP go_memstats_next_gc_bytes Number of heap bytes when next garbage collection will take place.
# TYPE go_memstats_next_gc_bytes gauge
go_memstats_next_gc_bytes 7.092992e+06
# HELP go_memstats_other_sys_bytes Number of bytes used for other system allocations.
# TYPE go_memstats_other_sys_bytes gauge
go_memstats_other_sys_bytes 1.08724e+06
# HELP go_memstats_stack_inuse_bytes Number of bytes in use by the stack allocator.
# TYPE go_memstats_stack_inuse_bytes gauge
go_memstats_stack_inuse_bytes 884736
# HELP go_memstats_stack_sys_bytes Number of bytes obtained from system for stack allocator.
# TYPE go_memstats_stack_sys_bytes gauge
go_memstats_stack_sys_bytes 884736
# HELP go_memstats_sys_bytes Number of bytes obtained from system.
# TYPE go_memstats_sys_bytes gauge
go_memstats_sys_bytes 1.7913096e+07
# HELP go_sched_gomaxprocs_threads The current runtime.GOMAXPROCS setting, or the number of operating system threads that can execute user-level Go code simultaneously.
# TYPE go_sched_gomaxprocs_threads gauge
go_sched_gomaxprocs_threads 4
# HELP go_sched_latencies_seconds Distribution of the time goroutines have spent in the scheduler in a runnable state before actually running. Bucket counts increase monotonically.
# TYPE go_sched_latencies_seconds histogram
go_sched_latencies_seconds_bucket{le="6.399999999999999e-08"} 1065
go_sched_latencies_seconds_bucket{le="6.399999999999999e-07"} 1109
go_sched_latencies_seconds_bucket{le="7.167999999999999e-06"} 1512
go_sched_latencies_seconds_bucket{le="8.191999999999999e-05"} 2100
go_sched_latencies_seconds_bucket{le="0.0009175039999999999"} 2433
go_sched_latencies_seconds_bucket{le="0.010485759999999998"} 2435
go_sched_latencies_seconds_bucket{le="0.11744051199999998"} 2435
go_sched_latencies_seconds_bucket{le="+Inf"} 2435
go_sched_latencies_seconds_sum 0.033589888
go_sched_latencies_seconds_count 2435
# HELP go_threads Number of OS threads created.
# TYPE go_threads gauge
go_threads 10
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 30.34
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 1.048576e+06
# HELP process_network_receive_bytes_total Number of bytes received by the process over the network.
# TYPE process_network_receive_bytes_total counter
process_network_receive_bytes_total 67306
# HELP process_network_transmit_bytes_total Number of bytes sent by the process over the network.
# TYPE process_network_transmit_bytes_total counter
process_network_transmit_bytes_total 309783
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 10
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 1.6089088e+07
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1.72344400053e+09
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 1.264934912e+09
# HELP process_virtual_memory_max_bytes Maximum amount of virtual memory available in bytes.
# TYPE process_virtual_memory_max_bytes gauge
process_virtual_memory_max_bytes 1.8446744073709552e+19
Devs vs Ops: Monitoring is a common responsibility!
ops
devs
MONITORING
(features vs reliability)
Why Should I Monitor Go Runtime?
GC tuning
Understanding your code allocations
Understanding your code concurrency
Advanced: Improving Go Runtime code itself!
But… Profiling Solves this, no?
How Do I Collect Runtime Metrics?
Go Runtime
How to collect runtime statistics from my app?
Prometheus Go Collector
HTTP /metrics
(OpenMetrics or Prometheus format)
Prometheus Go Collector: Default Metrics
Gaps?
That’s it? What about…?
Welcome runtime/metrics!
…and more useful metrics, evolving with every Go version!
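The runtime/metrics package can tell you which metrics your Go version actually supports. A minimal sketch using only the standard library: `metrics.All()` describes every supported metric and `metrics.Read` fills in current values. The `supported` helper is a hypothetical name chosen for illustration.

```go
package main

import (
	"fmt"
	"runtime/metrics"
)

// supported reports whether the running Go version exposes the named metric.
func supported(name string) bool {
	for _, d := range metrics.All() {
		if d.Name == name {
			return true
		}
	}
	return false
}

func main() {
	descs := metrics.All()
	fmt.Printf("this Go version exposes %d runtime metrics\n", len(descs))

	// Build one sample per supported metric and read them all in one call.
	samples := make([]metrics.Sample, len(descs))
	for i, d := range descs {
		samples[i].Name = d.Name
	}
	metrics.Read(samples)

	for _, s := range samples {
		switch s.Value.Kind() {
		case metrics.KindUint64:
			fmt.Printf("%s = %d\n", s.Name, s.Value.Uint64())
		case metrics.KindFloat64:
			fmt.Printf("%s = %g\n", s.Name, s.Value.Float64())
		case metrics.KindFloat64Histogram:
			fmt.Printf("%s = histogram with %d buckets\n", s.Name, len(s.Value.Float64Histogram().Counts))
		case metrics.KindBad:
			// Not expected for names taken from metrics.All().
			fmt.Printf("%s = unsupported\n", s.Name)
		}
	}
}
```

Running it on two Go versions and diffing the output is a quick way to see how the list evolves.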
Prometheus Go Collector: runtime/metrics
Are those valid Prometheus metric names?
Tip
The “_total” suffix immediately tells you that a metric is cumulative, e.g. counting bytes since the program started!
All metrics
Memstats metrics are calculated from runtime/metrics
# HELP go_memstats_alloc_bytes Number of bytes allocated in heap and currently in use. Equals to /memory/classes/heap/objects:bytes.
# TYPE go_memstats_alloc_bytes gauge
go_memstats_alloc_bytes 10241231
# HELP go_memstats_alloc_bytes_total Total number of bytes allocated in heap until now, even if released already. Equals to /gc/heap/allocs:bytes.
# TYPE go_memstats_alloc_bytes_total counter
go_memstats_alloc_bytes_total 410241587
All metrics - not ideal by default
Tip
Every metric exposed across your fleet adds $ to the monitoring bill (less than other o11y signals, but still)
Prometheus Go Collector: Pick Your Metrics
Wait...
But what metrics should I pick?
Recommended Metrics
Recommended runtime/metrics
Result: Enhanced Go Collector Default!
Warning
Renaming metrics is not trivial!
(it affects learning resources and downstream automation such as alerting, recording rules, dashboards, self-healing, etc.)
Default should be as close to Go team recommendations as possible!
What Metrics to Collect? How to Act on Them?
What did I just deploy?
Go Version Information
go_info
Metric for: information about the Go environment, e.g. the Go version the application was built with.
Collect: to keep track of the Go versions across your applications.
Act: by upgrading the Go version your application is built with.
GOMAXPROCS
go_sched_gomaxprocs_threads
Metric for: GOMAXPROCS setting (/sched/gomaxprocs:threads).
Collect: to check the parallelism in an application.
Act: by adjusting GOMAXPROCS to optimize concurrency and CPU resource utilization.
GOGC
GOGC = 100: the heap can grow +100% over the live heap before the next GC
GOGC = 50: +50% (more frequent GC)
GOGC = 120: +120% (less frequent GC)
[Figure: GOGC chart with values 50, 70 and 100]
go_gc_gogc_percent
Metric for: GOGC (/gc/gogc:percent).
Collect: to understand GC behaviour.
Act: by tuning GOGC.
GOMEMLIMIT
GOGC = tooHighValue: heap grows + tooHighValue%
With GOMEMLIMIT set (e.g. for containers): heap grows + okValue%
Too low GOMEMLIMIT: thrashing!!
AUTOGOMEMLIMIT
[Figure: GOMEMLIMIT chart with values 50 MiB, 70 MiB and 100 MiB]
go_gc_gomemlimit_bytes
Metric for: GOMEMLIMIT (/gc/gomemlimit:bytes).
Collect: to track the soft memory limit of the runtime.
Act: by tuning GOMEMLIMIT.
Is My Concurrency Healthy?
Number of goroutines
go_goroutines
Metric for: the number of currently existing goroutines (/sched/goroutines:goroutines).
Collect: in most applications, especially highly concurrent ones; also useful to troubleshoot OOM kills.
Act: by correcting or refactoring the concurrency patterns in your code.
How much time goroutines wait for scheduling
go_sched_latencies_seconds
Metric for: the distribution of time goroutines spend runnable but not running (/sched/latencies:seconds).
Collect: when you need visibility into how the overall system load affects scheduling latency, e.g. when uneven load is suspected.
Act: by optimizing your code's concurrency patterns.
Is My Memory OK?
In-use Heap Allocated Memory
go_memstats_alloc_bytes or go_memstats_heap_alloc_bytes
Metric for: the number of heap bytes allocated and currently in use (/memory/classes/heap/objects:bytes).
Collect: to plan memory usage, and when investigating degraded performance or crashes.
Act: by eliminating possible memory leaks and optimizing GC configuration.
Total Heap Allocated Memory
go_memstats_alloc_bytes_total
Metric for: the number of heap bytes allocated since the program started, even if already released (/gc/heap/allocs:bytes).
Collect: to understand allocation patterns and their impact on GC resource cost.
Act: by debugging memory leaks and fine-tuning GC configuration.
~30 KB/s allocated!
In-use & Total Heap Allocated Objects
go_memstats_mallocs_total
Metric for: the total number of heap objects allocated, both live and garbage-collected; semantically a counter version of the go_memstats_heap_objects gauge (/gc/heap/allocs:objects + /gc/heap/tiny/allocs:objects).
Collect: to understand GC behaviour and the cost of memory allocations, and to identify memory leaks.
Act: by tuning the GC.
go_memstats_heap_objects
Metric for: the number of allocated heap objects (/gc/heap/objects:objects).
Collect: to identify memory leaks and understand the memory footprint.
Act: by refactoring code or tuning GC settings.
(objects have different sizes)
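The two object metrics are linked: live objects are allocations minus frees. The /metrics dump shown earlier is consistent with that: `go_memstats_mallocs_total` 168775 minus `go_memstats_frees_total` 158618 gives 10157, exactly `go_memstats_heap_objects`. Dividing in-use heap bytes by that count gives an average object size (about 361 bytes in that dump: 3670424 / 10157), only an average since objects have different sizes. A sketch with hypothetical helper names:

```go
package main

import (
	"fmt"
	"runtime/metrics"
)

// liveObjects derives the heap object gauge from the two counters:
// go_memstats_mallocs_total - go_memstats_frees_total.
func liveObjects(mallocs, frees uint64) uint64 {
	return mallocs - frees
}

// avgObjectSize divides in-use heap bytes by live objects.
// Objects have different sizes, so this is only an average.
func avgObjectSize(heapBytes, objects uint64) float64 {
	if objects == 0 {
		return 0
	}
	return float64(heapBytes) / float64(objects)
}

func main() {
	s := []metrics.Sample{
		{Name: "/gc/heap/objects:objects"},
		{Name: "/memory/classes/heap/objects:bytes"},
	}
	metrics.Read(s)
	objs, heapBytes := s[0].Value.Uint64(), s[1].Value.Uint64()
	fmt.Printf("%d live heap objects, ~%.0f bytes each on average\n", objs, avgObjectSize(heapBytes, objs))
}
```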
Stack Allocated Memory
go_memstats_stack_sys_bytes
Metric for: the memory reserved for stacks, whether or not it is currently in use: stack space allocated from the heap plus stack memory allocated directly by the OS (/memory/classes/heap/stacks:bytes + /memory/classes/os-stacks:bytes).
Collect: to detect excessive use of stack memory.
Act: by optimizing code (concurrency, function calls, etc.).
(to each goroutine its stack)
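Since every goroutine gets its own stack, stack memory grows with the goroutine count. A sketch, assuming you want to observe it in-process: `stackBytes` (a hypothetical helper) sums `/memory/classes/heap/stacks:bytes` and `/memory/classes/os-stacks:bytes`, the two parts of `go_memstats_stack_sys_bytes`, before and while 1000 extra goroutines are parked. The goroutine count is arbitrary.

```go
package main

import (
	"fmt"
	"runtime/metrics"
	"sync"
)

// stackBytes sums the two runtime/metrics that make up go_memstats_stack_sys_bytes.
func stackBytes() uint64 {
	s := []metrics.Sample{
		{Name: "/memory/classes/heap/stacks:bytes"},
		{Name: "/memory/classes/os-stacks:bytes"},
	}
	metrics.Read(s)
	return s[0].Value.Uint64() + s[1].Value.Uint64()
}

func main() {
	before := stackBytes()

	// Each goroutine gets its own stack (a few KiB to start with).
	var wg sync.WaitGroup
	release := make(chan struct{})
	for i := 0; i < 1000; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			<-release // park until released
		}()
	}
	during := stackBytes()
	close(release)
	wg.Wait()

	fmt.Printf("stack memory: %d KiB before, %d KiB with 1000 extra goroutines\n", before>>10, during>>10)
}
```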
OS Allocated Memory vs What Runtime Uses
go_memstats_sys_bytes
go_memstats_heap_released_bytes
Metrics for: the number of bytes obtained from the system, i.e. the total Go runtime memory footprint (/memory/classes/total:bytes), and the memory that is completely free and has been returned to the underlying system (/memory/classes/heap/released:bytes), respectively.
[Figure: /memory/classes/total:bytes minus /memory/classes/heap/released:bytes = physical memory (according to the runtime)]
Collect: to assess the overall resource demand.
Act: by optimizing memory usage, e.g. through GOMEMLIMIT tuning.
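Subtracting the released bytes from the total gives the runtime's own view of its physical footprint. Using the /metrics dump shown earlier: 17913096 minus 3178496 = 14734600 bytes (about 14 MiB), somewhat below `process_resident_memory_bytes` (16089088), likely because the OS view also includes memory outside the runtime's accounting. A sketch with a hypothetical `runtimePhysical` helper:

```go
package main

import (
	"fmt"
	"runtime/metrics"
)

// runtimePhysical estimates the memory the runtime considers backed by physical
// memory: everything it mapped, minus heap memory it returned to the OS.
func runtimePhysical(total, released uint64) uint64 {
	return total - released
}

func main() {
	s := []metrics.Sample{
		{Name: "/memory/classes/total:bytes"},         // go_memstats_sys_bytes
		{Name: "/memory/classes/heap/released:bytes"}, // go_memstats_heap_released_bytes
	}
	metrics.Read(s)
	total, released := s[0].Value.Uint64(), s[1].Value.Uint64()
	fmt.Printf("mapped by runtime: %d KiB, released: %d KiB, physical (runtime view): %d KiB\n",
		total>>10, released>>10, runtimePhysical(total, released)>>10)
}
```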
Garbage Collection Bytes Target
How many bytes for the next GC?
go_memstats_next_gc_bytes
Metric for: the number of heap bytes at which the next GC will take place (/gc/heap/goal:bytes).
Collect: to check the balance between memory usage and application performance.
Act: by tuning GC settings.
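Comparing the heap in use with the GC goal shows how close the next cycle is. In the /metrics dump shown earlier, `go_memstats_heap_alloc_bytes` (3670424) is about 52% of `go_memstats_next_gc_bytes` (7092992). The sketch below computes the same ratio in-process; `goalUsage` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"runtime/metrics"
)

// goalUsage reports how far the heap in use is towards the next GC goal, in percent.
func goalUsage(heapInUse, goal uint64) float64 {
	if goal == 0 {
		return 0
	}
	return 100 * float64(heapInUse) / float64(goal)
}

func main() {
	s := []metrics.Sample{
		{Name: "/gc/heap/goal:bytes"},                // go_memstats_next_gc_bytes
		{Name: "/memory/classes/heap/objects:bytes"}, // go_memstats_heap_alloc_bytes
	}
	metrics.Read(s)
	goal, inUse := s[0].Value.Uint64(), s[1].Value.Uint64()
	fmt.Printf("heap: %d KiB of %d KiB goal (%.0f%%)\n", inUse>>10, goal>>10, goalUsage(inUse, goal))
}
```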
Summary
Learnings
Have fun making your code more efficient, faster and cheaper!
Thank You! Questions?
Kudos to (hidden heroes)
Bonus: Let’s get our hands dirty!
Demo Time!