1 of 58

Go-ing Easy on Memory

Writing GC-friendly code

Sümer Cip

Senior SW Eng. @Platform.sh

2 of 58

👋 Hi!

I’m Sümer Cip.

A software engineer and a full-time dad of 3.

I contribute to Open Source as much as I can.

I’m interested in observability, distributed systems, and databases.

sumercip.com

3 of 58

  • Tons of theoretical material around Garbage Collection
  • Less emphasis on writing GC-aware code
    • Even though the concept is largely language-agnostic
  • Aim to be as practical as possible

Motivation

4 of 58

Real-world data

5 of 58

“Sometimes I joke about avoiding dynamic memory management essentially being the only thing that matters for low latency…”

P99 Conf 2024 - Pekka Enberg - CEO of Turso

https://www.youtube.com/watch?v=Bbq8ER_GXrM

6 of 58

Real-world data

Bryan Boreham, FOSDEM 2018, Make your Go Faster!

7 of 58

Real-world data

Andrey Sibiryov, SreCON17 Asia/Australia: Golang's Garbage

8 of 58

Real-world data

~50%

~25%

CPU Profile Flamegraph

9 of 58

Real-world data

https://www.datadoghq.com/blog/engineering/timeseries-indexing-at-scale/

from Datadog’s Blog:

10 of 58

Real-world data

GC-friendly libraries

11 of 58

Real-world data

Other languages

12 of 58

Understanding Memory

13 of 58

Basics

  • Stack
    • preallocated (dynamically grow)
    • faster alloc (LIFO)
    • faster access (Cache-friendly)
    • managed by compiler
  • Heap
    • dynamically allocated
    • managed by GC

Stack & Heap

14 of 58

Basics

  • Stack is usually on L1 (or in registers)
  • L1: ~1ns, L2: ~4ns, RAM: ~75ns
    • Usually hidden cost
  • So, do your best to be in Stack

Stack & Heap

15 of 58

Understanding GC

16 of 58

Basics

  • GC is complex due to a wide range of requirements
    • allocation rate/volume
    • high/low concurrency
    • fragmentation
    • pacing (backpressure)
    • minimal latency

How does GC work?

17 of 58

Basics

How does GC work?

18 of 58

Basics

How does GC work?

19 of 58

Basics

How does GC work?

20 of 58

Basics

  • Stop-The-World and GC overhead
  • GC acts differently under pressure
  • CPU cache flushes

GC Unpredictability

Before GC

After GC

21 of 58

Wrap-up

  • GC can have high impact
    • Real-world data backs it up (~40-50%)
    • Usually goes unnoticed
  • GC can be unpredictable

22 of 58

Let’s keep GC Happy

23 of 58

Reuse, Reduce, Recycle.

FOSDEM 2018 - Bryan Boreham - Make your Go faster!

https://www.youtube.com/watch?v=NS1hmEWv4Ac

24 of 58

Reduce

25 of 58

Reduce

  • It is not all about reducing allocations or pointers
  • Reducing size almost always has compounding benefits.
    • Python shrank its base object size by ~75% between 2.7 and 3.12, which produced a ~60% runtime improvement.

https://sumercip.com/posts/making-python-fitter-and-faster/#2-better-memory-management

26 of 58

  • Returned allocations escape to the Heap; let the caller pass a buffer instead

Reduce

- type Reader interface {
-     Read(n int) (b []byte, err error)
- }

+ type Reader interface {
+     Read(p []byte) (n int, err error)
+ }

Stack vs Heap

GopherCon SG 2019 - Understanding Allocations: the Stack and the Heap - Jakob Walker

https://www.youtube.com/watch?v=ZMZpH4yT7M0
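The caller-supplied-buffer pattern from the Reader diff can be sketched as a small runnable example. byteFiller is a hypothetical type (not from the talk): instead of allocating and returning a fresh []byte on every call, it fills a buffer the caller owns.

```go
package main

import "fmt"

// byteFiller is a hypothetical Reader-style type: filling a
// caller-provided buffer keeps the hot path allocation-free,
// whereas returning a fresh []byte would escape to the heap.
type byteFiller struct{ b byte }

// Read follows the io.Reader signature: the caller owns p.
func (f byteFiller) Read(p []byte) (n int, err error) {
	for i := range p {
		p[i] = f.b
	}
	return len(p), nil
}

func main() {
	buf := make([]byte, 4) // allocated once, reused across calls
	f := byteFiller{b: 'x'}
	n, _ := f.Read(buf)
	fmt.Println(n, string(buf))
}
```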

27 of 58

  • Closure variables can escape to Heap

Reduce

func doStuff() func() {
    bigArray := make([]int, 1_000)
    return func() {
        // do stuff with bigArray
    }
}

Stack vs Heap
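The escaping closure above can be contrasted with a version that keeps the work inside the function. sumViaClosure and sumDirect are illustrative names, and whether bigArray actually stays on the stack in the second case depends on the compiler's escape analysis.

```go
package main

import "fmt"

// sumViaClosure returns a closure that captures bigArray, so the
// array must outlive the call and escapes to the heap.
func sumViaClosure() func() int {
	bigArray := make([]int, 1000)
	for i := range bigArray {
		bigArray[i] = i
	}
	return func() int {
		total := 0
		for _, v := range bigArray {
			total += v
		}
		return total
	}
}

// sumDirect does the work inside the function and returns only the
// result; bigArray can stay on the stack (compiler permitting).
func sumDirect() int {
	bigArray := make([]int, 1000)
	for i := range bigArray {
		bigArray[i] = i
	}
	total := 0
	for _, v := range bigArray {
		total += v
	}
	return total
}

func main() {
	fmt.Println(sumViaClosure()(), sumDirect())
}
```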

28 of 58

Reduce

  • interface{} values (and sometimes generic code) escape to the Heap
    • the compiler does not know the type, and thus the size
    • they are also slower
  • If possible, use concrete types on hot paths

interface{} and generics
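A minimal sketch of the concrete-vs-interface{} point; sumInterface and sumConcrete are made-up helpers, not from the slides.

```go
package main

import "fmt"

// sumInterface accepts boxed values: each int stored in an
// interface{} may be heap-allocated, and every use needs a
// type assertion.
func sumInterface(vals []interface{}) int {
	total := 0
	for _, v := range vals {
		total += v.(int)
	}
	return total
}

// sumConcrete works on the concrete type directly: no boxing,
// no assertions, and the backing array holds plain ints.
func sumConcrete(vals []int) int {
	total := 0
	for _, v := range vals {
		total += v
	}
	return total
}

func main() {
	boxed := []interface{}{1, 2, 3}
	fmt.Println(sumInterface(boxed), sumConcrete([]int{1, 2, 3}))
}
```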

29 of 58

Reduce

  • GC scan overhead is linear in the number of pointers
    • regions without pointers are skipped entirely
  • More cache-friendly
  • Compiler generates extra checks

Avoid Pointers

30 of 58

Reduce

  • Sometimes, pointers can go unnoticed
  • string, time.Time, and slices all contain pointers
    • be careful when storing these
  • Maps with reference types
    • Slice values
    • String keys

Avoid Pointers

type Time struct {
    wall uint64
    ext  int64
    loc  *Location
}

type StringHeader struct {
    Data uintptr
    Len  int
}

31 of 58

  • Prefer non-reference types for map keys/values
    • Prefer a small struct for map keys instead of a string

Reduce

type TenantKey struct {
    tenantID int
    regionID int
}

tenantKey := TenantKey{1, 2}

- myMap[tenantKeyStr] = value
+ myMap[tenantKey] = value

Avoid Pointers
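A runnable version of the TenantKey idea from the slide, assuming a formatted string key (e.g. "1:2") as the "before" case.

```go
package main

import "fmt"

// TenantKey mirrors the slide: a small, pointer-free, comparable
// struct used as a map key instead of a formatted string (a string
// key carries a pointer the GC must scan in every bucket).
type TenantKey struct {
	tenantID int
	regionID int
}

func main() {
	myMap := make(map[TenantKey]string)
	myMap[TenantKey{1, 2}] = "value"

	// The lookup key is built on the stack; no string allocation.
	fmt.Println(myMap[TenantKey{tenantID: 1, regionID: 2}])
}
```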

32 of 58

  • Try to keep map key/value sizes at or below 128 bytes
  • 128 bytes is special
    • above it, buckets internally store pointers

Reduce

type MyStruct struct {
-     a [17]int64 // >128 bytes
+     a [16]int64 // <=128 bytes
}

m := make(map[int]MyStruct)

Avoid Pointers

33 of 58

  • “Copying is expensive” is usually a myth
  • Copying a cache line costs about the same as copying a pointer
  • A cache line is usually 64 bytes on x86

Reduce

type MyStruct struct {
    A string // 16 bytes
    B string // 16 bytes
    C string // 16 bytes
    D string // 16 bytes
}

- func DoStuff(a *MyStruct) {}
+ func DoStuff(a MyStruct) {}

Avoid Pointers

34 of 58

Reduce

  • Prefer non-pointer versions of data structures (e.g., slices over linked lists)
  • Favor contiguous memory when possible

Avoid Pointers

type MyNode struct {
-     value int
-     next  *MyNode
}

+ values []int
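The diff can be fleshed out into a runnable comparison; node, sumList, and sumSlice are illustrative names.

```go
package main

import "fmt"

// node is the pointer-chasing version: every element is a separate
// heap object the GC must trace, scattered across memory.
type node struct {
	value int
	next  *node
}

func sumList(head *node) int {
	total := 0
	for n := head; n != nil; n = n.next {
		total += n.value
	}
	return total
}

// sumSlice walks one contiguous, pointer-free allocation instead:
// cache-friendly and nearly invisible to the GC.
func sumSlice(values []int) int {
	total := 0
	for _, v := range values {
		total += v
	}
	return total
}

func main() {
	head := &node{1, &node{2, &node{3, nil}}}
	fmt.Println(sumList(head), sumSlice([]int{1, 2, 3}))
}
```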

35 of 58

  • Avoid holding references inside large objects

Reduce

type Small struct{}

type Big struct {
    ...
    smallStruct Small
    ...
}

- cache(&bigStruct.smallStruct)
+ cache(bigStruct.smallStruct)

Avoid Pointers

36 of 58

Reduce

  • Remember zero-allocation libraries? Use them.
    • rs/zerolog, uber-go/zap, coocood/freecache, and many more…
  • Wonder how they work?
    • main trick: preallocate and use integer indexes to reference objects in a collection
    • basically: Avoid pointers…

Avoid Pointers

37 of 58

Reuse

38 of 58

  • sync.Pool is not a freelist
    • Values can be GC’d
  • Reuse temporary objects in between GC cycles
  • See the sync in the name?
    • Offers concurrent access
    • No locks on the fast path
    • Pooled objects are only freed after 2 GC cycles
  • Very useful, but a bit misunderstood

Reuse

var pool = sync.Pool{
    New: func() interface{} {
        return new(MyStruct)
    },
}

func processData(data []byte) {
    buf := pool.Get().(*MyStruct)
    defer pool.Put(buf)
    // use buffer
}

sync.Pool

39 of 58

  • Be careful with non-pointers
  • A slice header escapes to the heap because Put accepts an interface{}
  • Risk: you optimized allocations away but introduced one that waits for the next GC cycle
  • Might be hard to spot in prod

Reuse

var pool = sync.Pool{
    New: func() interface{} {
-         return []byte{}
+         return new([]byte)
    },
}

func processData(data []byte) {
-     buf := pool.Get().([]byte)
+     buf := pool.Get().(*[]byte)
    ...
    pool.Put(buf)
}

sync.Pool
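A runnable sketch of the corrected pool; process is a hypothetical hot-path function, not from the slides.

```go
package main

import (
	"fmt"
	"sync"
)

// pool hands out *[]byte, not []byte: putting a plain slice header
// into the interface{} that Put accepts would itself allocate.
var pool = sync.Pool{
	New: func() interface{} {
		b := make([]byte, 0, 1024)
		return &b
	},
}

// process borrows a buffer, reuses its capacity, and returns it.
func process(data []byte) int {
	bufp := pool.Get().(*[]byte)
	defer pool.Put(bufp)

	buf := append((*bufp)[:0], data...) // reuse existing capacity
	*bufp = buf
	return len(buf)
}

func main() {
	fmt.Println(process([]byte("hello")))
}
```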

40 of 58

Reuse

sync.Pool

41 of 58

  • Reuse slices
  • Pre-allocate
    • avoids repeated growth (resize factor)
    • and the fragmentation it causes
  • Use strings.Builder
    • No intermediate allocations

Reuse

a := []int{1, 2, 3, 4, 5}

- a = append(a, 10, 20, 30)
+ a = append(a[:0], 10, 20, 30)

- largeMap := make(map[int]string)
+ largeMap := make(map[int]string, 1000)

- s = s + "hi!"
+ builder.WriteString("hi!")

42 of 58

Recycle

43 of 58

Recycle

  • GOGC
  • GOMEMLIMIT

Tune GC

44 of 58

Recycle

Tune GC

45 of 58

Tools Overview

46 of 58

  • This is an overview for a reason.
  • Go is unparalleled when it comes to observability.

47 of 58

Tools Overview

  • Types
    • Heap Live Objects/Size
      • debug memory leaks
    • Allocations Count/Size
      • observe alloc. frequency

Profiling Memory

48 of 58

Tools Overview

Profiling

Bryan Boreham, FOSDEM 2018, Make your Go Faster!

49 of 58

Tools Overview

  • go build -gcflags="-m=1|2|3" main.go

Escape Analysis

50 of 58

Tools Overview

  • Underrated
  • Most cinematic visualization
    • time and important events
    • GC pressure/phases and latency
    • Lock contention… many more
  • Safe on production
    • With Go >=1.21, overhead drops to ~1-2% (kudos to Felix Geisendörfer!)

Execution tracer

https://go.dev/blog/execution-traces-2024

51 of 58

Tools Overview

Execution tracer

52 of 58

Tools Overview

  • GODEBUG=gctrace=1 ./your_program
    • CPU/Wall time/percentage of GC
    • Frequency
    • ...

GODEBUG=gctrace=1

53 of 58

Wrap-up

54 of 58

Wrap-up

  • Reduce
    • Prefer stack over heap, if you can
    • Avoid pointers. Make it a habit!
    • Use interface{} and generics sparingly
  • Reuse
    • sync.Pool is your friend, but understand it well
    • Reuse/Pre-allocate slices/maps whenever possible
  • Use Observability tools as much as possible
    • Profile & Benchmark & Execution tracer

55 of 58

Bonus

56 of 58

Bonus

Memory regions (Nov 8, 2024)

57 of 58

Bonus

  • Memory pool
  • Better than arena
  • No/Minimal GC impact

Memory regions

58 of 58

Thanks!

sumercip.com

🎊