1 of 49

Five Steps to Make Your Go Code Faster & More Efficient

Bartłomiej Płotka

Senior Software Engineer at Google

4 Feb 2023 | FOSDEM Go Dev Room

2 of 49

Bartłomiej (Bartek) Płotka

Senior Software Engineer @ Google

4 of 49

A Story from the Thanos Project

6 of 49

Oops!

8 of 49

Solutions?

Let’s “tune” configuration!

9 of 49

Solutions?

Vertical Scale Up

128 GB

10 of 49

Solutions?

Horizontal Scale Out

11 of 49

Solutions?

Let’s install a different OSS system…

12 of 49

Solutions?

Let’s use a vendor…

13 of 49

Meanwhile in the code…

@bwplotka

14 of 49

What helped?

Optimizing at the Algorithm and Code Level!

16 of 49

Software Efficiency Enables Things!

…yet we don’t focus on it with the right mindset!

17 of 49

Expect Quiz!

Quiz!

Chance to win a signed copy of my “Efficient Go” book today. Deadline 4.02.2023 4:00pm (30m)

No rush: Random selection among those who responded with correct answers!

Link at the end of the talk! 🙈

18 of 49

Five Pragmatic Steps towards

More Efficient Go Programs

19 of 49

Step #1: Use TFBO: Test Fix Benchmark Optimize

Use Efficiency Aware Development Flow!

20 of 49

Step #1: Use TFBO: Test Fix Benchmark Optimize

TFBO = TDD (test driven development) wrapped with BDO (benchmark driven optimizations)
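A minimal sketch of the "Test" half of TFBO. It assumes a hypothetical `create(n)` that returns a slice holding n copies of "FOSDEM"; the slides do not show the function's source, so both `create` and `verifyCreate` are illustrative names. The point is that a functional check like this should pass before and after every optimization attempt.

```go
package main

import "fmt"

// create is a hypothetical function under test (not shown in the slides):
// it returns a slice with n copies of "FOSDEM".
func create(n int) []string {
	var out []string
	for i := 0; i < n; i++ {
		out = append(out, "FOSDEM")
	}
	return out
}

// verifyCreate is the functional check that guards every later optimization.
func verifyCreate(n int) error {
	got := create(n)
	if len(got) != n {
		return fmt.Errorf("create(%d): got %d elements, want %d", n, len(got), n)
	}
	for i, s := range got {
		if s != "FOSDEM" {
			return fmt.Errorf("create(%d): element %d is %q, want %q", n, i, s, "FOSDEM")
		}
	}
	return nil
}

func main() {
	for _, n := range []int{0, 1, 1000} {
		if err := verifyCreate(n); err != nil {
			fmt.Println("FAIL:", err)
			return
		}
	}
	fmt.Println("PASS")
}
```

In a real project the same check would normally live in a `_test.go` file as a `func TestCreate(t *testing.T)` and run with `go test`.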

25 of 49

Step #2: Understand current efficiency level!

Benchmark First!

26 of 49

Step #2: Understand current efficiency level!

Micro benchmarks

27 of 49

Step #2: Understand current efficiency level!

Micro benchmarks

$ go test -run '^$' -bench '^BenchmarkCreate$'
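The command above needs a benchmark function to run. The sketch below shows the shape such a function might take, assuming a hypothetical `create(n)` that appends "FOSDEM" n times (the real function is not in the slides). To keep the example runnable as a single program, it drives the benchmark through `testing.Benchmark` instead of `go test`.

```go
package main

import (
	"fmt"
	"testing"
)

// sink keeps the result reachable so the compiler cannot drop the call.
var sink []string

// create is a hypothetical stand-in for the function being measured.
func create(n int) []string {
	var out []string
	for i := 0; i < n; i++ {
		out = append(out, "FOSDEM")
	}
	return out
}

// benchmarkCreate has the same body a BenchmarkCreate function
// in a _test.go file would have.
func benchmarkCreate(b *testing.B) {
	b.ReportAllocs() // same effect as the -benchmem flag
	for i := 0; i < b.N; i++ {
		sink = create(1_000_000)
	}
}

func main() {
	res := testing.Benchmark(benchmarkCreate)
	fmt.Printf("%d ns/op, %d B/op, %d allocs/op\n",
		res.NsPerOp(), res.AllocedBytesPerOp(), res.AllocsPerOp())
}
```

Moved into a `_test.go` file and renamed to `BenchmarkCreate`, the function would be picked up by `-bench '^BenchmarkCreate$'`.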

28 of 49

Step #2: Understand current efficiency level!

Micro benchmarks

$ export ver=v1 && \
    go test -run '^$' -bench '^BenchmarkCreate$' \
    -benchtime 1s -count 6 \
    -cpu 1 -benchmem \
    | tee ${ver}.txt

30 of 49

Step #2: Understand current efficiency level!

Micro benchmarks

$ benchstat v1.txt
goos: linux
goarch: amd64
cpu: AMD EPYC 7B12
       │    v1.txt    │
       │    sec/op    │
Create   86.64m ± 1%

       │    v1.txt    │
       │     B/op     │
Create   83.96Mi ± 0%

       │    v1.txt    │
       │  allocs/op   │
Create   39.00 ± 0%

31 of 49

Step #3: Understand Your Efficiency Requirements

No Expectations?

“Not sure if that’s fast enough… YOLO” 🙈

$ benchstat v1.txt
goos: linux
goarch: amd64
cpu: AMD EPYC 7B12
       │    v1.txt    │
       │    sec/op    │
Create   86.64m ± 1%

       │    v1.txt    │
       │     B/op     │
Create   83.96Mi ± 0%

       │    v1.txt    │
       │  allocs/op   │
Create   39.00 ± 0%

32 of 49

Step #3: Understand Your Efficiency Requirements

No clear Expectations?

“Program should be fast and use reasonable amount of memory” 🤔

$ benchstat v1.txt
goos: linux
goarch: amd64
cpu: AMD EPYC 7B12
       │    v1.txt    │
       │    sec/op    │
Create   86.64m ± 1%

       │    v1.txt    │
       │     B/op     │
Create   83.96Mi ± 0%

       │    v1.txt    │
       │  allocs/op   │
Create   39.00 ± 0%

33 of 49

Step #3: Understand Your Efficiency Requirements

RAER: Resource Aware Efficiency Requirements

34 of 49

Step #3: Understand Your Efficiency Requirements

RAER: Resource Aware Efficiency Requirements

API should have:

Runtime Complexity: ~34.4 * N^2 nanoseconds

Space (RAM) Complexity: ~2.3 * N bytes

35 of 49

Step #3: Understand Your Efficiency Requirements

RAER: Resource Aware Efficiency Requirements

create() should have:

Runtime Complexity: 1 million * ~30 nanoseconds

Space (RAM) Complexity: ~1 million * ~16 bytes

36 of 49

Step #3: Understand Your Efficiency Requirements

RAER: Resource Aware Efficiency Requirements

create() should have:

Runtime Complexity: 1 million * ~30 nanoseconds = 30ms

Space (RAM) Complexity: ~1 million * ~16 bytes = 16 MB (~15.3 MiB)

$ benchstat v1.txt
goos: linux
goarch: amd64
cpu: AMD EPYC 7B12
       │    v1.txt    │
       │    sec/op    │
Create   86.64m ± 1%

       │    v1.txt    │
       │     B/op     │
Create   83.96Mi ± 0%

       │    v1.txt    │
       │  allocs/op   │
Create   39.00 ± 0%

37 of 49

Step #4: Focus on the Hot Path

Do Profiling!

38 of 49

Step #4: Focus on the Hot Path

Do Profiling!

$ export ver=v1 && \
    go test -run '^$' -bench '^BenchmarkCreate$' \
    -benchtime 1s -count 6 \
    -cpu 1 -benchmem \
    -memprofile=${ver}.mem.pprof \
    -cpuprofile=${ver}.cpu.pprof \
    | tee ${ver}.txt

39 of 49

Step #4: Focus on the Hot Path

Do Profiling!

go tool pprof -http :8080 v1.cpu.pprof

40 of 49

Step #4: Focus on the Hot Path

Do Profiling!

go tool pprof -http :8080 v1.mem.pprof

41 of 49

Step #5: Try optimizing that part & repeat!

Append (from docs):

  1. If the array is full, resize it (allocate a larger one and copy the elements over).
  2. Add “FOSDEM” after the last element of the array.
  3. Return the new or the same array.
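The resize in step 1 can be observed directly by watching the slice capacity. The sketch below counts how often `append` has to move to a larger backing array, with and without a preallocated capacity. The helper name `countGrowths` is illustrative, and the exact number of growths depends on the Go version, so no output is shown.

```go
package main

import "fmt"

// countGrowths appends n strings to s and reports how many times
// the capacity changed, i.e. how often append had to resize.
func countGrowths(s []string, n int) int {
	growths := 0
	for i := 0; i < n; i++ {
		before := cap(s)
		s = append(s, "FOSDEM")
		if cap(s) != before {
			growths++
		}
	}
	return growths
}

func main() {
	const n = 1_000_000
	fmt.Println("growths without preallocation:", countGrowths(nil, n))
	fmt.Println("growths with preallocation:   ", countGrowths(make([]string, 0, n), n))
}
```

Every growth means a fresh allocation plus a copy of all existing elements, which is where the extra memory and time in the v1 numbers can come from.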

42 of 49

Step #5: Try optimizing that part & repeat!
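The text of this slide does not include the code change itself. One optimization consistent with the later result of 1 allocation per operation is to preallocate the slice, sketched below. `createV1` and `createV2` are hypothetical reconstructions, not the author's code; allocations are counted with `testing.AllocsPerRun` from the standard library.

```go
package main

import (
	"fmt"
	"testing"
)

// sink keeps results reachable so the allocations are not optimized away.
var sink []string

// createV1 grows the slice on demand (assumed shape of the original code).
func createV1(n int) []string {
	var out []string
	for i := 0; i < n; i++ {
		out = append(out, "FOSDEM")
	}
	return out
}

// createV2 allocates the full capacity once, so append never resizes.
func createV2(n int) []string {
	out := make([]string, 0, n)
	for i := 0; i < n; i++ {
		out = append(out, "FOSDEM")
	}
	return out
}

// allocsPerCall reports the average number of heap allocations per call of f(n).
func allocsPerCall(f func(int) []string, n int) float64 {
	return testing.AllocsPerRun(5, func() { sink = f(n) })
}

func main() {
	const n = 1_000_000
	fmt.Println("v1 allocs per call:", allocsPerCall(createV1, n))
	fmt.Println("v2 allocs per call:", allocsPerCall(createV2, n))
}
```

Preallocation only helps when the final size is known, or can be estimated, before the loop starts.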

43 of 49

Repeat!

45 of 49

Repeat!

$ export ver=v2 && \
    go test -run '^$' -bench '^BenchmarkCreate$' \
    -benchtime 1s -count 5 \
    -cpu 1 -benchmem \
    | tee ${ver}.txt

46 of 49

Repeat!

$ benchstat v1.txt v2.txt
cpu: AMD EPYC 7B12
       │    v1.txt    │                v2.txt                 │
       │    sec/op    │    sec/op     vs base                 │
Create   87.71m ± 6%    11.56m ± 3%   -86.82% (p=0.000 n=6+10)

       │    v1.txt    │                v2.txt                 │
       │     B/op     │     B/op      vs base                 │
Create   83.96Mi ± 0%   15.27Mi ± 0%  -81.82% (n=6+10)

       │    v1.txt    │                v2.txt                 │
       │  allocs/op   │  allocs/op    vs base                 │
Create   39.000 ± 0%    1.000 ± 0%    -97.44% (n=6+10)
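The "vs base" column is a relative change against the first file. As a sanity check, the sketch below recomputes it from the rounded values shown above; `percentChange` is an illustrative helper, not part of benchstat. benchstat works on the raw samples, so a value recomputed from rounded numbers can differ in the last digit.

```go
package main

import "fmt"

// percentChange returns the relative change from base to next, in percent.
// Negative values mean next is smaller (an improvement for these metrics).
func percentChange(base, next float64) float64 {
	return (next - base) / base * 100
}

func main() {
	fmt.Printf("sec/op:    %+.2f%%\n", percentChange(87.71, 11.56))
	fmt.Printf("allocs/op: %+.2f%%\n", percentChange(39, 1))
}
```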

47 of 49

LGTM 😍😍😍!

$ benchstat v1.txt v2.txt
cpu: AMD EPYC 7B12
       │    v1.txt    │                v2.txt                 │
       │    sec/op    │    sec/op     vs base                 │
Create   87.71m ± 6%    11.56m ± 3%   -86.82% (p=0.000 n=6+10)

       │    v1.txt    │                v2.txt                 │
       │     B/op     │     B/op      vs base                 │
Create   83.96Mi ± 0%   15.27Mi ± 0%  -81.82% (n=6+10)

       │    v1.txt    │                v2.txt                 │
       │  allocs/op   │  allocs/op    vs base                 │
Create   39.000 ± 0%    1.000 ± 0%    -97.44% (n=6+10)

create() should have:

Runtime Complexity: 1 million * ~30 nanoseconds = 30ms

Space (RAM) Complexity: ~1 million * ~16 bytes = 16 MB (~15.3 MiB)

48 of 49

Lessons

Optimizing Software Efficiency might be easier than you think! [if done right]

    • Follow Pragmatic TFBO Flow
    • Benchmark (go test -bench)
    • Set Clear Goals (RAER)
    • Profile (pprof)
    • Understand what is happening under the hood (tip: usually generic = slow)

49 of 49

Thank You! Questions?

Quiz!

Chance to win a signed copy of my “Efficient Go” book today. Deadline 4.02.2023 4:00pm (30m)

No rush: Random selection among those who responded with correct answers!

Link: https://bwplotka.dev/quiz.html