Five Steps to Make Your Go Code Faster & More Efficient
Bartłomiej Płotka
Senior Software Engineer at Google
4 Feb 2023 | FOSDEM Go Dev Room
Bartłomiej (Bartek) Płotka
Senior Software Engineer @ Google
A Story from the Thanos Project
Oops!
Solutions?
Let’s “tune” the configuration!
Vertical Scale Up: 128 GB
Horizontal Scale Out
Let’s install a different OSS system…
Let’s use a vendor…
Meanwhile in the code…
@bwplotka
What helped?
Optimizing on Algorithm and Code Level!
Software Efficiency Enables Things!
…yet we don’t focus on it with the right mindset!
Expect Quiz!
Quiz!
Chance to win a signed copy of my “Efficient Go” book today. Deadline 4.02.2023 4:00pm (30m)
No rush: Random selection among those who responded with correct answers!
Link at the end of the talk! 🙈
Five Pragmatic Steps towards More Efficient Go Programs
Step #1: Use TFBO: Test, Fix, Benchmark, Optimize
Use an Efficiency-Aware Development Flow!
TFBO = TDD (test-driven development) wrapped with BDO (benchmark-driven optimization)
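The "test, then fix" half of TFBO can be sketched as below. The slides do not show the body of create(), so the implementation here (a slice of n identical strings), the element value, and the helper name checkCreate are all assumptions made for illustration. The point is only that functional correctness is pinned down before any benchmarking or optimizing starts.

```go
package main

import "fmt"

// create is a hypothetical stand-in for the function under test: it returns
// a slice holding n copies of the same string. It is deliberately naive.
func create(n int) []string {
	var out []string
	for i := 0; i < n; i++ {
		out = append(out, "Hello World!")
	}
	return out
}

// checkCreate plays the role of the functional test in TFBO. It takes the
// implementation as a parameter so that every later, optimized version can
// be run through exactly the same check.
func checkCreate(f func(int) []string) error {
	for _, n := range []int{0, 1, 1000} {
		got := f(n)
		if len(got) != n {
			return fmt.Errorf("create(%d): got %d elements, want %d", n, len(got), n)
		}
		for i, s := range got {
			if s != "Hello World!" {
				return fmt.Errorf("create(%d): element %d is %q", n, i, s)
			}
		}
	}
	return nil
}

func main() {
	if err := checkCreate(create); err != nil {
		fmt.Println("FAIL:", err)
		return
	}
	fmt.Println("PASS: behaviour is pinned down, benchmarking can start")
}
```

In a real project this check would normally live in a TestCreate(t *testing.T) function inside a _test.go file and run with go test; it is written as a plain function here only so the sketch runs on its own.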
Step #2: Understand current efficiency level!
Benchmark First!
Micro benchmarks
$ go test -run '^$' -bench '^BenchmarkCreate$'
$ export ver=v1 && \
go test -run '^$' -bench '^BenchmarkCreate$' \
-benchtime 1s -count 6 \
-cpu 1 -benchmem \
| tee ${ver}.txt
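The command above selects a benchmark named BenchmarkCreate. The slides do not include its source, so the following is only a sketch of what such a micro benchmark might look like; the body of create(), the input size of one million, and the helper runBenchmark are assumptions. To keep the example runnable as a single program it uses testing.Benchmark instead of the go test harness.

```go
package main

import (
	"fmt"
	"testing"
)

// create is a hypothetical stand-in for the function being measured.
func create(n int) []string {
	var out []string
	for i := 0; i < n; i++ {
		out = append(out, "Hello World!")
	}
	return out
}

// sink keeps the result reachable so the compiler cannot discard the call.
var sink []string

// BenchmarkCreate has the signature that `go test -bench` looks for.
// In a real project it would live in a _test.go file.
func BenchmarkCreate(b *testing.B) {
	b.ReportAllocs() // same effect as passing -benchmem for this benchmark
	b.ResetTimer()   // exclude any setup cost from the measurement
	for i := 0; i < b.N; i++ {
		sink = create(1_000_000) // assumed input size
	}
}

// runBenchmark runs the benchmark without `go test`.
func runBenchmark() testing.BenchmarkResult {
	// Init registers the testing flags; the testing package documents it as
	// needed when calling Benchmark outside of `go test`.
	testing.Init()
	return testing.Benchmark(BenchmarkCreate)
}

func main() {
	res := runBenchmark()
	// String reports iterations and ns/op, MemString reports B/op and allocs/op.
	fmt.Println(res.String(), res.MemString())
}
```

Absolute numbers will differ from the slides depending on hardware and Go version, which is why the talk records several runs (-count) and summarizes them with benchstat.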
$ benchstat v1.txt
goos: linux
goarch: amd64
cpu: AMD EPYC 7B12
       │   v1.txt    │
       │   sec/op    │
Create   86.64m ± 1%

       │    v1.txt    │
       │     B/op     │
Create   83.96Mi ± 0%

       │   v1.txt   │
       │ allocs/op  │
Create   39.00 ± 0%
Step #3: Understand Your Efficiency Requirements
No Expectations?
“Not sure if that’s fast enough… YOLO” 🙈
No clear Expectations?
“Program should be fast and use reasonable amount of memory” 🤔
RAER: Resource Aware Efficiency Requirements
API should have:
Runtime Complexity: ~34.4 * N^2 nanoseconds
Space (RAM) Complexity: ~2.3 * N bytes
create() should have:
Runtime Complexity: ~1 million * ~30 nanoseconds = ~30 ms
Space (RAM) Complexity: ~1 million * ~16 bytes = ~16 MB (≈ 15.3 MiB)
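The back-of-the-envelope arithmetic behind these requirements can be written down as a few lines of Go. The per-element figures (30 ns and 16 bytes) come from the slide; the function name budget is hypothetical. One possible reading, which the slides do not state, is that the elements are Go strings, whose header occupies 16 bytes on 64-bit platforms.

```go
package main

import (
	"fmt"
	"time"
)

const (
	nsPerElem    = 30 // per-element latency budget in nanoseconds (from the slide)
	bytesPerElem = 16 // per-element memory budget in bytes (from the slide)
)

// budget returns the rough latency and memory budget for n elements,
// assuming both costs grow linearly with n.
func budget(n int) (time.Duration, int) {
	return time.Duration(n*nsPerElem) * time.Nanosecond, n * bytesPerElem
}

func main() {
	d, b := budget(1_000_000)
	fmt.Printf("latency budget: %v\n", d) // prints "latency budget: 30ms"
	fmt.Printf("memory budget:  %d bytes (%.2f MiB)\n", b, float64(b)/(1<<20))
	// prints "memory budget:  16000000 bytes (15.26 MiB)"
}
```

Expressing the memory budget in MiB makes it directly comparable with the B/op column that benchstat prints.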
$ benchstat v1.txt
goos: linux
goarch: amd64
cpu: AMD EPYC 7B12
       │   v1.txt    │
       │   sec/op    │
Create   86.64m ± 1%

       │    v1.txt    │
       │     B/op     │
Create   83.96Mi ± 0%

       │   v1.txt   │
       │ allocs/op  │
Create   39.00 ± 0%
Step #4: Focus on the Hot Path
Do Profiling!
$ export ver=v1 && \
go test -run '^$' -bench '^BenchmarkCreate$' \
-benchtime 1s -count 6 \
-cpu 1 -benchmem \
-memprofile=${ver}.mem.pprof \
-cpuprofile=${ver}.cpu.pprof \
| tee ${ver}.txt
$ go tool pprof -http :8080 v1.cpu.pprof
$ go tool pprof -http :8080 v1.mem.pprof
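The profiles above are produced by go test flags. When the code of interest is not wrapped in a benchmark, similar profiles can be captured from inside a program with the standard runtime/pprof package. This is a sketch of that alternative, not something shown in the talk; create(), captureProfiles, and the iteration count are assumptions.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"runtime"
	"runtime/pprof"
)

// create is a hypothetical stand-in for the code on the hot path.
func create(n int) []string {
	var out []string
	for i := 0; i < n; i++ {
		out = append(out, "Hello World!")
	}
	return out
}

var sink []string

// captureProfiles runs create a few times while recording a CPU profile,
// then takes a heap profile. Both are returned in pprof's binary format.
func captureProfiles(n int) (cpu, heap []byte, err error) {
	var cpuBuf, heapBuf bytes.Buffer
	if err := pprof.StartCPUProfile(&cpuBuf); err != nil {
		return nil, nil, err
	}
	for i := 0; i < 20; i++ { // arbitrary repeat count to collect samples
		sink = create(n)
	}
	pprof.StopCPUProfile() // returns once the profile is fully written

	runtime.GC() // refresh heap statistics before snapshotting them
	if err := pprof.WriteHeapProfile(&heapBuf); err != nil {
		return nil, nil, err
	}
	return cpuBuf.Bytes(), heapBuf.Bytes(), nil
}

func main() {
	cpu, heap, err := captureProfiles(1_000_000)
	if err == nil {
		// File names mirror the ones used on the slides.
		err = os.WriteFile("v1.cpu.pprof", cpu, 0o644)
	}
	if err == nil {
		err = os.WriteFile("v1.mem.pprof", heap, 0o644)
	}
	if err != nil {
		fmt.Fprintln(os.Stderr, "profiling failed:", err)
		os.Exit(1)
	}
	fmt.Printf("wrote %d bytes of CPU profile, %d bytes of heap profile\n", len(cpu), len(heap))
}
```

The resulting files can be opened with the same go tool pprof -http commands as the ones generated by go test.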
Step #5: Try optimizing that part & repeat!
Append (from docs):
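Go's documentation for the built-in append states that when the destination slice does not have sufficient capacity, a new underlying array is allocated. Growing a slice element by element therefore reallocates and copies repeatedly. A common remedy, sketched below, is to pre-allocate the full capacity with make. Whether this is exactly the change made between v1 and v2 in the talk is an assumption, as are the function names and the element value.

```go
package main

import (
	"fmt"
	"testing"
)

// createAppend grows the slice on demand; each time capacity runs out,
// append allocates a larger backing array and copies the elements over.
func createAppend(n int) []string {
	var out []string
	for i := 0; i < n; i++ {
		out = append(out, "Hello World!")
	}
	return out
}

// createPrealloc reserves room for all n elements up front, so append
// never has to grow the backing array.
func createPrealloc(n int) []string {
	out := make([]string, 0, n)
	for i := 0; i < n; i++ {
		out = append(out, "Hello World!")
	}
	return out
}

// sink keeps results reachable so the allocations are not optimized away.
var sink []string

// allocsPerCall reports the average number of heap allocations per call.
func allocsPerCall(f func(int) []string, n int) float64 {
	return testing.AllocsPerRun(10, func() { sink = f(n) })
}

func main() {
	const n = 1_000_000
	fmt.Println("append only: ", allocsPerCall(createAppend, n), "allocs/op")
	fmt.Println("preallocated:", allocsPerCall(createPrealloc, n), "allocs/op")
}
```

The pre-allocated version should report a single allocation per call, while the exact count for the append-only version depends on the slice growth strategy of the Go version in use.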
Repeat!
$ export ver=v2 && \
go test -run '^$' -bench '^BenchmarkCreate$' \
-benchtime 1s -count 5 \
-cpu 1 -benchmem \
| tee ${ver}.txt
$ benchstat v1.txt v2.txt
cpu: AMD EPYC 7B12
       │   v1.txt    │               v2.txt                │
       │   sec/op    │   sec/op     vs base                │
Create   87.71m ± 6%   11.56m ± 3%  -86.82% (p=0.000 n=6+10)

       │    v1.txt    │            v2.txt             │
       │     B/op     │     B/op      vs base         │
Create   83.96Mi ± 0%   15.27Mi ± 0%  -81.82% (n=6+10)

       │   v1.txt    │           v2.txt            │
       │  allocs/op  │ allocs/op   vs base         │
Create   39.000 ± 0%   1.000 ± 0%  -97.44% (n=6+10)
LGTM 😍😍😍!
create() should have:
Runtime Complexity: ~1 million * ~30 nanoseconds = ~30 ms
Space (RAM) Complexity: ~1 million * ~16 bytes = ~16 MB (≈ 15.3 MiB)
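To make the comparison with the requirements explicit, the measured v2 numbers can be set against the budget in a few lines. The measured values are copied from the benchstat output of the talk; the helper names delta and share are hypothetical, and the budget assumes 1 million elements at 30 ns and 16 bytes each.

```go
package main

import "fmt"

// delta returns the relative change from before to after, in percent.
func delta(before, after float64) float64 {
	return (after - before) / before * 100
}

// share returns how much of a budget a measurement uses, in percent.
func share(measured, budget float64) float64 {
	return measured / budget * 100
}

func main() {
	const (
		v1Millis = 87.71 // sec/op for v1, in milliseconds (from benchstat)
		v2Millis = 11.56 // sec/op for v2, in milliseconds (from benchstat)
		v2MiB    = 15.27 // B/op for v2, in MiB (from benchstat)

		budgetMillis = 30.0                  // 1 million * 30 ns
		budgetMiB    = 16_000_000.0 / (1 << 20) // 1 million * 16 bytes, in MiB
	)

	fmt.Printf("latency change v1 -> v2: %.2f%%\n", delta(v1Millis, v2Millis))
	fmt.Printf("latency budget used:     %.1f%%\n", share(v2Millis, budgetMillis))
	fmt.Printf("memory budget used:      %.1f%%\n", share(v2MiB, budgetMiB))
}
```

Latency lands well inside the budget. Memory sits essentially at the estimate, which is what a single allocation holding all elements would be expected to produce.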
Lessons
Optimizing Software Efficiency might be easier than you think! [if done right]
Thank You! Questions?
Quiz!
Link: https://bwplotka.dev/quiz.html