1 of 28

From Python to Go,

Ben Bangert

Mozilla

Oct 14, 2015

2 of 28

My Path thus Far

From Python to Go

3 of 28

Python since 2004

Wrote the first part of Pylons (Routes) in 2005
Merged Pylons with repoze.bfg -> Pyramid in 2010
Pyramid still rocks!
My professional work during this time was in Python
Started working at Mozilla in 2011

4 of 28

Life at Mozilla

First Project - Queuey

Notification Storage System at Scale
Python + Cassandra
Never used or deployed
Kazoo, its supporting library, was!

5 of 28

Life at Mozilla

Second Project - heka

Learned Go in 2012
Go 1.0 just launched
Simple, fast language to get started in

Other Projects

Push system (first prototype in Go)

Current Project

Push System (continuing iterations)

6 of 28

Go is awesome!

Simple language
Nice concurrency primitives
Fairly concise code
Much faster than Python
… and whatever else the Internet has said

7 of 28

Problems in Go-land

8 of 28

Goroutine Memory Use

Go 1.2

8 KB stack size (up from the original 4 KB)
Expensive for the standard socket model, which uses two goroutines per connection

One reader
One writer

Refactored Push to have only a reader per conn, plus a pool of writers

Go 1.4

2 KB stack size (with a better contiguous-stack sizing algorithm)
Going back to the now-optimal reader/writer per conn will require a large refactor
The refactor still makes sense due to other issues found with the writer pool
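
The stack sizes above translate into a per-connection floor that is easy to estimate. A rough sketch follows; the connection count is a hypothetical load rather than a figure from the talk, the writer pool's own (small) cost is ignored, and real stacks grow beyond their initial size, so these are lower bounds only.

```python
KB = 1024

def stack_gib(connections, goroutines_per_conn, stack_kb):
    """Lower bound on goroutine stack memory, in GiB."""
    return connections * goroutines_per_conn * stack_kb * KB / KB**3

CONNECTIONS = 1_000_000  # hypothetical load, not a number from the talk

# Go 1.2: reader + writer goroutine per connection, 8 KB initial stacks
go12_reader_writer = stack_gib(CONNECTIONS, 2, 8)
# Go 1.2 after the refactor: one reader per connection (writer pool ignored)
go12_reader_only = stack_gib(CONNECTIONS, 1, 8)
# Go 1.4: reader + writer per connection again, 2 KB initial stacks
go14_reader_writer = stack_gib(CONNECTIONS, 2, 2)

for label, gib in [
    ("Go 1.2, reader+writer", go12_reader_writer),
    ("Go 1.2, reader only", go12_reader_only),
    ("Go 1.4, reader+writer", go14_reader_writer),
]:
    print("%-24s %.2f GiB" % (label, gib))
```

Under these assumptions, two goroutines per connection on 2 KB stacks cost less than one goroutine per connection on 8 KB stacks, which is why the simpler model becomes attractive again.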

9 of 28

Debugging

Most errors are basic strings (additional introspection requires a type assertion on the error to pull out details)
Basic strings lack call-stack context
Many libraries use similar error strings in multiple places
Boilerplate for error handling is tedious
Live debugging option at the time was gdb, not fun (the recently available godebug looks like a great improvement)

10 of 28

11 of 28

12 of 28

Goroutine Leaks

Channels are neat, but dangerous
Deadlock detector doesn’t spot goroutine leaks
Go experts still can’t write leak-free code
Goroutine leaks are annoying to pin down
Adding suitable select statements introduces more boilerplate

13 of 28

14 of 28

15 of 28

Testing

Everything should be an interface!
Can’t mock an argument unless it’s an interface

Many libraries don’t provide interfaces for complex structs, so you get to fabricate a complex interface so the whole thing can be mocked

Extensive interfaces needed all over, regardless of appropriateness
Very labor-intensive to achieve a high degree of testing (and many additional LoC)

16 of 28

17 of 28

A Return to Python

18 of 28

A quick Python prototype…

Built on Twisted and Python 2.7 with the Autobahn WebSocket library
First draft passing integration tests in 3 hours
Load-tested under PyPy; it used dramatically less memory than Go
Complete implementation in 4 days
100% code coverage in 2 weeks (Go version never got more than 65%)
Mature libraries meant websockets ran better than before
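
One plausible contributor to the coverage gap is that Python lets a test replace any collaborator without declaring an interface first. A minimal sketch using the Python 3 standard library's `unittest.mock` (on Python 2.7 the separate `mock` package plays this role); `PushRouter`, `deliver`, and `send_message` are hypothetical names for illustration, not the project's code.

```python
from unittest import mock

class PushRouter(object):
    """Hypothetical component that delivers a payload over a connection."""

    def __init__(self, conn):
        self.conn = conn

    def deliver(self, channel_id, payload):
        if not payload:
            return False  # nothing to send
        self.conn.send_message({"channel": channel_id, "data": payload})
        return True

def check_router():
    # The mock stands in for the connection; no interface is declared anywhere.
    conn = mock.Mock()
    router = PushRouter(conn)

    assert router.deliver("abc", "hello") is True
    conn.send_message.assert_called_once_with(
        {"channel": "abc", "data": "hello"})

    # The empty-payload branch must not touch the connection.
    assert router.deliver("abc", "") is False
    return conn.send_message.call_count

calls = check_router()
```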

19 of 28

Memory Use

Competitive with Go or better (especially if we count goroutine leaks)
M:N schedulers have their own unpredictable cost
Async model provides more predictable memory use (goroutine killing/spawning is frequently outside your control)

20 of 28

Performance

Go is generally 50-100x faster than CPython
PyPy is 5-25x faster than CPython
Being within 2-10x of Go is much more acceptable than 50-100x
Go SSL is implemented in pure Go (very CPU-intensive; the latest Go improves this, and Cloudflare has unmergeable Intel SSL assembly code with huge improvements)
PyPy is a game-changer

21 of 28

Concise, Debuggable Code

Small code-base means it’s easy to read the relevant portions quickly
Tracebacks are captured in full, with local variables, in production (thanks Sentry!)
Async code has no global locks, no channel blocks (callback chains still aren’t fun)
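
Sentry does the capture in production; the same idea can be sketched with only the Python 3 standard library, which can attach each frame's local variables to a formatted traceback. This is an illustration of the concept, not Sentry's API, and `handle_frame` is a hypothetical handler with a deliberate bug.

```python
import traceback

def handle_frame(conn_id, frame):
    size = len(frame)
    # Deliberate bug: divides by zero so there is something to report.
    return 1024 // (size - size)

def report_with_locals(func, *args):
    """Run func and return a traceback string that includes frame locals."""
    try:
        func(*args)
    except Exception as exc:
        tb = traceback.TracebackException.from_exception(
            exc, capture_locals=True)
        return "".join(tb.format())
    return ""

report = report_with_locals(handle_frame, 42, b"ping")
print(report)
```

The printed report lists `conn_id`, `frame`, and `size` under the failing frame, context that a bare error string does not carry.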

22 of 28

Gotchas and Lessons Learned

RAM use of Python objects is rarely benchmarked

Wrote custom nose extension to measure memory use in integration tests
Blog postings on memory profiling are all outdated, and so are the tools…
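
The nose extension itself is not shown in the talk. As a rough sketch of the same idea, a test can assert a per-connection memory budget using `tracemalloc` from the Python 3 standard library (not available on Python 2.7, so this is an assumption about tooling, not what the extension used). `FakeConnection` and the 4096-byte budget are hypothetical.

```python
import tracemalloc

def bytes_per_object(factory, count=10000):
    """Approximate traced heap growth per object created by factory()."""
    tracemalloc.start()
    before, _peak = tracemalloc.get_traced_memory()
    objects = [factory() for _ in range(count)]
    after, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    del objects  # kept alive until after the second measurement
    return (after - before) / float(count)

class FakeConnection(object):
    """Hypothetical stand-in for per-connection state."""

    def __init__(self):
        self.buffer = bytearray(256)
        self.channels = {}
        self.state = "open"

per_conn = bytes_per_object(FakeConnection)
print("about %.0f bytes per connection" % per_conn)

# A test would fail the build once per-connection state exceeds its budget.
BUDGET_BYTES = 4096  # hypothetical budget
assert per_conn < BUDGET_BYTES
```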

Be aware of underlying implementation issues

Python objects are dicts. Twisted class objects have lots of attributes; avoid adding more, or you can double memory use per instance (connection)
Twisted buffers are independent of kernel TCP buffers. Know where the data is to avoid excess per-conn state

In Go, we block instead, requiring more goroutines for similar functionality
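
The cost of extra instance attributes can be checked directly. The sketch below builds two hypothetical classes that differ only in how many attributes `__init__` sets and compares the traced memory per instance with `tracemalloc` (Python 3 standard library). Exact numbers depend on the interpreter and version; the only claim is that more attributes cost more per instance.

```python
import tracemalloc

def make_class(attr_count):
    """Build a hypothetical connection class with attr_count attributes."""
    names = ["attr_%d" % i for i in range(attr_count)]

    class Conn(object):
        def __init__(self):
            for name in names:
                setattr(self, name, 0)  # shared small int, adds no value cost

    return Conn

def bytes_per_instance(cls, count=5000):
    tracemalloc.start()
    before, _peak = tracemalloc.get_traced_memory()
    instances = [cls() for _ in range(count)]
    after, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    del instances
    return (after - before) / float(count)

lean = bytes_per_instance(make_class(8))
heavy = bytes_per_instance(make_class(40))
print("8 attributes:  ~%.0f bytes/instance" % lean)
print("40 attributes: ~%.0f bytes/instance" % heavy)
```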

23 of 28

Gotchas and Lessons Learned

Revisit underlying assumptions

Just because someone else blogged it doesn’t mean it’s true
Don’t assume other smart people didn’t make a mistake, or that their situation applies to you
Used Go for Push partly due to Urban Airship blog postings about Python not handling enough connections efficiently enough
PyPy changes the game
CPython itself has improved a lot over the years
We actually didn’t end up being CPU-bound, but memory-bound

PyPy makes a big difference in cutting memory use

24 of 28

Gotchas and Lessons Learned

SSL is Very Expensive

20 KB is the minimum SSL footprint per connection
Spikes to 36 KB during use for 16 KB recv/send buffers
Google has a 10 KB SSL memory secret

Bonus points if you can get them to admit how they did it!
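
To put those per-connection figures in fleet terms, a quick calculation helps. The three footprints come from the slide; the connection count is a hypothetical load chosen for illustration.

```python
KB = 1024

SSL_BASE_KB = 20    # minimum footprint per connection (from the slide)
SSL_PEAK_KB = 36    # footprint while recv/send buffers are in use
GOOGLE_KB = 10      # the figure attributed to Google

def fleet_gib(per_conn_kb, connections):
    """Total SSL memory in GiB for a given per-connection footprint."""
    return per_conn_kb * KB * connections / float(KB**3)

CONNECTIONS = 1000000  # hypothetical load, not a number from the talk

buffer_kb = SSL_PEAK_KB - SSL_BASE_KB  # the 16 KB buffer spike
idle_gib = fleet_gib(SSL_BASE_KB, CONNECTIONS)
peak_gib = fleet_gib(SSL_PEAK_KB, CONNECTIONS)
google_gib = fleet_gib(GOOGLE_KB, CONNECTIONS)

print("buffers add %d KB per active connection" % buffer_kb)
print("idle: %.2f GiB, all active: %.2f GiB, at 10 KB: %.2f GiB"
      % (idle_gib, peak_gib, google_gib))
```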

25 of 28

Next?

26 of 28

Concluding thoughts…

CPython handles our use-case fine
PyPy lets us go where we need to
Go is not a big enough improvement over PyPy for the costs it imposes
Obviously it’s time for Rust

27 of 28

Thank you!

28 of 28

Questions?