From Python to Go,

Ben Bangert

Mozilla

Oct 14, 2015

My Path Thus Far

From Python to Go

Python since 2004

  • Wrote the first part of Pylons (Routes) in 2005
  • Merged Pylons with repoze.bfg to form Pyramid in 2010
  • Pyramid still rocks!
  • All my work at companies during this time was in Python
  • Started working at Mozilla in 2011

Life at Mozilla

  • First Project - Queuey
    • Notification storage system at scale
    • Python + Cassandra
    • Never used or deployed
    • …but its supporting library, Kazoo, was!

Life at Mozilla

  • Second Project - heka
    • Learned Go in 2012
    • Go 1.0 had just launched
    • Simple, fast language to get started in
  • Other Projects
    • Push system (first prototype in Go)
  • Current Project
    • Push system (continuing iterations)

Go is awesome!

  • Simple language
  • Nice concurrency primitives
  • Fairly concise code
  • Much faster than Python
  • … and whatever else the Internet has said

Problems in Go-land

Goroutine Memory Use

  • Go 1.2
    • 8 KB initial goroutine stack (up from the original 4 KB)
    • Expensive for the standard socket model, which uses two goroutines per connection:
      • One reader
      • One writer
    • Refactored Push to have only a reader per connection, plus a shared pool of writers
  • Go 1.4
    • 2 KB initial stack (with a better contiguous-stack sizing algorithm)
    • Going back to the now-optimal reader/writer pair per connection will require a large refactor
    • The refactor still makes sense because of other issues we found with the writer pool
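
The stack sizes above are easy to turn into a per-server estimate. Assume a hypothetical 100,000 connections and decimal units, and count stacks only. Two goroutines per connection at 8 KB each come to roughly 1.6 GB. One reader per connection comes to roughly 0.8 GB. The reader/writer pair at Go 1.4's 2 KB comes to about 0.4 GB.

The Go sketch below shows the Go 1.2-era workaround in miniature. Each connection keeps its own reader (omitted here), and outbound messages go through a small shared pool of writer goroutines. The `conn` and `job` types and `startWriterPool` are hypothetical. A `bytes.Buffer` stands in for a real `net.Conn`. This is an illustration, not the Push server's actual code.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// conn is a hypothetical stand-in for a client connection; a real server
// would hold a net.Conn here instead of an in-memory buffer.
type conn struct {
	mu  sync.Mutex // serializes writes coming from different pool workers
	out bytes.Buffer
}

// job is one outbound message addressed to one connection.
type job struct {
	c   *conn
	msg string
}

// startWriterPool starts n writer goroutines shared by every connection,
// replacing one dedicated writer goroutine per connection.
func startWriterPool(n int, jobs <-chan job) *sync.WaitGroup {
	wg := &sync.WaitGroup{}
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := range jobs { // exits once jobs is closed and drained
				j.c.mu.Lock()
				j.c.out.WriteString(j.msg)
				j.c.mu.Unlock()
			}
		}()
	}
	return wg
}

func main() {
	conns := make([]*conn, 1000)
	for i := range conns {
		conns[i] = &conn{}
	}

	jobs := make(chan job)
	wg := startWriterPool(4, jobs) // 4 writers serve all 1,000 connections
	for _, c := range conns {
		jobs <- job{c: c, msg: "hello\n"}
	}
	close(jobs)
	wg.Wait()

	fmt.Print(conns[999].out.String()) // hello
}
```

This sketch has one caveat. Two messages for the same connection can be picked up by different workers and written in either order, so a real pool needs extra care if per-connection ordering matters.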

Debugging

  • Most errors are basic strings (additional introspection requires a type assertion to pull out the underlying error)
  • Basic strings lack call-stack context
  • Many libraries use similar error strings in multiple places
  • Boilerplate for error handling is tedious
  • The live-debugging option at the time was gdb, which is not fun (the recently released godebug looks like a great improvement)

Goroutine Leaks

  • Channels are neat, but dangerous
  • The runtime’s deadlock detector doesn’t spot goroutine leaks (it only fires when every goroutine is blocked)
  • Even Go experts can’t write leak-free code
  • Goroutine leaks are annoying to pin down
  • Adding suitable select statements introduces more boilerplate

Testing

  • Everything should be an interface!
  • You can’t mock an argument unless it’s an interface
    • Many libraries don’t provide interfaces for complex structs, so you have to fabricate a large interface yourself just so the whole thing can be mocked
  • Extensive interfaces are needed all over, regardless of appropriateness
  • Very labor-intensive to achieve a high degree of test coverage (and many additional lines of code)
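
Below is a minimal sketch of the pattern described above. The consuming code declares a narrow interface covering only the methods it calls, and the test supplies a hand-written fake. `Store`, `Greeting`, `ErrNotFound` and `fakeStore` are hypothetical names. The cost shows up when the real dependency is a large concrete struct and the code under test touches many of its methods. Each of those methods then has to be re-declared on the interface and implemented on the fake.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNotFound is returned by a Store when a key is missing.
var ErrNotFound = errors.New("not found")

// Store is a hypothetical consumer-side interface listing only the method
// Greeting needs from some concrete storage client.
type Store interface {
	Get(key string) (string, error)
}

// Greeting accepts the interface rather than a concrete client, which is
// what lets a test substitute a fake.
func Greeting(s Store, user string) (string, error) {
	name, err := s.Get(user)
	if errors.Is(err, ErrNotFound) {
		return "Hello, stranger", nil
	}
	if err != nil {
		return "", err
	}
	return "Hello, " + name, nil
}

// fakeStore is a hand-written test double backed by a map.
type fakeStore map[string]string

func (f fakeStore) Get(key string) (string, error) {
	v, ok := f[key]
	if !ok {
		return "", ErrNotFound
	}
	return v, nil
}

func main() {
	s := fakeStore{"u1": "Ada"}
	for _, user := range []string{"u1", "u2"} {
		g, err := Greeting(s, user)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(g) // Hello, Ada, then Hello, stranger
	}
}
```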

A Return to Python

A quick Python prototype…

  • Used twisted on Python 2.7 with the autobahn websocket library
  • First draft passing integration tests in 3 hours
  • Load-tested under PyPy: used dramatically less memory than Go
  • Complete implementation in 4 days
  • 100% code coverage in 2 weeks (Go version never got more than 65%)
  • Mature libraries meant websockets ran better than before

Memory Use

  • Competitive with Go or better (especially if we count goroutine leaks)
  • M:N schedulers have their own unpredictable cost
  • The async model gives more predictable memory use (goroutine spawning and killing is frequently outside your control)

Performance

  • Go is generally 50-100x faster than CPython
  • PyPy is 5-25x faster than CPython
  • Being within 2-10x of Go is much more acceptable than 50-100x
  • Go’s SSL is implemented in pure Go (very CPU-intensive; the latest Go improves this, and Cloudflare has unmergeable Intel-optimized SSL assembly code with huge improvements)
  • PyPy is a game-changer

Concise, Debuggable Code

  • A small codebase means it’s easy to read the relevant portions quickly
  • Tracebacks are captured in full, with local variables, in production (thanks Sentry!)
  • Async code has no global locks, no channel blocks (callback chains still aren’t fun)

Gotchas and Lessons Learned

  • RAM use of Python objects is rarely benchmarked
    • Wrote a custom nose extension to measure memory use in integration tests
    • Blog posts on memory profiling are all outdated, and so are the tools…
  • Be aware of underlying implementation issues
    • Regular Python objects keep their attributes in a per-instance dict. twisted class objects already have lots of attributes; avoid adding more, or you can double memory use per instance (connection)
    • twisted buffers are independent of kernel TCP buffers. Know where the data is, to avoid excess per-connection state
      • In Go, we block instead, requiring more goroutines for similar functionality

Gotchas and Lessons Learned

  • Revisit underlying assumptions
    • Just because someone else blogged it doesn’t mean it’s true
    • Don’t assume other smart people didn’t make a mistake, or that their situation applies to you
    • We used Go for Push partly because of Urban Airship blog posts about Python not handling enough connections efficiently
    • PyPy changes the game
    • CPython itself has improved a lot over the years
    • We didn’t actually end up CPU-bound, but memory-bound
      • PyPy makes a big difference in cutting memory use

Gotchas and Lessons Learned

  • SSL is very expensive
    • 20 KB is the minimum SSL footprint per connection
    • Spikes to 36 KB during use, to hold 16 KB recv/send buffers
    • Google has a 10 KB SSL memory secret
      • Bonus points if you can get them to admit how they did it!
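
A rough worked example shows how these per-connection figures scale. It assumes a hypothetical node holding 100,000 connections and uses decimal units (1 KB = 1,000 bytes). The 36 KB line is an upper bound that applies only if every connection is active at once.

$$
\begin{aligned}
100{,}000 \times 20\ \text{KB} &= 2\ \text{GB} && \text{(minimum footprint)}\\
100{,}000 \times 36\ \text{KB} &= 3.6\ \text{GB} && \text{(all connections mid-use)}\\
100{,}000 \times 10\ \text{KB} &= 1\ \text{GB} && \text{(the 10 KB figure above)}
\end{aligned}
$$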

Next?

Concluding thoughts…

  • CPython handles our use-case fine
  • PyPy lets us go where we need to
  • Go is not a big enough improvement over PyPy for the costs it imposes
  • Obviously it’s time for Rust

Thank you!

Questions?