Go 1.4+ Garbage Collection (GC) Plan and Roadmap

Richard L. Hudson August 6, 2014

golang.org/s/go14gc

Discussion here: https://groups.google.com/d/msg/golang-dev/pIuOcqAlvKU/C0wooVzXLZwJ

The goal of the Go 1.5 (June 2015) Garbage Collector (GC) is to reduce GC latency, making Go acceptable for implementing a broad spectrum of systems requiring low response times.  Quantitatively this means that for adequately provisioned machines limiting GC latency to less than 10 milliseconds (10ms), with mutator (Go application code) availability of more than 40 ms out of every 50 ms. Hardware provisioning should allow for in-memory heap sizes twice as large as reachable memory and 25% of CPU cycles, typically one out of 4 hardware threads, available for GC tasks. These goals align with a future with ever-increasing numbers of hardware threads and an ever-increasing amount of memory, all within an ever-decreasing power and expense budget.

Tactically, we will use a hybrid stop the world (STW) / concurrent garbage collector (CGC). The STW piece will limit the amount of time goroutines are stopped to less than 10 milliseconds out of every 50 milliseconds. If the GC completes a cycle in this time frame, great. If not, the GC will transition into a concurrent GC for the remainder of the 50 millisecond block. This process will repeat until the GC cycle completes. As a practical matter if one has a 50 millisecond response quality of service (QOS) requirement one should expect to have 40 milliseconds in which to do mutator tasks. These numbers assume hardware equivalent to a generic $1000 desktop box running Linux.

To meet these goals we are designing a GC where the majority of the concurrent GC work is done on one or more dedicated CPUs while the mutators run on the remaining CPUs. The mutators will be periodically asked to perform GC work proportional to the size of their stack, the amount of memory they are allocating, and the amount of pointer mutation being performed. Any work the concurrent GC asks of the mutators will require that that mutator be paused for less than 1 millisecond and that other mutators not be blocked by the GC work the mutator is running.

In the sometimes redundant jargon of GC what we propose for 1.5 is a non-generational, non-moving, concurrent, tri-color, mark and sweep collector. Reference [1] provides an excellent overview. The mark phase establishes which objects are reachable. The sweep phase establishes which objects are unreachable. Both will be able to run concurrently with the mutators.  Of course achieving low latency will impact the throughput of the mutators. In particular the mutators will have to monitor whether the mark phase is active and if so perform some write barrier work when mutating pointers. Furthermore each mutator will occasionally be asked to suspend and scan its stack. Similar to what we do today, mutator allocation may require the mutator doing sweep work on behalf of the GC. We currently do not have clear visibility into how much 1.5 throughput will be affected but future hardware will have more cores so dedicating one or more to GC should not be too painful.      

The 1.6 (December 2015) design goals will be driven by use cases, feedback from the community, and our experience with 1.5. While other goals may intervene, 1.6 will most likely be used to improve throughput by adding bump pointer allocation as well as a generational copy collector for nursery (new) spaces. The mature (old) space will be managed using our concurrent GC.  The nurseries will be sized so that they can be collected in 10 milliseconds. This pause will be within the range of the 1.5 collector which will ensure a continuity of runtime environments for Go application as we move forward. The decision to prioritize concurrency in 1.5 over throughput in 1.6 was a conscious choice intended to accelerate support for new users.

It's important that the reduced latency planned for release 1.5 does not cause unexpected slowdown. The new algorithm does affect throughput, though. We therefore plan to offset the independent performance improvements scheduled for the 1.4 release by pulling into 1.4 some of the overhead that would otherwise be introduced in 1.5. The idea is that performance should monotonically improve across releases. Without pulling some of this work into 1.4, 1.4 would speed up and 1.5 would slow down again, which would be surprising.  The work being pulled in includes logic for starting and stopping Go routines individually and adding a write barrier infrastructure suitable not only for concurrent GC but also for the potential 1.6 generational copying collector. The 1.4 release will also remove C runtime code that uses Go pointers and various unsafe pointer constructs that are not compatible with either the concurrent or copying collectors. Go users will need to remove incompatible constructs from their code also. Exactly what this will entail is being worked out, and go vet will help identify them. The 1.5 goal is to minimize net throughput performance degradation as we dramatically reduce GC latency. It is a challenge to do all of this in the 1.5 timeframe but after 1.5 we expect to see ever-increasing performance improvements not only from runtime improvements but also from being able to leverage increased core counts and increased memory availability without having to modify any application code.

Our hope is that the Go community understands and supports what the runtime team is trying to accomplish in 1.4, 1.5, and beyond. Such support is needed if we are to be successful in building an infrastructure that will support a shared vision of a more concurrent yet performant runtime going forward.

[1] The Garbage Collection Handbook: The Art of Automatic Memory Management (Chapman & Hall/CRC Applied Algorithms and Data Structures series) R. Jones, A. Hosking, E. Moss