Jumbo

A unity build system for Chromium

Daniel Bratell

0:00

Welcome to this presentation of Jumbo, the unity build system for Chromium. My name is Daniel Bratell and I will guide you through the background, implementation and results.

Feel free to ask questions if something is unclear but this might be a little tight on time so please wait with bigger topics until after the last slide.

TL;DR

Jumbo: 3 times faster* builds now and possibly 9 times faster builds in the future.

*) Terms and conditions apply. Numbers for a full chrome+content_shell+blink_tests build on a 4 core/8 thread machine.

0:45

The problem

The compilation time for Chromium is long

Very long

Extremely long

Hours.

1:30 Delays releases
Slows down developers
Scares away contributors
Slows down continuous integration
Consumes machine time

How long is long?

36 CPU hours for chrome+content_shell+blink_tests

More for "all"

CPU hours as measured on an average logical CPU of an Intel i7-4790K at 4.0 GHz running at full load with hyper-threading enabled

What is not long? 1 minute? 5 minutes? 10 minutes? 20 minutes?

xkcd 303

https://www.xkcd.com/303/

(by Randall Munroe)

2:55

1:30 Delays releases
Slows down developers
Scares away contributors
Slows down continuous integration
Consumes machine time

Compile time evolution

In five years

34 min → 4 hours

+1% per week

3:30 When did this happen? Was it always this bad? This is a graph of the uncached build time on one of our build machines that has been in service for long enough to give us a history.

No distinct regression. It just happened. Zooming in anywhere on the graph gives you a lot of small increases related to small changes.

Will newer hardware solve it? (Betteridge's law of headlines)

Hardware does not keep up

5:30 But maybe this is not a problem? Everything in software always gets bigger and more complex and we have always done fine, so why would we not be able to handle this? Good question. Is the solution that contributors upgrade their hardware? Sadly no. If we look at how the top consumer systems have evolved, they have become faster, and the recent introduction of 50% more cores has helped, but it has not kept up at all. Hardware of equivalent cost has doubled in speed, while the amount of work has increased by a factor of seven.

Time vs repo size evolution

One reason is the amount of source code, but plotting the number of lines of text in the repo alongside compilation times shows only a correlation, not an exact match.

The blip at the end is third_party/deqp (5 million lines of tests for something). In 5 years the amount of code has roughly doubled, not increased by a factor of 7.

Another factor, one that might explain the quick compile time growth over the last few years, is that Chromium jumped from the C++ of the 20th century to C++11. C++11 has features that encourage heavier use of templated libraries, which cost more per line of code to compile.

It can also be toolchain changes. Most of my investigation has been on Linux with clang which means that the toolchain has also evolved gradually.

There might also be more generated code, such as mojo, which is not included in the measurements.

Time consumers

  • Code generation
  • Compiling
  • Linking

Can use the ninja log (out/something/.ninja_log)

No file needs more than 0.04% of the total time but:
There are 44,000 .cc files and 43,000 .cpp files in Chromium (including generated code).

6:30 You can analyze this pretty well by studying the ninja log. It contains timestamps for every operation performed by ninja. While it will not tell you *why* something is slow, it will tell you what is slow.

The main conclusion is that in a full build, there is normally no single operation that takes a long time, but there are many operations. If you look in the Chromium tree there are many tens of thousands of C++ source files, most of them small.
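To make this concrete, the .ninja_log can be mined with a few lines of code. Here is a minimal sketch, assuming the v5 log format (a "# ninja log v5" header line, then tab-separated start_ms, end_ms, mtime, output path, command hash):

```python
# Sketch: find the slowest build steps recorded in a .ninja_log.
# Assumes the v5 format: "# ninja log v5" header, then tab-separated
# fields: start_ms, end_ms, mtime, output path, command hash.

def slowest_steps(log_text, top=5):
    """Return (duration_seconds, output) pairs, slowest first."""
    steps = []
    for line in log_text.splitlines():
        if line.startswith("#"):  # skip the version header
            continue
        start_ms, end_ms, _mtime, output, _cmd_hash = line.split("\t")
        steps.append(((int(end_ms) - int(start_ms)) / 1000.0, output))
    return sorted(steps, reverse=True)[:top]

example_log = "\n".join([
    "# ninja log v5",
    "0\t40\t0\tobj/fast.o\tdeadbeef",
    "0\t9000\t0\tobj/slow.o\tcafebabe",
])
print(slowest_steps(example_log))  # slow.o first, at 9.0 seconds
```

As the talk notes: this tells you *what* is slow, not *why*, and in a Chromium build the answer is rarely one big step but tens of thousands of small ones.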

Example 134 lines components/viz/services/display/color_lut_cache.cc

Example 78 lines: third_party/angle/third_party/spirv-tools/src/source/opt/build_module.cpp

4841 cpp files in Blink, median length 110 lines. Example third_party/WebKit/Source/core/dom/NodeRareData.cpp

Files are small

Median length of .cc files is 134 lines
Median length of .cpp files is 78 lines [pre-Blink move].

Preprocess a Blink file of length 110 lines and you get 244,000 lines of source code.
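To put that in perspective, a quick back-of-the-envelope calculation using the deck's own figures:

```python
# Back-of-the-envelope: how much of a translation unit is the file itself?
# Numbers from this deck: a 110-line Blink file preprocesses to ~244,000 lines.
source_lines = 110
preprocessed_lines = 244_000

expansion = preprocessed_lines / source_lines
own_share = 100.0 * source_lines / preprocessed_lines
print(f"expansion factor: ~{expansion:.0f}x")        # ~2218x
print(f"file's own lines: ~{own_share:.2f}% of the TU")  # ~0.05%
```

In other words, well over 99.9% of the lines the compiler sees for such a file come from headers.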

8:00 Example 134 lines components/viz/services/display/color_lut_cache.cc

Example 78 lines: third_party/angle/third_party/spirv-tools/src/source/opt/build_module.cpp

4841 cpp files in Blink, median length 110 lines. Example third_party/WebKit/Source/core/dom/NodeRareData.cpp

Per file

In lines the source file is nothing!

In compile time almost nothing

8:50 Example third_party/WebKit/Source/core/dom/NodeRareData.cpp

Precompiled headers

Been used for system headers for a long time

2015: Blink got more comprehensive precompiled headers

Saves 10-20% of the compile time

9:40 Yay. Except that 10-20% is not enough.

Effect precompiled headers

10:15 We got an improvement when we added good precompiled headers in 2015-2016, but as you can see it was not a long term solution.

So if something doesn't work you bring a bigger hammer, right? The bigger hammer in this case is unity builds.

Unity builds

In Chromium: Jumbo builds

  • Compile a lot of code in a single translation unit
  • Common in large projects and the games industry
  • Long used for Blink's v8 bindings, and by third_party/sqlite
  • Used in WebKit since February

JUMBO BOX #1: File1.cc, File2.cc, File3.cc, File4.cc, File5.cc, File6.cc, File7.cc, File8.cc

10:50

The idea is that you take a lot of C++ files and merge them together. This cuts down on the overhead of compiling every small file by itself. Some say this works because it reduces I/O, but that is not true: builds are CPU bound, and it works because the CPU has less to do.

Jumbo

Is it an elephant?

Is it a Dutch supermarket?

Is it a unity build system?

11:00 The name jumbo comes from an older unity build system. Just calling them unity builds unfortunately turned out to confuse people because of existing associations with the Unity 3D engine.

If you have heard rumors of someone adding a 3D game engine to Chromium, that might be our fault.

Jumbo in BUILD.gn

source_set("my_code") {
  sources = [ "file1.cc", "file2.cc", "file3.cc" ]
}

import("//build/config/jumbo.gni")
jumbo_source_set("my_code") {
  sources = [ "file1.cc", "file2.cc", "file3.cc" ]
}

11:30 Currently jumbo is implemented through a gn template. To add jumbo support to a build target, you import the jumbo template file and add the prefix "jumbo_" to the target type. This is available for source_set, static_library and component targets. Some other existing templates also have jumbo support (core_sources, mojo, ….)

Jumbo in the file system

Template action generates gen/…/my_code_jumbo_1.cc:

#include "../../chrome/my_code/file1.cc"
#include "../../chrome/my_code/file2.cc"
#include "../../chrome/my_code/file3.cc"
#include "../../chrome/my_code/otherfile.cc"

my_code_jumbo_1.cc compiled as usual

12:20 The template creates an action that generates the jumbo files and sends it all to gn and ninja.
Ninja executes the action and compiles the generated source file with the same flags it would have used for the original source files.
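The generation step itself is tiny. Here is a sketch of what such an action does (write_jumbo_file is a name made up for this illustration, not Chromium's actual script):

```python
# Sketch of the jumbo merge step: emit a generated .cc file that
# #includes the original sources; the generated file is then compiled
# as usual. write_jumbo_file is a made-up name for illustration.
import os
import tempfile

def write_jumbo_file(sources, out_path, source_root="../.."):
    """Write a jumbo .cc file that #includes each source file."""
    lines = ['#include "%s/%s"' % (source_root, src) for src in sources]
    with open(out_path, "w") as f:
        f.write("\n".join(lines) + "\n")
    return lines

out_path = os.path.join(tempfile.mkdtemp(), "my_code_jumbo_1.cc")
lines = write_jumbo_file(
    ["chrome/my_code/file1.cc", "chrome/my_code/file2.cc"], out_path)
print(lines[0])  # #include "../../chrome/my_code/file1.cc"
```

Because the output is an ordinary .cc file, nothing downstream in gn or ninja needs to know it was merged.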

Jumbo

90% of the lines are still from headers

… compiling the source code is now half the time!

13:10 Taking the earlier example with NodeRareData, if we join it up with all 130 files in core/dom, we get a chart like this. Most of the lines are still headers, but now only half the time is spent on headers, and many of those headers will only be compiled once, so they barely count as headers.

Compare this with what we had before.

Example: third_party/WebKit/Source/core/dom jumbo, 130 files, 23.8 seconds (up from 4.9 s): 473k lines (up from 244k lines)

(possibly ~5% error in numbers due to reading graphs rather than having the hard numbers)

Per file

In lines the source file is nothing!

In compile time almost nothing

13:50 Barely any of the time spent was on the C++ code we wanted to compile.

Jumbo

90% of the lines are still from headers

… compiling the source code is now half the time!

14:00 Now it's half. Much less overhead.

Example: third_party/WebKit/Source/core/dom jumbo, 130 files, 23.8 seconds: 473k lines

(possibly ~5% error in numbers due to reading graphs rather than having the hard numbers)

Metaphor

If you have a lot of cargo, some tools are more efficient

14:25 I'd like to use some analogies here. There is a reason mining companies don't transport ore in Ford pickup trucks.

Sources: https://en.wikipedia.org/wiki/Traffic . Interstate 80, seen here in Berkeley, California, is a freeway with many lanes and heavy traffic.

https://upload.wikimedia.org/wikipedia/commons/5/5c/CamionFermont.png Truck 172 from Mont-Wright mine, on display in Fermont, QC, Canada. This truck had the world record for number of hours of service in 2006.

Metaphor

If you have a lot of cargo, some tools are more efficient

14:45 And there is a reason that when you want to transport a lot of goods a long distance, you get a big boat.

Jumbo

Compiling:

chrome+

content_shell+

blink_tests

4 core/8 thread

15:00 But Chromium is not a container or a piece of mined ore. Still, the same approach with "bigger is better" works here as well for the same reasons. We get less overhead.

This graph shows jumbo's progress on the reference hardware. There are a few things that I want to point out here.

First, jumbo works, and it has more and more effect as we have added support to more parts of the code. Now blink, content, cc, v8, ui, base and pdfium support jumbo builds. And some pieces I've surely forgotten.

Second, jumbo builds are barely affected by the general slowing down of non-jumbo builds.

Third, we are still far from the full potential of jumbo builds. From testing we know we can get down to 40-45 minutes, and that still leaves enough non-converted code that I think we can reach 30-minute full builds on a 4 core machine.

Not included here are all the tests outside Blink. They are a substantial part of the compile time, but I have no data, and we have also not added any jumbo support to those.

4.8h to 1.6h - 2018-02-14 Opera

Landed: 2174 CPU minutes default, 1614 of those converted to jumbo. Jumbo pieces compile in 180 minutes (9.0x faster). Remaining 560 CPU minutes compile at normal speed. Total: 740 CPU minutes

WIP: 2163 CPU minutes default, 1948 of those converted to jumbo. Jumbo pieces compile in 244 minutes (8.0x faster). Remaining 215 CPU minutes compile at normal speed. Total: 459 CPU minutes

Potential: ~2150 CPU minutes default, 6-9x faster? ~350 CPU minutes (2.5x faster than today)

Less speedup of remaining code because targets are smaller and the fewer files in a target, the less jumbo does.
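The totals above follow from simple arithmetic. A small sketch that reproduces them (the helper name is mine):

```python
# Sanity check of the CPU-minute arithmetic on this slide: converted
# code compiles `speedup` times faster, the rest stays at normal speed.
def jumbo_total(default_minutes, converted_minutes, speedup):
    """Total CPU minutes for a partially jumbo-converted build."""
    remaining = default_minutes - converted_minutes
    return converted_minutes / speedup + remaining

print(jumbo_total(2174, 1614, 9.0))  # close to the slide's 740 (landed)
print(jumbo_total(2163, 1948, 8.0))  # close to the slide's 459 (WIP)
```

The same formula explains the "potential" row: as more of the ~2150 default CPU minutes are converted, the unconverted remainder dominates less and less.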

Jumbo

Compiling:

chrome+

content_shell+

blink_tests

On a 4 core/8 thread computer

16:35 We have more or less reached the milestone of jumbo, on this hardware, compiling three times as fast. Meanwhile we have also seen the potential increase, because a larger and larger fraction of the time seems to be spent compiling the same headers over and over again.

Not always faster

  • Always more efficient
    • Saves roughly 150 Wh per full build
  • Less parallel
    • Concern if >100-1000 cores
      • Longer executing bottleneck tasks
      • Idling waiting for dependencies
    • Not an issue for "normal" hardware
  • Tunable
    • Currently tuned for ~10-20 cores or ~100 cores if goma

17:30 There are some caveats though. Jumbo builds will not make builds faster for everyone. They will be more efficient, but they are less parallel so if you have many cores you can't always take advantage of all of them. Jumbo is tunable so we can decide if we want jumbo units with 2 cc files in each, or a thousand files in each. Currently we support two configurations, one for ~10-20 cores, one for ~100 cores.
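The tunability mentioned above boils down to choosing how many files go into each jumbo unit (in Chromium this is controlled by a GN argument; I believe it is called jumbo_file_merge_limit, but treat that name as an assumption). A toy sketch of the tradeoff:

```python
# Sketch of the jumbo tuning tradeoff: with N source files and at most
# L files per jumbo unit you get ceil(N/L) compile steps. Fewer steps
# mean less per-file overhead, but also fewer tasks to run in parallel.
import math

def jumbo_units(num_files, merge_limit):
    """Number of jumbo translation units for a target."""
    return math.ceil(num_files / merge_limit)

n = 130  # the core/dom example from earlier in this deck
for limit in (2, 8, 50):
    print(limit, "files/unit ->", jumbo_units(n, limit), "compile steps")
```

With 8 cores you want comfortably more compile steps than cores; with hundreds of distributed cores, a large merge limit starves the build of parallelism.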

In particular, there is no configuration tuned for the most extreme distributed compilation networks. If we return to the previous metaphor,

Metaphor

If you have a lot of cargo, some tools are more efficient

17:45 Once you have built a 1000 lane highway and bought a million small cars, it may actually have higher capacity than your fleet of mining trucks.


Not always faster

  • Always more efficient
    • Saves roughly 150 Wh per full build
  • Less parallel
    • Concern if >100-1000 cores
      • Longer executing bottleneck tasks
      • Idling waiting for dependencies
    • Not an issue for "normal" hardware
  • Tunable
    • Currently tuned for ~10-20 cores or ~100 cores if goma

18:00

Good things

Main target

  • Average compilation time per file reduced by 80-95%

Accidental positive effects

  • Less compiler output (debug component build tree 26 GB → 14 GB)
  • Faster linking (due to less data to process)
  • Cheap "global" optimization (per jumbo unit)

18:45 That said, jumbo was intended to help those without that kind of system. And it does.

The main target is to reduce compilation time per file, and in Chromium it achieves a reduction of 80-95%, or 5 to 20 times faster.

There are also other more accidental positive effects. It turns out jumbo builds use less disk, which is nice.

(jumbo, debug component 14GB, jumbo debug static, 23GB, non-jumbo debug component 26 GB, non-jumbo debug static xxx GB)

They also link faster, probably because there is less data for the linker to process.

Jumbo builds also create faster binaries. When you show the compiler more source code, it makes better decisions. The result is a 1-2% improvement on Speedometer. Probably less than full program optimization, but it's a nice accident.

More positive side effects

  • Duplicate code removal
  • Dead code removal + dead member removal
  • Addition of include guards
  • Solved the X11 header problem (no more #undef None)
  • Namespace renamings, fewer ::-prefixes
  • Easier to find symbols when their names are more distinct than content::{anonymous namespace}::kValue

20:00 Dead code was removed from the build system, because object files consisting of 100% dead code resulted in jumbo build errors.

Complications

  • With jumbo: "1 cc file" ≠ "1 translation unit"
    • local names become less local.

To a lesser degree:

  • More code is exposed to system headers
  • Stress testing tools ("objcopy O(n²) in number of sections")
  • Worse IDE support in gn
  • Increased single file rebuild times (unless linking makes up for it)

22:00 So the big question is why we haven't done this to the whole tree. The answer is that there are complications. The main complication is that symbols in different files collide. Symbols intended to be local become visible to neighbouring files, which may be using the same name for something else, or for a duplicate copy of the same thing. That has to be resolved in some way. This is normally trivial for someone who knows the code, but it can be hard for someone unfamiliar with the code to know how to resolve the clashes.

There are also some smaller complications, but nothing that really holds us back.

Error messages are uglier (adding "../../" to the paths)

Some long file names (from 138 to 148 chars; 148 > 142 == the ecryptfs limit)

Infrastructure

3 fyi bots

Mail a selected few when one of them breaks (sometimes twice a day, sometimes once a week)

CQ support coming any day/week

23:15 Linux (large chunks), Windows, Mac

To do

CQ support
Star crbug.com/782863 for updates (unless it's already fixed)

Add support to the rest of Chromium (after CQ)
Tests, components, services, third_party

Native gn support
For better IDE support
PoC/patch by Tomasz Moniuszko exists
Star crbug.com/772918 for updates

24:00

Assistance by clang (PoC covered in lightning talk)

Create sandboxes for each file and prevent some of the problems

PoC: Add a pragma that hides/disables current anonymous namespaces. Cannot be developed purely as a plugin.

Doesn't solve all problems but prevents the more annoying ones

Time frame: Uncertain, months to infinity

In lightning talk

Deprecating other systems

  • split_static_library
    • Needed because libs > 2 GB. Not the case in jumbo builds
  • component builds
    • Solving long link times. Linking is (a bit) faster in jumbo
  • Putting a lot of source in one file
    • v8 has not split up its code because of compile times. Faster in jumbo?

Jumbo alternatives

Distributed compilation (icecc, distcc, an open and independent goma)
Can sometimes be combined with jumbo for win-win

Faster compilers
clang can be 20% faster if compiled with other flags
compilers that cache state
C++ modules?
Can be combined with jumbo for win-win

Another implementation language than C++?

Distributed compilation

Missing good alternatives for Windows and requires a farm of computers

Precompiled headers

20-30% impact on a normal build, but actually slower with jumbo, since they add work but save little or no time

Open source goma instead of clang

goma client is open source while the server is still proprietary

If it depends on Google's infrastructure:
  • Intellectual property
  • Price
  • Dependence
  • "Privacy"
Else:
  • Access to hardware (works for large companies but not individuals and small companies)
  • Maintenance

Combine both!

The devil is in the details

How to make it faster?

  • Less code: No
  • Precompiled headers: Saves only 10-20%
  • Faster compilers: No silver bullet, maybe 20%
  • Faster hardware: Not fast enough
  • Distributed compilation: Really helpful if massive hardware is available and the distribution systems work; can save 90+% of the time
  • Unity builds: Requires code changes; can save 90% of the time

Questions

"If you are wondering something, there are probably many others that would like to know the answer as well."

/ Me - right now

25:00 Unless we've run out of time

Thanks!

Daniel Bratell @ Opera
Mostyn Bramley-Moore @ Vewd
Bruce Dawson @ Google (Windows/goma)
Dirk Pranke @ Google (infrastructure)
Tomasz Moniuszko @ Opera (native gn, IDE support)
Jens Widell @ Opera (clang support)

And many, many others (haraken, thakis, the_stig, pdr, fs, avi, kinuko, sky, sadrul, ...) who have tested, experimented, reviewed and added jumbo support to code

Lesson to self: You should not start to name those you want to thank when they are too many to name

Jumbo Presentation BlinkOn9 - Google Slides