1 of 33

Eric Burnett, Oct 2019

One Minute Presubmits

London Build Meetup Talk

Proprietary + Confidential

2 of 33

The Space


3 of 33

Presubmits...


4 of 33

Goal:

60 seconds


5 of 33

Latency vs Happiness


6 of 33

Latency vs Happiness


7 of 33

The Problem


8 of 33

Builder-centric world


9 of 33

Storage-centric world


10 of 33

Digression: Directories

Logically, a directory is a nested mapping from

path -> [attributes, contents]

It does not have to live on a single disk anywhere.

Could be a flat ‘manifest’:

/src/lib/a/a.txt -> [775, “aabbccdd1122”]

/src/lib/a/b.txt -> [775, “12341234aabb”]

Or a merkle tree (shown in the slide diagram).
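As a rough illustration of the two representations, the sketch below converts a flat manifest into a merkle tree. The encoding (JSON directory objects hashed with SHA-256) and the helper names `build_tree` and `merkle_digest` are assumptions made for this example; the talk does not specify a wire format.

```python
import hashlib
import json


def build_tree(manifest):
    """Turn a flat {path: (mode, blob_digest)} manifest into nested dicts."""
    root = {}
    for path, entry in manifest.items():
        parts = path.strip("/").split("/")
        node = root
        for name in parts[:-1]:
            node = node.setdefault(name, {})
        node[parts[-1]] = entry  # files are tuples, directories are dicts
    return root


def merkle_digest(node, store):
    """Hash a directory bottom-up and store each directory object by digest."""
    children = {}
    for name in sorted(node):
        child = node[name]
        if isinstance(child, dict):
            children[name] = ["dir", merkle_digest(child, store)]
        else:
            mode, blob = child
            children[name] = ["file", mode, blob]
    encoded = json.dumps(children, sort_keys=True).encode()
    digest = hashlib.sha256(encoded).hexdigest()
    store[digest] = encoded
    return digest


# The manifest from the slide (digests are the slide's placeholder values).
manifest = {
    "/src/lib/a/a.txt": (775, "aabbccdd1122"),
    "/src/lib/a/b.txt": (775, "12341234aabb"),
}
store = {}  # hypothetical content-addressed store for directory objects
root = merkle_digest(build_tree(manifest), store)
```

The single `root` digest now identifies the whole directory: any change to a file changes the digest of every directory above it, while untouched subtrees keep their digests and can be shared.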


11 of 33

Remote Execution principles

Move work off-host

  • Work moved off-host can be parallelized onto the right number of right-sized workers.

Reuse prior results

  • Small, well-defined units of work are cacheable, and cached results don’t need to be executed again.

Copy as little as possible

  • Data is only needed in two places: where it’s produced, and where it’s used. Every intermediate copy is unnecessary overhead. And file data is cacheable too!
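The "reuse prior results" principle can be sketched as a cache keyed by everything that defines a unit of work. The `ActionCache` class, the key layout (command plus input root digest), and `fake_compile` are hypothetical names for this example, not an API described in the talk.

```python
import hashlib
import json


class ActionCache:
    """Caches results by a digest of (command, input root digest)."""

    def __init__(self):
        self._results = {}
        self.executions = 0

    def run(self, command, input_root, execute):
        key = hashlib.sha256(
            json.dumps([command, input_root]).encode()
        ).hexdigest()
        if key not in self._results:
            # Cache miss: only now is the work actually executed.
            self.executions += 1
            self._results[key] = execute(command, input_root)
        return self._results[key]


def fake_compile(command, input_root):
    """Stand-in for a remote worker; returns a made-up output handle."""
    seed = (" ".join(command) + input_root).encode()
    return "out-" + hashlib.sha256(seed).hexdigest()[:12]


cache = ActionCache()
first = cache.run(["cc", "-c", "a.c"], "root-digest-1", fake_compile)
second = cache.run(["cc", "-c", "a.c"], "root-digest-1", fake_compile)  # hit
third = cache.run(["cc", "-c", "a.c"], "root-digest-2", fake_compile)   # miss
```

Only the digest of the inputs takes part in the lookup, so a cache hit needs no file data on the host at all.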


12 of 33

Structure of a “Build”

Source Fetch

  • Fetching the main repository

Dependency Fetch

  • Fetching dependencies
    • Repositories and binary artifacts both

Transforms

  • Transforming the source code before building
    • Applying patches, generating build files, injecting constants, leak checks, include checks, ...

Build/Test

  • Running the compile and/or tests

Results

  • Uploading artifacts for persistence


13 of 33

Presubmit time, before Remote Execution

  • Build / Test: 89%
  • Dependency Fetch: 7%
  • Transforms: 3%
  • Source Fetch: 1%
  • Results: 0.5%


14 of 33

Presubmit time, with Remote Execution

  • Dependency Fetch: 54%
  • Transforms: 24%
  • Build / Test: 12%
  • Source Fetch: 6%
  • Results: 4%


15 of 33

Builder utilization

[Chart: builder CPU, network, disk, and memory utilization during each phase. Phase shares of presubmit time: Build / Test 12%, Dependency Fetch 54%, Source Fetch 6%, Results 4%, Transforms 24%.]


16 of 33

The Solution


17 of 33

Move everything off-host


18 of 33

New world


19 of 33

Storage-centric principles

Operate on metadata

  • Where possible, remove the data and operate only on metadata. Data can be fetched from storage where/when it’s needed.

Cacheable operations

  • Avoid executing the same process on the same data twice. Carve up data operations into small enough chunks that cache hits are common.

Minimal builder

  • Prefer to do heavy lifting in trusted services, and use builders to orchestrate: execute scripts, small tools and remote builds. Individual builders should not need significant resources, and lost builders should not be a significant setback.


20 of 33

Build phases - goals

Source Fetch

  • Make the inputs of the build available
    • Need metadata, plus contents of a minority of files

Dependency Fetch

  • Resolve and make available the dependencies to the build
    • Similar to ‘source fetch’; mostly static build-over-build

Transforms

  • A combination of source analysis and source-to-source rewrites
    • Acting on a minority of files and/or highly cacheable

Build/Test

  • What we’re actually trying to accomplish!
    • Only build configuration is required locally, plus metadata

Results

  • Persisting artifacts so they’re available later
    • If files are already remote, handles are O(1) to save


21 of 33

Source Fetch

Build a virtual directory - merkle tree or manifest - of the repository at the desired ref.

New data is copied into storage (not shown).

Reference the virtual directory for downstream work.


22 of 33

Dependency Fetch

Build a virtual directory of each dependency at the desired ref.

At build time, stitch these together with the top-level source into a single virtual directory containing all input files at the appropriate locations.
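A minimal sketch of the stitching step, assuming virtual directories are nested dicts whose leaves are blob digests. The `mount` and `stitch` helpers and the example paths are hypothetical. The point is that stitching touches only the directories along each mount path, never file contents.

```python
def mount(root, path, subtree):
    """Return a new tree with `subtree` placed at `path`; inputs are not mutated."""
    parts = [p for p in path.split("/") if p]
    if not parts:
        raise ValueError("mount path must not be empty")
    head, rest = parts[0], parts[1:]
    new_root = dict(root)  # shallow copy: untouched children are shared
    if rest:
        child = root.get(head, {})
        if not isinstance(child, dict):
            raise ValueError(f"{head!r} is a file; cannot mount beneath it")
        new_root[head] = mount(child, "/".join(rest), subtree)
    else:
        if head in root:
            raise ValueError(f"{head!r} already exists at the mount point")
        new_root[head] = subtree
    return new_root


def stitch(source, deps):
    """Combine the top-level source tree with each dependency tree."""
    tree = source
    for path, subtree in deps.items():
        tree = mount(tree, path, subtree)
    return tree


# Leaves are placeholder digests; no file data is read or copied.
source = {"src": {"main.c": "d1"}, "BUILD": "d2"}
deps = {
    "third_party/zlib": {"zlib.h": "d3"},
    "third_party/png": {"png.h": "d4"},
}
stitched = stitch(source, deps)
```

In a merkle-tree representation the same operation would rehash only the directories on each mount path, which is why the cost is independent of the size of the dependencies.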


23 of 33

Transforms

Strategically rewrite the virtual directory as needed.

Applying a patch: point additions, removals, or replacements of a few specific files.

Applying a transform: shard the work and apply it recursively, caching based on

<tool, rule, blob, path?> -> blob

or subtrees of the same.
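The cache key above can be sketched as follows. This example caches per blob only, not per subtree, and the `TransformCache` class, the `expand-tabs` tool name, and the in-memory `blobs` store are assumptions made for illustration.

```python
import hashlib

blobs = {}  # hypothetical content-addressed store: digest -> bytes


def put(data: bytes) -> str:
    digest = hashlib.sha256(data).hexdigest()
    blobs[digest] = data
    return digest


class TransformCache:
    """Maps (tool, rule, blob, path?) -> output blob digest."""

    def __init__(self):
        self.entries = {}
        self.misses = 0

    def apply(self, tool, rule, fn, blob, path=None):
        # `path` is part of the key only for path-dependent transforms.
        key = (tool, rule, blob, path)
        if key not in self.entries:
            self.misses += 1
            self.entries[key] = put(fn(blobs[blob]))
        return self.entries[key]


def transform_tree(cache, tool, rule, fn, tree):
    """Recursively apply a per-file transform to a nested dict of digests."""
    out = {}
    for name, child in tree.items():
        if isinstance(child, dict):
            out[name] = transform_tree(cache, tool, rule, fn, child)
        else:
            out[name] = cache.apply(tool, rule, fn, child)
    return out


def expand_tabs(data: bytes) -> bytes:
    return data.replace(b"\t", b"    ")


tree = {
    "lib": {"a.txt": put(b"x\ty\n"), "b.txt": put(b"x\ty\n")},
    "main.txt": put(b"no tabs\n"),
}
cache = TransformCache()
first = transform_tree(cache, "expand-tabs", "v1", expand_tabs, tree)
misses_after_first = cache.misses  # a.txt and b.txt share one blob
second = transform_tree(cache, "expand-tabs", "v1", expand_tabs, tree)
```

Because the key is built from content digests rather than file names, identical files anywhere in the tree, and across builds, share one cached result.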


24 of 33

Build/Test

Execute the build tool on top of the virtual directory.

The build tool must be virtualization-aware: either take the virtual directory as input, or run on a FUSE filesystem with logic to get metadata (digests) from it and to insert remote outputs into it.


25 of 33

Results

Persist handles to remote files instead of the file contents themselves.

May also persist whole directories, if desired: e.g. state of the file tree after every step and at the end of the build.

Optionally, extend the lifetime/durability of these remote files for persistence.
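One way to picture this: a persisted result is a small record of digests plus a retention deadline for each digest. `BuildRecord`, `ResultStore`, and the pinning scheme are hypothetical. A real store would also need to extend the lifetime of every blob reachable from each root, which this sketch leaves out.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildRecord:
    build_id: str
    step_roots: tuple  # (step name, root digest) after each step
    final_root: str    # root digest at the end of the build


class ResultStore:
    def __init__(self):
        self.records = {}
        self.pinned_until = {}  # digest -> time until which it is retained

    def persist(self, record, keep_seconds, now=None):
        """Save handles only; the cost does not depend on output size."""
        now = time.time() if now is None else now
        self.records[record.build_id] = record
        digests = {record.final_root, *(d for _, d in record.step_roots)}
        for digest in digests:
            # Never shorten a lifetime that another build already extended.
            current = self.pinned_until.get(digest, 0)
            self.pinned_until[digest] = max(current, now + keep_seconds)


results = ResultStore()
record = BuildRecord(
    build_id="build-42",
    step_roots=(("fetch", "r1"), ("transform", "r2"), ("build", "r3")),
    final_root="r3",
)
results.persist(record, keep_seconds=60, now=1000)
```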


26 of 33

New world


27 of 33

Presubmit time, after remoting everything

  • Build / Test: 75%
  • Transforms: 11%
  • Dependency Fetch: 6%
  • Source Fetch: 6%
  • Results: 1%


28 of 33

Additional benefits

  • Reproducibility
    • Recreate any state; re-run a single process from the exact same starting point
    • Cheaply iterate on any single step
  • Auditability
    • “Free” persistence of whole input or output directories for debugging, auditing inputs, and tracking provenance
    • Diff directories across builds and steps
  • Flexibility
    • New, resource-intensive transforms can cheaply be added, so long as they’re cacheable.
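Diffing directories across builds is cheap when they are merkle trees, because identical subtrees have identical digests and can be skipped without being opened. The sketch below assumes a hypothetical store mapping each directory digest to its `{name: (kind, digest)}` entries. `put_dir` and `diff` are illustrative names.

```python
import hashlib
import json

store = {}  # hypothetical store: directory digest -> {name: (kind, digest)}


def put_dir(entries):
    """Store a directory and return its digest."""
    encoded = json.dumps(entries, sort_keys=True).encode()
    digest = hashlib.sha256(encoded).hexdigest()
    store[digest] = entries
    return digest


def diff(old, new, prefix=""):
    """Yield (change, path) pairs between two directory digests."""
    if old == new:
        return  # identical subtree: nothing below it needs to be read
    old_entries, new_entries = store[old], store[new]
    for name in sorted(set(old_entries) | set(new_entries)):
        path = f"{prefix}/{name}"
        before, after = old_entries.get(name), new_entries.get(name)
        if before == after:
            continue
        if before is None:
            yield ("added", path)
        elif after is None:
            yield ("removed", path)
        elif before[0] == "dir" and after[0] == "dir":
            yield from diff(before[1], after[1], path)
        else:
            yield ("modified", path)


docs = put_dir({"readme.md": ("file", "0f0f0f0f")})
lib_v1 = put_dir({"a.txt": ("file", "aabbccdd1122"),
                  "b.txt": ("file", "12341234aabb")})
lib_v2 = put_dir({"a.txt": ("file", "aabbccdd1122"),
                  "b.txt": ("file", "99990000ffff"),
                  "c.txt": ("file", "5555aaaa0000")})
root_v1 = put_dir({"docs": ("dir", docs), "lib": ("dir", lib_v1)})
root_v2 = put_dir({"docs": ("dir", docs), "lib": ("dir", lib_v2)})
changes = list(diff(root_v1, root_v2))
```

Here `docs` is skipped entirely because its digest is the same in both roots; only `lib` is descended into.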


29 of 33

Requirements

  1. Durable content-addressed and key-value storage options
  2. A ‘syncer’ to fetch repos, transform them into virtual directories, and populate a repo@ref -> root map
  3. Logic (service or builder-side) for resolving the dependency tree and stitching the relevant virtual directories together for a build
  4. Logic (service or builder-side) for applying any necessary source->source transforms
  5. A virtual-directory-aware build tool


30 of 33

Deployment

  1. Start at the repo: remove successively more tasks from the builder, but still copy all files down for legacy phases.

OR

  2. Start at the build: virtualize fetches, virtualize outputs, virtualize inputs. Expand to cover pre-build transforms, and push upwards.


31 of 33

Current Work


32 of 33

Thank You


33 of 33

Additional Content
