1 of 130

Bazel workshop

2 of 130

Welcome!

Who are we?

  • Mark Karpov and Johan Herland, Principal Consultants
  • Scalable Builds Group at Tweag
  • Tweag is part of Modus Create

3 of 130

Plan

  • Two days of 6-hour sessions
  • Each session is split into 3h + 3h blocks
  • Presentation and exercises

A whirlwind tour of:

  • Bazel concepts, usage and configuration
  • How to set yourself up for success with Bazel

4 of 130

Expectations

We expect that you:

  • Have some experience with using build systems
    • Bonus points if you have experience maintaining one
  • Are familiar with the development process:
    • Navigating a source code project
    • Editing text files
    • Running programs from a command line
  • No need to be an expert, but familiarity with a language or two is good.
  • Exercises:
    • Based on a simple project written in C/C++ and shell scripts
    • But we’re not diving into the code itself, only the surrounding build instructions
    • Run inside a devcontainer (in VS Code, CDW, or any machine that can run linux/amd64 containers)
  • Ask questions! We are here for you, not the other way around.

5 of 130

Outline

  1. Intro to Bazel
    1. Intro and basic concepts
      ✦ Exercise #1: The Cookie Machine
    2. Bazel concepts deep dive
      ✦ Exercise #2: Write a macro
  2. Intro to Bazel, continued
    • Toolchains, platforms, selects
    • Sandbox, Remote cache + execution
    • Bazel rule sets
      ✦ Exercise #3: Import and build a 3rd-party dependency
      ✦ Exercise #4: Hermetic toolchain

  3. A bigger picture
    • Bazel and CI
    • Organizational points
    • Extra topics
      ✦ Exercise #5: Consume asio hermetically
    • Wrap up

6 of 130

1.1 Intro and basic concepts

7 of 130

When to use Bazel?

  • { Fast, Correct } — Choose two
  • Polyglot
  • Cross-platform
  • Remote execution
  • Extensible

8 of 130

When not to use Bazel?

  • Bazel is complex!
  • Probably unnecessarily complex, if you have:
    • Only one language
    • Already a working build system
    • No need to scale
      • Small project, small team
    • Pure Windows projects

9 of 130

Migration costs

  • Small: Rewrite old build rules into Bazel
  • Big: Setting up remote caching and execution infrastructure for Bazel
  • Bigger: Make build hermetic + reproducible ⇒ enable scaling
    • Without this, the benefit from remote cache + execution is severely limited
  • Biggest?: Help developers to become familiar and productive with Bazel

10 of 130

Modeling the build

11 of 130

Build system recap

  • Collection of commands to produce a certain output
  • Can be modeled as a directed acyclic graph (DAG)

12 of 130

Build system recap

  • Collection of commands to produce a certain output
  • Can be modeled as a directed acyclic graph (DAG)
  • One possible model:
    • Nodes: Artifacts, typically files
    • Edges: Actions, commands to produce artifacts

[Diagram: the source file main.c becomes the executable main via the compile action "gcc main.c -o main"; likewise, a Dockerfile becomes a Docker image via "docker build -t … ."]
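The artifacts-and-actions model above can be sketched in a few lines of Python. This is an illustrative sketch of the general idea, not Bazel code: derived artifacts record their inputs and the command that produces them, and a change to any input propagates through the graph.

```python
# Illustrative sketch (not Bazel code): model the build as a DAG where
# each derived artifact records its inputs and the producing action.
actions = {
    # output artifact: (inputs, command)
    "main.o": (["main.c", "main.h"], "gcc -c main.c -o main.o"),
    "main":   (["main.o"],           "gcc main.o -o main"),
}

def plan_rebuild(target, changed_sources, plan):
    """Append (in dependency order) the commands needed to refresh
    `target`; return True if `target` is stale."""
    if target not in actions:               # a leaf, i.e. a source file
        return target in changed_sources
    inputs, command = actions[target]
    # A list comprehension (not a generator) so every input is visited.
    dirty = any([plan_rebuild(i, changed_sources, plan) for i in inputs])
    if dirty and command not in plan:
        plan.append(command)
    return dirty

plan = []
plan_rebuild("main", {"main.h"}, plan)
# Changing main.h forces recompilation and then relinking, in order.
```

Note how knowing all inputs is what makes the "when to rebuild?" question answerable at all, which is the point the following slides develop.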

13 of 130

Build system recap

  • Collection of commands to produce a certain output
  • Can be modeled as a directed acyclic graph (DAG)
  • More common model in build systems: Target graph or dependency graph
    • Visualize dependencies
    • Follow arrows from ultimate targets back to their initial constituents

[Diagram: in the target graph, the executable main depends on the source file main.c; the arrow points from the executable back to its dependency, with the compile action "gcc main.c -o main" attached]

14 of 130

Build system recap

  • Collection of commands to produce a certain output
  • Can be modeled as a target graph (DAG)
    • Visualize dependencies
    • Follow arrows from ultimate targets back to their initial constituents
  • When to rebuild? When an input has changed.
    • Must know all inputs!

[Diagram: main depends on main.c, which contains #include "main.h"; so main.h must also be tracked as an input to "gcc main.c -o main"]

15 of 130

Build system recap

  • Collection of commands to produce a certain output
  • Can be modeled as a target graph (DAG)
    • Visualize dependencies
    • Follow arrows from ultimate targets back to their initial constituents
  • When to rebuild? When an input has changed.
    • Must know all inputs!
  • Pure and hermetic:
    • Pure function: same input ⇒ same output
    • “Hermetically sealed”: All inputs captured: nothing that could affect the output is omitted

[Diagram: the compile action "gcc main.c -o main" has main.c, main.h (pulled in via #include "main.h"), and the GCC toolchain itself as inputs]

16 of 130

Build system recap

  • Collection of commands to produce a certain output
  • Can be modeled as a target graph (DAG)
    • Visualize dependencies
    • Follow arrows from ultimate targets back to their initial constituents
  • When to rebuild? When an input has changed.
    • Must know all inputs!
  • Pure and hermetic:
    • Pure function: same input ⇒ same output
    • “Hermetically sealed”: All inputs captured: nothing that could affect the output is omitted
  • Transitivity: Rebuilds spread across the target graph
    • From a changed input file to all output that ultimately depend on it.
    • Anything that does not change can be reused

17 of 130

Build system recap

  • Collection of commands to produce a certain output
  • Can be modeled as a target graph (DAG)
    • Visualize dependencies
    • Follow arrows from ultimate targets back to their initial constituents
  • When to rebuild? When an input has changed.
    • Must know all inputs!
  • Pure and hermetic:
    • Pure function: same input ⇒ same output
    • “Hermetically sealed”: All inputs captured: nothing that could affect the output is omitted
  • Transitivity: Rebuilds spread across the target graph
  • Reproducible: Same inputs, same rules ⇒ Same outputs!
    • For each target in the graph, but also for the target graph as a whole
    • No matter if we build today or tomorrow
    • No matter if we build on my machine or your machine, or in the cloud

18 of 130

Build system recap

  • Collection of commands to produce a certain output
  • Can be modeled as a target graph (DAG)
    • Visualize dependencies
    • Follow arrows from ultimate targets back to their initial constituents
  • When to rebuild? When an input has changed.
    • Must know all inputs!
  • Pure and hermetic:
    • Pure function: same input ⇒ same output
    • “Hermetically sealed”: All inputs captured: nothing that could affect the output is omitted
  • Transitivity: Rebuilds spread across the target graph
  • Reproducible: Same inputs, same rules ⇒ Same outputs!
    • For each target in the graph, but also for the target graph as a whole
    • No matter if we build today or tomorrow
    • No matter if we build on my machine or your machine, or in the cloud
  • All of this enables:
    • Caching
    • Remote execution

⇒ Speed!

19 of 130

What breaks hermeticity + reproducibility?

Including any of this in the build output:

  • Timestamps (e.g. __TIME__, __DATE__, timestamps in tar/zip files)
  • Git revisions, current branch, etc.
  • Details from the system where the build runs
    • Files from outside the build tree
    • Absolute file paths, usernames, machine details
  • Non-deterministic output formats
    • Arbitrary ordering of data, reordered sections
    • Random data inclusions, non-zeroed padding
  • Fetching unchecked data from the network
    • OK if verified with a known checksum
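The checksum idea behind the last bullet can be sketched in Python (an illustrative sketch of the mechanism, not Bazel's implementation; `verify_fetch` is a made-up name):

```python
import hashlib

def verify_fetch(data: bytes, expected_sha256: str) -> bytes:
    """Accept fetched bytes only if they hash to the pinned digest.
    Same idea as http_archive's sha256 attribute: the download stays
    reproducible because its content is fixed up front."""
    actual = hashlib.sha256(data).hexdigest()
    if actual != expected_sha256:
        raise ValueError("checksum mismatch, got " + actual)
    return data

payload = b"pretend this is a release tarball"
pinned = hashlib.sha256(payload).hexdigest()
assert verify_fetch(payload, pinned) == payload
```

If the upstream bytes ever change, the digest no longer matches and the build fails loudly instead of silently producing different output.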

20 of 130

What happens when we break hermeticity?

  • Works on my machine, but not on yours
  • Works locally, but not in remote execution
  • Slower builds: some parts “always” need to be rebuilt
  • Build results are always different ⇒ build cache is never reused

How to keep a Bazel project hermetic:

https://www.tweag.io/blog/2022-09-15-hermetic-bazel/

21 of 130

First steps to using Bazel

22 of 130

Bazel concepts

  • Targets: Something that can be built
    • Defined in BUILD or BUILD.bazel files throughout the source tree
  • Rules: The instructions for how to build it
    • Come with Bazel,
    • or with third-party rulesets,
    • or write your own in *.bzl files

# BUILD.bazel
load("@rules_cc//cc:defs.bzl", "cc_binary")

cc_binary(            # <- Rule
    name = "main",    # <- Target
    srcs = ["main.c"],
)

To build: bazel build //:main

[Diagram: main.c → main via the compile action "gcc main.c -o main"]

23 of 130

Bazel concepts

  • Targets: Something that can be built
    • Defined in BUILD files throughout the source tree
  • Rules: The instructions for how to build it
    • Come with Bazel,
    • or with third-party rulesets,
    • or write your own in *.bzl files
  • Labels: How you refer to targets

# BUILD.bazel
load("@rules_cc//cc:defs.bzl", "cc_binary")

cc_binary(
    name = "main",
    srcs = ["main.c"],
)

To build: bazel build //:main

Anatomy of the label //:main:
  • // = root of the current repository
  • : = the BUILD.bazel file at the top level
  • main = the main target

[Diagram: main.c → main via the compile action "gcc main.c -o main"]

24 of 130

Bazel concepts

  • Targets: Something that can be built
    • Defined in BUILD files throughout the source tree
  • Rules: The instructions for how to build it
    • Come with Bazel,
    • or with third-party rulesets,
    • or write your own in *.bzl files
  • Labels: How you refer to targets
  • Bazel repositories:
    • Main repository: WORKSPACE file
    • Other repositories:
      • defined by repo rules in the WORKSPACE file
      • 3rd-party rulesets
      • 3rd-party code dependencies

# WORKSPACE
workspace(name = "cookie_machine")  # <- Workspace declaration

load("@bazel_tools//…/repo:http.bzl", "http_archive")

http_archive(
    name = "rules_cc",
    urls = ["https://…/rules_cc-0.0.10.tar.gz"],
    sha256 = "65b…e49",
    strip_prefix = "rules_cc-0.0.10",
)

# BUILD.bazel
load("@rules_cc//cc:defs.bzl", "cc_binary")

cc_binary(
    name = "main",
    srcs = ["main.c"],
)

25 of 130

Bazel concepts

  • Targets: Something that can be built
    • Defined in BUILD files throughout the source tree
  • Rules: The instructions for how to build it
    • Come with Bazel,
    • or with third-party rulesets,
    • or write your own in *.bzl files
  • Labels: How you refer to targets
  • Bazel repositories:
    • Main repository: WORKSPACE file
    • Other repositories:
      • defined by repo rules in the WORKSPACE file
      • 3rd-party rulesets
      • 3rd-party code dependencies
  • Starlark language:
    • Inspired by Python, but simpler
    • Declarations in BUILD(.bazel) and WORKSPACE files
    • Macros and custom rule implementations in *.bzl files

# WORKSPACE
workspace(name = "cookie_machine")

load("@bazel_tools//…/repo:http.bzl", "http_archive")

http_archive(
    name = "rules_cc",
    urls = ["https://…/rules_cc-0.0.10.tar.gz"],
    sha256 = "65b…e49",
    strip_prefix = "rules_cc-0.0.10",
)

# BUILD.bazel
load("@rules_cc//cc:defs.bzl", "cc_binary")

cc_binary(
    name = "main",
    srcs = ["main.c"],
)

26 of 130

Bazel’s phases

  • Loading phase: Construct the target graph
    • Read the WORKSPACE and BUILD files
    • Construct the target graph (for the requested target)
    • Load and evaluate necessary extensions (aka. “External repositories”)
      • Run the repository rules that are referenced in the WORKSPACE file
  • Analysis phase: Invoke the necessary rules to augment the target graph into an action graph
    • What operations need to be performed, and in which order?
    • Rule implementations:
      • Look at configured targets
      • Construct corresponding actions that will realize those targets
  • Execution phase:
    • Execute all necessary actions to produce the required target outputs
    • Run tests (if requested)

Tip: For visualizing the complete target + action graph that Bazel builds, check out Skyscope: https://github.com/tweag/skyscope and https://www.tweag.io/blog/2023-05-04-announcing-skyscope/

27 of 130

Getting help

28 of 130

Questions?

29 of 130

Try it yourself!

  • Clone https://github.com/tweag/bazel-workshop-2024
  • Enter the devcontainer (with or without VS Code)
    • See the README.md file in the repo for details
  • Start playing around!
    • Look at WORKSPACE and BUILD.bazel files
    • bazel --version
    • bazel help
    • bazel query //...
    • bazel query "deps(//c-client:c-client)"
    • bazel build //c-client
    • bazel clean
    • bazel build //c-client --subcommands
    • bazel build //...
    • bazel test //...

30 of 130

Exercise #1 The Cookie Machine

  • What is this project about?
    • Look at WORKSPACE
      • A project called cookie machine, consisting of a C client, a C++ server, and an integration test.
      • How many repositories are there in total?
    • There are BUILD.bazel files in three subdirectories, look at them:
      • What targets do they define?
      • What rules are we using?
  • Can Bazel show us the targets?
    • bazel query //...
    • bazel query //cpp-server/...

31 of 130

Exercise #1 List target dependencies

  • What are the dependencies of a target?
    • bazel query "deps(//c-client:c-client)"
    • Lists all the dependencies that the cc_binary rule has set up for this target.
    • Includes the main.c source file, whose label is //c-client:main.c
    • But also lots of stuff from:
      • The rules_cc ruleset (labels that start with @rules_cc//...)
      • The C compiler toolchain (labels that start with @local_config_cc//...)
      • Internal Bazel rules (labels that start with @bazel_tools//...)

32 of 130

Exercise #1 Building with Bazel

  • Let’s build something!
    • bazel build //c-client
    • The //c-client label is shorthand for the full label: //c-client:c-client
    • Bazel builds the c-client executable and prints out where it is located
    • Note: Bazel never puts build outputs inside the source tree, but rather in a separate Bazel output directory accessible via a symlink.

33 of 130

Exercise #1 Subcommands

  • Let’s build again, with more detail
    • bazel clean
    • bazel build //c-client --subcommands
    • Can see the actual command lines that Bazel ends up executing.
    • Note how the single c-client cc_binary target was turned into two commands:
      • First compile the source file into an object file
      • Then link the final executable
    • If we rebuild without bazel clean, there are no commands to run.
      • Unchanged targets can always be reused!

34 of 130

Exercise #1 Building and testing everything

  • Let’s build everything!
    • bazel build //...
    • Builds three targets, but does not run the fourth test target
  • Let’s run the tests
    • bazel test //...
    • Finds and runs all test targets

35 of 130

Exercise #1 Reproducible and hermetic?

  • Let’s consider the hermeticity of this project:
    • In WORKSPACE we have seen how to import the rules_cc rule set.
    • But what about the underlying compiler? Where is it coming from?
    • The c-client target has a custom link option (-lcurl)
      • Also #include <curl/curl.h> inside main.c
      • Where is libcurl coming from?
    • The cpp-server/README.md file talks about a dependency on something called asio
      • Various #include <asio…> lines found under cpp-server/crow/crow/…
      • Where is asio coming from?
  • Currently these dependencies are in our devcontainer.
    • Is that good enough?

36 of 130

1.2 Bazel concepts deep dive

37 of 130

Labels

@cookie_machine//cpp-server:cpp-server

  • Repository name (can be omitted; it then defaults to the current repository). @ by itself references the main repository, which still works even from external repositories.
    • A repository name by itself is also a valid label; it expands like this: @curl = @curl//:curl.
  • Package name (a directory with a BUILD file in it). A package name can be absolute or relative.
    • Absolute package names start with //.
    • Relative package names are relative to the current package and can only be used when no repository name is explicitly specified. Otherwise they are much like relative paths in a filesystem.
  • Target name (name that is passed to an instantiation of a rule via the name attribute).
    • Colon is optional when a target or file in the same package is referenced.
    • The convention is to omit it for files but add it for targets.
    • If package name is specified but no colon and target name are present, the target name is taken to be the same as the last segment in the package name, e.g.: @cookie_machine//cpp-server is the same target as at the top of the slide.
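To summarize the rules above, here are the label spellings side by side, shown as an annotated comment (the cookie_machine and curl examples are the ones from this slide):

```starlark
# All of these refer to the same target (seen from inside the main
# cookie_machine repository):
#
#   @cookie_machine//cpp-server:cpp-server   # fully qualified
#   //cpp-server:cpp-server                  # repository name omitted
#   //cpp-server                             # target name defaults to the last package segment
#   :cpp-server                              # relative, from within cpp-server/BUILD.bazel
#
# A repository name alone is also a valid label:
#
#   @curl   ==   @curl//:curl
```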

38 of 130

Loads

# BUILD.bazel / WORKSPACE / .bzl

load("@rules_cc//cc:defs.bzl", "cc_library", "cc_binary")

39 of 130

Target visibility

  • Packages are the unit of granularity.
  • Visibility specs:
    • "//visibility:public"
    • "//visibility:private"
    • "//foo/bar:__pkg__"
    • "//foo/bar:__subpackages__"
    • "//some_pkg:my_package_group"

# BUILD.bazel
cc_binary(
    name = "c-client",
    srcs = ["main.c"],
    linkopts = ["-lcurl"],
    visibility = ["//visibility:public"],
)

40 of 130

Target visibility: default visibility in a package

# BUILD.bazel

package(default_visibility = ["//visibility:public"])

41 of 130

Rules

# BUILD.bazel
cc_library(
    name = "crow",
    hdrs = glob(["crow/**/*.h", "crow/**/*.hpp"]),
    strip_include_prefix = "crow/",
)

# cc_library.bzl
cc_library = rule(
    implementation = _cc_library_impl,
    attrs = {
        "srcs": attr.label_list(allow_files = True),
        "hdrs": attr.label_list(allow_files = True),
        "deps": attr.label_list(providers = [CcInfo]),
        "linkstatic": attr.bool(default = False),
        "includes": attr.string_list(),
        "strip_include_prefix": attr.string(),
        "copts": attr.string_list(),
        ...
    },
    ...,
    provides = [CcInfo],
)

42 of 130

Providers

# my_provider.bzl
CcInfo = provider(
    doc = "A provider for cc rules.",
    fields = {
        "headers": "Headers of this target and all transitive dependencies",
        "includes": "Directories to pass via -I when compiling",
        "linker_inputs": "A collection of files to pass to the linker.",
        ...
    },
)

43 of 130

Rules: a simple implementation function

# cc_library.bzl
def _cc_library_impl(ctx):
    args = ctx.actions.args()
    lib_path = ctx.attr.name + ".so"
    transitive_headers = ctx.attr.hdrs
    for dep in ctx.attr.deps:
        transitive_headers += dep[CcInfo].headers
    args.add_all(["-I" + include for include in ctx.attr.includes])
    args.add_all(ctx.attr.srcs)
    ...
    lib = ctx.actions.declare_file(lib_path)
    ctx.actions.run(
        inputs = ctx.attr.srcs + MORE STUFF,
        outputs = [lib],
        arguments = [args],
        executable = my_compiler,
    )
    return [CcInfo(
        headers = transitive_headers,
        linker_inputs = [lib],
        ...,
    )]

44 of 130

Rules: declare_directory

  • declare_directory can be used instead of declare_file; however, it is not possible to access the individual files this way or to make decisions based on which files were created.
  • Nevertheless, a declared directory can be passed around in a Provider, and it can ultimately be passed to args.add_all, which will traverse the directory and add all files from it.

45 of 130

Rule implementation and build actions

  • Rule implementation is a Starlark function that gets called at the analysis phase. It produces a collection of build actions.
  • The implementation function is not called during the execution phase.
  • Build actions are executed as necessary during the execution phase.

46 of 130

Rules: Q&A

  • Q: Can I access the internet from a rule implementation?
    • A: Not directly; the implementation function itself can never access the internet.
  • Q: Can I access the internet from build actions produced by the implementation function?
    • A: This is frowned upon and will not work by default. A repository rule is a better place to fetch sources. However, one can add a “requires-network” tag to allow the build actions of the target in question to access the network. It is then the rule author’s responsibility to ensure reproducibility: Bazel will not know when the resource you access changes.
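For illustration, such an escape hatch could look like the following hypothetical genrule (the URL and target name are made up; as noted above, a repository rule with a pinned checksum remains the better option):

```starlark
# BUILD.bazel -- hypothetical example
genrule(
    name = "fetch_data",
    outs = ["data.json"],
    cmd = "curl -sf https://example.com/data.json > $@",
    # Opt this action out of network isolation. Bazel cannot tell
    # when the remote resource changes, so cached results may go stale.
    tags = ["requires-network"],
)
```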

47 of 130

Filegroups

# BUILD.bazel
filegroup(
    name = "exported_testdata",
    srcs = glob([
        "testdata/*.dat",
        "testdata/logs/**/*.log",
    ]),
)

cc_library(
    name = "lib_b",
    ...,
    data = ["//my_package:exported_testdata"],
    visibility = ["//visibility:public"],
)

48 of 130

Filegroups: implementation + DefaultInfo

# filegroup.bzl
filegroup = rule(
    implementation = _filegroup_impl,
    attrs = {
        "srcs": attr.label_list(allow_files = True),
    },
    provides = [DefaultInfo],
)

def _filegroup_impl(ctx):
    return [DefaultInfo(files = depset(ctx.files.srcs))]

49 of 130

Genrule

# BUILD.bazel
genrule(
    name = "concat_all_files",
    srcs = [
        "//some:files",  # a filegroup with multiple files in it ==> $(locations)
        "//other:gen",   # a genrule with a single output ==> $(location)
    ],
    outs = ["concatenated.txt"],
    cmd = "cat $(locations //some:files) $(location //other:gen) > $@",
)

50 of 130

Macros: templating around rule invocations

# my_cc_library.bzl
def my_cc_library(**attrs):
    if "enable_foo" in attrs:
        attrs["copts"] = attrs.get("copts", []) + ["-std=c++14", "-Wstack-usage=10000"]
    attrs.pop("enable_foo", None)
    native.cc_library(**attrs)

51 of 130

Macros: templating around rule invocations

# my_app.bzl
def my_app(name, srcs, deps):
    app_name = name + "_app"
    native.cc_binary(
        name = app_name,
        srcs = srcs,
        deps = deps,
    )
    native.genrule(
        name = name + "_config",
        srcs = [":" + app_name],
        outs = [name + ".json"],
        cmd = "$(location //:make_config) $(location {}) > $@".format(app_name),
        tools = ["//:make_config"],
    )

52 of 130

Repository rules

# http_archive.bzl
http_archive = repository_rule(
    implementation = _impl,
    attrs = {
        "urls": attr.string_list(mandatory = True),
        "sha256": attr.string(mandatory = True),
        "strip_prefix": attr.string(),
    },
)

53 of 130

Repository rules

# WORKSPACE
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

http_archive(
    name = "rules_cc",
    urls = ["https://github.com/bazelbuild/rules_cc/releases/download/0.0.10/rules_cc-0.0.10.tar.gz"],
    sha256 = "65b67b81c6da378f136cc7e7e14ee08d5b9375973427eceb8c773a4f69fa7e49",
    strip_prefix = "rules_cc-0.0.10",
)

54 of 130

Repository rules

# http_archive.bzl
def _impl(repository_ctx):
    url = repository_ctx.attr.urls[0]
    sha256 = repository_ctx.attr.sha256
    repository_ctx.download_and_extract(url, sha256 = sha256)

There are a few ways to trigger a fetch:

$ bazel sync --only=rules_cc
$ bazel fetch @rules_cc//:*

To check what an external repository looks like:

$ ls -la $(bazel info output_base)/external/rules_cc

55 of 130

Repository rules: the repository_ctx object

56 of 130

Common repository rules

new_* rules typically allow you to supply BUILD files for the fetched code, although http_archive does not need a new_* version, and in recent versions of Bazel git_repository and new_git_repository are essentially the same thing.

WORKSPACE files in external repositories have no significance.

57 of 130

Repository rules

Prefer http_archive over git_repository. The reasons are:

  • Git repository rules depend on the system git(1), whereas the HTTP downloader is built into Bazel and has no system dependencies.
  • http_archive supports a list of urls as mirrors; git_repository supports only a single remote.
  • http_archive works with the repository cache; git_repository does not. See #5116 for more information.

58 of 130

Questions?

59 of 130

Exercise #2: Write a macro

  • Write a macro that wraps the native.cc_library or native.cc_binary rule and adds “-std=c++17” to its copts attribute. Prevent users from passing their own copts.
  • Make the code base use that macro instead of the default cc rules.
  • Hint: you can make the loading phase fail by using the “fail” function.
  • Put the definition of the macro in bazel/defs.bzl.
  • See the branch solution2 when you are ready to verify your solution.

60 of 130

2.1 Toolchains, platforms, selects

61 of 130

Toolchains & platforms

[Diagram: Platforms and Toolchains]

62 of 130

Toolchains & platforms

[Diagram: Platforms and Toolchains, connected by Constraints]

63 of 130

Constraints

# BUILD.bazel
constraint_setting(
    name = "compiler",
    default_constraint_value = ":gcc",
)

constraint_value(
    name = "qcc",
    constraint_setting = ":compiler",
)

constraint_value(
    name = "gcc",
    constraint_setting = ":compiler",
)

64 of 130

Platforms

# BUILD.bazel
platform(
    name = "x86_64_linux",
    constraint_values = [
        "@platforms//os:linux",
        "@platforms//cpu:x86_64",
    ],
)

65 of 130

Platforms

# BUILD.bazel
platform(
    name = "x86_64_linux",
    constraint_values = [
        "@platforms//os:linux",
        "@platforms//cpu:x86_64",
    ],
)

$ bazel build --platforms=//:x86_64_linux //...

66 of 130

Toolchains: toolchain type

# bar_tools/BUILD.bazel
# By convention, toolchain_type targets are named "toolchain_type" and
# distinguished by their package path. So the full path for this would be
# //bar_tools:toolchain_type.
toolchain_type(name = "toolchain_type")

67 of 130

Toolchain definition

# BUILD.bazel
toolchain(
    name = "qnx_x86_64_toolchain",
    exec_compatible_with = [
        "@platforms//os:linux",
        "@platforms//cpu:x86_64",
    ],
    target_compatible_with = [
        "@platforms//os:qnx",
        "@platforms//cpu:x86_64",
        "//platform:gcc",
    ],
    toolchain = "@qnx_sdp//:x86_64-pc-nto-qnx7.1.0",
    toolchain_type = "@bazel_tools//tools/cpp:toolchain_type",
)

68 of 130

Toolchain registration

# WORKSPACE
register_toolchains(
    "//bazel/toolchains/qnx_compiler:qnx_x86_64_toolchain",
    "//bazel/toolchains/qnx_compiler:qcc_qnx_x86_64_toolchain",
    "//bazel/toolchains/qnx_compiler:qnx_aarch64_toolchain",
    "//bazel/toolchains/qnx_compiler:qcc_qnx_aarch64_toolchain",
)

69 of 130

Toolchain resolution

  • During toolchain resolution, the constraint values specified by the target platform are compared against the constraints that registered toolchains specify in their “target_compatible_with” attribute.
  • The “exec_compatible_with” of registered toolchains is compared against the detected host platform.
  • The first toolchain of a given type that matches all constraints is selected.
  • You can use --toolchain_resolution_debug=<regexp> to debug toolchain resolution. The regexp is matched against toolchain types and specific targets.
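The selection logic above can be sketched in Python (an illustrative model, not Bazel's actual implementation; the toolchain names are the QNX examples from the earlier slides): a toolchain matches when its target constraints are a subset of the target platform's constraints, and its exec constraints are a subset of the host platform's.

```python
# Illustrative sketch of toolchain resolution (not Bazel's real code).
registered = [
    # (name, exec_compatible_with, target_compatible_with)
    ("qnx_x86_64_toolchain",   {"os:linux", "cpu:x86_64"}, {"os:qnx", "cpu:x86_64"}),
    ("linux_x86_64_toolchain", {"os:linux", "cpu:x86_64"}, {"os:linux", "cpu:x86_64"}),
]

def resolve(host_constraints, target_constraints):
    """Pick the first registered toolchain compatible with both platforms."""
    for name, exec_c, target_c in registered:
        if exec_c <= host_constraints and target_c <= target_constraints:
            return name
    return None  # resolution failure: no matching toolchain

host = {"os:linux", "cpu:x86_64"}
assert resolve(host, {"os:qnx", "cpu:x86_64"}) == "qnx_x86_64_toolchain"
assert resolve(host, {"os:linux", "cpu:x86_64"}) == "linux_x86_64_toolchain"
```

Registration order matters in this model, which mirrors the "first toolchain that matches is selected" rule.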

70 of 130

How rule definitions specify toolchain type

# cc_library.bzl
cc_library = rule(
    implementation = _cc_library_impl,
    attrs = {
        "srcs": attr.label_list(allow_files = True),
        "deps": attr.label_list(providers = [CcInfo]),
        "copts": attr.string_list(),
        ...
    },
    ...,
    toolchains = ["@bazel_tools//tools/cpp:toolchain_type"],
    provides = [CcInfo],
)

def _cc_library_impl(ctx):
    ...
    cc_toolchain = ctx.toolchains["@bazel_tools//tools/cpp:toolchain_type"]
    ...

71 of 130

Configurable attributes

# BUILD.bazel
load("@bazel_skylib//lib:selects.bzl", "selects")

selects.config_setting_group(
    name = "x86_64_linux",
    match_all = [
        "@platforms//os:linux",
        "@platforms//cpu:x86_64",
    ],
)

cc_library(
    name = "clock_util",
    srcs = select({
        ":x86_64_linux": ["clock_util_linux.cpp"],
        "//conditions:default": ["clock_util_other.cpp"],
    }) + ["clock_util.hpp"],
    visibility = ["//visibility:public"],
)

72 of 130

Questions?

73 of 130

2.2 Sandbox, Remote cache + execution

74 of 130

Sandboxing

75 of 130

The Bazel sandbox

How can Bazel help enforce hermeticity in build actions?

  • Limit access to files that are not declared inputs?
  • Prevent writing to files that are not declared outputs?
  • Limit access to other processes running on the same machine?
  • Limit access to details about the current user?
  • Limit access to network?

76 of 130

The Bazel sandbox

  • Enabled by default, based on what the underlying system supports
    • Different sandbox implementations available
  • What does it do?
    • Prepare a sandbox directory with symlinks mirroring the source tree
    • Only declared inputs are present inside the sandbox
    • Only declared outputs are copied back out of the sandbox and preserved
    • Might also limit access to network, other processes, etc. depending on platform support
  • What does it not do?
    • Does not limit access to system tools, system headers, or system libraries!
      • Will not help you catch implicit dependencies on undeclared tools/libraries
      • (although some rule sets like rules_cc provide some additional protections)
    • Does not apply to repository rules!
      • Runs in an earlier phase, and their job is to prepare the build environment
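The "prepare a sandbox directory" idea can be sketched in Python. This is an illustrative model of the mechanism, not Bazel's implementation: only declared inputs are staged in, the action runs in isolation, and only declared outputs survive.

```python
# Illustrative sketch of sandboxed execution (not Bazel's implementation).
import os
import shutil
import subprocess
import tempfile

def run_sandboxed(workspace, inputs, outputs, argv):
    """Stage declared inputs into a fresh directory, run the action
    there, and copy back only the declared outputs."""
    sandbox = tempfile.mkdtemp(prefix="sandbox-")
    for inp in inputs:  # real Bazel uses symlinks; we copy for simplicity
        dst = os.path.join(sandbox, inp)
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        shutil.copy(os.path.join(workspace, inp), dst)
    subprocess.run(argv, cwd=sandbox, check=True)
    for out in outputs:  # anything undeclared vanishes with the sandbox
        shutil.copy(os.path.join(sandbox, out), os.path.join(workspace, out))
    shutil.rmtree(sandbox)

ws = tempfile.mkdtemp(prefix="workspace-")
with open(os.path.join(ws, "main.c"), "w") as f:
    f.write("int main(void) { return 0; }\n")
# A stand-in "compile action" that just copies its declared input:
run_sandboxed(ws, ["main.c"], ["main.o"], ["cp", "main.c", "main.o"])
```

An action that reads an undeclared file simply does not find it inside the sandbox, which is exactly how implicit input dependencies get caught.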

77 of 130

processwrapper-sandbox

Works on any POSIX system, does not require “advanced” features

  • Limit access to files that are not declared inputs? ✅
  • Only declared outputs are preserved? ✅
  • Limit access to other processes running on the same machine? ❌
  • Limit access to details about the current user? ❌
  • Limit access to network? ❌
  • Works inside unprivileged containers

78 of 130

linux-sandbox

Uses Linux Namespaces to isolate the build action from the underlying system:

  • Limit access to files that are not declared inputs? ✅
  • Only declared outputs are preserved? ✅
  • Limit access to other processes running on the same machine? ✅
  • Limit access to details about the current user? ✅
  • Limit access to network? ✅
  • Requires Linux with namespace features
    • Container must be privileged in order to allow these features inside

79 of 130

darwin-sandbox

Uses Apple’s sandbox-exec to achieve roughly the same as linux-sandbox:

  • Limit access to files that are not declared inputs? ✅
  • Only declared outputs are preserved? ✅
  • Limit access to other processes running on the same machine? ✅
  • Limit access to details about the current user? ✅
  • Limit access to network? ✅

80 of 130

Windows?

Sorry, no official sandboxing support available

81 of 130

local (no sandbox)

Executes the action from the root of your workspace.

  • Limit access to files that are not declared inputs? ❌
  • Only declared outputs are preserved? ❌
  • Limit access to other processes running on the same machine? ❌
  • Limit access to details about the current user? ❌
  • Limit access to network? ❌

Useful for debugging hermeticity:

  • fails in the sandbox, but works with --spawn_strategy=local

82 of 130

The Bazel sandbox

  • Controlled with the --spawn_strategy or --strategy flags
  • Many other flags control various details of the sandboxing:
    • --strategy_regexp (for more fine-grained control of sandboxing per action)
    • --[no]experimental_use_hermetic_linux_sandbox
    • --experimental_sandbox_limits
    • --[no]incompatible_sandbox_hermetic_tmp
    • --[no]reuse_sandbox_directories
    • --sandbox_add_mount_pair
    • --sandbox_block_path
    • --[no]sandbox_default_allow_network
    • --[no]sandbox_fake_hostname
    • --[no]sandbox_fake_username
    • --sandbox_writable_path
    • --[no]sandbox_debug
  • Can also be controlled via tags specific to certain targets/actions

83 of 130

Remote cache

84 of 130

Bazel’s (local) action cache

  • Bazel breaks down a build into discrete actions
  • Each action knows its
    • input files
    • command line
    • environment variables
    • expected output paths
  • Each actions generates
    • actual output files

[Diagram: the action’s input files, command line, environment variables and expected output paths feed a hash function producing the actionKey; the actual output files feed a hash function producing the digestKey. Inspect with: bazel dump --action_cache]

85 of 130

Bazel’s remote cache

What if we could share this cache with our colleagues (and with CI)?

  • Copy the action cache (actionKey ⇒ digestKey mapping) to a remote server
  • What about the output files themselves?
    • Content-addressable store (CAS): digestKey ⇒ output files
  • How to use?
    • Compute the actionKey
    • If actionKey is in the local cache: Use the local output files directly
    • Else if actionKey is in the remote cache: Download output files from remote CAS
    • Else:
      • Execute the action locally
      • Upload the result into the remote cache
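The lookup flow above can be sketched in a few lines of Python (an illustrative model of the protocol, not Bazel's actual implementation; local and remote caches are collapsed into one dict each):

```python
# Illustrative sketch of the remote-cache protocol (not real Bazel).
import hashlib
import json

def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

cas = {}           # content-addressable store: digestKey -> output bytes
action_cache = {}  # action cache: actionKey -> digestKey

def run_action(inputs, command):
    """Look up an action in the cache; 'execute' it on a miss.
    Returns (output, was_cache_hit)."""
    # actionKey = hash over all input digests plus the command line.
    action_key = digest(json.dumps(
        [sorted(digest(i) for i in inputs), command]).encode())
    if action_key in action_cache:                 # cache hit
        return cas[action_cache[action_key]], True
    output = b"|".join(inputs) + command.encode()  # pretend execution
    action_cache[action_key] = digest(output)
    cas[digest(output)] = output                   # upload to the CAS
    return output, False

out1, hit1 = run_action([b"main.c"], "gcc main.c -o main")
out2, hit2 = run_action([b"main.c"], "gcc main.c -o main")
assert (hit1, hit2) == (False, True) and out1 == out2
# Any change to an input changes the actionKey => cache miss:
_, hit3 = run_action([b"main.c v2"], "gcc main.c -o main")
assert hit3 is False
```

This also shows why hermeticity matters: any uncaptured input that leaks into the command or environment perturbs the actionKey and turns would-be hits into misses.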

86 of 130

Bazel’s remote cache

  • We’re still executing all actions locally
    • The remote server only stores the cache
  • The real world is always more complicated:
  • Only works if your build is already hermetic and reproducible!
    • Different actionKey ⇒ Cache miss ⇒ build locally
    • Overhead means this can be worse than having no remote cache!
  • Debugging remote cache involves:
    • Looking at cache hit rates
    • Figuring out why actionKeys differ when they shouldn’t ⇐ different build environments
      • e.g. leaking environment variables into an action
    • Figuring out why actionKeys are equal when they should differ ⇐ uncaptured build inputs

87 of 130

Setting up a remote cache

  • Bazel cache protocol, either over HTTP, gRPC, or UNIX sockets
  • Consists of two parts:
    • The action cache (actionKey ⇒ digestKey)
    • The CAS (digestKey ⇒ build output)
  • Often set up together with remote execution (see next section)
    • Cache-only services do exist, e.g. based on nginx, bazel-remote or GCS
    • More info: https://bazel.build/remote/caching#cache-backend
  • Relevant Bazel flags:
    • --remote_cache=<URL>
    • --remote_upload_local_results=false
    • Per-target adjustments: tags = ["no-remote-cache"]
    • More details: https://www.tweag.io/blog/2020-04-09-bazel-remote-cache/

88 of 130

A “local” remote cache?

  • Yes, controlled by: --disk_cache=<PATH>
  • Points to an “external” cache on your local disk
  • Lives outside Bazel’s output_base (not the same as the local action cache)
  • Used by Bazel after the local action cache, but before the remote cache
  • Survives bazel clean and even bazel clean --expunge
  • Useful when switching between multiple build configurations
    • These can often thrash the local action cache + output_path
  • Sometimes confusing when you think you’re building from scratch
    • e.g. after bazel clean
  • Disable it by passing an empty path: --disk_cache=

89 of 130

Remote execution

90 of 130

Remote execution

  • Relies on remote caching
  • Also relies on sandboxing, in spirit:
    • If a build action fails in a local sandbox, it probably won’t work in RE
    • If a build action succeeds in a local sandbox, it might also work in RE
    • A working local build is no guarantee that you have resolved all hermeticity issues for RE
  • The idea:
    • If an action is not already cached (locally or remotely)
    • Upload the action’s inputs into the CAS
    • Send a message to execute the action remotely:
      • The remote executor downloads action inputs from the cache, executes the action and uploads the build output into the remote cache
    • Download the action outputs from the remote cache

91 of 130

Remote execution

  • Remote Execution API:
    • gRPC protocol
    • Used by Bazel, but also other build systems: Buck2, Pants, etc.
  • Flags to enable remote execution:
    • --remote_executor=<URL> points to the RE cluster
      • (also sets --remote_cache, if unset)
    • --remote_instance_name can be used to separate workloads inside the RE cluster
  • Good overview of remote caching and remote execution: https://www.buildbuddy.io/blog/bazels-remote-caching-and-remote-execution-explained/

92 of 130

Setting up remote execution

93 of 130

Build without the Bytes

By default (before Bazel 7), for each build action that needs to be executed remotely:

  • Upload the action inputs (if necessary)
  • Download the action outputs

This is expensive when all you want is the final build result, i.e. the output of the last build action

Enter “Build without the Bytes”:

  • Limit which action outputs are downloaded to the local machine:
  • Controlled with --remote_download_outputs=<all, minimal or toplevel>
    • all: Download all action outputs (default before Bazel 7)
    • minimal: Only download what is required by local build actions (e.g. to run tests)
    • toplevel: Download outputs associated with top level targets (default since Bazel 7)
  • More details: https://blog.bazel.build/2023/10/06/bwob-in-bazel-7.html
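In a .bazelrc this could look like the fragment below; the choice of toplevel is just one option, matching the Bazel 7 default:

```
# .bazelrc
# Skip downloading intermediate outputs; fetch only top-level results
build --remote_download_outputs=toplevel
```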

94 of 130

2.3 Bazel rule sets

95 of 130

Bazel rule sets

  • Why? Make Bazel extensible to many languages and ecosystems
  • What? Bazel repository with rule implementations written in Starlark
  • Examples of rule sets:
    • We’ve already seen rules_cc for building C/C++ code
    • Many other languages: rules_java, rules_go, rules_py, rules_rust, rules_haskell, etc.
    • More than just languages:
      • Google Protocol Buffers: rules_proto
      • Building tar/zip/deb/rpm packages: rules_pkg
      • Building container images: rules_oci
      • General Starlark utilities: bazel-skylib
    • Running other build systems inside Bazel: rules_foreign_cc

96 of 130

Example: rules_go and rules_rust vs. rules_cc

  • Most language rule sets provide similar rules:
    • In BUILD files:
      • cc_binary() / go_binary() / rust_binary()
      • cc_library() / go_library() / rust_library()
      • cc_test() / go_test() / rust_test()
      • as well as more language-specific rules…
        • E.g. cc_import() to import and use pre-compiled dependencies
    • In the WORKSPACE file:
      • register_toolchains() / go_register_toolchains() / rust_register_toolchains()
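For example, a BUILD file using the C/C++ flavor of these rules might look like the sketch below. All target and file names here are made up; the same shape carries over to go_* and rust_* rules:

```starlark
# BUILD — hypothetical targets showing the common rule shape
cc_library(
    name = "util",
    srcs = ["util.c"],
    hdrs = ["util.h"],
)

cc_binary(
    name = "app",
    srcs = ["main.c"],
    deps = [":util"],
)

cc_test(
    name = "util_test",
    srcs = ["util_test.c"],
    deps = [":util"],
)
```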

97 of 130

Example: rules_foreign_cc

  • Supports running other build systems inside Bazel:
    • CMake
    • Meson
    • Ninja
    • GNU Make (including the configure && make && make install pattern)
  • Why?
    • Build a non-Bazel 3rd-party dependency as part of a bigger Bazel project

98 of 130

Example: rules_foreign_cc

  • Using rules_foreign_cc:
    • Load rules_foreign_cc itself inside WORKSPACE
    • Fetch 3rd-party source code as a Bazel repository
    • Provide a BUILD file with instructions how to build the 3rd-party code
      • Use rules from rules_foreign_cc to invoke the foreign build process
        • E.g. cmake() or configure_make()
      • Declare the overall inputs/output of the foreign build process
    • Depend on these targets from your own targets
  • Drawbacks: A “foreign” build is “opaque” to Bazel:
    • Run as a single action from Bazel’s POV
    • Hard to set up fine-grained dependencies on parts of the foreign output.
    • Coarse-grained output limits parallelism compared to fine-grained Bazel targets
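A BUILD file for such a 3rd-party repository might look roughly like this. The load path matches current rules_foreign_cc releases, but the library name and output file are assumptions, and a real build usually needs more attributes:

```starlark
# BUILD file placed in the 3rd-party repo — a sketch, not a drop-in solution
load("@rules_foreign_cc//foreign_cc:defs.bzl", "cmake")

# The foreign build consumes the whole source tree as a single input
filegroup(
    name = "all_srcs",
    srcs = glob(["**"]),
)

# One opaque action from Bazel's POV: runs CMake + the native build tool
cmake(
    name = "mylib",
    lib_source = ":all_srcs",
    out_static_libs = ["libmylib.a"],
    visibility = ["//visibility:public"],
)
```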

99 of 130

Questions?

100 of 130

Exercise #3: Import and build a 3rd-party dependency

  • Recognize the undeclared c-client dependency on curl:
    • #include <curl/curl.h> in the c-client/main.c source code
    • Setting linkopts to link against curl
    • Verify libcurl dependency with: ldd bazel-bin/c-client/c-client
    • But Bazel does not know about curl:
      • Where is curl installed?
      • Which version?
  • Steps:
    • sudo apt remove libcurl4-openssl-dev
    • bazel clean && bazel build //c-client --disk_cache=
      • Observe failure due to curl/curl.h: No such file or directory
    • How can we fix this?

101 of 130

Exercise #3: Import and build a 3rd-party dependency

  • How can we fix this?
    1. Retrieve a known version of the curl library source code
    2. Build a working curl library (curl’s build system is CMake)
    3. Link c-client against this curl library

# WORKSPACE
http_archive(
    name = "curl",
    ...
)

# BUILD
cmake(
    name = "curl",
    ...
)

Dependency graph:
  //c-client:c-client → //c-client:main.c
  //c-client:c-client → @curl//:curl (in external repo @curl)

102 of 130

Exercise #3: Import and build a 3rd-party dependency

  • Oops, transitive dependency! curl depends on openssl
    • Must first get openssl sources
    • Then build openssl (using Configure + make)
    • Then link curl against this openssl

# WORKSPACE
http_archive(
    name = "openssl",
    ...
)

# BUILD
configure_make(
    name = "openssl",
    ...
)

Dependency graph:
  //c-client:c-client → //c-client:main.c
  //c-client:c-client → @curl//:curl (in external repo @curl)
  @curl//:curl → @openssl//:openssl (in external repo @openssl)

103 of 130

Exercise #3: Import and build a 3rd-party dependency

  • git checkout exercise3
  • Open the WORKSPACE file
    • We’ve already added rules_foreign_cc and openssl
    • How to get the curl source code?
      • Use this version: https://curl.se/download/curl-7.77.0.tar.gz
      • Download the http_archive with: bazel sync --only=curl
    • How to build curl?
      • Look to openssl for inspiration
      • cmake() docs: https://bazel-contrib.github.io/rules_foreign_cc/cmake.html
      • See WORKSPACE for more hints…
      • Build curl stand-alone: bazel build @curl
        • @curl is shorthand for @curl//:curl

104 of 130

Exercise #3: Import and build a 3rd-party dependency

  • Adjust c-client to properly depend on @curl
    • No more linkopts, use deps instead
    • Hint: One line change
    • bazel build //c-client
    • Verify result with:
      • bazel query "deps(//c-client:c-client)"
      • ldd bazel-bin/c-client/c-client

105 of 130

Exercise #4: Hermetic toolchain

  • Motivation: Replace non-hermetic use of build machine’s /usr/bin/gcc with a toolchain that is fully controlled by Bazel.
  • Introduce a hermetic C/C++ toolchain by leveraging https://github.com/bazel-contrib/toolchains_llvm.
    • Follow instructions from https://github.com/bazel-contrib/toolchains_llvm/releases/tag/v1.2.0.
    • Be sure to follow the snippet for WORKSPACE, not for bzlmod.
    • Be sure to use the LLVM toolchain version 17.0.6.
    • If you encounter error messages about passing -std=c++17 to a C program, feel free to switch from the macro developed in exercise 2 back to a normal cc_binary rule.
  • Start from branch exercise4
  • See the branch solution4 for solution.

106 of 130

Exercises recap

  • What have we achieved:
    • We are hermetic with respect to libcurl (and its openssl dependency).
    • We are hermetic with respect to the C/C++ toolchain we are using.
  • Are we now fully hermetic from Bazel’s point of view? No!
    • cpp-server still pulls asio from the devcontainer.
    • LLVM toolchain will still access system headers from the devcontainer.
    • Shell utilities such as cat are taken from the system and are not fixed by Bazel.
  • Approaching hermeticity is a process
    • Balancing act between what we allow the system to provide, and what we explicitly get via Bazel.

107 of 130

3.1 Bazel and CI

108 of 130

Repository cache

  • The repository cache avoids re-downloading files from the Internet. It is a content-addressable store (CAS) indexed by the SHA256 checksums of downloads.
  • It has nothing to do with the remote cache: the remote cache does not cache downloads, since validation of downloads happens before the remote cache gets involved.
  • The same repository cache can be shared across multiple Bazel workspaces.
  • The repository cache is a reason to prefer persistent CI workers over ephemeral workers that are destroyed after each run. Alternatively, CI can persist the repository cache directory specifically and ensure it is populated on each worker when it starts.
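On CI workers this can be made explicit by pointing every workspace at the same directory; the flag is real, but the path below is just an example:

```
# .bazelrc on CI workers — share one download cache across workspaces
common --repository_cache=/var/cache/bazel/repo-cache
```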

109 of 130

When is a repository rule re-fetched?

In contrast to regular targets, repos are not necessarily re-fetched when something changes that would cause the repo to be different. This is because there are things that Bazel either cannot detect changes to or it would cause too much overhead on every build (for example, things that are fetched from the network). Therefore, repos are re-fetched only if one of the following things changes:

  • The attributes passed to the repo rule invocation.
  • The Starlark code comprising the implementation of the repo rule.
  • The value of any environment variable passed to repository_ctx's getenv() method
  • The existence, contents, and type of any paths being watched in the implementation function of the repo rule.
  • When bazel fetch --force is executed.

110 of 130

Bazel’s client/server architecture

  • Bazel’s client/server organization allows for many optimizations that would not be otherwise possible, such as caching of BUILD files, dependency graphs, and other metadata from one build to the next.
  • When you run bazel, you're running the client. The client finds the server based on the output base (by default determined by the path of the base workspace directory and your user id).
  • If the client cannot find a running server instance, it starts a new one.
  • Server shuts down after a period of inactivity, by default 3 hours.

111 of 130

Analysis cache

  • Analysis and loading phases can take a significant amount of time.
  • Analysis cache is part of the in-process state of the Bazel server, so losing the server loses the cache. But the cache is also invalidated very easily: for example, many bazel command line flags cause the cache to be discarded.
  • Tip: for a given CI pipeline, choose a set of flags and stick to it. Do not change flags back and forth during a build.
  • Tip: if persistent CI workers are used, it is beneficial to dedicate a worker per platform so that analysis cache can be reused. For example, use the same worker to build for x86_64_linux, instead of selecting an arbitrary worker every time.
  • Tip: you can use --announce_rc in order to identify and debug analysis cache purges.

112 of 130

Overview of Bazel caches

In memory:
  • Skyframe
  • Analysis cache

On local disk:
  • Repository cache
  • Under output_base: Action cache, Output tree
  • Disk cache: Action cache, CAS

Remote:
  • Remote cache: Action cache, CAS

113 of 130

Target determination

  • In a CI pipeline, build only what you need to build. Building //... may be too much.
  • If you use //... as a build wildcard on CI make sure to tag targets that should not be built by default with the “manual” tag.
  • Furthermore, special tooling such as https://github.com/bazel-contrib/target-determinator can be used to determine a smaller subset of targets to (re)build based on the changes reported by revision control. This aims to save time on analysis.

114 of 130

The relationship between Bazel and CI

  • Bazel's features can be used to replace (parts of) complicated CI pipelines.
  • Bazel can help to make large portions of CI runnable on developer machines as opposed to having to test on CI only, which can help reduce feedback times.
  • With multi-platform remote execution setups it is possible to replace cross platform CI pipeline configurations with a Bazel configuration that can be used to initiate cross platform builds from a developer machine with the help of a remote execution cluster.

115 of 130

3.2 Organizational points

116 of 130

Build system maintenance

  • Code owners
  • DevOps
  • Infrastructure

Debugging and improving the build system requires:

  • Cross-functional knowledge and mandates
    • Fixes happen not only in the build system itself, but also in the code, and in the infra!
  • Working across teams
    • Communication and collaboration across team boundaries are put to the test!

Who maintains the build system?

Everybody!

117 of 130

Build system maintenance

The build system is never finished:

  • Initial acceleration after adopting Bazel
  • Don’t allow build + test times to slowly drift back up after the migration
  • Historic metrics of overall and per-target build + test times
  • Spot trends and degradations quickly
    • Easier to fix these now, than later
  • A faster build allows you to do more

118 of 130

Build system maintenance

Build system champions!

  • People in each product team that have some interest in build system work
  • Support them from the build system team!
  • These are your
    • Expert users
    • Early adopters
    • Communication channels to the wider team

119 of 130

3.3 Extra topics

120 of 130

.bazelrc: Storing common command line options

  • Bazel has too many command line options.
  • Often you want a set of options to apply to all (or most) builds.
  • The .bazelrc file allows you to store common options in a file, e.g.:
  • Format:
    • Lines where a bazel command is followed by options that are applied to that command.
    • Commands like:
      • build, query, test
    • “Special” command:
      • common: option is applied to all supported commands.

# .bazelrc

# Remote cache/execution configuration
common --remote_cache=
common --remote_executor=grpcs://my_RE_cluster_URL

# Don't let env vars like $PATH sneak into the build
build --incompatible_strict_action_env

# When running tests, only build necessary test targets
test --build_tests_only

121 of 130

.bazelrc: Configuring named sets of related options

  • What about applying a set of related options to some (but not all) builds?
  • Format: <bazel command>:<config name> <options...>
  • Example usage:
    • bazel build --config=address_sanitizer <target>
  • Option precedence:
    • Multiple matching lines are combined as if they were listed in the same order on the command line.
    • Lines for a more specific command take precedence over general lines.
    • Options on the actual command line take precedence over .bazelrc.

# .bazelrc

# Enable with --config=leak_sanitizer
build:leak_sanitizer --copt="-fsanitize=leak"
build:leak_sanitizer --linkopt="-fsanitize=leak"
build:leak_sanitizer --copt="-U_FORTIFY_SOURCE"

# Enable with --config=address_sanitizer
build:address_sanitizer --copt="-fsanitize=address"
build:address_sanitizer --linkopt="-fsanitize=address"
build:address_sanitizer --copt="-U_FORTIFY_SOURCE"

122 of 130

.bazelrc: Spread across multiple files

  • Multiple .bazelrc locations:
    • (options in later files can override options in earlier files)
    • The system RC file (e.g. /etc/bazel.bazelrc)
    • The workspace RC file (.bazelrc in your workspace directory)
      • Most commonly used
      • The only one that is typically tracked by version control
    • The home RC file ($HOME/.bazelrc)
    • The user-specified RC file: Specified by --bazelrc=<file> on the command line
  • Imports:
    • Allows spreading option configuration across multiple files
    • import: Target file must exist
    • try-import: Target file is optional

# .bazelrc

# Import common options
import %workspace%/remote.bazelrc

# Import user-specific options (if any)
try-import %workspace%/user.bazelrc

123 of 130

Bzlmod, aka Bazel modules

  • Modernization of the concept of external repositories in Bazel
  • MODULE.bazel takes over from the WORKSPACE file
  • Improves Bazel’s handling of repositories with transitive dependencies
    • Including how to handle conflicting version requirements
  • Bazel Central Registry: https://registry.bazel.build
    • Easily consume common third-party Bazel repositories (now called Bazel modules)
    • Similar to e.g. the Python Package Index (PyPI), but for Bazel
    • Can set up your own Registry to control consumption of 3rd-party repository dependencies
  • Retains some compatibility with WORKSPACE files
    • Beware: the filesystem structure underneath $bazel_out/external/... is changed
  • Bzlmod migration guide: https://bazel.build/external/migration
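A minimal MODULE.bazel might look like the sketch below; module(), bazel_dep(), and the registry workflow are real, but the project name and dependency versions here are made up:

```starlark
# MODULE.bazel — replaces the WORKSPACE file under bzlmod
module(name = "my_project", version = "0.1.0")

# Dependencies resolved via the Bazel Central Registry
bazel_dep(name = "rules_cc", version = "0.0.9")
bazel_dep(name = "bazel_skylib", version = "1.5.0")
```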

124 of 130

Controlling provenance of 3rd-party dependencies

  • The Internet is, in general, unreliable. Everything that needs to be downloaded as part of the build needs to come from reliable sources.
  • Solutions:
    • Mirroring 3rd-party deps on your own server (e.g. Artifactory) instead of pulling directly from the Internet. Downside: transitive dependencies cannot be handled this way without extensive patching.
    • Using --distdir to search for archives before accessing the network. Downside: means that all dependencies have to be downloaded as part of creation of the devcontainer, which is not in line with the idea of granular upgrades.
    • The best option: using --experimental_downloader_config.

125 of 130

Downloader config that limits access only to Artifactory

allow mycompany.com
block *
rewrite (.*)(api.github.com/.*) https://mycompany.com/artifactory/$2

126 of 130

Exercise #5: Consume ASIO hermetically

  • Replace non-hermetic dependency on ASIO with Bazel-provided version
  • Remove the pre-installed ASIO by running: sudo apt remove libasio-dev
  • Observe build failure: bazel clean && bazel build //... --disk_cache=
  • How to fix? (similar to exercise #3)
    • Fetch ASIO with http_archive() to create a new Bazel repo: @asio
    • Figure out how to build ASIO with Bazel
      • Header-only library, can be wrapped directly with cc_library()
      • Look at cc_library() docs to find appropriate attributes to pass:�https://bazel.build/reference/be/c-cpp#cc_library
    • Add a proper dependency from the project’s code onto @asio//:asio
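One possible shape of the wrapper is sketched below. This is not the reference solution: the glob patterns, include path, and define are assumptions about ASIO's source layout that you should verify against the downloaded archive:

```starlark
# BUILD.bazel for the @asio repo — a sketch; paths/defines are assumptions
cc_library(
    name = "asio",
    # Header-only: no srcs, just expose the headers and include path
    hdrs = glob(["asio/include/**/*.hpp", "asio/include/**/*.ipp"]),
    includes = ["asio/include"],
    defines = ["ASIO_STANDALONE"],
    visibility = ["//visibility:public"],
)
```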

127 of 130

3.4

Wrap up!

128 of 130

Resources

129 of 130

Thanks for listening to us!

We hope you have:

  • learned something about build systems in general,
  • picked up some Bazel concepts,
  • understood why simply transcribing build rules into Bazel is not enough,
  • built some intuition around what makes a build hermetic and reproducible,
  • seen how this enables Bazel to benefit from caching and remote execution,
  • appreciated the big picture of improving and maintaining a build system, and
  • found some resources to keep exploring!

130 of 130

Questions?