1 of 130

Bazel workshop

2 of 130

Welcome!

Who are we?

  • Mark Karpov and Johan Herland, Principal Consultants
  • Scalable Builds Group at Tweag
  • Tweag is part of Modus Create

3 of 130

Plan

  • Two days of 6-hour sessions
  • Each session is split into 3h + 3h blocks
  • Presentation and exercises

A whirlwind tour of:

  • Bazel concepts, usage and configuration
  • How to set yourself up for success with Bazel

4 of 130

Expectations

We expect that you:

  • Have some experience with using build systems
    • Bonus points if you have experience maintaining one
  • Are familiar with the development process:
    • Navigating a source code project
    • Editing text files
    • Running programs from a command line
  • No need to be an expert, but familiarity with a language or two is good.
  • Exercises:
    • Based on a simple project written in C/C++ and shell scripts
    • But we’re not diving into the code itself, only the surrounding build instructions
    • Run inside a devcontainer (in VS Code, CDW, or any machine that can run linux/amd64 containers)
  • Ask questions! We are here for you, not the other way around.

5 of 130

Outline

  1. Intro to Bazel
    1. Intro and basic concepts
      ✦ Exercise #1: The Cookie Machine
    2. Bazel concepts deep dive
      ✦ Exercise #2: Write a macro
  2. Intro to Bazel, continued
    • Toolchains, platforms, selects
    • Sandbox, Remote cache + execution
    • Bazel rule sets
      ✦ Exercise #3: Import and build a 3rd-party dependency
      ✦ Exercise #4: Hermetic toolchain

  3. A bigger picture
    • Bazel and CI
    • Organizational points
    • Extra topics
      ✦ Exercise #5: Consume asio hermetically
    • Wrap up

6 of 130

1.1 Intro and basic concepts

7 of 130

When to use Bazel?

  • { Fast, Correct } — Choose two
  • Polyglot
  • Cross-platform
  • Remote execution
  • Extensible

8 of 130

When not to use Bazel?

  • Bazel is complex!
  • Probably unnecessarily complex, if you have:
    • Only one language
    • Already a working build system
    • No need to scale
      • Small project, small team
    • Pure Windows projects

9 of 130

Migration costs

  • Small: Rewrite old build rules into Bazel
  • Big: Setting up remote caching and execution infrastructure for Bazel
  • Bigger: Make build hermetic + reproducible ⇒ enable scaling
    • Without this, the benefit from remote cache + execution is severely limited
  • Biggest?: Help developers to become familiar and productive with Bazel

10 of 130

Modeling the build

11 of 130

Build system recap

  • Collection of commands to produce a certain output
  • Can be modeled as a directed acyclic graph (DAG)

12 of 130

Build system recap

  • Collection of commands to produce a certain output
  • Can be modeled as a directed acyclic graph (DAG)
  • One possible model:
    • Nodes: Artifacts, typically files
    • Edges: Actions, commands to produce artifacts

[Diagram: the source file main.c becomes the executable main via the compile action "gcc main.c -o main"; likewise, a Dockerfile becomes a Docker image via "docker build -t … ."]
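The artifacts-and-actions model above can be sketched in a few lines of Python. This is an illustrative sketch of the general idea, not Bazel code: derived artifacts record their inputs and the command that produces them, and a change to any input propagates through the graph.

```python
# Illustrative sketch (not Bazel code): model the build as a DAG where
# each derived artifact records its inputs and the producing action.
actions = {
    # output artifact: (inputs, command)
    "main.o": (["main.c", "main.h"], "gcc -c main.c -o main.o"),
    "main":   (["main.o"],           "gcc main.o -o main"),
}

def plan_rebuild(target, changed_sources, plan):
    """Append (in dependency order) the commands needed to refresh
    `target`; return True if `target` is stale."""
    if target not in actions:               # a leaf, i.e. a source file
        return target in changed_sources
    inputs, command = actions[target]
    # A list comprehension (not a generator) so every input is visited.
    dirty = any([plan_rebuild(i, changed_sources, plan) for i in inputs])
    if dirty and command not in plan:
        plan.append(command)
    return dirty

plan = []
plan_rebuild("main", {"main.h"}, plan)
# Changing main.h forces recompilation and then relinking, in order.
```

Note how knowing all inputs is what makes the "when to rebuild?" question answerable at all, which is the point the following slides develop.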

13 of 130

Build system recap

  • Collection of commands to produce a certain output
  • Can be modeled as a directed acyclic graph (DAG)
  • More common model in build systems: Target graph or dependency graph
    • Visualize dependencies
    • Follow arrows from ultimate targets back to their initial constituents

[Diagram: in the target graph, the executable main depends on the source file main.c; the arrow points from the executable back to its dependency, with the compile action "gcc main.c -o main" attached]

14 of 130

Build system recap

  • Collection of commands to produce a certain output
  • Can be modeled as a target graph (DAG)
    • Visualize dependencies
    • Follow arrows from ultimate targets back to their initial constituents
  • When to rebuild? When an input has changed.
    • Must know all inputs!

[Diagram: main depends on main.c, which contains #include "main.h"; so main.h must also be tracked as an input to "gcc main.c -o main"]

15 of 130

Build system recap

  • Collection of commands to produce a certain output
  • Can be modeled as a target graph (DAG)
    • Visualize dependencies
    • Follow arrows from ultimate targets back to their initial constituents
  • When to rebuild? When an input has changed.
    • Must know all inputs!
  • Pure and hermetic:
    • Pure function: same input ⇒ same output
    • “Hermetically sealed”: All inputs captured: nothing that could affect the output is omitted

[Diagram: the compile action "gcc main.c -o main" has main.c, main.h (pulled in via #include "main.h"), and the GCC toolchain itself as inputs]

16 of 130

Build system recap

  • Collection of commands to produce a certain output
  • Can be modeled as a target graph (DAG)
    • Visualize dependencies
    • Follow arrows from ultimate targets back to their initial constituents
  • When to rebuild? When an input has changed.
    • Must know all inputs!
  • Pure and hermetic:
    • Pure function: same input ⇒ same output
    • “Hermetically sealed”: All inputs captured: nothing that could affect the output is omitted
  • Transitivity: Rebuilds spread across the target graph
    • From a changed input file to all output that ultimately depend on it.
    • Anything that does not change can be reused

17 of 130

Build system recap

  • Collection of commands to produce a certain output
  • Can be modeled as a target graph (DAG)
    • Visualize dependencies
    • Follow arrows from ultimate targets back to their initial constituents
  • When to rebuild? When an input has changed.
    • Must know all inputs!
  • Pure and hermetic:
    • Pure function: same input ⇒ same output
    • “Hermetically sealed”: All inputs captured: nothing that could affect the output is omitted
  • Transitivity: Rebuilds spread across the target graph
  • Reproducible: Same inputs, same rules ⇒ Same outputs!
    • For each target in the graph, but also for the target graph as a whole
    • No matter if we build today or tomorrow
    • No matter if we build on my machine or your machine, or in the cloud

18 of 130

Build system recap

  • Collection of commands to produce a certain output
  • Can be modeled as a target graph (DAG)
    • Visualize dependencies
    • Follow arrows from ultimate targets back to their initial constituents
  • When to rebuild? When an input has changed.
    • Must know all inputs!
  • Pure and hermetic:
    • Pure function: same input ⇒ same output
    • “Hermetically sealed”: All inputs captured: nothing that could affect the output is omitted
  • Transitivity: Rebuilds spread across the target graph
  • Reproducible: Same inputs, same rules ⇒ Same outputs!
    • For each target in the graph, but also for the target graph as a whole
    • No matter if we build today or tomorrow
    • No matter if we build on my machine or your machine, or in the cloud
  • All of this enables:
    • Caching
    • Remote execution

⇒ Speed!

19 of 130

What breaks hermeticity + reproducibility?

Including any of this in the build output:

  • Timestamps (e.g. __TIME__, __DATE__, timestamps in tar/zip files)
  • Git revisions, current branch, etc.
  • Details from the system where the build runs
    • Files from outside the build tree
    • Absolute file paths, usernames, machine details
  • Non-deterministic output formats
    • Arbitrary ordering of data, reordered sections
    • Random data inclusions, non-zeroed padding
  • Fetching unchecked data from the network
    • OK if verified with a known checksum
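The checksum idea behind the last bullet can be sketched in Python (an illustrative sketch of the mechanism, not Bazel's implementation; `verify_fetch` is a made-up name):

```python
import hashlib

def verify_fetch(data: bytes, expected_sha256: str) -> bytes:
    """Accept fetched bytes only if they hash to the pinned digest.
    Same idea as http_archive's sha256 attribute: the download stays
    reproducible because its content is fixed up front."""
    actual = hashlib.sha256(data).hexdigest()
    if actual != expected_sha256:
        raise ValueError("checksum mismatch, got " + actual)
    return data

payload = b"pretend this is a release tarball"
pinned = hashlib.sha256(payload).hexdigest()
assert verify_fetch(payload, pinned) == payload
```

If the upstream bytes ever change, the digest no longer matches and the build fails loudly instead of silently producing different output.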

20 of 130

What happens when we break hermeticity?

  • Works on my machine, but not on yours
  • Works locally, but not in remote execution
  • Slower builds: some parts “always” need to be rebuilt
  • Build results are always different ⇒ build cache is never reused

How to keep a Bazel project hermetic:

https://www.tweag.io/blog/2022-09-15-hermetic-bazel/

21 of 130

First steps to using Bazel

22 of 130

Bazel concepts

  • Targets: Something that can be built
    • Defined in BUILD or BUILD.bazel files throughout the source tree
  • Rules: The instructions for how to build it
    • Come with Bazel,
    • or with third-party rulesets,
    • or write your own in *.bzl files

# BUILD.bazel
load("@rules_cc//cc:defs.bzl", "cc_binary")

cc_binary(            # <- Rule
    name = "main",    # <- Target
    srcs = ["main.c"],
)

To build: bazel build //:main

[Diagram: main.c → main via the compile action "gcc main.c -o main"]

23 of 130

Bazel concepts

  • Targets: Something that can be built
    • Defined in BUILD files throughout the source tree
  • Rules: The instructions for how to build it
    • Come with Bazel,
    • or with third-party rulesets,
    • or write your own in *.bzl files
  • Labels: How you refer to targets

# BUILD.bazel
load("@rules_cc//cc:defs.bzl", "cc_binary")

cc_binary(
    name = "main",
    srcs = ["main.c"],
)

To build: bazel build //:main

Anatomy of the label //:main:
  • // = root of the current repository
  • : = the BUILD.bazel file at the top level
  • main = the main target

[Diagram: main.c → main via the compile action "gcc main.c -o main"]

24 of 130

Bazel concepts

  • Targets: Something that can be built
    • Defined in BUILD files throughout the source tree
  • Rules: The instructions for how to build it
    • Come with Bazel,
    • or with third-party rulesets,
    • or write your own in *.bzl files
  • Labels: How you refer to targets
  • Bazel repositories:
    • Main repository: WORKSPACE file
    • Other repositories:
      • defined by repo rules in the WORKSPACE file
      • 3rd-party rulesets
      • 3rd-party code dependencies

# WORKSPACE
workspace(name = "cookie_machine")  # <- Workspace declaration

load("@bazel_tools//…/repo:http.bzl", "http_archive")

http_archive(
    name = "rules_cc",
    urls = ["https://…/rules_cc-0.0.10.tar.gz"],
    sha256 = "65b…e49",
    strip_prefix = "rules_cc-0.0.10",
)

# BUILD.bazel
load("@rules_cc//cc:defs.bzl", "cc_binary")

cc_binary(
    name = "main",
    srcs = ["main.c"],
)

25 of 130

Bazel concepts

  • Targets: Something that can be built
    • Defined in BUILD files throughout the source tree
  • Rules: The instructions for how to build it
    • Come with Bazel,
    • or with third-party rulesets,
    • or write your own in *.bzl files
  • Labels: How you refer to targets
  • Bazel repositories:
    • Main repository: WORKSPACE file
    • Other repositories:
      • defined by repo rules in the WORKSPACE file
      • 3rd-party rulesets
      • 3rd-party code dependencies
  • Starlark language:
    • Inspired by Python, but simpler
    • Declarations in BUILD(.bazel) and WORKSPACE files
    • Macros and custom rule implementations in *.bzl files

# WORKSPACE
workspace(name = "cookie_machine")

load("@bazel_tools//…/repo:http.bzl", "http_archive")

http_archive(
    name = "rules_cc",
    urls = ["https://…/rules_cc-0.0.10.tar.gz"],
    sha256 = "65b…e49",
    strip_prefix = "rules_cc-0.0.10",
)

# BUILD.bazel
load("@rules_cc//cc:defs.bzl", "cc_binary")

cc_binary(
    name = "main",
    srcs = ["main.c"],
)

26 of 130

Bazel’s phases

  • Loading phase: Construct the target graph
    • Read the WORKSPACE and BUILD files
    • Construct the target graph (for the requested target)
    • Load and evaluate necessary extensions (aka. “External repositories”)
      • Run the repository rules that are referenced in the WORKSPACE file
  • Analysis phase: Invoke the necessary rules to augment the target graph into an action graph
    • What operations need to be performed, and in which order?
    • Rule implementations:
      • Look at configured targets
      • Construct corresponding actions that will realize those targets
  • Execution phase:
    • Execute all necessary actions to produce the required target outputs
    • Run tests (if requested)

Tip: For visualizing the complete target + action graph that Bazel builds, check out Skyscope: https://github.com/tweag/skyscope and https://www.tweag.io/blog/2023-05-04-announcing-skyscope/

27 of 130

Getting help

28 of 130

Questions?

29 of 130

Try it yourself!

  • Clone https://github.com/tweag/bazel-workshop-2024
  • Enter the devcontainer (with or without VS Code)
    • See the README.md file in the repo for details
  • Start playing around!
    • Look at WORKSPACE and BUILD.bazel files
    • bazel --version
    • bazel help
    • bazel query //...
    • bazel query "deps(//c-client:c-client)"
    • bazel build //c-client
    • bazel clean
    • bazel build //c-client --subcommands
    • bazel build //...
    • bazel test //...

30 of 130

Exercise #1 The Cookie Machine

  • What is this project about?
    • Look at WORKSPACE
      • A project called cookie machine, consisting of a C client, a C++ server, and an integration test.
      • How many repositories are there in total?
    • There are BUILD.bazel files in three subdirectories, look at them:
      • What targets do they define?
      • What rules are we using?
  • Can Bazel show us the targets?
    • bazel query //...
    • bazel query //cpp-server/...

31 of 130

Exercise #1 List target dependencies

  • What are the dependencies of a target?
    • bazel query "deps(//c-client:c-client)"
    • Lists all the dependencies that the cc_binary rule has set up for this target.
    • Includes the main.c source file, whose label is //c-client:main.c
    • But also lots of stuff from:
      • The rules_cc ruleset (labels that start with @rules_cc//...)
      • The C compiler toolchain (labels that start with @local_config_cc//...)
      • Internal Bazel rules (labels that start with @bazel_tools//...)

32 of 130

Exercise #1 Building with Bazel

  • Let’s build something!
    • bazel build //c-client
    • The //c-client label is shorthand for the full label: //c-client:c-client
    • Bazel builds the c-client executable and prints out where it is located
    • Note: Bazel never puts build outputs inside the source tree, but rather in a separate Bazel output directory accessible via a symlink.

33 of 130

Exercise #1 Subcommands

  • Let’s build again, with more detail
    • bazel clean
    • bazel build //c-client --subcommands
    • Can see the actual command lines that Bazel ends up executing.
    • Note how the single c-client cc_binary target was turned into two commands:
      • First compile the source file into an object file
      • Then link the final executable
    • If we rebuild without bazel clean, there are no commands to run.
      • Unchanged targets can always be reused!

34 of 130

Exercise #1 Building and testing everything

  • Let’s build everything!
    • bazel build //...
    • Builds three targets, but does not run the fourth test target
  • Let’s run the tests
    • bazel test //...
    • Finds and runs all test targets

35 of 130

Exercise #1 Reproducible and hermetic?

  • Let’s consider the hermeticity of this project:
    • In WORKSPACE we have seen how to import the rules_cc rule set.
    • But what about the underlying compiler? Where is it coming from?
    • The c-client target has a custom link option (-lcurl)
      • Also #include <curl/curl.h> inside main.c
      • Where is libcurl coming from?
    • The cpp-server/README.md file talks about a dependency on something called asio
      • Various #include <asio…> lines found under cpp-server/crow/crow/…
      • Where is asio coming from?
  • Currently these dependencies are in our devcontainer.
    • Is that good enough?

36 of 130

1.2 Bazel concepts deep dive

37 of 130

Labels

@cookie_machine//cpp-server:cpp-server

  • Repository name (can be omitted; it then defaults to the current repository). @ by itself references the main repository, which still works even from external repositories.
    • A repository name by itself is also a valid label; it expands like this: @curl = @curl//:curl.
  • Package name (a directory with a BUILD file in it). A package name can be absolute or relative.
    • Absolute package names start with //.
    • Relative package names are relative to the current package and can only be used when no repository name is explicitly specified. Otherwise they are much like relative paths in a filesystem.
  • Target name (name that is passed to an instantiation of a rule via the name attribute).
    • Colon is optional when a target or file in the same package is referenced.
    • The convention is to omit it for files but add it for targets.
    • If package name is specified but no colon and target name are present, the target name is taken to be the same as the last segment in the package name, e.g.: @cookie_machine//cpp-server is the same target as at the top of the slide.
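To summarize the rules above, here are the label spellings side by side, shown as an annotated comment (the cookie_machine and curl examples are the ones from this slide):

```starlark
# All of these refer to the same target (seen from inside the main
# cookie_machine repository):
#
#   @cookie_machine//cpp-server:cpp-server   # fully qualified
#   //cpp-server:cpp-server                  # repository name omitted
#   //cpp-server                             # target name defaults to the last package segment
#   :cpp-server                              # relative, from within cpp-server/BUILD.bazel
#
# A repository name alone is also a valid label:
#
#   @curl   ==   @curl//:curl
```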

38 of 130

Loads

# BUILD.bazel / WORKSPACE / .bzl

load("@rules_cc//cc:defs.bzl", "cc_library", "cc_binary")

39 of 130

Target visibility

  • Packages are the unit of granularity.
  • Visibility specs:
    • "//visibility:public"
    • "//visibility:private"
    • "//foo/bar:__pkg__"
    • "//foo/bar:__subpackages__"
    • "//some_pkg:my_package_group"

# BUILD.bazel
cc_binary(
    name = "c-client",
    srcs = ["main.c"],
    linkopts = ["-lcurl"],
    visibility = ["//visibility:public"],
)

40 of 130

Target visibility: default visibility in a package

# BUILD.bazel

package(default_visibility = ["//visibility:public"])

41 of 130

Rules

# BUILD.bazel
cc_library(
    name = "crow",
    hdrs = glob(["crow/**/*.h", "crow/**/*.hpp"]),
    strip_include_prefix = "crow/",
)

# cc_library.bzl
cc_library = rule(
    implementation = _cc_library_impl,
    attrs = {
        "srcs": attr.label_list(allow_files = True),
        "hdrs": attr.label_list(allow_files = True),
        "deps": attr.label_list(providers = [CcInfo]),
        "linkstatic": attr.bool(default = False),
        "includes": attr.string_list(),
        "strip_include_prefix": attr.string(),
        "copts": attr.string_list(),
        ...
    },
    ...,
    provides = [CcInfo],
)

42 of 130

Providers

# my_provider.bzl
CcInfo = provider(
    doc = "A provider for cc rules.",
    fields = {
        "headers": "Headers of this target and all transitive dependencies",
        "includes": "Directories to pass via -I when compiling",
        "linker_inputs": "A collection of files to pass to the linker.",
        ...
    },
)

43 of 130

Rules: a simple implementation function

# cc_library.bzl
def _cc_library_impl(ctx):
    args = ctx.actions.args()
    lib_path = ctx.attr.name + ".so"
    transitive_headers = ctx.attr.hdrs
    for dep in ctx.attr.deps:
        transitive_headers += dep[CcInfo].headers
    args.add_all(["-I" + include for include in ctx.attr.includes])
    args.add_all(ctx.attr.srcs)
    ...
    lib = ctx.actions.declare_file(lib_path)
    ctx.actions.run(
        inputs = ctx.attr.srcs + MORE STUFF,
        outputs = [lib],
        arguments = [args],
        executable = my_compiler,
    )
    return [CcInfo(
        headers = transitive_headers,
        linker_inputs = [lib],
        ...,
    )]

44 of 130

Rules: declare_directory

  • declare_directory can be used instead of declare_file; however, it is not possible to access the individual files this way or to make decisions based on which files were created.
  • Nevertheless, a declared directory can be passed around in a Provider, and it can ultimately be passed to args.add_all, which will traverse the directory and add all files from it.

45 of 130

Rule implementation and build actions

  • Rule implementation is a Starlark function that gets called at the analysis phase. It produces a collection of build actions.
  • The implementation function is not called during the execution phase.
  • Build actions are executed as necessary during the execution phase.

46 of 130

Rules: Q&A

  • Q: Can I access the internet from a rule implementation?
    • A: Not directly; the implementation function itself can never access the internet.
  • Q: Can I access the internet from build actions produced by the implementation function?
    • A: This is frowned upon and will not work by default. A repository rule is a better place to fetch sources. However, one can add a “requires-network” tag to allow the build actions of the target in question to access the network. It is then the rule author’s responsibility to ensure reproducibility: Bazel will not know when the resource you access changes.
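For illustration, such an escape hatch could look like the following hypothetical genrule (the URL and target name are made up; as noted above, a repository rule with a pinned checksum remains the better option):

```starlark
# BUILD.bazel -- hypothetical example
genrule(
    name = "fetch_data",
    outs = ["data.json"],
    cmd = "curl -sf https://example.com/data.json > $@",
    # Opt this action out of network isolation. Bazel cannot tell
    # when the remote resource changes, so cached results may go stale.
    tags = ["requires-network"],
)
```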

47 of 130

Filegroups

# BUILD.bazel
filegroup(
    name = "exported_testdata",
    srcs = glob([
        "testdata/*.dat",
        "testdata/logs/**/*.log",
    ]),
)

cc_library(
    name = "lib_b",
    ...,
    data = ["//my_package:exported_testdata"],
    visibility = ["//visibility:public"],
)

48 of 130

Filegroups: implementation + DefaultInfo

# filegroup.bzl
filegroup = rule(
    implementation = _filegroup_impl,
    attrs = {
        "srcs": attr.label_list(allow_files = True),
    },
    provides = [DefaultInfo],
)

def _filegroup_impl(ctx):
    return [DefaultInfo(files = depset(ctx.files.srcs))]

49 of 130

Genrule

# BUILD.bazel
genrule(
    name = "concat_all_files",
    srcs = [
        "//some:files",  # a filegroup with multiple files in it ==> $(locations)
        "//other:gen",   # a genrule with a single output ==> $(location)
    ],
    outs = ["concatenated.txt"],
    cmd = "cat $(locations //some:files) $(location //other:gen) > $@",
)

50 of 130

Macros: templating around rule invocations

# my_cc_library.bzl
def my_cc_library(**attrs):
    if "enable_foo" in attrs:
        attrs["copts"] = attrs.get("copts", []) + ["-std=c++14", "-Wstack-usage=10000"]
    attrs.pop("enable_foo", None)
    native.cc_library(**attrs)

51 of 130

Macros: templating around rule invocations

# my_app.bzl
def my_app(name, srcs, deps):
    app_name = name + "_app"
    native.cc_binary(
        name = app_name,
        srcs = srcs,
        deps = deps,
    )
    native.genrule(
        name = name + "_config",
        srcs = [":" + app_name],
        outs = [name + ".json"],
        cmd = "$(location //:make_config) $(location {}) > $@".format(app_name),
        tools = ["//:make_config"],
    )

52 of 130

Repository rules

# http_archive.bzl
http_archive = repository_rule(
    implementation = _impl,
    attrs = {
        "urls": attr.string_list(mandatory = True),
        "sha256": attr.string(mandatory = True),
        "strip_prefix": attr.string(),
    },
)

53 of 130

Repository rules

# WORKSPACE
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

http_archive(
    name = "rules_cc",
    urls = ["https://github.com/bazelbuild/rules_cc/releases/download/0.0.10/rules_cc-0.0.10.tar.gz"],
    sha256 = "65b67b81c6da378f136cc7e7e14ee08d5b9375973427eceb8c773a4f69fa7e49",
    strip_prefix = "rules_cc-0.0.10",
)

54 of 130

Repository rules

# http_archive.bzl
def _impl(repository_ctx):
    url = repository_ctx.attr.urls[0]
    sha256 = repository_ctx.attr.sha256
    repository_ctx.download_and_extract(url, sha256 = sha256)

There are a few ways to trigger a fetch:

$ bazel sync --only=rules_cc
$ bazel fetch @rules_cc//:*

To check what an external repository looks like:

$ ls -la $(bazel info output_base)/external/rules_cc

55 of 130

Repository rules: the repository_ctx object

56 of 130

Common repository rules

new_* rules typically allow you to supply BUILD files for the fetched code, although http_archive does not need a new_* version, and in recent versions of Bazel git_repository and new_git_repository are essentially the same thing.

WORKSPACE files in external repositories have no significance.

57 of 130

Repository rules

Prefer http_archive over git_repository. The reasons are:

  • Git repository rules depend on the system git(1), whereas the HTTP downloader is built into Bazel and has no system dependencies.
  • http_archive supports a list of urls as mirrors; git_repository supports only a single remote.
  • http_archive works with the repository cache; git_repository does not. See #5116 for more information.

58 of 130

Questions?

59 of 130

Exercise #2: Write a macro

  • Write a macro that wraps the native.cc_library or native.cc_binary rule and adds “-std=c++17” to its copts attribute. Prevent users from passing their own copts.
  • Make the code base use that macro instead of the default cc rules.
  • Hint: you can make the loading phase fail by using the “fail” function.
  • Put the definition of the macro in bazel/defs.bzl.
  • See the branch solution2 when you are ready to verify your solution.

60 of 130

2.1 Toolchains, platforms, selects

61 of 130

Toolchains & platforms

[Diagram: Platforms and Toolchains]

62 of 130

Toolchains & platforms

[Diagram: Platforms and Toolchains, connected by Constraints]

63 of 130

Constraints

# BUILD.bazel
constraint_setting(
    name = "compiler",
    default_constraint_value = ":gcc",
)

constraint_value(
    name = "qcc",
    constraint_setting = ":compiler",
)

constraint_value(
    name = "gcc",
    constraint_setting = ":compiler",
)

64 of 130

Platforms

# BUILD.bazel
platform(
    name = "x86_64_linux",
    constraint_values = [
        "@platforms//os:linux",
        "@platforms//cpu:x86_64",
    ],
)

65 of 130

Platforms

# BUILD.bazel
platform(
    name = "x86_64_linux",
    constraint_values = [
        "@platforms//os:linux",
        "@platforms//cpu:x86_64",
    ],
)

$ bazel build --platforms=//:x86_64_linux //...

66 of 130

Toolchains: toolchain type

# bar_tools/BUILD.bazel
# By convention, toolchain_type targets are named "toolchain_type" and
# distinguished by their package path. So the full path for this would be
# //bar_tools:toolchain_type.
toolchain_type(name = "toolchain_type")

67 of 130

Toolchain definition

# BUILD.bazel
toolchain(
    name = "qnx_x86_64_toolchain",
    exec_compatible_with = [
        "@platforms//os:linux",
        "@platforms//cpu:x86_64",
    ],
    target_compatible_with = [
        "@platforms//os:qnx",
        "@platforms//cpu:x86_64",
        "//platform:gcc",
    ],
    toolchain = "@qnx_sdp//:x86_64-pc-nto-qnx7.1.0",
    toolchain_type = "@bazel_tools//tools/cpp:toolchain_type",
)

68 of 130

Toolchain registration

# WORKSPACE
register_toolchains(
    "//bazel/toolchains/qnx_compiler:qnx_x86_64_toolchain",
    "//bazel/toolchains/qnx_compiler:qcc_qnx_x86_64_toolchain",
    "//bazel/toolchains/qnx_compiler:qnx_aarch64_toolchain",
    "//bazel/toolchains/qnx_compiler:qcc_qnx_aarch64_toolchain",
)

69 of 130

Toolchain resolution

  • During toolchain resolution, the constraint values specified by the target platform are compared against the constraints that registered toolchains specify in their “target_compatible_with” attribute.
  • The “exec_compatible_with” of registered toolchains is compared against the detected host platform.
  • The first toolchain of a given type that matches all constraints is selected.
  • You can use --toolchain_resolution_debug=<regexp> to debug toolchain resolution. The regexp is matched against toolchain types and specific targets.
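The selection logic above can be sketched in Python (an illustrative model, not Bazel's actual implementation; the toolchain names are the QNX examples from the earlier slides): a toolchain matches when its target constraints are a subset of the target platform's constraints, and its exec constraints are a subset of the host platform's.

```python
# Illustrative sketch of toolchain resolution (not Bazel's real code).
registered = [
    # (name, exec_compatible_with, target_compatible_with)
    ("qnx_x86_64_toolchain",   {"os:linux", "cpu:x86_64"}, {"os:qnx", "cpu:x86_64"}),
    ("linux_x86_64_toolchain", {"os:linux", "cpu:x86_64"}, {"os:linux", "cpu:x86_64"}),
]

def resolve(host_constraints, target_constraints):
    """Pick the first registered toolchain compatible with both platforms."""
    for name, exec_c, target_c in registered:
        if exec_c <= host_constraints and target_c <= target_constraints:
            return name
    return None  # resolution failure: no matching toolchain

host = {"os:linux", "cpu:x86_64"}
assert resolve(host, {"os:qnx", "cpu:x86_64"}) == "qnx_x86_64_toolchain"
assert resolve(host, {"os:linux", "cpu:x86_64"}) == "linux_x86_64_toolchain"
```

Registration order matters in this model, which mirrors the "first toolchain that matches is selected" rule.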

70 of 130

How rule definitions specify toolchain type

# cc_library.bzl
cc_library = rule(
    implementation = _cc_library_impl,
    attrs = {
        "srcs": attr.label_list(allow_files = True),
        "deps": attr.label_list(providers = [CcInfo]),
        "copts": attr.string_list(),
        ...
    },
    ...,
    toolchains = ["@bazel_tools//tools/cpp:toolchain_type"],
    provides = [CcInfo],
)

def _cc_library_impl(ctx):
    ...
    cc_toolchain = ctx.toolchains["@bazel_tools//tools/cpp:toolchain_type"]
    ...

71 of 130

Configurable attributes

# BUILD.bazel
load("@bazel_skylib//lib:selects.bzl", "selects")

selects.config_setting_group(
    name = "x86_64_linux",
    match_all = [
        "@platforms//os:linux",
        "@platforms//cpu:x86_64",
    ],
)

cc_library(
    name = "clock_util",
    srcs = select({
        ":x86_64_linux": ["clock_util_linux.cpp"],
        "//conditions:default": ["clock_util_other.cpp"],
    }) + ["clock_util.hpp"],
    visibility = ["//visibility:public"],
)

72 of 130

Questions?

73 of 130

2.2 Sandbox, Remote cache + execution

74 of 130

Sandboxing

75 of 130

The Bazel sandbox

How can Bazel help enforce hermeticity in build actions?

  • Limit access to files that are not declared inputs?
  • Prevent writing to files that are not declared outputs?
  • Limit access to other processes running on the same machine?
  • Limit access to details about the current user?
  • Limit access to network?

76 of 130

The Bazel sandbox

  • Enabled by default, based on what the underlying system supports
    • Different sandbox implementations available
  • What does it do?
    • Prepare a sandbox directory with symlinks mirroring the source tree
    • Only declared inputs are present inside the sandbox
    • Only declared outputs are copied back out of the sandbox and preserved
    • Might also limit access to network, other processes, etc. depending on platform support
  • What does it not do?
    • Does not limit access to system tools, system headers, or system libraries!
      • Will not help you catch implicit dependencies on undeclared tools/libraries
      • (although some rule sets like rules_cc provide some additional protections)
    • Does not apply to repository rules!
      • Runs in an earlier phase, and their job is to prepare the build environment
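The "prepare a sandbox directory" idea can be sketched in Python. This is an illustrative model of the mechanism, not Bazel's implementation: only declared inputs are staged in, the action runs in isolation, and only declared outputs survive.

```python
# Illustrative sketch of sandboxed execution (not Bazel's implementation).
import os
import shutil
import subprocess
import tempfile

def run_sandboxed(workspace, inputs, outputs, argv):
    """Stage declared inputs into a fresh directory, run the action
    there, and copy back only the declared outputs."""
    sandbox = tempfile.mkdtemp(prefix="sandbox-")
    for inp in inputs:  # real Bazel uses symlinks; we copy for simplicity
        dst = os.path.join(sandbox, inp)
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        shutil.copy(os.path.join(workspace, inp), dst)
    subprocess.run(argv, cwd=sandbox, check=True)
    for out in outputs:  # anything undeclared vanishes with the sandbox
        shutil.copy(os.path.join(sandbox, out), os.path.join(workspace, out))
    shutil.rmtree(sandbox)

ws = tempfile.mkdtemp(prefix="workspace-")
with open(os.path.join(ws, "main.c"), "w") as f:
    f.write("int main(void) { return 0; }\n")
# A stand-in "compile action" that just copies its declared input:
run_sandboxed(ws, ["main.c"], ["main.o"], ["cp", "main.c", "main.o"])
```

An action that reads an undeclared file simply does not find it inside the sandbox, which is exactly how implicit input dependencies get caught.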

77 of 130

processwrapper-sandbox

Works on any POSIX system, does not require “advanced” features

  • Limit access to files that are not declared inputs? ✅
  • Only declared outputs are preserved? ✅
  • Limit access to other processes running on the same machine? ❌
  • Limit access to details about the current user? ❌
  • Limit access to network? ❌
  • Works inside unprivileged containers

78 of 130

linux-sandbox

Uses Linux Namespaces to isolate the build action from the underlying system:

  • Limit access to files that are not declared inputs? ✅
  • Only declared outputs are preserved? ✅
  • Limit access to other processes running on the same machine? ✅
  • Limit access to details about the current user? ✅
  • Limit access to network? ✅
  • Requires Linux with namespace features
    • Container must be privileged in order to allow these features inside

79 of 130

darwin-sandbox

Uses Apple’s sandbox-exec to achieve roughly the same as linux-sandbox:

  • Limit access to files that are not declared inputs? ✅
  • Only declared outputs are preserved? ✅
  • Limit access to other processes running on the same machine? ✅
  • Limit access to details about the current user? ✅
  • Limit access to network? ✅

80 of 130

Windows?

Sorry, no official sandboxing support available

81 of 130

local (no sandbox)

Executes the action from the root of your workspace.

  • Limit access to files that are not declared inputs? ❌
  • Only declared outputs are preserved? ❌
  • Limit access to other processes running on the same machine? ❌
  • Limit access to details about the current user? ❌
  • Limit access to network? ❌

Useful for debugging hermeticity:

  • fails in the sandbox, but works with --spawn_strategy=local

82 of 130

The Bazel sandbox

  • Controlled with the --spawn_strategy or --strategy flags
  • Many other flags control various details of the sandboxing:
    • --strategy_regexp (for more fine-grained control of sandboxing per action)
    • --[no]experimental_use_hermetic_linux_sandbox
    • --experimental_sandbox_limits
    • --[no]incompatible_sandbox_hermetic_tmp
    • --[no]reuse_sandbox_directories
    • --sandbox_add_mount_pair
    • --sandbox_block_path
    • --[no]sandbox_default_allow_network
    • --[no]sandbox_fake_hostname
    • --[no]sandbox_fake_username
    • --sandbox_writable_path
    • --[no]sandbox_debug
  • Can also be controlled via tags specific to certain targets/actions

83 of 130

Remote cache

84 of 130

Bazel’s (local) action cache

  • Bazel breaks down a build into discrete actions
  • Each action knows its
    • input files
    • command line
    • environment variables
    • expected output paths
  • Each actions generates
    • actual output files

[Diagram: the action’s input files, command line, environment variables and expected output paths feed a hash function producing the actionKey; the actual output files feed a hash function producing the digestKey. Inspect with: bazel dump --action_cache]

85 of 130

Bazel’s remote cache

What if we could share this cache with our colleagues (and with CI)?

  • Copy the action cache (actionKey ⇒ digestKey mapping) to a remote server
  • What about the output files themselves?
    • Content-addressable store (CAS): digestKey ⇒ output files
  • How to use?
    • Compute the actionKey
    • If actionKey is in the local cache: Use the local output files directly
    • Else if actionKey is in the remote cache: Download output files from remote CAS
    • Else:
      • Execute the action locally
      • Upload the result into the remote cache
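The lookup flow above can be sketched in a few lines of Python (an illustrative model of the protocol, not Bazel's actual implementation; local and remote caches are collapsed into one dict each):

```python
# Illustrative sketch of the remote-cache protocol (not real Bazel).
import hashlib
import json

def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

cas = {}           # content-addressable store: digestKey -> output bytes
action_cache = {}  # action cache: actionKey -> digestKey

def run_action(inputs, command):
    """Look up an action in the cache; 'execute' it on a miss.
    Returns (output, was_cache_hit)."""
    # actionKey = hash over all input digests plus the command line.
    action_key = digest(json.dumps(
        [sorted(digest(i) for i in inputs), command]).encode())
    if action_key in action_cache:                 # cache hit
        return cas[action_cache[action_key]], True
    output = b"|".join(inputs) + command.encode()  # pretend execution
    action_cache[action_key] = digest(output)
    cas[digest(output)] = output                   # upload to the CAS
    return output, False

out1, hit1 = run_action([b"main.c"], "gcc main.c -o main")
out2, hit2 = run_action([b"main.c"], "gcc main.c -o main")
assert (hit1, hit2) == (False, True) and out1 == out2
# Any change to an input changes the actionKey => cache miss:
_, hit3 = run_action([b"main.c v2"], "gcc main.c -o main")
assert hit3 is False
```

This also shows why hermeticity matters: any uncaptured input that leaks into the command or environment perturbs the actionKey and turns would-be hits into misses.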

86 of 130

Bazel’s remote cache

  • We’re still executing all actions locally
    • The remote server only stores the cache
  • The real world is always more complicated:
  • Only works if your build is already hermetic and reproducible!
    • Different actionKey ⇒ Cache miss ⇒ build locally
    • Overhead means this can be worse than having no remote cache!
  • Debugging remote cache involves:
    • Looking at cache hit rates
    • Figuring out why actionKeys differ when they shouldn’t ⇐ different build environments
      • e.g. leaking environment variables into an action
    • Figuring out why actionKeys are equal when they should differ ⇐ uncaptured build inputs

87 of 130

Setting up a remote cache

  • Bazel cache protocol, either over HTTP, gRPC, or UNIX sockets
  • Consists of two parts:
    • The action cache (actionKey ⇒ digestKey)
    • The CAS (digestKey ⇒ build output)
  • Often set up together with remote execution (see next section)
    • Cache-only services do exist, e.g. based on nginx, bazel-remote or GCS
    • More info: https://bazel.build/remote/caching#cache-backend
  • Relevant Bazel flags:
    • --remote_cache=<URL>
    • --remote_upload_local_results=false
    • Per-target adjustments: tags = ["no-remote-cache"]
    • More details: https://www.tweag.io/blog/2020-04-09-bazel-remote-cache/

88 of 130

A “local” remote cache?

  • Yes, controlled by: --disk_cache=<PATH>
  • Points to an “external” cache on your local disk
  • Lives outside Bazel’s output_base (not the same as the local action cache)
  • Used by Bazel after the local action cache, but before the remote cache
  • Survives bazel clean and even bazel clean --expunge
  • Useful when switching between multiple build configurations
    • These can often thrash the local action cache + output_path
  • Sometimes confusing when you think you’re building from scratch
    • e.g. after bazel clean
  • Disable it by passing an empty path: --disk_cache=

89 of 130

Remote execution

90 of 130

Remote execution

  • Relies on remote caching
  • Also relies on sandboxing, in spirit:
    • If a build action fails in a local sandbox, it probably won’t work in RE
    • If a build action succeeds in a local sandbox, it might also work in RE
    • A working local build is no guarantee that you have resolved all hermeticity issues for RE
  • The idea:
    • If an action is not already cached (locally or remotely)
    • Upload the action’s inputs into the CAS
    • Send a message to execute the action remotely:
      • The remote executor downloads action inputs from the cache, executes the action and uploads the build output into the remote cache
    • Download the action outputs from the remote cache

91 of 130

Remote execution

  • Remote Execution API:
    • gRPC protocol
    • Used by Bazel, but also other build systems: Buck2, Pants, etc.
  • Flags to enable remote execution:
    • --remote_executor=<URL> points to the RE cluster
      • (also sets --remote_cache, if unset)
    • --remote_instance_name can be used to separate workloads inside the RE cluster
  • Good overview of remote caching and remote execution: https://www.buildbuddy.io/blog/bazels-remote-caching-and-remote-execution-explained/

92 of 130

Setting up remote execution

93 of 130

Build without the Bytes

By default (before Bazel 7), for each build action that needs to be executed remotely:

  • Upload the action inputs (if necessary)
  • Download the action outputs

This is expensive when all you want is the final build result, i.e. the output of the last build action

Enter “Build without the Bytes”:

  • Limit which action outputs are downloaded to the local machine:
  • Controlled with --remote_download_outputs=<all, minimal or toplevel>
    • all: Download all action outputs (default before Bazel 7)
    • minimal: Only download what is required by local build actions (e.g. to run tests)
    • toplevel: Download outputs associated with top level targets (default since Bazel 7)
  • More details: https://blog.bazel.build/2023/10/06/bwob-in-bazel-7.html
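In a .bazelrc this could look like the fragment below; the choice of toplevel is just one option, matching the Bazel 7 default:

```
# .bazelrc
# Skip downloading intermediate outputs; fetch only top-level results
build --remote_download_outputs=toplevel
```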

94 of 130

2.3 Bazel rule sets

95 of 130

Bazel rule sets

  • Why? Make Bazel extensible to many languages and ecosystems
  • What? Bazel repository with rule implementations written in Starlark
  • Examples of rule sets:
    • We’ve already seen rules_cc for building C/C++ code
    • Many other languages: rules_java, rules_go, rules_py, rules_rust, rules_haskell, etc.
    • More than just languages:
      • Google Protocol Buffers: rules_proto
      • Building tar/zip/deb/rpm packages: rules_pkg
      • Building container images: rules_oci
      • General Starlark utilities: bazel-skylib
    • Running other build systems inside Bazel: rules_foreign_cc

96 of 130

Example: rules_go and rules_rust vs. rules_cc

  • Most language rule sets provide similar rules:
    • In BUILD files:
      • cc_binary() / go_binary() / rust_binary()
      • cc_library() / go_library() / rust_library()
      • cc_test() / go_test() / rust_test()
      • as well as more language-specific rules…
        • E.g. cc_import() to import and use pre-compiled dependencies
    • In the WORKSPACE file:
      • register_toolchains() / go_register_toolchains() / rust_register_toolchains()
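For example, a BUILD file using the C/C++ flavor of these rules might look like the sketch below. All target and file names here are made up; the same shape carries over to go_* and rust_* rules:

```starlark
# BUILD — hypothetical targets showing the common rule shape
cc_library(
    name = "util",
    srcs = ["util.c"],
    hdrs = ["util.h"],
)

cc_binary(
    name = "app",
    srcs = ["main.c"],
    deps = [":util"],
)

cc_test(
    name = "util_test",
    srcs = ["util_test.c"],
    deps = [":util"],
)
```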

97 of 130

Example: rules_foreign_cc

  • Supports running other build systems inside Bazel:
    • CMake
    • Meson
    • Ninja
    • GNU Make (including the configure && make && make install pattern)
  • Why?
    • Build a non-Bazel 3rd-party dependency as part of a bigger Bazel project

98 of 130

Example: rules_foreign_cc

  • Using rules_foreign_cc:
    • Load rules_foreign_cc itself inside WORKSPACE
    • Fetch 3rd-party source code as a Bazel repository
    • Provide a BUILD file with instructions how to build the 3rd-party code
      • Use rules from rules_foreign_cc to invoke the foreign build process
        • E.g. cmake() or configure_make()
      • Declare the overall inputs/output of the foreign build process
    • Depend on these targets from your own targets
  • Drawbacks: A “foreign” build is “opaque” to Bazel:
    • Run as a single action from Bazel’s POV
    • Hard to set up fine-grained dependencies on parts of the foreign output.
    • Coarse-grained output limits parallelism compared to fine-grained Bazel targets
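A BUILD file for such a 3rd-party repository might look roughly like this. The load path matches current rules_foreign_cc releases, but the library name and output file are assumptions, and a real build usually needs more attributes:

```starlark
# BUILD file placed in the 3rd-party repo — a sketch, not a drop-in solution
load("@rules_foreign_cc//foreign_cc:defs.bzl", "cmake")

# The foreign build consumes the whole source tree as a single input
filegroup(
    name = "all_srcs",
    srcs = glob(["**"]),
)

# One opaque action from Bazel's POV: runs CMake + the native build tool
cmake(
    name = "mylib",
    lib_source = ":all_srcs",
    out_static_libs = ["libmylib.a"],
    visibility = ["//visibility:public"],
)
```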

99 of 130

Questions?

100 of 130

Exercise #3: Import and build a 3rd-party dependency

  • Recognize the undeclared c-client dependency on curl:
    • #include <curl/curl.h> in the c-client/main.c source code
    • Setting linkopts to link against curl
    • Verify libcurl dependency with: ldd bazel-bin/c-client/c-client
    • But Bazel does not know about curl:
      • Where is curl installed?
      • Which version?
  • Steps:
    • sudo apt remove libcurl4-openssl-dev
    • bazel clean && bazel build //c-client --disk_cache=
      • Observe failure due to curl/curl.h: No such file or directory
    • How can we fix this?

101 of 130

Exercise #3: Import and build a 3rd-party dependency

  • How can we fix this?
    1. Retrieve a known version of the curl library source code
    2. Build a working curl library (curl’s build system is CMake)
    3. Link c-client against this curl library

# WORKSPACE
http_archive(
    name = "curl",
    ...
)

# BUILD
cmake(
    name = "curl",
    ...
)

Dependency graph:
  //c-client:c-client → //c-client:main.c
  //c-client:c-client → @curl//:curl (in external repo @curl)

102 of 130

Exercise #3: Import and build a 3rd-party dependency

  • Oops, transitive dependency! curl depends on openssl
    • Must first get openssl sources
    • Then build openssl (using Configure + make)
    • Then link curl against this openssl

# WORKSPACE
http_archive(
    name = "openssl",
    ...
)

# BUILD
configure_make(
    name = "openssl",
    ...
)

Dependency graph:
  //c-client:c-client → //c-client:main.c
  //c-client:c-client → @curl//:curl (in external repo @curl)
  @curl//:curl → @openssl//:openssl (in external repo @openssl)

103 of 130

Exercise #3: Import and build a 3rd-party dependency

  • git checkout exercise3
  • Open the WORKSPACE file
    • We’ve already added rules_foreign_cc and openssl
    • How to get the curl source code?
      • Use this version: https://curl.se/download/curl-7.77.0.tar.gz
      • Download the http_archive with: bazel sync --only=curl
    • How to build curl?
      • Look to openssl for inspiration
      • cmake() docs: https://bazel-contrib.github.io/rules_foreign_cc/cmake.html
      • See WORKSPACE for more hints…
      • Build curl stand-alone: bazel build @curl
        • @curl is shorthand for @curl//:curl

104 of 130

Exercise #3: Import and build a 3rd-party dependency

  • Adjust c-client to properly depend on @curl
    • No more linkopts, use deps instead
    • Hint: One line change
    • bazel build //c-client
    • Verify result with:
      • bazel query "deps(//c-client:c-client)"
      • ldd bazel-bin/c-client/c-client

105 of 130

Exercise #4: Hermetic toolchain

  • Motivation: Replace non-hermetic use of build machine’s /usr/bin/gcc with a toolchain that is fully controlled by Bazel.
  • Introduce a hermetic C/C++ toolchain by leveraging https://github.com/bazel-contrib/toolchains_llvm.
    • Follow instructions from https://github.com/bazel-contrib/toolchains_llvm/releases/tag/v1.2.0.
    • Be sure to follow the snippet for WORKSPACE, not for bzlmod.
    • Be sure to use the LLVM toolchain version 17.0.6.
    • If you encounter error messages about passing -std=c++17 to a C program, feel free to switch from the macro developed in exercise 2 back to a normal cc_binary rule.
  • Start from branch exercise4
  • See the branch solution4 for solution.

106 of 130

Exercises recap

  • What have we achieved:
    • We are hermetic with respect to libcurl (and its openssl dependency).
    • We are hermetic with respect to the C/C++ toolchain we are using.
  • Are we now fully hermetic from Bazel’s point of view? No!
    • cpp-server still pulls asio from the devcontainer.
    • LLVM toolchain will still access system headers from the devcontainer.
    • Shell utilities such as cat are taken from the system and are not fixed by Bazel.
  • Approaching hermeticity is a process
    • Balancing act between what we allow the system to provide, and what we explicitly get via Bazel.

107 of 130

3.1 Bazel and CI

108 of 130

Repository cache

  • The repository cache avoids re-downloading files from the Internet. It is a content-addressable store (CAS) indexed by the SHA256 checksums of downloads.
  • It has nothing to do with the remote cache: the remote cache does not cache downloads, since validation of downloads happens before the remote cache gets involved.
  • The same repository cache can be shared across multiple Bazel workspaces.
  • The repository cache is a reason to prefer persistent CI workers over ephemeral workers that are destroyed after each run. Alternatively, CI can persist the repository cache directory specifically and ensure it is populated on each worker when it starts.
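On CI workers this can be made explicit by pointing every workspace at the same directory; the flag is real, but the path below is just an example:

```
# .bazelrc on CI workers — share one download cache across workspaces
common --repository_cache=/var/cache/bazel/repo-cache
```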

109 of 130

When is a repository rule re-fetched?

In contrast to regular targets, repos are not necessarily re-fetched when something changes that would cause the repo to be different. This is because there are things that Bazel either cannot detect changes to or it would cause too much overhead on every build (for example, things that are fetched from the network). Therefore, repos are re-fetched only if one of the following things changes:

  • The attributes passed to the repo rule invocation.
  • The Starlark code comprising the implementation of the repo rule.
  • The value of any environment variable passed to repository_ctx's getenv() method
  • The existence, contents, and type of any paths being watched in the implementation function of the repo rule.
  • When bazel fetch --force is executed.

110 of 130

Bazel’s client/server architecture

  • Bazel’s client/server organization allows for many optimizations that would not be otherwise possible, such as caching of BUILD files, dependency graphs, and other metadata from one build to the next.
  • When you run bazel, you're running the client. The client finds the server based on the output base (by default determined by the path of the base workspace directory and your user id).
  • If the client cannot find a running server instance, it starts a new one.
  • Server shuts down after a period of inactivity, by default 3 hours.

111 of 130

Analysis cache

  • Analysis and loading phases can take a significant amount of time.
  • Analysis cache is part of the in-process state of the Bazel server, so losing the server loses the cache. But the cache is also invalidated very easily: for example, many bazel command line flags cause the cache to be discarded.
  • Tip: for a given CI pipeline, choose a set of flags and stick to it. Do not change flags back and forth during a build.
  • Tip: if persistent CI workers are used, it is beneficial to dedicate a worker per platform so that analysis cache can be reused. For example, use the same worker to build for x86_64_linux, instead of selecting an arbitrary worker every time.
  • Tip: you can use --announce_rc in order to identify and debug analysis cache purges.

112 of 130

Overview of Bazel caches

In memory:
  • Skyframe
  • Analysis cache

On local disk:
  • Repository cache
  • Under output_base: Action cache, Output tree
  • Disk cache: Action cache, CAS

Remote:
  • Remote cache: Action cache, CAS

113 of 130

Target determination

  • In a CI pipeline, build only what you need to build. Building //... may be too much.
  • If you use //... as a build wildcard on CI make sure to tag targets that should not be built by default with the “manual” tag.
  • Furthermore, special tooling such as https://github.com/bazel-contrib/target-determinator can be used to determine a smaller subset of targets to (re)build based on the changes reported by revision control. This aims to save time on analysis.

114 of 130

The relationship between Bazel and CI

  • Bazel's features can be used to replace (parts of) complicated CI pipelines.
  • Bazel can help to make large portions of CI runnable on developer machines as opposed to having to test on CI only, which can help reduce feedback times.
  • With multi-platform remote execution setups it is possible to replace cross platform CI pipeline configurations with a Bazel configuration that can be used to initiate cross platform builds from a developer machine with the help of a remote execution cluster.

115 of 130

3.2 Organizational points

116 of 130

Build system maintenance

  • Code owners
  • DevOps
  • Infrastructure

Debugging and improving the build system requires:

  • Cross-functional knowledge and mandates
    • Fixes happen not only in the build system itself, but also in the code, and in the infra!
  • Working across teams
    • Communication and collaboration across team boundaries are put to the test!

Who maintains the build system?

Everybody!

117 of 130

Build system maintenance

The build system is never finished:

  • Initial acceleration after adopting Bazel
  • Don’t allow build + test times to slowly drift back up after the migration
  • Historic metrics of overall and per-target build + test times
  • Spot trends and degradations quickly
    • Easier to fix these now, than later
  • A faster build allows you to do more

118 of 130

Build system maintenance

Build system champions!

  • People in each product team that have some interest in build system work
  • Support them from the build system team!
  • These are your
    • Expert users
    • Early adopters
    • Communication channels to the wider team

119 of 130

3.3 Extra topics

120 of 130

.bazelrc: Storing common command line options

  • Bazel has too many command line options.
  • Often you want a set of options to apply to all (or most) builds.
  • The .bazelrc file allows you to store common options in a file, e.g.:
  • Format:
    • Lines where a bazel command is followed by options that are applied to that command.
    • Commands like:
      • build, query, test
    • “Special” command:
      • common: option is applied to all supported commands.

# .bazelrc

# Remote cache/execution configuration
common --remote_cache=
common --remote_executor=grpcs://my_RE_cluster_URL

# Don't let env vars like $PATH sneak into the build
build --incompatible_strict_action_env

# When running tests, only build necessary test targets
test --build_tests_only

121 of 130

.bazelrc: Configuring named sets of related options

  • What about applying a set of related options to some (but not all) builds?
  • Format: <bazel command>:<config name> <options...>
  • Example usage:
    • bazel build --config=address_sanitizer <target>
  • Option precedence:
    • Multiple matching lines are combined as if they were listed in the same order on the command line.
    • Lines for a more specific command take precedence over general lines.
    • Options on the actual command line take precedence over .bazelrc.

# .bazelrc

# Enable with --config=leak_sanitizer
build:leak_sanitizer --copt="-fsanitize=leak"
build:leak_sanitizer --linkopt="-fsanitize=leak"
build:leak_sanitizer --copt="-U_FORTIFY_SOURCE"

# Enable with --config=address_sanitizer
build:address_sanitizer --copt="-fsanitize=address"
build:address_sanitizer --linkopt="-fsanitize=address"
build:address_sanitizer --copt="-U_FORTIFY_SOURCE"

122 of 130

.bazelrc: Spread across multiple files

  • Multiple .bazelrc locations:
    • (options in later files can override options in earlier files)
    • The system RC file (e.g. /etc/bazel.bazelrc)
    • The workspace RC file (.bazelrc in your workspace directory)
      • Most commonly used
      • The only one that is typically tracked by version control
    • The home RC file ($HOME/.bazelrc)
    • The user-specified RC file: Specified by --bazelrc=<file> on the command line
  • Imports:
    • Allows spreading option configuration across multiple files
    • import: Target file must exist
    • try-import: Target file is optional

# .bazelrc

# Import common options
import %workspace%/remote.bazelrc

# Import user-specific options (if any)
try-import %workspace%/user.bazelrc

123 of 130

Bzlmod, aka Bazel modules

  • Modernization of the concept of external repositories in Bazel
  • MODULE.bazel takes over from the WORKSPACE file
  • Improves Bazel’s handling of repositories with transitive dependencies
    • Including how to handle conflicting version requirements
  • Bazel Central Registry: https://registry.bazel.build
    • Easily consume common third-party Bazel repositories (now called Bazel modules)
    • Similar to e.g. the Python Package Index (PyPI), but for Bazel
    • Can set up your own Registry to control consumption of 3rd-party repository dependencies
  • Retains some compatibility with WORKSPACE files
    • Beware: the filesystem structure underneath $bazel_out/external/... is changed
  • Bzlmod migration guide: https://bazel.build/external/migration
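A minimal MODULE.bazel might look like the sketch below; module(), bazel_dep(), and the registry workflow are real, but the project name and dependency versions here are made up:

```starlark
# MODULE.bazel — replaces the WORKSPACE file under bzlmod
module(name = "my_project", version = "0.1.0")

# Dependencies resolved via the Bazel Central Registry
bazel_dep(name = "rules_cc", version = "0.0.9")
bazel_dep(name = "bazel_skylib", version = "1.5.0")
```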

124 of 130

Controlling provenance of 3rd-party dependencies

  • The Internet is, in general, unreliable. Everything that needs to be downloaded as part of the build needs to come from reliable sources.
  • Solutions:
    • Mirroring 3rd-party deps on your own server (e.g. Artifactory) instead of pulling directly from the Internet. Downside: transitive dependencies cannot be handled this way without extensive patching.
    • Using --distdir to search for archives before accessing the network. Downside: means that all dependencies have to be downloaded as part of creation of the devcontainer, which is not in line with the idea of granular upgrades.
    • The best option: using --experimental_downloader_config.

125 of 130

Downloader config that limits access only to Artifactory

allow mycompany.com
block *
rewrite (.*)(api.github.com/.*) https://mycompany.com/artifactory/$2

126 of 130

Exercise #5: Consume ASIO hermetically

  • Replace non-hermetic dependency on ASIO with Bazel-provided version
  • Remove the pre-installed ASIO by running: sudo apt remove libasio-dev
  • Observe build failure: bazel clean && bazel build //... --disk_cache=
  • How to fix? (similar to exercise #3)
    • Fetch ASIO with http_archive() to create a new Bazel repo: @asio
    • Figure out how to build ASIO with Bazel
      • Header-only library, can be wrapped directly with cc_library()
      • Look at cc_library() docs to find appropriate attributes to pass:�https://bazel.build/reference/be/c-cpp#cc_library
    • Add a proper dependency from the project’s code onto @asio//:asio
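One possible shape of the wrapper is sketched below. This is not the reference solution: the glob patterns, include path, and define are assumptions about ASIO's source layout that you should verify against the downloaded archive:

```starlark
# BUILD.bazel for the @asio repo — a sketch; paths/defines are assumptions
cc_library(
    name = "asio",
    # Header-only: no srcs, just expose the headers and include path
    hdrs = glob(["asio/include/**/*.hpp", "asio/include/**/*.ipp"]),
    includes = ["asio/include"],
    defines = ["ASIO_STANDALONE"],
    visibility = ["//visibility:public"],
)
```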

127 of 130

3.4

Wrap up!

128 of 130

Resources

129 of 130

Thanks for listening to us!

We hope you have:

  • learned something about build systems in general,
  • picked up some Bazel concepts,
  • understood why simply transcribing build rules into Bazel is not enough,
  • built some intuition around what makes a build hermetic and reproducible,
  • seen how this enables Bazel to benefit from caching and remote execution,
  • appreciated the big picture of improving and maintaining a build system, and
  • found some resources to keep exploring!

130 of 130

Questions?