Bazel's dependency on the shell and stuff*
Tracking bug: https://github.com/bazelbuild/bazel/issues/4319
Status: under review
Last update: 2018/04/09
Reviewers: firstname.lastname@example.org [waiting], email@example.com [LGTM]
Guiding philosophy: https://lamport.azurewebsites.net/pubs/state-the-problem.pdf
*: including, but not limited to: coreutils, Perl
Bazel currently can't work without a shell and bintools. Bazel assumes that /bin/sh and /bin/bash always exist and that bintools are installed and on the PATH. This assumption is wrong on most platforms. Notably, Windows has no default POSIX shell, and the Bazel docs recommend installing MSYS2 (a Bash port with some coreutils).
This document shows how to formalize Bazel's dependency on the shell. Doing so allows Bazel to handle a missing shell and work without one. The same formalization is useful to express the dependency on the bintools.
 In this document, "bintools" means the coreutils plus additional command-line tools such as Perl.
 Along with some of the bintools.
Bazel uses the shell or bintools in the following places:
- TestStrategy executes a test setup script before every test action.
- SpawnAction.setShellCommand, ctx.actions.run_shell, genrule.cmd, and extra_action.cmd all create actions that run shell commands. They assume that every tool the command references is installed and available on the PATH.
- RunCommand assumes /bin/sh is available when it writes a helper script to run the binary.
- Skylark rules often use coreutils like "cp" and "ln" in their actions. The semantics of these tools may vary between platforms.
- CommandBuilder and CommandHelper assume that /bin/sh or /bin/bash is installed at exactly these paths.
- ShellConfiguration.java hardcodes Bash paths without checking that Bash actually exists there.
- java_binary requires /bin/bash for its launcher script.
- py_binary requires /usr/bin/env for its launcher script.
- The Bazel client (on Windows), C++ toolchain auto-configuration rules, and the Bazel server all implement some level of Bash-discovery, resulting in duplicate and possibly inconsistent code, redundant work, and inappropriately placed logic.
 Examples of such Skylark rules: pkg.bzl, java_rules_skylark.bzl, java_grpc_library.bzl, protobuf.bzl
The solution ensures that:
- Bazel is fully functional without a shell and without the bintools (coreutils, Perl, and so on).
- Bazel reports an error if a build action needs a shell or a bintool that is missing. As long as none of the actions require a shell or bintools, Bazel can complete the build.
- All shell discovery logic is concentrated in one location.
- The shell discovery logic deals with platform differences, so Bazel handles the shell uniformly on all platforms.
- The solution works with remote execution, even if the host and execution platforms differ.
In addition, a refined solution ensures that:
- Bazel-built binaries such as java_binary and py_binary are fully functional without /bin/bash and /usr/bin/env.
- Bazel formalizes the dependency on bintools, to encode differences in bintool semantics (e.g. BSD sed vs. GNU sed). Rules whose actions require those tools must express some sort of dependency on the bintools.
Instead of storing the shell's path in the ShellConfiguration, create a toolchain rule that stores the path. Every rule that needs a shell interpreter should require this toolchain, and ask it for the interpreter's path.
- Remove Bash- and shell-related code from ShellConfiguration, including the hardcoded Bash paths and the check of the BAZEL_SH environment variable (which wrongly reads the environment of the Bazel server instead of the Bazel client).
- Implement a repository rule that discovers the shell interpreter on the local machine, and creates a toolchain rule that stores its path (or the empty string if no interpreter is found). Change Bazel to automatically create this repository and register the toolchain. Users can manually register additional toolchains for remote builds or alternative shell paths.
- Change all native rules that need a shell, including all test rules, to require the toolchain. To look up the shell interpreter's path, the rule asks the toolchain rule and reports an analysis-time error if the path is empty. The toolchain returns either the --shell_executable flag's value (in the ShellConfiguration fragment) or the selected toolchain's path attribute's value.
- Change Skylark rules to implicitly depend on the shell toolchain. If the rule uses ctx.actions.run_shell but the shell path is empty, the Skylark evaluator reports an evaluation error. Since the toolchain rule itself is written in Skylark, it would depend on itself. To avoid this, we can use a boolean attribute that's only ever true on this rule.
- Change java_binary and py_binary rules (not on Windows) to depend on the toolchain and insert the local Bash or Env path into the launcher script's shebang line.
- Change every binary rule to implicitly depend on the toolchain, so that RunCommand can retrieve the shell path for it. Later, maybe change RunCommand to not use a shell at all.
 Reason for making the dependency implicit at first: Bazel will eventually require an explicit dependency (see the third bullet point of the refined solution). Requiring this dependency from one Bazel release to the next without a transitional period would break existing Skylark rules, both in user code and in released libraries.
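The discovery and lookup steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Bazel's implementation: the function names, the candidate list ("bash", then "sh"), the search order, the fall-through when BAZEL_SH points at nothing usable, and the precedence of --shell_executable over the toolchain's path are all assumptions made for the example. MissingShellError is a hypothetical stand-in for the analysis-time error.

```python
import os
from typing import Callable, Mapping, Sequence

# Hypothetical candidate names; a real list would be platform-specific.
DEFAULT_CANDIDATES = ("bash", "sh")


class MissingShellError(Exception):
    """Stand-in for the analysis-time error reported when no shell is known."""


def discover_shell(
    env: Mapping[str, str],
    is_executable: Callable[[str], bool],
    candidates: Sequence[str] = DEFAULT_CANDIDATES,
) -> str:
    """Return a shell interpreter's path, or "" if none is found."""
    # Assumption: an explicit BAZEL_SH wins if it points at something usable.
    explicit = env.get("BAZEL_SH", "")
    if explicit and is_executable(explicit):
        return explicit
    # Assumption: prefer the first candidate name anywhere on the PATH
    # over a later candidate in an earlier PATH entry.
    directories = [d for d in env.get("PATH", "").split(os.pathsep) if d]
    for name in candidates:
        for directory in directories:
            path = os.path.join(directory, name)
            if is_executable(path):
                return path
    # The empty string signals "no interpreter found", as in the proposal.
    return ""


def resolve_shell_path(flag_value: str, toolchain_path: str) -> str:
    """Return the path a rule should use, or raise if none is available."""
    # Assumption: the --shell_executable value overrides the toolchain's path.
    path = flag_value or toolchain_path
    if not path:
        raise MissingShellError("this rule needs a shell, but none was found")
    return path
```

The file-system check is passed in as is_executable so that the same logic can be exercised without touching the local machine, for example with a set of paths that are assumed to exist.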
- Rewrite the test setup script as a self-contained native binary (in C++ maybe?) and ship it as an embedded binary for the local platform. Expect it to be installed at a hardcoded path on the remote worker. Remove the test rules' implicit dependency on the shell toolchain.
- Rewrite the Java and Python launchers as self-contained native binaries and ship them as embedded binaries for the local platform. Note that Bazel on Windows already does this.
- Require Skylark rules to explicitly depend on the shell toolchain if they need to use ctx.actions.run_shell, and remove the implicit dependency on the toolchain.
- Add a toolchain rule to express a required set of bintools and encode their semantics. The shell discovery repository rule instantiates this toolchain too.
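To illustrate what "encoding the semantics" of a bintool could mean, here is a minimal sketch in plain Python. It is not a Bazel API: SedInfo, its fields, and the flavor names are hypothetical. It relies only on one well-known difference: GNU sed accepts -i with an optional suffix attached to the flag, while BSD sed requires a separate suffix argument, where an empty string means "no backup file".

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SedInfo:
    """Hypothetical record that a bintools toolchain could expose for sed."""

    path: str
    flavor: str  # "gnu" or "bsd" (names chosen for this example)

    def in_place_command(self, script: str, target: str) -> List[str]:
        """Build an argv that edits `target` in place without a backup."""
        if self.flavor == "gnu":
            # GNU sed: a bare -i means "in place, no backup suffix".
            return [self.path, "-i", "-e", script, target]
        if self.flavor == "bsd":
            # BSD sed: -i consumes the next argument as the backup suffix.
            return [self.path, "-i", "", "-e", script, target]
        raise ValueError("unknown sed flavor: " + self.flavor)
```

A rule that depends on such a toolchain would ask it for the command line instead of hardcoding one flavor's flags, so the same rule works on both platforms.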