Bazel Runfiles Libraries

Author: laszlocsomor@google.com

Tracking bug: https://github.com/bazelbuild/bazel/issues/4460

Status: under implementation, Java version rolled out

Reviewed: 2018/01/30

Last update: 2018/04/06

Note: as of 2018/04/06 we no longer plan to follow through with the "Rule changes" section, because adding a launcher to sh_* rules introduces problems (launching the main binary with the correct $0 value doesn't work).

Summary

The Bazel-on-Windows team will write platform-independent libraries to let programs easily access data-dependencies (runfiles) at runtime. Target languages are Bash, C++, Java, and Python. The libraries will implement an "rlocation" function that looks up the absolute path of a runfile at runtime. The libraries will require minimal, easy setup from the user (e.g. depend on a given target and include a header file), have the same interface on all platforms within one language, and have similar interfaces across languages. We need to change sh_{binary,test} on Linux/macOS to use a launcher script similar to the launchers of py_* and java_* rules: this new launcher will source the runfiles library for Bash scripts.

Motivation

TL;DR: On Windows, Bazel cannot create a runfiles tree because it can't create symlinks. Instead programs need to look up runfiles from a manifest. We want to simplify the lookup process.

Historically Blaze only ran on Linux. To set up the runfiles of a binary, Blaze created a directory tree containing symlinks that point to the data-dependency files of the binary. The symlink targets are either in the source tree or in the output tree.

When we (the Blaze team) open-sourced Bazel and ported it to Windows, we found we couldn't create symlinks on Windows and had to come up with an alternative way to support runfiles. We settled on using runfiles manifests: these are text files that map runfiles-root-relative paths (where the symlinks would be) to absolute paths (where the symlinks would point). To look up a runfile, client code must read the manifest, which requires knowing where the manifest file is and understanding its syntax.

We don't want users to need to know where the manifest is, or know its syntax, because we want to keep those as implementation details and have the liberty to change them. We also think the current runfile-lookup mechanism, namely to grep for runfile paths in a manifest file, is not user-friendly enough.

Library interface

TL;DR: Every language's library will implement rlocation(runfile_path), which returns the absolute path of a runfile. Later we'll implement rfind(runfile_path_pattern) that emulates running find(1) in a runfiles tree.

Every language's library will have a similar interface:

Runfiles strategies:

For reference, the Java implementation is already on GitHub and implements Rlocation.
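As a rough illustration of the two strategies, the sketch below shows how a manifest-based and a directory-based lookup could sit behind the same rlocation-style call. It is written in Python for brevity. The class names and the load_manifest helper are hypothetical and are not the API of any shipped library, and the manifest format (one "<runfile path> <absolute path>" pair per line, split at the first space) is an assumption based on the description in the Motivation section.

```python
import os


def load_manifest(manifest_path):
    """Parse a runfiles manifest into a dict.

    Assumed format: one "<runfile path> <absolute path>" pair per line.
    """
    mapping = {}
    with open(manifest_path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue
            # Split at the first space only: the absolute path may contain spaces.
            key, _, value = line.partition(" ")
            mapping[key] = value
    return mapping


class ManifestBased:
    """Hypothetical strategy: look the runfile up in a parsed manifest."""

    def __init__(self, manifest_path):
        # Read and cache the manifest once, at construction time.
        self._mapping = load_manifest(manifest_path)

    def rlocation(self, runfile_path):
        # Returns None when the manifest has no entry for the path.
        return self._mapping.get(runfile_path)


class DirectoryBased:
    """Hypothetical strategy: join the path onto the runfiles directory."""

    def __init__(self, runfiles_dir):
        self._root = runfiles_dir

    def rlocation(self, runfile_path):
        # The returned file may not exist; callers are expected to check.
        return os.path.join(self._root, *runfile_path.split("/"))
```

Both classes expose the same method, so client code does not need to know which strategy is in use.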

Usage

TL;DR: Users need to depend on a target in @bazel_tools, include/import the runfiles library, create a Runfiles object, and use Rlocation(string).

Example (Java)

BUILD file:

java_binary(
    name = "Foo",
    main_class = "Foo",
    srcs = ["Foo.java"],
    deps = ["@bazel_tools//tools/java/runfiles"],
    data = ["//foo/bar:hello.txt"],
)

Foo.java:

import com.google.devtools.build.runfiles.Runfiles;
import java.io.IOException;

public class Foo {
  public static void main(String[] args) throws IOException {
    System.out.println("Hello Java!");
    Runfiles r = Runfiles.create();
    System.out.println("rloc=" + r.rlocation("foo_ws/foo/bar/hello.txt"));
  }
}

Example (Python)

BUILD file:

py_binary(
    name = "foo",
    srcs = ["foo.py"],
    deps = ["@bazel_tools//tools/py/runfiles"],
    data = ["//foo/bar:hello.txt"],
)

foo.py:

from bazel_tools.tools.py.runfiles import runfiles

print("Hello Python!")
r = runfiles.Create()
print("rloc=" + r.Rlocation("foo_ws/foo/bar/hello.txt"))

Runfiles discovery

TL;DR: The binary will look for the runfiles manifest or runfiles directory next to itself, or check envvars in case it is someone else's data-dependency and has no runfiles of its own.

Binaries can run in one of two ways: as a standalone program, or as another rule's data-dependency.

When the binary runs as standalone, it needs to look for a runfiles manifest or runfiles directory next to the binary, and if found, respectively use a manifest- or directory-based Runfiles object.

When the binary runs as another rule's data-dependency, it can only rely on the launching environment to tell it where the runfiles are. An easy medium for that is envvars (environment variables): the calling binary can set the RUNFILES_MANIFEST_FILE to the runfiles manifest's path, or RUNFILES_DIR to the runfiles directory's path. The calling binary runs as standalone (by definition) and discovers its own runfiles accordingly.
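For illustration, a calling binary could hand its runfiles location to a data-dependency binary roughly as sketched below (Python). The helper names env_for_child and run_data_dependency are hypothetical; only the two envvar names come from this document.

```python
import os
import subprocess


def env_for_child(manifest_file=None, runfiles_dir=None, base_env=None):
    """Return a copy of the environment with the runfiles envvars set."""
    env = dict(os.environ if base_env is None else base_env)
    if manifest_file:
        env["RUNFILES_MANIFEST_FILE"] = manifest_file
    if runfiles_dir:
        env["RUNFILES_DIR"] = runfiles_dir
    return env


def run_data_dependency(binary_path, args, manifest_file=None, runfiles_dir=None):
    """Run another binary so that it can discover the caller's runfiles.

    Returns the child's exit code.
    """
    env = env_for_child(manifest_file, runfiles_dir)
    return subprocess.call([binary_path] + list(args), env=env)
```

The caller's own environment is copied rather than modified, so its own runfiles discovery is unaffected.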

The unified runfiles discovery strategy is to:

  1. check if the RUNFILES_MANIFEST_FILE or RUNFILES_DIR envvars are set, and if so, initialize a manifest- or directory-based Runfiles object accordingly; otherwise
  2. check if the argv[0] + ".runfiles_manifest" file or the argv[0] + ".runfiles" directory exists (keeping in mind that argv[0] may not include the ".exe" suffix on Windows), and if so, initialize a manifest- or directory-based Runfiles object; otherwise
  3. assume the binary has no runfiles.
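The three steps above can be sketched as follows (Python). The function name discover_runfiles and its return convention are hypothetical. Two details are assumptions not stated in this document: when both envvars are set, the manifest is checked first, and appending ".exe" is taken to be enough to handle an argv[0] that lacks the suffix on Windows.

```python
import os


def discover_runfiles(argv0, env=None):
    """Return ("manifest", path), ("directory", path), or (None, None)."""
    env = os.environ if env is None else env

    # Step 1: the launching environment may point at the runfiles.
    # Assumption: the manifest envvar wins if both are set.
    manifest = env.get("RUNFILES_MANIFEST_FILE")
    if manifest:
        return ("manifest", manifest)
    directory = env.get("RUNFILES_DIR")
    if directory:
        return ("directory", directory)

    # Step 2: look next to the binary. On Windows argv[0] may lack ".exe",
    # so try both spellings (assumption: appending ".exe" is sufficient).
    candidates = [argv0]
    if os.name == "nt" and not argv0.lower().endswith(".exe"):
        candidates.append(argv0 + ".exe")
    for base in candidates:
        if os.path.isfile(base + ".runfiles_manifest"):
            return ("manifest", base + ".runfiles_manifest")
        if os.path.isdir(base + ".runfiles"):
            return ("directory", base + ".runfiles")

    # Step 3: assume the binary has no runfiles.
    return (None, None)
```

A caller would typically invoke it as discover_runfiles(sys.argv[0]) and construct the matching manifest- or directory-based object from the result.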

Since some languages use a launcher and some don't, and in some languages we can retrieve argv[0] in the main program (Python, Bash, C++) but in some we can't (Java), I suggest the following:

As the process above shows, I think the launcher and runfiles libraries should support directory-based runfiles on Windows too, in anticipation of the scenario where the user unpacks a deployment archive. A Bazel-generated deployment archive (assuming rules existed to generate one) could include the runfiles manifest, but something would then have to update the paths in that manifest to match the unpacked locations. The directory-based strategy is also faster than the manifest-based one, so it is preferable. And since the client needs to check whether the path returned by rlocation exists anyway, looking up the runfile from a manifest when the directory tree is also available gains nothing.

Rule changes

TL;DR: sh_binary/sh_test will use a launcher script on Linux/macOS too, in order to define rlocation() (or source the runfiles library, which itself is defined in a runfile).

Bash -- obsolete as of 2018/04/06

The easy way to load the runfiles library would be to use rlocation(), but this poses a chicken-and-egg problem because the library itself defines rlocation(). The code to load the library without rlocation() is too complex for us to reasonably expect users to write it:

if [[ -n "${RUNFILES_MANIFEST_FILE:-}" ]]; then
  # Manifest lines are "<runfile path> <absolute path>"; take the first match.
  _path="$(grep -m1 "^bazel_tools/tools/runfiles/runfiles\.sh " "$RUNFILES_MANIFEST_FILE" | cut -d" " -f2-)"
elif [[ -n "${RUNFILES_DIR:-}" ]]; then
  _path="${RUNFILES_DIR}/bazel_tools/tools/runfiles/runfiles.sh"
fi
if [[ -n "${_path:-}" ]]; then
  # Source the library; fail loudly if it cannot be loaded.
  if ! source "$_path" >&/dev/null; then
    echo "ERROR: cannot source $_path" >&2
    exit 1
  fi
fi

Since the code is non-trivial, we need to update sh_binary/sh_test to use a launcher script on Linux/macOS. The rules already use a launcher on Windows.

The Windows launcher is a native C++ program whose purpose is to set the RUNFILES_MANIFEST_FILE environment variable and run bash.exe with the script. We'll need to change the launcher to also source the runfiles library script. The runfiles library script defines rlocation and initializes runfiles usage (by reading and caching the runfiles manifest).

The new Linux/macOS launcher's purpose would be similar to that of the Windows launcher: to set the RUNFILES_MANIFEST_FILE and/or RUNFILES_DIR environment variables and source the runfiles library. (We can also embed the runfiles library's code in the launcher.) Since we assume Bash is always available on these platforms, the launcher itself can be a shell script.

C++

I advise no change to the rules. cc_binary creates no launcher, so the runfiles library will need to look for a runfiles manifest or runfiles directory based on argv[0] (taking into account that on Windows this may not include the ".exe" suffix). Even so, I think it's better not to introduce a launcher script or launcher binary.

I base my decision:

Java, Python

No rule changes necessary. These rules already use a launcher that can set the RUNFILES_MANIFEST_FILE or RUNFILES_DIR environment variables.

Action items

We need to accomplish the following items: