author | Ulf Adams <ulfjack@google.com> | 2016-06-27 14:46:35 +0000 |
---|---|---|
committer | Dmitry Lomov <dslomov@google.com> | 2016-06-27 15:00:49 +0000 |
commit | 2fc015791cd4e6394b17c2867877d8c168abae9b (patch) | |
tree | 0a41b9559b1a083efbc49ea5dcafc9697c24a38e /site | |
parent | 3be6e3969192526c8c7c4bcfc2422d3729785600 (diff) |
Add sandboxing design doc from philwo.
--
Change-Id: Ib40fac6a616805defbf972f7eadd333e3e187335
Reviewed-on: https://bazel-review.googlesource.com/#/c/3891/1
MOS_MIGRATED_REVID=125954946
Diffstat (limited to 'site')
-rw-r--r-- | site/docs/designs/2016-06-02-sandboxing.md | 146 |
1 files changed, 146 insertions, 0 deletions
diff --git a/site/docs/designs/2016-06-02-sandboxing.md b/site/docs/designs/2016-06-02-sandboxing.md
new file mode 100644
index 0000000000..c97230a8f7
--- /dev/null
+++ b/site/docs/designs/2016-06-02-sandboxing.md
@@ -0,0 +1,146 @@
+---
+layout: documentation
+title: Specifying environment variables
+---
+
+# Bazel Sandboxing 2.0
+
+This doc was written by [philwo@google.com](mailto:philwo@google.com).
+
+## Current situation
+
+Tools that use undeclared input files (files that are not explicitly listed in
+the dependencies of an action) are a problem, as Bazel cannot keep track of them
+and thus they can cause builds to become incorrect: When one of the undeclared
+input files changes, Bazel will still believe that the build is up-to-date and
+won't rebuild the action - resulting in an incorrect incremental build.
+
+Bazel uses sandboxing to prevent tools (e.g. compilers, linkers, ...) from
+accidentally working with input files that are not a declared dependency of an
+action - the idea is to run each tool in an environment that contains only the
+explicitly declared input files of the action. Thus, there simply are no other
+files that a tool could access.
+
+In theory this works well, but as nearly all Bazel users rely at least on some
+tools provided by their operating system (e.g. `/usr/bin/zip`, `/usr/bin/gcc`),
+which in turn require shared libraries, helper tools or data from other parts
+of the installed OS, Bazel currently mounts a number of hard-coded directories
+from the operating system into the sandbox in addition to the explicitly
+declared inputs.
+
+However, even with that some users continue to run into issues, making Bazel
+hard to use - e.g. the compiler they want to use is in a directory that's not
+part of the hard-coded list (such as `/usr/local` or `/opt`) or the tool needs
+access to device files (e.g. the nVidia CUDA SDK).
+
+## Proposal
+
+We think that it's time to revisit how we do sandboxing in the default settings
+of Bazel. Sandboxing was intended to protect the user from forgetting to
+declare explicit dependencies between their targets and to protect from tests
+or tools accidentally writing all over the hard-disk (e.g. a test that wants to
+clean up its temporary work directory via rm -rf and unfortunately wipes the
+whole disk), not so much for protecting against an operating system having any
+influence on the build. For these users, the current sandboxing with its
+hard-coded list of allowed directories is too strict.
+
+On the other hand, some people absolutely do want 100% reproducible and
+hermetic builds - and for them the current sandboxing actually isn't strict
+enough, as it allows access to various files from the operating system.
+
+We believe we have found a solution that satisfies the demands of all users:
+
+  * Bazel sandboxing will by default recursively mount the root directory `/`
+    into each sandbox in read-only mode, excluding the workspace directory (so
+    that source files cannot be read from that well-known path) and with a new
+    empty, writable execroot that contains the declared inputs of the action.
+  * In addition, Bazel will allow to mount a 'base image' or 'base directory' as
+    the root directory of the sandbox, thus completely removing any connection
+    to the operating system the user is running Bazel under. For example, a
+    project might decide that all builds should be done inside a standardized
+    Ubuntu 16.04 LTS environment containing certain versions of gcc, etc., that
+    is shipped as a base image. Now, even if the developer uses Arch Linux or
+    CentOS on their machine, they can build using the same environment as
+    everyone else, thus getting the exact same and reproducible outputs.
+
+### Base images
+
+Base images are simply `.tar.gz`'s of a directory structure that contains all
+files necessary to execute binaries in, e.g. the output of “debootstrap” or
+what you would usually “chroot” in and then run a tool inside. They should be
+referred to via labels and could for example be downloaded from somewhere via
+a `http_file` rule in the WORKSPACE.
+
+We're investigating if we can reuse
+[Docker images (OCI)](https://github.com/opencontainers/image-spec/blob/v0.1.0/serialization.md)
+for this, which would make it easier for users to get started with this
+feature.
+
+### Handling of environment variables
+
+As part of this project, we also propose to change the handling of environment
+variables (e.g. `PATH`) in Bazel, as we believe they are an important part of
+the configuration of the environment that the build runs in.
+
+As an example, Bazel currently [resets PATH to a hard-coded string]
+(https://github.com/bazelbuild/bazel/blob/master/src/main/java/com/google/devtools/build/lib/bazel/rules/BazelConfiguration.java),
+which may not be suitable for the environment that it actually runs in - e.g.
+if a user installs a tool called `babel` in `/usr/local/bin` and they call
+`babel` in a shell script or Skylark rule they wrote, [they expect it to just
+work] (https://github.com/bazelbuild/bazel/issues/884). We can argue that they
+instead should check in their tool to the repository and not rely on `PATH`
+lookup to find it, however this is sometimes not possible due to:
+
+  * Users just don't think it's feasible and instead want to take whatever is
+    installed on the system,
+  * Bazel's restrictions in valid package label identifiers ([you can't check in
+    nodejs](https://github.com/bazelbuild/bazel/issues/884#issuecomment-183378680)
+    into your repository or even make it part of a filegroup, because it
+    contains files that have characters like `$` that are currently illegal from
+    Bazel's point of view, though that may change in the future),
+  * Licensing restrictions that disallow users checking in certain tools (such
+    as XCode).
+
+The proposal how Bazel should decide whether an environment variable should be
+included in the environment of a Spawn is:
+
+  * If `use_default_shell_env` is `True`, set `PATH` and `TMPDIR` env vars
+    (as we currently do).
+  * If a rule declares its need for an environment variable, take it.
+    * We already have an [“env” attribute in Skylark actions]
+      (http://www.bazel.io/docs/skylark/lib/ctx.html#action) that allows one to
+      set variables to hard-coded strings, we have `use_default_shell_env` in
+      Skylark actions, which pulls in `PATH` and `TMPDIR`, but we don't have any
+      way to just say "This rule needs this environment variable". Laurent
+      suggested that we discuss this later, as adding yet another attribute is
+      annoying - maybe there's some way we can fold all these use cases into one
+      attribute.
+    * We might want to add the same attribute to genrule as well then.
+  * Don't include any other environment variables.
+
+If Bazel decided that an environment variable is needed by a rule, the next
+step is to figure out its value. The proposal how Bazel should decide the value
+of an environment variable is:
+
+  * If an environment variable is overridden in the `WORKSPACE.local` file
+    ("machine-specific settings"), take it from there.
+  * If an environment variable is overridden in the `WORKSPACE` file
+    ("project-specific settings"), always take the value from there.
+  * If not and we use a base image, take the environment variable from its
+    specification (as in OCI).
+  * If not, take it from the user's environment.
+
+If an environment variable that is used by a rule changed compared to when it
+was built last time, its target has to be rebuild for correctness.
+
+Bazel should instead use `PATH` from the environment and for correctness
+trigger a rebuild when it changes.
+
+*Open question: Should the whitelist of environment variables be configurable,
+e.g. in the WORKSPACE file?*
+
+### Known issues in this area of work
+
+[Bazel #577: genrules leaking PATH into environment]
+(https://github.com/bazelbuild/bazel/issues/577)
+
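
Reviewer note: the two decision lists under "Handling of environment variables" in the added doc can be read as a small lookup procedure. The Python sketch below is only an illustration of that reading, not Bazel code. The function names (`select_env_names`, `resolve_env_value`, `build_spawn_env`) and the plain-dictionary inputs are hypothetical, and it assumes the four sources are consulted in the order the doc lists them (`WORKSPACE.local`, then `WORKSPACE`, then the base image, then the user's environment). The doc's wording on the relative priority of `WORKSPACE.local` and `WORKSPACE` is ambiguous, so that ordering is an assumption.

```python
from typing import Dict, Iterable, Mapping, Optional, Set


def select_env_names(use_default_shell_env: bool,
                     declared: Iterable[str]) -> Set[str]:
    """Pick which variables a Spawn may see (first list in the doc)."""
    names = set(declared)           # variables the rule says it needs
    if use_default_shell_env:
        names |= {"PATH", "TMPDIR"}  # current default-shell-env behaviour
    return names                     # nothing else is included


def resolve_env_value(name: str,
                      workspace_local: Mapping[str, str],
                      workspace: Mapping[str, str],
                      base_image: Optional[Mapping[str, str]],
                      user_env: Mapping[str, str]) -> Optional[str]:
    """Look up a value in the assumed precedence order (second list)."""
    # Assumption: sources are tried top to bottom as written in the doc.
    for source in (workspace_local, workspace, base_image, user_env):
        if source is not None and name in source:
            return source[name]
    return None  # variable is not set anywhere


def build_spawn_env(use_default_shell_env: bool,
                    declared: Iterable[str],
                    workspace_local: Mapping[str, str],
                    workspace: Mapping[str, str],
                    base_image: Optional[Mapping[str, str]],
                    user_env: Mapping[str, str]) -> Dict[str, str]:
    """Combine both steps into the environment handed to an action."""
    env: Dict[str, str] = {}
    for name in sorted(select_env_names(use_default_shell_env, declared)):
        value = resolve_env_value(name, workspace_local, workspace,
                                  base_image, user_env)
        if value is not None:
            env[name] = value
    return env


example_env = build_spawn_env(
    use_default_shell_env=True,
    declared=["JAVA_HOME"],
    workspace_local={"PATH": "/opt/bin"},
    workspace={"PATH": "/usr/bin", "JAVA_HOME": "/jdk"},
    base_image={"TMPDIR": "/tmp"},
    user_env={"TMPDIR": "/var/tmp", "HOME": "/home/u"},
)
```

In this example `HOME` never reaches the action because no rule declared it, and `PATH` comes from the machine-specific settings rather than the project-wide ones. A rebuild trigger, as the doc requires, would then compare the resulting dictionary with the one recorded for the previous build.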