author Ulf Adams <ulfjack@google.com> 2016-06-27 14:46:35 +0000
committer Dmitry Lomov <dslomov@google.com> 2016-06-27 15:00:49 +0000
commit 2fc015791cd4e6394b17c2867877d8c168abae9b
tree 0a41b9559b1a083efbc49ea5dcafc9697c24a38e /site
parent 3be6e3969192526c8c7c4bcfc2422d3729785600
Add sandboxing design doc from philwo.
--
Change-Id: Ib40fac6a616805defbf972f7eadd333e3e187335
Reviewed-on: https://bazel-review.googlesource.com/#/c/3891/1
MOS_MIGRATED_REVID=125954946
Diffstat (limited to 'site')
-rw-r--r-- site/docs/designs/2016-06-02-sandboxing.md | 146
1 files changed, 146 insertions, 0 deletions
diff --git a/site/docs/designs/2016-06-02-sandboxing.md b/site/docs/designs/2016-06-02-sandboxing.md
new file mode 100644
index 0000000000..c97230a8f7
--- /dev/null
+++ b/site/docs/designs/2016-06-02-sandboxing.md
@@ -0,0 +1,146 @@
+---
+layout: documentation
+title: Bazel Sandboxing 2.0
+---
+
+# Bazel Sandboxing 2.0
+
+This doc was written by [philwo@google.com](mailto:philwo@google.com).
+
+## Current situation
+
+Tools that use undeclared input files (files that are not explicitly listed in
+the dependencies of an action) are a problem, as Bazel cannot keep track of them
+and thus they can cause builds to become incorrect: When one of the undeclared
+input files changes, Bazel will still believe that the build is up-to-date and
+won't rebuild the action - resulting in an incorrect incremental build.
+
+Bazel uses sandboxing to prevent tools (e.g. compilers, linkers, ...) from
+accidentally working with input files that are not a declared dependency of an
+action - the idea is to run each tool in an environment that contains only the
+explicitly declared input files of the action. Thus, there simply are no other
+files that a tool could access.
+
+In theory this works well, but as nearly all Bazel users rely at least on some
+tools provided by their operating system (e.g. `/usr/bin/zip`, `/usr/bin/gcc`),
+which in turn require shared libraries, helper tools or data from other parts
+of the installed OS, Bazel currently mounts a number of hard-coded directories
+from the operating system into the sandbox in addition to the explicitly
+declared inputs.
+
+Even with these directories mounted, some users continue to run into issues
+that make Bazel hard to use - e.g. the compiler they want is in a directory not
+on the hard-coded list (such as `/usr/local` or `/opt`), or the tool needs
+access to device files (e.g. the NVIDIA CUDA SDK).
+
+## Proposal
+
+We think that it's time to revisit how we do sandboxing in the default settings
+of Bazel. Sandboxing was intended to protect users from forgetting to declare
+explicit dependencies between their targets, and from tests or tools
+accidentally writing all over the hard disk (e.g. a test that wants to clean up
+its temporary work directory via `rm -rf` and unfortunately wipes the whole
+disk), not so much to protect against the operating system having any
+influence on the build. For users who only want this protection, the current
+sandboxing with its hard-coded list of allowed directories is too strict.
+
+On the other hand, some people absolutely do want 100% reproducible and
+hermetic builds - and for them the current sandboxing actually isn't strict
+enough, as it allows access to various files from the operating system.
+
+We believe we have found a solution that satisfies the demands of all users:
+
+ * Bazel sandboxing will by default recursively mount the root directory `/`
+ into each sandbox in read-only mode, excluding the workspace directory (so
+ that source files cannot be read from that well-known path) and with a new
+ empty, writable execroot that contains the declared inputs of the action.
+ * In addition, Bazel will allow mounting a 'base image' or 'base directory' as
+ the root directory of the sandbox, thus completely removing any connection
+ to the operating system the user is running Bazel under. For example, a
+ project might decide that all builds should be done inside a standardized
+ Ubuntu 16.04 LTS environment containing certain versions of gcc, etc., that
+ is shipped as a base image. Now, even if the developer uses Arch Linux or
+ CentOS on their machine, they can build using the same environment as
+ everyone else, thus getting the exact same and reproducible outputs.
+
+### Base images
+
+Base images are simply `.tar.gz` archives of a directory structure that contains
+all files necessary to execute binaries in, e.g. the output of “debootstrap” or
+what you would usually “chroot” into and then run a tool inside. They should be
+referred to via labels and could for example be downloaded from somewhere via
+an `http_file` rule in the WORKSPACE.
+
+We're investigating if we can reuse
+[Docker images (OCI)](https://github.com/opencontainers/image-spec/blob/v0.1.0/serialization.md)
+for this, which would make it easier for users to get started with this
+feature.
+
+### Handling of environment variables
+
+As part of this project, we also propose to change the handling of environment
+variables (e.g. `PATH`) in Bazel, as we believe they are an important part of
+the configuration of the environment that the build runs in.
+
+As an example, Bazel currently
+[resets PATH to a hard-coded string](https://github.com/bazelbuild/bazel/blob/master/src/main/java/com/google/devtools/build/lib/bazel/rules/BazelConfiguration.java),
+which may not be suitable for the environment that it actually runs in - e.g.
+if a user installs a tool called `babel` in `/usr/local/bin` and they call
+`babel` in a shell script or Skylark rule they wrote,
+[they expect it to just work](https://github.com/bazelbuild/bazel/issues/884).
+We can argue that they should instead check their tool into the repository
+rather than rely on `PATH` lookup, but this is sometimes not possible due to:
+
+ * Users just don't think it's feasible and instead want to take whatever is
+ installed on the system,
+ * Bazel's restrictions on valid package label identifiers ([you can't check in
+ nodejs](https://github.com/bazelbuild/bazel/issues/884#issuecomment-183378680)
+ into your repository or even make it part of a filegroup, because it
+ contains files that have characters like `$` that are currently illegal from
+ Bazel's point of view, though that may change in the future),
+ * Licensing restrictions that disallow users checking in certain tools (such
+ as Xcode).
+
+The proposal for how Bazel should decide whether an environment variable should
+be included in the environment of a Spawn is:
+
+ * If `use_default_shell_env` is `True`, set `PATH` and `TMPDIR` env vars
+ (as we currently do).
+ * If a rule declares its need for an environment variable, take it.
+ * We already have an
+ [“env” attribute in Skylark actions](http://www.bazel.io/docs/skylark/lib/ctx.html#action)
+ that allows one to set variables to hard-coded strings, and we have
+ `use_default_shell_env` in Skylark actions, which pulls in `PATH` and
+ `TMPDIR`, but we don't have any way to just say "This rule needs this
+ environment variable". Laurent suggested that we discuss this later, as
+ adding yet another attribute is annoying - maybe there's some way we can
+ fold all these use cases into one attribute.
+ * We might want to add the same attribute to genrule as well then.
+ * Don't include any other environment variables.
+
+Once Bazel has decided that an environment variable is needed by a rule, the
+next step is to figure out its value. The proposal for how Bazel should decide
+the value of an environment variable is:
+
+ * If an environment variable is overridden in the `WORKSPACE.local` file
+ ("machine-specific settings"), take it from there.
+ * If an environment variable is overridden in the `WORKSPACE` file
+ ("project-specific settings"), always take the value from there.
+ * If not and we use a base image, take the environment variable from its
+ specification (as in OCI).
+ * If not, take it from the user's environment.
+
+If an environment variable that is used by a rule has changed since the rule
+was last built, its target has to be rebuilt for correctness.
+
+Instead of resetting `PATH` to a hard-coded string, Bazel should use `PATH`
+from the environment and, for correctness, trigger a rebuild when it changes.
+
+*Open question: Should the whitelist of environment variables be configurable,
+e.g. in the WORKSPACE file?*
+
+### Known issues in this area of work
+
+[Bazel #577: genrules leaking PATH into
+environment](https://github.com/bazelbuild/bazel/issues/577)
+
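
The value-lookup order proposed in the "Handling of environment variables" section of the added doc can be read as a simple precedence chain. The following is a minimal Python sketch of that reading, not Bazel code. The function name `resolve_env_value` and its dict-based parameters are hypothetical. It assumes the list is meant as top-to-bottom precedence, and that when a base image is in use the user's environment is not consulted. The doc's wording leaves both points open.

```python
from typing import Mapping, Optional


def resolve_env_value(
    name: str,
    workspace_local: Mapping[str, str],
    workspace: Mapping[str, str],
    base_image_env: Optional[Mapping[str, str]],
    user_env: Mapping[str, str],
) -> Optional[str]:
    """Return the value for `name`, or None if the chosen source lacks it.

    All parameters are illustrative stand-ins: plain mappings instead of
    parsed WORKSPACE files or an OCI image configuration.
    """
    # 1. Machine-specific override (WORKSPACE.local).
    if name in workspace_local:
        return workspace_local[name]
    # 2. Project-specific override (WORKSPACE).
    if name in workspace:
        return workspace[name]
    # 3. A base image is in use: take the value from its specification.
    #    Assumption: we do not fall through to the user's environment here.
    if base_image_env is not None:
        return base_image_env.get(name)
    # 4. No base image: take the value from the user's environment.
    return user_env.get(name)
```

Passing `None` for `base_image_env` models a build without a base image, in which case `os.environ` would be the natural argument for `user_env`.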