author David Chen <dzc@google.com> 2016-07-29 09:05:11 +0000
committer Damien Martin-Guillerez <dmarting@google.com> 2016-07-29 10:26:00 +0000
commit 48fa81a3d7678b92d0588ddd832ffdb50fd156d3
tree 983657cfc689a1d5adc9d6b6f70bc586d2832b1e /site/designs/_posts/2016-06-02-sandboxing.md
parent cba16459185325284ccb4077408f764324a9afe1
Move designs/ directory out of docs/ and into site root.
For versioning Bazel's documentation, we only want to version the pages under the Documentation portion of the site. Since Design Docs are more meant for Bazel developers and are generally not continually updated for each release, we should not version the design docs. Moving the directory for the design docs out of docs/ will also simplify the change for versioning Bazel's docs. -- MOS_MIGRATED_REVID=128788588
Diffstat (limited to 'site/designs/_posts/2016-06-02-sandboxing.md')
-rw-r--r-- site/designs/_posts/2016-06-02-sandboxing.md | 150
1 files changed, 150 insertions, 0 deletions
diff --git a/site/designs/_posts/2016-06-02-sandboxing.md b/site/designs/_posts/2016-06-02-sandboxing.md
new file mode 100644
index 0000000000..dd81f61a22
--- /dev/null
+++ b/site/designs/_posts/2016-06-02-sandboxing.md
@@ -0,0 +1,150 @@
+---
+layout: contribute
+title: Sandboxing
+---
+
+# Bazel Sandboxing 2.0
+
+This doc was written by [philwo@google.com](mailto:philwo@google.com).
+Status: unimplemented. The section "Handling of environment variables" has been
+superseded by the
+[Specifying environment variables](/docs/designs/2016/06/21/environment.html)
+design document.
+
+## Current situation
+
+Tools that use undeclared input files (files that are not explicitly listed in
+the dependencies of an action) are a problem: Bazel cannot keep track of them,
+so they can make builds incorrect. When one of the undeclared input files
+changes, Bazel still believes that the build is up to date and does not rebuild
+the action, resulting in an incorrect incremental build.
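+
+To make this failure mode concrete, here is a minimal Python sketch (not Bazel
+code). It assumes a hypothetical action cache whose key covers only the command
+line and the contents of the declared inputs, and shows that editing an
+undeclared header leaves the key unchanged:
+
+```python
+import hashlib
+
+def action_key(command, declared_inputs):
+    """Hypothetical cache key: the command line plus the contents of the
+    declared inputs. Undeclared files never contribute to it."""
+    h = hashlib.sha256(command.encode())
+    for path in sorted(declared_inputs):
+        h.update(b"\0" + path.encode() + b"\0" + declared_inputs[path])
+    return h.hexdigest()
+
+# The compiler also reads "a.h", but only "main.c" is declared.
+files = {"main.c": b"#include \"a.h\"\n", "a.h": b"#define X 1\n"}
+before = action_key("gcc -c main.c", {"main.c": files["main.c"]})
+
+files["a.h"] = b"#define X 2\n"  # edit the undeclared input
+after = action_key("gcc -c main.c", {"main.c": files["main.c"]})
+
+# The key is identical, so the cached (now stale) output would be reused.
+stale_output_reused = before == after
+```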
+
+Bazel uses sandboxing to prevent tools (e.g., compilers and linkers) from
+accidentally using input files that are not declared dependencies of an
+action. The idea is to run each tool in an environment that contains only the
+explicitly declared input files of the action, so there simply are no other
+files that the tool could access.
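+
+A rough Python sketch of populating such an environment, assuming a
+hypothetical helper `build_execroot` that symlinks each declared input into a
+fresh directory. It only shows which files the execroot contains; actually
+hiding the rest of the file system is the job of the sandbox's mount
+namespace, which this sketch does not model.
+
+```python
+import os
+import tempfile
+
+def build_execroot(sandbox_base, declared_inputs):
+    """Creates a fresh execroot under sandbox_base containing a symlink for
+    each declared input and nothing else.
+
+    declared_inputs maps an exec path (e.g. "src/main.c") to the real file
+    on disk (a hypothetical representation, not Bazel's).
+    """
+    execroot = tempfile.mkdtemp(prefix="execroot-", dir=sandbox_base)
+    for exec_path, real_path in declared_inputs.items():
+        link = os.path.join(execroot, exec_path)
+        os.makedirs(os.path.dirname(link), exist_ok=True)
+        os.symlink(os.path.abspath(real_path), link)
+    return execroot
+```
+
+Files that exist next to a declared input but are not themselves declared
+never appear in the returned directory.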
+
+In theory this works well. In practice, however, nearly all Bazel users rely on
+at least some tools provided by their operating system (e.g., `/usr/bin/zip`,
+`/usr/bin/gcc`), which in turn require shared libraries, helper tools, or data
+from other parts of the installed OS. Bazel therefore currently mounts a number
+of hard-coded directories from the operating system into the sandbox in
+addition to the explicitly declared inputs.
+
+Even so, some users continue to run into issues that make Bazel hard to use:
+for example, the compiler they want to use is in a directory that is not on the
+hard-coded list (such as `/usr/local` or `/opt`), or the tool needs access to
+device files (e.g., the NVIDIA CUDA SDK).
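+
+The following Python sketch illustrates why such tools break. The allow-list
+is a made-up example, not Bazel's actual list of hard-coded directories; a path
+counts as visible only if it lies inside one of the listed directories.
+
+```python
+from pathlib import PurePosixPath
+
+# Hypothetical example allow-list; Bazel's real hard-coded list may differ.
+HARD_CODED_DIRS = ("/bin", "/etc", "/lib", "/lib64", "/usr/bin", "/usr/lib")
+
+def visible_in_sandbox(path, allowed=HARD_CODED_DIRS):
+    """True if `path` is one of the allowed directories or lies inside one.
+    Compares path components, so "/usr/binary" is not inside "/usr/bin"."""
+    p = PurePosixPath(path)
+    for d in allowed:
+        root = PurePosixPath(d)
+        if p == root or root in p.parents:
+            return True
+    return False
+```
+
+With this list, `/usr/bin/gcc` is visible, while `/usr/local/bin/gcc`,
+`/opt/cuda/bin/nvcc` and `/dev/nvidia0` are not.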
+
+## Proposal
+
+We think it's time to revisit how Bazel does sandboxing in its default
+settings. Sandboxing was intended to protect users from forgetting to declare
+explicit dependencies between their targets, and to protect against tests or
+tools accidentally writing all over the hard disk (e.g., a test that wants to
+clean up its temporary work directory via `rm -rf` and unfortunately wipes the
+whole disk). It was not really intended to prevent the operating system from
+having any influence on the build. For users who only want these protections,
+the current sandboxing, with its hard-coded list of allowed directories, is too
+strict.
+
+On the other hand, some people absolutely do want 100% reproducible and
+hermetic builds, and for them the current sandboxing actually isn't strict
+enough, as it allows access to various files from the operating system.
+
+We believe we have found a solution that satisfies the demands of all users:
+
+ * By default, Bazel sandboxing will recursively mount the root directory `/`
+ read-only into each sandbox, excluding the workspace directory (so that
+ source files cannot be read from that well-known path), and will add a
+ fresh, writable execroot that contains only the declared inputs of the action.
+ * In addition, Bazel will allow users to mount a 'base image' or 'base
+ directory' as the root directory of the sandbox, completely removing any
+ connection to the operating system the user is running Bazel on. For
+ example, a project might decide that all builds should be done inside a
+ standardized Ubuntu 16.04 LTS environment, shipped as a base image, that
+ contains certain versions of gcc and other tools. Then, even if developers
+ use Arch Linux or CentOS on their machines, they can build in the same
+ environment as everyone else and get identical, reproducible outputs.
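+
+A Python sketch of how the two modes could translate into a list of mounts,
+assuming a hypothetical `Mount` record and `plan_mounts` helper (neither is a
+Bazel API). Mounts are meant to be applied in list order, so later entries
+shadow earlier ones. Hiding the workspace with an empty mount, and skipping
+that step when a base image replaces the host root, are assumptions of this
+sketch.
+
+```python
+from dataclasses import dataclass
+from typing import List, Optional
+
+@dataclass(frozen=True)
+class Mount:
+    source: Optional[str]  # host path; None means an empty directory/tmpfs
+    target: str            # path inside the sandbox
+    writable: bool
+
+def plan_mounts(workspace: str, action_dir: str, execroot: str,
+                base_image_dir: Optional[str] = None) -> List[Mount]:
+    """Hypothetical mount plan for one sandboxed action, applied in order."""
+    if base_image_dir is None:
+        # Default mode: the whole host file system, read-only ...
+        mounts = [Mount("/", "/", writable=False)]
+        # ... minus the workspace, so sources can't be read via their usual path.
+        mounts.append(Mount(None, workspace, writable=False))
+    else:
+        # Base image mode: the image replaces the host's root entirely.
+        mounts = [Mount(base_image_dir, "/", writable=False)]
+    # A fresh, writable execroot holding only the declared inputs.
+    mounts.append(Mount(action_dir, execroot, writable=True))
+    return mounts
+```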
+
+### Base images
+
+Base images are simply `.tar.gz` archives of a directory tree that contains all
+files necessary to execute binaries, e.g., the output of `debootstrap`, or
+what you would usually `chroot` into to run a tool. They should be referenced
+via labels and could, for example, be downloaded via an `http_file` rule in the
+WORKSPACE.
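+
+For illustration, unpacking such an archive into a directory that could then
+serve as the sandbox root might look like the Python sketch below.
+`unpack_base_image` is a hypothetical helper. It relies on `tarfile`'s `"tar"`
+extraction filter (Python 3.12, plus security backports to some earlier
+releases), which keeps Unix permissions and symlinks but refuses members that
+would be written outside the destination; unpacking an untrusted image without
+such a check could overwrite host files.
+
+```python
+import os
+import tarfile
+
+def unpack_base_image(archive_path, dest_dir):
+    """Unpacks a .tar.gz base image into dest_dir and returns its path.
+
+    The result is meant to become "/" inside the sandbox (hypothetical flow).
+    """
+    if not hasattr(tarfile, "tar_filter"):
+        raise RuntimeError("needs a Python with tarfile extraction filters")
+    os.makedirs(dest_dir, exist_ok=True)
+    with tarfile.open(archive_path, "r:gz") as tar:
+        # "tar" filter: keep Unix metadata, but reject paths escaping dest_dir.
+        tar.extractall(dest_dir, filter="tar")
+    return os.path.abspath(dest_dir)
+```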
+
+We're investigating whether we can reuse
+[Docker images (OCI)](https://github.com/opencontainers/image-spec/blob/v0.1.0/serialization.md)
+for this, which would make it easier for users to get started with this
+feature.
+
+### Handling of environment variables
+
+As part of this project, we also propose changing how Bazel handles environment
+variables (e.g., `PATH`), as we believe they are an important part of the
+configuration of the environment that a build runs in.
+
+As an example, Bazel currently
+[resets PATH to a hard-coded string](https://github.com/bazelbuild/bazel/blob/master/src/main/java/com/google/devtools/build/lib/bazel/rules/BazelConfiguration.java),
+which may not be suitable for the environment that Bazel actually runs in: if
+a user installs a tool called `babel` in `/usr/local/bin` and calls `babel` in
+a shell script or Skylark rule they wrote,
+[they expect it to just work](https://github.com/bazelbuild/bazel/issues/884).
+One could argue that they should instead check the tool into the repository
+rather than rely on `PATH` lookup to find it; however, this is sometimes
+impractical or impossible:
+
+ * Users don't consider it feasible and instead want to use whatever is
+ installed on the system.
+ * Bazel restricts which characters are valid in package label identifiers:
+ [you can't check in nodejs](https://github.com/bazelbuild/bazel/issues/884#issuecomment-183378680)
+ into your repository or even make it part of a filegroup, because it
+ contains files with characters like `$` that are currently illegal from
+ Bazel's point of view (though that may change in the future).
+ * Licensing restrictions prevent users from checking in certain tools (such
+ as Xcode).
+
+We propose that Bazel decide whether an environment variable is included in the
+environment of a Spawn as follows:
+
+ * If `use_default_shell_env` is `True`, include the `PATH` and `TMPDIR`
+ environment variables (as we currently do).
+ * If a rule declares that it needs an environment variable, include it.
+    * We already have an
+      [`env` attribute in Skylark actions](http://www.bazel.io/docs/skylark/lib/ctx.html#action)
+      that allows one to set variables to hard-coded strings, and we have
+      `use_default_shell_env` in Skylark actions, which pulls in `PATH` and
+      `TMPDIR`, but we don't have any way to simply say "this rule needs this
+      environment variable". Laurent suggested that we discuss this later, as
+      adding yet another attribute is annoying; maybe there's some way to fold
+      all these use cases into one attribute.
+    * If so, we might want to add the same attribute to genrule as well.
+ * Don't include any other environment variables.
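+
+These inclusion rules amount to a small allow-list computation, sketched here
+in Python. `declared_env` stands for the not-yet-designed way for a rule to
+declare the variables it needs; it is a hypothetical parameter, not an
+existing Bazel attribute.
+
+```python
+from typing import FrozenSet, Iterable
+
+# What use_default_shell_env pulls in, per the list above.
+DEFAULT_SHELL_ENV = frozenset({"PATH", "TMPDIR"})
+
+def env_vars_for_spawn(use_default_shell_env: bool,
+                       declared_env: Iterable[str]) -> FrozenSet[str]:
+    """Names of the environment variables passed to a Spawn (hypothetical)."""
+    names = set(declared_env)  # variables the rule says it needs
+    if use_default_shell_env:
+        names |= DEFAULT_SHELL_ENV
+    return frozenset(names)  # nothing else leaks in from the client env
+```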
+
+Once Bazel has decided that a rule needs an environment variable, the next step
+is to determine its value. We propose that Bazel determine the value of an
+environment variable as follows:
+
+ * If the environment variable is overridden in the `WORKSPACE.local` file
+ ("machine-specific settings"), take the value from there.
+ * Otherwise, if it is overridden in the `WORKSPACE` file ("project-specific
+ settings"), take the value from there.
+ * Otherwise, if a base image is used, take the value from the image's
+ specification (as in OCI).
+ * Otherwise, take it from the user's environment.
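+
+The precedence order can be sketched in Python as a first-match lookup. The
+function and its mapping parameters are hypothetical stand-ins for values
+parsed from `WORKSPACE.local`, `WORKSPACE`, the base image's specification, and
+the client environment. Read literally, the list falls back to the user's
+environment even when a base image is in use; this sketch follows that
+reading, although a fully hermetic setup might prefer to stop at the image.
+
+```python
+from typing import Mapping, Optional
+
+def resolve_env_value(name: str,
+                      workspace_local: Mapping[str, str],
+                      workspace: Mapping[str, str],
+                      base_image_env: Optional[Mapping[str, str]],
+                      client_env: Mapping[str, str]) -> Optional[str]:
+    """Returns the value to use for `name`, or None if no source defines it."""
+    sources = [workspace_local, workspace]  # machine-, then project-specific
+    if base_image_env is not None:          # only when a base image is used
+        sources.append(base_image_env)
+    sources.append(client_env)              # the user's environment
+    for source in sources:
+        if name in source:
+            return source[name]
+    return None
+```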
+
+If the value of an environment variable used by a rule has changed since its
+target was last built, the target has to be rebuilt for correctness.
+
+In particular, instead of resetting `PATH` to a hard-coded string, Bazel should
+use `PATH` from the environment and, for correctness, trigger a rebuild when it
+changes.
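+
+One way to get this rebuild behavior is to fold the values of the variables an
+action uses into its cache key, so that a changed value yields a different key
+and therefore a cache miss. `env_fingerprint` below is a hypothetical Python
+helper sketching that idea; it is not how Bazel's action cache is actually
+implemented.
+
+```python
+import hashlib
+from typing import Mapping
+
+def env_fingerprint(used_env: Mapping[str, str]) -> str:
+    """Order-independent digest of the environment an action actually uses."""
+    h = hashlib.sha256()
+    for name in sorted(used_env):
+        # POSIX variable names cannot contain "=" and values cannot contain
+        # NUL, so this encoding is unambiguous.
+        h.update(name.encode() + b"=" + used_env[name].encode() + b"\0")
+    return h.hexdigest()
+
+old = env_fingerprint({"PATH": "/usr/bin:/bin"})
+new = env_fingerprint({"PATH": "/usr/local/bin:/usr/bin:/bin"})
+needs_rebuild = old != new  # PATH changed, so the target is stale
+```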
+
+*Open question: Should the whitelist of environment variables be configurable,
+e.g. in the WORKSPACE file?*
+
+### Known issues in this area of work
+
+[Bazel #577: genrules leaking PATH into environment](https://github.com/bazelbuild/bazel/issues/577)
+