aboutsummaryrefslogtreecommitdiffhomepage
path: root/site/docs/rule-challenges.md
diff options
context:
space:
mode:
authorGravatar David Chen <dzc@google.com>2016-08-29 08:56:37 +0000
committerGravatar Klaus Aehlig <aehlig@google.com>2016-08-29 09:42:52 +0000
commit15c09dd1b5dbd7e76fe42d193a79dab8bfc24abc (patch)
treee6df9943f0f96c095a6f91240a8f4bc3a84708d9 /site/docs/rule-challenges.md
parent6f2e6fb1eff1a7a265778abf1eb32a850765599e (diff)
Replace doc pages with redirects to versioned doc pages.
* Add a new `redirect` Jekyll layout. * Replace all pages under docs/ with redirects to corresponding page under versions/master/. * Prepend links on Documentation sidebar, including generated navs for the Skylark Library and Build Encyclopedia, with prefix for versioned directory. * Add code to both the internal jekyll-config.sh and external jekyll-tree.sh to add redirect pages for the Skylark Library and Build Encyclopedia. * Bring the branched User Manual doc up to date with latest changes. -- MOS_MIGRATED_REVID=131568800
Diffstat (limited to 'site/docs/rule-challenges.md')
-rw-r--r--site/docs/rule-challenges.md214
1 files changed, 2 insertions, 212 deletions
diff --git a/site/docs/rule-challenges.md b/site/docs/rule-challenges.md
index 7b92655b91..88aa5cba13 100644
--- a/site/docs/rule-challenges.md
+++ b/site/docs/rule-challenges.md
@@ -1,214 +1,4 @@
---
-layout: documentation
-title: Challenges of Writing Rules.
+layout: redirect
+redirect: docs/rule-challenges.html
---
-
-# Challenges of Writing Rules.
-
-We have heard feedback from various people that they have
-difficulty to write efficient Bazel rules. There is no single root cause, but
-it’s due to a combination of historical circumstances and intrinsic complexity
-in the problem domain. This document attempts to give a high level overview of
-the specific issues that we believe to be the main contributors.
-
-* Assumption: Aim for Correctness, Throughput, Ease of Use & Latency
-* Assumption: Large Scale Repositories
-* Assumption: BUILD-like Description Language
-* Intrinsic: Remote Execution and Caching are Hard
-* Historic: Hard Separation between Loading, Analysis, and Execution is
- Outdated, but still affects the API
-* Intrinsic: Using Change Information for Correct and Fast Incremental Builds
- requires Unusual Coding Patterns
-* Intrinsic: Avoiding Quadratic Time and Memory Consumption is Hard
-
-## Assumption: Aim for Correctness, Throughput, Ease of Use & Latency
-
-We assume that the build system needs to be first and foremost correct with
-respect to incremental builds, i.e., for a given source tree, the output of the
-same build should always be the same, regardless of what the output tree looks
-like. In the first approximation, this means Bazel needs to know every single
-input that goes into a given build step, such that it can rerun that step if any
-of the inputs change. There are limits to how correct Bazel can get, as it leaks
-some information such as date / time of the build, and ignores certain types of
-changes such as changes to file attributes. Sandboxing helps ensure correctness
-by preventing reads to undeclared input files. Besides the intrinsic limits of
-the system, there are a few known correctness issues, most of which are related
-to Fileset or the C++ rules, which are both hard problems. We have long-term
-efforts to fix these.
-
-The second goal of the build system is to have high throughput; we are
-permanently pushing the boundaries of what can be done within the current
-machine allocation for a remote execution service. If the remote execution
-service gets overloaded, nobody can get work done.
-
-Ease of use comes next, i.e., of multiple correct approaches with the same (or
-similar) footprint of the remote execution service, we choose the one that is
-easier to use.
-
-For the purpose of this document, latency denotes the time it takes from
-starting a build to getting the intended result, whether that is a test log from
-a passing or failing test, or an error message that a BUILD file has a
-typo.
-
-Note that these goals often overlap; latency is as much a function of throughput
-of the remote execution service as is correctness relevant for ease of use.
-
-
-## Assumption: Large Scale Repositories
-
-The build system needs to operate at the scale of large repositories where large
-scale means that it does not fit on a single hard drive, so it is impossible to
-do a full checkout on virtually all developer machines. A medium-sized build
-will need to read and parse tens of thousands of BUILD files, and evaluate
-hundreds of thousands of globs. While it is theoretically possible to read all
-BUILD files on a single machine, we have not yet been able to do so within a
-reasonable amount of time and memory. As such, it is critical that BUILD files
-can be loaded and parsed independently.
-
-
-## Assumption: BUILD-like Description Language
-
-For the purpose of this document, we assume a configuration language that is
-roughly similar to BUILD files, i.e., declaration of library and binary rules
-and their interdependencies. BUILD files can be read and parsed independently,
-and we avoid even looking at source files whenever we can (except for
-existence).
-
-
-## Intrinsic: Remote Execution and Caching are Hard
-
-Remote execution and caching improve build times in large repositories by
-roughly two orders of magnitude compared to running the build on a single
-machine. However, the scale at which it needs to perform is staggering: Google's
-remote execution service is designed to handle a huge number of requests per
-second, and the protocol carefully avoids unnecessary roundtrips as well as
-unnecessary work on the service side.
-
-At this time, the protocol requires that the build system knows all inputs to a
-given action ahead of time; the build system then computes a unique action
-fingerprint, and asks the scheduler for a cache hit. If a cache hit is found,
-the scheduler replies with the digests of the output files; the files itself are
-addressed by digest later on. However, this imposes restrictions on the Bazel
-rules, which need to declare all input files ahead of time.
-
-
-## Historic: Hard Separation between Loading, Analysis, and Execution is Outdated, but still affects the API
-
-Technically, it is sufficient for a rule to know the input and output files of
-an action just before the action is sent to remote execution. However, the
-original Bazel code base had a strict separation of loading packages, then
-analyzing rules using a configuration (command-line flags, essentially), and
-only then running any actions. This distinction is still part of the rules API
-today, even though the core of Bazel no longer requires it (more details below).
-
-That means that the rules API requires a declarative description of the rule
-interface (what attributes it has, types of attributes). There are some
-exceptions where the API allows custom code to run during the loading phase to
-compute implicit names of output files and implicit values of attributes. For
-example, a java_library rule named ‘foo’ implicitly generates an output named
-‘libfoo.jar’, which can be referenced from other rules in the build graph.
-
-Furthermore, the analysis of a rule cannot read any source files or inspect the
-output of an action; instead, it needs to generate a partial directed bipartite
-graph of build steps and output file names that is only determined from the rule
-itself and its dependencies.
-
-
-## Intrinsic: Using Change Information for Correct and Fast Incremental Builds requires Unusual Coding Patterns
-
-Above, we argued that in order to be correct, Bazel needs to know all the input
-files that go into a build step in order to detect whether that build step is
-still up-to-date. The same is true for package loading and rule analysis, and we
-have designed [Skyframe] (http://www.bazel.io/docs/skyframe.html) to handle this
-in general. Skyframe is a graph library and evaluation framework that takes a
-goal node (such as ‘build //foo with these options’), and breaks it down into
-its constituent parts, which are then evaluated and combined to yield this
-result. As part of this process, Skyframe reads packages, analyzes rules, and
-executes actions.
-
-At each node, Skyframe tracks exactly which nodes any given node used to compute
-its own output, all the way from the goal node down to the input files (which
-are also Skyframe nodes). Having this graph explicitly represented in memory
-allows the build system to identify exactly which nodes are affected by a given
-change to an input file (including creation or deletion of an input file), doing
-the minimal amount of work to restore the output tree to its intended state.
-
-As part of this, each node performs a dependency discovery process; i.e., each
-node can declare dependencies, and then use the contents of those dependencies
-to declare even further dependencies. In principle, this maps well to a
-thread-per-node model. However, medium-sized builds contain hundreds of
-thousands of Skyframe nodes, which isn’t easily possible with current Java
-technology (and for historical reasons, we’re currently tied to using Java, so
-no lightweight threads and no continuations).
-
-Instead, Bazel uses a fixed-size thread pool. However, that means that if a node
-declares a dependency that isn’t available yet, we may have to abort that
-evaluation and restart it (possibly in another thread), when the dependency is
-available. This, in turn, means that nodes should not do this excessively; a
-node that declares N dependencies serially can potentially be restarted N times,
-costing O(N^2) time. Instead, we aim for up-front bulk declaration of
-dependencies, which sometimes requires reorganizing the code, or even splitting
-a node into multiple nodes to limit the number of restarts.
-
-Note that this technology isn’t currently available in the rules API; instead,
-the rules API is still defined using the legacy concepts of loading, analysis,
-and execution phases. However, a fundamental restriction is that all accesses to
-other nodes have to go through the framework so that it can track the
-corresponding dependencies. Regardless of the language in which the build system
-is implemented or in which the rules are written (they don’t have to be the
-same), rule authors must not use standard libraries or patterns that bypass
-Skyframe. For Java, that means avoiding java.io.File as well as any form of
-reflection, and any library that does either. Libraries that support dependency
-injection of these low-level interfaces still need to be setup correctly for
-Skyframe.
-
-This strongly suggests to avoid exposing rule authors to a full language runtime
-in the first place. The danger of accidental use of such APIs is just too big -
-several Bazel bugs in the past were caused by rules using unsafe APIs, even
-though the rules were written by the Bazel team, i.e., by the domain experts.
-
-
-## Intrinsic: Avoiding Quadratic Time and Memory Consumption is Hard
-
-To make matters worse, apart from the requirements imposed by Skyframe, the
-historical constraints of using Java, and the outdatedness of the rules API,
-accidentally introducing quadratic time or memory consumption is a fundamental
-problem in any build system based on library and binary rules. There are two
-very common patterns that introduce quadratic memory consumption (and therefore
-quadratic time consumption).
-
-1. Chains of Library Rules
-Consider the case of a chain of library rules A depends on B, depends on C, and
-so on. Then, we want to compute some property over the transitive closure of
-these rules, such as the Java runtime classpath, or the C++ linker command for
-each library. Naively, we might take a standard list implementation; however,
-this already introduces quadratic memory consumption: the first library
-contains one entry on the classpath, the second two, the third three, and so
-on, for a total of 1+2+3+...+N = O(N^2) entries.
-
-2. Binary Rules Depending on the Same Library Rules
-Consider the case where a set of binaries that depend on the same library
-rules; for example, you might have a number of test rules that test the same
-library code. Let’s say out of N rules, half the rules are binary rules, and
-the other half library rules. Now consider that each binary makes a copy of
-some property computed over the transitive closure of library rules, such as
-the Java runtime classpath, or the C++ linker command line. For example, it
-could expand the command line string representation of the C++ link action. N/2
-copies of N/2 elements is O(N^2) memory.
-
-
-### Custom Collections Classes to Avoid Quadratic Complexity
-
-Bazel is heavily affected by both of these scenarios, so we introduced a set of
-custom collection classes that effectively compress the information in memory by
-avoiding the copy at each step. Almost all of these data structures have set
-semantics, so we called the class NestedSet. The majority of changes to reduce
-Bazel’s memory consumption over the past several years were changes to use
-NestedSet instead of whatever was previously used.
-
-Unfortunately, usage of NestedSet does not automatically solve all the issues;
-in particular, even just iterating over a NestedSet in each rule re-introduces
-quadratic time consumption. NestedSet also has some helper methods to facilitate
-interoperability with normal collections classes; unfortunately, accidentally
-passing a NestedSet to one of these methods leads to copying behavior, and
-reintroduces quadratic memory consumption.