author | Jakob Buchgraber <buchgr@google.com> | 2018-02-02 16:14:15 -0800 |
---|---|---|
committer | Copybara-Service <copybara-piper@google.com> | 2018-02-02 16:16:00 -0800 |
commit | d070f70780ce616ad7a684bca3eebfd541ac16ef | |
tree | 605494f7d6605c752d5ce569826e837b6a9eb309 /site | |
parent | 7dbfcfbf9e2ab761166f76f6951509675690cd30 |
site/docs: add documentation for remote caching.
Closes #4540.
PiperOrigin-RevId: 184349872
Diffstat (limited to 'site')
-rw-r--r-- | site/_layouts/documentation.html | 1 |
-rw-r--r-- | site/docs/remote-caching.md | 355 |
2 files changed, 356 insertions, 0 deletions
```diff
diff --git a/site/_layouts/documentation.html b/site/_layouts/documentation.html
index 7ef96c5b64..5052095515 100644
--- a/site/_layouts/documentation.html
+++ b/site/_layouts/documentation.html
@@ -109,6 +109,7 @@ nav: docs
 <li><a href="/versions/{{ site.version }}/query-how-to.html">Querying Builds</a></li>
 <li><a href="/versions/{{ site.version }}/test-encyclopedia.html">Writing Tests</a></li>
 <li><a href="/versions/{{ site.version }}/best-practices.html">Best Practices</a></li>
+<li><a href="/versions/{{ site.version }}/remote-caching.html">Remote Caching</a></li>
 </ul>
 
 <h3>Reference</h3>
diff --git a/site/docs/remote-caching.md b/site/docs/remote-caching.md
new file mode 100644
index 0000000000..bfdf4e7b87
--- /dev/null
+++ b/site/docs/remote-caching.md
@@ -0,0 +1,355 @@
+---
+layout: documentation
+title: Remote Caching
+---
+
+# Remote Caching
+
+A remote cache is used by a team of developers and/or a continuous integration
+(CI) system to share build outputs. If your build is reproducible, the
+outputs from one machine can be safely reused on another machine, which can
+make builds significantly faster.
+
+## Contents
+
+* [Remote caching overview](#remote-caching-overview)
+* [How a build uses remote caching](#how-a-build-uses-remote-caching)
+* [Setting up a server as the cache’s backend](#setting-up-a-server-as-the-caches-backend)
+  * [nginx](#nginx)
+  * [Bazel Remote Cache](#bazel-remote-cache)
+  * [Google Cloud Storage](#google-cloud-storage)
+  * [Other servers](#other-servers)
+* [HTTP Caching Protocol](#http-caching-protocol)
+* [Run Bazel using the remote cache](#run-bazel-using-the-remote-cache)
+  * [Read from and write to the remote cache](#read-from-and-write-to-the-remote-cache)
+  * [Read only from the remote cache](#read-only-from-the-remote-cache)
+  * [Exclude specific targets from using the remote cache](#exclude-specific-targets-from-using-the-remote-cache)
+  * [Delete content from the remote cache](#delete-content-from-the-remote-cache)
+* [Known Issues](#known-issues)
+* [Bazel remote execution (in development)](#bazel-remote-execution-in-development)
+
+## Remote caching overview
+
+Bazel breaks a build into discrete steps, which are called actions. Each action
+has inputs, output names, a command line, and environment variables. Required
+inputs and expected outputs are declared explicitly for each action.
+
+You can set up a server to be a remote cache for build outputs, which are these
+action outputs. These outputs consist of a list of output file names and the
+hashes of their contents. With a remote cache, you can reuse build outputs
+from another user’s build rather than building each new output locally.
+
+To use remote caching:
+
+* Set up a server as the cache’s backend
+* Configure the Bazel build to use the remote cache
+* Use Bazel version 0.10.0 or later
+
+The remote cache stores two types of data:
+
+* The action cache, which is a map of action hashes to action result metadata.
+* A content-addressable store (CAS) of output files.
+
+### How a build uses remote caching
+
+Once a server is set up as the remote cache, you can use the cache in multiple
+ways:
+
+* Read from and write to the remote cache
+* Read and/or write to the remote cache except for specific targets
+* Only read from the remote cache
+* Not use the remote cache at all
+
+When you run a Bazel build that can read from and write to the remote cache,
+the build follows these steps:
+
+1. Bazel creates the graph of targets that need to be built, and then creates
+a list of required actions. Each of these actions has declared inputs
+and output filenames.
+2. Bazel checks your local machine for existing build outputs and reuses any
+that it finds.
+3. Bazel checks the cache for existing build outputs. If the output is found,
+Bazel retrieves the output. This is a cache hit.
+4. For required actions where the outputs were not found, Bazel executes the
+actions locally and creates the required build outputs.
+5. New build outputs are uploaded to the remote cache.
+
+## Setting up a server as the cache's backend
+
+You need to set up a server to act as the cache's backend. An HTTP/1.1
+server can treat Bazel's data as opaque bytes, so many existing servers
+can be used as a remote caching backend. Bazel uses its
+[HTTP Caching Protocol](#http-caching-protocol) to communicate with the
+remote cache.
+
+You are responsible for choosing, setting up, and maintaining the backend
+server that will store the cached outputs. When choosing a server, consider:
+
+* Networking speed. For example, if your team is in the same office, you may
+want to run your own local server.
+* Security. The remote cache will have your binaries and so needs to be secure.
+* Ease of management. For example, Google Cloud Storage is a fully managed service.
+
+There are many backends that can be used for a remote cache. Some options
+include:
+
+* [nginx](#nginx)
+* [Bazel Remote Cache](#bazel-remote-cache)
+* [Google Cloud Storage](#google-cloud-storage)
+
+### nginx
+
+nginx is an open source web server. With its [WebDAV module], it can be
+used as a remote cache for Bazel. On Debian and Ubuntu you can install the
+`nginx-extras` package. On macOS nginx is available via Homebrew:
+
+```bash
+$ brew tap denji/nginx
+$ brew install nginx-full --with-webdav
+```
+
+Below is an example configuration for nginx. Note that you will need to
+change `/path/to/cache/dir` to a valid directory where nginx has permission
+to write and read. You may need to change the `client_max_body_size` option to a
+larger value if you have larger output files. The server will also require other
+configuration, such as authentication.
+
+
+Example configuration for the `server` section in `nginx.conf`:
+
+```nginx
+location /cache/ {
+    # The path to the directory where nginx should store the cache contents.
+    root /path/to/cache/dir;
+    # Allow PUT
+    dav_methods PUT;
+    # Allow nginx to create the /ac and /cas subdirectories.
+    create_full_put_path on;
+    # The maximum size of a single file.
+    client_max_body_size 1G;
+    allow all;
+}
+```
+
+### Bazel Remote Cache
+
+Bazel Remote Cache is an open source remote build cache that you can use on
+your infrastructure. It is experimental and unsupported.
+
+This cache stores contents on disk and also provides garbage collection
+to enforce an upper storage limit and clean unused artifacts. The cache is
+available as a [docker image] and its code is available on [GitHub].
+
+Please refer to the [GitHub] page for instructions on how to use it.
+
+### Google Cloud Storage
+
+[Google Cloud Storage] is a fully managed object store which provides an
+HTTP API that is compatible with Bazel's remote caching protocol. It requires
+that you have a Google Cloud account with billing enabled.
+
+To use Cloud Storage as the cache:
+
+1. [Create a storage bucket](https://cloud.google.com/storage/docs/creating-buckets).
+Ensure that you select a bucket location that's closest to you, as network bandwidth
+is important for the remote cache.
+
+2. Create a service account for Bazel to authenticate to Cloud Storage. See
+[Creating a service account](https://cloud.google.com/iam/docs/creating-managing-service-accounts#creating_a_service_account).
+
+3. Generate a secret JSON key and then pass it to Bazel for authentication. Store
+the key securely, as anyone with the key can read and write arbitrary data
+to/from your GCS bucket.
+
+4. Connect to Cloud Storage by adding the following flags to your Bazel command:
+   * Pass the following URL to Bazel by using the flag: `--remote_http_cache=https://storage.googleapis.com/bucket-name` where `bucket-name` is the name of your storage bucket.
+   * Pass the authentication key using the flag: `--google_credentials=/path/to/your/secret-key.json`.
+
+5. You can configure Cloud Storage to automatically delete old files. To do so, see
+[Managing Object Lifecycles](https://cloud.google.com/storage/docs/managing-lifecycles).
+
+### Other servers
+
+You can set up any HTTP/1.1 server that supports PUT and GET as the cache's
+backend. Users have reported success with caching backends such as [Hazelcast],
+[Apache httpd], and [AWS S3].
+
+## HTTP Caching Protocol
+
+Bazel supports remote caching via HTTP/1.1. The protocol is conceptually simple:
+Binary data (BLOB) is uploaded via PUT requests and downloaded via GET requests.
+Action result metadata is stored under the path `/ac/` and output files are stored
+under the path `/cas/`.
+
+For example, consider a remote cache running under `http://localhost:8080/cache`.
+A Bazel request to download action result metadata for an action with the SHA256
+hash `01ba4719...` will look as follows:
+
+```http
+GET /cache/ac/01ba4719c80b6fe911b091a7c05124b64eeece964e09c058ef8f9805daca546b HTTP/1.1
+Host: localhost:8080
+Accept: */*
+Connection: Keep-Alive
+```
+
+A Bazel request to upload an output file with the SHA256 hash `15e2b0d3...` to
+the CAS will look as follows:
+
+```http
+PUT /cache/cas/15e2b0d3c33891ebb0f1ef609ec419420c20e320ce94c65fbc8c3312448eb225 HTTP/1.1
+Host: localhost:8080
+Accept: */*
+Content-Length: 9
+Connection: Keep-Alive
+
+0x310x320x330x340x350x360x370x380x39
+```
+
+## Run Bazel using the remote cache
+
+Once a server is set up as the remote cache, you need to add flags to your
+Bazel command to use it. See the list of configurations and
+their flags below.
+
+You may also need to configure authentication, which is specific to your
+chosen server.
+
+You may want to add these flags in a `.bazelrc` file so that you don’t
+need to specify them every time you run Bazel. Depending on your project and
+team dynamics, you can add flags to a `.bazelrc` file that is:
+
+* On your local machine
+* In your project’s workspace, shared with the team
+* On the CI system
+
+### Read from and write to the remote cache
+
+Take care when deciding who can write to the remote cache. You may want
+only your CI system to be able to write to the remote cache.
+
+Use the following flags to:
+
+* read from and write to the remote cache
+* disable sandboxing
+
+```
+build --spawn_strategy=remote --genrule_strategy=remote
+build --strategy=Javac=remote --strategy=Closure=remote
+build --remote_http_cache=http://replace-with-your.host:port
+```
+
+Using the remote cache with sandboxing enabled is experimental. Use the
+following flags to read from and write to the remote cache with sandboxing
+enabled:
+
+```
+build --experimental_remote_spawn_cache
+build --remote_http_cache=http://replace-with-your.host:port
+```
+
+### Read only from the remote cache
+
+Use the following flags to read from the remote cache with sandboxing
+disabled:
+
+```
+build --spawn_strategy=remote --genrule_strategy=remote
+build --strategy=Javac=remote --strategy=Closure=remote
+build --remote_http_cache=http://replace-with-your.host:port
+build --remote_upload_local_results=false
+```
+
+Using the remote cache with sandboxing enabled is experimental. Use the
+following flags to read from the remote cache with sandboxing enabled:
+
+```
+build --experimental_remote_spawn_cache
+build --remote_http_cache=http://replace-with-your.host:port
+build --remote_upload_local_results=false
+```
+
+### Exclude specific targets from using the remote cache
+
+To exclude specific targets from using the remote cache, tag the target with
+`no-cache`. For example:
+
+```
+java_library(
+    name = "target",
+    tags = ["no-cache"],
+)
+```
+
+### Delete content from the remote cache
+
+Deleting content from the remote cache is part of managing your server.
+How you delete content from the remote cache depends on the server you have
+set up as the cache. When deleting outputs, either delete the entire cache,
+or delete old outputs.
+
+The cached outputs are stored as a set of names and hashes. When deleting
+content, there’s no way to distinguish which output belongs to a specific
+build.
+
+You may want to delete content from the cache to:
+
+* Create a clean cache after a cache was poisoned
+* Reduce the amount of storage used by deleting old outputs
+
+## Known issues
+
+**Input file modification during a build**
+
+When an input file is modified during a build, Bazel might upload invalid
+results to the remote cache. We are working on a solution for this problem.
+See [issue #3360] for updates. Avoid this problem by not editing source
+files during a build.
+
+
+**Environment variables leaking into an action**
+
+An action definition contains environment variables. This can be a problem
+for sharing remote cache hits across machines. For example, environments
+with different `$PATH` variables won't share cache hits. You can specify
+`--experimental_strict_action_env` to ensure that that's not the case and
+that only environment variables explicitly whitelisted via `--action_env`
+are included in an action definition. Bazel's Debian/Ubuntu package used
+to install `/etc/bazel.bazelrc` with a whitelist of environment variables
+including `$PATH`. If you are getting fewer cache hits than expected, check
+that your environment doesn't have an old `/etc/bazel.bazelrc` file.
+
+
+**Bazel does not track tools outside a workspace**
+
+Bazel currently does not track tools outside a workspace. This can be a
+problem if, for example, an action uses a compiler from `/usr/bin/`. Then,
+two users with different compilers installed will wrongly share cache hits
+because the outputs are different but they have the same action hash. Please
+watch [issue #4558] for updates.
+
+
+## Bazel remote execution (in development)
+
+A [gRPC protocol] that supports both remote caching and remote execution
+is in development. Remote execution allows Bazel to execute actions on a
+separate platform, such as a datacenter. You can try remote execution with
+[Buildfarm], an open source project that aims to provide a distributed remote
+execution platform.
+
+[WebDAV module]: http://nginx.org/en/docs/http/ngx_http_dav_module.html
+[docker image]: https://hub.docker.com/r/buchgr/bazel-remote-cache/
+[GitHub]: https://github.com/buchgr/bazel-remote/
+[GitHub Issue Tracker]: https://github.com/buchgr/bazel-remote/issues
+[Google Cloud Storage]: https://cloud.google.com/storage
+[Google Cloud Console]: https://cloud.google.com/console
+[Dialog to create a new GCS bucket]: /assets/remote-cache-gcs-create-bucket.png
+[bucket location]: https://cloud.google.com/storage/docs/bucket-locations
+[Dialog to create a new GCP Service Account]: /assets/remote-cache-gcp-service-account.png
+[Hazelcast]: https://hazelcast.com
+[Apache httpd]: http://httpd.apache.org
+[AWS S3]: https://aws.amazon.com/s3
+[issue #3360]: https://github.com/bazelbuild/bazel/issues/3360
+[gRPC protocol]: https://github.com/googleapis/googleapis/blob/master/google/devtools/remoteexecution/v1test/remote_execution.proto
+[Buildfarm]: https://github.com/bazelbuild/bazel-buildfarm
+[issue #4558]: https://github.com/bazelbuild/bazel/issues/4558
+
```
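The HTTP Caching Protocol described in the new `remote-caching.md` (PUT to upload, GET to download, action results under `/ac/`, output files under `/cas/`) is small enough to sketch end to end. The following is a minimal in-memory server using only the Python standard library, plus two client helpers. It rests on several assumptions that are not part of Bazel: the `/cache` prefix mirrors the `http://localhost:8080/cache` example, a missing key is answered with 404, and CAS uploads whose SHA256 digest does not match the key are rejected with 400. The names `CacheHandler`, `start_server`, `put_blob` and `get_blob` are hypothetical. There is no authentication, persistence or eviction, so this is a sketch for experimentation, not a production backend.

```python
import hashlib
import http.server
import threading
import urllib.error
import urllib.request

PREFIX = "/cache"  # assumed URL prefix, as in http://localhost:8080/cache


class CacheHandler(http.server.BaseHTTPRequestHandler):
    """Serves PUT/GET for <PREFIX>/ac/<key> and <PREFIX>/cas/<key> from memory."""

    protocol_version = "HTTP/1.1"
    store = {}  # request path -> blob, shared by all requests
    lock = threading.Lock()

    def _parse(self):
        # Returns (kind, key) for a valid cache path, or None for anything else.
        parts = self.path.split("/")
        if len(parts) == 4 and "/" + parts[1] == PREFIX and parts[2] in ("ac", "cas") and parts[3]:
            return parts[2], parts[3]
        return None

    def do_GET(self):
        with self.lock:
            blob = self.store.get(self.path) if self._parse() else None
        if blob is None:
            self.send_error(404)  # cache miss
            return
        self.send_response(200)
        self.send_header("Content-Length", str(len(blob)))
        self.end_headers()
        self.wfile.write(blob)

    def do_PUT(self):
        parsed = self._parse()
        blob = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        if parsed is None:
            self.send_error(404)
            return
        kind, key = parsed
        # CAS keys are content digests, so reject uploads whose SHA256 differs.
        if kind == "cas" and hashlib.sha256(blob).hexdigest() != key:
            self.send_error(400, "CAS key does not match content")
            return
        with self.lock:
            self.store[self.path] = blob
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # silence per-request logging


def start_server(port=0):
    """Starts the cache in a background thread; port 0 picks a free port."""
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), CacheHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, f"http://127.0.0.1:{server.server_address[1]}{PREFIX}"


def put_blob(url, blob):
    """Uploads blob with an HTTP PUT and returns the response status."""
    req = urllib.request.Request(url, data=blob, method="PUT")
    with urllib.request.urlopen(req) as resp:
        return resp.status


def get_blob(url):
    """Returns the blob stored at url, or None on a cache miss (HTTP 404)."""
    try:
        with urllib.request.urlopen(url) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise
```

For a local experiment, `start_server(8080)` listens at `http://127.0.0.1:8080/cache`, which could be passed to `--remote_http_cache`; this pairing has not been checked against a specific Bazel release.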
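Steps 2 to 5 of "How a build uses remote caching", together with the split between the action cache and the CAS, can be modeled with two dictionaries. The Python sketch below is a simplified simulation, not Bazel's implementation: step 1 (building the action graph) is left out, `action_key` is a hypothetical stand-in for Bazel's real action hash, and `RemoteCache`, `run_action` and the `execute` callback are illustrative names. Because the sketch includes the environment in the key, it reproduces the "environment variables leaking into an action" issue, and its `upload_local_results` parameter mimics the effect of `--remote_upload_local_results=false`.

```python
import hashlib
import json


def digest(data: bytes) -> str:
    """SHA256 hex digest, used here as the key for CAS entries."""
    return hashlib.sha256(data).hexdigest()


def action_key(command, env, inputs):
    """HYPOTHETICAL action key: hashes the command line, the environment and
    the input file digests. Bazel computes its real action hash differently."""
    payload = json.dumps(
        {
            "command": command,
            "env": sorted(env.items()),
            "inputs": sorted((name, digest(data)) for name, data in inputs.items()),
        }
    )
    return digest(payload.encode())


class RemoteCache:
    """In-memory stand-in for the two stores kept by the remote cache."""

    def __init__(self):
        self.ac = {}   # action cache: action key -> {output name: CAS digest}
        self.cas = {}  # content-addressable store: digest -> file contents


def run_action(cache, local, command, env, inputs, execute, upload_local_results=True):
    """Runs one action; returns (outputs, source) with source in
    {"local", "remote", "executed"}."""
    key = action_key(command, env, inputs)
    # Step 2: reuse outputs already present on this machine.
    if key in local:
        return local[key], "local"
    # Step 3: look up the action cache, then fetch the outputs from the CAS.
    result = cache.ac.get(key)
    if result is not None and all(d in cache.cas for d in result.values()):
        outputs = {name: cache.cas[d] for name, d in result.items()}
        local[key] = outputs
        return outputs, "remote"
    # Step 4: cache miss, so execute the action locally.
    outputs = execute(inputs)
    local[key] = outputs
    # Step 5: upload output files to the CAS first, then the action result.
    if upload_local_results:
        for data in outputs.values():
            cache.cas[digest(data)] = data
        cache.ac[key] = {name: digest(data) for name, data in outputs.items()}
    return outputs, "executed"
```

For example, calling `run_action` with `env={"PATH": "/usr/bin"}` on one machine and `env={"PATH": "/opt/bin:/usr/bin"}` on another produces two different keys, so the second call executes the action instead of hitting the cache. Uploading CAS entries before the action result is a choice made in this sketch so that an action cache entry never points at missing files.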