aboutsummaryrefslogtreecommitdiffhomepage
path: root/src/tools/remote
Commit message (Collapse)AuthorAge
* Pass digest to Chunker construction when available.Gravatar tomlu2018-08-03
| | | | | | | | | * Refactor Chunker constructor to a builder to reduce constructor overload. * Pass digest into this where we have it * Redo ensureInputsPresent to not lose the missing digests during processing so we can pass them to the Chunker constructor. RELNOTES: None PiperOrigin-RevId: 207297915
* Remove Hazelcast dependencyGravatar Philipp Wollermann2018-07-25
| | | | | | | | | | The only remaining use was a testing REST backend in the LRE. I wrote a replacement for that using netty, which we use for our network stuff in Bazel, which means we can now get rid of Hazelcast. :) I'll remove the Hazelcast files in a separate change when this is merged. PiperOrigin-RevId: 205985996
* Synchronize on process factory to inhibit ETXTBSYGravatar George Gensure2018-07-10
| | | | | | | | | | | | | | | | | | | | | | | | | | Refocus synchronization mechanism to cope with file descriptor set fork- induced races to more tightly constrain concurrent fork/exec pairs. This problem has been observed in bazel proper repeatedly, exhibiting as the iconic ETXTBSY - Text file busy in wide worker pool builds and tests. Evidence that this was discovered by @buchgr is in the comment and change to the embedded ExecutionService implementation, and the description of the race and the need for the synchronization was lifted from that scope to the JavaSubprocessFactory. This factory is a singleton and represents the gateway to all worker process execution, and serves as the correct lock primitive to ensure that file descriptor sets are not duplicated across forks, which gave rise to this issue. To test this, I demonstrated a reproducer presented at https://bugs.java.com/view_bug.do?bug_id=8068370 with 2.4% of invocations in that pathological case exhibiting the issue. With a functionally equivalent change - synchronizing around a processBuilder.start() call - as the only modification to the reproducer, no further failures of any kind were observed, over several hundred runs. Closes #5556. PiperOrigin-RevId: 203947224
* Bazel server: ensure OutputStreams are closedGravatar laszlocsomor2018-07-05
| | | | | | | | | | | | | | | | | | | Use try-with-resources to ensure OutputStreams that we open via FileSystem.OutputStream(path) are closed. Eagerly closing OutputStreams avoids hanging on to file handles until the garbage collector finalizes the OutputStream, meaning Bazel on Windows (and other processes) can delete or mutate these files. Hopefully this avoids intermittent file deletion errors that sometimes occur on Windows. See https://github.com/bazelbuild/bazel/issues/5512 RELNOTES: none PiperOrigin-RevId: 203342889
* Move HashFunction out of FileSystem, and turn it into a class, instead of an ↵Gravatar ccalvarin2018-06-21
| | | | | | | | | enum. Now that we aren't using enum names for the hash functions, we also accept the standard names, such as SHA-256. RELNOTES: None. PiperOrigin-RevId: 201624286
* remote: concurrent blob downloads. Fixes #5215Gravatar buchgr2018-06-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This change introduces concurrent downloads of action outputs for remote caching/execution. So far, for an action we would download one output after the other which isn't as bad as it sounds as we would typically run dozens or hundreds of actions in parallel. However, for actions with a lot of outputs or graphs that allow limited parallelism we expect this change to positively impact performance. Note, that with this change the AbstractRemoteActionCache will attempt to always download all outputs concurrently. The actual parallelism is controlled by the underlying network transport. The gRPC transport currently enforces no limits on the concurrent calls, which should be fine given that all calls are multiplexed on a single network connection. The HTTP/1.1 transport also enforces no parallelism by default, but I have added the --remote_max_connections=INT flag which allows to specify an upper bound on the number of network connections to be open concurrently. I have introduced this flag as a defensive mechanism for users who's environment might enforce an upper bound on the number of open connections, as with this change its possible for the number of concurrently open connections to dramatically increase (from NumParallelActions to NumParallelActions * SumParallelActionOutputs). A side effect of this change is that it puts the infrastructure for retries and circuit breaking for the HttpBlobStore in place. RELNOTES: None PiperOrigin-RevId: 199005510
* Allow banning symlink action outputs from being uploaded to a remote cache.Gravatar Benjamin Peterson2018-05-03
| | | | | | | | | | | | | | | | | | | This is mostly a roll-forward of 4465dae23de989f1452e93d0a88ac2a289103dd9, which was reverted by fa36d2f48965b127e8fd397348d16e991135bfb6. The main difference is that the new behavior is now gated behind the --noremote_allow_symlink_upload flag. https://docs.google.com/document/d/1gnOYszitgrLVet3sQk-TKGqIcpkkDsc6aw-izoo-d64 is a design proposal to support symlinks in the remote cache, which would render this change moot. I'd like to be able to prevent incorrect cache behavior until that change is implemented, though. This fixes https://github.com/bazelbuild/bazel/issues/4840 (again). Closes #5122. Change-Id: I2136cfe82c2e1a8a9f5856e12a37d42cabd0e299 PiperOrigin-RevId: 195261827
* Don't add -u flag to local remote worker docker command on Windows.Gravatar Googler2018-04-20
| | | | | | | | | | | | | | | | | -u doesn't currently make sense for Windows: https://github.com/docker/for-win/issues/636#issuecomment-293653788 The local remote worker happens to sometimes work on Windows because we would frequently (always?) hit the timeout here (but recently this hasn't been the case for me): https://github.com/bazelbuild/bazel/blob/fa36d2f48965b127e8fd397348d16e991135bfb6/src/tools/remote/src/main/java/com/google/devtools/build/remote/worker/ExecutionServer.java#L323 When we don't hit the timeout (so "id -u" succeeds, and the "-u" flag is passed to docker), we get an error like this: ``` Error response from daemon: container 06851b64e09bab5a930bfb706892785b24c7538c1a7be826fef315ab8e62c117 encountered an error during CreateProcess: failure in a Windows system call: The user name or password is incorrect. (0x52e) ``` The method for detecting Windows is the best I could find, similar to other isWindows functions in Bazel. RELNOTES: None. PiperOrigin-RevId: 193666199
* Automated rollback of commit 4465dae23de989f1452e93d0a88ac2a289103dd9.Gravatar buchgr2018-04-18
| | | | | | | | | | | | | | | | | | | | | *** Reason for rollback *** The no-cache tag is not respected (see b/77857812) and thus this breaks remote caching for all projects with symlink outputs. *** Original change description *** Only allow regular files and directories spawn outputs to be uploaded to a remote cache. The remote cache protocol only knows about regular files and directories. Currently, during action output upload, symlinks are resolved into regular files. This means cached "executions" of an action may have different output file types than the original execution, which can be a footgun. This CL bans symlinks from cachable spawn outputs and fixes http... *** PiperOrigin-RevId: 193338629
* Only allow regular files and directories spawn outputs to be uploaded to a ↵Gravatar Benjamin Peterson2018-03-27
| | | | | | | | | | | | | | | | | | | | remote cache. The remote cache protocol only knows about regular files and directories. Currently, during action output upload, symlinks are resolved into regular files. This means cached "executions" of an action may have different output file types than the original execution, which can be a footgun. This CL bans symlinks from cachable spawn outputs and fixes https://github.com/bazelbuild/bazel/issues/4840. The interface of SpawnCache.CacheHandle.store is refactored: 1. The outputs parameter is removed, since that can be retrieved from the underlying Spawn. 2. It can now throw ExecException in order to fail actions. Closes #4902. Change-Id: I0d1d94d48779b970bb5d0840c66a14c189ab0091 PiperOrigin-RevId: 190608852
* Refactor and cleanup the sandboxing code.Gravatar Philipp Wollermann2018-03-23
| | | | | | | | | | | | | - Remove Optional<> where it's not needed. It's nice for return values, but IMHO it was overused in this code (e.g. Optional<List<X>> is an anti-pattern, as the list itself can already signal that it is empty). - Use Bazel's own Path class when dealing with paths, not String or java.io.File. - Move LinuxSandboxUtil into the "sandbox" package. - Remove dead code and unused fields. - Migrate deprecated VFS method calls to their replacements. - Fix a bug in ExecutionStatistics where a FileInputStream was not closed. Closes #4868. PiperOrigin-RevId: 190217476
* Propagating remote results, including stdout/err, to Bazel on execution ↵Gravatar olaola2018-03-16
| | | | | | | | | timeouts. The refactoring to have an Exception that contains partial results will also be used in the next CL, in order to propagate and save remote server logs. RELNOTES: None PiperOrigin-RevId: 189344465
* remote: Add interceptor for logging gRPC calls during remote execution/cachingGravatar Googler2018-03-05
| | | | | | | | | | This provides a io.grpc.ClientInterceptor implementation that can be used to log gRPC call information. The interceptor can select a logging handler to use based on the gRPC method being called (Watch, Execute, Write, etc) to build a LogEntry, which can then be logged after the call has finished. Unit tests for the interceptor are included. In this change, the interceptor is never invoked, nor are there any handlers implemented for any gRPC methods. The interceptor also never tries to log any entries. To avoid circular dependency issues (Remote library will depend on logger which depends on remote library for utils), I've factored out the utility classes from the remote library into their own directory/package as part of this change. PiperOrigin-RevId: 187926516
* remote: Rewrite the HTTP caching client in Netty. Fixes #4481Gravatar buchgr2018-01-26
| | | | | | | | | | | | | | * This puts in the foundation of HTTP/2 support for remote caching. * Allows us to remove the Apache HTTP library as a dependency, reducing the Bazel binary size by 1MiB. On fast networks (i.e. GCE to GCS) we can see a >2x speed improvement for TLS throughput. Even from my workstation to GCS I get significant build time improvements when using Netty's TLS 18s vs 12s. Closes #4481. PiperOrigin-RevId: 183411787
* Always use the JavaIO VFS implementation in the remote worker.Gravatar John Millikin2018-01-12
| | | | | | | | | | | | The JNI implementation doesn't work from a deployable jar. Fixes https://github.com/bazelbuild/bazel/issues/3249 cc @ulfjack Closes #4438. PiperOrigin-RevId: 181746081
* Fix param names - this is going to be enforced with error proneGravatar ulfjack2018-01-08
| | | | PiperOrigin-RevId: 181162816
* Use linux-sandbox via the (new) LinuxSandboxUtil. (second attempt).Gravatar ruperts2017-12-20
| | | | | RELNOTES: None. PiperOrigin-RevId: 179705357
* remote: add directory support for remote caching and executionGravatar Hadrien Chauvin2017-12-20
| | | | | | Add support for directory trees as artifacts. Closes #4011. PiperOrigin-RevId: 179691001
* Automated rollback of commit 52b62164af031c50b7a0584303caad67af5e1d4d.Gravatar aehlig2017-12-20
| | | | | | | | | | | | | *** Reason for rollback *** Breaks //src/test/shell/bazel:bazel_sandboxing_test *** Original change description *** Use linux-sandbox via the (new) LinuxSandboxUtil. RELNOTES: None. PiperOrigin-RevId: 179676894
* Use linux-sandbox via the (new) LinuxSandboxUtil.Gravatar ruperts2017-12-20
| | | | | RELNOTES: None. PiperOrigin-RevId: 179646155
* remote: Add support for Google Cloud Storage.Gravatar Jakob Buchgraber2017-12-18
| | | | | | | | | | | | | | | | | | | | | | Add support for Google Cloud Storage (GCS) as a HTTP caching backend. This commit mainly adds the infrastructure necessary to authenticate to GCS. Using GCS as a caching backend works as follows: 1) Create a new GCS bucket. 2) Create a service account that can read and write to the GCS bucket and generate a JSON credentials file for it. 3) Invoke Bazel as follows: bazel build --remote_rest_cache=https://storage.googleapis.com/<bucket> --auth_enabled --auth_scope=https://www.googleapis.com/auth/devstorage.read_write --auth_credentials=</path/to/creds.json> I'll add simplification's and docs in a subsequent commit. Change-Id: Ie827d7946a2193b97ea7d9aa72eb15f09de2164d PiperOrigin-RevId: 179406380
* Fix: uploading artifacts of failed actions to remote cache stopped working.Gravatar olaola2017-12-11
| | | | | | | | | | | | | | | | To reproduce: run a failing test with --experimental_remote_spawn_cache or with --spawn_strategy=remote and no executor. Expected: test log is uploaded. Desired behavior: - regardless of whether a spawn is cacheable or not, its artifacts should be uploaded to the remote cache. - the spawn result should only be set if the spawn is cacheable *and* the action succeeded. - when executing remotely, the do_not_cache field should be set for non-cacheable spawns, and the remote execution engine should respect it. This CL contains multiple fixes to ensure the above behaviors, and adds a few tests, both end to end and unit tests. Important behavior change: it is no longer assumed that non-cacheable spawns should use a NO_CACHE SpawnCache! The appropriate test case was removed. Instead, an assumption was added that all implementations of SpawnCache should respect the Spawns.mayBeCached(spawn) property. Currently, only NO_CACHE and RemoteSpawnCache exist, and they (now) support it. TESTED=remote build execution backend. WANT_LGTM: philwo,buchgr RELNOTES: None PiperOrigin-RevId: 178617937
* Moving the RemoteWorker into tools/remote directory.Gravatar olaola2017-12-05
This is because I want to add another remote execution related tool, the remote_client, which will use the Remote Execution API to fetch blobs from a remote cache. I will use this tool as part of end-to-end tests for remote execution. TESTED=remote integration tests, presubmit RELNOTES: None PiperOrigin-RevId: 177995895