aboutsummaryrefslogtreecommitdiffhomepage
path: root/src/main/java/com/google/devtools/build/lib/remote/SimpleBlobStoreActionCache.java
Commit message (Collapse)AuthorAge
* remote: fix race on download error. Fixes #5047Gravatar buchgr2018-07-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | For downloading output files / directories we trigger all downloads concurrently and asynchronously in the background and after that wait for all downloads to finish. However, if a download failed we did not wait for the remaining downloads to finish but immediately started deleting partial downloads and continued with local execution of the action. That leads to two interesting bugs: * The cleanup procedure races with the downloads that are still in progress. As it tries to delete files and directories, new files and directories are created and that will often lead to "Directory not empty" errors as seen in #5047. * The clean up procedure does not detect the race, succeeds and subsequent local execution fails because not all files have been deleted. The solution is to always wait for all downloads to complete before entering the cleanup routine. Ideally we would also cancel all outstanding downloads, however, that's not as straightfoward as it seems. That is, the j.u.c.Future API does not provide a way to cancel a computation and also wait for that computation actually having determinated. So we'd need to introduce a separate mechanism to cancel downloads. RELNOTES: None PiperOrigin-RevId: 205980446
* remote: concurrent blob downloads. Fixes #5215Gravatar buchgr2018-06-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This change introduces concurrent downloads of action outputs for remote caching/execution. So far, for an action we would download one output after the other which isn't as bad as it sounds as we would typically run dozens or hundreds of actions in parallel. However, for actions with a lot of outputs or graphs that allow limited parallelism we expect this change to positively impact performance. Note, that with this change the AbstractRemoteActionCache will attempt to always download all outputs concurrently. The actual parallelism is controlled by the underlying network transport. The gRPC transport currently enforces no limits on the concurrent calls, which should be fine given that all calls are multiplexed on a single network connection. The HTTP/1.1 transport also enforces no parallelism by default, but I have added the --remote_max_connections=INT flag which allows to specify an upper bound on the number of network connections to be open concurrently. I have introduced this flag as a defensive mechanism for users who's environment might enforce an upper bound on the number of open connections, as with this change its possible for the number of concurrently open connections to dramatically increase (from NumParallelActions to NumParallelActions * SumParallelActionOutputs). A side effect of this change is that it puts the infrastructure for retries and circuit breaking for the HttpBlobStore in place. RELNOTES: None PiperOrigin-RevId: 199005510
* Allow banning symlink action outputs from being uploaded to a remote cache.Gravatar Benjamin Peterson2018-05-03
| | | | | | | | | | | | | | | | | | | This is mostly a roll-forward of 4465dae23de989f1452e93d0a88ac2a289103dd9, which was reverted by fa36d2f48965b127e8fd397348d16e991135bfb6. The main difference is that the new behavior is now gated behind the --noremote_allow_symlink_upload flag. https://docs.google.com/document/d/1gnOYszitgrLVet3sQk-TKGqIcpkkDsc6aw-izoo-d64 is a design proposal to support symlinks in the remote cache, which would render this change moot. I'd like to be able to prevent incorrect cache behavior until that change is implemented, though. This fixes https://github.com/bazelbuild/bazel/issues/4840 (again). Closes #5122. Change-Id: I2136cfe82c2e1a8a9f5856e12a37d42cabd0e299 PiperOrigin-RevId: 195261827
* Automated rollback of commit 4465dae23de989f1452e93d0a88ac2a289103dd9.Gravatar buchgr2018-04-18
| | | | | | | | | | | | | | | | | | | | | *** Reason for rollback *** The no-cache tag is not respected (see b/77857812) and thus this breaks remote caching for all projects with symlink outputs. *** Original change description *** Only allow regular files and directories spawn outputs to be uploaded to a remote cache. The remote cache protocol only knows about regular files and directories. Currently, during action output upload, symlinks are resolved into regular files. This means cached "executions" of an action may have different output file types than the original execution, which can be a footgun. This CL bans symlinks from cachable spawn outputs and fixes http... *** PiperOrigin-RevId: 193338629
* Only allow regular files and directories spawn outputs to be uploaded to a ↵Gravatar Benjamin Peterson2018-03-27
| | | | | | | | | | | | | | | | | | | | remote cache. The remote cache protocol only knows about regular files and directories. Currently, during action output upload, symlinks are resolved into regular files. This means cached "executions" of an action may have different output file types than the original execution, which can be a footgun. This CL bans symlinks from cachable spawn outputs and fixes https://github.com/bazelbuild/bazel/issues/4840. The interface of SpawnCache.CacheHandle.store is refactored: 1. The outputs parameter is removed, since that can be retrieved from the underlying Spawn. 2. It can now throw ExecException in order to fail actions. Closes #4902. Change-Id: I0d1d94d48779b970bb5d0840c66a14c189ab0091 PiperOrigin-RevId: 190608852
* remote/http: support refresh of oauth2 tokens in the remote cache.Gravatar Jakob Buchgraber2018-03-10
| | | | | | Closes #4622. PiperOrigin-RevId: 188595430
* remote: Add interceptor for logging gRPC calls during remote execution/cachingGravatar Googler2018-03-05
| | | | | | | | | | This provides a io.grpc.ClientInterceptor implementation that can be used to log gRPC call information. The interceptor can select a logging handler to use based on the gRPC method being called (Watch, Execute, Write, etc) to build a LogEntry, which can then be logged after the call has finished. Unit tests for the interceptor are included. In this change, the interceptor is never invoked, nor are there any handlers implemented for any gRPC methods. The interceptor also never tries to log any entries. To avoid circular dependency issues (Remote library will depend on logger which depends on remote library for utils), I've factored out the utility classes from the remote library into their own directory/package as part of this change. PiperOrigin-RevId: 187926516
* User-friendlier representation of a missing digest.Gravatar olaola2018-02-08
| | | | | | | | I moved it into DigestUtil preemptively in case we switch to binary instead of hex representation. TESTED=manually RELNOTES: None PiperOrigin-RevId: 185007558
* remote: add directory support for remote caching and executionGravatar Hadrien Chauvin2017-12-20
| | | | | | Add support for directory trees as artifacts. Closes #4011. PiperOrigin-RevId: 179691001
* Moving the RemoteWorker into tools/remote directory.Gravatar olaola2017-12-05
| | | | | | | | This is because I want to add another remote execution related tool, the remote_client, which will use the Remote Execution API to fetch blobs from a remote cache. I will use this tool as part of end-to-end tests for remote execution. TESTED=remote integration tests, presubmit RELNOTES: None PiperOrigin-RevId: 177995895
* Refactor the FileSystem API to allow for different hash functions.Gravatar buchgr2017-11-30
| | | | | | | | | | | | | Refactor the FileSystem class to include the hash function as an instance field. This allows us to have a different hash function per FileSystem and removes technical debt, as currently that's somewhat accomplished by a horrible hack that has a static method to set the hash function for all FileSystem instances. The FileSystem's default hash function remains MD5. RELNOTES: None PiperOrigin-RevId: 177479772
* Change error on adding a directory to the remote cache to be non-fatalGravatar Rahul Malik2017-11-29
| | | | | | | | | | @ulfjack @buchgr - I'm resubmitting https://github.com/bazelbuild/bazel/pull/3984 on behalf of @mterring to get past CLA issues that are holding it up from merging. This is a temporary fix for the issue https://github.com/bazelbuild/bazel/issues/3891 while we wait for https://github.com/bazelbuild/bazel/pull/4011 to be reviewed and tested Closes #4188. PiperOrigin-RevId: 177276751
* remote: remove maximum blob size check.Gravatar Jakob Buchgraber2017-10-25
| | | | | | | | | It's a left over. It has already been removed for file downloads and uploads and is now still used for the action cache up- and downloads. Change-Id: Ic65ad0d34cfda75774131094de916e78b8835ff0 PiperOrigin-RevId: 173364883
* Stream rest cache file uploads.Gravatar Benjamin Peterson2017-10-18
| | | | | | | | | | | | | | | This means we no longer keep large action inputs or outputs in memory during upload. I adjusted the SimpleBlogStore interface to require clients to pass in the InputStream length. That allows us to always set Content-length on uploads. It's polite to do so, so that the server may, e.g., preallocate space for the blob. Fixes https://github.com/bazelbuild/bazel/issues/3250. Change-Id: I944c9dbc35fa2fa80dce523b0133ea9757bb3973 PiperOrigin-RevId: 172433522
* Uploading failed action outputs to the remote cache, because even if the ↵Gravatar olaola2017-09-19
| | | | | | | | | tests fails, we still want to be able to download the logs and other outputs from CAS. This fixes a bug introduced by https://github.com/bazelbuild/bazel/commit/562fcf9f5dfd14daea718f77da95b43b1400689b. To reproduce: run a failing test vs a BES service, the test log would not be uploaded. TESTED=unit tests PiperOrigin-RevId: 169143428
* ActionInputFileCache: move getMetadata to a new super-interfaceGravatar ulfjack2017-09-11
| | | | | | Update the callers that only need getMetadata to use the new interface. PiperOrigin-RevId: 167992239
* remote: close file when uploadingGravatar buchgr2017-08-29
| | | | | RELNOTES: None PiperOrigin-RevId: 166841380
* remote/http: distinquish between action cache and casGravatar buchgr2017-08-29
| | | | | | | | | | | | | | | | | | | | | | | | The remote http caching client now prefixes the URL of an action cache request with 'ac' and cas request with 'cas'. This is useful as servers might want to distinquish between the action cache and the cas. Before this change: (GET|PUT) /f141ae2d23d0f976599e678da1c51d82fedaf8b1 After this change: (GET|PUT) /ac/f141ae2d23d0f976599e678da1c51d82fedaf8b1 (GET|PUT) /cas/f141ae2d23d0f976599e678da1c51d82fedaf8b1 This change will likely break any HTTP caching service that assumed the old URL scheme. TESTED: Manually using https://github.com/buchgr/bazel-remote RELNOTES: The remote HTTP/1.1 caching client (--remote_rest_cache) now distinquishes between action cache and CAS. The request URL for the action cache is prefixed with 'ac' and the URL for the CAS is prefixed with 'cas'. PiperOrigin-RevId: 166831997
* remote: Delete output files created by failed download.Gravatar Jakob Buchgraber2017-07-27
| | | | | | | | | | If a remote download fails, delete any output files that might have already been created. Else, this might intefere with a subsequent locally executed actions that expects none of its output files to exist. See #3452. Change-Id: I467a97d05606c586aa257326213940a37dad9dd5 PiperOrigin-RevId: 163336093
* Fixing #3380: output stack traces and informative messages.Gravatar olaola2017-07-27
| | | | | | | | Also added tests specifically for the output, to ensure we don't break it again. TESTED=remote worker, unit tests RELNOTES: fixes #3380 PiperOrigin-RevId: 163283558
* Rewrite blob upload to use temporary filesGravatar ulfjack2017-07-06
| | | | | | | | | | | | | | Previously, it was allocating in-memory buffers for upload, which caused it to run out of memory on large file uploads (or on many small uploads running simultaneously). Unfortunately, we don't create the temporary file in the right location (due to separation of the BlobStore from the ByteStreamServer), so we copy the file again to write it to the OnDiskBlobStore, which isn't ideal. There'll need to be another BlobStore API change to make that work, when the InMemoryBlobStore is gone. PiperOrigin-RevId: 160974550
* Change the SimpleBlobStore API to use Input/OutputStreamsGravatar ulfjack2017-07-05
| | | | | | | This avoid having to load all the data into memory at once in most cases, especially for large files. PiperOrigin-RevId: 160954187
* Move the SimpleBlobStore and implementations to a subpackageGravatar ulfjack2017-07-05
| | | | | | | | | | Also extend the API to throw exceptions rather than having to wrap or swallow. This is in preparation for adding yet another implementation using an on-disk cache. Having the RemoteWorker keep everything in memory is not good for my sanity. PiperOrigin-RevId: 160871586
* Enable connection pooling for the remote REST cacheGravatar Benjamin Peterson2017-06-27
| | | | | | | | | | | | | | | | | Connection pooling is a useful optimization for REST caches that are far away as it avoids constantly redoing the TCP handshake. It also prevents large builds from exhausting the local interface's source ports through tens of thousands of one-transaction connections. The default connection pool size of 20 is fairly arbitrary. Users probably want to set this to something close to the value of --jobs. We introduce some generic infrastructure for closing remote cache instances and use it to cleanly shutdown the connection pool between builds. Change-Id: I73adc29ecae15cc10a1217ffbaa483892bcd4f9a PiperOrigin-RevId: 160264681
* Simplify the RemoteActionCache interfaceGravatar ulfjack2017-06-27
| | | | | | | | | | | | | | | | | | | - merge all the inputs upload functionality into a single ensureInputsPresent method - merge all of the action result upload functionality into a single upload method - merge all the download functionality into a single download method This significantly simplifies the caller of this interface, and opens the door for additional performance improvements in implementations which now have more control over the upload / download flows; in particular, in the gRPC case, we can upload stdout / stderr using the existing chunker - upload of stdout / stderr is no longer serialized with file upload. In particular, the CachedLocalSpawnRunner test becomes much simpler, since it no longer needs to handle the previous more complex upload code path. PiperOrigin-RevId: 160260161
* Switching Bazel to use the new remote execution API: ↵Gravatar olaola2017-06-09
| | | | | | | | | https://docs.google.com/document/d/1AaGk7fOPByEvpAbqeXIyE8HX_A3_axxNnvroblTZ_6s/edit Also refactored away the various *Interface* files, no need since unit testing can be done with mocking the appropriate gRPC Impl classes directly (see tests). This also fixes the RemoteSpawnRunner, which should use different objects for remote caching and remote execution, the same way RemoteSpawnStrategy does. RELNOTES: n/a PiperOrigin-RevId: 158473700
* Remote worker: skip non-existent files after action executionGravatar ulfjack2017-05-30
| | | | | | | | | | | | | | | | | Tests basically always finish with non-existent files, and this is not an error on the server side. Instead, the client (Bazel) checks whether all the files it expects are present. The test didn't catch this because it was silently falling back to local execution. Non-existent files were leading to an IOException, which caused the remote worker to log nothing, and silently return an error with no output. Log errors in the remote worker to make future debugging easier. Fixes #2887. PiperOrigin-RevId: 157400131
* Handle null action inputs in TreeNodeRepositoryGravatar ulfjack2017-05-03
| | | | | | | | The SpawnInputExpander can return null action inputs to indicate that we should create an empty file at the corresponding location, without a corresponding input file. PiperOrigin-RevId: 154832564
* Clone the remote execution implementation into a new classGravatar ulfjack2017-03-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The new RemoteExecutionClient class only performs remote execution, and nothing else; all higher-level functions, local rety, etc. will live outside of the client. In order to add unit tests, I had to add another layer of indirection between the Grpc{RemoteExecutor,ActionCache} and GRPC, since GRPC generates final, non-mockable classes. While a testing approach that uses a fake server can also get some test coverage (as in GrpcActionCacheTest), it doesn't allow us to test the full range of bad things that can happen at the GRPC layer. The cloned implementation uses a single GRPC channel, as was recommended to me by Jakob, who worked on GRPC. A single channel should be sufficiently scalable, it's thread-safe, and it performs chunking internally. On the server-side, the requests from a single channel can be dispatched to a thread pool, so this should not be a blocker for server-side parallelism. I also changed it to throw an exception whenever anything bad happens - this makes it much more obvious if there's still bug in this code; the old code silently swallows many errors, falling back to local execution, which papers over many issues. Furthermore, we now return a RemoteExecutionResult to indicate whether the action ran at all (regardless of exit code), as well as the exit code. All in all, this implementation is closer to the production code we're using internally, although quite a few things are still missing. The cloned implementation is not hooked up to RemoteSpawnStrategy yet. It also does not support combining remote caching with local execution, but note that RemoteSpawnStrategy regressed in that respect and currently also does not support that mode. PiperOrigin-RevId: 151578409
* Refactor simple distributed caching support Gravatar Alpha Lam2017-03-28
The simple distributed caching in Bazel used ConcurrentMap as a blob store. It is incorrect to use an overloaded interface for this purpose. This change defines a SimpleBlobStore interface that only has put(), get() and containsKey() methods and allows a simple implementation of a blob store as remote cache for Bazel. Also updated documentation to summarize the options available in the remote spwan strategy. There is no functional change. TESTED=shell integration tests -- Change-Id: Iedff0bc4f06c4a93c398c53801014d998c3df13b Reviewed-on: https://cr.bazel.build/9330 PiperOrigin-RevId: 151439467 MOS_MIGRATED_REVID=151439467