path: root/src/main/java/com/google/devtools/build/lib/remote/RemoteSpawnStrategy.java
Commit message [author, date]
* Plumb exec root through to all spawn runners. [tomlu, 2018-01-11]
  They need this to parse input manifests. Previously we would grab the exec root from the Root, but wish to unsupport this.
  PiperOrigin-RevId: 181669143
* AbstractSpawnStrategy: use ActionExecutionContext.getVerboseFailures [ulfjack, 2017-08-11]
  ...instead of injecting it through the constructor. Simplify all the callers accordingly.
  PiperOrigin-RevId: 164955391
* Extract a common AbstractSpawnStrategy parent class [ulfjack, 2017-07-24]
  This removes a bunch of code duplication that I previously introduced.
  PiperOrigin-RevId: 162909430
* Move ActionInputPrefetcher to the actions package [ulfjack, 2017-07-21]
  The plan is to add it to ActionExecutionContext, which is also there.
  PiperOrigin-RevId: 162656835
* Extend the SpawnRunner API [ulfjack, 2017-07-17]
  - add an id for logging; this allows us to correlate log entries for the same spawn from multiple spawn runner implementations in the future
  - add a prefetch method to the SpawnExecutionPolicy; better than relying on the ActionInputPrefetcher being injected in the constructor
  - add a name parameter to the report method; this is in preparation for a single unified SpawnStrategy implementation - it's basically the last bit of difference between SandboxStrategy and RemoteSpawnStrategy; they're otherwise equivalent (if not identical)
  PiperOrigin-RevId: 162194684
* Simplify exception handling in spawn strategies [ulfjack, 2017-07-17]
  The main change here is to only catch SpawnExecException in StandaloneTestStrategy, so all other exceptions simply propagate up. As a result, Bazel no longer retries tests that fail with an exception; we only retry tests that actually ran, had a spawn result, and resulted in a UserExecException. That is probably what we want.
  Also do some cleanup:
  - Remove ExecException.timedOut; nobody was calling it (but there's still SpawnExecException.timedOut)
  - Remove SpawnActionContext.shouldPropagateExecException; all exceptions (except SpawnExecException) are now propagated by default
  - Remove the SandboxOptions from the SandboxStrategies; all sandboxing options are now handled by the underlying SpawnRunner implementations
  I'll send a followup CL to remove the UserExecException and EnvironmentalExecException types; the types don't do anything special, and there are no catch blocks in production code that catch one of these more specific types.
  This should fix #3322 by removing a bunch of special handling.
  PiperOrigin-RevId: 161960919
* Bring back the very useful stacktrace printouts on --verbose_failures (#3380). [olaola, 2017-07-14]
  TESTED=remote worker
  RELNOTES: fixes #3380
  PiperOrigin-RevId: 161922635
* Rewrite all the sandbox strategy implementations [ulfjack, 2017-07-12]
  - Make use of existing abstractions like SpawnRunner and SpawnExecutionPolicy.
  - Instead of having the *Strategy create a *Runner, and then call back into SandboxStrategy, create a single SandboxContainer which contains the full command line, environment, and everything needed to create and delete the sandbox directory.
  - Do all the work in SandboxStrategy, including creation and deletion of the sandbox directory.
  - Use SpawnResult instead of throwing, catching, and rethrowing.
  - Simplify the control flow a bit.
  PiperOrigin-RevId: 161644979
* Reimplement RemoteSpawnStrategy on top of RemoteSpawnRunner [ulfjack, 2017-07-10]
  It is intentional that RemoteSpawnStrategy no longer contains any remote execution specific code. My intent is to merge all SpawnStrategy implementations into a single class (similar to the new RemoteSpawnStrategy), and delegate all the specific work to SpawnRunner implementations. However, we're not there yet, and we still need to be able to look up SpawnStrategy implementations by name through the annotations, so we still need separate classes for now.
  We might also want to have a shared test suite for all SpawnRunner instances that checks for basic compliance with the specification.
  Progress on #1531.
  PiperOrigin-RevId: 161377751
* Refactor RemoteSpawn{Strategy,Runner} [ulfjack, 2017-07-07]
  - Make the RemoteSpawnRunner match RemoteSpawnStrategy functionality, including local fallback, remote caching, and execution. This is done so we can actually finish the migration to the SpawnRunner API, for which I've had a pending change for several months now.
  - Never throw StatusRuntimeException from the GrpcRemoteCache or the GrpcExecutor. We almost never do, since the Retrier catches them implicitly, so a number of catch blocks were already unreachable. Carefully document the cases where we still need to handle it.
  - RemoteSpawnStrategy / RemoteSpawnRunner no longer catch gRPC-specific exceptions; they should be able to handle any reasonable remote caching / execution implementation (except we don't have a common interface for GrpcRemoteExecutor yet), with no dependency on gRPC as such. Note that the RemoteSpawnStrategy class will actually go away after the SpawnRunner migration (eventually).
  - However, ensure that we _are_ actually throwing CacheNotFoundException; the retrier implicitly catches that also, so we need to manually unwrap from RetryException.
  - Don't call into the EventHandler from RemoteSpawnStrategy; instead, throw an exception with the message, and let the higher levels handle the reporting (we only allow this for exception + local fallback, for which there's no good reporting API right now).
  PiperOrigin-RevId: 161195666
* Move the remote initialization code out of RemoteSpawnStrategy [ulfjack, 2017-07-07]
  This is in preparation for making it use the RemoteSpawnRunner, as part of which it will no longer need to do that. Also, Java style says you shouldn't do work in the constructor, and it's better dependency injection.
  PiperOrigin-RevId: 161071134
* Allow the remote server to set its own default action timeout when Bazel does not have one. [olaola, 2017-07-05]
  Hardcoded value of 120 seconds is clearly a mistake.
  TESTED=remote worker
  PiperOrigin-RevId: 160891214
* Better error handling in RemoteSpawnStrategy [ulfjack, 2017-07-05]
  Make sure that we print the failing command / target in all cases.
  PiperOrigin-RevId: 160881591
* Rename GrpcActionCache to GrpcRemoteCache [ulfjack, 2017-07-05]
  PiperOrigin-RevId: 160872755
* Implement retry logic for the gRPC calls in remote execution and caching. [olaola, 2017-06-30]
  The retry strategy may need tuning. Other behavior changes: swallowing gRPC CANCELLED errors when the thread is interrupted, as these are expected and just make debugging difficult. Also, distinguishing between the gRPC DEADLINE_EXCEEDED caused by the actual command timing out on the server vs. other causes (the former should not be retriable, while the latter should retry).
  TESTED=unit tests, remote worker on Bazel
  PiperOrigin-RevId: 160605830
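  The retry rules described in this commit can be sketched roughly as follows. This is a minimal illustration, not Bazel's actual Retrier: the Code enum, RemoteCallException, SimpleRetrier, and the commandTimedOut flag are all hypothetical names, the gRPC status is modeled with a plain enum so the sketch has no gRPC dependency, and turning an interrupt-time CANCELLED into an InterruptedException is an assumption about what "swallowing" means here. Backoff between attempts is omitted.

```java
import java.util.concurrent.Callable;

// Hypothetical stand-in for the gRPC status codes mentioned in the commit.
enum Code { UNAVAILABLE, DEADLINE_EXCEEDED, CANCELLED }

// Hypothetical exception type; commandTimedOut marks a deadline that was
// caused by the command itself timing out on the server.
final class RemoteCallException extends Exception {
  final Code code;
  final boolean commandTimedOut;

  RemoteCallException(Code code, boolean commandTimedOut) {
    super(code.name());
    this.code = code;
    this.commandTimedOut = commandTimedOut;
  }
}

final class SimpleRetrier {
  private final int maxAttempts;

  SimpleRetrier(int maxAttempts) {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    this.maxAttempts = maxAttempts;
  }

  // A deadline caused by the command timing out is final; other deadlines
  // and UNAVAILABLE are treated as transient (an assumption of this sketch).
  static boolean isRetriable(RemoteCallException e) {
    switch (e.code) {
      case UNAVAILABLE:
        return true;
      case DEADLINE_EXCEEDED:
        return !e.commandTimedOut;
      default:
        return false;
    }
  }

  <T> T execute(Callable<T> call) throws Exception {
    RemoteCallException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return call.call();
      } catch (RemoteCallException e) {
        // CANCELLED while interrupted is expected: report the interrupt
        // instead of the gRPC error.
        if (e.code == Code.CANCELLED && Thread.interrupted()) {
          throw new InterruptedException();
        }
        if (!isRetriable(e)) {
          throw e;
        }
        last = e;
      }
    }
    throw last; // all attempts failed with retriable errors
  }
}

class RetryDemo {
  public static void main(String[] args) throws Exception {
    int[] calls = {0};
    String result = new SimpleRetrier(3).execute(() -> {
      if (calls[0]++ < 2) {
        throw new RemoteCallException(Code.UNAVAILABLE, false);
      }
      return "ok";
    });
    System.out.println(result + " after " + calls[0] + " calls");
  }
}
```

  Keeping the retriable/non-retriable decision in one small predicate makes the "may need tuning" part of the commit easy to adjust without touching the loop.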
* Rename RemoteUtils to GrpcUtils [ulfjack, 2017-06-28]
  PiperOrigin-RevId: 160285362
* Enable connection pooling for the remote REST cache [Benjamin Peterson, 2017-06-27]
  Connection pooling is a useful optimization for REST caches that are far away as it avoids constantly redoing the TCP handshake. It also prevents large builds from exhausting the local interface's source ports through tens of thousands of one-transaction connections.
  The default connection pool size of 20 is fairly arbitrary. Users probably want to set this to something close to the value of --jobs.
  We introduce some generic infrastructure for closing remote cache instances and use it to cleanly shutdown the connection pool between builds.
  Change-Id: I73adc29ecae15cc10a1217ffbaa483892bcd4f9a
  PiperOrigin-RevId: 160264681
* Simplify the RemoteActionCache interface [ulfjack, 2017-06-27]
  - merge all the inputs upload functionality into a single ensureInputsPresent method
  - merge all of the action result upload functionality into a single upload method
  - merge all the download functionality into a single download method
  This significantly simplifies the caller of this interface, and opens the door for additional performance improvements in implementations which now have more control over the upload / download flows; in particular, in the gRPC case, we can upload stdout / stderr using the existing chunker - upload of stdout / stderr is no longer serialized with file upload.
  In particular, the CachedLocalSpawnRunner test becomes much simpler, since it no longer needs to handle the previous more complex upload code path.
  PiperOrigin-RevId: 160260161
* Only create a single per-build instance of the remote cache / executor [ulfjack, 2017-06-22]
  Fixes #3189. Fixes #2823.
  PiperOrigin-RevId: 159699146
* Rewrite the Executor/ActionExecutionContext split [ulfjack, 2017-06-19]
  Move everything to ActionExecutionContext, and drop Executor wherever possible. This clarifies the API, makes it simpler to test, and simplifies the code.
  PiperOrigin-RevId: 159414816
* Rewrite StandaloneSpawnStrategy to use LocalSpawnRunner [ulfjack, 2017-06-19]
  PiperOrigin-RevId: 159221067
* Switching to Watcher API instead of wait_for_completion, in preparation for deprecating the wait_for_completion field. [olaola, 2017-06-14]
  Note on errors: in the RemoteWorker, I currently handle all errors as onError of the watch call. Other options are: pass them as the operation error field, and pass some of them as the onError of the execute call. For now, I'm just using the simplest option; the Bazel client is ready to handle all possible options.
  RELNOTES: none
  PiperOrigin-RevId: 158974207
* Fix test.xml download for failing tests [ulfjack, 2017-06-09]
  PiperOrigin-RevId: 158503746
* Switching Bazel to use the new remote execution API: https://docs.google.com/document/d/1AaGk7fOPByEvpAbqeXIyE8HX_A3_axxNnvroblTZ_6s/edit [olaola, 2017-06-09]
  Also refactored away the various *Interface* files, no need since unit testing can be done with mocking the appropriate gRPC Impl classes directly (see tests). This also fixes the RemoteSpawnRunner, which should use different objects for remote caching and remote execution, the same way RemoteSpawnStrategy does.
  RELNOTES: n/a
  PiperOrigin-RevId: 158473700
* Fix: remote execution was ignoring the exit code [Ulf Adams, 2017-06-06]
  I likely broke that in 7f6e27f, although it was a pre-existing condition: previously, the remote worker was reporting non-zero exit as a failure. Now it reports the run as successful with a non-zero exit code. Update RemoteSpawnStrategy to handle the exit code and generate an appropriate error.
  In the future, we won't throw exceptions for non-zero exit at this level, but instead report the non-zero exit in the SpawnResult, and have the caller inspect that (and generate the error if applicable).
  Fixes #3121.
  Closes #3130.
  Change-Id: Ia39f5c2ef5622544285c1957bb9ebae89e58edf2
  PiperOrigin-RevId: 158120222
* Remote tests should not depend on lib:runtime [buchgr, 2017-05-31]
  Move AuthAndTLSOptions to its own package, so that tests/remote no longer depends on lib:runtime.
  RELNOTES: None.
  PiperOrigin-RevId: 157469629
* RemoteSpawnStrategy: don't round-trip through String [ulfjack, 2017-05-30]
  Instead, print the downloaded bytes directly to stdout / stderr.
  PiperOrigin-RevId: 157435933
* Remote+BES: Stabilize command line flags. [buchgr, 2017-05-22]
  Update the command line flags used by remote execution/caching as well as the build event service (BES).
  Major changes:
  - Remote execution/caching and BES share flags for authentication and TLS.
  - Removed API Key authentication from BES, as it's not being used.
  - Add TLS support to BES upload.
  - Add --bes_project_id flag. If set, the value is propagated as part of BES lifecycle events.
  For reviewers: Start your review at CommonRemoteAndBesOptions, BuildEventServiceOptions and RemoteOptions. The other changes are mostly automatic IDE renames of fields and flag updates in shell script tests.
  RELNOTES: None.
  PiperOrigin-RevId: 156553857
* Also download stdout & stderr in case of a cache hit [ulfjack, 2017-04-20]
  Fixes #1413.
  PiperOrigin-RevId: 153684106
* OnePlatform auth support for Bazel, in preparation for next version of the API. [olaola, 2017-04-20]
  TESTED: local server
  RELNOTES: n/a
  PiperOrigin-RevId: 153599636
* Fix RemoteSpawnStrategy to upload stdout/stderr to the remote cache [ulfjack, 2017-04-19]
  This is already fixed in the CachedLocalSpawnRunner, with tests there, which will replace RemoteSpawnStrategy in the near future. For now, I'd like to get this in in time for 0.5.0 to get test caching working.
  Fixes #1413.
  PiperOrigin-RevId: 153486592
* Re-enabling the remote caching without remote execution code path, which was accidentally regressed. [olaola, 2017-04-12]
  TESTED=local RemoteWorker without work_path
  RELNOTES: n/a
  PiperOrigin-RevId: 152806430
* Clone the remote execution implementation into a new class [ulfjack, 2017-03-29]
  The new RemoteExecutionClient class only performs remote execution, and nothing else; all higher-level functions, local retry, etc. will live outside of the client. In order to add unit tests, I had to add another layer of indirection between the Grpc{RemoteExecutor,ActionCache} and GRPC, since GRPC generates final, non-mockable classes. While a testing approach that uses a fake server can also get some test coverage (as in GrpcActionCacheTest), it doesn't allow us to test the full range of bad things that can happen at the GRPC layer.
  The cloned implementation uses a single GRPC channel, as was recommended to me by Jakob, who worked on GRPC. A single channel should be sufficiently scalable, it's thread-safe, and it performs chunking internally. On the server-side, the requests from a single channel can be dispatched to a thread pool, so this should not be a blocker for server-side parallelism.
  I also changed it to throw an exception whenever anything bad happens - this makes it much more obvious if there's still a bug in this code; the old code silently swallows many errors, falling back to local execution, which papers over many issues.
  Furthermore, we now return a RemoteExecutionResult to indicate whether the action ran at all (regardless of exit code), as well as the exit code.
  All in all, this implementation is closer to the production code we're using internally, although quite a few things are still missing.
  The cloned implementation is not hooked up to RemoteSpawnStrategy yet. It also does not support combining remote caching with local execution, but note that RemoteSpawnStrategy regressed in that respect and currently also does not support that mode.
  PiperOrigin-RevId: 151578409
* Refactor simple distributed caching support [Alpha Lam, 2017-03-28]
  The simple distributed caching in Bazel used ConcurrentMap as a blob store. It is incorrect to use an overloaded interface for this purpose.
  This change defines a SimpleBlobStore interface that only has put(), get() and containsKey() methods and allows a simple implementation of a blob store as remote cache for Bazel.
  Also updated documentation to summarize the options available in the remote spawn strategy.
  There is no functional change.
  TESTED=shell integration tests
  --
  Change-Id: Iedff0bc4f06c4a93c398c53801014d998c3df13b
  Reviewed-on: https://cr.bazel.build/9330
  PiperOrigin-RevId: 151439467
  MOS_MIGRATED_REVID=151439467
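  A narrow blob-store interface of the kind this commit describes might look like the sketch below. Only the three method names put(), get() and containsKey() come from the commit message; the parameter and return types, the null-on-miss behavior, and the InMemoryBlobStore class are assumptions made for illustration and need not match Bazel's actual SimpleBlobStore.

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical signatures: the commit names only the three methods.
interface SimpleBlobStore {
  boolean containsKey(String key);

  // Returns null when the key is absent (an assumption of this sketch).
  byte[] get(String key);

  void put(String key, byte[] value);
}

// Illustrative in-memory implementation. The map is an implementation
// detail hidden behind the narrow interface instead of being the interface.
final class InMemoryBlobStore implements SimpleBlobStore {
  private final ConcurrentMap<String, byte[]> blobs = new ConcurrentHashMap<>();

  @Override
  public boolean containsKey(String key) {
    return blobs.containsKey(key);
  }

  @Override
  public byte[] get(String key) {
    byte[] value = blobs.get(key);
    return value == null ? null : value.clone(); // defensive copy
  }

  @Override
  public void put(String key, byte[] value) {
    blobs.put(key, value.clone()); // defensive copy
  }
}

class BlobStoreDemo {
  public static void main(String[] args) {
    SimpleBlobStore store = new InMemoryBlobStore();
    store.put("abc", "hello".getBytes(StandardCharsets.UTF_8));
    System.out.println(store.containsKey("abc"));
    System.out.println(new String(store.get("abc"), StandardCharsets.UTF_8));
    System.out.println(store.containsKey("missing"));
  }
}
```

  Callers that depend only on these three operations can be pointed at any backing store, which is the motivation the commit gives for dropping ConcurrentMap as the interface.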
* Deja-vu: Using an ActionInputFileCache for SHA1 digests used with remote execution. [Ola Rozenfeld, 2017-03-27]
  If you're feeling like you've already seen this, that's correct, these were the exact contents of commit e860316559eac366d47923a8eb4b5489a661aa35... and then, on Nov 15, something unclear happened and the code disappeared! Perhaps it was the result of a faulty sync. In any case, nobody noticed, and the CL went in. It was later rolled back and resubmitted, but the crucial code changes were gone.
  TESTED=local server with profiling for SHA1 specifically
  RELNOTES: n/a
  --
  PiperOrigin-RevId: 151139685
  MOS_MIGRATED_REVID=151139685
* Use SpawnInputExpander in the remote spawn strategy to expand runfiles trees [Ulf Adams, 2017-03-24]
  This fixes remote test execution.
  Fixes #1593.
  --
  PiperOrigin-RevId: 151030133
  MOS_MIGRATED_REVID=151030133
* Adding a temporary flag to Bazel to allow Platform override from the command line. [Ola Rozenfeld, 2017-03-14]
  This will be used during development to test new toolchains in docker containers.
  Example usage: --experimental_remote_platform_override='entry:{ name:"a" value:"b" } entry:{ name:"c" value:"d" }'
  TESTED=local server
  --
  PiperOrigin-RevId: 149933081
  MOS_MIGRATED_REVID=149933081
* Also report the spawn from the remote strategy [Ulf Adams, 2017-03-10]
  It's silly that we require every spawn strategy to do this individually, and the new spawn scheduler will fix this. However, it's useful to add this for debugging.
  --
  PiperOrigin-RevId: 149743992
  MOS_MIGRATED_REVID=149743992
* Remove all the action resource estimation code [Ulf Adams, 2017-03-06]
  --
  PiperOrigin-RevId: 149110466
  MOS_MIGRATED_REVID=149110466
* Rationalize local resource acquisition [Ulf Adams, 2017-03-03]
  Move all local resource acquisition to where local execution actually happens. Don't attempt to acquire resources per action, but only for individual spawns. This significantly simplifies the code.
  The downside is that we don't account for action-level work anymore. In general, actions should not perform any process execution themselves, but always delegate such work to a SpawnStrategy implementation.
  This change makes sure that every Spawn has local resources set in a way that is consistent with the previous state. However, there are two actions - Fileset and FileWrite -, which are not spawns, and so we now don't limit their concurrent execution anymore. For Fileset, all work is done in a custom Fileset-specific thread pool, so this shouldn't be a problem. I'm not sure about FileWriteAction.
  --
  PiperOrigin-RevId: 149012600
  MOS_MIGRATED_REVID=149012600
* Adding a small flag to control remote caching without remote execution. [Ola Rozenfeld, 2017-03-03]
  Usually it is enabled, and will be triggered whenever a remote_cache is specified, but a remote_worker is not; however, this flag allows specifically disabling it for the cases where remote_worker is defined, but something went wrong and we were forced to execute locally. This is useful to save time when the remote API does not actually support setting the remote action result.
  --
  PiperOrigin-RevId: 149007755
  MOS_MIGRATED_REVID=149007755
* Make --hazelcast_node and --remote_cache options mutually exclusive [Marcin Maliszkiewicz, 2017-01-20]
  Specifying both options can cause OOM on OSX.
  --
  Change-Id: I52daf194a8840f9e63f1d537f13152e53f8436a7
  Reviewed-on: https://cr.bazel.build/8220
  PiperOrigin-RevId: 145079331
  MOS_MIGRATED_REVID=145079331
* Creating separate instances of CAS and execution handlers for every action. [Ola Rozenfeld, 2016-12-16]
  This allows Bazel to talk to multiple instances of the server, if these exist, enabling server-side parallelism (due to using separate gRPC channels).
  TESTED: internally and local server
  --
  PiperOrigin-RevId: 142262973
  MOS_MIGRATED_REVID=142262973
* Add a flag for disabling local fallback for actions that failed remotely. [Ola Rozenfeld, 2016-12-13]
  --
  PiperOrigin-RevId: 141817345
  MOS_MIGRATED_REVID=141817345
* Debugging flag (will rarely be used by actual users) that disables remote execution cache. [Ola Rozenfeld, 2016-12-13]
  --
  PiperOrigin-RevId: 141807596
  MOS_MIGRATED_REVID=141807596
* Printing the stack trace of remote failures on --verbose_failures. [Ola Rozenfeld, 2016-12-13]
  Helps debugging.
  --
  PiperOrigin-RevId: 141802189
  MOS_MIGRATED_REVID=141802189
* Attempt #2 to submit commit e860316559eac366d47923a8eb4b5489a661aa35: Using an ActionFileInputCache for SHA1 digests used with remote execution. [Ola Rozenfeld, 2016-11-24]
  I missed an important test failure -- turned out we have a remote execution test that does not live under lib/remote.
  --
  MOS_MIGRATED_REVID=140135277
* Rollback of commit e860316559eac366d47923a8eb4b5489a661aa35. [Alex Humesky, 2016-11-21]
  *** Reason for rollback ***
  --
  MOS_MIGRATED_REVID=139640949
* Using an ActionFileInputCache for SHA1 digests used with remote execution. [Ola Rozenfeld, 2016-11-21]
  --
  MOS_MIGRATED_REVID=139613925
* Description redacted. [Ola Rozenfeld, 2016-09-20]
  --
  MOS_MIGRATED_REVID=133584935