| Commit message | Author | Age |
|
They need this to parse input manifests. Previously we would grab the exec root from the Root, but we want to stop supporting that.
PiperOrigin-RevId: 181669143
|
...instead of injecting it through the constructor. Simplify all the callers
accordingly.
PiperOrigin-RevId: 164955391
|
This removes a bunch of code duplication that I previously introduced.
PiperOrigin-RevId: 162909430
|
The plan is to add it to ActionExecutionContext, which is also there.
PiperOrigin-RevId: 162656835
|
- add an id for logging; this allows us to correlate log entries for the same
spawn from multiple spawn runner implementations in the future
- add a prefetch method to the SpawnExecutionPolicy; better than relying on
the ActionInputPrefetcher being injected in the constructor
- add a name parameter to the report method; this is in preparation for a
single unified SpawnStrategy implementation - it's basically the last bit of
difference between SandboxStrategy and RemoteSpawnStrategy; they're otherwise
equivalent (if not identical)
PiperOrigin-RevId: 162194684
|
The main change here is to only catch SpawnExecException in
StandaloneTestStrategy, so all other exceptions simply propagate up. As a
result, Bazel no longer retries tests that fail with an exception; we only
retry tests that actually ran, had a spawn result, and resulted in a
UserExecException. That is probably what we want.
Also do some cleanup:
- Remove ExecException.timedOut; nobody was calling it (but there's still
SpawnExecException.timedOut)
- Remove SpawnActionContext.shouldPropagateExecException; all exceptions
(except SpawnExecException) are now propagated by default
- Remove the SandboxOptions from the SandboxStrategies; all sandboxing options
are now handled by the underlying SpawnRunner implementations
I'll send a followup CL to remove the UserExecException and
EnvironmentalExecException types; the types don't do anything special, and
there are no catch blocks in production code that catch one of these more
specific types.
This should fix #3322 by removing a bunch of special handling.
PiperOrigin-RevId: 161960919
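The retry behavior described in this message can be sketched as follows. This is a minimal illustration under stated assumptions, not the StandaloneTestStrategy code: the exception classes are simplified stand-ins that only share their names with the Bazel types, and TestAttempt, RetrySketch, and runWithRetries are hypothetical names introduced for this example.

```java
/** Simplified stand-in for Bazel's ExecException (assumption: only a message). */
class ExecException extends Exception {
  ExecException(String message) {
    super(message);
  }
}

/** Simplified stand-in: the spawn actually ran and produced a result. */
class SpawnExecException extends ExecException {
  SpawnExecException(String message) {
    super(message);
  }
}

/** Hypothetical callback representing one test attempt. */
interface TestAttempt {
  void run() throws ExecException;
}

final class RetrySketch {
  /**
   * Runs the attempt up to maxAttempts times and returns the number of
   * attempts made. Only SpawnExecException triggers a retry; every other
   * exception propagates to the caller immediately.
   */
  static int runWithRetries(TestAttempt attempt, int maxAttempts) throws ExecException {
    int attempts = 0;
    while (true) {
      attempts++;
      try {
        attempt.run();
        return attempts;
      } catch (SpawnExecException e) {
        if (attempts >= maxAttempts) {
          throw e; // Out of retries: report the last failure.
        }
        // Otherwise fall through and try again.
      }
    }
  }

  public static void main(String[] args) throws ExecException {
    int[] calls = {0};
    int attempts =
        runWithRetries(
            () -> {
              if (calls[0]++ < 1) {
                throw new SpawnExecException("test failed on first attempt");
              }
            },
            3);
    System.out.println("attempts: " + attempts);
  }
}
```

Catching only the narrow type keeps the retry loop from masking infrastructure errors, which is the effect the message describes.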
|
TESTED=remote worker
RELNOTES: fixes #3380
PiperOrigin-RevId: 161922635
|
- Make use of existing abstractions like SpawnRunner and SpawnExecutionPolicy.
- Instead of having the *Strategy create a *Runner, and then call back into
SandboxStrategy, create a single SandboxContainer which contains the full
command line, environment, and everything needed to create and delete the
sandbox directory.
- Do all the work in SandboxStrategy, including creation and deletion of the
sandbox directory.
- Use SpawnResult instead of throwing, catching, and rethrowing.
- Simplify the control flow a bit.
PiperOrigin-RevId: 161644979
|
It is intentional that RemoteSpawnStrategy no longer contains any remote
execution specific code. My intent is to merge all SpawnStrategy
implementations into a single class (similar to the new RemoteSpawnStrategy),
and delegate all the specific work to SpawnRunner implementations. However,
we're not there yet, and we still need to be able to look up SpawnStrategy
implementations by name through the annotations, so we still need separate
classes for now.
We might also want to have a shared test suite for all SpawnRunner instances
that checks for basic compliance with the specification.
Progress on #1531.
PiperOrigin-RevId: 161377751
|
- Make the RemoteSpawnRunner match RemoteSpawnStrategy functionality, including
local fallback, remote caching, and execution. This is done so we can
actually finish the migration to the SpawnRunner API, for which I've had a
pending change for several months now.
- Never throw StatusRuntimeException from the GrpcRemoteCache or the
GrpcExecutor. We almost never do, since the Retrier catches them implicitly,
so a number of catch blocks were already unreachable. Carefully document
the cases where we still need to handle it.
- RemoteSpawnStrategy / RemoteSpawnRunner no longer catch gRPC-specific
exceptions; they should be able to handle any reasonable remote caching /
execution implementation (except we don't have a common interface for
GrpcRemoteExecutor yet), with no dependency on gRPC as such. Note that the
RemoteSpawnStrategy class will actually go away after the SpawnRunner
migration (eventually).
- However, ensure that we _are_ actually throwing CacheNotFoundException;
the retrier implicitly catches that also, so we need to manually unwrap
from RetryException.
- Don't call into the EventHandler from RemoteSpawnStrategy; instead, throw
an exception with the message, and let the higher levels handle the
reporting (we only allow this for exception + local fallback, for which
there's no good reporting API right now).
PiperOrigin-RevId: 161195666
|
This is in preparation for making it use the RemoteSpawnRunner, as part of
which it will no longer need to do that. Also, Java style says you shouldn't
do work in the constructor, and it's better dependency injection.
PiperOrigin-RevId: 161071134
|
does not have one. The hardcoded value of 120 seconds is clearly a mistake.
TESTED=remote worker
PiperOrigin-RevId: 160891214
|
Make sure that we print the failing command / target in all cases.
PiperOrigin-RevId: 160881591
|
PiperOrigin-RevId: 160872755
|
retry strategy may need tuning.
Other behavior changes: swallowing gRPC CANCELLED errors when the thread is interrupted, as these are expected and just make debugging difficult. Also, distinguishing between the gRPC DEADLINE_EXCEEDED caused by the actual command timing out on the server vs. other causes (the former should not be retriable, while the latter should retry).
TESTED=unit tests, remote worker on Bazel
PiperOrigin-RevId: 160605830
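The error handling described in this message amounts to a small decision table, sketched below. This is an illustration only: StatusCode is a local stand-in for the gRPC status codes rather than the gRPC library type, and RetryDecision, RetryPolicySketch, and classify are hypothetical names. Treating UNAVAILABLE as retriable and everything unlisted as fatal are assumptions that the message does not state.

```java
/** Local stand-in for a few gRPC status codes (not the gRPC library enum). */
enum StatusCode {
  CANCELLED,
  DEADLINE_EXCEEDED,
  UNAVAILABLE,
  INVALID_ARGUMENT
}

/** Hypothetical outcome of inspecting a failed call. */
enum RetryDecision {
  RETRY,
  FAIL,
  IGNORE
}

final class RetryPolicySketch {
  /**
   * Classifies a failed remote call.
   *
   * @param commandTimedOutOnServer true if the deadline was exceeded because
   *     the command itself timed out on the server
   * @param threadInterrupted true if the calling thread was interrupted
   */
  static RetryDecision classify(
      StatusCode code, boolean commandTimedOutOnServer, boolean threadInterrupted) {
    switch (code) {
      case CANCELLED:
        // Expected when the thread is interrupted, so swallow it in that case.
        return threadInterrupted ? RetryDecision.IGNORE : RetryDecision.FAIL;
      case DEADLINE_EXCEEDED:
        // A command that genuinely timed out should not be run again.
        return commandTimedOutOnServer ? RetryDecision.FAIL : RetryDecision.RETRY;
      case UNAVAILABLE:
        return RetryDecision.RETRY; // Assumption: transient transport failure.
      default:
        return RetryDecision.FAIL; // Assumption: everything else is fatal.
    }
  }

  public static void main(String[] args) {
    System.out.println(classify(StatusCode.DEADLINE_EXCEEDED, false, false));
  }
}
```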
|
PiperOrigin-RevId: 160285362
|
Connection pooling is a useful optimization for REST caches that are
far away as it avoids constantly redoing the TCP handshake. It also
prevents large builds from exhausting the local interface's source
ports through tens of thousands of one-transaction connections.
The default connection pool size of 20 is fairly arbitrary. Users
probably want to set this to something close to the value of --jobs.
We introduce some generic infrastructure for closing remote cache
instances and use it to cleanly shutdown the connection pool between
builds.
Change-Id: I73adc29ecae15cc10a1217ffbaa483892bcd4f9a
PiperOrigin-RevId: 160264681
|
- merge all the inputs upload functionality into a single ensureInputsPresent
method
- merge all of the action result upload functionality into a single upload
method
- merge all the download functionality into a single download method
This significantly simplifies the caller of this interface, and opens the door
for additional performance improvements in implementations which now have more
control over the upload / download flows; in particular, in the gRPC case, we
can upload stdout / stderr using the existing chunker - upload of stdout /
stderr is no longer serialized with file upload.
In particular, the CachedLocalSpawnRunner test becomes much simpler, since it
no longer needs to handle the previous more complex upload code path.
PiperOrigin-RevId: 160260161
|
Fixes #3189.
Fixes #2823.
PiperOrigin-RevId: 159699146
|
Move everything to ActionExecutionContext, and drop Executor wherever possible.
This clarifies the API, makes it simpler to test, and simplifies the code.
PiperOrigin-RevId: 159414816
|
PiperOrigin-RevId: 159221067
|
deprecating the wait_for_completion field.
Note on errors: in the RemoteWorker, I currently handle all errors as onError of the watch call. Other options are: pass them as the operation error field, and pass some of them as the onError of the execute call. For now, I'm just using the simplest option; the Bazel client is ready to handle all possible options.
RELNOTES: none
PiperOrigin-RevId: 158974207
|
PiperOrigin-RevId: 158503746
|
https://docs.google.com/document/d/1AaGk7fOPByEvpAbqeXIyE8HX_A3_axxNnvroblTZ_6s/edit
Also refactored away the various *Interface* files, no need since unit testing can be done with mocking the appropriate gRPC Impl classes directly (see tests). This also fixes the RemoteSpawnRunner, which should use different objects for remote caching and remote execution, the same way RemoteSpawnStrategy does.
RELNOTES: n/a
PiperOrigin-RevId: 158473700
|
I likely broke that in 7f6e27f, although it was a pre-existing condition:
previously, the remote worker was reporting non-zero exit as a failure. Now it
reports the run as successful with a non-zero exit code. Update
RemoteSpawnStrategy to handle the exit code and generate an appropriate error.
In the future, we won't throw exceptions for non-zero exit at this level, but
instead report the non-zero exit in the SpawnResult, and have the caller
inspect that (and generate the error if applicable).
Fixes #3121.
Change-Id: Ia39f5c2ef5622544285c1957bb9ebae89e58edf2
Closes #3130.
PiperOrigin-RevId: 158120222
|
Move AuthAndTLSOptions to its own package, so that tests/remote no longer
depends on lib:runtime.
RELNOTES: None.
PiperOrigin-RevId: 157469629
|
Instead, print the downloaded bytes directly to stdout / stderr.
PiperOrigin-RevId: 157435933
|
Update the command line flags used by remote execution/caching as well as the
build event service (BES).
Major changes:
- Remote execution/caching and BES share flags for authentication and TLS.
- Removed API Key authentication from BES, as it's not being used.
- Add TLS support to BES upload.
- Add --bes_project_id flag. If set, the value is propagated as part of BES
lifecycle events.
For reviewers:
Start your review at CommonRemoteAndBesOptions, BuildEventServiceOptions and
RemoteOptions. The other changes are mostly automatic IDE renames of fields and
flag updates in shell script tests.
RELNOTES: None.
PiperOrigin-RevId: 156553857
|
Fixes #1413.
PiperOrigin-RevId: 153684106
|
TESTED: local server
RELNOTES: n/a
PiperOrigin-RevId: 153599636
|
This is already fixed in the CachedLocalSpawnRunner, with tests there, which
will replace RemoteSpawnStrategy in the near future. For now, I'd like to get
this in in time for 0.5.0 to get test caching working.
Fixes #1413.
PiperOrigin-RevId: 153486592
|
accidentally regressed.
TESTED=local RemoteWorker without work_path
RELNOTES: n/a
PiperOrigin-RevId: 152806430
|
The new RemoteExecutionClient class only performs remote execution, and
nothing else; all higher-level functions, local retry, etc. will live outside
of the client.
In order to add unit tests, I had to add another layer of indirection between
the Grpc{RemoteExecutor,ActionCache} and GRPC, since GRPC generates final,
non-mockable classes. While a testing approach that uses a fake server can
also get some test coverage (as in GrpcActionCacheTest), it doesn't allow us
to test the full range of bad things that can happen at the GRPC layer.
The cloned implementation uses a single GRPC channel, as was recommended to
me by Jakob, who worked on GRPC. A single channel should be sufficiently
scalable, it's thread-safe, and it performs chunking internally. On the
server-side, the requests from a single channel can be dispatched to a thread
pool, so this should not be a blocker for server-side parallelism.
I also changed it to throw an exception whenever anything bad happens - this
makes it much more obvious if there's still a bug in this code; the old code
silently swallows many errors, falling back to local execution, which papers
over many issues.
Furthermore, we now return a RemoteExecutionResult to indicate whether the
action ran at all (regardless of exit code), as well as the exit code.
All in all, this implementation is closer to the production code we're using
internally, although quite a few things are still missing.
The cloned implementation is not hooked up to RemoteSpawnStrategy yet. It
also does not support combining remote caching with local execution, but note
that RemoteSpawnStrategy regressed in that respect and currently also does
not support that mode.
PiperOrigin-RevId: 151578409
|
The simple distributed caching in Bazel used ConcurrentMap as a blob store. It is incorrect to use an overloaded interface for this purpose.
This change defines a SimpleBlobStore interface that only has put(), get() and containsKey()
methods and allows a simple implementation of a blob store as remote cache for Bazel.
Also updated documentation to summarize the options available in the remote spawn strategy.
There is no functional change.
TESTED=shell integration tests
--
Change-Id: Iedff0bc4f06c4a93c398c53801014d998c3df13b
Reviewed-on: https://cr.bazel.build/9330
PiperOrigin-RevId: 151439467
MOS_MIGRATED_REVID=151439467
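A narrow blob store interface of the kind this message describes might look like the sketch below. The three method names come from the message; the parameter and return types, the null-on-miss behavior of get(), and the ConcurrentMapBlobStore and BlobStoreDemo classes are assumptions made for illustration and may differ from the real Bazel code.

```java
import java.util.Arrays;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Sketch of a blob store that exposes only the three operations needed. */
interface SimpleBlobStore {
  boolean containsKey(String key);

  /** Returns the stored bytes, or null if the key is absent (assumption). */
  byte[] get(String key);

  void put(String key, byte[] value);
}

/** Hypothetical adapter showing how a ConcurrentMap can back the interface. */
final class ConcurrentMapBlobStore implements SimpleBlobStore {
  private final ConcurrentMap<String, byte[]> map;

  ConcurrentMapBlobStore(ConcurrentMap<String, byte[]> map) {
    this.map = map;
  }

  @Override
  public boolean containsKey(String key) {
    return map.containsKey(key);
  }

  @Override
  public byte[] get(String key) {
    byte[] value = map.get(key);
    // Copy so callers cannot mutate the stored blob.
    return value == null ? null : value.clone();
  }

  @Override
  public void put(String key, byte[] value) {
    map.put(key, value.clone());
  }
}

final class BlobStoreDemo {
  public static void main(String[] args) {
    SimpleBlobStore store = new ConcurrentMapBlobStore(new ConcurrentHashMap<>());
    store.put("digest", new byte[] {1, 2, 3});
    System.out.println(store.containsKey("digest"));
    System.out.println(Arrays.toString(store.get("digest")));
  }
}
```

Callers that depend on the narrow interface cannot reach the rest of the ConcurrentMap API, which is the point of replacing the overloaded interface.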
|
execution.
If you're feeling like you've already seen this, that's correct: these were the exact contents of commit e860316559eac366d47923a8eb4b5489a661aa35... and then, on Nov 15, something unclear happened and the code disappeared! Perhaps it was the result of a faulty sync. In any case, nobody noticed, and the CL went in. It was later rolled back and resubmitted, but the crucial code changes were gone.
TESTED=local server with profiling for SHA1 specifically
RELNOTES: n/a
--
PiperOrigin-RevId: 151139685
MOS_MIGRATED_REVID=151139685
|
This fixes remote test execution.
Fixes #1593.
--
PiperOrigin-RevId: 151030133
MOS_MIGRATED_REVID=151030133
|
line. This will be used during development to test new toolchains in docker
containers.
Example usage: --experimental_remote_platform_override='entry:{ name:"a" value:"b" } entry:{ name:"c" value:"d" }'
TESTED=local server
--
PiperOrigin-RevId: 149933081
MOS_MIGRATED_REVID=149933081
|
It's silly that we require every spawn strategy to do this individually, and
the new spawn scheduler will fix this. However, it's useful to add this for
debugging.
--
PiperOrigin-RevId: 149743992
MOS_MIGRATED_REVID=149743992
|
--
PiperOrigin-RevId: 149110466
MOS_MIGRATED_REVID=149110466
|
Move all local resource acquisition to where local execution actually happens.
Don't attempt to acquire resources per action, but only for individual spawns.
This significantly simplifies the code.
The downside is that we don't account for action-level work anymore. In
general, actions should not perform any process execution themselves, but
always delegate such work to a SpawnStrategy implementation.
This change makes sure that every Spawn has local resources set in a way that
is consistent with the previous state.
However, there are two actions - Fileset and FileWrite -, which are not spawns,
and so we now don't limit their concurrent execution anymore. For Fileset, all
work is done in a custom Fileset-specific thread pool, so this shouldn't be a
problem. I'm not sure about FileWriteAction.
--
PiperOrigin-RevId: 149012600
MOS_MIGRATED_REVID=149012600
|
execution. Usually it is enabled, and will be triggered whenever a remote_cache is specified but a remote_worker is not; however, this flag allows disabling it specifically for the cases where remote_worker is defined, but something went wrong and we were forced to execute locally. This is useful to save time when the remote API does not actually support setting the remote action result.
--
PiperOrigin-RevId: 149007755
MOS_MIGRATED_REVID=149007755
|
Specifying both options can cause OOM on OSX.
--
Change-Id: I52daf194a8840f9e63f1d537f13152e53f8436a7
Reviewed-on: https://cr.bazel.build/8220
PiperOrigin-RevId: 145079331
MOS_MIGRATED_REVID=145079331
|
This allows Bazel to talk to multiple instances of the server, if these exist, enabling server-side parallelism (due to using separate gRPC channels).
TESTED: internally and local server
--
PiperOrigin-RevId: 142262973
MOS_MIGRATED_REVID=142262973
|
--
PiperOrigin-RevId: 141817345
MOS_MIGRATED_REVID=141817345
|
execution cache.
--
PiperOrigin-RevId: 141807596
MOS_MIGRATED_REVID=141807596
|
Helps debugging.
--
PiperOrigin-RevId: 141802189
MOS_MIGRATED_REVID=141802189
|
an ActionFileInputCache for SHA1 digests used with remote execution.
I missed an important test failure -- turned out we have a remote execution
test that does not live under lib/remote.
--
MOS_MIGRATED_REVID=140135277
|
*** Reason for rollback ***
--
MOS_MIGRATED_REVID=139640949
|
--
MOS_MIGRATED_REVID=139613925
|
--
MOS_MIGRATED_REVID=133584935
|