| Commit message | Author | Age |
|
|
|
|
|
|
|
|
|
| |
This provides an io.grpc.ClientInterceptor implementation that can be used to log gRPC call information. The interceptor can select a logging handler to use based on the gRPC method being called (Watch, Execute, Write, etc.) to build a LogEntry, which can then be logged after the call has finished. Unit tests for the interceptor are included.
In this change, the interceptor is never invoked, nor are there any handlers implemented for any gRPC methods. The interceptor also never tries to log any entries.
To avoid circular dependency issues (Remote library will depend on logger which depends on remote library for utils), I've factored out the utility classes from the remote library into their own directory/package as part of this change.
PiperOrigin-RevId: 187926516
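The per-method handler selection described above can be pictured with a small, dependency-free sketch. It deliberately does not use the real io.grpc API: `LoggingHandler`, `HandlerSelector`, and the string-based entry are hypothetical stand-ins chosen for illustration, and the actual interceptor may be structured quite differently.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for a per-method logging handler; not a real Bazel or gRPC type.
interface LoggingHandler {
  // Builds a log entry (here just a string) once the call has finished.
  String buildEntry(String request, String status);
}

// Hypothetical stand-in for the interceptor's "pick a handler by method name" step.
final class HandlerSelector {
  private final Map<String, LoggingHandler> handlers = new HashMap<>();

  void register(String methodName, LoggingHandler handler) {
    handlers.put(methodName, handler);
  }

  // Returns an entry if a handler is registered for the method, and empty otherwise.
  Optional<String> entryFor(String methodName, String request, String status) {
    LoggingHandler handler = handlers.get(methodName);
    if (handler == null) {
      return Optional.empty();
    }
    return Optional.of(handler.buildEntry(request, status));
  }
}
```

With no handlers registered, every lookup returns empty, which matches the state this change describes: the selection mechanism exists, but nothing is logged yet.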
|
|
|
|
|
|
|
|
| |
The current behavior is already correct; this just adds a test to make sure we retry reads as we should.
TESTED=the unit test
RELNOTES: None
PiperOrigin-RevId: 187398578
|
|
|
|
|
|
|
|
| |
So far, nobody uses it, but I want to start using this field soon.
TESTED=unit test
RELNOTES: None
PiperOrigin-RevId: 186290375
|
|
|
|
|
|
| |
Closes #4609.
PiperOrigin-RevId: 185032751
|
|
|
|
|
|
|
|
|
|
|
| |
This is to prevent this error:
SEVERE: *~*~*~ Channel io.grpc.internal.ManagedChannelImpl-56 for target directaddress:///io.grpc.inprocess.InProcessSocketAddress@3ecbfba1 was not shutdown properly!!! ~*~*~*
Make sure to call shutdown()/shutdownNow() and awaitTermination().
TESTED=ran tests
RELNOTES: None
PiperOrigin-RevId: 185020683
|
|
|
|
|
|
|
|
| |
I moved it into DigestUtil preemptively in case we switch to binary instead of hex representation.
TESTED=manually
RELNOTES: None
PiperOrigin-RevId: 185007558
|
|
|
|
|
|
|
|
| |
This is an important regression; we will want to patch the fix into 0.10.
TESTED=fixed unit test, with A/B testing
RELNOTES: Resolved an issue where a failure in the remote cache would not trigger local re-execution of an action.
PiperOrigin-RevId: 184991670
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* This puts in the foundation of HTTP/2 support for remote caching.
* Allows us to remove the Apache HTTP library as a dependency, reducing
the Bazel binary size by 1MiB.
On fast networks (e.g. GCE to GCS) we can see a >2x speed improvement for TLS
throughput. Even from my workstation to GCS I get significant build time
improvements when using Netty's TLS (18s vs 12s).
Closes #4481.
PiperOrigin-RevId: 183411787
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Local execution has an inherent race condition: if a user modifies a file while an action is executed, then it is impossible for Bazel to tell which version of the file was actually read during action execution. The file may have been modified before or after the tool has read it, or, in the worst case, the tool may have read both the original and the modified version. In addition, the file may be changed back to the original state before Bazel can check the file, so computing the digest before / after may not be sufficient.
This is a concern for both local and remote caches, although the cost of poisoning a shared remote cache is significantly higher, and is what has triggered this work.
Fixes #3360.
We solve this by keeping a reference to the FileContentsProxy, and using that to check for modifications before storing the cache entry. We output a warning if this check fails.
This change does not increase memory consumption; Java objects are always allocated in multiples of 8 bytes, we use compressed oops, and the FileArtifactValue currently has 12 bytes worth of fields (excl. object overhead), so adding another pointer is effectively free.
As a possible performance optimization on purely local builds, we could also consider not computing digests at all, and only use the FileContentsProxy for caching.
PiperOrigin-RevId: 182510358
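The "check for modifications before storing the cache entry" step can be sketched as follows. This is only an illustration under stated assumptions: `FileSnapshot` is a hypothetical stand-in for FileContentsProxy, and the metadata it compares (size, modification time, file key) is an assumption, since the commit message does not say which fields the real class tracks.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.Objects;

// Hypothetical stand-in for FileContentsProxy: cheap metadata captured when a file is first read.
final class FileSnapshot {
  private final long size;
  private final long modifiedMillis;
  private final Object fileKey; // may be null on file systems that do not expose one

  private FileSnapshot(long size, long modifiedMillis, Object fileKey) {
    this.size = size;
    this.modifiedMillis = modifiedMillis;
    this.fileKey = fileKey;
  }

  static FileSnapshot of(Path path) throws IOException {
    BasicFileAttributes attrs = Files.readAttributes(path, BasicFileAttributes.class);
    return new FileSnapshot(attrs.size(), attrs.lastModifiedTime().toMillis(), attrs.fileKey());
  }

  // True if the file's current metadata still matches what was captured earlier.
  boolean stillMatches(Path path) throws IOException {
    FileSnapshot now = of(path);
    return size == now.size
        && modifiedMillis == now.modifiedMillis
        && Objects.equals(fileKey, now.fileKey);
  }
}
```

A caller would take the snapshot before running the action and call `stillMatches` just before writing the cache entry, skipping the write and printing a warning when it returns false.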
|
|
|
|
|
|
|
|
|
|
|
| |
This class represents a root (such as a package path or an output root) used for file lookups and artifacts. It is meant to be as opaque as possible in order to hide the user's environment from sky keys and sky functions.
Roots are used by RootedPaths and ArtifactRoots.
This CL attempts to make the minimum number of modifications necessary to change RootedPath and ArtifactRoot to use these fields. Deprecated methods and invasive accessors are permitted to minimise the risk of any observable changes.
RELNOTES: None
PiperOrigin-RevId: 182271759
|
|
|
|
|
|
| |
This is slightly more descriptive, and we will potentially want to use the name Root for a broader concept shared between ArtifactRoot and RootedPath.
PiperOrigin-RevId: 182082367
|
|
|
|
|
|
|
|
| |
This method violates the invariant that derived roots are never equal to the exec root. Only source roots can be equal to the exec root.
Note that this method was only used in tests, so this CL should be completely safe as long as its tests pass.
PiperOrigin-RevId: 181998483
|
|
|
|
|
|
| |
They need this to parse input manifests. Previously we would grab the exec root from the Root, but we want to stop supporting this.
PiperOrigin-RevId: 181669143
|
|
|
|
|
|
|
|
| |
This simplifies some spawn runners, which no longer have to specially handle
null; unfortunately, the sandbox runners do not support VirtualActionInput,
so they still have to special-case it.
PiperOrigin-RevId: 181175408
|
|
|
|
|
|
|
|
| |
Fix a bug where Bazel would crash if two Directory protos had the same
hash.
RELNOTES: Remote Caching and Execution support output directories.
PiperOrigin-RevId: 179731040
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
--auth_* flags only work with Google Cloud Authentication. That's
confusing and restricts the naming of more general purpose authentication
flags that we might want to add in the future. So instead of --auth_*
let's call them --google_* (the old ones will continue working for a
while).
Also, --auth_enabled (aka --google_default_credentials) is no longer required
when specifying --auth_credentials (aka --google_credentials).
So now there are two simple ways to authenticate with Google Cloud:
* bazel build --google_default_credentials
* bazel build --google_credentials=creds.json
RELNOTES: --auth_* flags were renamed to --google_* flags. The old names
will continue to work for this release but will be removed in the next
release.
Change-Id: Ia1736f32e15a37995be3172cd9608d518ddeab44
PiperOrigin-RevId: 179700832
|
|
|
|
|
|
| |
Add support for directory trees as artifacts. Closes #4011.
PiperOrigin-RevId: 179691001
|
|
|
|
|
|
|
|
|
|
|
|
| |
--auth_scopes can be passed a comma-separated list of authentication
scopes.
Add "https://www.googleapis.com/auth/devstorage.read_write" to the list
of defaults. This scope is used when using Google Cloud Storage (GCS) as
a remote caching backend.
Change-Id: I62e6fed28b28737823ad6c70cbc5048b3a3190b5
PiperOrigin-RevId: 179548090
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add support for Google Cloud Storage (GCS) as a HTTP caching backend.
This commit mainly adds the infrastructure necessary to authenticate
to GCS.
Using GCS as a caching backend works as follows:
1) Create a new GCS bucket.
2) Create a service account that can read and write to the GCS bucket
and generate a JSON credentials file for it.
3) Invoke Bazel as follows:
   bazel build \
     --remote_rest_cache=https://storage.googleapis.com/<bucket> \
     --auth_enabled \
     --auth_scope=https://www.googleapis.com/auth/devstorage.read_write \
     --auth_credentials=</path/to/creds.json>
I'll add simplifications and docs in a subsequent commit.
Change-Id: Ie827d7946a2193b97ea7d9aa72eb15f09de2164d
PiperOrigin-RevId: 179406380
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
To reproduce: run a failing test with --experimental_remote_spawn_cache or with --spawn_strategy=remote and no executor. Expected: test log is uploaded.
Desired behavior:
- regardless of whether a spawn is cacheable or not, its artifacts should be uploaded to the remote cache.
- the spawn result should only be set if the spawn is cacheable *and* the action succeeded.
- when executing remotely, the do_not_cache field should be set for non-cacheable spawns, and the remote execution engine should respect it.
This CL contains multiple fixes to ensure the above behaviors, and adds a few tests, both end to end and unit tests. Important behavior change: it is no longer assumed that non-cacheable spawns should use a NO_CACHE SpawnCache! The appropriate test case was removed. Instead, an assumption was added that all implementations of SpawnCache should respect the Spawns.mayBeCached(spawn) property. Currently, only NO_CACHE and RemoteSpawnCache exist, and they (now) support it.
TESTED=remote build execution backend.
WANT_LGTM: philwo,buchgr
RELNOTES: None
PiperOrigin-RevId: 178617937
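The first two bullets of the desired behavior amount to a small decision rule, sketched here. `UploadDecision` and its fields are hypothetical names, and treating "the action succeeded" as "exit code 0" is an assumption made for the example.

```java
// Hypothetical value type describing what to do with a finished spawn.
final class UploadDecision {
  final boolean uploadArtifacts;
  final boolean storeActionResult;

  private UploadDecision(boolean uploadArtifacts, boolean storeActionResult) {
    this.uploadArtifacts = uploadArtifacts;
    this.storeActionResult = storeActionResult;
  }

  // Artifacts (such as test logs) are always uploaded; the result is only
  // stored when the spawn is cacheable and the action succeeded.
  static UploadDecision forSpawn(boolean mayBeCached, int exitCode) {
    boolean succeeded = exitCode == 0; // assumption: success means a zero exit code
    return new UploadDecision(true, mayBeCached && succeeded);
  }
}
```

For a failing, non-cacheable test this yields "upload artifacts, do not store the result", which is the reproduction case named at the top of the message.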
|
|
|
|
|
|
|
|
|
| |
- Replace the existing Retrier with Retrier2.
- Rename Retrier2 to Retrier and remove the old Retrier + RetryException
class.
RELNOTES: None.
PiperOrigin-RevId: 177835070
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Refactor the FileSystem class to include the hash function as an
instance field. This allows us to have a different hash function
per FileSystem and removes technical debt, as currently that's
somewhat accomplished by a horrible hack that has a static method
to set the hash function for all FileSystem instances.
The FileSystem's default hash function remains MD5.
RELNOTES: None
PiperOrigin-RevId: 177479772
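A minimal sketch of the idea, holding the hash function as per-instance state rather than a global setting. `SketchFileSystem` is a hypothetical, heavily simplified class and does not reflect the real FileSystem API; only the MD5 default is taken from the message above.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical, heavily simplified file system that owns its hash function.
final class SketchFileSystem {
  private final String digestAlgorithm;

  // The default stays MD5, as in the commit message.
  SketchFileSystem() {
    this("MD5");
  }

  SketchFileSystem(String digestAlgorithm) {
    this.digestAlgorithm = digestAlgorithm;
  }

  // Returns the lowercase hex digest of the bytes using this instance's hash function.
  String hexDigest(byte[] contents) {
    try {
      MessageDigest md = MessageDigest.getInstance(digestAlgorithm);
      StringBuilder hex = new StringBuilder();
      for (byte b : md.digest(contents)) {
        hex.append(String.format("%02x", b));
      }
      return hex.toString();
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalArgumentException("Unknown digest algorithm: " + digestAlgorithm, e);
    }
  }
}
```

Two instances can then coexist with different hash functions, for example `new SketchFileSystem()` next to `new SketchFileSystem("SHA-256")`, which a static setter cannot offer.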
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It turns out that the SUCCESS status is often misunderstood to mean "zero exit",
even though this is clearly documented. I've decided to add another status for
non-zero exit, and use success only for zero exit to avoid this pitfall.
Also, many of the status codes are set, but never used. I decided to reduce the
number of status codes to only those that are actually relevant, which
simplifies further processing. Instead, we should add a string message for the
error case when we need one - we're not using it right now, so I decided not to
add that yet.
PiperOrigin-RevId: 177129441
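The split between "zero exit" and "non-zero exit" might look roughly like this. The enum and helper names are hypothetical, and the third constant is an assumed placeholder: the message does not list which status codes were actually kept.

```java
// Hypothetical reduced status set; the exact codes kept by the change are not listed in the message.
enum SketchStatus {
  SUCCESS,          // the process ran and exited with code 0
  NON_ZERO_EXIT,    // the process ran but exited with a non-zero code
  EXECUTION_FAILED  // assumed catch-all for cases where the process could not be run
}

final class SketchStatuses {
  private SketchStatuses() {}

  // Maps the exit code of a process that did run to a status.
  static SketchStatus fromExitCode(int exitCode) {
    return exitCode == 0 ? SketchStatus.SUCCESS : SketchStatus.NON_ZERO_EXIT;
  }
}
```

With this shape, callers that only care about "did the command pass" can compare against `SUCCESS` without also inspecting the exit code.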
|
|
|
|
|
|
|
|
| |
Fixes #3930. Also added tests.
TESTED=added tests
RELNOTES: Fixing regression to --experimental_remote_spawn_cache
PiperOrigin-RevId: 172852740
|
|
|
|
|
|
|
|
| |
We should only fall back if a remote execution error occurred, not if the command itself failed.
TESTED=better unit tests
RELNOTES: None
PiperOrigin-RevId: 172406687
|
|
|
|
|
|
|
|
| |
Adding unit tests.
TESTED=unit tests
RELNOTES: None
PiperOrigin-RevId: 170750220
|
|
|
|
|
|
|
| |
The old field is the error on Operation proto. The new field is the ExecuteResponse status field.
Note that the new field will also allow us to fetch logs for timing out tests, resolving a TODO, but this is not yet done in this change.
PiperOrigin-RevId: 170370676
|
|
|
|
|
|
|
| |
build.lib.actions.SpawnActionContext can import SpawnResult without creating a cyclic dependency.
RELNOTES: None.
PiperOrigin-RevId: 169642267
|
|
|
|
|
|
| |
TESTED=unit tests
RELNOTES: none
PiperOrigin-RevId: 169395919
|
|
|
|
|
|
|
|
|
| |
tests fails, we still want to be able to download the logs and other outputs from CAS.
This fixes a bug introduced by https://github.com/bazelbuild/bazel/commit/562fcf9f5dfd14daea718f77da95b43b1400689b. To reproduce: run a failing test vs a BES service, the test log would not be uploaded.
TESTED=unit tests
PiperOrigin-RevId: 169143428
|
|
|
|
| |
PiperOrigin-RevId: 168359681
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add a generic retrier implementation (Retrier2) that can be
configured by plugging in a backoff strategy, a function to
decide on retriable errors and a circuit breaker. A concrete
implementation is added via RemoteRetrier that mostly is a
copy of the code of the existing Retrier.
Retrier2 adds support for circuit breaking [1]. It allows the
retrier to reject execution when failure rates are high. The
remote execution code will use this to gently switch between
local and remote execution/caching if the latter experiences
lots of failures.
Retrier2 is also useful when not used with gRPC. We need
retriers for the HTTP caching interface too.
All the code added in this CL is unused, to keep reviews
manageable. In a follow-up CL, I will switch the code to use
the new Retrier and delete the old retrier.
[1] https://martinfowler.com/bliki/CircuitBreaker.html
PiperOrigin-RevId: 168355597
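The three pluggable parts named above (backoff strategy, retriable-error predicate, circuit breaker) can be sketched in a few dozen lines. This is not the Retrier2 code: `SketchRetrier` and `SimpleCircuitBreaker` are hypothetical, the backoff is a fixed array of delays, and the breaker simply opens after a number of consecutive failures with no recovery state.

```java
import java.util.concurrent.Callable;
import java.util.function.Predicate;

// Hypothetical circuit breaker: opens after a fixed number of consecutive failures.
final class SimpleCircuitBreaker {
  private final int failureThreshold;
  private int consecutiveFailures;

  SimpleCircuitBreaker(int failureThreshold) {
    this.failureThreshold = failureThreshold;
  }

  boolean isOpen() {
    return consecutiveFailures >= failureThreshold;
  }

  void recordSuccess() {
    consecutiveFailures = 0;
  }

  void recordFailure() {
    consecutiveFailures++;
  }
}

// Hypothetical retrier assembled from a backoff, a retriable-error predicate and a breaker.
final class SketchRetrier {
  private final long[] backoffMillis; // delay before each retry; the length is the retry limit
  private final Predicate<Exception> retriable;
  private final SimpleCircuitBreaker breaker;

  SketchRetrier(long[] backoffMillis, Predicate<Exception> retriable, SimpleCircuitBreaker breaker) {
    this.backoffMillis = backoffMillis;
    this.retriable = retriable;
    this.breaker = breaker;
  }

  <T> T execute(Callable<T> call) throws Exception {
    if (breaker.isOpen()) {
      // Reject up front so the caller can fall back, e.g. to local execution.
      throw new IllegalStateException("circuit breaker is open; call rejected");
    }
    int attempt = 0;
    while (true) {
      try {
        T result = call.call();
        breaker.recordSuccess();
        return result;
      } catch (Exception e) {
        breaker.recordFailure();
        if (!retriable.test(e) || attempt >= backoffMillis.length || breaker.isOpen()) {
          throw e;
        }
        Thread.sleep(backoffMillis[attempt]);
        attempt++;
      }
    }
  }
}
```

Because nothing here depends on gRPC, the same shape would also fit an HTTP caching client, which is the reuse the message mentions.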
|
|
|
|
|
|
|
|
|
|
| |
For any errors that are due to failures in the remote caching /
execution layers Bazel now returns exit code 34 (ExitCode.REMOTE_ERROR).
This includes errors where the remote cache / executor is unreachable or
crashes. It does not include errors if the test / build failure is due
to user errors i.e. compilation or test failures.
PiperOrigin-RevId: 167259236
|
|
|
|
| |
PiperOrigin-RevId: 166981977
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If the upload of local build artifacts fails, the build no longer fails
but instead a warning is printed once. If --verbose_failures is
specified, a detailed warning is printed for every failure.
This helps with fixing #2964; however, it doesn't fully fix it due to timeouts
and retries slowing the build significantly.
Also, add some other tests related to fallback behavior.
Change-Id: Ief49941f9bc7e0123b5d93456d77428686dd5268
PiperOrigin-RevId: 165938874
|
|
|
|
|
|
|
|
| |
should no longer be thrown by any CAS implementations.
TESTED=unit tests
RELNOTES: none
PiperOrigin-RevId: 165937395
|
|
|
|
|
|
|
| |
A bug was introduced in patch 9626bb4923c74c6d3c09b7438eb24b32191053df, where a cache miss would not result in action re-execution, making the cache miss non-recoverable.
RELNOTES: fixes #3552
PiperOrigin-RevId: 165434579
|
|
|
|
|
|
|
|
|
|
| |
AbstractSpawnRunner now uses a SpawnCache if one is registered; this allows
adding caching to any spawn runner without having to be aware of the
implementations.
I will delete the old CachedLocalSpawnRunner in a follow-up CL.
PiperOrigin-RevId: 165024382
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
state.
A remote cache must never serve a failed action. However, if it did,
Bazel would not detect this and simply fail and display an error message
that's hard to distinguish from a local execution failure.
Bazel now displays a clear error message stating what went wrong.
RELNOTES: None.
PiperOrigin-RevId: 164975631
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Change the persistent worker spawn strategy to extend
AbstractSpawnStrategy and put the actual logic into
WorkerSpawnRunner. WorkerTestStrategy is unaffected.
I had to extend SpawnPolicy with a speculating() method. Persistent
workers need to know if speculation is happening in order to require
sandboxing.
Additionally, I added java_test rules for the local runner tests and
worker tests. See https://github.com/bazelbuild/bazel/issues/3481.
NOTE: ulfjack@ made some changes to this change before merging:
- changed Reporter to EventHandler; added TODO about its usage
- reverted non-semantic indentation change in AbstractSpawnStrategy
- reverted a non-semantic indentation change in WorkerSpawnRunner
- updated some internal classes to match
- removed catch IOException in WorkerSpawnRunner in some cases,
removed verboseFailures flag from WorkerSpawnRunner, updated callers
- disable some tests on Windows; we were previously not running them,
now that we do, they fail :-(
Change-Id: I207b3938f0dc84d374ab052d5030020886451d47
PiperOrigin-RevId: 164965398
|
|
|
|
|
|
|
| |
All prefetching now goes through AbstractSpawnStrategy's implementation of
SpawnExecutionPolicy. Make sure the sandbox runners also do this consistently.
PiperOrigin-RevId: 164836877
|
|
|
|
| |
PiperOrigin-RevId: 164577062
|
|
|
|
|
|
|
|
|
|
|
| |
BES and Remote Execution have separate implementations of gRPC channel
creation, authentication and TLS. We should merge them, to avoid
duplication and bugs. One such bug is #3640, where the BES code had a
different implementation for Google Application Default Credentials.
RELNOTES: The Build Event Service (BES) client now properly supports
Google Application Default Credentials.
PiperOrigin-RevId: 164253879
|
|
|
|
|
|
|
| |
Also, restructure the code for better read- and testability.
Change-Id: Ibdd0413f89e4687b836b768a9e7d6315234cb825
PiperOrigin-RevId: 163322658
|
|
|
|
|
|
|
|
| |
Also added tests specifically for the output, to ensure we don't break it again.
TESTED=remote worker, unit tests
RELNOTES: fixes #3380
PiperOrigin-RevId: 163283558
|
|
|
|
|
|
|
|
|
| |
success.
This can happen per spec, if multiple builds try to upload the same blob concurrently.
Also, added this to the RemoteWorker, per spec.
PiperOrigin-RevId: 162647548
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- add an id for logging; this allows us to correlate log entries for the same
spawn from multiple spawn runner implementations in the future
- add a prefetch method to the SpawnExecutionPolicy; better than relying on
the ActionInputPrefetcher being injected in the constructor
- add a name parameter to the report method; this is in preparation for a
single unified SpawnStrategy implementation - it's basically the last bit of
difference between SandboxStrategy and RemoteSpawnStrategy; they're otherwise
equivalent (if not identical)
PiperOrigin-RevId: 162194684
|
|
|
|
|
| |
RELNOTES: None.
PiperOrigin-RevId: 161970540
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
#3379
Commit dc24004873c335 broke the upload of locally executed action
results. Also, update our unit tests, which did not catch this error.
P.S.: olaola@ is the author of this change, but due to time constraints
we had to merge it while she was asleep.
Change-Id: Ib150152c0bddc8311908c105aef208506d3b6a8d
PiperOrigin-RevId: 161954553
|
|
|
|
|
|
|
|
|
|
|
| |
- Move flag handling into RemoteModule to fail as early as possible.
- Make error messages from flag handling human readable.
- Fix a bug where remote execution would only support TLS with a root
certificate being specified.
- If a remote executor without a remote cache is specified, assume the
remote cache to be the same as the executor.
PiperOrigin-RevId: 161946029
|