Commit log
mechanism, since the meta optimizer only checks if it has been cancelled before running each sub-optimizer. We can add cancellation to each sub-optimizer if necessary.
PiperOrigin-RevId: 216234262
PiperOrigin-RevId: 215254762
PiperOrigin-RevId: 215014737
the duration of a single RunInternal() call from RunHandlerPool. It is used for
running inter-op closures with a global scheduler (in the future) to improve
both median and tail latency (for use cases like CPU inference).
In the case that global pools aren't used, this change should be a no-op.
PiperOrigin-RevId: 214992852
PiperOrigin-RevId: 214853846
the duration of a single RunInternal() call from RunHandlerPool.
We want to leverage this abstraction for improving the cross-session inter-op
parallelism for lower latency inference in the future.
In the case that global pools aren't used, this change should be a no-op.
PiperOrigin-RevId: 214818187
PiperOrigin-RevId: 214794973
PiperOrigin-RevId: 214275960
GPU). This avoids many unnecessary CPU<->GPU memcpys and syncs.
PiperOrigin-RevId: 214108484
PiperOrigin-RevId: 213875284
ROCmSoftwarePlatform:upstream-staging-gpu-common-runtime-1
PiperOrigin-RevId: 213653830
value}}` and `^^key:value^^`. This change consolidates these two formats.
PiperOrigin-RevId: 211550259
Rename CUDA GPU ID to platform GPU ID so that the notion is applicable to both
the CUDA and ROCm platforms.
RELNOTES: tfdbg: Limit the total disk space occupied by dumped tensor data to 100 GBytes. Add environment variable `TFDBG_DISK_BYTES_LIMIT` to allow adjustment of this upper limit.
PiperOrigin-RevId: 210648585
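As an illustrative sketch (not part of this commit), the limit could be adjusted by setting the environment variable before tfdbg begins dumping; the value is assumed to be in bytes, as the variable name suggests:

import os

# Hypothetical example: cap tfdbg's dumped tensor data at roughly 10 GB
# instead of the 100 GB default described above. Set this before creating
# the debug-wrapped session so the dump code picks it up.
os.environ["TFDBG_DISK_BYTES_LIMIT"] = str(10 * 1024 ** 3)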
PiperOrigin-RevId: 207394440
protobuf API to specify executor to use.
PiperOrigin-RevId: 206681376
At times, a server cannot open a reverse connection to the client. This is
required when using the _Send/_Recv ops and the client needs to send a tensor
to the server (tensors are pulled). Instead, this adds a way to push the
tensors directly from the client.
Currently, pushing tensors always happens in sync mode.
PiperOrigin-RevId: 205888825
PiperOrigin-RevId: 205712557
by proto3 semantics. Silently correct that value to 1, without logging
an error.
PiperOrigin-RevId: 205157429
PiperOrigin-RevId: 204789700
This is part of our effort to improve Python error messages by allowing the runtime to output formatted messages for the Python layer to interpolate. This will be gated by this config field to begin with.
PiperOrigin-RevId: 204731230
Example code is as follows:
config = tf.estimator.RunConfig(protocol='grpc+verbs')
nn = tf.estimator.Estimator(model_fn=model_fn,
                            model_dir=model_dir,
                            params=params,
                            config=config)
* debug_gateway and the related node_outputs_callback are not used and hence are removed in this CL.
PiperOrigin-RevId: 204519574
more than one device-to-device copy stream per GPU device.
This is an experimental feature that will have no effect unless
copy operations explicitly request a stream other than 0, which
currently does not occur anywhere in a standard build.
Eventually it may be of benefit in the presence of multiple
bi-directional concurrent data copies.
PiperOrigin-RevId: 202354513
when using Session::RunCallable().
PiperOrigin-RevId: 202234757
Since we respond with the shape, all RPCs will happen synchronously (note
that we may still hide the Python overhead, since the op is still scheduled for
execution via the eager executor).
PiperOrigin-RevId: 202207324
Update references in source files and docs in tensorflow and related projects.
PiperOrigin-RevId: 201766994
PiperOrigin-RevId: 201586130
PiperOrigin-RevId: 201095811
The efficiency of CollectiveReduce is greatly improved by merging
multiple parallel reductions over smaller tensors into a single
reduction over a larger tensor that is the concatenation of the
smaller tensors. Because CollectiveReduce is essentially an
element-wise array operation that operates on a 1-D reshape of
the input tensor, it is eligible for a ScopedAllocator optimization.

The optimization works by looking for serially independent instances
of CollectiveReduce that lie within the same name-scope tier and
have the same control-flow (e.g. loop) embedding structure. Where
two or more such nodes are found, the upstream nodes that generate
their inputs are modified to write their outputs into consecutive
regions of a single tensor buffer maintained by a ScopedAllocator.
The multiple CollectiveReduce nodes are then replaced by a single
CollectiveReduce that operates in-place on the backing buffer.

The effectiveness of the optimization depends on there being candidate
CollectiveReduce nodes with these characteristics that become eligible
for execution at close to the same time. If the name scope is too
large, and includes nodes that become execution-eligible at very different
times, this graph rewrite could result in a slowdown.

Note that this optimization is experimental: it is not guaranteed to
work, especially for ops other than CollectiveReduce.
PiperOrigin-RevId: 198089642
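For illustration only (not taken from this commit), enabling the experimental rewrite from Python would look roughly like this, assuming the scoped_allocator_optimization and scoped_allocator_opts fields of RewriterConfig:

import tensorflow as tf
from tensorflow.core.protobuf import rewriter_config_pb2

# Sketch: turn on the ScopedAllocator graph rewrite for CollectiveReduce.
config = tf.ConfigProto()
rewrite_opts = config.graph_options.rewrite_options
rewrite_opts.scoped_allocator_optimization = rewriter_config_pb2.RewriterConfig.ON
rewrite_opts.scoped_allocator_opts.enable_op.append("CollectiveReduce")

# Sessions created with this config let Grappler merge eligible
# CollectiveReduce nodes as described above.
sess = tf.Session(config=config)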
Revert #18413. Too many internal test failures due to the name scope change caused by this change.
Revert #18192. Cannot use re2::StringPiece internally. Need alternative for set call. Will pull and clean this up in a separate change.
PiperOrigin-RevId: 197991247
Complete just enough of the core implementation to run
multi-device collectives locally within a single process.
Interfaces are still private and not available for general use.
PiperOrigin-RevId: 197617132
PiperOrigin-RevId: 197490523
PiperOrigin-RevId: 197439191
After transitioning to proto3, it was no longer possible to distinguish between the absence of
LoggingRequest::rpc_logging and it being set to false. This led to a bug that ignored
log-disabling messages in some implementations, which meant that logging was never
disabled. This fix adds explicit fields in LoggingRequest for enabling and disabling RPC
logging.
PiperOrigin-RevId: 196782547
PiperOrigin-RevId: 196762618
PiperOrigin-RevId: 196170800
Distributed-mode implementations of CollectiveRemoteAccess.
Extend Worker interface with corresponding new methods.
This change is part of a series of changes introducing infrastructure
for collective ops and initial implementations of reduction and broadcast.
PiperOrigin-RevId: 196010718
Distributed-mode implementations of DeviceResolverInterface
and ParamResolverInterface. Extend Worker interface with
new methods in support of these interfaces.
This change is part of a series of changes introducing infrastructure
for collective ops and initial implementations of reduction and broadcast.
PiperOrigin-RevId: 194984585
PiperOrigin-RevId: 194596337
PiperOrigin-RevId: 194262260
PiperOrigin-RevId: 194031845
Previously, if the session handle was unrecognized by the worker, it
would default to using the LegacySession. This prevents us from
noticing that a server has been restarted.
To address the problem in a backwards-compatible way, we add a bit to
each session-handle-carrying worker request, indicating whether the
master believes that CreateWorkerSession has been called. If this bit
is set and the handle is unrecognized, the worker will raise an
AbortedError, which can be caught by high-level frameworks such as
`tf.estimator`.
Note that CreateWorkerSession is not yet used by default, and a
follow-up change will add that.
PiperOrigin-RevId: 193427057
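A rough sketch (not from this commit) of how a client built directly on tf.Session could react to the new behavior; tf.estimator handles this internally. The target, graph, train_op, and step count are assumed inputs supplied by the caller:

import tensorflow as tf

def train(target, graph, train_op, num_steps):
    # Rebuild the session when a restarted worker rejects our session
    # handle with AbortedError, as described above.
    while True:
        try:
            with tf.Session(target=target, graph=graph) as sess:
                for _ in range(num_steps):
                    sess.run(train_op)
            return
        except tf.errors.AbortedError:
            # The worker was restarted and no longer recognizes the handle;
            # start a fresh session and retry. (This restarts from step 0;
            # a real framework would restore from a checkpoint.)
            continue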
Doesn't add to the public API yet, just shifts code around. Changes:
- A tiny bit of renaming (to avoid having _Checkpoint and Checkpoint in the same file)
- Removed the garbage collection decorator from a few tests due to the uuid4() garbage issue (apparently core tests get run on Python 2.7.9?)
- Renamed "Object" to "CheckpointableObject" in the proto, since core protos have Java bindings and apparently Java had something else in mind for the keyword "Object" :)
but otherwise this is a pure move.
After this CL I'll propose adding tf.train.Checkpoint to the API (currently tf.contrib.eager.Checkpoint), move the utilities that are still in contrib/eager to their own contrib directory (there will be a few more misc. utilities for inspecting checkpoints and managing dependencies), get tf.train.Saver to read object-based checkpoints for compatibility, and work on Model.save_weights/load_weights.
PiperOrigin-RevId: 192646890
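For context, a minimal sketch of the object-based API referenced above, using tf.contrib.eager.Checkpoint (the name this change keeps for now; tf.train.Checkpoint is the proposed public name):

import tensorflow as tf
import tensorflow.contrib.eager as tfe

tf.enable_eager_execution()

# Sketch: variables are tracked by object reference rather than by name.
step = tfe.Variable(0, name="global_step")
checkpoint = tfe.Checkpoint(step=step)

save_path = checkpoint.save("/tmp/ckpt/demo")  # write an object-based checkpoint
step.assign_add(1)
checkpoint.restore(save_path)                  # match saved values back to tracked objects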
PiperOrigin-RevId: 191964391
PiperOrigin-RevId: 191960845
Don't crash in layout optimizer if no cluster is given.
Clean up Cluster::DisableOptimizer() so it actually turns all current optimizers off.
PiperOrigin-RevId: 191368433
PiperOrigin-RevId: 190391193
PiperOrigin-RevId: 189719711
PiperOrigin-RevId: 189641729