| Commit message (Collapse) | Author | Age |
| |
the same graph so using them reduces the graph construction overhead.
PiperOrigin-RevId: 198090110
| |
PiperOrigin-RevId: 198089875
| |
The efficiency of CollectiveReduce is greatly improved by merging
multiple parallel reductions over smaller tensors into a single
reduction over a larger tensor that is the concatenation of the
smaller tensors. Because CollectiveReduce is essentially an
element-wise array operation that operates on a 1-D reshape of
the input tensor, it is eligible for a ScopedAllocation optimization.
The optimization works by looking for serially independent instances
of CollectiveReduce that lie within the same name-scope tier and
have the same control-flow (e.g. loop) embedding structure. Where
two or more such nodes are found, the upstream nodes that generate
their inputs are modified to write their outputs into consecutive
regions of a single tensor buffer maintained by a ScopedAllocator.
The multiple CollectiveReduce nodes are then replaced by a single
CollectiveReduce that operates in place on the backing buffer.
The effectiveness of the optimization depends on there being candidate
CollectiveReduce nodes with these characteristics that become eligible
for execution at close to the same time. If the name scope is too
large, and includes nodes that become execution-eligible at very different
times, this graph rewrite could result in a slowdown.
Note that this optimization is experimental: it is not guaranteed to
work, especially for ops other than CollectiveReduce.
PiperOrigin-RevId: 198089642
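The merge described above can be simulated outside TensorFlow. The following NumPy sketch (all names hypothetical, with the cross-worker collective modeled as a plain elementwise sum) shows why concatenating the inputs into one backing buffer preserves the per-tensor results while replacing many reduction launches with one:

```python
import numpy as np

def reduce_separately(tensors_per_worker):
    """Baseline: one reduction (here, an elementwise sum across
    workers) per tensor -- one collective launch each."""
    num_tensors = len(tensors_per_worker[0])
    return [sum(worker[i] for worker in tensors_per_worker)
            for i in range(num_tensors)]

def reduce_merged(tensors_per_worker):
    """Merged: each worker flattens and concatenates its tensors into a
    single backing buffer (the ScopedAllocator's role in the rewrite),
    one reduction runs over that buffer, and the result is split back
    into the original shapes."""
    shapes = [t.shape for t in tensors_per_worker[0]]
    sizes = [t.size for t in tensors_per_worker[0]]
    buffers = [np.concatenate([t.ravel() for t in worker])
               for worker in tensors_per_worker]
    reduced = sum(buffers)              # one collective instead of many
    splits = np.split(reduced, np.cumsum(sizes)[:-1])
    return [s.reshape(shape) for s, shape in zip(splits, shapes)]

# Two workers, each holding the same two small tensor slots.
workers = [
    [np.ones((2, 2)), np.full((3,), 2.0)],
    [np.ones((2, 2)), np.full((3,), 3.0)],
]
for a, b in zip(reduce_separately(workers), reduce_merged(workers)):
    assert np.array_equal(a, b)   # merged result matches per-tensor reductions
```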
| |
output in prediction ops (example id, tree leaf node index id) for input as other model features
PiperOrigin-RevId: 198087342
| |
PiperOrigin-RevId: 198085532
| |
PiperOrigin-RevId: 198083156
| |
PiperOrigin-RevId: 198079927
| |
PiperOrigin-RevId: 198078724
| |
PiperOrigin-RevId: 198077643
| |
PiperOrigin-RevId: 198073059
| |
PiperOrigin-RevId: 198071709
| |
PiperOrigin-RevId: 198070157
| |
Just output all arrays, before writing edges, so we don't
need to keep track of which arrays we've already output.
PiperOrigin-RevId: 198055327
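The two-pass structure can be sketched as follows; the dump format and all names are illustrative, not the actual dump code. Because every array node is declared up front, the edge-writing pass needs no "already output" bookkeeping:

```python
def dump_graph(ops):
    """Emit a Graphviz-style dump in two passes: first declare every
    array node once, then write the edges. ops is a list of
    (op_name, input_arrays, output_arrays) tuples (illustrative)."""
    lines = ["digraph G {"]
    arrays = []
    for _, inputs, outputs in ops:          # pass 1: collect all arrays
        for a in inputs + outputs:
            if a not in arrays:
                arrays.append(a)
    for a in arrays:                        # declare each array up front
        lines.append('  "%s" [shape=box];' % a)
    for op, inputs, outputs in ops:         # pass 2: edges only -- no
        for a in inputs:                    # "have we printed this
            lines.append('  "%s" -> "%s";' % (a, op))   # array?" checks
        for a in outputs:
            lines.append('  "%s" -> "%s";' % (op, a))
    lines.append("}")
    return "\n".join(lines)
```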
| |
PiperOrigin-RevId: 198044106
| |
PiperOrigin-RevId: 198022387
| |
PiperOrigin-RevId: 198017870
| |
PiperOrigin-RevId: 197996636
| |
PiperOrigin-RevId: 197993384
| |
PiperOrigin-RevId: 197993147
| |
PiperOrigin-RevId: 197991672
| |
Revert #18413. Too many internal test failures due to the name scope change caused by this change.
Revert #18192. Cannot use re2::StringPiece internally. Need alternative for set call. Will pull and clean this up in a separate change.
PiperOrigin-RevId: 197991247
| |
PiperOrigin-RevId: 197989813
| |
In a later change I will expand MemoryTile to store tiles and load "3d" tiles
(where we broadcast along one dimension as we load).
PiperOrigin-RevId: 197987185
| |
Also move AlgorithmPicker after layout assignment, since
cudnn_convolution_runner now returns failures on invalid input layouts.
Also add a backend debug option to switch the layout heuristic; by default
it keeps the old behavior (all NCHW).
PiperOrigin-RevId: 197983747
| |
PiperOrigin-RevId: 197979118
| |
becomes a float64 tensor.
Earlier, py_seq_tensor would fall back to float32 unless float64 was
explicitly requested (which would not happen if we had no other information).
PiperOrigin-RevId: 197977260
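The new fallback behavior can be illustrated with a small sketch (not the actual py_seq_tensor code): honor an explicitly requested dtype, otherwise defer to NumPy-style inference, under which a Python float infers as float64 rather than silently falling back to float32.

```python
import numpy as np

def infer_dtype(py_value, requested_dtype=None):
    """Illustrative sketch of the inference rule: an explicitly
    requested dtype wins; with no request, NumPy's own inference
    decides, so a Python float yields float64."""
    if requested_dtype is not None:
        return np.dtype(requested_dtype)
    return np.asarray(py_value).dtype

assert infer_dtype(1.0) == np.float64            # no request: float64
assert infer_dtype(1.0, "float32") == np.float32  # explicit request wins
```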
| |
Before this change, when we executed a naked variable read (i.e. outside of
a defun, directly running <xla_device>->Compute()), the tf2xla kernel would
copy the variable's tensor, leading to many unnecessary copies.
This change uses the regular non-tf2xla kernel for naked variable reads
and marks the tf2xla one for CompilationOnly().
PiperOrigin-RevId: 197976146
| |
PiperOrigin-RevId: 197974385
| |
(_magic_gradient_function was renamed to _gradient_function)
Before:
entry {
name: "MicroBenchmarks.benchmark_tf_gradient_forward_identity"
iters: 30000
wall_time: 5.88456789653
extras {
key: "examples_per_sec"
value {
double_value: 169936.011885
}
}
}
After:
entry {
name: "MicroBenchmarks.benchmark_tf_gradient_forward_identity"
iters: 30000
wall_time: 5.04853725433
extras {
key: "examples_per_sec"
value {
double_value: 198077.175551
}
}
}
PiperOrigin-RevId: 197972668
| |
PiperOrigin-RevId: 197969642
| |
PiperOrigin-RevId: 197968452
| |
These maps aren't really pulling their weight; fold them into the
instruction that they compute.
PiperOrigin-RevId: 197967117
| |
PiperOrigin-RevId: 197965508
| |
Applying a generator to a class is the same as applying that generator to every member of that class. It is meant to allow avoiding repetition in some cases.
The implementation relies on some internals of parameterized tests and how they work with a class-level declaration: https://github.com/abseil/abseil-py/blob/master/absl/testing/parameterized.py#L319.
The "mode" argument was required before this change. To accommodate cases where execution mode isn't the point of the test, "mode" is now optional, with "graph" mode as the default. Another idea I had was to pick a random mode by default.
PiperOrigin-RevId: 197964501
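A rough sketch of the class-level application, independent of the abseil-py internals the real implementation relies on: decorating a class is treated as decorating each of its test methods. All names here are illustrative.

```python
def apply_to_class_members(generator):
    """Sketch: applying a test generator to a class is equivalent to
    applying it to every test_* method of that class."""
    def decorate(cls):
        for name, member in list(vars(cls).items()):
            if name.startswith("test") and callable(member):
                setattr(cls, name, generator(member))
        return cls
    return decorate

# A toy generator that records which tests it wrapped.
calls = []
def generator(fn):
    def wrapper(self):
        calls.append(fn.__name__)
        return fn(self)
    return wrapper

@apply_to_class_members(generator)   # one decoration, every member wrapped
class MyTest:
    def test_a(self): return 1
    def test_b(self): return 2

t = MyTest()
assert t.test_a() == 1 and t.test_b() == 2
assert calls == ["test_a", "test_b"]   # both members went through generator
```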
| |
collective_graph_key can be passed in when collective ops are used
in variable initialization.
PiperOrigin-RevId: 197964316
| |
PiperOrigin-RevId: 197963232
| |
PiperOrigin-RevId: 197959602
| |
PiperOrigin-RevId: 197959536
| |
PiperOrigin-RevId: 197959372
| |
PiperOrigin-RevId: 197952565
| |
PiperOrigin-RevId: 197949637
| |
* Source file contents are now sent one file at a time, making it less likely that
individual messages will exceed the 4-MB gRPC message size limit.
* In case the message for a single source file still exceeds the limit, the client handles
it gracefully by skipping the send and printing a warning message.
Fixes: https://github.com/tensorflow/tensorboard/issues/1118
PiperOrigin-RevId: 197949416
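A sketch of the per-file sending with the size guard described above; `send`, the function name, and the file paths are placeholders for the actual gRPC call and data, and only the 4-MB figure comes from the commit message.

```python
GRPC_MESSAGE_LIMIT = 4 * 1024 * 1024  # 4 MB, per the commit message

def send_source_files(files, send, limit=GRPC_MESSAGE_LIMIT):
    """Send each source file as its own message. Any single file whose
    content alone exceeds the limit is skipped with a warning instead
    of failing the whole upload. `send` stands in for the gRPC call."""
    skipped = []
    for path, content in files.items():
        if len(content) > limit:
            print("WARNING: skipping %s (%d bytes exceeds %d-byte limit)"
                  % (path, len(content), limit))
            skipped.append(path)
            continue
        send(path, content)   # one gRPC message per file
    return skipped

# Usage sketch: one small file is sent, one oversized file is skipped.
sent = []
skipped = send_source_files(
    {"small.py": b"x" * 10, "huge.py": b"y" * (GRPC_MESSAGE_LIMIT + 1)},
    lambda path, content: sent.append(path))
assert sent == ["small.py"]
assert skipped == ["huge.py"]
```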
| |
PiperOrigin-RevId: 197943921
| |
PiperOrigin-RevId: 197942379
| |
PiperOrigin-RevId: 197942180
| |
PiperOrigin-RevId: 197941740
| |
PiperOrigin-RevId: 197941474
| |
This addresses a race condition where LookupOrCreate is called at the same time from two threads, and both Lookup()s fail, so the creator() function is run twice, even though only a single Create() will then succeed.
The motivation is that some creator() functions have side-effects, e.g. tf.contrib.summary.create_file_writer()'s init op opens an events file. This change ensures that if two init ops for file writers with the same resource name are run in the same session.run() call, only one events file will be created. (Current behavior will often open two files; typically the second one overwrites the first but this won't happen if the filename_suffix values are different or the timestamps happen to straddle a second boundary.)
PiperOrigin-RevId: 197940607
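A minimal sketch of the fixed contract, in Python rather than the actual C++ ResourceMgr: holding one lock across both the lookup and the create guarantees creator() runs at most once per key, so a side-effecting creator (such as one that opens an events file) cannot run twice. Names here are illustrative, not the real API.

```python
import threading

class ResourceMgr:
    """Sketch of the fixed LookupOrCreate behavior: the lock covers the
    lookup AND the create, so concurrent callers cannot both observe a
    failed lookup and each run creator()."""

    def __init__(self):
        self._resources = {}
        self._mu = threading.Lock()

    def lookup_or_create(self, key, creator):
        with self._mu:                      # serializes lookup + create
            if key not in self._resources:
                # Side-effecting creators now run exactly once per key.
                self._resources[key] = creator()
            return self._resources[key]

mgr = ResourceMgr()
calls = []
def creator():
    calls.append(1)          # the side effect we want to happen once
    return object()

threads = [threading.Thread(target=mgr.lookup_or_create,
                            args=("writer", creator)) for _ in range(8)]
for t in threads: t.start()
for t in threads: t.join()
assert len(calls) == 1       # creator ran once despite 8 concurrent calls
```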
| |
PiperOrigin-RevId: 197939808
| |
The waiting was implemented to avoid reading stale models as much as possible.
However, with this dependency each input column creates a Send/Recv to PS0,
which slows down training significantly.
Colocate Quantile and Stats accumulators for the same handler.
PiperOrigin-RevId: 197939327