* PiperOrigin-RevId: 175304705
* PiperOrigin-RevId: 175304150
* I attempted to exercise compute_sum_on_device(IndexedSlices) via the
  DNNClassifier test, per a reviewer's suggestion. This changelist is a way
  to do it correctly. I verified that it indeed triggers the required
  codepath by adding logging and by removing
  compute_sum_on_device(IndexedSlices) support.
  PiperOrigin-RevId: 175303333
* PiperOrigin-RevId: 175302425
* PiperOrigin-RevId: 175297329
* … fusion instructions.
  PiperOrigin-RevId: 175295981
* PiperOrigin-RevId: 175277161
* PiperOrigin-RevId: 175275184
* … the release process.
  See #13872.
  PiperOrigin-RevId: 175261983
* The tiling dimension corresponding to the number of vector registers in
  the tile can be changed easily. Expose this value as a backend-specific
  flag so that we can experiment with it to find a good default value.
  This CL also fixes a bug exposed by a variable tiling factor in the
  row-major GEMV implementation. It wasn't caught before because having
  tile_rows == tile_cols hides the bug.
  PiperOrigin-RevId: 175258553
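  The interplay between the two tile sizes is easier to see in a standalone
  sketch. This is plain NumPy for illustration, not the XLA CPU emitter, and
  tiled_gemv and its default tile sizes are made-up names:

      import numpy as np

      def tiled_gemv(A, x, tile_rows=4, tile_cols=8):
          # Row-major tiled y = A @ x with independent tile dimensions.
          m, n = A.shape
          y = np.zeros(m, dtype=A.dtype)
          for i in range(0, m, tile_rows):
              for j in range(0, n, tile_cols):
                  # An index bug that swaps tile_rows and tile_cols here is
                  # invisible whenever tile_rows == tile_cols.
                  tile = A[i:i + tile_rows, j:j + tile_cols]
                  y[i:i + tile_rows] += tile @ x[j:j + tile_cols]
          return y

      A = np.random.rand(6, 10)
      x = np.random.rand(10)
      assert np.allclose(tiled_gemv(A, x), A @ x)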
* … on it unless we lock each call to GetNext, which is not preferable.
  Each iterator now handles saving/restoring its exhausted state.
  As a guideline, we always reset the input_impl(s) when they get exhausted;
  this can serve as an indicator of exhausted-ness for non-terminal
  iterators, and it also reduces memory overhead.
  Each iterator should also handle calls to GetNextInternal when it is
  exhausted. Fixed this for some datasets.
  Also fix a bug in dataset_serialization_test_base: we were not saving a
  checkpoint after exhausting the iterator, so verify_exhausted_iterator was
  not really testing restoring an exhausted iterator.
  PiperOrigin-RevId: 175253023
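  A minimal plain-Python sketch of the convention (not the tf.data
  internals; MapLikeIterator and make_input are hypothetical names):

      class MapLikeIterator:
          def __init__(self, make_input):
              self._make_input = make_input
              self._input_impl = make_input()  # set to None once exhausted

          def get_next(self):
              if self._input_impl is None:
                  raise StopIteration  # stay well-defined after exhaustion
              try:
                  return next(self._input_impl)
              except StopIteration:
                  self._input_impl = None  # reset on exhaustion; frees memory
                  raise

          def save(self):
              # Exhausted-ness is encoded by whether input_impl still exists.
              return {"exhausted": self._input_impl is None}

          def restore(self, state):
              # A real iterator would also restore its position; elided here.
              self._input_impl = (None if state["exhausted"]
                                  else self._make_input())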
* This is necessary for providing bfloat support in the GPU backend.
  RELNOTES: bfloat support is now added to XLA infra.
  PiperOrigin-RevId: 175252067
* Nodes inside subcomputations (e.g. fusion computations) are always printed
  by the HLO graph dumper. Before this change, the dumper was not fully
  aware of this fact, leading it to mark as "deemphasized" (i.e. draw gray
  with a dashed outline) nodes that had no business being deemphasized.
  PiperOrigin-RevId: 175247474
* Also, give PaddingConfig its own ToString format.
  PiperOrigin-RevId: 175239832
* Previously we LOG(INFO)'ed the driver version, which meant it wouldn't be
  printed unless you passed --logtostderr. But this information is pretty
  important, especially since cudnnCreate failing is likely to be a fatal
  error.
  PiperOrigin-RevId: 175235628
* PiperOrigin-RevId: 175232587
* The "reduce metric variables" operation is a single operation across all
  metric variables, i.e. across all eval metrics. Previously, the update op
  of every eval metric was conditioned on its own copy of the overall
  "reduce metric variables" op. That op was meant to be idempotent, so the
  end result was supposed to be correct.
  However, "reduce metric variables" consists of a number of variable
  assignments and is therefore not atomic: if the execution of two copies
  interleaves, the end result can come out incorrect. This caused flakiness
  in replicate_model_fn_test.py. To fix the problem, there is now a single
  copy of "reduce metric variables", and every eval metric is associated
  with that single instance.
  PiperOrigin-RevId: 175232016
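  A toy illustration of the race in plain Python (not the actual TF ops),
  assuming the reduction folds every tower's copy of a variable into tower 0
  and zeroes the rest:

      v = [3, 4]  # one metric variable per tower

      def fold():        # assignment 1: accumulate into tower 0
          v[0] = sum(v)

      def clear_rest():  # assignment 2: zero the other towers
          for i in range(1, len(v)):
              v[i] = 0

      fold(); clear_rest()  # one copy alone: v == [7, 0]
      fold(); clear_rest()  # second sequential run: still [7, 0] (idempotent)

      v = [3, 4]
      fold()                # copy A, step 1: v == [7, 4]
      fold()                # copy B, step 1 interleaves: v == [11, 4]
      clear_rest(); clear_rest()
      print(v)              # [11, 0] -- tower 1 was double-counted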
* … Reduce, SelectAndScatter, Reverse, Slice, DynamicSlice,
  DynamicUpdateSlice, Transpose, BatchNormTraining, BatchNormInference,
  BatchNormGrad.
  PiperOrigin-RevId: 175231463
* PiperOrigin-RevId: 175230217
* PiperOrigin-RevId: 175229944
* PiperOrigin-RevId: 175228315
* PiperOrigin-RevId: 175228264
* PiperOrigin-RevId: 175225805
* PiperOrigin-RevId: 175219920
* Instead of assigning the pre- and post-optimization [hooks] to a singleton
  xla::Compiler object, prefer creating a short-lived CpuCompiler or
  GpuCompiler instance on the stack. Without this change, adding a second
  test case on the (Cpu|Gpu)Compiler in the same process triggers a
  use-after-free.
  (By the way, LLVMCompiler should really be spelled LlvmCompiler per Google
  C++ style; I'll do that rename shortly.)
  PiperOrigin-RevId: 175218617
* PiperOrigin-RevId: 175217850
* … differs from streaming_auc because it uses every prediction as a
  threshold rather than linearly spaced fixed thresholds.
  PiperOrigin-RevId: 175217002
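  The distinction, sketched in NumPy (illustrative only, not the tf.metrics
  implementation; roc_auc is a made-up helper):

      import numpy as np

      def roc_auc(labels, preds, thresholds):
          # One ROC point per threshold, then trapezoidal area.
          pts = sorted((np.mean(preds[labels == 0] >= t),   # FPR
                        np.mean(preds[labels == 1] >= t))   # TPR
                       for t in thresholds)
          xs, ys = zip(*pts)
          return np.trapz(ys, xs)

      labels = np.array([0, 0, 1, 0, 1, 1])
      preds = np.array([0.1, 0.4, 0.35, 0.5, 0.7, 0.8])
      exact = roc_auc(labels, preds, np.unique(preds))       # every prediction
      approx = roc_auc(labels, preds, np.linspace(0, 1, 5))  # fixed grid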
* Split Send into {Send, SendDone} and Recv into {Recv, RecvDone}. See
  operation_semantics.md for the updated semantics.
  PiperOrigin-RevId: 175216012
* PiperOrigin-RevId: 175213336
* … the labels.
  Clarify current backprop behavior.
  Original bugfix by Alexandre Passos.
  PiperOrigin-RevId: 175211803
* PiperOrigin-RevId: 175210678
* Previously, if you had a very large allocation, it would round up to the
  next power of 2 and then, if this didn't fit in the GPU's available
  memory, eat all remaining memory on the device.
  Now we waste at most 128 MB of memory on a large alloc.
  PiperOrigin-RevId: 175209995
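  Sketched in Python (illustrative; the exact rounding rule in the allocator
  may differ, and MAX_WASTE here just mirrors the 128 MB cap):

      MAX_WASTE = 128 << 20  # 128 MiB

      def rounded_alloc_size(nbytes):
          pow2 = 1 << (nbytes - 1).bit_length()  # next power of two
          if pow2 - nbytes <= MAX_WASTE:
              return pow2
          # Huge request: round up to the next 128 MiB boundary instead.
          return ((nbytes + MAX_WASTE - 1) // MAX_WASTE) * MAX_WASTE

      assert rounded_alloc_size(300) == 512
      big = (4 << 30) + 1  # just over 4 GiB
      assert rounded_alloc_size(big) - big < MAX_WASTE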
* PiperOrigin-RevId: 175207829
* … have already been applied.
  Make sure rewrites are idempotent by running the optimizer twice in unit
  tests.
  PiperOrigin-RevId: 175206742
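  The test pattern, sketched in Python (hypothetical names, not the actual
  optimizer API):

      def assert_rewrite_idempotent(optimize, graph):
          once = optimize(graph)
          twice = optimize(once)
          # The second pass must be a no-op if already-applied rewrites are
          # detected and skipped.
          assert twice == once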
* PiperOrigin-RevId: 175205782
* PiperOrigin-RevId: 175204075
* … types, not just Conv2D.
  PiperOrigin-RevId: 175204002
* PiperOrigin-RevId: 175203593
* I suspect that reducing local variables for eval metrics over more than
  one tower is flaky, but I haven't figured out why yet.
  PiperOrigin-RevId: 175201241
* PiperOrigin-RevId: 175200199
* Flaky in open source build.
  PiperOrigin-RevId: 175199083
* PiperOrigin-RevId: 175198248
* PiperOrigin-RevId: 175195239
* `gradients.gradients` may return computed gradients as IndexedSlices
  rather than a Tensor:
  https://github.com/tensorflow/tensorflow/blob/r1.4/tensorflow/python/ops/gradients_impl.py#L881.
  `replicate_model_fn` currently uses math_ops.add_n to aggregate gradients
  from all towers; add_n doesn't work with IndexedSlices, so they need to be
  handled separately.
  PiperOrigin-RevId: 175194893
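  A hedged sketch of the needed special-casing (TF 1.x-style Python; the
  changelist's actual helper may differ):

      import tensorflow as tf

      def sum_tower_grads(tower_grads):
          if isinstance(tower_grads[0], tf.IndexedSlices):
              # add_n only accepts Tensors, so concatenate the slices
              # instead; duplicate indices are summed by whatever consumes
              # the resulting IndexedSlices.
              return tf.IndexedSlices(
                  values=tf.concat([g.values for g in tower_grads], axis=0),
                  indices=tf.concat([g.indices for g in tower_grads], axis=0),
                  dense_shape=tower_grads[0].dense_shape)
          return tf.add_n(tower_grads)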
* COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/tensorflow/pull/14412 from yifeif:yifeif-patch-3 4b91380c6fc1f995d48a5f184e7307f776541bd0
  PiperOrigin-RevId: 175192097
* PiperOrigin-RevId: 175178089
* … cleanup.
  PiperOrigin-RevId: 175176635
* PiperOrigin-RevId: 175174326
* PiperOrigin-RevId: 175167946
* In practice this does not seem to make a difference, but I did it anyway
  for completeness.
  PiperOrigin-RevId: 175167706