Commit message

PiperOrigin-RevId: 184291701

deduped.
PiperOrigin-RevId: 184289685

PiperOrigin-RevId: 184285125

dereferenced ref tensor.
Previously the inliner would add an identity node with an invalid ref-type
attr when the actual parameter had ref type. The changed version removes
the reference.
PiperOrigin-RevId: 184285084

When partitioned variables are used in a TPU training loop, concat
gradient operations are generated, and XLA requires the concat dimension
argument of these ops to be a constant (or foldable to a constant).
However, since that constant is defined outside of the training while
context, an Enter node is generated to pass it in. The fix detects this
case and duplicates the (scalar) constant inside the while context, so
that XLA can successfully process the resulting graph.
PiperOrigin-RevId: 184273245

PiperOrigin-RevId: 184247187

PiperOrigin-RevId: 184240222

PiperOrigin-RevId: 184239740

PiperOrigin-RevId: 184236409

PiperOrigin-RevId: 184233513

PiperOrigin-RevId: 184227786

PiperOrigin-RevId: 184225409

PiperOrigin-RevId: 184220615

PiperOrigin-RevId: 184220515

Now, whenever we want to operate in dependency order, we use execution_plan.
It begins as the identity ordering (0, ..., nodes_size() - 1) but can be
changed in the future. This is the basis for more pluggable delegation.
PiperOrigin-RevId: 184216885
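Roughly, the idea is as follows (a minimal Python sketch of the concept only; the actual interpreter is C++ and its API differs):

```python
class InterpreterSketch:
    """Conceptual sketch of the execution_plan idea, not the real TFLite API."""

    def __init__(self, nodes):
        self.nodes = nodes
        # Starts out as the identity ordering 0, ..., nodes_size() - 1.
        self.execution_plan = list(range(len(nodes)))

    def invoke(self):
        # Everything that needs dependency order walks execution_plan,
        # not the raw node list, so a delegate can later rewrite the plan.
        for node_index in self.execution_plan:
            self.nodes[node_index]()


# A delegate could, for example, shrink the plan to a subset of nodes:
interp = InterpreterSketch([lambda: print("op 0"), lambda: print("op 1")])
interp.execution_plan = [1]
interp.invoke()  # runs only node 1
```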

PiperOrigin-RevId: 184213576

Previously, 1 token was approximately 256 bytes. This is slightly less
intuitive than 1 KB.
PiperOrigin-RevId: 184212503

batch norm) to support quantized training. The weights are now always
scaled by gamma/sigma, where sigma is the moving standard deviation, for
stability prior to quantization. For improved performance, the moving
means and variances are frozen and the training graph is modified
accordingly.
An additional parameter, freeze_batch_norm_delay, is added to the
FoldBatchNorm function to set the delay at which training switches from
regular batch norm to frozen means and variances.
Placement options are removed within FoldBatchNorm, as they caused folded
training to place all ops on a single GPU; this change significantly
speeds up distributed training.
The tests for folding batch norms are also updated to reflect the
additional topological changes to the graph.
PiperOrigin-RevId: 184211434
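The gamma/sigma scaling described above is the standard batch-norm folding identity; a small NumPy sketch of it (the function name and epsilon are illustrative, not the API added by this change):

```python
import numpy as np

def fold_batch_norm(weights, gamma, beta, moving_mean, moving_variance, eps=1e-3):
    # y = gamma * (x @ weights - mean) / sigma + beta
    #   = x @ (weights * gamma / sigma) + (beta - mean * gamma / sigma)
    sigma = np.sqrt(moving_variance + eps)      # moving standard deviation
    scale = gamma / sigma                       # per-output-channel scale
    folded_weights = weights * scale            # broadcast over the last axis
    folded_bias = beta - moving_mean * scale
    return folded_weights, folded_bias
```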

PiperOrigin-RevId: 184205196

PiperOrigin-RevId: 184202470

PiperOrigin-RevId: 184202425

PiperOrigin-RevId: 184201506

PiperOrigin-RevId: 184194895

The alternative to this is an adaptive approach that would unevenly split
the input into per-tower batches. The concern with that is that all towers
would be as slow as the one with the most input, reducing performance.
Batch size seems to be commonly tailored to the available hardware.
PiperOrigin-RevId: 184192793
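In other words, the input is split evenly and the caller is expected to pick a batch size divisible by the number of towers; a tiny sketch of that policy (names are illustrative):

```python
def split_batch(batch, num_towers):
    # Even split only: rejecting uneven splits avoids giving one tower a
    # larger shard that would make every step as slow as that tower.
    if len(batch) % num_towers != 0:
        raise ValueError("batch size %d is not divisible by %d towers"
                         % (len(batch), num_towers))
    per_tower = len(batch) // num_towers
    return [batch[i * per_tower:(i + 1) * per_tower]
            for i in range(num_towers)]
```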

invalid set of fetch nodes
PiperOrigin-RevId: 184192790

* Change how switch grouping works:
  - This is an intermediate step; the next one is combining
    DetermineBranchMapAndFrontier into one traversal.
* Make the naming consistent (switch_nodes -> switches);
* Change graph dumping to be controlled by a class member - it is still
  only performed when the vlog level is sufficiently high;
* Pass in the correct library when dumping graphs;
PiperOrigin-RevId: 184188816

TensorFlow Serving Model Server.
PiperOrigin-RevId: 184188752

reduction=SUM_OVER_BATCH_SIZE.
PiperOrigin-RevId: 184186573

PiperOrigin-RevId: 184183730

PiperOrigin-RevId: 184183725

PiperOrigin-RevId: 184179246

generated code.
Add a helper for the control dependencies context manager.
PiperOrigin-RevId: 184176409

PiperOrigin-RevId: 184174800

PiperOrigin-RevId: 184173047

This change ensures that a shared iterator (which requires a private
FunctionLibraryRuntime that outlasts the calling op's runtime, because
it can outlive a single session) uses the same Device as a non-shared
iterator, and hence capturing resources from the creating graph will
work as intended.
Fixes #16481.
PiperOrigin-RevId: 184172498
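As an illustration of the pattern the fix is about (not code from the change; TF 1.x-era API, and the lookup-table helper used here is an assumption), a shared iterator whose dataset captures a resource from the creating graph:

```python
import tensorflow as tf

# A resource created in this graph and captured by the dataset's map fn.
table = tf.contrib.lookup.index_table_from_tensor(
    tf.constant(["a", "b", "c"]))

dataset = tf.data.Dataset.from_tensor_slices(["a", "c", "b"])
dataset = dataset.map(table.lookup)

# shared_name lets the iterator outlive a single session; the fix keeps it
# on the same device as a non-shared iterator so the captured table is
# still reachable.
iterator = dataset.make_initializable_iterator(shared_name="shared_iter")
next_element = iterator.get_next()

with tf.Session() as sess:
    sess.run(tf.tables_initializer())
    sess.run(iterator.initializer)
    print(sess.run(next_element))
```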

in the process.
PiperOrigin-RevId: 184172483

PiperOrigin-RevId: 184169668

PiperOrigin-RevId: 184165180

Fixes #16167.
PiperOrigin-RevId: 184160925

PiperOrigin-RevId: 184160009

of the graph to enable inference of the shape of a SendFromHost op once
the shapes of the corresponding RecvAtHost ops are known.
PiperOrigin-RevId: 184153187

PiperOrigin-RevId: 184141875

Add support for int32 indices to the MatrixBandPart operator.
PiperOrigin-RevId: 184133343
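For example, with the TF 1.x Python endpoint, the num_lower/num_upper arguments can be passed as int32 scalars (a minimal sketch; -1 means keep the full band on that side):

```python
import tensorflow as tf

x = tf.reshape(tf.range(16, dtype=tf.float32), [4, 4])
# Keep one sub-diagonal and the full upper triangle; the band limits are
# int32 scalars (num_upper = -1 keeps everything above the diagonal).
band = tf.matrix_band_part(x,
                           tf.constant(1, dtype=tf.int32),
                           tf.constant(-1, dtype=tf.int32))

with tf.Session() as sess:
    print(sess.run(band))
```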

- The @org_tensorflow package designation is unnecessary, and breaks the
  build when building without a sandbox.
- The generated tests must use tf_cc_test, not cc_test. See the note in
  tensorflow/core/BUILD.
Partially addresses #15338
PiperOrigin-RevId: 184095571

PiperOrigin-RevId: 184088913

PiperOrigin-RevId: 184086955

PiperOrigin-RevId: 184085402

PiperOrigin-RevId: 184078894