Previously, we printed everything about the kernel *except* its name.
:)
PiperOrigin-RevId: 178037469
See #15137.
PiperOrigin-RevId: 178037461
tensor
Notes: for learning tasks built on sparse signals, most of the tensors that go
into the returned tensors are embeddings, which are potentially useful for
applications that consume embeddings from other models. This makes it easy for
the caller to retrieve these tensors and build their own custom signatures.
PiperOrigin-RevId: 178033410
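As an illustration of the use case described above, here is a minimal TF 1.x sketch of turning a returned embedding tensor into a custom serving signature; `embedding_op` is a hypothetical stand-in for a tensor produced by the model function, not part of this change.

```python
import tensorflow as tf

# Hypothetical stand-in for an embedding tensor that the model function
# returns; a real caller would pull it out of the returned tensors.
embedding_op = tf.placeholder(tf.float32, [None, 64], name="embedding")

# Build a custom signature that exposes the embedding (TF 1.x API).
signature = tf.saved_model.signature_def_utils.build_signature_def(
    outputs={
        "embedding": tf.saved_model.utils.build_tensor_info(embedding_op)
    },
    method_name=tf.saved_model.signature_constants.PREDICT_METHOD_NAME)
```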
Previously, some code paths through `tf.train.export_meta_graph()` did
not ensure that the function library was persisted in the resulting
`MetaGraphDef`. This would break serialization for meta-graphs that
included `tf.data` pipelines that used functions. This fix ensures
that the library is copied to all such meta-graphs.
Fixes #15019. Fixes #14143.
PiperOrigin-RevId: 178033103
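A minimal sketch of the scenario this fixes, assuming the TF 1.x API of the period: `Dataset.map()` registers a function in the graph's function library, and the fix ensures `export_meta_graph()` copies that library into the `MetaGraphDef`.

```python
import tensorflow as tf

with tf.Graph().as_default():
    # Dataset.map() adds a function to the graph's function library.
    dataset = tf.data.Dataset.range(10).map(lambda x: x * 2)
    next_element = dataset.make_one_shot_iterator().get_next()

    meta_graph_def = tf.train.export_meta_graph()
    # With the fix, the map function is persisted, so the pipeline
    # survives a round trip through tf.train.import_meta_graph().
    assert len(meta_graph_def.graph_def.library.function) > 0
```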
PiperOrigin-RevId: 178033021
PiperOrigin-RevId: 178032838
an optional embedded message itself containing a repeated int field
(now called 'dims'). This matches existing shape structures (both in
Toco internally and in TensorFlow) and is necessary in order to
disambiguate between a 0-dimensional shape and an undefined/unknown
shape. This is a necessary prerequisite, in particular, for allowing
Toco to operate without given fixed input shapes, as so far these
were impossible to disambiguate from fixed 0-dimensional shapes.
PiperOrigin-RevId: 178027064
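The distinction being encoded is the same one TensorFlow's Python shape type already makes; a quick illustration:

```python
import tensorflow as tf

scalar_shape = tf.TensorShape([])     # known shape with zero dimensions
unknown_shape = tf.TensorShape(None)  # undefined/unknown shape

print(scalar_shape.ndims)   # 0    -> a fixed 0-dimensional shape
print(unknown_shape.ndims)  # None -> rank not known at all
```

With a flat repeated field, both cases would serialize to an empty list of dims; wrapping the field in an optional message makes "absent" distinguishable from "empty".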
idempotent.
PiperOrigin-RevId: 178026253
of a grappler item's save/restore subgraph.
PiperOrigin-RevId: 178025696
PiperOrigin-RevId: 178021454
Current users are unaffected. Running
`//tensorflow/core/common_runtime_direct_session_test
--benchmarks=all`, which stresses the Arg and Retval ops, reveals no
performance change.
PiperOrigin-RevId: 178015803
PiperOrigin-RevId: 178013302
PiperOrigin-RevId: 178010405
PiperOrigin-RevId: 178009859
generate input data that is constrained for certain entry computation
parameters. Generate fake literals that are within bounds for DynamicSlice
and other operations that accept dynamically computed indices.
PiperOrigin-RevId: 178006866
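A minimal illustrative sketch (plain Python, not the XLA implementation) of the DynamicSlice bound: each start index must lie in [0, dim_size - slice_size], so fake index data is drawn from that range. The helper name is made up.

```python
import numpy as np

def fake_start_indices(operand_shape, slice_sizes, rng=np.random):
    # Keep each start index within [0, dim - slice_size] so that the
    # slice stays inside the operand.
    return [rng.randint(0, dim - size + 1)   # upper bound is exclusive
            for dim, size in zip(operand_shape, slice_sizes)]

print(fake_start_indices([8, 10], [2, 3]))  # e.g. [5, 4]
```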
PiperOrigin-RevId: 177999275
PiperOrigin-RevId: 177994155
This change teaches the configure script how to search for Android NDK
and SDK installations and create new WORKSPACE rules pointing to them.
It also refactors many similar loop-over-user-input functions into a
single reusable helper (the more complex ones are left alone).
Specifying an SDK directory will further query for the available SDK API
levels and build-tools versions, but it won't perform any compatibility
checks.
Like other settings, every Android-related setting can be set beforehand
via an environment variable. The script will not ask for any Android
settings if there are already any Android repository rules in the
WORKSPACE.
The script will emit a warning if using an NDK version newer than 14 due
to https://github.com/bazelbuild/bazel/issues/4068.
PiperOrigin-RevId: 177989785
PiperOrigin-RevId: 177989542
Also add support for rank != 4 tensors to the TF/XLA fused batchnorm
operators, although the TF core ops don't actually support other ranks
yet, so this path is untested.
PiperOrigin-RevId: 177987592
This requires absl-py 0.1.6.
Also remove the manual tag on //tensorflow/python:app_test.
PiperOrigin-RevId: 177986813
Also fix a TODO in XlaOpRegistry to filter by the types allowed by the
OpDef.
Also see #14798.
PiperOrigin-RevId: 177986664
This option is necessary to mimic the Python import_graph_def method's
behavior.
PiperOrigin-RevId: 177986165
PiperOrigin-RevId: 177972555
PiperOrigin-RevId: 177971801
Change dependency optimizer to remove isolated NoOps when it is safe.
Fix bug in arithmetic optimizer: only remove deduped nodes if we know
the fetches.
PiperOrigin-RevId: 177970063
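An illustrative sketch of why the fetch set matters for the deduplication fix (Grappler itself is C++; everything below is made-up Python): a duplicate node might itself be fetched, so without knowing the fetches it is not safe to drop it.

```python
def dedupe_nodes(nodes, fetches=None):
    """nodes: (name, signature) pairs; equal signatures mean duplicates."""
    if fetches is None:
        # Fetches unknown: any node might be fetched, so dropping a
        # duplicate could silently remove a requested output.
        return nodes
    seen_signatures = set()
    kept = []
    for name, signature in nodes:
        if signature in seen_signatures and name not in fetches:
            continue  # duplicate of a kept node, and nobody fetches it
        seen_signatures.add(signature)
        kept.append((name, signature))
    return kept

nodes = [("a", "Add(x,y)"), ("b", "Add(x,y)")]
print(dedupe_nodes(nodes))                 # unchanged: fetches unknown
print(dedupe_nodes(nodes, fetches={"a"}))  # [("a", "Add(x,y)")]
```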
PiperOrigin-RevId: 177966156
PiperOrigin-RevId: 177964932
requiring a ShapeTree.
PiperOrigin-RevId: 177956572
PiperOrigin-RevId: 177956552
PiperOrigin-RevId: 177953076
that rng instructions are not rematerialized. This also lists Rng as
non-rematerializable.
PiperOrigin-RevId: 177932160
with input shape != output shape.
PiperOrigin-RevId: 177920882
PiperOrigin-RevId: 177908680
Use ShapedBuffer to allocate required memory for the shape, then transfer the
literal to the allocated addresses on each replica. Also, add an Allocate()
method to ShapedBuffer.
PiperOrigin-RevId: 177900588
rather than Pad.
PiperOrigin-RevId: 177896187
Also arrange for continuous testing with GPUs.
PiperOrigin-RevId: 177895214
PiperOrigin-RevId: 177892591
This fixes subtle problems with partitioned variables.
PiperOrigin-RevId: 177892499
copy.
PiperOrigin-RevId: 177891209
PiperOrigin-RevId: 177890892
The colocation attrs must be updated after all NodeDefs have been
processed. The nodes are processed and uniquified in topological
order, which lets us update their inputs as we go, but the same
approach doesn't work for the colocation groups.
I also considered updating all the NodeDefs with prefixes or unique
names at the very beginning, before starting conversion. This would
make the logic simpler, but require us to potentially keep a full copy
of all the NodeDefs in memory (so we could edit them), so I decided to
edit in place after construction. We might want to consider this
alternative in the future, though.
PiperOrigin-RevId: 177890362
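The uniquification behavior in question is observable from Python, where importing the same GraphDef twice renames the second copy; a small illustration (TF 1.x API):

```python
import tensorflow as tf

g1 = tf.Graph()
with g1.as_default():
    tf.constant(1.0, name="c")
graph_def = g1.as_graph_def()

g2 = tf.Graph()
with g2.as_default():
    tf.import_graph_def(graph_def, name="import")  # nodes become "import/c"
    tf.import_graph_def(graph_def, name="import")  # uniquified: "import_1/c"
    print(sorted(op.name for op in g2.get_operations()))
```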
Before, we assumed that if you passed --use_fake_data, you didn't care
about the output of the computation. With this patch, we decouple the
decision of using fake data from the decision of whether or not to print
the results.
PiperOrigin-RevId: 177889877
PiperOrigin-RevId: 177886163
PiperOrigin-RevId: 177884096
Before this change, we supported two algorithms for choosing the number
of threads per block:
* The "optimize-for-latency" algorithm assumed that each thread would
  want the maximum number of registers it could have, and chose a block
  size small enough to accommodate this.
* The "optimize-for-throughput" algorithm packed as many threads into a
  block as possible.
In practice we always chose the optimize-for-latency algorithm.
This change removes the choice of algorithm and changes us to
unconditionally use a new one. In our new algorithm, we choose the
smallest block size that still has the potential to allow the GPU to
reach maximum occupancy.
When each thread's register usage is small, we can pack many of these
blocks into one SM and hit maximum occupancy. When the threads'
register usage is larger, we degrade gracefully (unlike with larger
block sizes, where the occupancy degradation is more quantized).
On our benchmarks, this is a moderate (0-10%) speedup on K40, and a
large (10-25%) speedup on P100.
PiperOrigin-RevId: 177879741
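A back-of-the-envelope sketch of the new rule, with hypothetical device constants (the values below are typical of a P100, not queried from any API):

```python
MAX_THREADS_PER_SM = 2048  # max resident threads per SM
MAX_BLOCKS_PER_SM = 32     # max resident blocks per SM
WARP_SIZE = 32

def smallest_full_occupancy_block_size():
    # With the maximum number of resident blocks, each block must still
    # carry enough threads to hit the SM's resident-thread limit.
    threads_per_block = MAX_THREADS_PER_SM // MAX_BLOCKS_PER_SM
    # Round up to a whole number of warps.
    warps_per_block = -(-threads_per_block // WARP_SIZE)
    return warps_per_block * WARP_SIZE

print(smallest_full_occupancy_block_size())  # 64 with these constants
```

Small blocks reach full occupancy whenever per-thread register usage allows, and lose occupancy in finer-grained steps when it doesn't, which is the graceful degradation described above.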
PiperOrigin-RevId: 177878887
PiperOrigin-RevId: 177877751
PiperOrigin-RevId: 177876455
* Add a bfloat16 Python type and NumPy extension.
* Allow the bfloat16 type in a number of places in the Python libraries.
PiperOrigin-RevId: 177875784
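A quick sketch of the new type in use, assuming the TF 1.x era API in which the NumPy extension type is exposed through the dtype object:

```python
import numpy as np
import tensorflow as tf

# bfloat16 tensors from the Python API.
x = tf.constant([1.0, 2.5], dtype=tf.bfloat16)

# The NumPy extension type is reachable via the dtype object.
bfloat16_np = tf.bfloat16.as_numpy_dtype
arr = np.array([1.0, 2.5], dtype=bfloat16_np)
print(arr.dtype)  # bfloat16
```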