* Automated g4 rollback of changelist 177799252 (A. Unique TensorFlower, 2017-12-05)
  PiperOrigin-RevId: 177989542
* [TF:XLA] Add support for FusedBatchNormGrad where is_training=False. (Peter Hawkins, 2017-12-05)
  Also add support for rank != 4 tensors to the TF/XLA fused batchnorm operators, although the TF core ops don't actually support other ranks yet so this is not tested.
  PiperOrigin-RevId: 177987592
* Only parse known flags in tf.app.run(). (Yilei Yang, 2017-12-05)
  This requires absl-py 0.1.6. Also remove the manual tag on //tensorflow/python:app_test.
  PiperOrigin-RevId: 177986813
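  As an aside, the known-vs-unknown flag split this commit gives tf.app.run() is the same semantics the Python standard library exposes through argparse's parse_known_args; the stdlib sketch below illustrates the behavior (it is not TF's actual code, which routes through absl-py):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--learning_rate", type=float, default=0.01)

# parse_known_args consumes the flags it recognizes and returns the rest
# untouched, instead of erroring out on unknown flags the way parse_args does.
args, unparsed = parser.parse_known_args(
    ["--learning_rate", "0.1", "--some_other_tool_flag", "x"])
print(args.learning_rate)  # 0.1
print(unparsed)            # ['--some_other_tool_flag', 'x']
```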
* Add the tf2xla_supported_ops tool, which dumps ops supported by tf2xla. (A. Unique TensorFlower, 2017-12-05)
  Also fix a TODO in XlaOpRegistry to filter by the types allowed by the OpDef. Also see #14798.
  PiperOrigin-RevId: 177986664
* Add ImportGraphDefOptions::uniquify_prefix. (Skye Wanderman-Milne, 2017-12-05)
  This option is necessary to mimic the Python import_graph_def method's behavior.
  PiperOrigin-RevId: 177986165
* nn_impl.py cleanup: used keepdims instead of deprecated keep_dims. (A. Unique TensorFlower, 2017-12-05)
  PiperOrigin-RevId: 177972555
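  The renamed argument behaves the same way in NumPy, which makes for a minimal illustration of what keepdims (formerly keep_dims) does in the reduction ops this cleanup touches:

```python
import numpy as np

x = np.arange(6, dtype=np.float32).reshape(2, 3)

# keepdims=True retains the reduced axis as size 1, so the result still
# broadcasts against the original input without an explicit reshape.
s = x.sum(axis=1, keepdims=True)
print(s.shape)  # (2, 1)
```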
* Adding variant-based serialization and deserialization for sparse tensors. (Jiri Simsa, 2017-12-05)
  PiperOrigin-RevId: 177971801
* Simplify code in dependency optimizer. (A. Unique TensorFlower, 2017-12-05)
  Change dependency optimizer to remove isolated NoOps when it is safe. Fix bug in arithmetic optimizer: only remove deduped nodes if we know the fetches.
  PiperOrigin-RevId: 177970063
* Improve handling of operations that are known to TOCO but not to TF Lite. (A. Unique TensorFlower, 2017-12-05)
  PiperOrigin-RevId: 177966156
* Make RevBlock a subclass of Layer. (A. Unique TensorFlower, 2017-12-05)
  PiperOrigin-RevId: 177964932
* Add a helper to HloSharding to easily create trivial flat tuples without requiring a ShapeTree. (A. Unique TensorFlower, 2017-12-05)
  PiperOrigin-RevId: 177956572
* Estimate Placeholder as a no-op. (Max Galkin, 2017-12-05)
  PiperOrigin-RevId: 177956552
* [TF:XLA] Add support for NCHW format to SpaceToDepth and DepthToSpace. (Peter Hawkins, 2017-12-05)
  PiperOrigin-RevId: 177953076
* [XLA] Mark Rng as side-effecting and add a rematerialization test to ensure that rng instructions are not rematerialized. (Blake Hechtman, 2017-12-05)
  This also lists Rng as non-rematerializable.
  PiperOrigin-RevId: 177932160
* Fix bugs in neutral element code and add more unit tests to cover matmul with input shape != output shape. (A. Unique TensorFlower, 2017-12-05)
  PiperOrigin-RevId: 177920882
* Generates a warning if the global step is not increased. (Jianwei Xie, 2017-12-04)
  PiperOrigin-RevId: 177908680
* Enable transferring a tuple literal to a replicated device. (Mark Heffernan, 2017-12-04)
  Use ShapedBuffer to allocate required memory for the shape, then transfer the literal to the allocated addresses on each replica. Also, add Allocate() method to ShapedBuffer.
  PiperOrigin-RevId: 177900588
* [TF2XLA] Change the implementation of Diag and MatrixDiag to use arithmetic rather than Pad. (A. Unique TensorFlower, 2017-12-04)
  PiperOrigin-RevId: 177896187
* Reproduce an issue with MonitoredSession when saving a variable on a GPU. (Igor Saprykin, 2017-12-04)
  Also arrange for continuous testing with GPUs.
  PiperOrigin-RevId: 177895214
* Modifying _get_examples in graph_io.py to utilize tf.cond. (A. Unique TensorFlower, 2017-12-04)
  PiperOrigin-RevId: 177892591
* Treat integer default initializers like floating point ones. (Alexandre Passos, 2017-12-04)
  This fixes subtle problems with partitioned variables.
  PiperOrigin-RevId: 177892499
* Fix tf.identity(resource variable) with eager execution and a device copy. (A. Unique TensorFlower, 2017-12-04)
  PiperOrigin-RevId: 177891209
* Add BF16 tests for reduce-window. (Yuanzhong Xu, 2017-12-04)
  PiperOrigin-RevId: 177890892
* Fix bug with uniquified colocation attrs in ImportGraphDef. (Skye Wanderman-Milne, 2017-12-04)
  The colocation attrs must be updated after all NodeDefs have been processed. The nodes are processed and uniquified in topological order, which allows us to update the inputs simultaneously due to the topological ordering, but this doesn't work for the colocation groups.
  I also considered updating all the NodeDefs with prefixes or unique names at the very beginning, before starting conversion. This would make the logic simpler, but require us to potentially keep a full copy of all the NodeDefs in memory (so we could edit them), so I decided to edit in-place after construction. We might want to consider this alternative in the future, though.
  PiperOrigin-RevId: 177890362
* [XLA] Add --print_result flag to replay_computation tool. (Justin Lebar, 2017-12-04)
  Before, we assumed that if you passed --use_fake_data, you didn't care about the output of the computation. With this patch, we decouple the decision of using fake data from the decision of whether or not to print the results.
  PiperOrigin-RevId: 177889877
* Correct trivial spelling error in internal_convert_to_tensor. (A. Unique TensorFlower, 2017-12-04)
  PiperOrigin-RevId: 177886163
* Fix minor typos in the doc of SpaceToDepth and DepthToSpace. (Jingyue Wu, 2017-12-04)
  PiperOrigin-RevId: 177884096
* [XLA:GPU] Use more threads per thread block. (Justin Lebar, 2017-12-04)
  Before this change, we supported two algorithms for choosing the number of threads per block:
  * The "optimize-for-latency" algorithm assumed that each thread would want the maximum number of registers it could have, and chose a block size small enough to accommodate this.
  * The "optimize-for-throughput" algorithm packed as many threads into a block as possible.
  In practice we always chose the optimize-for-latency algorithm. This change removes the choice of algorithm and changes us to unconditionally use a new one. In our new algorithm, we choose the smallest block size that still has the potential to allow the GPU to reach maximum occupancy. When each thread's register usage is small, we can pack many of these blocks into one SM and hit maximum occupancy. When the threads' register usage is larger, we degrade gracefully (unlike with larger block sizes, where the occupancy degradation is more quantized).
  On our benchmarks, this is a moderate (0-10%) speedup on K40, and a large (10-25%) speedup on P100.
  PiperOrigin-RevId: 177879741
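  A simplified sketch of the block-size rule described above, ignoring the register-pressure dimension the commit also discusses: an SM caps both resident threads and resident blocks, so the smallest block size that can still hit the thread cap is threads-per-SM divided by blocks-per-SM. The hardware constants below are illustrative examples, not the values XLA actually queries:

```python
# Example SM limits (roughly Pascal-class); real code would query the device.
MAX_THREADS_PER_SM = 2048  # cap on resident threads per SM
MAX_BLOCKS_PER_SM = 32     # cap on resident blocks per SM

def smallest_full_occupancy_block_size():
    # With at most MAX_BLOCKS_PER_SM resident blocks, each block needs at
    # least MAX_THREADS_PER_SM / MAX_BLOCKS_PER_SM threads for the SM to
    # reach its thread cap; any smaller block size leaves occupancy limited
    # by the block-count cap instead of by register usage.
    return MAX_THREADS_PER_SM // MAX_BLOCKS_PER_SM

print(smallest_full_occupancy_block_size())  # 64
```

  With these example limits the answer is 64 threads per block: small enough to degrade gracefully under register pressure, large enough that 32 resident blocks can still fill the SM.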
* [XLA] Add a default implementation of Literal::ToString for rank >= 6 tensors. (Peter Hawkins, 2017-12-04)
  PiperOrigin-RevId: 177878887
* Add a single capacity prefetch to `tf.contrib.data.read_batch_features`. (A. Unique TensorFlower, 2017-12-04)
  PiperOrigin-RevId: 177877751
* hsv_in_yiq GPU implementation. (A. Unique TensorFlower, 2017-12-04)
  PiperOrigin-RevId: 177876455
* Enable bfloat16 use from Python: (Peter Hawkins, 2017-12-04)
  * add a bfloat16 Python type and NumPy extension.
  * allow the bfloat16 type in a number of places in the Python libraries.
  PiperOrigin-RevId: 177875784
* Fix ResourceVariable's docstring example. (Reed Wanderman-Milne, 2017-12-04)
  PiperOrigin-RevId: 177875589
* [StreamExecutor] Add UnqueryableDeviceParams for all nvidia GPUs. (Justin Lebar, 2017-12-04)
  Some properties of nvidia GPUs cannot be queried via the driver API -- these are hardcoded in the UnqueryableDeviceParams struct in StreamExecutor. Before this change, we only had values for sm_35. This change adds the values for all other nvidia GPUs, sm_20 through sm_70.
  PiperOrigin-RevId: 177874401
* Fix edge case with ImportGraphDefOption.uniquify_names = true. (Skye Wanderman-Milne, 2017-12-04)
  This change fixes the case where a newly-generated uniquified name conflicts with another NodeDef being imported (the original NodeDef names are required to be unique among each other, so this is only an issue when we create new names).
  Note that this behavior is not well defined in the Python import_graph_def method. It will always generate unique names, but the exact naming scheme may depend on the order the NodeDefs are imported. I didn't write a corresponding Python unit test or try to make this change produce the same names for this reason.
  PiperOrigin-RevId: 177872720
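  The edge case fixed above can be sketched with a hypothetical uniquify helper (this is an illustration of the collision rule, not the C++ implementation): a generated name must avoid both names already assigned and the original names of NodeDefs still waiting to be imported.

```python
def uniquify(name, taken, pending):
    """Return `name`, or a suffixed variant avoiding both `taken` and `pending`."""
    if name not in taken and name not in pending:
        return name
    i = 1
    while True:
        candidate = f"{name}_{i}"
        # Checking `pending` too is the fix: without it, a generated name
        # could collide with a NodeDef that hasn't been processed yet.
        if candidate not in taken and candidate not in pending:
            return candidate
        i += 1

taken = {"add"}        # names already assigned during import
pending = {"add_1"}    # a later NodeDef already owns this original name
print(uniquify("add", taken, pending))  # add_2, skipping the pending conflict
```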
* Allow test_util.evaluate to handle nested tensors. (Sergio Guadarrama, 2017-12-04)
  PiperOrigin-RevId: 177871523
* Sort sections in operation semantics alphabetically. (Nick Desaulniers, 2017-12-04)
  PiperOrigin-RevId: 177871286
* Fix TFGAN's `clip_weights_test.py` bugs. (A. Unique TensorFlower, 2017-12-04)
  PiperOrigin-RevId: 177870577
* Internal change. (Anna R, 2017-12-04)
  PiperOrigin-RevId: 177869591
* [XLA:CPU] Avoid over-aligning parameter buffers. (Sanjoy Das, 2017-12-04)
  We sometimes pass scalars to non-entry computations, and since these are pointers pointing to elements in a buffer and are not individually allocated buffers, they don't have to follow the same alignment rules as buffers, even though they incidentally do so today.
  PiperOrigin-RevId: 177868506
* [XLA:CPU] Use an AVX optimized reduction step for row-major matrix-vector dot. (Sanjoy Das, 2017-12-04)
  The optimization is to use the vhaddps instruction when possible.
  PiperOrigin-RevId: 177868238
* Internal-only changes. (Shanqing Cai, 2017-12-04)
  PiperOrigin-RevId: 177865604
* Marks args as runtime consts in XLA EncapsulateSubgraphsPass. (Vinu Rajashekhar, 2017-12-04)
  - Using the GuaranteeConstOp.
  - Runs a backwards analysis on the args to see if all the paths lead to GuaranteeConstOps/ConstOps.
  PiperOrigin-RevId: 177862716
* Update pin for bazel-toolchains to latest version. (A. Unique TensorFlower, 2017-12-04)
  https://github.com/bazelbuild/bazel-toolchains/releases/tag/b49ba36
  PiperOrigin-RevId: 177858255
* Getting rid of obsolete function is_variable_registered from LayerCollection. (A. Unique TensorFlower, 2017-12-04)
  Replaced it with a simple function that returns a list of all the registered variables.
  PiperOrigin-RevId: 177857623
* Sanitize formatting in IdTableWithHashBuckets doc comment. (Max Galkin, 2017-12-04)
  Fixes list formatting and sanitizes words in angle brackets, which aren't rendered in the web doc: https://www.tensorflow.org/versions/master/api_docs/python/tf/contrib/lookup/IdTableWithHashBuckets
  Follows the working formatting example of TextFileInitializer.
  PiperOrigin-RevId: 177856349
* [TF:XLA] Fix wrong output of FloorDiv op for DT_HALF values. (Peter Hawkins, 2017-12-04)
  PiperOrigin-RevId: 177851804
* Apply oss_serial tag to tests that use portpicker to create local clusters, to avoid port conflicts with other tests during parallel bazel tests. (Shanqing Cai, 2017-12-04)
  PiperOrigin-RevId: 177851615
* Actually use ApiDef when generating Python API. (Anna R, 2017-12-04)
  PiperOrigin-RevId: 177851421
* [XLA:GPU] Switch from specifying maxntid to reqntid. (Justin Lebar, 2017-12-04)
  maxntid specifies the max number of threads in a block, whereas reqntid says that we will use *exactly* this many threads in a block.
  This doesn't have any effect on the benchmarks I ran, but we might as well do it in case it helps ptxas make a better decision at some point on some GPU. At least it will prevent the next person to come along from doing this same investigation I just did. :)
  PiperOrigin-RevId: 177851116