...
* Add veneer for SQLite C API (Justine Tunney, 2017-10-12)
  PiperOrigin-RevId: 171983705
* Automated g4 rollback of changelist 170358888 (David Majnemer, 2017-10-12)
  PiperOrigin-RevId: 171982861
* Make keras estimator_test less flaky. (Yifei Feng, 2017-10-12)
  PiperOrigin-RevId: 171982493
* eager: Expose tfe.run_test_in_graph_and_eager_modes decorator. Useful when writing tests of libraries. (A. Unique TensorFlower, 2017-10-12)
  PiperOrigin-RevId: 171973311
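  Example (illustrative sketch, not from the commit): one way the decorator might be used, assuming the TF 1.x contrib.eager import path and the decorator-factory call form.

      import tensorflow as tf
      import tensorflow.contrib.eager as tfe

      class AddTest(tf.test.TestCase):

        # Runs the test body once under graph construction and once eagerly.
        @tfe.run_test_in_graph_and_eager_modes()
        def testAdd(self):
          x = tf.constant([1.0, 2.0]) + tf.constant([3.0, 4.0])
          self.assertAllEqual(self.evaluate(x), [4.0, 6.0])

      if __name__ == "__main__":
        tf.test.main()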
* Wrap grad_ys tensors passed to tf.gradients in the tf.gradients name scope. (Eugene Brevdo, 2017-10-12)
  Fixes #13355.
  PiperOrigin-RevId: 171972633
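  Example (illustrative sketch, not from the commit): supplying grad_ys to tf.gradients; the change creates the ops that wrap these tensors under the tf.gradients name scope.

      import tensorflow as tf

      x = tf.placeholder(tf.float32, shape=[None])
      y = 3.0 * x
      upstream = 2.0 * tf.ones_like(y)  # custom upstream gradient for y
      # dx is 6.0 everywhere: d(3x)/dx scaled by the supplied upstream gradient.
      dx = tf.gradients(ys=[y], xs=[x], grad_ys=[upstream])[0]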
* Add warm_starting_utils for Estimators. (A. Unique TensorFlower, 2017-10-12)
  PiperOrigin-RevId: 171966540
* Use a serialized graph compiler to generate the XLA graph. (Yunxing Dai, 2017-10-12)
  - Move away from the previous TF graph executor, which contains few features that we need and also introduces indeterminism.
  - Unlike the previous executor, the new serial graph compiler doesn't recurse into a function and inline it. Instead, it creates a computation for the function and then creates a `call` op to call into the newly created computation.
  - Add an optional comparator to the DFS algorithm, which is needed to make the compiler deterministic.
  RELNOTES: Use a deterministic executor to generate the XLA graph.
  PiperOrigin-RevId: 171962775
* Minimal support for running OpsTest on GPU, using CUDA unified memory. (A. Unique TensorFlower, 2017-10-12)
  PiperOrigin-RevId: 171961190
* [TF:TPU] Move the metadata for tpu.replicate() into a separate TPUReplicateMetadata graph node, rather than attaching a copy of it to every node that is to be replicated. (Peter Hawkins, 2017-10-12)
  PiperOrigin-RevId: 171957514
* BUILD cleanup in contrib/boosted_trees/... (A. Unique TensorFlower, 2017-10-12)
  PiperOrigin-RevId: 171956450
* Made sure that the virtual placer correctly handles short device names. (Benoit Steiner, 2017-10-12)
  PiperOrigin-RevId: 171931173
* Optimized C++ and CUDA kernels for matrix_set_diag op. (A. Unique TensorFlower, 2017-10-12)
  The new code is faster and more readable, and avoids an issue with using the Eigen generator mechanism with GPUs on Windows.
  PiperOrigin-RevId: 171924800
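  Example (illustrative sketch, not from the commit): the op whose kernels this change speeds up; its behavior is unchanged.

      import tensorflow as tf

      a = tf.zeros([3, 3])
      # Places [1, 2, 3] on the main diagonal of `a`.
      b = tf.matrix_set_diag(a, tf.constant([1., 2., 3.]))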
* Add more explicit carve-out for experimental proto fields. (Martin Wicke, 2017-10-11)
  PiperOrigin-RevId: 171919244
* Go: Update generated wrapper functions for TensorFlow ops. (A. Unique TensorFlower, 2017-10-11)
  PiperOrigin-RevId: 171918115
* Update ops-related pbtxt files. (A. Unique TensorFlower, 2017-10-11)
  PiperOrigin-RevId: 171917856
* Improves error message when labels is None. (A. Unique TensorFlower, 2017-10-11)
  PiperOrigin-RevId: 171917834
* [tf.data] Add `tf.contrib.data.get_single_element()`. (Derek Murray, 2017-10-11)
  This utility function is designed for using a `tf.data.Dataset` in a serving context, where it is useful for expressing the stateless transformation from a fed-in batch into the serving input.
  PiperOrigin-RevId: 171915928
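  Example (illustrative sketch, not from the commit): a serving-style use where a fed-in batch flows through the same Dataset transformation used for training, without creating an iterator.

      import tensorflow as tf

      batch = tf.placeholder(tf.float32, shape=[None, 784])
      dataset = tf.data.Dataset.from_tensors(batch).map(lambda x: x / 255.0)
      # Extracts the single element of the dataset as regular tensors.
      serving_input = tf.contrib.data.get_single_element(dataset)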
* Disabling a couple of ClusterFLR tests, since test clusters in gRPC seem to have issues with multiple servers and have intermittent failures (https://github.com/grpc/grpc/issues/10142). (Rohan Jain, 2017-10-11)
  PiperOrigin-RevId: 171915902
* Automated g4 rollback of changelist 171877766 (Anna R, 2017-10-11)
  PiperOrigin-RevId: 171915087
* Update docstring for tpu-config. (Jianwei Xie, 2017-10-11)
  PiperOrigin-RevId: 171914551
* Internal change. (Anna R, 2017-10-11)
  PiperOrigin-RevId: 171913954
* Optimize gradients for Mean. (A. Unique TensorFlower, 2017-10-11)
  PiperOrigin-RevId: 171904584
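  Example (illustrative sketch, not from the commit): the pattern whose gradient graph this change optimizes, differentiating through a Mean reduction.

      import tensorflow as tf

      x = tf.random_normal([8, 4])
      loss = tf.reduce_sum(tf.reduce_mean(x, axis=0))
      dx = tf.gradients(loss, x)[0]  # gradient flows back through the Mean op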
* Implement NCHW_VECT_C support for tf.depth_to_space on GPU. (A. Unique TensorFlower, 2017-10-11)
  PiperOrigin-RevId: 171904046
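  Example (illustrative sketch, not from the commit): the NCHW_VECT_C layout packs channels in groups of four qint8 values, i.e. shape [N, C/4, H, W, 4]. This assumes the Python wrapper exposes a data_format argument and that a suitable GPU kernel is available.

      import tensorflow as tf

      # Assumed layout: N, C/4, H, W, 4 with qint8 elements.
      x = tf.placeholder(tf.qint8, shape=[1, 4, 8, 8, 4])
      y = tf.nn.depth_to_space(x, block_size=2, data_format="NCHW_VECT_C")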
* Update ops-related pbtxt files. (A. Unique TensorFlower, 2017-10-11)
  PiperOrigin-RevId: 171900256
* Go: Update generated wrapper functions for TensorFlow ops. (A. Unique TensorFlower, 2017-10-11)
  PiperOrigin-RevId: 171895671
* Extend the transpose ops in TensorFlow to support conjugate (a.k.a. Hermitian) transposition. (A. Unique TensorFlower, 2017-10-11)
  Currently, this can only be accomplished by adding extra conjugation ops, which means reading the tensor data from memory twice. More importantly, Hermitian transpose is the most common transposition operation when using complex arithmetic, so using it in new code helps prevent "conjugation bugs" by making the math work for real and complex types alike. The alias tf.linalg.adjoint was added to help with the latter. An optimized fused conjugate transpose op for GPU will be added in a follow-up.
  Get rid of some duplication of code among CPU/GPU/SYCL in transpose_functor.
  Support accelerating 2D transpose ops using MKL in more cases.
  PiperOrigin-RevId: 171895454
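  Example (illustrative sketch, not from the commit): the conjugate transpose in a single op, plus the tf.linalg.adjoint alias mentioned above.

      import tensorflow as tf

      a = tf.constant([[1 + 2j, 3 - 1j],
                       [0 + 1j, 2 + 0j]], dtype=tf.complex64)
      a_h = tf.transpose(a, conjugate=True)  # Hermitian transpose, no extra conj op
      a_h2 = tf.linalg.adjoint(a)            # alias for the same operation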
* Link RNN ops/kernels in contrib/BUILD. (Adam Roberts, 2017-10-11)
  PiperOrigin-RevId: 171890081
* Fix LSTM tests to use the same parameters for all blocks. (A. Unique TensorFlower, 2017-10-11)
  Now that the tests are correct, lower the tolerance from 1e-2 to 1e-6.
  PiperOrigin-RevId: 171885525
* Changed embedding op to use parallel version of DynamicStitch. (A. Unique TensorFlower, 2017-10-11)
  PiperOrigin-RevId: 171884257
* [XLA:CPU] Switch TF gather's HLO implementation to use dynamic-update-slice in a "while" loop. (Justin Lebar, 2017-10-11)
  Benchmark results (times in ms):
    nontrivial_gather.axis0_cpu:      0.110
    nontrivial_gather.axis0_xla_cpu:  0.139
    nontrivial_gather.axis1_cpu:      0.093
    nontrivial_gather.axis1_xla_cpu:  0.142
    nontrivial_gather.axis4_cpu:      1.183
    nontrivial_gather.axis4_xla_cpu:  2.658
    slice_gather.axis0_cpu:           0.00388
    slice_gather.axis0_xla_cpu:       0.00397
    slice_gather.axis1_cpu:           0.00421
    slice_gather.axis1_xla_cpu:       0.00427
    slice_gather.axis4_cpu:           0.252
    slice_gather.axis4_xla_cpu:       0.114
  As you can see, the pure-XLA implementation is slower in all the nontrivial cases and as fast or faster in the slice-gather cases. The slice-gather cases are gathers that can be implemented as a single XLA dynamic-slice, so the speedup here is likely understated: once we can simplify the gather to a single dynamic-slice, we should be able to apply many other optimizations to it, ideally fusing it so it has zero cost.
  The nontrivial gathers all gather more than one element and are implemented with an XLA while loop. The most important one is the axis 0 gather; gathering from an inner dimension is so slow no matter what you do that it's probably not worth optimizing.
  It's possible to make this XLA implementation faster. One option I've considered is "unrolling" the gather into a series of dynamic-slices that are then concat'ed together; this would be totally fusable, unlike the implementation in this CL. Another option would be adding a notion of uninitialized memory to XLA, since part of what makes us slow is that we have to memset our output to 0 before we overwrite it.
  But given that the shape we're benchmarking here is totally arbitrary, and given that we're getting decent performance, I think this is good enough to start with.
  PiperOrigin-RevId: 171883273
* [XLA:CPU] Adds intra-op parallelism to the "sequential" CPU backend (which already has intra-op parallelism for library calls). (A. Unique TensorFlower, 2017-10-11)
  Adds support for parallel task assignment to instructions in entry (or embedded) computations. Adds code to emit calls to a new runtime parallel fork/join function for instructions which have been assigned parallel tasks. Adds a simple cost model for I/O-bound instructions.
  *) Translation (deleuze model) wall time (seconds):
                   large_model   small_model   small_model_small_attn
     sequential:   0.00556       0.00484       0.00155
     parallel:     0.00263       0.00163       0.00106
  *) Wavenet:
     sequential: Avg. latency (30 runs): 1026.13ms, min/max: 988/1108ms
     parallel:   Avg. latency (30 runs): 800.633ms, min/max: 785/818ms
  *) ParallelFusion benchmark:
     Benchmark                            Time(ns)   CPU(ns)   Iterations
     --------------------------------------------------------------------
     sequential cpu backend (at head)       610584    611467         1000
     parallel cpu backend                   153241    836097         4528
     sequential cpu backend (this CL)       113482    679535         6017
  PiperOrigin-RevId: 171877766
* Define Eager-safe Network to hold the composition of Layers. (A. Unique TensorFlower, 2017-10-11)
  PiperOrigin-RevId: 171876670
* [XLA:CPU] Add an in-place implementation of fused-dynamic-update-slice. (Justin Lebar, 2017-10-11)
  This implementation, which applies when a loop-fusion node's root is a dynamic-update-slice whose input operand and output share the same buffer slice, is much faster than the out-of-place implementation.
  This patch also unifies the implementation of the CPU and GPU versions of this algorithm.
  PiperOrigin-RevId: 171863142
* Build file cleanup for iOS. (A. Unique TensorFlower, 2017-10-11)
  PiperOrigin-RevId: 171853263
* More Variant cross-device support: (Eugene Brevdo, 2017-10-11)
  * Remove HostConstraint for ops taking Variants; they can now be copied from/to Device.
  * Add ResourceVariable assign operations that support variants.
  PiperOrigin-RevId: 171845029
* Implement NCHW_VECT_C support for tf.space_to_depth on GPU. (A. Unique TensorFlower, 2017-10-11)
  PiperOrigin-RevId: 171843463
* Fixes typo in hparams comment. (A. Unique TensorFlower, 2017-10-11)
  PiperOrigin-RevId: 171842961
* Add gradient for SVD. (A. Unique TensorFlower, 2017-10-11)
  This is based on draft code by Catalin Ionescu (cdi@google.com), using the algorithm outlined in Mike Giles' paper: http://eprints.maths.ox.ac.uk/1079/1/NA-08-01.pdf.
  This initial version has the following restrictions:
  - Only supports statically known inner matrix dimensions m and n.
  Backpropagating through U and V (i.e. backpropagating through SVD nodes with compute_uv=True) has further restrictions:
  a) Only supports real tensors.
  b) Only supports square and "almost square" matrices where the number of rows and columns differ by at most 1.
  c) full_matrices must be true as well. This does not currently have severe implications, given the restriction in b).
  Feature request on GitHub: #6503
  This CL also adds support for calling tf.real, tf.imag, and tf.angle with real arguments.
  PiperOrigin-RevId: 171836140
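  Example (illustrative sketch, not from the commit): backpropagating through SVD under the restrictions listed above (real, square, statically shaped input with full_matrices=True).

      import tensorflow as tf

      a = tf.random_normal([4, 4])
      s, u, v = tf.svd(a, compute_uv=True, full_matrices=True)
      # With this change, gradients flow back through s, u and v.
      grads = tf.gradients(tf.reduce_sum(s) + tf.reduce_sum(u * v), [a])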
* Add tf.contrib.distributions.bijectors.Permute. (Joshua V. Dillon, 2017-10-11)
  PiperOrigin-RevId: 171833156
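  Example (illustrative sketch, not from the commit): the new bijector permutes the rightmost (event) dimension; here it simply reverses it. The permutation keyword is an assumption about the constructor.

      import tensorflow as tf
      from tensorflow.contrib.distributions import bijectors

      reverse = bijectors.Permute(permutation=[2, 1, 0])
      y = reverse.forward([[1., 2., 3.]])  # -> [[3., 2., 1.]]
      x = reverse.inverse(y)               # -> [[1., 2., 3.]]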
* Copy public tf.contrib.graph_editor.reroute_{inputs,outputs} docs. (Dan Ringwalt, 2017-10-11)
  There are multiple references to "see reroute_inputs" which are unhelpful because the full docstring now only exists on _reroute_sgv_inputs (likewise for reroute_outputs). Copy most of the docstring to reroute_{inputs,outputs} so that it appears in the docs.
  Update some other dangling doc references from _reroute to _reroute_sgv, but that docstring will not be included in the docs.
  PiperOrigin-RevId: 171821659
* Internal change. (Anna R, 2017-10-11)
  PiperOrigin-RevId: 171789232
* TensorFlow base ApiDefs and tests to make sure they are kept in sync. (Anna R, 2017-10-11)
  PiperOrigin-RevId: 171788007
* Improves "SparseTensor labels are not supported" error message. (A. Unique TensorFlower, 2017-10-10)
  PiperOrigin-RevId: 171775503
* Automated g4 rollback of changelist 171769504 (Sanjoy Das, 2017-10-10)
  PiperOrigin-RevId: 171774816
* Minor code cleanup in grappler cost estimation. (Max Galkin, 2017-10-10)
  PiperOrigin-RevId: 171772766
* Add some CPU-specific test cases. (Sanjoy Das, 2017-10-10)
  PiperOrigin-RevId: 171769504
* Pass `training_features` (without the weight column) instead of `features` into GradientBoostedDecisionTreeModel, and export the GTFlow model into a generic format with features defined in proto. (A. Unique TensorFlower, 2017-10-10)
  PiperOrigin-RevId: 171766066
* [XLA] Simplify trivial dynamic-slices. (Justin Lebar, 2017-10-10)
  Also make the dynamic-update-slice simplification respect the is_layout_sensitive_ flag in the algebraic simplifier.
  While we're here, make the algebraic-simplifier test use the new HloVerifiedTestBase class.
  PiperOrigin-RevId: 171759708
* Add an option to apply ModelPruner when building a grappler item, and an option to provide specific feed nodes to the item builder. (Max Galkin, 2017-10-10)
  PiperOrigin-RevId: 171758733
* Fix docstring typos in tf.distributions.bijectors.Bijector. (A. Unique TensorFlower, 2017-10-10)
  PiperOrigin-RevId: 171756150