path: root/tensorflow/core/framework/rendezvous.cc
* Optimize the destruction of CancellationManager and LocalRendezvousImpl. (Derek Murray, 2018-03-08)
  In the common case of clean termination, we can avoid performing several atomic operations and allocations.
  PiperOrigin-RevId: 188339594
* Removed StringPiece::set and StringPiece::clear, as they have no absl::string_view equivalents. (A. Unique TensorFlower, 2017-11-10)
  This will allow for a more convenient transition to absl::string_view. Calls to StringPiece::set and StringPiece::clear were replaced with the StringPiece constructor as follows:
    string_piece_foo.set(data, size)  =>  string_piece_foo = StringPiece(data, size)
    string_piece_foo.clear()          =>  string_piece_foo = StringPiece()
  PiperOrigin-RevId: 175326576
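  The rewrite above is mechanical. A minimal compilable sketch of the before/after, assuming tensorflow::StringPiece with a (data, size) constructor mirroring absl::string_view; the function below is just a hypothetical call site:

  ```
  // Sketch of the mechanical rewrite described in the commit above; assumes
  // tensorflow::StringPiece and its (data, size) constructor. The function
  // itself is a made-up call site, not code from this file.
  #include <cstddef>
  #include "tensorflow/core/lib/core/stringpiece.h"

  void RewriteExample(const char* data, size_t size) {
    tensorflow::StringPiece piece;
    // Before: piece.set(data, size);
    piece = tensorflow::StringPiece(data, size);
    // Before: piece.clear();
    piece = tensorflow::StringPiece();
  }
  ```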
* Merge changes from github. (Benoit Steiner, 2017-10-24)
  END_PUBLIC --- Commit 9f8523640 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Update ops-related pbtxt files. PiperOrigin-RevId: 173145770 --- Commit 01b6b0638 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Cut tracing memory cost PiperOrigin-RevId: 173144626 --- Commit 5e23e0e67 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: [XLA] Erase cloned instructions on the fly when merging fusion nodes. This avoids the awkward situation where an RNG which is clearly eligible for fusion becomes ineligible mid-fusion because it suddenly has an extra (dead) user. PiperOrigin-RevId: 173141716 --- Commit 1038927c0 authored by Saurabh Saxena<srbs@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Add SerializeIterator op that serializes an IteratorResource into a variant tensor. Add DeserializeIterator op that builds IteratorResource from a variant tensor. Move BundleReaderWrapper and BundleWriterWrapper from dataset.h to iterator_ops.cc. Add generic key-value store interfaces IteratorStateReader and IteratorStateWriter for reading/writing state of iterators. Get rid of IteratorBundleReader and IteratorBundleWriter. PiperOrigin-RevId: 173140858 --- Commit 57f3e529d authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Internal change PiperOrigin-RevId: 173136642 --- Commit 0e56ffb7b authored by Shanqing Cai<cais@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Fix breakages in OSS builds See example breakages logs at: http://ci.tensorflow.org/job/tensorflow-cl-cpu-python3-pip/10847/console http://ci.tensorflow.org/job/tensorflow-cl-gpu/11008/console 1. CL/172477381 added the no_oss tag to tests with oss_serial tags, which broke the logic of OSS_SERIAL tests in pip.sh and run_pip_test.sh. This CL fixes that. 2. The nccl_kernels BUILD target in contrib/nccl/BUILD was missing some dependencies. This CL adds the missing ones.
Fixes: #13918 PiperOrigin-RevId: 173133914 --- Commit 3ed049b67 authored by Alexandre Passos<apassos@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Allows calling keras layers in eager mode. PiperOrigin-RevId: 173129805 --- Commit 4ec6f2b07 authored by Alexandre Passos<apassos@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Switching contrib.summaries API to be context-manager-centric PiperOrigin-RevId: 173129793 --- Commit 03b02ffc9 authored by Justine Tunney<jart@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Put Bazel mirror URLs first PiperOrigin-RevId: 173127955 --- Commit 46ab25e4d authored by David Majnemer<majnemer@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: [XLA] Add support for convolutions with no spatial dimensions PiperOrigin-RevId: 173126950 --- Commit fc56349b7 authored by Derek Murray<mrry@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: [tf.data] Convert dataset arguments to tensors as early as possible. This change raises a `TypeError` earlier if (for example) the `batch_size` argument to `Dataset.batch()` has the incorrect type. PiperOrigin-RevId: 173126678 --- Commit 4f7503a87 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: K-FAC: Support for registering multiple minibatches with register_fully_connected() PiperOrigin-RevId: 173121735 --- Commit 2845bfcd6 authored by Tim Harley<tharley@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Avoid listing all modified Enter/RefEnter nodes on INFO, use VLOG(1) instead. Leave a single, simple, message on INFO. PiperOrigin-RevId: 173121726 --- Commit 434695921 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: K-FAC: _check_registration() supports multiple towers. PiperOrigin-RevId: 173115870 --- Commit 670dddf4a authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Multi-minibatch support for tf.contrib.kfac.fisher_blocks.FullyConnectedKFACBasicFB. PiperOrigin-RevId: 173109677 --- Commit dc13a8e2f authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Fix import of meta graphs with partitioned variables into a scope. Saver inspects SliceInfo to decide the variable name when creating a checkpoint. Before this fix even if a partitioned variable ("weights") was imported into a scope "a" it would still be checkpointed as ("weights") instead of ("a/weights") since import_scoped_meta_graph was not adjusting the SliceInfo. WARNING: if you use import_meta_graph on graphs with partitioned_variables WITH an import_scope argument AND then create a Saver to write/read checkpoints this change may break your checkpoint loading. PiperOrigin-RevId: 173105796 --- Commit eea089bdb authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: K-FAC: Multi-tower support for ConvDiagonalFB. PiperOrigin-RevId: 173105412 --- Commit 9b9cbbe2a authored by Yong Tang<yong.tang.github@outlook.com> Committed by Vijay Vasudevan<vrv@google.com>: Add int64 Tperm type support for `Transpose` (#13909) * Add int64 Tperm type support for `Transpose` This fix adds int64 Tperm support for `Transpose`. 
In `array_ops.cc`, `Transpose` and `ConjugateTranspose` have been specified as accepting int32 and int64 perm types. However, only int32 kernels has been registered. This fix adds the int64 perm support by removing the constraint on Tperm, resolve the type at runtime, and copying the data type accordingly to correctly handle the int64/int32 types. Additional tests have been added as well. Signed-off-by: Yong Tang <yong.tang.github@outlook.com> * Add test cases for int64 of perm in Transpose. Signed-off-by: Yong Tang <yong.tang.github@outlook.com> * Add namespace to hide PermutationHelper Signed-off-by: Yong Tang <yong.tang.github@outlook.com> * Enable use_gpu=True for perm type test. Signed-off-by: Yong Tang <yong.tang.github@outlook.com> * extra // namespace annotation * Adding a comment about int32 casting that should be safe. Permutations only contain values that refer to dimensions, and the maximum number of dimensions we have is 254, so an int32 is always safe here. --- Commit ac0004e71 authored by Yong Tang<yong.tang.github@outlook.com> Committed by Vijay Vasudevan<vrv@google.com>: Add int64 shape support on GPU for stateless random ops. (#13908) * Add int64 shape support on GPU for stateless random ops. This fix adds int64 shape support on GPU for stateless random ops `StatelessRandomUniform`, `StatelessRandomNormal`, `StatelessTruncatedNormal`. The int64 shape for stateless random ops is already supported on CPU with int32/int64 processed properly through `MakeShape`. However, on GPU a type constraint `.TypeConstraint<int32>("T")` has been improperly added. Such a type constraint actually prevents an int64 shape type to run on GPU. (As a comparision, no type constraint on CPU). This fix removes the type constraint and allows int64 shape to be run on GPU. This fix also adds test cases for int64 shape support on stateless random ops. Signed-off-by: Yong Tang <yong.tang.github@outlook.com> * Add test cases for int64 shape support for stateless random ops. Signed-off-by: Yong Tang <yong.tang.github@outlook.com> * Add int32 to shape types tested. --- Commit 0d437c3be authored by Yong Tang<yong.tang.github@outlook.com> Committed by Vijay Vasudevan<vrv@google.com>: Add int64 padding support for MirrorPad (#13907) * Add int64 padding support for MirrorPad This fix adds int64 padding support for `MirrorPad`. In the `array_ops.cc` the `MirrorPad`/`MirrorPadGrad` has been specified as supporting int64 padding. The related kernels does not have the int64 padding registered though. This fix adds the int64 padding support. This fix also adds additional test cases for coverage. Signed-off-by: Yong Tang <yong.tang.github@outlook.com> * Update template for CPU and GPU support of int64 paddings. Signed-off-by: Yong Tang <yong.tang.github@outlook.com> * Add int64 padding support for MirrorPad Signed-off-by: Yong Tang <yong.tang.github@outlook.com> * Put eigen header first like before, just in case. --- Commit 690003cc0 authored by Yong Tang<yong.tang.github@outlook.com> Committed by Vijay Vasudevan<vrv@google.com>: Add `int64` type `multiples` support for `tf.tile` (#13884) * Add `int64` type `multiples` support for `tf.tile` In the doc of `tf.tile` (tf.tile.__doc__) both `int32` and `int64` are supported for `multiples`. However, the kernel for `int64` is not registered yet. This fix adds the support of `int64` `multiples` so that the behavior matches the description of the docs. 
Signed-off-by: Yong Tang <yong.tang.github@outlook.com> * Update functors for int64 multiples support in `tf.tile` Signed-off-by: Yong Tang <yong.tang.github@outlook.com> * Update test cases for int64 of multiples in `tf.tile` Signed-off-by: Yong Tang <yong.tang.github@outlook.com> * Add GPU and non GPU tests Signed-off-by: Yong Tang <yong.tang.github@outlook.com> * format with clang-format -i Signed-off-by: Yong Tang <yong.tang.github@outlook.com> * Move Tmultiples after T (as it is auxilliary) And use `use_gpu=True` Signed-off-by: Yong Tang <yong.tang.github@outlook.com> --- Commit fd8d517b9 authored by Yunxing Dai<yunxing@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Add tests for convolution 1D RELNOTES: n/a PiperOrigin-RevId: 173060283 --- Commit 40c475b48 authored by formath<jinpengliu@163.com> Committed by Vijay Vasudevan<vrv@google.com>: add segment_reduction_ops to tf_op_files (#13901) --- Commit bfa4ec194 authored by Tayo Oguntebi<10927929+tayo@users.noreply.github.com> Committed by Vijay Vasudevan<vrv@google.com>: Update node_def.proto comments (#13874) The device field had outdated comments. Note: We could consider adding tpu as an example here, e.g. "gpu" | "cpu" | "tpu". Thoughts? --- Commit c9cb5a58d authored by formath<jinpengliu@163.com> Committed by Vijay Vasudevan<vrv@google.com>: protobuf lib path bug fix for benckmark on osx (#13878) --- Commit 1c1dad105 authored by Yong Tang<yong.tang.github@outlook.com> Committed by Vijay Vasudevan<vrv@google.com>: Add int64 axis support for reduction ops. (#13891) * Add int64 axis support for reduction ops. This fix is a follow up to PR 13863. In PR 13863 the program crash is fixed if int64 axis is passed to reduction ops, e.g. reduce_sum, reduce_max, etc. However, 13863 does not process the case of int64 support, it merely fixes the crash. This fix adds the support for int64 axis of reduction ops. Signed-off-by: Yong Tang <yong.tang.github@outlook.com> * Add int64 axis support for mean, prod, sum Signed-off-by: Yong Tang <yong.tang.github@outlook.com> * Add int64 axis support for min and max. Signed-off-by: Yong Tang <yong.tang.github@outlook.com> * Add int64 axis support for reduce_all and reduce_any Signed-off-by: Yong Tang <yong.tang.github@outlook.com> * Add test cases for int64 axis support of reduce_any and reduce_all Signed-off-by: Yong Tang <yong.tang.github@outlook.com> --- Commit 17096081e authored by Yong Tang<yong.tang.github@outlook.com> Committed by Vijay Vasudevan<vrv@google.com>: Improve resize_bicubic performance by reorganizing loops (#13840) * Improve resize_bicubic performance by reorganizing loops This fix tries to address the issue raised in 13693 where performance of `resize_bicubic` is not on par with opencv. This fix rearranges the loops so that it is the same for num_channel=40 and num_channel=3: Pre-fix: ``` CHANNEL=40 opencv: 145.08ms tf: 314.26ms CHANNEL=3 opencv: 11.95ms tf: 8.95ms ``` Post-fix: ``` CHANNEL=40 opencv: 144.25ms tf: 214.55ms CHANNEL=3 opencv: 11.78ms tf: 14.07ms ``` This fix fixes 13693. 
Signed-off-by: Yong Tang <yong.tang.github@outlook.com> * Keep special handling of `num_channels=3` for `resize_bicubic` This commit keeps special handling of `num_channels=3` for `resize_bicubic`: Without special handling: ``` opencv: 11.78ms tf: 14.07ms ``` With special handling: ``` opencv: 11.74ms tf: 9.46ms ``` Signed-off-by: Yong Tang <yong.tang.github@outlook.com> * Expand Benchmark test for resize_bicubic Signed-off-by: Yong Tang <yong.tang.github@outlook.com> * Update from review feedback. Signed-off-by: Yong Tang <yong.tang.github@outlook.com> --- Commit b927df57f authored by Yong Tang<yong.tang.github@outlook.com> Committed by Vijay Vasudevan<vrv@google.com>: Update protobuf.cmake to b04e5cba356212e4e8c66c61bbe0c3a20537c5b9 (#13893) This fix tries to address the issue raised in 8187 where protobuf.cmake used different version as bazel. The reason for discrepancy was due to the fact that a customerized protobuf was needed with Windows patch. Since the patch has been merged in (https://github.com/google/protobuf/pull/2203), it makes sense to update protobuf.cmake so that the same version of cmake is used. This fix fixes 8187. Signed-off-by: Yong Tang <yong.tang.github@outlook.com> --- Commit d1183ca6a authored by Vijay Vasudevan<vrv@google.com> Committed by GitHub<noreply@github.com>: Give each variable a unique name in accumulate_n_v2_eager_test. (#13886) --- Commit a69945810 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Update pin for bazel-toolchains to latest version PiperOrigin-RevId: 173002530 --- Commit 9d55c249c authored by Yong Tang<yong.tang.github@outlook.com> Committed by Vijay Vasudevan<vrv@google.com>: Fix doc in TF_CALL_ when invoked in mobile platform (#13881) * Fix doc in TF_CALL_ when defined(IS_MOBILE_PLATFORM) && !defined(__ANDROID_TYPES_FULL__) This is a small doc fix that includes bool as part of the types that is supported in mobile (IS_MOBILE_PLATFORM && !__ANDROID_TYPES_FULL__), as bool is clearly invoked in the following define. Signed-off-by: Yong Tang <yong.tang.github@outlook.com> * Also add bool to android full version. Signed-off-by: Yong Tang <yong.tang.github@outlook.com> --- Commit ba49d8583 authored by Bjarke Hammersholt Roune<broune@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Slight change to reduce_test to avoid generating inf, which was triggering an inf detector unnecessarily. PiperOrigin-RevId: 172965466 --- Commit 93e8f3c67 authored by Anna R<annarev@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Adding Python ApiDef overrides. PiperOrigin-RevId: 172960496 --- Commit 0d6a2e353 authored by Anna R<annarev@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Internal change. PiperOrigin-RevId: 172960439 --- Commit 62df65c72 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Add dtype argument to Mean and Accuracy object-oriented metrics. 
PiperOrigin-RevId: 172957714 --- Commit d7409d32b authored by Simone Cirillo<my.accounts@gmx.se> Committed by Vijay Vasudevan<vrv@google.com>: Fix import of spatial_softmax from tensorflow.contrib.layers (#13833) --- Commit df8bce63d authored by Yong Tang<yong.tang.github@outlook.com> Committed by Vijay Vasudevan<vrv@google.com>: Fix crash when `int64` axis is passed to `tf.reduce_sum` (#13863) * Fix crash when `int64` axis is passed to `tf.reduce_sum` This fix tries to fix the crash triggered by `int64` axis passed to `tf.reduce_sum`: ``` ubuntu@ubuntu:~/tensorflow2$ (cd && python) Python 2.7.12 (default, Nov 19 2016, 06:48:10) [GCC 5.4.0 20160609] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import tensorflow as tf >>> v = tf.reduce_sum([1,2,3], tf.constant(0, tf.int64)) 2017-10-20 15:55:06.993430: F tensorflow/core/framework/tensor.cc:601] Check failed: dtype() == expected_dtype (9 vs. 3) ubuntu@ubuntu:~/tensorflow2$ ``` The issue is caused by the fact that shape inference in `common_shape_fns.cc` only assumes int32 without proper handling of diffent types. In `math_ops.cc` both int32 and int64 are mentioned. NOTE that this fix does not address the issue that int64 is not supported. To allow int64 axis it is more than adding a template in `ReductionOp` as the type of the axis seems to be decided by some other ways in Eigen. This fix merely fixed the crash so that an error message will return without exit from the python program "No OpKernel was registered to support Op 'Sum' with these attrs". Still, I think its worth to at least allow the program to continue in case of unsupported kernel. Signed-off-by: Yong Tang <yong.tang.github@outlook.com> * Update implementation with a template helper function. Signed-off-by: Yong Tang <yong.tang.github@outlook.com> --- Commit 29c7b4658 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Adding the Stanford Tensorflow class to community resources. PiperOrigin-RevId: 172956049 --- Commit f758b24a8 authored by Alexandre Passos<apassos@google.com> Committed by Vijay Vasudevan<vrv@google.com>: Variable name for the eager test (#13873) --- Commit a5fe66b15 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Removed some unnecessary broadcasts in binary ops where only one input needs broadcasting (which is a fairly common case, even in the fallback path). PiperOrigin-RevId: 172950493 --- Commit c77090a0a authored by Yong Tang<yong.tang.github@outlook.com> Committed by Vijay Vasudevan<vrv@google.com>: Fix issues where int64 crops could not be passed to batch_to_space. (#13862) * Fix issues where int64 crops could not be passed to batch_to_space. This fix tries to address the issue where int64 `crops` could not be passed to `batch_to_space` even though both int32 and int64 are specified as supported in the docs (tf.batch_to_space.__doc__) The reason is that BatchToSpace kernel puts a constraint of int32 to crops data types. This fix removed the constraint so that int64 `crops` could be supported. NOTE: Just removing the constraint should work and it is not necessary to add specification to the kernel class template, as `SubtleMustCopyFlat` called in the class already correctly handled both int32 and int64 cases. Besides, other data types (e.g., float or double) will not be passed to the kernel as they are guarded by the specification in `array_ops.cc`. 
Signed-off-by: Yong Tang <yong.tang.github@outlook.com> * Also remove int64/int32 type constraints for SpaceToBatch kernels Signed-off-by: Yong Tang <yong.tang.github@outlook.com> * Add test cases for int64 crops of batch_to_space and space_to_batch Signed-off-by: Yong Tang <yong.tang.github@outlook.com> * Fix test failures. Signed-off-by: Yong Tang <yong.tang.github@outlook.com> --- Commit 494837936 authored by Joshua V. Dillon<jvdillon@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Make `tf.contrib.distributions` quadrature family accept a `Tensor` for `quadrature_grid_and_probs` argument. PiperOrigin-RevId: 172950094 --- Commit 9c825d32c authored by Jinze Bai<baijinze1994@163.com> Committed by Vijay Vasudevan<vrv@google.com>: Merge two GPU kernel launching to one in DiagOp. (#13859) --- Commit c0ca50a47 authored by Yan Facai (???)<facai.yan@gmail.com> Committed by Vijay Vasudevan<vrv@google.com>: ENH: add Relu6GradGrad (#13268) * ENH: add Relu6GradGrad * TST: add test case * CLN: import nn_grad * TST: add init value --- Commit 8ff33271e authored by Justin Lebar<jlebar@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Dump the computation's SessionModule as part of the tf_compile rule. PiperOrigin-RevId: 172946149 --- Commit ebcae4a5e authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Add streaming_precision_recall_at_equal_thresholds This helper method computes streaming tp, fp, tn, fp, precision, and recall for the user in a way that exhibits O(T + N) time and space complexity (instead of O(T * N)), where T is the number of thresholds and N is the size of the predictions tensor. Thanks to Frank Chu for the efficient algorithm! PiperOrigin-RevId: 172946073 --- Commit ccfd9c1e5 authored by Sanjoy Das<sanjoy@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Log Hlo IR during AOT compilation PiperOrigin-RevId: 172944165 --- Commit 985031a10 authored by Alexandre Passos<apassos@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Allows tfe.enable_eager_execution(device_policy=tfe.DEVICE_POLICY_WARN). PiperOrigin-RevId: 172943398 --- Commit 703182d85 authored by Mingxing Tan<tanmingxing@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Add performance guide for fused decode_and_crop_jpeg optimization. PiperOrigin-RevId: 172943116 --- Commit 66b1f4383 authored by Francois Chollet<fchollet@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Make Network compatible with eager mode. Currently it only allows to instantiate a Network in eager mode using the regular Keras API, and call it on eager tensors. PiperOrigin-RevId: 172942569 --- Commit 41df2cec2 authored by ashankar<ashankar@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Testing pending CL: 172939383 --- Commit 37fd95179 authored by Alexandre Passos<apassos@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Simplifies capturing code in graph_callable to use recent function improvements. PiperOrigin-RevId: 172937003 --- Commit d1e7382af authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: BEGIN_PUBLIC Automated g4 rollback of changelist 172924803 PiperOrigin-RevId: 173347587
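  Several of the GitHub fixes folded into this merge (int64 Tperm for Transpose, int64 shapes for the stateless random ops, int64 crops for batch_to_space) share one pattern: an over-tight int32 TypeConstraint on a kernel registration blocks an int64 attr that the op definition already allows. A hedged sketch with placeholder op and attr names, not the exact registrations touched by these commits:

  ```
  // Hedged illustration of the recurring fix in the merged commits above:
  // dropping an over-tight .TypeConstraint<int32>(...) so the int64 variant
  // the op definition advertises can actually reach a kernel. "ExampleIndexOp"
  // and "Tidx" are placeholders.
  #include "tensorflow/core/framework/op_kernel.h"

  namespace tensorflow {

  class ExampleIndexOp : public OpKernel {
   public:
    using OpKernel::OpKernel;
    void Compute(OpKernelContext* /*ctx*/) override {
      // Resolve the index input as int32 or int64 at runtime (elided).
    }
  };

  // Before: only int32 indices could reach the kernel on this device.
  // REGISTER_KERNEL_BUILDER(
  //     Name("ExampleIndexOp").Device(DEVICE_GPU).TypeConstraint<int32>("Tidx"),
  //     ExampleIndexOp);

  // After: drop the constraint (or register an int64 specialization as well).
  REGISTER_KERNEL_BUILDER(Name("ExampleIndexOp").Device(DEVICE_GPU),
                          ExampleIndexOp);

  }  // namespace tensorflow
  ```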
* Internal change. (A. Unique TensorFlower, 2017-07-24)
  PiperOrigin-RevId: 162986106
* Internal cleanup. (A. Unique TensorFlower, 2017-07-21)
  PiperOrigin-RevId: 162809937
* Removes tolerate_dup_recv from LocalRendezvous. (A. Unique TensorFlower, 2017-07-21)
  PiperOrigin-RevId: 162782660
* Minor cleanup: Remove unused BUILD dependencies and unnecessary code. (A. Unique TensorFlower, 2017-06-02)
  PiperOrigin-RevId: 157837211
* Changed the second argument of WaitForNotificationWithTimeout() to mean microseconds instead of milliseconds. (Yuan Yu, 2017-02-16)
  Change: 147764063
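  Call sites had to change along with the units. A small sketch, assuming the free function WaitForNotificationWithTimeout(Notification*, int64) from the core notification header; the wrapper function is made up:

  ```
  // Sketch of the call-site impact, assuming the free function
  // WaitForNotificationWithTimeout(Notification*, int64) from the core
  // notification header; after this change the timeout is in microseconds.
  #include "tensorflow/core/lib/core/notification.h"

  bool WaitHalfSecond(tensorflow::Notification* done) {
    // Previously 500 meant 500 ms; the same half-second wait is now spelled:
    return tensorflow::WaitForNotificationWithTimeout(done, 500 * 1000 /* us */);
  }
  ```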
* Updated DirectSession::RecvOutputs to take into account the session timeouts. (Benoit Steiner, 2016-11-18)
  Change: 139591752
* Use gtl::FlatMap and gtl::FlatSet in several places instead of std::unordered_map and std::unordered_set. (A. Unique TensorFlower, 2016-10-31)
  Add default template argument of std::hash<Key> to gtl::FlatMap and gtl::FlatSet to better match std::unordered_{map,set}. Improves performance on an RPC-intensive benchmark by ~0.4%.
  Change: 137754417
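  With the default hash argument in place, the swap is close to a drop-in replacement. A minimal sketch, assuming the gtl::FlatMap header path and its unordered_map-like interface; the class is hypothetical:

  ```
  // Minimal sketch of the drop-in swap described above. Assumes
  // tensorflow::gtl::FlatMap with the new default hash, so the familiar
  // two-argument form compiles just like std::unordered_map<Key, Val>.
  #include <string>
  #include "tensorflow/core/lib/gtl/flatmap.h"

  class WordCounter {
   public:
    // Before: std::unordered_map<std::string, int> counts_;
    void Add(const std::string& word) { ++counts_[word]; }
    size_t Distinct() const { return counts_.size(); }

   private:
    tensorflow::gtl::FlatMap<std::string, int> counts_;
  };
  ```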
* Fix data race in LocalRendezvousImpl: item was used outside the lock, but could be deleted by the abort. (A. Unique TensorFlower, 2016-08-10)
  Change: 129916678
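  The general shape of this kind of fix, as a hypothetical sketch (not the actual LocalRendezvousImpl code): move whatever the completion path needs out of the table entry while the mutex is still held, so a concurrent abort that deletes the entry cannot pull it out from under the callback.

  ```
  // Hypothetical illustration of the race pattern being fixed; all names
  // are made up. The callback is moved out under the lock and invoked after
  // the lock is released, so it never touches the (possibly deleted) entry.
  #include <functional>
  #include <utility>
  #include "tensorflow/core/platform/mutex.h"

  struct Item {
    std::function<void()> waiter;  // callback to run when the tensor arrives
  };

  class PendingTable {
   public:
    void Complete(Item* item) {
      std::function<void()> cb;
      {
        tensorflow::mutex_lock l(mu_);
        cb = std::move(item->waiter);  // take what we need while protected
        delete item;                   // an abort may also delete entries here
      }
      if (cb) cb();  // runs outside the lock, never dereferences *item again
    }

   private:
    tensorflow::mutex mu_;
  };
  ```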
* Several small changes to reduce the number of memory allocations along Send/Recv paths. (A. Unique TensorFlower, 2016-07-16)
  o Allow SendOp and RecvOp implementations to directly use the string buffer contained in a Rendezvous::ParsedKey object, rather than allocating their own string object. Saves two allocations per Send/Recv pair.
  o Use std::move in a few places to avoid copying a std::function object.
  o Eliminated unused ParsedName variable declaration in DeviceNameUtils::ParseLocalName.
  Change: 127630066
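  The std::move bullet above is a one-line change wherever a callback is stashed in a member. A generic illustration with made-up names:

  ```
  // Generic sketch of the std::move bullet above: moving a std::function
  // into its long-lived home instead of copying it avoids duplicating any
  // heap-allocated captured state. Class and member names are hypothetical.
  #include <functional>
  #include <utility>

  class DeferredSend {
   public:
    void SetDoneCallback(std::function<void()> done) {
      done_ = std::move(done);  // transfer the captured state, don't clone it
    }
    void Finish() {
      if (done_) done_();
    }

   private:
    std::function<void()> done_;
  };
  ```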
* A series of changes to significantly reduce the number of allocations done by models distributed across many devices. (A. Unique TensorFlower, 2016-06-27)
  A small microbenchmark model that runs two banks (A and B) of 30 nodes with a 30x30 full shuffle between them, where each of the nodes in A and in B run with one node on each of the 30 devices (so 30*29+30+30, or ~930 separate RPCs) was showing ~111,000 allocations per iteration of the graph. With the changes here, this is now down to ~64,300 allocations per iteration. Changes include:
  o DeviceContext::CopyDeviceTensorToCPU and related helper routines: use StringPiece instead of const string& for the tensor name (avoids creating a string in some cases where the caller only has a StringPiece available).
  o Change some Rendezvous and BaseRemoteRendezvous interfaces to take a 'const Rendezvous::ParsedKey& key', rather than 'const string& key'. In many cases, the callers were already having to parse the key into a ParsedKey, and so we were doing the parsing multiple times at different levels as we processed receiving or sending of a tensor. This reduces the number of times that we parse a key as it flows from a Send node through to a Recv node on another worker.
  o Changed Rendezvous::ParsedKey so that it makes a copy of the underlying full key, and then uses StringPiece objects to point into this copy for the src_device, dst_device, and edge_name pieces. This turns 3 string allocations into 1 per Rendezvous::ParseKey call.
  o Added new StringPiece Rendezvous::ParsedKey::FullKey() accessor to return a StringPiece for the underlying full key, and used that in a few places (mostly logging) where that is useful.
  o In many places, used std::move(function_variable) when assigning to an instance variable. This eliminates a very large number of excess std::function allocations/initializations (~56,000 of the baseline allocations were related to std::function setup or cloning, and this is now down to ~11,000 after this cl).
  o In the RPC-based remote workers (StubbyRemoteWorker and GrpcRemoteWorker), changed the code path in RecvTensorAsync to avoid creation of a std::function with 6 arguments unless necessary. There are three cases now handled separately: (a) We're not logging, and we didn't make a copy of the request that we need to free: just use the passed-in 'StatusCallback done' object directly, without creating a wrapper std::function object at all. (b) We're not logging, but we made a copy of the request that we need to free: we create a simple wrapper std::function that invokes the passed-in 'done' callback, and then frees the req_copy request copy object. (c) We're logging: we create the std::function object with all the necessary state to log when the recv has finished.
  o Changed DeviceMgr::LookupDevice to take a StringPiece, rather than a const string&, and changed the hash table to use StringPiece keys. This allows clients that just have a StringPiece device name in their hand to avoid a string creation to look up the Device* object.
  o Changed ExecutorState to use a specialized TaggedNodeReadyQueue that internally uses a gtl::InlinedVector<TaggedNode, 16>, rather than using a std::deque<TaggedNode> for keeping track of nodes ready to execute. This is faster because it avoids allocations entirely if the ready node queue doesn't get bigger than 16, and inlined vectors are generally faster than std::deque, at a minor risk of using more memory if this queue grows to very large numbers of ready nodes (mostly imaginable only in pathological graphs).
  o In ExecutorState::Process, allocated a single ExecutorState::AsyncState object to keep track of all the state we need to preserve for an asynchronously executed node, rather than keeping this state implicitly via a very large number of arguments to a lambda function.
  o Added new atomic std::atomic<bool> status_is_ok_ in BaseRemoteRendezvous. This allows us to avoid acquiring the lock when we just want to check if the status is non-OK in BaseRemoteRendezvous::Send and BaseRemoteRendezvous::ValidateDevices.
  o In GraphMgr::RunAllDone, changed assignment of args.runner to avoid one extra level of std::function indirection (binding the function directly to the ThreadPool::Schedule routine, rather than creating an intermediate lambda function that invokes this inside the body of the lambda).
  o Added freelist of RpcRecvTensorCall objects in third_party/tensorflow/core/distributed_runtime/rpc/rpc_rendezvous_mgr.cc.
  o Changed third_party/tensorflow/core/framework/rendezvous.cc to keep the hashtable of Item* objects keyed by uint64 (hash of the tensor name), rather than the full-string tensor name. Collisions in the 64-bit hash space should basically never happen.
  o Sped up DeviceNameUtils::ParseFullName by optimizing for the common ordering of parts of /job, /replica, /task, /device. The parsing code was general enough to handle any order, but did so by comparing the prefixes 4, 3, 2, and 1 times, respectively, rather than 1, 1, 1, and 1 times.
  o Sped up DeviceNameUtils::SplitDeviceName to avoid extra string copies.
  Change: 125991891
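  For the rendezvous.cc bullet in particular, the shape of the change is small. A hedged sketch with hypothetical names, using the Hash64 helper from the core hash library:

  ```
  // Hedged sketch of the rendezvous table change described above (all names
  // hypothetical): key the pending-item map by a 64-bit hash of the full
  // tensor key instead of the key string itself, accepting that collisions
  // in a 64-bit hash space are effectively never observed in practice.
  #include <string>
  #include <unordered_map>
  #include "tensorflow/core/lib/hash/hash.h"
  #include "tensorflow/core/platform/types.h"

  struct Item;  // per-tensor send/recv bookkeeping, definition elided

  class ItemTable {
   public:
    Item*& Slot(const std::string& full_key) {
      const tensorflow::uint64 key_hash = tensorflow::Hash64(full_key);
      return table_[key_hash];  // no std::string key copy per lookup
    }

   private:
    std::unordered_map<tensorflow::uint64, Item*> table_;
  };
  ```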
* Update copyright for 3p/tf/core. (A. Unique TensorFlower, 2016-06-02)
  Change: 123900938
* Added more streamlined interfaces for converting rendezvous ids to/from strings and used these in the rendezvous code. (A. Unique TensorFlower, 2016-02-29)
  Improves performance for ptb_word_lm slightly (saves several allocations and an sscanf per CPU <-> GPU transfer).
  Change: 115852277
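  Illustrative only, in plain C++ with hypothetical names: replacing an sscanf-based round trip with direct integer formatting and parsing is the kind of streamlining described here, trimming the format-string machinery and temporary allocations on each transfer.

  ```
  // Generic, hypothetical sketch of an id <-> string round trip without
  // sscanf; not the actual interfaces added by this change.
  #include <cstdint>
  #include <cstdlib>
  #include <string>

  std::string EncodeId(uint64_t id) {
    return std::to_string(id);  // was: snprintf-style formatting
  }

  uint64_t DecodeId(const std::string& s) {
    return std::strtoull(s.c_str(), nullptr, 10);  // was: sscanf(..., "%llu", ...)
  }
  ```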
* TensorFlow: fix more int -> size_t warnings. (Vijay Vasudevan, 2016-01-26)
  Change: 113089380
* Global search & replace to move to the new location for tensorflow/core/ files and build targets. (Josh Levenberg, 2016-01-26)
  Change: 113080064
* Move #include <vector> out of port.h to users of std::vector<>. (Josh Levenberg, 2016-01-21)
  After this we can replace port.h with types.h.
  Change: 112727463
* #include "tensorflow/core/platform/mutex.h"Gravatar Josh Levenberg2016-01-07
| | | | | directly so we can drop it from port.h. Change: 111613643
* #include third_party/tensorflow/core/platform/macros.h directly so we can drop it from port.h. (Josh Levenberg, 2016-01-06)
  Change: 111528649
* Added 'logging' import to control_flow_ops, which is used in the file but not imported. (A. Unique TensorFlower, 2016-01-05)
  Change: 110842260
* TensorFlow: Improve performance of Alexnet. (Manjunath Kudlur, 2015-11-20)
  Changes:
  * error message that refers to removed `DefaultSession` method.
  * -Wnull-conversion warnings
  * the "_start_time" attr for recvs when the flag "--brain_enable_scheduling_for_recvs" is set.
  * typo in tutorial data download progress message.
  * a typo ("however their installing"=>"however installing").
  * typo, rename "TensorFlow Mechanics" to "How To" to be consistent with the website.
  * a typo ("subtact"=>"subtract").
  * protobuf examples in comments in tensorflow::Example.proto.
  * formula formatting in MNIST beginner tutorial
  * negative fraction-of-queue-full stats
  * protobuf inclusion path so that Android demo will build under Blaze.
  * small typo (moderatly > moderately)
  * Session.run() to check that tensor arguments come from the session's graph.
  * another six import
  * seq2seq typo in bazel command
  Base CL: 108349164
* TensorFlow: Doc and linter fixes, some additional tests and error handling, updates to website. (Vijay Vasudevan, 2015-11-16)
  Changes:
  - Removes redundant reshape from image models by @mrry
  - Default TensorBoard to localhost by @danmane
  - Reformatting of tensorflow/core by @josh11b
  - Make tutorials backwards compatible to 0.5.0 by @girving
  - Improve print documentation (md files not updated).
  - Add proper scrolling to sitemap by @martinwicke
  Base CL: 107956254
* TensorFlow: Minor updates to docs, BUILD, GPU config / perf, etc. (Vijay Vasudevan, 2015-11-12)
  Changes:
  - Updates to op documentation and index by Josh
  - More changes to BUILD files for python 3 support by @girving
  - Fix to Eigen to use DenseIndex everywhere by @jiayq
  - Enable configuration for cuda compute capability by @zheng-xq, including updates to docs.
  - Route aggregation method through optimizer by schuster
  - Updates to install instructions for bazel 0.1.1.
  Base CL: 107702099
* TensorFlow: Initial commit of TensorFlow library. (Manjunath Kudlur, 2015-11-06)
  TensorFlow is an open source software library for numerical computation using data flow graphs.
  Base CL: 107276108