path: root/tensorflow/core/framework
Commit message | Author | Age
* Allow the executor type for a function to be specified as an attr on a function. (Derek Murray, 2018-10-10)
  This change complements the existing `InstantiateOptions::executor_type` option, which takes precedence over the attr if both are provided. It enables the choice of executor to be separated from both the calling op implementation and the function definition, which simplifies the use of custom executors in operations that take a function as an attr (e.g. `tf.data` and the functional control-flow ops).
  PiperOrigin-RevId: 216532778
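The precedence rule described above can be sketched in a few lines. This is a hypothetical Python model of the selection logic, not the actual TensorFlow C++ API; the function and constant names are illustrative.

```python
# Sketch of executor-type resolution: an explicit InstantiateOptions value
# wins over the function's attr, and an empty result falls back to a default.

DEFAULT_EXECUTOR = "DEFAULT"  # illustrative placeholder name

def resolve_executor_type(options_executor_type, function_attrs):
    # 1. The caller-supplied InstantiateOptions value takes precedence.
    if options_executor_type:
        return options_executor_type
    # 2. Otherwise fall back to the "executor_type" attr on the function.
    attr = function_attrs.get("executor_type")
    if attr:
        return attr
    # 3. Neither was set: use the default executor.
    return DEFAULT_EXECUTOR

print(resolve_executor_type("", {"executor_type": "SINGLE_THREADED"}))
```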
* Add 'remove' operation to MutableHashTable and MutableDenseHashTable. (A. Unique TensorFlower, 2018-10-09)
  PiperOrigin-RevId: 216443201
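The contract of such a mutable lookup table can be modeled schematically: lookups of missing keys return a default value, and removal is a no-op for absent keys. A toy Python illustration of those semantics (not TensorFlow code):

```python
# Toy model of a mutable lookup table with insert/lookup/remove semantics.

class MutableHashTable:
    def __init__(self, default_value):
        self._table = {}
        self._default = default_value

    def insert(self, keys, values):
        self._table.update(zip(keys, values))

    def lookup(self, keys):
        # Missing keys resolve to the table's default value.
        return [self._table.get(k, self._default) for k in keys]

    def remove(self, keys):
        for k in keys:
            self._table.pop(k, None)  # removing an absent key is a no-op

t = MutableHashTable(default_value=-1)
t.insert(["a", "b"], [1, 2])
t.remove(["b", "zzz"])
print(t.lookup(["a", "b"]))  # [1, -1]
```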
* [tf.data] NUMA-aware MapAndBatch dataset. (Brennan Saeta, 2018-10-09)
  PiperOrigin-RevId: 216395709
* Partial support for tfe.defun in tf.gradients. (Alexandre Passos, 2018-10-08)
  Doesn't attempt to deal with cases where we might have already generated the functiondef for the parent function, as in that case we cannot easily modify the forward pass.
  PiperOrigin-RevId: 216243224
* [data-stats] Sets user-given `tag` and `counter_prefix` with `set_stats_aggregator`. (Shivani Agrawal, 2018-10-03)
  `tag` is prepended to all the statistics recorded as summaries, and `counter_prefix` sets the prefix for the statistics recorded as counters. Note: `counter` defaults to `\tensorflow`, and `tag` and `prefix` get associated with the dataset (not the stats_aggregator).
  PiperOrigin-RevId: 215609159
* [tf.data] More robust solution for input pipeline <--> performance model coordination. (Jiri Simsa, 2018-10-01)
  PiperOrigin-RevId: 215309735
* Bunch of micro move optimizations. (Piotr Padlewski, 2018-09-28)
  PiperOrigin-RevId: 215018984
* Add documentation of the ownership semantics to {Lookup,Create,LookupOrCreate}Resource(). (Derek Murray, 2018-09-28)
  PiperOrigin-RevId: 215008650
* Handle noinline gradient function in control flow functionalization. (Tong Shen, 2018-09-28)
  PiperOrigin-RevId: 215003704
* Introduce the abstraction of RunHandler which each DirectSession can use for the duration of a single RunInternal() call from RunHandlerPool. (A. Unique TensorFlower, 2018-09-28)
  It is used for running inter-op closures with a global scheduler (planned for the future) to improve both median and tail latency (for use-cases like CPU inference). In the case that global pools aren't used, this change should be a no-op.
  PiperOrigin-RevId: 214992852
* Automated rollback of commit 750466c6e6624d279de7f9a43accd682d487509c (Revan Sopher, 2018-09-27)
  PiperOrigin-RevId: 214853846
* Introduce the abstraction of RunHandler which each DirectSession can use for the duration of a single RunInternal() call from RunHandlerPool. (A. Unique TensorFlower, 2018-09-27)
  We want to leverage this abstraction for improving the cross-session inter-op parallelism for lower-latency inference in the future. In the case that global pools aren't used, this change should be a no-op.
  PiperOrigin-RevId: 214818187
* Merge pull request #22076 from Intel-tensorflow:feature/daoxin/slice (TensorFlower Gardener, 2018-09-26)
  PiperOrigin-RevId: 214726180
* [TF] Add new internal ops _VarHandlesOp and _ReadVariablesOp. (Peter Hawkins, 2018-09-26)
  The purpose of these ops is to fix a latency problem observed for an inference benchmark. Often an inference step starts by reading the values of many (hundreds of) weights. For a resource variable, this requires a VarHandleOp and a ReadVariableOp per variable. Running hundreds of trivial ops can add hundreds of microseconds of latency to the critical path of an inference step. The inter-op latency of the executor can be hundreds of nanoseconds, which rapidly adds up.
  This change introduces two fused ops, _VarHandlesOp and _ReadVariablesOp, that allow us to read many variables in a pair of larger ops, rather than many tiny ops.
  PiperOrigin-RevId: 214662338
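The latency arithmetic behind this fusion can be sketched with a simple cost model. The overhead and read-cost figures below are assumptions chosen only to illustrate the shape of the win, not measured TensorFlow numbers:

```python
# Back-of-the-envelope model: with a fixed per-op scheduling overhead, fusing
# N (handle, read) op pairs into one pair of ops amortizes almost all of the
# per-variable cost.

PER_OP_OVERHEAD_US = 2.0   # assumed scheduling overhead per op (microseconds)
READ_COST_US = 0.1         # assumed cost of the actual variable read

def unfused_latency_us(num_variables):
    # One VarHandleOp + one ReadVariableOp per variable.
    return num_variables * 2 * (PER_OP_OVERHEAD_US + READ_COST_US)

def fused_latency_us(num_variables):
    # One _VarHandlesOp + one _ReadVariablesOp covering all variables.
    return 2 * PER_OP_OVERHEAD_US + num_variables * 2 * READ_COST_US

print(unfused_latency_us(300), fused_latency_us(300))
```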
* Hoisting RandomUniform out of functions. (Piotr Padlewski, 2018-09-26)
  This patch introduces an optimization that hoists RandomUniform out of map functions. By doing so, we make the function stateless, which is crucial for parallelization and vectorization.
  PiperOrigin-RevId: 214623178
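The hoisting idea can be illustrated with a toy transformation: a map function that draws its own random number is stateful, but the same computation can be expressed as a stateless function over (input, pre-generated random value) pairs. This is a plain-Python analogy, not the graph rewrite itself:

```python
import random

def stateful_map(xs, seed=0):
    rng = random.Random(seed)
    # Drawing randomness inside the mapped function makes it stateful.
    return [x + rng.uniform(0, 1) for x in xs]

def hoisted_map(xs, seed=0):
    rng = random.Random(seed)
    randoms = [rng.uniform(0, 1) for _ in xs]  # random source hoisted out
    stateless_fn = lambda x, r: x + r          # now a pure function of its inputs
    return [stateless_fn(x, r) for x, r in zip(xs, randoms)]

# Same seed, same draw order: the rewrite preserves the result.
assert stateful_map([1, 2, 3]) == hoisted_map([1, 2, 3])
```

Because `stateless_fn` has no hidden state, it is safe to run its calls in parallel or to vectorize them.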
* Allow subslicing Tensors with a single dimension. (A. Unique TensorFlower, 2018-09-25)
  PiperOrigin-RevId: 214553359
* Add functionality to SubSlice a tensor. (A. Unique TensorFlower, 2018-09-24)
  PiperOrigin-RevId: 214295534
* Fix typo. (A. Unique TensorFlower, 2018-09-21)
  PiperOrigin-RevId: 213990950
* Fix clang-format styles. (Pan Daoxin, 2018-09-21)
* More changes. (Pan Daoxin, 2018-09-21)
* [tf.data] Moving auto-tuning optimizations into a background thread, refactoring the API for exposing tunable parameters, and removing `model::Node` from the public API. (Jiri Simsa, 2018-09-20)
  PiperOrigin-RevId: 213907565
* [tf.data] Some vectorization cleanup. (Rachel Lim, 2018-09-20)
  PiperOrigin-RevId: 213886813
* Internal change. (A. Unique TensorFlower, 2018-09-20)
  PiperOrigin-RevId: 213770000
* This CL adds a new `tf.print` operator that more closely aligns with the standard python `print` method, and deprecates the old `tf.Print` operator (to be removed in v2.0). (A. Unique TensorFlower, 2018-09-19)
  It follows the design doc specified in https://github.com/tensorflow/community/pull/14 and additionally incorporates the community feedback and design review decisions.
  This CL adds two new internal graph operators: a StringFormat operator that formats a template string with a list of input tensors to insert into the string and outputs a string scalar containing the result, and a PrintV2 operator that prints a string scalar to a specified output stream or logging level. The formatting op is exposed at `tf.strings.Format`.
  A new python method is exposed at `tf.print` that takes a list of inputs that may be nested structures and may contain tensors, formats them nicely using the formatting op, and returns a PrintV2 operator that prints them. In Eager mode and inside defuns this PrintV2 operator will automatically be executed, but in graph mode it will need to be either added to `sess.run` or used as a control dependency for other operators being executed.
  As compared to the previous print function, the new print function:
  - Has an API that more closely aligns with the standard python3 print
  - Supports changing the print logging level/output stream
  - Allows printing arbitrary (optionally nested) data structures as opposed to just flat lists of tensors
  - Supports printing sparse tensors
  - Changes the printed tensor format to show a more meaningful summary (recursively print the first and last elements of each tensor dimension, instead of just the first few elements of the tensor regardless of dimension)
  PiperOrigin-RevId: 213709924
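The "summarized" tensor format mentioned in the last bullet (first and last elements of each dimension, recursively) can be sketched over nested Python lists. This is an illustration of the formatting idea only; the real logic lives in the C++ StringFormat kernel, and the `edge_items` name is an assumption:

```python
# Recursively keep the first and last `edge_items` entries of every dimension
# and elide the middle, mirroring the summary format described above.

def summarize(value, edge_items=2):
    if not isinstance(value, list):
        return str(value)
    if len(value) <= 2 * edge_items:
        items = [summarize(v, edge_items) for v in value]
    else:
        head = [summarize(v, edge_items) for v in value[:edge_items]]
        tail = [summarize(v, edge_items) for v in value[-edge_items:]]
        items = head + ["..."] + tail
    return "[" + " ".join(items) + "]"

print(summarize(list(range(10))))  # [0 1 ... 8 9]
```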
* Added ABSL_DEPRECATED annotations to various deprecated TensorFlow functions. (A. Unique TensorFlower, 2018-09-19)
  PiperOrigin-RevId: 213693027
* Move location of Slice shape function. (Pan Daoxin, 2018-09-19)
* Eliminate VisitableAllocator. (A. Unique TensorFlower, 2018-09-18)
  The visitor pattern is used to allow pre-registration of memory for DMA access, e.g. for fast GPU/CPU i/o and for RDMA networking. The VisitableAllocator interface was introduced to support this use some time ago, prior to SubAllocators. Memory registration works best if it's done infrequently, on large pieces of memory, rather than on every piece that's dynamically allocated/freed. This usage pattern fits the SubAllocator better than a general Allocator.
  This change moves memory allocation visitor access to SubAllocator and eliminates the VisitableAllocator subclass of Allocator. It also more rigorously enforces the requirement that all Visitors be declared prior to memory allocation beginning. This is accomplished by requiring that Visitors be provided to the SubAllocator constructor.
  This refactoring will ease an upcoming CL introducing NUMA-specific CPU devices. It also should fix some performance pitfalls (e.g. accidental use of PoolAllocator) introduced by an earlier refactoring of ProcessState that was also in preparation for NUMA. It restores the default use of the cpu_allocator() value (i.e. no SubAllocator) by model executions that don't use allocation visitors (since visitor registration must precede the first allocation, hence can be detected at that time).
  PiperOrigin-RevId: 213505655
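The constraint that all visitors exist before the first allocation can be modeled schematically: supplying the visitors to the constructor makes it structurally impossible to register one late. A toy Python sketch (class and method names are illustrative, not the C++ API):

```python
# Visitors are fixed at construction time, so every allocation is guaranteed
# to be visited (e.g. registered for DMA) and no visitor can arrive after
# memory has already been handed out.

class SubAllocator:
    def __init__(self, alloc_visitors):
        self._visitors = list(alloc_visitors)  # frozen before first allocation
        self._regions = []

    def alloc(self, num_bytes):
        region = bytearray(num_bytes)  # stand-in for a large memory region
        for visit in self._visitors:
            visit(region, num_bytes)   # e.g. register the region for DMA access
        self._regions.append(region)
        return region

registered = []
sub = SubAllocator([lambda ptr, n: registered.append(n)])
sub.alloc(1 << 20)
print(registered)  # [1048576]
```

Infrequent, large allocations keep the (potentially expensive) visitor work off the per-tensor fast path, which is the usage pattern the commit message describes.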
* Automated rollback of commit 185aa89912376d4088c22615908696cd30f9951b (A. Unique TensorFlower, 2018-09-17)
  PiperOrigin-RevId: 213394522
* [tf.data] Fixing an error in the optimization loop. (Jiri Simsa, 2018-09-17)
  PiperOrigin-RevId: 213386401
* Eliminate VisitableAllocator. (A. Unique TensorFlower, 2018-09-17)
  The visitor pattern is used to allow pre-registration of memory for DMA access, e.g. for fast GPU/CPU i/o and for RDMA networking. The VisitableAllocator interface was introduced to support this use some time ago, prior to SubAllocators. Memory registration works best if it's done infrequently, on large pieces of memory, rather than on every piece that's dynamically allocated/freed. This usage pattern fits the SubAllocator better than a general Allocator.
  This change moves memory allocation visitor access to SubAllocator and eliminates the VisitableAllocator subclass of Allocator. It also more rigorously enforces the requirement that all Visitors be declared prior to memory allocation beginning. This is accomplished by requiring that Visitors be provided to the SubAllocator constructor.
  This refactoring will ease an upcoming CL introducing NUMA-specific CPU devices. It also should fix some performance pitfalls (e.g. accidental use of PoolAllocator) introduced by an earlier refactoring of ProcessState that was also in preparation for NUMA. It restores the default use of the cpu_allocator() value (i.e. no SubAllocator) by model executions that don't use allocation visitors (since visitor registration must precede the first allocation, hence can be detected at that time).
  PiperOrigin-RevId: 213371553
* Changing `OpInputList` so that it is a forward iterator and taking advantage of that fact in the tf.data kernels. (Jiri Simsa, 2018-09-17)
  PiperOrigin-RevId: 213361953
* Fix GraphConstructor and import_graph_def bug with variadic ops. (Skye Wanderman-Milne, 2018-09-17)
  Prior to this change, GraphConstructor::PopulateMissingUnusedInputMapKey() didn't correctly compute the number of outputs for ops with variadic outputs. This meant that missing_unused_input_map_keys could contain spurious entries for unused variadic outputs, which could trigger a ValueError in import_graph_def.
  This also adds a new util method in node_def_util.h, NumOutputsForNode().
  PiperOrigin-RevId: 213353158
* [tf.data] Adding support for `tf.data.AUTOTUNE` as a special value for the `num_parallel_calls` argument of `tf.data.Dataset.map()`, `tf.data.Dataset.interleave()`, and `tf.contrib.data.map_and_batch()`. (Jiri Simsa, 2018-09-17)
  When `tf.data.AUTOTUNE` is specified, the level of parallelism is determined at runtime. The underlying mechanism instruments the input pipeline to build a performance model and then uses the model to find the optimal values for the parallelism knobs.
  PiperOrigin-RevId: 213283297
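The sentinel mechanism can be sketched in miniature: a special value for `num_parallel_calls` defers the choice to runtime. The tuning policy below (use the CPU count) is a deliberately simplistic stand-in for the performance-model-based search the commit describes:

```python
import os

AUTOTUNE = -1  # special sentinel value, mirroring the tf.data convention

def resolve_parallelism(num_parallel_calls):
    if num_parallel_calls == AUTOTUNE:
        # Placeholder policy; the real implementation consults a performance
        # model built by instrumenting the input pipeline.
        return os.cpu_count() or 1
    return num_parallel_calls  # caller chose a fixed value

print(resolve_parallelism(4), resolve_parallelism(AUTOTUNE) >= 1)
```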
* [tf.data] Introducing an optimization that parallelizes map transformations. (Piotr Padlewski, 2018-09-14)
  Stateless MapDatasets can be parallelized by switching to ParallelMapDataset. We set `num_parallel_calls` to 2 for now, but in the future a special value will be used that results in the optimal value being selected dynamically at runtime.
  This patch also exposed a memory leak, which was fixed.
  PiperOrigin-RevId: 213015223
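Why statelessness makes this rewrite safe can be shown with a minimal sketch: a map over a stateless function can be handed to a pool of workers (here, `num_parallel_calls=2`) without changing the result. This is a plain-Python analogy, not the dataset rewrite itself:

```python
from concurrent.futures import ThreadPoolExecutor

def square(x):  # stateless: no shared state, so safe to parallelize
    return x * x

def map_dataset(fn, xs):
    return [fn(x) for x in xs]

def parallel_map_dataset(fn, xs, num_parallel_calls=2):
    with ThreadPoolExecutor(max_workers=num_parallel_calls) as pool:
        return list(pool.map(fn, xs))  # pool.map preserves input order

xs = list(range(8))
assert map_dataset(square, xs) == parallel_map_dataset(square, xs)
```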
* Mark the ResourceHandleOp as inexpensive. (Derek Murray, 2018-09-12)
  Previously, we would schedule a closure for each ResourceHandleOp, because it was erroneously considered to be "expensive". This would cost several microseconds per op, whereas the execution cost of this kernel is as little as 100ns. This change causes these kernels to execute inline at the beginning of a step.
  PiperOrigin-RevId: 212712378
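The scheduling decision involved can be sketched as a two-way dispatch: kernels flagged as expensive go to a thread pool as closures, while cheap ones run inline on the current thread, skipping the per-closure scheduling overhead. Names and structure are illustrative, not the executor's actual code:

```python
from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor(max_workers=4)

def run_kernel(kernel_fn, is_expensive):
    if is_expensive:
        return pool.submit(kernel_fn).result()  # scheduled as a closure
    return kernel_fn()                          # executed inline, no dispatch

# A handle op is trivial, so after this change it takes the inline path.
resource_handle_op = lambda: "handle:var0"
print(run_kernel(resource_handle_op, is_expensive=False))
```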
* Roll forward change "Move control flow functionalization as a graph optimization pass, instead of a step in XlaCompiler." (Tong Shen, 2018-09-12)
  PiperOrigin-RevId: 212657932
* Change HandleFromInput() to return a `const ResourceHandle&` and avoid copying that type. (Derek Murray, 2018-09-12)
  This avoids unnecessary string copies and deallocations in the ReadVariableOp, and similar ops.
  PiperOrigin-RevId: 212652588
* [tf.data] Mechanism for collecting processing time information and modeling performance. (Jiri Simsa, 2018-09-11)
  PiperOrigin-RevId: 212557406
* [TF] Variant improvements. (Eugene Brevdo, 2018-09-11)
  1. Change Variant Decode to accept VariantTensorData (non-ref). This should allow some optimization in the future. In the meantime it means removing the variant.h include from tensor.h, since variant_encode_decode.h now relies on tensor.h and variant.h now relies on that. It also means we found a bunch of places where tensor.proto.h, variant.h, and mutex.h were being imported through tensor.h (along with a bunch of other headers); so now we directly import them in order to compile.
  2. Move the Variant registry to use TypeIndex instead of a TypeName string; this should speed up registry lookups.
  PiperOrigin-RevId: 212478896
* Add experimental grappler plugin to selection function implementation at run time. (Scott Zhu, 2018-09-10)
  PiperOrigin-RevId: 212321238
* Automated rollback of commit a3776a234f555213aafcf41f49a42a8a9448c4ac (Tong Shen, 2018-09-09)
  PiperOrigin-RevId: 212182923
* Move control flow functionalization as a graph optimization pass, instead of a step in XlaCompiler. (Tong Shen, 2018-09-09)
  PiperOrigin-RevId: 212164482
* Fixed small nits in WhitelistedStatefulOpRegistry. (Piotr Padlewski, 2018-09-07)
  StringPiece has been changed to string to avoid the static destruction order fiasco (we store pointers that might have a shorter lifetime) and also to use unordered_set (there is hash specialization for StringPiece).
  PiperOrigin-RevId: 212059185
* [tf.data] Move all C++ code inside the `tensorflow::data` namespace. (Derek Murray, 2018-09-05)
  PiperOrigin-RevId: 211733735
* [tf.data] Avoiding serialization of (potentially large) tensors during optimization. (Jiri Simsa, 2018-08-31)
  PiperOrigin-RevId: 211179990
* Add lowering pass for functional While op. (Saurabh Saxena, 2018-08-30)
  This will allow the functional tf.while_loop proposed in https://github.com/tensorflow/community/pull/13 to achieve feature parity with the current implementation. Lowering is performed only when the "_lower_using_switch_merge" attr is set to True.
  PiperOrigin-RevId: 210956432
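The functional form of the op can be pictured as a pair of functions over the loop variables, with lowering turning that into explicit control flow. In this plain-Python sketch the "lowered" form is simply a loop (the real pass emits Switch/Merge graph nodes):

```python
def functional_while(cond, body, loop_vars):
    # The op carries `cond` and `body` as functions over the loop variables.
    while cond(*loop_vars):
        loop_vars = body(*loop_vars)
    return loop_vars

# Example: sum the integers 0..9.
cond = lambda i, total: i < 10
body = lambda i, total: (i + 1, total + i)
print(functional_while(cond, body, (0, 0)))  # (10, 45)
```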
* Remove (Mutable)ArraySlice implementation and alias them to absl::Span. (Tim Shen, 2018-08-30)
  There are several API migrations happening:
  * ArraySlice's sub-slice constructor => .subspan
  * MutableArraySlice's container pointer constructor => absl::MakeSpan
  PiperOrigin-RevId: 210946124
* Removed redundant std::string -> string conversions. (A. Unique TensorFlower, 2018-08-28)
  PiperOrigin-RevId: 210565027
* [tf.data] Enable optimizations for input pipelines with stateful functions. (Jiri Simsa, 2018-08-28)
  PiperOrigin-RevId: 210559796
* Refactor collectives to colocate implementation-specific code. (Ayush Dubey, 2018-08-27)
  Before this change, introducing a new collective algorithm required touching multiple files. CollectiveParams setup was in common_runtime/collective_param_resolver_local, and the data movement was in common_runtime/reducer and common_runtime/broadcaster.
  This change introduces CollectiveImplementationInterface, which brings together param initialization and data movement for a collective algorithm. Every collective implementation will implement this interface and override the virtual methods. This should hopefully reduce obscurity and lead to code with fewer dependencies.
  PiperOrigin-RevId: 210430157
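The interface idea sketched above, each algorithm owning both its parameter setup and its data movement behind one abstraction, can be modeled with an abstract base class. Method names here are illustrative, not the actual C++ signatures:

```python
from abc import ABC, abstractmethod

class CollectiveImplementationInterface(ABC):
    @abstractmethod
    def initialize_collective_params(self, col_params):
        """Algorithm-specific parameter setup."""

    @abstractmethod
    def run(self, inputs):
        """Algorithm-specific data movement."""

class RingReduce(CollectiveImplementationInterface):
    def initialize_collective_params(self, col_params):
        col_params["impl"] = "RingReduce"

    def run(self, inputs):
        return sum(inputs)  # stand-in for ring all-reduce data movement

# Adding a new algorithm now means adding one class, not touching many files.
params = {}
impl = RingReduce()
impl.initialize_collective_params(params)
print(impl.run([1, 2, 3]), params["impl"])
```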