This change complements the existing `InstantiateOptions::executor_type`
option, which takes precedence over the attr if both are provided. It
enables the choice of executor to be separated from both the calling
op implementation and the function definition, which simplifies the
use of custom executors in operations that take a function as an attr
(e.g., `tf.data` and the functional control-flow ops).
PiperOrigin-RevId: 216532778
PiperOrigin-RevId: 216443201
PiperOrigin-RevId: 216395709
Doesn't attempt to handle cases where we might have already generated
the FunctionDef for the parent function, since in that case we cannot easily
modify the forward pass.
PiperOrigin-RevId: 216243224
`set_stats_aggregator`. `tag` would get prepended to all the statistics recorded as summaries, and `counter_prefix` would set the prefix for the statistics recorded as counters.
Note: `counter_prefix` defaults to `\tensorflow`, and `tag` and `counter_prefix` get associated with the dataset (not the stats_aggregator).
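A hypothetical usage sketch (the `tag` and `counter_prefix` arguments follow the description above; the exact module path, `tf.contrib.data` here, is an assumption for this TF 1.x-era API):

```python
import tensorflow as tf

# Hypothetical sketch: attach a StatsAggregator to a dataset and set the
# summary tag and counter prefix described above.
aggregator = tf.contrib.data.StatsAggregator()
dataset = tf.data.Dataset.range(100).apply(
    tf.contrib.data.latency_stats("record_latency"))
dataset = dataset.apply(
    tf.contrib.data.set_stats_aggregator(
        aggregator, tag="train", counter_prefix="dataset_counters"))
```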
PiperOrigin-RevId: 215609159
coordination.
PiperOrigin-RevId: 215309735
PiperOrigin-RevId: 215018984
{Lookup,Create,LookupOrCreate}Resource().
PiperOrigin-RevId: 215008650
PiperOrigin-RevId: 215003704
the duration of a single RunInternal() call from RunHandlerPool. It is used for
running inter-op closures with a global scheduler, which will in the future
improve both median and tail latency (for use cases like CPU inference).
In the case that global pools aren't used, this change should be a no-op.
PiperOrigin-RevId: 214992852
PiperOrigin-RevId: 214853846
the duration of a single RunInternal() call from RunHandlerPool.
We want to leverage this abstraction for improving the cross-session inter-op
parallelism for lower latency inference in the future.
In the case that global pools aren't used, this change should be a no-op.
PiperOrigin-RevId: 214818187
PiperOrigin-RevId: 214726180
The purpose of these ops is to fix a latency problem observed for an inference benchmark. Often an inference step starts by reading the values of many (hundreds of) weights. For a resource variable, this requires a VarHandleOp and a ReadVariableOp per variable. Running hundreds of trivial ops can add hundreds of microseconds of latency to the critical path of an inference step. The inter-op latency of the executor can be hundreds of nanoseconds, which rapidly adds up.
This change introduces two fused ops _VarHandlesOp and _ReadVariablesOp that allow us to read many variables in a pair of larger ops, rather than many tiny ops.
PiperOrigin-RevId: 214662338
This patch introduces an optimization that hoists RandomUniform ops out of map functions.
By doing so, we make the map function stateless, which is crucial for parallelization and vectorization.
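An illustrative sketch of the kind of pipeline this optimization targets (assumed example; the rewrite itself happens inside the tf.data graph optimizer):

```python
import tensorflow as tf

# Calling tf.random_uniform inside the map function makes it stateful, which
# blocks parallelization and vectorization. The hoisting optimization pulls
# the RandomUniform op out so the remaining map function is stateless.
dataset = tf.data.Dataset.range(10).map(
    lambda x: tf.cast(x, tf.float32) + tf.random_uniform([]))
```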
PiperOrigin-RevId: 214623178
PiperOrigin-RevId: 214553359
PiperOrigin-RevId: 214295534
PiperOrigin-RevId: 213990950
refactoring the API for exposing tunable parameters, and removing `model::Node` from the public API.
PiperOrigin-RevId: 213907565
PiperOrigin-RevId: 213886813
PiperOrigin-RevId: 213770000
standard Python `print` function, and deprecates the old `tf.Print` operator (to be removed in v2.0).
It follows the design doc specified in https://github.com/tensorflow/community/pull/14 and additionally incorporates the community feedback and design review decisions.
This CL adds two new internal graph operators: a StringFormat operator that formats a template string with a list of input tensors to insert into the string and outputs a string scalar containing the result, and a PrintV2 operator that prints a string scalar to a specified output stream or logging level.
The formatting op is exposed as `tf.strings.Format`. A new Python method is exposed as `tf.print` that takes a list of inputs that may be nested structures and may contain tensors, formats them nicely using the formatting op, and returns a PrintV2 operator that prints them. In eager mode and inside defuns this PrintV2 operator will automatically be executed, but in graph mode it will need to be either passed to `sess.run` or used as a control dependency for other operators being executed.
As compared to the previous print function, the new print function:
- Has an API that more closely aligns with the standard Python 3 `print`
- Supports changing the print logging level/output stream
- Allows printing arbitrary (optionally nested) data structures as opposed to just flat lists of tensors
- Supports printing sparse tensors
- Changes the printed tensor format to show a more meaningful summary (recursively prints the first and last elements of each tensor dimension, instead of just the first few elements of the tensor regardless of dimension).
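A minimal usage sketch of the behavior described above (illustrative only):

```python
import sys
import tensorflow as tf

# tf.print accepts arbitrary (optionally nested) structures of tensors and
# Python values and returns a print op.
x = tf.constant([[1.0, 2.0], [3.0, 4.0]])
print_op = tf.print("x is:", x, [x, 2 * x], output_stream=sys.stderr)

# In graph mode the print op must be passed to sess.run or used as a control
# dependency; in eager mode and inside defuns it executes automatically.
with tf.control_dependencies([print_op]):
    y = x * 2.0
```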
PiperOrigin-RevId: 213709924
PiperOrigin-RevId: 213693027
The visitor pattern is used to allow pre-registration of memory for
DMA access, e.g. for fast GPU/CPU I/O and for RDMA networking. The
VisitableAllocator interface was introduced to support this use some
time ago, prior to SubAllocators. Memory registration works best if
it's done infrequently, on large pieces of memory, rather than on
every piece that's dynamically allocated/freed. This usage pattern
fits the SubAllocator better than a general Allocator. This change
moves memory allocation visitor access to SubAllocator and eliminates
the VisitableAllocator subclass of Allocator.
This change also more rigorously enforces the requirement that all
Visitors be declared prior to memory allocation beginning. This is
accomplished by requiring that Visitors be provided to the SubAllocator
constructor.
This refactoring will ease an upcoming CL introducing
NUMA-specific CPU devices. It should also fix some performance
pitfalls (e.g. accidental use of PoolAllocator) introduced by an
earlier refactoring of ProcessState that was also in preparation for
NUMA. It restores the default use of the cpu_allocator() value (i.e.
no SubAllocator) by model executions that don't use allocation
visitors (since visitor registration must precede the first allocation,
hence can be detected at that time).
PiperOrigin-RevId: 213505655
PiperOrigin-RevId: 213394522
PiperOrigin-RevId: 213386401
The visitor pattern is used to allow pre-registration of memory for
DMA access, e.g. for fast GPU/CPU I/O and for RDMA networking. The
VisitableAllocator interface was introduced to support this use some
time ago, prior to SubAllocators. Memory registration works best if
it's done infrequently, on large pieces of memory, rather than on
every piece that's dynamically allocated/freed. This usage pattern
fits the SubAllocator better than a general Allocator. This change
moves memory allocation visitor access to SubAllocator and eliminates
the VisitableAllocator subclass of Allocator.
This change also more rigorously enforces the requirement that all
Visitors be declared prior to memory allocation beginning. This is
accomplished by requiring that Visitors be provided to the SubAllocator
constructor.
This refactoring will ease an upcoming CL introducing
NUMA-specific CPU devices. It should also fix some performance
pitfalls (e.g. accidental use of PoolAllocator) introduced by an
earlier refactoring of ProcessState that was also in preparation for
NUMA. It restores the default use of the cpu_allocator() value (i.e.
no SubAllocator) by model executions that don't use allocation
visitors (since visitor registration must precede the first allocation,
hence can be detected at that time).
PiperOrigin-RevId: 213371553
of the fact in the tf.data kernels.
PiperOrigin-RevId: 213361953
Prior to this change,
GraphConstructor::PopulateMissingUnusedInputMapKey() didn't correctly
compute the number of outputs for ops with variadic outputs. This
meant that missing_unused_input_map_keys could contain spurious
entries for unused variadic outputs, which could trigger a ValueError
in import_graph_def.
This also adds a new util method in node_def_util.h, NumOutputsForNode().
PiperOrigin-RevId: 213353158
`num_parallel_calls` argument of `tf.data.Dataset.map()`, `tf.data.Dataset.interleave()`, and `tf.contrib.data.map_and_batch()`.
When `tf.data.AUTOTUNE` is specified, the level of parallelism is determined at runtime. The underlying mechanism instruments the input pipeline to build a performance model and then uses the model to find the optimal values for the parallelism knobs.
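An illustrative sketch of requesting autotuned parallelism (the commit text refers to `tf.data.AUTOTUNE`; in other releases the constant has lived at `tf.data.experimental.AUTOTUNE`, so treat the exact symbol as an assumption):

```python
import tensorflow as tf

# Ask the runtime to pick the degree of parallelism for the map
# transformation instead of hard-coding num_parallel_calls.
dataset = tf.data.Dataset.range(1000).map(
    lambda x: x * 2, num_parallel_calls=tf.data.AUTOTUNE)
```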
PiperOrigin-RevId: 213283297
Stateless MapDatasets can be parallelized by switching to ParallelMapDataset. We set `num_parallel_calls` to 2 for now, but in the future a special value will be used that results in the optimal value being selected dynamically at runtime.
This patch also exposed a memory leak, which has been fixed.
PiperOrigin-RevId: 213015223
Previously, we would schedule a closure for each ResourceHandleOp, because it is erroneously considered to be "expensive". This would cost several microseconds per op, whereas the execution cost of this kernel is as little as 100ns. This change causes these kernels to execute inline at the beginning of a step.
PiperOrigin-RevId: 212712378
optimization pass, instead of a step in XlaCompiler.".
PiperOrigin-RevId: 212657932
copying that type.
This avoids unnecessary string copies and deallocations in the ReadVariableOp and similar ops.
PiperOrigin-RevId: 212652588
performance.
PiperOrigin-RevId: 212557406
1. Change Variant Decode to accept VariantTensorData (non-ref).
This should allow some optimization in the future.
In the meantime it means removing the variant.h include from tensor.h, since
variant_encode_decode.h now relies on tensor.h and variant.h now relies on that.
It also means we found a bunch of places where tensor.proto.h, variant.h, and
mutex.h were being imported transitively through tensor.h (along with a bunch of
other headers), so now we import them directly in order to compile.
2. Move Variant registry to use TypeIndex instead of a TypeName string; this should
speed up registry lookups.
PiperOrigin-RevId: 212478896
time.
PiperOrigin-RevId: 212321238
PiperOrigin-RevId: 212182923
a step in XlaCompiler.
PiperOrigin-RevId: 212164482
StringPiece has been changed to string to avoid the static destruction order fiasco (we store pointers that might have a shorter lifetime) and also to use unordered_set (there is hash specialization for StringPiece).
PiperOrigin-RevId: 212059185
PiperOrigin-RevId: 211733735
optimization.
PiperOrigin-RevId: 211179990
This will allow the functional tf.while_loop proposed in https://github.com/tensorflow/community/pull/13 to achieve feature parity with the current implementation.
Lowering is performed only when the "_lower_using_switch_merge" attr is set to True.
PiperOrigin-RevId: 210956432
There are several API migrations happening:
* ArraySlice's sub-slice constructor => .subspan
* MutableArraySlice's container pointer constructor => absl::MakeSpan
PiperOrigin-RevId: 210946124
|
PiperOrigin-RevId: 210565027
PiperOrigin-RevId: 210559796
Before this change, introducing a new collective algorithm required touching
multiple files. CollectiveParams setup was in common_runtime/collective_param_resolver_local,
and the data movement was in common_runtime/reducer and common_runtime/broadcaster.
This change introduces CollectiveImplementationInterface.
CollectiveImplementationInterface brings together param initialization and data
movement for a collective algorithm. Every collective implementation will
implement this interface and override the virtual methods. This should
hopefully reduce obscurity and lead to code with fewer dependencies.
PiperOrigin-RevId: 210430157