tensorflow - machine learning framework

	Commit message	Author	Age
*	Branch 183429339 (#16469)	Rasmus Munk Larsen	2018-01-26
	* Change `reduce_logsumexp` to internally use `reshape` rather than `squeeze` since the latter requires the `axis` arg to be a Python `list`. PiperOrigin-RevId: 183396533
	* Kernel utils to support broadcast add and mul. PiperOrigin-RevId: 183397494
	* Updating sparsify_gather. PiperOrigin-RevId: 183402917
	* [tf.data] Move slow-path-related code into the slow path in IteratorHandleOp::Compute(). This slightly reduces the amount of work performed when an iterator is accessed (after the first access), and potentially reduces contention if concurrent steps are accessing the same iterator. PiperOrigin-RevId: 183406221
	* Cleanup: Ran clang-format on all .{cc,h} files under grappler. PiperOrigin-RevId: 183406440
	* Increase shard count of //third_party/tensorflow/python:nn_batchnorm_test to avoid timeouts. When run under asan, the test runs for about 5 minutes, and sometimes longer, causing frequent timeouts. This change increases the shard count of the test to 4, which brings the run time of the longest running shard under asan to about 2 minutes. PiperOrigin-RevId: 183414888
	* Add available choices to toco flags and fix minor formatting issues. PiperOrigin-RevId: 183415713
	* Performance improvements to some GPU code to use shared locks instead of unique locks for some hotspot cases. PiperOrigin-RevId: 183418559
	* [XLA] Improve error message for bad slices. PiperOrigin-RevId: 183420038
	* Fix py3 build rules for all py tests under py2tf. PiperOrigin-RevId: 183422144
	* Fix bug with Operation._control_inputs setter. PiperOrigin-RevId: 183422192
	* Make softmax_op_test.py work with C API enabled. PiperOrigin-RevId: 183422829
	* Cleanup: Ran clang-format on all .{cc,h} files in tensorflow/core/kernels. PiperOrigin-RevId: 183423961
	* Fix the documentation for the dense layer for how rank > 2 inputs are handled. PiperOrigin-RevId: 183425868
	* Cleanup: Ran clang-format on all *.{cc,h} in tensorflow/core/ops. PiperOrigin-RevId: 183429339
*	Fix metagemm calls in quantized ops. Only use metagemm multiplication for	A. Unique TensorFlower	2017-07-18
	k <= 2048. PiperOrigin-RevId: 162356878
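The size guard in the entry above amounts to a dispatch check. A minimal sketch, assuming `k` is the shared inner dimension of the matrix product (the usual GEMM convention; the commit message does not define it). `kMaxMetaGemmDepth` and `ShouldUseMetaGemm` are hypothetical names for illustration, not TensorFlow functions:

```cpp
#include <cassert>

// Largest k for which the metagemm path is taken, per the commit message.
// The constant name is hypothetical.
constexpr int kMaxMetaGemmDepth = 2048;

// Returns true when the metagemm path should handle a product of depth k.
// `meta_enabled` stands in for whatever enable check the real code performs.
bool ShouldUseMetaGemm(bool meta_enabled, int k) {
  assert(k > 0);  // a matrix product needs a positive inner dimension
  return meta_enabled && k <= kMaxMetaGemmDepth;
}
```

Under this sketch, larger products would fall through to whichever other multiplication path the op provides.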
*	Fix metagemm quantization offsets in matmul.	A. Unique TensorFlower	2017-03-02
	Change: 149021831
*	Mark gemmlowp result as initialized.	Patrick Nguyen	2017-01-04
	Change: 143563367
*	Define GEMMLOWP_ALLOW_SLOW_SCALAR_FALLBACK to allow building with	A. Unique TensorFlower	2016-12-22
	latest gemmlowp, which makes it an error by default to build without SSE4.1 on x86 or NEON on ARM. Change: 142758203
*	Arm32/64 kernel optimizations:	A. Unique TensorFlower	2016-10-27
	- QuantizeV2
	- Dequantize
	- QuantizedBiasAdd
	- QuantizeDownAndShrinkRange
	- QuantizedRelu
	- QuantizedRelu6
	- QuantizedMatMul
	- QuantizedConv
	The optimizations are controlled by three knobs:
	meta::SetEnabled(bool) -- turns codepath on/off, on by default
	meta::SetUseLocalContext(bool)
	  -- true -- codepath will use its own internal fine-grained worker pool that offers performance improvement over the standard tensorflow worker pool. This worker pool is not compatible with other ops. Per use-case performance testing recommended.
	  -- false (default) -- use the standard tf worker pool instance
	meta::SetNumThreads(int) -- no. of compute threads when the internal worker pool is used. If 0 use intra_parallelism_count, if x > 0 then x threads.
	Change: 137448955
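The thread-count rule described in the entry above can be sketched as follows. This is an illustration of the stated semantics, not the TensorFlow implementation: `MetaConfig` and `ResolveNumThreads` are hypothetical names, and the default of 0 for the thread count is an assumption (the message states defaults only for the other two knobs).

```cpp
#include <cassert>

// Hypothetical model of the three knobs named in the commit message.
// This is not the real meta:: namespace from TensorFlow.
struct MetaConfig {
  bool enabled = true;             // meta::SetEnabled: on by default (per the message)
  bool use_local_context = false;  // meta::SetUseLocalContext: false by default (per the message)
  int num_threads = 0;             // meta::SetNumThreads: 0 is an assumed default
};

// Returns the number of compute threads for the internal worker pool:
// 0 means "use intra_parallelism_count", x > 0 means "use x threads".
int ResolveNumThreads(const MetaConfig& config, int intra_parallelism_count) {
  assert(config.num_threads >= 0);
  assert(intra_parallelism_count > 0);
  return config.num_threads == 0 ? intra_parallelism_count
                                 : config.num_threads;
}
```

According to the message, the resolved count only matters when the local context is enabled, since the standard worker pool is used otherwise.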
*	Automated rollback of change 137197327	A. Unique TensorFlower	2016-10-25
	Change: 137225083
*	Arm32/64 kernel optimizations:	A. Unique TensorFlower	2016-10-25
	- QuantizeV2
	- Dequantize
	- QuantizedBiasAdd
	- QuantizeDownAndShrinkRange
	- QuantizedRelu
	- QuantizedRelu6
	- QuantizedMatMul
	- QuantizedConv
	The optimizations are controlled by three knobs:
	meta::SetEnabled(bool) -- turns codepath on/off, on by default
	meta::SetUseLocalContext(bool)
	  -- true -- codepath will use its own internal fine-grained worker pool that offers performance improvement over the standard tensorflow worker pool. This worker pool is not compatible with other ops. Per use-case performance testing recommended.
	  -- false (default) -- use the standard tf worker pool instance
	meta::SetNumThreads(int) -- no. of compute threads when the internal worker pool is used. If 0 use intra_parallelism_count, if x > 0 then x threads.
	Change: 137197327
*	Move contrib/quantization ops to tensorflow/core	Andrew Harp	2016-10-17
	Change: 136410307
*	Automated rollback of change 134501895	A. Unique TensorFlower	2016-09-28
	Change: 134506649
*	Move contrib/quantization ops to tensorflow/core	Andrew Harp	2016-09-28
	Change: 134501895