| Commit message | Author | Age |
* Use cub::ReduceByKey to count how many times each partition index appears.
This implements a suggestion by @ekelsen. It replaces the previous
custom-made counting method and is likely more efficient.
* Remove CubReduceAdd and use cub::Sum instead.
PiperOrigin-RevId: 167333886
PiperOrigin-RevId: 166396680
PiperOrigin-RevId: 166294015
PiperOrigin-RevId: 165887626
deterministic and mostly faster (some cases use the CUB library).
Reductions over bool, std::complex<float>, std::complex<double>, and Eigen::half
are anywhere from 50-1000x faster. For floats, row and column reductions
are ~20-100% faster, and reductions to a scalar are performance neutral;
on P100 the float reduction to a scalar is slightly slower.
Mean reductions are more numerically stable: by doing the division after the
sum instead of before, we avoid changing mantissa bits, which can introduce a
bias.
Mean reductions for half are now accurate as well as fast: the reduction is
done internally in fp32 and cast to half just before the final write.
Also switch the l2loss op to the new reductions as a demonstration of how
to use them. Other major users of Eigen's reductions will be replaced next,
so that the most common training cases become deterministic with respect to
reductions.
PiperOrigin-RevId: 165773305