Switch the softmax to use the new deterministic reductions on the GPU.

This results in a speedup of 10-40x on the existing ImageNet benchmarks and 2-3x on the newly added transformer benchmarks. Update the benchmark to also run on the GPU. Remove duplicate CPU tests.

PiperOrigin-RevId: 168596693
author: A. Unique TensorFlower <gardener@tensorflow.org> 2017-09-13 14:32:25 -0700
committer: TensorFlower Gardener <gardener@tensorflow.org> 2017-09-13 14:36:05 -0700
commit: d6f9d6109474a9162ef4d99520a2d4ef0becfb14 (patch)
tree: 697b9ef4b78b4ab570d7f1a074219844813cc9ed /tensorflow/core/kernels/reduction_ops_gpu_bool.cu.cc
parent: f445958edbca3ad292c9ed8c9de0c7e047b1d2bd (diff)
1 file changed, 1 insertion, 1 deletion
diff --git a/tensorflow/core/kernels/reduction_ops_gpu_bool.cu.cc b/tensorflow/core/kernels/reduction_ops_gpu_bool.cu.cc
index 3e7a33ba3f..79ec1d59df 100644
--- a/tensorflow/core/kernels/reduction_ops_gpu_bool.cu.cc
+++ b/tensorflow/core/kernels/reduction_ops_gpu_bool.cu.cc
@@ -17,7 +17,7 @@ limitations under the License.
 
 #define EIGEN_USE_GPU
 
-#include "tensorflow/core/kernels/reduction_ops_gpu_kernels.h"
+#include "tensorflow/core/kernels/reduction_gpu_kernels.cu.h"
 
 namespace tensorflow {
 namespace functor {
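
The commit message describes softmax built on deterministic reductions, but the one-line include change in this file does not show what that means. Below is a minimal CPU-side sketch in plain C++, not the TensorFlow GPU kernel. It assumes "deterministic" means the additions happen in an order fixed by the input length alone, so repeated runs give bit-identical results. The names `TreeSum` and `Softmax` are hypothetical and do not come from the TensorFlow source.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: pairwise (tree) sum with a fixed addition order.
// The pairing depends only on the input length, never on thread timing,
// which is the property assumed here for a "deterministic" reduction.
float TreeSum(std::vector<float> v) {
  if (v.empty()) return 0.0f;
  std::size_t n = v.size();
  while (n > 1) {
    const std::size_t m = (n + 1) / 2;  // size after this round
    // Fold the upper part onto the lower part; for odd n the middle
    // element at index m - 1 is carried over unchanged.
    for (std::size_t i = 0; i < n / 2; ++i) {
      v[i] += v[i + m];
    }
    n = m;
  }
  return v[0];
}

// Hypothetical helper: numerically stable softmax over one row, expressed
// as two reductions (max, then sum of exponentials) and an elementwise divide.
std::vector<float> Softmax(const std::vector<float>& logits) {
  if (logits.empty()) return {};
  // Reduction 1: row maximum, subtracted to avoid overflow in exp.
  const float max_logit = *std::max_element(logits.begin(), logits.end());
  std::vector<float> out(logits.size());
  for (std::size_t i = 0; i < logits.size(); ++i) {
    out[i] = std::exp(logits[i] - max_logit);
  }
  // Reduction 2: sum of exponentials in a fixed order.
  const float sum = TreeSum(out);
  for (float& e : out) {
    e /= sum;
  }
  return out;
}
```

On a GPU the same tree shape would typically be spread across the threads of a block or warp. A reduction that accumulates with floating-point atomics, by contrast, can change its summation order from run to run, which is the nondeterminism such a design avoids.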