path: root/tensorflow/core/kernels/batch_matmul_op.cc
Commit message | Author | Age
* Shard batch_matmul_op across files to speed up build. | A. Unique TensorFlower | 2016-09-30
  Change: 134820471
* Speed up BatchMatMul when either side is a vector or batch of vectors. | A. Unique TensorFlower | 2016-09-30
  This implements the same basic workaround as cl/128009436, namely to use the regular Eigen matmul kernel instead of the Eigen tensor contraction when not parallelizing the inner matrix products in BatchMatMul.

  Run on rmlarsen3.mtv (12 X 3501 MHz CPUs); 2016-09-22T14:22:52.507929557-07:00
  CPU: Intel Haswell with HyperThreading (6 cores) dL1:32KB dL2:256KB dL3:15MB

  Benchmark                                               Base (ns)   New (ns)  Improvement
  -----------------------------------------------------------------------------------------
  BM_BatchMatmul_1_1_1024_1024_false_false_DT_FLOAT_cpu      179300     122010       +32.0%
  BM_BatchMatmul_2_1_1024_1024_false_false_DT_FLOAT_cpu      209037     162153       +22.4%
  BM_BatchMatmul_8_1_1024_1024_false_false_DT_FLOAT_cpu      906019     946502        -4.5%
  BM_BatchMatmul_32_1_1024_1024_false_false_DT_FLOAT_cpu    3814403    4473018       -17.3%
  BM_BatchMatmul_1_10000_200_1_false_false_DT_FLOAT_cpu      322285     252677       +21.6%
  BM_BatchMatmul_8_10000_200_1_false_false_DT_FLOAT_cpu     2370631    2028039       +14.5%
  BM_BatchMatmul_32_10000_200_1_false_false_DT_FLOAT_cpu    8994979   12904697       -43.5%
  BM_BatchMatmul_1_10000_200_1_true_false_DT_FLOAT_cpu       663253     223017       +66.4%
  BM_BatchMatmul_8_10000_200_1_true_false_DT_FLOAT_cpu      5731654    2266151       +60.5%
  BM_BatchMatmul_32_10000_200_1_true_false_DT_FLOAT_cpu    18692987   12063885       +35.5%
  BM_BatchMatmul_1_10000_200_1_false_true_DT_FLOAT_cpu       318234     251075       +21.1%
  BM_BatchMatmul_8_10000_200_1_false_true_DT_FLOAT_cpu      2355295    2032887       +13.7%
  BM_BatchMatmul_32_10000_200_1_false_true_DT_FLOAT_cpu     8997442   11618660       -29.1%
  BM_BatchMatmul_1_10000_200_1_true_true_DT_FLOAT_cpu        652865     225256       +65.5%
  BM_BatchMatmul_8_10000_200_1_true_true_DT_FLOAT_cpu       5700875    2383607       +58.2%
  BM_BatchMatmul_32_10000_200_1_true_true_DT_FLOAT_cpu     18957878   12451622       +34.3%
  BM_BatchMatmul_1_1_200_10000_false_false_DT_FLOAT_cpu      288420     226135       +21.6%
  BM_BatchMatmul_8_1_200_10000_false_false_DT_FLOAT_cpu     2155747    2406166       -11.6%
  BM_BatchMatmul_32_1_200_10000_false_false_DT_FLOAT_cpu   10031700   12248817       -22.1%
  BM_BatchMatmul_1_1_200_10000_true_false_DT_FLOAT_cpu       298456     226108       +24.2%
  BM_BatchMatmul_8_1_200_10000_true_false_DT_FLOAT_cpu      2096256    2409435       -14.9%
  BM_BatchMatmul_32_1_200_10000_true_false_DT_FLOAT_cpu    10259905   12408712       -20.9%
  BM_BatchMatmul_1_1_200_10000_false_true_DT_FLOAT_cpu      1657311     254414       +84.6%
  BM_BatchMatmul_8_1_200_10000_false_true_DT_FLOAT_cpu      5976722    2031486       +66.0%
  BM_BatchMatmul_32_1_200_10000_false_true_DT_FLOAT_cpu    23514286   11622619       +50.6%
  BM_BatchMatmul_1_1_200_10000_true_true_DT_FLOAT_cpu       1653482     250161       +84.9%
  BM_BatchMatmul_8_1_200_10000_true_true_DT_FLOAT_cpu       5951562    2032097       +65.9%
  BM_BatchMatmul_32_1_200_10000_true_true_DT_FLOAT_cpu     23587247   11633259       +50.7%

  Change: 134814248
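
  The Improvement figures in this commit message are consistent with (base - new) / base expressed as a percentage, so a negative value is a slowdown. A minimal C++ sketch of that arithmetic follows; the names `ImprovementPercent` and `RoundToTenth` are hypothetical and are not part of the TensorFlow benchmark tooling.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper: relative change from base_ns to new_ns, in percent.
// Positive means the new code is faster; negative means it is slower.
double ImprovementPercent(std::int64_t base_ns, std::int64_t new_ns) {
  assert(base_ns > 0);  // a zero or negative baseline has no meaningful ratio
  return 100.0 * static_cast<double>(base_ns - new_ns) /
         static_cast<double>(base_ns);
}

// Rounds to one decimal place, the precision used in the benchmark table.
double RoundToTenth(double x) { return std::round(x * 10.0) / 10.0; }
```

  For example, the first row (179300 ns base, 122010 ns new) gives roughly 31.95, which rounds to the +32.0% shown.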
* Parallelize inner matrix multiplications of BatchMatMul on CPU when appropriate. | A. Unique TensorFlower | 2016-09-23
  * Uses simple heuristics to choose between parallelizing outer (batch), inner (matmul) or both.
  * Adds benchmarks for BatchMatMul.
  * Switches matmul benchmark to use real time so GFlops reported are w.r.t. walltime and measure the effect of multi-threading.
  * Refactors the code to avoid a lot of extra instantiations of the Eigen contraction code, which bloat code size (the test binary shrinks from 9.3MB to 5.5MB) and compilation time (binary compilation time drops from ~210 to ~75 seconds).
  * Fixes bug in cost_per_unit calculation. The old code calculated B*M*N instead of M*N*K.
  Change: 134138821
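
  The cost_per_unit fix and the outer/inner choice can be sketched as below. This is an illustration under stated assumptions, not the code in batch_matmul_op.cc: `CostPerUnit`, `ParallelMode`, `ChooseMode`, and the threshold `kLargeProductCost` are all hypothetical, and the commit message does not say which heuristics or thresholds are actually used.

```cpp
#include <cassert>
#include <cstdint>

// Approximate cost of one matrix product in the batch: an (M x K) by (K x N)
// product performs M * N * K multiply-adds. Per the commit message, the old
// code computed B * M * N instead, which ignores the contraction size K.
std::int64_t CostPerUnit(std::int64_t m, std::int64_t n, std::int64_t k) {
  assert(m >= 0 && n >= 0 && k >= 0);
  return m * n * k;
}

enum class ParallelMode { kOuter, kInner, kBoth };

// Hypothetical heuristic (assumption, not TensorFlow's actual rule):
// parallelize over the batch when products are small or the batch alone can
// occupy every thread; otherwise also parallelize inside each product.
ParallelMode ChooseMode(std::int64_t batch, std::int64_t cost_per_unit,
                        int num_threads) {
  constexpr std::int64_t kLargeProductCost = 1 << 20;  // placeholder threshold
  const bool large_product = cost_per_unit >= kLargeProductCost;
  if (!large_product || batch >= num_threads) return ParallelMode::kOuter;
  return batch == 1 ? ParallelMode::kInner : ParallelMode::kBoth;
}
```

  Under these placeholder values, a single 1024x1024x1024 product on 8 threads would select inner parallelism, while a batch of 32 small products would be sharded over the batch only.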
* Automated rollback of change 134025273 | A. Unique TensorFlower | 2016-09-22
  Change: 134037266
* Parallelize inner matrix multiplications of BatchMatMul on CPU when appropriate. | A. Unique TensorFlower | 2016-09-22
  * Uses simple heuristics to choose between parallelizing outer (batch), inner (matmul) or both.
  * Adds benchmarks for BatchMatMul.
  * Switches matmul benchmark to use real time so GFlops reported are w.r.t. walltime and measure the effect of multi-threading.
  * Fixes bug in cost_per_unit calculation. The old code calculated B*M*N instead of M*N*K.
  Change: 134025273
* Support half floats in BatchMatMul. | A. Unique TensorFlower | 2016-09-20
  Clean up tests and extend coverage to all supported types.
  Change: 133766358
* Enable complex GPU kernels for tf.matmul and tf.batch_matmul. | RJ Ryan | 2016-07-13
  Change: 127386123
* Merge changes from github. | Martin Wicke | 2016-06-03
  Change: 124012080
* Update copyright for 3p/tf/core. | A. Unique TensorFlower | 2016-06-02
  Change: 123900938
* Rollback of Rollback of "Add MultivariateNormal to tf.contrib.distributions." | Eugene Brevdo | 2016-04-20
  Also fix overly stringent constraints on batchwise linalg ops & batch_matmul.
  Change: 120387462
* Rollback of "Add MultivariateNormal to tf.contrib.distributions." | A. Unique TensorFlower | 2016-04-19
  Also fix overly stringent constraints on batchwise linalg ops & batch_matmul.
  Change: 120289843
* Add MultivariateNormal to tf.contrib.distributions. | Eugene Brevdo | 2016-04-19
  Also fix overly stringent constraints on batchwise linalg ops & batch_matmul.
  Change: 120279428
* Support ScratchAllocator in BLAS Batched GEMM | A. Unique TensorFlower | 2016-03-18
  Change: 117590857
* Use the untemplated version of OpKernelContext::op_device_context() so we don't need to #include tensorflow/core/common_runtime/gpu_device_context.h. | Josh Levenberg | 2016-03-11
  Change: 117019921
* Global search & replace to move to the new location for tensorflow/core/ files and build targets. | Josh Levenberg | 2016-01-26
  Change: 113075177
* Change uses of TensorShape::ShortDebugString to DebugString | Geoffrey Irving | 2016-01-22
  The two functions already have the same behavior, and ShortDebugString will disappear soon.
  Change: 112793490
* Move #include <vector> out of port.h to users of std::vector<>. | Josh Levenberg | 2016-01-21
  After this we can replace port.h with types.h.
  Change: 112727463
* Many tensorflow/core build clean ups. | Josh Levenberg | 2016-01-20
  Change: 112523833
* Replacing reference 'names' variable with 'example_names' variable. | A. Unique TensorFlower | 2016-01-20
  Change: 112481326
* TensorFlow: Improve performance of Alexnet | Manjunath Kudlur | 2015-11-20
  Changes:
  * error message that refers to removed `DefaultSession` method.
  * -Wnull-conversion warnings
  * the "_start_time" attr for recvs when the flag "--brain_enable_scheduling_for_recvs" is set.
  * typo in tutorial data download progress message.
  * a typo ("however their installing"=>"however installing").
  * typo, rename "TensorFlow Mechanics" to "How To" to be consistent with the website.
  * a typo ("subtact"=>"subtract").
  * protobuf examples in comments in tensorflow::Example.proto.
  * formula formatting in MNIST beginner tutorial
  * negative fraction-of-queue-full stats
  * protobuf inclusion path so that Android demo will build under Blaze.
  * small typo (moderatly > moderately)
  * Session.run() to check that tensor arguments come from the session's graph.
  * another six import
  * seq2seq typo in bazel command
  Base CL: 108349164
* TensorFlow: Doc and linter fixes, some additional tests and error handling, updates to website. | Vijay Vasudevan | 2015-11-16
  Changes:
  - Removes redundant reshape from image models by @mrry
  - Default TensorBoard to localhost by @danmane
  - Reformatting of tensorflow/core by @josh11b
  - Make tutorials backwards compatible to 0.5.0 by @girving
  - Improve print documentation (md files not updated).
  - Add proper scrolling to sitemap by @martinwicke
  Base CL: 107956254
* TensorFlow: Initial commit of TensorFlow library. | Manjunath Kudlur | 2015-11-06
  TensorFlow is an open source software library for numerical computation using data flow graphs.
  Base CL: 107276108