path: root/tensorflow/python/kernel_tests/batch_matmul_op_test.py
* Improves performance of tf.matmul(a, b, ...) for dense tensors on NVIDIA GPUs (A. Unique TensorFlower, 2017-03-17)

    Improves performance in the following cases:

    a) If the inner-most dimension of b is 1, i.e. the operation is (possibly a
       batch of) matrix*vector multiplication(s). This is accomplished by
       calling cuBLAS GEMV rather than GEMM, which speeds up large
       matrix-vector products by about 4x.
    b) If one or more dimensions are unknown at graph construction time but the
       operation is in fact either a single matrix*matrix or matrix*vector
       multiplication.

    The following benchmark numbers, illustrating the improvements for
    matrix*vector products, were measured on an NVIDIA Titan X (Maxwell) card.

    Benchmark                                        Base (ns)  New (ns)  Improvement
    ---------------------------------------------------------------------------------
    BM_Matmul_50_50_1_false_false_DT_FLOAT_gpu           18102     17056        +5.8%
    BM_Matmul_50_50_1_true_false_DT_FLOAT_gpu            18108     16374        +9.6%
    BM_Matmul_50_50_1_false_true_DT_FLOAT_gpu            18153     17173        +5.4%
    BM_Matmul_50_50_1_true_true_DT_FLOAT_gpu             18150     15950       +12.1%
    BM_Matmul_500_500_1_false_false_DT_FLOAT_gpu         64605     16874       +73.9%
    BM_Matmul_500_500_1_true_false_DT_FLOAT_gpu          62810     17298       +72.5%
    BM_Matmul_500_500_1_false_true_DT_FLOAT_gpu          60447     17014       +71.9%
    BM_Matmul_500_500_1_true_true_DT_FLOAT_gpu           58443     16934       +71.0%
    BM_Matmul_2000_2000_1_false_false_DT_FLOAT_gpu      343298     81898       +76.1%
    BM_Matmul_2000_2000_1_true_false_DT_FLOAT_gpu       294738     63723       +78.4%
    BM_Matmul_2000_2000_1_false_true_DT_FLOAT_gpu       300671     83650       +72.2%
    BM_Matmul_2000_2000_1_true_true_DT_FLOAT_gpu        284540     63742       +77.6%

    Change: 150456725
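The matrix*vector case in a) above can be sketched with NumPy. This is a sketch of the semantics only, not TensorFlow's cuBLAS dispatch: expressing a matmul whose right-hand operand has inner-most dimension 1 as a plain matrix*vector product yields the same result, which is why the kernel can route it to GEMV.

```python
import numpy as np

rng = np.random.default_rng(0)

# A matmul with an inner-most dimension of 1 -- the case the commit routes
# to cuBLAS GEMV instead of GEMM.
a = rng.standard_normal((500, 500)).astype(np.float32)
b = rng.standard_normal((500, 1)).astype(np.float32)   # n x 1: effectively a vector

via_gemm = a @ b                     # matrix*matrix form, result shape (500, 1)
via_gemv = (a @ b[:, 0])[:, None]    # matrix*vector form, then restore the axis

assert via_gemm.shape == (500, 1)
np.testing.assert_allclose(via_gemm, via_gemv, rtol=1e-5, atol=1e-3)
```

The tolerance is loose because the two forms may accumulate in different orders.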
* Clean up matmul test suites and make sure that complex tests set the imaginary part (A. Unique TensorFlower, 2017-03-02)

    Fix a bug in the test util function assertAllCloseAccordingToType, which
    wasn't picking up the right values for complex64.

    Change: 149016078
* Remove hourglass imports from kernel_tests (Justine Tunney, 2016-12-14)

    Change: 142080137
* Add support for adjoint and batch matmul to tf.matmul (A. Unique TensorFlower, 2016-11-16)

    Change: 139370036
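The adjoint and batch semantics added here can be illustrated with NumPy (an assumed equivalence for illustration, not TensorFlow's kernel): `adjoint_a=True` multiplies by the conjugate transpose of each matrix in the batch.

```python
import numpy as np

rng = np.random.default_rng(1)

# Batch of 4 complex matrices; tf.matmul(a, b, adjoint_a=True) multiplies by
# the per-batch conjugate transpose of a. NumPy sketch of those semantics:
a = (rng.standard_normal((4, 3, 5))
     + 1j * rng.standard_normal((4, 3, 5))).astype(np.complex64)
b = (rng.standard_normal((4, 3, 2))
     + 1j * rng.standard_normal((4, 3, 2))).astype(np.complex64)

adjoint_a = np.conj(np.swapaxes(a, -1, -2))   # conjugate transpose per batch entry
result = adjoint_a @ b                        # batched matmul, shape (4, 5, 2)

assert result.shape == (4, 5, 2)
# Spot-check one batch entry against the unbatched formula:
np.testing.assert_allclose(result[0], np.conj(a[0]).T @ b[0], atol=1e-5)
```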
* Speed up BatchMatMul when either side is a vector or batch of vectors (A. Unique TensorFlower, 2016-09-30)

    This implements the same basic workaround as cl/128009436, namely to use
    the regular Eigen matmul kernel instead of the Eigen tensor contraction
    when not parallelizing the inner matrix products in BatchMatMul.

    Run on rmlarsen3.mtv (12 X 3501 MHz CPUs); 2016-09-22T14:22:52.507929557-07:00
    CPU: Intel Haswell with HyperThreading (6 cores) dL1:32KB dL2:256KB dL3:15MB

    Benchmark                                                 Base (ns)   New (ns)  Improvement
    -------------------------------------------------------------------------------------------
    BM_BatchMatmul_1_1_1024_1024_false_false_DT_FLOAT_cpu        179300     122010       +32.0%
    BM_BatchMatmul_2_1_1024_1024_false_false_DT_FLOAT_cpu        209037     162153       +22.4%
    BM_BatchMatmul_8_1_1024_1024_false_false_DT_FLOAT_cpu        906019     946502        -4.5%
    BM_BatchMatmul_32_1_1024_1024_false_false_DT_FLOAT_cpu      3814403    4473018       -17.3%
    BM_BatchMatmul_1_10000_200_1_false_false_DT_FLOAT_cpu        322285     252677       +21.6%
    BM_BatchMatmul_8_10000_200_1_false_false_DT_FLOAT_cpu       2370631    2028039       +14.5%
    BM_BatchMatmul_32_10000_200_1_false_false_DT_FLOAT_cpu      8994979   12904697       -43.5%
    BM_BatchMatmul_1_10000_200_1_true_false_DT_FLOAT_cpu         663253     223017       +66.4%
    BM_BatchMatmul_8_10000_200_1_true_false_DT_FLOAT_cpu        5731654    2266151       +60.5%
    BM_BatchMatmul_32_10000_200_1_true_false_DT_FLOAT_cpu      18692987   12063885       +35.5%
    BM_BatchMatmul_1_10000_200_1_false_true_DT_FLOAT_cpu         318234     251075       +21.1%
    BM_BatchMatmul_8_10000_200_1_false_true_DT_FLOAT_cpu        2355295    2032887       +13.7%
    BM_BatchMatmul_32_10000_200_1_false_true_DT_FLOAT_cpu       8997442   11618660       -29.1%
    BM_BatchMatmul_1_10000_200_1_true_true_DT_FLOAT_cpu          652865     225256       +65.5%
    BM_BatchMatmul_8_10000_200_1_true_true_DT_FLOAT_cpu         5700875    2383607       +58.2%
    BM_BatchMatmul_32_10000_200_1_true_true_DT_FLOAT_cpu       18957878   12451622       +34.3%
    BM_BatchMatmul_1_1_200_10000_false_false_DT_FLOAT_cpu        288420     226135       +21.6%
    BM_BatchMatmul_8_1_200_10000_false_false_DT_FLOAT_cpu       2155747    2406166       -11.6%
    BM_BatchMatmul_32_1_200_10000_false_false_DT_FLOAT_cpu     10031700   12248817       -22.1%
    BM_BatchMatmul_1_1_200_10000_true_false_DT_FLOAT_cpu         298456     226108       +24.2%
    BM_BatchMatmul_8_1_200_10000_true_false_DT_FLOAT_cpu        2096256    2409435       -14.9%
    BM_BatchMatmul_32_1_200_10000_true_false_DT_FLOAT_cpu      10259905   12408712       -20.9%
    BM_BatchMatmul_1_1_200_10000_false_true_DT_FLOAT_cpu        1657311     254414       +84.6%
    BM_BatchMatmul_8_1_200_10000_false_true_DT_FLOAT_cpu        5976722    2031486       +66.0%
    BM_BatchMatmul_32_1_200_10000_false_true_DT_FLOAT_cpu      23514286   11622619       +50.6%
    BM_BatchMatmul_1_1_200_10000_true_true_DT_FLOAT_cpu         1653482     250161       +84.9%
    BM_BatchMatmul_8_1_200_10000_true_true_DT_FLOAT_cpu         5951562    2032097       +65.9%
    BM_BatchMatmul_32_1_200_10000_true_true_DT_FLOAT_cpu       23587247   11633259       +50.7%

    Change: 134814248
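The "batch of vectors" shape family that these benchmarks exercise can be sketched with NumPy (illustrative shapes only; the commit itself is about the Eigen kernel choice inside BatchMatMul, which NumPy does not model): a batched matmul whose right operand has inner-most dimension 1 is just a loop of matrix*vector products.

```python
import numpy as np

rng = np.random.default_rng(2)

# BatchMatMul where the right-hand side is a batch of column vectors
# (inner-most dimension 1).
a = rng.standard_normal((8, 64, 32)).astype(np.float32)   # batch of 8 matrices
b = rng.standard_normal((8, 32, 1)).astype(np.float32)    # batch of 8 vectors

batched = a @ b                                           # shape (8, 64, 1)
# Equivalent explicit loop of plain matrix*vector products:
looped = np.stack([a[i] @ b[i] for i in range(8)])

assert batched.shape == (8, 64, 1)
np.testing.assert_allclose(batched, looped, rtol=1e-5, atol=1e-4)
```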
* Support half floats in BatchMatMul (A. Unique TensorFlower, 2016-09-20)

    Clean up tests and extend coverage to all supported types.

    Change: 133766358
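Half-float matmuls lose precision quickly, which is why type-aware test helpers (like the assertAllCloseAccordingToType mentioned elsewhere in this log) need looser tolerances for float16. A NumPy sketch of that effect (illustrative only, not the TensorFlow kernel):

```python
import numpy as np

rng = np.random.default_rng(3)

# BatchMatMul over float16 inputs, compared against a float32 reference.
# Half precision accumulates error fast, so the comparison tolerance must
# be much looser than for float32.
a = rng.standard_normal((2, 16, 16)).astype(np.float16)
b = rng.standard_normal((2, 16, 16)).astype(np.float16)

half = a @ b
reference = a.astype(np.float32) @ b.astype(np.float32)

assert half.dtype == np.float16
np.testing.assert_allclose(half.astype(np.float32), reference,
                           rtol=1e-2, atol=0.1)
```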
* Clean up another batch of tensorflow tests that are using use_gpu (Gunhan Gulsoy, 2016-09-09)

    Change: 132750089
* Enable complex GPU kernels for tf.matmul and tf.batch_matmul (RJ Ryan, 2016-07-13)

    Change: 127386123
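A complex matmul decomposes into four real matmuls via (a+bi)(c+di) = (ac - bd) + (ad + bc)i. A NumPy sketch of that identity for complex64 (illustrative of the arithmetic only, not of how the GPU kernel is implemented):

```python
import numpy as np

rng = np.random.default_rng(4)

x = (rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))).astype(np.complex64)
y = (rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))).astype(np.complex64)

z = x @ y
# The same product assembled from four real matmuls:
real = x.real @ y.real - x.imag @ y.imag
imag = x.real @ y.imag + x.imag @ y.real

assert z.dtype == np.complex64
np.testing.assert_allclose(z.real, real, atol=1e-4)
np.testing.assert_allclose(z.imag, imag, atol=1e-4)
```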
* Update copyright for 3p/tf/python (A. Unique TensorFlower, 2016-06-02)

    Change: 123900456
* Get rid of some import cruft (Josh Levenberg, 2016-02-10)

    Change: 114374558
* TensorFlow: upstream changes to git (Vijay Vasudevan, 2015-12-03)

    Change 109321497: Move all images to the images directory to make docs
    versioning easier: adjust all paths in the docs to point to the new
    locations, and remove some now-redundant section-order tags added for the
    old website.

    Change 109317807: Added a kernel op to compute the eigendecomposition of a
    self-adjoint matrix. The new op, self_adjoint_eig (and
    batch_self_adjoint_eig), returns the concatenation of the eigenvalues as a
    row vector and the eigenvectors.

    Change 109310773: Change `_read32()` in the MNIST input example to return
    an int. Currently we return a 1-D numpy array with 1 element; numpy has
    recently deprecated treating this as a scalar, and as a result this
    tutorial fails. The fix returns the 0th element of the array instead.

    Change 109301269: Re-arrange TensorBoard demo files.

    Change 109273589: Add ci_build for ci.tensorflow.org.

    Change 109260293: Speed up the NodeDef -> OpKernel process by not spending
    time generating an error message for a missing "_kernel" attr that will be
    thrown away.

    Change 109257179: Make event_file_loader_test hermetic by using tempfile
    instead of fixed filenames. Without this change, running
    event_file_loader_test twice in the same client (locally) causes it to
    fail, because it writes into the same file and appends another event
    instead of starting from scratch.

    Change 109256464: Minor cleanup in TensorBoard server code.

    Change 109255382: Reduce critical-section times in gpu_event_mgr.h:
    (1) call stream->ThenRecordEvent outside the EventMgr critical section;
    (2) do memory deallocation outside the critical section.
    Speeds up one configuration of ptb_word_lm from 2924 words per second (wps)
    to 3278 wps on my desktop machine with a Titan X.

    Change 109254843: Fix use of uninitialized memory in test.

    Change 109250995: python_config.sh needs a license header; otherwise the
    license test fails.

    Change 109249914: Add ci_build for ci.tensorflow.org.

    Change 109249397: Fix reduce_sum (complex) segfaults on GPU. Fixes #357.

    Change 109245652: Add ci_build for ci.tensorflow.org.

    Base CL: 109321563
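The self_adjoint_eig layout described in Change 109317807 (eigenvalues as a row vector concatenated with the eigenvectors) can be sketched with NumPy's np.linalg.eigh. This is an assumed correspondence for illustration, not the TensorFlow op itself:

```python
import numpy as np

rng = np.random.default_rng(5)

# Build a self-adjoint (here: real symmetric) matrix.
m = rng.standard_normal((4, 4))
sym = (m + m.T) / 2

vals, vecs = np.linalg.eigh(sym)
# Pack as described: eigenvalue row vector on top, eigenvectors below.
packed = np.vstack([vals[None, :], vecs])   # shape (n+1, n)

assert packed.shape == (5, 4)
# Each eigenvector column v with eigenvalue w satisfies sym @ v = w * v:
for i in range(4):
    np.testing.assert_allclose(sym @ vecs[:, i], vals[i] * vecs[:, i],
                               atol=1e-10)
```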
* TensorFlow: upstream changes to git (Vijay Vasudevan, 2015-12-01)

    - Clean up documentation for ReverseSequence.
    - Updated several tensorflow operations to use 32-bit indices on GPU.
    - Add attribute batch_dim to ReverseSequenceOp.
    - Fix error in convert_to_records.py, as reported in
      https://github.com/tensorflow/tensorflow/issues/370 by
      AlexUnderMicrocontRoll.
    - Update TensorBoard README.
    - Fixes to boolean flags reported in
      https://github.com/tensorflow/tensorflow/issues/379. Supports:
        --bool_flag=True      --> True
        --bool_flag=False     --> False
        --bool_flag=gibberish --> False
        --bool_flag           --> True
        --nobool_flag         --> False
      Fixes #379.
    - Update generated Op docs.
    - Enable local development of TensorBoard using gulp; also make
      tf-tensorboard a regular component rather than a special case. This is
      mostly effected by creating tfserve.js, a small server with clever
      routing to load from bower_components/ and components/ using the paths
      that work within google3. Workflow: `gulp serve`.
    - Add a full working code example to the tensorboard and summaries
      tutorial.
    - Fix seq2seq_test when running on GPU. The "proj_w" and "proj_b"
      variables were being created before the `test_session()`'s device
      function took effect, which pushed the placement algorithm into making
      an incorrect decision.
    - Add a sentence in the TensorBoard README on how to serialize summary
      data to logs, and provide a link to the how-to tutorial on the
      TensorFlow website.
    - Add error-catching code if string_input_producer is supplied a null
      input. Before this change, it would die with an opaque shape error from
      inside the queue. This change catches (most) python null lists being
      passed directly in, and at runtime detects null tensors. Adds two tests
      for this to input_test.py.
    - Speed up models that use the same variable multiple times in the case
      where variables must be copied across devices:
      - Have Variables wrap the Variable op in an Identity op when converted
        to Tensor. This avoids multiple copies across devices if a variable
        is used multiple times in a computation.
      - Add Variable.mutable() to return the non-wrapped Variable op for use
        when assigning new values.
      - Add an as_ref parameter to convert_to_tensor() to allow code to
        specify whether it plans to assign a new value to the result of the
        conversion. Make Variable return the result of Variable.mutable()
        when as_ref is True.
      - Make all ops that assign values to variables pass as_ref=True when
        converting their arguments.
    - Reduce critical-section times in gpu_event_mgr.h:
      (1) call stream->ThenRecordEvent outside the EventMgr critical section;
      (2) do memory deallocation outside the critical section.
      Speeds up one configuration of ptb_word_lm from 2924 words per second
      (wps) to 3278 wps on my desktop machine with a Titan X.
    - Remove some colons that break the open source build:
      ::tensorflow::StringPiece breaks for @raingo (see
      https://github.com/tensorflow/tensorflow/issues/358);
      tensorflow::StringPiece (without the leading colons) seems to fix the
      problem.
    - Added a check that the inputs to an Operation form a list, and make a
      defensive copy of the input. This is for cases where the input list is
      changed, such as in _add_input.
    - Use standard names for TensorFlow dtypes in the tutorial.
    - Add tests for tensor inputs.
    - Fix build after declaring more types for ops.
    - Switch to 32-bit indexing to speed up convolutions and concatenations.
    - Add convert_image op to convert between types for images (similar to
      OpenCV's cvtScale).
    - Make cast work between numeric types (bool, uint8, int16, int32, int64,
      float, double).
    - Pad input data for an odd number of paddings, so we can use cudnn
      anyway. Also fix total padding computation when padding==VALID. This CL
      makes the Googlenet benchmark run 5x faster.
    - Support IndexedSlices in ConcatGrad.
    - The sampled softmax op uses one embedding lookup for positive and
      negative samples; float64 support for sampled softmax.
    - Move RNN code out of models.rnn (without breaking existing code). The
      API may still undergo minor changes until full documentation is added.
    - Changed to use per-step stacks for the accumulators used in while-loop
      gradient computation. This addresses the problem caused by using concat
      without sufficient static shape information. It should also improve
      performance, as we avoided those expensive concats.
    - Update generated Op docs.
    - Improve error messages when the optimizer finds no variables to
      minimize or when none of the variables has gradients.
    - Say that -1 isn't just for flattening in the reshape docs; also add
      scalar reshape (reshape(t, [])) as an example. This fixes
      https://github.com/tensorflow/tensorflow/issues/281.
    - This is a test.

    Base CL: 109118714
* TensorFlow: Improve performance of Alexnet (Manjunath Kudlur, 2015-11-20)

    Changes:
    - error message that refers to removed `DefaultSession` method.
    - -Wnull-conversion warnings.
    - the "_start_time" attr for recvs when the flag
      "--brain_enable_scheduling_for_recvs" is set.
    - typo in tutorial data download progress message.
    - a typo ("however their installing" => "however installing").
    - typo, rename "TensorFlow Mechanics" to "How To" to be consistent with
      the website.
    - a typo ("subtact" => "subtract").
    - protobuf examples in comments in tensorflow::Example.proto.
    - formula formatting in MNIST beginner tutorial.
    - negative fraction-of-queue-full stats.
    - protobuf inclusion path so that the Android demo will build under Blaze.
    - small typo ("moderatly" => "moderately").
    - Session.run() to check that tensor arguments come from the session's
      graph.
    - another six import.
    - seq2seq typo in bazel command.

    Base CL: 108349164
* TensorFlow: upstream changes from the afternoon (Vijay Vasudevan, 2015-11-11)

    Changes:
    - futurize --stage2 changes for Python 3 compatibility by @girving.
    - Small updates to documentation by @vrv, schuster, and others.
    - Account for failure of std::thread::hardware_concurrency by @ebrevdo.
    - More changes for backwards-compatibility tests by Josh.
    - Updates to python op doc generation by Josh.
    - Added support for using the best-fit allocator via ConfigProto by @vrv.
    - Rename LocalSession to DirectSession, since "local" was a bad name for
      it.
    - Enable tf.nn.moments() to work with tensors of unknown shape by @mrry.
      GITHUB_ISSUE: 139
    - Changes for the Android build by Andrew.

    Base CL: 107645181
* TensorFlow: Initial commit of TensorFlow library (Manjunath Kudlur, 2015-11-06)

    TensorFlow is an open source software library for numerical computation
    using data flow graphs.

    Base CL: 107276108