| Commit message (Collapse) | Author | Age |
|
Change: 134820471
|
This implements the same basic workaround as cl/128009436, namely to use the regular Eigen matmul kernel instead of the Eigen tensor contraction when not parallelizing the inner matrix products in BatchMatMul.
Run on rmlarsen3.mtv (12 X 3501 MHz CPUs); 2016-09-22T14:22:52.507929557-07:00
CPU: Intel Haswell with HyperThreading (6 cores) dL1:32KB dL2:256KB dL3:15MB
Benchmark                                               Base (ns)  New (ns)  Improvement
----------------------------------------------------------------------------------------
BM_BatchMatmul_1_1_1024_1024_false_false_DT_FLOAT_cpu      179300    122010       +32.0%
BM_BatchMatmul_2_1_1024_1024_false_false_DT_FLOAT_cpu      209037    162153       +22.4%
BM_BatchMatmul_8_1_1024_1024_false_false_DT_FLOAT_cpu      906019    946502        -4.5%
BM_BatchMatmul_32_1_1024_1024_false_false_DT_FLOAT_cpu    3814403   4473018       -17.3%
BM_BatchMatmul_1_10000_200_1_false_false_DT_FLOAT_cpu      322285    252677       +21.6%
BM_BatchMatmul_8_10000_200_1_false_false_DT_FLOAT_cpu     2370631   2028039       +14.5%
BM_BatchMatmul_32_10000_200_1_false_false_DT_FLOAT_cpu    8994979  12904697       -43.5%
BM_BatchMatmul_1_10000_200_1_true_false_DT_FLOAT_cpu       663253    223017       +66.4%
BM_BatchMatmul_8_10000_200_1_true_false_DT_FLOAT_cpu      5731654   2266151       +60.5%
BM_BatchMatmul_32_10000_200_1_true_false_DT_FLOAT_cpu    18692987  12063885       +35.5%
BM_BatchMatmul_1_10000_200_1_false_true_DT_FLOAT_cpu       318234    251075       +21.1%
BM_BatchMatmul_8_10000_200_1_false_true_DT_FLOAT_cpu      2355295   2032887       +13.7%
BM_BatchMatmul_32_10000_200_1_false_true_DT_FLOAT_cpu     8997442  11618660       -29.1%
BM_BatchMatmul_1_10000_200_1_true_true_DT_FLOAT_cpu        652865    225256       +65.5%
BM_BatchMatmul_8_10000_200_1_true_true_DT_FLOAT_cpu       5700875   2383607       +58.2%
BM_BatchMatmul_32_10000_200_1_true_true_DT_FLOAT_cpu     18957878  12451622       +34.3%
BM_BatchMatmul_1_1_200_10000_false_false_DT_FLOAT_cpu      288420    226135       +21.6%
BM_BatchMatmul_8_1_200_10000_false_false_DT_FLOAT_cpu     2155747   2406166       -11.6%
BM_BatchMatmul_32_1_200_10000_false_false_DT_FLOAT_cpu   10031700  12248817       -22.1%
BM_BatchMatmul_1_1_200_10000_true_false_DT_FLOAT_cpu       298456    226108       +24.2%
BM_BatchMatmul_8_1_200_10000_true_false_DT_FLOAT_cpu      2096256   2409435       -14.9%
BM_BatchMatmul_32_1_200_10000_true_false_DT_FLOAT_cpu    10259905  12408712       -20.9%
BM_BatchMatmul_1_1_200_10000_false_true_DT_FLOAT_cpu      1657311    254414       +84.6%
BM_BatchMatmul_8_1_200_10000_false_true_DT_FLOAT_cpu      5976722   2031486       +66.0%
BM_BatchMatmul_32_1_200_10000_false_true_DT_FLOAT_cpu    23514286  11622619       +50.6%
BM_BatchMatmul_1_1_200_10000_true_true_DT_FLOAT_cpu       1653482    250161       +84.9%
BM_BatchMatmul_8_1_200_10000_true_true_DT_FLOAT_cpu       5951562   2032097       +65.9%
BM_BatchMatmul_32_1_200_10000_true_true_DT_FLOAT_cpu     23587247  11633259       +50.7%
Change: 134814248
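The Improvement column in the benchmark table above appears to be the relative reduction in run time, (base - new) / base, so a negative value means the new code is slower. A minimal Python sketch under that assumption (the helper name `improvement_pct` is hypothetical and not part of the benchmark tooling), applied to two rows copied from the table:

```python
def improvement_pct(base_ns: float, new_ns: float) -> float:
    """Relative run-time reduction in percent; positive means faster."""
    return (base_ns - new_ns) / base_ns * 100.0


# (base ns, new ns) pairs copied from the benchmark table.
rows = {
    "BM_BatchMatmul_1_1_1024_1024_false_false_DT_FLOAT_cpu": (179300, 122010),
    "BM_BatchMatmul_8_1_1024_1024_false_false_DT_FLOAT_cpu": (906019, 946502),
}

for name, (base, new) in rows.items():
    # Format with an explicit sign and one decimal, like the table.
    print(f"{name} {improvement_pct(base, new):+.1f}%")
```

Rounded to one decimal, these two rows agree with the values reported in the table, which supports the assumed definition.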
|
* Uses simple heuristics to choose between parallelizing outer (batch), inner (matmul) or both.
* Adds benchmarks for BatchMatMul.
* Switches matmul benchmark to use real time so GFlops reported are w.r.t. walltime and measure the effect of multi-threading.
* Refactors the code to avoid many extra instantiations of the Eigen contraction code, which bloat code size (the test binary shrinks from 9.3MB to 5.5MB) and compilation time (binary compilation time drops from ~210 to ~75 seconds).
* Fixes bug in cost_per_unit calculation. The old code calculated B*M*N instead of M*N*K.
Change: 134138821
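The cost_per_unit fix mentioned above can be illustrated with a small count. Assuming the usual convention that one unit of work multiplies an M x K matrix by a K x N matrix, a single product takes M*N*K multiply-adds regardless of the batch size B. The Python sketch below is illustrative only; the function names are hypothetical and this is not the TensorFlow kernel code.

```python
def cost_per_unit(m: int, n: int, k: int) -> int:
    """Multiply-add count for one (m x k) @ (k x n) product."""
    return m * n * k


def old_cost_per_unit(b: int, m: int, n: int) -> int:
    """The formula the commit message describes as the bug: B*M*N."""
    return b * m * n


def count_multiply_adds(m: int, n: int, k: int) -> int:
    """Count multiply-adds by walking the naive triple loop."""
    count = 0
    for _ in range(m):          # rows of the output
        for _ in range(n):      # columns of the output
            for _ in range(k):  # inner (contracted) dimension
                count += 1
    return count


if __name__ == "__main__":
    m, n, k, b = 3, 4, 5, 8
    print("M*N*K          :", cost_per_unit(m, n, k))
    print("loop count     :", count_multiply_adds(m, n, k))
    print("B*M*N (old)    :", old_cost_per_unit(b, m, n))
```

The old formula ignores the inner dimension K entirely, so the estimate is off by a factor of K/B, which could mislead any heuristic that relies on it when K and B differ widely.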
|
Change: 134037266
|
* Uses simple heuristics to choose between parallelizing outer (batch), inner (matmul) or both.
* Adds benchmarks for BatchMatMul.
* Switches matmul benchmark to use real time so GFlops reported are w.r.t. walltime and measure the effect of multi-threading.
* Fixes bug in cost_per_unit calculation. The old code calculated B*M*N instead of M*N*K.
Change: 134025273
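The point about reporting GFlops against wall time can be sketched in plain Python. CPU time is summed over all threads of a process, so a rate computed from it would hide a multi-threading speedup, whereas a rate computed from wall time shows it. The helpers below are hypothetical, use a naive single-threaded matmul, and assume 2*M*N*K floating-point operations per product; they are not the TensorFlow benchmark code.

```python
import time


def matmul(a, b):
    """Naive (m x k) @ (k x n) product on nested lists."""
    k, n = len(b), len(b[0])
    return [[sum(row[p] * b[p][j] for p in range(k)) for j in range(n)]
            for row in a]


def batch_matmul_gflops(batch, m, k, n):
    """Run a batch of products; return (outputs, wall GFLOP/s, CPU GFLOP/s)."""
    a = [[[1.0] * k for _ in range(m)] for _ in range(batch)]
    b = [[[1.0] * n for _ in range(k)] for _ in range(batch)]
    flops = 2 * batch * m * n * k  # one multiply and one add per inner step

    wall_start, cpu_start = time.perf_counter(), time.process_time()
    out = [matmul(x, y) for x, y in zip(a, b)]
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start

    # Guard against a zero reading from a coarse clock on very small inputs.
    wall, cpu = max(wall, 1e-9), max(cpu, 1e-9)
    return out, flops / wall / 1e9, flops / cpu / 1e9


if __name__ == "__main__":
    _, wall_rate, cpu_rate = batch_matmul_gflops(batch=4, m=32, k=32, n=32)
    print(f"wall: {wall_rate:.4f} GFLOP/s, cpu: {cpu_rate:.4f} GFLOP/s")
```

In this single-threaded sketch the two rates should be close; they would diverge once the work is spread over several threads.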
|
Clean up tests and extend coverage to all supported types.
Change: 133766358
|
Change: 127386123
|
Change: 124012080
|
Change: 123900938
|
Also fix overly stringent constraints on batchwise linalg ops & batch_matmul.
Change: 120387462
|
Also fix overly stringent constraints on batchwise linalg ops & batch_matmul.
Change: 120289843
|
Also fix overly stringent constraints on batchwise linalg ops & batch_matmul.
Change: 120279428
|
Change: 117590857
|
we don't need to #include
tensorflow/core/common_runtime/gpu_device_context.h.
Change: 117019921
|
tensorflow/core/ files and build targets.
Change: 113075177
|
The two functions already have the same behavior, and ShortDebugString
will disappear soon.
Change: 112793490
|
After this we can replace port.h with types.h.
Change: 112727463
|
Change: 112523833
|
Change: 112481326
|
Changes:
* error message that refers to removed `DefaultSession` method.
* -Wnull-conversion warnings
* the "_start_time" attr for recvs when the flag "--brain_enable_scheduling_for_recvs" is set.
* typo in tutorial data download progress message.
* a typo ("however their installing"=>"however installing").
* typo, rename "TensorFlow Mechanics" to "How To" to be consistent with the website.
* a typo ("subtact"=>"subtract").
* protobuf examples in comments in tensorflow::Example.proto.
* formula formatting in MNIST beginner tutorial
* negative fraction-of-queue-full stats
* protobuf inclusion path so that Android demo will build under Blaze.
* small typo (moderatly > moderately)
* Session.run() to check that tensor arguments come from the session's graph.
* another six import
* seq2seq typo in bazel command
Base CL: 108349164
|
error handling, updates to website.
Changes:
- Removes redundant reshape from image models by @mrry
- Default TensorBoard to localhost by @danmane
- Reformatting of tensorflow/core by @josh11b
- Make tutorials backwards compatible to 0.5.0 by @girving
- Improve print documentation (md files not updated).
- Add proper scrolling to sitemap by @martinwicke
Base CL: 107956254
|
|
TensorFlow is an open source software library for numerical computation
using data flow graphs.
Base CL: 107276108
|