Commit message | Author | Age
|
python bindings are disabled in CMake (#9660)
|
information => information
|
Branch 155159972
|
was removed from the exclusion list. Because of this, the number of
symbols in the .def file was close to 64K for GPU builds, and yesterday
a few added symbols pushed us over the 64K symbol limit of the Windows linker.
Adding RTTI back to the exclusion list.
|
ClusterSpec propagation is a capability upgrade for TensorFlow that should make
it much easier to (1) build distributed TensorFlow clusters, and (2) handle
node failures. With ClusterSpec propagation, TensorFlow workers can be booted
independently of each other, with no knowledge of one another.
The client then constructs a ClusterDef (ClusterSpec) and sends it
to the TF master at session creation. The master in turn propagates the
ClusterDef to all of the workers.
Change: 155159972
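The propagation flow described above can be sketched in plain Python (the dict layout mirrors a `tf.train.ClusterSpec`, but the `Master` and `Worker` classes here are hypothetical stand-ins for illustration, not TensorFlow APIs):

```python
# Workers boot independently, knowing nothing about each other; the client
# hands the cluster layout to the master, which propagates it to every worker.

class Worker:
    def __init__(self, address):
        self.address = address
        self.cluster = None  # unknown until the master propagates it

    def receive_cluster(self, cluster):
        self.cluster = cluster

class Master:
    def __init__(self, workers):
        self.workers = workers

    def create_session(self, cluster):
        # At session creation, propagate the ClusterDef to all workers.
        for w in self.workers:
            w.receive_cluster(cluster)

# Client side: construct the cluster layout (mirrors a ClusterSpec dict).
cluster = {"worker": ["host0:2222", "host1:2222"], "ps": ["host2:2222"]}
workers = [Worker(a) for a in cluster["worker"]]
Master(workers).create_session(cluster)
assert all(w.cluster == cluster for w in workers)
```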
|
ObjectTracker support is found. If libtensorflow_demo.so is not found in the APK, rendered boxes will simply be stationary and will be replaced whenever new results come in.
Partially addresses #6385
Change: 155159326
|
Change: 155158477
|
Change: 155158042
|
Change: 155156366
|
#6268
#9150
Change: 155146664
|
Also added an additional GPU int32::max check that was missing.
Performance seems to be between 1x and 10x faster on average. The likely culprit for the CPU slowdown was the unnecessary temp allocation for scratch space.
Performance on a K40, compiled with -c opt --config cuda --copt=-mavx:
**BEFORE**
Matrix sizes:
A sparse [m, k] with % nonzero values between 1% and 80%
B dense [k, n]
% nnz n gpu m k dt(dense) dt(sparse) dt(sparse)/dt(dense)
0.01 50 True 100 100 0.000319954 0.000275495 0.861045
0.01 50 True 100 1000 0.000469565 0.000290895 0.619497
0.01 50 True 1000 100 0.000572815 0.000271131 0.473331
0.01 50 True 1000 1000 0.00133119 0.00042006 0.315554
0.01 50 False 100 100 0.00034191 0.000289171 0.845751
0.01 50 False 100 1000 0.0004796 0.00028483 0.593891
0.01 50 False 1000 100 0.000632371 0.000300461 0.475134
0.01 50 False 1000 1000 0.00134726 0.000576285 0.427746
0.01 100 True 100 100 0.000353755 0.00027729 0.783849
0.01 100 True 100 1000 0.000536649 0.00028337 0.528036
0.01 100 True 1000 100 0.000661941 0.00027933 0.421987
0.01 100 True 1000 1000 0.0014109 0.0006698 0.474732
0.01 100 False 100 100 0.00039546 0.00030159 0.762631
0.01 100 False 100 1000 0.00054909 0.00027276 0.49675
0.01 100 False 1000 100 0.000631344 0.00028231 0.447157
0.01 100 False 1000 1000 0.00141789 0.000657049 0.463398
0.2 50 True 100 100 0.00033689 0.000280155 0.831591
0.2 50 True 100 1000 0.000563495 0.00064159 1.13859
0.2 50 True 1000 100 0.00058635 0.00067611 1.15308
0.2 50 True 1000 1000 0.00153552 0.00486242 3.16662
0.2 50 False 100 100 0.000333545 0.000267555 0.802154
0.2 50 False 100 1000 0.000544 0.00066272 1.21824
0.2 50 False 1000 100 0.00058253 0.000670955 1.15179
0.2 50 False 1000 1000 0.00153017 0.00480928 3.14298
0.2 100 True 100 100 0.00036919 0.000288659 0.781872
0.2 100 True 100 1000 0.00067063 0.00110059 1.64113
0.2 100 True 1000 100 0.00066443 0.00108547 1.63369
0.2 100 True 1000 1000 0.00180991 0.00961579 5.31286
0.2 100 False 100 100 0.00040061 0.000325365 0.812174
0.2 100 False 100 1000 0.00066774 0.00111843 1.67494
0.2 100 False 1000 100 0.000696205 0.00108078 1.55239
0.2 100 False 1000 1000 0.00179788 0.00960569 5.34278
0.5 50 True 100 100 0.00034819 0.00033425 0.959963
0.5 50 True 100 1000 0.00075176 0.00134084 1.78359
0.5 50 True 1000 100 0.000642445 0.00133641 2.08019
0.5 50 True 1000 1000 0.00233791 0.0124282 5.31597
0.5 50 False 100 100 0.000345069 0.000334586 0.96962
0.5 50 False 100 1000 0.00071701 0.00135879 1.89508
0.5 50 False 1000 100 0.000632119 0.00134036 2.12043
0.5 50 False 1000 1000 0.00240216 0.0126202 5.25368
0.5 100 True 100 100 0.000393934 0.00040344 1.02413
0.5 100 True 100 1000 0.000957675 0.002709 2.82873
0.5 100 True 1000 100 0.000756125 0.00242428 3.20619
0.5 100 True 1000 1000 0.00298202 0.0241416 8.09572
0.5 100 False 100 100 0.000395606 0.000433675 1.09623
0.5 100 False 100 1000 0.000963565 0.00248293 2.57682
0.5 100 False 1000 100 0.00079523 0.0024281 3.05333
0.5 100 False 1000 1000 0.00299668 0.0242615 8.09614
0.8 50 True 100 100 0.00036806 0.00040923 1.11186
0.8 50 True 100 1000 0.00091419 0.00207383 2.26848
0.8 50 True 1000 100 0.000684329 0.00196612 2.87307
0.8 50 True 1000 1000 0.00302433 0.0199798 6.60637
0.8 50 False 100 100 0.000368149 0.000615025 1.67058
0.8 50 False 100 1000 0.0008786 0.00205821 2.3426
0.8 50 False 1000 100 0.00067889 0.00195498 2.87967
0.8 50 False 1000 1000 0.00290009 0.0191242 6.59434
0.8 100 True 100 100 0.000452549 0.00063767 1.40906
0.8 100 True 100 1000 0.00126929 0.00391422 3.08378
0.8 100 True 1000 100 0.000919235 0.00386167 4.20096
0.8 100 True 1000 1000 0.00423295 0.0431824 10.2015
0.8 100 False 100 100 0.000428261 0.000626891 1.46381
0.8 100 False 100 1000 0.00120801 0.00395877 3.27711
0.8 100 False 1000 100 0.00080466 0.00385143 4.78641
0.8 100 False 1000 1000 0.00370808 0.0403527 10.8824
**AFTER**
Matrix sizes:
A sparse [m, k] with % nonzero values between 1% and 80%
B dense [k, n]
% nnz n gpu m k dt(dense) dt(sparse) dt(sparse)/dt(dense)
0.01 50 True 100 100 0.000312485 0.00020528 0.656927
0.01 50 True 100 1000 0.0004655 0.00020095 0.431686
0.01 50 True 1000 100 0.000567449 0.000203935 0.359389
0.01 50 True 1000 1000 0.00132323 0.00027171 0.205339
0.01 50 False 100 100 0.000319945 0.000197511 0.617328
0.01 50 False 100 1000 0.000466419 0.000210185 0.450635
0.01 50 False 1000 100 0.0005581 0.000199865 0.358117
0.01 50 False 1000 1000 0.00129479 0.000451496 0.348702
0.01 100 True 100 100 0.000364131 0.000196835 0.540561
0.01 100 True 100 1000 0.00053398 0.000206494 0.386708
0.01 100 True 1000 100 0.00062722 0.000203185 0.323946
0.01 100 True 1000 1000 0.00138674 0.000335904 0.242227
0.01 100 False 100 100 0.000361339 0.000195 0.53966
0.01 100 False 100 1000 0.000531831 0.000207155 0.389513
0.01 100 False 1000 100 0.00062245 0.000197015 0.316515
0.01 100 False 1000 1000 0.0014007 0.000328825 0.234757
0.2 50 True 100 100 0.00033185 0.000262895 0.792209
0.2 50 True 100 1000 0.00054391 0.000586189 1.07773
0.2 50 True 1000 100 0.000581805 0.000531535 0.913597
0.2 50 True 1000 1000 0.00153913 0.00142783 0.927687
0.2 50 False 100 100 0.00033572 0.000266831 0.794803
0.2 50 False 100 1000 0.000534315 0.000585151 1.09514
0.2 50 False 1000 100 0.000580961 0.00033344 0.573947
0.2 50 False 1000 1000 0.0015055 0.00143968 0.956284
0.2 100 True 100 100 0.000371666 0.00026337 0.708621
0.2 100 True 100 1000 0.000667235 0.00056811 0.851439
0.2 100 True 1000 100 0.000671356 0.000400575 0.596666
0.2 100 True 1000 1000 0.00178568 0.00250393 1.40222
0.2 100 False 100 100 0.000370425 0.000254935 0.688223
0.2 100 False 100 1000 0.000661175 0.000601134 0.909191
0.2 100 False 1000 100 0.0006944 0.00039817 0.573401
0.2 100 False 1000 1000 0.00176969 0.0024947 1.40968
0.5 50 True 100 100 0.000346885 0.000263295 0.759028
0.5 50 True 100 1000 0.00073113 0.00107669 1.47263
0.5 50 True 1000 100 0.000672774 0.000493085 0.732914
0.5 50 True 1000 1000 0.00260436 0.003335 1.28054
0.5 50 False 100 100 0.00036242 0.000273196 0.753809
0.5 50 False 100 1000 0.000753295 0.00107086 1.42157
0.5 50 False 1000 100 0.00064886 0.000501654 0.773132
0.5 50 False 1000 1000 0.00241105 0.0033146 1.37475
0.5 100 True 100 100 0.000401269 0.00027831 0.693573
0.5 100 True 100 1000 0.00094245 0.00111468 1.18275
0.5 100 True 1000 100 0.00075719 0.00074962 0.990003
0.5 100 True 1000 1000 0.00297528 0.00601445 2.02147
0.5 100 False 100 100 0.000408576 0.00026246 0.642377
0.5 100 False 100 1000 0.00094272 0.00112762 1.19613
0.5 100 False 1000 100 0.000762925 0.00074343 0.974446
0.5 100 False 1000 1000 0.00314936 0.00604122 1.91824
0.8 50 True 100 100 0.00036589 0.000331376 0.905669
0.8 50 True 100 1000 0.00086403 0.00171248 1.98197
0.8 50 True 1000 100 0.00067048 0.000715261 1.06679
0.8 50 True 1000 1000 0.00284684 0.00527865 1.85422
0.8 50 False 100 100 0.000357161 0.000540144 1.51233
0.8 50 False 100 1000 0.000884765 0.00170428 1.92625
0.8 50 False 1000 100 0.000666975 0.000737065 1.10509
0.8 50 False 1000 1000 0.0028149 0.00530442 1.88441
0.8 100 True 100 100 0.00041237 0.00034323 0.832335
0.8 100 True 100 1000 0.00122102 0.00179725 1.47192
0.8 100 True 1000 100 0.000807976 0.00111246 1.37684
0.8 100 True 1000 1000 0.00379081 0.00968211 2.5541
0.8 100 False 100 100 0.000426315 0.000339085 0.795386
0.8 100 False 100 1000 0.00144096 0.00179819 1.2479
0.8 100 False 1000 100 0.000951196 0.0011155 1.17274
0.8 100 False 1000 1000 0.0039524 0.00980128 2.47983
Change: 155142876
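The benchmark's matrix setup (A sparse [m, k] with a given nonzero fraction, B dense [k, n]) can be sketched as follows; this is an illustrative reconstruction with NumPy, not the benchmark's actual harness:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_sparse_dense(m, k, n, nnz_frac):
    # A: dense array with roughly nnz_frac of its entries nonzero.
    A = rng.standard_normal((m, k))
    A[rng.random((m, k)) >= nnz_frac] = 0.0
    # B: fully dense.
    B = rng.standard_normal((k, n))
    return A, B

# One benchmark point: % nnz = 0.01, n = 50, m = k = 100.
A, B = make_sparse_dense(m=100, k=100, n=50, nnz_frac=0.01)
C = A @ B                      # the product both code paths compute
assert C.shape == (100, 50)
assert 0.0 < (A != 0).mean() < 0.05   # nonzero fraction near 1%
```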
|
Change: 155140112
|
* relu grad and maxpooling grad fixes for perf
* Graph layout pass and conversion pass changes
This commit makes the following changes:
- Enables support for ReluGrad and BiasAddGrad
- Adds support for detecting depthwise/batchwise pooling
- Adds more unit tests for Graph rewrite pass
- Improvements to handling control-flow edges
- Bug fixes
* Defaulting to Eigen when LRN depth_radius!=2
* Fixed mkl_conv_grad_filter.cc for conv_ops_tests.py
* Style fix to mkl_matmul and remove unnecessary 'MKL' label on matmul kernel
* Style fixes based on clang-format to mkl_conv_* and mkl_matmul
* Bug fixes
* Adding OP_REQUIRES_OK check in Concat
* Making some style changes
* Enabled the configuration of MKL settings
* Fixing graph unit tests with Mkl op name change to _Mkl; Fixed missing _ in MklToTf op
* Fixed missing libdl.so.2 in BUILD file
* Fixes for unit test build failures.
* Changes in mkl_conv_grad_filter_ops.cc for Google code style
* Fixes to remove dead code
* removed the dead code and added a TODO for mkl implementation to handle this case in the future
* Enabling MklIdentityOp
* Calling MKL for all values of depth radius in LRN
* Fixed buildifier sanity check error
* Adding support for google's CI automation
* Updated link to new MKL version
* Fix for missing locate binary
* Fix for missing locate command in CI
* Adding updatedb to populate the database after installing mlocate
* Fixed buildifier issue
* setting tf_need_mkl=0 in libtf files
* Added third_party/mkl/* to .gitignore
* Added third_party/eigen3/mkl_include to .gitignore
* In configure, set MKL-enabling options only for Linux.
* Making style fix in LRN
* Fixed Indentation
|
Change: 155140054
|
Change: 155136555
|
line number as the last element in each traceback tuple.
Change: 155136334
|
Change: 155135670
|
move transformed into core.
(I accidentally moved the wrong file in the previous change!)
Also move the Identity bijector & test into core.
I can't move the TransformedDistribution test into core since it relies on linalg.
Change: 155130709
|
Change: 155130334
|
Change: 155123817
|
Change: 155122342
|
Change: 155121754
|
logits.
Change: 155121560
|
native implementation is not found. This means that compiling libtensorflow_demo.so will only be strictly necessary for the Detection example (which uses native object tracking). A followup change will add graceful degradation in that case too.
Java conversion may be slower depending on the device, but should still be acceptable for demo purposes as the majority of the compute time will still be spent on TF inference passes.
Note that this has no effect on the necessity of libtensorflow_inference.so, which provides the actual TF support. However, libtensorflow_inference.so may be added to applications via the prebuilt AAR, in which case no native compilation is necessary.
Partially addresses #6385
Change: 155121431
|
* relu grad and maxpooling grad fixes for perf
* Graph layout pass and conversion pass changes
This commit makes the following changes:
- Enables support for ReluGrad and BiasAddGrad
- Adds support for detecting depthwise/batchwise pooling
- Adds more unit tests for Graph rewrite pass
- Improvements to handling control-flow edges
- Bug fixes
* Defaulting to Eigen when LRN depth_radius!=2
* Fixed mkl_conv_grad_filter.cc for conv_ops_tests.py
* Style fix to mkl_matmul and remove unnecessary 'MKL' label on matmul kernel
* Style fixes based on clang-format to mkl_conv_* and mkl_matmul
* Bug fixes
* Adding OP_REQUIRES_OK check in Concat
* Making some style changes
* Enabled the configuration of MKL settings
* Fixing graph unit tests with Mkl op name change to _Mkl; Fixed missing _ in MklToTf op
* Fixed missing libdl.so.2 in BUILD file
* Fixes for unit test build failures.
* Changes in mkl_conv_grad_filter_ops.cc for Google code style
* Fixes to remove dead code
* removed the dead code and added a TODO for mkl implementation to handle this case in the future
* Fixed buildifier sanity check error
* Adding support for google's CI automation
* Updated link to new MKL version
* Fix for missing locate command in CI
* Adding updatedb to populate the database after installing mlocate
* Fixed buildifier issue
* setting tf_need_mkl=0 in libtf files
* Added third_party/mkl/* to .gitignore
* Added third_party/eigen3/mkl_include to .gitignore
* In configure, set MKL-enabling options only for Linux.
|
* Java API to get the size of output list operations
|
data_flow_ops.cc
Change: 155119120
|
Change: 155117831
|
Starting with the next release (1.2) GPU support for Mac will be dropped in
release binaries since the configuration of NVIDIA GPUs on a Mac is somewhat
esoteric (and we currently do not have the bandwidth to debug test failures
on that platform).
While at it, change the version to 1.1.0 from 1.1.0-rc2.
Change: 155117808
|
This change:
1. updates common_env.sh to export PYTHON_LIB_PATH
along with PYTHON_BIN_PATH, so the configure
script doesn't have to guess
2. writes these paths to bazelrc with quotes
around them, to guard against spaces in the path (e.g.
"C:/Program Files/Anaconda3/python")
Fixes https://github.com/bazelbuild/bazel/issues/2892
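The quoting fix can be sketched as a shell snippet; the `.tf_configure.bazelrc` file name, the `--action_env` flag, and the example paths are assumptions for illustration, not the commit's exact output:

```shell
# Export both paths so the configure script doesn't have to guess.
export PYTHON_BIN_PATH="C:/Program Files/Anaconda3/python"
export PYTHON_LIB_PATH="C:/Program Files/Anaconda3/lib/site-packages"

# Write them to the bazelrc with quotes, guarding against spaces in the path.
echo "build --action_env PYTHON_BIN_PATH=\"$PYTHON_BIN_PATH\"" >  .tf_configure.bazelrc
echo "build --action_env PYTHON_LIB_PATH=\"$PYTHON_LIB_PATH\"" >> .tf_configure.bazelrc
```

Without the quotes, Bazel would split the value at the space in "Program Files" and fail to locate the interpreter.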
|
Change: 155115134
|
some compilers don't like that.
Change: 155115034
|
Previously, the code assumed that the two operands have matching shapes
but did not enforce that equality. This could lead to invalid memory
accesses in some cases.
Change: 155109464
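The class of bug fixed here, assuming rather than enforcing operand shapes, can be illustrated with a minimal pure-Python sketch (the `binary_op` helper is hypothetical, not the kernel in question):

```python
def binary_op(x, y):
    """Elementwise add over nested lists that enforces, rather than
    assumes, matching shapes before touching any element."""
    if len(x) != len(y) or any(len(a) != len(b) for a, b in zip(x, y)):
        # Without this check, zip() would silently truncate to the shorter
        # operand; in the C++ kernel the analogue was out-of-bounds reads.
        raise ValueError("operands must have matching shapes")
    return [[a + b for a, b in zip(r, s)] for r, s in zip(x, y)]

assert binary_op([[1, 2]], [[3, 4]]) == [[4, 6]]
```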
|
Change: 155102455
|
Change: 155098574
|
Change: 155096835
|
Change: 155094164
|
valid values.
Change: 155094112
|
Change: 155092161
|
* Add input function for training and testing
Estimator is decoupled from the scikit-learn interface by moving it into the separate class SKCompat. The arguments x, y, and batch_size are only available in the SKCompat class; Estimator accepts only input_fn.
* remove extra comma
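The decoupling above can be sketched in plain Python; the `Estimator` and `SKCompat` bodies here are illustrative stand-ins for the wrapper pattern, not the contrib.learn implementation:

```python
class Estimator:
    """Core API: accepts only an input_fn."""
    def fit(self, input_fn):
        features, labels = input_fn()
        self.n_seen = len(labels)  # placeholder for actual training
        return self

class SKCompat:
    """scikit-learn-style wrapper: x, y, batch_size live only here."""
    def __init__(self, estimator):
        self._est = estimator

    def fit(self, x, y, batch_size=32):
        # Adapt (x, y, batch_size) into the input_fn the Estimator expects.
        self._est.fit(input_fn=lambda: (x[:batch_size], y[:batch_size]))
        return self

est = SKCompat(Estimator()).fit(x=list(range(10)), y=list(range(10)), batch_size=4)
assert est._est.n_seen == 4
```

The wrapper keeps the scikit-learn surface for existing users while the core Estimator API stays input_fn-only.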
|
builders.
Change: 155090692
|
Conv2DBackpropFilter, and Conv2DBackpropInput, because in such cases NHWC is usually faster than NCHW. The cost of enabling this option is the overhead of more non-cancellable layout-conversion nodes, so we added auto-tuning that chooses the better option by estimating the overhead from the number of added layout-conversion nodes.
Don't convert the layout for Sum, because reduction along dimensions 0, 2, 3 (in NCHW) is about 10x slower than along 0, 1, 2 (in NHWC).
Change: 155089805
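The layout conversions being inserted and tuned here are axis permutations between NHWC and NCHW; a minimal NumPy sketch of the two directions (illustrative, not the optimizer's conversion nodes):

```python
import numpy as np

def nhwc_to_nchw(x):
    # NHWC -> NCHW: move channels (axis 3) in front of height and width.
    return np.transpose(x, (0, 3, 1, 2))

def nchw_to_nhwc(x):
    # NCHW -> NHWC: move channels (axis 1) back behind height and width.
    return np.transpose(x, (0, 2, 3, 1))

x = np.zeros((8, 32, 32, 3))  # N=8, H=32, W=32, C=3
assert nhwc_to_nchw(x).shape == (8, 3, 32, 32)
# The two conversions cancel, which is why the optimizer counts only the
# non-cancellable conversion nodes when estimating overhead.
assert nchw_to_nhwc(nhwc_to_nchw(x)).shape == x.shape
```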
|
Fixes #9651.
Change: 155089799
|
Change: 155089162
|
Change: 155087824
|
* GLSTM cell from https://openreview.net/forum?id=ByxWXyNFg&noteId=ByxWXyNFg
* Responding to comments on PR#9606
* Update comments according to review.
* More fixes on users' behalf.
|
Move lookup_ops implementation from tensorflow/contrib/lookup to tensorflow/python/feature_column.
Change: 155079825
|
Change: 155070869
|