Commit message | Author | Age

a) EIGEN_USE_NONBLOCKING_THREAD_POOL no longer exists. The non-blocking thread pool is now the default (it can be disabled by defining the symbol EIGEN_USE_SIMPLE_THREAD_POOL).
b) An eigen_initialize.cc file is no longer needed, thanks to recent thread-safety fixes in Eigen.
Change: 138074328
This change does the following:
- Always use {,new_}http_archive rather than git_repository
- Make liberal use of strip_prefix
- Clarify licenses() in BUILD files
- On POSIX include headers like a normal C/C++ program
This change accomplishes the following:
- Reduce download size by >100MB: the biggest culprit is grpc, which has
  tens of thousands of commits in its GitHub repository.
- Reduce disk size by >200MB: on disk, grpc takes up 250MB when cloned, even
  though the tarball of the repo is only 3.2MB. By never using git
  externals, we also save network bandwidth.
- Consume less CPU: cloning a git repository is much slower than
  downloading and extracting a tarball.
Change: 133895791
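A hedged sketch of what the first bullet looks like in a Bazel WORKSPACE file; the dependency name is taken from the message, but the URL, version, and strip_prefix value are hypothetical placeholders, not the actual ones used by the change:

```python
# Illustrative WORKSPACE fragment: fetch a dependency as a tarball via
# http_archive instead of cloning it with git_repository. Version and URL
# are hypothetical placeholders.
http_archive(
    name = "grpc",
    urls = ["https://github.com/grpc/grpc/archive/v1.0.0.tar.gz"],
    # strip_prefix removes the top-level "<repo>-<version>/" directory that
    # GitHub tarballs wrap their contents in.
    strip_prefix = "grpc-1.0.0",
)
```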
Change: 128401884
improvements for fp16
Added SpecialFunctions to the list of Eigen headers TensorFlow depends on.
Change: 127264575
Change: 127253427
improvements for fp16
Change: 127233960
handle per-thread buffer allocation for the tileable executor without resorting to thread_local, which is not fully supported on Android.
Change: 126009029
will enable the implementation of the cumsum operation in TensorFlow
Change: 125697517
performance of the toy MNIST training by one order of magnitude
Change: 124374286
Benchmark             Time(ns)  CPU(ns)  Iterations      Throughput
NEW
BM_fullReduction/10       4591     4595      153149   20.8M items/s
BM_fullReduction/64       5073     5075      100000  770.0M items/s
BM_fullReduction/512      9067     9070       75263   26.9G items/s
BM_fullReduction/4k     243984   244125        2868   64.0G items/s
BM_fullReduction/5k     359125   359273        1951   64.8G items/s
OLD
BM_fullReduction/10       9085     9087       74395   10.5M items/s
BM_fullReduction/64       9478     9478       72014  412.1M items/s
BM_fullReduction/512     14643    14646       46902   16.7G items/s
BM_fullReduction/4k     260338   260384        2678   60.0G items/s
BM_fullReduction/5k     385076   385178        1818   60.5G items/s
Change: 124290852
gradients, some variants etc.).
Change: 124197406
gradients, some variants etc.).
Change: 123967787
gradients, some variants etc.).
Change: 123967117
Change: 123659102
Change: 123238579
with many CPU cores.
For example, the wall time for the following tutorial went down from 13m35s to 5m27s:
bazel run -c opt --copt=-mavx tensorflow/examples/tutorials/word2vec/word2vec_basic
Change: 122462177
Change: 122192081
by about 3 orders of magnitude, as well as some partial reductions by 30%, when using CUDA 7.5 or above
Change: 122191448
GPUs.
Updated the check-numerics code to make it compatible with fp16.
Change: 120980302
Change: 120739269
tensorflow: switch to Eigen thread pool
This is the first step of switching TensorFlow to the new
non-blocking thread pool in Eigen.
Change: 120510292
on GPU
Change: 120505517
offered by AWS
Change: 120369420
sigmoid of fp16 and introduces a condition estimator.
Change: 119907721
Change: 119850987
improvements for fp16
Change: 119771118
Change: 119458778
as well as fp16
Change: 119398881
the zeta and polygamma functions, as well as improved support for float16.
Change: 119279101
and fixes the computation of absolute values on GPU.
Change: 119001808
Change: 118414762
Use Eigen mod functors directly instead of duplicating them.
Change: 118362359
tensorflow/core/kernel.
Change: 117941211
Change: 117570343
Change: 117506296
Also fixed compilation issues with CUDA devices that support compute capability 5.3.
Change: 117493644
Change: 116831720
both AVX instructions and CUDA to run as fast as possible.
Change: 116775924
other than 0 and significantly speeds up a number of computations on GPUs.
Change: 116607765
GPUs
Change: 116409601
Change: 116063261
Change: 115889721
Change: 115280348
Change: 115268843
floats in TensorFlow. The code was tested on a Tegra X1.
Change: 115253733
Change: 114585944
Eigen::CompleteOrthogonalDecomposition (COD), which I recently contributed to Eigen in https://bitbucket.org/eigen/eigen/pull-requests/163/implement-complete-orthogonal/diff
The advantage of COD over column pivoted QR is that it is able to compute the minimum-norm solution when the matrix is rank-deficient, which is usually the desired behavior and makes it consistent with the fast path.
Change: 114483303
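The minimum-norm property described above can be stated precisely: among all least-squares solutions of a (possibly rank-deficient) system, COD returns the one of smallest Euclidean norm, which coincides with the pseudoinverse solution:

```latex
\hat{x} = \operatorname*{arg\,min}_{x \in S} \|x\|_2,
\qquad
S = \{\, x : \|Ax - b\|_2 = \min_y \|Ay - b\|_2 \,\},
\qquad
\hat{x} = A^{+} b,
```

where \(A^{+}\) denotes the Moore-Penrose pseudoinverse of \(A\).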
Change: 114243879
stability improvements
Change: 113791782
Change: 113371678