eigen (master): C++ library for linear algebra
path: root/unsupported/Eigen/CXX11/src/Tensor/TensorReductionCuda.h
Commit message [Author, Age]
* Made the Tensor code compile with clang 3.9 [Benoit Steiner, 2017-03-02]
* Fix remaining CUDA >= 300 checks [Igor Babuschkin, 2016-08-18]
* Add the necessary CUDA >= 300 checks back [Igor Babuschkin, 2016-08-18]
* Remove CUDA >= 300 checks and enable outer reductin for doubles [Igor Babuschkin, 2016-08-06]
* Make use of atomicExch for atomicExchCustom [Igor Babuschkin, 2016-08-05]
* Enable efficient Tensor reduction for doubles [Igor Babuschkin, 2016-07-01]
* Made it possible to compile reductions for an old cuda architecture and run t... [Benoit Steiner, 2016-06-29]
* Made the code compile when using CUDA architecture < 300 [Benoit Steiner, 2016-06-29]
* Simplified the code that dispatches vectorized reductions on GPU [Benoit Steiner, 2016-06-09]
* Improved support for vectorization of 16-bit floats [Benoit Steiner, 2016-06-09]
* Misc small improvements to the reduction code. [Benoit Steiner, 2016-06-06]
* Improved the performance of full reductions. [Benoit Steiner, 2016-06-03]
* Silenced compilation warning generated by nvcc. [Benoit Steiner, 2016-06-01]
* Added support for mean reductions on fp16 [Benoit Steiner, 2016-06-01]
* Only enable optimized reductions of fp16 if the reduction functor supports them [Benoit Steiner, 2016-05-31]
* Resolved merge conflicts [Benoit Steiner, 2016-05-26]
* Merged latest reduction improvements [Benoit Steiner, 2016-05-26]
|\
* | Improved the performance of inner reductions. [Benoit Steiner, 2016-05-26]
* | There is no need to make the fp16 full reduction kernel a static function. [Benoit Steiner, 2016-05-24]
| * Allow vectorized padding on GPU. This helps speed things up a little. [Benoit Steiner, 2016-05-17]
|/
* Turnon the new thread pool by default since it scales much better over multip... [Benoit Steiner, 2016-05-13]
* Removed unnecessary thread synchronization [Benoit Steiner, 2016-05-13]
* Fixed a typo in my previous commit [Benoit Steiner, 2016-05-11]
* Fix potential race condition in the CUDA reduction code. [Benoit Steiner, 2016-05-11]
* Properly gate the use of half2. [Benoit Steiner, 2016-05-10]
* Small improvement to the full reduction of fp16 [Benoit Steiner, 2016-05-10]
* Simplified the reduction code a little. [Benoit Steiner, 2016-05-10]
* Improved the performance of full reductions on GPU: [Benoit Steiner, 2016-05-09]
* Don't crash when attempting to reduce empty tensors. [Benoit Steiner, 2016-04-20]
* Fixed a compilation error with nvcc 7. [Benoit Steiner, 2016-04-19]
* Simplified the code that launches cuda kernels. [Benoit Steiner, 2016-04-19]
* Use numext::ceil instead of std::ceil [Benoit Steiner, 2016-04-19]
* Fixed compilation warning [Benoit Steiner, 2016-03-18]
* Improved the performance of large outer reductions on cuda [Benoit Steiner, 2016-02-29]
* Made the signature of the inner and outer reducers consistent [Benoit Steiner, 2016-02-29]
* Optimized the performance of narrow reductions on CUDA devices [Benoit Steiner, 2016-02-29]
* Fixed a race condition that could affect some reductions on CUDA devices. [Benoit Steiner, 2016-01-15]
* Use warp shuffles instead of shared memory access to speedup the inner reduct... [Benoit Steiner, 2016-01-14]
* Fixed a boundary condition bug in the outer reduction kernel [Benoit Steiner, 2016-01-14]
* Silenced a few compilation warnings. [Benoit Steiner, 2016-01-11]
* Deleted unused variable. [Benoit Steiner, 2016-01-11]
* Silenced a nvcc compilation warning [Benoit Steiner, 2016-01-11]
* Silenced several compilation warnings triggered by nvcc. [Benoit Steiner, 2016-01-11]
* Merged in jeremy_barnes/eigen/shader-model-3.0 (pull request PR-152) [Benoit Steiner, 2016-01-11]
|\
* | Re-enabled the optimized reduction CUDA code. [Benoit Steiner, 2016-01-11]
| * Alternative way of forcing instantiation of device kernels without [Jeremy Barnes, 2016-01-10]
|/
* Prevent nvcc from miscompiling the cuda metakernel. Unfortunately this reintr... [Benoit Steiner, 2016-01-08]
* Improved the performance of reductions on CUDA devices [Benoit Steiner, 2016-01-04]
* Optimized the configuration of the outer reduction cuda kernel [Benoit Steiner, 2015-12-22]
* Added missing define [Benoit Steiner, 2015-12-22]