path: root/unsupported/Eigen/CXX11/src/Tensor/TensorReductionCuda.h
Commit message (Author, Age)
...
| * Alternative way of forcing instantiation of device kernels without causing warnings or requiring device-to-device kernel invocations. This allows TensorFlow to work on SM 3.0 (i.e., Amazon EC2) machines. (Jeremy Barnes, 2016-01-10)
|/
* Prevent nvcc from miscompiling the cuda metakernel. Unfortunately this reintroduces some compilation warnings but it's much better than having to deal with random assertion failures. (Benoit Steiner, 2016-01-08)
* Improved the performance of reductions on CUDA devices (Benoit Steiner, 2016-01-04)
|
* Optimized the configuration of the outer reduction cuda kernel (Benoit Steiner, 2015-12-22)
|
* Added missing define (Benoit Steiner, 2015-12-22)
|
* Made sure the optimized gpu reduction code is actually compiled. (Benoit Steiner, 2015-12-22)
|
* Optimized outer reduction on GPUs. (Benoit Steiner, 2015-12-22)
|
* Doubled the speed of full reductions on GPUs. (Benoit Steiner, 2015-12-18)
|
* Code cleanup (Benoit Steiner, 2015-11-06)