path: root/unsupported/Eigen/CXX11/src/Tensor/TensorReductionCuda.h
Commit message (Author, Age)
* Made the Tensor code compile with clang 3.9 (Benoit Steiner, 2017-03-02)
* Fix remaining CUDA >= 300 checks (Igor Babuschkin, 2016-08-18)
* Add the necessary CUDA >= 300 checks back (Igor Babuschkin, 2016-08-18)
* Remove CUDA >= 300 checks and enable outer reductin for doubles (Igor Babuschkin, 2016-08-06)
* Make use of atomicExch for atomicExchCustom (Igor Babuschkin, 2016-08-05)
* Enable efficient Tensor reduction for doubles (Igor Babuschkin, 2016-07-01)
* Made it possible to compile reductions for an old cuda architecture and run t... (Benoit Steiner, 2016-06-29)
* Made the code compile when using CUDA architecture < 300 (Benoit Steiner, 2016-06-29)
* Simplified the code that dispatches vectorized reductions on GPU (Benoit Steiner, 2016-06-09)
* Improved support for vectorization of 16-bit floats (Benoit Steiner, 2016-06-09)
* Misc small improvements to the reduction code. (Benoit Steiner, 2016-06-06)
* Improved the performance of full reductions. (Benoit Steiner, 2016-06-03)
* Silenced compilation warning generated by nvcc. (Benoit Steiner, 2016-06-01)
* Added support for mean reductions on fp16 (Benoit Steiner, 2016-06-01)
* Only enable optimized reductions of fp16 if the reduction functor supports them (Benoit Steiner, 2016-05-31)
* Resolved merge conflicts (Benoit Steiner, 2016-05-26)
* Merged latest reduction improvements (Benoit Steiner, 2016-05-26)
|\
* | Improved the performance of inner reductions. (Benoit Steiner, 2016-05-26)
* | There is no need to make the fp16 full reduction kernel a static function. (Benoit Steiner, 2016-05-24)
| * Allow vectorized padding on GPU. This helps speed things up a little. (Benoit Steiner, 2016-05-17)
|/
* Turnon the new thread pool by default since it scales much better over multip... (Benoit Steiner, 2016-05-13)
* Removed unnecessary thread synchronization (Benoit Steiner, 2016-05-13)
* Fixed a typo in my previous commit (Benoit Steiner, 2016-05-11)
* Fix potential race condition in the CUDA reduction code. (Benoit Steiner, 2016-05-11)
* Properly gate the use of half2. (Benoit Steiner, 2016-05-10)
* Small improvement to the full reduction of fp16 (Benoit Steiner, 2016-05-10)
* Simplified the reduction code a little. (Benoit Steiner, 2016-05-10)
* Improved the performance of full reductions on GPU: (Benoit Steiner, 2016-05-09)
* Don't crash when attempting to reduce empty tensors. (Benoit Steiner, 2016-04-20)
* Fixed a compilation error with nvcc 7. (Benoit Steiner, 2016-04-19)
* Simplified the code that launches cuda kernels. (Benoit Steiner, 2016-04-19)
* Use numext::ceil instead of std::ceil (Benoit Steiner, 2016-04-19)
* Fixed compilation warning (Benoit Steiner, 2016-03-18)
* Improved the performance of large outer reductions on cuda (Benoit Steiner, 2016-02-29)
* Made the signature of the inner and outer reducers consistent (Benoit Steiner, 2016-02-29)
* Optimized the performance of narrow reductions on CUDA devices (Benoit Steiner, 2016-02-29)
* Fixed a race condition that could affect some reductions on CUDA devices. (Benoit Steiner, 2016-01-15)
* Use warp shuffles instead of shared memory access to speedup the inner reduct... (Benoit Steiner, 2016-01-14)
* Fixed a boundary condition bug in the outer reduction kernel (Benoit Steiner, 2016-01-14)
* Silenced a few compilation warnings. (Benoit Steiner, 2016-01-11)
* Deleted unused variable. (Benoit Steiner, 2016-01-11)
* Silenced a nvcc compilation warning (Benoit Steiner, 2016-01-11)
* Silenced several compilation warnings triggered by nvcc. (Benoit Steiner, 2016-01-11)
* Merged in jeremy_barnes/eigen/shader-model-3.0 (pull request PR-152) (Benoit Steiner, 2016-01-11)
|\
* | Re-enabled the optimized reduction CUDA code. (Benoit Steiner, 2016-01-11)
| * Alternative way of forcing instantiation of device kernels without (Jeremy Barnes, 2016-01-10)
|/
* Prevent nvcc from miscompiling the cuda metakernel. Unfortunately this reintr... (Benoit Steiner, 2016-01-08)
* Improved the performance of reductions on CUDA devices (Benoit Steiner, 2016-01-04)
* Optimized the configuration of the outer reduction cuda kernel (Benoit Steiner, 2015-12-22)
* Added missing define (Benoit Steiner, 2015-12-22)