Commit message | Author | Age | |
---|---|---|---|
* | Add deprecated header files for TensorFlow | Gael Guennebaud | 2018-07-12 |
| | |||
* | renaming *Cuda files to *Gpu in the unsupported/Eigen/CXX11/src/Tensor and unsupported/test directories | Deven Desai | 2018-06-20 |
| | |||
* | Added support for CUDA 9.0. | Benoit Steiner | 2017-08-31 |
| | |||
* | Add a EIGEN_NO_CUDA option, and introduce EIGEN_CUDACC and EIGEN_CUDA_ARCH aliases | Gael Guennebaud | 2017-07-17 |
| | |||
* | Add labels to #ifdef, in TensorReductionCuda.h | Hugh Perkins | 2017-06-06 |
| | |||
* | Made the Tensor code compile with clang 3.9 | Benoit Steiner | 2017-03-02 |
| | |||
* | Fix remaining CUDA >= 300 checks | Igor Babuschkin | 2016-08-18 |
| | |||
* | Add the necessary CUDA >= 300 checks back | Igor Babuschkin | 2016-08-18 |
| | |||
* | Remove CUDA >= 300 checks and enable outer reduction for doubles | Igor Babuschkin | 2016-08-06 |
| | |||
* | Make use of atomicExch for atomicExchCustom | Igor Babuschkin | 2016-08-05 |
| | |||
* | Enable efficient Tensor reduction for doubles | Igor Babuschkin | 2016-07-01 |
| | |||
* | Made it possible to compile reductions for an old cuda architecture and run them on a recent gpu. | Benoit Steiner | 2016-06-29 |
| | |||
* | Made the code compile when using CUDA architecture < 300 | Benoit Steiner | 2016-06-29 |
| | |||
* | Simplified the code that dispatches vectorized reductions on GPU | Benoit Steiner | 2016-06-09 |
| | |||
* | Improved support for vectorization of 16-bit floats | Benoit Steiner | 2016-06-09 |
| | |||
* | Misc small improvements to the reduction code. | Benoit Steiner | 2016-06-06 |
| | |||
* | Improved the performance of full reductions. | Benoit Steiner | 2016-06-03 |
| | AFTER: BM_fullReduction/10 4541 4543 154017 21.0M items/s; BM_fullReduction/64 5191 5193 100000 752.5M items/s; BM_fullReduction/512 9588 9588 71361 25.5G items/s; BM_fullReduction/4k 244314 244281 2863 64.0G items/s; BM_fullReduction/5k 359382 359363 1946 64.8G items/s. BEFORE: BM_fullReduction/10 9085 9087 74395 10.5M items/s; BM_fullReduction/64 9478 9478 72014 412.1M items/s; BM_fullReduction/512 14643 14646 46902 16.7G items/s; BM_fullReduction/4k 260338 260384 2678 60.0G items/s; BM_fullReduction/5k 385076 385178 1818 60.5G items/s | ||
* | Silenced compilation warning generated by nvcc. | Benoit Steiner | 2016-06-01 |
| | |||
* | Added support for mean reductions on fp16 | Benoit Steiner | 2016-06-01 |
| | |||
* | Only enable optimized reductions of fp16 if the reduction functor supports them | Benoit Steiner | 2016-05-31 |
| | |||
* | Resolved merge conflicts | Benoit Steiner | 2016-05-26 |
| | |||
* | Merged latest reduction improvements | Benoit Steiner | 2016-05-26 |
|\ | |||
* | | Improved the performance of inner reductions. | Benoit Steiner | 2016-05-26 |
| | | |||
* | | There is no need to make the fp16 full reduction kernel a static function. | Benoit Steiner | 2016-05-24 |
| | | |||
| * | Allow vectorized padding on GPU. This helps speed things up a little. | Benoit Steiner | 2016-05-17 |
|/ | Before: BM_padding/10 5000000 460 217.03 MFlops/s; BM_padding/80 5000000 460 13899.40 MFlops/s; BM_padding/640 5000000 461 888421.17 MFlops/s; BM_padding/4K 5000000 460 54316322.55 MFlops/s. After: BM_padding/10 5000000 454 220.20 MFlops/s; BM_padding/80 5000000 455 14039.86 MFlops/s; BM_padding/640 5000000 452 904968.83 MFlops/s; BM_padding/4K 5000000 411 60750049.21 MFlops/s | ||
* | Turn on the new thread pool by default since it scales much better over multiple cores. It is still possible to revert to the old thread pool by compiling with the EIGEN_USE_SIMPLE_THREAD_POOL define. | Benoit Steiner | 2016-05-13 |
| | |||
* | Removed unnecessary thread synchronization | Benoit Steiner | 2016-05-13 |
| | |||
* | Fixed a typo in my previous commit | Benoit Steiner | 2016-05-11 |
| | |||
* | Fix potential race condition in the CUDA reduction code. | Benoit Steiner | 2016-05-11 |
| | |||
* | Properly gate the use of half2. | Benoit Steiner | 2016-05-10 |
| | |||
* | Small improvement to the full reduction of fp16 | Benoit Steiner | 2016-05-10 |
| | |||
* | Simplified the reduction code a little. | Benoit Steiner | 2016-05-10 |
| | |||
* | Improved the performance of full reductions on GPU: | Benoit Steiner | 2016-05-09 |
| | Before: BM_fullReduction/10 200000 11751 8.51 MFlops/s; BM_fullReduction/80 5000 523385 12.23 MFlops/s; BM_fullReduction/640 50 36179326 11.32 MFlops/s; BM_fullReduction/4K 1 2173517195 11.50 MFlops/s. After: BM_fullReduction/10 500000 5987 16.70 MFlops/s; BM_fullReduction/80 200000 10636 601.73 MFlops/s; BM_fullReduction/640 50000 58428 7010.31 MFlops/s; BM_fullReduction/4K 1000 2006106 12461.95 MFlops/s | ||
* | Don't crash when attempting to reduce empty tensors. | Benoit Steiner | 2016-04-20 |
| | |||
* | Fixed a compilation error with nvcc 7. | Benoit Steiner | 2016-04-19 |
| | |||
* | Simplified the code that launches cuda kernels. | Benoit Steiner | 2016-04-19 |
| | |||
* | Use numext::ceil instead of std::ceil | Benoit Steiner | 2016-04-19 |
| | |||
* | Fixed compilation warning | Benoit Steiner | 2016-03-18 |
| | |||
* | Improved the performance of large outer reductions on cuda | Benoit Steiner | 2016-02-29 |
| | |||
* | Made the signature of the inner and outer reducers consistent | Benoit Steiner | 2016-02-29 |
| | |||
* | Optimized the performance of narrow reductions on CUDA devices | Benoit Steiner | 2016-02-29 |
| | |||
* | Fixed a race condition that could affect some reductions on CUDA devices. | Benoit Steiner | 2016-01-15 |
| | |||
* | Use warp shuffles instead of shared memory access to speed up the inner reduction kernel. | Benoit Steiner | 2016-01-14 |
| | |||
* | Fixed a boundary condition bug in the outer reduction kernel | Benoit Steiner | 2016-01-14 |
| | |||
* | Silenced a few compilation warnings. | Benoit Steiner | 2016-01-11 |
| | |||
* | Deleted unused variable. | Benoit Steiner | 2016-01-11 |
| | |||
* | Silenced a nvcc compilation warning | Benoit Steiner | 2016-01-11 |
| | |||
* | Silenced several compilation warnings triggered by nvcc. | Benoit Steiner | 2016-01-11 |
| | |||
* | Merged in jeremy_barnes/eigen/shader-model-3.0 (pull request PR-152) | Benoit Steiner | 2016-01-11 |
|\ | Alternative way of forcing instantiation of device kernels without causing warnings or requiring device to device kernel invocations. | ||
* | | Re-enabled the optimized reduction CUDA code. | Benoit Steiner | 2016-01-11 |
| | |