| | Commit message | Author | Age |
---|---|---|---|
* | Add deprecated header files for TensorFlow | Gael Guennebaud | 2018-07-12 |
* | renaming *Cuda files to *Gpu in the unsupported/Eigen/CXX11/src/Tensor and unsupported/test directories | Deven Desai | 2018-06-20 |
* | Add a EIGEN_NO_CUDA option, and introduce EIGEN_CUDACC and EIGEN_CUDA_ARCH aliases | Gael Guennebaud | 2017-07-17 |
* | Silenced clang compilation warning. | Benoit Steiner | 2017-02-28 |
* | Introduce a portable EIGEN_SLEEP macro. | Benoit Steiner | 2016-12-09 |
* | Made TensorDeviceCuda.h compile on windows | Benoit Steiner | 2016-11-17 |
* | Made the initialization of a CUDA device thread safe. | Benoit Steiner | 2016-09-26 |
* | Deleted some unnecessary and confusing EIGEN_DEVICE_FUNC | Benoit Steiner | 2016-09-19 |
* | Improved the performance of full reductions. | Benoit Steiner | 2016-06-03 |
| | AFTER: | | |
| | BM_fullReduction/10 4541 4543 154017 21.0M items/s | | |
| | BM_fullReduction/64 5191 5193 100000 752.5M items/s | | |
| | BM_fullReduction/512 9588 9588 71361 25.5G items/s | | |
| | BM_fullReduction/4k 244314 244281 2863 64.0G items/s | | |
| | BM_fullReduction/5k 359382 359363 1946 64.8G items/s | | |
| | BEFORE: | | |
| | BM_fullReduction/10 9085 9087 74395 10.5M items/s | | |
| | BM_fullReduction/64 9478 9478 72014 412.1M items/s | | |
| | BM_fullReduction/512 14643 14646 46902 16.7G items/s | | |
| | BM_fullReduction/4k 260338 260384 2678 60.0G items/s | | |
| | BM_fullReduction/5k 385076 385178 1818 60.5G items/s | | |
* | Fixed compilation warning | Benoit Steiner | 2016-05-24 |
* | Added the ability to use a scratch buffer in cuda kernels | Benoit Steiner | 2016-05-09 |
* | Simplified the code that launches cuda kernels. | Benoit Steiner | 2016-04-19 |
* | Don't take the address of a kernel on CUDA devices that don't support this feature. | Benoit Steiner | 2016-04-19 |
* | Print some information to stderr when a CUDA kernel fails | Benoit Steiner | 2016-02-27 |
* | Print an error message to stderr when the initialization of the CUDA runtime fails. This helps debugging setup issues. | Benoit Steiner | 2016-02-19 |
* | Added the ability to query the minor version of a cuda device | Benoit Steiner | 2016-02-19 |
* | Made it possible to limit the number of blocks that will be used to evaluate a tensor expression on a CUDA device. This makes it possible to set aside streaming multiprocessors for other computations. | Benoit Steiner | 2016-02-01 |
* | Silenced a few compilation warnings. | Benoit Steiner | 2016-01-11 |
* | Silenced several compilation warnings triggered by nvcc. | Benoit Steiner | 2016-01-11 |
* | Cleaned up double-defined macro from last commit | Jeremy Barnes | 2016-01-10 |
* | Alternative way of forcing instantiation of device kernels without causing warnings or requiring device to device kernel invocations. This allows TensorFlow to work on SM 3.0 (i.e., Amazon EC2) machines. | Jeremy Barnes | 2016-01-10 |
* | Prevent nvcc from miscompiling the cuda metakernel. Unfortunately this reintroduces some compilation warnings but it's much better than having to deal with random assertion failures. | Benoit Steiner | 2016-01-08 |
* | Silenced some compilation warnings triggered by nvcc | Benoit Steiner | 2015-12-17 |
* | Made it possible to refer to a GPUDevice from code compiled with a regular C++ compiler | Benoit Steiner | 2015-11-23 |
* | Split TensorDeviceType.h in 3 files to make it more manageable | Benoit Steiner | 2015-11-20 |