Commit message (Collapse) | Author | Age | |
---|---|---|---|
* | Add deprecated header files for TensorFlow | 2018-07-12 | |
| | |||
* | renaming *Cuda files to *Gpu in the unsupported/Eigen/CXX11/src/Tensor and ↵ | 2018-06-20 | |
| | | | | unsupported/test directories | ||
* | Add an EIGEN_NO_CUDA option, and introduce EIGEN_CUDACC and EIGEN_CUDA_ARCH ↵ | 2017-07-17 | |
| | | | | aliases | ||
* | Silenced clang compilation warning. | 2017-02-28 | |
| | |||
* | Introduce a portable EIGEN_SLEEP macro. | 2016-12-09 | |
| | |||
* | Made TensorDeviceCuda.h compile on Windows | 2016-11-17 | |
| | |||
* | Made the initialization of a CUDA device thread safe. | 2016-09-26 | |
| | |||
* | Deleted some unnecessary and confusing EIGEN_DEVICE_FUNC | 2016-09-19 | |
| | |||
* | Improved the performance of full reductions. | 2016-06-03 | |
| | | | | AFTER: BM_fullReduction/10 4541 4543 154017 21.0M items/s; BM_fullReduction/64 5191 5193 100000 752.5M items/s; BM_fullReduction/512 9588 9588 71361 25.5G items/s; BM_fullReduction/4k 244314 244281 2863 64.0G items/s; BM_fullReduction/5k 359382 359363 1946 64.8G items/s. BEFORE: BM_fullReduction/10 9085 9087 74395 10.5M items/s; BM_fullReduction/64 9478 9478 72014 412.1M items/s; BM_fullReduction/512 14643 14646 46902 16.7G items/s; BM_fullReduction/4k 260338 260384 2678 60.0G items/s; BM_fullReduction/5k 385076 385178 1818 60.5G items/s. | ||
* | Fixed compilation warning | 2016-05-24 | |
| | |||
* | Added the ability to use a scratch buffer in cuda kernels | 2016-05-09 | |
| | |||
* | Simplified the code that launches cuda kernels. | 2016-04-19 | |
| | |||
* | Don't take the address of a kernel on CUDA devices that don't support this ↵ | 2016-04-19 | |
| | | | | feature. | ||
* | Print some information to stderr when a CUDA kernel fails | 2016-02-27 | |
| | |||
* | Print an error message to stderr when the initialization of the CUDA runtime ↵ | 2016-02-19 | |
| | | | | fails. This helps debugging setup issues. | ||
* | Added the ability to query the minor version of a cuda device | 2016-02-19 | |
| | |||
* | Made it possible to limit the number of blocks that will be used to evaluate ↵ | 2016-02-01 | |
| | | | | a tensor expression on a CUDA device. This makes it possible to set aside streaming multiprocessors for other computations. | ||
* | Silenced a few compilation warnings. | 2016-01-11 | |
| | |||
* | Silenced several compilation warnings triggered by nvcc. | 2016-01-11 | |
| | |||
* | Cleaned up double-defined macro from last commit | 2016-01-10 | |
| | |||
* | Alternative way of forcing instantiation of device kernels without | 2016-01-10 | |
| | | | | | | causing warnings or requiring device to device kernel invocations. This allows TensorFlow to work on SM 3.0 (i.e., Amazon EC2) machines. | ||
* | Prevent nvcc from miscompiling the cuda metakernel. Unfortunately this ↵ | 2016-01-08 | |
| | | | | reintroduces some compilation warnings but it's much better than having to deal with random assertion failures. | ||
* | Silenced some compilation warnings triggered by nvcc | 2015-12-17 | |
| | |||
* | Made it possible to refer to a GPUDevice from code compiled with a regular ↵ | 2015-11-23 | |
| | | | | C++ compiler | ||
* | Split TensorDeviceType.h in 3 files to make it more manageable | 2015-11-20 | |