path: root/unsupported/Eigen/CXX11/src/Tensor/TensorDeviceCuda.h
Commit message (author, date)
* Add deprecated header files for TensorFlow (Gael Guennebaud, 2018-07-12)
* renaming *Cuda files to *Gpu in the unsupported/Eigen/CXX11/src/Tensor and unsupported/test directories (Deven Desai, 2018-06-20)
* Add an EIGEN_NO_CUDA option, and introduce EIGEN_CUDACC and EIGEN_CUDA_ARCH aliases (Gael Guennebaud, 2017-07-17)
* Silenced clang compilation warning. (Benoit Steiner, 2017-02-28)
* Introduce a portable EIGEN_SLEEP macro. (Benoit Steiner, 2016-12-09)
* Made TensorDeviceCuda.h compile on Windows (Benoit Steiner, 2016-11-17)
* Made the initialization of a CUDA device thread safe. (Benoit Steiner, 2016-09-26)
* Deleted some unnecessary and confusing EIGEN_DEVICE_FUNC (Benoit Steiner, 2016-09-19)
* Improved the performance of full reductions. (Benoit Steiner, 2016-06-03)
    AFTER:
    BM_fullReduction/10        4541     4543   154017    21.0M items/s
    BM_fullReduction/64        5191     5193   100000   752.5M items/s
    BM_fullReduction/512       9588     9588    71361    25.5G items/s
    BM_fullReduction/4k      244314   244281     2863    64.0G items/s
    BM_fullReduction/5k      359382   359363     1946    64.8G items/s
    BEFORE:
    BM_fullReduction/10        9085     9087    74395    10.5M items/s
    BM_fullReduction/64        9478     9478    72014   412.1M items/s
    BM_fullReduction/512      14643    14646    46902    16.7G items/s
    BM_fullReduction/4k      260338   260384     2678    60.0G items/s
    BM_fullReduction/5k      385076   385178     1818    60.5G items/s
* Fixed compilation warning (Benoit Steiner, 2016-05-24)
* Added the ability to use a scratch buffer in cuda kernels (Benoit Steiner, 2016-05-09)
* Simplified the code that launches cuda kernels. (Benoit Steiner, 2016-04-19)
* Don't take the address of a kernel on CUDA devices that don't support this feature. (Benoit Steiner, 2016-04-19)
* Print some information to stderr when a CUDA kernel fails (Benoit Steiner, 2016-02-27)
* Print an error message to stderr when the initialization of the CUDA runtime fails. This helps debugging setup issues. (Benoit Steiner, 2016-02-19)
* Added the ability to query the minor version of a cuda device (Benoit Steiner, 2016-02-19)
* Made it possible to limit the number of blocks that will be used to evaluate a tensor expression on a CUDA device. This makes it possible to set aside streaming multiprocessors for other computations. (Benoit Steiner, 2016-02-01)
* Silenced a few compilation warnings. (Benoit Steiner, 2016-01-11)
* Silenced several compilation warnings triggered by nvcc. (Benoit Steiner, 2016-01-11)
* Cleaned up double-defined macro from last commit (Jeremy Barnes, 2016-01-10)
* Alternative way of forcing instantiation of device kernels without causing warnings or requiring device to device kernel invocations. This allows TensorFlow to work on SM 3.0 (i.e., Amazon EC2) machines. (Jeremy Barnes, 2016-01-10)
* Prevent nvcc from miscompiling the cuda metakernel. Unfortunately this reintroduces some compilation warnings but it's much better than having to deal with random assertion failures. (Benoit Steiner, 2016-01-08)
* Silenced some compilation warnings triggered by nvcc (Benoit Steiner, 2015-12-17)
* Made it possible to refer to a GPUDevice from code compiled with a regular C++ compiler (Benoit Steiner, 2015-11-23)
* Split TensorDeviceType.h in 3 files to make it more manageable (Benoit Steiner, 2015-11-20)