Commit message | Author | Age | |
---|---|---|---|
* | Enabled the use of fixed dimensions from within a cuda kernel. | 2016-01-11 | |
| | |||
* | Deleted unused variable. | 2016-01-11 | |
| | |||
* | Silenced an nvcc compilation warning | 2016-01-11 | |
| | |||
* | Silenced several compilation warnings triggered by nvcc. | 2016-01-11 | |
| | |||
* | Merged in jeremy_barnes/eigen/shader-model-3.0 (pull request PR-152): Alternative way of forcing instantiation of device kernels without causing warnings or requiring device-to-device kernel invocations. | 2016-01-11 | |
|\ | |||
* | | Fixed a bug in the dispatch of optimized reduction kernels. | 2016-01-11 | |
| | | |||
* | | Re-enabled the optimized reduction CUDA code. | 2016-01-11 | |
| | | |||
| * | Cleaned up double-defined macro from last commit | 2016-01-10 | |
| | | |||
| * | Alternative way of forcing instantiation of device kernels without causing warnings or requiring device-to-device kernel invocations. This allows TensorFlow to work on SM 3.0 (i.e., Amazon EC2) machines. | 2016-01-10 | |
|/ | |||
* | Simplified the dispatch code. | 2016-01-08 | |
| | |||
* | Made it possible to use arrays of size 0 on CUDA devices | 2016-01-08 | |
| | |||
* | Reworked the dispatch of optimized CUDA reduction kernels to work around an nvcc bug that prevented the code from compiling in optimized mode in some cases | 2016-01-08 | |
| | |||
* | Prevent nvcc from miscompiling the CUDA metakernel. Unfortunately this reintroduces some compilation warnings, but that is much better than having to deal with random assertion failures. | 2016-01-08 | |
| | |||
* | Removed a couple of partial specializations that confuse nvcc and result in errors such as this: error: more than one partial specialization matches the template argument list of class "Eigen::internal::get<3, Eigen::internal::numeric_list<std::size_t, 1UL, 1UL, 1UL, 1UL>>" "Eigen::internal::get<n, Eigen::internal::numeric_list<T, a, as...>>" "Eigen::internal::get<n, Eigen::internal::numeric_list<T, as...>>" | 2016-01-07 | |
| | |||
* | Fixed a typo. | 2016-01-06 | |
| | |||
* | Optimized the performance of broadcasting of scalars. | 2016-01-06 | |
| | |||
* | Improved the performance of reductions on CUDA devices | 2016-01-04 | |
| | |||
* | Added a 'divup' util to compute the ceiling of the quotient of two integers | 2016-01-04 | |
| | |||
* | Add missing ctor from uint | 2015-12-30 | |
| | |||
* | Don't attempt to vectorize mean reductions of integers since we can't use SSE or AVX instructions to divide 2 integers. | 2015-12-22 | |
| | |||
* | Optimized the configuration of the outer reduction cuda kernel | 2015-12-22 | |
| | |||
* | Added missing define | 2015-12-22 | |
| | |||
* | Made sure the optimized gpu reduction code is actually compiled. | 2015-12-22 | |
| | |||
* | Optimized outer reduction on GPUs. | 2015-12-22 | |
| | |||
* | Added missing const | 2015-12-21 | |
| | |||
* | Add alignment requirement for local buffer used by the slicing op. | 2015-12-18 | |
| | |||
* | Doubled the speed of full reductions on GPUs. | 2015-12-18 | |
| | |||
* | Fixed a clang compilation warning triggered by the use of arrays of size 0. | 2015-12-17 | |
| | |||
* | Silenced some compilation warnings triggered by nvcc | 2015-12-17 | |
| | |||
* | Made it possible to run tensor chipping operations on CUDA devices | 2015-12-17 | |
| | |||
* | Made the entire TensorFixedSize api callable from a CUDA kernel. | 2015-12-14 | |
| | |||
* | Marked the tensor constructors as EIGEN_DEVICE_FUNC: this makes it possible to call them from a CUDA kernel. | 2015-12-14 | |
| | |||
* | Merged in ebrevdo/eigen (pull request PR-148): Add special functions to Eigen: lgamma, erf, erfc. | 2015-12-11 | |
|\ | |||
* | | Fixed a typo in the constructor of tensors of rank 5. | 2015-12-10 | |
| | | |||
* | | Fixed the coefficient accessors used for the 2D and 3D cases when compiling without cxx11 support. | 2015-12-10 | |
| | | |||
| * | Add special functions to Eigen: lgamma, erf, erfc. Includes CUDA support and unit tests. | 2015-12-07 | |
| | | |||
* | | Fixed another compilation warning | 2015-12-07 | |
|/ | |||
* | Fixed compilation warnings | 2015-12-07 | |
| | |||
* | Use signed integers instead of unsigned ones more consistently in the codebase. | 2015-12-04 | |
| | |||
* | Use integers instead of std::size_t to encode the number of dimensions in the Tensor class since most of the code already uses integers. | 2015-12-04 | |
| | |||
* | Made it possible to use the sigmoid functor within a CUDA kernel. | 2015-12-04 | |
| | |||
* | Deleted redundant code | 2015-12-03 | |
| | |||
* | Added scalar_sign_op (both real and complex) | 2015-11-24 | |
| | |||
* | Fixed a bug in TensorArgMax.h | 2015-11-23 | |
| | |||
* | Fixed the implementation of Eigen::internal::count_leading_zeros for MSVC. Also updated the code to silence bogus warnings generated by nvcc when compiling this function. | 2015-11-23 | |
| | |||
* | Don't create more cuda blocks than necessary | 2015-11-23 | |
| | |||
* | Made it possible to refer to a GPUDevice from code compiled with a regular C++ compiler | 2015-11-23 | |
| | |||
* | Deleted unused variable. | 2015-11-23 | |
| | |||
* | Split TensorDeviceType.h into 3 files to make it more manageable | 2015-11-20 | |
| | |||
* | Added option to force the usage of the Eigen array class instead of the std::array class. | 2015-11-20 | |
| | |||
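
Note on the 'divup' entry (2016-01-04): a helper with this name conventionally rounds an integer quotient up, which is what is needed to work out how many CUDA blocks cover a given number of elements. The following is a minimal sketch under that assumption, not the code from the commit. The name `num_blocks` is hypothetical and used only for illustration, and the formula assumes a non-negative numerator, a positive denominator, and that `x + y - 1` does not overflow.

```cpp
#include <type_traits>

// Illustrative stand-in for a 'divup' helper; NOT Eigen's actual code.
// Returns the quotient x / y rounded up, for x >= 0 and y > 0.
template <typename T>
constexpr T divup(T x, T y) {
  static_assert(std::is_integral<T>::value, "divup expects an integer type");
  // Adding y - 1 before the truncating division bumps any non-exact
  // quotient up to the next integer and leaves exact quotients unchanged.
  return (x + y - 1) / y;
}

// Hypothetical usage: how many blocks of block_size threads are needed
// so that every one of n elements is handled by some thread.
constexpr int num_blocks(int n, int block_size) {
  return divup(n, block_size);
}

// Compile-time checks of the rounding behaviour.
static_assert(divup(10, 3) == 4, "rounds up when there is a remainder");
static_assert(divup(9, 3) == 3, "exact quotients are unchanged");
static_assert(divup(0, 3) == 0, "zero elements need zero blocks");
```

With 1025 elements and 256 threads per block, `num_blocks(1025, 256)` yields 5, whereas plain integer division would yield 4 and leave the last element unprocessed.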