path: root/unsupported/Eigen
Commit message (Author, Date)
* Moved the contraction mapping code to its own file to make the code more manageable. (Benoit Steiner, 2016-01-19)
* Improved code indentation. (Benoit Steiner, 2016-01-19)
* Record whether the underlying tensor storage can be accessed directly during the evaluation of an expression. (Benoit Steiner, 2016-01-19)
* Fixed a race condition that could affect some reductions on CUDA devices. (Benoit Steiner, 2016-01-15)
* Made it possible to compare tensor dimensions inside a CUDA kernel. (Benoit Steiner, 2016-01-15)
* Use warp shuffles instead of shared memory accesses to speed up the inner reduction kernel. (Benoit Steiner, 2016-01-14)
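The warp-shuffle technique referenced above can be sketched as follows. This is an illustrative kernel fragment, not Eigen's actual reduction code, and it uses the modern `__shfl_down_sync` intrinsic (the 2016-era code would have used the pre-CUDA-9 `__shfl_down`):

```cuda
// Illustrative sketch: a warp-level sum reduction using shuffles.
// Lanes exchange registers directly, so no shared memory or
// __syncthreads() is needed within a single warp.
__inline__ __device__ float warpReduceSum(float val) {
  // Each step halves the number of lanes contributing partial sums.
  for (int offset = warpSize / 2; offset > 0; offset /= 2) {
    val += __shfl_down_sync(0xffffffff, val, offset);
  }
  return val;  // lane 0 now holds the sum of all 32 lanes
}
```

Avoiding the shared-memory round trip (store, barrier, load) is what makes the shuffle-based inner loop faster.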
* Fixed a boundary condition bug in the outer reduction kernel. (Benoit Steiner, 2016-01-14)
* Properly record the rank of reduced tensors in the tensor traits. (Benoit Steiner, 2016-01-13)
* Trigger the optimized matrix-vector path more conservatively. (Benoit Steiner, 2016-01-12)
* Improved the performance of the contraction of a 2d tensor with a 1d tensor by a factor of 3 or more. This helps speed up LSTM neural networks. (Benoit Steiner, 2016-01-12)
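For reference, contracting a 2-d tensor with a 1-d tensor over the matching index is simply a matrix-vector product; in Eigen's tensor module it is spelled `mat.contract(vec, dims)`. A minimal plain-C++ sketch of the semantics (the helper name `contract2d1d` is illustrative, not Eigen's API):

```cpp
#include <cstddef>
#include <vector>

// Reference semantics of a 2d x 1d tensor contraction over the 2d
// tensor's second index: a plain matrix-vector product.
// (Illustrative only; the commit above optimizes Eigen's own kernel.)
std::vector<float> contract2d1d(const std::vector<std::vector<float>>& mat,
                                const std::vector<float>& vec) {
  std::vector<float> out(mat.size(), 0.0f);
  for (std::size_t i = 0; i < mat.size(); ++i)
    for (std::size_t j = 0; j < vec.size(); ++j)
      out[i] += mat[i][j] * vec[j];  // sum over the contracted index j
  return out;
}
```

This shape shows up constantly in LSTM cells (weight matrix times activation vector), which is why speeding up this one case helps those networks.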
* Reverted a previous change that tripped nvcc when compiling in debug mode. (Benoit Steiner, 2016-01-11)
* Silenced a few compilation warnings. (Benoit Steiner, 2016-01-11)
* Updated the tensor traits: the alignment is no longer part of the Flags enum. (Benoit Steiner, 2016-01-11)
* Enabled the use of fixed dimensions from within a CUDA kernel. (Benoit Steiner, 2016-01-11)
* Deleted unused variable. (Benoit Steiner, 2016-01-11)
* Silenced an nvcc compilation warning. (Benoit Steiner, 2016-01-11)
* Silenced several compilation warnings triggered by nvcc. (Benoit Steiner, 2016-01-11)
* Merged in jeremy_barnes/eigen/shader-model-3.0 (pull request PR-152): an alternative way of forcing instantiation of device kernels without causing warnings or requiring device-to-device kernel invocations. (Benoit Steiner, 2016-01-11)
* Fixed a bug in the dispatch of optimized reduction kernels. (Benoit Steiner, 2016-01-11)
* Re-enabled the optimized reduction CUDA code. (Benoit Steiner, 2016-01-11)
* Cleaned up double-defined macro from last commit. (Jeremy Barnes, 2016-01-10)
* Alternative way of forcing instantiation of device kernels without causing warnings or requiring device-to-device kernel invocations. This allows TensorFlow to work on SM 3.0 (i.e., Amazon EC2) machines. (Jeremy Barnes, 2016-01-10)
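The general technique being described is explicit template instantiation: emitting code for a template in a translation unit without ever calling it. A hedged host-side sketch of the idea (the function is hypothetical; Eigen applies the same idiom to `__global__` kernel templates, where the alternative of a device-to-device call is undesirable):

```cpp
#include <cstddef>

// Illustrative sketch of forcing instantiation without a call site.
// The actual mechanism in the merged branch may differ.
template <typename T>
T accumulate_n(const T* data, std::size_t n) {
  T sum = T(0);
  for (std::size_t i = 0; i < n; ++i) sum += data[i];
  return sum;
}

// Explicit instantiation: object code for accumulate_n<float> is emitted
// in this translation unit even if nothing here ever calls it.
template float accumulate_n<float>(const float*, std::size_t);
```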
* Simplified the dispatch code. (Benoit Steiner, 2016-01-08)
* Made it possible to use arrays of size 0 on CUDA devices. (Benoit Steiner, 2016-01-08)
* Reworked the dispatch of optimized CUDA reduction kernels to work around an nvcc bug that prevented the code from compiling in optimized mode in some cases. (Benoit Steiner, 2016-01-08)
* Prevent nvcc from miscompiling the CUDA metakernel. Unfortunately this reintroduces some compilation warnings, but it's much better than having to deal with random assertion failures. (Benoit Steiner, 2016-01-08)
* Removed a couple of partial specializations that confuse nvcc and result in errors such as: error: more than one partial specialization matches the template argument list of class "Eigen::internal::get<3, Eigen::internal::numeric_list<std::size_t, 1UL, 1UL, 1UL, 1UL>>" "Eigen::internal::get<n, Eigen::internal::numeric_list<T, a, as...>>" "Eigen::internal::get<n, Eigen::internal::numeric_list<T, as...>>" (Benoit Steiner, 2016-01-07)
* Fixed a typo. (Benoit Steiner, 2016-01-06)
* Optimized the performance of broadcasting of scalars. (Benoit Steiner, 2016-01-06)
* Improved the performance of reductions on CUDA devices. (Benoit Steiner, 2016-01-04)
* Added a 'divup' util to compute the ceiling of the quotient of two integers. (Benoit Steiner, 2016-01-04)
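'divup' follows the usual divide-and-round-up idiom, handy for sizing CUDA launch grids. A sketch of the common definition (Eigen's actual helper lives in its tensor internals and its signature may differ):

```cpp
// Integer division rounded up (ceiling). Typical use: number of CUDA
// blocks needed to cover n elements = divup(n, blockSize).
// Sketch of the common idiom, not necessarily Eigen's exact code.
inline int divup(int x, int y) {
  return (x + y - 1) / y;  // assumes x >= 0 and y > 0
}
```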
* Fix numerous doxygen shortcomings, and work around some clang -Wdocumentation warnings. (Gael Guennebaud, 2016-01-01)
* Add missing ctor from uint. (Gael Guennebaud, 2015-12-30)
* Don't attempt to vectorize mean reductions of integers, since we can't use SSE or AVX instructions to divide two integers. (Benoit Steiner, 2015-12-22)
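The constraint here is that SSE/AVX offer no packed integer division instruction, so the final divide of an integer mean must run in scalar code. A minimal scalar sketch (the helper name is illustrative, not Eigen's):

```cpp
#include <cstdint>
#include <vector>

// Scalar mean of integers. The summation loop is vectorizable, but the
// final divide is not: SSE/AVX have no packed integer division.
// (Illustrative helper, not Eigen's implementation.)
int64_t int_mean(const std::vector<int64_t>& v) {
  int64_t sum = 0;
  for (int64_t x : v) sum += x;                  // vectorizable
  return sum / static_cast<int64_t>(v.size());   // scalar, truncating
}
```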
* Optimized the configuration of the outer reduction CUDA kernel. (Benoit Steiner, 2015-12-22)
* Added missing define. (Benoit Steiner, 2015-12-22)
* Made sure the optimized GPU reduction code is actually compiled. (Benoit Steiner, 2015-12-22)
* Optimized outer reduction on GPUs. (Benoit Steiner, 2015-12-22)
* Added missing const. (Benoit Steiner, 2015-12-21)
* Add alignment requirement for local buffer used by the slicing op. (Benoit Steiner, 2015-12-18)
* Doubled the speed of full reductions on GPUs. (Benoit Steiner, 2015-12-18)
* Fixed a clang compilation warning triggered by the use of arrays of size 0. (Benoit Steiner, 2015-12-17)
* Silenced some compilation warnings triggered by nvcc. (Benoit Steiner, 2015-12-17)
* Made it possible to run tensor chipping operations on CUDA devices. (Benoit Steiner, 2015-12-17)
* Fixed some compilation errors triggered by the tensor code with MSVC 2008. (Benoit Steiner, 2015-12-16)
* Disable AutoDiffScalar generic copy ctor for non-compatible scalar types (fix ambiguous template instantiation). (Gael Guennebaud, 2015-12-16)
* Made the entire TensorFixedSize API callable from a CUDA kernel. (Benoit Steiner, 2015-12-14)
* Marked the tensor constructors as EIGEN_DEVICE_FUNC: this makes it possible to call them from a CUDA kernel. (Benoit Steiner, 2015-12-14)
* Merged in ebrevdo/eigen (pull request PR-148), adding special functions to Eigen: lgamma, erf, erfc. (Gael Guennebaud, 2015-12-11)
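The special functions added by this merge are elementwise ops whose scalar semantics match the C++11 `<cmath>` functions of the same names (on Eigen arrays they are spelled `x.lgamma()`, `x.erf()`, `x.erfc()`). Two identities that characterize them, shown with the standard-library scalars rather than Eigen:

```cpp
#include <cmath>

// lgamma(x) = log|Gamma(x)|; since Gamma(n+1) = n!, lgamma(n+1) = log(n!).
inline double log_factorial(int n) {
  return std::lgamma(n + 1.0);
}

// erfc is the complement of the error function: erf(x) + erfc(x) == 1.
inline double erf_plus_erfc(double x) {
  return std::erf(x) + std::erfc(x);
}
```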
* Fixed a typo in the constructor of tensors of rank 5. (Benoit Steiner, 2015-12-10)