Commit message | Author | Age | |
---|---|---|---|
* | Fixed minor typo in SplineFitting. | 2016-01-25 | |
| | |||
* | Don't explicitly evaluate the subexpression from TensorForcedEval::evalSubExprIfNeeded, as it will be done when executing the EvalTo subexpression | 2016-01-24 | |
| | |||
* | Added missing EIGEN_DEVICE_FUNC qualifier | 2016-01-24 | |
| | |||
* | Merged in ville-k/eigen/tensorflow_fix (pull request PR-153): Add ctor for long | 2016-01-22 | |
|\ | |||
| * | Re-add executable flags to minimize changeset. | 2016-01-22 | |
| | | |||
* | | Leverage the new blocking code in the tensor contraction code. | 2016-01-22 | |
| | | |||
* | | Created a mechanism to enable contraction mappers to determine the best blocking strategy. | 2016-01-22 | |
| | | |||
* | | Backout changeset 690bc950f70c61075d396671e63480bbd64bb297 | 2016-01-22 | |
| | | |||
| * | Update to latest default branch | 2016-01-21 | |
| |\ | |/ |/| | |||
* | | Fixed a constness bug | 2016-01-21 | |
| | | |||
* | | fix clang warnings: "braces around scalar initializer" | 2016-01-20 | |
| | | |||
* | | Small cleanup and small fix to the contraction of row major tensors | 2016-01-20 | |
| | | |||
* | | Reduce the register pressure exerted by the tensor mappers whenever possible. This improves the performance of the contraction of a matrix with a vector by about 35%. | 2016-01-20 | |
| | | |||
| * | Remove executable bit from header files | 2016-01-19 | |
| | | |||
| * | Explicitly use 32-bit integer types in constructors. | 2016-01-19 | |
| | | |||
* | | Improved the formatting of the code | 2016-01-19 | |
| | | |||
* | | Moved the contraction mapping code to its own file to make the code more manageable. | 2016-01-19 | |
| | | |||
* | | Improved code indentation | 2016-01-19 | |
| | | |||
* | | Record whether the underlying tensor storage can be accessed directly during the evaluation of an expression. | 2016-01-19 | |
| | | |||
| * | Add ctor for long | 2016-01-17 | |
| | | |||
* | | Fixed a race condition that could affect some reductions on CUDA devices. | 2016-01-15 | |
| | | |||
* | | Made it possible to compare tensor dimensions inside a CUDA kernel. | 2016-01-15 | |
| | | |||
* | | Use warp shuffles instead of shared memory access to speed up the inner reduction kernel. | 2016-01-14 | |
| | | |||
* | | Fixed a boundary condition bug in the outer reduction kernel | 2016-01-14 | |
| | | |||
* | | Properly record the rank of reduced tensors in the tensor traits. | 2016-01-13 | |
| | | |||
* | | Trigger the optimized matrix vector path more conservatively. | 2016-01-12 | |
| | | |||
* | | Improved the performance of the contraction of a 2d tensor with a 1d tensor by a factor of 3 or more. This helps speed up LSTM neural networks. | 2016-01-12 | |
| | | |||
* | | Reverted a previous change that tripped nvcc when compiling in debug mode. | 2016-01-11 | |
| | | |||
* | | Silenced a few compilation warnings. | 2016-01-11 | |
| | | |||
* | | Updated the tensor traits: the alignment is not part of the Flags enum anymore | 2016-01-11 | |
| | | |||
* | | Enabled the use of fixed dimensions from within a CUDA kernel. | 2016-01-11 | |
| | | |||
* | | Deleted unused variable. | 2016-01-11 | |
| | | |||
* | | Silenced an nvcc compilation warning | 2016-01-11 | |
| | | |||
* | | Silenced several compilation warnings triggered by nvcc. | 2016-01-11 | |
| | | |||
* | | Merged in jeremy_barnes/eigen/shader-model-3.0 (pull request PR-152): Alternative way of forcing instantiation of device kernels without causing warnings or requiring device to device kernel invocations. | 2016-01-11 | |
|\ \ | |||
* | | | Fixed a bug in the dispatch of optimized reduction kernels. | 2016-01-11 | |
| | | | |||
* | | | Re-enabled the optimized reduction CUDA code. | 2016-01-11 | |
| | | | |||
| * | | Cleaned up double-defined macro from last commit | 2016-01-10 | |
| | | | |||
| * | | Alternative way of forcing instantiation of device kernels without causing warnings or requiring device to device kernel invocations. This allows TensorFlow to work on SM 3.0 (i.e., Amazon EC2) machines. | 2016-01-10 | |
|/ / | |||
* | | Simplified the dispatch code. | 2016-01-08 | |
| | | |||
* | | Made it possible to use array of size 0 on CUDA devices | 2016-01-08 | |
| | | |||
* | | Reworked the dispatch of optimized CUDA reduction kernels to work around an nvcc bug that prevented the code from compiling in optimized mode in some cases | 2016-01-08 | |
| | | |||
* | | Prevent nvcc from miscompiling the CUDA metakernel. Unfortunately this reintroduces some compilation warnings, but it's much better than having to deal with random assertion failures. | 2016-01-08 | |
| | | |||
* | | Removed a couple of partial specializations that confuse nvcc and result in errors such as this: error: more than one partial specialization matches the template argument list of class "Eigen::internal::get<3, Eigen::internal::numeric_list<std::size_t, 1UL, 1UL, 1UL, 1UL>>" "Eigen::internal::get<n, Eigen::internal::numeric_list<T, a, as...>>" "Eigen::internal::get<n, Eigen::internal::numeric_list<T, as...>>" | 2016-01-07 | |
| | | |||
* | | Fixed a typo. | 2016-01-06 | |
| | | |||
* | | Optimized the performance of broadcasting of scalars. | 2016-01-06 | |
| | | |||
* | | Improved the performance of reductions on CUDA devices | 2016-01-04 | |
| | | |||
* | | Added a 'divup' util to compute the ceiling of the quotient of two integers | 2016-01-04 | |
| | | |||
* | | Fix numerous doxygen shortcomings, and work around some clang -Wdocumentation warnings | 2016-01-01 | |
|/ | |||
* | Add missing ctor from uint | 2015-12-30 | |
| |
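
The 'divup' util named in the 2016-01-04 entry is not shown in this log. As a rough illustration only, a round-up integer division could look like the minimal sketch below. The name `divup_sketch`, its signature, and the restriction to a non-negative numerator and a positive divisor are assumptions made here; this is not Eigen's actual code.

```cpp
#include <cassert>

// Hypothetical helper (not Eigen's API): returns the smallest integer q
// such that q * y >= x, assuming x >= 0 and y > 0.
template <typename T>
T divup_sketch(T x, T y) {
  assert(x >= 0 && y > 0);
  // Truncated quotient, plus one when the division leaves a remainder.
  // This form avoids the overflow that (x + y - 1) / y can hit when x is
  // close to the maximum value of T.
  return x / y + (x % y != 0 ? 1 : 0);
}
```

A typical use is sizing a kernel launch: covering 1000 elements with blocks of 256 threads takes `divup_sketch(1000, 256)` blocks, which is 4.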