Commit message | Author | Age | ||
---|---|---|---|---|
... | ||||
* | | | Fixed some compilation problems with nvcc + clang | 2016-01-27 | ||
| | | | ||||
| | * | Add constructor for long types. | 2016-01-26 | ||
| | | | ||||
* | | | Don't explicitly evaluate the subexpression from TensorForcedEval::evalSubExprIfNeeded, as it will be done when executing the EvalTo subexpression | 2016-01-24 | ||
| | | | ||||
* | | | Added missing EIGEN_DEVICE_FUNC qualifier | 2016-01-24 | ||
| | | | ||||
* | | | Merged in ville-k/eigen/tensorflow_fix (pull request PR-153) | 2016-01-22 | ||
|\ \ \ | Add ctor for long | |||
* | | | | Leverage the new blocking code in the tensor contraction code. | 2016-01-22 | ||
| |_|/ |/| | | ||||
* | | | Created a mechanism to enable contraction mappers to determine the best blocking strategy. | 2016-01-22 | ||
| | | | ||||
* | | | Backout changeset 690bc950f70c61075d396671e63480bbd64bb297 | 2016-01-22 | ||
| | | | ||||
| * | | Update to latest default branch | 2016-01-21 | ||
| |\ \ | |/ / |/| | | ||||
* | | | Fixed a constness bug | 2016-01-21 | ||
| | | | ||||
* | | | fix clang warnings | 2016-01-20 | ||
| | | | "braces around scalar initializer" | |||
* | | | Small cleanup and small fix to the contraction of row major tensors | 2016-01-20 | ||
| | | | ||||
* | | | Reduce the register pressure exerted by the tensor mappers whenever possible. This improves the performance of the contraction of a matrix with a vector by about 35%. | 2016-01-20 | ||
| | | | ||||
| * | | Explicitly use 32-bit integer types in constructors. | 2016-01-19 | ||
| | | | ||||
* | | | Improved the formatting of the code | 2016-01-19 | ||
| | | | ||||
* | | | Moved the contraction mapping code to its own file to make the code more manageable. | 2016-01-19 | ||
| | | | ||||
* | | | Improved code indentation | 2016-01-19 | ||
| | | | ||||
* | | | Record whether the underlying tensor storage can be accessed directly during the evaluation of an expression. | 2016-01-19 | ||
| | | | ||||
| * | | Add ctor for long | 2016-01-17 | ||
| | | | ||||
* | | | Fixed a race condition that could affect some reductions on CUDA devices. | 2016-01-15 | ||
| | | | ||||
* | | | Made it possible to compare tensor dimensions inside a CUDA kernel. | 2016-01-15 | ||
| | | | ||||
* | | | Use warp shuffles instead of shared memory access to speed up the inner reduction kernel. | 2016-01-14 | ||
| | | | ||||
* | | | Fixed a boundary condition bug in the outer reduction kernel | 2016-01-14 | ||
| | | | ||||
* | | | Properly record the rank of reduced tensors in the tensor traits. | 2016-01-13 | ||
| | | | ||||
* | | | Trigger the optimized matrix vector path more conservatively. | 2016-01-12 | ||
| | | | ||||
* | | | Improved the performance of the contraction of a 2d tensor with a 1d tensor by a factor of 3 or more. This helps speed up LSTM neural networks. | 2016-01-12 | ||
| | | | ||||
* | | | Reverted a previous change that tripped nvcc when compiling in debug mode. | 2016-01-11 | ||
| | | | ||||
* | | | Silenced a few compilation warnings. | 2016-01-11 | ||
| | | | ||||
* | | | Updated the tensor traits: the alignment is not part of the Flags enum anymore | 2016-01-11 | ||
| | | | ||||
* | | | Enabled the use of fixed dimensions from within a CUDA kernel. | 2016-01-11 | ||
| | | | ||||
* | | | Deleted unused variable. | 2016-01-11 | ||
| | | | ||||
* | | | Silenced an nvcc compilation warning | 2016-01-11 | ||
| | | | ||||
* | | | Silenced several compilation warnings triggered by nvcc. | 2016-01-11 | ||
| | | | ||||
* | | | Merged in jeremy_barnes/eigen/shader-model-3.0 (pull request PR-152) | 2016-01-11 | ||
|\ \ \ | Alternative way of forcing instantiation of device kernels without causing warnings or requiring device to device kernel invocations. | |||
* | | | | Fixed a bug in the dispatch of optimized reduction kernels. | 2016-01-11 | ||
| | | | | ||||
* | | | | Re-enabled the optimized reduction CUDA code. | 2016-01-11 | ||
| | | | | ||||
| * | | | Cleaned up double-defined macro from last commit | 2016-01-10 | ||
| | | | | ||||
| * | | | Alternative way of forcing instantiation of device kernels without causing warnings or requiring device to device kernel invocations. This allows TensorFlow to work on SM 3.0 (i.e., Amazon EC2) machines. | 2016-01-10 | ||
|/ / / | ||||
* | | | Simplified the dispatch code. | 2016-01-08 | ||
| | | | ||||
* | | | Reworked the dispatch of optimized CUDA reduction kernels to work around an nvcc bug that prevented the code from compiling in optimized mode in some cases | 2016-01-08 | ||
| | | | ||||
* | | | Prevent nvcc from miscompiling the CUDA metakernel. Unfortunately this reintroduces some compilation warnings but it's much better than having to deal with random assertion failures. | 2016-01-08 | ||
| | | | ||||
* | | | Fixed a typo. | 2016-01-06 | ||
| | | | ||||
* | | | Optimized the performance of broadcasting of scalars. | 2016-01-06 | ||
| | | | ||||
* | | | Improved the performance of reductions on CUDA devices | 2016-01-04 | ||
| | | | ||||
* | | | Added a 'divup' util to compute the ceiling of the quotient of two integers | 2016-01-04 | ||
|/ / | ||||
* | | Add missing ctor from uint | 2015-12-30 | ||
| | | ||||
| * | Add digamma for CPU + CUDA. Includes tests. | 2015-12-24 | ||
|/ | ||||
* | Don't attempt to vectorize mean reductions of integers since we can't use SSE or AVX instructions to divide 2 integers. | 2015-12-22 | ||
| | ||||
* | Optimized the configuration of the outer reduction CUDA kernel | 2015-12-22 | ||
| | ||||
* | Added missing define | 2015-12-22 | ||
| |