Commit message | Date
---|---
Eigen Tensor cost model part 2: Thread scheduling for standard evaluators and reductions. The cost model is turned off by default. | 2016-04-14
Eigen cost model part 1. This implements a basic recursive framework to estimate the cost of evaluating tensor expressions. | 2016-04-14
Fixed compilation warnings on ARM | 2016-03-28
Avoid unnecessary conversions | 2016-03-23
Fixed compilation warning | 2016-03-23
Use a single Barrier instead of a collection of Notifications to reduce the thread synchronization overhead | 2016-03-22
Avoid implicit cast | 2016-03-09
Avoid unnecessary conversion from 32-bit int to 64-bit unsigned int | 2016-03-09
Replace std::vector with our own implementation, as using the STL when compiling with nvcc and AVX enabled leads to many issues. | 2016-03-08
Simplified the full reduction code | 2016-03-08
Decoupled the packet type definition from the definition of the tensor ops. All the vectorization is now defined in the tensor evaluators. This will make it possible to reliably support devices with different packet types in the same compilation unit. | 2016-03-08
Made the signature of the inner and outer reducers consistent | 2016-02-29
Optimized the performance of narrow reductions on CUDA devices | 2016-02-29
Fixed a typo in the reduction code that could prevent large full reductions from running properly on old CUDA devices. | 2016-02-24
Fixed a number of compilation warnings generated by the CUDA tests | 2016-01-31
Fixed a couple of compilation warnings. | 2016-01-28
Fixed some compilation problems with nvcc + clang | 2016-01-27
Record whether the underlying tensor storage can be accessed directly during the evaluation of an expression. | 2016-01-19
Properly record the rank of reduced tensors in the tensor traits. | 2016-01-13
Merged in jeremy_barnes/eigen/shader-model-3.0 (pull request PR-152). Alternative way of forcing instantiation of device kernels without causing warnings or requiring device-to-device kernel invocations. | 2016-01-11
Fixed a bug in the dispatch of optimized reduction kernels. | 2016-01-11
Re-enabled the optimized reduction CUDA code. | 2016-01-11
Alternative way of forcing instantiation of device kernels without causing warnings or requiring device-to-device kernel invocations. This allows TensorFlow to work on SM 3.0 (i.e., Amazon EC2) machines. (Commit on the jeremy_barnes/eigen/shader-model-3.0 branch, merged in PR-152.) | 2016-01-10
Simplified the dispatch code. | 2016-01-08
Reworked the dispatch of optimized CUDA reduction kernels to work around an nvcc bug that prevented the code from compiling in optimized mode in some cases | 2016-01-08
Improved the performance of reductions on CUDA devices | 2016-01-04
Optimized outer reduction on GPUs. | 2015-12-22
Silenced some compilation warnings triggered by nvcc | 2015-12-17
Simplified more of the IndexList code. | 2015-11-12
Started to make the IndexList code compile with more compilers | 2015-11-12
Fixed CUDA compilation errors | 2015-11-11
Code cleanup | 2015-11-06
Misc fixes to full reductions | 2015-11-05
Updated the reduction code so that full reductions now return a tensor of rank 0. | 2015-11-04
Many files were missing in the previous changeset. | 2015-07-29
Silenced a number of compilation warnings | 2015-06-29
Improved performance of full reduction by 2 orders of magnitude on CPU and 3 orders of magnitude on GPU | 2015-06-29
Worked around some constexpr-related bugs in nvcc 7 | 2015-05-28
Silenced a few compilation warnings generated by nvcc | 2015-02-10
Silenced a few compilation warnings | 2015-02-10
Added the EIGEN_HAS_CONSTEXPR define. Gate the tensor index list code based on the value of EIGEN_HAS_CONSTEXPR | 2015-02-06
Silenced some compilation warnings | 2015-01-30
Improved the performance of tensor reductions that preserve the innermost dimension(s). | 2015-01-27
Improved the performance of tensor reductions. Added the ability to generate random numbers following a normal distribution. Created a test to validate the ability to generate random numbers. | 2015-01-14
Silenced a few compilation warnings. Generalized a TensorMap constructor | 2014-10-16
Added support for tensor reductions and concatenations | 2014-10-01