Commit message | Age
---|---
Optimized the non-blocking thread pool: use a pseudo-random permutation of queue indices during random stealing, which ensures that all queues are considered; directly pop from a non-empty queue when waiting for work, instead of first noticing that there is a non-empty queue and then doing another round of random stealing to rediscover it; steal only one task from a remote queue instead of half of its tasks. | 2016-05-09
Marked a few tensor operations as read-only. | 2016-05-05
Relaxed an assertion that was tighter than necessary. | 2016-05-05
Strongly hint, but don't force, the compiler to unroll some loops in the tensor executor. This results in up to 27% faster code. | 2016-05-05
Added tests for full contractions using thread pools and GPU devices. Fixed a couple of issues in the corresponding code. | 2016-05-05
Updated the contraction code to ensure that full contractions return a tensor of rank 0. | 2016-05-05
Removed extraneous 'explicit' keywords. | 2016-05-04
Use numext::isfinite instead of std::isfinite. | 2016-05-03
Deleted superfluous 'explicit' keyword. | 2016-05-03
Fixed compilation error. | 2016-05-01
Added missing accessors to fixed-size tensors. | 2016-04-29
Deleted trailing commas. | 2016-04-29
Deleted useless trailing commas. | 2016-04-29
Deleted unnecessary trailing commas. | 2016-04-29
Return the proper size (i.e. 1) for tensors of rank 0. | 2016-04-29
Deleted unused default values for template parameters. | 2016-04-29
Restore Tensor support for non-C++11 compilers. | 2016-04-29
Fixed include path. | 2016-04-29
Use computeProductBlockingSizes to compute blocking for both the ShardByCol and ShardByRow cases. | 2016-04-27
Refactor the unsupported CXX11/Core module to internal headers only. | 2016-04-26
Fixed the partial evaluation of non-vectorizable tensor subexpressions. | 2016-04-25
Refined the cost of the striding operation. | 2016-04-25
Provide access to the base thread pool classes. | 2016-04-21
Added the ability to switch to the new thread pool with a #define. | 2016-04-21
Fixed several compilation warnings. | 2016-04-21
Don't crash when attempting to reduce empty tensors. | 2016-04-20
Started to implement a portable way to yield. | 2016-04-19
Implemented a more portable version of thread-local variables. | 2016-04-19
Fixed a compilation error with nvcc 7. | 2016-04-19
Simplified the code that launches CUDA kernels. | 2016-04-19
Don't take the address of a kernel on CUDA devices that don't support this feature. | 2016-04-19
Use numext::ceil instead of std::ceil. | 2016-04-19
Avoid an unnecessary copy of the evaluator. | 2016-04-19
Use DenseIndex in the MeanReducer to avoid overflows when processing very large tensors. | 2016-04-19
Move the evalGemm method into the TensorContractionEvaluatorBase class to make it accessible from both the single- and multi-threaded contraction evaluators. | 2016-04-15
Deleted unnecessary variable. | 2016-04-15
Fixed a few compilation warnings. | 2016-04-15
Merged in rmlarsen/eigen (pull request PR-178): Eigen Tensor cost model part 2, thread scheduling for standard evaluators and reductions. | 2016-04-15
Get rid of void* casting when calling EvalRange::run. | 2016-04-15
Added the ability to access the cache sizes from the tensor devices. | 2016-04-14
Added support for exclusive or. | 2016-04-14
Eigen Tensor cost model part 2: thread scheduling for standard evaluators and reductions. The cost model is turned off by default. | 2016-04-14
Added missing definition of PacketSize in the GPU evaluator of convolution. | 2016-04-14
Merged in rmlarsen/eigen (pull request PR-177): Eigen Tensor cost model part 1. | 2016-04-14
Prepared the migration to the new non-blocking thread pool. | 2016-04-14
Improvements to the cost model. | 2016-04-14
Added a more scalable non-blocking thread pool. | 2016-04-14
Merge upstream updates. | 2016-04-14
Eigen cost model part 1. This implements a basic recursive framework to estimate the cost of evaluating tensor expressions. | 2016-04-14
Silenced a compilation warning. | 2016-04-14