Commit message | Author | Age | |
---|---|---|---|
* | Added the ability to use a scratch buffer in cuda kernels | 2016-05-09 | |
| | |||
* | Added a new parallelFor api to the thread pool device. | 2016-05-09 | |
| | |||
* | Optimized the non blocking thread pool: (1) Use a pseudo-random permutation of queue indices during random stealing. This ensures that all the queues are considered. (2) Directly pop from a non-empty queue when we are waiting for work, instead of first noticing that there is a non-empty queue and then doing another round of random stealing to re-discover the non-empty queue. (3) Steal only 1 task from a remote queue instead of half of the tasks. | 2016-05-09 | |
| | |||
* | Pulled latest updates from trunk | 2016-05-07 | |
|\ | |||
* | | Worked around a bug in nvcc on tegra x1 | 2016-05-07 | |
| | | |||
* | | Merged latest updates from trunk | 2016-05-06 | |
|\ \ | |||
* | | | Added support for packet processing of fp16 on kepler and maxwell gpus | 2016-05-06 | |
| | | | |||
| * | | Avoid double promotion | 2016-05-06 | |
| | | | |||
* | | | Marked a few tensor operations as read only | 2016-05-05 | |
|/ / | |||
* | | Added a test to validate full reduction on tensor of half floats | 2016-05-05 | |
| | | |||
* | | Made the testing of contractions on fp16 more robust | 2016-05-05 | |
| | | |||
* | | Refined the testing of log and exp on fp16 | 2016-05-05 | |
| | | |||
* | | Further improved the testing of fp16 | 2016-05-05 | |
| | | |||
* | | Relaxed the dummy precision for fp16 | 2016-05-05 | |
| | | |||
* | | Relaxed an assertion that was tighter than necessary. | 2016-05-05 | |
| | | |||
* | | Added a benchmark to measure the performance of full reductions of 16 bit floats | 2016-05-05 | |
| | | |||
* | | Fixed some incorrect assertions | 2016-05-05 | |
| | | |||
* | | Avoid unnecessary type promotion | 2016-05-05 | |
| | | |||
* | | Strongly hint but don't force the compiler to unroll some loops in the tensor executor. This results in up to 27% faster code. | 2016-05-05 | |
| | | |||
* | | Avoided unnecessary type promotion | 2016-05-05 | |
| | | |||
* | | Added tests for full contractions using thread pools and gpu devices. Fixed a couple of issues in the corresponding code. | 2016-05-05 | |
| | | |||
* | | Updated the contraction code to ensure that full contractions return a tensor of rank 0 | 2016-05-05 | |
| | | |||
* | | Fixed some signed/unsigned comparison warnings | 2016-05-05 | |
| | | |||
* | | Enable and fix -Wdouble-conversion warnings | 2016-05-05 | |
| | | |||
* | | Reduced the memory footprint of the cxx11_tensor_image_patch test | 2016-05-04 | |
| | | |||
* | | Removed extraneous 'explicit' keywords | 2016-05-04 | |
| | | |||
* | | fix double-promotion/float-conversion in Core/SpecialFunctions.h | 2016-05-04 | |
| | | |||
* | | Improve documentation of BDCSVD | 2016-05-04 | |
| | | |||
* | | Use numext::isfinite instead of std::isfinite | 2016-05-03 | |
| | | |||
* | | bug #1214: consider denormals as zero in D&C SVD. This also works around an infinite binary search when compiling with ICC's unsafe optimizations. | 2016-05-03 | |
| | | |||
* | | Added a test to validate the computation of exp and log on 16bit floats | 2016-05-03 | |
| | | |||
* | | Fixed compilation error with cuda >= 7.5 | 2016-05-03 | |
| | | |||
* | | Deleted superfluous explicit keyword. | 2016-05-03 | |
| | | |||
* | | Made a cast explicit | 2016-05-02 | |
| | | |||
* | | Pulled latest updates from trunk | 2016-05-01 | |
|\ \ | |||
* | | | Fixed compilation error | 2016-05-01 | |
| | | | |||
| * | | Fix performance regression: with AVX, unaligned stores were emitted instead of aligned ones for fixed size assignment. | 2016-05-01 | |
|/ / | |||
* | | Added missing accessors to fixed sized tensors | 2016-04-29 | |
| | | |||
* | | Deleted trailing commas | 2016-04-29 | |
| | | |||
* | | Deleted useless trailing commas | 2016-04-29 | |
| | | |||
* | | Deleted unnecessary trailing commas. | 2016-04-29 | |
| | | |||
* | | Fixed compilation errors generated by clang | 2016-04-29 | |
| | | |||
* | | Added a few tests to ensure that the dimensions of rank 0 tensors are correctly computed | 2016-04-29 | |
| | | |||
* | | Return the proper size (i.e. 1) for tensors of rank 0 | 2016-04-29 | |
| | | |||
* | | Made several tensor tests compatible with cxx03 | 2016-04-29 | |
| | | |||
* | | Moved a number of tensor tests that don't require cxx11 to work properly outside the EIGEN_TEST_CXX11 test section | 2016-04-29 | |
| | | |||
* | | Fixed the cxx11_tensor_empty test to compile without requiring cxx11 support | 2016-04-29 | |
| | | |||
* | | Deleted unused default values for template parameters | 2016-04-29 | |
| | | |||
* | | Made a couple of tensor tests compile without requiring c++11 support. | 2016-04-29 | |
| | | |||
* | | Made the cxx11_tensor_forced_eval compile without c++11. | 2016-04-29 | |
| | |
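
The 2016-05-09 thread pool commit listed above describes random stealing that walks the queues in a pseudo-random permutation and takes a single task from the first non-empty one. The following is a minimal sketch of that idea, not the code from the commit. It assumes one common way to build such a permutation: stepping through the indices with a stride that is coprime with the number of queues, which visits every index exactly once. The names `Queue`, `coprime_strides`, and `steal_one` are hypothetical, tasks are plain integers, and locking is omitted entirely.

```cpp
#include <cstddef>
#include <deque>
#include <numeric>
#include <optional>
#include <vector>

// Hypothetical stand-in for a per-thread work queue; a task is just an int id.
using Queue = std::deque<int>;

// Collect every stride in [1, n] that is coprime with n. Stepping through
// the indices 0..n-1 with such a stride (mod n) visits each index exactly once.
std::vector<std::size_t> coprime_strides(std::size_t n) {
  std::vector<std::size_t> strides;
  for (std::size_t s = 1; s <= n; ++s) {
    if (std::gcd(s, n) == 1) strides.push_back(s);
  }
  return strides;
}

// Visit all queues in a permuted order selected by `rnd` and steal a single
// task from the first non-empty queue. Returns std::nullopt if every queue
// is empty. Assumption: single-threaded illustration, no synchronization.
std::optional<int> steal_one(std::vector<Queue>& queues, std::size_t rnd) {
  const std::size_t n = queues.size();
  if (n == 0) return std::nullopt;
  const std::vector<std::size_t> strides = coprime_strides(n);
  const std::size_t stride = strides[rnd % strides.size()];
  std::size_t victim = rnd % n;  // pseudo-random starting queue
  for (std::size_t i = 0; i < n; ++i) {
    if (!queues[victim].empty()) {
      // Take exactly one task rather than half of the victim's queue.
      const int task = queues[victim].back();
      queues[victim].pop_back();
      return task;
    }
    victim = (victim + stride) % n;  // next index in the permutation
  }
  return std::nullopt;
}
```

Because the stride is coprime with the queue count, the loop is guaranteed to reach a non-empty queue if one exists, whichever value of `rnd` is supplied. A purely random sequence of probes would not give that guarantee within a fixed number of attempts. This sketch requires C++17 for `std::gcd` and `std::optional`.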