Commit message | Author | Age | |
---|---|---|---|
* | Fix help output of buildtests and check scripts | Christoph Hertzberg | 2016-05-11 |
| | |||
* | bug #1207: Add and fix logical-op warnings | Christoph Hertzberg | 2016-05-11 |
| | |||
* | bug #1213: Give names to anonymous enums | Christoph Hertzberg | 2016-05-06 |
| | |||
* | Fixed a typo in my previous commit | Benoit Steiner | 2016-05-11 |
| | |||
* | Fix potential race condition in the CUDA reduction code. | Benoit Steiner | 2016-05-11 |
| | |||
* | Added a few tests to validate the generation of random tensors on GPU. | Benoit Steiner | 2016-05-11 |
| | |||
* | Explicitly initialize all the atomic variables. | Benoit Steiner | 2016-05-11 |
| | |||
* | Workaround maybe-uninitialized warning | Christoph Hertzberg | 2016-05-11 |
| | |||
* | Workaround "misleading-indentation" warnings | Christoph Hertzberg | 2016-05-11 |
| | |||
* | Properly gate the use of half2. | Benoit Steiner | 2016-05-10 |
| | |||
* | Extended the tests for ptanh | Benoit Steiner | 2016-05-10 |
| | |||
* | Added support for fp16 to the sigmoid functor. | Benoit Steiner | 2016-05-10 |
| | |||
* | Small improvement to the full reduction of fp16 | Benoit Steiner | 2016-05-10 |
| | |||
* | Added packet primitives to compute exp, log, sqrt and rsqrt on fp16. This improves the performance by 10 to 30%. | Benoit Steiner | 2016-05-10 |
| | |||
* | Added a test to validate the new non blocking thread pool | Benoit Steiner | 2016-05-10 |
| | |||
* | Simplified the reduction code a little. | Benoit Steiner | 2016-05-10 |
| | |||
* | Fixed compilation warning | Benoit Steiner | 2016-05-09 |
| | |||
* | Improved the performance of full reductions on GPU: | Benoit Steiner | 2016-05-09 |
| | Before:
| |   BM_fullReduction/10    200000   11751        8.51 MFlops/s
| |   BM_fullReduction/80    5000     523385       12.23 MFlops/s
| |   BM_fullReduction/640   50       36179326     11.32 MFlops/s
| |   BM_fullReduction/4K    1        2173517195   11.50 MFlops/s
| | After:
| |   BM_fullReduction/10    500000   5987         16.70 MFlops/s
| |   BM_fullReduction/80    200000   10636        601.73 MFlops/s
| |   BM_fullReduction/640   50000    58428        7010.31 MFlops/s
| |   BM_fullReduction/4K    1000     2006106      12461.95 MFlops/s
* | Added the ability to use a scratch buffer in cuda kernels | Benoit Steiner | 2016-05-09 |
| | |||
* | Added a new parallelFor api to the thread pool device. | Benoit Steiner | 2016-05-09 |
| | |||
* | Optimized the non blocking thread pool: | Benoit Steiner | 2016-05-09 |
| | * Use a pseudo-random permutation of queue indices during random stealing. This ensures that all the queues are considered.
| | * Directly pop from a non-empty queue when we are waiting for work, instead of first noticing that there is a non-empty queue and then doing another round of random stealing to re-discover the non-empty queue.
| | * Steal only 1 task from a remote queue instead of half of the tasks.
* | Pulled latest updates from trunk | Benoit Steiner | 2016-05-07 |
|\ | |||
* | | Worked around a bug in nvcc on tegra x1 | Benoit Steiner | 2016-05-07 |
| | | |||
* | | Merged latest updates from trunk | Benoit Steiner | 2016-05-06 |
|\ \ | |||
* | | | Added support for packet processing of fp16 on kepler and maxwell gpus | Benoit Steiner | 2016-05-06 |
| | | | |||
| * | | Avoid double promotion | Benoit Steiner | 2016-05-06 |
| | | | |||
* | | | Marked a few tensor operations as read only | Benoit Steiner | 2016-05-05 |
|/ / | |||
* | | Added a test to validate full reduction on tensor of half floats | Benoit Steiner | 2016-05-05 |
| | | |||
* | | Made the testing of contractions on fp16 more robust | Benoit Steiner | 2016-05-05 |
| | | |||
* | | Refined the testing of log and exp on fp16 | Benoit Steiner | 2016-05-05 |
| | | |||
* | | Further improved the testing of fp16 | Benoit Steiner | 2016-05-05 |
| | | |||
* | | Relaxed the dummy precision for fp16 | Benoit Steiner | 2016-05-05 |
| | | |||
* | | Relaxed an assertion that was tighter than necessary. | Benoit Steiner | 2016-05-05 |
| | | |||
* | | Added a benchmark to measure the performance of full reductions of 16 bit floats | Benoit Steiner | 2016-05-05 |
| | | |||
* | | Fixed some incorrect assertions | Benoit Steiner | 2016-05-05 |
| | | |||
* | | Avoid unnecessary type promotion | Benoit Steiner | 2016-05-05 |
| | | |||
* | | Strongly hint, but don't force, the compiler to unroll some loops in the tensor executor. This results in up to 27% faster code. | Benoit Steiner | 2016-05-05 |
| | |
* | | Avoided unnecessary type promotion | Benoit Steiner | 2016-05-05 |
| | | |||
* | | Added tests for full contractions using thread pools and gpu devices. | Benoit Steiner | 2016-05-05 |
| | | Fixed a couple of issues in the corresponding code.
* | | Updated the contraction code to ensure that full contractions return a tensor of rank 0 | Benoit Steiner | 2016-05-05 |
| | |
* | | Fixed some signed/unsigned comparison warnings | Christoph Hertzberg | 2016-05-05 |
| | | |||
* | | Enable and fix -Wdouble-conversion warnings | Christoph Hertzberg | 2016-05-05 |
| | | |||
* | | Reduced the memory footprint of the cxx11_tensor_image_patch test | Benoit Steiner | 2016-05-04 |
| | | |||
* | | Removed extraneous 'explicit' keywords | Benoit Steiner | 2016-05-04 |
| | | |||
* | | fix double-promotion/float-conversion in Core/SpecialFunctions.h | Ola Røer Thorsen | 2016-05-04 |
| | | |||
* | | Improve documentation of BDCSVD | Gael Guennebaud | 2016-05-04 |
| | | |||
* | | Use numext::isfinite instead of std::isfinite | Benoit Steiner | 2016-05-03 |
| | | |||
* | | bug #1214: consider denormals as zero in D&C SVD. This also works around an infinite binary search when compiling with ICC's unsafe optimizations. | Gael Guennebaud | 2016-05-03 |
| | |
* | | Added a test to validate the computation of exp and log on 16bit floats | Benoit Steiner | 2016-05-03 |
| | | |||
* | | Fixed compilation error with cuda >= 7.5 | Benoit Steiner | 2016-05-03 |
| | |