Commit message (Author, Age)
* Fix help output of buildtests and check scripts (Christoph Hertzberg, 2016-05-11)
|
* bug #1207: Add and fix logical-op warnings (Christoph Hertzberg, 2016-05-11)
|
* bug #1213: Give names to anonymous enums (Christoph Hertzberg, 2016-05-06)
|
* Fixed a typo in my previous commit (Benoit Steiner, 2016-05-11)
|
* Fix potential race condition in the CUDA reduction code. (Benoit Steiner, 2016-05-11)
|
* Added a few tests to validate the generation of random tensors on GPU. (Benoit Steiner, 2016-05-11)
|
* Explicitly initialize all the atomic variables. (Benoit Steiner, 2016-05-11)
|
* Workaround maybe-uninitialized warning (Christoph Hertzberg, 2016-05-11)
|
* Workaround "misleading-indentation" warnings (Christoph Hertzberg, 2016-05-11)
|
* Properly gate the use of half2. (Benoit Steiner, 2016-05-10)
|
* Extended the tests for ptanh (Benoit Steiner, 2016-05-10)
|
* Added support for fp16 to the sigmoid functor. (Benoit Steiner, 2016-05-10)
|
* Small improvement to the full reduction of fp16 (Benoit Steiner, 2016-05-10)
|
* Added packet primitives to compute exp, log, sqrt and rsqrt on fp16. This improves the performance by 10 to 30%. (Benoit Steiner, 2016-05-10)
|
* Added a test to validate the new non blocking thread pool (Benoit Steiner, 2016-05-10)
|
* Simplified the reduction code a little. (Benoit Steiner, 2016-05-10)
|
* Fixed compilation warning (Benoit Steiner, 2016-05-09)
|
* Improved the performance of full reductions on GPU: (Benoit Steiner, 2016-05-09)
|     Before:
|       BM_fullReduction/10       200000         11751      8.51 MFlops/s
|       BM_fullReduction/80         5000        523385     12.23 MFlops/s
|       BM_fullReduction/640          50      36179326     11.32 MFlops/s
|       BM_fullReduction/4K            1    2173517195     11.50 MFlops/s
|     After:
|       BM_fullReduction/10       500000          5987     16.70 MFlops/s
|       BM_fullReduction/80       200000         10636    601.73 MFlops/s
|       BM_fullReduction/640       50000         58428   7010.31 MFlops/s
|       BM_fullReduction/4K         1000       2006106  12461.95 MFlops/s
|
* Added the ability to use a scratch buffer in cuda kernels (Benoit Steiner, 2016-05-09)
|
* Added a new parallelFor api to the thread pool device. (Benoit Steiner, 2016-05-09)
|
* Optimized the non blocking thread pool: (Benoit Steiner, 2016-05-09)
|   * Use a pseudo-random permutation of queue indices during random stealing. This ensures that all the queues are considered.
|   * Directly pop from a non-empty queue when we are waiting for work, instead of first noticing that there is a non-empty queue and then doing another round of random stealing to re-discover the non-empty queue.
|   * Steal only 1 task from a remote queue instead of half of tasks.
|
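The first point of the thread pool commit above (visiting queues in a pseudo-random permutation so that none is skipped) can be illustrated with a small sketch. This is not the code from the commit: `StealOrder`, `Gcd` and the way the seed is split into a start index and a stride are assumptions made for illustration. The idea shown is that stepping through `n` indices with a stride coprime to `n` touches every index exactly once.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: greatest common divisor (Euclid's algorithm).
inline std::size_t Gcd(std::size_t a, std::size_t b) {
  while (b != 0) {
    const std::size_t t = a % b;
    a = b;
    b = t;
  }
  return a;
}

// Hypothetical helper: returns the order in which a thief would visit the
// queues. Starting from a seed-dependent index, it steps by a seed-dependent
// stride that is coprime to num_queues, so each queue appears exactly once.
std::vector<std::size_t> StealOrder(std::size_t num_queues, std::size_t seed) {
  assert(num_queues > 0);

  // Collect every stride in [1, num_queues] that is coprime to num_queues.
  // The stride 1 always qualifies, so the list is never empty.
  std::vector<std::size_t> coprimes;
  for (std::size_t s = 1; s <= num_queues; ++s) {
    if (Gcd(s, num_queues) == 1) coprimes.push_back(s);
  }

  const std::size_t start = seed % num_queues;
  const std::size_t stride = coprimes[(seed / num_queues) % coprimes.size()];

  std::vector<std::size_t> order;
  order.reserve(num_queues);
  std::size_t victim = start;
  for (std::size_t i = 0; i < num_queues; ++i) {
    order.push_back(victim);
    victim = (victim + stride) % num_queues;
  }
  return order;
}
```

A worker would draw a fresh seed for each stealing round and try the queues in the returned order, stopping at the first non-empty one. Plain independent random picks, by contrast, can miss the only non-empty queue several rounds in a row.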
* Pulled latest updates from trunk (Benoit Steiner, 2016-05-07)
|\
* | Worked around a bug in nvcc on tegra x1 (Benoit Steiner, 2016-05-07)
| |
* | Merged latest updates from trunk (Benoit Steiner, 2016-05-06)
|\ \
* | | Added support for packet processing of fp16 on kepler and maxwell gpus (Benoit Steiner, 2016-05-06)
| | |
| * | Avoid double promotion (Benoit Steiner, 2016-05-06)
| | |
* | | Marked a few tensor operations as read only (Benoit Steiner, 2016-05-05)
|/ /
* | Added a test to validate full reduction on tensor of half floats (Benoit Steiner, 2016-05-05)
| |
* | Made the testing of contractions on fp16 more robust (Benoit Steiner, 2016-05-05)
| |
* | Refined the testing of log and exp on fp16 (Benoit Steiner, 2016-05-05)
| |
* | Further improved the testing of fp16 (Benoit Steiner, 2016-05-05)
| |
* | Relaxed the dummy precision for fp16 (Benoit Steiner, 2016-05-05)
| |
* | Relaxed an assertion that was tighter than necessary. (Benoit Steiner, 2016-05-05)
| |
* | Added a benchmark to measure the performance of full reductions of 16 bit floats (Benoit Steiner, 2016-05-05)
| |
* | Fixed some incorrect assertions (Benoit Steiner, 2016-05-05)
| |
* | Avoid unnecessary type promotion (Benoit Steiner, 2016-05-05)
| |
* | Strongly hint but don't force the compiler to unroll some loops in the tensor executor. This results in up to 27% faster code. (Benoit Steiner, 2016-05-05)
| |
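One common way to hint at unrolling without forcing it is to wrap the loop body in an inner loop with a small compile-time trip count and handle the leftover elements separately. The sketch below shows that pattern under stated assumptions: `EvalCoeff`, `Evaluate` and the unroll factor of 4 are hypothetical and not taken from the commit.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for evaluating one coefficient of an expression.
inline float EvalCoeff(const float* in, std::size_t i) {
  return in[i] * 2.0f + 1.0f;
}

// Evaluates `size` coefficients. The inner loop has a constant trip count,
// which makes it an easy unrolling candidate, but no pragma or attribute
// forces the compiler to unroll it if EvalCoeff turns out to be expensive.
void Evaluate(const float* in, float* out, std::size_t size) {
  assert(size == 0 || (in != nullptr && out != nullptr));

  const std::size_t kUnroll = 4;  // assumed unroll factor
  const std::size_t unrolled_size = (size / kUnroll) * kUnroll;

  std::size_t i = 0;
  for (; i < unrolled_size; i += kUnroll) {
    for (std::size_t j = 0; j < kUnroll; ++j) {
      out[i + j] = EvalCoeff(in, i + j);
    }
  }
  // Remainder loop for the last size % kUnroll coefficients.
  for (; i < size; ++i) {
    out[i] = EvalCoeff(in, i);
  }
}
```

Leaving the decision to the compiler means an expensive body can still be kept as a plain loop, whereas a forced unroll would always multiply the code size by the unroll factor.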
* | Avoided unnecessary type promotion (Benoit Steiner, 2016-05-05)
| |
* | Added tests for full contractions using thread pools and gpu devices. Fixed a couple of issues in the corresponding code. (Benoit Steiner, 2016-05-05)
| |
* | Updated the contraction code to ensure that full contractions return a tensor of rank 0 (Benoit Steiner, 2016-05-05)
| |
* | Fixed some signed/unsigned comparison warnings (Christoph Hertzberg, 2016-05-05)
| |
* | Enable and fix -Wdouble-conversion warnings (Christoph Hertzberg, 2016-05-05)
| |
* | Reduced the memory footprint of the cxx11_tensor_image_patch test (Benoit Steiner, 2016-05-04)
| |
* | Removed extraneous 'explicit' keywords (Benoit Steiner, 2016-05-04)
| |
* | fix double-promotion/float-conversion in Core/SpecialFunctions.h (Ola Røer Thorsen, 2016-05-04)
| |
* | Improve documentation of BDCSVD (Gael Guennebaud, 2016-05-04)
| |
* | Use numext::isfinite instead of std::isfinite (Benoit Steiner, 2016-05-03)
| |
* | bug #1214: consider denormals as zero in D&C SVD. This also works around an infinite binary search when compiling with ICC's unsafe optimizations. (Gael Guennebaud, 2016-05-03)
| |
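Treating denormals as zero can be sketched as a comparison against the smallest normalized value of the type. The helper below is an illustration only: `FlushDenormal` and `FlushDenormalExample` are hypothetical names, and the commit may implement the check differently inside the SVD code.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical helper: returns 0 when x is denormal (subnormal), and x
// otherwise. std::numeric_limits<T>::min() is the smallest positive
// *normalized* value, so any non-zero magnitude below it is denormal.
// NaN compares false against everything and is therefore passed through.
template <typename T>
T FlushDenormal(T x) {
  static_assert(std::numeric_limits<T>::is_iec559,
                "FlushDenormal expects an IEEE 754 floating-point type");
  return std::abs(x) < std::numeric_limits<T>::min() ? T(0) : x;
}

// Usage example; the asserts document the intended behaviour.
inline void FlushDenormalExample() {
  const double tiny = std::numeric_limits<double>::denorm_min();
  assert(FlushDenormal(tiny) == 0.0);  // denormal is flushed to zero
  assert(FlushDenormal(1.0) == 1.0);   // normal values are untouched
}
```

Flushing such values before an iterative search keeps the search interval from shrinking into a range where, under aggressive floating-point optimizations, the midpoint may stop making progress.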
* | Added a test to validate the computation of exp and log on 16bit floats (Benoit Steiner, 2016-05-03)
| |
* | Fixed compilation error with cuda >= 7.5 (Benoit Steiner, 2016-05-03)
| |