path: root/unsupported/Eigen
Commit message (Author, Date)
* Address comments by bsteiner. (Rasmus Munk Larsen, 2016-05-12)
* Improvements to parallelFor. Move some scalar functors from TensorFunctors. to Eigen core. (Rasmus Munk Larsen, 2016-05-12)
* Worked around a compilation error triggered by nvcc when compiling a tensor concatenation kernel. (Benoit Steiner, 2016-05-12)
* Fixed a potential race condition in the non-blocking thread pool. (Benoit Steiner, 2016-05-12)
* Replaced an implicit cast with an explicit one. (Benoit Steiner, 2016-05-12)
* Worked around compilation errors with older versions of gcc. (Benoit Steiner, 2016-05-11)
* Improved the portability of the tensor code. (Benoit Steiner, 2016-05-11)
* Added the ability to load fp16 using the texture path. Improved the performance of some reductions on fp16. (Benoit Steiner, 2016-05-11)
* Removed a deprecated flag (which apparently was ignored anyway). (Christoph Hertzberg, 2016-05-11)
* Fixed some double-promotion and sign-compare warnings. (Christoph Hertzberg, 2016-05-11)
* Fixed a typo in my previous commit. (Benoit Steiner, 2016-05-11)
* Fixed a potential race condition in the CUDA reduction code. (Benoit Steiner, 2016-05-11)
* Explicitly initialize all the atomic variables. (Benoit Steiner, 2016-05-11)
* Properly gate the use of half2. (Benoit Steiner, 2016-05-10)
* Added support for fp16 to the sigmoid functor. (Benoit Steiner, 2016-05-10)
* Small improvement to the full reduction of fp16. (Benoit Steiner, 2016-05-10)
* Simplified the reduction code a little. (Benoit Steiner, 2016-05-10)
* Improved the performance of full reductions on GPU. (Benoit Steiner, 2016-05-09)
    Before:
      BM_fullReduction/10     200000         11751        8.51 MFlops/s
      BM_fullReduction/80       5000        523385       12.23 MFlops/s
      BM_fullReduction/640        50      36179326       11.32 MFlops/s
      BM_fullReduction/4K          1    2173517195       11.50 MFlops/s
    After:
      BM_fullReduction/10     500000          5987       16.70 MFlops/s
      BM_fullReduction/80     200000         10636      601.73 MFlops/s
      BM_fullReduction/640     50000         58428     7010.31 MFlops/s
      BM_fullReduction/4K       1000       2006106    12461.95 MFlops/s
* Added the ability to use a scratch buffer in CUDA kernels. (Benoit Steiner, 2016-05-09)
* Added a new parallelFor API to the thread pool device. (Benoit Steiner, 2016-05-09)
* Optimized the non-blocking thread pool: (Benoit Steiner, 2016-05-09)
    * Use a pseudo-random permutation of queue indices during random stealing. This ensures that all the queues are considered.
    * Directly pop from a non-empty queue when we are waiting for work, instead of first noticing that there is a non-empty queue and then doing another round of random stealing to re-discover the non-empty queue.
    * Steal only 1 task from a remote queue instead of half of tasks.
* Marked a few tensor operations as read-only. (Benoit Steiner, 2016-05-05)
* Relaxed an assertion that was tighter than necessary. (Benoit Steiner, 2016-05-05)
* Fixed some incorrect assertions. (Benoit Steiner, 2016-05-05)
* Strongly hint, but don't force, the compiler to unroll some loops in the tensor executor. This results in up to 27% faster code. (Benoit Steiner, 2016-05-05)
* Added tests for full contractions using thread pools and GPU devices. Fixed a couple of issues in the corresponding code. (Benoit Steiner, 2016-05-05)
* Updated the contraction code to ensure that full contractions return a tensor of rank 0. (Benoit Steiner, 2016-05-05)
* Enable and fix -Wdouble-conversion warnings. (Christoph Hertzberg, 2016-05-05)
* Removed extraneous 'explicit' keywords. (Benoit Steiner, 2016-05-04)
* Use numext::isfinite instead of std::isfinite. (Benoit Steiner, 2016-05-03)
* Deleted a superfluous 'explicit' keyword. (Benoit Steiner, 2016-05-03)
* Fixed a compilation error. (Benoit Steiner, 2016-05-01)
* Added missing accessors to fixed-size tensors. (Benoit Steiner, 2016-04-29)
* Deleted trailing commas. (Benoit Steiner, 2016-04-29)
* Deleted useless trailing commas. (Benoit Steiner, 2016-04-29)
* Deleted unnecessary trailing commas. (Benoit Steiner, 2016-04-29)
* Return the proper size (i.e. 1) for tensors of rank 0. (Benoit Steiner, 2016-04-29)
* Deleted unused default values for template parameters. (Benoit Steiner, 2016-04-29)
* Restored Tensor support for non-C++11 compilers. (Benoit Steiner, 2016-04-29)
* Fixed an include path. (Benoit Steiner, 2016-04-29)
* Fixed a missing inclusion of Eigen/Core. (Gael Guennebaud, 2016-04-27)
* Use computeProductBlockingSizes to compute blocking for both the ShardByCol and ShardByRow cases. (Rasmus Munk Larsen, 2016-04-27)
* Refactored the unsupported CXX11/Core module to internal headers only. (Gael Guennebaud, 2016-04-26)
* Fixed the partial evaluation of non-vectorizable tensor subexpressions. (Benoit Steiner, 2016-04-25)
* Refined the cost of the striding operation. (Benoit Steiner, 2016-04-25)
* Provide access to the base thread pool classes. (Benoit Steiner, 2016-04-21)
* Added the ability to switch to the new thread pool with a #define. (Benoit Steiner, 2016-04-21)
* Fixed several compilation warnings. (Benoit Steiner, 2016-04-21)
* Don't crash when attempting to reduce empty tensors. (Benoit Steiner, 2016-04-20)
* Started to implement a portable way to yield. (Benoit Steiner, 2016-04-19)