path: root/unsupported/Eigen/CXX11/src/Tensor/TensorExecutor.h
Commit message (Author, Date)
...
* Add an EIGEN_NO_CUDA option, and introduce EIGEN_CUDACC and EIGEN_CUDA_ARCH aliases (Gael Guennebaud, 2017-07-17)
|
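As a rough, hedged sketch (not the commit's actual code), aliases of this kind that respect an EIGEN_NO_CUDA opt-out could be wired up as follows:

    // Hedged sketch: define the aliases only when a CUDA compiler is active
    // and the user has not opted out with EIGEN_NO_CUDA.
    #if defined(__CUDACC__) && !defined(EIGEN_NO_CUDA)
      #define EIGEN_CUDACC __CUDACC__
    #endif
    #if defined(__CUDA_ARCH__) && !defined(EIGEN_NO_CUDA)
      #define EIGEN_CUDA_ARCH __CUDA_ARCH__
    #endif

    // Client code then tests the alias instead of the raw nvcc macros:
    #if defined(EIGEN_CUDACC)
      // CUDA compilation path
    #endif
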
* Pull the latest updates from trunk (Benoit Steiner, 2016-10-05)
|\
| * Cleanup the cuda executor code. (Benoit Steiner, 2016-10-04)
| |
* | Partial OpenCL support via SYCL compatible with ComputeCpp CE. (Luke Iwanski, 2016-09-19)
|/
* Deleted dead code. (Benoit Steiner, 2016-07-25)
|
* bug #1255: comment out broken and unused line. (Gael Guennebaud, 2016-07-25)
|
* Use a single PacketSize variable (Benoit Steiner, 2016-06-01)
|
* Merge. (Rasmus Munk Larsen, 2016-05-18)
|\
* | Minor cleanups: 1. Get rid of unused variables. 2. Get rid of last uses of EIGEN_USE_COST_MODEL. (Rasmus Munk Larsen, 2016-05-18)
| |
| * Reduce overhead for small tensors and cheap ops by short-circuiting the cost computation and block size calculation in parallelFor. (Rasmus Munk Larsen, 2016-05-17)
|/
* #if defined(EIGEN_USE_NONBLOCKING_THREAD_POOL) is now #if !defined(EIGEN_USE_SIMPLE_THREAD_POOL): the non-blocking thread pool is the default since it's more scalable, and one needs to request the old thread pool explicitly. (Benoit Steiner, 2016-05-17)
|
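A minimal illustration of the guard flip described above (only the macro names come from the commit message; the surrounding comments are assumptions):

    // Before: the non-blocking pool had to be requested explicitly with
    //   #if defined(EIGEN_USE_NONBLOCKING_THREAD_POOL)
    // After: it is the default, and the legacy pool must be asked for.
    #if !defined(EIGEN_USE_SIMPLE_THREAD_POOL)
      // non-blocking (default) thread pool
    #else
      // legacy simple thread pool, requested explicitly
    #endif
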
* Fixed compilation error (Benoit Steiner, 2016-05-17)
|
* Address comments by bsteiner. (Rasmus Munk Larsen, 2016-05-12)
|
* Improvements to parallelFor. Move some scalar functors from TensorFunctors.h to Eigen core. (Rasmus Munk Larsen, 2016-05-12)
|
* Strongly hint but don't force the compiler to unroll some loops in the tensor executor. This results in up to 27% faster code. (Benoit Steiner, 2016-05-05)
|
* Fixed several compilation warnings (Benoit Steiner, 2016-04-21)
|
* Don't crash when attempting to reduce empty tensors. (Benoit Steiner, 2016-04-20)
|
* Simplified the code that launches cuda kernels. (Benoit Steiner, 2016-04-19)
|
* Avoid an unnecessary copy of the evaluator. (Benoit Steiner, 2016-04-19)
|
* Get rid of void* casting when calling EvalRange::run. (Rasmus Munk Larsen, 2016-04-15)
|
* Eigen Tensor cost model part 2: Thread scheduling for standard evaluators and reductions. The cost model is turned off by default. (Rasmus Munk Larsen, 2016-04-14)
|
* Defer the decision to vectorize tensor CUDA code to the meta kernel. This makes it possible to decide to vectorize or not depending on the capability of the target cuda architecture. In particular, this enables us to vectorize the processing of fp16 when running on a device of capability >= 5.3. (Benoit Steiner, 2016-04-12)
|
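A hedged sketch of the idea in the commit above: the device-side meta kernel sees __CUDA_ARCH__ and can therefore pick the vectorized path itself. The kernel name, the Evaluator interface, and the packet width below are illustrative, not the actual Eigen code.

    template <typename Evaluator, typename Index>
    __global__ void EigenMetaKernelSketch(Evaluator eval, Index size) {
      const Index first = blockIdx.x * blockDim.x + threadIdx.x;
      const Index step  = blockDim.x * gridDim.x;
    #if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 530
      // Compute capability >= 5.3 has native fp16 arithmetic, so a packet
      // (vectorized) path can be selected when compiling for such devices.
      const Index PacketSize = 2;  // illustrative packet width (e.g. half2)
      const Index vectorized_size = (size / PacketSize) * PacketSize;
      for (Index i = first * PacketSize; i < vectorized_size;
           i += step * PacketSize)
        eval.evalPacket(i);
      for (Index i = vectorized_size + first; i < size; i += step)
        eval.evalScalar(i);
    #else
      for (Index i = first; i < size; i += step)
        eval.evalScalar(i);
    #endif
    }
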
* Prevent potential overflow. (Benoit Steiner, 2016-03-28)
|
* Avoid unnecessary conversions (Benoit Steiner, 2016-03-23)
|
* Fixed compilation warning (Benoit Steiner, 2016-03-23)
|
* Use a single Barrier instead of a collection of Notifications to reduce the thread synchronization overhead. (Benoit Steiner, 2016-03-22)
|
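A hedged sketch of the synchronization pattern described above: every task notifies one shared counting barrier and the caller waits once, instead of waiting on one Notification per task. This is a simplified stand-in, not Eigen's Barrier class.

    #include <condition_variable>
    #include <mutex>

    class SimpleBarrier {
     public:
      explicit SimpleBarrier(unsigned count) : count_(count) {}
      void Notify() {                 // called once by each finishing task
        std::lock_guard<std::mutex> lock(mu_);
        if (--count_ == 0) cv_.notify_all();
      }
      void Wait() {                   // one wait replaces N notifications
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [this] { return count_ == 0; });
      }
     private:
      std::mutex mu_;
      std::condition_variable cv_;
      unsigned count_;
    };
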
* Replace std::vector with our own implementation, as using the stl when compiling with nvcc and avx enabled leads to many issues. (Benoit Steiner, 2016-03-08)
|
* Fix a couple of typos in the code. (Benoit Steiner, 2016-03-07)
|
* Made it possible to limit the number of blocks that will be used to evaluate a tensor expression on a CUDA device. This makes it possible to set aside streaming multiprocessors for other computations. (Benoit Steiner, 2016-02-01)
|
* Silenced several compilation warnings triggered by nvcc. (Benoit Steiner, 2016-01-11)
|
* Prevent nvcc from miscompiling the cuda metakernel. Unfortunately this reintroduces some compilation warnings but it's much better than having to deal with random assertion failures. (Benoit Steiner, 2016-01-08)
|
* Silenced some compilation warnings triggered by nvcc (Benoit Steiner, 2015-12-17)
|
* Don't create more cuda blocks than necessary (Benoit Steiner, 2015-11-23)
|
* Make it possible for a vectorized tensor expression to be executed in a CUDA kernel. (Benoit Steiner, 2015-11-11)
|
* Fixed CUDA compilation errors (Benoit Steiner, 2015-11-11)
|
* Refined the #ifdef __CUDACC__ guard to ensure that trying to compile gpu code with a non-cuda compiler results in a linking error instead of bogus code. (Benoit Steiner, 2015-10-23)
|
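A hedged sketch of the guard pattern described above: when the translation unit is not compiled by a CUDA compiler, only a declaration of the GPU launcher is visible, so accidentally reaching the GPU path fails at link time instead of compiling bogus host code. The function name is illustrative.

    #if defined(__CUDACC__)
    // Real definition, only seen by the CUDA compiler.
    template <typename Expression>
    void launch_on_gpu(const Expression& expr) {
      // ... configure the grid and launch the kernel ...
    }
    #else
    // Declaration only: a host-only build that instantiates this call
    // fails with an undefined reference at link time.
    template <typename Expression>
    void launch_on_gpu(const Expression& expr);
    #endif
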
* Use numext::mini/numext::maxi instead of std::min/std::max in the tensor code (Benoit Steiner, 2015-08-28)
|
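For reference, a small illustrative snippet of the helpers mentioned above (the function and variable names are made up; only numext::mini/numext::maxi come from the commit message):

    #include <unsupported/Eigen/CXX11/Tensor>

    // numext::mini / numext::maxi work in both host and CUDA device code,
    // unlike std::min / std::max.
    Eigen::Index clamp_block_size(Eigen::Index requested, Eigen::Index total) {
      Eigen::Index block = Eigen::numext::mini<Eigen::Index>(requested, total);
      return Eigen::numext::maxi<Eigen::Index>(block, Eigen::Index(1));
    }
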
* Avoid relying on a default value for the Vectorizable template parameter of the EvalRange functor. (Benoit Steiner, 2015-07-15)
|
* Added support for multi gpu configuration to the GpuDevice class (Benoit Steiner, 2015-07-15)
|
* Enabled the vectorized evaluation of several tensor expressions that was previously disabled by mistake. (Benoit Steiner, 2015-07-01)
|
* Moved away from std::async and std::future as the underlying mechanism for the thread pool device. On several platforms, the functions passed to std::async are not scheduled in the order in which they are given to std::async, which leads to massive performance issues in the contraction code. Instead we now have a custom thread pool that ensures that the functions are picked up by the threads in the pool in the order in which they are enqueued in the pool. (Benoit Steiner, 2015-05-20)
|
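A minimal sketch of the design point made above: workers that pop from a single FIFO queue pick tasks up in enqueue order, a guarantee std::async does not give. This is a simplified stand-in, not Eigen's ThreadPool.

    #include <condition_variable>
    #include <functional>
    #include <mutex>
    #include <queue>
    #include <thread>
    #include <vector>

    // Simplified FIFO thread pool: tasks run in the order they were enqueued.
    class FifoThreadPool {
     public:
      explicit FifoThreadPool(int num_threads) {
        for (int i = 0; i < num_threads; ++i)
          workers_.emplace_back([this] { WorkerLoop(); });
      }
      ~FifoThreadPool() {
        { std::lock_guard<std::mutex> lock(mu_); done_ = true; }
        cv_.notify_all();
        for (auto& t : workers_) t.join();
      }
      void Schedule(std::function<void()> fn) {
        { std::lock_guard<std::mutex> lock(mu_); queue_.push(std::move(fn)); }
        cv_.notify_one();
      }
     private:
      void WorkerLoop() {
        for (;;) {
          std::function<void()> task;
          {
            std::unique_lock<std::mutex> lock(mu_);
            cv_.wait(lock, [this] { return done_ || !queue_.empty(); });
            if (queue_.empty()) return;        // done_ set and queue drained
            task = std::move(queue_.front());  // strict FIFO order
            queue_.pop();
          }
          task();
        }
      }
      std::mutex mu_;
      std::condition_variable cv_;
      std::queue<std::function<void()>> queue_;
      std::vector<std::thread> workers_;
      bool done_ = false;
    };
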
* Make sure that the copy constructor of the evaluator is always called before launching the evaluation of a tensor expression on a cuda device. (Benoit Steiner, 2015-04-21)
|
* Fixed off-by-one error that prevented the evaluation of small tensor expressions from being vectorized. (Benoit Steiner, 2015-02-27)
|
* Fixed several compilation warnings reported by clang (Benoit Steiner, 2015-02-25)
|
* Fixed compilation error triggered when trying to vectorize a non-vectorizable cuda kernel. (Benoit Steiner, 2015-02-10)
|
* Ensured that each thread has its own copy of the TensorEvaluator: this avoids race conditions when the evaluator calls a non-thread-safe functor, e.g. when generating random numbers. (Benoit Steiner, 2015-01-14)
|
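A hedged sketch of the pattern described above: capture the evaluator by value so every task gets its own copy, and a stateful, non-thread-safe functor (such as a random generator) is never shared between threads. The Pool::Schedule interface and the Evaluator methods below are assumptions, not the actual executor code.

    // Illustrative only: split [0, size) into blocks; each task owns a copy
    // of the evaluator because the lambda captures it by value.
    template <typename Evaluator, typename Index, typename Pool>
    void parallel_eval_sketch(const Evaluator& evaluator, Index size,
                              Index block_size, Pool& pool) {
      for (Index start = 0; start < size; start += block_size) {
        const Index end =
            (start + block_size < size) ? start + block_size : size;
        pool.Schedule([evaluator, start, end]() mutable {
          for (Index i = start; i < end; ++i)
            evaluator.evalScalar(i);   // operates on the task-local copy
        });
      }
    }
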
* Fixed the evaluation of expressions involving tensors of 2 or 3 elements on CUDA devices. (Benoit Steiner, 2014-11-18)
|
* Use the proper index type (Benoit Steiner, 2014-10-30)
|
* Misc improvements and cleanups (Benoit Steiner, 2014-10-13)
|
* Fixed the tensor shuffling test (Benoit Steiner, 2014-10-10)
|