Commit message | Author | Age | ||
---|---|---|---|---|
... | ||||
* | Add an EIGEN_NO_CUDA option, and introduce EIGEN_CUDACC and EIGEN_CUDA_ARCH aliases | Gael Guennebaud | 2017-07-17 | |
| | ||||
* | Pull the latest updates from trunk | Benoit Steiner | 2016-10-05 | |
|\ | ||||
| * | Cleanup the cuda executor code. | Benoit Steiner | 2016-10-04 | |
| | | ||||
* | | Partial OpenCL support via SYCL compatible with ComputeCpp CE. | Luke Iwanski | 2016-09-19 | |
|/ | ||||
* | Deleted dead code. | Benoit Steiner | 2016-07-25 | |
| | ||||
* | bug #1255: comment out broken and unused line. | Gael Guennebaud | 2016-07-25 | |
| | ||||
* | Use a single PacketSize variable | Benoit Steiner | 2016-06-01 | |
| | ||||
* | Merge. | Rasmus Munk Larsen | 2016-05-18 | |
|\ | ||||
* | | Minor cleanups: 1. Get rid of unused variables. 2. Get rid of last uses of EIGEN_USE_COST_MODEL. | Rasmus Munk Larsen | 2016-05-18 | |
| | | ||||
| * | Reduce overhead for small tensors and cheap ops by short-circuiting the const computation and block size calculation in parallelFor. | Rasmus Munk Larsen | 2016-05-17 | |
|/ | ||||
* | #if defined(EIGEN_USE_NONBLOCKING_THREAD_POOL) is now #if !defined(EIGEN_USE_SIMPLE_THREAD_POOL): the non blocking thread pool is the default since it's more scalable, and one needs to request the old thread pool explicitly. | Benoit Steiner | 2016-05-17 | |
| | ||||
* | Fixed compilation error | Benoit Steiner | 2016-05-17 | |
| | ||||
* | Address comments by bsteiner. | Rasmus Munk Larsen | 2016-05-12 | |
| | ||||
* | Improvements to parallelFor. | Rasmus Munk Larsen | 2016-05-12 | |
| | | | | Move some scalar functors from TensorFunctors. to Eigen core. | |||
* | Strongly hint but don't force the compiler to unroll some loops in the tensor executor. This results in up to 27% faster code. | Benoit Steiner | 2016-05-05 | |
| | ||||
* | Fixed several compilation warnings | Benoit Steiner | 2016-04-21 | |
| | ||||
* | Don't crash when attempting to reduce empty tensors. | Benoit Steiner | 2016-04-20 | |
| | ||||
* | Simplified the code that launches cuda kernels. | Benoit Steiner | 2016-04-19 | |
| | ||||
* | Avoid an unnecessary copy of the evaluator. | Benoit Steiner | 2016-04-19 | |
| | ||||
* | Get rid of void* casting when calling EvalRange::run. | Rasmus Munk Larsen | 2016-04-15 | |
| | ||||
* | Eigen Tensor cost model part 2: Thread scheduling for standard evaluators and reductions. The cost model is turned off by default. | Rasmus Munk Larsen | 2016-04-14 | |
| | ||||
* | Defer the decision to vectorize tensor CUDA code to the meta kernel. This makes it possible to decide whether to vectorize depending on the capability of the target cuda architecture. In particular, this enables us to vectorize the processing of fp16 when running on a device of capability >= 5.3 | Benoit Steiner | 2016-04-12 | |
| | ||||
* | Prevent potential overflow. | Benoit Steiner | 2016-03-28 | |
| | ||||
* | Avoid unnecessary conversions | Benoit Steiner | 2016-03-23 | |
| | ||||
* | Fixed compilation warning | Benoit Steiner | 2016-03-23 | |
| | ||||
* | Use a single Barrier instead of a collection of Notifications to reduce the thread synchronization overhead | Benoit Steiner | 2016-03-22 | |
| | ||||
* | Replace std::vector with our own implementation, as using the stl when compiling with nvcc and avx enabled leads to many issues. | Benoit Steiner | 2016-03-08 | |
| | ||||
* | Fix a couple of typos in the code. | Benoit Steiner | 2016-03-07 | |
| | ||||
* | Made it possible to limit the number of blocks that will be used to evaluate a tensor expression on a CUDA device. This makes it possible to set aside streaming multiprocessors for other computations. | Benoit Steiner | 2016-02-01 | |
| | ||||
* | Silenced several compilation warnings triggered by nvcc. | Benoit Steiner | 2016-01-11 | |
| | ||||
* | Prevent nvcc from miscompiling the cuda metakernel. Unfortunately this reintroduces some compilation warnings but it's much better than having to deal with random assertion failures. | Benoit Steiner | 2016-01-08 | |
| | ||||
* | Silenced some compilation warnings triggered by nvcc | Benoit Steiner | 2015-12-17 | |
| | ||||
* | Don't create more cuda blocks than necessary | Benoit Steiner | 2015-11-23 | |
| | ||||
* | Make it possible for a vectorized tensor expression to be executed in a CUDA kernel. | Benoit Steiner | 2015-11-11 | |
| | ||||
* | Fixed CUDA compilation errors | Benoit Steiner | 2015-11-11 | |
| | ||||
* | Refined the #ifdef __CUDACC__ guard to ensure that trying to compile gpu code with a non cuda compiler results in a linking error instead of bogus code. | Benoit Steiner | 2015-10-23 | |
| | ||||
* | Use numext::mini/numext::maxi instead of std::min/std::max in the tensor code | Benoit Steiner | 2015-08-28 | |
| | ||||
* | Avoid relying on a default value for the Vectorizable template parameter of the EvalRange functor | Benoit Steiner | 2015-07-15 | |
| | ||||
* | Added support for multi gpu configuration to the GpuDevice class | Benoit Steiner | 2015-07-15 | |
| | ||||
* | Enabled the vectorized evaluation of several tensor expressions that had previously been disabled by mistake | Benoit Steiner | 2015-07-01 | |
| | ||||
* | Moved away from std::async and std::future as the underlying mechanism for the thread pool device. On several platforms, the functions passed to std::async are not scheduled in the order in which they are given to std::async, which leads to massive performance issues in the contraction code. Instead we now have a custom thread pool that ensures that the functions are picked up by the threads in the pool in the order in which they are enqueued in the pool. | Benoit Steiner | 2015-05-20 | |
| | ||||
* | Make sure that the copy constructor of the evaluator is always called before launching the evaluation of a tensor expression on a cuda device. | Benoit Steiner | 2015-04-21 | |
| | ||||
* | Fixed off-by-one error that prevented the evaluation of small tensor expressions from being vectorized | Benoit Steiner | 2015-02-27 | |
| | ||||
* | Fixed several compilation warnings reported by clang | Benoit Steiner | 2015-02-25 | |
| | ||||
* | Fixed compilation error triggered when trying to vectorize a non vectorizable cuda kernel. | Benoit Steiner | 2015-02-10 | |
| | ||||
* | Ensured that each thread has its own copy of the TensorEvaluator: this avoids race conditions when the evaluator calls a non thread safe functor, e.g. when generating random numbers. | Benoit Steiner | 2015-01-14 | |
| | ||||
* | Fixed the evaluation of expressions involving tensors of 2 or 3 elements on CUDA devices. | Benoit Steiner | 2014-11-18 | |
| | ||||
* | Use the proper index type | Benoit Steiner | 2014-10-30 | |
| | ||||
* | Misc improvements and cleanups | Benoit Steiner | 2014-10-13 | |
| | ||||
* | Fixed the tensor shuffling test | Benoit Steiner | 2014-10-10 | |
| |