Commit message | Age
---|---
... |
Add an EIGEN_NO_CUDA option, and introduce EIGEN_CUDACC and EIGEN_CUDA_ARCH aliases | 2017-07-17
Pull the latest updates from trunk | 2016-10-05
Cleanup the cuda executor code. | 2016-10-04
Partial OpenCL support via SYCL compatible with ComputeCpp CE. | 2016-09-19
Deleted dead code. | 2016-07-25
bug #1255: comment out broken and unused line. | 2016-07-25
Use a single PacketSize variable | 2016-06-01
Merge. | 2016-05-18
Minor cleanups: 1. Get rid of unused variables. 2. Get rid of last uses of EIGEN_USE_COST_MODEL. | 2016-05-18
Reduce overhead for small tensors and cheap ops by short-circuiting the cost computation and block size calculation in parallelFor. | 2016-05-17
#if defined(EIGEN_USE_NONBLOCKING_THREAD_POOL) is now #if !defined(EIGEN_USE_SIMPLE_THREAD_POOL): the non-blocking thread pool is the default since it's more scalable, and one needs to request the old thread pool explicitly. | 2016-05-17
Fixed compilation error | 2016-05-17
Address comments by bsteiner. | 2016-05-12
Improvements to parallelFor. Moved some scalar functors from TensorFunctors.h to Eigen core. | 2016-05-12
Strongly hint but don't force the compiler to unroll some loops in the tensor executor. This results in up to 27% faster code. | 2016-05-05
Fixed several compilation warnings | 2016-04-21
Don't crash when attempting to reduce empty tensors. | 2016-04-20
Simplified the code that launches cuda kernels. | 2016-04-19
Avoid an unnecessary copy of the evaluator. | 2016-04-19
Get rid of void* casting when calling EvalRange::run. | 2016-04-15
Eigen Tensor cost model part 2: Thread scheduling for standard evaluators and reductions. The cost model is turned off by default. | 2016-04-14
Defer the decision to vectorize tensor CUDA code to the meta kernel. This makes it possible to decide whether or not to vectorize depending on the capability of the target CUDA architecture. In particular, this enables us to vectorize the processing of fp16 when running on a device of capability >= 5.3. | 2016-04-12
Prevent potential overflow. | 2016-03-28
Avoid unnecessary conversions | 2016-03-23
Fixed compilation warning | 2016-03-23
Use a single Barrier instead of a collection of Notifications to reduce the thread synchronization overhead | 2016-03-22
Replace std::vector with our own implementation, as using the STL when compiling with nvcc and AVX enabled leads to many issues. | 2016-03-08
Fix a couple of typos in the code. | 2016-03-07
Made it possible to limit the number of blocks that will be used to evaluate a tensor expression on a CUDA device. This makes it possible to set aside streaming multiprocessors for other computations. | 2016-02-01
Silenced several compilation warnings triggered by nvcc. | 2016-01-11
Prevent nvcc from miscompiling the cuda metakernel. Unfortunately this reintroduces some compilation warnings, but it's much better than having to deal with random assertion failures. | 2016-01-08
Silenced some compilation warnings triggered by nvcc | 2015-12-17
Don't create more cuda blocks than necessary | 2015-11-23
Make it possible for a vectorized tensor expression to be executed in a CUDA kernel. | 2015-11-11
Fixed CUDA compilation errors | 2015-11-11
Refined the #ifdef __CUDACC__ guard to ensure that trying to compile GPU code with a non-CUDA compiler results in a linking error instead of bogus code. | 2015-10-23
Use numext::mini/numext::maxi instead of std::min/std::max in the tensor code | 2015-08-28
Avoid relying on a default value for the Vectorizable template parameter of the EvalRange functor | 2015-07-15
Added support for multi-GPU configurations to the GpuDevice class | 2015-07-15
Enabled the vectorized evaluation of several tensor expressions that were previously disabled by mistake | 2015-07-01
Moved away from std::async and std::future as the underlying mechanism for the thread pool device. On several platforms, the functions passed to std::async are not scheduled in the order in which they are given to std::async, which leads to massive performance issues in the contraction code. Instead we now have a custom thread pool that ensures that the functions are picked up by the threads in the pool in the order in which they are enqueued in the pool. | 2015-05-20
Make sure that the copy constructor of the evaluator is always called before launching the evaluation of a tensor expression on a cuda device. | 2015-04-21
Fixed off-by-one error that prevented the evaluation of small tensor expressions from being vectorized | 2015-02-27
Fixed several compilation warnings reported by clang | 2015-02-25
Fixed compilation error triggered when trying to vectorize a non-vectorizable cuda kernel. | 2015-02-10
Ensured that each thread has its own copy of the TensorEvaluator: this avoids race conditions when the evaluator calls a non thread-safe functor, e.g. when generating random numbers. | 2015-01-14
Fixed the evaluation of expressions involving tensors of 2 or 3 elements on CUDA devices. | 2014-11-18
Use the proper index type | 2014-10-30
Misc improvements and cleanups | 2014-10-13
Fixed the tensor shuffling test | 2014-10-10