Commit message | Author | Age | |
---|---|---|---|
* | Fix more enum arithmetic. | Rasmus Munk Larsen | 2021-06-15 |
| | |||
* | Fix calls to device functions from host code | Nathan Luehr | 2021-05-11 |
| | |||
* | Fixing a CUDA / P100 regression introduced by PR 181 | Deven Desai | 2020-08-20 |
| | | | | | | PR 181 ( https://gitlab.com/libeigen/eigen/-/merge_requests/181 ) adds a `__launch_bounds__(1024)` attribute to GPU kernels that did not have that attribute explicitly specified. That PR seems to cause regressions on the CUDA platform. This PR/commit restricts the changes made in PR 181 so that they apply to HIP only. | ||
* | Adding an explicit launch_bounds(1024) attribute for GPU kernels. | Deven Desai | 2020-08-05 |
| | | | | | | | | | | Starting with ROCm 3.5, the HIP compiler will change from HCC to hip-clang. This compiler change introduces a change in the default value of the `__launch_bounds__` attribute associated with a GPU kernel. (The default value is the value the compiler assumes for the `__launch_bounds__` attribute when it is not explicitly specified by the user.) Currently (i.e. for HIP with ROCm 3.3 and older), the default value is 1024. That changes to 256 with ROCm 3.5 (i.e. the hip-clang compiler). As a consequence of this change, if a GPU kernel with a `__launch_bounds__` attribute of 256 is launched at runtime with a threads_per_block value > 256, it leads to a runtime error. This is causing a couple of Eigen unit test failures with ROCm 3.5. This commit adds an explicit `__launch_bounds__(1024)` attribute to every GPU kernel that currently does not have it explicitly specified (and hence would end up getting the default value of 256 with the change to hip-clang). | ||
* | Remove V2 suffix from TensorBlock | Eugene Zhulenev | 2019-12-10 |
| | |||
* | Remove legacy block evaluation support | Eugene Zhulenev | 2019-11-12 |
| | |||
* | Tensor block evaluation V2 support for unary/binary/broadcasting | Eugene Zhulenev | 2019-09-24 |
| | |||
* | [SYCL] This PR adds the minimum modifications to the Eigen unsupported ↵ | Mehdi Goli | 2019-06-28 |
| | | | | | | | | | | module required to run it on devices supporting SYCL. * Abstracting the pointer type so that both SYCL memory and pointer can be captured. * Converting SYCL virtual pointer to SYCL device memory in Eigen evaluator class. * Binding SYCL placeholder accessor to command group handler by using bind method in Eigen evaluator node. * Adding SYCL macro for controlling loop unrolling. * Modifying the TensorDeviceSycl.h and SYCL executor method to adopt the above changes. | ||
* | Add block evaluation to CwiseUnaryOp and add PreferBlockAccess enum to all ↵ | Eugene Zhulenev | 2018-08-10 |
| | | | | evaluators | ||
* | Fix init order. | Rasmus Munk Larsen | 2018-08-07 |
| | |||
* | Fixed compilation errors. | Benoit Steiner | 2018-08-06 |
| | |||
* | Enabling per device specialisation of packetsize. | Mehdi Goli | 2018-08-01 |
| | |||
* | Add tiled evaluation support to TensorExecutor | Eugene Zhulenev | 2018-07-25 |
| | |||
* | Introduce gpu_assert for assertion in device-code, and disable them with ↵ | Gael Guennebaud | 2018-07-13 |
| | | | | clang-cuda. | ||
* | merging the CUDA and HIP implementation for the Tensor directory and the ↵ | Deven Desai | 2018-06-20 |
| | | | | unit tests | ||
* | Add an EIGEN_NO_CUDA option, and introduce EIGEN_CUDACC and EIGEN_CUDA_ARCH ↵ | Gael Guennebaud | 2017-07-17 |
| | | | | aliases | ||
* | Fixed syntax errors generated by xcode | Benoit Steiner | 2017-07-09 |
| | |||
* | Merged in mehdi_goli/opencl/DataDependancy (pull request PR-10) | Benoit Steiner | 2017-06-28 |
| | | | | | | | | | | DataDependancy * Wrapping data type to the pointer class for sycl in non-terminal nodes; not having that breaks Tensorflow Conv2d code. * Applying Ronnan's Comments. * Applying benoit's comments | ||
* | Adding non-dereferenceable pointer track for ComputeCpp backend; Adding ↵ | Mehdi Goli | 2017-01-19 |
| | | | | TensorConvolutionOp for ComputeCpp; fixing typos. modifying TensorDeviceSycl to use the LegacyPointer class. | ||
* | Marked a few tensor operations as read only | Benoit Steiner | 2016-05-05 |
| | |||
* | Deleted trailing commas | Benoit Steiner | 2016-04-29 |
| | |||
* | Added missing definition of PacketSize in the gpu evaluator of convolution | Benoit Steiner | 2016-04-14 |
| | |||
* | Eigen cost model part 1. This implements a basic recursive framework to ↵ | Rasmus Munk Larsen | 2016-04-14 |
| | | | | estimate the cost of evaluating tensor expressions. | ||
* | Decoupled the packet type definition from the definition of the tensor ops. ↵ | Benoit Steiner | 2016-03-08 |
| | | | | All the vectorization is now defined in the tensor evaluators. This will make it possible to reliably support devices with different packet types in the same compilation unit. | ||
* | Fixed a number of compilation warnings generated by the cuda tests | Benoit Steiner | 2016-01-31 |
| | |||
* | Record whether the underlying tensor storage can be accessed directly during ↵ | Benoit Steiner | 2016-01-19 |
| | | | | the evaluation of an expression. | ||
* | Use numext::mini/numext::maxi instead of std::min/std::max in the tensor code | Benoit Steiner | 2015-08-28 |
| | |||
* | Many files were missing in previous changeset. | Gael Guennebaud | 2015-07-29 |
| | |||
* | Added support for multi gpu configuration to the GpuDevice class | Benoit Steiner | 2015-07-15 |
| | |||
* | Misc improvements and optimizations | Benoit Steiner | 2015-07-01 |
| | |||
* | Enabled the vectorized evaluation of several tensor expressions that was ↵ | Benoit Steiner | 2015-07-01 |
| | | | | previously disabled by mistake | ||
* | Fixed a compilation error triggered by nvcc 7 | Benoit Steiner | 2015-05-28 |
| | |||
* | Added support for convolution of tensors laid out in RowMajor mode | Benoit Steiner | 2015-03-31 |
| | |||
* | Silenced the last batch of compilation warnings triggered by gcc 4.8 | Benoit Steiner | 2015-02-10 |
| | |||
* | Improved support for RowMajor tensors | Benoit Steiner | 2015-01-14 |
| | | | | Misc fixes and API cleanups. | ||
* | Silenced a few compilation warnings | Benoit Steiner | 2014-10-16 |
| | | | | Generalized a TensorMap constructor | ||
* | Misc improvements and cleanups | Benoit Steiner | 2014-10-13 |
| | |||
* | Improved the performance of the tensor convolution code by a factor of about 4. | Benoit Steiner | 2014-09-03 |
| | |||
* | Improved the speed of convolutions when running on cuda devices | Benoit Steiner | 2014-08-19 |
| | |||
* | Fixed misc typos. | Benoit Steiner | 2014-08-13 |
| | |||
* | Added missing apis. | Benoit Steiner | 2014-08-13 |
| | |||
* | Updated the convolution and contraction evaluators to follow the new ↵ | Benoit Steiner | 2014-08-13 |
| | | | | EvalSubExprsIfNeeded API. | ||
* | Silenced a compilation warning | Benoit Steiner | 2014-06-13 |
| | |||
* | Reworked the expression evaluation mechanism in order to make it possible to ↵ | Benoit Steiner | 2014-06-13 |
| | | | | | | | | efficiently compute convolutions and contractions in the future: * The scheduling of computation is moved out of the assignment code and into a new TensorExecutor class * The assignment itself is now a regular node on the expression tree * The expression evaluators start by recursively evaluating all their subexpressions if needed | ||
* | Fixed a few compilation errors. | Benoit Steiner | 2014-06-10 |
| | |||
* | TensorEval are now typed on the device: this will make it possible to use ↵ | Benoit Steiner | 2014-06-10 |
| | | | | | | partial template specialization to optimize the strategy of each evaluator for each device type. Started work on partial evaluations. | ||
* | Improved support for rvalues in tensor expressions. | Benoit Steiner | 2014-06-09 |
| | |||
* | Added support for convolution and reshaping of tensors. | Benoit Steiner | 2014-06-06 |