| Commit message (Collapse) | Author | Age |
|
|
|
|
|
| |
PR 181 ( https://gitlab.com/libeigen/eigen/-/merge_requests/181 ) adds `__launch_bounds__(1024)` attribute to GPU kernels, that did not have that attribute explicitly specified.
That PR seems to cause regressions on the CUDA platform. This PR/commit makes the changes in PR 181, to be applicable for HIP only
|
|
|
|
|
|
|
|
|
|
| |
Starting with ROCm 3.5, the HIP compiler will change from HCC to hip-clang.
This compiler change introduce a change in the default value of the `__launch_bounds__` attribute associated with a GPU kernel. (default value means the value assumed by the compiler as the `__launch_bounds attribute__` value, when it is not explicitly specified by the user)
Currently (i.e. for HIP with ROCm 3.3 and older), the default value is 1024. That changes to 256 with ROCm 3.5 (i.e. hip-clang compiler). As a consequence of this change, if a GPU kernel with a `__luanch_bounds__` attribute of 256 is launched at runtime with a threads_per_block value > 256, it leads to a runtime error. This is leading to a couple of Eigen unit test failures with ROCm 3.5.
This commit adds an explicit `__launch_bounds(1024)__` attribute to every GPU kernel that currently does not have it explicitly specified (and hence will end up getting the default value of 256 with the change to hip-clang)
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
| |
module required to run it on devices supporting SYCL.
* Abstracting the pointer type so that both SYCL memory and pointer can be captured.
* Converting SYCL virtual pointer to SYCL device memory in Eigen evaluator class.
* Binding SYCL placeholder accessor to command group handler by using bind method in Eigen evaluator node.
* Adding SYCL macro for controlling loop unrolling.
* Modifying the TensorDeviceSycl.h and SYCL executor method to adopt the above changes.
|
|
|
|
| |
evaluators
|
| |
|
| |
|
| |
|
| |
|
|
|
|
| |
clang-cuda.
|
|
|
|
| |
unit tests
|
|
|
|
| |
aliases
|
| |
|
|
|
|
|
|
|
|
|
|
| |
DataDependancy
* Wrapping data type to the pointer class for sycl in non-terminal nodes; not having that breaks Tensorflow Conv2d code.
* Applying Ronnan's Comments.
* Applying benoit's comments
|
|
|
|
| |
TensorConvolutionOp for ComputeCpp; fixing typos. modifying TensorDeviceSycl to use the LegacyPointer class.
|
| |
|
| |
|
| |
|
|
|
|
| |
estimate the cost of evaluating tensor expressions.
|
|
|
|
| |
All the vectorization is now defined in the tensor evaluators. This will make it possible to relialably support devices with different packet types in the same compilation unit.
|
| |
|
|
|
|
| |
the evaluation of an expression.
|
| |
|
| |
|
| |
|
| |
|
|
|
|
| |
previously disabled by mistake
|
| |
|
| |
|
| |
|
|
|
|
| |
Misc fixes and API cleanups.
|
|
|
|
| |
Generalized a TensorMap constructor
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
| |
EvalSubExprsIfNeeded apu.
|
| |
|
|
|
|
|
|
|
|
| |
efficiently compute convolutions and contractions in the future:
* The scheduling of computation is moved out the the assignment code and into a new TensorExecutor class
* The assignment itself is now a regular node on the expression tree
* The expression evaluators start by recursively evaluating all their subexpressions if needed
|
| |
|
|
|
|
|
|
| |
partial template specialization to optimize the strategy of each evaluator for each device type.
Started work on partial evaluations.
|
| |
|
|
|