| Commit message (Collapse) | Author | Age |
|
|
|
| |
broken by c6953f799b01d36f4236b64f351cc1446e0abe17.
|
|
|
|
|
|
| |
PR 181 ( https://gitlab.com/libeigen/eigen/-/merge_requests/181 ) adds the `__launch_bounds__(1024)` attribute to GPU kernels that did not have that attribute explicitly specified.
That PR seems to cause regressions on the CUDA platform. This PR/commit restricts the changes in PR 181 so that they apply to HIP only.
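One way to restrict the attribute to HIP is a macro that expands to `__launch_bounds__(1024)` only under the HIP compiler. The following is a minimal sketch, not the actual change: the `SKETCH_*` macro names and `launch_bounds_expansion` are hypothetical, and it assumes `__HIPCC__` is predefined when compiling with the HIP compiler.

```cpp
#include <string>

// Hypothetical macro name, for illustration only.
// Assumption: __HIPCC__ is defined only when the HIP compiler is in use,
// so CUDA and plain host builds get an empty expansion.
#if defined(__HIPCC__)
#define SKETCH_HIP_LAUNCH_BOUNDS_1024 __launch_bounds__(1024)
#else
#define SKETCH_HIP_LAUNCH_BOUNDS_1024
#endif

// Intended use on a kernel declaration (shown as a comment because it only
// compiles with a GPU compiler):
//   __global__ void SKETCH_HIP_LAUNCH_BOUNDS_1024 my_kernel(float* out, int n);

// Helpers that turn the macro expansion into a string so it can be inspected.
#define SKETCH_STR_IMPL(x) #x
#define SKETCH_STR(x) SKETCH_STR_IMPL(x)

// Returns the text the macro expands to for the current compiler.
inline std::string launch_bounds_expansion() {
  return SKETCH_STR(SKETCH_HIP_LAUNCH_BOUNDS_1024);
}
```

With a non-HIP compiler the expansion is empty, so kernels keep whatever default the CUDA toolchain applies.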
|
|
|
|
|
|
|
|
|
|
| |
Starting with ROCm 3.5, the HIP compiler will change from HCC to hip-clang.
This compiler change introduces a change in the default value of the `__launch_bounds__` attribute associated with a GPU kernel (the default value being the one the compiler assumes for the `__launch_bounds__` attribute when the user does not specify it explicitly).
Currently (i.e. for HIP with ROCm 3.3 and older), the default value is 1024. That changes to 256 with ROCm 3.5 (i.e. the hip-clang compiler). As a consequence, if a GPU kernel with a `__launch_bounds__` attribute of 256 is launched at runtime with a threads_per_block value > 256, the launch fails with a runtime error. This causes a couple of Eigen unit test failures with ROCm 3.5.
This commit adds an explicit `__launch_bounds__(1024)` attribute to every GPU kernel that currently does not have it explicitly specified (and hence would end up getting the default value of 256 with the change to hip-clang).
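The failure mode described above can be modelled on the host. This is an illustrative sketch only: the enum and function names are hypothetical, and the defaults (1024 for HCC, 256 for hip-clang) are simply the values quoted in this commit message, not queried from any compiler.

```cpp
// Illustrative model only; all names here are hypothetical.
enum class HipCompiler { HCC, HipClang };

// Launch bound assumed when a kernel carries no explicit attribute,
// using the defaults quoted in the commit message.
constexpr int default_launch_bounds(HipCompiler c) {
  return c == HipCompiler::HCC ? 1024 : 256;
}

// explicit_bounds == 0 stands for "no __launch_bounds__ on the kernel".
constexpr int effective_launch_bounds(HipCompiler c, int explicit_bounds) {
  return explicit_bounds > 0 ? explicit_bounds : default_launch_bounds(c);
}

// A launch is accepted only if threads_per_block fits the effective bound.
constexpr bool launch_is_valid(HipCompiler c, int explicit_bounds,
                               int threads_per_block) {
  return threads_per_block <= effective_launch_bounds(c, explicit_bounds);
}
```

Under this model a 512-thread launch of an unannotated kernel is accepted with HCC, rejected with hip-clang, and accepted again once the kernel carries an explicit bound of 1024.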
|
|
|
|
| |
the Eigen::Half packet type
|
|
|
|
| |
expose pmul/add/div/min/max on host
|
|
|
|
| |
CUDA build failures.
|
| |
| |
From 68d431b4c14ad60a778ee93c1f59ecc4b931950e Mon Sep 17 00:00:00 2001
Found via `codespell -q 3 -I ../eigen-word-whitelist.txt` where the whitelist consists of:
```
als
ans
cas
dum
lastr
lowd
nd
overfl
pres
preverse
substraction
te
uint
whch
```
---
CMakeLists.txt | 26 +++++++++----------
Eigen/src/Core/GenericPacketMath.h | 2 +-
Eigen/src/SparseLU/SparseLU.h | 2 +-
bench/bench_norm.cpp | 2 +-
doc/HiPerformance.dox | 2 +-
doc/QuickStartGuide.dox | 2 +-
.../Eigen/CXX11/src/Tensor/TensorChipping.h | 6 ++---
.../Eigen/CXX11/src/Tensor/TensorDeviceGpu.h | 2 +-
.../src/Tensor/TensorForwardDeclarations.h | 4 +--
.../src/Tensor/TensorGpuHipCudaDefines.h | 2 +-
.../Eigen/CXX11/src/Tensor/TensorReduction.h | 2 +-
.../CXX11/src/Tensor/TensorReductionGpu.h | 2 +-
.../test/cxx11_tensor_concatenation.cpp | 2 +-
unsupported/test/cxx11_tensor_executor.cpp | 2 +-
14 files changed, 29 insertions(+), 29 deletions(-)
|
| |
- unsupported/Eigen/CXX11/src/Tensor/TensorReductionGpu.h
Changing "pass-by-reference" argument to be "pass-by-value" instead
(in a __global__ function decl).
"pass-by-reference" arguments to __global__ functions are unwise,
and will be explicitly flagged as errors by the newer versions of HIP.
- Eigen/src/Core/util/Memory.h
- unsupported/Eigen/CXX11/src/Tensor/TensorContraction.h
Changes introduced in recent commits break the HIP compile.
Adding EIGEN_DEVICE_FUNC attribute to some functions and
calling ::malloc/free instead of the corresponding std:: versions
to get the HIP compile working again
- unsupported/Eigen/CXX11/src/Tensor/TensorReduction.h
A change introduced in a recent commit breaks the HIP compile
(link stage errors out due to failure to inline a function).
Disabling the recently introduced code (only for HIP compile), to get
the eigen nightly testing going again.
Will submit another PR once we have the proper fix.
- Eigen/src/Core/util/ConfigureVectorization.h
Enabling GPU VECTOR support when HIP compiler is in use
(for both the host and device compile phases)
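The `::malloc`/`::free` change listed above can be sketched as follows. This is a minimal illustration, not the code in Memory.h: `SKETCH_DEVICE_FUNC`, `alloc_floats` and `free_floats` are hypothetical names, and the macro only stands in for the role `EIGEN_DEVICE_FUNC` plays. It assumes `__HIPCC__` / `__CUDACC__` are predefined by the respective GPU compilers.

```cpp
#include <stdlib.h>  // declares ::malloc, ::free and size_t in the global namespace

// Hypothetical stand-in for a device-function marker.
// Assumption: a GPU compiler defines __HIPCC__ or __CUDACC__; a plain host
// compiler defines neither, so the marker expands to nothing.
#if defined(__HIPCC__) || defined(__CUDACC__)
#define SKETCH_DEVICE_FUNC __host__ __device__
#else
#define SKETCH_DEVICE_FUNC
#endif

// Calls the global-namespace ::malloc rather than std::malloc, mirroring the
// workaround described in the commit message.
SKETCH_DEVICE_FUNC inline float* alloc_floats(size_t n) {
  return static_cast<float*>(::malloc(n * sizeof(float)));
}

// Releases memory obtained from alloc_floats using the global-namespace ::free.
SKETCH_DEVICE_FUNC inline void free_floats(float* p) {
  ::free(p);
}
```

On a host-only build the marker disappears and the two helpers behave like ordinary wrappers around the C allocator.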
|
| |
|
| |
|
|
|
|
| |
clang-cuda.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The major changes are
1. Moving CUDA/PacketMath.h to GPU/PacketMath.h
2. Moving CUDA/MathFunctions.h to GPU/MathFunctions.h
3. Moving CUDA/CudaSpecialFunctions.h to GPU/GpuSpecialFunctions.h
The above three changes effectively enable the Eigen "Packet" layer for the HIP platform
4. Merging the "hip_basic" and "cuda_basic" unit tests into one ("gpu_basic")
5. Updating the "EIGEN_DEVICE_FUNC" marking in some places
The change has been tested on the HIP and CUDA platforms.
|
|
|
|
| |
unit tests
|
|
unsupported/test directories
|