| Commit message (Collapse) | Author | Age |
| |
|
|
|
|
| |
This is to support TensorFlow's `tf.math.floormod` for half.
|
|
|
|
|
| |
This is for consistency with bfloat16, and to support initialization
with `std::iota`.
|
| |
|
| |
|
|
|
|
|
|
|
|
| |
Both CUDA and HIP require trivial default constructors for types used
in shared memory. Otherwise failing with
```
error: initialization is not supported for __shared__ variables.
```
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, when compiling with HIP, Eigen::half is derived from the `__half_raw` struct that is defined within the hip_fp16.h header file. This is true for both the "host" compile phase and the "device" compile phase. This was causing a very hard to detect bug in the ROCm TensorFlow build.
In the ROCm Tensorflow build,
* files that do not contain ant GPU code get compiled via gcc, and
* files that contnain GPU code get compiled via hipcc.
In certain case, we have a function that is defined in a file that is compiled by hipcc, and is called in a file that is compiled by gcc. If such a function had Eigen::half has a "pass-by-value" argument, its value was getting corrupted, when received by the function.
The reason for this seems to be that for the gcc compile, Eigen::half is derived from a `__half_raw` struct that has `uint16_t` as the data-store, and for hipcc the `__half_raw` implementation uses `_Float16` as the data store. There is some ABI incompatibility between gcc / hipcc (which is essentially latest clang), which results in the Eigen::half value (which is correct at the call-site) getting randomly corrupted when passed to the function.
Changing the Eigen::half argument to be "pass by reference" seems to workaround the error.
In order to fix it such that we do not run into it again in TF, this commit changes the Eigne::half implementation to use the same `__half_raw` implementation as the non-GPU compile, during host compile phase of the hipcc compile.
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since `numeric_limits<half>::max_exponent` is a static inline constant,
it cannot be directly passed by reference. This triggers a linker error
in recent versions of `g++-powerpc64le`.
Changing `half` to take inputs by value fixes this. Wrapping
`max_exponent` with `int(...)` to make an addressable integer also fixes this
and may help with other custom `Scalar` types down-the-road.
Also eliminated some compile warnings for powerpc.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Added `EIGEN_HAS_STD_HASH` macro, checking for C++11 support and not
running on GPU.
`std::hash<float>` is not a device function, so cannot be used by
`std::hash<bfloat16>`. Removed `EIGEN_DEVICE_FUNC` and only
define if `EIGEN_HAS_STD_HASH`. Same for `half`.
Added `EIGEN_CUDA_HAS_FP16_ARITHMETIC` to improve readability,
eliminate warnings about `EIGEN_CUDA_ARCH` not being defined.
Replaced a couple C-style casts with `reinterpret_cast` for aligned
loading of `half*` to `half2*`. This eliminates `-Wcast-align`
warnings in clang. Although not ideal due to potential type aliasing,
this is how CUDA handles these conversions internally.
|
| |
|
|
|
|
|
|
|
|
|
|
| |
For these to exist we would need to define `_USE_MATH_DEFINES` before
`cmath` or `math.h` is first included. However, we don't
control the include order for projects outside Eigen, so even defining
the macro in `Eigen/Core` does not fix the issue for projects that
end up including `<cmath>` before Eigen does (explicitly or transitively).
To fix this, we define `EIGEN_LOG2E` and `EIGEN_LN2` ourselves.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Current implementations fail to consider half-float packets, only
half-float scalars. Added specializations for packets on AVX, AVX512 and
NEON. Added tests to `special_packetmath`.
The current `special_functions` tests would fail for half and bfloat16 due to
lack of precision. The NEON tests also fail with precision issues and
due to different handling of `sqrt(inf)`, so special functions bessel, ndtri
have been disabled.
Tested with AVX, AVX512.
|
|
|
|
|
|
|
|
|
|
| |
The `shfl*` functions are `__device__` only, and adjusted `#ifdef`s so
they are defined whenever the corresponding CUDA/HIP ones are.
Also changed the HIP/CUDA<9.0 versions to cast to int instead of
doing the conversion `half`<->`float`.
Fixes #2083
|
|
|
|
| |
This reverts commit 4d91519a9be061da5d300079fca17dd0b9328050.
|
| |
|
|
|
|
|
|
|
| |
Prior to this fix, `TensorContractionGpu` and the `cxx11_tensor_of_float16_gpu`
test are broken, as well as several ops in Tensorflow. The gpu functions
`__shfl*` became ambiguous now that `Eigen::half` implicitly converts to float.
Here we add the required specializations.
|
|
|
|
|
|
|
|
|
| |
`bit_cast` cannot be `constexpr`, so we need to remove `EIGEN_CONSTEXPR` from
`raw_half_as_uint16(...)`. This shouldn't affect anything else, since
it is only used in `a bit_cast<uint16_t,half>()` which is not itself
`constexpr`.
Fixes #2077.
|
|
|
|
|
|
|
|
| |
Minimal implementation of AVX `Eigen::half` ops to bring in line
with `bfloat16`. Allows `packetmath_13` to pass.
Also adjusted `bfloat16` packet traits to match the supported set
of ops (e.g. Bessel is not actually implemented).
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The `half_float` test was failing with `-mcpu=cortex-a55` (native `__fp16`) due
to a bad NaN bit-pattern comparison (in the case of casting a float to `__fp16`,
the signaling `NaN` is quieted). There was also an inconsistency between
`numeric_limits<half>::quiet_NaN()` and `NumTraits::quiet_NaN()`. Here we
correct the inconsistency and compare NaNs according to the IEEE 754
definition.
Also modified the `bfloat16_float` test to match.
Tested with `cortex-a53` and `cortex-a55`.
|
|
|
|
|
|
|
|
|
| |
Both, Eigen::half and Eigen::Bfloat16 are implicitly convertible to
float and can hence be converted to bool via the conversion chain
Eigen::{half,bfloat16} -> float -> bool
We thus remove the explicit cast operator to bool.
|
|
|
|
|
|
| |
- Add predux_half_dowto4
- Remove explicit casts in Half.h to match the behaviour of BFloat16.h
- Enable more packetmath tests for Eigen::half
|
|
|
|
|
|
|
|
|
|
| |
The existing `TensorRandom.h` implementation makes the assumption that
`half` (`bfloat16`) has a `uint16_t` member `x` (`value`), which is not
always true. This currently fails on arm64, where `x` has type `__fp16`.
Added `bit_cast` specializations to allow casting to/from `uint16_t`
for both `half` and `bfloat16`. Also added tests in
`half_float`, `bfloat16_float`, and `cxx11_tensor_random` to catch
these errors in the future.
|
|
|
|
|
| |
`EIGEN_CONSTEXPR` is getting pushed but not popped in `Half.h` if
`EIGEN_HAS_ARM64_FP16_SCALAR_ARITHMETIC` is defined.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When calling `internal::cast<S, std::complex<T>>(x)`, clang often
generates an implicit conversion warning due to an implicit cast
from type `S` to `T`. This currently affects the following tests:
- `basicstuff`
- `bfloat16_float`
- `cxx11_tensor_casts`
The implicit cast leads to widening/narrowing float conversions.
Widening warnings only seem to be generated by clang (`-Wdouble-promotion`).
To eliminate the warning, we explicitly cast the real-component first
from `S` to `T`. We also adjust tests to use `internal::cast` instead
of `static_cast` when a complex type may be involved.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The following commit breaks Eigen for ROCm (and probably CUDA too) with the following error
https://gitlab.com/libeigen/eigen/-/commit/e265f7ed8e59c26e15f2c35162c6b8da1c5d594f
```
Building HIPCC object test/CMakeFiles/gpu_basic.dir/gpu_basic_generated_gpu_basic.cu.o
In file included from /home/rocm-user/eigen/test/gpu_basic.cu:20:
In file included from /home/rocm-user/eigen/test/main.h:355:
In file included from /home/rocm-user/eigen/Eigen/QR:11:
In file included from /home/rocm-user/eigen/Eigen/Core:169:
/home/rocm-user/eigen/Eigen/src/Core/arch/Default/Half.h:825:76: error: use of undeclared identifier 'numext'; did you mean 'Eigen::numext'?
return Eigen::half_impl::raw_uint16_to_half(__ldg(reinterpret_cast<const numext::uint16_t*>(ptr)));
^~~~~~
Eigen::numext
/home/rocm-user/eigen/Eigen/src/Core/MathFunctions.h:968:11: note: 'Eigen::numext' declared here
namespace numext {
^
1 error generated when compiling for gfx900.
CMake Error at gpu_basic_generated_gpu_basic.cu.o.cmake:192 (message):
Error generating file
/home/rocm-user/eigen/build/test/CMakeFiles/gpu_basic.dir//./gpu_basic_generated_gpu_basic.cu.o
test/CMakeFiles/gpu_basic.dir/build.make:63: recipe for target 'test/CMakeFiles/gpu_basic.dir/gpu_basic_generated_gpu_basic.cu.o' failed
make[3]: *** [test/CMakeFiles/gpu_basic.dir/gpu_basic_generated_gpu_basic.cu.o] Error 1
CMakeFiles/Makefile2:16611: recipe for target 'test/CMakeFiles/gpu_basic.dir/all' failed
make[2]: *** [test/CMakeFiles/gpu_basic.dir/all] Error 2
CMakeFiles/Makefile2:16618: recipe for target 'test/CMakeFiles/gpu_basic.dir/rule' failed
make[1]: *** [test/CMakeFiles/gpu_basic.dir/rule] Error 2
Makefile:5401: recipe for target 'gpu_basic' failed
make: *** [gpu_basic] Error 2
```
The fix is in this commit is trivial. Please review and merge
|
|
|
|
|
|
|
|
|
|
|
| |
The following functions have been removed:
Eigen::half fabsh(const Eigen::half&)
Eigen::half exph(const Eigen::half&)
Eigen::half sqrth(const Eigen::half&)
Eigen::half powh(const Eigen::half&, const Eigen::half&)
Eigen::half floorh(const Eigen::half&)
Eigen::half ceilh(const Eigen::half&)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Armv8.2-a provides a native half-precision floating point (__fp16 aka.
float16_t). This patch introduces
* __fp16 as underlying type of Eigen::half if this type is available
* the packet types Packet4hf and Packet8hf representing float16x4_t and
float16x8_t respectively
* packet-math for the above packets with corresponding scalar type Eigen::half
The packet-math functionality has been implemented by Ashutosh Sharma
<ashutosh.sharma@amperecomputing.com>.
This closes #1940.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The following commit causes regressions in the ROCm/HIP support for Eigen
https://gitlab.com/libeigen/eigen/-/commit/e55182ac09885d7558adf75e9e230b051a721c18
I suspect the same breakages occur on the CUDA side too.
The above commit puts the EIGEN_CONSTEXPR attribute on `half_base` constructor. `half_base` is derived from `__half_raw`.
When compiling with GPU support, the definition of `__half_raw` gets picked up from the GPU Compiler specific header files (`hip_fp16.h`, `cuda_fp16.h`). Properly supporting the above commit would require adding the `constexpr` attribute to the `__half_raw` constructor (and other `*half*` routines) in those header files. While that is something we can explore in the future, for now we need to undo the above commit when compiling with GPU support, which is what this commit does.
This commit also reverts a small change in the `raw_uint16_to_half` routine made by the above commit. Similar to the case above, that change was leading to compile errors due to the fact that `__half_raw` has a different definition when compiling with DPU support.
|
|
|
|
|
|
| |
constants static const or constexpr.
Move macro definition EIGEN_CONSTEXPR to Core and make all methods in NumTraits constexpr when EIGEN_HASH_CONSTEXPR is 1.
|
|
|
|
|
|
| |
Fixed Visual Studio 2019 Code Analysis (C++ Core Guidelines) warning
C26450 from inside `half_impl::float_to_half_rtne(float)`:
> Arithmetic overflow: '<<' operation causes overflow at compile time.
|
|
|
|
|
|
| |
- Guard fundamental types that are not available pre C++11
- Separate subsequent angle brackets >> by spaces
- Allow casting of Eigen::half and Eigen::bfloat16 to complex types
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The original tensor casts were only defined for
`SrcCoeffRatio`:`TgtCoeffRatio` 1:1, 1:2, 2:1, 4:1. Here we add the
missing 1:N and 8:1.
We also add casting `Eigen::half` to/from `std::complex<T>`, which
was missing to make it consistent with `Eigen:bfloat16`, and
generalize the overload to work for any complex type.
Tests were added to `basicstuff`, `packetmath`, and
`cxx11_tensor_casts` to test all cast configurations.
|
|
|
|
| |
the Eigen::Half packet type
|
|
|
|
|
| |
<complex>.
This implicit dependency does no longer exist in a recent llbm release (sha 78be61871704).
|
|
half to Core/arch/Default and move arch-specific packet ops to their respective sub-directories.
|