aboutsummaryrefslogtreecommitdiffhomepage
Commit message (Collapse)AuthorAge
* Replace `-2147483648` by `-0.0f` or `-0.0` constants (this should fix #2189).Gravatar Christoph Hertzberg2021-04-07
| | | | Also, remove unnecessary `pgather` operations.
* Align local arrays to Packet boundary.Gravatar Rasmus Munk Larsen2021-04-06
|
* Fix clang tidy warnings in AnnoyingScalar.Gravatar Antonio Sanchez2021-04-05
| | | | | | | | Clang-tidy complains that full specializations in headers can cause ODR violations. Marked these as `inline` to fix. It also complains about renaming arguments in specializations. Set the argument names to match.
* Fix SelfAdjoingEigenSolver (#2191)Gravatar Antonio Sanchez2021-04-05
| | | | | | | | | | | | | | Adjust the relaxation step to use the condition ``` abs(subdiag[i]) <= epsilon * sqrt(abs(diag[i]) + abs(diag[i+1])) ``` for setting the subdiagonal entry to zero. Also adjust Wilkinson shift for small `e = subdiag[end-1]` - I couldn't find a reference for the original, and it was not consistent with the Wilkinson definition. Fixes #2191.
* Fix two bugs in commitGravatar Rasmus Munk Larsen2021-04-02
|
* Fix address of temporary object errors in clang11.Gravatar Chip Kerchner2021-04-02
| | | | This fixes the problem with taking the address of temporary objects which clang11 treats as errors.
* Add CI infrastructure for pre-merge smoke tests.Gravatar David Tellenbach2021-04-01
| | | | | | This patch adds pre-merge smoke tests for x86 Linux using gcc-10 and clang-10. Closes #2188.
* Add CMake infrastructure for smoke testingGravatar David Tellenbach2021-03-31
| | | | | Necessary CMake changes to implement pre-merge smoke tests running via CI.
* Add an info() method to the SVDBase class to make it possible to tell the ↵Gravatar Rasmus Munk Larsen2021-03-31
| | | | | | user that the computation failed, possibly due to invalid input. Make Jacobi and divide-and-conquer fail fast and return info() == InvalidInput if the matrix contains NaN or +/-Inf.
* Add GitLab templates for issues and merge requestsGravatar Guoqiang QI2021-03-31
| | | | | | This patch adds GitLab templates for bug reports, feature and merge requests. This closes #2117.
* Fix CUDA constexpr issues for numeric_limits.Gravatar Antonio Sanchez2021-03-30
| | | | | | | | | | | | | | | | Some CUDA/HIP constants fail on device with `constexpr` since they internally rely on non-constexpr functions, e.g. ``` \#define CUDART_INF_F __int_as_float(0x7f800000) ``` This fails for cuda-clang (though passes with nvcc). These constants are currently used by `device::numeric_limits`. For portability, we need to remove `constexpr` from the affected functions. For C++11 or higher, we should be able to rely on the `std::numeric_limits` versions anyways, since the methods themselves are now `constexpr`, so should be supported on device (clang/hipcc natively, nvcc with `--expr-relaxed-constexpr`).
* Use Index type in loop over coefficients.Gravatar Antonio Sanchez2021-03-29
| | | | | Previously was `int`. Brought up by Kyle Snow (Polaris Geospatial Services) on the mailing list.
* Eliminate `round_impl` double-promotion warnings for c++03.Gravatar Antonio Sanchez2021-03-25
|
* Un-defining EIGEN_HAS_CONSTEXPR on the HIP platformGravatar Deven Desai2021-03-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The Eigen unit-tests started failing on the HIP/ROCm platform, after the following commit https://gitlab.com/libeigen/eigen/-/commit/e7b8643d70dfbb02ad94186169a8f16041f05bc2 ``` In file included from /home/rocm-user/eigen/test/main.h:360: In file included from /home/rocm-user/eigen/Eigen/QR:11: In file included from /home/rocm-user/eigen/Eigen/Core:162: /home/rocm-user/eigen/Eigen/src/Core/util/Meta.h:300:17: error: constexpr function never produces a constant expression [-Winvalid-constexpr] static float (max)() { ^ /home/rocm-user/eigen/Eigen/src/Core/util/Meta.h:304:12: note: non-constexpr function '__int_as_float' cannot be used in a constant expression return HIPRT_MAX_NORMAL_F; ^ /home/rocm-user/eigen/Eigen/src/Core/arch/HIP/hcc/math_constants.h:14:28: note: expanded from macro 'HIPRT_MAX_NORMAL_F' #define HIPRT_MAX_NORMAL_F __int_as_float(0x7f7fffff) ^ /opt/rocm/hip/include/hip/hcc_detail/device_functions.h:913:32: note: declared here __device__ static inline float __int_as_float(int x) { ^ ``` The problem seems to that some of the constants defined in the HIP `math_constants.h` have a call to `__int_as_float` routine which is not declared `constexpr` in the HIP runtime header file. Working around this issue for now, be skipping the const_expr support (enabled via the above commit) on HIP
* Fixed performance issues for complex VSX and P10 MMA in gebp_kernel (level 3).Gravatar Chip Kerchner2021-03-25
|
* Revert "Revert "Adds EIGEN_CONSTEXPR and EIGEN_NOEXCEPT to rows(), cols(), ↵Gravatar Steve Bronder2021-03-24
| | | | | | innerStride(), outerStride(), and size()"" This reverts commit 5f0b4a4010af4cbf6161a0d1a03a747addc44a5d.
* Eliminate mixingtypes_7 warning.Gravatar Antonio Sanchez2021-03-24
| | | | | | `g_called` is not used in subtest 7, so was generating a `-Wunneeded-internal-declaration` warnings. Here we silence it by initializing the static variable.
* Revert "Uses _mm512_abs_pd for Packet8d pabs"Gravatar Christoph Hertzberg2021-03-23
| | | This reverts commit f019b97aca82071f35726b1aaebf1c598770f0f5
* Re-enable CI for PowerGravatar David Tellenbach2021-03-22
|
* Remove yet another comma at end of enumGravatar David Tellenbach2021-03-18
|
* Uses _mm512_abs_pd for Packet8d pabsGravatar Steve Bronder2021-03-18
|
* Split test commainitializer into two substestsGravatar David Tellenbach2021-03-18
|
* Use singleton pattern for static registered tests.Gravatar Antonio Sanchez2021-03-18
| | | | | | | | | | The original fails with nvcc+msvc - there's a static order of initialization issue leading to registered tests being cleared. The test then fails on ``` VERIFY(EigenTest::all().size()>0); ``` since `EigenTest` no longer contains any tests. The singleton pattern fixes this.
* Proposed fix for issue #2187Gravatar Niek Bouman2021-03-18
|
* Augment NumTraits with min/max_exponent() again.Gravatar Antonio Sanchez2021-03-16
| | | | | | | | | | | | Replace usage of `std::numeric_limits<...>::min/max_exponent` in codebase where possible. Also replaced some other `numeric_limits` usages in affected tests with the `NumTraits` equivalent. The previous MR !443 failed for c++03 due to lack of `constexpr`. Because of this, we need to keep around the `std::numeric_limits` version in enum expressions until the switch to c++11. Fixes #2148
* Fix another warning on missing commasGravatar David Tellenbach2021-03-17
|
* Revert "Augment NumTraits with min/max_exponent()."Gravatar David Tellenbach2021-03-17
| | | | This reverts commit 75ce9cd2a7aefaaea8543e2db14ce4dc149eeb03.
* Augment NumTraits with min/max_exponent().Gravatar Antonio Sanchez2021-03-17
| | | | | | | | Replace usage of `std::numeric_limits<...>::min/max_exponent` in codebase. Also replaced some other `numeric_limits` usages in affected tests with the `NumTraits` equivalent. Fixes #2148
* Silence warning on comma at end of enumerator listGravatar David Tellenbach2021-03-17
|
* Updated SelfAdjointEigenSolver documentation to include that the ↵Gravatar Theo Fletcher2021-03-16
| | | | eigenvectors matrix is unitary.
* Add NaN propagation options to minCoeff/maxCoeff visitors.Gravatar Rasmus Munk Larsen2021-03-16
|
* Fixed output of complex matricesGravatar Jens Wehner2021-03-15
|
* Add fmod(half, half).Gravatar Antonio Sanchez2021-03-15
| | | | This is to support TensorFlow's `tf.math.floormod` for half.
* Fix numext::round pre c++11 for large inputs.Gravatar Antonio Sanchez2021-03-15
| | | | | | | | This is to resolve an issue for large inputs when +0.5 can actually lead to +1 if the input doesn't have enough precision to resolve the addition - leading to an off-by-one error. See discussion on 9a663973.
* Fix pround and add printGravatar Chip Kerchner2021-03-15
|
* Fix NVCC+ICC issues.Gravatar Antonio Sanchez2021-03-15
| | | | | | | | | | | | | | | | | | | | | | | | NVCC does not understand `__forceinline`, so we need to use `inline` when compiling for GPU. ICC specializes `std::complex` operators for `float` and `double` by default, which cannot be used on device and conflict with Eigen's workaround in CUDA/Complex.h. This can be prevented by defining `_OVERRIDE_COMPLEX_SPECIALIZATION_` before including `<complex>`. Added this define to the tests and to `Eigen/Core`, but this will not work if the user includes `<complex>` before `<Eigen/Core>`. ICC also seems to generate a duplicate `Map` symbol in `PlainObjectBase`: ``` error: "Map" has already been declared in the current scope static ConstMapType Map(const Scalar *data) ``` I tracked this down to `friend class Eigen::Map`. Putting the `friend` statements at the bottom of the class seems to resolve this issue. Fixes #2180
* Add increment/decrement operators to Eigen::half.Gravatar Antonio Sanchez2021-03-15
| | | | | This is for consistency with bfloat16, and to support initialization with `std::iota`.
* Bump up rand histogram threshold.Gravatar Antonio Sanchez2021-03-10
| | | | | | | The previous one sometimes fails for MSVC which has a poor random number generator. Fixes #2182
* Disable EIGEN_OPTIMIZATION_BARRIER for PPC clang.Gravatar Antonio Sanchez2021-03-10
| | | | | Doesn't seem to correctly select the register type, and most types lead to compiler crashes.
* Re-implement move assignments.Gravatar Antonio Sanchez2021-03-10
| | | | | | | | | | | | | | | The original swap approach leads to potential undefined behavior (reading uninitialized memory) and results in unnecessary copying of data for static storage. Here we pass down the move assignment to the underlying storage. Static storage does a one-way copy, dynamic storage does a swap. Modified the tests to no longer read from the moved-from matrix/tensor, since that can lead to UB. Added a test to ensure we do not access uninitialized memory in a move. Fixes: #2119
* [MSVC-specific] Define EIGEN_ARCH_x86_64 for native x64 (_M_X64 is defined ↵Gravatar Ben Niu2021-03-10
| | | | and _M_ARM64EC is not), and define EIGEN_ARCH_ARM64 for both the native ARM64 (_M_ARM64 is defined) or ARM64EC (_M_ARM64EC is defined). _M_ARM64EC is defined when the code is compiled by MSVC for ARM64EC, a new ARM64 ABI designed to be compatible with x64 application emulation on ARM64. If _M_ARM64EC is defined, _M_X64 and _M_AMD64 are also defined, so x64-specific code (especially intrinsics) is also compiled to ARM64 instructions (compliant with the ARM64EC ABI) for maximum x64 compatibility. Although a majority of x64-specific intrinsics can emulated by ARM64 instructions, it is still a good to simply recompile the native ARM64 code paths to ARM64EC for pure computation tasks, for performance reasons.
* Fix ambiguous call to CUDA __half constructor.Gravatar Antonio Sanchez2021-03-08
|
* Fix typo: DEVICE -> GPUGravatar Antonio Sanchez2021-03-08
|
* Fix non-trivial Half constructor for CUDA.Gravatar Antonio Sanchez2021-03-08
| | | | | | | | Both CUDA and HIP require trivial default constructors for types used in shared memory. Otherwise failing with ``` error: initialization is not supported for __shared__ variables. ```
* Revert stack allocation limit change that crept in.Gravatar Antonio Sanchez2021-03-05
| | | | This was accidentally introduced when copying changes between repos.
* Changing the Eigen::half implementation for HIPGravatar Deven Desai2021-03-05
| | | | | | | | | | | | | | | | Currently, when compiling with HIP, Eigen::half is derived from the `__half_raw` struct that is defined within the hip_fp16.h header file. This is true for both the "host" compile phase and the "device" compile phase. This was causing a very hard to detect bug in the ROCm TensorFlow build. In the ROCm Tensorflow build, * files that do not contain ant GPU code get compiled via gcc, and * files that contnain GPU code get compiled via hipcc. In certain case, we have a function that is defined in a file that is compiled by hipcc, and is called in a file that is compiled by gcc. If such a function had Eigen::half has a "pass-by-value" argument, its value was getting corrupted, when received by the function. The reason for this seems to be that for the gcc compile, Eigen::half is derived from a `__half_raw` struct that has `uint16_t` as the data-store, and for hipcc the `__half_raw` implementation uses `_Float16` as the data store. There is some ABI incompatibility between gcc / hipcc (which is essentially latest clang), which results in the Eigen::half value (which is correct at the call-site) getting randomly corrupted when passed to the function. Changing the Eigen::half argument to be "pass by reference" seems to workaround the error. In order to fix it such that we do not run into it again in TF, this commit changes the Eigne::half implementation to use the same `__half_raw` implementation as the non-GPU compile, during host compile phase of the hipcc compile.
* Define EIGEN_CPLUSPLUS and replace most __cplusplus checks.Gravatar Antonio Sanchez2021-03-05
| | | | | | | | | | | | | | | The macro `__cplusplus` is not defined correctly in MSVC unless building with the the `/Zc:__cplusplus` flag. Instead, it defines `_MSVC_LANG` to the specified c++ standard version number. Here we introduce `EIGEN_CPLUSPLUS` which will contain the c++ version number both for MSVC and otherwise. This simplifies checks for supported features. Also replaced most instances of standard version checking via `__cplusplus` with the existing `EIGEN_COMP_CXXVER` macro for better clarity. Fixes: #2170
* Fix rint SSE/NEON again, using optimization barrier.Gravatar Antonio Sanchez2021-03-05
| | | | | | | | | | | | | | | | | | | | This is a new version of !423, which failed for MSVC. Defined `EIGEN_OPTIMIZATION_BARRIER(X)` that uses inline assembly to prevent operations involving `X` from crossing that barrier. Should work on most `GNUC` compatible compilers (MSVC doesn't seem to need this). This is a modified version adapted from what was used in `psincos_float` and tested on more platforms (see #1674, https://godbolt.org/z/73ezTG). Modified `rint` to use the barrier to prevent the add/subtract rounding trick from being optimized away. Also fixed an edge case for large inputs that get bumped up a power of two and ends up rounding away more than just the fractional part. If we are over `2^digits` then just return the input. This edge case was missed in the test since the test was comparing approximate equality, which was still satisfied. Adding a strict equality option catches it.
* Revert "Adds EIGEN_CONSTEXPR and EIGEN_NOEXCEPT to rows(), cols(), ↵Gravatar David Tellenbach2021-03-05
| | | | | | | innerStride(), outerStride(), and size()" This reverts commit 6cbb3038ac48cb5fe17eba4dfbf26e3e798041f1 because it breaks clang-10 builds on x86 and aarch64 when C++11 is enabled.
* Adds EIGEN_CONSTEXPR and EIGEN_NOEXCEPT to rows(), cols(), innerStride(), ↵Gravatar Steve Bronder2021-03-04
| | | | outerStride(), and size()