eigen - C++ library for linear algebra

	Commit message	Author	Age
...
*	Revert "Revert "Adds EIGEN_CONSTEXPR and EIGEN_NOEXCEPT to rows(), cols(), ↵	Steve Bronder	2021-03-24
\| \| \| \| \| \|	innerStride(), outerStride(), and size()"" This reverts commit 5f0b4a4010af4cbf6161a0d1a03a747addc44a5d.
*	Revert "Uses _mm512_abs_pd for Packet8d pabs"	Christoph Hertzberg	2021-03-23
\| \| \|	This reverts commit f019b97aca82071f35726b1aaebf1c598770f0f5
*	Remove yet another comma at end of enum	David Tellenbach	2021-03-18
\|
*	Uses _mm512_abs_pd for Packet8d pabs	Steve Bronder	2021-03-18
\|
*	Proposed fix for issue #2187	Niek Bouman	2021-03-18
\|
*	Augment NumTraits with min/max_exponent() again.	Antonio Sanchez	2021-03-16
\| \| \| \| \| \| \| \| \| \| \| \|	Replace usage of `std::numeric_limits<...>::min/max_exponent` in codebase where possible. Also replaced some other `numeric_limits` usages in affected tests with the `NumTraits` equivalent. The previous MR !443 failed for c++03 due to lack of `constexpr`. Because of this, we need to keep around the `std::numeric_limits` version in enum expressions until the switch to c++11. Fixes #2148
*	Fix another warning on missing commas	David Tellenbach	2021-03-17
\|
*	Revert "Augment NumTraits with min/max_exponent()."	David Tellenbach	2021-03-17
\| \| \| \|	This reverts commit 75ce9cd2a7aefaaea8543e2db14ce4dc149eeb03.
*	Augment NumTraits with min/max_exponent().	Antonio Sanchez	2021-03-17
\| \| \| \| \| \| \| \|	Replace usage of `std::numeric_limits<...>::min/max_exponent` in codebase. Also replaced some other `numeric_limits` usages in affected tests with the `NumTraits` equivalent. Fixes #2148
*	Silence warning on comma at end of enumerator list	David Tellenbach	2021-03-17
\|
*	Updated SelfAdjointEigenSolver documentation to include that the ↵	Theo Fletcher	2021-03-16
\| \| \| \|	eigenvectors matrix is unitary.
*	Add NaN propagation options to minCoeff/maxCoeff visitors.	Rasmus Munk Larsen	2021-03-16
\|
*	Add fmod(half, half).	Antonio Sanchez	2021-03-15
\| \| \| \|	This is to support TensorFlow's `tf.math.floormod` for half.
*	Fix numext::round pre c++11 for large inputs.	Antonio Sanchez	2021-03-15
\| \| \| \| \| \| \| \|	This is to resolve an issue for large inputs when +0.5 can actually lead to +1 if the input doesn't have enough precision to resolve the addition - leading to an off-by-one error. See discussion on 9a663973.
*	Fix pround and add print	Chip Kerchner	2021-03-15
\|
*	Fix NVCC+ICC issues.	Antonio Sanchez	2021-03-15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	NVCC does not understand `__forceinline`, so we need to use `inline` when compiling for GPU. ICC specializes `std::complex` operators for `float` and `double` by default, which cannot be used on device and conflict with Eigen's workaround in CUDA/Complex.h. This can be prevented by defining `_OVERRIDE_COMPLEX_SPECIALIZATION_` before including `<complex>`. Added this define to the tests and to `Eigen/Core`, but this will not work if the user includes `<complex>` before `<Eigen/Core>`. ICC also seems to generate a duplicate `Map` symbol in `PlainObjectBase`: ``` error: "Map" has already been declared in the current scope static ConstMapType Map(const Scalar *data) ``` I tracked this down to `friend class Eigen::Map`. Putting the `friend` statements at the bottom of the class seems to resolve this issue. Fixes #2180
*	Add increment/decrement operators to Eigen::half.	Antonio Sanchez	2021-03-15
\| \| \| \| \|	This is for consistency with bfloat16, and to support initialization with `std::iota`.
*	Disable EIGEN_OPTIMIZATION_BARRIER for PPC clang.	Antonio Sanchez	2021-03-10
\| \| \| \| \|	Doesn't seem to correctly select the register type, and most types lead to compiler crashes.
*	Re-implement move assignments.	Antonio Sanchez	2021-03-10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The original swap approach leads to potential undefined behavior (reading uninitialized memory) and results in unnecessary copying of data for static storage. Here we pass down the move assignment to the underlying storage. Static storage does a one-way copy, dynamic storage does a swap. Modified the tests to no longer read from the moved-from matrix/tensor, since that can lead to UB. Added a test to ensure we do not access uninitialized memory in a move. Fixes: #2119
*	[MSVC-specific] Define EIGEN_ARCH_x86_64 for native x64 (_M_X64 is defined ↵	Ben Niu	2021-03-10
\| \| \| \|	and _M_ARM64EC is not), and define EIGEN_ARCH_ARM64 for both native ARM64 (_M_ARM64 is defined) and ARM64EC (_M_ARM64EC is defined). _M_ARM64EC is defined when the code is compiled by MSVC for ARM64EC, a new ARM64 ABI designed to be compatible with x64 application emulation on ARM64. If _M_ARM64EC is defined, _M_X64 and _M_AMD64 are also defined, so x64-specific code (especially intrinsics) is also compiled to ARM64 instructions (compliant with the ARM64EC ABI) for maximum x64 compatibility. Although a majority of x64-specific intrinsics can be emulated by ARM64 instructions, it is still a good idea to simply recompile the native ARM64 code paths to ARM64EC for pure computation tasks, for performance reasons.
*	Fix ambiguous call to CUDA __half constructor.	Antonio Sanchez	2021-03-08
\|
*	Fix typo: DEVICE -> GPU	Antonio Sanchez	2021-03-08
\|
*	Fix non-trivial Half constructor for CUDA.	Antonio Sanchez	2021-03-08
\| \| \| \| \| \| \| \|	Both CUDA and HIP require trivial default constructors for types used in shared memory. Otherwise compilation fails with ``` error: initialization is not supported for __shared__ variables. ```
*	Revert stack allocation limit change that crept in.	Antonio Sanchez	2021-03-05
\| \| \| \|	This was accidentally introduced when copying changes between repos.
*	Changing the Eigen::half implementation for HIP	Deven Desai	2021-03-05
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, when compiling with HIP, Eigen::half is derived from the `__half_raw` struct that is defined within the hip_fp16.h header file. This is true for both the "host" compile phase and the "device" compile phase. This was causing a very hard to detect bug in the ROCm TensorFlow build. In the ROCm TensorFlow build, * files that do not contain any GPU code get compiled via gcc, and * files that contain GPU code get compiled via hipcc. In certain cases, we have a function that is defined in a file that is compiled by hipcc, and is called in a file that is compiled by gcc. If such a function had Eigen::half as a "pass-by-value" argument, its value was getting corrupted when received by the function. The reason for this seems to be that for the gcc compile, Eigen::half is derived from a `__half_raw` struct that has `uint16_t` as the data store, and for hipcc the `__half_raw` implementation uses `_Float16` as the data store. There is some ABI incompatibility between gcc / hipcc (which is essentially latest clang), which results in the Eigen::half value (which is correct at the call-site) getting randomly corrupted when passed to the function. Changing the Eigen::half argument to be "pass by reference" seems to work around the error. In order to fix it such that we do not run into it again in TF, this commit changes the Eigen::half implementation to use the same `__half_raw` implementation as the non-GPU compile, during the host compile phase of the hipcc compile.
*	Define EIGEN_CPLUSPLUS and replace most __cplusplus checks.	Antonio Sanchez	2021-03-05
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The macro `__cplusplus` is not defined correctly in MSVC unless building with the `/Zc:__cplusplus` flag. Instead, it defines `_MSVC_LANG` to the specified c++ standard version number. Here we introduce `EIGEN_CPLUSPLUS` which will contain the c++ version number both for MSVC and otherwise. This simplifies checks for supported features. Also replaced most instances of standard version checking via `__cplusplus` with the existing `EIGEN_COMP_CXXVER` macro for better clarity. Fixes: #2170
*	Fix rint SSE/NEON again, using optimization barrier.	Antonio Sanchez	2021-03-05
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is a new version of !423, which failed for MSVC. Defined `EIGEN_OPTIMIZATION_BARRIER(X)` that uses inline assembly to prevent operations involving `X` from crossing that barrier. Should work on most `GNUC` compatible compilers (MSVC doesn't seem to need this). This is a modified version adapted from what was used in `psincos_float` and tested on more platforms (see #1674, https://godbolt.org/z/73ezTG). Modified `rint` to use the barrier to prevent the add/subtract rounding trick from being optimized away. Also fixed an edge case for large inputs that get bumped up a power of two and end up rounding away more than just the fractional part. If we are over `2^digits` then just return the input. This edge case was missed in the test since the test was comparing approximate equality, which was still satisfied. Adding a strict equality option catches it.
*	Revert "Adds EIGEN_CONSTEXPR and EIGEN_NOEXCEPT to rows(), cols(), ↵	David Tellenbach	2021-03-05
\| \| \| \| \| \| \|	innerStride(), outerStride(), and size()" This reverts commit 6cbb3038ac48cb5fe17eba4dfbf26e3e798041f1 because it breaks clang-10 builds on x86 and aarch64 when C++11 is enabled.
*	Adds EIGEN_CONSTEXPR and EIGEN_NOEXCEPT to rows(), cols(), innerStride(), ↵	Steve Bronder	2021-03-04
\| \| \| \|	outerStride(), and size()
*	Revert "Fix rint for SSE/NEON."	Antonio Sánchez	2021-03-03
\| \| \|	This reverts commit e72dfeb8b9fa5662831b5d0bb9d132521f9173dd
*	Fix rint for SSE/NEON.	Antonio Sanchez	2021-03-03
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	It seems sometimes with aggressive optimizations the combination `psub(padd(a, b), b)` trick to force rounding is compiled away. Here we replace with inline assembly to prevent this (I tried `volatile`, but that leads to additional loads from memory). Also fixed an edge case for large inputs `a` where adding `b` bumps the value up a power of two and ends up rounding away more than just the fractional part. If we are over `2^digits` then just return the input. This edge case was missed in the test since the test was comparing approximate equality, which was still satisfied. Adding a strict equality option catches it.
*	Add print for SSE/NEON, use NEON rounding intrinsics if available.	Antonio Sanchez	2021-02-27
\| \| \| \| \| \| \| \| \| \|	In SSE, by adding/subtracting 2^MantissaBits, we force rounding according to the current rounding mode. For NEON, we use the provided intrinsics for rint/floor/ceil if available (armv8). Related to #1969.
*	Make half/bfloat16 constructor take inputs by value, fix powerpc test.	Antonio Sanchez	2021-02-27
\| \| \| \| \| \| \| \| \| \| \| \|	Since `numeric_limits<half>::max_exponent` is a static inline constant, it cannot be directly passed by reference. This triggers a linker error in recent versions of `g++-powerpc64le`. Changing `half` to take inputs by value fixes this. Wrapping `max_exponent` with `int(...)` to make an addressable integer also fixes this and may help with other custom `Scalar` types down the road. Also eliminated some compile warnings for powerpc.
*	Remove unused include	Christoph Hertzberg	2021-02-27
\|
*	clang 10 aggressively warns about precision loss when converting int to ↵	Christoph Hertzberg	2021-02-27
\| \| \| \| \| \|	float (or long to double) (cherry picked from commit cd541ad52c8152340469cae210312c0e27829c8d)
*	Fix some enum-enum conversion warnings	Christoph Hertzberg	2021-02-27
\| \| \| \|	(cherry picked from commit 838f3d8ce22a5549ef10c7386fb03040721749a0)
*	Fixed/masked more implicit copy constructor warnings	Christoph Hertzberg	2021-02-27
\| \| \| \|	(cherry picked from commit 2883e91ce5a99c391fbf28e20160176b70854992)
*	Fix double-promotion warnings	Christoph Hertzberg	2021-02-27
\| \| \| \|	(cherry picked from commit c22c103e932e511e96645186831363585a44b7a3)
*	Fix NEON sqrt for 32-bit, add prsqrt.	Antonio Sanchez	2021-02-26
\| \| \| \| \| \| \| \| \| \| \| \|	With !406, we accidentally broke arm 32-bit NEON builds, since `vsqrt_f32` is only available for 64-bit. Here we add back the `rsqrt` implementation for 32-bit, relying on a `prsqrt` implementation with better handling of edge cases. Note that several of the 32-bit NEON packet tests are currently failing - either due to denormal handling (NEON versions flush to zero, but scalar paths don't) or due to accuracy (e.g. sin/cos).
*	Merge branch 'rmlarsen1/eigen-nan_prop'	Rasmus Munk Larsen	2021-02-26
\|\
\| *	Merge branch 'nan_prop' of https://gitlab.com/rmlarsen1/eigen into nan_prop	Rasmus Munk Larsen	2021-02-26
\| \|\
\| * \|	Add TODO.	Rasmus Munk Larsen	2021-02-26
\| \| \|
\| * \|	Defer default for minCoeff/maxCoeff to templated variant.	Rasmus Munk Larsen	2021-02-26
\| \| \|
* \| \|	Fix floor/ceil for NEON fp16.	Antonio Sanchez	2021-02-25
\| \| \| \| \| \| \| \| \| \| \| \|	Forgot to test this. Fixes bug introduced in !416.
* \| \|	Fix SSE/NEON pfloor/pceil for saturated values.	Antonio Sanchez	2021-02-25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The original will saturate if the input does not fit into an integer type. Here we fix this, returning the input if it doesn't have enough precision to have a fractional part. Also added `pceil` for NEON. Fixes #1969.
\| \| *	Fix indentation.	Rasmus Munk Larsen	2021-02-25
\| \| \|
\| \| *	Make it possible to specify NaN propagation strategy for maxCoeff/minCoeff ↵	Rasmus Munk Larsen	2021-02-25
\| \|/ \|/\| \| \| \| \|	reductions.
* \|	Fix clang compile when no MMA flags are set. Simplify MMA compiler detection.	Chip-Kerchner	2021-02-24
\| \|
\| *	Fix indentation.	Rasmus Munk Larsen	2021-02-24
\| \|
\| *	Make it possible to specify NaN propagation strategy for maxCoeff/minCoeff ↵	Rasmus Munk Larsen	2021-02-25
\|/ \| \| \|	reductions.