* Remove comma at end of enumerator list in NEON PacketMath. (David Tellenbach, 2020-12-10)
* Fix a typo in SparseMatrix documentation. (David Tellenbach, 2020-12-09)
  This fixes issue #2091.
* Implement vectorized complex square root. (Rasmus Munk Larsen, 2020-12-08)
  Closes #1905.

  Measured speedup for sqrt of `complex<float>` on Skylake:

  SSE:
  ```
  name                      old time/op  new time/op  delta
  BM_eigen_sqrt_ctype/1     49.4ns ± 0%  54.3ns ± 0%  +10.01%
  BM_eigen_sqrt_ctype/8      332ns ± 0%    50ns ± 1%  -84.97%
  BM_eigen_sqrt_ctype/64    2.81µs ± 1%  0.38µs ± 0%  -86.49%
  BM_eigen_sqrt_ctype/512   23.8µs ± 0%   3.0µs ± 0%  -87.32%
  BM_eigen_sqrt_ctype/4k     202µs ± 0%    24µs ± 2%  -88.03%
  BM_eigen_sqrt_ctype/32k   1.63ms ± 0%  0.19ms ± 0%  -88.18%
  BM_eigen_sqrt_ctype/256k  13.0ms ± 0%   1.5ms ± 1%  -88.20%
  BM_eigen_sqrt_ctype/1M    52.1ms ± 0%   6.2ms ± 0%  -88.18%
  ```

  AVX2:
  ```
  name                      old cpu/op   new cpu/op   delta
  BM_eigen_sqrt_ctype/1     53.6ns ± 0%  55.6ns ± 0%   +3.71%
  BM_eigen_sqrt_ctype/8      334ns ± 0%    27ns ± 0%  -91.86%
  BM_eigen_sqrt_ctype/64    2.79µs ± 0%  0.22µs ± 2%  -92.28%
  BM_eigen_sqrt_ctype/512   23.8µs ± 1%   1.7µs ± 1%  -92.81%
  BM_eigen_sqrt_ctype/4k     201µs ± 0%    14µs ± 1%  -93.24%
  BM_eigen_sqrt_ctype/32k   1.62ms ± 0%  0.11ms ± 1%  -93.29%
  BM_eigen_sqrt_ctype/256k  13.0ms ± 0%   0.9ms ± 1%  -93.31%
  BM_eigen_sqrt_ctype/1M    52.0ms ± 0%   3.5ms ± 1%  -93.31%
  ```

  AVX512:
  ```
  name                      old cpu/op   new cpu/op   delta
  BM_eigen_sqrt_ctype/1     53.7ns ± 0%  56.2ns ± 1%   +4.75%
  BM_eigen_sqrt_ctype/8      334ns ± 0%    18ns ± 2%  -94.63%
  BM_eigen_sqrt_ctype/64    2.79µs ± 0%  0.12µs ± 1%  -95.54%
  BM_eigen_sqrt_ctype/512   23.9µs ± 1%   1.0µs ± 1%  -95.89%
  BM_eigen_sqrt_ctype/4k     202µs ± 0%     8µs ± 1%  -96.13%
  BM_eigen_sqrt_ctype/32k   1.63ms ± 0%  0.06ms ± 1%  -96.15%
  BM_eigen_sqrt_ctype/256k  13.0ms ± 0%   0.5ms ± 4%  -96.11%
  BM_eigen_sqrt_ctype/1M    52.1ms ± 0%   2.0ms ± 1%  -96.13%
  ```
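  For reference, a scalar sketch of the branch-free identity such a vectorized
  kernel typically implements (an illustrative reconstruction, not the
  committed packet code; special-case handling for zero/inf/NaN inputs is
  omitted):
  ```
  #include <cmath>
  #include <complex>

  // sqrt(x + i*y) via the standard real/imaginary decomposition:
  //   w = sqrt((|x| + hypot(x, y)) / 2)
  //   x >= 0:  result = (w, y / (2w))
  //   x <  0:  result = (|y| / (2w), copysign(w, y))
  std::complex<float> complex_sqrt(const std::complex<float>& z) {
    const float x = z.real(), y = z.imag();
    const float w = std::sqrt(0.5f * (std::abs(x) + std::hypot(x, y)));
    if (x >= 0.0f) return {w, y / (2.0f * w)};
    return {std::abs(y) / (2.0f * w), std::copysign(w, y)};
  }
  ```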
* Fix host/device calls for __half. (Antonio Sanchez, 2020-12-08)
  The previous code had `__host__ __device__` functions calling `__device__`-only
  functions (e.g. `__low2half`), which caused build failures in TensorFlow.
  Also simplified the `#ifdef` guards to make them clearer.
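  A minimal sketch of the guard pattern involved (`__low2half` is a real CUDA
  intrinsic; the helper shown is hypothetical):
  ```
  #include <cuda_fp16.h>

  // A __host__ __device__ function may only call __device__-only intrinsics
  // such as __low2half on the device compilation pass, so the host path
  // needs its own fallback.
  __host__ __device__ inline __half first_half(const __half2& v) {
  #if defined(__CUDA_ARCH__)
    return __low2half(v);  // device: intrinsic is available
  #else
    return v.x;            // host: plain member access
  #endif
  }
  ```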
* Enable PropagateNaN and PropagateNumbers for NEON. (Everton Constantino, 2020-12-08)
  Also adds propagate tests to bfloat16.
* Fix unused warning on new `dense_assignment_loop` impl. (Antonio Sanchez, 2020-12-07)
* Add specialization for compile-time zero-sized dense assignment. (Antonio Sanchez, 2020-12-07)
  In the current `dense_assignment_loop` implementations, if the destination's
  inner or outer size is zero at compile time and the kernel involves a
  product, we get a compile error (#2080), triggered by attempting to multiply
  a non-existent row by a column (or vice versa).
  To address this, we add a specialization for zero-sized assignments
  (`AllAtOnceTraversal`) that evaluates to a no-op, plus a static check to
  ensure the size is in fact zero. This now seems to be the only existing use
  of `AllAtOnceTraversal`.
  Fixes #2080.
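  A hedged sketch of what such a no-op specialization looks like (simplified;
  the real kernel/traversal machinery carries more template parameters):
  ```
  // Zero-sized destinations have nothing to assign, so the loop degenerates
  // to a no-op; the static_assert documents and enforces the precondition.
  template <typename Kernel>
  struct dense_assignment_loop<Kernel, AllAtOnceTraversal, NoUnrolling> {
    static void run(Kernel& /*kernel*/) {
      using Dst = typename Kernel::DstEvaluatorType::XprType;
      static_assert(int(Dst::SizeAtCompileTime) == 0,
                    "AllAtOnceTraversal is only for zero-sized assignments");
      // Intentionally empty: a 0 x N (or N x 0) assignment performs no work.
    }
  };
  ```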
* Clean up `#if`s in GPU PacketMath. (Antonio Sanchez, 2020-12-04)
  Removed redundant checks and redundant code for CUDA/HIP.
  Note: there are several issues here with calling `__device__` functions from
  `__host__ __device__` functions, in particular `__low2half`. We do not
  address that here; this file is only modified enough to get our current
  tests to compile.
  Fixes #1847.
* Add log2() to Eigen. (Rasmus Munk Larsen, 2020-12-04)
* Fix bad NEON fp16 check. (Antonio Sanchez, 2020-12-04)
* Special function implementations for half/bfloat16 packets. (Antonio Sanchez, 2020-12-04)
  The current implementations fail to consider half-float packets, only
  half-float scalars. Added specializations for packets on AVX, AVX512 and
  NEON, and added tests to `special_packetmath`.
  The current `special_functions` tests would fail for half and bfloat16 due
  to lack of precision. The NEON tests also fail with precision issues and due
  to different handling of `sqrt(inf)`, so the bessel and ndtri special
  functions have been disabled there. Tested with AVX and AVX512.
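  The usual pattern for these packet specializations is to widen to float,
  evaluate, and narrow back; a hedged sketch for an AVX half packet (the
  helper names `half2float`, `float2half`, and `perf` follow Eigen's AVX
  conventions, but treat the exact signatures as illustrative):
  ```
  // erf() for a packet of 8 halves: upcast to 8 floats, reuse the float
  // packet implementation, downcast the result. Precision is bounded by the
  // final narrowing, which is why scalar-precision tests are too strict here.
  template <>
  EIGEN_STRONG_INLINE Packet8h perf<Packet8h>(const Packet8h& a) {
    return float2half(perf<Packet8f>(half2float(a)));
  }
  ```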
* Remove duplicate #if clause. (David Tellenbach, 2020-12-04)
* Fix shfl* macros for CUDA/HIP. (Antonio Sanchez, 2020-12-04)
  The `shfl*` functions are `__device__`-only; adjusted the `#ifdef`s so our
  wrappers are defined whenever the corresponding CUDA/HIP ones are. Also
  changed the HIP/CUDA<9.0 versions to cast to int instead of doing the
  `half` <-> `float` conversion.
  Fixes #2083.
* Fix the 'prefetch' function, which did not work correctly on the win64 platform. (shrek1402, 2020-12-04)
* Revert "Add log2() operator to Eigen"Gravatar Rasmus Munk Larsen2020-12-03
| | | | This reverts commit 4d91519a9be061da5d300079fca17dd0b9328050.
* Add log2() operator to Eigen. (Rasmus Munk Larsen, 2020-12-03)
* Small cleanup of generic plog implementations. (Rasmus Munk Larsen, 2020-12-03)
  Adding the term e*ln(2) was split into two steps for no obvious reason; this
  dates back to the original Cephes code from which the algorithm is adapted.
  It appears that this was done in Cephes to prevent the compiler from
  reordering the addition of the three terms in the approximation
    log(1+x) ~= x - 0.5*x^2 + x^3*P(x)/Q(x),
  which must be added in reverse order since |x| < (sqrt(2)-1).
  This allows rewriting the code to just 2 pmadd and 1 padd instructions,
  which speeds up the code by 5-7% on a Skylake processor.
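  One way the regrouping can be expressed with fused multiply-adds (a hedged
  sketch; `r` stands for the evaluated rational term P(x)/Q(x), and the exact
  grouping in the committed code may differ):
  ```
  // log(1+x) ~= x - 0.5*x^2 + x^3*r  regroups as  x + x^2*(x*r - 0.5),
  // and the exponent contribution e*ln(2) folds into a final multiply-add.
  template <typename Packet>
  Packet plog_tail(const Packet& x, const Packet& x2,
                   const Packet& r, const Packet& e) {
    Packet t = pmadd(x, r, pset1<Packet>(-0.5f));     // x*r - 0.5
    Packet y = pmadd(x2, t, x);                       // x + x^2*(x*r - 0.5)
    return pmadd(e, pset1<Packet>(0.693147181f), y);  // + e*ln(2)
  }
  ```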
* Include <chrono> in main.h for C++11. (Antonio Sanchez, 2020-12-03)
  Hack to fix the tensor tests, since min/max are overridden by `main.h`.
* Clean up the Tensor header and get rid of the EIGEN_SLEEP macro. (Rasmus Munk Larsen, 2020-12-02)
* Fix typo in `F32MaskToBf16Mask`. (Antonio Sanchez, 2020-12-02)
* Fix NEON cmp* functions for bf16. (Antonio Sanchez, 2020-12-02)
  The current implementation corrupts the comparison masks when converting
  from float back to bfloat16: the resulting masks are then no longer all
  zeros or all ones, which breaks when used with `pselect` (e.g. in
  `pmin<PropagateNumbers>`). This was causing `packetmath_15` to fail on arm.
  Introducing a simple `F32MaskToBf16Mask` corrects this (it takes the lower
  16 bits of each float mask).
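  A sketch of the idea, written against raw NEON types (close to what the
  committed code does, but verify against the actual source):
  ```
  #include <arm_neon.h>

  // A float32 comparison mask is all-ones or all-zeros per 32-bit lane, so
  // narrowing each lane to its low 16 bits yields a valid bfloat16 mask.
  inline uint16x4_t F32MaskToBf16Mask(float32x4_t mask) {
    return vmovn_u32(vreinterpretq_u32_f32(mask));
  }
  ```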
* Implement CUDA __shfl* for Eigen::half. (Antonio Sanchez, 2020-12-01)
  Prior to this fix, `TensorContractionGpu` and the `cxx11_tensor_of_float16_gpu`
  test are broken, as are several ops in TensorFlow. The GPU functions
  `__shfl*` became ambiguous now that `Eigen::half` implicitly converts to
  float. Here we add the required specializations.
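  A hedged sketch of one such specialization (one of several `__shfl*`
  variants; the committed change covers them all, and the exact qualifiers
  may differ):
  ```
  // With Eigen::half implicitly convertible to float, a bare __shfl_sync
  // call is ambiguous between the float and __half overloads. An exact-match
  // overload for Eigen::half resolves it by going through __half explicitly.
  __device__ inline Eigen::half __shfl_sync(unsigned mask, Eigen::half var,
                                            int srcLane, int width = warpSize) {
    const __half h = var;  // exact conversion to the underlying __half
    return static_cast<Eigen::half>(::__shfl_sync(mask, h, srcLane, width));
  }
  ```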
* Fix a few issues for AVX512. (Rasmus Munk Larsen, 2020-12-01)
  This change enables vectorized versions of log, exp, log1p, expm1 when
  AVX512DQ is not available.
* Fix #2077, `EIGEN_CONSTEXPR` in `Half`. (Antonio Sanchez, 2020-12-01)
  `bit_cast` cannot be `constexpr`, so we need to remove `EIGEN_CONSTEXPR`
  from `raw_half_as_uint16(...)`. This shouldn't affect anything else, since
  it is only used in `bit_cast<uint16_t, half>()`, which is not itself
  `constexpr`.
  Fixes #2077.
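  For context, `bit_cast` in this vintage of Eigen is `memcpy`-based, roughly
  like the sketch below, and `memcpy` cannot appear in a `constexpr` context
  (C++20's `std::bit_cast` is what eventually makes a constexpr version
  possible):
  ```
  #include <cstring>

  // Reinterpret the object representation of Src as Tgt. memcpy is the only
  // portable way to do this pre-C++20, and it is not usable in constexpr.
  template <typename Tgt, typename Src>
  Tgt bit_cast(const Src& src) {
    static_assert(sizeof(Tgt) == sizeof(Src), "sizes must match");
    Tgt tgt;
    std::memcpy(&tgt, &src, sizeof(Tgt));
    return tgt;
  }
  ```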
* Add EIGEN_DEVICE_FUNC to methods. (acxz, 2020-12-01)
* AVX512 missing ops. (Antonio Sanchez, 2020-11-30)
  This allows the `packetmath` tests to pass for AVX512 on Skylake. Made
  `half` and `bfloat16` consistent in terms of the ops they support. Note the
  `log` tests are currently disabled for `bfloat16` since they fail due to
  poor precision (they were previously disabled for `Packet8bf` via test
  function specialization; I just removed that specialization and disabled it
  in the generic test).
* Fix typo in doc. (Florian Maurin, 2020-11-30)
* Workaround for doxygen class template titles losing the template part of the class signature. (Jim Lersch, 2020-11-27)
  Caused by a problem with forward declarations, probably doxygen bug #7689.
  It is confirmed to be fixed in doxygen >= 1.8.19.
* Fix doxygen class blocks that were not associated with the correct classes. (Jim Lersch, 2020-11-27)
* Include CMakeDependentOption to be able to use cmake_dependent_option. (David Tellenbach, 2020-11-27)
* Make inclusion of doc sub-directory optional by adjusting options. (Bowie Owens, 2020-11-27)
  Allows exclusion of doc and related targets to help when using Eigen via
  add_subdirectory().
  Requested by: https://gitlab.com/libeigen/eigen/-/issues/1842
  Also required making EIGEN_TEST_BUILD_DOCUMENTATION a dependent option on
  EIGEN_BUILD_DOC. This ensures documentation targets are properly defined
  when EIGEN_TEST_BUILD_DOCUMENTATION is ON.
* Check that include dirs are set. (filippobrizzi, 2020-11-26)
* Fix some packet functions in the IBM ZVector packet math. (Andreas Krebbel, 2020-11-25)
* Revert "Fix Half NaN definition and test."Gravatar Rasmus Munk Larsen2020-11-24
| | | | This reverts commit c770746d709686ef2b8b652616d9232f9b028e78.
* Fix Half NaN definition and test. (Rasmus Munk Larsen, 2020-11-24)
  The `half_float` test was failing with `-mcpu=cortex-a55` (native `__fp16`)
  due to a bad NaN bit-pattern comparison (in the case of casting a float to
  `__fp16`, the signaling NaN is quieted). There was also an inconsistency
  between `numeric_limits<half>::quiet_NaN()` and `NumTraits::quiet_NaN()`.
  Here we correct the inconsistency and compare NaNs according to the
  IEEE 754 definition. Also modified the `bfloat16_float` test to match.
  Tested with `cortex-a53` and `cortex-a55`.
* Fix boolean float conversion and product warnings. (Antonio Sanchez, 2020-11-24)
  This fixes some gcc warnings such as:
  ```
  Eigen/src/Core/GenericPacketMath.h:655:63: warning: implicit conversion
  turns floating-point number into bool: 'typename
  __gnu_cxx::__enable_if<__is_integer<bool>::__value, double>::__type' (aka
  'double') to 'bool' [-Wimplicit-conversion-floating-point-to-bool]
    Packet psqrt(const Packet& a) { EIGEN_USING_STD(sqrt); return sqrt(a); }
  ```
  Details:
  - Added `scalar_sqrt_op<bool>` (`-Wimplicit-conversion-floating-point-to-bool`).
  - Added `scalar_square_op<bool>` and `scalar_cube_op<bool>` specializations
    (`-Wint-in-bool-context`).
  - Deprecated the above specialized ops for bool.
  - Modified `cxx11_tensor_block_eval` to specialize its generator for
    booleans (`-Wint-in-bool-context`) and to use `abs` instead of `square`
    to avoid the deprecated bool ops.
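  A hedged sketch of the kind of bool specialization added (for bool, sqrt is
  the identity, so the op never touches floating point):
  ```
  // sqrt(false) == false and sqrt(true) == true, so the bool specialization
  // returns its argument unchanged, silencing the float->bool warning. It is
  // deprecated, as taking sqrt of a bool is almost certainly a user error.
  template <>
  struct scalar_sqrt_op<bool> {
    EIGEN_DEPRECATED EIGEN_DEVICE_FUNC
    bool operator()(const bool& a) const { return a; }
  };
  ```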
* Implement missing AVX half ops. (Antonio Sanchez, 2020-11-24)
  Minimal implementation of AVX `Eigen::half` ops to bring them in line with
  `bfloat16`. Allows `packetmath_13` to pass. Also adjusted the `bfloat16`
  packet traits to match the supported set of ops (e.g. Bessel is not
  actually implemented).
* Fix Half NaN definition and test. (Antonio Sanchez, 2020-11-23)
  The `half_float` test was failing with `-mcpu=cortex-a55` (native `__fp16`)
  due to a bad NaN bit-pattern comparison (in the case of casting a float to
  `__fp16`, the signaling NaN is quieted). There was also an inconsistency
  between `numeric_limits<half>::quiet_NaN()` and `NumTraits::quiet_NaN()`.
  Here we correct the inconsistency and compare NaNs according to the
  IEEE 754 definition. Also modified the `bfloat16_float` test to match.
  Tested with `cortex-a53` and `cortex-a55`.
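  Comparing per IEEE 754 means testing for the NaN *class* rather than an
  exact bit pattern; a sketch for the half layout (1 sign, 5 exponent,
  10 mantissa bits):
  ```
  #include <cstdint>

  // A half is NaN iff its exponent field is all ones and its mantissa is
  // nonzero. Quiet and signaling NaNs differ only in the mantissa's top bit,
  // so exact bit-pattern comparisons spuriously fail once a NaN is quieted.
  bool half_bits_is_nan(std::uint16_t bits) {
    return (bits & 0x7c00) == 0x7c00 && (bits & 0x03ff) != 0;
  }
  ```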
* Update AVX half packets, disable test. (Antonio Sanchez, 2020-11-21)
  The AVX half implementation is incomplete, causing the `packetmath_13` test
  to fail. This disables the test. Also refactored the existing AVX
  implementation to use `bit_cast` instead of direct access to `.x`.
* Fix duplicate symbol when building blas. (Antonio Sanchez, 2020-11-20)
  A missing inline breaks blas, since the symbol is generated in
  `complex_single.cpp`, `complex_double.cpp`, `single.cpp`, and `double.cpp`.
  Changed the rest of the inlines to `EIGEN_STRONG_INLINE`.
* Remove explicit casts from Eigen::half and Eigen::bfloat16 to bool. (David Tellenbach, 2020-11-19)
  Both Eigen::half and Eigen::bfloat16 are implicitly convertible to float and
  can hence be converted to bool via the conversion chain
  Eigen::{half,bfloat16} -> float -> bool. We thus remove the explicit cast
  operator to bool.
* Fix sparse_extra_3, disable counting temporaries for testing DynamicSparseMatrix. (Antonio Sanchez, 2020-11-18)
  Multiplication of column-major `DynamicSparseMatrix`es involves three
  temporaries:
  - two for transposing twice to sort the coefficients
    (`ConservativeSparseSparseProduct.h`, L160-161)
  - one for a final copy assignment (`SparseAssign.h`, L108)
  The latter is avoided in an optimization for `SparseMatrix`. Since
  `DynamicSparseMatrix` is deprecated in favor of `SparseMatrix`, it's not
  worth the effort to optimize further, so I simply disabled counting
  temporaries via a macro.
  Note that due to the inclusion of `sparse_product.cpp`, the `sparse_extra`
  tests actually re-run all the original `sparse_product` tests as well. We
  may want to simply drop the `DynamicSparseMatrix` tests altogether, which
  would eliminate the test duplication.
  Related to #2048.
* Re-enable Arm Neon Eigen::half packets of size 8. (David Tellenbach, 2020-11-18)
  - Add predux_half_dowto4.
  - Remove explicit casts in Half.h to match the behaviour of BFloat16.h.
  - Enable more packetmath tests for Eigen::half.
* Add bit_cast for half/bfloat to/from uint16_t, fix TensorRandom. (Antonio Sanchez, 2020-11-18)
  The existing `TensorRandom.h` implementation makes the assumption that
  `half` (`bfloat16`) has a `uint16_t` member `x` (`value`), which is not
  always true. This currently fails on arm64, where `x` has type `__fp16`.
  Added `bit_cast` specializations to allow casting to/from `uint16_t` for
  both `half` and `bfloat16`. Also added tests in `half_float`,
  `bfloat16_float`, and `cxx11_tensor_random` to catch these errors in the
  future.
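  In usage terms, the fix replaces representation-dependent member access
  with a round trip through `bit_cast` (a sketch; `numext::bit_cast` is the
  Eigen entry point this commit extends):
  ```
  // Instead of h.x (uint16_t on x86, but __fp16 on arm64), obtain and
  // restore the 16-bit payload portably:
  Eigen::half h(1.5f);
  uint16_t bits = Eigen::numext::bit_cast<uint16_t>(h);
  Eigen::half h2 = Eigen::numext::bit_cast<Eigen::half>(bits);
  ```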
* Initialize primitives to fix -Wuninitialized-const-reference. (Antonio Sanchez, 2020-11-18)
  The `meta` test generates warnings with the latest version of clang due to
  passing uninitialized variables as const reference arguments:
  ```
  test/meta.cpp:102:45: error: variable 'f' is uninitialized when passed as
  a const reference argument here [-Werror,-Wuninitialized-const-reference]
    VERIFY(( check_is_convertible(a.dot(b), f) ));
  ```
  We don't actually use the variables, but initializing them eliminates the
  new warning.
  Fixes #2067.
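  The fix itself is a one-liner per variable (a sketch of the pattern, not
  the exact committed diff):
  ```
  // Value-initialize: the variable is only inspected for its type by
  // check_is_convertible, but clang insists it be initialized before being
  // bound to a const reference.
  float f{};  // was: float f;
  VERIFY(( check_is_convertible(a.dot(b), f) ));
  ```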
* Fix rule-of-3 for the Tensor module. (Antonio Sanchez, 2020-11-18)
  Adds copy constructors to Tensor ops, inherits assignment operators from
  `TensorBase`.
  Addresses #1863.
* EOF newline added to InverseSize4. (Antonio Sanchez, 2020-11-18)
  The missing newline was causing build breakages due to
  `-Wnewline-eof -Werror`, which seems to be common across Google.
* Add missing parens around macro argument. (Rasmus Munk Larsen, 2020-11-18)
* Replace SSE_SHUFFLE_MASK macro with shuffle_mask. (Rasmus Munk Larsen, 2020-11-17)
* Avoid promotion of Arm __fp16 to float in Neon PacketMath. (David Tellenbach, 2020-11-17)
  Using overloaded arithmetic operators on Arm __fp16 always causes a
  promotion to float. We replace operator* by vmulh_f16 to avoid this.
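  A sketch of the difference (requires an ARMv8.2-A FP16 target and
  `<arm_fp16.h>`):
  ```
  #include <arm_fp16.h>

  // With plain operators, both operands are promoted to float, multiplied,
  // then truncated back; the scalar intrinsic does a single fp16 multiply.
  float16_t mul_promoting(float16_t a, float16_t b) { return a * b; }
  float16_t mul_fp16(float16_t a, float16_t b) { return vmulh_f16(a, b); }
  ```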