path: root/Eigen
...
* Fix sparse_extra_3, disable counting temporaries for testing DynamicSparseMatrix. (Antonio Sanchez, 2020-11-18)
  Multiplication of column-major `DynamicSparseMatrix`es involves three temporaries:
  - two for transposing twice to sort the coefficients (`ConservativeSparseSparseProduct.h`, L160-161)
  - one for a final copy assignment (`SparseAssign.h`, L108)
  The latter is avoided in an optimization for `SparseMatrix`. Since `DynamicSparseMatrix` is deprecated in favor of `SparseMatrix`, it's not worth the effort to optimize further, so I simply disabled counting temporaries via a macro.
  Note that due to the inclusion of `sparse_product.cpp`, the `sparse_extra` tests actually re-run all the original `sparse_product` tests as well. We may want to simply drop the `DynamicSparseMatrix` tests altogether, which would eliminate the test duplication.
  Related to #2048
* Re-enable Arm Neon Eigen::half packets of size 8 (David Tellenbach, 2020-11-18)
  - Add predux_half_dowto4
  - Remove explicit casts in Half.h to match the behaviour of BFloat16.h
  - Enable more packetmath tests for Eigen::half
* Add bit_cast for half/bfloat to/from uint16_t, fix TensorRandom (Antonio Sanchez, 2020-11-18)
  The existing `TensorRandom.h` implementation makes the assumption that `half` (`bfloat16`) has a `uint16_t` member `x` (`value`), which is not always true. This currently fails on arm64, where `x` has type `__fp16`. Added `bit_cast` specializations to allow casting to/from `uint16_t` for both `half` and `bfloat16`. Also added tests in `half_float`, `bfloat16_float`, and `cxx11_tensor_random` to catch these errors in the future.
* EOF newline added to InverseSize4. (Antonio Sanchez, 2020-11-18)
  The missing newline was causing build breakages under `-Wnewline-eof -Werror`, a combination that seems to be common across Google.
* Add missing parens around macro argument. (Rasmus Munk Larsen, 2020-11-18)
* Replace SSE_SHUFFLE_MASK macro with shuffle_mask. (Rasmus Munk Larsen, 2020-11-17)
* Avoid promotion of Arm __fp16 to float in Neon PacketMath (David Tellenbach, 2020-11-17)
  Using overloaded arithmetic operators for Arm __fp16 always causes a promotion to float. We replace operator* by vmulh_f16 to avoid this.
* Fix missing `EIGEN_CONSTEXPR` pop_macro in `Half`. (Antonio Sanchez, 2020-11-17)
  `EIGEN_CONSTEXPR` is getting pushed but not popped in `Half.h` if `EIGEN_HAS_ARM64_FP16_SCALAR_ARITHMETIC` is defined.
* Unify Inverse_SSE.h and Inverse_NEON.h into a single generic implementation using PacketMath. (Guoqiang QI, 2020-11-17)
* Add EIGEN_DEVICE_FUNC to TranspositionsBase (acxz, 2020-11-16)
  Fixes #2057.
* Explicit casts of S -> std::complex<T> (Antonio Sanchez, 2020-11-14)
  When calling `internal::cast<S, std::complex<T>>(x)`, clang often generates an implicit conversion warning due to an implicit cast from type `S` to `T`. This currently affects the following tests:
  - `basicstuff`
  - `bfloat16_float`
  - `cxx11_tensor_casts`
  The implicit cast leads to widening/narrowing float conversions. Widening warnings only seem to be generated by clang (`-Wdouble-promotion`). To eliminate the warning, we explicitly cast the real component first from `S` to `T`. We also adjust tests to use `internal::cast` instead of `static_cast` when a complex type may be involved.
* Fix typo in NEON/PacketMath.h (guoqiangqi, 2020-11-13)
* Simplify expression for inner product fallback in Gemv product evaluator. (Rasmus Munk Larsen, 2020-11-12)
* Remove redundant branch for handling dynamic vector*vector. This will be handled by the equivalent branch in the specialization for GemvProduct. (Rasmus Munk Larsen, 2020-11-12)
* Optimize matrix*matrix and matrix*vector products when they correspond to inner products at runtime. (Rasmus Munk Larsen, 2020-11-12)
  This speeds up inner products where one or both arguments are dynamic, for small and medium-sized vectors (up to 32k).

  name                           old time/op  new time/op  delta
  BM_VecVecStatStat<float>/1     1.64ns ± 0%  1.64ns ± 0%       ~
  BM_VecVecStatStat<float>/8     2.99ns ± 0%  2.99ns ± 0%       ~
  BM_VecVecStatStat<float>/64    7.00ns ± 1%  7.04ns ± 0%  +0.66%
  BM_VecVecStatStat<float>/512   61.6ns ± 0%  61.6ns ± 0%       ~
  BM_VecVecStatStat<float>/4k     551ns ± 0%   553ns ± 1%  +0.26%
  BM_VecVecStatStat<float>/32k   4.45µs ± 0%  4.45µs ± 0%       ~
  BM_VecVecStatStat<float>/256k  77.9µs ± 0%  78.1µs ± 1%       ~
  BM_VecVecStatStat<float>/1M     312µs ± 0%   312µs ± 1%       ~
  BM_VecVecDynStat<float>/1      13.3ns ± 1%   4.6ns ± 0%  -65.35%
  BM_VecVecDynStat<float>/8      14.4ns ± 0%   6.2ns ± 0%  -57.00%
  BM_VecVecDynStat<float>/64     24.0ns ± 0%  10.2ns ± 3%  -57.57%
  BM_VecVecDynStat<float>/512     138ns ± 0%    68ns ± 0%  -50.52%
  BM_VecVecDynStat<float>/4k     1.11µs ± 0%  0.56µs ± 0%  -49.72%
  BM_VecVecDynStat<float>/32k    8.89µs ± 0%  4.46µs ± 0%  -49.89%
  BM_VecVecDynStat<float>/256k   78.2µs ± 0%  78.1µs ± 1%       ~
  BM_VecVecDynStat<float>/1M      313µs ± 0%   312µs ± 1%       ~
  BM_VecVecDynDyn<float>/1       10.4ns ± 0%  10.5ns ± 0%  +0.91%
  BM_VecVecDynDyn<float>/8       12.0ns ± 3%  11.9ns ± 0%       ~
  BM_VecVecDynDyn<float>/64      37.4ns ± 0%  19.6ns ± 1%  -47.57%
  BM_VecVecDynDyn<float>/512      159ns ± 0%    81ns ± 0%  -49.07%
  BM_VecVecDynDyn<float>/4k      1.13µs ± 0%  0.58µs ± 1%  -49.11%
  BM_VecVecDynDyn<float>/32k     8.91µs ± 0%  5.06µs ±12%  -43.23%
  BM_VecVecDynDyn<float>/256k    78.2µs ± 0%  78.2µs ± 1%       ~
  BM_VecVecDynDyn<float>/1M       313µs ± 0%   312µs ± 1%       ~
* Add support for dynamic dispatch of MMA instructions for POWER 10 (Pedro Caldeira, 2020-11-12)
* Remove annotation for first declaration of default constructors/destructors (acxz, 2020-11-12)
* [SYCL Function pointer Issue]: SYCL does not support function pointers inside the kernel, due to the portability issue of a function pointer and memory address space between host and accelerators. To fix the issue, function pointers have been replaced by function objects. (mehdi-goli, 2020-11-12)
* Fix issue #2045: the _mm256_set_m128d op is not supported by gcc 7.x (guoqiangqi, 2020-11-04)
* Fix for ROCm (and CUDA?) breakage - 201029 (Deven Desai, 2020-10-29)
  The following commit breaks Eigen for ROCm (and probably CUDA too):
  https://gitlab.com/libeigen/eigen/-/commit/e265f7ed8e59c26e15f2c35162c6b8da1c5d594f

  ```
  Building HIPCC object test/CMakeFiles/gpu_basic.dir/gpu_basic_generated_gpu_basic.cu.o
  In file included from /home/rocm-user/eigen/test/gpu_basic.cu:20:
  In file included from /home/rocm-user/eigen/test/main.h:355:
  In file included from /home/rocm-user/eigen/Eigen/QR:11:
  In file included from /home/rocm-user/eigen/Eigen/Core:169:
  /home/rocm-user/eigen/Eigen/src/Core/arch/Default/Half.h:825:76: error: use of undeclared identifier 'numext'; did you mean 'Eigen::numext'?
    return Eigen::half_impl::raw_uint16_to_half(__ldg(reinterpret_cast<const numext::uint16_t*>(ptr)));
                                                                             ^~~~~~
                                                                             Eigen::numext
  /home/rocm-user/eigen/Eigen/src/Core/MathFunctions.h:968:11: note: 'Eigen::numext' declared here
  namespace numext {
            ^
  1 error generated when compiling for gfx900.
  CMake Error at gpu_basic_generated_gpu_basic.cu.o.cmake:192 (message):
    Error generating file /home/rocm-user/eigen/build/test/CMakeFiles/gpu_basic.dir//./gpu_basic_generated_gpu_basic.cu.o
  test/CMakeFiles/gpu_basic.dir/build.make:63: recipe for target 'test/CMakeFiles/gpu_basic.dir/gpu_basic_generated_gpu_basic.cu.o' failed
  make[3]: *** [test/CMakeFiles/gpu_basic.dir/gpu_basic_generated_gpu_basic.cu.o] Error 1
  CMakeFiles/Makefile2:16611: recipe for target 'test/CMakeFiles/gpu_basic.dir/all' failed
  make[2]: *** [test/CMakeFiles/gpu_basic.dir/all] Error 2
  CMakeFiles/Makefile2:16618: recipe for target 'test/CMakeFiles/gpu_basic.dir/rule' failed
  make[1]: *** [test/CMakeFiles/gpu_basic.dir/rule] Error 2
  Makefile:5401: recipe for target 'gpu_basic' failed
  make: *** [gpu_basic] Error 2
  ```

  The fix in this commit is trivial. Please review and merge.
* Remove unused functions in Half.h. (David Tellenbach, 2020-10-29)
  The following functions have been removed:
    Eigen::half fabsh(const Eigen::half&)
    Eigen::half exph(const Eigen::half&)
    Eigen::half sqrth(const Eigen::half&)
    Eigen::half powh(const Eigen::half&, const Eigen::half&)
    Eigen::half floorh(const Eigen::half&)
    Eigen::half ceilh(const Eigen::half&)
* Replace numext::as_uint with numext::bit_cast<numext::uint32_t> (David Tellenbach, 2020-10-29)
* Add support for Armv8.2-a __fp16 (David Tellenbach, 2020-10-28)
  Armv8.2-a provides a native half-precision floating point type (__fp16, aka float16_t). This patch introduces
  * __fp16 as the underlying type of Eigen::half if this type is available
  * the packet types Packet4hf and Packet8hf representing float16x4_t and float16x8_t respectively
  * packet-math for the above packets with corresponding scalar type Eigen::half
  The packet-math functionality has been implemented by Ashutosh Sharma <ashutosh.sharma@amperecomputing.com>.
  This closes #1940.
* [Missing SYCL math op]: Adding the missing LDEXP function for SYCL. (mehdi-goli, 2020-10-28)
* [Fixing expf issue]: Eigen uses the packet-type operation for the scalar type float in the Sigmoid function (https://gitlab.com/libeigen/eigen/-/blob/master/Eigen/src/Core/functors/UnaryFunctors.h#L990). As a result the SYCL backend breaks, since it only supports packet operations for the vectorized types float4 and double2. The issue has been fixed by adding the scalar type float to the packet operation pexp for the SYCL backend. (mehdi-goli, 2020-10-28)
* Improve polynomial evaluation with instruction-level parallelism for pexp_float and pexp<Packet16f> (guoqiangqi, 2020-10-20)
* Remove unnecessary template specialization of pexp for scalar float/double (guoqiangqi, 2020-10-19)
* Fix missing `pfirst<Packet16b>` for MSVC. (Antonio Sanchez, 2020-10-16)
  It was only defined under one `#ifdef` case. This fixes the `packetmath_14` test for MSVC.
* Fix the specialization of pfrexp for AVX to be faster when AVX2/AVX512DQ is not available, and avoid undefined behavior in C++. Also mask off the sign bit when extracting the exponent. (Rasmus Munk Larsen, 2020-10-15)
* Fix for ROCm/HIP breakage - 201013 (Deven Desai, 2020-10-15)
  The following commit seems to have introduced regressions in ROCm/HIP support:
  https://gitlab.com/libeigen/eigen/-/commit/183a208212353ccf81a664d25dc7660b6269acdd

  It causes some unit tests to fail with the following error:

  ```
  ...
  Eigen/src/Core/GenericPacketMath.h:322:3: error: no member named 'bit_and' in the global namespace; did you mean 'std::bit_and'?
  ...
  Eigen/src/Core/GenericPacketMath.h:329:3: error: no member named 'bit_or' in the global namespace; did you mean 'std::bit_or'?
  ...
  Eigen/src/Core/GenericPacketMath.h:336:3: error: no member named 'bit_xor' in the global namespace; did you mean 'std::bit_xor'?
  ...
  ```

  The error occurs because, when compiling the device code in HIP/CUDA, the compiler will pick up some of the std functions (whose calls are prefixed by EIGEN_USING_STD) from the global namespace (i.e. use ::bit_xor instead of std::bit_xor). For this to work, those functions must be declared in the global namespace in the HIP/CUDA header files. The bit_and, bit_or and bit_xor routines are not declared in the HIP header file that contains the decls for the std math functions (math_functions.h), and this is the cause of the error above.

  It seems that the newer HIP compilers do support calling std:: math routines within device code, and the ideal fix here would have been to change all calls to std math functions in Eigen to use the std:: namespace (instead of the global namespace) when compiling with the HIP compiler. However, there was a recent commit to remove the EIGEN_USING_STD_MATH macro and collapse its uses into the EIGEN_USING_STD macro (https://gitlab.com/libeigen/eigen/-/commit/4091f6b25c5ad0ca3f7c00bd82bfd7ca1bbedee3). Replacing all std math calls would essentially require resurrecting the EIGEN_USING_STD_MATH macro, so that option was not chosen.

  Also, HIP compilers only support std math calls within device code, not all std functions (specifically not malloc/free, which are prefixed via EIGEN_USING_STD), so modifying the EIGEN_USING_STD implementation to use the std:: namespace for HIP will not work either. Hence the ugly solution of special-casing the three calls that break the HIP compile to explicitly use the std:: namespace.
* Revert change from 4e4d3f32d168ed9ce09d950f099a60ddcd11240f that broke BFloat16.h build with older compilers. (Rasmus Munk Larsen, 2020-10-15)
* Add AVX plog<Packet4d> and AVX512 plog<Packet8d> ops; also unified the AVX512 plog<Packet16f> op with the generic API. (Guoqiang QI, 2020-10-15)
* Add specializations for pmin/pmax with prescribed NaN propagation semantics for SSE/AVX/AVX512. (Rasmus Munk Larsen, 2020-10-14)
* Revert generic implementation of `predux`, since it breaks compilation of `predux_any` with MSVC. (Rasmus Munk Larsen, 2020-10-14)
* Add MatrixBase::cwiseArg() (David Tellenbach, 2020-10-14)
* Add packet generic ops `predux_fmin`, `predux_fmin_nan`, `predux_fmax`, and `predux_fmax_nan` that implement reductions with `PropagateNaN` and `PropagateNumbers` semantics. Add (slow) generic implementations for most reductions. (Rasmus Munk Larsen, 2020-10-13)
* Undefine EIGEN_CONSTEXPR before redefinition (acxz, 2020-10-12)
* Make bitwise_helper a device function to unbreak GPU builds. (Rasmus Munk Larsen, 2020-10-10)
* Clean up packetmath tests and fix various bugs to make bfloat16 pass (almost) all packetmath tests with SSE, AVX, and AVX512. (Rasmus Munk Larsen, 2020-10-09)
* Drop EIGEN_USING_STD_MATH in favour of EIGEN_USING_STD (David Tellenbach, 2020-10-09)
* Implement generic bitwise logical packet ops that work for all types. (Rasmus Munk Larsen, 2020-10-08)
* Don't make assumptions about NaN-propagation for pmin/pmax; it varies across platforms. Change the test to only check NaN-propagation for pfmin/pfmax. (Rasmus Munk Larsen, 2020-10-07)
* Use reinterpret_cast instead of C-style cast in Inverse_NEON.h (David Tellenbach, 2020-10-04)
* Don't cast away const in Inverse_NEON.h. (Rasmus Munk Larsen, 2020-10-02)
* Use EIGEN_USING_STD to fix CUDA compilation error in BFloat16.h. (Rasmus Munk Larsen, 2020-10-02)
* Fix CUDA build breakage and incorrect result for absdiff on HIP with long double arguments. (Rasmus Munk Larsen, 2020-10-02)
* Don't use =*; it might not return a Scalar (janos, 2020-10-02)
* Fix build breakage with MSVC 2019, which does not support MMX intrinsics for 64-bit builds; see:
  https://stackoverflow.com/questions/60933486/mmx-intrinsics-like-mm-cvtpd-pi32-not-found-with-msvc-2019-for-64bit-targets-c
  Instead use the equivalent SSE2 intrinsics. (Rasmus Munk Larsen, 2020-10-01)
* Add generic packet ops corresponding to std::fmin and std::fmax. (Rasmus Munk Larsen, 2020-10-01)
  The nonsensical NaN-propagation rules of std::min/std::max implemented by pmin and pmax in Eigen are a longstanding source of confusion and bug reports. This change is a first step towards addressing that, as discussed in issue #564.
* Specialize pldexp_double and pfrexp_double and get rid of the Packet2l definition for SSE. (Rasmus Munk Larsen, 2020-09-30)
  SSE does not support conversion between 64-bit integers and double, and the existing implementation of casting between Packet2d and Packet2l results in undefined behavior when casting NaN to int. Since pldexp and pfrexp only manipulate exponent fields that fit in 32 bits, this change provides specializations that use the existing instructions _mm_cvtpd_pi32 and _mm_cvtsi32_pd instead.