eigen - C++ library for linear algebra

	Commit message	Author	Age
*	Add ability to permanently enable HIP/CUDA gpu* defines.	Antonio Sanchez	2021-06-11
|	When using Eigen for GPU code, these defines simplify portability. If `EIGEN_PERMANENTLY_ENABLE_GPU_HIP_CUDA_DEFINES` is set, we do not undefine them.
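This is a minimal sketch of the keep-or-undefine pattern. It assumes a header pair that first defines short portable names and later undefines them. `gpuMalloc` here is a hypothetical stand-in mapped to `std::malloc` so the sketch compiles on any host. Only the `EIGEN_PERMANENTLY_ENABLE_GPU_HIP_CUDA_DEFINES` name comes from the commit, and this is not Eigen's actual header.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical "defines" header: map a portable name onto a backend call.
// A CUDA/HIP build would map this to the GPU runtime's allocator instead;
// std::malloc stands in so the sketch compiles on a plain host compiler.
#define gpuMalloc(size) std::malloc(size)

// Library code written against the portable name.
inline void* library_alloc(std::size_t n) { return gpuMalloc(n); }

// Hypothetical "undefines" header: clean up unless the user opted in.
#ifndef EIGEN_PERMANENTLY_ENABLE_GPU_HIP_CUDA_DEFINES
#undef gpuMalloc
#endif

// Reports whether the portable name is still visible to user code.
constexpr bool gpu_defines_visible() {
#ifdef gpuMalloc
  return true;
#else
  return false;
#endif
}
```

Defining the opt-in macro before including the headers would leave `gpuMalloc` usable in user code after the library's own headers.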
*	Allow custom TENSOR_CONTRACTION_DISPATCH macro.	Antonio Sanchez	2021-06-11
|	Currently TF Lite has to hack around with the Tensor headers to customize the contraction dispatch method. This adds simple `#ifndef` guards so that it can provide its own dispatch macro before inclusion.
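This sketch shows the `#ifndef` override pattern. It uses hypothetical names (`MYLIB_CONTRACTION_DISPATCH`, `custom_kernel`, `default_kernel`) rather than the real `TENSOR_CONTRACTION_DISPATCH` signature. Whichever definition is seen first wins, so a client defines the macro before including the library header.

```cpp
#include <cassert>
#include <string>

// Hypothetical client-supplied kernel, e.g. one tuned for a specific backend.
inline std::string custom_kernel(int m, int n) {
  return "custom " + std::to_string(m * n);
}

// Client side: define the dispatch macro *before* including the library.
#define MYLIB_CONTRACTION_DISPATCH(M, N) custom_kernel(M, N)

// ---- library header (hypothetical) ----
inline std::string default_kernel(int m, int n) {
  return "default " + std::to_string(m * n);
}

// The guard lets an earlier definition win; otherwise install the default.
#ifndef MYLIB_CONTRACTION_DISPATCH
#define MYLIB_CONTRACTION_DISPATCH(M, N) default_kernel(M, N)
#endif

inline std::string contract(int m, int n) {
  return MYLIB_CONTRACTION_DISPATCH(m, n);
}
```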
*	Use bit_cast to create -0.0 for floating-point types to avoid compiler optimizations changing the sign when -ffast-math is enabled.	Rasmus Munk Larsen	2021-06-11
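This sketch builds -0.0 from its bit pattern and assumes IEEE-754 `float`/`double`. With GCC, `-ffast-math` implies `-fno-signed-zeros`, so a literal `-0.0` may be treated as `+0.0`. Going through the bits avoids relying on the literal. `bit_cast_sketch` is a hypothetical `memcpy`-based stand-in for C++20 `std::bit_cast`, not the helper Eigen itself uses.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Stand-in for C++20 std::bit_cast: copy the object representation.
template <typename To, typename From>
To bit_cast_sketch(const From& from) {
  static_assert(sizeof(To) == sizeof(From), "bit_cast requires equal sizes");
  To to;
  std::memcpy(&to, &from, sizeof(To));
  return to;
}

// -0.0 is the value whose only set bit is the sign bit.
inline float negative_zero_f() {
  return bit_cast_sketch<float>(std::uint32_t{0x80000000u});
}

inline double negative_zero_d() {
  return bit_cast_sketch<double>(std::uint64_t{0x8000000000000000ull});
}
```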
*	Fix C++20 warnings about using enums in arithmetic expressions.	Rasmus Munk Larsen	2021-06-10
*	Fix parsing of version for nvhpc	Nicolas Cornu	2021-06-10
|	Parsing crashed because the first line of the version string is empty, so the first line is now deleted if it is empty.
*	Removed dead code from GPU float16 unit test.	Rohit Santhanam	2021-05-28
*	Remove EIGEN_DEVICE_FUNC from CwiseBinaryOp's default copy constructor.	Cyril Kaiser	2021-05-26
*	Add missing NEON ptranspose implementations.	Antonio Sanchez	2021-05-25
|	Unified implementation using only `vzip`.
*	Modify Unary/Binary/TernaryOp evaluators to work for non-class types.	Antonio Sanchez	2021-05-23
|	This used to work for non-class types (e.g. raw function pointers) in Eigen 3.3. It was changed in commit 11f55b29 to optimize the evaluator: "`sizeof((A-B).cwiseAbs2())` with A,B Vector4f is now 16 bytes, instead of 48 before this optimization." However, I cannot reproduce the 16-byte result: both before and after the change, with multiple compilers/versions, I always get 40 bytes (https://godbolt.org/z/MsjTc1PGe). This change modifies the code slightly to allow non-class types. The final generated code is identical, and the expression remains 40 bytes for the `abs2` sample case. Fixes #2251.
*	predux_half_dowto4 test extended to all applicable packets	Jakub Lichman	2021-05-21
*	Add a macro for checking whether C++14 variable templates are supported	Steve Bronder	2021-05-21
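This sketch shows one way to write such a feature macro. It uses the hypothetical name `MYLIB_HAS_CXX14_VARIABLE_TEMPLATES`, because the commit title does not give the macro's name. The check tries the standard feature-test macro `__cpp_variable_templates` first and falls back to `__cplusplus`. It assumes a compiler in C++14 mode or later, so `pi_constant` is defined.

```cpp
#include <cassert>

// Prefer the standard feature-test macro; fall back to the language version.
#if defined(__cpp_variable_templates) || __cplusplus >= 201402L
#define MYLIB_HAS_CXX14_VARIABLE_TEMPLATES 1
#else
#define MYLIB_HAS_CXX14_VARIABLE_TEMPLATES 0
#endif

#if MYLIB_HAS_CXX14_VARIABLE_TEMPLATES
// A variable template: one constant per scalar type.
template <typename T>
constexpr T pi_constant = T(3.1415926535897932385L);
#endif

constexpr bool has_variable_templates() {
  return MYLIB_HAS_CXX14_VARIABLE_TEMPLATES != 0;
}
```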
*	Use derived object type in conservative_resize_like_impl	Niall Murphy	2021-05-20
|	When calling conservativeResize() on a matrix with the DontAlign flag, the temporary variable used to perform the resize should have the same Options as the original matrix. This ensures that the correct overload of swap is called (i.e. `PlainObjectBase::swap(DenseBase<OtherDerived>& other)`). Calling the base class swap (i.e. in DenseBase) results in assertion errors or memory corruption.
*	ptranspose test for non-square kernels added	Jakub Lichman	2021-05-19
*	Ensure all generated matrices for inverse_4x4 tests are invertible; this fixes #2248.	Guoqiang QI	2021-05-13
*	Fix calls to device functions from host code	Nathan Luehr	2021-05-11
*	Device implementation of log for std::complex types.	Nathan Luehr	2021-05-11
*	Fix ambiguity due to argument dependent lookup.	Nathan Luehr	2021-05-11
*	Change the storage of the SSE complex packets to that of the wrapper. This should fix #2242.	guoqiangqi	2021-05-10
*	Fix for issue where numext::imag and numext::real are used before they are defined.	Rohit Santhanam	2021-05-10
*	Restore ABI compatibility for conj with 3.3, fix conflict with boost.	Antonio Sanchez	2021-05-07
|	The boost library unfortunately specializes `conj` for various types and assumes the original two-template-parameter version. This change restores the second parameter, which also restores ABI compatibility. The specialization for `std::complex` exists because `std::conj` is not a device function. For custom complex scalar types, users should provide their own `conj` implementation. We may consider removing the unnecessary second parameter in the future, but this will require modifying boost as well. Fixes #2112.
*	Clean up gpu device properties.	Antonio Sanchez	2021-05-07
|	Made a class and singleton to encapsulate initialization and retrieval of device properties. Related to !481, which already changed the API to address a static linkage issue.
*	Fix numext::arg return type.	Antonio Sanchez	2021-05-07
|	The cxx11 path for `numext::arg` incorrectly returned the complex type instead of the real type, leading to compile errors. Fixed this and added tests. Related to !477, which uncovered the issue.
*	Revert addition of unused `paddsub<Packet2cf>`. This fixes #2242	Christoph Hertzberg	2021-05-06
*	Simplify TensorRandom and remove time-dependence.	Antonio Sanchez	2021-05-04
|	Time-dependence prevents tests from being repeatable. This has long been an issue when debugging the tensor tests. Removing it will allow future tests to be repeatable in the usual way. Also, the macros recently added in !476 are causing headaches across different platforms. For example, checking `_XOPEN_SOURCE` is leading to multiple ambiguous macro errors across Google, and `_DEFAULT_SOURCE`/`_SVID_SOURCE`/`_BSD_SOURCE` are sometimes defined with values, sometimes defined as empty, and sometimes not defined at all when they probably should be. This is leading to multiple build breakages. The simplest approach is to generate a seed via `Eigen::internal::random<uint64_t>()` if on CPU. For GPU, we use a hash based on the current thread ID (since `rand()` isn't supported on GPU). Fixes #1602.
*	Better CUDA complex division.	Antonio Sanchez	2021-04-29
|	The original produced NaNs when dividing 0/b for subnormal b. `complex_divide_stable` was changed to use the more common Smith's algorithm.
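This is a sketch of Smith's algorithm for complex division in plain C++. `smith_divide` is a hypothetical name, not Eigen's `complex_divide_stable`. A typical cause of the 0/b NaN is forming c² + d², which underflows to zero for a subnormal denominator. Smith's algorithm avoids that by dividing through by the larger of |c| and |d|. The sketch assumes default floating-point flags with no flush-to-zero.

```cpp
#include <cassert>
#include <cmath>
#include <complex>

// Computes (a + bi) / (c + di) without forming c*c + d*d.
template <typename T>
std::complex<T> smith_divide(const std::complex<T>& x, const std::complex<T>& y) {
  const T a = x.real(), b = x.imag();
  const T c = y.real(), d = y.imag();
  if (std::abs(c) >= std::abs(d)) {
    const T r = d / c;            // |r| <= 1
    const T den = c + d * r;      // (c*c + d*d) / c
    return {(a + b * r) / den, (b - a * r) / den};
  } else {
    const T r = c / d;            // |r| < 1
    const T den = c * r + d;      // (c*c + d*d) / d
    return {(a * r + b) / den, (b * r - a) / den};
  }
}
```

For example, (1 + 2i) / (3 + 4i) evaluates to 0.44 + 0.08i.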
*	Add missing pcmp_lt_or_nan for NEON Packet4bf.	Antonio Sanchez	2021-04-27
*	Added complex matrix unit tests for SelfAdjointEigenSolver	Theo Fletcher	2021-04-26
*	Tests added and AVX512 bug fixed for pcmp_lt_or_nan	Jakub Lichman	2021-04-25
*	Tests for pcmp_lt and pcmp_le added	Jakub Lichman	2021-04-23
*	Fix for issue with static global variables in TensorDeviceGpu.h	Turing Eret	2021-04-23
|	m_deviceProperties and m_devicePropInitialized are defined as global statics, so each translation unit gets its own copy. This causes issues if initializeDeviceProp() is called in one translation unit and m_deviceProperties is then used in another. Added inline functions getDeviceProperties() and getDevicePropInitialized(), which define those variables as static locals. Per the C++ standard 7.1.2/4, a static local declared in an inline function always refers to the same object, so this should be safer. Credit to Sun Chenggen for this fix. This fixes issue #1475.
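This sketch illustrates the idea behind the fix. The function names are borrowed from the commit message. The bodies and the `DeviceProps` type are hypothetical stand-ins for the real device-property table. A namespace-scope `static` variable in a header gives each translation unit a private copy. A static local inside an inline function is a single object shared program-wide.

```cpp
#include <cassert>

// Hypothetical stand-in for the per-device property table.
struct DeviceProps {
  int count = 0;
  bool initialized = false;
};

// One object shared by every translation unit that includes this header,
// unlike a namespace-scope `static DeviceProps g_props;`.
inline DeviceProps& getDeviceProperties() {
  static DeviceProps props;
  return props;
}

inline void initializeDeviceProp() {
  DeviceProps& p = getDeviceProperties();
  if (!p.initialized) {
    p.count = 1;  // a real implementation would query the GPU runtime here
    p.initialized = true;
  }
}
```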
*	Check existence of BSD random before use.	Antonio Sanchez	2021-04-22
|	`TensorRandom` currently relies on BSD `random()`, which is not always available. The [Linux manpage](https://man7.org/linux/man-pages/man3/srandom.3.html) gives the glibc condition: `_XOPEN_SOURCE >= 500 || /* Glibc since 2.19: */ _DEFAULT_SOURCE || /* Glibc <= 2.19: */ _SVID_SOURCE || _BSD_SOURCE`. In particular, this was failing to compile for MinGW via msys2. If `random()` is not available, we fall back to using `rand()`.
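This sketch shows the guard-and-fallback idea. It assumes glibc's documented condition for declaring `random()`/`srandom()`. `MYLIB_HAS_BSD_RANDOM`, `seed_rng` and `next_random` are hypothetical names, not Eigen's code. On non-glibc platforms such as MinGW, the sketch falls back to the ISO C `rand()`.

```cpp
#include <cassert>
#include <cstdlib>

// glibc condition from srandom(3); __GLIBC__ is set once a libc header is included.
#if defined(__GLIBC__) && ((defined(_XOPEN_SOURCE) && _XOPEN_SOURCE >= 500) || \
    defined(_DEFAULT_SOURCE) || defined(_SVID_SOURCE) || defined(_BSD_SOURCE))
#define MYLIB_HAS_BSD_RANDOM 1
#else
#define MYLIB_HAS_BSD_RANDOM 0
#endif

inline void seed_rng(unsigned seed) {
#if MYLIB_HAS_BSD_RANDOM
  srandom(seed);      // BSD/XSI generator
#else
  std::srand(seed);   // portable fallback
#endif
}

inline long next_random() {
#if MYLIB_HAS_BSD_RANDOM
  return random();
#else
  return std::rand();
#endif
}
```

Guards like this are fragile. If `_XOPEN_SOURCE` is defined with an empty value, the `#if` no longer parses. The later "Simplify TensorRandom" commit cites exactly this kind of breakage.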
*	DenseStorage safely copy/swap.	Antonio Sanchez	2021-04-22
|	Fixes #2229. For dynamic matrices with fixed-sized storage, only copy/swap elements that have been set. Copying the whole buffer is inefficient and potentially UB for uninitialized elements.
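This sketch uses a hypothetical `FixedCapacityStorage` class, not Eigen's `DenseStorage`. The buffer has room for `MaxSize` elements, but only the first `size()` are ever written. Copy and swap therefore touch only live elements and never read an uninitialized slot.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

template <typename T, std::size_t MaxSize>
class FixedCapacityStorage {
 public:
  explicit FixedCapacityStorage(std::size_t size = 0) : m_size(size) {
    assert(size <= MaxSize);
    for (std::size_t i = 0; i < m_size; ++i) m_data[i] = T();
  }
  // Copy only the live prefix; slots past m_size may be uninitialized.
  FixedCapacityStorage(const FixedCapacityStorage& other) : m_size(other.m_size) {
    for (std::size_t i = 0; i < m_size; ++i) m_data[i] = other.m_data[i];
  }
  FixedCapacityStorage& operator=(const FixedCapacityStorage& other) {
    m_size = other.m_size;
    for (std::size_t i = 0; i < m_size; ++i) m_data[i] = other.m_data[i];
    return *this;
  }
  // Swap the common prefix, then copy the longer tail into the shorter one.
  void swap(FixedCapacityStorage& other) {
    FixedCapacityStorage& shorter = m_size <= other.m_size ? *this : other;
    FixedCapacityStorage& longer = m_size <= other.m_size ? other : *this;
    const std::size_t common = shorter.m_size;
    for (std::size_t i = 0; i < common; ++i) std::swap(shorter.m_data[i], longer.m_data[i]);
    for (std::size_t i = common; i < longer.m_size; ++i) shorter.m_data[i] = longer.m_data[i];
    std::swap(m_size, other.m_size);
  }
  std::size_t size() const { return m_size; }
  T& operator[](std::size_t i) { assert(i < m_size); return m_data[i]; }
  const T& operator[](std::size_t i) const { assert(i < m_size); return m_data[i]; }

 private:
  T m_data[MaxSize];
  std::size_t m_size;
};
```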
*	Make vectorized compute_inverse_size4 compile with AVX.	Rasmus Munk Larsen	2021-04-22
*	Compilation of basicbenchmark fixed	Jakub Lichman	2021-04-21
*	Fix compiler issue with taking the address of an rvalue in TensorFlow (plus other warnings).	Chip-Kerchner	2021-04-21
*	HasExp added for AVX512 Packet8d	Jakub Lichman	2021-04-20
*	Fix ldexp for AVX512 (#2215)	Antonio Sanchez	2021-04-20
|	The wrong shuffle was used; the low/high halves need to be interleaved with a `permute` instruction. Fixes #2215.
*	Before 3.4 branch	David Tellenbach	2021-04-18
*	Modify googlehash use to account for namespace issues.	Antonio Sanchez	2021-04-12
|	The namespace declaration for googlehash is a configurable macro that can be disabled. In particular, it is disabled within Google, causing compile errors since `dense_hash_map`/`sparse_hash_map` are then in the global namespace instead of in `::google`. Here we play a bit of gymnastics to allow for both `google::*_hash_map` and `*_hash_map`, while limiting namespace pollution. Symbols within the `::google` namespace are imported into `Eigen::google`. We also remove checks based on `_SPARSE_HASH_MAP_H_`, as this is fragile, and instead require `EIGEN_GOOGLEHASH_SUPPORT` to be defined.
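This sketch shows how a third-party class can be imported into a library-owned namespace. It assumes a stand-in `dense_hash_map` (here simply derived from `std::map`). A hypothetical `SKETCH_HASH_IN_GLOBAL_NS` switch selects the configuration where the header puts the class in the global namespace. `mylib::google` plays the role of `Eigen::google`. None of these names are Eigen's actual code.

```cpp
#include <cassert>
#include <map>

// Stand-in for the third-party header, whose namespace is configurable.
#ifdef SKETCH_HASH_IN_GLOBAL_NS
template <typename K, typename V>
class dense_hash_map : public std::map<K, V> {};
#else
namespace google {
template <typename K, typename V>
class dense_hash_map : public std::map<K, V> {};
}  // namespace google
#endif

// Import only the needed name, so library code can always write
// mylib::google::dense_hash_map without pulling all of ::google in.
namespace mylib {
namespace google {
#ifdef SKETCH_HASH_IN_GLOBAL_NS
using ::dense_hash_map;
#else
using ::google::dense_hash_map;
#endif
}  // namespace google
}  // namespace mylib
```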
*	Avoid using uninitialized inputs and, if available, use the slightly more efficient `movsd` instruction for `pset1<Packet2cf>`.	Christoph Hertzberg	2021-04-13
*	Fix typo in TensorDimensions.h	Rasmus Munk Larsen	2021-04-12
*	Fix for float16 GPU unit test.	Rohit Santhanam	2021-04-12
*	Use EIGEN_HAS_CXX11 and EIGEN_COMP_CXXVER macros to detect the C++ version for `std::result_of` and `std::invoke_result`. Fixes #2209.	Christoph Hertzberg	2021-04-12
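This sketch selects the trait by language version. It uses `__cplusplus` directly instead of the `EIGEN_HAS_CXX11`/`EIGEN_COMP_CXXVER` macros, whose definitions are not reproduced here. `std::invoke_result` was added in C++17. `std::result_of` was deprecated in C++17 and removed in C++20. MSVC reports an accurate `__cplusplus` only with `/Zc:__cplusplus`, so a library-provided version macro may be more reliable in practice. `result_of_sketch` and `halve` are hypothetical.

```cpp
#include <cassert>
#include <type_traits>

// Choose the available trait for "the type returned by calling F with Args".
template <typename F, typename... Args>
struct result_of_sketch {
#if __cplusplus >= 201703L
  using type = typename std::invoke_result<F, Args...>::type;
#else
  using type = typename std::result_of<F(Args...)>::type;
#endif
};

inline double halve(int x) { return x / 2.0; }
```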
*	Fixed Doxygen for the unsupported iterative solver module	Jens Wehner	2021-04-11
*	Make iterators default constructible and assignable, by making...	Christoph Hertzberg	2021-04-09
*	This fixes an issue where the compiler was not choosing the GPU-specific specialization of ScanLauncher.	Rohit Santhanam	2021-04-08
|	The issue was discovered when the GPU scan unit test was run and resulted in a segmentation fault. The segmentation fault occurred because the unit test allocated GPU memory and passed a pointer to that memory to a computation that it presumed would execute on the GPU. Because of the issue, however, the computation was scheduled on the CPU, so the CPU attempted to access a GPU memory location. The fix expands the GPU-specific ScanLauncher specialization to handle cases where vectorization is enabled. Previously, the GPU specialization was chosen only if vectorization was not used.
*	Scaled epsilon the wrong way.	Antonio Sanchez	2021-04-07
|	It should have been 0.5 to widen the bounds, since this is inverse precision. Setting it to 0.5, however, leads to many more failing tests at Google, so reverting to 1 for now.
*	Replace `-2147483648` by `-0.0f` or `-0.0` constants (this should fix #2189).	Christoph Hertzberg	2021-04-07
|	Also, remove unnecessary `pgather` operations.
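This scalar sketch shows why `-0.0f` works as a sign mask. It assumes IEEE-754 `float` and default (non fast-math) compiler flags. `float_bits`, `from_bits` and `negate_via_mask` are hypothetical helpers. The bit pattern of `-0.0f` is exactly the sign bit. The float constant also avoids the integer literal `-2147483648`, which C++ parses as unary minus applied to `2147483648`. That value is too large for a 32-bit `int`, so the literal gets a wider type.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterpret a float's bits as an unsigned integer and back.
inline std::uint32_t float_bits(float f) {
  std::uint32_t u;
  std::memcpy(&u, &f, sizeof u);
  return u;
}

inline float from_bits(std::uint32_t u) {
  float f;
  std::memcpy(&f, &u, sizeof f);
  return f;
}

// Scalar analogue of a packet negation: XOR with the sign mask flips the sign.
inline float negate_via_mask(float x) {
  return from_bits(float_bits(x) ^ float_bits(-0.0f));
}
```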
*	Align local arrays to Packet boundary.	Rasmus Munk Larsen	2021-04-06
*	Fix clang tidy warnings in AnnoyingScalar.	Antonio Sanchez	2021-04-05
|	Clang-tidy complains that full specializations in headers can cause ODR violations; marked these as `inline` to fix. It also complains about renaming arguments in specializations; set the argument names to match.
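This small illustration of the ODR point uses a hypothetical `type_name` template. A function template's primary definition may live in a header. A full (explicit) specialization, however, is an ordinary function and is not implicitly `inline`. Without `inline`, every translation unit that includes the header would emit its own definition.

```cpp
#include <cassert>
#include <string>

// Primary template: fine to define in a header as-is.
template <typename T>
std::string type_name() { return "unknown"; }

// Full specializations need `inline` to be safely defined in a header.
template <>
inline std::string type_name<float>() { return "float"; }

template <>
inline std::string type_name<double>() { return "double"; }
```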