eigen - C++ library for linear algebra

	Commit message	Author	Age
*	Make it possible to specify NaN propagation strategy for maxCoeff/minCoeff ↵	Rasmus Munk Larsen	2021-02-25
\| \| \| \|	reductions.
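The entry above adds a choice of NaN-propagation strategy to max/min reductions. The following is a plain C++ sketch over `std::vector` that shows what three such strategies mean for a max reduction. It is not Eigen code, and the enum and function names are hypothetical. Eigen 3.4 exposes a similar choice as a template argument to `maxCoeff`/`minCoeff` (named along the lines of `PropagateFast`, `PropagateNaN`, `PropagateNumbers`), but check the Eigen documentation for the exact spelling.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical strategies, modeled loosely on the idea in the commit.
enum class NaNStrategy { Fast, PropagateNaN, PropagateNumbers };

// Max reduction over a non-empty vector (v.at(0) throws if it is empty).
double max_coeff(const std::vector<double>& v, NaNStrategy s) {
  double acc = v.at(0);
  for (std::size_t i = 1; i < v.size(); ++i) {
    const double x = v[i];
    switch (s) {
      case NaNStrategy::Fast:
        // Plain comparison: the result depends on where NaNs sit.
        acc = (x > acc) ? x : acc;
        break;
      case NaNStrategy::PropagateNaN:
        // Any NaN operand makes the whole result NaN.
        if (std::isnan(acc) || std::isnan(x)) {
          acc = std::numeric_limits<double>::quiet_NaN();
        } else if (x > acc) {
          acc = x;
        }
        break;
      case NaNStrategy::PropagateNumbers:
        // std::fmax returns the number when exactly one operand is NaN.
        acc = std::fmax(acc, x);
        break;
    }
  }
  return acc;
}
```

With `Fast`, `{1, NaN, 2}` gives 2 but `{NaN, 1, 2}` gives NaN. That order dependence is what an explicit strategy removes.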
*	Disable new/delete test for HIP	Antonio Sanchez	2021-02-25
\|
*	Fix clang compile when no MMA flags are set. Simplify MMA compiler detection.	Chip-Kerchner	2021-02-24
\|
*	Don't crash when attempting to slice an empty tensor.	Rasmus Munk Larsen	2021-02-24
\|
*	Remove unused function scalar_cmp_with_cast.	Rasmus Munk Larsen	2021-02-24
\|
*	Cast anonymous enums to int when used in expressions.	Rasmus Munk Larsen	2021-02-24
\|
*	Having forward template function declarations in a P10 file causes bad code ↵	Chip-Kerchner	2021-02-24
\| \| \| \|	in certain situations.
*	Some improvements for kissfft from Martin Reinecke(pocketfft author):	Guoqiang QI	2021-02-24
\| \| \| \| \| \|	1. Compute only about half of the factors and use complex-conjugate symmetry for the rest, instead of computing all of them, to save time. 2. Calculate all twiddles in double, because that gives the maximum achievable precision when doing float transforms. 3. Reduce all angles to the range 0<angle<pi/4, which gives even more precision.
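Points 1 and 2 of the kissfft entry above can be sketched as follows: evaluate the twiddle angles and trig functions in double, round to the target scalar once, and compute only the first half directly. The rest is filled in from the conjugate symmetry w[n-k] = conj(w[k]). This sketch assumes the forward-transform sign convention exp(-2*pi*i*k/n). The name `make_twiddles` is hypothetical, and the sketch does not reproduce the reduction of angles to [0, pi/4] from point 3.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical helper: forward-FFT twiddles w[k] = exp(-2*pi*i*k/n).
// Angles and trig are evaluated in double and rounded to Scalar once.
// Only k = 0..n/2 is computed directly; the rest use w[n-k] = conj(w[k]).
template <typename Scalar>
std::vector<std::complex<Scalar>> make_twiddles(std::size_t n) {
  std::vector<std::complex<Scalar>> w(n);
  if (n == 0) return w;
  const double pi = 3.14159265358979323846;
  for (std::size_t k = 0; k <= n / 2; ++k) {
    const double angle = -2.0 * pi * static_cast<double>(k) / static_cast<double>(n);
    w[k] = std::complex<Scalar>(static_cast<Scalar>(std::cos(angle)),
                                static_cast<Scalar>(std::sin(angle)));
  }
  // Second half from conjugate symmetry: no further trig calls.
  for (std::size_t k = n / 2 + 1; k < n; ++k) {
    w[k] = std::conj(w[n - k]);
  }
  return w;
}
```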
*	Add `invoke_result` and eliminate `result_of` warnings for C++17+.	Antonio Sanchez	2021-02-24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The `std::result_of` meta struct is deprecated in C++17 and removed in C++20. It was still slipping through due to a faulty definition of `EIGEN_HAS_STD_RESULT_OF`. Added a new macro `EIGEN_HAS_STD_INVOKE_RESULT` and an `Eigen::internal::invoke_result` implementation with a fallback for pre-C++17. Replaces the `result_of` definition with one based on `std::invoke_result` for C++17 and higher. For completeness, added nullary op support for C++03. Fixes #1850.
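The fallback pattern described above can be sketched with a plain `__cplusplus` check. Eigen itself keys this on its own `EIGEN_HAS_STD_INVOKE_RESULT` macro, and the struct name `my_invoke_result` is hypothetical. The sketch needs C++11 for variadic templates. On MSVC, `__cplusplus` reports the real standard only with `/Zc:__cplusplus`, which is one reason a dedicated feature macro is preferable.

```cpp
#include <type_traits>

// Hypothetical helper (not Eigen's code): use std::invoke_result on C++17+,
// where std::result_of is deprecated (and removed in C++20), otherwise
// fall back to std::result_of.
#if __cplusplus >= 201703L
template <typename F, typename... Args>
struct my_invoke_result {
  using type = typename std::invoke_result<F, Args...>::type;
};
#else
template <typename F, typename... Args>
struct my_invoke_result {
  // result_of takes a function type F(Args...) rather than a parameter pack.
  using type = typename std::result_of<F(Args...)>::type;
};
#endif
```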
*	Fixes to support old and new versions of the compilers for built-ins. Cast ↵	Chip-Kerchner	2021-02-24
\| \| \| \|	to non-const when using vector_pair with certain built-ins.
*	Fix CUDA device new and delete, and add test.	Antonio Sanchez	2021-02-24
\| \| \| \|	HIP does not support new/delete on device, so test is skipped.
*	Eliminate CMake FindPackageHandleStandardArgs warnings.	Antonio Sanchez	2021-02-24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	CMake complains that the package name does not match when the case differs, e.g.: ``` CMake Warning (dev) at /usr/share/cmake-3.18/Modules/FindPackageHandleStandardArgs.cmake:273 (message): The package name passed to `find_package_handle_standard_args` (UMFPACK) does not match the name of the calling package (Umfpack). This can lead to problems in calling code that expects `find_package` result variables (e.g., `_FOUND`) to follow a certain pattern. Call Stack (most recent call first): cmake/FindUmfpack.cmake:50 (find_package_handle_standard_args) bench/spbench/CMakeLists.txt:24 (find_package) This warning is for project developers. Use -Wno-dev to suppress it. ``` Here we rename the libraries to match their true cases.
*	Disable fast psqrt for NEON.	Antonio Sanchez	2021-02-23
\| \| \| \| \| \| \|	Accuracy is too poor - requires at least two Newton iterations, but then it is no longer significantly faster than `vsqrt`. Fixes #2094.
*	Fix the GPU compile-phase check for std::hash	Antonio Sanchez	2021-02-23
\|
*	Fix some CUDA warnings.	Antonio Sanchez	2021-02-24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Added an `EIGEN_HAS_STD_HASH` macro, which checks for C++11 support and that we are not compiling for the GPU. `std::hash<float>` is not a device function, so it cannot be used by `std::hash<bfloat16>`. Removed `EIGEN_DEVICE_FUNC` and only define the specialization if `EIGEN_HAS_STD_HASH` is set. Same for `half`. Added `EIGEN_CUDA_HAS_FP16_ARITHMETIC` to improve readability and eliminate warnings about `EIGEN_CUDA_ARCH` not being defined. Replaced a couple of C-style casts with `reinterpret_cast` for aligned loading of `half` to `half2`. This eliminates `-Wcast-align` warnings in clang. Although not ideal due to potential type aliasing, this is how CUDA handles these conversions internally.
*	Accurate pow, part 2. This change adds specializations of log2 and exp2 for ↵	Rasmus Munk Larsen	2021-02-23
\| \| \| \| \| \| \|	double that make pow<double> accurate to 1 ULP. Speed for AVX-512 is within 0.5% of the current implementation.
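Accurate log2/exp2 matter for pow because of the identity pow(x, y) = exp2(y * log2(x)) for x > 0. Suppose t = y * log2(x) carries an absolute rounding error d. Then exp2(t + d) = exp2(t) * 2^d, which is a relative error of about ln(2) * d in the result. Since |t| can reach about 1024 for double, even a one-ulp relative error in t becomes hundreds of ulps in the output. The function below is a deliberately naive sketch of the identity (hypothetical name, not Eigen's implementation). It does not reach 1 ULP, because it lacks the extra-precision log2 that the commit adds.

```cpp
#include <cmath>

// Naive pow for x > 0 via pow(x, y) = exp2(y * log2(x)).
// The rounding error of y * log2(x) is amplified by exp2, so this is only
// accurate for moderate |y * log2(x)|.
double pow_via_exp2(double x, double y) {
  return std::exp2(y * std::log2(x));
}
```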
*	Fixed sparse conservativeResize() when both num cols and rows decreased.	Adam Shapiro	2021-02-23
\| \| \| \| \|	The previous implementation caused a buffer overflow trying to calculate non-zero counts for columns that no longer exist.
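The fix above amounts to walking only the columns that survive the resize, rather than the old column count. A minimal sketch on a hypothetical compressed-column (CSC) struct follows. It is not Eigen's `SparseMatrix`, and it builds a new matrix instead of resizing in place.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical compressed-column (CSC) matrix; not Eigen's SparseMatrix.
struct Csc {
  std::size_t rows = 0, cols = 0;
  std::vector<std::size_t> outer;  // cols + 1 column start offsets
  std::vector<std::size_t> inner;  // row index of each stored entry
  std::vector<double> values;      // value of each stored entry
};

// Resize to new_rows x new_cols, keeping the entries that still fit.
// Only columns present in both shapes are read, so nothing past the
// end of m.outer is touched when both rows and cols shrink.
Csc conservative_resize(const Csc& m, std::size_t new_rows, std::size_t new_cols) {
  Csc out;
  out.rows = new_rows;
  out.cols = new_cols;
  out.outer.assign(new_cols + 1, 0);
  const std::size_t kept_cols = std::min(m.cols, new_cols);
  for (std::size_t j = 0; j < kept_cols; ++j) {
    for (std::size_t p = m.outer[j]; p < m.outer[j + 1]; ++p) {
      if (m.inner[p] < new_rows) {  // drop entries in removed rows
        out.inner.push_back(m.inner[p]);
        out.values.push_back(m.values[p]);
      }
    }
    out.outer[j + 1] = out.inner.size();
  }
  // Columns added when growing start out empty.
  for (std::size_t j = kept_cols; j < new_cols; ++j) {
    out.outer[j + 1] = out.inner.size();
  }
  return out;
}
```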
*	Fix compilation errors with later versions of GCC and use of MMA.	Chip-Kerchner	2021-02-22
\|
*	Fixes Bug #1925. Packets should be passed by const reference, even to inline ↵	Christoph Hertzberg	2021-02-20
\| \| \| \|	functions.
*	Add missing adolc isinf/isnan.	Antonio Sanchez	2021-02-19
\| \| \| \| \| \| \|	Also modified cmake/FindAdolc.cmake to eliminate warnings, and added search paths to match install layout. Fixed: #2157
*	Missing change regarding #1910	Christoph Hertzberg	2021-02-19
\|
*	Bug #1910: Make SparseCholesky work for RowMajor matrices	Christoph Hertzberg	2021-02-19
\|
*	Revert "add EIGEN_DEVICE_FUNC to EIGEN_MAKE_ALIGNED_OPERATOR_NEW_IF macros ↵	Antonio Sánchez	2021-02-19
\| \| \| \| \|	(only if not HIPCC)." This reverts commit 12fd3dd655e37ba26e7ab236d32163e0aa35da39
*	Return nan at poles of polygamma, digamma, and zeta if limit is not defined	frgossen	2021-02-19
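For the digamma part of the entry above, the poles are x = 0, -1, -2, and so on, where the left and right limits disagree. The sketch below returns NaN there. It is a textbook scalar digamma (reflection, recurrence, then an asymptotic series), and the name `digamma_sketch` is hypothetical. Only the pole check corresponds to the change described in the commit; the rest is not Eigen's implementation.

```cpp
#include <cmath>
#include <limits>

// Scalar digamma sketch that returns NaN at the poles x = 0, -1, -2, ...
double digamma_sketch(double x) {
  const double pi = 3.14159265358979323846;
  if (x <= 0.0 && std::floor(x) == x) {
    return std::numeric_limits<double>::quiet_NaN();  // pole: no defined limit
  }
  if (x < 0.0) {
    // Reflection: psi(x) = psi(1 - x) - pi / tan(pi * x)
    return digamma_sketch(1.0 - x) - pi / std::tan(pi * x);
  }
  double result = 0.0;
  // Recurrence psi(x) = psi(x + 1) - 1/x until the series is accurate.
  while (x < 10.0) {
    result -= 1.0 / x;
    x += 1.0;
  }
  // Asymptotic series: ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6) + 1/(240x^8)
  const double inv = 1.0 / x;
  const double inv2 = inv * inv;
  result += std::log(x) - 0.5 * inv -
            inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 * (1.0 / 252 - inv2 / 240)));
  return result;
}
```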
\|
*	Use the Cephes double subtraction trick in pexp<float> even when FMA is ↵	Rasmus Munk Larsen	2021-02-18
\| \| \| \|	available. Otherwise the accuracy drops from 1 ulp to 3 ulp.
*	add EIGEN_DEVICE_FUNC to EIGEN_MAKE_ALIGNED_OPERATOR_NEW_IF macros (only if ↵	Masaki Murooka	2021-02-17
\| \| \| \|	not HIPCC).
*	Bump to 3.4.99	David Tellenbach	2021-02-17
\|
*	Define internal::make_unsigned for [unsigned] long long on macOS.	David Tellenbach	2021-02-17
\| \| \| \| \| \| \| \| \| \|	macOS defines int64_t as long long even for C++03 and therefore expects a template specialization internal::make_unsigned<long long>. Since other platforms define int64_t as long for C++03, we cannot add the specialization for all cases.
*	Fix uninitialized warning on AVX.	Antonio Sanchez	2021-02-17
\|
*	Fixed performance issues for VSX and P10 MMA in general_matrix_matrix_product	Chip Kerchner	2021-02-17
\|
*	New accurate algorithm for pow(x,y). This version is accurate to 1.4 ulps ↵	Rasmus Munk Larsen	2021-02-17
\| \| \| \|	for float, while still being 10x faster than std::pow for AVX512. A future change will introduce a specialization for double.
*	Updated pfrexp implementation.	Antonio Sanchez	2021-02-17
\| \| \| \| \| \|	The original implementation fails for 0, denormals, inf, and NaN. See #2150
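The failure cases named above (0, denormals, inf, NaN) are the ones a naive "read the exponent bits" frexp gets wrong. Below is a scalar sketch for IEEE-754 double that handles them. The name `frexp_bits` is hypothetical, and this is not Eigen's `pfrexp`. Denormals are first rescaled by 2^54 so that their exponent field becomes meaningful.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Returns m with 0.5 <= |m| < 1 and sets *e so that x == m * 2^(*e).
// 0, inf and NaN are returned unchanged with *e = 0 (std::frexp also uses 0
// for zero; for inf/NaN it leaves the exponent unspecified).
double frexp_bits(double x, int* e) {
  if (x == 0.0 || std::isinf(x) || std::isnan(x)) {
    *e = 0;
    return x;
  }
  int shift = 0;
  if (std::fabs(x) < std::numeric_limits<double>::min()) {
    // Denormal: exponent field is 0, so rescale by 2^54 (exact) first.
    x *= std::ldexp(1.0, 54);
    shift = 54;
  }
  std::uint64_t bits;
  std::memcpy(&bits, &x, sizeof bits);
  const int biased = static_cast<int>((bits >> 52) & 0x7ff);
  *e = biased - 1022 - shift;
  // Keep sign and mantissa; set the biased exponent to 1022, i.e. [0.5, 1).
  const std::uint64_t exp_mask = std::uint64_t(0x7ff) << 52;
  bits = (bits & ~exp_mask) | (std::uint64_t(1022) << 52);
  std::memcpy(&x, &bits, sizeof bits);
  return x;
}
```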
*	Document possible inconsistencies when using `Matrix<bool, ...>`	David Tellenbach	2021-02-17
\|
*	missing method in packetmath.h void ptranspose(PacketBlock<Packet16uc, 4>& ↵	Ashutosh Sharma	2021-02-16
\| \| \| \|	kernel)
*	Avoid -Wunused warnings in NDEBUG builds.	Jan van Dijk	2021-02-12
\| \| \| \| \| \| \| \|	In two places in SuperLUSupport.h, a local variable 'size' is created that is used only inside an eigen_assert. These variables are removed, and the required values are fetched directly inside the assert statements. This avoids annoying -Wunused warnings (and -Werror=unused errors) in NDEBUG builds.
*	Don't allow all test jobs to fail but only the currently failing ones.	David Tellenbach	2021-02-12
\|
*	Use vrsqrts for rsqrt Newton iterations.	Antonio Sanchez	2021-02-11
\| \| \| \| \|	It's slightly faster and slightly more accurate, allowing our current packetmath tests to pass for sqrt with a single iteration.
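The Newton step for y ≈ 1/sqrt(x) is y' = y * (3 - x*y*y) / 2. The NEON instruction `vrsqrts(a, b)` computes (3 - a*b) / 2, so one refinement is y * vrsqrts(x*y, y). The scalar emulation below is a sketch under that assumption (hypothetical function names, plain float arithmetic). It is not bit-identical to the hardware instruction. Each step roughly squares the relative error (e' ≈ 1.5 e²).

```cpp
#include <cmath>

// Scalar stand-in for NEON vrsqrts: (3 - a*b) / 2.
float rsqrts_step(float a, float b) { return (3.0f - a * b) * 0.5f; }

// One Newton-Raphson refinement of y ~= 1/sqrt(x).
float refine_rsqrt(float x, float y) { return y * rsqrts_step(x * y, y); }
```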
*	Adjust bounds for pexp_float/double	Antonio Sanchez	2021-02-10
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The original clamping bounds on `_x` actually produce finite values: ``` exp(88.3762626647950) = 2.40614e+38 < 3.40282e+38 exp(709.437) = 1.27226e+308 < 1.79769e+308 ``` so with an accurate `ldexp` implementation, `pexp` fails for large inputs, producing finite values instead of `inf`. This adjusts the bounds slightly outside the finite range so that the output will overflow to +/- `inf` as expected.
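The bound problem above is easy to reproduce: exp(88.3762626647950f) is about 2.406e38, which is still finite. A clamp at that value therefore can never yield +inf. The sketch below clamps slightly outside the finite range instead. The constants 88.8 (above log(FLT_MAX) ≈ 88.7228) and -104 (below the log of the smallest float denormal, ≈ -103.28) are illustrative choices, not Eigen's actual values, and `clamped_exp` is a hypothetical name.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical clamp-then-evaluate front end for a float exp.
// The upper bound lies just above log(FLT_MAX), so clamped inputs still
// overflow to +inf; the lower bound lies below log(denorm_min), so they
// underflow to 0. With this argument order a NaN input passes through.
float clamped_exp(float x) {
  const float hi = 88.8f;
  const float lo = -104.0f;
  return std::exp(std::min(std::max(x, lo), hi));
}
```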
*	Fix ldexp implementations.	Antonio Sanchez	2021-02-10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The previous implementations produced garbage values if the exponent did not fit within the exponent bits. See #2131 for a complete discussion, and !375 for other possible implementations. Here we implement the 4-factor version. See `pldexp_impl` in `GenericPacketMathFunctions.h` for a full description. The SSE `pcmp*` methods were moved down since `pcmp_le<Packet4i>` requires `por`. Left as a "TODO" is to delegate to a faster version if we know the exponent does fit within the exponent bits. Fixes #2131.
*	Loop-less ptranspose	Ashutosh Sharma	2021-02-10
\|
*	Remove vim-specific comments used to recognize the correct file type.	David Tellenbach	2021-02-09
\| \| \| \|	As discussed in #2143, we remove editor-specific comments.
*	Replace nullptr by NULL in SparseLU.h to be C++03 compliant.	David Tellenbach	2021-02-09
\|
*	add specialization of check_sparse_solving() for SuperLU solver, in order to ↵	Ralf Hannemann-Tamas	2021-02-08
\| \| \| \|	test adjoint and transpose solves
*	Fix documentation typos in LDLT.h	Nikolaus Demmel	2021-02-08
\|
*	Enable bdcsvd on host.	Antonio Sanchez	2021-02-08
\| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, if compiled by NVCC, the `MatrixBase::bdcSvd()` implementation is skipped, leading to a linker error. This prevents it from running on the host as well. It seems it was disabled 6 years ago (5384e891) to match `jacobiSvd`, but `jacobiSvd` is now enabled on host. Tested; it runs fine on host, but will not compile/run for device (though it's not labelled as a device function, so this should be fine). Fixes #2139
*	Add more tests for pow and fix a corner case for huge exponent where the ↵	Rasmus Munk Larsen	2021-02-05
\| \| \| \|	result is always zero or infinite unless x is one.
*	Disable vectorized pow for half/bfloat16.	Antonio Sanchez	2021-02-05
\| \| \| \| \| \| \| \| \|	We are potentially seeing some accuracy issues with these. Ideally we would hand off to `float`, but that's not trivial with the current setup. We may want to consider adding `ppow<Packet>` and `HasPow`, so implementations can more easily specialize this.
*	Fix excessive GEBP register spilling for 32-bit NEON.	Antonio Sanchez	2021-02-03
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Clang does a poor job of optimizing the GEBP microkernel on 32-bit ARM, leading to excessive 16-byte register spills, slowing down basic f32 matrix multiplication by approx 50%. By specializing `gebp_traits`, we can eliminate the register spills. Volatile inline ASM both acts as a barrier to prevent reordering and enforces strict register use. In a simple f32 matrix multiply example, this modification reduces 16-byte spills from 109 instances to zero, leading to a 1.5x speed increase (search for `16-byte Spill` in the assembly in https://godbolt.org/z/chsPbE). This is a replacement of !379. See there for further discussion. Also moved `gebp_traits` specializations for NEON to `Eigen/src/Core/arch/NEON/GeneralBlockPanelKernel.h` to be alongside other NEON-specific code. Fixes #2138.
*	Eliminate implicit conversions from float to double.	Antonio Sanchez	2021-02-01
\|
*	Implement bit_* for device.	Antonio Sanchez	2021-02-01
\| \| \| \| \| \| \| \| \| \|	Unfortunately, `std::bit_and` and the like are host-only functions prior to C++14 (since they are not `constexpr`). They also never exist in the global namespace, so the current implementation always fails to compile via NVCC, since `EIGEN_USING_STD` tries to import the symbol from the global namespace on device. To overcome these limitations, we implement these functionals here.
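Replacements for `std::bit_and` and related functionals that do not depend on the standard library's host-only versions can be as small as the sketch below. The struct names are hypothetical. In CUDA code the call operators would also need a `__host__ __device__` annotation (what `EIGEN_DEVICE_FUNC` expands to under CUDA). That annotation is omitted here so the sketch compiles as plain C++11.

```cpp
// Hypothetical bitwise functionals; constexpr so they are usable in
// constant expressions regardless of the standard library version.
template <typename T>
struct bit_and_op {
  constexpr T operator()(const T& a, const T& b) const { return a & b; }
};

template <typename T>
struct bit_or_op {
  constexpr T operator()(const T& a, const T& b) const { return a | b; }
};

template <typename T>
struct bit_xor_op {
  constexpr T operator()(const T& a, const T& b) const { return a ^ b; }
};
```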