path: root/Eigen
Commit message [Author, Age]
* Define EIGEN_CPLUSPLUS and replace most __cplusplus checks. [Antonio Sanchez, 2021-03-05]
|   The macro `__cplusplus` is not defined correctly in MSVC unless building with the `/Zc:__cplusplus` flag. Instead, MSVC defines `_MSVC_LANG` to the specified C++ standard version number. Here we introduce `EIGEN_CPLUSPLUS`, which will contain the C++ version number both for MSVC and otherwise. This simplifies checks for supported features.
|   Also replaced most instances of standard version checking via `__cplusplus` with the existing `EIGEN_COMP_CXXVER` macro for better clarity.
|   Fixes: #2170
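The version-detection idea in this commit can be sketched as follows. This is a minimal illustration, not Eigen's actual definition: the macro name `MY_CPLUSPLUS` and the two helper functions are hypothetical and chosen only for this example.

```cpp
// Hypothetical illustration; MY_CPLUSPLUS is not Eigen's macro.
// MSVC keeps __cplusplus at 199711L unless /Zc:__cplusplus is passed,
// but it sets _MSVC_LANG to the selected language standard.
#if defined(_MSVC_LANG)
#define MY_CPLUSPLUS _MSVC_LANG
#elif defined(__cplusplus)
#define MY_CPLUSPLUS __cplusplus
#else
#define MY_CPLUSPLUS 0
#endif

// True when the translation unit is compiled as C++11 or newer.
constexpr bool has_cxx11() { return MY_CPLUSPLUS >= 201103L; }

// True when the translation unit is compiled as C++17 or newer.
constexpr bool has_cxx17() { return MY_CPLUSPLUS >= 201703L; }
```

Feature checks can then test one macro instead of repeating the MSVC special case at every call site.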
* Fix rint SSE/NEON again, using optimization barrier. [Antonio Sanchez, 2021-03-05]
|   This is a new version of !423, which failed for MSVC.
|   Defined `EIGEN_OPTIMIZATION_BARRIER(X)` that uses inline assembly to prevent operations involving `X` from crossing that barrier. Should work on most `GNUC` compatible compilers (MSVC doesn't seem to need this). This is a modified version adapted from what was used in `psincos_float` and tested on more platforms (see #1674, https://godbolt.org/z/73ezTG).
|   Modified `rint` to use the barrier to prevent the add/subtract rounding trick from being optimized away.
|   Also fixed an edge case for large inputs that get bumped up a power of two and end up rounding away more than just the fractional part. If we are over `2^digits` then just return the input. This edge case was missed in the test since the test was comparing approximate equality, which was still satisfied. Adding a strict equality option catches it.
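A scalar sketch of the add/subtract rounding trick with a barrier and the large-input guard is shown below. The names `SKETCH_BARRIER` and `rint_sketch` are hypothetical, the `"+m"` constraint is a conservative choice for this example rather than what Eigen uses, and the code assumes the default round-to-nearest mode.

```cpp
#include <cmath>

// Hypothetical stand-in for an optimization barrier: an empty asm statement
// that tells GNU-compatible compilers the variable may have changed.
// "+m" forces the value through memory; it is portable but not the fastest.
#if defined(__GNUC__)
#define SKETCH_BARRIER(x) __asm__ volatile("" : "+m"(x))
#else
#define SKETCH_BARRIER(x) ((void)0)
#endif

// Rounds to the nearest integer (ties to even in the default rounding mode).
inline float rint_sketch(float a) {
  // 2^23: at or above this magnitude every float is already an integer.
  const float limit = 8388608.0f;
  const float abs_a = std::fabs(a);
  // Large values, infinities, and NaN are returned unchanged.
  if (!(abs_a < limit)) return a;
  float r = abs_a + limit;  // the addition discards the fractional bits
  SKETCH_BARRIER(r);        // keep the compiler from folding (x + c) - c
  r -= limit;
  return std::copysign(r, a);
}
```

The early return is the edge case from the commit message: without it, adding `limit` to an input that is already large could push the sum past the next power of two and round away integer bits as well.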
* Revert "Adds EIGEN_CONSTEXPR and EIGEN_NOEXCEPT to rows(), cols(), innerStride(), outerStride(), and size()" [David Tellenbach, 2021-03-05]
|   This reverts commit 6cbb3038ac48cb5fe17eba4dfbf26e3e798041f1 because it breaks clang-10 builds on x86 and aarch64 when C++11 is enabled.
* Adds EIGEN_CONSTEXPR and EIGEN_NOEXCEPT to rows(), cols(), innerStride(), outerStride(), and size() [Steve Bronder, 2021-03-04]
|
* Revert "Fix rint for SSE/NEON." [Antonio Sánchez, 2021-03-03]
|   This reverts commit e72dfeb8b9fa5662831b5d0bb9d132521f9173dd
* Fix rint for SSE/NEON. [Antonio Sanchez, 2021-03-03]
|   It seems *sometimes* with aggressive optimizations the `psub(padd(a, b), b)` trick to force rounding is compiled away. Here we replace it with inline assembly to prevent this (I tried `volatile`, but that leads to additional loads from memory).
|   Also fixed an edge case for large inputs `a` where adding `b` bumps the value up a power of two and ends up rounding away more than just the fractional part. If we are over `2^digits` then just return the input. This edge case was missed in the test since the test was comparing approximate equality, which was still satisfied. Adding a strict equality option catches it.
* Add print for SSE/NEON, use NEON rounding intrinsics if available. [Antonio Sanchez, 2021-02-27]
|   In SSE, by adding/subtracting 2^MantissaBits, we force rounding according to the current rounding mode.
|   For NEON, we use the provided intrinsics for rint/floor/ceil if available (armv8).
|   Related to #1969.
* Make half/bfloat16 constructor take inputs by value, fix powerpc test. [Antonio Sanchez, 2021-02-27]
|   Since `numeric_limits<half>::max_exponent` is a static inline constant, it cannot be directly passed by reference. This triggers a linker error in recent versions of `g++-powerpc64le`.
|   Changing `half` to take inputs by value fixes this. Wrapping `max_exponent` with `int(...)` to make an addressable integer also fixes this and may help with other custom `Scalar` types down the road.
|   Also eliminated some compile warnings for powerpc.
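The two fixes named in this commit can be illustrated with a small sketch. All types here (`LimitsSketch`, `ByValue`, `ByConstRef`) are hypothetical stand-ins, not Eigen's `half` or `numeric_limits` specializations.

```cpp
// Hypothetical stand-in for numeric_limits<half>: the constant is declared
// and initialized in-class, with no out-of-class definition.
struct LimitsSketch {
  static const int max_exponent = 16;
};

// Constructor taking its input by value: reading the constant is enough.
struct ByValue {
  int value;
  explicit ByValue(int v) : value(v) {}
};

// Constructor taking its input by const reference: needs an addressable object.
struct ByConstRef {
  int value;
  explicit ByConstRef(const int& v) : value(v) {}
};

// Binding the constant directly to the reference odr-uses it and may fail to
// link when no out-of-class definition exists:
//   ByConstRef r(LimitsSketch::max_exponent);

// Fix 1: pass by value, so only the constant's value is needed.
inline ByValue make_by_value() { return ByValue(LimitsSketch::max_exponent); }

// Fix 2: wrap with int(...), which creates a temporary the reference can bind to.
inline ByConstRef make_by_ref_wrapped() {
  return ByConstRef(int(LimitsSketch::max_exponent));
}
```

Whether the commented-out line actually fails depends on the compiler, optimization level, and language standard, which is consistent with the error showing up only on some toolchains.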
* Remove unused include [Christoph Hertzberg, 2021-02-27]
|
* clang 10 aggressively warns about precision loss when converting int to float (or long to double) [Christoph Hertzberg, 2021-02-27]
|   (cherry picked from commit cd541ad52c8152340469cae210312c0e27829c8d)
* Fix some enum-enum conversion warnings [Christoph Hertzberg, 2021-02-27]
|   (cherry picked from commit 838f3d8ce22a5549ef10c7386fb03040721749a0)
* Fixed/masked more implicit copy constructor warnings [Christoph Hertzberg, 2021-02-27]
|   (cherry picked from commit 2883e91ce5a99c391fbf28e20160176b70854992)
* Fix double-promotion warnings [Christoph Hertzberg, 2021-02-27]
|   (cherry picked from commit c22c103e932e511e96645186831363585a44b7a3)
* Fix NEON sqrt for 32-bit, add prsqrt. [Antonio Sanchez, 2021-02-26]
|   With !406, we accidentally broke arm 32-bit NEON builds, since `vsqrt_f32` is only available for 64-bit.
|   Here we add back the `rsqrt` implementation for 32-bit, relying on a `prsqrt` implementation with better handling of edge cases.
|   Note that several of the 32-bit NEON packet tests are currently failing - either due to denormal handling (NEON versions flush to zero, but scalar paths don't) or due to accuracy (e.g. sin/cos).
* Merge branch 'rmlarsen1/eigen-nan_prop' [Rasmus Munk Larsen, 2021-02-26]
|\
| * Merge branch 'nan_prop' of https://gitlab.com/rmlarsen1/eigen into nan_prop [Rasmus Munk Larsen, 2021-02-26]
| |\
| * | Add TODO. [Rasmus Munk Larsen, 2021-02-26]
| | |
| * | Defer default for minCoeff/maxCoeff to templated variant. [Rasmus Munk Larsen, 2021-02-26]
| | |
* | | Fix floor/ceil for NEON fp16. [Antonio Sanchez, 2021-02-25]
| | |   Forgot to test this. Fixes bug introduced in !416.
* | | Fix SSE/NEON pfloor/pceil for saturated values. [Antonio Sanchez, 2021-02-25]
| | |   The original will saturate if the input does not fit into an integer type. Here we fix this, returning the input if it doesn't have enough precision to have a fractional part.
| | |   Also added `pceil` for NEON.
| | |   Fixes #1969.
| | * Fix indentation. [Rasmus Munk Larsen, 2021-02-25]
| | |
| | * Make it possible to specify NaN propagation strategy for maxCoeff/minCoeff reductions. [Rasmus Munk Larsen, 2021-02-25]
| |/
|/|
* | Fix clang compile when no MMA flags are set. Simplify MMA compiler detection. [Chip-Kerchner, 2021-02-24]
| |
| * Fix indentation. [Rasmus Munk Larsen, 2021-02-24]
| |
| * Make it possible to specify NaN propagation strategy for maxCoeff/minCoeff reductions. [Rasmus Munk Larsen, 2021-02-25]
|/
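What a selectable NaN propagation strategy means for a min reduction can be sketched in plain C++. The enum `NaNPolicy`, its values, and `min_coeff` are hypothetical names for this example and do not reproduce Eigen's interface; the function assumes a non-empty input.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical policy names for this sketch.
enum NaNPolicy { kPropagateNaN, kPropagateNumbers };

// Minimum of a non-empty vector under the chosen NaN policy.
template <NaNPolicy Policy>
float min_coeff(const std::vector<float>& v) {
  float result = v[0];
  for (std::size_t i = 1; i < v.size(); ++i) {
    const float x = v[i];
    if (Policy == kPropagateNaN) {
      // Once a NaN has been seen, the result stays NaN.
      if (std::isnan(result)) break;
      if (std::isnan(x) || x < result) result = x;
    } else {
      // NaNs are skipped; the result is NaN only if every element is NaN.
      if (std::isnan(result) || x < result) result = x;
    }
  }
  return result;
}
```

For the input `{3, NaN, 1}`, the first policy yields NaN and the second yields 1. A third, "fast" option that leaves the NaN behavior unspecified is a common design choice when the caller knows the data is NaN-free.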
* Remove unused function scalar_cmp_with_cast. [Rasmus Munk Larsen, 2021-02-24]
|
* Cast anonymous enums to int when used in expressions. [Rasmus Munk Larsen, 2021-02-24]
|
* Having forward template function declarations in a P10 file causes bad code in certain situations. [Chip-Kerchner, 2021-02-24]
|
* Add `invoke_result` and eliminate `result_of` warnings for C++17+. [Antonio Sanchez, 2021-02-24]
|   The `std::result_of` meta struct is deprecated in C++17 and removed in C++20. It was still slipping through due to a faulty definition of `EIGEN_HAS_STD_RESULT_OF`.
|   Added a new macro `EIGEN_HAS_STD_INVOKE_RESULT` and an `Eigen::internal::invoke_result` implementation with a fallback for pre-C++17. Replaces the `result_of` definition with one based on `std::invoke_result` for C++17 and higher.
|   For completeness, added nullary op support for C++03.
|   Fixes #1850.
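The switch between `std::invoke_result` and `std::result_of` can be sketched with a single alias. The alias name `invoke_result_sketch_t` and the functor `Doubler` are hypothetical; this example requires at least C++11 and does not cover the C++03 path mentioned in the commit.

```cpp
#include <type_traits>

// Pick the non-deprecated trait when compiling as C++17 or newer.
// _MSVC_LANG is checked as well because MSVC may leave __cplusplus at 199711L.
#if (defined(_MSVC_LANG) && _MSVC_LANG >= 201703L) || __cplusplus >= 201703L
template <typename F, typename... Args>
using invoke_result_sketch_t = typename std::invoke_result<F, Args...>::type;
#else
template <typename F, typename... Args>
using invoke_result_sketch_t = typename std::result_of<F(Args...)>::type;
#endif

// Hypothetical functor used to exercise the alias.
struct Doubler {
  double operator()(int x) const { return 2.0 * x; }
};

static_assert(
    std::is_same<invoke_result_sketch_t<Doubler, int>, double>::value,
    "calling Doubler with an int yields double");
```

Note the different spelling of the arguments: `result_of` takes a function type `F(Args...)`, while `invoke_result` takes `F` and `Args...` as separate template parameters.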
* Fixes to support old and new versions of the compilers for built-ins. Cast to non-const when using vector_pair with certain built-ins. [Chip-Kerchner, 2021-02-24]
|
* Fix CUDA device new and delete, and add test. [Antonio Sanchez, 2021-02-24]
|   HIP does not support new/delete on device, so test is skipped.
* Disable fast psqrt for NEON. [Antonio Sanchez, 2021-02-23]
|   Accuracy is too poor - requires at least two Newton iterations, but then it is no longer significantly faster than `vsqrt`.
|   Fixes #2094.
* Fix check if GPU compile phase for std::hash [Antonio Sanchez, 2021-02-23]
|
* Fix some CUDA warnings. [Antonio Sanchez, 2021-02-24]
|   Added `EIGEN_HAS_STD_HASH` macro, checking for C++11 support and not running on GPU.
|   `std::hash<float>` is not a device function, so cannot be used by `std::hash<bfloat16>`. Removed `EIGEN_DEVICE_FUNC` and only define if `EIGEN_HAS_STD_HASH`. Same for `half`.
|   Added `EIGEN_CUDA_HAS_FP16_ARITHMETIC` to improve readability, eliminate warnings about `EIGEN_CUDA_ARCH` not being defined.
|   Replaced a couple C-style casts with `reinterpret_cast` for aligned loading of `half*` to `half2*`. This eliminates `-Wcast-align` warnings in clang. Although not ideal due to potential type aliasing, this is how CUDA handles these conversions internally.
* Accurate pow, part 2. This change adds specializations of log2 and exp2 for double that make pow<double> accurate to 1 ULP. Speed for AVX-512 is within 0.5% of the current implementation. [Rasmus Munk Larsen, 2021-02-23]
|
* Fixed sparse conservativeResize() when both num cols and rows decreased. [Adam Shapiro, 2021-02-23]
|   The previous implementation caused a buffer overflow trying to calculate non-zero counts for columns that no longer exist.
* Fix compilation errors with later versions of GCC and use of MMA. [Chip-Kerchner, 2021-02-22]
|
* Fixes Bug #1925. Packets should be passed by const reference, even to inline functions. [Christoph Hertzberg, 2021-02-20]
|
* Bug #1910: Make SparseCholesky work for RowMajor matrices [Christoph Hertzberg, 2021-02-19]
|
* Revert "add EIGEN_DEVICE_FUNC to EIGEN_MAKE_ALIGNED_OPERATOR_NEW_IF macros (only if not HIPCC)." [Antonio Sánchez, 2021-02-19]
|   This reverts commit 12fd3dd655e37ba26e7ab236d32163e0aa35da39
* Use the Cephes double subtraction trick in pexp<float> even when FMA is available. Otherwise the accuracy drops from 1 ulp to 3 ulp. [Rasmus Munk Larsen, 2021-02-18]
|
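The "double subtraction" refers to reducing the argument of exp in two steps with ln(2) split into a high and a low part. Below is a scalar sketch under stated assumptions: the function name `reduce_exp_arg` is hypothetical, the split constants are the values commonly quoted from Cephes' single-precision exp, and this is not Eigen's packet code.

```cpp
#include <cmath>

// Reduces x to r = x - n*ln(2), with n the nearest integer to x/ln(2).
// ln(2) is split so that n*kC1 is exact for moderate n, and the small
// remainder n*kC2 is subtracted separately.
inline float reduce_exp_arg(float x, int* n_out) {
  const float kLog2e = 1.44269504088896341f;  // 1 / ln(2)
  const float kC1 = 0.693359375f;             // high part of ln(2)
  const float kC2 = -2.12194440e-4f;          // low part: ln(2) - kC1
  const float n = std::floor(x * kLog2e + 0.5f);
  float r = x - n * kC1;  // first subtraction
  r = r - n * kC2;        // second subtraction restores the lost low bits
  *n_out = static_cast<int>(n);
  return r;
}
```

With the result in hand, exp(x) can be recovered as `std::ldexp(std::exp(r), n)`. A single subtraction `x - n * ln2f` would lose accuracy because `ln2f` itself is rounded to float precision.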
* add EIGEN_DEVICE_FUNC to EIGEN_MAKE_ALIGNED_OPERATOR_NEW_IF macros (only if not HIPCC). [Masaki Murooka, 2021-02-17]
|
* Bump to 3.4.99 [David Tellenbach, 2021-02-17]
|
* Define internal::make_unsigned for [unsigned]long long on macOS. [David Tellenbach, 2021-02-17]
|   macOS defines int64_t as long long even for C++03 and therefore expects a template specialization internal::make_unsigned<long long>, for C++03. Since other platforms define int64_t as long for C++03 we cannot add the specialization for all cases.
* Fix uninitialized warning on AVX. [Antonio Sanchez, 2021-02-17]
|
* Fixed performance issues for VSX and P10 MMA in general_matrix_matrix_product [Chip Kerchner, 2021-02-17]
|
* New accurate algorithm for pow(x,y). This version is accurate to 1.4 ulps for float, while still being 10x faster than std::pow for AVX512. A future change will introduce a specialization for double. [Rasmus Munk Larsen, 2021-02-17]
|
* Updated pfrexp implementation. [Antonio Sanchez, 2021-02-17]
|   The original implementation fails for 0, denormals, inf, and NaN.
|   See #2150
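The special cases listed above can be made concrete with a scalar sketch that extracts the exponent from the bit pattern and handles each case explicitly. The function `frexp_sketch` is hypothetical and is not Eigen's `pfrexp`; it assumes IEEE 754 single precision and returns an exponent of 0 for zero, infinity, and NaN.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Splits x into m * 2^e with |m| in [0.5, 1), like std::frexp for floats.
inline float frexp_sketch(float x, int* e) {
  // Zero, infinities, and NaN have no meaningful exponent: pass them through.
  if (x == 0.0f || std::isnan(x) || std::isinf(x)) {
    *e = 0;
    return x;
  }
  int adjust = 0;
  // Denormals have a zero exponent field, so scale them into the normal
  // range first and correct the exponent afterwards.
  if (std::fabs(x) < std::numeric_limits<float>::min()) {
    x *= 16777216.0f;  // 2^24
    adjust = -24;
  }
  std::uint32_t bits;
  std::memcpy(&bits, &x, sizeof bits);
  const int biased = static_cast<int>((bits >> 23) & 0xFFu);
  *e = biased - 126 + adjust;
  // Keep sign and mantissa, force the exponent field to 126 (|m| in [0.5, 1)).
  bits = (bits & 0x807FFFFFu) | (126u << 23);
  std::memcpy(&x, &bits, sizeof x);
  return x;
}
```

A naive version that only rewrites the exponent field would return garbage for exactly the four input classes named in the commit, which is why each gets its own branch here.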
* missing method in packetmath.h void ptranspose(PacketBlock<Packet16uc, 4>& kernel) [Ashutosh Sharma, 2021-02-16]
|
* Avoid -Wunused warnings in NDEBUG builds. [Jan van Dijk, 2021-02-12]
|   In two places in SuperLUSupport.h, a local variable 'size' is created that is used only inside an eigen_assert. Remove these, just fetch the required values inside the assert statements. This avoids annoying -Wunused warnings (and -Werror=unused errors) in NDEBUG builds.
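The pattern behind this fix can be shown with the standard `assert` macro, used here as a stand-in for `eigen_assert`. The function `first_element` is a hypothetical example, not code from SuperLUSupport.h.

```cpp
#include <cassert>
#include <vector>

// Before (warns with -DNDEBUG, because 'size' is only read inside the assert,
// which then expands to nothing):
//   const std::size_t size = v.size();
//   assert(size > 0);
//
// After: fetch the value inside the assertion itself, so no local variable
// is left unused when assertions are compiled out.
inline int first_element(const std::vector<int>& v) {
  assert(v.size() > 0 && "vector must not be empty");
  return v[0];
}
```

This only works when the asserted expression has no side effects, since the whole expression disappears in NDEBUG builds.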