Commit message (Author, Date)
* Define EIGEN_CPLUSPLUS and replace most __cplusplus checks. (Antonio Sanchez, 2021-03-05)
| The macro `__cplusplus` is not defined correctly in MSVC unless building with the `/Zc:__cplusplus` flag. Instead, MSVC defines `_MSVC_LANG` to the specified C++ standard version number.
| Here we introduce `EIGEN_CPLUSPLUS`, which contains the C++ version number both for MSVC and otherwise. This simplifies checks for supported features.
| Also replaced most instances of standard version checking via `__cplusplus` with the existing `EIGEN_COMP_CXXVER` macro for better clarity.
| Fixes: #2170
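A minimal sketch of the version-detection idea described in the commit above. The names `MY_CPLUSPLUS`, `MY_HAS_CXX11`, and `my_cplusplus_version` are hypothetical stand-ins for illustration and are not Eigen's actual definitions.

```cpp
// Hypothetical macro: pick the language-version number from whichever
// predefined macro the compiler fills in reliably.
#if defined(_MSVC_LANG)
#  define MY_CPLUSPLUS _MSVC_LANG   // MSVC reports the selected standard here
#elif defined(__cplusplus)
#  define MY_CPLUSPLUS __cplusplus  // other compilers set this correctly
#else
#  define MY_CPLUSPLUS 0
#endif

// Feature checks can then test a single macro instead of two.
#if MY_CPLUSPLUS >= 201103L
#  define MY_HAS_CXX11 1
#else
#  define MY_HAS_CXX11 0
#endif

// Small accessor so the detected value can be inspected at run time.
inline long my_cplusplus_version() { return MY_CPLUSPLUS; }
```

With a check like this, a feature test reads `#if MY_CPLUSPLUS >= 201703L` regardless of compiler.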
* Fix rint SSE/NEON again, using optimization barrier. (Antonio Sanchez, 2021-03-05)
| This is a new version of !423, which failed for MSVC.
| Defined `EIGEN_OPTIMIZATION_BARRIER(X)` that uses inline assembly to prevent operations involving `X` from crossing that barrier. Should work on most `GNUC` compatible compilers (MSVC doesn't seem to need this). This is a modified version adapted from what was used in `psincos_float` and tested on more platforms (see #1674, https://godbolt.org/z/73ezTG).
| Modified `rint` to use the barrier to prevent the add/subtract rounding trick from being optimized away.
| Also fixed an edge case for large inputs that get bumped up a power of two and end up rounding away more than just the fractional part. If we are over `2^digits` then just return the input. This edge case was missed in the test since the test was comparing approximate equality, which was still satisfied. Adding a strict equality option catches it.
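The add/subtract rounding trick and the barrier can be illustrated with a scalar sketch. This is not Eigen's packet code: `MY_OPTIMIZATION_BARRIER` and `my_rint` are hypothetical names, the barrier uses a memory constraint for portability (a register constraint would avoid the round trip through memory but is architecture-specific), and the sketch assumes the default round-to-nearest mode and uses `2^23` as the float threshold.

```cpp
#include <cmath>

// Hypothetical barrier: an empty asm statement that claims to read and
// modify X, so the compiler cannot fold operations across it.
#if defined(__GNUC__) || defined(__clang__)
#  define MY_OPTIMIZATION_BARRIER(X) __asm__ __volatile__("" : "+m"(X))
#else
#  define MY_OPTIMIZATION_BARRIER(X) ((void)(X))
#endif

// Scalar sketch of rint for float using the add/subtract trick.
inline float my_rint(float a) {
  const float limit = 8388608.0f;  // 2^23: floats at or above this are integers
  const float abs_a = std::fabs(a);
  // Large inputs (and NaN/inf) have no fractional part to round away;
  // adding `limit` to them would lose more than the fraction.
  if (!(abs_a < limit)) return a;
  float r = abs_a + limit;     // spacing is 1 in [2^23, 2^24): fraction is rounded off
  MY_OPTIMIZATION_BARRIER(r);  // keep the compiler from cancelling +limit/-limit
  r -= limit;
  return std::copysign(r, a);  // restore the sign (also preserves -0.0)
}
```

Without the barrier, a compiler that is allowed to reassociate floating-point operations may simplify `(abs_a + limit) - limit` to `abs_a`, which silently disables the rounding.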
* Revert "Adds EIGEN_CONSTEXPR and EIGEN_NOEXCEPT to rows(), cols(), innerStride(), outerStride(), and size()" (David Tellenbach, 2021-03-05)
| This reverts commit 6cbb3038ac48cb5fe17eba4dfbf26e3e798041f1 because it breaks clang-10 builds on x86 and aarch64 when C++11 is enabled.
* Adds EIGEN_CONSTEXPR and EIGEN_NOEXCEPT to rows(), cols(), innerStride(), outerStride(), and size() (Steve Bronder, 2021-03-04)
|
* Deactivate CI for Power due to problems with GitLab runner (David Tellenbach, 2021-03-04)
|
* Add log2 operation to TensorBase (Eugene Zhulenev, 2021-03-04)
|
* Revert "Fix rint for SSE/NEON." (Antonio Sánchez, 2021-03-03)
| This reverts commit e72dfeb8b9fa5662831b5d0bb9d132521f9173dd
* Fix rint for SSE/NEON. (Antonio Sanchez, 2021-03-03)
| It seems *sometimes* with aggressive optimizations the `psub(padd(a, b), b)` trick to force rounding is compiled away. Here we replace it with inline assembly to prevent this (I tried `volatile`, but that leads to additional loads from memory).
| Also fixed an edge case for large inputs `a` where adding `b` bumps the value up a power of two and ends up rounding away more than just the fractional part. If we are over `2^digits` then just return the input. This edge case was missed in the test since the test was comparing approximate equality, which was still satisfied. Adding a strict equality option catches it.
* geo_alignedbox_5 was failing with AVX enabled, due to storing `Vector4d` in a `std::vector` without using an aligned allocator. (Christoph Hertzberg, 2021-03-01)
| Got rid of using `std::vector` and simplified the code.
| Avoid leading `_`
* Add print for SSE/NEON, use NEON rounding intrinsics if available. (Antonio Sanchez, 2021-02-27)
| In SSE, by adding/subtracting 2^MantissaBits, we force rounding according to the current rounding mode.
| For NEON, we use the provided intrinsics for rint/floor/ceil if available (armv8).
| Related to #1969.
* Document that using raw function pointers doesn't work with unaryExpr. (David Tellenbach, 2021-02-27)
|
* Make half/bfloat16 constructor take inputs by value, fix powerpc test. (Antonio Sanchez, 2021-02-27)
| Since `numeric_limits<half>::max_exponent` is a static inline constant, it cannot be directly passed by reference. This triggers a linker error in recent versions of `g++-powerpc64le`. Changing `half` to take inputs by value fixes this. Wrapping `max_exponent` with `int(...)` to make an addressable integer also fixes this and may help with other custom `Scalar` types down the road.
| Also eliminated some compile warnings for powerpc.
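A minimal sketch of the `int(...)` wrapping mentioned above, using a hypothetical `my_limits` struct in place of `numeric_limits<half>`. Binding a static constant member directly to a `const int&` parameter (as `std::min` takes) needs an address for it, which before C++17 requires an out-of-class definition; converting to a temporary `int` first avoids that requirement.

```cpp
#include <algorithm>

// Hypothetical limits type: the constant is declared in-class and
// deliberately has no out-of-class definition.
struct my_limits {
  static const int max_exponent = 16;
};

// std::min takes its arguments by const reference. Wrapping the constant
// in int(...) passes a temporary copy, so the member itself never needs
// an address.
inline int clamp_exponent(int e) {
  return std::min(e, int(my_limits::max_exponent));
}
```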
* Remove unused include (Christoph Hertzberg, 2021-02-27)
|
* clang 10 aggressively warns about precision loss when converting int to float (or long to double) (Christoph Hertzberg, 2021-02-27)
| (cherry picked from commit cd541ad52c8152340469cae210312c0e27829c8d)
* Inherit from `no_assignment_operator` to avoid implicit copy constructor warnings (Christoph Hertzberg, 2021-02-27)
| (cherry picked from commit 9bbb7ea4b54b1f307863be4ed8d105c38cdefe50)
* Fix some enum-enum conversion warnings (Christoph Hertzberg, 2021-02-27)
| (cherry picked from commit 838f3d8ce22a5549ef10c7386fb03040721749a0)
* Fixed/masked more implicit copy constructor warnings (Christoph Hertzberg, 2021-02-27)
| (cherry picked from commit 2883e91ce5a99c391fbf28e20160176b70854992)
* ReturnByValue is already non-copyable (Christoph Hertzberg, 2021-02-27)
| (cherry picked from commit abbf95045009619f37bd92b45433eedbfcbe41cf)
* Fix double-promotion warnings (Christoph Hertzberg, 2021-02-27)
| (cherry picked from commit c22c103e932e511e96645186831363585a44b7a3)
* Idrs iterative linear solver (Jens Wehner, 2021-02-27)
|
* Fix NEON sqrt for 32-bit, add prsqrt. (Antonio Sanchez, 2021-02-26)
| With !406, we accidentally broke arm 32-bit NEON builds, since `vsqrt_f32` is only available for 64-bit. Here we add back the `rsqrt` implementation for 32-bit, relying on a `prsqrt` implementation with better handling of edge cases.
| Note that several of the 32-bit NEON packet tests are currently failing - either due to denormal handling (NEON versions flush to zero, but scalar paths don't) or due to accuracy (e.g. sin/cos).
* Merge branch 'rmlarsen1/eigen-nan_prop' (Rasmus Munk Larsen, 2021-02-26)
|\
| * Merge branch 'nan_prop' of https://gitlab.com/rmlarsen1/eigen into nan_prop (Rasmus Munk Larsen, 2021-02-26)
| |\
| * | Add TODO. (Rasmus Munk Larsen, 2021-02-26)
| | |
| * | Defer default for minCoeff/maxCoeff to templated variant. (Rasmus Munk Larsen, 2021-02-26)
| | |
* | | Fix floor/ceil for NEON fp16. (Antonio Sanchez, 2021-02-25)
| | | Forgot to test this. Fixes bug introduced in !416.
* | | Fix SSE/NEON pfloor/pceil for saturated values. (Antonio Sanchez, 2021-02-25)
| | | The original will saturate if the input does not fit into an integer type. Here we fix this, returning the input if it doesn't have enough precision to have a fractional part.
| | | Also added `pceil` for NEON.
| | | Fixes #1969.
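The saturation fix can be sketched in scalar form. This is an illustration only, not Eigen's packet implementation: `my_floor` is a hypothetical name, and the sketch assumes `float` with a `2^23` threshold above which every value is already an integer.

```cpp
#include <cmath>
#include <cstdint>

// Scalar sketch: floor computed through an integer conversion, with a
// guard for inputs that would not fit into the integer type.
inline float my_floor(float a) {
  const float limit = 8388608.0f;  // 2^23: no fractional bits at or above this
  // Large values, infinities and NaN are returned unchanged instead of
  // being pushed through the (saturating or undefined) integer cast.
  if (!(std::fabs(a) < limit)) return a;
  // The cast truncates toward zero, so negative non-integers come out one too high.
  const float t = static_cast<float>(static_cast<std::int32_t>(a));
  return (t > a) ? t - 1.0f : t;
}
```

For example, `my_floor(1e20f)` returns the input unchanged, whereas converting `1e20f` to a 32-bit integer would not be representable.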
| | * Fix indentation. (Rasmus Munk Larsen, 2021-02-25)
| | |
| | * Make it possible to specify NaN propagation strategy for maxCoeff/minCoeff reductions. (Rasmus Munk Larsen, 2021-02-25)
| |/
|/|
* | Disable new/delete test for HIP (Antonio Sanchez, 2021-02-25)
| |
* | Fix clang compile when no MMA flags are set. Simplify MMA compiler detection. (Chip-Kerchner, 2021-02-24)
| |
* | Don't crash when attempting to slice an empty tensor. (Rasmus Munk Larsen, 2021-02-24)
| |
| * Fix indentation. (Rasmus Munk Larsen, 2021-02-24)
| |
| * Merge branch 'nan_prop' of https://gitlab.com/rmlarsen1/eigen into nan_prop (Rasmus Munk Larsen, 2021-02-24)
| |\
| | * Make it possible to specify NaN propagation strategy for maxCoeff/minCoeff reductions. (Rasmus Munk Larsen, 2021-02-25)
| |/
|/|
| * Make it possible to specify NaN propagation strategy for maxCoeff/minCoeff reductions. (Rasmus Munk Larsen, 2021-02-24)
| |
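To illustrate what a NaN propagation strategy for a max reduction could mean, here is a standalone sketch. The enum `MyNaNPolicy`, its values, and `my_max_coeff` are hypothetical names chosen for this example; they are not taken from Eigen's API, and the real reductions operate on Eigen expressions rather than `std::vector`.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical policy tags for this sketch.
enum MyNaNPolicy {
  MyPropagateNaN,     // any NaN in the input makes the result NaN
  MyPropagateNumbers  // NaNs are ignored unless every entry is NaN
};

template <MyNaNPolicy Policy>
float my_max_coeff(const std::vector<float>& v) {
  assert(!v.empty());  // the reduction is only defined for non-empty input
  float best = v[0];
  for (std::size_t i = 1; i < v.size(); ++i) {
    const float x = v[i];
    if (Policy == MyPropagateNaN) {
      // A NaN, once seen, sticks: comparisons with NaN are always false.
      if (std::isnan(x)) {
        best = x;
      } else if (!std::isnan(best) && x > best) {
        best = x;
      }
    } else {
      // Replace a NaN accumulator with whatever comes next; otherwise
      // a NaN candidate fails `x > best` and is skipped.
      if (std::isnan(best) || x > best) best = x;
    }
  }
  return best;
}
```

Making the policy a template parameter lets the caller pick the behaviour at compile time, so the branch on `Policy` can be resolved by the compiler.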
* | Remove unused function scalar_cmp_with_cast. (Rasmus Munk Larsen, 2021-02-24)
| |
* | Cast anonymous enums to int when used in expressions. (Rasmus Munk Larsen, 2021-02-24)
|/
* Having forward template function declarations in a P10 file causes bad code in certain situations. (Chip-Kerchner, 2021-02-24)
|
* Some improvements for kissfft from Martin Reinecke (pocketfft author): (Guoqiang QI, 2021-02-24)
| 1. Compute only about half of the factors and use complex conjugate symmetry for the rest, instead of computing all of them, to save time.
| 2. Calculate all twiddles in double, because that gives the maximum achievable precision when doing float transforms.
| 3. Reduce all angles to the range 0<angle<pi/4, which gives even more precision.
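Points 1 and 2 above can be sketched as follows. This is an illustrative stand-in, not the kissfft code: `make_twiddles` is a hypothetical name, the sign convention `exp(-2*pi*i*k/n)` is an assumption, and the angle reduction from point 3 is left out to keep the example short.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Build n float twiddle factors w[k] = exp(-2*pi*i*k/n).
inline std::vector<std::complex<float> > make_twiddles(std::size_t n) {
  const double two_pi = 6.283185307179586476925286766559;
  std::vector<std::complex<float> > w(n);
  // Only k = 0 .. n/2 are evaluated with cos/sin.
  for (std::size_t k = 0; k < n && k <= n / 2; ++k) {
    // Evaluate in double, then round once to float.
    const double angle = -two_pi * static_cast<double>(k) / static_cast<double>(n);
    w[k] = std::complex<float>(static_cast<float>(std::cos(angle)),
                               static_cast<float>(std::sin(angle)));
    // The upper half follows from conjugate symmetry: w[n-k] = conj(w[k]).
    if (k > 0 && n - k > k) w[n - k] = std::conj(w[k]);
  }
  return w;
}
```

For `n = 8` this evaluates `cos`/`sin` for five angles and mirrors the remaining three entries.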
* Add `invoke_result` and eliminate `result_of` warnings for C++17+. (Antonio Sanchez, 2021-02-24)
| The `std::result_of` meta struct is deprecated in C++17 and removed in C++20. It was still slipping through due to a faulty definition of `EIGEN_HAS_STD_RESULT_OF`.
| Added a new macro `EIGEN_HAS_STD_INVOKE_RESULT` and `Eigen::internal::invoke_result` implementation with fallback for pre C++17.
| Replaces the `result_of` definition with one based on `std::invoke_result` for C++17 and higher.
| For completeness, added nullary op support for c++03.
| Fixes #1850.
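A sketch of the fallback pattern described above, assuming at least C++11 for variadic templates. The `sketch` namespace, its `invoke_result`, and the `half_of` functor are hypothetical and only mirror the idea; Eigen's own implementation (including its C++03 path) is not reproduced here.

```cpp
#include <type_traits>

namespace sketch {

#if (defined(_MSVC_LANG) && _MSVC_LANG >= 201703L) || \
    (!defined(_MSVC_LANG) && __cplusplus >= 201703L)
// C++17 and later: forward to the standard trait.
template <typename F, typename... Args>
struct invoke_result {
  typedef typename std::invoke_result<F, Args...>::type type;
};
#else
// Before C++17: fall back to std::result_of, which is still available.
template <typename F, typename... Args>
struct invoke_result {
  typedef typename std::result_of<F(Args...)>::type type;
};
#endif

}  // namespace sketch

// Example functor: int in, double out.
struct half_of {
  double operator()(int x) const { return x / 2.0; }
};

static_assert(
    std::is_same<sketch::invoke_result<half_of, int>::type, double>::value,
    "invoke_result should report the functor's return type");
```

Callers use `sketch::invoke_result<F, Args...>::type` everywhere, so the deprecated trait is only named on the pre-C++17 branch.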
* Fixes to support old and new versions of the compilers for built-ins. Cast to non-const when using vector_pair with certain built-ins. (Chip-Kerchner, 2021-02-24)
|
* Fix CUDA device new and delete, and add test. (Antonio Sanchez, 2021-02-24)
| HIP does not support new/delete on device, so test is skipped.
* Eliminate CMake FindPackageHandleStandardArgs warnings. (Antonio Sanchez, 2021-02-24)
| CMake complains that the package name does not match when the case differs, e.g.:
| ```
| CMake Warning (dev) at /usr/share/cmake-3.18/Modules/FindPackageHandleStandardArgs.cmake:273 (message):
|   The package name passed to `find_package_handle_standard_args` (UMFPACK)
|   does not match the name of the calling package (Umfpack). This can lead to
|   problems in calling code that expects `find_package` result variables
|   (e.g., `_FOUND`) to follow a certain pattern.
| Call Stack (most recent call first):
|   cmake/FindUmfpack.cmake:50 (find_package_handle_standard_args)
|   bench/spbench/CMakeLists.txt:24 (find_package)
| This warning is for project developers. Use -Wno-dev to suppress it.
| ```
| Here we rename the libraries to match their true cases.
* Disable fast psqrt for NEON. (Antonio Sanchez, 2021-02-23)
| Accuracy is too poor - requires at least two Newton iterations, but then it is no longer significantly faster than `vsqrt`.
| Fixes #2094.
* Fix check if GPU compile phase for std::hash (Antonio Sanchez, 2021-02-23)
|
* Fix some CUDA warnings. (Antonio Sanchez, 2021-02-24)
| Added `EIGEN_HAS_STD_HASH` macro, checking for C++11 support and not running on GPU.
| `std::hash<float>` is not a device function, so cannot be used by `std::hash<bfloat16>`. Removed `EIGEN_DEVICE_FUNC` and only define if `EIGEN_HAS_STD_HASH`. Same for `half`.
| Added `EIGEN_CUDA_HAS_FP16_ARITHMETIC` to improve readability, eliminate warnings about `EIGEN_CUDA_ARCH` not being defined.
| Replaced a couple C-style casts with `reinterpret_cast` for aligned loading of `half*` to `half2*`. This eliminates `-Wcast-align` warnings in clang. Although not ideal due to potential type aliasing, this is how CUDA handles these conversions internally.
* Accurate pow, part 2. (Rasmus Munk Larsen, 2021-02-23)
| This change adds specializations of log2 and exp2 for double that make pow<double> accurate to 1 ULP. Speed for AVX-512 is within 0.5% of the current implementation.
* Fixed sparse conservativeResize() when both num cols and rows decreased. (Adam Shapiro, 2021-02-23)
| The previous implementation caused a buffer overflow trying to calculate non-zero counts for columns that no longer exist.
* Fix compilation errors with later versions of GCC and use of MMA. (Chip-Kerchner, 2021-02-22)
|