path: root/Eigen
Commit message, Author, Age
* Correct declarations for aarch64-pc-windows-msvcGravatar 大河メタル2021-06-30
|
* Get rid of redundant `pabs` instruction in complex square root.Gravatar Rasmus Munk Larsen2021-06-29
|
* Commit 52a5f982 broke conjhelper functionality for HIP GPUs.Gravatar Rohit Santhanam2021-06-25
| | | | This commit addresses this.
* Small cleanup: Get rid of the macros EIGEN_HAS_SINGLE_INSTRUCTION_CJMADD and ↵Gravatar Rasmus Munk Larsen2021-06-24
| | | | CJMADD, which were effectively unused, apart from on x86, where the change results in identically performing code.
* Get rid of code duplication for conj_helper. For packets where ↵Gravatar Rasmus Munk Larsen2021-06-24
| | | | LhsType=RhsType a single generic implementation suffices. For scalars, the generic implementation of pconj automatically forwards to numext::conj, so much of the existing specialization can be avoided. For mixed types we still need specializations.
* Use internal::ref_selector to avoid holding a reference to a RHS expression.Gravatar Rasmus Munk Larsen2021-06-22
|
* Fix fix<> for gcc-4.9.3.Gravatar Antonio Sanchez2021-06-18
| | | | | | | There's a missing `EIGEN_HAS_CXX14` -> `EIGEN_HAS_CXX14_VARIABLE_TEMPLATES` replacement. Fixes #2267
* Remove pset, replace with ploadu.Gravatar Antonio Sanchez2021-06-16
| | | | | | | | | We can't make guarantees on alignment for existing calls to `pset`, so we should default to loading unaligned. But in that case, we should just use `ploadu` directly. For loading constants, this load should hopefully get optimized away. This is causing segfaults in Google Maps.
* EIGEN_STRONG_INLINE was NOT inlining in some critical needed areas (6.6X ↵Gravatar Chip-Kerchner2021-06-16
| | | | slowdown) when used with TensorFlow. Changing to EIGEN_ALWAYS_INLINE where appropriate.
* Add missing ppc pcmp_lt_or_nan<Packet8bf>Gravatar Antonio Sanchez2021-06-15
|
* Fix more enum arithmetic.Gravatar Rasmus Munk Larsen2021-06-15
|
* Fix checking of version number for mingw.Gravatar Antonio Sanchez2021-06-11
| | | | | | | | | | | MinGW spits out version strings like: `x86_64-w64-mingw32-g++ (GCC) 10-win32 20210110`, which causes the version extraction to fail. Added support for this with tests. Also added `make_unsigned` for `long long`, since mingw seems to use that for `uint64_t`. Related to #2268. CMake and build passes for me after this.
* Use bit_cast to create -0.0 for floating point types to avoid compiler ↵Gravatar Rasmus Munk Larsen2021-06-11
| | | | optimization changing sign with -ffast-math enabled.
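The idea in the commit above can be sketched as follows. This is a minimal illustration, not Eigen's code: `bit_cast_sketch`, `negative_zero_f` and `negative_zero_d` are hypothetical names, and the bit patterns assume IEEE 754 `float` and `double`. Building the value from its bit pattern avoids writing a floating-point literal such as `-0.0f`, whose sign the commit says may be changed by compiler optimization under fast-math.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not Eigen's internal bit_cast): reinterpret the bits
// of `src` as a value of type Dest. std::memcpy is the portable pre-C++20 way.
template <typename Dest, typename Src>
Dest bit_cast_sketch(const Src& src) {
  static_assert(sizeof(Dest) == sizeof(Src), "source and destination sizes must match");
  Dest dest;
  std::memcpy(&dest, &src, sizeof(Dest));
  return dest;
}

// Assumption: IEEE 754 layout, where negative zero has only the sign bit set.
inline float negative_zero_f() {
  return bit_cast_sketch<float>(std::uint32_t{0x80000000u});
}

inline double negative_zero_d() {
  return bit_cast_sketch<double>(std::uint64_t{0x8000000000000000ull});
}
```

`std::signbit(negative_zero_f())` can be used to check that the sign survived.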
* Fix c++20 warnings about using enums in arithmetic expressions.Gravatar Rasmus Munk Larsen2021-06-10
|
* Remove EIGEN_DEVICE_FUNC from CwiseBinaryOp's default copy constructor.Gravatar Cyril Kaiser2021-05-26
|
* Add missing NEON ptranspose implementations.Gravatar Antonio Sanchez2021-05-25
| | | | Unified implementation using only `vzip`.
* Modify Unary/Binary/TernaryOp evaluators to work for non-class types.Gravatar Antonio Sanchez2021-05-23
| | | | | | | | | | | | | | | | | | | This used to work for non-class types (e.g. raw function pointers) in Eigen 3.3. This was changed in commit 11f55b29 to optimize the evaluator: > `sizeof((A-B).cwiseAbs2())` with A,B Vector4f is now 16 bytes, instead of 48 before this optimization. though I cannot reproduce the 16 byte result. Both before the change and after, with multiple compilers/versions, I always get a result of 40 bytes. https://godbolt.org/z/MsjTc1PGe This change modifies the code slightly to allow non-class types. The final generated code is identical, and the expression remains 40 bytes for the `abs2` sample case. Fixes #2251
* Adds macro for checking if C++14 variable templates are supportedGravatar Steve Bronder2021-05-21
|
* Use derived object type in conservative_resize_like_implGravatar Niall Murphy2021-05-20
| | | | | | | | | When calling conservativeResize() on a matrix with the DontAlign flag, the temporary variable used to perform the resize should have the same Options as the original matrix to ensure that the correct override of swap is called (i.e. PlainObjectBase::swap(DenseBase<OtherDerived> & other)). Calling the base class swap (i.e. in DenseBase) results in assertion errors or memory corruption.
* Device implementation of log for std::complex types.Gravatar Nathan Luehr2021-05-11
|
* Fix ambiguity due to argument dependent lookup.Gravatar Nathan Luehr2021-05-11
|
* Changing the storage of the SSE complex packets to that of the wrapper. This ↵Gravatar guoqiangqi2021-05-10
| | | | should fix #2242 .
* Fix for issue where numext::imag and numext::real are used before they are ↵Gravatar Rohit Santhanam2021-05-10
| | | | defined.
* Restore ABI compatibility for conj with 3.3, fix conflict with boost.Gravatar Antonio Sanchez2021-05-07
| | | | | | | | | | | | | | | The boost library unfortunately specializes `conj` for various types and assumes the original two-template-parameter version. This change restores the second parameter. This also restores ABI compatibility. The specialization for `std::complex` is because `std::conj` is not a device function. For custom complex scalar types, users should provide their own `conj` implementation. We may consider removing the unnecessary second parameter in the future - but this will require modifying boost as well. Fixes #2112.
* Fix numext::arg return type.Gravatar Antonio Sanchez2021-05-07
| | | | | | | | The cxx11 path for `numext::arg` incorrectly returned the complex type instead of the real type, leading to compile errors. Fixed this and added tests. Related to !477, which uncovered the issue.
* Revert addition of unused `paddsub<Packet2cf>`. This fixes #2242Gravatar Christoph Hertzberg2021-05-06
|
* Better CUDA complex division.Gravatar Antonio Sanchez2021-04-29
| | | | | | The original produced NaNs when dividing 0/b for subnormal b. The `complex_divide_stable` was changed to use the more common Smith's algorithm.
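For reference, Smith's algorithm mentioned above can be sketched on the host with `std::complex`. This is a textbook formulation under the assumption of plain scalar code, not the CUDA implementation from the commit; the function name `smith_divide` is hypothetical. The key step is scaling by the ratio of the smaller to the larger denominator component, which avoids forming `c*c + d*d` directly.

```cpp
#include <cmath>
#include <complex>

// Hypothetical sketch of Smith's algorithm for (a + bi) / (c + di).
// The branch picks the ratio with magnitude <= 1 so that intermediate
// products stay well scaled.
template <typename T>
std::complex<T> smith_divide(const std::complex<T>& x, const std::complex<T>& y) {
  const T a = x.real(), b = x.imag();
  const T c = y.real(), d = y.imag();
  if (std::abs(c) >= std::abs(d)) {
    const T r = d / c;        // |r| <= 1
    const T den = c + d * r;  // equals (c*c + d*d) / c
    return std::complex<T>((a + b * r) / den, (b - a * r) / den);
  }
  const T r = c / d;          // |r| < 1
  const T den = c * r + d;    // equals (c*c + d*d) / d
  return std::complex<T>((a * r + b) / den, (b * r - a) / den);
}
```

With a zero numerator and a subnormal real denominator, the ratio `r` is zero and the denominator is the subnormal itself, so the quotient is zero rather than NaN.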
* Add missing pcmp_lt_or_nan for NEON Packet4bf.Gravatar Antonio Sanchez2021-04-27
|
* Tests added and AVX512 bug fixed for pcmp_lt_or_nanGravatar Jakub Lichman2021-04-25
|
* DenseStorage safely copy/swap.Gravatar Antonio Sanchez2021-04-22
| | | | | | | | Fixes #2229. For dynamic matrices with fixed-sized storage, only copy/swap elements that have been set. Otherwise, this leads to inefficient copying, and potential UB for non-initialized elements.
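The behaviour described above (copy or swap only the elements that are in use) can be illustrated with a small stand-in type. `BoundedStorage` is a hypothetical name and layout chosen for the sketch; it is not Eigen's `DenseStorage`, and it only handles a flat element count rather than rows and columns.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>

// Hypothetical stand-in for dynamic-size storage with a fixed capacity.
// Elements at index >= size are never read, so they may stay uninitialized.
template <typename T, std::size_t Capacity>
struct BoundedStorage {
  T data[Capacity];
  std::size_t size = 0;

  BoundedStorage() {}

  // Copy only the elements that have been set.
  BoundedStorage(const BoundedStorage& other) : size(other.size) {
    std::copy(other.data, other.data + other.size, data);
  }

  BoundedStorage& operator=(const BoundedStorage& other) {
    if (this != &other) {
      size = other.size;
      std::copy(other.data, other.data + other.size, data);
    }
    return *this;
  }

  // Swap the common prefix, then move the longer tail across.
  void swap(BoundedStorage& other) {
    const std::size_t common = std::min(size, other.size);
    std::swap_ranges(data, data + common, other.data);
    if (size > other.size) {
      std::copy(data + common, data + size, other.data + common);
    } else {
      std::copy(other.data + common, other.data + other.size, data + common);
    }
    std::swap(size, other.size);
  }
};
```

The design choice is that neither operation touches indices beyond the larger of the two sizes, which is what avoids both the wasted copying and the reads of uninitialized elements.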
* Make vectorized compute_inverse_size4 compile with AVX.Gravatar Rasmus Munk Larsen2021-04-22
|
* Fix taking address of rvalue compiler issue with TensorFlow (plus other ↵Gravatar Chip-Kerchner2021-04-21
| | | | warnings).
* HasExp added for AVX512 Packet8dGravatar Jakub Lichman2021-04-20
|
* Fix ldexp for AVX512 (#2215)Gravatar Antonio Sanchez2021-04-20
| | | | | | | Wrong shuffle was used. Need to interleave low/high halves with a `permute` instruction. Fixes #2215.
* Before 3.4 branchGravatar David Tellenbach2021-04-18
|
* Avoid using uninitialized inputs and if available, use slightly more ↵Gravatar Christoph Hertzberg2021-04-13
| | | | efficient `movsd` instruction for `pset1<Packet2cf>`.
* Use EIGEN_HAS_CXX11 and EIGEN_COMP_CXXVER macros to detect C++ version for ↵Gravatar Christoph Hertzberg2021-04-12
| | | | | | `std::result_of` and `std::invoke_result`. Fixes #2209
* Make iterators default constructible and assignable, by making...Gravatar Christoph Hertzberg2021-04-09
|
* Scaled epsilon the wrong way.Gravatar Antonio Sanchez2021-04-07
| | | | | | Should have been 0.5 to widen the bounds, since this is inverse precision. Setting to 0.5, however, leads to many more failing tests at Google, so reverting to 1 for now.
* Replace `-2147483648` by `-0.0f` or `-0.0` constants (this should fix #2189).Gravatar Christoph Hertzberg2021-04-07
| | | | Also, remove unnecessary `pgather` operations.
* Align local arrays to Packet boundary.Gravatar Rasmus Munk Larsen2021-04-06
|
* Fix SelfAdjointEigenSolver (#2191)Gravatar Antonio Sanchez2021-04-05
| | | | | | | | | | | | | | Adjust the relaxation step to use the condition `abs(subdiag[i]) <= epsilon * sqrt(abs(diag[i]) + abs(diag[i+1]))` for setting the subdiagonal entry to zero. Also adjust Wilkinson shift for small `e = subdiag[end-1]` - I couldn't find a reference for the original, and it was not consistent with the Wilkinson definition. Fixes #2191.
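The relaxation condition quoted in the commit above can be sketched as a standalone loop over a symmetric tridiagonal matrix. The function name `deflate_subdiagonal`, the use of `std::vector<double>`, and the default of machine epsilon for `epsilon` are assumptions made for illustration; the commit does not say which epsilon the solver uses.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical sketch: set negligible subdiagonal entries to zero using
//   abs(subdiag[i]) <= epsilon * sqrt(abs(diag[i]) + abs(diag[i+1]))
// and return how many nonzero entries were zeroed.
inline std::size_t deflate_subdiagonal(
    const std::vector<double>& diag, std::vector<double>& subdiag,
    double epsilon = std::numeric_limits<double>::epsilon()) {
  std::size_t zeroed = 0;
  for (std::size_t i = 0; i + 1 < diag.size() && i < subdiag.size(); ++i) {
    const double bound =
        epsilon * std::sqrt(std::abs(diag[i]) + std::abs(diag[i + 1]));
    if (subdiag[i] != 0.0 && std::abs(subdiag[i]) <= bound) {
      subdiag[i] = 0.0;
      ++zeroed;
    }
  }
  return zeroed;
}
```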
* Fix two bugs in commitGravatar Rasmus Munk Larsen2021-04-02
|
* Fix address of temporary object errors in clang11.Gravatar Chip Kerchner2021-04-02
| | | | This fixes the problem with taking the address of temporary objects which clang11 treats as errors.
* Add an info() method to the SVDBase class to make it possible to tell the ↵Gravatar Rasmus Munk Larsen2021-03-31
| | | | | | user that the computation failed, possibly due to invalid input. Make Jacobi and divide-and-conquer fail fast and return info() == InvalidInput if the matrix contains NaN or +/-Inf.
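The fail-fast check described above amounts to scanning the input for non-finite coefficients before doing any work. The sketch below is independent of Eigen: `StatusSketch` and `check_input` are hypothetical names that only mirror the `info() == InvalidInput` idea from the commit message.

```cpp
#include <cmath>
#include <vector>

// Hypothetical status type; it mirrors the idea of a result code such as
// InvalidInput but is not Eigen's ComputationInfo.
enum class StatusSketch { Success, InvalidInput };

// Return InvalidInput as soon as a NaN or +/-Inf coefficient is seen.
inline StatusSketch check_input(const std::vector<double>& coeffs) {
  for (double v : coeffs) {
    if (!std::isfinite(v)) {
      return StatusSketch::InvalidInput;
    }
  }
  return StatusSketch::Success;
}
```

A decomposition routine would call such a check first and store the result so that callers can query it afterwards instead of receiving meaningless factors.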
* Fix CUDA constexpr issues for numeric_limits.Gravatar Antonio Sanchez2021-03-30
| | | | | | | | | | | | | | | | Some CUDA/HIP constants fail on device with `constexpr` since they internally rely on non-constexpr functions, e.g. `#define CUDART_INF_F __int_as_float(0x7f800000)`. This fails for cuda-clang (though it passes with nvcc). These constants are currently used by `device::numeric_limits`. For portability, we need to remove `constexpr` from the affected functions. For C++11 or higher, we should be able to rely on the `std::numeric_limits` versions anyway, since the methods themselves are now `constexpr`, so should be supported on device (clang/hipcc natively, nvcc with `--expt-relaxed-constexpr`).
* Use Index type in loop over coefficients.Gravatar Antonio Sanchez2021-03-29
| | | | | Previously was `int`. Brought up by Kyle Snow (Polaris Geospatial Services) on the mailing list.
* Eliminate `round_impl` double-promotion warnings for c++03.Gravatar Antonio Sanchez2021-03-25
|
* Un-defining EIGEN_HAS_CONSTEXPR on the HIP platformGravatar Deven Desai2021-03-25
The Eigen unit-tests started failing on the HIP/ROCm platform after the following commit: https://gitlab.com/libeigen/eigen/-/commit/e7b8643d70dfbb02ad94186169a8f16041f05bc2

```
In file included from /home/rocm-user/eigen/test/main.h:360:
In file included from /home/rocm-user/eigen/Eigen/QR:11:
In file included from /home/rocm-user/eigen/Eigen/Core:162:
/home/rocm-user/eigen/Eigen/src/Core/util/Meta.h:300:17: error: constexpr function never produces a constant expression [-Winvalid-constexpr]
  static float (max)() {
                ^
/home/rocm-user/eigen/Eigen/src/Core/util/Meta.h:304:12: note: non-constexpr function '__int_as_float' cannot be used in a constant expression
    return HIPRT_MAX_NORMAL_F;
           ^
/home/rocm-user/eigen/Eigen/src/Core/arch/HIP/hcc/math_constants.h:14:28: note: expanded from macro 'HIPRT_MAX_NORMAL_F'
#define HIPRT_MAX_NORMAL_F __int_as_float(0x7f7fffff)
                           ^
/opt/rocm/hip/include/hip/hcc_detail/device_functions.h:913:32: note: declared here
__device__ static inline float __int_as_float(int x) {
                               ^
```

The problem seems to be that some of the constants defined in the HIP `math_constants.h` call the `__int_as_float` routine, which is not declared `constexpr` in the HIP runtime header file. Working around this issue for now by skipping the `constexpr` support (enabled via the above commit) on HIP.
* Fixed performance issues for complex VSX and P10 MMA in gebp_kernel (level 3).Gravatar Chip Kerchner2021-03-25
|