Commit message (Author, Date)
* Fix bad NEON fp16 check (Antonio Sanchez, 2020-12-04)
|
* Special function implementations for half/bfloat16 packets. (Antonio Sanchez, 2020-12-04)
| Current implementations fail to consider half-float packets, only half-float scalars.
| Added specializations for packets on AVX, AVX512 and NEON. Added tests to `special_packetmath`.
|
| The current `special_functions` tests would fail for half and bfloat16 due to lack of precision.
| The NEON tests also fail with precision issues and due to different handling of `sqrt(inf)`, so the special functions bessel and ndtri have been disabled.
|
| Tested with AVX, AVX512.
* Remove duplicate #if clause (David Tellenbach, 2020-12-04)
|
* Fix shfl* macros for CUDA/HIP (Antonio Sanchez, 2020-12-04)
| The `shfl*` functions are `__device__` only. Adjusted the `#ifdef`s so they are defined whenever the corresponding CUDA/HIP ones are.
|
| Also changed the HIP/CUDA<9.0 versions to cast to int instead of doing the conversion `half`<->`float`.
|
| Fixes #2083
* The function 'prefetch' did not work correctly on the win64 platform (shrek1402, 2020-12-04)
|
* Revert "Add log2() operator to Eigen" (Rasmus Munk Larsen, 2020-12-03)
| This reverts commit 4d91519a9be061da5d300079fca17dd0b9328050.
* Add log2() operator to Eigen (Rasmus Munk Larsen, 2020-12-03)
|
* Small cleanup of generic plog implementations: (Rasmus Munk Larsen, 2020-12-03)
| Adding the term e*ln(2) is split into two steps for no obvious reason.
| This dates back to the original Cephes code from which the algorithm is adapted.
| It appears that this was done in Cephes to prevent the compiler from reordering the addition of the 3 terms in the approximation
|
|   log(1+x) ~= x - 0.5*x^2 + x^3*P(x)/Q(x)
|
| which must be added in reverse order since |x| < (sqrt(2)-1).
|
| This allows rewriting the code to just 2 pmadd and 1 padd instructions, which on a Skylake processor speeds up the code by 5-7%.
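The summation order described in this commit can be illustrated on scalars. This is a minimal sketch, not Eigen's packet code: `std::fma` stands in for `pmadd`, plain addition stands in for `padd`, and the names `combine_log_terms` and `poly_term` are hypothetical. The polynomial term x^3*P(x)/Q(x) is assumed to be evaluated elsewhere and passed in.

```cpp
#include <cassert>
#include <cmath>

// Scalar model of the final combination step.
// `poly_term` stands for the already-evaluated x^3 * P(x) / Q(x);
// `e` is the exponent extracted from the input.
inline double combine_log_terms(double x, double e, double poly_term) {
  const double ln2 = 0.6931471805599453;
  // Smallest terms first: poly_term + (-0.5 * x^2), fused (stands in for pmadd).
  const double y = std::fma(-0.5, x * x, poly_term);
  // Then the linear term (stands in for padd).
  const double r = x + y;
  // Finally add e * ln(2) in one fused step (stands in for pmadd).
  return std::fma(e, ln2, r);
}

// Usage: feed in the exact remainder and compare against std::log1p.
inline void combine_log_terms_example() {
  const double x = 0.1;
  const double poly = std::log1p(x) - (x - 0.5 * x * x);
  assert(std::fabs(combine_log_terms(x, 0.0, poly) - std::log1p(x)) < 1e-12);
  assert(std::fabs(combine_log_terms(x, 3.0, poly) -
                   (std::log1p(x) + 3.0 * std::log(2.0))) < 1e-12);
}
```

Adding the terms from smallest to largest magnitude keeps the low-order bits of the small terms from being lost before they are combined.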
* Include chrono in main for c++11. (Antonio Sanchez, 2020-12-03)
| Hack to fix tensor tests, since min/max are overridden by `main.h`.
* Clean up the Tensor header and get rid of the EIGEN_SLEEP macro. (Rasmus Munk Larsen, 2020-12-02)
|
* Fix typo in `F32MaskToBf16Mask`. (Antonio Sanchez, 2020-12-02)
|
* Fix neon cmp* functions for bf16. (Antonio Sanchez, 2020-12-02)
| The current impl corrupts the comparison masks when converting from float back to bfloat16.
| The resulting masks are then no longer all zeros or all ones, which breaks when used with `pselect` (e.g. in `pmin<PropagateNumbers>`).
| This was causing `packetmath_15` to fail on arm.
|
| Introducing a simple `F32MaskToBf16Mask` corrects this (takes the lower 16-bits for each float mask).
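The mask corruption described above can be reproduced on scalars. This is a minimal sketch under stated assumptions, not the NEON implementation: `numeric_f32_bits_to_bf16` is a hypothetical model of a numeric conversion that rounds to nearest-even and maps NaN inputs to a canonical quiet NaN, and `f32_mask_to_bf16_mask` is a hypothetical scalar analogue of the lower-16-bits approach the commit describes.

```cpp
#include <cassert>
#include <cstdint>

// Assumed model of a numeric float -> bfloat16 conversion on raw bits:
// NaN inputs become a canonical quiet NaN, everything else is rounded to
// nearest-even and truncated to the upper 16 bits.
inline std::uint16_t numeric_f32_bits_to_bf16(std::uint32_t bits) {
  if ((bits & 0x7FFFFFFFu) > 0x7F800000u) {
    const std::uint32_t sign = (bits >> 16) & 0x8000u;
    return static_cast<std::uint16_t>(sign | 0x7FC0u);
  }
  const std::uint32_t lsb = (bits >> 16) & 1u;
  return static_cast<std::uint16_t>((bits + 0x7FFFu + lsb) >> 16);
}

// Mask-preserving conversion: keep the lower 16 bits of each 32-bit lane.
inline std::uint16_t f32_mask_to_bf16_mask(std::uint32_t mask) {
  return static_cast<std::uint16_t>(mask & 0xFFFFu);
}

// Usage: an all-ones float mask is a NaN bit pattern, so the numeric
// conversion does not return all ones, while the mask conversion does.
inline void mask_example() {
  assert(numeric_f32_bits_to_bf16(0xFFFFFFFFu) == 0xFFC0u);
  assert(f32_mask_to_bf16_mask(0xFFFFFFFFu) == 0xFFFFu);
  assert(f32_mask_to_bf16_mask(0x00000000u) == 0x0000u);
}
```

Because comparison masks are bit patterns rather than numbers, any value-based conversion is free to alter them; copying bits avoids that.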
* Implement CUDA __shfl* for Eigen::half (Antonio Sanchez, 2020-12-01)
| Prior to this fix, `TensorContractionGpu` and the `cxx11_tensor_of_float16_gpu` test were broken, as well as several ops in TensorFlow.
|
| The gpu functions `__shfl*` became ambiguous now that `Eigen::half` implicitly converts to float. Here we add the required specializations.
* Fix a few issues for AVX512. This change enables vectorized versions of log, exp, log1p, expm1 when AVX512DQ is not available. (Rasmus Munk Larsen, 2020-12-01)
* Fix #2077, `EIGEN_CONSTEXPR` in `Half`. (Antonio Sanchez, 2020-12-01)
| `bit_cast` cannot be `constexpr`, so we need to remove `EIGEN_CONSTEXPR` from `raw_half_as_uint16(...)`.
| This shouldn't affect anything else, since it is only used in a `bit_cast<uint16_t,half>()`, which is not itself `constexpr`.
|
| Fixes #2077.
* add EIGEN_DEVICE_FUNC to methods (acxz, 2020-12-01)
|
* AVX512 missing ops. (Antonio Sanchez, 2020-11-30)
| This allows the `packetmath` tests to pass for AVX512 on Skylake.
|
| Made `half` and `bfloat16` consistent in terms of ops they support.
|
| Note the `log` tests are currently disabled for `bfloat16` since they fail due to poor precision (they were previously disabled for `Packet8bf` via test function specialization -- I just removed that specialization and disabled it in the generic test).
* Fix typo in doc (Florian Maurin, 2020-11-30)
|
* Workaround for doxygen class template titles in which the template part of the class signature is lost due to a problem with forward declarations. (Jim Lersch, 2020-11-27)
| The problem is probably caused by doxygen bug #7689. It is confirmed to be fixed in doxygen >= 1.8.19.
* Fix doxygen class blocks that were not associated with the correct classes. (Jim Lersch, 2020-11-27)
|
* Include CMakeDependentOption to be able to use cmake_dependent_option (David Tellenbach, 2020-11-27)
|
* Make inclusion of doc sub-directory optional by adjusting options. (Bowie Owens, 2020-11-27)
| Allows exclusion of doc and related targets to help when using eigen via add_subdirectory().
|
| Requested by: https://gitlab.com/libeigen/eigen/-/issues/1842
|
| Also required making EIGEN_TEST_BUILD_DOCUMENTATION a dependent option on EIGEN_BUILD_DOC.
| This ensures documentation targets are properly defined when EIGEN_TEST_BUILD_DOCUMENTATION is ON.
* check for include dirs set (filippobrizzi, 2020-11-26)
|
* Fix some packet-functions in the IBM ZVector packet-math. (Andreas Krebbel, 2020-11-25)
|
* Revert "Fix Half NaN definition and test." (Rasmus Munk Larsen, 2020-11-24)
| This reverts commit c770746d709686ef2b8b652616d9232f9b028e78.
* Fix Half NaN definition and test. (Rasmus Munk Larsen, 2020-11-24)
| The `half_float` test was failing with `-mcpu=cortex-a55` (native `__fp16`) due to a bad NaN bit-pattern comparison (in the case of casting a float to `__fp16`, the signaling `NaN` is quieted).
| There was also an inconsistency between `numeric_limits<half>::quiet_NaN()` and `NumTraits::quiet_NaN()`.
| Here we correct the inconsistency and compare NaNs according to the IEEE 754 definition.
|
| Also modified the `bfloat16_float` test to match.
|
| Tested with `cortex-a53` and `cortex-a55`.
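Comparing NaNs by definition rather than by exact bit pattern can be sketched as follows. This is an illustration of the IEEE 754 binary16 layout only, not the code from the `half_float` test; the helper name `half_bits_is_nan` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// IEEE 754 binary16: 1 sign bit, 5 exponent bits, 10 mantissa bits.
// A value is NaN when the exponent is all ones and the mantissa is nonzero,
// regardless of the sign bit or of which mantissa bits are set.
inline bool half_bits_is_nan(std::uint16_t bits) {
  return (bits & 0x7C00u) == 0x7C00u && (bits & 0x03FFu) != 0u;
}

// Usage: two NaNs with different payloads both classify as NaN, so a test
// based on this predicate does not depend on whether a NaN was quieted.
inline void half_nan_example() {
  assert(half_bits_is_nan(0x7E00u));   // top mantissa bit set
  assert(half_bits_is_nan(0x7D00u));   // different payload, still NaN
  assert(!half_bits_is_nan(0x7C00u));  // +infinity
}
```

A test written against this predicate passes whether or not the hardware quiets a signaling NaN during the float to `__fp16` cast.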
* Fix boolean float conversion and product warnings. (Antonio Sanchez, 2020-11-24)
| This fixes some gcc warnings such as:
| ```
| Eigen/src/Core/GenericPacketMath.h:655:63: warning: implicit conversion turns floating-point number into bool: 'typename __gnu_cxx::__enable_if<__is_integer<bool>::__value, double>::__type' (aka 'double') to 'bool' [-Wimplicit-conversion-floating-point-to-bool]
|     Packet psqrt(const Packet& a) { EIGEN_USING_STD(sqrt); return sqrt(a); }
| ```
|
| Details:
| - Added `scalar_sqrt_op<bool>` (`-Wimplicit-conversion-floating-point-to-bool`).
| - Added `scalar_square_op<bool>` and `scalar_cube_op<bool>` specializations (`-Wint-in-bool-context`)
| - Deprecated above specialized ops for bool.
| - Modified `cxx11_tensor_block_eval` to specialize generator for booleans (`-Wint-in-bool-context`) and to use `abs` instead of `square` to avoid deprecated bool ops.
* Implement missing AVX half ops. (Antonio Sanchez, 2020-11-24)
| Minimal implementation of AVX `Eigen::half` ops to bring in line with `bfloat16`. Allows `packetmath_13` to pass.
|
| Also adjusted `bfloat16` packet traits to match the supported set of ops (e.g. Bessel is not actually implemented).
* Fix Half NaN definition and test. (Antonio Sanchez, 2020-11-23)
| The `half_float` test was failing with `-mcpu=cortex-a55` (native `__fp16`) due to a bad NaN bit-pattern comparison (in the case of casting a float to `__fp16`, the signaling `NaN` is quieted).
| There was also an inconsistency between `numeric_limits<half>::quiet_NaN()` and `NumTraits::quiet_NaN()`.
| Here we correct the inconsistency and compare NaNs according to the IEEE 754 definition.
|
| Also modified the `bfloat16_float` test to match.
|
| Tested with `cortex-a53` and `cortex-a55`.
* Update AVX half packets, disable test. (Antonio Sanchez, 2020-11-21)
| The AVX half implementation is incomplete, causing the `packetmath_13` test to fail. This disables the test.
|
| Also refactored the existing AVX implementation to use `bit_cast` instead of direct access to `.x`.
* Fixes duplicate symbol when building blas (Antonio Sanchez, 2020-11-20)
| A missing inline breaks blas, since the symbol is generated in each of `complex_single.cpp`, `complex_double.cpp`, `single.cpp`, `double.cpp`.
|
| Changed the rest of the inlines to `EIGEN_STRONG_INLINE`.
* Remove explicit casts from Eigen::half and Eigen::bfloat16 to bool (David Tellenbach, 2020-11-19)
| Both Eigen::half and Eigen::bfloat16 are implicitly convertible to float and can hence be converted to bool via the conversion chain
|
|   Eigen::{half,bfloat16} -> float -> bool
|
| We thus remove the explicit cast operator to bool.
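The conversion chain above can be demonstrated with a toy type. This is a minimal sketch: `ToyHalf` is a hypothetical stand-in that stores a plain `float`, not `Eigen::half`, and only the implicit conversion operator matters here.

```cpp
#include <cassert>

// Hypothetical stand-in for a 16-bit float type.
struct ToyHalf {
  float value;
  // Implicit conversion to float, with no operator bool defined.
  operator float() const { return value; }
};

inline bool is_nonzero(const ToyHalf& h) {
  // One user-defined conversion (ToyHalf -> float) followed by the standard
  // boolean conversion (float -> bool) is a valid implicit conversion
  // sequence, so the type works in a boolean context as-is.
  if (h) return true;
  return false;
}

// Usage: zero converts to false, any nonzero value converts to true.
inline void conversion_chain_example() {
  assert(is_nonzero(ToyHalf{1.5f}));
  assert(!is_nonzero(ToyHalf{0.0f}));
}
```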
* Fix sparse_extra_3, disable counting temporaries for testing DynamicSparseMatrix. (Antonio Sanchez, 2020-11-18)
| Multiplication of column-major `DynamicSparseMatrix`es involves three temporaries:
| - two for transposing twice to sort the coefficients (`ConservativeSparseSparseProduct.h`, L160-161)
| - one for a final copy assignment (`SparseAssign.h`, L108)
|
| The latter is avoided in an optimization for `SparseMatrix`.
|
| Since `DynamicSparseMatrix` is deprecated in favor of `SparseMatrix`, it's not worth the effort to optimize further, so I simply disabled counting temporaries via a macro.
|
| Note that due to the inclusion of `sparse_product.cpp`, the `sparse_extra` tests actually re-run all the original `sparse_product` tests as well.
|
| We may want to simply drop the `DynamicSparseMatrix` tests altogether, which would eliminate the test duplication.
|
| Related to #2048
* Re-enable Arm Neon Eigen::half packets of size 8 (David Tellenbach, 2020-11-18)
| - Add predux_half_dowto4
| - Remove explicit casts in Half.h to match the behaviour of BFloat16.h
| - Enable more packetmath tests for Eigen::half
* Add bit_cast for half/bfloat to/from uint16_t, fix TensorRandom (Antonio Sanchez, 2020-11-18)
| The existing `TensorRandom.h` implementation makes the assumption that `half` (`bfloat16`) has a `uint16_t` member `x` (`value`), which is not always true.
| This currently fails on arm64, where `x` has type `__fp16`.
|
| Added `bit_cast` specializations to allow casting to/from `uint16_t` for both `half` and `bfloat16`.
| Also added tests in `half_float`, `bfloat16_float`, and `cxx11_tensor_random` to catch these errors in the future.
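The idea of reading the raw bits without touching a storage member can be sketched with a `memcpy`-based cast. This is an assumption-laden illustration, not Eigen's `bit_cast` specializations: `bit_cast_sketch` and `OpaqueHalf` are hypothetical names, and the wrapper stores a `uint16_t` only so that the example runs on any platform.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Generic bit-for-bit reinterpretation between two types of equal size.
template <typename To, typename From>
inline To bit_cast_sketch(const From& src) {
  static_assert(sizeof(To) == sizeof(From), "bit_cast_sketch requires equal sizes");
  To dst;
  std::memcpy(&dst, &src, sizeof(To));
  return dst;
}

// Hypothetical 16-bit wrapper whose storage type callers should not rely on.
struct OpaqueHalf {
  std::uint16_t storage;
};

// Usage: round-trip raw bits through the wrapper without naming its member.
inline void bit_cast_example() {
  const std::uint16_t one_bits = 0x3C00u;  // 1.0 in IEEE 754 binary16
  const OpaqueHalf h = bit_cast_sketch<OpaqueHalf>(one_bits);
  assert(bit_cast_sketch<std::uint16_t>(h) == one_bits);
}
```

Code written this way keeps working if the member's type changes, for example to a native 16-bit float type, as long as the size stays the same.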
* Initialize primitives to fix -Wuninitialized-const-reference. (Antonio Sanchez, 2020-11-18)
| The `meta` test generates warnings with the latest version of clang due to passing uninitialized variables as const reference arguments.
| ```
| test/meta.cpp:102:45: error: variable 'f' is uninitialized when passed as a const reference argument here [-Werror,-Wuninitialized-const-reference]
|     VERIFY(( check_is_convertible(a.dot(b), f) ));
| ```
| We don't actually use the variables, but initializing them eliminates the new warning.
|
| Fixes #2067.
* Fix rule-of-3 for the Tensor module. (Antonio Sanchez, 2020-11-18)
| Adds copy constructors to Tensor ops, inherits assignment operators from `TensorBase`.
|
| Addresses #1863
* EOF newline added to InverseSize4. (Antonio Sanchez, 2020-11-18)
| The missing newline was causing build breakages due to `-Wnewline-eof -Werror`, which seems to be common across Google.
* Add missing parens around macro argument. (Rasmus Munk Larsen, 2020-11-18)
|
* Replace SSE_SHUFFLE_MASK macro with shuffle_mask. (Rasmus Munk Larsen, 2020-11-17)
|
* Avoid promotion of Arm __fp16 to float in Neon PacketMath (David Tellenbach, 2020-11-17)
| Using overloaded arithmetic operators for Arm __fp16 always causes a promotion to float. We replace operator* by vmulh_f16 to avoid this.
* Fix missing `EIGEN_CONSTEXPR` pop_macro in `Half`. (Antonio Sanchez, 2020-11-17)
| `EIGEN_CONSTEXPR` is getting pushed but not popped in `Half.h` if `EIGEN_HAS_ARM64_FP16_SCALAR_ARITHMETIC` is defined.
* Unify Inverse_SSE.h and Inverse_NEON.h into a single generic implementation using PacketMath. (Guoqiang QI, 2020-11-17)
* Eliminate double-promotion warnings. (Antonio Sanchez, 2020-11-16)
| Clang currently complains about implicit conversions, e.g.
| ```
| test/packetmath.cpp:680:59: warning: implicit conversion increases floating-point precision: 'typename Eigen::internal::random_retval<typename Eigen::internal::global_math_functions_filtering_base<double>::type>::type' (aka 'double') to 'long double' [-Wdouble-promotion]
|       data1[0] = Scalar((2 * k + k1) * EIGEN_PI / 2 * internal::random<double>(0.8, 1.2));
|                                                     ~ ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| test/packetmath.cpp:681:40: warning: implicit conversion increases floating-point precision: 'float' to 'long double' [-Wdouble-promotion]
|       data1[1] = Scalar((2 * k + 2 + k1) * EIGEN_PI / 2 * internal::random<double>(0.8, 1.2));
| ```
|
| Modified to explicitly cast to double.
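The kind of fix described above can be sketched on a small expression. This is a hypothetical illustration, not the actual change to `test/packetmath.cpp`: `kPiLong` stands in for a `long double` constant, and `scaled_angle` is an invented helper showing explicit casts so that no operand is silently promoted to `long double`.

```cpp
#include <cassert>
#include <cmath>

// Stand-in for a constant defined with a long double literal.
const long double kPiLong = 3.141592653589793238462643383279502884L;

// Mixing int, double and long double in one expression promotes the whole
// computation to long double. Casting each operand to double first keeps
// the arithmetic in double and makes the conversions explicit.
inline double scaled_angle(int k, double jitter) {
  return static_cast<double>(k) * static_cast<double>(kPiLong) / 2.0 * jitter;
}

// Usage: k = 2 with no jitter yields pi to double precision.
inline void scaled_angle_example() {
  assert(std::fabs(scaled_angle(2, 1.0) - 3.141592653589793) < 1e-12);
}
```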
* Add EIGEN_DEVICE_FUNC to TranspositionsBase (acxz, 2020-11-16)
| Fixes #2057.
* Enable MathJax in Doxygen.in (Martin Vonheim Larsen, 2020-11-16)
| Note that HTTPS must be used against the MathJax CDN when hosted on `eigen.tuxfamily.org` (which uses HTTPS) in order to avoid `Mixed Content`-errors from browsers. Using HTTPS for MathJax also works if the Eigen docs are hosted on plain HTTP.
* Explicit casts of S -> std::complex<T> (Antonio Sanchez, 2020-11-14)
| When calling `internal::cast<S, std::complex<T>>(x)`, clang often generates an implicit conversion warning due to an implicit cast from type `S` to `T`.
| This currently affects the following tests:
| - `basicstuff`
| - `bfloat16_float`
| - `cxx11_tensor_casts`
|
| The implicit cast leads to widening/narrowing float conversions.
| Widening warnings only seem to be generated by clang (`-Wdouble-promotion`).
|
| To eliminate the warning, we explicitly cast the real-component first from `S` to `T`.
| We also adjust tests to use `internal::cast` instead of `static_cast` when a complex type may be involved.
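Casting the real component first can be sketched as below. This is a minimal illustration of the idea, not Eigen's `internal::cast` specialization; the helper name `to_complex` is hypothetical.

```cpp
#include <cassert>
#include <complex>

// Convert a real scalar of type S to std::complex<T>.
// The S -> T conversion is written out explicitly, so any widening or
// narrowing is visible in the source instead of happening implicitly
// inside the std::complex<T> constructor call.
template <typename T, typename S>
inline std::complex<T> to_complex(const S& x) {
  return std::complex<T>(static_cast<T>(x), T(0));
}

// Usage: narrowing (double -> float) and widening (float -> double).
inline void to_complex_example() {
  assert(to_complex<float>(2.5).real() == 2.5f);
  assert(to_complex<double>(1.5f).real() == 1.5);
  assert(to_complex<double>(1.5f).imag() == 0.0);
}
```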
* Suppress ignored-attributes warning (same as in vectorization_logic). Remove redundant include and using namespace. (Christoph Hertzberg, 2020-11-13)
* Fix typo in NEON/PacketMath.h (guoqiangqi, 2020-11-13)
|
* Disable testing of OpenGL by default. (Antonio Sanchez, 2020-11-12)
| The `OpenGLSupport` module contains mostly deprecated features, and the test is highly GL context-dependent, relies on deprecated GLUT, and requires a display.
| Until the module is updated to support modern OpenGL and the test to use newer windowing frameworks (e.g. GLFW) it's probably best to disable the test by default.
|
| The test can be enabled with `cmake -DEIGEN_TEST_OPENGL=ON`.
|
| See #2053 for more details.