aboutsummaryrefslogtreecommitdiffhomepage
path: root/Eigen/src/Core/util/Macros.h
Commit message (Collapse)AuthorAge
* Support manually disabling exceptionsHEADmasterGravatar Benjamin Barenblat2021-07-07
| | | | | Rename EIGEN_EXCEPTIONS to EIGEN_USE_EXCEPTIONS, and allow disabling exceptions with -DEIGEN_USE_EXCEPTIONS=0.
* Adds macro for checking if C++14 variable templates are supportedGravatar Steve Bronder2021-05-21
|
* Before 3.4 branchGravatar David Tellenbach2021-04-18
|
* Use EIGEN_HAS_CXX11 and EIGEN_COMP_CXXVER macros to detect C++ version for ↵Gravatar Christoph Hertzberg2021-04-12
| | | | | | `std::result_of` and `std::invoke_result`. Fixes #2209
* Fix CUDA constexpr issues for numeric_limits.Gravatar Antonio Sanchez2021-03-30
| | | | | | | | | | | | | | | | Some CUDA/HIP constants fail on device with `constexpr` since they internally rely on non-constexpr functions, e.g. ``` \#define CUDART_INF_F __int_as_float(0x7f800000) ``` This fails for cuda-clang (though passes with nvcc). These constants are currently used by `device::numeric_limits`. For portability, we need to remove `constexpr` from the affected functions. For C++11 or higher, we should be able to rely on the `std::numeric_limits` versions anyways, since the methods themselves are now `constexpr`, so should be supported on device (clang/hipcc natively, nvcc with `--expr-relaxed-constexpr`).
* Un-defining EIGEN_HAS_CONSTEXPR on the HIP platformGravatar Deven Desai2021-03-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The Eigen unit-tests started failing on the HIP/ROCm platform, after the following commit https://gitlab.com/libeigen/eigen/-/commit/e7b8643d70dfbb02ad94186169a8f16041f05bc2 ``` In file included from /home/rocm-user/eigen/test/main.h:360: In file included from /home/rocm-user/eigen/Eigen/QR:11: In file included from /home/rocm-user/eigen/Eigen/Core:162: /home/rocm-user/eigen/Eigen/src/Core/util/Meta.h:300:17: error: constexpr function never produces a constant expression [-Winvalid-constexpr] static float (max)() { ^ /home/rocm-user/eigen/Eigen/src/Core/util/Meta.h:304:12: note: non-constexpr function '__int_as_float' cannot be used in a constant expression return HIPRT_MAX_NORMAL_F; ^ /home/rocm-user/eigen/Eigen/src/Core/arch/HIP/hcc/math_constants.h:14:28: note: expanded from macro 'HIPRT_MAX_NORMAL_F' #define HIPRT_MAX_NORMAL_F __int_as_float(0x7f7fffff) ^ /opt/rocm/hip/include/hip/hcc_detail/device_functions.h:913:32: note: declared here __device__ static inline float __int_as_float(int x) { ^ ``` The problem seems to that some of the constants defined in the HIP `math_constants.h` have a call to `__int_as_float` routine which is not declared `constexpr` in the HIP runtime header file. Working around this issue for now, be skipping the const_expr support (enabled via the above commit) on HIP
* Fix NVCC+ICC issues.Gravatar Antonio Sanchez2021-03-15
| | | | | | | | | | | | | | | | | | | | | | | | NVCC does not understand `__forceinline`, so we need to use `inline` when compiling for GPU. ICC specializes `std::complex` operators for `float` and `double` by default, which cannot be used on device and conflict with Eigen's workaround in CUDA/Complex.h. This can be prevented by defining `_OVERRIDE_COMPLEX_SPECIALIZATION_` before including `<complex>`. Added this define to the tests and to `Eigen/Core`, but this will not work if the user includes `<complex>` before `<Eigen/Core>`. ICC also seems to generate a duplicate `Map` symbol in `PlainObjectBase`: ``` error: "Map" has already been declared in the current scope static ConstMapType Map(const Scalar *data) ``` I tracked this down to `friend class Eigen::Map`. Putting the `friend` statements at the bottom of the class seems to resolve this issue. Fixes #2180
* Disable EIGEN_OPTIMIZATION_BARRIER for PPC clang.Gravatar Antonio Sanchez2021-03-10
| | | | | Doesn't seem to correctly select the register type, and most types lead to compiler crashes.
* [MSVC-specific] Define EIGEN_ARCH_x86_64 for native x64 (_M_X64 is defined ↵Gravatar Ben Niu2021-03-10
| | | | and _M_ARM64EC is not), and define EIGEN_ARCH_ARM64 for both the native ARM64 (_M_ARM64 is defined) or ARM64EC (_M_ARM64EC is defined). _M_ARM64EC is defined when the code is compiled by MSVC for ARM64EC, a new ARM64 ABI designed to be compatible with x64 application emulation on ARM64. If _M_ARM64EC is defined, _M_X64 and _M_AMD64 are also defined, so x64-specific code (especially intrinsics) is also compiled to ARM64 instructions (compliant with the ARM64EC ABI) for maximum x64 compatibility. Although a majority of x64-specific intrinsics can emulated by ARM64 instructions, it is still a good to simply recompile the native ARM64 code paths to ARM64EC for pure computation tasks, for performance reasons.
* Revert stack allocation limit change that crept in.Gravatar Antonio Sanchez2021-03-05
| | | | This was accidentally introduced when copying changes between repos.
* Define EIGEN_CPLUSPLUS and replace most __cplusplus checks.Gravatar Antonio Sanchez2021-03-05
| | | | | | | | | | | | | | | The macro `__cplusplus` is not defined correctly in MSVC unless building with the the `/Zc:__cplusplus` flag. Instead, it defines `_MSVC_LANG` to the specified c++ standard version number. Here we introduce `EIGEN_CPLUSPLUS` which will contain the c++ version number both for MSVC and otherwise. This simplifies checks for supported features. Also replaced most instances of standard version checking via `__cplusplus` with the existing `EIGEN_COMP_CXXVER` macro for better clarity. Fixes: #2170
* Fix rint SSE/NEON again, using optimization barrier.Gravatar Antonio Sanchez2021-03-05
| | | | | | | | | | | | | | | | | | | | This is a new version of !423, which failed for MSVC. Defined `EIGEN_OPTIMIZATION_BARRIER(X)` that uses inline assembly to prevent operations involving `X` from crossing that barrier. Should work on most `GNUC` compatible compilers (MSVC doesn't seem to need this). This is a modified version adapted from what was used in `psincos_float` and tested on more platforms (see #1674, https://godbolt.org/z/73ezTG). Modified `rint` to use the barrier to prevent the add/subtract rounding trick from being optimized away. Also fixed an edge case for large inputs that get bumped up a power of two and ends up rounding away more than just the fractional part. If we are over `2^digits` then just return the input. This edge case was missed in the test since the test was comparing approximate equality, which was still satisfied. Adding a strict equality option catches it.
* Add print for SSE/NEON, use NEON rounding intrinsics if available.Gravatar Antonio Sanchez2021-02-27
| | | | | | | | | | In SSE, by adding/subtracting 2^MantissaBits, we force rounding according to the current rounding mode. For NEON, we use the provided intrinsics for rint/floor/ceil if available (armv8). Related to #1969.
* Add `invoke_result` and eliminate `result_of` warnings for C++17+.Gravatar Antonio Sanchez2021-02-24
| | | | | | | | | | | | | | | | | The `std::result_of` meta struct is deprecated in C++17 and removed in C++20. It was still slipping through due to a faulty definition of `EIGEN_HAS_STD_RESULT_OF`. Added a new macro `EIGEN_HAS_STD_INVOKE_RESULT` and `Eigen::internal::invoke_result` implementation with fallback for pre C++17. Replaces the `result_of` definition with one based on `std::invoke_result` for C++17 and higher. For completeness, added nullary op support for c++03. Fixes #1850.
* Fix check if GPU compile phase for std::hashGravatar Antonio Sanchez2021-02-23
|
* Fix some CUDA warnings.Gravatar Antonio Sanchez2021-02-24
| | | | | | | | | | | | | | | | | Added `EIGEN_HAS_STD_HASH` macro, checking for C++11 support and not running on GPU. `std::hash<float>` is not a device function, so cannot be used by `std::hash<bfloat16>`. Removed `EIGEN_DEVICE_FUNC` and only define if `EIGEN_HAS_STD_HASH`. Same for `half`. Added `EIGEN_CUDA_HAS_FP16_ARITHMETIC` to improve readability, eliminate warnings about `EIGEN_CUDA_ARCH` not being defined. Replaced a couple C-style casts with `reinterpret_cast` for aligned loading of `half*` to `half2*`. This eliminates `-Wcast-align` warnings in clang. Although not ideal due to potential type aliasing, this is how CUDA handles these conversions internally.
* Bump to 3.4.99Gravatar David Tellenbach2021-02-17
|
* Add missing #endif directive in Macros.hGravatar David Tellenbach2021-01-07
|
* #define was defined incorrectly because the result_of function was ↵Gravatar shrek14022021-01-07
| | | | deprecated in c++17 and removed in c++20. Also, EIGEN_COMP_MSVC (which is _MSC_VER) only affects result_of indirectly, which can cause errors.
* remove annotation for first declaration of default con/destructionGravatar acxz2020-11-12
|
* Add support for Armv8.2-a __fp16Gravatar David Tellenbach2020-10-28
| | | | | | | | | | | | | | | Armv8.2-a provides a native half-precision floating point (__fp16 aka. float16_t). This patch introduces * __fp16 as underlying type of Eigen::half if this type is available * the packet types Packet4hf and Packet8hf representing float16x4_t and float16x8_t respectively * packet-math for the above packets with corresponding scalar type Eigen::half The packet-math functionality has been implemented by Ashutosh Sharma <ashutosh.sharma@amperecomputing.com>. This closes #1940.
* Drop EIGEN_USING_STD_MATH in favour of EIGEN_USING_STDGravatar David Tellenbach2020-10-09
|
* Get rid of initialization logic for blueNorm by making the computed ↵Gravatar Rasmus Munk Larsen2020-09-18
| | | | | | constants static const or constexpr. Move macro definition EIGEN_CONSTEXPR to Core and make all methods in NumTraits constexpr when EIGEN_HASH_CONSTEXPR is 1.
* Add support for CastXML on ARM aarch64Gravatar Brad King2020-09-16
| | | | | | | | CastXML simulates the preprocessors of other compilers, but actually parses the translation unit with an internal Clang compiler. Use the same `vld1q_u64` workaround that we do for Clang. Fixes: #1979
* Fixing a CUDA / P100 regression introduced by PR 181Gravatar Deven Desai2020-08-20
| | | | | | PR 181 ( https://gitlab.com/libeigen/eigen/-/merge_requests/181 ) adds `__launch_bounds__(1024)` attribute to GPU kernels, that did not have that attribute explicitly specified. That PR seems to cause regressions on the CUDA platform. This PR/commit makes the changes in PR 181, to be applicable for HIP only
* Bug #1788: Fix rule-of-three violations inside the stable modules.Gravatar Christoph Hertzberg2019-12-19
| | | | | This fixes deprecated-copy warnings when compiling with GCC>=9 Also protect some additional Base-constructors from getting called by user code code (#1587)
* Add default definition for EIGEN_PREDICT_*Gravatar Rasmus Munk Larsen2019-12-16
|
* Improve accuracy of fast approximate tanh and the logistic functions in ↵Gravatar Rasmus Munk Larsen2019-12-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Eigen, such that they preserve relative accuracy to within a few ULPs where their function values tend to zero (around x=0 for tanh, and for large negative x for the logistic function). This change re-instates the fast rational approximation of the logistic function for float32 in Eigen (removed in https://gitlab.com/libeigen/eigen/commit/66f07efeaed39d6a67005343d7e0caf7d9eeacdb), but uses the more accurate approximation 1/(1+exp(-1)) ~= exp(x) below -9. The exponential is only calculated on the vectorized path if at least one element in the SIMD input vector is less than -9. This change also contains a few improvements to speed up the original float specialization of logistic: - Introduce EIGEN_PREDICT_{FALSE,TRUE} for __builtin_predict and use it to predict that the logistic-only path is most likely (~2-3% speedup for the common case). - Carefully set the upper clipping point to the smallest x where the approximation evaluates to exactly 1. This saves the explicit clamping of the output (~7% speedup). The increased accuracy for tanh comes at a cost of 10-20% depending on instruction set. The benchmarks below repeated calls u = v.logistic() (u = v.tanh(), respectively) where u and v are of type Eigen::ArrayXf, have length 8k, and v contains random numbers in [-1,1]. Benchmark numbers for logistic: Before: Benchmark Time(ns) CPU(ns) Iterations ----------------------------------------------------------------- SSE BM_eigen_logistic_float 4467 4468 155835 model_time: 4827 AVX BM_eigen_logistic_float 2347 2347 299135 model_time: 2926 AVX+FMA BM_eigen_logistic_float 1467 1467 476143 model_time: 2926 AVX512 BM_eigen_logistic_float 805 805 858696 model_time: 1463 After: Benchmark Time(ns) CPU(ns) Iterations ----------------------------------------------------------------- SSE BM_eigen_logistic_float 2589 2590 270264 model_time: 4827 AVX BM_eigen_logistic_float 1428 1428 489265 model_time: 2926 AVX+FMA BM_eigen_logistic_float 1059 1059 662255 model_time: 2926 AVX512 BM_eigen_logistic_float 673 673 1000000 model_time: 1463 Benchmark numbers for tanh: Before: Benchmark Time(ns) CPU(ns) Iterations ----------------------------------------------------------------- SSE BM_eigen_tanh_float 2391 2391 292624 model_time: 4242 AVX BM_eigen_tanh_float 1256 1256 554662 model_time: 2633 AVX+FMA BM_eigen_tanh_float 823 823 866267 model_time: 1609 AVX512 BM_eigen_tanh_float 443 443 1578999 model_time: 805 After: Benchmark Time(ns) CPU(ns) Iterations ----------------------------------------------------------------- SSE BM_eigen_tanh_float 2588 2588 273531 model_time: 4242 AVX BM_eigen_tanh_float 1536 1536 452321 model_time: 2633 AVX+FMA BM_eigen_tanh_float 1007 1007 694681 model_time: 1609 AVX512 BM_eigen_tanh_float 471 471 1472178 model_time: 805
* [SYCL] Rebasing the SYCL support branch on top of the Einge upstream master ↵Gravatar Mehdi Goli2019-11-28
| | | | | | | | | | | | | | | | | | | | | | branch. * Unifying all loadLocalTile from lhs and rhs to an extract_block function. * Adding get_tensor operation which was missing in TensorContractionMapper. * Adding the -D method missing from cmake for Disable_Skinny Contraction operation. * Wrapping all the indices in TensorScanSycl into Scan parameter struct. * Fixing typo in Device SYCL * Unifying load to private register for tall/skinny no shared * Unifying load to vector tile for tensor-vector/vector-tensor operation * Removing all the LHS/RHS class for extracting data from global * Removing Outputfunction from TensorContractionSkinnyNoshared. * Combining the local memory version of tall/skinny and normal tensor contraction into one kernel. * Combining the no-local memory version of tall/skinny and normal tensor contraction into one kernel. * Combining General Tensor-Vector and VectorTensor contraction into one kernel. * Making double buffering optional for Tensor contraction when local memory is version is used. * Modifying benchmark to accept custom Reduction Sizes * Disabling AVX optimization for SYCL backend on the host to allow SSE optimization to the host * Adding Test for SYCL * Modifying SYCL CMake
* Add EIGEN_HAS_INTRINSIC_INT128 macroGravatar Rasmus Munk Larsen2019-11-06
| | | | Add a new EIGEN_HAS_INTRINSIC_INT128 macro, and use this instead of __SIZEOF_INT128__. This fixes related issues with TensorIntDiv.h when building with Clang for Windows, where support for 128-bit integer arithmetic is advertised but broken in practice.
* PR 621: Fix documentation of EIGEN_COMP_EMSCRIPTENGravatar David Tellenbach2019-03-21
|
* [SYCL] This PR adds the minimum modifications to Eigen core required to run ↵Gravatar Mehdi Goli2019-06-27
| | | | | | | | Eigen unsupported modules on devices supporting SYCL. * Adding SYCL memory model * Enabling/Disabling SYCL backend in Core * Supporting Vectorization
* Clean up CUDA/NVCC version macros and their use in Eigen, and a few other ↵Gravatar Rasmus Munk Larsen2019-05-31
| | | | CUDA build failures.
* Merged in rmlarsen/eigen (pull request PR-643)Gravatar Rasmus Larsen2019-05-20
|\ | | | | | | | | | | Make Eigen build with cuda 10 and clang. Approved-by: Justin Lebar <justin.lebar@gmail.com>
| * Make Eigen build with cuda 10 and clang.Gravatar Rasmus Munk Larsen2019-05-15
| |
* | Eigen: Fix MSVC C++17 language standard detection logicGravatar Scott Ramsby2019-05-03
|/ | | | | | | To detect C++17 support, use _MSVC_LANG macro instead of _MSC_VER. _MSC_VER can indicate whether the current compiler version could support the C++17 language standard, but not whether that standard is actually selected (i.e. via /std:c++17). See these web pages for more details: https://devblogs.microsoft.com/cppblog/msvc-now-correctly-reports-__cplusplus/ https://docs.microsoft.com/en-us/cpp/preprocessor/predefined-macros
* Merged eigen/eigen into defaultGravatar Deven Desai2019-03-19
|\
| * bug #1409: make EIGEN_MAKE_ALIGNED_OPERATOR_NEW* macros empty in c++17 mode:Gravatar Gael Guennebaud2019-02-20
| | | | | | | | | | - this helps clang 5 and 6 to support alignas in STL's containers. - this makes the public API of our (and users) classes cleaner
| * Add C++17 detection macro, and make sure throw(xpr) is not used if the ↵Gravatar Gael Guennebaud2019-02-19
| | | | | | | | compiler is in c++17 mode.
| * bug #1680: improve MSVC inlining by declaring many triavial constructors and ↵Gravatar Gael Guennebaud2019-02-15
| | | | | | | | accessors as STRONG_INLINE.
| * TypoGravatar Gael Guennebaud2019-01-15
| |
| * Replace compiler's alignas/alignof extension by respective c++11 keywords ↵Gravatar Gael Guennebaud2019-01-11
| | | | | | | | when available. This also fix a compilation issue with gcc-4.7.
| * Do not disable alignment with EIGEN_GPUCCGravatar Eugene Zhulenev2018-12-03
| |
* | ROCm/HIP specfic fixes + updatesGravatar Deven Desai2018-11-19
|/ | | | | | | | | | | | | | | | | | | | | | | | | | 1. Eigen/src/Core/arch/GPU/Half.h Updating the HIPCC implementation half so that it can declared as a __shared__ variable 2. Eigen/src/Core/util/Macros.h, Eigen/src/Core/util/Memory.h introducing a EIGEN_USE_STD(func) macro that calls - std::func be default - ::func when eigen is being compiled with HIPCC This change was requested in the previous HIP PR (https://bitbucket.org/eigen/eigen/pull-requests/518/pr-with-hip-specific-fixes-for-the-eigen/diff) 3. unsupported/Eigen/CXX11/src/Tensor/TensorDeviceThreadPool.h Removing EIGEN_DEVICE_FUNC attribute from pure virtual methods as it is not supported by HIPCC 4. unsupported/Eigen/CXX11/src/Tensor/TensorReduction.h Disabling the template specializations of InnerMostDimReducer as they run into HIPCC link errors
* Recent xcode versions does support EIGEN_HAS_STATIC_ARRAY_TEMPLATEGravatar Gael Guennebaud2018-11-09
|
* Only set EIGEN_CONSTEXPR_ARE_DEVICE_FUNC for clang++ if ↵Gravatar Rasmus Munk Larsen2018-10-18
| | | | cxx_relaxed_constexpr is available.
* Provide EIGEN_OVERRIDE and EIGEN_FINAL macros to mark virtual function overridesGravatar Christoph Hertzberg2018-09-24
|
* Enable std::result_of for msvc 2015 and laterGravatar Gael Guennebaud2018-09-13
|
* Fixing typo.Gravatar Mehdi Goli2018-08-08
|
* Adding EIGEN_UNROLL_LOOP macro.Gravatar Mehdi Goli2018-08-08
|