aboutsummaryrefslogtreecommitdiffhomepage
path: root/Eigen/Core
Commit message (Collapse)AuthorAge
* Support manually disabling exceptionsHEADmasterGravatar Benjamin Barenblat2021-07-07
| | | | | Rename EIGEN_EXCEPTIONS to EIGEN_USE_EXCEPTIONS, and allow disabling exceptions with -DEIGEN_USE_EXCEPTIONS=0.
* Fix NVCC+ICC issues.Gravatar Antonio Sanchez2021-03-15
| | | | | | | | | | | | | | | | | | | | | | | | NVCC does not understand `__forceinline`, so we need to use `inline` when compiling for GPU. ICC specializes `std::complex` operators for `float` and `double` by default, which cannot be used on device and conflict with Eigen's workaround in CUDA/Complex.h. This can be prevented by defining `_OVERRIDE_COMPLEX_SPECIALIZATION_` before including `<complex>`. Added this define to the tests and to `Eigen/Core`, but this will not work if the user includes `<complex>` before `<Eigen/Core>`. ICC also seems to generate a duplicate `Map` symbol in `PlainObjectBase`: ``` error: "Map" has already been declared in the current scope static ConstMapType Map(const Scalar *data) ``` I tracked this down to `friend class Eigen::Map`. Putting the `friend` statements at the bottom of the class seems to resolve this issue. Fixes #2180
* Fix excessive GEBP register spilling for 32-bit NEON.Gravatar Antonio Sanchez2021-02-03
| | | | | | | | | | | | | | | | | | | | | Clang does a poor job of optimizing the GEBP microkernel on 32-bit ARM, leading to excessive 16-byte register spills, slowing down basic f32 matrix multiplication by approx 50%. By specializing `gebp_traits`, we can eliminate the register spills. Volatile inline ASM both acts as a barrier to prevent reordering and enforces strict register use. In a simple f32 matrix multiply example, this modification reduces 16-byte spills from 109 instances to zero, leading to a 1.5x speed increase (search for `16-byte Spill` in the assembly in https://godbolt.org/z/chsPbE). This is a replacement of !379. See there for further discussion. Also moved `gebp_traits` specializations for NEON to `Eigen/src/Core/arch/NEON/GeneralBlockPanelKernel.h` to be alongside other NEON-specific code. Fixes #2138.
* Add support for Arm SVEGravatar David Tellenbach2021-01-21
| | | | | | | | | | | | This patch adds support for Arm's new vector extension SVE (Scalable Vector Extension). In contrast to other vector extensions that are supported by Eigen, SVE types are inherently *sizeless*. For the use in Eigen we fix their size at compile-time (note that this is not necessary in general, SVE is *length agnostic*). During compilation the flag `-msve-vector-bits=N` has to be set where `N` is a power of two in the range of `128`to `2048`, indicating the length of an SVE vector. Since SVE is rather young, we decided to disable it by default even if it would be available. A user has to enable it explicitly by defining `EIGEN_ARM64_USE_SVE`. This patch introduces the packet types `PacketXf` and `PacketXi` for packets of `float` and `int32_t` respectively. The size of these packets depends on the SVE vector length. E.g. if `-msve-vector-bits=512` is set, `PacketXf` will contain `512/32 = 16` elements. This MR is joint work with Miguel Tairum <miguel.tairum@arm.com>.
* Drop EIGEN_USING_STD_MATH in favour of EIGEN_USING_STDGravatar David Tellenbach2020-10-09
|
* MatrixProuct enhancements:Gravatar Everton Constantino2020-09-02
| | | | | | | | | | | | | - Changes to Altivec/MatrixProduct Adapting code to gcc 10. Generic code style and performance enhancements. Adding PanelMode support. Adding stride/offset support. Enabling float64, std::complex and std::complex. Fixing lack of symm_pack. Enabling mixedtypes. - Adding std::complex tests to blasutil. - Adding an implementation of storePacketBlock when Incr!= 1.
* Support BFloat16 in EigenGravatar Teng Lu2020-06-20
|
* Fix #1757: remove the word 'suicide'Gravatar Sebastien Boisvert2020-06-11
|
* Fix #556: warnings with mingwGravatar Gael Guennebaud2020-05-31
|
* Fix incorrect usage of `if defined(EIGEN_ARCH_PPC)` => `if EIGEN_ARCH_PPC`Gravatar Yong Tang2020-05-28
| | | | | | | | | | | | | | This PR tries to fix an incorrect usage of `if defined(EIGEN_ARCH_PPC)` in `Eigen/Core` header. In `Eigen/src/Core/util/Macros.h`, EIGEN_ARCH_PPC was explicitly defined as either 0 or 1. As a result `if defined(EIGEN_ARCH_PPC)` will always be true. This causes issues when building on non PPC platform and `MatrixProduct.h` is not available. This fix changes `if defined(EIGEN_ARCH_PPC)` => `if EIGEN_ARCH_PPC`. Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
* - Vectorizing MMA packing.Gravatar Everton Constantino2020-05-19
| | | | | - Optimizing MMA kernel. - Adding PacketBlock store to blas_data_mapper.
* Eigen moved the `scanLauncehr` function inside the internal namespace.Gravatar mehdi-goli2020-05-11
| | | | | | | This commit applies the following changes: - Moving the `scamLauncher` specialization inside internal namespace to fix compiler crash on TensorScan for SYCL backend. - Replacing `SYCL/sycl.hpp` to `CL/sycl.hpp` in order to follow SYCL 1.2.1 standard. - minor fixes: commenting out an unused variable to avoid compiler warnings.
* Include <sstream> explicitly, and don't rely on the implicit include via ↵Gravatar Tobias Bosch2020-02-24
| | | | | <complex>. This implicit dependency does no longer exist in a recent llbm release (sha 78be61871704).
* Fix a circular dependency regarding pshift* functions and ↵Gravatar Gael Guennebaud2019-09-06
| | | | | | | GenericPacketMathFunctions. Another solution would have been to make pshift* fully generic template functions with partial specialization which is always a mess in c++03.
* Fix compilation without vector engine available (e.g., x86 with SSE disabled):Gravatar Gael Guennebaud2019-09-05
| | | | -> ppolevl is required by ndtri even for the scalar path
* Fix missing header inclusion and colliding definitions for half type ↵Gravatar Rasmus Munk Larsen2019-08-30
| | | | | | casting, which broke build with -march=native on Haswell/Skylake.
* Clean up float16 a.k.a. Eigen::half support in Eigen. Move the definition of ↵Gravatar Rasmus Munk Larsen2019-08-27
| | | | half to Core/arch/Default and move arch-specific packet ops to their respective sub-directories.
* [SYCL] This PR adds the minimum modifications to Eigen core required to run ↵Gravatar Mehdi Goli2019-06-27
| | | | | | | | Eigen unsupported modules on devices supporting SYCL. * Adding SYCL memory model * Enabling/Disabling SYCL backend in Core * Supporting Vectorization
* Implement AVX512 vectorization of std::complex<float/double>Gravatar Gael Guennebaud2018-12-06
|
* temporarily re-disable SSE/AVX vectorization of complex<> on AVX512 -> this ↵Gravatar Gael Guennebaud2018-12-06
| | | | needs to be fixed though!
* Fix pandnot order in AVX512Gravatar Gael Guennebaud2018-11-30
|
* Add missing SSE/AVX type-casting in AVX512 modeGravatar Gael Guennebaud2018-11-28
|
* bug #1631: fix compilation with ARM NEON and clang, and cleanup the weird ↵Gravatar Gael Guennebaud2018-11-27
| | | | pshiftright_and_cast and pcast_and_shiftleft functions.
* Unify SSE/AVX psin functions.Gravatar Gael Guennebaud2018-11-27
| | | | | | | | It is based on the SSE version which is much more accurate, though very slightly slower. This changeset also includes the following required changes: - add packet-float to packet-int type traits - add packet float<->int reinterpret casts - add faster pselect for AVX based on blendv
* Collapsed revision (based on pull request PR-325)Gravatar Christian von Schultz2018-10-22
| | | | | | | * Support compiling without IO streams Add the preprocessor definition EIGEN_NO_IO which, if defined, disables all use of the IO streams part of the standard library.
* bug #65: add vectorization of partial reductions along the outer-dimension, ↵Gravatar Gael Guennebaud2018-10-09
| | | | for instance: colmajor_mat.rowwise().mean()
* bug #231: initial implementation of STL iterators for dense expressionsGravatar Gael Guennebaud2018-10-01
|
* merge with default EigenGravatar Gael Guennebaud2018-09-21
|\
| * Creating separate SYCL required PR for uncontroversial files.Gravatar Mehdi Goli2018-08-03
| |
| * Add pcast packet op for NEON.Gravatar Rasmus Munk Larsen2018-07-26
| |
| * Add MIPS changes missing from previous merge.Gravatar Alexey Frunze2018-07-18
| |
| * More clearly disable the inclusion of src/Core/arch/CUDA/Complex.h without CUDAGravatar Gael Guennebaud2018-07-18
| |
| * Forward declaring std::array does not work with all std libs, so let's just ↵Gravatar Gael Guennebaud2018-07-13
| | | | | | | | include <array>
| * Cleanup the mess in Eigen/Core by moving CUDA/HIP stuff at more appropriate ↵Gravatar Gael Guennebaud2018-07-12
| | | | | | | | | | | | places (Macros.h), and alignment/vectorization logic is now in util/ConfigureVectorization.h
| * Updates corresponding to the latest round of PR feedbackGravatar Deven Desai2018-07-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The major changes are 1. Moving CUDA/PacketMath.h to GPU/PacketMath.h 2. Moving CUDA/MathFunctions.h to GPU/MathFunction.h 3. Moving CUDA/CudaSpecialFunctions.h to GPU/GpuSpecialFunctions.h The above three changes effectively enable the Eigen "Packet" layer for the HIP platform 4. Merging the "hip_basic" and "cuda_basic" unit tests into one ("gpu_basic") 5. Updating the "EIGEN_DEVICE_FUNC" marking in some places The change has been tested on the HIP and CUDA platforms.
| * merging updates from upstreamGravatar Deven Desai2018-07-11
| |\
| * | updates based on PR feedbackGravatar Deven Desai2018-06-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are two major changes (and a few minor ones which are not listed here...see PR discussion for details) 1. Eigen::half implementations for HIP and CUDA have been merged. This means that - `CUDA/Half.h` and `HIP/hcc/Half.h` got merged to a new file `GPU/Half.h` - `CUDA/PacketMathHalf.h` and `HIP/hcc/PacketMathHalf.h` got merged to a new file `GPU/PacketMathHalf.h` - `CUDA/TypeCasting.h` and `HIP/hcc/TypeCasting.h` got merged to a new file `GPU/TypeCasting.h` After this change the `HIP/hcc` directory only contains one file `math_constants.h`. That will go away too once that file becomes a part of the HIP install. 2. new macros EIGEN_GPUCC, EIGEN_GPU_COMPILE_PHASE and EIGEN_HAS_GPU_FP16 have been added and the code has been updated to use them where appropriate. - `EIGEN_GPUCC` is the same as `(EIGEN_CUDACC || EIGEN_HIPCC)` - `EIGEN_GPU_DEVICE_COMPILE` is the same as `(EIGEN_CUDA_ARCH || EIGEN_HIP_DEVICE_COMPILE)` - `EIGEN_HAS_GPU_FP16` is the same as `(EIGEN_HAS_CUDA_FP16 or EIGEN_HAS_HIP_FP16)`
| | * Extend CUDA support to matrix inversion and selfadjointeigensolverGravatar Andrea Bocci2018-06-11
| | |
| * | Adding support for using Eigen in HIP kernels.Gravatar Deven Desai2018-06-06
| |/ | | | | | | | | | | | | | | | | This commit enables the use of Eigen on HIP kernels / AMD GPUs. Support has been added along the same lines as what already exists for using Eigen in CUDA kernels / NVidia GPUs. Application code needs to explicitly define EIGEN_USE_HIP when using Eigen in HIP kernels. This is because some of the CUDA headers get picked up by default during Eigen compile (irrespective of whether or not the underlying compiler is CUDACC/NVCC, for e.g. Eigen/src/Core/arch/CUDA/Half.h). In order to maintain this behavior, the EIGEN_USE_HIP macro is used to switch to using the HIP version of those header files (see Eigen/Core and unsupported/Eigen/CXX11/Tensor) Use the "-DEIGEN_TEST_HIP" cmake option to enable the HIP specific unit tests.
| * Define pcast<> for SSE types even when AVX is enabled. (otherwise float are ↵Gravatar Gael Guennebaud2018-05-29
| | | | | | | | silently reinterpreted as int instead of being converted)
| * AVX512: _mm512_rsqrt28_ps is available for AVX512ER onlyGravatar Gael Guennebaud2018-04-03
| |
| * MIsc. source and comment typosGravatar luz.paz2018-03-11
| | | | | | | | Found using `codespell` and `grep` from downstream FreeCAD
| * For cuda 9.1 replace math_functions.hpp with cuda_runtime.hGravatar nluehr2017-12-18
| |
| * Added support for CUDA 9.0.Gravatar Benoit Steiner2017-08-31
| |
| * bug #1462: remove all occurences of the deprecated __CUDACC_VER__ macro by ↵Gravatar Gael Guennebaud2017-08-24
| | | | | | | | introducing EIGEN_CUDACC_VER
* | mergeGravatar Gael Guennebaud2017-02-21
|\ \
* | | Add support for automatic-size deduction in reshaped, e.g.:Gravatar Gael Guennebaud2017-02-21
| | | | | | | | | | | | mat.reshaped(4,AutoSize); <-> mat.reshaped(4,mat.size()/4);
* | | Use fix<> API to specify compile-time reshaped sizes.Gravatar Gael Guennebaud2017-01-29
| | |
* | | Cleanup intitial reshape implementation:Gravatar Gael Guennebaud2017-01-29
| | | | | | | | | | | | | | | - reshape -> reshaped - make it compatible with evaluators.
* | | import yoco xiao's work on reshapeGravatar Gael Guennebaud2017-01-29
|\ \ \