path: root/Eigen/src/Core/util/ConfigureVectorization.h
Commit message (author, date)
* Add support for Arm SVE (David Tellenbach, 2021-01-21)
|
|   This patch adds support for Arm's new vector extension SVE (Scalable Vector Extension). In contrast to other vector extensions that are supported by Eigen, SVE types are inherently *sizeless*. For use in Eigen we fix their size at compile time (note that this is not necessary in general; SVE is *length agnostic*). During compilation the flag `-msve-vector-bits=N` has to be set, where `N` is a power of two in the range of `128` to `2048`, indicating the length of an SVE vector in bits.
|
|   Since SVE is rather young, we decided to disable it by default even if it is available. A user has to enable it explicitly by defining `EIGEN_ARM64_USE_SVE`.
|
|   This patch introduces the packet types `PacketXf` and `PacketXi` for packets of `float` and `int32_t` respectively. The size of these packets depends on the SVE vector length. E.g. if `-msve-vector-bits=512` is set, `PacketXf` will contain `512/32 = 16` elements.
|
|   This MR is joint work with Miguel Tairum <miguel.tairum@arm.com>.
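The packet-size arithmetic in the SVE commit message above can be sketched in a few lines of C++. This is only an illustration of the `512/32 = 16` relationship and of the stated constraint on `N`; the helper names `sve_lanes` and `is_valid_sve_vector_bits` are hypothetical and are not part of Eigen.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not an Eigen API): number of elements of width
// `element_bits` that fit into an SVE vector fixed at `vector_bits` bits.
constexpr std::size_t sve_lanes(std::size_t vector_bits, std::size_t element_bits) {
  return vector_bits / element_bits;
}

// Hypothetical helper: checks the constraint stated in the commit message,
// i.e. N is a power of two in the range 128 to 2048.
constexpr bool is_valid_sve_vector_bits(std::size_t n) {
  return n >= 128 && n <= 2048 && (n & (n - 1)) == 0;
}

// With -msve-vector-bits=512, a packet of 32-bit floats holds 16 elements.
static_assert(sve_lanes(512, 32) == 16, "512-bit vector holds 16 floats");
// The smallest allowed vector length holds 4 elements of 32 bits.
static_assert(sve_lanes(128, 32) == 4, "128-bit vector holds 4 floats");
static_assert(is_valid_sve_vector_bits(2048), "2048 is an allowed length");
static_assert(!is_valid_sve_vector_bits(384), "384 is not a power of two");
```

Under these assumptions, a build that passes `-msve-vector-bits=256` would give packets of `256/32 = 8` elements for both `float` and `int32_t`.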
* Add support for Armv8.2-a __fp16 (David Tellenbach, 2020-10-28)
|
|   Armv8.2-a provides a native half-precision floating point type (__fp16, aka float16_t). This patch introduces
|
|   * __fp16 as the underlying type of Eigen::half if this type is available
|   * the packet types Packet4hf and Packet8hf, representing float16x4_t and float16x8_t respectively
|   * packet-math for the above packets with corresponding scalar type Eigen::half
|
|   The packet-math functionality has been implemented by Ashutosh Sharma <ashutosh.sharma@amperecomputing.com>.
|
|   This closes #1940.
* Support BFloat16 in Eigen (Teng Lu, 2020-06-20)
|
* Fixing HIP breakage caused by the recent commit that introduces Packet4h2 as the Eigen::half packet type (Deven Desai, 2020-03-12)
* Merged in anshuljl/eigen-2/Anshul-Jaiswal/update-configurevectorizationh-to-not-op-1573079916090 (pull request PR-754) (Rasmus Larsen, 2019-12-04)
|\
| |   Update ConfigureVectorization.h to not optimize fp16 routines when compiling with cuda.
| |
| |   Approved-by: Deven Desai <deven.desai.amd@gmail.com>
* | [SYCL] Rebasing the SYCL support branch on top of the Eigen upstream master branch. (Mehdi Goli, 2019-11-28)
| |
| |   * Unifying all loadLocalTile from lhs and rhs to an extract_block function.
| |   * Adding get_tensor operation which was missing in TensorContractionMapper.
| |   * Adding the -D method missing from cmake for Disable_Skinny Contraction operation.
| |   * Wrapping all the indices in TensorScanSycl into Scan parameter struct.
| |   * Fixing typo in Device SYCL
| |   * Unifying load to private register for tall/skinny no shared
| |   * Unifying load to vector tile for tensor-vector/vector-tensor operation
| |   * Removing all the LHS/RHS class for extracting data from global
| |   * Removing Outputfunction from TensorContractionSkinnyNoshared.
| |   * Combining the local memory version of tall/skinny and normal tensor contraction into one kernel.
| |   * Combining the no-local memory version of tall/skinny and normal tensor contraction into one kernel.
| |   * Combining General Tensor-Vector and VectorTensor contraction into one kernel.
| |   * Making double buffering optional for Tensor contraction when the local memory version is used.
| |   * Modifying benchmark to accept custom Reduction Sizes
| |   * Disabling AVX optimization for SYCL backend on the host to allow SSE optimization to the host
| |   * Adding Test for SYCL
| |   * Modifying SYCL CMake
| * Update ConfigureVectorization.h to not optimize fp16 routines when compiling with cuda. (Anshul Jaiswal, 2019-11-06)
* | Disable AVX on broken xcode versions. See PR 748. (Gael Guennebaud, 2019-11-12)
| |
| |   Patch adapted from Hans Johnson's PR 748.
|/
* Add workaround for choosing the right include files with FP16C support with clang. (Rasmus Munk Larsen, 2019-06-05)
* Clean up CUDA/NVCC version macros and their use in Eigen, and a few other CUDA build failures. (Rasmus Munk Larsen, 2019-05-31)
* Enable support for F16C with Clang. The required intrinsics were added here: https://reviews.llvm.org/D16177 and are part of LLVM 3.8.0. (Rasmus Munk Larsen, 2019-05-20)
* Updates requested in the PR feedback. Also dropping code within #ifdef EIGEN_HAS_OLD_HIP_FP16 (Deven Desai, 2019-03-19)
* bug #1678: Fix lack of __FMA__ macro on MSVC with AVX512 (Gael Guennebaud, 2019-02-15)
|
* Replace host_define.h with cuda_runtime_api.h (nluehr, 2019-01-18)
|
* Replace compiler's alignas/alignof extension by respective C++11 keywords when available. This also fixes a compilation issue with gcc-4.7. (Gael Guennebaud, 2019-01-11)
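As a rough illustration of what preferring the C++11 keywords over compiler extensions can look like, here is a minimal sketch. The macro names `DEMO_ALIGNAS` and `DEMO_ALIGNOF` and the struct `AlignedFloat4` are hypothetical and are not Eigen's actual macros; the real logic in ConfigureVectorization.h may differ.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical macros (not Eigen's): use the C++11 keywords when the
// compiler reports C++11 support, otherwise fall back to the
// compiler-specific extensions.
#if __cplusplus >= 201103L
  #define DEMO_ALIGNAS(n) alignas(n)
  #define DEMO_ALIGNOF(t) alignof(t)
#elif defined(__GNUC__)
  #define DEMO_ALIGNAS(n) __attribute__((aligned(n)))
  #define DEMO_ALIGNOF(t) __alignof__(t)
#elif defined(_MSC_VER)
  #define DEMO_ALIGNAS(n) __declspec(align(n))
  #define DEMO_ALIGNOF(t) __alignof(t)
#else
  #error "No known way to specify alignment with this compiler"
#endif

// A 16-byte aligned block of four floats, the kind of type that a
// 128-bit SIMD load or store would expect.
struct DEMO_ALIGNAS(16) AlignedFloat4 {
  float data[4];
};
```

With any of the three branches, `DEMO_ALIGNOF(AlignedFloat4)` should evaluate to 16 on common platforms.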
* Enable FMA with MSVC (through /arch:AVX2). To make this possible, I also had to turn the #warning regarding AVX512-FMA into an #error. (Gael Guennebaud, 2018-12-07)
* bug #1638: add a warning if avx512 is enabled without SSE/AVX FMA (Gael Guennebaud, 2018-12-07)
|
* #elif -> #else to fix GPU build. (Rasmus Munk Larsen, 2018-12-05)
|
* Update checks in ConfigureVectorization.h (Eugene Zhulenev, 2018-12-03)
|
* Do not disable alignment with EIGEN_GPUCC (Eugene Zhulenev, 2018-12-03)
|
* Fix typo in comment on EIGEN_MAX_STATIC_ALIGN_BYTES (Nikolaus Demmel, 2018-11-14)
|
* This commit contains the following (HIP specific) updates: (Deven Desai, 2018-10-01)
|
|   - unsupported/Eigen/CXX11/src/Tensor/TensorReductionGpu.h
|     Changing "pass-by-reference" argument to be "pass-by-value" instead (in a __global__ function decl). "pass-by-reference" arguments to __global__ functions are unwise, and will be explicitly flagged as errors by the newer versions of HIP.
|
|   - Eigen/src/Core/util/Memory.h
|   - unsupported/Eigen/CXX11/src/Tensor/TensorContraction.h
|     Changes introduced in recent commits break the HIP compile. Adding EIGEN_DEVICE_FUNC attribute to some functions and calling ::malloc/free instead of the corresponding std:: versions to get the HIP compile working again.
|
|   - unsupported/Eigen/CXX11/src/Tensor/TensorReduction.h
|     A change introduced in a recent commit breaks the HIP compile (link stage errors out due to failure to inline a function). Disabling the recently introduced code (only for HIP compile), to get the eigen nightly testing going again. Will submit another PR once we have the proper fix.
|
|   - Eigen/src/Core/util/ConfigureVectorization.h
|     Enabling GPU VECTOR support when HIP compiler is in use (for both the host and device compile phases)
* Provide EIGEN_ALIGNOF macro, and give handmade_aligned_malloc the possibility for alignments larger than the standard alignment. (Christoph Hertzberg, 2018-09-14)
* Add MIPS changes missing from previous merge. (Alexey Frunze, 2018-07-18)
|
* Add support for MIPS SIMD (MSA) (Alexey Frunze, 2018-07-06)
|
* Clean up the mess in Eigen/Core by moving CUDA/HIP stuff to more appropriate places (Macros.h); the alignment/vectorization logic is now in util/ConfigureVectorization.h (Gael Guennebaud, 2018-07-12)