Commit message | Author | Age | |
---|---|---|---|
* | temporarily re-disable SSE/AVX vectorization of complex<> on AVX512 -> this needs to be fixed though! | Gael Guennebaud | 2018-12-06 |
* | bug #1636: fix compilation with some ABI versions. | Gael Guennebaud | 2018-12-06 |
| | |||
* | #elif -> #else to fix GPU build. | Rasmus Munk Larsen | 2018-12-05 |
| | |||
* | bug #1635: Use infinity from NumTraits instead of creating it manually. | Christoph Hertzberg | 2018-12-05 |
| | |||
* | Merged in ezhulenev/eigen-01 (pull request PR-553) | Rasmus Munk Larsen | 2018-12-04 |
|\ | Do not disable alignment with EIGEN_GPUCC. Approved-by: Rasmus Munk Larsen <rmlarsen@google.com> | ||
| * | Update checks in ConfigureVectorization.h | Eugene Zhulenev | 2018-12-03 |
| | | |||
| * | Do not disable alignment with EIGEN_GPUCC | Eugene Zhulenev | 2018-12-03 |
| | | |||
* | | bug #785: Make Cholesky decomposition work for empty matrices | Christoph Hertzberg | 2018-12-03 |
|/ | |||
* | Add missing padd for Packet8i (it was implicitly generated by clang and gcc) | Gael Guennebaud | 2018-11-30 |
| | |||
* | bug #1634: remove double copy in move-ctor of non movable Matrix/Array | Gael Guennebaud | 2018-11-30 |
| | |||
* | Add packet sin and cos to Altivec/VSX and NEON | Gael Guennebaud | 2018-11-30 |
| | |||
* | Several improvements regarding packet-bitwise operations: add unit tests; optimize their AVX512f implementation; add missing implementations (half, Packet4f, ...) | Gael Guennebaud | 2018-11-30 |
* | Add psin/pcos on AVX512 -> almost for free, at last! | Gael Guennebaud | 2018-11-30 |
| | |||
* | Cleanup | Gael Guennebaud | 2018-11-30 |
| | |||
* | Fix pandnot order in AVX512 | Gael Guennebaud | 2018-11-30 |
| | |||
* | Extend the generic psin_float code to handle cosine and make SSE and AVX use it (-> this adds pcos for AVX) | Gael Guennebaud | 2018-11-30 |
* | Disable fma gcc's workaround for gcc >= 8 (based on GEMM benchmarks) | Gael Guennebaud | 2018-11-28 |
| | |||
* | same for pmax | Gael Guennebaud | 2018-11-28 |
| | |||
* | pmin/pmax on SSE: make sure to use AVX instructions when AVX is enabled, and disable gcc workaround for fixed gcc versions | Gael Guennebaud | 2018-11-28 |
* | Add missing SSE/AVX type-casting in AVX512 mode | Gael Guennebaud | 2018-11-28 |
| | |||
* | bug #1630: fix linspaced when requesting smaller packet size than default one. | Gael Guennebaud | 2018-11-28 |
| | |||
* | Use explicit packet type in SSE/PacketMath pldexp | Eugene Zhulenev | 2018-11-27 |
| | |||
* | do not read buffers out of bounds -- load only the 4 bytes we know exist here. | Benoit Jacob | 2018-11-27 |
| | Could also have done a vld1_lane_f32, but doing so here, without the overhead of initializing the unused lane, would have triggered use-of-uninitialized-value errors in tools such as ASan. Note that this code is sub-optimal before or after this change: we should be reading either 2 or 4 float32 values per load instruction (2 for ARM in-order cores with an affinity for 8-byte loads; 4 for ARM out-of-order cores able to dual-issue 16-byte load instructions with arithmetic instructions). Before or after this patch, we are only loading 4 bytes of useful data here (even if, before this patch, we were technically loading 8, only to use the first 4). | ||
* | bug #1631: fix compilation with ARM NEON and clang, and cleanup the weird pshiftright_and_cast and pcast_and_shiftleft functions. | Gael Guennebaud | 2018-11-27 |
* | Update pshiftleft to pass the shift as a true compile-time integer. | Gael Guennebaud | 2018-11-27 |
| | |||
* | Unify SSE/AVX psin functions. | Gael Guennebaud | 2018-11-27 |
| | It is based on the SSE version, which is much more accurate, though very slightly slower. This changeset also includes the following required changes: add packet-float to packet-int type traits; add packet float<->int reinterpret casts; add faster pselect for AVX based on blendv. | ||
* | fix the build on 64-bit ARM when NEON is disabled | Benoit Jacob | 2018-11-27 |
| | |||
* | Unify Altivec/VSX pexp(double) with default implementation | Gael Guennebaud | 2018-11-27 |
| | |||
* | cleanup | Gael Guennebaud | 2018-11-26 |
| | |||
* | Unify SSE and AVX pexp for double. | Gael Guennebaud | 2018-11-26 |
| | |||
* | Unify NEON's pexp with generic implementation | Gael Guennebaud | 2018-11-26 |
| | |||
* | Unify Altivec/VSX's pexp with generic implementation | Gael Guennebaud | 2018-11-26 |
| | |||
* | Unify SSE and AVX implementation of pexp | Gael Guennebaud | 2018-11-26 |
| | |||
* | Unify Altivec/VSX's plog with generic implementation, and enable it! | Gael Guennebaud | 2018-11-26 |
| | |||
* | Unify NEON's plog with generic implementation | Gael Guennebaud | 2018-11-26 |
| | |||
* | First step toward a unification of the packet log implementation; currently only SSE and AVX are unified. To this end, I added the following functions: pzero, pcmp_*, pfrexp, pset1frombits. | Gael Guennebaud | 2018-11-26 |
* | Make SSE/AVX pandnot(A,B) consistent with generic version, i.e., "A and not B" | Gael Guennebaud | 2018-11-26 |
| | |||
* | bug #1611: fix plog(0) on NEON | Gael Guennebaud | 2018-11-26 |
| | |||
* | Fix typos | Patrik Huber | 2018-11-23 |
| | |||
* | Fix reserved usage of double __ in macro names | Gael Guennebaud | 2018-11-23 |
| | |||
* | Fix several uninitialized members from ctor | Gael Guennebaud | 2018-11-23 |
| | |||
* | bug #1624: improve matrix-matrix product on ARM 64, 20% speedup | Gael Guennebaud | 2018-11-23 |
| | |||
* | Workaround weird MSVC bug | Gael Guennebaud | 2018-11-21 |
| | |||
* | Make MaxPacketSize a true upper bound, even for fixed-size inputs | Gael Guennebaud | 2018-11-16 |
| | |||
* | PR 544: Set requestedAlignment correctly for SliceVectorizedTraversals | Mark D Ryan | 2018-11-13 |
| | Commit aa110e681b8b2237757a652ba47da49e1fbd2cd6 optimised the multiplication of small dynamically sized matrices by restricting the packet size to a maximum of 4, increasing the chances that SIMD instructions are used in the computation. However, it introduced a mismatch between the packet size and the requestedAlignment. This mismatch can lead to crashes when the destination is not aligned. This patch fixes the issue by ensuring that the AssignmentTraits are correctly computed when using a restricted packet size. * * * Bind LinearPacketType to MaxPacketSize: this commit applies any packet size limit specified when instantiating copy_using_evaluator_traits to the LinearPacketType, provided that the size of the destination is not known at compile time. * * * Add unit test for restricted packet assignment: a new unit test is added to check that multiplication of small dynamically sized matrices works correctly when the packet size is restricted to 4 and the destination is unaligned. | ||
* | Fix typo in comment on EIGEN_MAX_STATIC_ALIGN_BYTES | Nikolaus Demmel | 2018-11-14 |
| | |||
* | typo | Gael Guennebaud | 2018-11-14 |
| | |||
* | help doxygen linking to DenseBase::NullaryExpr | Gael Guennebaud | 2018-11-14 |
| | |||
* | [PATCH 1/2] Misc. typos | luz.paz | 2018-09-18 |
| | From 68d431b4c14ad60a778ee93c1f59ecc4b931950e Mon Sep 17 00:00:00 2001. Found via `codespell -q 3 -I ../eigen-word-whitelist.txt`, where the whitelist consists of: `als ans cas dum lastr lowd nd overfl pres preverse substraction te uint whch`. Files changed (lines touched): CMakeLists.txt (26); Eigen/src/Core/GenericPacketMath.h (2); Eigen/src/SparseLU/SparseLU.h (2); bench/bench_norm.cpp (2); doc/HiPerformance.dox (2); doc/QuickStartGuide.dox (2); .../Eigen/CXX11/src/Tensor/TensorChipping.h (6); .../Eigen/CXX11/src/Tensor/TensorDeviceGpu.h (2); .../src/Tensor/TensorForwardDeclarations.h (4); .../src/Tensor/TensorGpuHipCudaDefines.h (2); .../Eigen/CXX11/src/Tensor/TensorReduction.h (2); .../CXX11/src/Tensor/TensorReductionGpu.h (2); .../test/cxx11_tensor_concatenation.cpp (2); unsupported/test/cxx11_tensor_executor.cpp (2). 14 files changed, 29 insertions(+), 29 deletions(-) | ||
* | Add optimized version of logistic function for float. As an example, this is about 50% faster than the existing version on Haswell using AVX. | Rasmus Munk Larsen | 2018-11-12 |
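
Two entries in this log (2018-11-26 and 2018-11-30) concern the operand order of pandnot, which the log states should mean "A and not B". The x86 and-not intrinsics (for example `_mm_andnot_ps(a, b)`) instead compute `(~a) & b`, so a wrapper around them has to swap its arguments. The sketch below illustrates the two conventions on plain 32-bit integers. The names `pandnot_generic`, `andnot_x86_style` and `pandnot_via_x86_style` are hypothetical scalar stand-ins chosen for illustration; they are not Eigen's packet functions and this is not the code from those commits.

```cpp
#include <cstdint>

// Generic convention described in the log: pandnot(A, B) == "A and not B".
// Hypothetical scalar stand-in, not Eigen's packet implementation.
constexpr std::uint32_t pandnot_generic(std::uint32_t a, std::uint32_t b) {
  return a & ~b;
}

// Convention of the x86 and-not intrinsics: the FIRST operand is negated.
// Scalar model of that behaviour, for illustration only.
constexpr std::uint32_t andnot_x86_style(std::uint32_t a, std::uint32_t b) {
  return ~a & b;
}

// A wrapper built on the x86-style primitive must swap its arguments
// to obtain the generic "A and not B" meaning.
constexpr std::uint32_t pandnot_via_x86_style(std::uint32_t a, std::uint32_t b) {
  return andnot_x86_style(b, a);
}

// a = 1100b, b = 1010b
static_assert(pandnot_generic(0xCu, 0xAu) == 0x4u, "A and not B keeps bits set only in A");
static_assert(andnot_x86_style(0xCu, 0xAu) == 0x2u, "x86 style keeps bits set only in B");
static_assert(pandnot_via_x86_style(0xCu, 0xAu) == pandnot_generic(0xCu, 0xAu),
              "swapping the operands reconciles the two conventions");
```

Passing the operands through unswapped gives 0x2 where 0x4 is expected, which is the kind of silent ordering error the "Fix pandnot order in AVX512" entry suggests.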