Graph | Commit message | Age | |
---|---|---|---|
* | Fix compilation with expression template scalar type. | 2018-12-12 | |
| | |||
* | bug #1557: fix RealSchur and EigenSolver for matrices with only zeros on the diagonal. | 2018-12-12 | |
| | |||
* | bug #1644: fix warning | 2018-12-11 | |
| | |||
* | Artificially increase l1-blocking size for AVX512. +10% speedup with current kernels. With a 6pX4 kernel (not committed yet), this provides a +20% speedup. | 2018-12-11 | |
| | |||
* | Properly set the number of registers for AVX512 | 2018-12-11 | |
| | |||
* | bug #1643: fix compilation issue with gcc and no optimization | 2018-12-11 | |
| | |||
* | enable spilling workaround on architectures with SSE/AVX | 2018-12-10 | |
| | |||
* | workaround "may be used uninitialized" warning | 2018-12-08 | |
| | |||
* | bug #1641: fix testing of pandnot and fix pandnot for complex on SSE/AVX/AVX512 | 2018-12-08 | |
| | |||
* | fix EIGEN_GEBP_2PX4_SPILLING_WORKAROUND for non vectorized type, and non x86/64 target | 2018-12-08 | |
| | |||
* | bug #1515: disable gebp's 3pX4 micro kernel for MSVC<=19.14 because of register spilling. | 2018-12-07 | |
| | |||
* | Enable FMA with MSVC (through /arch:AVX2). To make this possible, I also had to turn the #warning regarding AVX512-FMA into a #error. | 2018-12-07 | |
| | |||
* | bug #1637: workaround register spilling in gebp with clang>=6.0+AVX+FMA | 2018-12-07 | |
| | |||
* | bug #1638: add a warning if avx512 is enabled without SSE/AVX FMA | 2018-12-07 | |
| | |||
* | bug #1636: fix gemm performance issue with gcc>=6 and no FMA | 2018-12-07 | |
| | |||
* | AVX512f includes FMA but GCC does not define __FMA__ with -mavx512f only | 2018-12-06 | |
| | |||
* | Fix compilation with avx512f only, i.e., no AVX512DQ | 2018-12-06 | |
| | |||
* | Implement AVX512 vectorization of std::complex<float/double> | 2018-12-06 | |
| | |||
* | temporarily re-disable SSE/AVX vectorization of complex<> on AVX512 -> this needs to be fixed though! | 2018-12-06 | |
| | |||
* | bug #1636: fix compilation with some ABI versions. | 2018-12-06 | |
| | |||
* | #elif -> #else to fix GPU build. | 2018-12-05 | |
| | |||
* | bug #1635: Use infinity from Numtraits instead of creating it manually. | 2018-12-05 | |
| | |||
* | Merged in ezhulenev/eigen-01 (pull request PR-553): Do not disable alignment with EIGEN_GPUCC. Approved-by: Rasmus Munk Larsen <rmlarsen@google.com> | 2018-12-04 | |
|\ | |||
| * | Update checks in ConfigureVectorization.h | 2018-12-03 | |
| | | |||
| * | Do not disable alignment with EIGEN_GPUCC | 2018-12-03 | |
| | | |||
* | | bug #785: Make Cholesky decomposition work for empty matrices | 2018-12-03 | |
|/ | |||
* | Add missing padd for Packet8i (it was implicitly generated by clang and gcc) | 2018-11-30 | |
| | |||
* | bug #1634: remove double copy in move-ctor of non movable Matrix/Array | 2018-11-30 | |
| | |||
* | Add packet sin and cos to Altivec/VSX and NEON | 2018-11-30 | |
| | |||
* | Several improvements regarding packet-bitwise operations: add unit tests; optimize their AVX512f implementation; add missing implementations (half, Packet4f, ...) | 2018-11-30 | |
| | |||
* | Add psin/pcos on AVX512 -> almost for free, at last! | 2018-11-30 | |
| | |||
* | Cleanup | 2018-11-30 | |
| | |||
* | Fix pandnot order in AVX512 | 2018-11-30 | |
| | |||
* | Extend the generic psin_float code to handle cosine and make SSE and AVX use it (-> this adds pcos for AVX) | 2018-11-30 | |
| | |||
* | Disable gcc's FMA workaround for gcc >= 8 (based on GEMM benchmarks) | 2018-11-28 | |
| | |||
* | same for pmax | 2018-11-28 | |
| | |||
* | pmin/pmax on SSE: make sure to use AVX instructions with AVX enabled, and disable gcc workaround for fixed gcc versions | 2018-11-28 | |
| | |||
* | Add missing SSE/AVX type-casting in AVX512 mode | 2018-11-28 | |
| | |||
* | bug #1630: fix linspaced when requesting smaller packet size than default one. | 2018-11-28 | |
| | |||
* | Use explicit packet type in SSE/PacketMath pldexp | 2018-11-27 | |
| | |||
* | do not read buffers out of bounds -- load only the 4 bytes we know exist here. Could also have done a vld1_lane_f32, but doing so here, without the overhead of initializing the unused lane, would have triggered use-of-uninitialized-value errors in tools such as ASan. Note that this code is sub-optimal before or after this change: we should be reading either 2 or 4 float32 values per load instruction (2 for ARM in-order cores with an affinity for 8-byte loads; 4 for ARM out-of-order cores able to dual-issue 16-byte load instructions with arithmetic instructions). Before or after this patch, we are only loading 4 bytes of useful data here (even if, before this patch, we were technically loading 8, only to use the first 4). | 2018-11-27 | |
| | |||
* | bug #1631: fix compilation with ARM NEON and clang, and cleanup the weird pshiftright_and_cast and pcast_and_shiftleft functions. | 2018-11-27 | |
| | |||
* | Update pshiftleft to pass the shift as a true compile-time integer. | 2018-11-27 | |
| | |||
* | Unify SSE/AVX psin functions. It is based on the SSE version, which is much more accurate, though very slightly slower. This changeset also includes the following required changes: add packet-float to packet-int type traits; add packet float<->int reinterpret casts; add faster pselect for AVX based on blendv. | 2018-11-27 | |
| | |||
* | fix the build on 64-bit ARM when NEON is disabled | 2018-11-27 | |
| | |||
* | Unify Altivec/VSX pexp(double) with default implementation | 2018-11-27 | |
| | |||
* | cleanup | 2018-11-26 | |
| | |||
* | Unify SSE and AVX pexp for double. | 2018-11-26 | |
| | |||
* | Unify NEON's pexp with generic implementation | 2018-11-26 | |
| | |||
* | Unify Altivec/VSX's pexp with generic implementation | 2018-11-26 | |
| |