Commit message | Author | Age | ||
---|---|---|---|---|
... | ||||
* | bug #1643: fix compilation issue with gcc and no optimization | 2018-12-11 | ||
| | ||||
* | enable spilling workaround on architectures with SSE/AVX | 2018-12-10 | ||
| | ||||
* | workaround "may be used uninitialized" warning | 2018-12-08 | ||
| | ||||
* | bug #1641: fix testing of pandnot and fix pandnot for complex on SSE/AVX/AVX512 | 2018-12-08 | ||
| | ||||
* | fix EIGEN_GEBP_2PX4_SPILLING_WORKAROUND for non vectorized type, and non x86/64 target | 2018-12-08 | ||
| | ||||
* | bug #1515: disable gebp's 3pX4 micro kernel for MSVC<=19.14 because of register spilling. | 2018-12-07 | ||
| | ||||
* | Enable FMA with MSVC (through /arch:AVX2). To make this possible, I also had to turn the #warning regarding AVX512-FMA to a #error. | 2018-12-07 | ||
| | ||||
* | bug #1637: workaround register spilling in gebp with clang>=6.0+AVX+FMA | 2018-12-07 | ||
| | ||||
* | bug #1638: add a warning if avx512 is enabled without SSE/AVX FMA | 2018-12-07 | ||
| | ||||
* | bug #1636: fix gemm performance issue with gcc>=6 and no FMA | 2018-12-07 | ||
| | ||||
* | AVX512f includes FMA but GCC does not define __FMA__ with -mavx512f only | 2018-12-06 | ||
| | ||||
* | Fix compilation with avx512f only, i.e., no AVX512DQ | 2018-12-06 | ||
| | ||||
* | Implement AVX512 vectorization of std::complex<float/double> | 2018-12-06 | ||
| | ||||
* | temporarily re-disable SSE/AVX vectorization of complex<> on AVX512 -> this needs to be fixed though! | 2018-12-06 | ||
| | ||||
* | bug #1636: fix compilation with some ABI versions. | 2018-12-06 | ||
| | ||||
* | #elif -> #else to fix GPU build. | 2018-12-05 | ||
| | ||||
* | bug #1635: Use infinity from NumTraits instead of creating it manually. | 2018-12-05 | ||
| | ||||
* | Merged in ezhulenev/eigen-01 (pull request PR-553) | 2018-12-04 | ||
|\ | | | | | | | | | | | Do not disable alignment with EIGEN_GPUCC Approved-by: Rasmus Munk Larsen <rmlarsen@google.com> | |||
| * | Update checks in ConfigureVectorization.h | 2018-12-03 | ||
| | | ||||
| * | Do not disable alignment with EIGEN_GPUCC | 2018-12-03 | ||
| | | ||||
* | | bug #785: Make Cholesky decomposition work for empty matrices | 2018-12-03 | ||
|/ | ||||
* | Add missing padd for Packet8i (it was implicitly generated by clang and gcc) | 2018-11-30 | ||
| | ||||
* | bug #1634: remove double copy in move-ctor of non movable Matrix/Array | 2018-11-30 | ||
| | ||||
* | Add packet sin and cos to Altivec/VSX and NEON | 2018-11-30 | ||
| | ||||
* | Several improvements regarding packet-bitwise operations: | 2018-11-30 | ||
| | | | | | | - add unit tests - optimize their AVX512f implementation - add missing implementations (half, Packet4f, ...) | |||
* | Add psin/pcos on AVX512 -> almost for free, at last! | 2018-11-30 | ||
| | ||||
* | Cleanup | 2018-11-30 | ||
| | ||||
* | Fix pandnot order in AVX512 | 2018-11-30 | ||
| | ||||
* | Extend the generic psin_float code to handle cosine and make SSE and AVX use it (-> this adds pcos for AVX) | 2018-11-30 | ||
| | ||||
* | Disable fma gcc's workaround for gcc >= 8 (based on GEMM benchmarks) | 2018-11-28 | ||
| | ||||
* | same for pmax | 2018-11-28 | ||
| | ||||
* | pmin/pmax on SSE: make sure to use AVX instruction with AVX enabled, and disable gcc workaround for fixed gcc versions | 2018-11-28 | ||
| | ||||
* | Add missing SSE/AVX type-casting in AVX512 mode | 2018-11-28 | ||
| | ||||
* | bug #1630: fix linspaced when requesting smaller packet size than default one. | 2018-11-28 | ||
| | ||||
* | Use explicit packet type in SSE/PacketMath pldexp | 2018-11-27 | ||
| | ||||
* | do not read buffers out of bounds -- load only the 4 bytes we know exist here. | 2018-11-27 | ||
| | | | | Could also have done a vld1_lane_f32 but doing so here, without the overhead of initializing the unused lane, would have triggered use-of-uninitialized-value errors in tools such as ASan. Note that this code is sub-optimal before or after this change: we should be reading either 2 or 4 float32 values per load-instruction (2 for ARM in-order cores with an affinity for 8-byte loads; 4 for ARM out-of-order cores able to dual-issue 16-byte load instructions with arithmetic instructions). Before or after this patch, we are only loading 4 bytes of useful data here (even if before this patch, we were technically loading 8, only to use the first 4). | |||
* | bug #1631: fix compilation with ARM NEON and clang, and cleanup the weird pshiftright_and_cast and pcast_and_shiftleft functions. | 2018-11-27 | ||
| | ||||
* | Update pshiftleft to pass the shift as a true compile-time integer. | 2018-11-27 | ||
| | ||||
* | Unify SSE/AVX psin functions. | 2018-11-27 | ||
| | | | | | | | | It is based on the SSE version which is much more accurate, though very slightly slower. This changeset also includes the following required changes: - add packet-float to packet-int type traits - add packet float<->int reinterpret casts - add faster pselect for AVX based on blendv | |||
* | fix the build on 64-bit ARM when NEON is disabled | 2018-11-27 | ||
| | ||||
* | Unify Altivec/VSX pexp(double) with default implementation | 2018-11-27 | ||
| | ||||
* | cleanup | 2018-11-26 | ||
| | ||||
* | Unify SSE and AVX pexp for double. | 2018-11-26 | ||
| | ||||
* | Unify NEON's pexp with generic implementation | 2018-11-26 | ||
| | ||||
* | Unify Altivec/VSX's pexp with generic implementation | 2018-11-26 | ||
| | ||||
* | Unify SSE and AVX implementation of pexp | 2018-11-26 | ||
| | ||||
* | Unify Altivec/VSX's plog with generic implementation, and enable it! | 2018-11-26 | ||
| | ||||
* | Unify NEON's plog with generic implementation | 2018-11-26 | ||
| | ||||
* | First step toward a unification of packet log implementation, currently only SSE and AVX are unified. | 2018-11-26 | ||
| | | | | | | To this end, I added the following functions: pzero, pcmp_*, pfrexp, pset1frombits. | |||
* | Make SSE/AVX pandnot(A,B) consistent with generic version, i.e., "A and not B" | 2018-11-26 | ||
| |
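
Several entries above concern the operand order of pandnot ("A and not B"). The sketch below illustrates why that order is easy to get wrong: the x86 and-not intrinsics such as `_mm_andnot_ps(a, b)` complement their first operand, so a wrapper following the "A and not B" convention has to swap its arguments. The functions are hypothetical scalar stand-ins written for illustration, not Eigen's actual packet code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar stand-ins for packet operations (not Eigen's code).

// Convention named in the commit message: pandnot(A, B) is "A and not B".
inline std::uint32_t pandnot_generic(std::uint32_t a, std::uint32_t b) {
  return a & ~b;
}

// Operand order of the x86 and-not intrinsics (e.g. _mm_andnot_ps):
// the FIRST operand is the one that gets complemented.
inline std::uint32_t andnot_intrinsic_order(std::uint32_t a, std::uint32_t b) {
  return ~a & b;
}

// A wrapper that matches the generic convention must swap the arguments
// before calling the intrinsic-ordered operation.
inline std::uint32_t pandnot_via_intrinsic(std::uint32_t a, std::uint32_t b) {
  return andnot_intrinsic_order(b, a);
}
```

With these definitions, `pandnot_via_intrinsic(a, b)` agrees with `pandnot_generic(a, b)` for every input, whereas calling `andnot_intrinsic_order(a, b)` directly generally gives a different result.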