Commit message | Author | Age | |
---|---|---|---|
* | Fix noise in lu unit test | 2018-12-08 | |
| | |||
* | bug #1515: disable gebp's 3pX4 micro kernel for MSVC<=19.14 because of register spilling. | 2018-12-07 | |
| | |||
* | Enable FMA with MSVC (through /arch:AVX2). To make this possible, I also had to turn the #warning regarding AVX512-FMA into a #error. | 2018-12-07 | |
| | |||
* | bug #1637: workaround register spilling in gebp with clang>=6.0+AVX+FMA | 2018-12-07 | |
| | |||
* | bug #1638: add a warning if avx512 is enabled without SSE/AVX FMA | 2018-12-07 | |
| | |||
* | bug #1636: fix gemm performance issue with gcc>=6 and no FMA | 2018-12-07 | |
| | |||
* | AVX512f includes FMA but GCC does not define __FMA__ with -mavx512f only | 2018-12-06 | |
| | |||
* | Fix compilation with avx512f only, i.e., no AVX512DQ | 2018-12-06 | |
| | |||
* | fix test regarding AVX512 vectorization of complexes. | 2018-12-06 | |
| | |||
* | Implement AVX512 vectorization of std::complex<float/double> | 2018-12-06 | |
| | |||
* | temporarily re-disable SSE/AVX vectorization of complex<> on AVX512 -> this needs to be fixed though! | 2018-12-06 | |
| | |||
* | bug #1636: fix compilation with some ABI versions. | 2018-12-06 | |
| | |||
* | #elif -> #else to fix GPU build. | 2018-12-05 | |
| | |||
* | Merged in markdryan/eigen/avx512-contraction-2 (pull request PR-554) | 2018-12-05 | |
|\ | | | | | | | | | | | Fix tensor contraction on AVX512 builds Approved-by: Rasmus Munk Larsen <rmlarsen@google.com> | ||
* | | Add help messages in the quick ref/ascii docs regarding slicing, indexing, and reshaping. | 2018-12-05 | |
| | | |||
* | | Fix page nesting | 2018-12-05 | |
| | | |||
* | | bug #1635: Use infinity from NumTraits instead of creating it manually. | 2018-12-05 | |
| | | |||
| * | Fix evalShardedByInnerDim for AVX512 builds | 2018-12-05 | |
| | | | | | | | | | | | | | | | | | | | | | | | | evalShardedByInnerDim ensures that the values it passes for start_k and end_k to evalGemmPartialWithoutOutputKernel are multiples of 8 as the kernel does not work correctly when the values of k are not multiples of the packet_size. While this precaution works for AVX builds, it is insufficient for AVX512 builds where the maximum packet size is 16. The result is slightly incorrect float32 contractions on AVX512 builds. This commit fixes the problem by ensuring that k is always a multiple of the packet_size if the packet_size is > 8. | ||
* | | Merged in ezhulenev/eigen-01 (pull request PR-553) | 2018-12-04 | |
|\ \ | | | | | | | | | | | | | | | | Do not disable alignment with EIGEN_GPUCC Approved-by: Rasmus Munk Larsen <rmlarsen@google.com> | ||
| * | | Update checks in ConfigureVectorization.h | 2018-12-03 | |
| | | | |||
| * | | Do not disable alignment with EIGEN_GPUCC | 2018-12-03 | |
| | | | |||
* | | | bug #785: Make Cholesky decomposition work for empty matrices | 2018-12-03 | |
|/ / | |||
* | | Add missing padd for Packet8i (it was implicitly generated by clang and gcc) | 2018-11-30 | |
| | | |||
* | | bug #1634: remove double copy in move-ctor of non movable Matrix/Array | 2018-11-30 | |
| | | |||
* | | Add packet sin and cos to Altivec/VSX and NEON | 2018-11-30 | |
| | | |||
* | | Several improvements regarding packet-bitwise operations: | 2018-11-30 | |
| | | | | | | | | | | | | - add unit tests - optimize their AVX512f implementation - add missing implementations (half, Packet4f, ...) | ||
* | | Add psin/pcos on AVX512 -> almost for free, at last! | 2018-11-30 | |
| | | |||
* | | Cleanup | 2018-11-30 | |
| | | |||
* | | Fix pandnot order in AVX512 | 2018-11-30 | |
| | | |||
* | | Extend the generic psin_float code to handle cosine and make SSE and AVX use it (-> this adds pcos for AVX) | 2018-11-30 | |
| | | |||
* | | Disable fma gcc's workaround for gcc >= 8 (based on GEMM benchmarks) | 2018-11-28 | |
| | | |||
* | | same for pmax | 2018-11-28 | |
| | | |||
* | | pmin/pmax on SSE: make sure to use AVX instructions when AVX is enabled, and disable the gcc workaround for fixed gcc versions | 2018-11-28 | |
| | | |||
* | | Add missing SSE/AVX type-casting in AVX512 mode | 2018-11-28 | |
| | | |||
* | | bug #1630: fix linspaced when requesting smaller packet size than default one. | 2018-11-28 | |
| | | |||
* | | Use explicit packet type in SSE/PacketMath pldexp | 2018-11-27 | |
| | | |||
* | | do not read buffers out of bounds -- load only the 4 bytes we know exist here. Could also have done a vld1_lane_f32 but doing so here, without the overhead of initializing the unused lane, would have triggered use-of-uninitialized-value errors in tools such as ASan. Note that this code is sub-optimal before or after this change: we should be reading either 2 or 4 float32 values per load-instruction (2 for ARM in-order cores with an affinity for 8-byte loads; 4 for ARM out-of-order cores able to dual-issue 16-byte load instructions with arithmetic instructions). Before or after this patch, we are only loading 4 bytes of useful data here (even if before this patch, we were technically loading 8, only to use the first 4). | 2018-11-27 | |
| | | |||
* | | bug #1631: fix compilation with ARM NEON and clang, and cleanup the weird pshiftright_and_cast and pcast_and_shiftleft functions. | 2018-11-27 | |
| | | |||
* | | Update pshiftleft to pass the shift as a true compile-time integer. | 2018-11-27 | |
| | | |||
* | | Unify SSE/AVX psin functions. | 2018-11-27 | |
| | | | | | | | | | | | | | | | | It is based on the SSE version which is much more accurate, though very slightly slower. This changeset also includes the following required changes: - add packet-float to packet-int type traits - add packet float<->int reinterpret casts - add faster pselect for AVX based on blendv | ||
* | | Merged in bjacob/eigen/fixbuild (pull request PR-549) | 2018-11-27 | |
|\ \ | | | | | | | | | | fix the build on 64-bit ARM when NEON is disabled | ||
| * | | fix the build on 64-bit ARM when NEON is disabled | 2018-11-27 | |
|/ / | |||
* | | Unify Altivec/VSX pexp(double) with default implementation | 2018-11-27 | |
| | | |||
* | | cleanup | 2018-11-26 | |
| | | |||
* | | Unify SSE and AVX pexp for double. | 2018-11-26 | |
| | | |||
* | | Unify NEON's pexp with generic implementation | 2018-11-26 | |
| | | |||
* | | Unify Altivec/VSX's pexp with generic implementation | 2018-11-26 | |
| | | |||
* | | Unify SSE and AVX implementation of pexp | 2018-11-26 | |
| | | |||
* | | Unify Altivec/VSX's plog with generic implementation, and enable it! | 2018-11-26 | |
| | | |||
* | | Unify NEON's plog with generic implementation | 2018-11-26 | |
| | |
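
The "Fix evalShardedByInnerDim for AVX512 builds" entry above describes keeping the k-range boundaries at multiples of the packet size when the packet size exceeds 8. A minimal sketch of that rounding follows. It is not the code from the Eigen commit: the helper names `div_up`, `k_alignment`, and `aligned_block_size` are hypothetical, and the only rule taken from the log is "align to 8, or to the packet size if it is larger than 8".

```cpp
#include <cstddef>

// Hypothetical helper (not Eigen API): integer division rounding up.
constexpr std::ptrdiff_t div_up(std::ptrdiff_t a, std::ptrdiff_t b) {
  return (a + b - 1) / b;
}

// Alignment for block boundaries along k: 8 by default, or the packet
// size when it is larger than 8 (e.g. 16 floats with AVX512).
constexpr std::ptrdiff_t k_alignment(std::ptrdiff_t packet_size) {
  return packet_size > 8 ? packet_size : 8;
}

// Size of each k-block when [0, k) is split into num_blocks pieces.
// The result is always a multiple of k_alignment(packet_size), so every
// block start (block_index * block_size) is a multiple of it as well.
constexpr std::ptrdiff_t aligned_block_size(std::ptrdiff_t k,
                                            std::ptrdiff_t num_blocks,
                                            std::ptrdiff_t packet_size) {
  return k_alignment(packet_size) *
         div_up(k, k_alignment(packet_size) * num_blocks);
}
```

For k = 100 split into 3 blocks, aligning to 8 gives a block size of 40, which is not a multiple of 16; aligning to the packet size of 16 gives 48, so every block start stays packet-aligned.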