Commit message  Author  Age
* Fix noise in lu unit testGravatar Gael Guennebaud2018-12-08
|
* bug #1515: disable gebp's 3pX4 micro kernel for MSVC<=19.14 because of ↵Gravatar Gael Guennebaud2018-12-07
| | | | register spilling.
* Enable FMA with MSVC (through /arch:AVX2). To make this possible, I also had ↵Gravatar Gael Guennebaud2018-12-07
| | | | to turn the #warning regarding AVX512-FMA into a #error.
* bug #1637: workaround register spilling in gebp with clang>=6.0+AVX+FMAGravatar Gael Guennebaud2018-12-07
|
* bug #1638: add a warning if avx512 is enabled without SSE/AVX FMAGravatar Gael Guennebaud2018-12-07
|
* bug #1636: fix gemm performance issue with gcc>=6 and no FMAGravatar Gael Guennebaud2018-12-07
|
* AVX512f includes FMA but GCC does not define __FMA__ with -mavx512f onlyGravatar Gael Guennebaud2018-12-06
|
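The commit above notes that AVX512F implies FMA support while GCC leaves `__FMA__` undefined when only `-mavx512f` is passed. A minimal sketch of how a configuration header could account for that is shown here; the macro `SKETCH_HAS_FMA` and the helper `sketch_has_fma` are hypothetical names used for illustration, not Eigen's actual macros.

```cpp
// Sketch only: SKETCH_HAS_FMA is a hypothetical macro, not part of Eigen.
// Treat FMA as available when the compiler says so directly (__FMA__)
// or when AVX512F is enabled, since AVX512F includes FMA instructions.
#if defined(__FMA__) || defined(__AVX512F__)
  #define SKETCH_HAS_FMA 1
#else
  #define SKETCH_HAS_FMA 0
#endif

// Compile-time query that the rest of the code can branch on.
constexpr bool sketch_has_fma() { return SKETCH_HAS_FMA != 0; }
```

Checking both macros in one place keeps the rest of the code from having to repeat the compiler-specific reasoning.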
* Fix compilation with avx512f only, i.e., no AVX512DQGravatar Gael Guennebaud2018-12-06
|
* fix test regarding AVX512 vectorization of complexes.Gravatar Gael Guennebaud2018-12-06
|
* Implement AVX512 vectorization of std::complex<float/double>Gravatar Gael Guennebaud2018-12-06
|
* temporarily re-disable SSE/AVX vectorization of complex<> on AVX512 -> this ↵Gravatar Gael Guennebaud2018-12-06
| | | | needs to be fixed though!
* bug #1636: fix compilation with some ABI versions.Gravatar Gael Guennebaud2018-12-06
|
* #elif -> #else to fix GPU build.Gravatar Rasmus Munk Larsen2018-12-05
|
* Merged in markdryan/eigen/avx512-contraction-2 (pull request PR-554)Gravatar Rasmus Munk Larsen2018-12-05
|\
| | Fix tensor contraction on AVX512 builds
| | Approved-by: Rasmus Munk Larsen <rmlarsen@google.com>
* | Add help messages in the quick ref/ascii docs regarding slicing, indexing, ↵Gravatar Gael Guennebaud2018-12-05
| | | | | | | | and reshaping.
* | Fix page nestingGravatar Gael Guennebaud2018-12-05
| |
* | bug #1635: Use infinity from NumTraits instead of creating it manually.Gravatar Christoph Hertzberg2018-12-05
| |
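As an illustration of the idea behind bug #1635, the sketch below obtains infinity from a traits class rather than building it by hand (for example through a division by zero). It uses `std::numeric_limits` as a stand-in so the block does not depend on Eigen; `sketch_infinity` is a hypothetical helper name.

```cpp
#include <limits>

// Sketch only: sketch_infinity is a hypothetical helper. It asks the traits
// class for infinity instead of computing it from an arithmetic expression,
// and refuses to compile for types that have no infinity at all.
template <typename T>
constexpr T sketch_infinity() {
  static_assert(std::numeric_limits<T>::has_infinity,
                "T has no representation of infinity");
  return std::numeric_limits<T>::infinity();
}
```

Going through a traits class also gives custom scalar types a single place to supply their own definition.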
| * Fix evalShardedByInnerDim for AVX512 buildsGravatar Mark D Ryan2018-12-05
| | evalShardedByInnerDim ensures that the values it passes for start_k and
| | end_k to evalGemmPartialWithoutOutputKernel are multiples of 8 as the
| | kernel does not work correctly when the values of k are not multiples of
| | the packet_size. While this precaution works for AVX builds, it is
| | insufficient for AVX512 builds where the maximum packet size is 16. The
| | result is slightly incorrect float32 contractions on AVX512 builds.
| |
| | This commit fixes the problem by ensuring that k is always a multiple of
| | the packet_size if the packet_size is > 8.
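The alignment rule described in this commit message can be sketched as follows. This is not the code from the commit: `round_up` and `aligned_block_size` are hypothetical helpers that only show why rounding to a multiple of 8 is not enough when the packet size is 16.

```cpp
#include <cstddef>

// Round `value` up to the next multiple of `multiple` (assumes multiple > 0
// and value >= 0).
constexpr std::ptrdiff_t round_up(std::ptrdiff_t value, std::ptrdiff_t multiple) {
  return ((value + multiple - 1) / multiple) * multiple;
}

// Hypothetical helper: choose a block size along k that is a multiple of 8,
// or of packet_size when packet_size is larger than 8 (16 for AVX512 floats).
constexpr std::ptrdiff_t aligned_block_size(std::ptrdiff_t block_size,
                                            std::ptrdiff_t packet_size) {
  return round_up(block_size, packet_size > 8 ? packet_size : 8);
}
```

For example, a block size of 24 is already a multiple of 8 and is left unchanged for a packet size of 8, but it is rounded up to 32 for a packet size of 16.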
* | Merged in ezhulenev/eigen-01 (pull request PR-553)Gravatar Rasmus Munk Larsen2018-12-04
|\ \
| | | Do not disable alignment with EIGEN_GPUCC
| | | Approved-by: Rasmus Munk Larsen <rmlarsen@google.com>
| * | Update checks in ConfigureVectorization.hGravatar Eugene Zhulenev2018-12-03
| | |
| * | Do not disable alignment with EIGEN_GPUCCGravatar Eugene Zhulenev2018-12-03
| | |
* | | bug #785: Make Cholesky decomposition work for empty matricesGravatar Christoph Hertzberg2018-12-03
|/ /
* | Add missing padd for Packet8i (it was implicitly generated by clang and gcc)Gravatar Gael Guennebaud2018-11-30
| |
* | bug #1634: remove double copy in move-ctor of non movable Matrix/ArrayGravatar Gael Guennebaud2018-11-30
| |
* | Add packet sin and cos to Altivec/VSX and NEONGravatar Gael Guennebaud2018-11-30
| |
* | Several improvements regarding packet-bitwise operations:Gravatar Gael Guennebaud2018-11-30
| | - add unit tests
| | - optimize their AVX512f implementation
| | - add missing implementations (half, Packet4f, ...)
* | Add psin/pcos on AVX512 -> almost for free, at last!Gravatar Gael Guennebaud2018-11-30
| |
* | CleanupGravatar Gael Guennebaud2018-11-30
| |
* | Fix pandnot order in AVX512Gravatar Gael Guennebaud2018-11-30
| |
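Operand order is easy to get wrong with and-not operations, because Intel's `andnot` intrinsics negate their first argument. The scalar sketch below assumes the convention that `pandnot(a, b)` means `a & ~b`; both function names are hypothetical and the commit itself does not state which direction was wrong.

```cpp
#include <cstdint>

// Assumed library-style convention: keep the bits of a that are NOT set in b.
constexpr std::uint32_t pandnot_sketch(std::uint32_t a, std::uint32_t b) {
  return a & ~b;
}

// Intel intrinsic convention (as in _mm512_andnot_si512): the FIRST operand
// is the one that gets negated.
constexpr std::uint32_t intel_andnot_sketch(std::uint32_t a, std::uint32_t b) {
  return ~a & b;
}

// Under the assumption above, the arguments must be swapped when forwarding
// to the intrinsic-style operation.
constexpr std::uint32_t pandnot_via_intel(std::uint32_t a, std::uint32_t b) {
  return intel_andnot_sketch(b, a);
}
```

Forwarding the arguments unswapped would compute `~a & b`, which differs from `a & ~b` whenever the two operands are not equal.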
* | Extend the generic psin_float code to handle cosine and make SSE and AVX use ↵Gravatar Gael Guennebaud2018-11-30
| | | | | | | | it (-> this adds pcos for AVX)
* | Disable fma gcc's workaround for gcc >= 8 (based on GEMM benchmarks)Gravatar Gael Guennebaud2018-11-28
| |
* | same for pmaxGravatar Gael Guennebaud2018-11-28
| |
* | pmin/pmax on SSE: make sure to use AVX instruction with AVX enabled, and ↵Gravatar Gael Guennebaud2018-11-28
| | | | | | | | disable gcc workaround for fixed gcc versions
* | Add missing SSE/AVX type-casting in AVX512 modeGravatar Gael Guennebaud2018-11-28
| |
* | bug #1630: fix linspaced when requesting smaller packet size than default one.Gravatar Gael Guennebaud2018-11-28
| |
* | Use explicit packet type in SSE/PacketMath pldexpGravatar Eugene Zhulenev2018-11-27
| |
* | do not read buffers out of bounds -- load only the 4 bytes we know exist ↵Gravatar Benoit Jacob2018-11-27
| | | | | | | | here.
| | Could also have done a vld1_lane_f32 but doing so here, without the
| | overhead of initializing the unused lane, would have triggered
| | use-of-uninitialized-value errors in tools such as ASan.
| |
| | Note that this code is sub-optimal before or after this change: we should
| | be reading either 2 or 4 float32 values per load-instruction (2 for ARM
| | in-order cores with an affinity for 8-byte loads; 4 for ARM out-of-order
| | cores able to dual-issue 16-byte load instructions with arithmetic
| | instructions). Before or after this patch, we are only loading 4 bytes of
| | useful data here (even if before this patch, we were technically loading
| | 8, only to use the first 4).
* | bug #1631: fix compilation with ARM NEON and clang, and cleanup the weird ↵Gravatar Gael Guennebaud2018-11-27
| | | | | | | | pshiftright_and_cast and pcast_and_shiftleft functions.
* | Update pshiftleft to pass the shift as a true compile-time integer.Gravatar Gael Guennebaud2018-11-27
| |
* | Unify SSE/AVX psin functions.Gravatar Gael Guennebaud2018-11-27
| | It is based on the SSE version which is much more accurate, though very
| | slightly slower.
| | This changeset also includes the following required changes:
| | - add packet-float to packet-int type traits
| | - add packet float<->int reinterpret casts
| | - add faster pselect for AVX based on blendv
* | Merged in bjacob/eigen/fixbuild (pull request PR-549)Gravatar Rasmus Munk Larsen2018-11-27
|\ \
| | | fix the build on 64-bit ARM when NEON is disabled
| * | fix the build on 64-bit ARM when NEON is disabledGravatar Benoit Jacob2018-11-27
|/ /
* | Unify Altivec/VSX pexp(double) with default implementationGravatar Gael Guennebaud2018-11-27
| |
* | cleanupGravatar Gael Guennebaud2018-11-26
| |
* | Unify SSE and AVX pexp for double.Gravatar Gael Guennebaud2018-11-26
| |
* | Unify NEON's pexp with generic implementationGravatar Gael Guennebaud2018-11-26
| |
* | Unify Altivec/VSX's pexp with generic implementationGravatar Gael Guennebaud2018-11-26
| |
* | Unify SSE and AVX implementation of pexpGravatar Gael Guennebaud2018-11-26
| |
* | Unify Altivec/VSX's plog with generic implementation, and enable it!Gravatar Gael Guennebaud2018-11-26
| |
* | Unify NEON's plog with generic implementationGravatar Gael Guennebaud2018-11-26
| |