path: root/Eigen
Commit message [Author, Date]
...
* bug #1643: fix compilation issue with gcc and no optimization [Gael Guennebaud, 2018-12-11]
|
* enable spilling workaround on architectures with SSE/AVX [Gael Guennebaud, 2018-12-10]
|
* workaround "may be used uninitialized" warning [Gael Guennebaud, 2018-12-08]
|
* bug #1641: fix testing of pandnot and fix pandnot for complex on SSE/AVX/AVX512 [Gael Guennebaud, 2018-12-08]
|
* fix EIGEN_GEBP_2PX4_SPILLING_WORKAROUND for non vectorized type, and non x86/64 target [Gael Guennebaud, 2018-12-08]
|
* bug #1515: disable gebp's 3pX4 micro kernel for MSVC<=19.14 because of register spilling. [Gael Guennebaud, 2018-12-07]
|
* Enable FMA with MSVC (through /arch:AVX2). To make this possible, I also had to turn the #warning regarding AVX512-FMA to a #error. [Gael Guennebaud, 2018-12-07]
|
* bug #1637: workaround register spilling in gebp with clang>=6.0+AVX+FMA [Gael Guennebaud, 2018-12-07]
|
* bug #1638: add a warning if avx512 is enabled without SSE/AVX FMA [Gael Guennebaud, 2018-12-07]
|
* bug #1636: fix gemm performance issue with gcc>=6 and no FMA [Gael Guennebaud, 2018-12-07]
|
* AVX512f includes FMA but GCC does not define __FMA__ with -mavx512f only [Gael Guennebaud, 2018-12-06]
|
* Fix compilation with avx512f only, i.e., no AVX512DQ [Gael Guennebaud, 2018-12-06]
|
* Implement AVX512 vectorization of std::complex<float/double> [Gael Guennebaud, 2018-12-06]
|
* temporarily re-disable SSE/AVX vectorization of complex<> on AVX512 -> this needs to be fixed though! [Gael Guennebaud, 2018-12-06]
|
* bug #1636: fix compilation with some ABI versions. [Gael Guennebaud, 2018-12-06]
|
* #elif -> #else to fix GPU build. [Rasmus Munk Larsen, 2018-12-05]
|
* bug #1635: Use infinity from NumTraits instead of creating it manually. [Christoph Hertzberg, 2018-12-05]
|
* Merged in ezhulenev/eigen-01 (pull request PR-553) [Rasmus Munk Larsen, 2018-12-04]
|\    Do not disable alignment with EIGEN_GPUCC
| |   Approved-by: Rasmus Munk Larsen <rmlarsen@google.com>
| |
| * Update checks in ConfigureVectorization.h [Eugene Zhulenev, 2018-12-03]
| |
| * Do not disable alignment with EIGEN_GPUCC [Eugene Zhulenev, 2018-12-03]
| |
* | bug #785: Make Cholesky decomposition work for empty matrices [Christoph Hertzberg, 2018-12-03]
|/
* Add missing padd for Packet8i (it was implicitly generated by clang and gcc) [Gael Guennebaud, 2018-11-30]
|
* bug #1634: remove double copy in move-ctor of non movable Matrix/Array [Gael Guennebaud, 2018-11-30]
|
* Add packet sin and cos to Altivec/VSX and NEON [Gael Guennebaud, 2018-11-30]
|
* Several improvements regarding packet-bitwise operations: [Gael Guennebaud, 2018-11-30]
|   - add unit tests
|   - optimize their AVX512f implementation
|   - add missing implementations (half, Packet4f, ...)
* Add psin/pcos on AVX512 -> almost for free, at last! [Gael Guennebaud, 2018-11-30]
|
* Cleanup [Gael Guennebaud, 2018-11-30]
|
* Fix pandnot order in AVX512 [Gael Guennebaud, 2018-11-30]
|
* Extend the generic psin_float code to handle cosine and make SSE and AVX use it (-> this adds pcos for AVX) [Gael Guennebaud, 2018-11-30]
|
* Disable fma gcc's workaround for gcc >= 8 (based on GEMM benchmarks) [Gael Guennebaud, 2018-11-28]
|
* same for pmax [Gael Guennebaud, 2018-11-28]
|
* pmin/pmax on SSE: make sure to use AVX instruction with AVX enabled, and disable gcc workaround for fixed gcc versions [Gael Guennebaud, 2018-11-28]
|
* Add missing SSE/AVX type-casting in AVX512 mode [Gael Guennebaud, 2018-11-28]
|
* bug #1630: fix linspaced when requesting smaller packet size than default one. [Gael Guennebaud, 2018-11-28]
|
* Use explicit packet type in SSE/PacketMath pldexp [Eugene Zhulenev, 2018-11-27]
|
* do not read buffers out of bounds -- load only the 4 bytes we know exist here. [Benoit Jacob, 2018-11-27]
|   Could also have done a vld1_lane_f32 but doing so here, without the overhead of initializing the unused lane, would have triggered use-of-uninitialized-value errors in tools such as ASan.
|   Note that this code is sub-optimal before or after this change: we should be reading either 2 or 4 float32 values per load-instruction (2 for ARM in-order cores with an affinity for 8-byte loads; 4 for ARM out-of-order cores able to dual-issue 16-byte load instructions with arithmetic instructions).
|   Before or after this patch, we are only loading 4 bytes of useful data here (even if before this patch, we were technically loading 8, only to use the first 4).
* bug #1631: fix compilation with ARM NEON and clang, and cleanup the weird pshiftright_and_cast and pcast_and_shiftleft functions. [Gael Guennebaud, 2018-11-27]
|
* Update pshiftleft to pass the shift as a true compile-time integer. [Gael Guennebaud, 2018-11-27]
|
* Unify SSE/AVX psin functions. [Gael Guennebaud, 2018-11-27]
|   It is based on the SSE version which is much more accurate, though very slightly slower.
|   This changeset also includes the following required changes:
|   - add packet-float to packet-int type traits
|   - add packet float<->int reinterpret casts
|   - add faster pselect for AVX based on blendv
* fix the build on 64-bit ARM when NEON is disabled [Benoit Jacob, 2018-11-27]
|
* Unify Altivec/VSX pexp(double) with default implementation [Gael Guennebaud, 2018-11-27]
|
* cleanup [Gael Guennebaud, 2018-11-26]
|
* Unify SSE and AVX pexp for double. [Gael Guennebaud, 2018-11-26]
|
* Unify NEON's pexp with generic implementation [Gael Guennebaud, 2018-11-26]
|
* Unify Altivec/VSX's pexp with generic implementation [Gael Guennebaud, 2018-11-26]
|
* Unify SSE and AVX implementation of pexp [Gael Guennebaud, 2018-11-26]
|
* Unify Altivec/VSX's plog with generic implementation, and enable it! [Gael Guennebaud, 2018-11-26]
|
* Unify NEON's plog with generic implementation [Gael Guennebaud, 2018-11-26]
|
* First step toward a unification of packet log implementation, currently only SSE and AVX are unified. To this end, I added the following functions: pzero, pcmp_*, pfrexp, pset1frombits functions. [Gael Guennebaud, 2018-11-26]
|
* Make SSE/AVX pandnot(A,B) consistent with generic version, i.e., "A and not B" [Gael Guennebaud, 2018-11-26]
|
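
Note on the pandnot entries above: the log states that the generic pandnot(A,B) means "A and not B". The x86 andnot instructions compute the opposite order, (not A) and B, so a wrapper has to swap its arguments to match the generic convention. The sketch below models this on plain 32-bit integers; it is not Eigen's code, and the names pandnot_generic, hw_andnot and pandnot_via_hw are hypothetical, introduced only for illustration.

```cpp
#include <cstdint>

// Generic convention described in the log: pandnot(a, b) == a AND (NOT b).
// (Hypothetical name; a scalar stand-in for a packet operation.)
inline std::uint32_t pandnot_generic(std::uint32_t a, std::uint32_t b) {
  return a & ~b;
}

// Scalar model of the x86 andnot instruction semantics: (NOT a) AND b.
// (Hypothetical name; real code would use an intrinsic on packet types.)
inline std::uint32_t hw_andnot(std::uint32_t a, std::uint32_t b) {
  return ~a & b;
}

// Wrapper that matches the generic convention by swapping the operands
// passed to the hardware-style operation.
inline std::uint32_t pandnot_via_hw(std::uint32_t a, std::uint32_t b) {
  return hw_andnot(b, a);
}
```

Calling hw_andnot(a, b) directly where pandnot(a, b) is expected would silently compute the wrong mask, which is the kind of ordering mismatch the pandnot commits in this log describe.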