Commit message | Author | Age | ||
---|---|---|---|---|
... | ||||
* | | Disable 3pX4 kernel on Altivec: although this platform has 32 registers, this version seems significantly slower. | 2014-04-25 | ||
| | | ||||
* | | Fix ptranspose overload prototypes for NEON | 2014-04-25 | ||
| | | ||||
* | | Minor optimizations in product kernel: use pbroadcast4 (helpful when AVX is not available); process all remaining rows at once (significant speedup for small matrices) | 2014-04-25 | ||
| | | ||||
* | | Enable vectorization of pack_rhs with a column-major RHS. Rename and generalize Kernel<*> to PacketBlock<*,N>. | 2014-04-25 | ||
| | | ||||
* | | Enable fused madd for Altivec | 2014-04-24 | ||
| | | ||||
* | | Implement ptranspose on altivec and fix pgather/pscatter | 2014-04-24 | ||
| | | ||||
* | | Fixed the NEON implementation of predux_max<Packet4i>. | 2014-04-23 | ||
| | | ||||
* | | Created a NEON version of the ptranspose packet primitives | 2014-04-23 | ||
| | | ||||
* | | Add Altivec implementation of pgather/pscatter (not tested) | 2014-04-23 | ||
| | | ||||
* | | Fix EIGEN_MAKE_UNALIGNED_ARRAY_ASSERT macro | 2014-04-22 | ||
| | | ||||
* | | merge with default branch | 2014-04-22 | ||
|\ \ | ||||
* | | | Workaround gcc's default ABI not being able to distinguish between vector types of different sizes. | 2014-04-22 | ||
| | | | ||||
* | | | Fix 128bit packet size assumptions in unit tests. | 2014-04-18 | ||
| | | | ||||
* | | | Fix alignment assertion. | 2014-04-18 | ||
| | | | ||||
* | | | Fix calls to lazy products (lazy product does not like matrices with 0 length) | 2014-04-18 | ||
| | | | ||||
* | | | Smarter block size computation | 2014-04-18 | ||
| | | | ||||
* | | | Fix typo (was working with clang!) | 2014-04-18 | ||
| | | | ||||
* | | | Fixes for fixed sizes and non vectorizable types | 2014-04-17 | ||
| | | | ||||
* | | | merge | 2014-04-17 | ||
|\ \ \ | ||||
| * | | | Implemented the pgather/pscatter packet primitives for the arm/NEON architecture | 2014-04-17 | ||
| | | | | ||||
* | | | | Optimize AVX pset1 for complexes and ploaddup | 2014-04-17 | ||
| | | | | ||||
* | | | | Fix and optimize mixed products | 2014-04-17 | ||
| | | | | ||||
* | | | | Optimize ploaddup for AVX | 2014-04-17 | ||
|/ / / | ||||
* | | | Fallback to lazy products for very small ones. | 2014-04-16 | ||
| | | | ||||
* | | | Enable alloca on Mac OS X | 2014-04-16 | ||
| | | | ||||
* | | | New gebp kernel handling up to 3 packets x 4 register-level blocks. Huge speedup on Haswell. This changeset also introduces new vector functions: ploadquad and predux4. | 2014-04-16 | ||
| | | | ||||
| * | | Check IMKL version for compatibility with Eigen | 2014-04-15 | ||
| | | | ||||
| * | | bug #793: detect NaN and INF in EigenSolver instead of aborting with an assert. | 2014-04-14 | ||
| | | | ||||
| * | | Add isfinite overload for complexes. | 2014-04-14 | ||
| | | | ||||
* | | | Optimized SSE unaligned loads and stores when compiling a 64-bit target with a recent version of gcc (i.e., gcc 4.8). | 2014-04-14 | ||
| | | | ||||
| * | | bug #790: fix overflow in real_2x2_jacobi_svd | 2014-04-14 | ||
| | | | ||||
| * | | bug #793: fix overflow in EigenSolver and add respective regression unit test | 2014-04-14 | ||
| | | | ||||
| * | | Updated my previous fix to avoid introducing a compilation warning on ARM platforms. | 2014-04-10 | ||
| | | | ||||
| * | | Silenced a compilation warning produced by nvcc. | 2014-04-10 | ||
| |/ | ||||
| * | doc: Add references to Cholesky methods in SelfAdjointView. | 2014-04-07 | ||
| | | ||||
* | | Deleted some dead code. | 2014-04-04 | ||
| | | ||||
| * | Fix bug #784: Assert if assigning a product to a triangularView does not match the size. | 2014-04-04 | ||
| | | ||||
| * | bug #782: Workaround for gcc <= 4.4 compilation error on the NEON PacketMath code. | 2014-04-03 | ||
| | | ||||
* | | Finally, prefetching seems to help getting more stable performance | 2014-03-31 | ||
| | | ||||
* | | Workaround alignment warnings | 2014-03-30 | ||
| | | ||||
* | | Optimize gebp kernel: (1) increase peeling level along the depth dimension (+5% for large matrices, i.e., >1000); (2) improve pipelining when dealing with the last rows of the lhs | 2014-03-30 | ||
| | | ||||
* | | Vectorized the loop peeling of the inner loop of the block-panel matrix multiplication code. This speeds up the multiplication of matrices whose size is not a multiple of the packet size. | 2014-03-28 | ||
| | | ||||
* | | Add a mechanism to recursively access to half-size packet types | 2014-03-28 | ||
| | | ||||
* | | merge with default branch | 2014-03-28 | ||
|\| | ||||
* | | Enable vectorization of gemv for PacketSize>4 through unaligned loads (still better than no vectorization) | 2014-03-28 | ||
| | | ||||
* | | Merged latest changes from parent. | 2014-03-27 | ||
|\ \ | ||||
* | | | Implemented the SSE version of the gather and scatter packet primitives. | 2014-03-27 | ||
| | | | ||||
* | | | Implemented the AVX version of the gather and scatter packet primitives. | 2014-03-27 | ||
| | | | ||||
* | | | Introduced pscatter/pgather packet primitives. They will be used to optimize the loop peeling code of the block-panel matrix multiplication kernel. | 2014-03-27 | ||
| | | | ||||
| * | | enforce the use of vfmadd231ps for pmadd (gcc and clang stupidly generate the other fmadd variants plus some register moves...) | 2014-03-27 | ||
|/ / | ||||