Graph | Commit message | Date | |
---|---|---|---|
* | Optimize AVX pset1 for complexes and ploaddup | 2014-04-17 | |
| | |||
* | Fix and optimize mixed products | 2014-04-17 | |
| | |||
* | Optimize ploaddup for AVX | 2014-04-17 | |
| | |||
* | Fallback to lazy products for very small ones. | 2014-04-16 | |
| | |||
* | Enable alloca on MAC OSX | 2014-04-16 | |
| | |||
* | New gebp kernel handling up to 3 packets x 4 register-level blocks. Huge speedup on Haswell. This changeset also introduces new vector functions: ploadquad and predux4. | 2014-04-16 | |
| | |||
* | Optimized SSE unaligned loads and stores when compiling a 64bit target with a recent version of gcc (i.e. gcc 4.8). | 2014-04-14 | |
| | |||
* | Deleted some dead code. | 2014-04-04 | |
| | |||
* | Finally, prefetching seems to help getting more stable performance | 2014-03-31 | |
| | |||
* | Workaround alignment warnings | 2014-03-30 | |
| | |||
* | Optimize gebp kernel: 1 - increase peeling level along the depth dimension (+5% for large matrices, i.e., >1000); 2 - improve pipelining when dealing with the last rows of the lhs | 2014-03-30 | |
| | |||
* | Vectorized the loop peeling of the inner loop of the block-panel matrix multiplication code. This speeds up the multiplication of matrices whose size is not a multiple of the packet size. | 2014-03-28 | |
| | |||
* | Add a mechanism to recursively access half-size packet types | 2014-03-28 | |
| | |||
* | merge with default branch | 2014-03-28 | |
|\ | |||
* | | Enable vectorization of gemv for PacketSize>4 through unaligned loads (still better than no vectorization) | 2014-03-28 | |
| | | |||
* | | Merged latest changes from parent. | 2014-03-27 | |
|\ \ | |||
* | | | Implemented the SSE version of the gather and scatter packet primitives. | 2014-03-27 | |
| | | | |||
* | | | Implemented the AVX version of the gather and scatter packet primitives. | 2014-03-27 | |
| | | | |||
* | | | Introduced pscatter/pgather packet primitives. They will be used to optimize the loop peeling code of the block-panel matrix multiplication kernel. | 2014-03-27 | |
| | | | |||
| * | | enforce the use of vfmadd231ps for pmadd (gcc and clang stupidly generate the other fmadd variants plus some register moves...) | 2014-03-27 | |
|/ / | |||
* | | Fixed compilation error when FMA instructions are enabled. | 2014-03-27 | |
| | | |||
* | | Silenced "unused variable" warnings when compiling with FMA. | 2014-03-27 | |
| | | |||
* | | Vectorized the packing of a col-major matrix used as the right hand side argument in a matrix-matrix product when AVX instructions are used. No vectorization takes place when SSE instructions are used; however, this doesn't seem to impact performance. | 2014-03-27 | |
| | | |||
* | | Vectorized the packing of a row-major matrix used as the left hand side argument in a matrix-matrix product. | 2014-03-27 | |
| | | |||
* | | Implemented the AVX version of the ptranspose packet primitive. | 2014-03-27 | |
| | | |||
* | | Implement pcplflip, palign, predux and the like for AVX/complexes | 2014-03-27 | |
| | | |||
| * | Fix warning | 2014-03-27 | |
| | | |||
* | | Created the ptranspose packet primitive that can transpose an array of N packets, where N is the number of words in each packet. This primitive will be used to complete the vectorization of the gemm_pack_lhs and gemm_pack_rhs functions. Implemented the primitive using SSE instructions. | 2014-03-26 | |
| | | |||
* | | Made sure that the version of gemm_pack_rhs specialized for row major matrices is vectorized when nr == 2*PacketSize (which is the case for SSE when compiling in 64bit mode). | 2014-03-26 | |
| | | |||
* | | Specialized the pload1 packet primitive for Packet8f and Packet4d in order to take advantage of the vbroadcastss and vbroadcastsd instructions whenever possible. | 2014-03-26 | |
| | | |||
* | | Merged latest updates from the parent branch | 2014-03-26 | |
|\ \ | |||
| | * | Update gebp kernel to process a panel of 4 columns at once for the remaining ones. | 2014-03-26 | |
| | | | |||
| | * | Remove remaining bits of the dead working buffer | 2014-03-26 | |
| |/ | |||
* | | Vectorized the multiplication and division of complex numbers using AVX instructions. | 2014-03-26 | |
| | | |||
* | | Used AVX instructions to vectorize the complex version of the pfirst and ploaddup packet primitives. Silenced a few compilation warnings. | 2014-03-26 | |
| | | |||
| * | Implement new 1 packet x 8 gebp kernel | 2014-03-26 | |
| | | |||
| * | add pbroadcast2/4 generic intrinsics | 2014-03-26 | |
| | | |||
* | | Use AVX instructions to vectorize pset1<Packet2cd>, pset1<Packet4cf>, preverse<Packet2cd>, and preverse<Packet4cf> | 2014-03-25 | |
| | | |||
* | | Used AVX instructions to vectorize the predux_min<Packet8f>, predux_min<Packet4d>, predux_max<Packet8f>, and predux_max<Packet4d> packet primitives. | 2014-03-24 | |
| | | |||
* | | Made sure that EIGEN_ALIGN is defined when EIGEN_DONT_VECTORIZE is set to true to prevent build failures when vectorization is disabled. | 2014-03-21 | |
| | | |||
* | | Merged latest changes from the parent | 2014-03-18 | |
|\ \ | |||
* \ \ | Merged latest changes from the main trunk | 2014-02-24 | |
|\ \ \ | |||
* \ \ \ | Pulled latest changes from the Eigen main trunk | 2014-02-24 | |
|\ \ \ \ | |||
| | * | | | Merged eigen/eigen into default | 2014-02-24 | |
| |/| | | | |||
* | | | | | Added support for FMA instructions | 2014-02-24 | |
| | | | | | |||
| * | | | | Merged the latest version of the code from eigen/eigen | 2014-02-18 | |
|/| | | | | |||
* | | | | | Reverted the definition of EIGEN_ALIGN to its former meaning (i.e. a boolean). Created a new EIGEN_ALIGN_BYTES define to encode how the data should be aligned. Fixed a few remaining alignment issues exposed when the Eigen code is compiled with AVX enabled. Created a new EIGEN_ALIGN_DEFAULT define, which is set to the minimum alignment value required for the chosen instruction set. Use this value instead of EIGEN_ALIGN32 to preserve the existing alignment on SSE/Altivec/Neon. | 2014-02-18 | |
| | | | | | |||
* | | | | | Added support for AVX to Eigen. | 2014-01-29 | |
| | | | | | |||
| | | | * | Improved the efficiency of the block-panel matrix multiplication code: the change reduces the pressure on the L1 cache by removing the calls to gebp_traits::unpackRhs(). Instead, the packetization of the rhs blocks is done on the fly in gebp_traits::loadRhs(). This adds numerous calls to pset1<ResPacket> (since we're packetizing on the fly in the inner loop), but this is more than compensated by the fact that we're decreasing the memory transfers by a factor of RhsPacketSize. | 2014-01-02 | |
| | | | | | |||
| | | | * | Fix bug #222. Make temporary matrix column-major independently of EIGEN_DEFAULT_TO_ROW_MAJOR | 2014-03-26 | |