path: root/Eigen/src
Commit message (author, date)
* Optimize AVX pset1 for complexes and ploaddup (Gael Guennebaud, 2014-04-17)
|
* Fix and optimize mixed products (Gael Guennebaud, 2014-04-17)
|
* Optimize ploaddup for AVX (Gael Guennebaud, 2014-04-17)
|
* Fall back to lazy products for very small ones. (Gael Guennebaud, 2014-04-16)
|
* Enable alloca on Mac OS X (Gael Guennebaud, 2014-04-16)
|
* New gebp kernel handling up to 3 packets x 4 register-level blocks. Huge speedup on Haswell. This changeset also introduces new vector functions: ploadquad and predux4. (Gael Guennebaud, 2014-04-16)
|
* Optimized SSE unaligned loads and stores when compiling a 64bit target with a recent version of gcc (i.e., gcc 4.8). (Benoit Steiner, 2014-04-14)
|
* Deleted some dead code. (Benoit Steiner, 2014-04-04)
|
* Finally, prefetching seems to help get more stable performance (Gael Guennebaud, 2014-03-31)
|
* Work around alignment warnings (Gael Guennebaud, 2014-03-30)
|
* Optimize gebp kernel: (1) increase the peeling level along the depth dimension (+5% for large matrices, i.e., >1000); (2) improve pipelining when dealing with the last rows of the lhs. (Gael Guennebaud, 2014-03-30)
|
* Vectorized the loop peeling of the inner loop of the block-panel matrix multiplication code. This speeds up the multiplication of matrices whose size is not a multiple of the packet size. (Benoit Steiner, 2014-03-28)
|
* Add a mechanism to recursively access half-size packet types (Gael Guennebaud, 2014-03-28)
|
* merge with default branch (Gael Guennebaud, 2014-03-28)
|\
* | Enable vectorization of gemv for PacketSize>4 through unaligned loads (still better than no vectorization) (Gael Guennebaud, 2014-03-28)
| |
* | Merged latest changes from parent. (Benoit Steiner, 2014-03-27)
|\ \
* | | Implemented the SSE version of the gather and scatter packet primitives. (Benoit Steiner, 2014-03-27)
| | |
* | | Implemented the AVX version of the gather and scatter packet primitives. (Benoit Steiner, 2014-03-27)
| | |
* | | Introduced pscatter/pgather packet primitives. They will be used to optimize the loop peeling code of the block-panel matrix multiplication kernel. (Benoit Steiner, 2014-03-27)
| | |
| * | enforce the use of vfmadd231ps for pmadd (gcc and clang stupidly generate the other fmadd variants plus some register moves...) (Gael Guennebaud, 2014-03-27)
|/ /
* | Fixed compilation error when FMA instructions are enabled. (Benoit Steiner, 2014-03-27)
| |
* | Silenced "unused variable" warnings when compiling with FMA. (Benoit Steiner, 2014-03-27)
| |
* | Vectorized the packing of a col-major matrix used as the right hand side argument in a matrix-matrix product when AVX instructions are used. No vectorization takes place when SSE instructions are used, however this doesn't seem to impact performance. (Benoit Steiner, 2014-03-27)
| |
* | Vectorized the packing of a row-major matrix used as the left hand side argument in a matrix-matrix product. (Benoit Steiner, 2014-03-27)
| |
* | Implemented the AVX version of the ptranspose packet primitive. (Benoit Steiner, 2014-03-27)
| |
* | Implement pcplflip, palign, predux and the like for AVX/complexes (Gael Guennebaud, 2014-03-27)
| |
| * Fix warning (Gael Guennebaud, 2014-03-27)
| |
* | Created the ptranspose packet primitive that can transpose an array of N packets, where N is the number of words in each packet. This primitive will be used to complete the vectorization of the gemm_pack_lhs and gemm_pack_rhs functions. Implemented the primitive using SSE instructions. (Benoit Steiner, 2014-03-26)
| |
* | Made sure that the version of gemm_pack_rhs specialized for row major matrices is vectorized when nr == 2*PacketSize (which is the case for SSE when compiling in 64bit mode). (Benoit Steiner, 2014-03-26)
| |
* | Specialized the pload1 packet primitive for Packet8f and Packet4d in order to take advantage of the vbroadcastss and vbroadcastsd instructions whenever possible. (Benoit Steiner, 2014-03-26)
| |
* | Merged latest updates from the parent branch (Benoit Steiner, 2014-03-26)
|\ \
| | * Update gebp kernel to process a panel of 4 columns at once for the remaining ones. (Gael Guennebaud, 2014-03-26)
| | |
| | * Remove remaining bits of the dead working buffer (Gael Guennebaud, 2014-03-26)
| |/
* | Vectorized the multiplication and division of complex numbers using AVX instructions. (Benoit Steiner, 2014-03-26)
| |
* | Used AVX instructions to vectorize the complex version of the pfirst and ploaddup packet primitives. Silenced a few compilation warnings. (Benoit Steiner, 2014-03-26)
| |
| * Implement new 1 packet x 8 gebp kernel (Gael Guennebaud, 2014-03-26)
| |
| * add pbroadcast2/4 generic intrinsics (Gael Guennebaud, 2014-03-26)
| |
* | Use AVX instructions to vectorize pset1<Packet2cd>, pset1<Packet4cf>, preverse<Packet2cd>, and preverse<Packet4cf> (Benoit Steiner, 2014-03-25)
| |
* | Used AVX instructions to vectorize the predux_min<Packet8f>, predux_min<Packet4d>, predux_max<Packet8f>, and predux_max<Packet4d> packet primitives. (Benoit Steiner, 2014-03-24)
| |
* | Made sure that EIGEN_ALIGN is defined when EIGEN_DONT_VECTORIZE is set to true to prevent build failures when vectorization is disabled. (Benoit Steiner, 2014-03-21)
| |
* | Merged latest changes from the parent (Benoit Steiner, 2014-03-18)
|\ \
* \ \ Merged latest changes from the main trunk (Benoit Steiner, 2014-02-24)
|\ \ \
* \ \ \ Pulled latest changes from the Eigen main trunk (Benoit Steiner, 2014-02-24)
|\ \ \ \
| | * | | Merged eigen/eigen into default (Benoit Steiner, 2014-02-24)
| |/| | |
* | | | | Added support for FMA instructions (Benoit Steiner, 2014-02-24)
| | | | |
| * | | | Merged the latest version of the code from eigen/eigen (Benoit Steiner, 2014-02-18)
|/| | | |
* | | | | Reverted the definition of the EIGEN_ALIGN to its former meaning (i.e. a boolean). Created a new EIGEN_ALIGN_BYTES define to encode how the data should be aligned. Fixed a few remaining alignment issues exposed when the Eigen code is compiled with avx enabled. Created a new EIGEN_ALIGN_DEFAULT define, which is set to the minimum alignment value required for the chosen instruction set. Use this value instead of EIGEN_ALIGN32 to preserve the existing alignment on SSE/Altivec/Neon. (Benoit Steiner, 2014-02-18)
| | | | |
* | | | | Added support for AVX to Eigen. (Benoit Steiner, 2014-01-29)
| | | | |
| | | | * Improved the efficiency of the block-panel matrix multiplication code: the change reduces the pressure on the L1 cache by removing the calls to gebp_traits::unpackRhs(). Instead the packetization of the rhs blocks is done on the fly in gebp_traits::loadRhs(). This adds numerous calls to pset1<ResPacket> (since we're packetizing on the fly in the inner loop) but this is more than compensated by the fact that we're decreasing the memory transfers by a factor RhsPacketSize. (Benoit Steiner, 2014-01-02)
| | | | |
| | | | * Fix bug #222. Make temporary matrix column-major independently of EIGEN_DEFAULT_TO_ROW_MAJOR (Christoph Hertzberg, 2014-03-26)
| | | | |