Commit message (Author, Age)
* BTL: fix warnings and extend to 5k matrices, update GotoBlas to OpenBlas, etc. (Gael Guennebaud, 2014-03-31)
|
* Finally, prefetching seems to help getting more stable performance (Gael Guennebaud, 2014-03-31)
|
* Enable repetition in mixing type unit test (Gael Guennebaud, 2014-03-31)
|
* Workaround alignment warnings (Gael Guennebaud, 2014-03-30)
|
* Optimize gebp kernel: (Gael Guennebaud, 2014-03-30)
|     1 - increase peeling level along the depth dimension (+5% for large matrices, i.e., >1000)
|     2 - improve pipelining when dealing with the last rows of the lhs
* Vectorized the loop peeling of the inner loop of the block-panel matrix multiplication code. This speeds up the multiplication of matrices whose size is not a multiple of the packet size. (Benoit Steiner, 2014-03-28)
|
* Properly align the input data to prevent false failures of the packetmath.cpp test. (Benoit Steiner, 2014-03-28)
|
* Add a mechanism to recursively access half-size packet types (Gael Guennebaud, 2014-03-28)
|
* merge with default branch (Gael Guennebaud, 2014-03-28)
|\
* | Enable vectorization of gemv for PacketSize>4 through unaligned loads (still better than no vectorization) (Gael Guennebaud, 2014-03-28)
| |
* | Merged latest changes from parent. (Benoit Steiner, 2014-03-27)
|\ \
* | | Implemented the SSE version of the gather and scatter packet primitives. (Benoit Steiner, 2014-03-27)
| | |
* | | Implemented the AVX version of the gather and scatter packet primitives. (Benoit Steiner, 2014-03-27)
| | |
* | | Introduced pscatter/pgather packet primitives. They will be used to optimize the loop peeling code of the block-panel matrix multiplication kernel. (Benoit Steiner, 2014-03-27)
| | |
| * | enforce the use of vfmadd231ps for pmadd (gcc and clang stupidly generate the other fmadd variants plus some register moves...) (Gael Guennebaud, 2014-03-27)
|/ /
* | Fixed compilation error when FMA instructions are enabled. (Benoit Steiner, 2014-03-27)
| |
* | Silenced "unused variable" warnings when compiling with FMA. (Benoit Steiner, 2014-03-27)
| |
* | Vectorized the packing of a col-major matrix used as the right hand side argument in a matrix-matrix product when AVX instructions are used. No vectorization takes place when SSE instructions are used; however, this doesn't seem to impact performance. (Benoit Steiner, 2014-03-27)
| |
* | Vectorized the packing of a row-major matrix used as the left hand side argument in a matrix-matrix product. (Benoit Steiner, 2014-03-27)
| |
* | Implemented the AVX version of the ptranspose packet primitive. (Benoit Steiner, 2014-03-27)
| |
* | Change abi version when enabling AVX with GCC (Gael Guennebaud, 2014-03-27)
| |
* | Fix geo_* unit tests with respect to AVX (Gael Guennebaud, 2014-03-27)
| |
* | Implement pcplflip, palign, predux and the like for AVX/complexes (Gael Guennebaud, 2014-03-27)
| |
| * Fix warning (Gael Guennebaud, 2014-03-27)
| |
| * Merged in infinitei/eigen (pull request PR-50) (Jitse Niesen, 2014-03-27)
| |\    Fixed compilation error due to obsolete internal::abs and internal::sqrt function calls
| * | immintrin.h did not come until intel version 11 (Mark Borgerding, 2014-03-26)
| | |
* | | Created the ptranspose packet primitive that can transpose an array of N packets, where N is the number of words in each packet. This primitive will be used to complete the vectorization of the gemm_pack_lhs and gemm_pack_rhs functions. Implemented the primitive using SSE instructions. (Benoit Steiner, 2014-03-26)
| | |
| | * Fixed compilation error due to obsolete internal::abs and internal::sqrt function calls (Abhijit Kundu, 2014-03-26)
| |/
* | Made sure that the version of gemm_pack_rhs specialized for row major matrices is vectorized when nr == 2*PacketSize (which is the case for SSE when compiling in 64bit mode). (Benoit Steiner, 2014-03-26)
| |
* | Specialized the pload1 packet primitive for Packet8f and Packet4d in order to take advantage of the vbroadcastss and vbroadcastsd instructions whenever possible. (Benoit Steiner, 2014-03-26)
| |
* | Merged latest updates from the parent branch (Benoit Steiner, 2014-03-26)
|\ \
| | * Update gebp kernel to process a panel of 4 columns at once for the remaining ones. (Gael Guennebaud, 2014-03-26)
| | |
| | * Remove remaining bits of the dead working buffer (Gael Guennebaud, 2014-03-26)
| |/
* | Vectorized the multiplication and division of complex numbers using AVX instructions. (Benoit Steiner, 2014-03-26)
| |
* | Used AVX instructions to vectorize the complex version of the pfirst and ploaddup packet primitives. Silenced a few compilation warnings. (Benoit Steiner, 2014-03-26)
| |
| * Implement new 1 packet x 8 gebp kernel (Gael Guennebaud, 2014-03-26)
| |
| * add pbroadcast2/4 generic intrinsics (Gael Guennebaud, 2014-03-26)
| |
* | Use AVX instructions to vectorize pset1<Packet2cd>, pset1<Packet4cf>, preverse<Packet2cd>, and preverse<Packet4cf> (Benoit Steiner, 2014-03-25)
| |
* | Used AVX instructions to vectorize the predux_min<Packet8f>, predux_min<Packet4d>, predux_max<Packet8f>, and predux_max<Packet4d> packet primitives. (Benoit Steiner, 2014-03-24)
| |
* | Added proper support for AVX and FMA in the makefiles. (Benoit Steiner, 2014-03-24)
| |
* | Made sure that EIGEN_ALIGN is defined when EIGEN_DONT_VECTORIZE is set to true to prevent build failures when vectorization is disabled. (Benoit Steiner, 2014-03-21)
| |
* | Merged latest changes from the parent (Benoit Steiner, 2014-03-18)
|\ \
* \ \ Merged latest changes from the main trunk (Benoit Steiner, 2014-02-24)
|\ \ \
* \ \ \ Pulled latest changes from the Eigen main trunk (Benoit Steiner, 2014-02-24)
|\ \ \ \
| | * | | Merged eigen/eigen into default (Benoit Steiner, 2014-02-24)
| |/| | |
* | | | | Added support for FMA instructions (Benoit Steiner, 2014-02-24)
| | | | |
| * | | | Merged the latest version of the code from eigen/eigen (Benoit Steiner, 2014-02-18)
|/| | | |
* | | | | Reverted the definition of the EIGEN_ALIGN to its former meaning (i.e. a boolean). (Benoit Steiner, 2014-02-18)
| | | | |     Created a new EIGEN_ALIGN_BYTES define to encode how the data should be aligned.
| | | | |     Fixed a few remaining alignment issues exposed when the Eigen code is compiled with avx enabled.
| | | | |     Created a new EIGEN_ALIGN_DEFAULT define, which is set to the minimum alignment value required for the chosen instruction set. Use this value instead of EIGEN_ALIGN32 to preserve the existing alignment on SSE/Altivec/Neon.
* | | | | Added support for AVX to Eigen. (Benoit Steiner, 2014-01-29)
| | | | |
| | | | * Improved the efficiency of the block-panel matrix multiplication code: the change reduces the pressure on the L1 cache by removing the calls to gebp_traits::unpackRhs(). Instead the packetization of the rhs blocks is done on the fly in gebp_traits::loadRhs(). This adds numerous calls to pset1<ResPacket> (since we're packetizing on the fly in the inner loop) but this is more than compensated by the fact that we're decreasing the memory transfers by a factor of RhsPacketSize. (Benoit Steiner, 2014-01-02)
| | | | |