Commit message [Author, Age]
* Optimized SSE unaligned loads and stores when compiling a 64-bit target with a recent version of gcc (i.e. gcc 4.8). [Benoit Steiner, 2014-04-14]
|
* Updated the geo_parametrizedline_2 test for AVX. [Benoit Steiner, 2014-04-04]
|
* Deleted some dead code. [Benoit Steiner, 2014-04-04]
|
* Pulled the latest updates from the eigen trunk. [Benoit Steiner, 2014-04-01]
|\
| * Make some actual verifications inside the autodiff unit test [Christoph Hertzberg, 2014-04-01]
| |
| * Fixed typo: symmretric -> symmetric [Florian George, 2014-04-01]
| |
| * Fix lapack build [Gael Guennebaud, 2014-04-01]
| |
| * bug #775: propagate generator when working around cmake bug #9220 [Gael Guennebaud, 2014-04-01]
| |
| * Fix bug #776: it seems that mingw does not support weak linking [Gael Guennebaud, 2014-04-01]
| |
| * Rename the vector() factories defined in blas/common.h into make_vector() to prevent a possible name conflict with std::vector. [Benoit Steiner, 2014-04-01]
| |
| * Fix "no newline at end of file" warning [Gael Guennebaud, 2014-04-01]
| |
* | BTL: add blaze [Gael Guennebaud, 2014-03-31]
| |
* | BTL: fix warnings and extend to 5k matrices, update GotoBlas to OpenBlas, etc. [Gael Guennebaud, 2014-03-31]
| |
* | Finally, prefetching seems to help get more stable performance [Gael Guennebaud, 2014-03-31]
| |
* | Enable repetition in mixing type unit test [Gael Guennebaud, 2014-03-31]
| |
* | Workaround alignment warnings [Gael Guennebaud, 2014-03-30]
| |
* | Optimize gebp kernel: 1 - increase peeling level along the depth dimension (+5% for large matrices, i.e., >1000); 2 - improve pipelining when dealing with the last rows of the lhs [Gael Guennebaud, 2014-03-30]
| |
* | Vectorized the loop peeling of the inner loop of the block-panel matrix multiplication code. This speeds up the multiplication of matrices whose size is not a multiple of the packet size. [Benoit Steiner, 2014-03-28]
| |
* | Properly align the input data to prevent false failures of the packetmath.cpp test. [Benoit Steiner, 2014-03-28]
| |
* | Add a mechanism to recursively access half-size packet types [Gael Guennebaud, 2014-03-28]
| |
* | merge with default branch [Gael Guennebaud, 2014-03-28]
|\|
* | Enable vectorization of gemv for PacketSize>4 through unaligned loads (still better than no vectorization) [Gael Guennebaud, 2014-03-28]
| |
* | Merged latest changes from parent. [Benoit Steiner, 2014-03-27]
|\ \
* | | Implemented the SSE version of the gather and scatter packet primitives. [Benoit Steiner, 2014-03-27]
| | |
* | | Implemented the AVX version of the gather and scatter packet primitives. [Benoit Steiner, 2014-03-27]
| | |
* | | Introduced pscatter/pgather packet primitives. They will be used to optimize the loop peeling code of the block-panel matrix multiplication kernel. [Benoit Steiner, 2014-03-27]
| | |
| * | enforce the use of vfmadd231ps for pmadd (gcc and clang stupidly generate the other fmadd variants plus some register moves...) [Gael Guennebaud, 2014-03-27]
|/ /
* | Fixed compilation error when FMA instructions are enabled. [Benoit Steiner, 2014-03-27]
| |
* | Silenced "unused variable" warnings when compiling with FMA. [Benoit Steiner, 2014-03-27]
| |
* | Vectorized the packing of a col-major matrix used as the right hand side argument in a matrix-matrix product when AVX instructions are used. No vectorization takes place when SSE instructions are used; however, this doesn't seem to impact performance. [Benoit Steiner, 2014-03-27]
| |
* | Vectorized the packing of a row-major matrix used as the left hand side argument in a matrix-matrix product. [Benoit Steiner, 2014-03-27]
| |
* | Implemented the AVX version of the ptranspose packet primitive. [Benoit Steiner, 2014-03-27]
| |
* | Change abi version when enabling AVX with GCC [Gael Guennebaud, 2014-03-27]
| |
* | Fix geo_* unit tests with respect to AVX [Gael Guennebaud, 2014-03-27]
| |
* | Implement pcplflip, palign, predux and the like from AVX/complexes [Gael Guennebaud, 2014-03-27]
| |
| * Fix warning [Gael Guennebaud, 2014-03-27]
| |
| * Merged in infinitei/eigen (pull request PR-50): Fixed compilation error due to obsolete internal::abs and internal::sqrt function calls [Jitse Niesen, 2014-03-27]
| |\
| * | immintrin.h was not available until intel version 11 [Mark Borgerding, 2014-03-26]
| | |
* | | Created the ptranspose packet primitive that can transpose an array of N packets, where N is the number of words in each packet. This primitive will be used to complete the vectorization of the gemm_pack_lhs and gemm_pack_rhs functions. Implemented the primitive using SSE instructions. [Benoit Steiner, 2014-03-26]
| | |
| | * Fixed compilation error due to obsolete internal::abs and internal::sqrt function calls [Abhijit Kundu, 2014-03-26]
| |/
* | Made sure that the version of gemm_pack_rhs specialized for row major matrices is vectorized when nr == 2*PacketSize (which is the case for SSE when compiling in 64-bit mode). [Benoit Steiner, 2014-03-26]
| |
* | Specialized the pload1 packet primitive for Packet8f and Packet4d in order to take advantage of the vbroadcastss and vbroadcastsd instructions whenever possible. [Benoit Steiner, 2014-03-26]
| |
* | Merged latest updates from the parent branch [Benoit Steiner, 2014-03-26]
|\ \
| | * Update gebp kernel to process a panel of 4 columns at once for the remaining ones. [Gael Guennebaud, 2014-03-26]
| | |
| | * Remove remaining bits of the dead working buffer [Gael Guennebaud, 2014-03-26]
| |/
* | Vectorized the multiplication and division of complex numbers using AVX instructions. [Benoit Steiner, 2014-03-26]
| |
* | Used AVX instructions to vectorize the complex version of the pfirst and ploaddup packet primitives. Silenced a few compilation warnings. [Benoit Steiner, 2014-03-26]
| |
| * Implement new 1 packet x 8 gebp kernel [Gael Guennebaud, 2014-03-26]
| |
| * add pbroadcast2/4 generic intrinsics [Gael Guennebaud, 2014-03-26]
| |
* | Use AVX instructions to vectorize pset1<Packet2cd>, pset1<Packet4cf>, preverse<Packet2cd>, and preverse<Packet4cf> [Benoit Steiner, 2014-03-25]
| |