Commit message (Author, Age)
* Fix 128bit packet size assumptions in unit tests. (Gael Guennebaud, 2014-04-18)
|
* Fix alignment assertion. (Gael Guennebaud, 2014-04-18)
|
* Fix calls to lazy products (lazy product does not like matrices with 0 length) (Gael Guennebaud, 2014-04-18)
|
* Smarter block size computation (Gael Guennebaud, 2014-04-18)
|
* Fix typo (was working with clang!) (Gael Guennebaud, 2014-04-18)
|
* Fixes for fixed sizes and non vectorizable types (Gael Guennebaud, 2014-04-17)
|
* merge (Gael Guennebaud, 2014-04-17)
|\
| * Implemented the pgather/pscatter packet primitives for the arm/NEON architecture (Benoit Steiner, 2014-04-17)
| |
* | Make our gemm bench a little more powerful. (Gael Guennebaud, 2014-04-17)
| |
* | Various minor fixes in BTL (Gael Guennebaud, 2014-04-17)
| |
* | Optimize AVX pset1 for complexes and ploaddup (Gael Guennebaud, 2014-04-17)
| |
* | Reduce block sizes in unit tests. (Gael Guennebaud, 2014-04-17)
| |
* | add unit tests for ploadquad and predux4, and split packetmath unit test wrt real/complex (Gael Guennebaud, 2014-04-17)
| |
* | Extend mixingtype unit test to check transposed cases. (Gael Guennebaud, 2014-04-17)
| |
* | Fix and optimize mixed products (Gael Guennebaud, 2014-04-17)
| |
* | Optimize ploaddup for AVX (Gael Guennebaud, 2014-04-17)
|/
* Fall back to lazy products for very small ones. (Gael Guennebaud, 2014-04-16)
|
* Enable alloca on Mac OS X (Gael Guennebaud, 2014-04-16)
|
* New gebp kernel handling up to 3 packets x 4 register-level blocks. Huge speedup on Haswell. This changeset also introduces new vector functions: ploadquad and predux4. (Gael Guennebaud, 2014-04-16)
|
* Optimized SSE unaligned loads and stores when compiling a 64bit target with a recent version of gcc (i.e., gcc 4.8). (Benoit Steiner, 2014-04-14)
|
* Updated the geo_parametrizedline_2 test for AVX. (Benoit Steiner, 2014-04-04)
|
* Deleted some dead code. (Benoit Steiner, 2014-04-04)
|
* Pulled the latest updates from the eigen trunk. (Benoit Steiner, 2014-04-01)
|\
| * Make some actual verifications inside the autodiff unit test (Christoph Hertzberg, 2014-04-01)
| |
| * Fixed typo: symmretric -> symmetric (Florian George, 2014-04-01)
| |
| * Fix lapack build (Gael Guennebaud, 2014-04-01)
| |
| * bug #775: propagate generator when working around cmake bug #9220 (Gael Guennebaud, 2014-04-01)
| |
| * Fix bug #776: it seems that mingw does not support weak linking (Gael Guennebaud, 2014-04-01)
| |
| * Rename the vector() factories defined in blas/common.h into make_vector() to prevent a possible name conflict with std::vector. (Benoit Steiner, 2014-04-01)
| |
| * Fix no newline at end of file warning (Gael Guennebaud, 2014-04-01)
| |
* | BTL: add blaze (Gael Guennebaud, 2014-03-31)
| |
* | BTL: fix warnings and extend to 5k matrices, update GotoBlas to OpenBlas, etc. (Gael Guennebaud, 2014-03-31)
| |
* | Finally, prefetching seems to help getting more stable performance (Gael Guennebaud, 2014-03-31)
| |
* | Enable repetition in mixing type unit test (Gael Guennebaud, 2014-03-31)
| |
* | Workaround alignment warnings (Gael Guennebaud, 2014-03-30)
| |
* | Optimize gebp kernel: (1) increase peeling level along the depth dimension (+5% for large matrices, i.e., >1000); (2) improve pipelining when dealing with the last rows of the lhs (Gael Guennebaud, 2014-03-30)
| |
* | Vectorized the loop peeling of the inner loop of the block-panel matrix multiplication code. This speeds up the multiplication of matrices whose size is not a multiple of the packet size. (Benoit Steiner, 2014-03-28)
| |
* | Properly align the input data to prevent false failures of the packetmath.cpp test. (Benoit Steiner, 2014-03-28)
| |
* | Add a mechanism to recursively access half-size packet types (Gael Guennebaud, 2014-03-28)
| |
* | merge with default branch (Gael Guennebaud, 2014-03-28)
|\|
* | Enable vectorization of gemv for PacketSize>4 through unaligned loads (still better than no vectorization) (Gael Guennebaud, 2014-03-28)
| |
* | Merged latest changes from parent. (Benoit Steiner, 2014-03-27)
|\ \
* | | Implemented the SSE version of the gather and scatter packet primitives. (Benoit Steiner, 2014-03-27)
| | |
* | | Implemented the AVX version of the gather and scatter packet primitives. (Benoit Steiner, 2014-03-27)
| | |
* | | Introduced pscatter/pgather packet primitives. They will be used to optimize the loop peeling code of the block-panel matrix multiplication kernel. (Benoit Steiner, 2014-03-27)
| | |
| * | enforce the use of vfmadd231ps for pmadd (gcc and clang stupidly generate the other fmadd variants plus some register moves...) (Gael Guennebaud, 2014-03-27)
|/ /
* | Fixed compilation error when FMA instructions are enabled. (Benoit Steiner, 2014-03-27)
| |
* | Silenced "unused variable" warnings when compiling with FMA. (Benoit Steiner, 2014-03-27)
| |
* | Vectorized the packing of a col-major matrix used as the right hand side argument in a matrix-matrix product when AVX instructions are used. No vectorization takes place when SSE instructions are used; however, this doesn't seem to impact performance. (Benoit Steiner, 2014-03-27)
| |
* | Vectorized the packing of a row-major matrix used as the left hand side argument in a matrix-matrix product. (Benoit Steiner, 2014-03-27)
| |