Commit message | Author | Age | |
---|---|---|---|
* | Fix 128bit packet size assumptions in unit tests. | 2014-04-18 | |
| | |||
* | Fix alignment assertion. | 2014-04-18 | |
| | |||
* | Fix calls to lazy products (lazy product does not like matrices with 0 length) | 2014-04-18 | |
| | |||
* | Smarter block size computation | 2014-04-18 | |
| | |||
* | Fix typo (was working with clang!) | 2014-04-18 | |
| | |||
* | Fixes for fixed sizes and non vectorizable types | 2014-04-17 | |
| | |||
* | merge | 2014-04-17 | |
|\ | |||
| * | Implemented the pgather/pscatter packet primitives for the arm/NEON architecture | 2014-04-17 | |
| | | |||
* | | Make our gemm bench a little more powerful. | 2014-04-17 | |
| | | |||
* | | Various minor fixes in BTL | 2014-04-17 | |
| | | |||
* | | Optimize AVX pset1 for complexes and ploaddup | 2014-04-17 | |
| | | |||
* | | Reduce block sizes in unit tests. | 2014-04-17 | |
| | | |||
* | | add unit tests for ploadquad and predux4, and split packetmath unit test wrt real/complex | 2014-04-17 | |
| | | |||
* | | Extend mixingtype unit test to check transposed cases. | 2014-04-17 | |
| | | |||
* | | Fix and optimize mixed products | 2014-04-17 | |
| | | |||
* | | Optimize ploaddup for AVX | 2014-04-17 | |
|/ | |||
* | Fallback to lazy products for very small ones. | 2014-04-16 | |
| | |||
* | Enable alloca on Mac OS X | 2014-04-16 | |
| | |||
* | New gebp kernel handling up to 3 packets x 4 register-level blocks. Huge speedup on Haswell. This changeset also introduces new vector functions: ploadquad and predux4. | 2014-04-16 | |
| | |||
* | Optimized SSE unaligned loads and stores when compiling a 64bit target with a recent version of gcc (i.e., gcc 4.8). | 2014-04-14 | |
| | |||
* | Updated the geo_parametrizedline_2 test for AVX. | 2014-04-04 | |
| | |||
* | Deleted some dead code. | 2014-04-04 | |
| | |||
* | Pulled the latest updates from the eigen trunk. | 2014-04-01 | |
|\ | |||
| * | Make some actual verifications inside the autodiff unit test | 2014-04-01 | |
| | | |||
| * | Fixed typo: symmretric -> symmetric | 2014-04-01 | |
| | | |||
| * | Fix lapack build | 2014-04-01 | |
| | | |||
| * | bug #775: propagate generator when working around cmake bug #9220 | 2014-04-01 | |
| | | |||
| * | Fix bug #776: it seems that mingw does not support weak linking | 2014-04-01 | |
| | | |||
| * | Rename the vector() factories defined in blas/common.h into make_vector() to prevent a possible name conflict with std::vector. | 2014-04-01 | |
| | | |||
| * | Fix no newline at end of file warning | 2014-04-01 | |
| | | |||
* | | BTL: add blaze | 2014-03-31 | |
| | | |||
* | | BTL: fix warnings and extend to 5k matrices, update GotoBlas to OpenBlas, etc. | 2014-03-31 | |
| | | |||
* | | Finally, prefetching seems to help getting more stable performance | 2014-03-31 | |
| | | |||
* | | Enable repetition in mixing type unit test | 2014-03-31 | |
| | | |||
* | | Workaround alignment warnings | 2014-03-30 | |
| | | |||
* | | Optimize gebp kernel: (1) increase peeling level along the depth dimension (+5% for large matrices, i.e., >1000); (2) improve pipelining when dealing with the last rows of the lhs | 2014-03-30 | |
| | | |||
* | | Vectorized the loop peeling of the inner loop of the block-panel matrix multiplication code. This speeds up the multiplication of matrices whose size is not a multiple of the packet size. | 2014-03-28 | |
| | | |||
* | | Properly align the input data to prevent false failures of the packetmath.cpp test. | 2014-03-28 | |
| | | |||
* | | Add a mechanism to recursively access to half-size packet types | 2014-03-28 | |
| | | |||
* | | merge with default branch | 2014-03-28 | |
|\| | |||
* | | Enable vectorization of gemv for PacketSize>4 through unaligned loads (still better than no vectorization) | 2014-03-28 | |
| | | |||
* | | Merged latest changes from parent. | 2014-03-27 | |
|\ \ | |||
* | | | Implemented the SSE version of the gather and scatter packet primitives. | 2014-03-27 | |
| | | | |||
* | | | Implemented the AVX version of the gather and scatter packet primitives. | 2014-03-27 | |
| | | | |||
* | | | Introduced pscatter/pgather packet primitives. They will be used to optimize the loop peeling code of the block-panel matrix multiplication kernel. | 2014-03-27 | |
| | | | |||
| * | | enforce the use of vfmadd231ps for pmadd (gcc and clang stupidly generate the other fmadd variants plus some register moves...) | 2014-03-27 | |
|/ / | |||
* | | Fixed compilation error when FMA instructions are enabled. | 2014-03-27 | |
| | | |||
* | | Silenced "unused variable" warnings when compiling with FMA. | 2014-03-27 | |
| | | |||
* | | Vectorized the packing of a col-major matrix used as the right hand side argument in a matrix-matrix product when AVX instructions are used. No vectorization takes place when SSE instructions are used; however, this doesn't seem to impact performance. | 2014-03-27 | |
| | | |||
* | | Vectorized the packing of a row-major matrix used as the left hand side argument in a matrix-matrix product. | 2014-03-27 | |