Commit message | Author | Age | |
---|---|---|---|
* | Optimized SSE unaligned loads and stores when compiling a 64-bit target with a recent version of gcc (i.e., gcc 4.8). | Benoit Steiner | 2014-04-14 |
| | |||
* | Updated the geo_parametrizedline_2 test for AVX. | Benoit Steiner | 2014-04-04 |
| | |||
* | Deleted some dead code. | Benoit Steiner | 2014-04-04 |
| | |||
* | Pulled the latest updates from the eigen trunk. | Benoit Steiner | 2014-04-01 |
|\ | |||
| * | Make some actual verifications inside the autodiff unit test | Christoph Hertzberg | 2014-04-01 |
| | | |||
| * | Fixed typo: symmretric -> symmetric | Florian George | 2014-04-01 |
| | | |||
| * | Fix lapack build | Gael Guennebaud | 2014-04-01 |
| | | |||
| * | bug #775: propagate generator when working around cmake bug #9220 | Gael Guennebaud | 2014-04-01 |
| | | |||
| * | Fix bug #776: it seems that mingw does not support weak linking | Gael Guennebaud | 2014-04-01 |
| | | |||
| * | Rename the vector() factories defined in blas/common.h into make_vector() to prevent a possible name conflict with std::vector. | Benoit Steiner | 2014-04-01 |
| | | |||
| * | Fix no newline at end of file warning | Gael Guennebaud | 2014-04-01 |
| | | |||
* | | BTL: add blaze | Gael Guennebaud | 2014-03-31 |
| | | |||
* | | BTL: fix warnings and extend to 5k matrices, update GotoBlas to OpenBlas, etc. | Gael Guennebaud | 2014-03-31 |
| | | |||
* | | Finally, prefetching seems to help getting more stable performance | Gael Guennebaud | 2014-03-31 |
| | | |||
* | | Enable repetition in mixing type unit test | Gael Guennebaud | 2014-03-31 |
| | | |||
* | | Workaround alignment warnings | Gael Guennebaud | 2014-03-30 |
| | | |||
* | | Optimize gebp kernel: (1) increase peeling level along the depth dimension (+5% for large matrices, i.e., >1000); (2) improve pipelining when dealing with the last rows of the lhs | Gael Guennebaud | 2014-03-30 |
| | | |||
* | | Vectorized the loop peeling of the inner loop of the block-panel matrix multiplication code. This speeds up the multiplication of matrices whose size is not a multiple of the packet size. | Benoit Steiner | 2014-03-28 |
| | | |||
* | | Properly align the input data to prevent false failures of the packetmath.cpp test. | Benoit Steiner | 2014-03-28 |
| | | |||
* | | Add a mechanism to recursively access half-size packet types | Gael Guennebaud | 2014-03-28 |
| | | |||
* | | merge with default branch | Gael Guennebaud | 2014-03-28 |
|\| | |||
* | | Enable vectorization of gemv for PacketSize>4 through unaligned loads (still better than no vectorization) | Gael Guennebaud | 2014-03-28 |
| | | |||
* | | Merged latest changes from parent. | Benoit Steiner | 2014-03-27 |
|\ \ | |||
* | | | Implemented the SSE version of the gather and scatter packet primitives. | Benoit Steiner | 2014-03-27 |
| | | | |||
* | | | Implemented the AVX version of the gather and scatter packet primitives. | Benoit Steiner | 2014-03-27 |
| | | | |||
* | | | Introduced pscatter/pgather packet primitives. They will be used to optimize the loop peeling code of the block-panel matrix multiplication kernel. | Benoit Steiner | 2014-03-27 |
| | | | |||
| * | | enforce the use of vfmadd231ps for pmadd (gcc and clang stupidly generate the other fmadd variants plus some register moves...) | Gael Guennebaud | 2014-03-27 |
|/ / | |||
* | | Fixed compilation error when FMA instructions are enabled. | Benoit Steiner | 2014-03-27 |
| | | |||
* | | Silenced "unused variable" warnings when compiling with FMA. | Benoit Steiner | 2014-03-27 |
| | | |||
* | | Vectorized the packing of a col-major matrix used as the right hand side argument in a matrix-matrix product when AVX instructions are used. No vectorization takes place when SSE instructions are used, however this doesn't seem to impact performance. | Benoit Steiner | 2014-03-27 |
| | | |||
* | | Vectorized the packing of a row-major matrix used as the left hand side argument in a matrix-matrix product. | Benoit Steiner | 2014-03-27 |
| | | |||
* | | Implemented the AVX version of the ptranspose packet primitive. | Benoit Steiner | 2014-03-27 |
| | | |||
* | | Change abi version when enabling AVX with GCC | Gael Guennebaud | 2014-03-27 |
| | | |||
* | | Fix geo_* unit tests with respect to AVX | Gael Guennebaud | 2014-03-27 |
| | | |||
* | | Implement pcplflip, palign, predux and the like from AVX/complexes | Gael Guennebaud | 2014-03-27 |
| | | |||
| * | Fix warning | Gael Guennebaud | 2014-03-27 |
| | | |||
| * | Merged in infinitei/eigen (pull request PR-50): Fixed compilation error due to obsolete internal::abs and internal::sqrt function calls | Jitse Niesen | 2014-03-27 |
| |\ | |||
| * | | immintrin.h did not come until intel version 11 | Mark Borgerding | 2014-03-26 |
| | | | |||
* | | | Created the ptranspose packet primitive that can transpose an array of N packets, where N is the number of words in each packet. This primitive will be used to complete the vectorization of the gemm_pack_lhs and gemm_pack_rhs functions. Implemented the primitive using SSE instructions. | Benoit Steiner | 2014-03-26 |
| | | | |||
| | * | Fixed compilation error due to obsolete internal::abs and internal::sqrt function calls | Abhijit Kundu | 2014-03-26 |
| |/ | |||
* | | Made sure that the version of gemm_pack_rhs specialized for row major matrices is vectorized when nr == 2*PacketSize (which is the case for SSE when compiling in 64bit mode). | Benoit Steiner | 2014-03-26 |
| | | |||
* | | Specialized the pload1 packet primitive for Packet8f and Packet4d in order to take advantage of the vbroadcastss and vbroadcastsd instructions whenever possible. | Benoit Steiner | 2014-03-26 |
| | | |||
* | | Merged latest updates from the parent branch | Benoit Steiner | 2014-03-26 |
|\ \ | |||
| | * | Update gebp kernel to process a panel of 4 columns at once for the remaining ones. | Gael Guennebaud | 2014-03-26 |
| | | | |||
| | * | Remove remaining bits of the dead working buffer | Gael Guennebaud | 2014-03-26 |
| |/ | |||
* | | Vectorized the multiplication and division of complex numbers using AVX instructions. | Benoit Steiner | 2014-03-26 |
| | | |||
* | | Used AVX instructions to vectorize the complex version of the pfirst and ploaddup packet primitives. Silenced a few compilation warnings. | Benoit Steiner | 2014-03-26 |
| | | |||
| * | Implement new 1 packet x 8 gebp kernel | Gael Guennebaud | 2014-03-26 |
| | | |||
| * | add pbroadcast2/4 generic intrinsics | Gael Guennebaud | 2014-03-26 |
| | | |||
* | | Use AVX instructions to vectorize pset1<Packet2cd>, pset1<Packet4cf>, preverse<Packet2cd>, and preverse<Packet4cf> | Benoit Steiner | 2014-03-25 |