Graph | Commit message | Age | |
---|---|---|---|
* | improve block-size heuristic | 2010-07-20 | |
| | |||
* | fix openmp version | 2010-07-20 | |
| | |||
* | fix declaration of pack_lhs in trsm | 2010-07-20 | |
| | |||
* | uncomment commented code for debug | 2010-07-20 | |
| | |||
* | report a true assert when not checking for an assertion | 2010-07-20 | |
| | |||
* | it appears only the "on the left" case was tested | 2010-07-20 | |
| | |||
* | fix trmm and symm wrt lhs packing | 2010-07-20 | |
| | |||
* | fix compilation by including file in correct order | 2010-07-19 | |
| | |||
* | fix SelfCwiseBinaryOp traits and handling of mixed types; improve compilation error in case of type mismatch | 2010-07-19 | |
* | explicitly disable vectorization for mixed coeff based products | 2010-07-19 | |
| | |||
* | fix lhs packing in the case of real * complex products | 2010-07-19 | |
| | |||
* | port Jacobi to new ei_pset1/ei_pload API | 2010-07-19 | |
| | |||
* | fix compilation of mixed scalar product; optimize mixed scalar products | 2010-07-19 | |
* | fix a couple of remaining issues with previous commit; merge ei_product_blocking_traits into ei_gepb_traits | 2010-07-19 | |
* | _mm_loaddup_pd is slow; optimize SSE ei_ploaddup<Packet4f> | 2010-07-19 | |
* | wip: extend the gebp kernel to optimize complex and mixed products | 2010-07-19 | |
| | |||
* | update mixing type test | 2010-07-15 | |
| | |||
* | update unit test for new API | 2010-07-15 | |
| | |||
* | add support for mixing type in trsv | 2010-07-13 | |
| | |||
* | optimize non fused MADD, and add a flatten attribute macro to enforce inlining within a function | 2010-07-13 | |
* | matrix product: move the alpha factor to gebp instead of the packing, clean some temporaries, etc. | 2010-07-12 | |
* | mixing types step 3: improve support of colmajor by vector and matrix - matrix; now all configurations are well handled, but the perf are not always very good | 2010-07-11 | |
* | make colmaj * vector uses pointers only | 2010-07-11 | |
| | |||
* | mixing types in product step 2: pload* and pset1 are now templated on the packet type; gemv routines are now embedded into a structure with a consistent API with respect to gemm; some configurations of vector * matrix and matrix * matrix work fine, some need more work... | 2010-07-11 | |
* | sync | 2010-07-10 | |
|\ | |||
| * | generalize rowmajor by vector; fix weird compilation error when constructing a matrix with a row by matrix product | 2010-07-10 | |
| * | fix compilation: make the check_coordinates* functions const | 2010-07-10 | |
| | | |||
| * | let ei_pset1 use _mm_loaddup_pd. Not a significant speed improvement, but also not a speed regression, and replaces 3 instructions by 1 single instruction. | 2010-07-09 | |
| * | Added NEON/Complex.h, ~3.5x faster than scalar std::complex<float>; minor fix in AltiVec Complex.h | 2010-07-10 | |
| * | disable MSVC optimization when the underlying compiler is ICC | 2010-07-09 | |
| | | |||
| * | move ei_conj_if to a more appropriate file | 2010-07-09 | |
| | | |||
| * | forgot to commit ei_p4f_FORWARD; | 2010-07-09 | |
| | | |||
| * | forgot to add the Complex.h include for AltiVec. | 2010-07-09 | |
| | | |||
| * | Altivec port of Complex.h. Note: For some reason g++ 4.4 is >200% slower than g++ 4.3 on altivec code. The same benchmark (bench_gemm) was tested, on the same hardware/OS (G4/Debian testing), with same CFLAGS. With some code reorganizing I managed to get some minor gain on 4.4, but I just could not reach 4.3 speed. This is most likely a bug, but I'm waiting to see if it's fixed on 4.5. I'll look into this a bit more. | 2010-07-09 | |
| * | Be consistent in how the tutorial pages link together. | 2010-07-09 | |
| | | |||
| * | Small changes to tutorial page 2 (matrix arithmetic): slightly more extensive discussion of aliasing; layout: put example code and output side-by-side; add some links, etc | 2010-07-09 | |
* | | fix a few weird issues with gcc 4.3 32bits and complex<float> | 2010-07-09 | |
| | | |||
| * | bench: use of Eigen/Array is deprecated + fix includes for iostream | 2010-07-09 | |
| | | |||
* | | fix SliceVectorizedTraversal for packetsize==1 | 2010-07-08 | |
| | | |||
* | | extend vectorization_logic | 2010-07-08 | |
| | | |||
| * | Added more redux types/examples in tutorial and fixed some display issues | 2010-07-08 | |
| | | |||
| * | Reductions/Broadcasting/Visitor Tutorial added to index | 2010-07-08 | |
| | | |||
| * | Reductions/Broadcasting/Visitor Tutorial added | 2010-07-08 | |
| | | |||
* | | scalars fitting in a single packet requires more work, step 1: add a, Alignable trait; update LinearVectorization assignment | 2010-07-08 | |
* | | compilation fix | 2010-07-08 | |
| | | |||
| * | enabling aligned loads/store for complex<double> is much more tricky, so the temporary fix is to always perform unaligned load/store | 2010-07-07 | |
* | | an attempt to fix wrong unaligned store | 2010-07-07 | |
| | | |||
* | | update to support mixing types | 2010-07-07 | |
| | | |||
* | | support for real * complex matrix product - step 1 (works for some special cases) | 2010-07-07 | |
| * | mention that array = matrix is fine too | 2010-07-07 | |
|/ |