path: root/Eigen/src/Core/arch/SSE/PacketMath.h
Commit message (author, date)
* * fix compilation of mixed scalar product (Gael Guennebaud, 2010-07-19)
|   * optimize mixed scalar products
* * _mm_loaddup_pd is slow (Gael Guennebaud, 2010-07-19)
|   * optimize SSE ei_ploaddup<Packet4f>
* wip: extend the gebp kernel to optimize complex and mixed products (Gael Guennebaud, 2010-07-19)
|
* mixing types in product step 2: (Gael Guennebaud, 2010-07-11)
|   * pload* and pset1 are now templated on the packet type
|   * gemv routines are now embedded into a structure with a consistent API with respect to gemm
|   * some configurations of vector * matrix and matrix * matrix work fine, some need more work...
* sync (Gael Guennebaud, 2010-07-10)
|\
| * let ei_pset1 use _mm_loaddup_pd. Not a significant speed improvement, but also not a speed regression, and replaces 3 instructions by 1 single instruction. (Benoit Jacob, 2010-07-09)
| |
| * disable MSVC optimization when the underlying compiler is ICC (Gael Guennebaud, 2010-07-09)
| |
* | scalars fitting in a single packet requires more work, step 1 (Gael Guennebaud, 2010-07-08)
|/
|   * add an Alignable trait
|   * update LinearVectorization assignment
* optimize pmul for complex<double> (Gael Guennebaud, 2010-07-07)
|
* s/IsVectorized/Vectorizable (Gael Guennebaud, 2010-07-07)
|
* * add a IsVectorized mechanism (instead of packet-size>1...) (Gael Guennebaud, 2010-07-06)
|   * vectorize complex<double>
* add support for vectorized conjugated products (Gael Guennebaud, 2010-07-06)
|
* * extend the Has* packet traits and make all functors use them (Gael Guennebaud, 2010-07-05)
|   * extend the packing routines to support conjugation
* fix very annoying warning (gcc 4.3): type qualifiers ignored on function return type (Gael Guennebaud, 2010-06-25)
|
* email change (Gael Guennebaud, 2010-06-24)
|
* (proper commit this time) (Konstantinos Margaritis, 2010-04-24)
|   replaced _mm_prefetch in GeneralBlockPanelKernel.h, with ei_prefetch() inline function. Implemented NEON and AltiVec versions, copied SSE version over from GeneralBlockPanelKernel.h. Also in GCC case (or rather !_MSC_VER) it's implemented using __builtin_prefetch(). NEON managed to give a small but welcome boost, 0.88GFLOPS -> 0.91GFLOPS.
* Backed out changeset 6972c140f737874d88da0e225c7c27b4563a4518 (Konstantinos Margaritis, 2010-04-24)
|
* replaced _mm_prefetch in GeneralBlockPanelKernel.h, with ei_prefetch() inline function. (oem, 2010-04-24)
|   Implemented NEON and AltiVec versions, copied SSE version over from GeneralBlockPanelKernel.h. Also in GCC case (or rather !_MSC_VER) it's implemented using __builtin_prefetch(). NEON managed to give a small but welcome boost, 0.88GFLOPS -> 0.91GFLOPS.
* Reintroduced the if-clause for MSVC ei_ploadu via _loadu_. (Hauke Heibel, 2010-03-07)
|
* merge with default branch (Gael Guennebaud, 2010-03-04)
|\
| * Provide "eigen" defines to decide which instruction set is used (sse3, ssse3 and sse4), independently from the compiler. (Thomas Capricelli, 2010-02-24)
| |   Only those defines should be used in other places, and the user can rely on those to know which sets are used.
* | significant speedup in the matrix-matrix products (Gael Guennebaud, 2010-02-23)
|/
* Added an ei_linspaced_op to create linearly spaced vectors. (Hauke Heibel, 2010-01-26)
|   Added setLinSpaced/LinSpaced functionality to DenseBase.
|   Improved vectorized assignment - overcomes MSVC optimization issues.
|   CwiseNullaryOp is now requiring functors to offer 1D and 2D operators.
|   Adapted existing functors to the new CwiseNullaryOp requirements.
|   Added ei_plset to create packages as [a, a+1, ..., a+size].
|   Added more nullary unit tests.
* Fixed conservativeResize. (Hauke Heibel, 2010-01-11)
|   Fixed multiple overloads for operator=.
|   Removed debug output.
* merge with default branch (Gael Guennebaud, 2009-12-22)
|\
| * * fix aliasing checks when the lhs is also transposed. At the same time, significantly simplify the code of these checks while extending them to catch many more expressions! (Gael Guennebaud, 2009-12-16)
| |   * move the enabling/disabling of vectorized sin/cos to the architecture traits
| * add SSE4 support, start with integer multiplication (Benoit Jacob, 2009-11-24)
| |
* | Hey, finally the copyCoeff stuff is not only used to implement swap anymore :) (Gael Guennebaud, 2009-11-20)
|/
|   Add an internal pseudo expression allowing to optimize operators like +=, *= using the copyCoeff stuff. This allows to easily enforce aligned load for the destination matrix everywhere.
* * merge (Benoit Jacob, 2009-11-09)
|\
| |   * remove a ctor in QuaternionBase as it gives a strange error with GCC 4.4.2.
| * Let's try to stick to the original code, thus activate the fix of #62 only for 64 bit builds. (Hauke Heibel, 2009-11-04)
| |
| * Direct access of the packet structs fixes bug #62 and does not seem to influence compiler optimization. (Hauke Heibel, 2009-11-04)
|/
* we were already aligning to 16 byte boundary fixed-size objects that are multiple of 16 bytes; now we also align to 8byte boundary fixed-size objects that are multiple of 8 bytes. (Benoit Jacob, 2009-10-05)
|   That's only useful for now for double, not e.g. for Vector2f, but that didn't seem to hurt. Am I missing something? Do you prefer that we don't align Vector2f at all?
|   Also, improvements in test_unalignedassert.
* clean the commented asm instructions because now I'm sure the previous fix is ok (Gael Guennebaud, 2009-09-17)
|
* fix #53: performance regression, hopefully I did not resurrect another perf. issue... (Gael Guennebaud, 2009-09-17)
|
* make custom asm directive volatile (Gael Guennebaud, 2009-08-09)
|
* * implement a second level of micro blocking (faster for small sizes) (Gael Guennebaud, 2009-08-07)
|   * workaround GCC bad implementation of _mm_set1_p*
* finally directly calling the low-level products is faster (Gael Guennebaud, 2009-07-10)
|
* only disable the inline ASM if we're NEITHER gcc nor icc. right ?? (Benoit Jacob, 2009-06-26)
|
* re-enable the fast unaligned loads for gcc and icc using inline assembly (Gael Guennebaud, 2009-06-24)
|   (this allows to avoid incompatible pointer casts and to specify the dependency to the data explicitly)
* use the slower unaligned load intrinsics in ei_ploadu because GCC messes up with my tricks (Gael Guennebaud, 2009-06-23)
|
* remove sentence "Eigen itself is part of the KDE project." (Benoit Jacob, 2009-05-22)
|   it never made very precise sense. but now does it still make any?
* add vectorization of sqrt for float (Gael Guennebaud, 2009-03-27)
|
* add SSE2 versions of sin, cos, log, exp using code from Julien Pommier. (Gael Guennebaud, 2009-03-25)
|   They are for float only, and they return exactly the same result as the standard versions in about 90% of the cases. Otherwise the max error is below 1e-7. However, for very large values (>1e3) the accuracy of sin and cos slightly decreases. They are about 3 or 4 times faster than 4 calls to their respective standard versions. So, is it ok to enable them by default in their respective functors ?
* add vectorization of unary operator-() (the AltiVec version is probably broken) (Gael Guennebaud, 2009-03-20)
|
* add the vectorization of abs (Gael Guennebaud, 2009-03-09)
|
* slight optimization of SSE base integer mul (thanks to Rohit Garg) (Gael Guennebaud, 2009-03-08)
|
* add much faster versions of unaligned stores (and slightly faster unaligned loads) (Gael Guennebaud, 2009-03-03)
|
* * exit Sum.h, exit Prod.h, welcome vectorization of redux() ! (Gael Guennebaud, 2009-02-12)
|   * add vectorization for minCoeff and maxCoeff
* * add ei_predux_mul internal function (Gael Guennebaud, 2009-02-10)
|   * apply Ricard Marxer's prod() patch with fixes for the vectorized path
* Add vectorization of Reverse (was more tricky than I thought) and simplify the index based functions (Gael Guennebaud, 2009-02-06)