path: root/Eigen/src/Core/arch/SSE/PacketMath.h
Commit message [Author, Date]
* bug #1195: move NumTraits::Div<>::Cost to internal::scalar_div_cost (with some specializations in arch/SSE and arch/AVX) [Gael Guennebaud, 2016-09-08]
* Implement pmadd for float and double to make it consistent with the vectorized path when FMA is available. [Gael Guennebaud, 2016-08-23]
* Remove now-unused protate PacketMath func [Benoit Jacob, 2016-05-24]
* Optimized implementation of the tanh function for SSE [Benoit Steiner, 2016-02-10]
* Remove custom unaligned loads for SSE. They were only useful for Core2 CPUs. [Gael Guennebaud, 2016-02-08]
* Fix "," in non-SSE4 mode [Gael Guennebaud, 2015-11-05]
* Add round, ceil and floor for SSE4.1/AVX (Bug #70) [Alexandre Avenel, 2015-11-01]
* bug #1085: workaround gcc default ABI issue [Gael Guennebaud, 2015-10-10]
* _mm_hadd_epi32 is for SSSE3 only (and not SSE3) [Gael Guennebaud, 2015-10-07]
* Handle various TODOs in SSE vectorization (remove split storeu, enable SSE3 integer vectorization, plus minor tweaks) [Gael Guennebaud, 2015-10-06]
* Fix prototype of plset and generalize linspace functor. [Gael Guennebaud, 2015-08-07]
* Let unpacket_traits<> expose the required alignment and make use of it everywhere [Gael Guennebaud, 2015-08-07]
* Added support for fast reciprocal square root computation. [Benoit Steiner, 2015-02-26]
* bug #955 - Implement a rotating kernel alternative in the 3px4 gebp path [Benoit Jacob, 2015-02-18]
  This is substantially faster on ARM, where it's important to minimize the number of loads. This is specific to the case where all packet types are of size 4. I made my best attempt to minimize how dirty this is... opinions welcome. Eventually one could have a generic rotated kernel, but it would take some work to get there. Also, on Sandy Bridge, in my experience, it's not beneficial (even about 1% slower).
* Disable __m128* wrappers when compiling with AVX and -fabi-version=4 [Gael Guennebaud, 2015-02-17]
* Fix compilation with GCC/AVX (workaround __m128 and __m256 being the same type with default ABI) [Gael Guennebaud, 2015-02-17]
* The usage of DenseIndex is deprecated, so let's replace DenseIndex with Index [Gael Guennebaud, 2015-02-16]
* merge Tensor module within Eigen/unsupported and update gemv BLAS wrapper [Gael Guennebaud, 2015-02-12]
* FMA has been wrongly disabled [Gael Guennebaud, 2015-02-10]
* Pulled the latest changes from the trunk [Benoit Steiner, 2015-02-06]
* bug #936, patch 2/3: Remove EIGEN_VECTORIZE_FMA, was redundant with EIGEN_HAS_SINGLE_INSTRUCTION_MADD [Benoit Jacob, 2015-01-30]
* bug #936, patch 1.5/3: rename _FUSED_ macros to _SINGLE_INSTRUCTION_, because this is what they are about. "Fused" means "no intermediate rounding between the mul and the add, only one rounding at the end". Instead, what we are concerned about here is whether a temporary register is needed, i.e. whether the MUL and ADD are separate instructions. Concretely, on ARM NEON, a single-instruction mul-add is always available: VMLA. But a true fused mul-add is only available on VFPv4: VFMA. [Benoit Jacob, 2015-01-31]
* Introduce unified macros to identify compiler, OS, and architecture. They are all defined in util/Macros.h and prefixed with EIGEN_COMP_, EIGEN_OS_, and EIGEN_ARCH_ respectively. [Gael Guennebaud, 2014-11-04]
* Pulled in the latest changes from the Eigen trunk [Benoit Steiner, 2014-08-13]
* Fix many long-to-int implicit conversions [Gael Guennebaud, 2014-07-08]
* Created the pblend packet primitive and implemented it using SSE and AVX instructions. [Benoit Steiner, 2014-06-06]
* Make sure that calls to broadcast4 are 16-byte aligned [Gael Guennebaud, 2014-04-25]
* Enable vectorization of pack_rhs with a column-major RHS. Rename and generalize Kernel<*> to PacketBlock<*,N>. [Gael Guennebaud, 2014-04-25]
* Enable fused madd for Altivec [Gael Guennebaud, 2014-04-24]
* Workaround gcc's default ABI not being able to distinguish between vector types of different sizes. [Gael Guennebaud, 2014-04-22]
* New gebp kernel handling up to 3 packets x 4 register-level blocks. Huge speedup on Haswell. This changeset also introduces new vector functions: ploadquad and predux4. [Gael Guennebaud, 2014-04-16]
* Optimized SSE unaligned loads and stores when compiling a 64-bit target with a recent version of gcc (i.e. gcc 4.8). [Benoit Steiner, 2014-04-14]
* Add a mechanism to recursively access half-size packet types [Gael Guennebaud, 2014-03-28]
* Implemented the SSE version of the gather and scatter packet primitives. [Benoit Steiner, 2014-03-27]
* Created the ptranspose packet primitive that can transpose an array of N packets, where N is the number of words in each packet. This primitive will be used to complete the vectorization of the gemm_pack_lhs and gemm_pack_rhs functions. Implemented the primitive using SSE instructions. [Benoit Steiner, 2014-03-26]
* Merged latest updates from the parent branch [Benoit Steiner, 2014-03-26]
* Implement new 1 packet x 8 gebp kernel [Gael Guennebaud, 2014-03-26]
* add pbroadcast2/4 generic intrinsics [Gael Guennebaud, 2014-03-26]
* Added support for FMA instructions [Benoit Steiner, 2014-02-24]
* Added support for AVX to Eigen. [Benoit Steiner, 2014-01-29]
* Revert previous change and introduce a new workaround for gcc generating a shufps instruction instead of the more efficient pshufd instruction. The trick consists of introducing a new pload1 function to be used in low-level product kernels for which bug #203 does not apply. Indeed, it turned out that using inline assembly prevents gcc from doing a good job at instruction reordering. [Gael Guennebaud, 2014-03-20]
* Make gcc generate a pshufd instruction for pset1 [Gael Guennebaud, 2014-03-20]
* Remove useless register keyword, and optimize predux_min/max for SSE4 [Gael Guennebaud, 2014-01-25]
* Fix bug #642: add vectorization of sqrt for doubles, and make sqrt really safe if EIGEN_FAST_MATH is disabled [Gael Guennebaud, 2013-08-19]
* Add missing pconj specializations [Gael Guennebaud, 2013-05-17]
* Add SSE4 min/max for integers [Gael Guennebaud, 2013-03-20]
* add SSE pexp function for double, make use of _mm_floor_p* for pexp with SSE4.1 [Gael Guennebaud, 2012-07-27]
* Automatic relicensing to MPL2 using Keir's script. Manual fixup follows. [Benoit Jacob, 2012-07-13]
* Get rid of include directives inside namespace blocks (bug #339). [Jitse Niesen, 2012-04-15]
* proper C++ casting [Gael Guennebaud, 2012-01-31]