Commit message | Author | Age | |
---|---|---|---|
* | bug #1195: move NumTraits::Div<>::Cost to internal::scalar_div_cost (with ↵ | 2016-09-08 | |
| | | | | some specializations in arch/SSE and arch/AVX) | ||
* | Implement pmadd for float and double to make it consistent with the ↵ | 2016-08-23 | |
| | | | | vectorized path when FMA is available. | ||
* | Remove now-unused protate PacketMath func | 2016-05-24 | |
| | |||
* | Optimized implementation of the tanh function for SSE | 2016-02-10 | |
| | |||
* | Remove custom unaligned loads for SSE. They were only useful for core2 CPU. | 2016-02-08 | |
| | |||
* | Fix "," in non SSE4 mode | 2015-11-05 | |
| | |||
* | Add round, ceil and floor for SSE4.1/AVX (Bug #70) | 2015-11-01 | |
| | |||
* | bug #1085: workaround gcc default ABI issue | 2015-10-10 | |
| | |||
* | _mm_hadd_epi32 is for SSSE3 only (and not SSE3) | 2015-10-07 | |
| | |||
* | Handle various TODOs in SSE vectorization (remove split storeu, enable ↵ | 2015-10-06 | |
| | | | | SSE3 integer vectorization, plus minor tweaks) | ||
* | Fix prototype of plset and generalize linspace functor. | 2015-08-07 | |
| | |||
* | Let unpacket_traits<> expose the required alignment and make use of it ↵ | 2015-08-07 | |
| | | | | everywhere | ||
* | Added support for fast reciprocal square root computation. | 2015-02-26 | |
| | |||
* | bug #955 - Implement a rotating kernel alternative in the 3px4 gebp path | 2015-02-18 | |
| | | | | | | | | This is substantially faster on ARM, where it's important to minimize the number of loads. This is specific to the case where all packet types are of size 4. I made my best attempt to minimize how dirty this is... opinions welcome. Eventually one could have a generic rotated kernel, but it would take some work to get there. Also, on sandy bridge, in my experience, it's not beneficial (even about 1% slower). | ||
* | Disable __m128* wrappers when compiling with AVX and -fabi-version=4 | 2015-02-17 | |
| | |||
* | Fix compilation with GCC/AVX (workaround __m128 and __m256 being the same ↵ | 2015-02-17 | |
| | | | | type with default ABI) | ||
* | The usage of DenseIndex is deprecated, so let's replace DenseIndex by Index | 2015-02-16 | |
| | |||
* | merge Tensor module within Eigen/unsupported and update gemv BLAS wrapper | 2015-02-12 | |
|\ | |||
* | | FMA has been wrongly disabled | 2015-02-10 | |
| | | |||
| * | Pulled the latest changes from the trunk | 2015-02-06 | |
| |\ | |/ |/| | |||
* | | bug #936, patch 2/3: Remove EIGEN_VECTORIZE_FMA, was redundant with ↵ | 2015-01-30 | |
| | | | | | | | | EIGEN_HAS_SINGLE_INSTRUCTION_MADD | ||
* | | bug #936, patch 1.5/3: rename _FUSED_ macros to _SINGLE_INSTRUCTION_, | 2015-01-31 | |
| | | | | | | | | | | | | | | | | | | because this is what they are about. "Fused" means "no intermediate rounding between the mul and the add, only one rounding at the end". Instead, what we are concerned about here is whether a temporary register is needed, i.e. whether the MUL and ADD are separate instructions. Concretely, on ARM NEON, a single-instruction mul-add is always available: VMLA. But a true fused mul-add is only available on VFPv4: VFMA. | ||
* | | Introduce unified macros to identify compiler, OS, and architecture. They ↵ | 2014-11-04 | |
| | | | | | | | | are all defined in util/Macros.h and prefixed with EIGEN_COMP_, EIGEN_OS_, and EIGEN_ARCH_ respectively. | ||
| * | Pulled in the latest changes from the Eigen trunk | 2014-08-13 | |
| |\ | |/ |/| | |||
* | | Fix many long to int implicit conversions | 2014-07-08 | |
| | | |||
| * | Created the pblend packet primitive and implemented it using SSE and AVX ↵ | 2014-06-06 | |
|/ | | | | instructions. | ||
* | Make sure that calls to broadcast4 are 16 bytes aligned | 2014-04-25 | |
| | |||
* | Enable vectorization of pack_rhs with a column-major RHS. | 2014-04-25 | |
| | | | | Rename and generalize Kernel<*> to PacketBlock<*,N>. | ||
* | Enable fused madd for Altivec | 2014-04-24 | |
| | |||
* | Workaround gcc's default ABI not being able to distinguish between vector ↵ | 2014-04-22 | |
| | | | | types of different sizes. | ||
* | New gebp kernel handling up to 3 packets x 4 register-level blocks. Huge ↵ | 2014-04-16 | |
| | | | | | | speedup on Haswell. This changeset also introduces new vector functions: ploadquad and predux4. | ||
* | Optimized SSE unaligned loads and stores when compiling a 64bit target with ↵ | 2014-04-14 | |
| | | | | a recent version of gcc (i.e. gcc 4.8). | ||
* | Add a mechanism to recursively access half-size packet types | 2014-03-28 | |
| | |||
* | Implemented the SSE version of the gather and scatter packet primitives. | 2014-03-27 | |
| | |||
* | Created the ptranspose packet primitive that can transpose an array of N ↵ | 2014-03-26 | |
| | | | | | | packets, where N is the number of words in each packet. This primitive will be used to complete the vectorization of the gemm_pack_lhs and gemm_pack_rhs functions. Implemented the primitive using SSE instructions. | ||
* | Merged latest updates from the parent branch | 2014-03-26 | |
|\ | |||
| * | Implement new 1 packet x 8 gebp kernel | 2014-03-26 | |
| | | |||
| * | add pbroadcast2/4 generic intrinsics | 2014-03-26 | |
| | | |||
* | | Added support for FMA instructions | 2014-02-24 | |
| | | |||
* | | Added support for AVX to Eigen. | 2014-01-29 | |
| | | |||
| * | Revert previous change and introduce a new workaround regarding gcc ↵ | 2014-03-20 | |
| | | | | | | | | | | | | | | generating a shufps instruction instead of the more efficient pshufd instruction. The trick consists in introducing a new pload1 function to be used in low level product kernels for which bug #203 does not apply. Indeed, it turned out that using inline assembly prevents gcc from doing a good job at instruction reordering. | ||
| * | Make gcc generate a pshufd instruction for pset1 | 2014-03-20 | |
|/ | |||
* | Remove useless register keyword, and optimize predux_min/max for SSE4 | 2014-01-25 | |
| | |||
* | Fix bug #642: add vectorization of sqrt for doubles, and make sqrt really ↵ | 2013-08-19 | |
| | | | | safe if EIGEN_FAST_MATH is disabled | ||
* | Add missing pconj specializations | 2013-05-17 | |
| | |||
* | Add SSE4 min/max for integers | 2013-03-20 | |
| | |||
* | add SSE pexp function for double, make use of _mm_floor_p* for pexp with SSE4.1 | 2012-07-27 | |
| | |||
* | Automatic relicensing to MPL2 using Keir's script. Manual fixup follows. | 2012-07-13 | |
| | |||
* | Get rid of include directives inside namespace blocks (bug #339). | 2012-04-15 | |
| | |||
* | proper C++ casting | 2012-01-31 | |
| |
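
The 2015-01-31 entry above (bug #936, patch 1.5/3) separates a "fused" multiply-add, which rounds once at the end, from a merely "single-instruction" one. The following is a minimal sketch of that rounding difference in standard C++, not code from the library: `mul_then_add`, `fused_mul_add`, and `check_rounding_difference` are hypothetical names introduced here for illustration, and the inputs are chosen so that the exact product carries a low-order bit that a `double` cannot hold.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: multiply, then add, with the product rounded to
// double in between. The volatile store keeps the compiler from
// contracting the two operations into one fused instruction.
double mul_then_add(double a, double b, double c) {
  volatile double product = a * b;  // first rounding happens here
  return product + c;               // second rounding
}

// Hypothetical helper: fused multiply-add via the standard library,
// i.e. a * b + c evaluated with a single rounding at the end.
double fused_mul_add(double a, double b, double c) {
  return std::fma(a, b, c);
}

// With a = 1 + 2^-27, the exact square is 1 + 2^-26 + 2^-54.
// The 2^-54 term is below half an ulp of a double near 1.0, so it is
// dropped when the product is rounded on its own.
void check_rounding_difference() {
  const double a = 1.0 + std::ldexp(1.0, -27);
  const double c = -(1.0 + std::ldexp(1.0, -26));
  assert(mul_then_add(a, a, c) == 0.0);                    // 2^-54 lost
  assert(fused_mul_add(a, a, c) == std::ldexp(1.0, -54));  // 2^-54 kept
}
```

Calling `check_rounding_difference()` from a test or from `main` exercises both paths. The sketch assumes IEEE 754 double precision and a build without value-changing floating-point flags such as `-ffast-math`.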