path: root/Eigen/src/Core/arch/SSE
Commit message (Author, Date)
...
* First step toward a unification of the packet log implementation; currently only SSE and AVX are unified. To this end, I added the following functions: pzero, pcmp_*, pfrexp, and pset1frombits. (Gael Guennebaud, 2018-11-26)
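For readers unfamiliar with these helpers, the following is a scalar sketch of what two of them plausibly do per lane. It assumes that `pset1frombits` fills a packet with a raw IEEE-754 bit pattern and that `pfrexp` follows the semantics of `std::frexp`. The names `from_bits` and `split_exponent` are hypothetical and are not Eigen functions.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical scalar stand-in for one lane of pset1frombits:
// reinterpret a 32-bit pattern as a float without converting its value.
inline float from_bits(std::uint32_t bits) {
  float f;
  std::memcpy(&f, &bits, sizeof f);
  return f;
}

// Hypothetical scalar stand-in for one lane of pfrexp (assumed semantics):
// split x into a mantissa in [0.5, 1) and a power-of-two exponent,
// exactly as std::frexp does.
inline float split_exponent(float x, int& exponent) {
  return std::frexp(x, &exponent);
}
```

Building constants from bit patterns and separating mantissa from exponent are the usual first steps of a vectorized logarithm, which may be why these helpers were introduced together.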
* Make SSE/AVX pandnot(A,B) consistent with the generic version, i.e., "A and not B" (Gael Guennebaud, 2018-11-26)
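The operand order matters here because the SSE andnot instruction negates its first operand, whereas the generic convention negates the second. The sketch below models both on plain 32-bit integers. All three function names are hypothetical, and the operand swap is an assumption about how such a wrapper can be written, not a copy of Eigen's code.

```cpp
#include <cstdint>

// Generic convention named in the commit message: "A and not B".
inline std::uint32_t pandnot_generic(std::uint32_t a, std::uint32_t b) {
  return a & ~b;
}

// Bitwise model of the SSE andnot instruction, which negates its FIRST operand.
inline std::uint32_t sse_andnot_model(std::uint32_t a, std::uint32_t b) {
  return ~a & b;
}

// Matching the generic convention on top of the SSE-style primitive
// therefore requires swapping the operands.
inline std::uint32_t pandnot_via_sse(std::uint32_t a, std::uint32_t b) {
  return sse_andnot_model(b, a);
}
```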
* bug #1605: work around ABI issue with vector types (aka __m128) versus scalar types (aka float) (Gael Guennebaud, 2018-10-01)
* Remove double ;; (Gael Guennebaud, 2018-07-12)
* Fix compilation with MSVC by reverting to char* for _mm_prefetch except for PGI (the latter being the one that has the wrong prototype). (Gael Guennebaud, 2018-06-07)
* Define pcast<> for SSE types even when AVX is enabled (otherwise floats are silently reinterpreted as ints instead of being converted). (Gael Guennebaud, 2018-05-29)
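The difference between reinterpreting and converting is easy to see on a single lane. This is a minimal scalar sketch with hypothetical function names; it does not reproduce the packet code involved in the fix.

```cpp
#include <cstdint>
#include <cstring>

// Value conversion: what a cast is expected to do (1.0f becomes 1).
inline std::int32_t convert_lane(float x) {
  return static_cast<std::int32_t>(x);
}

// Bit reinterpretation: the same 32 bits, read as an integer.
// For 1.0f this yields the IEEE-754 encoding 0x3f800000, not 1.
inline std::int32_t reinterpret_lane(float x) {
  std::int32_t i;
  std::memcpy(&i, &x, sizeof i);
  return i;
}
```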
* Fix compilation and SSE support with the PGI compiler (Gael Guennebaud, 2018-05-29)
* Misc. source and comment typos. Found using `codespell` and `grep` from downstream FreeCAD. (luz.paz, 2018-03-11)
* bug #1436: fix compilation of Jacobi rotations with ARM NEON; some specializations of internal::conj_helper were missing. (Gael Guennebaud, 2017-06-15)
* Make NaN propagation consistent between pmax/pmin and std::max/std::min. This makes the NaN propagation consistent between the scalar and vectorized code paths of Eigen's scalar_max_op and scalar_min_op. See #1373 for details. (Rasmus Munk Larsen, 2017-01-24)
* bug #1363: fix mingw's ABI issue (Gael Guennebaud, 2016-12-15)
* Disable usage of SSE3 _mm_hadd_ps that is extremely slow. (Gael Guennebaud, 2016-11-22)
* Disable usage of SSE3 haddpd that is extremely slow. (Gael Guennebaud, 2016-11-22)
* Add pinsertfirst function and implement pinsertlast for complex on SSE/AVX. (Gael Guennebaud, 2016-11-02)
* Add missing inline keywords (Gael Guennebaud, 2016-10-25)
* Fixed a typo (Benoit Steiner, 2016-10-25)
* Add a pinsertlast function replacing the last entry of a packet by a scalar (useful to vectorize LinSpaced). (Gael Guennebaud, 2016-10-25)
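The operation can be modeled on a plain array. In the sketch below, `Packet4`, `insert_last`, and `insert_first` are hypothetical names, and the real packet types wrap SIMD registers rather than `std::array`. Presumably the helper lets a vectorized LinSpaced overwrite the final lane with the exact upper bound, though the log does not spell that out.

```cpp
#include <array>

// Hypothetical 4-lane packet; Eigen's real packet types wrap SIMD registers.
using Packet4 = std::array<float, 4>;

// Return a copy of `p` with its last lane replaced by `s`.
inline Packet4 insert_last(Packet4 p, float s) {
  p.back() = s;
  return p;
}

// Return a copy of `p` with its first lane replaced by `s`
// (the counterpart operation, pinsertfirst, appears elsewhere in this log).
inline Packet4 insert_first(Packet4 p, float s) {
  p.front() = s;
  return p;
}
```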
* Update comment for fast sqrt. (Rasmus Munk Larsen, 2016-10-04)
* Fix a bug in the implementation of Carmack's fast sqrt algorithm in Eigen (enabled by EIGEN_FAST_MATH), which caused the vectorized parts of the computation to return -0.0 instead of NaN for negative arguments. (Rasmus Munk Larsen, 2016-10-04)

  Benchmark speed in Giga-sqrts/s on an Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz:

                   SSE       AVX
    Fast=1         2.529G    4.380G
    Fast=0         1.944G    1.898G
    Fast=1 fixed   2.214G    3.739G

  This table illustrates the worst case in terms of speed impact: it was measured by repeatedly computing the sqrt of an n=4096 float vector that fits in L1 cache. For large vectors the operation becomes memory bound and the differences between the versions become almost negligible.
* bug #1195: move NumTraits::Div<>::Cost to internal::scalar_div_cost (with some specializations in arch/SSE and arch/AVX) (Gael Guennebaud, 2016-09-08)
* bug #1167: simplify installation of header files using cmake's install(DIRECTORY ...) command. (Gael Guennebaud, 2016-08-29)
* Implement pmadd for float and double to make it consistent with the vectorized path when FMA is available. (Gael Guennebaud, 2016-08-23)
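Consistency matters because a fused multiply-add rounds once, while a separate multiply and add round twice, so the two can return different results for the same inputs. The sketch below shows the two behaviours with hypothetical function names. It illustrates the numerical difference only and is not Eigen's pmadd.

```cpp
#include <cmath>

// Unfused multiply-add: the product is rounded before the addition.
// `volatile` keeps the compiler from contracting the two steps into an FMA.
inline double madd_unfused(double a, double b, double c) {
  volatile double product = a * b;
  return product + c;
}

// Fused multiply-add: a * b + c is computed exactly and rounded only once.
inline double madd_fused(double a, double b, double c) {
  return std::fma(a, b, c);
}
```

For example, with a = 1 + 2^-27 and c = -(1 + 2^-26), the exact value of a*a + c is 2^-54. The fused version returns that value, while the unfused version returns 0 because the 2^-54 term is lost when the product is rounded.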
* Factorize the 4 copies of tanh implementations, make numext::tanh consistent with array::tanh, enable fast tanh in fast-math mode only. (Gael Guennebaud, 2016-08-23)
* Remove now-unused protate PacketMath func (Benoit Jacob, 2016-05-24)
* Improved implementation of ptanh for SSE and AVX (Benoit Steiner, 2016-02-18)
* Avoid implicit cast from double to float. (Benoit Steiner, 2016-02-10)
* Optimized implementation of the tanh function for SSE (Benoit Steiner, 2016-02-10)
* Make the GCC workaround for sqrt GCC-only; detect Emscripten as non-GCC (Benoit Jacob, 2016-02-10)
* Work around Emscripten bug - https://github.com/kripken/emscripten/issues/4088 (Benoit Jacob, 2016-02-10)
* Remove custom unaligned loads for SSE. They were only useful for core2 CPU. (Gael Guennebaud, 2016-02-08)
* Fix compilation on old gcc+AVX (Gael Guennebaud, 2016-01-21)
* Add numext::sqrt function to enable custom optimized implementation. (Gael Guennebaud, 2016-01-21)
  This changeset adds two specializations for float/double on SSE. Those are mostly useful with GCC, for which std::sqrt adds an extra and costly check on the result of _mm_sqrt_*. Clang does not add this burden. In this changeset, only DenseBase::norm() makes use of it.
* Work around clang -Wdocumentation warning about "/*<" (Gael Guennebaud, 2015-12-30)
* Fix "," in non SSE4 mode (Gael Guennebaud, 2015-11-05)
* Add round, ceil and floor for SSE4.1/AVX (Bug #70) (Alexandre Avenel, 2015-11-01)
* bug #1085: work around gcc default ABI issue (Gael Guennebaud, 2015-10-10)
* _mm_hadd_epi32 is for SSSE3 only (and not SSE3) (Gael Guennebaud, 2015-10-07)
* Handle various TODOs in SSE vectorization (remove split storeu, enable SSE3 integer vectorization, plus minor tweaks) (Gael Guennebaud, 2015-10-06)
* Fix prototype of plset and generalize linspace functor. (Gael Guennebaud, 2015-08-07)
* Let unpacket_traits<> expose the required alignment and make use of it everywhere (Gael Guennebaud, 2015-08-07)
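The general pattern is a traits class that records each packet type's alignment requirement, so generic code can query it instead of hard-coding a constant. The following is a minimal sketch of that pattern. The names `packet_info`, `Packet4fModel`, `Packet2dModel`, and `is_aligned_for` are hypothetical, and the layout of Eigen's actual unpacket_traits is not shown in this log.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical packet types standing in for SIMD register wrappers.
struct alignas(16) Packet4fModel { float lanes[4]; };
struct alignas(16) Packet2dModel { double lanes[2]; };

// Hypothetical traits class: the primary template is left undefined so that
// asking about an unknown packet type is a compile-time error.
template <typename Packet> struct packet_info;

template <> struct packet_info<Packet4fModel> {
  using scalar = float;
  static constexpr std::size_t size = 4;        // lanes per packet
  static constexpr std::size_t alignment = 16;  // required alignment in bytes
};

template <> struct packet_info<Packet2dModel> {
  using scalar = double;
  static constexpr std::size_t size = 2;        // lanes per packet
  static constexpr std::size_t alignment = 16;  // required alignment in bytes
};

// Generic code can then query the requirement instead of hard-coding 16.
template <typename Packet>
inline bool is_aligned_for(const void* p) {
  return reinterpret_cast<std::uintptr_t>(p) % packet_info<Packet>::alignment == 0;
}
```

Keeping the requirement next to the packet definition means a wider packet type only needs a new specialization, not changes at every call site.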
* Added an optimized version of rsqrt for SSE and AVX that is used when EIGEN_FAST_MATH is defined. (Benoit Steiner, 2015-03-02)
* Switch to truncated casting when converting floating point types to integer. This ensures that vectorized casts are consistent with scalar casts. (Benoit Steiner, 2015-02-27)
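In C++, a scalar `static_cast` from float to int truncates toward zero, so a vectorized cast that rounds to nearest would disagree with it on values such as 2.7. SSE offers both behaviours (`_mm_cvttps_epi32` truncates, `_mm_cvtps_epi32` rounds according to the current rounding mode). The scalar sketch below contrasts the two using hypothetical function names; which intrinsic the commit uses is not visible in this log.

```cpp
#include <cmath>
#include <cstdint>

// Truncating conversion: discards the fractional part (rounds toward zero),
// which is what a C++ static_cast from float to int does.
inline std::int32_t cast_truncate(float x) {
  return static_cast<std::int32_t>(x);
}

// Round-to-nearest conversion under the default rounding mode,
// shown for contrast.
inline std::int32_t cast_round(float x) {
  return static_cast<std::int32_t>(std::lrint(x));
}
```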
* Added support for vectorized type casting of tensors (Benoit Steiner, 2015-02-27)
* Added support for fast reciprocal square root computation. (Benoit Steiner, 2015-02-26)
* bug #955 - Implement a rotating kernel alternative in the 3px4 gebp path (Benoit Jacob, 2015-02-18)
  This is substantially faster on ARM, where it's important to minimize the number of loads. This is specific to the case where all packet types are of size 4. I made my best attempt to minimize how dirty this is... opinions welcome. Eventually one could have a generic rotated kernel, but it would take some work to get there. Also, on Sandy Bridge, in my experience, it's not beneficial (even about 1% slower).
* Remove some dead stores. (Gael Guennebaud, 2015-02-18)
* Disable __m128* wrappers when compiling with AVX and -fabi-version=4 (Gael Guennebaud, 2015-02-17)
* Fix compilation with GCC/AVX (workaround __m128 and __m256 being the same type with default ABI) (Gael Guennebaud, 2015-02-17)
* The usage of DenseIndex is deprecated, so let's replace DenseIndex by Index (Gael Guennebaud, 2015-02-16)
* merge Tensor module within Eigen/unsupported and update gemv BLAS wrapper (Gael Guennebaud, 2015-02-12)