Commit message | Author | Age | ||
---|---|---|---|---|
... | ||||
* | First step toward a unification of packet log implementation; currently only SSE and AVX are unified. To this end, I added the following functions: pzero, pcmp_*, pfrexp, pset1frombits. | Gael Guennebaud | 2018-11-26 | |
* | Make SSE/AVX pandnot(A,B) consistent with generic version, i.e., "A and not B" | Gael Guennebaud | 2018-11-26 | |
| | ||||
* | bug #1605: workaround ABI issue with vector types (aka __m128) versus scalar types (aka float) | Gael Guennebaud | 2018-10-01 | |
* | remove double ;; | Gael Guennebaud | 2018-07-12 | |
| | ||||
* | Fix compilation with MSVC by reverting to char* for _mm_prefetch except for PGI (the latter being the one that has the wrong prototype). | Gael Guennebaud | 2018-06-07 | |
* | Define pcast<> for SSE types even when AVX is enabled (otherwise floats are silently reinterpreted as ints instead of being converted). | Gael Guennebaud | 2018-05-29 | |
* | Fix compilation and SSE support with PGI compiler | Gael Guennebaud | 2018-05-29 | |
| | ||||
* | Misc. source and comment typos. Found using `codespell` and `grep` from downstream FreeCAD. | luz.paz | 2018-03-11 | |
* | bug #1436: fix compilation of Jacobi rotations with ARM NEON, some specializations of internal::conj_helper were missing. | Gael Guennebaud | 2017-06-15 | |
* | Make NaN propagation consistent between pmax/pmin and std::max/std::min. This makes the NaN propagation consistent between the scalar and vectorized code paths of Eigen's scalar_max_op and scalar_min_op. See #1373 for details. | Rasmus Munk Larsen | 2017-01-24 | |
* | bug #1363: fix mingw's ABI issue | Gael Guennebaud | 2016-12-15 | |
| | ||||
* | Disable usage of SSE3 _mm_hadd_ps that is extremely slow. | Gael Guennebaud | 2016-11-22 | |
| | ||||
* | Disable usage of SSE3 haddpd that is extremely slow. | Gael Guennebaud | 2016-11-22 | |
| | ||||
* | Add pinsertfirst function and implement pinsertlast for complex on SSE/AVX. | Gael Guennebaud | 2016-11-02 | |
| | ||||
* | Add missing inline keywords | Gael Guennebaud | 2016-10-25 | |
| | ||||
* | Fixed a typo | Benoit Steiner | 2016-10-25 | |
| | ||||
* | Add a pinsertlast function replacing the last entry of a packet by a scalar (useful to vectorize LinSpaced). | Gael Guennebaud | 2016-10-25 | |
* | Update comment for fast sqrt. | Rasmus Munk Larsen | 2016-10-04 | |
| | ||||
* | Fix a bug in the implementation of Carmack's fast sqrt algorithm in Eigen (enabled by EIGEN_FAST_MATH), which causes the vectorized parts of the computation to return -0.0 instead of NaN for negative arguments. Benchmark speed in Giga-sqrts/s on an Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz: Fast=1: SSE 2.529G, AVX 4.380G; Fast=0: SSE 1.944G, AVX 1.898G; Fast=1 fixed: SSE 2.214G, AVX 3.739G. These figures illustrate the worst case in terms of speed impact: they were measured by repeatedly computing the sqrt of an n=4096 float vector that fits in L1 cache. For large vectors the operation becomes memory bound and the differences between the versions are almost negligible. | Rasmus Munk Larsen | 2016-10-04 | |
* | bug #1195: move NumTraits::Div<>::Cost to internal::scalar_div_cost (with some specializations in arch/SSE and arch/AVX) | Gael Guennebaud | 2016-09-08 | |
* | bug #1167: simplify installation of header files using cmake's install(DIRECTORY ...) command. | Gael Guennebaud | 2016-08-29 | |
* | Implement pmadd for float and double to make it consistent with the vectorized path when FMA is available. | Gael Guennebaud | 2016-08-23 | |
* | Factorize the 4 copies of tanh implementations, make numext::tanh consistent with array::tanh, enable fast tanh in fast-math mode only. | Gael Guennebaud | 2016-08-23 | |
* | Remove now-unused protate PacketMath func | Benoit Jacob | 2016-05-24 | |
| | ||||
* | Improved implementation of ptanh for SSE and AVX | Benoit Steiner | 2016-02-18 | |
| | ||||
* | Avoid implicit cast from double to float. | Benoit Steiner | 2016-02-10 | |
| | ||||
* | Optimized implementation of the tanh function for SSE | Benoit Steiner | 2016-02-10 | |
| | ||||
* | Make the GCC workaround for sqrt GCC-only; detect Emscripten as non-GCC | Benoit Jacob | 2016-02-10 | |
| | ||||
* | Work around Emscripten bug - https://github.com/kripken/emscripten/issues/4088 | Benoit Jacob | 2016-02-10 | |
| | ||||
* | Remove custom unaligned loads for SSE. They were only useful for core2 CPU. | Gael Guennebaud | 2016-02-08 | |
| | ||||
* | Fix compilation on old gcc+AVX | Gael Guennebaud | 2016-01-21 | |
| | ||||
* | Add numext::sqrt function to enable custom optimized implementation. This changeset adds two specializations for float/double on SSE. Those are mostly useful with GCC, for which std::sqrt adds an extra and costly check on the result of _mm_sqrt_*. Clang does not add this burden. In this changeset, only DenseBase::norm() makes use of it. | Gael Guennebaud | 2016-01-21 | |
* | Workaround clang -Wdocumentation warning about "/*<" | Gael Guennebaud | 2015-12-30 | |
| | ||||
* | Fix "," in non SSE4 mode | Gael Guennebaud | 2015-11-05 | |
| | ||||
* | Add round, ceil and floor for SSE4.1/AVX (Bug #70) | Alexandre Avenel | 2015-11-01 | |
| | ||||
* | bug #1085: workaround gcc default ABI issue | Gael Guennebaud | 2015-10-10 | |
| | ||||
* | _mm_hadd_epi32 is for SSSE3 only (and not SSE3) | Gael Guennebaud | 2015-10-07 | |
| | ||||
* | Handle various TODOs in SSE vectorization (remove split storeu, enable SSE3 integer vectorization, plus minor tweaks) | Gael Guennebaud | 2015-10-06 | |
* | Fix prototype of plset and generalize linspace functor. | Gael Guennebaud | 2015-08-07 | |
| | ||||
* | Let unpacket_traits<> expose the required alignment and make use of it everywhere | Gael Guennebaud | 2015-08-07 | |
* | Added an optimized version of rsqrt for SSE and AVX that is used when EIGEN_FAST_MATH is defined. | Benoit Steiner | 2015-03-02 | |
* | Switch to truncated casting when converting floating point types to integer. This ensures that vectorized casts are consistent with scalar casts. | Benoit Steiner | 2015-02-27 | |
* | Added support for vectorized type casting of tensors | Benoit Steiner | 2015-02-27 | |
| | ||||
* | Added support for fast reciprocal square root computation. | Benoit Steiner | 2015-02-26 | |
| | ||||
* | bug #955 - Implement a rotating kernel alternative in the 3px4 gebp path. This is substantially faster on ARM, where it's important to minimize the number of loads. This is specific to the case where all packet types are of size 4. I made my best attempt to minimize how dirty this is... opinions welcome. Eventually one could have a generic rotated kernel, but it would take some work to get there. Also, on Sandy Bridge, in my experience, it's not beneficial (even about 1% slower). | Benoit Jacob | 2015-02-18 | |
* | Remove some dead stores. | Gael Guennebaud | 2015-02-18 | |
| | ||||
* | Disable __m128* wrappers when compiling with AVX and -fabi-version=4 | Gael Guennebaud | 2015-02-17 | |
| | ||||
* | Fix compilation with GCC/AVX (workaround __m128 and __m256 being the same type with default ABI) | Gael Guennebaud | 2015-02-17 | |
* | The usage of DenseIndex is deprecated, so let's replace DenseIndex by Index | Gael Guennebaud | 2015-02-16 | |
| | ||||
* | merge Tensor module within Eigen/unsupported and update gemv BLAS wrapper | Gael Guennebaud | 2015-02-12 | |
|\ |
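
The 2017-01-24 entry above makes NaN propagation in pmax/pmin consistent with std::max/std::min. As a rough illustration of what that consistency means, the sketch below applies the comparison std::max is specified to use, `(a < b) ? b : a`, lane by lane. The `Packet4` alias and the `pmax_like_std` and `matches_std_max` helpers are hypothetical names introduced only for this example; they are not Eigen's types or functions, and the real SSE/AVX code uses intrinsics rather than a scalar loop.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>

// Hypothetical 4-lane "packet" for illustration only (not Eigen's Packet4f).
using Packet4 = std::array<float, 4>;

// Lane-wise maximum using the same comparison as std::max: (a < b) ? b : a.
// Every comparison involving NaN is false, so the first argument is
// returned whenever either operand of a lane is NaN.
inline Packet4 pmax_like_std(const Packet4& a, const Packet4& b) {
  Packet4 r{};
  for (std::size_t i = 0; i < r.size(); ++i) {
    r[i] = (a[i] < b[i]) ? b[i] : a[i];
  }
  return r;
}

// Checks that every lane agrees with std::max, treating NaN as equal to NaN.
inline bool matches_std_max(const Packet4& a, const Packet4& b) {
  const Packet4 r = pmax_like_std(a, b);
  for (std::size_t i = 0; i < r.size(); ++i) {
    const float s = std::max(a[i], b[i]);
    const bool same = (std::isnan(r[i]) && std::isnan(s)) || r[i] == s;
    if (!same) return false;
  }
  return true;
}
```

With this comparison the result depends on argument order: a NaN in the first operand propagates, while a NaN in the second operand is dropped in favor of the first. That order dependence is the behavior a vectorized path has to reproduce to agree with the scalar one.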