Commit message | Author | Age | ||
---|---|---|---|---|
... | ||||
* | bug #1674: workaround clang fast-math aggressive optimizations | Gael Guennebaud | 2019-02-22 | |
* | bug #1674: disable GCC's unsafe-math-optimizations in sin/cos vectorization (results are completely wrong otherwise) | Gael Guennebaud | 2019-02-03 | |
* | PR 571: Implements an accurate argument reduction algorithm for huge inputs of sin/cos and calls it instead of falling back to std::sin/std::cos. This makes both the small and huge argument cases faster because: for small inputs, this removes the last pselect; for large inputs, only the reduction part follows a scalar path, and the rest uses the same SIMD path as the small-argument case. | Gael Guennebaud | 2019-01-14 | |
* | Replace compiler's alignas/alignof extension by respective c++11 keywords when available. This also fixes a compilation issue with gcc-4.7. | Gael Guennebaud | 2019-01-11 | |
* | fix warning | Gael Guennebaud | 2019-01-09 | |
* | bug #1652: implements a much more accurate version of vectorized sin/cos. This new version achieves the same speed for SSE/AVX, and is slightly faster with FMA. Guarantees are as follows: no FMA: 1ULP up to 3pi, 2ULP up to sin(25966) and cos(18838), fallback to std::sin/cos for larger inputs; FMA: 1ULP up to sin(117435.992) and cos(71476.0625), fallback to std::sin/cos for larger inputs | Gael Guennebaud | 2019-01-09 | |
* | Implement a faster fix for sin/cos of large entries that also correctly handles INF input. | Gael Guennebaud | 2018-12-23 | |
* | Make sure that psin/pcos return a number in [-1,1] for large inputs (though sin/cos on large entries is quite useless because it's inaccurate) | Gael Guennebaud | 2018-12-23 | |
* | Fix plog(+INF): it returned ~87 instead of +INF | Gael Guennebaud | 2018-12-23 | |
* | Extend the generic psin_float code to handle cosine and make SSE and AVX use it (-> this adds pcos for AVX) | Gael Guennebaud | 2018-11-30 | |
* | bug #1631: fix compilation with ARM NEON and clang, and cleanup the weird pshiftright_and_cast and pcast_and_shiftleft functions. | Gael Guennebaud | 2018-11-27 | |
* | Update pshiftleft to pass the shift as a true compile-time integer. | Gael Guennebaud | 2018-11-27 | |
* | Unify SSE/AVX psin functions. It is based on the SSE version which is much more accurate, though very slightly slower. This changeset also includes the following required changes: add packet-float to packet-int type traits; add packet float<->int reinterpret casts; add faster pselect for AVX based on blendv | Gael Guennebaud | 2018-11-27 | |
* | Unify SSE and AVX pexp for double. | Gael Guennebaud | 2018-11-26 | |
* | Unify SSE and AVX implementation of pexp | Gael Guennebaud | 2018-11-26 | |
* | First step toward a unification of packet log implementation; currently only SSE and AVX are unified. To this end, I added the following functions: pzero, pcmp_*, pfrexp, pset1frombits. |