path: root/Eigen/src/Core/arch/AVX512/MathFunctions.h
Commit message (Author, Date)
* HasExp added for AVX512 Packet8d (Jakub Lichman, 2021-04-20)
* Fix pfrexp/pldexp for half. (Antonio Sanchez, 2021-01-21)
  The recent addition of vectorized pow (!330) relies on `pfrexp` and `pldexp`. These were missing for `Eigen::half` and `Eigen::bfloat16`. Adding tests for these packet ops also exposed an issue with handling negative values in `pfrexp`, which returned an incorrect exponent. Added the missing implementations, corrected the exponent in `pfrexp1`, and added `packetmath` tests.
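The commit message does not spell out the cause of the negative-value issue. Purely as an illustration, a bit-level scalar frexp for IEEE-754 single precision could look like the sketch below. `frexp_bits` is a hypothetical name, not Eigen's `pfrexp`, and the sketch assumes normal, finite, nonzero inputs. The `& 0xFF` mask is what keeps the sign bit out of the exponent field when the input is negative.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical scalar helper (not Eigen code). Splits x into a mantissa with
// magnitude in [0.5, 1) and an integer exponent so that x == m * 2^exponent.
// Assumes IEEE-754 binary32 and a normal, finite, nonzero input.
inline float frexp_bits(float x, int* exponent) {
  std::uint32_t bits;
  std::memcpy(&bits, &x, sizeof bits);

  // Shift the exponent field down, then mask so the sign bit is dropped.
  const std::uint32_t biased = (bits >> 23) & 0xFFu;
  assert(biased != 0u && biased != 0xFFu);  // rejects zero, denormal, inf, NaN

  *exponent = static_cast<int>(biased) - 126;

  // Keep sign and fraction bits, force the exponent field to 126 (2^-1).
  bits = (bits & 0x807FFFFFu) | (126u << 23);
  float m;
  std::memcpy(&m, &bits, sizeof m);
  return m;
}
```

For normal inputs the result should agree with `std::frexp`, and `std::ldexp(m, exponent)` reconstructs the original value.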
* Add log2() to Eigen. (Rasmus Munk Larsen, 2020-12-04)
* Revert "Add log2() operator to Eigen" (Rasmus Munk Larsen, 2020-12-03)
  This reverts commit 4d91519a9be061da5d300079fca17dd0b9328050.
* Add log2() operator to Eigen (Rasmus Munk Larsen, 2020-12-03)
* Fix a few issues for AVX512. This change enables vectorized versions of log, exp, log1p, expm1 when AVX512DQ is not available. (Rasmus Munk Larsen, 2020-12-01)
* AVX512 missing ops. (Antonio Sanchez, 2020-11-30)
  This allows the `packetmath` tests to pass for AVX512 on skylake. Made `half` and `bfloat16` consistent in terms of ops they support. Note the `log` tests are currently disabled for `bfloat16` since they fail due to poor precision (they were previously disabled for `Packet8bf` via test function specialization -- I just removed that specialization and disabled it in the generic test).
* Improve polynomial evaluation with instruction-level parallelism for pexp_float and pexp<Packet16f> (guoqiangqi, 2020-10-20)
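One common way to expose instruction-level parallelism in polynomial evaluation is to split the polynomial into even and odd parts, which form two independent dependency chains, instead of one long Horner chain. The scalar sketch below illustrates that general idea only. The function names and the degree-5 layout are assumptions for illustration and are not taken from `pexp_float`.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Coefficients in ascending order: c[0] + c[1]*x + ... + c[5]*x^5.
using Coeffs = std::array<float, 6>;

// Plain Horner scheme: every multiply-add depends on the previous one.
inline float horner(const Coeffs& c, float x) {
  float r = c[5];
  for (int i = 4; i >= 0; --i) r = r * x + c[i];
  return r;
}

// Even/odd split: the two partial sums do not depend on each other, so the
// hardware can overlap them. They are combined with one final multiply-add.
inline float even_odd(const Coeffs& c, float x) {
  const float x2 = x * x;
  const float even = (c[4] * x2 + c[2]) * x2 + c[0];  // c0 + c2*x^2 + c4*x^4
  const float odd = (c[5] * x2 + c[3]) * x2 + c[1];   // c1 + c3*x^2 + c5*x^4
  return odd * x + even;
}
```

The two variants compute the same polynomial but round differently, so results can differ in the last bits.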
* Add AVX plog<Packet4d> and AVX512 plog<Packet8d> ops, also unified AVX512 plog<Packet16f> op with generic api (Guoqiang QI, 2020-10-15)
* AVX path for BF16 (Sheng Yang, 2020-07-14)
* Support BFloat16 in Eigen (Teng Lu, 2020-06-20)
* Fix for gcc build error when using Eigen headers with AVX512 (Anuj Rawat, 2020-01-10)
* 1. Fix a bug in psqrt and make it return 0 for +inf arguments. (Rasmus Munk Larsen, 2019-11-15)
  2. Simplify handling of special cases by taking advantage of the fact that the builtin vrsqrt approximation handles negative, zero and +inf arguments correctly. This speeds up the SSE and AVX implementations by ~20%.
  3. Make the Newton-Raphson formula used for rsqrt more numerically robust:
       Before: y = y * (1.5 - x/2 * y^2)
       After:  y = y * (1.5 - y * (x/2) * y)
     Forming y^2 can overflow for very large or very small (denormalized) values of x, while x*y ~= 1. For AVX512, this makes it possible to compute accurate results for denormal inputs down to ~1e-42 in single precision.
  4. Add a faster double precision implementation for Knights Landing using the vrsqrt28 instruction and a single Newton-Raphson iteration.
  Benchmark results: https://bitbucket.org/snippets/rmlarsen/5LBq9o
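The effect of the reordered Newton-Raphson step can be seen in a scalar sketch. The code below is an illustration under assumptions, not Eigen's packet code: the function names are hypothetical, and the starting guess is passed in by the caller in place of the hardware vrsqrt approximation. For a denormal x the estimate y is huge, so y * y overflows in single precision, while the product y * (x/2) * y, evaluated left to right, stays near 0.5.

```cpp
#include <cassert>
#include <cmath>

// "Before" form: forms y*y first, which can overflow to +inf when y is huge.
inline float rsqrt_step_naive(float x, float y) {
  return y * (1.5f - (0.5f * x) * (y * y));
}

// "After" form: y * (x/2) is computed first and stays in range, and the
// second multiply by y brings the product to roughly 0.5.
inline float rsqrt_step_robust(float x, float y) {
  return y * (1.5f - y * (0.5f * x) * y);
}

// Refines a caller-supplied starting guess y0 for 1/sqrt(x).
// Hypothetical helper: real code would take y0 from a hardware estimate.
inline float rsqrt_refined(float x, float y0, int iterations) {
  assert(x > 0.0f && iterations >= 0);
  float y = y0;
  for (int i = 0; i < iterations; ++i) y = rsqrt_step_robust(x, y);
  return y;
}
```

With x = 1e-40f and a guess near 1e20f, the naive step yields an infinite value because y * y exceeds the largest finite float, whereas the robust step converges to a finite result.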
* bug #1744: fix compilation with MSVC 2017 and AVX512, plog1p/pexpm1 require plog/pexp, but the latter was disabled on some compilers (Gael Guennebaud, 2019-11-15)
* PR 751: Fixed compilation issue when compiling using MSVC with /arch:AVX512 flag (Sakshi Goynar, 2019-10-31)
* Move implementation of vectorized error function erf() to SpecialFunctionsImpl.h. (Rasmus Munk Larsen, 2019-09-27)
* Add generic PacketMath implementation of the Error Function (erf). (Rasmus Munk Larsen, 2019-09-19)
* Implement vectorized versions of log1p and expm1 in Eigen using Kahan's formulas, and change the scalar implementations to properly handle infinite arguments. (Rasmus Munk Larsen, 2019-08-12)
  Depending on instruction set, significant speedups are observed for the vectorized path:
    log1p wall time is reduced 60-93% (2.5x - 15x speedup)
    expm1 wall time is reduced 0-85% (1x - 7x speedup)
  The scalar path is slower by 20-30% due to the extra branch needed to handle +infinity correctly.
  Full benchmarks measured on Intel(R) Xeon(R) Gold 6154 here: https://bitbucket.org/snippets/rmlarsen/MXBkpM
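The formulas usually attributed to Kahan compute log1p and expm1 from ordinary log and exp, using a correction factor that cancels the rounding error made when forming 1 + x or exp(x). The scalar sketch below shows that textbook form in double precision, with an explicit branch for infinite results of the kind the commit message mentions. The function names are hypothetical and the code is not copied from Eigen.

```cpp
#include <cassert>
#include <cmath>

// log1p(x) = x * log(u) / (u - 1), with u = 1 + x rounded to double.
inline double log1p_kahan(double x) {
  const double u = 1.0 + x;
  if (u == 1.0) return x;                 // x is below rounding: log1p(x) ~= x
  if (std::isinf(x) && x > 0) return x;   // avoid inf / inf = NaN
  return x * (std::log(u) / (u - 1.0));
}

// expm1(x) = (u - 1) * x / log(u), with u = exp(x) rounded to double.
inline double expm1_kahan(double x) {
  const double u = std::exp(x);
  if (u == 1.0) return x;                 // x is below rounding: expm1(x) ~= x
  const double um1 = u - 1.0;
  if (um1 == -1.0) return -1.0;           // exp(x) vanished: result is -1
  if (std::isinf(u)) return u;            // overflow or x == +inf
  return um1 * (x / std::log(u));
}
```

For small |x| both functions should agree closely with `std::log1p` and `std::expm1`, where the naive `std::log(1.0 + x)` and `std::exp(x) - 1.0` lose most of their significant digits.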
* fix plog(+inf) with AVX512 (Gael Guennebaud, 2019-01-09)
* Add psin/pcos on AVX512 -> almost for free, at last! (Gael Guennebaud, 2018-11-30)
* Fix float-to-double warning (Gael Guennebaud, 2018-10-16)
* Fix avx512 plog(NaN) to return NaN instead of +inf (Gael Guennebaud, 2018-10-11)
* Enable avx512 plog with clang (Gael Guennebaud, 2018-10-11)
* Re-enable FMA for fast sqrt functions (Mark D Ryan, 2018-07-30)
* Fix AVX512 implementations of psqrt (Mark D Ryan, 2018-06-25)
  This commit fixes the AVX512 implementations of psqrt in the same way that 3ed67cb0bb4af65fbf243df598604a8c7630bf7d fixed the AVX2 version of this function. The AVX512 versions of psqrt incorrectly return -0.0 for negative values, instead of NaN. Fixing the issues requires adding some additional instructions that slow down the algorithms. A similar test to the one used in 3ed67cb0bb4af65fbf243df598604a8c7630bf7d shows that the corrected Packet16f code runs at 73% of the speed of the existing code, while the corrected Packet8d function runs at 68% of the original.
* fix AVX512 plog (Jayaram Bobba, 2018-04-20)
* AVX512: _mm512_rsqrt28_ps is available for AVX512ER only (Gael Guennebaud, 2018-04-03)
* AVX512: fix psqrt and prsqrt (Gael Guennebaud, 2018-04-03)
* Disabled some of the AVX512 primitives on compilers that don't support them (Benoit Steiner, 2016-04-29)
* Commented out the version of pexp<Packet8d> since it fails to compile with gcc 5.3 (Benoit Steiner, 2016-02-04)
* Added implementations of pexp, plog, psqrt, and prsqrt optimized for AVX512 (Benoit Steiner, 2016-02-04)