author | Rasmus Munk Larsen <rmlarsen@google.com> | 2019-11-15 17:09:46 -0800 |
---|---|---|
committer | Rasmus Munk Larsen <rmlarsen@google.com> | 2019-11-15 17:09:46 -0800 |
commit | f1e83073082f2733eec6235f2fdf251217a54ade (patch) | |
tree | a20a4945bf0083ffe1a4d4a617a7a2c4740ba00a /blas/single.cpp | |
parent | 2cb2915f908418c897773e0342f152768c13a0d8 (diff) |
1. Fix a bug in psqrt and make it return 0 for +inf arguments.
2. Simplify handling of special cases by taking advantage of the fact that the
builtin vrsqrt approximation handles negative, zero and +inf arguments correctly.
This speeds up the SSE and AVX implementations by ~20%.
3. Make the Newton-Raphson formula used for rsqrt more numerically robust:
Before: y = y * (1.5 - x/2 * y^2)
After: y = y * (1.5 - y * (x/2) * y)
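Forming y^2 ~= 1/x can overflow for very small (denormalized) values of x and underflow for very large ones. In the new form the intermediate product y * (x/2) ~= sqrt(x)/2 stays in range, and the full product y * (x/2) * y ~= 1/2. For AVX512, this makes it possible to compute accurate results for denormal inputs down to ~1e-42 in single precision.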
4. Add a faster double precision implementation for Knights Landing using the vrsqrt28 instruction and a single Newton-Raphson iteration.
Benchmark results: https://bitbucket.org/snippets/rmlarsen/5LBq9o
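
The reordering in item 3 and the single refinement step in item 4 can be illustrated with a scalar sketch. This is not the packet code from the commit: the names `refine_before` and `refine_after` are hypothetical, and the starting estimates are constructed by hand rather than taken from a hardware vrsqrt instruction. It also assumes denormals are not flushed to zero (no fast-math flags).

```cpp
#include <cmath>

// "Before" form from the message: y * y is formed first. For a denormal x,
// y * y ~= 1/x exceeds the largest finite float and becomes +inf.
template <typename T>
T refine_before(T x, T y) {
  const T half_x = x * T(0.5);
  return y * (T(1.5) - half_x * (y * y));
}

// "After" form: y * (x/2) ~= sqrt(x)/2 is formed first and stays in range,
// so the full product y * (x/2) * y ~= 1/2 is computed without overflow.
template <typename T>
T refine_after(T x, T y) {
  const T half_x = x * T(0.5);
  return y * (T(1.5) - (y * half_x) * y);
}
```

For example, with the denormal float x = 2^-140 (`std::ldexp(1.0f, -140)`) and a rough estimate y0 = 1.001 * 2^70, `refine_before(x, y0)` yields a non-finite value, while `refine_after(x, y0)` lands close to the exact result 2^70. In double precision, starting from an estimate with a simulated relative error of 2^-28 (standing in for vrsqrt28), one call to `refine_after` brings the error down to the order of double rounding error, since each Newton-Raphson step roughly squares the relative error.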
Diffstat (limited to 'blas/single.cpp')
0 files changed, 0 insertions, 0 deletions