diff options
author | Antonio Sanchez <cantonios@google.com> | 2020-12-22 22:49:06 -0800 |
---|---|---|
committer | Antonio Sanchez <cantonios@google.com> | 2020-12-22 23:25:23 -0800 |
commit | 070d303d56d46d2e018a58214da24ca629ea454f (patch) | |
tree | 3dfa72bf48ffdca0a67bd794596e4e452d50ed19 /Eigen/src/Core/arch/Default/GenericPacketMathFunctions.h | |
parent | fdf2ee62c5174441076fb64c9737d89bbe102759 (diff) |
Add CUDA complex sqrt.
This is to support scalar `sqrt` of complex numbers `std::complex<T>` on
device, requested by Tensorflow folks.
Technically `std::complex` is not supported by NVCC on device
(though it is by clang), so the default `sqrt(std::complex<T>)` function only
works on the host. Here we create an overload to add back the
functionality.
Also modified the CMake file to add `--relaxed-constexpr` (or
equivalent) flag for NVCC to allow calling constexpr functions from
device functions, and added support for specifying compute architecture for
NVCC (was already available for clang).
Diffstat (limited to 'Eigen/src/Core/arch/Default/GenericPacketMathFunctions.h')
-rw-r--r-- | Eigen/src/Core/arch/Default/GenericPacketMathFunctions.h | 4 |
1 files changed, 2 insertions, 2 deletions
diff --git a/Eigen/src/Core/arch/Default/GenericPacketMathFunctions.h b/Eigen/src/Core/arch/Default/GenericPacketMathFunctions.h index a6d2de62b..9253d8cab 100644 --- a/Eigen/src/Core/arch/Default/GenericPacketMathFunctions.h +++ b/Eigen/src/Core/arch/Default/GenericPacketMathFunctions.h @@ -703,8 +703,8 @@ Packet psqrt_complex(const Packet& a) { // u = sqrt(0.5 * (x + sqrt(x^2 + y^2))) // v = 0.5 * (y / u) // and for x < 0, - // v = sign(y) * sqrt(0.5 * (x + sqrt(x^2 + y^2))) - // u = |0.5 * (y / v)| + // v = sign(y) * sqrt(0.5 * (-x + sqrt(x^2 + y^2))) + // u = 0.5 * (y / v) // // To avoid unnecessary over- and underflow, we compute sqrt(x^2 + y^2) as // l = max(|x|, |y|) * sqrt(1 + (min(|x|, |y|) / max(|x|, |y|))^2) , |