path: root/Eigen/src/Core/arch/CUDA
* Better CUDA complex division. (Antonio Sanchez, 2021-04-29)

  The original implementation produced NaNs when dividing 0/b for
  subnormal b. `complex_divide_stable` was changed to use the more
  common Smith's algorithm.
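
  For reference, a minimal sketch of Smith's algorithm for complex
  division; the function name is illustrative, not Eigen's exact
  `complex_divide_stable`:
  ```
  #include <cmath>
  #include <complex>

  // Smith's algorithm: scale by the ratio of the divisor's components
  // so |b|^2 is never formed directly, avoiding overflow/underflow.
  template <typename T>
  std::complex<T> smith_divide(const std::complex<T>& a,
                               const std::complex<T>& b) {
    const T ar = a.real(), ai = a.imag();
    const T br = b.real(), bi = b.imag();
    if (std::abs(br) >= std::abs(bi)) {
      const T r = bi / br;
      const T d = br + bi * r;
      return std::complex<T>((ar + ai * r) / d, (ai - ar * r) / d);
    } else {
      const T r = br / bi;
      const T d = bi + br * r;
      return std::complex<T>((ar * r + ai) / d, (ai * r - ar) / d);
    }
  }
  ```
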
* Fix NVCC+ICC issues. (Antonio Sanchez, 2021-03-15)

  NVCC does not understand `__forceinline`, so we need to use `inline`
  when compiling for GPU.

  ICC specializes `std::complex` operators for `float` and `double` by
  default, which cannot be used on device and conflict with Eigen's
  workaround in CUDA/Complex.h. This can be prevented by defining
  `_OVERRIDE_COMPLEX_SPECIALIZATION_` before including `<complex>`.
  Added this define to the tests and to `Eigen/Core`, but this will not
  work if the user includes `<complex>` before `<Eigen/Core>`.

  ICC also seems to generate a duplicate `Map` symbol in
  `PlainObjectBase`:
  ```
  error: "Map" has already been declared in the current scope
  static ConstMapType Map(const Scalar *data)
  ```
  I tracked this down to `friend class Eigen::Map`. Putting the `friend`
  statements at the bottom of the class seems to resolve this issue.

  Fixes #2180
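
  A hedged sketch of the keyword selection this implies; the macro name
  is hypothetical, not Eigen's actual `EIGEN_STRONG_INLINE` definition:
  ```
  // Hypothetical macro: pick a forced-inline spelling each compiler accepts.
  #if defined(__CUDACC__)
    #define STRONG_INLINE_SKETCH inline          // NVCC: no __forceinline
  #elif defined(_MSC_VER) || defined(__INTEL_COMPILER)
    #define STRONG_INLINE_SKETCH __forceinline   // MSVC/ICC spelling
  #else
    #define STRONG_INLINE_SKETCH inline __attribute__((always_inline))
  #endif
  ```
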
* Specialize std::complex operators for use on GPU device. (Antonio Sanchez, 2021-01-22)

  NVCC and older versions of clang do not fully support `std::complex`
  on device, leading to either compile errors (cannot call a `__host__`
  function) or, worse, runtime errors (illegal instruction). For most
  functions we can implement specialized `numext` versions. Here we
  specialize the standard operators (with the exception of stream
  operators and member-function operators with a scalar that are already
  specialized in `<complex>`) so they can be used in device code as
  well.

  To import these operators into the current scope, use
  `EIGEN_USING_STD_COMPLEX_OPERATORS`. By default, these are imported
  into the `Eigen`, `Eigen::internal`, and `Eigen::numext` namespaces.

  This allows us to remove specializations of the
  sum/difference/product/quotient ops, and to treat complex numbers like
  most other scalars (e.g. in tests).
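
  A minimal sketch of the pattern, assuming NVCC's `__host__ __device__`
  qualifiers; the namespace and macro names are illustrative, not
  Eigen's exact code:
  ```
  #include <complex>

  #if defined(__CUDACC__)
  #define DEVICE_FUNC __host__ __device__
  #else
  #define DEVICE_FUNC
  #endif

  namespace sketch {
  // Defined in our own namespace so device code can pick it up with a
  // using-declaration, mirroring EIGEN_USING_STD_COMPLEX_OPERATORS.
  template <typename T>
  DEVICE_FUNC std::complex<T> operator+(const std::complex<T>& a,
                                        const std::complex<T>& b) {
    // Built from real/imag parts only, so nothing __host__-only is called.
    return std::complex<T>(a.real() + b.real(), a.imag() + b.imag());
  }
  }  // namespace sketch

  // Analogue of EIGEN_USING_STD_COMPLEX_OPERATORS for this sketch:
  #define USING_SKETCH_COMPLEX_OPERATORS using sketch::operator+;
  ```
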
* Improved std::complex sqrt and rsqrt. (Antonio Sanchez, 2021-01-17)

  Replaces `std::sqrt` with `complex_sqrt` for all platforms (previously
  `complex_sqrt` was only used for CUDA and MSVC), and implements a
  custom `complex_rsqrt`. Also introduces `numext::rsqrt` to simplify
  the implementation, and modifies `numext::hypot` to adhere to IEEE/IEC
  60559 for special cases.

  The `complex_sqrt` and `complex_rsqrt` implementations were found to
  be significantly faster than `std::sqrt<std::complex<T>>` and
  `1/numext::sqrt<std::complex<T>>`. Benchmark file attached.
  ```
  GCC 10, Intel Xeon, x86_64:
  ---------------------------------------------------------------------------
  Benchmark                                 Time             CPU   Iterations
  ---------------------------------------------------------------------------
  BM_Sqrt<std::complex<float>>           9.21 ns         9.21 ns     73225448
  BM_StdSqrt<std::complex<float>>        17.1 ns         17.1 ns     40966545
  BM_Sqrt<std::complex<double>>          8.53 ns         8.53 ns     81111062
  BM_StdSqrt<std::complex<double>>       21.5 ns         21.5 ns     32757248
  BM_Rsqrt<std::complex<float>>          10.3 ns         10.3 ns     68047474
  BM_DivSqrt<std::complex<float>>        16.3 ns         16.3 ns     42770127
  BM_Rsqrt<std::complex<double>>         11.3 ns         11.3 ns     61322028
  BM_DivSqrt<std::complex<double>>       16.5 ns         16.5 ns     42200711

  Clang 11, Intel Xeon, x86_64:
  ---------------------------------------------------------------------------
  Benchmark                                 Time             CPU   Iterations
  ---------------------------------------------------------------------------
  BM_Sqrt<std::complex<float>>           7.46 ns         7.45 ns     90742042
  BM_StdSqrt<std::complex<float>>        16.6 ns         16.6 ns     42369878
  BM_Sqrt<std::complex<double>>          8.49 ns         8.49 ns     81629030
  BM_StdSqrt<std::complex<double>>       21.8 ns         21.7 ns     31809588
  BM_Rsqrt<std::complex<float>>          8.39 ns         8.39 ns     82933666
  BM_DivSqrt<std::complex<float>>        14.4 ns         14.4 ns     48638676
  BM_Rsqrt<std::complex<double>>         9.83 ns         9.82 ns     70068956
  BM_DivSqrt<std::complex<double>>       15.7 ns         15.7 ns     44487798

  Clang 9, Pixel 2, aarch64:
  ---------------------------------------------------------------------------
  Benchmark                                 Time             CPU   Iterations
  ---------------------------------------------------------------------------
  BM_Sqrt<std::complex<float>>           24.2 ns         24.1 ns     28616031
  BM_StdSqrt<std::complex<float>>         104 ns          103 ns      6826926
  BM_Sqrt<std::complex<double>>          31.8 ns         31.8 ns     22157591
  BM_StdSqrt<std::complex<double>>        128 ns          128 ns      5437375
  BM_Rsqrt<std::complex<float>>          31.9 ns         31.8 ns     22384383
  BM_DivSqrt<std::complex<float>>        99.2 ns         98.9 ns      7250438
  BM_Rsqrt<std::complex<double>>         46.0 ns         45.8 ns     15338689
  BM_DivSqrt<std::complex<double>>        119 ns          119 ns      5898944
  ```
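
  For intuition, a minimal sketch of the textbook stable complex square
  root, assuming a finite nonzero input; Eigen's actual `complex_sqrt`
  additionally handles the IEEE special cases (`inf`, `NaN`, zero) this
  sketch ignores:
  ```
  #include <cmath>
  #include <complex>

  // sqrt(x + iy): branch on the sign of x so the subtraction that
  // would cancel catastrophically is never formed.
  template <typename T>
  std::complex<T> complex_sqrt_sketch(const std::complex<T>& z) {
    const T x = z.real(), y = z.imag();
    // w = sqrt((|x| + |z|)/2) is well conditioned for either sign of x.
    const T w = std::sqrt(T(0.5) * (std::abs(x) + std::hypot(x, y)));
    return x >= T(0)
               ? std::complex<T>(w, y / (2 * w))
               : std::complex<T>(std::abs(y) / (2 * w), std::copysign(w, y));
  }
  ```
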
* Only specialize complex `sqrt_impl` for CUDA if not MSVC. (Antonio Sanchez, 2021-01-11)

  We already specialize `sqrt_impl` on Windows due to MSVC's mishandling
  of `inf` (!355).

* Fix MSVC complex sqrt and packetmath test. (Antonio Sanchez, 2021-01-08)

  MSVC incorrectly handles `inf` cases for `std::sqrt<std::complex<T>>`.
  Here we replace it with a custom version (currently used on GPU).

  Also fixed the `packetmath` test, which previously skipped several
  corner cases since `CHECK_CWISE1` only tests the first `PacketSize`
  elements.

* Add CUDA complex sqrt. (Antonio Sanchez, 2020-12-22)

  This is to support scalar `sqrt` of complex numbers `std::complex<T>`
  on device, requested by the TensorFlow folks.

  Technically `std::complex` is not supported by NVCC on device (though
  it is by clang), so the default `sqrt(std::complex<T>)` function only
  works on the host. Here we create an overload to add back the
  functionality.

  Also modified the CMake file to add the `--expt-relaxed-constexpr` (or
  equivalent) flag for NVCC, to allow calling constexpr functions from
  device functions, and added support for specifying the compute
  architecture for NVCC (this was already available for clang).
* Renaming CUDA* to GPU* for some header files. (Deven Desai, 2018-07-11)

* Moving Half headers from CUDA dir to GPU dir, removing the HIP versions. (Deven Desai, 2018-06-13)

* Derivative of the incomplete Gamma function and the sample of a Gamma random variable. (Michael Figurnov, 2018-06-06)

  In addition to igamma(a, x), this code implements:

  * igamma_der_a(a, x) = d igamma(a, x) / da -- derivative of igamma
    with respect to the parameter
  * gamma_sample_der_alpha(alpha, sample) -- reparameterization
    derivative of a Gamma(alpha, 1) random variable sample with respect
    to the alpha parameter

  The derivatives are computed by forward-mode differentiation of the
  igamma(a, x) code. Although gamma_sample_der_alpha can be implemented
  via igamma_der_a, a separate function is more accurate and efficient
  due to analytical cancellation of some terms. All three functions are
  implemented by a method parameterized with "mode" that always computes
  the derivatives but does not return them unless required by the mode.
  The compiler is expected to (and, based on benchmarks, does) skip the
  unnecessary computations depending on the mode.
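
  A toy sketch of that "mode" pattern, assuming a compile-time enum
  selects the return value; the function body is a stand-in, not the
  actual igamma series:
  ```
  #include <cmath>

  enum class Mode { Value, DerivativeA };

  // Forward mode: carry (f, df/da) through every step together; the
  // template parameter is a compile-time constant, so the optimizer
  // can drop whichever result is not returned.
  template <Mode mode>
  double toy_igamma_like(double a, double x) {
    double f = std::exp(-x) * std::pow(x, a);  // value
    double df_da = f * std::log(x);            // d/da of the same step
    return mode == Mode::Value ? f : df_da;
  }

  // Usage: toy_igamma_like<Mode::Value>(a, x) or
  //        toy_igamma_like<Mode::DerivativeA>(a, x).
  ```
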
* Exponentially scaled modified Bessel functions of order zero and one. (Michael Figurnov, 2018-05-31)

  The functions are conventionally called i0e and i1e. The exponentially
  scaled version is more numerically stable. The standard Bessel
  functions can be obtained as i0(x) = exp(|x|) i0e(x).

  The code is ported from Cephes and tested against SciPy.
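
  A minimal usage sketch of that relation; `i0e` here is assumed to be
  any routine with the Cephes-style contract the commit describes:
  ```
  #include <cmath>

  // i0(x) = exp(|x|) * i0e(x). exp(|x|) overflows long before i0e
  // does, which is why the scaled version is the stable primitive.
  double i0_from_i0e(double x, double (*i0e)(double)) {
    return std::exp(std::abs(x)) * i0e(x);
  }
  ```
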
* bug #1520: Workaround some -Wfloat-equal warnings by calling std::equal_to. (Gael Guennebaud, 2018-04-11)
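
  A minimal sketch of the workaround: the comparison keeps exact `==`
  semantics but goes through `std::equal_to`, which `-Wfloat-equal`
  does not flag:
  ```
  #include <functional>

  template <typename Scalar>
  bool exact_eq(const Scalar& x, const Scalar& y) {
    return std::equal_to<Scalar>()(x, y);  // same semantics as x == y
  }
  ```
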
* Move up the specialization of std::numeric_limits. (Daniel Trebbien, 2018-02-18)

  This fixes a compilation error seen when building TensorFlow on macOS:
  https://github.com/tensorflow/tensorflow/issues/17067

* Replace __float2half_rn with __float2half. (nluehr, 2017-11-28)

  The latter provides a consistent definition for CUDA 8.0 and 9.0.
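
  A minimal usage sketch; both intrinsics perform the same
  round-to-nearest float-to-fp16 conversion, and the switch is about
  having one definition that is consistent across CUDA 8.0 and 9.0:
  ```
  #include <cuda_fp16.h>

  // Prefer __float2half over __float2half_rn: same rounding, but
  // defined consistently in the CUDA 8.0 and 9.0 headers.
  __device__ __half to_half_sketch(float f) {
    return __float2half(f);
  }
  ```
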
* Fix incorrect integer cast in predux<half2>(). (nluehr, 2017-11-21)

  The bug corrupts results on Maxwell and earlier GPU architectures.

* Restore `__device__`. (Henry Schreiner, 2017-10-21)

* Fixing missing inlines on device functions for newer CUDA cards. (Henry Schreiner, 2017-10-20)

* Add C++11 max_digits10 for half. (Gael Guennebaud, 2017-09-06)

* Added support for CUDA 9.0. (Benoit Steiner, 2017-08-31)

* bug #1462: Remove all occurrences of the deprecated __CUDACC_VER__ macro by introducing EIGEN_CUDACC_VER. (Gael Guennebaud, 2017-08-24)
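
  A hedged sketch of the replacement: build one version number from the
  per-component macros NVCC still defines; this mirrors the idea, not
  necessarily Eigen's exact definition:
  ```
  // __CUDACC_VER__ is deprecated in CUDA 9; derive a comparable value
  // from __CUDACC_VER_MAJOR__/__CUDACC_VER_MINOR__ instead.
  #if defined(__CUDACC_VER_MAJOR__) && (__CUDACC_VER_MAJOR__ >= 9)
    #define EIGEN_CUDACC_VER (__CUDACC_VER_MAJOR__ * 10000 + __CUDACC_VER_MINOR__ * 100)
  #elif defined(__CUDACC_VER__)
    #define EIGEN_CUDACC_VER __CUDACC_VER__
  #else
    #define EIGEN_CUDACC_VER 0
  #endif
  ```
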
* Add an EIGEN_NO_CUDA option, and introduce EIGEN_CUDACC and EIGEN_CUDA_ARCH aliases. (Gael Guennebaud, 2017-07-17)
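
  A hedged sketch of those aliases; the conditions mirror the idea
  (respect EIGEN_NO_CUDA, otherwise forward the compiler's own macros),
  not necessarily the exact lines:
  ```
  // EIGEN_CUDACC: defined when compiling CUDA sources, unless disabled.
  #if defined(__CUDACC__) && !defined(EIGEN_NO_CUDA)
    #define EIGEN_CUDACC __CUDACC__
  #endif

  // EIGEN_CUDA_ARCH: defined when compiling device code, unless disabled.
  #if defined(__CUDA_ARCH__) && !defined(EIGEN_NO_CUDA)
    #define EIGEN_CUDA_ARCH __CUDA_ARCH__
  #endif
  ```
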
* Added missing __device__ qualifier. (Benoit Steiner, 2017-06-13)

* Added missing __device__ qualifier. (Benoit Steiner, 2017-06-13)

* Fix compilation of Half in C++98 (issue introduced in previous commit). (Gael Guennebaud, 2017-06-09)

* Add missing std::numeric_limits specialization for half, and complete NumTraits<half>. (Gael Guennebaud, 2017-06-09)
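
  A hedged sketch of such a specialization; `half_sketch` stands in for
  Eigen::half, and only the digit-count members relevant to
  `max_digits10` are shown:
  ```
  #include <limits>

  struct half_sketch { unsigned short x; };  // stand-in for Eigen::half

  namespace std {
  template <>
  struct numeric_limits<half_sketch> {
    static const bool is_specialized = true;
    static const int digits = 11;       // fp16 significand bits (incl. implicit bit)
    static const int digits10 = 3;      // floor((digits - 1) * log10(2))
    static const int max_digits10 = 5;  // C++11: decimal digits that round-trip
    // The remaining members mirror numeric_limits<float>, with fp16 values.
  };
  }  // namespace std
  ```
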
* Fixed nested angle brackets `>>` issue when compiling with CUDA 8. (Abhijit Kundu, 2017-04-27)
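
  A minimal illustration of the parse issue: with CUDA 8's C++98-style
  front end, adjacent `>>` closing two template argument lists must be
  written `> >`:
  ```
  #include <vector>

  // C++98 / CUDA 8: `>>` here would lex as a right-shift token.
  std::vector<std::vector<float> > rows;  // note the `> >`
  ```
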
* Remove UTF8 symbols. (Gael Guennebaud, 2017-03-07)

* Made most of the packet math primitives usable within CUDA kernels when compiling with clang. (Benoit Steiner, 2017-02-28)

* Avoid unnecessary float-to-double conversions. (Benoit Steiner, 2017-02-27)

* Fix expm1 CUDA implementation (do not shadow the exp CUDA implementation). (Srinivas Vasudevan, 2016-12-05)

* Fix small nit where I changed the name of plog1p to pexpm1. (Srinivas Vasudevan, 2016-12-02)

* Added support for expm1 in Eigen. (Srinivas Vasudevan, 2016-12-02)

* Add a default constructor for the "fake" __half class when not using the __half class provided by CUDA. (Rasmus Munk Larsen, 2016-11-29)
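
  A hedged sketch of such a fallback type; the name and guard are
  illustrative, not Eigen's exact code:
  ```
  // When CUDA's __half is unavailable, a plain struct carrying the raw
  // fp16 bits stands in; the default constructor keeps it usable
  // wherever a default-constructed scalar is expected.
  #if !defined(EIGEN_HAS_CUDA_FP16)
  struct fake_half {
    fake_half() : x(0) {}
    explicit fake_half(unsigned short raw) : x(raw) {}
    unsigned short x;  // raw fp16 bit pattern
  };
  #endif
  ```
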
* Optimized the computation of exp, sqrt, ceil, and floor for fp16 on Pascal GPUs. (Benoit Steiner, 2016-11-16)

* Merged eigen/eigen into default. (Benoit Steiner, 2016-11-03)

* Gate the code that refers to CUDA fp16 primitives more thoroughly. (Benoit Steiner, 2016-11-01)

* Deleted redundant implementation of predux. (Benoit Steiner, 2016-10-12)

* Merged eigen/eigen into default. (Benoit Steiner, 2016-10-12)

* Take advantage of AVX512 instructions whenever possible to speed up the processing of 16-bit floats. (Benoit Steiner, 2016-10-12)

* Added missing AVX intrinsics for fp16: in particular, implemented predux, which is required by the matrix-vector code. (Benoit Steiner, 2016-10-06)

* Properly characterize the CUDA packet primitives for fp16 as device-only. (Benoit Steiner, 2016-10-04)
* Added support for constant std::complex numbers on GPU. (Benoit Steiner, 2016-10-03)

* Added missing typedefs. (Benoit Steiner, 2016-09-20)

* Add CUDA-specific std::complex<T> specializations for scalar_sum_op, scalar_difference_op, scalar_product_op, and scalar_quotient_op. (RJ Ryan, 2016-09-20)

* Use `::hlog` if available. (Gael Guennebaud, 2016-08-29)

* bug #1167: Simplify installation of header files using CMake's install(DIRECTORY ...) command. (Gael Guennebaud, 2016-08-29)

* Fix compilation with MSVC by using our portable numext::log1p implementation. (Gael Guennebaud, 2016-08-22)
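
  A hedged sketch of a portable log1p in this spirit (the classic
  compensation trick), not necessarily Eigen's exact `numext::log1p`:
  ```
  #include <cmath>

  template <typename T>
  T log1p_sketch(T x) {
    const T u = T(1) + x;
    // If 1 + x rounded to exactly 1, x itself is the best answer;
    // otherwise rescale log(u) by x / (u - 1) to compensate for the
    // rounding in u.
    return (u == T(1)) ? x : x * (std::log(u) / (u - T(1)));
  }
  ```
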
* Fix compilation on CUDA 8 by removing call to h2log1p. (Igor Babuschkin, 2016-08-15)

* Add log1p support for CUDA and half floats. (Igor Babuschkin, 2016-08-08)

* Fixed the constructors of the new half_base class. (Benoit Steiner, 2016-08-04)