We are potentially seeing some accuracy issues with these. Ideally we
would hand off to `float`, but that's not trivial with the current
setup.
We may want to consider adding `ppow<Packet>` and `HasPow`, so
implementations can more easily specialize this.
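A rough sketch of what that hook could look like, modeled on existing packet ops such as `pexp`/`HasExp`. Note `ppow` and `HasPow` are the names suggested above, not shipped API:

```cpp
#include <Eigen/Core>

namespace Eigen {
namespace internal {

// Hypothetical generic fallback: apply the scalar pow lane by lane.
template <typename Packet>
EIGEN_STRONG_INLINE Packet ppow(const Packet& a, const Packet& b) {
  typedef typename unpacket_traits<Packet>::type Scalar;
  const int N = unpacket_traits<Packet>::size;
  Scalar va[unpacket_traits<Packet>::size];
  Scalar vb[unpacket_traits<Packet>::size];
  pstoreu(va, a);
  pstoreu(vb, b);
  for (int i = 0; i < N; ++i) va[i] = numext::pow(va[i], vb[i]);
  return ploadu<Packet>(va);
}

// An architecture could then flip a HasPow flag in its packet_traits<T>
// and provide a fast specialization, e.g.:
//   template <>
//   EIGEN_STRONG_INLINE Packet4f ppow(const Packet4f& a, const Packet4f& b);

}  // namespace internal
}  // namespace Eigen
```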

---

https://gitlab.com/libeigen/eigen/-/issues/2085, which also contains a description of the algorithm.
I ran some testing (comparing against `std::pow(double(x), double(y))`) for `x` in the set of all positive floats in the interval `[std::sqrt(std::numeric_limits<float>::min()), std::sqrt(std::numeric_limits<float>::max())]` and `y` in `{2, sqrt(2), -sqrt(2)}`, and I get the following error statistics:
```
max_rel_error = 8.34405e-07
rms_rel_error = 2.76654e-07
```
If I widen the range to all normal floats, I see lower accuracy for arguments where the result is subnormal, e.g. for `y = sqrt(2)`:
```
max_rel_error = 0.666667
rms = 6.8727e-05
count = 1335165689
argmax = 2.56049e-32, 2.10195e-45 != 1.4013e-45
```
which seems reasonable, since these results are subnormals with only a couple of significant bits left.
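For reference, a sweep of that sort can be reproduced with something like the sketch below. Assumptions: Eigen's `Array::pow` stands in for the implementation under test, and the relative error is measured against `std::pow` in `double`; the loop visits every float in the range, so it takes a while.

```cpp
#include <Eigen/Core>
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <limits>

int main() {
  const float y = std::sqrt(2.0f);
  const float x_end = std::sqrt(std::numeric_limits<float>::max());
  double max_rel = 0.0, sum_sq = 0.0;
  long long count = 0;
  for (float x = std::sqrt(std::numeric_limits<float>::min()); x <= x_end;
       x = std::nextafterf(x, std::numeric_limits<float>::infinity())) {
    Eigen::ArrayXf ax = Eigen::ArrayXf::Constant(1, x);
    const double got = double(ax.pow(y)(0));            // implementation under test
    const double ref = std::pow(double(x), double(y));  // reference
    const double rel = std::abs((got - ref) / ref);     // ref > 0 for x > 0
    max_rel = std::max(max_rel, rel);
    sum_sq += rel * rel;
    ++count;
  }
  std::printf("max_rel_error = %g\nrms_rel_error = %g\n",
              max_rel, std::sqrt(sum_sq / double(count)));
  return 0;
}
```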

---

`predux_fmax_nan` that implement reductions with `PropagateNaN` and `PropagateNumbers` semantics. Add (slow) generic implementations for most reductions.
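The two semantics, illustrated on scalars (a sketch only; the packet reductions generalize this across lanes):

```cpp
#include <algorithm>
#include <cmath>
#include <limits>

// PropagateNaN: if either operand is NaN, the result is NaN.
float fmax_propagate_nan(float a, float b) {
  if (std::isnan(a) || std::isnan(b))
    return std::numeric_limits<float>::quiet_NaN();
  return std::max(a, b);
}

// PropagateNumbers: a NaN operand is ignored when the other one is a
// number; this matches the behavior of std::fmax.
float fmax_propagate_numbers(float a, float b) { return std::fmax(a, b); }
```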

---

across platforms.
Change the test to check NaN propagation only for `pfmin`/`pfmax`.

---

* Add `ptranspose<*,4>` to support matmul, and add a unit test for `Matrix<bool> * Matrix<bool>`.
* Work around a bug in slicing of `Tensor<bool>`.
* Add tensor tests.

This speeds up matmul for boolean matrices by roughly 7x for all but the smallest size, which regresses:
```
name                 old time/op  new time/op  delta
BM_MatMul<bool>/8    267ns ± 0%   479ns ± 0%   +79.25%  (p=0.008 n=5+5)
BM_MatMul<bool>/32   6.42µs ± 0%  0.87µs ± 0%  -86.50%  (p=0.008 n=5+5)
BM_MatMul<bool>/64   43.3µs ± 0%  5.9µs ± 0%   -86.42%  (p=0.008 n=5+5)
BM_MatMul<bool>/128  315µs ± 0%   44µs ± 0%    -85.98%  (p=0.008 n=5+5)
BM_MatMul<bool>/256  2.41ms ± 0%  0.34ms ± 0%  -85.68%  (p=0.008 n=5+5)
BM_MatMul<bool>/512  18.8ms ± 0%  2.7ms ± 0%   -85.53%  (p=0.008 n=5+5)
BM_MatMul<bool>/1k   149ms ± 0%   22ms ± 0%    -85.40%  (p=0.008 n=5+5)
```
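For context, the operation the benchmark exercises is a plain matrix product over `bool`, where multiplication acts as AND and accumulation as OR. A minimal usage sketch (the `BM_MatMul` harness itself is not shown here):

```cpp
#include <Eigen/Dense>
#include <iostream>

typedef Eigen::Matrix<bool, Eigen::Dynamic, Eigen::Dynamic> BoolMatrix;

int main() {
  BoolMatrix a = BoolMatrix::Identity(4, 4);
  BoolMatrix b = BoolMatrix::Constant(4, 4, true);
  // Boolean matmul: c(i, j) = OR over k of (a(i, k) AND b(k, j)).
  BoolMatrix c = a * b;
  std::cout << c << std::endl;
  return 0;
}
```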

---

boolean operations on Tensors by up to 25x.
Benchmark numbers for the logical AND of two NxN tensors:
```
name                                    old time/op  new time/op  delta
BM_booleanAnd_1T/3    [using 1 threads] 14.6ns ± 0%  14.4ns ± 0%   -0.96%
BM_booleanAnd_1T/4    [using 1 threads] 20.5ns ±12%   9.0ns ± 0%  -56.07%
BM_booleanAnd_1T/7    [using 1 threads] 41.7ns ± 0%  10.5ns ± 0%  -74.87%
BM_booleanAnd_1T/8    [using 1 threads] 52.1ns ± 0%  10.1ns ± 0%  -80.59%
BM_booleanAnd_1T/10   [using 1 threads] 76.3ns ± 0%  13.8ns ± 0%  -81.87%
BM_booleanAnd_1T/15   [using 1 threads]  167ns ± 0%    16ns ± 0%  -90.45%
BM_booleanAnd_1T/16   [using 1 threads]  188ns ± 0%    16ns ± 0%  -91.57%
BM_booleanAnd_1T/31   [using 1 threads]  667ns ± 0%    34ns ± 0%  -94.83%
BM_booleanAnd_1T/32   [using 1 threads]  710ns ± 0%    35ns ± 0%  -95.01%
BM_booleanAnd_1T/64   [using 1 threads] 2.80µs ± 0%  0.11µs ± 0%  -95.93%
BM_booleanAnd_1T/128  [using 1 threads] 11.2µs ± 0%   0.4µs ± 0%  -96.11%
BM_booleanAnd_1T/256  [using 1 threads] 44.6µs ± 0%   2.5µs ± 0%  -94.31%
BM_booleanAnd_1T/512  [using 1 threads]  178µs ± 0%    10µs ± 0%  -94.35%
BM_booleanAnd_1T/1k   [using 1 threads]  717µs ± 0%    78µs ± 1%  -89.07%
BM_booleanAnd_1T/2k   [using 1 threads] 2.87ms ± 0%  0.31ms ± 1%  -89.08%
BM_booleanAnd_1T/4k   [using 1 threads] 11.7ms ± 0%   1.9ms ± 4%  -83.55%
BM_booleanAnd_1T/10k  [using 1 threads] 70.3ms ± 0%  17.2ms ± 4%  -75.48%
```
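The measured operation, sketched with the Tensor module (the `BM_booleanAnd_1T` harness itself is not shown):

```cpp
#include <unsupported/Eigen/CXX11/Tensor>

int main() {
  Eigen::Tensor<bool, 2> a(256, 256), b(256, 256);
  a.setConstant(true);
  b.setConstant(false);
  // Element-wise logical AND of two NxN boolean tensors.
  Eigen::Tensor<bool, 2> c = a && b;
  return 0;
}
```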

---

There are two major changes (and a few minor ones which are not listed here; see the PR discussion for details):
1. The `Eigen::half` implementations for HIP and CUDA have been merged. This means that:
   - `CUDA/Half.h` and `HIP/hcc/Half.h` were merged into a new file, `GPU/Half.h`
   - `CUDA/PacketMathHalf.h` and `HIP/hcc/PacketMathHalf.h` were merged into a new file, `GPU/PacketMathHalf.h`
   - `CUDA/TypeCasting.h` and `HIP/hcc/TypeCasting.h` were merged into a new file, `GPU/TypeCasting.h`

   After this change, the `HIP/hcc` directory contains only one file, `math_constants.h`. That will go away too once that file becomes part of the HIP install.
2. New macros `EIGEN_GPUCC`, `EIGEN_GPU_COMPILE_PHASE` and `EIGEN_HAS_GPU_FP16` have been added, and the code has been updated to use them where appropriate.
   - `EIGEN_GPUCC` is the same as `(EIGEN_CUDACC || EIGEN_HIPCC)`
   - `EIGEN_GPU_COMPILE_PHASE` is the same as `(EIGEN_CUDA_ARCH || EIGEN_HIP_DEVICE_COMPILE)`
   - `EIGEN_HAS_GPU_FP16` is the same as `(EIGEN_HAS_CUDA_FP16 || EIGEN_HAS_HIP_FP16)`
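For illustration, the intended usage pattern looks roughly like this (a sketch, not code from the PR):

```cpp
// Compiled by a GPU-capable compiler (nvcc or hipcc)?
#if defined(EIGEN_GPUCC)
  // GPU-aware code paths, regardless of vendor.
#endif

// In the device-side compilation pass?
#if defined(EIGEN_GPU_COMPILE_PHASE)
  // Code that must be valid on the device.
#endif

// Is half precision available on the GPU path?
#if defined(EIGEN_HAS_GPU_FP16)
  // Use the Eigen::half fast paths.
#endif
```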

---

This commit enables the use of Eigen in HIP kernels / on AMD GPUs. Support has been added along the same lines as what already exists for using Eigen in CUDA kernels / on NVidia GPUs.
Application code needs to explicitly define `EIGEN_USE_HIP` when using Eigen in HIP kernels. This is because some of the CUDA headers are picked up by default during an Eigen compile, irrespective of whether the underlying compiler is CUDACC/NVCC (e.g. `Eigen/src/Core/arch/CUDA/Half.h`). In order to maintain this behavior, the `EIGEN_USE_HIP` macro is used to switch to the HIP versions of those header files (see `Eigen/Core` and `unsupported/Eigen/CXX11/Tensor`).
Use the `-DEIGEN_TEST_HIP` CMake option to enable the HIP-specific unit tests.

---

Additional necessary CUDA fixes in Core (mostly usage of `EIGEN_USING_STD_MATH`).
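For context, the macro's job is to pull the named function from namespace `std` so that a subsequent unqualified call resolves correctly for both standard and custom types. A simplified re-creation (not Eigen's exact definition, which lives in its `Macros.h`):

```cpp
#include <cmath>

#define MY_USING_STD_MATH(FUNCNAME) using std::FUNCNAME;

template <typename T>
T hypotenuse(T a, T b) {
  MY_USING_STD_MATH(sqrt)
  // Unqualified call: finds std::sqrt for standard types via the
  // using-declaration, or an ADL overload for user-defined types.
  return sqrt(a * a + b * b);
}
```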

---

some specializations in arch/SSE and arch/AVX)

---

currently in unsupported/.

---

Internal: `scalar_pow_op` (unary) is removed, and `scalar_binary_pow_op` is renamed to `scalar_pow_op`.

---

expressions, and generalize the supported scalar types.
The following functors are now deprecated: `scalar_add_op`, `scalar_sub_op`, and `scalar_rsub_op`.

---

checking that types are properly propagated.

---

ones, and implement `scalar_multiple2` and `scalar_quotient2` on top of them.

---

- Replace `internal::scalar_product_traits<A,B>` with `Eigen::ScalarBinaryOpTraits<A,B,OP>`.
- Remove the `functor_is_product_like` helper (it was pretty ugly).
- Currently, `OP` is not used, but it is available to the user for fine-grained tuning.
- Currently, only the following operators have been generalized: `*`, `/`, `+`, `-`, `=`, `*=`, `/=`, `+=`, `-=`.
- TODO: generalize all other binary operators (comparisons, pow, etc.).
- TODO: handle "scalar op array" operators (currently only `*` is handled).
- TODO: move the handling of the "void" scalar type to `ScalarBinaryOpTraits`.
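A sketch of the kind of user-side specialization this enables; `MyDual` and the `double * MyDual` pairing are made up for illustration:

```cpp
#include <Eigen/Core>

struct MyDual {
  double v, d;
};

inline MyDual operator*(double a, const MyDual& b) {
  return MyDual{a * b.v, a * b.d};
}

namespace Eigen {
// Declare the result type of mixing double and MyDual; without such a
// specialization, Eigen rejects mixed scalar types at compile time.
template <typename BinaryOp>
struct ScalarBinaryOpTraits<double, MyDual, BinaryOp> {
  typedef MyDual ReturnType;
};
}  // namespace Eigen
```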

---

TernaryFunctors and their executors allow operations on 3-tuples of inputs.
The API is fully implemented for Arrays and Tensors, based on binary functors.
Ported the cephes betainc function (regularized incomplete beta integral) to Eigen, with support for CPU and GPU, floats, doubles, and half types.
Added unit tests in array.cpp and cxx11_tensor_cuda.cu.
Collapsed revision:
* Merged helper methods for betainc across floats and doubles.
* Added TensorGlobalFunctions with `betainc()`. Removed `betainc()` from TensorBase.
* Cleaned up CwiseTernaryOp checks; renamed `igamma_helper` to `cephes_helper`.
* betainc: merged `incbcf` and `incbd` into `incbeta_cfe`, plus more cleanup.
* Updated TernaryOp and SpecialFunctions (betainc) based on review comments.
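A usage sketch, assuming the `Eigen::betainc` free function exposed by the unsupported SpecialFunctions module:

```cpp
#include <Eigen/Core>
#include <unsupported/Eigen/SpecialFunctions>
#include <iostream>

int main() {
  Eigen::ArrayXf a(3), b(3), x(3);
  a << 0.5f, 1.0f, 2.0f;
  b << 2.0f, 2.0f, 2.0f;
  x << 0.1f, 0.5f, 0.9f;
  // Regularized incomplete beta integral I_x(a, b), element-wise.
  Eigen::ArrayXf r = Eigen::betainc(a, b, x);
  std::cout << r << std::endl;
  return 0;
}
```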

---

dependence in TensorCostModel.h.

---

Move some scalar functors from `TensorFunctors.h` to Eigen core.

---

Explicitly specified the return type of the various `scalar_cmp_op` functors.

---

on GPU

---

functors. This makes them more CUDA-friendly.

---

This also fixes underflow issues when scaling complex matrices through the complex/complex operator.

---

functors for comparison operators

---

`numext::mini` and `numext::maxi` internal functions, and an `EIGEN_NOT_A_MACRO` macro.
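What these look like in use (a sketch; one motivation for such helpers is that they stay usable in device code and cannot be broken by Windows' `min`/`max` macros):

```cpp
#include <Eigen/Core>
#include <iostream>

int main() {
  // Device-friendly replacements for std::min / std::max.
  const float lo = Eigen::numext::mini(1.0f, 2.0f);  // 1.0f
  const float hi = Eigen::numext::maxi(1.0f, 2.0f);  // 2.0f
  std::cout << lo << " " << hi << std::endl;
  return 0;
}
```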