path: root/Eigen/src/Core/functors/BinaryFunctors.h
* Fix c++20 warnings about using enums in arithmetic expressions. (Rasmus Munk Larsen, 2021-06-10)
* Remove unused function scalar_cmp_with_cast. (Rasmus Munk Larsen, 2021-02-24)
* Disable vectorized pow for half/bfloat16. (Antonio Sanchez, 2021-02-05)

  We are potentially seeing some accuracy issues with these. Ideally we would hand off to `float`, but that's not trivial with the current setup. We may want to consider adding `ppow<Packet>` and `HasPow`, so implementations can more easily specialize this.
* Vectorize `pow(x, y)`. (Rasmus Munk Larsen, 2021-01-18)

  This closes https://gitlab.com/libeigen/eigen/-/issues/2085, which also contains a description of the algorithm.

  I ran some testing (comparing to `std::pow(double(x), double(y))`) for `x` in the set of all (positive) floats in the interval `[std::sqrt(std::numeric_limits<float>::min()), std::sqrt(std::numeric_limits<float>::max())]`, and `y` in `{2, sqrt(2), -sqrt(2)}`. I get the following error statistics:

  ```
  max_rel_error = 8.34405e-07
  rms_rel_error = 2.76654e-07
  ```

  If I widen the range to all normal floats I see lower accuracy for arguments where the result is subnormal, e.g. for `y = sqrt(2)`:

  ```
  max_rel_error = 0.666667
  rms           = 6.8727e-05
  count         = 1335165689
  argmax        = 2.56049e-32, 2.10195e-45 != 1.4013e-45
  ```

  which seems reasonable, since these results are subnormals with only a couple of significant bits left.
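For readers who want to reproduce this kind of measurement, here is a minimal sketch of a comparison harness. It is an assumption for illustration, not the test behind the numbers above: it samples the interval instead of sweeping every float, and it exercises the vectorized path through `Eigen::ArrayXf::pow`.

```cpp
#include <Eigen/Core>
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <limits>

int main() {
  const float lo = std::sqrt(std::numeric_limits<float>::min());
  const float hi = std::sqrt(std::numeric_limits<float>::max());
  const int n = 1 << 20;  // sample count (the original sweep used every float in range)
  Eigen::ArrayXf x = Eigen::ArrayXf::LinSpaced(n, lo, hi);
  const float y = std::sqrt(2.0f);
  Eigen::ArrayXf p = x.pow(y);  // vectorized pow under test

  double max_rel = 0.0, sum_sq = 0.0;
  for (int i = 0; i < n; ++i) {
    const double ref = std::pow(double(x[i]), double(y));
    const double rel = std::abs(double(p[i]) - ref) / std::abs(ref);
    max_rel = std::max(max_rel, rel);
    sum_sq += rel * rel;
  }
  std::printf("max_rel_error = %g\nrms_rel_error = %g\n",
              max_rel, std::sqrt(sum_sq / n));
}
```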
* Add packet generic ops `predux_fmin`, `predux_fmin_nan`, `predux_fmax`, and `predux_fmax_nan` that implement reductions with `PropagateNaN` and `PropagateNumbers` semantics. Add (slow) generic implementations for most reductions. (Rasmus Munk Larsen, 2020-10-13)
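As a rough illustration of the two semantics, here is a scalar sketch (an assumption for illustration; the real ops reduce SIMD packets, not raw arrays): `PropagateNumbers` skips NaN inputs and reduces over the remaining numbers, while `PropagateNaN` returns NaN as soon as any input is NaN.

```cpp
#include <cmath>
#include <limits>

// PropagateNumbers: NaN inputs are skipped; the result is NaN only if
// *every* input was NaN.
float fmin_propagate_numbers(const float* x, int n) {
  float acc = std::numeric_limits<float>::quiet_NaN();
  for (int i = 0; i < n; ++i)
    acc = std::fmin(acc, x[i]);  // std::fmin ignores a NaN operand
  return acc;
}

// PropagateNaN: any NaN input poisons the result. Assumes n >= 1.
float fmin_propagate_nan(const float* x, int n) {
  float acc = x[0];
  for (int i = 1; i < n; ++i)
    acc = (std::isnan(acc) || std::isnan(x[i]))
              ? std::numeric_limits<float>::quiet_NaN()
              : std::fmin(acc, x[i]);
  return acc;
}
```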
* Don't make assumptions about NaN-propagation for pmin/pmax - it varies across platforms. Change test to only test for NaN-propagation for pfmin/pfmax. (Rasmus Munk Larsen, 2020-10-07)
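A concrete instance of that platform variance (an illustrative, x86-only sketch; other ISAs behave differently): SSE's `minps` computes `a < b ? a : b`, so when either operand is NaN the comparison is false and the second operand is returned, making the result order-dependent.

```cpp
#include <xmmintrin.h>
#include <cmath>
#include <cstdio>

int main() {
  const float nan = std::nanf("");
  const __m128 a = _mm_set1_ps(nan);
  const __m128 b = _mm_set1_ps(1.0f);
  float r1, r2;
  _mm_store_ss(&r1, _mm_min_ps(a, b));  // min(NaN, 1) -> 1
  _mm_store_ss(&r2, _mm_min_ps(b, a));  // min(1, NaN) -> NaN
  std::printf("%g vs %g\n", r1, r2);    // NaN handling depends on argument order
}
```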
* BF16 for scalar_cmp_with_cast_op. (Sheng Yang, 2020-07-01)
* Implement scalar_cmp_with_cast_op. (ShengYang1, 2020-06-09)
* Extend support for Packet16b: (Rasmus Munk Larsen, 2020-04-28)

  * Add ptranspose<*,4> to support matmul and add unit test for Matrix<bool> * Matrix<bool>
  * Work around a bug in slicing of Tensor<bool>.
  * Add tensor tests.

  This speeds up matmul for boolean matrices by about 10x.

  name                 old time/op  new time/op  delta
  BM_MatMul<bool>/8    267ns ± 0%   479ns ± 0%   +79.25%  (p=0.008 n=5+5)
  BM_MatMul<bool>/32   6.42µs ± 0%  0.87µs ± 0%  -86.50%  (p=0.008 n=5+5)
  BM_MatMul<bool>/64   43.3µs ± 0%  5.9µs ± 0%   -86.42%  (p=0.008 n=5+5)
  BM_MatMul<bool>/128  315µs ± 0%   44µs ± 0%    -85.98%  (p=0.008 n=5+5)
  BM_MatMul<bool>/256  2.41ms ± 0%  0.34ms ± 0%  -85.68%  (p=0.008 n=5+5)
  BM_MatMul<bool>/512  18.8ms ± 0%  2.7ms ± 0%   -85.53%  (p=0.008 n=5+5)
  BM_MatMul<bool>/1k   149ms ± 0%   22ms ± 0%    -85.40%  (p=0.008 n=5+5)
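A minimal usage sketch of the operation being benchmarked (sizes are illustrative; for `bool`, the product combines coefficients with boolean arithmetic):

```cpp
#include <Eigen/Dense>

int main() {
  using BoolMatrix = Eigen::Matrix<bool, Eigen::Dynamic, Eigen::Dynamic>;
  const BoolMatrix a = BoolMatrix::Identity(64, 64);
  const BoolMatrix b = BoolMatrix::Constant(64, 64, true);
  const BoolMatrix c = a * b;  // boolean matmul, the path sped up by Packet16b
  return c(0, 0) ? 0 : 1;
}
```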
* Add partial vectorization for matrices and tensors of bool. (Rasmus Munk Larsen, 2020-04-20)

  This speeds up boolean operations on Tensors by up to 25x.

  Benchmark numbers for the logical and of two NxN tensors:

  name                                       old time/op  new time/op  delta
  BM_booleanAnd_1T/3    [using 1 threads]    14.6ns ± 0%  14.4ns ± 0%   -0.96%
  BM_booleanAnd_1T/4    [using 1 threads]    20.5ns ±12%   9.0ns ± 0%  -56.07%
  BM_booleanAnd_1T/7    [using 1 threads]    41.7ns ± 0%  10.5ns ± 0%  -74.87%
  BM_booleanAnd_1T/8    [using 1 threads]    52.1ns ± 0%  10.1ns ± 0%  -80.59%
  BM_booleanAnd_1T/10   [using 1 threads]    76.3ns ± 0%  13.8ns ± 0%  -81.87%
  BM_booleanAnd_1T/15   [using 1 threads]     167ns ± 0%    16ns ± 0%  -90.45%
  BM_booleanAnd_1T/16   [using 1 threads]     188ns ± 0%    16ns ± 0%  -91.57%
  BM_booleanAnd_1T/31   [using 1 threads]     667ns ± 0%    34ns ± 0%  -94.83%
  BM_booleanAnd_1T/32   [using 1 threads]     710ns ± 0%    35ns ± 0%  -95.01%
  BM_booleanAnd_1T/64   [using 1 threads]    2.80µs ± 0%  0.11µs ± 0%  -95.93%
  BM_booleanAnd_1T/128  [using 1 threads]    11.2µs ± 0%   0.4µs ± 0%  -96.11%
  BM_booleanAnd_1T/256  [using 1 threads]    44.6µs ± 0%   2.5µs ± 0%  -94.31%
  BM_booleanAnd_1T/512  [using 1 threads]     178µs ± 0%    10µs ± 0%  -94.35%
  BM_booleanAnd_1T/1k   [using 1 threads]     717µs ± 0%    78µs ± 1%  -89.07%
  BM_booleanAnd_1T/2k   [using 1 threads]    2.87ms ± 0%  0.31ms ± 1%  -89.08%
  BM_booleanAnd_1T/4k   [using 1 threads]    11.7ms ± 0%   1.9ms ± 4%  -83.55%
  BM_booleanAnd_1T/10k  [using 1 threads]    70.3ms ± 0%  17.2ms ± 4%  -75.48%
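A minimal sketch of the benchmarked operation (shape and values are illustrative):

```cpp
#include <unsupported/Eigen/CXX11/Tensor>

int main() {
  Eigen::Tensor<bool, 2> a(1024, 1024), b(1024, 1024);
  a.setConstant(true);
  b.setConstant(false);
  // Coefficient-wise logical and: the operation vectorized by this change.
  const Eigen::Tensor<bool, 2> c = a && b;
  return c(0, 0) ? 1 : 0;
}
```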
* Add missing arguments to numext::absdiff(). (Rasmus Munk Larsen, 2020-03-19)
* Add absolute_difference coefficient-wise binary Array function. (Joel Holdsworth, 2020-03-19)
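The underlying scalar operation is small enough to sketch (an illustration of the semantics, not the literal library code): absdiff(x, y) = |x - y|, written as compare-and-subtract so that unsigned types never form a negative intermediate.

```cpp
#include <Eigen/Core>
#include <iostream>

// Scalar semantics: absdiff(x, y) == |x - y| without a signed intermediate.
template <typename T>
T absdiff_sketch(T x, T y) { return x > y ? x - y : y - x; }

int main() {
  Eigen::ArrayXf a(3), b(3);
  a << 1, 5, 2;
  b << 4, 1, 2;
  // The same operation applied coefficient-wise via the generic binaryExpr mechanism.
  const Eigen::ArrayXf d =
      a.binaryExpr(b, [](float x, float y) { return absdiff_sketch(x, y); });
  std::cout << d.transpose() << "\n";  // 3 4 0
}
```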
* Updates based on PR feedback. (Deven Desai, 2018-06-14)

  There are two major changes (and a few minor ones which are not listed here; see PR discussion for details):

  1. Eigen::half implementations for HIP and CUDA have been merged. This means that
     - `CUDA/Half.h` and `HIP/hcc/Half.h` got merged to a new file `GPU/Half.h`
     - `CUDA/PacketMathHalf.h` and `HIP/hcc/PacketMathHalf.h` got merged to a new file `GPU/PacketMathHalf.h`
     - `CUDA/TypeCasting.h` and `HIP/hcc/TypeCasting.h` got merged to a new file `GPU/TypeCasting.h`

     After this change the `HIP/hcc` directory only contains one file, `math_constants.h`. That will go away too once that file becomes a part of the HIP install.

  2. New macros EIGEN_GPUCC, EIGEN_GPU_COMPILE_PHASE and EIGEN_HAS_GPU_FP16 have been added and the code has been updated to use them where appropriate.
     - `EIGEN_GPUCC` is the same as `(EIGEN_CUDACC || EIGEN_HIPCC)`
     - `EIGEN_GPU_DEVICE_COMPILE` is the same as `(EIGEN_CUDA_ARCH || EIGEN_HIP_DEVICE_COMPILE)`
     - `EIGEN_HAS_GPU_FP16` is the same as `(EIGEN_HAS_CUDA_FP16 || EIGEN_HAS_HIP_FP16)`
* Adding support for using Eigen in HIP kernels. (Deven Desai, 2018-06-06)

  This commit enables the use of Eigen on HIP kernels / AMD GPUs. Support has been added along the same lines as what already exists for using Eigen in CUDA kernels / NVidia GPUs.

  Application code needs to explicitly define EIGEN_USE_HIP when using Eigen in HIP kernels. This is because some of the CUDA headers get picked up by default during Eigen compile (irrespective of whether or not the underlying compiler is CUDACC/NVCC, e.g. Eigen/src/Core/arch/CUDA/Half.h). In order to maintain this behavior, the EIGEN_USE_HIP macro is used to switch to using the HIP version of those header files (see Eigen/Core and unsupported/Eigen/CXX11/Tensor).

  Use the "-DEIGEN_TEST_HIP" cmake option to enable the HIP specific unit tests.
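A minimal sketch of the required define in kernel code (the kernel body is generic HIP boilerplate for illustration, not from the commit; exact details vary across HIP and Eigen versions):

```cpp
#define EIGEN_USE_HIP  // must be defined before any Eigen include in HIP code
#include <hip/hip_runtime.h>
#include <Eigen/Core>

__global__ void scale_by_two(Eigen::half* x, int n) {
  const int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) x[i] = x[i] * Eigen::half(2.0f);
}
```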
* Factorize code between numext::hypot and the scalar_hypot_op functor. (Gael Guennebaud, 2018-04-04)
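For context, the shared core both call sites need is the overflow/underflow-safe formulation of hypot (a textbook sketch, not the literal Eigen code):

```cpp
#include <algorithm>
#include <cmath>

// hypot(x, y) = sqrt(x^2 + y^2), rescaled by the larger magnitude so that
// neither square can overflow or underflow.
double safe_hypot(double x, double y) {
  const double ax = std::abs(x), ay = std::abs(y);
  const double m = std::max(ax, ay);
  if (m == 0.0) return 0.0;               // avoid 0/0 below
  const double rx = ax / m, ry = ay / m;  // both in [0, 1]
  return m * std::sqrt(rx * rx + ry * ry);
}
```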
* Adding EIGEN_DEVICE_FUNC in the Geometry module. (Robert Lukierski, 2016-10-12)

  Additional CUDA-necessary fixes in the Core (mostly usage of EIGEN_USING_STD_MATH).
* bug #1195: Move NumTraits::Div<>::Cost to internal::scalar_div_cost (with some specializations in arch/SSE and arch/AVX). (Gael Guennebaud, 2016-09-08)
* bug #1232: Refactor special functions as a new SpecialFunctions module, currently in unsupported/. (Gael Guennebaud, 2016-07-08)
* Cleanup unused functors. (Gael Guennebaud, 2016-06-14)
* Generalize expr.pow(scalar), pow(expr,scalar) and pow(scalar,expr). (Gael Guennebaud, 2016-06-14)

  Internal: scalar_pow_op (unary) is removed, and scalar_binary_pow_op is renamed scalar_pow_op.
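A quick usage sketch of the three generalized entry points (values are illustrative):

```cpp
#include <Eigen/Core>
#include <iostream>

int main() {
  const Eigen::ArrayXd a = Eigen::ArrayXd::LinSpaced(4, 1.0, 4.0);
  std::cout << a.pow(2.0).transpose() << "\n";          // expr.pow(scalar)
  std::cout << Eigen::pow(a, 2.0).transpose() << "\n";  // pow(expr, scalar)
  std::cout << Eigen::pow(2.0, a).transpose() << "\n";  // pow(scalar, expr)
}
```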
* Implement expr+scalar, scalar+expr, expr-scalar, and scalar-expr as binary expressions, and generalize supported scalar types. (Gael Guennebaud, 2016-06-14)

  The following functors are now deprecated: scalar_add_op, scalar_sub_op, and scalar_rsub_op.
* Add unit-testing plugins to scalar_product_op and scalar_quotient_op to help check that types are properly propagated. (Gael Guennebaud, 2016-06-14)
* Add bind1st_op and bind2nd_op helpers to turn binary functors into unary ones, and implement scalar_multiple2 and scalar_quotient2 on top of them. (Gael Guennebaud, 2016-06-13)
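The pattern mirrors the old std::bind1st: store one operand and expose a unary operator(). A simplified sketch of the shape (an assumption for illustration; Eigen's actual helpers differ in details, e.g. they derive from the functor so that empty functors stay empty):

```cpp
#include <functional>

// Adapt a binary functor into a unary one by fixing its first argument.
template <typename BinaryOp, typename Scalar>
struct bind1st_sketch {
  BinaryOp op;
  Scalar first;
  Scalar operator()(const Scalar& b) const { return op(first, b); }
};

int main() {
  // A "multiply by 3" unary functor built from a generic product functor,
  // analogous to how scalar_multiple2 is built on top of bind1st_op.
  const bind1st_sketch<std::multiplies<double>, double> times3{
      std::multiplies<double>{}, 3.0};
  return times3(7.0) == 21.0 ? 0 : 1;
}
```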
* Enable mixing types in numext::pow. (Gael Guennebaud, 2016-06-10)
* bug #279: Enable mixing types for comparisons, min, and max. (Gael Guennebaud, 2016-06-10)
* Relax mixing-type constraints for binary coefficient-wise operators: (Gael Guennebaud, 2016-06-06)

  - Replace internal::scalar_product_traits<A,B> by Eigen::ScalarBinaryOpTraits<A,B,OP>
  - Remove the "functor_is_product_like" helper (was pretty ugly)
  - Currently, OP is not used, but it is available to the user for fine-grained tuning
  - Currently, only the following operators have been generalized: *,/,+,-,=,*=,/=,+=,-=
  - TODO: generalize all other binary operators (comparisons, pow, etc.)
  - TODO: handle "scalar op array" operators (currently only * is handled)
  - TODO: move the handling of the "void" scalar type to ScalarBinaryOpTraits
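The new trait is the user-facing extension point: specializing it declares the result type of a mixed-scalar binary op. A minimal sketch under an assumed custom scalar type (`MyLength` is hypothetical):

```cpp
#include <Eigen/Core>

struct MyLength { double value; };
inline MyLength operator*(const MyLength& a, double s) { return {a.value * s}; }

namespace Eigen {
// Declare that MyLength op double yields MyLength, for any binary op tag.
template <typename BinaryOp>
struct ScalarBinaryOpTraits<MyLength, double, BinaryOp> {
  typedef MyLength ReturnType;
};
}
```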
* Add TernaryFunctors and the betainc SpecialFunction. (Eugene Brevdo, 2016-06-02)

  TernaryFunctors and their executors allow operations on 3-tuples of inputs. API fully implemented for Arrays and Tensors based on binary functors.

  Ported the cephes betainc function (regularized incomplete beta integral) to Eigen, with support for CPU and GPU, floats, doubles, and half types. Added unit tests in array.cpp and cxx11_tensor_cuda.cu.

  Collapsed revision:
  * Merged helper methods for betainc across floats and doubles.
  * Added TensorGlobalFunctions with betainc(). Removed betainc() from TensorBase.
  * Clean up CwiseTernaryOp checks, change igamma_helper to cephes_helper.
  * betainc: merge incbcf and incbd into incbeta_cfe, and more cleanup.
  * Update TernaryOp and SpecialFunctions (betainc) based on review comments.
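A short usage sketch of the ternary entry point (values are illustrative; the include path assumes the later SpecialFunctions module split noted further up this log):

```cpp
#include <unsupported/Eigen/SpecialFunctions>
#include <Eigen/Core>
#include <iostream>

int main() {
  Eigen::ArrayXf a(3), b(3), x(3);
  a << 0.5f, 2.0f, 5.0f;
  b << 0.5f, 2.0f, 5.0f;
  x << 0.1f, 0.5f, 0.9f;
  // Regularized incomplete beta integral I_x(a, b), evaluated coefficient-wise.
  std::cout << Eigen::betainc(a, b, x).transpose() << "\n";
}
```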
* Roll back changes to core. Move include of TensorFunctors.h up to satisfy dependence in TensorCostModel.h. (Rasmus Munk Larsen, 2016-05-17)
* Improvements to parallelFor. (Rasmus Munk Larsen, 2016-05-12)

  Move some scalar functors from TensorFunctors.h to Eigen core.
* Added support for exclusive or. (Benoit Steiner, 2016-04-14)
* Improved the cost estimate of the quotient op. (Benoit Steiner, 2016-03-25)
* Started to model the cost of divisions more accurately. (Benoit Steiner, 2016-03-25)
* Resolve bad merge. (Eugene Brevdo, 2016-03-08)
* Made it possible to leverage several binary functors in a CUDA kernel. (Benoit Steiner, 2015-12-02)

  Explicitly specified the return type of the various scalar_cmp_op functors.
* Allow the vectorized version of the Binary and the Nullary functors to run on GPU. (Benoit Steiner, 2015-11-11)
* Reimplement the tensor comparison operators by using the scalar_cmp_op functors. This makes them more CUDA-friendly. (Benoit Steiner, 2015-11-06)
* Some functors were not generic wrt packet-type. (Gael Guennebaud, 2015-08-07)
* Add special path for matrix<complex>/real. (Gael Guennebaud, 2015-06-26)

  This also fixes underflow issues when scaling complex matrices through the complex/complex operator.
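The underflow mechanism, for context (an arithmetic illustration, not the library code): dividing by a plain real needs only two real divisions, whereas a textbook complex/complex division forms the squared magnitude of the denominator, which can underflow for small scales.

```cpp
#include <complex>
#include <cstdio>

int main() {
  const std::complex<float> z(1.0f, 1.0f);
  const float r = 1.0e-30f;  // tiny real scale factor

  // Special path for complex/real: two real divisions, no underflow here.
  const std::complex<float> direct(z.real() / r, z.imag() / r);

  // Textbook complex/complex division (a+bi)/(c+di) with c = r, d = 0:
  // the denominator c*c + d*d = 1e-60 underflows to 0 in float, so both
  // quotient components blow up to inf.
  const float c = r, d = 0.0f;
  const float denom = c * c + d * d;  // underflows to 0.0f
  const std::complex<float> textbook((z.real() * c + z.imag() * d) / denom,
                                     (z.imag() * c - z.real() * d) / denom);

  std::printf("direct   = (%g, %g)\n", direct.real(), direct.imag());
  std::printf("textbook = (%g, %g)\n", textbook.real(), textbook.imag());
}
```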
* bug #872: Avoid deprecated binder1st/binder2nd usage by providing custom functors for comparison operators. (Christoph Hertzberg, 2015-05-07)
* bug #701: Work around (min) and (max) blocking ADL by introducing numext::mini and numext::maxi internal functions and an EIGEN_NOT_A_MACRO macro. (Gael Guennebaud, 2014-10-20)
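Background for the parenthesized names (a generic C++ illustration; only the numext::mini/maxi and EIGEN_NOT_A_MACRO names come from the commit): windows.h defines min/max as function-like macros, so an unparenthesized call gets macro-expanded. Parenthesizing the name blocks expansion, but on an unqualified name it would also block argument-dependent lookup, hence the dedicated wrappers.

```cpp
#include <algorithm>

// What windows.h does (simplified):
#define min(a, b) (((a) < (b)) ? (a) : (b))

int f(int a, int b) {
  // return std::min(a, b);  // broken: the preprocessor rewrites "min(a, b)"
  return (std::min)(a, b);   // parentheses suppress macro expansion; on an
                             // unqualified call they would also suppress ADL,
                             // which is why Eigen added numext::mini/maxi
}
```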
* Fix hypot() and hypotNorm() wrt NaN and INF values. (Gael Guennebaud, 2014-09-02)
* Split the huge Functors.h file. (Gael Guennebaud, 2013-11-06)