path: root/Eigen/src/Core/ProductEvaluators.h
Commit message (Author, Date)
* Fix C++20 warnings about using enums in arithmetic expressions. (Rasmus Munk Larsen, 2021-06-10)
|
* Cast anonymous enums to int when used in expressions. (Rasmus Munk Larsen, 2021-02-24)
|
* Eliminate boolean product warnings by factoring out a `combine_scalar_factors` helper function. (Christoph Hertzberg, 2021-01-05)
|
* add EIGEN_DEVICE_FUNC to methods (acxz, 2020-12-01)
|
* Simplify expression for inner product fallback in Gemv product evaluator. (Rasmus Munk Larsen, 2020-11-12)
|
* Optimize matrix*matrix and matrix*vector products when they correspond to inner products at runtime. (Rasmus Munk Larsen, 2020-11-12)
|   This speeds up inner products where one or both arguments are dynamic, for small and medium-sized vectors (up to 32k).
|
|   name                           old time/op  new time/op  delta
|   BM_VecVecStatStat<float>/1     1.64ns ± 0%  1.64ns ± 0%  ~
|   BM_VecVecStatStat<float>/8     2.99ns ± 0%  2.99ns ± 0%  ~
|   BM_VecVecStatStat<float>/64    7.00ns ± 1%  7.04ns ± 0%  +0.66%
|   BM_VecVecStatStat<float>/512   61.6ns ± 0%  61.6ns ± 0%  ~
|   BM_VecVecStatStat<float>/4k    551ns ± 0%   553ns ± 1%   +0.26%
|   BM_VecVecStatStat<float>/32k   4.45µs ± 0%  4.45µs ± 0%  ~
|   BM_VecVecStatStat<float>/256k  77.9µs ± 0%  78.1µs ± 1%  ~
|   BM_VecVecStatStat<float>/1M    312µs ± 0%   312µs ± 1%   ~
|   BM_VecVecDynStat<float>/1      13.3ns ± 1%  4.6ns ± 0%   -65.35%
|   BM_VecVecDynStat<float>/8      14.4ns ± 0%  6.2ns ± 0%   -57.00%
|   BM_VecVecDynStat<float>/64     24.0ns ± 0%  10.2ns ± 3%  -57.57%
|   BM_VecVecDynStat<float>/512    138ns ± 0%   68ns ± 0%    -50.52%
|   BM_VecVecDynStat<float>/4k     1.11µs ± 0%  0.56µs ± 0%  -49.72%
|   BM_VecVecDynStat<float>/32k    8.89µs ± 0%  4.46µs ± 0%  -49.89%
|   BM_VecVecDynStat<float>/256k   78.2µs ± 0%  78.1µs ± 1%  ~
|   BM_VecVecDynStat<float>/1M     313µs ± 0%   312µs ± 1%   ~
|   BM_VecVecDynDyn<float>/1       10.4ns ± 0%  10.5ns ± 0%  +0.91%
|   BM_VecVecDynDyn<float>/8       12.0ns ± 3%  11.9ns ± 0%  ~
|   BM_VecVecDynDyn<float>/64      37.4ns ± 0%  19.6ns ± 1%  -47.57%
|   BM_VecVecDynDyn<float>/512     159ns ± 0%   81ns ± 0%    -49.07%
|   BM_VecVecDynDyn<float>/4k      1.13µs ± 0%  0.58µs ± 1%  -49.11%
|   BM_VecVecDynDyn<float>/32k     8.91µs ± 0%  5.06µs ±12%  -43.23%
|   BM_VecVecDynDyn<float>/256k    78.2µs ± 0%  78.2µs ± 1%  ~
|   BM_VecVecDynDyn<float>/1M      313µs ± 0%   312µs ± 1%   ~
* Bug https://gitlab.com/libeigen/eigen/-/issues/1415: add missing EIGEN_DEVICE_FUNC to diagonal_product_evaluator_base. (Masaki Murooka, 2020-03-20)
|
* Fix regression: .conjugate() was popped out but not re-introduced. (Gael Guennebaud, 2019-02-18)
|
* GEMM: catch all scalar-multiple variants when falling-back to a coeff-based product. (Gael Guennebaud, 2019-02-18)
|   Before, only s*A*B was caught, which was inconsistent with GEMM, sub-optimal, and could even lead to compilation errors (https://stackoverflow.com/questions/54738495).
* bug #1680: improve MSVC inlining by declaring many trivial constructors and accessors as STRONG_INLINE. (Gael Guennebaud, 2019-02-15)
|
* PR 526: Speed up multiplication of small, dynamically sized matrices (Mark D Ryan, 2018-10-12)
|   The Packet16f, Packet8f and Packet8d types are too large to use with dynamically sized matrices typically processed by the SliceVectorizedTraversal specialization of the dense_assignment_loop. Using these types is likely to lead to little or no vectorization. Significant slowdown in the multiplication of these small matrices can be observed when building with AVX and AVX512 enabled.
|
|   This patch introduces a new dense_assignment_kernel that is used when computing small products whose operands have dynamic dimensions. It ensures that the PacketSize used is no larger than 4, thereby increasing the chance that vectorized instructions will be used when computing the product.
|
|   I tested all 969 possible combinations of M, K, and N that are handled by the dense_assignment_loop on x86 builds. Although a few combinations are slowed down by this patch, they are far outnumbered by the cases that are sped up, as the following results demonstrate.
|
|   Disabling Packet8d on AVX512 builds:
|     Total Cases:             969
|     Better:                  511
|     Worse:                   85
|     Same:                    373
|     Max Improvement:         169.00% (4 8 6)
|     Max Degradation:         36.50% (8 5 3)
|     Median Improvement:      35.46%
|     Median Degradation:      17.41%
|     Total FLOPs Improvement: 19.42%
|
|   Disabling Packet16f and Packet8f on AVX512 builds:
|     Total Cases:             969
|     Better:                  658
|     Worse:                   5
|     Same:                    306
|     Max Improvement:         214.05% (8 6 5)
|     Max Degradation:         22.26% (16 2 1)
|     Median Improvement:      60.05%
|     Median Degradation:      13.32%
|     Total FLOPs Improvement: 59.58%
|
|   Disabling Packet8f on AVX builds:
|     Total Cases:             969
|     Better:                  663
|     Worse:                   96
|     Same:                    210
|     Max Improvement:         155.29% (4 10 5)
|     Max Degradation:         35.12% (8 3 2)
|     Median Improvement:      34.28%
|     Median Degradation:      15.05%
|     Total FLOPs Improvement: 26.02%
* Fix logic in diagonal*dense product in a corner case. (Gael Guennebaud, 2018-09-22)
|   The problem was for: diag(1x1) * mat(1,n)
* Fix doxy and misc. typos (luz.paz, 2018-08-01)
|   Found via `codespell -q 3 -I ../eigen-word-whitelist.txt`
|   ---
|    Eigen/src/Core/ProductEvaluators.h |  4 ++--
|    Eigen/src/Core/arch/GPU/Half.h     |  2 +-
|    Eigen/src/Core/util/Memory.h       |  2 +-
|    Eigen/src/Geometry/Hyperplane.h    |  2 +-
|    Eigen/src/Geometry/Transform.h     |  2 +-
|    Eigen/src/Geometry/Translation.h   | 12 ++++++------
|    doc/PreprocessorDirectives.dox     |  2 +-
|    doc/TutorialGeometry.dox           |  2 +-
|    test/boostmultiprec.cpp            |  2 +-
|    test/triangular.cpp                |  2 +-
|    10 files changed, 16 insertions(+), 16 deletions(-)
* merging updates from upstream (Deven Desai, 2018-07-11)
|\
| * Introduce the macro ei_declare_local_nested_eval to help allocate local temporaries on the stack via alloca, and let outer products make good use of it. (Gael Guennebaud, 2018-07-09)
| |   If successful, we should use it everywhere nested_eval is used to declare local dense temporaries.
* | updates based on PR feedback (Deven Desai, 2018-06-14)
| |   There are two major changes (and a few minor ones which are not listed here; see the PR discussion for details):
| |
| |   1. Eigen::half implementations for HIP and CUDA have been merged. This means that
| |      - `CUDA/Half.h` and `HIP/hcc/Half.h` got merged to a new file `GPU/Half.h`
| |      - `CUDA/PacketMathHalf.h` and `HIP/hcc/PacketMathHalf.h` got merged to a new file `GPU/PacketMathHalf.h`
| |      - `CUDA/TypeCasting.h` and `HIP/hcc/TypeCasting.h` got merged to a new file `GPU/TypeCasting.h`
| |      After this change the `HIP/hcc` directory only contains one file, `math_constants.h`. That will go away too once that file becomes a part of the HIP install.
| |
| |   2. New macros EIGEN_GPUCC, EIGEN_GPU_COMPILE_PHASE and EIGEN_HAS_GPU_FP16 have been added and the code has been updated to use them where appropriate.
| |      - `EIGEN_GPUCC` is the same as `(EIGEN_CUDACC || EIGEN_HIPCC)`
| |      - `EIGEN_GPU_DEVICE_COMPILE` is the same as `(EIGEN_CUDA_ARCH || EIGEN_HIP_DEVICE_COMPILE)`
| |      - `EIGEN_HAS_GPU_FP16` is the same as `(EIGEN_HAS_CUDA_FP16 or EIGEN_HAS_HIP_FP16)`
* | syncing this fork with upstream (Deven Desai, 2018-06-13)
|\ \
| | * Extend CUDA support to matrix inversion and selfadjointeigensolver (Andrea Bocci, 2018-06-11)
| | |
| | * bug #1562: optimize evaluation of small products of the form s*A*B by rewriting them as: s*(A.lazyProduct(B)) to save a costly temporary. (Gael Guennebaud, 2018-07-02)
| | |   Measured speedup from 2x to 5x...
| | * bug #1560 fix product with a 1x1 diagonal matrix (Gael Guennebaud, 2018-06-25)
| |/
| * Missing line during manual rebase of PR-374 (Christoph Hertzberg, 2018-06-07)
| |
* | Adding support for using Eigen in HIP kernels. (Deven Desai, 2018-06-06)
| |   This commit enables the use of Eigen on HIP kernels / AMD GPUs. Support has been added along the same lines as what already exists for using Eigen in CUDA kernels / NVidia GPUs.
| |
| |   Application code needs to explicitly define EIGEN_USE_HIP when using Eigen in HIP kernels. This is because some of the CUDA headers get picked up by default during Eigen compile (irrespective of whether or not the underlying compiler is CUDACC/NVCC, e.g. Eigen/src/Core/arch/CUDA/Half.h). In order to maintain this behavior, the EIGEN_USE_HIP macro is used to switch to using the HIP version of those header files (see Eigen/Core and unsupported/Eigen/CXX11/Tensor).
| |
| |   Use the "-DEIGEN_TEST_HIP" cmake option to enable the HIP specific unit tests.
| * Adding EIGEN_DEVICE_FUNC to Products, especially Dense2Dense Assignment specializations. (Robert Lukierski, 2018-03-14)
|/    Otherwise causes problems with small fixed size matrix multiplication (call to 0x00 in call_assignment_no_alias in debug mode or trap in release with CUDA 9.1).
* Adds missing EIGEN_STRONG_INLINE to support MSVC properly inlining small vector calculations (Basil Fierz, 2017-10-26)
|   When working with MSVC, small vector operations are often not properly inlined. This behaviour is observed even on the most recent compiler versions.
* Add an EIGEN_NO_CUDA option, and introduce EIGEN_CUDACC and EIGEN_CUDA_ARCH aliases (Gael Guennebaud, 2017-07-17)
|
* bug #1435: fix aliasing issue in expressions like: A = C - B*A; (Gael Guennebaud, 2017-06-08)
|
* Operators += and -= do not resize! (Gael Guennebaud, 2016-12-02)
|
* Fix a performance regression in (mat*mat)*vec for which mat*mat was evaluated multiple times. (Gael Guennebaud, 2016-11-30)
|
* Fix regression in X = (X*X.transpose())/s with X rectangular by deferring resizing of the destination until after the creation of the evaluator of the source expression. (Gael Guennebaud, 2016-10-26)
|
* Fix ICC warnings (Gael Guennebaud, 2016-10-25)
|
* Use explicit type casting to generate packets of zeros. (Benoit Steiner, 2016-10-04)
|
* bug #1308: fix compilation of some small products involving nullary-expressions. (Gael Guennebaud, 2016-09-29)
|
* Add debug info. (Gael Guennebaud, 2016-09-26)
|
* bug #1311: fix alignment logic in some cases of (scalar*small).lazyProduct(small) (Gael Guennebaud, 2016-09-26)
|
* bug #1308: fix compilation of vector * rowvector::nullary. (Gael Guennebaud, 2016-09-25)
|
* bug #1283: quick fix for products involving uncommon general block access to vectors. (Gael Guennebaud, 2016-08-31)
|
* Optimize expression matching "d?=a-b*c" as "d?=a; d?=b*c;" (Gael Guennebaud, 2016-08-23)
|
* Fix vectorization logic for coeff-based product for some corner cases. (Gael Guennebaud, 2016-07-31)
|
* Vectorize more small product expressions by letting the general assignment logic decide on the sizes that are OK for vectorization. (Gael Guennebaud, 2016-07-28)
|
* Allows the compiler to inline outer products (the change from default to dont-inline in changeset 737bed19c1fdb01568706bca19666531dda681a7 was not motivated) (Gael Guennebaud, 2016-07-22)
|
* Re-enable some specializations for Assignment<.,Product<>> (Gael Guennebaud, 2016-07-05)
|
* Fix template resolution. (Gael Guennebaud, 2016-07-04)
|
* Implement scalar multiples and division by a scalar as a binary-expression with a constant expression. (Gael Guennebaud, 2016-06-14)
|   This slightly complicates the type of the expressions and implies that we now have to distinguish between scalar*expr and expr*scalar to catch scalar-multiple expressions (e.g., see BlasUtil.h), but this brings several advantages:
|   - it makes it clear on which side the scalar is applied,
|   - it clearly reflects that we are dealing with a binary-expression,
|   - the complexity of the type is hidden through macros defined at the end of Macros.h,
|   - distinguishing between "scalar op expr" and "expr op scalar" is important to support non-commutative fields (like quaternions),
|   - "scalar op expr" is now fully equivalent to "ConstantExpr(scalar) op expr",
|   - scalar_multiple_op, scalar_quotient1_op and scalar_quotient2_op are not used anymore in officially supported modules (still used in Tensor).
* Disable shortcuts for res ?= prod when the scalar types do not match exactly. (Gael Guennebaud, 2016-06-06)
|
* Relax mixing-type constraints for binary coefficient-wise operators: (Gael Guennebaud, 2016-06-06)
|   - Replace internal::scalar_product_traits<A,B> by Eigen::ScalarBinaryOpTraits<A,B,OP>
|   - Remove the "functor_is_product_like" helper (was pretty ugly)
|   - Currently, OP is not used, but it is available to the user for fine grained tuning
|   - Currently, only the following operators have been generalized: *,/,+,-,=,*=,/=,+=,-=
|   - TODO: generalize all other binary operators (comparisons,pow,etc.)
|   - TODO: handle "scalar op array" operators (currently only * is handled)
|   - TODO: move the handling of the "void" scalar type to ScalarBinaryOpTraits
* bug #1181: help MSVC inlining. (Gael Guennebaud, 2016-05-31)
|
* Fix static/inline order. (Gael Guennebaud, 2016-05-25)
|
* bug #1207: Add and fix logical-op warnings (Christoph Hertzberg, 2016-05-11)
|
* Make use of is_same_dense helper instead of extract_data to detect whether input/outputs are the same. (Gael Guennebaud, 2016-04-13)
|
* Fix incomplete previous patch on matrix comparison. (Gael Guennebaud, 2016-04-13)
|