path: root/Eigen/src/Core/util/XprHelper.h
Commit message (Author, Age)
* Fix more enum arithmetic.  (Rasmus Munk Larsen, 2021-06-15)
* Make iterators default constructible and assignable, by making...  (Christoph Hertzberg, 2021-04-09)
* Revert "Revert "Adds EIGEN_CONSTEXPR and EIGEN_NOEXCEPT to rows(), cols(), ↵Gravatar Steve Bronder2021-03-24
| | | | | | innerStride(), outerStride(), and size()"" This reverts commit 5f0b4a4010af4cbf6161a0d1a03a747addc44a5d.
* Revert "Adds EIGEN_CONSTEXPR and EIGEN_NOEXCEPT to rows(), cols(), ↵Gravatar David Tellenbach2021-03-05
| | | | | | | innerStride(), outerStride(), and size()" This reverts commit 6cbb3038ac48cb5fe17eba4dfbf26e3e798041f1 because it breaks clang-10 builds on x86 and aarch64 when C++11 is enabled.
* Adds EIGEN_CONSTEXPR and EIGEN_NOEXCEPT to rows(), cols(), innerStride(), ↵Gravatar Steve Bronder2021-03-04
| | | | outerStride(), and size()
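Note: a minimal, illustrative sketch of what this change enables, assuming EIGEN_CONSTEXPR and EIGEN_NOEXCEPT expand to the constexpr and noexcept keywords on supporting compilers; the class below is generic, not Eigen's actual declarations.

    struct FixedSize3x4 {
      // size-query accessors marked constexpr and noexcept, as the commit does
      constexpr long rows() const noexcept { return 3; }
      constexpr long cols() const noexcept { return 4; }
      constexpr long size() const noexcept { return rows() * cols(); }
    };
    // usable in constant expressions once the accessors are constexpr
    static_assert(FixedSize3x4{}.size() == 12, "size() evaluated at compile time");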
* Fixed/masked more implicit copy constructor warnings  (Christoph Hertzberg, 2021-02-27)
    (cherry picked from commit 2883e91ce5a99c391fbf28e20160176b70854992)
* Revert "avoid selecting half-packets when unnecessary"  (Rasmus Munk Larsen, 2020-02-25)
    This reverts commit 5ca10480b0756e40b0723d90adeba8506291fc7c
* Revert "Pick full packet unconditionally when EIGEN_UNALIGNED_VECTORIZE"  (Rasmus Munk Larsen, 2020-02-25)
    This reverts commit 44df2109c8c700222643a9a45f144676348f4df1
* Revert "do not pick full-packet if it'd result in more operations"  (Rasmus Munk Larsen, 2020-02-25)
    This reverts commit e9cc0cd353803a818204e48054bd89699b84e6c6
* do not pick full-packet if it'd result in more operations  (Francesco Mazzoli, 2020-02-07)
    See comment and <https://gitlab.com/libeigen/eigen/merge_requests/46#note_270622952>.
* Pick full packet unconditionally when EIGEN_UNALIGNED_VECTORIZE  (Francesco Mazzoli, 2020-02-07)
    See comment for details.
* avoid selecting half-packets when unnecessary  (Francesco Mazzoli, 2020-02-07)
    See <https://stackoverflow.com/questions/59709148/ensuring-that-eigen-uses-avx-vectorization-for-a-certain-operation>
    for an explanation of the problem this solves.

    In short, for some reason, before this commit the half-packet is selected
    when the array / matrix size is not a multiple of
    `unpacket_traits<PacketType>::size`, where `PacketType` starts out being
    the full Packet. For example, for some data of 100 `float`s, `Packet4f`
    will be selected rather than `Packet8f`, because 100 is not a multiple of
    8, the size of `Packet8f`. This commit switches to selecting the
    half-packet if the size is less than the packet size, which seems to make
    more sense.

    As I stated in the SO post, I'm not sure that I'm understanding the issue
    correctly, but this fix resolves the issue in my program. Moreover,
    `make check` passes, with the exception of lines 614 and 616 in
    `test/packetmath.cpp`, which however also fail on master on my machine:

        CHECK_CWISE1_IF(PacketTraits::HasBessel, numext::bessel_i0, internal::pbessel_i0);
        ...
        CHECK_CWISE1_IF(PacketTraits::HasBessel, numext::bessel_i1, internal::pbessel_i1);
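Note: a hedged, illustrative sketch of the selection-rule change described in the entry above (not Eigen's actual metaprogram): before the commit the full packet was rejected whenever the size was not a multiple of the packet width; after it, the full packet is only rejected when the size is smaller than the packet width.

    // Size is the number of scalars, FullPacketSize the full packet width.
    template <int Size, int FullPacketSize>
    struct packet_choice_sketch {
      // old rule: 100 floats fall back to a 4-wide packet under AVX
      static const bool old_use_full = (Size % FullPacketSize) == 0;
      // new rule: 100 floats keep the 8-wide packet, with a scalar tail
      static const bool new_use_full = (Size >= FullPacketSize);
    };
    // e.g. packet_choice_sketch<100, 8>::old_use_full == false,
    //      packet_choice_sketch<100, 8>::new_use_full == true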
* Bug #1788: Fix rule-of-three violations inside the stable modules.  (Christoph Hertzberg, 2019-12-19)
    This fixes deprecated-copy warnings when compiling with GCC>=9.
    Also protect some additional Base-constructors from getting called by user code (#1587)
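Note: a minimal example of the GCC>=9 -Wdeprecated-copy pattern this entry refers to (illustrative, not Eigen code): a user-declared destructor without user-declared copy operations triggers the warning; defaulting them explicitly restores rule-of-three compliance.

    struct Holder {
      ~Holder() {}                            // user-declared destructor ...
      Holder() = default;
      Holder(const Holder&) = default;        // ... so declare the copy operations
      Holder& operator=(const Holder&) = default;
    };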
* Make is_valid_index_type return false for float and double when EIGEN_HAS_TYPE_TRAITS is off.  (Rasmus Munk Larsen, 2019-06-05)
* Add masked_store_available to unpacket_traits  (Eugene Zhulenev, 2019-05-02)
* Adding lowlevel APIs for optimized RHS packet load in TensorFlow SpatialConvolution  (Anuj Rawat, 2019-04-20)
    Low-level APIs are added in order to optimize packet load in gemm_pack_rhs
    in TensorFlow SpatialConvolution. The optimization is for the scenario when
    a packet is split across 2 adjacent columns. In this case we read it as two
    'partial' packets and then merge these into 1. Currently this only works
    for Packet16f (AVX512) and Packet8f (AVX2). We plan to add this for other
    packet types (such as Packet8d) also.

    This optimization shows significant speedup in SpatialConvolution with
    certain parameters. Some examples are below. Benchmark parameters are
    specified as: Batch size, Input dim, Depth, Num of filters, Filter dim.
    Speedup numbers are specified for number of threads 1, 2, 4, 8, 16.

    AVX512:
    Parameters                  | Speedup (Num of threads: 1, 2, 4, 8, 16)
    ----------------------------|------------------------------------------
    128, 24x24, 3, 64, 5x5      | 2.18X, 2.13X, 1.73X, 1.64X, 1.66X
    128, 24x24, 1, 64, 8x8      | 2.00X, 1.98X, 1.93X, 1.91X, 1.91X
    32, 24x24, 3, 64, 5x5       | 2.26X, 2.14X, 2.17X, 2.22X, 2.33X
    128, 24x24, 3, 64, 3x3      | 1.51X, 1.45X, 1.45X, 1.67X, 1.57X
    32, 14x14, 24, 64, 5x5      | 1.21X, 1.19X, 1.16X, 1.70X, 1.17X
    128, 128x128, 3, 96, 11x11  | 2.17X, 2.18X, 2.19X, 2.20X, 2.18X

    AVX2:
    Parameters                  | Speedup (Num of threads: 1, 2, 4, 8, 16)
    ----------------------------|------------------------------------------
    128, 24x24, 3, 64, 5x5      | 1.66X, 1.65X, 1.61X, 1.56X, 1.49X
    32, 24x24, 3, 64, 5x5       | 1.71X, 1.63X, 1.77X, 1.58X, 1.68X
    128, 24x24, 1, 64, 5x5      | 1.44X, 1.40X, 1.38X, 1.37X, 1.33X
    128, 24x24, 3, 64, 3x3      | 1.68X, 1.63X, 1.58X, 1.56X, 1.62X
    128, 128x128, 3, 96, 11x11  | 1.36X, 1.36X, 1.37X, 1.37X, 1.37X

    In the higher level benchmark cifar10, we observe a runtime improvement of
    around 6% for AVX512 on Intel Skylake server (8 cores).

    On lower level PackRhs micro-benchmarks specified in TensorFlow
    tensorflow/core/kernels/eigen_spatial_convolutions_test.cc, we observe the
    following runtime numbers:

    AVX512:
    Parameters                                                      | Runtime without patch (ns) | Runtime with patch (ns) | Speedup
    ----------------------------------------------------------------|----------------------------|-------------------------|--------
    BM_RHS_NAME(PackRhs, 128, 24, 24, 3, 64, 5, 5, 1, 1, 256, 56)   | 41350                      | 15073                   | 2.74X
    BM_RHS_NAME(PackRhs, 32, 64, 64, 32, 64, 5, 5, 1, 1, 256, 56)   | 7277                       | 7341                    | 0.99X
    BM_RHS_NAME(PackRhs, 32, 64, 64, 32, 64, 5, 5, 2, 2, 256, 56)   | 8675                       | 8681                    | 1.00X
    BM_RHS_NAME(PackRhs, 32, 64, 64, 30, 64, 5, 5, 1, 1, 256, 56)   | 24155                      | 16079                   | 1.50X
    BM_RHS_NAME(PackRhs, 32, 64, 64, 30, 64, 5, 5, 2, 2, 256, 56)   | 25052                      | 17152                   | 1.46X
    BM_RHS_NAME(PackRhs, 32, 256, 256, 4, 16, 8, 8, 1, 1, 256, 56)  | 18269                      | 18345                   | 1.00X
    BM_RHS_NAME(PackRhs, 32, 256, 256, 4, 16, 8, 8, 2, 4, 256, 56)  | 19468                      | 19872                   | 0.98X
    BM_RHS_NAME(PackRhs, 32, 64, 64, 4, 16, 3, 3, 1, 1, 36, 432)    | 156060                     | 42432                   | 3.68X
    BM_RHS_NAME(PackRhs, 32, 64, 64, 4, 16, 3, 3, 2, 2, 36, 432)    | 132701                     | 36944                   | 3.59X

    AVX2:
    Parameters                                                      | Runtime without patch (ns) | Runtime with patch (ns) | Speedup
    ----------------------------------------------------------------|----------------------------|-------------------------|--------
    BM_RHS_NAME(PackRhs, 128, 24, 24, 3, 64, 5, 5, 1, 1, 256, 56)   | 26233                      | 12393                   | 2.12X
    BM_RHS_NAME(PackRhs, 32, 64, 64, 32, 64, 5, 5, 1, 1, 256, 56)   | 6091                       | 6062                    | 1.00X
    BM_RHS_NAME(PackRhs, 32, 64, 64, 32, 64, 5, 5, 2, 2, 256, 56)   | 7427                       | 7408                    | 1.00X
    BM_RHS_NAME(PackRhs, 32, 64, 64, 30, 64, 5, 5, 1, 1, 256, 56)   | 23453                      | 20826                   | 1.13X
    BM_RHS_NAME(PackRhs, 32, 64, 64, 30, 64, 5, 5, 2, 2, 256, 56)   | 23167                      | 22091                   | 1.09X
    BM_RHS_NAME(PackRhs, 32, 256, 256, 4, 16, 8, 8, 1, 1, 256, 56)  | 23422                      | 23682                   | 0.99X
    BM_RHS_NAME(PackRhs, 32, 256, 256, 4, 16, 8, 8, 2, 4, 256, 56)  | 23165                      | 23663                   | 0.98X
    BM_RHS_NAME(PackRhs, 32, 64, 64, 4, 16, 3, 3, 1, 1, 36, 432)    | 72689                      | 44969                   | 1.62X
    BM_RHS_NAME(PackRhs, 32, 64, 64, 4, 16, 3, 3, 2, 2, 36, 432)    | 61732                      | 39779                   | 1.55X

    All benchmarks on Intel Skylake server with 8 cores.
* Introducing "vectorized" byte on unpacket_traits structsGravatar Gustavo Lima Chaves2018-12-19
| | | | | | | | | | | | | | | | | | | | | This is a preparation to a change on gebp_traits, where a new template argument will be introduced to dictate the packet size, so it won't be bound to the current/max packet size only anymore. By having packet types defined early on gebp_traits, one has now to act on packet types, not scalars anymore, for the enum values defined on that class. One approach for reaching the vectorizable/size properties one needs there could be getting the packet's scalar again with unpacket_traits<>, then the size/Vectorizable enum entries from packet_traits<>. It turns out guards like "#ifndef EIGEN_VECTORIZE_AVX512" at AVX/PacketMath.h will hide smaller packet variations of packet_traits<> for some types (and it makes sense to keep that). In other words, one can't go back to the scalar and create a new PacketType, as this will always lead to the maximum packet type for the architecture. The less costly/invasive solution for that, thus, is to add the vectorizable info on every unpacket_traits struct as well.
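Note: a hedged sketch of the idea in the entry above. The point is that per-packet metadata carries the size and a vectorizable flag directly, so code like gebp_traits can query a given packet type without going back through the scalar. Names and values below are illustrative, not Eigen's actual definitions.

    struct Packet4f_sketch { float v[4]; };

    template <typename Packet> struct unpacket_traits_sketch;

    // per-packet metadata: scalar type, width, and a vectorizable flag
    template <> struct unpacket_traits_sketch<Packet4f_sketch> {
      typedef float type;
      enum { size = 4, vectorizable = true };
    };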
* PR 465: Fix issue in RowMajor assignment in plain_matrix_type_row_major::type  (Justin Carpentier, 2018-08-10)
    The type should be RowMajor
* Add internal::is_identity compile-time helper  (Gael Guennebaud, 2018-07-11)
* Introduce the macro ei_declare_local_nested_eval to help allocate local temporaries on the stack via alloca, and let outer-products make good use of it.  (Gael Guennebaud, 2018-07-09)
    If successful, we should use it everywhere nested_eval is used to declare local dense temporaries.
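Note: a hypothetical illustration of the alloca-backed stack-temporary pattern this entry refers to, not the actual ei_declare_local_nested_eval macro. The 4096-byte threshold and the heap fallback are assumptions made for the sketch.

    #include <cstddef>
    #include <cstdlib>
    #include <alloca.h>   // platform specific; MSVC uses _alloca from <malloc.h>

    int main() {
      const std::size_t n = 64;                      // runtime-sized temporary
      const std::size_t bytes = n * sizeof(float);
      void* raw;
      if (bytes <= 4096)
        raw = alloca(bytes);       // small temporary lives in the current stack frame
      else
        raw = std::malloc(bytes);  // larger temporaries fall back to the heap
      float* tmp = static_cast<float*>(raw);
      float sum = 0.f;
      for (std::size_t i = 0; i < n; ++i) tmp[i] = static_cast<float>(i);
      for (std::size_t i = 0; i < n; ++i) sum += tmp[i];
      if (bytes > 4096) std::free(raw);              // only the heap path needs an explicit free
      return sum > 0.f ? 0 : 1;
    }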
* Extend CUDA support to matrix inversion and selfadjointeigensolver  (Andrea Bocci, 2018-06-11)
* Make is_same_dense compatible with different scalar types.  (Gael Guennebaud, 2018-07-03)
* fix stupid typo  (Gael Guennebaud, 2018-05-18)
* is_convertible<T,Index> does not seem to work well with MSVC 2013, so let's rather use __is_enum(T) for old MSVC versions  (Gael Guennebaud, 2018-05-18)
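Note: a hedged illustration of the kind of workaround described above; the macro name is hypothetical, not Eigen's. Pre-2015 MSVC falls back to the __is_enum compiler intrinsic, other compilers keep an is_convertible-style check.

    #include <cstddef>
    #include <type_traits>

    #if defined(_MSC_VER) && _MSC_VER < 1900
      // old MSVC: rely on the compiler intrinsic rather than is_convertible
      #define MY_IS_VALID_INDEX_KIND(T) __is_enum(T)
    #else
      #define MY_IS_VALID_INDEX_KIND(T) (std::is_convertible<T, std::ptrdiff_t>::value)
    #endif

    enum Axis { X = 0, Y = 1 };
    static_assert(MY_IS_VALID_INDEX_KIND(Axis), "enums are accepted as index-like types");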
* add some internal checks  (Gael Guennebaud, 2018-05-18)
* std::integral_constant is not C++03 compatible  (Christoph Hertzberg, 2017-09-14)
* Fix compilation of Vector::operator()(enum) by treating enums as Index  (Gael Guennebaud, 2017-09-07)
* Merged in ggael/eigen-flexidexing (pull request PR-294)  (Gael Guennebaud, 2017-01-26)
    generalized operator() for indexed access and slicing
* bug #1381: fix sparse.diagonal() used as a rvalue.  (Gael Guennebaud, 2017-01-25)
    The problem was that if "sparse" is not const, then sparse.diagonal() must
    have the LValueBit flag, meaning that sparse.diagonal().coeff(i) must
    return a const reference, const Scalar&. However, sparse::coeff() cannot
    return a reference for a non-existing zero coefficient. The trick is to
    return a reference to a local member of evaluator<SparseMatrix>.
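Note: a hypothetical sketch of the trick described above (names and types are illustrative, not Eigen's evaluator): the evaluator keeps a zero-valued member so that coeff(i) can return a const reference even for entries that are not stored.

    struct diagonal_evaluator_sketch {
      const float* m_values;       // stored nonzeros on the diagonal
      const bool*  m_present;      // true if the diagonal entry is stored
      float        m_zero = 0.0f;  // local member backing references to structural zeros
      const float& coeff(int i) const {
        return m_present[i] ? m_values[i] : m_zero;
      }
    };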
* Make variable_if_dynamic<T> implicitly convertible to T  (Gael Guennebaud, 2017-01-11)
* Make sure that traits<CwiseBinaryOp>::Flags reports the correct storage order so that methods like .outerSize()/.innerSize() work properly.  (Gael Guennebaud, 2016-12-27)
* Harmless typo  (Gael Guennebaud, 2016-12-27)
* Fix a performance regression in (mat*mat)*vec for which mat*mat was evaluated multiple times.  (Gael Guennebaud, 2016-11-30)
* bug #1328: workaround a compilation issue with gcc 4.2  (Gael Guennebaud, 2016-10-20)
* Improve cost estimation of complex division  (Gael Guennebaud, 2016-09-21)
* Fix compilation on 32 bits systems.  (Gael Guennebaud, 2016-09-09)
* bug #1195: move NumTraits::Div<>::Cost to internal::scalar_div_cost (with some specializations in arch/SSE and arch/AVX)  (Gael Guennebaud, 2016-09-08)
* Generalize ScalarBinaryOpTraits to any complex-real combination as defined by NumTraits (instead of supporting std::complex only).  (Gael Guennebaud, 2016-09-06)
* bug #1286: automatically detect the available prototypes of functors passed to CwiseNullaryExpr  (Gael Guennebaud, 2016-08-31)
    Functors now only have to implement the operators that matter among:
    operator()(), operator()(i), operator()(i,j).
    Linear access is also automatically detected based on the availability of operator()(i,j).
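Note: a small usage example of the behaviour described above, using Eigen's public NullaryExpr API; the Iota2D functor is a hypothetical illustration that only provides operator()(i,j).

    #include <Eigen/Dense>

    struct Iota2D {
      // the only call operator provided; Eigen detects what is available
      double operator()(Eigen::Index i, Eigen::Index j) const { return 10.0 * i + j; }
    };

    int main() {
      Eigen::MatrixXd m = Eigen::MatrixXd::NullaryExpr(3, 4, Iota2D());
      return m(2, 3) == 23.0 ? 0 : 1;   // entry (2,3) is 10*2 + 3
    }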
* Simplify ScalarBinaryOpTraits by removing the Defined enum, and extend its documentation.  (Gael Guennebaud, 2016-07-20)
* Make scalar_product_op the default (instead of void)  (Gael Guennebaud, 2016-07-18)
* Fix missing specialization.  (Gael Guennebaud, 2016-06-24)
* Relax promote_scalar_arg logic to enable promotion to Expr::Scalar if conversion to Expr::Literal fails.  (Gael Guennebaud, 2016-06-24)
    This is useful to cancel expression template at the scalar level, e.g. with AutoDiff<AutoDiff<>>. This patch also defers calls to NumTraits in cases for which types are not directly compatible.
* Introduce a NumTraits<T>::Literal type to be used for literals, and improve mixing type support in operations between arrays and scalars:  (Gael Guennebaud, 2016-06-23)
    - 2 * ArrayXcf is now optimized in the sense that the integer 2 is properly promoted to a float instead of a complex<float> (fix a regression)
    - 2.1 * ArrayXi is now forbidden (previously, 2.1 was converted to 2)
    - This mechanism should be applicable to any custom scalar type, assuming NumTraits<T>::Literal is properly defined (it defaults to T)
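Note: a usage example of the first bullet above. The promotion behaviour is the one the entry describes; the specific values checked here are only an illustration.

    #include <Eigen/Dense>
    #include <complex>

    int main() {
      Eigen::ArrayXcf a = Eigen::ArrayXcf::Constant(4, std::complex<float>(1.f, -1.f));
      // the integer literal 2 is promoted to float (the Literal type of
      // complex<float>), not to complex<float>, so this stays a cheap scaling
      Eigen::ArrayXcf b = 2 * a;
      // by contrast, 2.1 * ArrayXi does not compile after this change
      return b(0) == std::complex<float>(2.f, -2.f) ? 0 : 1;
    }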
* Implement scalar multiples and division by a scalar as a binary-expression with a constant expression.  (Gael Guennebaud, 2016-06-14)
    This slightly complexifies the type of the expressions and implies that we now have to distinguish between scalar*expr and expr*scalar to catch scalar-multiple expressions (e.g., see BlasUtil.h), but this brings several advantages:
    - it makes it clear on which side the scalar is applied,
    - it clearly reflects that we are dealing with a binary-expression,
    - the complexity of the type is hidden through macros defined at the end of Macros.h,
    - distinguishing between "scalar op expr" and "expr op scalar" is important to support non commutative fields (like quaternions),
    - "scalar op expr" is now fully equivalent to "ConstantExpr(scalar) op expr",
    - scalar_multiple_op, scalar_quotient1_op and scalar_quotient2_op are not used anymore in officially supported modules (still used in Tensor).
* Clean handling for void type in EIGEN_CHECK_BINARY_COMPATIBILIY  (Gael Guennebaud, 2016-06-06)
* Relax mixing-type constraints for binary coefficient-wise operators:  (Gael Guennebaud, 2016-06-06)
    - Replace internal::scalar_product_traits<A,B> by Eigen::ScalarBinaryOpTraits<A,B,OP>
    - Remove the "functor_is_product_like" helper (was pretty ugly)
    - Currently, OP is not used, but it is available to the user for fine grained tuning
    - Currently, only the following operators have been generalized: *,/,+,-,=,*=,/=,+=,-=
    - TODO: generalize all other binary operators (comparisons, pow, etc.)
    - TODO: handle "scalar op array" operators (currently only * is handled)
    - TODO: move the handling of the "void" scalar type to ScalarBinaryOpTraits
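Note: a hedged sketch of the extension point introduced above, following the pattern Eigen documents for mixing a custom scalar with a built-in one. MyScalar is a hypothetical user type; a real custom scalar would also need NumTraits and the arithmetic operators, which are omitted here.

    #include <Eigen/Core>

    struct MyScalar { double v; };

    namespace Eigen {
    // tell Eigen that any binary op mixing MyScalar and double yields MyScalar
    template <typename BinaryOp>
    struct ScalarBinaryOpTraits<MyScalar, double, BinaryOp> {
      typedef MyScalar ReturnType;
    };
    }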
* Remove dead code.  (Gael Guennebaud, 2016-06-02)
* Implement generic scalar*expr and expr*scalar operator based on scalar_product_traits.  (Gael Guennebaud, 2016-06-02)
    This is especially useful for custom scalar types, e.g., to enable float*expr<multi_prec> without conversion.
* bug #1181: help MSVC inlining.  (Gael Guennebaud, 2016-05-31)