| Commit message | Author | Age |
|---|---|---|
| Fix more enum arithmetic. | Rasmus Munk Larsen | 2021-06-15 |
| Make iterators default constructible and assignable, by making... | Christoph Hertzberg | 2021-04-09 |
| Revert "Revert "Adds EIGEN_CONSTEXPR and EIGEN_NOEXCEPT to rows(), cols(), innerStride(), outerStride(), and size()"" | Steve Bronder | 2021-03-24 |
  This reverts commit 5f0b4a4010af4cbf6161a0d1a03a747addc44a5d.
| Revert "Adds EIGEN_CONSTEXPR and EIGEN_NOEXCEPT to rows(), cols(), innerStride(), outerStride(), and size()" | David Tellenbach | 2021-03-05 |
  This reverts commit 6cbb3038ac48cb5fe17eba4dfbf26e3e798041f1 because it breaks clang-10 builds on x86 and aarch64 when C++11 is enabled.
| Adds EIGEN_CONSTEXPR and EIGEN_NOEXCEPT to rows(), cols(), innerStride(), outerStride(), and size() | Steve Bronder | 2021-03-04 |
| Fixed/masked more implicit copy constructor warnings | Christoph Hertzberg | 2021-02-27 |
  (cherry picked from commit 2883e91ce5a99c391fbf28e20160176b70854992)
| Revert "avoid selecting half-packets when unnecessary" | Rasmus Munk Larsen | 2020-02-25 |
  This reverts commit 5ca10480b0756e40b0723d90adeba8506291fc7c.
| Revert "Pick full packet unconditionally when EIGEN_UNALIGNED_VECTORIZE" | Rasmus Munk Larsen | 2020-02-25 |
  This reverts commit 44df2109c8c700222643a9a45f144676348f4df1.
| Revert "do not pick full-packet if it'd result in more operations" | Rasmus Munk Larsen | 2020-02-25 |
  This reverts commit e9cc0cd353803a818204e48054bd89699b84e6c6.
| do not pick full-packet if it'd result in more operations | Francesco Mazzoli | 2020-02-07 |
  See comment and <https://gitlab.com/libeigen/eigen/merge_requests/46#note_270622952>.
| Pick full packet unconditionally when EIGEN_UNALIGNED_VECTORIZE | Francesco Mazzoli | 2020-02-07 |
  See comment for details.
| avoid selecting half-packets when unnecessary | Francesco Mazzoli | 2020-02-07 |
  See <https://stackoverflow.com/questions/59709148/ensuring-that-eigen-uses-avx-vectorization-for-a-certain-operation> for an explanation of the problem this solves. In short, for some reason, before this commit the half-packet was selected when the array / matrix size was not a multiple of `unpacket_traits<PacketType>::size`, where `PacketType` starts out being the full packet. For example, for some data of 100 `float`s, `Packet4f` would be selected rather than `Packet8f`, because 100 is not a multiple of 8, the size of `Packet8f`. This commit switches to selecting the half-packet if the size is less than the packet size, which seems to make more sense.

  As stated in the SO post, I'm not sure that I'm understanding the issue correctly, but this fix resolves the issue in my program. Moreover, `make check` passes, with the exception of lines 614 and 616 in `test/packetmath.cpp`, which however also fail on master on my machine:

      CHECK_CWISE1_IF(PacketTraits::HasBessel, numext::bessel_i0, internal::pbessel_i0);
      ...
      CHECK_CWISE1_IF(PacketTraits::HasBessel, numext::bessel_i1, internal::pbessel_i1);
| Bug #1788: Fix rule-of-three violations inside the stable modules. | Christoph Hertzberg | 2019-12-19 |
  This fixes deprecated-copy warnings when compiling with GCC >= 9. Also protect some additional Base constructors from getting called by user code (#1587).
| Make is_valid_index_type return false for float and double when EIGEN_HAS_TYPE_TRAITS is off. | Rasmus Munk Larsen | 2019-06-05 |
| Add masked_store_available to unpacket_traits | Eugene Zhulenev | 2019-05-02 |
| Adding low-level APIs for optimized RHS packet load in TensorFlow SpatialConvolution | Anuj Rawat | 2019-04-20 |
  Low-level APIs are added in order to optimize packet load in gemm_pack_rhs in TensorFlow SpatialConvolution. The optimization is for the scenario when a packet is split across 2 adjacent columns. In this case we read it as two 'partial' packets and then merge these into 1. Currently this only works for Packet16f (AVX512) and Packet8f (AVX2). We plan to add this for other packet types (such as Packet8d) also.

  This optimization shows significant speedup in SpatialConvolution with certain parameters. Some examples are below. Benchmark parameters are specified as: Batch size, Input dim, Depth, Num of filters, Filter dim. Speedup numbers are specified for number of threads 1, 2, 4, 8, 16.

  AVX512:

  Parameters                  | Speedup (Num of threads: 1, 2, 4, 8, 16)
  ----------------------------|------------------------------------------
  128, 24x24, 3, 64, 5x5      | 2.18X, 2.13X, 1.73X, 1.64X, 1.66X
  128, 24x24, 1, 64, 8x8      | 2.00X, 1.98X, 1.93X, 1.91X, 1.91X
  32, 24x24, 3, 64, 5x5       | 2.26X, 2.14X, 2.17X, 2.22X, 2.33X
  128, 24x24, 3, 64, 3x3      | 1.51X, 1.45X, 1.45X, 1.67X, 1.57X
  32, 14x14, 24, 64, 5x5      | 1.21X, 1.19X, 1.16X, 1.70X, 1.17X
  128, 128x128, 3, 96, 11x11  | 2.17X, 2.18X, 2.19X, 2.20X, 2.18X

  AVX2:

  Parameters                  | Speedup (Num of threads: 1, 2, 4, 8, 16)
  ----------------------------|------------------------------------------
  128, 24x24, 3, 64, 5x5      | 1.66X, 1.65X, 1.61X, 1.56X, 1.49X
  32, 24x24, 3, 64, 5x5       | 1.71X, 1.63X, 1.77X, 1.58X, 1.68X
  128, 24x24, 1, 64, 5x5      | 1.44X, 1.40X, 1.38X, 1.37X, 1.33X
  128, 24x24, 3, 64, 3x3      | 1.68X, 1.63X, 1.58X, 1.56X, 1.62X
  128, 128x128, 3, 96, 11x11  | 1.36X, 1.36X, 1.37X, 1.37X, 1.37X

  In the higher level benchmark cifar10, we observe a runtime improvement of around 6% for AVX512 on Intel Skylake server (8 cores).

  On lower level PackRhs micro-benchmarks specified in TensorFlow tensorflow/core/kernels/eigen_spatial_convolutions_test.cc, we observe the following runtime numbers:

  AVX512:

  Parameters                                                     | Runtime without patch (ns) | Runtime with patch (ns) | Speedup
  ---------------------------------------------------------------|----------------------------|-------------------------|--------
  BM_RHS_NAME(PackRhs, 128, 24, 24, 3, 64, 5, 5, 1, 1, 256, 56)  | 41350                      | 15073                   | 2.74X
  BM_RHS_NAME(PackRhs, 32, 64, 64, 32, 64, 5, 5, 1, 1, 256, 56)  | 7277                       | 7341                    | 0.99X
  BM_RHS_NAME(PackRhs, 32, 64, 64, 32, 64, 5, 5, 2, 2, 256, 56)  | 8675                       | 8681                    | 1.00X
  BM_RHS_NAME(PackRhs, 32, 64, 64, 30, 64, 5, 5, 1, 1, 256, 56)  | 24155                      | 16079                   | 1.50X
  BM_RHS_NAME(PackRhs, 32, 64, 64, 30, 64, 5, 5, 2, 2, 256, 56)  | 25052                      | 17152                   | 1.46X
  BM_RHS_NAME(PackRhs, 32, 256, 256, 4, 16, 8, 8, 1, 1, 256, 56) | 18269                      | 18345                   | 1.00X
  BM_RHS_NAME(PackRhs, 32, 256, 256, 4, 16, 8, 8, 2, 4, 256, 56) | 19468                      | 19872                   | 0.98X
  BM_RHS_NAME(PackRhs, 32, 64, 64, 4, 16, 3, 3, 1, 1, 36, 432)   | 156060                     | 42432                   | 3.68X
  BM_RHS_NAME(PackRhs, 32, 64, 64, 4, 16, 3, 3, 2, 2, 36, 432)   | 132701                     | 36944                   | 3.59X

  AVX2:

  Parameters                                                     | Runtime without patch (ns) | Runtime with patch (ns) | Speedup
  ---------------------------------------------------------------|----------------------------|-------------------------|--------
  BM_RHS_NAME(PackRhs, 128, 24, 24, 3, 64, 5, 5, 1, 1, 256, 56)  | 26233                      | 12393                   | 2.12X
  BM_RHS_NAME(PackRhs, 32, 64, 64, 32, 64, 5, 5, 1, 1, 256, 56)  | 6091                       | 6062                    | 1.00X
  BM_RHS_NAME(PackRhs, 32, 64, 64, 32, 64, 5, 5, 2, 2, 256, 56)  | 7427                       | 7408                    | 1.00X
  BM_RHS_NAME(PackRhs, 32, 64, 64, 30, 64, 5, 5, 1, 1, 256, 56)  | 23453                      | 20826                   | 1.13X
  BM_RHS_NAME(PackRhs, 32, 64, 64, 30, 64, 5, 5, 2, 2, 256, 56)  | 23167                      | 22091                   | 1.09X
  BM_RHS_NAME(PackRhs, 32, 256, 256, 4, 16, 8, 8, 1, 1, 256, 56) | 23422                      | 23682                   | 0.99X
  BM_RHS_NAME(PackRhs, 32, 256, 256, 4, 16, 8, 8, 2, 4, 256, 56) | 23165                      | 23663                   | 0.98X
  BM_RHS_NAME(PackRhs, 32, 64, 64, 4, 16, 3, 3, 1, 1, 36, 432)   | 72689                      | 44969                   | 1.62X
  BM_RHS_NAME(PackRhs, 32, 64, 64, 4, 16, 3, 3, 2, 2, 36, 432)   | 61732                      | 39779                   | 1.55X

  All benchmarks on Intel Skylake server with 8 cores.
| Introducing "vectorized" byte on unpacket_traits structs | Gustavo Lima Chaves | 2018-12-19 |
  This is a preparation for a change on gebp_traits, where a new template argument will be introduced to dictate the packet size, so it won't be bound to the current/max packet size only anymore.

  By having packet types defined early on gebp_traits, one now has to act on packet types, not scalars anymore, for the enum values defined on that class. One approach for reaching the vectorizable/size properties one needs there could be getting the packet's scalar again with unpacket_traits<>, then the size/Vectorizable enum entries from packet_traits<>. It turns out guards like "#ifndef EIGEN_VECTORIZE_AVX512" at AVX/PacketMath.h will hide smaller packet variations of packet_traits<> for some types (and it makes sense to keep that). In other words, one can't go back to the scalar and create a new PacketType, as this will always lead to the maximum packet type for the architecture.

  The less costly/invasive solution, thus, is to add the vectorizable info on every unpacket_traits struct as well.
| PR 465: Fix issue in RowMajor assignment in plain_matrix_type_row_major::type | Justin Carpentier | 2018-08-10 |
  The type should be RowMajor.
| Add internal::is_identity compile-time helper | Gael Guennebaud | 2018-07-11 |
| Introduce the macro ei_declare_local_nested_eval to help allocate local temporaries on the stack via alloca, and let outer-products make good use of it. | Gael Guennebaud | 2018-07-09 |
  If successful, we should use it everywhere nested_eval is used to declare local dense temporaries.
| Extend CUDA support to matrix inversion and selfadjointeigensolver | Andrea Bocci | 2018-06-11 |
| Make is_same_dense compatible with different scalar types. | Gael Guennebaud | 2018-07-03 |
| Fix stupid typo | Gael Guennebaud | 2018-05-18 |
| is_convertible<T,Index> does not seem to work well with MSVC 2013, so let's rather use __is_enum(T) for old MSVC versions | Gael Guennebaud | 2018-05-18 |
| Add some internal checks | Gael Guennebaud | 2018-05-18 |
| std::integral_constant is not C++03 compatible | Christoph Hertzberg | 2017-09-14 |
| Fix compilation of Vector::operator()(enum) by treating enums as Index | Gael Guennebaud | 2017-09-07 |
| Merged in ggael/eigen-flexidexing (pull request PR-294) | Gael Guennebaud | 2017-01-26 |
  generalized operator() for indexed access and slicing
| bug #1381: fix sparse.diagonal() used as an rvalue. | Gael Guennebaud | 2017-01-25 |
  The problem was that if "sparse" is not const, then sparse.diagonal() must have the LValueBit flag, meaning that sparse.diagonal().coeff(i) must return a const reference, const Scalar&. However, sparse::coeff() cannot return a reference for a non-existing zero coefficient. The trick is to return a reference to a local member of evaluator<SparseMatrix>.
| Make variable_if_dynamic<T> implicitly convertible to T | Gael Guennebaud | 2017-01-11 |
| Make sure that traits<CwiseBinaryOp>::Flags reports the correct storage order so that methods like .outerSize()/.innerSize() work properly. | Gael Guennebaud | 2016-12-27 |
| Harmless typo | Gael Guennebaud | 2016-12-27 |
| Fix a performance regression in (mat*mat)*vec for which mat*mat was evaluated multiple times. | Gael Guennebaud | 2016-11-30 |
| bug #1328: workaround a compilation issue with gcc 4.2 | Gael Guennebaud | 2016-10-20 |
| Improve cost estimation of complex division | Gael Guennebaud | 2016-09-21 |
| Fix compilation on 32-bit systems. | Gael Guennebaud | 2016-09-09 |
| bug #1195: move NumTraits::Div<>::Cost to internal::scalar_div_cost (with some specializations in arch/SSE and arch/AVX) | Gael Guennebaud | 2016-09-08 |
| Generalize ScalarBinaryOpTraits to any complex-real combination as defined by NumTraits (instead of supporting std::complex only). | Gael Guennebaud | 2016-09-06 |
| bug #1286: automatically detect the available prototypes of functors passed to CwiseNullaryExpr | Gael Guennebaud | 2016-08-31 |
  Functors now only have to implement the operators that matter among: operator()(), operator()(i), operator()(i,j). Linear access is also automatically detected based on the availability of operator()(i,j).
| Simplify ScalarBinaryOpTraits by removing the Defined enum, and extend its documentation. | Gael Guennebaud | 2016-07-20 |
| Make scalar_product_op the default (instead of void) | Gael Guennebaud | 2016-07-18 |
| Fix missing specialization. | Gael Guennebaud | 2016-06-24 |
| Relax promote_scalar_arg logic to enable promotion to Expr::Scalar if conversion to Expr::Literal fails. | Gael Guennebaud | 2016-06-24 |
  This is useful to cancel expression templates at the scalar level, e.g., with AutoDiff<AutoDiff<>>. This patch also defers calls to NumTraits in cases for which types are not directly compatible.
| Introduce a NumTraits<T>::Literal type to be used for literals, and improve mixing-type support in operations between arrays and scalars: | Gael Guennebaud | 2016-06-23 |
  - 2 * ArrayXcf is now optimized in the sense that the integer 2 is properly promoted to a float instead of a complex<float> (fixes a regression)
  - 2.1 * ArrayXi is now forbidden (previously, 2.1 was converted to 2)
  - This mechanism should be applicable to any custom scalar type, assuming NumTraits<T>::Literal is properly defined (it defaults to T)
| Implement scalar multiples and division by a scalar as a binary expression with a constant expression. | Gael Guennebaud | 2016-06-14 |
  This slightly complexifies the type of the expressions and implies that we now have to distinguish between scalar*expr and expr*scalar to catch scalar-multiple expressions (e.g., see BlasUtil.h), but this brings several advantages:
  - it makes clear on which side the scalar is applied,
  - it clearly reflects that we are dealing with a binary expression,
  - the complexity of the type is hidden through macros defined at the end of Macros.h,
  - distinguishing between "scalar op expr" and "expr op scalar" is important to support non-commutative fields (like quaternions),
  - "scalar op expr" is now fully equivalent to "ConstantExpr(scalar) op expr",
  - scalar_multiple_op, scalar_quotient1_op and scalar_quotient2_op are not used anymore in officially supported modules (still used in Tensor).
| Clean handling for void type in EIGEN_CHECK_BINARY_COMPATIBILIY | Gael Guennebaud | 2016-06-06 |
| Relax mixing-type constraints for binary coefficient-wise operators | Gael Guennebaud | 2016-06-06 |
  - Replace internal::scalar_product_traits<A,B> by Eigen::ScalarBinaryOpTraits<A,B,OP>
  - Remove the "functor_is_product_like" helper (was pretty ugly)
  - Currently, OP is not used, but it is available to the user for fine-grained tuning
  - Currently, only the following operators have been generalized: *,/,+,-,=,*=,/=,+=,-=
  - TODO: generalize all other binary operators (comparisons, pow, etc.)
  - TODO: handle "scalar op array" operators (currently only * is handled)
  - TODO: move the handling of the "void" scalar type to ScalarBinaryOpTraits
| Remove dead code. | Gael Guennebaud | 2016-06-02 |
| Implement generic scalar*expr and expr*scalar operators based on scalar_product_traits. | Gael Guennebaud | 2016-06-02 |
  This is especially useful for custom scalar types, e.g., to enable float*expr<multi_prec> without conversion.
| bug #1181: help MSVC inlining. | Gael Guennebaud | 2016-05-31 |