| Commit message (Collapse) | Author | Age |
|
|
|
|
|
|
|
|
|
|
|
|
| |
That was hurting users with compilers that would object to proceed with
that:
"""
./Eigen/src/Core/products/GeneralMatrixVector.h:356:10: error: declaration shadows a static data member of 'general_matrix_vector_product<type-parameter-0-0, type-parameter-0-1, type-parameter-0-2, 1, ConjugateLhs, type-parameter-0-4, type-parameter-0-5, ConjugateRhs, Version>' [-Werror,-Wshadow]
LhsPacketSize = Traits::LhsPacketSize,
^
./Eigen/src/Core/products/GeneralMatrixVector.h:307:22: note: previous declaration is here
static const Index LhsPacketSize = Traits::LhsPacketSize;
"""
|
|
|
|
|
|
| |
We take advantage of smaller SIMD registers as well, in that case.
Gains up to 3x for select input sizes.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The major changes are
1. Moving CUDA/PacketMath.h to GPU/PacketMath.h
2. Moving CUDA/MathFunctions.h to GPU/MathFunction.h
3. Moving CUDA/CudaSpecialFunctions.h to GPU/GpuSpecialFunctions.h
The above three changes effectively enable the Eigen "Packet" layer for the HIP platform
4. Merging the "hip_basic" and "cuda_basic" unit tests into one ("gpu_basic")
5. Updating the "EIGEN_DEVICE_FUNC" marking in some places
The change has been tested on the HIP and CUDA platforms.
|
|
|
|
| |
Found using `codespell` and `grep` from downstream FreeCAD
|
| |
|
|
|
|
|
| |
This revised version does not bother about aligned loads/stores,
and rather processes 8 rows at ones for better instruction pipelining.
|
|
|
|
|
|
|
|
|
|
| |
performance of modern CPU.
The previous code has been optimized for Intel core2 for which unaligned loads/stores were prohibitively expensive.
This new version exhibits much higher instruction independence (better pipelining) and explicitly leverage FMA.
According to my benchmark, on Haswell this new kernel is always faster than the previous one, and sometimes even twice as fast.
Even higher performance could be achieved with a better blocking size heuristic and, perhaps, with explicit prefetching.
We should also check triangular product/solve to optimally exploit this new kernel (working on vertical panel of 4 columns is probably not optimal anymore).
|
| |
|
|
|
|
|
|
|
|
|
|
| |
- Replace internal::scalar_product_traits<A,B> by Eigen::ScalarBinaryOpTraits<A,B,OP>
- Remove the "functor_is_product_like" helper (was pretty ugly)
- Currently, OP is not used, but it is available to the user for fine grained tuning
- Currently, only the following operators have been generalized: *,/,+,-,=,*=,/=,+=,-=
- TODO: generalize all other binray operators (comparisons,pow,etc.)
- TODO: handle "scalar op array" operators (currently only * is handled)
- TODO: move the handling of the "void" scalar type to ScalarBinaryOpTraits
|
|
|
|
|
|
|
|
| |
conversions.
This fixes "conversion from pointer to same-sized integral type" warnings by ICC.
Ideally, we would use the std::[u]intptr_t types all the time, but since they are C99/C++11 only,
let's be safe.
|
|
|
|
| |
parameter, and add a first_default_aligned variante calling first_aligned with the requirement of the largest packet for the given scalar type.
|
|
|
|
| |
user-controllable EIGEN_MAX_ALIGN_BYTES and EIGEN_MAX_STATIC_ALIGN_BYTES macros. This changeset also removes EIGEN_ALIGN (replaced by EIGEN_MAX_ALIGN_BYTES>0), EIGEN_ALIGN_STATICALLY (replaced by EIGEN_MAX_STATIC_ALIGN_BYTES>0), EIGEN_USER_ALIGN*, EIGEN_ALIGN_DEFAULT (replaced by EIGEN_ALIGN_MAX).
|
|
|
|
| |
work correctly even when the input coefficients aren't aligned.
|
| |
|
|
|
|
| |
better than no vectorization)
|
|\ |
|
| |
| |
| |
| | |
warnings. This also fix the issue of the previous changeset in a much nicer way.
|
| | |
|
| |
| |
| |
| |
| |
| |
| |
| | |
boolean)
Created a new EIGEN_ALIGN_BYTES define to encode how the data should be aligned
Fixed a few remaining alignment issues exposed when the Eigen code is compiled with avx enabled.
Created a new EIGEN_ALIGN_DEFAULT define, which is set to the minimum alignment value required for the chosen instruction set. Use this value instead of EIGEN_ALIGN32 to preserve the existing alignment on SSE/Altivec/Neon.
|
|/ |
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
| |
computation, only to assemble unaligned packets from aligned packet loads)
(transplanted from 221f54698c2f6690da8c0f44c1e31e55118dedab
)
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
| |
* * *
License disclaimer changed to BSD license for MKL_support.h
* * *
Pardiso support fixed, test added.
blas/lapack tests fixed: Scalar parameter was added in Cholesky, product_matrix_vector_triangular remaned to triangular_matrix_vector_product.
* * *
PARDISO test was added physically.
|
|
|
|
|
|
|
| |
default windows.h
(transplanted from 49b6e9143e1d74441924c0c313536e263e12a55c
)
|
|
|
|
|
|
|
| |
Renamed meta_{true|false} to {true|false}_type, meta_if to conditional, is_same_type to is_same, un{ref|pointer|const} to remove_{reference|pointer|const} and makeconst to add_const.
Changed boolean type 'ret' member to 'value'.
Changed 'ret' members refering to types to 'type'.
Adapted all code occurences.
|
| |
|
|
|
|
| |
should return CoeffReturnType, not "Scalar", if the expression is potentially a Lvalue.
|
|\ |
|
| | |
|
| | |
|
| |
| |
| |
| |
| | |
- improve support of colmajor by vector and matrix - matrix
- now all configurations are well handled, but the perf are not always very good
|
| | |
|
| |
| |
| |
| |
| |
| |
| |
| | |
* pload* and pset1 are now templated on the packet type
* gemv routines are now embeded into a structure with
a consistent API with respect to gemm
* some configurations of vector * matrix and matrix * matrix works fine,
some need more work...
|
|/ |
|
|
|
|
| |
* fix weird compilation error when constructing a matrix with a row by matrix product
|
| |
|
| |
|
|
|
|
| |
* vectorize complex<double>
|
| |
|
| |
|
|
|
|
| |
As discussed on the list (too long to explain here).
|
|
|
|
|
|
|
| |
sizeof(Scalar), and that assumption breaks with double on linux x86-32.
* Rename ei_alignmentOffset to ei_first_aligned
* Rewrite its documentation and part of its body
* The variant taking a MatrixBase doesn't need a separate size argument.
|
|
|
|
| |
aligned on the scalar type size (e.g., for double on 32 bits system without -malign-double)
|