| Commit message (Collapse) | Author | Age |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- added a MapBase base xpr on top of which Map and the specialization
of Block are implemented
- MapBase forces both aligned loads (and aligned stores, see below) in expressions
such as "x.block(...) += other_expr"
* Significant vectorization improvement:
- added a AlignedBit flag meaning the first coeff/packet is aligned,
this allows to not generate extra code to deal with the first unaligned part
- removed all unaligned stores when no unrolling
- removed unaligned loads in Sum when the input as the DirectAccessBit flag
* Some code simplification in CacheFriendly product
* Some minor documentation improvements
|
|
|
|
|
| |
Assign, in preparation for new Swap impl reusing Assign code.
remove last remnant of old Inverse class in Transform.
|
|
|
|
|
|
|
|
| |
- added explicit enum to int conversion where needed
- if a function is not defined as declared and the return type is "tricky"
then the type must be typedefined somewhere. A "tricky return type" can be:
* a template class with a default parameter which depends on another template parameter
* a nested template class, or type of a nested template class
|
|
|
|
|
|
| |
and vector * row-major products. Currently, it is enabled only is the matrix
has DirectAccessBit flag and the product is "large enough".
Added the respective unit tests in test/product/cpp.
|
|
|
|
| |
Cwise.
|
|
|
|
|
| |
* add comment in Product.h about CanVectorizeInner
* fix typo in test/product.cpp
|
|
|
|
|
|
|
| |
* added complete implementation of sparse matrix product
(with a little glue in Eigen/Core)
* added an exhaustive bench of sparse products including GMM++ and MTL4
=> Eigen outperforms in all transposed/density configurations !
|
|
|
|
|
|
|
|
| |
* rework PacketMath and DummyPacketMath, make these actual template
specializations instead of just overriding by non-template inline
functions
* introduce ei_ploadt and ei_pstoret, make use of them in Map and Matrix
* remove Matrix::map() methods, use Map constructors instead.
|
|
|
|
|
|
|
|
|
|
|
|
| |
* introduce packet(int), make use of it in linear vectorized paths
--> completely fixes the slowdown noticed in benchVecAdd.
* generalize coeff(int) to linear-access xprs
* clarify the access flag bits
* rework api dox in Coeffs.h and util/Constants.h
* improve certain expressions's flags, allowing more vectorization
* fix bug in Block: start(int) and end(int) returned dyn*dyn size
* fix bug in Block: just because the Eval type has packet access
doesn't imply the block xpr should have it too.
|
| |
|
|
|
|
|
|
| |
(could come back to redux after it has been vectorized,
and could serve as a starting point for that)
also make the abs2 functor vectorizable (for real types).
|
|
|
|
|
|
|
|
|
| |
packet access, it is not certain that it will bring a performance
improvement: benchmarking needed.
* improve logic choosing slice vectorization.
* fix typo in SSE packet math, causing crash in unaligned case.
* fix bug in Product, causing crash in unaligned case.
* add TEST_SSE3 CMake option.
|
|
|
|
| |
enums to int is enough to get compile time constants with ICC.
|
|
|
|
|
|
| |
* make Matrix2f (and similar) vectorized using linear path
* fix a couple of warnings and compilation issues with ICC and gcc 3.3/3.4
(cannot get Transform compiles with gcc 3.3/3.4, see the FIXME)
|
|
|
|
|
|
|
|
|
| |
now have the Like1D flag.
* Big renaming:
packetCoeff ---> packet
VectorizableBit ---> PacketAccessBit
Like1DArrayBit ---> LinearAccessBit
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
** Much better organization
** Fix a few bugs
** Add the ability to unroll only the inner loop
** Add an unrolled path to the Like1D vectorization. Not well tested.
** Add placeholder for sliced vectorization. Unimplemented.
* Rework of corrected_flags:
** improve rules determining vectorizability
** for vectors, the storage-order is indifferent, so we tweak it
to allow vectorization of row-vectors.
* fix compilation in benchmark, and a warning in Transpose.
|
|
|
|
|
| |
* fix a couple of compilation issues when unrolling is disabled
* reduce default unrolling limit to a more reasonable value
|
|
|
|
| |
(see notes in Core/util/StaticAssert.h for details)
|
|
|
|
|
| |
as it speed up compilation.
* fix minor typo introduced in the previous commit
|
|
|
|
|
|
| |
in MatrixBase work
* removed product_selector and cleaned Product.h a bit
* cleaned Assign.h a bit
|
|
|
|
|
| |
* bugfix in Assign and cache friendly product (weird that worked before)
* improved argument evaluation in Product
|
|
|
|
|
|
|
|
|
|
| |
Triangular class
- full meta-unrolling in Part
- move inverseProduct() to MatrixBase
- compilation fix in ProductWIP: introduce a meta-selector to only do
direct access on types that support it.
- phase out the old Product, remove the WIP_DIRTY stuff.
- misc renaming and fixes
|
|
|
|
|
|
| |
* Fix a mistake in CwiseNullary.
* Added a CoreDeclarions header that declares only the forward declarations
and related basic stuffs.
|
|
|
|
| |
-finline-limit=1000 to gcc to get good performance. By the way some cleanup.
|
|
|
|
|
|
|
|
| |
* Fix compilation of Inverse.h with vectorisation
* Introduce EIGEN_GNUC_AT_LEAST(x,y) macro doing future-proof (e.g. gcc v5.0) check
* Only use ProductWIP if vectorisation is enabled
* rename EIGEN_ALWAYS_INLINE -> EIGEN_INLINE with fall-back to inline keyword
* some cleanup/indentation
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Introduce a new highly optimized matrix-matrix product for large
matrices. The code is still highly experimental and it is activated
only if you define EIGEN_WIP_PRODUCT at compile time.
Currently the third dimension of the product must be a factor of
the packet size (x4 for floats) and the right handed side matrix
must be column major.
Moreover, currently c = a*b; actually computes c += a*b !!
Therefore, the code is provided for experimentation purpose only !
These limitations will be fixed soon or later to become the default
product implementation.
|
|
|
|
| |
in benchmark. Still not as fasts as explicit eval(), strangely.
|
|
|
|
|
|
| |
extended cache optimal product to work in any row/column
major situations, and a few bugfixes (forgot to add the
Cholesky header, vectorization of CwiseBinary)
|
|
|
|
| |
Added a test for Triangular.
|
|
|
|
|
|
|
|
|
|
|
| |
m.upper() = a+b;
only updates the upper triangular part of m.
Note that:
m = (a+b).upper();
updates all coefficients of m (but half of the additions
will be skiped)
Updated back/forward substitution to better use Eigen's capability.
|
|
|
|
|
|
|
|
| |
- vector to vector assign
- PartialRedux
- Vectorization criteria of Product
- returned type of normalized
- SSE integer mul
|
|
|
|
|
|
|
|
| |
- support dynamic sizes
- support arbitrary matrix size when the matrix can be seen as a 1D array
(except for fixed size matrices where the size in Bytes must be a factor of 16,
this is to allow compact storage of a vector of matrices)
Note that the explict vectorization is still experimental and far to be completely tested.
|
| |
|
|
|
|
|
|
| |
fully optimized.
* Even if LargeBit is set, only parallelize for large enough objects
(controlled by EIGEN_PARALLELIZATION_TRESHOLD).
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
using a macro and _Pragma.
- use OpenMP also in cacheOptimalProduct and in the
vectorized paths as well
- kill the vector assignment unroller. implement in
operator= the logic for assigning a row-vector in
a col-vector.
- CMakeLists support for building tests/examples
with -fopenmp and/or -msse2
- updates in bench/, especially replace identity()
by ones() which prevents underflows from perturbing
bench results.
|
| |
|
|
* rename OperatorEquals -> Assign
* move Util.h and FwDecl.h to a util/ subdir
|