aboutsummaryrefslogtreecommitdiffhomepage
path: root/Eigen/src/Core/Assign.h
Commit message (Collapse)AuthorAge
* * Big change in Block and Map:Gravatar Gael Guennebaud2008-08-09
| | | | | | | | | | | | | | - added a MapBase base xpr on top of which Map and the specialization of Block are implemented - MapBase forces both aligned loads (and aligned stores, see below) in expressions such as "x.block(...) += other_expr" * Significant vectorization improvement: - added a AlignedBit flag meaning the first coeff/packet is aligned, this allows to not generate extra code to deal with the first unaligned part - removed all unaligned stores when no unrolling - removed unaligned loads in Sum when the input as the DirectAccessBit flag * Some code simplification in CacheFriendly product * Some minor documentation improvements
* introduce copyCoeff and copyPacket methods in MatrixBase, used byGravatar Benoit Jacob2008-08-05
| | | | | Assign, in preparation for new Swap impl reusing Assign code. remove last remnant of old Inverse class in Transform.
* Several compilation fixes for MSVC and NVCC, basically:Gravatar Gael Guennebaud2008-07-29
| | | | | | | | - added explicit enum to int conversion where needed - if a function is not defined as declared and the return type is "tricky" then the type must be typedefined somewhere. A "tricky return type" can be: * a template class with a default parameter which depends on another template parameter * a nested template class, or type of a nested template class
* Add a *very efficient* evaluation path for both col-major matrix * vectorGravatar Gael Guennebaud2008-07-12
| | | | | | and vector * row-major products. Currently, it is enabled only is the matrix has DirectAccessBit flag and the product is "large enough". Added the respective unit tests in test/product/cpp.
* some performance fixes in Assign.h reported by Gael. Some doc update inGravatar Benoit Jacob2008-07-10
| | | | Cwise.
* * do the ActualPacketAccesBit change as discussed on listGravatar Benoit Jacob2008-07-04
| | | | | * add comment in Product.h about CanVectorizeInner * fix typo in test/product.cpp
* * added innerSize / outerSize functions to MatrixBaseGravatar Gael Guennebaud2008-06-28
| | | | | | | * added complete implementation of sparse matrix product (with a little glue in Eigen/Core) * added an exhaustive bench of sparse products including GMM++ and MTL4 => Eigen outperforms in all transposed/density configurations !
* * rework Map, allow vectorizationGravatar Benoit Jacob2008-06-27
| | | | | | | | * rework PacketMath and DummyPacketMath, make these actual template specializations instead of just overriding by non-template inline functions * introduce ei_ploadt and ei_pstoret, make use of them in Map and Matrix * remove Matrix::map() methods, use Map constructors instead.
* * add bench/benchVecAdd.cpp by Gael, fix crash (ei_pload on non-aligned)Gravatar Benoit Jacob2008-06-26
| | | | | | | | | | | | * introduce packet(int), make use of it in linear vectorized paths --> completely fixes the slowdown noticed in benchVecAdd. * generalize coeff(int) to linear-access xprs * clarify the access flag bits * rework api dox in Coeffs.h and util/Constants.h * improve certain expressions's flags, allowing more vectorization * fix bug in Block: start(int) and end(int) returned dyn*dyn size * fix bug in Block: just because the Eval type has packet access doesn't imply the block xpr should have it too.
* optimize linear vectorization both in Assign and Sum (optimal amortized perf)Gravatar Gael Guennebaud2008-06-23
|
* split sum away from redux and vectorize it.Gravatar Benoit Jacob2008-06-23
| | | | | | (could come back to redux after it has been vectorized, and could serve as a starting point for that) also make the abs2 functor vectorizable (for real types).
* * implement slice vectorization. Because it uses unalignedGravatar Benoit Jacob2008-06-22
| | | | | | | | | packet access, it is not certain that it will bring a performance improvement: benchmarking needed. * improve logic choosing slice vectorization. * fix typo in SSE packet math, causing crash in unaligned case. * fix bug in Product, causing crash in unaligned case. * add TEST_SSE3 CMake option.
* move "enum" back to "const int" int ei_assign_impl: in fact, castingGravatar Gael Guennebaud2008-06-20
| | | | enums to int is enough to get compile time constants with ICC.
* * more cleaning in ProductGravatar Gael Guennebaud2008-06-19
| | | | | | * make Matrix2f (and similar) vectorized using linear path * fix a couple of warnings and compilation issues with ICC and gcc 3.3/3.4 (cannot get Transform compiles with gcc 3.3/3.4, see the FIXME)
* * Block: row and column expressions in the inner directionGravatar Benoit Jacob2008-06-16
| | | | | | | | | now have the Like1D flag. * Big renaming: packetCoeff ---> packet VectorizableBit ---> PacketAccessBit Like1DArrayBit ---> LinearAccessBit
* aaargh.Gravatar Benoit Jacob2008-06-16
|
* fix bug in computation of unrolling limit: div instead of mulGravatar Benoit Jacob2008-06-16
|
* * Big rework of Assign.h:Gravatar Benoit Jacob2008-06-16
| | | | | | | | | | | | | | | ** Much better organization ** Fix a few bugs ** Add the ability to unroll only the inner loop ** Add an unrolled path to the Like1D vectorization. Not well tested. ** Add placeholder for sliced vectorization. Unimplemented. * Rework of corrected_flags: ** improve rules determining vectorizability ** for vectors, the storage-order is indifferent, so we tweak it to allow vectorization of row-vectors. * fix compilation in benchmark, and a warning in Transpose.
* * move some compile time "if" to their respective unroller (assign and dot)Gravatar Gael Guennebaud2008-06-07
| | | | | * fix a couple of compilation issues when unrolling is disabled * reduce default unrolling limit to a more reasonable value
* added a static assertion mechanismGravatar Gael Guennebaud2008-06-04
| | | | (see notes in Core/util/StaticAssert.h for details)
* * replace compile-time-if by meta-selector in Assign.hGravatar Gael Guennebaud2008-05-31
| | | | | as it speed up compilation. * fix minor typo introduced in the previous commit
* * updated the assignement operator macro so that overloadsGravatar Gael Guennebaud2008-05-28
| | | | | | in MatrixBase work * removed product_selector and cleaned Product.h a bit * cleaned Assign.h a bit
* * change Flagged to take into account NestByValue onlyGravatar Gael Guennebaud2008-05-28
| | | | | * bugfix in Assign and cache friendly product (weird that worked before) * improved argument evaluation in Product
* - introduce Part and Extract classes, splitting and extending the formerGravatar Benoit Jacob2008-05-27
| | | | | | | | | | Triangular class - full meta-unrolling in Part - move inverseProduct() to MatrixBase - compilation fix in ProductWIP: introduce a meta-selector to only do direct access on types that support it. - phase out the old Product, remove the WIP_DIRTY stuff. - misc renaming and fixes
* * Added several cast to int of the enums (needed for some compilers)Gravatar Gael Guennebaud2008-05-12
| | | | | | * Fix a mistake in CwiseNullary. * Added a CoreDeclarions header that declares only the forward declarations and related basic stuffs.
* put inline keywords everywhere appropriate. So we don't need anymore to passGravatar Benoit Jacob2008-05-12
| | | | -finline-limit=1000 to gcc to get good performance. By the way some cleanup.
* * Give Konstantinos a copyright lineGravatar Benoit Jacob2008-05-12
| | | | | | | | * Fix compilation of Inverse.h with vectorisation * Introduce EIGEN_GNUC_AT_LEAST(x,y) macro doing future-proof (e.g. gcc v5.0) check * Only use ProductWIP if vectorisation is enabled * rename EIGEN_ALWAYS_INLINE -> EIGEN_INLINE with fall-back to inline keyword * some cleanup/indentation
* * Started support for unaligned vectorization.Gravatar Gael Guennebaud2008-05-05
| | | | | | | | | | | | | * Introduce a new highly optimized matrix-matrix product for large matrices. The code is still highly experimental and it is activated only if you define EIGEN_WIP_PRODUCT at compile time. Currently the third dimension of the product must be a factor of the packet size (x4 for floats) and the right handed side matrix must be column major. Moreover, currently c = a*b; actually computes c += a*b !! Therefore, the code is provided for experimentation purpose only ! These limitations will be fixed soon or later to become the default product implementation.
* Make products always eval into expressions. Improves performanceGravatar Benoit Jacob2008-05-02
| | | | in benchmark. Still not as fasts as explicit eval(), strangely.
* Enable vectorization of product with dynamic matrices,Gravatar Gael Guennebaud2008-05-01
| | | | | | extended cache optimal product to work in any row/column major situations, and a few bugfixes (forgot to add the Cholesky header, vectorization of CwiseBinary)
* Fixed a couple of issues introduced in previous commits.Gravatar Gael Guennebaud2008-04-26
| | | | Added a test for Triangular.
* Added triangular assignement, e.g.:Gravatar Gael Guennebaud2008-04-26
| | | | | | | | | | | m.upper() = a+b; only updates the upper triangular part of m. Note that: m = (a+b).upper(); updates all coefficients of m (but half of the additions will be skiped) Updated back/forward substitution to better use Eigen's capability.
* Various fixes in:Gravatar Gael Guennebaud2008-04-25
| | | | | | | | - vector to vector assign - PartialRedux - Vectorization criteria of Product - returned type of normalized - SSE integer mul
* Make the explicit vectorization much more flexible:Gravatar Gael Guennebaud2008-04-25
| | | | | | | | - support dynamic sizes - support arbitrary matrix size when the matrix can be seen as a 1D array (except for fixed size matrices where the size in Bytes must be a factor of 16, this is to allow compact storage of a vector of matrices) Note that the explict vectorization is still experimental and far to be completely tested.
* give up on OpenMP... for nowGravatar Benoit Jacob2008-04-18
|
* * Start of the LU module, with matrix inversion already there andGravatar Benoit Jacob2008-04-14
| | | | | | fully optimized. * Even if LargeBit is set, only parallelize for large enough objects (controlled by EIGEN_PARALLELIZATION_TRESHOLD).
* - cleaner use of OpenMP (no code duplication anymore)Gravatar Benoit Jacob2008-04-11
| | | | | | | | | | | | | | using a macro and _Pragma. - use OpenMP also in cacheOptimalProduct and in the vectorized paths as well - kill the vector assignment unroller. implement in operator= the logic for assigning a row-vector in a col-vector. - CMakeLists support for building tests/examples with -fopenmp and/or -msse2 - updates in bench/, especially replace identity() by ones() which prevents underflows from perturbing bench results.
* Merge Gael's experimental OpenMP parallelization support into Assign.h.Gravatar Benoit Jacob2008-04-11
|
* * rename XprCopy -> NestedGravatar Benoit Jacob2008-04-10
* rename OperatorEquals -> Assign * move Util.h and FwDecl.h to a util/ subdir