Commit message [Author, Date]
* * Patch by Konstantinos Margaritis: AltiVec vectorization. [Benoit Jacob, 2008-05-03]
|   * Fix several warnings, temporarily disable determinant test.
* slightly improved the cache friendly product to use mul-add only [Gael Guennebaud, 2008-05-03]
|
* added packet mul-add function (ei_pmad) and updated Product to use it. [Gael Guennebaud, 2008-05-03]
|   This changes nothing for the current SSE architecture but might be helpful
|   for AltiVec/Cell and upcoming AMD processors.
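A packet multiply-add of the kind described above can be sketched as follows. This is only an illustration: the names `pmadd_sketch` and `axpy_sketch` are hypothetical and are not the functions from the commit. It assumes that on SSE the operation is a plain multiply followed by an add, and that a scalar fallback computes a*b + c.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 header; also pulls in the SSE float intrinsics
#endif

// Hypothetical scalar fallback: plain a*b + c for any arithmetic type.
template <typename T>
inline T pmadd_sketch(const T& a, const T& b, const T& c) {
    return a * b + c;
}

#if defined(__SSE2__)
// SSE/SSE2 has no fused multiply-add instruction, so the packet version
// is a multiply followed by an add on four floats at once.
inline __m128 pmadd_sketch(const __m128& a, const __m128& b, const __m128& c) {
    return _mm_add_ps(_mm_mul_ps(a, b), c);
}
#endif

// y[i] += alpha * x[i], written only in terms of the mul-add primitive.
inline void axpy_sketch(float alpha, const float* x, float* y, std::size_t n) {
    std::size_t i = 0;
#if defined(__SSE2__)
    const __m128 va = _mm_set1_ps(alpha);
    for (; i + 4 <= n; i += 4) {
        const __m128 vx = _mm_loadu_ps(x + i);  // unaligned loads keep the sketch simple
        const __m128 vy = _mm_loadu_ps(y + i);
        _mm_storeu_ps(y + i, pmadd_sketch(va, vx, vy));
    }
#endif
    // Scalar tail (or the whole loop when SSE2 is unavailable).
    for (; i < n; ++i) y[i] = pmadd_sketch(alpha, x[i], y[i]);
}
```

Routing the kernel through one primitive means an architecture with a native multiply-add only has to supply its own overload.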
* Removed ei_pload1, use posix_memalign to allocate aligned memory, [Gael Guennebaud, 2008-05-02]
|   and make Product ok when only one side is vectorizable
|   (and the product is still vectorized)
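For readers unfamiliar with posix_memalign, a minimal sketch of a 16-byte aligned allocation follows. The helper names are hypothetical and the 16-byte alignment is an assumption based on the SSE packet size; the commit does not show how the call is wrapped.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdlib.h>  // posix_memalign and free (POSIX)

// Hypothetical helper: returns n floats aligned on a 16-byte boundary,
// or nullptr if the allocation fails.
inline float* aligned_new_floats_sketch(std::size_t n) {
    void* p = nullptr;
    // The alignment must be a power of two and a multiple of sizeof(void*).
    if (posix_memalign(&p, 16, n * sizeof(float)) != 0) return nullptr;
    return static_cast<float*>(p);
}

// True when the pointer can be used with aligned 16-byte loads and stores.
inline bool is_aligned16_sketch(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}
```

Memory obtained this way is released with plain free().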
* added a test for triangular matrices [Gael Guennebaud, 2008-05-02]
|
* Make products always eval into expressions. Improves performance [Benoit Jacob, 2008-05-02]
|   in benchmark. Still not as fast as explicit eval(), strangely.
* fix flag and cost computations for nested expressions [Gael Guennebaud, 2008-05-01]
|
* nullary xpr are now vectorized [Gael Guennebaud, 2008-05-01]
|
* Enable vectorization of product with dynamic matrices, [Gael Guennebaud, 2008-05-01]
|   extended cache optimal product to work in any row/column major
|   situations, and a few bugfixes (forgot to add the Cholesky header,
|   vectorization of CwiseBinary)
* some cleaning in Cholesky and removed evil ei_sqrt of complex [Gael Guennebaud, 2008-04-27]
|
* * added ei_sqrt for complex [Gael Guennebaud, 2008-04-27]
|   * updated Cholesky to support complex
|   * correct result_type for abs and abs2 functors
* added Cholesky module [Gael Guennebaud, 2008-04-27]
|
* Fixed a couple of issues introduced in previous commits. [Gael Guennebaud, 2008-04-26]
|   Added a test for Triangular.
* Added triangular assignment, e.g.: [Gael Guennebaud, 2008-04-26]
|     m.upper() = a+b;
|   only updates the upper triangular part of m.
|   Note that:
|     m = (a+b).upper();
|   updates all coefficients of m (but half of the additions
|   will be skipped).
|   Updated back/forward substitution to better use Eigen's capability.
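The difference between the two assignments in this commit can be illustrated with a standalone sketch that does not use Eigen. The type `SquareMatrixSketch` and both functions are hypothetical stand-ins. The sketch also assumes that the coefficients below the diagonal read as zero in `(a+b).upper()`, which the commit message does not state explicitly.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a dense n x n matrix, stored column-major.
struct SquareMatrixSketch {
    std::size_t n;
    std::vector<double> data;
    explicit SquareMatrixSketch(std::size_t size, double fill = 0.0)
        : n(size), data(size * size, fill) {}
    double& operator()(std::size_t i, std::size_t j) { return data[i + j * n]; }
    double operator()(std::size_t i, std::size_t j) const { return data[i + j * n]; }
};

// Analogue of "m.upper() = a+b": only coefficients with i <= j are written,
// the strictly lower part of m keeps its previous values.
inline void assign_to_upper_sketch(SquareMatrixSketch& m,
                                   const SquareMatrixSketch& a,
                                   const SquareMatrixSketch& b) {
    for (std::size_t j = 0; j < m.n; ++j)
        for (std::size_t i = 0; i <= j; ++i)
            m(i, j) = a(i, j) + b(i, j);
}

// Analogue of "m = (a+b).upper()": every coefficient of m is written, but the
// addition is only performed for i <= j (assumed zero elsewhere).
inline void assign_upper_of_sum_sketch(SquareMatrixSketch& m,
                                       const SquareMatrixSketch& a,
                                       const SquareMatrixSketch& b) {
    for (std::size_t j = 0; j < m.n; ++j)
        for (std::size_t i = 0; i < m.n; ++i)
            m(i, j) = (i <= j) ? a(i, j) + b(i, j) : 0.0;
}
```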
* Added Triangular expression to extract upper or lower (strictly or not) [Gael Guennebaud, 2008-04-26]
|   part of a matrix. Triangular also provides an optimised method
|   for forward and backward substitution. Further optimizations
|   regarding assignments and products might come later.
|   Updated determinant() to take into account triangular matrices.
|   Started the QR module with a QR decomposition algorithm.
|   Help needed to build a QR algorithm (eigen solver) based on it.
* fix a bug in determinant of 4x4 matrices and a small type issue in Inverse [Gael Guennebaud, 2008-04-26]
|
* added a tough test to check the determinant that currently fails [Gael Guennebaud, 2008-04-25]
|
* Various fixes in: [Gael Guennebaud, 2008-04-25]
|   - vector to vector assign
|   - PartialRedux
|   - Vectorization criteria of Product
|   - returned type of normalized
|   - SSE integer mul
* Make the explicit vectorization much more flexible: [Gael Guennebaud, 2008-04-25]
|   - support dynamic sizes
|   - support arbitrary matrix size when the matrix can be seen as a 1D array
|     (except for fixed size matrices where the size in bytes must be a
|     multiple of 16, this is to allow compact storage of a vector of matrices)
|   Note that the explicit vectorization is still experimental and far from
|   completely tested.
* forgot to add a file in the previous commit [Gael Guennebaud, 2008-04-24]
|
* Fix a couple of issues with the vectorization. In particular, default ei_p* [Gael Guennebaud, 2008-04-24]
|   functions are provided to handle unsupported types seamlessly.
|   Added a generic null-ary expression with null-ary functors. They replace
|   Zero, Ones, Identity and Random.
* give up on OpenMP... for now [Benoit Jacob, 2008-04-18]
|
* - add _packetCoeff() to Inverse, allowing vectorization. [Benoit Jacob, 2008-04-16]
|   - let Inverse take template parameter MatrixType instead
|     of ExpressionType, in order to reduce executable code size
|     when taking inverses of xpr's.
|   - introduce ei_corrected_matrix_flags : the flags template
|     parameter to the Matrix class is only a suggestion.
|     This is also useful in ei_eval.
* +5% optimization in 4x4 inverse: [Benoit Jacob, 2008-04-15]
|   - only evaluate block expressions for which that is beneficial
|   - don't check for invertibility unless requested
* for 4x4 matrices implement the special algorithm that Markos proposed, [Benoit Jacob, 2008-04-15]
|   falling back to the general algorithm in the bad case.
* - optimized determinant calculations for small matrices (size <= 4) [Benoit Jacob, 2008-04-14]
|     (only 30 muls for size 4)
|   - rework the matrix inversion: now using cofactor technique for size<=3,
|     so the ugly unrolling is now used only for size 4, and even there
|     I'm looking to get rid of it.
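One well-known way to reach 30 multiplications for a 4x4 determinant is a Laplace expansion over the first two rows using 2x2 minors. The sketch below shows that technique. It is not taken from the commit, which may have used a different arrangement, and the function name `det4_sketch` is hypothetical.

```cpp
#include <cassert>

// 4x4 determinant via 2x2 minors: 12 minors at 2 multiplications each,
// plus 6 multiplications to combine them, for 30 multiplications in total.
inline double det4_sketch(const double (&a)[4][4]) {
    // 2x2 minors taken from rows 0 and 1.
    const double s0 = a[0][0] * a[1][1] - a[1][0] * a[0][1];
    const double s1 = a[0][0] * a[1][2] - a[1][0] * a[0][2];
    const double s2 = a[0][0] * a[1][3] - a[1][0] * a[0][3];
    const double s3 = a[0][1] * a[1][2] - a[1][1] * a[0][2];
    const double s4 = a[0][1] * a[1][3] - a[1][1] * a[0][3];
    const double s5 = a[0][2] * a[1][3] - a[1][2] * a[0][3];

    // Complementary 2x2 minors taken from rows 2 and 3.
    const double c5 = a[2][2] * a[3][3] - a[3][2] * a[2][3];
    const double c4 = a[2][1] * a[3][3] - a[3][1] * a[2][3];
    const double c3 = a[2][1] * a[3][2] - a[3][1] * a[2][2];
    const double c2 = a[2][0] * a[3][3] - a[3][0] * a[2][3];
    const double c1 = a[2][0] * a[3][2] - a[3][0] * a[2][2];
    const double c0 = a[2][0] * a[3][1] - a[3][0] * a[2][1];

    // Signed sum over the six column pairs.
    return s0 * c5 - s1 * c4 + s2 * c3 + s3 * c2 - s4 * c1 + s5 * c0;
}
```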
* when evaluating an xpr, the result can now be vectorizable [Benoit Jacob, 2008-04-14]
|   even if the xpr itself wasn't vectorizable.
* * Start of the LU module, with matrix inversion already there and [Benoit Jacob, 2008-04-14]
|     fully optimized.
|   * Even if LargeBit is set, only parallelize for large enough objects
|     (controlled by EIGEN_PARALLELIZATION_TRESHOLD).
* * Add fixed-size template versions of corner(), start(), end(). [Benoit Jacob, 2008-04-12]
|   * Use them to write an unrolled path in echelon.cpp, as an
|     experiment before I do this LU module.
|   * For floating-point types, make ei_random() use an amplitude of 1.
* - cleaner use of OpenMP (no code duplication anymore) [Benoit Jacob, 2008-04-11]
|     using a macro and _Pragma.
|   - use OpenMP also in cacheOptimalProduct and in the
|     vectorized paths as well
|   - kill the vector assignment unroller. implement in operator=
|     the logic for assigning a row-vector in a col-vector.
|   - CMakeLists support for building tests/examples
|     with -fopenmp and/or -msse2
|   - updates in bench/, especially replace identity() by ones()
|     which prevents underflows from perturbing bench results.
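The "macro and _Pragma" approach mentioned in the first item can be sketched as below. The macro name `SKETCH_PARALLEL_FOR` and the function are hypothetical, and the actual macro in the commit may look different. The point is that `_Pragma` can appear inside a macro expansion, so the same loop body serves both the OpenMP and the serial build.

```cpp
#include <cassert>
#include <vector>

// Hypothetical macro: expands to an OpenMP directive when the compiler is
// invoked with OpenMP enabled (e.g. -fopenmp), and to nothing otherwise.
#ifdef _OPENMP
#define SKETCH_PARALLEL_FOR _Pragma("omp parallel for")
#else
#define SKETCH_PARALLEL_FOR
#endif

// dst[i] = a[i] + b[i]; the loop is written once for both builds.
inline void add_into_sketch(std::vector<float>& dst,
                            const std::vector<float>& a,
                            const std::vector<float>& b) {
    // A signed loop index keeps the loop valid for older OpenMP versions.
    const long n = static_cast<long>(dst.size());
    SKETCH_PARALLEL_FOR
    for (long i = 0; i < n; ++i) dst[i] = a[i] + b[i];
}
```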
* Merge Gael's experimental OpenMP parallelization support into Assign.h. [Benoit Jacob, 2008-04-11]
|
* added a vectorized version of Product::_cacheOptimalProduct, [Gael Guennebaud, 2008-04-10]
|   added the possibility to disable the vectorization
|   using EIGEN_DONT_VECTORIZE (some architectures have SSE support by default)
* * add typedefs for matrices/vectors with LargeBit [Benoit Jacob, 2008-04-10]
|   * add -pedantic to CXXFLAGS
|   * cleanup intricate expressions with && and ||
|     which gave warnings because of "missing" parentheses
|   * fix compile error in NumTraits, apparently discovered by -pedantic
* split those files in util/ [Benoit Jacob, 2008-04-10]
|   some more renaming
* * rename XprCopy -> Nested [Benoit Jacob, 2008-04-10]
|   * rename OperatorEquals -> Assign
|   * move Util.h and FwDecl.h to a util/ subdir
* fix operator precedence bugs in the computation [Gael Guennebaud, 2008-04-09]
|   of the VectorizableBit flag, now benchmark.cpp is properly vectorized
* a better bugfix in ei_matrix_operator_equals_packet_unroller [Gael Guennebaud, 2008-04-09]
|
* bugfix in ei_matrix_operator_equals_packet_unroller [Gael Guennebaud, 2008-04-09]
|
* Added initial experimental support for explicit vectorization. [Gael Guennebaud, 2008-04-09]
|   Currently only the following platform/operations are supported:
|   - SSE2 compatible architecture
|   - compiler compatible with Intel's SSE2 intrinsics
|   - float, double and int data types
|   - fixed size matrices with a storage major dimension multiple of 4 (or 2 for double)
|   - scalar-matrix product, component wise: +,-,*,min,max
|   - matrix-matrix product only if the left matrix is vectorizable and column major
|     or the right matrix is vectorizable and row major, e.g.:
|       a.transpose() * b
|     is not vectorized with the default column major storage.
|   To use it you must define EIGEN_VECTORIZE and EIGEN_INTEL_PLATFORM.
* finish making use of CoeffReadCost and the new XprCopy everywhere [Benoit Jacob, 2008-04-08]
|   seems appropriate to me.
* - merge ei_xpr_copy and ei_eval_if_needed_before_nesting [Benoit Jacob, 2008-04-06]
|   - make use of CoeffReadCost to determine when to unroll the loops,
|     for now only in Product.h and in OperatorEquals.h
|   performance remains the same: generally still not as good as before the
|   big changes.
* fix compilation (finish removal of EIGEN_UNROLLED_LOOPS) [Benoit Jacob, 2008-04-05]
|
* fixes as discussed with Gael on IRC. Mainly, in Fuzzy.h, and Dot.h, use [Benoit Jacob, 2008-04-05]
|   ei_xpr_copy to evaluate args when needed. Had to introduce an ugly
|   trick with ei_unref as when the XprCopy type is a reference one can't
|   directly access member typedefs such as Scalar.
* * make use of the EvalBeforeNestingBit and EvalBeforeAssigningBit [Gael Guennebaud, 2008-04-05]
|     in ei_xpr_copy and operator=, respectively.
|   * added Matrix::lazyAssign() when EvalBeforeAssigningBit must be skipped
|     (mainly internal use only)
|   * all expressions are now stored by const reference
|   * added Temporary xpr: .temporary() must be called on any temporary expression
|     not directly returned by a function (mainly internal use only)
|   * moved all functors in the Functors.h header
|   * added some preliminary stuff for the explicit vectorization
* * added cwise comparisons [Gael Guennebaud, 2008-04-03]
|   * added "all" and "any" special redux operators
|   * added support for bool matrices
|   * added support for cost model of STL functors via ei_functor_traits
|     (By default ei_functor_traits queries the functor member Cost)
* current state of the mess. One line fails in the tests, and [Benoit Jacob, 2008-04-03]
|   useless copies are made when evaluating nested expressions.
|   Changes:
|   - kill LazyBit, introduce EvalBeforeNestingBit and EvalBeforeAssigningBit
|   - product and random don't evaluate immediately anymore
|   - eval() always evaluates
|   - change the value of Dynamic to some large positive value,
|     in preparation of future simplifications
* More clever evaluation of arguments: now it occurs earlier, in operator*, [Benoit Jacob, 2008-04-03]
|   before the Product<> type is constructed. This resets template depth on each
|   intermediate evaluation, and gives simpler code. Introducing
|   ei_eval_if_expensive<Derived, n> which evaluates Derived if it's worth it
|   given that each of its coeffs will be accessed n times. Operator* uses this
|   with adequate values of n to evaluate args exactly when needed.
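The idea of evaluating an argument only when it pays off over n accesses can be sketched as a compile-time cost comparison. Everything below is hypothetical: the tag types, their cost values and the comparison rule are assumptions chosen for illustration and are not the rule implemented by ei_eval_if_expensive.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical expression tags carrying an assumed per-coefficient read cost.
struct PlainMatrixSketch { static constexpr int CoeffReadCost = 1; };
struct SumExprSketch     { static constexpr int CoeffReadCost = 3; };   // two reads + one add
struct ProductExprSketch { static constexpr int CoeffReadCost = 20; };  // a short dot product

// Assumed rule: evaluating into a temporary costs one pass over the
// expression plus N cheap reads of the temporary; not evaluating costs
// N reads of the expression itself. Evaluate when that is strictly cheaper.
template <typename Xpr, int N>
struct eval_if_expensive_sketch {
    static constexpr int CostNoEval = N * Xpr::CoeffReadCost;
    static constexpr int CostEval =
        Xpr::CoeffReadCost + N * PlainMatrixSketch::CoeffReadCost;
    static constexpr bool value = CostEval < CostNoEval;
    // Either a plain temporary or a reference to the unevaluated expression.
    using type = typename std::conditional<value, PlainMatrixSketch, const Xpr&>::type;
};
```

Under these assumed costs, a sum read once is kept as an expression, while a product read twice is evaluated into a temporary first.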
* fix a compilation issue with gcc-3.3 and ei_result_of [Gael Guennebaud, 2008-04-03]
|
* - new: recursive costs system, useful to determine automatically [Benoit Jacob, 2008-04-03]
|     when to evaluate arguments and when to meta-unroll.
|   - use it in Product to determine when to eval args. not yet used
|     to determine when to unroll. for now, not used anywhere else but
|     that'll follow.
|   - fix badness of my last commit
* - remove Eval/EvalOMP (moving them to a disabled/ subdir in order [Benoit Jacob, 2008-03-31]
|     to preserve SVN history). They are made useless by the new
|     ei_eval_unless_lazy.
|   - introduce a generic Eval member typedef so one can do e.g.
|       T t; U u; Product<T, U>::Eval m; m = t*u;