all per-plot settings have been moved to a single file, go_mean now takes an
optional second argument "tiny" to generate plots for tiny matrices, and
comparison information with respect to previous benchmarks (if any) is now output.

- added GOTO library

* various improvements in BTL, including the trisolver and Cholesky benches

products.
* Minor update of the cores of the Cholesky algorithms to make them more friendly
  with respect to matrix-vector products => speedup x5!

- remove all invertibility checking; it will be redundant with LU
- general case: adapt to the matrix storage order for better performance
- size-4 case: handle corner cases without falling back to the general case
- rationalize with selectors instead of compile-time ifs
- add a C-style computeInverse()
* update the inverse test.
* in snippets, default cout precision to 3 decimal places
* add some CMake modules from kdelibs to support BTL with CMake 2.4

- removed the ugly X11 and PNG gnuplot terminals
- use the enhanced postscript terminal
- use ImageMagick to generate the PNG files (with compression)
- disable the Fortran implementation by default since it is as meaningless as a "C implementation"
- update line settings

colmajor).
It basically performs 4 dot products at once, reducing loads of the vector and improving
instruction scheduling. With 3 cache-friendly algorithms, we now handle all product
configurations with outstanding performance for large matrices.

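Roughly, the idea looks like this (an illustrative plain-C++ sketch, not the actual Eigen kernel; the names and the row-major indexing are assumptions made for the example):

    // Illustrative sketch only: compute y = A*x by accumulating 4 dot
    // products at once, so each coefficient of x is loaded once and reused
    // for 4 rows instead of being reloaded for every row.
    void matvec_4_at_once(const float* A, const float* x, float* y,
                          int rows, int cols)
    {
      int r = 0;
      for (; r + 3 < rows; r += 4)
      {
        float acc0 = 0, acc1 = 0, acc2 = 0, acc3 = 0;
        for (int c = 0; c < cols; ++c)
        {
          float xc = x[c];                  // one load of x[c]...
          acc0 += A[(r+0)*cols + c] * xc;   // ...shared by 4 dot products
          acc1 += A[(r+1)*cols + c] * xc;
          acc2 += A[(r+2)*cols + c] * xc;
          acc3 += A[(r+3)*cols + c] * xc;
        }
        y[r+0] = acc0; y[r+1] = acc1; y[r+2] = acc2; y[r+3] = acc3;
      }
      for (; r < rows; ++r)                 // remaining rows, one at a time
      {
        float acc = 0;
        for (int c = 0; c < cols; ++c) acc += A[r*cols + c] * x[c];
        y[r] = acc;
      }
    }
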
* some simplifications and fixes in the cache-friendly products

of some functions, as well as the experimental C code used to design
Eigen's efficient matrix-vector products.

in the benchmark suite

BTL_CONFIG="-a ata" ctest -V -R eigen
runs all the benchmarks having "ata" in their name for all
libraries matching the regexp "eigen"

about a +25% speedup, which is still nice, as I expected zero or even
negative benefit.

zero benefit... it turns out to be a 20x speedup. Something is wrong.

the modifications to the initial code follow:
* changed the build system from plain makefiles to CMake
* added eigen2 (4 versions: vec/novec and fixed/dynamic), GMM++ and MTL4 interfaces
* added a "transposed matrix * vector" product action
* updated the blitz interface to use condensed products instead of hand-coded loops
* removed some deprecated interfaces
* changed the default storage order to column-major for all libraries
* new generic bench timer strategy, which is supposed to be more accurate
* various code clean-up

- fix compilation in product.cpp with std::complex
- fix bug in MatrixBase::operator!=

can be seen in Eigen/src/Core/Cwise.h.

* added some tests for product and swap
* overload .swap() for dynamic-sized matrices of the same size

(equivalent to the GEMM BLAS routine)
* added a GEMM benchmark

might be twice as fast for small fixed-size matrices
* added a sparse triangular solver (a sparse version
  of inverseProduct); see the sketch below
* various other improvements in the Sparse module

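For illustration, here is the kind of column-oriented forward substitution such a solver performs on a lower-triangular matrix stored column by column (a minimal sketch, not the actual Eigen code; it assumes the diagonal entry is stored first in each column):

    // Solve L*x = b in place (x overwrites b). Column j of L occupies the
    // range [outerStart[j], outerStart[j+1]) of the values/rowIndex arrays,
    // with its diagonal entry stored first.
    void sparse_lower_solve(int n, const float* values, const int* rowIndex,
                            const int* outerStart, float* b)
    {
      for (int j = 0; j < n; ++j)
      {
        int start = outerStart[j], end = outerStart[j+1];
        b[j] /= values[start];                 // divide by the diagonal L(j,j)
        for (int k = start + 1; k < end; ++k)  // propagate to the rows below
          b[rowIndex[k]] -= values[k] * b[j];
      }
    }
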
* added a complete implementation of the sparse matrix product
  (with a little glue in Eigen/Core)
* added an exhaustive bench of sparse products including GMM++ and MTL4
  => Eigen outperforms them in all transposed/density configurations!

* added some glue to Eigen/Core (SparseBit, ei_eval, Matrix)
* add two new sparse matrix types:
  - HashMatrix: based on std::map (for random writes)
  - LinkedVectorMatrix: array of linked vectors
    (for outer-coherent writes, e.g. to transpose a matrix)
* add a SparseSetter class to easily set/update any kind of matrix, e.g.:
    { SparseSetter<MatrixType,RandomAccessPattern> wrapper(mymatrix);
      for (...) wrapper->coeffRef(rand(),rand()) = rand(); }
* automatic shallow copy for RValue
* and a lot of mess!
plus:
* remove the remaining ArrayBit-related stuff
* don't use alloca in product for very large memory allocations

* introduce packet(int), make use of it in the linear vectorized paths
  --> completely fixes the slowdown noticed in benchVecAdd
* generalize coeff(int) to linear-access xprs
* clarify the access flag bits
* rework the API dox in Coeffs.h and util/Constants.h
* improve certain expressions' flags, allowing more vectorization
* fix a bug in Block: start(int) and end(int) returned a dyn*dyn size
* fix a bug in Block: just because the Eval type has packet access
  doesn't imply the block xpr should have it too.

* add vdw benchmark from Tim's real-world use case

- uses the common "Compressed Column Storage" scheme (see the sketch below)
- supports every unary and binary operator with xpr templates,
  assuming binaryOp(0,0) == 0 and unaryOp(0) == 0 (otherwise a sparse
  matrix does not make sense)
- this is the first commit, so of course there are still several shortcomings!

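For reference, Compressed Column Storage keeps the non-zeros of each column contiguously, plus one index array telling where each column starts; a small illustration with plain arrays (not Eigen's actual classes):

    // CCS layout of the 3x3 matrix   [ 1 0 2 ]
    //                                [ 0 3 0 ]
    //                                [ 4 0 5 ]
    float values[]     = { 1, 4,  3,  2, 5 };  // non-zeros, column by column
    int   rowIndex[]   = { 0, 2,  1,  0, 2 };  // row of each stored value
    int   outerStart[] = { 0, 2, 3, 5 };       // column j spans
                                               // [outerStart[j], outerStart[j+1])
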
* use ProductReturnType<>::Type to get the correct Product xpr type
* Product is no longer instantiated for xpr types which are evaluated
* vectorization of "a.transpose() * b" for the normal product (small and fixed-size matrices)
* some cleaning
* removed ArrayBase

** Much better organization
** Fix a few bugs
** Add the ability to unroll only the inner loop
** Add an unrolled path to the Like1D vectorization. Not well tested.
** Add a placeholder for sliced vectorization. Unimplemented.
* Rework of corrected_flags:
** improve the rules determining vectorizability
** for vectors, the storage order does not matter, so we tweak it
   to allow vectorization of row-vectors.
* fix compilation in the benchmark, and a warning in Transpose.

* Fix several warnings, temporarily disable determinant test.

in the benchmark. Still not as fast as an explicit eval(), strangely.

Added a test for Triangular.

(only 30 muls for size 4)
- rework the matrix inversion: now using the cofactor technique for size<=3,
  so the ugly unrolling is now only used for size 4, and even there
  I'm looking to get rid of it.

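The cofactor technique amounts to computing the adjugate and dividing by the determinant; for the 3x3 case it looks roughly like this (an illustrative sketch, not the actual Eigen implementation):

    // inverse = adjugate(m) / det(m), with the cofactors written out
    void inverse3x3(const double m[3][3], double result[3][3])
    {
      double c00 = m[1][1]*m[2][2] - m[1][2]*m[2][1];
      double c01 = m[1][2]*m[2][0] - m[1][0]*m[2][2];
      double c02 = m[1][0]*m[2][1] - m[1][1]*m[2][0];
      double det = m[0][0]*c00 + m[0][1]*c01 + m[0][2]*c02;
      double invdet = 1.0 / det;   // assumes m is invertible
      result[0][0] = c00 * invdet;
      result[1][0] = c01 * invdet;
      result[2][0] = c02 * invdet;
      result[0][1] = (m[0][2]*m[2][1] - m[0][1]*m[2][2]) * invdet;
      result[1][1] = (m[0][0]*m[2][2] - m[0][2]*m[2][0]) * invdet;
      result[2][1] = (m[0][1]*m[2][0] - m[0][0]*m[2][1]) * invdet;
      result[0][2] = (m[0][1]*m[1][2] - m[0][2]*m[1][1]) * invdet;
      result[1][2] = (m[0][2]*m[1][0] - m[0][0]*m[1][2]) * invdet;
      result[2][2] = (m[0][0]*m[1][1] - m[0][1]*m[1][0]) * invdet;
    }
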
* Use them to write an unrolled path in echelon.cpp, as an
  experiment before I do this LU module.
* For floating-point types, make ei_random() use an amplitude
  of 1.

using a macro and _Pragma (see the sketch below).
- also use OpenMP in cacheOptimalProduct and in the
  vectorized paths
- kill the vector assignment unroller; implement in
  operator= the logic for assigning a row-vector to
  a col-vector.
- CMakeLists support for building tests/examples
  with -fopenmp and/or -msse2
- updates in bench/; especially, replace identity()
  by ones(), which prevents underflows from perturbing
  bench results.

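The _Pragma trick mentioned above: a plain #pragma cannot be produced by a #define, but the C99/C++ _Pragma operator can, so the OpenMP directive can be switched on and off from a single macro (a minimal sketch with made-up names):

    #ifdef _OPENMP
    #  define PARALLEL_FOR _Pragma("omp parallel for")
    #else
    #  define PARALLEL_FOR
    #endif

    // the loop is parallelized only when the code is built with OpenMP
    void add_vectors(float* dst, const float* src, int n)
    {
      PARALLEL_FOR
      for (int i = 0; i < n; ++i)
        dst[i] += src[i];
    }
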
* rename OperatorEquals -> Assign
* move Util.h and FwDecl.h to a util/ subdir

- make use of CoeffReadCost to determine when to unroll the loops,
  for now only in Product.h and in OperatorEquals.h (see the sketch below);
  performance remains the same: generally still not as good as before the
  big changes.
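A toy illustration of that kind of decision (simplified, with hypothetical names and threshold; the real logic lives in the expression templates): the loop is unrolled only when size times the per-coefficient read cost stays under a fixed limit.

    #include <cstdio>

    const int UnrollingLimit = 100;   // assumed threshold, for illustration

    template<int Size, int CoeffReadCost>
    struct unroll_assignment
    {
      // total compile-time cost of evaluating the right-hand side
      static const bool value = (Size * CoeffReadCost <= UnrollingLimit);
    };

    int main()
    {
      std::printf("%d\n", unroll_assignment<16, 1>::value);   // cheap 4x4 xpr: unroll
      std::printf("%d\n", unroll_assignment<100, 3>::value);  // costly 10x10 xpr: loop
      return 0;
    }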