path: root/Eigen/src/Core/products
Commit message (Author, Age)
...
* Make sure that the block size computation is tested by our unit test. (Gael Guennebaud, 2015-02-26)
|
* Implement a more generic blocking-size selection algorithm. See explanations inline. (Gael Guennebaud, 2015-02-26)
|   It performs extremely well on Haswell. The main issue is to reliably and quickly find the actual cache size to be used for our 2nd level of blocking, that is: max(l2, l3/nb_core_sharing_l3).
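The cache size named in the message above for the second level of blocking, max(l2, l3/nb_core_sharing_l3), can be sketched as a small helper. This is only an illustration under stated assumptions: the function name and the guard on the core count are hypothetical and not taken from Eigen, and the message itself notes that finding these sizes reliably is the hard part.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper (not Eigen's code): cache budget, in bytes, for the
// second level of blocking, following max(l2, l3 / nb_core_sharing_l3).
std::ptrdiff_t second_level_cache_budget(std::ptrdiff_t l2_bytes,
                                         std::ptrdiff_t l3_bytes,
                                         std::ptrdiff_t cores_sharing_l3) {
  // Assumption: treat a missing or zero core count as a single core,
  // which also avoids a division by zero.
  const std::ptrdiff_t sharers = std::max<std::ptrdiff_t>(cores_sharing_l3, 1);
  return std::max(l2_bytes, l3_bytes / sharers);
}
```

For example, with a 256 KB L2 and a 6 MB L3 shared by 4 cores, the per-core share of L3 (1.5 MB) is larger than L2, so the L3 share is the value returned.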
* Fix typos in block-size testing code, and set peeling on k to 8. (Gael Guennebaud, 2015-02-26)
|
* So I extensively measured the impact of the offset in this prefetch. (Benoit Jacob, 2015-02-25)
|   I tried offset values from 0 to 128 (on this float* pointer, so implicitly times 4 bytes). On x86, I tested a Sandy Bridge with AVX with 12M cache and a Haswell with AVX+FMA with 6M cache on MatrixXf sizes up to 2400. I could not see any significant impact of this offset.
|   On Nexus 5, the offset has a slight effect: values around 32 (times sizeof float) are worst. Anything else is the same: the current 64 (8*pk), or... 0. So let's just go with 0!
|   Note that we needed a fix anyway for not accounting for the value of RhsProgress. 0 nicely avoids the issue altogether!
* Fix my recent prefetch changes: (Benoit Jacob, 2015-02-23)
|   - the first prefetch is actually harmful on Haswell with FMA, but it is the most beneficial on ARM.
|   - the second prefetch... I was very stupid and multiplied by sizeof(scalar) an offset of a scalar* pointer. The old offset was 64; pk = 8, so 64=pk*8. So this effectively restores the older offset.
|   Actually, there were two prefetches here, one with offset 48 and one with offset 64. I could not confirm any benefit from this strange 48 offset on either the Haswell or my ARM device.
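The two prefetch messages above turn on one detail of pointer arithmetic: an offset added to a scalar* pointer is counted in elements, so it is already scaled by sizeof(scalar). The sketch below is not the kernel code; the helper names are hypothetical, and it assumes a 4-byte float, as the messages do.

```cpp
#include <cstddef>

// Hypothetical helper: distance in bytes between `base` and
// `base + offset_in_elements`. The caller must keep the offset inside
// (or one past the end of) the array that `base` points into.
template <typename T>
std::ptrdiff_t byte_distance(const T* base, std::ptrdiff_t offset_in_elements) {
  const char* from = reinterpret_cast<const char*>(base);
  const char* to = reinterpret_cast<const char*>(base + offset_in_elements);
  return to - from;
}

// The older offset from the message: 64 elements (pk * 8 with pk = 8).
std::ptrdiff_t intended_byte_distance(const float* p) {
  return byte_distance(p, 64);
}

// The mistake described in the message: scaling the element offset by
// sizeof(scalar) a second time.
std::ptrdiff_t mistaken_byte_distance(const float* p) {
  return byte_distance(p, 64 * static_cast<std::ptrdiff_t>(sizeof(float)));
}
```

With a 4-byte float, the intended offset is 256 bytes ahead of the pointer, while the doubly scaled one lands 1024 bytes ahead.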
* rotating kernel: avoid compiling anything outside of ARM (Benoit Jacob, 2015-02-18)
|
* remove a newly introduced redundant typedef - sorry. (Benoit Jacob, 2015-02-18)
|
* bug #955 - Implement a rotating kernel alternative in the 3px4 gebp path (Benoit Jacob, 2015-02-18)
|   This is substantially faster on ARM, where it's important to minimize the number of loads. This is specific to the case where all packet types are of size 4. I made my best attempt to minimize how dirty this is... opinions welcome.
|   Eventually one could have a generic rotated kernel, but it would take some work to get there. Also, on Sandy Bridge, in my experience, it's not beneficial (even about 1% slower).
* Fixed template parameter. (Hauke Heibel, 2015-02-18)
|
* merge (Gael Guennebaud, 2015-02-18)
|\
* | Clean a bit computeProductBlockingSizes (use Index type, remove CEIL macro) (Gael Guennebaud, 2015-02-18)
| |
| * bug #958 - Allow testing specific blocking sizes (Benoit Jacob, 2015-02-18)
|/
|   This is only a debugging/testing patch. It allows testing specific product blocking sizes, typically to study the impact on performance.
|   Example usage:
|     int testk, testm, testn;
|     #define EIGEN_TEST_SPECIFIC_BLOCKING_SIZES
|     #define EIGEN_TEST_SPECIFIC_BLOCKING_SIZE_K testk
|     #define EIGEN_TEST_SPECIFIC_BLOCKING_SIZE_M testm
|     #define EIGEN_TEST_SPECIFIC_BLOCKING_SIZE_N testn
|     #include <Eigen/Core>
* Fix a regression when using OpenMP, and fix bug #714: the number of threads might be lower than the number of requested ones (Gael Guennebaud, 2015-02-18)
|
* Fix bug #945: workaround MSVC warning (Gael Guennebaud, 2015-02-18)
|
* Merged in chtz/eigen-indexconversion (pull request PR-92) (Gael Guennebaud, 2015-02-16)
|\
| |   bug #877, bug #572: Get rid of Index conversion warnings, summary of changes:
| |   - Introduce a global typedef Eigen::Index making Eigen::DenseIndex and AnyExpr<>::Index deprecated (default is std::ptrdiff_t).
| |   - Eigen::Index is used throughout the API to represent indices, offsets, and sizes.
| |   - Classes storing an array of indices use the type StorageIndex to store them. This is a template parameter of the class. Default is int.
| |   - Methods that *explicitly* set or return an element of such an array take or return a StorageIndex type. In all other cases, the Index type is used.
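The split between Index and StorageIndex summarized above can be illustrated without Eigen. The class below is a hypothetical stand-in, not an Eigen type: it only mirrors the stated convention that sizes and positions use a ptrdiff_t-based Index, while stored index values use a template parameter defaulting to int.

```cpp
#include <cstddef>
#include <vector>

// Stand-in for the global index typedef described above
// (the message gives std::ptrdiff_t as its default).
using Index = std::ptrdiff_t;

// Hypothetical container (not an Eigen class) that stores an array of
// indices; the stored type is a template parameter defaulting to int.
template <typename StorageIndex_ = int>
class IndexList {
 public:
  using StorageIndex = StorageIndex_;

  explicit IndexList(Index size)
      : data_(static_cast<std::size_t>(size), StorageIndex(0)) {}

  // Sizes and positions are expressed with Index.
  Index size() const { return static_cast<Index>(data_.size()); }

  // Methods that explicitly set or return a stored element use StorageIndex.
  StorageIndex get(Index i) const { return data_[static_cast<std::size_t>(i)]; }
  void set(Index i, StorageIndex value) {
    data_[static_cast<std::size_t>(i)] = value;
  }

 private:
  std::vector<StorageIndex> data_;
};
```

The design point is that a large array of stored indices can stay compact (int by default) while the API still addresses it with a pointer-sized signed type.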
| * The usage of DenseIndex is deprecated, so let's replace DenseIndex by Index (Gael Guennebaud, 2015-02-16)
| |
| * Remove deprecated usage of expr::Index. (Gael Guennebaud, 2015-02-16)
| |
* | bug #953 - Fix prefetches in 3px4 product kernel (Benoit Jacob, 2015-02-13)
|/
|   This gives a 10% speedup on Nexus 4 and on Nexus 5.
* Pulled latest fixes (Benoit Steiner, 2015-02-06)
|\
| * Fix symmetric product (Gael Guennebaud, 2015-02-06)
| |
* | Pulled the latest changes from the trunk (Benoit Steiner, 2015-02-06)
|\ \
| |/
|/|
| * bug #936, patch 1.5/3: rename _FUSED_ macros to _SINGLE_INSTRUCTION_, because this is what they are about. (Benoit Jacob, 2015-01-31)
| |   "Fused" means "no intermediate rounding between the mul and the add, only one rounding at the end". Instead, what we are concerned about here is whether a temporary register is needed, i.e. whether the MUL and ADD are separate instructions.
| |   Concretely, on ARM NEON, a single-instruction mul-add is always available: VMLA. But a true fused mul-add is only available on VFPv4: VFMA.
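The definition of "fused" quoted above (one rounding at the end, none between the multiply and the add) can be observed with the standard library alone. This sketch is not related to the renamed Eigen macros, which per the message are about instruction count rather than rounding; the function names are hypothetical, and IEEE 754 double arithmetic with round-to-nearest is assumed.

```cpp
#include <cmath>

// Multiply, then add, as two separately rounded operations. The volatile
// temporary forces the product to be rounded to double and keeps the
// compiler from contracting the expression into a fused operation.
double mul_then_add(double a, double b, double c) {
  volatile double product = a * b;
  return product + c;
}

// Fused multiply-add: a * b + c computed with a single rounding at the end.
double fused_mul_add(double a, double b, double c) {
  return std::fma(a, b, c);
}
```

Taking a = 1 + 2^-27, b = 1 - 2^-27 and c = -1, the exact product is 1 - 2^-54, which rounds to 1 as a double. The two-step version therefore returns 0, while the fused version keeps the low-order term and returns -2^-54.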
| * bug #936, patch 1/3: some cleanup and renaming for consistency. (Benoit Jacob, 2015-01-30)
| |
| * bug #935: Add asm comments in GEBP kernels to work around a bug in both GCC and Clang on ARM/NEON, whereby they spill registers, severely harming performance. (Benoit Jacob, 2015-01-30)
| |   The reason why the asm comments make a difference is that they prevent the compiler from reordering code across these boundaries, which has the effect of extending the lifetime of local variables and increasing register pressure on this register-tight code.
* | Ensured that contractions that can be reduced to a matrix vector product work correctly even when the input coefficients aren't aligned. (Benoit Steiner, 2015-01-06)
| |
* | Generalized the matrix vector product code. (Benoit Steiner, 2014-10-31)
| |
* | Made the blocking computation aware of the l3 cache (Benoit Steiner, 2014-10-15)
| |   Also optimized the blocking parameters to take into account the number of threads used for a computation.
| * bug #887: use ei_declare_aligned_stack_constructed_variable instead of manual new[]/delete[] pairs in AMD and Parallelizer (Gael Guennebaud, 2014-10-06)
| |
* | Generalized the gebp apis (Benoit Steiner, 2014-10-02)
| |
| * Using Index instead of hard coded int type to prevent potential implicit integer conversion (Georg Drenkhahn, 2014-09-22)
| |
| * Make constructors explicit if they could lead to unintended implicit conversion (Christoph Hertzberg, 2014-09-23)
| |
| * Merged eigen/eigen into default (Konstantinos Margaritis, 2014-09-21)
| |\
| | * Remove deprecated code not used by evaluators (Gael Guennebaud, 2014-09-18)
| | |
| | * merge with default branch (Gael Guennebaud, 2014-09-14)
| | |\
| |_|/
|/| |
| * | Initial VSX commit (Konstantinos Margaritis, 2014-08-29)
| | |
* | | Fix bug #852: define Traits type in general_matrix_matrix_product when EIGEN_USE_BLAS is defined (Kevin Locke, 2014-08-08)
| | |
* | | fix for MKL_BLAS not defined in MKL 11.2 (Yan Zhou, 2014-09-08)
|/ /
| * Various minor fixes (Gael Guennebaud, 2014-07-30)
| |
| * Refactor TriangularView to handle both dense and sparse objects. Introduce a glu_shape<S1,S2> helper to assemble sparse/dense shapes with triangular/selfadjoint views. (Gael Guennebaud, 2014-07-22)
| |
| * merge with default branch (Gael Guennebaud, 2014-07-10)
| |\
| |/
|/|
* | Fix many long to int implicit conversions (Gael Guennebaud, 2014-07-08)
| |
| * Split StorageKind promotion into two helpers: one for products, and one for coefficient-wise operations. (Gael Guennebaud, 2014-07-01)
| |
| * Backport changes from old to new expression engines (Gael Guennebaud, 2014-06-20)
| |
| * merge with default branch (Gael Guennebaud, 2014-06-20)
| |\
| |/
|/|
| * 1- Introduce sub-evaluator types for unary, binary, product, and map expressions to ease specializing them. 2- Remove a lot of code which should not be there with evaluators, in particular coeff/packet methods implemented in the expressions. (Gael Guennebaud, 2014-06-20)
| |
* | fixed warning: -Wunused-local-typedefs (Mark Borgerding, 2014-06-17)
| |
* | Missed to remove IACA_END in previous commit (Christoph Hertzberg, 2014-05-05)
| |
* | Removed IACA-defines (Christoph Hertzberg, 2014-05-05)
| |   This caused redefinition warnings if IACA headers were included from elsewhere. For a clean solution we should define our own EIGEN_IACA_* macros.
* | TRMM: Make sure we have enough memory in rhs block to enforce alignment. (Gael Guennebaud, 2014-04-25)
| |
* | Make sure that calls to broadcast4 are 16 bytes aligned (Gael Guennebaud, 2014-04-25)
| |