eigen - C++ library for linear algebra

	Commit message	Author	Age
...
*	Make sure that the block size computation is tested by our unit test.	Gael Guennebaud	2015-02-26
\|
*	Implement a more generic blocking-size selection algorithm. See explanations ↵	Gael Guennebaud	2015-02-26
\| \| \| \| \| \| \|	inline. It performs extremely well on Haswell. The main issue is to reliably and quickly find the actual cache size to be used for our 2nd level of blocking, that is: max(l2,l3/nb_core_sharing_l3)
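The cache budget named in the message above, max(l2,l3/nb_core_sharing_l3), can be sketched as a small helper. This is only an illustration: the function name, the use of bytes as the unit, and the clamp on the core count are assumptions, not code from this commit.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper (not Eigen's API): cache budget, in bytes, for the
// second level of blocking, i.e. max(l2, l3 / nb_core_sharing_l3).
inline std::ptrdiff_t second_level_cache_bytes(std::ptrdiff_t l2,
                                               std::ptrdiff_t l3,
                                               std::ptrdiff_t nb_core_sharing_l3) {
  // Assumption: treat a zero or negative core count (e.g. a failed query)
  // as a single core so the division is always well defined.
  const std::ptrdiff_t sharers =
      std::max<std::ptrdiff_t>(nb_core_sharing_l3, 1);
  return std::max(l2, l3 / sharers);
}
```

With a 256 KiB L2 and a 6 MiB L3 shared by 4 cores, the L3 share (1.5 MiB) wins; with a 512 KiB L3 shared by 4 cores, the private L2 wins.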
*	Fix typos in block-size testing code, and set peeling on k to 8.	Gael Guennebaud	2015-02-26
\|
*	So I extensively measured the impact of the offset in this prefetch. I tried ↵	Benoit Jacob	2015-02-25
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	offset values from 0 to 128 (on this float* pointer, so implicitly times 4 bytes). On x86, I tested a Sandy Bridge with AVX with 12M cache and a Haswell with AVX+FMA with 6M cache on MatrixXf sizes up to 2400. I could not see any significant impact of this offset. On Nexus 5, the offset has a slight effect: values around 32 (times sizeof float) are worst. Anything else is the same: the current 64 (8*pk), or... 0. So let's just go with 0! Note that we needed a fix anyway for not accounting for the value of RhsProgress. 0 nicely avoids the issue altogether!
*	Fix my recent prefetch changes:	Benoit Jacob	2015-02-23
\| \| \| \| \| \| \| \| \| \| \|	- the first prefetch is actually harmful on Haswell with FMA, but it is the most beneficial on ARM. - the second prefetch... I was very stupid and multiplied by sizeof(scalar) an offset of a scalar* pointer. The old offset was 64; pk = 8, so 64=pk*8. So this effectively restores the older offset. Actually, there were two prefetches here, one with offset 48 and one with offset 64. I could not confirm any benefit from this strange 48 offset on either the Haswell or my ARM device.
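The mistake described above comes from pointer arithmetic already being scaled by the element size. A minimal sketch, using a hypothetical helper that is not part of the kernel, makes the effect measurable:

```cpp
#include <cstddef>

// Hypothetical helper: number of bytes between p and p + offset.
// Both pointers must stay inside (or one past) the same array.
inline std::ptrdiff_t byte_distance(const float* p, std::ptrdiff_t offset) {
  const char* base = reinterpret_cast<const char*>(p);
  const char* target = reinterpret_cast<const char*>(p + offset);
  return target - base;
}
```

For a float buffer, an offset of 64 already covers 64 * sizeof(float) bytes. Multiplying the offset by sizeof(float) first therefore moves the pointer sizeof(float) times further than intended.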
*	rotating kernel: avoid compiling anything outside of ARM	Benoit Jacob	2015-02-18
\|
*	remove a newly introduced redundant typedef - sorry.	Benoit Jacob	2015-02-18
\|
*	bug #955 - Implement a rotating kernel alternative in the 3px4 gebp path	Benoit Jacob	2015-02-18
\| \| \| \| \| \| \| \|	This is substantially faster on ARM, where it's important to minimize the number of loads. This is specific to the case where all packet types are of size 4. I made my best attempt to minimize how dirty this is... opinions welcome. Eventually one could have a generic rotated kernel, but it would take some work to get there. Also, on Sandy Bridge, in my experience, it's not beneficial (even about 1% slower).
*	Fixed template parameter.	Hauke Heibel	2015-02-18
\|
*	merge	Gael Guennebaud	2015-02-18
\|\
* \|	Clean a bit computeProductBlockingSizes (use Index type, remove CEIL macro)	Gael Guennebaud	2015-02-18
\| \|
\| *	bug #958 - Allow testing specific blocking sizes	Benoit Jacob	2015-02-18
\|/ \| \| \| \| \| \| \| \| \| \| \| \| \|	This is only a debugging/testing patch. It allows testing specific product blocking sizes, typically to study the impact on performance. Example usage: int testk, testm, testn; #define EIGEN_TEST_SPECIFIC_BLOCKING_SIZES #define EIGEN_TEST_SPECIFIC_BLOCKING_SIZE_K testk #define EIGEN_TEST_SPECIFIC_BLOCKING_SIZE_M testm #define EIGEN_TEST_SPECIFIC_BLOCKING_SIZE_N testn #include <Eigen/Core>
*	Fix a regression when using OpenMP, and fix bug #714: the number of threads ↵	Gael Guennebaud	2015-02-18
\| \| \| \|	might be lower than the number of requested ones
*	Fix bug #945: workaround MSVC warning	Gael Guennebaud	2015-02-18
\|
*	Merged in chtz/eigen-indexconversion (pull request PR-92)	Gael Guennebaud	2015-02-16
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	bug #877, bug #572: Get rid of Index conversion warnings, summary of changes: - Introduce a global typedef Eigen::Index making Eigen::DenseIndex and AnyExpr<>::Index deprecated (default is std::ptrdiff_t). - Eigen::Index is used throughout the API to represent indices, offsets, and sizes. - Classes storing an array of indices use the type StorageIndex to store them. This is a template parameter of the class. Default is int. - Methods that explicitly set or return an element of such an array take or return a StorageIndex type. In all other cases, the Index type is used.
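The split between Index and StorageIndex described above can be sketched with a stand-alone class. IndexList and its members are hypothetical and only mirror the convention in the summary; they are not Eigen classes.

```cpp
#include <cstddef>
#include <vector>

// Stand-in for the global index typedef (std::ptrdiff_t by default).
using Index = std::ptrdiff_t;

// Hypothetical container, not an Eigen class: it stores an array of indices,
// so the stored element type is a template parameter defaulting to int.
template <typename StorageIndex_ = int>
class IndexList {
 public:
  using StorageIndex = StorageIndex_;

  explicit IndexList(Index size)
      : data_(static_cast<std::size_t>(size), StorageIndex(0)) {}

  // Sizes and positions are expressed with Index.
  Index size() const { return static_cast<Index>(data_.size()); }

  // Accessors take a position (Index) and set or return a StorageIndex.
  StorageIndex get(Index i) const {
    return data_[static_cast<std::size_t>(i)];
  }
  void set(Index i, StorageIndex value) {
    data_[static_cast<std::size_t>(i)] = value;
  }

 private:
  std::vector<StorageIndex> data_;
};
```

Keeping the stored type narrow (int by default) limits memory use for large index arrays, while the wider Index avoids conversion warnings at the API boundary.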
\| *	The usage of DenseIndex is deprecated, so let's replace DenseIndex by Index	Gael Guennebaud	2015-02-16
\| \|
\| *	Remove deprecated usage of expr::Index.	Gael Guennebaud	2015-02-16
\| \|
* \|	bug #953 - Fix prefetches in 3px4 product kernel	Benoit Jacob	2015-02-13
\|/ \| \| \|	This gives a 10% speedup on Nexus 4 and Nexus 5.
*	Pulled latest fixes	Benoit Steiner	2015-02-06
\|\
\| *	Fix symmetric product	Gael Guennebaud	2015-02-06
\| \|
* \|	Pulled the latest changes from the trunk	Benoit Steiner	2015-02-06
\|\ \ \| \|/ \|/\|
\| *	bug #936, patch 1.5/3: rename _FUSED_ macros to _SINGLE_INSTRUCTION_,	Benoit Jacob	2015-01-31
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	because this is what they are about. "Fused" means "no intermediate rounding between the mul and the add, only one rounding at the end". Instead, what we are concerned about here is whether a temporary register is needed, i.e. whether the MUL and ADD are separate instructions. Concretely, on ARM NEON, a single-instruction mul-add is always available: VMLA. But a true fused mul-add is only available on VFPv4: VFMA.
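The difference between "fused" and a plain multiply followed by an add shows up in rounding. A minimal sketch in portable C++ (the function names are illustrative; std::fma is the standard library's fused multiply-add):

```cpp
#include <cmath>

// Two roundings: the product is rounded to double before the addition.
// The volatile temporary keeps the compiler from contracting this into an FMA.
inline double mul_then_add(double a, double b, double c) {
  volatile double product = a * b;
  return product + c;
}

// One rounding: std::fma evaluates a*b + c exactly, then rounds once.
inline double fused_mul_add(double a, double b, double c) {
  return std::fma(a, b, c);
}
```

With a = 1 + 2^-27 and c = -(1 + 2^-26), the exact product a*a is 1 + 2^-26 + 2^-54. The unfused version rounds the product to 1 + 2^-26 and returns 0, while the fused version keeps the low term and returns 2^-54.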
\| *	bug #936, patch 1/3: some cleanup and renaming for consistency.	Benoit Jacob	2015-01-30
\| \|
\| *	bug #935: Add asm comments in GEBP kernels to work around a bug	Benoit Jacob	2015-01-30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	in both GCC and Clang on ARM/NEON, whereby they spill registers, severely harming performance. The reason why the asm comments make a difference is that they prevent the compiler from reordering code across these boundaries, which has the effect of extending the lifetime of local variables and increasing register pressure on this register-tight code.
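The mechanism described above can be sketched as follows. The macro name and the loop are hypothetical and only show how an assembler comment is emitted as an asm statement; whether it changes register allocation depends on the compiler and target.

```cpp
#include <cstddef>

// Hypothetical macro for illustration: an asm statement whose only text is an
// assembler comment. GCC/Clang only; expands to a no-op elsewhere.
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_ASM_COMMENT(text) __asm__ __volatile__("# " text)
#else
#define SKETCH_ASM_COMMENT(text) ((void)0)
#endif

// Dot product over blocks of 4 floats with four independent accumulators.
// The markers delimit each block so the compiler keeps the block's code
// between them instead of interleaving it with neighbouring iterations.
inline float dot4_blocks(const float* a, const float* b, std::size_t n_blocks) {
  float acc0 = 0.f, acc1 = 0.f, acc2 = 0.f, acc3 = 0.f;
  for (std::size_t i = 0; i < n_blocks; ++i) {
    SKETCH_ASM_COMMENT("begin block");
    acc0 += a[4 * i + 0] * b[4 * i + 0];
    acc1 += a[4 * i + 1] * b[4 * i + 1];
    acc2 += a[4 * i + 2] * b[4 * i + 2];
    acc3 += a[4 * i + 3] * b[4 * i + 3];
    SKETCH_ASM_COMMENT("end block");
  }
  return (acc0 + acc1) + (acc2 + acc3);
}
```

The comments also appear in the generated assembly, which makes it easier to locate the kernel when inspecting compiler output.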
* \|	Ensured that contractions that can be reduced to a matrix vector product ↵	Benoit Steiner	2015-01-06
\| \| \| \| \| \| \| \|	work correctly even when the input coefficients aren't aligned.
* \|	Generalized the matrix vector product code.	Benoit Steiner	2014-10-31
\| \|
* \|	Made the blocking computation aware of the l3 cache	Benoit Steiner	2014-10-15
\| \| \| \| \| \| \| \|	Also optimized the blocking parameters to take into account the number of threads used for a computation
\| *	bug #887: use ei_declare_aligned_stack_constructed_variable instead of ↵	Gael Guennebaud	2014-10-06
\| \| \| \| \| \| \| \|	manual new[]/delete[] pairs in AMD and Parallelizer
* \|	Generalized the gebp apis	Benoit Steiner	2014-10-02
\| \|
\| *	Using Index instead of hard coded int type to prevent potential implicit ↵	Georg Drenkhahn	2014-09-22
\| \| \| \| \| \| \| \|	integer conversion
\| *	Make constructors explicit if they could lead to unintended implicit conversion	Christoph Hertzberg	2014-09-23
\| \|
\| *	Merged eigen/eigen into default	Konstantinos Margaritis	2014-09-21
\| \|\
\| \| *	Remove deprecated code not used by evaluators	Gael Guennebaud	2014-09-18
\| \| \|
\| \| *	merge with default branch	Gael Guennebaud	2014-09-14
\| \| \|\ \| \|_\|/ \|/\| \|
\| * \|	Initial VSX commit	Konstantinos Margaritis	2014-08-29
\| \| \|
* \| \|	Fix bug #852: define Traits type in general_matrix_matrix_product when ↵	Kevin Locke	2014-08-08
\| \| \| \| \| \| \| \| \| \| \| \|	EIGEN_USE_BLAS is defined
* \| \|	fix for MKL_BLAS not defined in MKL 11.2	Yan Zhou	2014-09-08
\|/ /
\| *	Various minor fixes	Gael Guennebaud	2014-07-30
\| \|
\| *	Refactor TriangularView to handle both dense and sparse objects. Introduce a ↵	Gael Guennebaud	2014-07-22
\| \| \| \| \| \| \| \|	glu_shape<S1,S2> helper to assemble sparse/dense shapes with triangular/selfadjoint views.
\| *	merge with default branch	Gael Guennebaud	2014-07-10
\| \|\ \| \|/ \|/\|
* \|	Fix many long to int implicit conversions	Gael Guennebaud	2014-07-08
\| \|
\| *	Split StorageKind promotion into two helpers: one for products, and one for ↵	Gael Guennebaud	2014-07-01
\| \| \| \| \| \| \| \|	coefficient-wise operations.
\| *	Backport changes from old to new expression engines	Gael Guennebaud	2014-06-20
\| \|
\| *	merge with default branch	Gael Guennebaud	2014-06-20
\| \|\ \| \|/ \|/\|
\| *	1- Introduce sub-evaluator types for unary, binary, product, and map ↵	Gael Guennebaud	2014-06-20
\| \| \| \| \| \| \| \| \| \| \| \|	expressions to ease specializing them. 2- Remove a lot of code which should not be there with evaluators, in particular coeff/packet methods implemented in the expressions.
* \|	fixed warning: -Wunused-local-typedefs	Mark Borgerding	2014-06-17
\| \|
* \|	Forgot to remove IACA_END in the previous commit	Christoph Hertzberg	2014-05-05
\| \|
* \|	Removed IACA-defines	Christoph Hertzberg	2014-05-05
\| \| \| \| \| \| \| \|	This caused redefinition warnings if IACA headers were included from elsewhere. For a clean solution we should define our own EIGEN_IACA_* macros
* \|	TRMM: Make sure we have enough memory in rhs block to enforce alignment.	Gael Guennebaud	2014-04-25
\| \|
* \|	Make sure that calls to broadcast4 are 16 bytes aligned	Gael Guennebaud	2014-04-25
\| \|