| Commit message | Author | Age |
---
This is only a debugging/testing patch. It allows testing specific
product blocking sizes, typically to study the impact on performance.
Example usage:

    int testk, testm, testn;
    #define EIGEN_TEST_SPECIFIC_BLOCKING_SIZES
    #define EIGEN_TEST_SPECIFIC_BLOCKING_SIZE_K testk
    #define EIGEN_TEST_SPECIFIC_BLOCKING_SIZE_M testm
    #define EIGEN_TEST_SPECIFIC_BLOCKING_SIZE_N testn
    #include <Eigen/Core>
---
This gives a 10% speedup on Nexus 4 and Nexus 5.
---
[...] because this is what they are about. "Fused" means "no intermediate
rounding between the mul and the add, only one rounding at the end".
What we are concerned with here, instead, is whether a temporary register
is needed, i.e. whether the MUL and ADD are separate instructions.
Concretely, on ARM NEON, a single-instruction mul-add is always
available (VMLA), but a true fused mul-add is only available on
VFPv4 (VFMA).
---
[...] in both GCC and Clang on ARM/NEON, whereby they spill registers,
severely harming performance. The asm comments make a difference because
they prevent the compiler from reordering code across these boundaries;
such reordering would extend the lifetime of local variables and increase
register pressure in this register-tight code.
---
Also optimized the blocking parameters to take into account the number
of threads used for a computation.
---
This caused redefinition warnings if IACA headers were included from
elsewhere. For a clean solution we should define our own EIGEN_IACA_*
macros.
---
[...] version seems significantly slower.
---
- use pbroadcast4 (helpful when AVX is not available)
- process all remaining rows at once (a significant speedup for small
  matrices)
---
Rename and generalize Kernel<*> to PacketBlock<*,N>.
---
[...] speedup on Haswell.
This changeset also introduces new vector functions: ploadquad and
predux4.
---
1 - increase the peeling level along the depth dimension (+5% for large
    matrices, i.e. larger than 1000)
2 - improve pipelining when dealing with the last rows of the lhs
---
[...] multiplication code. This speeds up the multiplication of matrices
whose size is not a multiple of the packet size.
---
[...] argument in a matrix-matrix product when AVX instructions are used.
No vectorization takes place when SSE instructions are used; however,
this doesn't seem to impact performance.
---
[...] argument in a matrix-matrix product.
---
[...] matrices is vectorized when nr == 2*PacketSize (which is the case
for SSE when compiling in 64-bit mode).
|
|\ \ |
|
| |/
| |
| |
| | |
ones.
---
[...] change reduces the pressure on the L1 cache by removing the calls
to gebp_traits::unpackRhs(). Instead, the packetization of the rhs blocks
is done on the fly in gebp_traits::loadRhs(). This adds numerous calls to
pset1<ResPacket> (since we are packetizing on the fly in the inner loop),
but this is more than compensated by the fact that memory transfers are
reduced by a factor of RhsPacketSize.
---
[...] warnings. This also fixes the issue of the previous changeset in a
much nicer way.
|
|/ |
|
| |
|
| |
|
|
|
|
| |
(after some benchmarking, it was not useful anymore)
---
[...] initParallel() function, which must be called at initialization
time by any multi-threaded application calling Eigen from multiple
threads.
---
After all, the solution based on threadprivate is not that costly.