eigen - C++ library for linear algebra

	Commit message	Author	Age
*	Optimize AVX pset1 for complexes and ploaddup	Gael Guennebaud	2014-04-17
\|
*	Fix and optimize mixed products	Gael Guennebaud	2014-04-17
\|
*	Optimize ploaddup for AVX	Gael Guennebaud	2014-04-17
\|
*	Fallback to lazy products for very small ones.	Gael Guennebaud	2014-04-16
\|
*	Enable alloca on MAC OSX	Gael Guennebaud	2014-04-16
\|
*	New gebp kernel handling up to 3 packets x 4 register-level blocks. Huge ↵	Gael Guennebaud	2014-04-16
\| \| \| \| \| \|	speedup on Haswell. This changeset also introduces new vector functions: ploadquad and predux4.
*	Optimized SSE unaligned loads and stores when compiling a 64bit target with ↵	Benoit Steiner	2014-04-14
\| \| \| \|	a recent version of gcc (i.e., gcc 4.8).
*	Deleted some dead code.	Benoit Steiner	2014-04-04
\|
*	Finally, prefetching seems to help getting more stable performance	Gael Guennebaud	2014-03-31
\|
*	Workaround alignment warnings	Gael Guennebaud	2014-03-30
\|
*	Optimize gebp kernel:	Gael Guennebaud	2014-03-30
\| \| \| \| \|	1 - increase peeling level along the depth dimension (+5% for large matrices, i.e., >1000) 2 - improve pipelining when dealing with the last rows of the lhs
*	Vectorized the loop peeling of the inner loop of the block-panel matrix ↵	Benoit Steiner	2014-03-28
\| \| \| \|	multiplication code. This speeds up the multiplication of matrices whose size is not a multiple of the packet size.
*	Add a mechanism to recursively access half-size packet types	Gael Guennebaud	2014-03-28
\|
*	merge with default branch	Gael Guennebaud	2014-03-28
\|\
* \|	Enable vectorization of gemv for PacketSize>4 through unaligned loads (still ↵	Gael Guennebaud	2014-03-28
\| \| \| \| \| \| \| \|	better than no vectorization)
* \|	Merged latest changes from parent.	Benoit Steiner	2014-03-27
\|\ \
* \| \|	Implemented the SSE version of the gather and scatter packet primitives.	Benoit Steiner	2014-03-27
\| \| \|
* \| \|	Implemented the AVX version of the gather and scatter packet primitives.	Benoit Steiner	2014-03-27
\| \| \|
* \| \|	Introduced pscatter/pgather packet primitives. They will be used to optimize ↵	Benoit Steiner	2014-03-27
\| \| \| \| \| \| \| \| \| \| \| \|	the loop peeling code of the block-panel matrix multiplication kernel.
\| * \|	enforce the use of vfmadd231ps for pmadd (gcc and clang stupidly generate ↵	Gael Guennebaud	2014-03-27
\|/ / \| \| \| \| \| \|	the other fmadd variants plus some register moves...)
* \|	Fixed compilation error when FMA instructions are enabled.	Benoit Steiner	2014-03-27
\| \|
* \|	Silenced "unused variable" warnings when compiling with FMA.	Benoit Steiner	2014-03-27
\| \|
* \|	Vectorized the packing of a col-major matrix used as the right hand side ↵	Benoit Steiner	2014-03-27
\| \| \| \| \| \| \| \|	argument in a matrix-matrix product when AVX instructions are used. No vectorization takes place when SSE instructions are used; however, this doesn't seem to impact performance.
* \|	Vectorized the packing of a row-major matrix used as the left hand side ↵	Benoit Steiner	2014-03-27
\| \| \| \| \| \| \| \|	argument in a matrix-matrix product.
* \|	Implemented the AVX version of the ptranspose packet primitive.	Benoit Steiner	2014-03-27
\| \|
* \|	Implement pcplflip, palign, predux and the like for AVX/complexes	Gael Guennebaud	2014-03-27
\| \|
\| *	Fix warning	Gael Guennebaud	2014-03-27
\| \|
* \|	Created the ptranspose packet primitive that can transpose an array of N ↵	Benoit Steiner	2014-03-26
\| \| \| \| \| \| \| \| \| \| \| \|	packets, where N is the number of words in each packet. This primitive will be used to complete the vectorization of the gemm_pack_lhs and gemm_pack_rhs functions. Implemented the primitive using SSE instructions.
* \|	Made sure that the version of gemm_pack_rhs specialized for row major ↵	Benoit Steiner	2014-03-26
\| \| \| \| \| \| \| \|	matrices is vectorized when nr == 2*PacketSize (which is the case for SSE when compiling in 64bit mode).
* \|	Specialized the pload1 packet primitive for Packet8f and Packet4d in order ↵	Benoit Steiner	2014-03-26
\| \| \| \| \| \| \| \|	to take advantage of the vbroadcastss and vbroadcastsd instructions whenever possible.
* \|	Merged latest updates from the parent branch	Benoit Steiner	2014-03-26
\|\ \
\| \| *	Update gebp kernel to process a panel of 4 columns at once for the remaining ↵	Gael Guennebaud	2014-03-26
\| \| \| \| \| \| \| \| \| \| \| \|	ones.
\| \| *	Remove remaining bits of the dead working buffer	Gael Guennebaud	2014-03-26
\| \|/
* \|	Vectorized the multiplication and division of complex numbers using AVX ↵	Benoit Steiner	2014-03-26
\| \| \| \| \| \| \| \|	instructions.
* \|	Used AVX instructions to vectorize the complex version of the pfirst and ↵	Benoit Steiner	2014-03-26
\| \| \| \| \| \| \| \| \| \| \| \|	ploaddup packet primitives. Silenced a few compilation warnings.
\| *	Implement new 1 packet x 8 gebp kernel	Gael Guennebaud	2014-03-26
\| \|
\| *	add pbroadcast2/4 generic intrinsics	Gael Guennebaud	2014-03-26
\| \|
* \|	Use AVX instructions to vectorize pset1<Packet2cd>, pset1<Packet4cf>, ↵	Benoit Steiner	2014-03-25
\| \| \| \| \| \| \| \|	preverse<Packet2cd>, and preverse<Packet4cf>
* \|	Used AVX instructions to vectorize the predux_min<Packet8f>, ↵	Benoit Steiner	2014-03-24
\| \| \| \| \| \| \| \|	predux_min<Packet4d>, predux_max<Packet8f>, and predux_max<Packet4d> packet primitives.
* \|	Made sure that EIGEN_ALIGN is defined when EIGEN_DONT_VECTORIZE is set to ↵	Benoit Steiner	2014-03-21
\| \| \| \| \| \| \| \|	true to prevent build failures when vectorization is disabled.
* \|	Merged latest changes from the parent	Benoit Steiner	2014-03-18
\|\ \
* \ \	Merged latest changes from the main trunk	Benoit Steiner	2014-02-24
\|\ \ \
* \ \ \	Pulled latest changes from the Eigen main trunk	Benoit Steiner	2014-02-24
\|\ \ \ \
\| \| * \| \|	Merged eigen/eigen into default	Benoit Steiner	2014-02-24
\| \|/\| \| \|
* \| \| \| \|	Added support for FMA instructions	Benoit Steiner	2014-02-24
\| \| \| \| \|
\| * \| \| \|	Merged the latest version of the code from eigen/eigen	Benoit Steiner	2014-02-18
\|/\| \| \| \|
* \| \| \| \|	Reverted the definition of the EIGEN_ALIGN to its former meaning (i.e. a ↵	Benoit Steiner	2014-02-18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	boolean). Created a new EIGEN_ALIGN_BYTES define to encode how the data should be aligned. Fixed a few remaining alignment issues exposed when the Eigen code is compiled with AVX enabled. Created a new EIGEN_ALIGN_DEFAULT define, which is set to the minimum alignment value required for the chosen instruction set. Use this value instead of EIGEN_ALIGN32 to preserve the existing alignment on SSE/Altivec/Neon.
* \| \| \| \|	Added support for AVX to Eigen.	Benoit Steiner	2014-01-29
\| \| \| \| \|
\| \| \| \| *	Improved the efficiency of the block-panel matrix multiplication code: the ↵	Benoit Steiner	2014-01-02
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	change reduces the pressure on the L1 cache by removing the calls to gebp_traits::unpackRhs(). Instead, the packetization of the rhs blocks is done on the fly in gebp_traits::loadRhs(). This adds numerous calls to pset1<ResPacket> (since we're packetizing on the fly in the inner loop), but this is more than compensated by the fact that we're decreasing the memory transfers by a factor of RhsPacketSize.
\| \| \| \| *	Fix bug #222. Make temporary matrix column-major independently of ↵	Christoph Hertzberg	2014-03-26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	EIGEN_DEFAULT_TO_ROW_MAJOR