eigen - C++ library for linear algebra

	Commit message	Author	Age
*	Eliminate boolean product warnings by factoring out a	Christoph Hertzberg	2021-01-05
\| \| \|	`combine_scalar_factors` helper function.
*	Remove redundant branch for handling dynamic vector*vector. This will be ↵	Rasmus Munk Larsen	2020-11-12
\| \| \| \|	handled by the equivalent branch in the specialization for GemvProduct.
*	Optimize matrix*matrix and matrix*vector products when they correspond to ↵	Rasmus Munk Larsen	2020-11-12
\|	inner products at runtime.
\|	This speeds up inner products where one or both arguments are dynamic, for small and medium-sized vectors (up to 32k).
\|	name                           old time/op   new time/op   delta
\|	BM_VecVecStatStat<float>/1     1.64ns ± 0%   1.64ns ± 0%   ~
\|	BM_VecVecStatStat<float>/8     2.99ns ± 0%   2.99ns ± 0%   ~
\|	BM_VecVecStatStat<float>/64    7.00ns ± 1%   7.04ns ± 0%   +0.66%
\|	BM_VecVecStatStat<float>/512   61.6ns ± 0%   61.6ns ± 0%   ~
\|	BM_VecVecStatStat<float>/4k    551ns ± 0%    553ns ± 1%    +0.26%
\|	BM_VecVecStatStat<float>/32k   4.45µs ± 0%   4.45µs ± 0%   ~
\|	BM_VecVecStatStat<float>/256k  77.9µs ± 0%   78.1µs ± 1%   ~
\|	BM_VecVecStatStat<float>/1M    312µs ± 0%    312µs ± 1%    ~
\|	BM_VecVecDynStat<float>/1      13.3ns ± 1%   4.6ns ± 0%    -65.35%
\|	BM_VecVecDynStat<float>/8      14.4ns ± 0%   6.2ns ± 0%    -57.00%
\|	BM_VecVecDynStat<float>/64     24.0ns ± 0%   10.2ns ± 3%   -57.57%
\|	BM_VecVecDynStat<float>/512    138ns ± 0%    68ns ± 0%     -50.52%
\|	BM_VecVecDynStat<float>/4k     1.11µs ± 0%   0.56µs ± 0%   -49.72%
\|	BM_VecVecDynStat<float>/32k    8.89µs ± 0%   4.46µs ± 0%   -49.89%
\|	BM_VecVecDynStat<float>/256k   78.2µs ± 0%   78.1µs ± 1%   ~
\|	BM_VecVecDynStat<float>/1M     313µs ± 0%    312µs ± 1%    ~
\|	BM_VecVecDynDyn<float>/1       10.4ns ± 0%   10.5ns ± 0%   +0.91%
\|	BM_VecVecDynDyn<float>/8       12.0ns ± 3%   11.9ns ± 0%   ~
\|	BM_VecVecDynDyn<float>/64      37.4ns ± 0%   19.6ns ± 1%   -47.57%
\|	BM_VecVecDynDyn<float>/512     159ns ± 0%    81ns ± 0%     -49.07%
\|	BM_VecVecDynDyn<float>/4k      1.13µs ± 0%   0.58µs ± 1%   -49.11%
\|	BM_VecVecDynDyn<float>/32k     8.91µs ± 0%   5.06µs ±12%   -43.23%
\|	BM_VecVecDynDyn<float>/256k    78.2µs ± 0%   78.2µs ± 1%   ~
\|	BM_VecVecDynDyn<float>/1M      313µs ± 0%    312µs ± 1%    ~
*	bug #1741: fix C.noalias() = A*C; with C.innerStride()!=1	Gael Guennebaud	2019-09-10
\|
*	Make GEMM fallback to GEMV for runtime vectors.	Gael Guennebaud	2019-02-07
\| \| \| \| \|	This is a more general and simpler version of changeset 4c0fa6ce0f81ce67dd6723528ddf72f66ae92ba2
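The runtime fallback described in this entry can be pictured as a check on the destination's dimensions before a kernel is chosen. The following is a minimal sketch in plain C++, not Eigen's code: `ProductKernel` and `select_kernel` are hypothetical names introduced only for illustration.

```cpp
#include <cstddef>

// Hypothetical labels for the kernels a product could be routed to.
enum class ProductKernel { Gemv, Gemm };

// Pick a kernel from the runtime shape of the destination (dst_rows x dst_cols).
// If the result is a single column or a single row, one of the factors is a
// vector at runtime, so a matrix*vector kernel can be used instead of GEMM.
inline ProductKernel select_kernel(std::ptrdiff_t dst_rows, std::ptrdiff_t dst_cols) {
  if (dst_cols == 1 || dst_rows == 1) return ProductKernel::Gemv;
  return ProductKernel::Gemm;
}
```

With this sketch, a 64x1 or 1x64 destination would be routed to the matrix*vector path, while a 64x64 destination would stay on the general matrix*matrix path.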
*	Backed out changeset 4c0fa6ce0f81ce67dd6723528ddf72f66ae92ba2	Gael Guennebaud	2019-02-07
\|
*	Speed up Eigen matrix*vector and vector*matrix multiplication.	Rasmus Munk Larsen	2019-01-31
\|	This change speeds up Eigen matrix * vector and vector * matrix multiplication for dynamic matrices when it is known at runtime that one of the factors is a vector.
\|	The benchmarks below test
\|	c.noalias()= n_by_n_matrix * n_by_1_matrix;
\|	c.noalias()= 1_by_n_matrix * n_by_n_matrix;
\|	respectively.
\|	Benchmark measurements:
\|	SSE:
\|	Run on * (72 X 2992 MHz CPUs); 2019-01-28T17:51:44.452697457-08:00
\|	CPU: Intel Skylake Xeon with HyperThreading (36 cores) dL1:32KB dL2:1024KB dL3:24MB
\|	Benchmark        Base (ns)   New (ns)   Improvement
\|	------------------------------------------------------------------
\|	BM_MatVec/64          1096        312   +71.5%
\|	BM_MatVec/128         4581       1464   +68.0%
\|	BM_MatVec/256        18534       5710   +69.2%
\|	BM_MatVec/512       118083      24162   +79.5%
\|	BM_MatVec/1k        704106     173346   +75.4%
\|	BM_MatVec/2k       3080828     742728   +75.9%
\|	BM_MatVec/4k      25421512    4530117   +82.2%
\|	BM_VecMat/32           352        130   +63.1%
\|	BM_VecMat/64          1213        425   +65.0%
\|	BM_VecMat/128         4640       1564   +66.3%
\|	BM_VecMat/256        17902       5884   +67.1%
\|	BM_VecMat/512        70466      24000   +65.9%
\|	BM_VecMat/1k        340150     161263   +52.6%
\|	BM_VecMat/2k       1420590     645576   +54.6%
\|	BM_VecMat/4k       8083859    4364327   +46.0%
\|	AVX2:
\|	Run on * (72 X 2993 MHz CPUs); 2019-01-28T17:45:11.508545307-08:00
\|	CPU: Intel Skylake Xeon with HyperThreading (36 cores) dL1:32KB dL2:1024KB dL3:24MB
\|	Benchmark        Base (ns)   New (ns)   Improvement
\|	------------------------------------------------------------------
\|	BM_MatVec/64           619        120   +80.6%
\|	BM_MatVec/128         9693        752   +92.2%
\|	BM_MatVec/256        38356       2773   +92.8%
\|	BM_MatVec/512        69006      12803   +81.4%
\|	BM_MatVec/1k        443810     160378   +63.9%
\|	BM_MatVec/2k       2633553     646594   +75.4%
\|	BM_MatVec/4k      16211095    4327148   +73.3%
\|	BM_VecMat/64           925        227   +75.5%
\|	BM_VecMat/128         3438        830   +75.9%
\|	BM_VecMat/256        13427       2936   +78.1%
\|	BM_VecMat/512        53944      12473   +76.9%
\|	BM_VecMat/1k        302264     157076   +48.0%
\|	BM_VecMat/2k       1396811     675778   +51.6%
\|	BM_VecMat/4k       8962246    4459010   +50.2%
\|	AVX512:
\|	Run on *** (72 X 2993 MHz CPUs); 2019-01-28T17:35:17.239329863-08:00
\|	CPU: Intel Skylake Xeon with HyperThreading (36 cores) dL1:32KB dL2:1024KB dL3:24MB
\|	Benchmark        Base (ns)   New (ns)   Improvement
\|	------------------------------------------------------------------
\|	BM_MatVec/64           401        111   +72.3%
\|	BM_MatVec/128         1846        513   +72.2%
\|	BM_MatVec/256        36739       1927   +94.8%
\|	BM_MatVec/512        54490       9227   +83.1%
\|	BM_MatVec/1k        487374     161457   +66.9%
\|	BM_MatVec/2k       2016270     643824   +68.1%
\|	BM_MatVec/4k      13204300    4077412   +69.1%
\|	BM_VecMat/32           324        106   +67.3%
\|	BM_VecMat/64          1034        246   +76.2%
\|	BM_VecMat/128         3576        802   +77.6%
\|	BM_VecMat/256        13411       2561   +80.9%
\|	BM_VecMat/512        58686      10037   +82.9%
\|	BM_VecMat/1k        320862     163750   +49.0%
\|	BM_VecMat/2k       1406719     651397   +53.7%
\|	BM_VecMat/4k       7785179    4124677   +47.0%
*	Fix regression introduced by the previous fix for AVX512.	Gael Guennebaud	2018-09-20
\| \| \| \|	It broke the complex-complex case on SSE.
*	Fix gebp kernel for real+complex in case only reals are vectorized (e.g., ↵	Gael Guennebaud	2018-09-20
\| \| \| \| \| \|	AVX512). This commit also removes "half-packet" from data-mappers: it was not used and conceptually broken anyways.
*	bug #1572: use c++11 atomic instead of volatile if c++11 is available, and ↵	Gael Guennebaud	2018-07-17
\| \| \| \|	disable multi-threaded GEMM on non-x86 without c++11.
*	bug #1562: optimize evaluation of small products of the form s*A*B by ↵	Gael Guennebaud	2018-07-02
\| \| \| \|	rewriting them as: s*(A.lazyProduct(B)) to save a costly temporary. Measured speedup from 2x to 5x...
*	Make the threshold from gemm to coeff-based-product configurable, and add ↵	Gael Guennebaud	2017-08-24
\| \| \| \|	some explanations.
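A configurable size threshold of this kind could be sketched as follows. This is an illustration under stated assumptions, not Eigen's implementation: the macro name `SKETCH_GEMM_TO_COEFFBASED_THRESHOLD`, its default value of 20, and the size measure (the sum of the result dimensions and the inner dimension) are all hypothetical choices made for the example.

```cpp
#include <cstddef>

// Hypothetical, user-overridable threshold: define it before this code to tune it.
#ifndef SKETCH_GEMM_TO_COEFFBASED_THRESHOLD
#define SKETCH_GEMM_TO_COEFFBASED_THRESHOLD 20
#endif

// Returns true when a (rows x depth) * (depth x cols) product is small enough
// that a coefficient-based product is assumed to be cheaper than blocked GEMM.
// An empty inner dimension is excluded so that it can be handled separately.
inline bool use_coeff_based_product(std::ptrdiff_t rows, std::ptrdiff_t cols,
                                    std::ptrdiff_t depth) {
  return depth > 0 && (rows + cols + depth) < SKETCH_GEMM_TO_COEFFBASED_THRESHOLD;
}
```

Under these assumed values, a 4x4 by 4x4 product falls below the threshold and an 8x8 by 8x8 product does not. Making the threshold a macro lets users adjust the crossover point for their own hardware without touching the dispatch logic.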
*	bug #1369: fix type mismatch warning.	Gael Guennebaud	2016-12-28
\| \| \| \| \|	Returned values of omp thread id and numbers are int, so let's use int instead of Index here.
*	Add a simple cost model to prevent Eigen's parallel GEMM from using too many ↵	Rasmus Munk Larsen	2016-10-06
\|	threads when the inner dimension is small.
\|	Timing for square matrices is unchanged, but both CPU and Wall time are significantly improved for skinny matrices. The benchmarks below are for multiplying NxK * KxN matrices with test names of the form BM_OuterishProd/N/K.
\|	Improvements in Wall time:
\|	Run on [redacted] (12 X 3501 MHz CPUs); 2016-10-05T17:40:02.462497196-07:00
\|	CPU: Intel Haswell with HyperThreading (6 cores) dL1:32KB dL2:256KB dL3:15MB
\|	Benchmark                  Base (ns)   New (ns)   Improvement
\|	------------------------------------------------------------------
\|	BM_OuterishProd/64/1            3088       1610   +47.9%
\|	BM_OuterishProd/64/4            3562       2414   +32.2%
\|	BM_OuterishProd/64/32           8861       7815   +11.8%
\|	BM_OuterishProd/128/1          11363       6504   +42.8%
\|	BM_OuterishProd/128/4          11128       9794   +12.0%
\|	BM_OuterishProd/128/64         27691      27396   +1.1%
\|	BM_OuterishProd/256/1          33214      28123   +15.3%
\|	BM_OuterishProd/256/4          34312      36818   -7.3%
\|	BM_OuterishProd/256/128       174866     176398   -0.9%
\|	BM_OuterishProd/512/1        7963684     104224   +98.7%
\|	BM_OuterishProd/512/4        7987913     112867   +98.6%
\|	BM_OuterishProd/512/256      8198378    1306500   +84.1%
\|	BM_OuterishProd/1k/1         7356256     324432   +95.6%
\|	BM_OuterishProd/1k/4         8129616     331621   +95.9%
\|	BM_OuterishProd/1k/512      27265418    7517538   +72.4%
\|	Improvements in CPU time:
\|	Run on [redacted] (12 X 3501 MHz CPUs); 2016-10-05T17:40:02.462497196-07:00
\|	CPU: Intel Haswell with HyperThreading (6 cores) dL1:32KB dL2:256KB dL3:15MB
\|	Benchmark                  Base (ns)   New (ns)   Improvement
\|	------------------------------------------------------------------
\|	BM_OuterishProd/64/1            6169       1608   +73.9%
\|	BM_OuterishProd/64/4            7117       2412   +66.1%
\|	BM_OuterishProd/64/32          17702      15616   +11.8%
\|	BM_OuterishProd/128/1          45415       6498   +85.7%
\|	BM_OuterishProd/128/4          44459       9786   +78.0%
\|	BM_OuterishProd/128/64        110657     109489   +1.1%
\|	BM_OuterishProd/256/1         265158      28101   +89.4%
\|	BM_OuterishProd/256/4         274234     183885   +32.9%
\|	BM_OuterishProd/256/128      1397160    1408776   -0.8%
\|	BM_OuterishProd/512/1       78947048     520703   +99.3%
\|	BM_OuterishProd/512/4       86955578    1349742   +98.4%
\|	BM_OuterishProd/512/256     74701613   15584661   +79.1%
\|	BM_OuterishProd/1k/1        78352601    3877911   +95.1%
\|	BM_OuterishProd/1k/4        78521643    3966221   +94.9%
\|	BM_OuterishProd/1k/512     258104736   89480530   +65.3%
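A cost model of the kind this entry describes might cap the thread count by the total amount of work, so that products with a small inner dimension do not get split across more threads than they can keep busy. The sketch below is an assumption-laden illustration rather than the committed code: the function name `choose_num_threads` and the constant `kMinWorkPerThread` (with its value) are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical cost model: estimate the work of a (rows x depth) * (depth x cols)
// product as rows*cols*depth and allow one thread per kMinWorkPerThread units,
// clamped to the range [1, max_threads].
inline int choose_num_threads(std::ptrdiff_t rows, std::ptrdiff_t cols,
                              std::ptrdiff_t depth, int max_threads) {
  const double kMinWorkPerThread = 50000.0;  // assumed tuning constant
  const double work = static_cast<double>(rows) * static_cast<double>(cols) *
                      static_cast<double>(depth);
  const int threads_by_work = static_cast<int>(work / kMinWorkPerThread);
  return std::max(1, std::min(max_threads, threads_by_work));
}
```

With these assumed numbers, a 64x1 by 1x64 product stays on a single thread, while a 1024x512 by 512x1024 product may use every available thread, which is in line with the observation above that square-matrix timings are unchanged.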
*	Relax mixing-type constraints for binary coefficient-wise operators:	Gael Guennebaud	2016-06-06
\|	- Replace internal::scalar_product_traits<A,B> by Eigen::ScalarBinaryOpTraits<A,B,OP>
\|	- Remove the "functor_is_product_like" helper (was pretty ugly)
\|	- Currently, OP is not used, but it is available to the user for fine grained tuning
\|	- Currently, only the following operators have been generalized: *,/,+,-,=,*=,/=,+=,-=
\|	- TODO: generalize all other binary operators (comparisons,pow,etc.)
\|	- TODO: handle "scalar op array" operators (currently only * is handled)
\|	- TODO: move the handling of the "void" scalar type to ScalarBinaryOpTraits
*	Introduce internal's UIntPtr and IntPtr types for pointer to integer ↵	Gael Guennebaud	2016-05-26
\| \| \| \| \| \| \| \|	conversions. This fixes "conversion from pointer to same-sized integral type" warnings by ICC. Ideally, we would use the std::[u]intptr_t types all the time, but since they are C99/C++11 only, let's be safe.
*	Re-enable blocking on rows in non-l3 blocking mode.	Gael Guennebaud	2016-01-26
\|
*	bug #1151: remove useless critical section	Gael Guennebaud	2016-01-21
\|
*	Fixes internal compiler error while compiling with VC2015 Update1 x64.	Nikolay Fedorov	2015-12-03
\|
*	Enable runtime stack alignment in gemm_blocking_space.	Gael Guennebaud	2015-08-06
\|
*	bug #973: update macro-level control of alignment by introducing ↵	Gael Guennebaud	2015-07-29
\| \| \| \|	user-controllable EIGEN_MAX_ALIGN_BYTES and EIGEN_MAX_STATIC_ALIGN_BYTES macros. This changeset also removes EIGEN_ALIGN (replaced by EIGEN_MAX_ALIGN_BYTES>0), EIGEN_ALIGN_STATICALLY (replaced by EIGEN_MAX_STATIC_ALIGN_BYTES>0), EIGEN_USER_ALIGN*, EIGEN_ALIGN_DEFAULT (replaced by EIGEN_ALIGN_MAX).
*	Remove a few deprecated internal expressions	Gael Guennebaud	2015-06-19
\|
*	bug #978: early return for vanishing products	Gael Guennebaud	2015-03-06
\|
*	Avoid packing rhs multiple-times when blocking on the lhs only.	Gael Guennebaud	2015-02-26
\|
*	Fix a regression when using OpenMP, and fix bug #714: the number of threads ↵	Gael Guennebaud	2015-02-18
\| \| \| \|	might be lower than the number of requested ones
*	The usage of DenseIndex is deprecated, so let's replace DenseIndex by Index	Gael Guennebaud	2015-02-16
\|
*	Remove deprecated usage of expr::Index.	Gael Guennebaud	2015-02-16
\|
*	Pulled the latest changes from the trunk	Benoit Steiner	2015-02-06
\|\
* \|	Made the blocking computation aware of the l3 cache	Benoit Steiner	2014-10-15
\| \| \| \| \| \| \| \|	Also optimized the blocking parameters to take into account the number of threads used for a computation
* \|	Generalized the gebp apis	Benoit Steiner	2014-10-02
\| \|
\| *	Remove deprecated code not used by evaluators	Gael Guennebaud	2014-09-18
\| \|
\| *	Backport changes from old to new expression engines	Gael Guennebaud	2014-06-20
\| \|
\| *	merge with default branch	Gael Guennebaud	2014-06-20
\| \|\ \| \|/ \|/\|
\| *	1- Introduce sub-evaluator types for unary, binary, product, and map ↵	Gael Guennebaud	2014-06-20
\| \| \| \| \| \| \| \| \| \| \| \|	expressions to ease specializing them. 2- Remove a lot of code which should not be there with evaluators, in particular coeff/packet methods implemented in the expressions.
* \|	Fix calls to lazy products (lazy product does not like matrices with 0 length)	Gael Guennebaud	2014-04-18
\| \|
* \|	Fixes for fixed sizes and non vectorizable types	Gael Guennebaud	2014-04-17
\| \|
* \|	Fallback to lazy products for very small ones.	Gael Guennebaud	2014-04-16
\| \|
* \|	New gebp kernel handling up to 3 packets x 4 register-level blocks. Huge ↵	Gael Guennebaud	2014-04-16
\| \| \| \| \| \| \| \| \| \| \| \|	speedup on Haswell. This changeset also introduces new vector functions: ploadquad and predux4.
* \|	Deleted some dead code.	Benoit Steiner	2014-04-04
\| \|
* \|	merge with default branch	Gael Guennebaud	2014-03-28
\|\ \
* \ \	Merged latest updates from the parent branch	Benoit Steiner	2014-03-26
\|\ \ \
\| \| * \|	Remove remaining bits of the dead working buffer	Gael Guennebaud	2014-03-26
\| \|/ /
\| \| *	Fix evaluators unit test (i.e., when only EIGEN_ENABLE_EVALUATORS is defined)	Gael Guennebaud	2014-03-10
\| \| \|
\| \| *	Get rid of GeneralProduct<> for GemmProduct	Gael Guennebaud	2014-02-21
\| \| \|
* \| \|	Reverted the definition of the EIGEN_ALIGN to its former meaning (i.e. a ↵	Benoit Steiner	2014-02-18
\| \| \|	boolean).
\| \| \|	Created a new EIGEN_ALIGN_BYTES define to encode how the data should be aligned.
\| \| \|	Fixed a few remaining alignment issues exposed when the Eigen code is compiled with avx enabled.
\| \| \|	Created a new EIGEN_ALIGN_DEFAULT define, which is set to the minimum alignment value required for the chosen instruction set. Use this value instead of EIGEN_ALIGN32 to preserve the existing alignment on SSE/Altivec/Neon.
* \| \|	Added support for AVX to Eigen.	Benoit Steiner	2014-01-29
\| \|/ \|/\|
\| *	Improved the efficiency of the block-panel matrix multiplication code: the ↵	Benoit Steiner	2014-01-02
\|/ \| \| \|	change reduces the pressure on the L1 cache by removing the calls to gebp_traits::unpackRhs(). Instead the packetization of the rhs blocks is done on the fly in gebp_traits::loadRhs(). This adds numerous calls to pset1<ResPacket> (since we're packetizing on the fly in the inner loop) but this is more than compensated by the fact that we're decreasing the memory transfers by a factor RhsPacketSize.
*	Fix "routine is both "inline" and "noinline"" warnings	Gael Guennebaud	2013-02-28
\|
*	bug #482: pass scalar arguments by const references. A few cases still remain ↵	Gael Guennebaud	2013-02-25
\| \| \| \|	that might affect the ABI (see the bug entry)
*	Automatic relicensing to MPL2 using Keir's script. Manual fixup follows.	Benoit Jacob	2012-07-13
\|