path: root/Eigen/src/Core
Commit message (Author, Date)
* Fully qualify Eigen::internal::aligned_free  (Sam Hasinoff, 2019-03-02)
  This helps avoid a conflict on certain Windows toolchains (potentially due to some ADL name resolution bug) in the case where aligned_free is defined in the global namespace. In any case, tightening this up is harmless.
* fix alignment in ploadquad  (Gael Guennebaud, 2019-02-22)
* AVX512: implement faster ploadquad<Packet16f>, thus speeding up GEMM  (Gael Guennebaud, 2019-02-21)
* bug #1674: workaround clang fast-math aggressive optimizations  (Gael Guennebaud, 2019-02-22)
* Fix compilation on ARM.  (Gael Guennebaud, 2019-02-22)
* Speed up col/row-wise reverse for fixed size matrices by propagating compile-time sizes.  (Gael Guennebaud, 2019-02-21)
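  A minimal illustration of the kind of expression the entry above refers to; the concrete matrix type and sizes here are assumed for the example and are not taken from the commit:

    #include <Eigen/Core>

    int main() {
      // Fixed-size matrix: the reverse expression can now keep the compile-time
      // sizes instead of treating them as runtime values.
      Eigen::Matrix3f m = Eigen::Matrix3f::Random();
      Eigen::Matrix3f r = m.rowwise().reverse();  // reverse the coefficients within each row
      return r.size() == 9 ? 0 : 1;
    }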
* Add a few missing packet ops: cmp_eq for NEON. pfloor for GPU.  (Rasmus Munk Larsen, 2019-02-21)
* Add fully generic Vector<Type,Size> and RowVector<Type,Size> type aliases.  (Gael Guennebaud, 2019-02-20)
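  A minimal usage sketch of the new aliases (requires C++11 alias-template support; the concrete scalar type and size are assumed for illustration):

    #include <Eigen/Core>

    int main() {
      Eigen::Vector<float, 3>    v;  // equivalent to Eigen::Matrix<float, 3, 1>
      Eigen::RowVector<float, 3> r;  // equivalent to Eigen::Matrix<float, 1, 3>
      v.setOnes();
      r.setZero();
      return 0;
    }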
* Update documentation of Matrix and Array type aliases.  (Gael Guennebaud, 2019-02-20)
* Protect c++11 type alias with Eigen's macro, and add respective unit test.  (Gael Guennebaud, 2019-02-20)
* Merged in ra_bauke/eigen (pull request PR-180)  (Gael Guennebaud, 2019-02-20)
  Alias template for matrix and array classes, see also bug #864.
  Approved-by: Heiko Bauke <heiko.bauke@mail.de>
* bug #1409: make EIGEN_MAKE_ALIGNED_OPERATOR_NEW* macros empty in c++17 mode:  (Gael Guennebaud, 2019-02-20)
  - this helps clang 5 and 6 to support alignas in STL's containers.
  - this makes the public API of our (and users') classes cleaner.
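  A minimal sketch of the pattern this affects (the Pose struct is assumed for illustration): before C++17, a class holding a fixed-size vectorizable Eigen member needs the macro so that new returns suitably aligned memory; in c++17 mode the macro can be empty because aligned operator new handles over-aligned types natively.

    #include <Eigen/Core>

    struct Pose {
      Eigen::Vector4f position;          // 16-byte-aligned, fixed-size vectorizable member
      EIGEN_MAKE_ALIGNED_OPERATOR_NEW    // empty in c++17 mode after this change
    };

    int main() {
      Pose* p = new Pose;                // correctly aligned with or without the macro under C++17
      delete p;
      return 0;
    }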
* Commas at the end of enumerator lists are not allowed in C++03  (Christoph Hertzberg, 2019-02-19)
* Add C++17 detection macro, and make sure throw(xpr) is not used if the compiler is in c++17 mode.  (Gael Guennebaud, 2019-02-19)
* Fix harmless Scalar vs RealScalar cast.  (Gael Guennebaud, 2019-02-18)
* Fix regression: .conjugate() was popped out but not re-introduced.  (Gael Guennebaud, 2019-02-18)
* Set cost of conjugate to 0 (in practice it boils down to a no-op).  (Gael Guennebaud, 2019-02-18)
  This is also important to make sure that A.conjugate() * B.conjugate() does not evaluate its arguments into temporaries (e.g., if A and B are fixed and small, or if operator* falls back to lazyProduct).
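  A minimal illustration of the expression mentioned above (the 2x2 complex matrices are assumed for the example): with the conjugate cost set to 0, this small fixed-size product can stay lazy instead of first evaluating the two conjugates into temporaries.

    #include <Eigen/Core>

    int main() {
      Eigen::Matrix2cf A = Eigen::Matrix2cf::Random();
      Eigen::Matrix2cf B = Eigen::Matrix2cf::Random();
      Eigen::Matrix2cf C = A.conjugate() * B.conjugate();  // arguments are not materialized first
      return C.size() == 4 ? 0 : 1;
    }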
* GEMM: catch all scalar-multiple variants when falling back to a coeff-based product.  (Gael Guennebaud, 2019-02-18)
  Before, only s*A*B was caught, which was inconsistent with GEMM, sub-optimal, and could even lead to compilation errors (https://stackoverflow.com/questions/54738495).
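  A sketch of the kind of scalar-multiple variants meant here (the exact set of expressions below is an assumed illustration; per the message above, previously only the s*A*B form was recognized):

    #include <Eigen/Core>

    int main() {
      Eigen::Matrix2d A = Eigen::Matrix2d::Random();
      Eigen::Matrix2d B = Eigen::Matrix2d::Random();
      Eigen::Matrix2d C1 = 2.0 * A * B;    // the form that was already handled
      Eigen::Matrix2d C2 = (2.0 * A) * B;  // scalar folded into the left factor
      Eigen::Matrix2d C3 = A * (2.0 * B);  // scalar folded into the right factor
      return (C1.isApprox(C2) && C1.isApprox(C3)) ? 0 : 1;
    }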
* Guard C++11-style default constructor. Also, this is only needed for MSVC  (Christoph Hertzberg, 2019-02-16)
* bug #1680: improve MSVC inlining by declaring many trivial constructors and accessors as STRONG_INLINE.  (Gael Guennebaud, 2019-02-15)
* bug #1678: Fix lack of __FMA__ macro on MSVC with AVX512  (Gael Guennebaud, 2019-02-15)
* bug #1678: workaround MSVC compilation issues with AVX512  (Gael Guennebaud, 2019-02-15)
* Revert https://bitbucket.org/eigen/eigen/commits/b55b5c7280a0481f01fe5ec764d55c443a8b6496  (Rasmus Munk Larsen, 2019-02-14)
* Fix compilation of empty products of the form: Mx0 * 0xN  (Gael Guennebaud, 2019-02-11)
* Make GEMM fall back to GEMV for runtime vectors.  (Gael Guennebaud, 2019-02-07)
  This is a more general and simpler version of changeset 4c0fa6ce0f81ce67dd6723528ddf72f66ae92ba2.
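  A minimal sketch of the case this targets (the sizes are assumed for illustration): the operands are dynamically sized, and only at runtime does one factor turn out to be a vector, so the product can take the matrix-vector (GEMV) path instead of the general GEMM path.

    #include <Eigen/Core>

    int main() {
      Eigen::MatrixXd A = Eigen::MatrixXd::Random(100, 100);
      Eigen::MatrixXd x = Eigen::MatrixXd::Random(100, 1);  // a vector, but statically typed as a matrix
      Eigen::MatrixXd c(100, 1);
      c.noalias() = A * x;  // with this change, dispatched to GEMV at runtime
      return 0;
    }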
* Backed out changeset 4c0fa6ce0f81ce67dd6723528ddf72f66ae92ba2  (Gael Guennebaud, 2019-02-07)
* bug #1676: workaround GCC's bug in c++17 mode.  (Gael Guennebaud, 2019-02-07)
* Remove duplicated comment line  (Eugene Zhulenev, 2019-02-04)
* Fix GeneralBlockPanelKernel Android compilation  (Eugene Zhulenev, 2019-02-04)
* bug #1674: disable GCC's unsafe-math-optimizations in sin/cos vectorization (results are completely wrong otherwise)  (Gael Guennebaud, 2019-02-03)
* Merged in rmlarsen/eigen (pull request PR-578)  (Rasmus Larsen, 2019-02-02)
  Speed up Eigen matrix*vector and vector*matrix multiplication.
  Approved-by: Eugene Zhulenev <ezhulenev@google.com>
* Speed up row-major matrix-vector product on ARM  (Sameer Agarwal, 2019-02-01)
  The row-major matrix-vector multiplication code uses a threshold to check if processing 8 rows at a time would thrash the cache. This change introduces two modifications to this logic.
  1. A smaller threshold for ARM and ARM64 devices. The value of this threshold was determined empirically using a Pixel2 phone, by benchmarking a large number of matrix-vector products in the range [1..4096]x[1..4096] and measuring performance separately on big and little cores with frequency pinning. On big (out-of-order) cores, this change has little to no impact. But on the small (in-order) cores, the matrix-vector products are up to 700% faster, especially on large matrices. The motivation for this change was some internal code at Google which was using hand-written NEON for implementing similar functionality, processing the matrix one row at a time, which exhibited substantially better performance than Eigen. With the current change, Eigen handily beats that code.
  2. Make the logic for choosing the number of simultaneous rows apply uniformly to 8, 4 and 2 rows instead of just 8 rows.
  Since the default threshold for non-ARM devices is essentially unchanged (32000 -> 32 * 1024), this change has no impact on non-ARM performance. This was verified by running the same set of benchmarks on a Xeon desktop.
* Speed up Eigen matrix*vector and vector*matrix multiplication.  (Rasmus Munk Larsen, 2019-01-31)
  This change speeds up Eigen matrix * vector and vector * matrix multiplication for dynamic matrices when it is known at runtime that one of the factors is a vector. The benchmarks below test
    c.noalias() = n_by_n_matrix * n_by_1_matrix;
    c.noalias() = 1_by_n_matrix * n_by_n_matrix;
  respectively.

  Benchmark measurements:

  SSE:
  Run on *** (72 X 2992 MHz CPUs); 2019-01-28T17:51:44.452697457-08:00
  CPU: Intel Skylake Xeon with HyperThreading (36 cores) dL1:32KB dL2:1024KB dL3:24MB
  Benchmark        Base (ns)   New (ns)  Improvement
  ---------------------------------------------------
  BM_MatVec/64          1096        312       +71.5%
  BM_MatVec/128         4581       1464       +68.0%
  BM_MatVec/256        18534       5710       +69.2%
  BM_MatVec/512       118083      24162       +79.5%
  BM_MatVec/1k        704106     173346       +75.4%
  BM_MatVec/2k       3080828     742728       +75.9%
  BM_MatVec/4k      25421512    4530117       +82.2%
  BM_VecMat/32           352        130       +63.1%
  BM_VecMat/64          1213        425       +65.0%
  BM_VecMat/128         4640       1564       +66.3%
  BM_VecMat/256        17902       5884       +67.1%
  BM_VecMat/512        70466      24000       +65.9%
  BM_VecMat/1k        340150     161263       +52.6%
  BM_VecMat/2k       1420590     645576       +54.6%
  BM_VecMat/4k       8083859    4364327       +46.0%

  AVX2:
  Run on *** (72 X 2993 MHz CPUs); 2019-01-28T17:45:11.508545307-08:00
  CPU: Intel Skylake Xeon with HyperThreading (36 cores) dL1:32KB dL2:1024KB dL3:24MB
  Benchmark        Base (ns)   New (ns)  Improvement
  ---------------------------------------------------
  BM_MatVec/64           619        120       +80.6%
  BM_MatVec/128         9693        752       +92.2%
  BM_MatVec/256        38356       2773       +92.8%
  BM_MatVec/512        69006      12803       +81.4%
  BM_MatVec/1k        443810     160378       +63.9%
  BM_MatVec/2k       2633553     646594       +75.4%
  BM_MatVec/4k      16211095    4327148       +73.3%
  BM_VecMat/64           925        227       +75.5%
  BM_VecMat/128         3438        830       +75.9%
  BM_VecMat/256        13427       2936       +78.1%
  BM_VecMat/512        53944      12473       +76.9%
  BM_VecMat/1k        302264     157076       +48.0%
  BM_VecMat/2k       1396811     675778       +51.6%
  BM_VecMat/4k       8962246    4459010       +50.2%

  AVX512:
  Run on *** (72 X 2993 MHz CPUs); 2019-01-28T17:35:17.239329863-08:00
  CPU: Intel Skylake Xeon with HyperThreading (36 cores) dL1:32KB dL2:1024KB dL3:24MB
  Benchmark        Base (ns)   New (ns)  Improvement
  ---------------------------------------------------
  BM_MatVec/64           401        111       +72.3%
  BM_MatVec/128         1846        513       +72.2%
  BM_MatVec/256        36739       1927       +94.8%
  BM_MatVec/512        54490       9227       +83.1%
  BM_MatVec/1k        487374     161457       +66.9%
  BM_MatVec/2k       2016270     643824       +68.1%
  BM_MatVec/4k      13204300    4077412       +69.1%
  BM_VecMat/32           324        106       +67.3%
  BM_VecMat/64          1034        246       +76.2%
  BM_VecMat/128         3576        802       +77.6%
  BM_VecMat/256        13411       2561       +80.9%
  BM_VecMat/512        58686      10037       +82.9%
  BM_VecMat/1k        320862     163750       +49.0%
  BM_VecMat/2k       1406719     651397       +53.7%
  BM_VecMat/4k       7785179    4124677       +47.0%
* GEBP: improves pipelining in the 1pX4 path with FMA.  (Gael Guennebaud, 2019-01-30)
  Prior to this change, a product with a LHS having 8 rows was faster with AVX-only than with AVX+FMA. With AVX+FMA I measured a speed up of about x1.25 in such cases.
* Fix compilation with ARM64.  (Gael Guennebaud, 2019-01-30)
* Fix conflicts and merge  (Gael Guennebaud, 2019-01-30)
* According to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89101, the previous GCC issue is fixed in GCC trunk (will be gcc 9).  (Gael Guennebaud, 2019-01-30)
* ARM64 & GEBP: add specialization for double, +30% speed up  (Gael Guennebaud, 2019-01-30)
* ARM64 & GEBP: Make use of vfmaq_laneq_f32 and workaround GCC's issue in generating good ASM  (Gael Guennebaud, 2019-01-30)
* Renaming some more `I` identifiers  (Christoph Hertzberg, 2019-01-26)
* Fix compilation error in NEON GEBP specialization of madd.  (Rasmus Munk Larsen, 2019-01-25)
* cleanup  (Gael Guennebaud, 2019-01-24)
* PR 574: use variadic template instead of initializer_list to implement fixed-size vector ctor from coefficients.  (David Tellenbach, 2019-01-23)
* Cleanup SFINAE in Array/Matrix(initializer_list) ctors and minor doc editing.  (Gael Guennebaud, 2019-01-22)
* PR 572: Add initializer list constructors to Matrix and Array (include unit tests and doc)  (David Tellenbach, 2019-01-21)
  - {1,2,3,4,5,...} for fixed-size vectors only
  - {{1,2,3},{4,5,6}} for the general cases
  - {{1,2,3,4,5,....}} is allowed for both row and column vectors
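  A minimal sketch of the three forms listed above (the concrete scalar types and values are assumed for illustration):

    #include <Eigen/Core>

    int main() {
      Eigen::Vector3i    v {1, 2, 3};               // flat list: fixed-size vectors only
      Eigen::MatrixXd    m {{1, 2, 3}, {4, 5, 6}};  // nested lists: the general case (here a 2x3 matrix)
      Eigen::RowVector3d r {{1.0, 2.0, 3.0}};       // a single nested list works for row vectors...
      Eigen::Vector3d    c {{1.0, 2.0, 3.0}};       // ...and for column vectors as well
      return (v.size() == 3 && m.size() == 6) ? 0 : 1;
    }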
* Replace host_define.h with cuda_runtime_api.h  (nluehr, 2019-01-18)
* Mask unused-parameter warnings, when building with NDEBUG  (Christoph Hertzberg, 2019-01-18)
* Add missing logical packet ops for GPU and NEON.  (Rasmus Munk Larsen, 2019-01-17)
* Remove some useless const_cast  (Gael Guennebaud, 2019-01-17)
* Make nestByValue work again (broken since 3.3) and add unit tests.  (Gael Guennebaud, 2019-01-17)