eigen - C++ library for linear algebra

	Commit message (Collapse)	Author	Age
*	Unify Altivec/VSX's pexp with generic implementation	Gael Guennebaud	2018-11-26
\|
*	Unify SSE and AVX implementation of pexp	Gael Guennebaud	2018-11-26
\|
*	Unify Altivec/VSX's plog with generic implementation, and enable it!	Gael Guennebaud	2018-11-26
\|
*	Unify NEON's plog with generic implementation	Gael Guennebaud	2018-11-26
\|
*	First step toward a unification of packet log implementation, currently only ↵	Gael Guennebaud	2018-11-26
\|	SSE and AVX are unified. To this end, I added the following functions: pzero, pcmp_*, pfrexp, pset1frombits.
*	Make SSE/AVX pandnot(A,B) consistent with generic version, i.e., "A and not B"	Gael Guennebaud	2018-11-26
\|
*	Extend unit test to recursively check half-packet types and non packet types	Gael Guennebaud	2018-11-26
\|
*	bug #1611: fix plog(0) on NEON	Gael Guennebaud	2018-11-26
\|
*	Fix typos	Patrik Huber	2018-11-23
\|
*	merge	Gael Guennebaud	2018-11-23
\|\
* \|	Fix reserved usage of double __ in macro names	Gael Guennebaud	2018-11-23
\| \|
* \|	check two ctors	Gael Guennebaud	2018-11-23
\| \|
* \|	Fix double = bool !	Gael Guennebaud	2018-11-23
\| \|
* \|	Fix several uninitialized members from ctor	Gael Guennebaud	2018-11-23
\| \|
\| *	Add default constructor to Bar to make test compile again with clang-3.8	Christoph Hertzberg	2018-11-23
\| \|
\| *	Small typo found by Patrik Huber (pull request PR-547)	Christoph Hertzberg	2018-11-23
\|/
*	bug #1624: improve matrix-matrix product on ARM 64, 20% speedup	Gael Guennebaud	2018-11-23
\|
*	Move regression test to right unit test file	Gael Guennebaud	2018-11-21
\|
*	Workaround weird MSVC bug	Gael Guennebaud	2018-11-21
\|
*	Fixed most conversion warnings in MatrixFunctions module	Christoph Hertzberg	2018-11-20
\|
*	Make MaxPacketSize a true upper bound, even for fixed-size inputs	Gael Guennebaud	2018-11-16
\|
*	Add explicit regression test for bug #1622	Gael Guennebaud	2018-11-16
\|
*	PR 544: Set requestedAlignment correctly for SliceVectorizedTraversals	Mark D Ryan	2018-11-13
\|	Commit aa110e681b8b2237757a652ba47da49e1fbd2cd6 optimised the multiplication of small dynamically sized matrices by restricting the packet size to a maximum of 4, increasing the chances that SIMD instructions are used in the computation. However, it introduced a mismatch between the packet size and the requestedAlignment. This mismatch can lead to crashes when the destination is not aligned. This patch fixes the issue by ensuring that the AssignmentTraits are correctly computed when using a restricted packet size.
\|	* * *
\|	Bind LinearPacketType to MaxPacketSize
\|	This commit applies any packet size limit specified when instantiating copy_using_evaluator_traits to the LinearPacketType, provided that the size of the destination is not known at compile time.
\|	* * *
\|	Add unit test for restricted packet assignment
\|	A new unit test is added to check that multiplication of small dynamically sized matrices works correctly when the packet size is restricted to 4 and the destination is unaligned.
*	Fix typo in comment on EIGEN_MAX_STATIC_ALIGN_BYTES	Nikolaus Demmel	2018-11-14
\|
*	typo	Gael Guennebaud	2018-11-14
\|
*	help doxygen linking to DenseBase::NullaryExpr	Gael Guennebaud	2018-11-14
\|
*	Improve doc on multi-threading and warn about hyper-threading	Gael Guennebaud	2018-11-14
\|
*	doxygen does not like \addtogroup and \ingroup in the same line	Gael Guennebaud	2018-11-14
\|
*	Merged in rmlarsen/eigen2 (pull request PR-543)	Rasmus Munk Larsen	2018-11-13
\|\	Add parallel memcpy to TensorThreadPoolDevice in Eigen, but limit the number of threads to 4, beyond which we just seem to be wasting CPU cycles as the threads contend for memory bandwidth. Approved-by: Eugene Zhulenev <ezhulenev@google.com>
\| *	Remove accidental changes.	Rasmus Munk Larsen	2018-11-12
\| \|
\| *	Add parallel memcpy to TensorThreadPoolDevice in Eigen, but limit the number ↵	Rasmus Munk Larsen	2018-11-12
\| \|	of threads to 4, beyond which we just seem to be wasting CPU cycles as the threads contend for memory bandwidth.
* \|	[PATCH 1/2] Misc. typos	luz.paz	2018-09-18
\|/	From 68d431b4c14ad60a778ee93c1f59ecc4b931950e Mon Sep 17 00:00:00 2001
\|	Found via `codespell -q 3 -I ../eigen-word-whitelist.txt`, where the whitelist consists of:
\|	```
\|	als ans cas dum lastr lowd nd overfl pres preverse substraction te uint whch
\|	```
\|	---
\|	CMakeLists.txt \| 26 +++++++++----------
\|	Eigen/src/Core/GenericPacketMath.h \| 2 +-
\|	Eigen/src/SparseLU/SparseLU.h \| 2 +-
\|	bench/bench_norm.cpp \| 2 +-
\|	doc/HiPerformance.dox \| 2 +-
\|	doc/QuickStartGuide.dox \| 2 +-
\|	.../Eigen/CXX11/src/Tensor/TensorChipping.h \| 6 ++---
\|	.../Eigen/CXX11/src/Tensor/TensorDeviceGpu.h \| 2 +-
\|	.../src/Tensor/TensorForwardDeclarations.h \| 4 +--
\|	.../src/Tensor/TensorGpuHipCudaDefines.h \| 2 +-
\|	.../Eigen/CXX11/src/Tensor/TensorReduction.h \| 2 +-
\|	.../CXX11/src/Tensor/TensorReductionGpu.h \| 2 +-
\|	.../test/cxx11_tensor_concatenation.cpp \| 2 +-
\|	unsupported/test/cxx11_tensor_executor.cpp \| 2 +-
\|	14 files changed, 29 insertions(+), 29 deletions(-)
*	Add optimized version of logistic function for float. As an example, this is ↵	Rasmus Munk Larsen	2018-11-12
\|	about 50% faster than the existing version on Haswell using AVX.
*	Add manual doc on STL-compatible iterators	Gael Guennebaud	2018-11-12
\|
*	Fix warning in c++03	Gael Guennebaud	2018-11-10
\|
*	A few small fixes to a) prevent throwing in ctors and dtors of the threading ↵	Rasmus Munk Larsen	2018-11-09
\|	code, and b) support matrix exponential on platforms with 113 bits of mantissa for long doubles.
*	bug #1619: fix mixing of const and non-const generic iterators	Gael Guennebaud	2018-11-09
\|
*	bug #1619: make const and non-const iterators compatible	Gael Guennebaud	2018-11-09
\|
*	add missing ref to a.zeta(b)	Gael Guennebaud	2018-11-09
\|
*	Limit the size of the toc	Gael Guennebaud	2018-11-09
\|
*	Update doxy hacks wrt doxygen 1.8.13/14	Gael Guennebaud	2018-11-09
\|
*	Let doxygen see lastN	Gael Guennebaud	2018-11-09
\|
*	Add and update manual pages for slicing, indexing, and reshaping.	Gael Guennebaud	2018-11-09
\|
*	Recent xcode versions do support EIGEN_HAS_STATIC_ARRAY_TEMPLATE	Gael Guennebaud	2018-11-09
\|
*	Fix max-size in indexed-view	Gael Guennebaud	2018-11-08
\|
*	Merged in glchaves/eigen (pull request PR-539)	Gael Guennebaud	2018-11-07
\|\	Vectorize row-by-row gebp loop iterations on 16 packets as well
* \|	Add option to disable plot generation	Gael Guennebaud	2018-11-07
\| \|
\| *	Vectorize row-by-row gebp loop iterations on 16 packets as well	Gustavo Lima Chaves	2018-11-06
\| \|	Signed-off-by: Gustavo Lima Chaves <gustavo.lima.chaves@intel.com> Signed-off-by: Mark D. Ryan <mark.d.ryan@intel.com>
* \|	PR 526: Speed up multiplication of small, dynamically sized matrices	Mark D Ryan	2018-10-12
\| \|	The Packet16f, Packet8f and Packet8d types are too large to use with dynamically sized matrices typically processed by the SliceVectorizedTraversal specialization of the dense_assignment_loop. Using these types is likely to lead to little or no vectorization. Significant slowdown in the multiplication of these small matrices can be observed when building with AVX and AVX512 enabled.
\| \|	This patch introduces a new dense_assignment_kernel that is used when computing small products whose operands have dynamic dimensions. It ensures that the PacketSize used is no larger than 4, thereby increasing the chance that vectorized instructions will be used when computing the product.
\| \|	I tested all 969 possible combinations of M, K, and N that are handled by the dense_assignment_loop on x86 builds. Although a few combinations are slowed down by this patch, they are far outnumbered by the cases that are sped up, as the following results demonstrate.
\| \|
\| \|	Disabling Packet8d on AVX512 builds:
\| \|	Total Cases: 969
\| \|	Better: 511
\| \|	Worse: 85
\| \|	Same: 373
\| \|	Max Improvement: 169.00% (4 8 6)
\| \|	Max Degradation: 36.50% (8 5 3)
\| \|	Median Improvement: 35.46%
\| \|	Median Degradation: 17.41%
\| \|	Total FLOPs Improvement: 19.42%
\| \|
\| \|	Disabling Packet16f and Packet8f on AVX512 builds:
\| \|	Total Cases: 969
\| \|	Better: 658
\| \|	Worse: 5
\| \|	Same: 306
\| \|	Max Improvement: 214.05% (8 6 5)
\| \|	Max Degradation: 22.26% (16 2 1)
\| \|	Median Improvement: 60.05%
\| \|	Median Degradation: 13.32%
\| \|	Total FLOPs Improvement: 59.58%
\| \|
\| \|	Disabling Packet8f on AVX builds:
\| \|	Total Cases: 969
\| \|	Better: 663
\| \|	Worse: 96
\| \|	Same: 210
\| \|	Max Improvement: 155.29% (4 10 5)
\| \|	Max Degradation: 35.12% (8 3 2)
\| \|	Median Improvement: 34.28%
\| \|	Median Degradation: 15.05%
\| \|	Total FLOPs Improvement: 26.02%
* \|	Fix code format	Eugene Zhulenev	2018-11-02
\| \|