eigen - C++ library for linear algebra

	Commit message	Author	Age
*	Add optimized version of logistic function for float. As an example, this is ↵	Rasmus Munk Larsen	2018-11-12
\|	about 50% faster than the existing version on Haswell using AVX.
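For reference, the logistic function mentioned in the commit above is 1 / (1 + exp(-x)). The following is a minimal scalar sketch of that definition only; it is not the vectorized implementation the commit adds, and the name `logistic_ref` is a hypothetical helper chosen for illustration.

```cpp
#include <cmath>

// Scalar reference definition: logistic(x) = 1 / (1 + exp(-x)).
// Hypothetical helper name; this is NOT Eigen's optimized packet kernel.
inline float logistic_ref(float x) {
    return 1.0f / (1.0f + std::exp(-x));
}
```

A scalar reference like this is typically what an optimized version is checked against, for example by comparing outputs over a range of inputs within a small tolerance.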
*	Add manual doc on STL-compatible iterators	Gael Guennebaud	2018-11-12
\|
*	Fix warning in c++03	Gael Guennebaud	2018-11-10
\|
*	A few small fixes to a) prevent throwing in ctors and dtors of the threading ↵	Rasmus Munk Larsen	2018-11-09
\|	code, and b) support matrix exponential on platforms with 113 bits of mantissa for long doubles.
*	bug #1619: fix mixing of const and non-const generic iterators	Gael Guennebaud	2018-11-09
\|
*	bug #1619: make const and non-const iterators compatible	Gael Guennebaud	2018-11-09
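The two bug #1619 commits concern mixing const and non-const iterators. A common way to make such iterators interoperate, sketched here under the assumption of a simple pointer-based iterator (the class `basic_iter` is hypothetical and is not Eigen's iterator type), is to allow a one-way conversion from mutable to const and to define comparisons across both flavours.

```cpp
#include <type_traits>

// Hypothetical pointer-based iterator; IsConst selects const or mutable access.
template <typename T, bool IsConst>
class basic_iter {
public:
    using pointer = typename std::conditional<IsConst, const T*, T*>::type;
    using reference = typename std::conditional<IsConst, const T&, T&>::type;

    explicit basic_iter(pointer p) : p_(p) {}

    // Converting constructor: enabled only for mutable -> const.
    template <bool OtherConst,
              typename = typename std::enable_if<IsConst && !OtherConst>::type>
    basic_iter(const basic_iter<T, OtherConst>& other) : p_(other.base()) {}

    reference operator*() const { return *p_; }
    basic_iter& operator++() { ++p_; return *this; }
    pointer base() const { return p_; }

private:
    pointer p_;
};

// Comparisons accept any mix of const and non-const iterators.
template <typename T, bool A, bool B>
bool operator==(const basic_iter<T, A>& a, const basic_iter<T, B>& b) {
    return a.base() == b.base();
}

template <typename T, bool A, bool B>
bool operator!=(const basic_iter<T, A>& a, const basic_iter<T, B>& b) {
    return !(a == b);
}
```

With this shape, `begin() != cend()` style loops compile, while converting a const iterator back to a mutable one is rejected at compile time.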
\|
*	add missing ref to a.zeta(b)	Gael Guennebaud	2018-11-09
\|
*	Limit the size of the toc	Gael Guennebaud	2018-11-09
\|
*	Update doxy hacks wrt doxygen 1.8.13/14	Gael Guennebaud	2018-11-09
\|
*	Let doxygen see lastN	Gael Guennebaud	2018-11-09
\|
*	Add and update manual pages for slicing, indexing, and reshaping.	Gael Guennebaud	2018-11-09
\|
*	Recent Xcode versions do support EIGEN_HAS_STATIC_ARRAY_TEMPLATE	Gael Guennebaud	2018-11-09
\|
*	Fix max-size in indexed-view	Gael Guennebaud	2018-11-08
\|
*	Merged in glchaves/eigen (pull request PR-539)	Gael Guennebaud	2018-11-07
\|\ \| \| \| \| \| \|	Vectorize row-by-row gebp loop iterations on 16 packets as well
* \|	Add option to disable plot generation	Gael Guennebaud	2018-11-07
\| \|
\| *	Vectorize row-by-row gebp loop iterations on 16 packets as well	Gustavo Lima Chaves	2018-11-06
\| \|	Signed-off-by: Gustavo Lima Chaves <gustavo.lima.chaves@intel.com>
\| \|	Signed-off-by: Mark D. Ryan <mark.d.ryan@intel.com>
* \|	PR 526: Speed up multiplication of small, dynamically sized matrices	Mark D Ryan	2018-10-12
\| \|	The Packet16f, Packet8f and Packet8d types are too large to use with the dynamically sized matrices typically processed by the SliceVectorizedTraversal specialization of the dense_assignment_loop. Using these types is likely to lead to little or no vectorization. Significant slowdown in the multiplication of these small matrices can be observed when building with AVX and AVX512 enabled.
\| \|	This patch introduces a new dense_assignment_kernel that is used when computing small products whose operands have dynamic dimensions. It ensures that the PacketSize used is no larger than 4, thereby increasing the chance that vectorized instructions will be used when computing the product.
\| \|	I tested all 969 possible combinations of M, K, and N that are handled by the dense_assignment_loop on x86 builds. Although a few combinations are slowed down by this patch, they are far outnumbered by the cases that are sped up, as the following results demonstrate.
\| \|	Disabling Packet8d on AVX512 builds:
\| \|	  Total Cases: 969
\| \|	  Better: 511
\| \|	  Worse: 85
\| \|	  Same: 373
\| \|	  Max Improvement: 169.00% (4 8 6)
\| \|	  Max Degradation: 36.50% (8 5 3)
\| \|	  Median Improvement: 35.46%
\| \|	  Median Degradation: 17.41%
\| \|	  Total FLOPs Improvement: 19.42%
\| \|	Disabling Packet16f and Packet8f on AVX512 builds:
\| \|	  Total Cases: 969
\| \|	  Better: 658
\| \|	  Worse: 5
\| \|	  Same: 306
\| \|	  Max Improvement: 214.05% (8 6 5)
\| \|	  Max Degradation: 22.26% (16 2 1)
\| \|	  Median Improvement: 60.05%
\| \|	  Median Degradation: 13.32%
\| \|	  Total FLOPs Improvement: 59.58%
\| \|	Disabling Packet8f on AVX builds:
\| \|	  Total Cases: 969
\| \|	  Better: 663
\| \|	  Worse: 96
\| \|	  Same: 210
\| \|	  Max Improvement: 155.29% (4 10 5)
\| \|	  Max Degradation: 35.12% (8 3 2)
\| \|	  Median Improvement: 34.28%
\| \|	  Median Degradation: 15.05%
\| \|	  Total FLOPs Improvement: 26.02%
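The packet-size argument in the PR 526 message can be illustrated with simple arithmetic. The sketch below uses a deliberately simplified model, assuming that within one contiguous run of coefficients only whole packets are vectorized and that alignment is ignored; the function name `vectorized_coeffs` is hypothetical and does not appear in Eigen.

```cpp
#include <cstddef>

// Simplified model: number of coefficients in a contiguous run of `inner`
// coefficients that can be covered by whole packets of `packet_size`
// coefficients. Alignment effects are ignored (an assumption of this sketch).
constexpr std::size_t vectorized_coeffs(std::size_t inner, std::size_t packet_size) {
    return (inner / packet_size) * packet_size;
}
```

Under this model a run of 6 floats gets no vectorized coefficients with packets of 16 or 8, but 4 of its 6 coefficients are covered with packets of 4, which is consistent with the motivation for capping PacketSize at 4 for small dynamic products.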
* \|	Fix code format	Eugene Zhulenev	2018-11-02
\| \|
* \|	Workaround nbcc+msvc compiler bug	Eugene Zhulenev	2018-11-02
\|/
*	add unit tests for bug #1619	Gael Guennebaud	2018-11-01
\|
*	bug #1617: Fix SolveTriangular.solveInPlace crashing for empty matrix.	Matthieu Vigne	2018-10-31
\|	This made FullPivLU.kernel() crash when used on the zero matrix. Add unit test for FullPivLU.kernel() on the zero matrix.
*	bug #1618: Use different power-of-2 check to avoid MSVC warning	Christoph Hertzberg	2018-11-01
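The commit message for bug #1618 does not say which expressions were swapped. As a hedged illustration only, one widely used power-of-two test relies on subtraction and bitwise AND, and so avoids applying unary minus to an unsigned value (a pattern MSVC reports as warning C4146). The function name `is_power_of_two` is hypothetical.

```cpp
#include <cstddef>

// True when x is a power of two; zero is rejected explicitly.
// Uses only subtraction and bitwise AND, so no unary minus is applied
// to an unsigned operand.
constexpr bool is_power_of_two(std::size_t x) {
    return x != 0 && (x & (x - 1)) == 0;
}
```

The test works because a power of two has exactly one bit set, and subtracting 1 clears that bit while setting all lower ones.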
\|
*	Merged in ezhulenev/eigen-02 (pull request PR-534)	Rasmus Munk Larsen	2018-10-25
\|\ \| \| \| \| \| \|	Fix cxx11_tensor_{block_access, reduction} tests
\| *	Fix cxx11_tensor_{block_access, reduction} tests	Eugene Zhulenev	2018-10-25
\| \|
* \|	Fix typo in tutorial documentation.	Halie Murray-Davis	2018-10-25
\| \|
* \|	Document EIGEN_NO_IO preprocessor directive	Christoph Hertzberg	2018-10-25
\| \|
* \|	Collapsed revision (based on pull request PR-325)	Christian von Schultz	2018-10-22
\| \|	* Support compiling without IO streams
\| \|	Add the preprocessor definition EIGEN_NO_IO which, if defined, disables all use of the IO streams part of the standard library.
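A preprocessor switch of this kind is usually implemented by guarding both the stream includes and the stream-dependent declarations. The sketch below shows the general pattern with a hypothetical macro `MYLIB_NO_IO` and a hypothetical type `Vec2`; it does not reproduce how Eigen itself applies EIGEN_NO_IO.

```cpp
// MYLIB_NO_IO is a hypothetical stand-in for a macro such as EIGEN_NO_IO.
#ifndef MYLIB_NO_IO
#include <sstream>
#include <string>
#endif

struct Vec2 {
    double x;
    double y;
};

#ifndef MYLIB_NO_IO
// Compiled only when IO streams are allowed.
inline std::ostream& operator<<(std::ostream& os, const Vec2& v) {
    return os << v.x << ' ' << v.y;
}

// Convenience wrapper built on the stream operator above.
inline std::string format_vec2(const Vec2& v) {
    std::ostringstream os;
    os << v;
    return os.str();
}
#endif
```

Compiling with `-DMYLIB_NO_IO` removes both the includes and the two functions, so the translation unit no longer depends on the IO streams part of the standard library.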
* \|	Do not rely on the compiler generating __device__ functions for constexpr in ↵	Rasmus Munk Larsen	2018-10-22
\| \|	Cuda (via EIGEN_CONSTEXPR_ARE_DEVICE_FUNC). This breaks several targets in the TensorFlow Cuda build, e.g.,
\| \|	INFO: From Compiling tensorflow/core/kernels/maxpooling_op_gpu.cu.cc:
\| \|	/b/f/w/run/external/eigen_archive/Eigen/src/Core/arch/GPU/Half.h(197): error: calling a __host__ function("std::equal_to<float> ::operator () const") from a __global__ function("tensorflow::_NV_ANON_NAMESPACE::MaxPoolGradBackwardNoMaskNHWC< ::Eigen::half> ") is not allowed
\| \|	/b/f/w/run/external/eigen_archive/Eigen/src/Core/arch/GPU/Half.h(197): error: identifier "std::equal_to<float> ::operator () const" is undefined in device code
\| \|	/b/f/w/run/external/eigen_archive/Eigen/src/Core/arch/GPU/Half.h(197): error: calling a __host__ function("std::equal_to<float> ::operator () const") from a __global__ function("tensorflow::_NV_ANON_NAMESPACE::MaxPoolGradBackwardNoMaskNCHW< ::Eigen::half> ") is not allowed
\| \|	/b/f/w/run/external/eigen_archive/Eigen/src/Core/arch/GPU/Half.h(197): error: identifier "std::equal_to<float> ::operator () const" is undefined in device code
\| \|	4 errors detected in the compilation of "/tmp/tmpxft_00000011_00000000-6_maxpooling_op_gpu.cu.cpp1.ii".
\| \|	ERROR: /tmpfs/tensor_flow/tensorflow/core/kernels/BUILD:3753:1: output 'tensorflow/core/kernels/_objs/pooling_ops_gpu/maxpooling_op_gpu.cu.pic.o' was not created
\| \|	ERROR: /tmpfs/tensor_flow/tensorflow/core/kernels/BUILD:3753:1: Couldn't build file tensorflow/core/kernels/_objs/pooling_ops_gpu/maxpooling_op_gpu.cu.pic.o: not all outputs were created or valid
* \|	Suppress compiler warning about unused global variable.	Rasmus Munk Larsen	2018-10-22
\| \|
* \|	Merged in rmlarsen/eigen (pull request PR-532)	Rasmus Munk Larsen	2018-10-19
\|\ \ \| \| \| \| \| \| \| \| \|	Only set EIGEN_CONSTEXPR_ARE_DEVICE_FUNC for clang++ if cxx_relaxed_constexpr is available.
* \| \|	Fix most Doxygen warnings. Also add links to stable documentation from ↵	Christoph Hertzberg	2018-10-19
\| \| \|	unsupported modules (by using the corresponding Doxytags file). Manually grafted from d107a371c61b764c73fd1570b1f3ed1c6400dd7e
\| * \|	Merged eigen/eigen into default	Rasmus Munk Larsen	2018-10-19
\| \|\ \ \| \|/ / \|/\| \|
* \| \|	bug #1606: Explicitly set the standard before ↵	Christoph Hertzberg	2018-10-19
\| \| \|	find_package(StandardMathLibrary). Also replace EIGEN_COMPILER_SUPPORT_CXX11 in favor of EIGEN_COMPILER_SUPPORT_CPP11. Grafted manually from a4afa90d161faab385a77f0e2764fb13ff3b9484
\| * \|	Only set EIGEN_CONSTEXPR_ARE_DEVICE_FUNC for clang++ if ↵	Rasmus Munk Larsen	2018-10-18
\|/ / \| \| \| \| \| \|	cxx_relaxed_constexpr is available.
* /	Fix GPU build due to gpu_assert not always being defined.	Rasmus Munk Larsen	2018-10-18
\|/
*	fix typo in doc	Gael Guennebaud	2018-10-17
\|
*	Move from rvalue arguments in ThreadPool enqueue* methods	Eugene Zhulenev	2018-10-16
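The idea of moving from rvalue arguments can be sketched with a toy queue. A parameter declared as an rvalue reference is itself an lvalue inside the function body, so without `std::move` it would be copied. The class `TaskQueue` below is hypothetical and single-threaded; it is not Eigen's ThreadPool.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical single-threaded task queue, used only to show the move.
class TaskQueue {
public:
    // `task` is a named rvalue reference, hence an lvalue in this body;
    // std::move is needed to move (rather than copy) it into the queue.
    void enqueue(std::function<void()>&& task) {
        tasks_.push_back(std::move(task));
    }

    // Runs and removes all queued tasks in FIFO order.
    void run_all() {
        while (!tasks_.empty()) {
            std::function<void()> t = std::move(tasks_.front());
            tasks_.pop_front();
            t();
        }
    }

    std::size_t size() const { return tasks_.size(); }

private:
    std::deque<std::function<void()>> tasks_;
};
```

Moving matters most when the callable owns captured state, such as a large buffer, that would otherwise be duplicated on every enqueue.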
\|
*	Reduce thread scheduling overhead in parallelFor	Eugene Zhulenev	2018-10-16
\|
*	Merged in ezhulenev/eigen-02 (pull request PR-528)	Rasmus Munk Larsen	2018-10-16
\|\	[TensorBlockIO] Check if it's allowed to squeeze inner dimensions
\| \|	Approved-by: Rasmus Munk Larsen <rmlarsen@google.com>
* \|	Fix float-to-double warning	Gael Guennebaud	2018-10-16
\| \|
\| *	Check if it's allowed to squeeze inner dimensions in TensorBlockIO	Eugene Zhulenev	2018-10-15
\| \|
* \|	bug #1612: fix regression in "outer-vectorization" of partial reductions for ↵	Gael Guennebaud	2018-10-16
\| \|	PacketSize==1 (aka complex<double>)
* \|	Show call stack in case of failing sparse solving.	Gael Guennebaud	2018-10-16
\| \|
* \|	Remove useless (and broken) resize	Gael Guennebaud	2018-10-16
\| \|
* \|	Iterative solvers: unify and fix handling of multiple rhs.	Gael Guennebaud	2018-10-15
\| \|	m_info was not properly computed and the logic was repeated in several places.
* \|	DGMRES: fix null rhs, fix restart, fix m_isDeflInitialized for multiple solve	Gael Guennebaud	2018-10-15
\|/
*	relax number of iterations checks to avoid false negatives	Gael Guennebaud	2018-10-15
\|
*	merge	Gael Guennebaud	2018-10-15
\|\
\| *	Suppress unused variable compiler warning in sparse subtest 3.	Rasmus Munk Larsen	2018-10-12
\| \|
\| *	Explicitly convert 0 to Scalar for custom types	Christoph Hertzberg	2018-10-12
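Why an explicit conversion of 0 matters for custom scalar types can be shown with a small sketch. It assumes a scalar type whose constructor from `int` is `explicit`, so a bare `0` does not convert implicitly; the type `Fixed` and the helper `zero_of` are hypothetical.

```cpp
// Hypothetical custom scalar whose constructor from int is explicit,
// so `Fixed z = 0;` or `return 0;` would not compile for it.
struct Fixed {
    explicit Fixed(int v) : raw(v * 256) {}
    int raw;
};

inline bool operator==(const Fixed& a, const Fixed& b) { return a.raw == b.raw; }

// Generic code spells the zero as Scalar(0) instead of a bare 0,
// which works for built-in and custom scalar types alike.
template <typename Scalar>
Scalar zero_of() {
    return Scalar(0);
}
```

The same template then instantiates for `double` and for `Fixed`, whereas a version returning a bare `0` would only compile for types with an implicit conversion from `int`.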
\| \|