eigen - C++ library for linear algebra

	Commit message	Author	Age
...
\| *	Fix specialization for conjugate on non-complex types in TensorBase.h.	Rasmus Munk Larsen	2019-03-01
\| \|
\| *	Improve EventCount used by the non-blocking threadpool.	Rasmus Munk Larsen	2019-02-22
\| \|	The current algorithm requires threads to commit or cancel waiting in the order in which they called Prewait. Spinning caused by that serialization can consume a lot of CPU time on some workloads. Restructure the algorithm so that it does not require that serialization, and remove spin waits from Commit/CancelWait. Note: this reduces the maximum number of threads from 2^16 to 2^14 to leave more space for the ABA counter (which is now 22 bits). Implementation details are explained in comments.
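The bit budget mentioned in the entry above can be illustrated with a small sketch. It assumes a single 64-bit state word split into three 14-bit fields plus a 22-bit counter in the top bits. The field names, their order, and every function name here are assumptions made for illustration; they are not taken from the Eigen source.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout (an assumption, not Eigen's actual code):
//   [ epoch : 22 | signals : 14 | waiters : 14 | stack : 14 ]
constexpr unsigned kFieldBits = 14;                    // 2^14 distinct values per field
constexpr unsigned kEpochBits = 64 - 3 * kFieldBits;   // what is left for the ABA counter
constexpr unsigned kEpochShift = 3 * kFieldBits;
constexpr std::uint64_t kFieldMask = (1ull << kFieldBits) - 1;
constexpr std::uint64_t kEpochInc = 1ull << kEpochShift;

static_assert(kEpochBits == 22, "three 14-bit fields leave 22 bits for the counter");

// Packs the four fields into one word; each argument must fit its field.
inline std::uint64_t Pack(std::uint64_t stack, std::uint64_t waiters,
                          std::uint64_t signals, std::uint64_t epoch) {
  return stack | (waiters << kFieldBits) | (signals << (2 * kFieldBits)) |
         (epoch << kEpochShift);
}

inline std::uint64_t StackField(std::uint64_t s) { return s & kFieldMask; }
inline std::uint64_t Waiters(std::uint64_t s) { return (s >> kFieldBits) & kFieldMask; }
inline std::uint64_t Signals(std::uint64_t s) { return (s >> (2 * kFieldBits)) & kFieldMask; }
inline std::uint64_t Epoch(std::uint64_t s) { return s >> kEpochShift; }

// Advances the ABA counter. Because the counter occupies the top bits,
// unsigned overflow wraps it to zero without disturbing the other fields.
inline std::uint64_t BumpEpoch(std::uint64_t s) { return s + kEpochInc; }
```

With 16-bit fields the same word would leave only 16 bits for the counter, which is the trade-off the note in the entry describes.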
\| *	Fix conversion warnings	Gael Guennebaud	2019-02-19
\| \|
\| *	Fix incorrect value of NumDimensions in TensorContraction traits.	Rasmus Munk Larsen	2019-02-19
\| \|	Reported here: #1671
\| *	Merged in ezhulenev/eigen-01 (pull request PR-590)	Rasmus Larsen	2019-02-14
\| \|\	Do not generate no-op cast() and conjugate() expressions
\| * \|	Fix signed-unsigned return in RunQueue	Eugene Zhulenev	2019-02-14
\| \| \|
\| * \|	Fix signed-unsigned comparison warning in RunQueue	Eugene Zhulenev	2019-02-14
\| \| \|
\| \| *	Do not generate no-op cast() and conjugate() expressions	Eugene Zhulenev	2019-02-14
\| \|/
\| *	Speed up Tensor ThreadPool RunQueue::Empty()	Eugene Zhulenev	2019-02-13
\| \|
\| *	Add PacketConv implementation for non-vectorizable src expressions	Eugene Zhulenev	2019-02-08
\| \|
\| *	Optimize TensorConversion evaluator: do not convert same type	Eugene Zhulenev	2019-02-08
\| \|
\| *	Spline.h: fix spelling "spang" -> "span"	Steven Peters	2019-02-08
\| \|
\| *	Don't do parallel_pack if we can use thread_local memory in tensor contractions	Eugene Zhulenev	2019-02-07
\| \|
\| *	Do not reduce parallelism too much in contractions with small number of threads	Eugene Zhulenev	2019-02-04
\| \|
\| *	Parallelize tensor contraction only by sharding dimension and use 'thread-local' memory for packing	Eugene Zhulenev	2019-02-04
\| \|
\| *	Work around the lack of support for arbitrary packet types in Tensor by manually loading half/quarter packets in the tensor contraction mapper.	Gael Guennebaud	2019-01-30
\| \|
\| *	Hide some annoying unused variable warnings in g++8.1	Christoph Hertzberg	2019-01-29
\| \|
\| *	Renaming even more `I` identifiers	Christoph Hertzberg	2019-01-26
\| \|
\| *	Avoid `I` as an identifier, since it may clash with the C-header complex.h	Christoph Hertzberg	2019-01-25
\| \|
\| *	Fix shorten-64-to-32 warning in TensorContractionThreadPool	Eugene Zhulenev	2019-01-11
\| \|
\| *	Fix shorten-64-to-32 warning in TensorContractionThreadPool	Eugene Zhulenev	2019-01-10
\| \|
\| *	bug #1654: fix compilation with cuda and no c++11	Gael Guennebaud	2019-01-09
\| \|
\| *	Optimize evalShardedByInnerDim	Eugene Zhulenev	2019-01-08
\| \|
\| *	Fix shorten-64-to-32 warning. Use regular memcpy if num_threads==0.	Rasmus Munk Larsen	2018-12-12
\| \|
\| *	Remove debug code.	Gael Guennebaud	2018-12-09
\| \|
\| *	Various fixes in polynomial solver and its unit tests:	Gael Guennebaud	2018-12-09
\| \|	- clean up noise in the imaginary part of real roots
\| \|	- take into account the magnitude of the derivative to check roots
\| \|	- use <= instead of < at appropriate places
\| *	Merged in markdryan/eigen/avx512-contraction-2 (pull request PR-554)	Rasmus Munk Larsen	2018-12-05
\| \|\	Fix tensor contraction on AVX512 builds
\| \| \|	Approved-by: Rasmus Munk Larsen <rmlarsen@google.com>
\| \| *	Fix evalShardedByInnerDim for AVX512 builds	Mark D Ryan	2018-12-05
\| \| \|	evalShardedByInnerDim ensures that the values it passes for start_k and end_k to evalGemmPartialWithoutOutputKernel are multiples of 8, as the kernel does not work correctly when the values of k are not multiples of the packet_size. While this precaution works for AVX builds, it is insufficient for AVX512 builds, where the maximum packet size is 16. The result is slightly incorrect float32 contractions on AVX512 builds. This commit fixes the problem by ensuring that k is always a multiple of the packet_size if the packet_size is > 8.
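The alignment requirement described in the entry above can be sketched as follows. This is a minimal illustration, assuming the inner dimension is split into equal blocks whose size is rounded up to a multiple of the packet size. The names `DivUp` and `ShardInnerDim` are hypothetical and this is not the code from the commit.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Index = std::ptrdiff_t;

// Integer division rounding up; both arguments must be positive.
inline Index DivUp(Index a, Index b) { return (a + b - 1) / b; }

// Splits [0, k) into at most num_shards ranges [begin, end).
// Every begin, and every end except possibly the last, is a multiple
// of packet_size, so a kernel that needs packet-aligned k offsets
// always starts on an aligned boundary.
inline std::vector<std::pair<Index, Index>> ShardInnerDim(Index k, Index num_shards,
                                                          Index packet_size) {
  const Index block = packet_size * DivUp(k, packet_size * num_shards);
  std::vector<std::pair<Index, Index>> shards;
  for (Index begin = 0; begin < k; begin += block) {
    shards.emplace_back(begin, std::min(begin + block, k));
  }
  return shards;
}
```

For k = 100 and three shards, aligning to 8 gives a block size of 40, so the second shard starts at 40, which is not a multiple of 16. Aligning to 16 gives a block size of 48, and every shard start is then a multiple of 16.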
\| * \|	Fixed most conversion warnings in MatrixFunctions module	Christoph Hertzberg	2018-11-20
\| \| \|
* \| \|	ROCm/HIP specific fixes + updates	Deven Desai	2018-11-19
\|/ /	1. Eigen/src/Core/arch/GPU/Half.h: update the HIPCC implementation of half so that it can be declared as a __shared__ variable.
\| \|	2. Eigen/src/Core/util/Macros.h, Eigen/src/Core/util/Memory.h: introduce an EIGEN_USE_STD(func) macro that calls std::func by default and ::func when Eigen is being compiled with HIPCC. This change was requested in the previous HIP PR (https://bitbucket.org/eigen/eigen/pull-requests/518/pr-with-hip-specific-fixes-for-the-eigen/diff).
\| \|	3. unsupported/Eigen/CXX11/src/Tensor/TensorDeviceThreadPool.h: remove the EIGEN_DEVICE_FUNC attribute from pure virtual methods, as it is not supported by HIPCC.
\| \|	4. unsupported/Eigen/CXX11/src/Tensor/TensorReduction.h: disable the template specializations of InnerMostDimReducer, as they run into HIPCC link errors.
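One way a macro like the one in item 2 could work is sketched below. It assumes the selection is done with a using-declaration and that the compiler is detected through the `__HIPCC__` predefined macro. `SKETCH_USE_STD` and `SketchFloor` are hypothetical names, and this is not the definition from Macros.h.

```cpp
#include <cassert>
#include <cmath>
#include <math.h>

// Hypothetical stand-in for the macro described above: pull in the
// global-namespace function under HIPCC, the std:: one otherwise.
#if defined(__HIPCC__)
#define SKETCH_USE_STD(FUNC) using ::FUNC;
#else
#define SKETCH_USE_STD(FUNC) using std::FUNC;
#endif

// Call sites name the function unqualified after the using-declaration,
// so the same source line works with either compiler.
inline double SketchFloor(double x) {
  SKETCH_USE_STD(floor)
  return floor(x);
}
```

The point of the indirection is that the call site itself never spells out `std::` or `::`, so only the macro has to know which toolchain is in use.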
* \|	Merged in rmlarsen/eigen2 (pull request PR-543)	Rasmus Munk Larsen	2018-11-13
\|\ \	Add parallel memcpy to TensorThreadPoolDevice in Eigen, but limit the number of threads to 4, beyond which we just seem to be wasting CPU cycles as the threads contend for memory bandwidth.
\| \| \|	Approved-by: Eugene Zhulenev <ezhulenev@google.com>
\| * \|	Remove accidental changes.	Rasmus Munk Larsen	2018-11-12
\| \| \|
\| * \|	Add parallel memcpy to TensorThreadPoolDevice in Eigen, but limit the number of threads to 4, beyond which we just seem to be wasting CPU cycles as the threads contend for memory bandwidth.	Rasmus Munk Larsen	2018-11-12
\| \| \|
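The idea in the entry above, a chunked copy with a cap of four workers, can be sketched with plain `std::thread`. This is only an illustration under that assumption: `ParallelMemcpy` is a hypothetical name, and the actual change uses the device's thread pool rather than spawning threads per call.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <thread>
#include <vector>

// Copies n bytes from src to dst using at most 4 workers. The cap of 4
// comes from the log entry; everything else is an assumption.
inline void ParallelMemcpy(void* dst, const void* src, std::size_t n,
                           std::size_t num_threads) {
  const std::size_t kMaxWorkers = 4;
  const std::size_t workers = std::min(num_threads, kMaxWorkers);
  if (workers <= 1 || n < workers) {
    std::memcpy(dst, src, n);  // too little work or no threads: plain copy
    return;
  }
  const std::size_t chunk = (n + workers - 1) / workers;  // bytes per worker
  char* d = static_cast<char*>(dst);
  const char* s = static_cast<const char*>(src);
  std::vector<std::thread> pool;
  // Chunks after the first go to helper threads.
  for (std::size_t begin = chunk; begin < n; begin += chunk) {
    const std::size_t len = std::min(chunk, n - begin);
    pool.emplace_back([=] { std::memcpy(d + begin, s + begin, len); });
  }
  std::memcpy(d, s, std::min(chunk, n));  // first chunk on the calling thread
  for (std::thread& t : pool) t.join();
}
```

Requesting 8 threads here still produces only 4 chunks, which mirrors the reasoning in the entry that extra threads mostly contend for memory bandwidth.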
* \| \|	[PATCH 1/2] Misc. typos	luz.paz	2018-09-18
\|/ /	From 68d431b4c14ad60a778ee93c1f59ecc4b931950e Mon Sep 17 00:00:00 2001
\| \|	Found via `codespell -q 3 -I ../eigen-word-whitelist.txt`, where the whitelist consists of: ``` als ans cas dum lastr lowd nd overfl pres preverse substraction te uint whch ```
\| \|	---
\| \|	CMakeLists.txt \| 26 +++++++++----------
\| \|	Eigen/src/Core/GenericPacketMath.h \| 2 +-
\| \|	Eigen/src/SparseLU/SparseLU.h \| 2 +-
\| \|	bench/bench_norm.cpp \| 2 +-
\| \|	doc/HiPerformance.dox \| 2 +-
\| \|	doc/QuickStartGuide.dox \| 2 +-
\| \|	.../Eigen/CXX11/src/Tensor/TensorChipping.h \| 6 ++---
\| \|	.../Eigen/CXX11/src/Tensor/TensorDeviceGpu.h \| 2 +-
\| \|	.../src/Tensor/TensorForwardDeclarations.h \| 4 +--
\| \|	.../src/Tensor/TensorGpuHipCudaDefines.h \| 2 +-
\| \|	.../Eigen/CXX11/src/Tensor/TensorReduction.h \| 2 +-
\| \|	.../CXX11/src/Tensor/TensorReductionGpu.h \| 2 +-
\| \|	.../test/cxx11_tensor_concatenation.cpp \| 2 +-
\| \|	unsupported/test/cxx11_tensor_executor.cpp \| 2 +-
\| \|	14 files changed, 29 insertions(+), 29 deletions(-)
\| *	Fix tensor contraction for AVX512 machines	Mark D Ryan	2018-07-31
\|/	This patch modifies the TensorContraction class to ensure that the kc_ field is always a multiple of the packet_size, if the packet_size is > 8. Without this change, spatial convolutions in TensorFlow do not work properly, as the code that rearranges the input matrices can assert if kc_ is not a multiple of the packet_size. This leads to a unit test failure, //tensorflow/python/kernel_tests:conv_ops_test, on AVX512 builds of TensorFlow.
*	A few small fixes to a) prevent throwing in ctors and dtors of the threading code, and b) support the matrix exponential on platforms with 113 bits of mantissa for long doubles.	Rasmus Munk Larsen	2018-11-09
\|
*	Fix most Doxygen warnings. Also add links to stable documentation from unsupported modules (by using the corresponding Doxytags file).	Christoph Hertzberg	2018-10-19
\|	Manually grafted from d107a371c61b764c73fd1570b1f3ed1c6400dd7e
*	Fix GPU build due to gpu_assert not always being defined.	Rasmus Munk Larsen	2018-10-18
\|
*	Move from rvalue arguments in ThreadPool enqueue* methods	Eugene Zhulenev	2018-10-16
\|
*	Reduce thread scheduling overhead in parallelFor	Eugene Zhulenev	2018-10-16
\|
*	Merged in ezhulenev/eigen-02 (pull request PR-528)	Rasmus Munk Larsen	2018-10-16
\|\	[TensorBlockIO] Check if it's allowed to squeeze inner dimensions
\| \|	Approved-by: Rasmus Munk Larsen <rmlarsen@google.com>
\| *	Check if it's allowed to squeeze inner dimensions in TensorBlockIO	Eugene Zhulenev	2018-10-15
\| \|
* \|	Iterative solvers: unify and fix handling of multiple rhs.	Gael Guennebaud	2018-10-15
\| \|	m_info was not properly computed and the logic was repeated in several places.
* \|	DGMRES: fix null rhs, fix restart, fix m_isDeflInitialized for multiple solve	Gael Guennebaud	2018-10-15
\|/
*	Fix a lot of Doxygen warnings in Tensor module	Christoph Hertzberg	2018-10-09
\|
*	Fix out-of-bounds access in TensorArgMax.h.	Rasmus Munk Larsen	2018-10-08
\|
*	Workaround stupid warning	Gael Guennebaud	2018-10-08
\|
*	Move struct outside of method for C++03 compatibility.	Christoph Hertzberg	2018-10-02
\|
*	Make code compile in C++03 mode again	Christoph Hertzberg	2018-10-02
\|
*	Fix conversion warning ... again	Christoph Hertzberg	2018-10-02
\|