eigen - C++ library for linear algebra

	Commit message	Author	Age
...
*	Add block evaluation V2 to TensorAsyncExecutor.	Rasmus Munk Larsen	2019-10-22
\|	Add async evaluation to a number of ops.
*	Drop support for c++03 in Eigen tensor. Get rid of some code used to emulate ↵	Rasmus Munk Larsen	2019-10-18
\|	c++11 functionality with older compilers.
*	Cleanup Tensor block destination and materialized block storage allocation	Eugene Zhulenev	2019-10-16
\|
*	TensorBroadcasting support for random/uniform blocks	Eugene Zhulenev	2019-10-16
\|
*	Block evaluation for TensorGenerator/TensorReverse/TensorShuffling	Eugene Zhulenev	2019-10-14
\|
*	Block evaluation for TensorGenerator + TensorReverse + fixed bug in tensor ↵	Eugene Zhulenev	2019-10-10
\|	reverse op
*	Block evaluation for TensorChipping + fixed bugs in TensorPadding and ↵	Eugene Zhulenev	2019-10-09
\|	TensorSlicing
*	Implement c++03 compatible fix for changeset ↵	Gael Guennebaud	2019-10-09
\|	7a43af1a335da2c0489b4119a33ee1cbff0c15d6
*	Fix compilation of FFTW unit test	Gael Guennebaud	2019-10-08
\|
*	Add block evaluation to TensorEvalTo and fix few small bugs	Eugene Zhulenev	2019-10-07
\|
*	Fix compilation warnings and errors with clang in TensorBlockV2 code and tests	Eugene Zhulenev	2019-10-04
\|
*	Add block evaluation to TensorReshaping/TensorCasting/TensorPadding/TensorSelect	Eugene Zhulenev	2019-10-02
\|
*	Fix cxx11_tensor_block_io test	Eugene Zhulenev	2019-09-25
\|
*	Fix compilation warnings and errors with clang in TensorBlockV2	Eugene Zhulenev	2019-09-25
\|
*	Add new TensorBlock api implementation + tests	Eugene Zhulenev	2019-09-24
\|
*	Tensor block evaluation V2 support for unary/binary/broadcasting	Eugene Zhulenev	2019-09-24
\|
*	Add support for asynchronous evaluation of tensor casting expressions.	Rasmus Munk Larsen	2019-09-19
\|
*	Merging eigen/eigen.	Srinivas Vasudevan	2019-09-16
\|\
* \|	Add Bessel functions to SpecialFunctions.	Srinivas Vasudevan	2019-09-14
\|/	- Split SpecialFunctions files into a separate BesselFunctions file.
\|	In particular add:
\|	- Modified Bessel functions of the second kind k0, k1, k0e, k1e
\|	- Bessel functions of the first kind j0, j1
\|	- Bessel functions of the second kind y0, y1
*	Fix for the HIP build+test errors introduced by the ndtri support.	Deven Desai	2019-09-06
\|	The fixes needed are
\|	* adding EIGEN_DEVICE_FUNC attribute to a couple of funcs (else HIPCC will error out when non-device funcs are called from global/device funcs)
\|	* switching to using ::<math_func> instead of std::<math_func> (only for HIPCC) in cases where the std::<math_func> is not recognized as a device func by HIPCC
\|	* removing an errant "j" from a testcase (don't know how that made it in to begin with!)
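The first two fixes can be sketched roughly as follows. This is an illustration only, not Eigen's code: `SKETCH_DEVICE_FUNC` and `sketch_sqrt` are hypothetical names standing in for `EIGEN_DEVICE_FUNC` and the affected math helpers, and the sketch assumes that the compiler driver defines `__HIPCC__` (or `__CUDACC__`) when compiling device code.

```cpp
#include <cassert>
#include <cmath>   // std::sqrt
#include <math.h>  // ::sqrt in the global namespace

// Hypothetical stand-in for EIGEN_DEVICE_FUNC: mark a function as callable
// from both host and device code when a device compiler is in use, and
// expand to nothing for a plain host compiler.
#if defined(__HIPCC__) || defined(__CUDACC__)
#define SKETCH_DEVICE_FUNC __host__ __device__
#else
#define SKETCH_DEVICE_FUNC
#endif

// Without the attribute, HIPCC rejects calls to this helper from
// __global__ or __device__ functions.
SKETCH_DEVICE_FUNC inline double sketch_sqrt(double x) {
#if defined(__HIPCC__)
  // Use the global-namespace function for cases where HIPCC does not
  // recognize the std:: version as a device function.
  return ::sqrt(x);
#else
  return std::sqrt(x);
#endif
}
```

On a host-only build the macro is empty and the `std::` branch is taken, so the same source compiles in both configurations.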
*	Update ThreadLocal to use separate Initialize/Release callables	Eugene Zhulenev	2019-09-10
\|
*	ThreadLocal container that does not rely on thread local storage	Eugene Zhulenev	2019-09-09
\|
*	PR 681: Add ndtri function, the inverse of the normal distribution function.	Srinivas Vasudevan	2019-08-12
\|
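For readers unfamiliar with `ndtri`: it returns the `x` for which the standard normal CDF equals a given probability `p`. A minimal reference sketch, assuming nothing about how Eigen computes it, is to invert the CDF by bisection. The names `normal_cdf` and `ndtri_bisect` are hypothetical, and this is far slower than a production implementation.

```cpp
#include <cassert>
#include <cmath>

// Standard normal CDF: Phi(x) = 0.5 * erfc(-x / sqrt(2)).
inline double normal_cdf(double x) {
  return 0.5 * std::erfc(-x / std::sqrt(2.0));
}

// Hypothetical reference inverse of Phi, found by bisection.
// Assumes 0 < p < 1. This is NOT the algorithm used by Eigen.
inline double ndtri_bisect(double p) {
  double lo = -40.0;
  double hi = 40.0;
  for (int i = 0; i < 200; ++i) {
    const double mid = 0.5 * (lo + hi);
    if (normal_cdf(mid) < p) {
      lo = mid;  // the root lies to the right of mid
    } else {
      hi = mid;  // the root lies at or to the left of mid
    }
  }
  return 0.5 * (lo + hi);
}
```

A round trip such as `normal_cdf(ndtri_bisect(p))` should reproduce `p` to within floating-point error, which makes this a convenient cross-check for a faster implementation.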
*	Allow move-only done callback in TensorAsyncDevice	Eugene Zhulenev	2019-09-03
\|
*	Add test for const TensorMap underlying data mutation	Eugene Zhulenev	2019-09-03
\|
*	evalSubExprsIfNeededAsync + async TensorContractionThreadPool	Eugene Zhulenev	2019-08-30
\|
*	Asynchronous expression evaluation with TensorAsyncDevice	Eugene Zhulenev	2019-08-30
\|
*	Const correctness in TensorMap<const Tensor<T, ...>> expressions	Eugene Zhulenev	2019-08-28
\|
*	Remove XSMM support from Tensor module	Eugene Zhulenev	2019-08-19
\|
*	Disable tests for contraction with output kernels when using libxsmm, which ↵	Rasmus Munk Larsen	2019-08-07
\|	does not support this.
*	Merge with Eigen head	Eugene Zhulenev	2019-06-28
\|\
* \|	Add block access to TensorReverseOp and make sure that TensorForcedEval uses ↵	Eugene Zhulenev	2019-06-28
\| \|	block access when preferred
\| *	[SYCL] This PR adds the minimum modifications to the Eigen unsupported ↵	Mehdi Goli	2019-06-28
\|/	module required to run it on devices supporting SYCL.
\|	* Abstracting the pointer type so that both SYCL memory and pointer can be captured.
\|	* Converting SYCL virtual pointer to SYCL device memory in Eigen evaluator class.
\|	* Binding SYCL placeholder accessor to command group handler by using bind method in Eigen evaluator node.
\|	* Adding SYCL macro for controlling loop unrolling.
\|	* Modifying the TensorDeviceSycl.h and SYCL executor method to adopt the above changes.
*	Minor build improvements	tra	2019-05-31
\|	* Allow specifying multiple GPU architectures. E.g.: cmake -DEIGEN_CUDA_COMPUTE_ARCH="60;70"
\|	* Pass CUDA SDK path to clang. Without it, clang will default to /usr/local/cuda, which may not be the right location if cmake was invoked with -DCUDA_TOOLKIT_ROOT_DIR=/some/other/CUDA/path
*	Merged in rmlarsen/eigen_threadpool (pull request PR-640)	Rasmus Larsen	2019-05-13
\|\	Fix deadlocks in thread pool.
\| \|	Approved-by: Eugene Zhulenev <ezhulenev@google.com>
* \|	bug #1707: Fix deprecation warnings, or disable warnings when testing ↵	Christoph Hertzberg	2019-05-10
\| \|	deprecated functions
\| *	A) fix deadlocks in thread pool caused by EventCount	Rasmus Munk Larsen	2019-05-08
\|/	This fixed 2 deadlocks caused by sloppiness in the EventCount logic. Both most likely were introduced by cl/236729920 which includes the new EventCount algorithm: https://github.com/eigenteam/eigen-git-mirror/commit/01da8caf003990967e42a2b9dc3869f154569538
\|
\|	bug #1 (Prewait): Prewait must not consume existing signals. Consider the following scenario. There are 2 thread pool threads (1 and 2) and 1 external thread (3). RunQueue is empty.
\|	Thread 1 checks the queue, calls Prewait, checks RunQueue again and now is going to call CommitWait.
\|	Thread 2 checks the queue and now is going to call Prewait.
\|	Thread 3 submits 2 tasks, EventCount signals is set to 1 because only 1 waiter is registered (the second signal is discarded).
\|	Now thread 2 resumes and calls Prewait and takes away the signal.
\|	Thread 1 resumes and calls CommitWait, there are no pending signals anymore, so it blocks.
\|	As a result we have 2 tasks, but only 1 thread is running.
\|
\|	bug #2 (CancelWait): CancelWait must not take away a signal if it's not sure that the signal was meant for this thread. When one thread blocks and another submits a new task concurrently, the EventCount protocol guarantees only the following properties (similar to Dekker's algorithm):
\|	(a) the registered waiter notices presence of the new task and does not block
\|	(b) the signaler notices presence of the waiter and wakes it
\|	(c) both the waiter notices presence of the new task and the signaler notices presence of the waiter
\|	[the only case that must not be possible is that neither of them notices the other, because it would lead to a deadlock]
\|	CancelWait is called for cases (a) and (c). For case (c) it is OK to take the notification signal away, but it's not OK for (a) because nobody queued a signal for us and we take away a signal meant for somebody else.
\|	Consider:
\|	Thread 1 calls Prewait, checks RunQueue, it's empty, now it's going to call CommitWait.
\|	Thread 3 submits 2 tasks, EventCount signals is set to 1 because only 1 waiter is registered (the second signal is discarded).
\|	Thread 2 calls Prewait, checks RunQueue, discovers the tasks, calls CancelWait and consumes the pending signal (meant for thread 1).
\|	Now Thread 1 resumes and calls CommitWait, since there are no signals it blocks.
\|	As a result we have 2 tasks, but only 1 thread is running.
\|
\|	Both deadlocks are only a problem if the tasks require parallelism. Most computational tasks do not require parallelism, i.e. a single thread will run task 1, finish it and then dequeue and run task 2.
\|
\|	This fix undoes some of the sloppiness in the EventCount that was meant to reduce CPU consumption by idle threads, because we now have more threads running in these corner cases. But we still don't have pthread_yield's and maybe the strictness introduced by this change will actually help to reduce tail latency because we will have threads running when we actually need them running.
\|
\|	B) fix deadlock in thread pool caused by RunQueue
\|
\|	This fixed a deadlock caused by sloppiness in the RunQueue logic. Most likely this was introduced with the non-blocking thread pool. The deadlock only affects workloads that require parallelism. Most computational tasks don't require parallelism.
\|
\|	PopBack must not fail spuriously. If it does, it can effectively lead to a single thread consuming several wake up signals.
\|	Consider: 2 worker threads are blocked.
\|	External thread submits a task. One of the threads is woken.
\|	It tries to steal the task, but fails due to a spurious failure in PopBack (external thread submits another task and holds the lock).
\|	The thread executes blocking protocol again (it won't block because NonEmptyQueueIndex is precise and the thread will discover pending work, but it has called PrepareWait).
\|	Now external thread submits another task and signals EventCount again.
\|	The signal is consumed by the first thread again. But now we have 2 tasks pending but only 1 worker thread running.
\|
\|	It may be possible to fix this in a different way: make EventCount::CancelWait forward the wakeup signal to a blocked thread rather than consuming it. But this looks more complex and I am not 100% sure that it will fix the bug.
\|	It's also possible to have 2 versions of PopBack: one will do try_to_lock and another won't. Then worker threads could first opportunistically check all queues with try_to_lock, and only use the blocking version before blocking. But let's first fix the bug with the simpler change.
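The invariant behind bugs #1 and #2 (Prewait and CancelWait must never take away a signal meant for another waiter) can be illustrated with a small mutex-based sketch. This is not Eigen's EventCount, which is lock-free and tracks signals differently: `SketchEventCount` is a hypothetical class that uses an epoch counter, so there is no per-signal state for Prewait or CancelWait to consume.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>

// Hypothetical mutex-based event count, for illustration only.
class SketchEventCount {
 public:
  // Step 1: announce the intent to wait and snapshot the current epoch.
  // Nothing is consumed here.
  uint64_t Prewait() {
    std::lock_guard<std::mutex> lock(mu_);
    ++waiters_;
    return epoch_;
  }

  // Step 2a: the re-check found work, so withdraw. Only the waiter count
  // changes; a notification meant for another thread is left untouched.
  void CancelWait() {
    std::lock_guard<std::mutex> lock(mu_);
    --waiters_;
  }

  // Step 2b: block until Notify() has run since the matching Prewait().
  void CommitWait(uint64_t token) {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [&] { return epoch_ != token; });
    --waiters_;
  }

  // Called by a producer after it has published new work.
  void Notify() {
    std::lock_guard<std::mutex> lock(mu_);
    if (waiters_ == 0) return;  // nobody is waiting or about to wait
    ++epoch_;
    cv_.notify_all();  // every waiter holding an older epoch may proceed
  }

  int waiters() {
    std::lock_guard<std::mutex> lock(mu_);
    return waiters_;
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  uint64_t epoch_ = 0;
  int waiters_ = 0;
};
```

A worker would call `Prewait()`, re-check its queues, and then call either `CancelWait()` (work found) or `CommitWait(token)` (still empty). The sketch trades extra wakeups for simplicity: every registered waiter with a stale epoch is released on each notification, which is the opposite end of the trade-off from the CPU-saving behaviour the commit message describes.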
*	Block evaluation for TensorGeneratorOp	Eugene Zhulenev	2019-03-05
\|
*	Do not create Tensor<const T> in cxx11_tensor_forced_eval test	Eugene Zhulenev	2019-03-05
\|
*	Add tiled evaluation for TensorForcedEvalOp	Eugene Zhulenev	2019-03-04
\|
*	Improve EventCount used by the non-blocking threadpool.	Rasmus Munk Larsen	2019-02-22
\|	The current algorithm requires threads to commit/cancel waiting in the order they called Prewait. Spinning caused by that serialization can consume lots of CPU time on some workloads. Restructure the algorithm to not require that serialization and remove spin waits from Commit/CancelWait. Note: this reduces max number of threads from 2^16 to 2^14 to leave more space for ABA counter (which is now 22 bits). Implementation details are explained in comments.
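The bit budget in the note can be checked with a little arithmetic. The sketch below assumes a single 64-bit state word holding three 14-bit fields plus the counter; only the 2^14 thread limit and the 22-bit counter width come from the commit message, while the three-field split and all constant names are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout: three 14-bit fields in the low 42 bits, counter on top.
constexpr uint64_t kWaiterBits = 14;
constexpr uint64_t kMaxThreads = uint64_t{1} << kWaiterBits;  // 2^14
constexpr uint64_t kEpochShift = 3 * kWaiterBits;             // 42
constexpr uint64_t kEpochBits = 64 - kEpochShift;             // 22
constexpr uint64_t kEpochInc = uint64_t{1} << kEpochShift;
constexpr uint64_t kEpochMask = ((uint64_t{1} << kEpochBits) - 1)
                                << kEpochShift;

static_assert(kMaxThreads == 16384, "2^14 threads");
static_assert(kEpochBits == 22, "22 bits remain for the ABA counter");

// Advance the counter while leaving the low 42 bits untouched.
// The counter wraps modulo 2^22.
constexpr uint64_t BumpEpoch(uint64_t state) {
  return (state & ~kEpochMask) | ((state + kEpochInc) & kEpochMask);
}
```

A wider counter makes it less likely that a compare-and-swap succeeds against a state word that has changed and then returned to the same bit pattern, which is the ABA problem the counter guards against.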
*	Avoid `I` as an identifier, since it may clash with the C-header complex.h	Christoph Hertzberg	2019-01-25
\|
*	Fix flaky test for tensor fft.	Rasmus Munk Larsen	2019-01-16
\|
*	bug #1654: fix compilation with cuda and no c++11	Gael Guennebaud	2019-01-09
\|
*	Various fixes in polynomial solver and its unit tests:	Gael Guennebaud	2018-12-09
\|	- cleanup noise in imaginary part of real roots
\|	- take into account the magnitude of the derivative to check roots.
\|	- use <= instead of < at appropriate places
*	Fixed most conversion warnings in MatrixFunctions module	Christoph Hertzberg	2018-11-20
\|
*	[PATCH 1/2] Misc. typos	luz.paz	2018-09-18
\|	From 68d431b4c14ad60a778ee93c1f59ecc4b931950e Mon Sep 17 00:00:00 2001
\|	Found via `codespell -q 3 -I ../eigen-word-whitelist.txt` where the whitelist consists of:
\|	```
\|	als ans cas dum lastr lowd nd overfl pres preverse substraction te uint whch
\|	```
\|	---
\|	CMakeLists.txt \| 26 +++++++++----------
\|	Eigen/src/Core/GenericPacketMath.h \| 2 +-
\|	Eigen/src/SparseLU/SparseLU.h \| 2 +-
\|	bench/bench_norm.cpp \| 2 +-
\|	doc/HiPerformance.dox \| 2 +-
\|	doc/QuickStartGuide.dox \| 2 +-
\|	.../Eigen/CXX11/src/Tensor/TensorChipping.h \| 6 ++---
\|	.../Eigen/CXX11/src/Tensor/TensorDeviceGpu.h \| 2 +-
\|	.../src/Tensor/TensorForwardDeclarations.h \| 4 +--
\|	.../src/Tensor/TensorGpuHipCudaDefines.h \| 2 +-
\|	.../Eigen/CXX11/src/Tensor/TensorReduction.h \| 2 +-
\|	.../CXX11/src/Tensor/TensorReductionGpu.h \| 2 +-
\|	.../test/cxx11_tensor_concatenation.cpp \| 2 +-
\|	unsupported/test/cxx11_tensor_executor.cpp \| 2 +-
\|	14 files changed, 29 insertions(+), 29 deletions(-)
*	Merged in ezhulenev/eigen-02 (pull request PR-534)	Rasmus Munk Larsen	2018-10-25
\|\	Fix cxx11_tensor_{block_access, reduction} tests
\| *	Fix cxx11_tensor_{block_access, reduction} tests	Eugene Zhulenev	2018-10-25
\| \|
* \|	bug #1606: Explicitly set the standard before ↵	Christoph Hertzberg	2018-10-19
\|/	find_package(StandardMathLibrary).
\|	Also replace EIGEN_COMPILER_SUPPORT_CXX11 in favor of EIGEN_COMPILER_SUPPORT_CPP11.
\|	Grafted manually from a4afa90d161faab385a77f0e2764fb13ff3b9484