path: root/unsupported/Eigen/CXX11/src/ThreadPool
Commit message (Author, Date)
* Fix Eigen::ThreadPool::CurrentThreadId returning wrong thread id when EIGEN_AVOID_THREAD_LOCAL and NDEBUG are defined (Zhuyie, 2020-09-25)
* Avoid a division in NonBlockingThreadPool::Steal. (Ilya Tokar, 2020-02-14)
    Profiles show that roughly 10-20% of Steal() is spent simply computing random % size. A random 32-bit integer can instead be reduced into the [0, size) range with a single multiplication and shift. The transformation is described in https://lemire.me/blog/2016/06/27/a-fast-alternative-to-the-modulo-reduction/
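A minimal sketch of the multiply-and-shift reduction referenced above; the function name and parameters are illustrative, not the actual Steal() internals:

    #include <cstdint>

    // Map a uniform random 32-bit value into [0, size) without a division:
    // take the upper 32 bits of the 64-bit product rnd * size.
    inline uint32_t FastRange32(uint32_t rnd, uint32_t size) {
      return static_cast<uint32_t>((static_cast<uint64_t>(rnd) * size) >> 32);
    }

    // e.g. victim = FastRange32(rng(), num_queues) instead of rng() % num_queues.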
* Update ThreadLocal to use separate Initialize/Release callables (Eugene Zhulenev, 2019-09-10)
|
* ThreadLocal container that does not rely on thread local storage (Eugene Zhulenev, 2019-09-09)
|
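A rough sketch of the idea behind the two commits above, assuming a mutex-protected hash map keyed by thread id; the class name and the Initialize/Release signatures are hypothetical, not Eigen's actual ThreadLocal interface:

    #include <mutex>
    #include <thread>
    #include <unordered_map>

    // Per-thread storage that works even when real TLS is unavailable:
    // each thread's slot is created lazily via `init` and released when
    // the container itself is destroyed.
    template <typename T, typename Initialize, typename Release>
    class EmulatedThreadLocal {
     public:
      EmulatedThreadLocal(Initialize init, Release release)
          : init_(init), release_(release) {}

      ~EmulatedThreadLocal() {
        for (auto& kv : per_thread_) release_(kv.second);
      }

      T& local() {
        const std::thread::id tid = std::this_thread::get_id();
        std::lock_guard<std::mutex> lock(mu_);
        auto it = per_thread_.find(tid);
        if (it == per_thread_.end()) it = per_thread_.emplace(tid, init_()).first;
        return it->second;
      }

     private:
      std::mutex mu_;
      std::unordered_map<std::thread::id, T> per_thread_;
      Initialize init_;
      Release release_;
    };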
* evalSubExprsIfNeededAsync + async TensorContractionThreadPool (Eugene Zhulenev, 2019-08-30)
|
* A) Fix deadlocks in thread pool caused by EventCount (Rasmus Munk Larsen, 2019-05-08)
    This fixes two deadlocks caused by sloppiness in the EventCount logic. Both were most likely introduced by cl/236729920, which includes the new EventCount algorithm: https://github.com/eigenteam/eigen-git-mirror/commit/01da8caf003990967e42a2b9dc3869f154569538

    Bug 1 (Prewait): Prewait must not consume existing signals. Consider the following scenario. There are two thread pool threads (1 and 2) and one external thread (3). The RunQueue is empty. Thread 1 checks the queue, calls Prewait, checks the RunQueue again and is now about to call CommitWait. Thread 2 checks the queue and is about to call Prewait. Thread 3 submits two tasks; the EventCount signal count is set to 1 because only one waiter is registered (the second signal is discarded). Now thread 2 resumes, calls Prewait and takes away the signal. Thread 1 resumes and calls CommitWait; there are no pending signals anymore, so it blocks. As a result we have two tasks but only one thread running.

    Bug 2 (CancelWait): CancelWait must not take away a signal if it is not sure that the signal was meant for this thread. When one thread blocks and another submits a new task concurrently, the EventCount protocol guarantees only the following properties (similar to Dekker's algorithm): (a) the registered waiter notices the presence of the new task and does not block; (b) the signaler notices the presence of the waiter and wakes it; or (c) both the waiter notices the new task and the signaler notices the waiter. The only impossible outcome is that neither notices the other, because that would lead to a deadlock. CancelWait is called for cases (a) and (c). For case (c) it is OK to take the notification signal away, but it is not OK for (a), because nobody queued a signal for us and we would take away a signal meant for somebody else. Consider: Thread 1 calls Prewait, checks the RunQueue, finds it empty, and is about to call CommitWait. Thread 3 submits two tasks; the EventCount signal count is set to 1 because only one waiter is registered (the second signal is discarded). Thread 2 calls Prewait, checks the RunQueue, discovers the tasks, calls CancelWait and consumes the pending signal (meant for thread 1). Now thread 1 resumes and calls CommitWait; since there are no signals, it blocks. As a result we have two tasks but only one thread running.

    Both deadlocks are only a problem if the tasks require parallelism. Most computational tasks do not: a single thread will run task 1, finish it, and then dequeue and run task 2. This fix undoes some of the sloppiness in the EventCount that was meant to reduce CPU consumption by idle threads, because we now have more threads running in these corner cases. But we still don't have pthread_yield, and maybe the strictness introduced by this change will actually help reduce tail latency, because threads will be running when we actually need them.

    B) Fix deadlock in thread pool caused by RunQueue
    This fixes a deadlock caused by sloppiness in the RunQueue logic, most likely introduced with the non-blocking thread pool. The deadlock only affects workloads that require parallelism; most computational tasks do not.

    PopBack must not fail spuriously. If it does, it can effectively lead to a single thread consuming several wake-up signals. Consider: two worker threads are blocked. An external thread submits a task. One of the threads is woken. It tries to steal the task but fails due to a spurious failure in PopBack (the external thread submits another task and holds the lock). The thread executes the blocking protocol again (it won't block because NonEmptyQueueIndex is precise and the thread will discover the pending work, but it has already called PrepareWait). Now the external thread submits another task and signals the EventCount again. The signal is consumed by the first thread again. Now we have two tasks pending but only one worker thread running.

    It may be possible to fix this in a different way: make EventCount::CancelWait forward the wakeup signal to a blocked thread rather than consuming it. But this looks more complex, and I am not 100% sure it would fix the bug. It is also possible to have two versions of PopBack: one that does try_to_lock and one that doesn't. Worker threads could then first opportunistically check all queues with try_to_lock, and only use the blocking version before blocking. But let's first fix the bug with the simpler change.
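For orientation, the worker-side blocking protocol that this fix tightens, as a hedged sketch (the EventCount, Waiter, and RunQueue names below are schematic, not the exact Eigen signatures):

    // Worker-side blocking protocol around an event count.
    // The invariants restored by the fix:
    //   * Prewait only announces intent to block; it must not consume a signal.
    //   * CancelWait must not consume a signal meant for another waiter.
    void WaitForWork(EventCount& ec, EventCount::Waiter* w, RunQueue& q) {
      ec.Prewait();            // publish "I am about to block"
      if (!q.Empty()) {        // re-check for work submitted concurrently
        ec.CancelWait();       // abort; leave any pending signal for its real owner
        return;                // go run the discovered work
      }
      ec.CommitWait(w);        // block until a Notify() wakes this waiter
    }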
* Fix a data race in NonBlockingThreadPool (Eugene Zhulenev, 2019-03-11)
|
* Merge. (Rasmus Munk Larsen, 2019-03-06)
|\
* | Add macro EIGEN_AVOID_THREAD_LOCAL to make it possible to manually disable the use of thread_local (Rasmus Munk Larsen, 2019-03-06)
| * Add missing return to NonBlockingThreadPool::LocalSteal (Eugene Zhulenev, 2019-03-06)
| |
| * Remove redundant steal loop (Eugene Zhulenev, 2019-03-06)
|/
* Add an extra check for the RunQueue size estimate (Eugene Zhulenev, 2019-03-05)
|
* Improve EventCount used by the non-blocking threadpool. (Rasmus Munk Larsen, 2019-02-22)
    The current algorithm requires threads to commit/cancel waiting in the order they called Prewait. The spinning caused by that serialization can consume a lot of CPU time on some workloads. Restructure the algorithm so it does not require that serialization and remove the spin waits from CommitWait/CancelWait. Note: this reduces the maximum number of threads from 2^16 to 2^14 to leave more space for the ABA counter (which is now 22 bits). Implementation details are explained in comments.
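For illustration, the kind of bit layout the note implies for the single atomic state word, assuming three 14-bit fields plus a 22-bit ABA epoch (field names and ordering here are a sketch, not the verbatim header):

    #include <cstdint>

    // 14 + 14 + 14 + 22 = 64 bits: waiter stack slot, waiter count,
    // pending signal count, and an ABA epoch packed into one word.
    constexpr uint64_t kWaiterBits  = 14;                    // at most 2^14 threads
    constexpr uint64_t kStackMask   = (uint64_t{1} << kWaiterBits) - 1;
    constexpr uint64_t kWaiterShift = kWaiterBits;
    constexpr uint64_t kWaiterMask  = kStackMask << kWaiterShift;
    constexpr uint64_t kSignalShift = 2 * kWaiterBits;
    constexpr uint64_t kSignalMask  = kStackMask << kSignalShift;
    constexpr uint64_t kEpochShift  = 3 * kWaiterBits;
    constexpr uint64_t kEpochBits   = 64 - kEpochShift;      // 22 bits left for the ABA counter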
* Fix signed-unsigned return in RunQueue (Eugene Zhulenev, 2019-02-14)
|
* Fix signed-unsigned comparison warning in RunQueue (Eugene Zhulenev, 2019-02-14)
|
* Speed up Tensor ThreadPool RunQueue::Empty() (Eugene Zhulenev, 2019-02-13)
|
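One plausible way to speed up an Empty() check on a work-stealing queue, sketched with hypothetical atomic front/back indices (this shows the general snapshot pattern, not the actual RunQueue change):

    #include <atomic>

    // Empty() without taking the queue mutex: read front, back, front again;
    // if front did not move, the (front, back) pair is a consistent snapshot.
    bool QueueEmptySketch(const std::atomic<unsigned>& front,
                          const std::atomic<unsigned>& back) {
      unsigned f = front.load(std::memory_order_acquire);
      for (;;) {
        const unsigned b  = back.load(std::memory_order_acquire);
        const unsigned f2 = front.load(std::memory_order_acquire);
        if (f == f2) return f == b;  // consistent snapshot: empty iff indices match
        f = f2;                      // front moved; retry with the new value
      }
    }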
* A few small fixes to a) prevent throwing in ctors and dtors of the threading code, and b) support matrix exponential on platforms with 113 bits of mantissa for long doubles (Rasmus Munk Larsen, 2018-11-09)
* Provide EIGEN_OVERRIDE and EIGEN_FINAL macros to mark virtual function overrides (Christoph Hertzberg, 2018-09-24)
|
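A plausible shape for such macros, written here with illustrative names rather than the verbatim Eigen definitions: expand to the C++11 keywords when available and to nothing otherwise, so annotated overrides stay compatible with C++03 builds.

    // Expand to the C++11 keywords when available, to nothing otherwise.
    #if __cplusplus >= 201103L
      #define EXAMPLE_OVERRIDE override
      #define EXAMPLE_FINAL final
    #else
      #define EXAMPLE_OVERRIDE
      #define EXAMPLE_FINAL
    #endif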
* Fix shadowing of last and all (Gael Guennebaud, 2018-09-21)
|
* Cast to longer type. (Rasmus Munk Larsen, 2018-09-19)
|
* Silence compiler warning. (Rasmus Munk Larsen, 2018-09-19)
|
* Silence compiler warnings in ThreadPoolInterface.h. (Rasmus Munk Larsen, 2018-09-19)
|
* Collapsed revision (Ravi Kiran, 2018-09-17)
    * Merged eigen/eigen into default
* bug #1598: Let MaxSizeVector respect alignment of objects and add a unit test (Christoph Hertzberg, 2018-09-14)
    Also revert 8b3d9ed081fc5d4870290649853b19cb5179546e
* MSVC 2015 supports C++11 thread-local storage (Gael Guennebaud, 2018-09-13)
|
* Use padding instead of alignment attribute, which MaxSizeVector does not respect. This leads to undefined behavior and hard-to-trace bugs. (Rasmus Munk Larsen, 2018-09-05)
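A sketch of the padding idea under the assumption of 64-byte cache lines (the struct and field are placeholders, not the real per-thread layout): hand-written padding keeps each element on its own cache line even inside a container that ignores alignas.

    #include <cstddef>

    struct PaddedSlot {
      void* state = nullptr;          // the actual per-thread data
      char pad[64 - sizeof(void*)];   // fill the rest of one cache line
    };
    static_assert(sizeof(PaddedSlot) == 64,
                  "each slot occupies exactly one 64-byte cache line");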
* Address comments about EIGEN_THREAD_LOCAL. (Rasmus Munk Larsen, 2018-08-24)
|
* Fix g++ compilation. (Rasmus Munk Larsen, 2018-08-23)
|
* Don't rely on __has_feature for g++. (Rasmus Munk Larsen, 2018-08-23)
    Don't use __thread. Only use thread_local for gcc 4.8 or newer.
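Roughly the kind of guard this describes, written with an illustrative macro name rather than Eigen's real one: enable thread_local only on g++ >= 4.8 (or when a compiler advertises the feature), otherwise fall back to the emulated per-thread map from the commits below.

    #if defined(__GNUC__) && !defined(__clang__)
      // g++: trust the version number, not __has_feature.
      #if __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 8)
        #define EXAMPLE_THREAD_LOCAL thread_local
      #endif
    #elif defined(__has_feature)
      #if __has_feature(cxx_thread_local)
        #define EXAMPLE_THREAD_LOCAL thread_local
      #endif
    #endif
    // If EXAMPLE_THREAD_LOCAL is still undefined, use the hash-map emulation.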
* Pad PerThread when we emulate thread_local to prevent false sharing. (Rasmus Munk Larsen, 2018-08-23)
|
* Rename mu. (Rasmus Munk Larsen, 2018-08-23)
|
* Store std::unique_ptr instead of raw pointers in per_thread_map_. (Rasmus Munk Larsen, 2018-08-23)
|
* merge (Rasmus Munk Larsen, 2018-08-23)
|\
| * Replace pointers by values or unique_ptr for better leak-safety (Christoph Hertzberg, 2018-08-23)
| |
* | Use plain_assert in destructors to avoid throwing in CXX11 tests where main.h overwrites eigen_assert with a throwing version (Rasmus Munk Larsen, 2018-08-14)
* | Add Barrier.h. (Rasmus Munk Larsen, 2018-08-13)
| |
* | Add support for thread-local storage on platforms that do not support it, through emulation using a hash map (Rasmus Munk Larsen, 2018-08-13)
|/
* Remove SimpleThreadPool and always use {NonBlocking}ThreadPool (Eugene Zhulenev, 2018-07-16)
|
* Fix typos found using codespell (Gael Guennebaud, 2018-06-07)
|
* Fixed compilation warning (Benoit Steiner, 2017-07-06)
|
* Get rid of Init(). (Rasmus Munk Larsen, 2017-03-10)
|
* Use C++11 ctor forwarding to simplify code a bit. (Rasmus Munk Larsen, 2017-03-10)
|
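A small illustration of the C++11 constructor-forwarding (delegating constructor) pattern the commit refers to; the class and its parameters are made up:

    class PoolConfigExample {
     public:
      // Default construction delegates to the main constructor.
      PoolConfigExample() : PoolConfigExample(/*num_threads=*/1, /*allow_spinning=*/true) {}
      PoolConfigExample(int num_threads, bool allow_spinning)
          : num_threads_(num_threads), allow_spinning_(allow_spinning) {}

     private:
      int num_threads_;
      bool allow_spinning_;
    };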
* Make the non-blocking threadpool more flexible and less wasteful of CPU cycles for high-latency use-cases (Rasmus Munk Larsen, 2017-03-09)
    * Add a hint to ThreadPool allowing us to turn off spin waiting. Currently each reader and record yielder op in a graph creates a threadpool with a thread that spins for 1000 iterations through the work stealing loop before yielding. This is wasteful for such ops, which process I/O.
    * Change the number of iterations through the steal loop to be inversely proportional to the number of threads. Since the time of each iteration is proportional to the number of threads, this yields a roughly constant spin time.
    * Implement a separate worker loop for the num_threads == 1 case, since there is no point in going through the expensive steal loop. Moreover, since Steal() calls PopBack() on the victim queues, it might reverse the order in which ops are executed compared to the order in which they are scheduled, which is usually counter-productive for the types of I/O workloads single-threaded pools tend to be used for.
    * Store num_threads in a member variable for simplicity and to avoid a data race between the thread creation loop and worker threads calling threads_.size().
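A hedged sketch of the second bullet above (the constant and function name are invented): scale the steal-loop spin budget inversely with the thread count, and skip spinning entirely when the hint disables it or the pool has a single thread.

    int StealSpinIterations(bool allow_spinning, int num_threads) {
      if (!allow_spinning || num_threads <= 1) return 0;  // I/O pools or single thread: no spinning
      const int kSpinBudget = 5000;                       // hypothetical total budget
      // Each steal iteration visits O(num_threads) queues, so dividing keeps
      // the total spin time roughly constant as the pool grows.
      return kSpinBudget / num_threads;
    }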
* Don't call EnvThread::OnCancel by default since it doesn't do anything. (Benoit Steiner, 2016-12-14)
|
* Made ThreadPoolInterface::Cancel() an optional functionality (Benoit Steiner, 2016-12-12)
|
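One common way to make an interface method optional, shown as a sketch rather than the exact Eigen declaration: give the virtual a default empty body so existing implementations keep compiling without overriding it.

    #include <functional>

    class ThreadPoolInterfaceSketch {
     public:
      virtual ~ThreadPoolInterfaceSketch() {}
      virtual void Schedule(std::function<void()> fn) = 0;  // mandatory
      virtual void Cancel() {}                              // optional: default is a no-op
    };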
* Reworked the threadpool cancellation mechanism to not depend on pthread_cancel, since it turns out that pthread_cancel doesn't work properly on numerous platforms. (Benoit Steiner, 2016-12-09)
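A minimal sketch of the portable alternative to pthread_cancel (names are illustrative): workers poll an atomic flag between tasks instead of being cancelled asynchronously.

    #include <atomic>
    #include <functional>

    class CancellableLoop {
     public:
      void Cancel() { cancelled_.store(true, std::memory_order_relaxed); }

      // Runs tasks until Cancel() is observed; get_task returns an empty
      // std::function when no work is currently available.
      void Run(const std::function<std::function<void()>()>& get_task) {
        while (!cancelled_.load(std::memory_order_relaxed)) {
          if (std::function<void()> task = get_task()) task();
        }
      }

     private:
      std::atomic<bool> cancelled_{false};
    };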
* Added a Flush method to the RunQueue (Benoit Steiner, 2016-12-08)
|
* Added the new threadpool cancel method to the threadpool interface base class. (Benoit Steiner, 2016-12-08)
|
* Added support for thread cancellation on Linux (Benoit Steiner, 2016-12-08)
|
* Properly size the list of waiters (Benoit Steiner, 2016-09-12)
|