path: root/unsupported/Eigen/CXX11/src/ThreadPool
Commit message (Author, Date)
* Fix Eigen::ThreadPool::CurrentThreadId returning wrong thread id when EIGEN_AVOID_THREAD_LOCAL and NDEBUG are defined (Zhuyie, 2020-09-25)
* Avoid a division in NonBlockingThreadPool::Steal. (Ilya Tokar, 2020-02-14)
    Profiles show that roughly 10-20% of Steal() is spent simply computing random % size. A random 32-bit integer can instead be reduced into the [0, size) range with a single multiplication and shift. The transformation is described in https://lemire.me/blog/2016/06/27/a-fast-alternative-to-the-modulo-reduction/
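A minimal sketch of the multiply-and-shift reduction referenced above; the function name and parameters are illustrative, not the actual Steal() internals:

    #include <cstdint>

    // Map a uniform random 32-bit value into [0, size) without a division:
    // take the upper 32 bits of the 64-bit product rnd * size.
    inline uint32_t FastRange32(uint32_t rnd, uint32_t size) {
      return static_cast<uint32_t>((static_cast<uint64_t>(rnd) * size) >> 32);
    }

    // e.g. victim = FastRange32(rng(), num_queues) instead of rng() % num_queues.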
* Update ThreadLocal to use separate Initialize/Release callables (Eugene Zhulenev, 2019-09-10)
|
* ThreadLocal container that does not rely on thread local storage (Eugene Zhulenev, 2019-09-09)
|
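A rough sketch of the idea behind the two commits above, assuming a mutex-protected hash map keyed by thread id; the class name and the Initialize/Release signatures are hypothetical, not Eigen's actual ThreadLocal interface:

    #include <mutex>
    #include <thread>
    #include <unordered_map>

    // Per-thread storage that works even when real TLS is unavailable:
    // each thread's slot is created lazily via `init` and released when
    // the container itself is destroyed.
    template <typename T, typename Initialize, typename Release>
    class EmulatedThreadLocal {
     public:
      EmulatedThreadLocal(Initialize init, Release release)
          : init_(init), release_(release) {}

      ~EmulatedThreadLocal() {
        for (auto& kv : per_thread_) release_(kv.second);
      }

      T& local() {
        const std::thread::id tid = std::this_thread::get_id();
        std::lock_guard<std::mutex> lock(mu_);
        auto it = per_thread_.find(tid);
        if (it == per_thread_.end()) it = per_thread_.emplace(tid, init_()).first;
        return it->second;
      }

     private:
      std::mutex mu_;
      std::unordered_map<std::thread::id, T> per_thread_;
      Initialize init_;
      Release release_;
    };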
* evalSubExprsIfNeededAsync + async TensorContractionThreadPool (Eugene Zhulenev, 2019-08-30)
|
* A) Fix deadlocks in thread pool caused by EventCount (Rasmus Munk Larsen, 2019-05-08)
    This fixes two deadlocks caused by sloppiness in the EventCount logic. Both were most likely introduced by cl/236729920, which includes the new EventCount algorithm: https://github.com/eigenteam/eigen-git-mirror/commit/01da8caf003990967e42a2b9dc3869f154569538

    Bug 1 (Prewait): Prewait must not consume existing signals. Consider the following scenario. There are two thread pool threads (1 and 2) and one external thread (3). The RunQueue is empty. Thread 1 checks the queue, calls Prewait, checks the RunQueue again and is now about to call CommitWait. Thread 2 checks the queue and is about to call Prewait. Thread 3 submits two tasks; the EventCount signal count is set to 1 because only one waiter is registered (the second signal is discarded). Now thread 2 resumes, calls Prewait and takes away the signal. Thread 1 resumes and calls CommitWait; there are no pending signals anymore, so it blocks. As a result we have two tasks but only one thread running.

    Bug 2 (CancelWait): CancelWait must not take away a signal if it is not sure that the signal was meant for this thread. When one thread blocks and another submits a new task concurrently, the EventCount protocol guarantees only the following properties (similar to Dekker's algorithm): (a) the registered waiter notices the presence of the new task and does not block; (b) the signaler notices the presence of the waiter and wakes it; or (c) both the waiter notices the new task and the signaler notices the waiter. The only impossible outcome is that neither notices the other, because that would lead to a deadlock. CancelWait is called for cases (a) and (c). For case (c) it is OK to take the notification signal away, but it is not OK for (a), because nobody queued a signal for us and we would take away a signal meant for somebody else. Consider: Thread 1 calls Prewait, checks the RunQueue, finds it empty, and is about to call CommitWait. Thread 3 submits two tasks; the EventCount signal count is set to 1 because only one waiter is registered (the second signal is discarded). Thread 2 calls Prewait, checks the RunQueue, discovers the tasks, calls CancelWait and consumes the pending signal (meant for thread 1). Now thread 1 resumes and calls CommitWait; since there are no signals, it blocks. As a result we have two tasks but only one thread running.

    Both deadlocks are only a problem if the tasks require parallelism. Most computational tasks do not: a single thread will run task 1, finish it, and then dequeue and run task 2. This fix undoes some of the sloppiness in the EventCount that was meant to reduce CPU consumption by idle threads, because we now have more threads running in these corner cases. But we still don't have pthread_yield, and maybe the strictness introduced by this change will actually help reduce tail latency, because threads will be running when we actually need them.

    B) Fix deadlock in thread pool caused by RunQueue
    This fixes a deadlock caused by sloppiness in the RunQueue logic, most likely introduced with the non-blocking thread pool. The deadlock only affects workloads that require parallelism; most computational tasks do not.

    PopBack must not fail spuriously. If it does, it can effectively lead to a single thread consuming several wake-up signals. Consider: two worker threads are blocked. An external thread submits a task. One of the threads is woken. It tries to steal the task but fails due to a spurious failure in PopBack (the external thread submits another task and holds the lock). The thread executes the blocking protocol again (it won't block because NonEmptyQueueIndex is precise and the thread will discover the pending work, but it has already called PrepareWait). Now the external thread submits another task and signals the EventCount again. The signal is consumed by the first thread again. Now we have two tasks pending but only one worker thread running.

    It may be possible to fix this in a different way: make EventCount::CancelWait forward the wakeup signal to a blocked thread rather than consuming it. But this looks more complex, and I am not 100% sure it would fix the bug. It is also possible to have two versions of PopBack: one that does try_to_lock and one that doesn't. Worker threads could then first opportunistically check all queues with try_to_lock, and only use the blocking version before blocking. But let's first fix the bug with the simpler change.
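For orientation, the worker-side blocking protocol that this fix tightens, as a hedged sketch (the EventCount, Waiter, and RunQueue names below are schematic, not the exact Eigen signatures):

    // Worker-side blocking protocol around an event count.
    // The invariants restored by the fix:
    //   * Prewait only announces intent to block; it must not consume a signal.
    //   * CancelWait must not consume a signal meant for another waiter.
    void WaitForWork(EventCount& ec, EventCount::Waiter* w, RunQueue& q) {
      ec.Prewait();            // publish "I am about to block"
      if (!q.Empty()) {        // re-check for work submitted concurrently
        ec.CancelWait();       // abort; leave any pending signal for its real owner
        return;                // go run the discovered work
      }
      ec.CommitWait(w);        // block until a Notify() wakes this waiter
    }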
* Fix a data race in NonBlockingThreadPool (Eugene Zhulenev, 2019-03-11)
|
* Merge. (Rasmus Munk Larsen, 2019-03-06)
|\
* | Add macro EIGEN_AVOID_THREAD_LOCAL to make it possible to manually disable the use of thread_local (Rasmus Munk Larsen, 2019-03-06)
| * Add missing return to NonBlockingThreadPool::LocalSteal (Eugene Zhulenev, 2019-03-06)
| |
| * Remove redundant steal loop (Eugene Zhulenev, 2019-03-06)
|/
* Add an extra check for the RunQueue size estimate (Eugene Zhulenev, 2019-03-05)
|
* Improve EventCount used by the non-blocking threadpool. (Rasmus Munk Larsen, 2019-02-22)
    The current algorithm requires threads to commit/cancel waiting in the order they called Prewait. The spinning caused by that serialization can consume a lot of CPU time on some workloads. Restructure the algorithm so it does not require that serialization and remove the spin waits from CommitWait/CancelWait. Note: this reduces the maximum number of threads from 2^16 to 2^14 to leave more space for the ABA counter (which is now 22 bits). Implementation details are explained in comments.
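For illustration, the kind of bit layout the note implies for the single atomic state word, assuming three 14-bit fields plus a 22-bit ABA epoch (field names and ordering here are a sketch, not the verbatim header):

    #include <cstdint>

    // 14 + 14 + 14 + 22 = 64 bits: waiter stack slot, waiter count,
    // pending signal count, and an ABA epoch packed into one word.
    constexpr uint64_t kWaiterBits  = 14;                    // at most 2^14 threads
    constexpr uint64_t kStackMask   = (uint64_t{1} << kWaiterBits) - 1;
    constexpr uint64_t kWaiterShift = kWaiterBits;
    constexpr uint64_t kWaiterMask  = kStackMask << kWaiterShift;
    constexpr uint64_t kSignalShift = 2 * kWaiterBits;
    constexpr uint64_t kSignalMask  = kStackMask << kSignalShift;
    constexpr uint64_t kEpochShift  = 3 * kWaiterBits;
    constexpr uint64_t kEpochBits   = 64 - kEpochShift;      // 22 bits left for the ABA counter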
* Fix signed-unsigned return in RunQueue (Eugene Zhulenev, 2019-02-14)
|
* Fix signed-unsigned comparison warning in RunQueue (Eugene Zhulenev, 2019-02-14)
|
* Speed up Tensor ThreadPool RunQueue::Empty() (Eugene Zhulenev, 2019-02-13)
|
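One plausible way to speed up an Empty() check on a work-stealing queue, sketched with hypothetical atomic front/back indices (this shows the general snapshot pattern, not the actual RunQueue change):

    #include <atomic>

    // Empty() without taking the queue mutex: read front, back, front again;
    // if front did not move, the (front, back) pair is a consistent snapshot.
    bool QueueEmptySketch(const std::atomic<unsigned>& front,
                          const std::atomic<unsigned>& back) {
      unsigned f = front.load(std::memory_order_acquire);
      for (;;) {
        const unsigned b  = back.load(std::memory_order_acquire);
        const unsigned f2 = front.load(std::memory_order_acquire);
        if (f == f2) return f == b;  // consistent snapshot: empty iff indices match
        f = f2;                      // front moved; retry with the new value
      }
    }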
* A few small fixes to a) prevent throwing in ctors and dtors of the threading code, and b) support matrix exponential on platforms with 113 bits of mantissa for long doubles (Rasmus Munk Larsen, 2018-11-09)
* Provide EIGEN_OVERRIDE and EIGEN_FINAL macros to mark virtual function overrides (Christoph Hertzberg, 2018-09-24)
|
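A plausible shape for such macros, written here with illustrative names rather than the verbatim Eigen definitions: expand to the C++11 keywords when available and to nothing otherwise, so annotated overrides stay compatible with C++03 builds.

    // Expand to the C++11 keywords when available, to nothing otherwise.
    #if __cplusplus >= 201103L
      #define EXAMPLE_OVERRIDE override
      #define EXAMPLE_FINAL final
    #else
      #define EXAMPLE_OVERRIDE
      #define EXAMPLE_FINAL
    #endif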
* Fix shadowing of last and all (Gael Guennebaud, 2018-09-21)
|
* Cast to longer type. (Rasmus Munk Larsen, 2018-09-19)
|
* Silence compiler warning. (Rasmus Munk Larsen, 2018-09-19)
|
* Silence compiler warnings in ThreadPoolInterface.h. (Rasmus Munk Larsen, 2018-09-19)
|
* Collapsed revision (Ravi Kiran, 2018-09-17)
    * Merged eigen/eigen into default
* bug #1598: Let MaxSizeVector respect alignment of objects and add a unit test (Christoph Hertzberg, 2018-09-14)
    Also revert 8b3d9ed081fc5d4870290649853b19cb5179546e
* MSVC 2015 supports C++11 thread-local storage (Gael Guennebaud, 2018-09-13)
|
* Use padding instead of alignment attribute, which MaxSizeVector does not respect. This leads to undefined behavior and hard-to-trace bugs. (Rasmus Munk Larsen, 2018-09-05)
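A sketch of the padding idea under the assumption of 64-byte cache lines (the struct and field are placeholders, not the real per-thread layout): hand-written padding keeps each element on its own cache line even inside a container that ignores alignas.

    #include <cstddef>

    struct PaddedSlot {
      void* state = nullptr;          // the actual per-thread data
      char pad[64 - sizeof(void*)];   // fill the rest of one cache line
    };
    static_assert(sizeof(PaddedSlot) == 64,
                  "each slot occupies exactly one 64-byte cache line");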
* Address comments about EIGEN_THREAD_LOCAL. (Rasmus Munk Larsen, 2018-08-24)
|
* Fix g++ compilation. (Rasmus Munk Larsen, 2018-08-23)
|
* Don't rely on __has_feature for g++. (Rasmus Munk Larsen, 2018-08-23)
    Don't use __thread. Only use thread_local for gcc 4.8 or newer.
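Roughly the kind of guard this describes, written with an illustrative macro name rather than Eigen's real one: enable thread_local only on g++ >= 4.8 (or when a compiler advertises the feature), otherwise fall back to the emulated per-thread map from the commits below.

    #if defined(__GNUC__) && !defined(__clang__)
      // g++: trust the version number, not __has_feature.
      #if __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 8)
        #define EXAMPLE_THREAD_LOCAL thread_local
      #endif
    #elif defined(__has_feature)
      #if __has_feature(cxx_thread_local)
        #define EXAMPLE_THREAD_LOCAL thread_local
      #endif
    #endif
    // If EXAMPLE_THREAD_LOCAL is still undefined, use the hash-map emulation.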
* Pad PerThread when we emulate thread_local to prevent false sharing. (Rasmus Munk Larsen, 2018-08-23)
|
* Rename mu. (Rasmus Munk Larsen, 2018-08-23)
|
* Store std::unique_ptr instead of raw pointers in per_thread_map_. (Rasmus Munk Larsen, 2018-08-23)
|
* merge (Rasmus Munk Larsen, 2018-08-23)
|\
| * Replace pointers by values or unique_ptr for better leak-safety (Christoph Hertzberg, 2018-08-23)
| |
* | Use plain_assert in destructors to avoid throwing in CXX11 tests where main.h overwrites eigen_assert with a throwing version (Rasmus Munk Larsen, 2018-08-14)
* | Add Barrier.h. (Rasmus Munk Larsen, 2018-08-13)
| |
* | Add support for thread-local storage on platforms that do not support it, through emulation using a hash map (Rasmus Munk Larsen, 2018-08-13)
|/
* Remove SimpleThreadPool and always use {NonBlocking}ThreadPool (Eugene Zhulenev, 2018-07-16)
|
* Fix typos found using codespell (Gael Guennebaud, 2018-06-07)
|
* Fixed compilation warning (Benoit Steiner, 2017-07-06)
|
* Get rid of Init(). (Rasmus Munk Larsen, 2017-03-10)
|
* Use C++11 ctor forwarding to simplify code a bit. (Rasmus Munk Larsen, 2017-03-10)
|
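A small illustration of the C++11 constructor-forwarding (delegating constructor) pattern the commit refers to; the class and its parameters are made up:

    class PoolConfigExample {
     public:
      // Default construction delegates to the main constructor.
      PoolConfigExample() : PoolConfigExample(/*num_threads=*/1, /*allow_spinning=*/true) {}
      PoolConfigExample(int num_threads, bool allow_spinning)
          : num_threads_(num_threads), allow_spinning_(allow_spinning) {}

     private:
      int num_threads_;
      bool allow_spinning_;
    };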
* Make the non-blocking threadpool more flexible and less wasteful of CPU cycles for high-latency use-cases (Rasmus Munk Larsen, 2017-03-09)
    * Add a hint to ThreadPool allowing us to turn off spin waiting. Currently each reader and record yielder op in a graph creates a threadpool with a thread that spins for 1000 iterations through the work stealing loop before yielding. This is wasteful for such ops, which process I/O.
    * Change the number of iterations through the steal loop to be inversely proportional to the number of threads. Since the time of each iteration is proportional to the number of threads, this yields a roughly constant spin time.
    * Implement a separate worker loop for the num_threads == 1 case, since there is no point in going through the expensive steal loop. Moreover, since Steal() calls PopBack() on the victim queues, it might reverse the order in which ops are executed compared to the order in which they are scheduled, which is usually counter-productive for the types of I/O workloads single-threaded pools tend to be used for.
    * Store num_threads in a member variable for simplicity and to avoid a data race between the thread creation loop and worker threads calling threads_.size().
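A hedged sketch of the second bullet above (the constant and function name are invented): scale the steal-loop spin budget inversely with the thread count, and skip spinning entirely when the hint disables it or the pool has a single thread.

    int StealSpinIterations(bool allow_spinning, int num_threads) {
      if (!allow_spinning || num_threads <= 1) return 0;  // I/O pools or single thread: no spinning
      const int kSpinBudget = 5000;                       // hypothetical total budget
      // Each steal iteration visits O(num_threads) queues, so dividing keeps
      // the total spin time roughly constant as the pool grows.
      return kSpinBudget / num_threads;
    }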
* Don't call EnvThread::OnCancel by default since it doesn't do anything. (Benoit Steiner, 2016-12-14)
|
* Made ThreadPoolInterface::Cancel() an optional functionality (Benoit Steiner, 2016-12-12)
|
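One common way to make an interface method optional, shown as a sketch rather than the exact Eigen declaration: give the virtual a default empty body so existing implementations keep compiling without overriding it.

    #include <functional>

    class ThreadPoolInterfaceSketch {
     public:
      virtual ~ThreadPoolInterfaceSketch() {}
      virtual void Schedule(std::function<void()> fn) = 0;  // mandatory
      virtual void Cancel() {}                              // optional: default is a no-op
    };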
* Reworked the threadpool cancellation mechanism to not depend on pthread_cancel, since it turns out that pthread_cancel doesn't work properly on numerous platforms. (Benoit Steiner, 2016-12-09)
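A minimal sketch of the portable alternative to pthread_cancel (names are illustrative): workers poll an atomic flag between tasks instead of being cancelled asynchronously.

    #include <atomic>
    #include <functional>

    class CancellableLoop {
     public:
      void Cancel() { cancelled_.store(true, std::memory_order_relaxed); }

      // Runs tasks until Cancel() is observed; get_task returns an empty
      // std::function when no work is currently available.
      void Run(const std::function<std::function<void()>()>& get_task) {
        while (!cancelled_.load(std::memory_order_relaxed)) {
          if (std::function<void()> task = get_task()) task();
        }
      }

     private:
      std::atomic<bool> cancelled_{false};
    };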
* Added a Flush method to the RunQueue (Benoit Steiner, 2016-12-08)
|
* Added the new threadpool cancel method to the threadpool interface base class. (Benoit Steiner, 2016-12-08)
|
* Added support for thread cancellation on Linux (Benoit Steiner, 2016-12-08)
|
* Properly size the list of waiters (Benoit Steiner, 2016-09-12)
|