author    | Abseil Team <absl-team@google.com> | 2021-01-08 09:10:22 -0800
committer | Derek Mauro <dmauro@google.com>    | 2021-01-08 12:50:29 -0500
commit    | 62ce712ecc887f669610a93efe18abecf70b47a0 (patch)
tree      | 21ef34fc27e09d5ff51baad57d27b9ec033f9388 /absl/flags
parent    | 92ba53599931fcbe31e7970497cb9e60091434c1 (diff)
Export of internal Abseil changes
--
b927776da818c674a674e46a7bbbdd54170a0ad3 by Todd Lipcon <tlipcon@google.com>:
Include priority in the calculation of mutex waiter equivalence
This changes the behavior of the absl::Mutex wait list to take waiter
priority into account when creating "skip chains". A skip chain on the wait
list is a run of adjacent waiters that share some property, which enables
skipping over the whole run during traversal.
Prior to this CL, the skip chains were formed of waiters with the same
wait type (e.g. exclusive vs read) and Condition. With this CL, the priority
is also taken into account.
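To make the new equivalence class concrete, here is a minimal C++ sketch of
the check (the struct and names are illustrative, reconstructed from the
description above rather than copied from mutex.cc):

// Illustrative only, not the actual mutex.cc code: two waiters may share a
// skip chain iff they agree on wait type, Condition, and (after this CL)
// priority.
struct WaiterKey {
  bool exclusive;         // exclusive vs. shared (reader) lock request
  const void* condition;  // identity of the absl::Condition, if any
  int priority;           // scheduling priority of the waiting thread
};

bool EquivalentWaiters(const WaiterKey& x, const WaiterKey& y) {
  return x.exclusive == y.exclusive && x.condition == y.condition &&
         x.priority == y.priority;  // priority is the newly added term
}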
This avoids O(n) behavior when enqueueing a waiter onto a wait list where
the oldest waiter is at a lower priority than the waiter to be enqueued.
With the prior notion of equivalence class, a skip chain could contain
waiters of different priority, so we had to walk the linked list one-by-one
until finding the appropriate insertion point. With the new equivalence
class computation, we can skip past all of the equivalent waiters to find
the right insertion point.
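A sketch of why this helps (hypothetical node layout; the real wait queue in
mutex.cc is a circular list with more state): because every waiter in a
chain now shares one priority, the enqueue walk can jump chain by chain
instead of waiter by waiter.

// Hypothetical node, for illustration only: `skip`, when set on the head of
// a chain, points at the last waiter of that equivalence chain.
struct WaiterNode {
  WaiterNode* next = nullptr;
  WaiterNode* skip = nullptr;
  int priority = 0;
};

// Find where a waiter of `incoming_priority` belongs in a list ordered by
// decreasing priority. Each loop step consumes a whole chain, so the cost
// is O(#chains) rather than O(#waiters).
WaiterNode* FindInsertionPoint(WaiterNode* head, int incoming_priority) {
  WaiterNode* w = head;
  while (w != nullptr && w->priority >= incoming_priority) {
    // Safe only because priority is part of the equivalence class: the
    // whole chain shares w->priority, so no valid insertion point is
    // skipped.
    w = (w->skip != nullptr) ? w->skip->next : w->next;
  }
  return w;  // insert before w (or at the tail if w is null)
}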
This gives a substantial improvement to the enqueue performance in the
case where there's already a waiter at lower priority.
Note that even though this code path isn't a hot one, it's performed while
holding the Mutex's spinlock, which prevents other threads from unlocking
the Mutex, so minimizing the time under the critical section can have
"knock-on" throughput benefits.
Notable performance differences:
name old cpu/op new cpu/op delta
BM_MutexEnqueue/multiple_priorities:0/threads:4 8.60µs ± 7% 8.69µs ± 6% ~ (p=0.365 n=19+20)
BM_MutexEnqueue/multiple_priorities:0/threads:64 8.47µs ± 5% 8.64µs ±10% ~ (p=0.569 n=19+20)
BM_MutexEnqueue/multiple_priorities:0/threads:128 8.56µs ± 3% 8.55µs ± 6% ~ (p=0.563 n=17+17)
BM_MutexEnqueue/multiple_priorities:0/threads:512 8.98µs ± 8% 8.86µs ± 4% ~ (p=0.232 n=19+17)
BM_MutexEnqueue/multiple_priorities:1/threads:4 6.64µs ±10% 6.45µs ± 4% ~ (p=0.097 n=20+17)
BM_MutexEnqueue/multiple_priorities:1/threads:64 15.2µs ± 8% 9.1µs ± 4% -39.93% (p=0.000 n=20+17)
BM_MutexEnqueue/multiple_priorities:1/threads:128 22.3µs ± 6% 9.4µs ± 4% -57.82% (p=0.000 n=20+17)
BM_MutexEnqueue/multiple_priorities:1/threads:512 61.5µs ± 3% 10.1µs ± 8% -83.53% (p=0.000 n=20+20)
name old time/op new time/op delta
BM_Mutex/real_time/threads:1 19.6ns ± 4% 19.8ns ±11% ~ (p=0.534 n=17+17)
BM_Mutex/real_time/threads:112 120ns ±17% 122ns ±14% ~ (p=0.988 n=20+18)
BM_MutexEnqueue/multiple_priorities:0/threads:4 5.18µs ± 6% 5.23µs ± 6% ~ (p=0.428 n=19+20)
BM_MutexEnqueue/multiple_priorities:0/threads:64 5.06µs ± 5% 5.18µs ±10% ~ (p=0.235 n=19+20)
BM_MutexEnqueue/multiple_priorities:0/threads:128 5.16µs ± 3% 5.14µs ± 6% ~ (p=0.474 n=17+17)
BM_MutexEnqueue/multiple_priorities:0/threads:512 5.40µs ± 8% 5.32µs ± 5% ~ (p=0.196 n=20+18)
BM_MutexEnqueue/multiple_priorities:1/threads:4 3.99µs ±10% 3.88µs ± 3% ~ (p=0.074 n=20+17)
BM_MutexEnqueue/multiple_priorities:1/threads:64 8.48µs ± 9% 5.41µs ± 3% -36.20% (p=0.000 n=20+16)
BM_MutexEnqueue/multiple_priorities:1/threads:128 12.2µs ± 6% 5.6µs ± 4% -54.43% (p=0.000 n=20+17)
BM_MutexEnqueue/multiple_priorities:1/threads:512 32.1µs ± 3% 5.9µs ± 8% -81.45% (p=0.000 n=20+20)
...
BM_Contended<absl::Mutex>/cs_ns:2000/num_prios:2/real_time/threads:32 1.69µs ± 4% 1.66µs ± 2% -1.91% (p=0.000 n=20+20)
BM_Contended<absl::Mutex>/cs_ns:2000/num_prios:2/real_time/threads:48 1.90µs ± 2% 1.82µs ± 2% -4.09% (p=0.000 n=20+19)
BM_Contended<absl::Mutex>/cs_ns:2000/num_prios:2/real_time/threads:64 2.19µs ± 2% 1.80µs ± 1% -17.89% (p=0.000 n=20+20)
BM_Contended<absl::Mutex>/cs_ns:2000/num_prios:2/real_time/threads:96 2.18µs ± 5% 1.81µs ± 1% -16.94% (p=0.000 n=17+19)
BM_Contended<absl::Mutex>/cs_ns:2000/num_prios:2/real_time/threads:128 2.18µs ± 1% 1.91µs ± 2% -12.33% (p=0.000 n=19+20)
BM_Contended<absl::Mutex>/cs_ns:2000/num_prios:2/real_time/threads:192 2.27µs ± 2% 1.89µs ± 1% -16.79% (p=0.000 n=20+19)
BM_Contended<absl::Mutex>/cs_ns:2000/num_prios:2/real_time/threads:256 2.36µs ± 2% 1.83µs ± 1% -22.25% (p=0.000 n=20+19)
PiperOrigin-RevId: 350775432
--
e7812590e5dbd75d21e2e8762713bd04c0353ef6 by Todd Lipcon <tlipcon@google.com>:
Fix test timeouts for sequence_lock_test on TSAN
PiperOrigin-RevId: 350680903
--
3090d8154d875f3eabce48876321ae8d6a197302 by Todd Lipcon <tlipcon@google.com>:
Add benchmarks for Mutex performance with multiple priorities
This adds a new benchmark to mutex_benchmark which forces threads to go
through the slow "Enqueue" path. The benchmark runs with varying numbers
of threads and with/without the presence of a lower-priority waiter.
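The sketch below shows roughly how a benchmark can force that slow path
(hypothetical code, not the actual BM_MutexEnqueue from mutex_benchmark.cc):
each thread blocks in LockWhen until it is its round-robin turn, so waiters
pile up on the Mutex's wait list. Modeling the lower-priority waiter would
additionally require platform thread-priority calls, which this sketch
omits.

// Hypothetical sketch, not absl's actual BM_MutexEnqueue.
#include "absl/synchronization/mutex.h"
#include "benchmark/benchmark.h"

namespace {

absl::Mutex mu;
int turn = 0;  // guarded by mu

struct TurnArg {
  int who;
  int nthreads;
};

// Evaluated by the Mutex machinery while mu is held.
bool IsMyTurn(TurnArg* a) { return turn % a->nthreads == a->who; }

// Each thread may only proceed on its turn; all other threads block inside
// LockWhen and must be enqueued on mu's wait list (the slow Enqueue path).
void BM_EnqueueSketch(benchmark::State& state) {
  TurnArg arg{state.thread_index(), state.threads()};
  for (auto _ : state) {
    mu.LockWhen(absl::Condition(IsMyTurn, &arg));
    ++turn;  // hand the turn to the next thread's condition
    mu.Unlock();
  }
}
BENCHMARK(BM_EnqueueSketch)->Threads(4)->Threads(64)->Threads(128);

}  // namespace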
PiperOrigin-RevId: 350655403
GitOrigin-RevId: b927776da818c674a674e46a7bbbdd54170a0ad3
Change-Id: If739e5e205f0d3867661a52466b8f64e7e033b22
Diffstat (limited to 'absl/flags')
-rw-r--r-- | absl/flags/internal/sequence_lock_test.cc | 23
1 file changed, 16 insertions(+), 7 deletions(-)
diff --git a/absl/flags/internal/sequence_lock_test.cc b/absl/flags/internal/sequence_lock_test.cc
index 9aff1edc..ff8b476b 100644
--- a/absl/flags/internal/sequence_lock_test.cc
+++ b/absl/flags/internal/sequence_lock_test.cc
@@ -13,6 +13,7 @@
 // limitations under the License.
 
 #include "absl/flags/internal/sequence_lock.h"
+#include <algorithm>
 #include <atomic>
 #include <thread>  // NOLINT(build/c++11)
 #include <tuple>
@@ -112,13 +113,21 @@ std::vector<int> MultiplicativeRange(int low, int high, int scale) {
   return result;
 }
 
-INSTANTIATE_TEST_SUITE_P(TestManyByteSizes, ConcurrentSequenceLockTest,
-                         testing::Combine(
-                             // Buffer size (bytes).
-                             testing::Range(1, 128),
-                             // Number of reader threads.
-                             testing::ValuesIn(MultiplicativeRange(
-                                 1, absl::base_internal::NumCPUs(), 2))));
+#ifndef ABSL_HAVE_THREAD_SANITIZER
+const int kMaxThreads = absl::base_internal::NumCPUs();
+#else
+// With TSAN, a lot of threads contending for atomic access on the sequence
+// lock make this test run too slowly.
+const int kMaxThreads = std::min(absl::base_internal::NumCPUs(), 4);
+#endif
+
+INSTANTIATE_TEST_SUITE_P(
+    TestManyByteSizes, ConcurrentSequenceLockTest,
+    testing::Combine(
+        // Buffer size (bytes).
+        testing::Range(1, 128),
+        // Number of reader threads.
+        testing::ValuesIn(MultiplicativeRange(1, kMaxThreads, 2))));
 
 // Simple single-threaded test, parameterized by the size of the buffer to be
 // protected.