From 62ce712ecc887f669610a93efe18abecf70b47a0 Mon Sep 17 00:00:00 2001
From: Abseil Team
Date: Fri, 8 Jan 2021 09:10:22 -0800
Subject: Export of internal Abseil changes
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

--
b927776da818c674a674e46a7bbbdd54170a0ad3 by Todd Lipcon:

Include priority in the calculation of mutex waiter equivalence

This changes the behavior of the absl::Mutex wait list to take into
account waiter priority when creating "skip chains". A skip chain on the
wait list is a set of adjacent waiters that share some property and
enable skipping during traversal. Prior to this CL, the skip chains were
formed of waiters with the same wait type (e.g. exclusive vs read) and
Condition. With this CL, the priority is also taken into account.

This avoids O(n) behavior when enqueueing a waiter onto a wait list
where the oldest waiter is at a lower priority than the waiter to be
enqueued. With the prior notion of equivalence class, a skip chain could
contain waiters of different priority, so we had to walk the linked list
one-by-one until finding the appropriate insertion point. With the new
equivalence class computation, we can skip past all of the equivalent
waiters to find the right insertion point.

This gives a substantial improvement to the enqueue performance in the
case where there's already a waiter at lower priority.

Note that even though this code path isn't a hot one, it's performed
while holding the Mutex's spinlock, which prevents other threads from
unlocking the Mutex, so minimizing the time under the critical section
can have "knock-on" throughput benefits.

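For illustration only, the change in equivalence described above can be
sketched as follows. This is a simplified model under stated
assumptions, not the code in absl/synchronization: the Waiter struct,
EquivalentOld, EquivalentNew and CountSkipChains are hypothetical names,
and a skip chain is modeled as a maximal run of adjacent equivalent
waiters in a vector rather than in the real linked list.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-in for a Mutex waiter (not absl's real type).
struct Waiter {
  bool exclusive;    // wait type: exclusive (writer) vs. shared (reader)
  const void* cond;  // identity of the awaited Condition, or nullptr for none
  int priority;      // scheduling priority of the waiting thread
};

// Equivalence as described before the change: wait type and Condition only.
bool EquivalentOld(const Waiter& a, const Waiter& b) {
  return a.exclusive == b.exclusive && a.cond == b.cond;
}

// Equivalence as described after the change: priority must match as well.
bool EquivalentNew(const Waiter& a, const Waiter& b) {
  return EquivalentOld(a, b) && a.priority == b.priority;
}

// Counts maximal runs of adjacent waiters that are pairwise equivalent under
// the given predicate. Each run models one skip chain.
template <typename Eq>
std::size_t CountSkipChains(const std::vector<Waiter>& list, Eq equivalent) {
  std::size_t chains = 0;
  for (std::size_t i = 0; i < list.size(); ++i) {
    // A new chain starts at the head, or wherever equivalence breaks.
    if (i == 0 || !equivalent(list[i - 1], list[i])) ++chains;
  }
  return chains;
}
```

In this model, a list of exclusive waiters with priorities 1, 1, 0 and
no Condition forms one chain under EquivalentOld, so that chain mixes
priorities and cannot be skipped as a unit when searching by priority.
Under EquivalentNew the same list forms two chains, each of a single
priority.
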
Notable performance differences:

name                                                old cpu/op   new cpu/op    delta
BM_MutexEnqueue/multiple_priorities:0/threads:4    8.60µs ± 7%  8.69µs ± 6%        ~  (p=0.365 n=19+20)
BM_MutexEnqueue/multiple_priorities:0/threads:64   8.47µs ± 5%  8.64µs ±10%        ~  (p=0.569 n=19+20)
BM_MutexEnqueue/multiple_priorities:0/threads:128  8.56µs ± 3%  8.55µs ± 6%        ~  (p=0.563 n=17+17)
BM_MutexEnqueue/multiple_priorities:0/threads:512  8.98µs ± 8%  8.86µs ± 4%        ~  (p=0.232 n=19+17)
BM_MutexEnqueue/multiple_priorities:1/threads:4    6.64µs ±10%  6.45µs ± 4%        ~  (p=0.097 n=20+17)
BM_MutexEnqueue/multiple_priorities:1/threads:64   15.2µs ± 8%   9.1µs ± 4%  -39.93%  (p=0.000 n=20+17)
BM_MutexEnqueue/multiple_priorities:1/threads:128  22.3µs ± 6%   9.4µs ± 4%  -57.82%  (p=0.000 n=20+17)
BM_MutexEnqueue/multiple_priorities:1/threads:512  61.5µs ± 3%  10.1µs ± 8%  -83.53%  (p=0.000 n=20+20)

name                                               old time/op  new time/op    delta
BM_Mutex/real_time/threads:1                       19.6ns ± 4%  19.8ns ±11%        ~  (p=0.534 n=17+17)
BM_Mutex/real_time/threads:112                      120ns ±17%   122ns ±14%        ~  (p=0.988 n=20+18)
BM_MutexEnqueue/multiple_priorities:0/threads:4    5.18µs ± 6%  5.23µs ± 6%        ~  (p=0.428 n=19+20)
BM_MutexEnqueue/multiple_priorities:0/threads:64   5.06µs ± 5%  5.18µs ±10%        ~  (p=0.235 n=19+20)
BM_MutexEnqueue/multiple_priorities:0/threads:128  5.16µs ± 3%  5.14µs ± 6%        ~  (p=0.474 n=17+17)
BM_MutexEnqueue/multiple_priorities:0/threads:512  5.40µs ± 8%  5.32µs ± 5%        ~  (p=0.196 n=20+18)
BM_MutexEnqueue/multiple_priorities:1/threads:4    3.99µs ±10%  3.88µs ± 3%        ~  (p=0.074 n=20+17)
BM_MutexEnqueue/multiple_priorities:1/threads:64   8.48µs ± 9%  5.41µs ± 3%  -36.20%  (p=0.000 n=20+16)
BM_MutexEnqueue/multiple_priorities:1/threads:128  12.2µs ± 6%   5.6µs ± 4%  -54.43%  (p=0.000 n=20+17)
BM_MutexEnqueue/multiple_priorities:1/threads:512  32.1µs ± 3%   5.9µs ± 8%  -81.45%  (p=0.000 n=20+20)
...
BM_Contended/cs_ns:2000/num_prios:2/real_time/threads:32   1.69µs ± 4%  1.66µs ± 2%   -1.91%  (p=0.000 n=20+20)
BM_Contended/cs_ns:2000/num_prios:2/real_time/threads:48   1.90µs ± 2%  1.82µs ± 2%   -4.09%  (p=0.000 n=20+19)
BM_Contended/cs_ns:2000/num_prios:2/real_time/threads:64   2.19µs ± 2%  1.80µs ± 1%  -17.89%  (p=0.000 n=20+20)
BM_Contended/cs_ns:2000/num_prios:2/real_time/threads:96   2.18µs ± 5%  1.81µs ± 1%  -16.94%  (p=0.000 n=17+19)
BM_Contended/cs_ns:2000/num_prios:2/real_time/threads:128  2.18µs ± 1%  1.91µs ± 2%  -12.33%  (p=0.000 n=19+20)
BM_Contended/cs_ns:2000/num_prios:2/real_time/threads:192  2.27µs ± 2%  1.89µs ± 1%  -16.79%  (p=0.000 n=20+19)
BM_Contended/cs_ns:2000/num_prios:2/real_time/threads:256  2.36µs ± 2%  1.83µs ± 1%  -22.25%  (p=0.000 n=20+19)

PiperOrigin-RevId: 350775432

--
e7812590e5dbd75d21e2e8762713bd04c0353ef6 by Todd Lipcon:

Fix test timeouts for sequence_lock_test on TSAN

PiperOrigin-RevId: 350680903

--
3090d8154d875f3eabce48876321ae8d6a197302 by Todd Lipcon:

Add benchmarks for Mutex performance with multiple priorities

This adds a new benchmark to mutex_benchmark which forces threads to go
through the slow "Enqueue" path. The benchmark runs with varying numbers
of threads and with/without the presence of a lower-priority waiter.

PiperOrigin-RevId: 350655403
GitOrigin-RevId: b927776da818c674a674e46a7bbbdd54170a0ad3
Change-Id: If739e5e205f0d3867661a52466b8f64e7e033b22
---
 absl/flags/internal/sequence_lock_test.cc | 23 ++++++++++++++++-------
 1 file changed, 16 insertions(+), 7 deletions(-)

(limited to 'absl/flags')

diff --git a/absl/flags/internal/sequence_lock_test.cc b/absl/flags/internal/sequence_lock_test.cc
index 9aff1edc..ff8b476b 100644
--- a/absl/flags/internal/sequence_lock_test.cc
+++ b/absl/flags/internal/sequence_lock_test.cc
@@ -13,6 +13,7 @@
 // limitations under the License.
 #include "absl/flags/internal/sequence_lock.h"
 
+#include <algorithm>
 #include <atomic>
 #include <thread>  // NOLINT(build/c++11)
 #include <tuple>
@@ -112,13 +113,21 @@ std::vector<int> MultiplicativeRange(int low, int high, int scale) {
   return result;
 }
 
-INSTANTIATE_TEST_SUITE_P(TestManyByteSizes, ConcurrentSequenceLockTest,
-                         testing::Combine(
-                             // Buffer size (bytes).
-                             testing::Range(1, 128),
-                             // Number of reader threads.
-                             testing::ValuesIn(MultiplicativeRange(
-                                 1, absl::base_internal::NumCPUs(), 2))));
+#ifndef ABSL_HAVE_THREAD_SANITIZER
+const int kMaxThreads = absl::base_internal::NumCPUs();
+#else
+// With TSAN, a lot of threads contending for atomic access on the sequence
+// lock make this test run too slowly.
+const int kMaxThreads = std::min(absl::base_internal::NumCPUs(), 4);
+#endif
+
+INSTANTIATE_TEST_SUITE_P(
+    TestManyByteSizes, ConcurrentSequenceLockTest,
+    testing::Combine(
+        // Buffer size (bytes).
+        testing::Range(1, 128),
+        // Number of reader threads.
+        testing::ValuesIn(MultiplicativeRange(1, kMaxThreads, 2))));
 
 // Simple single-threaded test, parameterized by the size of the buffer to be
 // protected.
-- 
cgit v1.2.3