From 62ce712ecc887f669610a93efe18abecf70b47a0 Mon Sep 17 00:00:00 2001
From: Abseil Team <absl-team@google.com>
Date: Fri, 8 Jan 2021 09:10:22 -0800
Subject: Export of internal Abseil changes
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

--
b927776da818c674a674e46a7bbbdd54170a0ad3 by Todd Lipcon <tlipcon@google.com>:

Include priority in the calculation of mutex waiter equivalence

This changes the behavior of the absl::Mutex wait list to take into account
waiter priority when creating "skip chains". A skip chain on the wait list
is a set of adjacent waiters that share some property and enable skipping
during traversal.

Prior to this CL, a skip chain consisted of adjacent waiters with the same
wait type (exclusive vs. read) and the same Condition. With this CL, the
waiters in a chain must also have the same priority.

This avoids O(n) behavior when enqueueing a waiter onto a wait list where
the oldest waiter is at a lower priority than the waiter to be enqueued.
With the prior notion of equivalence class, a skip chain could contain
waiters of different priority, so we had to walk the linked list one-by-one
until finding the appropriate insertion point. With the new equivalence
class computation, we can skip past all of the equivalent waiters to find
the right insertion point.
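
As a rough illustration of the paragraph above, here is a minimal sketch of
priority-aware skip chains. It is not the absl::Mutex implementation: the
Waiter and WaitList types, the Equivalent() helper, and the null-terminated
singly linked list are all assumptions made for this example, and Condition
is left out of the equivalence check for brevity.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified waiter record (an assumption for this sketch,
// not Abseil's internal per-thread structure).
struct Waiter {
  Waiter(int p, bool e) : priority(p), exclusive(e) {}
  int priority;
  bool exclusive;
  Waiter* next = nullptr;
  // Meaningful only on the first waiter of a skip chain: points at the
  // last waiter of that chain.
  Waiter* skip = nullptr;
};

// Equivalence as described in this CL: same wait type AND same priority.
// (The Condition comparison is omitted in this sketch.)
inline bool Equivalent(const Waiter& a, const Waiter& b) {
  return a.exclusive == b.exclusive && a.priority == b.priority;
}

struct WaitList {
  Waiter* head = nullptr;  // Highest priority first, FIFO within a priority.

  // Inserts w behind every waiter whose priority is >= w->priority.
  // Returns how many skip chains were hopped over to find that point.
  std::size_t Enqueue(Waiter* w) {
    assert(w != nullptr);
    std::size_t hops = 0;
    Waiter* chain_head = nullptr;  // First waiter of the last chain passed.
    Waiter* chain_tail = nullptr;  // Last waiter of the last chain passed.
    Waiter* cur = head;            // Always the first waiter of some chain.
    while (cur != nullptr && cur->priority >= w->priority) {
      // Every waiter in this chain has cur's priority, so one hop is safe.
      chain_head = cur;
      chain_tail = cur->skip;
      cur = chain_tail->next;
      ++hops;
    }
    w->next = cur;
    w->skip = w;  // Until proven otherwise, w is a chain of one.
    if (chain_tail == nullptr) {
      head = w;
    } else {
      chain_tail->next = w;
    }
    if (chain_head != nullptr && Equivalent(*chain_head, *w)) {
      chain_head->skip = w;  // w extends the chain it was appended to.
    }
    return hops;
  }
};
```

In this sketch, enqueueing a run of same-priority waiters ahead of an older,
lower-priority waiter costs one hop per enqueue regardless of how long the
run is. If Equivalent() ignored priority, a chain could mix priorities and the
single hop in the loop would no longer be safe, forcing a node-by-node walk.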

This substantially improves enqueue performance when a lower-priority
waiter is already queued: in BM_MutexEnqueue with multiple_priorities:1,
cpu/op drops by about 40% at 64 threads and about 84% at 512 threads.

Note that even though this code path isn't a hot one, it runs while
holding the Mutex's spinlock, which prevents other threads from unlocking
the Mutex, so minimizing the time spent in that critical section can have
"knock-on" throughput benefits.

Notable performance differences:

name                                                                    old cpu/op  new cpu/op  delta
BM_MutexEnqueue/multiple_priorities:0/threads:4                         8.60µs ± 7%  8.69µs ± 6%     ~     (p=0.365 n=19+20)
BM_MutexEnqueue/multiple_priorities:0/threads:64                        8.47µs ± 5%  8.64µs ±10%     ~     (p=0.569 n=19+20)
BM_MutexEnqueue/multiple_priorities:0/threads:128                       8.56µs ± 3%  8.55µs ± 6%     ~     (p=0.563 n=17+17)
BM_MutexEnqueue/multiple_priorities:0/threads:512                       8.98µs ± 8%  8.86µs ± 4%     ~     (p=0.232 n=19+17)
BM_MutexEnqueue/multiple_priorities:1/threads:4                         6.64µs ±10%  6.45µs ± 4%     ~     (p=0.097 n=20+17)
BM_MutexEnqueue/multiple_priorities:1/threads:64                        15.2µs ± 8%   9.1µs ± 4%  -39.93%  (p=0.000 n=20+17)
BM_MutexEnqueue/multiple_priorities:1/threads:128                       22.3µs ± 6%   9.4µs ± 4%  -57.82%  (p=0.000 n=20+17)
BM_MutexEnqueue/multiple_priorities:1/threads:512                       61.5µs ± 3%  10.1µs ± 8%  -83.53%  (p=0.000 n=20+20)

name                                                                    old time/op             new time/op             delta
BM_Mutex/real_time/threads:1                                            19.6ns ± 4%             19.8ns ±11%     ~           (p=0.534 n=17+17)
BM_Mutex/real_time/threads:112                                           120ns ±17%              122ns ±14%     ~           (p=0.988 n=20+18)
BM_MutexEnqueue/multiple_priorities:0/threads:4                         5.18µs ± 6%             5.23µs ± 6%     ~           (p=0.428 n=19+20)
BM_MutexEnqueue/multiple_priorities:0/threads:64                        5.06µs ± 5%             5.18µs ±10%     ~           (p=0.235 n=19+20)
BM_MutexEnqueue/multiple_priorities:0/threads:128                       5.16µs ± 3%             5.14µs ± 6%     ~           (p=0.474 n=17+17)
BM_MutexEnqueue/multiple_priorities:0/threads:512                       5.40µs ± 8%             5.32µs ± 5%     ~           (p=0.196 n=20+18)
BM_MutexEnqueue/multiple_priorities:1/threads:4                         3.99µs ±10%             3.88µs ± 3%     ~           (p=0.074 n=20+17)
BM_MutexEnqueue/multiple_priorities:1/threads:64                        8.48µs ± 9%             5.41µs ± 3%  -36.20%        (p=0.000 n=20+16)
BM_MutexEnqueue/multiple_priorities:1/threads:128                       12.2µs ± 6%              5.6µs ± 4%  -54.43%        (p=0.000 n=20+17)
BM_MutexEnqueue/multiple_priorities:1/threads:512                       32.1µs ± 3%              5.9µs ± 8%  -81.45%        (p=0.000 n=20+20)
...
BM_Contended<absl::Mutex>/cs_ns:2000/num_prios:2/real_time/threads:32   1.69µs ± 4%             1.66µs ± 2%   -1.91%        (p=0.000 n=20+20)
BM_Contended<absl::Mutex>/cs_ns:2000/num_prios:2/real_time/threads:48   1.90µs ± 2%             1.82µs ± 2%   -4.09%        (p=0.000 n=20+19)
BM_Contended<absl::Mutex>/cs_ns:2000/num_prios:2/real_time/threads:64   2.19µs ± 2%             1.80µs ± 1%  -17.89%        (p=0.000 n=20+20)
BM_Contended<absl::Mutex>/cs_ns:2000/num_prios:2/real_time/threads:96   2.18µs ± 5%             1.81µs ± 1%  -16.94%        (p=0.000 n=17+19)
BM_Contended<absl::Mutex>/cs_ns:2000/num_prios:2/real_time/threads:128  2.18µs ± 1%             1.91µs ± 2%  -12.33%        (p=0.000 n=19+20)
BM_Contended<absl::Mutex>/cs_ns:2000/num_prios:2/real_time/threads:192  2.27µs ± 2%             1.89µs ± 1%  -16.79%        (p=0.000 n=20+19)
BM_Contended<absl::Mutex>/cs_ns:2000/num_prios:2/real_time/threads:256  2.36µs ± 2%             1.83µs ± 1%  -22.25%        (p=0.000 n=20+19)

PiperOrigin-RevId: 350775432

--
e7812590e5dbd75d21e2e8762713bd04c0353ef6 by Todd Lipcon <tlipcon@google.com>:

Fix test timeouts for sequence_lock_test on TSAN

PiperOrigin-RevId: 350680903

--
3090d8154d875f3eabce48876321ae8d6a197302 by Todd Lipcon <tlipcon@google.com>:

Add benchmarks for Mutex performance with multiple priorities

This adds a new benchmark to mutex_benchmark which forces threads to go
through the slow "Enqueue" path. The benchmark runs with varying numbers
of threads, both with and without a lower-priority waiter present.

PiperOrigin-RevId: 350655403
GitOrigin-RevId: b927776da818c674a674e46a7bbbdd54170a0ad3
Change-Id: If739e5e205f0d3867661a52466b8f64e7e033b22
---
 absl/flags/internal/sequence_lock_test.cc | 23 ++++++++++++++++-------
 1 file changed, 16 insertions(+), 7 deletions(-)

(limited to 'absl/flags')

diff --git a/absl/flags/internal/sequence_lock_test.cc b/absl/flags/internal/sequence_lock_test.cc
index 9aff1edc..ff8b476b 100644
--- a/absl/flags/internal/sequence_lock_test.cc
+++ b/absl/flags/internal/sequence_lock_test.cc
@@ -13,6 +13,7 @@
 // limitations under the License.
 #include "absl/flags/internal/sequence_lock.h"
 
+#include <algorithm>
 #include <atomic>
 #include <thread>  // NOLINT(build/c++11)
 #include <tuple>
@@ -112,13 +113,21 @@ std::vector<int> MultiplicativeRange(int low, int high, int scale) {
   return result;
 }
 
-INSTANTIATE_TEST_SUITE_P(TestManyByteSizes, ConcurrentSequenceLockTest,
-                         testing::Combine(
-                             // Buffer size (bytes).
-                             testing::Range(1, 128),
-                             // Number of reader threads.
-                             testing::ValuesIn(MultiplicativeRange(
-                                 1, absl::base_internal::NumCPUs(), 2))));
+#ifndef ABSL_HAVE_THREAD_SANITIZER
+const int kMaxThreads = absl::base_internal::NumCPUs();
+#else
+// With TSAN, a lot of threads contending for atomic access on the sequence
+// lock make this test run too slowly.
+const int kMaxThreads = std::min(absl::base_internal::NumCPUs(), 4);
+#endif
+
+INSTANTIATE_TEST_SUITE_P(
+    TestManyByteSizes, ConcurrentSequenceLockTest,
+    testing::Combine(
+        // Buffer size (bytes).
+        testing::Range(1, 128),
+        // Number of reader threads.
+        testing::ValuesIn(MultiplicativeRange(1, kMaxThreads, 2))));
 
 // Simple single-threaded test, parameterized by the size of the buffer to be
 // protected.
-- 
cgit v1.2.3