| Commit message | Author | Age |
|
|
|
|
| |
PiperOrigin-RevId: 571929102
Change-Id: I8fb907d047a4ff3bb42e7c7f99454fa687b8f1c9
|
|
|
|
|
| |
PiperOrigin-RevId: 571487219
Change-Id: I6fbb2ff19db2b6d77e55059004d65c8639eb7fca
|
|
|
|
|
| |
PiperOrigin-RevId: 571430428
Change-Id: I4777c37c5287d26a75f37fe059324ac218878f0e
|
|
|
|
|
| |
PiperOrigin-RevId: 571418371
Change-Id: Ie650a4e8c7a9fbb022b1d27e6800765b59fcfc0c
|
|
|
|
|
| |
PiperOrigin-RevId: 571347014
Change-Id: I716ca435128081f0e9b0434143103df579256f50
|
|
|
|
|
| |
PiperOrigin-RevId: 571322393
Change-Id: I0e227b0075d3133ee28c8f766a1be7872c101176
|
|
|
|
|
|
|
| |
This should make it more efficient to pass absl::Status parameters and return values, allowing them to be passed in a register.
PiperOrigin-RevId: 571213728
Change-Id: I2a0183aedc08c270d0af0e7a30a07590ea116896
|
|
|
|
|
| |
PiperOrigin-RevId: 571084409
Change-Id: I4e6c98ac11f4cb40b65cc9484188faa6168718b4
|
|
|
|
|
|
|
|
|
| |
and those that call it can be inlined sufficiently far
to mess up high-level skip-counts. But this function
assumes it is the bottommost frame, as in the comment below.
PiperOrigin-RevId: 570790048
Change-Id: I4d354f9e79e13aaa6a8a62a9e0870fbeac075de6
|
|
|
|
|
|
|
|
|
| |
including sanitizer mode checks.
Sanitizer mode can be used for canaries so performance is still relevant. This change also makes the code more uniform.
PiperOrigin-RevId: 570438923
Change-Id: I62859160eb9323e6420680a43fd23e97e8a62389
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
propagation and improve performance.
Correctness:
- We use swap to implement copy assignment and move assignment, which means that allocator propagation in copy/move assignment depends on `propagate_on_container_swap` in addition to `propagate_on_container_copy_assignment`/`propagate_on_container_move_assignment`.
- In swap, if `propagate_on_container_swap` is `false` and `get_allocator() != other.get_allocator()`, the behavior is undefined (https://en.cppreference.com/w/cpp/container/unordered_set/swap) - we should assert that this UB case isn't happening. For this reason, we also delete the NoPropagateOn_Swap test case in raw_hash_set_allocator_test.
Performance:
- Don't rely on swap so we don't have to do unnecessary copying into the moved-from sets.
- Don't use temp sets in move assignment.
- Default the move constructor of CommonFields.
- Avoid using exchange in raw_hash_set move constructor.
- In `raw_hash_set(raw_hash_set&& that, const allocator_type& a)` with unequal allocators and in move assignment with non-propagating unequal allocators, move set keys instead of copying them.
PiperOrigin-RevId: 570419290
Change-Id: I499e54f17d9cb0b0836601f5c06187d1f269a5b8
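The swap precondition described in the commit above can be sketched as a small predicate. This is a minimal illustration, not Abseil's implementation: `TaggedAlloc` and `SwapIsWellDefined` are hypothetical names, and only `std::allocator_traits` is a real library facility.

```cpp
#include <cstddef>
#include <memory>
#include <type_traits>

// Hypothetical stateful allocator that opts out of propagation on swap.
template <typename T>
struct TaggedAlloc {
  using value_type = T;
  using propagate_on_container_swap = std::false_type;

  explicit TaggedAlloc(int t = 0) : tag(t) {}
  template <typename U>
  TaggedAlloc(const TaggedAlloc<U>& other) : tag(other.tag) {}

  T* allocate(std::size_t n) { return std::allocator<T>().allocate(n); }
  void deallocate(T* p, std::size_t n) { std::allocator<T>().deallocate(p, n); }

  friend bool operator==(const TaggedAlloc& a, const TaggedAlloc& b) {
    return a.tag == b.tag;
  }
  friend bool operator!=(const TaggedAlloc& a, const TaggedAlloc& b) {
    return !(a == b);
  }

  int tag;  // Two allocators compare equal only if their tags match.
};

// Hypothetical helper: returns true when swapping two containers that use
// these allocators has defined behavior. Either the allocators propagate on
// swap, or they must already compare equal.
template <typename Alloc>
bool SwapIsWellDefined(const Alloc& a, const Alloc& b) {
  using Traits = std::allocator_traits<Alloc>;
  return Traits::propagate_on_container_swap::value || a == b;
}
```

A container could assert this predicate at the top of its `swap`, which is one way to catch the undefined case in debug builds.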
|
|
|
|
|
|
|
| |
This CL updates the link provided in the comment to point to a valid website. Currently the link points to https://screenshot.googleplex.com/BZhRp6mNJAtjMmz, which is now a software company landing page.
PiperOrigin-RevId: 570384723
Change-Id: Ib6d17851046125957e092b59d845ddb7ecb1f7b7
|
|
|
|
|
| |
PiperOrigin-RevId: 570180405
Change-Id: If14b21a4d0df19546a47923a1f2a359b38fe6f93
|
|
|
|
|
|
|
| |
We test for `ABSL_INTERNAL_HAS_RTTI` in `absl::container_internal::TypeName` before calling `typeid`.
PiperOrigin-RevId: 570101013
Change-Id: I1f2f9b2f475a6beae50d0b88718b17b296311155
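A guard of the kind described in the commit above might look like the following sketch. `SKETCH_HAS_RTTI` and `TypeNameSketch` are hypothetical stand-ins, and the fallback string is an assumption; the detection relies on `__GXX_RTTI` (GCC, Clang) and `_CPPRTTI` (MSVC), which those compilers define when RTTI is enabled.

```cpp
#include <string>
#include <typeinfo>

// Assumption: a stand-in for ABSL_INTERNAL_HAS_RTTI, defined only when the
// compiler reports that RTTI is enabled.
#if defined(__GXX_RTTI) || defined(_CPPRTTI)
#define SKETCH_HAS_RTTI 1
#endif

// Hypothetical helper: returns an implementation-defined name for T when RTTI
// is available, and a placeholder otherwise, so that typeid is never evaluated
// in builds compiled without RTTI.
template <typename T>
std::string TypeNameSketch() {
#ifdef SKETCH_HAS_RTTI
  return typeid(T).name();
#else
  return "<unknown>";
#endif
}
```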
|
|
|
|
|
|
|
| |
node_handle
PiperOrigin-RevId: 568997790
Change-Id: I9899ccc95eeb9c8b92d0dceec7e2fc4a2b1102c0
|
|
|
|
|
| |
PiperOrigin-RevId: 568858834
Change-Id: I276efa86259aa425c4b6dff27f037f488a58c9ae
|
|
|
|
|
| |
PiperOrigin-RevId: 568845530
Change-Id: I8987053041423f1e8b122372f63b0a84e05eb594
|
|
|
|
|
| |
PiperOrigin-RevId: 568665135
Change-Id: I42ec9bc6cfe923777f7b60ea032c7b64428493c9
|
|
|
|
|
| |
PiperOrigin-RevId: 568652465
Change-Id: I9f72a11cb514eaf694dae589a19dc139891e7af2
|
|
|
|
|
|
|
| |
Siryn's crc32 instruction seems to have latency 3 and throughput 1, which makes the optimal ratio of pmull and crc streams close to that of tested x86 machines. Up to 120% faster for large inputs.
PiperOrigin-RevId: 568645559
Change-Id: I86b85b1b2a5d4fb3680c516c4c9044238b20fe61
|
|
|
|
|
| |
PiperOrigin-RevId: 568603611
Change-Id: I7a31e0d6336a7235a8dc6eeed5680625cb3b4298
|
|
|
|
|
|
|
| |
This also adds a test for `operator<<`.
PiperOrigin-RevId: 568590367
Change-Id: Ia0ad39cb582e7d24e6c4131827818d8c4b10dfd9
|
|
|
|
|
|
|
| |
the functors passed to it.
PiperOrigin-RevId: 568476251
Change-Id: Ic625c9b5300d1db496979c178ca1e655581f9276
|
|
|
|
|
| |
PiperOrigin-RevId: 567869792
Change-Id: I29948282b57b401f3199dc41160538aa9a8079a7
|
|
|
|
|
|
|
| |
`absl::Cord`.
PiperOrigin-RevId: 567695227
Change-Id: I13eb8a1872d2fe703b5f3b9bc8df7fec4381fb55
|
|
|
|
|
| |
PiperOrigin-RevId: 567415671
Change-Id: I59bfcb5ac9fbde227a4cdb3b497b0bd5969b0770
|
|
|
|
|
| |
This is a temporary workaround for an apparent compiler bug with pmull(2) instructions. The current hot loop looks like this:
mov w14, #0xef02,
lsl x15, x15, #6,
mov x13, xzr,
movk w14, #0x740e, lsl #16,
sub x15, x15, #0x40,
ldr q4, [x16, #0x4e0],
_LOOP_START:
add x16, x9, x13,
add x17, x12, x13,
fmov d19, x14, <--------- This is Loop invariant and expensive
add x13, x13, #0x40,
cmp x15, x13,
prfm pldl1keep, [x16, #0x140],
prfm pldl1keep, [x17, #0x140],
ldp x18, x0, [x16, #0x40],
crc32cx w10, w10, x18,
ldp x2, x18, [x16, #0x50],
crc32cx w10, w10, x0,
crc32cx w10, w10, x2,
ldp x0, x2, [x16, #0x60],
crc32cx w10, w10, x18,
ldp x18, x16, [x16, #0x70],
pmull2 v5.1q, v1.2d, v4.2d,
pmull2 v6.1q, v0.2d, v4.2d,
pmull2 v7.1q, v2.2d, v4.2d,
pmull2 v16.1q, v3.2d, v4.2d,
ldp q17, q18, [x17, #0x40],
crc32cx w10, w10, x0,
pmull v1.1q, v1.1d, v19.1d,
crc32cx w10, w10, x2,
pmull v0.1q, v0.1d, v19.1d,
crc32cx w10, w10, x18,
pmull v2.1q, v2.1d, v19.1d,
crc32cx w10, w10, x16,
pmull v3.1q, v3.1d, v19.1d,
ldp q20, q21, [x17, #0x60],
eor v1.16b, v17.16b, v1.16b,
eor v0.16b, v18.16b, v0.16b,
eor v1.16b, v1.16b, v5.16b,
eor v2.16b, v20.16b, v2.16b,
eor v0.16b, v0.16b, v6.16b,
eor v3.16b, v21.16b, v3.16b,
eor v2.16b, v2.16b, v7.16b,
eor v3.16b, v3.16b, v16.16b,
b.ne _LOOP_START
There is a redundant fmov that moves the same constant into a Neon register every loop iteration to be used in the PMULL instructions. The PMULL2 instructions already have this constant loaded into Neon registers. After this change, both the PMULL and PMULL2 instructions use the values in q4, and they are not reloaded every iteration. This fmov was expensive because it contends for execution units with crc32cx instructions. This is up to 20% faster for large inputs.
PiperOrigin-RevId: 567391972
Change-Id: I4c8e49750cfa5cc5730c3bb713bd9fd67657804a
|
|
|
|
|
|
|
|
|
| |
propagation (defined in test_allocator.h) and minimal alignment.
Also remove some extraneous value_types from typed tests. The motivation is to reduce btree_test compile time.
PiperOrigin-RevId: 567376572
Change-Id: I6ac6130b99faeadaedab8c2c7b05d5e23e77cc1e
|
|
|
|
|
|
|
| |
There are some regressions reported.
PiperOrigin-RevId: 567181925
Change-Id: I4ee8a61afd336de7ecb22ec307adb2068932bc8b
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
raw_hash_set.
`SwisstableDebugEnabled()` is also true for release builds with hardening
enabled. To minimize their impact in those builds:
- use `ABSL_PREDICT_FALSE()` to provide a compiler hint for code layout
- use `ABSL_RAW_LOG()` with a format string to reduce code size and improve
the chances that the hot paths will be inlined.
PiperOrigin-RevId: 567102494
Change-Id: I6734bd491d7b2e1fb9df0e86f4e29e6ad0a03102
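The two techniques listed in the commit above (a branch hint plus a small out-of-line failure routine that takes a format string) can be sketched in plain C++. `SKETCH_PREDICT_FALSE`, `SketchFatal`, and `CheckedDeref` are hypothetical names, and `__builtin_expect` is assumed to be available only on GCC-compatible compilers.

```cpp
#include <cstdio>
#include <cstdlib>

// Assumption: a stand-in for ABSL_PREDICT_FALSE on GCC-compatible compilers.
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_PREDICT_FALSE(x) (__builtin_expect(!!(x), 0))
#else
#define SKETCH_PREDICT_FALSE(x) (x)
#endif

// Cold failure path. Passing a format string and an argument keeps the
// call site small, since no message is assembled inline.
[[noreturn]] void SketchFatal(const char* what) {
  std::fprintf(stderr, "fatal: %s\n", what);
  std::abort();
}

// Hot path: the check is hinted as unlikely, so the compiler can lay out the
// failure branch away from the common case.
int CheckedDeref(const int* p) {
  if (SKETCH_PREDICT_FALSE(p == nullptr)) {
    SketchFatal("dereferencing a null pointer");
  }
  return *p;
}
```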
|
|
|
|
|
| |
PiperOrigin-RevId: 567102456
Change-Id: I0750284c36850adbabc5ec0b4a2635aa8a967e53
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Tidy up Mutex::[Reader]TryLock codegen by outlining the slow path
and the non-tail function call, and by un-unrolling the loop.
Current codegen:
https://gist.githubusercontent.com/dvyukov/a4d353fd71ac873af9332c1340675b60/raw/226537ffa305b25a79ef3a85277fa870fee5191d/gistfile1.txt
New codegen:
https://gist.githubusercontent.com/dvyukov/686a094c5aa357025689764f155e5a29/raw/e3125c1cdb5669fac60faf336e2f60395e29d888/gistfile1.txt
name                                   old cpu/op   new cpu/op   delta
BM_TryLock                             18.0ns ± 0%  17.7ns ± 0%   -1.64%  (p=0.016 n=4+5)
BM_ReaderTryLock/real_time/threads:1   17.9ns ± 0%  17.9ns ± 0%   -0.10%  (p=0.016 n=5+5)
BM_ReaderTryLock/real_time/threads:72  9.61µs ± 8%  8.42µs ± 7%  -12.37%  (p=0.008 n=5+5)
PiperOrigin-RevId: 567006472
Change-Id: Iea0747e71bbf2dc1f00c70a4235203071d795b99
|
|
|
|
|
| |
PiperOrigin-RevId: 566991965
Change-Id: I6c4d64de79d303e69b18330bda04fdc84d40893d
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently ReaderLock/Unlock try the CAS only once.
Even if there is only moderate contention, and only from other readers,
ReaderLock/Unlock take the slow path, which does a lot of additional work
before retrying the CAS (since there are only readers, the slow-path
logic is not really needed for anything).
Retry the CAS while there are only readers.
name                                old cpu/op   new cpu/op   delta
BM_ReaderLock/real_time/threads:1   17.9ns ± 0%  17.9ns ± 0%     ~     (p=0.071 n=5+5)
BM_ReaderLock/real_time/threads:72  11.4µs ± 3%   8.4µs ± 4%  -26.24%  (p=0.008 n=5+5)
PiperOrigin-RevId: 566981511
Change-Id: I432a3c1d85b84943d0ad4776a34fa5bfcf5b3b8e
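The retry policy from the commit above can be illustrated with a standalone `std::atomic` loop. This is a sketch under assumed names and an assumed bit layout (`kWriter`, `kWaiters`, `kOneReader`, `TryReaderLockFast`), not the layout of the real Mutex word.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical lock-word layout for illustration only.
constexpr std::uint64_t kWriter = 1;     // A writer holds the lock.
constexpr std::uint64_t kWaiters = 2;    // Threads are queued on the lock.
constexpr std::uint64_t kOneReader = 4;  // Increment for one more reader.

// Keeps retrying the CAS for as long as the word shows readers only.
// Returns false as soon as a writer or waiter appears, at which point the
// caller would fall back to a slow path.
bool TryReaderLockFast(std::atomic<std::uint64_t>& word) {
  std::uint64_t v = word.load(std::memory_order_relaxed);
  while ((v & (kWriter | kWaiters)) == 0) {
    if (word.compare_exchange_weak(v, v + kOneReader,
                                   std::memory_order_acquire,
                                   std::memory_order_relaxed)) {
      return true;
    }
    // On failure, v holds the freshly observed value, so the loop condition
    // re-checks whether only readers are present before retrying.
  }
  return false;
}
```

A failed CAS caused by another reader changing the count leaves the "readers only" condition true, so the loop simply tries again instead of leaving the fast path.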
|
|
|
|
|
| |
PiperOrigin-RevId: 566961701
Change-Id: Id04e4c5a598f508a0fe7532ae8f084c583865f2d
|
|
|
|
|
| |
PiperOrigin-RevId: 566675048
Change-Id: Ie598c21474858974e4b4adbad401c61a38924c98
|
|
|
|
|
| |
PiperOrigin-RevId: 566650311
Change-Id: Ibfabee88ea9999d08ade05ece362f5a075d19695
|
|
|
|
|
| |
Currently Mutex::Lock contains a non-inlined, non-tail call:
TryAcquireWithSpinning -> GetMutexGlobals -> LowLevelCallOnce -> init closure
This turns the function into a non-leaf function with stack frame allocation
and additional register use. Remove this non-tail call to make the function a leaf.
Move the initialization of spin iterations to LockSlow.
Current Lock happy path:
00000000001edc20 <absl::Mutex::Lock()>:
1edc20: 55 push %rbp
1edc21: 48 89 e5 mov %rsp,%rbp
1edc24: 53 push %rbx
1edc25: 50 push %rax
1edc26: 48 89 fb mov %rdi,%rbx
1edc29: 48 8b 07 mov (%rdi),%rax
1edc2c: a8 19 test $0x19,%al
1edc2e: 75 0e jne 1edc3e <absl::Mutex::Lock()+0x1e>
1edc30: 48 89 c1 mov %rax,%rcx
1edc33: 48 83 c9 08 or $0x8,%rcx
1edc37: f0 48 0f b1 0b lock cmpxchg %rcx,(%rbx)
1edc3c: 74 42 je 1edc80 <absl::Mutex::Lock()+0x60>
... unhappy path ...
1edc80: 48 83 c4 08 add $0x8,%rsp
1edc84: 5b pop %rbx
1edc85: 5d pop %rbp
1edc86: c3 ret
New Lock happy path:
00000000001eea80 <absl::Mutex::Lock()>:
1eea80: 48 8b 07 mov (%rdi),%rax
1eea83: a8 19 test $0x19,%al
1eea85: 75 0f jne 1eea96 <absl::Mutex::Lock()+0x16>
1eea87: 48 89 c1 mov %rax,%rcx
1eea8a: 48 83 c9 08 or $0x8,%rcx
1eea8e: f0 48 0f b1 0f lock cmpxchg %rcx,(%rdi)
1eea93: 75 01 jne 1eea96 <absl::Mutex::Lock()+0x16>
1eea95: c3 ret
... unhappy path ...
PiperOrigin-RevId: 566488042
Change-Id: I62f854b82a322cfb1d42c34f8ed01b4677693fca
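The shape of the new happy path (one load, one test, one CAS, and otherwise a jump into an out-of-line slow path) can be sketched as follows. `SketchLock` is a hypothetical toy spin lock, not Abseil's Mutex. The constants `0x19` and `0x08` mirror the `test` and `or` operands in the disassembly above, but the names given to them here are assumptions.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>

class SketchLock {
 public:
  // Fast path: no calls other than the tail call into LockSlow, so a compiler
  // can keep this function free of stack frame setup.
  void Lock() {
    std::uint64_t v = word_.load(std::memory_order_relaxed);
    if ((v & kBusyMask) == 0 &&
        word_.compare_exchange_strong(v, v | kHeld,
                                      std::memory_order_acquire,
                                      std::memory_order_relaxed)) {
      return;
    }
    LockSlow();  // Tail position: nothing runs after this call.
  }

  void Unlock() { word_.fetch_and(~kHeld, std::memory_order_release); }

  bool IsHeld() const {
    return (word_.load(std::memory_order_relaxed) & kHeld) != 0;
  }

 private:
  static constexpr std::uint64_t kHeld = 0x08;      // Assumed "held" bit.
  static constexpr std::uint64_t kBusyMask = 0x19;  // Assumed "cannot take" mask.

  void LockSlow();  // Defined out of line; any setup work belongs here.

  std::atomic<std::uint64_t> word_{0};
};

// Slow path: a plain spin-and-yield loop stands in for the real queueing logic.
void SketchLock::LockSlow() {
  for (;;) {
    std::uint64_t v = word_.load(std::memory_order_relaxed);
    if ((v & kBusyMask) == 0 &&
        word_.compare_exchange_weak(v, v | kHeld,
                                    std::memory_order_acquire,
                                    std::memory_order_relaxed)) {
      return;
    }
    std::this_thread::yield();
  }
}
```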
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, if a thread has already blocked on a Mutex
but then fails to acquire it, we queue the thread in FIFO order again.
As a result, unlucky threads can suffer bad latency
if they are requeued several times.
The least we can do for them is to queue them in LIFO order after blocking.
PiperOrigin-RevId: 566478783
Change-Id: I8bac08325f20ff6ccc2658e04e1847fd4614c653
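The queueing rule from the commit above can be shown with a toy wait queue. This is a deliberately simplified sketch: `Waiter` and `Enqueue` are hypothetical, and a `std::deque` stands in for the real waiter list, which has more structure than a plain queue.

```cpp
#include <deque>

// Hypothetical waiter record for illustration.
struct Waiter {
  int id;
  bool blocked_before;  // True if this thread already slept on the lock once.
};

// New arrivals join at the back (FIFO). A thread that already blocked and
// then lost the race goes to the front (LIFO), so it is next in line instead
// of waiting behind every thread that arrived in the meantime.
void Enqueue(std::deque<Waiter>& queue, Waiter w) {
  if (w.blocked_before) {
    queue.push_front(w);
  } else {
    queue.push_back(w);
  }
}
```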
|
|
|
|
|
|
|
|
|
|
|
| |
annotation.
This moves the implementation of most methods from absl::Status to absl::status_internal::StatusRep, and ensures that no calls to absl::Status methods are in a cc file.
Stub implementations that check only inlined rep properties and call no-op (RepToPointer) or out-of-line methods exist in status.h.
PiperOrigin-RevId: 566187430
Change-Id: I356ec29c0970ffe82eac2a5d98850e647fcd5ea5
|
|
|
|
|
|
|
|
|
|
|
|
| |
CondVar wait morphing has a special case for timed waits.
The code goes back to 2006; there may have
been some reason to do this back then,
but now it does not seem to be necessary.
Wait morphing should work just fine after timed CondVar waits.
Remove the special case and simplify the code.
PiperOrigin-RevId: 565798838
Change-Id: I4e4d61ae7ebd521f5c32dfc673e57a0c245e7cfb
|
|
|
|
|
|
|
|
|
|
|
| |
flavors of these.
In particular, if ABSL_MIN_LOG_LEVEL exceeds kFatal, these should, upon failure, terminate the program without logging anything. The lack of logging should be visible to the optimizer so that it can strip string literals and stringified variable names from the object file.
Making some edge cases work under Clang required rewriting NormalizeLogSeverity to help make constraints on its return value more obvious to the optimizer.
PiperOrigin-RevId: 565792699
Change-Id: Ibb6a47d4956191bbbd0297e04492cddc354578e2
|
|
|
|
|
|
|
| |
move assignment.
PiperOrigin-RevId: 565730754
Change-Id: Id828847d32c812736669803c179351433dda4aa6
|
|
|
|
|
|
|
| |
that can be shared between different container tests.
PiperOrigin-RevId: 565693736
Change-Id: I59af987e30da03a805ce59ff0fb7eeae3fc08293
|
|
|
|
|
|
|
| |
compatible with AnyInvokable for const uses.
PiperOrigin-RevId: 565682320
Change-Id: I924dadf110481e572bdb8af0111fa62d6f553d90
|
|
|
|
|
| |
1. Remove special handling of Condition::kTrue.
Condition::kTrue is used very rarely (frequently its uses even indicate
confusion and bugs). But we pay a few additional branches for kTrue
on all Condition operations.
Remove that special handling and simplify the logic.
2. Remove the known_false condition in Mutex code.
Checking the known_false condition only causes a slowdown because:
1. We already built the skip list with equivalent conditions
(and keep improving it on every Skip call). And when we built
the skip list, we used the more capable GuaranteedEqual function
(it does not just check for equality of pointers,
but also for equality of function/arg).
2. Condition pointers are rarely equal even for equivalent conditions
because temp Condition objects are usually created on the stack.
We could call GuaranteedEqual(cond, known_false) instead of cond == known_false,
but that slows things down even more (see point 1).
So remove the known_false optimization.
Benchmark results for this and the previous change:
name                        old cpu/op   new cpu/op   delta
BM_ConditionWaiters/0/1     36.0ns ± 0%  34.9ns ± 0%   -3.02%  (p=0.008 n=5+5)
BM_ConditionWaiters/1/1     36.0ns ± 0%  34.9ns ± 0%   -2.98%  (p=0.008 n=5+5)
BM_ConditionWaiters/2/1     35.9ns ± 0%  34.9ns ± 0%   -3.03%  (p=0.016 n=5+4)
BM_ConditionWaiters/0/8     55.5ns ± 5%  49.8ns ± 3%  -10.33%  (p=0.008 n=5+5)
BM_ConditionWaiters/1/8     36.2ns ± 0%  35.2ns ± 0%   -2.90%  (p=0.016 n=5+4)
BM_ConditionWaiters/2/8     53.2ns ± 7%  48.3ns ± 7%     ~     (p=0.056 n=5+5)
BM_ConditionWaiters/0/64     295ns ± 1%   254ns ± 2%  -13.73%  (p=0.008 n=5+5)
BM_ConditionWaiters/1/64    36.2ns ± 0%  35.2ns ± 0%   -2.85%  (p=0.008 n=5+5)
BM_ConditionWaiters/2/64     290ns ± 6%   250ns ± 4%  -13.68%  (p=0.008 n=5+5)
BM_ConditionWaiters/0/512   5.50µs ±12%  4.99µs ± 8%     ~     (p=0.056 n=5+5)
BM_ConditionWaiters/1/512   36.7ns ± 3%  35.2ns ± 0%   -4.10%  (p=0.008 n=5+5)
BM_ConditionWaiters/2/512   4.44µs ±13%  4.01µs ± 3%   -9.74%  (p=0.008 n=5+5)
BM_ConditionWaiters/0/4096   104µs ± 6%   101µs ± 3%     ~     (p=0.548 n=5+5)
BM_ConditionWaiters/1/4096  36.2ns ± 0%  35.1ns ± 0%   -3.03%  (p=0.008 n=5+5)
BM_ConditionWaiters/2/4096  90.4µs ± 5%  85.3µs ± 7%     ~     (p=0.222 n=5+5)
BM_ConditionWaiters/0/8192   384µs ± 5%   367µs ± 7%     ~     (p=0.222 n=5+5)
BM_ConditionWaiters/1/8192  36.2ns ± 0%  35.2ns ± 0%   -2.84%  (p=0.008 n=5+5)
BM_ConditionWaiters/2/8192   363µs ± 3%   316µs ± 7%  -12.84%  (p=0.008 n=5+5)
PiperOrigin-RevId: 565669535
Change-Id: I5180c4a787933d2ce477b004a111853753304684
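Point 2 of the commit above contrasts pointer equality with equality of function and argument. The difference can be seen in a small sketch; `SketchCondition`, `SamePointer`, `SameFunctionAndArg`, and `IsReady` are hypothetical and only approximate the idea behind `GuaranteedEqual`.

```cpp
// Hypothetical, simplified condition: a predicate plus its argument.
struct SketchCondition {
  bool (*fn)(void*);
  void* arg;
};

// Example predicate: reads a bool flag through the opaque argument.
bool IsReady(void* flag) { return *static_cast<bool*>(flag); }

// Pointer comparison: cheap, but two temporaries built on the stack from the
// same predicate and argument live at different addresses and never match.
bool SamePointer(const SketchCondition* a, const SketchCondition* b) {
  return a == b;
}

// Structural comparison: matches whenever both conditions would evaluate the
// same function on the same argument.
bool SameFunctionAndArg(const SketchCondition* a, const SketchCondition* b) {
  if (a == b) return true;
  if (a == nullptr || b == nullptr) return false;
  return a->fn == b->fn && a->arg == b->arg;
}
```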
|
|
|
|
|
| |
PiperOrigin-RevId: 565662176
Change-Id: I18d5d9eb444b0090e3f4ab8f66ad214a67344268
|
|
|
|
|
|
|
| |
semantics.
PiperOrigin-RevId: 565330231
Change-Id: I84f0e9065986bb592b5bfb196b3fc221feb14bc4
|
|
|
|
|
| |
PiperOrigin-RevId: 565050503
Change-Id: I8f4c463be4ef513a2788745d1b454a7ede489152
|
|
|
|
|
| |
PiperOrigin-RevId: 565040001
Change-Id: I1c2e715c97375754c8d863132be2c388265ca4ad
|