| Commit message (Collapse) | Author | Age |
|
|
|
|
|
|
|
| |
the results of a bunch of accessors of CommonFields.
Motivation: this makes it easier to refactor CommonFields to be smaller.
PiperOrigin-RevId: 551616928
Change-Id: I3710443fb156537d716944584bea02f945559e99
|
|
|
|
|
|
|
| |
calling erase(begin(), end()) resets reserved growth.
PiperOrigin-RevId: 551248712
Change-Id: I34755c63e3ee40da4ba7047e0d24eec567d28173
|
|
|
|
|
|
|
|
| |
motivation is to avoid potentially expanding the table unnecessarily later.
Note: I prefer doing this as a special case in erase(iterator, iterator) rather than special casing erase(iterator) for size==1 because IIUC that changes the time complexity of erase(iterator) from O(1) to O(N) and in pathological cases, it could change loops from O(N) to O(N^2).
PiperOrigin-RevId: 549661855
Change-Id: I8603324260f51a98809db32f840ff09f25cf2481
|
|
|
|
|
|
|
| |
corresponding function in raw_hash_set.
PiperOrigin-RevId: 549379884
Change-Id: I305745dbea2c15821b2092441c9b4546fc7aabbe
|
|
|
|
|
| |
PiperOrigin-RevId: 548794485
Change-Id: Ie82d5f8ad752518ef05b38144ca1e32b21c9def8
|
|
|
|
|
|
|
| |
fails.
PiperOrigin-RevId: 545683736
Change-Id: Ic0abec5037e160898e2f24de1b1489caff7d06fd
|
|
|
|
|
|
|
| |
easier to change how we store this information internally.
PiperOrigin-RevId: 544732832
Change-Id: I0c9a30f18edc71b3c7fffe94e2002ff6c52050f8
|
|
|
|
|
|
|
| |
to change how we store this information internally.
PiperOrigin-RevId: 544718317
Change-Id: I0ff3e4df7e810be9f7c5db4328995e172eb704a5
|
|
|
|
|
|
|
| |
how we store this information internally.
PiperOrigin-RevId: 544667586
Change-Id: I9b1943ca71ea1c1f8699832422cd7bc095ac8b2d
|
|
|
|
| |
https://google.github.io/styleguide/cppguide.html#Function_Comments
|
|\
| |
| |
| |
| | |
PiperOrigin-RevId: 536785792
Change-Id: I2963dea81a75b01b7275d784f6a2908816d0c7bf
|
|/ |
|
|
|
|
|
| |
PiperOrigin-RevId: 529119690
Change-Id: If585274c409e2a344c8d60759da6f8f990023d29
|
|\
| |
| |
| |
| | |
PiperOrigin-RevId: 527066823
Change-Id: Ifa1e9a43c7490b34f9f4dbfa12d3acbed6b49777
|
|/ |
|
|
|
|
|
|
|
| |
different empty hashtables are compared.
PiperOrigin-RevId: 513568915
Change-Id: I3f387d0bae0e86749dff540e4fdd5037304ac975
|
|
|
|
|
|
|
| |
BM_DropDeletes 73.4µs ± 0% 68.9µs ± 1% -6.22% (p=0.008 n=5+5)
PiperOrigin-RevId: 511813266
Change-Id: Id28cece454d583e2dfe060e27cfc4720f987f009
|
|\
| |
| |
| |
| | |
PiperOrigin-RevId: 511695308
Change-Id: I502cdc75e993582eaca5cd91ed068238936a9640
|
| |
| |
| |
| |
| |
| |
| | |
distinguish between the "likely erased" and "could have rehashed" cases when generations are enabled, (b) suggest running under ASan when generations aren't enabled and doing so would narrow down the possible error cases, and (c) make ABSL_INTERNAL_ASSERT_IS_FULL not be a macro.
PiperOrigin-RevId: 511255275
Change-Id: I5a44a813cd310837d0bd0209d2187b100be201e7
|
|/
|
|
| |
We support GCC 7 and up, so we can remove this.
|
|
|
|
|
|
|
| |
that we can distinguish between end() iterators and default-constructed iterators in debug mode.
PiperOrigin-RevId: 509606271
Change-Id: I77b68590b3904a4cf7809b75d814d74cb89603b6
|
|
|
|
|
|
|
|
|
| |
compared.
We change AssertSameContainer to be after AssertIsValidForComparison calls so that we can have more specific failure messages.
PiperOrigin-RevId: 508472485
Change-Id: Iff2f7512086948a4aca7fd403596e8d4fde53b2a
|
|
|
|
|
|
|
|
|
| |
randomly rehashing on insertions when there is no reserved growth.
Rollforward of ed59f62f8bbc5f05bcba2f89ee16f107e03813f2
PiperOrigin-RevId: 506314970
Change-Id: I7a654aef36bb169da9ea5c618789ee771f05fe28
|
|
|
|
|
|
|
|
|
| |
randomly rehashing on insertions when there is no reserved growth.
Rollback of ed59f62f8bbc5f05bcba2f89ee16f107e03813f2
PiperOrigin-RevId: 506003574
Change-Id: I1766321f279a3226e2821e0390387d5639d28964
|
|
|
|
|
|
|
| |
rehashing on insertions when there is no reserved growth.
PiperOrigin-RevId: 505807487
Change-Id: I9051a04f6a75e579d16e9ae8defd404bcc377fba
|
|
|
|
|
| |
PiperOrigin-RevId: 505184961
Change-Id: I64482558a76abda6896bec4b2d323833b6cd7edf
|
|
|
|
|
|
|
| |
growth runs out.
PiperOrigin-RevId: 502625638
Change-Id: I1c06b2162dbdaaa6a36cea503ac6d07cd157b2e2
|
|
|
|
|
| |
PiperOrigin-RevId: 499964205
Change-Id: I45a1d62a5e093921946e7c3c7ab31480252b330e
|
|
|
|
|
|
|
| |
reserved growth if the reservation wouldn't grow the table.
PiperOrigin-RevId: 497246219
Change-Id: I9671236f56d10851c49de71c21899368be6c3a00
|
|
|
|
|
|
|
| |
arrays so that we can detect invalid iterator use.
PiperOrigin-RevId: 496455788
Change-Id: I83df92828098a3ef1181b4e454f3ac5d3ac7a2f2
|
|
|
|
|
|
|
|
| |
name old cpu/op new cpu/op delta
BM_Group_CountLeadingEmptyOrDeleted 0.98ns ± 0% 0.78ns ± 0% -20.51% (p=0.000 n=10+10)
PiperOrigin-RevId: 496397005
Change-Id: I1c6b325b14566da194f21d3387b6f4d838bf0b34
|
|
|
|
|
| |
PiperOrigin-RevId: 493993005
Change-Id: I0705be8678022a9e08a1af9972687b7955593994
|
|
|
|
|
|
|
|
|
|
|
| |
first member of the settings_ CompressedTuple so that we can move growth_left into CommonFields.
This allows for removing growth_left as a separate argument for a few functions.
Also, move the infoz() accessor functions to be before the data members of CommonFields to comply with the style guide.
PiperOrigin-RevId: 493918310
Change-Id: I58474e37d3b16a1513d2931af6b153dea1d809c2
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
called. When the variable is a global the compiler is allowed to instantiate it more
aggresively and it might happen before the types involved are complete.
When it is inside a function the compiler can't instantiate it until after the
functions are called.
Remove an unused member from the vtable.
Replace transfer_slot_fn with a generic function when relocation is available to reduce duplication.
PiperOrigin-RevId: 492227302
Change-Id: I07499f63b91c59c0ae42402683387c7b84a6f0ee
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This CL makes a bunch of changes (mostly to raw_hash_set which
underlies flat_hash_set and flat_hash_map). Techniques used:
* Extract code that does not depend on the specific hash table type
into common (non-inlined) functions.
* Place ABSL_ATTRIBUTE_NOINLINE directives judiciously.
* Out-of-line some slow paths.
Reduces sizes of some large binaries by ~0.5%.
Has no significant performance impact on a few performance critical
binaries.
## Speed of fleetbench micro-benchmarks
Following is a histogram of %-age changes in
[fleetbench](https://github.com/google/fleetbench)
hot_swissmap_benchmark results. Negative numbers indicate a speedup
caused by this change. Statistically insignificant changes are mapped
to zero.
XXX Also run and merge in cold_swissmap_benchmark
Across all 351 benchmarks, the average speedup is 0.38%.
The best speedup was -25%, worst slowdown was +6.81%.
```
Count: 351 Average: -0.382764 StdDev: 3.77807
Min: -25 Median: 0.435135 Max: 6.81
---------------------------------------------
[ -25, -10) 16 4.558% 4.558% #
[ -9, -8) 2 0.570% 5.128%
[ -8, -7) 1 0.285% 5.413%
[ -7, -6) 1 0.285% 5.698%
[ -6, -5) 2 0.570% 6.268%
[ -5, -4) 5 1.425% 7.692%
[ -4, -3) 13 3.704% 11.396% #
[ -3, -2) 15 4.274% 15.670% #
[ -2, -1) 26 7.407% 23.077% ##
[ -1, 0) 14 3.989% 27.066% #
[ 0, 1) 185 52.707% 79.772% ############
[ 1, 2) 14 3.989% 83.761% #
[ 2, 3) 8 2.279% 86.040% #
[ 3, 4) 7 1.994% 88.034%
[ 4, 5) 32 9.117% 97.151% ##
[ 5, 6) 6 1.709% 98.860%
[ 6, 7) 4 1.140% 100.000%
```
We looked at the slowdowns and they do not seem worth worrying
about. E.g., the worst one was:
```
BM_FindHit_Hot<::absl::node_hash_set,64>/set_size:4096/density:0
2.61ns ± 1% 2.79ns ± 1% +6.81% (p=0.008 n=5+5)
```
## Detailed changes
* Out-of-line slow paths in hash table sampler methods.
* Explicitly unregister from sampler instead of from destructor.
* Introduced a non-templated CommonFields struct that holds some of
the hash table fields (infoz, ctrl, slots, size, capacity). This
struct can be passed to new non-templated helpers. The struct is
a private base class of raw_hash_set.
* Made non-inlined InitializeSlots<> that is only templated on
allocator and size/alignment of the slot type so that we can share
instantiations across types that have the same size/alignment.
* Moved some infrequently called code paths into non-inlined type-erased.
functions. Pass a suite of type-specific function pointers to these
routines for when they need to operate on slots.
* Marked some methods as non-inlined.
* Avoid unnecessary reinitialization in destructor.
* Introduce UpdateSpine type-erased helper that is called from
clear() and rehash().
PiperOrigin-RevId: 491413386
Change-Id: Ia5495c5a6ec73622a785a0d260e406ddb9085a7c
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, ~raw_hash_set() would change *this to have the same
representation as an empty hash table. This is unnecessary since
nobody should be touching a destroyed hash table, and prevents future
optimizations/changes that might not be able to preserve this
behavior.
PiperOrigin-RevId: 487899950
Change-Id: I2d4470677bdd411c2e48ef511187e39f4e7fc2f4
|
|
|
|
|
|
|
| |
We check for comparisons of swisstable iterators from different heap allocations, which can indicate either iterators from different containers or that there was a rehash between when the iterators were initialized.
PiperOrigin-RevId: 487304602
Change-Id: I5c596c5ea07948d66e048f99937f9032a630344f
|
|
|
|
|
|
|
|
|
| |
- Separate the failure cases into different assertions: end/default constructed vs rehashed or erased.
- Update the assertion error for AssertIsValid to not mention the end iterator case because end iterators are considered valid by AssertIsValid.
- Also fix an out-of-date comment for skip_empty_or_deleted.
PiperOrigin-RevId: 485402559
Change-Id: I593056abdc6c3565d0396fb885923fef643bf4e4
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Addresses failures with the following, in some files:
-Wshorten-64-to-32
-Wimplicit-int-conversion
-Wsign-compare
-Wsign-conversion
-Wtautological-unsigned-zero-compare
(This specific CL focuses on .cc files in */internal/.)
Bug: chromium:1292951
PiperOrigin-RevId: 473868797
Change-Id: Ibe0b76e33f9e001d59862beaac54fb47bacd39b2
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Addresses failures with the following, in some files:
-Wshorten-64-to-32
-Wimplicit-int-conversion
-Wsign-compare
-Wsign-conversion
-Wtautological-unsigned-zero-compare
(This specific CL focuses on .cc files in */internal/.)
Bug: chromium:1292951
PiperOrigin-RevId: 471561809
Change-Id: I7abd6d83706f5ca135f1ce3458192a498a6280b9
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Addresses failures with the following, in some files:
-Wshorten-64-to-32
-Wimplicit-int-conversion
-Wsign-compare
-Wsign-conversion
-Wtautological-unsigned-zero-compare
(This specific CL focuses on .cc files in */internal/.)
Bug: chromium:1292951
PiperOrigin-RevId: 471549854
Change-Id: Id685d0e4666212926f4e001b8ef4930b6a33a4cc
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Addresses failures with the following, in some files:
-Wshorten-64-to-32
-Wimplicit-int-conversion
-Wsign-compare
-Wsign-conversion
-Wtautological-unsigned-zero-compare
(This specific CL focuses on .h and win32 .inc files.)
Bug: chromium:1292951
PiperOrigin-RevId: 463835431
Change-Id: If8e5f7f651d5cd96035e23e4623bdb08a7fedabe
|
|
|
|
|
| |
PiperOrigin-RevId: 463214218
Change-Id: I54a37fd9560b480f9eaf0454670eacf875015fe8
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This ensures that emmintrin.h is included with clang-cl. Otherwise, errors like
this occur:
.../absl/container/internal/raw_hash_set.h(536,8): error: unknown type name '__m128i'
inline __m128i _mm_cmpgt_epi8_fixed(__m128i a, __m128i b) {
This aligns the include ifdef guards and the guards that use the provided APIs.
The includes are also reordered, so they appear after the config header which
provides the internal macro values.
PiperOrigin-RevId: 456530271
Change-Id: I86dfd0022fd06fe7aa132138ec4d1bd14a86ba84
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
https://pastebin.com/fDvgWgHe
After having a chat with Dougall Johnson (https://twitter.com/dougallj/status/1534213050944802816), we realized that __clzll works with zero arguments per documentation:
https://developer.arm.com/documentation/101028/0009/Data-processing-intrinsics
```
Returns the number of leading zero bits in x. When x is zero it returns the argument width, i.e. 32 or 64.
```
Codegen improves https://godbolt.org/z/ebadf717Y
Thus we can use a little bit different construction not involving CLS but using more understandable CLZ and removing some operations.
PiperOrigin-RevId: 453879080
Change-Id: Ie2d7f834f63364d7bd50dd6a682c107985f21942
|
|
|
|
|
|
|
|
|
|
|
| |
https://pastebin.com/CmnzwUFN
The key idea is to avoid using 16 byte NEON and use 8 byte NEON which has lower latency for BitMask::Match. Even though 16 byte NEON achieves higher throughput, in SwissMap it's very important to catch these Matches with low latency as probing on average happens at most once.
I also introduced NonIterableMask as ARM has really great cbnz instructions and additional AND on scalar mask had 1 extra latency cycle
PiperOrigin-RevId: 453216147
Change-Id: I842c50d323954f8383ae156491232ced55aacb78
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Benchmarks: https://pastebin.com/tZ7dr67W. Works well especially on smaller ranges.
After a week on spending optimizing NEON SIMD where I almost managed to make hash tables work with NEON SIMD without performance hits (still 1 cycle to optimize and I gave up a little), I found an interesting optimization for aarch64 to use cls instruction (count leading sign bits).
The loop has a property that ctrl_ group is not matched against count when the first slot is empty or deleted.
```
void skip_empty_or_deleted() {
while (IsEmptyOrDeleted(*ctrl_)) {
uint32_t shift = Group{ctrl_}.CountLeadingEmptyOrDeleted();
ctrl_ += shift;
slot_ += shift;
}
...
}
```
However, `kEmpty` and `kDeleted` have format of `1xxxxxx0` and `~ctrl & (ctrl >> 7)` always sets the lowest bit to 1.
In naive implementation, it does +1 to start counting zero bits, however, in aarch64 we may start counting one bits immediately. This saves 1 cycle and 5% of iteration performance.
Then it becomes hard to find a supported and sustainable C++ version of it.
`__clsll` is not supported by GCC and was supported only since clang 8, `__builtin_clrsb` is not producing optimal codegen for clang. `__rbit` is not supported by GCC and there is no intrinsic to do that, however, in clang we have `__builtin_bitreverse{32,64}`. For now I decided to enable this only for clang, only if they have appropriate builtins.
PiperOrigin-RevId: 451168570
Change-Id: I7e9256a60aecdc88ced4e6eb15ebc257281b6664
|
|
|
|
|
|
|
|
|
|
|
| |
functions.
Add a new (internal) feature test macro to detect whether the wrappers are no-ops on a given platform.
Note that one-arg __builtin_prefetch(x) is equivalent to __builtin_prefetch(x, 0, 3), per `man BUILTIN_PREFETCH(3)` and gcc docs.
PiperOrigin-RevId: 449508660
Change-Id: I144e750205eec0c956d8dd62bc72e10bdb87c4f7
|
|
|
|
|
|
|
| |
Also note that this probe sequence visits every group exactly once.
PiperOrigin-RevId: 446535602
Change-Id: I13169be3f8ee6a4ddbbe8be84f1e1a482444d0cc
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
complicated, slightly incorrect.
The old analysis viewed it as birthday attack, which asks how often
there are multiple values in the probe same probe sequence with the
same H2. In their own words, this analysis "breaks down" at around `n
= 12`.
Instead we can answer a simpler question, which is, if a probe
sequence examines `k` objects that are not what we are looking for,
what's the number of calls `==`? The expectation is simply `k/128`.
PiperOrigin-RevId: 446518063
Change-Id: Ie879bd4f6c97979822bc9d550b9e2503b1418c78
|