PiperOrigin-RevId: 495308617
Change-Id: Ic373a80908e513ce3cc4a9156d49aac8ebf89024
This was likely an unintentional behavior change made a while ago while trying to reduce duplication. The new behavior will always include the unexpanded macro in the error string. For example, `CHECK_EQ(MACRO(x), MACRO(y))` will now output "MACRO(x) == MACRO(y)" if it fails. Before this change, CHECK and QCHECK were the only macros that had this behavior.
Not using function-like macro aliases is a possible alternative here, but unfortunately that would flood the macro namespace downstream with CHECK* and break existing code.
PiperOrigin-RevId: 495138582
Change-Id: I6a1afd89a6b9334003362e5d3e55da68f86eec98
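
The stringification rule behind this change can be sketched with self-contained macros. All names here are hypothetical, not Abseil's internals: `MYLOG_CHECK_EQ`, `MYLOG_INTERNAL_CHECK_OP_IMPL`, `CheckOpMessage` and `MACRO`. On failure the sketch returns a message instead of aborting, so the result is easy to inspect. The public macro applies `#a` and `#b` itself, and `#` suppresses macro expansion of its operand, so the message keeps the caller's spelling.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: empty string on success, otherwise a failure message
// built from the caller's source text. A real CHECK would abort instead.
inline std::string CheckOpMessage(bool ok, const char* text) {
  return ok ? std::string() : std::string("Check failed: ") + text;
}

// Shared implementation: receives text that was already stringified.
#define MYLOG_INTERNAL_CHECK_OP_IMPL(a, b, text) \
  CheckOpMessage((a) == (b), text)

// Public macro: stringifies `a` and `b` at this level, where `#` prevents
// their expansion, so the text reads "MACRO(1) == MACRO(2)".
#define MYLOG_CHECK_EQ(a, b) \
  MYLOG_INTERNAL_CHECK_OP_IMPL(a, b, #a " == " #b)

// Example macro, used only to show what ends up in the message.
#define MACRO(v) ((v) + 1)
```

With this layering, `MYLOG_CHECK_EQ(MACRO(1), MACRO(2))` yields `Check failed: MACRO(1) == MACRO(2)`. The comparison itself still uses the expanded values.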
We already prefetch for large inputs; do the same for medium-sized
inputs as well. This is performance-neutral in most cases, so this
change also adds a new benchmark whose working set is much larger than
the cache (working size >> cache size) to confirm that we are seeing
the benefits of prefetching. The main benefits are on AMD with
hardware prefetchers turned off:
AMD prefetchers on:
name                        old time/op    new time/op     delta
BM_Calculate/0              2.43ns ± 1%    2.43ns ± 1%         ~  (p=0.814 n=40+40)
BM_Calculate/1              2.50ns ± 2%    2.50ns ± 2%         ~  (p=0.745 n=39+39)
BM_Calculate/100            9.17ns ± 1%    9.17ns ± 2%         ~  (p=0.747 n=40+40)
BM_Calculate/10000           474ns ± 1%     474ns ± 2%         ~  (p=0.749 n=40+40)
BM_Calculate/500000         22.8µs ± 1%    22.9µs ± 2%         ~  (p=0.298 n=39+40)
BM_Extend/0                 1.38ns ± 1%    1.38ns ± 1%         ~  (p=0.651 n=40+40)
BM_Extend/1                 1.53ns ± 2%    1.53ns ± 1%         ~  (p=0.957 n=40+39)
BM_Extend/100               9.48ns ± 1%    9.48ns ± 2%         ~  (p=1.000 n=40+40)
BM_Extend/10000              474ns ± 2%     474ns ± 1%         ~  (p=0.928 n=40+40)
BM_Extend/500000            22.8µs ± 1%    22.9µs ± 2%         ~  (p=0.331 n=40+40)
BM_Extend/100000000         4.79ms ± 1%    4.79ms ± 1%         ~  (p=0.753 n=38+38)
BM_ExtendCacheMiss/10       25.5ms ± 2%    25.5ms ± 2%         ~  (p=0.988 n=38+40)
BM_ExtendCacheMiss/100      23.1ms ± 2%    23.1ms ± 2%         ~  (p=0.792 n=40+40)
BM_ExtendCacheMiss/1000     37.2ms ± 1%    28.6ms ± 2%   -23.00%  (p=0.000 n=38+40)
BM_ExtendCacheMiss/100000   7.77ms ± 2%    7.74ms ± 2%    -0.45%  (p=0.006 n=40+40)
AMD prefetchers off:
name                        old time/op    new time/op     delta
BM_Calculate/0              2.43ns ± 2%    2.43ns ± 2%         ~  (p=0.351 n=40+39)
BM_Calculate/1              2.51ns ± 2%    2.51ns ± 1%         ~  (p=0.535 n=40+40)
BM_Calculate/100            9.18ns ± 2%    9.15ns ± 2%         ~  (p=0.120 n=38+39)
BM_Calculate/10000           475ns ± 2%     475ns ± 2%         ~  (p=0.852 n=40+40)
BM_Calculate/500000         22.9µs ± 2%    22.8µs ± 2%         ~  (p=0.396 n=40+40)
BM_Extend/0                 1.38ns ± 2%    1.38ns ± 2%         ~  (p=0.466 n=40+40)
BM_Extend/1                 1.53ns ± 2%    1.53ns ± 2%         ~  (p=0.914 n=40+39)
BM_Extend/100               9.49ns ± 2%    9.49ns ± 2%         ~  (p=0.802 n=40+40)
BM_Extend/10000              475ns ± 2%     474ns ± 1%         ~  (p=0.589 n=40+40)
BM_Extend/500000            22.8µs ± 2%    22.8µs ± 2%         ~  (p=0.872 n=39+40)
BM_Extend/100000000         10.0ms ± 3%    10.0ms ± 4%         ~  (p=0.355 n=40+40)
BM_ExtendCacheMiss/10        196ms ± 2%     196ms ± 2%         ~  (p=0.698 n=40+40)
BM_ExtendCacheMiss/100       129ms ± 1%     129ms ± 1%         ~  (p=0.602 n=36+37)
BM_ExtendCacheMiss/1000     88.6ms ± 1%    57.2ms ± 1%   -35.49%  (p=0.000 n=36+38)
BM_ExtendCacheMiss/100000   14.9ms ± 1%    14.9ms ± 1%         ~  (p=0.888 n=39+40)
Intel Skylake:
name                        old time/op    new time/op     delta
BM_Calculate/0              2.49ns ± 2%    2.44ns ± 4%    -2.15%  (p=0.001 n=31+34)
BM_Calculate/1              3.04ns ± 2%    2.98ns ± 9%    -1.95%  (p=0.003 n=31+35)
BM_Calculate/100            8.64ns ± 3%    8.53ns ± 5%         ~  (p=0.065 n=31+35)
BM_Calculate/10000           290ns ± 3%     285ns ± 7%    -1.80%  (p=0.004 n=28+34)
BM_Calculate/500000         11.8µs ± 2%    11.6µs ± 8%    -1.59%  (p=0.003 n=26+34)
BM_Extend/0                 1.56ns ± 1%    1.52ns ± 3%    -2.44%  (p=0.000 n=26+35)
BM_Extend/1                 1.88ns ± 3%    1.83ns ± 6%    -2.17%  (p=0.001 n=27+35)
BM_Extend/100               9.31ns ± 3%    9.13ns ± 7%    -1.92%  (p=0.000 n=33+38)
BM_Extend/10000              290ns ± 3%     283ns ± 3%    -2.45%  (p=0.000 n=32+38)
BM_Extend/500000            11.8µs ± 2%    11.5µs ± 8%    -1.80%  (p=0.001 n=35+37)
BM_Extend/100000000         6.39ms ±10%    6.11ms ± 8%    -4.34%  (p=0.000 n=40+40)
BM_ExtendCacheMiss/10       36.2ms ± 7%    35.8ms ±14%         ~  (p=0.281 n=33+37)
BM_ExtendCacheMiss/100      26.9ms ±15%    25.9ms ±12%    -3.93%  (p=0.000 n=40+40)
BM_ExtendCacheMiss/1000     23.8ms ± 5%    23.4ms ± 5%    -1.68%  (p=0.001 n=39+40)
BM_ExtendCacheMiss/100000   10.1ms ± 5%    10.0ms ± 4%         ~  (p=0.051 n=39+39)
PiperOrigin-RevId: 495119444
Change-Id: I67bcf3b0282b5e1c43122de2837a24c16b8aded7
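
The general pattern can be sketched as a loop that issues a software prefetch a fixed distance ahead of the data it is processing. The prefetch is enabled once the input reaches a size threshold. This is a hypothetical illustration, not the code this commit changes:

- `SumBytes`, `kPrefetchThreshold`, `kPrefetchDistance` and `kBlock` are made-up names and values.
- A plain byte sum stands in for the real computation.
- `__builtin_prefetch` is a GCC/Clang builtin, so other compilers skip the hint.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical tuning knobs; real values would come from benchmarking.
constexpr std::size_t kPrefetchThreshold = 256;  // bytes
constexpr std::size_t kPrefetchDistance = 256;   // bytes ahead of the cursor
constexpr std::size_t kBlock = 64;               // one cache line per step

// Issues a read-prefetch hint where the compiler supports one.
inline void PrefetchForRead(const void* p) {
#if defined(__GNUC__) || defined(__clang__)
  __builtin_prefetch(p, /*rw=*/0, /*locality=*/3);
#else
  (void)p;
#endif
}

// Stand-in for a checksum: sums all bytes. Inputs at or above the threshold
// (medium and large alike) prefetch ahead; smaller inputs skip the hint.
std::uint64_t SumBytes(const std::uint8_t* data, std::size_t size) {
  std::uint64_t sum = 0;
  std::size_t i = 0;
  if (size >= kPrefetchThreshold) {
    for (; i + kBlock <= size; i += kBlock) {
      if (i + kPrefetchDistance < size) {
        PrefetchForRead(data + i + kPrefetchDistance);
      }
      for (std::size_t j = 0; j < kBlock; ++j) sum += data[i + j];
    }
  }
  for (; i < size; ++i) sum += data[i];  // tail, or the whole small input
  return sum;
}
```

In this sketch, the threshold is the knob that decides whether medium-sized inputs get the hint. The measurements in this commit suggest that the payoff depends heavily on whether the hardware prefetchers are already doing this work.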
that the proper non-obsolete RFC 4648 is already listed in escaping.h's Base64Escape() documentation)
PiperOrigin-RevId: 494821805
Change-Id: Id3bffcb968a7c865c9a6bcbf241870c3674601ba
disassembly under LLVM. Due to the issue described in
https://github.com/abseil/abseil-cpp/issues/1340 and
https://github.com/google/benchmark/commit/8545dfb3ea301f5c77626a046d4756ef9f2e4970
it no longer builds under GCC.
The other changes are necessary to fix the build with the latest benchmark snapshot.
Fixes #1340
PiperOrigin-RevId: 494809290
Change-Id: I4a03b2e2dcbdc273e59f1f09f204322e388e7cea
PiperOrigin-RevId: 494749165
Change-Id: I8d855be9c508a9fdfb5f60e87471c0947057ecc9
PiperOrigin-RevId: 494587777
Change-Id: I41504edca6fcf750d52602fa84a33bc7fe5fbb48
See cl/490546476 for details.
PiperOrigin-RevId: 494047255
Change-Id: Ic2f88d976fa9a70ff104c47e9daf682ab7d0b7d2
PiperOrigin-RevId: 493993005
Change-Id: I0705be8678022a9e08a1af9972687b7955593994
This was an unintentional behavior change introduced when we added a new layer of macros. Not using function-like macro aliases would avoid it, but unfortunately that would flood the macro namespace downstream with CHECK and LOG (and break existing code).
Note that the old behavior only applied to CHECK and QCHECK. Other CHECK macros already had multiple layers of function-like macros and were unaffected.
PiperOrigin-RevId: 493984662
Change-Id: I9a050dcaf01f2b6935f02cd42e23bc3a4d5fc62a
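
This sketch contrasts two ways of adding a second public name over a shared implementation. One is a function-like alias that simply forwards its argument; the other stringifies the argument at its own level. All names are hypothetical: `MYLOG_CHECK`, `MYLOG_INTERNAL_CHECK_IMPL`, `CheckMessage`, `CHECK_FORWARDING`, `CHECK_DIRECT` and `SQUARE`. On failure the sketch returns a message instead of aborting. The rule it relies on is that a macro argument that is not an operand of `#` is fully expanded before substitution, so the forwarding alias hands the expanded tokens to the inner stringification.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: empty string on success, failure message otherwise.
inline std::string CheckMessage(bool ok, const char* text) {
  return ok ? std::string() : std::string("Check failed: ") + text;
}

// Shared implementation that receives pre-stringified text.
#define MYLOG_INTERNAL_CHECK_IMPL(cond, text) \
  CheckMessage(static_cast<bool>(cond), text)

// Prefixed public macro.
#define MYLOG_CHECK(cond) MYLOG_INTERNAL_CHECK_IMPL(cond, #cond)

// Forwarding alias: `cond` is not an operand of `#` here, so it is
// macro-expanded before MYLOG_CHECK stringifies it.
#define CHECK_FORWARDING(cond) MYLOG_CHECK(cond)

// Direct alias: stringifies at its own level and keeps the caller's spelling.
#define CHECK_DIRECT(cond) MYLOG_INTERNAL_CHECK_IMPL(cond, #cond)

// Example macro, used only to show what ends up in the message.
#define SQUARE(v) ((v) * (v))
```

For `SQUARE(2) == 5`, the direct alias reports `SQUARE(2) == 5`. The forwarding alias reports the expanded expression instead, which is the behavior change described above.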
After internal investigation, it’s no longer clear that the alternative
LowLevelHash mixer committed in a05366d851c5cb88065272f951e03955197e7c11
unequivocally improves performance on AArch64. It unnecessarily reduces
performance on Apple Silicon and the AWS Graviton. It also lowers hash
quality, which offsets much of the performance gain it provides on the
Arm Neoverse N1 (see https://github.com/abseil/abseil-cpp/issues/1093).
Switch back to the original mixer.
Closes: https://github.com/abseil/abseil-cpp/issues/1093
PiperOrigin-RevId: 493941913
Change-Id: I84c789b2f88c91dec22f6f0f6e8c5129d2939a6f
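
For context, here is a sketch of the multiply-and-fold mixing step used by wyhash-style hashes such as LowLevelHash. It forms the full 128-bit product of two 64-bit words and XORs the high and low halves. This assumes, without verifying it from this log, that the restored original mixer has this shape. The alternative AArch64 mixer from a05366d851c5cb88065272f951e03955197e7c11 is not reproduced. `Mix` is a hypothetical free function, and `unsigned __int128` is a GCC/Clang extension, so this will not build with MSVC.

```cpp
#include <cassert>
#include <cstdint>

// Multiply-and-fold mixer: compute the 128-bit product of v0 and v1, then
// XOR its low and high 64-bit halves together.
inline std::uint64_t Mix(std::uint64_t v0, std::uint64_t v1) {
  unsigned __int128 p = static_cast<unsigned __int128>(v0) * v1;
  return static_cast<std::uint64_t>(p) ^ static_cast<std::uint64_t>(p >> 64);
}
```

Folding the high half back in lets bits from the top of the product influence the result, which a plain 64-bit multiply would discard.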
first member of the settings_ CompressedTuple so that we can move growth_left into CommonFields.
This allows removing growth_left as a separate argument from a few functions.
Also, move the infoz() accessor functions before the data members of CommonFields to comply with the style guide.
PiperOrigin-RevId: 493918310
Change-Id: I58474e37d3b16a1513d2931af6b153dea1d809c2
- The deadlock appears to occur when flag initialization happens while a sample is being created.
- Each sample has its own mutex, which is locked when a new sample is registered, i.e. created for the first time.
- The flag implicitly creates a global sampler object, which locks `graveyard_`'s mutex.
- Normally, `PushDead` locks `graveyard_` before the sample, so acquiring them in the opposite order during sample creation triggers deadlock detection.
- This lock order can never recur, since this code runs exactly once per sample object and the sample object cannot be accessed until after the method returns.
- It should therefore be safe to ignore any lock-order violation that occurs during sample creation.
PiperOrigin-RevId: 493901903
Change-Id: I094abca82c1a8a82ac392383c72469d68eef09c4
PiperOrigin-RevId: 493682437
Change-Id: I30f2ac36b998b86c24fe7513cd952b860560a66e
PiperOrigin-RevId: 493617276
Change-Id: Ia7fb938c7abfba10e5b62f43f3cf71fb99b132f5
This change removes the 'create by single memcpy' CopyRaw() logic to avoid potential UB from including the atomic refcount in the memcpy. While we know this works on all supported platforms, it is formally UB, and we should do something else. CopyRaw() now explicitly creates a (default-initialized) instance, initializes `length` and `refcount`, and then memcpys the remaining contents, which are trivial uint8_t and CordRep* values.
PiperOrigin-RevId: 493596072
Change-Id: I46618883eb1c7c9ed9eb083f4d3e7fc501f23df5
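
The copy strategy described above can be sketched on a hypothetical node type. `Node`, `Payload`, `CopyRaw` and `kMaxEdges` are illustrative, not Cord's actual `CordRep` layout. The node holds a `length`, an atomic `refcount`, and a `Payload` struct that groups the trivially copyable bookkeeping bytes and child pointers. `std::atomic` cannot be copied, so a single `memcpy` over the whole node would treat a non-copyable object as raw bytes. The sketch instead default-initializes a fresh node, sets `length` and `refcount` explicitly, and copies only the trivial payload.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

constexpr std::size_t kMaxEdges = 6;  // hypothetical fan-out

struct Node;

// The trivially copyable part of a node: small integers and child pointers.
struct Payload {
  std::uint8_t tag;
  std::uint8_t begin;
  std::uint8_t end;
  Node* edges[kMaxEdges];
};
static_assert(std::is_trivially_copyable<Payload>::value,
              "only trivially copyable data may be copied with memcpy");

struct Node {
  std::size_t length = 0;
  std::atomic<std::int32_t> refcount{0};
  Payload payload;  // left uninitialized by default initialization
};
// std::atomic cannot be copied, so Node cannot be either; memcpy-ing a whole
// Node would sidestep that and is formally undefined behavior.
static_assert(!std::is_copy_constructible<Node>::value,
              "Node holds an atomic and is not copyable");

// Copies `src` without memcpy-ing the atomic: default-initialize a new node,
// set length and refcount explicitly, then memcpy only the trivial payload.
Node* CopyRaw(const Node& src) {
  Node* dst = new Node;
  dst->length = src.length;
  dst->refcount.store(1, std::memory_order_relaxed);
  std::memcpy(&dst->payload, &src.payload, sizeof(Payload));
  return dst;
}
```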
This will allow OSS code to use absl logging without necessarily polluting the preprocessor namespace with definitions of LOG and CHECK.
PiperOrigin-RevId: 493404211
Change-Id: I7bc5807252218dd7fc26da3af13d5734ef8b2601
PiperOrigin-RevId: 493386604
Change-Id: I289cb38b4a3da5760ab7ef3976d402d165d7e10f
This will allow us to create ABSL_-prefixed variants with shared implementation.
PiperOrigin-RevId: 493383908
Change-Id: I3529021df7afa642fadaf43eb9fd8249e9202758
PiperOrigin-RevId: 493091927
Change-Id: I69a5e776b3d3193f3fd14458734e99bf020782f5
the discussions section on GitHub
PiperOrigin-RevId: 493078058
Change-Id: Iee972afbb14ab775fb153f3397545db142e8124c
PiperOrigin-RevId: 492535520
Change-Id: I2e58b39bd4ab3064f675474c5e712c76fac02674
PiperOrigin-RevId: 492481345
Change-Id: Ie77656ed334b54930ee852d31e2794a1fc58ce2f
PiperOrigin-RevId: 492463896
Change-Id: I063759ca5ceb3597a7c8ab25af23aa688dee26c2
called. When the variable is a global, the compiler is allowed to instantiate it more
aggressively, and that might happen before the types involved are complete.
When it is inside a function, the compiler can't instantiate it until after the
functions are called.
Remove an unused member from the vtable.
To reduce duplication, replace transfer_slot_fn with a generic function when relocation is available.
PiperOrigin-RevId: 492227302
Change-Id: I07499f63b91c59c0ae42402683387c7b84a6f0ee
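
The move from a global variable to a function-local one can be sketched with a per-type table of function pointers. `SlotPolicy`, `GetPolicy`, `TransferRelocatable` and `TransferByMove` are hypothetical names, not the actual raw_hash_set declarations. The table is a function-local `static`, so it is only instantiated once `GetPolicy<T>()` is used, by which point `T` is complete.

The sketch also shows one generic `memcpy`-based transfer function shared by every type of a given size. It assumes, for illustration only, that "trivially copyable" stands in for "relocation is available".

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <new>
#include <string>  // only for the usage example
#include <type_traits>
#include <utility>

// Type-erased operations that a container needs for its slots.
struct SlotPolicy {
  std::size_t slot_size;
  void (*transfer)(void* dst, void* src);  // moves *src into dst, ends *src
};

// Generic transfer shared by all same-sized types that can be relocated with
// memcpy (approximated here by std::is_trivially_copyable).
template <std::size_t Size>
void TransferRelocatable(void* dst, void* src) {
  std::memcpy(dst, src, Size);
}

// Fallback for other types: move-construct into dst, then destroy the source.
template <class T>
void TransferByMove(void* dst, void* src) {
  T* from = static_cast<T*>(src);
  ::new (dst) T(std::move(*from));
  from->~T();
}

// The policy table is a function-local static rather than a namespace-scope
// variable, so it is only instantiated once GetPolicy<T>() is actually used.
template <class T>
const SlotPolicy& GetPolicy() {
  static const SlotPolicy kPolicy = {
      sizeof(T), std::is_trivially_copyable<T>::value
                     ? &TransferRelocatable<sizeof(T)>
                     : &TransferByMove<T>};
  return kPolicy;
}
```

Because `TransferRelocatable` is templated only on the slot size, `int` and `unsigned` share one instantiation, while `std::string` gets its own move-based transfer.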
PiperOrigin-RevId: 492219541
Change-Id: Iee5d7941e413c8b960365e60fa0254536dd20e49
PiperOrigin-RevId: 491992941
Change-Id: Id66154cc4561770047b55625ef00014602975c5d
coverage of the accelerated CRC implementation and some differences
between the internal and external implementations.
This change adds CI coverage to the
linux_clang-latest_libstdcxx_bazel.sh script, assuming this script
always runs on machines of at least the Intel Haswell generation.
Fixes include:
* Remove the use of the deprecated xor operator on crc32c_t
* Remove #pragma unroll_completely, which neither GCC nor Clang recognizes:
https://godbolt.org/z/97j4vbacs
* Fixes for -Wsign-compare, -Wsign-conversion and -Wshorten-64-to-32
PiperOrigin-RevId: 491965029
Change-Id: Ic5e1f3a20f69fcd35fe81ebef63443ad26bf7931
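
When checking an accelerated CRC32C path against a portable one, a bitwise reference implementation is a handy oracle. The sketch below is such a reference, not Abseil's implementation. It uses the reflected Castagnoli polynomial 0x82F63B78, with 0xFFFFFFFF as both the initial value and the final XOR. `Crc32cReference` is a hypothetical name. For the standard check input "123456789" it produces the well-known CRC-32C check value 0xE3069283.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bit-at-a-time CRC32C (Castagnoli), reflected form. Slow, but simple enough
// to serve as an oracle when testing table-driven or hardware-accelerated code.
std::uint32_t Crc32cReference(const void* data, std::size_t size) {
  const unsigned char* p = static_cast<const unsigned char*>(data);
  std::uint32_t crc = 0xFFFFFFFFu;
  for (std::size_t i = 0; i < size; ++i) {
    crc ^= p[i];
    for (int bit = 0; bit < 8; ++bit) {
      // Shift right; if the bit shifted out was 1, fold in the polynomial.
      crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
  }
  return crc ^ 0xFFFFFFFFu;
}
```

A reference like this costs eight shift steps per byte. Its value is in randomized comparisons against the accelerated path, not in production use.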
PiperOrigin-RevId: 491915718
Change-Id: I7469601857b5a3506163518d29f49792f3053b34
PiperOrigin-RevId: 491723314
Change-Id: I68bc5a7ea5288982f6d0efb64c14fdbee4eec85a
PiperOrigin-RevId: 491722639
Change-Id: Iff13661095d10c82599ad30f7220700825a78c9e
std::array has a special case to allow this:
https://en.cppreference.com/w/cpp/container/array
Fixes #1332
PiperOrigin-RevId: 491703960
Change-Id: Ib83a1f0865448314e463e8ebf39ae3b842f762ea
PiperOrigin-RevId: 491681300
Change-Id: I4ecdd3bf359cda7592b6c392a2fbb61b8394f71b
The motivation is to explicitly remove and document dangerous
operations like adding crc32c_t to a set, because equality is not
enough to guarantee uniqueness.
PiperOrigin-RevId: 491656425
Change-Id: I7b4dadc1a59ea9861e6ec7a929d64b5746467832
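
This sketch shows the kind of restricted value type this change describes. It is a wrapper with explicit conversions and equality but no ordering or hashing, so it cannot be used as a set or map key by accident. `Crc32cValue` is a hypothetical stand-in, not the real `absl::crc32c_t` declaration.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

class Crc32cValue {
 public:
  Crc32cValue() = default;
  // Explicit in both directions: no silent mixing with plain integers.
  constexpr explicit Crc32cValue(std::uint32_t crc) : crc_(crc) {}
  constexpr explicit operator std::uint32_t() const { return crc_; }

  friend constexpr bool operator==(Crc32cValue a, Crc32cValue b) {
    return a.crc_ == b.crc_;
  }
  friend constexpr bool operator!=(Crc32cValue a, Crc32cValue b) {
    return !(a == b);
  }
  // Deliberately no operator< and no std::hash specialization: equal CRCs do
  // not imply equal data, so a CRC makes a misleading set or map key.

 private:
  std::uint32_t crc_ = 0;
};

static_assert(!std::is_convertible<std::uint32_t, Crc32cValue>::value,
              "construction from an integer must be explicit");
static_assert(!std::is_convertible<Crc32cValue, std::uint32_t>::value,
              "conversion to an integer must be explicit");
```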
this parser for the static checker.
This fixes some outstanding bugs where the static checker differed from the
dynamic one.
Also, fix `%v` to be accepted with POSIX syntax.
Tested:
Presubmit
TGP OCL:487237262:BASE:490275393:1669141454896:92dd62e3
PiperOrigin-RevId: 491650577
Change-Id: Id138c108187428b3aea46f8887495f1da12c91b2
including for the (size_t, char) overload.
PiperOrigin-RevId: 491456410
Change-Id: I76dec24b0bd02204fa38419af9247cee38b1cf50
According to https://stackoverflow.com/a/68939636, it is safe to use
__m128i instead.
The MSVC intrinsics list at https://learn.microsoft.com/en-us/cpp/intrinsics/x86-intrinsics-list?view=msvc-170 also uses this type.
Fixes #1330
PiperOrigin-RevId: 491427300
Change-Id: I4a1d44ac4d5e7c1e1ee063ff397935df118254a1
This CL makes a number of changes (mostly to raw_hash_set, which
underlies flat_hash_set and flat_hash_map). Techniques used:
* Extract code that does not depend on the specific hash table type
into common (non-inlined) functions.
* Place ABSL_ATTRIBUTE_NOINLINE directives judiciously.
* Out-of-line some slow paths.
Reduces the size of some large binaries by ~0.5%.
Has no significant performance impact on a few performance-critical
binaries.
## Speed of fleetbench micro-benchmarks
Following is a histogram of %-age changes in
[fleetbench](https://github.com/google/fleetbench)
hot_swissmap_benchmark results. Negative numbers indicate a speedup
caused by this change. Statistically insignificant changes are mapped
to zero.
XXX Also run and merge in cold_swissmap_benchmark
Across all 351 benchmarks, the average speedup is 0.38%.
The best speedup was -25%; the worst slowdown was +6.81%.
```
Count: 351 Average: -0.382764 StdDev: 3.77807
Min: -25 Median: 0.435135 Max: 6.81
---------------------------------------------
[ -25, -10) 16 4.558% 4.558% #
[ -9, -8) 2 0.570% 5.128%
[ -8, -7) 1 0.285% 5.413%
[ -7, -6) 1 0.285% 5.698%
[ -6, -5) 2 0.570% 6.268%
[ -5, -4) 5 1.425% 7.692%
[ -4, -3) 13 3.704% 11.396% #
[ -3, -2) 15 4.274% 15.670% #
[ -2, -1) 26 7.407% 23.077% ##
[ -1, 0) 14 3.989% 27.066% #
[ 0, 1) 185 52.707% 79.772% ############
[ 1, 2) 14 3.989% 83.761% #
[ 2, 3) 8 2.279% 86.040% #
[ 3, 4) 7 1.994% 88.034%
[ 4, 5) 32 9.117% 97.151% ##
[ 5, 6) 6 1.709% 98.860%
[ 6, 7) 4 1.140% 100.000%
```
We looked at the slowdowns and they do not seem worth worrying
about. E.g., the worst one was:
```
BM_FindHit_Hot<::absl::node_hash_set,64>/set_size:4096/density:0
2.61ns ± 1% 2.79ns ± 1% +6.81% (p=0.008 n=5+5)
```
## Detailed changes
* Out-of-line slow paths in hash table sampler methods.
* Explicitly unregister from sampler instead of from destructor.
* Introduced a non-templated CommonFields struct that holds some of
the hash table fields (infoz, ctrl, slots, size, capacity). This
struct can be passed to new non-templated helpers. The struct is
a private base class of raw_hash_set.
* Made non-inlined InitializeSlots<> that is only templated on
allocator and size/alignment of the slot type so that we can share
instantiations across types that have the same size/alignment.
* Moved some infrequently called code paths into non-inlined type-erased
functions. Pass a suite of type-specific function pointers to these
routines for when they need to operate on slots.
* Marked some methods as non-inlined.
* Avoid unnecessary reinitialization in destructor.
* Introduce UpdateSpine type-erased helper that is called from
clear() and rehash().
PiperOrigin-RevId: 491413386
Change-Id: Ia5495c5a6ec73622a785a0d260e406ddb9085a7c
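
Two of the techniques listed above can be sketched in miniature: a non-templated `CommonFields` private base, and an `InitializeSlots` helper templated only on slot size and alignment. Everything here is hypothetical and far simpler than raw_hash_set; there are no control bytes, no hashing and no growth. The sketch only shows why `Table<std::int32_t>` and `Table<std::uint32_t>` end up sharing one allocation routine. It assumes C++17 for aligned `operator new`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Non-templated fields shared by every table instantiation.
struct CommonFields {
  void* slots = nullptr;
  std::size_t size = 0;
  std::size_t capacity = 0;
};

// Templated only on slot size and alignment, so element types with the same
// layout reuse a single instantiation instead of generating one per type.
template <std::size_t SlotSize, std::size_t SlotAlign>
void InitializeSlots(CommonFields& c, std::size_t capacity) {
  c.slots = ::operator new(SlotSize * capacity,
                           static_cast<std::align_val_t>(SlotAlign));
  c.size = 0;
  c.capacity = capacity;
}

template <std::size_t SlotAlign>
void DeallocateSlots(CommonFields& c) {
  ::operator delete(c.slots, static_cast<std::align_val_t>(SlotAlign));
  c = CommonFields{};
}

// Typed front end: a thin wrapper that forwards to the shared helpers, with
// CommonFields as a private base.
template <class T>
class Table : private CommonFields {
 public:
  explicit Table(std::size_t n) {
    InitializeSlots<sizeof(T), alignof(T)>(common(), n);
  }
  ~Table() { DeallocateSlots<alignof(T)>(common()); }
  Table(const Table&) = delete;
  Table& operator=(const Table&) = delete;

  std::size_t capacity() const { return CommonFields::capacity; }
  T* slot_array() const { return static_cast<T*>(CommonFields::slots); }

 private:
  CommonFields& common() { return *this; }
};
```

In a real table, the shared helpers would also be marked non-inline where that pays off, as the commit does with ABSL_ATTRIBUTE_NOINLINE. This sketch leaves that to the compiler.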
Fixes #1329
PiperOrigin-RevId: 491372279
Change-Id: I93c094b06ece9cb9bdb39fd4541353e0344a1a57
PiperOrigin-RevId: 491367420
Change-Id: I6a0ab74bb0675fd910ed9fc95ee20c5023eb0cb6
PiperOrigin-RevId: 491338755
Change-Id: I813566ef69ba6121bb4d4b64ea483cd7c4cd6019
TSan misses synchronization around passing PerThreadSynch between threads
since it happens inside the Mutex code (which we mostly ignore),
so we need to ignore all accesses to the object.
PiperOrigin-RevId: 491297912
Change-Id: I13ea2015dee5c1a3fc4315c85112902ccffccc45
PiperOrigin-RevId: 491266544
Change-Id: I0dd222f6d9fe49f1fdcdb11cf732c13c353e7695
Currently, we take the generic/default code path on AMD because of a misspelling.
This mostly helps with crc+memcpy:
name                   old speed        new speed     delta
BM_Memcpy/1         156MB/s ± 1%     156MB/s ± 1%         ~  (p=0.563 n=18+18)
BM_Memcpy/100      6.38GB/s ± 1%    6.50GB/s ± 1%    +1.89%  (p=0.000 n=19+19)
BM_Memcpy/10000    14.6GB/s ± 1%    21.7GB/s ± 0%   +49.01%  (p=0.000 n=20+19)
BM_Memcpy/500000   13.5GB/s ± 1%    19.9GB/s ± 0%   +47.35%  (p=0.000 n=18+17)
PiperOrigin-RevId: 490572650
Change-Id: Id7901321a23262c0ab62a2d82fae86cf42acf16d
Building with /arch:AVX on MSVC now uses the accelerated implementation.
PiperOrigin-RevId: 490550573
Change-Id: I924259845f38ee41d15f23f95ad085ad664642b5
PiperOrigin-RevId: 490329293
Change-Id: Ied36e737e85afc683cc7cc116ac6bc07092472df
ifdefs inside btree_iterator.
PiperOrigin-RevId: 490317784
Change-Id: I4ffe2a1ad2e39890790e278d82eec7223b67908c
PiperOrigin-RevId: 490228223
Change-Id: Iec5af16531132a903aaa3e584dd87b03feb2c0c7