| Commit message (Collapse) | Author | Age |
|
|
|
|
|
|
|
|
| |
This will allow us to give visibility to other Google-internal libraries. The
change is necessary since //visibility:private cannot be combined with other
specifications.
PiperOrigin-RevId: 615779561
Change-Id: I82b1edfa4e1ca280e429cf2a5e4003a1cc316a60
|
|
|
|
|
|
|
| |
https://bazel.build/build/style-guide#other-conventions
PiperOrigin-RevId: 603084345
Change-Id: Ibd7c9573d820f88059d12c46ff82d7d322d002ae
|
|
|
|
|
|
|
| |
Note that this only changes how we allocate the empty state, and reference countings of `empty` stay the same.
PiperOrigin-RevId: 599526339
Change-Id: I2c6aaf875c144c947e17fe8f69692b1195b55dd7
|
|
|
|
|
| |
PiperOrigin-RevId: 572575394
Change-Id: Ic1c5ac2423b1634e50c43bad6daa14e82a8f3e2c
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The layering_check feature ensures that rules that include a header
explicitly depend on a rule that exports that header. Compiler support
is required, and currently only Clang 16+ supports diagnoses
layering_check failures.
The parse_headers feature ensures headers are self-contained by
compiling them with -fsyntax-only on supported compilers.
PiperOrigin-RevId: 572350144
Change-Id: I37297f761566d686d9dd58d318979d688b7e36d1
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This CL rolls forward a previous change which we rolled back temporarily due to
compilation errors on x86 when PCLMUL intrinsics were unavailable.
*** Original change description ***
This change replaces inline x86 intrinsics with generic versions that compile
for both x86 and ARM depending on the target arch.
This change does not enable the accelerated crc memcpy engine on ARM. That will
be done in a subsequent change after the optimal number of vector and integer
regions for different CPUs is determined.
***
PiperOrigin-RevId: 563416413
Change-Id: Iee630a15ed83c26659adb0e8a03d3f3d3a46d688
|
|
|
|
|
|
|
|
| |
In some configurations this change causes compilation errors. We will roll this
forward again after those issue are addressed.
PiperOrigin-RevId: 562810916
Change-Id: I45b2a8d456273e9eff188f36da8f11323c4dfe66
|
|
|
|
|
|
|
|
|
|
|
|
| |
This change replaces inline x86 intrinsics with generic versions that compile
for both x86 and ARM depending on the target arch.
This change does not enable the accelerated crc memcpy engine on ARM. That will
be done in a subsequent change after the optimal number of vector and integer
regions for different CPUs is determined.
PiperOrigin-RevId: 562785420
Change-Id: I8ba4aa8de17587cedd92532f03767059a481f159
|
|
|
|
|
|
|
| |
natively
PiperOrigin-RevId: 552940359
Change-Id: I925764757404c0c9f2a13ed729190d51f4ac46cf
|
|
|
|
|
|
|
| |
hex instead of dec
PiperOrigin-RevId: 552927211
Change-Id: I0375d60a9df4cdfc694fe8d3b3d790f80fc614a1
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We already prefetch in case of large inputs, do the same
for medium sized inputs as well. This is mostly neutral
for performance in most cases, so this also adds a new
bench with working size >> cache size to ensure that we
are seeing performance benefits of prefetch. Main benefits
are on AMD with hardware prefetchers turned off:
AMD prefetchers on:
name old time/op new time/op delta
BM_Calculate/0 2.43ns ± 1% 2.43ns ± 1% ~ (p=0.814 n=40+40)
BM_Calculate/1 2.50ns ± 2% 2.50ns ± 2% ~ (p=0.745 n=39+39)
BM_Calculate/100 9.17ns ± 1% 9.17ns ± 2% ~ (p=0.747 n=40+40)
BM_Calculate/10000 474ns ± 1% 474ns ± 2% ~ (p=0.749 n=40+40)
BM_Calculate/500000 22.8µs ± 1% 22.9µs ± 2% ~ (p=0.298 n=39+40)
BM_Extend/0 1.38ns ± 1% 1.38ns ± 1% ~ (p=0.651 n=40+40)
BM_Extend/1 1.53ns ± 2% 1.53ns ± 1% ~ (p=0.957 n=40+39)
BM_Extend/100 9.48ns ± 1% 9.48ns ± 2% ~ (p=1.000 n=40+40)
BM_Extend/10000 474ns ± 2% 474ns ± 1% ~ (p=0.928 n=40+40)
BM_Extend/500000 22.8µs ± 1% 22.9µs ± 2% ~ (p=0.331 n=40+40)
BM_Extend/100000000 4.79ms ± 1% 4.79ms ± 1% ~ (p=0.753 n=38+38)
BM_ExtendCacheMiss/10 25.5ms ± 2% 25.5ms ± 2% ~ (p=0.988 n=38+40)
BM_ExtendCacheMiss/100 23.1ms ± 2% 23.1ms ± 2% ~ (p=0.792 n=40+40)
BM_ExtendCacheMiss/1000 37.2ms ± 1% 28.6ms ± 2% -23.00% (p=0.000 n=38+40)
BM_ExtendCacheMiss/100000 7.77ms ± 2% 7.74ms ± 2% -0.45% (p=0.006 n=40+40)
AMD prefetchers off:
name old time/op new time/op delta
BM_Calculate/0 2.43ns ± 2% 2.43ns ± 2% ~ (p=0.351 n=40+39)
BM_Calculate/1 2.51ns ± 2% 2.51ns ± 1% ~ (p=0.535 n=40+40)
BM_Calculate/100 9.18ns ± 2% 9.15ns ± 2% ~ (p=0.120 n=38+39)
BM_Calculate/10000 475ns ± 2% 475ns ± 2% ~ (p=0.852 n=40+40)
BM_Calculate/500000 22.9µs ± 2% 22.8µs ± 2% ~ (p=0.396 n=40+40)
BM_Extend/0 1.38ns ± 2% 1.38ns ± 2% ~ (p=0.466 n=40+40)
BM_Extend/1 1.53ns ± 2% 1.53ns ± 2% ~ (p=0.914 n=40+39)
BM_Extend/100 9.49ns ± 2% 9.49ns ± 2% ~ (p=0.802 n=40+40)
BM_Extend/10000 475ns ± 2% 474ns ± 1% ~ (p=0.589 n=40+40)
BM_Extend/500000 22.8µs ± 2% 22.8µs ± 2% ~ (p=0.872 n=39+40)
BM_Extend/100000000 10.0ms ± 3% 10.0ms ± 4% ~ (p=0.355 n=40+40)
BM_ExtendCacheMiss/10 196ms ± 2% 196ms ± 2% ~ (p=0.698 n=40+40)
BM_ExtendCacheMiss/100 129ms ± 1% 129ms ± 1% ~ (p=0.602 n=36+37)
BM_ExtendCacheMiss/1000 88.6ms ± 1% 57.2ms ± 1% -35.49% (p=0.000 n=36+38)
BM_ExtendCacheMiss/100000 14.9ms ± 1% 14.9ms ± 1% ~ (p=0.888 n=39+40)
Intel skylake:
BM_Calculate/0 2.49ns ± 2% 2.44ns ± 4% -2.15% (p=0.001 n=31+34)
BM_Calculate/1 3.04ns ± 2% 2.98ns ± 9% -1.95% (p=0.003 n=31+35)
BM_Calculate/100 8.64ns ± 3% 8.53ns ± 5% ~ (p=0.065 n=31+35)
BM_Calculate/10000 290ns ± 3% 285ns ± 7% -1.80% (p=0.004 n=28+34)
BM_Calculate/500000 11.8µs ± 2% 11.6µs ± 8% -1.59% (p=0.003 n=26+34)
BM_Extend/0 1.56ns ± 1% 1.52ns ± 3% -2.44% (p=0.000 n=26+35)
BM_Extend/1 1.88ns ± 3% 1.83ns ± 6% -2.17% (p=0.001 n=27+35)
BM_Extend/100 9.31ns ± 3% 9.13ns ± 7% -1.92% (p=0.000 n=33+38)
BM_Extend/10000 290ns ± 3% 283ns ± 3% -2.45% (p=0.000 n=32+38)
BM_Extend/500000 11.8µs ± 2% 11.5µs ± 8% -1.80% (p=0.001 n=35+37)
BM_Extend/100000000 6.39ms ±10% 6.11ms ± 8% -4.34% (p=0.000 n=40+40)
BM_ExtendCacheMiss/10 36.2ms ± 7% 35.8ms ±14% ~ (p=0.281 n=33+37)
BM_ExtendCacheMiss/100 26.9ms ±15% 25.9ms ±12% -3.93% (p=0.000 n=40+40)
BM_ExtendCacheMiss/1000 23.8ms ± 5% 23.4ms ± 5% -1.68% (p=0.001 n=39+40)
BM_ExtendCacheMiss/100000 10.1ms ± 5% 10.0ms ± 4% ~ (p=0.051 n=39+39)
PiperOrigin-RevId: 495119444
Change-Id: I67bcf3b0282b5e1c43122de2837a24c16b8aded7
|
|
|
|
|
| |
PiperOrigin-RevId: 494587777
Change-Id: I41504edca6fcf750d52602fa84a33bc7fe5fbb48
|
|
|
|
|
|
|
| |
Fixes #1329
PiperOrigin-RevId: 491372279
Change-Id: I93c094b06ece9cb9bdb39fd4541353e0344a1a57
|
|
|
|
|
| |
PiperOrigin-RevId: 488373221
Change-Id: I1e30820188cc860ce4df8fddafa04de343ec46af
|
|
This implementation can advantage of hardware acceleration available
on common CPUs when using GCC and Clang. A future update may enable
this on MSVC as well.
PiperOrigin-RevId: 487327024
Change-Id: I99a8f1bcbdf25297e776537e23bd0a902e0818a1
|