| Commit message (Collapse) | Author | Age |
|
|
|
|
|
|
|
|
|
|
|
| |
https://google.github.io/styleguide/cppguide.html#Designated_initializers
recommends using designated initializers as does https://abseil.io/tips/172,
but apparently they are a non-standard extension prior to C++20.
For maximum compatibility, avoid using them here.
Fixes #1413
PiperOrigin-RevId: 516892890
Change-Id: Id7b7857891e39eb52132c3edf70e5bf4973755af
|
|
|
|
| |
This shows that these are member functions that do not modify a class's data.
|
|\
| |
| |
| |
| | |
PiperOrigin-RevId: 511271203
Change-Id: I1ed352e06265b705b62d401a50b4699d01f7f1d7
|
| |
| |
| |
| | |
These make the changed constructors match closer to the other ones that are default.
|
|/
|
|
| |
This also helps a lot with dealing with conversions and data structure creation under the hood.
|
|
|
|
|
| |
PiperOrigin-RevId: 507790741
Change-Id: I347357f9a2d698510f29b7d1b065ef73f9289292
|
|
|
|
|
| |
PiperOrigin-RevId: 505184961
Change-Id: I64482558a76abda6896bec4b2d323833b6cd7edf
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The implementation can be optimized to not having to perform an ExtendByZero operation.
`RemoveCrc32cSuffix` can simply be implemented as
uint32_t result = static_cast<uint32_t>(full_string_crc) ^
static_cast<uint32_t>(suffix_crc);
CrcEngine()->UnextendByZeroes(&result, suffix_len);
return crc32c_t{result};
Math proof that this change is correct:
`ComputeCrc32c` actually computes the following:
ConditionedCRC(data) = UnconditionedCRC(data) + StartValue(data) + ~0
with:
StartValue(data) = ~0 * x**BitLength(data) mod P
(with `+` being a carry-less add, ie an xor).
``UnconditionedCRC` in the context of this description means: no initial or final xor with ~0 and a starting value of zero - ie the result that `CrcEngine()->Extend` would give you with a starting value of 0.
Given `full_string_crc` and `suffix_crc` (both conditioned CRCs), xoring them together results in:
(1):
full_string_crc + suffix_crc =
UnconditionedCRC(full_string) + StartValue(full_string) + ~0
+ UnconditionedCRC(suffix) + StartValue(suffix) + ~0
Since `+` is carry-less addition (ie an XOR), the two ~0 cancel each other out.
(2)
full_string_crc + suffix_crc =
UnconditionedCRC(full_string) + StartValue(full_string)
+ UnconditionedCRC(suffix) + StartValue(suffix)
We can make use of the fact that:
(3)
UnconditionedCRC(full_string) + UnConditionedCRC(suffix)
= UnconditionedCRC(full_string_with_suffix_replaced_by_zeros).
Ie, UnconditionedCRC("AABBB") + UnconditionedCRC("BBB") = UnconditionedCRC("AA\0\0\0")
Putting (3) into (2) yields:
(4)
full_string_crc + suffix_crc =
UnconditionedCRC(full_string_with_suffix_replaced_by_zeros)
+ StartValue(full_string) + StartValue(suffix)
Using:
(5)
UnconditionedCRC(full_string_with_suffix_replaced_by_zeros)
=
UnconditionedCRC(full_string_without_suffix) * x**Bitlength(suffix) mod P
and putting (5) into (4)
(6)
full_string_crc + suffix_crc =
UnconditionedCRC(full_string_without_suffix) * x**Bitlength(suffix) mod P +
StartValue(full_string) + StartValue(suffix)
Using
(7)
StartValue(full_string) = ~0 * x ** Bitlength(full_string) mod P
and
(8)
StartValue(suffix) = ~0 * x**BitLength(suffix) mod P
Putting (7) and (8) in (6):
(9):
full_string_crc + suffix_crc =
UnconditionedCRC(full_string_without_suffix) * x**(Bitlength(suffix)) mod P
+ ~0 * x ** Bitlength(full_string) mod P
+ ~0 * x ** BitLength(suffix) mod P
Using:
(10)
Bitlength(full_string) =
Bitlength(full_string_without_suffix) +
Bitlength(suffix)
And putting (10) in (9):
(11)
full_string_crc + suffix_crc =
UnconditionedCRC(full_string_without_suffix) * x**(Bitlength(suffix)) mod P
+ ~0 * x ** (Bitlength(full_string_without_suffix) + Bitlength(suffix)) mod P
+ ~0 * x ** BitLength(suffix) mod P
using x**(A+B) = x**A * x**B results in:
(12)
full_string_crc + suffix_crc =
UnconditionedCRC(full_string_without_suffix) * x**(Bitlength(suffix)) mod P
+ [ ~0 * x ** Bitlength(full_string_without_suffix) * x**Bitlength(suffix)] mod P
+ ~0 * x ** BitLength(suffix) mod P
using A mod P + B mod P + C mod P = (A + B + C) mod P:
(this works in carry-less arithmetic)
(13)
full_string_crc + suffix_crc = [
UnconditionedCRC(full_string_without_suffix) * x**(Bitlength(suffix))
+ [ ~0 * x ** Bitlength(full_string_without_suffix) * x**Bitlength(suffix)]
+ ~0 * x ** BitLength(suffix) ] mod P
Factor out x**Bitlength(suffix):
(14)
full_string_crc + suffix_crc = [
x**(Bitlength(suffix)) * [
UnconditionedCRC(full_string_without_suffix)
+ ~0 * x ** Bitlength(full_string_without_suffix)
+ ~0 ] mod P
Using:
(15)
ConditionedCRC(full_string_without_suffix) =
[ UnconditionedCRC(full_string_without_suffix)
+ ~0 * x ** Bitlength(full_string_without_suffix) ] mod P + ~0
=
[ UnconditionedCRC(full_string_without_suffix)
+ ~0 * x ** Bitlength(full_string_without_suffix) + ~0] mod P
(~0 is less than x**32, so ~0 mod P = ~0)
Putting (15) in (14) results in:
full_string_crc + suffix_crc = [
x**(Bitlength(suffix)) * ConditionedCRC(full_string_without_suffix)] mod P
Or:
(16)
ConditionedCRC(full_string_without_suffix) =
(full_string_crc + suffix_crc) * x**(-Bitlength(suffix)) mod P
A multiplication by x**(-8*bytelength) mod P is implemented by `CrcEngine()->UnextendByZeros`.
PiperOrigin-RevId: 502659140
Change-Id: I66b0700d258f948be0885f691370b73d7fad56e3
|
|
|
|
|
| |
PiperOrigin-RevId: 501464530
Change-Id: I5a0929a2b88c1c158b1696634a65ffda9c4b8590
|
|
|
|
|
|
|
|
| |
This also ensures that there is only one definition of
GetArchSpecificEngines by moving the condition to a common place.
PiperOrigin-RevId: 500038304
Change-Id: If0c55d701dfdc11a1a9c8c1b34eb220435529ffb
|
|
|
|
|
|
|
|
| |
32-bit builds with SSE 4.2 do exist, and these builds do not work
without this patch.
PiperOrigin-RevId: 499498979
Change-Id: I0ade09068804655652c07d0f1ef13554464a1558
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We already prefetch in case of large inputs, do the same
for medium sized inputs as well. This is mostly neutral
for performance in most cases, so this also adds a new
bench with working size >> cache size to ensure that we
are seeing performance benefits of prefetch. Main benefits
are on AMD with hardware prefetchers turned off:
AMD prefetchers on:
name old time/op new time/op delta
BM_Calculate/0 2.43ns ± 1% 2.43ns ± 1% ~ (p=0.814 n=40+40)
BM_Calculate/1 2.50ns ± 2% 2.50ns ± 2% ~ (p=0.745 n=39+39)
BM_Calculate/100 9.17ns ± 1% 9.17ns ± 2% ~ (p=0.747 n=40+40)
BM_Calculate/10000 474ns ± 1% 474ns ± 2% ~ (p=0.749 n=40+40)
BM_Calculate/500000 22.8µs ± 1% 22.9µs ± 2% ~ (p=0.298 n=39+40)
BM_Extend/0 1.38ns ± 1% 1.38ns ± 1% ~ (p=0.651 n=40+40)
BM_Extend/1 1.53ns ± 2% 1.53ns ± 1% ~ (p=0.957 n=40+39)
BM_Extend/100 9.48ns ± 1% 9.48ns ± 2% ~ (p=1.000 n=40+40)
BM_Extend/10000 474ns ± 2% 474ns ± 1% ~ (p=0.928 n=40+40)
BM_Extend/500000 22.8µs ± 1% 22.9µs ± 2% ~ (p=0.331 n=40+40)
BM_Extend/100000000 4.79ms ± 1% 4.79ms ± 1% ~ (p=0.753 n=38+38)
BM_ExtendCacheMiss/10 25.5ms ± 2% 25.5ms ± 2% ~ (p=0.988 n=38+40)
BM_ExtendCacheMiss/100 23.1ms ± 2% 23.1ms ± 2% ~ (p=0.792 n=40+40)
BM_ExtendCacheMiss/1000 37.2ms ± 1% 28.6ms ± 2% -23.00% (p=0.000 n=38+40)
BM_ExtendCacheMiss/100000 7.77ms ± 2% 7.74ms ± 2% -0.45% (p=0.006 n=40+40)
AMD prefetchers off:
name old time/op new time/op delta
BM_Calculate/0 2.43ns ± 2% 2.43ns ± 2% ~ (p=0.351 n=40+39)
BM_Calculate/1 2.51ns ± 2% 2.51ns ± 1% ~ (p=0.535 n=40+40)
BM_Calculate/100 9.18ns ± 2% 9.15ns ± 2% ~ (p=0.120 n=38+39)
BM_Calculate/10000 475ns ± 2% 475ns ± 2% ~ (p=0.852 n=40+40)
BM_Calculate/500000 22.9µs ± 2% 22.8µs ± 2% ~ (p=0.396 n=40+40)
BM_Extend/0 1.38ns ± 2% 1.38ns ± 2% ~ (p=0.466 n=40+40)
BM_Extend/1 1.53ns ± 2% 1.53ns ± 2% ~ (p=0.914 n=40+39)
BM_Extend/100 9.49ns ± 2% 9.49ns ± 2% ~ (p=0.802 n=40+40)
BM_Extend/10000 475ns ± 2% 474ns ± 1% ~ (p=0.589 n=40+40)
BM_Extend/500000 22.8µs ± 2% 22.8µs ± 2% ~ (p=0.872 n=39+40)
BM_Extend/100000000 10.0ms ± 3% 10.0ms ± 4% ~ (p=0.355 n=40+40)
BM_ExtendCacheMiss/10 196ms ± 2% 196ms ± 2% ~ (p=0.698 n=40+40)
BM_ExtendCacheMiss/100 129ms ± 1% 129ms ± 1% ~ (p=0.602 n=36+37)
BM_ExtendCacheMiss/1000 88.6ms ± 1% 57.2ms ± 1% -35.49% (p=0.000 n=36+38)
BM_ExtendCacheMiss/100000 14.9ms ± 1% 14.9ms ± 1% ~ (p=0.888 n=39+40)
Intel skylake:
BM_Calculate/0 2.49ns ± 2% 2.44ns ± 4% -2.15% (p=0.001 n=31+34)
BM_Calculate/1 3.04ns ± 2% 2.98ns ± 9% -1.95% (p=0.003 n=31+35)
BM_Calculate/100 8.64ns ± 3% 8.53ns ± 5% ~ (p=0.065 n=31+35)
BM_Calculate/10000 290ns ± 3% 285ns ± 7% -1.80% (p=0.004 n=28+34)
BM_Calculate/500000 11.8µs ± 2% 11.6µs ± 8% -1.59% (p=0.003 n=26+34)
BM_Extend/0 1.56ns ± 1% 1.52ns ± 3% -2.44% (p=0.000 n=26+35)
BM_Extend/1 1.88ns ± 3% 1.83ns ± 6% -2.17% (p=0.001 n=27+35)
BM_Extend/100 9.31ns ± 3% 9.13ns ± 7% -1.92% (p=0.000 n=33+38)
BM_Extend/10000 290ns ± 3% 283ns ± 3% -2.45% (p=0.000 n=32+38)
BM_Extend/500000 11.8µs ± 2% 11.5µs ± 8% -1.80% (p=0.001 n=35+37)
BM_Extend/100000000 6.39ms ±10% 6.11ms ± 8% -4.34% (p=0.000 n=40+40)
BM_ExtendCacheMiss/10 36.2ms ± 7% 35.8ms ±14% ~ (p=0.281 n=33+37)
BM_ExtendCacheMiss/100 26.9ms ±15% 25.9ms ±12% -3.93% (p=0.000 n=40+40)
BM_ExtendCacheMiss/1000 23.8ms ± 5% 23.4ms ± 5% -1.68% (p=0.001 n=39+40)
BM_ExtendCacheMiss/100000 10.1ms ± 5% 10.0ms ± 4% ~ (p=0.051 n=39+39)
PiperOrigin-RevId: 495119444
Change-Id: I67bcf3b0282b5e1c43122de2837a24c16b8aded7
|
|
|
|
|
| |
PiperOrigin-RevId: 494749165
Change-Id: I8d855be9c508a9fdfb5f60e87471c0947057ecc9
|
|
|
|
|
| |
PiperOrigin-RevId: 494587777
Change-Id: I41504edca6fcf750d52602fa84a33bc7fe5fbb48
|
|\
| |
| |
| |
| | |
PiperOrigin-RevId: 493386604
Change-Id: I289cb38b4a3da5760ab7ef3976d402d165d7e10f
|
|/ |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
coverage of the accelerated CRC implementation and some differences
bewteen the internal and external implementation.
This change adds CI coverage to the
linux_clang-latest_libstdcxx_bazel.sh script assuming this script
always runs on machines of at least the Intel Haswell generation.
Fixes include:
* Remove the use of the deprecated xor operator on crc32c_t
* Remove #pragma unroll_completely, which isn't known by GCC or Clang:
https://godbolt.org/z/97j4vbacs
* Fixes for -Wsign-compare, -Wsign-conversion and -Wshorten-64-to-32
PiperOrigin-RevId: 491965029
Change-Id: Ic5e1f3a20f69fcd35fe81ebef63443ad26bf7931
|
|
|
|
|
| |
PiperOrigin-RevId: 491722639
Change-Id: Iff13661095d10c82599ad30f7220700825a78c9e
|
|
|
|
|
|
|
|
|
|
| |
std::array has a special-case to allow this
https://en.cppreference.com/w/cpp/container/array
Fixes #1332
PiperOrigin-RevId: 491703960
Change-Id: Ib83a1f0865448314e463e8ebf39ae3b842f762ea
|
|
|
|
|
| |
PiperOrigin-RevId: 491681300
Change-Id: I4ecdd3bf359cda7592b6c392a2fbb61b8394f71b
|
|
|
|
|
|
|
|
|
| |
The motivation is to explicitly remove and document dangerous
operations like adding crc32c_t to a set, because equality is not
enough to guarantee uniqueness.
PiperOrigin-RevId: 491656425
Change-Id: I7b4dadc1a59ea9861e6ec7a929d64b5746467832
|
|
|
|
|
|
|
|
|
|
|
|
| |
According to https://stackoverflow.com/a/68939636 it is safe to use
__m128i instead.
https://learn.microsoft.com/en-us/cpp/intrinsics/x86-intrinsics-list?view=msvc-170 also uses this type instead
Fixes #1330
PiperOrigin-RevId: 491427300
Change-Id: I4a1d44ac4d5e7c1e1ee063ff397935df118254a1
|
|
|
|
|
|
|
| |
Fixes #1329
PiperOrigin-RevId: 491372279
Change-Id: I93c094b06ece9cb9bdb39fd4541353e0344a1a57
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently we take generic/default code-path on AMD due to misspelling.
Mostly helps with crc+memcpy:
name old speed new speed delta
BM_Memcpy/1 156MB/s ± 1% 156MB/s ± 1% ~ (p=0.563 n=18+18)
BM_Memcpy/100 6.38GB/s ± 1% 6.50GB/s ± 1% +1.89% (p=0.000 n=19+19)
BM_Memcpy/10000 14.6GB/s ± 1% 21.7GB/s ± 0% +49.01% (p=0.000 n=20+19)
BM_Memcpy/500000 13.5GB/s ± 1% 19.9GB/s ± 0% +47.35% (p=0.000 n=18+17)
PiperOrigin-RevId: 490572650
Change-Id: Id7901321a23262c0ab62a2d82fae86cf42acf16d
|
|
|
|
|
|
|
| |
Using /arch:AVX on MSVC now uses the accelerated implementation
PiperOrigin-RevId: 490550573
Change-Id: I924259845f38ee41d15f23f95ad085ad664642b5
|
|
|
|
|
| |
PiperOrigin-RevId: 488373221
Change-Id: I1e30820188cc860ce4df8fddafa04de343ec46af
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes the build on arm64 macOS.
Note that hardware acceleration is not yet enabled on arm64 when not
running under Linux.
Addresses the report from https://github.com/abseil/abseil-cpp/commit/1687dbf814eceb93de2d93f91b31acaab404091c#commitcomment-89529264
PiperOrigin-RevId: 487655295
Change-Id: I168dfc863c960d0b694b26dfcb85ff0fd0e95a1e
|
|
This implementation can advantage of hardware acceleration available
on common CPUs when using GCC and Clang. A future update may enable
this on MSVC as well.
PiperOrigin-RevId: 487327024
Change-Id: I99a8f1bcbdf25297e776537e23bd0a902e0818a1
|