| Commit message (Collapse) | Author | Age |
|
|
|
|
|
|
|
|
|
|
|
| |
This bug does not currently affect any users, since AcceleratedCrcMemcpyEngine
is never configured with a single region.
Before this CL, if the number of regions for the AcceleratedCrcMemcpyEngine was
set to one, the CRC for the sole region would be incorrectly concatenated onto
itself and corrupted.
PiperOrigin-RevId: 561663848
Change-Id: Ibfc596306ab07db906d2e3ecf6eea3f6cb9f1b2b
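The single-region case can be illustrated with a minimal sketch, assuming C++17 and a plain bitwise CRC-32C. `Crc32c`, `ConcatCrc`, `RegionCrc` and `CombineRegionCrcs` are hypothetical names invented for this illustration; this is not the AcceleratedCrcMemcpyEngine code. The point is that folding per-region CRCs must start from the first region and only concatenate the regions that follow it, so a lone region is returned unchanged.

```cpp
#include <stddef.h>
#include <stdint.h>

#include <string_view>
#include <vector>

// CRC-32C (Castagnoli) polynomial in reflected bit order.
constexpr uint32_t kPoly = 0x82F63B78u;

// Bitwise CRC-32C with the usual ~0 initial value and final xor.
uint32_t Crc32c(std::string_view data) {
  uint32_t state = 0xFFFFFFFFu;
  for (unsigned char byte : data) {
    state ^= byte;
    for (int bit = 0; bit < 8; ++bit) {
      state = (state >> 1) ^ ((state & 1u) ? kPoly : 0u);
    }
  }
  return state ^ 0xFFFFFFFFu;
}

// Returns crc(A + B) given crc(A), crc(B) and the byte length of B.
uint32_t ConcatCrc(uint32_t crc_a, uint32_t crc_b, size_t len_b) {
  // Multiply crc_a by x**(8 * len_b) mod P, one bit at a time.
  for (size_t i = 0; i < 8 * len_b; ++i) {
    crc_a = (crc_a >> 1) ^ ((crc_a & 1u) ? kPoly : 0u);
  }
  return crc_a ^ crc_b;
}

// Hypothetical per-region result: the region's CRC and its byte length.
struct RegionCrc {
  uint32_t crc;
  size_t length;
};

// Folds the per-region CRCs left to right. With exactly one region the loop
// body never runs, so that region's CRC is never concatenated onto itself.
uint32_t CombineRegionCrcs(const std::vector<RegionCrc>& regions) {
  if (regions.empty()) return 0;  // CRC-32C of the empty string.
  uint32_t combined = regions[0].crc;
  for (size_t i = 1; i < regions.size(); ++i) {
    combined = ConcatCrc(combined, regions[i].crc, regions[i].length);
  }
  return combined;
}
```

Seeding the fold with region 0 and starting the loop at index 1 is one way to make the single-region case fall out naturally; the actual fix may be structured differently.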
|
|
|
|
|
| |
PiperOrigin-RevId: 561444259
Change-Id: I205ba9f11f4d41163ce74ae9cfa417fe500ccab3
|
|
|
|
|
| |
PiperOrigin-RevId: 561119886
Change-Id: Ia1483fdb237f4b211068c7ad1f780ab3e6b81eca
|
|
|
|
|
| |
PiperOrigin-RevId: 561108037
Change-Id: Idff65e288384cb55ce69f789db2d9374ae781d3d
|
|
|
|
|
|
|
|
|
| |
Using the non-temporal AVX engine for unknown CPU types looks like a mistake to
me, and the default built into the switch case is to use the fallback engine. I
don't think this is causing issues now, but it might once we add ARM support.
PiperOrigin-RevId: 561097994
Change-Id: I7f0edd447017c09acd49e4ea11476e32740d630a
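A minimal sketch of the intended selection logic, under stated assumptions: the `CpuKind` and `EngineKind` enumerators and the function `SelectEngine` are hypothetical names made up for this illustration, and the mapping for known CPUs is arbitrary. Only the handling of the unknown case reflects the description above.

```cpp
// Hypothetical CPU classification; the real code distinguishes many more types.
enum class CpuKind { kUnknown, kVendorA, kVendorB };

// Hypothetical engine identifiers.
enum class EngineKind { kFallback, kAccelerated, kNonTemporalAvx };

// Picks an engine for a CPU. Anything that is not explicitly recognized,
// including kUnknown, gets the portable fallback engine rather than an
// engine that relies on AVX non-temporal stores.
EngineKind SelectEngine(CpuKind cpu) {
  switch (cpu) {
    case CpuKind::kVendorA:
      return EngineKind::kAccelerated;     // illustrative mapping only
    case CpuKind::kVendorB:
      return EngineKind::kNonTemporalAvx;  // illustrative mapping only
    case CpuKind::kUnknown:
    default:
      return EngineKind::kFallback;
  }
}
```

Routing the unknown case through the same branch as `default` keeps the two from drifting apart when new CPU types, such as ARM parts, are added later.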
|
|
|
|
|
| |
PiperOrigin-RevId: 554936252
Change-Id: Idb2ffbbc11aa6c98414fdd1ec38873d4687ab5e7
|
|
|
|
|
|
|
| |
natively
PiperOrigin-RevId: 552940359
Change-Id: I925764757404c0c9f2a13ed729190d51f4ac46cf
|
|
|
|
|
|
|
| |
hex instead of dec
PiperOrigin-RevId: 552927211
Change-Id: I0375d60a9df4cdfc694fe8d3b3d790f80fc614a1
|
|
|
|
|
| |
PiperOrigin-RevId: 552638642
Change-Id: I6b43289ca10ee9aecd6b848e78471863b22b01d1
|
|
|
|
|
|
|
| |
implementation.
PiperOrigin-RevId: 539749773
Change-Id: Iec83431ffd360a077b153cea00427580ae287d1f
|
|
|
|
|
|
|
|
|
|
|
|
| |
Imported from GitHub PR https://github.com/abseil/abseil-cpp/pull/1452
__cpuid is declared in intrin.h, but is excluded on non-Windows platforms.
We add this declaration to compensate.
Fixes #1358
PiperOrigin-RevId: 534449804
Change-Id: I91027f79d8d52c4da428d5c3a53e2cec00825c13
|
|
|
|
|
| |
PiperOrigin-RevId: 534213948
Change-Id: I56b897060b9afe9d3d338756c80e52f421653b55
|
|
|
|
|
| |
PiperOrigin-RevId: 534179290
Change-Id: I9ad24518cc6a336fbaf602269fb01319491c8b60
|
| |
|
|\
| |
| |
| |
| | |
PiperOrigin-RevId: 527066823
Change-Id: Ifa1e9a43c7490b34f9f4dbfa12d3acbed6b49777
|
|/ |
|
|
|
|
|
|
|
|
|
|
|
|
| |
https://google.github.io/styleguide/cppguide.html#Designated_initializers
recommends using designated initializers as does https://abseil.io/tips/172,
but apparently they are a non-standard extension prior to C++20.
For maximum compatibility, avoid using them here.
Fixes #1413
PiperOrigin-RevId: 516892890
Change-Id: Id7b7857891e39eb52132c3edf70e5bf4973755af
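For illustration, a portable alternative to designated initializers might look like the sketch below. `EngineConfig` and its fields are hypothetical and do not correspond to any struct in the library.

```cpp
// Hypothetical configuration struct used only for this illustration.
struct EngineConfig {
  int vector_regions;
  int integer_regions;
  bool non_temporal;
};

EngineConfig MakeConfig() {
  // The C++20 spelling would be:
  //   EngineConfig{.vector_regions = 3, .integer_regions = 0,
  //                .non_temporal = false};
  // Before C++20 that syntax is a compiler extension, so assign each
  // member by name instead.
  EngineConfig config;
  config.vector_regions = 3;
  config.integer_regions = 0;
  config.non_temporal = false;
  return config;
}
```

Member-by-member assignment keeps the field names visible at the call site, which is the main readability benefit of designated initializers, while compiling under older language standards.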
|
|
|
|
| |
This shows that these are member functions that do not modify a class's data.
|
|\
| |
| |
| |
| | |
PiperOrigin-RevId: 511271203
Change-Id: I1ed352e06265b705b62d401a50b4699d01f7f1d7
|
| |
| |
| |
| | |
These changes make the affected constructors match the other defaulted constructors more closely.
|
|/
|
|
| |
This also helps considerably with conversions and data structure creation under the hood.
|
|
|
|
|
| |
PiperOrigin-RevId: 507790741
Change-Id: I347357f9a2d698510f29b7d1b065ef73f9289292
|
|
|
|
|
| |
PiperOrigin-RevId: 505184961
Change-Id: I64482558a76abda6896bec4b2d323833b6cd7edf
|
| |
The implementation can be optimized to avoid performing an ExtendByZero operation.
`RemoveCrc32cSuffix` can simply be implemented as
uint32_t result = static_cast<uint32_t>(full_string_crc) ^
static_cast<uint32_t>(suffix_crc);
CrcEngine()->UnextendByZeroes(&result, suffix_len);
return crc32c_t{result};
Math proof that this change is correct:
`ComputeCrc32c` actually computes the following:
ConditionedCRC(data) = UnconditionedCRC(data) + StartValue(data) + ~0
with:
StartValue(data) = ~0 * x**BitLength(data) mod P
(with `+` being a carry-less add, ie an xor).
`UnconditionedCRC` in the context of this description means: no initial or final xor with ~0 and a starting value of zero - i.e. the result that `CrcEngine()->Extend` would give you with a starting value of 0.
Given `full_string_crc` and `suffix_crc` (both conditioned CRCs), xoring them together results in:
(1):
full_string_crc + suffix_crc =
UnconditionedCRC(full_string) + StartValue(full_string) + ~0
+ UnconditionedCRC(suffix) + StartValue(suffix) + ~0
Since `+` is carry-less addition (ie an XOR), the two ~0 cancel each other out.
(2)
full_string_crc + suffix_crc =
UnconditionedCRC(full_string) + StartValue(full_string)
+ UnconditionedCRC(suffix) + StartValue(suffix)
We can make use of the fact that:
(3)
UnconditionedCRC(full_string) + UnconditionedCRC(suffix)
= UnconditionedCRC(full_string_with_suffix_replaced_by_zeros).
Ie, UnconditionedCRC("AABBB") + UnconditionedCRC("BBB") = UnconditionedCRC("AA\0\0\0")
Putting (3) into (2) yields:
(4)
full_string_crc + suffix_crc =
UnconditionedCRC(full_string_with_suffix_replaced_by_zeros)
+ StartValue(full_string) + StartValue(suffix)
Using:
(5)
UnconditionedCRC(full_string_with_suffix_replaced_by_zeros)
=
UnconditionedCRC(full_string_without_suffix) * x**Bitlength(suffix) mod P
and putting (5) into (4)
(6)
full_string_crc + suffix_crc =
UnconditionedCRC(full_string_without_suffix) * x**Bitlength(suffix) mod P +
StartValue(full_string) + StartValue(suffix)
Using
(7)
StartValue(full_string) = ~0 * x ** Bitlength(full_string) mod P
and
(8)
StartValue(suffix) = ~0 * x**BitLength(suffix) mod P
Putting (7) and (8) in (6):
(9):
full_string_crc + suffix_crc =
UnconditionedCRC(full_string_without_suffix) * x**(Bitlength(suffix)) mod P
+ ~0 * x ** Bitlength(full_string) mod P
+ ~0 * x ** BitLength(suffix) mod P
Using:
(10)
Bitlength(full_string) =
Bitlength(full_string_without_suffix) +
Bitlength(suffix)
And putting (10) in (9):
(11)
full_string_crc + suffix_crc =
UnconditionedCRC(full_string_without_suffix) * x**(Bitlength(suffix)) mod P
+ ~0 * x ** (Bitlength(full_string_without_suffix) + Bitlength(suffix)) mod P
+ ~0 * x ** BitLength(suffix) mod P
using x**(A+B) = x**A * x**B results in:
(12)
full_string_crc + suffix_crc =
UnconditionedCRC(full_string_without_suffix) * x**(Bitlength(suffix)) mod P
+ [ ~0 * x ** Bitlength(full_string_without_suffix) * x**Bitlength(suffix)] mod P
+ ~0 * x ** BitLength(suffix) mod P
using A mod P + B mod P + C mod P = (A + B + C) mod P:
(this works in carry-less arithmetic)
(13)
full_string_crc + suffix_crc = [
UnconditionedCRC(full_string_without_suffix) * x**(Bitlength(suffix))
+ [ ~0 * x ** Bitlength(full_string_without_suffix) * x**Bitlength(suffix)]
+ ~0 * x ** BitLength(suffix) ] mod P
Factor out x**Bitlength(suffix):
(14)
full_string_crc + suffix_crc = [
x**(Bitlength(suffix)) * [
UnconditionedCRC(full_string_without_suffix)
+ ~0 * x ** Bitlength(full_string_without_suffix)
+ ~0 ] ] mod P
Using:
(15)
ConditionedCRC(full_string_without_suffix) =
[ UnconditionedCRC(full_string_without_suffix)
+ ~0 * x ** Bitlength(full_string_without_suffix) ] mod P + ~0
=
[ UnconditionedCRC(full_string_without_suffix)
+ ~0 * x ** Bitlength(full_string_without_suffix) + ~0] mod P
(~0 is less than x**32, so ~0 mod P = ~0)
Putting (15) in (14) results in:
full_string_crc + suffix_crc = [
x**(Bitlength(suffix)) * ConditionedCRC(full_string_without_suffix)] mod P
Or:
(16)
ConditionedCRC(full_string_without_suffix) =
(full_string_crc + suffix_crc) * x**(-Bitlength(suffix)) mod P
A multiplication by x**(-8*bytelength) mod P is implemented by `CrcEngine()->UnextendByZeroes`.
PiperOrigin-RevId: 502659140
Change-Id: I66b0700d258f948be0885f691370b73d7fad56e3
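The identity in (16) can be checked numerically with a small sketch, assuming C++17 and a slow bitwise CRC-32C. `Crc32c`, `UnextendByZeroBytes` and `RemoveSuffixSketch` are hypothetical helpers written for this illustration; they stand in for `ComputeCrc32c`, `CrcEngine()->UnextendByZeroes` and `RemoveCrc32cSuffix` and are not the library's implementation.

```cpp
#include <stddef.h>
#include <stdint.h>

#include <string_view>

// CRC-32C (Castagnoli) polynomial in reflected bit order.
constexpr uint32_t kPoly = 0x82F63B78u;

// Conditioned CRC-32C: initial value ~0 and final xor with ~0.
uint32_t Crc32c(std::string_view data) {
  uint32_t state = 0xFFFFFFFFu;
  for (unsigned char byte : data) {
    state ^= byte;
    for (int bit = 0; bit < 8; ++bit) {
      // One step multiplies the state by x mod P.
      state = (state >> 1) ^ ((state & 1u) ? kPoly : 0u);
    }
  }
  return state ^ 0xFFFFFFFFu;
}

// Multiplies `value` by x**(-8 * length) mod P by undoing one step at a time.
// kPoly has its top bit set, so the top bit of `value` records whether the
// forward step xored in the polynomial.
uint32_t UnextendByZeroBytes(uint32_t value, size_t length) {
  for (size_t i = 0; i < 8 * length; ++i) {
    if (value & 0x80000000u) {
      value = ((value ^ kPoly) << 1) | 1u;
    } else {
      value <<= 1;
    }
  }
  return value;
}

// Equation (16): xor the two conditioned CRCs, then divide by x**Bitlength(suffix).
uint32_t RemoveSuffixSketch(uint32_t full_string_crc, uint32_t suffix_crc,
                            size_t suffix_len) {
  return UnextendByZeroBytes(full_string_crc ^ suffix_crc, suffix_len);
}
```

For example, `RemoveSuffixSketch(Crc32c("hello world"), Crc32c(" world"), 6)` yields the same value as `Crc32c("hello")`. Note that no xor with ~0 is applied around the unextend step, which matches the derivation: the conditioning terms cancel in step (2) and are reabsorbed in step (15).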
|
|
|
|
|
| |
PiperOrigin-RevId: 501464530
Change-Id: I5a0929a2b88c1c158b1696634a65ffda9c4b8590
|
|
|
|
|
|
|
|
| |
This also ensures that there is only one definition of
GetArchSpecificEngines by moving the condition to a common place.
PiperOrigin-RevId: 500038304
Change-Id: If0c55d701dfdc11a1a9c8c1b34eb220435529ffb
|
|
|
|
|
|
|
|
| |
32-bit builds with SSE 4.2 do exist, and these builds do not work
without this patch.
PiperOrigin-RevId: 499498979
Change-Id: I0ade09068804655652c07d0f1ef13554464a1558
|
| |
We already prefetch in case of large inputs, do the same
for medium sized inputs as well. This is mostly neutral
for performance in most cases, so this also adds a new
bench with working size >> cache size to ensure that we
are seeing performance benefits of prefetch. Main benefits
are on AMD with hardware prefetchers turned off:
AMD prefetchers on:
name old time/op new time/op delta
BM_Calculate/0 2.43ns ± 1% 2.43ns ± 1% ~ (p=0.814 n=40+40)
BM_Calculate/1 2.50ns ± 2% 2.50ns ± 2% ~ (p=0.745 n=39+39)
BM_Calculate/100 9.17ns ± 1% 9.17ns ± 2% ~ (p=0.747 n=40+40)
BM_Calculate/10000 474ns ± 1% 474ns ± 2% ~ (p=0.749 n=40+40)
BM_Calculate/500000 22.8µs ± 1% 22.9µs ± 2% ~ (p=0.298 n=39+40)
BM_Extend/0 1.38ns ± 1% 1.38ns ± 1% ~ (p=0.651 n=40+40)
BM_Extend/1 1.53ns ± 2% 1.53ns ± 1% ~ (p=0.957 n=40+39)
BM_Extend/100 9.48ns ± 1% 9.48ns ± 2% ~ (p=1.000 n=40+40)
BM_Extend/10000 474ns ± 2% 474ns ± 1% ~ (p=0.928 n=40+40)
BM_Extend/500000 22.8µs ± 1% 22.9µs ± 2% ~ (p=0.331 n=40+40)
BM_Extend/100000000 4.79ms ± 1% 4.79ms ± 1% ~ (p=0.753 n=38+38)
BM_ExtendCacheMiss/10 25.5ms ± 2% 25.5ms ± 2% ~ (p=0.988 n=38+40)
BM_ExtendCacheMiss/100 23.1ms ± 2% 23.1ms ± 2% ~ (p=0.792 n=40+40)
BM_ExtendCacheMiss/1000 37.2ms ± 1% 28.6ms ± 2% -23.00% (p=0.000 n=38+40)
BM_ExtendCacheMiss/100000 7.77ms ± 2% 7.74ms ± 2% -0.45% (p=0.006 n=40+40)
AMD prefetchers off:
name old time/op new time/op delta
BM_Calculate/0 2.43ns ± 2% 2.43ns ± 2% ~ (p=0.351 n=40+39)
BM_Calculate/1 2.51ns ± 2% 2.51ns ± 1% ~ (p=0.535 n=40+40)
BM_Calculate/100 9.18ns ± 2% 9.15ns ± 2% ~ (p=0.120 n=38+39)
BM_Calculate/10000 475ns ± 2% 475ns ± 2% ~ (p=0.852 n=40+40)
BM_Calculate/500000 22.9µs ± 2% 22.8µs ± 2% ~ (p=0.396 n=40+40)
BM_Extend/0 1.38ns ± 2% 1.38ns ± 2% ~ (p=0.466 n=40+40)
BM_Extend/1 1.53ns ± 2% 1.53ns ± 2% ~ (p=0.914 n=40+39)
BM_Extend/100 9.49ns ± 2% 9.49ns ± 2% ~ (p=0.802 n=40+40)
BM_Extend/10000 475ns ± 2% 474ns ± 1% ~ (p=0.589 n=40+40)
BM_Extend/500000 22.8µs ± 2% 22.8µs ± 2% ~ (p=0.872 n=39+40)
BM_Extend/100000000 10.0ms ± 3% 10.0ms ± 4% ~ (p=0.355 n=40+40)
BM_ExtendCacheMiss/10 196ms ± 2% 196ms ± 2% ~ (p=0.698 n=40+40)
BM_ExtendCacheMiss/100 129ms ± 1% 129ms ± 1% ~ (p=0.602 n=36+37)
BM_ExtendCacheMiss/1000 88.6ms ± 1% 57.2ms ± 1% -35.49% (p=0.000 n=36+38)
BM_ExtendCacheMiss/100000 14.9ms ± 1% 14.9ms ± 1% ~ (p=0.888 n=39+40)
Intel Skylake:
BM_Calculate/0 2.49ns ± 2% 2.44ns ± 4% -2.15% (p=0.001 n=31+34)
BM_Calculate/1 3.04ns ± 2% 2.98ns ± 9% -1.95% (p=0.003 n=31+35)
BM_Calculate/100 8.64ns ± 3% 8.53ns ± 5% ~ (p=0.065 n=31+35)
BM_Calculate/10000 290ns ± 3% 285ns ± 7% -1.80% (p=0.004 n=28+34)
BM_Calculate/500000 11.8µs ± 2% 11.6µs ± 8% -1.59% (p=0.003 n=26+34)
BM_Extend/0 1.56ns ± 1% 1.52ns ± 3% -2.44% (p=0.000 n=26+35)
BM_Extend/1 1.88ns ± 3% 1.83ns ± 6% -2.17% (p=0.001 n=27+35)
BM_Extend/100 9.31ns ± 3% 9.13ns ± 7% -1.92% (p=0.000 n=33+38)
BM_Extend/10000 290ns ± 3% 283ns ± 3% -2.45% (p=0.000 n=32+38)
BM_Extend/500000 11.8µs ± 2% 11.5µs ± 8% -1.80% (p=0.001 n=35+37)
BM_Extend/100000000 6.39ms ±10% 6.11ms ± 8% -4.34% (p=0.000 n=40+40)
BM_ExtendCacheMiss/10 36.2ms ± 7% 35.8ms ±14% ~ (p=0.281 n=33+37)
BM_ExtendCacheMiss/100 26.9ms ±15% 25.9ms ±12% -3.93% (p=0.000 n=40+40)
BM_ExtendCacheMiss/1000 23.8ms ± 5% 23.4ms ± 5% -1.68% (p=0.001 n=39+40)
BM_ExtendCacheMiss/100000 10.1ms ± 5% 10.0ms ± 4% ~ (p=0.051 n=39+39)
PiperOrigin-RevId: 495119444
Change-Id: I67bcf3b0282b5e1c43122de2837a24c16b8aded7
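The general shape of prefetching ahead of the read position can be sketched as follows. This is only an illustration under stated assumptions: a byte sum stands in for the CRC computation, `kPrefetchDistance` and `kBlock` are arbitrary values chosen for the example, `SKETCH_PREFETCH` is a hypothetical macro, and `__builtin_prefetch` is assumed to be available only on GCC and Clang.

```cpp
#include <stddef.h>
#include <stdint.h>

// Issue a prefetch hint where the compiler supports it; otherwise do nothing.
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_PREFETCH(addr) __builtin_prefetch(addr)
#else
#define SKETCH_PREFETCH(addr) ((void)0)
#endif

constexpr size_t kBlock = 64;              // assumed block size in bytes
constexpr size_t kPrefetchDistance = 256;  // assumed look-ahead in bytes

// Sums all bytes, asking the hardware to start loading data that will be
// needed a few blocks from now while the current block is processed.
uint64_t SumWithPrefetch(const unsigned char* data, size_t length) {
  uint64_t sum = 0;
  size_t i = 0;
  for (; i + kBlock <= length; i += kBlock) {
    if (i + kPrefetchDistance < length) {
      SKETCH_PREFETCH(data + i + kPrefetchDistance);
    }
    for (size_t j = 0; j < kBlock; ++j) {
      sum += data[i + j];
    }
  }
  for (; i < length; ++i) {  // tail shorter than one block
    sum += data[i];
  }
  return sum;
}
```

The prefetch is only a hint, so the result is identical with or without it; any benefit shows up as timing differences on working sets larger than the cache, as in the BM_ExtendCacheMiss rows above.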
|
|
|
|
|
| |
PiperOrigin-RevId: 494749165
Change-Id: I8d855be9c508a9fdfb5f60e87471c0947057ecc9
|
|
|
|
|
| |
PiperOrigin-RevId: 494587777
Change-Id: I41504edca6fcf750d52602fa84a33bc7fe5fbb48
|
|\
| |
| |
| |
| | |
PiperOrigin-RevId: 493386604
Change-Id: I289cb38b4a3da5760ab7ef3976d402d165d7e10f
|
|/ |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
coverage of the accelerated CRC implementation and some differences
between the internal and external implementation.
This change adds CI coverage to the
linux_clang-latest_libstdcxx_bazel.sh script assuming this script
always runs on machines of at least the Intel Haswell generation.
Fixes include:
* Remove the use of the deprecated xor operator on crc32c_t
* Remove #pragma unroll_completely, which isn't known by GCC or Clang:
https://godbolt.org/z/97j4vbacs
* Fixes for -Wsign-compare, -Wsign-conversion and -Wshorten-64-to-32
PiperOrigin-RevId: 491965029
Change-Id: Ic5e1f3a20f69fcd35fe81ebef63443ad26bf7931
|
|
|
|
|
| |
PiperOrigin-RevId: 491722639
Change-Id: Iff13661095d10c82599ad30f7220700825a78c9e
|
|
|
|
|
|
|
|
|
|
| |
std::array has a special-case to allow this
https://en.cppreference.com/w/cpp/container/array
Fixes #1332
PiperOrigin-RevId: 491703960
Change-Id: Ib83a1f0865448314e463e8ebf39ae3b842f762ea
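A short sketch of why `std::array` helps here: a built-in array of length zero is ill-formed in standard C++, while `std::array<T, 0>` is explicitly permitted. `RegionLengths` is a hypothetical template invented for this illustration.

```cpp
#include <stddef.h>

#include <array>

// Hypothetical holder for per-region lengths, parameterized on region count.
template <size_t kRegions>
struct RegionLengths {
  // size_t lengths[kRegions];  // ill-formed when kRegions == 0
  std::array<size_t, kRegions> lengths{};  // valid for kRegions == 0 as well

  // Sums the stored lengths; returns 0 when there are no regions.
  size_t Total() const {
    size_t total = 0;
    for (size_t value : lengths) total += value;
    return total;
  }
};
```

With this layout, `RegionLengths<0>` compiles and behaves as an empty container, so templates that may be instantiated with zero regions need no special casing.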
|
|
|
|
|
| |
PiperOrigin-RevId: 491681300
Change-Id: I4ecdd3bf359cda7592b6c392a2fbb61b8394f71b
|
|
|
|
|
|
|
|
|
| |
The motivation is to explicitly remove and document dangerous
operations like adding crc32c_t to a set, because equality is not
enough to guarantee uniqueness.
PiperOrigin-RevId: 491656425
Change-Id: I7b4dadc1a59ea9861e6ec7a929d64b5746467832
|
|
|
|
|
|
|
|
|
|
|
|
| |
According to https://stackoverflow.com/a/68939636 it is safe to use
__m128i instead.
https://learn.microsoft.com/en-us/cpp/intrinsics/x86-intrinsics-list?view=msvc-170 also uses this type instead
Fixes #1330
PiperOrigin-RevId: 491427300
Change-Id: I4a1d44ac4d5e7c1e1ee063ff397935df118254a1
|
|
|
|
|
|
|
| |
Fixes #1329
PiperOrigin-RevId: 491372279
Change-Id: I93c094b06ece9cb9bdb39fd4541353e0344a1a57
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently we take the generic/default code path on AMD due to a misspelling.
Mostly helps with crc+memcpy:
name old speed new speed delta
BM_Memcpy/1 156MB/s ± 1% 156MB/s ± 1% ~ (p=0.563 n=18+18)
BM_Memcpy/100 6.38GB/s ± 1% 6.50GB/s ± 1% +1.89% (p=0.000 n=19+19)
BM_Memcpy/10000 14.6GB/s ± 1% 21.7GB/s ± 0% +49.01% (p=0.000 n=20+19)
BM_Memcpy/500000 13.5GB/s ± 1% 19.9GB/s ± 0% +47.35% (p=0.000 n=18+17)
PiperOrigin-RevId: 490572650
Change-Id: Id7901321a23262c0ab62a2d82fae86cf42acf16d
|
|
|
|
|
|
|
| |
Using /arch:AVX on MSVC now uses the accelerated implementation
PiperOrigin-RevId: 490550573
Change-Id: I924259845f38ee41d15f23f95ad085ad664642b5
|
|
|
|
|
| |
PiperOrigin-RevId: 488373221
Change-Id: I1e30820188cc860ce4df8fddafa04de343ec46af
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes the build on arm64 macOS.
Note that hardware acceleration is not yet enabled on arm64 when not
running under Linux.
Addresses the report from https://github.com/abseil/abseil-cpp/commit/1687dbf814eceb93de2d93f91b31acaab404091c#commitcomment-89529264
PiperOrigin-RevId: 487655295
Change-Id: I168dfc863c960d0b694b26dfcb85ff0fd0e95a1e
|
|
This implementation can take advantage of hardware acceleration available
on common CPUs when using GCC and Clang. A future update may enable
this on MSVC as well.
PiperOrigin-RevId: 487327024
Change-Id: I99a8f1bcbdf25297e776537e23bd0a902e0818a1
|