author Connal de Souza <connaldesouza@google.com> 2023-08-30 12:22:55 -0700
committer Copybara-Service <copybara-worker@google.com> 2023-08-30 12:23:48 -0700
commit 37770938fc0fc9b25f458434031cf3d8f6c65e95 (patch)
tree b9876595368f1f1c7e451920e5782073e98282af /absl/container/internal/hashtablez_sampler_test.cc
parent 99a3a6ae4af45fef96b5279ddaad6b9eb9e44462 (diff)
Optimize Resize and Iteration on Arm
There are a few cycles of overhead when transferring between GPR and Neon registers. We pay this cost for GroupAarch64Impl, largely because the speedup we get in Match() makes it profitable. After a Match() call, subsequent Group operations don't have to pay the full GPR <-> Neon cost, so it makes sense to do them with Neon instructions as well. However, iteration and find_first_non_full() do not do a prior Match(), so their Mask/Count EmptyOrDeleted calls pay the full GPR <-> Neon cost. We can avoid this by using the GPR versions of these functions from the portable implementation of Group instead.

We also slightly change the order of operations in these functions (which should be functionally a nop) in order to take advantage of Arm's free flexible second operand shifts with logical operations.

Iteration and Resize are roughly 8% and 12.6% faster, respectively. This is not profitable on x86 because GPR <-> xmm register latency is much lower and we use a 16-bit wide Group size.

PiperOrigin-RevId: 561415183
Change-Id: I660b5bb84afedb05a12dcdf04d5b2e1514902760
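As a rough illustration of the portable, GPR-only approach described above, the sketch below computes an empty-or-deleted mask over a group of control bytes with plain 64-bit bit tricks. This is a minimal sketch, not the code from this commit: the struct name PortableGroup, the helper name MaskEmptyOrDeleted, and the 0x80/0xFE/0xFF control-byte values follow the general SwissTable convention and are assumptions here. The point of interest is the `ctrl & ~(ctrl << 7)` form, which maps to a single AArch64 BIC instruction with a shifted second operand, so the shift comes for free.

```cpp
#include <cstdint>
#include <cstring>

// Sketch of a portable 8-byte "group" over SwissTable-style control bytes:
// full slots are 0x00..0x7F, empty is 0x80, deleted is 0xFE, sentinel is 0xFF.
// Names and values are illustrative, not copied from Abseil.
struct PortableGroup {
  uint64_t ctrl;

  explicit PortableGroup(const unsigned char* pos) {
    // Load 8 control bytes into one GPR; which byte maps to which slot
    // index depends on endianness, but the mask math below is per byte.
    std::memcpy(&ctrl, pos, sizeof(ctrl));
  }

  // A byte is empty or deleted iff its high bit is set and its low bit is
  // clear (the sentinel 0xFF has both set). For the MSB of each byte,
  // (ctrl << 7) carries that byte's low bit into the MSB position, so
  // ctrl & ~(ctrl << 7), masked to the MSBs, sets the MSB of exactly the
  // empty-or-deleted bytes. On AArch64 the ~(ctrl << 7) operand folds into
  // BIC with a shifted register, avoiding a separate shift instruction.
  uint64_t MaskEmptyOrDeleted() const {
    constexpr uint64_t kMsbs = 0x8080808080808080ULL;
    return (ctrl & ~(ctrl << 7)) & kMsbs;
  }
};
```

In this sketch, each set MSB in the returned mask marks an empty or deleted slot, so a probe loop can count trailing zeros and divide by 8 to find the first such slot without ever touching a Neon register.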
Diffstat (limited to 'absl/container/internal/hashtablez_sampler_test.cc')
0 files changed, 0 insertions, 0 deletions