author Connal de Souza <connaldesouza@google.com> 2023-08-30 12:22:55 -0700
committer Copybara-Service <copybara-worker@google.com> 2023-08-30 12:23:48 -0700
commit 37770938fc0fc9b25f458434031cf3d8f6c65e95 (patch)
tree b9876595368f1f1c7e451920e5782073e98282af /absl/container/internal/hashtablez_sampler_test.cc
parent 99a3a6ae4af45fef96b5279ddaad6b9eb9e44462 (diff)
Optimize Resize and Iteration on Arm
There are a few cycles of overhead when transferring between GPR and Neon registers. We pay this cost for GroupAarch64Impl, largely because the speedup we get in Match() makes it profitable. After a Match() call, subsequent Group operations don't have to pay the full GPR <-> Neon cost, so it makes sense to do them with Neon instructions as well. However, iteration and find_first_non_full() do not do a prior Match(), so their Mask/Count EmptyOrDeleted calls pay the full GPR <-> Neon cost. We can avoid this by using the GPR versions of these functions from the portable implementation of Group instead.

We also slightly change the order of operations in these functions (which should be functionally a nop) in order to take advantage of Arm's free flexible second operand shifts with logical operations.

Iteration and Resize are roughly 8% and 12.6% faster, respectively. This is not profitable on x86 because GPR <-> xmm register latency is much lower and we use a 16-bit wide Group size.

PiperOrigin-RevId: 561415183
Change-Id: I660b5bb84afedb05a12dcdf04d5b2e1514902760
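As a rough illustration of the portable, GPR-only approach described above, the sketch below computes an empty-or-deleted mask over a group of control bytes with plain 64-bit bit tricks. This is a minimal sketch, not the code from this commit: the struct name PortableGroup, the helper name MaskEmptyOrDeleted, and the 0x80/0xFE/0xFF control-byte values follow the general SwissTable convention and are assumptions here. The point of interest is the `ctrl & ~(ctrl << 7)` form, which maps to a single AArch64 BIC instruction with a shifted second operand, so the shift comes for free.

```cpp
#include <cstdint>
#include <cstring>

// Sketch of a portable 8-byte "group" over SwissTable-style control bytes:
// full slots are 0x00..0x7F, empty is 0x80, deleted is 0xFE, sentinel is 0xFF.
// Names and values are illustrative, not copied from Abseil.
struct PortableGroup {
  uint64_t ctrl;

  explicit PortableGroup(const unsigned char* pos) {
    // Load 8 control bytes into one GPR; which byte maps to which slot
    // index depends on endianness, but the mask math below is per byte.
    std::memcpy(&ctrl, pos, sizeof(ctrl));
  }

  // A byte is empty or deleted iff its high bit is set and its low bit is
  // clear (the sentinel 0xFF has both set). For the MSB of each byte,
  // (ctrl << 7) carries that byte's low bit into the MSB position, so
  // ctrl & ~(ctrl << 7), masked to the MSBs, sets the MSB of exactly the
  // empty-or-deleted bytes. On AArch64 the ~(ctrl << 7) operand folds into
  // BIC with a shifted register, avoiding a separate shift instruction.
  uint64_t MaskEmptyOrDeleted() const {
    constexpr uint64_t kMsbs = 0x8080808080808080ULL;
    return (ctrl & ~(ctrl << 7)) & kMsbs;
  }
};
```

In this sketch, each set MSB in the returned mask marks an empty or deleted slot, so a probe loop can count trailing zeros and divide by 8 to find the first such slot without ever touching a Neon register.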
Diffstat (limited to 'absl/container/internal/hashtablez_sampler_test.cc')
0 files changed, 0 insertions, 0 deletions