path: root/src/opts/Sk4px_NEON.h
author mtklein <mtklein@chromium.org> 2016-07-15 07:45:53 -0700
committer Commit bot <commit-bot@chromium.org> 2016-07-15 07:45:53 -0700
commit 036e1831e05ae3a6ec9bcd30cb24f6b1a49a3541
tree 81efe17768f56658fc48fc7a694e352809da3072 /src/opts/Sk4px_NEON.h
parent 58e389b0518b46bbe58ba01c23443cf23c18435c
Add a bench to measure the best way to pack from int to uint16_t with SSE.

I measured relative runtimes on my laptop:

    pack_int_uint16_t_ss… 1036
    …e41   1x
    …se3   1.01x
    …e2_b  3.01x
    …e2_a  3.02x

I've run into Clang problems with the actual _mm_packus_epi32 instruction, I think, so I'm going to exercise a little cowardice and leave that option disabled for now.

The ssse3 version probably looks a little faster than it will be in practice. We'll usually need to load its mask, which here is hoisted out of the bench loop.

The two sse2 variants are close enough in speed that I'm tie breaking them on other concerns: the <<16, >>16 version doesn't need any scratch registers or to load any constants, so it wins.

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2150343002
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-Fast-Trybot
Review-Url: https://codereview.chromium.org/2150343002
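
For readers unfamiliar with the "<<16, >>16" trick mentioned in the message, here is a minimal sketch of how such an SSE2 pack could look. It is not the code from this commit: the function name `pack_int_to_u16` is hypothetical, and the sketch assumes every input already lies in [0, 65535]. Shifting left then arithmetic-shifting right by 16 sign-extends the low 16 bits, so the signed-saturating `_mm_packs_epi32` never saturates and simply passes those bits through. No constants or mask loads are needed. A scalar fallback is included so the sketch also builds on targets without SSE2.

```cpp
#include <cstdint>
#if defined(__SSE2__)
  #include <emmintrin.h>  // SSE2 intrinsics
#endif

// Hypothetical helper (not from the commit): pack 8 ints into 8 uint16_t.
// Assumes each input is in [0, 65535]; out-of-range inputs are truncated
// to their low 16 bits rather than saturated.
static void pack_int_to_u16(const int32_t in[8], uint16_t out[8]) {
#if defined(__SSE2__)
    __m128i lo = _mm_loadu_si128(reinterpret_cast<const __m128i*>(in));
    __m128i hi = _mm_loadu_si128(reinterpret_cast<const __m128i*>(in + 4));

    // <<16 then arithmetic >>16 sign-extends the low 16 bits of each lane,
    // so every lane now fits in int16_t.
    lo = _mm_srai_epi32(_mm_slli_epi32(lo, 16), 16);
    hi = _mm_srai_epi32(_mm_slli_epi32(hi, 16), 16);

    // Signed-saturating pack; no lane saturates, so the low 16 bits of
    // each original int are preserved bit for bit.
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), _mm_packs_epi32(lo, hi));
#else
    // Portable fallback with the same truncating behavior.
    for (int i = 0; i < 8; i++) {
        out[i] = static_cast<uint16_t>(in[i]);
    }
#endif
}
```

Unlike `_mm_packus_epi32`, this variant truncates instead of saturating, so the two agree only when the inputs are already in the uint16_t range.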
Diffstat (limited to 'src/opts/Sk4px_NEON.h')
0 files changed, 0 insertions, 0 deletions