Add a bench to measure the best way to pack from int to uint16_t with SSE. - skia

author	mtklein <mtklein@chromium.org>	2016-07-15 07:45:53 -0700
committer	Commit bot <commit-bot@chromium.org>	2016-07-15 07:45:53 -0700
commit	036e1831e05ae3a6ec9bcd30cb24f6b1a49a3541 (patch)
tree	81efe17768f56658fc48fc7a694e352809da3072 /src/opts/Sk4px_NEON.h
parent	58e389b0518b46bbe58ba01c23443cf23c18435c (diff)

Add a bench to measure the best way to pack from int to uint16_t with SSE.

I measured relative runtimes on my laptop:

    pack_int_uint16_t_ss… 1036 …e41 1x …se3 1.01x …e2_b 3.01x …e2_a 3.02x

I've run into Clang problems with the actual _mm_packus_epi32 instruction, I think, so I'm going to exercise a little cowardice and leave that option disabled for now.

The ssse3 version probably looks a little faster than it will be in practice. We'll usually need to load its mask, which here is hoisted out of the bench loop.

The two sse2 variants are close enough in speed that I'm tie breaking them on other concerns: the <<16, >>16 version doesn't need any scratch registers or to load any constants, so it wins.

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2150343002
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-Fast-Trybot
Review-Url: https://codereview.chromium.org/2150343002
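
For illustration only, the "<<16, >>16" SSE2 approach mentioned in the message might look roughly like the sketch below. The function names `pack_int_uint16_t_shift` and `pack4` are hypothetical and are not taken from the bench. The sketch assumes each 32-bit lane already holds a value in [0, 65535], so the narrowing is a plain truncation to the low 16 bits.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cassert>
#include <cstdint>

// Hypothetical helper: narrow four 32-bit lanes to four 16-bit lanes using
// only SSE2, with no constants and no scratch registers.
static inline __m128i pack_int_uint16_t_shift(__m128i x) {
    // Shift left by 16, then arithmetic shift right by 16. This sign-extends
    // the low 16 bits of each lane, so every lane now lies in [-32768, 32767].
    x = _mm_srai_epi32(_mm_slli_epi32(x, 16), 16);
    // The signed saturating pack cannot saturate on that range, so the low
    // 16 bits of each lane pass through unchanged. The four results land in
    // the low 64 bits (and are repeated in the high 64 bits).
    return _mm_packs_epi32(x, x);
}

// Hypothetical convenience wrapper around unaligned load and 64-bit store.
static inline void pack4(const int32_t in[4], uint16_t out[4]) {
    __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(in));
    __m128i p = pack_int_uint16_t_shift(v);
    _mm_storel_epi64(reinterpret_cast<__m128i*>(out), p);
}

// Small self-check against a scalar truncation of the same inputs.
static inline void self_check() {
    const int32_t in[4] = {0, 255, 40000, 65535};
    uint16_t out[4] = {0, 0, 0, 0};
    pack4(in, out);
    for (int i = 0; i < 4; i++) {
        assert(out[i] == static_cast<uint16_t>(in[i]));
    }
}
```

Values at or above 0x8000, such as 40000, are the interesting case: a bare `_mm_packs_epi32` would clamp them to 32767, which is why the sign-extending shift pair comes first.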

Diffstat (limited to 'src/opts/Sk4px_NEON.h')

0 files changed, 0 insertions, 0 deletions

