path: root/samplecode/SampleMegaStroke.cpp
author: Mike Klein <mtklein@chromium.org> 2017-08-03 00:04:12 -0400
committer: Skia Commit-Bot <skia-commit-bot@chromium.org> 2017-08-03 13:24:46 +0000
commit: e7f89fc257a5ddd83a314e7bbdd23cb17a461ae5
tree: 77b99ba0f8714c42f56f1638c39d68fc2fd5e9f9 /samplecode/SampleMegaStroke.cpp
parent: 698edfecef121d8575eee6af207ce8a9525032ee

improve HSW 16->8 bit pack

__builtin_convertvector(..., U8x4) is producing a fairly long sequence
of code to convert U16x4 to U8x4 on HSW:

    vextracti128  $0x1,%ymm2,%xmm3
    vmovdqa       0x1848(%rip),%xmm4
    vpshufb       %xmm4,%xmm3,%xmm3
    vpshufb       %xmm4,%xmm2,%xmm2
    vpunpcklqdq   %xmm3,%xmm2,%xmm2
    vextracti128  $0x1,%ymm0,%xmm3
    vpshufb       %xmm4,%xmm3,%xmm3
    vpshufb       %xmm4,%xmm0,%xmm0
    vpunpcklqdq   %xmm3,%xmm0,%xmm0
    vinserti128   $0x1,%xmm2,%ymm0,%ymm0

We can do much better with _mm256_packus_epi16:

    vinserti128   $0x1,%xmm0,%ymm2,%ymm3
    vperm2i128    $0x31,%ymm0,%ymm2,%ymm0
    vpackuswb     %ymm0,%ymm3,%ymm0

vpackuswb packs the values in a somewhat surprising order, which the
first two instructions get us lined up for.

This is a pretty noticeable speedup, 7-8% on some benchmarks.

The same sort of change could be made for SSE2 and SSE4.1 also using
_mm_packus_epi16, but the difference for that change is much less
dramatic. Might as well stick to focusing on HSW.

Change-Id: I0d6765bd67e0d024d658a61d19e6f6826b4d392c
Reviewed-on: https://skia-review.googlesource.com/30420
Reviewed-by: Florin Malita <fmalita@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>

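To illustrate the "somewhat surprising order" mentioned above, here is a portable scalar model of the three-instruction sequence. It is only a sketch, not the code from this change: the type aliases and every function name (sat_u8, model_packus_256, model_low_halves, model_high_halves, model_pack, check_order_example) are hypothetical, and it assumes the two 256-bit inputs each hold sixteen 16-bit lanes. The model reflects how the 256-bit vpackuswb packs within each 128-bit half independently, so the inputs have to be regrouped by half first to get the bytes out in order.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar stand-ins for 256-bit registers.
using U16x16 = std::array<uint16_t, 16>;
using U8x32  = std::array<uint8_t, 32>;

// Unsigned saturation of a 16-bit lane that the pack treats as signed:
// negative values (top bit set) clamp to 0, values above 255 clamp to 255.
static uint8_t sat_u8(uint16_t v) {
    if (v & 0x8000) return 0;
    if (v > 255)    return 255;
    return static_cast<uint8_t>(v);
}

// Model of 256-bit vpackuswb: each 128-bit half is packed on its own,
// so the output is a.low, b.low, a.high, b.high.
static U8x32 model_packus_256(const U16x16& a, const U16x16& b) {
    U8x32 r{};
    for (std::size_t half = 0; half < 2; half++) {
        for (std::size_t i = 0; i < 8; i++) {
            r[16 * half + i]     = sat_u8(a[8 * half + i]);
            r[16 * half + 8 + i] = sat_u8(b[8 * half + i]);
        }
    }
    return r;
}

// Model of the vinserti128 step: { lo.low, hi.low }.
static U16x16 model_low_halves(const U16x16& lo, const U16x16& hi) {
    U16x16 r{};
    for (std::size_t i = 0; i < 8; i++) { r[i] = lo[i]; r[8 + i] = hi[i]; }
    return r;
}

// Model of the vperm2i128 $0x31 step: { lo.high, hi.high }.
static U16x16 model_high_halves(const U16x16& lo, const U16x16& hi) {
    U16x16 r{};
    for (std::size_t i = 0; i < 8; i++) { r[i] = lo[8 + i]; r[8 + i] = hi[8 + i]; }
    return r;
}

// Regroup by half, then pack: the result is lo[0..15] followed by hi[0..15].
static U8x32 model_pack(const U16x16& lo, const U16x16& hi) {
    return model_packus_256(model_low_halves(lo, hi), model_high_halves(lo, hi));
}

// Usage example: packing lanes 0..31 without the regrouping step
// interleaves the halves; with it, the lanes come out in order.
static void check_order_example() {
    U16x16 lo{}, hi{};
    for (std::size_t i = 0; i < 16; i++) {
        lo[i] = static_cast<uint16_t>(i);
        hi[i] = static_cast<uint16_t>(16 + i);
    }
    U8x32 direct = model_packus_256(lo, hi);
    assert(direct[8] == 16);  // hi.low lands right after lo.low
    U8x32 fixed = model_pack(lo, hi);
    for (std::size_t i = 0; i < 32; i++) assert(fixed[i] == i);
}
```

The model is scalar so that it builds without AVX2 support. A real implementation would presumably use the corresponding intrinsics instead.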
Diffstat (limited to 'samplecode/SampleMegaStroke.cpp')
0 files changed, 0 insertions, 0 deletions