aboutsummaryrefslogtreecommitdiffhomepage
path: root/src/opts/SkBlitMask_opts.h
Commit message (Collapse)AuthorAge
* Revert SkBlitMask_opts.h back to hand-coded NEON.Gravatar mtklein2015-11-18
| | | | | | | | | SkPx has triggered a bunch of small (2-9%) regressions on NEON devices. BUG=skia: CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review URL: https://codereview.chromium.org/1462783002
* SkPx: new approach to fixed-point SIMDGravatar mtklein2015-11-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | SkPx is like Sk4px, except each platform implementation of SkPx can declare a different sweet spot of N pixels, with extra loads and stores to handle the ragged edge of 0<n<N pixels. In this case, _sse's sweet spot remains 4 pixels. _neon jumps up to 8 so we can now use NEON's transposing loads and stores, and _none is just 1. This makes operations involving alpha considerably more efficient on NEON, as alpha is its own distinct 8x8 bit plane that's easy to toss around. This incorporates a few other improvements I've been wanting: - no requirement that we're dealing with SkPMColor. SkColor works too. - no anonymous namespace hack to differentiate implementations. Codegen and perf look good on Clang/x86-64 and GCC/ARMv7. The NEON code looks very similar to the old NEON code, as intended. No .skp or GM diffs on my laptop. Don't expect any. I intend this to replace Sk4px. Plan after landing: - port SkXfermode_opts.h - port Color32 in SkBlitRow_D32.cpp (and move to SkBlitRow_opts.h like other SkOpts code) - delete all Sk4px-related code - clean up evolutionary dead ends in SkNx (Sk16b, Sk16h, Sk4i, Sk4d, etc.) leaving Sk2f, Sk4f (and Sk2s, Sk4s). - find a machine with AVX2 to work on, write SkPx_avx2.h handling 8 pixels at a time. In the end we'll have Sk4f for float pixels, SkPx for fixed-point pixels. BUG=skia:4117 Committed: https://skia.googlesource.com/skia/+/82c93b45ed6ac0b628adb8375389c202d1f586f9 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;client.skia.compile:Build-Mac10.8-Clang-Arm7-Debug-Android-Trybot Committed: https://skia.googlesource.com/skia/+/a7627dc5cc2bf5d9a95d883d20c40d477ecadadf Review URL: https://codereview.chromium.org/1317233005
* Revert of SkPx: new approach to fixed-point SIMD (patchset #12 id:220001 of ↵Gravatar mtklein2015-11-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | https://codereview.chromium.org/1317233005/ ) Reason for revert: master-skia unhappy: https://android-build.storage.googleapis.com/builds/git_master-skia-linux-volantis-userdebug/2404853/e6c439e806fb0bd0f872a3d7a5cf0637d4ad11bfaa89e9bc18b651dc65f0a36b/logs/build_error.log?GoogleAccessId=701025073339-mqn0q2nvir9iurm6q5d00tdv7blbgvjr%40developer.gserviceaccount.com&Signature=WOqQO7xHkv83SmC4h5tNUIp%2BREaYULqK11hNTWlhj1XXo0NAOQd7GNSIHl775uRRZpBw2LkHeb2Ups3LsgRPrldqymposFtDa%2BUEW0Jv2NWAr%2F1Cqt6lwWsfknvJLN9NiEGfpCCye3Q%2FEYx9bU1ozMBG6h2DRHJUMRS%2FjstkJg0%3D&Expires=1446838937 Original issue's description: > SkPx: new approach to fixed-point SIMD > > SkPx is like Sk4px, except each platform implementation of SkPx can declare > a different sweet spot of N pixels, with extra loads and stores to handle the > ragged edge of 0<n<N pixels. > > In this case, _sse's sweet spot remains 4 pixels. _neon jumps up to 8 so > we can now use NEON's transposing loads and stores, and _none is just 1. > This makes operations involving alpha considerably more efficient on NEON, > as alpha is its own distinct 8x8 bit plane that's easy to toss around. > > This incorporates a few other improvements I've been wanting: > - no requirement that we're dealing with SkPMColor. SkColor works too. > - no anonymous namespace hack to differentiate implementations. > > Codegen and perf look good on Clang/x86-64 and GCC/ARMv7. > The NEON code looks very similar to the old NEON code, as intended. > No .skp or GM diffs on my laptop. Don't expect any. > > I intend this to replace Sk4px. Plan after landing: > - port SkXfermode_opts.h > - port Color32 in SkBlitRow_D32.cpp (and move to SkBlitRow_opts.h like other > SkOpts code) > - delete all Sk4px-related code > - clean up evolutionary dead ends in SkNx (Sk16b, Sk16h, Sk4i, Sk4d, etc.) > leaving Sk2f, Sk4f (and Sk2s, Sk4s). > - find a machine with AVX2 to work on, write SkPx_avx2.h handling 8 pixels > at a time. > > In the end we'll have Sk4f for float pixels, SkPx for fixed-point pixels. > > BUG=skia:4117 > > Committed: https://skia.googlesource.com/skia/+/82c93b45ed6ac0b628adb8375389c202d1f586f9 > > CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;client.skia.compile:Build-Mac10.8-Clang-Arm7-Debug-Android-Trybot > > Committed: https://skia.googlesource.com/skia/+/a7627dc5cc2bf5d9a95d883d20c40d477ecadadf TBR=msarett@google.com,mtklein@chromium.org NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia:4117 Review URL: https://codereview.chromium.org/1409843005
* SkPx: new approach to fixed-point SIMDGravatar mtklein2015-11-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | SkPx is like Sk4px, except each platform implementation of SkPx can declare a different sweet spot of N pixels, with extra loads and stores to handle the ragged edge of 0<n<N pixels. In this case, _sse's sweet spot remains 4 pixels. _neon jumps up to 8 so we can now use NEON's transposing loads and stores, and _none is just 1. This makes operations involving alpha considerably more efficient on NEON, as alpha is its own distinct 8x8 bit plane that's easy to toss around. This incorporates a few other improvements I've been wanting: - no requirement that we're dealing with SkPMColor. SkColor works too. - no anonymous namespace hack to differentiate implementations. Codegen and perf look good on Clang/x86-64 and GCC/ARMv7. The NEON code looks very similar to the old NEON code, as intended. No .skp or GM diffs on my laptop. Don't expect any. I intend this to replace Sk4px. Plan after landing: - port SkXfermode_opts.h - port Color32 in SkBlitRow_D32.cpp (and move to SkBlitRow_opts.h like other SkOpts code) - delete all Sk4px-related code - clean up evolutionary dead ends in SkNx (Sk16b, Sk16h, Sk4i, Sk4d, etc.) leaving Sk2f, Sk4f (and Sk2s, Sk4s). - find a machine with AVX2 to work on, write SkPx_avx2.h handling 8 pixels at a time. In the end we'll have Sk4f for float pixels, SkPx for fixed-point pixels. BUG=skia:4117 Committed: https://skia.googlesource.com/skia/+/82c93b45ed6ac0b628adb8375389c202d1f586f9 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;client.skia.compile:Build-Mac10.8-Clang-Arm7-Debug-Android-Trybot Review URL: https://codereview.chromium.org/1317233005
* Revert of SkPx: new approach to fixed-point SIMD (patchset #9 id:160001 of ↵Gravatar mtklein2015-09-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | https://codereview.chromium.org/1317233005/ ) Reason for revert: http://build.chromium.org/p/client.skia.compile/builders/Build-Mac10.8-Clang-Arm7-Debug-Android/builds/4627 Original issue's description: > SkPx: new approach to fixed-point SIMD > > SkPx is like Sk4px, except each platform implementation of SkPx can declare > a different sweet spot of N pixels, with extra loads and stores to handle the > ragged edge of 0<n<N pixels. > > In this case, _sse's sweet spot remains 4 pixels. _neon jumps up to 8 so > we can now use NEON's transposing loads and stores, and _none is just 1. > This makes operations involving alpha considerably more efficient on NEON, > as alpha is its own distinct 8x8 bit plane that's easy to toss around. > > This incorporates a few other improvements I've been wanting: > - no requirement that we're dealing with SkPMColor. SkColor works too. > - no anonymous namespace hack to differentiate implementations. > > Codegen and perf look good on Clang/x86-64 and GCC/ARMv7. > The NEON code looks very similar to the old NEON code, as intended. > No .skp or GM diffs on my laptop. Don't expect any. > > I intend this to replace Sk4px. Plan after landing: > - port SkXfermode_opts.h > - port Color32 in SkBlitRow_D32.cpp (and move to SkBlitRow_opts.h like other > SkOpts code) > - delete all Sk4px-related code > - clean up evolutionary dead ends in SkNx (Sk16b, Sk16h, Sk4i, Sk4d, etc.) > leaving Sk2f, Sk4f (and Sk2s, Sk4s). > - find a machine with AVX2 to work on, write SkPx_avx2.h handling 8 pixels > at a time. > > In the end we'll have Sk4f for float pixels, SkPx for fixed-point pixels. > > BUG=skia:4117 > > Committed: https://skia.googlesource.com/skia/+/82c93b45ed6ac0b628adb8375389c202d1f586f9 TBR=mtklein@google.com,msarett@google.com NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia:4117 Review URL: https://codereview.chromium.org/1336423002
* SkPx: new approach to fixed-point SIMDGravatar mtklein2015-09-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | SkPx is like Sk4px, except each platform implementation of SkPx can declare a different sweet spot of N pixels, with extra loads and stores to handle the ragged edge of 0<n<N pixels. In this case, _sse's sweet spot remains 4 pixels. _neon jumps up to 8 so we can now use NEON's transposing loads and stores, and _none is just 1. This makes operations involving alpha considerably more efficient on NEON, as alpha is its own distinct 8x8 bit plane that's easy to toss around. This incorporates a few other improvements I've been wanting: - no requirement that we're dealing with SkPMColor. SkColor works too. - no anonymous namespace hack to differentiate implementations. Codegen and perf look good on Clang/x86-64 and GCC/ARMv7. The NEON code looks very similar to the old NEON code, as intended. No .skp or GM diffs on my laptop. Don't expect any. I intend this to replace Sk4px. Plan after landing: - port SkXfermode_opts.h - port Color32 in SkBlitRow_D32.cpp (and move to SkBlitRow_opts.h like other SkOpts code) - delete all Sk4px-related code - clean up evolutionary dead ends in SkNx (Sk16b, Sk16h, Sk4i, Sk4d, etc.) leaving Sk2f, Sk4f (and Sk2s, Sk4s). - find a machine with AVX2 to work on, write SkPx_avx2.h handling 8 pixels at a time. In the end we'll have Sk4f for float pixels, SkPx for fixed-point pixels. BUG=skia:4117 Review URL: https://codereview.chromium.org/1317233005
* Restore old NEON blit_mask_d32_a8 methods.Gravatar mtklein2015-09-01
| | | | | | | | | | | | | | As you'll see from the BUG line, we have a strong indication that the new Sk4px methods regress some devices. This restores the old code back as literally as possible while still fitting in SkOpts framework. This is ideally temporary breathing room. We should get an early indication of if those bugs will improve by watching https://perf.skia.org/#4004 BUG=skia:4117,525844,519596,524149 Review URL: https://codereview.chromium.org/1312763009
* trifurcate blit_mask_d32_a8 into _black, _opaque, _general.Gravatar mtklein2015-08-26
| | | | | | | | | | | We used to split the NEON code this way, and just had one path for SSE. It's unclear to me testing locally if there's any major win here, but there's at least a small one. No pixel diffs or even any math changes, just folding constants through. BUG=skia:4117 Review URL: https://codereview.chromium.org/1304373006
* Sk4px blit mask.Gravatar mtklein2015-08-10
Local SKP nanobenching ranges SSE between 1.05x and 0.87x, much more heavily weighted toward <1.0x ratios (speedups). I profiled the top five regressions (1.05x-1.01x) and they look like noise. Will follow up after broad bot results. NEON looks similar but less extreme than SSE changes, ranging between 1.02x and 0.95x, again mostly speedups in 0.99x-0.97x range. The old code trifurcated into black, opaque-but-not-black, and general versions as a function of the constant src color. I did not see a significant difference between general and opaque-but-not-black, and I don't think a black version would be faster using SIMD. So we have here just one version of the code, the general version. Somewhat fantastically, I see no pixel diffs on GMs or SKPs. I will be following up with more CLs for the other procs called by SkBlitMask. BUG=skia: Review URL: https://codereview.chromium.org/1278253003