path: root/src/opts
Commit message (author, date)
* archive skpx... currently dead code (mtklein, 2015-12-11)
| BUG=skia:
| CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
|
| Review URL: https://codereview.chromium.org/1521623003
* better NEON div255 (mtklein, 2015-12-07)
| We were doing (x+127)/255 = ((x+128) + ((x+128)>>8))>>8 in three instructions:
|   1) x += 128
|   2) shift x right 8 bits
|   3) add x and x>>8 together, then shift right 8 more bits
|
| Now do it as two instructions:
|   1) shift (x+128) right 8 bits
|   2) add x and (x+128)>>8 and 128 all together, then shift right 8 more bits
|
| On ARM this will be a 5-10% speedup for SrcATop, DstATop, Xor, Multiply,
| Difference, HardLight, Darken, and Lighten xfermodes.
|
| When we have a mask (e.g. text), *all* xfermodes except Plus will get a
| similar boost.
|
| This should mean now that (a*b).div255() is the same speed as
| a.approxMulDiv255(b) on both x86 and ARM, and of course it's perfect instead
| of approximate. So we should eliminate approxMulDiv255(), but I'll leave it
| to another CL, as it'll need Blink rebaselines.
|
| This CL should not change GMs or Blink.
| https://gold.skia.org/search2?issue=1502843002&unt=true&query=source_type%3Dgm&master=false
|
| BUG=skia:
| CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;client.skia.android:Test-Android-GCC-Nexus9-CPU-Denver-Arm64-Debug-Trybot,Test-Android-GCC-Nexus5-CPU-NEON-Arm7-Release-Trybot
|
| Review URL: https://codereview.chromium.org/1502843002
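The two groupings described in this message can be checked against each other with a scalar model. The sketch below is not the NEON code from the CL; the function names are made up for illustration, and plain 32-bit integers stand in for vector lanes. It compares both groupings with (x+127)/255 over every x from 0 to 255*255, the range a product of two bytes can take.

```cpp
#include <cstdint>

// Reference: x/255 rounded to nearest, written as in the commit message.
inline uint32_t div255_reference(uint32_t x) { return (x + 127) / 255; }

// Old grouping: add 128, then add the high byte, then shift right by 8.
inline uint32_t div255_three_step(uint32_t x) {
    x += 128;
    return (x + (x >> 8)) >> 8;
}

// New grouping: t = (x+128)>>8 first, then (x + t + 128) >> 8.
inline uint32_t div255_two_step(uint32_t x) {
    uint32_t t = (x + 128) >> 8;
    return (x + t + 128) >> 8;
}

// Returns true if both groupings match the reference for all x in [0, 255*255].
inline bool div255_groupings_agree() {
    for (uint32_t x = 0; x <= 255u * 255u; x++) {
        uint32_t want = div255_reference(x);
        if (div255_three_step(x) != want || div255_two_step(x) != want) {
            return false;
        }
    }
    return true;
}
```

Both groupings compute the same sum, x + 128 + ((x+128)>>8), before the final shift, so only the instruction count changes, not the result.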
* Don't use the Sk4f gradient impl without SIMD (fmalita, 2015-12-03)
| Also remove the SK_SUPPORT_LEGACY_LINEAR_GRADIENT_TABLE guard since it is no
| longer used in Chromium.
|
| BUG=chromium:563492
| R=reed@google.com,mtklein@google.com
| CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
|
| Review URL: https://codereview.chromium.org/1489233005
* Add Sk4f::ToBytes(uint8_t[16], Sk4f, Sk4f, Sk4f, Sk4f) (mtklein, 2015-12-01)
| This is a big speedup for float -> byte. E.g. gradient_linear_clamp_3color:
|
|   x86-64  147µs  -> 103µs  (Broadwell MBP)
|   arm64   2.03ms -> 648µs  (Galaxy S6)
|   armv7   1.12ms -> 489µs  (Galaxy S6, same device!)
|
| BUG=skia:
| CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;client.skia.android:Test-Android-GCC-Nexus9-CPU-Denver-Arm64-Debug-Trybot
|
| Review URL: https://codereview.chromium.org/1483953002
* Add SkNx_cast(). (mtklein, 2015-11-20)
| SkNx_cast() can cast between any of our vector types, provided they have the
| same number of elements. Any types should work with the default
| implementation, and we can drop in specializations as needed, like the SSE
| and NEON Sk4f -> Sk4i I included here as an example.
|
| To make this work, I made some internal name changes:
|   SkNi<N,T> -> SkNx<N, T>
|   SkNf<N>   -> SkNx<N, float>
|
| User aliases (Sk4f, Sk16b, etc.) stay the same. We can land this first
| (it's PS1) if that makes things easier.
|
| BUG=skia:
| CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
|
| Review URL: https://codereview.chromium.org/1464623002
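A default, lane-by-lane cast between vectors of equal length might look roughly like the sketch below. `VecN` and `vec_cast` are hypothetical stand-ins, not the SkNx types or the SkNx_cast() signature from the CL, and the conversion semantics here are simply those of `static_cast` on each lane.

```cpp
// Hypothetical stand-in for a vector type: N lanes of type T.
template <int N, typename T>
struct VecN {
    T lanes[N];
};

// Default cast: convert each lane independently. The lane count N is shared
// by source and destination, so only the element type changes.
template <typename Dst, int N, typename Src>
VecN<N, Dst> vec_cast(const VecN<N, Src>& src) {
    VecN<N, Dst> dst;
    for (int i = 0; i < N; i++) {
        dst.lanes[i] = static_cast<Dst>(src.lanes[i]);
    }
    return dst;
}
```

A call such as `vec_cast<int>(f)` names only the destination element type; the lane count and source type are deduced from the argument. A platform-specific overload or specialization could then replace the loop for particular type pairs, which is the role the message gives to the SSE and NEON versions.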
* Revert float xfermodes back to Sk4f (from Sk8f). (mtklein, 2015-11-19)
| Generally this was a performance win, even on devices without AVX due to
| unrolling, but on ARM+NEON it looks like that unrolling hurt a bit.
|
|   while (...) { blend a pixel }
|   ~~~>
|   while (...) { blend two pixels }
|   if (n % 2) { blend last pixel }
|
| BUG=chromium:555278
| CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
|
| Review URL: https://codereview.chromium.org/1465483002
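The loop transformation in this message can be written out as follows. This is a minimal sketch: `blend_one` is a made-up placeholder operation (an XOR), not a real xfermode, and the functions are illustrative rather than taken from the CL.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical per-pixel operation standing in for a real blend.
inline uint32_t blend_one(uint32_t dst, uint32_t src) { return dst ^ src; }

// One pixel per iteration.
inline void blend_simple(uint32_t* dst, const uint32_t* src, size_t n) {
    for (size_t i = 0; i < n; i++) {
        dst[i] = blend_one(dst[i], src[i]);
    }
}

// Two pixels per iteration, followed by at most one leftover pixel.
inline void blend_unrolled(uint32_t* dst, const uint32_t* src, size_t n) {
    size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        dst[i]     = blend_one(dst[i],     src[i]);
        dst[i + 1] = blend_one(dst[i + 1], src[i + 1]);
    }
    if (n % 2) {
        dst[i] = blend_one(dst[i], src[i]);  // i == n - 1 here
    }
}
```

The unrolled form does the same work with fewer loop-control operations but more code per iteration, which is consistent with the message's observation that it can help on one target and hurt on another.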
* Revert SkBlitMask_opts.h back to hand-coded NEON. (mtklein, 2015-11-18)
| SkPx has triggered a bunch of small (2-9%) regressions on NEON devices.
|
| BUG=skia:
| CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
|
| Review URL: https://codereview.chromium.org/1462783002
* div255(x) as ((x+128)*257)>>16 with SSE (mtklein, 2015-11-17)
| _mm_mulhi_epu16 makes the (...*257)>>16 part simple.
|
| This seems to speed up every transfermode that uses div255(),
| in the 7-25% range.
|
| It even appears to obviate the need for approxMulDiv255() on SSE.
| I'm not sure about NEON yet, so I'll keep approxMulDiv255() for now.
|
| Should be no pixels change:
| https://gold.skia.org/search2?issue=1452903004&unt=true&query=source_type%3Dgm&master=false
|
| BUG=skia:
| CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
|
| Review URL: https://codereview.chromium.org/1452903004
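The multiply-high form can be modelled one 16-bit lane at a time. In the sketch below, `mulhi_u16` is a scalar stand-in for what `_mm_mulhi_epu16` computes per lane (the high 16 bits of an unsigned 16x16-bit product); the helper names are illustrative and none of this is the SSE code from the CL.

```cpp
#include <cstdint>

// High 16 bits of an unsigned 16x16-bit multiply, for a single lane.
inline uint16_t mulhi_u16(uint16_t a, uint16_t b) {
    return static_cast<uint16_t>((static_cast<uint32_t>(a) * b) >> 16);
}

// div255(x) as ((x+128)*257)>>16. For x <= 255*255, x+128 still fits in 16 bits.
inline uint16_t div255_mulhi(uint16_t x) {
    return mulhi_u16(static_cast<uint16_t>(x + 128), 257);
}

// Returns true if the multiply-high form equals (x+127)/255 on [0, 255*255].
inline bool div255_mulhi_is_exact() {
    for (uint32_t x = 0; x <= 255u * 255u; x++) {
        if (div255_mulhi(static_cast<uint16_t>(x)) != (x + 127) / 255) {
            return false;
        }
    }
    return true;
}
```

Multiplying by 257 is the same as adding a value to itself shifted left by 8, so this is the add-and-shift div255 folded into a single multiply.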
* trim some fat from SSE2 fixed point alpha code (mtklein, 2015-11-17)
|   - extract alpha from a pixel: 5 1-cycle ops to 4 1-cycle ops
|   - load alphas: drop 4 unnecessary ops
|
| Should be no pixel diffs.
|
| BUG=skia:
| CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
|
| Review URL: https://codereview.chromium.org/1447273004
* float xfermodes (burn, dodge, softlight) in Sk8f, possibly using AVX. (mtklein, 2015-11-11)
|   Xfermode_ColorDodge_aa  10.3ms -> 7.85ms  0.76x
|   Xfermode_SoftLight_aa   13.8ms -> 10.2ms  0.74x
|   Xfermode_ColorBurn_aa   10.7ms -> 7.82ms  0.73x
|   Xfermode_SoftLight      33.6ms -> 23.2ms  0.69x
|   Xfermode_ColorDodge       25ms -> 16.5ms  0.66x
|   Xfermode_ColorBurn      26.1ms -> 16.6ms  0.63x
|
| Ought to be no pixel diffs:
| https://gold.skia.org/search2?issue=1432903002&unt=true&query=source_type%3Dgm&master=false
|
| Incidental stuff:
|
| I made the SkNx(T) constructors implicit to make writing math expressions
| simpler. This allows us to write expressions like
|
|   Sk4f v;
|   ...
|   v = v*4;
|
| rather than
|
|   Sk4f v;
|   ...
|   v = v * Sk4f(4);
|
| As written it only works when the constant is on the right-hand side,
| so expressions like `(Sk4f(1) - da)` have to stay for now. I plan on
| following up with a CL that lets those become `(1 - da)` too.
|
| BUG=skia:4117
| CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
|
| Review URL: https://codereview.chromium.org/1432903002
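The right-hand-side restriction mentioned here follows from how C++ resolves member operators. The sketch below uses a hypothetical `Vec4f` struct, not Skia's Sk4f, and assumes the operators are member functions, which is one arrangement that produces the behavior the message describes.

```cpp
// Hypothetical 4-lane float vector with an implicit "splat" constructor.
struct Vec4f {
    float v[4];

    Vec4f(float s) : v{s, s, s, s} {}  // implicit: a scalar converts to a vector
    Vec4f(float a, float b, float c, float d) : v{a, b, c, d} {}

    // Member operators: the left operand must already be a Vec4f, while the
    // right operand may be implicitly converted from a scalar.
    Vec4f operator*(const Vec4f& o) const {
        return Vec4f(v[0] * o.v[0], v[1] * o.v[1], v[2] * o.v[2], v[3] * o.v[3]);
    }
    Vec4f operator-(const Vec4f& o) const {
        return Vec4f(v[0] - o.v[0], v[1] - o.v[1], v[2] - o.v[2], v[3] - o.v[3]);
    }
};
```

With this struct, `a * 4` compiles because 4 converts to `Vec4f` on the right, while `1 - a` does not, since implicit conversions are not applied to the left operand of a member operator. Declaring the operators as non-member functions taking two `Vec4f` arguments is one general C++ way to allow conversions on both sides; whether the follow-up CL took that route is not stated here.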
* SkPx: use namespaces as namespaces (mtklein, 2015-11-09)
| This is a pure refactor. No behavior change.
| I'm just getting tired of typing out the names...
|
| BUG=skia:4117
| CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
|
| Review URL: https://codereview.chromium.org/1436513002
* prune unused SkNx features (mtklein, 2015-11-09)
|   - remove float -> int conversion, keeping float -> byte
|   - remove support for doubles
|
| I was thinking of specializing Sk8f for AVX. This will help keep the
| complexity down.
|
| This may cause minor diffs in radial gradients: toBytes() rounds where
| castTrunc() truncated. But I don't see any diffs in Gold.
| https://gold.skia.org/search2?issue=1411563008&unt=true&query=source_type%3Dgm&master=false
|
| BUG=skia:4117
| CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
|
| Review URL: https://codereview.chromium.org/1411563008
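The kind of minor diff anticipated here comes from the difference between truncating and rounding when a float becomes a byte. The scalar sketch below uses hypothetical helper names, not the toBytes() or castTrunc() implementations, and assumes the input is already within [0, 255].

```cpp
#include <cstdint>

// Truncating conversion: drops the fractional part.
inline uint8_t to_byte_trunc(float x) { return static_cast<uint8_t>(x); }

// Rounding conversion: adds one half before dropping the fractional part.
inline uint8_t to_byte_round(float x) { return static_cast<uint8_t>(x + 0.5f); }
```

For 254.6f the truncating version gives 254 and the rounding version gives 255, so under these assumptions the two can disagree by one, and only for inputs whose fractional part is at least one half.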
* SkPx: new approach to fixed-point SIMD (mtklein, 2015-11-06)
| SkPx is like Sk4px, except each platform implementation of SkPx can declare
| a different sweet spot of N pixels, with extra loads and stores to handle the
| ragged edge of 0<n<N pixels.
|
| In this case, _sse's sweet spot remains 4 pixels. _neon jumps up to 8 so
| we can now use NEON's transposing loads and stores, and _none is just 1.
| This makes operations involving alpha considerably more efficient on NEON,
| as alpha is its own distinct 8x8 bit plane that's easy to toss around.
|
| This incorporates a few other improvements I've been wanting:
|   - no requirement that we're dealing with SkPMColor. SkColor works too.
|   - no anonymous namespace hack to differentiate implementations.
|
| Codegen and perf look good on Clang/x86-64 and GCC/ARMv7.
| The NEON code looks very similar to the old NEON code, as intended.
| No .skp or GM diffs on my laptop. Don't expect any.
|
| I intend this to replace Sk4px. Plan after landing:
|   - port SkXfermode_opts.h
|   - port Color32 in SkBlitRow_D32.cpp (and move to SkBlitRow_opts.h like other
|     SkOpts code)
|   - delete all Sk4px-related code
|   - clean up evolutionary dead ends in SkNx (Sk16b, Sk16h, Sk4i, Sk4d, etc.)
|     leaving Sk2f, Sk4f (and Sk2s, Sk4s).
|   - find a machine with AVX2 to work on, write SkPx_avx2.h handling 8 pixels
|     at a time.
|
| In the end we'll have Sk4f for float pixels, SkPx for fixed-point pixels.
|
| BUG=skia:4117
|
| Committed: https://skia.googlesource.com/skia/+/82c93b45ed6ac0b628adb8375389c202d1f586f9
|
| CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;client.skia.compile:Build-Mac10.8-Clang-Arm7-Debug-Android-Trybot
|
| Committed: https://skia.googlesource.com/skia/+/a7627dc5cc2bf5d9a95d883d20c40d477ecadadf
|
| Review URL: https://codereview.chromium.org/1317233005
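The "sweet spot plus ragged edge" structure can be sketched as a driver loop that hands full groups of N pixels to a callback and then one final short group. `for_each_pixel_group` and its callback signature are hypothetical, not the SkPx interface.

```cpp
#include <cstddef>
#include <cstdint>

// Calls fn(px, n) with n == N for each full group, then once more with
// 0 < n < N if any pixels remain.
template <size_t N, typename Fn>
void for_each_pixel_group(uint32_t* px, size_t count, Fn&& fn) {
    while (count >= N) {
        fn(px, N);  // full group: the platform's sweet spot
        px += N;
        count -= N;
    }
    if (count > 0) {
        fn(px, count);  // ragged edge: fewer than N pixels left
    }
}
```

With N = 8, the NEON figure quoted in the message, a run of 19 pixels becomes two full groups and a tail of 3. A real implementation would use wide loads and stores for the full groups and narrower ones for the tail.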
* Revert of SkPx: new approach to fixed-point SIMD (patchset #12 id:220001 of https://codereview.chromium.org/1317233005/ ) (mtklein, 2015-11-06)
| Reason for revert:
| master-skia unhappy:
|
| https://android-build.storage.googleapis.com/builds/git_master-skia-linux-volantis-userdebug/2404853/e6c439e806fb0bd0f872a3d7a5cf0637d4ad11bfaa89e9bc18b651dc65f0a36b/logs/build_error.log?GoogleAccessId=701025073339-mqn0q2nvir9iurm6q5d00tdv7blbgvjr%40developer.gserviceaccount.com&Signature=WOqQO7xHkv83SmC4h5tNUIp%2BREaYULqK11hNTWlhj1XXo0NAOQd7GNSIHl775uRRZpBw2LkHeb2Ups3LsgRPrldqymposFtDa%2BUEW0Jv2NWAr%2F1Cqt6lwWsfknvJLN9NiEGfpCCye3Q%2FEYx9bU1ozMBG6h2DRHJUMRS%2FjstkJg0%3D&Expires=1446838937
|
| Original issue's description:
| > SkPx: new approach to fixed-point SIMD
| >
| > SkPx is like Sk4px, except each platform implementation of SkPx can declare
| > a different sweet spot of N pixels, with extra loads and stores to handle the
| > ragged edge of 0<n<N pixels.
| >
| > In this case, _sse's sweet spot remains 4 pixels. _neon jumps up to 8 so
| > we can now use NEON's transposing loads and stores, and _none is just 1.
| > This makes operations involving alpha considerably more efficient on NEON,
| > as alpha is its own distinct 8x8 bit plane that's easy to toss around.
| >
| > This incorporates a few other improvements I've been wanting:
| >   - no requirement that we're dealing with SkPMColor. SkColor works too.
| >   - no anonymous namespace hack to differentiate implementations.
| >
| > Codegen and perf look good on Clang/x86-64 and GCC/ARMv7.
| > The NEON code looks very similar to the old NEON code, as intended.
| > No .skp or GM diffs on my laptop. Don't expect any.
| >
| > I intend this to replace Sk4px. Plan after landing:
| >   - port SkXfermode_opts.h
| >   - port Color32 in SkBlitRow_D32.cpp (and move to SkBlitRow_opts.h like other
| >     SkOpts code)
| >   - delete all Sk4px-related code
| >   - clean up evolutionary dead ends in SkNx (Sk16b, Sk16h, Sk4i, Sk4d, etc.)
| >     leaving Sk2f, Sk4f (and Sk2s, Sk4s).
| >   - find a machine with AVX2 to work on, write SkPx_avx2.h handling 8 pixels
| >     at a time.
| >
| > In the end we'll have Sk4f for float pixels, SkPx for fixed-point pixels.
| >
| > BUG=skia:4117
| >
| > Committed: https://skia.googlesource.com/skia/+/82c93b45ed6ac0b628adb8375389c202d1f586f9
| >
| > CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;client.skia.compile:Build-Mac10.8-Clang-Arm7-Debug-Android-Trybot
| >
| > Committed: https://skia.googlesource.com/skia/+/a7627dc5cc2bf5d9a95d883d20c40d477ecadadf
|
| TBR=msarett@google.com,mtklein@chromium.org
| NOPRESUBMIT=true
| NOTREECHECKS=true
| NOTRY=true
| BUG=skia:4117
|
| Review URL: https://codereview.chromium.org/1409843005
* SkPx: new approach to fixed-point SIMD (mtklein, 2015-11-06)
| SkPx is like Sk4px, except each platform implementation of SkPx can declare
| a different sweet spot of N pixels, with extra loads and stores to handle the
| ragged edge of 0<n<N pixels.
|
| In this case, _sse's sweet spot remains 4 pixels. _neon jumps up to 8 so
| we can now use NEON's transposing loads and stores, and _none is just 1.
| This makes operations involving alpha considerably more efficient on NEON,
| as alpha is its own distinct 8x8 bit plane that's easy to toss around.
|
| This incorporates a few other improvements I've been wanting:
|   - no requirement that we're dealing with SkPMColor. SkColor works too.
|   - no anonymous namespace hack to differentiate implementations.
|
| Codegen and perf look good on Clang/x86-64 and GCC/ARMv7.
| The NEON code looks very similar to the old NEON code, as intended.
| No .skp or GM diffs on my laptop. Don't expect any.
|
| I intend this to replace Sk4px. Plan after landing:
|   - port SkXfermode_opts.h
|   - port Color32 in SkBlitRow_D32.cpp (and move to SkBlitRow_opts.h like other
|     SkOpts code)
|   - delete all Sk4px-related code
|   - clean up evolutionary dead ends in SkNx (Sk16b, Sk16h, Sk4i, Sk4d, etc.)
|     leaving Sk2f, Sk4f (and Sk2s, Sk4s).
|   - find a machine with AVX2 to work on, write SkPx_avx2.h handling 8 pixels
|     at a time.
|
| In the end we'll have Sk4f for float pixels, SkPx for fixed-point pixels.
|
| BUG=skia:4117
|
| Committed: https://skia.googlesource.com/skia/+/82c93b45ed6ac0b628adb8375389c202d1f586f9
|
| CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;client.skia.compile:Build-Mac10.8-Clang-Arm7-Debug-Android-Trybot
|
| Review URL: https://codereview.chromium.org/1317233005
* Make SkBlurImageFilter capable of cropping during blur (raster path) (senorblanco, 2015-11-02)
| SkBlurImageFilter can currently only process a source image
| which is larger than or equal to the destination rect. If
| the source image (or crop rect) is smaller, it is padded
| out to dest size with transparent black via the 6-param
| version of applyCropRect().
|
| Fixing this requires modifying all the flavours of RGBA
| box_blur() to accept a src crop rect.
|
| BUG=skia:4502, skia:4526
| CQ_EXTRA_TRYBOTS=client.skia.android:Test-Android-GCC-Nexus5-CPU-NEON-Arm7-Release-Trybot;client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
|
| Committed: https://skia.googlesource.com/skia/+/1b82ceb737c73327412f2e8a91748481e1aec9e4
|
| Review URL: https://codereview.chromium.org/1415653003
* Revert of Make SkBlurImageFilter capable of cropping during blur (patchset #16 id:400001 of https://codereview.chromium.org/1415653003/ ) (senorblanco, 2015-11-02)
| Reason for revert:
| ASAN failures (see https://codereview.chromium.org/1415653003/)
|
| Original issue's description:
| > Make SkBlurImageFilter capable of cropping during blur (raster path)
| >
| > SkBlurImageFilter can currently only process a source image
| > which is larger than or equal to the destination rect. If
| > the source image (or crop rect) is smaller, it is padded
| > out to dest size with transparent black via the 6-param
| > version of applyCropRect().
| >
| > Fixing this requires modifying all the flavours of RGBA
| > box_blur() to accept a src crop rect.
| >
| > BUG=skia:4502, skia:4526
| > CQ_EXTRA_TRYBOTS=client.skia.android:Test-Android-GCC-Nexus5-CPU-NEON-Arm7-Release-Trybot;client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
| >
| > Committed: https://skia.googlesource.com/skia/+/1b82ceb737c73327412f2e8a91748481e1aec9e4
|
| TBR=mtklein@google.com,reed@google.com
| NOPRESUBMIT=true
| NOTREECHECKS=true
| NOTRY=true
| BUG=skia:4502, skia:4526
|
| Review URL: https://codereview.chromium.org/1428053002
* Make SkBlurImageFilter capable of cropping during blur (raster path) (senorblanco, 2015-11-02)
| SkBlurImageFilter can currently only process a source image
| which is larger than or equal to the destination rect. If
| the source image (or crop rect) is smaller, it is padded
| out to dest size with transparent black via the 6-param
| version of applyCropRect().
|
| Fixing this requires modifying all the flavours of RGBA
| box_blur() to accept a src crop rect.
|
| BUG=skia:4502, skia:4526
| CQ_EXTRA_TRYBOTS=client.skia.android:Test-Android-GCC-Nexus5-CPU-NEON-Arm7-Release-Trybot;client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
|
| Review URL: https://codereview.chromium.org/1415653003
* SkBlurImageFilter_opts: optimize NEON box_blur_double in separate loops. (senorblanco, 2015-10-28)
| Stop leaning so hard on the branch predictor, and pull the
| conditionals out of the loops for box_blur_double() (NEON).
|
| This is conceptually the same change as
| https://codereview.chromium.org/1426583004/ for the NEON
| double-pixel loop.
|
| R=mtklein@google.com
| BUG=skia:4526
| CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
|
| Review URL: https://codereview.chromium.org/1412793009
* SkBlurImageFilter_opt.h: break conditions into separate loops. (senorblanco, 2015-10-28)
| This gives ~15% improvement on blur_image on Linux Z620,
| and should allow me to implement cropping without incurring a perf
| hit.
|
| BUG=skia:
| CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
|
| Review URL: https://codereview.chromium.org/1426583004
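Breaking conditions into separate loops can be illustrated on a one-dimensional sliding-window sum. This is a simplified sketch with made-up function names, not Skia's box_blur(): it sums `in[i-r .. i+r]` clipped to the array, first with bounds checks inside the loop, then with the loop split into a leading edge, a middle, and a trailing edge so that no iteration needs a check.

```cpp
#include <vector>

// Bounds checks inside the loop: every iteration tests both window edges.
inline std::vector<int> box_sum_branchy(const std::vector<int>& in, int r) {
    const int n = static_cast<int>(in.size());
    std::vector<int> out(n);
    int sum = 0;
    for (int j = 0; j < r && j < n; j++) sum += in[j];  // prime with in[0..r-1]
    for (int i = 0; i < n; i++) {
        if (i + r < n)      sum += in[i + r];      // sample entering the window
        if (i - r - 1 >= 0) sum -= in[i - r - 1];  // sample leaving the window
        out[i] = sum;
    }
    return out;
}

// Same result with the conditions hoisted into three separate loops.
inline std::vector<int> box_sum_split(const std::vector<int>& in, int r) {
    const int n = static_cast<int>(in.size());
    if (n < 2 * r + 1) return box_sum_branchy(in, r);  // keep the sketch simple
    std::vector<int> out(n);
    int sum = 0;
    for (int j = 0; j < r; j++) sum += in[j];
    int i = 0;
    for (; i <= r; i++)    { sum += in[i + r]; out[i] = sum; }      // leading edge
    for (; i < n - r; i++) { sum += in[i + r] - in[i - r - 1]; out[i] = sum; }
    for (; i < n; i++)     { sum -= in[i - r - 1]; out[i] = sum; }  // trailing edge
    return out;
}
```

The split version assumes the window fits inside the input (n >= 2r + 1) and falls back to the branchy loop otherwise; that fallback is a simplification made for this example.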
* move reinterpret_cast into SK_PREFETCH (mtklein, 2015-10-28)
| no public API changes
| TBR=reed@google.com
|
| BUG=skia:
|
| Review URL: https://codereview.chromium.org/1419573011
* Refactor SkBlurImageFilter_Opts.h. (senorblanco, 2015-10-27)
| Refactor box_blur() into a single driver function which SSE*, NEON and
| generic code paths can use. I've used macros to do this in order to
| keep debug performance reasonable, but it's fairly ugly. I'm open to
| other suggestions.
|
| BUG=skia:
| CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
|
| Review URL: https://codereview.chromium.org/1408003007
* Revert of SkPx: new approach to fixed-point SIMD (patchset #9 id:160001 of https://codereview.chromium.org/1317233005/ ) (mtklein, 2015-09-14)
| Reason for revert:
| http://build.chromium.org/p/client.skia.compile/builders/Build-Mac10.8-Clang-Arm7-Debug-Android/builds/4627
|
| Original issue's description:
| > SkPx: new approach to fixed-point SIMD
| >
| > SkPx is like Sk4px, except each platform implementation of SkPx can declare
| > a different sweet spot of N pixels, with extra loads and stores to handle the
| > ragged edge of 0<n<N pixels.
| >
| > In this case, _sse's sweet spot remains 4 pixels. _neon jumps up to 8 so
| > we can now use NEON's transposing loads and stores, and _none is just 1.
| > This makes operations involving alpha considerably more efficient on NEON,
| > as alpha is its own distinct 8x8 bit plane that's easy to toss around.
| >
| > This incorporates a few other improvements I've been wanting:
| >   - no requirement that we're dealing with SkPMColor. SkColor works too.
| >   - no anonymous namespace hack to differentiate implementations.
| >
| > Codegen and perf look good on Clang/x86-64 and GCC/ARMv7.
| > The NEON code looks very similar to the old NEON code, as intended.
| > No .skp or GM diffs on my laptop. Don't expect any.
| >
| > I intend this to replace Sk4px. Plan after landing:
| >   - port SkXfermode_opts.h
| >   - port Color32 in SkBlitRow_D32.cpp (and move to SkBlitRow_opts.h like other
| >     SkOpts code)
| >   - delete all Sk4px-related code
| >   - clean up evolutionary dead ends in SkNx (Sk16b, Sk16h, Sk4i, Sk4d, etc.)
| >     leaving Sk2f, Sk4f (and Sk2s, Sk4s).
| >   - find a machine with AVX2 to work on, write SkPx_avx2.h handling 8 pixels
| >     at a time.
| >
| > In the end we'll have Sk4f for float pixels, SkPx for fixed-point pixels.
| >
| > BUG=skia:4117
| >
| > Committed: https://skia.googlesource.com/skia/+/82c93b45ed6ac0b628adb8375389c202d1f586f9
|
| TBR=mtklein@google.com,msarett@google.com
| NOPRESUBMIT=true
| NOTREECHECKS=true
| NOTRY=true
| BUG=skia:4117
|
| Review URL: https://codereview.chromium.org/1336423002
* SkPx: new approach to fixed-point SIMD (mtklein, 2015-09-14)
| SkPx is like Sk4px, except each platform implementation of SkPx can declare
| a different sweet spot of N pixels, with extra loads and stores to handle the
| ragged edge of 0<n<N pixels.
|
| In this case, _sse's sweet spot remains 4 pixels. _neon jumps up to 8 so
| we can now use NEON's transposing loads and stores, and _none is just 1.
| This makes operations involving alpha considerably more efficient on NEON,
| as alpha is its own distinct 8x8 bit plane that's easy to toss around.
|
| This incorporates a few other improvements I've been wanting:
|   - no requirement that we're dealing with SkPMColor. SkColor works too.
|   - no anonymous namespace hack to differentiate implementations.
|
| Codegen and perf look good on Clang/x86-64 and GCC/ARMv7.
| The NEON code looks very similar to the old NEON code, as intended.
| No .skp or GM diffs on my laptop. Don't expect any.
|
| I intend this to replace Sk4px. Plan after landing:
|   - port SkXfermode_opts.h
|   - port Color32 in SkBlitRow_D32.cpp (and move to SkBlitRow_opts.h like other
|     SkOpts code)
|   - delete all Sk4px-related code
|   - clean up evolutionary dead ends in SkNx (Sk16b, Sk16h, Sk4i, Sk4d, etc.)
|     leaving Sk2f, Sk4f (and Sk2s, Sk4s).
|   - find a machine with AVX2 to work on, write SkPx_avx2.h handling 8 pixels
|     at a time.
|
| In the end we'll have Sk4f for float pixels, SkPx for fixed-point pixels.
|
| BUG=skia:4117
|
| Review URL: https://codereview.chromium.org/1317233005
* Revert of use new shuffle to speed up affine matrix mappts (patchset #3 id:40001 of https://codereview.chromium.org/1333983002/ ) (mtklein, 2015-09-10)
| Reason for revert:
| Unexpected perf impact, and a whole bunch of new images in gold (mostly
| invisibly different).
|
| Original issue's description:
| > use new shuffle to speed up affine matrix mappts
| >
| > sse: 25 -> 18
| > neon: 95 -> 86
| >
| > BUG=skia:
| >
| > Committed: https://skia.googlesource.com/skia/+/e70afc9f48d00828ee6b707899a8ff542b0e8b98
|
| TBR=reed@google.com,mtklein@chromium.org
| NOPRESUBMIT=true
| NOTREECHECKS=true
| NOTRY=true
| BUG=skia:
|
| Review URL: https://codereview.chromium.org/1335003002
* use new shuffle to speed up affine matrix mappts (mtklein, 2015-09-10)
| sse: 25 -> 18
| neon: 95 -> 86
|
| BUG=skia:
|
| Review URL: https://codereview.chromium.org/1333983002
* SkNx_shuffle (mtklein, 2015-09-10)
| This allows us to express shuffles more directly in code while also giving us a
| convenient point to platform-specify particular shuffles for particular types.
|
| No specializations yet. Everyone just uses the (pretty good) default option.
|
| BUG=skia:
|
| Review URL: https://codereview.chromium.org/1301413006
* Port SkMatrix opts to SkOpts. (mtklein, 2015-09-10)
| No changes to the code, just moved around.
|
| This will have the effect of enabling vectorized code on ARMv7.
| Should be no effect on ARMv8 or x86, which would have been vectorized already.
|
| nanobench --match mappoints changes on Nexus 5 (ARMv7):
|   _affine: 132 -> 95
|   _scale:  118 -> 47
|   _trans:   60 -> 37
|
| A teaser:
| We should next look at the ABCD->BADC shuffle we've noted that we need in
| _affine. A quick hack showed doing that optimally is another ~35% speedup on
| x86. Got to figure out how to do it best on ARM though: that same quick hack
| was a 2x slowdown there. Good reason to resurrect that SkNx_shuffle() CL!
|
| (I believe the answers are vrev64q_f32(v) and
| _mm_shuffle_ps(v,v, _MM_SHUFFLE(2,3,0,1)), but we should probably find out in
| another CL.)
|
| BUG=skia:4117
|
| Review URL: https://codereview.chromium.org/1320673014
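The role of the ABCD->BADC shuffle in an affine point mapping can be seen in a scalar sketch that treats a 4-float array as one vector holding two points. `Vec4`, `shuffle_badc`, and `map_affine2` are illustrative names, and the parameter naming (sx, kx, tx, ky, sy, ty) is an assumption made for this example rather than code from the CL.

```cpp
#include <array>

using Vec4 = std::array<float, 4>;

// ABCD -> BADC: swap x and y within each packed point.
inline Vec4 shuffle_badc(const Vec4& v) { return {{v[1], v[0], v[3], v[2]}}; }

// Maps two packed points [x0, y0, x1, y1] through
//   x' = sx*x + kx*y + tx
//   y' = ky*x + sy*y + ty
inline Vec4 map_affine2(const Vec4& pts,
                        float sx, float kx, float tx,
                        float ky, float sy, float ty) {
    const Vec4 scale   = {{sx, sy, sx, sy}};
    const Vec4 skew    = {{kx, ky, kx, ky}};
    const Vec4 trans   = {{tx, ty, tx, ty}};
    const Vec4 swapped = shuffle_badc(pts);  // supplies y to x' and x to y'
    Vec4 out;
    for (int i = 0; i < 4; i++) {
        out[i] = pts[i] * scale[i] + swapped[i] * skew[i] + trans[i];
    }
    return out;
}
```

Each output lane needs the other coordinate of the same point for the skew term, which is why the swapped copy is required and why the cost of that one shuffle matters on each platform.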
* Port SkBlitRow::Color32 to SkOpts. (mtklein, 2015-09-10)
| This was a pre-SkOpts attempt that we can bring under its wing now.
|
| This should be a perf no-op, deo volente.
|
| BUG=skia:4117
|
| Review URL: https://codereview.chromium.org/1314863006
* Port uses of SkLazyPtr to SkOncePtr. (mtklein, 2015-09-09)
| This gives SkOncePtr a non-trivial destructor that uses std::default_delete
| by default. This is overrideable, as seen in SkColorTable.
|
| SK_DECLARE_STATIC_ONCE_PTR still just leaves its pointers hanging at EOP.
|
| BUG=skia:
|
| No public API changes.
| TBR=reed@google.com
|
| Committed: https://skia.googlesource.com/skia/+/a1254acdb344174e761f5061c820559dab64a74c
|
| Review URL: https://codereview.chromium.org/1322933005
* Revert of Port uses of SkLazyPtr to SkOncePtr. (patchset #7 id:110001 of https://codereview.chromium.org/1322933005/ ) (mtklein, 2015-09-09)
| Reason for revert:
| Breaks Chrome roll.
|
| obj/skia/ext/skia_chrome.skia_memory_dump_provider.o does not have -I
| include/private on its include path, but transitively includes
| SkMessageBus.h.
|
| Original issue's description:
| > Port uses of SkLazyPtr to SkOncePtr.
| >
| > This gives SkOncePtr a non-trivial destructor that uses std::default_delete
| > by default. This is overrideable, as seen in SkColorTable.
| >
| > SK_DECLARE_STATIC_ONCE_PTR still just leaves its pointers hanging at EOP.
| >
| > BUG=skia:
| >
| > No public API changes.
| > TBR=reed@google.com
| >
| > Committed: https://skia.googlesource.com/skia/+/a1254acdb344174e761f5061c820559dab64a74c
|
| TBR=herb@google.com,mtklein@chromium.org
| NOPRESUBMIT=true
| NOTREECHECKS=true
| NOTRY=true
| BUG=skia:
|
| Review URL: https://codereview.chromium.org/1334523002
* Port uses of SkLazyPtr to SkOncePtr. (mtklein, 2015-09-09)
| This gives SkOncePtr a non-trivial destructor that uses std::default_delete
| by default. This is overrideable, as seen in SkColorTable.
|
| SK_DECLARE_STATIC_ONCE_PTR still just leaves its pointers hanging at EOP.
|
| BUG=skia:
|
| No public API changes.
| TBR=reed@google.com
|
| Review URL: https://codereview.chromium.org/1322933005
* Restore old NEON blit_mask_d32_a8 methods. (mtklein, 2015-09-01)
| As you'll see from the BUG line, we have a strong indication that the new
| Sk4px methods regress some devices. This restores the old code back as
| literally as possible while still fitting in SkOpts framework.
|
| This is ideally temporary breathing room. We should get an early indication
| of if those bugs will improve by watching https://perf.skia.org/#4004
|
| BUG=skia:4117,525844,519596,524149
|
| Review URL: https://codereview.chromium.org/1312763009
* SkColorCubeFilter_opts: rounding is actually free here. (mtklein, 2015-09-01)
| (Sk4f(float) is statically initializable, unlike the old SkPMFloat(SkPMColor).)
|
| BUG=skia:4117
|
| Review URL: https://codereview.chromium.org/1317593007
* Require Sk4f::toBytes() clamps (mtklein, 2015-09-01)
| BUG=skia:4117
| CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;client.skia.android:Test-Android-GCC-Nexus9-CPU-Denver-Arm64-Release-Trybot
|
| Review URL: https://codereview.chromium.org/1312053004
* Clean up remaining users of SkPMFloat (mtklein, 2015-08-31)
| This switches over SkXfermodes_opts.h and SkColorMatrixFilter to use Sk4f,
| and converts the SkPMFloat benches to Sk4f benches.
|
| No pixels should change here, and no code beyond the Sk4f_ benches should
| change speed. The benches are faster than the old versions.
|
| BUG=skia:4117
|
| Review URL: https://codereview.chromium.org/1324743002
* Move float<->byte conversions into Sk4f. (mtklein, 2015-08-31)
| This lets us avoid conversions to [0.0, 1.0] space and rounding that aren't
| necessary for SkColorCubeFilter_opts.h.
|
| Dropping rounding on the way back to bytes means we'll see a bunch of
| off-by-1 diffs.
|
| Rough perf effect:
|   SSSE3: 110 -> 93  (~15%)
|   NEON:  465 -> 375 (~20%)
|
| This is the beginning of the end for SkPMFloat as an entity distinct from
| Sk4f. I've kept it for now so I can convert sites one by one and think about
| how things that really want to keep PM color order will work.
|
| BUG=skia:4117
|
| Review URL: https://codereview.chromium.org/1319413003
* Style Change: NULL->nullptr (halcanary, 2015-08-27)
| DOCS_PREVIEW= https://skia.org/?cl=1316233002
|
| Review URL: https://codereview.chromium.org/1316233002
* SkColorCubeFilter_opts: start with a statically-initializable zero. (mtklein, 2015-08-27)
| SkPMFloat(0) and SkPMFloat(0,0,0,0) end up with the same value, but the first
| goes through math to get there. The second is a lot more transparent to the
| compiler, and should compile all the way down to just `xorps xmmN,xmmN` or
| even be optimized away.
|
| Didn't measure any additional benefit from hoisting the zero outside the loop
| and writing `SkPMFloat color = zero;`.
|
| Perf win is <2%.
|
| BUG=skia:
|
| Review URL: https://codereview.chromium.org/1314763007
* Style Change: SkNEW->new; SkDELETE->delete (halcanary, 2015-08-26)
| DOCS_PREVIEW= https://skia.org/?cl=1316123003
|
| Review URL: https://codereview.chromium.org/1316123003
* trifurcate blit_mask_d32_a8 into _black, _opaque, _general. (mtklein, 2015-08-26)
| We used to split the NEON code this way, and just had one path for SSE. It's
| unclear to me testing locally if there's any major win here, but there's at
| least a small one.
|
| No pixel diffs or even any math changes, just folding constants through.
|
| BUG=skia:4117
|
| Review URL: https://codereview.chromium.org/1304373006
* SkColorCubeFilter: require alpha == 0xFF. (mtklein, 2015-08-19)
| This is about a 12% improvement on my desktop, from 134 to 118ms on our bench.
|
| BUG=skia:
|
| Review URL: https://codereview.chromium.org/1295873004
* Bug fix: we're using SkPMFloat methods on SkColor. (mtklein, 2015-08-19)
| Annoyingly our test bot that forces SkPMFloat_none is a Linux bot using BGRA
| SkPMColors, so we'd never notice the bug there.
|
| BUG=skia:
|
| Review URL: https://codereview.chromium.org/1296383006
* Patches on top of Radu's latest. (mtklein, 2015-08-19)
| patch from issue 1273033005 at patchset 120001
| (http://crrev.com/1273033005#ps120001)
|
| BUG=skia:
|
| Committed: https://skia.googlesource.com/skia/+/2d141ba2df8f7506848aa9369f502944e837cd09
|
| Review URL: https://codereview.chromium.org/1288323004
* Try again to put SkXfermode_opts in SK_OPTS_NS (mtklein, 2015-08-18)
| Remember failed attempt https://codereview.chromium.org/1286093004/ ?
|
| I think this one is simpler and safer and even technically legal C++.
|
| BUG=skia:4117
|
| Review URL: https://codereview.chromium.org/1296183004
* Update SkOpts namespaces. (mtklein, 2015-08-18)
| portable -> default, and everyone gets an sk_ prefix.
|
| BUG=skia:4117
|
| Review URL: https://codereview.chromium.org/1299013003
* Patches on top of Radu's latest. (mtklein, 2015-08-18)
| patch from issue 1273033005 at patchset 120001
| (http://crrev.com/1273033005#ps120001)
|
| BUG=skia:
|
| Review URL: https://codereview.chromium.org/1288323004
* Remove SkOpts_sse2.cpp. (mtklein, 2015-08-18)
| It's sort of pointless: all our clients that will have SSE2 at runtime have
| it unconditionally at compile time, so the functions in namespace portable
| will pick up the SSE2 code. The procs in SkOpts_sse2.o were just duplicate
| code.
|
| A couple of the procs we had in _sse2.cpp can benefit slightly when compiled
| with SSSE3. I've moved those to _ssse3.cpp. This should lead to small
| speedups on platforms like Linux and Windows that have a baseline of SSE2.
|
| Similarly, I've removed the call to Init_neon() when NEON is available
| globally... it's a no-op.
|
| Renaming namespace portable to something clearer is TBD.
|
| BUG=skia:4117
|
| Review URL: https://codereview.chromium.org/1294213002
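The arrangement this message relies on, a function pointer that starts at a baseline implementation and is optionally swapped at startup, can be sketched as follows. All names here (`portable::fill`, `wide_sketch::fill`, `fill_proc`, `init_procs`) are hypothetical; this is not the SkOpts source, and the CPU check is reduced to a boolean parameter.

```cpp
#include <cstddef>
#include <cstdint>

namespace portable {
// Baseline version, built with the default compiler flags. If those flags
// already enable an instruction set, this code is free to use it.
inline void fill(uint32_t* dst, uint32_t color, size_t n) {
    for (size_t i = 0; i < n; i++) dst[i] = color;
}
}  // namespace portable

namespace wide_sketch {
// Stand-in for a version compiled with stronger instruction-set flags.
inline void fill(uint32_t* dst, uint32_t color, size_t n) {
    size_t i = 0;
    for (; i + 2 <= n; i += 2) { dst[i] = color; dst[i + 1] = color; }
    if (i < n) dst[i] = color;
}
}  // namespace wide_sketch

// Callers go through the pointer; it starts at the baseline implementation.
static void (*fill_proc)(uint32_t*, uint32_t, size_t) = portable::fill;

// Hypothetical init step: upgrade the pointer only when the CPU supports more
// than the compile-time baseline.
inline void init_procs(bool cpu_has_wider_simd) {
    if (cpu_has_wider_simd) fill_proc = wide_sketch::fill;
}
```

Under this model, a separate object file for an instruction set that is already the compile-time baseline would only duplicate what the portable namespace produces, which is the redundancy the message describes removing.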
* Normalize SkXfermode_opts.h argument order as d,s[,aa]. (mtklein, 2015-08-13)
| At head they're s,d[,aa] in SkXfermode_opts.h but Sk4px::Map* expect d,s[,aa]
| so we ended up having to write weird little lambda shims to match impedance.
|
| There's no reason for these to disagree, and d,s[,aa] is the One True Order
| (because no matter what you're doing in graphics, there's always a dst).
|
| Should be no perf or image diff, though I'm suspicious it might help MSVC
| code generation.
|
| BUG=skia:4117
|
| Committed: https://skia.googlesource.com/skia/+/6028a8476504022fe40b6870b1460b5e4a80969f
|
| CQ_EXTRA_TRYBOTS=client.skia:Test-Win8-MSVC-ShuttleB-CPU-AVX2-x86-Release-Trybot
|
| Review URL: https://codereview.chromium.org/1289903002
* Revert of Normalize SkXfermode_opts.h argument order as d,s[,aa]. (patchset #1 id:1 of https://codereview.chromium.org/1289903002/ ) (mtklein, 2015-08-12)
| Reason for revert:
| ?
|
| Original issue's description:
| > Normalize SkXfermode_opts.h argument order as d,s[,aa].
| >
| > At head they're s,d[,aa] in SkXfermode_opts.h but Sk4px::Map* expect d,s[,aa]
| > so we ended up having to write weird little lambda shims to match impedance.
| >
| > There's no reason for these to disagree, and d,s[,aa] is the One True Order
| > (because no matter what you're doing in graphics, there's always a dst).
| >
| > Should be no perf or image diff, though I'm suspicious it might help MSVC code generation.
| >
| > BUG=skia:4117
| >
| > Committed: https://skia.googlesource.com/skia/+/6028a8476504022fe40b6870b1460b5e4a80969f
|
| TBR=djsollen@google.com,mtklein@chromium.org
| NOPRESUBMIT=true
| NOTREECHECKS=true
| NOTRY=true
| BUG=skia:4117
|
| Review URL: https://codereview.chromium.org/1284363002