aboutsummaryrefslogtreecommitdiffhomepage
path: root/src/opts
Commit message (Collapse)AuthorAge
* Revert of SkPx: new approach to fixed-point SIMD (patchset #9 id:160001 of ↵Gravatar mtklein2015-09-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | https://codereview.chromium.org/1317233005/ ) Reason for revert: http://build.chromium.org/p/client.skia.compile/builders/Build-Mac10.8-Clang-Arm7-Debug-Android/builds/4627 Original issue's description: > SkPx: new approach to fixed-point SIMD > > SkPx is like Sk4px, except each platform implementation of SkPx can declare > a different sweet spot of N pixels, with extra loads and stores to handle the > ragged edge of 0<n<N pixels. > > In this case, _sse's sweet spot remains 4 pixels. _neon jumps up to 8 so > we can now use NEON's transposing loads and stores, and _none is just 1. > This makes operations involving alpha considerably more efficient on NEON, > as alpha is its own distinct 8x8 bit plane that's easy to toss around. > > This incorporates a few other improvements I've been wanting: > - no requirement that we're dealing with SkPMColor. SkColor works too. > - no anonymous namespace hack to differentiate implementations. > > Codegen and perf look good on Clang/x86-64 and GCC/ARMv7. > The NEON code looks very similar to the old NEON code, as intended. > No .skp or GM diffs on my laptop. Don't expect any. > > I intend this to replace Sk4px. Plan after landing: > - port SkXfermode_opts.h > - port Color32 in SkBlitRow_D32.cpp (and move to SkBlitRow_opts.h like other > SkOpts code) > - delete all Sk4px-related code > - clean up evolutionary dead ends in SkNx (Sk16b, Sk16h, Sk4i, Sk4d, etc.) > leaving Sk2f, Sk4f (and Sk2s, Sk4s). > - find a machine with AVX2 to work on, write SkPx_avx2.h handling 8 pixels > at a time. > > In the end we'll have Sk4f for float pixels, SkPx for fixed-point pixels. > > BUG=skia:4117 > > Committed: https://skia.googlesource.com/skia/+/82c93b45ed6ac0b628adb8375389c202d1f586f9 TBR=mtklein@google.com,msarett@google.com NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia:4117 Review URL: https://codereview.chromium.org/1336423002
* SkPx: new approach to fixed-point SIMDGravatar mtklein2015-09-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | SkPx is like Sk4px, except each platform implementation of SkPx can declare a different sweet spot of N pixels, with extra loads and stores to handle the ragged edge of 0<n<N pixels. In this case, _sse's sweet spot remains 4 pixels. _neon jumps up to 8 so we can now use NEON's transposing loads and stores, and _none is just 1. This makes operations involving alpha considerably more efficient on NEON, as alpha is its own distinct 8x8 bit plane that's easy to toss around. This incorporates a few other improvements I've been wanting: - no requirement that we're dealing with SkPMColor. SkColor works too. - no anonymous namespace hack to differentiate implementations. Codegen and perf look good on Clang/x86-64 and GCC/ARMv7. The NEON code looks very similar to the old NEON code, as intended. No .skp or GM diffs on my laptop. Don't expect any. I intend this to replace Sk4px. Plan after landing: - port SkXfermode_opts.h - port Color32 in SkBlitRow_D32.cpp (and move to SkBlitRow_opts.h like other SkOpts code) - delete all Sk4px-related code - clean up evolutionary dead ends in SkNx (Sk16b, Sk16h, Sk4i, Sk4d, etc.) leaving Sk2f, Sk4f (and Sk2s, Sk4s). - find a machine with AVX2 to work on, write SkPx_avx2.h handling 8 pixels at a time. In the end we'll have Sk4f for float pixels, SkPx for fixed-point pixels. BUG=skia:4117 Review URL: https://codereview.chromium.org/1317233005
* Revert of use new shuffle to speed up affine matrix mappts (patchset #3 ↵Gravatar mtklein2015-09-10
| | | | | | | | | | | | | | | | | | | | | | | | | id:40001 of https://codereview.chromium.org/1333983002/ ) Reason for revert: Unexpected perf impact, and a whole bunch of new images in gold (mostly invisibly different). Original issue's description: > use new shuffle to speed up affine matrix mappts > > sse: 25 -> 18 > neon: 95 -> 86 > > BUG=skia: > > Committed: https://skia.googlesource.com/skia/+/e70afc9f48d00828ee6b707899a8ff542b0e8b98 TBR=reed@google.com,mtklein@chromium.org NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia: Review URL: https://codereview.chromium.org/1335003002
* use new shuffle to speed up affine matrix mapptsGravatar mtklein2015-09-10
| | | | | | | | | sse: 25 -> 18 neon: 95 -> 86 BUG=skia: Review URL: https://codereview.chromium.org/1333983002
* SkNx_shuffleGravatar mtklein2015-09-10
| | | | | | | | | | | This allows us to express shuffles more directly in code while also giving us a convenient point to platform-specify particular shuffles for particular types. No specializations yet. Everyone just uses the (pretty good) default option. BUG=skia: Review URL: https://codereview.chromium.org/1301413006
* Port SkMatrix opts to SkOpts.Gravatar mtklein2015-09-10
| | | | | | | | | | | | | | | | | | | | | | No changes to the code, just moved around. This will have the effect of enabling vectorized code on ARMv7. Should be no effect on ARMv8 or x86, which would have been vectorized already. nanobench --match mappoints changes on Nexus 5 (ARMv7): _affine: 132 -> 95 _scale: 118 -> 47 _trans: 60 -> 37 A teaser: We should next look at the ABCD->BADC shuffle we've noted that we need in _affine. A quick hack showed doing that optimally is another ~35% speedup on x86. Got to figure out how to do it best on ARM though: that same quick hack was a 2x slowdown there. Good reason to resurrect that SkNx_shuffle() CL! (I believe the answers are vrev64q_f32(v) and _mm_shuffle_ps(v,v, _MM_SHUFFLE(2,3,0,1), but we should probably find out in another CL.) BUG=skia:4117 Review URL: https://codereview.chromium.org/1320673014
* Port SkBlitRow::Color32 to SkOpts.Gravatar mtklein2015-09-10
| | | | | | | | | | This was a pre-SkOpts attempt that we can bring under its wing now. This should be a perf no-op, deo volente. BUG=skia:4117 Review URL: https://codereview.chromium.org/1314863006
* Port uses of SkLazyPtr to SkOncePtr.Gravatar mtklein2015-09-09
| | | | | | | | | | | | | | | | This gives SkOncePtr a non-trivial destructor that uses std::default_delete by default. This is overrideable, as seen in SkColorTable. SK_DECLARE_STATIC_ONCE_PTR still just leaves its pointers hanging at EOP. BUG=skia: No public API changes. TBR=reed@google.com Committed: https://skia.googlesource.com/skia/+/a1254acdb344174e761f5061c820559dab64a74c Review URL: https://codereview.chromium.org/1322933005
* Revert of Port uses of SkLazyPtr to SkOncePtr. (patchset #7 id:110001 of ↵Gravatar mtklein2015-09-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | https://codereview.chromium.org/1322933005/ ) Reason for revert: Breaks Chrome roll. obj/skia/ext/skia_chrome.skia_memory_dump_provider.o does not have -I include/private on its include path, but transitively includes SkMessageBus.h. Original issue's description: > Port uses of SkLazyPtr to SkOncePtr. > > This gives SkOncePtr a non-trivial destructor that uses std::default_delete > by default. This is overrideable, as seen in SkColorTable. > > SK_DECLARE_STATIC_ONCE_PTR still just leaves its pointers hanging at EOP. > > BUG=skia: > > No public API changes. > TBR=reed@google.com > > Committed: https://skia.googlesource.com/skia/+/a1254acdb344174e761f5061c820559dab64a74c TBR=herb@google.com,mtklein@chromium.org NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia: Review URL: https://codereview.chromium.org/1334523002
* Port uses of SkLazyPtr to SkOncePtr.Gravatar mtklein2015-09-09
| | | | | | | | | | | | | | This gives SkOncePtr a non-trivial destructor that uses std::default_delete by default. This is overrideable, as seen in SkColorTable. SK_DECLARE_STATIC_ONCE_PTR still just leaves its pointers hanging at EOP. BUG=skia: No public API changes. TBR=reed@google.com Review URL: https://codereview.chromium.org/1322933005
* Restore old NEON blit_mask_d32_a8 methods.Gravatar mtklein2015-09-01
| | | | | | | | | | | | | | As you'll see from the BUG line, we have a strong indication that the new Sk4px methods regress some devices. This restores the old code back as literally as possible while still fitting in SkOpts framework. This is ideally temporary breathing room. We should get an early indication of if those bugs will improve by watching https://perf.skia.org/#4004 BUG=skia:4117,525844,519596,524149 Review URL: https://codereview.chromium.org/1312763009
* SkColorCubeFilter_opts: rounding is actually free here.Gravatar mtklein2015-09-01
| | | | | | | | (Sk4f(float) is statically initializable, unlike the old SkPMFlor(SkPMColor).) BUG=skia:4117 Review URL: https://codereview.chromium.org/1317593007
* Require Sk4f::toBytes() clampsGravatar mtklein2015-09-01
| | | | | | | | BUG=skia:4117 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;client.skia.android:Test-Android-GCC-Nexus9-CPU-Denver-Arm64-Release-Trybot Review URL: https://codereview.chromium.org/1312053004
* Clean up remaining users of SkPMFloatGravatar mtklein2015-08-31
| | | | | | | | | | | | This switches over SkXfermodes_opts.h and SkColorMatrixFilter to use Sk4f, and converts the SkPMFloat benches to Sk4f benches. No pixels should change here, and no code beyond the Sk4f_ benches should change speed. The benches are faster than the old versions. BUG=skia:4117 Review URL: https://codereview.chromium.org/1324743002
* Move float<->byte conversions into Sk4f.Gravatar mtklein2015-08-31
| | | | | | | | | | | | | | | | | | | This lets us avoid conversions to [0.0, 1.0] space and rounding that aren't necessary for SkColorCubeFilter_opts.h. Dropping rounding on the way back to bytes means we'll see a bunch of off-by-1 diffs. Rough perf effect: SSSE3: 110 -> 93 (~15%) NEON: 465 -> 375 (~20%) This is the beginning of the end for SkPMFloat as an entity distinct from Sk4f. I've kept it for now so I can convert sites one by one and think about how things that really want to keep PM color order will work. BUG=skia:4117 Review URL: https://codereview.chromium.org/1319413003
* Style Change: NULL->nullptrGravatar halcanary2015-08-27
| | | | | | DOCS_PREVIEW= https://skia.org/?cl=1316233002 Review URL: https://codereview.chromium.org/1316233002
* SkColorCubeFilter_opts: start with a statically-initializable zero.Gravatar mtklein2015-08-27
| | | | | | | | | | | | | | | | SkPMFloat(0) and SkPMFloat(0,0,0,0) end up with the same value, but the first goes through math to get there. The second is a lot more transparent to the compiler, and should compile all the way down to just `xorps xmmN,xmmN` or even be optimized away. Didn't measure any additional benefit from hoisting the zero outside the loop and writing `SkPMFloat color = zero;`. Perf win is <2%. BUG=skia: Review URL: https://codereview.chromium.org/1314763007
* Style Change: SkNEW->new; SkDELETE->deleteGravatar halcanary2015-08-26
| | | | | | DOCS_PREVIEW= https://skia.org/?cl=1316123003 Review URL: https://codereview.chromium.org/1316123003
* trifurcate blit_mask_d32_a8 into _black, _opaque, _general.Gravatar mtklein2015-08-26
| | | | | | | | | | | We used to split the NEON code this way, and just had one path for SSE. It's unclear to me testing locally if there's any major win here, but there's at least a small one. No pixel diffs or even any math changes, just folding constants through. BUG=skia:4117 Review URL: https://codereview.chromium.org/1304373006
* SkColorCubeFilter: require alpha == 0xFF.Gravatar mtklein2015-08-19
| | | | | | | | This is about a 12% improvement on my desktop, from 134 to 118ms on our bench. BUG=skia: Review URL: https://codereview.chromium.org/1295873004
* Bug fix: we're using SkPMFloat methods on SkColor.Gravatar mtklein2015-08-19
| | | | | | | | | Annoyingly our test bot that forces SkPMFloat_none is a Linux bot using BGRA SkPMColors, so we'd never notice the bug there. BUG=skia: Review URL: https://codereview.chromium.org/1296383006
* Patches on top of Radu's latest.Gravatar mtklein2015-08-19
| | | | | | | | | | patch from issue 1273033005 at patchset 120001 (http://crrev.com/1273033005#ps120001) BUG=skia: Committed: https://skia.googlesource.com/skia/+/2d141ba2df8f7506848aa9369f502944e837cd09 Review URL: https://codereview.chromium.org/1288323004
* Try again to put SkXfermode_opts in SK_OPTS_NSGravatar mtklein2015-08-18
| | | | | | | | Remember failed attempt https://codereview.chromium.org/1286093004/ ? I think this one is simpler and safer and even technically legal C++. BUG=skia:4117 Review URL: https://codereview.chromium.org/1296183004
* Update SkOpts namespaces.Gravatar mtklein2015-08-18
| | | | | | | | portable -> default, and everyone gets an sk_ prefix. BUG=skia:4117 Review URL: https://codereview.chromium.org/1299013003
* Patches on top of Radu's latest.Gravatar mtklein2015-08-18
| | | | | | | | patch from issue 1273033005 at patchset 120001 (http://crrev.com/1273033005#ps120001) BUG=skia: Review URL: https://codereview.chromium.org/1288323004
* Remove SkOpts_sse2.cpp.Gravatar mtklein2015-08-18
| | | | | | | | | | | | | | | | | | It's sort of pointless: all our clients that will have SSE2 at runtime have it unconditionally at compile time, so the functions in namespace portable will pick up the SSE2 code. The procs in SkOpts_sse2.o were just duplicate code. A couple of the procs we had in _sse2.cpp can benefit slightly when compiled with SSSE3. I've moved those to _ssse3.cpp. This should lead to small speedups on platforms like Linux and Windows that have a baseline of SSE2. Similarly, I've removed the call to Init_neon() when NEON is available globally... it's a no-op. Renaming namespace portable to something clearer is TBD. BUG=skia:4117 Review URL: https://codereview.chromium.org/1294213002
* Normalize SkXfermode_opts.h argument order as d,s[,aa].Gravatar mtklein2015-08-13
| | | | | | | | | | | | | | | | | | At head they're s,d[,aa] in SkXfermode_opts.h but Sk4px::Map* expect d,s[,aa] so we ended up having to write weird little lambda shims to match impedance. There's no reason for these to disagree, and d,s[,aa] is the One True Order (because no matter what you're doing in graphics, there's always a dst). Should be no perf or image diff, though I'm suspicious it might help MSVC code generation. BUG=skia:4117 Committed: https://skia.googlesource.com/skia/+/6028a8476504022fe40b6870b1460b5e4a80969f CQ_EXTRA_TRYBOTS=client.skia:Test-Win8-MSVC-ShuttleB-CPU-AVX2-x86-Release-Trybot Review URL: https://codereview.chromium.org/1289903002
* Revert of Normalize SkXfermode_opts.h argument order as d,s[,aa]. (patchset ↵Gravatar mtklein2015-08-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | #1 id:1 of https://codereview.chromium.org/1289903002/ ) Reason for revert: ? Original issue's description: > Normalize SkXfermode_opts.h argument order as d,s[,aa]. > > At head they're s,d[,aa] in SkXfermode_opts.h but Sk4px::Map* expect d,s[,aa] > so we ended up having to write weird little lambda shims to match impedance. > > There's no reason for these to disagree, and d,s[,aa] is the One True Order > (because no matter what you're doing in graphics, there's always a dst). > > Should be no perf or image diff, though I'm suspicious it might help MSVC code generation. > > BUG=skia:4117 > > Committed: https://skia.googlesource.com/skia/+/6028a8476504022fe40b6870b1460b5e4a80969f TBR=djsollen@google.com,mtklein@chromium.org NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia:4117 Review URL: https://codereview.chromium.org/1284363002
* Revert of Refactor to put SkXfermode_opts inside SK_OPTS_NS. (patchset #1 ↵Gravatar mtklein2015-08-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | id:1 of https://codereview.chromium.org/1286093004/ ) Reason for revert: Maybe causing test / gold problems? Original issue's description: > Refactor to put SkXfermode_opts inside SK_OPTS_NS. > > Without this refactor I was getting warnings previously about having code > inside namespace SK_OPTS_NS (e.g. namespace sse2, namespace neon) referring to > code inside an anonymous namespace (Sk4px, SkPMFloat, Sk4f, etc) [1]. > > That low-level code was in an anonymous namespace to allow multiple independent > copies of its methods to be instantiated without the linker getting confused / > offended about violating the One Definition Rule. This was only happening in > Debug mode where the methods were not being inlined. > > To fix this all, I've force-inlined the methods of the low-level code and > removed the anonymous namespace. > > BUG=skia:4117 > > > [1] Here is what those errors looked like: > > In file included from ../../../../src/core/SkOpts.cpp:18:0: > ../../../../src/opts/SkXfermode_opts.h:193:7: error: 'portable::Sk4pxXfermode' has a field 'portable::Sk4pxXfermode::fProc4' whose type uses the anonymous namespace [-Werror] > class Sk4pxXfermode : public SkProcCoeffXfermode { > ^ > ../../../../src/opts/SkXfermode_opts.h:193:7: error: 'portable::Sk4pxXfermode' has a field 'portable::Sk4pxXfermode::fAAProc4' whose type uses the anonymous namespace [-Werror] > ../../../../src/opts/SkXfermode_opts.h:235:7: error: 'portable::SkPMFloatXfermode' has a field 'portable::SkPMFloatXfermode::fProcF' whose type uses the anonymous namespace [-Werror] > class SkPMFloatXfermode : public SkProcCoeffXfermode { > ^ > cc1plus: all warnings being treated as errors > > Committed: https://skia.googlesource.com/skia/+/b07bee3121680b53b98b780ac08d14d374dd4c6f TBR=djsollen@google.com,mtklein@chromium.org NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia:4117 Review URL: https://codereview.chromium.org/1284333002
* Normalize SkXfermode_opts.h argument order as d,s[,aa].Gravatar mtklein2015-08-12
| | | | | | | | | | | | | | At head they're s,d[,aa] in SkXfermode_opts.h but Sk4px::Map* expect d,s[,aa] so we ended up having to write weird little lambda shims to match impedance. There's no reason for these to disagree, and d,s[,aa] is the One True Order (because no matter what you're doing in graphics, there's always a dst). Should be no perf or image diff, though I'm suspicious it might help MSVC code generation. BUG=skia:4117 Review URL: https://codereview.chromium.org/1289903002
* Refactor to put SkXfermode_opts inside SK_OPTS_NS.Gravatar mtklein2015-08-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Without this refactor I was getting warnings previously about having code inside namespace SK_OPTS_NS (e.g. namespace sse2, namespace neon) referring to code inside an anonymous namespace (Sk4px, SkPMFloat, Sk4f, etc) [1]. That low-level code was in an anonymous namespace to allow multiple independent copies of its methods to be instantiated without the linker getting confused / offended about violating the One Definition Rule. This was only happening in Debug mode where the methods were not being inlined. To fix this all, I've force-inlined the methods of the low-level code and removed the anonymous namespace. BUG=skia:4117 [1] Here is what those errors looked like: In file included from ../../../../src/core/SkOpts.cpp:18:0: ../../../../src/opts/SkXfermode_opts.h:193:7: error: 'portable::Sk4pxXfermode' has a field 'portable::Sk4pxXfermode::fProc4' whose type uses the anonymous namespace [-Werror] class Sk4pxXfermode : public SkProcCoeffXfermode { ^ ../../../../src/opts/SkXfermode_opts.h:193:7: error: 'portable::Sk4pxXfermode' has a field 'portable::Sk4pxXfermode::fAAProc4' whose type uses the anonymous namespace [-Werror] ../../../../src/opts/SkXfermode_opts.h:235:7: error: 'portable::SkPMFloatXfermode' has a field 'portable::SkPMFloatXfermode::fProcF' whose type uses the anonymous namespace [-Werror] class SkPMFloatXfermode : public SkProcCoeffXfermode { ^ cc1plus: all warnings being treated as errors Review URL: https://codereview.chromium.org/1286093004
* Sk4px blit mask.Gravatar mtklein2015-08-10
| | | | | | | | | | | | | | | | Local SKP nanobenching ranges SSE between 1.05x and 0.87x, much more heavily weighted toward <1.0x ratios (speedups). I profiled the top five regressions (1.05x-1.01x) and they look like noise. Will follow up after broad bot results. NEON looks similar but less extreme than SSE changes, ranging between 1.02x and 0.95x, again mostly speedups in 0.99x-0.97x range. The old code trifurcated into black, opaque-but-not-black, and general versions as a function of the constant src color. I did not see a significant difference between general and opaque-but-not-black, and I don't think a black version would be faster using SIMD. So we have here just one version of the code, the general version. Somewhat fantastically, I see no pixel diffs on GMs or SKPs. I will be following up with more CLs for the other procs called by SkBlitMask. BUG=skia: Review URL: https://codereview.chromium.org/1278253003
* The compiler can generate smulbb perfectly well nowadays.Gravatar mtklein2015-08-07
| | | | | | BUG=skia:4117 Review URL: https://codereview.chromium.org/1273203002
* Purge non-NEON ARM code.Gravatar mtklein2015-08-06
| | | | | | | | As I begin to wade in here, it's nice to remove as much code as possible. BUG=skia:4117 Review URL: https://codereview.chromium.org/1277953002
* Port SkTextureCompression opts to SkOptsGravatar mtklein2015-08-06
| | | | | | | | | | | | | | | | Pretty vanilla translation. I cleaned up who calls whom a little. Used to be utils -> opts -> utils, now it's just utils -> opts. I may follow up with a pass over the NEON code for readability and to clean up dead code. This turns on NEON A8->R11EAC conversion for ARMv8. Unit tests which now hit the NEON code still pass. I can't find any related bench. BUG=skia:4117 Review URL: https://codereview.chromium.org/1273103002
* Detect MIPS DSP and DSPR2 programattically.Gravatar mtklein2015-08-05
| | | | | | | | | | | | | | | | | | | | | | | | | Procedure: $ platform_tools/android/toolchains/ndk-r10c-mips-darwin_v14/bin/mipsel-linux-android-gcc -dM -E - < /dev/null | sort > vanilla.mips $ platform_tools/android/toolchains/ndk-r10c-mips-darwin_v14/bin/mipsel-linux-android-gcc -mdsp -dM -E - < /dev/null | sort > dsp.mips $ platform_tools/android/toolchains/ndk-r10c-mips-darwin_v14/bin/mipsel-linux-android-gcc -mdspr2 -dM -E - < /dev/null | sort > dspr2.mips $ diff vanilla.mips dsp.mips 239a240,241 > #define __mips_dsp 1 > #define __mips_dsp_rev 1 $ diff vanilla.mips dspr2.mips 239a240,242 > #define __mips_dsp 1 > #define __mips_dsp_rev 2 > #define __mips_dspr2 1 So, defined(__mips_dsp) -> SK_MIPS_HAS_DSP, defined(__mips_dspr2) -> SK_MIPS_HAS_DSPR2. BUG=skia: Review URL: https://codereview.chromium.org/1274873002
* Port morphology to SkOpts.Gravatar mtklein2015-08-04
| | | | | | | | | | | | Nothing too fancy. Direction enums become enum classes so they don't get all confused. An alternative is to create one single Direction enum that both blur and morphology opts use. BUG=skia:4117 Review URL: https://codereview.chromium.org/1267343004
* Reorganize to keep similar code together.Gravatar Mike Klein2015-08-04
| | | | | | | | | This organizes memset16, memset32, and rsqrt the same way as the other code. No functional change. BUG=skia:4117 R=djsollen@google.com Review URL: https://codereview.chromium.org/1264423002 .
* Remove dead code.Gravatar Mike Klein2015-08-04
| | | | | | | BUG=skia:4117 R=mtklein@google.com Review URL: https://codereview.chromium.org/1262213005 .
* Port SkBlurImage opts to SkOpts.Gravatar mtklein2015-08-04
| | | | | | | | | | | | +268 -535 lines I also rearranged the code a little bit to encapsulate itself better, mostly replacing static helper functions with lambdas. This also let me merge the SSE2 and SSE4.1 code paths. BUG=skia:4117 Review URL: https://codereview.chromium.org/1264103004
* Port SkXfermode opts to SkOpts.hGravatar mtklein2015-07-31
| | | | | | | | | | | | | Renames Sk4pxXfermode.h to SkXfermode_opts.h, and refactors it a tiny bit internally. This moves xfermode optimization from being "compile-time everywhere but NEON" to simply "runtime everywhere". I don't anticipate any effect on perf or correctness. BUG=skia:4117 Review URL: https://codereview.chromium.org/1264543006
* Port SkUtils opts to SkOpts.Gravatar mtklein2015-07-31
| | | | | | | | | | | | | | | | With this new arrangement, the benefits of inlining sk_memset16/32 have changed. On x86, they're not significantly different, except for small N<=10 where the inlined code is significantly slower. On ARMv7 with NEON, our custom code is still significantly faster for N>10 (up to 2x faster). For small N<=10 inlining is still significantly faster. On ARMv7 without NEON, our custom code is still ridiculously faster (up to 10x) than inlining for N>10, though for small N<=10 inlining is still a little faster. We were not using the NEON memset16 and memset32 procs on ARMv8. At first blush, that seems to be an oversight, but if so it's an extremely lucky one. The ARMv8 code generation for our memset16/32 procs is total garbage, leaving those methods ~8x slower than just inlining the memset, using the compiler's autovectorization. So, no need to inline any more on x86, and still inline for N<=10 on ARMv7. Always inline for ARMv8. BUG=skia:4117 Review URL: https://codereview.chromium.org/1270573002
* Runtime CPU detection for rsqrt().Gravatar mtklein2015-07-30
| | | | | | | | | | | | | | | This enables the NEON sk_float_rsqrt() code for configurations that have NEON at run-time but not compile-time. These devices will see about a 2x (1.26 -> 2.33) slowdown in sk_float_rsqrt(), but it should be more precise than our portable fallback. (When inlined, the portable fallback and the NEON code are almost identical in speed. The only difference is precision. Going through a function pointer is causing all this slowdown. This is a good example of a place where Skia really benefits from compile-time NEON.) BUG=skia:4117,skia:4114 No public API changes. TBR=reed@google.com Review URL: https://codereview.chromium.org/1264893002
* Lay groundwork for SkOpts.Gravatar mtklein2015-07-30
| | | | | | | | | | This doesn't really do anything yet. It's just the CPU detection code, skeleton new .cpp files, and a few little .gyp tweaks. BUG=skia:4117 Committed: https://skia.googlesource.com/skia/+/ce2c5055cee5d5d3c9fc84c1b3eeed4b4d84a827 Review URL: https://codereview.chromium.org/1255193002
* Revert of Optimize RGB16 blitH functions with NEON for ARM platform. ↵Gravatar mtklein2015-07-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | (patchset #2 id:20001 of https://codereview.chromium.org/1229673008/) Reason for revert: This doesn't draw correctly, e.g. our GM test named dashcubics. Good: https://gold.skia.org/img/images/0f7e8e226379afbad8a700e0a80fd8f1.png Bad: https://gold.skia.org/img/images/56ce15fc67436065a3db4b8ee31f13ae.png Original issue's description: > Optimize RGB16 blitH functions with NEON for ARM platform. > > Here are some performance resultsi on Nexus 9: > SkRGB16BlitterBlitH_neon: > +--------+-----------+ > |height | C/NEON | > +--------+-----------+ > |1 | 0.888531 | > +--------+-----------+ > |8 | 1.231800 | > +--------+-----------+ > |18 | 1.073327 | > +--------+-----------+ > |32 | 1.136991 | > +--------+-----------+ > |76 | 1.174638 | > +--------+-----------+ > |85 | 1.188551 | > +--------+-----------+ > |120 | 1.180261 | > +--------+-----------+ > |128 | 1.183726 | > +--------+-----------+ > |512 | 1.220806 | > +--------+-----------+ > > BUG=skia: > > Committed: https://skia.googlesource.com/skia/+/6c72d5740231f47c664a8e765a8df05cd124c88c TBR=djsollen@google.com,caryclark@google.com,reed@google.com,bero@linaro.com,yang.zhang@linaro.org NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia: Review URL: https://codereview.chromium.org/1268513003
* Optimize RGB16 blitH functions with NEON for ARM platform.Gravatar yang.zhang2015-07-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Here are some performance resultsi on Nexus 9: SkRGB16BlitterBlitH_neon: +--------+-----------+ |height | C/NEON | +--------+-----------+ |1 | 0.888531 | +--------+-----------+ |8 | 1.231800 | +--------+-----------+ |18 | 1.073327 | +--------+-----------+ |32 | 1.136991 | +--------+-----------+ |76 | 1.174638 | +--------+-----------+ |85 | 1.188551 | +--------+-----------+ |120 | 1.180261 | +--------+-----------+ |128 | 1.183726 | +--------+-----------+ |512 | 1.220806 | +--------+-----------+ BUG=skia: Review URL: https://codereview.chromium.org/1229673008
* Revert of Lay groundwork for SkOpts. (patchset #3 id:40001 of ↵Gravatar mtklein2015-07-27
| | | | | | | | | | | | | | | | | | | | | | | | https://codereview.chromium.org/1255193002/) Reason for revert: Chromium doesn't call SkGraphics::Init(). This setup won't work. Original issue's description: > Lay groundwork for SkOpts. > > This doesn't really do anything yet. It's just the CPU detection code, skeleton new .cpp files, and a few little .gyp tweaks. > > BUG=skia:4117 > > Committed: https://skia.googlesource.com/skia/+/ce2c5055cee5d5d3c9fc84c1b3eeed4b4d84a827 TBR=djsollen@google.com NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia:4117 Review URL: https://codereview.chromium.org/1261743002
* Remove sk_memcpy32Gravatar mtklein2015-07-27
| | | | | | | | | | | | | | | | | | | | | | | | It's only implemented on x86, where the exisiting benchmark says memcpy() is faster for all cases: Timer overhead: 24ns curr/maxrss loops min median mean max stddev samples config bench 10/10 MB 1 35.9µs 36.2µs 36.2µs 36.6µs 1% ▁▂▄▅▅▃█▄▄▅ nonrendering sk_memcpy32_100000 10/10 MB 13 2.27µs 2.28µs 2.28µs 2.29µs 0% █▄▃▅▃▁▃▅▁▄ nonrendering sk_memcpy32_10000 11/11 MB 677 91.6ns 95.9ns 94.5ns 99.4ns 3% ▅▅▅▅▅█▁▁▁▁ nonrendering sk_memcpy32_1000 11/11 MB 1171 20ns 20.9ns 21.3ns 23.4ns 6% ▁▁▇▃▃▃█▇▃▃ nonrendering sk_memcpy32_100 11/11 MB 1952 14ns 14ns 14.3ns 15.2ns 3% ▁▁██▁▁▁▁▁▁ nonrendering sk_memcpy32_10 11/11 MB 5 33.6µs 33.7µs 34.1µs 35.2µs 2% ▆▇█▁▁▁▁▁▁▁ nonrendering memcpy32_memcpy_100000 11/11 MB 18 2.12µs 2.22µs 2.24µs 2.39µs 5% ▂█▄▇█▄▇▁▁▁ nonrendering memcpy32_memcpy_10000 11/11 MB 1112 87.3ns 87.3ns 89.1ns 93.7ns 3% ▄██▄▁▁▁▁▁▁ nonrendering memcpy32_memcpy_1000 11/11 MB 2124 12.8ns 13.3ns 13.5ns 14.8ns 6% ▁▁▁█▃▃█▇▃▃ nonrendering memcpy32_memcpy_100 11/11 MB 3077 9ns 9.41ns 9.52ns 10.2ns 4% ▃█▁█▃▃▃▃▃▃ nonrendering memcpy32_memcpy_10 (Why? One fewer thing to port to SkOpts.) BUG=skia:4117 Review URL: https://codereview.chromium.org/1256763003
* Lay groundwork for SkOpts.Gravatar mtklein2015-07-27
| | | | | | | | This doesn't really do anything yet. It's just the CPU detection code, skeleton new .cpp files, and a few little .gyp tweaks. BUG=skia:4117 Review URL: https://codereview.chromium.org/1255193002
* NEON has a ternary instruction.Gravatar mtklein2015-07-27
| | | | | | | | Nothing seems to run any faster or slower, but it is terser. BUG=skia: Review URL: https://codereview.chromium.org/1255913004