aboutsummaryrefslogtreecommitdiffhomepage
path: root/src/core/SkOpts.cpp
Commit message (Collapse)AuthorAge
...
* AVX 2 SrcOver blits: color32, blitmask.Gravatar mtklein2016-01-25
| | | | | | | | | | | | | | | | | | | | As a follow up to the SSE 4.1 CL, this should look pretty familiar. I've made some organizational changes around how we load, store, pack, and unpack data that I think makes things clearer and more orthogonal, and it'll make it easier to try out a pmaddubsw lerp. I have backported these changes to the SSE 4.1 code, and I hope that I can actually get a lot of this code templated for sharing between the two later. Perf changes (relative to SSE 4.1): Xfermode_SrcOver: 1650 -> 1180 (0.71x) // large opaque blit Xfermode_SrcOver_aa: 1794 -> 1653 (0.92x) // large opaque + small transparent text_16_AA_{FF,BK,WT}: 1.72 -> 1.59 (0.92x) // small opaque blit text_16_AA_88: 1.83 -> 1.77 (0.97x) // small transparent blit This should be a big throughout win, and a small latency win. This should all be pixel-exact to the previous SSE 4.1 code. GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1532613002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;client.skia.compile:Build-Ubuntu-GCC-x86_64-Release-CMake-Trybot,Build-Mac10.9-Clang-x86_64-Release-CMake-Trybot Review URL: https://codereview.chromium.org/1532613002
* Revert of de-proc sk_float_rsqrt (patchset #3 id:40001 of ↵Gravatar mtklein2016-01-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | https://codereview.chromium.org/1616013003/ ) Reason for revert: This is somehow blocking the Google3 roll in ways neither Ben nor I understand. Precautionary revert... will try again Monday. Original issue's description: > de-proc sk_float_rsqrt > > This is the first of many little baby steps to have us stop runtime-detecting NEON. > > BUG=skia: > GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1616013003 > CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot > > Committed: https://skia.googlesource.com/skia/+/efcc125acd2d71eb077caf6db65fdd6b9eb1dc0d TBR=reed@google.com,mtklein@chromium.org # Skipping CQ checks because original CL landed less than 1 days ago. NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia: Review URL: https://codereview.chromium.org/1629503002
* Use NEON optimizations for RGB -> RGB(FF) or BGR(FF) in SkSwizzlerGravatar msarett2016-01-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Swizzle Bench Runtime Nexus 6P xxx_xxxa 0.32x xxx_swaprb_xxxa 0.31x Swizzle Bench Runtime Nexus 9 xxx_xxxa 1.11x xxx_swaprb_xxxa 1.14x (This is a slow down.) Swizzle Bench Runtime Nexus 5 xxx_xxxa 0.12x xxx_swaprb 0.12x RGB PNG Decode Runtime Nexus 6P 0.94x Nexus 9 0.98x I don't know how to explain the fact that the Swizzle Bench was slower on Nexus 9, but the decode times got faster. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1618003002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review URL: https://codereview.chromium.org/1618003002
* de-proc sk_float_rsqrtGravatar mtklein2016-01-22
| | | | | | | | | | This is the first of many little baby steps to have us stop runtime-detecting NEON. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1616013003 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review URL: https://codereview.chromium.org/1616013003
* Refactor swizzle names and types.Gravatar mtklein2016-01-22
| | | | | | | | | | | | | | | | | | | - Plant a flag to say "pretend all the inputs are RGBA". This is how libpng thinks. This is the opposite of what the implementation had been doing, so I've rearranged everything to reflect the new orientation. - Rewrite the names to be less mysterious looking. No more Xs. - Make the src type uniformly const void*, to allow for 888 (RGB) srcs. This should be performance and pixel neutral. (Please revert if it's not.) BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1626463002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review URL: https://codereview.chromium.org/1626463002
* Optimized premultiplying swizzles for NEONGravatar msarett2016-01-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Improves decode performance for RGBA encoded PNGs. Swizzle Time on Nexus 9 (with clang): SwapPremul 0.44x Premul 0.44x Decode Time On Nexus 9 (with clang): ZeroInit Decodes 0.85x Regular Decodes 0.86x Swizzle Time on Nexus 6P (with clang) SwapPremul 0.14x Premul 0.14x Decode Time On Nexus 6P (with clang): ZeroInit Decodes 0.93x Regular Decodes 0.95x Notes: ZeroInit means memory is zero initialized, and we do not write to memory for large sections of zero pixels (memory use opt for Android). A profile on Nexus 9 shows that the premultiplication step of PNG decoding is now ~5% of decode time (down from ~20%). BUG=skia:4767 GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1577703006 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review URL: https://codereview.chromium.org/1577703006
* Set up some hooks for premul/swizzzle opts.Gravatar mtklein2016-01-07
| | | | | | | | | | | | | You can call these as SkOpts::premul_xxxa, SkOpts::swaprb_xxxa, etc. For now, I just backed the function pointers with some (untested) portable code, which may autovectorize. We can override with optimized versions in Init_ssse3() (in SkOpts_ssse3.cpp), Init_neon() (SkOpts_neon.cpp), etc. BUG=skia:4767 GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1569013002 Review URL: https://codereview.chromium.org/1569013002
* Fix -Wunused-function warnings in chrome/ios builds.Gravatar thakis2016-01-03
| | | | | | | BUG=chromium:573779 GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1559783002 Review URL: https://codereview.chromium.org/1559783002
* De-spam SkOpts.cppGravatar mtklein2015-11-16
| | | | | | | | | - stop printing when we detect sse4.2 and avx2 - that TODO is pretty well done BUG=skia: Review URL: https://codereview.chromium.org/1445343002
* float xfermodes (burn, dodge, softlight) in Sk8f, possibly using AVX.Gravatar mtklein2015-11-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Xfermode_ColorDodge_aa 10.3ms -> 7.85ms 0.76x Xfermode_SoftLight_aa 13.8ms -> 10.2ms 0.74x Xfermode_ColorBurn_aa 10.7ms -> 7.82ms 0.73x Xfermode_SoftLight 33.6ms -> 23.2ms 0.69x Xfermode_ColorDodge 25ms -> 16.5ms 0.66x Xfermode_ColorBurn 26.1ms -> 16.6ms 0.63x Ought to be no pixel diffs: https://gold.skia.org/search2?issue=1432903002&unt=true&query=source_type%3Dgm&master=false Incidental stuff: I made the SkNx(T) constructors implicit to make writing math expressions simpler. This allows us to write expressions like Sk4f v; ... v = v*4; rather than Sk4f v; ... v = v * Sk4f(4); As written it only works when the constant is on the right-hand side, so expressions like `(Sk4f(1) - da)` have to stay for now. I plan on following up with a CL that lets those become `(1 - da)` too. BUG=skia:4117 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review URL: https://codereview.chromium.org/1432903002
* sse 4.2 detectionGravatar mtklein2015-11-09
| | | | | | | | While we're detecting instruction sets, let's fill in this hole too. BUG=skia: Review URL: https://codereview.chromium.org/1419553007
* avx and avx2 detectionGravatar mtklein2015-11-06
| | | | | | | | | | | | This doesn't do anything yet beyond print out a message in Debug mode, but it's a start. Those messages should match the -SSE4-, -AVX-, or -AVX2- in the Test-...-Debug-Trybots below. The Release ones are just running by accident. So far they look right to me. BUG=skia: Review URL: https://codereview.chromium.org/1428153003
* Android framework builds can't see cpu-features.hGravatar mtklein2015-10-28
| | | | | | | | Don't try to do NEON detection for Android framework builds. BUG=skia: Review URL: https://codereview.chromium.org/1423953004
* Port SkMatrix opts to SkOpts.Gravatar mtklein2015-09-10
| | | | | | | | | | | | | | | | | | | | | | No changes to the code, just moved around. This will have the effect of enabling vectorized code on ARMv7. Should be no effect on ARMv8 or x86, which would have been vectorized already. nanobench --match mappoints changes on Nexus 5 (ARMv7): _affine: 132 -> 95 _scale: 118 -> 47 _trans: 60 -> 37 A teaser: We should next look at the ABCD->BADC shuffle we've noted that we need in _affine. A quick hack showed doing that optimally is another ~35% speedup on x86. Got to figure out how to do it best on ARM though: that same quick hack was a 2x slowdown there. Good reason to resurrect that SkNx_shuffle() CL! (I believe the answers are vrev64q_f32(v) and _mm_shuffle_ps(v,v, _MM_SHUFFLE(2,3,0,1), but we should probably find out in another CL.) BUG=skia:4117 Review URL: https://codereview.chromium.org/1320673014
* Port SkBlitRow::Color32 to SkOpts.Gravatar mtklein2015-09-10
| | | | | | | | | | This was a pre-SkOpts attempt that we can bring under its wing now. This should be a perf no-op, deo volente. BUG=skia:4117 Review URL: https://codereview.chromium.org/1314863006
* Try again to put SkXfermode_opts in SK_OPTS_NSGravatar mtklein2015-08-18
| | | | | | | | Remember failed attempt https://codereview.chromium.org/1286093004/ ? I think this one is simpler and safer and even technically legal C++. BUG=skia:4117 Review URL: https://codereview.chromium.org/1296183004
* Update SkOpts namespaces.Gravatar mtklein2015-08-18
| | | | | | | | portable -> default, and everyone gets an sk_ prefix. BUG=skia:4117 Review URL: https://codereview.chromium.org/1299013003
* Patches on top of Radu's latest.Gravatar mtklein2015-08-18
| | | | | | | | patch from issue 1273033005 at patchset 120001 (http://crrev.com/1273033005#ps120001) BUG=skia: Review URL: https://codereview.chromium.org/1288323004
* Remove SkOpts_sse2.cpp.Gravatar mtklein2015-08-18
| | | | | | | | | | | | | | | | | | It's sort of pointless: all our clients that will have SSE2 at runtime have it unconditionally at compile time, so the functions in namespace portable will pick up the SSE2 code. The procs in SkOpts_sse2.o were just duplicate code. A couple of the procs we had in _sse2.cpp can benefit slightly when compiled with SSSE3. I've moved those to _ssse3.cpp. This should lead to small speedups on platforms like Linux and Windows that have a baseline of SSE2. Similarly, I've removed the call to Init_neon() when NEON is available globally... it's a no-op. Renaming namespace portable to something clearer is TBD. BUG=skia:4117 Review URL: https://codereview.chromium.org/1294213002
* Revert of Refactor to put SkXfermode_opts inside SK_OPTS_NS. (patchset #1 ↵Gravatar mtklein2015-08-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | id:1 of https://codereview.chromium.org/1286093004/ ) Reason for revert: Maybe causing test / gold problems? Original issue's description: > Refactor to put SkXfermode_opts inside SK_OPTS_NS. > > Without this refactor I was getting warnings previously about having code > inside namespace SK_OPTS_NS (e.g. namespace sse2, namespace neon) referring to > code inside an anonymous namespace (Sk4px, SkPMFloat, Sk4f, etc) [1]. > > That low-level code was in an anonymous namespace to allow multiple independent > copies of its methods to be instantiated without the linker getting confused / > offended about violating the One Definition Rule. This was only happening in > Debug mode where the methods were not being inlined. > > To fix this all, I've force-inlined the methods of the low-level code and > removed the anonymous namespace. > > BUG=skia:4117 > > > [1] Here is what those errors looked like: > > In file included from ../../../../src/core/SkOpts.cpp:18:0: > ../../../../src/opts/SkXfermode_opts.h:193:7: error: 'portable::Sk4pxXfermode' has a field 'portable::Sk4pxXfermode::fProc4' whose type uses the anonymous namespace [-Werror] > class Sk4pxXfermode : public SkProcCoeffXfermode { > ^ > ../../../../src/opts/SkXfermode_opts.h:193:7: error: 'portable::Sk4pxXfermode' has a field 'portable::Sk4pxXfermode::fAAProc4' whose type uses the anonymous namespace [-Werror] > ../../../../src/opts/SkXfermode_opts.h:235:7: error: 'portable::SkPMFloatXfermode' has a field 'portable::SkPMFloatXfermode::fProcF' whose type uses the anonymous namespace [-Werror] > class SkPMFloatXfermode : public SkProcCoeffXfermode { > ^ > cc1plus: all warnings being treated as errors > > Committed: https://skia.googlesource.com/skia/+/b07bee3121680b53b98b780ac08d14d374dd4c6f TBR=djsollen@google.com,mtklein@chromium.org NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia:4117 Review URL: https://codereview.chromium.org/1284333002
* Refactor to put SkXfermode_opts inside SK_OPTS_NS.Gravatar mtklein2015-08-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Without this refactor I was getting warnings previously about having code inside namespace SK_OPTS_NS (e.g. namespace sse2, namespace neon) referring to code inside an anonymous namespace (Sk4px, SkPMFloat, Sk4f, etc) [1]. That low-level code was in an anonymous namespace to allow multiple independent copies of its methods to be instantiated without the linker getting confused / offended about violating the One Definition Rule. This was only happening in Debug mode where the methods were not being inlined. To fix this all, I've force-inlined the methods of the low-level code and removed the anonymous namespace. BUG=skia:4117 [1] Here is what those errors looked like: In file included from ../../../../src/core/SkOpts.cpp:18:0: ../../../../src/opts/SkXfermode_opts.h:193:7: error: 'portable::Sk4pxXfermode' has a field 'portable::Sk4pxXfermode::fProc4' whose type uses the anonymous namespace [-Werror] class Sk4pxXfermode : public SkProcCoeffXfermode { ^ ../../../../src/opts/SkXfermode_opts.h:193:7: error: 'portable::Sk4pxXfermode' has a field 'portable::Sk4pxXfermode::fAAProc4' whose type uses the anonymous namespace [-Werror] ../../../../src/opts/SkXfermode_opts.h:235:7: error: 'portable::SkPMFloatXfermode' has a field 'portable::SkPMFloatXfermode::fProcF' whose type uses the anonymous namespace [-Werror] class SkPMFloatXfermode : public SkProcCoeffXfermode { ^ cc1plus: all warnings being treated as errors Review URL: https://codereview.chromium.org/1286093004
* Sk4px blit mask.Gravatar mtklein2015-08-10
| | | | | | | | | | | | | | | | Local SKP nanobenching ranges SSE between 1.05x and 0.87x, much more heavily weighted toward <1.0x ratios (speedups). I profiled the top five regressions (1.05x-1.01x) and they look like noise. Will follow up after broad bot results. NEON looks similar but less extreme than SSE changes, ranging between 1.02x and 0.95x, again mostly speedups in 0.99x-0.97x range. The old code trifurcated into black, opaque-but-not-black, and general versions as a function of the constant src color. I did not see a significant difference between general and opaque-but-not-black, and I don't think a black version would be faster using SIMD. So we have here just one version of the code, the general version. Somewhat fantastically, I see no pixel diffs on GMs or SKPs. I will be following up with more CLs for the other procs called by SkBlitMask. BUG=skia: Review URL: https://codereview.chromium.org/1278253003
* Port SkTextureCompression opts to SkOptsGravatar mtklein2015-08-06
| | | | | | | | | | | | | | | | Pretty vanilla translation. I cleaned up who calls whom a little. Used to be utils -> opts -> utils, now it's just utils -> opts. I may follow up with a pass over the NEON code for readability and to clean up dead code. This turns on NEON A8->R11EAC conversion for ARMv8. Unit tests which now hit the NEON code still pass. I can't find any related bench. BUG=skia:4117 Review URL: https://codereview.chromium.org/1273103002
* Port morphology to SkOpts.Gravatar mtklein2015-08-04
| | | | | | | | | | | | Nothing too fancy. Direction enums become enum classes so they don't get all confused. An alternative is to create one single Direction enum that both blur and morphology opts use. BUG=skia:4117 Review URL: https://codereview.chromium.org/1267343004
* Reorganize to keep similar code together.Gravatar Mike Klein2015-08-04
| | | | | | | | | This organizes memset16, memset32, and rsqrt the same way as the other code. No functional change. BUG=skia:4117 R=djsollen@google.com Review URL: https://codereview.chromium.org/1264423002 .
* Port SkBlurImage opts to SkOpts.Gravatar mtklein2015-08-04
| | | | | | | | | | | | +268 -535 lines I also rearranged the code a little bit to encapsulate itself better, mostly replacing static helper functions with lambdas. This also let me merge the SSE2 and SSE4.1 code paths. BUG=skia:4117 Review URL: https://codereview.chromium.org/1264103004
* disable SkOpts on x86 iOS (simulator)Gravatar mtklein2015-07-31
| | | | | | | | TBR= BUG=skia: Review URL: https://codereview.chromium.org/1266443006
* Port SkXfermode opts to SkOpts.hGravatar mtklein2015-07-31
| | | | | | | | | | | | | Renames Sk4pxXfermode.h to SkXfermode_opts.h, and refactors it a tiny bit internally. This moves xfermode optimization from being "compile-time everywhere but NEON" to simply "runtime everywhere". I don't anticipate any effect on perf or correctness. BUG=skia:4117 Review URL: https://codereview.chromium.org/1264543006
* Port SkUtils opts to SkOpts.Gravatar mtklein2015-07-31
| | | | | | | | | | | | | | | | With this new arrangement, the benefits of inlining sk_memset16/32 have changed. On x86, they're not significantly different, except for small N<=10 where the inlined code is significantly slower. On ARMv7 with NEON, our custom code is still significantly faster for N>10 (up to 2x faster). For small N<=10 inlining is still significantly faster. On ARMv7 without NEON, our custom code is still ridiculously faster (up to 10x) than inlining for N>10, though for small N<=10 inlining is still a little faster. We were not using the NEON memset16 and memset32 procs on ARMv8. At first blush, that seems to be an oversight, but if so it's an extremely lucky one. The ARMv8 code generation for our memset16/32 procs is total garbage, leaving those methods ~8x slower than just inlining the memset, using the compiler's autovectorization. So, no need to inline any more on x86, and still inline for N<=10 on ARMv7. Always inline for ARMv8. BUG=skia:4117 Review URL: https://codereview.chromium.org/1270573002
* Runtime CPU detection for rsqrt().Gravatar mtklein2015-07-30
| | | | | | | | | | | | | | | This enables the NEON sk_float_rsqrt() code for configurations that have NEON at run-time but not compile-time. These devices will see about a 2x (1.26 -> 2.33) slowdown in sk_float_rsqrt(), but it should be more precise than our portable fallback. (When inlined, the portable fallback and the NEON code are almost identical in speed. The only difference is precision. Going through a function pointer is causing all this slowdown. This is a good example of a place where Skia really benefits from compile-time NEON.) BUG=skia:4117,skia:4114 No public API changes. TBR=reed@google.com Review URL: https://codereview.chromium.org/1264893002
* Lay groundwork for SkOpts.Gravatar mtklein2015-07-30
| | | | | | | | | | This doesn't really do anything yet. It's just the CPU detection code, skeleton new .cpp files, and a few little .gyp tweaks. BUG=skia:4117 Committed: https://skia.googlesource.com/skia/+/ce2c5055cee5d5d3c9fc84c1b3eeed4b4d84a827 Review URL: https://codereview.chromium.org/1255193002
* Revert of Lay groundwork for SkOpts. (patchset #3 id:40001 of ↵Gravatar mtklein2015-07-27
| | | | | | | | | | | | | | | | | | | | | | | | https://codereview.chromium.org/1255193002/) Reason for revert: Chromium doesn't call SkGraphics::Init(). This setup won't work. Original issue's description: > Lay groundwork for SkOpts. > > This doesn't really do anything yet. It's just the CPU detection code, skeleton new .cpp files, and a few little .gyp tweaks. > > BUG=skia:4117 > > Committed: https://skia.googlesource.com/skia/+/ce2c5055cee5d5d3c9fc84c1b3eeed4b4d84a827 TBR=djsollen@google.com NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia:4117 Review URL: https://codereview.chromium.org/1261743002
* Lay groundwork for SkOpts.Gravatar mtklein2015-07-27
This doesn't really do anything yet. It's just the CPU detection code, skeleton new .cpp files, and a few little .gyp tweaks. BUG=skia:4117 Review URL: https://codereview.chromium.org/1255193002