path: root/src/core/SkOpts.h
Commit message (author, date)
...
* Remove static initializer for SkOpts::Init() (mtklein, 2016-04-19)
  Static initializers run in a confusing unspecified order, so it's best to have as few of them as possible.

  Most tools and clients I can find already call SkGraphics::Init() (or equivalently create an SkAutoGraphics), which calls SkOpts::Init():
  - Chrome
  - Chrome renderer
  - Android
  - DM
  - nanobench
  - SampleApp
  - VisualBench
  - the old debugger

  Seems like the only thing relying on this static initializer today is the new debugger, fixed here.

  TBR=reed@google.com
  BUG=skia:
  GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1903503002
  Review URL: https://codereview.chromium.org/1903503002
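The commit above trades a static initializer for an explicit call made by the client at startup. A minimal sketch of that pattern follows, assuming an idempotent init guarded by a function-local static. Every name in the `demo` namespace is hypothetical and stands in for the Skia functions named in the message; this is not Skia's actual implementation.

```cpp
namespace demo {  // hypothetical names, not Skia's real code

static int g_init_runs = 0;  // counts how many times the real work ran

// Stand-in for SkOpts::Init(): safe to call repeatedly, does its work once.
void OptsInit() {
    // A function-local static is initialized exactly once, on first call,
    // so there is no dependence on cross-file static initialization order.
    static const bool done = [] {
        ++g_init_runs;  // a real version would pick optimized procs here
        return true;
    }();
    (void)done;
}

// Stand-in for SkGraphics::Init(): the explicit entry point clients call.
void GraphicsInit() { OptsInit(); }

}  // namespace demo
```

Calling `demo::GraphicsInit()` several times leaves `demo::g_init_runs` at 1, which is the property that lets every tool call it unconditionally.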
* Graduate matrix map-point procs out of SkOpts. (mtklein, 2016-04-14)
  These are implemented generically with Sk4s and don't benefit from anything fancier than vanilla SSE/NEON. This means there's no need to hide this code away in another file or behind a function pointer... it's readable and we have compile-time support for all the instructions it needs.

  BUG=skia:
  GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1872193002
  CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
  Review URL: https://codereview.chromium.org/1872193002
* Port S32A_opaque blit row to SkOpts. (mtklein, 2016-03-23)
  This should be a pixel-for-pixel (i.e. bug-for-bug) port.

  BUG=skia:
  GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1820313002
  CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
  Review URL: https://codereview.chromium.org/1820313002
* try plain-old code for sk_memset16/32 now that NEON is compile-time (mtklein, 2016-02-17)
  Most of these implementations now just say "always inline". Let's see if we can get away with the simplicity of doing that all the time. These inlined implementations can autovectorize easily.

  BUG=skia:
  GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1639863002
  CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
  Review URL: https://codereview.chromium.org/1639863002
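To illustrate what "plain-old code" that can autovectorize might look like, here is a minimal sketch of an inlined 32-bit fill. The name `demo_memset32` is hypothetical; the actual sk_memset32 body is not shown in this log, so treat this as an assumption about its general shape.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for an always-inlined sk_memset32: a simple counted
// loop with no aliasing tricks, which compilers can typically vectorize.
static inline void demo_memset32(uint32_t* dst, uint32_t value, size_t n) {
    for (size_t i = 0; i < n; i++) {
        dst[i] = value;
    }
}
```

Because the loop is visible at the call site, the compiler can also specialize it for small constant `n`, which a call through a function pointer would prevent.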
* skeleton for float <-> half optimized procs (mtklein, 2016-02-09)
  Nothing fancy yet, just calls the serial code in a loop.

  I will try to follow this up with at least some of:
  - SSE2 version of serial code
  - NEON version of serial code
  - NEON version using vcvt.f32.f16/vcvt.f16.f32
  - F16C (between AVX and AVX2) version using vcvtph2ps/vcvtps2ph

  The last two are fastest but need runtime detection.

  BUG=skia:
  GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1686543003
  Review URL: https://codereview.chromium.org/1686543003
* Optimize CMYK->RGBA (BGRA) transform for jpeg decodes (msarett, 2016-02-08)
  Swizzle Bench Runtime
  Nexus 6P 0.14x
  Dell Venue 8 0.12x

  CMYK Jpeg Decode Runtime
  Nexus 6P 0.81x
  Dell Venue 8 0.85x

  BUG=skia:
  GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1676773003
  CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
  Review URL: https://codereview.chromium.org/1676773003
* NEON optimizations for GrayAlpha -> RGBA/BGRA Premul/Unpremul (msarett, 2016-02-03)
  PNG Decode Time Nexus 6P (for a test set of GrayAlpha encoded PNGs)
  Regular Unpremul 0.91x
  Zero Init Unpremul 0.92x
  Regular Premul 0.84x
  Zero Init Premul 0.86x

  BUG=skia:4767
  GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1663623002
  CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
  Review URL: https://codereview.chromium.org/1663623002
* NEON optimizations for gray -> RGBA (or BGRA) conversions (msarett, 2016-02-02)
  Swizzle Bench Runtime
  Nexus 6P 0.32x
  Nexus 9 0.89x

  PNG Decode Time (for test set of gray encoded PNGs)
  Nexus 6P 0.88x
  Nexus 9 0.91x

  BUG=skia:4767
  GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1656383002
  CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
  Review URL: https://codereview.chromium.org/1656383002
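For reference, the conversion being optimized can be written as a scalar loop. This is a sketch of the operation only, not the NEON code from the commit, and `demo_gray_to_rgba` is a hypothetical name. Since R, G and B all receive the same gray value, the RGBA and BGRA outputs are byte-for-byte identical.

```cpp
#include <cstddef>
#include <cstdint>

// Expand n 8-bit gray pixels into n opaque 4-byte pixels (bytes R,G,B,A).
// dst must have room for 4*n bytes.
void demo_gray_to_rgba(uint8_t* dst, const uint8_t* src, size_t n) {
    for (size_t i = 0; i < n; i++) {
        const uint8_t g = src[i];
        dst[4 * i + 0] = g;     // R
        dst[4 * i + 1] = g;     // G
        dst[4 * i + 2] = g;     // B
        dst[4 * i + 3] = 0xFF;  // A: fully opaque
    }
}
```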
* de-proc sk_float_rsqrt (mtklein, 2016-01-26)
  This is the first of many little baby steps to have us stop runtime-detecting NEON.

  BUG=skia:
  GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1616013003
  CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
  Committed: https://skia.googlesource.com/skia/+/efcc125acd2d71eb077caf6db65fdd6b9eb1dc0d
  Review URL: https://codereview.chromium.org/1616013003
* Revert of de-proc sk_float_rsqrt (patchset #3 id:40001 of https://codereview.chromium.org/1616013003/ ) (mtklein, 2016-01-22)
  Reason for revert:
  This is somehow blocking the Google3 roll in ways neither Ben nor I understand. Precautionary revert... will try again Monday.

  Original issue's description:
  > de-proc sk_float_rsqrt
  >
  > This is the first of many little baby steps to have us stop runtime-detecting NEON.
  >
  > BUG=skia:
  > GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1616013003
  > CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
  >
  > Committed: https://skia.googlesource.com/skia/+/efcc125acd2d71eb077caf6db65fdd6b9eb1dc0d

  TBR=reed@google.com,mtklein@chromium.org
  # Skipping CQ checks because original CL landed less than 1 days ago.
  NOPRESUBMIT=true
  NOTREECHECKS=true
  NOTRY=true
  BUG=skia:
  Review URL: https://codereview.chromium.org/1629503002
* Use NEON optimizations for RGB -> RGB(FF) or BGR(FF) in SkSwizzler (msarett, 2016-01-22)
  Swizzle Bench Runtime Nexus 6P
  xxx_xxxa 0.32x
  xxx_swaprb_xxxa 0.31x

  Swizzle Bench Runtime Nexus 9
  xxx_xxxa 1.11x
  xxx_swaprb_xxxa 1.14x
  (This is a slow down.)

  Swizzle Bench Runtime Nexus 5
  xxx_xxxa 0.12x
  xxx_swaprb 0.12x

  RGB PNG Decode Runtime
  Nexus 6P 0.94x
  Nexus 9 0.98x

  I don't know how to explain the fact that the Swizzle Bench was slower on Nexus 9, but the decode times got faster.

  BUG=skia:
  GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1618003002
  CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
  Review URL: https://codereview.chromium.org/1618003002
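The two conversions named in the title expand 3-byte RGB pixels to 4 bytes by appending an opaque alpha byte, optionally swapping R and B on the way. A scalar sketch of both follows; the function names are hypothetical and this is not the NEON code from the commit.

```cpp
#include <cstddef>
#include <cstdint>

// RGB -> RGB(FF): copy the three color bytes and append alpha = 0xFF.
void demo_rgb_to_rgb1(uint8_t* dst, const uint8_t* src, size_t n) {
    for (size_t i = 0; i < n; i++) {
        dst[4 * i + 0] = src[3 * i + 0];  // R
        dst[4 * i + 1] = src[3 * i + 1];  // G
        dst[4 * i + 2] = src[3 * i + 2];  // B
        dst[4 * i + 3] = 0xFF;            // opaque alpha
    }
}

// RGB -> BGR(FF): same as above, but with R and B swapped.
void demo_rgb_to_bgr1(uint8_t* dst, const uint8_t* src, size_t n) {
    for (size_t i = 0; i < n; i++) {
        dst[4 * i + 0] = src[3 * i + 2];  // B
        dst[4 * i + 1] = src[3 * i + 1];  // G
        dst[4 * i + 2] = src[3 * i + 0];  // R
        dst[4 * i + 3] = 0xFF;            // opaque alpha
    }
}
```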
* de-proc sk_float_rsqrt (mtklein, 2016-01-22)
  This is the first of many little baby steps to have us stop runtime-detecting NEON.

  BUG=skia:
  GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1616013003
  CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
  Review URL: https://codereview.chromium.org/1616013003
* Refactor swizzle names and types. (mtklein, 2016-01-22)
  - Plant a flag to say "pretend all the inputs are RGBA". This is how libpng thinks. This is the opposite of what the implementation had been doing, so I've rearranged everything to reflect the new orientation.
  - Rewrite the names to be less mysterious looking. No more Xs.
  - Make the src type uniformly const void*, to allow for 888 (RGB) srcs.

  This should be performance and pixel neutral. (Please revert if it's not.)

  BUG=skia:
  GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1626463002
  CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
  Review URL: https://codereview.chromium.org/1626463002
* Set up some hooks for premul/swizzzle opts. (mtklein, 2016-01-07)
  You can call these as SkOpts::premul_xxxa, SkOpts::swaprb_xxxa, etc.

  For now, I just backed the function pointers with some (untested) portable code, which may autovectorize. We can override with optimized versions in Init_ssse3() (in SkOpts_ssse3.cpp), Init_neon() (SkOpts_neon.cpp), etc.

  BUG=skia:4767
  GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1569013002
  Review URL: https://codereview.chromium.org/1569013002
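The hook mechanism described above can be sketched as a global function pointer that defaults to portable code and is reassigned at init time. Everything in the `demo_opts` namespace is hypothetical: the pointer borrows the name `swaprb_xxxa` from the message, but the signature, `cpu_has_fast_path()` and `swaprb_fast` are assumptions made for illustration, not Skia's declarations.

```cpp
#include <cstdint>

namespace demo_opts {  // hypothetical, modeled loosely on the description

using SwizzleFn = void (*)(uint32_t* dst, const uint32_t* src, int count);

// Portable fallback: swap bits 0-7 with bits 16-23 of each 32-bit pixel,
// leaving the other two channels in place.
static void swaprb_portable(uint32_t* dst, const uint32_t* src, int count) {
    for (int i = 0; i < count; i++) {
        const uint32_t p = src[i];
        dst[i] = (p & 0xFF00FF00u)
               | ((p & 0x00FF0000u) >> 16)
               | ((p & 0x000000FFu) << 16);
    }
}

// Placeholder for a SIMD version; here it simply forwards to portable code.
static void swaprb_fast(uint32_t* dst, const uint32_t* src, int count) {
    swaprb_portable(dst, src, count);
}

// The hook callers go through. It starts out pointing at portable code.
SwizzleFn swaprb_xxxa = swaprb_portable;

// Assumed CPU feature query; a real one would inspect the processor.
bool cpu_has_fast_path() { return false; }

// Init-time override, analogous in spirit to Init_ssse3() / Init_neon().
void Init() {
    if (cpu_has_fast_path()) {
        swaprb_xxxa = swaprb_fast;
    }
}

}  // namespace demo_opts
```

Callers always invoke `demo_opts::swaprb_xxxa(dst, src, n)` and never need to know which implementation was selected.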
* Make SkBlurImageFilter capable of cropping during blur (raster path) (senorblanco, 2015-11-02)
  SkBlurImageFilter can currently only process a source image which is larger than or equal to the destination rect. If the source image (or crop rect) is smaller, it is padded out to dest size with transparent black via the 6-param version of applyCropRect().

  Fixing this requires modifying all the flavours of RGBA box_blur() to accept a src crop rect.

  BUG=skia:4502, skia:4526
  CQ_EXTRA_TRYBOTS=client.skia.android:Test-Android-GCC-Nexus5-CPU-NEON-Arm7-Release-Trybot;client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
  Committed: https://skia.googlesource.com/skia/+/1b82ceb737c73327412f2e8a91748481e1aec9e4
  Review URL: https://codereview.chromium.org/1415653003
* Revert of Make SkBlurImageFilter capable of cropping during blur (patchset #16 id:400001 of https://codereview.chromium.org/1415653003/ ) (senorblanco, 2015-11-02)
  Reason for revert:
  ASAN failures (see https://codereview.chromium.org/1415653003/)

  Original issue's description:
  > Make SkBlurImageFilter capable of cropping during blur (raster path)
  >
  > SkBlurImageFilter can currently only process a source image
  > which is larger than or equal to the destination rect. If
  > the source image (or crop rect) is smaller, it is padded
  > out to dest size with transparent black via the 6-param
  > version of applyCropRect().
  >
  > Fixing this requires modifying all the flavours of RGBA
  > box_blur() to accept a src crop rect.
  >
  > BUG=skia:4502, skia:4526
  > CQ_EXTRA_TRYBOTS=client.skia.android:Test-Android-GCC-Nexus5-CPU-NEON-Arm7-Release-Trybot;client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
  >
  > Committed: https://skia.googlesource.com/skia/+/1b82ceb737c73327412f2e8a91748481e1aec9e4

  TBR=mtklein@google.com,reed@google.com
  NOPRESUBMIT=true
  NOTREECHECKS=true
  NOTRY=true
  BUG=skia:4502, skia:4526
  Review URL: https://codereview.chromium.org/1428053002
* Make SkBlurImageFilter capable of cropping during blur (raster path) (senorblanco, 2015-11-02)
  SkBlurImageFilter can currently only process a source image which is larger than or equal to the destination rect. If the source image (or crop rect) is smaller, it is padded out to dest size with transparent black via the 6-param version of applyCropRect().

  Fixing this requires modifying all the flavours of RGBA box_blur() to accept a src crop rect.

  BUG=skia:4502, skia:4526
  CQ_EXTRA_TRYBOTS=client.skia.android:Test-Android-GCC-Nexus5-CPU-NEON-Arm7-Release-Trybot;client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
  Review URL: https://codereview.chromium.org/1415653003
* Port SkMatrix opts to SkOpts. (mtklein, 2015-09-10)
  No changes to the code, just moved around.

  This will have the effect of enabling vectorized code on ARMv7. Should be no effect on ARMv8 or x86, which would have been vectorized already.

  nanobench --match mappoints changes on Nexus 5 (ARMv7):
  _affine: 132 -> 95
  _scale: 118 -> 47
  _trans: 60 -> 37

  A teaser: We should next look at the ABCD->BADC shuffle we've noted that we need in _affine. A quick hack showed doing that optimally is another ~35% speedup on x86. Got to figure out how to do it best on ARM though: that same quick hack was a 2x slowdown there. Good reason to resurrect that SkNx_shuffle() CL!

  (I believe the answers are vrev64q_f32(v) and _mm_shuffle_ps(v,v, _MM_SHUFFLE(2,3,0,1)), but we should probably find out in another CL.)

  BUG=skia:4117
  Review URL: https://codereview.chromium.org/1320673014
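The ABCD->BADC shuffle mentioned in the teaser swaps each adjacent pair of lanes. Below is a sketch that uses the two intrinsics the message guesses at when the target supports them, with a scalar fallback otherwise. The wrapper `demo_badc` is hypothetical and says nothing about which form performs best, which the message leaves open.

```cpp
#include <array>

#if defined(__SSE__) || defined(_M_X64)
    #include <xmmintrin.h>
#elif defined(__ARM_NEON)
    #include <arm_neon.h>
#endif

// Swap adjacent lane pairs: {A,B,C,D} -> {B,A,D,C}.
std::array<float, 4> demo_badc(const std::array<float, 4>& in) {
    std::array<float, 4> out;
#if defined(__SSE__) || defined(_M_X64)
    __m128 v = _mm_loadu_ps(in.data());
    // Result lanes 0..3 are taken from source lanes 1, 0, 3, 2.
    v = _mm_shuffle_ps(v, v, _MM_SHUFFLE(2, 3, 0, 1));
    _mm_storeu_ps(out.data(), v);
#elif defined(__ARM_NEON)
    float32x4_t v = vld1q_f32(in.data());
    // Reverse the two 32-bit elements inside each 64-bit half.
    v = vrev64q_f32(v);
    vst1q_f32(out.data(), v);
#else
    out = {in[1], in[0], in[3], in[2]};  // scalar fallback
#endif
    return out;
}
```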
* Port SkBlitRow::Color32 to SkOpts. (mtklein, 2015-09-10)
  This was a pre-SkOpts attempt that we can bring under its wing now.

  This should be a perf no-op, deo volente.

  BUG=skia:4117
  Review URL: https://codereview.chromium.org/1314863006
* Patches on top of Radu's latest. (mtklein, 2015-08-18)
  patch from issue 1273033005 at patchset 120001 (http://crrev.com/1273033005#ps120001)

  BUG=skia:
  Review URL: https://codereview.chromium.org/1288323004
* Sk4px blit mask. (mtklein, 2015-08-10)
  Local SKP nanobenching ranges SSE between 1.05x and 0.87x, much more heavily weighted toward <1.0x ratios (speedups). I profiled the top five regressions (1.05x-1.01x) and they look like noise. Will follow up after broad bot results.

  NEON looks similar but less extreme than SSE changes, ranging between 1.02x and 0.95x, again mostly speedups in 0.99x-0.97x range.

  The old code trifurcated into black, opaque-but-not-black, and general versions as a function of the constant src color. I did not see a significant difference between general and opaque-but-not-black, and I don't think a black version would be faster using SIMD. So we have here just one version of the code, the general version.

  Somewhat fantastically, I see no pixel diffs on GMs or SKPs.

  I will be following up with more CLs for the other procs called by SkBlitMask.

  BUG=skia:
  Review URL: https://codereview.chromium.org/1278253003
* Port SkTextureCompression opts to SkOpts (mtklein, 2015-08-06)
  Pretty vanilla translation. I cleaned up who calls whom a little. Used to be utils -> opts -> utils, now it's just utils -> opts. I may follow up with a pass over the NEON code for readability and to clean up dead code.

  This turns on NEON A8->R11EAC conversion for ARMv8. Unit tests which now hit the NEON code still pass. I can't find any related bench.

  BUG=skia:4117
  Review URL: https://codereview.chromium.org/1273103002
* Port morphology to SkOpts. (mtklein, 2015-08-04)
  Nothing too fancy.

  Direction enums become enum classes so they don't get all confused. An alternative is to create one single Direction enum that both blur and morphology opts use.

  BUG=skia:4117
  Review URL: https://codereview.chromium.org/1267343004
* Port SkBlurImage opts to SkOpts. (mtklein, 2015-08-04)
  +268 -535 lines

  I also rearranged the code a little bit to encapsulate itself better, mostly replacing static helper functions with lambdas. This also let me merge the SSE2 and SSE4.1 code paths.

  BUG=skia:4117
  Review URL: https://codereview.chromium.org/1264103004
* Move SkOpts.h back to src/core. (mtklein, 2015-07-31)
  The Chrome opts targets (sse2, ssse3, sse41, etc) don't have include/private on their include path. This should unblock the roll.

  TBR=reed@google.com
  BUG=skia:4117
  Review URL: https://codereview.chromium.org/1268853007
* Port SkUtils opts to SkOpts. (mtklein, 2015-07-31)
  With this new arrangement, the benefits of inlining sk_memset16/32 have changed.

  On x86, they're not significantly different, except for small N<=10 where the inlined code is significantly slower.

  On ARMv7 with NEON, our custom code is still significantly faster for N>10 (up to 2x faster). For small N<=10 inlining is still significantly faster.

  On ARMv7 without NEON, our custom code is still ridiculously faster (up to 10x) than inlining for N>10, though for small N<=10 inlining is still a little faster.

  We were not using the NEON memset16 and memset32 procs on ARMv8. At first blush, that seems to be an oversight, but if so it's an extremely lucky one. The ARMv8 code generation for our memset16/32 procs is total garbage, leaving those methods ~8x slower than just inlining the memset, using the compiler's autovectorization.

  So, no need to inline any more on x86, and still inline for N<=10 on ARMv7. Always inline for ARMv8.

  BUG=skia:4117
  Review URL: https://codereview.chromium.org/1270573002
* Runtime CPU detection for rsqrt(). (mtklein, 2015-07-30)
  This enables the NEON sk_float_rsqrt() code for configurations that have NEON at run-time but not compile-time. These devices will see about a 2x (1.26 -> 2.33) slowdown in sk_float_rsqrt(), but it should be more precise than our portable fallback.

  (When inlined, the portable fallback and the NEON code are almost identical in speed. The only difference is precision. Going through a function pointer is causing all this slowdown. This is a good example of a place where Skia really benefits from compile-time NEON.)

  BUG=skia:4117,skia:4114
  No public API changes.
  TBR=reed@google.com
  Review URL: https://codereview.chromium.org/1264893002
* Lay groundwork for SkOpts. (mtklein, 2015-07-30)
  This doesn't really do anything yet. It's just the CPU detection code, skeleton new .cpp files, and a few little .gyp tweaks.

  BUG=skia:4117
  Committed: https://skia.googlesource.com/skia/+/ce2c5055cee5d5d3c9fc84c1b3eeed4b4d84a827
  Review URL: https://codereview.chromium.org/1255193002
* Revert of Lay groundwork for SkOpts. (patchset #3 id:40001 of https://codereview.chromium.org/1255193002/) (mtklein, 2015-07-27)
  Reason for revert:
  Chromium doesn't call SkGraphics::Init(). This setup won't work.

  Original issue's description:
  > Lay groundwork for SkOpts.
  >
  > This doesn't really do anything yet. It's just the CPU detection code, skeleton new .cpp files, and a few little .gyp tweaks.
  >
  > BUG=skia:4117
  >
  > Committed: https://skia.googlesource.com/skia/+/ce2c5055cee5d5d3c9fc84c1b3eeed4b4d84a827

  TBR=djsollen@google.com
  NOPRESUBMIT=true
  NOTREECHECKS=true
  NOTRY=true
  BUG=skia:4117
  Review URL: https://codereview.chromium.org/1261743002
* Lay groundwork for SkOpts. (mtklein, 2015-07-27)
  This doesn't really do anything yet. It's just the CPU detection code, skeleton new .cpp files, and a few little .gyp tweaks.

  BUG=skia:4117
  Review URL: https://codereview.chromium.org/1255193002