aboutsummaryrefslogtreecommitdiffhomepage
path: root/src/opts/SkSwizzler_opts.h
Commit message (Collapse)AuthorAge
* make most of SkColorPriv.h privateGravatar Cary Clark2017-09-15
| | | | | | | | | | | | | created new file src/core/SkColorData.h for internal consumption. Note that many of the functions there are unused as well. Bug: skia: 6898 R: reed@google.com Change-Id: I25bfd5a9c21f53558c4ca65a77eb5d322d897c6d Reviewed-on: https://skia-review.googlesource.com/46848 Commit-Queue: Cary Clark <caryclark@google.com> Reviewed-by: Mike Reed <reed@google.com>
* make SkOpts functions inline, not staticGravatar Mike Klein2017-08-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | When Skia's built with an interestingly advanced instruction set baseline like SSSE3 or SSE4.1, we end up with two distinct copies of some SkOpts functions, one default in SkOpts.o and one specialization from SkOpts_{ssse3,sse41}.o. These functions are static, and so are technically unrelated, even though they're the same code compiled with the same instructions available. They're going to be identical. What we want here is to remove static but mark them as inline instead. In this case inline means "if the linker sees multiple copies of this, that's cool, just pick any one arbitrarily". That's just what we want. Now, when I disassemble a binary before and after this change, I do see the redundant routines removed. However, the file size change is minimal... I suspect that this must mean the linker has noticed that we had identical code and physically folded the two logically independent routines. I don't know how prevalent this optimization is, though, so it doesn't hurt to give it more of a "one copy please" hint with inline. There may also be a difference here between the binary size (~unchanged) and the in-memory layout of that binary? Change-Id: Id9c8f0ffc84aa1c9a066c22b623d34adab281857 Reviewed-on: https://skia-review.googlesource.com/37501 Commit-Queue: Mike Klein <mtklein@google.com> Reviewed-by: Ben Wagner <bungeman@google.com>
* Move immintrin/arm_neon includes to where they are used.Gravatar mtklein2016-06-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | On my Mac (so, immintrin), this improves compile time, both wall and cpu, by about 16%. To test I ran this on an SSD with files hot in their caches: $ env CC=/usr/bin/clang CXX=/usr/bin/clang++ ./gyp_skia && \ ninja -C out/Release -t clean && \ time ninja -C out/Release Before: 159 wall / 3367 cpu 159 wall / 3368 cpu After: 137 wall / 2860 cpu 136 wall / 2863 cpu I also tried further refining immintrin down to emmintrin / tmmintrin / smmintrin etc. That made no signficant difference, so I've kept immintrin for its simplicity. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2045633002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot TBR=reed@google.com No public API changes. Committed: https://skia.googlesource.com/skia/+/12dfaaa53c23f3d03050bde8f64136ac1f44164a Review-Url: https://codereview.chromium.org/2045633002
* Revert of Move immintrin/arm_neon includes to where they are used. (patchset ↵Gravatar mtklein2016-06-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | #2 id:20001 of https://codereview.chromium.org/2045633002/ ) Reason for revert: Appears to have broken the ARMv7 aspect of the Google3 roll in bizarre seemingly-unrelated ways. Original issue's description: > Move immintrin/arm_neon includes to where they are used. > > On my Mac (so, immintrin), this improves compile time, both wall and cpu, > by about 16%. To test I ran this on an SSD with files hot in their caches: > > $ env CC=/usr/bin/clang CXX=/usr/bin/clang++ ./gyp_skia && \ > ninja -C out/Release -t clean && \ > time ninja -C out/Release > > Before: 159 wall / 3367 cpu > 159 wall / 3368 cpu > > After: 137 wall / 2860 cpu > 136 wall / 2863 cpu > > I also tried further refining immintrin down to emmintrin / tmmintrin / smmintrin etc. > That made no signficant difference, so I've kept immintrin for its simplicity. > > BUG=skia: > GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2045633002 > CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot > > TBR=reed@google.com > No public API changes. > > Committed: https://skia.googlesource.com/skia/+/12dfaaa53c23f3d03050bde8f64136ac1f44164a TBR=herb@google.com,mtklein@chromium.org # Skipping CQ checks because original CL landed less than 1 days ago. NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia: Review-Url: https://codereview.chromium.org/2046213002
* Move immintrin/arm_neon includes to where they are used.Gravatar mtklein2016-06-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | On my Mac (so, immintrin), this improves compile time, both wall and cpu, by about 16%. To test I ran this on an SSD with files hot in their caches: $ env CC=/usr/bin/clang CXX=/usr/bin/clang++ ./gyp_skia && \ ninja -C out/Release -t clean && \ time ninja -C out/Release Before: 159 wall / 3367 cpu 159 wall / 3368 cpu After: 137 wall / 2860 cpu 136 wall / 2863 cpu I also tried further refining immintrin down to emmintrin / tmmintrin / smmintrin etc. That made no signficant difference, so I've kept immintrin for its simplicity. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2045633002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot TBR=reed@google.com No public API changes. Review-Url: https://codereview.chromium.org/2045633002
* Optimize CMYK->RGBA (BGRA) transform for jpeg decodesGravatar msarett2016-02-08
| | | | | | | | | | | | | | | | Swizzle Bench Runtime Nexus 6P 0.14x Dell Venue 8 0.12x CMYK Jpeg Decode Runtime Nexus 6P 0.81x Dell Venue 8 0.85x BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1676773003 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review URL: https://codereview.chromium.org/1676773003
* SSE optimizations for GrayAlpha -> RGBA/BGRA Premul/UnpremulGravatar msarett2016-02-03
| | | | | | | | | | | | | | | | | | Swizzle Runtime (Dell Venue 8) Unpremul 0.17x Premul 0.20x PNG Decode Runtime on GrayAlpha Encoded PNGs (Dell Venue 8) Unpremul Regular 0.91x Unpremul ZeroInit 0.92x Premul Regular 0.84x Premul ZeroInit 0.85x BUG=skia:4767 GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1666853002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review URL: https://codereview.chromium.org/1666853002
* NEON optimizations for GrayAlpha -> RGBA/BGRA Premul/UnpremulGravatar msarett2016-02-03
| | | | | | | | | | | | | | PNG Decode Time Nexus 6P (for a test set of GrayAlpha encoded PNGs) Regular Unpremul 0.91x Zero Init Unpremul 0.92x Regular Premul 0.84x Zero Init Premul 0.86x BUG=skia:4767 GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1663623002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review URL: https://codereview.chromium.org/1663623002
* SSSE3 optimizations for gray -> RGBA (or BGRA)Gravatar msarett2016-02-02
| | | | | | | | | | | | | | | | Swizzle Bench Runtime Dell Venue 8 0.16x HP z620 0.47x PNG Decode Time (for test set of gray encoded PNGs) Dell Venue 8 0.80x HP z620 0.96x BUG=skia:4767 GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1657393002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review URL: https://codereview.chromium.org/1657393002
* NEON optimizations for gray -> RGBA (or BGRA) conversionsGravatar msarett2016-02-02
| | | | | | | | | | | | | | | | Swizzle Bench Runtime Nexus 6P 0.32x Nexus 9 0.89x PNG Decode Time (for test set of gray encoded PNGs) Nexus 6P 0.88x Nexus 9 0.91x BUG=skia:4767 GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1656383002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review URL: https://codereview.chromium.org/1656383002
* SSSE3 opts for RGB -> RGB(FF) or BGR(FF)Gravatar msarett2016-01-22
| | | | | | | | | | | | | | | | Swizzle Bench Runtime z620 0.21x Dell Venue 8 0.26x RGB PNGs Decode Runtime z620 0.91x Dell Venus 8 0.96x BUG=skia:4767 GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1618603003 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review URL: https://codereview.chromium.org/1618603003
* Use NEON optimizations for RGB -> RGB(FF) or BGR(FF) in SkSwizzlerGravatar msarett2016-01-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Swizzle Bench Runtime Nexus 6P xxx_xxxa 0.32x xxx_swaprb_xxxa 0.31x Swizzle Bench Runtime Nexus 9 xxx_xxxa 1.11x xxx_swaprb_xxxa 1.14x (This is a slow down.) Swizzle Bench Runtime Nexus 5 xxx_xxxa 0.12x xxx_swaprb 0.12x RGB PNG Decode Runtime Nexus 6P 0.94x Nexus 9 0.98x I don't know how to explain the fact that the Swizzle Bench was slower on Nexus 9, but the decode times got faster. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1618003002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review URL: https://codereview.chromium.org/1618003002
* Refactor swizzle names and types.Gravatar mtklein2016-01-22
| | | | | | | | | | | | | | | | | | | - Plant a flag to say "pretend all the inputs are RGBA". This is how libpng thinks. This is the opposite of what the implementation had been doing, so I've rearranged everything to reflect the new orientation. - Rewrite the names to be less mysterious looking. No more Xs. - Make the src type uniformly const void*, to allow for 888 (RGB) srcs. This should be performance and pixel neutral. (Please revert if it's not.) BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1626463002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review URL: https://codereview.chromium.org/1626463002
* Add SSSE3 Optimizations for premul and swapGravatar msarett2016-01-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Improves deocde performance for RGBA pngs. Swizzler Time on z620 (clang): SwapPremul 0.24x Premul 0.24x Swap 0.37x Decode Time on z620 (clang): Premul ZeroInit Decodes 0.88x Unpremul ZeroInit Decodes 0.94x Premul Regular Decodes 0.91x Unpremul Regular Decodes 0.98x Swizzler Time in Dell Venue 8 (gcc): SwapPremul 0.14x Premul 0.14x Swap 0.08x Decode Time on Dell Venus 8 (gcc): Premul ZeroInit Decodes 0.79x Premul Regular Decodes 0.77x Note: ZeroInit means memory is zero initialized, and we do not write to memory for large sections of zero pixels (memory use opt for Android). BUG=skia:4767 GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1601883002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review URL: https://codereview.chromium.org/1601883002
* Add NEON swap opts and use opts in SkSwizzlerGravatar msarett2016-01-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | All RGBA, RGBX, BGRA, BGRX routines in SkSwizzler now use fast options (with the exception of conversions to 565). Swizzle Time for swap_rb 0.94x Nexus 9 0.81x Nexus 6P Unpremul Decode Time for RGBA PNGs*** ZeroInit 0.93x Nexus 9 Regular 0.94x Nexus 9 ZeroInit 0.97x Nexus 6P ZeroInit 0.95x Nexus 6P ***Two Notes: The improvements here are actually due to taking advantage of memcpy() (no need to swap, the bytes are already in the proper order). ZeroInit skips writing zeros to zero initialized memory. This is a memory use opt in Android. BMP decodes should also benefit from these improvements. I am relying on Gold to help test all possible cases. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1581933006 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review URL: https://codereview.chromium.org/1581933006
* Optimized premultiplying swizzles for NEONGravatar msarett2016-01-13
Improves decode performance for RGBA encoded PNGs. Swizzle Time on Nexus 9 (with clang): SwapPremul 0.44x Premul 0.44x Decode Time On Nexus 9 (with clang): ZeroInit Decodes 0.85x Regular Decodes 0.86x Swizzle Time on Nexus 6P (with clang) SwapPremul 0.14x Premul 0.14x Decode Time On Nexus 6P (with clang): ZeroInit Decodes 0.93x Regular Decodes 0.95x Notes: ZeroInit means memory is zero initialized, and we do not write to memory for large sections of zero pixels (memory use opt for Android). A profile on Nexus 9 shows that the premultiplication step of PNG decoding is now ~5% of decode time (down from ~20%). BUG=skia:4767 GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1577703006 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review URL: https://codereview.chromium.org/1577703006