path: root/src/opts/SkXfermode_opts_arm_neon.cpp
Commit message (Author, Age)
* add/fix copyrights (reed, 2015-06-26)
  BUG=skia:
  TBR=
  Review URL: https://codereview.chromium.org/1212393002
* Implement four more xfermodes with Sk4px. (mtklein, 2015-06-24)
  HardLight, Overlay, Darken, and Lighten are all ~2x faster with SSE, ~25% faster with NEON.

  This covers all previously-implemented NEON xfermodes. 3 previous SSE xfermodes remain. Those need division and sqrt, so I'm planning on using SkPMFloat for them. It'll help the readability and NEON speed if I move that into [0,1] space first.

  The main new concept here is c.thenElse(t,e), which behaves like (c ? t : e) except, of course, both t and e are evaluated. This allows us to emulate conditionals with vectors.

  This also removes the concept of SkNb. Instead of a standalone bool vector, each SkNi or SkNf will just return their own types for comparisons. Turns out to be a lot more manageable this way.

  BUG=skia:
  Committed: https://skia.googlesource.com/skia/+/b9d4163bebab0f5639f9c5928bb5fc15f472dddc
  CQ_EXTRA_TRYBOTS=client.skia.compile:Build-Ubuntu-GCC-Arm64-Debug-Android-Trybot
  Review URL: https://codereview.chromium.org/1196713004
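  The thenElse idea described above can be illustrated with a minimal scalar sketch. Everything here is an assumption for illustration: `Vec4` and `lane_min` are hypothetical names, not Skia's SkNi/SkNf or Sk4px types, and the loops stand in for what a real SIMD backend would do with a single bitwise-select instruction. The comparison returns the same vector type (all-ones or all-zero lanes), and thenElse picks per lane with masking rather than branching.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical 4-lane integer vector for illustration; not Skia's SkNi.
struct Vec4 {
    std::array<uint32_t, 4> lane;

    // Comparison returns the same type: all-ones where true, zero where false.
    Vec4 operator<(const Vec4& o) const {
        Vec4 m{};
        for (std::size_t i = 0; i < 4; i++) {
            m.lane[i] = lane[i] < o.lane[i] ? 0xFFFFFFFFu : 0u;
        }
        return m;
    }

    // Per-lane select, treating *this as a mask.
    // Both t and e have already been evaluated by the time we get here.
    Vec4 thenElse(const Vec4& t, const Vec4& e) const {
        Vec4 r{};
        for (std::size_t i = 0; i < 4; i++) {
            r.lane[i] = (lane[i] & t.lane[i]) | (~lane[i] & e.lane[i]);
        }
        return r;
    }
};

// Lane-wise minimum written as an emulated conditional: (a < b) ? a : b.
inline Vec4 lane_min(const Vec4& a, const Vec4& b) {
    return (a < b).thenElse(a, b);
}
```

  Because the mask has the same type as the data, no separate bool-vector type is needed, which matches the motivation given for dropping SkNb.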
* Revert of Implement four more xfermodes with Sk4px. (patchset #16 id:290001 of https://codereview.chromium.org/1196713004/) (mtklein, 2015-06-24)
  Reason for revert: 64-bit ARM build failures.

  Original issue's description:
  > Implement four more xfermodes with Sk4px.
  >
  > HardLight, Overlay, Darken, and Lighten are all
  > ~2x faster with SSE, ~25% faster with NEON.
  >
  > This covers all previously-implemented NEON xfermodes.
  > 3 previous SSE xfermodes remain. Those need division
  > and sqrt, so I'm planning on using SkPMFloat for them.
  > It'll help the readability and NEON speed if I move that
  > into [0,1] space first.
  >
  > The main new concept here is c.thenElse(t,e), which behaves like
  > (c ? t : e) except, of course, both t and e are evaluated. This allows
  > us to emulate conditionals with vectors.
  >
  > This also removes the concept of SkNb. Instead of a standalone bool
  > vector, each SkNi or SkNf will just return their own types for
  > comparisons. Turns out to be a lot more manageable this way.
  >
  > BUG=skia:
  >
  > Committed: https://skia.googlesource.com/skia/+/b9d4163bebab0f5639f9c5928bb5fc15f472dddc

  TBR=reed@google.com,mtklein@chromium.org
  NOPRESUBMIT=true
  NOTREECHECKS=true
  NOTRY=true
  BUG=skia:
  Review URL: https://codereview.chromium.org/1205703008
* Implement four more xfermodes with Sk4px. (mtklein, 2015-06-24)
  HardLight, Overlay, Darken, and Lighten are all ~2x faster with SSE, ~25% faster with NEON.

  This covers all previously-implemented NEON xfermodes. 3 previous SSE xfermodes remain. Those need division and sqrt, so I'm planning on using SkPMFloat for them. It'll help the readability and NEON speed if I move that into [0,1] space first.

  The main new concept here is c.thenElse(t,e), which behaves like (c ? t : e) except, of course, both t and e are evaluated. This allows us to emulate conditionals with vectors.

  This also removes the concept of SkNb. Instead of a standalone bool vector, each SkNi or SkNf will just return their own types for comparisons. Turns out to be a lot more manageable this way.

  BUG=skia:
  Review URL: https://codereview.chromium.org/1196713004
* Move Sk4px Xfermode code to a header so we can use it twice. (mtklein, 2015-05-22)
  - Once in SkXfermode as usual to pick up compile-time SSE and NEON
  - Once in SkXfermode_arm_neon to pick up run-time NEON

  This allows us to start cleaning up SkXfermode_arm_neon as we've done for SkXfermode_SSE2. I'm saving this catharsis for a day when I need it.

  The Sk4px xfermodes are generally faster than the existing NEON procs, so this should also have the side effect of a perf win there. This means our new Plus-AA code works for runtime NEON too.

  BUG=skia:3852
  Review URL: https://codereview.chromium.org/1150313003
* Avoid crash on some 64b ARM NEON platforms. (tomhudson, 2014-12-09)
  The compiler may choose to use x30 for a local loop counter; ensure it's saved.

  Patch from kevin.petit@arm.com, verified by benm@google.com.

  R=djsollen@google.com
  Review URL: https://codereview.chromium.org/786273003
* Remove SK_SUPPORT_LEGACY_DEEPFLATTENING. (mtklein, 2014-12-01)
  This was needed for pictures before v33, and we're now requiring v35+.

  Will follow up with the same for skia/ext/pixel_ref_utils_unittest.cc

  BUG=skia:
  Committed: https://skia.googlesource.com/skia/+/52c293547b973f7fb5de3c83f5062b07d759ab88
  Review URL: https://codereview.chromium.org/769953002
* Revert of Remove SK_SUPPORT_LEGACY_DEEPFLATTENING. (patchset #1 id:1 of https://codereview.chromium.org/769953002/) (mtklein, 2014-12-01)
  Reason for revert: Breaks canary builds. Will reland after the Chromium change lands.

  Original issue's description:
  > Remove SK_SUPPORT_LEGACY_DEEPFLATTENING.
  >
  > This was needed for pictures before v33, and we're now requiring v35+.
  >
  > Will follow up with the same for skia/ext/pixel_ref_utils_unittest.cc
  >
  > BUG=skia:
  >
  > Committed: https://skia.googlesource.com/skia/+/52c293547b973f7fb5de3c83f5062b07d759ab88

  TBR=reed@google.com,mtklein@chromium.org
  NOTREECHECKS=true
  NOTRY=true
  BUG=skia:
  Review URL: https://codereview.chromium.org/768183002
* Remove SK_SUPPORT_LEGACY_DEEPFLATTENING. (mtklein, 2014-12-01)
  This was needed for pictures before v33, and we're now requiring v35+.

  Will follow up with the same for skia/ext/pixel_ref_utils_unittest.cc

  BUG=skia:
  Review URL: https://codereview.chromium.org/769953002
* Simplify flattening to just write enough to call the factory/public-constructor for the class. (reed, 2014-08-21)
  We want to *not* rely on private constructors, and not rely on calling through the inheritance hierarchy for either flattening or unflattening (CreateProc).

  Refactoring pattern:
  1. Guard the existing constructor(readbuffer) with the legacy build-flag.
  2. If you are an instantiable subclass, implement CreateProc(readbuffer) to create a new instance from the buffer params (or return NULL).

  If you're a shader subclass:
  1. You must read/write the local matrix if your class accepts that in its factory/constructor, else ignore it.

  R=robertphillips@google.com, mtklein@google.com, senorblanco@google.com, senorblanco@chromium.org, sugoi@chromium.org
  Author: reed@google.com
  Review URL: https://codereview.chromium.org/395603002
* ARM Skia NEON patches - 41 - arm64: SkXfermode::xfer32 (kevin.petit, 2014-06-30)
  Currently the NEON code for Xfermodes performs well on arm64 targets except for dstout and dstin which are significantly slower than the C code. This patch fixes this and gives further improvements on other modes.

  Here are some perf results:

  +------------+------------+------------+
  | mode       | Cortex-A53 | Cortex-A57 |
  +------------+------------+------------+
  | multiply   | +24.58%    | +23.71%    |
  | exclusion  | +22.72%    | +22.05%    |
  | difference | +34.67%    | +36.82%    |
  | hardlight  | +17.07%    | +14.74%    |
  | lighten    | +38.21%    | +32.87%    |
  | darken     | +37.59%    | +32.99%    |
  | overlay    | +17.36%    | +16.88%    |
  | screen     | +52.56%    | +54.43%    |
  | modulate   | +62.85%    | +61.32%    |
  | plus       | +91.52%    | +117.41%   |
  | xor        | +42.86%    | +43.38%    |
  | dstatop    | +48.46%    | +48.99%    |
  | srcatop    | +50.50%    | +48.51%    |
  | dstout     | +67.83%    | +78.09%    |
  | srcout     | +69.02%    | +78.26%    |
  | dstin      | +70.92%    | +79.24%    |
  | srcin      | +68.90%    | +78.23%    |
  | dstover    | +73.80%    | +68.10%    |
  +------------+------------+------------+

  Signed-off-by: Kévin PETIT <kevin.petit@arm.com>
  BUG=skia
  R=mtklein@google.com, djsollen@google.com
  Author: kevin.petit@arm.com
  Review URL: https://codereview.chromium.org/350343002
* ARM Skia NEON patches - 35 - First AArch64 support (commit-bot@chromium.org, 2014-04-02)
  Aarch64 support

  This change contains the necessary modifications to have Skia build and run properly on an ARMv8 processor in aarch64 execution state.

  Here's a list of the changes:

  - add an arm64 target to the build system + SK_CPU_ARM64 flag
  - MatrixTest was failing when built in Release mode. Fused MAC instructions were generated which made some intermediate results more accurate. As the test relies on result comparison, the more precise results when compared to others led to a gap bigger than what was tolerated. As I don't know if some actual skia code relies on results being comparable, I've disabled fused MAC instruction with -ffp-contract=off for arm64.
  - Modify include/core/SkOnce.h to have barriers work.
  - SK_CPU_ARM64 implies SK_ARM_NEON_MODE_ALWAYS.
  - use existing Xfermode optimisations with modifications that can be removed in the future when toolchains are ready. Also save a few instructions in two Xfermodes (will apply to ARM too).
  - use existing SkBoxBlur and SkMorphology optimisations.
  - use existing SkBlitMask optimisations
  - use existing BitmapProcState and Convolution optimisations.

  Future changes will include:

  - Blitters (only partially merged upstream)
  - SkUtils (there's little value in sending asm optimisations without having them benchmarked on real hardware).

  Signed-off-by: Kevin PETIT <kevin.petit@arm.com>
  BUG=skia:
  Committed: http://code.google.com/p/skia/source/detail?r=13980
  R=djsollen@google.com, reed@google.com, mtklein@google.com, halcanary@google.com
  Author: kevin.petit@arm.com
  Review URL: https://codereview.chromium.org/143423004
  git-svn-id: http://skia.googlecode.com/svn/trunk@14025 2bbb7eff-a529-9590-31e7-b0007b416f81
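  The MatrixTest issue mentioned above comes from a fused multiply-add rounding once instead of twice. The following is a minimal sketch of that effect, not code from Skia: the function names are hypothetical, and `volatile` is used only as an assumed way to keep the compiler from contracting the separate multiply and add into a fused operation on its own.

```cpp
#include <cmath>

// a*b + c with the product rounded to float before the add (two roundings).
inline float mul_then_add(float a, float b, float c) {
    volatile float product = a * b;  // force the intermediate to be rounded and stored
    return product + c;
}

// a*b + c computed as if in infinite precision, then rounded once.
inline float fused_mul_add(float a, float b, float c) {
    return std::fma(a, b, c);
}
```

  For inputs where the exact product does not fit in a float, the two functions can return different values, which is the kind of gap a comparison-based test may not tolerate. Building with -ffp-contract=off, as the commit does for arm64, asks the compiler not to introduce the fused form implicitly.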
* Revert of ARM Skia NEON patches - 35 - First AArch64 support (https://codereview.chromium.org/143423004/) (commit-bot@chromium.org, 2014-03-28)
  Reason for revert: GYP's failing on most (all?) bots.

  Original issue's description:
  > ARM Skia NEON patches - 35 - First AArch64 support
  >
  > Aarch64 support
  >
  > This change contains the necessary modifications to have Skia build and
  > run properly on an ARMv8 processor in aarch64 execution state.
  >
  > Here's a list of the changes:
  >
  > - add an arm64 target to the build system + SK_CPU_ARM64 flag
  >
  > - MatrixTest was failing when built in Release mode. Fused MAC
  > instructions were generated which made some intermediate results
  > more accurate. As the test relies on result comparison, the more
  > precise results when compared to others led to a gap bigger than
  > what was tolerated. As I don't know if some actual skia code relies
  > on results being comparable, I've disabled fused MAC instruction
  > with -ffp-contract=off for arm64.
  >
  > - Modify include/core/SkOnce.h to have barriers work.
  >
  > - SK_CPU_ARM64 implies SK_ARM_NEON_MODE_ALWAYS.
  >
  > - use existing Xfermode optimisations with modifications that can be
  > removed in the future when toolchains are ready. Also save a few
  > instructions in two Xfermodes (will apply to ARM too).
  >
  > - use existing SkBoxBlur and SkMorphology optimisations.
  >
  > - use existing SkBlitMask optimisations
  >
  > - use existing BitmapProcState and Convolution optimisations.
  >
  > Future changes will include:
  >
  > - Blitters (only partially merged upstream)
  >
  > - SkUtils (there's little value in sending asm optimisations without
  > having them benchmarked on real hardware).
  >
  > Signed-off-by: Kevin PETIT <kevin.petit@arm.com>
  >
  > BUG=skia:
  >
  > Committed: http://code.google.com/p/skia/source/detail?r=13980

  R=djsollen@google.com, reed@google.com, halcanary@google.com, kevin.petit@arm.com
  TBR=djsollen@google.com, halcanary@google.com, kevin.petit@arm.com, reed@google.com
  NOTREECHECKS=true
  NOTRY=true
  BUG=skia:
  Author: mtklein@google.com
  Review URL: https://codereview.chromium.org/216113005
  git-svn-id: http://skia.googlecode.com/svn/trunk@13983 2bbb7eff-a529-9590-31e7-b0007b416f81
* ARM Skia NEON patches - 35 - First AArch64 support (commit-bot@chromium.org, 2014-03-28)
  Aarch64 support

  This change contains the necessary modifications to have Skia build and run properly on an ARMv8 processor in aarch64 execution state.

  Here's a list of the changes:

  - add an arm64 target to the build system + SK_CPU_ARM64 flag
  - MatrixTest was failing when built in Release mode. Fused MAC instructions were generated which made some intermediate results more accurate. As the test relies on result comparison, the more precise results when compared to others led to a gap bigger than what was tolerated. As I don't know if some actual skia code relies on results being comparable, I've disabled fused MAC instruction with -ffp-contract=off for arm64.
  - Modify include/core/SkOnce.h to have barriers work.
  - SK_CPU_ARM64 implies SK_ARM_NEON_MODE_ALWAYS.
  - use existing Xfermode optimisations with modifications that can be removed in the future when toolchains are ready. Also save a few instructions in two Xfermodes (will apply to ARM too).
  - use existing SkBoxBlur and SkMorphology optimisations.
  - use existing SkBlitMask optimisations
  - use existing BitmapProcState and Convolution optimisations.

  Future changes will include:

  - Blitters (only partially merged upstream)
  - SkUtils (there's little value in sending asm optimisations without having them benchmarked on real hardware).

  Signed-off-by: Kevin PETIT <kevin.petit@arm.com>
  BUG=skia:
  R=djsollen@google.com, reed@google.com, mtklein@google.com, halcanary@google.com
  Author: kevin.petit@arm.com
  Review URL: https://codereview.chromium.org/143423004
  git-svn-id: http://skia.googlecode.com/svn/trunk@13980 2bbb7eff-a529-9590-31e7-b0007b416f81
* Allow toString capability to be toggled independent of developer mode. (commit-bot@chromium.org, 2014-03-13)
  This change is motivated by the desire to see the text information in the debugger when not in developer mode. It is structured so users can disable it if the capability is not wanted.

  R=bsalomon@google.com
  Author: robertphillips@google.com
  Review URL: https://codereview.chromium.org/197763008
  git-svn-id: http://skia.googlecode.com/svn/trunk@13795 2bbb7eff-a529-9590-31e7-b0007b416f81
* Refactor read and write buffers. (commit-bot@chromium.org, 2014-01-30)
  Eliminates SkFlattenable{Read,Write}Buffer, promoting SkOrdered{Read,Write}Buffer a step each in the hierarchy.

  What used to be this:

    SkFlattenableWriteBuffer -> SkOrderedWriteBuffer
    SkFlattenableReadBuffer -> SkOrderedReadBuffer
    SkFlattenableReadBuffer -> SkValidatingReadBuffer

  is now

    SkWriteBuffer
    SkReadBuffer -> SkValidatingReadBuffer

  Benefits:
  - code is simpler, names are less wordy
  - the generic SkFlattenableFooBuffer code in SkPaint was incorrect; removed
  - write buffers are completely devirtualized, important for record speed

  This refactoring was mostly mechanical. You aren't going to find anything interesting in files with less than 10 lines changed.

  BUG=skia:
  R=reed@google.com, scroggo@google.com, djsollen@google.com, mtklein@google.com
  Author: mtklein@chromium.org
  Review URL: https://codereview.chromium.org/134163010
  git-svn-id: http://skia.googlecode.com/svn/trunk@13245 2bbb7eff-a529-9590-31e7-b0007b416f81
* ARM Skia NEON patches - 32 - Xfermode: 1-pixel NEON modeprocs (commit-bot@chromium.org, 2013-12-06)
  In some cases, it's easy to provide a NEON version of the 1-pixel modeprocs. Combined with https://codereview.chromium.org/23724013/ (merged) it allows up to 35% speed improvement on Xfermodes when aa is non-NULL.

  Signed-off-by: Kévin PETIT <kevin.petit@arm.com>
  BUG=
  R=djsollen@google.com, reed@google.com, mtklein@google.com, luisjoseromeroesclusa@hotmail.com
  Author: kevin.petit.arm@gmail.com
  Review URL: https://codereview.chromium.org/104883004
  git-svn-id: http://skia.googlecode.com/svn/trunk@12525 2bbb7eff-a529-9590-31e7-b0007b416f81
* ARM Skia NEON patches - 29 - Xfermode: SkFourByteInterp (commit-bot@chromium.org, 2013-12-02)
  Xfermode: add a NEON version of SkFourByteInterp

  Brings a modest performance improvement on its own in ProcXfermodes when aa is neither zero nor FF. Combined with 1-pixel NEON modeprocs, it brings up to 35% speed improvement on the aa case.

  Signed-off-by: Kévin PETIT <kevin.petit@arm.com>
  BUG=
  R=djsollen@google.com, mtklein@google.com, reed@google.com
  Author: kevin.petit.arm@gmail.com
  Review URL: https://codereview.chromium.org/23724013
  git-svn-id: http://skia.googlecode.com/svn/trunk@12448 2bbb7eff-a529-9590-31e7-b0007b416f81
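  To make the "aa is neither zero nor FF" case concrete, here is a scalar sketch of a four-byte interpolation between a source and a destination pixel by a coverage value. It is an illustration under stated assumptions, not Skia's implementation: `four_byte_interp` is a hypothetical name, and the mapping of aa from [0,255] to a [0,256] scale is chosen here so that 0 returns dst exactly and 0xFF returns src exactly; Skia's own scaling may differ.

```cpp
#include <cstdint>

// Hypothetical scalar sketch: blend each 8-bit channel of two packed
// 32-bit pixels by the coverage value aa.
inline uint32_t four_byte_interp(uint32_t src, uint32_t dst, uint8_t aa) {
    // Assumed mapping from [0,255] to [0,256] so both endpoints are exact.
    const uint32_t scale = static_cast<uint32_t>(aa) + (aa >> 7);
    uint32_t out = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        const uint32_t s = (src >> shift) & 0xFF;
        const uint32_t d = (dst >> shift) & 0xFF;
        // Weighted average of the two channels; the result fits in 8 bits.
        const uint32_t c = (s * scale + d * (256 - scale)) >> 8;
        out |= c << shift;
    }
    return out;
}
```

  The four channels are independent, which is why this operation maps naturally onto NEON lanes.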
* ARM Skia NEON patches - 31 - Xfermode: xfer16 (commit-bot@chromium.org, 2013-11-08)
  Xfermode: xfer16

  This adds support for 16bit Xfermodes. It also tunes the gcc test macros in xfer32() to add compatibility for gcc > 4.

  Signed-off-by: Kévin PETIT <kevin.petit@arm.com>
  BUG=
  R=djsollen@google.com, mtklein@google.com, reed@google.com
  Author: kevin.petit.arm@gmail.com
  Review URL: https://codereview.chromium.org/33063002
  git-svn-id: http://skia.googlecode.com/svn/trunk@12192 2bbb7eff-a529-9590-31e7-b0007b416f81
* ARM Skia NEON patches - 30 - Xfermode: NEON modeprocs (commit-bot@chromium.org, 2013-10-17)
  Xfermode: NEON implementation of SIMD procs

  This patch contains a NEON implementation for a number of Xfermodes. It provides a big speedup on Xfermode benchmarks (currently up to 3x with gcc4.7 but up to 10x when gcc produces optimal code for it).

  Signed-off-by: Kévin PETIT <kevin.petit@arm.com>
  BUG=
  Committed: http://code.google.com/p/skia/source/detail?r=11777
  Committed: http://code.google.com/p/skia/source/detail?r=11813
  R=djsollen@google.com, mtklein@google.com, reed@google.com, robertphillips@google.com
  Author: kevin.petit.arm@gmail.com
  Review URL: https://codereview.chromium.org/26627004
  git-svn-id: http://skia.googlecode.com/svn/trunk@11843 2bbb7eff-a529-9590-31e7-b0007b416f81
* Reverting r11813 (ARM Skia NEON patches - 30 - Xfermode: NEON modeprocs - https://codereview.chromium.org/26627004) due to Chromium compilation failures. (robertphillips@google.com, 2013-10-17)
  git-svn-id: http://skia.googlecode.com/svn/trunk@11833 2bbb7eff-a529-9590-31e7-b0007b416f81
* ARM Skia NEON patches - 30 - Xfermode: NEON modeprocs (commit-bot@chromium.org, 2013-10-16)
  Xfermode: NEON implementation of SIMD procs

  This patch contains a NEON implementation for a number of Xfermodes. It provides a big speedup on Xfermode benchmarks (currently up to 3x with gcc4.7 but up to 10x when gcc produces optimal code for it).

  Signed-off-by: Kévin PETIT <kevin.petit@arm.com>
  BUG=
  Committed: http://code.google.com/p/skia/source/detail?r=11777
  R=djsollen@google.com, mtklein@google.com, reed@google.com, robertphillips@google.com
  Author: kevin.petit.arm@gmail.com
  Review URL: https://codereview.chromium.org/26627004
  git-svn-id: http://skia.googlecode.com/svn/trunk@11813 2bbb7eff-a529-9590-31e7-b0007b416f81
* Reverting r11777 (ARM Skia NEON patches - 30 - Xfermode: NEON modeprocs) due to Chromium compilation failure (robertphillips@google.com, 2013-10-16)
  git-svn-id: http://skia.googlecode.com/svn/trunk@11799 2bbb7eff-a529-9590-31e7-b0007b416f81
* ARM Skia NEON patches - 30 - Xfermode: NEON modeprocs (commit-bot@chromium.org, 2013-10-15)
  Xfermode: NEON implementation of SIMD procs

  This patch contains a NEON implementation for a number of Xfermodes. It provides a big speedup on Xfermode benchmarks (currently up to 3x with gcc4.7 but up to 10x when gcc produces optimal code for it).

  Signed-off-by: Kévin PETIT <kevin.petit@arm.com>
  BUG=
  R=djsollen@google.com, mtklein@google.com, reed@google.com
  Author: kevin.petit.arm@gmail.com
  Review URL: https://codereview.chromium.org/26627004
  git-svn-id: http://skia.googlecode.com/svn/trunk@11777 2bbb7eff-a529-9590-31e7-b0007b416f81