path: root/src/opts/SkBlitRow_opts_arm_neon.cpp
Commit message (author, date)
* Fix skia bug 2845 (kui.zheng, 2015-02-17)
  Do not take the fast blur path (DoubleRowBoxBlur_NEON) when the kernel size is 1; otherwise uint16x8_t resultPixels overflows.
  BUG=skia:2845
  R=senorblanco@chromium.org
  Review URL: https://codereview.chromium.org/587543003
* skia: blend32_16_row for neon version (mlee, 2015-01-29)
  This adds NEON implementations of blend32_16_row for aarch32 and aarch64.
  For performance measurement: in nanobench, blend32_16_row is called by the following tests:
  - Xfermode_SrcOver
  - tablebench
  - rotated_rects_bw_alternating_transparent_and_opaque_srcover
  - rotated_rects_bw_changing_transparent_srcover
  - rotated_rects_bw_same_transparent_srcover
  - luma_colorfilter_large
  - luma_colorfilter_small
  - chart_bw
  The following two tests show a clear speedup; the others look about the same. Each was run twice.
  1) Xfermode_SrcOver
  <org>
  - D/skia ( 2000): 3M 57 17.3µs 17.4µs 17.4µs 17.7µs 1% █▃▂▃▂▂▂▁▃▂ 565 Xfermode_SrcOver
  - D/skia ( 1915): 3M 70 13.5µs 16.9µs 16.7µs 18.8µs 9% ▆█▄▅█▁▅▅▆▄ 565 Xfermode_SrcOver
  <new>
  - D/skia ( 2000): 3M 8 11.6µs 11.8µs 12.1µs 14.4µs 7% ▃█▁▁▂▁▁▁▂▂ 565 Xfermode_SrcOver
  - D/skia ( 2004): 3M 62 10.3µs 12.9µs 13µs 15.2µs 11% █▅▅▆▁▅▅▅▇▃ 565 Xfermode_SrcOver
  2) luma_colorfilter_large
  <org>
  - D/skia ( 2000): 159M 8 136µs 136µs 136µs 139µs 1% █▃▁▂▁▁▁▁▁▁ 565 luma_colorfilter_large
  - D/skia ( 1915): 158M 2 135µs 177µs 182µs 269µs 22% ▆▃█▁▁▃▃▃▃▃ 565 luma_colorfilter_large
  <new>
  - D/skia ( 2000): 157M 5 84.2µs 85.3µs 87.5µs 110µs 9% █▁▂▁▁▁▁▁▁▁ 565 luma_colorfilter_large
  - D/skia ( 2004): 159M 6 84.7µs 110µs 112µs 144µs 18% █▄▇▁▁▄▃▄▄▆ 565 luma_colorfilter_large
  Review URL: https://codereview.chromium.org/847363002
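The entry above does not show what blend32_16_row computes. As a rough illustration only, the scalar C++ below blends one constant color into a row of RGB565 pixels using a common 16-bit trick: each 565 pixel is spread into a 32-bit word so that all three channels can be scaled with a single multiply. The function name `blend_const_565_row`, the 5-bit scale derived from `(alpha + 1) >> 3`, and the plain lerp (rather than whatever rounding Skia's NEON code uses) are assumptions of this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Spread an RGB565 pixel so each channel has headroom for a multiply:
// red stays in bits 11-15, blue in bits 0-4, green moves up to bits 21-26.
static inline uint32_t expand565(uint16_t c) {
    return (c & 0xF81Fu) | (uint32_t(c & 0x07E0u) << 16);
}

// Inverse of expand565, applied after the products have been shifted back down.
static inline uint16_t compact565(uint32_t e) {
    return uint16_t((e & 0xF81Fu) | ((e >> 16) & 0x07E0u));
}

// Hypothetical helper: lerp a constant 565 color toward each dst pixel.
// alpha is 0..255 and is reduced to a 5-bit scale in 0..32.
void blend_const_565_row(uint16_t* dst, uint16_t src565, unsigned alpha, int count) {
    assert(alpha <= 255 && count >= 0);
    unsigned scale = (alpha + 1) >> 3;               // 0..32
    uint32_t src_term = expand565(src565) * scale;   // computed once per row
    unsigned inv = 32 - scale;
    for (int i = 0; i < count; ++i) {
        // Per-channel sums stay below 32 * max, so fields never spill into each other.
        uint32_t mixed = src_term + expand565(dst[i]) * inv;
        dst[i] = compact565(mixed >> 5);
    }
}
```

Because the source term is computed once per row, the inner loop is one expand, one multiply-add, and one compact per pixel, which is the kind of regular per-pixel work that vectorises well.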
* rename blitrow::proc and add (uncalled) hook for colorproc16 (reed, 2015-01-13)
  BUG=skia:3302
  Review URL: https://codereview.chromium.org/847443003
* Disable Neon optimization of bad S32A/D565 blend. (mtklein, 2014-08-22)
  BUG=skia:2797
  Committed: https://skia.googlesource.com/skia/+/84cab93186fbe3e87d931fea73cb31b70ff5017b
  R=mtklein@google.com
  Author: mtklein@chromium.org
  Review URL: https://codereview.chromium.org/497823002
* Disable Neon optimization of bad S32A/D565 blend. (mtklein, 2014-08-22)
  BUG=skia:2797
  R=mtklein@google.com
  Author: mtklein@chromium.org
  Review URL: https://codereview.chromium.org/497823002
* disable neon proc that is triggering asserts (reed, 2014-08-22)
  BUG=skia:2845
  R=mtklein@google.com
  Author: reed@google.com
  Review URL: https://codereview.chromium.org/498733002
* Let skia build with clang's integrated assembler. (thakis, 2014-08-11)
  1. vuzpq is a gcc instruction. Replace it with the equivalent vuzp (see http://llvm.org/PR20423).
  2. .func / .endfunc only have an effect with -gstabs, which we don't use. As it's unused and clang doesn't support it, remove .func / .endfunc (also see http://llvm.org/20424).
  BUG=chromium:124610
  R=mtklein@google.com
  Author: thakis@chromium.org
  Review URL: https://codereview.chromium.org/461693004
* Fix S32A_D565_Opaque for RGBA on arm64 (kevin.petit, 2014-08-09)
  Signed-off-by: Kévin PETIT <kevin.petit@arm.com>
  BUG=skia:2813
  R=halcanary@google.com, djsollen@google.com, mtklein@google.com
  Author: kevin.petit@arm.com
  Review URL: https://codereview.chromium.org/458453002
* Disable suspect NEON function for 64-bit Android (djsollen, 2014-08-07)
  R=halcanary@google.com, mtklein@google.com, kevin.petit@arm.com
  Author: djsollen@google.com
  Review URL: https://codereview.chromium.org/451633006
* ARM Skia NEON patches - 40 - arm64: S32A_D565_Opaque (kevin.petit, 2014-06-26)
  Here are some perf results:
  +-------+------------+------------+
  | count | Cortex-A53 | Cortex-A57 |
  +-------+------------+------------+
  | 1     | -2.54%     | -5.39%     |
  +-------+------------+------------+
  | 2     | -0.66%     | -2.08%     |
  +-------+------------+------------+
  | 4     | -11.13%    | 0.00%      |
  +-------+------------+------------+
  | 8     | -5.79%     | -1.30%     |
  +-------+------------+------------+
  | 16    | 71.60%     | 93.27%     |
  +-------+------------+------------+
  | 64    | 30.99%     | 57.35%     |
  +-------+------------+------------+
  | 256   | 25.41%     | 52.59%     |
  +-------+------------+------------+
  | 1024  | 25.56%     | 53.76%     |
  +-------+------------+------------+
  Signed-off-by: Kévin PETIT <kevin.petit@arm.com>
  BUG=skia:
  R=mtklein@google.com, djsollen@google.com
  Author: kevin.petit@arm.com
  Review URL: https://codereview.chromium.org/346843003
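S32A_D565_Opaque composites premultiplied 32-bit pixels (with per-pixel alpha) over an opaque RGB565 destination. The scalar sketch below shows that SrcOver step so the NEON numbers have a reference point. It assumes an `0xAARRGGBB` layout (the real byte order is configurable in Skia), widens 565 channels to 8 bits by bit replication, and rounds `dst * (255 - a) / 255`; Skia's own rounding may differ, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// round(x / 255) for 0 <= x <= 255 * 255.
static inline unsigned div255_round(unsigned x) {
    x += 128;
    return (x + (x >> 8)) >> 8;
}

// SrcOver of one premultiplied 0xAARRGGBB pixel onto one RGB565 pixel.
static inline uint16_t srcover_8888_to_565(uint32_t src, uint16_t dst) {
    unsigned a = src >> 24;
    unsigned sr = (src >> 16) & 0xFF, sg = (src >> 8) & 0xFF, sb = src & 0xFF;
    // Widen the 565 channels to 8 bits by replicating their top bits.
    unsigned dr5 = dst >> 11, dg6 = (dst >> 5) & 0x3F, db5 = dst & 0x1F;
    unsigned dr = (dr5 << 3) | (dr5 >> 2);
    unsigned dg = (dg6 << 2) | (dg6 >> 4);
    unsigned db = (db5 << 3) | (db5 >> 2);
    unsigned inv = 255 - a;
    // Premultiplied input (s <= a) keeps each sum <= 255.
    unsigned r = sr + div255_round(dr * inv);
    unsigned g = sg + div255_round(dg * inv);
    unsigned b = sb + div255_round(db * inv);
    return uint16_t(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

// Row form, matching the (dst, src, count) shape of a blit-row proc.
void srcover_8888_to_565_row(uint16_t* dst, const uint32_t* src, int count) {
    assert(count >= 0);
    for (int i = 0; i < count; ++i) {
        dst[i] = srcover_8888_to_565(src[i], dst[i]);
    }
}
```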
* ARM Skia NEON patches - 39 - arm64 565 blitters (kevin.petit, 2014-06-06)
  This enables all 565 blitters except S32A_D565_Opaque. Here are some performance results:
  S32_D565_Opaque:
  ================
  +-------+------------+------------+
  | count | Cortex-A53 | Cortex-A57 |
  +-------+------------+------------+
  | 1     | -18.37%    | -13.04%    |
  +-------+------------+------------+
  | 2     | -9.90%     | -13.78%    |
  +-------+------------+------------+
  | 4     | -8.28%     | -6.77%     |
  +-------+------------+------------+
  | 8     | 157.63%    | 78.15%     |
  +-------+------------+------------+
  | 16    | 72.67%     | 44.81%     |
  +-------+------------+------------+
  | 64    | 76.78%     | 40.89%     |
  +-------+------------+------------+
  | 256   | 73.85%     | 36.05%     |
  +-------+------------+------------+
  | 1024  | 75.73%     | 36.70%     |
  +-------+------------+------------+
  S32_D565_Blend:
  ===============
  +-------+------------+------------+
  | count | Cortex-A53 | Cortex-A57 |
  +-------+------------+------------+
  | 1     | -9.99%     | -13.79%    |
  +-------+------------+------------+
  | 2     | -9.17%     | -6.74%     |
  +-------+------------+------------+
  | 4     | -6.73%     | -4.42%     |
  +-------+------------+------------+
  | 8     | 163.31%    | 112.82%    |
  +-------+------------+------------+
  | 16    | 55.21%     | 44.68%     |
  +-------+------------+------------+
  | 64    | 54.09%     | 41.99%     |
  +-------+------------+------------+
  | 256   | 52.63%     | 40.64%     |
  +-------+------------+------------+
  | 1024  | 52.46%     | 40.45%     |
  +-------+------------+------------+
  S32A_D565_Blend:
  ================
  +-------+------------+------------+
  | count | Cortex-A53 | Cortex-A57 |
  +-------+------------+------------+
  | 1     | -5.88%     | -6.06%     |
  +-------+------------+------------+
  | 2     | -4.74%     | -0.01%     |
  +-------+------------+------------+
  | 4     | -5.42%     | -3.03%     |
  +-------+------------+------------+
  | 8     | 78.78%     | 77.96%     |
  +-------+------------+------------+
  | 16    | 98.19%     | 79.61%     |
  +-------+------------+------------+
  | 64    | 111.56%    | 72.60%     |
  +-------+------------+------------+
  | 256   | 113.80%    | 69.96%     |
  +-------+------------+------------+
  | 1024  | 114.42%    | 70.85%     |
  +-------+------------+------------+
  S32_D565_Opaque_Dither:
  =======================
  +-------+------------+------------+
  | count | Cortex-A53 | Cortex-A57 |
  +-------+------------+------------+
  | 1     | -4.18%     | -0.93%     |
  +-------+------------+------------+
  | 2     | -2.43%     | -2.04%     |
  +-------+------------+------------+
  | 4     | -1.09%     | -1.23%     |
  +-------+------------+------------+
  | 8     | 184.89%    | 136.53%    |
  +-------+------------+------------+
  | 16    | 128.64%    | 89.11%     |
  +-------+------------+------------+
  | 64    | 132.68%    | 100.98%    |
  +-------+------------+------------+
  | 256   | 157.02%    | 100.86%    |
  +-------+------------+------------+
  | 1024  | 163.85%    | 103.62%    |
  +-------+------------+------------+
  S32_D565_Blend_Dither:
  ======================
  +-------+------------+------------+
  | count | Cortex-A53 | Cortex-A57 |
  +-------+------------+------------+
  | 1     | -4.87%     | 0.01%      |
  +-------+------------+------------+
  | 2     | -2.71%     | 2.97%      |
  +-------+------------+------------+
  | 4     | -2.20%     | 0.28%      |
  +-------+------------+------------+
  | 8     | 149.76%    | 146.80%    |
  +-------+------------+------------+
  | 16    | 85.69%     | 95.77%     |
  +-------+------------+------------+
  | 64    | 88.81%     | 101.39%    |
  +-------+------------+------------+
  | 256   | 97.32%     | 107.22%    |
  +-------+------------+------------+
  | 1024  | 98.08%     | 115.71%    |
  +-------+------------+------------+
  S32A_D565_Opaque_Dither:
  ========================
  +-------+------------+------------+
  | count | Cortex-A53 | Cortex-A57 |
  +-------+------------+------------+
  | 1     | -1.86%     | 0.02%      |
  +-------+------------+------------+
  | 2     | -0.58%     | -1.52%     |
  +-------+------------+------------+
  | 4     | -0.75%     | 1.16%      |
  +-------+------------+------------+
  | 8     | 240.74%    | 155.16%    |
  +-------+------------+------------+
  | 16    | 181.97%    | 132.15%    |
  +-------+------------+------------+
  | 64    | 203.11%    | 136.48%    |
  +-------+------------+------------+
  | 256   | 223.45%    | 133.05%    |
  +-------+------------+------------+
  | 1024  | 225.96%    | 134.05%    |
  +-------+------------+------------+
  Signed-off-by: Kévin PETIT <kevin.petit@arm.com>
  BUG=skia:
  R=djsollen@google.com, mtklein@google.com
  Author: kevin.petit@arm.com
  Review URL: https://codereview.chromium.org/317193003
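S32_D565_Opaque and the `_Dither` variants measured above all end by truncating 8-bit channels to 5/6/5 bits. Below is a scalar sketch of that packing, plus a dithered version that adds a small per-pixel offset before truncating. The `0xAARRGGBB` layout, the 4x4 Bayer-style matrix `kDither4x4`, and the `c + d - (c >> 5)` overflow-avoiding formulation are illustrative assumptions, not a transcription of Skia's tables or macros.

```cpp
#include <cassert>
#include <cstdint>

// Opaque pack: alpha is ignored, each channel keeps its top 5/6/5 bits.
static inline uint16_t pack_8888_to_565(uint32_t c) {
    unsigned r = (c >> 16) & 0xFF, g = (c >> 8) & 0xFF, b = c & 0xFF;
    return uint16_t(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

// Dithered pack: d is a dither value in 0..7. Subtracting c >> 5 (c >> 6 for
// green, which only loses 2 bits) keeps each sum <= 255, so no clamp is needed.
static inline uint16_t pack_8888_to_565_dither(uint32_t c, unsigned d) {
    assert(d <= 7);
    unsigned r = (c >> 16) & 0xFF, g = (c >> 8) & 0xFF, b = c & 0xFF;
    r = r + d - (r >> 5);
    g = g + (d >> 1) - (g >> 6);
    b = b + d - (b >> 5);
    return uint16_t(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

// Hypothetical ordered-dither matrix (a 4x4 Bayer matrix halved to 0..7).
static const uint8_t kDither4x4[4][4] = {
    {0, 4, 1, 5}, {6, 2, 7, 3}, {1, 5, 0, 4}, {7, 3, 6, 2},
};

// Row form: (x, y) is the device position of dst[0], used to index the matrix.
void dither_row_565(uint16_t* dst, const uint32_t* src, int count, int x, int y) {
    assert(count >= 0 && x >= 0 && y >= 0);
    const uint8_t* row = kDither4x4[y & 3];
    for (int i = 0; i < count; ++i) {
        dst[i] = pack_8888_to_565_dither(src[i], row[(x + i) & 3]);
    }
}
```

The dither offset only nudges values that sit just below a 565 step, so white and black stay exact while mid-tones alternate between neighbouring 565 levels.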
* SK_CPU_ARM --> SK_CPU_ARM32 (mtklein, 2014-06-03)
  That's what it means; the current name keeps confusing us.
  BUG=skia:
  R=djsollen@google.com, mtklein@google.com, reed@google.com
  Author: mtklein@chromium.org
  Review URL: https://codereview.chromium.org/314643004
* ARM Skia NEON patches - 38 - arm64 8888 blitters (kevin.petit, 2014-06-03)
  Enable NEON on arm64 for most 8888 blitters
  This patch enables NEON optimisation for the Color32, S32_Blend and S32A_Opaque blitters on arm64. Here are the perf improvements vs the existing code:
  Color32:
  ========
  +-------+------------+------------+
  | count | Cortex-A53 | Cortex-A57 |
  +-------+------------+------------+
  | 1     | -2.39%     | 23.78%     |
  +-------+------------+------------+
  | 2     | -5.46%     | 8.88%      |
  +-------+------------+------------+
  | 4     | -4.74%     | 4.89%      |
  +-------+------------+------------+
  | 8     | 67.74%     | 107.12%    |
  +-------+------------+------------+
  | 16    | 40.03%     | 101.20%    |
  +-------+------------+------------+
  | 64    | 11.09%     | 98.40%     |
  +-------+------------+------------+
  | 256   | -2.20%     | 74.81%     |
  +-------+------------+------------+
  | 1024  | -4.28%     | 78.90%     |
  +-------+------------+------------+
  S32_Blend:
  ==========
  +-------+------------+------------+
  | count | Cortex-A53 | Cortex-A57 |
  +-------+------------+------------+
  | 1     | 7.84%      | -6.75%     |
  +-------+------------+------------+
  | 2     | 28.95%     | 39.77%     |
  +-------+------------+------------+
  | 4     | 5.80%      | 8.26%      |
  +-------+------------+------------+
  | 8     | 1.35%      | 33.80%     |
  +-------+------------+------------+
  | 16    | -2.13%     | 41.13%     |
  +-------+------------+------------+
  | 64    | -4.91%     | 42.84%     |
  +-------+------------+------------+
  | 256   | -6.53%     | 48.72%     |
  +-------+------------+------------+
  | 1024  | -6.65%     | 46.66%     |
  +-------+------------+------------+
  S32A_Opaque:
  ============
  +-------+------------+------------+
  | count | Cortex-A53 | Cortex-A57 |
  +-------+------------+------------+
  | 1     | -7.51%     | -19.06%    |
  +-------+------------+------------+
  | 2     | -5.02%     | -27.70%    |
  +-------+------------+------------+
  | 4     | 15.38%     | -21.66%    |
  +-------+------------+------------+
  | 8     | -0.98%     | 1.05%      |
  +-------+------------+------------+
  | 16    | -7.35%     | 3.34%      |
  +-------+------------+------------+
  | 64    | 50.53%     | 94.63%     |
  +-------+------------+------------+
  | 256   | 71.17%     | 164.10%    |
  +-------+------------+------------+
  | 1024  | 79.58%     | 197.60%    |
  +-------+------------+------------+
  Signed-off-by: Kevin PETIT <kevin.petit@arm.com>
  BUG=skia:
  R=djsollen@google.com, mtklein@google.com
  Author: kevin.petit@arm.com
  Review URL: https://codereview.chromium.org/302283003
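For reference, the scalar operations behind two of the 8888 blitters measured here can be sketched as below: S32A_Opaque is a premultiplied SrcOver, and S32_Blend is a lerp between source and destination by one global alpha. The helper `mul_channels` scales red/blue and alpha/green in two 32-bit multiplies, a common SWAR trick in this kind of code. The names, the `alpha + 1` mapping to a 0..256 scale, and the alpha-in-top-byte layout are assumptions of this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Scale all four 8-bit channels of c by scale / 256 (scale in 0..256),
// handling red/blue and alpha/green as two pairs per 32-bit multiply.
static inline uint32_t mul_channels(uint32_t c, unsigned scale) {
    assert(scale <= 256);
    const uint32_t mask = 0x00FF00FF;
    uint32_t rb = ((c & mask) * scale) >> 8;
    uint32_t ag = ((c >> 8) & mask) * scale;
    return (rb & mask) | (ag & ~mask);
}

// S32A_Opaque-style SrcOver for premultiplied pixels (alpha in bits 24-31).
static inline uint32_t srcover_8888(uint32_t src, uint32_t dst) {
    unsigned a = src >> 24;
    return src + mul_channels(dst, 256 - a);   // 256 - a == (255 - a) + 1
}

// S32_Blend-style lerp between src and dst by a global alpha in 0..255.
static inline uint32_t blend_8888(uint32_t src, uint32_t dst, unsigned alpha) {
    assert(alpha <= 255);
    unsigned scale = alpha + 1;                // map 0..255 to 1..256
    return mul_channels(src, scale) + mul_channels(dst, 256 - scale);
}
```

In both functions the two products per channel sum to at most 255, so the final `+` never carries from one channel into the next.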
* Remove the unused SkCachePreload_arm (commit-bot@chromium.org, 2014-04-30)
  Signed-off-by: Kévin PETIT <kevin.petit@arm.com>
  BUG=skia:
  R=djsollen@google.com, mtklein@google.com
  Author: kevin.petit@arm.com
  Review URL: https://codereview.chromium.org/263553008
  git-svn-id: http://skia.googlecode.com/svn/trunk@14456 2bbb7eff-a529-9590-31e7-b0007b416f81
* ARM Skia NEON patches - 36 - Color32 (commit-bot@chromium.org, 2014-04-29)
  Convert Color32 to intrinsics
  This change is performance-neutral for high values of count and is a big improvement for values smaller than 64.
  Signed-off-by: Kévin PETIT <kevin.petit@arm.com>
  BUG=skia:
  R=djsollen@google.com, mtklein@google.com, borenet@google.com
  Author: kevin.petit@arm.com
  Review URL: https://codereview.chromium.org/258173005
  git-svn-id: http://skia.googlecode.com/svn/trunk@14435 2bbb7eff-a529-9590-31e7-b0007b416f81
* ARM Skia NEON patches - 22 - S32_D565_Blend (commit-bot@chromium.org, 2014-04-09)
  BlitRow565: new NEON version of S32_D565_Blend
  This new implementation brings a good speedup in most cases and gives exact results (removes one mismatch in gm). Here are the benchmark results (speedup vs. existing S32A_D565_Blend):
  +-------+-----------+------------+
  | count | Cortex-A9 | Cortex-A15 |
  +-------+-----------+------------+
  | 1     | -26.7%    | -27.5%     |
  +-------+-----------+------------+
  | 2     | 0%        | +53%       |
  +-------+-----------+------------+
  | 4     | +38.3%    | +26.5%     |
  +-------+-----------+------------+
  | 8     | +10.9%    | -4.5%      |
  +-------+-----------+------------+
  | 16    | +18.2%    | +1.6%      |
  +-------+-----------+------------+
  | 64    | +22.3%    | +8.75%     |
  +-------+-----------+------------+
  | 256   | +12.3%    | +11.2%     |
  +-------+-----------+------------+
  | 1024  | +79.2%    | +10.9%     |
  +-------+-----------+------------+
  Signed-off-by: Kévin PETIT <kevin.petit@arm.com>
  BUG=skia:
  R=djsollen@google.com, mtklein@google.com
  Author: kevin.petit@arm.com
  Review URL: https://codereview.chromium.org/181523002
  git-svn-id: http://skia.googlecode.com/svn/trunk@14103 2bbb7eff-a529-9590-31e7-b0007b416f81
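S32_D565_Blend mixes a 32-bit source (alpha ignored) into an RGB565 destination by one global alpha. A straightforward scalar version is sketched below so that the note about exact results has something concrete to compare against: it reduces the source to 565 precision and then lerps each channel with an 8-bit scale. The function names, the `0xAARRGGBB` layout, and the rounding are assumptions, so this sketch is not guaranteed to be bit-identical to Skia's C reference.

```cpp
#include <cassert>
#include <cstdint>

// Lerp one 0xAARRGGBB source pixel (alpha ignored) into one RGB565 pixel.
static inline uint16_t blend_8888_into_565(uint32_t src, uint16_t dst, unsigned alpha) {
    assert(alpha <= 255);
    unsigned scale = alpha + 1;   // 1..256
    unsigned inv = 256 - scale;
    // Reduce the source to 565 precision first, then mix per channel.
    unsigned sr = (src >> 19) & 0x1F, sg = (src >> 10) & 0x3F, sb = (src >> 3) & 0x1F;
    unsigned dr = dst >> 11, dg = (dst >> 5) & 0x3F, db = dst & 0x1F;
    unsigned r = (sr * scale + dr * inv) >> 8;
    unsigned g = (sg * scale + dg * inv) >> 8;
    unsigned b = (sb * scale + db * inv) >> 8;
    return uint16_t((r << 11) | (g << 5) | b);
}

// Row form with the same global alpha for every pixel.
void blend_8888_into_565_row(uint16_t* dst, const uint32_t* src, int count, unsigned alpha) {
    assert(count >= 0);
    for (int i = 0; i < count; ++i) {
        dst[i] = blend_8888_into_565(src[i], dst[i], alpha);
    }
}
```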
* ARM Skia NEON patches - 25 - S32A_D565_Opaque_Dither clean/bugfix/speed (commit-bot@chromium.org, 2014-02-25)
  BlitRow565: S32A_D565_Opaque_Dither: some improvements
  - Supports ARGB and ABGR
  - Fewer magic numbers
  - Reduced instruction count: 5-25% speedup
  - Fixed indentation; removed some commented-out and unused code
  Signed-off-by: Kévin PETIT <kevin.petit@arm.com>
  BUG=skia:
  R=djsollen@google.com, mtklein@google.com
  Author: kevin.petit@arm.com
  Review URL: https://codereview.chromium.org/177963003
  git-svn-id: http://skia.googlecode.com/svn/trunk@13577 2bbb7eff-a529-9590-31e7-b0007b416f81
* ARM Skia NEON patches - 12 - S32_Blend (commit-bot@chromium.org, 2014-02-21)
  Blitrow32: S32_Blend fix and little speed improvement
  - the results now exactly match the C code
  - the speed has improved, especially for small values of count
  +-------+-----------+------------+
  | count | Cortex-A9 | Cortex-A15 |
  +-------+-----------+------------+
  | 1     | +30%      | +18%       |
  +-------+-----------+------------+
  | 2     | 0         | 0          |
  +-------+-----------+------------+
  | 4     | - <1%     | +14%       |
  +-------+-----------+------------+
  | > 4   | -0.5..+5% | -0.5..+4%  |
  +-------+-----------+------------+
  Signed-off-by: Kévin PETIT <kevin.petit@arm.com>
  BUG=skia:
  Committed: http://code.google.com/p/skia/source/detail?r=13532
  R=djsollen@google.com, mtklein@google.com
  Author: kevin.petit@arm.com
  Review URL: https://codereview.chromium.org/158973002
  git-svn-id: http://skia.googlecode.com/svn/trunk@13543 2bbb7eff-a529-9590-31e7-b0007b416f81
* Revert of ARM Skia NEON patches - 12 - S32_Blend (https://codereview.chromium.org/158973002/) (commit-bot@chromium.org, 2014-02-21)
  Reason for revert:
  Breaking the build. See http://108.170.219.164:10117/builders/Build-Ubuntu12-GCC-Arm7-Debug-Nexus4/builds/2966 (and others). We are getting warnings that vsrc and vdst may be uninitialized. Please fix and resubmit.
  Original issue's description:
  > ARM Skia NEON patches - 12 - S32_Blend
  >
  > Blitrow32: S32_Blend fix and little speed improvement
  >
  > - the results are now exactly similar as the C code
  > - the speed has improved, especially for small values of count
  >
  > +-------+-----------+------------+
  > | count | Cortex-A9 | Cortex-A15 |
  > +-------+-----------+------------+
  > | 1     | +30%      | +18%       |
  > +-------+-----------+------------+
  > | 2     | 0         | 0          |
  > +-------+-----------+------------+
  > | 4     | - <1%     | +14%       |
  > +-------+-----------+------------+
  > | > 4   | -0.5..+5% | -0.5..+4%  |
  > +-------+-----------+------------+
  >
  > Signed-off-by: Kévin PETIT <kevin.petit@arm.com>
  >
  > BUG=skia:
  >
  > Committed: http://code.google.com/p/skia/source/detail?r=13532
  R=djsollen@google.com, mtklein@google.com, kevin.petit@arm.com
  TBR=djsollen@google.com, kevin.petit@arm.com, mtklein@google.com
  NOTREECHECKS=true
  NOTRY=true
  BUG=skia:
  Author: scroggo@google.com
  Review URL: https://codereview.chromium.org/175433002
  git-svn-id: http://skia.googlecode.com/svn/trunk@13534 2bbb7eff-a529-9590-31e7-b0007b416f81
* ARM Skia NEON patches - 12 - S32_Blend (commit-bot@chromium.org, 2014-02-21)
  Blitrow32: S32_Blend fix and little speed improvement
  - the results now exactly match the C code
  - the speed has improved, especially for small values of count
  +-------+-----------+------------+
  | count | Cortex-A9 | Cortex-A15 |
  +-------+-----------+------------+
  | 1     | +30%      | +18%       |
  +-------+-----------+------------+
  | 2     | 0         | 0          |
  +-------+-----------+------------+
  | 4     | - <1%     | +14%       |
  +-------+-----------+------------+
  | > 4   | -0.5..+5% | -0.5..+4%  |
  +-------+-----------+------------+
  Signed-off-by: Kévin PETIT <kevin.petit@arm.com>
  BUG=skia:
  R=djsollen@google.com, mtklein@google.com
  Author: kevin.petit@arm.com
  Review URL: https://codereview.chromium.org/158973002
  git-svn-id: http://skia.googlecode.com/svn/trunk@13532 2bbb7eff-a529-9590-31e7-b0007b416f81
* ARM Skia NEON patches - 27 - S32A_D565_Blend (commit-bot@chromium.org, 2014-02-13)
  BlitRow565: new intrinsics version of S32A_D565_Blend
  This new version is essentially a rewrite of the existing code with a few speed and accuracy improvements. A switch enables pixel-perfect results at the cost of a substantial performance drop; it is disabled in this patch.
  Here are the benchmark results (speedup vs. existing code):
  +-------+-----------+------------+
  | count | Cortex-A9 | Cortex-A15 |
  +-------+-----------+------------+
  | 1     | +103.6%   | +12%       |
  +-------+-----------+------------+
  | 2     | +3.6%     | +21.6%     |
  +-------+-----------+------------+
  | 4     | +0.8%     | -0.8%      |
  +-------+-----------+------------+
  | 8     | +3.9%     | -1%        |
  +-------+-----------+------------+
  | 16    | +14.7%    | +5.7%      |
  +-------+-----------+------------+
  | 64    | +18.1%    | +13.2%     |
  +-------+-----------+------------+
  | 256   | +16.3%    | +27.4%     |
  +-------+-----------+------------+
  | 1024  | +78.2%    | +17.4%     |
  +-------+-----------+------------+
  Signed-off-by: Kévin PETIT <kevin.petit@arm.com>
  BUG=skia:
  R=djsollen@google.com, mtklein@google.com, halcanary@google.com
  Author: kevin.petit@arm.com
  Review URL: https://codereview.chromium.org/156113005
  git-svn-id: http://skia.googlecode.com/svn/trunk@13438 2bbb7eff-a529-9590-31e7-b0007b416f81
* ARM Skia NEON patches - 31 - Xfermode: xfer16 (commit-bot@chromium.org, 2013-11-08)
  Xfermode: xfer16
  This adds support for 16-bit Xfermodes. It also tunes the gcc test macros in xfer32() to add compatibility for gcc > 4.
  Signed-off-by: Kévin PETIT <kevin.petit@arm.com>
  BUG=
  R=djsollen@google.com, mtklein@google.com, reed@google.com
  Author: kevin.petit.arm@gmail.com
  Review URL: https://codereview.chromium.org/33063002
  git-svn-id: http://skia.googlecode.com/svn/trunk@12192 2bbb7eff-a529-9590-31e7-b0007b416f81
* ARM Skia NEON patches - 24 - S32_D565_Blend_Dither slight speedup/bugfix (commit-bot@chromium.org, 2013-09-26)
  BlitRow565: S32_D565_Blend_Dither, slight speedup + bugfix
  This patch rewrites S32_D565_Blend_Dither in intrinsics. The new version is faster (10-20% depending on the value of count) and supports both ARGB and ABGR. It also adds the missing assert at the beginning of the function.
  Signed-off-by: Kévin PETIT <kevin.petit@arm.com>
  BUG=
  R=djsollen@google.com, mtklein@google.com
  Author: kevin.petit.arm@gmail.com
  Review URL: https://chromiumcodereview.appspot.com/22566002
  git-svn-id: http://skia.googlecode.com/svn/trunk@11473 2bbb7eff-a529-9590-31e7-b0007b416f81
* ARM Skia NEON patches - 21 - new NEON S32_D565_Opaque (commit-bot@chromium.org, 2013-09-20)
  BlitRow565: NEON version of S32_D565_Opaque
  Here's a new implementation of S32_D565_Opaque in NEON. It is dramatically faster than S32A_D565_Opaque.
  Here are the benchmark results (speedup vs. existing NEON):
  +-------+-----------+------------+
  | count | Cortex-A9 | Cortex-A15 |
  +-------+-----------+------------+
  | 1     | +130%     | +139%      |
  +-------+-----------+------------+
  | 2     | +65.2%    | +51%       |
  +-------+-----------+------------+
  | 4     | -25.5%    | +10.2%     |
  +-------+-----------+------------+
  | 8     | +63.8%    | +32.1%     |
  +-------+-----------+------------+
  | 16    | +110%     | +49.2%     |
  +-------+-----------+------------+
  | 64    | +153%     | +123.5%    |
  +-------+-----------+------------+
  | 256   | +151%     | +144.7%    |
  +-------+-----------+------------+
  | 1024  | +272%     | +157.2%    |
  +-------+-----------+------------+
  Signed-off-by: Kévin PETIT <kevin.petit@arm.com>
  BUG=
  R=djsollen@google.com, mtklein@google.com
  Author: kevin.petit.arm@gmail.com
  Review URL: https://chromiumcodereview.appspot.com/22351006
  git-svn-id: http://skia.googlecode.com/svn/trunk@11415 2bbb7eff-a529-9590-31e7-b0007b416f81
* ARM Skia NEON patches - 23 - S32_D565_Opaque_Dither cleanup/bugfix/speed (commit-bot@chromium.org, 2013-09-18)
  BlitRow565: S32_D565_Opaque_Dither: cleaning / bugfix
  This patch does some light cleanup (whitespace/comments) and gives a small speedup (by using post-increment addressing in the asm), but more importantly it fixes a bug on Linux. The new code supports both ARGB and ABGR.
  I removed the comment because benchmarks confirm that this code brings a *massive* (3x-7x) speedup over the C code.
  Signed-off-by: Kévin PETIT <kevin.petit@arm.com>
  BUG=
  R=djsollen@google.com, mtklein@google.com
  Author: kevin.petit.arm@gmail.com
  Review URL: https://chromiumcodereview.appspot.com/22269003
  git-svn-id: http://skia.googlecode.com/svn/trunk@11339 2bbb7eff-a529-9590-31e7-b0007b416f81
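Several entries mention supporting both ARGB and ABGR. In portable code the usual approach is to read channels through per-build shift constants (Skia has macros such as SK_R32_SHIFT for this) rather than hard-coding byte positions. The sketch below makes that explicit with a hypothetical `Layout32` struct so that one packing routine handles both orders; it illustrates the idea and is not how the NEON code in this file selects lanes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of where each 8-bit channel sits in a 32-bit pixel.
struct Layout32 {
    unsigned a_shift, r_shift, g_shift, b_shift;
};

constexpr Layout32 kARGB{24, 16, 8, 0};   // 0xAARRGGBB
constexpr Layout32 kABGR{24, 0, 8, 16};   // 0xAABBGGRR

// Extract one 8-bit channel; shifts must be byte-aligned.
static inline unsigned channel(uint32_t c, unsigned shift) {
    assert(shift <= 24 && shift % 8 == 0);
    return (c >> shift) & 0xFF;
}

// Pack to RGB565, reading channels through the layout instead of fixed shifts.
static inline uint16_t pack_to_565(uint32_t c, const Layout32& layout) {
    unsigned r = channel(c, layout.r_shift);
    unsigned g = channel(c, layout.g_shift);
    unsigned b = channel(c, layout.b_shift);
    return uint16_t(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}
```

With the layout known at compile time, the shifts fold into constants, so supporting both byte orders costs nothing at run time.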
* Cleanup the ARM blitrow optimizations (djsollen@google.com, 2013-08-09)
  R=mtklein@google.com
  Review URL: https://codereview.chromium.org/22229002
  git-svn-id: http://skia.googlecode.com/svn/trunk@10652 2bbb7eff-a529-9590-31e7-b0007b416f81
* ARM Skia NEON patches - 14 - S32A_Blend (commit-bot@chromium.org, 2013-08-01)
  Blitrow32: S32A_Blend new NEON version
  Adding a NEON version of S32A_Blend_BlitRow32. Here are the benchmark results:
  +-------+--------------------------+--------------------------+
  |       | Speedup vs. C            | Speedup vs. ARM asm      |
  | count +------------+-------------+------------+-------------+
  |       | Cortex-A9  | Cortex-A15  | Cortex-A9  | Cortex-A15  |
  +-------+------------+-------------+------------+-------------+
  | 1     | +8.5%      | +18.5%      | +0.9%      | +2.9%       |
  +-------+------------+-------------+------------+-------------+
  | 2     | +65.6%     | +94%        | +70.3%     | +80%        |
  +-------+------------+-------------+------------+-------------+
  | 4     | +42.4%     | +87.8%      | +56.8%     | +84.4%      |
  +-------+------------+-------------+------------+-------------+
  | 8     | +30%       | +90%        | +49.9%     | +82.7%      |
  +-------+------------+-------------+------------+-------------+
  | 16    | +23.1%     | +95.4%      | +46.6%     | +87.6%      |
  +-------+------------+-------------+------------+-------------+
  | 64    | +23.1%     | +95.7%      | +46.1%     | +89.4%      |
  +-------+------------+-------------+------------+-------------+
  | 256   | +35.5%     | +122%       | +53.6%     | +99.2%      |
  +-------+------------+-------------+------------+-------------+
  | 1024  | +61.8%     | +101%       | +64.2%     | +91.2%      |
  +-------+------------+-------------+------------+-------------+
  BUG=
  R=djsollen@google.com
  Author: kevin.petit.arm@gmail.com
  Review URL: https://chromiumcodereview.appspot.com/18614010
  git-svn-id: http://skia.googlecode.com/svn/trunk@10480 2bbb7eff-a529-9590-31e7-b0007b416f81
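S32A_Blend combines both kinds of alpha: each premultiplied source pixel is first scaled by the global alpha, and the result is then composited with SrcOver using its reduced alpha. The scalar sketch below assumes an alpha-in-top-byte layout and uses a hypothetical `mul_channels` helper that scales four 8-bit channels by `scale / 256`; it illustrates the math, not the NEON code above.

```cpp
#include <cassert>
#include <cstdint>

// Scale all four 8-bit channels of c by scale / 256 (scale in 0..256).
static inline uint32_t mul_channels(uint32_t c, unsigned scale) {
    assert(scale <= 256);
    const uint32_t mask = 0x00FF00FF;
    uint32_t rb = ((c & mask) * scale) >> 8;
    uint32_t ag = ((c >> 8) & mask) * scale;
    return (rb & mask) | (ag & ~mask);
}

// S32A_Blend-style: apply the global alpha to the source, then SrcOver.
static inline uint32_t blend_srcover_8888(uint32_t src, uint32_t dst, unsigned alpha) {
    assert(alpha <= 255);
    unsigned scale = alpha + 1;               // 0..255 -> 1..256
    uint32_t s = mul_channels(src, scale);    // source with global alpha applied
    unsigned sa = s >> 24;                    // its reduced alpha
    return s + mul_channels(dst, 256 - sa);
}

void blend_srcover_8888_row(uint32_t* dst, const uint32_t* src, int count, unsigned alpha) {
    assert(count >= 0);
    for (int i = 0; i < count; ++i) {
        dst[i] = blend_srcover_8888(src[i], dst[i], alpha);
    }
}
```

Because the source is premultiplied, each scaled source channel is at most `sa`, and the scaled destination channel is at most `255 - sa`, so no per-channel clamp is needed.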
* ARM Skia NEON patches - 01 - Simple fixes (commit-bot@chromium.org, 2013-07-15)
  This series contains a few fairly non-controversial fixes.
  - Misc: remove dead references to neon 4444 functions
  - Misc: avoid the double _neon_neon suffix in the clamp matrix functions. MAKENAME already adds the _neon suffix
  - Misc: a few stupid / obvious fixes
  BUG=
  R=djsollen@google.com
  Author: kevin.petit.arm@gmail.com
  Review URL: https://chromiumcodereview.appspot.com/18666004
  git-svn-id: http://skia.googlecode.com/svn/trunk@10072 2bbb7eff-a529-9590-31e7-b0007b416f81
* ARM Skia NEON patches - 13 - S32A_Opaque (commit-bot@chromium.org, 2013-07-11)
  Blitrow32: S32A_Opaque code cleaning and speed improvement
  - the old way of calculating alpha no longer seems to be used, so remove the remaining code
  - adding prefetching greatly improves performance in some cases, at the expense of a small slowdown in others:
  +-------+-----------+------------+
  | count | Cortex-A9 | Cortex-A15 |
  +-------+-----------+------------+
  | 1,2   | 0         | 0          |
  +-------+-----------+------------+
  | 4     | 0         | -3%        |
  +-------+-----------+------------+
  | 8     | 0         | -4%        |
  +-------+-----------+------------+
  | 16    | 0         | -5%        |
  +-------+-----------+------------+
  | 64    | +14%      | 0          |
  +-------+-----------+------------+
  | 256   | +14%      | +12%       |
  +-------+-----------+------------+
  | 1024  | +115%     | +15%       |
  +-------+-----------+------------+
  BUG=
  R=djsollen@google.com
  Author: kevin.petit.arm@gmail.com
  Review URL: https://chromiumcodereview.appspot.com/18459008
  git-svn-id: http://skia.googlecode.com/svn/trunk@10026 2bbb7eff-a529-9590-31e7-b0007b416f81
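The prefetching idea in this entry can be sketched in scalar C++ with the GCC/Clang builtin `__builtin_prefetch` (MSVC would need a different intrinsic). The distance of 64 pixels ahead, the once-per-16-pixels cadence (assuming 64-byte cache lines), and the guard that skips prefetching near the end of the row are illustrative choices, not the values used in the NEON code. The guard reflects the trade-off in the table, where small counts see no gain or a small loss.

```cpp
#include <cassert>
#include <cstdint>

// Minimal premultiplied SrcOver (alpha in bits 24-31) so the block stands alone.
static inline uint32_t srcover_8888(uint32_t src, uint32_t dst) {
    const uint32_t mask = 0x00FF00FF;
    unsigned scale = 256 - (src >> 24);
    uint32_t rb = (((dst & mask) * scale) >> 8) & mask;
    uint32_t ag = (((dst >> 8) & mask) * scale) & ~mask;
    return src + (rb | ag);
}

// Row loop that asks the CPU to start fetching data a fixed distance ahead.
void srcover_row_prefetch(uint32_t* dst, const uint32_t* src, int count) {
    assert(count >= 0);
    constexpr int kPrefetchAhead = 64;   // pixels (256 bytes); an untuned guess
    for (int i = 0; i < count; ++i) {
        // One hint per assumed 64-byte line (16 pixels), only while far from the end.
        if ((i & 15) == 0 && i + kPrefetchAhead < count) {
            __builtin_prefetch(src + i + kPrefetchAhead, 0);   // 0 = prefetch for read
            __builtin_prefetch(dst + i + kPrefetchAhead, 1);   // 1 = prefetch for write
        }
        dst[i] = srcover_8888(src[i], dst[i]);
    }
}
```

Prefetch hints do not change results, only memory timing, so for rows shorter than the prefetch distance this loop behaves exactly like a plain SrcOver loop.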
* Partial reapply of r5364 minus the non-neon code path. (djsollen@google.com, 2013-04-09)
  See https://codereview.appspot.com/6465075 for a more detailed description of the contents of this CL.
  Review URL: https://codereview.chromium.org/13060004
  git-svn-id: http://skia.googlecode.com/svn/trunk@8579 2bbb7eff-a529-9590-31e7-b0007b416f81
* Fix errors when compiling with -Wall -Werror on Android. (djsollen@google.com, 2013-02-07)
  This CL also turns those features on by default on Android.
  Review URL: https://codereview.appspot.com/7313049
  git-svn-id: http://skia.googlecode.com/svn/trunk@7645 2bbb7eff-a529-9590-31e7-b0007b416f81
* Reverting r5364 (Update ARM and NEON optimizations for S32A_Opaque_BlitRow32) (robertphillips@google.com, 2012-09-04)
  git-svn-id: http://skia.googlecode.com/svn/trunk@5378 2bbb7eff-a529-9590-31e7-b0007b416f81
* Update ARM and NEON optimizations for S32A_Opaque_BlitRow32. (djsollen@google.com, 2012-08-31)
  These patches replace those written by ARM with ones provided by NVIDIA.
  Review URL: https://codereview.appspot.com/6465075
  git-svn-id: http://skia.googlecode.com/svn/trunk@5364 2bbb7eff-a529-9590-31e7-b0007b416f81
* Result of running tools/sanitize_source_files.py (which was added in https://codereview.appspot.com/6465078/) (rmistry@google.com, 2012-08-23)
  This CL is part I of IV (I broke down the 1280 files into 4 CLs).
  Review URL: https://codereview.appspot.com/6485054
  git-svn-id: http://skia.googlecode.com/svn/trunk@5262 2bbb7eff-a529-9590-31e7-b0007b416f81
* arm: dynamic NEON support for SkBlitRow_opts_arm.cpp (digit@google.com, 2012-08-08)
  This patch moves all NEON-specific code from the source src/opts/SkBlitRow_opts_arm.cpp into a new file that is built as part of the 'opts_arm_neon' static library.
  Review URL: https://codereview.appspot.com/6449110
  git-svn-id: http://skia.googlecode.com/svn/trunk@5016 2bbb7eff-a529-9590-31e7-b0007b416f81