aboutsummaryrefslogtreecommitdiffhomepage
path: root/src/opts/SkBlitRow_opts_arm_neon.cpp
Commit message (Collapse)AuthorAge
* SkBlendARGB32 and S32[A]_Blend_BlitRow32 are currently formulated as: ↵Gravatar lsalzman2016-08-05
| | | | | | | | | | | | | | | | | | | | | | SkAlphaMulQ(src, src_scale) + SkAlphaMulQ(dst, dst_scale), which boils down to ((src*src_scale)>>8) + ((dst*dst_scale)>>8). In particular, note that the intermediate precision is discarded before the two parts are added together, causing the final result to possibly inaccurate. In Firefox, we use SkCanvas::saveLayer in combination with a backdrop that initializes the layer to the background. When this is blended back onto background using transparency, where the source and destination pixel colors are the same, the resulting color after the blend is not preserved due to the lost precision mentioned above. In cases where this operation is repeatedly performed, this causes substantially noticeable differences in color as evidenced in this downstream Firefox bug report: https://bugzilla.mozilla.org/show_bug.cgi?id=1200684 In the test-case in the downstream report, essentially it does blend(src=0xFF2E3338, dst=0xFF2E3338, scale=217), which gives the result 0xFF2E3237, while we would expect to get back 0xFF2E3338. This problem goes away if the blend is instead reformulated to effectively do (src*src_scale + dst*dst_scale)>>8, which keeps the intermediate precision during the addition before shifting it off. This modifies the blending operations thusly. The performance should remain mostly unchanged, or possibly improve slightly, so there should be no real downside to doing this, with the benefit of making the results more accurate. Without this, it is currently unsafe for Firefox to blend a layer back onto itself that was initialized with a copy of its background. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2097883002 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot [mtklein adds...] No public API changes. TBR=reed@google.com Review-Url: https://codereview.chromium.org/2097883002
* Port S32A_opaque blit row to SkOpts.Gravatar mtklein2016-03-23
| | | | | | | | | | This should be a pixel-for-pixel (i.e. bug-for-bug) port. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1820313002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review URL: https://codereview.chromium.org/1820313002
* count is an int, so constrain it to a 32-bit w-register.Gravatar mtklein2015-12-16
| | | | | | | | | | | | | | This piece of code is already 64-bit only, so we don't need to think about ARMv7. Hopefully this shuts up the warnings. They were harmless. If this doesn't work (it's relatively new modifier, so maybe some compilers barf), an alternative is to cast count to a size_t. BUG=skia:4686 GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1527123003 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review URL: https://codereview.chromium.org/1527123003
* Port SkBlitRow::Color32 to SkOpts.Gravatar mtklein2015-09-10
| | | | | | | | | | This was a pre-SkOpts attempt that we can bring under its wing now. This should be a perf no-op, deo volente. BUG=skia:4117 Review URL: https://codereview.chromium.org/1314863006
* Style Change: NULL->nullptrGravatar halcanary2015-08-27
| | | | | | DOCS_PREVIEW= https://skia.org/?cl=1316233002 Review URL: https://codereview.chromium.org/1316233002
* The compiler can generate smulbb perfectly well nowadays.Gravatar mtklein2015-08-07
| | | | | | BUG=skia:4117 Review URL: https://codereview.chromium.org/1273203002
* Purge non-NEON ARM code.Gravatar mtklein2015-08-06
| | | | | | | | As I begin to wade in here, it's nice to remove as much code as possible. BUG=skia:4117 Review URL: https://codereview.chromium.org/1277953002
* Update some Sk4px APIs.Gravatar mtklein2015-06-22
| | | | | | | | | | | | | | | Mostly this is about ergonomics, making it easier to do good operations and hard / impossible to do bad ones. - SkAlpha / SkPMColor constructors become static factories. - Remove div255TruncNarrow(), rename div255RoundNarrow() to div255(). In practice we always want to round, and the narrowing to 8-bit is contextually obvious. - Rename fastMulDiv255Round() approxMulDiv255() to stress it's approximate-ness over its speed. Drop Round for the same reason as above... we should always round. - Add operator overloads so we don't have to keep throwing in seemingly-random Sk4px() or Sk4px::Wide() casts. - use operator*() for 8-bit x 8-bit -> 16-bit math. It's always what we want, and there's generally no 8x8->8 alternative. - MapFoo can take a const Func&. Don't think it makes a big difference, but nice to do. BUG=skia: Review URL: https://codereview.chromium.org/1202013002
* Re-proc SkBlitRow::Color32 for ARM.Gravatar mtklein2015-05-22
| | | | | | | | | | This is a spiritual revert of http://crrev.com/1104183004. BUG=skia: Committed: https://skia.googlesource.com/skia/+/4e13a23d8f720e17660f26657b45b89fe4339004 Review URL: https://codereview.chromium.org/1145283003
* Revert of Re-proc SkBlitRow::Color32 for ARM. (patchset #3 id:40001 of ↵Gravatar mtklein2015-05-22
| | | | | | | | | | | | | | | | | | | | | | | | https://codereview.chromium.org/1145283003/) Reason for revert: http://build.chromium.org/p/tryserver.chromium.mac/builders/ios_rel_device_ninja/builds/70016/steps/compile%20%28with%20patch%29/logs/stdio Original issue's description: > Re-proc SkBlitRow::Color32 for ARM. > > This is a spiritual revert of http://crrev.com/1104183004. > > BUG=skia: > > Committed: https://skia.googlesource.com/skia/+/4e13a23d8f720e17660f26657b45b89fe4339004 TBR=reed@google.com,mtklein@chromium.org NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia: Review URL: https://codereview.chromium.org/1157633003
* Re-proc SkBlitRow::Color32 for ARM.Gravatar mtklein2015-05-21
| | | | | | | | This is a spiritual revert of http://crrev.com/1104183004. BUG=skia: Review URL: https://codereview.chromium.org/1145283003
* De-proc Color32Gravatar mtklein2015-04-27
| | | | | | | | | | | | | | | | | | | | | Also strips SK_SUPPORT_LEGACY_COLOR32_MATH, which is no longer needed. Seems handy to have SkTypes include the relevant intrinsics when we know we've got them, but I'm not married to it. Locally this looks like a pointlessly small perf win, but I'm mostly keen to get all the code together. BUG=skia: Committed: https://skia.googlesource.com/skia/+/376e9bc206b69d9190f38dfebb132a8769bbd72b Committed: https://skia.googlesource.com/skia/+/d65dc0cedd5b50dd407b6ff8fdc39123f11511cc CQ_EXTRA_TRYBOTS=client.skia.compile:Build-Ubuntu-GCC-Mips-Debug-Android-Trybot Review URL: https://codereview.chromium.org/1104183004
* Revert of De-proc Color32 (patchset #5 id:80001 of ↵Gravatar mtklein2015-04-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | https://codereview.chromium.org/1104183004/) Reason for revert: duh Original issue's description: > De-proc Color32 > > Also strips SK_SUPPORT_LEGACY_COLOR32_MATH, > which is no longer needed. > > Seems handy to have SkTypes include the relevant intrinsics when > we know we've got them, but I'm not married to it. > > Locally this looks like a pointlessly small perf win, but I'm mostly > keen to get all the code together. > > BUG=skia: > > Committed: https://skia.googlesource.com/skia/+/376e9bc206b69d9190f38dfebb132a8769bbd72b > > Committed: https://skia.googlesource.com/skia/+/d65dc0cedd5b50dd407b6ff8fdc39123f11511cc TBR=reed@google.com,mtklein@chromium.org NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia: Review URL: https://codereview.chromium.org/1102363006
* De-proc Color32Gravatar mtklein2015-04-27
| | | | | | | | | | | | | | | | | Also strips SK_SUPPORT_LEGACY_COLOR32_MATH, which is no longer needed. Seems handy to have SkTypes include the relevant intrinsics when we know we've got them, but I'm not married to it. Locally this looks like a pointlessly small perf win, but I'm mostly keen to get all the code together. BUG=skia: Committed: https://skia.googlesource.com/skia/+/376e9bc206b69d9190f38dfebb132a8769bbd72b Review URL: https://codereview.chromium.org/1104183004
* Revert of De-proc Color32 (patchset #4 id:60001 of ↵Gravatar mtklein2015-04-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | https://codereview.chromium.org/1104183004/) Reason for revert: MIPS Original issue's description: > De-proc Color32 > > Also strips SK_SUPPORT_LEGACY_COLOR32_MATH, > which is no longer needed. > > Seems handy to have SkTypes include the relevant intrinsics when > we know we've got them, but I'm not married to it. > > Locally this looks like a pointlessly small perf win, but I'm mostly > keen to get all the code together. > > BUG=skia: > > Committed: https://skia.googlesource.com/skia/+/376e9bc206b69d9190f38dfebb132a8769bbd72b TBR=reed@google.com,mtklein@chromium.org NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia: Review URL: https://codereview.chromium.org/1108163002
* De-proc Color32Gravatar mtklein2015-04-27
| | | | | | | | | | | | | | | Also strips SK_SUPPORT_LEGACY_COLOR32_MATH, which is no longer needed. Seems handy to have SkTypes include the relevant intrinsics when we know we've got them, but I'm not married to it. Locally this looks like a pointlessly small perf win, but I'm mostly keen to get all the code together. BUG=skia: Review URL: https://codereview.chromium.org/1104183004
* Revert of Convert Color32 code to perfect blend. (patchset #6 id:100001 of ↵Gravatar mtklein2015-04-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | https://codereview.chromium.org/1098913002/) Reason for revert: Xfermode_SrcOver not looking encouraging. Up to 50% regressions. https://perf.skia.org/#3242 Original issue's description: > Convert Color32 code to perfect blend. > > Before we commit to blend_256_round_alt, let's make sure blend_perfect is > really slower in practice (i.e. regresses on perf.skia.org). > > blend_perfect is really the most desirable algorithm if we can afford it. Not > only is it correct, but it's easy to think about and break into correct pieces: > for instance, its div255() doesn't require any coordination with the multiply. > > This looks like a 30% hit according to microbenches. That said, microbenches > said my previous change would be a 20-25% perf improvement, but it didn't end > up showing a significant effect at a high level. > > As for correctness, I see a bunch of off-by-1 compared to blend_256_round_alt > (exactly what we'd expect), and one off-by-3 in a GM that looks like it has a > bunch of overdraw. > > BUG=skia: > > Committed: https://skia.googlesource.com/skia/+/61221e7f87a99765b0e034020e06bb018e2a08c2 TBR=reed@google.com,fmalita@chromium.org,mtklein@chromium.org NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia: Review URL: https://codereview.chromium.org/1083923006
* Convert Color32 code to perfect blend.Gravatar mtklein2015-04-20
| | | | | | | | | | | | | | | | | | | | | Before we commit to blend_256_round_alt, let's make sure blend_perfect is really slower in practice (i.e. regresses on perf.skia.org). blend_perfect is really the most desirable algorithm if we can afford it. Not only is it correct, but it's easy to think about and break into correct pieces: for instance, its div255() doesn't require any coordination with the multiply. This looks like a 30% hit according to microbenches. That said, microbenches said my previous change would be a 20-25% perf improvement, but it didn't end up showing a significant effect at a high level. As for correctness, I see a bunch of off-by-1 compared to blend_256_round_alt (exactly what we'd expect), and one off-by-3 in a GM that looks like it has a bunch of overdraw. BUG=skia: Review URL: https://codereview.chromium.org/1098913002
* Rework SSE and NEON Color32 algorithms to be more correct and faster.Gravatar mtklein2015-04-17
| | | | | | | | | | | | | | | | | | | | This algorithm changes the blend math, guarded by SK_LEGACY_COLOR32_MATH. The new math is more correct: it's never off by more than 1, and correct in all the interesting 0x00 and 0xFF edge cases, where the old math was never off by more than 2, and not always correct on the edges. If you look at tests/BlendTest.cpp, the old code was using the `blend_256_plus1_trunc` algorithm, while the new code uses `blend_256_round_alt`. Neither uses `blend_perfect`, which is about ~35% slower than `blend_256_round_alt`. This will require an unfathomable number of rebaselines, first to Skia, then to Blink when I remove the guard. I plan to follow up with some integer SIMD abstractions that can unify these two implementations into a single algorithm. This was originally what I was working on here, but the correctness gains seem to be quite compelling. The only places these two algorithms really differ greatly now is the kernel function, and even there they can really both be expressed abstractly as: - multiply 8-bits and 8-bits producing 16-bits - add 16-bits to 16-bits, returning the top 8 bits. All the constants are the same, except SSE is a little faster to keep 8 16-bit inverse alphas, NEON's a little faster to keep 8 8-bit inverse alphas. I may need to take this small speed win back to unify the two. We should expect a ~25% speedup on Intel (mostly from unrolling to 8 pixels) and a ~20% speedup on ARM (mostly from using vaddhn to add `color`, round, and narrow back down to 8-bit all into one instruction. (I am probably missing several more related bugs here.) BUG=skia:3738,skia:420,chromium:111470 Review URL: https://codereview.chromium.org/1092433002
* I suspect S32A_D565_Opaque_neon for Daisy problems.Gravatar Mike Klein2015-04-02
| | | | | | | | | | | I don't see any color-order handling logic in the 32-bit code. BUG=skia:1843 CQ_EXCLUDE_TRYBOTS=client.skia.compile:Build-Win-MSVC-x86-Debug-Trybot,Build-Win-MSVC-x86_64-Debug-Trybot R=mtklein@google.com Review URL: https://codereview.chromium.org/1051683003
* Fix skia bug 2845Gravatar kui.zheng2015-02-17
| | | | | | | | | | Shouldn't call Fast Blur path(DoubleRowBoxBlur_NEON) when kernelsize is 1. Or, uint16x8_t resultPixels will be overflow. BUG=skia:2845 R=senorblanco@chromium.org Review URL: https://codereview.chromium.org/587543003
* skia: blend32_16_row for neon versionGravatar mlee2015-01-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This includes blend32_16_row neon implementation for aarch32 and aarch64. For performance, blend32_16_row is called in following tests in nanobench. - Xfermode_SrcOver - tablebench - rotated_rects_bw_alternating_transparent_and_opaque_srcover - rotated_rects_bw_changing_transparent_srcover - rotated_rects_bw_same_transparent_srcover - luma_colorfilter_large - luma_colorfilter_small - chart_bw I can see perf increase in following two tests, especially. For others, looks similar. For each, I tried to run two times. 1) Xfermode_SrcOver <org> - D/skia ( 2000): 3M 57 17.3µs 17.4µs 17.4µs 17.7µs 1% █▃▂▃▂▂▂▁▃▂ 565 Xfermode_SrcOver - D/skia ( 1915): 3M 70 13.5µs 16.9µs 16.7µs 18.8µs 9% ▆█▄▅█▁▅▅▆▄ 565 Xfermode_SrcOver <new> - D/skia ( 2000): 3M 8 11.6µs 11.8µs 12.1µs 14.4µs 7% ▃█▁▁▂▁▁▁▂▂ 565 Xfermode_SrcOver - D/skia ( 2004): 3M 62 10.3µs 12.9µs 13µs 15.2µs 11% █▅▅▆▁▅▅▅▇▃ 565 Xfermode_SrcOver 2) luma_colorfilter_large <org> - D/skia ( 2000): 159M 8 136µs 136µs 136µs 139µs 1% █▃▁▂▁▁▁▁▁▁ 565 luma_colorfilter_large - D/skia ( 1915): 158M 2 135µs 177µs 182µs 269µs 22% ▆▃█▁▁▃▃▃▃▃ 565 luma_colorfilter_large <new> - D/skia ( 2000): 157M 5 84.2µs 85.3µs 87.5µs 110µs 9% █▁▂▁▁▁▁▁▁▁ 565 luma_colorfilter_large - D/skia ( 2004): 159M 6 84.7µs 110µs 112µs 144µs 18% █▄▇▁▁▄▃▄▄▆ 565 luma_colorfilter_large Review URL: https://codereview.chromium.org/847363002
* rename blitrow::proc and add (uncalled) hook for colorproc16Gravatar reed2015-01-13
| | | | | | BUG=skia:3302 Review URL: https://codereview.chromium.org/847443003
* Disable Neon optimization of bad S32A/D565 blend.Gravatar mtklein2014-08-22
| | | | | | | | | | | | BUG=skia:2797 Committed: https://skia.googlesource.com/skia/+/84cab93186fbe3e87d931fea73cb31b70ff5017b R=mtklein@google.com Author: mtklein@chromium.org Review URL: https://codereview.chromium.org/497823002
* Disable Neon optimization of bad S32A/D565 blend.Gravatar mtklein2014-08-22
| | | | | | | | | BUG=skia:2797 R=mtklein@google.com Author: mtklein@chromium.org Review URL: https://codereview.chromium.org/497823002
* disable neon proc that is triggering assertsGravatar reed2014-08-22
| | | | | | | | | BUG=skia:2845 R=mtklein@google.com Author: reed@google.com Review URL: https://codereview.chromium.org/498733002
* Let skia build with clang's integrated assembler.Gravatar thakis2014-08-11
| | | | | | | | | | | | | | | | 1. vuzpq is a gcc instruction. Replace it with the equivalent vuzp (see http://llvm.org/PR20423) 2. .func / .endfunc only have an effect with -gstabs, which we don't use. As it's unused and clang doesn't support it, remove .func / .endfunc (also see http://llvm.org/20424) BUG=chromium:124610 R=mtklein@google.com Author: thakis@chromium.org Review URL: https://codereview.chromium.org/461693004
* Fix S32A_D565_Opaque for RGBA on arm64Gravatar kevin.petit2014-08-09
| | | | | | | | | | | Signed-off-by: Kévin PETIT <kevin.petit@arm.com> BUG=skia:2813 R=halcanary@google.com, djsollen@google.com, mtklein@google.com Author: kevin.petit@arm.com Review URL: https://codereview.chromium.org/458453002
* Disable suspect NEON function for 64-bit AndroidGravatar djsollen2014-08-07
| | | | | | | | R=halcanary@google.com, mtklein@google.com, kevin.petit@arm.com Author: djsollen@google.com Review URL: https://codereview.chromium.org/451633006
* ARM Skia NEON patches - 40 - arm64: S32A_D565_OpaqueGravatar kevin.petit2014-06-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Here are some perf results: +-------+------------+------------+ | count | Cortex-A53 | Cortex-A57 | +-------+------------+------------+ | 1 | -2.54% | -5.39% | +-------+------------+------------+ | 2 | -0.66% | -2.08% | +-------+------------+------------+ | 4 | -11.13% | 0.00% | +-------+------------+------------+ | 8 | -5.79% | -1.30% | +-------+------------+------------+ | 16 | 71.60% | 93.27% | +-------+------------+------------+ | 64 | 30.99% | 57.35% | +-------+------------+------------+ | 256 | 25.41% | 52.59% | +-------+------------+------------+ | 1024 | 25.56% | 53.76% | +-------+------------+------------+ Signed-off-by: Kévin PETIT <kevin.petit@arm.com> BUG=skia: R=mtklein@google.com, djsollen@google.com Author: kevin.petit@arm.com Review URL: https://codereview.chromium.org/346843003
* ARM Skia NEON patches - 39 - arm64 565 blittersGravatar kevin.petit2014-06-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This enables all 565 blitters except S32A_D565_Opaque. Here are some performance results: S32_D565_Opaque: ================ +-------+------------+------------+ | count | Cortex-A53 | Cortex-A57 | +-------+------------+------------+ | 1 | -18.37% | -13.04% | +-------+------------+------------+ | 2 | -9.90% | -13.78% | +-------+------------+------------+ | 4 | -8.28% | -6.77% | +-------+------------+------------+ | 8 | 157.63% | 78.15% | +-------+------------+------------+ | 16 | 72.67% | 44.81% | +-------+------------+------------+ | 64 | 76.78% | 40.89% | +-------+------------+------------+ | 256 | 73.85% | 36.05% | +-------+------------+------------+ | 1024 | 75.73% | 36.70% | +-------+------------+------------+ S32_D565_Blend: =============== +-------+------------+------------+ | count | Cortex-A53 | Cortex-A57 | +-------+------------+------------+ | 1 | -9.99% | -13.79% | +-------+------------+------------+ | 2 | -9.17% | -6.74% | +-------+------------+------------+ | 4 | -6.73% | -4.42% | +-------+------------+------------+ | 8 | 163.31% | 112.82% | +-------+------------+------------+ | 16 | 55.21% | 44.68% | +-------+------------+------------+ | 64 | 54.09% | 41.99% | +-------+------------+------------+ | 256 | 52.63% | 40.64% | +-------+------------+------------+ | 1024 | 52.46% | 40.45% | +-------+------------+------------+ S32A_D565_Blend: ================ +-------+------------+------------+ | count | Cortex-A53 | Cortex-A57 | +-------+------------+------------+ | 1 | -5.88% | -6.06% | +-------+------------+------------+ | 2 | -4.74% | -0.01% | +-------+------------+------------+ | 4 | -5.42% | -3.03% | +-------+------------+------------+ | 8 | 78.78% | 77.96% | +-------+------------+------------+ | 16 | 98.19% | 79.61% | +-------+------------+------------+ | 64 | 111.56% | 72.60% | +-------+------------+------------+ | 256 | 113.80% | 69.96% | +-------+------------+------------+ | 1024 | 114.42% | 70.85% | +-------+------------+------------+ S32_D565_Opaque_Dither: ======================= +-------+------------+------------+ | count | Cortex-A53 | Cortex-A57 | +-------+------------+------------+ | 1 | -4.18% | -0.93% | +-------+------------+------------+ | 2 | -2.43% | -2.04% | +-------+------------+------------+ | 4 | -1.09% | -1.23% | +-------+------------+------------+ | 8 | 184.89% | 136.53% | +-------+------------+------------+ | 16 | 128.64% | 89.11% | +-------+------------+------------+ | 64 | 132.68% | 100.98% | +-------+------------+------------+ | 256 | 157.02% | 100.86% | +-------+------------+------------+ | 1024 | 163.85% | 103.62% | +-------+------------+------------+ S32_D565_Blend_Dither: ====================== +-------+------------+------------+ | count | Cortex-A53 | Cortex-A57 | +-------+------------+------------+ | 1 | -4.87% | 0.01% | +-------+------------+------------+ | 2 | -2.71% | 2.97% | +-------+------------+------------+ | 4 | -2.20% | 0.28% | +-------+------------+------------+ | 8 | 149.76% | 146.80% | +-------+------------+------------+ | 16 | 85.69% | 95.77% | +-------+------------+------------+ | 64 | 88.81% | 101.39% | +-------+------------+------------+ | 256 | 97.32% | 107.22% | +-------+------------+------------+ | 1024 | 98.08% | 115.71% | +-------+------------+------------+ S32A_D565_Opaque_Dither: ======================== +-------+------------+------------+ | count | Cortex-A53 | Cortex-A57 | +-------+------------+------------+ | 1 | -1.86% | 0.02% | +-------+------------+------------+ | 2 | -0.58% | -1.52% | +-------+------------+------------+ | 4 | -0.75% | 1.16% | +-------+------------+------------+ | 8 | 240.74% | 155.16% | +-------+------------+------------+ | 16 | 181.97% | 132.15% | +-------+------------+------------+ | 64 | 203.11% | 136.48% | +-------+------------+------------+ | 256 | 223.45% | 133.05% | +-------+------------+------------+ | 1024 | 225.96% | 134.05% | +-------+------------+------------+ Signed-off-by: Kévin PETIT <kevin.petit@arm.com> BUG=skia: R=djsollen@google.com, mtklein@google.com Author: kevin.petit@arm.com Review URL: https://codereview.chromium.org/317193003
* SK_CPU_ARM --> SK_CPU_ARM32Gravatar mtklein2014-06-03
| | | | | | | | | | | That's what it means. It keeps confusing us as named today. BUG=skia: R=djsollen@google.com, mtklein@google.com, reed@google.com Author: mtklein@chromium.org Review URL: https://codereview.chromium.org/314643004
* ARM Skia NEON patches - 38 - arm64 8888 blittersGravatar kevin.petit2014-06-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Enable NEON on arm64 for most 8888 blitters This patch enables NEON optimisation for the Color32, S32_Blend, S32A_Opaque blitters on arm64. Here are the perf improvements vs the existing code: Color32: ======== +-------+------------+------------+ | count | Cortex-A53 | Cortex-A57 | +-------+------------+------------+ | 1 | -2.39% | 23.78% | +-------+------------+------------+ | 2 | -5.46% | 8.88% | +-------+------------+------------+ | 4 | -4.74% | 4.89% | +-------+------------+------------+ | 8 | 67.74% | 107.12% | +-------+------------+------------+ | 16 | 40.03% | 101.20% | +-------+------------+------------+ | 64 | 11.09% | 98.40% | +-------+------------+------------+ | 256 | -2.20% | 74.81% | +-------+------------+------------+ | 1024 | -4.28% | 78.90% | +-------+------------+------------+ S32_Blend: ========== +-------+------------+------------+ | count | Cortex-A53 | Cortex-A57 | +-------+------------+------------+ | 1 | 7.84% | -6.75% | +-------+------------+------------+ | 2 | 28.95% | 39.77% | +-------+------------+------------+ | 4 | 5.80% | 8.26% | +-------+------------+------------+ | 8 | 1.35% | 33.80% | +-------+------------+------------+ | 16 | -2.13% | 41.13% | +-------+------------+------------+ | 64 | -4.91% | 42.84% | +-------+------------+------------+ | 256 | -6.53% | 48.72% | +-------+------------+------------+ | 1024 | -6.65% | 46.66% | +-------+------------+------------+ S32A_Opaque: ============ +-------+------------+------------+ | count | Cortex-A53 | Cortex-A57 | +-------+------------+------------+ | 1 | -7.51% | -19.06% | +-------+------------+------------+ | 2 | -5.02% | -27.70% | +-------+------------+------------+ | 4 | 15.38% | -21.66% | +-------+------------+------------+ | 8 | -0.98% | 1.05% | +-------+------------+------------+ | 16 | -7.35% | 3.34% | +-------+------------+------------+ | 64 | 50.53% | 94.63% | +-------+------------+------------+ | 256 | 71.17% | 164.10% | +-------+------------+------------+ | 1024 | 79.58% | 197.60% | +-------+------------+------------+ Signed-off-by: Kevin PETIT <kevin.petit@arm.com> BUG=skia: R=djsollen@google.com, mtklein@google.com Author: kevin.petit@arm.com Review URL: https://codereview.chromium.org/302283003
* Remove the unused SkCachePreload_armGravatar commit-bot@chromium.org2014-04-30
| | | | | | | | | | | | | Signed-off-by: Kévin PETIT <kevin.petit@arm.com> BUG=skia: R=djsollen@google.com, mtklein@google.com Author: kevin.petit@arm.com Review URL: https://codereview.chromium.org/263553008 git-svn-id: http://skia.googlecode.com/svn/trunk@14456 2bbb7eff-a529-9590-31e7-b0007b416f81
* ARM Skia NEON patches - 36 - Color32Gravatar commit-bot@chromium.org2014-04-29
| | | | | | | | | | | | | | | | | | Convert Color32 to intrinsics This change is performance-neutral for high values of count and is a big improvement for values smaller than 64. Signed-off-by: Kévin PETIT <kevin.petit@arm.com> BUG=skia: R=djsollen@google.com, mtklein@google.com, borenet@google.com Author: kevin.petit@arm.com Review URL: https://codereview.chromium.org/258173005 git-svn-id: http://skia.googlecode.com/svn/trunk@14435 2bbb7eff-a529-9590-31e7-b0007b416f81
* ARM Skia NEON patches - 22 - S32_D565_BlendGravatar commit-bot@chromium.org2014-04-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BlitRow565: new NEON version of S32_D565_Blend This new implementation brings a good speedup in most cases and gives exact results (removes one mismatch in gm). Here are the benchmark results (speedup vs. existing S32A_D565_Blend): +-------+-----------+------------+ | count | Cortex-A9 | Cortex-A15 | +-------+-----------+------------+ | 1 | -26,7% | -27,5% | +-------+-----------+------------+ | 2 | 0% | +53% | +-------+-----------+------------+ | 4 | +38,3% | +26,5% | +-------+-----------+------------+ | 8 | +10,9% | -4,5% | +-------+-----------+------------+ | 16 | +18,2% | +1,6% | +-------+-----------+------------+ | 64 | +22,3% | +8,75% | +-------+-----------+------------+ | 256 | +12,3% | +11,2% | +-------+-----------+------------+ | 1024 | +79,2% | +10,9% | +-------+-----------+------------+ Signed-off-by: Kévin PETIT <kevin.petit@arm.com> BUG=skia: R=djsollen@google.com, mtklein@google.com Author: kevin.petit@arm.com Review URL: https://codereview.chromium.org/181523002 git-svn-id: http://skia.googlecode.com/svn/trunk@14103 2bbb7eff-a529-9590-31e7-b0007b416f81
* ARM Skia NEON patches - 25 - S32A_D565_Opaque_Dither clean/bugfix/speedGravatar commit-bot@chromium.org2014-02-25
| | | | | | | | | | | | | | | | | | | | BlitRow565: S32A_D565_Opaque_Dither: some improvements - Supports ARGB and ABGR - Less magic numbers - Reduced instruction count : 5-25% speedup - Fixed indentation, removed some commented and useless code Signed-off-by: Kévin PETIT <kevin.petit@arm.com> BUG=skia: R=djsollen@google.com, mtklein@google.com Author: kevin.petit@arm.com Review URL: https://codereview.chromium.org/177963003 git-svn-id: http://skia.googlecode.com/svn/trunk@13577 2bbb7eff-a529-9590-31e7-b0007b416f81
* ARM Skia NEON patches - 12 - S32_BlendGravatar commit-bot@chromium.org2014-02-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Blitrow32: S32_Blend fix and little speed improvement - the results are now exactly similar as the C code - the speed has improved, especially for small values of count +-------+-----------+------------+ | count | Cortex-A9 | Cortex-A15 | +-------+-----------+------------+ | 1 | +30% | +18% | +-------+-----------+------------+ | 2 | 0 | 0 | +-------+-----------+------------+ | 4 | - <1% | +14% | +-------+-----------+------------+ | > 4 | -0.5..+5% | -0.5..+4% | +-------+-----------+------------+ Signed-off-by: Kévin PETIT <kevin.petit@arm.com> BUG=skia: Committed: http://code.google.com/p/skia/source/detail?r=13532 R=djsollen@google.com, mtklein@google.com Author: kevin.petit@arm.com Review URL: https://codereview.chromium.org/158973002 git-svn-id: http://skia.googlecode.com/svn/trunk@13543 2bbb7eff-a529-9590-31e7-b0007b416f81
* Revert of ARM Skia NEON patches - 12 - S32_Blend ↵Gravatar commit-bot@chromium.org2014-02-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | (https://codereview.chromium.org/158973002/) Reason for revert: Breaking the build. See http://108.170.219.164:10117/builders/Build-Ubuntu12-GCC-Arm7-Debug-Nexus4/builds/2966 (and others). We are getting warnings that vsrc and vdst may be uninitialized. Please fix and resubmit. Original issue's description: > ARM Skia NEON patches - 12 - S32_Blend > > Blitrow32: S32_Blend fix and little speed improvement > > - the results are now exactly similar as the C code > - the speed has improved, especially for small values of count > > +-------+-----------+------------+ > | count | Cortex-A9 | Cortex-A15 | > +-------+-----------+------------+ > | 1 | +30% | +18% | > +-------+-----------+------------+ > | 2 | 0 | 0 | > +-------+-----------+------------+ > | 4 | - <1% | +14% | > +-------+-----------+------------+ > | > 4 | -0.5..+5% | -0.5..+4% | > +-------+-----------+------------+ > > Signed-off-by: Kévin PETIT <kevin.petit@arm.com> > > BUG=skia: > > Committed: http://code.google.com/p/skia/source/detail?r=13532 R=djsollen@google.com, mtklein@google.com, kevin.petit@arm.com TBR=djsollen@google.com, kevin.petit@arm.com, mtklein@google.com NOTREECHECKS=true NOTRY=true BUG=skia: Author: scroggo@google.com Review URL: https://codereview.chromium.org/175433002 git-svn-id: http://skia.googlecode.com/svn/trunk@13534 2bbb7eff-a529-9590-31e7-b0007b416f81
* ARM Skia NEON patches - 12 - S32_BlendGravatar commit-bot@chromium.org2014-02-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Blitrow32: S32_Blend fix and little speed improvement - the results are now exactly similar as the C code - the speed has improved, especially for small values of count +-------+-----------+------------+ | count | Cortex-A9 | Cortex-A15 | +-------+-----------+------------+ | 1 | +30% | +18% | +-------+-----------+------------+ | 2 | 0 | 0 | +-------+-----------+------------+ | 4 | - <1% | +14% | +-------+-----------+------------+ | > 4 | -0.5..+5% | -0.5..+4% | +-------+-----------+------------+ Signed-off-by: Kévin PETIT <kevin.petit@arm.com> BUG=skia: R=djsollen@google.com, mtklein@google.com Author: kevin.petit@arm.com Review URL: https://codereview.chromium.org/158973002 git-svn-id: http://skia.googlecode.com/svn/trunk@13532 2bbb7eff-a529-9590-31e7-b0007b416f81
* ARM Skia NEON patches - 27 - S32A_D565_BlendGravatar commit-bot@chromium.org2014-02-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BlitRow565: new intrinsics version of S32A_D565_Blend This new version is basically a rewrite of the existing code with a few speed and accuracy improvements. There is a switch to enable pixel perfect results at the cost of a (quite big) decrease of performances (disabled in this patch). Here are the benchmark results (speedup vs. existing code): +-------+------------+------------+ | count | Cortex -A9 | Cortex-A15 | +-------+------------+------------+ | 1 | +103.6% | +12% | +-------+------------+------------+ | 2 | +3.6% | +21.6% | +-------+------------+------------+ | 4 | +0.8% | -0.8% | +-------+------------+------------+ | 8 | +3.9% | -1% | +-------+------------+------------+ | 16 | +14.7% | +5.7% | +-------+------------+------------+ | 64 | +18.1% | +13.2% | +-------+------------+------------+ | 256 | +16.3% | +27.4% | +-------+------------+------------+ | 1024 | +78.2% | +17.4% | +-------+------------+------------+ Signed-off-by: Kévin PETIT <kevin.petit@arm.com> BUG=skia: R=djsollen@google.com, mtklein@google.com, halcanary@google.com Author: kevin.petit@arm.com Review URL: https://codereview.chromium.org/156113005 git-svn-id: http://skia.googlecode.com/svn/trunk@13438 2bbb7eff-a529-9590-31e7-b0007b416f81
* ARM Skia NEON patches - 31 - Xfermode: xfer16Gravatar commit-bot@chromium.org2013-11-08
| | | | | | | | | | | | | | | | | | Xfermode: xfer16 This adds support for 16bit Xfermodes. It also tunes the gcc test macros in xfer32() to add compatibility for gcc > 4. Signed-off-by: Kévin PETIT <kevin.petit@arm.com> BUG= R=djsollen@google.com, mtklein@google.com, reed@google.com Author: kevin.petit.arm@gmail.com Review URL: https://codereview.chromium.org/33063002 git-svn-id: http://skia.googlecode.com/svn/trunk@12192 2bbb7eff-a529-9590-31e7-b0007b416f81
* ARM Skia NEON patches - 24 - S32_D565_Blend_Dither slight speedup/bugfixGravatar commit-bot@chromium.org2013-09-26
| | | | | | | | | | | | | | | | | | | | BlitRow565: S32_D565_Blend_Dither, slight speedup + bugfix This patch adds a rewrite of S32_D565_Blend_Dither in intrinsics. The newer version is faster (10-20% depending on the value of count) and also supports ARGB as well as ABGR. It also adds the missing assert at the beginning of the function. Signed-off-by: Kévin PETIT <kevin.petit@arm.com> BUG= R=djsollen@google.com, mtklein@google.com Author: kevin.petit.arm@gmail.com Review URL: https://chromiumcodereview.appspot.com/22566002 git-svn-id: http://skia.googlecode.com/svn/trunk@11473 2bbb7eff-a529-9590-31e7-b0007b416f81
* ARM Skia NEON patches - 21 - new NEON S32_D565_OpaqueGravatar commit-bot@chromium.org2013-09-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BlitRow565: NEON version of S32_D565_Opaque Here's a new implementation of S32_D565_Opaque in NEON. It improves dramatically the speed compared to S32A_D565_Opaque. Here are the benchmark results (speedup vs. existing NEON): +-------+-----------+------------+ | count | Cortex-A9 | Cortex-A15 | +-------+-----------+------------+ | 1 | +130% | +139% | +-------+-----------+------------+ | 2 | +65,2% | +51% | +-------+-----------+------------+ | 4 | -25,5% | +10,2% | +-------+-----------+------------+ | 8 | +63,8% | +32,1% | +-------+-----------+------------+ | 16 | +110% | +49,2% | +-------+-----------+------------+ | 64 | +153% | +123,5% | +-------+-----------+------------+ | 256 | +151% | +144,7% | +-------+-----------+------------+ | 1024 | +272% | +157,2% | +-------+-----------+------------+ Signed-off-by: Kévin PETIT <kevin.petit@arm.com> BUG= R=djsollen@google.com, mtklein@google.com Author: kevin.petit.arm@gmail.com Review URL: https://chromiumcodereview.appspot.com/22351006 git-svn-id: http://skia.googlecode.com/svn/trunk@11415 2bbb7eff-a529-9590-31e7-b0007b416f81
* ARM Skia NEON patches - 23 - S32_D565_Opaque_Dither cleanup/bugfix/speedGravatar commit-bot@chromium.org2013-09-18
| | | | | | | | | | | | | | | | | | | | | | | BlitRow565: S32_D565_Opaque_Dither: cleaning / bugfix This patch brings a little code cleaning (spaces/comments) and a little speed improvement (by using post-incrementation in the asm) but more importantly it fixes a bug on Linux. The new code now supports ARGB as well as ABGR. I removed the comment as I have confirmed with benchmarks that this code bring a *massive* (3x-7x) speedup compared to the C code. Signed-off-by: Kévin PETIT <kevin.petit@arm.com> BUG= R=djsollen@google.com, mtklein@google.com Author: kevin.petit.arm@gmail.com Review URL: https://chromiumcodereview.appspot.com/22269003 git-svn-id: http://skia.googlecode.com/svn/trunk@11339 2bbb7eff-a529-9590-31e7-b0007b416f81
* Cleanup the ARM blitrow optimizationsGravatar djsollen@google.com2013-08-09
| | | | | | | | R=mtklein@google.com Review URL: https://codereview.chromium.org/22229002 git-svn-id: http://skia.googlecode.com/svn/trunk@10652 2bbb7eff-a529-9590-31e7-b0007b416f81
* ARM Skia NEON patches - 14 - S32A_BlendGravatar commit-bot@chromium.org2013-08-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Blitrow32: S32A_Blend new NEON version Adding a NEON version of S32A_Blend_BlitRow32. Here are the benchmark results: +-------+--------------------------+--------------------------+ | | Speedup vs. C | Speedup vs. ARM asm | | count +------------+-------------+------------+-------------+ | | Cortex A-9 | Cortex A-15 | Cortex A-9 | Cortex A-15 | +-------+------------+-------------+------------+-------------+ | 1 | +8,5% | +18,5% | +0.9% | +2,9% | +-------+------------+-------------+------------+-------------+ | 2 | +65,6% | +94% | +70,3% | +80% | +-------+------------+-------------+------------+-------------+ | 4 | +42,4% | +87,8% | +56,8% | +84,4% | +-------+------------+-------------+------------+-------------+ | 8 | +30% | +90% | +49,9% | +82,7% | +-------+------------+-------------+------------+-------------+ | 16 | +23,1% | +95,4% | +46,6% | +87,6% | +-------+------------+-------------+------------+-------------+ | 64 | +23,1% | +95,7% | +46,1% | +89,4% | +-------+------------+-------------+------------+-------------+ | 256 | +35,5% | +122% | +53,6% | +99,2% | +-------+------------+-------------+------------+-------------+ | 1024 | +61,8% | +101% | +64,2% | +91,2% | +-------+------------+-------------+------------+-------------+ BUG= R=djsollen@google.com Author: kevin.petit.arm@gmail.com Review URL: https://chromiumcodereview.appspot.com/18614010 git-svn-id: http://skia.googlecode.com/svn/trunk@10480 2bbb7eff-a529-9590-31e7-b0007b416f81
* ARM Skia NEON patches - 01 - Simple fixesGravatar commit-bot@chromium.org2013-07-15
| | | | | | | | | | | | | | | | | | | | | | | | This series contains a few fairly non-controversial fixes. Misc: remove dead references to neon 4444 functions Misc: avoid the double _neon_neon suffix in the clamp matrix functions. MAKENAME already adds the _neon suffix Misc: a few stupid / obvious fixes BUG= R=djsollen@google.com Author: kevin.petit.arm@gmail.com Review URL: https://chromiumcodereview.appspot.com/18666004 git-svn-id: http://skia.googlecode.com/svn/trunk@10072 2bbb7eff-a529-9590-31e7-b0007b416f81
* ARM Skia NEON patches - 13 - S32A_OpaqueGravatar commit-bot@chromium.org2013-07-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Blitrow32: S32A_Opaque code cleaning and speed improvement - the old way of calculating alpha doesn't seem to be used anymore, so remove the remaining code - adding prefetching allows to improve performance greatly in some cases at the expense of a little trade-off: +-------+-----------+------------+ | count | Cortex-A9 | Cortex-A15 | +-------+-----------+------------+ | 1,2 | 0 | 0 | +-------+-----------+------------+ | 4 | 0 | -3% | +-------+-----------+------------+ | 8 | 0 | -4% | +-------+-----------+------------+ | 16 | 0 | -5% | +-------+-----------+------------+ | 64 | +14% | 0 | +-------+-----------+------------+ | 256 | +14% | +12% | +-------+-----------+------------+ | 1024 | +115% | +15% | +-------+-----------+------------+ BUG= R=djsollen@google.com Author: kevin.petit.arm@gmail.com Review URL: https://chromiumcodereview.appspot.com/18459008 git-svn-id: http://skia.googlecode.com/svn/trunk@10026 2bbb7eff-a529-9590-31e7-b0007b416f81
* Partial reapply of r5364 minus the non-neon code path.Gravatar djsollen@google.com2013-04-09
| | | | | | | | See https://codereview.appspot.com/6465075 for a more detailed description of the contents of this CL. Review URL: https://codereview.chromium.org/13060004 git-svn-id: http://skia.googlecode.com/svn/trunk@8579 2bbb7eff-a529-9590-31e7-b0007b416f81