SkAlphaMulQ(src, src_scale) + SkAlphaMulQ(dst, dst_scale), which boils down to ((src*src_scale)>>8) + ((dst*dst_scale)>>8). In particular, note that the intermediate precision is discarded before the two parts are added together, causing the final result to possibly be inaccurate.
In Firefox, we use SkCanvas::saveLayer in combination with a backdrop that initializes the layer to the background. When this is blended back onto the background using transparency, where the source and destination pixel colors are the same, the resulting color after the blend is not preserved due to the lost precision mentioned above. In cases where this operation is repeatedly performed, this causes substantially noticeable differences in color as evidenced in this downstream Firefox bug report: https://bugzilla.mozilla.org/show_bug.cgi?id=1200684
In the test-case in the downstream report, essentially it does blend(src=0xFF2E3338, dst=0xFF2E3338, scale=217), which gives the result 0xFF2E3237, while we would expect to get back 0xFF2E3338.
This problem goes away if the blend is instead reformulated to effectively do (src*src_scale + dst*dst_scale)>>8, which keeps the intermediate precision during the addition before shifting it off.
This modifies the blending operations thusly. The performance should remain mostly unchanged, or possibly improve slightly, so there should be no real downside to doing this, with the benefit of making the results more accurate. Without this, it is currently unsafe for Firefox to blend a layer back onto itself that was initialized with a copy of its background.
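The difference between the two formulations can be sketched per channel as follows. This is a minimal illustration, not Skia's code: `blend_channel_truncating`, `blend_channel_combined` and `blend_pixel` are hypothetical names, and the scale pair 218/39 used in the note below is an assumption about how scale=217 expands for an opaque source, chosen because it reproduces the result quoted above.

```cpp
#include <cassert>
#include <cstdint>

// Old formulation: each product is shifted (truncated) before the add,
// so the fractional parts of both terms are thrown away separately.
inline uint32_t blend_channel_truncating(uint32_t src, uint32_t dst,
                                         uint32_t src_scale, uint32_t dst_scale) {
    return ((src * src_scale) >> 8) + ((dst * dst_scale) >> 8);
}

// Reformulated: add the products at full precision, then shift once.
inline uint32_t blend_channel_combined(uint32_t src, uint32_t dst,
                                       uint32_t src_scale, uint32_t dst_scale) {
    return (src * src_scale + dst * dst_scale) >> 8;
}

// Apply a per-channel blend to the four 8-bit channels of a packed pixel.
template <typename Fn>
uint32_t blend_pixel(uint32_t src, uint32_t dst,
                     uint32_t src_scale, uint32_t dst_scale, Fn blend) {
    uint32_t out = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t s = (src >> shift) & 0xFF;
        uint32_t d = (dst >> shift) & 0xFF;
        out |= (blend(s, d, src_scale, dst_scale) & 0xFF) << shift;
    }
    return out;
}
```

With src = dst = 0xFF2E3338 and the assumed scales 218 and 39, the truncating version yields 0xFF2E3237 while the combined version returns 0xFF2E3338 unchanged.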
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2097883002
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
[mtklein adds...]
No public API changes.
TBR=reed@google.com
Review-Url: https://codereview.chromium.org/2097883002
This should be a pixel-for-pixel (i.e. bug-for-bug) port.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1820313002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review URL: https://codereview.chromium.org/1820313002
This piece of code is already 64-bit only, so we don't need to think about ARMv7.
Hopefully this shuts up the warnings. They were harmless.
If this doesn't work (it's a relatively new modifier, so maybe some compilers barf), an alternative is to cast count to a size_t.
BUG=skia:4686
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1527123003
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review URL: https://codereview.chromium.org/1527123003
This was a pre-SkOpts attempt that we can bring under its wing now.
This should be a perf no-op, deo volente.
BUG=skia:4117
Review URL: https://codereview.chromium.org/1314863006
DOCS_PREVIEW= https://skia.org/?cl=1316233002
Review URL: https://codereview.chromium.org/1316233002
BUG=skia:4117
Review URL: https://codereview.chromium.org/1273203002
As I begin to wade in here, it's nice to remove as much code as possible.
BUG=skia:4117
Review URL: https://codereview.chromium.org/1277953002
Mostly this is about ergonomics, making it easier to do good operations and hard / impossible to do bad ones.
- SkAlpha / SkPMColor constructors become static factories.
- Remove div255TruncNarrow(), rename div255RoundNarrow() to div255(). In practice we always want to round, and the narrowing to 8-bit is contextually obvious.
- Rename fastMulDiv255Round() to approxMulDiv255() to stress its approximate-ness over its speed. Drop Round for the same reason as above... we should always round.
- Add operator overloads so we don't have to keep throwing in seemingly-random Sk4px() or Sk4px::Wide() casts.
- use operator*() for 8-bit x 8-bit -> 16-bit math. It's always what we want, and there's generally no 8x8->8 alternative.
- MapFoo can take a const Func&. Don't think it makes a big difference, but nice to do.
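To illustrate why rounding is preferred over truncation when dividing by 255, here is a scalar sketch. It is not the Sk4px implementation: `div255_trunc` and `div255_round` are hypothetical stand-ins for the vector methods named above, and the shift-and-add form is one common way to round without an integer divide, assumed here for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Truncating division by 255: always rounds toward zero.
inline uint8_t div255_trunc(uint16_t x) {
    return static_cast<uint8_t>(x / 255);
}

// Rounding division by 255 for x in [0, 255*255], using a
// shift-and-add identity instead of an integer divide.
inline uint8_t div255_round(uint16_t x) {
    uint32_t t = x + 128u;
    return static_cast<uint8_t>((t + (t >> 8)) >> 8);
}
```

For example, 254/255 is about 0.996: the truncating version returns 0 while the rounding version returns 1.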
BUG=skia:
Review URL: https://codereview.chromium.org/1202013002
This is a spiritual revert of http://crrev.com/1104183004.
BUG=skia:
Committed: https://skia.googlesource.com/skia/+/4e13a23d8f720e17660f26657b45b89fe4339004
Review URL: https://codereview.chromium.org/1145283003
https://codereview.chromium.org/1145283003/)
Reason for revert:
http://build.chromium.org/p/tryserver.chromium.mac/builders/ios_rel_device_ninja/builds/70016/steps/compile%20%28with%20patch%29/logs/stdio
Original issue's description:
> Re-proc SkBlitRow::Color32 for ARM.
>
> This is a spiritual revert of http://crrev.com/1104183004.
>
> BUG=skia:
>
> Committed: https://skia.googlesource.com/skia/+/4e13a23d8f720e17660f26657b45b89fe4339004
TBR=reed@google.com,mtklein@chromium.org
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:
Review URL: https://codereview.chromium.org/1157633003
This is a spiritual revert of http://crrev.com/1104183004.
BUG=skia:
Review URL: https://codereview.chromium.org/1145283003
Also strips SK_SUPPORT_LEGACY_COLOR32_MATH,
which is no longer needed.
Seems handy to have SkTypes include the relevant intrinsics when
we know we've got them, but I'm not married to it.
Locally this looks like a pointlessly small perf win, but I'm mostly
keen to get all the code together.
BUG=skia:
Committed: https://skia.googlesource.com/skia/+/376e9bc206b69d9190f38dfebb132a8769bbd72b
Committed: https://skia.googlesource.com/skia/+/d65dc0cedd5b50dd407b6ff8fdc39123f11511cc
CQ_EXTRA_TRYBOTS=client.skia.compile:Build-Ubuntu-GCC-Mips-Debug-Android-Trybot
Review URL: https://codereview.chromium.org/1104183004
https://codereview.chromium.org/1104183004/)
Reason for revert:
duh
Original issue's description:
> De-proc Color32
>
> Also strips SK_SUPPORT_LEGACY_COLOR32_MATH,
> which is no longer needed.
>
> Seems handy to have SkTypes include the relevant intrinsics when
> we know we've got them, but I'm not married to it.
>
> Locally this looks like a pointlessly small perf win, but I'm mostly
> keen to get all the code together.
>
> BUG=skia:
>
> Committed: https://skia.googlesource.com/skia/+/376e9bc206b69d9190f38dfebb132a8769bbd72b
>
> Committed: https://skia.googlesource.com/skia/+/d65dc0cedd5b50dd407b6ff8fdc39123f11511cc
TBR=reed@google.com,mtklein@chromium.org
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:
Review URL: https://codereview.chromium.org/1102363006
Also strips SK_SUPPORT_LEGACY_COLOR32_MATH,
which is no longer needed.
Seems handy to have SkTypes include the relevant intrinsics when
we know we've got them, but I'm not married to it.
Locally this looks like a pointlessly small perf win, but I'm mostly
keen to get all the code together.
BUG=skia:
Committed: https://skia.googlesource.com/skia/+/376e9bc206b69d9190f38dfebb132a8769bbd72b
Review URL: https://codereview.chromium.org/1104183004
https://codereview.chromium.org/1104183004/)
Reason for revert:
MIPS
Original issue's description:
> De-proc Color32
>
> Also strips SK_SUPPORT_LEGACY_COLOR32_MATH,
> which is no longer needed.
>
> Seems handy to have SkTypes include the relevant intrinsics when
> we know we've got them, but I'm not married to it.
>
> Locally this looks like a pointlessly small perf win, but I'm mostly
> keen to get all the code together.
>
> BUG=skia:
>
> Committed: https://skia.googlesource.com/skia/+/376e9bc206b69d9190f38dfebb132a8769bbd72b
TBR=reed@google.com,mtklein@chromium.org
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:
Review URL: https://codereview.chromium.org/1108163002
Also strips SK_SUPPORT_LEGACY_COLOR32_MATH,
which is no longer needed.
Seems handy to have SkTypes include the relevant intrinsics when
we know we've got them, but I'm not married to it.
Locally this looks like a pointlessly small perf win, but I'm mostly
keen to get all the code together.
BUG=skia:
Review URL: https://codereview.chromium.org/1104183004
https://codereview.chromium.org/1098913002/)
Reason for revert:
Xfermode_SrcOver not looking encouraging. Up to 50% regressions.
https://perf.skia.org/#3242
Original issue's description:
> Convert Color32 code to perfect blend.
>
> Before we commit to blend_256_round_alt, let's make sure blend_perfect is
> really slower in practice (i.e. regresses on perf.skia.org).
>
> blend_perfect is really the most desirable algorithm if we can afford it. Not
> only is it correct, but it's easy to think about and break into correct pieces:
> for instance, its div255() doesn't require any coordination with the multiply.
>
> This looks like a 30% hit according to microbenches. That said, microbenches
> said my previous change would be a 20-25% perf improvement, but it didn't end
> up showing a significant effect at a high level.
>
> As for correctness, I see a bunch of off-by-1 compared to blend_256_round_alt
> (exactly what we'd expect), and one off-by-3 in a GM that looks like it has a
> bunch of overdraw.
>
> BUG=skia:
>
> Committed: https://skia.googlesource.com/skia/+/61221e7f87a99765b0e034020e06bb018e2a08c2
TBR=reed@google.com,fmalita@chromium.org,mtklein@chromium.org
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:
Review URL: https://codereview.chromium.org/1083923006
Before we commit to blend_256_round_alt, let's make sure blend_perfect is
really slower in practice (i.e. regresses on perf.skia.org).
blend_perfect is really the most desirable algorithm if we can afford it. Not
only is it correct, but it's easy to think about and break into correct pieces:
for instance, its div255() doesn't require any coordination with the multiply.
This looks like a 30% hit according to microbenches. That said, microbenches
said my previous change would be a 20-25% perf improvement, but it didn't end
up showing a significant effect at a high level.
As for correctness, I see a bunch of off-by-1 compared to blend_256_round_alt
(exactly what we'd expect), and one off-by-3 in a GM that looks like it has a
bunch of overdraw.
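As a rough sketch of what an exact blend looks like, assuming a simple per-channel interpolation between src and dst (this is not the code in tests/BlendTest.cpp; `div255` and `blend_exact` are hypothetical names):

```cpp
#include <cassert>
#include <cstdint>

// Exact rounding division by 255, written as a plain integer divide so it
// needs no coordination with the multiply that feeds it.
inline uint8_t div255(uint32_t x) {
    return static_cast<uint8_t>((x + 127) / 255);
}

// Exact blend of one channel: src weighted by alpha, dst by (255 - alpha).
inline uint8_t blend_exact(uint8_t dst, uint8_t src, uint8_t alpha) {
    return div255(src * alpha + dst * (255u - alpha));
}
```

Because the two weights always sum to 255, alpha 0 returns dst, alpha 255 returns src, and blending a value with itself returns that value.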
BUG=skia:
Review URL: https://codereview.chromium.org/1098913002
This algorithm changes the blend math, guarded by SK_LEGACY_COLOR32_MATH. The new math is more correct: it's never off by more than 1, and correct in all the interesting 0x00 and 0xFF edge cases, where the old math was never off by more than 2, and not always correct on the edges.
If you look at tests/BlendTest.cpp, the old code was using the `blend_256_plus1_trunc` algorithm, while the new code uses `blend_256_round_alt`. Neither uses `blend_perfect`, which is about 35% slower than `blend_256_round_alt`.
This will require an unfathomable number of rebaselines, first to Skia, then to Blink when I remove the guard.
I plan to follow up with some integer SIMD abstractions that can unify these two implementations into a single algorithm. This was originally what I was working on here, but the correctness gains seem to be quite compelling. The only place these two algorithms really differ greatly now is the kernel function, and even there they can really both be expressed abstractly as:
- multiply 8-bits and 8-bits producing 16-bits
- add 16-bits to 16-bits, returning the top 8 bits.
All the constants are the same, except SSE is a little faster to keep 8 16-bit inverse alphas, NEON's a little faster to keep 8 8-bit inverse alphas. I may need to take this small speed win back to unify the two.
We should expect a ~25% speedup on Intel (mostly from unrolling to 8 pixels) and a ~20% speedup on ARM (mostly from using vaddhn to add `color`, round, and narrow back down to 8-bit all into one instruction).
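The two abstract kernel steps listed above can be sketched in scalar form. This is only an illustration of the shape of the computation, not the SSE or NEON code; all function and parameter names are hypothetical, and how `color16` is pre-scaled (including any rounding bias) is left as an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Step 1: 8-bit x 8-bit multiply producing a 16-bit result.
inline uint16_t mul_8x8_to_16(uint8_t a, uint8_t b) {
    return static_cast<uint16_t>(a * b);
}

// Step 2: add two 16-bit values (wrapping at 16 bits) and keep the top 8 bits.
inline uint8_t add_16_keep_high_8(uint16_t a, uint16_t b) {
    return static_cast<uint8_t>(static_cast<uint16_t>(a + b) >> 8);
}

// One channel of the kernel: dst scaled by an inverse alpha, plus a
// pre-scaled 16-bit color term (both parameters are illustrative).
inline uint8_t kernel_channel(uint8_t dst, uint8_t inv_alpha, uint16_t color16) {
    return add_16_keep_high_8(mul_8x8_to_16(dst, inv_alpha), color16);
}
```

The second step mirrors what an add-and-narrow-high instruction does in a single operation per lane.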
(I am probably missing several more related bugs here.)
BUG=skia:3738,skia:420,chromium:111470
Review URL: https://codereview.chromium.org/1092433002
I don't see any color-order handling logic in the 32-bit code.
BUG=skia:1843
CQ_EXCLUDE_TRYBOTS=client.skia.compile:Build-Win-MSVC-x86-Debug-Trybot,Build-Win-MSVC-x86_64-Debug-Trybot
R=mtklein@google.com
Review URL: https://codereview.chromium.org/1051683003
Don't call the fast blur path (DoubleRowBoxBlur_NEON)
when kernelsize is 1; otherwise uint16x8_t resultPixels will overflow.
BUG=skia:2845
R=senorblanco@chromium.org
Review URL: https://codereview.chromium.org/587543003
This includes a NEON implementation of blend32_16_row
for aarch32 and aarch64.
For performance, blend32_16_row is called in the following
nanobench tests:
- Xfermode_SrcOver
- tablebench
- rotated_rects_bw_alternating_transparent_and_opaque_srcover
- rotated_rects_bw_changing_transparent_srcover
- rotated_rects_bw_same_transparent_srcover
- luma_colorfilter_large
- luma_colorfilter_small
- chart_bw
I can see a perf increase in the following two tests in particular; the
others look similar.
I ran each test twice.
1) Xfermode_SrcOver
<org>
- D/skia ( 2000): 3M 57 17.3µs 17.4µs 17.4µs 17.7µs 1%
█▃▂▃▂▂▂▁▃▂ 565 Xfermode_SrcOver
- D/skia ( 1915): 3M 70 13.5µs 16.9µs 16.7µs 18.8µs 9%
▆█▄▅█▁▅▅▆▄ 565 Xfermode_SrcOver
<new>
- D/skia ( 2000): 3M 8 11.6µs 11.8µs 12.1µs 14.4µs 7%
▃█▁▁▂▁▁▁▂▂ 565 Xfermode_SrcOver
- D/skia ( 2004): 3M 62 10.3µs 12.9µs 13µs 15.2µs 11%
█▅▅▆▁▅▅▅▇▃ 565 Xfermode_SrcOver
2) luma_colorfilter_large
<org>
- D/skia ( 2000): 159M 8 136µs 136µs 136µs 139µs 1%
█▃▁▂▁▁▁▁▁▁ 565 luma_colorfilter_large
- D/skia ( 1915): 158M 2 135µs 177µs 182µs 269µs 22%
▆▃█▁▁▃▃▃▃▃ 565 luma_colorfilter_large
<new>
- D/skia ( 2000): 157M 5 84.2µs 85.3µs 87.5µs 110µs 9%
█▁▂▁▁▁▁▁▁▁ 565 luma_colorfilter_large
- D/skia ( 2004): 159M 6 84.7µs 110µs 112µs 144µs 18%
█▄▇▁▁▄▃▄▄▆ 565 luma_colorfilter_large
Review URL: https://codereview.chromium.org/847363002
BUG=skia:3302
Review URL: https://codereview.chromium.org/847443003
BUG=skia:2797
Committed: https://skia.googlesource.com/skia/+/84cab93186fbe3e87d931fea73cb31b70ff5017b
R=mtklein@google.com
Author: mtklein@chromium.org
Review URL: https://codereview.chromium.org/497823002
BUG=skia:2797
R=mtklein@google.com
Author: mtklein@chromium.org
Review URL: https://codereview.chromium.org/497823002
BUG=skia:2845
R=mtklein@google.com
Author: reed@google.com
Review URL: https://codereview.chromium.org/498733002
1. vuzpq is a gcc instruction. Replace it with the equivalent vuzp
(see http://llvm.org/PR20423)
2. .func / .endfunc only have an effect with -gstabs, which we don't
use. As it's unused and clang doesn't support it, remove
.func / .endfunc (also see http://llvm.org/20424)
BUG=chromium:124610
R=mtklein@google.com
Author: thakis@chromium.org
Review URL: https://codereview.chromium.org/461693004
Signed-off-by: Kévin PETIT <kevin.petit@arm.com>
BUG=skia:2813
R=halcanary@google.com, djsollen@google.com, mtklein@google.com
Author: kevin.petit@arm.com
Review URL: https://codereview.chromium.org/458453002
R=halcanary@google.com, mtklein@google.com, kevin.petit@arm.com
Author: djsollen@google.com
Review URL: https://codereview.chromium.org/451633006
Here are some perf results:
+-------+------------+------------+
| count | Cortex-A53 | Cortex-A57 |
+-------+------------+------------+
| 1 | -2.54% | -5.39% |
+-------+------------+------------+
| 2 | -0.66% | -2.08% |
+-------+------------+------------+
| 4 | -11.13% | 0.00% |
+-------+------------+------------+
| 8 | -5.79% | -1.30% |
+-------+------------+------------+
| 16 | 71.60% | 93.27% |
+-------+------------+------------+
| 64 | 30.99% | 57.35% |
+-------+------------+------------+
| 256 | 25.41% | 52.59% |
+-------+------------+------------+
| 1024 | 25.56% | 53.76% |
+-------+------------+------------+
Signed-off-by: Kévin PETIT <kevin.petit@arm.com>
BUG=skia:
R=mtklein@google.com, djsollen@google.com
Author: kevin.petit@arm.com
Review URL: https://codereview.chromium.org/346843003
This enables all 565 blitters except S32A_D565_Opaque.
Here are some performance results:
S32_D565_Opaque:
================
+-------+------------+------------+
| count | Cortex-A53 | Cortex-A57 |
+-------+------------+------------+
| 1 | -18.37% | -13.04% |
+-------+------------+------------+
| 2 | -9.90% | -13.78% |
+-------+------------+------------+
| 4 | -8.28% | -6.77% |
+-------+------------+------------+
| 8 | 157.63% | 78.15% |
+-------+------------+------------+
| 16 | 72.67% | 44.81% |
+-------+------------+------------+
| 64 | 76.78% | 40.89% |
+-------+------------+------------+
| 256 | 73.85% | 36.05% |
+-------+------------+------------+
| 1024 | 75.73% | 36.70% |
+-------+------------+------------+
S32_D565_Blend:
===============
+-------+------------+------------+
| count | Cortex-A53 | Cortex-A57 |
+-------+------------+------------+
| 1 | -9.99% | -13.79% |
+-------+------------+------------+
| 2 | -9.17% | -6.74% |
+-------+------------+------------+
| 4 | -6.73% | -4.42% |
+-------+------------+------------+
| 8 | 163.31% | 112.82% |
+-------+------------+------------+
| 16 | 55.21% | 44.68% |
+-------+------------+------------+
| 64 | 54.09% | 41.99% |
+-------+------------+------------+
| 256 | 52.63% | 40.64% |
+-------+------------+------------+
| 1024 | 52.46% | 40.45% |
+-------+------------+------------+
S32A_D565_Blend:
================
+-------+------------+------------+
| count | Cortex-A53 | Cortex-A57 |
+-------+------------+------------+
| 1 | -5.88% | -6.06% |
+-------+------------+------------+
| 2 | -4.74% | -0.01% |
+-------+------------+------------+
| 4 | -5.42% | -3.03% |
+-------+------------+------------+
| 8 | 78.78% | 77.96% |
+-------+------------+------------+
| 16 | 98.19% | 79.61% |
+-------+------------+------------+
| 64 | 111.56% | 72.60% |
+-------+------------+------------+
| 256 | 113.80% | 69.96% |
+-------+------------+------------+
| 1024 | 114.42% | 70.85% |
+-------+------------+------------+
S32_D565_Opaque_Dither:
=======================
+-------+------------+------------+
| count | Cortex-A53 | Cortex-A57 |
+-------+------------+------------+
| 1 | -4.18% | -0.93% |
+-------+------------+------------+
| 2 | -2.43% | -2.04% |
+-------+------------+------------+
| 4 | -1.09% | -1.23% |
+-------+------------+------------+
| 8 | 184.89% | 136.53% |
+-------+------------+------------+
| 16 | 128.64% | 89.11% |
+-------+------------+------------+
| 64 | 132.68% | 100.98% |
+-------+------------+------------+
| 256 | 157.02% | 100.86% |
+-------+------------+------------+
| 1024 | 163.85% | 103.62% |
+-------+------------+------------+
S32_D565_Blend_Dither:
======================
+-------+------------+------------+
| count | Cortex-A53 | Cortex-A57 |
+-------+------------+------------+
| 1 | -4.87% | 0.01% |
+-------+------------+------------+
| 2 | -2.71% | 2.97% |
+-------+------------+------------+
| 4 | -2.20% | 0.28% |
+-------+------------+------------+
| 8 | 149.76% | 146.80% |
+-------+------------+------------+
| 16 | 85.69% | 95.77% |
+-------+------------+------------+
| 64 | 88.81% | 101.39% |
+-------+------------+------------+
| 256 | 97.32% | 107.22% |
+-------+------------+------------+
| 1024 | 98.08% | 115.71% |
+-------+------------+------------+
S32A_D565_Opaque_Dither:
========================
+-------+------------+------------+
| count | Cortex-A53 | Cortex-A57 |
+-------+------------+------------+
| 1 | -1.86% | 0.02% |
+-------+------------+------------+
| 2 | -0.58% | -1.52% |
+-------+------------+------------+
| 4 | -0.75% | 1.16% |
+-------+------------+------------+
| 8 | 240.74% | 155.16% |
+-------+------------+------------+
| 16 | 181.97% | 132.15% |
+-------+------------+------------+
| 64 | 203.11% | 136.48% |
+-------+------------+------------+
| 256 | 223.45% | 133.05% |
+-------+------------+------------+
| 1024 | 225.96% | 134.05% |
+-------+------------+------------+
Signed-off-by: Kévin PETIT <kevin.petit@arm.com>
BUG=skia:
R=djsollen@google.com, mtklein@google.com
Author: kevin.petit@arm.com
Review URL: https://codereview.chromium.org/317193003
That's what it means. It keeps confusing us as named today.
BUG=skia:
R=djsollen@google.com, mtklein@google.com, reed@google.com
Author: mtklein@chromium.org
Review URL: https://codereview.chromium.org/314643004
Enable NEON on arm64 for most 8888 blitters
This patch enables NEON optimisation for the Color32, S32_Blend,
S32A_Opaque blitters on arm64.
Here are the perf improvements vs the existing code:
Color32:
========
+-------+------------+------------+
| count | Cortex-A53 | Cortex-A57 |
+-------+------------+------------+
| 1 | -2.39% | 23.78% |
+-------+------------+------------+
| 2 | -5.46% | 8.88% |
+-------+------------+------------+
| 4 | -4.74% | 4.89% |
+-------+------------+------------+
| 8 | 67.74% | 107.12% |
+-------+------------+------------+
| 16 | 40.03% | 101.20% |
+-------+------------+------------+
| 64 | 11.09% | 98.40% |
+-------+------------+------------+
| 256 | -2.20% | 74.81% |
+-------+------------+------------+
| 1024 | -4.28% | 78.90% |
+-------+------------+------------+
S32_Blend:
==========
+-------+------------+------------+
| count | Cortex-A53 | Cortex-A57 |
+-------+------------+------------+
| 1 | 7.84% | -6.75% |
+-------+------------+------------+
| 2 | 28.95% | 39.77% |
+-------+------------+------------+
| 4 | 5.80% | 8.26% |
+-------+------------+------------+
| 8 | 1.35% | 33.80% |
+-------+------------+------------+
| 16 | -2.13% | 41.13% |
+-------+------------+------------+
| 64 | -4.91% | 42.84% |
+-------+------------+------------+
| 256 | -6.53% | 48.72% |
+-------+------------+------------+
| 1024 | -6.65% | 46.66% |
+-------+------------+------------+
S32A_Opaque:
============
+-------+------------+------------+
| count | Cortex-A53 | Cortex-A57 |
+-------+------------+------------+
| 1 | -7.51% | -19.06% |
+-------+------------+------------+
| 2 | -5.02% | -27.70% |
+-------+------------+------------+
| 4 | 15.38% | -21.66% |
+-------+------------+------------+
| 8 | -0.98% | 1.05% |
+-------+------------+------------+
| 16 | -7.35% | 3.34% |
+-------+------------+------------+
| 64 | 50.53% | 94.63% |
+-------+------------+------------+
| 256 | 71.17% | 164.10% |
+-------+------------+------------+
| 1024 | 79.58% | 197.60% |
+-------+------------+------------+
Signed-off-by: Kevin PETIT <kevin.petit@arm.com>
BUG=skia:
R=djsollen@google.com, mtklein@google.com
Author: kevin.petit@arm.com
Review URL: https://codereview.chromium.org/302283003
Signed-off-by: Kévin PETIT <kevin.petit@arm.com>
BUG=skia:
R=djsollen@google.com, mtklein@google.com
Author: kevin.petit@arm.com
Review URL: https://codereview.chromium.org/263553008
git-svn-id: http://skia.googlecode.com/svn/trunk@14456 2bbb7eff-a529-9590-31e7-b0007b416f81
Convert Color32 to intrinsics
This change is performance-neutral for high values of count and is
a big improvement for values smaller than 64.
Signed-off-by: Kévin PETIT <kevin.petit@arm.com>
BUG=skia:
R=djsollen@google.com, mtklein@google.com, borenet@google.com
Author: kevin.petit@arm.com
Review URL: https://codereview.chromium.org/258173005
git-svn-id: http://skia.googlecode.com/svn/trunk@14435 2bbb7eff-a529-9590-31e7-b0007b416f81
BlitRow565: new NEON version of S32_D565_Blend
This new implementation brings a good speedup in most cases and
gives exact results (removes one mismatch in gm).
Here are the benchmark results (speedup vs. existing S32A_D565_Blend):
+-------+-----------+------------+
| count | Cortex-A9 | Cortex-A15 |
+-------+-----------+------------+
| 1 | -26,7% | -27,5% |
+-------+-----------+------------+
| 2 | 0% | +53% |
+-------+-----------+------------+
| 4 | +38,3% | +26,5% |
+-------+-----------+------------+
| 8 | +10,9% | -4,5% |
+-------+-----------+------------+
| 16 | +18,2% | +1,6% |
+-------+-----------+------------+
| 64 | +22,3% | +8,75% |
+-------+-----------+------------+
| 256 | +12,3% | +11,2% |
+-------+-----------+------------+
| 1024 | +79,2% | +10,9% |
+-------+-----------+------------+
Signed-off-by: Kévin PETIT <kevin.petit@arm.com>
BUG=skia:
R=djsollen@google.com, mtklein@google.com
Author: kevin.petit@arm.com
Review URL: https://codereview.chromium.org/181523002
git-svn-id: http://skia.googlecode.com/svn/trunk@14103 2bbb7eff-a529-9590-31e7-b0007b416f81
BlitRow565: S32A_D565_Opaque_Dither: some improvements
- Supports ARGB and ABGR
- Fewer magic numbers
- Reduced instruction count : 5-25% speedup
- Fixed indentation, removed some commented and useless code
Signed-off-by: Kévin PETIT <kevin.petit@arm.com>
BUG=skia:
R=djsollen@google.com, mtklein@google.com
Author: kevin.petit@arm.com
Review URL: https://codereview.chromium.org/177963003
git-svn-id: http://skia.googlecode.com/svn/trunk@13577 2bbb7eff-a529-9590-31e7-b0007b416f81
Blitrow32: S32_Blend fix and little speed improvement
- the results are now exactly the same as the C code
- the speed has improved, especially for small values of count
+-------+-----------+------------+
| count | Cortex-A9 | Cortex-A15 |
+-------+-----------+------------+
| 1 | +30% | +18% |
+-------+-----------+------------+
| 2 | 0 | 0 |
+-------+-----------+------------+
| 4 | - <1% | +14% |
+-------+-----------+------------+
| > 4 | -0.5..+5% | -0.5..+4% |
+-------+-----------+------------+
Signed-off-by: Kévin PETIT <kevin.petit@arm.com>
BUG=skia:
Committed: http://code.google.com/p/skia/source/detail?r=13532
R=djsollen@google.com, mtklein@google.com
Author: kevin.petit@arm.com
Review URL: https://codereview.chromium.org/158973002
git-svn-id: http://skia.googlecode.com/svn/trunk@13543 2bbb7eff-a529-9590-31e7-b0007b416f81
(https://codereview.chromium.org/158973002/)
Reason for revert:
Breaking the build.
See http://108.170.219.164:10117/builders/Build-Ubuntu12-GCC-Arm7-Debug-Nexus4/builds/2966 (and others).
We are getting warnings that vsrc and vdst may be uninitialized. Please fix and resubmit.
Original issue's description:
> ARM Skia NEON patches - 12 - S32_Blend
>
> Blitrow32: S32_Blend fix and little speed improvement
>
> - the results are now exactly the same as the C code
> - the speed has improved, especially for small values of count
>
> +-------+-----------+------------+
> | count | Cortex-A9 | Cortex-A15 |
> +-------+-----------+------------+
> | 1 | +30% | +18% |
> +-------+-----------+------------+
> | 2 | 0 | 0 |
> +-------+-----------+------------+
> | 4 | - <1% | +14% |
> +-------+-----------+------------+
> | > 4 | -0.5..+5% | -0.5..+4% |
> +-------+-----------+------------+
>
> Signed-off-by: Kévin PETIT <kevin.petit@arm.com>
>
> BUG=skia:
>
> Committed: http://code.google.com/p/skia/source/detail?r=13532
R=djsollen@google.com, mtklein@google.com, kevin.petit@arm.com
TBR=djsollen@google.com, kevin.petit@arm.com, mtklein@google.com
NOTREECHECKS=true
NOTRY=true
BUG=skia:
Author: scroggo@google.com
Review URL: https://codereview.chromium.org/175433002
git-svn-id: http://skia.googlecode.com/svn/trunk@13534 2bbb7eff-a529-9590-31e7-b0007b416f81
Blitrow32: S32_Blend fix and little speed improvement
- the results are now exactly the same as the C code
- the speed has improved, especially for small values of count
+-------+-----------+------------+
| count | Cortex-A9 | Cortex-A15 |
+-------+-----------+------------+
| 1 | +30% | +18% |
+-------+-----------+------------+
| 2 | 0 | 0 |
+-------+-----------+------------+
| 4 | - <1% | +14% |
+-------+-----------+------------+
| > 4 | -0.5..+5% | -0.5..+4% |
+-------+-----------+------------+
Signed-off-by: Kévin PETIT <kevin.petit@arm.com>
BUG=skia:
R=djsollen@google.com, mtklein@google.com
Author: kevin.petit@arm.com
Review URL: https://codereview.chromium.org/158973002
git-svn-id: http://skia.googlecode.com/svn/trunk@13532 2bbb7eff-a529-9590-31e7-b0007b416f81
BlitRow565: new intrinsics version of S32A_D565_Blend
This new version is basically a rewrite of the existing code with
a few speed and accuracy improvements. There is a switch to enable
pixel-perfect results at the cost of a (quite big) decrease in
performance (disabled in this patch).
Here are the benchmark results (speedup vs. existing code):
+-------+------------+------------+
| count | Cortex-A9  | Cortex-A15 |
+-------+------------+------------+
| 1 | +103.6% | +12% |
+-------+------------+------------+
| 2 | +3.6% | +21.6% |
+-------+------------+------------+
| 4 | +0.8% | -0.8% |
+-------+------------+------------+
| 8 | +3.9% | -1% |
+-------+------------+------------+
| 16 | +14.7% | +5.7% |
+-------+------------+------------+
| 64 | +18.1% | +13.2% |
+-------+------------+------------+
| 256 | +16.3% | +27.4% |
+-------+------------+------------+
| 1024 | +78.2% | +17.4% |
+-------+------------+------------+
Signed-off-by: Kévin PETIT <kevin.petit@arm.com>
BUG=skia:
R=djsollen@google.com, mtklein@google.com, halcanary@google.com
Author: kevin.petit@arm.com
Review URL: https://codereview.chromium.org/156113005
git-svn-id: http://skia.googlecode.com/svn/trunk@13438 2bbb7eff-a529-9590-31e7-b0007b416f81
Xfermode: xfer16
This adds support for 16bit Xfermodes. It also tunes the gcc test
macros in xfer32() to add compatibility for gcc > 4.
Signed-off-by: Kévin PETIT <kevin.petit@arm.com>
BUG=
R=djsollen@google.com, mtklein@google.com, reed@google.com
Author: kevin.petit.arm@gmail.com
Review URL: https://codereview.chromium.org/33063002
git-svn-id: http://skia.googlecode.com/svn/trunk@12192 2bbb7eff-a529-9590-31e7-b0007b416f81
BlitRow565: S32_D565_Blend_Dither, slight speedup + bugfix
This patch adds a rewrite of S32_D565_Blend_Dither in intrinsics.
The newer version is faster (10-20% depending on the value of count)
and also supports ARGB as well as ABGR. It also adds the missing
assert at the beginning of the function.
Signed-off-by: Kévin PETIT <kevin.petit@arm.com>
BUG=
R=djsollen@google.com, mtklein@google.com
Author: kevin.petit.arm@gmail.com
Review URL: https://chromiumcodereview.appspot.com/22566002
git-svn-id: http://skia.googlecode.com/svn/trunk@11473 2bbb7eff-a529-9590-31e7-b0007b416f81
BlitRow565: NEON version of S32_D565_Opaque
Here's a new implementation of S32_D565_Opaque in NEON. It
is dramatically faster than S32A_D565_Opaque.
Here are the benchmark results (speedup vs. existing NEON):
+-------+-----------+------------+
| count | Cortex-A9 | Cortex-A15 |
+-------+-----------+------------+
| 1 | +130% | +139% |
+-------+-----------+------------+
| 2 | +65.2% | +51% |
+-------+-----------+------------+
| 4 | -25.5% | +10.2% |
+-------+-----------+------------+
| 8 | +63.8% | +32.1% |
+-------+-----------+------------+
| 16 | +110% | +49.2% |
+-------+-----------+------------+
| 64 | +153% | +123.5% |
+-------+-----------+------------+
| 256 | +151% | +144.7% |
+-------+-----------+------------+
| 1024 | +272% | +157.2% |
+-------+-----------+------------+
Signed-off-by: Kévin PETIT <kevin.petit@arm.com>
BUG=
R=djsollen@google.com, mtklein@google.com
Author: kevin.petit.arm@gmail.com
Review URL: https://chromiumcodereview.appspot.com/22351006
git-svn-id: http://skia.googlecode.com/svn/trunk@11415 2bbb7eff-a529-9590-31e7-b0007b416f81
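For readers unfamiliar with what an S32_D565_Opaque row function computes, here is a minimal scalar sketch of the 32-bit to RGB565 conversion that the NEON code in the commit above accelerates. It is not the code from the patch: the function names are hypothetical, the pixel layout is assumed to be 0xAARRGGBB, and the low bits of each channel are assumed to be truncated rather than rounded.

```cpp
#include <cassert>
#include <cstdint>

// Pack 8-bit R, G, B into RGB565 by dropping the low 3/2/3 bits
// (assumption: truncation, no rounding or dithering).
static inline uint16_t pack_rgb565(uint8_t r, uint8_t g, uint8_t b) {
    return static_cast<uint16_t>(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

// Hypothetical scalar reference: convert a row of opaque 32-bit pixels
// (assumed layout 0xAARRGGBB) to 16-bit RGB565. Alpha is ignored.
void s32_to_d565_opaque_ref(uint16_t* dst, const uint32_t* src, int count) {
    assert(count > 0);
    for (int i = 0; i < count; ++i) {
        const uint32_t c = src[i];
        dst[i] = pack_rgb565(static_cast<uint8_t>((c >> 16) & 0xFF),
                             static_cast<uint8_t>((c >> 8) & 0xFF),
                             static_cast<uint8_t>(c & 0xFF));
    }
}
```

Because each output pixel depends only on the matching input pixel, the loop vectorizes naturally, which is consistent with the large speedups reported for long rows.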
| |
BlitRow565: S32_D565_Opaque_Dither: cleaning / bugfix
This patch does some minor code cleanup (spacing/comments) and gives a
small speed improvement (by using post-increment in the asm), but more
importantly it fixes a bug on Linux. The new code now supports ARGB
as well as ABGR.
I removed the comment as I have confirmed with benchmarks that this
code brings a *massive* (3x-7x) speedup compared to the C code.
Signed-off-by: Kévin PETIT <kevin.petit@arm.com>
BUG=
R=djsollen@google.com, mtklein@google.com
Author: kevin.petit.arm@gmail.com
Review URL: https://chromiumcodereview.appspot.com/22269003
git-svn-id: http://skia.googlecode.com/svn/trunk@11339 2bbb7eff-a529-9590-31e7-b0007b416f81
| |
R=mtklein@google.com
Review URL: https://codereview.chromium.org/22229002
git-svn-id: http://skia.googlecode.com/svn/trunk@10652 2bbb7eff-a529-9590-31e7-b0007b416f81
| |
Blitrow32: S32A_Blend new NEON version
Adding a NEON version of S32A_Blend_BlitRow32. Here are the
benchmark results:
+-------+--------------------------+--------------------------+
| | Speedup vs. C | Speedup vs. ARM asm |
| count +------------+-------------+------------+-------------+
| | Cortex-A9 | Cortex-A15 | Cortex-A9 | Cortex-A15 |
+-------+------------+-------------+------------+-------------+
| 1 | +8.5% | +18.5% | +0.9% | +2.9% |
+-------+------------+-------------+------------+-------------+
| 2 | +65.6% | +94% | +70.3% | +80% |
+-------+------------+-------------+------------+-------------+
| 4 | +42.4% | +87.8% | +56.8% | +84.4% |
+-------+------------+-------------+------------+-------------+
| 8 | +30% | +90% | +49.9% | +82.7% |
+-------+------------+-------------+------------+-------------+
| 16 | +23.1% | +95.4% | +46.6% | +87.6% |
+-------+------------+-------------+------------+-------------+
| 64 | +23.1% | +95.7% | +46.1% | +89.4% |
+-------+------------+-------------+------------+-------------+
| 256 | +35.5% | +122% | +53.6% | +99.2% |
+-------+------------+-------------+------------+-------------+
| 1024 | +61.8% | +101% | +64.2% | +91.2% |
+-------+------------+-------------+------------+-------------+
BUG=
R=djsollen@google.com
Author: kevin.petit.arm@gmail.com
Review URL: https://chromiumcodereview.appspot.com/18614010
git-svn-id: http://skia.googlecode.com/svn/trunk@10480 2bbb7eff-a529-9590-31e7-b0007b416f81
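As a point of reference for the S32A_Blend commit above, the following is a rough scalar sketch of the per-pixel operation such a row function performs: a premultiplied source is scaled by a global alpha and blended over the destination, in the form SkAlphaMulQ(src, src_scale) + SkAlphaMulQ(dst, dst_scale) quoted elsewhere in this log. The helper and function names are hypothetical, alpha is assumed to live in the top byte of each pixel, and this is not the NEON code from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Multiply all four 8-bit channels of a packed 32-bit pixel by scale/256,
// with scale in [0, 256]. Two channels are handled per multiply.
static inline uint32_t scale_pixel(uint32_t c, unsigned scale) {
    const uint32_t mask = 0x00FF00FFu;
    const uint32_t rb = ((c & mask) * scale) >> 8;   // channels in bits 0-7 and 16-23
    const uint32_t ag = ((c >> 8) & mask) * scale;   // channels in bits 8-15 and 24-31
    return (rb & mask) | (ag & ~mask);
}

// Hypothetical scalar reference: blend a row of premultiplied 32-bit source
// pixels over dst using a global alpha in [0, 255].
// Assumption: the alpha channel is stored in the top byte of each pixel.
void s32a_blend_row_ref(uint32_t* dst, const uint32_t* src, int count, unsigned alpha) {
    assert(count > 0 && alpha <= 255);
    const unsigned src_scale = alpha + 1;  // map 0..255 to 1..256
    for (int i = 0; i < count; ++i) {
        // Source alpha after applying the global alpha.
        const unsigned src_a = ((src[i] >> 24) * src_scale) >> 8;
        const unsigned dst_scale = 256 - src_a;
        dst[i] = scale_pixel(src[i], src_scale) + scale_pixel(dst[i], dst_scale);
    }
}
```

Each of the two products is shifted down separately before the addition, so this form discards intermediate precision; that is the behavior a later commit in this log changes.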
| |
This series contains a few fairly non-controversial fixes.
Misc: remove dead references to neon 4444 functions
Misc: avoid the double _neon_neon suffix in the clamp matrix functions.
MAKENAME already adds the _neon suffix
Misc: a few stupid / obvious fixes
BUG=
R=djsollen@google.com
Author: kevin.petit.arm@gmail.com
Review URL: https://chromiumcodereview.appspot.com/18666004
git-svn-id: http://skia.googlecode.com/svn/trunk@10072 2bbb7eff-a529-9590-31e7-b0007b416f81
| |
Blitrow32: S32A_Opaque code cleaning and speed improvement
- the old way of calculating alpha doesn't seem to be used anymore,
so remove the remaining code
- adding prefetching greatly improves performance in some cases,
at the cost of a small slowdown in others:
+-------+-----------+------------+
| count | Cortex-A9 | Cortex-A15 |
+-------+-----------+------------+
| 1,2 | 0 | 0 |
+-------+-----------+------------+
| 4 | 0 | -3% |
+-------+-----------+------------+
| 8 | 0 | -4% |
+-------+-----------+------------+
| 16 | 0 | -5% |
+-------+-----------+------------+
| 64 | +14% | 0 |
+-------+-----------+------------+
| 256 | +14% | +12% |
+-------+-----------+------------+
| 1024 | +115% | +15% |
+-------+-----------+------------+
BUG=
R=djsollen@google.com
Author: kevin.petit.arm@gmail.com
Review URL: https://chromiumcodereview.appspot.com/18459008
git-svn-id: http://skia.googlecode.com/svn/trunk@10026 2bbb7eff-a529-9590-31e7-b0007b416f81
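To illustrate the kind of prefetching the S32A_Opaque commit above describes, here is a small scalar sketch of a source-over row loop with a software prefetch hint. It is only an illustration under stated assumptions: the function names and the prefetch distance kAhead are hypothetical, alpha is assumed to be in the top byte, and the hint uses the GCC/Clang builtin __builtin_prefetch rather than whatever instruction the actual NEON code issues.

```cpp
#include <cassert>
#include <cstdint>

// Multiply all four 8-bit channels of a packed 32-bit pixel by scale/256,
// with scale in [0, 256].
static inline uint32_t scale_pixel(uint32_t c, unsigned scale) {
    const uint32_t mask = 0x00FF00FFu;
    const uint32_t rb = ((c & mask) * scale) >> 8;
    const uint32_t ag = ((c >> 8) & mask) * scale;
    return (rb & mask) | (ag & ~mask);
}

// Hypothetical scalar source-over row: dst = src + dst * (256 - src_alpha) / 256.
// Assumptions: premultiplied pixels, alpha in the top byte.
void srcover_row_prefetch(uint32_t* dst, const uint32_t* src, int count) {
    assert(count > 0);
    const int kAhead = 16;  // hypothetical prefetch distance, in pixels
    for (int i = 0; i < count; ++i) {
#if defined(__GNUC__) || defined(__clang__)
        // Hint that data a few iterations ahead will be needed soon.
        // The bounds check keeps the pointer arithmetic inside the rows.
        if (i + kAhead < count) {
            __builtin_prefetch(src + i + kAhead);
            __builtin_prefetch(dst + i + kAhead);
        }
#endif
        const unsigned dst_scale = 256 - (src[i] >> 24);
        dst[i] = src[i] + scale_pixel(dst[i], dst_scale);
    }
}
```

With a short row the hint never targets data that is actually used, so it is pure overhead, while on long rows it can hide memory latency; that would be consistent with the small losses at low counts and the gains at high counts in the table above.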
| |
See https://codereview.appspot.com/6465075 for a more detailed description of the contents of this CL.
Review URL: https://codereview.chromium.org/13060004
git-svn-id: http://skia.googlecode.com/svn/trunk@8579 2bbb7eff-a529-9590-31e7-b0007b416f81
|