| Commit message (Collapse) | Author | Age |
|
|
|
|
|
|
|
|
|
|
|
|
| |
created new file src/core/SkColorData.h for
internal consumption. Note that many of the
functions there are unused as well.
Bug: skia: 6898
R: reed@google.com
Change-Id: I25bfd5a9c21f53558c4ca65a77eb5d322d897c6d
Reviewed-on: https://skia-review.googlesource.com/46848
Commit-Queue: Cary Clark <caryclark@google.com>
Reviewed-by: Mike Reed <reed@google.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It seems that MSVC + __vectorcall don't play well together,
so back ourselves out into a situation where we don't need it.
- Inline transfermode functions. This removes the need for SK_VECTORCALL.
- Remove 565 destination specializations.
Blending into 565 is not speed-critical enough to merit the code bloat.
- Removing 565 specializations means a bunch of Sk4px code is now dead.
8888 xfermodes generally speed up a bit from inlining, smoothly ranging from no change down to 0.65x for the fastest functions like Plus or Modulate.
565 xfermodes generally slow down because we're doing 565 -> 8888 and 8888->565 conversion serially[1] and using the stack, smoothly ranging from no change up to 2x slower for the fastest functions like Plus and Modulate.
[1] the 565->8888 conversion is actually being autovectorized
BUG=skia:4765,skia:4776
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1565223002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
No public API changes.
TBR=reed@google.com
Review URL: https://codereview.chromium.org/1565223002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
_mm_mulhi_epu16 makes the (...*257)>>16 part simple.
This seems to speed up every transfermode that uses div255(),
in the 7-25% range.
It even appears to obviate the need for approxMulDiv255() on SSE.
I'm not sure about NEON yet, so I'll keep approxMulDiv255() for now.
Should be no pixels change:
https://gold.skia.org/search2?issue=1452903004&unt=true&query=source_type%3Dgm&master=false
BUG=skia:
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review URL: https://codereview.chromium.org/1452903004
|
|
|
|
|
|
| |
BUG=skia:4117
Review URL: https://codereview.chromium.org/1312283004
|
|
|
|
|
|
|
|
|
|
| |
There's a crasher (null dereference it looks like) in Clank with an impossible stack, but with blit mask on top (apparently sk_free() calls blit mask...). Can't hurt to check that all these Sk4px-using methods are iterating over non-null arrays.
(This is that bug I mentioned that got me thinking about blit mask in the first place.)
BUG=520354
Review URL: https://codereview.chromium.org/1295443002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
id:1 of https://codereview.chromium.org/1286093004/ )
Reason for revert:
Maybe causing test / gold problems?
Original issue's description:
> Refactor to put SkXfermode_opts inside SK_OPTS_NS.
>
> Without this refactor I was getting warnings previously about having code
> inside namespace SK_OPTS_NS (e.g. namespace sse2, namespace neon) referring to
> code inside an anonymous namespace (Sk4px, SkPMFloat, Sk4f, etc) [1].
>
> That low-level code was in an anonymous namespace to allow multiple independent
> copies of its methods to be instantiated without the linker getting confused /
> offended about violating the One Definition Rule. This was only happening in
> Debug mode where the methods were not being inlined.
>
> To fix this all, I've force-inlined the methods of the low-level code and
> removed the anonymous namespace.
>
> BUG=skia:4117
>
>
> [1] Here is what those errors looked like:
>
> In file included from ../../../../src/core/SkOpts.cpp:18:0:
> ../../../../src/opts/SkXfermode_opts.h:193:7: error: 'portable::Sk4pxXfermode' has a field 'portable::Sk4pxXfermode::fProc4' whose type uses the anonymous namespace [-Werror]
> class Sk4pxXfermode : public SkProcCoeffXfermode {
> ^
> ../../../../src/opts/SkXfermode_opts.h:193:7: error: 'portable::Sk4pxXfermode' has a field 'portable::Sk4pxXfermode::fAAProc4' whose type uses the anonymous namespace [-Werror]
> ../../../../src/opts/SkXfermode_opts.h:235:7: error: 'portable::SkPMFloatXfermode' has a field 'portable::SkPMFloatXfermode::fProcF' whose type uses the anonymous namespace [-Werror]
> class SkPMFloatXfermode : public SkProcCoeffXfermode {
> ^
> cc1plus: all warnings being treated as errors
>
> Committed: https://skia.googlesource.com/skia/+/b07bee3121680b53b98b780ac08d14d374dd4c6f
TBR=djsollen@google.com,mtklein@chromium.org
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:4117
Review URL: https://codereview.chromium.org/1284333002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Without this refactor I was getting warnings previously about having code
inside namespace SK_OPTS_NS (e.g. namespace sse2, namespace neon) referring to
code inside an anonymous namespace (Sk4px, SkPMFloat, Sk4f, etc) [1].
That low-level code was in an anonymous namespace to allow multiple independent
copies of its methods to be instantiated without the linker getting confused /
offended about violating the One Definition Rule. This was only happening in
Debug mode where the methods were not being inlined.
To fix this all, I've force-inlined the methods of the low-level code and
removed the anonymous namespace.
BUG=skia:4117
[1] Here is what those errors looked like:
In file included from ../../../../src/core/SkOpts.cpp:18:0:
../../../../src/opts/SkXfermode_opts.h:193:7: error: 'portable::Sk4pxXfermode' has a field 'portable::Sk4pxXfermode::fProc4' whose type uses the anonymous namespace [-Werror]
class Sk4pxXfermode : public SkProcCoeffXfermode {
^
../../../../src/opts/SkXfermode_opts.h:193:7: error: 'portable::Sk4pxXfermode' has a field 'portable::Sk4pxXfermode::fAAProc4' whose type uses the anonymous namespace [-Werror]
../../../../src/opts/SkXfermode_opts.h:235:7: error: 'portable::SkPMFloatXfermode' has a field 'portable::SkPMFloatXfermode::fProcF' whose type uses the anonymous namespace [-Werror]
class SkPMFloatXfermode : public SkProcCoeffXfermode {
^
cc1plus: all warnings being treated as errors
Review URL: https://codereview.chromium.org/1286093004
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Local SKP nanobenching ranges SSE between 1.05x and 0.87x, much more heavily weighted toward <1.0x ratios (speedups).
I profiled the top five regressions (1.05x-1.01x) and they look like noise. Will follow up after broad bot results.
NEON looks similar but less extreme than SSE changes, ranging between 1.02x and 0.95x, again mostly speedups in 0.99x-0.97x range.
The old code trifurcated into black, opaque-but-not-black, and general versions as a function of the constant src color. I did not see a significant difference between general and opaque-but-not-black, and I don't think a black version would be faster using SIMD. So we have here just one version of the code, the general version.
Somewhat fantastically, I see no pixel diffs on GMs or SKPs.
I will be following up with more CLs for the other procs called by SkBlitMask.
BUG=skia:
Review URL: https://codereview.chromium.org/1278253003
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This uses the most basic approach possible:
- to load an Sk4px from 565, convert to SkPMColors on the stack serially then load those SkPMColors.
- to store an Sk4px to 565, store to SkPMColors on the stack then convert to 565 serially.
Clearly, we can optimize these loads and stores. That's a TODO.
The code using SkPMFloat is the same idea but a little more long-term viable, as we're only operating on one pixel at a time anyway. We could probably write 565 <-> SkPMFloat methods, but I'd rather not until it's really compelling.
The speedups are varied but similar across SSE and NEON: a few uninteresting, many 50% faster, some 2x faster, and SoftLight ~4x faster.
This will cause minor GM diffs, but I don't think any layout test changes.
BUG=skia:
Committed: https://skia.googlesource.com/skia/+/942930dcaa51f66d82cdaf46ae62efebd16c8cd0
Committed: https://skia.googlesource.com/skia/+/860dcaa2ddfdadc050af4f943a84a9d499315066
Review URL: https://codereview.chromium.org/1245673002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
https://codereview.chromium.org/1245673002/)
Reason for revert:
NEON 565 gold images have gone ugly. This is what I get for writing and testing SSE and just writing NEON.
E.g. colortype_xfermodes, dstreadshuffle, bigbitmaprect, pictures, textbloblooper, aaxfermodes (only Plus)
Original issue's description:
> 565 support for SIMD xfermodes
>
> This uses the most basic approach possible:
> - to load an Sk4px from 565, convert to SkPMColors on the stack serially then load those SkPMColors.
> - to store an Sk4px to 565, store to SkPMColors on the stack then convert to 565 serially.
>
> Clearly, we can optimize these loads and stores. That's a TODO.
>
> The code using SkPMFloat is the same idea but a little more long-term viable, as we're only operating on one pixel at a time anyway. We could probably write 565 <-> SkPMFloat methods, but I'd rather not until it's really compelling.
>
> The speedups are varied but similar across SSE and NEON: a few uninteresting, many 50% faster, some 2x faster, and SoftLight ~4x faster.
>
> This will cause minor GM diffs, but I don't think any layout test changes.
>
> BUG=skia:
>
> Committed: https://skia.googlesource.com/skia/+/942930dcaa51f66d82cdaf46ae62efebd16c8cd0
>
> Committed: https://skia.googlesource.com/skia/+/860dcaa2ddfdadc050af4f943a84a9d499315066
TBR=msarett@google.com,mtklein@chromium.org
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:
Review URL: https://codereview.chromium.org/1248893004
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This uses the most basic approach possible:
- to load an Sk4px from 565, convert to SkPMColors on the stack serially then load those SkPMColors.
- to store an Sk4px to 565, store to SkPMColors on the stack then convert to 565 serially.
Clearly, we can optimize these loads and stores. That's a TODO.
The code using SkPMFloat is the same idea but a little more long-term viable, as we're only operating on one pixel at a time anyway. We could probably write 565 <-> SkPMFloat methods, but I'd rather not until it's really compelling.
The speedups are varied but similar across SSE and NEON: a few uninteresting, many 50% faster, some 2x faster, and SoftLight ~4x faster.
This will cause minor GM diffs, but I don't think any layout test changes.
BUG=skia:
Committed: https://skia.googlesource.com/skia/+/942930dcaa51f66d82cdaf46ae62efebd16c8cd0
Review URL: https://codereview.chromium.org/1245673002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
https://codereview.chromium.org/1245673002/)
Reason for revert:
942930d (included in this roll) introduced a 140 kB sizes regression in
libskia.so. Please investigate and reland if this regression is necessary.
Original issue's description:
> 565 support for SIMD xfermodes
>
> This uses the most basic approach possible:
> - to load an Sk4px from 565, convert to SkPMColors on the stack serially then load those SkPMColors.
> - to store an Sk4px to 565, store to SkPMColors on the stack then convert to 565 serially.
>
> Clearly, we can optimize these loads and stores. That's a TODO.
>
> The code using SkPMFloat is the same idea but a little more long-term viable, as we're only operating on one pixel at a time anyway. We could probably write 565 <-> SkPMFloat methods, but I'd rather not until it's really compelling.
>
> The speedups are varied but similar across SSE and NEON: a few uninteresting, many 50% faster, some 2x faster, and SoftLight ~4x faster.
>
> This will cause minor GM diffs, but I don't think any layout test changes.
>
> BUG=skia:
>
> Committed: https://skia.googlesource.com/skia/+/942930dcaa51f66d82cdaf46ae62efebd16c8cd0
TBR=msarett@google.com,mtklein@chromium.org
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:
Review URL: https://codereview.chromium.org/1242973004
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This uses the most basic approach possible:
- to load an Sk4px from 565, convert to SkPMColors on the stack serially then load those SkPMColors.
- to store an Sk4px to 565, store to SkPMColors on the stack then convert to 565 serially.
Clearly, we can optimize these loads and stores. That's a TODO.
The code using SkPMFloat is the same idea but a little more long-term viable, as we're only operating on one pixel at a time anyway. We could probably write 565 <-> SkPMFloat methods, but I'd rather not until it's really compelling.
The speedups are varied but similar across SSE and NEON: a few uninteresting, many 50% faster, some 2x faster, and SoftLight ~4x faster.
This will cause minor GM diffs, but I don't think any layout test changes.
BUG=skia:
Review URL: https://codereview.chromium.org/1245673002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
While investigating my bug (skia:4052) I saw this TODO and figured
it'd make me feel better about an otherwise unsuccessful investigation.
This speeds up HardLight and Overlay (same code) by about 15% with SSE, mostly
by rewriting the logic from 1 cheap comparison and 2 expensive div255() calls
to 2 cheap comparisons and 1 expensive div255().
NEON speeds up by a more modest ~3%.
BUG=skia:
Review URL: https://codereview.chromium.org/1230663005
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
HardLight, Overlay, Darken, and Lighten are all
~2x faster with SSE, ~25% faster with NEON.
This covers all previously-implemented NEON xfermodes.
3 previous SSE xfermodes remain. Those need division
and sqrt, so I'm planning on using SkPMFloat for them.
It'll help the readability and NEON speed if I move that
into [0,1] space first.
The main new concept here is c.thenElse(t,e), which behaves like
(c ? t : e) except, of course, both t and e are evaluated. This allows
us to emulate conditionals with vectors.
This also removes the concept of SkNb. Instead of a standalone bool
vector, each SkNi or SkNf will just return their own types for
comparisons. Turns out to be a lot more manageable this way.
BUG=skia:
Committed: https://skia.googlesource.com/skia/+/b9d4163bebab0f5639f9c5928bb5fc15f472dddc
CQ_EXTRA_TRYBOTS=client.skia.compile:Build-Ubuntu-GCC-Arm64-Debug-Android-Trybot
Review URL: https://codereview.chromium.org/1196713004
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
of https://codereview.chromium.org/1196713004/)
Reason for revert:
64-bit ARM build failures.
Original issue's description:
> Implement four more xfermodes with Sk4px.
>
> HardLight, Overlay, Darken, and Lighten are all
> ~2x faster with SSE, ~25% faster with NEON.
>
> This covers all previously-implemented NEON xfermodes.
> 3 previous SSE xfermodes remain. Those need division
> and sqrt, so I'm planning on using SkPMFloat for them.
> It'll help the readability and NEON speed if I move that
> into [0,1] space first.
>
> The main new concept here is c.thenElse(t,e), which behaves like
> (c ? t : e) except, of course, both t and e are evaluated. This allows
> us to emulate conditionals with vectors.
>
> This also removes the concept of SkNb. Instead of a standalone bool
> vector, each SkNi or SkNf will just return their own types for
> comparisons. Turns out to be a lot more manageable this way.
>
> BUG=skia:
>
> Committed: https://skia.googlesource.com/skia/+/b9d4163bebab0f5639f9c5928bb5fc15f472dddc
TBR=reed@google.com,mtklein@chromium.org
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:
Review URL: https://codereview.chromium.org/1205703008
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
HardLight, Overlay, Darken, and Lighten are all
~2x faster with SSE, ~25% faster with NEON.
This covers all previously-implemented NEON xfermodes.
3 previous SSE xfermodes remain. Those need division
and sqrt, so I'm planning on using SkPMFloat for them.
It'll help the readability and NEON speed if I move that
into [0,1] space first.
The main new concept here is c.thenElse(t,e), which behaves like
(c ? t : e) except, of course, both t and e are evaluated. This allows
us to emulate conditionals with vectors.
This also removes the concept of SkNb. Instead of a standalone bool
vector, each SkNi or SkNf will just return their own types for
comparisons. Turns out to be a lot more manageable this way.
BUG=skia:
Review URL: https://codereview.chromium.org/1196713004
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Mostly this is about ergonomics, making it easier to do good operations and hard / impossible to do bad ones.
- SkAlpha / SkPMColor constructors become static factories.
- Remove div255TruncNarrow(), rename div255RoundNarrow() to div255(). In practice we always want to round, and the narrowing to 8-bit is contextually obvious.
- Rename fastMulDiv255Round() approxMulDiv255() to stress it's approximate-ness over its speed. Drop Round for the same reason as above... we should always round.
- Add operator overloads so we don't have to keep throwing in seemingly-random Sk4px() or Sk4px::Wide() casts.
- use operator*() for 8-bit x 8-bit -> 16-bit math. It's always what we want, and there's generally no 8x8->8 alternative.
- MapFoo can take a const Func&. Don't think it makes a big difference, but nice to do.
BUG=skia:
Review URL: https://codereview.chromium.org/1202013002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If we include Sk4px.h, SkPMFloat.h, or SkNx.h into files with different
SIMD flags, that could cause different definitions of the same method.
Normally that's moot, because all the code inlines, but in Debug it tends not
to. So in Debug, the linker picks one definition for us. That breaks _someone_.
Wrapping everything in a namespace {} keeps the definitions separate.
Tested locally, it fixes this bug.
BUG=skia:3861
This code is not yet enabled in Chrome, so shouldn't affect the roll.
NOTREECHECKS=true
Review URL: https://codereview.chromium.org/1154523004
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Adds and uses fastMulDiv255Round() where possible,
which approximates x*y/255 as (x*y+x)/256. Seems like a sizeable
speedup, as seen below on Exclusion, Screen, and Modulate. The
existing NEON code uses this approximation for
{Src,Dst}x{In,Out,Over}, and without it we'd regress speed there.
This will require rebaselines whether or not we use this
approximation: the x86 bots change if we do, the ARM bots change
if we don't. None of the diffs are significant.
Desktop:
Xfermode_Screen_aa 5.82ms -> 5.54ms 0.95x
Xfermode_Modulate_aa 5.67ms -> 5.36ms 0.95x
Xfermode_Exclusion_aa 6.18ms -> 5.81ms 0.94x
Xfermode_Exclusion 5.03ms -> 4.24ms 0.84x
Xfermode_Screen 4.51ms -> 3.59ms 0.8x
Xfermode_Modulate 4.2ms -> 3.19ms 0.76x
Xfermode_DstOver 6.73ms -> 3.88ms 0.58x
Xfermode_SrcOut 6.47ms -> 3.48ms 0.54x
Xfermode_SrcIn 6.46ms -> 3.46ms 0.54x
Xfermode_DstOut 6.49ms -> 3.41ms 0.52x
Xfermode_DstIn 6.5ms -> 3.32ms 0.51x
Xfermode_Src_aa 9.53ms -> 4.75ms 0.5x
Xfermode_Clear_aa 9.65ms -> 4.8ms 0.5x
Xfermode_DstIn_aa 11.5ms -> 5.57ms 0.49x
Xfermode_DstOver_aa 11.6ms -> 5.63ms 0.49x
Xfermode_SrcOut_aa 11.6ms -> 5.5ms 0.47x
Xfermode_SrcIn_aa 11.7ms -> 5.51ms 0.47x
Xfermode_DstOut_aa 11.7ms -> 5.4ms 0.46x
N7 performance is close enough to 1x that I'm not sure whether
this is a net win, net loss, or truly neutral. I figure the bots will
show that.
I experimented with another approximation,
(x*(255-y))/255 ≈ (x*(256-y))/256. This was inconclusive, so I'm
leaving it out for now.
The remaining modes are the complicated conditional ones.
BUG=skia:
Review URL: https://codereview.chromium.org/1141953004
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This will cause minor (off-by-one) diffs due to a little lost precision:
colortype_xfermodes
mixed_xfermodes
xfermodes2
xfermodeimagefilter
xfermodes3
xfermodes
Desktop:
Xfermode_Difference_aa 9.77ms -> 7.32ms 0.75x
Xfermode_Exclusion_aa 8.49ms -> 6.21ms 0.73x
Xfermode_Difference 17ms -> 7.54ms 0.44x
Xfermode_Exclusion 13.5ms -> 5.09ms 0.38x
N7:
Xfermode_Difference_aa 32.2ms -> 27.6ms 0.86x
Xfermode_Difference 43.9ms -> 32ms 0.73x
Xfermode_Exclusion_aa 40.5ms -> 26.7ms 0.66x
Xfermode_Exclusion 71.5ms -> 23.9ms 0.33x
This wraps up the xfermodes implemented in Sk4f.
BUG=skia:
Review URL: https://codereview.chromium.org/1141213002
|
|
|
|
|
|
|
|
|
|
| |
SSE runs 2-3x faster (than 4f), NEON runs 1.2-1.4x faster (than existing NEON).
Small diffs on {aarectmodes, imagefilters_xfermodes, hairmodes, mixed_xfermodes} only on AA edges due to precision drop.
BUG=skia:
Review URL: https://codereview.chromium.org/1132853005
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
alphas() extracts the 4 alphas from an existing Sk4px as another Sk4px.
LoadNAlphas() constructs an Sk4px from N packed alphas.
In both cases, we end up with 4x repeated alphas aligned with their pixels.
alphas()
A0 R0 G0 B0 A1 R1 G1 B1 A2 R2 G2 B2 A3 R3 G3 B3
->
A0 A0 A0 A0 A1 A1 A1 A1 A2 A2 A2 A2 A3 A3 A3 A3
Load4Alphas()
A0 A1 A2 A3
->
A0 A0 A0 A0 A1 A1 A1 A1 A2 A2 A2 A2 A3 A3 A3 A3
Load2Alphas()
A0 A1
->
A0 A0 A0 A0 A1 A1 A1 A1 0 0 0 0 0 0 0 0
This is a 5-10% speedup for AA on Intel, and wash on ARM.
AA is still mostly dominated by the final lerp.
alphas() isn't used yet, but it's similar enough to Load[24]Alphas()
that it was easier to write all at once.
BUG=skia:
Review URL: https://codereview.chromium.org/1138333003
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Xfermode_Plus runs 4-5x faster.
We expect mixed_xfermodes to have a small diff. This is because kFoldCoverageIntoSrcAlpha was incorrectly set to true.
This implementation handily beats the Sk4f impl, the portable impl, and the existing SSE2 impl. Reading the SkXfermodes_opts_SSE2.cpp file, I'm pretty confident that we'll be able to beat all SSE2 impls.
I believe this impl will beat or match the existing NEON impl too, but that may not be true for more complicated xfermodes. They can take advantage of transposing ARGBARGB... to AAAARRRR.... cheaply and I haven't figured out an abstraction for that yet that doesn't screw SSE.
Adds:
- MapDstSrc() to Sk4px
- saturatedAdd() to SkNi (only implemented as far as it's used).
- div255Narrow()
BUG=skia:
Review URL: https://codereview.chromium.org/1138893002
|
|
Xfermode_SrcOver:
SSE: 2.08ms -> 2.03ms (~2% faster)
NEON: my N5 is noisy, but there appears to be no perf change
BUG=skia:
Review URL: https://codereview.chromium.org/1132273004
|