| Commit message (Collapse) | Author | Age |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
id:40001 of https://codereview.chromium.org/1333983002/ )
Reason for revert:
Unexpected perf impact, and a whole bunch of new images in gold (mostly invisibly different).
Original issue's description:
> use new shuffle to speed up affine matrix mappts
>
> sse: 25 -> 18
> neon: 95 -> 86
>
> BUG=skia:
>
> Committed: https://skia.googlesource.com/skia/+/e70afc9f48d00828ee6b707899a8ff542b0e8b98
TBR=reed@google.com,mtklein@chromium.org
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:
Review URL: https://codereview.chromium.org/1335003002
|
|
|
|
|
|
|
|
|
| |
sse: 25 -> 18
neon: 95 -> 86
BUG=skia:
Review URL: https://codereview.chromium.org/1333983002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This lets us avoid conversions to [0.0, 1.0] space and rounding that aren't necessary
for SkColorCubeFilter_opts.h.
Dropping rounding on the way back to bytes means we'll see a bunch of off-by-1 diffs.
Rough perf effect:
SSSE3: 110 -> 93 (~15%)
NEON: 465 -> 375 (~20%)
This is the beginning of the end for SkPMFloat as an entity distinct from Sk4f.
I've kept it for now so I can convert sites one by one and think about how things
that really want to keep PM color order will work.
BUG=skia:4117
Review URL: https://codereview.chromium.org/1319413003
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
id:1 of https://codereview.chromium.org/1286093004/ )
Reason for revert:
Maybe causing test / gold problems?
Original issue's description:
> Refactor to put SkXfermode_opts inside SK_OPTS_NS.
>
> Without this refactor I was getting warnings previously about having code
> inside namespace SK_OPTS_NS (e.g. namespace sse2, namespace neon) referring to
> code inside an anonymous namespace (Sk4px, SkPMFloat, Sk4f, etc) [1].
>
> That low-level code was in an anonymous namespace to allow multiple independent
> copies of its methods to be instantiated without the linker getting confused /
> offended about violating the One Definition Rule. This was only happening in
> Debug mode where the methods were not being inlined.
>
> To fix this all, I've force-inlined the methods of the low-level code and
> removed the anonymous namespace.
>
> BUG=skia:4117
>
>
> [1] Here is what those errors looked like:
>
> In file included from ../../../../src/core/SkOpts.cpp:18:0:
> ../../../../src/opts/SkXfermode_opts.h:193:7: error: 'portable::Sk4pxXfermode' has a field 'portable::Sk4pxXfermode::fProc4' whose type uses the anonymous namespace [-Werror]
> class Sk4pxXfermode : public SkProcCoeffXfermode {
> ^
> ../../../../src/opts/SkXfermode_opts.h:193:7: error: 'portable::Sk4pxXfermode' has a field 'portable::Sk4pxXfermode::fAAProc4' whose type uses the anonymous namespace [-Werror]
> ../../../../src/opts/SkXfermode_opts.h:235:7: error: 'portable::SkPMFloatXfermode' has a field 'portable::SkPMFloatXfermode::fProcF' whose type uses the anonymous namespace [-Werror]
> class SkPMFloatXfermode : public SkProcCoeffXfermode {
> ^
> cc1plus: all warnings being treated as errors
>
> Committed: https://skia.googlesource.com/skia/+/b07bee3121680b53b98b780ac08d14d374dd4c6f
TBR=djsollen@google.com,mtklein@chromium.org
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:4117
Review URL: https://codereview.chromium.org/1284333002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Without this refactor I was getting warnings previously about having code
inside namespace SK_OPTS_NS (e.g. namespace sse2, namespace neon) referring to
code inside an anonymous namespace (Sk4px, SkPMFloat, Sk4f, etc) [1].
That low-level code was in an anonymous namespace to allow multiple independent
copies of its methods to be instantiated without the linker getting confused /
offended about violating the One Definition Rule. This was only happening in
Debug mode where the methods were not being inlined.
To fix this all, I've force-inlined the methods of the low-level code and
removed the anonymous namespace.
BUG=skia:4117
[1] Here is what those errors looked like:
In file included from ../../../../src/core/SkOpts.cpp:18:0:
../../../../src/opts/SkXfermode_opts.h:193:7: error: 'portable::Sk4pxXfermode' has a field 'portable::Sk4pxXfermode::fProc4' whose type uses the anonymous namespace [-Werror]
class Sk4pxXfermode : public SkProcCoeffXfermode {
^
../../../../src/opts/SkXfermode_opts.h:193:7: error: 'portable::Sk4pxXfermode' has a field 'portable::Sk4pxXfermode::fAAProc4' whose type uses the anonymous namespace [-Werror]
../../../../src/opts/SkXfermode_opts.h:235:7: error: 'portable::SkPMFloatXfermode' has a field 'portable::SkPMFloatXfermode::fProcF' whose type uses the anonymous namespace [-Werror]
class SkPMFloatXfermode : public SkProcCoeffXfermode {
^
cc1plus: all warnings being treated as errors
Review URL: https://codereview.chromium.org/1286093004
|
|
|
|
|
|
|
|
| |
Nothing seems to run any faster or slower, but it is terser.
BUG=skia:
Review URL: https://codereview.chromium.org/1255913004
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
While investigating my bug (skia:4052) I saw this TODO and figured
it'd make me feel better about an otherwise unsuccessful investigation.
This speeds up HardLight and Overlay (same code) by about 15% with SSE, mostly
by rewriting the logic from 1 cheap comparison and 2 expensive div255() calls
to 2 cheap comparisons and 1 expensive div255().
NEON speeds up by a more modest ~3%.
BUG=skia:
Review URL: https://codereview.chromium.org/1230663005
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Both 25-35% faster with SSE.
With NEON, Burn measures as a ~10% regression, Dodge a huge 2.9x improvement.
The Burn regression is somewhat artificial: we're drawing random colored rects onto an opaque white dst, so we're heavily biased toward the (d==da) fast path in the serial code. In the vector code there's no short-circuiting and we always pay a fixed cost for ColorBurn regardless of src or dst content.
Dodge's fast paths, in contrast, only trigger when (s==sa) or (d==0), neither of which happens any more than randomly in our benchmark. I don't think (d==0) should happen at all. Similarly, the (s==0) Burn fast path is really only going to happen as often as SkRandom allows.
In practice, the existing Burn benchmark is hitting its fast path 100% of the time. So I actually feel really great that this only dings the benchmark by 10%.
Chrome's still guarded by SK_SUPPORT_LEGACY_XFERMODES, which I'll lift after finishing the last xfermode, SoftLight.
BUG=skia:
Review URL: https://codereview.chromium.org/1214443002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
HardLight, Overlay, Darken, and Lighten are all
~2x faster with SSE, ~25% faster with NEON.
This covers all previously-implemented NEON xfermodes.
3 previous SSE xfermodes remain. Those need division
and sqrt, so I'm planning on using SkPMFloat for them.
It'll help the readability and NEON speed if I move that
into [0,1] space first.
The main new concept here is c.thenElse(t,e), which behaves like
(c ? t : e) except, of course, both t and e are evaluated. This allows
us to emulate conditionals with vectors.
This also removes the concept of SkNb. Instead of a standalone bool
vector, each SkNi or SkNf will just return their own types for
comparisons. Turns out to be a lot more manageable this way.
BUG=skia:
Committed: https://skia.googlesource.com/skia/+/b9d4163bebab0f5639f9c5928bb5fc15f472dddc
CQ_EXTRA_TRYBOTS=client.skia.compile:Build-Ubuntu-GCC-Arm64-Debug-Android-Trybot
Review URL: https://codereview.chromium.org/1196713004
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
of https://codereview.chromium.org/1196713004/)
Reason for revert:
64-bit ARM build failures.
Original issue's description:
> Implement four more xfermodes with Sk4px.
>
> HardLight, Overlay, Darken, and Lighten are all
> ~2x faster with SSE, ~25% faster with NEON.
>
> This covers all previously-implemented NEON xfermodes.
> 3 previous SSE xfermodes remain. Those need division
> and sqrt, so I'm planning on using SkPMFloat for them.
> It'll help the readability and NEON speed if I move that
> into [0,1] space first.
>
> The main new concept here is c.thenElse(t,e), which behaves like
> (c ? t : e) except, of course, both t and e are evaluated. This allows
> us to emulate conditionals with vectors.
>
> This also removes the concept of SkNb. Instead of a standalone bool
> vector, each SkNi or SkNf will just return their own types for
> comparisons. Turns out to be a lot more manageable this way.
>
> BUG=skia:
>
> Committed: https://skia.googlesource.com/skia/+/b9d4163bebab0f5639f9c5928bb5fc15f472dddc
TBR=reed@google.com,mtklein@chromium.org
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:
Review URL: https://codereview.chromium.org/1205703008
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
HardLight, Overlay, Darken, and Lighten are all
~2x faster with SSE, ~25% faster with NEON.
This covers all previously-implemented NEON xfermodes.
3 previous SSE xfermodes remain. Those need division
and sqrt, so I'm planning on using SkPMFloat for them.
It'll help the readability and NEON speed if I move that
into [0,1] space first.
The main new concept here is c.thenElse(t,e), which behaves like
(c ? t : e) except, of course, both t and e are evaluated. This allows
us to emulate conditionals with vectors.
This also removes the concept of SkNb. Instead of a standalone bool
vector, each SkNi or SkNf will just return their own types for
comparisons. Turns out to be a lot more manageable this way.
BUG=skia:
Review URL: https://codereview.chromium.org/1196713004
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Mostly this is about ergonomics, making it easier to do good operations and hard / impossible to do bad ones.
- SkAlpha / SkPMColor constructors become static factories.
- Remove div255TruncNarrow(), rename div255RoundNarrow() to div255(). In practice we always want to round, and the narrowing to 8-bit is contextually obvious.
- Rename fastMulDiv255Round() approxMulDiv255() to stress it's approximate-ness over its speed. Drop Round for the same reason as above... we should always round.
- Add operator overloads so we don't have to keep throwing in seemingly-random Sk4px() or Sk4px::Wide() casts.
- use operator*() for 8-bit x 8-bit -> 16-bit math. It's always what we want, and there's generally no 8x8->8 alternative.
- MapFoo can take a const Func&. Don't think it makes a big difference, but nice to do.
BUG=skia:
Review URL: https://codereview.chromium.org/1202013002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If we include Sk4px.h, SkPMFloat.h, or SkNx.h into files with different
SIMD flags, that could cause different definitions of the same method.
Normally that's moot, because all the code inlines, but in Debug it tends not
to. So in Debug, the linker picks one definition for us. That breaks _someone_.
Wrapping everything in a namespace {} keeps the definitions separate.
Tested locally, it fixes this bug.
BUG=skia:3861
This code is not yet enabled in Chrome, so shouldn't affect the roll.
NOTREECHECKS=true
Review URL: https://codereview.chromium.org/1154523004
|
|
|
|
|
|
|
|
|
|
|
| |
0x8001 / 0x7fff don't seem to work, but we were close: 0x8000 does.
I plan to use this to implement the Difference xfermode,
and it seems generally handy.
BUG=skia:
Review URL: https://codereview.chromium.org/1133933004
|
|
|
|
|
|
|
|
|
|
|
| |
For SSE, Sk4px is better than Sk4f is better than SkXfermodes_opts_SSE2 (where implemented).
For NEON, Sk4px is better than SkXfermodes_opts_arm_neon is better than Sk4f (where implemented).
This is a 1.6-1.9x speedup for Plus,Modulate, and Screen for NEON.
BUG=skia:
Review URL: https://codereview.chromium.org/1128053004
|
|
|
|
|
|
|
|
|
|
| |
Xfermode_SrcOver:
SSE: 2.08ms -> 2.03ms (~2% faster)
NEON: my N5 is noisy, but there appears to be no perf change
BUG=skia:
Review URL: https://codereview.chromium.org/1132273004
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a logical no-op. Everything was using the equivalent of rsqrt1() before, and is now after.
BUG=skia:
Committed: https://skia.googlesource.com/skia/+/9de16283fdc8cc0d31a84f503578d0ecea4e8297
CQ_EXTRA_TRYBOTS=client.skia.compile:Build-Ubuntu-GCC-Arm64-Debug-Android-Trybot
Review URL: https://codereview.chromium.org/1109913002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
on ARM (patchset #2 id:20001 of https://codereview.chromium.org/1109913002/)
Reason for revert:
arm64 typos
Original issue's description:
> Split rsqrt into rsqrt{0,1,2}, with increasing cost and precision on ARM
>
> This is a logical no-op. Everything was using the equivalent of rsqrt1() before, and is now after.
>
> BUG=skia:
>
> Committed: https://skia.googlesource.com/skia/+/9de16283fdc8cc0d31a84f503578d0ecea4e8297
TBR=reed@google.com,mtklein@chromium.org
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:
Review URL: https://codereview.chromium.org/1105233003
|
|
|
|
|
|
|
|
| |
This is a logical no-op. Everything was using the equivalent of rsqrt1() before, and is now after.
BUG=skia:
Review URL: https://codereview.chromium.org/1109913002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
patch from issue 1072303005 at patchset 40001 (http://crrev.com/1072303005#ps40001)
This looks quite launchable. radial_gradient3, min of 100 samples:
N5: 985µs -> 946µs
MBP: 395µs -> 279µs
On my MBP, most of the meat looks like it's now in reading the cache and writing to dst one color at a time. Is that something we could do in float math rather than with a lookup table?
BUG=skia:
CQ_EXTRA_TRYBOTS=client.skia.compile:Build-Mac10.8-Clang-Arm7-Debug-Android-Trybot,Build-Ubuntu-GCC-Arm7-Release-Android_NoNeon-Trybot
Committed: https://skia.googlesource.com/skia/+/abf6c5cf95e921fae59efb487480e5b5081cf0ec
Review URL: https://codereview.chromium.org/1109643002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
id:120001 of https://codereview.chromium.org/1109643002/)
Reason for revert:
compile failures.
Original issue's description:
> Mike's radial gradient CL with better float -> int.
>
> patch from issue 1072303005 at patchset 40001 (http://crrev.com/1072303005#ps40001)
>
> This looks quite launchable. radial_gradient3, min of 100 samples:
> N5: 985µs -> 946µs
> MBP: 395µs -> 279µs
>
> On my MBP, most of the meat looks like it's now in reading the cache and writing to dst one color at a time. Is that something we could do in float math rather than with a lookup table?
>
> BUG=skia:
>
> CQ_EXTRA_TRYBOTS=client.skia.android:Test-Android-GCC-Nexus5-CPU-NEON-Arm7-Debug-Trybot,Test-Android-GCC-Nexus9-CPU-Denver-Arm64-Debug-Trybot
>
> Committed: https://skia.googlesource.com/skia/+/abf6c5cf95e921fae59efb487480e5b5081cf0ec
TBR=reed@google.com,robertphillips@google.com,mtklein@chromium.org
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:
Review URL: https://codereview.chromium.org/1109883003
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
patch from issue 1072303005 at patchset 40001 (http://crrev.com/1072303005#ps40001)
This looks quite launchable. radial_gradient3, min of 100 samples:
N5: 985µs -> 946µs
MBP: 395µs -> 279µs
On my MBP, most of the meat looks like it's now in reading the cache and writing to dst one color at a time. Is that something we could do in float math rather than with a lookup table?
BUG=skia:
CQ_EXTRA_TRYBOTS=client.skia.android:Test-Android-GCC-Nexus5-CPU-NEON-Arm7-Debug-Trybot,Test-Android-GCC-Nexus9-CPU-Denver-Arm64-Debug-Trybot
Review URL: https://codereview.chromium.org/1109643002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As used today, SkNi is used in bool-y contexts. This keeps that, but under a
new name, SkNb. This makes room for a new SkNi that's focused on integer-y
things like loads, stores, arithmetic, etc.
The main reason to split these is that we want different specializations for
each use case: for bools, it's important for us to specialize 32- and 64-bit to
support efficient float- and double- comparisons, but for integer work we're
more likely to be looking at 8- and 16- bit lanes. Keeping these use cases
siloed helps me manage the compexity of the backend NEON and SSE code.
BUG=skia:
Review URL: https://codereview.chromium.org/1083123002
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
#floats
BUG=skia:
BUG=skia:3592
Committed: https://skia.googlesource.com/skia/+/6b5dab889579f1cc9e1b5278f4ecdc4c63fe78c9
CQ_EXTRA_TRYBOTS=client.skia.compile:Build-Ubuntu-GCC-Arm64-Debug-Android-Trybot
Review URL: https://codereview.chromium.org/1061603002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
id:40001 of https://codereview.chromium.org/1061603002/)
Reason for revert:
missed some neon code
Original issue's description:
> Code's more readable when SkPMFloat is an Sk4f.
> #floats
>
> BUG=skia:
> BUG=skia:3592
>
> Committed: https://skia.googlesource.com/skia/+/6b5dab889579f1cc9e1b5278f4ecdc4c63fe78c9
TBR=reed@google.com,mtklein@chromium.org
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:
Review URL: https://codereview.chromium.org/1056143004
|
|
|
|
|
|
|
|
|
| |
#floats
BUG=skia:
BUG=skia:3592
Review URL: https://codereview.chromium.org/1061603002
|
|
|
|
|
|
|
|
|
| |
#floats
BUG=skia:
BUG=skia:3592
Review URL: https://codereview.chromium.org/1059743002
|
|
The primary feature this delivers is SkNf and SkNd for arbitrary power-of-two N. Non-specialized types or types larger than 128 bits should now Just Work (and we can drop in a specialization to make them faster). Sk4s is now just a typedef for SkNf<4, SkScalar>; Sk4d is SkNf<4, double>, Sk2f SkNf<2, float>, etc.
This also makes implementing new specializations easier and more encapsulated. We're now using template specialization, which means the specialized versions don't have to leak out so much from SkNx_sse.h and SkNx_neon.h.
This design leaves us room to grow up, e.g to SkNf<8, SkScalar> == Sk8s, and to grown down too, to things like SkNi<8, uint16_t> == Sk8h.
To simplify things, I've stripped away most APIs (swizzles, casts, reinterpret_casts) that no one's using yet. I will happily add them back if they seem useful.
You shouldn't feel bad about using any of the typedef Sk4s, Sk4f, Sk4d, Sk2s, Sk2f, Sk2d, Sk4i, etc. Here's how you should feel:
- Sk4f, Sk4s, Sk2d: feel awesome
- Sk2f, Sk2s, Sk4d: feel pretty good
No public API changes.
TBR=reed@google.com
BUG=skia:3592
Review URL: https://codereview.chromium.org/1048593002
|