path: root/src/core/Sk4pxXfermode.h
Commit message (author, date)
* SoftLight with SkPMFloat (mtklein, 2015-06-29)

  SSE speeds up about 4.5x over existing integer SSE,
  NEON speeds up about 3x over serial integer code.

  We expect 1-2 bit component diffs in the usual GMs.

  Still guarded by SK_SUPPORT_LEGACY_XFERMODES,
  which I'll now try to lift in Chrome.

  BUG=skia:

  Committed: https://skia.googlesource.com/skia/+/3e47d49b46b3ab62071218ef3dd44642c9713e04

  CQ_EXTRA_TRYBOTS=client.skia:Test-ChromeOS-GCC-Daisy-CPU-NEON-Arm7-Debug-Trybot

  Review URL: https://codereview.chromium.org/1221493002
* Edges matter, part 2. (mtklein, 2015-06-29)

  Affected modes: lighten, hard-light, overlay (== hard-light).

  This fixes a couple places where I used < when I should have used <=,
  or swapped the logic as I've done here.

  Caught by layout tests; our tests should be unchanged.
  https://storage.googleapis.com/chromium-layout-test-archives/linux_blink_rel/68935/layout-test-results/css3/blending/background-blend-mode-crossfade-image-gradient-diffs.html

  BUG=skia:

  Review URL: https://codereview.chromium.org/1217013003
* Revert of SoftLight with SkPMFloat (patchset #6 id:100001 of https://codereview.chromium.org/1221493002/) (mtklein, 2015-06-29)

  Reason for revert:
  xfermodes and xfermodes2 show major diffs on Nexus 5 and Daisy (both ARMv7 w/NEON).
  Nexus 9 and SSE all look fine...

  Original issue's description:
  > SoftLight with SkPMFloat
  >
  > SSE speeds up about 4.5x over existing integer SSE,
  > NEON speeds up about 3x over serial integer code.
  >
  > We expect 1-2 bit component diffs in the usual GMs.
  >
  > Still guarded by SK_SUPPORT_LEGACY_XFERMODES,
  > which I'll now try to lift in Chrome.
  >
  > BUG=skia:
  >
  > Committed: https://skia.googlesource.com/skia/+/3e47d49b46b3ab62071218ef3dd44642c9713e04

  TBR=reed@google.com,mtklein@chromium.org
  NOPRESUBMIT=true
  NOTREECHECKS=true
  NOTRY=true
  BUG=skia:

  Review URL: https://codereview.chromium.org/1221683002
* SoftLight with SkPMFloat (mtklein, 2015-06-29)

  SSE speeds up about 4.5x over existing integer SSE,
  NEON speeds up about 3x over serial integer code.

  We expect 1-2 bit component diffs in the usual GMs.

  Still guarded by SK_SUPPORT_LEGACY_XFERMODES,
  which I'll now try to lift in Chrome.

  BUG=skia:

  Review URL: https://codereview.chromium.org/1221493002
* Color dodge and burn with SkPMFloat. (mtklein, 2015-06-26)

  Both 25-35% faster with SSE.

  With NEON, Burn measures as a ~10% regression, Dodge a huge 2.9x improvement.

  The Burn regression is somewhat artificial: we're drawing random colored rects
  onto an opaque white dst, so we're heavily biased toward the (d==da) fast path
  in the serial code. In the vector code there's no short-circuiting and we
  always pay a fixed cost for ColorBurn regardless of src or dst content.

  Dodge's fast paths, in contrast, only trigger when (s==sa) or (d==0), neither
  of which happens any more than randomly in our benchmark. I don't think (d==0)
  should happen at all. Similarly, the (s==0) Burn fast path is really only going
  to happen as often as SkRandom allows.

  In practice, the existing Burn benchmark is hitting its fast path 100% of the
  time. So I actually feel really great that this only dings the benchmark by 10%.

  Chrome's still guarded by SK_SUPPORT_LEGACY_XFERMODES, which I'll lift after
  finishing the last xfermode, SoftLight.

  BUG=skia:

  Review URL: https://codereview.chromium.org/1214443002
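The trade-off described in the message above (early-out fast paths in serial code versus a fixed cost in vector code) can be sketched for a single channel as follows. This is only an illustration, not Skia's code: the function names `burn_serial` and `burn_select` are hypothetical, values are assumed to be premultiplied floats in [0,1], and the formula is the usual premultiplied form of the color-burn blend.

```cpp
#include <algorithm>
#include <cassert>

// One channel of premultiplied color-burn, all inputs in [0,1].
// s, sa: source channel and alpha; d, da: destination channel and alpha.
// Hypothetical names; a sketch of the idea, not Skia's implementation.

// Serial style: test the cheap cases first and return early.
float burn_serial(float s, float sa, float d, float da) {
    assert(s <= sa && d <= da);                    // premultiplied inputs
    const float base = s * (1 - da) + d * (1 - sa);
    if (d == da) return sa * da + base;            // fast path: dst channel is "full"
    if (s == 0)  return base;                      // fast path: src channel is zero
    return sa * (da - std::min(da, (da - d) * sa / s)) + base;
}

// Vector style, written out for one lane: every candidate is computed,
// then one is selected. There is no early return, so the cost is the
// same whatever the inputs are.
float burn_select(float s, float sa, float d, float da) {
    assert(s <= sa && d <= da);
    const float base    = s * (1 - da) + d * (1 - sa);
    const float safe_s  = (s == 0) ? 1.0f : s;     // keep the unused division finite
    const float opaque  = sa * da + base;
    const float zero    = base;
    const float general = sa * (da - std::min(da, (da - d) * sa / safe_s)) + base;
    // In real vector code these two ternaries would be lane-wise selects.
    return (d == da) ? opaque : (s == 0) ? zero : general;
}
```

With an opaque white destination (d == da == 1), `burn_serial` always leaves through its first `if`, which is why a benchmark drawn onto such a destination favors the serial version.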
* Implement four more xfermodes with Sk4px. (mtklein, 2015-06-24)

  HardLight, Overlay, Darken, and Lighten are all
  ~2x faster with SSE, ~25% faster with NEON.

  This covers all previously-implemented NEON xfermodes.
  3 previous SSE xfermodes remain. Those need division
  and sqrt, so I'm planning on using SkPMFloat for them.
  It'll help the readability and NEON speed if I move that
  into [0,1] space first.

  The main new concept here is c.thenElse(t,e), which behaves like
  (c ? t : e) except, of course, both t and e are evaluated. This allows
  us to emulate conditionals with vectors.

  This also removes the concept of SkNb. Instead of a standalone bool
  vector, each SkNi or SkNf will just return their own types for
  comparisons. Turns out to be a lot more manageable this way.

  BUG=skia:

  Committed: https://skia.googlesource.com/skia/+/b9d4163bebab0f5639f9c5928bb5fc15f472dddc

  CQ_EXTRA_TRYBOTS=client.skia.compile:Build-Ubuntu-GCC-Arm64-Debug-Android-Trybot

  Review URL: https://codereview.chromium.org/1196713004
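The c.thenElse(t,e) idea from the message above, together with comparisons that return the vector's own type instead of a separate bool vector, might look roughly like this. It is a minimal scalar-loop sketch under stated assumptions: `Vec4` and `lane_min` are hypothetical names, and real SIMD code would use a single bitwise-select instruction rather than a loop.

```cpp
#include <array>
#include <cstdint>

// Hypothetical 4-lane vector of 16-bit values; a stand-in, not Skia's SkNi.
struct Vec4 {
    std::array<uint16_t, 4> lane;

    // The comparison returns the same type: each lane is all ones where
    // the test holds and zero where it does not.
    Vec4 operator<(const Vec4& o) const {
        Vec4 m{};
        for (int i = 0; i < 4; i++) {
            m.lane[i] = lane[i] < o.lane[i] ? 0xFFFF : 0;
        }
        return m;
    }

    // Lane-wise (c ? t : e), where *this is the mask c.
    // Both t and e have already been fully evaluated by the caller.
    Vec4 thenElse(const Vec4& t, const Vec4& e) const {
        Vec4 r{};
        for (int i = 0; i < 4; i++) {
            r.lane[i] = static_cast<uint16_t>((lane[i] & t.lane[i]) |
                                              (~lane[i] & e.lane[i]));
        }
        return r;
    }
};

// Example use: a lane-wise minimum built from a comparison and a select.
Vec4 lane_min(const Vec4& a, const Vec4& b) {
    return (a < b).thenElse(a, b);
}
```

Because the mask has the same type and width as the data, the select reduces to plain AND/OR operations on the lanes, which is one reason a separate bool-vector type is not needed.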
* Revert of Implement four more xfermodes with Sk4px. (patchset #16 id:290001 of https://codereview.chromium.org/1196713004/) (mtklein, 2015-06-24)

  Reason for revert:
  64-bit ARM build failures.

  Original issue's description:
  > Implement four more xfermodes with Sk4px.
  >
  > HardLight, Overlay, Darken, and Lighten are all
  > ~2x faster with SSE, ~25% faster with NEON.
  >
  > This covers all previously-implemented NEON xfermodes.
  > 3 previous SSE xfermodes remain. Those need division
  > and sqrt, so I'm planning on using SkPMFloat for them.
  > It'll help the readability and NEON speed if I move that
  > into [0,1] space first.
  >
  > The main new concept here is c.thenElse(t,e), which behaves like
  > (c ? t : e) except, of course, both t and e are evaluated. This allows
  > us to emulate conditionals with vectors.
  >
  > This also removes the concept of SkNb. Instead of a standalone bool
  > vector, each SkNi or SkNf will just return their own types for
  > comparisons. Turns out to be a lot more manageable this way.
  >
  > BUG=skia:
  >
  > Committed: https://skia.googlesource.com/skia/+/b9d4163bebab0f5639f9c5928bb5fc15f472dddc

  TBR=reed@google.com,mtklein@chromium.org
  NOPRESUBMIT=true
  NOTREECHECKS=true
  NOTRY=true
  BUG=skia:

  Review URL: https://codereview.chromium.org/1205703008
* Implement four more xfermodes with Sk4px. (mtklein, 2015-06-24)

  HardLight, Overlay, Darken, and Lighten are all
  ~2x faster with SSE, ~25% faster with NEON.

  This covers all previously-implemented NEON xfermodes.
  3 previous SSE xfermodes remain. Those need division
  and sqrt, so I'm planning on using SkPMFloat for them.
  It'll help the readability and NEON speed if I move that
  into [0,1] space first.

  The main new concept here is c.thenElse(t,e), which behaves like
  (c ? t : e) except, of course, both t and e are evaluated. This allows
  us to emulate conditionals with vectors.

  This also removes the concept of SkNb. Instead of a standalone bool
  vector, each SkNi or SkNf will just return their own types for
  comparisons. Turns out to be a lot more manageable this way.

  BUG=skia:

  Review URL: https://codereview.chromium.org/1196713004
* Update some Sk4px APIs. (mtklein, 2015-06-22)

  Mostly this is about ergonomics, making it easier to do good operations and
  hard / impossible to do bad ones.

  - SkAlpha / SkPMColor constructors become static factories.
  - Remove div255TruncNarrow(), rename div255RoundNarrow() to div255(). In
    practice we always want to round, and the narrowing to 8-bit is
    contextually obvious.
  - Rename fastMulDiv255Round() to approxMulDiv255() to stress its
    approximateness over its speed. Drop Round for the same reason as
    above... we should always round.
  - Add operator overloads so we don't have to keep throwing in
    seemingly-random Sk4px() or Sk4px::Wide() casts.
  - Use operator*() for 8-bit x 8-bit -> 16-bit math. It's always what we
    want, and there's generally no 8x8->8 alternative.
  - MapFoo can take a const Func&. Don't think it makes a big difference, but
    nice to do.

  BUG=skia:

  Review URL: https://codereview.chromium.org/1202013002
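The distinction the list above draws between rounding, truncating, and approximate division by 255 can be shown with scalar integer code. The function names below are hypothetical and this is not Sk4px's implementation: `div255_round` uses a well-known shift-based identity for exact rounding, and `approx_mul_div255` uses `(a*b + a) >> 8`, which is one common approximation assumed here purely for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Exact round(x / 255) for 0 <= x <= 255*255, using shifts instead of a divide.
inline uint8_t div255_round(uint32_t x) {
    assert(x <= 255u * 255u);
    x += 128;
    return static_cast<uint8_t>((x + (x >> 8)) >> 8);
}

// Truncating variant, shown only for contrast: it rounds toward zero.
inline uint8_t div255_trunc(uint32_t x) {
    assert(x <= 255u * 255u);
    return static_cast<uint8_t>(x / 255);
}

// Approximate a*b/255. The 8-bit x 8-bit product is widened so it cannot
// overflow. The result may be off by one from the rounded value, but is
// exact whenever a or b is 0 or 255.
inline uint8_t approx_mul_div255(uint8_t a, uint8_t b) {
    return static_cast<uint8_t>((static_cast<uint32_t>(a) * b + a) >> 8);
}
```

For example, a product of 254 truncates to 0 but rounds to 1, which is the kind of systematic darkening that always rounding avoids.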
* Simpler version of Plus w/ AA. ~25% faster too. (mtklein, 2015-05-22)

  BUG=skia:3852
  TBR=fmalita@chromium.org

  Review URL: https://codereview.chromium.org/1150693003
* Move Sk4px Xfermode code to a header so we can use it twice. (mtklein, 2015-05-22)

  - Once in SkXfermode as usual to pick up compile-time SSE and NEON
  - Once in SkXfermode_arm_neon to pick up run-time NEON

  This allows us to start cleaning up SkXfermode_arm_neon as we've done for
  SkXfermode_SSE2. I'm saving this catharsis for a day when I need it.

  The Sk4px xfermodes are generally faster than the existing NEON procs, so
  this should also have the side effect of a perf win there.

  This means our new Plus-AA code works for runtime NEON too.

  BUG=skia:3852

  Review URL: https://codereview.chromium.org/1150313003