aboutsummaryrefslogtreecommitdiffhomepage
path: root/src/core/SkNx.h
Commit message (Collapse)AuthorAge
...
* Revert of Refactor to put SkXfermode_opts inside SK_OPTS_NS. (patchset #1 ↵Gravatar mtklein2015-08-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | id:1 of https://codereview.chromium.org/1286093004/ ) Reason for revert: Maybe causing test / gold problems? Original issue's description: > Refactor to put SkXfermode_opts inside SK_OPTS_NS. > > Without this refactor I was getting warnings previously about having code > inside namespace SK_OPTS_NS (e.g. namespace sse2, namespace neon) referring to > code inside an anonymous namespace (Sk4px, SkPMFloat, Sk4f, etc) [1]. > > That low-level code was in an anonymous namespace to allow multiple independent > copies of its methods to be instantiated without the linker getting confused / > offended about violating the One Definition Rule. This was only happening in > Debug mode where the methods were not being inlined. > > To fix this all, I've force-inlined the methods of the low-level code and > removed the anonymous namespace. > > BUG=skia:4117 > > > [1] Here is what those errors looked like: > > In file included from ../../../../src/core/SkOpts.cpp:18:0: > ../../../../src/opts/SkXfermode_opts.h:193:7: error: 'portable::Sk4pxXfermode' has a field 'portable::Sk4pxXfermode::fProc4' whose type uses the anonymous namespace [-Werror] > class Sk4pxXfermode : public SkProcCoeffXfermode { > ^ > ../../../../src/opts/SkXfermode_opts.h:193:7: error: 'portable::Sk4pxXfermode' has a field 'portable::Sk4pxXfermode::fAAProc4' whose type uses the anonymous namespace [-Werror] > ../../../../src/opts/SkXfermode_opts.h:235:7: error: 'portable::SkPMFloatXfermode' has a field 'portable::SkPMFloatXfermode::fProcF' whose type uses the anonymous namespace [-Werror] > class SkPMFloatXfermode : public SkProcCoeffXfermode { > ^ > cc1plus: all warnings being treated as errors > > Committed: https://skia.googlesource.com/skia/+/b07bee3121680b53b98b780ac08d14d374dd4c6f TBR=djsollen@google.com,mtklein@chromium.org NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia:4117 Review URL: https://codereview.chromium.org/1284333002
* Refactor to put SkXfermode_opts inside SK_OPTS_NS.Gravatar mtklein2015-08-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Without this refactor I was getting warnings previously about having code inside namespace SK_OPTS_NS (e.g. namespace sse2, namespace neon) referring to code inside an anonymous namespace (Sk4px, SkPMFloat, Sk4f, etc) [1]. That low-level code was in an anonymous namespace to allow multiple independent copies of its methods to be instantiated without the linker getting confused / offended about violating the One Definition Rule. This was only happening in Debug mode where the methods were not being inlined. To fix this all, I've force-inlined the methods of the low-level code and removed the anonymous namespace. BUG=skia:4117 [1] Here is what those errors looked like: In file included from ../../../../src/core/SkOpts.cpp:18:0: ../../../../src/opts/SkXfermode_opts.h:193:7: error: 'portable::Sk4pxXfermode' has a field 'portable::Sk4pxXfermode::fProc4' whose type uses the anonymous namespace [-Werror] class Sk4pxXfermode : public SkProcCoeffXfermode { ^ ../../../../src/opts/SkXfermode_opts.h:193:7: error: 'portable::Sk4pxXfermode' has a field 'portable::Sk4pxXfermode::fAAProc4' whose type uses the anonymous namespace [-Werror] ../../../../src/opts/SkXfermode_opts.h:235:7: error: 'portable::SkPMFloatXfermode' has a field 'portable::SkPMFloatXfermode::fProcF' whose type uses the anonymous namespace [-Werror] class SkPMFloatXfermode : public SkProcCoeffXfermode { ^ cc1plus: all warnings being treated as errors Review URL: https://codereview.chromium.org/1286093004
* Fix typo in non-SIMD Sk4f::thenElse().Gravatar mtklein2015-07-14
| | | | | | | | | This was causing the 3 xfermodes that use floats and conditionals to draw wrong when SKNX_NO_SIMD was defined. BUG=skia:4051 Review URL: https://codereview.chromium.org/1229013003
* Implement four more xfermodes with Sk4px.Gravatar mtklein2015-06-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | HardLight, Overlay, Darken, and Lighten are all ~2x faster with SSE, ~25% faster with NEON. This covers all previously-implemented NEON xfermodes. 3 previous SSE xfermodes remain. Those need division and sqrt, so I'm planning on using SkPMFloat for them. It'll help the readability and NEON speed if I move that into [0,1] space first. The main new concept here is c.thenElse(t,e), which behaves like (c ? t : e) except, of course, both t and e are evaluated. This allows us to emulate conditionals with vectors. This also removes the concept of SkNb. Instead of a standalone bool vector, each SkNi or SkNf will just return their own types for comparisons. Turns out to be a lot more manageable this way. BUG=skia: Committed: https://skia.googlesource.com/skia/+/b9d4163bebab0f5639f9c5928bb5fc15f472dddc CQ_EXTRA_TRYBOTS=client.skia.compile:Build-Ubuntu-GCC-Arm64-Debug-Android-Trybot Review URL: https://codereview.chromium.org/1196713004
* Revert of Implement four more xfermodes with Sk4px. (patchset #16 id:290001 ↵Gravatar mtklein2015-06-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | of https://codereview.chromium.org/1196713004/) Reason for revert: 64-bit ARM build failures. Original issue's description: > Implement four more xfermodes with Sk4px. > > HardLight, Overlay, Darken, and Lighten are all > ~2x faster with SSE, ~25% faster with NEON. > > This covers all previously-implemented NEON xfermodes. > 3 previous SSE xfermodes remain. Those need division > and sqrt, so I'm planning on using SkPMFloat for them. > It'll help the readability and NEON speed if I move that > into [0,1] space first. > > The main new concept here is c.thenElse(t,e), which behaves like > (c ? t : e) except, of course, both t and e are evaluated. This allows > us to emulate conditionals with vectors. > > This also removes the concept of SkNb. Instead of a standalone bool > vector, each SkNi or SkNf will just return their own types for > comparisons. Turns out to be a lot more manageable this way. > > BUG=skia: > > Committed: https://skia.googlesource.com/skia/+/b9d4163bebab0f5639f9c5928bb5fc15f472dddc TBR=reed@google.com,mtklein@chromium.org NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia: Review URL: https://codereview.chromium.org/1205703008
* Implement four more xfermodes with Sk4px.Gravatar mtklein2015-06-24
| | | | | | | | | | | | | | | | | | | | | | | HardLight, Overlay, Darken, and Lighten are all ~2x faster with SSE, ~25% faster with NEON. This covers all previously-implemented NEON xfermodes. 3 previous SSE xfermodes remain. Those need division and sqrt, so I'm planning on using SkPMFloat for them. It'll help the readability and NEON speed if I move that into [0,1] space first. The main new concept here is c.thenElse(t,e), which behaves like (c ? t : e) except, of course, both t and e are evaluated. This allows us to emulate conditionals with vectors. This also removes the concept of SkNb. Instead of a standalone bool vector, each SkNi or SkNf will just return their own types for comparisons. Turns out to be a lot more manageable this way. BUG=skia: Review URL: https://codereview.chromium.org/1196713004
* Thorough tests for saturatedAdd and mulDiv255Round.Gravatar mtklein2015-06-15
| | | | | | | | | | BUG=skia:3951 Committed: https://skia.googlesource.com/skia/+/ce9d11189a5924b47c3629063b72bae9d466c2c7 CQ_EXTRA_TRYBOTS=client.skia.android:Test-Android-GCC-Nexus5-CPU-NEON-Arm7-Release-Trybot Review URL: https://codereview.chromium.org/1184113003
* Remove overly-promiscuous SkNx syntax sugar.Gravatar mtklein2015-06-10
| | | | | | | | | | | | I haven't figured out a pithy way to have these apply to only classes originating from SkNx, so let's just remove them. There aren't too many use cases, and it's not really any less readable without them. Semantically, this is a no-op. BUG=skia: Review URL: https://codereview.chromium.org/1167153002
* Everyone gets a namespace {}.Gravatar mtklein2015-05-22
| | | | | | | | | | | | | | | | | | If we include Sk4px.h, SkPMFloat.h, or SkNx.h into files with different SIMD flags, that could cause different definitions of the same method. Normally that's moot, because all the code inlines, but in Debug it tends not to. So in Debug, the linker picks one definition for us. That breaks _someone_. Wrapping everything in a namespace {} keeps the definitions separate. Tested locally, it fixes this bug. BUG=skia:3861 This code is not yet enabled in Chrome, so shouldn't affect the roll. NOTREECHECKS=true Review URL: https://codereview.chromium.org/1154523004
* add Min to SkNi, specialized for u8 and u16 on SSE and NEONGravatar mtklein2015-05-14
| | | | | | | | | | | 0x8001 / 0x7fff don't seem to work, but we were close: 0x8000 does. I plan to use this to implement the Difference xfermode, and it seems generally handy. BUG=skia: Review URL: https://codereview.chromium.org/1133933004
* Plus xfermode using Sk4px.Gravatar mtklein2015-05-12
| | | | | | | | | | | | | | | | | | | Xfermode_Plus runs 4-5x faster. We expect mixed_xfermodes to have a small diff. This is because kFoldCoverageIntoSrcAlpha was incorrectly set to true. This implementation handily beats the Sk4f impl, the portable impl, and the existing SSE2 impl. Reading the SkXfermodes_opts_SSE2.cpp file, I'm pretty confident that we'll be able to beat all SSE2 impls. I believe this impl will beat or match the existing NEON impl too, but that may not be true for more complicated xfermodes. They can take advantage of transposing ARGBARGB... to AAAARRRR.... cheaply and I haven't figured out an abstraction for that yet that doesn't screw SSE. Adds: - MapDstSrc() to Sk4px - saturatedAdd() to SkNi (only implemented as far as it's used). - div255Narrow() BUG=skia: Review URL: https://codereview.chromium.org/1138893002
* Sk4pxGravatar mtklein2015-05-12
| | | | | | | | | | Xfermode_SrcOver: SSE: 2.08ms -> 2.03ms (~2% faster) NEON: my N5 is noisy, but there appears to be no perf change BUG=skia: Review URL: https://codereview.chromium.org/1132273004
* Split rsqrt into rsqrt{0,1,2}, with increasing cost and precision on ARMGravatar mtklein2015-04-27
| | | | | | | | | | | | This is a logical no-op. Everything was using the equivalent of rsqrt1() before, and is now after. BUG=skia: Committed: https://skia.googlesource.com/skia/+/9de16283fdc8cc0d31a84f503578d0ecea4e8297 CQ_EXTRA_TRYBOTS=client.skia.compile:Build-Ubuntu-GCC-Arm64-Debug-Android-Trybot Review URL: https://codereview.chromium.org/1109913002
* Revert of Split rsqrt into rsqrt{0,1,2}, with increasing cost and precision ↵Gravatar mtklein2015-04-27
| | | | | | | | | | | | | | | | | | | | | | | | on ARM (patchset #2 id:20001 of https://codereview.chromium.org/1109913002/) Reason for revert: arm64 typos Original issue's description: > Split rsqrt into rsqrt{0,1,2}, with increasing cost and precision on ARM > > This is a logical no-op. Everything was using the equivalent of rsqrt1() before, and is now after. > > BUG=skia: > > Committed: https://skia.googlesource.com/skia/+/9de16283fdc8cc0d31a84f503578d0ecea4e8297 TBR=reed@google.com,mtklein@chromium.org NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia: Review URL: https://codereview.chromium.org/1105233003
* Split rsqrt into rsqrt{0,1,2}, with increasing cost and precision on ARMGravatar mtklein2015-04-27
| | | | | | | | This is a logical no-op. Everything was using the equivalent of rsqrt1() before, and is now after. BUG=skia: Review URL: https://codereview.chromium.org/1109913002
* Mike's radial gradient CL with better float -> int.Gravatar mtklein2015-04-27
| | | | | | | | | | | | | | | | | | patch from issue 1072303005 at patchset 40001 (http://crrev.com/1072303005#ps40001) This looks quite launchable. radial_gradient3, min of 100 samples: N5: 985µs -> 946µs MBP: 395µs -> 279µs On my MBP, most of the meat looks like it's now in reading the cache and writing to dst one color at a time. Is that something we could do in float math rather than with a lookup table? BUG=skia: CQ_EXTRA_TRYBOTS=client.skia.compile:Build-Mac10.8-Clang-Arm7-Debug-Android-Trybot,Build-Ubuntu-GCC-Arm7-Release-Android_NoNeon-Trybot Committed: https://skia.googlesource.com/skia/+/abf6c5cf95e921fae59efb487480e5b5081cf0ec Review URL: https://codereview.chromium.org/1109643002
* Revert of Mike's radial gradient CL with better float -> int. (patchset #7 ↵Gravatar mtklein2015-04-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | id:120001 of https://codereview.chromium.org/1109643002/) Reason for revert: compile failures. Original issue's description: > Mike's radial gradient CL with better float -> int. > > patch from issue 1072303005 at patchset 40001 (http://crrev.com/1072303005#ps40001) > > This looks quite launchable. radial_gradient3, min of 100 samples: > N5: 985µs -> 946µs > MBP: 395µs -> 279µs > > On my MBP, most of the meat looks like it's now in reading the cache and writing to dst one color at a time. Is that something we could do in float math rather than with a lookup table? > > BUG=skia: > > CQ_EXTRA_TRYBOTS=client.skia.android:Test-Android-GCC-Nexus5-CPU-NEON-Arm7-Debug-Trybot,Test-Android-GCC-Nexus9-CPU-Denver-Arm64-Debug-Trybot > > Committed: https://skia.googlesource.com/skia/+/abf6c5cf95e921fae59efb487480e5b5081cf0ec TBR=reed@google.com,robertphillips@google.com,mtklein@chromium.org NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia: Review URL: https://codereview.chromium.org/1109883003
* Mike's radial gradient CL with better float -> int.Gravatar mtklein2015-04-27
| | | | | | | | | | | | | | | | patch from issue 1072303005 at patchset 40001 (http://crrev.com/1072303005#ps40001) This looks quite launchable. radial_gradient3, min of 100 samples: N5: 985µs -> 946µs MBP: 395µs -> 279µs On my MBP, most of the meat looks like it's now in reading the cache and writing to dst one color at a time. Is that something we could do in float math rather than with a lookup table? BUG=skia: CQ_EXTRA_TRYBOTS=client.skia.android:Test-Android-GCC-Nexus5-CPU-NEON-Arm7-Debug-Trybot,Test-Android-GCC-Nexus9-CPU-Denver-Arm64-Debug-Trybot Review URL: https://codereview.chromium.org/1109643002
* Sk4h and Sk8h for SSEGravatar mtklein2015-04-14
| | | | | | | | | | These will underly the SkPMFloat-like class for uint16_t components. Sk4h will back a single-pixel version, and Sk8h any larger number than that. BUG=skia: Review URL: https://codereview.chromium.org/1088883005
* Rename SkNi to SkNb.Gravatar mtklein2015-04-14
| | | | | | | | | | | | | | | | As used today, SkNi is used in bool-y contexts. This keeps that, but under a new name, SkNb. This makes room for a new SkNi that's focused on integer-y things like loads, stores, arithmetic, etc. The main reason to split these is that we want different specializations for each use case: for bools, it's important for us to specialize 32- and 64-bit to support efficient float- and double- comparisons, but for integer work we're more likely to be looking at 8- and 16- bit lanes. Keeping these use cases siloed helps me manage the compexity of the backend NEON and SSE code. BUG=skia: Review URL: https://codereview.chromium.org/1083123002
* Use switch operator[](int) to kth<int>() so we can use vget_lane.Gravatar mtklein2015-04-03
| | | | | | | | | #floats BUG=skia: BUG=skia:3592 Review URL: https://codereview.chromium.org/1059743002
* Refactor Sk2x<T> + Sk4x<T> into SkNf<N,T> and SkNi<N,T>Gravatar mtklein2015-03-30
| | | | | | | | | | | | | | | | | | | | | The primary feature this delivers is SkNf and SkNd for arbitrary power-of-two N. Non-specialized types or types larger than 128 bits should now Just Work (and we can drop in a specialization to make them faster). Sk4s is now just a typedef for SkNf<4, SkScalar>; Sk4d is SkNf<4, double>, Sk2f SkNf<2, float>, etc. This also makes implementing new specializations easier and more encapsulated. We're now using template specialization, which means the specialized versions don't have to leak out so much from SkNx_sse.h and SkNx_neon.h. This design leaves us room to grow up, e.g to SkNf<8, SkScalar> == Sk8s, and to grown down too, to things like SkNi<8, uint16_t> == Sk8h. To simplify things, I've stripped away most APIs (swizzles, casts, reinterpret_casts) that no one's using yet. I will happily add them back if they seem useful. You shouldn't feel bad about using any of the typedef Sk4s, Sk4f, Sk4d, Sk2s, Sk2f, Sk2d, Sk4i, etc. Here's how you should feel: - Sk4f, Sk4s, Sk2d: feel awesome - Sk2f, Sk2s, Sk4d: feel pretty good No public API changes. TBR=reed@google.com BUG=skia:3592 Review URL: https://codereview.chromium.org/1048593002
* Guard SIMD code with !defined(SKNX_NO_SIMD).Gravatar mtklein2015-03-20
This should make it easy to compare performance of the non-SIMD Sk2x / Sk4x code with our existing portable scalar code. I'm not adding this to SkPMFloat only because we don't have an existing scalar baseline there to compare to. We'll have to keep our wits about us: I just tried your new benchmarks, and Clang's autovectorizer produced almost as good SSE as we did with intrinsics for geo_evalquadat1 and geo_evalquadtangentat1, but not for geo_chopquadat1, which went serial. BUG=skia: Review URL: https://codereview.chromium.org/1026723003