path: root/src/opts/SkPMFloat_neon.h
Commit message (Author, Age)
* SkPMFloat: avoid loads and stores where possible. (mtklein, 2015-03-18)

  A store/load pair like this is a redundant no-op:

      store simd_register_a, memory_address
      load  memory_address, simd_register_a

  Everyone seems to be good at removing those when using SSE, but GCC and Clang are pretty terrible at this for NEON. We end up issuing both redundant commands, usually to and from the stack. That's slow. Let's not do that.

  This CL unions in the native SIMD register type into SkPMFloat, so that we can assign to and from it directly, which is generating a lot better NEON code. On my Nexus 5, the benchmarks improve from 36ns to 23ns.

  SSE is just as fast either way, but I paralleled the NEON code for consistency. It's a little terser. And because it needed the platform headers anyway, I moved all includes into SkPMFloat.h, again only for consistency.

  I'd union in Sk4f too to make its conversion methods a little clearer, but MSVC won't let me (it has a copy constructor... they're apparently not up to speed with C++11 unrestricted unions).

  BUG=skia:

  Review URL: https://codereview.chromium.org/1015083004
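The union described in the commit above can be sketched roughly as follows. This is not the actual Skia code: `PMFloatSketch`, `NativeVec`, `vec()` and `lane()` are hypothetical names, and the lane order is arbitrary. The sketch puts a float array and the platform's 4-float register type into one anonymous union, so the register can be assigned to and from directly. Reading a union member other than the one last written is not guaranteed by ISO C++; GCC documents it as supported, and the sketch assumes a compiler that behaves that way.

```cpp
#include <cassert>

// Pick a native 4-float register type where one is available.
// The plain-struct fallback is only there so the sketch compiles anywhere.
#if defined(__SSE2__)
    #include <emmintrin.h>
    typedef __m128 NativeVec;
#elif defined(__ARM_NEON) || defined(__ARM_NEON__)
    #include <arm_neon.h>
    typedef float32x4_t NativeVec;
#else
    struct NativeVec { float lanes[4]; };
#endif

// Hypothetical stand-in for SkPMFloat: four floats that alias one SIMD register.
class PMFloatSketch {
public:
    PMFloatSketch(float a, float r, float g, float b) {
        fColor[0] = a; fColor[1] = r; fColor[2] = g; fColor[3] = b;
    }

    // Assign straight from a native register: no explicit store to a float array.
    explicit PMFloatSketch(const NativeVec& v) { fVec = v; }

    // Hand the register back directly: no explicit load from a float array.
    NativeVec vec() const { return fVec; }

    // Per-lane access goes through the float view of the same storage.
    float lane(int i) const {
        assert(0 <= i && i < 4);
        return fColor[i];
    }

private:
    union {
        float     fColor[4];  // float view
        NativeVec fVec;       // register view of the same 16 bytes
    };
};
```

Round-tripping a value through `vec()` and the `NativeVec` constructor should leave every lane unchanged, which is the property the direct assignment relies on.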
* SKPMFloat: we can beat the naive loops when clamping (mtklein, 2015-03-06)

  Clamping 4 at a time is now about 15% faster than 1 at a time with SSSE3. Clamping 4 at a time is now about 20% faster with SSE2, and this applies to non-clamping too (we still just clamp there).

  In all cases, 4 at a time is never worse than 1 at a time, and not clamping is never slower than clamping.

  Here's all the bench results, with the numbers for portable code as a fun point of reference:

  SSSE3:
      maxrss  loops  min     median  mean    max     stddev  samples     config        bench
      10M     2291   4.66ns  4.66ns  4.66ns  4.68ns  0%      ▆█▁▁▁▇▁▇▁▃  nonrendering  SkPMFloat_get_1x
      10M     2040   5.29ns  5.3ns   5.3ns   5.32ns  0%      ▃▆▃▃▁▁▆▃▃█  nonrendering  SkPMFloat_clamp_1x
      10M     7175   4.62ns  4.62ns  4.62ns  4.63ns  0%      ▁▄▃████▃▄▇  nonrendering  SkPMFloat_get_4x
      10M     5801   4.89ns  4.89ns  4.89ns  4.91ns  0%      █▂▄▃▁▃▄█▁▁  nonrendering  SkPMFloat_clamp_4x

  SSE2:
      maxrss  loops  min     median  mean    max     stddev  samples     config        bench
      10M     1601   6.02ns  6.05ns  6.04ns  6.08ns  0%      █▅▄▅▄▂▁▂▂▂  nonrendering  SkPMFloat_get_1x
      10M     2918   6.05ns  6.06ns  6.05ns  6.06ns  0%      ▂▇▁▇▇▁▇█▇▂  nonrendering  SkPMFloat_clamp_1x
      10M     3569   5.43ns  5.45ns  5.44ns  5.45ns  0%      ▄█▂██▇▁▁▇▇  nonrendering  SkPMFloat_get_4x
      10M     4168   5.43ns  5.43ns  5.43ns  5.44ns  0%      █▄▇▁▇▄▁▁▁▁  nonrendering  SkPMFloat_clamp_4x

  Portable:
      maxrss  loops  min     median  mean    max     stddev  samples     config        bench
      10M     500    27.8ns  28.1ns  28ns    28.2ns  0%      ▃█▆▃▇▃▆▁▇▂  nonrendering  SkPMFloat_get_1x
      10M     770    40.1ns  40.2ns  40.2ns  40.3ns  0%      ▅▁▃▂▆▄█▂▅▂  nonrendering  SkPMFloat_clamp_1x
      10M     1269   28.4ns  28.8ns  29.1ns  32.7ns  4%      ▂▂▂█▂▁▁▂▁▁  nonrendering  SkPMFloat_get_4x
      10M     1439   40.2ns  40.4ns  40.4ns  40.5ns  0%      ▆▆▆█▁▆▅█▅▆  nonrendering  SkPMFloat_clamp_4x

  SkPMFloat_neon.h is still one big TODO as far as 4-at-a-time APIs go.

  BUG=skia:

  Review URL: https://codereview.chromium.org/982123002
* Update SkPMFloat API a bit. (mtklein, 2015-03-04)

  Instead of set(SkPMColor), add a constructor SkPMFloat(SkPMColor). Replace setA(), setR(), etc. with a 4 float constructor. And, promise to stick to SkPMColor order.

  BUG=skia:

  Review URL: https://codereview.chromium.org/977773002
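A rough portable sketch of the API shape this commit describes, under stated assumptions: `PMFloatSketch` and `PackedColor` are hypothetical stand-ins for SkPMFloat and SkPMColor, and the channel positions (`kBIndex` through `kAIndex`) are made up for illustration, since the real packing is decided by Skia's configuration. The point shown is only that lane i of the float form holds byte i of the packed color, so both constructors agree on one order.

```cpp
#include <cassert>
#include <cstdint>

typedef uint32_t PackedColor;  // hypothetical stand-in for SkPMColor

// Assumed byte positions within the packed color; illustrative only.
enum { kBIndex = 0, kGIndex = 1, kRIndex = 2, kAIndex = 3 };

class PMFloatSketch {
public:
    // Constructor from a packed color (replaces a set(color) method):
    // lane i receives byte i, so the float order matches the packed order.
    explicit PMFloatSketch(PackedColor c) {
        for (int i = 0; i < 4; i++) {
            fLanes[i] = static_cast<float>((c >> (8 * i)) & 0xFF);
        }
    }

    // Four-float constructor (replaces per-channel setters).
    PMFloatSketch(float a, float r, float g, float b) {
        fLanes[kAIndex] = a;
        fLanes[kRIndex] = r;
        fLanes[kGIndex] = g;
        fLanes[kBIndex] = b;
    }

    float a() const { return lane(kAIndex); }
    float r() const { return lane(kRIndex); }
    float g() const { return lane(kGIndex); }
    float b() const { return lane(kBIndex); }

private:
    float lane(int i) const {
        assert(0 <= i && i < 4);
        return fLanes[i];
    }

    float fLanes[4];  // kept in the same order as the packed color's bytes
};
```

With these assumed positions, `PMFloatSketch(0xFF804020u)` and `PMFloatSketch(255.0f, 128.0f, 64.0f, 32.0f)` describe the same color.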
* Test and fix SkPMFloat rounding. (mtklein, 2015-03-03)

  SSE rounds for free (that was a happy accident: they also have a truncating version). NEON does not, nor obviously the portable code, so they add 0.5 before truncating.

  NOPRESUBMIT=true

  BUG=skia:

  Review URL: https://codereview.chromium.org/974643002
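The "add 0.5 before truncating" step can be illustrated with a small portable sketch. The helper names `round_to_byte` and `truncate_to_byte` are hypothetical, and the sketch assumes the component is already a non-negative float in [0,255]; it is not taken from the Skia sources.

```cpp
#include <cassert>
#include <cstdint>

// Plain truncation: a float-to-integer cast drops the fractional part.
inline uint8_t truncate_to_byte(float v) {
    assert(v >= 0.0f && v <= 255.0f);
    return static_cast<uint8_t>(v);
}

// Round to nearest by adding 0.5 first, then truncating.
// This is only valid for non-negative inputs, which is assumed here.
inline uint8_t round_to_byte(float v) {
    assert(v >= 0.0f && v <= 255.0f);
    return static_cast<uint8_t>(v + 0.5f);
}
```

For an input such as 127.6, truncation alone gives 127 while the add-then-truncate version gives 128, which is the difference the commit's test is after.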
* Make SkPMFloats store floats in [0,255] instead of [0,1]. (mtklein, 2015-03-03)

  This pushes the cost of the *255 and *1/255 conversions onto only those code paths that need it. We're not doing it any more efficiently than can be done with Sk4f.

  In microbenchmark isolation, this is about a 15% speedup.

  BUG=skia:

  NOPRESUBMIT=true

  Review URL: https://codereview.chromium.org/973603002
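To make the trade-off concrete, here is a minimal sketch with hypothetical helper names (none of them are Skia functions). When components are kept in [0,255], converting from a byte is a plain integer-to-float conversion, and the multiplications by 255 and 1/255 are only paid by callers that actually want unit-range values.

```cpp
#include <cassert>
#include <cstdint>

// Byte -> [0,255] float: no scaling needed in this representation.
inline float byte_to_component(uint8_t b) {
    return static_cast<float>(b);
}

// Only code paths that need [0,1] values pay for the *1/255 ...
inline float component_to_unit(float c) {
    assert(c >= 0.0f && c <= 255.0f);
    return c * (1.0f / 255.0f);
}

// ... and only code paths that start from [0,1] values pay for the *255.
inline float unit_to_component(float u) {
    assert(u >= 0.0f && u <= 1.0f);
    return u * 255.0f;
}
```

A path that reads a byte, blends in [0,255] space, and writes a byte back never calls either scaling helper.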
* Spin off some fixes to land right away. (mtklein, 2015-02-26)

  BUG=skia:

  Review URL: https://codereview.chromium.org/960023002
* Sketch SkPMFloat (mtklein, 2015-02-23)

  BUG=skia:

  Committed: https://skia.googlesource.com/skia/+/50d2b3114b3e59dc84811881591bf25b2c1ecb9f

  CQ_EXTRA_TRYBOTS=client.skia.compile:Build-Ubuntu13.10-GCC4.8-Arm7-Release-Android_Neon-Trybot

  http://build.chromium.org/p/client.skia.compile/builders/Build-Ubuntu13.10-GCC4.8-Arm7-Release-Android_Neon/builds/2120/steps/build%20most/logs/stdio

  Review URL: https://codereview.chromium.org/936633002
* Revert of Sketch SkPMFloat (patchset #15 id:270001 of https://codereview.chromium.org/936633002/) (mtklein, 2015-02-23)

  Reason for revert:
  http://build.chromium.org/p/client.skia.compile/builders/Build-Ubuntu13.10-GCC4.8-Arm7-Release-Android_Neon/builds/2120/steps/build%20most/logs/stdio

  Original issue's description:
  > Sketch SkPMFloat
  >
  > BUG=skia:
  >
  > Committed: https://skia.googlesource.com/skia/+/50d2b3114b3e59dc84811881591bf25b2c1ecb9f

  TBR=reed@google.com,msarrett@google.com,mtklein@chromium.org
  NOPRESUBMIT=true
  NOTREECHECKS=true
  NOTRY=true
  BUG=skia:

  Review URL: https://codereview.chromium.org/952453004
* Sketch SkPMFloat (mtklein, 2015-02-23)

  BUG=skia:

  Review URL: https://codereview.chromium.org/936633002