aboutsummaryrefslogtreecommitdiffhomepage
path: root/src/opts/SkUtils_opts.h
Commit message (Collapse)AuthorAge
* make SkOpts functions inline, not staticGravatar Mike Klein2017-08-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | When Skia's built with an interestingly advanced instruction set baseline like SSSE3 or SSE4.1, we end up with two distinct copies of some SkOpts functions, one default in SkOpts.o and one specialization from SkOpts_{ssse3,sse41}.o. These functions are static, and so are technically unrelated, even though they're the same code compiled with the same instructions available. They're going to be identical. What we want here is to remove static but mark them as inline instead. In this case inline means "if the linker sees multiple copies of this, that's cool, just pick any one arbitrarily". That's just what we want. Now, when I disassemble a binary before and after this change, I do see the redundant routines removed. However, the file size change is minimal... I suspect that this must mean the linker has noticed that we had identical code and physically folded the two logically independent routines. I don't know how prevalent this optimization is, though, so it doesn't hurt to give it more of a "one copy please" hint with inline. There may also be a difference here between the binary size (~unchanged) and the in-memory layout of that binary? Change-Id: Id9c8f0ffc84aa1c9a066c22b623d34adab281857 Reviewed-on: https://skia-review.googlesource.com/37501 Commit-Queue: Mike Klein <mtklein@google.com> Reviewed-by: Ben Wagner <bungeman@google.com>
* explicitly vectorize sk_memset{16,32,64}Gravatar Mike Klein2017-08-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This ought to help clients who don't enable autovectorization. With autovectorization enabled, this new version is like, hyper-vectorized compared to the old autovectorization. Instead of handling 128 bytes max per loop, it now handles up to 512 bytes per loop. Pretty exciting. Locally perf effects are a mix, but we'd expect this to help Chrome unambiguously if they've turned off autovectorization. $ out/ok bench:samples=100 sw filter:match=memset32_\\d\* serial Before: [memset32_100000] 16ms @0 20.1ms @99 20.2ms @100 [memset32_10000] 1.07ms @0 1.26ms @99 1.31ms @100 [memset32_1000] 73.9µs @0 89.4µs @99 90.1µs @100 [memset32_100] 8.59µs @0 9.74µs @99 9.96µs @100 [memset32_10] 7.45µs @0 8.96µs @99 8.99µs @100 [memset32_1] 2.29µs @0 2.81µs @99 2.92µs @100 After: [memset32_100000] 16.2ms @0 17.3ms @99 17.3ms @100 [memset32_10000] 1.06ms @0 1.18ms @99 1.23ms @100 [memset32_1000] 72µs @0 75.6µs @99 84.7µs @100 [memset32_100] 9.14µs @0 10.6µs @99 10.7µs @100 [memset32_10] 5.43µs @0 5.88µs @99 5.99µs @100 [memset32_1] 3.43µs @0 3.65µs @99 3.83µs @100 BUG=chromium:755391 Change-Id: If9059a30ca7a345f1f7c37bd51473c29e8bb8922 Reviewed-on: https://skia-review.googlesource.com/34746 Reviewed-by: Florin Malita <fmalita@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
* header cleanupGravatar Hal Canary2017-07-05
| | | | | | | Change-Id: I3f7667a1357194ae2bdd341ad9d46eb93920f404 Reviewed-on: https://skia-review.googlesource.com/21374 Reviewed-by: Brian Salomon <bsalomon@google.com> Commit-Queue: Hal Canary <halcanary@google.com>
* move sk_memset?? to SkOptsGravatar Mike Klein2017-05-23
| | | | | | | | | This lets the compiler generate AVX versions with wider writes. Change-Id: Ia63825e70c72bdb4d14bef97d8b4ea4be54c9d84 Reviewed-on: https://skia-review.googlesource.com/17715 Reviewed-by: Florin Malita <fmalita@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
* try plain-old code for sk_memset16/32 now that NEON is compile-timeGravatar mtklein2016-02-17
| | | | | | | | | | | | | Most of these implementations now just say "always inline". Let's see if we can get away with the simplicity of doing that all the time. These inlined implementations can autovectorize easily. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1639863002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review URL: https://codereview.chromium.org/1639863002
* Reorganize to keep similar code together.Gravatar Mike Klein2015-08-04
This organizes memset16, memset32, and rsqrt the same way as the other code. No functional change. BUG=skia:4117 R=djsollen@google.com Review URL: https://codereview.chromium.org/1264423002 .