| Commit message | Author | Age |
|
When Skia's built with an interestingly advanced instruction set
baseline like SSSE3 or SSE4.1, we end up with two distinct copies of
some SkOpts functions: one default in SkOpts.o and one specialization
from SkOpts_{ssse3,sse41}.o. These functions are static, and so are
technically unrelated, even though they're the same code compiled with
the same instructions available. They're going to be identical.

What we want here is to remove static and mark them as inline instead.
In this case inline means "if the linker sees multiple copies of this,
that's cool, just pick any one arbitrarily". That's just what we want.

Now, when I disassemble a binary before and after this change, I do see
the redundant routines removed. However, the file size change is
minimal. I suspect this means the linker had already noticed that we
had identical code and physically folded the two logically independent
routines. I don't know how prevalent this optimization is, though, so
it doesn't hurt to give it more of a "one copy please" hint with inline.

There may also be a difference here between the binary size (~unchanged)
and the in-memory layout of that binary?
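
To make the static-versus-inline distinction concrete, here is a minimal sketch of a header that two translation units might both include. The function names are hypothetical and this is not the actual SkOpts code; it only illustrates the linkage difference described above.

```cpp
// Hypothetical header, imagined to be included by both a default and a
// specialized translation unit. Names are illustrative, not Skia's.
#include <cassert>
#include <cstdint>

// `static` gives internal linkage: each translation unit that includes this
// header gets its own private copy, and the linker sees unrelated symbols.
static void memset32_static(uint32_t* dst, uint32_t value, int count) {
    assert(count >= 0);
    for (int i = 0; i < count; i++) {
        dst[i] = value;
    }
}

// `inline` gives external linkage and permits identical definitions in many
// translation units; the linker is free to keep one copy and drop the rest.
inline void memset32_inline(uint32_t* dst, uint32_t value, int count) {
    assert(count >= 0);
    for (int i = 0; i < count; i++) {
        dst[i] = value;
    }
}
```

Both functions behave identically at the call site; the difference only shows up in how many copies can survive into the linked binary.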
Change-Id: Id9c8f0ffc84aa1c9a066c22b623d34adab281857
Reviewed-on: https://skia-review.googlesource.com/37501
Commit-Queue: Mike Klein <mtklein@google.com>
Reviewed-by: Ben Wagner <bungeman@google.com>
|
This ought to help clients who don't enable autovectorization.
With autovectorization enabled, this new version is
hyper-vectorized compared to the old autovectorized code:
instead of handling at most 128 bytes per loop, it now
handles up to 512 bytes per loop. Pretty exciting.
Locally, perf effects are mixed, but we'd expect this to help
Chrome unambiguously if they've turned off autovectorization.
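
As a rough illustration of a memset32 that stores whole chunks without relying on the autovectorizer, here is a minimal sketch. It is not the code from this change: the function name, the 4-lane chunk width, and the memcpy-based store are all assumptions chosen to keep the example portable.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical chunk width: 4 x 32-bit lanes = 16 bytes per store.
constexpr int kLanes = 4;

// Fill `count` 32-bit values, writing 16-byte chunks first and finishing
// with a scalar tail for the remaining 0..3 values.
inline void memset32_chunked(uint32_t* dst, uint32_t value, int count) {
    assert(count >= 0);

    uint32_t chunk[kLanes];
    for (int i = 0; i < kLanes; i++) {
        chunk[i] = value;
    }

    while (count >= kLanes) {
        // A fixed-size memcpy is often lowered to one wide store when optimizing.
        std::memcpy(dst, chunk, sizeof(chunk));
        dst += kLanes;
        count -= kLanes;
    }
    while (count-- > 0) {
        *dst++ = value;
    }
}
```

An optimizing compiler that does autovectorize may still unroll or widen the chunk loop further, which is one plausible way a larger number of bytes per loop iteration can arise.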
$ out/ok bench:samples=100 sw filter:match=memset32_\\d\* serial
Before:
[memset32_100000] 16ms @0 20.1ms @99 20.2ms @100
[memset32_10000] 1.07ms @0 1.26ms @99 1.31ms @100
[memset32_1000] 73.9µs @0 89.4µs @99 90.1µs @100
[memset32_100] 8.59µs @0 9.74µs @99 9.96µs @100
[memset32_10] 7.45µs @0 8.96µs @99 8.99µs @100
[memset32_1] 2.29µs @0 2.81µs @99 2.92µs @100
After:
[memset32_100000] 16.2ms @0 17.3ms @99 17.3ms @100
[memset32_10000] 1.06ms @0 1.18ms @99 1.23ms @100
[memset32_1000] 72µs @0 75.6µs @99 84.7µs @100
[memset32_100] 9.14µs @0 10.6µs @99 10.7µs @100
[memset32_10] 5.43µs @0 5.88µs @99 5.99µs @100
[memset32_1] 3.43µs @0 3.65µs @99 3.83µs @100
BUG=chromium:755391
Change-Id: If9059a30ca7a345f1f7c37bd51473c29e8bb8922
Reviewed-on: https://skia-review.googlesource.com/34746
Reviewed-by: Florin Malita <fmalita@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
Change-Id: I3f7667a1357194ae2bdd341ad9d46eb93920f404
Reviewed-on: https://skia-review.googlesource.com/21374
Reviewed-by: Brian Salomon <bsalomon@google.com>
Commit-Queue: Hal Canary <halcanary@google.com>
|
This lets the compiler generate AVX versions with wider writes.
Change-Id: Ia63825e70c72bdb4d14bef97d8b4ea4be54c9d84
Reviewed-on: https://skia-review.googlesource.com/17715
Reviewed-by: Florin Malita <fmalita@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
Most of these implementations now just say "always inline".
Let's see if we can get away with the simplicity of doing that all the time.
These inlined implementations can autovectorize easily.
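
For readers unfamiliar with "always inline", here is a minimal sketch of the idea using compiler-specific attributes. The macro name and the function are hypothetical, not Skia's own definitions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical macro: ask the compiler to inline regardless of its heuristics.
#if defined(_MSC_VER)
    #define SKETCH_ALWAYS_INLINE __forceinline
#else
    #define SKETCH_ALWAYS_INLINE inline __attribute__((always_inline))
#endif

// A simple loop like this, once inlined into its caller, is a straightforward
// candidate for the compiler's autovectorizer.
SKETCH_ALWAYS_INLINE void memset16_sketch(uint16_t* dst, uint16_t value, int count) {
    assert(count >= 0);
    for (int i = 0; i < count; i++) {
        dst[i] = value;
    }
}
```

Because the body is inlined at each call site, it is compiled with whatever instruction set that caller's translation unit enables.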
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1639863002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review URL: https://codereview.chromium.org/1639863002
|
This organizes memset16, memset32, and rsqrt the same way as the other code. No functional change.
BUG=skia:4117
R=djsollen@google.com
Review URL: https://codereview.chromium.org/1264423002 .
|