| Commit message (Collapse) | Author | Age |
|
|
|
|
|
|
|
|
| |
CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
Change-Id: I082c34a1f484715cd2dca55a8d23101235755e6a
Reviewed-on: https://skia-review.googlesource.com/5233
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
BUG=666707
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=5089
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Change-Id: I3bebfdf635d541d92fb84236f0f6fae2da39d691
Reviewed-on: https://skia-review.googlesource.com/5089
Reviewed-by: Bruce Dawson <brucedawson@google.com>
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
MSVC's not so good at inlining. So tell it where to. It won't hurt the others.
This has nothing directly to do with ODR safety. The anonymous namespaces and 'static' on freestanding functions provide the correctness we need there. But this change can help to mechanically prevent the sort of problems ODR violations can lead to.
I may follow up by extending this strategy further to Sk4px, which is used to implement a lot of the legacy xfermodes.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=3608
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Change-Id: I927334c40910ce43da1fbabdf243c9cd5438bea6
Reviewed-on: https://skia-review.googlesource.com/3608
Reviewed-by: Matt Sarett <msarett@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot,Test-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Debug-ASAN-Trybot
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=3422
Change-Id: Idc0a192faa7ff843aef023229186580c69baf1f7
Reviewed-on: https://skia-review.googlesource.com/3422
Reviewed-by: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This should make each compilation unit's SkNx types distinct from each other's as far as C++ cares. This keeps us from violating the One Definition Rule with different implementations for the same function.
Here's an example I like. Sk4i SkNx_cast(Sk4b) has at least 4 different sensible implementations:
- SSE2: punpcklbw xmm, zero; punpcklbw xmm, zero
- SSSE3: load mask; pshufb xmm, mask
- SSE4.1: pmovzxbd
- AVX2: vpmovzxbd
We really want all these to inline, but if for some reason they don't (Debug build, poor inliner) and they're compiled in SkOpts.cpp, SkOpts_ssse3.cpp, SkOpts_sse41.cpp, SkOpts_hsw.cpp... boom!
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=3461
Change-Id: I0088ebfd7640c1b0de989738ed43c81b530dc0d9
Reviewed-on: https://skia-review.googlesource.com/3461
Reviewed-by: Matt Sarett <msarett@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Original review here: https://skia-review.googlesource.com/c/2990/
Second attempt here: https://skia-review.googlesource.com/c/3064/
This is the same as the second attempt, but with the change to SkOpts_hsw.cpp left out.
That omitted part is the key piece... this just lands the refactoring.
CQ_INCLUDE_TRYBOTS=master.client.skia:Perf-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Debug-ASAN-Trybot,Perf-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Debug-GN,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-Fast-Trybot;master.client.skia.compile:Build-Win-MSVC-x86_64-Debug-Trybot
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=3242
Change-Id: Iaafa793a4854c2c9cd7e85cca3701bf871253f71
Reviewed-on: https://skia-review.googlesource.com/3242
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit Id0ba250037e271a9475fe2f0989d64f0aa909bae.
crbug.com/654213
Looks like Chrome Canary's picking up Haswell code on non-Haswell machines.
Change-Id: I16f976da24db86d5c99636c472ffad56db213a2a
Reviewed-on: https://skia-review.googlesource.com/3108
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Original review here: https://skia-review.googlesource.com/c/2990/
Changes since:
- simpler implementations of load_tail() / store_tail(): slower, but more obviously correct to all compilers
- fleshed out math ops on Sk8i and Sk8u to make unit tests happy on -Fast bot (where we always have AVX2)
- now storing stage functions as void(*)() to avoid undefined behavior and/or linker problems. This restores 32-bit Windows.
- all AVX2 Sk8x methods are marked always-inline, to avoid linking the "wrong" version on Debug builds.
CQ_INCLUDE_TRYBOTS=master.client.skia:Perf-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Debug-ASAN-Trybot,Perf-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Debug-GN,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-Fast-Trybot;master.client.skia.compile:Build-Win-MSVC-x86_64-Debug-Trybot
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=3064
Change-Id: Id0ba250037e271a9475fe2f0989d64f0aa909bae
Reviewed-on: https://skia-review.googlesource.com/3064
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit I1c82e5755d8e44cc0b9c6673d04b117f85d71a3a.
Reason for revert: lots of failing bots.
TBR=mtklein@chromium.org,msarett@google.com,reviews@skia.org
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
Change-Id: I653bed3905187f43196504f19424985fa2a765b5
Reviewed-on: https://skia-review.googlesource.com/3063
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Bench runtime changes:
sRGB: 7194 -> 3735 = 1.93x faster
F16: 6531 -> 2559 = 2.55x faster
Instead of building 4x and 1-3x pipelines and then maybe 8x and 1-7x, instead build either the short ones or the long ones, but not both. If we just take care to use a compatible run_pipeline(), there's some cross-module type disagreement but everything works out in the end.
Oddly, a few places that looked like they'd be faster using SkNx_fma() or Sk4f_round()/Sk8f_round() are actually faster the long way, e.g. multiply, add 0.5, truncate. Curious! In all the other places you see here that I've used SkNx_fma(), it's been a significant speedup.
This folds in a couple refactors and cleanups that I've been meaning to do. Hope you don't mind... if find the new code considerably easier to read than the old code.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2990
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Change-Id: I1c82e5755d8e44cc0b9c6673d04b117f85d71a3a
Reviewed-on: https://skia-review.googlesource.com/2990
Reviewed-by: Matt Sarett <msarett@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Every type now nominally has Load4() and Store4() methods.
The ones that we use are implemented.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=3046
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Change-Id: I7984f0c2063ef8acbc322bd2e968f8f7eaa0d8fd
Reviewed-on: https://skia-review.googlesource.com/3046
Reviewed-by: Matt Sarett <msarett@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Adds Float32 support to SkColorSpaceXform
* Changes API to allows clients to ask for F32, updates clients to
new API
* Adds Sk4f_load4 and Sk4f_store4 to SkNx
* Make use of new xform in SkGr.cpp
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2339233003
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Committed: https://skia.googlesource.com/skia/+/43d6651111374b5d1e4ddd9030dcf079b448ec47
Review-Url: https://codereview.chromium.org/2339233003
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
id:140001 of https://codereview.chromium.org/2339233003/ )
Reason for revert:
Hitting an assert
Original issue's description:
> Support Float32 output from SkColorSpaceXform
>
> * Adds Float32 support to SkColorSpaceXform
> * Changes API to allows clients to ask for F32, updates clients to
> new API
> * Adds Sk4f_load4 and Sk4f_store4 to SkNx
> * Make use of new xform in SkGr.cpp
>
> BUG=skia:
> GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2339233003
> CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
>
> Committed: https://skia.googlesource.com/skia/+/43d6651111374b5d1e4ddd9030dcf079b448ec47
TBR=brianosman@google.com,mtklein@google.com,scroggo@google.com,mtklein@chromium.org,bsalomon@google.com
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:
Review-Url: https://codereview.chromium.org/2347473007
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Adds Float32 support to SkColorSpaceXform
* Changes API to allows clients to ask for F32, updates clients to
new API
* Adds Sk4f_load4 and Sk4f_store4 to SkNx
* Make use of new xform in SkGr.cpp
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2339233003
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/2339233003
|
|
|
|
|
|
|
|
|
|
| |
This lets us get at logical >> in a nicely principled way.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2197683002
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/2197683002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Should feel very similar to Sk4h_store4:
NEON uses its native instruction, SSE unpacks manually.
Since we'll have our F16s in 4 Sk4h by the time we're done here,
this also extracts an Sk4h->Sk4f routine from the old uint64_t->Sk4f one.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2184753002
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/2184753002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This should give us a good baseline to explore using SkRasterPipeline.
A particular colorxform to half float drops from 425us to 282us on my desktop.
Color Xform to Half Float (HP z620)
Original 425us
Trans16 (not 32) 355us
Vector Trans16 378us
Trans16 + Keep Halfs in Vector 335us
Vector Trans16 + Keep Halfs in Vector 282us
Final 282us
Color Xform to Half Float (Nexus 5X)
Original 556us
Final 472us
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2159993003
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/2159993003
|
|
|
|
|
|
|
|
| |
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2134753006
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/2134753006
|
|
|
|
|
|
|
|
|
| |
private)
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2139333002
Review-Url: https://codereview.chromium.org/2139333002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
id:140001 of https://codereview.chromium.org/2133413002/ )
Reason for revert:
Breaking the roll...
https://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_rel_ng/builds/253294/steps/compile%20%28with%20patch%29/logs/stdio
Original issue's description:
> try to speed-up maprect + round2i + contains
>
> We call roundOut in a few places. If we can get SkNx::Ceil we could efficiently implement that as well.
>
>
> BUG=skia:
> GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2133413002
> CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
>
> Committed: https://skia.googlesource.com/skia/+/b42b785d1cbc98bd34aceae338060831b974f9c5
TBR=mtklein@google.com,reed@google.com
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:
Review-Url: https://codereview.chromium.org/2136343002
|
|
|
|
|
|
|
|
|
|
| |
We call roundOut in a few places. If we can get SkNx::Ceil we could efficiently implement that as well.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2133413002
CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/2133413002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- rearrange a bit
- fewer macros
- hooks for all operators
- add left and right scalar operator overrides
- add +=, &=, <<=, etc.
- add SkNx_split() and SkNx_join()
- simplify the many rsqrt() and invert() options to just what we actually use
This refactoring pointed out that our float <-> int NEON conversions are not specialized, so I've implemented them. It seems nice that this is an error rather than silently falling back to serial code.
It's unclear to me if split/join want to be external, static methods, or non-static methods (SkNx_join(), Sk4f::Join(), x.join()). Time will tell?
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1812233003
CQ_EXTRA_TRYBOTS=client.skia.android:Test-Android-GCC-Nexus5-CPU-NEON-Arm7-Release-Trybot;client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review URL: https://codereview.chromium.org/1812233003
|
|
|
|
|
|
|
|
|
|
| |
Just some syntax cleanup. No real change: kth<...>() was calling [...] already.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1714363002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review URL: https://codereview.chromium.org/1714363002
|
|
|
|
|
|
|
| |
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1704583003
Review URL: https://codereview.chromium.org/1704583003
|
|
|
|
|
|
|
| |
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1699293002
Review URL: https://codereview.chromium.org/1699293002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
https://codereview.chromium.org/1690633003/ )
Reason for revert:
Precautionary revert for chromium:586487
Original issue's description:
> SkNx refactoring
>
> - add back Sk4i typedef
> - define SSE casts in terms of Sk4i
> * uint8 <-> float becomes uint8 <-> int <-> float
> * uint16 <-> float becomes uint16 <-> int <-> float
>
> This has the nice side effect of specializing uint8 <-> int
> and uint16 <-> int, which are useful in their own right.
>
> There are many cast specializations now, some of which call each other.
> I have tried to arrange them in some sort of sensible order, subject to
> the constraint that those called must precede those who call.
>
> BUG=skia:
> GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1690633003
> CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
>
> Committed: https://skia.googlesource.com/skia/+/c1eb311f4e98934476f1b2ad5d6de772cf140d60
TBR=herb@google.com,mtklein@chromium.org
# Not skipping CQ checks because original CL landed more than 1 days ago.
BUG=chromium:586487
Review URL: https://codereview.chromium.org/1696903002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- add back Sk4i typedef
- define SSE casts in terms of Sk4i
* uint8 <-> float becomes uint8 <-> int <-> float
* uint16 <-> float becomes uint16 <-> int <-> float
This has the nice side effect of specializing uint8 <-> int
and uint16 <-> int, which are useful in their own right.
There are many cast specializations now, some of which call each other.
I have tried to arrange them in some sort of sensible order, subject to
the constraint that those called must precede those who call.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1690633003
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review URL: https://codereview.chromium.org/1690633003
|
|
|
|
|
|
|
|
|
|
|
|
| |
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1685773002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Committed: https://skia.googlesource.com/skia/+/86c6c4935171a1d2d6a9ffbff37ec6dac1326614
CQ_EXTRA_TRYBOTS=client.skia.android:Test-Android-GCC-Nexus7-GPU-Tegra3-Arm7-Release-Trybot,Test-Android-GCC-Nexus9-GPU-TegraK1-Arm64-Release-Trybot
Review URL: https://codereview.chromium.org/1685773002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
https://codereview.chromium.org/1685773002/ )
Reason for revert:
build break must be this, right?
Original issue's description:
> Sk4f: add floor()
>
> BUG=skia:
> GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1685773002
> CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
>
> Committed: https://skia.googlesource.com/skia/+/86c6c4935171a1d2d6a9ffbff37ec6dac1326614
TBR=herb@google.com,mtklein@chromium.org
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:
Review URL: https://codereview.chromium.org/1679343004
|
|
|
|
|
|
|
|
| |
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1685773002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review URL: https://codereview.chromium.org/1685773002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- trim unused specializations (Sk4i, Sk2d) and apis (SkNx_dup)
- expand apis a little
* v[0] == v.kth<0>()
* SkNx_shuffle can now convert to different-sized vectors, e.g. Sk2f <-> Sk4f
- remove anonymous namespace
I believe it's safe to remove the anonymous namespace right now.
We're worried about violating the One Definition Rule; the anonymous namespace protected us from that.
In Release builds, this is mostly moot, as everything tends to inline completely.
In Debug builds, violating the ODR is at worst an inconvenience, time spent trying to figure out why the bot is broken.
Now that we're building with SSE2/NEON everywhere, very few bots have even a chance about getting confused by two definitions of the same type or function. Where we do compile variants depending on, e.g., SSSE3, we do so in static inline functions. These are not subject to the ODR.
I plan to follow up with a tedious .kth<...>() -> [...] auto-replace.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1683543002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review URL: https://codereview.chromium.org/1683543002
|
|
|
|
|
|
|
|
|
|
| |
refactoring.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1679053002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review URL: https://codereview.chromium.org/1679053002
|
|
|
|
|
|
|
|
|
|
| |
This means we can remove a lot of explicit casts in code that uses SkNx.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1650653002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review URL: https://codereview.chromium.org/1650653002
|
|
|
|
|
|
|
|
|
|
|
| |
There's no reason we couldn't implement this for all ints and floats;
just don't want to land unused code.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1590843003
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review URL: https://codereview.chromium.org/1590843003
|
|
|
|
|
|
|
|
|
|
| |
Given the autovectorization we've seen, I wouldn't expect big speedups
from this, but it does give us a point of control over what's going on.
BUG=skia:
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review URL: https://codereview.chromium.org/1526923003
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- one base case and one N=1 case instead of two each (or three with doubles)
- use SkNx_cast instead of FromBytes/toBytes
- 4-at-a-time Sk4f::ToBytes becomes a special standalone Sk4f_ToBytes
If I did everything right, this'll be perf- and pixel- neutral.
https://gold.skia.org/search2?issue=1526523003&unt=true&query=source_type%3Dgm&master=false
BUG=skia:
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review URL: https://codereview.chromium.org/1526523003
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a big speedup for float -> byte. E.g. gradient_linear_clamp_3color:
x86-64 147µs -> 103µs (Broadwell MBP)
arm64 2.03ms -> 648µs (Galaxy S6)
armv7 1.12ms -> 489µs (Galaxy S6, same device!)
BUG=skia:
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;client.skia.android:Test-Android-GCC-Nexus9-CPU-Denver-Arm64-Debug-Trybot
Review URL: https://codereview.chromium.org/1483953002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
SkNx_cast() can cast between any of our vector types,
provided they have the same number of elements.
Any types should work with the default implementation,
and we can drop in specializations as needed, like the
SSE and NEON Sk4f -> Sk4i I included here as an example.
To make this work, I made some internal name changes:
SkNi<N,T> -> SkNx<N, T>
SkNf<N> -> SkNx<N, float>
User aliases (Sk4f, Sk16b, etc.) stay the same.
We can land this first (it's PS1) if that makes things easier.
BUG=skia:
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review URL: https://codereview.chromium.org/1464623002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Xfermode_ColorDodge_aa 10.3ms -> 7.85ms 0.76x
Xfermode_SoftLight_aa 13.8ms -> 10.2ms 0.74x
Xfermode_ColorBurn_aa 10.7ms -> 7.82ms 0.73x
Xfermode_SoftLight 33.6ms -> 23.2ms 0.69x
Xfermode_ColorDodge 25ms -> 16.5ms 0.66x
Xfermode_ColorBurn 26.1ms -> 16.6ms 0.63x
Ought to be no pixel diffs:
https://gold.skia.org/search2?issue=1432903002&unt=true&query=source_type%3Dgm&master=false
Incidental stuff:
I made the SkNx(T) constructors implicit to make writing math expressions simpler.
This allows us to write expressions like
Sk4f v;
...
v = v*4;
rather than
Sk4f v;
...
v = v * Sk4f(4);
As written it only works when the constant is on the right-hand side,
so expressions like `(Sk4f(1) - da)` have to stay for now. I plan on
following up with a CL that lets those become `(1 - da)` too.
BUG=skia:4117
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review URL: https://codereview.chromium.org/1432903002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- remove float -> int conversion, keeping float -> byte
- remove support for doubles
I was thinking of specializing Sk8f for AVX. This will help keep the complexity down.
This may cause minor diffs in radial gradients: toBytes() rounds where castTrunc() truncated. But I don't see any diffs in Gold.
https://gold.skia.org/search2?issue=1411563008&unt=true&query=source_type%3Dgm&master=false
BUG=skia:4117
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review URL: https://codereview.chromium.org/1411563008
|
|
|
|
|
|
|
|
|
|
|
| |
This allows us to express shuffles more directly in code while also giving us a
convenient point to platform-specify particular shuffles for particular types.
No specializations yet. Everyone just uses the (pretty good) default option.
BUG=skia:
Review URL: https://codereview.chromium.org/1301413006
|
|
|
|
|
|
|
|
| |
BUG=skia:4117
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;client.skia.android:Test-Android-GCC-Nexus9-CPU-Denver-Arm64-Release-Trybot
Review URL: https://codereview.chromium.org/1312053004
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This lets us avoid conversions to [0.0, 1.0] space and rounding that aren't necessary
for SkColorCubeFilter_opts.h.
Dropping rounding on the way back to bytes means we'll see a bunch of off-by-1 diffs.
Rough perf effect:
SSSE3: 110 -> 93 (~15%)
NEON: 465 -> 375 (~20%)
This is the beginning of the end for SkPMFloat as an entity distinct from Sk4f.
I've kept it for now so I can convert sites one by one and think about how things
that really want to keep PM color order will work.
BUG=skia:4117
Review URL: https://codereview.chromium.org/1319413003
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
id:1 of https://codereview.chromium.org/1286093004/ )
Reason for revert:
Maybe causing test / gold problems?
Original issue's description:
> Refactor to put SkXfermode_opts inside SK_OPTS_NS.
>
> Without this refactor I was getting warnings previously about having code
> inside namespace SK_OPTS_NS (e.g. namespace sse2, namespace neon) referring to
> code inside an anonymous namespace (Sk4px, SkPMFloat, Sk4f, etc) [1].
>
> That low-level code was in an anonymous namespace to allow multiple independent
> copies of its methods to be instantiated without the linker getting confused /
> offended about violating the One Definition Rule. This was only happening in
> Debug mode where the methods were not being inlined.
>
> To fix this all, I've force-inlined the methods of the low-level code and
> removed the anonymous namespace.
>
> BUG=skia:4117
>
>
> [1] Here is what those errors looked like:
>
> In file included from ../../../../src/core/SkOpts.cpp:18:0:
> ../../../../src/opts/SkXfermode_opts.h:193:7: error: 'portable::Sk4pxXfermode' has a field 'portable::Sk4pxXfermode::fProc4' whose type uses the anonymous namespace [-Werror]
> class Sk4pxXfermode : public SkProcCoeffXfermode {
> ^
> ../../../../src/opts/SkXfermode_opts.h:193:7: error: 'portable::Sk4pxXfermode' has a field 'portable::Sk4pxXfermode::fAAProc4' whose type uses the anonymous namespace [-Werror]
> ../../../../src/opts/SkXfermode_opts.h:235:7: error: 'portable::SkPMFloatXfermode' has a field 'portable::SkPMFloatXfermode::fProcF' whose type uses the anonymous namespace [-Werror]
> class SkPMFloatXfermode : public SkProcCoeffXfermode {
> ^
> cc1plus: all warnings being treated as errors
>
> Committed: https://skia.googlesource.com/skia/+/b07bee3121680b53b98b780ac08d14d374dd4c6f
TBR=djsollen@google.com,mtklein@chromium.org
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:4117
Review URL: https://codereview.chromium.org/1284333002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Without this refactor I was getting warnings previously about having code
inside namespace SK_OPTS_NS (e.g. namespace sse2, namespace neon) referring to
code inside an anonymous namespace (Sk4px, SkPMFloat, Sk4f, etc) [1].
That low-level code was in an anonymous namespace to allow multiple independent
copies of its methods to be instantiated without the linker getting confused /
offended about violating the One Definition Rule. This was only happening in
Debug mode where the methods were not being inlined.
To fix this all, I've force-inlined the methods of the low-level code and
removed the anonymous namespace.
BUG=skia:4117
[1] Here is what those errors looked like:
In file included from ../../../../src/core/SkOpts.cpp:18:0:
../../../../src/opts/SkXfermode_opts.h:193:7: error: 'portable::Sk4pxXfermode' has a field 'portable::Sk4pxXfermode::fProc4' whose type uses the anonymous namespace [-Werror]
class Sk4pxXfermode : public SkProcCoeffXfermode {
^
../../../../src/opts/SkXfermode_opts.h:193:7: error: 'portable::Sk4pxXfermode' has a field 'portable::Sk4pxXfermode::fAAProc4' whose type uses the anonymous namespace [-Werror]
../../../../src/opts/SkXfermode_opts.h:235:7: error: 'portable::SkPMFloatXfermode' has a field 'portable::SkPMFloatXfermode::fProcF' whose type uses the anonymous namespace [-Werror]
class SkPMFloatXfermode : public SkProcCoeffXfermode {
^
cc1plus: all warnings being treated as errors
Review URL: https://codereview.chromium.org/1286093004
|
|
|
|
|
|
|
|
|
| |
This was causing the 3 xfermodes that use floats and conditionals
to draw wrong when SKNX_NO_SIMD was defined.
BUG=skia:4051
Review URL: https://codereview.chromium.org/1229013003
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
HardLight, Overlay, Darken, and Lighten are all
~2x faster with SSE, ~25% faster with NEON.
This covers all previously-implemented NEON xfermodes.
3 previous SSE xfermodes remain. Those need division
and sqrt, so I'm planning on using SkPMFloat for them.
It'll help the readability and NEON speed if I move that
into [0,1] space first.
The main new concept here is c.thenElse(t,e), which behaves like
(c ? t : e) except, of course, both t and e are evaluated. This allows
us to emulate conditionals with vectors.
This also removes the concept of SkNb. Instead of a standalone bool
vector, each SkNi or SkNf will just return their own types for
comparisons. Turns out to be a lot more manageable this way.
BUG=skia:
Committed: https://skia.googlesource.com/skia/+/b9d4163bebab0f5639f9c5928bb5fc15f472dddc
CQ_EXTRA_TRYBOTS=client.skia.compile:Build-Ubuntu-GCC-Arm64-Debug-Android-Trybot
Review URL: https://codereview.chromium.org/1196713004
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
of https://codereview.chromium.org/1196713004/)
Reason for revert:
64-bit ARM build failures.
Original issue's description:
> Implement four more xfermodes with Sk4px.
>
> HardLight, Overlay, Darken, and Lighten are all
> ~2x faster with SSE, ~25% faster with NEON.
>
> This covers all previously-implemented NEON xfermodes.
> 3 previous SSE xfermodes remain. Those need division
> and sqrt, so I'm planning on using SkPMFloat for them.
> It'll help the readability and NEON speed if I move that
> into [0,1] space first.
>
> The main new concept here is c.thenElse(t,e), which behaves like
> (c ? t : e) except, of course, both t and e are evaluated. This allows
> us to emulate conditionals with vectors.
>
> This also removes the concept of SkNb. Instead of a standalone bool
> vector, each SkNi or SkNf will just return their own types for
> comparisons. Turns out to be a lot more manageable this way.
>
> BUG=skia:
>
> Committed: https://skia.googlesource.com/skia/+/b9d4163bebab0f5639f9c5928bb5fc15f472dddc
TBR=reed@google.com,mtklein@chromium.org
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:
Review URL: https://codereview.chromium.org/1205703008
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
HardLight, Overlay, Darken, and Lighten are all
~2x faster with SSE, ~25% faster with NEON.
This covers all previously-implemented NEON xfermodes.
3 previous SSE xfermodes remain. Those need division
and sqrt, so I'm planning on using SkPMFloat for them.
It'll help the readability and NEON speed if I move that
into [0,1] space first.
The main new concept here is c.thenElse(t,e), which behaves like
(c ? t : e) except, of course, both t and e are evaluated. This allows
us to emulate conditionals with vectors.
This also removes the concept of SkNb. Instead of a standalone bool
vector, each SkNi or SkNf will just return their own types for
comparisons. Turns out to be a lot more manageable this way.
BUG=skia:
Review URL: https://codereview.chromium.org/1196713004
|
|
|
|
|
|
|
|
|
|
| |
BUG=skia:3951
Committed: https://skia.googlesource.com/skia/+/ce9d11189a5924b47c3629063b72bae9d466c2c7
CQ_EXTRA_TRYBOTS=client.skia.android:Test-Android-GCC-Nexus5-CPU-NEON-Arm7-Release-Trybot
Review URL: https://codereview.chromium.org/1184113003
|