| Commit message (Collapse) | Author | Age |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This just makes sure all the plumbing is in place to use the Skylake
Xeon subset of AVX-512 instructions. So far,
- no Windows
- no lowp
- nothing explicitly making use of AVX-512 registers or instructions
This initial pass should run essentially identically to the _hsw AVX2
code we've been using previously. Clang _does_ use AVX-512-only
instructions to implement some of the higher-level concepts we've coded,
but it's really a pretty subtle difference.
Next steps will bump N from 8 to 16 and start threading through an
AVX-512-friendly mask instead of tail. I'll also want to take a harder
look at how we do blending like if_then_else()... the default codegen
here doesn't really take advantage of AVX-512 the way I'd like here.
CQ_INCLUDE_TRYBOTS=skia.primary:Test-Debian9-Clang-GCE-CPU-AVX512-x86_64-Debug
Change-Id: I6c9442488a449ea4770617bb22b2669859cc92e2
Reviewed-on: https://skia-review.googlesource.com/54062
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Herb Derby <herb@google.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I just noticed we don't really have any CPU test bots that have less
than AVX anymore. I'd like to make sure we're testing each mode of
SkJumper at least, so I've added flags to let us limit to SSE2 or
SSE4.1, the modes currently missing coverage.
Add the bots to test these modes too.
CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Release-SK_CPU_LIMIT_SSE2,Test-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Release-SK_CPU_LIMIT_SSE41
Change-Id: I7c2b061332e7f037538488260583076c34ae7b1e
Reviewed-on: https://skia-review.googlesource.com/14405
Reviewed-by: Eric Boren <borenet@google.com>
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This time, don't call xgetbv() before checking we can.
This reverts commit b26373cfd8151b2fa56bdf532ddcde4919cce09f.
CQ_INCLUDE_TRYBOTS=skia.primary:Test-Mac-Clang-MacMini4.1-GPU-GeForce320M-x86_64-Debug
Change-Id: I148302cb36446891b1d79b2e60cde0b43420c1a8
Reviewed-on: https://skia-review.googlesource.com/9089
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit 3c322e23a013e78fcbe0edd7adccd580af8466bc.
Reason for revert: crash in SkCpu on Mac
Original change's description:
> Add AVX-512 detection to SkCpu
>
> I've added a SKY alias for the five new bits detected on a Skylake Xeon.
>
> Change-Id: I9f7dd48f4dc866608d81befd061434ca325ef451
> Reviewed-on: https://skia-review.googlesource.com/9043
> Reviewed-by: Herb Derby <herb@google.com>
> Commit-Queue: Mike Klein <mtklein@chromium.org>
>
TBR=mtklein@chromium.org,herb@google.com,reviews@skia.org
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
Change-Id: I3cc06c7e32391e68d6cfe084786b18270cdab631
Reviewed-on: https://skia-review.googlesource.com/9074
Reviewed-by: Cary Clark <caryclark@google.com>
Commit-Queue: Cary Clark <caryclark@google.com>
|
|
|
|
|
|
|
|
|
| |
I've added a SKY alias for the five new bits detected on a Skylake Xeon.
Change-Id: I9f7dd48f4dc866608d81befd061434ca325ef451
Reviewed-on: https://skia-review.googlesource.com/9043
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Haswell brought a whole slew of handy new instructions for us (AVX2, FMA, BMI1+BMI2) and also feature F16C, which came one generation earlier on Ivybridge. We work with integers often enough that we really want to target AVX2 instead of AVX, and this means it's pretty practical to ask for all those other goodies along with it.
Chrome's GN files and Google3's BUILD file will need an update, before or after this CL.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2840
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Change-Id: I826daf77b5104664c5d31ddaabee347e287b87a2
Reviewed-on: https://skia-review.googlesource.com/2840
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Herb Derby <herb@google.com>
|
|
|
|
|
|
|
|
| |
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2322033002
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/2322033002
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I have successfully detected CRC32 instruction support on my Nexus 5x.
Use of these instructions to follow... I am not yet sure which compilers if any will give me instrinsics or let me write them in asm.
defined(__ARM_FEATURE_CRC32) should cover users like Android Framework who build with the best settings possible. cpu-features.h covers use cases like Clank and our bots.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2259133002
Review-Url: https://codereview.chromium.org/2259133002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This removes the code paths where we make SkCpu::Supports() calls
from within a tight loop. It keeps code paths using SkCpu::Supports()
to choose entire routines from src/opts/.
We can't rely on these hyper-local checks to be hoisted up reliably enough.
It worked pretty well with the first couple platforms we tried (e.g. Clang
on Linux/Mac) but we can't gaurantee it works everywhere.
Further, I'm not able to actually do anything fancy with those tests
outside of x86... I've not found a way to get, say, NEON+F16 conversion
code embedded into ordinary NEON code outside writing then entire function
in external assembly.
This whole idea becomes less important now that we've got a way to chain
separate function calls together efficiently. We can now, e.g., use an
AVX+F16C method to load some pixels, then chain that into an ordinary AVX
method to color filter them.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2138073002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/2138073002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I think I cracked it.
Though, this may not technically be legal C++...
I've only got one definition of SkCpu::gCachedFeatures,
but two different declarations: non-const in SkCpu.cpp, const elsewhere.
Is this...
- legal C++?
- not C++ but probably works as I think?
- not C++ and will probably blow up?
- who knows, let's see?
I have tested that the features are cached properly, read properly, and that the generated code treats SkCpu::gCachedFeatures as a global constant outside SkCpu.cpp. So it all observably works optimally.
Expanding testing to more bots.
TBR=reed@google.com
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1905683003
Review URL: https://codereview.chromium.org/1905683003
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- Moves CPU feature detection to its own file.
- Cleans up some redundant feature detection scattered around core/ and opts/.
- Can now detect a few new CPU features:
* F16C -> Intel f16<->f32 instructions, added between AVX and AVX2
* FMA -> Intel FMA instructions, added at the same time as AVX2
* VFP_FP16 -> ARM f16<->f32 instructions, quite common
* NEON_FMA -> ARM FMA instructions, also quite common
* SSE and SSE3... why not?
This new internal API makes it very cheap to do fine-grained runtime CPU
feature detection. Redundant calls to SkCpu::Supports() should be eliminated
and it's hoistable out of loops. It compiles away entirely when we have the
appropriate instructions available at compile time.
This means we can call it to guard even a little snippet of 1 or 2 instructions
right where needed and let inlining hoist the check (if any at all) up to
somewhere that doesn't hurt performance. I've explained how I made this work
in the private section of the new header.
Once this lands and bakes a bit, I'll start following up with CLs to use it more
and to add a bunch of those little 1-2 instruction snippets we've been wanting,
e.g. cvtps2ph, cvtph2ps, ptest, pmulld, pmovzxbd, blendvps, pshufb, roundps
(for floor) on x86, and vcvt.f32.f16, vcvt.f16.f32 on ARM.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1890483002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Committed: https://skia.googlesource.com/skia/+/872ea29357439f05b1f6995dd300fc054733e607
Review URL: https://codereview.chromium.org/1890483002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
of https://codereview.chromium.org/1890483002/ )
Reason for revert:
many unexpected GM diffs across GPU+CPU configs on Windows (hopefully just text masks on GPU?). seems like we pick a different srcover variant in some places.
Original issue's description:
> Move CPU feature detection to its own file.
>
> - Moves CPU feature detection to its own file.
> - Cleans up some redundant feature detection scattered around core/ and opts/.
> - Can now detect a few new CPU features:
> * F16C -> Intel f16<->f32 instructions, added between AVX and AVX2
> * FMA -> Intel FMA instructions, added at the same time as AVX2
> * VFP_FP16 -> ARM f16<->f32 instructions, quite common
> * NEON_FMA -> ARM FMA instructions, also quite common
> * SSE and SSE3... why not?
>
> This new internal API makes it very cheap to do fine-grained runtime CPU
> feature detection. Redundant calls to SkCpu::Supports() should be eliminated
> and it's hoistable out of loops. It compiles away entirely when we have the
> appropriate instructions available at compile time.
>
> This means we can call it to guard even a little snippet of 1 or 2 instructions
> right where needed and let inlining hoist the check (if any at all) up to
> somewhere that doesn't hurt performance. I've explained how I made this work
> in the private section of the new header.
>
> Once this lands and bakes a bit, I'll start following up with CLs to use it more
> and to add a bunch of those little 1-2 instruction snippets we've been wanting,
> e.g. cvtps2ph, cvtph2ps, ptest, pmulld, pmovzxbd, blendvps, pshufb, roundps
> (for floor) on x86, and vcvt.f32.f16, vcvt.f16.f32 on ARM.
>
> BUG=skia:
> GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1890483002
> CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
>
> Committed: https://skia.googlesource.com/skia/+/872ea29357439f05b1f6995dd300fc054733e607
TBR=fmalita@chromium.org,herb@google.com,reed@google.com,mtklein@chromium.org
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:
Review URL: https://codereview.chromium.org/1892643003
|
|
- Moves CPU feature detection to its own file.
- Cleans up some redundant feature detection scattered around core/ and opts/.
- Can now detect a few new CPU features:
* F16C -> Intel f16<->f32 instructions, added between AVX and AVX2
* FMA -> Intel FMA instructions, added at the same time as AVX2
* VFP_FP16 -> ARM f16<->f32 instructions, quite common
* NEON_FMA -> ARM FMA instructions, also quite common
* SSE and SSE3... why not?
This new internal API makes it very cheap to do fine-grained runtime CPU
feature detection. Redundant calls to SkCpu::Supports() should be eliminated
and it's hoistable out of loops. It compiles away entirely when we have the
appropriate instructions available at compile time.
This means we can call it to guard even a little snippet of 1 or 2 instructions
right where needed and let inlining hoist the check (if any at all) up to
somewhere that doesn't hurt performance. I've explained how I made this work
in the private section of the new header.
Once this lands and bakes a bit, I'll start following up with CLs to use it more
and to add a bunch of those little 1-2 instruction snippets we've been wanting,
e.g. cvtps2ph, cvtph2ps, ptest, pmulld, pmovzxbd, blendvps, pshufb, roundps
(for floor) on x86, and vcvt.f32.f16, vcvt.f16.f32 on ARM.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1890483002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review URL: https://codereview.chromium.org/1890483002
|