aboutsummaryrefslogtreecommitdiffhomepage
path: root/src/core/SkCpu.h
Commit message (Collapse)AuthorAge
* add _skx stagesGravatar Mike Klein2017-10-02
| | | | | | | | | | | | | | | | | | | | | | | | | | This just makes sure all the plumbing is in place to use the Skylake Xeon subset of AVX-512 instructions. So far, - no Windows - no lowp - nothing explicitly making use of AVX-512 registers or instructions This initial pass should run essentially identically to the _hsw AVX2 code we've been using previously. Clang _does_ use AVX-512-only instructions to implement some of the higher-level concepts we've coded, but it's really a pretty subtle difference. Next steps will bump N from 8 to 16 and start threading through an AVX-512-friendly mask instead of tail. I'll also want to take a harder look at how we do blending like if_then_else()... the default codegen here doesn't really take advantage of AVX-512 the way I'd like here. CQ_INCLUDE_TRYBOTS=skia.primary:Test-Debian9-Clang-GCE-CPU-AVX512-x86_64-Debug Change-Id: I6c9442488a449ea4770617bb22b2669859cc92e2 Reviewed-on: https://skia-review.googlesource.com/54062 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Herb Derby <herb@google.com>
* add #defines to limit SkCpuGravatar Mike Klein2017-04-27
| | | | | | | | | | | | | | | | | I just noticed we don't really have any CPU test bots that have less than AVX anymore. I'd like to make sure we're testing each mode of SkJumper at least, so I've added flags to let us limit to SSE2 or SSE4.1, the modes currently missing coverage. Add the bots to test these modes too. CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Release-SK_CPU_LIMIT_SSE2,Test-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Release-SK_CPU_LIMIT_SSE41 Change-Id: I7c2b061332e7f037538488260583076c34ae7b1e Reviewed-on: https://skia-review.googlesource.com/14405 Reviewed-by: Eric Boren <borenet@google.com> Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* Add AVX-512 detection to SkCpu, try 2.Gravatar Mike Klein2017-03-01
| | | | | | | | | | | | | This time, don't call xgetbv() before checking we can. This reverts commit b26373cfd8151b2fa56bdf532ddcde4919cce09f. CQ_INCLUDE_TRYBOTS=skia.primary:Test-Mac-Clang-MacMini4.1-GPU-GeForce320M-x86_64-Debug Change-Id: I148302cb36446891b1d79b2e60cde0b43420c1a8 Reviewed-on: https://skia-review.googlesource.com/9089 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
* Revert "Add AVX-512 detection to SkCpu"Gravatar Cary Clark2017-02-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 3c322e23a013e78fcbe0edd7adccd580af8466bc. Reason for revert: crash in SkCpu on Mac Original change's description: > Add AVX-512 detection to SkCpu > > I've added a SKY alias for the five new bits detected on a Skylake Xeon. > > Change-Id: I9f7dd48f4dc866608d81befd061434ca325ef451 > Reviewed-on: https://skia-review.googlesource.com/9043 > Reviewed-by: Herb Derby <herb@google.com> > Commit-Queue: Mike Klein <mtklein@chromium.org> > TBR=mtklein@chromium.org,herb@google.com,reviews@skia.org NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true Change-Id: I3cc06c7e32391e68d6cfe084786b18270cdab631 Reviewed-on: https://skia-review.googlesource.com/9074 Reviewed-by: Cary Clark <caryclark@google.com> Commit-Queue: Cary Clark <caryclark@google.com>
* Add AVX-512 detection to SkCpuGravatar Mike Klein2017-02-28
| | | | | | | | | I've added a SKY alias for the five new bits detected on a Skylake Xeon. Change-Id: I9f7dd48f4dc866608d81befd061434ca325ef451 Reviewed-on: https://skia-review.googlesource.com/9043 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* Add an SkOpts target for Haswell+ Intel chips.Gravatar Mike Klein2016-09-30
| | | | | | | | | | | | | | | | Haswell brought a whole slew of handy new instructions for us (AVX2, FMA, BMI1+BMI2) and also feature F16C, which came one generation earlier on Ivybridge. We work with integers often enough that we really want to target AVX2 instead of AVX, and this means it's pretty practical to ask for all those other goodies along with it. Chrome's GN files and Google3's BUILD file will need an update, before or after this CL. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2840 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Change-Id: I826daf77b5104664c5d31ddaabee347e287b87a2 Reviewed-on: https://skia-review.googlesource.com/2840 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Herb Derby <herb@google.com>
* Apple devices do not support CRC32 instructions. Don't believe Clang's lies.Gravatar mtklein2016-09-08
| | | | | | | | BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2322033002 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2322033002
* Detect CRC32 instructions on ARMv8.Gravatar mtklein2016-08-18
| | | | | | | | | | | | | I have successfully detected CRC32 instruction support on my Nexus 5x. Use of these instructions to follow... I am not yet sure which compilers if any will give me instrinsics or let me write them in asm. defined(__ARM_FEATURE_CRC32) should cover users like Android Framework who build with the best settings possible. cpu-features.h covers use cases like Clank and our bots. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2259133002 Review-Url: https://codereview.chromium.org/2259133002
* Clean up hyper-local SkCpu feature test experiment.Gravatar mtklein2016-07-11
| | | | | | | | | | | | | | | | | | | | | | | | | | This removes the code paths where we make SkCpu::Supports() calls from within a tight loop. It keeps code paths using SkCpu::Supports() to choose entire routines from src/opts/. We can't rely on these hyper-local checks to be hoisted up reliably enough. It worked pretty well with the first couple platforms we tried (e.g. Clang on Linux/Mac) but we can't gaurantee it works everywhere. Further, I'm not able to actually do anything fancy with those tests outside of x86... I've not found a way to get, say, NEON+F16 conversion code embedded into ordinary NEON code outside writing then entire function in external assembly. This whole idea becomes less important now that we've got a way to chain separate function calls together efficiently. We can now, e.g., use an AVX+F16C method to load some pixels, then chain that into an ordinary AVX method to color filter them. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2138073002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2138073002
* SkCpu w/o static initializerGravatar mtklein2016-04-21
| | | | | | | | | | | | | | | | | | | | | | | | I think I cracked it. Though, this may not technically be legal C++... I've only got one definition of SkCpu::gCachedFeatures, but two different declarations: non-const in SkCpu.cpp, const elsewhere. Is this... - legal C++? - not C++ but probably works as I think? - not C++ and will probably blow up? - who knows, let's see? I have tested that the features are cached properly, read properly, and that the generated code treats SkCpu::gCachedFeatures as a global constant outside SkCpu.cpp. So it all observably works optimally. Expanding testing to more bots. TBR=reed@google.com BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1905683003 Review URL: https://codereview.chromium.org/1905683003
* Move CPU feature detection to its own file.Gravatar mtklein2016-04-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - Moves CPU feature detection to its own file. - Cleans up some redundant feature detection scattered around core/ and opts/. - Can now detect a few new CPU features: * F16C -> Intel f16<->f32 instructions, added between AVX and AVX2 * FMA -> Intel FMA instructions, added at the same time as AVX2 * VFP_FP16 -> ARM f16<->f32 instructions, quite common * NEON_FMA -> ARM FMA instructions, also quite common * SSE and SSE3... why not? This new internal API makes it very cheap to do fine-grained runtime CPU feature detection. Redundant calls to SkCpu::Supports() should be eliminated and it's hoistable out of loops. It compiles away entirely when we have the appropriate instructions available at compile time. This means we can call it to guard even a little snippet of 1 or 2 instructions right where needed and let inlining hoist the check (if any at all) up to somewhere that doesn't hurt performance. I've explained how I made this work in the private section of the new header. Once this lands and bakes a bit, I'll start following up with CLs to use it more and to add a bunch of those little 1-2 instruction snippets we've been wanting, e.g. cvtps2ph, cvtph2ps, ptest, pmulld, pmovzxbd, blendvps, pshufb, roundps (for floor) on x86, and vcvt.f32.f16, vcvt.f16.f32 on ARM. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1890483002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Committed: https://skia.googlesource.com/skia/+/872ea29357439f05b1f6995dd300fc054733e607 Review URL: https://codereview.chromium.org/1890483002
* Revert of Move CPU feature detection to its own file. (patchset #7 id:120001 ↵Gravatar mtklein2016-04-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | of https://codereview.chromium.org/1890483002/ ) Reason for revert: many unexpected GM diffs across GPU+CPU configs on Windows (hopefully just text masks on GPU?). seems like we pick a different srcover variant in some places. Original issue's description: > Move CPU feature detection to its own file. > > - Moves CPU feature detection to its own file. > - Cleans up some redundant feature detection scattered around core/ and opts/. > - Can now detect a few new CPU features: > * F16C -> Intel f16<->f32 instructions, added between AVX and AVX2 > * FMA -> Intel FMA instructions, added at the same time as AVX2 > * VFP_FP16 -> ARM f16<->f32 instructions, quite common > * NEON_FMA -> ARM FMA instructions, also quite common > * SSE and SSE3... why not? > > This new internal API makes it very cheap to do fine-grained runtime CPU > feature detection. Redundant calls to SkCpu::Supports() should be eliminated > and it's hoistable out of loops. It compiles away entirely when we have the > appropriate instructions available at compile time. > > This means we can call it to guard even a little snippet of 1 or 2 instructions > right where needed and let inlining hoist the check (if any at all) up to > somewhere that doesn't hurt performance. I've explained how I made this work > in the private section of the new header. > > Once this lands and bakes a bit, I'll start following up with CLs to use it more > and to add a bunch of those little 1-2 instruction snippets we've been wanting, > e.g. cvtps2ph, cvtph2ps, ptest, pmulld, pmovzxbd, blendvps, pshufb, roundps > (for floor) on x86, and vcvt.f32.f16, vcvt.f16.f32 on ARM. > > BUG=skia: > GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1890483002 > CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot > > Committed: https://skia.googlesource.com/skia/+/872ea29357439f05b1f6995dd300fc054733e607 TBR=fmalita@chromium.org,herb@google.com,reed@google.com,mtklein@chromium.org # Skipping CQ checks because original CL landed less than 1 days ago. NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia: Review URL: https://codereview.chromium.org/1892643003
* Move CPU feature detection to its own file.Gravatar mtklein2016-04-14
- Moves CPU feature detection to its own file. - Cleans up some redundant feature detection scattered around core/ and opts/. - Can now detect a few new CPU features: * F16C -> Intel f16<->f32 instructions, added between AVX and AVX2 * FMA -> Intel FMA instructions, added at the same time as AVX2 * VFP_FP16 -> ARM f16<->f32 instructions, quite common * NEON_FMA -> ARM FMA instructions, also quite common * SSE and SSE3... why not? This new internal API makes it very cheap to do fine-grained runtime CPU feature detection. Redundant calls to SkCpu::Supports() should be eliminated and it's hoistable out of loops. It compiles away entirely when we have the appropriate instructions available at compile time. This means we can call it to guard even a little snippet of 1 or 2 instructions right where needed and let inlining hoist the check (if any at all) up to somewhere that doesn't hurt performance. I've explained how I made this work in the private section of the new header. Once this lands and bakes a bit, I'll start following up with CLs to use it more and to add a bunch of those little 1-2 instruction snippets we've been wanting, e.g. cvtps2ph, cvtph2ps, ptest, pmulld, pmovzxbd, blendvps, pshufb, roundps (for floor) on x86, and vcvt.f32.f16, vcvt.f16.f32 on ARM. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1890483002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review URL: https://codereview.chromium.org/1890483002