| Commit message (Collapse) | Author | Age |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
These HWCAP_... values are defined in hwcap.h, but we don't get them
from there because some platforms have older hwcap.h that don't have
these bits named yet.
Even though we don't directly include hwcap.h, it seems it can get
itself included somehow on some platforms. That leads to a name
clash with the HWCAP_... #defines in there. To avoid it, rename them.
Change-Id: I70788b5e4072c307c6eee55d6f197c3b9a49f5dc
Reviewed-on: https://skia-review.googlesource.com/24408
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For compatibility with older system headers, instead of looking for HWCAP_
values in asm/hwcap.h, just define the bits we want to test ourselves.
This lets us compile this code on systems before those bits were defined.
At runtime the bits will harmlessly test as zero.
Change-Id: I44b6aba7d6f0fc2c5df08ad262c2b0537d900209
Reviewed-on: https://skia-review.googlesource.com/18844
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This time, don't call xgetbv() before checking we can.
This reverts commit b26373cfd8151b2fa56bdf532ddcde4919cce09f.
CQ_INCLUDE_TRYBOTS=skia.primary:Test-Mac-Clang-MacMini4.1-GPU-GeForce320M-x86_64-Debug
Change-Id: I148302cb36446891b1d79b2e60cde0b43420c1a8
Reviewed-on: https://skia-review.googlesource.com/9089
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit 3c322e23a013e78fcbe0edd7adccd580af8466bc.
Reason for revert: crash in SkCpu on Mac
Original change's description:
> Add AVX-512 detection to SkCpu
>
> I've added a SKY alias for the five new bits detected on a Skylake Xeon.
>
> Change-Id: I9f7dd48f4dc866608d81befd061434ca325ef451
> Reviewed-on: https://skia-review.googlesource.com/9043
> Reviewed-by: Herb Derby <herb@google.com>
> Commit-Queue: Mike Klein <mtklein@chromium.org>
>
TBR=mtklein@chromium.org,herb@google.com,reviews@skia.org
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
Change-Id: I3cc06c7e32391e68d6cfe084786b18270cdab631
Reviewed-on: https://skia-review.googlesource.com/9074
Reviewed-by: Cary Clark <caryclark@google.com>
Commit-Queue: Cary Clark <caryclark@google.com>
|
|
|
|
|
|
|
|
|
| |
I've added a SKY alias for the five new bits detected on a Skylake Xeon.
Change-Id: I9f7dd48f4dc866608d81befd061434ca325ef451
Reviewed-on: https://skia-review.googlesource.com/9043
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We have a couple ways to detect CPU features on ARM:
- on ARMv8, getauxval(AT_HWCAP)
- on ARMv7, getauxval(AT_HWCAP) and cpu-features.h
This guards each of these methods with preprocessor guards to match
exactly when we can use them. Today they're sort of a mix of that and
higher level expectations about particular build and operating systems.
I'm looking into doing this directly by reading CPU registers,
much like we do for x86 further up the file.
None of this is super important right now, so as long as we don't decide
that we have these features when we don't, things will be fine. It's no
big deal for now if we fail to detect them.
Change-Id: I3b7768483086d0f3f4f6516b754c3ea5ec2d03e5
Reviewed-on: https://skia-review.googlesource.com/8182
Reviewed-by: Chinmay Garde <chinmaygarde@google.com>
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We can splice these stages if we drop them down to 2 at a time.
Turns out this is significantly (2-3x) faster than the status quo.
SkRasterPipeline_…
…f16_compile 1x …srgb_compile 2.06x …f16_run 3.08x …srgb_run 4.61x
Added a couple ways to detect (likely) the required VFPv4 support:
- use hwcap when available (NDK ≥21, Android framework)
- use cpu-features when not (NDK <21)
The code in SkSplicer_generated.h is ARM, not Thumb2. SkSplicer seems
to be blx'ing into it, so that's great, and we bx lr out. There's no
point in attempting to use Thumb2 in vector heavy code... it'll all be
4 byte anyway.
Follow ups:
- vpush {d8-d9} before the loop, vpop {d8-d9} afterwards,
skip these instructions when splicing;
- (probably) drop jumping stages down to 2-at-a-time also.
Change-Id: If151394ec10e8cbd6a05e2d81808488d743bfe15
Reviewed-on: https://skia-review.googlesource.com/6940
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Instead of relying on cpu-features.c, just do what it does.
Good reading: http://man7.org/linux/man-pages/man3/getauxval.3.html
While it's nice to use the headers when possible, should either of these headers not be available, we can fall back to doing it all manually:
extern "C" uint32_t getauxval(uint32_t)
static const int AT_HWCAP = 16;
static const int HWCAP_CRC32 = (1<<7);
To keep things simple I've slimmed cpu feature detection down to just the features we actually make use of. This removes all runtime feature detection for ARMv7... we expect NEON to be globally available, and so far we haven't used the other FMA/FP16 bits on ARMv7. ARMv8 feature dection remains the same, CRC32 before, CRC32 after. x86 (cpuid-based detection) and MIPS (nothing) are untouched.
We need to keep //third_party/cpu-features for //third_party/libwebp.
Change-Id: I6c96df9a09ae68c8c0e54c1152aa177ba9bafc83
Reviewed-on: https://skia-review.googlesource.com/5800
Reviewed-by: Derek Sollenberger <djsollen@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Haswell brought a whole slew of handy new instructions for us (AVX2, FMA, BMI1+BMI2) and also feature F16C, which came one generation earlier on Ivybridge. We work with integers often enough that we really want to target AVX2 instead of AVX, and this means it's pretty practical to ask for all those other goodies along with it.
Chrome's GN files and Google3's BUILD file will need an update, before or after this CL.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2840
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Change-Id: I826daf77b5104664c5d31ddaabee347e287b87a2
Reviewed-on: https://skia-review.googlesource.com/2840
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Herb Derby <herb@google.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Once you have downloaded an android NDK, you can set the ndk GN arg to use it.
E.g. my gn.args looks like:
is_debug = false
ndk = "/opt/android-ndk"
This should be enough to get you going for an arm64 build. You ought to be able to tweak that to other architectures by changing target_cpu to "arm", "x86", "x86-64", etc. That won't quite work until I follow this up a bit, but the skeleton is there.
This is enough to get me compiled, linked, and running to completion on my N5x.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2275983004
Review-Url: https://codereview.chromium.org/2275983004
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I have successfully detected CRC32 instruction support on my Nexus 5x.
Use of these instructions to follow... I am not yet sure which compilers if any will give me instrinsics or let me write them in asm.
defined(__ARM_FEATURE_CRC32) should cover users like Android Framework who build with the best settings possible. cpu-features.h covers use cases like Clank and our bots.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2259133002
Review-Url: https://codereview.chromium.org/2259133002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This removes the code paths where we make SkCpu::Supports() calls
from within a tight loop. It keeps code paths using SkCpu::Supports()
to choose entire routines from src/opts/.
We can't rely on these hyper-local checks to be hoisted up reliably enough.
It worked pretty well with the first couple platforms we tried (e.g. Clang
on Linux/Mac) but we can't gaurantee it works everywhere.
Further, I'm not able to actually do anything fancy with those tests
outside of x86... I've not found a way to get, say, NEON+F16 conversion
code embedded into ordinary NEON code outside writing then entire function
in external assembly.
This whole idea becomes less important now that we've got a way to chain
separate function calls together efficiently. We can now, e.g., use an
AVX+F16C method to load some pixels, then chain that into an ordinary AVX
method to color filter them.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2138073002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/2138073002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I think I cracked it.
Though, this may not technically be legal C++...
I've only got one definition of SkCpu::gCachedFeatures,
but two different declarations: non-const in SkCpu.cpp, const elsewhere.
Is this...
- legal C++?
- not C++ but probably works as I think?
- not C++ and will probably blow up?
- who knows, let's see?
I have tested that the features are cached properly, read properly, and that the generated code treats SkCpu::gCachedFeatures as a global constant outside SkCpu.cpp. So it all observably works optimally.
Expanding testing to more bots.
TBR=reed@google.com
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1905683003
Review URL: https://codereview.chromium.org/1905683003
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- Moves CPU feature detection to its own file.
- Cleans up some redundant feature detection scattered around core/ and opts/.
- Can now detect a few new CPU features:
* F16C -> Intel f16<->f32 instructions, added between AVX and AVX2
* FMA -> Intel FMA instructions, added at the same time as AVX2
* VFP_FP16 -> ARM f16<->f32 instructions, quite common
* NEON_FMA -> ARM FMA instructions, also quite common
* SSE and SSE3... why not?
This new internal API makes it very cheap to do fine-grained runtime CPU
feature detection. Redundant calls to SkCpu::Supports() should be eliminated
and it's hoistable out of loops. It compiles away entirely when we have the
appropriate instructions available at compile time.
This means we can call it to guard even a little snippet of 1 or 2 instructions
right where needed and let inlining hoist the check (if any at all) up to
somewhere that doesn't hurt performance. I've explained how I made this work
in the private section of the new header.
Once this lands and bakes a bit, I'll start following up with CLs to use it more
and to add a bunch of those little 1-2 instruction snippets we've been wanting,
e.g. cvtps2ph, cvtph2ps, ptest, pmulld, pmovzxbd, blendvps, pshufb, roundps
(for floor) on x86, and vcvt.f32.f16, vcvt.f16.f32 on ARM.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1890483002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Committed: https://skia.googlesource.com/skia/+/872ea29357439f05b1f6995dd300fc054733e607
Review URL: https://codereview.chromium.org/1890483002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
of https://codereview.chromium.org/1890483002/ )
Reason for revert:
many unexpected GM diffs across GPU+CPU configs on Windows (hopefully just text masks on GPU?). seems like we pick a different srcover variant in some places.
Original issue's description:
> Move CPU feature detection to its own file.
>
> - Moves CPU feature detection to its own file.
> - Cleans up some redundant feature detection scattered around core/ and opts/.
> - Can now detect a few new CPU features:
> * F16C -> Intel f16<->f32 instructions, added between AVX and AVX2
> * FMA -> Intel FMA instructions, added at the same time as AVX2
> * VFP_FP16 -> ARM f16<->f32 instructions, quite common
> * NEON_FMA -> ARM FMA instructions, also quite common
> * SSE and SSE3... why not?
>
> This new internal API makes it very cheap to do fine-grained runtime CPU
> feature detection. Redundant calls to SkCpu::Supports() should be eliminated
> and it's hoistable out of loops. It compiles away entirely when we have the
> appropriate instructions available at compile time.
>
> This means we can call it to guard even a little snippet of 1 or 2 instructions
> right where needed and let inlining hoist the check (if any at all) up to
> somewhere that doesn't hurt performance. I've explained how I made this work
> in the private section of the new header.
>
> Once this lands and bakes a bit, I'll start following up with CLs to use it more
> and to add a bunch of those little 1-2 instruction snippets we've been wanting,
> e.g. cvtps2ph, cvtph2ps, ptest, pmulld, pmovzxbd, blendvps, pshufb, roundps
> (for floor) on x86, and vcvt.f32.f16, vcvt.f16.f32 on ARM.
>
> BUG=skia:
> GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1890483002
> CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
>
> Committed: https://skia.googlesource.com/skia/+/872ea29357439f05b1f6995dd300fc054733e607
TBR=fmalita@chromium.org,herb@google.com,reed@google.com,mtklein@chromium.org
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:
Review URL: https://codereview.chromium.org/1892643003
|
|
- Moves CPU feature detection to its own file.
- Cleans up some redundant feature detection scattered around core/ and opts/.
- Can now detect a few new CPU features:
* F16C -> Intel f16<->f32 instructions, added between AVX and AVX2
* FMA -> Intel FMA instructions, added at the same time as AVX2
* VFP_FP16 -> ARM f16<->f32 instructions, quite common
* NEON_FMA -> ARM FMA instructions, also quite common
* SSE and SSE3... why not?
This new internal API makes it very cheap to do fine-grained runtime CPU
feature detection. Redundant calls to SkCpu::Supports() should be eliminated
and it's hoistable out of loops. It compiles away entirely when we have the
appropriate instructions available at compile time.
This means we can call it to guard even a little snippet of 1 or 2 instructions
right where needed and let inlining hoist the check (if any at all) up to
somewhere that doesn't hurt performance. I've explained how I made this work
in the private section of the new header.
Once this lands and bakes a bit, I'll start following up with CLs to use it more
and to add a bunch of those little 1-2 instruction snippets we've been wanting,
e.g. cvtps2ph, cvtph2ps, ptest, pmulld, pmovzxbd, blendvps, pshufb, roundps
(for floor) on x86, and vcvt.f32.f16, vcvt.f16.f32 on ARM.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1890483002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review URL: https://codereview.chromium.org/1890483002
|