| Commit message (Collapse) | Author | Age |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a reland of 78cb579f33943421afc8423a39867fcfd69fed44
This time, lowp stages are controlled by !defined(JUMPER_IS_SCALAR), not
by defined(__clang__). The two are usually the same, except when we opt
Clang builds into JUMPER_IS_SCALAR artificially.
Some Google3 builds use compilers old enough that they barf when
compiling our NEON code. It's conceivably also possible to define
JUMPER_IS_SCALAR yourself, but I don't think anyone does that.
Original change's description:
> Reland "make SkJumper stages normal Skia code"
>
> This is a reland of 22e536e3a1a09405d1c0e6f071717a726d86e8d4
>
> Now with fixed #include paths in SkRasterPipeline_opts.h,
> and -ffp-contract=fast for the :hsw target to minimize
> diffs on non-Windows Clang AVX2/AVX-512 bots.
>
> Original change's description:
> > make SkJumper stages normal Skia code
> >
> > Enough clients are using Clang now that we can say, use Clang to build
> > if you want these software pipeline stages to go fast.
> >
> > This lets us drop the offline build aspect of SkJumper stages, instead
> > building as part of Skia using the SkOpts framework.
> >
> > I think everything should work, except I've (temporarily) removed
> > AVX-512 support. I will put this back in a follow up.
> >
> > I have had to drop Windows down to __vectorcall and our narrower
> > stage calling convention that keeps the d-registers on the stack.
> > I tried forcing sysv_abi, but that crashed Clang. :/
> >
> > Added a TODO to up the same narrower stage calling convention
> > for lowp stages... we just *don't* today, for no good reason.
> >
> > Change-Id: Iaaa792ffe4deab3508d2dc5d0008c163c24b3383
> > Reviewed-on: https://skia-review.googlesource.com/110641
> > Commit-Queue: Mike Klein <mtklein@chromium.org>
> > Reviewed-by: Herb Derby <herb@google.com>
> > Reviewed-by: Florin Malita <fmalita@chromium.org>
>
> Change-Id: I44f2c03d33958e3807747e40904b6351957dd448
> Reviewed-on: https://skia-review.googlesource.com/112742
> Reviewed-by: Mike Klein <mtklein@chromium.org>
Change-Id: I3d71197d4bbb19ca4a94961a97fa2e54d5cbfb0d
Reviewed-on: https://skia-review.googlesource.com/112744
Reviewed-by: Mike Klein <mtklein@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit 78cb579f33943421afc8423a39867fcfd69fed44.
Reason for revert: lowp should be controlled by defined(JUMPER_IS_SCALAR), not defined(__clang__). So close.
Original change's description:
> Reland "make SkJumper stages normal Skia code"
>
> This is a reland of 22e536e3a1a09405d1c0e6f071717a726d86e8d4
>
> Now with fixed #include paths in SkRasterPipeline_opts.h,
> and -ffp-contract=fast for the :hsw target to minimize
> diffs on non-Windows Clang AVX2/AVX-512 bots.
>
> Original change's description:
> > make SkJumper stages normal Skia code
> >
> > Enough clients are using Clang now that we can say, use Clang to build
> > if you want these software pipeline stages to go fast.
> >
> > This lets us drop the offline build aspect of SkJumper stages, instead
> > building as part of Skia using the SkOpts framework.
> >
> > I think everything should work, except I've (temporarily) removed
> > AVX-512 support. I will put this back in a follow up.
> >
> > I have had to drop Windows down to __vectorcall and our narrower
> > stage calling convention that keeps the d-registers on the stack.
> > I tried forcing sysv_abi, but that crashed Clang. :/
> >
> > Added a TODO to up the same narrower stage calling convention
> > for lowp stages... we just *don't* today, for no good reason.
> >
> > Change-Id: Iaaa792ffe4deab3508d2dc5d0008c163c24b3383
> > Reviewed-on: https://skia-review.googlesource.com/110641
> > Commit-Queue: Mike Klein <mtklein@chromium.org>
> > Reviewed-by: Herb Derby <herb@google.com>
> > Reviewed-by: Florin Malita <fmalita@chromium.org>
>
> Change-Id: I44f2c03d33958e3807747e40904b6351957dd448
> Reviewed-on: https://skia-review.googlesource.com/112742
> Reviewed-by: Mike Klein <mtklein@chromium.org>
TBR=mtklein@chromium.org,herb@google.com,fmalita@chromium.org
Change-Id: Ie64da98f5187d44e03c0ce05d7cb189d4a6e6663
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Reviewed-on: https://skia-review.googlesource.com/112743
Reviewed-by: Mike Klein <mtklein@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a reland of 22e536e3a1a09405d1c0e6f071717a726d86e8d4
Now with fixed #include paths in SkRasterPipeline_opts.h,
and -ffp-contract=fast for the :hsw target to minimize
diffs on non-Windows Clang AVX2/AVX-512 bots.
Original change's description:
> make SkJumper stages normal Skia code
>
> Enough clients are using Clang now that we can say, use Clang to build
> if you want these software pipeline stages to go fast.
>
> This lets us drop the offline build aspect of SkJumper stages, instead
> building as part of Skia using the SkOpts framework.
>
> I think everything should work, except I've (temporarily) removed
> AVX-512 support. I will put this back in a follow up.
>
> I have had to drop Windows down to __vectorcall and our narrower
> stage calling convention that keeps the d-registers on the stack.
> I tried forcing sysv_abi, but that crashed Clang. :/
>
> Added a TODO to up the same narrower stage calling convention
> for lowp stages... we just *don't* today, for no good reason.
>
> Change-Id: Iaaa792ffe4deab3508d2dc5d0008c163c24b3383
> Reviewed-on: https://skia-review.googlesource.com/110641
> Commit-Queue: Mike Klein <mtklein@chromium.org>
> Reviewed-by: Herb Derby <herb@google.com>
> Reviewed-by: Florin Malita <fmalita@chromium.org>
Change-Id: I44f2c03d33958e3807747e40904b6351957dd448
Reviewed-on: https://skia-review.googlesource.com/112742
Reviewed-by: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit 22e536e3a1a09405d1c0e6f071717a726d86e8d4.
Reason for revert: wrong include path :/
Original change's description:
> make SkJumper stages normal Skia code
>
> Enough clients are using Clang now that we can say, use Clang to build
> if you want these software pipeline stages to go fast.
>
> This lets us drop the offline build aspect of SkJumper stages, instead
> building as part of Skia using the SkOpts framework.
>
> I think everything should work, except I've (temporarily) removed
> AVX-512 support. I will put this back in a follow up.
>
> I have had to drop Windows down to __vectorcall and our narrower
> stage calling convention that keeps the d-registers on the stack.
> I tried forcing sysv_abi, but that crashed Clang. :/
>
> Added a TODO to up the same narrower stage calling convention
> for lowp stages... we just *don't* today, for no good reason.
>
> Change-Id: Iaaa792ffe4deab3508d2dc5d0008c163c24b3383
> Reviewed-on: https://skia-review.googlesource.com/110641
> Commit-Queue: Mike Klein <mtklein@chromium.org>
> Reviewed-by: Herb Derby <herb@google.com>
> Reviewed-by: Florin Malita <fmalita@chromium.org>
TBR=mtklein@chromium.org,herb@google.com,fmalita@chromium.org
Change-Id: I2bdc709c80cdfa6b13ff24e024b3721bef887f46
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Reviewed-on: https://skia-review.googlesource.com/112741
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Enough clients are using Clang now that we can say, use Clang to build
if you want these software pipeline stages to go fast.
This lets us drop the offline build aspect of SkJumper stages, instead
building as part of Skia using the SkOpts framework.
I think everything should work, except I've (temporarily) removed
AVX-512 support. I will put this back in a follow up.
I have had to drop Windows down to __vectorcall and our narrower
stage calling convention that keeps the d-registers on the stack.
I tried forcing sysv_abi, but that crashed Clang. :/
Added a TODO to up the same narrower stage calling convention
for lowp stages... we just *don't* today, for no good reason.
Change-Id: Iaaa792ffe4deab3508d2dc5d0008c163c24b3383
Reviewed-on: https://skia-review.googlesource.com/110641
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Herb Derby <herb@google.com>
Reviewed-by: Florin Malita <fmalita@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When Skia's built with an interestingly advanced instruction set
baseline like SSSE3 or SSE4.1, we end up with two distinct copies of
some SkOpts functions, one default in SkOpts.o and one specialization
from SkOpts_{ssse3,sse41}.o. These functions are static, and so are
technically unrelated, even though they're the same code compiled with
the same instructions available. They're going to be identical.
What we want here is to remove static but mark them as inline instead.
In this case inline means "if the linker sees multiple copies of this,
that's cool, just pick any one arbitrarily". That's just what we want.
Now, when I disassemble a binary before and after this change, I do see
the redundant routines removed. However, the file size change is
minimal... I suspect that this must mean the linker has noticed that we
had identical code and physically folded the two logically independent
routines. I don't know how prevalent this optimization is, though, so
it doesn't hurt to give it more of a "one copy please" hint with inline.
There may also be a difference here between the binary size (~unchanged)
and the in-memory layout of that binary?
Change-Id: Id9c8f0ffc84aa1c9a066c22b623d34adab281857
Reviewed-on: https://skia-review.googlesource.com/37501
Commit-Queue: Mike Klein <mtklein@google.com>
Reviewed-by: Ben Wagner <bungeman@google.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
On a simple benchmark the CRC32 version is about 3x faster on
ARM Cortex A57 (Aarch32) than the Murmur3 scalar version.
BUG=skia:
Change-Id: I71515e8463a33924998b837ff9f32202690dd2fe
Reviewed-on: https://skia-review.googlesource.com/15480
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
| |
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2322033002
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/2322033002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For large inputs, this runs ~11x faster than Murmur3.
My bench drops from 1µs to 88ns.
Like x86-64, this runs fastest if we work in 24 byte chunks. 16 byte chunks
run at about 0.75x this speed, 8 byte chunks at about 0.4x (which would still
be about 5x faster than Murmur3).
This'll require plumbing support for opts_crc32 into Chrome first before it can roll.
perf.skia.org charts we want to watch: https://perf.skia.org/#5490
Seach for compute_hash in these logs to see the difference:
baseline: https://luci-milo.appspot.com/swarming/task/30ba22f3dfe30e10/steps/nanobench/0/stdout
trybot: https://luci-milo.appspot.com/swarming/task/30bbc406cbf62d10/steps/nanobench/0/stdout
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2260823002
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/2260823002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Nothing too surprising in the new 32-bit x86 hash. It's about half speed of the 64-bit variant, just as you'd expect.
Using unaligned_load like the others makes the may_alias parts of murmur3 moot. I've updated some of the terms in the murmur hash to read consistently with the others.
The existing hashes are the same speed and produce the same hashes. In case this is not obvious, all three hash functions produce different hashes.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2251773002
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/2251773002
|
|
About 9x faster than Murmur3 for long inputs.
Most of this is a mechanical change from SkChecksum::Murmur3(...) to SkOpts::hash(...).
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2208903002
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;master.client.skia.compile:Build-Ubuntu-GCC-x86_64-Release-CMake-Trybot,Build-Mac-Clang-x86_64-Release-CMake-Trybot
Review-Url: https://codereview.chromium.org/2208903002
|