aboutsummaryrefslogtreecommitdiffhomepage
path: root/src/opts/SkChecksum_opts.h
Commit message (Collapse)AuthorAge
* Reland "Reland "make SkJumper stages normal Skia code""Gravatar Mike Klein2018-03-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a reland of 78cb579f33943421afc8423a39867fcfd69fed44 This time, lowp stages are controlled by !defined(JUMPER_IS_SCALAR), not by defined(__clang__). The two are usually the same, except when we opt Clang builds into JUMPER_IS_SCALAR artificially. Some Google3 builds use compilers old enough that they barf when compiling our NEON code. It's conceivably also possible to define JUMPER_IS_SCALAR yourself, but I don't think anyone does that. Original change's description: > Reland "make SkJumper stages normal Skia code" > > This is a reland of 22e536e3a1a09405d1c0e6f071717a726d86e8d4 > > Now with fixed #include paths in SkRasterPipeline_opts.h, > and -ffp-contract=fast for the :hsw target to minimize > diffs on non-Windows Clang AVX2/AVX-512 bots. > > Original change's description: > > make SkJumper stages normal Skia code > > > > Enough clients are using Clang now that we can say, use Clang to build > > if you want these software pipeline stages to go fast. > > > > This lets us drop the offline build aspect of SkJumper stages, instead > > building as part of Skia using the SkOpts framework. > > > > I think everything should work, except I've (temporarily) removed > > AVX-512 support. I will put this back in a follow up. > > > > I have had to drop Windows down to __vectorcall and our narrower > > stage calling convention that keeps the d-registers on the stack. > > I tried forcing sysv_abi, but that crashed Clang. :/ > > > > Added a TODO to up the same narrower stage calling convention > > for lowp stages... we just *don't* today, for no good reason. > > > > Change-Id: Iaaa792ffe4deab3508d2dc5d0008c163c24b3383 > > Reviewed-on: https://skia-review.googlesource.com/110641 > > Commit-Queue: Mike Klein <mtklein@chromium.org> > > Reviewed-by: Herb Derby <herb@google.com> > > Reviewed-by: Florin Malita <fmalita@chromium.org> > > Change-Id: I44f2c03d33958e3807747e40904b6351957dd448 > Reviewed-on: https://skia-review.googlesource.com/112742 > Reviewed-by: Mike Klein <mtklein@chromium.org> Change-Id: I3d71197d4bbb19ca4a94961a97fa2e54d5cbfb0d Reviewed-on: https://skia-review.googlesource.com/112744 Reviewed-by: Mike Klein <mtklein@google.com> Commit-Queue: Mike Klein <mtklein@google.com>
* Revert "Reland "make SkJumper stages normal Skia code""Gravatar Mike Klein2018-03-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 78cb579f33943421afc8423a39867fcfd69fed44. Reason for revert: lowp should be controlled by defined(JUMPER_IS_SCALAR), not defined(__clang__). So close. Original change's description: > Reland "make SkJumper stages normal Skia code" > > This is a reland of 22e536e3a1a09405d1c0e6f071717a726d86e8d4 > > Now with fixed #include paths in SkRasterPipeline_opts.h, > and -ffp-contract=fast for the :hsw target to minimize > diffs on non-Windows Clang AVX2/AVX-512 bots. > > Original change's description: > > make SkJumper stages normal Skia code > > > > Enough clients are using Clang now that we can say, use Clang to build > > if you want these software pipeline stages to go fast. > > > > This lets us drop the offline build aspect of SkJumper stages, instead > > building as part of Skia using the SkOpts framework. > > > > I think everything should work, except I've (temporarily) removed > > AVX-512 support. I will put this back in a follow up. > > > > I have had to drop Windows down to __vectorcall and our narrower > > stage calling convention that keeps the d-registers on the stack. > > I tried forcing sysv_abi, but that crashed Clang. :/ > > > > Added a TODO to up the same narrower stage calling convention > > for lowp stages... we just *don't* today, for no good reason. > > > > Change-Id: Iaaa792ffe4deab3508d2dc5d0008c163c24b3383 > > Reviewed-on: https://skia-review.googlesource.com/110641 > > Commit-Queue: Mike Klein <mtklein@chromium.org> > > Reviewed-by: Herb Derby <herb@google.com> > > Reviewed-by: Florin Malita <fmalita@chromium.org> > > Change-Id: I44f2c03d33958e3807747e40904b6351957dd448 > Reviewed-on: https://skia-review.googlesource.com/112742 > Reviewed-by: Mike Klein <mtklein@chromium.org> TBR=mtklein@chromium.org,herb@google.com,fmalita@chromium.org Change-Id: Ie64da98f5187d44e03c0ce05d7cb189d4a6e6663 No-Presubmit: true No-Tree-Checks: true No-Try: true Reviewed-on: https://skia-review.googlesource.com/112743 Reviewed-by: Mike Klein <mtklein@google.com> Commit-Queue: Mike Klein <mtklein@google.com>
* Reland "make SkJumper stages normal Skia code"Gravatar Mike Klein2018-03-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a reland of 22e536e3a1a09405d1c0e6f071717a726d86e8d4 Now with fixed #include paths in SkRasterPipeline_opts.h, and -ffp-contract=fast for the :hsw target to minimize diffs on non-Windows Clang AVX2/AVX-512 bots. Original change's description: > make SkJumper stages normal Skia code > > Enough clients are using Clang now that we can say, use Clang to build > if you want these software pipeline stages to go fast. > > This lets us drop the offline build aspect of SkJumper stages, instead > building as part of Skia using the SkOpts framework. > > I think everything should work, except I've (temporarily) removed > AVX-512 support. I will put this back in a follow up. > > I have had to drop Windows down to __vectorcall and our narrower > stage calling convention that keeps the d-registers on the stack. > I tried forcing sysv_abi, but that crashed Clang. :/ > > Added a TODO to up the same narrower stage calling convention > for lowp stages... we just *don't* today, for no good reason. > > Change-Id: Iaaa792ffe4deab3508d2dc5d0008c163c24b3383 > Reviewed-on: https://skia-review.googlesource.com/110641 > Commit-Queue: Mike Klein <mtklein@chromium.org> > Reviewed-by: Herb Derby <herb@google.com> > Reviewed-by: Florin Malita <fmalita@chromium.org> Change-Id: I44f2c03d33958e3807747e40904b6351957dd448 Reviewed-on: https://skia-review.googlesource.com/112742 Reviewed-by: Mike Klein <mtklein@chromium.org>
* Revert "make SkJumper stages normal Skia code"Gravatar Mike Klein2018-03-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 22e536e3a1a09405d1c0e6f071717a726d86e8d4. Reason for revert: wrong include path :/ Original change's description: > make SkJumper stages normal Skia code > > Enough clients are using Clang now that we can say, use Clang to build > if you want these software pipeline stages to go fast. > > This lets us drop the offline build aspect of SkJumper stages, instead > building as part of Skia using the SkOpts framework. > > I think everything should work, except I've (temporarily) removed > AVX-512 support. I will put this back in a follow up. > > I have had to drop Windows down to __vectorcall and our narrower > stage calling convention that keeps the d-registers on the stack. > I tried forcing sysv_abi, but that crashed Clang. :/ > > Added a TODO to up the same narrower stage calling convention > for lowp stages... we just *don't* today, for no good reason. > > Change-Id: Iaaa792ffe4deab3508d2dc5d0008c163c24b3383 > Reviewed-on: https://skia-review.googlesource.com/110641 > Commit-Queue: Mike Klein <mtklein@chromium.org> > Reviewed-by: Herb Derby <herb@google.com> > Reviewed-by: Florin Malita <fmalita@chromium.org> TBR=mtklein@chromium.org,herb@google.com,fmalita@chromium.org Change-Id: I2bdc709c80cdfa6b13ff24e024b3721bef887f46 No-Presubmit: true No-Tree-Checks: true No-Try: true Reviewed-on: https://skia-review.googlesource.com/112741 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
* make SkJumper stages normal Skia codeGravatar Mike Klein2018-03-07
| | | | | | | | | | | | | | | | | | | | | | | | Enough clients are using Clang now that we can say, use Clang to build if you want these software pipeline stages to go fast. This lets us drop the offline build aspect of SkJumper stages, instead building as part of Skia using the SkOpts framework. I think everything should work, except I've (temporarily) removed AVX-512 support. I will put this back in a follow up. I have had to drop Windows down to __vectorcall and our narrower stage calling convention that keeps the d-registers on the stack. I tried forcing sysv_abi, but that crashed Clang. :/ Added a TODO to up the same narrower stage calling convention for lowp stages... we just *don't* today, for no good reason. Change-Id: Iaaa792ffe4deab3508d2dc5d0008c163c24b3383 Reviewed-on: https://skia-review.googlesource.com/110641 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Herb Derby <herb@google.com> Reviewed-by: Florin Malita <fmalita@chromium.org>
* make SkOpts functions inline, not staticGravatar Mike Klein2017-08-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | When Skia's built with an interestingly advanced instruction set baseline like SSSE3 or SSE4.1, we end up with two distinct copies of some SkOpts functions, one default in SkOpts.o and one specialization from SkOpts_{ssse3,sse41}.o. These functions are static, and so are technically unrelated, even though they're the same code compiled with the same instructions available. They're going to be identical. What we want here is to remove static but mark them as inline instead. In this case inline means "if the linker sees multiple copies of this, that's cool, just pick any one arbitrarily". That's just what we want. Now, when I disassemble a binary before and after this change, I do see the redundant routines removed. However, the file size change is minimal... I suspect that this must mean the linker has noticed that we had identical code and physically folded the two logically independent routines. I don't know how prevalent this optimization is, though, so it doesn't hurt to give it more of a "one copy please" hint with inline. There may also be a difference here between the binary size (~unchanged) and the in-memory layout of that binary? Change-Id: Id9c8f0ffc84aa1c9a066c22b623d34adab281857 Reviewed-on: https://skia-review.googlesource.com/37501 Commit-Queue: Mike Klein <mtklein@google.com> Reviewed-by: Ben Wagner <bungeman@google.com>
* CRC32 no longer restricted to ARM64Gravatar Amaury Le Leyzour2017-05-04
| | | | | | | | | | | | On a simple benchmark the CRC32 version is about 3x faster on ARM Cortex A57 (Aarch32) than the Murmur3 scalar version. BUG=skia: Change-Id: I71515e8463a33924998b837ff9f32202690dd2fe Reviewed-on: https://skia-review.googlesource.com/15480 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
* Apple devices do not support CRC32 instructions. Don't believe Clang's lies.Gravatar mtklein2016-09-08
| | | | | | | | BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2322033002 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2322033002
* Use ARMv8 CRC32 instructions for SkOpts::hash().Gravatar mtklein2016-08-22
| | | | | | | | | | | | | | | | | | | | | | | For large inputs, this runs ~11x faster than Murmur3. My bench drops from 1µs to 88ns. Like x86-64, this runs fastest if we work in 24 byte chunks. 16 byte chunks run at about 0.75x this speed, 8 byte chunks at about 0.4x (which would still be about 5x faster than Murmur3). This'll require plumbing support for opts_crc32 into Chrome first before it can roll. perf.skia.org charts we want to watch: https://perf.skia.org/#5490 Seach for compute_hash in these logs to see the difference: baseline: https://luci-milo.appspot.com/swarming/task/30ba22f3dfe30e10/steps/nanobench/0/stdout trybot: https://luci-milo.appspot.com/swarming/task/30bbc406cbf62d10/steps/nanobench/0/stdout BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2260823002 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2260823002
* 32-bit fast hash, tidy up murmur3 a bitGravatar mtklein2016-08-16
| | | | | | | | | | | | | | Nothing too surprising in the new 32-bit x86 hash. It's about half speed of the 64-bit variant, just as you'd expect. Using unaligned_load like the others makes the may_alias parts of murmur3 moot. I've updated some of the terms in the murmur hash to read consistently with the others. The existing hashes are the same speed and produce the same hashes. In case this is not obvious, all three hash functions produce different hashes. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2251773002 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2251773002
* Use sse4.2 CRC32 instructions to hash when available.Gravatar mtklein2016-08-08
About 9x faster than Murmur3 for long inputs. Most of this is a mechanical change from SkChecksum::Murmur3(...) to SkOpts::hash(...). BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2208903002 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;master.client.skia.compile:Build-Ubuntu-GCC-x86_64-Release-CMake-Trybot,Build-Mac-Clang-x86_64-Release-CMake-Trybot Review-Url: https://codereview.chromium.org/2208903002