aboutsummaryrefslogtreecommitdiffhomepage
path: root/src/opts
Commit message (Collapse)AuthorAge
* Revert "SkRasterPipeline: 8x pipelines, attempt 2"Gravatar Mike Klein2016-10-10
| | | | | | | | | | | | This reverts commit Id0ba250037e271a9475fe2f0989d64f0aa909bae. crbug.com/654213 Looks like Chrome Canary's picking up Haswell code on non-Haswell machines. Change-Id: I16f976da24db86d5c99636c472ffad56db213a2a Reviewed-on: https://skia-review.googlesource.com/3108 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Mike Klein <mtklein@chromium.org>
* SkRasterPipeline: 8x pipelines, attempt 2Gravatar Mike Klein2016-10-07
| | | | | | | | | | | | | | | | | | | Original review here: https://skia-review.googlesource.com/c/2990/ Changes since: - simpler implementations of load_tail() / store_tail(): slower, but more obviously correct to all compilers - fleshed out math ops on Sk8i and Sk8u to make unit tests happy on -Fast bot (where we always have AVX2) - now storing stage functions as void(*)() to avoid undefined behavior and/or linker problems. This restores 32-bit Windows. - all AVX2 Sk8x methods are marked always-inline, to avoid linking the "wrong" version on Debug builds. CQ_INCLUDE_TRYBOTS=master.client.skia:Perf-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Debug-ASAN-Trybot,Perf-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Debug-GN,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-Fast-Trybot;master.client.skia.compile:Build-Win-MSVC-x86_64-Debug-Trybot GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=3064 Change-Id: Id0ba250037e271a9475fe2f0989d64f0aa909bae Reviewed-on: https://skia-review.googlesource.com/3064 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
* Revert "SkRasterPipeline: 8x pipelines"Gravatar Mike Klein2016-10-07
| | | | | | | | | | | | | | | | This reverts commit I1c82e5755d8e44cc0b9c6673d04b117f85d71a3a. Reason for revert: lots of failing bots. TBR=mtklein@chromium.org,msarett@google.com,reviews@skia.org NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true Change-Id: I653bed3905187f43196504f19424985fa2a765b5 Reviewed-on: https://skia-review.googlesource.com/3063 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Mike Klein <mtklein@chromium.org>
* SkRasterPipeline: 8x pipelinesGravatar Mike Klein2016-10-07
| | | | | | | | | | | | | | | | | | | | | | Bench runtime changes: sRGB: 7194 -> 3735 = 1.93x faster F16: 6531 -> 2559 = 2.55x faster Instead of building 4x and 1-3x pipelines and then maybe 8x and 1-7x, instead build either the short ones or the long ones, but not both. If we just take care to use a compatible run_pipeline(), there's some cross-module type disagreement but everything works out in the end. Oddly, a few places that looked like they'd be faster using SkNx_fma() or Sk4f_round()/Sk8f_round() are actually faster the long way, e.g. multiply, add 0.5, truncate. Curious! In all the other places you see here that I've used SkNx_fma(), it's been a significant speedup. This folds in a couple refactors and cleanups that I've been meaning to do. Hope you don't mind... if find the new code considerably easier to read than the old code. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2990 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Change-Id: I1c82e5755d8e44cc0b9c6673d04b117f85d71a3a Reviewed-on: https://skia-review.googlesource.com/2990 Reviewed-by: Matt Sarett <msarett@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* Make load4 and store4 part of SkNx properly.Gravatar Mike Klein2016-10-06
| | | | | | | | | | | | | | | Every type now nominally has Load4() and Store4() methods. The ones that we use are implemented. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=3046 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Change-Id: I7984f0c2063ef8acbc322bd2e968f8f7eaa0d8fd Reviewed-on: https://skia-review.googlesource.com/3046 Reviewed-by: Matt Sarett <msarett@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* Make all SkRasterPipeline stages stock stages in SkOpts.Gravatar Mike Klein2016-10-04
| | | | | | | | | | | | | | | | | If we want to support VEX-encoded instructions (AVX, F16C, etc.) without a ridiculous slowdown, we need to make sure we're running either all VEX-encoded instructions or all non-VEX-encoded instructions. That means we cannot mix arbitrary user-defined SkRasterPipeline::Fn (never VEX) with those living in SkOpts (maybe VEX)... it's SkOpts or bust. This ports the existing user-defined SkRasterPipeline::Fn use cases over to use stock stages from SkOpts. I rewrote the unit test to use stock stages, and moved the SkXfermode implementations to SkOpts. The code deleted for SkArithmeticMode_scalar should already be dead. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2940 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Change-Id: I94dbe766b2d65bfec6e544d260f71d721f0f5cb0 Reviewed-on: https://skia-review.googlesource.com/2940 Commit-Queue: Mike Klein <mtklein@google.com> Reviewed-by: Mike Reed <reed@google.com>
* sk_linear_from_srgb_mathGravatar Mike Klein2016-10-04
| | | | | | | | | | | | | | | | | Looks great (imperceptibly different) but ~10% slower on both ARMv8 and x86-64. Probably need to hide the table-or-math logic behind Sk4f/Sk8f unless we find faster math. I do like the new look of the pipeline stages though. A lot clearer. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2880 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Change-Id: I44952237d56ba167445b07d4830eb8959c4d47b7 Reviewed-on: https://skia-review.googlesource.com/2880 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Matt Sarett <msarett@google.com>
* Add an SkOpts target for Haswell+ Intel chips.Gravatar Mike Klein2016-09-30
| | | | | | | | | | | | | | | | Haswell brought a whole slew of handy new instructions for us (AVX2, FMA, BMI1+BMI2) and also feature F16C, which came one generation earlier on Ivybridge. We work with integers often enough that we really want to target AVX2 instead of AVX, and this means it's pretty practical to ask for all those other goodies along with it. Chrome's GN files and Google3's BUILD file will need an update, before or after this CL. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2840 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Change-Id: I826daf77b5104664c5d31ddaabee347e287b87a2 Reviewed-on: https://skia-review.googlesource.com/2840 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Herb Derby <herb@google.com>
* Add an enum layer of indirection for stock raster pipeline stages.Gravatar Mike Klein2016-09-29
| | | | | | | | | | | | | | | | | | This is handy now, and becomes necessary with fancier backends: - most code can't speak the type of AVX pipeline stages, so indirection's definitely needed there; - if the pipleine is entirely composed of stock stages, these enum values become an abstract recipe that can be JITted. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2782 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Change-Id: Iedd62e99ce39e94cf3e6ffc78c428f0ccc182342 Reviewed-on: https://skia-review.googlesource.com/2782 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
* Start moving SkRasterPipeline stages to SkOpts.Gravatar Mike Klein2016-09-29
| | | | | | | | | | | | | | | | | | | | This lets them pick up runtime CPU specializations. Here I've plugged in SSE4.1. This is still one of the N prelude CLs to full 8-at-a-time AVX. I've moved the union of the stages used by SkRasterPipelineBench and SkRasterPipelineBlitter to SkOpts... they'll all be used by the blitter eventually. Picking up SSE4.1 specialization here (even still just 4 pixels at a time) is a significant speedup, especially to store_srgb(), so much that it's no longer really interesting to compare against the fused-but-default-instruction-set version in the bench. So that's gone now. That left the SkRasterPipeline unit test as the only other user of the EasyFn simplified interface to SkRasterPipeline. So I converted that back down to the bare-metal interface, and EasyFn and its friends became SkRasterPipeline_opts.h exclusive abbreviations (now called Kernel_Sk4f). This isn't really unexpected: SkXfermode also wanted to build up its own little abstractions, and once you build your own abstraction, the value of an additional EasyFn-like layer plummets to negative. For simplicity I've left the SkXfermode stages alone, except srcover() which was always part of the blitter. No particular reason except keeping the churn down while I hack. These _can_ be in SkOpts, but don't have to be until we go 8-at-a-time. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2752 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Change-Id: I3b476b18232a1598d8977e425be2150059ab71dc Reviewed-on: https://skia-review.googlesource.com/2752 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
* Support Float32 output from SkColorSpaceXformGravatar msarett2016-09-16
| | | | | | | | | | | | | | | * Adds Float32 support to SkColorSpaceXform * Changes API to allows clients to ask for F32, updates clients to new API * Adds Sk4f_load4 and Sk4f_store4 to SkNx * Make use of new xform in SkGr.cpp BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2339233003 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Committed: https://skia.googlesource.com/skia/+/43d6651111374b5d1e4ddd9030dcf079b448ec47 Review-Url: https://codereview.chromium.org/2339233003
* Revert of Support Float32 output from SkColorSpaceXform (patchset #7 ↵Gravatar msarett2016-09-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | id:140001 of https://codereview.chromium.org/2339233003/ ) Reason for revert: Hitting an assert Original issue's description: > Support Float32 output from SkColorSpaceXform > > * Adds Float32 support to SkColorSpaceXform > * Changes API to allows clients to ask for F32, updates clients to > new API > * Adds Sk4f_load4 and Sk4f_store4 to SkNx > * Make use of new xform in SkGr.cpp > > BUG=skia: > GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2339233003 > CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot > > Committed: https://skia.googlesource.com/skia/+/43d6651111374b5d1e4ddd9030dcf079b448ec47 TBR=brianosman@google.com,mtklein@google.com,scroggo@google.com,mtklein@chromium.org,bsalomon@google.com # Skipping CQ checks because original CL landed less than 1 days ago. NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia: Review-Url: https://codereview.chromium.org/2347473007
* Support Float32 output from SkColorSpaceXformGravatar msarett2016-09-16
| | | | | | | | | | | | | | * Adds Float32 support to SkColorSpaceXform * Changes API to allows clients to ask for F32, updates clients to new API * Adds Sk4f_load4 and Sk4f_store4 to SkNx * Make use of new xform in SkGr.cpp BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2339233003 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2339233003
* Apple devices do not support CRC32 instructions. Don't believe Clang's lies.Gravatar mtklein2016-09-08
| | | | | | | | BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2322033002 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2322033002
* clean up use of register storage classGravatar mtklein2016-08-30
| | | | | | | | | | | | | This fixes this warning-as-error, so we don't have to stifle it any more: error: 'register' storage class specifier is deprecated and incompatible with C++1z [-Werror,-Wdeprecated-register] Tested by building for mipsel via GN. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2289293002 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2289293002
* compress_r11eac_blocks() required more alignment than dst has.Gravatar mtklein2016-08-22
| | | | | | | | | | | | | | This shouldn't change any behavior except that the stores to dst will no longer require 8-byte alignment. Empirically it seems like we can use 4-byte alignment here, but u8 (i.e. 1-byte alignment) is always safe. BUG=skia:5637 GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2264103002 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2264103002
* Use ARMv8 CRC32 instructions for SkOpts::hash().Gravatar mtklein2016-08-22
| | | | | | | | | | | | | | | | | | | | | | | For large inputs, this runs ~11x faster than Murmur3. My bench drops from 1µs to 88ns. Like x86-64, this runs fastest if we work in 24 byte chunks. 16 byte chunks run at about 0.75x this speed, 8 byte chunks at about 0.4x (which would still be about 5x faster than Murmur3). This'll require plumbing support for opts_crc32 into Chrome first before it can roll. perf.skia.org charts we want to watch: https://perf.skia.org/#5490 Seach for compute_hash in these logs to see the difference: baseline: https://luci-milo.appspot.com/swarming/task/30ba22f3dfe30e10/steps/nanobench/0/stdout trybot: https://luci-milo.appspot.com/swarming/task/30bbc406cbf62d10/steps/nanobench/0/stdout BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2260823002 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2260823002
* 32-bit fast hash, tidy up murmur3 a bitGravatar mtklein2016-08-16
| | | | | | | | | | | | | | Nothing too surprising in the new 32-bit x86 hash. It's about half speed of the 64-bit variant, just as you'd expect. Using unaligned_load like the others makes the may_alias parts of murmur3 moot. I've updated some of the terms in the murmur hash to read consistently with the others. The existing hashes are the same speed and produce the same hashes. In case this is not obvious, all three hash functions produce different hashes. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2251773002 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2251773002
* Use sse4.2 CRC32 instructions to hash when available.Gravatar mtklein2016-08-08
| | | | | | | | | | | | About 9x faster than Murmur3 for long inputs. Most of this is a mechanical change from SkChecksum::Murmur3(...) to SkOpts::hash(...). BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2208903002 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;master.client.skia.compile:Build-Ubuntu-GCC-x86_64-Release-CMake-Trybot,Build-Mac-Clang-x86_64-Release-CMake-Trybot Review-Url: https://codereview.chromium.org/2208903002
* SkBlendARGB32 and S32[A]_Blend_BlitRow32 are currently formulated as: ↵Gravatar lsalzman2016-08-05
| | | | | | | | | | | | | | | | | | | | | | SkAlphaMulQ(src, src_scale) + SkAlphaMulQ(dst, dst_scale), which boils down to ((src*src_scale)>>8) + ((dst*dst_scale)>>8). In particular, note that the intermediate precision is discarded before the two parts are added together, causing the final result to possibly inaccurate. In Firefox, we use SkCanvas::saveLayer in combination with a backdrop that initializes the layer to the background. When this is blended back onto background using transparency, where the source and destination pixel colors are the same, the resulting color after the blend is not preserved due to the lost precision mentioned above. In cases where this operation is repeatedly performed, this causes substantially noticeable differences in color as evidenced in this downstream Firefox bug report: https://bugzilla.mozilla.org/show_bug.cgi?id=1200684 In the test-case in the downstream report, essentially it does blend(src=0xFF2E3338, dst=0xFF2E3338, scale=217), which gives the result 0xFF2E3237, while we would expect to get back 0xFF2E3338. This problem goes away if the blend is instead reformulated to effectively do (src*src_scale + dst*dst_scale)>>8, which keeps the intermediate precision during the addition before shifting it off. This modifies the blending operations thusly. The performance should remain mostly unchanged, or possibly improve slightly, so there should be no real downside to doing this, with the benefit of making the results more accurate. Without this, it is currently unsafe for Firefox to blend a layer back onto itself that was initialized with a copy of its background. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2097883002 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot [mtklein adds...] No public API changes. TBR=reed@google.com Review-Url: https://codereview.chromium.org/2097883002
* SkRTConf: eliminateGravatar halcanary2016-08-04
| | | | | | | | | | | | GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2212473002 DOCS_PREVIEW= https://skia.org/?cl=2212473002 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot [mtklein] TBR=reed@google.com Only removing unused public API. Review-Url: https://codereview.chromium.org/2212473002
* Revert of SkRTConf: reduce functionality to what we use, increase simplicity ↵Gravatar mtklein2016-08-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | (patchset #8 id:150001 of https://codereview.chromium.org/2212473002/ ) Reason for revert: missed GrVkPipelineStateCache Original issue's description: > SkRTConf: reduce functionality to what we use, increase simplicity > > GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2212473002 > DOCS_PREVIEW= https://skia.org/?cl=2212473002 > CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot > > [mtklein] > TBR=reed@google.com > Only removing unused public API. > > Committed: https://skia.googlesource.com/skia/+/ef59974708dade6fa72fb0218d4f8a9590175c47 TBR=halcanary@google.com # Skipping CQ checks because original CL landed less than 1 days ago. NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true Review-Url: https://codereview.chromium.org/2215433003
* SkRTConf: reduce functionality to what we use, increase simplicityGravatar halcanary2016-08-03
| | | | | | | | | | | | GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2212473002 DOCS_PREVIEW= https://skia.org/?cl=2212473002 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot [mtklein] TBR=reed@google.com Only removing unused public API. Review-Url: https://codereview.chromium.org/2212473002
* Refactor of SkColorSpaceXformOptsGravatar msarett2016-08-02
| | | | | | | | | | | | | | | | | (1) Performance is better or stays the same. (2) Code is split into functions (RasterPipeline-ish design). IMO, it's not really more or less readable. But I think it's now much easier add capabilities, apply optimizations, or do more refactors. Or to actually use RasterPipeline. I help back from trying any of these to try to keep this CL sane. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2194303002 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2194303002
* Revert of Tidy up SkNx_neon. (patchset #3 id:40001 of ↵Gravatar mtklein2016-07-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | https://codereview.chromium.org/2196773002/ ) Reason for revert: https://luci-milo.appspot.com/swarming/task/3055149a25621b10 Not Nexus 5 specific. Reproduces on Pixel C with --gcc -t Debug -d arm_v7_neon. Not sure about other configs yet. Original issue's description: > Tidy up SkNx_neon. > > This takes advantage of the fact that all the compilers we use that > support NEON implement it with their own vector extensions. This means > normal things like c = a + b work on the underlying vector types already. > Odd instructions like min or saturated add need to stay intrinsics. > > Also, rearrange functions to a more consistent order. > > BUG=skia: > GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2196773002 > CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot > > Committed: https://skia.googlesource.com/skia/+/6ad22315eb6eacfcd35497cd118440a619d05b18 TBR=msarett@google.com,mtklein@chromium.org # Not skipping CQ checks because original CL landed more than 1 days ago. BUG=skia: Review-Url: https://codereview.chromium.org/2196953002
* simplify neon shiftsGravatar mtklein2016-07-29
| | | | | | | | | | | | These still generate vshr/vshl with immediates with both GCC and Clang. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2194953002 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Based on https://codereview.chromium.org/2196773002 Review-Url: https://codereview.chromium.org/2194953002
* Tidy up SkNx_neon.Gravatar mtklein2016-07-29
| | | | | | | | | | | | | | | This takes advantage of the fact that all the compilers we use that support NEON implement it with their own vector extensions. This means normal things like c = a + b work on the underlying vector types already. Odd instructions like min or saturated add need to stay intrinsics. Also, rearrange functions to a more consistent order. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2196773002 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2196773002
* SkNx: add Sk4uGravatar mtklein2016-07-29
| | | | | | | | | | This lets us get at logical >> in a nicely principled way. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2197683002 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2197683002
* Add color space xform support to SkJpegCodec (includes F16!)Gravatar msarett2016-07-29
| | | | | | | | | | | | | | | | | Also changes SkColorXform to support: RGBA->RGBA RGBA->BGRA Instead of: RGBA->SkPMColor TBR=reed@google.com BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2174493002 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Committed: https://skia.googlesource.com/skia/+/73d55332e2846dd05e9efdaa2f017bcc3872884b Review-Url: https://codereview.chromium.org/2174493002
* Revert of Add color space xform support to SkJpegCodec (includes F16!) ↵Gravatar msarett2016-07-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | (patchset #9 id:260001 of https://codereview.chromium.org/2174493002/ ) Reason for revert: Breaking MSAN Original issue's description: > Add color space xform support to SkJpegCodec (includes F16!) > > Also changes SkColorXform to support: > RGBA->RGBA > RGBA->BGRA > > Instead of: > RGBA->SkPMColor > > TBR=reed@google.com > BUG=skia: > GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2174493002 > CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot > > Committed: https://skia.googlesource.com/skia/+/73d55332e2846dd05e9efdaa2f017bcc3872884b TBR=mtklein@google.com,reed@google.com,herb@google.com,brianosman@google.com # Skipping CQ checks because original CL landed less than 1 days ago. NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia: Review-Url: https://codereview.chromium.org/2195523002
* Add color space xform support to SkJpegCodec (includes F16!)Gravatar msarett2016-07-28
| | | | | | | | | | | | | | | | Also changes SkColorXform to support: RGBA->RGBA RGBA->BGRA Instead of: RGBA->SkPMColor TBR=reed@google.com BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2174493002 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2174493002
* Fix alignment problems in NEON Sk4b.Gravatar mtklein2016-07-26
| | | | | | | | | | | | | | | As written at head, the compiler can assume these loads and stores are 4 byte aligned [1]. We want Sk4b to load from any 1-byte aligned address, to prevent crashes like [2]. [1] https://llvm.org/bugs/show_bug.cgi?id=24421 [2] https://luci-milo.appspot.com/swarming/task/304079e125b1b910/steps/nanobench/0/stdout BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2183133002 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2183133002
* Add Sk4h_load4 for loading F16.Gravatar mtklein2016-07-26
| | | | | | | | | | | | | | Should feel very similar to Sk4h_store4: NEON uses its native instruction, SSE unpacks manually. Since we'll have our F16s in 4 Sk4h by the time we're done here, this also extracts an Sk4h->Sk4f routine from the old uint64_t->Sk4f one. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2184753002 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2184753002
* Use sk_srgb_to_linear_trunc in SkColorXform_optsGravatar msarett2016-07-25
| | | | | | | | | | | | | | | This gives us a little more control over instruction order, allowing us to pipeline the muls and get better performance. Technically, clang should be able to do this for us anyway... Performance on HP z620 (201295.jpg): toSRGB: 371us -> 356us BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2175413002 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2175413002
* Delete SkDefaultXform, handle edge cases in SkColorSpaceXform_BaseGravatar msarett2016-07-25
| | | | | | | | | | | | | | | | | | | | | | | | | "Edge" cases include: (1) Matrices with translation (2) colorLUTs Performance on HP z620: 201295.jpg to2Dot2: 386us -> 414us toSRGB: 346us -> 371us toHalf: 282us -> 302us strange0-translate.jpg toSRGB: 1060us -> 244us strange1-colorLUT.jpg toSRGB: 2.74ms -> 2.00ms BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2177173003 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2177173003
* Add clamp to sk_linear_to_srgb, reorder instructionsGravatar msarett2016-07-22
| | | | | | | | | | | | | | | | | | | Improves performance for xforms toSRGB and to2Dot2. Seems more optimal to save clamping until the end. That way we don't stall the mul pipeline with a min/max. toSRGB: 371us -> 346us to2Dot2: 404us -> 387us FWIW, it probably makes sense to clamp inside sk_linear_to_srgb anyway. If not, we should potentially provide two versions (one that clamps and one that doesn't). BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2173803002 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2173803002
* Miscellaneous color space refactorsGravatar msarett2016-07-21
| | | | | | | | | | | | | | | | (1) Use float matrix[16] everywhere (enables future code sharing). (2) SkColorLookUpTable refactors *** Store in a single allocation (like SkGammas) *** Eliminate fOutputChannels (we always require 3, and probably always will) (3) Change names of read_big_endian_* helpers BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2166093003 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2166093003
* Correct sRGB <-> linear everywhere.Gravatar mtklein2016-07-20
| | | | | | | | | | | | | | | | | | | | | | This trims the SkPM4fPriv methods down to just foolproof methods. (Anything trying to build these itself is probably wrong.) Things like Sk4f srgb_to_linear(Sk4f) can't really exist anymore, at least not efficiently, so this refactor is somewhat more invasive than you might think. Generally this means things using to_4f() are also making a misstep... that's gone too. It also does not make sense to try to play games with linear floats with 255 bias any more. That hack can't work with real sRGB coding. Rather than update them, I've removed a couple of L32 xfermode fast paths. I'd even rather drop it entirely... BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2163683002 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2163683002
* Tune linear->sRGB constants to round-trip all bytes.Gravatar mtklein2016-07-20
| | | | | | | | | | | | I basically just ran a big 5-deep for-loop over the five constants here. This is the first set of coefficients I found that round trips all bytes. I suspect there are many such sets. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2162063003 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2162063003
* Improve naive SkColorXform to half floatsGravatar msarett2016-07-19
| | | | | | | | | | | | | | | | | | | | | | | | This should give us a good baseline to explore using SkRasterPipeline. A particular colorxform to half float drops from 425us to 282us on my desktop. Color Xform to Half Float (HP z620) Original 425us Trans16 (not 32) 355us Vector Trans16 378us Trans16 + Keep Halfs in Vector 335us Vector Trans16 + Keep Halfs in Vector 282us Final 282us Color Xform to Half Float (Nexus 5X) Original 556us Final 472us BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2159993003 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2159993003
* Add capability for SkColorXform to output half floatsGravatar msarett2016-07-15
| | | | | | | | BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2147763002 CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2147763002
* Add a bench to measure the best way to pack from int to uint16_t with SSE.Gravatar mtklein2016-07-15
| | | | | | | | | | | | | | | | | | | | | | | I measured relative runtimes on my laptop: pack_int_uint16_t_ss… 1036 …e41 1x …se3 1.01x …e2_b 3.01x …e2_a 3.02x I've run into Clang problems with the actual _mm_packus_epi32 instruction, I think, so I'm going to exercise a little cowardice and leave that option disabled for now. The ssse3 version probably looks a little faster than it will be in practice. We'll usually need to load its mask, which here is hoisted out of the bench loop. The two sse2 variants are close enough in speed that I'm tie breaking them on other concerns: the <<16, >>16 version doesn't need any scratch registers or to load any constants, so it wins. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2150343002 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-Fast-Trybot Review-Url: https://codereview.chromium.org/2150343002
* Expand _01 half<->float limitation to _finite. Simplify.Gravatar mtklein2016-07-15
| | | | | | | | | | | | | | | | | | | | | | | It's become clear we need to sometimes deal with values <0 or >1. I'm not yet convinced we care about NaN or +-inf. We had some fairly clever tricks and optimizations here for NEON and SSE. I've thrown them out in favor of a single implementation. If we find the specializations mattered, we can certainly figure out how to extend them to this new range/domain. This happens to add a vectorized float -> half for ARMv7, which was missing from the _01 version. (The SSE strategy was not portable to platforms that flush denorm floats to zero.) I've tested the full float range for FloatToHalf on my desktop and a 5x. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2145663003 CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-Fast-Trybot Committed: https://skia.googlesource.com/skia/+/3296bee70d074bb8094b3229dbe12fa016657e90 Review-Url: https://codereview.chromium.org/2145663003
* Revert of Expand _01 half<->float limitation to _finite. Simplify. ↵Gravatar mtklein2016-07-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | (patchset #7 id:120001 of https://codereview.chromium.org/2145663003/ ) Reason for revert: Unit tests fail on Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-Fast Original issue's description: > Expand _01 half<->float limitation to _finite. Simplify. > > It's become clear we need to sometimes deal with values <0 or >1. > I'm not yet convinced we care about NaN or +-inf. > > We had some fairly clever tricks and optimizations here for NEON > and SSE. I've thrown them out in favor of a single implementation. > If we find the specializations mattered, we can certainly figure out > how to extend them to this new range/domain. > > This happens to add a vectorized float -> half for ARMv7, which was > missing from the _01 version. (The SSE strategy was not portable to > platforms that flush denorm floats to zero.) > > I've tested the full float range for FloatToHalf on my desktop and a 5x. > > BUG=skia: > GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2145663003 > CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-Fast-Trybot > > Committed: https://skia.googlesource.com/skia/+/3296bee70d074bb8094b3229dbe12fa016657e90 TBR=msarett@google.com,mtklein@chromium.org # Skipping CQ checks because original CL landed less than 1 days ago. NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia: Review-Url: https://codereview.chromium.org/2151023003
* Expand _01 half<->float limitation to _finite. Simplify.Gravatar mtklein2016-07-14
| | | | | | | | | | | | | | | | | | | | | | It's become clear we need to sometimes deal with values <0 or >1. I'm not yet convinced we care about NaN or +-inf. We had some fairly clever tricks and optimizations here for NEON and SSE. I've thrown them out in favor of a single implementation. If we find the specializations mattered, we can certainly figure out how to extend them to this new range/domain. This happens to add a vectorized float -> half for ARMv7, which was missing from the _01 version. (The SSE strategy was not portable to platforms that flush denorm floats to zero.) I've tested the full float range for FloatToHalf on my desktop and a 5x. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2145663003 CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2145663003
* Update SkOpts namespaces.Gravatar mtklein2016-07-13
| | | | | | | | | | | | | | | | | | | | | | | | | If we make sure all SkOpts functions are static, we can give the namespaces any name we like. This lets us drop the sk_ prefix and give a real indication of the default SIMD instruction set rather than just saying sk_default. Both of these changes help debugger, profiler, and crash report readability. Perhaps more importantly, keeping these functions static helps prevent accidentally linking in unused versions of functions, as you see here with sk_avx::srcover_srgb_srgb(). This requires we update SkBlend_opts tests and benches to call SkOpts functions through SkOpts rather than declaring the methods externally. In practice this drops testing of the SSE2 version on machines with SSE4. If we still really need to test/bench the compile time best SIMD level version of this method against the runtime detected best, we can include SkBlend_opts.h into the tests or benches directly, similar to what we do for the trivial, brute-force, or best non-SIMD versions. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2145833002 CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2145833002
* SkRasterPipeline preliminariesGravatar mtklein2016-07-12
| | | | | | | | | | | | | | Re-uploading to see if I can get a CL number < 2^31. patch from issue 2147533002 at patchset 240001 (http://crrev.com/2147533002#ps240001) Already reviewed at the other crrev link. TBR= BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2147533002 CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2144573004
* Remove bloat from SkBlend_opts.Gravatar herb2016-07-12
| | | | | | | | BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2130183003 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2130183003
* Add Sk4f_RoundToIntGravatar msarett2016-07-12
| | | | | | | | BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2134753006 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2134753006
* Revert of try to speed-up maprect + round2i + contains (patchset #8 ↵Gravatar msarett2016-07-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | id:140001 of https://codereview.chromium.org/2133413002/ ) Reason for revert: Breaking the roll... https://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_rel_ng/builds/253294/steps/compile%20%28with%20patch%29/logs/stdio Original issue's description: > try to speed-up maprect + round2i + contains > > We call roundOut in a few places. If we can get SkNx::Ceil we could efficiently implement that as well. > > > BUG=skia: > GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2133413002 > CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot > > Committed: https://skia.googlesource.com/skia/+/b42b785d1cbc98bd34aceae338060831b974f9c5 TBR=mtklein@google.com,reed@google.com # Skipping CQ checks because original CL landed less than 1 days ago. NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia: Review-Url: https://codereview.chromium.org/2136343002