aboutsummaryrefslogtreecommitdiffhomepage
path: root/src/opts/SkRasterPipeline_opts.h
Commit message (Collapse)AuthorAge
...
* SkRasterPipeline: 8x pipelinesGravatar Mike Klein2016-10-07
| | | | | | | | | | | | | | | | | | | | | | Bench runtime changes: sRGB: 7194 -> 3735 = 1.93x faster F16: 6531 -> 2559 = 2.55x faster Instead of building 4x and 1-3x pipelines and then maybe 8x and 1-7x, instead build either the short ones or the long ones, but not both. If we just take care to use a compatible run_pipeline(), there's some cross-module type disagreement but everything works out in the end. Oddly, a few places that looked like they'd be faster using SkNx_fma() or Sk4f_round()/Sk8f_round() are actually faster the long way, e.g. multiply, add 0.5, truncate. Curious! In all the other places you see here that I've used SkNx_fma(), it's been a significant speedup. This folds in a couple refactors and cleanups that I've been meaning to do. Hope you don't mind... if find the new code considerably easier to read than the old code. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2990 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Change-Id: I1c82e5755d8e44cc0b9c6673d04b117f85d71a3a Reviewed-on: https://skia-review.googlesource.com/2990 Reviewed-by: Matt Sarett <msarett@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* Make load4 and store4 part of SkNx properly.Gravatar Mike Klein2016-10-06
| | | | | | | | | | | | | | | Every type now nominally has Load4() and Store4() methods. The ones that we use are implemented. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=3046 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Change-Id: I7984f0c2063ef8acbc322bd2e968f8f7eaa0d8fd Reviewed-on: https://skia-review.googlesource.com/3046 Reviewed-by: Matt Sarett <msarett@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* Make all SkRasterPipeline stages stock stages in SkOpts.Gravatar Mike Klein2016-10-04
| | | | | | | | | | | | | | | | | If we want to support VEX-encoded instructions (AVX, F16C, etc.) without a ridiculous slowdown, we need to make sure we're running either all VEX-encoded instructions or all non-VEX-encoded instructions. That means we cannot mix arbitrary user-defined SkRasterPipeline::Fn (never VEX) with those living in SkOpts (maybe VEX)... it's SkOpts or bust. This ports the existing user-defined SkRasterPipeline::Fn use cases over to use stock stages from SkOpts. I rewrote the unit test to use stock stages, and moved the SkXfermode implementations to SkOpts. The code deleted for SkArithmeticMode_scalar should already be dead. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2940 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Change-Id: I94dbe766b2d65bfec6e544d260f71d721f0f5cb0 Reviewed-on: https://skia-review.googlesource.com/2940 Commit-Queue: Mike Klein <mtklein@google.com> Reviewed-by: Mike Reed <reed@google.com>
* sk_linear_from_srgb_mathGravatar Mike Klein2016-10-04
| | | | | | | | | | | | | | | | | Looks great (imperceptibly different) but ~10% slower on both ARMv8 and x86-64. Probably need to hide the table-or-math logic behind Sk4f/Sk8f unless we find faster math. I do like the new look of the pipeline stages though. A lot clearer. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2880 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Change-Id: I44952237d56ba167445b07d4830eb8959c4d47b7 Reviewed-on: https://skia-review.googlesource.com/2880 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Matt Sarett <msarett@google.com>
* Add an enum layer of indirection for stock raster pipeline stages.Gravatar Mike Klein2016-09-29
| | | | | | | | | | | | | | | | | | This is handy now, and becomes necessary with fancier backends: - most code can't speak the type of AVX pipeline stages, so indirection's definitely needed there; - if the pipleine is entirely composed of stock stages, these enum values become an abstract recipe that can be JITted. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2782 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Change-Id: Iedd62e99ce39e94cf3e6ffc78c428f0ccc182342 Reviewed-on: https://skia-review.googlesource.com/2782 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
* Start moving SkRasterPipeline stages to SkOpts.Gravatar Mike Klein2016-09-29
This lets them pick up runtime CPU specializations. Here I've plugged in SSE4.1. This is still one of the N prelude CLs to full 8-at-a-time AVX. I've moved the union of the stages used by SkRasterPipelineBench and SkRasterPipelineBlitter to SkOpts... they'll all be used by the blitter eventually. Picking up SSE4.1 specialization here (even still just 4 pixels at a time) is a significant speedup, especially to store_srgb(), so much that it's no longer really interesting to compare against the fused-but-default-instruction-set version in the bench. So that's gone now. That left the SkRasterPipeline unit test as the only other user of the EasyFn simplified interface to SkRasterPipeline. So I converted that back down to the bare-metal interface, and EasyFn and its friends became SkRasterPipeline_opts.h exclusive abbreviations (now called Kernel_Sk4f). This isn't really unexpected: SkXfermode also wanted to build up its own little abstractions, and once you build your own abstraction, the value of an additional EasyFn-like layer plummets to negative. For simplicity I've left the SkXfermode stages alone, except srcover() which was always part of the blitter. No particular reason except keeping the churn down while I hack. These _can_ be in SkOpts, but don't have to be until we go 8-at-a-time. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2752 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Change-Id: I3b476b18232a1598d8977e425be2150059ab71dc Reviewed-on: https://skia-review.googlesource.com/2752 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>