aboutsummaryrefslogtreecommitdiffhomepage
path: root/src/core/SkOpts.h
Commit message (Collapse)AuthorAge
* remove SkOpts::run_pipeline() declaration.Gravatar Mike Klein2017-04-25
| | | | | | | | | | | I missed this while removing SkRasterPipeline_opts.h. It's just a declaration... this won't change any generated code. Change-Id: I66f6e9fe5341e9ff6a91981da9275c944a63fee9 Reviewed-on: https://skia-review.googlesource.com/14325 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
* Simplify more: remove SkRasterPipeline::compile().Gravatar Mike Klein2017-02-16
| | | | | | | | | | | | It's easier to work on SkJumper if everything funnels through run(). I don't anticipate huge benefit from compile() without JITing, but it's something we can always put back if we find a need. Change-Id: Id5256fd21495e8195cad1924dbad81856416d913 Reviewed-on: https://skia-review.googlesource.com/8468 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Mike Klein <mtklein@chromium.org>
* Remove SkTextureCompressor.Gravatar Herb Derby2017-02-10
| | | | | | | | | | | | | | | | | | | This is the ultimate state of what it looks like to remove SkTextureCompressor. This end result will result from the following steps. 1) Remove Skia dep on ktx (done) 2) Move format over to ktx (done) 3) Remove all SkTexture compressor code CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD Change-Id: I3ad7a6abbea006a3034d95662c652d6db90b86ef Reviewed-on: https://skia-review.googlesource.com/8272 Commit-Queue: Herb Derby <herb@google.com> Reviewed-by: Robert Phillips <robertphillips@google.com> Reviewed-by: Leon Scroggins <scroggo@google.com>
* Move shader register setup to SkRasterPipelineBlitter.Gravatar Mike Klein2017-01-23
| | | | | | | | | | | | | | | | | | | | | We've been seeding the initial values of our registers to x+0.5,y+0.5, 1,0, 0,0,0,0 (useful values for shaders to start with) in all pipelines. This CL changes that to do so only when blitting, and only when we have a shader. The nicest part of this change is that SkRasterPipeline itself no longer needs to have a concept of y, or what x means. It just marches x through [x,x+n), and the blitter handles y and layers the meaning of "dst x coordinate" onto x. This ought to make SkSplicer a little easier to work with too. dm --src gm --config f16 srgb 565 all draws the same. Change-Id: I69d8c1cc14a06e5dfdd6a7493364f43a18f8dec5 Reviewed-on: https://skia-review.googlesource.com/7353 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* Remove SkColorCubeFilter. It is unused.Gravatar Mike Klein2017-01-21
| | | | | | | Change-Id: Iec5fc759e331de24caea1347f9510917260d379b Reviewed-on: https://skia-review.googlesource.com/7363 Reviewed-by: Mike Reed <reed@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* Bring back SkRasterPipeline::run() for one-off uses.Gravatar Mike Klein2016-11-30
| | | | | | | | | CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD Change-Id: I308b6d75f2987a667eead9a55760a2ff6aec2984 Reviewed-on: https://skia-review.googlesource.com/5353 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Mike Klein <mtklein@chromium.org>
* Port convolve functions to SkOptsGravatar xiangze.zhang2016-11-17
| | | | | | | | | | | | This patch moves the C++/SSE2/NEON implementations of convolve functions into the same place and uses SkOpts framework. Also some indentation fix. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2500113004 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2500113004
* make SkXfermode.h go awayGravatar Mike Reed2016-11-16
| | | | | | | | | | | | | | | | | | This is step one: - make SkXfermode useless to public clients - everything they should need is in SkBlendMode.h Step two: - remove SkXfermode.h entirely (since skia core will already be using SkXfermodePriv.h) BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=4534 Change-Id: If2cea9f71df92430ed6644edb98dd306c5572cbc Reviewed-on: https://skia-review.googlesource.com/4534 Commit-Queue: Mike Reed <reed@google.com> Reviewed-by: Florin Malita <fmalita@chromium.org>
* Start each pipeline with (x,y) in (dr,dg) registers for the shader.Gravatar Mike Klein2016-11-15
| | | | | | | | | | | | | | | | | | | | | Image shaders need to do some geometry work before sampling the image colors: 1) determine dst coordinates 2) map back to src coordinates 3) tiling Feeding (x,y) through as (dr,dg) registers makes step 1) easy, perhaps trivial, while leaving (r,g,b,a) with their usual meanings, "the color", starting with the paint color. This is easy to tweak into something like (x+0.5, y+0.5, 1) in (dr,dg,db) once this lands. Mostly I just want to get all the uninteresting boilerplate out of the way first. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=4791 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Change-Id: Ia07815d942ded6672dc1df785caf80a508fc8f37 Reviewed-on: https://skia-review.googlesource.com/4791 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* remove xfermode from public apiGravatar Mike Reed2016-10-28
| | | | | | | | | | | | | | BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=4020 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Change-Id: I19cd056f2af778f10e8c6c2b7b2735593b43dbac Reviewed-on: https://skia-review.googlesource.com/4020 Reviewed-by: Florin Malita <fmalita@chromium.org> Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Reed <reed@google.com>
* SkRasterPipeline::compile().Gravatar Mike Klein2016-10-25
| | | | | | | | | | | I'm not yet caching these in the blitter, and speed is essentially unchanged in the bench where I am now building and compiling the pipeline only once. This may not be able to stay a simple std::function after I figure out caching, but for now it's a nice fit. GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=3911 Change-Id: I9545af589f73baf9f17cb4e6ace9a814c2478fe9 Reviewed-on: https://skia-review.googlesource.com/3911 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* Move SkRasterPipeline further into SkOpts.Gravatar Mike Klein2016-10-25
| | | | | | | | | | | | | | | | | | The portable code now becomes entirely focused on enum+ptr descriptions, leaving the concrete implementation of the pipeline to SkOpts::run_pipeline(). As implemented, the concrete implementation is basically the same, with a little more type safety. Speed is essentially unchanged on my laptop, and that's having run_pipeline() rebuild its concrete state every call. There's room for improvement there if we split this into a compile_pipeline() / run_pipeline() sort of thing, which is my next planned CL. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=3920 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Change-Id: Ie4c554f51040426de7c5c144afa5d9d9d8938012 Reviewed-on: https://skia-review.googlesource.com/3920 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* SkRasterPipeline: 8x pipelines, without any 8x code enabled.Gravatar Mike Klein2016-10-12
| | | | | | | | | | | | | | | | | Original review here: https://skia-review.googlesource.com/c/2990/ Second attempt here: https://skia-review.googlesource.com/c/3064/ This is the same as the second attempt, but with the change to SkOpts_hsw.cpp left out. That omitted part is the key piece... this just lands the refactoring. CQ_INCLUDE_TRYBOTS=master.client.skia:Perf-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Debug-ASAN-Trybot,Perf-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Debug-GN,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-Fast-Trybot;master.client.skia.compile:Build-Win-MSVC-x86_64-Debug-Trybot GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=3242 Change-Id: Iaafa793a4854c2c9cd7e85cca3701bf871253f71 Reviewed-on: https://skia-review.googlesource.com/3242 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
* Revert "SkRasterPipeline: 8x pipelines, attempt 2"Gravatar Mike Klein2016-10-10
| | | | | | | | | | | | This reverts commit Id0ba250037e271a9475fe2f0989d64f0aa909bae. crbug.com/654213 Looks like Chrome Canary's picking up Haswell code on non-Haswell machines. Change-Id: I16f976da24db86d5c99636c472ffad56db213a2a Reviewed-on: https://skia-review.googlesource.com/3108 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Mike Klein <mtklein@chromium.org>
* SkRasterPipeline: 8x pipelines, attempt 2Gravatar Mike Klein2016-10-07
| | | | | | | | | | | | | | | | | | | Original review here: https://skia-review.googlesource.com/c/2990/ Changes since: - simpler implementations of load_tail() / store_tail(): slower, but more obviously correct to all compilers - fleshed out math ops on Sk8i and Sk8u to make unit tests happy on -Fast bot (where we always have AVX2) - now storing stage functions as void(*)() to avoid undefined behavior and/or linker problems. This restores 32-bit Windows. - all AVX2 Sk8x methods are marked always-inline, to avoid linking the "wrong" version on Debug builds. CQ_INCLUDE_TRYBOTS=master.client.skia:Perf-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Debug-ASAN-Trybot,Perf-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Debug-GN,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-Fast-Trybot;master.client.skia.compile:Build-Win-MSVC-x86_64-Debug-Trybot GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=3064 Change-Id: Id0ba250037e271a9475fe2f0989d64f0aa909bae Reviewed-on: https://skia-review.googlesource.com/3064 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
* Revert "SkRasterPipeline: 8x pipelines"Gravatar Mike Klein2016-10-07
| | | | | | | | | | | | | | | | This reverts commit I1c82e5755d8e44cc0b9c6673d04b117f85d71a3a. Reason for revert: lots of failing bots. TBR=mtklein@chromium.org,msarett@google.com,reviews@skia.org NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true Change-Id: I653bed3905187f43196504f19424985fa2a765b5 Reviewed-on: https://skia-review.googlesource.com/3063 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Mike Klein <mtklein@chromium.org>
* SkRasterPipeline: 8x pipelinesGravatar Mike Klein2016-10-07
| | | | | | | | | | | | | | | | | | | | | | Bench runtime changes: sRGB: 7194 -> 3735 = 1.93x faster F16: 6531 -> 2559 = 2.55x faster Instead of building 4x and 1-3x pipelines and then maybe 8x and 1-7x, instead build either the short ones or the long ones, but not both. If we just take care to use a compatible run_pipeline(), there's some cross-module type disagreement but everything works out in the end. Oddly, a few places that looked like they'd be faster using SkNx_fma() or Sk4f_round()/Sk8f_round() are actually faster the long way, e.g. multiply, add 0.5, truncate. Curious! In all the other places you see here that I've used SkNx_fma(), it's been a significant speedup. This folds in a couple refactors and cleanups that I've been meaning to do. Hope you don't mind... if find the new code considerably easier to read than the old code. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2990 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Change-Id: I1c82e5755d8e44cc0b9c6673d04b117f85d71a3a Reviewed-on: https://skia-review.googlesource.com/2990 Reviewed-by: Matt Sarett <msarett@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* Add an enum layer of indirection for stock raster pipeline stages.Gravatar Mike Klein2016-09-29
| | | | | | | | | | | | | | | | | | This is handy now, and becomes necessary with fancier backends: - most code can't speak the type of AVX pipeline stages, so indirection's definitely needed there; - if the pipleine is entirely composed of stock stages, these enum values become an abstract recipe that can be JITted. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2782 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Change-Id: Iedd62e99ce39e94cf3e6ffc78c428f0ccc182342 Reviewed-on: https://skia-review.googlesource.com/2782 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
* Start moving SkRasterPipeline stages to SkOpts.Gravatar Mike Klein2016-09-29
| | | | | | | | | | | | | | | | | | | | This lets them pick up runtime CPU specializations. Here I've plugged in SSE4.1. This is still one of the N prelude CLs to full 8-at-a-time AVX. I've moved the union of the stages used by SkRasterPipelineBench and SkRasterPipelineBlitter to SkOpts... they'll all be used by the blitter eventually. Picking up SSE4.1 specialization here (even still just 4 pixels at a time) is a significant speedup, especially to store_srgb(), so much that it's no longer really interesting to compare against the fused-but-default-instruction-set version in the bench. So that's gone now. That left the SkRasterPipeline unit test as the only other user of the EasyFn simplified interface to SkRasterPipeline. So I converted that back down to the bare-metal interface, and EasyFn and its friends became SkRasterPipeline_opts.h exclusive abbreviations (now called Kernel_Sk4f). This isn't really unexpected: SkXfermode also wanted to build up its own little abstractions, and once you build your own abstraction, the value of an additional EasyFn-like layer plummets to negative. For simplicity I've left the SkXfermode stages alone, except srcover() which was always part of the blitter. No particular reason except keeping the churn down while I hack. These _can_ be in SkOpts, but don't have to be until we go 8-at-a-time. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2752 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Change-Id: I3b476b18232a1598d8977e425be2150059ab71dc Reviewed-on: https://skia-review.googlesource.com/2752 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
* Use sse4.2 CRC32 instructions to hash when available.Gravatar mtklein2016-08-08
| | | | | | | | | | | | About 9x faster than Murmur3 for long inputs. Most of this is a mechanical change from SkChecksum::Murmur3(...) to SkOpts::hash(...). BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2208903002 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;master.client.skia.compile:Build-Ubuntu-GCC-x86_64-Release-CMake-Trybot,Build-Mac-Clang-x86_64-Release-CMake-Trybot Review-Url: https://codereview.chromium.org/2208903002
* Refactor of SkColorSpaceXformOptsGravatar msarett2016-08-02
| | | | | | | | | | | | | | | | | (1) Performance is better or stays the same. (2) Code is split into functions (RasterPipeline-ish design). IMO, it's not really more or less readable. But I think it's now much easier add capabilities, apply optimizations, or do more refactors. Or to actually use RasterPipeline. I help back from trying any of these to try to keep this CL sane. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2194303002 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2194303002
* Add color space xform support to SkJpegCodec (includes F16!)Gravatar msarett2016-07-29
| | | | | | | | | | | | | | | | | Also changes SkColorXform to support: RGBA->RGBA RGBA->BGRA Instead of: RGBA->SkPMColor TBR=reed@google.com BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2174493002 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Committed: https://skia.googlesource.com/skia/+/73d55332e2846dd05e9efdaa2f017bcc3872884b Review-Url: https://codereview.chromium.org/2174493002
* Revert of Add color space xform support to SkJpegCodec (includes F16!) ↵Gravatar msarett2016-07-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | (patchset #9 id:260001 of https://codereview.chromium.org/2174493002/ ) Reason for revert: Breaking MSAN Original issue's description: > Add color space xform support to SkJpegCodec (includes F16!) > > Also changes SkColorXform to support: > RGBA->RGBA > RGBA->BGRA > > Instead of: > RGBA->SkPMColor > > TBR=reed@google.com > BUG=skia: > GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2174493002 > CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot > > Committed: https://skia.googlesource.com/skia/+/73d55332e2846dd05e9efdaa2f017bcc3872884b TBR=mtklein@google.com,reed@google.com,herb@google.com,brianosman@google.com # Skipping CQ checks because original CL landed less than 1 days ago. NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia: Review-Url: https://codereview.chromium.org/2195523002
* Add color space xform support to SkJpegCodec (includes F16!)Gravatar msarett2016-07-28
| | | | | | | | | | | | | | | | Also changes SkColorXform to support: RGBA->RGBA RGBA->BGRA Instead of: RGBA->SkPMColor TBR=reed@google.com BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2174493002 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2174493002
* Miscellaneous color space refactorsGravatar msarett2016-07-21
| | | | | | | | | | | | | | | | (1) Use float matrix[16] everywhere (enables future code sharing). (2) SkColorLookUpTable refactors *** Store in a single allocation (like SkGammas) *** Eliminate fOutputChannels (we always require 3, and probably always will) (3) Change names of read_big_endian_* helpers BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2166093003 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2166093003
* Add capability for SkColorXform to output half floatsGravatar msarett2016-07-15
| | | | | | | | BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2147763002 CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2147763002
* Remove bulk float <-> half routines. These are dead code.Gravatar mtklein2016-07-13
| | | | | | | BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2152583002 Review-Url: https://codereview.chromium.org/2152583002
* Make all color xforms 'fast' (step 1)Gravatar msarett2016-07-11
| | | | | | | | | | | This refactors opt code to handle arbitrary src and dst gammas that are specified by tables. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2130013002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2130013002
* Support sRGB dsts in opt codeGravatar msarett2016-06-20
| | | | | | | | | | | | | | | 201295.jpg on HP z620 (300x280) QCMS Xform 0.418 ms Skia NEW Xform 0.378 ms Vs QCMS 1.11x BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2078623002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2078623002
* Implement fast, correct gamma conversion for color xformsGravatar msarett2016-06-16
| | | | | | | | | | | | | | | | | | | | | | 201295.jpg on HP z620 (300x280, most common form of sRGB profile) QCMS Xform 0.495 ms Skia Old Xform 0.235 ms Skia NEW Xform 0.423 ms Vs Old Code 0.56x Vs QCMS 1.17x So to summarize, we are now much slower than before, but still a bit faster than QCMS. And now we are also far more accurate than QCMS :). BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2060823003 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2060823003
* Optimize color xforms with 2.2 gammas for SSE2Gravatar msarett2016-06-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Because we recognize commonly used gamma tables and parameters as 2.2f, about 98% of jpegs with color profiles will pass through this xform (assuming the dst is also 2.2f). Sample size is 10,322 jpegs. I won't go crazy with performance numbers because this is a work in progress, particularly in terms of correctness. 201295.jpg on HP z620 (300x280, most common form of sRGB profile) Decode Time + QCMS Xform 1.28 ms QCMS Xform Only 0.495 ms Decode Time + Skia Opt Xform 1.01 ms Skia Opt Xform Only 0.235 ms Decode Time + Xform Speed-up 1.27x Xform Only Speed-up 2.11x FWIW, Skia xform time before these optimizations was 41.1 ms. But we expected that code to be slow. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2046013002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2046013002
* Add a hook for CPU-optimized sRGB-sRGB srcover.Gravatar mtklein2016-05-02
| | | | | | | | | | | | | | | | | | Herb's really starting to get serious about tweaking this, which becomes a lot easier when you've got SkOpts' runtime CPU detection. We should be able to optimize this usefully for SSSE3, SSE4.1, AVX, AVX2, or NEON. (We can of course implement a subset.) This function takes two counts to give us flexibility to write src patterns: nsrc >= ndst -> the usual srcover function nsrc < ndst -> repeat src until it fills dst nsrc << ndst -> possibly preprocess src into registers nsrc == 1 -> equivalent of blitrow_color32, srcover_1, etc. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1939783003 Review-Url: https://codereview.chromium.org/1939783003
* Remove static initializer for SkOpts::Init()Gravatar mtklein2016-04-19
| | | | | | | | | | | | | | | | | | | | | | | | | | Static initializers run in a confusing unspecified order, so it's best to have as few of them as possible. Most tools and clients I can find already call SkGraphics::Init(), (or equivalently create an SkAutoGraphics) which calls SkOpts::Init(): - Chrome - Chrome renderer - Android - DM - nanobench - SampleApp - VisualBench - the old debugger Seems like the only thing relying on this static initializer today is the new debugger, fixed here. TBR=reed@google.com BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1903503002 Review URL: https://codereview.chromium.org/1903503002
* Graduate matrix map-point procs out of SkOpts.Gravatar mtklein2016-04-14
| | | | | | | | | | | | | | | These are implemented generically with Sk4s and don't benefit from anything fancier than vanilla SSE/NEON. This means there's no need to hide this code away in another file or behind a function pointer... it's readable and we have compile-time support for all the instructions it needs. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1872193002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review URL: https://codereview.chromium.org/1872193002
* Port S32A_opaque blit row to SkOpts.Gravatar mtklein2016-03-23
| | | | | | | | | | This should be a pixel-for-pixel (i.e. bug-for-bug) port. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1820313002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review URL: https://codereview.chromium.org/1820313002
* try plain-old code for sk_memset16/32 now that NEON is compile-timeGravatar mtklein2016-02-17
| | | | | | | | | | | | | Most of these implementations now just say "always inline". Let's see if we can get away with the simplicity of doing that all the time. These inlined implementations can autovectorize easily. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1639863002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review URL: https://codereview.chromium.org/1639863002
* skeleton for float <-> half optimized procsGravatar mtklein2016-02-09
| | | | | | | | | | | | | | | | Nothing fancy yet, just calls the serial code in a loop. I will try to folow this up with at least some of: - SSE2 version of serial code - NEON version of serial code - NEON version using vcvt.f32.f16/vcvt.f16.f32 - F16C (between AVX and AVX2) version using vcvtph2ps/vcvtps2ph The last two are fastest but need runtime detection. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1686543003 Review URL: https://codereview.chromium.org/1686543003
* Optimize CMYK->RGBA (BGRA) transform for jpeg decodesGravatar msarett2016-02-08
| | | | | | | | | | | | | | | | Swizzle Bench Runtime Nexus 6P 0.14x Dell Venue 8 0.12x CMYK Jpeg Decode Runtime Nexus 6P 0.81x Dell Venue 8 0.85x BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1676773003 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review URL: https://codereview.chromium.org/1676773003
* NEON optimizations for GrayAlpha -> RGBA/BGRA Premul/UnpremulGravatar msarett2016-02-03
| | | | | | | | | | | | | | PNG Decode Time Nexus 6P (for a test set of GrayAlpha encoded PNGs) Regular Unpremul 0.91x Zero Init Unpremul 0.92x Regular Premul 0.84x Zero Init Premul 0.86x BUG=skia:4767 GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1663623002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review URL: https://codereview.chromium.org/1663623002
* NEON optimizations for gray -> RGBA (or BGRA) conversionsGravatar msarett2016-02-02
| | | | | | | | | | | | | | | | Swizzle Bench Runtime Nexus 6P 0.32x Nexus 9 0.89x PNG Decode Time (for test set of gray encoded PNGs) Nexus 6P 0.88x Nexus 9 0.91x BUG=skia:4767 GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1656383002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review URL: https://codereview.chromium.org/1656383002
* de-proc sk_float_rsqrtGravatar mtklein2016-01-26
| | | | | | | | | | | | This is the first of many little baby steps to have us stop runtime-detecting NEON. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1616013003 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Committed: https://skia.googlesource.com/skia/+/efcc125acd2d71eb077caf6db65fdd6b9eb1dc0d Review URL: https://codereview.chromium.org/1616013003
* Revert of de-proc sk_float_rsqrt (patchset #3 id:40001 of ↵Gravatar mtklein2016-01-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | https://codereview.chromium.org/1616013003/ ) Reason for revert: This is somehow blocking the Google3 roll in ways neither Ben nor I understand. Precautionary revert... will try again Monday. Original issue's description: > de-proc sk_float_rsqrt > > This is the first of many little baby steps to have us stop runtime-detecting NEON. > > BUG=skia: > GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1616013003 > CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot > > Committed: https://skia.googlesource.com/skia/+/efcc125acd2d71eb077caf6db65fdd6b9eb1dc0d TBR=reed@google.com,mtklein@chromium.org # Skipping CQ checks because original CL landed less than 1 days ago. NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia: Review URL: https://codereview.chromium.org/1629503002
* Use NEON optimizations for RGB -> RGB(FF) or BGR(FF) in SkSwizzlerGravatar msarett2016-01-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Swizzle Bench Runtime Nexus 6P xxx_xxxa 0.32x xxx_swaprb_xxxa 0.31x Swizzle Bench Runtime Nexus 9 xxx_xxxa 1.11x xxx_swaprb_xxxa 1.14x (This is a slow down.) Swizzle Bench Runtime Nexus 5 xxx_xxxa 0.12x xxx_swaprb 0.12x RGB PNG Decode Runtime Nexus 6P 0.94x Nexus 9 0.98x I don't know how to explain the fact that the Swizzle Bench was slower on Nexus 9, but the decode times got faster. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1618003002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review URL: https://codereview.chromium.org/1618003002
* de-proc sk_float_rsqrtGravatar mtklein2016-01-22
| | | | | | | | | | This is the first of many little baby steps to have us stop runtime-detecting NEON. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1616013003 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review URL: https://codereview.chromium.org/1616013003
* Refactor swizzle names and types.Gravatar mtklein2016-01-22
| | | | | | | | | | | | | | | | | | | - Plant a flag to say "pretend all the inputs are RGBA". This is how libpng thinks. This is the opposite of what the implementation had been doing, so I've rearranged everything to reflect the new orientation. - Rewrite the names to be less mysterious looking. No more Xs. - Make the src type uniformly const void*, to allow for 888 (RGB) srcs. This should be performance and pixel neutral. (Please revert if it's not.) BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1626463002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review URL: https://codereview.chromium.org/1626463002
* Set up some hooks for premul/swizzzle opts.Gravatar mtklein2016-01-07
| | | | | | | | | | | | | You can call these as SkOpts::premul_xxxa, SkOpts::swaprb_xxxa, etc. For now, I just backed the function pointers with some (untested) portable code, which may autovectorize. We can override with optimized versions in Init_ssse3() (in SkOpts_ssse3.cpp), Init_neon() (SkOpts_neon.cpp), etc. BUG=skia:4767 GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1569013002 Review URL: https://codereview.chromium.org/1569013002
* Make SkBlurImageFilter capable of cropping during blur (raster path)Gravatar senorblanco2015-11-02
| | | | | | | | | | | | | | | | | | SkBlurImageFilter can currently only process a source image which is larger than or equal to the destination rect. If the source image (or crop rect) is smaller, it is padded out to dest size with transparent black via the 6-param version of applyCropRect(). Fixing this requires modifying all the flavours of RGBA box_blur() to accept a src crop rect. BUG=skia:4502, skia:4526 CQ_EXTRA_TRYBOTS=client.skia.android:Test-Android-GCC-Nexus5-CPU-NEON-Arm7-Release-Trybot;client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Committed: https://skia.googlesource.com/skia/+/1b82ceb737c73327412f2e8a91748481e1aec9e4 Review URL: https://codereview.chromium.org/1415653003
* Revert of Make SkBlurImageFilter capable of cropping during blur (patchset ↵Gravatar senorblanco2015-11-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | #16 id:400001 of https://codereview.chromium.org/1415653003/ ) Reason for revert: ASAN failures (see https://codereview.chromium.org/1415653003/) Original issue's description: > Make SkBlurImageFilter capable of cropping during blur (raster path) > > SkBlurImageFilter can currently only process a source image > which is larger than or equal to the destination rect. If > the source image (or crop rect) is smaller, it is padded > out to dest size with transparent black via the 6-param > version of applyCropRect(). > > Fixing this requires modifying all the flavours of RGBA > box_blur() to accept a src crop rect. > > BUG=skia:4502, skia:4526 > CQ_EXTRA_TRYBOTS=client.skia.android:Test-Android-GCC-Nexus5-CPU-NEON-Arm7-Release-Trybot;client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot > > Committed: https://skia.googlesource.com/skia/+/1b82ceb737c73327412f2e8a91748481e1aec9e4 TBR=mtklein@google.com,reed@google.com NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia:4502, skia:4526 Review URL: https://codereview.chromium.org/1428053002
* Make SkBlurImageFilter capable of cropping during blur (raster path)Gravatar senorblanco2015-11-02
| | | | | | | | | | | | | | | | SkBlurImageFilter can currently only process a source image which is larger than or equal to the destination rect. If the source image (or crop rect) is smaller, it is padded out to dest size with transparent black via the 6-param version of applyCropRect(). Fixing this requires modifying all the flavours of RGBA box_blur() to accept a src crop rect. BUG=skia:4502, skia:4526 CQ_EXTRA_TRYBOTS=client.skia.android:Test-Android-GCC-Nexus5-CPU-NEON-Arm7-Release-Trybot;client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review URL: https://codereview.chromium.org/1415653003
* Port SkMatrix opts to SkOpts.Gravatar mtklein2015-09-10
| | | | | | | | | | | | | | | | | | | | | | No changes to the code, just moved around. This will have the effect of enabling vectorized code on ARMv7. Should be no effect on ARMv8 or x86, which would have been vectorized already. nanobench --match mappoints changes on Nexus 5 (ARMv7): _affine: 132 -> 95 _scale: 118 -> 47 _trans: 60 -> 37 A teaser: We should next look at the ABCD->BADC shuffle we've noted that we need in _affine. A quick hack showed doing that optimally is another ~35% speedup on x86. Got to figure out how to do it best on ARM though: that same quick hack was a 2x slowdown there. Good reason to resurrect that SkNx_shuffle() CL! (I believe the answers are vrev64q_f32(v) and _mm_shuffle_ps(v,v, _MM_SHUFFLE(2,3,0,1), but we should probably find out in another CL.) BUG=skia:4117 Review URL: https://codereview.chromium.org/1320673014