aboutsummaryrefslogtreecommitdiffhomepage
path: root/src/jumper/SkJumper.cpp
Commit message (Collapse)AuthorAge
* initial SkSLJIT checkinGravatar Ethan Nicholas2018-03-27
| | | | | | | | | Docs-Preview: https://skia.org/?cl=112204 Bug: skia: Change-Id: I10042a0200db00bd8ff8078467c409b1cf191f50 Reviewed-on: https://skia-review.googlesource.com/112204 Commit-Queue: Ethan Nicholas <ethannicholas@google.com> Reviewed-by: Mike Klein <mtklein@chromium.org>
* Reland "Reland "make SkJumper stages normal Skia code""Gravatar Mike Klein2018-03-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a reland of 78cb579f33943421afc8423a39867fcfd69fed44 This time, lowp stages are controlled by !defined(JUMPER_IS_SCALAR), not by defined(__clang__). The two are usually the same, except when we opt Clang builds into JUMPER_IS_SCALAR artificially. Some Google3 builds use compilers old enough that they barf when compiling our NEON code. It's conceivably also possible to define JUMPER_IS_SCALAR yourself, but I don't think anyone does that. Original change's description: > Reland "make SkJumper stages normal Skia code" > > This is a reland of 22e536e3a1a09405d1c0e6f071717a726d86e8d4 > > Now with fixed #include paths in SkRasterPipeline_opts.h, > and -ffp-contract=fast for the :hsw target to minimize > diffs on non-Windows Clang AVX2/AVX-512 bots. > > Original change's description: > > make SkJumper stages normal Skia code > > > > Enough clients are using Clang now that we can say, use Clang to build > > if you want these software pipeline stages to go fast. > > > > This lets us drop the offline build aspect of SkJumper stages, instead > > building as part of Skia using the SkOpts framework. > > > > I think everything should work, except I've (temporarily) removed > > AVX-512 support. I will put this back in a follow up. > > > > I have had to drop Windows down to __vectorcall and our narrower > > stage calling convention that keeps the d-registers on the stack. > > I tried forcing sysv_abi, but that crashed Clang. :/ > > > > Added a TODO to up the same narrower stage calling convention > > for lowp stages... we just *don't* today, for no good reason. > > > > Change-Id: Iaaa792ffe4deab3508d2dc5d0008c163c24b3383 > > Reviewed-on: https://skia-review.googlesource.com/110641 > > Commit-Queue: Mike Klein <mtklein@chromium.org> > > Reviewed-by: Herb Derby <herb@google.com> > > Reviewed-by: Florin Malita <fmalita@chromium.org> > > Change-Id: I44f2c03d33958e3807747e40904b6351957dd448 > Reviewed-on: https://skia-review.googlesource.com/112742 > Reviewed-by: Mike Klein <mtklein@chromium.org> Change-Id: I3d71197d4bbb19ca4a94961a97fa2e54d5cbfb0d Reviewed-on: https://skia-review.googlesource.com/112744 Reviewed-by: Mike Klein <mtklein@google.com> Commit-Queue: Mike Klein <mtklein@google.com>
* Revert "Reland "make SkJumper stages normal Skia code""Gravatar Mike Klein2018-03-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 78cb579f33943421afc8423a39867fcfd69fed44. Reason for revert: lowp should be controlled by defined(JUMPER_IS_SCALAR), not defined(__clang__). So close. Original change's description: > Reland "make SkJumper stages normal Skia code" > > This is a reland of 22e536e3a1a09405d1c0e6f071717a726d86e8d4 > > Now with fixed #include paths in SkRasterPipeline_opts.h, > and -ffp-contract=fast for the :hsw target to minimize > diffs on non-Windows Clang AVX2/AVX-512 bots. > > Original change's description: > > make SkJumper stages normal Skia code > > > > Enough clients are using Clang now that we can say, use Clang to build > > if you want these software pipeline stages to go fast. > > > > This lets us drop the offline build aspect of SkJumper stages, instead > > building as part of Skia using the SkOpts framework. > > > > I think everything should work, except I've (temporarily) removed > > AVX-512 support. I will put this back in a follow up. > > > > I have had to drop Windows down to __vectorcall and our narrower > > stage calling convention that keeps the d-registers on the stack. > > I tried forcing sysv_abi, but that crashed Clang. :/ > > > > Added a TODO to up the same narrower stage calling convention > > for lowp stages... we just *don't* today, for no good reason. > > > > Change-Id: Iaaa792ffe4deab3508d2dc5d0008c163c24b3383 > > Reviewed-on: https://skia-review.googlesource.com/110641 > > Commit-Queue: Mike Klein <mtklein@chromium.org> > > Reviewed-by: Herb Derby <herb@google.com> > > Reviewed-by: Florin Malita <fmalita@chromium.org> > > Change-Id: I44f2c03d33958e3807747e40904b6351957dd448 > Reviewed-on: https://skia-review.googlesource.com/112742 > Reviewed-by: Mike Klein <mtklein@chromium.org> TBR=mtklein@chromium.org,herb@google.com,fmalita@chromium.org Change-Id: Ie64da98f5187d44e03c0ce05d7cb189d4a6e6663 No-Presubmit: true No-Tree-Checks: true No-Try: true Reviewed-on: https://skia-review.googlesource.com/112743 Reviewed-by: Mike Klein <mtklein@google.com> Commit-Queue: Mike Klein <mtklein@google.com>
* Reland "make SkJumper stages normal Skia code"Gravatar Mike Klein2018-03-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a reland of 22e536e3a1a09405d1c0e6f071717a726d86e8d4 Now with fixed #include paths in SkRasterPipeline_opts.h, and -ffp-contract=fast for the :hsw target to minimize diffs on non-Windows Clang AVX2/AVX-512 bots. Original change's description: > make SkJumper stages normal Skia code > > Enough clients are using Clang now that we can say, use Clang to build > if you want these software pipeline stages to go fast. > > This lets us drop the offline build aspect of SkJumper stages, instead > building as part of Skia using the SkOpts framework. > > I think everything should work, except I've (temporarily) removed > AVX-512 support. I will put this back in a follow up. > > I have had to drop Windows down to __vectorcall and our narrower > stage calling convention that keeps the d-registers on the stack. > I tried forcing sysv_abi, but that crashed Clang. :/ > > Added a TODO to up the same narrower stage calling convention > for lowp stages... we just *don't* today, for no good reason. > > Change-Id: Iaaa792ffe4deab3508d2dc5d0008c163c24b3383 > Reviewed-on: https://skia-review.googlesource.com/110641 > Commit-Queue: Mike Klein <mtklein@chromium.org> > Reviewed-by: Herb Derby <herb@google.com> > Reviewed-by: Florin Malita <fmalita@chromium.org> Change-Id: I44f2c03d33958e3807747e40904b6351957dd448 Reviewed-on: https://skia-review.googlesource.com/112742 Reviewed-by: Mike Klein <mtklein@chromium.org>
* Revert "make SkJumper stages normal Skia code"Gravatar Mike Klein2018-03-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 22e536e3a1a09405d1c0e6f071717a726d86e8d4. Reason for revert: wrong include path :/ Original change's description: > make SkJumper stages normal Skia code > > Enough clients are using Clang now that we can say, use Clang to build > if you want these software pipeline stages to go fast. > > This lets us drop the offline build aspect of SkJumper stages, instead > building as part of Skia using the SkOpts framework. > > I think everything should work, except I've (temporarily) removed > AVX-512 support. I will put this back in a follow up. > > I have had to drop Windows down to __vectorcall and our narrower > stage calling convention that keeps the d-registers on the stack. > I tried forcing sysv_abi, but that crashed Clang. :/ > > Added a TODO to up the same narrower stage calling convention > for lowp stages... we just *don't* today, for no good reason. > > Change-Id: Iaaa792ffe4deab3508d2dc5d0008c163c24b3383 > Reviewed-on: https://skia-review.googlesource.com/110641 > Commit-Queue: Mike Klein <mtklein@chromium.org> > Reviewed-by: Herb Derby <herb@google.com> > Reviewed-by: Florin Malita <fmalita@chromium.org> TBR=mtklein@chromium.org,herb@google.com,fmalita@chromium.org Change-Id: I2bdc709c80cdfa6b13ff24e024b3721bef887f46 No-Presubmit: true No-Tree-Checks: true No-Try: true Reviewed-on: https://skia-review.googlesource.com/112741 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
* make SkJumper stages normal Skia codeGravatar Mike Klein2018-03-07
| | | | | | | | | | | | | | | | | | | | | | | | Enough clients are using Clang now that we can say, use Clang to build if you want these software pipeline stages to go fast. This lets us drop the offline build aspect of SkJumper stages, instead building as part of Skia using the SkOpts framework. I think everything should work, except I've (temporarily) removed AVX-512 support. I will put this back in a follow up. I have had to drop Windows down to __vectorcall and our narrower stage calling convention that keeps the d-registers on the stack. I tried forcing sysv_abi, but that crashed Clang. :/ Added a TODO to up the same narrower stage calling convention for lowp stages... we just *don't* today, for no good reason. Change-Id: Iaaa792ffe4deab3508d2dc5d0008c163c24b3383 Reviewed-on: https://skia-review.googlesource.com/110641 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Herb Derby <herb@google.com> Reviewed-by: Florin Malita <fmalita@chromium.org>
* lowp impl for decal stagesGravatar Mike Reed2018-02-21
| | | | | | | | | Bug: skia: Change-Id: If6481d202bf22a95f1dea0c5bf7d84698b63869a Reviewed-on: https://skia-review.googlesource.com/109241 Commit-Queue: Mike Reed <reed@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Mike Klein <mtklein@chromium.org>
* add decal tilemode to shadersGravatar Mike Reed2018-02-16
| | | | | | | | | | | | | Plenty more to follow-up: - gradients - gpu impl Bug: skia:7638 Change-Id: I8e54fd0e24921f040f178c793b36c7fb855b136e Reviewed-on: https://skia-review.googlesource.com/107420 Commit-Queue: Mike Reed <reed@google.com> Reviewed-by: Florin Malita <fmalita@chromium.org> Reviewed-by: Mike Klein <mtklein@chromium.org>
* 1010102, 101010x, 888x in swGravatar Mike Klein2018-01-30
| | | | | | | | | | | | | | | | | | | Same sort of deal as before, now with all three new formats. While I was at it, I made sure RGBA 8888 and BGRA 8888 both work too. We don't want the 101010's in lowp, but 888x should be fine. After looking at the DM images on monitors at work, I decided to re-enable dither even on 10-bit images. Looking at the GMs in 888x or 101010x is interesting... I think we must not be clearing the memory allocated for layers? Seems like we want to allocate layers as 8888? Change-Id: I3a85b4f00877792a6425a7e7eb31eacb04ae9218 Reviewed-on: https://skia-review.googlesource.com/101640 Reviewed-by: Mike Reed <reed@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* Remove legacy 2pt conical gradientGravatar Yuqian Li2018-01-16
| | | | | | | | Bug: skia:7459 Change-Id: Iccc2588f80e22b13ed5d23656b8c75d7b7058a36 Reviewed-on: https://skia-review.googlesource.com/92700 Reviewed-by: Florin Malita <fmalita@chromium.org> Commit-Queue: Yuqian Li <liyuqian@google.com>
* Update 2pt conical gradient in raster pipelineGravatar Yuqian Li2018-01-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | The updated algorithm matches our new GPU algorithm (https://skia.org/dev/design/conical) and it brings about 7%-26% speedup. In the next CL, I'll simplify the GPU code by reusing the CPU code in this CL. 7.20% faster in gradient_conical_clamp_hicolor 8.94% faster in gradient_conicalZero_clamp_hicolor 10.00% faster in gradient_conicalOut_clamp_hicolor 11.72% faster in gradient_conicalOutZero_clamp_hicolor 13.62% faster in gradient_conical_clamp_3color 16.52% faster in gradient_conicalZero_clamp_3color 17.48% faster in gradient_conical_clamp 17.70% faster in gradient_conical_clamp_shallow 20.60% faster in gradient_conicalOut_clamp_3color 20.98% faster in gradient_conicalOutZero_clamp_3color 21.79% faster in gradient_conicalZero_clamp 22.48% faster in gradient_conicalOut_clamp 26.13% faster in gradient_conicalOutZero_clamp Bug: skia: Change-Id: Ia159495e1c77658cb28e48c9edf84938464e501c Reviewed-on: https://skia-review.googlesource.com/90262 Commit-Queue: Yuqian Li <liyuqian@google.com> Reviewed-by: Mike Klein <mtklein@chromium.org>
* attempt 3: add experimental bilerp_clamp_8888 stageGravatar Mike Klein2017-12-22
| | | | | | | | | | | | | | | | | | | | | | It looks like we can specialize hot image shaders into their own single stages for a good speedup on both x86 and ARM. I've started here with bilerp_clamp_8888, and will follow up with bgra and 565, and lowp versions of those, and probably also the same for nearest neighbors. All pixels are identical in GMs. This time, rewrite the loop over sample points to be a little friendlier to 32-bit x86 code generation. The previous version created an object file indirection feature build_stages.py can't handle. CQ_INCLUDE_TRYBOTS=skia.primary:Test-Android-Clang-NexusPlayer-CPU-Moorefield-x86-Release-All-Android,Test-Android-Clang-NexusPlayer-GPU-PowerVR-x86-Release-All-Android Change-Id: I150b6af4a5b89e009dc04ca69e1857892e173deb Reviewed-on: https://skia-review.googlesource.com/89180 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
* Revert "attempt 2: add experimental bilerp_clamp_8888 stage"Gravatar Mike Klein2017-12-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 8a64e52a98d178be13fd137b3b3a3c6aff457d85. Reason for revert: Test-Android-Clang-NexusPlayer-CPU-Moorefield-x86-Release-All-Android Test-Android-Clang-NexusPlayer-GPU-PowerVR-x86-Release-All-Android Original change's description: > attempt 2: add experimental bilerp_clamp_8888 stage > > It looks like we can specialize hot image shaders into their > own single stages for a good speedup on both x86 and ARM. > > I've started here with bilerp_clamp_8888, and will > follow up with bgra and 565, and lowp versions of those, > and probably also the same for nearest neighbors. > > All pixels are identical in GMs. > > Change-Id: Ib5ed6e528efd9e3eed96ba67d02fbec2e8133a81 > Reviewed-on: https://skia-review.googlesource.com/86860 > Reviewed-by: Mike Klein <mtklein@chromium.org> > Commit-Queue: Mike Klein <mtklein@chromium.org> TBR=mtklein@chromium.org,liyuqian@google.com Change-Id: I34409a7b4aee4fd54baee44f7fc53bd0982500fe No-Presubmit: true No-Tree-Checks: true No-Try: true Reviewed-on: https://skia-review.googlesource.com/86601 Reviewed-by: Mike Klein <mtklein@google.com> Commit-Queue: Mike Klein <mtklein@google.com>
* attempt 2: add experimental bilerp_clamp_8888 stageGravatar Mike Klein2017-12-18
| | | | | | | | | | | | | | | | It looks like we can specialize hot image shaders into their own single stages for a good speedup on both x86 and ARM. I've started here with bilerp_clamp_8888, and will follow up with bgra and 565, and lowp versions of those, and probably also the same for nearest neighbors. All pixels are identical in GMs. Change-Id: Ib5ed6e528efd9e3eed96ba67d02fbec2e8133a81 Reviewed-on: https://skia-review.googlesource.com/86860 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
* make SkColorSpace_New realGravatar Mike Klein2017-12-14
| | | | | | | | | | | | | | | | | | | Some interesting things are starting to fall out already, like the fact that I needed to add a gamma_dst stage to be able to draw into gamma-transfer-fn destinations. I've also had to pass an SkAlphaType through to the linearize functions so that they can maintain premul invariants. I'm not sure this is actually a good idea... if you can, please double- check my logic at SkRasterPipeline.cpp:128? If it's correct logic, I'm going to need to do it all over the place. But I imagine you don't do this and somehow get away with it. Change-Id: I42cd9b161b54287d674225103ad9e19f8b388959 Reviewed-on: https://skia-review.googlesource.com/84680 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Brian Osman <brianosman@google.com>
* remove vfpv4 requirement for SkJumper on ARMv7Gravatar Mike Klein2017-12-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | VFPv4 gives us two interesting features: - FMA - f16<->f32 conversions Even without FMAs, NEON still has non-fused MLA instructions. We don't really care about the fusedness of those mul-adds, so losing FMA here is kind of no big deal. We already maintain portable code to do f16<->f32 conversions, so it's not much of a maintanence hit to use that instead of the native instructions. To my knowledge software F16 rendering is not a performance critical mode of operation for any of our users. This drops our minimum requirement to basically just having NEON. Devices like the Nexus 7 2012 will now take SkJumper fast paths instead of portable code. (Though actually, we've only ever required NEON for _lowp... only the float code also needed vfpv4). The main file to look at here is actually SkJumper_vectors.h, where you will see all the substantive changes. The rest just kind of tears down most of the old complexity, add adds ABI to put just a little of it back. :) Change-Id: Ia9237117698729c91e5fa51126baf80748093bf4 Bug: skia: Reviewed-on: https://skia-review.googlesource.com/83521 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Florin Malita <fmalita@chromium.org>
* Use first/second instead of min/max in 2pt conical gradientGravatar Yuqian Li2017-12-11
| | | | | | | | | | | | | | | | | | | | | | | | | Here's the tiny performance gain: $python tools/calmbench/calmbench.py firstsecond --extraarg "-m conic" firstsecond (compared to master) is likely 4.23% faster in gradient_conicalOut_clamp_3color 4.23% faster in gradient_conicalOutZero_clamp_3color 4.79% faster in gradient_conical_clamp_shallow_dither 6.04% faster in gradient_conical_clamp_3color 6.04% faster in gradient_conicalZero_clamp_3color 6.42% faster in gradient_conicalOut_clamp 6.43% faster in gradient_conicalOutZero_clamp 6.74% faster in gradient_conical_clamp 6.98% faster in gradient_conical_clamp_shallow 6.98% faster in gradient_conicalZero_clamp Bug: skia: Change-Id: Id74866908b99753ed8b16a657d3f67c9255d0043 Reviewed-on: https://skia-review.googlesource.com/76561 Commit-Queue: Yuqian Li <liyuqian@google.com> Reviewed-by: Mike Klein <mtklein@chromium.org> Reviewed-by: Florin Malita <fmalita@chromium.org>
* Revert "add experimental bilerp_clamp_8888 stage"Gravatar Mike Klein2017-12-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit a7fa3377d24643d86117159f8a58d2ee66880a4d. Reason for revert: lots of crashing GPU bots. Original change's description: > add experimental bilerp_clamp_8888 stage > > It looks like we can specialize hot image shaders into their > own single stages for a good speedup on both x86 and ARM. > > I've started here with bilerp_clamp_8888, and will > follow up with bgra and 565, and lowp versions of those, > and probably also the same for nearest neighbors. > > All pixels are identical in GMs. > > Change-Id: I2f6995767cd38053d670b8d0bfdb71b687803d70 > Reviewed-on: https://skia-review.googlesource.com/82100 > Reviewed-by: Yuqian Li <liyuqian@google.com> > Commit-Queue: Mike Klein <mtklein@chromium.org> TBR=mtklein@chromium.org,mtklein@google.com,liyuqian@google.com Change-Id: If70abb91b69bcd781e395dd3ac05ff1eebb1169f No-Presubmit: true No-Tree-Checks: true No-Try: true Reviewed-on: https://skia-review.googlesource.com/83340 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
* add experimental bilerp_clamp_8888 stageGravatar Mike Klein2017-12-11
| | | | | | | | | | | | | | | | It looks like we can specialize hot image shaders into their own single stages for a good speedup on both x86 and ARM. I've started here with bilerp_clamp_8888, and will follow up with bgra and 565, and lowp versions of those, and probably also the same for nearest neighbors. All pixels are identical in GMs. Change-Id: I2f6995767cd38053d670b8d0bfdb71b687803d70 Reviewed-on: https://skia-review.googlesource.com/82100 Reviewed-by: Yuqian Li <liyuqian@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* turn back on NEON lowp code...Gravatar Mike Klein2017-12-07
| | | | | | | | | | | | Turns out I was defining JUMPER_HAS_NEON_LOWP, but checking JUMPER_NEON_HAS_LOWP. 🤦 Change-Id: Ib328190ce35a367bf3d08d8e66f0ab8791ccb8b2 Reviewed-on: https://skia-review.googlesource.com/82320 Reviewed-by: Brian Osman <brianosman@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* add some lowp gradient stagesGravatar Mike Klein2017-11-03
| | | | | | | | | | | | | | I was originally going to add these to help test a lowp dither, but after looking at diffs I don't think lowp dither is a good idea. Non-dithered lowp gradients look fine to me so far. I'd have done conics, but they scare me. Change-Id: I8f5e75aec726983186214845ca38cfa0d54496b3 Reviewed-on: https://skia-review.googlesource.com/66460 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Florin Malita <fmalita@chromium.org>
* another batch of lowp stagesGravatar Mike Klein2017-11-01
| | | | | | | | | | The 4444 image in all_bitmap_configs now draws slightly different before and after serialization. (It's serialized as 8888.) Still looks fine. Change-Id: I1396cf1550b6769a1734ed25d59bd5b1866dfacd Reviewed-on: https://skia-review.googlesource.com/65960 Reviewed-by: Florin Malita <fmalita@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
* Some lowp refactoringGravatar Mike Klein2017-10-31
| | | | | | | | | | | | | | | | | | 1) Move a couple stages around in the enum to places that make more sense, and guass_a_to_rbga in the code too. 2) mirror the SkRasterPipeline stage enum with either: LOWP(st): the stage is implemented in low precision TODO(st): the stage should be lowp, but isn't NOPE(st): the stage shouldn't be done in lowp. 3) statically enforce that all stages are covered by one of LOWP, TODO, or NOPE. Change-Id: I06c7a7e470663ef73bf652c1b65c0d3c89f0d767 Reviewed-on: https://skia-review.googlesource.com/63800 Reviewed-by: Florin Malita <fmalita@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
* clean up SK_LEGACY_LOWP_STAGESGravatar Mike Klein2017-10-31
| | | | | | | Change-Id: I5629e74c4c13ddb9217fd3c2df3388030fa03f0c Reviewed-on: https://skia-review.googlesource.com/63780 Reviewed-by: Florin Malita <fmalita@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
* add srcover_bgra_8888Gravatar Mike Klein2017-10-24
| | | | | | | | | | | | | | Chrome generally uses BGRA buffers, so srcover_rgba_8888 isn't really doing them any good. Probably a good idea to cover both kN32 options any time we specialize like this? There's one small diff, so I've lazily guarded this by SK_LEGACY_LOWP_STAGES, which I want to rebaseline today anyway. Change-Id: Ice672aa01a3fc83be0798580d6730a54df075478 Reviewed-on: https://skia-review.googlesource.com/63301 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Mike Reed <reed@google.com>
* more easy lowp shader stagesGravatar Mike Klein2017-10-24
| | | | | | | | | | | | | | This fills out a couple more matrix and gather stages. Deletes a not particularly important unit test that was using a scale matrix in a weird, non-lowp compatible way. This will require guards for Blink layout tests. Change-Id: I54cb228ff541f771e8f4758f07d26c5161d48af3 Reviewed-on: https://skia-review.googlesource.com/62520 Reviewed-by: Mike Reed <reed@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* make enabling LOWP stages simplerGravatar Mike Klein2017-10-23
| | | | | | | | | | | | | | | | This method is a little simpler macro-wise, and makes it easier to guard new lowp stages: LOWP(foo) LOWP(bar) #ifndef SK_LEGACY_LOWP_BAZ LOWP(baz) #endif Change-Id: I06392f5cf7a04651e7bf47e79f10f7da8520f5ab Reviewed-on: https://skia-review.googlesource.com/63141 Reviewed-by: Florin Malita <fmalita@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
* start on lowp shadersGravatar Mike Klein2017-10-20
| | | | | | | | | | | | | | | | | | | | | | | | | We're going to want to assign types to the stages depending on their inputs and outputs: GG: x,y -> x,y GP: x,y -> r,g,b,a PP: r,g,b,a -> r,g,b,a (There are a couple other degenerate cases here, where a stage ignores its inputs or creates no outputs, but we can always just pretend their null input or output is one type or the other arbitrarily.) The GG stages will be pretty much entirely float code, and the GP stages a mix of float math and byte stuff. Since we've chosen U16 to match our register size in _lowp land, we'll unpack each F register across two of those for transport between stages. This is a notional, free operation in both directions. Change-Id: I605311d0dc327a1a3a9d688173d9498c1658e715 Reviewed-on: https://skia-review.googlesource.com/60800 Reviewed-by: Herb Derby <herb@google.com> Reviewed-by: Florin Malita <fmalita@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
* reformat hard-to-read preprocessor in SkJumper.cppGravatar Mike Klein2017-10-05
| | | | | | | Change-Id: I9a140e342e7b12b1cbb09503ca8fc03016717784 Reviewed-on: https://skia-review.googlesource.com/55701 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
* add _skx stagesGravatar Mike Klein2017-10-02
| | | | | | | | | | | | | | | | | | | | | | | | | | This just makes sure all the plumbing is in place to use the Skylake Xeon subset of AVX-512 instructions. So far, - no Windows - no lowp - nothing explicitly making use of AVX-512 registers or instructions This initial pass should run essentially identically to the _hsw AVX2 code we've been using previously. Clang _does_ use AVX-512-only instructions to implement some of the higher-level concepts we've coded, but it's really a pretty subtle difference. Next steps will bump N from 8 to 16 and start threading through an AVX-512-friendly mask instead of tail. I'll also want to take a harder look at how we do blending like if_then_else()... the default codegen here doesn't really take advantage of AVX-512 the way I'd like here. CQ_INCLUDE_TRYBOTS=skia.primary:Test-Debian9-Clang-GCE-CPU-AVX512-x86_64-Debug Change-Id: I6c9442488a449ea4770617bb22b2669859cc92e2 Reviewed-on: https://skia-review.googlesource.com/54062 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Herb Derby <herb@google.com>
* count total non-lowp runsGravatar Mike Klein2017-09-21
| | | | | | | Change-Id: I2e24c990983ea93cbd7983c9c4e88120c2b7f358 Reviewed-on: https://skia-review.googlesource.com/49768 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
* Implement some easy _lowp stages.Gravatar Mike Klein2017-09-16
| | | | | | | | | | | | | - load_565 allows 565-src sprite blits - scale_565 / lerp_565 allow subpixel text - luminance_to_alpha is a color filter, and lets us write grey 8 And update CachedDecodingPixelRefTest with a yet more robust color. Change-Id: I8af499c43f0f28093744d9c2993af553e36c9526 Reviewed-on: https://skia-review.googlesource.com/47021 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Mike Reed <reed@google.com>
* Disable SkJumper assembly in cross builds for now.Gravatar Nico Weber2017-09-16
| | | | | | | | Bug: chromium:762167 Change-Id: Ia23f6dbfc0466aef4ca9d1a5b9ff343d79dc83bb Reviewed-on: https://skia-review.googlesource.com/47460 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
* make most of SkColorPriv.h privateGravatar Cary Clark2017-09-15
| | | | | | | | | | | | | created new file src/core/SkColorData.h for internal consumption. Note that many of the functions there are unused as well. Bug: skia: 6898 R: reed@google.com Change-Id: I25bfd5a9c21f53558c4ca65a77eb5d322d897c6d Reviewed-on: https://skia-review.googlesource.com/46848 Commit-Queue: Cary Clark <caryclark@google.com> Reviewed-by: Mike Reed <reed@google.com>
* clean up SK_JUMPER_LEGACY_8BITGravatar Mike Klein2017-09-15
| | | | | | | Change-Id: I4d4093fcfc839f6e7468b7d9f89bb903186ab68d Reviewed-on: https://skia-review.googlesource.com/46761 Reviewed-by: Florin Malita <fmalita@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
* grand unifried lowp stagesGravatar Mike Klein2017-09-14
| | | | | | | | | | | | | | | | | | | | | | | | | I have text_16_AA_FF -> 8888 (forcing RP) faster than head now on my laptop. I'm feeling confident that we can make this perform well. After looking at performance a bit more today, it looks like everything is within what I'd consider comparable in performance, especially on ARM. On x86-64 it looks like big bulk blits get a little slower and small mask blits get a little faster. Quality looks good, and maybe improved for 565. There are fewer platform-specific differences now in _lowp, and I think they're few enough now that we could even consider completing the unification by folding the 8-bit and float code together. Rename "div255()" to "rebias()", slap on a few coats of paint... Guarded for Chrome with SK_JUMPER_LEGACY_LOWP. Change-Id: I36309c07cf736f3cb31952cca66030ad56026318 Reviewed-on: https://skia-review.googlesource.com/45982 Reviewed-by: Herb Derby <herb@google.com> Reviewed-by: Florin Malita <fmalita@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
* clean up SK_JUMPER_LEGACY_X86_8BITGravatar Mike Klein2017-09-06
| | | | | | | Change-Id: I26c22c085efd70b65de927a9a8a041d03c170f2a Reviewed-on: https://skia-review.googlesource.com/42760 Commit-Queue: Florin Malita <fmalita@chromium.org> Reviewed-by: Florin Malita <fmalita@chromium.org>
* add 8bit stages for load/store 565Gravatar Mike Reed2017-08-31
| | | | | | | | | | | approx 2.5x faster on arm64 for sprite 8888 --> 565 blits Bug: skia: Change-Id: I524f993fee16196385dc07cbec39ef378b1301e5 Reviewed-on: https://skia-review.googlesource.com/41162 Reviewed-by: Florin Malita <fmalita@chromium.org> Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Reed <reed@google.com>
* 32-bit x86 8-bit stagesGravatar Mike Klein2017-08-30
| | | | | | | | | | | Shouldn't be anything tricky here. Guarded by SK_JUMPER_LEGACY_X86_8BIT for (Win) layout tests. Change-Id: I7580c7c18d1721f1301904c049ea2e59e9bda5d9 Reviewed-on: https://skia-review.googlesource.com/40692 Reviewed-by: Herb Derby <herb@google.com> Reviewed-by: Florin Malita <fmalita@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
* no more need for a constants pointerGravatar Mike Klein2017-08-29
| | | | | | | | | | | | | | | The only reason we were keeping SkJumper_constants around is that it was hard to get float/integer iota vectors on arm64 without relocations. Now that we're compiling arm64 normally as part of Skia, we don't have to worry about relocations. This means we can kill the struct and stop passing around that pointer. Change-Id: I013c6a735947f3db2bc87f2bfa38b7520d2e2fce Reviewed-on: https://skia-review.googlesource.com/40200 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Herb Derby <herb@google.com>
* use NEON 8-bit stages on ARMv7 tooGravatar Mike Klein2017-08-29
| | | | | | | | | | | | | | | | | We don't really use anything very ARMv8 specific in the 8-bit NEON stages, so we can just naturally extend what we're doing to ARMv7 too. Note that unlike the float stages, we're not requiring VFPv4 either, just NEON. VFPv4 is for FMA and F16<->F32 conversion, both of which are unnecessary for the integer pipeline. GMs and perf improvement are similar to the previous ARMv8 change. Change-Id: Id618801ea1920564c1deee144a640a4133c4505f Reviewed-on: https://skia-review.googlesource.com/39840 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Mike Klein <mtklein@chromium.org> Reviewed-by: Herb Derby <herb@google.com>
* Revert "Revert "8-bit jumper on armv8""Gravatar Mike Klein2017-08-29
| | | | | | | | | | | | | | | | | This reverts commit 6d13575108299951ecdfba6d85c915fcec2bc028. Now with guards for "errors" like this: external/skia/src/jumper/SkJumper_stages_8bit.cpp:240:50: error: 'memcpy' called with size bigger than buffer case 12: memcpy(&v, src, 12*sizeof(T)); break; This code is unreachable and generally removed by Clang's optimizer anyway... as far as I can tell the code generation diff is arbitrary. Change-Id: I6216567caaa6166f71258bd25343a09e93892a10 Reviewed-on: https://skia-review.googlesource.com/39961 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
* Revert "8-bit jumper on armv8"Gravatar Derek Sollenberger2017-08-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 08133583d5e1cdfdcc41b4bb078fcfb64137f058. Reason for revert: Blocking Android Autoroller on compile error. Original change's description: > 8-bit jumper on armv8 > > The GM diffs are all minor and what you'd expect. > > I did a quick performance sanity check, which also looks fine. > > $ out/ok bench rp filter:search=Modulate > [blendmode_rect_Modulate] 30.2ms @0 32ms @95 32ms @100 > [blendmode_mask_Modulate] 12.6ms @0 12.6ms @95 14.5ms @100 > ~~~> > [blendmode_rect_Modulate] 11.2ms @0 11.7ms @95 12.4ms @100 > [blendmode_mask_Modulate] 10.5ms @0 23.6ms @95 23.9ms @100 > > This isn't even really the fastest we can make 8-bit go on ARMv8; > it's actually much more natural to work de-interlaced there. Lots > of room to follow up. > > Change-Id: I86b1099f6742bcb0b8b4fa153e85eaba9567cbf7 > Reviewed-on: https://skia-review.googlesource.com/39740 > Reviewed-by: Florin Malita <fmalita@chromium.org> > Commit-Queue: Mike Klein <mtklein@chromium.org> TBR=mtklein@chromium.org,herb@google.com,fmalita@chromium.org,reed@google.com Change-Id: I71425d8b7fbb66be5cb50025871dd81358111da4 No-Presubmit: true No-Tree-Checks: true No-Try: true Reviewed-on: https://skia-review.googlesource.com/39980 Reviewed-by: Derek Sollenberger <djsollen@google.com> Commit-Queue: Derek Sollenberger <djsollen@google.com>
* 8-bit jumper on armv8Gravatar Mike Klein2017-08-28
| | | | | | | | | | | | | | | | | | | | | | The GM diffs are all minor and what you'd expect. I did a quick performance sanity check, which also looks fine. $ out/ok bench rp filter:search=Modulate [blendmode_rect_Modulate] 30.2ms @0 32ms @95 32ms @100 [blendmode_mask_Modulate] 12.6ms @0 12.6ms @95 14.5ms @100 ~~~> [blendmode_rect_Modulate] 11.2ms @0 11.7ms @95 12.4ms @100 [blendmode_mask_Modulate] 10.5ms @0 23.6ms @95 23.9ms @100 This isn't even really the fastest we can make 8-bit go on ARMv8; it's actually much more natural to work de-interlaced there. Lots of room to follow up. Change-Id: I86b1099f6742bcb0b8b4fa153e85eaba9567cbf7 Reviewed-on: https://skia-review.googlesource.com/39740 Reviewed-by: Florin Malita <fmalita@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
* remove aarch64 offline compilationGravatar Mike Klein2017-08-28
| | | | | | | | | | | | | | | The baseline compiled into Skia is now pretty much identical. Minor diffs due to the offline code using -ffp-contract=fast, and the baseline not. Explicit calls to fma() are still FMAs, but we're no longer letting the compiler uncover FMAs we didn't explicitly call out. If this goes well, we should be able to turn on the 8-bit pipeline. Change-Id: I8f73157cfce7373574c20f6435fe86b46477afa9 Reviewed-on: https://skia-review.googlesource.com/39520 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* rework plus blend modeGravatar Mike Klein2017-08-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The most interesting parts of this are how plus interacts with partial coverage. Plus needs its clamp to happen after the lerp. Luckily, some of its math folds away: d' = clamp[ d*(1-c) + (s+d)*c ] == clamp[ d - dc + sc + dc ] == clamp[ d + sc ] What's nice there is that coverage can be folded into the src term. This suggests that we can re-write the plus stage to clamp internally (and thus, be viable for 8-bit) if we always pre-scale with coverage. We don't have a way to pre-scale with 565 coverage until now, but it's only a step or two away from there. We can use the alternate formulation we derived for alpha for lerp_565, calculating the alpha coverage from red, green, and blue coverages _and_ the values of src and dst alpha. While we already pre-scale srcover today for 8-bit or constant coverage, we cannot do the same for 565. When evaluating the expression d' = s + (1-a)d we need the a term to be pre-scaled with red's coverage when calculating dr', with blue's when calculating db', etc. Essentially we need to carry around a bunch of extra values, and we've got no way to do that. So instead, we'll just carefully pre-scale plus with any coverage, and keep post-lerping srcover when we have 565 coverage. Change-Id: I7a7a52eec7d482e1b98bb8a01ea0a3d5e67bef65 Reviewed-on: https://skia-review.googlesource.com/38300 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Florin Malita <fmalita@chromium.org>
* Remove SK_SUPPORT_LEGACY_RP_BLENDS-guarded codeGravatar Florin Malita2017-08-24
| | | | | | | | | | The flag is no longer used. Change-Id: I39156ef5683538263c2302f2fe3ba779e55dbc47 Reviewed-on: https://skia-review.googlesource.com/38360 Commit-Queue: Florin Malita <fmalita@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Mike Klein <mtklein@chromium.org>
* rename confusing lowp guardGravatar Mike Klein2017-08-15
| | | | | | | Change-Id: I346429015e5f902b0a35663e140bb9a025c4220e Reviewed-on: https://skia-review.googlesource.com/34680 Reviewed-by: Florin Malita <fmalita@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
* Lowp overlay, hardlight stagesGravatar Florin Malita2017-08-14
| | | | | | | | | | | | | | | | | | | | Before: micros bench 7669.09 ? blendmode_rect_HardLight 8888 8707.13 ? blendmode_rect_Overlay 8888 After: micros bench 6679.60 ? blendmode_rect_HardLight 8888 6789.57 ? blendmode_rect_Overlay 8888 Change-Id: I52f389253fa07dafe18e572af550af7387264a16 Reviewed-on: https://skia-review.googlesource.com/34280 Commit-Queue: Florin Malita <fmalita@chromium.org> Reviewed-by: Mike Klein <mtklein@google.com>
* lowp: lighten, difference, exclusionGravatar Florin Malita2017-08-14
| | | | | | | Change-Id: I5773cf831c7e41a932bee1f2c6830085fb7db025 Reviewed-on: https://skia-review.googlesource.com/33764 Commit-Queue: Florin Malita <fmalita@chromium.org> Reviewed-by: Mike Klein <mtklein@google.com>