aboutsummaryrefslogtreecommitdiffhomepage
path: root/src/jumper
Commit message (Collapse)AuthorAge
* Reland "start cleaning up non-skcms SkColorSpaceXforms"Gravatar Mike Klein2018-05-23
| | | | | | | | | | | | | | | | | | | | | | | | This is a reland of 339133f82c30cd3080672db28e6f72c894cba05a Original change's description: > start cleaning up non-skcms SkColorSpaceXforms > > I think this gets rid of > - SkColorSpaceXform_Base > - SkColorSpaceXform_XYZ > - SkColorSpaceXform_A2B > and lots of support code. Might be more left to clean up? > > Change-Id: I560d974d1e879dfd6a63ee2244a3dd88bd495c8a > Reviewed-on: https://skia-review.googlesource.com/129512 > Commit-Queue: Brian Osman <brianosman@google.com> > Auto-Submit: Mike Klein <mtklein@chromium.org> > Reviewed-by: Brian Osman <brianosman@google.com> Change-Id: I33ee0d8bcfd72c401823a2e7d5168c9ecc9a5181 Reviewed-on: https://skia-review.googlesource.com/129624 Reviewed-by: Mike Klein <mtklein@google.com> Commit-Queue: Mike Klein <mtklein@google.com>
* Revert "start cleaning up non-skcms SkColorSpaceXforms"Gravatar Mike Klein2018-05-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 339133f82c30cd3080672db28e6f72c894cba05a. Reason for revert: broke NinePatchDrawableTest.testGetPadding? stranger things have happened. Original change's description: > start cleaning up non-skcms SkColorSpaceXforms > > I think this gets rid of > - SkColorSpaceXform_Base > - SkColorSpaceXform_XYZ > - SkColorSpaceXform_A2B > and lots of support code. Might be more left to clean up? > > Change-Id: I560d974d1e879dfd6a63ee2244a3dd88bd495c8a > Reviewed-on: https://skia-review.googlesource.com/129512 > Commit-Queue: Brian Osman <brianosman@google.com> > Auto-Submit: Mike Klein <mtklein@chromium.org> > Reviewed-by: Brian Osman <brianosman@google.com> TBR=mtklein@chromium.org,brianosman@google.com Change-Id: I9e76195481b8658b34936aeece278d81c286c0fa No-Presubmit: true No-Tree-Checks: true No-Try: true Reviewed-on: https://skia-review.googlesource.com/129680 Reviewed-by: Mike Klein <mtklein@google.com> Commit-Queue: Mike Klein <mtklein@google.com>
* start cleaning up non-skcms SkColorSpaceXformsGravatar Mike Klein2018-05-22
| | | | | | | | | | | | | | I think this gets rid of - SkColorSpaceXform_Base - SkColorSpaceXform_XYZ - SkColorSpaceXform_A2B and lots of support code. Might be more left to clean up? Change-Id: I560d974d1e879dfd6a63ee2244a3dd88bd495c8a Reviewed-on: https://skia-review.googlesource.com/129512 Commit-Queue: Brian Osman <brianosman@google.com> Auto-Submit: Mike Klein <mtklein@chromium.org> Reviewed-by: Brian Osman <brianosman@google.com>
* initial SkSLJIT checkinGravatar Ethan Nicholas2018-03-27
| | | | | | | | | Docs-Preview: https://skia.org/?cl=112204 Bug: skia: Change-Id: I10042a0200db00bd8ff8078467c409b1cf191f50 Reviewed-on: https://skia-review.googlesource.com/112204 Commit-Queue: Ethan Nicholas <ethannicholas@google.com> Reviewed-by: Mike Klein <mtklein@chromium.org>
* Reland "Reland "make SkJumper stages normal Skia code""Gravatar Mike Klein2018-03-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a reland of 78cb579f33943421afc8423a39867fcfd69fed44 This time, lowp stages are controlled by !defined(JUMPER_IS_SCALAR), not by defined(__clang__). The two are usually the same, except when we opt Clang builds into JUMPER_IS_SCALAR artificially. Some Google3 builds use compilers old enough that they barf when compiling our NEON code. It's conceivably also possible to define JUMPER_IS_SCALAR yourself, but I don't think anyone does that. Original change's description: > Reland "make SkJumper stages normal Skia code" > > This is a reland of 22e536e3a1a09405d1c0e6f071717a726d86e8d4 > > Now with fixed #include paths in SkRasterPipeline_opts.h, > and -ffp-contract=fast for the :hsw target to minimize > diffs on non-Windows Clang AVX2/AVX-512 bots. > > Original change's description: > > make SkJumper stages normal Skia code > > > > Enough clients are using Clang now that we can say, use Clang to build > > if you want these software pipeline stages to go fast. > > > > This lets us drop the offline build aspect of SkJumper stages, instead > > building as part of Skia using the SkOpts framework. > > > > I think everything should work, except I've (temporarily) removed > > AVX-512 support. I will put this back in a follow up. > > > > I have had to drop Windows down to __vectorcall and our narrower > > stage calling convention that keeps the d-registers on the stack. > > I tried forcing sysv_abi, but that crashed Clang. :/ > > > > Added a TODO to up the same narrower stage calling convention > > for lowp stages... we just *don't* today, for no good reason. > > > > Change-Id: Iaaa792ffe4deab3508d2dc5d0008c163c24b3383 > > Reviewed-on: https://skia-review.googlesource.com/110641 > > Commit-Queue: Mike Klein <mtklein@chromium.org> > > Reviewed-by: Herb Derby <herb@google.com> > > Reviewed-by: Florin Malita <fmalita@chromium.org> > > Change-Id: I44f2c03d33958e3807747e40904b6351957dd448 > Reviewed-on: https://skia-review.googlesource.com/112742 > Reviewed-by: Mike Klein <mtklein@chromium.org> Change-Id: I3d71197d4bbb19ca4a94961a97fa2e54d5cbfb0d Reviewed-on: https://skia-review.googlesource.com/112744 Reviewed-by: Mike Klein <mtklein@google.com> Commit-Queue: Mike Klein <mtklein@google.com>
* Revert "Reland "make SkJumper stages normal Skia code""Gravatar Mike Klein2018-03-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 78cb579f33943421afc8423a39867fcfd69fed44. Reason for revert: lowp should be controlled by defined(JUMPER_IS_SCALAR), not defined(__clang__). So close. Original change's description: > Reland "make SkJumper stages normal Skia code" > > This is a reland of 22e536e3a1a09405d1c0e6f071717a726d86e8d4 > > Now with fixed #include paths in SkRasterPipeline_opts.h, > and -ffp-contract=fast for the :hsw target to minimize > diffs on non-Windows Clang AVX2/AVX-512 bots. > > Original change's description: > > make SkJumper stages normal Skia code > > > > Enough clients are using Clang now that we can say, use Clang to build > > if you want these software pipeline stages to go fast. > > > > This lets us drop the offline build aspect of SkJumper stages, instead > > building as part of Skia using the SkOpts framework. > > > > I think everything should work, except I've (temporarily) removed > > AVX-512 support. I will put this back in a follow up. > > > > I have had to drop Windows down to __vectorcall and our narrower > > stage calling convention that keeps the d-registers on the stack. > > I tried forcing sysv_abi, but that crashed Clang. :/ > > > > Added a TODO to up the same narrower stage calling convention > > for lowp stages... we just *don't* today, for no good reason. > > > > Change-Id: Iaaa792ffe4deab3508d2dc5d0008c163c24b3383 > > Reviewed-on: https://skia-review.googlesource.com/110641 > > Commit-Queue: Mike Klein <mtklein@chromium.org> > > Reviewed-by: Herb Derby <herb@google.com> > > Reviewed-by: Florin Malita <fmalita@chromium.org> > > Change-Id: I44f2c03d33958e3807747e40904b6351957dd448 > Reviewed-on: https://skia-review.googlesource.com/112742 > Reviewed-by: Mike Klein <mtklein@chromium.org> TBR=mtklein@chromium.org,herb@google.com,fmalita@chromium.org Change-Id: Ie64da98f5187d44e03c0ce05d7cb189d4a6e6663 No-Presubmit: true No-Tree-Checks: true No-Try: true Reviewed-on: https://skia-review.googlesource.com/112743 Reviewed-by: Mike Klein <mtklein@google.com> Commit-Queue: Mike Klein <mtklein@google.com>
* Reland "make SkJumper stages normal Skia code"Gravatar Mike Klein2018-03-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a reland of 22e536e3a1a09405d1c0e6f071717a726d86e8d4 Now with fixed #include paths in SkRasterPipeline_opts.h, and -ffp-contract=fast for the :hsw target to minimize diffs on non-Windows Clang AVX2/AVX-512 bots. Original change's description: > make SkJumper stages normal Skia code > > Enough clients are using Clang now that we can say, use Clang to build > if you want these software pipeline stages to go fast. > > This lets us drop the offline build aspect of SkJumper stages, instead > building as part of Skia using the SkOpts framework. > > I think everything should work, except I've (temporarily) removed > AVX-512 support. I will put this back in a follow up. > > I have had to drop Windows down to __vectorcall and our narrower > stage calling convention that keeps the d-registers on the stack. > I tried forcing sysv_abi, but that crashed Clang. :/ > > Added a TODO to up the same narrower stage calling convention > for lowp stages... we just *don't* today, for no good reason. > > Change-Id: Iaaa792ffe4deab3508d2dc5d0008c163c24b3383 > Reviewed-on: https://skia-review.googlesource.com/110641 > Commit-Queue: Mike Klein <mtklein@chromium.org> > Reviewed-by: Herb Derby <herb@google.com> > Reviewed-by: Florin Malita <fmalita@chromium.org> Change-Id: I44f2c03d33958e3807747e40904b6351957dd448 Reviewed-on: https://skia-review.googlesource.com/112742 Reviewed-by: Mike Klein <mtklein@chromium.org>
* Revert "make SkJumper stages normal Skia code"Gravatar Mike Klein2018-03-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 22e536e3a1a09405d1c0e6f071717a726d86e8d4. Reason for revert: wrong include path :/ Original change's description: > make SkJumper stages normal Skia code > > Enough clients are using Clang now that we can say, use Clang to build > if you want these software pipeline stages to go fast. > > This lets us drop the offline build aspect of SkJumper stages, instead > building as part of Skia using the SkOpts framework. > > I think everything should work, except I've (temporarily) removed > AVX-512 support. I will put this back in a follow up. > > I have had to drop Windows down to __vectorcall and our narrower > stage calling convention that keeps the d-registers on the stack. > I tried forcing sysv_abi, but that crashed Clang. :/ > > Added a TODO to up the same narrower stage calling convention > for lowp stages... we just *don't* today, for no good reason. > > Change-Id: Iaaa792ffe4deab3508d2dc5d0008c163c24b3383 > Reviewed-on: https://skia-review.googlesource.com/110641 > Commit-Queue: Mike Klein <mtklein@chromium.org> > Reviewed-by: Herb Derby <herb@google.com> > Reviewed-by: Florin Malita <fmalita@chromium.org> TBR=mtklein@chromium.org,herb@google.com,fmalita@chromium.org Change-Id: I2bdc709c80cdfa6b13ff24e024b3721bef887f46 No-Presubmit: true No-Tree-Checks: true No-Try: true Reviewed-on: https://skia-review.googlesource.com/112741 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
* make SkJumper stages normal Skia codeGravatar Mike Klein2018-03-07
| | | | | | | | | | | | | | | | | | | | | | | | Enough clients are using Clang now that we can say, use Clang to build if you want these software pipeline stages to go fast. This lets us drop the offline build aspect of SkJumper stages, instead building as part of Skia using the SkOpts framework. I think everything should work, except I've (temporarily) removed AVX-512 support. I will put this back in a follow up. I have had to drop Windows down to __vectorcall and our narrower stage calling convention that keeps the d-registers on the stack. I tried forcing sysv_abi, but that crashed Clang. :/ Added a TODO to up the same narrower stage calling convention for lowp stages... we just *don't* today, for no good reason. Change-Id: Iaaa792ffe4deab3508d2dc5d0008c163c24b3383 Reviewed-on: https://skia-review.googlesource.com/110641 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Herb Derby <herb@google.com> Reviewed-by: Florin Malita <fmalita@chromium.org>
* Cleaning up SkColorSpaceXform a bitGravatar Brian Osman2018-03-06
| | | | | | | | | | | | | | | | Remove SkColorSpaceXform_base from the inheritance chain. Move some enums only used within XYZ to be class scoped. Eliminate redundant context structs, and just use the SkJumper types directly. SkColorSpaceXform_Base still exists, but just to hold a couple static functions. Trying to untangle the dst gamma table mess next. Bug: skia: Change-Id: I6d2b7807c33e61a0d7df74e334356567d8a2c0e0 Reviewed-on: https://skia-review.googlesource.com/112601 Commit-Queue: Brian Osman <brianosman@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Mike Klein <mtklein@chromium.org>
* lowp impl for decal stagesGravatar Mike Reed2018-02-21
| | | | | | | | | Bug: skia: Change-Id: If6481d202bf22a95f1dea0c5bf7d84698b63869a Reviewed-on: https://skia-review.googlesource.com/109241 Commit-Queue: Mike Reed <reed@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Mike Klein <mtklein@chromium.org>
* add decal tilemode to shadersGravatar Mike Reed2018-02-16
| | | | | | | | | | | | | Plenty more to follow-up: - gradients - gpu impl Bug: skia:7638 Change-Id: I8e54fd0e24921f040f178c793b36c7fb855b136e Reviewed-on: https://skia-review.googlesource.com/107420 Commit-Queue: Mike Reed <reed@google.com> Reviewed-by: Florin Malita <fmalita@chromium.org> Reviewed-by: Mike Klein <mtklein@chromium.org>
* 1010102, 101010x, 888x in swGravatar Mike Klein2018-01-30
| | | | | | | | | | | | | | | | | | | Same sort of deal as before, now with all three new formats. While I was at it, I made sure RGBA 8888 and BGRA 8888 both work too. We don't want the 101010's in lowp, but 888x should be fine. After looking at the DM images on monitors at work, I decided to re-enable dither even on 10-bit images. Looking at the GMs in 888x or 101010x is interesting... I think we must not be clearing the memory allocated for layers? Seems like we want to allocate layers as 8888? Change-Id: I3a85b4f00877792a6425a7e7eb31eacb04ae9218 Reviewed-on: https://skia-review.googlesource.com/101640 Reviewed-by: Mike Reed <reed@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* pass size_t to ptr_and_ix()Gravatar Mike Klein2018-01-24
| | | | | | | | | | dx and dy are already size_t, so no need to demote them to int, and demoting to int gets dicey in terms of wrap-around. Change-Id: I98eb31ef7aa35fa2c2aa5be27cdc0b4dc7dfd008 Reviewed-on: https://skia-review.googlesource.com/99500 Reviewed-by: Brian Osman <brianosman@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* Remove legacy 2pt conical gradientGravatar Yuqian Li2018-01-16
| | | | | | | | Bug: skia:7459 Change-Id: Iccc2588f80e22b13ed5d23656b8c75d7b7058a36 Reviewed-on: https://skia-review.googlesource.com/92700 Reviewed-by: Florin Malita <fmalita@chromium.org> Commit-Queue: Yuqian Li <liyuqian@google.com>
* Update 2pt conical gradient in raster pipelineGravatar Yuqian Li2018-01-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | The updated algorithm matches our new GPU algorithm (https://skia.org/dev/design/conical) and it brings about 7%-26% speedup. In the next CL, I'll simplify the GPU code by reusing the CPU code in this CL. 7.20% faster in gradient_conical_clamp_hicolor 8.94% faster in gradient_conicalZero_clamp_hicolor 10.00% faster in gradient_conicalOut_clamp_hicolor 11.72% faster in gradient_conicalOutZero_clamp_hicolor 13.62% faster in gradient_conical_clamp_3color 16.52% faster in gradient_conicalZero_clamp_3color 17.48% faster in gradient_conical_clamp 17.70% faster in gradient_conical_clamp_shallow 20.60% faster in gradient_conicalOut_clamp_3color 20.98% faster in gradient_conicalOutZero_clamp_3color 21.79% faster in gradient_conicalZero_clamp 22.48% faster in gradient_conicalOut_clamp 26.13% faster in gradient_conicalOutZero_clamp Bug: skia: Change-Id: Ia159495e1c77658cb28e48c9edf84938464e501c Reviewed-on: https://skia-review.googlesource.com/90262 Commit-Queue: Yuqian Li <liyuqian@google.com> Reviewed-by: Mike Klein <mtklein@chromium.org>
* fold SkJumper_vectors.h into SkJumper_stages.cppGravatar Mike Klein2018-01-01
| | | | | | | | | This brings a little more symmetry to _stages.cpp and _stages_lowp.cpp. Change-Id: Icfcbd3f264ab97d8445ad8e14c25b4a07c780aea Reviewed-on: https://skia-review.googlesource.com/90030 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
* attempt 3: add experimental bilerp_clamp_8888 stageGravatar Mike Klein2017-12-22
| | | | | | | | | | | | | | | | | | | | | | It looks like we can specialize hot image shaders into their own single stages for a good speedup on both x86 and ARM. I've started here with bilerp_clamp_8888, and will follow up with bgra and 565, and lowp versions of those, and probably also the same for nearest neighbors. All pixels are identical in GMs. This time, rewrite the loop over sample points to be a little friendlier to 32-bit x86 code generation. The previous version created an object file indirection feature build_stages.py can't handle. CQ_INCLUDE_TRYBOTS=skia.primary:Test-Android-Clang-NexusPlayer-CPU-Moorefield-x86-Release-All-Android,Test-Android-Clang-NexusPlayer-GPU-PowerVR-x86-Release-All-Android Change-Id: I150b6af4a5b89e009dc04ca69e1857892e173deb Reviewed-on: https://skia-review.googlesource.com/89180 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
* GOOGLE3 -> SK_BUILD_FOR_GOOGLE3Gravatar Mike Klein2017-12-19
| | | | | | | | | | | | | This is more consistent with our other SK_BUILD_FOR_... macros, and less likely to collide with other preprocessor logic. (Luckily, this was defined in public.bzl, so we can do this all in one CL in the Skia repo.) Change-Id: I5f232888288c9c53fad445545d983d0fb0b4add8 Reviewed-on: https://skia-review.googlesource.com/86940 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
* Revert "attempt 2: add experimental bilerp_clamp_8888 stage"Gravatar Mike Klein2017-12-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 8a64e52a98d178be13fd137b3b3a3c6aff457d85. Reason for revert: Test-Android-Clang-NexusPlayer-CPU-Moorefield-x86-Release-All-Android Test-Android-Clang-NexusPlayer-GPU-PowerVR-x86-Release-All-Android Original change's description: > attempt 2: add experimental bilerp_clamp_8888 stage > > It looks like we can specialize hot image shaders into their > own single stages for a good speedup on both x86 and ARM. > > I've started here with bilerp_clamp_8888, and will > follow up with bgra and 565, and lowp versions of those, > and probably also the same for nearest neighbors. > > All pixels are identical in GMs. > > Change-Id: Ib5ed6e528efd9e3eed96ba67d02fbec2e8133a81 > Reviewed-on: https://skia-review.googlesource.com/86860 > Reviewed-by: Mike Klein <mtklein@chromium.org> > Commit-Queue: Mike Klein <mtklein@chromium.org> TBR=mtklein@chromium.org,liyuqian@google.com Change-Id: I34409a7b4aee4fd54baee44f7fc53bd0982500fe No-Presubmit: true No-Tree-Checks: true No-Try: true Reviewed-on: https://skia-review.googlesource.com/86601 Reviewed-by: Mike Klein <mtklein@google.com> Commit-Queue: Mike Klein <mtklein@google.com>
* attempt 2: add experimental bilerp_clamp_8888 stageGravatar Mike Klein2017-12-18
| | | | | | | | | | | | | | | | It looks like we can specialize hot image shaders into their own single stages for a good speedup on both x86 and ARM. I've started here with bilerp_clamp_8888, and will follow up with bgra and 565, and lowp versions of those, and probably also the same for nearest neighbors. All pixels are identical in GMs. Change-Id: Ib5ed6e528efd9e3eed96ba67d02fbec2e8133a81 Reviewed-on: https://skia-review.googlesource.com/86860 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
* Rework out-of-gamut handling in SkRasterPipelineGravatar Mike Klein2017-12-18
| | | | | | | | | | | | | | | | | | | | | | Instead of trying to carefully manage the in-gamut / out-of-gamut state of the pipeline, let's do what a GPU would do, clamping to representable range in any float -> integer conversion. Most effects doing table lookups now clamp themselves internally, and the store_foo() methods clamp when the destination is fixed point. In turn the from_srgb() conversions and all future transfer function stages can care less about this stuff. If I'm thinking right, the _lowp side of things need not change at all, and that will soften the performance impact of this change. Anything that was fast to begin with was probably running a _lowp pipeline. Bug: skia:7419 Change-Id: Id2e080ac240a97b900a1ac131c85d9e15f70af32 Reviewed-on: https://skia-review.googlesource.com/85740 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Brian Osman <brianosman@google.com>
* add 0->0 path to approx_powf()Gravatar Mike Klein2017-12-15
| | | | | | | | | | | The path involving approx_log2() and approx_pow2() does not produce 0. And it's probably not a good idea to think about what approx_log2(0) is anyway. Change-Id: If5f48298c5bd5565ae808ebdfbd02649f4dd3046 Reviewed-on: https://skia-review.googlesource.com/85840 Reviewed-by: Brian Osman <brianosman@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* make SkColorSpace_New realGravatar Mike Klein2017-12-14
| | | | | | | | | | | | | | | | | | | Some interesting things are starting to fall out already, like the fact that I needed to add a gamma_dst stage to be able to draw into gamma-transfer-fn destinations. I've also had to pass an SkAlphaType through to the linearize functions so that they can maintain premul invariants. I'm not sure this is actually a good idea... if you can, please double- check my logic at SkRasterPipeline.cpp:128? If it's correct logic, I'm going to need to do it all over the place. But I imagine you don't do this and somehow get away with it. Change-Id: I42cd9b161b54287d674225103ad9e19f8b388959 Reviewed-on: https://skia-review.googlesource.com/84680 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Brian Osman <brianosman@google.com>
* disable f16<->f32 instructions on Google3Gravatar Mike Klein2017-12-13
| | | | | | | | | | | | We're still working out why these intrinsics aren't available. Long term options: - get the intrinsics to work - convert to inline asm Change-Id: I07edf1944daf01842f01b26ad874f62314d0f68f Reviewed-on: https://skia-review.googlesource.com/84222 Reviewed-by: Mike Klein <mtklein@chromium.org>
* JUMPER_IS_AVX2 -> JUMPER_IS_HSWGravatar Mike Klein2017-12-12
| | | | | | | | | | | | | | | | | | | | We need to be a bit more pedantic here to support builds that may be using AVX2 as part of their baseline but perhaps not enabling all the related features SkJumper would like to use. E.g. we've seen Tensorflow build with AVX2 and FMA, but not F16C. So check all three {AVX2,FMA,F16C}, and only then build stages in HSW mode. I've updated the define as a reminder. This only affects builds using these features for their _baseline_ stages... the offline-compiled stages in SkJumper_generated.S are not affected. Change-Id: I9bfb3bae3589d35043b748782cefa8c213726d6a Reviewed-on: https://skia-review.googlesource.com/84221 Reviewed-by: Florin Malita <fmalita@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
* a little SkJumper tidy upGravatar Mike Klein2017-12-12
| | | | | | | | | | | | | | | | I noticed these little bits while working on that old-Clang fix. - We can force-inline anytime we've got Clang, not just when JUMPER_IS_OFFLINE. - The _aarch64 and _vfp4 WRAP functions are dead code, as they're never compiled offline now. Change-Id: I5850daded2ffcfe50ceeadc43f89fa8597df3387 Reviewed-on: https://skia-review.googlesource.com/84060 Commit-Queue: Mike Klein <mtklein@chromium.org> Commit-Queue: Florin Malita <fmalita@chromium.org> Reviewed-by: Florin Malita <fmalita@chromium.org>
* drop to scalar mode in some armv7 debug buildsGravatar Mike Klein2017-12-12
| | | | | | | | | | | | | Older versions of Clang, at least vanilla 3.9 and "Apple LLVM version 8.1.0 (clang-802.0.42)" seem to crash when compiling SkJumper_stages.cpp with NEON for ARMv7 without at least -O1. So detect that case, and fall back to scalar code. Change-Id: I3c1595da491bef38c18f47f96690700c67fdc70e Reviewed-on: https://skia-review.googlesource.com/83980 Reviewed-by: Florin Malita <fmalita@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
* remove vfpv4 requirement for SkJumper on ARMv7Gravatar Mike Klein2017-12-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | VFPv4 gives us two interesting features: - FMA - f16<->f32 conversions Even without FMAs, NEON still has non-fused MLA instructions. We don't really care about the fusedness of those mul-adds, so losing FMA here is kind of no big deal. We already maintain portable code to do f16<->f32 conversions, so it's not much of a maintanence hit to use that instead of the native instructions. To my knowledge software F16 rendering is not a performance critical mode of operation for any of our users. This drops our minimum requirement to basically just having NEON. Devices like the Nexus 7 2012 will now take SkJumper fast paths instead of portable code. (Though actually, we've only ever required NEON for _lowp... only the float code also needed vfpv4). The main file to look at here is actually SkJumper_vectors.h, where you will see all the substantive changes. The rest just kind of tears down most of the old complexity, add adds ABI to put just a little of it back. :) Change-Id: Ia9237117698729c91e5fa51126baf80748093bf4 Bug: skia: Reviewed-on: https://skia-review.googlesource.com/83521 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Florin Malita <fmalita@chromium.org>
* Use first/second instead of min/max in 2pt conical gradientGravatar Yuqian Li2017-12-11
| | | | | | | | | | | | | | | | | | | | | | | | | Here's the tiny performance gain: $python tools/calmbench/calmbench.py firstsecond --extraarg "-m conic" firstsecond (compared to master) is likely 4.23% faster in gradient_conicalOut_clamp_3color 4.23% faster in gradient_conicalOutZero_clamp_3color 4.79% faster in gradient_conical_clamp_shallow_dither 6.04% faster in gradient_conical_clamp_3color 6.04% faster in gradient_conicalZero_clamp_3color 6.42% faster in gradient_conicalOut_clamp 6.43% faster in gradient_conicalOutZero_clamp 6.74% faster in gradient_conical_clamp 6.98% faster in gradient_conical_clamp_shallow 6.98% faster in gradient_conicalZero_clamp Bug: skia: Change-Id: Id74866908b99753ed8b16a657d3f67c9255d0043 Reviewed-on: https://skia-review.googlesource.com/76561 Commit-Queue: Yuqian Li <liyuqian@google.com> Reviewed-by: Mike Klein <mtklein@chromium.org> Reviewed-by: Florin Malita <fmalita@chromium.org>
* Revert "add experimental bilerp_clamp_8888 stage"Gravatar Mike Klein2017-12-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit a7fa3377d24643d86117159f8a58d2ee66880a4d. Reason for revert: lots of crashing GPU bots. Original change's description: > add experimental bilerp_clamp_8888 stage > > It looks like we can specialize hot image shaders into their > own single stages for a good speedup on both x86 and ARM. > > I've started here with bilerp_clamp_8888, and will > follow up with bgra and 565, and lowp versions of those, > and probably also the same for nearest neighbors. > > All pixels are identical in GMs. > > Change-Id: I2f6995767cd38053d670b8d0bfdb71b687803d70 > Reviewed-on: https://skia-review.googlesource.com/82100 > Reviewed-by: Yuqian Li <liyuqian@google.com> > Commit-Queue: Mike Klein <mtklein@chromium.org> TBR=mtklein@chromium.org,mtklein@google.com,liyuqian@google.com Change-Id: If70abb91b69bcd781e395dd3ac05ff1eebb1169f No-Presubmit: true No-Tree-Checks: true No-Try: true Reviewed-on: https://skia-review.googlesource.com/83340 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
* add experimental bilerp_clamp_8888 stageGravatar Mike Klein2017-12-11
| | | | | | | | | | | | | | | | It looks like we can specialize hot image shaders into their own single stages for a good speedup on both x86 and ARM. I've started here with bilerp_clamp_8888, and will follow up with bgra and 565, and lowp versions of those, and probably also the same for nearest neighbors. All pixels are identical in GMs. Change-Id: I2f6995767cd38053d670b8d0bfdb71b687803d70 Reviewed-on: https://skia-review.googlesource.com/82100 Reviewed-by: Yuqian Li <liyuqian@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* turn back on NEON lowp code...Gravatar Mike Klein2017-12-07
| | | | | | | | | | | | Turns out I was defining JUMPER_HAS_NEON_LOWP, but checking JUMPER_NEON_HAS_LOWP. 🤦 Change-Id: Ib328190ce35a367bf3d08d8e66f0ab8791ccb8b2 Reviewed-on: https://skia-review.googlesource.com/82320 Reviewed-by: Brian Osman <brianosman@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* AVX2 specialization for lowp gradient lookupGravatar Florin Malita2017-11-10
| | | | | | | | | | | 705.32 -> 457.76 gradient_sweep_clamp_3color 609.38 -> 345.34 gradient_radial1_clamp_3color Change-Id: I0165ac8f004ee095ada4f12b33db0a94ae39fca3 Reviewed-on: https://skia-review.googlesource.com/69902 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Florin Malita <fmalita@chromium.org>
* Revert "more powerful map()"Gravatar Greg Daniel2017-11-10
| | | | | | | | | | | | | | | | | | | | | | | This reverts commit a3dd5ec3a769fb833ce77878cd4e551c15e5074d. Reason for revert: breaking build on Build-Debian9-Clang-x86_64_Release-Fast Original change's description: > more powerful map() > > Change-Id: Icbae002999a295e3a9d1d2e6046e686784d5f608 > Reviewed-on: https://skia-review.googlesource.com/69901 > Reviewed-by: Florin Malita <fmalita@chromium.org> > Commit-Queue: Mike Klein <mtklein@chromium.org> TBR=mtklein@chromium.org,fmalita@chromium.org Change-Id: Ice989dd6a6b2786f318791dd91f2c06f689cb979 No-Presubmit: true No-Tree-Checks: true No-Try: true Reviewed-on: https://skia-review.googlesource.com/70105 Reviewed-by: Greg Daniel <egdaniel@google.com> Commit-Queue: Greg Daniel <egdaniel@google.com>
* more powerful map()Gravatar Mike Klein2017-11-10
| | | | | | | Change-Id: Icbae002999a295e3a9d1d2e6046e686784d5f608 Reviewed-on: https://skia-review.googlesource.com/69901 Reviewed-by: Florin Malita <fmalita@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
* AVX2 gather for lowpGravatar Florin Malita2017-11-10
| | | | | | | Change-Id: I15f83a72645fed0ed8dca9c9aad66c5db5eb247a Reviewed-on: https://skia-review.googlesource.com/69920 Commit-Queue: Florin Malita <fmalita@chromium.org> Reviewed-by: Mike Klein <mtklein@chromium.org>
* add some lowp gradient stagesGravatar Mike Klein2017-11-03
| | | | | | | | | | | | | | I was originally going to add these to help test a lowp dither, but after looking at diffs I don't think lowp dither is a good idea. Non-dithered lowp gradients look fine to me so far. I'd have done conics, but they scare me. Change-Id: I8f5e75aec726983186214845ca38cfa0d54496b3 Reviewed-on: https://skia-review.googlesource.com/66460 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Florin Malita <fmalita@chromium.org>
* another batch of lowp stagesGravatar Mike Klein2017-11-01
| | | | | | | | | | The 4444 image in all_bitmap_configs now draws slightly different before and after serialization. (It's serialized as 8888.) Still looks fine. Change-Id: I1396cf1550b6769a1734ed25d59bd5b1866dfacd Reviewed-on: https://skia-review.googlesource.com/65960 Reviewed-by: Florin Malita <fmalita@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
* Some lowp refactoringGravatar Mike Klein2017-10-31
| | | | | | | | | | | | | | | | | | 1) Move a couple stages around in the enum to places that make more sense, and guass_a_to_rbga in the code too. 2) mirror the SkRasterPipeline stage enum with either: LOWP(st): the stage is implemented in low precision TODO(st): the stage should be lowp, but isn't NOPE(st): the stage shouldn't be done in lowp. 3) statically enforce that all stages are covered by one of LOWP, TODO, or NOPE. Change-Id: I06c7a7e470663ef73bf652c1b65c0d3c89f0d767 Reviewed-on: https://skia-review.googlesource.com/63800 Reviewed-by: Florin Malita <fmalita@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
* clean up SK_LEGACY_LOWP_STAGESGravatar Mike Klein2017-10-31
| | | | | | | Change-Id: I5629e74c4c13ddb9217fd3c2df3388030fa03f0c Reviewed-on: https://skia-review.googlesource.com/63780 Reviewed-by: Florin Malita <fmalita@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
* add srcover_bgra_8888Gravatar Mike Klein2017-10-24
| | | | | | | | | | | | | | Chrome generally uses BGRA buffers, so srcover_rgba_8888 isn't really doing them any good. Probably a good idea to cover both kN32 options any time we specialize like this? There's one small diff, so I've lazily guarded this by SK_LEGACY_LOWP_STAGES, which I want to rebaseline today anyway. Change-Id: Ice672aa01a3fc83be0798580d6730a54df075478 Reviewed-on: https://skia-review.googlesource.com/63301 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Mike Reed <reed@google.com>
* more easy lowp shader stagesGravatar Mike Klein2017-10-24
| | | | | | | | | | | | | | This fills out a couple more matrix and gather stages. Deletes a not particularly important unit test that was using a scale matrix in a weird, non-lowp compatible way. This will require guards for Blink layout tests. Change-Id: I54cb228ff541f771e8f4758f07d26c5161d48af3 Reviewed-on: https://skia-review.googlesource.com/62520 Reviewed-by: Mike Reed <reed@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* make enabling LOWP stages simplerGravatar Mike Klein2017-10-23
| | | | | | | | | | | | | | | | This method is a little simpler macro-wise, and makes it easier to guard new lowp stages: LOWP(foo) LOWP(bar) #ifndef SK_LEGACY_LOWP_BAZ LOWP(baz) #endif Change-Id: I06392f5cf7a04651e7bf47e79f10f7da8520f5ab Reviewed-on: https://skia-review.googlesource.com/63141 Reviewed-by: Florin Malita <fmalita@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
* translate+scale -> scale+translateGravatar Mike Klein2017-10-20
| | | | | | | | | | | | | | | | This is a no-op refactor. It's just always surprised me that the matrix_scale_translate stage expects [tx ty sx sy], when scales precede the translates in the names and in both normal row-major and column-major matrix layouts. This switches to [sx sy tx ty], scale then translate. Change-Id: I2d88701121ae8013facd5a28bb0ff520211db5a6 Reviewed-on: https://skia-review.googlesource.com/62541 Reviewed-by: Mike Reed <reed@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* start on lowp shadersGravatar Mike Klein2017-10-20
| | | | | | | | | | | | | | | | | | | | | | | | | We're going to want to assign types to the stages depending on their inputs and outputs: GG: x,y -> x,y GP: x,y -> r,g,b,a PP: r,g,b,a -> r,g,b,a (There are a couple other degenerate cases here, where a stage ignores its inputs or creates no outputs, but we can always just pretend their null input or output is one type or the other arbitrarily.) The GG stages will be pretty much entirely float code, and the GP stages a mix of float math and byte stuff. Since we've chosen U16 to match our register size in _lowp land, we'll unpack each F register across two of those for transport between stages. This is a notional, free operation in both directions. Change-Id: I605311d0dc327a1a3a9d688173d9498c1658e715 Reviewed-on: https://skia-review.googlesource.com/60800 Reviewed-by: Herb Derby <herb@google.com> Reviewed-by: Florin Malita <fmalita@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
* Feed seed_shader() iota through a context pointer.Gravatar Mike Klein2017-10-18
| | | | | | | | | | As this array grows longer it causes troublesome code generation when we're compiling offline, but it's easy as an argument. Change-Id: I53526443f534f29d3bff17c3aec24a9e916c9b86 Reviewed-on: https://skia-review.googlesource.com/60564 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Herb Derby <herb@google.com>
* rename (x,y) to (dx,dy)Gravatar Mike Klein2017-10-18
| | | | | | | | | | | | | | | Today (x,y) are the integer coordinates of the first destination pixel we're working on. By renaming them (dx,dy), we free up the names (x,y) for working (i.e. _source_) x and y. Until now we've generally just been continuing to call those (r,g), but in the _lowp code that won't be possible (r+g hold x together, b+a y) but we'll have the ability to just give them proper names x and y. Change-Id: Id5faa09c4406116df5df7494efc6cb23659e9a2f Reviewed-on: https://skia-review.googlesource.com/60820 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* set SkJumper_kMaxStride to 16Gravatar Mike Klein2017-10-17
| | | | | | | | | | | | | | | It's properly 16 today because of HSW/lowp stages handling 16 pixels at a time, but it hasn't yet had an effect on lowp so we didn't notice. As we add lowp shader stages this will start to matter, so might as well bump it up to 16 now. (One day _skx lowp stages could bump this up to 32.) Change-Id: Idd8185c08e12dc657389a35bf659662c9670f98a Reviewed-on: https://skia-review.googlesource.com/60565 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
* check unpremul scale directly against infinityGravatar Mike Klein2017-10-12
| | | | | | | | | | | | | There are non-zero values of a that make infinite 1.0f/a. Let's just check for the real thing we care about, that scale is finite. Bug: skia:7123 Change-Id: If97574c9f3f2f0b73c749d0bea9aa19e6114f4d1 Reviewed-on: https://skia-review.googlesource.com/58460 Reviewed-by: Mike Reed <reed@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>