path: root/src/jumper
Commit message (author, date)
* Update 2pt conical gradient in raster pipeline (Yuqian Li, 2018-01-05)

  The updated algorithm matches our new GPU algorithm (https://skia.org/dev/design/conical) and it brings about 7%-26% speedup. In the next CL, I'll simplify the GPU code by reusing the CPU code in this CL.

   7.20% faster in gradient_conical_clamp_hicolor
   8.94% faster in gradient_conicalZero_clamp_hicolor
  10.00% faster in gradient_conicalOut_clamp_hicolor
  11.72% faster in gradient_conicalOutZero_clamp_hicolor
  13.62% faster in gradient_conical_clamp_3color
  16.52% faster in gradient_conicalZero_clamp_3color
  17.48% faster in gradient_conical_clamp
  17.70% faster in gradient_conical_clamp_shallow
  20.60% faster in gradient_conicalOut_clamp_3color
  20.98% faster in gradient_conicalOutZero_clamp_3color
  21.79% faster in gradient_conicalZero_clamp
  22.48% faster in gradient_conicalOut_clamp
  26.13% faster in gradient_conicalOutZero_clamp

  Bug: skia:
  Change-Id: Ia159495e1c77658cb28e48c9edf84938464e501c
  Reviewed-on: https://skia-review.googlesource.com/90262
  Commit-Queue: Yuqian Li <liyuqian@google.com>
  Reviewed-by: Mike Klein <mtklein@chromium.org>
* fold SkJumper_vectors.h into SkJumper_stages.cpp (Mike Klein, 2018-01-01)

  This brings a little more symmetry to _stages.cpp and _stages_lowp.cpp.

  Change-Id: Icfcbd3f264ab97d8445ad8e14c25b4a07c780aea
  Reviewed-on: https://skia-review.googlesource.com/90030
  Reviewed-by: Mike Klein <mtklein@chromium.org>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* attempt 3: add experimental bilerp_clamp_8888 stage (Mike Klein, 2017-12-22)

  It looks like we can specialize hot image shaders into their own single stages for a good speedup on both x86 and ARM.

  I've started here with bilerp_clamp_8888, and will follow up with bgra and 565, and lowp versions of those, and probably also the same for nearest neighbors.

  All pixels are identical in GMs.

  This time, rewrite the loop over sample points to be a little friendlier to 32-bit x86 code generation. The previous version created an object file indirection feature build_stages.py can't handle.

  CQ_INCLUDE_TRYBOTS=skia.primary:Test-Android-Clang-NexusPlayer-CPU-Moorefield-x86-Release-All-Android,Test-Android-Clang-NexusPlayer-GPU-PowerVR-x86-Release-All-Android

  Change-Id: I150b6af4a5b89e009dc04ca69e1857892e173deb
  Reviewed-on: https://skia-review.googlesource.com/89180
  Reviewed-by: Mike Klein <mtklein@chromium.org>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* GOOGLE3 -> SK_BUILD_FOR_GOOGLE3 (Mike Klein, 2017-12-19)

  This is more consistent with our other SK_BUILD_FOR_... macros, and less likely to collide with other preprocessor logic.

  (Luckily, this was defined in public.bzl, so we can do this all in one CL in the Skia repo.)

  Change-Id: I5f232888288c9c53fad445545d983d0fb0b4add8
  Reviewed-on: https://skia-review.googlesource.com/86940
  Reviewed-by: Mike Klein <mtklein@chromium.org>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* Revert "attempt 2: add experimental bilerp_clamp_8888 stage" (Mike Klein, 2017-12-18)

  This reverts commit 8a64e52a98d178be13fd137b3b3a3c6aff457d85.

  Reason for revert:
    Test-Android-Clang-NexusPlayer-CPU-Moorefield-x86-Release-All-Android
    Test-Android-Clang-NexusPlayer-GPU-PowerVR-x86-Release-All-Android

  Original change's description:
  > attempt 2: add experimental bilerp_clamp_8888 stage
  >
  > It looks like we can specialize hot image shaders into their
  > own single stages for a good speedup on both x86 and ARM.
  >
  > I've started here with bilerp_clamp_8888, and will
  > follow up with bgra and 565, and lowp versions of those,
  > and probably also the same for nearest neighbors.
  >
  > All pixels are identical in GMs.
  >
  > Change-Id: Ib5ed6e528efd9e3eed96ba67d02fbec2e8133a81
  > Reviewed-on: https://skia-review.googlesource.com/86860
  > Reviewed-by: Mike Klein <mtklein@chromium.org>
  > Commit-Queue: Mike Klein <mtklein@chromium.org>

  TBR=mtklein@chromium.org,liyuqian@google.com

  Change-Id: I34409a7b4aee4fd54baee44f7fc53bd0982500fe
  No-Presubmit: true
  No-Tree-Checks: true
  No-Try: true
  Reviewed-on: https://skia-review.googlesource.com/86601
  Reviewed-by: Mike Klein <mtklein@google.com>
  Commit-Queue: Mike Klein <mtklein@google.com>
* attempt 2: add experimental bilerp_clamp_8888 stage (Mike Klein, 2017-12-18)

  It looks like we can specialize hot image shaders into their own single stages for a good speedup on both x86 and ARM.

  I've started here with bilerp_clamp_8888, and will follow up with bgra and 565, and lowp versions of those, and probably also the same for nearest neighbors.

  All pixels are identical in GMs.

  Change-Id: Ib5ed6e528efd9e3eed96ba67d02fbec2e8133a81
  Reviewed-on: https://skia-review.googlesource.com/86860
  Reviewed-by: Mike Klein <mtklein@chromium.org>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* Rework out-of-gamut handling in SkRasterPipeline (Mike Klein, 2017-12-18)

  Instead of trying to carefully manage the in-gamut / out-of-gamut state of the pipeline, let's do what a GPU would do, clamping to representable range in any float -> integer conversion.

  Most effects doing table lookups now clamp themselves internally, and the store_foo() methods clamp when the destination is fixed point. In turn the from_srgb() conversions and all future transfer function stages can care less about this stuff.

  If I'm thinking right, the _lowp side of things need not change at all, and that will soften the performance impact of this change. Anything that was fast to begin with was probably running a _lowp pipeline.

  Bug: skia:7419
  Change-Id: Id2e080ac240a97b900a1ac131c85d9e15f70af32
  Reviewed-on: https://skia-review.googlesource.com/85740
  Commit-Queue: Mike Klein <mtklein@chromium.org>
  Reviewed-by: Brian Osman <brianosman@google.com>
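  The clamp-at-conversion idea described in this commit can be sketched as follows. This is a minimal illustration, not Skia's actual store code: `to_unorm8` is a hypothetical helper name, and the scalar form stands in for what would be vector code in the pipeline.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper (an assumption, not a Skia function): convert one float
// channel to 8-bit unorm, clamping to the representable range [0,1] at the
// float -> integer conversion itself rather than tracking gamut state earlier.
inline uint8_t to_unorm8(float v) {
    float clamped = std::min(std::max(0.0f, v), 1.0f);
    // Scale to [0,255] and round to nearest by adding 0.5 before truncating.
    return static_cast<uint8_t>(clamped * 255.0f + 0.5f);
}
```

  With this shape, an out-of-gamut value such as 1.5f stores as 255 and a negative value stores as 0, so upstream stages do not need to know whether their inputs were in gamut.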
* add 0->0 path to approx_powf() (Mike Klein, 2017-12-15)

  The path involving approx_log2() and approx_pow2() does not produce 0. And it's probably not a good idea to think about what approx_log2(0) is anyway.

  Change-Id: If5f48298c5bd5565ae808ebdfbd02649f4dd3046
  Reviewed-on: https://skia-review.googlesource.com/85840
  Reviewed-by: Brian Osman <brianosman@google.com>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
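  A minimal sketch of the 0->0 special case, assuming approx_powf() is built as pow2(log2(x) * y). The bodies of `approx_log2` and `approx_pow2` below are stand-ins that call the standard library; the real approximations are not shown in this log and are the reason the guard is needed.

```cpp
#include <cmath>

// Stand-ins for the real approximations (assumptions for illustration only).
inline float approx_log2(float x) { return std::log2(x); }
inline float approx_pow2(float x) { return std::exp2(x); }

// x^y computed as 2^(log2(x) * y), with an explicit path mapping 0 to 0 so
// that approx_log2(0) is never evaluated.
inline float approx_powf(float x, float y) {
    if (x == 0.0f) {
        return 0.0f;
    }
    return approx_pow2(approx_log2(x) * y);
}
```

  In vector code the branch would presumably become a per-lane select rather than an `if`; the scalar form is used here only to keep the sketch short.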
* make SkColorSpace_New real (Mike Klein, 2017-12-14)

  Some interesting things are starting to fall out already, like the fact that I needed to add a gamma_dst stage to be able to draw into gamma-transfer-fn destinations.

  I've also had to pass an SkAlphaType through to the linearize functions so that they can maintain premul invariants. I'm not sure this is actually a good idea... if you can, please double-check my logic at SkRasterPipeline.cpp:128? If it's correct logic, I'm going to need to do it all over the place. But I imagine you don't do this and somehow get away with it.

  Change-Id: I42cd9b161b54287d674225103ad9e19f8b388959
  Reviewed-on: https://skia-review.googlesource.com/84680
  Commit-Queue: Mike Klein <mtklein@chromium.org>
  Reviewed-by: Brian Osman <brianosman@google.com>
* disable f16<->f32 instructions on Google3 (Mike Klein, 2017-12-13)

  We're still working out why these intrinsics aren't available.

  Long term options:
    - get the intrinsics to work
    - convert to inline asm

  Change-Id: I07edf1944daf01842f01b26ad874f62314d0f68f
  Reviewed-on: https://skia-review.googlesource.com/84222
  Reviewed-by: Mike Klein <mtklein@chromium.org>
* JUMPER_IS_AVX2 -> JUMPER_IS_HSW (Mike Klein, 2017-12-12)

  We need to be a bit more pedantic here to support builds that may be using AVX2 as part of their baseline but perhaps not enabling all the related features SkJumper would like to use. E.g. we've seen Tensorflow build with AVX2 and FMA, but not F16C.

  So check all three {AVX2,FMA,F16C}, and only then build stages in HSW mode. I've updated the define as a reminder.

  This only affects builds using these features for their _baseline_ stages... the offline-compiled stages in SkJumper_generated.S are not affected.

  Change-Id: I9bfb3bae3589d35043b748782cefa8c213726d6a
  Reviewed-on: https://skia-review.googlesource.com/84221
  Reviewed-by: Florin Malita <fmalita@chromium.org>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
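  The three-feature check can be sketched with the compiler's predefined macros. `__AVX2__`, `__FMA__` and `__F16C__` are the macros Clang and GCC define when the corresponding target features are enabled; how the real header spells the test is not shown in this log, and `baseline_mode` is a hypothetical helper added only so the result is observable.

```cpp
// Enter HSW mode only when all three features are part of the baseline.
// A build with -mavx2 -mfma but without -mf16c falls through to the other path.
#if defined(__AVX2__) && defined(__FMA__) && defined(__F16C__)
    #define JUMPER_IS_HSW
#endif

// Hypothetical helper (an assumption): reports which path was selected.
inline const char* baseline_mode() {
#if defined(JUMPER_IS_HSW)
    return "hsw";
#else
    return "not hsw";
#endif
}
```

  Compiling the same file with and without `-mf16c` alongside `-mavx2 -mfma` should be enough to see the mode flip.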
* a little SkJumper tidy up (Mike Klein, 2017-12-12)

  I noticed these little bits while working on that old-Clang fix.

    - We can force-inline anytime we've got Clang, not just when JUMPER_IS_OFFLINE.
    - The _aarch64 and _vfp4 WRAP functions are dead code, as they're never compiled offline now.

  Change-Id: I5850daded2ffcfe50ceeadc43f89fa8597df3387
  Reviewed-on: https://skia-review.googlesource.com/84060
  Commit-Queue: Mike Klein <mtklein@chromium.org>
  Commit-Queue: Florin Malita <fmalita@chromium.org>
  Reviewed-by: Florin Malita <fmalita@chromium.org>
* drop to scalar mode in some armv7 debug builds (Mike Klein, 2017-12-12)

  Older versions of Clang, at least vanilla 3.9 and "Apple LLVM version 8.1.0 (clang-802.0.42)" seem to crash when compiling SkJumper_stages.cpp with NEON for ARMv7 without at least -O1.

  So detect that case, and fall back to scalar code.

  Change-Id: I3c1595da491bef38c18f47f96690700c67fdc70e
  Reviewed-on: https://skia-review.googlesource.com/83980
  Reviewed-by: Florin Malita <fmalita@chromium.org>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* remove vfpv4 requirement for SkJumper on ARMv7 (Mike Klein, 2017-12-12)

  VFPv4 gives us two interesting features:
    - FMA
    - f16<->f32 conversions

  Even without FMAs, NEON still has non-fused MLA instructions. We don't really care about the fusedness of those mul-adds, so losing FMA here is kind of no big deal.

  We already maintain portable code to do f16<->f32 conversions, so it's not much of a maintenance hit to use that instead of the native instructions. To my knowledge software F16 rendering is not a performance critical mode of operation for any of our users.

  This drops our minimum requirement to basically just having NEON. Devices like the Nexus 7 2012 will now take SkJumper fast paths instead of portable code. (Though actually, we've only ever required NEON for _lowp... only the float code also needed vfpv4).

  The main file to look at here is actually SkJumper_vectors.h, where you will see all the substantive changes. The rest just kind of tears down most of the old complexity, and adds ABI to put just a little of it back. :)

  Change-Id: Ia9237117698729c91e5fa51126baf80748093bf4
  Bug: skia:
  Reviewed-on: https://skia-review.googlesource.com/83521
  Commit-Queue: Mike Klein <mtklein@chromium.org>
  Reviewed-by: Florin Malita <fmalita@chromium.org>
* Use first/second instead of min/max in 2pt conical gradient (Yuqian Li, 2017-12-11)

  Here's the tiny performance gain:

  $ python tools/calmbench/calmbench.py firstsecond --extraarg "-m conic"

  firstsecond (compared to master) is likely
    4.23% faster in gradient_conicalOut_clamp_3color
    4.23% faster in gradient_conicalOutZero_clamp_3color
    4.79% faster in gradient_conical_clamp_shallow_dither
    6.04% faster in gradient_conical_clamp_3color
    6.04% faster in gradient_conicalZero_clamp_3color
    6.42% faster in gradient_conicalOut_clamp
    6.43% faster in gradient_conicalOutZero_clamp
    6.74% faster in gradient_conical_clamp
    6.98% faster in gradient_conical_clamp_shallow
    6.98% faster in gradient_conicalZero_clamp

  Bug: skia:
  Change-Id: Id74866908b99753ed8b16a657d3f67c9255d0043
  Reviewed-on: https://skia-review.googlesource.com/76561
  Commit-Queue: Yuqian Li <liyuqian@google.com>
  Reviewed-by: Mike Klein <mtklein@chromium.org>
  Reviewed-by: Florin Malita <fmalita@chromium.org>
* Revert "add experimental bilerp_clamp_8888 stage" (Mike Klein, 2017-12-11)

  This reverts commit a7fa3377d24643d86117159f8a58d2ee66880a4d.

  Reason for revert: lots of crashing GPU bots.

  Original change's description:
  > add experimental bilerp_clamp_8888 stage
  >
  > It looks like we can specialize hot image shaders into their
  > own single stages for a good speedup on both x86 and ARM.
  >
  > I've started here with bilerp_clamp_8888, and will
  > follow up with bgra and 565, and lowp versions of those,
  > and probably also the same for nearest neighbors.
  >
  > All pixels are identical in GMs.
  >
  > Change-Id: I2f6995767cd38053d670b8d0bfdb71b687803d70
  > Reviewed-on: https://skia-review.googlesource.com/82100
  > Reviewed-by: Yuqian Li <liyuqian@google.com>
  > Commit-Queue: Mike Klein <mtklein@chromium.org>

  TBR=mtklein@chromium.org,mtklein@google.com,liyuqian@google.com

  Change-Id: If70abb91b69bcd781e395dd3ac05ff1eebb1169f
  No-Presubmit: true
  No-Tree-Checks: true
  No-Try: true
  Reviewed-on: https://skia-review.googlesource.com/83340
  Reviewed-by: Mike Klein <mtklein@chromium.org>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* add experimental bilerp_clamp_8888 stage (Mike Klein, 2017-12-11)

  It looks like we can specialize hot image shaders into their own single stages for a good speedup on both x86 and ARM.

  I've started here with bilerp_clamp_8888, and will follow up with bgra and 565, and lowp versions of those, and probably also the same for nearest neighbors.

  All pixels are identical in GMs.

  Change-Id: I2f6995767cd38053d670b8d0bfdb71b687803d70
  Reviewed-on: https://skia-review.googlesource.com/82100
  Reviewed-by: Yuqian Li <liyuqian@google.com>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* turn back on NEON lowp code... (Mike Klein, 2017-12-07)

  Turns out I was defining JUMPER_HAS_NEON_LOWP, but checking JUMPER_NEON_HAS_LOWP. 🤦

  Change-Id: Ib328190ce35a367bf3d08d8e66f0ab8791ccb8b2
  Reviewed-on: https://skia-review.googlesource.com/82320
  Reviewed-by: Brian Osman <brianosman@google.com>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* AVX2 specialization for lowp gradient lookup (Florin Malita, 2017-11-10)

  705.32 -> 457.76 gradient_sweep_clamp_3color
  609.38 -> 345.34 gradient_radial1_clamp_3color

  Change-Id: I0165ac8f004ee095ada4f12b33db0a94ae39fca3
  Reviewed-on: https://skia-review.googlesource.com/69902
  Reviewed-by: Mike Klein <mtklein@chromium.org>
  Commit-Queue: Florin Malita <fmalita@chromium.org>
* Revert "more powerful map()" (Greg Daniel, 2017-11-10)

  This reverts commit a3dd5ec3a769fb833ce77878cd4e551c15e5074d.

  Reason for revert: breaking build on Build-Debian9-Clang-x86_64_Release-Fast

  Original change's description:
  > more powerful map()
  >
  > Change-Id: Icbae002999a295e3a9d1d2e6046e686784d5f608
  > Reviewed-on: https://skia-review.googlesource.com/69901
  > Reviewed-by: Florin Malita <fmalita@chromium.org>
  > Commit-Queue: Mike Klein <mtklein@chromium.org>

  TBR=mtklein@chromium.org,fmalita@chromium.org

  Change-Id: Ice989dd6a6b2786f318791dd91f2c06f689cb979
  No-Presubmit: true
  No-Tree-Checks: true
  No-Try: true
  Reviewed-on: https://skia-review.googlesource.com/70105
  Reviewed-by: Greg Daniel <egdaniel@google.com>
  Commit-Queue: Greg Daniel <egdaniel@google.com>
* more powerful map() (Mike Klein, 2017-11-10)

  Change-Id: Icbae002999a295e3a9d1d2e6046e686784d5f608
  Reviewed-on: https://skia-review.googlesource.com/69901
  Reviewed-by: Florin Malita <fmalita@chromium.org>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* AVX2 gather for lowp (Florin Malita, 2017-11-10)

  Change-Id: I15f83a72645fed0ed8dca9c9aad66c5db5eb247a
  Reviewed-on: https://skia-review.googlesource.com/69920
  Commit-Queue: Florin Malita <fmalita@chromium.org>
  Reviewed-by: Mike Klein <mtklein@chromium.org>
* add some lowp gradient stages (Mike Klein, 2017-11-03)

  I was originally going to add these to help test a lowp dither, but after looking at diffs I don't think lowp dither is a good idea. Non-dithered lowp gradients look fine to me so far.

  I'd have done conics, but they scare me.

  Change-Id: I8f5e75aec726983186214845ca38cfa0d54496b3
  Reviewed-on: https://skia-review.googlesource.com/66460
  Commit-Queue: Mike Klein <mtklein@chromium.org>
  Reviewed-by: Florin Malita <fmalita@chromium.org>
* another batch of lowp stages (Mike Klein, 2017-11-01)

  The 4444 image in all_bitmap_configs now draws slightly differently before and after serialization. (It's serialized as 8888.) Still looks fine.

  Change-Id: I1396cf1550b6769a1734ed25d59bd5b1866dfacd
  Reviewed-on: https://skia-review.googlesource.com/65960
  Reviewed-by: Florin Malita <fmalita@chromium.org>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* Some lowp refactoring (Mike Klein, 2017-10-31)

  1) Move a couple stages around in the enum to places that make more sense, and gauss_a_to_rgba in the code too.

  2) Mirror the SkRasterPipeline stage enum with either:
       LOWP(st): the stage is implemented in low precision
       TODO(st): the stage should be lowp, but isn't
       NOPE(st): the stage shouldn't be done in lowp.

  3) Statically enforce that all stages are covered by one of LOWP, TODO, or NOPE.

  Change-Id: I06c7a7e470663ef73bf652c1b65c0d3c89f0d767
  Reviewed-on: https://skia-review.googlesource.com/63800
  Reviewed-by: Florin Malita <fmalita@chromium.org>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
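  One way such a mirrored list could be enforced statically is sketched below. The stage names, the `Precision` and `Entry` types, and the two checks are assumptions for illustration; the log does not show how the real file performs the check.

```cpp
// Hypothetical, shortened stage list; the real enum is much longer.
#define ALL_STAGES(M) M(load_8888) M(store_8888) M(srcover) M(matrix_perspective)

enum Stage {
#define M(st) st,
    ALL_STAGES(M)
#undef M
    kNumStages
};

enum class Precision { kLowp, kTodo, kNope };
struct Entry { Stage stage; Precision precision; };

// The mirror: every stage must appear exactly once, in enum order,
// tagged as LOWP, TODO, or NOPE.
#define LOWP(st) {st, Precision::kLowp},
#define TODO(st) {st, Precision::kTodo},
#define NOPE(st) {st, Precision::kNope},
constexpr Entry kMirror[] = {
    LOWP(load_8888)
    LOWP(store_8888)
    TODO(srcover)
    NOPE(matrix_perspective)
};
#undef LOWP
#undef TODO
#undef NOPE

// Check 1: the mirror has as many entries as the enum has stages.
static_assert(sizeof(kMirror) / sizeof(kMirror[0]) == kNumStages,
              "every stage needs a LOWP, TODO, or NOPE entry");

// Check 2: entry i names stage i, so nothing is skipped or repeated.
constexpr bool in_enum_order(int i) {
    return i == kNumStages || (kMirror[i].stage == i && in_enum_order(i + 1));
}
static_assert(in_enum_order(0), "mirror must list stages in enum order");
```

  Adding a stage to `ALL_STAGES` without classifying it in the mirror makes the first `static_assert` fail at compile time, which is the property the commit describes.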
* clean up SK_LEGACY_LOWP_STAGES (Mike Klein, 2017-10-31)

  Change-Id: I5629e74c4c13ddb9217fd3c2df3388030fa03f0c
  Reviewed-on: https://skia-review.googlesource.com/63780
  Reviewed-by: Florin Malita <fmalita@chromium.org>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* add srcover_bgra_8888 (Mike Klein, 2017-10-24)

  Chrome generally uses BGRA buffers, so srcover_rgba_8888 isn't really doing them any good. Probably a good idea to cover both kN32 options any time we specialize like this?

  There's one small diff, so I've lazily guarded this by SK_LEGACY_LOWP_STAGES, which I want to rebaseline today anyway.

  Change-Id: Ice672aa01a3fc83be0798580d6730a54df075478
  Reviewed-on: https://skia-review.googlesource.com/63301
  Commit-Queue: Mike Klein <mtklein@chromium.org>
  Reviewed-by: Mike Reed <reed@google.com>
* more easy lowp shader stages (Mike Klein, 2017-10-24)

  This fills out a couple more matrix and gather stages.

  Deletes a not particularly important unit test that was using a scale matrix in a weird, non-lowp compatible way.

  This will require guards for Blink layout tests.

  Change-Id: I54cb228ff541f771e8f4758f07d26c5161d48af3
  Reviewed-on: https://skia-review.googlesource.com/62520
  Reviewed-by: Mike Reed <reed@google.com>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* make enabling LOWP stages simpler (Mike Klein, 2017-10-23)

  This method is a little simpler macro-wise, and makes it easier to guard new lowp stages:

      LOWP(foo)
      LOWP(bar)
      #ifndef SK_LEGACY_LOWP_BAZ
          LOWP(baz)
      #endif

  Change-Id: I06392f5cf7a04651e7bf47e79f10f7da8520f5ab
  Reviewed-on: https://skia-review.googlesource.com/63141
  Reviewed-by: Florin Malita <fmalita@chromium.org>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* translate+scale -> scale+translate (Mike Klein, 2017-10-20)

  This is a no-op refactor.

  It's just always surprised me that the matrix_scale_translate stage expects [tx ty sx sy], when scales precede the translates in the names and in both normal row-major and column-major matrix layouts.

  This switches to [sx sy tx ty], scale then translate.

  Change-Id: I2d88701121ae8013facd5a28bb0ff520211db5a6
  Reviewed-on: https://skia-review.googlesource.com/62541
  Reviewed-by: Mike Reed <reed@google.com>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
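  A small sketch of a scale-then-translate stage reading its context in the new [sx sy tx ty] order. The `ScaleTranslate` struct and the scalar signature are hypothetical; the real stage operates on vectors and its context type is not shown in this log.

```cpp
// Hypothetical context laid out as [sx sy tx ty], scale first, then translate.
struct ScaleTranslate {
    float sx, sy;  // scale
    float tx, ty;  // translate
};

// Applies x' = x*sx + tx and y' = y*sy + ty in place.
inline void matrix_scale_translate(const ScaleTranslate& m, float* x, float* y) {
    *x = *x * m.sx + m.tx;
    *y = *y * m.sy + m.ty;
}
```

  The arithmetic is identical under either layout; only the order in which the four floats are stored and read changes, which is why the commit is a no-op refactor.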
* start on lowp shaders (Mike Klein, 2017-10-20)

  We're going to want to assign types to the stages depending on their inputs and outputs:

      GG: x,y -> x,y
      GP: x,y -> r,g,b,a
      PP: r,g,b,a -> r,g,b,a

  (There are a couple other degenerate cases here, where a stage ignores its inputs or creates no outputs, but we can always just pretend their null input or output is one type or the other arbitrarily.)

  The GG stages will be pretty much entirely float code, and the GP stages a mix of float math and byte stuff.

  Since we've chosen U16 to match our register size in _lowp land, we'll unpack each F register across two of those for transport between stages. This is a notional, free operation in both directions.

  Change-Id: I605311d0dc327a1a3a9d688173d9498c1658e715
  Reviewed-on: https://skia-review.googlesource.com/60800
  Reviewed-by: Herb Derby <herb@google.com>
  Reviewed-by: Florin Malita <fmalita@chromium.org>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
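  The "one F across two U16" transport can be pictured as a pure reinterpretation of bytes. The sketch below assumes 16 lanes and uses plain structs with `memcpy` in place of vector registers; `split` and `join` are hypothetical names, and in real vector code the operation would cost nothing.

```cpp
#include <cstdint>
#include <cstring>

constexpr int N = 16;                 // assumed lane count for lowp
struct U16 { uint16_t lane[N]; };     // 16 x 2 bytes = 32 bytes
struct F   { float    lane[N]; };     // 16 x 4 bytes = 64 bytes

static_assert(sizeof(F) == 2 * sizeof(U16), "one F fills exactly two U16s");

// Carry the low and high halves of an F in two U16 "registers".
inline void split(const F& f, U16* lo, U16* hi) {
    const char* bytes = reinterpret_cast<const char*>(&f);
    std::memcpy(lo, bytes, sizeof(U16));
    std::memcpy(hi, bytes + sizeof(U16), sizeof(U16));
}

// Reassemble the F from the two halves; no bits are changed along the way.
inline F join(const U16& lo, const U16& hi) {
    F f;
    char* bytes = reinterpret_cast<char*>(&f);
    std::memcpy(bytes, &lo, sizeof(U16));
    std::memcpy(bytes + sizeof(U16), &hi, sizeof(U16));
    return f;
}
```

  Under this picture, a GG stage would join two incoming U16 pairs into x and y, do its float math, and split the results again for the next stage.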
* Feed seed_shader() iota through a context pointer. (Mike Klein, 2017-10-18)

  As this array grows longer it causes troublesome code generation when we're compiling offline, but it's easy as an argument.

  Change-Id: I53526443f534f29d3bff17c3aec24a9e916c9b86
  Reviewed-on: https://skia-review.googlesource.com/60564
  Commit-Queue: Mike Klein <mtklein@chromium.org>
  Reviewed-by: Herb Derby <herb@google.com>
* rename (x,y) to (dx,dy) (Mike Klein, 2017-10-18)

  Today (x,y) are the integer coordinates of the first destination pixel we're working on. By renaming them (dx,dy), we free up the names (x,y) for working (i.e. _source_) x and y.

  Until now we've generally just been continuing to call those (r,g), but in the _lowp code that won't be possible (r+g hold x together, b+a y) but we'll have the ability to just give them proper names x and y.

  Change-Id: Id5faa09c4406116df5df7494efc6cb23659e9a2f
  Reviewed-on: https://skia-review.googlesource.com/60820
  Reviewed-by: Herb Derby <herb@google.com>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* set SkJumper_kMaxStride to 16 (Mike Klein, 2017-10-17)

  It's properly 16 today because of HSW/lowp stages handling 16 pixels at a time, but it hasn't yet had an effect on lowp so we didn't notice.

  As we add lowp shader stages this will start to matter, so might as well bump it up to 16 now.

  (One day _skx lowp stages could bump this up to 32.)

  Change-Id: Idd8185c08e12dc657389a35bf659662c9670f98a
  Reviewed-on: https://skia-review.googlesource.com/60565
  Reviewed-by: Mike Klein <mtklein@chromium.org>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* check unpremul scale directly against infinity (Mike Klein, 2017-10-12)

  There are non-zero values of a that make infinite 1.0f/a. Let's just check for the real thing we care about, that scale is finite.

  Bug: skia:7123
  Change-Id: If97574c9f3f2f0b73c749d0bea9aa19e6114f4d1
  Reviewed-on: https://skia-review.googlesource.com/58460
  Reviewed-by: Mike Reed <reed@google.com>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
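  A minimal scalar sketch of checking the scale rather than the alpha. `unpremul_scale` is a hypothetical helper; the real stage works on vectors and its exact comparison is not shown in this log. The sketch assumes IEEE 754 float arithmetic, where the reciprocal of a tiny denormal alpha overflows to infinity even though the alpha is non-zero.

```cpp
#include <cmath>
#include <limits>

// Returns 1/a when that is finite, and 0 otherwise. Testing the result
// directly covers both a == 0 and tiny non-zero a whose reciprocal overflows.
inline float unpremul_scale(float a) {
    float scale = 1.0f / a;
    return std::isfinite(scale) ? scale : 0.0f;
}
```

  For example, `std::numeric_limits<float>::denorm_min()` is non-zero, so a test of `a != 0` would let an infinite scale through, while the finiteness check rejects it.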
* clamp to [0,1] in all gradient tilers (Mike Klein, 2017-10-05)

  Today gradient mirror and repeat don't explicitly clamp. They work fine for normal float values, but blow up with inputs like infinity and NaN, and those aren't hard to construct with a combination of a funky matrix and some squaring for xy -> radius.

  So explicitly clamp in each of the three matrix tilers.

  This should fix the fuzz at the associated bug.

  Bug: skia:7093
  Change-Id: Idd44e3c7a1ed95e2b1ace8eb953b62eddeb4e00e
  Reviewed-on: https://skia-review.googlesource.com/55702
  Reviewed-by: Herb Derby <herb@google.com>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
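  The effect of an explicit clamp after repeat and mirror can be sketched in scalar form. The helper names `clamp01`, `repeat_t` and `mirror_t` are hypothetical, and the repeat and mirror formulas are common textbook forms rather than a copy of the real stages.

```cpp
#include <algorithm>
#include <cmath>

// Clamp to [0,1]. std::max(0.0f, v) returns its first argument when the
// comparison (0.0f < v) is false, which includes v being NaN, so NaN maps to 0.
inline float clamp01(float v) {
    return std::min(std::max(0.0f, v), 1.0f);
}

// Repeat: keep the fractional part. For t = infinity this computes
// inf - inf = NaN, which the clamp then turns into 0.
inline float repeat_t(float t) {
    return clamp01(t - std::floor(t));
}

// Mirror: fold t into [0,1] with period 2, then clamp for the same reason.
inline float mirror_t(float t) {
    float u = t - 1.0f;
    return clamp01(std::fabs(u - 2.0f * std::floor(u * 0.5f) - 1.0f));
}
```

  Without the final clamp, a NaN or infinite t would flow straight into the gradient lookup that follows; with it, the lookup always sees a value in [0,1].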
* reformat hard-to-read preprocessor in SkJumper.cpp (Mike Klein, 2017-10-05)

  Change-Id: I9a140e342e7b12b1cbb09503ca8fc03016717784
  Reviewed-on: https://skia-review.googlesource.com/55701
  Reviewed-by: Mike Klein <mtklein@chromium.org>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* add _skx stages (Mike Klein, 2017-10-02)

  This just makes sure all the plumbing is in place to use the Skylake Xeon subset of AVX-512 instructions. So far,
    - no Windows
    - no lowp
    - nothing explicitly making use of AVX-512 registers or instructions

  This initial pass should run essentially identically to the _hsw AVX2 code we've been using previously. Clang _does_ use AVX-512-only instructions to implement some of the higher-level concepts we've coded, but it's really a pretty subtle difference.

  Next steps will bump N from 8 to 16 and start threading through an AVX-512-friendly mask instead of tail. I'll also want to take a harder look at how we do blending like if_then_else()... the default codegen here doesn't really take advantage of AVX-512 the way I'd like here.

  CQ_INCLUDE_TRYBOTS=skia.primary:Test-Debian9-Clang-GCE-CPU-AVX512-x86_64-Debug

  Change-Id: I6c9442488a449ea4770617bb22b2669859cc92e2
  Reviewed-on: https://skia-review.googlesource.com/54062
  Commit-Queue: Mike Klein <mtklein@chromium.org>
  Reviewed-by: Herb Derby <herb@google.com>
* Move context types into STAGE() macros. (Mike Klein, 2017-09-28)

  This is something I came up with while writing _lowp.cpp.

  This should all be a logical no-op, but there are some code generation changes. I'm not exactly sure why.

  Change-Id: Iaad36b5298b37fe26ebd375a147a48852f98e1e4
  Reviewed-on: https://skia-review.googlesource.com/52003
  Commit-Queue: Mike Klein <mtklein@chromium.org>
  Reviewed-by: Herb Derby <herb@google.com>
* Always zero vectors in start_pipeline(). (Mike Klein, 2017-09-28)

  The lowp start_pipeline() always zeros, and with floats we always zero when compiled as part of Skia, so this just makes the offline float consistent with the others.

  It's getting confusing to think about which code zeros and which doesn't, and it'd be nicer to be able to rely on zeros.

  This should change code generation only to the start_pipelines in the .S files.

  Change-Id: I1178b83c01e609e40dc7912d8d56df8e36eb339d
  Reviewed-on: https://skia-review.googlesource.com/52001
  Reviewed-by: Florin Malita <fmalita@chromium.org>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* Also mask t itself in mask_2pt_conical_degenerates. (Mike Klein, 2017-09-28)

  We look at t to create a mask in mask_2pt_conical_degenerates to be applied later to the colors after the normal gradient stages have run. But if t itself is NaN, that will wreak havoc in the normal gradient stages.

  So in addition to building the mask to kill off degenerate colors, let's also set degenerate t to zero, which should be a safe value.

  This fixes the fuzz mentioned in this bug.

  BUG=skia:7078

  Change-Id: I8301450c707bdbf941abd0339959f9e60d46d955
  Reviewed-on: https://skia-review.googlesource.com/52763
  Reviewed-by: Florin Malita <fmalita@chromium.org>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* remove __attribute__((disable_tail_calls)) (Mike Klein, 2017-09-27)

  This is a no-op in terms of generated code.

  There is no longer a tail call here to be disabled, not since we changed start_pipeline() to operate in 2D.

  Change-Id: Ife92590eb059e28e4a84e3729180c7410a93b410
  Reviewed-on: https://skia-review.googlesource.com/52020
  Reviewed-by: Mike Klein <mtklein@chromium.org>
  Reviewed-by: Herb Derby <herb@google.com>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* rename kStride N (Mike Klein, 2017-09-27)

  This is a no-op refactor to make SkJumper_stages.cpp and SkJumper_stages_lowp.cpp more similar.

  Change-Id: Icb5dd415d105fbdc58ce0b9b63058c0a66ed4a13
  Reviewed-on: https://skia-review.googlesource.com/52000
  Reviewed-by: Mike Klein <mtklein@chromium.org>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* Fold clamp_{x,y} into the gathers. (Mike Klein, 2017-09-22)

  All three image tile modes go through exclusive_clamp() and then a gather today, so we can move the work of exclusive_clamp() into each gather_ stage, eliminating the need for clamp_{x,y} stages.

  Luckily, we've got a convenient place to bottleneck this, ptr_and_ix(), which works out the pointer and vector of indices to load for gathers.

  This deletes SkRasterPipeline_repeat_tiling unit test, which now no longer exactly makes sense. It tests that repeat_x does that clamp, but that's now done automatically outside that stage.

  Change-Id: I24637ef60921bec7aa00082984c0c6a49dd86ca9
  Reviewed-on: https://skia-review.googlesource.com/50260
  Commit-Queue: Mike Klein <mtklein@chromium.org>
  Reviewed-by: Mike Reed <reed@google.com>
  Reviewed-by: Florin Malita <fmalita@chromium.org>
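  Folding the clamp into the index computation can be sketched as below. `GatherCtx`, `clamped_index` and `gather` are hypothetical scalar stand-ins for the role the commit assigns to ptr_and_ix(); the real function produces a pointer plus a vector of indices, and its clamp bounds may be computed differently.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical gather context: pixel storage plus its dimensions.
struct GatherCtx {
    const uint32_t* pixels;
    int stride;   // pixels per row
    int width;
    int height;
};

// Clamp the sample coordinates to the last valid pixel in each axis, then
// turn them into a linear index. Every gather goes through here, so no
// separate clamp_x / clamp_y stage is needed beforehand.
inline int clamped_index(const GatherCtx& ctx, float x, float y) {
    int ix = static_cast<int>(std::min(std::max(0.0f, x), float(ctx.width  - 1)));
    int iy = static_cast<int>(std::min(std::max(0.0f, y), float(ctx.height - 1)));
    return iy * ctx.stride + ix;
}

inline uint32_t gather(const GatherCtx& ctx, float x, float y) {
    return ctx.pixels[clamped_index(ctx, x, y)];
}
```

  Because the clamp lives at the single point where indices are formed, a tiling stage that produces a slightly out-of-range coordinate can no longer cause an out-of-bounds load.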
* count total non-lowp runs (Mike Klein, 2017-09-21)

  Change-Id: I2e24c990983ea93cbd7983c9c4e88120c2b7f358
  Reviewed-on: https://skia-review.googlesource.com/49768
  Reviewed-by: Mike Klein <mtklein@chromium.org>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* Retry, Bump stored lowp uniform color to 16-bit storage. (Mike Klein, 2017-09-16)

  This makes loading into 16-bit channels more natural in _lowp.cpp.

  Update a unit test to stop using out-of-range "colors".

  Change-Id: I494687aac87948b60a40de447aa1527cf7167b2d
  Cq-Include-Trybots: skia.primary:Test-Debian9-Clang-GCE-CPU-AVX2-x86_64-Release-UBSAN_float_cast_overflow
  Reviewed-on: https://skia-review.googlesource.com/47580
  Reviewed-by: Mike Klein <mtklein@chromium.org>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* Implement some easy _lowp stages. (Mike Klein, 2017-09-16)

    - load_565 allows 565-src sprite blits
    - scale_565 / lerp_565 allow subpixel text
    - luminance_to_alpha is a color filter, and lets us write grey 8

  And update CachedDecodingPixelRefTest with a yet more robust color.

  Change-Id: I8af499c43f0f28093744d9c2993af553e36c9526
  Reviewed-on: https://skia-review.googlesource.com/47021
  Commit-Queue: Mike Klein <mtklein@chromium.org>
  Reviewed-by: Mike Reed <reed@google.com>
* Revert "Bump stored lowp uniform color to 16-bit storage." (Mike Klein, 2017-09-16)

  This reverts commit d286bfbd96f8b7ccf1cbce74f07d2f3917dbec30.

  Reason for revert:

  ../../../src/core/SkRasterPipeline.cpp:98:34: runtime error: 4.87906e+09 is outside the range of representable values of type 'unsigned short'

  Excellent new bot!

  Original change's description:
  > Bump stored lowp uniform color to 16-bit storage.
  >
  > This makes loading into 16-bit channels more natural in _lowp.cpp.
  >
  > Change-Id: I1ed393873654060ef52f4632d670465528006bbd
  > Reviewed-on: https://skia-review.googlesource.com/47261
  > Reviewed-by: Mike Reed <reed@google.com>
  > Commit-Queue: Mike Klein <mtklein@chromium.org>

  TBR=mtklein@chromium.org,reed@google.com

  Change-Id: Ia65645c1261a7b31588c4ddaf2b1b3b327d265b0
  No-Presubmit: true
  No-Tree-Checks: true
  No-Try: true
  Reviewed-on: https://skia-review.googlesource.com/47540
  Reviewed-by: Mike Klein <mtklein@google.com>
  Commit-Queue: Mike Klein <mtklein@google.com>
* Disable SkJumper assembly in cross builds for now. (Nico Weber, 2017-09-16)

  Bug: chromium:762167
  Change-Id: Ia23f6dbfc0466aef4ca9d1a5b9ff343d79dc83bb
  Reviewed-on: https://skia-review.googlesource.com/47460
  Reviewed-by: Mike Klein <mtklein@chromium.org>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* Bump stored lowp uniform color to 16-bit storage. (Mike Klein, 2017-09-16)

  This makes loading into 16-bit channels more natural in _lowp.cpp.

  Change-Id: I1ed393873654060ef52f4632d670465528006bbd
  Reviewed-on: https://skia-review.googlesource.com/47261
  Reviewed-by: Mike Reed <reed@google.com>
  Commit-Queue: Mike Klein <mtklein@chromium.org>