path: root/src/jumper/SkJumper_stages.cpp
Commit message (Author, Age)
* Revert "attempt 2: add experimental bilerp_clamp_8888 stage" (Mike Klein, 2017-12-18)

  This reverts commit 8a64e52a98d178be13fd137b3b3a3c6aff457d85.

  Reason for revert:
  Test-Android-Clang-NexusPlayer-CPU-Moorefield-x86-Release-All-Android
  Test-Android-Clang-NexusPlayer-GPU-PowerVR-x86-Release-All-Android

  Original change's description:
  > attempt 2: add experimental bilerp_clamp_8888 stage
  >
  > It looks like we can specialize hot image shaders into their
  > own single stages for a good speedup on both x86 and ARM.
  >
  > I've started here with bilerp_clamp_8888, and will
  > follow up with bgra and 565, and lowp versions of those,
  > and probably also the same for nearest neighbors.
  >
  > All pixels are identical in GMs.
  >
  > Change-Id: Ib5ed6e528efd9e3eed96ba67d02fbec2e8133a81
  > Reviewed-on: https://skia-review.googlesource.com/86860
  > Reviewed-by: Mike Klein <mtklein@chromium.org>
  > Commit-Queue: Mike Klein <mtklein@chromium.org>

  TBR=mtklein@chromium.org,liyuqian@google.com

  Change-Id: I34409a7b4aee4fd54baee44f7fc53bd0982500fe
  No-Presubmit: true
  No-Tree-Checks: true
  No-Try: true
  Reviewed-on: https://skia-review.googlesource.com/86601
  Reviewed-by: Mike Klein <mtklein@google.com>
  Commit-Queue: Mike Klein <mtklein@google.com>
* attempt 2: add experimental bilerp_clamp_8888 stage (Mike Klein, 2017-12-18)

  It looks like we can specialize hot image shaders into their own single stages for a good speedup on both x86 and ARM.

  I've started here with bilerp_clamp_8888, and will follow up with bgra and 565, and lowp versions of those, and probably also the same for nearest neighbors.

  All pixels are identical in GMs.

  Change-Id: Ib5ed6e528efd9e3eed96ba67d02fbec2e8133a81
  Reviewed-on: https://skia-review.googlesource.com/86860
  Reviewed-by: Mike Klein <mtklein@chromium.org>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* Rework out-of-gamut handling in SkRasterPipeline (Mike Klein, 2017-12-18)

  Instead of trying to carefully manage the in-gamut / out-of-gamut state of the pipeline, let's do what a GPU would do: clamp to the representable range in any float -> integer conversion.

  Most effects doing table lookups now clamp themselves internally, and the store_foo() methods clamp when the destination is fixed point. In turn, the from_srgb() conversions and all future transfer function stages no longer need to worry about this.

  If I'm thinking right, the _lowp side of things need not change at all, and that will soften the performance impact of this change. Anything that was fast to begin with was probably running a _lowp pipeline.

  Bug: skia:7419
  Change-Id: Id2e080ac240a97b900a1ac131c85d9e15f70af32
  Reviewed-on: https://skia-review.googlesource.com/85740
  Commit-Queue: Mike Klein <mtklein@chromium.org>
  Reviewed-by: Brian Osman <brianosman@google.com>
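Below is a minimal scalar sketch of the rule this commit adopts: clamp only at the float-to-fixed-point conversion. `to_unorm8` and this `store_8888` are hypothetical stand-ins, not the actual vectorized SkJumper stages. Mapping NaN to 0 is an assumption of the sketch.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical scalar helper: convert one float channel to 8 bits, clamping
// to the representable [0,1] range first, as a GPU would. Mapping NaN to 0
// is an assumption of this sketch.
static inline uint8_t to_unorm8(float v) {
    if (!(v > 0.0f)) return 0;     // negatives and NaN
    if (v >= 1.0f)   return 255;   // too-large values, including +inf
    return static_cast<uint8_t>(std::lround(v * 255.0f));
}

// Hypothetical store_8888-style packing (R in the low byte, A in the high byte).
// Because the clamp happens here, earlier stages may leave values out of gamut.
static inline uint32_t store_8888(float r, float g, float b, float a) {
    return  uint32_t(to_unorm8(r))
         | (uint32_t(to_unorm8(g)) <<  8)
         | (uint32_t(to_unorm8(b)) << 16)
         | (uint32_t(to_unorm8(a)) << 24);
}
```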
* make SkColorSpace_New real (Mike Klein, 2017-12-14)

  Some interesting things are starting to fall out already, like the fact that I needed to add a gamma_dst stage to be able to draw into gamma-transfer-fn destinations.

  I've also had to pass an SkAlphaType through to the linearize functions so that they can maintain premul invariants. I'm not sure this is actually a good idea... if you can, please double-check my logic at SkRasterPipeline.cpp:128. If it's correct, I'm going to need to do it all over the place. But I imagine you don't do this and somehow get away with it.

  Change-Id: I42cd9b161b54287d674225103ad9e19f8b388959
  Reviewed-on: https://skia-review.googlesource.com/84680
  Commit-Queue: Mike Klein <mtklein@chromium.org>
  Reviewed-by: Brian Osman <brianosman@google.com>
* JUMPER_IS_AVX2 -> JUMPER_IS_HSW (Mike Klein, 2017-12-12)

  We need to be a bit more pedantic here to support builds that may be using AVX2 as part of their baseline but perhaps not enabling all the related features SkJumper would like to use. For example, we've seen TensorFlow built with AVX2 and FMA, but not F16C.

  So check all three {AVX2,FMA,F16C}, and only then build stages in HSW mode. I've updated the define as a reminder.

  This only affects builds using these features for their _baseline_ stages; the offline-compiled stages in SkJumper_generated.S are not affected.

  Change-Id: I9bfb3bae3589d35043b748782cefa8c213726d6a
  Reviewed-on: https://skia-review.googlesource.com/84221
  Reviewed-by: Florin Malita <fmalita@chromium.org>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
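The three-way feature check described above might look like the following. It assumes the `__AVX2__`, `__FMA__`, and `__F16C__` macros, which GCC and Clang predefine when those features are enabled. `hsw_mode_allowed` and `built_in_hsw_mode` are illustrative helpers, not SkJumper code.

```cpp
// Compile-time selection, assuming the GCC/Clang predefined feature macros.
// AVX2 alone is not enough: FMA and F16C must also be enabled.
#if defined(__AVX2__) && defined(__FMA__) && defined(__F16C__)
    #define JUMPER_IS_HSW 1
#endif

// The same rule as a plain function, handy for reasoning about a
// configuration such as "AVX2 + FMA, but no F16C".
constexpr bool hsw_mode_allowed(bool avx2, bool fma, bool f16c) {
    return avx2 && fma && f16c;
}

// Reports which way this translation unit was actually built.
constexpr bool built_in_hsw_mode() {
#if defined(JUMPER_IS_HSW)
    return true;
#else
    return false;
#endif
}
```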
* a little SkJumper tidy up (Mike Klein, 2017-12-12)

  I noticed these little bits while working on that old-Clang fix.

    - We can force-inline anytime we've got Clang, not just when JUMPER_IS_OFFLINE.
    - The _aarch64 and _vfp4 WRAP functions are dead code, as they're never compiled offline now.

  Change-Id: I5850daded2ffcfe50ceeadc43f89fa8597df3387
  Reviewed-on: https://skia-review.googlesource.com/84060
  Commit-Queue: Mike Klein <mtklein@chromium.org>
  Commit-Queue: Florin Malita <fmalita@chromium.org>
  Reviewed-by: Florin Malita <fmalita@chromium.org>
* remove vfpv4 requirement for SkJumper on ARMv7 (Mike Klein, 2017-12-12)

  VFPv4 gives us two interesting features:

    - FMA
    - f16<->f32 conversions

  Even without FMAs, NEON still has non-fused MLA instructions. We don't really care about the fusedness of those mul-adds, so losing FMA here is no big deal.

  We already maintain portable code to do f16<->f32 conversions, so it's not much of a maintenance hit to use that instead of the native instructions. To my knowledge, software F16 rendering is not a performance-critical mode of operation for any of our users.

  This drops our minimum requirement to basically just having NEON. Devices like the Nexus 7 2012 will now take SkJumper fast paths instead of portable code. (Actually, we've only ever required NEON for _lowp; only the float code also needed VFPv4.)

  The main file to look at here is actually SkJumper_vectors.h, where you will see all the substantive changes. The rest mostly tears down the old complexity, and adds ABI to put just a little of it back. :)

  Change-Id: Ia9237117698729c91e5fa51126baf80748093bf4
  Bug: skia:
  Reviewed-on: https://skia-review.googlesource.com/83521
  Commit-Queue: Mike Klein <mtklein@chromium.org>
  Reviewed-by: Florin Malita <fmalita@chromium.org>
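For reference, here is a scalar sketch of a portable f16 -> f32 conversion, the kind of code that replaces the VFPv4 instructions. It is not the SkJumper implementation, which works on vectors. Flushing half-precision denormals to zero is an assumption of the sketch.

```cpp
#include <cstdint>
#include <cstring>

// Portable IEEE binary16 -> binary32 for one value. Handles zero, normal
// numbers, infinity, and NaN; half denormals are flushed to signed zero
// (an assumption of this sketch).
static inline float half_to_float(uint16_t h) {
    uint32_t sign = uint32_t(h & 0x8000) << 16;   // move sign to bit 31
    uint32_t exp  = (h >> 10) & 0x1F;             // 5-bit exponent, bias 15
    uint32_t mant = h & 0x3FF;                    // 10-bit mantissa

    uint32_t bits;
    if (exp == 0) {
        bits = sign;                                   // zero or denormal -> signed zero
    } else if (exp == 0x1F) {
        bits = sign | 0x7F800000u | (mant << 13);      // infinity or NaN
    } else {
        // Rebias the exponent from 15 to 127 and widen the mantissa to 23 bits.
        bits = sign | ((exp + (127 - 15)) << 23) | (mant << 13);
    }
    float f;
    std::memcpy(&f, &bits, sizeof f);              // bit-cast without aliasing issues
    return f;
}
```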
* Use first/second instead of min/max in 2pt conical gradient (Yuqian Li, 2017-12-11)

  Here's the tiny performance gain:

    $ python tools/calmbench/calmbench.py firstsecond --extraarg "-m conic"
    firstsecond (compared to master) is likely
      4.23% faster in gradient_conicalOut_clamp_3color
      4.23% faster in gradient_conicalOutZero_clamp_3color
      4.79% faster in gradient_conical_clamp_shallow_dither
      6.04% faster in gradient_conical_clamp_3color
      6.04% faster in gradient_conicalZero_clamp_3color
      6.42% faster in gradient_conicalOut_clamp
      6.43% faster in gradient_conicalOutZero_clamp
      6.74% faster in gradient_conical_clamp
      6.98% faster in gradient_conical_clamp_shallow
      6.98% faster in gradient_conicalZero_clamp

  Bug: skia:
  Change-Id: Id74866908b99753ed8b16a657d3f67c9255d0043
  Reviewed-on: https://skia-review.googlesource.com/76561
  Commit-Queue: Yuqian Li <liyuqian@google.com>
  Reviewed-by: Mike Klein <mtklein@chromium.org>
  Reviewed-by: Florin Malita <fmalita@chromium.org>
* Revert "add experimental bilerp_clamp_8888 stage" (Mike Klein, 2017-12-11)

  This reverts commit a7fa3377d24643d86117159f8a58d2ee66880a4d.

  Reason for revert: lots of crashing GPU bots.

  Original change's description:
  > add experimental bilerp_clamp_8888 stage
  >
  > It looks like we can specialize hot image shaders into their
  > own single stages for a good speedup on both x86 and ARM.
  >
  > I've started here with bilerp_clamp_8888, and will
  > follow up with bgra and 565, and lowp versions of those,
  > and probably also the same for nearest neighbors.
  >
  > All pixels are identical in GMs.
  >
  > Change-Id: I2f6995767cd38053d670b8d0bfdb71b687803d70
  > Reviewed-on: https://skia-review.googlesource.com/82100
  > Reviewed-by: Yuqian Li <liyuqian@google.com>
  > Commit-Queue: Mike Klein <mtklein@chromium.org>

  TBR=mtklein@chromium.org,mtklein@google.com,liyuqian@google.com

  Change-Id: If70abb91b69bcd781e395dd3ac05ff1eebb1169f
  No-Presubmit: true
  No-Tree-Checks: true
  No-Try: true
  Reviewed-on: https://skia-review.googlesource.com/83340
  Reviewed-by: Mike Klein <mtklein@chromium.org>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* add experimental bilerp_clamp_8888 stage (Mike Klein, 2017-12-11)

  It looks like we can specialize hot image shaders into their own single stages for a good speedup on both x86 and ARM.

  I've started here with bilerp_clamp_8888, and will follow up with bgra and 565, and lowp versions of those, and probably also the same for nearest neighbors.

  All pixels are identical in GMs.

  Change-Id: I2f6995767cd38053d670b8d0bfdb71b687803d70
  Reviewed-on: https://skia-review.googlesource.com/82100
  Reviewed-by: Yuqian Li <liyuqian@google.com>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
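To make the idea of a fused bilerp_clamp_8888 stage concrete, here is a scalar sketch that clamps, gathers four RGBA_8888 neighbors, and blends them in one function. The `Image8888` struct, the byte order, and the pixel-center convention are assumptions. The real stage is vectorized and may differ in detail.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Hypothetical image description: tightly packed RGBA_8888 pixels with R in
// the low byte (an assumed layout).
struct Image8888 {
    const uint32_t* pixels;
    int width, height;
};

static inline float channel(uint32_t px, int c) {   // c: 0=R, 1=G, 2=B, 3=A
    return float((px >> (8 * c)) & 0xFF) / 255.0f;
}

// Clamp tiling: out-of-bounds coordinates read the nearest edge pixel.
static inline uint32_t fetch_clamped(const Image8888& img, int x, int y) {
    x = std::max(0, std::min(x, img.width  - 1));
    y = std::max(0, std::min(y, img.height - 1));
    return img.pixels[y * img.width + x];
}

// Bilinear sample at (x, y), with pixel centers at (i + 0.5, j + 0.5).
// Writes four channels in [0,1] to out[0..3].
static void bilerp_clamp_8888(const Image8888& img, float x, float y, float out[4]) {
    float fx = x - 0.5f, fy = y - 0.5f;
    float x0 = std::floor(fx), y0 = std::floor(fy);
    float tx = fx - x0, ty = fy - y0;                // weights toward the +1 neighbors
    int ix = int(x0), iy = int(y0);

    uint32_t p00 = fetch_clamped(img, ix,     iy),
             p10 = fetch_clamped(img, ix + 1, iy),
             p01 = fetch_clamped(img, ix,     iy + 1),
             p11 = fetch_clamped(img, ix + 1, iy + 1);

    for (int c = 0; c < 4; c++) {
        float top    = channel(p00, c) * (1 - tx) + channel(p10, c) * tx;
        float bottom = channel(p01, c) * (1 - tx) + channel(p11, c) * tx;
        out[c] = top * (1 - ty) + bottom * ty;
    }
}
```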
* add some lowp gradient stages (Mike Klein, 2017-11-03)

  I was originally going to add these to help test a lowp dither, but after looking at diffs I don't think lowp dither is a good idea. Non-dithered lowp gradients look fine to me so far.

  I'd have done conics, but they scare me.

  Change-Id: I8f5e75aec726983186214845ca38cfa0d54496b3
  Reviewed-on: https://skia-review.googlesource.com/66460
  Commit-Queue: Mike Klein <mtklein@chromium.org>
  Reviewed-by: Florin Malita <fmalita@chromium.org>
* Some lowp refactoring (Mike Klein, 2017-10-31)

  1) Move a couple of stages around in the enum to places that make more sense, and gauss_a_to_rgba in the code too.
  2) Mirror the SkRasterPipeline stage enum with either:
       LOWP(st): the stage is implemented in low precision
       TODO(st): the stage should be lowp, but isn't
       NOPE(st): the stage shouldn't be done in lowp
  3) Statically enforce that all stages are covered by one of LOWP, TODO, or NOPE.

  Change-Id: I06c7a7e470663ef73bf652c1b65c0d3c89f0d767
  Reviewed-on: https://skia-review.googlesource.com/63800
  Reviewed-by: Florin Malita <fmalita@chromium.org>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
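One way to get the static enforcement in item 3 is an X-macro stage list plus a `static_assert` over a mirrored table. The sketch below uses a four-stage list. Which stages are tagged LOWP, TODO, or NOPE is made up for illustration. The mechanism itself is an assumption and not necessarily how SkJumper_stages_lowp.cpp does it.

```cpp
// Hypothetical, heavily abbreviated stage list in X-macro style.
#define STAGE_LIST(M) M(load_8888) M(srcover) M(gauss_a_to_rgba) M(store_8888)

enum class Stage {
#define M(st) st,
    STAGE_LIST(M)
#undef M
};

#define COUNT_ONE(st) +1
constexpr int kNumStages = 0 STAGE_LIST(COUNT_ONE);   // expands to 0 +1 +1 +1 +1

// The lowp side mirrors the enum, tagging every stage exactly once.
enum class Lowp { Yes, Todo, Nope };
struct Entry { Stage stage; Lowp kind; };

#define LOWP(st) Entry{Stage::st, Lowp::Yes},
#define TODO(st) Entry{Stage::st, Lowp::Todo},
#define NOPE(st) Entry{Stage::st, Lowp::Nope},
constexpr Entry kLowpTable[] = {
    LOWP(load_8888)
    LOWP(srcover)
    NOPE(gauss_a_to_rgba)   // illustrative tag only
    LOWP(store_8888)
};

constexpr int kLowpEntries = sizeof(kLowpTable) / sizeof(kLowpTable[0]);

// Every stage must appear once, in enum order, so forgetting one fails to compile.
constexpr bool table_matches_enum() {
    if (kLowpEntries != kNumStages) return false;
    for (int i = 0; i < kLowpEntries; i++) {
        if (static_cast<int>(kLowpTable[i].stage) != i) return false;
    }
    return true;
}
static_assert(table_matches_enum(), "every stage needs LOWP, TODO, or NOPE");
```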
* add srcover_bgra_8888 (Mike Klein, 2017-10-24)

  Chrome generally uses BGRA buffers, so srcover_rgba_8888 isn't really doing them any good. It's probably a good idea to cover both kN32 options any time we specialize like this.

  There's one small diff, so I've lazily guarded this by SK_LEGACY_LOWP_STAGES, which I want to rebaseline today anyway.

  Change-Id: Ice672aa01a3fc83be0798580d6730a54df075478
  Reviewed-on: https://skia-review.googlesource.com/63301
  Commit-Queue: Mike Klein <mtklein@chromium.org>
  Reviewed-by: Mike Reed <reed@google.com>
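A separate BGRA variant helps because the source color lives in r,g,b,a order, while Chrome's destination bytes are stored B,G,R,A. The destination must therefore be swizzled as it is loaded and stored. Below is a scalar float sketch of premultiplied src-over into one BGRA_8888 pixel. It assumes B in the low byte and A in the high byte. `srcover_bgra_8888` here is a hypothetical stand-in for the real stage.

```cpp
#include <cstdint>

// Premultiplied src-over, d' = s + d * (1 - sa), into one BGRA_8888 pixel.
// The source color arrives as separate r,g,b,a floats in [0,1], as in the
// pipeline's registers; the destination bytes are B,G,R,A from low to high.
static inline uint32_t srcover_bgra_8888(float r, float g, float b, float a, uint32_t dst) {
    // Load and swizzle dst: B sits in the low byte for this layout.
    float db = float((dst >>  0) & 0xFF) / 255.0f,
          dg = float((dst >>  8) & 0xFF) / 255.0f,
          dr = float((dst >> 16) & 0xFF) / 255.0f,
          da = float((dst >> 24) & 0xFF) / 255.0f;
    float inv_a = 1.0f - a;
    // Round to nearest; assumes results stay within [0,1] (valid premul input).
    auto to8 = [](float v) { return uint32_t(v * 255.0f + 0.5f); };
    // Store back in the same B,G,R,A byte order.
    return  to8(b + db * inv_a)
         | (to8(g + dg * inv_a) <<  8)
         | (to8(r + dr * inv_a) << 16)
         | (to8(a + da * inv_a) << 24);
}
```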
* translate+scale -> scale+translate (Mike Klein, 2017-10-20)

  This is a no-op refactor. It's always surprised me that the matrix_scale_translate stage expects [tx ty sx sy], when scales precede the translates in the names and in both normal row-major and column-major matrix layouts.

  This switches to [sx sy tx ty], scale then translate.

  Change-Id: I2d88701121ae8013facd5a28bb0ff520211db5a6
  Reviewed-on: https://skia-review.googlesource.com/62541
  Reviewed-by: Mike Reed <reed@google.com>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
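With the new [sx sy tx ty] layout, a scalar version of the stage reads naturally as scale, then translate. `Point` and this function signature are hypothetical.

```cpp
struct Point { float x, y; };

// Hypothetical scalar scale-then-translate; ctx is laid out [sx sy tx ty].
static inline Point matrix_scale_translate(const float ctx[4], Point p) {
    const float sx = ctx[0], sy = ctx[1],
                tx = ctx[2], ty = ctx[3];
    return { p.x * sx + tx, p.y * sy + ty };
}
```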
* Feed seed_shader() iota through a context pointer. (Mike Klein, 2017-10-18)

  As this array grows longer, it causes troublesome code generation when we're compiling offline, but it's easy to pass as an argument.

  Change-Id: I53526443f534f29d3bff17c3aec24a9e916c9b86
  Reviewed-on: https://skia-review.googlesource.com/60564
  Commit-Queue: Mike Klein <mtklein@chromium.org>
  Reviewed-by: Herb Derby <herb@google.com>
* rename (x,y) to (dx,dy) (Mike Klein, 2017-10-18)

  Today (x,y) are the integer coordinates of the first destination pixel we're working on. By renaming them (dx,dy), we free up the names (x,y) for the working (i.e. _source_) x and y.

  Until now we've generally just kept calling those (r,g). In the _lowp code that won't be possible (r+g together hold x, and b+a hold y), but we'll be able to give them their proper names, x and y.

  Change-Id: Id5faa09c4406116df5df7494efc6cb23659e9a2f
  Reviewed-on: https://skia-review.googlesource.com/60820
  Reviewed-by: Herb Derby <herb@google.com>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* check unpremul scale directly against infinity (Mike Klein, 2017-10-12)

  There are non-zero values of a for which 1.0f/a is infinite. Let's just check the thing we really care about: that scale is finite.

  Bug: skia:7123
  Change-Id: If97574c9f3f2f0b73c749d0bea9aa19e6114f4d1
  Reviewed-on: https://skia-review.googlesource.com/58460
  Reviewed-by: Mike Reed <reed@google.com>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
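Below is a scalar sketch of the finite-scale check. A denormal alpha such as 1e-39f is non-zero, yet 1.0f/a overflows to infinity, so testing `scale` rather than `a` is the robust guard. The `unpremul` signature here is hypothetical. The check relies on IEEE semantics; flags such as `-ffinite-math-only` allow the compiler to assume `std::isfinite` is always true.

```cpp
#include <cmath>

// Unpremultiply r,g,b by alpha, guarding on the reciprocal itself.
// A non-zero denormal alpha still overflows 1.0f/a, so "a != 0" is not enough.
static inline void unpremul(float& r, float& g, float& b, float a) {
    float scale = 1.0f / a;
    if (!std::isfinite(scale)) {
        scale = 0.0f;          // a == 0 or a tiny denormal: treat as transparent
    }
    r *= scale;
    g *= scale;
    b *= scale;
}
```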
* clamp to [0,1] in all gradient tilers (Mike Klein, 2017-10-05)

  Today gradient mirror and repeat don't explicitly clamp. They work fine for normal float values, but blow up with inputs like infinity and NaN, and those aren't hard to construct with a combination of a funky matrix and some squaring for xy -> radius.

  So explicitly clamp in each of the three matrix tilers. This should fix the fuzz at the associated bug.

  Bug: skia:7093
  Change-Id: Idd44e3c7a1ed95e2b1ace8eb953b62eddeb4e00e
  Reviewed-on: https://skia-review.googlesource.com/55702
  Reviewed-by: Herb Derby <herb@google.com>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
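Below is a scalar sketch of tilers that each end in an explicit clamp. The NaN-to-0 behavior of `clamp_01` and the triangle-wave formula for mirror are choices made for this sketch, not taken from SkJumper. Infinity becomes NaN inside repeat and mirror (inf - inf), and the final clamp then maps it back into range.

```cpp
#include <cmath>

// Clamp to [0,1]. A NaN fails every ordered comparison, so it falls into
// the first branch and becomes 0 (an assumption of this sketch).
static inline float clamp_01(float t) {
    if (!(t >= 0.0f)) return 0.0f;   // negatives and NaN
    if (t > 1.0f)     return 1.0f;
    return t;
}

static inline float tile_clamp(float t)  { return clamp_01(t); }

// Repeat: keep the fractional part, then clamp anyway.
static inline float tile_repeat(float t) { return clamp_01(t - std::floor(t)); }

// Mirror: a triangle wave with period 2 (0 -> 0, 1 -> 1, 2 -> 0), then clamp.
static inline float tile_mirror(float t) {
    float u = t - 1.0f;
    return clamp_01(std::fabs(u - 2.0f * std::floor(u * 0.5f) - 1.0f));
}
```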
* add _skx stages (Mike Klein, 2017-10-02)

  This just makes sure all the plumbing is in place to use the Skylake Xeon subset of AVX-512 instructions. So far:

    - no Windows
    - no lowp
    - nothing explicitly making use of AVX-512 registers or instructions

  This initial pass should run essentially identically to the _hsw AVX2 code we've been using previously. Clang _does_ use AVX-512-only instructions to implement some of the higher-level concepts we've coded, but it's really a pretty subtle difference.

  Next steps will bump N from 8 to 16 and start threading through an AVX-512-friendly mask instead of tail. I'll also want to take a harder look at how we do blending like if_then_else()... the default codegen here doesn't really take advantage of AVX-512 the way I'd like.

  CQ_INCLUDE_TRYBOTS=skia.primary:Test-Debian9-Clang-GCE-CPU-AVX512-x86_64-Debug

  Change-Id: I6c9442488a449ea4770617bb22b2669859cc92e2
  Reviewed-on: https://skia-review.googlesource.com/54062
  Commit-Queue: Mike Klein <mtklein@chromium.org>
  Reviewed-by: Herb Derby <herb@google.com>
* Move context types into STAGE() macros. (Mike Klein, 2017-09-28)

  This is something I came up with while writing _lowp.cpp. This should all be a logical no-op, but there are some code generation changes. I'm not exactly sure why.

  Change-Id: Iaad36b5298b37fe26ebd375a147a48852f98e1e4
  Reviewed-on: https://skia-review.googlesource.com/52003
  Commit-Queue: Mike Klein <mtklein@chromium.org>
  Reviewed-by: Herb Derby <herb@google.com>
* Always zero vectors in start_pipeline(). (Mike Klein, 2017-09-28)

  The lowp start_pipeline() always zeros, and with floats we always zero when compiled as part of Skia, so this just makes the offline float code consistent with the others.

  It's getting confusing to think about which code zeros and which doesn't, and it'd be nicer to be able to rely on zeros.

  This should change code generation only for the start_pipeline functions in the .S files.

  Change-Id: I1178b83c01e609e40dc7912d8d56df8e36eb339d
  Reviewed-on: https://skia-review.googlesource.com/52001
  Reviewed-by: Florin Malita <fmalita@chromium.org>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* Also mask t itself in mask_2pt_conical_degenerates. (Mike Klein, 2017-09-28)

  We look at t to create a mask in mask_2pt_conical_degenerates to be applied later to the colors after the normal gradient stages have run. But if t itself is NaN, that will wreak havoc in the normal gradient stages.

  So in addition to building the mask to kill off degenerate colors, let's also set degenerate t to zero, which should be a safe value.

  This fixes the fuzz mentioned in this bug.

  BUG=skia:7078

  Change-Id: I8301450c707bdbf941abd0339959f9e60d46d955
  Reviewed-on: https://skia-review.googlesource.com/52763
  Reviewed-by: Florin Malita <fmalita@chromium.org>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
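The two-part fix can be sketched in scalar form: a 0/1 mask is kept for later, and t is made safe for the gradient stages that run in between. `DegenerateResult`, `mask_degenerate_t`, `apply_mask`, and the `radius_at_t` parameter are hypothetical. The degenerate conditions follow the focal-pt-on-edge commit in this log (t is NaN, or R(t) < 0). Zeroing t matters because multiplying a NaN color by a 0 mask would still give NaN.

```cpp
#include <cmath>

struct DegenerateResult { float t; float mask; };

// Build the mask and, at the same time, replace a degenerate t with 0 so
// the normal gradient stages only ever see a harmless value.
static inline DegenerateResult mask_degenerate_t(float t, float radius_at_t) {
    bool degenerate = std::isnan(t) || radius_at_t < 0.0f;
    return { degenerate ? 0.0f : t,
             degenerate ? 0.0f : 1.0f };
}

// Later, after the gradient stages: zero out colors for degenerate pixels,
// which yields transparent black.
static inline void apply_mask(float mask, float rgba[4]) {
    for (int i = 0; i < 4; i++) rgba[i] *= mask;
}
```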
* remove __attribute__((disable_tail_calls)) (Mike Klein, 2017-09-27)

  This is a no-op in terms of generated code. There is no longer a tail call here to be disabled, not since we changed start_pipeline() to operate in 2D.

  Change-Id: Ife92590eb059e28e4a84e3729180c7410a93b410
  Reviewed-on: https://skia-review.googlesource.com/52020
  Reviewed-by: Mike Klein <mtklein@chromium.org>
  Reviewed-by: Herb Derby <herb@google.com>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* rename kStride to N (Mike Klein, 2017-09-27)

  This is a no-op refactor to make SkJumper_stages.cpp and SkJumper_stages_lowp.cpp more similar.

  Change-Id: Icb5dd415d105fbdc58ce0b9b63058c0a66ed4a13
  Reviewed-on: https://skia-review.googlesource.com/52000
  Reviewed-by: Mike Klein <mtklein@chromium.org>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* Fold clamp_{x,y} into the gathers. (Mike Klein, 2017-09-22)

  All three image tile modes go through exclusive_clamp() and then a gather today, so we can move the work of exclusive_clamp() into each gather_ stage, eliminating the need for clamp_{x,y} stages.

  Luckily, we've got a convenient place to bottleneck this, ptr_and_ix(), which works out the pointer and vector of indices to load for gathers.

  This deletes the SkRasterPipeline_repeat_tiling unit test, which no longer exactly makes sense. It tests that repeat_x does that clamp, but that's now done automatically outside that stage.

  Change-Id: I24637ef60921bec7aa00082984c0c6a49dd86ca9
  Reviewed-on: https://skia-review.googlesource.com/50260
  Commit-Queue: Mike Klein <mtklein@chromium.org>
  Reviewed-by: Mike Reed <reed@google.com>
  Reviewed-by: Florin Malita <fmalita@chromium.org>
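Below is a scalar sketch of folding the clamp into the gather's index computation, in the spirit of ptr_and_ix(). `GatherCtx`, `gather_8888`, and this `exclusive_clamp` are hypothetical. Here the exclusive upper bound is the largest float below the limit, so truncating to int can never produce index == width. NaN coordinates fall to 0 because of the `std::max(0.0f, v)` argument order.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Hypothetical gather context: base pointer, row stride in pixels, and size.
struct GatherCtx {
    const uint32_t* pixels;
    int stride, width, height;
};

// Clamp v into [0, limit), excluding limit itself. std::max(0.0f, v)
// returns 0.0f when v is NaN, since the comparison 0 < NaN is false.
static inline float exclusive_clamp(float v, float limit) {
    float hi = std::nextafter(limit, 0.0f);   // largest float below limit
    return std::min(std::max(0.0f, v), hi);
}

// Clamp both coordinates, work out the index, and load one pixel.
static inline uint32_t gather_8888(const GatherCtx& ctx, float x, float y) {
    x = exclusive_clamp(x, float(ctx.width));
    y = exclusive_clamp(y, float(ctx.height));
    int ix = int(y) * ctx.stride + int(x);
    return ctx.pixels[ix];
}
```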
* centralize SI, Ctx, and load_and_inc() (Mike Klein, 2017-09-15)

  We've got independent definitions of SI, LazyCtx/Ctx, and load_and_inc() in _stages.cpp and _lowp.cpp. It's a good time to centralize them, taking _stages.cpp's SI and load_and_inc(), and _lowp's Ctx.

  SI and load_and_inc() are uninterestingly different. Using _lowp's Ctx will let us bring its prettier typed stage definitions into _stages.cpp, though that is not done here.

  This is a pure refactor with no generated code changes.

  Change-Id: I53260b0fdc71a77bf9e3ed6f3df3a2a4cbd2392b
  Reviewed-on: https://skia-review.googlesource.com/47181
  Reviewed-by: Mike Klein <mtklein@chromium.org>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* merge 0,1,2,3,... and 0.5f (Mike Klein, 2017-09-05)

  Because floats are fun, the compiler cannot merge x + 0.5f + [0,1,2,3,4,...] into x + [0.5,1.5,2.5,3.5,4.5,...]. But we can.

  Change-Id: I03b46c1ea0653877f35f6c888f29371b5f73d813
  Reviewed-on: https://skia-review.googlesource.com/42480
  Reviewed-by: Florin Malita <fmalita@chromium.org>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
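Here is the trick in sketch form, assuming 8 lanes. For small integer x, such as pixel coordinates, both versions give identical results because every intermediate value is exactly representable. For arbitrary floats the two can round differently, which is why the compiler won't reassociate on its own (absent flags like -ffast-math). The function names are hypothetical.

```cpp
#include <array>

constexpr int N = 8;   // lanes per vector in this sketch (assumed)

// Before: two adds per lane, since (x + 0.5f) + i cannot legally be
// rewritten as x + (i + 0.5f) for floats.
static std::array<float, N> centers_two_adds(float x) {
    static constexpr float iota[N] = {0, 1, 2, 3, 4, 5, 6, 7};
    std::array<float, N> out{};
    for (int i = 0; i < N; i++) out[i] = x + 0.5f + iota[i];
    return out;
}

// After: fold the 0.5f into the constant table by hand, one add per lane.
static std::array<float, N> centers_one_add(float x) {
    static constexpr float iota_plus_half[N] = {0.5f, 1.5f, 2.5f, 3.5f,
                                                4.5f, 5.5f, 6.5f, 7.5f};
    std::array<float, N> out{};
    for (int i = 0; i < N; i++) out[i] = x + iota_plus_half[i];
    return out;
}
```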
* no more need for a constants pointer (Mike Klein, 2017-08-29)

  The only reason we were keeping SkJumper_constants around is that it was hard to get float/integer iota vectors on arm64 without relocations. Now that we're compiling arm64 normally as part of Skia, we don't have to worry about relocations.

  This means we can kill the struct and stop passing around that pointer.

  Change-Id: I013c6a735947f3db2bc87f2bfa38b7520d2e2fce
  Reviewed-on: https://skia-review.googlesource.com/40200
  Commit-Queue: Mike Klein <mtklein@chromium.org>
  Reviewed-by: Herb Derby <herb@google.com>
* split up JUMPER define (Mike Klein, 2017-08-28)

  Whether or not JUMPER is defined is starting to mean too many things:

    - are we compiling offline (defined) or as part of Skia (!defined)?
    - are we using Clang vector extensions (defined) or scalars (!defined)?

  This splits JUMPER into these two separate concerns:

    - JUMPER_IS_OFFLINE
    - JUMPER_IS_SCALAR, JUMPER_IS_NEON, JUMPER_IS_AVX2, etc.

  The upshot is that we'll now use Clang vector extensions when available for our "portable" baseline. On x86-64 and ARMv8 compiled by Clang, we're guaranteed to pick up SSE2 and NEON respectively. Our -Fast bot should even get all the way to AVX2.

  Another CL will do some refactoring in SkJumper to remove the redundant copies of guaranteed vector code on x86-64 and ARMv8. I didn't want to do that here yet, to demonstrate that this CL has zero effect on the .S files.

  Change-Id: Ib5e8f00b35e8721b2cc7180e294840ffaf9dddce
  Reviewed-on: https://skia-review.googlesource.com/39500
  Reviewed-by: Herb Derby <herb@google.com>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* rework plus blend mode (Mike Klein, 2017-08-24)

  The most interesting part of this is how plus interacts with partial coverage. Plus needs its clamp to happen after the lerp. Luckily, some of its math folds away:

        d' = clamp[ d*(1-c) + (s+d)*c ]
          == clamp[ d - dc + sc + dc ]
          == clamp[ d + sc ]

  What's nice there is that coverage can be folded into the src term.

  This suggests that we can rewrite the plus stage to clamp internally (and thus be viable for 8-bit) if we always pre-scale with coverage.

  We haven't had a way to pre-scale with 565 coverage until now, but it's only a step or two away from there. We can use the alternate formulation we derived for alpha for lerp_565, calculating the alpha coverage from red, green, and blue coverages _and_ the values of src and dst alpha.

  While we already pre-scale srcover today for 8-bit or constant coverage, we cannot do the same for 565. When evaluating the expression

        d' = s + (1-a)d

  we need the a term to be pre-scaled with red's coverage when calculating dr', with blue's when calculating db', etc. Essentially we need to carry around a bunch of extra values, and we've got no way to do that.

  So instead, we'll just carefully pre-scale plus with any coverage, and keep post-lerping srcover when we have 565 coverage.

  Change-Id: I7a7a52eec7d482e1b98bb8a01ea0a3d5e67bef65
  Reviewed-on: https://skia-review.googlesource.com/38300
  Commit-Queue: Mike Klein <mtklein@chromium.org>
  Reviewed-by: Florin Malita <fmalita@chromium.org>
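Below is a scalar check of the derivation above. Applying plus, lerping by coverage, and then clamping agrees with pre-scaling src by coverage, adding, and clamping. The function names are hypothetical. In general the two forms can differ by float rounding; the test values are chosen to be exact.

```cpp
#include <algorithm>

static inline float clamp01(float v) { return std::min(std::max(v, 0.0f), 1.0f); }

// Reference: plus, then lerp toward the result by coverage c, then clamp.
static inline float plus_then_lerp(float s, float d, float c) {
    return clamp01(d * (1 - c) + (s + d) * c);
}

// Folded form from the derivation: pre-scale src by coverage, add, clamp.
// The clamp can live inside the stage, which is what makes 8-bit viable.
static inline float plus_prescaled(float s, float d, float c) {
    return clamp01(d + s * c);
}
```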
* ColorBurn/ColorDodge stage tweaks (Florin Malita, 2017-08-23)

  Minor speedup.

  Before:
    10212.01 ? blendmode_rect_ColorBurn 8888
     9216.78 ? blendmode_rect_ColorDodge 8888

  After:
     9635.44 ? blendmode_rect_ColorBurn 8888
     8820.22 ? blendmode_rect_ColorDodge 8888

  Change-Id: I9e8a9aa21e2370de3174c31821fb0676260d2643
  Reviewed-on: https://skia-review.googlesource.com/37620
  Reviewed-by: Mike Klein <mtklein@chromium.org>
  Commit-Queue: Florin Malita <fmalita@chromium.org>
* remove mask load() and store() (Mike Klein, 2017-08-11)

  They appear to be slower than the generic load() and store() now.

    [blendmode_mask_Hue] 14.7ms @0 15.6ms @95 39.6ms @100
    [blendmode_rect_Hue] 31.5ms @0 37.6ms @95 39.5ms @100
    ~~>
    [blendmode_mask_Hue] 14.7ms @0 15.2ms @95 39.5ms @100
    [blendmode_rect_Hue] 30.5ms @0 32.6ms @95 37.8ms @100

  Change-Id: I674b75087b8139debead71f3016631bcb0cb0047
  Reviewed-on: https://skia-review.googlesource.com/33800
  Reviewed-by: Florin Malita <fmalita@chromium.org>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* Replace interp() with clut_{3,4}D stages. (Mike Klein, 2017-08-10)

  I tried to follow exactly the same strategy as a start. (Though I did fix the off-by-one dimensions.)

  It does rather look like we only need 3D and 4D now that I've looked at the call sites.

  Looks like about a 20% speedup.

  Change-Id: I8b1af64750ad1750716ee1ab0767e64591c7206a
  Reviewed-on: https://skia-review.googlesource.com/32842
  Commit-Queue: Mike Klein <mtklein@google.com>
  Reviewed-by: Brian Osman <brianosman@google.com>
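For readers unfamiliar with CLUT stages, here is a scalar sketch of a 3D trilinear lookup. `Clut3D`, its r-major layout, and the RGB output are assumptions of this sketch, not Skia's data structure. Inputs in [0,1] are scaled by `grid - 1`, so 1.0 lands exactly on the last grid point.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical 3D color lookup table: `grid` points per axis, each entry an
// RGB triple, stored with r varying slowest (an assumed layout).
struct Clut3D {
    int grid;                   // points per axis, >= 2
    std::vector<float> table;   // grid * grid * grid * 3 floats
    const float* at(int r, int g, int b) const {
        return &table[3 * ((r * grid + g) * grid + b)];
    }
};

// Trilinear interpolation of rgb[0..3) through the table, in place.
static void clut_3D(const Clut3D& lut, float rgb[3]) {
    int   lo[3];
    float t[3];
    for (int i = 0; i < 3; i++) {
        float v = std::min(std::max(rgb[i], 0.0f), 1.0f) * float(lut.grid - 1);
        lo[i] = std::min(int(v), lut.grid - 2);   // keep lo + 1 inside the grid
        t[i]  = v - float(lo[i]);                 // fractional position in the cell
    }
    float out[3] = {0.0f, 0.0f, 0.0f};
    for (int corner = 0; corner < 8; corner++) {  // visit the 8 cell corners
        int dr = (corner >> 2) & 1, dg = (corner >> 1) & 1, db = corner & 1;
        float w = (dr ? t[0] : 1 - t[0])
                * (dg ? t[1] : 1 - t[1])
                * (db ? t[2] : 1 - t[2]);
        const float* e = lut.at(lo[0] + dr, lo[1] + dg, lo[2] + db);
        for (int c = 0; c < 3; c++) out[c] += w * e[c];
    }
    for (int c = 0; c < 3; c++) rgb[c] = out[c];
}
```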
* add gamma stage (Mike Klein, 2017-08-09)

  Until now we've been using 3 separate parametric stages to apply gamma to r,g,b. That works fine, but is kind of unnecessarily slow, and again less clear in a stack trace than seeing "gamma".

  The new bench runs in about 60% of the time the old one does on my Trashcan.

  BUG=skia:6939

  Change-Id: I079698d3009b081f1c23a2e27fc26e373b439610
  Reviewed-on: https://skia-review.googlesource.com/32721
  Reviewed-by: Mike Reed <reed@google.com>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
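Below is a scalar sketch of a single gamma stage: one exponent applied to r, g, and b, with alpha left alone. `gamma_stage` is a hypothetical name and signature. Negative inputs are not handled.

```cpp
#include <cmath>

// One stage instead of three parametric ones: raise r, g, b to the same
// exponent; rgba[3] (alpha) is left untouched.
static inline void gamma_stage(float g, float rgba[4]) {
    for (int i = 0; i < 3; i++) {
        rgba[i] = std::pow(rgba[i], g);
    }
}
```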
* add an invert stage for inverse CMYK -> CMYK (Mike Klein, 2017-08-08)

  This will be faster, but maybe more importantly it makes a stack trace clearer when debugging. It's confusing to see a "parametric transfer function" stage followed by a "table transfer function" stage...

  This leads to a little bit of cleanup in SkColorSpaceXform_A2B. I am uncertain whether we still need parametric_a. I need to do some more tracing through the code before I'd say it's impossible to reach in addTransferFn().

  Change-Id: I52e85019f92d012a3086fc94cf64ae6c9307ea94
  Reviewed-on: https://skia-review.googlesource.com/32040
  Reviewed-by: Brian Osman <brianosman@google.com>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* Store float and byte constant colors. (Mike Klein, 2017-08-03)

  This makes loading them much simpler in 8-bit mode.

  Change-Id: I35ff34ebd0b93425c4e39e055bf4ade8cf8561e1
  Reviewed-on: https://skia-review.googlesource.com/30621
  Reviewed-by: Florin Malita <fmalita@chromium.org>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* clamp to 0 in repeat and mirror image tilers (Mike Klein, 2017-08-01)

  If we were doing this math with real numbers or even just doubles, these clamps wouldn't be necessary. But we're favoring speed over accuracy here when we emulate fmod(), and some of those inaccuracies end up with values outside the [0,tile) range, even negative ones!

  To keep the spirit of fast over 100% accurate, I've just added a safety clamp to 0.

  The case in the unit test now returns 0 where it should really return something like 7 or 8, but at least we won't try to read _way_ outside the image buffer.

  BUG=chromium:749260

  Change-Id: Ifc5cfe69798beccbb2a16547510158576e06eb3a
  Reviewed-on: https://skia-review.googlesource.com/29580
  Reviewed-by: Florin Malita <fmalita@chromium.org>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
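Below is a scalar sketch of the fast repeat with its safety clamp. Multiplying by a precomputed reciprocal instead of dividing means `floor(x * inv_w)` can come out one too large when x sits just below a multiple of w, and that is how a negative result can appear. `repeat_fast` is hypothetical. Like the commit, it only clamps the lower bound.

```cpp
#include <algorithm>
#include <cmath>

// Emulate fmod(x, w) for repeat tiling using a precomputed 1/w.
// Rounding in x * inv_w can push the result slightly below 0, so clamp there.
static inline float repeat_fast(float x, float w, float inv_w) {
    float r = x - std::floor(x * inv_w) * w;
    return std::max(0.0f, r);
}
```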
* use new Stage ABI for ARMv7 too (Mike Klein, 2017-07-29)

  ARMv7 can pass 16 floats as function arguments. We've been slicing that as eight 2-float vectors. This CL switches to four 4-float vectors. We'll now operate on 4 pixels at a time instead of 2, at the expense of keeping the d-vectors (mostly used for blending) on the stack. It'll be interesting to see how this plays out performance-wise.

  One nice side effect is that now both ARMv7 and ARMv8 use 4-float NEON vectors. Most of the code is now shared, with just a couple of checks to use new instructions added in ARMv8.

  It looks like we do see a ~15% win:

    $ bin/droid out/monobench SkRasterPipeline_srgb 200
    Before: 644.029ns
    After:  547.301ns
    ARMv8:  453.838ns (just for reference)

  Change-Id: I184ff29a36499e3cdb4c284809d40880b02c2236
  Reviewed-on: https://skia-review.googlesource.com/27701
  Reviewed-by: Mike Reed <reed@google.com>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* rearrange SkJumper registers on 32-bit x86 (Mike Klein, 2017-07-27)

  There are not many registers on 32-bit x86, and we're using most of them to pass Stage function arguments. This means few are available as temporaries, and we're forced to hit the stack all the time. xmm registers are the most egregious example: we use all 8 registers to pass data, leaving none free as temporaries.

  This CL cuts things down pretty dramatically, from passing 5 general-purpose and 8 xmm registers to 2 general-purpose and 4 xmm registers. One of the two general-purpose registers is a pointer to space on the stack where we store all those other values.

  Every stage function needs to use the program pointer, so that stays in a general-purpose register. Almost every stage uses the r,g,b,a vectors, so they stay in xmm registers. The rest (destination x,y, the tail mask, a pointer to tricky constants, and the dr,dg,db,da vectors) now live on the stack.

  The generated code is about 20K smaller and runs about 20% faster.

    $ out/monobench SkRasterPipeline_srgb 200
    Before: 358.784ns
    After:  282.563ns

  Change-Id: Icc117af95c1a81c41109984b32e0841022f0d1a6
  Reviewed-on: https://skia-review.googlesource.com/27620
  Reviewed-by: Florin Malita <fmalita@chromium.org>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* convert over to 2d-mode (Mike Klein, 2017-07-20)

    [√] convert all stages to use SkJumper_MemoryCtx / be 2d-compatible
    [√] convert compile to 2d also, remove 1d run/compile
    [√] convert all call sites
    [√] no diffs

  Change-Id: I3b806eb8fe0c3ec043359616409f7cd1211a1e43
  Reviewed-on: https://skia-review.googlesource.com/24263
  Commit-Queue: Mike Klein <mtklein@google.com>
  Reviewed-by: Florin Malita <fmalita@chromium.org>
* experimental: draw into unpremul (Mike Reed, 2017-07-19)

  raster-only

  Bug: skia:
  Change-Id: I3af19f031083c9cc258f73ba6a2f6020bb15f110
  Reviewed-on: https://skia-review.googlesource.com/24400
  Commit-Queue: Mike Reed <reed@google.com>
  Reviewed-by: Mike Klein <mtklein@chromium.org>
* remove gather_i8, unify memory-touching contexts (Mike Klein, 2017-07-18)

  gather_i8 is now unused, so we can remove it. That in turn makes the ctable field of SkJumper_GatherCtx unused.

  After removing ctable, SkJumper_GatherCtx and SkJumper_PtrStride look identical, so I've now fused them into SkJumper_MemoryCtx, which will eventually be used by everything loading from, gathering from, or storing to memory.

  Change-Id: Ia882d2dbd54c9fcf9a8250a1ce83304389dd284a
  Reviewed-on: https://skia-review.googlesource.com/24085
  Reviewed-by: Mike Reed <reed@google.com>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* start on raster pipeline 2d mode (Mike Klein, 2017-07-18)

    - Add run_2d(x,y,w,h) and start_pipeline_2d().
    - Add and test a 2d-compatible store_8888_2d stage.

  Change-Id: Ib9c225d1b8cb40471ae4333df1d06eec4d506f8a
  Reviewed-on: https://skia-review.googlesource.com/24401
  Commit-Queue: Mike Klein <mtklein@chromium.org>
  Reviewed-by: Florin Malita <fmalita@chromium.org>
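Below is a scalar sketch of what a run_2d(x,y,w,h) driver does: walk each row in chunks of N pixels and finish with a partial tail. `StageFn`, the `active` count, and N = 4 are assumptions. The real pipeline calls compiled stage functions rather than a std::function.

```cpp
#include <cstddef>
#include <functional>

constexpr size_t N = 4;   // pixels per vector in this sketch (assumed)

// Hypothetical stage callback: process pixels [dx, dx + active) of row dy,
// where active == N except on the tail of a row.
using StageFn = std::function<void(size_t dx, size_t dy, size_t active)>;

// Walk a w x h rectangle whose top-left pixel is (x, y).
static void run_2d(size_t x, size_t y, size_t w, size_t h, const StageFn& stage) {
    for (size_t dy = y; dy < y + h; dy++) {
        size_t dx = x;
        for (; dx + N <= x + w; dx += N) {
            stage(dx, dy, N);                 // full vectors
        }
        if (size_t tail = (x + w) - dx) {
            stage(dx, dy, tail);              // leftover pixels in this row
        }
    }
}
```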
* add stages for black and white colors (Mike Reed, 2017-07-06)

  histogram of test skps:
    black: 1/7
    white: 2/7
    other: 4/7

  Bug: skia:
  Change-Id: I3a092899d31ce87837e66e5c8ea9ec5e0f239361
  Reviewed-on: https://skia-review.googlesource.com/21408
  Reviewed-by: Herb Derby <herb@google.com>
  Commit-Queue: Mike Reed <reed@google.com>
* optimize for diff matrix types (Mike Reed, 2017-07-05)

  Bug: skia:
  Change-Id: I671e07c5bbb9e4ced92303c9959143324f7a6bdc
  Reviewed-on: https://skia-review.googlesource.com/21523
  Commit-Queue: Mike Reed <reed@google.com>
  Reviewed-by: Herb Derby <herb@google.com>
* 2pt conical stage for focal-point-outside case (Florin Malita, 2017-06-29)

  A couple of annoyances here:

    1) the prev vector_scale stage is not usable for masking, as NaN values can propagate through
       => switch to actual masking

    2) for the outside case, we must select the min root when the gradient is flipped
       => split into two templated stages (_min, _max)

  (I'm not convinced that we need to flip the gradient for RP at all; we can investigate later.)

  Change-Id: I0283812d613a53124f2987d1aea1f26e4533655e
  Reviewed-on: https://skia-review.googlesource.com/21162
  Reviewed-by: Mike Klein <mtklein@chromium.org>
  Commit-Queue: Florin Malita <fmalita@chromium.org>
* 2pt conical stage for focal-pt-on-edge case (Florin Malita, 2017-06-28)

  When the focal point is on the edge of the end circle, the quadratic equation devolves to linear. Add a stage to handle this case.

  As a complication, this case can produce "degenerate" values:
    1) t == NaN
    2) R(t) < 0

  For these, we're supposed to draw transparent black, which means overwriting the color from the gradient stage.

  To support this, build a 0/1 vector mask in the context, and apply it post-gradient-stage.

  Change-Id: Ice4e3243abfd8c784bb810f6c310aed7a4ac7dc8
  Reviewed-on: https://skia-review.googlesource.com/21111
  Commit-Queue: Florin Malita <fmalita@chromium.org>
  Reviewed-by: Mike Klein <mtklein@google.com>
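Below is a scalar sketch of the linear case and its degenerate-value test. It assumes the quadratic has already been reduced to b*t + c = 0 and that the radius is R(t) = r0 + t*dr. Both that parameterization and `solve_focal_on_edge` are hypothetical, not Skia's exact formulation.

```cpp
#include <cmath>

struct ConicalSample { float t; bool degenerate; };

// With the quadratic term gone, t = -c / b. A zero b with zero c gives NaN;
// a negative radius at t is also degenerate and should draw transparent black.
static inline ConicalSample solve_focal_on_edge(float b, float c, float r0, float dr) {
    float t = -c / b;
    bool degenerate = std::isnan(t) || (r0 + t * dr) < 0.0f;
    return { t, degenerate };
}
```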
* 2ptconical stage (Florin Malita, 2017-06-28)

  Initial impl, for the well-behaved case (focal point inside).

  MBP numbers:

  Before:
    3365.87 ! gradient_conical_clamp_shallow srgb
    3590.88 ! gradient_conical_clamp_shallow_dither srgb
    3376.91 ! gradient_conical_clamp_3color srgb
    3351.64 ! gradient_conical_clamp_hicolor srgb
    3379.35 ! gradient_conical_clamp srgb

  After:
     648.93 ! gradient_conical_clamp_shallow srgb
     665.12 ! gradient_conical_clamp_shallow_dither srgb
     773.98 ! gradient_conical_clamp_3color srgb
    1175.35 ! gradient_conical_clamp_hicolor srgb
     619.17 ! gradient_conical_clamp srgb

  Change-Id: I07b22a758363e1f340a6041bca53bdef74229eb9
  Reviewed-on: https://skia-review.googlesource.com/20906
  Commit-Queue: Florin Malita <fmalita@chromium.org>
  Reviewed-by: Mike Klein <mtklein@chromium.org>
* add bgra as 1st class format (Mike Klein, 2017-06-27)

  This is a start to eliminating swap_rb as a stage. I've just hit the main hot spots here. Going to look into the ~dozen other spots to see how they should work next.

  Change-Id: I26fb46a042facf7bd6fff3b47c9fcee86d7142fd
  Reviewed-on: https://skia-review.googlesource.com/20982
  Commit-Queue: Mike Klein <mtklein@chromium.org>
  Reviewed-by: Mike Reed <reed@google.com>
* remove unused "swap" stage (Mike Klein, 2017-06-27)

  Change-Id: I25619f010f8ac6441529cfe8dff2d8c42d7400cf
  Reviewed-on: https://skia-review.googlesource.com/20988
  Reviewed-by: Mike Reed <reed@google.com>
  Commit-Queue: Mike Klein <mtklein@chromium.org>