path: root/src/jumper/SkJumper.cpp
Commit message | Author | Age
* add srcover_bgra_8888 | Mike Klein | 2017-10-24
  Chrome generally uses BGRA buffers, so srcover_rgba_8888 isn't really doing them any good. Probably a good idea to cover both kN32 options any time we specialize like this?
  There's one small diff, so I've lazily guarded this by SK_LEGACY_LOWP_STAGES, which I want to rebaseline today anyway.
  Change-Id: Ice672aa01a3fc83be0798580d6730a54df075478
  Reviewed-on: https://skia-review.googlesource.com/63301
  Commit-Queue: Mike Klein <mtklein@chromium.org>
  Reviewed-by: Mike Reed <reed@google.com>
* more easy lowp shader stages | Mike Klein | 2017-10-24
  This fills out a couple more matrix and gather stages.
  Deletes a not particularly important unit test that was using a scale matrix in a weird, non-lowp compatible way.
  This will require guards for Blink layout tests.
  Change-Id: I54cb228ff541f771e8f4758f07d26c5161d48af3
  Reviewed-on: https://skia-review.googlesource.com/62520
  Reviewed-by: Mike Reed <reed@google.com>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* make enabling LOWP stages simpler | Mike Klein | 2017-10-23
  This method is a little simpler macro-wise, and makes it easier to guard new lowp stages:
      LOWP(foo)
      LOWP(bar)
      #ifndef SK_LEGACY_LOWP_BAZ
      LOWP(baz)
      #endif
  Change-Id: I06392f5cf7a04651e7bf47e79f10f7da8520f5ab
  Reviewed-on: https://skia-review.googlesource.com/63141
  Reviewed-by: Florin Malita <fmalita@chromium.org>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
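The guard pattern above can be sketched as a switch whose case labels come from LOWP(...) lines. The Stage enum, has_lowp(), and the stage names foo/bar/baz/qux are hypothetical placeholders, not SkJumper's actual code; the point is only that one stage can be held back by wrapping its single line in #ifndef.

```cpp
#include <cassert>

// Hypothetical stage identifiers standing in for SkRasterPipeline's list.
enum class Stage { foo, bar, baz, qux };

// Returns true if `st` has a lowp implementation. Each LOWP(name) line
// expands to one case label, so a single stage can be held back by
// wrapping just its line in #ifndef.
bool has_lowp(Stage st) {
    switch (st) {
    #define LOWP(name) case Stage::name: return true;
        LOWP(foo)
        LOWP(bar)
    #ifndef SK_LEGACY_LOWP_BAZ
        LOWP(baz)
    #endif
    #undef LOWP
        default: return false;
    }
}
```

Defining SK_LEGACY_LOWP_BAZ at build time would make has_lowp(Stage::baz) return false while leaving every other stage untouched, which is the kind of staged rollout these guards exist for.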
* start on lowp shaders | Mike Klein | 2017-10-20
  We're going to want to assign types to the stages depending on their inputs and outputs:
      GG: x,y -> x,y
      GP: x,y -> r,g,b,a
      PP: r,g,b,a -> r,g,b,a
  (There are a couple other degenerate cases here, where a stage ignores its inputs or creates no outputs, but we can always just pretend their null input or output is one type or the other arbitrarily.)
  The GG stages will be pretty much entirely float code, and the GP stages a mix of float math and byte stuff.
  Since we've chosen U16 to match our register size in _lowp land, we'll unpack each F register across two of those for transport between stages. This is a notional, free operation in both directions.
  Change-Id: I605311d0dc327a1a3a9d688173d9498c1658e715
  Reviewed-on: https://skia-review.googlesource.com/60800
  Reviewed-by: Herb Derby <herb@google.com>
  Reviewed-by: Florin Malita <fmalita@chromium.org>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
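A scalar sketch of that transport trick: with an assumed lowp width of 16 lanes, a 16-lane float register is exactly as wide as two 16-lane U16 registers, so its bits can be split across two U16s and reassembled unchanged. The F/U16 aliases, split(), and join() are illustrative names; in real SIMD code the compiler can treat this as a register reinterpretation rather than the std::memcpy used here.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

constexpr int N = 16;                   // assumed lowp lane count
using U16 = std::array<uint16_t, N>;    // one "register" of 16-bit lanes
using F   = std::array<float, N>;       // N floats: the width of two U16s

static_assert(sizeof(F) == 2 * sizeof(U16), "F must split into two U16s");

// Copy F's bytes into two U16 "registers" for transport between stages.
void split(const F& f, U16* lo, U16* hi) {
    const char* bytes = reinterpret_cast<const char*>(f.data());
    std::memcpy(lo->data(), bytes, sizeof(U16));
    std::memcpy(hi->data(), bytes + sizeof(U16), sizeof(U16));
}

// Reassemble the float register on the other side; the bits round-trip exactly.
F join(const U16& lo, const U16& hi) {
    F f;
    char* bytes = reinterpret_cast<char*>(f.data());
    std::memcpy(bytes, lo.data(), sizeof(U16));
    std::memcpy(bytes + sizeof(U16), hi.data(), sizeof(U16));
    return f;
}
```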
* reformat hard-to-read preprocessor in SkJumper.cpp | Mike Klein | 2017-10-05
  Change-Id: I9a140e342e7b12b1cbb09503ca8fc03016717784
  Reviewed-on: https://skia-review.googlesource.com/55701
  Reviewed-by: Mike Klein <mtklein@chromium.org>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* add _skx stages | Mike Klein | 2017-10-02
  This just makes sure all the plumbing is in place to use the Skylake Xeon subset of AVX-512 instructions. So far,
    - no Windows
    - no lowp
    - nothing explicitly making use of AVX-512 registers or instructions
  This initial pass should run essentially identically to the _hsw AVX2 code we've been using previously. Clang _does_ use AVX-512-only instructions to implement some of the higher-level concepts we've coded, but it's really a pretty subtle difference.
  Next steps will bump N from 8 to 16 and start threading through an AVX-512-friendly mask instead of tail. I'll also want to take a harder look at how we do blending like if_then_else()... the default codegen here doesn't really take advantage of AVX-512 the way I'd like here.
  CQ_INCLUDE_TRYBOTS=skia.primary:Test-Debian9-Clang-GCE-CPU-AVX512-x86_64-Debug
  Change-Id: I6c9442488a449ea4770617bb22b2669859cc92e2
  Reviewed-on: https://skia-review.googlesource.com/54062
  Commit-Queue: Mike Klein <mtklein@chromium.org>
  Reviewed-by: Herb Derby <herb@google.com>
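A minimal sketch of replacing the tail count with a lane mask, assuming N = 16 float lanes per AVX-512 register and assuming the convention that tail == 0 means a full batch. mask_from_tail() and masked_load() are hypothetical helpers; masked_load() is a scalar model of what a masked vector load does, not an intrinsic.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr size_t N = 16;  // assumed float lanes per AVX-512 register

// Turn a tail count into a per-lane bitmask like an AVX-512 __mmask16.
// Assumes tail == 0 means a full batch and 1..N-1 means a partial one.
uint16_t mask_from_tail(size_t tail) {
    assert(tail < N);
    return tail == 0 ? uint16_t(0xFFFF) : uint16_t((1u << tail) - 1);
}

// Scalar model of a masked load: active lanes read src, the rest get 0.
// Inactive lanes never touch src, so src may be shorter than N.
void masked_load(const float* src, uint16_t mask, float dst[N]) {
    for (size_t i = 0; i < N; i++) {
        dst[i] = ((mask >> i) & 1u) ? src[i] : 0.0f;
    }
}
```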
* count total non-lowp runs | Mike Klein | 2017-09-21
  Change-Id: I2e24c990983ea93cbd7983c9c4e88120c2b7f358
  Reviewed-on: https://skia-review.googlesource.com/49768
  Reviewed-by: Mike Klein <mtklein@chromium.org>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* Implement some easy _lowp stages. | Mike Klein | 2017-09-16
    - load_565 allows 565-src sprite blits
    - scale_565 / lerp_565 allow subpixel text
    - luminance_to_alpha is a color filter, and lets us write grey 8
  And update CachedDecodingPixelRefTest with a yet more robust color.
  Change-Id: I8af499c43f0f28093744d9c2993af553e36c9526
  Reviewed-on: https://skia-review.googlesource.com/47021
  Commit-Queue: Mike Klein <mtklein@chromium.org>
  Reviewed-by: Mike Reed <reed@google.com>
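A per-pixel float sketch of what a luminance_to_alpha color filter computes, assuming Rec. 709 luma weights (0.2126, 0.7152, 0.0722); the exact coefficients the stage uses are not stated in this log. The RGBA struct is a stand-in for the pipeline's r, g, b, a registers.

```cpp
#include <cassert>

// Stand-in for one pixel's r, g, b, a values.
struct RGBA { float r, g, b, a; };

// Alpha becomes the pixel's luminance and the color channels go to zero.
// Assumes Rec. 709 luma weights.
RGBA luminance_to_alpha(RGBA c) {
    float luma = 0.2126f * c.r + 0.7152f * c.g + 0.0722f * c.b;
    return {0.0f, 0.0f, 0.0f, luma};
}
```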
* Disable SkJumper assembly in cross builds for now. | Nico Weber | 2017-09-16
  Bug: chromium:762167
  Change-Id: Ia23f6dbfc0466aef4ca9d1a5b9ff343d79dc83bb
  Reviewed-on: https://skia-review.googlesource.com/47460
  Reviewed-by: Mike Klein <mtklein@chromium.org>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* make most of SkColorPriv.h private | Cary Clark | 2017-09-15
  created new file src/core/SkColorData.h for internal consumption. Note that many of the functions there are unused as well.
  Bug: skia: 6898
  R: reed@google.com
  Change-Id: I25bfd5a9c21f53558c4ca65a77eb5d322d897c6d
  Reviewed-on: https://skia-review.googlesource.com/46848
  Commit-Queue: Cary Clark <caryclark@google.com>
  Reviewed-by: Mike Reed <reed@google.com>
* clean up SK_JUMPER_LEGACY_8BIT | Mike Klein | 2017-09-15
  Change-Id: I4d4093fcfc839f6e7468b7d9f89bb903186ab68d
  Reviewed-on: https://skia-review.googlesource.com/46761
  Reviewed-by: Florin Malita <fmalita@chromium.org>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* grand unifried lowp stages | Mike Klein | 2017-09-14
  I have text_16_AA_FF -> 8888 (forcing RP) faster than head now on my laptop. I'm feeling confident that we can make this perform well.
  After looking at performance a bit more today, it looks like everything is within what I'd consider comparable in performance, especially on ARM. On x86-64 it looks like big bulk blits get a little slower and small mask blits get a little faster.
  Quality looks good, and maybe improved for 565.
  There are fewer platform-specific differences now in _lowp, and I think they're few enough now that we could even consider completing the unification by folding the 8-bit and float code together.
  Rename "div255()" to "rebias()", slap on a few coats of paint...
  Guarded for Chrome with SK_JUMPER_LEGACY_LOWP.
  Change-Id: I36309c07cf736f3cb31952cca66030ad56026318
  Reviewed-on: https://skia-review.googlesource.com/45982
  Reviewed-by: Herb Derby <herb@google.com>
  Reviewed-by: Florin Malita <fmalita@chromium.org>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* clean up SK_JUMPER_LEGACY_X86_8BIT | Mike Klein | 2017-09-06
  Change-Id: I26c22c085efd70b65de927a9a8a041d03c170f2a
  Reviewed-on: https://skia-review.googlesource.com/42760
  Commit-Queue: Florin Malita <fmalita@chromium.org>
  Reviewed-by: Florin Malita <fmalita@chromium.org>
* add 8bit stages for load/store 565 | Mike Reed | 2017-08-31
  approx 2.5x faster on arm64 for sprite 8888 --> 565 blits
  Bug: skia:
  Change-Id: I524f993fee16196385dc07cbec39ef378b1301e5
  Reviewed-on: https://skia-review.googlesource.com/41162
  Reviewed-by: Florin Malita <fmalita@chromium.org>
  Reviewed-by: Mike Klein <mtklein@chromium.org>
  Commit-Queue: Mike Reed <reed@google.com>
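A scalar sketch of 565 to 8-bit conversion and back, assuming red occupies the top five bits. Widening replicates each channel's high bits into its low bits so that full intensity maps to 255; narrowing just truncates, which makes unpack-then-pack an exact round trip. The helper names are illustrative, and the vectorized stages in this commit may round differently.

```cpp
#include <cassert>
#include <cstdint>

struct RGB8 { uint8_t r, g, b; };

// Expand a 565 pixel (r in bits 11-15, g in bits 5-10, b in bits 0-4)
// to 8 bits per channel by replicating high bits into the low bits.
RGB8 unpack_565(uint16_t p) {
    uint8_t r5 = (p >> 11) & 31, g6 = (p >> 5) & 63, b5 = p & 31;
    return { uint8_t(r5 << 3 | r5 >> 2),
             uint8_t(g6 << 2 | g6 >> 4),
             uint8_t(b5 << 3 | b5 >> 2) };
}

// Truncate 8-bit channels back down to 565.
uint16_t pack_565(RGB8 c) {
    return uint16_t((c.r >> 3) << 11 | (c.g >> 2) << 5 | (c.b >> 3));
}
```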
* 32-bit x86 8-bit stages | Mike Klein | 2017-08-30
  Shouldn't be anything tricky here.
  Guarded by SK_JUMPER_LEGACY_X86_8BIT for (Win) layout tests.
  Change-Id: I7580c7c18d1721f1301904c049ea2e59e9bda5d9
  Reviewed-on: https://skia-review.googlesource.com/40692
  Reviewed-by: Herb Derby <herb@google.com>
  Reviewed-by: Florin Malita <fmalita@chromium.org>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* no more need for a constants pointer | Mike Klein | 2017-08-29
  The only reason we were keeping SkJumper_constants around is that it was hard to get float/integer iota vectors on arm64 without relocations. Now that we're compiling arm64 normally as part of Skia, we don't have to worry about relocations.
  This means we can kill the struct and stop passing around that pointer.
  Change-Id: I013c6a735947f3db2bc87f2bfa38b7520d2e2fce
  Reviewed-on: https://skia-review.googlesource.com/40200
  Commit-Queue: Mike Klein <mtklein@chromium.org>
  Reviewed-by: Herb Derby <herb@google.com>
* use NEON 8-bit stages on ARMv7 too | Mike Klein | 2017-08-29
  We don't really use anything very ARMv8 specific in the 8-bit NEON stages, so we can just naturally extend what we're doing to ARMv7 too.
  Note that unlike the float stages, we're not requiring VFPv4 either, just NEON. VFPv4 is for FMA and F16<->F32 conversion, both of which are unnecessary for the integer pipeline.
  GMs and perf improvement are similar to the previous ARMv8 change.
  Change-Id: Id618801ea1920564c1deee144a640a4133c4505f
  Reviewed-on: https://skia-review.googlesource.com/39840
  Commit-Queue: Mike Klein <mtklein@chromium.org>
  Reviewed-by: Mike Klein <mtklein@chromium.org>
  Reviewed-by: Herb Derby <herb@google.com>
* Revert "Revert "8-bit jumper on armv8"" | Mike Klein | 2017-08-29
  This reverts commit 6d13575108299951ecdfba6d85c915fcec2bc028.
  Now with guards for "errors" like this:
      external/skia/src/jumper/SkJumper_stages_8bit.cpp:240:50: error: 'memcpy' called with size bigger than buffer
      case 12: memcpy(&v, src, 12*sizeof(T)); break;
  This code is unreachable and generally removed by Clang's optimizer anyway... as far as I can tell the code generation diff is arbitrary.
  Change-Id: I6216567caaa6166f71258bd25343a09e93892a10
  Reviewed-on: https://skia-review.googlesource.com/39961
  Reviewed-by: Mike Klein <mtklein@chromium.org>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
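The quoted switch copies a constant number of lanes per case, and a case like 12 can only be reached when the vector holds more than 12 lanes. A hypothetical load_tail() can enforce that bound at runtime with a single variable-size std::memcpy instead; this sketch assumes N = 8 lanes and the convention that tail == 0 means a full batch. The constant-size cases in the original presumably let the compiler emit fixed-width loads, which this version gives up for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

constexpr size_t N = 8;  // assumed lanes per batch

// Load a full batch of N values, or only the first `tail` of them when
// 0 < tail < N (tail == 0 meaning a full batch). Lanes past the tail are
// zeroed. The assert keeps the copy from ever exceeding the N-lane buffer.
// T is assumed to be trivially copyable (e.g. int, float, uint16_t).
template <typename T>
void load_tail(const T* src, size_t tail, T (&v)[N]) {
    assert(tail < N);
    std::memset(v, 0, sizeof(v));
    std::memcpy(v, src, (tail ? tail : N) * sizeof(T));
}
```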
* Revert "8-bit jumper on armv8" | Derek Sollenberger | 2017-08-29
  This reverts commit 08133583d5e1cdfdcc41b4bb078fcfb64137f058.
  Reason for revert: Blocking Android Autoroller on compile error.
  Original change's description:
  > 8-bit jumper on armv8
  >
  > The GM diffs are all minor and what you'd expect.
  >
  > I did a quick performance sanity check, which also looks fine.
  >
  >     $ out/ok bench rp filter:search=Modulate
  >     [blendmode_rect_Modulate] 30.2ms @0 32ms @95 32ms @100
  >     [blendmode_mask_Modulate] 12.6ms @0 12.6ms @95 14.5ms @100
  >     ~~~>
  >     [blendmode_rect_Modulate] 11.2ms @0 11.7ms @95 12.4ms @100
  >     [blendmode_mask_Modulate] 10.5ms @0 23.6ms @95 23.9ms @100
  >
  > This isn't even really the fastest we can make 8-bit go on ARMv8;
  > it's actually much more natural to work de-interlaced there. Lots
  > of room to follow up.
  >
  > Change-Id: I86b1099f6742bcb0b8b4fa153e85eaba9567cbf7
  > Reviewed-on: https://skia-review.googlesource.com/39740
  > Reviewed-by: Florin Malita <fmalita@chromium.org>
  > Commit-Queue: Mike Klein <mtklein@chromium.org>
  TBR=mtklein@chromium.org,herb@google.com,fmalita@chromium.org,reed@google.com
  Change-Id: I71425d8b7fbb66be5cb50025871dd81358111da4
  No-Presubmit: true
  No-Tree-Checks: true
  No-Try: true
  Reviewed-on: https://skia-review.googlesource.com/39980
  Reviewed-by: Derek Sollenberger <djsollen@google.com>
  Commit-Queue: Derek Sollenberger <djsollen@google.com>
* 8-bit jumper on armv8 | Mike Klein | 2017-08-28
  The GM diffs are all minor and what you'd expect.
  I did a quick performance sanity check, which also looks fine.
      $ out/ok bench rp filter:search=Modulate
      [blendmode_rect_Modulate] 30.2ms @0 32ms @95 32ms @100
      [blendmode_mask_Modulate] 12.6ms @0 12.6ms @95 14.5ms @100
      ~~~>
      [blendmode_rect_Modulate] 11.2ms @0 11.7ms @95 12.4ms @100
      [blendmode_mask_Modulate] 10.5ms @0 23.6ms @95 23.9ms @100
  This isn't even really the fastest we can make 8-bit go on ARMv8; it's actually much more natural to work de-interlaced there. Lots of room to follow up.
  Change-Id: I86b1099f6742bcb0b8b4fa153e85eaba9567cbf7
  Reviewed-on: https://skia-review.googlesource.com/39740
  Reviewed-by: Florin Malita <fmalita@chromium.org>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* remove aarch64 offline compilation | Mike Klein | 2017-08-28
  The baseline compiled into Skia is now pretty much identical. Minor diffs due to the offline code using -ffp-contract=fast, and the baseline not. Explicit calls to fma() are still FMAs, but we're no longer letting the compiler uncover FMAs we didn't explicitly call out.
  If this goes well, we should be able to turn on the 8-bit pipeline.
  Change-Id: I8f73157cfce7373574c20f6435fe86b46477afa9
  Reviewed-on: https://skia-review.googlesource.com/39520
  Reviewed-by: Herb Derby <herb@google.com>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* rework plus blend mode | Mike Klein | 2017-08-24
  The most interesting parts of this are how plus interacts with partial coverage. Plus needs its clamp to happen after the lerp. Luckily, some of its math folds away:
      d' = clamp[ d*(1-c) + (s+d)*c ]
        == clamp[ d - dc + sc + dc ]
        == clamp[ d + sc ]
  What's nice there is that coverage can be folded into the src term. This suggests that we can re-write the plus stage to clamp internally (and thus, be viable for 8-bit) if we always pre-scale with coverage.
  We don't have a way to pre-scale with 565 coverage until now, but it's only a step or two away from there. We can use the alternate formulation we derived for alpha for lerp_565, calculating the alpha coverage from red, green, and blue coverages _and_ the values of src and dst alpha.
  While we already pre-scale srcover today for 8-bit or constant coverage, we cannot do the same for 565. When evaluating the expression
      d' = s + (1-a)d
  we need the a term to be pre-scaled with red's coverage when calculating dr', with blue's when calculating db', etc. Essentially we need to carry around a bunch of extra values, and we've got no way to do that.
  So instead, we'll just carefully pre-scale plus with any coverage, and keep post-lerping srcover when we have 565 coverage.
  Change-Id: I7a7a52eec7d482e1b98bb8a01ea0a3d5e67bef65
  Reviewed-on: https://skia-review.googlesource.com/38300
  Commit-Queue: Mike Klein <mtklein@chromium.org>
  Reviewed-by: Florin Malita <fmalita@chromium.org>
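A per-channel float sketch of the plus derivation, assuming non-negative premultiplied inputs so that clamping to 1 is the only clamp needed. The first two functions compute the same value up to rounding; the third clamps before lerping and, for example, gives 0.9 instead of 1.0 for s = d = 0.8, c = 0.5, which is why plus needs its clamp after the lerp. All three function names are illustrative.

```cpp
#include <algorithm>
#include <cassert>

// Per-channel plus with partial coverage c, assuming s, d >= 0 so the
// only clamp needed is at 1.

// Lerp between d and (s + d) by c, then clamp: clamp[d*(1-c) + (s+d)*c].
float plus_lerp_then_clamp(float s, float d, float c) {
    return std::min(d * (1 - c) + (s + d) * c, 1.0f);
}

// Folded form: pre-scale the source by coverage and clamp inside plus:
// clamp[d + s*c].
float plus_prescaled(float s, float d, float c) {
    return std::min(d + s * c, 1.0f);
}

// Wrong order, for comparison: clamp s + d first, then lerp.
float plus_clamp_then_lerp(float s, float d, float c) {
    return d * (1 - c) + std::min(s + d, 1.0f) * c;
}
```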
* Remove SK_SUPPORT_LEGACY_RP_BLENDS-guarded code | Florin Malita | 2017-08-24
  The flag is no longer used.
  Change-Id: I39156ef5683538263c2302f2fe3ba779e55dbc47
  Reviewed-on: https://skia-review.googlesource.com/38360
  Commit-Queue: Florin Malita <fmalita@chromium.org>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
  Reviewed-by: Mike Klein <mtklein@chromium.org>
* rename confusing lowp guard | Mike Klein | 2017-08-15
  Change-Id: I346429015e5f902b0a35663e140bb9a025c4220e
  Reviewed-on: https://skia-review.googlesource.com/34680
  Reviewed-by: Florin Malita <fmalita@chromium.org>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* Lowp overlay, hardlight stages | Florin Malita | 2017-08-14
  Before:
      micros    bench
      7669.09   blendmode_rect_HardLight 8888
      8707.13   blendmode_rect_Overlay 8888
  After:
      micros    bench
      6679.60   blendmode_rect_HardLight 8888
      6789.57   blendmode_rect_Overlay 8888
  Change-Id: I52f389253fa07dafe18e572af550af7387264a16
  Reviewed-on: https://skia-review.googlesource.com/34280
  Commit-Queue: Florin Malita <fmalita@chromium.org>
  Reviewed-by: Mike Klein <mtklein@google.com>
* lowp: lighten, difference, exclusion | Florin Malita | 2017-08-14
  Change-Id: I5773cf831c7e41a932bee1f2c6830085fb7db025
  Reviewed-on: https://skia-review.googlesource.com/33764
  Commit-Queue: Florin Malita <fmalita@chromium.org>
  Reviewed-by: Mike Klein <mtklein@google.com>
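For reference, a float sketch of the premultiplied per-channel formulas for these separable modes, plus darken, with s and d as color channels and sa and da as the source and destination alphas. These are the standard premultiplied formulations; the lowp stages compute comparable quantities in 8-bit fixed point, which this sketch does not model. With opaque inputs (sa = da = 1) they reduce to min(s, d), max(s, d), |s - d|, and s + d - 2sd.

```cpp
#include <algorithm>
#include <cassert>

// Premultiplied per-channel formulas (s, d: color channel; sa, da: alphas).
float darken    (float s, float d, float sa, float da) { return s + d - std::max(s * da, d * sa); }
float lighten   (float s, float d, float sa, float da) { return s + d - std::min(s * da, d * sa); }
float difference(float s, float d, float sa, float da) { return s + d - 2 * std::min(s * da, d * sa); }
float exclusion (float s, float d, float, float)       { return s + d - 2 * s * d; }

// The alpha channel for all of these follows srcover: a' = sa + da - sa*da.
float blend_alpha(float sa, float da) { return sa + da - sa * da; }
```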
* Guard lowp changes | Florin Malita | 2017-08-11
  Chromium uses the lowp code, so we have to stage the changes.
  TBR=
  Change-Id: I45e97a51eca285c9afc71926bbf736a03d0d146c
  Reviewed-on: https://skia-review.googlesource.com/33765
  Reviewed-by: Florin Malita <fmalita@chromium.org>
  Commit-Queue: Florin Malita <fmalita@chromium.org>
* Lowp darken stage | Florin Malita | 2017-08-11
  Change-Id: I4bf618ad8728541fcef3fc1c6aa5b3ca106d50dc
  Reviewed-on: https://skia-review.googlesource.com/33583
  Commit-Queue: Florin Malita <fmalita@chromium.org>
  Reviewed-by: Mike Klein <mtklein@chromium.org>
* clamp_1 is also a no-op with 8-bit lowp | Mike Klein | 2017-08-04
  Change-Id: Ifef97d8f28c88c4ee3f7701aac6e383940ed5275
  Reviewed-on: https://skia-review.googlesource.com/31020
  Reviewed-by: Florin Malita <fmalita@chromium.org>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* 15-bit lowp is dead, long live 8-bit lowp | Mike Klein | 2017-08-04
  Change-Id: Icc4b06094aeba3af99b534746f66286d776ef78a
  Reviewed-on: https://skia-review.googlesource.com/30920
  Reviewed-by: Florin Malita <fmalita@chromium.org>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* 8-bit hacking | Mike Klein | 2017-08-03
  I think we can replace a lot of legacy code with an SkRasterPipeline backend that works in 8-bit and stays interlaced. Think of this as a "lowerp" replacement for lowp.
  I'm having some trouble getting ARMv8 working. ARMv7 should be fine, but I want to turn it on separately from x86. I haven't looked at 32-bit x86 yet, but that's also on the todo list.
  Open questions to follow up on:
    - is it better to fold every multiply back down to 8-bit (as seen here), or to allow intermediates to accumulate in 16-bit and divide by 255 when done/needed?
    - is it better to pass tightly packed 8-bit vectors between stages (as seen here), or to keep the 8-bit values unpacked in 16-bit lanes?
    - should we make V wider than 1 register?
  GMs look good. All diffs invisible and plausibly due to the 15->8 bit precision drop.
  A quick bench run showed this running in about 0.75x the time of the existing lowp backend.
  Change-Id: I24aa46ff1d19c0b9b8dc192d5b1821cab0b8843c
  Reviewed-on: https://skia-review.googlesource.com/29886
  Commit-Queue: Mike Klein <mtklein@chromium.org>
  Reviewed-by: Florin Malita <fmalita@chromium.org>
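One common way to fold each multiply back down to 8 bits is to compute round(a*b / 255) with adds and shifts instead of a division. This illustrates the first open question and is not necessarily the rounding this commit uses; mul_div255() is a hypothetical name. Because 255 is odd, a*b / 255 never lands exactly on a .5, so "round to nearest" is unambiguous.

```cpp
#include <cassert>
#include <cstdint>

// Multiply two 8-bit values treated as fractions of 255 and round to the
// nearest 8-bit result, i.e. round(a*b / 255), using only adds and shifts.
uint8_t mul_div255(uint8_t a, uint8_t b) {
    uint32_t t = uint32_t(a) * b + 128;   // a*b <= 65025, so t fits in 16 bits
    return uint8_t((t + (t >> 8)) >> 8);
}
```

The other option in that question would defer this step, keeping wider intermediates between multiplies and rounding back to 8 bits less often.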
* clean up SK_SUPPORT_LEGACY_WIN32_JUMPER | Mike Klein | 2017-07-27
  Change-Id: Icae3c6ce80a0bef097ea1010a4d065cc9d5a4c88
  Reviewed-on: https://skia-review.googlesource.com/27560
  Reviewed-by: Mike Klein <mtklein@chromium.org>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* convert over to 2d-mode | Mike Klein | 2017-07-20
  [√] convert all stages to use SkJumper_MemoryCtx / be 2d-compatible
  [√] convert compile to 2d also, remove 1d run/compile
  [√] convert all call sites
  [√] no diffs
  Change-Id: I3b806eb8fe0c3ec043359616409f7cd1211a1e43
  Reviewed-on: https://skia-review.googlesource.com/24263
  Commit-Queue: Mike Klein <mtklein@google.com>
  Reviewed-by: Florin Malita <fmalita@chromium.org>
* start on raster pipeline 2d mode | Mike Klein | 2017-07-18
    - Add run_2d(x,y,w,h) and start_pipeline_2d().
    - Add and test a 2d-compatible store_8888_2d stage.
  Change-Id: Ib9c225d1b8cb40471ae4333df1d06eec4d506f8a
  Reviewed-on: https://skia-review.googlesource.com/24401
  Commit-Queue: Mike Klein <mtklein@chromium.org>
  Reviewed-by: Florin Malita <fmalita@chromium.org>
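A scalar sketch of the 2d addressing these commits move to: a context holding a base pointer and a row stride, plus a loop over the rectangle that something like run_2d(x,y,w,h) would cover. MemoryCtxSketch, ptr_at(), and run_2d_fill() are hypothetical, not SkJumper_MemoryCtx's actual layout; the stride is assumed to be measured in pixels, and the real pipeline processes each row in SIMD batches rather than one pixel at a time.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical 2d memory context: base pointer plus row stride in pixels.
struct MemoryCtxSketch {
    void*  pixels;
    size_t stride;
};

// Address of pixel (dx, dy) in a buffer of 32-bit pixels.
inline uint32_t* ptr_at(const MemoryCtxSketch& ctx, size_t dx, size_t dy) {
    return static_cast<uint32_t*>(ctx.pixels) + dy * ctx.stride + dx;
}

// Scalar model of running a "store a constant color" pipeline over the
// rectangle (x, y, w, h): walk rows, then columns within each row.
void run_2d_fill(const MemoryCtxSketch& ctx, size_t x, size_t y,
                 size_t w, size_t h, uint32_t color) {
    for (size_t row = y; row < y + h; row++) {
        for (size_t col = x; col < x + w; col++) {
            *ptr_at(ctx, col, row) = color;
        }
    }
}
```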
* add 32-bit Windows SkJumper backend | Mike Klein | 2017-07-17
  The most interesting part of this is getting the call to start_pipeline to work. From there it should be just like the other x86 backend.
  The 32-bit calling conventions are the same across Linux/Mac and Windows, so that's nice. The tricky bit is that Linux and Mac align the stack to 16 bytes, while Windows aligns it only to 4. I think this force_align_arg_pointer attribute on start_pipeline does the trick.
  This needs a guard for layout tests.
  CQ_INCLUDE_TRYBOTS=skia.primary:Test-Win2k8-MSVC-GCE-CPU-AVX2-x86-Debug;master.tryserver.blink:win10_blink_rel,win7_blink_rel;master.tryserver.chromium.win:win_chromium_rel_ng
  Change-Id: Ia74d22e5a4ce5483c9817b8a8f89dd21885bbd14
  Reviewed-on: https://skia-review.googlesource.com/20968
  Commit-Queue: Mike Klein <mtklein@chromium.org>
  Reviewed-by: Florin Malita <fmalita@chromium.org>
  Reviewed-by: Mike Reed <reed@google.com>
* add stages for black and white colors | Mike Reed | 2017-07-06
  histogram of test skps:
      black: 1/7
      white: 2/7
      other: 4/7
  Bug: skia:
  Change-Id: I3a092899d31ce87837e66e5c8ea9ec5e0f239361
  Reviewed-on: https://skia-review.googlesource.com/21408
  Reviewed-by: Herb Derby <herb@google.com>
  Commit-Queue: Mike Reed <reed@google.com>
* be more explicit about not expecting 32-bit x86 jumper backend on windows | Mike Klein | 2017-06-28
  Looks like Clang/Win is defining __i386__, but we're not linking in stage functions (they don't exist yet for Windows).
  Change-Id: I78fdd3e1d89020bc6c64bc1cd5dfb3fbca720b2e
  Reviewed-on: https://skia-review.googlesource.com/21103
  Commit-Queue: Mike Klein <mtklein@chromium.org>
  Commit-Queue: Mike Klein <mtklein@google.com>
  Reviewed-by: Mike Klein <mtklein@google.com>
* add bgra as 1st class format | Mike Klein | 2017-06-27
  This is a start to eliminating swap_rb as a stage. I've just hit the main hot spots here. Going to look into the ~dozen other spots to see how they should work next.
  Change-Id: I26fb46a042facf7bd6fff3b47c9fcee86d7142fd
  Reviewed-on: https://skia-review.googlesource.com/20982
  Commit-Queue: Mike Klein <mtklein@chromium.org>
  Reviewed-by: Mike Reed <reed@google.com>
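A minimal sketch of treating BGRA as a first-class format: the loader itself swaps red and blue, so no separate swap_rb stage is needed afterward. It assumes the pixel's memory bytes are read least-significant-first (little-endian), and RGBAf, channel(), load_8888(), and load_bgra() are illustrative scalar stand-ins named after, not copied from, the pipeline's vectorized stages.

```cpp
#include <cassert>
#include <cstdint>

struct RGBAf { float r, g, b, a; };

// Byte i of a 32-bit pixel (i = 0 is the least significant byte),
// scaled from [0, 255] to [0, 1].
static float channel(uint32_t px, int i) {
    return float((px >> (8 * i)) & 0xFF) / 255.0f;
}

// RGBA byte order: byte 0 is red.
RGBAf load_8888(uint32_t px) {
    return {channel(px, 0), channel(px, 1), channel(px, 2), channel(px, 3)};
}

// BGRA byte order: byte 0 is blue. Swapping while loading makes a
// separate swap_rb stage unnecessary.
RGBAf load_bgra(uint32_t px) {
    return {channel(px, 2), channel(px, 1), channel(px, 0), channel(px, 3)};
}
```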
* remove unused "swap" stage | Mike Klein | 2017-06-27
  Change-Id: I25619f010f8ac6441529cfe8dff2d8c42d7400cf
  Reviewed-on: https://skia-review.googlesource.com/20988
  Reviewed-by: Mike Reed <reed@google.com>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* specialize loaders for dst registers, to avoid move/swap stages | Mike Reed | 2017-06-27
  Bug: skia:
  Change-Id: I75d82ef2226c5f116b7de2208c4e914739414b6d
  Reviewed-on: https://skia-review.googlesource.com/20984
  Commit-Queue: Mike Reed <reed@google.com>
  Reviewed-by: Mike Klein <mtklein@chromium.org>
* add dumbest possible 32-bit SkJumper backend | Mike Klein | 2017-06-27
  Everything uses a ton of stack, nothing tail calls, and for now this is non-Windows only. But it does run faster than the portable serial code.
  On my trashcan, running `monobench SkRasterPipeline_compile`:
    - Normal 64-bit AVX build: 43.6ns
    - Before this CL, 32-bit: 707.9ns
    - This CL: 147.5ns
  Change-Id: I4a8929570ace47193ed8925c58b70bb22d6b1447
  Reviewed-on: https://skia-review.googlesource.com/20964
  Reviewed-by: Mike Reed <reed@google.com>
  Reviewed-by: Herb Derby <herb@google.com>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* add _hsw lowp backend | Mike Klein | 2017-06-27
  CQ_INCLUDE_TRYBOTS=skia.primary:Build-Ubuntu-Clang-x86_64-Debug-MSAN
  Change-Id: Id53279c17589b3434629bb644358ee238af8649f
  Reviewed-on: https://skia-review.googlesource.com/20269
  Commit-Queue: Mike Klein <mtklein@chromium.org>
  Reviewed-by: Herb Derby <herb@google.com>
  Reviewed-by: Mike Reed <reed@google.com>
* delete lowp plus | Mike Klein | 2017-06-15
  I have figured out how to implement lowp clamp_1/clamp_a, and implementing clamp_1 would make lowp plus active. But... the way we have factored blend modes requires us to be able to lerp between the dst and possibly-out-of-range src values. This is not possible in lowp. If we try to multiply with values in [0x8001,0xffff], we'll just get garbage. We'll clamp them back in range, but sadly clamped garbage is still garbage.
  So the simplest thing to do is keep plus blends in floats. This CL doesn't even change that... we'd use floats before and after it. It just removes the lowp plus stage code that is both dead and buggy.
  As far as I can tell, no other drawing is currently gated by lowp missing clamp_1 or clamp_a.
  Change-Id: I55b73c840614f1bff9cd610dff90ca5e2b5c73e5
  Reviewed-on: https://skia-review.googlesource.com/19909
  Reviewed-by: Herb Derby <herb@google.com>
  Commit-Queue: Mike Klein <mtklein@google.com>
* Experimental change to diagnose image diffs in g3 | Mike Reed | 2017-06-08
  Bug: skia:
  Change-Id: I33226a0266093a98083b4c78cdaba402ce3f3929
  Reviewed-on: https://skia-review.googlesource.com/19082
  Commit-Queue: Mike Reed <reed@google.com>
  Reviewed-by: Ben Wagner <benjaminwagner@google.com>
* more easy lowp stages | Mike Klein | 2017-06-06
  Change-Id: I8a292bc98135b41ceedb4242451436c3657616fc
  Reviewed-on: https://skia-review.googlesource.com/18722
  Reviewed-by: Mike Reed <reed@google.com>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* more lowp blend modes | Mike Klein | 2017-06-05
  Change-Id: Id62e989d4278f273c040b159ed4d2fd6a2f209e0
  Reviewed-on: https://skia-review.googlesource.com/18627
  Reviewed-by: Herb Derby <herb@google.com>
  Reviewed-by: Mike Reed <reed@google.com>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* lowp: add some big easy stages | Mike Klein | 2017-06-05
  srcover_rgba_8888, lerp_u8, lerp_1_float, scale_u8, scale_1_float... this is enough for _lots_ of drawing.
  Change-Id: Ibe42adb8b1da6c66db3085851561dc9070556ee3
  Reviewed-on: https://skia-review.googlesource.com/18622
  Reviewed-by: Herb Derby <herb@google.com>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
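A float sketch of what scale_u8 and lerp_u8 compute per pixel: an 8-bit coverage value, read as cov/255, either scales the source or blends from destination toward source. The Px struct and function bodies are illustrative; the lowp versions work in fixed point, and the _1_float variants presumably take a single float coverage instead of a per-pixel u8 mask.

```cpp
#include <cassert>

struct Px { float r, g, b, a; };

// scale_u8: multiply every channel of the source by coverage cov/255.
Px scale_u8(Px src, unsigned char cov) {
    float c = cov / 255.0f;
    return {src.r * c, src.g * c, src.b * c, src.a * c};
}

// lerp_u8: move each channel from dst toward src by coverage cov/255,
// d' = d + (s - d) * c.
Px lerp_u8(Px src, Px dst, unsigned char cov) {
    float c = cov / 255.0f;
    auto lerp = [c](float s, float d) { return d + (s - d) * c; };
    return {lerp(src.r, dst.r), lerp(src.g, dst.g),
            lerp(src.b, dst.b), lerp(src.a, dst.a)};
}
```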
* lowp: add constant_color, swap, move_dst_src | Mike Klein | 2017-06-05
  This is enough for us to do some really simple draws.
  Also add some debug tools to help prioritize porting.
  Change-Id: I334f8fd2133be1aeec3f3406371a81aa6c184776
  Reviewed-on: https://skia-review.googlesource.com/18597
  Commit-Queue: Mike Klein <mtklein@chromium.org>
  Reviewed-by: Herb Derby <herb@google.com>
* lowp: add move_src_dst and srcover | Mike Klein | 2017-06-05
  This is enough to run the bench SkRasterPipeline_compile.
      $ ninja -C out monobench; and out/monobench SkRasterPipeline_compile 300
      Before: 300 SkRasterPipeline_compile 48.4858ns
      After:  300 SkRasterPipeline_compile 37.5801ns
  Change-Id: Icb80348908dfb016826700a44566222c9f7a853c
  Reviewed-on: https://skia-review.googlesource.com/18595
  Reviewed-by: Herb Derby <herb@google.com>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* Real fix for stage perf regression. | Mike Klein | 2017-06-05
  When we made start_pipeline() return void, the call into the tail!=0 run of the pipeline became eligible to be a tail-call, and Clang made that choice. This had the side effect of not going through vzeroupper on those tails.
  We now mark start_pipeline() as ineligible for tail calls when targeting AVX+. All paths go through the vzeroupper at the end.
  BUG=chromium:729237
  Change-Id: I2099931284214f24c67b38979b3ad4b4d10e8bba
  Reviewed-on: https://skia-review.googlesource.com/18591
  Reviewed-by: Herb Derby <herb@google.com>
  Commit-Queue: Mike Klein <mtklein@chromium.org>