aboutsummaryrefslogtreecommitdiffhomepage
path: root/src/jumper/SkJumper_stages_lowp.cpp
Commit message (Collapse)AuthorAge
* add srcover_bgra_8888Gravatar Mike Klein2017-10-24
| | | | | | | | | | | | | | Chrome generally uses BGRA buffers, so srcover_rgba_8888 isn't really doing them any good. Probably a good idea to cover both kN32 options any time we specialize like this? There's one small diff, so I've lazily guarded this by SK_LEGACY_LOWP_STAGES, which I want to rebaseline today anyway. Change-Id: Ice672aa01a3fc83be0798580d6730a54df075478 Reviewed-on: https://skia-review.googlesource.com/63301 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Mike Reed <reed@google.com>
* more easy lowp shader stagesGravatar Mike Klein2017-10-24
| | | | | | | | | | | | | | This fills out a couple more matrix and gather stages. Deletes a not particularly important unit test that was using a scale matrix in a weird, non-lowp compatible way. This will require guards for Blink layout tests. Change-Id: I54cb228ff541f771e8f4758f07d26c5161d48af3 Reviewed-on: https://skia-review.googlesource.com/62520 Reviewed-by: Mike Reed <reed@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* start on lowp shadersGravatar Mike Klein2017-10-20
| | | | | | | | | | | | | | | | | | | | | | | | | We're going to want to assign types to the stages depending on their inputs and outputs: GG: x,y -> x,y GP: x,y -> r,g,b,a PP: r,g,b,a -> r,g,b,a (There are a couple other degenerate cases here, where a stage ignores its inputs or creates no outputs, but we can always just pretend their null input or output is one type or the other arbitrarily.) The GG stages will be pretty much entirely float code, and the GP stages a mix of float math and byte stuff. Since we've chosen U16 to match our register size in _lowp land, we'll unpack each F register across two of those for transport between stages. This is a notional, free operation in both directions. Change-Id: I605311d0dc327a1a3a9d688173d9498c1658e715 Reviewed-on: https://skia-review.googlesource.com/60800 Reviewed-by: Herb Derby <herb@google.com> Reviewed-by: Florin Malita <fmalita@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
* rename (x,y) to (dx,dy)Gravatar Mike Klein2017-10-18
| | | | | | | | | | | | | | | Today (x,y) are the integer coordinates of the first destination pixel we're working on. By renaming them (dx,dy), we free up the names (x,y) for working (i.e. _source_) x and y. Until now we've generally just been continuing to call those (r,g), but in the _lowp code that won't be possible (r+g hold x together, b+a y) but we'll have the ability to just give them proper names x and y. Change-Id: Id5faa09c4406116df5df7494efc6cb23659e9a2f Reviewed-on: https://skia-review.googlesource.com/60820 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* Retry, Bump stored lowp uniform color to 16-bit storage.Gravatar Mike Klein2017-09-16
| | | | | | | | | | | This makes loading into 16-bit channels more natural in _lowp.cpp. Update a unit test to stop using out-of-range "colors". Change-Id: I494687aac87948b60a40de447aa1527cf7167b2d Cq-Include-Trybots: skia.primary:Test-Debian9-Clang-GCE-CPU-AVX2-x86_64-Release-UBSAN_float_cast_overflow Reviewed-on: https://skia-review.googlesource.com/47580 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
* Implement some easy _lowp stages.Gravatar Mike Klein2017-09-16
| | | | | | | | | | | | | - load_565 allows 565-src sprite blits - scale_565 / lerp_565 allow subpixel text - luminance_to_alpha is a color filter, and lets us write grey 8 And update CachedDecodingPixelRefTest with a yet more robust color. Change-Id: I8af499c43f0f28093744d9c2993af553e36c9526 Reviewed-on: https://skia-review.googlesource.com/47021 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Mike Reed <reed@google.com>
* Revert "Bump stored lowp uniform color to 16-bit storage."Gravatar Mike Klein2017-09-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit d286bfbd96f8b7ccf1cbce74f07d2f3917dbec30. Reason for revert: ../../../src/core/SkRasterPipeline.cpp:98:34: runtime error: 4.87906e+09 is outside the range of representable values of type 'unsigned short' Excellent new bot! Original change's description: > Bump stored lowp uniform color to 16-bit storage. > > This makes loading into 16-bit channels more natural in _lowp.cpp. > > Change-Id: I1ed393873654060ef52f4632d670465528006bbd > Reviewed-on: https://skia-review.googlesource.com/47261 > Reviewed-by: Mike Reed <reed@google.com> > Commit-Queue: Mike Klein <mtklein@chromium.org> TBR=mtklein@chromium.org,reed@google.com Change-Id: Ia65645c1261a7b31588c4ddaf2b1b3b327d265b0 No-Presubmit: true No-Tree-Checks: true No-Try: true Reviewed-on: https://skia-review.googlesource.com/47540 Reviewed-by: Mike Klein <mtklein@google.com> Commit-Queue: Mike Klein <mtklein@google.com>
* Bump stored lowp uniform color to 16-bit storage.Gravatar Mike Klein2017-09-16
| | | | | | | | | This makes loading into 16-bit channels more natural in _lowp.cpp. Change-Id: I1ed393873654060ef52f4632d670465528006bbd Reviewed-on: https://skia-review.googlesource.com/47261 Reviewed-by: Mike Reed <reed@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* centralize SI, Ctx, and load_and_inc()Gravatar Mike Klein2017-09-15
| | | | | | | | | | | | | | | | | We've got independent definitions of SI, LazyCtx/Ctx, and load_and_inc() in _stages.cpp and _lowp.cpp. It's a good time to centralize them, taking _stages.cpp's SI and load_and_inc(), and _lowp's Ctx. SI and load_and_inc() are uninterestingly different. But using _lowp's Ctx will let us get its prettier typed stage definitions into _stages.cpp, but that is not not done here. This is a pure refactor with no generated code changes. Change-Id: I53260b0fdc71a77bf9e3ed6f3df3a2a4cbd2392b Reviewed-on: https://skia-review.googlesource.com/47181 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
* fix android rollGravatar Mike Klein2017-09-14
| | | | | | | | | | | | | | | | Guarding loads of 8-15 with defined(__AVX2__) should prevent errors like these: external/skia/src/jumper/SkJumper_stages_lowp.cpp:287:46: error: 'memcpy' called with size bigger than buffer case 12: memcpy(&v, ptr, 12*sizeof(T)); break; The loads of 8-15 were of course unreachable, given the &(N-1) == &7. Change-Id: Ifcb5c177c6909e1df55cb564779a4d6610ff7b32 Reviewed-on: https://skia-review.googlesource.com/46521 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
* grand unifried lowp stagesGravatar Mike Klein2017-09-14
| | | | | | | | | | | | | | | | | | | | | | | | | I have text_16_AA_FF -> 8888 (forcing RP) faster than head now on my laptop. I'm feeling confident that we can make this perform well. After looking at performance a bit more today, it looks like everything is within what I'd consider comparable in performance, especially on ARM. On x86-64 it looks like big bulk blits get a little slower and small mask blits get a little faster. Quality looks good, and maybe improved for 565. There are fewer platform-specific differences now in _lowp, and I think they're few enough now that we could even consider completing the unification by folding the 8-bit and float code together. Rename "div255()" to "rebias()", slap on a few coats of paint... Guarded for Chrome with SK_JUMPER_LEGACY_LOWP. Change-Id: I36309c07cf736f3cb31952cca66030ad56026318 Reviewed-on: https://skia-review.googlesource.com/45982 Reviewed-by: Herb Derby <herb@google.com> Reviewed-by: Florin Malita <fmalita@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
* 15-bit lowp is dead, long live 8-bit lowpGravatar Mike Klein2017-08-04
| | | | | | | Change-Id: Icc4b06094aeba3af99b534746f66286d776ef78a Reviewed-on: https://skia-review.googlesource.com/30920 Reviewed-by: Florin Malita <fmalita@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
* Store float and byte constant colors.Gravatar Mike Klein2017-08-03
| | | | | | | | | This makes loading them much simpler in 8-bit mode. Change-Id: I35ff34ebd0b93425c4e39e055bf4ade8cf8561e1 Reviewed-on: https://skia-review.googlesource.com/30621 Reviewed-by: Florin Malita <fmalita@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
* convert over to 2d-modeGravatar Mike Klein2017-07-20
| | | | | | | | | | | | [√] convert all stages to use SkJumper_MemoryCtx / be 2d-compatible [√] convert compile to 2d also, remove 1d run/compile [√] convert all call sites [√] no diffs Change-Id: I3b806eb8fe0c3ec043359616409f7cd1211a1e43 Reviewed-on: https://skia-review.googlesource.com/24263 Commit-Queue: Mike Klein <mtklein@google.com> Reviewed-by: Florin Malita <fmalita@chromium.org>
* start on raster pipeline 2d modeGravatar Mike Klein2017-07-18
| | | | | | | | | | - Add run_2d(x,y,w,h) and start_pipeline_2d(). - Add and test a 2d-compatible store_8888_2d stage. Change-Id: Ib9c225d1b8cb40471ae4333df1d06eec4d506f8a Reviewed-on: https://skia-review.googlesource.com/24401 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Florin Malita <fmalita@chromium.org>
* minor fixes to start_pipeline_lowpGravatar Mike Klein2017-07-18
| | | | | | | | | | - in _lowp.cpp, JUMPER is always defined, so no need to check. - the return type of this function has been void for a while. Change-Id: I5271e8dab784f46c7ffa9cfba6eb55b5e399b537 Reviewed-on: https://skia-review.googlesource.com/24326 Reviewed-by: Mike Reed <reed@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* add stages for black and white colorsGravatar Mike Reed2017-07-06
| | | | | | | | | | | | | | histogram of test skps: black: 1/7 white: 2/7 other: 4/7 Bug: skia: Change-Id: I3a092899d31ce87837e66e5c8ea9ec5e0f239361 Reviewed-on: https://skia-review.googlesource.com/21408 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Reed <reed@google.com>
* add bgra as 1st class formatGravatar Mike Klein2017-06-27
| | | | | | | | | | | | This is a start to eliminating swap_rb as a stage. I've just hit the main hot spots here. Going to look into the ~dozen other spots to see how they should work next. Change-Id: I26fb46a042facf7bd6fff3b47c9fcee86d7142fd Reviewed-on: https://skia-review.googlesource.com/20982 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Mike Reed <reed@google.com>
* remove unused "swap" stageGravatar Mike Klein2017-06-27
| | | | | | | Change-Id: I25619f010f8ac6441529cfe8dff2d8c42d7400cf Reviewed-on: https://skia-review.googlesource.com/20988 Reviewed-by: Mike Reed <reed@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* specialize loaders for dst registers, to avoid move/swap stagesGravatar Mike Reed2017-06-27
| | | | | | | | Bug: skia: Change-Id: I75d82ef2226c5f116b7de2208c4e914739414b6d Reviewed-on: https://skia-review.googlesource.com/20984 Commit-Queue: Mike Reed <reed@google.com> Reviewed-by: Mike Klein <mtklein@chromium.org>
* try not zeroing registers in start_pipelineGravatar Mike Klein2017-06-27
| | | | | | | | | | | | | | | | | Generally stages take care of state setup themselves, either with seed_shader, constant_color, a load, etc. I think these zeros may be unnecessarily cautious. This can't make anything draw more correctly, but it could make things - draw wrong - draw more slowly - draw more quickly so it's an interesting thing to try and keep an eye on. Change-Id: I7e5ea3cd79e55a65e1dbd214601e147ba3815b87 Reviewed-on: https://skia-review.googlesource.com/20976 Reviewed-by: Mike Reed <reed@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* add _hsw lowp backendGravatar Mike Klein2017-06-27
| | | | | | | | | | CQ_INCLUDE_TRYBOTS=skia.primary:Build-Ubuntu-Clang-x86_64-Debug-MSAN Change-Id: Id53279c17589b3434629bb644358ee238af8649f Reviewed-on: https://skia-review.googlesource.com/20269 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Herb Derby <herb@google.com> Reviewed-by: Mike Reed <reed@google.com>
* somewhat less silly tail loads and storesGravatar Mike Klein2017-06-26
| | | | | | | | | | | | | | | | | | | | No reason to keep going one at a time when we know there are generally better ways to handle loading a power-of-two number of low lanes. This strategy scales up too, with quick answers for 8 (one 8 byte load), 12 (one 8 byte, one 4 byte), etc. $ ninja -C out monobench; and out/monobench SkRasterPipeline_compile 300 Before: 46.946ns After: 43.341ns (This happens to be _lowp. Expect similar small speedups elsewhere.) Change-Id: I08f87769ea3c9f06ad13d2b1d5326e542b9b63a8 Reviewed-on: https://skia-review.googlesource.com/20903 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* lean more on the compiler in lowp stagesGravatar Mike Klein2017-06-26
| | | | | | | | | | | | | | This refactors {from,to}_{byte,8888} to lean a bit more on the compiler, and to share code between the two. The algorithm is not exactly the same, but it's comparable, and the results of course are identical. This new algorithm is a lot easier to generalize to AVX2, and parallels the full-precision {from,to}_{byte,8888} functions in _stages.cpp. Change-Id: I31ea90d65967bf4ede2497d1e2197cb0e7648bf8 Reviewed-on: https://skia-review.googlesource.com/20828 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* rephrase lowp constant_colorGravatar Mike Klein2017-06-19
| | | | | | | | | | | | | | | This doesn't change the generated code (no .S files change), but it does rephrase what we're trying to do to make it generalize to AVX2 better: - load 4 floats - add 256.0f to each - splat out the low 2 bytes of each 4 byte lane as r,g,b,a Change-Id: Iadc5bc1f2a268679d1ccadd31cd24949a71e0aa4 Reviewed-on: https://skia-review.googlesource.com/20270 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
* remove defined(JUMPER) guards in _lowp.cppGravatar Mike Klein2017-06-19
| | | | | | | | | | JUMPER is always defined in that file; we never use it as a portable fallback. Change-Id: Ic7caf726191599d4058adbf80084ede9f80676ee Reviewed-on: https://skia-review.googlesource.com/20271 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
* delete lowp plusGravatar Mike Klein2017-06-15
| | | | | | | | | | | | | | | | | | | | | | | I have figured out how to implement lowp clamp_1/clamp_a, and implementing clamp_1 would make lowp plus active. But... the way we have factored blend modes requires us to be able to lerp between the dst and possibly-out-of-range src values. This is not possible in lowp. If we try to multiply with values in [0x8001,0xffff], we'll just get garbage. We'll clamp them back in range, but sadly clamped garbage is still garbage. So the simplest thing to do is keep plus blends in floats. This CL doesn't even change that... we'd use floats before and after it. It just removes the lowp plus stage code that is both dead and buggy. As far as I can tell, no other drawing is currently gated by lowp missing clamp_1 or clamp_a. Change-Id: I55b73c840614f1bff9cd610dff90ca5e2b5c73e5 Reviewed-on: https://skia-review.googlesource.com/19909 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@google.com>
* use smarter float -> skfixed15 logic everywhereGravatar Mike Klein2017-06-06
| | | | | | | | | | | | | | | | | | | | | | This is the same logic from constant_color, covering all the other places where we convert from float to fixed, e.g. scale_1_float. This isn't quite ideal yet. We replace mulss+cvttss2si for addss+movd, which is great, but this leads to a silly sequence of code: addss %xmm2, %xmm0 movd %xmm0, %r9d movd %r9d, %xmm0 pshuflw $0x0, %xmm0, %xmm0 Those two movd are pointless... Again, all diffs due to switching from truncation to rounding. Change-Id: Icf6f3b6eb370fe41cea0cebcfda0b8907e055f41 Reviewed-on: https://skia-review.googlesource.com/18846 Reviewed-by: Mike Reed <reed@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* less naive lowp constant_colorGravatar Mike Klein2017-06-06
| | | | | | | | | | | This is as good as we can get without switching away from float inputs. All diffs due to rounding (from the +256.0f). Change-Id: I0d314f111d313577ce9078660178be17e865f11e Reviewed-on: https://skia-review.googlesource.com/18845 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Mike Reed <reed@google.com>
* more easy lowp stagesGravatar Mike Klein2017-06-06
| | | | | | | Change-Id: I8a292bc98135b41ceedb4242451436c3657616fc Reviewed-on: https://skia-review.googlesource.com/18722 Reviewed-by: Mike Reed <reed@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* more lowp blend modesGravatar Mike Klein2017-06-05
| | | | | | | | Change-Id: Id62e989d4278f273c040b159ed4d2fd6a2f209e0 Reviewed-on: https://skia-review.googlesource.com/18627 Reviewed-by: Herb Derby <herb@google.com> Reviewed-by: Mike Reed <reed@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* lowp: add some big easy stagesGravatar Mike Klein2017-06-05
| | | | | | | | | | srcover_rgba_8888, lerp_u8, lerp_1_float, scale_u8, scale_1_float... this is enough for _lots_ of drawing. Change-Id: Ibe42adb8b1da6c66db3085851561dc9070556ee3 Reviewed-on: https://skia-review.googlesource.com/18622 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* lowp: add constant_color, swap, move_dst_srcGravatar Mike Klein2017-06-05
| | | | | | | | | | This is enough for us to do some really simple draws. Also add some debug tools to help prioritize porting. Change-Id: I334f8fd2133be1aeec3f3406371a81aa6c184776 Reviewed-on: https://skia-review.googlesource.com/18597 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Herb Derby <herb@google.com>
* lowp: add move_src_dst and srcoverGravatar Mike Klein2017-06-05
| | | | | | | | | | | | | | This is enough to run the bench SkRasterPipeline_compile. $ ninja -C out monobench; and out/monobench SkRasterPipeline_compile 300 Before: 300 SkRasterPipeline_compile 48.4858ns After: 300 SkRasterPipeline_compile 37.5801ns Change-Id: Icb80348908dfb016826700a44566222c9f7a853c Reviewed-on: https://skia-review.googlesource.com/18595 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* slight streamlining for lowp load_8888 with pshufbGravatar Mike Klein2017-06-05
| | | | | | | | | We can use 2 pshufb to replace 4 unpacks when deinterlacing the colors. Change-Id: I713fbbc94f5cb9eaf14f85323b0ec76dc2246e98 Reviewed-on: https://skia-review.googlesource.com/18531 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Herb Derby <herb@google.com>
* have start_pipeline() return limit againGravatar Mike Klein2017-06-05
| | | | | | | | | | | | | | | | | | | This is spooky. I don't quite yet understand why, but this makes things much faster. Performance regressed across the board when we no longer needed the value and changed it to return void: https://perf.skia.org/e/?begin=1496176469&keys=6994&xbaroffset=28513 You can see similar regressions following this Chromium bug link. BUG=chromium:729237 Change-Id: I68371b0456014f909acf819aca52aa4f4f187460 Reviewed-on: https://skia-review.googlesource.com/18580 Reviewed-by: Herb Derby <herb@google.com> Reviewed-by: Mike Reed <reed@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* start on SkJumper lowp modeGravatar Mike Klein2017-06-04
Just 3 stages implemented so far: load_8888 swap_rb store_8888 That's enough to make the shortest non-trivial pipeline that you see in the new unit test. Change-Id: Iabf90866ab452f7183d8c8dec1405ece2db695dc Reviewed-on: https://skia-review.googlesource.com/18458 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Herb Derby <herb@google.com>