aboutsummaryrefslogtreecommitdiffhomepage
path: root/src/jumper/SkJumper_generated_win.S
Commit message (Collapse)AuthorAge
* add stages for black and white colorsGravatar Mike Reed2017-07-06
| | | | | | | | | | | | | | histogram of test skps: black: 1/7 white: 2/7 other: 4/7 Bug: skia: Change-Id: I3a092899d31ce87837e66e5c8ea9ec5e0f239361 Reviewed-on: https://skia-review.googlesource.com/21408 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Reed <reed@google.com>
* optimize for diff matrix typesGravatar Mike Reed2017-07-05
| | | | | | | | Bug: skia: Change-Id: I671e07c5bbb9e4ced92303c9959143324f7a6bdc Reviewed-on: https://skia-review.googlesource.com/21523 Commit-Queue: Mike Reed <reed@google.com> Reviewed-by: Herb Derby <herb@google.com>
* 2pt conical stage for focal-point-outside caseGravatar Florin Malita2017-06-29
| | | | | | | | | | | | | | | | | A couple of annoyances here: 1) the prev vector_scale stage is not usable for masking, as NaN values can propagate through => switch to actual masking 2) for the outside case, we must select the min root when the gradient is flipped => split into two templated stages (_min, _max) (I'm not convinced that we need to flip the gradient for RP at all; we can investigate later) Change-Id: I0283812d613a53124f2987d1aea1f26e4533655e Reviewed-on: https://skia-review.googlesource.com/21162 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Florin Malita <fmalita@chromium.org>
* 2pt conical stage for focal-pt-on-edge caseGravatar Florin Malita2017-06-28
| | | | | | | | | | | | | | | | | | | When the focal point is on the edge of the end circle, the quadratic equation devolves to linear. Add a stage to handle this case. As a complication, this case can produce "degenerate" values: 1) t == NaN 2) R(t) < 0 For these, we're supposed to draw transparent black - which means overwriting the color from the gradient stage. To support this, build a 0/1 vector mask in the context, and apply it post-gradient-stage. Change-Id: Ice4e3243abfd8c784bb810f6c310aed7a4ac7dc8 Reviewed-on: https://skia-review.googlesource.com/21111 Commit-Queue: Florin Malita <fmalita@chromium.org> Reviewed-by: Mike Klein <mtklein@google.com>
* 2ptconical stageGravatar Florin Malita2017-06-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | Initial impl, for the well-behaved case (focal point inside). MBP numbers - Before: 3365.87 ! gradient_conical_clamp_shallow srgb 3590.88 ! gradient_conical_clamp_shallow_dither srgb 3376.91 ! gradient_conical_clamp_3color srgb 3351.64 ! gradient_conical_clamp_hicolor srgb 3379.35 ! gradient_conical_clamp srgb After: 648.93 ! gradient_conical_clamp_shallow srgb 665.12 ! gradient_conical_clamp_shallow_dither srgb 773.98 ! gradient_conical_clamp_3color srgb 1175.35 ! gradient_conical_clamp_hicolor srgb 619.17 ! gradient_conical_clamp srgb Change-Id: I07b22a758363e1f340a6041bca53bdef74229eb9 Reviewed-on: https://skia-review.googlesource.com/20906 Commit-Queue: Florin Malita <fmalita@chromium.org> Reviewed-by: Mike Klein <mtklein@chromium.org>
* add bgra as 1st class formatGravatar Mike Klein2017-06-27
| | | | | | | | | | | | This is a start to eliminating swap_rb as a stage. I've just hit the main hot spots here. Going to look into the ~dozen other spots to see how they should work next. Change-Id: I26fb46a042facf7bd6fff3b47c9fcee86d7142fd Reviewed-on: https://skia-review.googlesource.com/20982 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Mike Reed <reed@google.com>
* remove unused "swap" stageGravatar Mike Klein2017-06-27
| | | | | | | Change-Id: I25619f010f8ac6441529cfe8dff2d8c42d7400cf Reviewed-on: https://skia-review.googlesource.com/20988 Reviewed-by: Mike Reed <reed@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* specialize loaders for dst registers, to avoid move/swap stagesGravatar Mike Reed2017-06-27
| | | | | | | | Bug: skia: Change-Id: I75d82ef2226c5f116b7de2208c4e914739414b6d Reviewed-on: https://skia-review.googlesource.com/20984 Commit-Queue: Mike Reed <reed@google.com> Reviewed-by: Mike Klein <mtklein@chromium.org>
* try not zeroing registers in start_pipelineGravatar Mike Klein2017-06-27
| | | | | | | | | | | | | | | | | Generally stages take care of state setup themselves, either with seed_shader, constant_color, a load, etc. I think these zeros may be unnecessarily cautious. This can't make anything draw more correctly, but it could make things - draw wrong - draw more slowly - draw more quickly so it's an interesting thing to try and keep an eye on. Change-Id: I7e5ea3cd79e55a65e1dbd214601e147ba3815b87 Reviewed-on: https://skia-review.googlesource.com/20976 Reviewed-by: Mike Reed <reed@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* add _hsw lowp backendGravatar Mike Klein2017-06-27
| | | | | | | | | | CQ_INCLUDE_TRYBOTS=skia.primary:Build-Ubuntu-Clang-x86_64-Debug-MSAN Change-Id: Id53279c17589b3434629bb644358ee238af8649f Reviewed-on: https://skia-review.googlesource.com/20269 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Herb Derby <herb@google.com> Reviewed-by: Mike Reed <reed@google.com>
* somewhat less silly tail loads and storesGravatar Mike Klein2017-06-26
| | | | | | | | | | | | | | | | | | | | No reason to keep going one at a time when we know there are generally better ways to handle loading a power-of-two number of low lanes. This strategy scales up too, with quick answers for 8 (one 8 byte load), 12 (one 8 byte, one 4 byte), etc. $ ninja -C out monobench; and out/monobench SkRasterPipeline_compile 300 Before: 46.946ns After: 43.341ns (This happens to be _lowp. Expect similar small speedups elsewhere.) Change-Id: I08f87769ea3c9f06ad13d2b1d5326e542b9b63a8 Reviewed-on: https://skia-review.googlesource.com/20903 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* lean more on the compiler in lowp stagesGravatar Mike Klein2017-06-26
| | | | | | | | | | | | | | This refactors {from,to}_{byte,8888} to lean a bit more on the compiler, and to share code between the two. The algorithm is not exactly the same, but it's comparable, and the results of course are identical. This new algorithm is a lot easier to generalize to AVX2, and parallels the full-precision {from,to}_{byte,8888} functions in _stages.cpp. Change-Id: I31ea90d65967bf4ede2497d1e2197cb0e7648bf8 Reviewed-on: https://skia-review.googlesource.com/20828 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* use mul_inv instead of div for tilingGravatar Mike Reed2017-06-23
| | | | | | | | | Bug: skia: Change-Id: I5f88e7923fe204faba8dc5d87454805a4d470d52 Reviewed-on: https://skia-review.googlesource.com/20688 Reviewed-by: Mike Klein <mtklein@chromium.org> Reviewed-by: Florin Malita <fmalita@chromium.org> Commit-Queue: Mike Reed <reed@google.com>
* fix repeat/mirror sampling bleedGravatar Mike Klein2017-06-22
| | | | | | | | | | | | | | | | | I think this has been broken since we tried to simplify this in https://skia-review.googlesource.com/16547 The HSW backend does still look a little wrong, but improved, and the others seem fixed. Can you see how this affects your test cases, layout tests, etc? BUG=skia:6783 Change-Id: I17957ac8100331bea5b64d674bf43105048b72f6 Reviewed-on: https://skia-review.googlesource.com/20548 Commit-Queue: Mike Klein <mtklein@google.com> Reviewed-by: Florin Malita <fmalita@chromium.org> Reviewed-by: Herb Derby <herb@google.com>
* delete lowp plusGravatar Mike Klein2017-06-15
| | | | | | | | | | | | | | | | | | | | | | | I have figured out how to implement lowp clamp_1/clamp_a, and implementing clamp_1 would make lowp plus active. But... the way we have factored blend modes requires us to be able to lerp between the dst and possibly-out-of-range src values. This is not possible in lowp. If we try to multiply with values in [0x8001,0xffff], we'll just get garbage. We'll clamp them back in range, but sadly clamped garbage is still garbage. So the simplest thing to do is keep plus blends in floats. This CL doesn't even change that... we'd use floats before and after it. It just removes the lowp plus stage code that is both dead and buggy. As far as I can tell, no other drawing is currently gated by lowp missing clamp_1 or clamp_a. Change-Id: I55b73c840614f1bff9cd610dff90ca5e2b5c73e5 Reviewed-on: https://skia-review.googlesource.com/19909 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@google.com>
* Remove AVX+ special case for load<U8>().Gravatar Mike Klein2017-06-13
| | | | | | | | | We haven't really needed this since we got constants working. Change-Id: Ie9de8df861959696ed44fc2a64e259cca786bfe5 Reviewed-on: https://skia-review.googlesource.com/19670 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* use smarter float -> skfixed15 logic everywhereGravatar Mike Klein2017-06-06
| | | | | | | | | | | | | | | | | | | | | | This is the same logic from constant_color, covering all the other places where we convert from float to fixed, e.g. scale_1_float. This isn't quite ideal yet. We replace mulss+cvttss2si for addss+movd, which is great, but this leads to a silly sequence of code: addss %xmm2, %xmm0 movd %xmm0, %r9d movd %r9d, %xmm0 pshuflw $0x0, %xmm0, %xmm0 Those two movd are pointless... Again, all diffs due to switching from truncation to rounding. Change-Id: Icf6f3b6eb370fe41cea0cebcfda0b8907e055f41 Reviewed-on: https://skia-review.googlesource.com/18846 Reviewed-by: Mike Reed <reed@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* less naive lowp constant_colorGravatar Mike Klein2017-06-06
| | | | | | | | | | | This is as good as we can get without switching away from float inputs. All diffs due to rounding (from the +256.0f). Change-Id: I0d314f111d313577ce9078660178be17e865f11e Reviewed-on: https://skia-review.googlesource.com/18845 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Mike Reed <reed@google.com>
* more easy lowp stagesGravatar Mike Klein2017-06-06
| | | | | | | Change-Id: I8a292bc98135b41ceedb4242451436c3657616fc Reviewed-on: https://skia-review.googlesource.com/18722 Reviewed-by: Mike Reed <reed@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* SkJumper: only omit leaf frame pointersGravatar Mike Klein2017-06-05
| | | | | | | | | This seems to make Instruments work a lot better. Change-Id: I078f005d32e427b4eb31bc92c731e3444f2faffb Reviewed-on: https://skia-review.googlesource.com/18635 Reviewed-by: Mike Reed <reed@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* more lowp blend modesGravatar Mike Klein2017-06-05
| | | | | | | | Change-Id: Id62e989d4278f273c040b159ed4d2fd6a2f209e0 Reviewed-on: https://skia-review.googlesource.com/18627 Reviewed-by: Herb Derby <herb@google.com> Reviewed-by: Mike Reed <reed@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* lowp: add some big easy stagesGravatar Mike Klein2017-06-05
| | | | | | | | | | srcover_rgba_8888, lerp_u8, lerp_1_float, scale_u8, scale_1_float... this is enough for _lots_ of drawing. Change-Id: Ibe42adb8b1da6c66db3085851561dc9070556ee3 Reviewed-on: https://skia-review.googlesource.com/18622 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* lowp: add constant_color, swap, move_dst_srcGravatar Mike Klein2017-06-05
| | | | | | | | | | This is enough for us to do some really simple draws. Also add some debug tools to help prioritize porting. Change-Id: I334f8fd2133be1aeec3f3406371a81aa6c184776 Reviewed-on: https://skia-review.googlesource.com/18597 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Herb Derby <herb@google.com>
* lowp: add move_src_dst and srcoverGravatar Mike Klein2017-06-05
| | | | | | | | | | | | | | This is enough to run the bench SkRasterPipeline_compile. $ ninja -C out monobench; and out/monobench SkRasterPipeline_compile 300 Before: 300 SkRasterPipeline_compile 48.4858ns After: 300 SkRasterPipeline_compile 37.5801ns Change-Id: Icb80348908dfb016826700a44566222c9f7a853c Reviewed-on: https://skia-review.googlesource.com/18595 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* slight streamlining for lowp load_8888 with pshufbGravatar Mike Klein2017-06-05
| | | | | | | | | We can use 2 pshufb to replace 4 unpacks when deinterlacing the colors. Change-Id: I713fbbc94f5cb9eaf14f85323b0ec76dc2246e98 Reviewed-on: https://skia-review.googlesource.com/18531 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Herb Derby <herb@google.com>
* Real fix for stage perf regression.Gravatar Mike Klein2017-06-05
| | | | | | | | | | | | | | | | | When we made start_pipeline() return void, the call into the tail!=0 run of the pipeline became eligble to be a tail-call, and Clang made that choice. This had the side effect of not going through vzeroupper on those tails. We now mark start_pipeline() as inelligible for tail calls when targeting AVX+. All paths go through the vzeroupper at the end. BUG=chromium:729237 Change-Id: I2099931284214f24c67b38979b3ad4b4d10e8bba Reviewed-on: https://skia-review.googlesource.com/18591 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* have start_pipeline() return limit againGravatar Mike Klein2017-06-05
| | | | | | | | | | | | | | | | | | | This is spooky. I don't quite yet understand why, but this makes things much faster. Performance regressed across the board when we no longer needed the value and changed it to return void: https://perf.skia.org/e/?begin=1496176469&keys=6994&xbaroffset=28513 You can see similar regressions following this Chromium bug link. BUG=chromium:729237 Change-Id: I68371b0456014f909acf819aca52aa4f4f187460 Reviewed-on: https://skia-review.googlesource.com/18580 Reviewed-by: Herb Derby <herb@google.com> Reviewed-by: Mike Reed <reed@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* start on SkJumper lowp modeGravatar Mike Klein2017-06-04
| | | | | | | | | | | | | | | | Just 3 stages implemented so far: load_8888 swap_rb store_8888 That's enough to make the shortest non-trivial pipeline that you see in the new unit test. Change-Id: Iabf90866ab452f7183d8c8dec1405ece2db695dc Reviewed-on: https://skia-review.googlesource.com/18458 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Herb Derby <herb@google.com>
* plumb y through to SkJumperGravatar Mike Klein2017-06-01
| | | | | | | | | | | | | | | There'll still be a little more refactoring after this, but this is the main thing we want to do. This makes y available in a general-purpose register in pipeline stages, just like x. Stages that need y (seed_shader and dither) can just use it rather than pulling it off a context pointer. seed_shader loses its context pointer, and dither's gets simpler. Change-Id: Ic2d1e13b03fb45b73e308b38aafbb3a14c29cf7f Reviewed-on: https://skia-review.googlesource.com/18383 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* reland: We can mask load and store with just AVXGravatar Mike Klein2017-06-01
| | | | | | | | | | | | Originally reviewed here: https://skia-review.googlesource.com/c/17452/ CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-ShuttleA-GPU-GTX550Ti-x86_64-Release-Valgrind Change-Id: I2e593e897ce93147ec593c2a5de143217274ba2a Reviewed-on: https://skia-review.googlesource.com/18267 Reviewed-by: Mike Klein <mtklein@chromium.org> Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* clean up now that min_stride == 1Gravatar Mike Klein2017-05-31
| | | | | | | Change-Id: I1c285e818c3119bc5917dee6d7fbe4c0c62ff6d7 Reviewed-on: https://skia-review.googlesource.com/18153 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* Add tail handling for SSE* to SkJumper.Gravatar Herb Derby2017-05-25
| | | | | | | Change-Id: Icb9d385333082de2f99b7a25cfd7251717e3f663 Reviewed-on: https://skia-review.googlesource.com/17580 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Herb Derby <herb@google.com>
* add srcover_rgba_8888Gravatar Mike Klein2017-05-25
| | | | | | | Change-Id: Iabdc79183ccd2f9cc513d4bdc530fb078b1627ab Reviewed-on: https://skia-review.googlesource.com/17930 Reviewed-by: Mike Reed <reed@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* make sure to_srgb maps 1 to 1Gravatar Mike Klein2017-05-25
| | | | | | | | | | | CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Release-SK_CPU_LIMIT_SSE2,Test-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Release-SK_CPU_LIMIT_SSE41,Test-Android-Clang-Nexus10-CPU-Exynos5250-arm-Release-Android,Test-Android-Clang-PixelC-CPU-TegraX1-arm64-Release-Android,Test-Android-Clang-Ci20-CPU-IngenicJZ4780-mipsel-Release-Android BUG=skia:6678,skia:6683 Change-Id: I217084fa0a11ad661a8751f0c3b1cade5cc52473 Reviewed-on: https://skia-review.googlesource.com/17902 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* add stage for gaussian alpha to rgba for shadowsGravatar Mike Reed2017-05-24
| | | | | | | | | | speeds up GM:shadow_utils 20% Bug: skia: Change-Id: If52dd5e2c76ace82d06351af1419e0663a3a634f Reviewed-on: https://skia-review.googlesource.com/17844 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Mike Klein <mtklein@chromium.org>
* remove min from repeat and mirror generallyGravatar Mike Klein2017-05-23
| | | | | | | | | | I don't think they do anything anymore after the inclusive/exclusive refactoring. Change-Id: I63f2e010a00953b5b6415de002bcb51ec2b73458 Reviewed-on: https://skia-review.googlesource.com/17490 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Herb Derby <herb@google.com>
* retry tilers against 1Gravatar Mike Klein2017-05-23
| | | | | | | | | These weren't the Valgrind problem. Change-Id: Ic76df460cf0ce7f7ae8155d549f4e92a7f8de040 Reviewed-on: https://skia-review.googlesource.com/17701 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
* Revert "We can mask load and store with just AVX."Gravatar Brian Osman2017-05-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 139e463dc6f965fdaed854efcb20c6cafbb6dbdc. Reason for revert: Crashes on Valgrind bots. Original change's description: > We can mask load and store with just AVX. > > Previously we were using AVX2 instructions to generate the masks, > and AVX2 instructions for the mask load and stores themselves. > > AVX came with float mask loads and stores, which will work perfectly > fine. I don't really get what the point of the 32-bit int loads and > stores are in AVX2, beyond maybe syntax sugar? > > Change-Id: I81fa55fb09daea4f5546f8c9ebbc886015edce51 > Reviewed-on: https://skia-review.googlesource.com/17452 > Reviewed-by: Herb Derby <herb@google.com> > Commit-Queue: Ravi Mistry <rmistry@google.com> > TBR=mtklein@chromium.org,rmistry@google.com,herb@google.com,reed@google.com NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true Change-Id: I3a48f006c20475f6334ff94998281f381c696c93 Reviewed-on: https://skia-review.googlesource.com/17524 Reviewed-by: Brian Osman <brianosman@google.com> Commit-Queue: Brian Osman <brianosman@google.com>
* Revert "add tilers against 1"Gravatar Brian Osman2017-05-22
| | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 8110b849d60455d6fb594f26919f0f38c3ec9925. Reason for revert: Valgrind requires reverting ancestor commit. Original change's description: > add tilers against 1 > > Change-Id: I2482972a43cb89a93cbfb9e708614e0334002e53 > Reviewed-on: https://skia-review.googlesource.com/17483 > Reviewed-by: Herb Derby <herb@google.com> > Commit-Queue: Mike Klein <mtklein@chromium.org> > TBR=mtklein@chromium.org,herb@google.com,reed@google.com NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true Change-Id: Icbc3a2212f800854ef7b2b17aa99fedad182d53e Reviewed-on: https://skia-review.googlesource.com/17523 Reviewed-by: Brian Osman <brianosman@google.com> Commit-Queue: Brian Osman <brianosman@google.com>
* add tilers against 1Gravatar Mike Klein2017-05-22
| | | | | | | Change-Id: I2482972a43cb89a93cbfb9e708614e0334002e53 Reviewed-on: https://skia-review.googlesource.com/17483 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* We can mask load and store with just AVX.Gravatar Mike Klein2017-05-22
| | | | | | | | | | | | | | Previously we were using AVX2 instructions to generate the masks, and AVX2 instructions for the mask load and stores themselves. AVX came with float mask loads and stores, which will work perfectly fine. I don't really get what the point of the 32-bit int loads and stores are in AVX2, beyond maybe syntax sugar? Change-Id: I81fa55fb09daea4f5546f8c9ebbc886015edce51 Reviewed-on: https://skia-review.googlesource.com/17452 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Ravi Mistry <rmistry@google.com>
* tidy up dither stageGravatar Mike Klein2017-05-20
| | | | | | | | | | Using the float iota was just an expedient to write the stage... this CL adds the U32 iota that dither really wants. Change-Id: I7990b10afd0c5277186b6b8e730245d291bcef0c Reviewed-on: https://skia-review.googlesource.com/17441 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* stage version of verticesGravatar Mike Reed2017-05-19
| | | | | | | | | | | | | | | | This CL, just to limit its size/complexity, only handles colors but not textures. Future CLs will cover everything. Performance is pretty exciting. Its faster than the old code-path, and when we fix a bug in pathutils to preserve opaqueness, it gets a lot faster (8 -> 5) Bug: skia: Change-Id: I4113060e25fe25fe4e6a0ea59bd4fa5e33abc668 Reviewed-on: https://skia-review.googlesource.com/17276 Commit-Queue: Mike Reed <reed@google.com> Reviewed-by: Mike Klein <mtklein@chromium.org> Reviewed-by: Florin Malita <fmalita@chromium.org>
* try Herb's new to_srgbGravatar Mike Klein2017-05-16
| | | | | | | | | | This was 6-8% faster than the previous code on my Trashcan. Change-Id: I70081009e233c83226d6d302f871fb7e86cdc438 Reviewed-on: https://skia-review.googlesource.com/16986 Reviewed-by: Matt Sarett <msarett@google.com> Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* clamp to premul in ditherGravatar Mike Klein2017-05-16
| | | | | | | | | | | | | Dither can bump color values above alpha (duh), or below zero (duh), so clamp back to premul after dithering. BUG=skia:6644,skia:6643 Change-Id: Ida107e866380e06130af0d01467117bca929ba44 Reviewed-on: https://skia-review.googlesource.com/17070 Reviewed-by: Herb Derby <herb@google.com> Reviewed-by: Florin Malita <fmalita@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
* fix SkJumper radial gradient precisionGravatar Mike Klein2017-05-15
| | | | | | | | | | | | rcp(rsqrt(x)) doesn't have enough precision when x is a coordinate. (It's fine when x is a color, like in the softlight blend mode.) Adds a GM to test this. It used to look quite ugly. Change-Id: Icec295c2e2f50ae7a5e3e33c62270f632a58f65c Reviewed-on: https://skia-review.googlesource.com/16914 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* Add evenly spaced stops and unify gradient contextsGravatar Herb Derby2017-05-15
| | | | | | | Change-Id: I17ac13b9d1ea6765e2c1a2b53aa6975eab408856 Reviewed-on: https://skia-review.googlesource.com/16713 Commit-Queue: Herb Derby <herb@google.com> Reviewed-by: Mike Klein <mtklein@chromium.org>
* composeshader stagesGravatar Mike Reed2017-05-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | needed to add two helper stages for composeshader load_rgba, store_rgba These just read/write the r,g,b,a registers to context memory, making no promise as to how the memory is formatted (e.g. interleaved -vs- planar). Note that we have similar existing stages, but they did not seem to suit: constant_color This guy loads 4 floats from memory, and splats them into registers. I need to load 4 entire registers. load_f32, store_f32 These offset where they read/write based on the 'x' register, plus they guarantee that the memory will be interleaved ala SkPM4f. Bug: skia: Change-Id: Iaa81f950660b837bdb34416ab3e342d56a92239b Reviewed-on: https://skia-review.googlesource.com/16716 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Reed <reed@google.com>
* fix SkJumperHSL blend modesGravatar Mike Klein2017-05-12
| | | | | | | | | | | | | | | I took a new, unprincipled approach here, which is to just mimic the legacy code path exactly (e.g. hue_modeproc in SkXfermode.cpp). This fixes how we handle alpha in these blend modes, and ought to be faster by avoiding the unpremul. BUG=skia:6621 Change-Id: I21635290985ff42d9421d2718f7a88cf44a85d7f Reviewed-on: https://skia-review.googlesource.com/16711 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* refactor gradient stage namesGravatar Mike Klein2017-05-12
| | | | | | | | | | | | | | | | | | | | | | This is just a name refactor and I'm happy to delay it until we're done with the current wave of gradient CLs. The main ideas: - we use the "linear_gradient" stages for all gradients, so cut the "linear" and just call them "gradient"; - remind ourselves that the 2-stop stage requires even spacing, i.e. stops at 0 and 1. This name should harmonize with Herb's new general evenly spaced gradient stage, currently "evenly_spaced_linear_gradient", and after it lands and I rebase, "evenly_spaced_gradient" - remind ourselves which polar coordinate xy_to_polar_unit returns, the angle. Change-Id: I0fd0c8bd4c1ead7d2d0fff45a199d318b71f34ac Reviewed-on: https://skia-review.googlesource.com/16500 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Mike Reed <reed@google.com>