aboutsummaryrefslogtreecommitdiffhomepage
path: root/src/jumper/SkJumper_generated_win.S
Commit message (Collapse)AuthorAge
* try Herb's new to_srgbGravatar Mike Klein2017-05-16
| | | | | | | | | | This was 6-8% faster than the previous code on my Trashcan. Change-Id: I70081009e233c83226d6d302f871fb7e86cdc438 Reviewed-on: https://skia-review.googlesource.com/16986 Reviewed-by: Matt Sarett <msarett@google.com> Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* clamp to premul in ditherGravatar Mike Klein2017-05-16
| | | | | | | | | | | | | Dither can bump color values above alpha (duh), or below zero (duh), so clamp back to premul after dithering. BUG=skia:6644,skia:6643 Change-Id: Ida107e866380e06130af0d01467117bca929ba44 Reviewed-on: https://skia-review.googlesource.com/17070 Reviewed-by: Herb Derby <herb@google.com> Reviewed-by: Florin Malita <fmalita@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
* fix SkJumper radial gradient precisionGravatar Mike Klein2017-05-15
| | | | | | | | | | | | rcp(rsqrt(x)) doesn't have enough precision when x is a coordinate. (It's fine when x is a color, like in the softlight blend mode.) Adds a GM to test this. It used to look quite ugly. Change-Id: Icec295c2e2f50ae7a5e3e33c62270f632a58f65c Reviewed-on: https://skia-review.googlesource.com/16914 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* Add evenly spaced stops and unify gradient contextsGravatar Herb Derby2017-05-15
| | | | | | | Change-Id: I17ac13b9d1ea6765e2c1a2b53aa6975eab408856 Reviewed-on: https://skia-review.googlesource.com/16713 Commit-Queue: Herb Derby <herb@google.com> Reviewed-by: Mike Klein <mtklein@chromium.org>
* composeshader stagesGravatar Mike Reed2017-05-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | needed to add two helper stages for composeshader load_rgba, store_rgba These just read/write the r,g,b,a registers to context memory, making no promise as to how the memory is formatted (e.g. interleaved -vs- planar). Note that we have similar existing stages, but they did not seem to suit: constant_color This guy loads 4 floats from memory, and splats them into registers. I need to load 4 entire registers. load_f32, store_f32 These offset where they read/write based on the 'x' register, plus they guarantee that the memory will be interleaved ala SkPM4f. Bug: skia: Change-Id: Iaa81f950660b837bdb34416ab3e342d56a92239b Reviewed-on: https://skia-review.googlesource.com/16716 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Reed <reed@google.com>
* fix SkJumperHSL blend modesGravatar Mike Klein2017-05-12
| | | | | | | | | | | | | | | I took a new, unprincipled approach here, which is to just mimic the legacy code path exactly (e.g. hue_modeproc in SkXfermode.cpp). This fixes how we handle alpha in these blend modes, and ought to be faster by avoiding the unpremul. BUG=skia:6621 Change-Id: I21635290985ff42d9421d2718f7a88cf44a85d7f Reviewed-on: https://skia-review.googlesource.com/16711 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* refactor gradient stage namesGravatar Mike Klein2017-05-12
| | | | | | | | | | | | | | | | | | | | | | This is just a name refactor and I'm happy to delay it until we're done with the current wave of gradient CLs. The main ideas: - we use the "linear_gradient" stages for all gradients, so cut the "linear" and just call them "gradient"; - remind ourselves that the 2-stop stage requires even spacing, i.e. stops at 0 and 1. This name should harmonize with Herb's new general evenly spaced gradient stage, currently "evenly_spaced_linear_gradient", and after it lands and I rebase, "evenly_spaced_gradient" - remind ourselves which polar coordinate xy_to_polar_unit returns, the angle. Change-Id: I0fd0c8bd4c1ead7d2d0fff45a199d318b71f34ac Reviewed-on: https://skia-review.googlesource.com/16500 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Mike Reed <reed@google.com>
* Revert "Evenly space gradient stage."Gravatar Mike Klein2017-05-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 892501d09bc8608704362235c73a59bb23a386b3. Reason for revert: https://bugs.chromium.org/p/chromium/issues/detail?id=721682 :( Original change's description: > Evenly space gradient stage. > > This seems like an experiment at this point because I don't know how to do > this kind of thing on arm. > > > Numbers from Skylake... > Before: > ./out/Release/nanobench --config srgb \ > --match gradient_linear_clamp_3color gradient_linear_clamp_hicolor -q 19:48:13 > Timer overhead: 36.7ns > ! -> high variance, ? -> moderate variance > micros bench > 439.92 ? gradient_linear_clamp_3color srgb > 2697.60 gradient_linear_clamp_hicolor srgb > 437.28 gradient_linear_clamp_3color_4f srgb > 2700.50 gradient_linear_clamp_hicolor_4f srgb > > > After: > micros bench > 382.35 gradient_linear_clamp_3color srgb > 593.49 gradient_linear_clamp_hicolor srgb > 382.36 gradient_linear_clamp_3color_4f srgb > 565.60 gradient_linear_clamp_hicolor_4f srgb > > > Numbers on my Mac Trashcan are about even; there is no > speedup or slowdown between master and this change. > > Change-Id: I04402452e23c0888512362fd1d6d5436cea61719 > Reviewed-on: https://skia-review.googlesource.com/15960 > Commit-Queue: Herb Derby <herb@google.com> > Reviewed-by: Mike Klein <mtklein@chromium.org> > TBR=mtklein@chromium.org,mtklein@google.com,herb@google.com,fmalita@google.com NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true Change-Id: Ic6a064c66686b6f238ca1417ba1abd9ce25de1b4 Reviewed-on: https://skia-review.googlesource.com/16660 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
* Evenly space gradient stage.Gravatar herb2017-05-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This seems like an experiment at this point because I don't know how to do this kind of thing on arm. Numbers from Skylake... Before: ./out/Release/nanobench --config srgb \ --match gradient_linear_clamp_3color gradient_linear_clamp_hicolor -q 19:48:13 Timer overhead: 36.7ns ! -> high variance, ? -> moderate variance micros bench 439.92 ? gradient_linear_clamp_3color srgb 2697.60 gradient_linear_clamp_hicolor srgb 437.28 gradient_linear_clamp_3color_4f srgb 2700.50 gradient_linear_clamp_hicolor_4f srgb After: micros bench 382.35 gradient_linear_clamp_3color srgb 593.49 gradient_linear_clamp_hicolor srgb 382.36 gradient_linear_clamp_3color_4f srgb 565.60 gradient_linear_clamp_hicolor_4f srgb Numbers on my Mac Trashcan are about even; there is no speedup or slowdown between master and this change. Change-Id: I04402452e23c0888512362fd1d6d5436cea61719 Reviewed-on: https://skia-review.googlesource.com/15960 Commit-Queue: Herb Derby <herb@google.com> Reviewed-by: Mike Klein <mtklein@chromium.org>
* proposed: inclusive gradients, exclusive imagesGravatar Mike Klein2017-05-11
| | | | | | | | Change-Id: I5821f823a4c0df54d4388a2f455767f58ae646b8 Reviewed-on: https://skia-review.googlesource.com/16547 Reviewed-by: Florin Malita <fmalita@chromium.org> Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* Fix alpha coverage for lerp_565 stage.Gravatar bungeman2017-05-10
| | | | | | | | | Like three gray with alpha blends and taking the max alpha. Change-Id: I104c84f784979030744127f8f66905ad9d1bdf0e Reviewed-on: https://skia-review.googlesource.com/15898 Commit-Queue: Ben Wagner <bungeman@google.com> Reviewed-by: Mike Klein <mtklein@chromium.org>
* Add radial gradient stage.Gravatar Herb Derby2017-05-09
| | | | | | | Change-Id: Ie1f9640f5149f21bd8b3b864ff8b176232e1b0a9 Reviewed-on: https://skia-review.googlesource.com/15461 Reviewed-by: Florin Malita <fmalita@chromium.org> Commit-Queue: Herb Derby <herb@google.com>
* jumper, finish blend modesGravatar Mike Klein2017-05-08
| | | | | | | | | | | | | | I've decided to ignore our existing CPU implementations and start from scratch, mostly referencing the GL ES 3.2 spec and w3 spec. This implementation ought to look a lot like the reference implementation I've written in gm/hsl.cpp, with the addition of handling alpha: unpremul, blend, re-premul with a simple SrcOver alpha. Change-Id: I38cf6be2dc66a6f46d7b18b91847f6933d2fab62 Reviewed-on: https://skia-review.googlesource.com/15316 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* move dither after the transfer functionGravatar Mike Klein2017-05-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | There's no single dither rate that we can use in linear space if we're using a non-linear transfer function... if it's too high (like today) we'll dither too much around zero (e.g. 0 -> 5), and if it's too low we won't dither near one. We were thinking it'd be a good idea to move the dither later in the pipeline anyway. This has to be the right spot! Now that we're moving dither from operating in linear space to operating in bit-space (after transfer function, aware of bit-depth of destination) we have to start clamping [0,1] instead of [0,a]... But, I think I've rewritten things to make sure we don't need to clamp at all. The main idea is to make sure 0-dither and 1+dither round to 0 and 1 respectively. We can do this by making the dither span exclusive, switching from [-0.5,+0.5] to (-0.5,+0.5). In practice I'm doing that as [-0.4921875,+0.4921875], a maximum dither of 63/128 of a bit. Similarly, I don't think it makes sense to fold in the multiply by alpha anymore if we're after the transfer function. Change-Id: I55857bca80377c639fcdd29acc9b362931dd9d12 Reviewed-on: https://skia-review.googlesource.com/15254 Reviewed-by: Florin Malita <fmalita@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
* dither stageGravatar Mike Klein2017-05-03
| | | | | | | | | | | | | | | | I think we can dither generically as a pipeline stage. I'm not married to where the dither happens, or the implementation, which is mostly cribbed from https://en.wikipedia.org/wiki/Ordered_dithering. BUG=skia:3302,skia:6224 Change-Id: If7f6b22a523ca0b34cb03c0aa97b6734c34e0133 Reviewed-on: https://skia-review.googlesource.com/15161 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Florin Malita <fmalita@chromium.org> Reviewed-by: Herb Derby <herb@google.com>
* Add sweep gradient to SkRasterPipelineGravatar Herb Derby2017-05-03
| | | | | | | | | | | | | | | | This is a prototype. I have remove the tiling, but maybe I should add it back in. I thinking about factoring out the common code with linear gradient in its own CL. I think radial gradient will be very close to this. Change-Id: I1dfcb4f944138ee623afdf10b2a8befde797c604 Reviewed-on: https://skia-review.googlesource.com/13766 Reviewed-by: Mike Klein <mtklein@chromium.org> Reviewed-by: Florin Malita <fmalita@chromium.org> Commit-Queue: Herb Derby <herb@google.com>
* finish up constantsGravatar Mike Klein2017-05-01
| | | | | | | | | | | | | | | | | | For whatever reason, if I swap the condition in the if_then_else tests from < to >= and swap the then/else values, I can use constants in hsl_to_rgb. Still don't understand why, but I'll take it. I suspect it has something to do with SSE, IEEE, and NaN, but I don't care enough to speculate any more concretely. This does that, removes C() and _f, updates some comments, and adds a guard in build_stages.py to yell if it sees trouble like LCPI40_4... This reminds me to try -ffast-math soon. I think that was mostly held back by constants. Change-Id: I3f8a37a4d4642f77422ce3261b750061e9e604a3 Reviewed-on: https://skia-review.googlesource.com/14942 Reviewed-by: Herb Derby <herb@google.com>
* refactor hsl_to_rgb a touchGravatar Mike Klein2017-05-01
| | | | | | | | | | | | | | This rewrites the existing logic to expose more of the symmetries, and especially to make them clearly identical subexpressions. I think it's clear that the intent in hue_to_rgb is to wrap the t value back into 0-1... that's t = fract(t). No GM diffs. Change-Id: I9d62d8f80bcb45711ee334f953d3f6410e068ce4 Reviewed-on: https://skia-review.googlesource.com/14940 Reviewed-by: Herb Derby <herb@google.com>
* fix t / t2 confusion in hsl_to_rgbGravatar Mike Klein2017-05-01
| | | | | | | | | | | | | | | | | | | | | | | | I think the original[1] SkRasterPipeline_opts.h version had a typo that I faithfully copied over to hsl_to_rgb. In Hue2RGB[2], the scalar equivalent of hue_to_rgb, we mutate t to keep it in 0-1 range[3]. In the SkRasterPipeline_opts.h code we introduced a new value t2 instead, and then used it everywhere, but accidentally typed 't' in the t < 1/6 case. The expression "p + (q - p)*6.0f*t" should have been "p + (q - p)*6.0f*t2". This fixes things by changing t in place, much like Hue2RGB does. The GM doesn't change anywhere, which is troubling. [1] https://skia-review.googlesource.com/c/7460/21/src/opts/SkRasterPipeline_opts.h#808 [2] https://skia-review.googlesource.com/c/7460/21/src/effects/SkHighContrastFilter.cpp#26 [3] I think this whole clamp should probably become "t = fract(t)". Will follow up. Change-Id: I3dcc1a79ffae46830178d931844ee3113f8bdfd1 Reviewed-on: https://skia-review.googlesource.com/14910 Reviewed-by: Brian Osman <brianosman@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* getting close on float constantsGravatar Mike Klein2017-05-01
| | | | | | | | | | | | | I think I'm now down to just the constants used in comparisons in hsl_to_rgb... so weird. I don't get what's special about this code. CQ_INCLUDE_TRYBOTS=skia.primary:Test-Win7-MSVC-Golo-CPU-AVX-x86_64-Release,Test-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Release,Test-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Release-SK_CPU_LIMIT_SSE41,Test-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Release-SK_CPU_LIMIT_SSE2 Change-Id: I398139f00371bdbc45281a49bca56d02ba59cba9 Reviewed-on: https://skia-review.googlesource.com/14909 Reviewed-by: Mike Klein <mtklein@chromium.org> Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* some float constantsGravatar Mike Klein2017-05-01
| | | | | | | | | | | | Trying to go slowly to find where problems arise. Weirdly, I think I got everything except hsl_to_rgb. CQ_INCLUDE_TRYBOTS=skia.primary:Test-Win7-MSVC-Golo-CPU-AVX-x86_64-Release,Test-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Release,Test-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Release-SK_CPU_LIMIT_SSE41,Test-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Release-SK_CPU_LIMIT_SSE2 Change-Id: I4d85a4c1f40bd87e7cb18fc9b5ce020812dc31db Reviewed-on: https://skia-review.googlesource.com/14905 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
* jumper, remove C(int)Gravatar Mike Klein2017-04-27
| | | | | | | | | | | This finishes off integer constants... they should all be normal now. CQ_INCLUDE_TRYBOTS=skia.primary:Test-Win7-MSVC-Golo-CPU-AVX-x86_64-Release,Test-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Release,Test-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Release-SK_CPU_LIMIT_SSE41,Test-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Release-SK_CPU_LIMIT_SSE2 Change-Id: I66ecc6533807fc59bb5ac9d3c5f7ab9e6e1f0d7f Reviewed-on: https://skia-review.googlesource.com/14528 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* jumper, replace _i with normal constantsGravatar Mike Klein2017-04-27
| | | | | | | | | | | | | | | So far I only seem to be encountering constant pools with float constants, so integer constants should be easy to make normal. This just removes _i. There might be a couple integer constants generated with C() too... they'll be the next CL. CQ_INCLUDE_TRYBOTS=skia.primary:Test-Win7-MSVC-Golo-CPU-AVX-x86_64-Release,Test-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Release,Test-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Release-SK_CPU_LIMIT_SSE41,Test-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Release-SK_CPU_LIMIT_SSE2 Change-Id: Icc82cbc660d1e33bcdb5282072fb86cb5190d901 Reviewed-on: https://skia-review.googlesource.com/14527 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* clear out C(), _i, and _f constants from SkJumper_vectors.hGravatar Mike Klein2017-04-27
| | | | | | | | | | | | I think this is just gonna be a bunch of baby steps for now. This gets everything in SkJumper_vectors.h. CQ_INCLUDE_TRYBOTS=skia.primary:Test-Win7-MSVC-Golo-CPU-AVX-x86_64-Release,Test-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Release,Test-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Release-SK_CPU_LIMIT_SSE41,Test-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Release-SK_CPU_LIMIT_SSE2 Change-Id: Ic87faa9bf6380a4fc9a577936dad8c3a9c48472e Reviewed-on: https://skia-review.googlesource.com/14441 Commit-Queue: Mike Klein <mtklein@google.com> Reviewed-by: Herb Derby <herb@google.com>
* prep for more constantsGravatar Mike Klein2017-04-26
| | | | | | | | | | | | | - Add -z to print zero bytes instead of ... - avx+hsw will create 32-byte constants in .const, so we should disassemble those too, and align to 32 bytes. - The default _text section on Windows is 16-byte aligned, so we make a new one that's 32-byte aligned. Change-Id: Icb2a962baa4c3735e98a992f2285eaf5cb1680fd Reviewed-on: https://skia-review.googlesource.com/14364 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Herb Derby <herb@google.com>
* remove to_2dot2 and from_2dot2Gravatar Mike Klein2017-04-26
| | | | | | | | | | The parametric_{r,g,b} stages are just as good now; under the hood it's all going through approx_powf. Change-Id: If7f3ae1e24fcee2ddb201c1d66ce1dd64820c89a Reviewed-on: https://skia-review.googlesource.com/14320 Reviewed-by: Matt Sarett <msarett@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* jumper, maybe we can just use constantsGravatar Mike Klein2017-04-21
| | | | | | | | | | | | | | | | | | | | | | | As long as everything is laid out the same way they were originally, I don't think there's any reason we can't just use %rip-relative addressing on x86-64. Basically, we just need to keep all the sections together in order. Somewhat subtly we cannot just use -D to disassemble all sections. -D will double-disassemble[1] some bytes, which throws off our %rip-relative addressing of constants. You can see this in PS1. So we whitelist sections instead. [1], from man objdump: This option also has a subtle effect on the disassembly of instructions in code sections. When option -d is in effect objdump will assume that any symbols present in a code section occur on the boundary between instructions and it will refuse to disassemble across such a boundary. When option -D is in effect however this assumption is supressed. This means that it is possible for the output of -d and -D to differ if, for example, data is stored in code sections. Change-Id: Idbcfe08e67113b3f7d75749931c640ff90aa0bf4 Reviewed-on: https://skia-review.googlesource.com/14029 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Herb Derby <herb@google.com>
* jumper, implement 2.2 stages with approx_powfGravatar Mike Klein2017-04-21
| | | | | | | | | | My main interest is getting rid of weird code, but it's also faster. The new bench drops from 667 to 412. Change-Id: Ibf889601284cf925780320c828394f79937dc705 Reviewed-on: https://skia-review.googlesource.com/14035 Reviewed-by: Matt Sarett <msarett@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* jumper, lab_to_xyzGravatar Mike Klein2017-04-21
| | | | | | | | | | I don't suppose you know any existing test coverage of this? I can't seem to trigger any... Change-Id: I7244053e2fb665888c9443d2bc52c5a23725ec53 Reviewed-on: https://skia-review.googlesource.com/13970 Reviewed-by: Matt Sarett <msarett@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* jumper, rework callback a bit, use it for color_lookup_tableGravatar Mike Klein2017-04-21
| | | | | | | | | | | | | | | | | | | | | | | | | | Looks like the color-space images have this well tested (even without lab_to_xyz) and the diffs look like rounding/FMA. The old plan to keep loads and stores outside callback was: 1) awkward, with too many pointers and pointers to pointers to track 2) misguided... load and store stages march ahead by x, working at ptr+0, ptr+8, ptr+16, etc. while callback always wants to be working at the same spot in the buffer. I spent a frustrating day in lldb to understood 2). :/ So now the stage always store4's its pixels to a buffer in the context before the callback, and when the callback returns it load4's them back from a pointer in the context, defaulting to that same buffer. Instead of passing a void* into the callback, we pass the context itself. This lets us subclass the context and add our own data... C-compatible object-oriented programming. Change-Id: I7a03439b3abd2efb000a6973631a9336452e9a43 Reviewed-on: https://skia-review.googlesource.com/13985 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* more symmetry for from_half/to_halfGravatar Mike Klein2017-04-20
| | | | | | | | | | | | | | | | Tweaks to make the parallels between from_half and to_half stand out. We can logically do the `auto denorm = em < ...;` comparisons as either U32 or I32. U32 would read more naturally, but we do I32 because some instruction sets have direct signed comparison but must synthesize an unsigned comparison. CQ_INCLUDE_TRYBOTS=skia.primary:Test-Android-Clang-PixelC-CPU-TegraX1-arm64-Release-Android,Test-Android-Clang-Ci20-CPU-IngenicJZ4780-mipsel-Release-Android,Test-Android-Clang-Nexus10-CPU-Exynos5250-arm-Release-Android,Test-Mac-Clang-MacMini6.2-CPU-AVX-x86_64-Release,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86-Debug,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Debug Change-Id: Ic74fe5b3b850f5bb7fd00fd4435bc32b8628eecd Reviewed-on: https://skia-review.googlesource.com/13963 Reviewed-by: Matt Sarett <msarett@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* test and fix f16<->f32 conversion stagesGravatar Mike Klein2017-04-20
| | | | | | | | | | | | This refactors from_half() and to_half() a bit, totally reimplementing the non-hardware cases to be more clearly correct. CQ_INCLUDE_TRYBOTS=skia.primary:Test-Android-Clang-PixelC-CPU-TegraX1-arm64-Release-Android,Test-Android-Clang-Ci20-CPU-IngenicJZ4780-mipsel-Release-Android,Test-Android-Clang-Nexus10-CPU-Exynos5250-arm-Release-Android,Test-Mac-Clang-MacMini6.2-CPU-AVX-x86_64-Release,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86-Debug,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Debug Change-Id: I439463cf90935c5e8fe2369cbcf45e07f3af62c7 Reviewed-on: https://skia-review.googlesource.com/13921 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Matt Sarett <msarett@google.com>
* refactor approx_{log2,pow2,powf}Gravatar Mike Klein2017-04-19
| | | | | | | | | | | | - Move to SkJumper_vectors.h - Fold the -127 and +2.774485010. - approx_powf(F,F) instead of approx_powf(F,float) for consistency. - A little layout reformatting. Change-Id: If9cb3d62a097cb6ecf89f157a1dde672c1516371 Reviewed-on: https://skia-review.googlesource.com/13865 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
* jumper, parametric_{r,g,b,a}Gravatar Mike Klein2017-04-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I've tried a couple of ideas for approx_powf(): 1) accumulate integer powers of x, then 4th roots, then 16th roots 2) continue 1) all the way to 256th roots 3) decompose into pow2 and log2, exploiting IEEE float layout 4) slightly tune constants used in 3) 5) accumulate integer powers of x, then 3+4) with different tuning 6) follow a source online, basically 5 with finesse 7) a new source quoting and improving on the method in 6). 7) seems perfect, enough that maybe we can explore improving its speed at cost of precision. Might be nice to get rid of those divides. If we allow a small tolerance (2-5) in our tests, we could use the very simple fast forms from 3) (e.g. PS 5). I wish I had some images to look at! Anything involving roots seems to be subverted by poor rsqrt precision. This change of course affects the pipelines created by the tests for exponential and full parametric gamma curves. What's less obvious is that it also means SkJumper can now for the first time run the pipeline created by the mixed gamma curves test. This means we now need to relax our tolerance for the table-based channel, just like we did when implementing table_{r,g,b,a}. This took me an embarassingly long time to figure out. *face palm* Change-Id: I451ee3c970a0a4a4e285f8aa8f6ef709a654d247 Reviewed-on: https://skia-review.googlesource.com/13656 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Matt Sarett <msarett@google.com> Reviewed-by: Herb Derby <herb@google.com>
* jumper, table_{r,g,b,a}Gravatar Mike Klein2017-04-17
| | | | | | | | | | | In testing, it didn't really seem like we're getting anything out of doing an interpolated lookup, so this just does a single rounded lookup. Change-Id: If85ba68675945b442076519dd7f1bf7540d1628d Reviewed-on: https://skia-review.googlesource.com/13646 Commit-Queue: Matt Sarett <msarett@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Matt Sarett <msarett@google.com>
* jumper, u16_be load_tables stagesGravatar Mike Klein2017-04-17
| | | | | | | Change-Id: I738da1dac2ef5b74ef34cca14938c0b6cce2fe16 Reviewed-on: https://skia-review.googlesource.com/13596 Reviewed-by: Matt Sarett <msarett@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* jumper, load_rgb_u16_beGravatar Mike Klein2017-04-17
| | | | | | | | | testing: out/dm --src colorImage --colorImages images/colorspace/ --config srgb Change-Id: I3481e1fb4a070cfd5d95361329fd95af5fcfbd1f Reviewed-on: https://skia-review.googlesource.com/13590 Commit-Queue: Mike Klein <mtklein@google.com> Reviewed-by: Matt Sarett <msarett@google.com>
* add a callback stage to SkRasterPipelineGravatar Mike Klein2017-04-17
| | | | | | | | | | | | | | | | | | This lets us temporarily escape to piece of code outside SkRasterPipeline. We should be able to use this to replace - parametric_{r,g,b,a} - table_{r,g,b,a} - color_lookup_table - shader_adapter* * We want to obsolete shader_adapter for other reasons anyway, but we _could_ replace it with this if we want to. Change-Id: I42b657b3c19c679796ed1876856cae0c8471307e Reviewed-on: https://skia-review.googlesource.com/12102 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Herb Derby <herb@google.com> Reviewed-by: Matt Sarett <msarett@google.com>
* jumper, bilinear and bicubic sampling stagesGravatar Mike Klein2017-04-12
| | | | | | | | | | | | | | | | | | | | This splits SkImageShaderContext into three parts: - SkJumper_GatherCtx: always, already done - SkJumper_SamplerCtx: when bilinear or bicubic - MiscCtx: other little bits (the matrix, paint color, tiling limits) Thanks for the snazzy allocator that allows this Herb! Both SkJumper and SkRasterPipeline_opts.h should be speaking all the same types now. I've copied the comments about bilinear/bicubic to SkJumper with little typo fixes and clarifications. Change-Id: I4ba7b7c02feba3f65f5292169a22c060e34933c6 Reviewed-on: https://skia-review.googlesource.com/13269 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* jumper, rgb<->hslGravatar Mike Klein2017-04-12
| | | | | | | | | | | | | | | | | | | | | I've rearranged while porting, I hope making the logic clearer. Exactly one gm is affected, highcontrastfilter. The most interesting line is this, from hsl_to_rgb: F t2 = if_then_else(t < 0.0_f, t + 1.0_f, I had to write 0.0_f (instead of the usual 0) to force Clang to compare against a zero register instead of a 16-byte zero constant in memory. Register pressure is high in hsl_to_rgb, so something must have kicked in to prefer memory over zeroing a register. No big deal. It makes the code read more symmetrically anyway. Change-Id: I1a5ced72216234587760c6f803fb69315d18fae0 Reviewed-on: https://skia-review.googlesource.com/13242 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Herb Derby <herb@google.com>
* Add multi-stop SkJumper stage.Gravatar Herb Derby2017-04-10
| | | | | | | | | | CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD Change-Id: I954d02638a785bec284d2fdf8f46abfccd474e7a Reviewed-on: https://skia-review.googlesource.com/10211 Commit-Queue: Herb Derby <herb@google.com> Reviewed-by: Mike Klein <mtklein@chromium.org> Reviewed-by: Florin Malita <fmalita@chromium.org>
* jumper, byte_tables + byte_tables_rgbGravatar Mike Klein2017-04-07
| | | | | | | | | Factors out a function F from_byte(U8) too. Change-Id: Ib739ccbd509ddf25d2bfb7751ba6eaf51b16c12f Reviewed-on: https://skia-review.googlesource.com/11791 Reviewed-by: Matt Sarett <msarett@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* jumper, gather_f16Gravatar Mike Klein2017-04-07
| | | | | | | | | | Here we use 64-bit gather instructions for HSW, which I think we haven't done before. Change-Id: I7b22b3cc0b7a151952518bb9afb90624ebdb4a22 Reviewed-on: https://skia-review.googlesource.com/11602 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
* jumper, gather_i8Gravatar Mike Klein2017-04-06
| | | | | | | Change-Id: Iefa8044bac0555c5fff370217a6270b4f3c64300 Reviewed-on: https://skia-review.googlesource.com/11582 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
* jumper, more gathersGravatar Mike Klein2017-04-06
| | | | | | | | | | This is all the gathers except index 8 and f16, which aren't conceptually hard but I want to land separately. Change-Id: I525f2496e55451041bd6ea07985858fda7b56a40 Reviewed-on: https://skia-review.googlesource.com/11524 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* jumper, gather_8888Gravatar Mike Klein2017-04-06
| | | | | | | Change-Id: I70bd64d114a2460534bcb51d356e13d9bc3b8603 Reviewed-on: https://skia-review.googlesource.com/11491 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* jumper, add load_f32()Gravatar Mike Klein2017-04-06
| | | | | | | Change-Id: I71d85ffe29bc11678ff1e696fa4a2c93d0b4fcbe Reviewed-on: https://skia-review.googlesource.com/11446 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* jumper, kill off F4Gravatar Mike Klein2017-04-06
| | | | | | | | | | | | | | | | Its alignment (sometimes 4, sometimes 16) has proven to be error-prone. This also means we don't really need LazyCtx::load(). I think I only had it there to make sure we were doing unaligned loads of F4; the better way is to just never declare the data as aligned... The generated code isn't quite as good, but I can live with it. Change-Id: I5d57a580ca12c94ca84a5e8b72a66cf8d0c829eb Reviewed-on: https://skia-review.googlesource.com/11406 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* jumper, to_2dot2 and from_2dot2Gravatar Mike Klein2017-04-05
| | | | | | | | | Nothing too tricky here. Change-Id: I2a10548efc75a6fd875fcb242790880d9b9a28fd Reviewed-on: https://skia-review.googlesource.com/11388 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Matt Sarett <msarett@google.com>
* jumper, load_u16_be and store_u16_beGravatar Mike Klein2017-04-05
| | | | | | | Change-Id: I2d58538ab071b217d8dbbf2d802493d9045eabf2 Reviewed-on: https://skia-review.googlesource.com/11384 Reviewed-by: Matt Sarett <msarett@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>