aboutsummaryrefslogtreecommitdiffhomepage
path: root/src/core/SkRasterPipeline.h
Commit message (Collapse)AuthorAge
* bicubic, attempt gazillionGravatar Mike Klein2016-12-09
| | | | | | | | | | | | | - explicitly separate bilinear_ stages in x and y too BUG=skia: CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD Change-Id: Ib7b4f9d26ea6abe9171068e92424479d811ee606 Reviewed-on: https://skia-review.googlesource.com/5636 Reviewed-by: Brian Osman <brianosman@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* Refactor bilerp a little.Gravatar Mike Klein2016-12-06
| | | | | | | | | | | | | | | | | | | 1) rename to bilerp_xy, for x,y in {n[egative], p[ositive}; 2) pull out a save_xy stage to save off the original x,y; 3) also calculate the fractional x,y fx,fy once instead of 4 times. 1) is a pure refactor; 2) adds a stage but otherwise is nothing different; 3) changes images a little bit (fractional parts can vary a bit around powers of two). This extends naturally to naive bicubic using 16 bicubic_xy stages. CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD Change-Id: I666de5c21e978abb4feb6e3225e5b5920ba6c5b9 Reviewed-on: https://skia-review.googlesource.com/5550 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Herb Derby <herb@google.com>
* remove upper limit on number of pipeline stagesGravatar Mike Klein2016-12-06
| | | | | | | | | | | Bicubic is going to blow right past 48. At this point the fixed preallocation strategy is starting to look naive... at 64 we'd allocate just over 1K for every pipeline (and every compiled pipeline). CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD Change-Id: Ib2944ead1217123aba2b6347fd9d5315217540c9 Reviewed-on: https://skia-review.googlesource.com/5551 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
* Reland "Add RasterPipeline implementation for SkColorSpaceXform"Gravatar Matt Sarett2016-12-01
| | | | | | | | | | | | | | | | | | | This is initially turned on for Linux debug builds, which allows us to start testing. Chrome for Android is a really good candidate for this (will appreciate the code size savings), but I'd first like to run some tests to understand the performance/size tradeoffs a little better. BUG:660416 CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD Change-Id: Ifc80e663767df6bb767abb8b12b1ec5cec644ec5 Reviewed-on: https://skia-review.googlesource.com/5452 Reviewed-by: Matt Sarett <msarett@google.com> Commit-Queue: Matt Sarett <msarett@google.com>
* Added CMYK support for ICC profiles.Gravatar raftias2016-12-01
| | | | | | | | | | | | | | Changed ICC parsing/SkGammas/SkColorLookUpTable to handle non-3-channel inputs. Parsed CMYK A2B ICC profiles. Integrated this with SkJpegCodec (the only file that supports CMYK) and SkColorSpaceXform_A2B to allow parsing and color xforming of ICC CMYK images. CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD Change-Id: I11e3d17180244281be3eb43fd608609925a7f71e Reviewed-on: https://skia-review.googlesource.com/5444 Reviewed-by: Matt Sarett <msarett@google.com> Commit-Queue: Matt Sarett <msarett@google.com>
* Revert "Add RasterPipeline implementation for SkColorSpaceXform"Gravatar Brian Osman2016-12-01
| | | | | | | | | | | This reverts commit dd19ac7d10c7c00dd6e9b1f4c4c6aae729c7e6d4. Reason for revert: ASAN Change-Id: I59aacc092398c4db40696a8343d657a5ad7c0f66 Reviewed-on: https://skia-review.googlesource.com/5448 Commit-Queue: Brian Osman <brianosman@google.com> Reviewed-by: Brian Osman <brianosman@google.com>
* Add RasterPipeline implementation for SkColorSpaceXformGravatar Matt Sarett2016-12-01
| | | | | | | | | | | | | | | | | | | This is initially turned on for Linux debug builds, which allows us to start testing. Chrome for Android is a really good candidate for this (will appreciate the code size savings), but I'd first like to run some tests to understand the performance/size tradeoffs a little better. BUG:660416 CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD Change-Id: I0fb2512216dfc0bda2e5388f9865318eec22291e Reviewed-on: https://skia-review.googlesource.com/5348 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Mike Klein <mtklein@chromium.org>
* Revert "Added CMYK support for ICC profiles."Gravatar Mike Klein2016-12-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 51c3fcd376c5c9972d9476b5532f6164375a38d1. Reason for revert: ASAN, MSAN both take issue with parse_and_load_gamma() Original change's description: > Added CMYK support for ICC profiles. > > Changed ICC parsing/SkGammas/SkColorLookUpTable to handle non-3-channel > inputs. Parsed CMYK A2B ICC profiles. Integrated this with SkJpegCodec > (the only file that supports CMYK) and SkColorSpaceXform_A2B to allow > parsing and color xforming of ICC CMYK images. > > BUG=skia: > > GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=5197 > CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD > > > Change-Id: Id6619f63f04071f79cd2d84321857dfa269ad3aa > Reviewed-on: https://skia-review.googlesource.com/5197 > Commit-Queue: Mike Klein <mtklein@chromium.org> > Reviewed-by: Matt Sarett <msarett@google.com> > Reviewed-by: Mike Klein <mtklein@chromium.org> > Reviewed-by: Leon Scroggins <scroggo@google.com> > TBR=mtklein@chromium.org,mtklein@google.com,msarett@google.com,scroggo@google.com,brianosman@google.com,raftias@google.com,reviews@skia.org NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true Change-Id: Ib43fef00bc233c0b4fa47ed29040d69601def267 Reviewed-on: https://skia-review.googlesource.com/5423 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Mike Klein <mtklein@chromium.org>
* Added CMYK support for ICC profiles.Gravatar raftias2016-12-01
| | | | | | | | | | | | | | | | | | | | Changed ICC parsing/SkGammas/SkColorLookUpTable to handle non-3-channel inputs. Parsed CMYK A2B ICC profiles. Integrated this with SkJpegCodec (the only file that supports CMYK) and SkColorSpaceXform_A2B to allow parsing and color xforming of ICC CMYK images. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=5197 CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD Change-Id: Id6619f63f04071f79cd2d84321857dfa269ad3aa Reviewed-on: https://skia-review.googlesource.com/5197 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Matt Sarett <msarett@google.com> Reviewed-by: Mike Klein <mtklein@chromium.org> Reviewed-by: Leon Scroggins <scroggo@google.com>
* constant means constantGravatar Mike Klein2016-11-30
| | | | | | | | | | | Strip all the "constant" verbiage out of stages that really just mean 1, single, scalar. CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD Change-Id: I3d71202b348fadc3ced8ecb6c18c939cf92d7243 Reviewed-on: https://skia-review.googlesource.com/5396 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* Bring back SkRasterPipeline::run() for one-off uses.Gravatar Mike Klein2016-11-30
| | | | | | | | | CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD Change-Id: I308b6d75f2987a667eead9a55760a2ff6aec2984 Reviewed-on: https://skia-review.googlesource.com/5353 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Mike Klein <mtklein@chromium.org>
* support a8Gravatar Mike Klein2016-11-29
| | | | | | | | | | | Most of this is plumbing through the full paint to shaders instead of just the filter quality. CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD Change-Id: I6afde07566afa3a4391c24dca7017a9a4f5ec700 Reviewed-on: https://skia-review.googlesource.com/5317 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Mike Reed <reed@google.com>
* Consistent naming.Gravatar Mike Klein2016-11-29
| | | | | | | | | | | | | | | For stages that have {r,g,b,a} and {dr,dg,db,da} versions, name the {r,g,b,a} one "foo" and the {dr,dg,db,da} on "foo_d". The {r,g,b,a} registers are the ones most commonly used and fastest, so they get short ordinary names, and the d-registers are less commonly used and sometimes slower, so they get a suffix. Some stages naturally opearate on all 8 registers (the xfermodes, accumulate). These names for those look fine and aren't ambiguous. Also, a bit more re-arrangement in _opts.h. CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD Change-Id: Ia20029247642798a60a2566e8a26b84ed101dbd0 Reviewed-on: https://skia-review.googlesource.com/5291 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
* Convert blitter over to new style from_srgb, to_srgb.Gravatar Mike Klein2016-11-28
| | | | | | | | | | | Every sRGB GM changes, none noticeably. CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD Change-Id: I632845aea0f40751639cccbcfde8fa270cae0301 Reviewed-on: https://skia-review.googlesource.com/5275 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
* Split srgb out of accum stages.Gravatar Mike Klein2016-11-28
| | | | | | | | | | | | | | | | | | By stashing the scales in the context (i.e. on the stack), we can free up enough registers to really simplify how the bitmap sample stages interact. Nearest neighbor is straightforward now: just call the appropriate gather_ function, and you're done. The source pixels end up in the source registers. If they're sRGB encoded, follow up with from_srgb_s To bilerp, we bracket those 1 or 2 gather+from_srgb_s stages with a stage setting up each corner (x += dx, y += dy, save off scale) and a stage that accumulates into the d-registers (load saved scale, dr += scale * r, etc.). When all the samples are accumulated, copy the d-registers into the s-registers. from_srgb_d and to_srgb are lightly sketched here and will be used in the next CL, where I apply this same factoring to non-bitmap loads and stores. This is a little tricky, because we don't actually have a float->float to_srgb yet. CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD Change-Id: I272a1f278f0ea1b29a2f07ac225f753faa8dae81 Reviewed-on: https://skia-review.googlesource.com/5271 Reviewed-by: Mike Klein <mtklein@chromium.org> Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* Some simple pipeline refactoring.Gravatar Mike Klein2016-11-28
| | | | | | | | | | | | | | | | | | This is a batch of little tweaks that all preserve the existing logical behavior: - rename dst to move_dst_src to parallel move_src_dst - remove unused swap_src_dst - move swap_rb up with the other utility stages - factor out from_8888() to parallel from_565() and from_4444() - factor out gather() from the accum_* stages This changes the order of the math in accum_8888[_srgb] ever so slightly, from (scale * C) * (1/255.0f) to scale * (1/255.0f * C). It causes a few pixel diffs, but nothing noticeable. This makes the 8888 bilerp logic consistent with the other formats, which all convert to [0,1] float first before being scaled. CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD Change-Id: Id37857b91be3086565169dcc9b1a537574e532aa Reviewed-on: https://skia-review.googlesource.com/5226 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
* Support sRGB 565.Gravatar Mike Klein2016-11-22
| | | | | | | | | | | | | | | It looks like I'm not going to be able to avoid supporting sRGB G8, I8, 565, 4444, 8888. (A8 and F16 will always be linear.) This fixes 565, and lays out the rest of the accum_*. I did a little reorganization to keep things in ascending bit depth, just for sanity. GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=5145 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Change-Id: Ib0508e5a4ee1bab2044a76bcabc367841d634cd2 Reviewed-on: https://skia-review.googlesource.com/5145 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* bilerpGravatar Mike Klein2016-11-22
| | | | | | | | | GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=5107 Change-Id: I5c30105501cbdb57896d9ec35737494eabd5998b Reviewed-on: https://skia-review.googlesource.com/5107 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* Rearrange NN sampling to more naturally support bilerp.Gravatar Mike Klein2016-11-21
| | | | | | | | | GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=5105 Change-Id: Ic692b5faf2d33fee31b119ff8d3653118b25b7c2 Reviewed-on: https://skia-review.googlesource.com/5105 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* perspective matrixGravatar Mike Klein2016-11-17
| | | | | | | | | | | | Nothing too tricky. The path of least resistance was to keep the matrix row-major when in perspective. It should make no difference in the end. GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=4983 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Change-Id: I48bb9de0265e7873c465874cc37076a8111f5ea1 Reviewed-on: https://skia-review.googlesource.com/4983 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* Support SkImageShader in SkRasterPipeline blitterGravatar Mike Klein2016-11-17
| | | | | | | | | | | | | | First of many CLs, I'm sure. This handles 8888 or sRGB sources with an affine matrix, clamp/clamp tiling, and nearest-neighbor sampling only. GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=4906 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Change-Id: I99f7508852b3d44b6f52f7a0bee29a793af35c48 Reviewed-on: https://skia-review.googlesource.com/4906 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* Hook into parametric and table raster pipeline stagesGravatar Matt Sarett2016-11-16
| | | | | | | | | | | | BUG:664864 GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=4913 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Change-Id: I909152f1abba60803f0ce2f970eec1f8f1816d78 Reviewed-on: https://skia-review.googlesource.com/4913 Commit-Queue: Matt Sarett <msarett@google.com> Reviewed-by: Mike Klein <mtklein@chromium.org>
* Add trace and registers stages.Gravatar Mike Klein2016-11-16
| | | | | | | | | | | | | | | Yet more debugging tools. TBR=herb@google.com GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=4908 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Change-Id: I92e2e6b17abfc32c8d3554e71dbf95abadbb3824 Reviewed-on: https://skia-review.googlesource.com/4908 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Herb Derby <herb@google.com> Reviewed-by: Mike Klein <mtklein@chromium.org>
* add {parametric,table}_{r,g,b} stages.Gravatar Mike Klein2016-11-16
| | | | | | | | | | | | | | Think you can take over from here, hook these into SkColorSpaceXform_A2B, and remove the need for fn_1_{r,g,b}? BUG=skia:664864 GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=4888 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Change-Id: If573fa2861f32f201f4db28598559290b1eef812 Reviewed-on: https://skia-review.googlesource.com/4888 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Matt Sarett <msarett@google.com>
* Add SkRasterPipeline::dump().Gravatar Mike Klein2016-11-15
| | | | | | | | | | | | | | Entirely for debugging. TBR=herb@google.com GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=4871 Change-Id: I6d6972c40b11854441f566c12516a2ec8c75c78f Reviewed-on: https://skia-review.googlesource.com/4871 Reviewed-by: Herb Derby <herb@google.com> Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
* More shader preliminaries / refactoringGravatar Mike Klein2016-11-15
| | | | | | | | | | | | | | | | - thread through ctm - make blitter handle paint modulation instead of each shader TBR=herb@google.com GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=4830 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Change-Id: I8161e6b3864c4e48e4d47d5ad40a56a13c02fee8 Reviewed-on: https://skia-review.googlesource.com/4830 Reviewed-by: Mike Klein <mtklein@chromium.org> Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* Start each pipeline with (x,y) in (dr,dg) registers for the shader.Gravatar Mike Klein2016-11-15
| | | | | | | | | | | | | | | | | | | | | Image shaders need to do some geometry work before sampling the image colors: 1) determine dst coordinates 2) map back to src coordinates 3) tiling Feeding (x,y) through as (dr,dg) registers makes step 1) easy, perhaps trivial, while leaving (r,g,b,a) with their usual meanings, "the color", starting with the paint color. This is easy to tweak into something like (x+0.5, y+0.5, 1) in (dr,dg,db) once this lands. Mostly I just want to get all the uninteresting boilerplate out of the way first. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=4791 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Change-Id: Ia07815d942ded6672dc1df785caf80a508fc8f37 Reviewed-on: https://skia-review.googlesource.com/4791 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* Initial implementation of a SkColorSpace_A2B xformGravatar raftias2016-11-11
| | | | | | | | | | | | | | | | There is support for all features of SkColorSpace_A2B. Tests for these functionality were adapted from the XYZ xform, plus a CLUT-specific test was added. Shared functions used by both SkColorSpaceXform_XYZ and SkColorSpaceXform_A2B have been moved into a shared header. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2449243003 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2449243003
* swap_src_dst -> move_src_dstGravatar Mike Klein2016-11-04
| | | | | | | | | | | | | | | | | | | We thought it'd be handy to have swap_src_dst so that it could be used either to move dst into src or src into dst. Turns out, we already have a stage that moves dst into src (called "dst", the dst blend mode), so there's really no reason to have swap_src_dst over the strictly more efficient move_src_dst. swap_src_dst is typically 12 register moves, where move_src_dst is 4. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=4461 Change-Id: Ib453775f854e313f823851978eaadc3995872312 Reviewed-on: https://skia-review.googlesource.com/4461 Reviewed-by: Matt Sarett <msarett@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* skrpb: evaluate color filters for constant shaders once.Gravatar mtklein2016-11-04
| | | | | | | | | | | | | | | | | | | | | | | | | The simplest thing to do here is just run shader+color filter pipeline at construction time to create a new constant color shader (replacing the paint color). This reduces a pipeline like: - constant_color (paint color) - matrix_4x5 - clamp_a - load_d_foo, xfermode, lerp, store_foo to - constant_color (paint color -> matrix_4x5 -> clamp_a) - load_d_foo, xfermode, lerp, store_foo To implement this all, we add a new store_f32 stage that writes SkPM4f, and finally get around to implementing Sk8f::Store4() (store while reinterlacing). Sk4f::Store4() already exists for both SSE and NEON. Next step: reduce simple constant_color -> store pipelines (src mode, full coverage) into non-pipeline memsets. GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2480823002 Review-Url: https://codereview.chromium.org/2480823002
* Basic pipeline blend mode strength reductions:Gravatar Mike Klein2016-11-03
| | | | | | | | | | | | | | | | | | | | - when the shader is opaque, srcover becomes src - don't load dst when blitting in src mode with full coverage - fold coverage into src alpha when in srcover mode It's not obvious that we can fold coverage into src alpha when using a 565 mask, so I've not attempted that. What would we do about alpha? Over all GMs this causes a single 1-bit difference in sRGB mode. No 565 or f16 diffs. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=4349 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Change-Id: Ib9161f498c4efa6b348ca74522166da64d09a7da Reviewed-on: https://skia-review.googlesource.com/4349 Reviewed-by: Brian Osman <brianosman@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* Add Matrix colorfilter pipeline stages.Gravatar Mike Klein2016-11-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | This breaks the color filter down into a couple logical steps: - go to unpremul - apply the 4x5 matrix - clamp to [0,1] - go to premul Because we already have handy premul clamp stages, we swap the order of clamp and premul. This is lossless. While adding our stages to the pipeline, we analyze the matrix to see if we can skip any steps: - we can skip unpremul if the shader is opaque (alphas are all 1 ~~~> we're already unpremul); - we can skip the premul back if the color filter always produces opaque (here, are the inputs opaque and do we keep them that way, but we could also check for an explicit 0 0 0 0 1 alpha row); - we can skip the clamp_0 if the matrix can never produce a value less than 0; - we can skip the clamp_1 if the matrix can never produce a value greater than 1. The only thing that should seem missing is per-pixel alpha checks. We don't do those here, but instead make up for it by operating on 4-8 pixels at a time. We don't split the 4x5 matrix into a 4x4 and 1x4 translate. We could, but when we have FMA (new x86, all ARMv8) we might as well work the translate for free into the FMAs. This makes gm/fadefilter.cpp draw differently in sRGB and F16 modes, bringing them in line with the GPU sRGB and GPU f16 configs. It's unclear to me what was wrong with the old CPU implementation. GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=4346 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Change-Id: I14082ded8fb8d63354167d9e6b3f8058f840253e Reviewed-on: https://skia-review.googlesource.com/4346 Reviewed-by: Mike Reed <reed@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* SkRasterPipeline: implement SkLumaColorFilterGravatar Mike Klein2016-11-01
| | | | | | | | | | | | | | | | | | After getting discouraged by the non-separable xfermodes, I decided to look at filling out the color filters instead. This one's nice and easy. There's only 1 GM that exercises this color filter, and it's drawing noticeably lighter now in f16 and sRGB configs. 565 is unchanged. This makes me think the diffs are due to lost precision in the previous method, which was going through the default fallback to 8888 filterSpan(). I double checked: the f16 config now draws nearly identically to the gpuf16 config. It used to be quite different. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=4183 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Change-Id: Ic6feaecae5cf18493b5df89733f6a5ca362e9a75 Reviewed-on: https://skia-review.googlesource.com/4183 Reviewed-by: Brian Osman <brianosman@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* Only clamp when we think our math requires it.Gravatar Mike Klein2016-10-27
| | | | | | | | | | | | | | | | | | If we require our inputs are sound, in-gamut, premul colors (a in [0,1], r,g,b in [0,a]) then we should only need to clamp when the math we perform requires it. The safety clamps before each store are paranoia. The main thing this pipeline handles right now that needs clamping is the plus transfermode. This is either used to blend, where the clamp must come after the coverage lerp, or used via a mode color filter, where we have no choice but to clamp right at the end of the color filer. This changes how the mode color filter draws with the plus transfermode. It didn't used to clamp at all. I think this is a bug fix. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=4034 Change-Id: I3cbaade2127cc88c8782596f45749c4fe4b0e953 Reviewed-on: https://skia-review.googlesource.com/4034 Reviewed-by: Brian Osman <brianosman@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* SkRasterPipeline::compile().Gravatar Mike Klein2016-10-25
| | | | | | | | | | | I'm not yet caching these in the blitter, and speed is essentially unchanged in the bench where I am now building and compiling the pipeline only once. This may not be able to stay a simple std::function after I figure out caching, but for now it's a nice fit. GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=3911 Change-Id: I9545af589f73baf9f17cb4e6ace9a814c2478fe9 Reviewed-on: https://skia-review.googlesource.com/3911 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* Move SkRasterPipeline further into SkOpts.Gravatar Mike Klein2016-10-25
| | | | | | | | | | | | | | | | | | The portable code now becomes entirely focused on enum+ptr descriptions, leaving the concrete implementation of the pipeline to SkOpts::run_pipeline(). As implemented, the concrete implementation is basically the same, with a little more type safety. Speed is essentially unchanged on my laptop, and that's having run_pipeline() rebuild its concrete state every call. There's room for improvement there if we split this into a compile_pipeline() / run_pipeline() sort of thing, which is my next planned CL. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=3920 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Change-Id: Ie4c554f51040426de7c5c144afa5d9d9d8938012 Reviewed-on: https://skia-review.googlesource.com/3920 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* SkRasterPipeline refactorGravatar Mike Klein2016-10-20
| | | | | | | | | | | | | | | | | | | | | | - Give body and tail functions separate types. This frees a register in body functions, especially important for Windows. - Fill out default, SSE4.1, and HSW versions of all functions. This means we don't have to mess around with SkNf_abi... all functions come from the same compilation unit where SkNf is a single consistent type. - Move Stage::next() into SkRasterPipeline_opts.h as a static inline function. - Remove Stage::ctx() entirely... fCtx is literally the same thing. This is a step along the way toward building the entire pipeline in src/opts, removing the need for all the stages to be functions living in SkOpts. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=3680 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot,Test-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Debug-ASAN-Trybot Change-Id: I7de78ffebc15b9bad4eda187c9f50369cd7e5e42 Reviewed-on: https://skia-review.googlesource.com/3680 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* SkNx_abi for passing Sk4f as function arguments, etc.Gravatar Mike Klein2016-10-15
| | | | | | | | | | | | CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot,Test-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Debug-ASAN-Trybot BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=3422 Change-Id: Idc0a192faa7ff843aef023229186580c69baf1f7 Reviewed-on: https://skia-review.googlesource.com/3422 Reviewed-by: Mike Klein <mtklein@chromium.org>
* Add SkRasterPipeline support to SkModeColorFilter.Gravatar Mike Klein2016-10-12
| | | | | | | | | | | | | | | | The shader leaves its color in r,g,b,a, so to implement this color filter, we move {r,g,b,a} into {dr,dg,db,da}, then load the filter's color in {r,g,b,a}, then apply the xfermode as usual. I've left a note about how we could sometimes cut a stage for some xfermodes. Similarly we really only need to move_src_dst instead of swap_src_dst, but it seemed handy and less error prone to do a full two way swap. As usual, we can always circle back and fine-tune these things if we want. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=3243 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Change-Id: I928c0fb25236eb75cf238134c6bebb53af5ddf07 Reviewed-on: https://skia-review.googlesource.com/3243 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Matt Sarett <msarett@google.com>
* SkRasterPipeline: 8x pipelines, without any 8x code enabled.Gravatar Mike Klein2016-10-12
| | | | | | | | | | | | | | | | | Original review here: https://skia-review.googlesource.com/c/2990/ Second attempt here: https://skia-review.googlesource.com/c/3064/ This is the same as the second attempt, but with the change to SkOpts_hsw.cpp left out. That omitted part is the key piece... this just lands the refactoring. CQ_INCLUDE_TRYBOTS=master.client.skia:Perf-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Debug-ASAN-Trybot,Perf-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Debug-GN,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-Fast-Trybot;master.client.skia.compile:Build-Win-MSVC-x86_64-Debug-Trybot GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=3242 Change-Id: Iaafa793a4854c2c9cd7e85cca3701bf871253f71 Reviewed-on: https://skia-review.googlesource.com/3242 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
* Revert "SkRasterPipeline: 8x pipelines, attempt 2"Gravatar Mike Klein2016-10-10
| | | | | | | | | | | | This reverts commit Id0ba250037e271a9475fe2f0989d64f0aa909bae. crbug.com/654213 Looks like Chrome Canary's picking up Haswell code on non-Haswell machines. Change-Id: I16f976da24db86d5c99636c472ffad56db213a2a Reviewed-on: https://skia-review.googlesource.com/3108 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Mike Klein <mtklein@chromium.org>
* SkRasterPipeline: 8x pipelines, attempt 2Gravatar Mike Klein2016-10-07
| | | | | | | | | | | | | | | | | | | Original review here: https://skia-review.googlesource.com/c/2990/ Changes since: - simpler implementations of load_tail() / store_tail(): slower, but more obviously correct to all compilers - fleshed out math ops on Sk8i and Sk8u to make unit tests happy on -Fast bot (where we always have AVX2) - now storing stage functions as void(*)() to avoid undefined behavior and/or linker problems. This restores 32-bit Windows. - all AVX2 Sk8x methods are marked always-inline, to avoid linking the "wrong" version on Debug builds. CQ_INCLUDE_TRYBOTS=master.client.skia:Perf-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Debug-ASAN-Trybot,Perf-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Debug-GN,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-Fast-Trybot;master.client.skia.compile:Build-Win-MSVC-x86_64-Debug-Trybot GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=3064 Change-Id: Id0ba250037e271a9475fe2f0989d64f0aa909bae Reviewed-on: https://skia-review.googlesource.com/3064 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
* Revert "SkRasterPipeline: 8x pipelines"Gravatar Mike Klein2016-10-07
| | | | | | | | | | | | | | | | This reverts commit I1c82e5755d8e44cc0b9c6673d04b117f85d71a3a. Reason for revert: lots of failing bots. TBR=mtklein@chromium.org,msarett@google.com,reviews@skia.org NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true Change-Id: I653bed3905187f43196504f19424985fa2a765b5 Reviewed-on: https://skia-review.googlesource.com/3063 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Mike Klein <mtklein@chromium.org>
* SkRasterPipeline: 8x pipelinesGravatar Mike Klein2016-10-07
| | | | | | | | | | | | | | | | | | | | | | Bench runtime changes: sRGB: 7194 -> 3735 = 1.93x faster F16: 6531 -> 2559 = 2.55x faster Instead of building 4x and 1-3x pipelines and then maybe 8x and 1-7x, instead build either the short ones or the long ones, but not both. If we just take care to use a compatible run_pipeline(), there's some cross-module type disagreement but everything works out in the end. Oddly, a few places that looked like they'd be faster using SkNx_fma() or Sk4f_round()/Sk8f_round() are actually faster the long way, e.g. multiply, add 0.5, truncate. Curious! In all the other places you see here that I've used SkNx_fma(), it's been a significant speedup. This folds in a couple refactors and cleanups that I've been meaning to do. Hope you don't mind... if find the new code considerably easier to read than the old code. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2990 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Change-Id: I1c82e5755d8e44cc0b9c6673d04b117f85d71a3a Reviewed-on: https://skia-review.googlesource.com/2990 Reviewed-by: Matt Sarett <msarett@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* Make test lower-level, make const_cast more visible.Gravatar Mike Klein2016-10-05
| | | | | | | | | | | | | | | | | | I can only think there's something funky going on with the hidden const_cast inside SkRasterPipeline.cpp, or with the Sk4h -> Sk4f conversion. So make the const_cast visible and write the test directly in halfs instead of floats. CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Mac-Clang-MacMini6.2-CPU-AVX-x86_64-Release-GN-Trybot BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=3003 Change-Id: I3a7e19ae165fce44f177b4968a63ec04e25c4b93 Reviewed-on: https://skia-review.googlesource.com/3003 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
* Make all SkRasterPipeline stages stock stages in SkOpts.Gravatar Mike Klein2016-10-04
| | | | | | | | | | | | | | | | | If we want to support VEX-encoded instructions (AVX, F16C, etc.) without a ridiculous slowdown, we need to make sure we're running either all VEX-encoded instructions or all non-VEX-encoded instructions. That means we cannot mix arbitrary user-defined SkRasterPipeline::Fn (never VEX) with those living in SkOpts (maybe VEX)... it's SkOpts or bust. This ports the existing user-defined SkRasterPipeline::Fn use cases over to use stock stages from SkOpts. I rewrote the unit test to use stock stages, and moved the SkXfermode implementations to SkOpts. The code deleted for SkArithmeticMode_scalar should already be dead. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2940 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Change-Id: I94dbe766b2d65bfec6e544d260f71d721f0f5cb0 Reviewed-on: https://skia-review.googlesource.com/2940 Commit-Queue: Mike Klein <mtklein@google.com> Reviewed-by: Mike Reed <reed@google.com>
* Add an enum layer of indirection for stock raster pipeline stages.Gravatar Mike Klein2016-09-29
| | | | | | | | | | | | | | | | | | This is handy now, and becomes necessary with fancier backends: - most code can't speak the type of AVX pipeline stages, so indirection's definitely needed there; - if the pipleine is entirely composed of stock stages, these enum values become an abstract recipe that can be JITted. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2782 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Change-Id: Iedd62e99ce39e94cf3e6ffc78c428f0ccc182342 Reviewed-on: https://skia-review.googlesource.com/2782 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
* Start moving SkRasterPipeline stages to SkOpts.Gravatar Mike Klein2016-09-29
| | | | | | | | | | | | | | | | | | | | This lets them pick up runtime CPU specializations. Here I've plugged in SSE4.1. This is still one of the N prelude CLs to full 8-at-a-time AVX. I've moved the union of the stages used by SkRasterPipelineBench and SkRasterPipelineBlitter to SkOpts... they'll all be used by the blitter eventually. Picking up SSE4.1 specialization here (even still just 4 pixels at a time) is a significant speedup, especially to store_srgb(), so much that it's no longer really interesting to compare against the fused-but-default-instruction-set version in the bench. So that's gone now. That left the SkRasterPipeline unit test as the only other user of the EasyFn simplified interface to SkRasterPipeline. So I converted that back down to the bare-metal interface, and EasyFn and its friends became SkRasterPipeline_opts.h exclusive abbreviations (now called Kernel_Sk4f). This isn't really unexpected: SkXfermode also wanted to build up its own little abstractions, and once you build your own abstraction, the value of an additional EasyFn-like layer plummets to negative. For simplicity I've left the SkXfermode stages alone, except srcover() which was always part of the blitter. No particular reason except keeping the churn down while I hack. These _can_ be in SkOpts, but don't have to be until we go 8-at-a-time. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2752 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Change-Id: I3b476b18232a1598d8977e425be2150059ab71dc Reviewed-on: https://skia-review.googlesource.com/2752 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
* Rearrange SkRasterPipeline scanline tail handling.Gravatar Mike Klein2016-09-28
| | | | | | | | | | | | | | | | | | | We used to step at a 4-pixel stride as long as possible, then run up to 3 times, one pixel at a time. Now replace those 1-at-a-time runs with a single tail stamp if there are 1-3 remaining pixels. This style is simply more efficient: e.g. we'll blend and lerp once for 3 pixels instead of 3 times. This should make short blits significantly more efficient. It's also more future-oriented... AVX+ on Intel and SVE on ARM support masked loads and stores, so we can do the entire tail in one direct step. This also makes it possible to re-arrange the code a bit to encapsulate each stage better. I think generally this code reads more clearly than the old code, but YMMV. I've arranged things so you write one function, but it's compiled into two specializations, one for tail=0 (Body) and one for tail>0 (Tail). It's pretty tidy. For now I've just burned a register to pass around tail. It's 2 bits now, maybe soon 3 with AVX, and capped at 4 for even the craziest new toys, so there are plenty of places we can pack it if we want to get clever. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2717 Change-Id: I45852a3e5d4c5b5e9315302c46601aee0d32265f Reviewed-on: https://skia-review.googlesource.com/2717 Reviewed-by: Mike Reed <reed@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* SkRasterPipeline: add last() and docs.Gravatar Mike Klein2016-09-27
| | | | | | | | | | | | | | | | | | | Today if you use the simple SK_RASTER_STAGE interface to build a pipeline, each stage you add calls into a next stage. The last stage you add calls into a special backstop stage JustReturn that, well, just returns, ending the pipeline. This adds last(), which cuts that last stage off the pipeline. Instead, the stage you add using last() returns directly, ending the pipeline itself without jumping into JustReturn. This reduces the overhead of using the pipelined version of SkRasterPipelineBench from ~25% to ~20% on my desktop. Also, add docs. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2713 Change-Id: I11469378e2765c6e34db52eb3eef648d6612da3f Reviewed-on: https://skia-review.googlesource.com/2713 Reviewed-by: Mike Reed <reed@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>