skia - 2D graphics library

	Commit message (Collapse)	Author	Age
*	Remove sk_linear_to_srgb_noclamp().	Mike Klein	2016-11-03
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We've just re-noticed this can happen: 1) we have a properly premultiplied linear color; 2) we convert that to sRGB; 3) we convert that back to linear; 4) that color does not appear to be premultiplied. Removing sk_linear_to_srgb_noclamp(), and thus always clamping to [0,1] here in linear space, does not fix this problem. However, it does help keep it from propagating too badly. Just double-checked: the older Sk4f pipeline (SkXfermode4f, SkPM4fPriv, etc) already use sk_linear_to_srgb() exclusively, so they're already doing this same clamp. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=4314 Change-Id: I6352eb0ba969eb25674e8441e43bb51e1e1c0df3 Reviewed-on: https://skia-review.googlesource.com/4314 Reviewed-by: Yuqian Li <liyuqian@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
*	Basic pipeline blend mode strength reductions:	Mike Klein	2016-11-03
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- when the shader is opaque, srcover becomes src - don't load dst when blitting in src mode with full coverage - fold coverage into src alpha when in srcover mode It's not obvious that we can fold coverage into src alpha when using a 565 mask, so I've not attempted that. What would we do about alpha? Over all GMs this causes a single 1-bit difference in sRGB mode. No 565 or f16 diffs. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=4349 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Change-Id: Ib9161f498c4efa6b348ca74522166da64d09a7da Reviewed-on: https://skia-review.googlesource.com/4349 Reviewed-by: Brian Osman <brianosman@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
*	Add Matrix colorfilter pipeline stages.	Mike Klein	2016-11-03
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This breaks the color filter down into a couple logical steps: - go to unpremul - apply the 4x5 matrix - clamp to [0,1] - go to premul Because we already have handy premul clamp stages, we swap the order of clamp and premul. This is lossless. While adding our stages to the pipeline, we analyze the matrix to see if we can skip any steps: - we can skip unpremul if the shader is opaque (alphas are all 1 ~~~> we're already unpremul); - we can skip the premul back if the color filter always produces opaque (here, are the inputs opaque and do we keep them that way, but we could also check for an explicit 0 0 0 0 1 alpha row); - we can skip the clamp_0 if the matrix can never produce a value less than 0; - we can skip the clamp_1 if the matrix can never produce a value greater than 1. The only thing that should seem missing is per-pixel alpha checks. We don't do those here, but instead make up for it by operating on 4-8 pixels at a time. We don't split the 4x5 matrix into a 4x4 and 1x4 translate. We could, but when we have FMA (new x86, all ARMv8) we might as well work the translate for free into the FMAs. This makes gm/fadefilter.cpp draw differently in sRGB and F16 modes, bringing them in line with the GPU sRGB and GPU f16 configs. It's unclear to me what was wrong with the old CPU implementation. GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=4346 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Change-Id: I14082ded8fb8d63354167d9e6b3f8058f840253e Reviewed-on: https://skia-review.googlesource.com/4346 Reviewed-by: Mike Reed <reed@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
*	SkRasterPipeline: implement SkLumaColorFilter	Mike Klein	2016-11-01
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	After getting discouraged by the non-separable xfermodes, I decided to look at filling out the color filters instead. This one's nice and easy. There's only 1 GM that exercises this color filter, and it's drawing noticeably lighter now in f16 and sRGB configs. 565 is unchanged. This makes me think the diffs are due to lost precision in the previous method, which was going through the default fallback to 8888 filterSpan(). I double checked: the f16 config now draws nearly identically to the gpuf16 config. It used to be quite different. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=4183 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Change-Id: Ic6feaecae5cf18493b5df89733f6a5ca362e9a75 Reviewed-on: https://skia-review.googlesource.com/4183 Reviewed-by: Brian Osman <brianosman@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
*	Only clamp when we think our math requires it.	Mike Klein	2016-10-27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If we require our inputs are sound, in-gamut, premul colors (a in [0,1], r,g,b in [0,a]) then we should only need to clamp when the math we perform requires it. The safety clamps before each store are paranoia. The main thing this pipeline handles right now that needs clamping is the plus transfermode. This is either used to blend, where the clamp must come after the coverage lerp, or used via a mode color filter, where we have no choice but to clamp right at the end of the color filer. This changes how the mode color filter draws with the plus transfermode. It didn't used to clamp at all. I think this is a bug fix. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=4034 Change-Id: I3cbaade2127cc88c8782596f45749c4fe4b0e953 Reviewed-on: https://skia-review.googlesource.com/4034 Reviewed-by: Brian Osman <brianosman@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
*	Convert SkRasterPipeline loads and stores to indirect.	Mike Klein	2016-10-26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This allows us to change the underlying pointer without rebuilding the pipeline, e.g. when moving the blitter from scanline to scanline. The extra overhead when not needed is measurable but small, <2%. We can always add back direct stages later for cases where we know the context pointer will not change. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=3943 Change-Id: I827d7e6e4e67d02dd2802610f898f98c5f36f8cb Reviewed-on: https://skia-review.googlesource.com/3943 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
*	SkRasterPipeline::compile().	Mike Klein	2016-10-25
\| \| \| \| \| \| \| \| \| \| \|	I'm not yet caching these in the blitter, and speed is essentially unchanged in the bench where I am now building and compiling the pipeline only once. This may not be able to stay a simple std::function after I figure out caching, but for now it's a nice fit. GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=3911 Change-Id: I9545af589f73baf9f17cb4e6ace9a814c2478fe9 Reviewed-on: https://skia-review.googlesource.com/3911 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
*	Move SkRasterPipeline further into SkOpts.	Mike Klein	2016-10-25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The portable code now becomes entirely focused on enum+ptr descriptions, leaving the concrete implementation of the pipeline to SkOpts::run_pipeline(). As implemented, the concrete implementation is basically the same, with a little more type safety. Speed is essentially unchanged on my laptop, and that's having run_pipeline() rebuild its concrete state every call. There's room for improvement there if we split this into a compile_pipeline() / run_pipeline() sort of thing, which is my next planned CL. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=3920 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Change-Id: Ie4c554f51040426de7c5c144afa5d9d9d8938012 Reviewed-on: https://skia-review.googlesource.com/3920 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
*	Zero tail stack buffers.	Mike Klein	2016-10-20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	MSAN and the no-SIMD bots have noticed the top lanes of the tail vectors are not initialized. As they were written it was faster to leave them unintialized, but as re-written here it's equal speed and now safe. CQ_INCLUDE_TRYBOTS=master.client.skia:Perf-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Debug-MSAN-Trybot,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=3790 Change-Id: Icd41ba14ae6baf9947eb361a366f1ce19ad8aa67 Reviewed-on: https://skia-review.googlesource.com/3790 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
*	SkRasterPipeline_opts: split next() into next_body() and next_tail().	Mike Klein	2016-10-20
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This may work around an Chrome/MSVC/PGO compiler crasher. Even if not, it's harmless for performance, and arguably more readable. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=3789 Change-Id: I0bf23f65d7832b9f43e275f85e7985fcd6b13b9f Reviewed-on: https://skia-review.googlesource.com/3789 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Mike Klein <mtklein@chromium.org>
*	SkRasterPipeline: memcpy-free tail code.	Mike Klein	2016-10-20
\| \| \| \| \| \| \| \| \| \| \| \| \|	We don't call the tail code nearly as often as the body code, but when we do and call memcpy(), we first have to vzeroupper back into the non-AVX world. That does seem to slow things down considerably. You wouldn't think it, but this gives a nice speed up (tested on Windows). BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=3783 Change-Id: I40cbe1e529f2431825edec7638265601b64e7ec5 Reviewed-on: https://skia-review.googlesource.com/3783 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
*	SkRasterPipeline refactor	Mike Klein	2016-10-20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- Give body and tail functions separate types. This frees a register in body functions, especially important for Windows. - Fill out default, SSE4.1, and HSW versions of all functions. This means we don't have to mess around with SkNf_abi... all functions come from the same compilation unit where SkNf is a single consistent type. - Move Stage::next() into SkRasterPipeline_opts.h as a static inline function. - Remove Stage::ctx() entirely... fCtx is literally the same thing. This is a step along the way toward building the entire pipeline in src/opts, removing the need for all the stages to be functions living in SkOpts. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=3680 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot,Test-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Debug-ASAN-Trybot Change-Id: I7de78ffebc15b9bad4eda187c9f50369cd7e5e42 Reviewed-on: https://skia-review.googlesource.com/3680 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
*	More explicit body/tail function versioning.	Mike Klein	2016-10-19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	MSVC was not picking up that tail==0 in the non-_tail versions of the kernel functions in SkRasterPipeline_opts.h. This passes through a template bool parameter to convey the same message. This makes the body code a bit smaller and faster on MSVC now by removing the tail>0 check and code. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=3669 Change-Id: I8bf81717a83f216eb1eb28a75dac41779dc508c1 Reviewed-on: https://skia-review.googlesource.com/3669 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
*	SkNx_abi for passing Sk4f as function arguments, etc.	Mike Klein	2016-10-15
\| \| \| \| \| \| \| \| \| \| \| \|	CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot,Test-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Debug-ASAN-Trybot BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=3422 Change-Id: Idc0a192faa7ff843aef023229186580c69baf1f7 Reviewed-on: https://skia-review.googlesource.com/3422 Reviewed-by: Mike Klein <mtklein@chromium.org>
*	Add SkRasterPipeline support to SkModeColorFilter.	Mike Klein	2016-10-12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The shader leaves its color in r,g,b,a, so to implement this color filter, we move {r,g,b,a} into {dr,dg,db,da}, then load the filter's color in {r,g,b,a}, then apply the xfermode as usual. I've left a note about how we could sometimes cut a stage for some xfermodes. Similarly we really only need to move_src_dst instead of swap_src_dst, but it seemed handy and less error prone to do a full two way swap. As usual, we can always circle back and fine-tune these things if we want. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=3243 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Change-Id: I928c0fb25236eb75cf238134c6bebb53af5ddf07 Reviewed-on: https://skia-review.googlesource.com/3243 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Matt Sarett <msarett@google.com>
*	SkRasterPipeline: 8x pipelines, without any 8x code enabled.	Mike Klein	2016-10-12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Original review here: https://skia-review.googlesource.com/c/2990/ Second attempt here: https://skia-review.googlesource.com/c/3064/ This is the same as the second attempt, but with the change to SkOpts_hsw.cpp left out. That omitted part is the key piece... this just lands the refactoring. CQ_INCLUDE_TRYBOTS=master.client.skia:Perf-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Debug-ASAN-Trybot,Perf-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Debug-GN,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-Fast-Trybot;master.client.skia.compile:Build-Win-MSVC-x86_64-Debug-Trybot GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=3242 Change-Id: Iaafa793a4854c2c9cd7e85cca3701bf871253f71 Reviewed-on: https://skia-review.googlesource.com/3242 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
*	Revert "SkRasterPipeline: 8x pipelines, attempt 2"	Mike Klein	2016-10-10
\| \| \| \| \| \| \| \| \| \| \| \|	This reverts commit Id0ba250037e271a9475fe2f0989d64f0aa909bae. crbug.com/654213 Looks like Chrome Canary's picking up Haswell code on non-Haswell machines. Change-Id: I16f976da24db86d5c99636c472ffad56db213a2a Reviewed-on: https://skia-review.googlesource.com/3108 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Mike Klein <mtklein@chromium.org>
*	SkRasterPipeline: 8x pipelines, attempt 2	Mike Klein	2016-10-07
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Original review here: https://skia-review.googlesource.com/c/2990/ Changes since: - simpler implementations of load_tail() / store_tail(): slower, but more obviously correct to all compilers - fleshed out math ops on Sk8i and Sk8u to make unit tests happy on -Fast bot (where we always have AVX2) - now storing stage functions as void(*)() to avoid undefined behavior and/or linker problems. This restores 32-bit Windows. - all AVX2 Sk8x methods are marked always-inline, to avoid linking the "wrong" version on Debug builds. CQ_INCLUDE_TRYBOTS=master.client.skia:Perf-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Debug-ASAN-Trybot,Perf-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Debug-GN,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-Fast-Trybot;master.client.skia.compile:Build-Win-MSVC-x86_64-Debug-Trybot GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=3064 Change-Id: Id0ba250037e271a9475fe2f0989d64f0aa909bae Reviewed-on: https://skia-review.googlesource.com/3064 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
*	Revert "SkRasterPipeline: 8x pipelines"	Mike Klein	2016-10-07
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This reverts commit I1c82e5755d8e44cc0b9c6673d04b117f85d71a3a. Reason for revert: lots of failing bots. TBR=mtklein@chromium.org,msarett@google.com,reviews@skia.org NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true Change-Id: I653bed3905187f43196504f19424985fa2a765b5 Reviewed-on: https://skia-review.googlesource.com/3063 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Mike Klein <mtklein@chromium.org>
*	SkRasterPipeline: 8x pipelines	Mike Klein	2016-10-07
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Bench runtime changes: sRGB: 7194 -> 3735 = 1.93x faster F16: 6531 -> 2559 = 2.55x faster Instead of building 4x and 1-3x pipelines and then maybe 8x and 1-7x, instead build either the short ones or the long ones, but not both. If we just take care to use a compatible run_pipeline(), there's some cross-module type disagreement but everything works out in the end. Oddly, a few places that looked like they'd be faster using SkNx_fma() or Sk4f_round()/Sk8f_round() are actually faster the long way, e.g. multiply, add 0.5, truncate. Curious! In all the other places you see here that I've used SkNx_fma(), it's been a significant speedup. This folds in a couple refactors and cleanups that I've been meaning to do. Hope you don't mind... if find the new code considerably easier to read than the old code. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2990 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Change-Id: I1c82e5755d8e44cc0b9c6673d04b117f85d71a3a Reviewed-on: https://skia-review.googlesource.com/2990 Reviewed-by: Matt Sarett <msarett@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
*	Make load4 and store4 part of SkNx properly.	Mike Klein	2016-10-06
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Every type now nominally has Load4() and Store4() methods. The ones that we use are implemented. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=3046 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Change-Id: I7984f0c2063ef8acbc322bd2e968f8f7eaa0d8fd Reviewed-on: https://skia-review.googlesource.com/3046 Reviewed-by: Matt Sarett <msarett@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
*	Make all SkRasterPipeline stages stock stages in SkOpts.	Mike Klein	2016-10-04
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If we want to support VEX-encoded instructions (AVX, F16C, etc.) without a ridiculous slowdown, we need to make sure we're running either all VEX-encoded instructions or all non-VEX-encoded instructions. That means we cannot mix arbitrary user-defined SkRasterPipeline::Fn (never VEX) with those living in SkOpts (maybe VEX)... it's SkOpts or bust. This ports the existing user-defined SkRasterPipeline::Fn use cases over to use stock stages from SkOpts. I rewrote the unit test to use stock stages, and moved the SkXfermode implementations to SkOpts. The code deleted for SkArithmeticMode_scalar should already be dead. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2940 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Change-Id: I94dbe766b2d65bfec6e544d260f71d721f0f5cb0 Reviewed-on: https://skia-review.googlesource.com/2940 Commit-Queue: Mike Klein <mtklein@google.com> Reviewed-by: Mike Reed <reed@google.com>
*	sk_linear_from_srgb_math	Mike Klein	2016-10-04
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Looks great (imperceptibly different) but ~10% slower on both ARMv8 and x86-64. Probably need to hide the table-or-math logic behind Sk4f/Sk8f unless we find faster math. I do like the new look of the pipeline stages though. A lot clearer. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2880 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Change-Id: I44952237d56ba167445b07d4830eb8959c4d47b7 Reviewed-on: https://skia-review.googlesource.com/2880 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Matt Sarett <msarett@google.com>
*	Add an enum layer of indirection for stock raster pipeline stages.	Mike Klein	2016-09-29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is handy now, and becomes necessary with fancier backends: - most code can't speak the type of AVX pipeline stages, so indirection's definitely needed there; - if the pipleine is entirely composed of stock stages, these enum values become an abstract recipe that can be JITted. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2782 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Change-Id: Iedd62e99ce39e94cf3e6ffc78c428f0ccc182342 Reviewed-on: https://skia-review.googlesource.com/2782 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
*	Start moving SkRasterPipeline stages to SkOpts.	Mike Klein	2016-09-29
	This lets them pick up runtime CPU specializations. Here I've plugged in SSE4.1. This is still one of the N prelude CLs to full 8-at-a-time AVX. I've moved the union of the stages used by SkRasterPipelineBench and SkRasterPipelineBlitter to SkOpts... they'll all be used by the blitter eventually. Picking up SSE4.1 specialization here (even still just 4 pixels at a time) is a significant speedup, especially to store_srgb(), so much that it's no longer really interesting to compare against the fused-but-default-instruction-set version in the bench. So that's gone now. That left the SkRasterPipeline unit test as the only other user of the EasyFn simplified interface to SkRasterPipeline. So I converted that back down to the bare-metal interface, and EasyFn and its friends became SkRasterPipeline_opts.h exclusive abbreviations (now called Kernel_Sk4f). This isn't really unexpected: SkXfermode also wanted to build up its own little abstractions, and once you build your own abstraction, the value of an additional EasyFn-like layer plummets to negative. For simplicity I've left the SkXfermode stages alone, except srcover() which was always part of the blitter. No particular reason except keeping the churn down while I hack. These _can_ be in SkOpts, but don't have to be until we go 8-at-a-time. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2752 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Change-Id: I3b476b18232a1598d8977e425be2150059ab71dc Reviewed-on: https://skia-review.googlesource.com/2752 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>