path: root/src/opts
Commit message | Author | Age
* simplify sse41::srcover_srgb_srgb | Mike Klein | 2017-05-30
  Most importantly, remove the undefined behavior implied by "delta".

  Change-Id: I8f9740804ec74dd40b049eafd4f0d51b36ce3237
  Reviewed-on: https://skia-review.googlesource.com/18140
  Reviewed-by: Herb Derby <herb@google.com>
* remove sse2::srcover_srgb_srgb | Mike Klein | 2017-05-30
  Change-Id: Icc570d8a8f1df1dea202e1d234433491122b9b67
  Reviewed-on: https://skia-review.googlesource.com/18135
  Reviewed-by: Herb Derby <herb@google.com>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* Faster and more accurate blit_row_s32a_opaque for ARM | Matteo Franchin | 2017-05-26
  Change the ARM implementation of alpha blending to work on 8 pixels at a
  time (using NEON). Also improve the accuracy of alpha blending by using a
  formula based on SkMulDiv255Round rather than SkPMSrcOver.

  A number of variations of this code were considered. Some notes:

  - A 16-pixels-at-a-time version was considered. It performs well for
    extreme alpha (all-opaque or all-transparent pixels), but worse than
    the 8-pixel version when alpha changes frequently. Also, gcc 6.2.1
    seems to have trouble with register pressure when using this version.

  - If the branch that detects the fully opaque or fully transparent cases
    is removed, performance increases significantly for images that are
    all partially transparent (especially on ARM Cortex A72), but can
    decrease significantly for images that are almost fully opaque or
    fully transparent.

  This implementation is a compromise between the effects described above.

  This patch produces a ~10% improvement on the nanobench sub-scores
  repeatTile_BGRA_8888_A, constXTile_MM_filter_trans, constXTile_CC_trans,
  and constXTile_RR_filter_trans when running on ARM Cortex A72. Larger
  improvements (20% to 30%) are observed on ARM Cortex A53.

  CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD

  Change-Id: I1f0c9f549057613bbffd26e6651f3beeb0019af9
  Bug: skia:
  Reviewed-on: https://skia-review.googlesource.com/16520
  Reviewed-by: Mike Klein <mtklein@chromium.org>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
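To make the accuracy point in the commit above concrete, here is a minimal scalar sketch of source-over blending built on a rounded divide-by-255. It is not the NEON code from the patch: the function names are hypothetical, alpha is assumed to live in the top byte of a premultiplied 32-bit pixel, and only the "+128, then fold the high byte back in" rounding trick is the well-known technique the message alludes to.

```cpp
#include <cstdint>

// Rounded (a * b) / 255 for a, b in [0, 255], without an integer division.
// Hypothetical helper name; the rounding trick itself is standard.
inline uint8_t mul_div_255_round(uint8_t a, uint8_t b) {
    unsigned prod = a * b + 128u;
    return static_cast<uint8_t>((prod + (prod >> 8)) >> 8);
}

// Source-over for one premultiplied pixel (alpha assumed in bits 24..31).
// The two early returns mirror the fully transparent / fully opaque
// fast paths that the commit message discusses.
inline uint32_t srcover_one(uint32_t src, uint32_t dst) {
    const uint8_t a = static_cast<uint8_t>(src >> 24);
    if (a == 0)   return dst;
    if (a == 255) return src;

    const uint8_t inv = static_cast<uint8_t>(255 - a);
    uint32_t out = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        unsigned s = (src >> shift) & 0xFF;
        unsigned d = (dst >> shift) & 0xFF;
        unsigned c = s + mul_div_255_round(static_cast<uint8_t>(d), inv);
        out |= (c > 255 ? 255u : c) << shift;  // saturate defensively
    }
    return out;
}
```

A vector version would apply the same per-channel arithmetic to 8 pixels at once and test the whole group's alpha before choosing a path.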
* move sk_memset?? to SkOpts | Mike Klein | 2017-05-23
  This lets the compiler generate AVX versions with wider writes.

  Change-Id: Ia63825e70c72bdb4d14bef97d8b4ea4be54c9d84
  Reviewed-on: https://skia-review.googlesource.com/17715
  Reviewed-by: Florin Malita <fmalita@chromium.org>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* remove old 565 destination opts | Mike Klein | 2017-05-06
  This is not an important format, and the code is dead or close to it.

  The code is an occasional maintenance burden so I'd like it gone.

  Change-Id: I4ad921533abf3211e6a81e6e475b848795eea060
  Reviewed-on: https://skia-review.googlesource.com/15600
  Reviewed-by: Mike Reed <reed@google.com>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* CRC32 no longer restricted to ARM64 | Amaury Le Leyzour | 2017-05-04
  On a simple benchmark the CRC32 version is about 3x faster on ARM Cortex A57
  (Aarch32) than the Murmur3 scalar version.

  BUG=skia:

  Change-Id: I71515e8463a33924998b837ff9f32202690dd2fe
  Reviewed-on: https://skia-review.googlesource.com/15480
  Reviewed-by: Mike Klein <mtklein@chromium.org>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* Fix new IT blocks ARMv8 | Amaury Le Leyzour | 2017-04-26
  ARMv8 specifies that an IT block should be followed by only one 16-bit instruction.

  * SkFloatToFix is back to a C implementation that mirrors the assembly code.

  * S32A_D565_Opaque_neon switched the usage of the temporary 'ip' register to let
    the compiler choose what is best in the context of the IT block. And replaced
    'keep_dst' by 'ip' where low register or high register does not matter.

  BUG=skia:

  CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD

  Change-Id: If587110a0c74b637ae99460419d46cf969c694fc
  Reviewed-on: https://skia-review.googlesource.com/9346
  Reviewed-by: Mike Klein <mtklein@chromium.org>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* long live SkJumper | Mike Klein | 2017-04-21
  Change-Id: I5de3c8daae80e437b3553ab6afcee7120a1bb775
  Reviewed-on: https://skia-review.googlesource.com/14038
  Reviewed-by: Herb Derby <herb@google.com>
  Reviewed-by: Matt Sarett <msarett@google.com>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* kill off shader_adapter | Mike Klein | 2017-04-21
  I still plan to replace this more thoroughly with a different blitter,
  but for now just implement it using callback.

  This is the last stage not supported by SkJumper!

  Will follow up by removing all of SkRasterPipeline_opts.h
  and anything that indicates SkJumper might not work.

  Change-Id: I96ba2bb0a26266f3b658e5f3153ec7d5bbd46799
  Reviewed-on: https://skia-review.googlesource.com/14037
  Reviewed-by: Florin Malita <fmalita@chromium.org>
  Reviewed-by: Herb Derby <herb@google.com>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* jumper, rework callback a bit, use it for color_lookup_table | Mike Klein | 2017-04-21
  Looks like the color-space images have this well tested
  (even without lab_to_xyz) and the diffs look like rounding/FMA.

  The old plan to keep loads and stores outside callback was:
     1) awkward, with too many pointers and pointers to pointers to track
     2) misguided... load and store stages march ahead by x,
        working at ptr+0, ptr+8, ptr+16, etc. while callback
        always wants to be working at the same spot in the buffer.

  I spent a frustrating day in lldb to understand 2).  :/

  So now the stage always store4's its pixels to a buffer in the context
  before the callback, and when the callback returns it load4's them back
  from a pointer in the context, defaulting to that same buffer.

  Instead of passing a void* into the callback, we pass the context
  itself. This lets us subclass the context and add our own data...
  C-compatible object-oriented programming.

  Change-Id: I7a03439b3abd2efb000a6973631a9336452e9a43
  Reviewed-on: https://skia-review.googlesource.com/13985
  Reviewed-by: Herb Derby <herb@google.com>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
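The store / call back / reload pattern described in the commit above can be sketched roughly as follows. Every name here (CallbackCtx, ScaleCtx, run_callback_stage, the 8-pixel buffer size) is an assumption made for illustration and is not taken from the Skia sources; the sketch only shows a context that carries its own function pointer and buffer, and a "subclass" that adds data the callback can reach.

```cpp
#include <cstring>

// Hypothetical base context: the callback receives the context itself,
// not a void*, so derived contexts can carry extra data.
struct CallbackCtx {
    void (*fn)(CallbackCtx* self, int active_pixels);
    float  rgba[4 * 8];       // scratch room for up to 8 pixels of r,g,b,a
    float* read_from = rgba;  // where the stage reloads from; defaults to rgba
};

// "Subclass" that adds its own state after the base fields.
struct ScaleCtx : CallbackCtx {
    float scale;
};

// Example callback: always works at the same spot in the buffer.
inline void scale_pixels(CallbackCtx* self, int active_pixels) {
    ScaleCtx* ctx = static_cast<ScaleCtx*>(self);
    for (int i = 0; i < 4 * active_pixels; i++) {
        ctx->rgba[i] *= ctx->scale;
    }
}

// Stand-in for the pipeline stage (n must be at most 8 here):
// store the pixels, call back, then load them again from read_from.
inline void run_callback_stage(CallbackCtx* ctx, float* pixels, int n) {
    std::memcpy(ctx->rgba, pixels, sizeof(float) * 4 * n);
    ctx->fn(ctx, n);
    std::memcpy(pixels, ctx->read_from, sizeof(float) * 4 * n);
}
```

A callback that produces its output somewhere else can point read_from at that storage instead of copying back into rgba.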
* add a callback stage to SkRasterPipeline | Mike Klein | 2017-04-17
  This lets us temporarily escape to a piece of code outside SkRasterPipeline.

  We should be able to use this to replace
    - parametric_{r,g,b,a}
    - table_{r,g,b,a}
    - color_lookup_table
    - shader_adapter*

  * We want to obsolete shader_adapter for other reasons anyway,
    but we _could_ replace it with this if we want to.

  Change-Id: I42b657b3c19c679796ed1876856cae0c8471307e
  Reviewed-on: https://skia-review.googlesource.com/12102
  Commit-Queue: Mike Klein <mtklein@chromium.org>
  Reviewed-by: Herb Derby <herb@google.com>
  Reviewed-by: Matt Sarett <msarett@google.com>
* jumper, bilinear and bicubic sampling stages | Mike Klein | 2017-04-12
  This splits SkImageShaderContext into three parts:
    - SkJumper_GatherCtx: always, already done
    - SkJumper_SamplerCtx: when bilinear or bicubic
    - MiscCtx: other little bits (the matrix, paint color, tiling limits)

  Thanks for the snazzy allocator that allows this, Herb!

  Both SkJumper and SkRasterPipeline_opts.h should be speaking all the
  same types now.

  I've copied the comments about bilinear/bicubic to SkJumper with little
  typo fixes and clarifications.

  Change-Id: I4ba7b7c02feba3f65f5292169a22c060e34933c6
  Reviewed-on: https://skia-review.googlesource.com/13269
  Reviewed-by: Herb Derby <herb@google.com>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* make all gather_*() use SkJumper_GatherCtx | Mike Klein | 2017-04-12
  SkJumper_GatherCtx is a prefix of SkImageShaderContext, so this is a no-op.

  It helps to keep things straight, and I do want to split apart the
  GatherCtx from a new SamplingCtx.

  Change-Id: I9c5f436b096624c2809e1f810e9bcd6c6b00b883
  Reviewed-on: https://skia-review.googlesource.com/13264
  Reviewed-by: Mike Klein <mtklein@chromium.org>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* remove SkNx AVX code | Mike Klein | 2017-04-11
  We can't realistically use AVX and SkNx together because of ODR
  problems, so remove the code that may tempt us to try.

  Remaining code paths using AVX:
    - one intrinsics-only routine in SkOpts_hsw.cpp
    - SkJumper

  Change-Id: I0d2d03b47ea4a0eec27f2de2b28a4c3d1ff8376f
  Reviewed-on: https://skia-review.googlesource.com/13121
  Reviewed-by: Herb Derby <herb@google.com>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* fix -Fast bot | Mike Klein | 2017-04-10
  N=8 on that bot.

  CQ_INCLUDE_TRYBOTS=skia.primary:Build-Ubuntu-Clang-x86_64-Release-Fast

  Change-Id: If54ae800b50d9dffb9f983b23ff6f522657943b1
  Reviewed-on: https://skia-review.googlesource.com/13061
  Reviewed-by: Mike Klein <mtklein@chromium.org>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* Add multi-stop SkJumper stage. | Herb Derby | 2017-04-10
  CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD

  Change-Id: I954d02638a785bec284d2fdf8f46abfccd474e7a
  Reviewed-on: https://skia-review.googlesource.com/10211
  Commit-Queue: Herb Derby <herb@google.com>
  Reviewed-by: Mike Klein <mtklein@chromium.org>
  Reviewed-by: Florin Malita <fmalita@chromium.org>
* tweaks to make gather_* easier in SkJumper | Mike Klein | 2017-04-06
  This moves all the values that gather_8888, gather_a8, etc. need to the
  front of SkImageShaderContext, and dereferences the color table.

  This should be a no-op, but will make these stages easier to write in
  SkJumper.

  Change-Id: I0dff97d5113d14e941e7b717cd85f0036764eb88
  Reviewed-on: https://skia-review.googlesource.com/11492
  Reviewed-by: Herb Derby <herb@google.com>
  Reviewed-by: Mike Klein <mtklein@chromium.org>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* remove trace and registers stages | Mike Klein | 2017-04-05
  These can't really be done with SkJumper.

  Change-Id: Ic357f00695eacd2766f6dfb9a3be13b0c07c3650
  Reviewed-on: https://skia-review.googlesource.com/11386
  Reviewed-by: Mike Klein <mtklein@chromium.org>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* Add support for F32 sources to SkColorSpaceXform | Matt Sarett | 2017-03-21
  This also subtly allows clients to convert between F32 and F16.

  BUG=skia:

  CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD

  Change-Id: Ied5f2295fce00c69d8cf85730be899f3f8597915
  Reviewed-on: https://skia-review.googlesource.com/9914
  Reviewed-by: Mike Reed <reed@google.com>
  Reviewed-by: Mike Klein <mtklein@chromium.org>
  Commit-Queue: Matt Sarett <msarett@google.com>
* Remove SK_SUPPORT_LEGACY_BROKEN_LERP support | Florin Malita | 2017-03-13
  Chromium change landed.

  BUG=chromium:696216

  CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD

  Change-Id: I3e67392b0fdad8c5a3ad256e4f190123dff6c846
  Reviewed-on: https://skia-review.googlesource.com/9551
  Reviewed-by: Mike Reed <reed@google.com>
  Reviewed-by: Mike Klein <mtklein@chromium.org>
  Commit-Queue: Florin Malita <fmalita@chromium.org>
* Revert "Fix new IT blocks ARMv8" | Mike Klein | 2017-03-07
  This reverts commit 90165c2269bc33ca3d6aaa73d528194daf48da4e.

  Reason for revert: Skia and Chrome iOS builds broken.

  ../../third_party/skia/include/private/SkFixed.h:106:41: error: invalid output constraint '+t' in asm
      asm("vcvt.s32.f32 %0, %0, #16": "+t"(x));

  Original change's description:
  > Fix new IT blocks ARMv8
  >
  > ARMv8 specifies that an IT block should be followed by only one 16-bit instruction.
  > * SkFloatToFix is back to a C implementation that mirrors the assembly code.
  >
  > * S32A_D565_Opaque_neon switched the usage of the temporary 'ip' register to let
  > the compiler choose what is best in the context of the IT block. And replaced
  > 'keep_dst' by 'ip' where low register or high register does not matter.
  >
  > BUG=skia:
  >
  > CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
  >
  > Change-Id: I096759841c972e9300c1d0293bc80d3c3ff2747b
  > Reviewed-on: https://skia-review.googlesource.com/9340
  > Reviewed-by: Mike Klein <mtklein@chromium.org>
  > Commit-Queue: Mike Klein <mtklein@chromium.org>
  >

  TBR=mtklein@chromium.org,amaury.leleyzour@arm.com,reviews@skia.org
  NOPRESUBMIT=true
  NOTREECHECKS=true
  NOTRY=true
  BUG=skia:

  CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD

  Change-Id: Idbcbda88039066153e1c34233d43366ab114fd01
  Reviewed-on: https://skia-review.googlesource.com/9332
  Reviewed-by: Mike Klein <mtklein@chromium.org>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* Fix new IT blocks ARMv8 | Amaury Le Leyzour | 2017-03-06
  ARMv8 specifies that an IT block should be followed by only one 16-bit instruction.

  * SkFloatToFix is back to a C implementation that mirrors the assembly code.

  * S32A_D565_Opaque_neon switched the usage of the temporary 'ip' register to let
    the compiler choose what is best in the context of the IT block. And replaced
    'keep_dst' by 'ip' where low register or high register does not matter.

  BUG=skia:

  CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD

  Change-Id: I096759841c972e9300c1d0293bc80d3c3ff2747b
  Reviewed-on: https://skia-review.googlesource.com/9340
  Reviewed-by: Mike Klein <mtklein@chromium.org>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* Enable legacy premuls in SkColorSpaceXform | Matt Sarett | 2017-02-22
  *** Will allow for simplified Android framework code, they
      typically want a color correct transform followed by a
      gamma encoded premul.

  *** Chrome does the same, so this will make it easier to replace
      their codecs.

  *** Will decrease code size. Both types of premuls are moved
      off the fast path here - one is essentially unused in
      production and the other is not "encouraged".

  *** Will actually make the common case faster: sRGB->sRGB means
      no color xform, just premul in SkSwizzler.

  BUG=skia:

  CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD

  Change-Id: Ia4ec1d273b6f137151f951d37c0ebf975f6b9a3e
  Reviewed-on: https://skia-review.googlesource.com/8848
  Reviewed-by: Mike Klein <mtklein@chromium.org>
  Commit-Queue: Matt Sarett <msarett@google.com>
* Simplify more: remove SkRasterPipeline::compile(). | Mike Klein | 2017-02-16
  It's easier to work on SkJumper if everything funnels through run().

  I don't anticipate huge benefit from compile() without JITing,
  but it's something we can always put back if we find a need.

  Change-Id: Id5256fd21495e8195cad1924dbad81856416d913
  Reviewed-on: https://skia-review.googlesource.com/8468
  Commit-Queue: Mike Klein <mtklein@chromium.org>
  Reviewed-by: Mike Klein <mtklein@chromium.org>
* Implement SkHighContrastFilter | Dominic Mazzoni | 2017-02-16
  This is a color filter to apply several contrast adjustments for users
  with low vision, including inverting the colors (in either RGB or HSL
  space), applying gamma correction, converting to grayscale, and increasing
  the contrast.

  BUG=skia:6235

  CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD

  Change-Id: Icb8f3e290932d8bcd9387fb1f39dd20767e15cf6
  Reviewed-on: https://skia-review.googlesource.com/7460
  Commit-Queue: Mike Klein <mtklein@chromium.org>
  Reviewed-by: Mike Reed <reed@google.com>
  Reviewed-by: Mike Klein <mtklein@chromium.org>
* Make raster pipeline support all pixel conversions | Matt Sarett | 2017-02-14
  BUG=skia:

  CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD

  Change-Id: Idc76999d0f5591a567b3976cb9db829c350e4be2
  Reviewed-on: https://skia-review.googlesource.com/8304
  Reviewed-by: Mike Klein <mtklein@chromium.org>
  Commit-Queue: Matt Sarett <msarett@google.com>
* Remove SkTextureCompressor. | Herb Derby | 2017-02-10
  This is the final state of the tree once SkTextureCompressor is removed.
  It is reached through the following steps:
  1) Remove Skia dep on ktx (done)
  2) Move format over to ktx (done)
  3) Remove all SkTexture compressor code

  CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD

  Change-Id: I3ad7a6abbea006a3034d95662c652d6db90b86ef
  Reviewed-on: https://skia-review.googlesource.com/8272
  Commit-Queue: Herb Derby <herb@google.com>
  Reviewed-by: Robert Phillips <robertphillips@google.com>
  Reviewed-by: Leon Scroggins <scroggo@google.com>
* Make header files self-sufficient | Hal Canary | 2017-02-10
  Change-Id: Ice7d761b1023da77e50e5d6aa597964f7d9aa1d8
  Reviewed-on: https://skia-review.googlesource.com/8302
  Commit-Queue: Hal Canary <halcanary@google.com>
  Reviewed-by: Mike Reed <reed@google.com>
* Pixel conversion refactors: use raster pipeline for 565 and gray | Matt Sarett | 2017-02-09
  I'm trying not to do too much in one CL. But, in general,
  I hope to drop (non-performance important/optimized)
  special cases and use the pipeline.

  BUG=skia:

  CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD

  Change-Id: I724d3982f1467f6232371360b860484f13b1ede8
  Reviewed-on: https://skia-review.googlesource.com/8271
  Reviewed-by: Mike Klein <mtklein@chromium.org>
  Commit-Queue: Matt Sarett <msarett@google.com>
* SkRasterPipeline shader adapter | Florin Malita | 2017-01-31
  Reland of https://skia-review.googlesource.com/c/7615.

  (lifted from https://skia-review.googlesource.com/c/7088/)

  CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD

  Change-Id: I797a2f0ae80209c8637875418e08d2fa03249672
  Reviewed-on: https://skia-review.googlesource.com/7731
  Commit-Queue: Florin Malita <fmalita@google.com>
  Reviewed-by: Mike Reed <reed@google.com>
* Revert "SkRasterPipeline shader adapter" | Florin Malita | 2017-01-29
  This reverts commit 6d11ed2951fadc281433606a8edc6774bed39735.

  Build failure: https://chromium-swarm.appspot.com/task?id=3403da0cdee8c210&refresh=10

  ../../../src/core/SkOpts.cpp -o obj/src/core/libskia.SkOpts.o
  In file included from ../../../src/core/SkOpts.cpp:46:
  ../../../src/opts/SkRasterPipeline_opts.h:1093:11: error: no member named 'Load4' in '(anonymous namespace)::SkNx<8, float>'
      SkNf::Load4(buf, &r, &g, &b, &a);
      ~~~~~~^
  1 error generated.

  Leak: https://chromium-swarm.appspot.com/task?id=3403df55fd5eaa10&refresh=10

  Original change's description:
  > SkRasterPipeline shader adapter
  >
  > (lifted from https://skia-review.googlesource.com/c/7088/)
  >
  > R=mtklein@google.com,herb@google.com,reed@google.com
  >
  > CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
  >
  > Change-Id: Idddb84069423c5fc535bea0a65a5b21a4d07084d
  > Reviewed-on: https://skia-review.googlesource.com/7615
  > Commit-Queue: Florin Malita <fmalita@chromium.org>
  > Reviewed-by: Mike Klein <mtklein@chromium.org>
  > Reviewed-by: Mike Reed <reed@google.com>
  >

  TBR=mtklein@chromium.org,mtklein@google.com,herb@google.com,fmalita@chromium.org,reed@google.com
  NOPRESUBMIT=true
  NOTREECHECKS=true
  NOTRY=true

  CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD

  Change-Id: I68c0be3398bde93cd0842baf25b025c5fe1c3df7
  Reviewed-on: https://skia-review.googlesource.com/7730
  Commit-Queue: Florin Malita <fmalita@chromium.org>
  Reviewed-by: Florin Malita <fmalita@chromium.org>
* SkRasterPipeline shader adapter | Florin Malita | 2017-01-29
  (lifted from https://skia-review.googlesource.com/c/7088/)

  R=mtklein@google.com,herb@google.com,reed@google.com

  CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD

  Change-Id: Idddb84069423c5fc535bea0a65a5b21a4d07084d
  Reviewed-on: https://skia-review.googlesource.com/7615
  Commit-Queue: Florin Malita <fmalita@chromium.org>
  Reviewed-by: Mike Klein <mtklein@chromium.org>
  Reviewed-by: Mike Reed <reed@google.com>
* Really use vpmaddwd in hsw::convolve_vertical(). | Mike Klein | 2017-01-26
  No pixel diffs.

  Performance on 8888 looks like an overall win.

  Before:
      micros      bench
      222.41      bitmap_scale_filter_64_256
       40.06      bitmap_scale_filter_256_64
        8.17      bitmap_scale_filter_90_10
       10.32      bitmap_scale_filter_90_30
       22.50      bitmap_scale_filter_90_80
        1.80      bitmap_scale_filter_90_90
       57.51      bitmap_scale_filter_80_90
       41.99      bitmap_scale_filter_30_90
       31.51      bitmap_scale_filter_10_90

  After:
      micros      bench
      193.60      bitmap_scale_filter_64_256
       46.26      bitmap_scale_filter_256_64
        7.81      bitmap_scale_filter_90_10
        9.99      bitmap_scale_filter_90_30
       22.05      bitmap_scale_filter_90_80
        1.96      bitmap_scale_filter_90_90
       52.07      bitmap_scale_filter_80_90
       37.73      bitmap_scale_filter_30_90
       27.63      bitmap_scale_filter_10_90

  Change-Id: I2f29366b0fd503176c5af4d825fa524e632da21b
  Reviewed-on: https://skia-review.googlesource.com/7630
  Reviewed-by: Matt Sarett <msarett@google.com>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* Graceful degradation for SkOpts_hsw. | Mike Klein | 2017-01-26
  __AVX2__ will not be defined if you omit -mavx2.

  Android does this intentionally for x86 builds.
  (No mobile CPU supports AVX2 AFAIK.)

  This should fix the Android roll.

  Change-Id: Ib94c862641abc11fbb46863afc53bcc049f362ad
  Reviewed-on: https://skia-review.googlesource.com/7633
  Commit-Queue: Mike Klein <mtklein@chromium.org>
  Reviewed-by: Mike Klein <mtklein@chromium.org>
  Reviewed-by: Matt Sarett <msarett@google.com>
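One common way to get the graceful degradation described in the commit above is to guard the AVX2 code with the compiler-defined __AVX2__ macro and leave a portable default installed when it is absent. The sketch below is only an illustration under that assumption: FillFn, g_fill, fill_portable, fill_avx2 and init_hsw are hypothetical names, not the contents of SkOpts_hsw.cpp.

```cpp
#include <cstdint>
#if defined(__AVX2__)
    #include <immintrin.h>
#endif

// Hypothetical function-pointer slot with a portable default.
using FillFn = void (*)(uint32_t* dst, uint32_t value, int n);

inline void fill_portable(uint32_t* dst, uint32_t value, int n) {
    for (int i = 0; i < n; i++) { dst[i] = value; }
}

static FillFn g_fill = fill_portable;

#if defined(__AVX2__)
    // Only compiled when the translation unit is built with -mavx2.
    inline void fill_avx2(uint32_t* dst, uint32_t value, int n) {
        const __m256i v = _mm256_set1_epi32(static_cast<int>(value));
        int i = 0;
        for (; i + 8 <= n; i += 8) {
            _mm256_storeu_si256(reinterpret_cast<__m256i*>(dst + i), v);
        }
        for (; i < n; i++) { dst[i] = value; }  // tail
    }
    inline void init_hsw() { g_fill = fill_avx2; }
#else
    // Built without -mavx2: keep the portable default in place.
    inline void init_hsw() {}
#endif
```

In a real dispatcher init_hsw would also be called only after a runtime CPU feature check; that part is omitted here.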
* Fix variable names in convolve_vertical(). | Mike Klein | 2017-01-26
  These new names reflect the actual pixels stored in each register.

  Change-Id: I8e626196cd8bcbef622e4fb87ac3566a79d3573a
  Reviewed-on: https://skia-review.googlesource.com/7624
  Commit-Queue: Mike Klein <mtklein@chromium.org>
  Reviewed-by: Matt Sarett <msarett@google.com>
* SkOpts_hsw ODR paranoia | Mike Klein | 2017-01-26
  I'm warming back up to the idea of very careful use of SkOpts_hsw.

  But if we're going to do that, we need a strict header discipline. No
  header can be assumed to be safe without vetting, and most aren't.

  Today there's only one function defined in SkOpts_hsw, so this CL mostly
  rewrites that convolve_vertically() to use no headers beyond immintrin.h
  and stdint.h, both safe. It shared very little code with the others
  anyway, so we're not losing anything by putting it directly into
  SkOpts_hsw.cpp.

  I have also streamlined the implementation considerably to improve
  maintainability and readability.

  Change-Id: Ia03daae660e54125a0d2e2988464cfc930349e80
  Reviewed-on: https://skia-review.googlesource.com/7611
  Reviewed-by: Matt Sarett <msarett@google.com>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* Move shader register setup to SkRasterPipelineBlitter. | Mike Klein | 2017-01-23
  We've been seeding the initial values of our registers to x+0.5,y+0.5,
  1,0, 0,0,0,0 (useful values for shaders to start with) in all pipelines.

  This CL changes that to do so only when blitting, and only when we have
  a shader.

  The nicest part of this change is that SkRasterPipeline itself no longer
  needs to have a concept of y, or what x means. It just marches x
  through [x,x+n), and the blitter handles y and layers the meaning of
  "dst x coordinate" onto x.

  This ought to make SkSplicer a little easier to work with too.

  dm --src gm --config f16 srgb 565 all draws the same.

  Change-Id: I69d8c1cc14a06e5dfdd6a7493364f43a18f8dec5
  Reviewed-on: https://skia-review.googlesource.com/7353
  Reviewed-by: Herb Derby <herb@google.com>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
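As a small illustration of the seed values listed in the commit above, the sketch below builds the eight starting values for one pixel. The Registers struct and the assignment of the eight values to r, g, b, a, dr, dg, db, da in that order are assumptions for the example; the message itself only lists the values.

```cpp
// Hypothetical register file for one pixel: source r,g,b,a then dst r,g,b,a.
struct Registers {
    float r, g, b, a;
    float dr, dg, db, da;
};

// Seed as described: pixel-center coordinates, then 1, 0, then zeros.
// Assumed to be called by the blitter, and only when a shader is present.
inline Registers seed_for_shader(int x, int y) {
    return { x + 0.5f, y + 0.5f, 1.0f, 0.0f,
             0.0f,     0.0f,     0.0f, 0.0f };
}
```

With this split, the pipeline itself only needs to march x through [x, x+n); y and the meaning of x stay with the caller.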
* SkRasterPipeline: implement support for SkTableColorFilter | Mike Klein | 2017-01-23
  This adds and uses a byte_tables stage that converts to bytes, looks up
  in the tables, then converts back to floats.

  We treat this color filter as pure math, not considering anything
  colorspace related: no transfer functions, no gamut to change, etc.

  Change-Id: If5fefc1bcef61a0fb0ae279002a0dd1547e429ea
  Reviewed-on: https://skia-review.googlesource.com/7413
  Reviewed-by: Brian Osman <brianosman@google.com>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
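The float-to-byte-to-table-to-float round trip that the byte_tables stage performs can be sketched for a single channel as below. The helper name is hypothetical, and the choice of clamping plus round-to-nearest for the float-to-byte step is an assumption; the commit message does not say how the conversion rounds.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Pass one float channel in [0,1] through a 256-entry byte table.
// NaN inputs are not handled in this sketch.
inline float through_byte_table(float v, const uint8_t table[256]) {
    const float clamped = std::min(std::max(v, 0.0f), 1.0f);
    const int   index   = static_cast<int>(std::lround(clamped * 255.0f));
    return table[index] / 255.0f;
}
```

A full stage would apply this with a separate table for each of r, g, b and a, across every pixel in the batch.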
* Remove SkColorCubeFilter. It is unused. | Mike Klein | 2017-01-21
  Change-Id: Iec5fc759e331de24caea1347f9510917260d379b
  Reviewed-on: https://skia-review.googlesource.com/7363
  Reviewed-by: Mike Reed <reed@google.com>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* Replace raster pipeline nextafter() calls with SkNu nudging | Florin Malita | 2017-01-20
  (courtesy of mtklein@)

  nanobench -m gradient_linear_clamp\$ --config f16 --ms 2000 -q

  Before: 753.66
  After:  658.69

  R=mtklein@google.com,herb@google.com

  CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD

  Change-Id: Ie1442da340f6cfc9aef65bec1f114c0e5db89fcb
  Reviewed-on: https://skia-review.googlesource.com/7351
  Reviewed-by: Mike Klein <mtklein@chromium.org>
  Commit-Queue: Florin Malita <fmalita@chromium.org>
* Just clamp to limit-epsilon. | Mike Klein | 2017-01-20
  I think the -0.5f was an implementation detail of Herb's bilerp that we
  don't need here. It happened to also be clamping us to something less
  than limit (limit-0.5), so we do need to replace that with a little
  nudge to keep us on tile.

  Change-Id: I4ebd32e0ad38c724a17dc8bc35d9ea228eeeca32
  Reviewed-on: https://skia-review.googlesource.com/7338
  Commit-Queue: Mike Klein <mtklein@chromium.org>
  Reviewed-by: Herb Derby <herb@google.com>
  Reviewed-by: Florin Malita <fmalita@chromium.org>
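The "little nudge" in the commit above, and the bit-level alternative to nextafter() mentioned in the commit before it, can be illustrated with the sketch below. It assumes a positive, finite limit and IEEE-754 single precision; the function names are hypothetical and this is not the SkNu code itself.

```cpp
#include <cstdint>
#include <cstring>

// Largest float strictly below a positive, finite v, found by subtracting 1
// from its bit pattern (valid for positive IEEE-754 floats).
inline float nudge_down(float v) {
    uint32_t bits;
    std::memcpy(&bits, &v, sizeof(bits));
    bits -= 1;
    std::memcpy(&v, &bits, sizeof(v));
    return v;
}

// Clamp a sample coordinate into [0, limit), so that truncating it to an
// integer index never lands on 'limit' itself.
inline float clamp_to_tile(float x, float limit) {
    const float max = nudge_down(limit);
    if (x < 0.0f) { return 0.0f; }
    if (x > max)  { return max; }
    return x;
}
```

For example, with limit = 8 a coordinate of exactly 8.0 is pulled back to just under 8, which truncates to index 7.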
* SkRasterPipeline impl for 2-stop linear gradients | Florin Malita | 2017-01-20
  CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD

  Change-Id: Ia2b630cf9c0826fbfc3342707c005030d0529bbc
  Reviewed-on: https://skia-review.googlesource.com/7186
  Commit-Queue: Florin Malita <fmalita@chromium.org>
  Reviewed-by: Herb Derby <herb@google.com>
  Reviewed-by: Mike Klein <mtklein@chromium.org>
* Add F16 support to SkPNGImageEncoder | Matt Sarett | 2017-01-19
  BUG=skia:

  CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD

  Change-Id: Ifd221365a7b9f9a4a4fc5382621e0da7189e1148
  Reviewed-on: https://skia-review.googlesource.com/6526
  Reviewed-by: Mike Klein <mtklein@chromium.org>
  Reviewed-by: Leon Scroggins <scroggo@google.com>
  Commit-Queue: Matt Sarett <msarett@google.com>
* Reland "Respect full precision for RGB16 PNGs" (part 2) | Matt Sarett | 2017-01-19
  This lands all the new xform hooks but no change to src/codec.
  So the new decode features are turned off.

  I'm relanding this in pieces to try to bisect a strange
  MSAN error.

  Original CL:
  https://skia-review.googlesource.com/c/7085/

  BUG=skia:

  CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Debug-MSAN,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD,Build-Ubuntu-Clang-x86_64-Release-Fast

  Change-Id: I451a2a29c73ca475e9e7a5ded58d4948d6b8be19
  Reviewed-on: https://skia-review.googlesource.com/7277
  Reviewed-by: Matt Sarett <msarett@google.com>
  Commit-Queue: Matt Sarett <msarett@google.com>
* Revert "Respect full precision for RGB16 PNGs" | Matt Sarett | 2017-01-18
  This reverts commit 7a090c403da1dad6a2e19f2011158bd894a62d91.

  Reason for revert: <INSERT REASONING HERE>

  Original change's description:
  > Respect full precision for RGB16 PNGs
  >
  > BUG=skia:
  >
  > CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
  >
  > Change-Id: If58d201daae97bce2f8efbc453c2ec452e682493
  > Reviewed-on: https://skia-review.googlesource.com/7085
  > Commit-Queue: Matt Sarett <msarett@google.com>
  > Reviewed-by: Mike Klein <mtklein@chromium.org>
  > Reviewed-by: Leon Scroggins <scroggo@google.com>
  > Reviewed-by: Mike Reed <reed@google.com>
  >

  TBR=mtklein@chromium.org,mtklein@google.com,msarett@google.com,scroggo@google.com,reed@google.com
  NOPRESUBMIT=true
  NOTREECHECKS=true
  NOTRY=true
  BUG=skia:

  CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD

  Change-Id: Ibd9879bc4f65ca0c2457dd0bfb5eb008d9a8f672
  Reviewed-on: https://skia-review.googlesource.com/7183
  Commit-Queue: Matt Sarett <msarett@google.com>
  Reviewed-by: Matt Sarett <msarett@google.com>
* Respect full precision for RGB16 PNGs | Matt Sarett | 2017-01-18
  BUG=skia:

  CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD

  Change-Id: If58d201daae97bce2f8efbc453c2ec452e682493
  Reviewed-on: https://skia-review.googlesource.com/7085
  Commit-Queue: Matt Sarett <msarett@google.com>
  Reviewed-by: Mike Klein <mtklein@chromium.org>
  Reviewed-by: Leon Scroggins <scroggo@google.com>
  Reviewed-by: Mike Reed <reed@google.com>
* Revert "Optimize SkBlend by using NEON intrinsics" | Mike Klein | 2017-01-13
  This reverts commit 7adde145d3913cfd67b90bf83a9ea54386a285a7.

  Reason for revert: may be breaking our Android One test bots.

  Original change's description:
  > Optimize SkBlend by using NEON intrinsics
  >
  > Use NEON intrinsics to check the alpha channel of the pixels.
  >
  > In some cases, it's about 14 times faster than the original implementation.
  >
  > $ ./bin/droid out/arm64_release/nanobench --samples 300 --nompd --match LinearSrcOver -v > neon_opt.log
  > $ ./bin/compare neon_opt.log clean.log
  > LinearSrcOver_yellow_rose.pngVSkOptsDefault       1.8ms -> 24.9ms   13.8x
  > LinearSrcOver_iconstrip.pngVSkOptsDefault         5.71ms -> 69.8ms  12.2x
  > LinearSrcOver_plane.pngVSkOptsDefault             1.45ms -> 11ms    7.62x
  > LinearSrcOver_baby_tux.pngVSkOptsDefault          1.88ms -> 9.96ms  5.29x
  > LinearSrcOver_mandrill_512.pngVSkOptsDefault      1.41ms -> 4.62ms  3.29x
  > LinearSrcOver_yellow_rose.pngVSkOptsTrivial       24.9ms -> 24.9ms  1x
  > LinearSrcOver_yellow_rose.pngVSkOptsNonSimdCore   2.17ms -> 2.18ms  1x
  > LinearSrcOver_plane.pngVSkOptsTrivial             11.1ms -> 11.1ms  1x
  > LinearSrcOver_plane.pngVSkOptsNonSimdCore         1.5ms -> 1.5ms    1x
  > LinearSrcOver_mandrill_512.pngVSkOptsNonSimdCore  2.39ms -> 2.39ms  1x
  > LinearSrcOver_iconstrip.pngVSkOptsNonSimdCore     6.43ms -> 6.43ms  1x
  > LinearSrcOver_baby_tux.pngVSkOptsBruteForce       22.3ms -> 22.3ms  1x
  > LinearSrcOver_yellow_rose.pngVSkOptsBruteForce    45.5ms -> 45.5ms  1x
  > LinearSrcOver_baby_tux.pngVSkOptsNonSimdCore      2.02ms -> 2.02ms  1x
  > LinearSrcOver_iconstrip.pngVSkOptsTrivial         69.7ms -> 69.7ms  1x
  > LinearSrcOver_baby_tux.pngVSkOptsTrivial          9.96ms -> 9.95ms  1x
  > LinearSrcOver_mandrill_512.pngVSkOptsBruteForce   99.3ms -> 99.2ms  1x
  >
  > BUG=skia:
  >
  > CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
  >
  > Change-Id: Ia576365578d65b771440da65fdf41f090ccf0541
  > Reviewed-on: https://skia-review.googlesource.com/6860
  > Reviewed-by: Mike Klein <mtklein@chromium.org>
  > Commit-Queue: Mike Klein <mtklein@chromium.org>
  >

  TBR=mtklein@chromium.org,bsalomon@google.com,joel.liang@arm.com,reviews@skia.org
  BUG=skia:
  NOPRESUBMIT=true
  NOTREECHECKS=true
  NOTRY=true

  Change-Id: Ie40eb5a7c27807aaf396429a82a1a2dd328c2b5b
  Reviewed-on: https://skia-review.googlesource.com/7036
  Commit-Queue: Mike Klein <mtklein@chromium.org>
  Reviewed-by: Mike Klein <mtklein@chromium.org>
* Fix out of bounds read in RP::load_tables_u16_be() | Matt Sarett | 2017-01-13
  BUG=skia:

  CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD

  Change-Id: I4f6dd002b03812d63bf62342c346ea21f6865466
  Reviewed-on: https://skia-review.googlesource.com/7027
  Reviewed-by: Mike Klein <mtklein@chromium.org>
  Commit-Queue: Matt Sarett <msarett@google.com>
* Optimize SkBlend by using NEON intrinsics | Joel Liang | 2017-01-13
  Use NEON intrinsics to check the alpha channel of the pixels.

  In some cases, it's about 14 times faster than the original implementation.

  $ ./bin/droid out/arm64_release/nanobench --samples 300 --nompd --match LinearSrcOver -v > neon_opt.log
  $ ./bin/compare neon_opt.log clean.log
  LinearSrcOver_yellow_rose.pngVSkOptsDefault       1.8ms -> 24.9ms   13.8x
  LinearSrcOver_iconstrip.pngVSkOptsDefault         5.71ms -> 69.8ms  12.2x
  LinearSrcOver_plane.pngVSkOptsDefault             1.45ms -> 11ms    7.62x
  LinearSrcOver_baby_tux.pngVSkOptsDefault          1.88ms -> 9.96ms  5.29x
  LinearSrcOver_mandrill_512.pngVSkOptsDefault      1.41ms -> 4.62ms  3.29x
  LinearSrcOver_yellow_rose.pngVSkOptsTrivial       24.9ms -> 24.9ms  1x
  LinearSrcOver_yellow_rose.pngVSkOptsNonSimdCore   2.17ms -> 2.18ms  1x
  LinearSrcOver_plane.pngVSkOptsTrivial             11.1ms -> 11.1ms  1x
  LinearSrcOver_plane.pngVSkOptsNonSimdCore         1.5ms -> 1.5ms    1x
  LinearSrcOver_mandrill_512.pngVSkOptsNonSimdCore  2.39ms -> 2.39ms  1x
  LinearSrcOver_iconstrip.pngVSkOptsNonSimdCore     6.43ms -> 6.43ms  1x
  LinearSrcOver_baby_tux.pngVSkOptsBruteForce       22.3ms -> 22.3ms  1x
  LinearSrcOver_yellow_rose.pngVSkOptsBruteForce    45.5ms -> 45.5ms  1x
  LinearSrcOver_baby_tux.pngVSkOptsNonSimdCore      2.02ms -> 2.02ms  1x
  LinearSrcOver_iconstrip.pngVSkOptsTrivial         69.7ms -> 69.7ms  1x
  LinearSrcOver_baby_tux.pngVSkOptsTrivial          9.96ms -> 9.95ms  1x
  LinearSrcOver_mandrill_512.pngVSkOptsBruteForce   99.3ms -> 99.2ms  1x

  BUG=skia:

  CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD

  Change-Id: Ia576365578d65b771440da65fdf41f090ccf0541
  Reviewed-on: https://skia-review.googlesource.com/6860
  Reviewed-by: Mike Klein <mtklein@chromium.org>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* Attempt 3: SkRasterPipelineBlitter: support A8 | Mike Klein | 2017-01-13
  Now that SkOpts_hsw.cpp no longer hooks in SkRasterPipeline_opts,
  it should be safe to try this again.

  This reverts commit 86d55b312a2649d80890ccf75f24571ada0265f1.

  Change-Id: I2d495600ca9d3a0f49c2e02fbaaae349cefac3a1
  Reviewed-on: https://skia-review.googlesource.com/6985
  Reviewed-by: Mike Klein <mtklein@chromium.org>