path: root/src/opts
Commit message (Author, Date)
...
* Use RasterPipeline to support full precision on 16-bit RGBA pngs (Matt Sarett, 2017-01-13)
  Reland of Original Change: https://skia-review.googlesource.com/6260
  CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
  Change-Id: I809984dd9af225103bfbe83492a17c19da7c5e40
  Reviewed-on: https://skia-review.googlesource.com/6980
  Reviewed-by: Matt Sarett <msarett@google.com>
  Commit-Queue: Matt Sarett <msarett@google.com>
* Revert "Use RasterPipeline to support full precision on 16-bit RGBA pngs" (Matt Sarett, 2017-01-12)
  This reverts commit bb2339da39ab3ee59121acd911920dafcd4a2f72.
  Reason for revert: Breaks MSAN
  Original change's description:
  > Use RasterPipeline to support full precision on 16-bit RGBA pngs
  >
  > TODO: Support more precision on 16-bit RGB pngs
  >
  > BUG=skia:
  >
  > CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
  >
  > Change-Id: I89dfef3b4887b9c4895c17309933883ab90ffa4d
  > Reviewed-on: https://skia-review.googlesource.com/6260
  > Reviewed-by: Mike Reed <reed@google.com>
  > Reviewed-by: Leon Scroggins <scroggo@google.com>
  > Reviewed-by: Mike Klein <mtklein@chromium.org>
  > Commit-Queue: Matt Sarett <msarett@google.com>
  >
  TBR=mtklein@chromium.org,mtklein@google.com,msarett@google.com,scroggo@google.com,reed@google.com,reviews@skia.org
  BUG=skia:
  NOPRESUBMIT=true
  NOTREECHECKS=true
  NOTRY=true
  Change-Id: I47579c20af033a75883e2b35567cb9c690ce54b0
  Reviewed-on: https://skia-review.googlesource.com/6975
  Commit-Queue: Matt Sarett <msarett@google.com>
  Reviewed-by: Matt Sarett <msarett@google.com>
* Use RasterPipeline to support full precision on 16-bit RGBA pngs (Matt Sarett, 2017-01-12)
  TODO: Support more precision on 16-bit RGB pngs
  BUG=skia:
  CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
  Change-Id: I89dfef3b4887b9c4895c17309933883ab90ffa4d
  Reviewed-on: https://skia-review.googlesource.com/6260
  Reviewed-by: Mike Reed <reed@google.com>
  Reviewed-by: Leon Scroggins <scroggo@google.com>
  Reviewed-by: Mike Klein <mtklein@chromium.org>
  Commit-Queue: Matt Sarett <msarett@google.com>
* disable runtime detected AVX2 raster pipelines (Mike Klein, 2017-01-12)
  It's proving too difficult to keep on top of all the ways we might cause ODR violations that crash Chrome. I'd rather focus on other ways of running the pipelines that won't have that particular problem.
  Our -Fast bots will keep testing and benchmarking AVX2 raster pipelines.
  BUG=chromium:679147,chromium:654213,chromium:664864,chromium:666707,etc.
  CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
  Change-Id: I35ba8f5f4303107237fd78a6ce442d7c26e5fbef
  Reviewed-on: https://skia-review.googlesource.com/6827
  Commit-Queue: Mike Klein <mtklein@chromium.org>
  Reviewed-by: Mike Klein <mtklein@chromium.org>
* remove xbyak experiment (Mike Klein, 2017-01-10)
  SkSplicer is better.
  CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
  Change-Id: I014ec0e9fb00a8a4694d442e672c65402621dc67
  Reviewed-on: https://skia-review.googlesource.com/6830
  Reviewed-by: Herb Derby <herb@google.com>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* SkXbyak: loop inside, only body (Mike Klein, 2017-01-07)
  SkXbyak_…  927  …JITCompiled  1x
                  …Interpreted  1.33x
                  …HandWritten  1.97x
  CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
  Change-Id: I486bbc341a38354345bfcf3d6150d1628f83f186
  Reviewed-on: https://skia-review.googlesource.com/6726
  Reviewed-by: Herb Derby <herb@google.com>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* Revert "Retry "SkRasterPipelineBlitter: support A8"..." (Mike Klein, 2017-01-07)
  This reverts commit f55ea6a1deb21120944d406124a2984b5009260a.
  Reason for revert: crbug.com/679147
  Original change's description:
  > Retry "SkRasterPipelineBlitter: support A8"...
  >
  > ...preferring SkA8_Coverage_Blitter over SkRasterPipelineBlitter.
  >
  > I think we could make this work with SkRasterPipelineBlitter (tell it, draw white in Src mode with this mask), but the existing blitter is pretty hard to beat in efficiency and correctness.
  >
  > CQ_INCLUDE_TRYBOTS=skia.primary:Perf-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Debug-MSAN,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
  >
  > Change-Id: I72df9995c63f3334d8111c59711818cb5ed1e63c
  > Reviewed-on: https://skia-review.googlesource.com/6627
  > Reviewed-by: Mike Klein <mtklein@chromium.org>
  > Commit-Queue: Mike Klein <mtklein@chromium.org>
  >
  TBR=mtklein@chromium.org,brianosman@google.com,reed@google.com,reviews@skia.org
  NOPRESUBMIT=true
  NOTREECHECKS=true
  NOTRY=true
  Change-Id: I6a36b4c087a52e54f4d591ded40e6a202fb77068
  Reviewed-on: https://skia-review.googlesource.com/6760
  Commit-Queue: Mike Klein <mtklein@chromium.org>
  Reviewed-by: Mike Klein <mtklein@chromium.org>
* Fix Sk8f::Store4 (for HSW) (Matt Sarett, 2017-01-06)
  This should fix the colorspacexform gm in Gold.
  https://gold.skia.org/search?head=true&include=false&limit=50&neg=false&pos=false&query=name%3Dcolorspacexform%26source_type%3Dgm&unt=true
  BUG=skia:
  Change-Id: I05e2c2c0e7d7095f6935e60ff1bf89858380335f
  Reviewed-on: https://skia-review.googlesource.com/6721
  Commit-Queue: Matt Sarett <msarett@google.com>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
  Reviewed-by: Mike Klein <mtklein@chromium.org>
* Add a real SkXbyak bench, implement enough to run it. (Mike Klein, 2017-01-06)
  CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
  SkXbyak_…  9320  …JITCompiled  1x
                   …Interpreted  1.24x
                   …HandWritten  2.5x
  Change-Id: I37d2d255ff32dcce73d29081d506e2d67477af97
  Reviewed-on: https://skia-review.googlesource.com/6697
  Commit-Queue: Mike Klein <mtklein@chromium.org>
  Reviewed-by: Herb Derby <herb@google.com>
* Avoid SkFixed overflow in decal bitmap procs (Florin Malita, 2017-01-06)
  The check for decal mode can overflow in SkFixed. Promote to 64-bit (48.16) instead.
  Also update can_truncate_to_fixed_for_decal() to take SkFixed params and use it in ClampX_ClampY_filter_scale_SSE2().
  BUG=chromium:675444
  R=reed@google.com
  CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
  Change-Id: I759464cdaa5c005159e38e32167fb1937e2a1d28
  Reviewed-on: https://skia-review.googlesource.com/6538
  Reviewed-by: Cary Clark <caryclark@google.com>
  Commit-Queue: Florin Malita <fmalita@chromium.org>
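The overflow is easiest to see with numbers. Below is a minimal sketch of a decal range check done in 48.16 arithmetic; `decal_in_range` and its semantics (every sample position must truncate into `[0, max)`) are assumptions for illustration, not Skia's `can_truncate_to_fixed_for_decal()`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for SkFixed: a signed 16.16 fixed-point value.
using Fixed = int32_t;
constexpr int kFixedShift = 16;

// Returns true if every sample position fx, fx+dx, ..., fx+(count-1)*dx
// truncates to an integer pixel index inside [0, max). Positions are linear
// in the step, so checking the first and last is enough. The last position
// is computed in 64-bit (48.16) arithmetic, so (count - 1) * dx cannot
// overflow the way a 32-bit SkFixed product can.
bool decal_in_range(Fixed fx, Fixed dx, int count, int max) {
    assert(count > 0 && max > 0);
    const int64_t first = fx;
    const int64_t last  = first + int64_t(count - 1) * dx;
    auto inside = [max](int64_t f) {
        return f >= 0 && (f >> kFixedShift) < max;
    };
    return inside(first) && inside(last);
}
```

With `dx = 0x10000` (1.0) and `count = 40000`, the product `39999 * 65536` is about 2.6e9, past `INT32_MAX`, so a purely 32-bit version could wrap and wrongly report the span as in range; the 64-bit version correctly rejects it.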
* SkXbyak basics (Mike Klein, 2017-01-06)
  A little JIT proof of concept for SkRasterPipeline, using xbyak, which is a header-only assembler.
  It's x86-only, but supports x86 very thoroughly, and it's very user-friendly (at least as far as assembler libraries go...).
  CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
  Change-Id: Ie17e562b0f3fff5914041badfb2c1fe4f86efab8
  Reviewed-on: https://skia-review.googlesource.com/5730
  Reviewed-by: Herb Derby <herb@google.com>
  Reviewed-by: Heather Miller <hcm@google.com>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* Retry "SkRasterPipelineBlitter: support A8"... (Mike Klein, 2017-01-06)
  ...preferring SkA8_Coverage_Blitter over SkRasterPipelineBlitter.
  I think we could make this work with SkRasterPipelineBlitter (tell it, draw white in Src mode with this mask), but the existing blitter is pretty hard to beat in efficiency and correctness.
  CQ_INCLUDE_TRYBOTS=skia.primary:Perf-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Debug-MSAN,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
  Change-Id: I72df9995c63f3334d8111c59711818cb5ed1e63c
  Reviewed-on: https://skia-review.googlesource.com/6627
  Reviewed-by: Mike Klein <mtklein@chromium.org>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* simplify by removing _d stages (Mike Klein, 2017-01-06)
  CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
  Change-Id: I75e232faee6ad48f65bac5b119a461280b27bbc8
  Reviewed-on: https://skia-review.googlesource.com/6661
  Reviewed-by: Mike Klein <mtklein@chromium.org>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* Use stack instead of malloc() for most calls to SkRasterPipeline::run(). (Mike Klein, 2017-01-05)
  Also split bench into run/compile variants to measure the effect:
  Before:
    …f16_compile   1x
    …f16_run       1.02x
    …srgb_compile  1.56x
    …srgb_run      1.61x
  After:
    …f16_run       1x
    …f16_compile   1.01x
    …srgb_compile  1.58x
    …srgb_run      1.59x
  CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
  Change-Id: I8e65fb2acdbb05ccc0b3894f16d7646603c3e74d
  Reviewed-on: https://skia-review.googlesource.com/6621
  Reviewed-by: Herb Derby <herb@google.com>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* Revert "SkRasterPipelineBlitter: support A8" (Mike Klein, 2017-01-05)
  This reverts commit f44373c119290b501d4aec7385e16d12c28a1f0f.
  Reason for revert: MSAN
  Original change's description:
  > SkRasterPipelineBlitter: support A8
  >
  > This adds support for loading and storing A8, then uses it in SkRasterPipelineBlitter.
  >
  > I think this handles all dst formats now: A8, 565, 8888 (by policy, sRGB only) and F16.
  >
  > CQ_INCLUDE_TRYBOTS=master.tryserver.blink:linux_trusty_blink_rel;skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
  >
  > Change-Id: Id207f6e6c56b6bfcc301d77dd23e0959bb7afba8
  > Reviewed-on: https://skia-review.googlesource.com/6554
  > Reviewed-by: Mike Reed <reed@google.com>
  > Commit-Queue: Mike Klein <mtklein@chromium.org>
  >
  TBR=mtklein@chromium.org,reed@google.com,reviews@skia.org
  NOPRESUBMIT=true
  NOTREECHECKS=true
  NOTRY=true
  Change-Id: I9ead3c3335e1776e9a1639ca0481253821505d67
  Reviewed-on: https://skia-review.googlesource.com/6625
  Commit-Queue: Mike Klein <mtklein@chromium.org>
  Reviewed-by: Mike Klein <mtklein@chromium.org>
* SkRasterPipelineBlitter: support A8 (Mike Klein, 2017-01-05)
  This adds support for loading and storing A8, then uses it in SkRasterPipelineBlitter.
  I think this handles all dst formats now: A8, 565, 8888 (by policy, sRGB only) and F16.
  CQ_INCLUDE_TRYBOTS=master.tryserver.blink:linux_trusty_blink_rel;skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
  Change-Id: Id207f6e6c56b6bfcc301d77dd23e0959bb7afba8
  Reviewed-on: https://skia-review.googlesource.com/6554
  Reviewed-by: Mike Reed <reed@google.com>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* Retry trim another instruction off SkRasterPipeline overhead (Mike Klein, 2017-01-04)
  This time, with manual program memory management instead of std::vector<void*>.
  Using STL types from SkOpts_hsw.cpp is not safe. Things like std::vector<void*> are inlined but not anonymous, so they're deduped by the linker arbitrarily. This is bad when we pick the version compiled with AVX instructions on a machine that doesn't support AVX...
  std::vector<Stage> was safe before because Stage itself was anonymous. While not anonymous, std::vector<Stage> is unique to the compilation unit, because you can only refer to the anonymous Stage in the compilation unit.
  CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.linux:linux_chromium_asan_rel_ng;skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
  Change-Id: I015e27583b6b6ff06b5e9f63e3f40ee6b27d6dbd
  Reviewed-on: https://skia-review.googlesource.com/6550
  Reviewed-by: Mike Klein <mtklein@chromium.org>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* auto-generate _d stages (Mike Klein, 2017-01-04)
  Stage foo_d should always be the same logic as stage foo, swapping r and dr, g and dg, b and db, a and da. This means we can infer their definitions.
  CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
  Change-Id: Ia0a3abb29a201c647d9ec1860211abfbc19b56ae
  Reviewed-on: https://skia-review.googlesource.com/6555
  Reviewed-by: Mike Klein <mtklein@chromium.org>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
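The inference rule above (foo_d is foo with every source channel swapped for its destination counterpart) can be sketched as a template. This is an assumption about one way to express it on scalar registers; the log does not show how Skia actually generated the _d stages, and `Regs`, `as_d`, and `clamp_a` are hypothetical names.

```cpp
#include <cassert>
#include <utility>

// Hypothetical scalar register file for one pixel: source r,g,b,a and
// destination dr,dg,db,da (the real pipeline holds SIMD vectors of these).
struct Regs {
    float r, g, b, a;
    float dr, dg, db, da;
};

using StageFn = void (*)(Regs&);

// Exchange every source channel with its destination counterpart.
inline void swap_src_dst(Regs& p) {
    std::swap(p.r, p.dr);
    std::swap(p.g, p.dg);
    std::swap(p.b, p.db);
    std::swap(p.a, p.da);
}

// Derive foo_d from foo: run foo with src and dst exchanged, then restore.
template <StageFn Stage>
void as_d(Regs& p) {
    swap_src_dst(p);
    Stage(p);
    swap_src_dst(p);
}

// Example source stage: clamp the color channels to alpha.
inline void clamp_a(Regs& p) {
    assert(p.a >= 0.0f);
    if (p.r > p.a) p.r = p.a;
    if (p.g > p.a) p.g = p.a;
    if (p.b > p.a) p.b = p.a;
}

// The destination version, generated rather than written by hand.
constexpr StageFn clamp_a_d = as_d<clamp_a>;
```

Calling `clamp_a_d(p)` clamps `dr`, `dg`, `db` to `da` and leaves the source channels untouched. A real implementation would likely avoid the runtime swaps by generating the _d body directly, but the swap form makes the equivalence explicit.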
* Revert "trim another instruction off SkRasterPipeline overhead" (Mike Klein, 2017-01-04)
  This reverts commit e61c40707e70a2be9e32227a929173864f7895e1.
  Reason for revert: this and the ODR caused operations on ContiguousContainerBase::elements_, another std::vector<void*> in Chrome, to start using AVX2 instructions.
  Boy this is annoying...
  Change-Id: I2c4837ad70fdef8096db904022b0703b88c6fd6c
  Reviewed-on: https://skia-review.googlesource.com/6549
  Commit-Queue: Mike Klein <mtklein@chromium.org>
  Reviewed-by: Mike Klein <mtklein@chromium.org>
* trim another instruction off SkRasterPipeline overhead (Mike Klein, 2017-01-03)
  The overhead of a stage today is 3 x86 instructions, typically looking something like this:
    - movq (%rdi), %rax     // Load the next stage function pointer.
    - addq $0x10, %rdi      // Step our progress ahead 16 bytes to that next stage.
    - jmpq *%rax            // Transfer control to that stage.
  But if we make sure the pointer's in esi/rsi, we can use lodsd/lodsq to do those first two steps in one instruction:
    - lodsq (%rsi), %rax    (≈ movq (%rsi), %rax; addq $0x8, %rsi)
    - jmpq *%rax
  This CL rearranges things so that we can take advantage of this and generally trim off an instruction of overhead.
  Instead of a vector of {Fn, ctx} pairs, we'll flatten it down into a single interlaced program vector of void*, basically just omitting any null context pointers. We pass the pointer to program as the second argument to Fn, putting it in rsi. These two changes together make getting the next Fn to call or the current context the same cheap lodsq instruction, encapsulated as load_and_increment().
  Here's how the simple "modulate" blend stage changes:
      vmulps %ymm4, %ymm0, %ymm0
      vmulps %ymm5, %ymm1, %ymm1
      vmulps %ymm6, %ymm2, %ymm2
      vmulps %ymm7, %ymm3, %ymm3
      movq (%rdi), %rax
      addq $0x10, %rdi
      jmpq *%rax
    ~~~~~~~~>
      vmulps %ymm4, %ymm0, %ymm0
      vmulps %ymm5, %ymm1, %ymm1
      vmulps %ymm6, %ymm2, %ymm2
      vmulps %ymm7, %ymm3, %ymm3
      lodsq (%rsi), %rax
      jmpq *%rax
  This does make getting the current context a one-time, destructive operation. It's switched from referring to ctx as a void* directly to using ctx() as a thunk that returns a void*. No stage so far has ever referred to ctx twice, and it all appears to inline, so this seems harmless.
  "matrix_2x3" is a good example of what stages that use context pointers end up looking like:
      lodsq (%rsi), %rax
      vbroadcastss (%rax), %ymm9
      vbroadcastss 0x8(%rax), %ymm10
      vbroadcastss 0x10(%rax), %ymm8
      vfmadd231ps %ymm10, %ymm1, %ymm8
      vfmadd231ps %ymm9, %ymm0, %ymm8
      vbroadcastss 0x4(%rax), %ymm10
      vbroadcastss 0xc(%rax), %ymm11
      vbroadcastss 0x14(%rax), %ymm9
      vfmadd231ps %ymm11, %ymm1, %ymm9
      vfmadd231ps %ymm10, %ymm0, %ymm9
      lodsq (%rsi), %rax
      vmovaps %ymm8, %ymm0
      vmovaps %ymm9, %ymm1
      jmpq *%rax
  We can't do this with MSVC: there's no intrinsic for it that I can find, they disallow inline assembly, and rsi is not used to pass arguments to functions there anyway.
  ARM doesn't need it; it does this in two instructions naturally.
  We could do this for 32-bit x86, but I'd rather focus on x86-64.
  It's unclear to me that this makes things any faster, but it doesn't appear to make things any slower, and I think it makes both the code and disassembly simpler.
  CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
  Change-Id: Ia7b543a6718c75a33095371924003c5402b3445a
  Reviewed-on: https://skia-review.googlesource.com/6271
  Reviewed-by: Herb Derby <herb@google.com>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* remove MIPS DSP optimizations (Mike Klein, 2016-12-21)
  There are only a couple of these, primarily focused on index8 srcs and 565 dsts. The burden's starting to outweigh the benefit. No one on the team knows MIPS assembly.
  If we're going to try this again, I'd rather we try some sort of SkNx / compiler-intrinsic based approach, probably targeting MIPS SIMD (MSA), not this older instruction set.
  We already ignore these files for 64-bit MIPS. This just closes the loop on 32-bit MIPS.
  CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD,Build-Ubuntu-Clang-mips64el-Debug-Android,Build-Ubuntu-Clang-mips64el-Release-Android,Build-Ubuntu-Clang-mipsel-Debug-Android,Build-Ubuntu-Clang-mipsel-Release-Android
  BUG=skia:6065
  Change-Id: Iecac15b56f59625b2e743ea36e7791b90bb0b422
  Reviewed-on: https://skia-review.googlesource.com/6353
  Reviewed-by: Leon Scroggins <scroggo@google.com>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* Fix swapped interpretation of c and e in SkColorSpace_ICC (Matt Sarett, 2016-12-19)
  The ICC errata supports the opposite of what we do.
  http://www.color.org/icc_specs2.xalter
  TBR=reed@google.com
  BUG=skia:
  CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
  Change-Id: I18ace7f312926b264e624c30d8cb983eff5c434b
  Reviewed-on: https://skia-review.googlesource.com/6277
  Commit-Queue: Matt Sarett <msarett@google.com>
  Reviewed-by: Brian Osman <brianosman@google.com>
* Consolidate TILEX_LOW_BITS/TILEY_LOW_BITS -> EXTRACT_LOW_BITS (Florin Malita, 2016-12-19)
  R=reed@google.com
  CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
  Change-Id: I5450d1ae3239c9d4e70502fc042222410ac77e72
  Reviewed-on: https://skia-review.googlesource.com/6265
  Reviewed-by: Mike Reed <reed@google.com>
  Commit-Queue: Florin Malita <fmalita@chromium.org>
* Added optimized sRGB/2.2 gamma stages into A2B color xform (raftias, 2016-12-15)
  Hooked up the existing to/from srgb and to_2dot2 stages into SkColorSpaceXform_A2B. Added a from_2dot2 stage to the raster pipeline to complete the other direction.
  BUG=skia:
  CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
  Change-Id: I3887af3f59f67329d7e843e7355ff54e22cc4ed0
  Reviewed-on: https://skia-review.googlesource.com/5840
  Commit-Queue: Robert Aftias <raftias@google.com>
  Reviewed-by: Mike Klein <mtklein@chromium.org>
  Reviewed-by: Matt Sarett <msarett@google.com>
* Revert "Revert "SkNx basically always is fast now."" (Mike Klein, 2016-12-14)
  This reverts commit 8ba64d1996ba6c9ecfb12132cdab7d5d99af7456.
  Reason for revert: does not appear to have been blocking the roll.
  Original change's description:
  > Revert "SkNx basically always is fast now."
  >
  > This reverts commit 21f783829619186442041de6008f7f58f4f6250d.
  >
  > Reason for revert: roll?
  >
  > Original change's description:
  > > SkNx basically always is fast now.
  > >
  > > We had this SKNX_IS_FAST hanging around from before Chrome always built with NEON.
  > >
  > > CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
  > >
  > > Change-Id: Ia5cc0323b3ef052192e2903f961aee11eb3f82d8
  > > Reviewed-on: https://skia-review.googlesource.com/5946
  > > Commit-Queue: Mike Klein <mtklein@chromium.org>
  > > Reviewed-by: Mike Reed <reed@google.com>
  > > Reviewed-by: Florin Malita <fmalita@chromium.org>
  > >
  >
  > TBR=mtklein@chromium.org,fmalita@chromium.org,reed@google.com,reviews@skia.org
  > NOPRESUBMIT=true
  > NOTREECHECKS=true
  > NOTRY=true
  >
  > Change-Id: I0e57285c68eae0a64213fe29ea4cca5519777954
  > Reviewed-on: https://skia-review.googlesource.com/6040
  > Commit-Queue: Mike Klein <mtklein@chromium.org>
  > Reviewed-by: Mike Klein <mtklein@chromium.org>
  >
  TBR=mtklein@chromium.org,reviews@skia.org,fmalita@chromium.org,reed@google.com
  NOPRESUBMIT=true
  NOTREECHECKS=true
  NOTRY=true
  Change-Id: I230dd4c2abb2d14ffc302be5376b9eaacbbeafcc
  Reviewed-on: https://skia-review.googlesource.com/6026
  Commit-Queue: Mike Klein <mtklein@chromium.org>
  Reviewed-by: Mike Klein <mtklein@chromium.org>
* Revert "Revert "clamp to premul when reading premul sRGB"" (Mike Klein, 2016-12-14)
  This reverts commit 2e018f548d76b0688f9873c683cffc681fec40ec.
  Reason for revert: doesn't appear to have been the roll problem.
  Original change's description:
  > Revert "clamp to premul when reading premul sRGB"
  >
  > This reverts commit 04e10da8362a0dcabd795a4ad53f617719ca0d20.
  >
  > Reason for revert: roll?
  >
  > Change-Id: Id0a8dcd62763bd6eddde120c513ca97e098a4268
  > Reviewed-on: https://skia-review.googlesource.com/6022
  > Commit-Queue: Mike Klein <mtklein@chromium.org>
  > Reviewed-by: Mike Klein <mtklein@chromium.org>
  >
  TBR=mtklein@chromium.org,reviews@skia.org,brianosman@google.com
  NOPRESUBMIT=true
  NOTREECHECKS=true
  NOTRY=true
  Change-Id: I399ca5e728ce6766c6707682c4c6b685681ffdeb
  Reviewed-on: https://skia-review.googlesource.com/6025
  Commit-Queue: Mike Klein <mtklein@chromium.org>
  Reviewed-by: Mike Klein <mtklein@chromium.org>
* Revert "clamp to premul when reading premul sRGB" (Mike Klein, 2016-12-14)
  This reverts commit 04e10da8362a0dcabd795a4ad53f617719ca0d20.
  Reason for revert: roll?
  Change-Id: Id0a8dcd62763bd6eddde120c513ca97e098a4268
  Reviewed-on: https://skia-review.googlesource.com/6022
  Commit-Queue: Mike Klein <mtklein@chromium.org>
  Reviewed-by: Mike Klein <mtklein@chromium.org>
* Revert "SkNx basically always is fast now." (Mike Klein, 2016-12-14)
  This reverts commit 21f783829619186442041de6008f7f58f4f6250d.
  Reason for revert: roll?
  Original change's description:
  > SkNx basically always is fast now.
  >
  > We had this SKNX_IS_FAST hanging around from before Chrome always built with NEON.
  >
  > CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
  >
  > Change-Id: Ia5cc0323b3ef052192e2903f961aee11eb3f82d8
  > Reviewed-on: https://skia-review.googlesource.com/5946
  > Commit-Queue: Mike Klein <mtklein@chromium.org>
  > Reviewed-by: Mike Reed <reed@google.com>
  > Reviewed-by: Florin Malita <fmalita@chromium.org>
  >
  TBR=mtklein@chromium.org,fmalita@chromium.org,reed@google.com,reviews@skia.org
  NOPRESUBMIT=true
  NOTREECHECKS=true
  NOTRY=true
  Change-Id: I0e57285c68eae0a64213fe29ea4cca5519777954
  Reviewed-on: https://skia-review.googlesource.com/6040
  Commit-Queue: Mike Klein <mtklein@chromium.org>
  Reviewed-by: Mike Klein <mtklein@chromium.org>
* clamp to premul when reading premul sRGB (Mike Klein, 2016-12-13)
  It's pretty easy to start with sound premultiplied linear floats, pack those to sRGB-encoded bytes, then read them back to linear floats and find them not quite premultiplied, with a color channel just a smidge greater than the alpha channel. This can happen basically any time we have different transfer functions for alpha and colors... sRGB being the only one we draw into.
  This is an annoying problem with no known good solution. So apply the clamp hammer. These new calls on SkRasterPipeline should make it impossible to get wrong.
  CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
  Change-Id: I4c974f4a7b151f3f684946f1e83d06b1b288fd01
  Reviewed-on: https://skia-review.googlesource.com/5945
  Reviewed-by: Brian Osman <brianosman@google.com>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
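A concrete instance of the problem, assuming the standard IEC 61966-2-1 sRGB curve on the color channels and linear 8-bit alpha: the premultiplied linear pixel r = a = 0.5 comes back with r ≈ 0.5029 but a = 128/255 ≈ 0.5020. The helpers below are hypothetical scalar stand-ins, not the SkRasterPipeline calls the commit added.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Standard sRGB transfer functions (IEC 61966-2-1).
float srgb_encode(float x) {
    return x <= 0.0031308f ? 12.92f * x
                           : 1.055f * std::pow(x, 1 / 2.4f) - 0.055f;
}
float srgb_decode(float x) {
    return x <= 0.04045f ? x / 12.92f
                         : std::pow((x + 0.055f) / 1.055f, 2.4f);
}

uint8_t to_byte(float x)     { return static_cast<uint8_t>(std::lround(x * 255.0f)); }
float   from_byte(uint8_t b) { return b / 255.0f; }

struct RGBA { float r, g, b, a; };

// Round-trip a premultiplied linear color through 8888 storage in which the
// color channels are sRGB-encoded but alpha is stored linearly.
RGBA round_trip(RGBA c) {
    return { srgb_decode(from_byte(to_byte(srgb_encode(c.r)))),
             srgb_decode(from_byte(to_byte(srgb_encode(c.g)))),
             srgb_decode(from_byte(to_byte(srgb_encode(c.b)))),
             from_byte(to_byte(c.a)) };
}

// The "clamp hammer": force each color channel back to at most alpha.
RGBA clamp_premul(RGBA c) {
    assert(c.a >= 0.0f && c.a <= 1.0f);
    return { std::min(c.r, c.a), std::min(c.g, c.a), std::min(c.b, c.a), c.a };
}
```

For `{0.5f, 0.25f, 0.0f, 0.5f}`, red encodes to byte 188 and decodes to about 0.5029, while alpha stores as byte 128 (0.5020), so `round_trip` yields a color brighter than its alpha; `clamp_premul` restores the invariant. The two channels round on different curves, which is why no exact fix falls out of tweaking either one.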
* SkNx basically always is fast now. (Mike Klein, 2016-12-13)
  We had this SKNX_IS_FAST hanging around from before Chrome always built with NEON.
  CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
  Change-Id: Ia5cc0323b3ef052192e2903f961aee11eb3f82d8
  Reviewed-on: https://skia-review.googlesource.com/5946
  Commit-Queue: Mike Klein <mtklein@chromium.org>
  Reviewed-by: Mike Reed <reed@google.com>
  Reviewed-by: Florin Malita <fmalita@chromium.org>
* funnel f16 through standard load/store/gather (Mike Klein, 2016-12-12)
  This is a precursor to using mask load, mask store, and gather instructions for f16. This is a slight performance win too, through slightly simpler code generation.
  Having done this, it now makes sense to give a name to f16->f32 conversion, from_f16().
  Finally, while we're at this, also send store_f32 through store(), so that now all formats use load, gather, and store uniformly.
  CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
  Change-Id: I403f16f712936e2bcf3294e72c863cb6c6fbcf0c
  Reviewed-on: https://skia-review.googlesource.com/5731
  Reviewed-by: Herb Derby <herb@google.com>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
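For reference, this is what an f16->f32 conversion computes, written as a portable scalar sketch of IEEE 754 binary16 to binary32. Skia's from_f16() works on SIMD vectors and may use hardware conversion (such as F16C's vcvtph2ps) or handle denormals differently; none of that is shown here, and this function is not Skia's code.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(uint32_t), "expects 32-bit float");

// Scalar IEEE 754 binary16 -> binary32 conversion.
float from_f16(uint16_t h) {
    const uint32_t sign = uint32_t(h & 0x8000u) << 16;  // move sign to bit 31
    const uint32_t exp  = (h >> 10) & 0x1Fu;            // 5-bit exponent
    const uint32_t mant = h & 0x3FFu;                   // 10-bit mantissa

    if (exp == 0) {
        // Zero or subnormal: the value is mant * 2^-24.
        const float mag = std::ldexp(float(mant), -24);
        return sign ? -mag : mag;
    }

    uint32_t bits;
    if (exp == 0x1F) {
        bits = sign | 0x7F800000u | (mant << 13);                 // Inf or NaN
    } else {
        bits = sign | ((exp + (127 - 15)) << 23) | (mant << 13);  // rebias exponent
    }
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}
```

Normal values only need their exponent rebiased (127 - 15 = 112) and their mantissa widened by 13 bits; for example, `from_f16(0x3C00)` is 1.0f and `from_f16(0x7BFF)` is 65504.0f, the largest finite half.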
* Convert image width and height (used by tiling) to float once. (Mike Klein, 2016-12-12)
  The storage cost is the same, so might as well do this when building the pipeline instead of when running it.
  This also avoids the awkward cvtsi2ss instruction that screws with register renaming.
  CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
  Change-Id: I1c7d5bad558870256a31e3da969eee5d80fb93a8
  Reviewed-on: https://skia-review.googlesource.com/5782
  Reviewed-by: Herb Derby <herb@google.com>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* bicubic, attempt gazillion (Mike Klein, 2016-12-09)
  - explicitly separate bilinear_ stages in x and y too
  BUG=skia:
  CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
  Change-Id: Ib7b4f9d26ea6abe9171068e92424479d811ee606
  Reviewed-on: https://skia-review.googlesource.com/5636
  Reviewed-by: Brian Osman <brianosman@google.com>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* Added clamps in SkRasterPipeline's gamma stages. (raftias, 2016-12-08)
  Clamping stages were also removed from SkColorSpace_A2B as they are now not needed.
  BUG=skia:
  CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
  Change-Id: I24e2e411e12b463854e980cb10c0e6bafb4a7c42
  Reviewed-on: https://skia-review.googlesource.com/5546
  Reviewed-by: Matt Sarett <msarett@google.com>
  Commit-Queue: Robert Aftias <raftias@google.com>
* Make sure all the convolve functions are defined. (Mike Klein, 2016-12-08)
  CQ_INCLUDE_TRYBOTS=skia.primary:Build-Ubuntu-GCC-x86_64-Release-Fast,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
  Change-Id: I5ae1fdee957f796d8051bbb0eca9e037aef9b2c9
  Reviewed-on: https://skia-review.googlesource.com/5655
  Reviewed-by: Mike Klein <mtklein@chromium.org>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* Add AVX2 version of ConvolveVertically (xiangze.zhang, 2016-12-07)
  ConvolveVertically time is reduced by about 60% on a Haswell CPU.
  Nanobench results:
                                  before   after
    bitmap_scale_filter_64_256    611us    302us
    bitmap_scale_filter_80_90     101us    64.9us
    bitmap_scale_filter_30_90     82.3us   51.4us
    bitmap_scale_filter_10_90     73.6us   42.4us
  BUG=skia:
  GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2526733002
  CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
  Review-Url: https://codereview.chromium.org/2526733002
* Refactor bilerp a little. (Mike Klein, 2016-12-06)
  1) rename to bilerp_xy, for x,y in {n[egative], p[ositive]};
  2) pull out a save_xy stage to save off the original x,y;
  3) also calculate the fractional parts fx,fy of x,y once instead of 4 times.
  1) is a pure refactor; 2) adds a stage but otherwise is nothing different; 3) changes images a little bit (fractional parts can vary a bit around powers of two).
  This extends naturally to naive bicubic using 16 bicubic_xy stages.
  CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
  Change-Id: I666de5c21e978abb4feb6e3225e5b5920ba6c5b9
  Reviewed-on: https://skia-review.googlesource.com/5550
  Commit-Queue: Mike Klein <mtklein@chromium.org>
  Reviewed-by: Herb Derby <herb@google.com>
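The structure described above (save x, y and compute fx, fy once, then accumulate four n/p taps) can be sketched on a scalar, single-channel image. The convention that pixel centers sit at integer + 0.5, so that fx = fract(x + 0.5), and the clamp-to-edge addressing are assumptions of this sketch; the real stages operate on SIMD vectors with one pipeline stage per tap.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical single-channel image with clamp-to-edge addressing.
struct Image {
    int w, h;
    std::vector<float> px;  // row-major, w*h values
    float at(int x, int y) const {
        assert(w > 0 && h > 0);
        x = std::min(std::max(x, 0), w - 1);
        y = std::min(std::max(y, 0), h - 1);
        return px[static_cast<std::size_t>(y) * w + x];
    }
};

// Bilinear sample at (x, y), pixel centers at integer + 0.5.
float bilerp(const Image& img, float x, float y) {
    // "save_xy": keep x, y and compute the fractional weights only once.
    const float fx = (x + 0.5f) - std::floor(x + 0.5f);
    const float fy = (y + 0.5f) - std::floor(y + 0.5f);

    float sum = 0.0f;
    // Four taps: nn, pn, np, pp. Each offsets the saved coordinate by
    // -0.5 (n) or +0.5 (p) and weights its sample by the matching area.
    for (int j = 0; j < 2; ++j) {
        for (int i = 0; i < 2; ++i) {
            const float sx = x + (i ? 0.5f : -0.5f);
            const float sy = y + (j ? 0.5f : -0.5f);
            const float wx = i ? fx : 1.0f - fx;
            const float wy = j ? fy : 1.0f - fy;
            sum += wx * wy * img.at(int(std::floor(sx)), int(std::floor(sy)));
        }
    }
    return sum;
}
```

On a 2x1 image `{0, 1}`, sampling at the shared edge (x = 1.0) blends the two pixels equally to 0.5, while sampling at either pixel center returns that pixel exactly. A naive bicubic would follow the same shape with a 4x4 loop and cubic weights.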
* remove upper limit on number of pipeline stages (Mike Klein, 2016-12-06)
  Bicubic is going to blow right past 48. At this point the fixed preallocation strategy is starting to look naive... at 64 we'd allocate just over 1K for every pipeline (and every compiled pipeline).
  CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
  Change-Id: Ib2944ead1217123aba2b6347fd9d5315217540c9
  Reviewed-on: https://skia-review.googlesource.com/5551
  Reviewed-by: Mike Klein <mtklein@chromium.org>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* Manual byte -> float conversion. (Mike Klein, 2016-12-05)
  This is a follow-up to reviews.skia.org/5540, which did float -> byte.
  We use the same trick here exploiting 32768.0f / 0x47000000.
  The benefit here is smaller than the other CL, but still measurable.
  The exchange here is:
    before: int->float, multiply
    after:  OR, FMA
  The cost of an FMA is the same as a multiply, so we're basically just replacing int->float conversion with a bitwise OR.
  CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
  Change-Id: Ieac2247664afa3ff415aec2b48c21505905bee23
  Reviewed-on: https://skia-review.googlesource.com/5542
  Reviewed-by: Matt Sarett <msarett@google.com>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
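The OR-then-FMA exchange can be checked on a scalar. In this sketch, OR-ing byte b into 0x47000000 (32768.0f) produces exactly 32768 + b/256, and one FMA rescales that to b/255. Writing the addend as `-32768.0f * k` is a choice made here so the FMA's single rounding is the only error; the commit's actual constants and vector code are not shown in the log.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

inline float bits_to_float(uint32_t bits) {
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Byte -> float in [0,1] without an int->float conversion instruction.
// 0x47000000 | b is the float 32768 + b/256 exactly, so
//   (32768 + b/256) * (256/255) - 32768 * (256/255) == b/255.
float byte_to_float(uint8_t b) {
    const float k = 256 / 255.0f;
    const float f = bits_to_float(0x47000000u | b);
    // -32768.0f * k is exact (scaling by a power of two), so the fused
    // multiply-add computes (f - 32768) * k with a single rounding.
    return std::fma(f, k, -32768.0f * k);
}
```

Precomputing the addend from the rounded `k`, rather than rounding `-32768 * 256 / 255` separately, matters: at a magnitude near 32896 one float ulp is 1/256, so an independently rounded constant could be off by up to half a byte step.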
* Tricky float -> byte conversion in store_8888. (Mike Klein, 2016-12-02)
  In IEEE 754 single precision, for each byte BB, the float 0x470000BB equals 32768.0f + BB*(1/256.0f).
  So to turn a [0,1] float into a byte, we can
    - multiply by (255/256.0f) to get into [0,255/256.0f] range,
    - add 32768.0f to get into [32768.0f, 32768.0f + 255/256.0f] range,
    - look at the low byte.
  Those first two of course are an FMA.
  Using this trick here makes store_8888 measurably faster. Instead of an FMA then float->int trunc, we do an FMA then a bitwise AND. Overall the math goes from 4 FMA + 4 trunc + 3 shift to 4 FMA + 3 AND + 3 shift (we can skip the shift for red and the AND for alpha). As you might guess, AND is cheaper than trunc, so this is a net win.
  I should be able to follow up with the same trick in reverse in from_8888().
  CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
  Change-Id: I42c8f4a6ea0b6c22160517cf5f9c048f01c9a330
  Reviewed-on: https://skia-review.googlesource.com/5540
  Reviewed-by: Matt Sarett <msarett@google.com>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
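The three steps above can be sketched for a single scalar (Skia applies them to SkNf vectors inside store_8888, which is not reproduced here). Every float in [32768, 65536) has an ulp of exactly 1/256, so the FMA's round-to-nearest leaves round(v*255) in the low mantissa byte.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

inline uint32_t float_to_bits(float f) {
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

// Float in [0,1] -> byte, rounding to nearest, without a float->int
// conversion. v*(255/256) lies in [0, 255/256]; adding 32768 moves it into
// the binade where one ulp is 1/256, so the FMA's rounding puts
// round(v*255) into the low 8 mantissa bits.
uint8_t float_to_byte(float v) {
    assert(v >= 0.0f && v <= 1.0f);
    const float f = std::fma(v, 255 / 256.0f, 32768.0f);
    return static_cast<uint8_t>(float_to_bits(f) & 0xFF);
}
```

Note that the rounding is the FPU's round-half-to-even, so an exact tie such as v = 0.5 (127.5) becomes 128. Inputs outside [0,1] must be clamped first, or they spill past the low byte.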
* SkNf_round, use it in store_565 and store_tables. (Mike Klein, 2016-12-02)
  This gives us a place to bottleneck this sort of conversion.
  Every time I try to use the rounding float -> int instructions, they're just a little slower than working the 1/2 into the scale with FMA. Weird.
  Change-Id: I7718112b234b4b38ba6af8fef59a47642021839a
  Reviewed-on: https://skia-review.googlesource.com/5483
  Commit-Queue: Mike Klein <mtklein@chromium.org>
  Reviewed-by: Matt Sarett <msarett@google.com>
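"Working the 1/2 into the scale with FMA" can be sketched as: compute v*scale + 0.5 in one FMA, then truncate. The names `round_scaled` and `pack_565` are hypothetical; SkNf_round's actual signature is not given in the log.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Round v*scale to the nearest integer by folding +0.5 into an FMA and
// truncating, instead of using a rounding float->int instruction.
// Valid only for non-negative inputs, since truncation rounds toward zero.
inline int32_t round_scaled(float v, float scale) {
    assert(v >= 0.0f);
    return static_cast<int32_t>(std::fma(v, scale, 0.5f));
}

// Example use: pack a [0,1] RGB color into 16-bit 565.
inline uint16_t pack_565(float r, float g, float b) {
    return static_cast<uint16_t>(round_scaled(r, 31) << 11 |
                                 round_scaled(g, 63) << 5  |
                                 round_scaled(b, 31));
}
```

One behavioral difference worth knowing: exact halves round up here (0.5 * 31 = 15.5 becomes 16), whereas a rounding instruction in the default mode rounds ties to even.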
* use fma in store_8888 (Mike Klein, 2016-12-02)
  I think we just happened not to here. This improves Adobe -> sRGB pipeline conversion by about 3-4%.
  While at it, unify all the fma() lambdas into SkNf_fma(). I'd have called it fma(), but IIRC there was some sort of name conflict there with type-generic fma() functions from the C math.h or something silly like that.
  CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
  Change-Id: Id176671fec27c984efa4703c5be2fb63b7f0b11f
  Reviewed-on: https://skia-review.googlesource.com/5474
  Reviewed-by: Matt Sarett <msarett@google.com>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* SkColorSpaceXform bug fixes attempt 2 (Matt Sarett, 2016-12-01)
  (1) Clamping
  If we're going to clamp (8888 outputs), we need to clamp properly to
  alpha (not 1) when we premultiply. This fix is made in
  SkColorSpaceXform_XYZ.

  An alternative fix would move all clamping out of the store functions,
  to before the gamma encoding. This generally makes sense, but the
  "to 2.2 conversion" may introduce NaNs and always needs a clamp. So
  another fix is to just have an extra clamp in the store 2.2 function.
  Since we have two pipelines, let's try this one in
  SkColorSpaceXform_Pipeline :).

  (2) Correctly handle the memcpy() case.
  This is unchanged from a previous (reverted) CL. Looks like this only
  ever worked for RGBA inputs and never got updated when we added BGRA
  inputs. This probably flew under the radar because the clients are
  smart enough to avoid performing a color xform altogether when the
  color spaces match.

  BUG=skia:
  CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD

  Change-Id: I0b59239d2488ce9fdbe11efbd96567e420bb9813
  Reviewed-on: https://skia-review.googlesource.com/5464
  Commit-Queue: Matt Sarett <msarett@google.com>
  Reviewed-by: Mike Klein <mtklein@chromium.org>
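  This is a hypothetical scalar sketch of both fixes. It assumes float colors that may exceed [0,1] after gamma encoding, and 8888 pixels stored as four bytes in memory order. RGBA, premul_and_clamp, Order and copy_8888 are illustrative names, not SkColorSpaceXform's API.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>

struct RGBA { float r, g, b, a; };

// Fix (1): after premultiplying, clamp each color channel to [0, a], not
// [0, 1]. std::max(0.0f, v) is written with 0 first so a NaN v becomes 0.
RGBA premul_and_clamp(RGBA c) {
    float a = std::min(std::max(0.0f, c.a), 1.0f);
    auto clamp_premul = [a](float v) { return std::min(std::max(0.0f, v * a), a); };
    return { clamp_premul(c.r), clamp_premul(c.g), clamp_premul(c.b), a };
}

enum class Order { kRGBA, kBGRA };

// Fix (2): when the color spaces match, a plain memcpy is only correct if
// the channel orders also match; otherwise R and B must be swapped.
void copy_8888(uint8_t* dst, Order dstOrder,
               const uint8_t* src, Order srcOrder, std::size_t pixels) {
    if (dstOrder == srcOrder) {
        std::memcpy(dst, src, pixels * 4);
        return;
    }
    for (std::size_t i = 0; i < pixels; i++) {
        const uint8_t* s = src + 4 * i;
        uint8_t* d = dst + 4 * i;
        // Read all four bytes first so the swap also works when dst == src.
        uint8_t c0 = s[0], c1 = s[1], c2 = s[2], c3 = s[3];
        d[0] = c2; d[1] = c1; d[2] = c0; d[3] = c3;
    }
}
```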
* Reland "Add RasterPipeline implementation for SkColorSpaceXform" (Matt Sarett, 2016-12-01)
  This is initially turned on for Linux debug builds, which allows us to
  start testing.

  Chrome for Android is a really good candidate for this (it would benefit
  from the code size savings), but I'd first like to run some tests to
  understand the performance/size tradeoffs a little better.

  BUG:660416
  CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD

  Change-Id: Ifc80e663767df6bb767abb8b12b1ec5cec644ec5
  Reviewed-on: https://skia-review.googlesource.com/5452
  Reviewed-by: Matt Sarett <msarett@google.com>
  Commit-Queue: Matt Sarett <msarett@google.com>
* Avoid creating std::function in run_pipeline(). (Mike Klein, 2016-12-01)
  This avoids a malloc/free per SkRasterPipeline::run(), with no downside.

    $ out/nanobench --benchType skcolorcodec --colorImages images/colorspace/201293.jpg --skps noskps --xform_only --srgb --ms 10000

    target:   273µs
    current:  395µs
    this CL:  375µs

  CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD

  Change-Id: Icd62f505f555ebf4ca66ee77a476f59cab68433d
  Reviewed-on: https://skia-review.googlesource.com/5447
  Commit-Queue: Mike Klein <mtklein@chromium.org>
  Reviewed-by: Matt Sarett <msarett@google.com>
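  This generic C++ sketch illustrates the trade-off. It is not the actual run_pipeline() code. A std::function type-erases its callable and may heap-allocate when the captured state is larger than the library's small buffer. A template parameter keeps the callable's concrete type and avoids that storage entirely. Coeffs, horner, run_erased and run_templated are hypothetical names.

```cpp
#include <cstddef>
#include <functional>

struct Coeffs { float c[8]; };  // hypothetical per-stage state, 32 bytes

// Evaluates a polynomial with the given coefficients (Horner's rule).
float horner(const Coeffs& k, float x) {
    float r = 0;
    for (float c : k.c) r = r * x + c;
    return r;
}

// Type-erased: converting a lambda into std::function may heap-allocate
// (and later free) storage for its captures.
float run_erased(const float* src, std::size_t n, const std::function<float(float)>& stage) {
    float sum = 0;
    for (std::size_t i = 0; i < n; i++) sum += stage(src[i]);
    return sum;
}

// Templated: the callable is passed by reference with its concrete type,
// never copied into type-erased storage, and can be inlined.
template <typename Stage>
float run_templated(const float* src, std::size_t n, Stage&& stage) {
    float sum = 0;
    for (std::size_t i = 0; i < n; i++) sum += stage(src[i]);
    return sum;
}
```

  Calling run_erased(src, n, [k](float x) { return horner(k, x); }) builds a temporary std::function on every call. On common standard libraries, whose small buffers are under 32 bytes, that is a malloc/free per call, analogous to the per-run() cost the commit message describes. run_templated with the same lambda avoids it.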
* Added CMYK support for ICC profiles. (raftias, 2016-12-01)
  Changed ICC parsing/SkGammas/SkColorLookUpTable to handle non-3-channel
  inputs. Parsed CMYK A2B ICC profiles. Integrated this with SkJpegCodec
  (the only file that supports CMYK) and SkColorSpaceXform_A2B to allow
  parsing and color xforming of ICC CMYK images.

  CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD

  Change-Id: I11e3d17180244281be3eb43fd608609925a7f71e
  Reviewed-on: https://skia-review.googlesource.com/5444
  Reviewed-by: Matt Sarett <msarett@google.com>
  Commit-Queue: Matt Sarett <msarett@google.com>
* Revert "Add RasterPipeline implementation for SkColorSpaceXform" (Brian Osman, 2016-12-01)
  This reverts commit dd19ac7d10c7c00dd6e9b1f4c4c6aae729c7e6d4.

  Reason for revert: ASAN

  Change-Id: I59aacc092398c4db40696a8343d657a5ad7c0f66
  Reviewed-on: https://skia-review.googlesource.com/5448
  Commit-Queue: Brian Osman <brianosman@google.com>
  Reviewed-by: Brian Osman <brianosman@google.com>
* Add RasterPipeline implementation for SkColorSpaceXform (Matt Sarett, 2016-12-01)
  This is initially turned on for Linux debug builds, which allows us to
  start testing.

  Chrome for Android is a really good candidate for this (it would benefit
  from the code size savings), but I'd first like to run some tests to
  understand the performance/size tradeoffs a little better.

  BUG:660416
  CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD

  Change-Id: I0fb2512216dfc0bda2e5388f9865318eec22291e
  Reviewed-on: https://skia-review.googlesource.com/5348
  Commit-Queue: Mike Klein <mtklein@chromium.org>
  Reviewed-by: Mike Klein <mtklein@chromium.org>
* Revert "Added CMYK support for ICC profiles." (Mike Klein, 2016-12-01)
  This reverts commit 51c3fcd376c5c9972d9476b5532f6164375a38d1.

  Reason for revert: ASAN, MSAN both take issue with parse_and_load_gamma()

  Original change's description:
  > Added CMYK support for ICC profiles.
  >
  > Changed ICC parsing/SkGammas/SkColorLookUpTable to handle non-3-channel
  > inputs. Parsed CMYK A2B ICC profiles. Integrated this with SkJpegCodec
  > (the only file that supports CMYK) and SkColorSpaceXform_A2B to allow
  > parsing and color xforming of ICC CMYK images.
  >
  > BUG=skia:
  >
  > GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=5197
  > CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
  >
  >
  > Change-Id: Id6619f63f04071f79cd2d84321857dfa269ad3aa
  > Reviewed-on: https://skia-review.googlesource.com/5197
  > Commit-Queue: Mike Klein <mtklein@chromium.org>
  > Reviewed-by: Matt Sarett <msarett@google.com>
  > Reviewed-by: Mike Klein <mtklein@chromium.org>
  > Reviewed-by: Leon Scroggins <scroggo@google.com>
  >

  TBR=mtklein@chromium.org,mtklein@google.com,msarett@google.com,scroggo@google.com,brianosman@google.com,raftias@google.com,reviews@skia.org

  NOPRESUBMIT=true
  NOTREECHECKS=true
  NOTRY=true

  Change-Id: Ib43fef00bc233c0b4fa47ed29040d69601def267
  Reviewed-on: https://skia-review.googlesource.com/5423
  Commit-Queue: Mike Klein <mtklein@chromium.org>
  Reviewed-by: Mike Klein <mtklein@chromium.org>
* Added CMYK support for ICC profiles. (raftias, 2016-12-01)
  Changed ICC parsing/SkGammas/SkColorLookUpTable to handle non-3-channel
  inputs. Parsed CMYK A2B ICC profiles. Integrated this with SkJpegCodec
  (the only file that supports CMYK) and SkColorSpaceXform_A2B to allow
  parsing and color xforming of ICC CMYK images.

  BUG=skia:

  GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=5197
  CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD

  Change-Id: Id6619f63f04071f79cd2d84321857dfa269ad3aa
  Reviewed-on: https://skia-review.googlesource.com/5197
  Commit-Queue: Mike Klein <mtklein@chromium.org>
  Reviewed-by: Matt Sarett <msarett@google.com>
  Reviewed-by: Mike Klein <mtklein@chromium.org>
  Reviewed-by: Leon Scroggins <scroggo@google.com>