| Commit message (Collapse) | Author | Age |
... | |
| |
This time, with manual program memory management instead of std::vector<void*>.
Using STL types from SkOpts_hsw.cpp is not safe. Things like std::vector<void*>
are inlined but not anonymous, so they're deduped by the linker arbitrarily. This
is bad when we pick the version compiled with AVX instructions on a machine that
doesn't support AVX...
std::vector<Stage> was safe before because Stage itself was anonymous. Although std::vector<Stage> is not itself anonymous, it is unique to the compilation unit, because the anonymous Stage can only be referred to within that compilation unit.
CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.linux:linux_chromium_asan_rel_ng;skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
Change-Id: I015e27583b6b6ff06b5e9f63e3f40ee6b27d6dbd
Reviewed-on: https://skia-review.googlesource.com/6550
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
| |
Stage foo_d should always be the same logic as stage foo swapping r and dr, g
and dg, b and db, a and da. This means we can infer their definitions.
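A minimal sketch of how a foo_d variant might be inferred from foo, assuming a scalar register struct. The names Regs, clamp_0 and d_variant are hypothetical and only illustrate the swap; they are not taken from the change itself.

```cpp
#include <cassert>

// Hypothetical register file: four source and four destination channels.
struct Regs {
    float r, g, b, a;
    float dr, dg, db, da;
};

// A stage written once, against the {r,g,b,a} registers (illustrative).
inline void clamp_0(Regs& p) {
    if (p.r < 0) p.r = 0;
    if (p.g < 0) p.g = 0;
    if (p.b < 0) p.b = 0;
    if (p.a < 0) p.a = 0;
}

// Derive the "_d" variant: swap r<->dr, g<->dg, b<->db, a<->da,
// run the original stage, then swap back.
template <void (*Stage)(Regs&)>
inline void d_variant(Regs& p) {
    Regs swapped{p.dr, p.dg, p.db, p.da, p.r, p.g, p.b, p.a};
    Stage(swapped);
    p = Regs{swapped.dr, swapped.dg, swapped.db, swapped.da,
             swapped.r,  swapped.g,  swapped.b,  swapped.a};
}
```

With this shape, d_variant<clamp_0> plays the role of clamp_0_d without a second hand-written definition.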
CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
Change-Id: Ia0a3abb29a201c647d9ec1860211abfbc19b56ae
Reviewed-on: https://skia-review.googlesource.com/6555
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
| |
This reverts commit e61c40707e70a2be9e32227a929173864f7895e1.
Reason for revert: this and the ODR caused operations on ContiguousContainerBase::elements_, another std::vector<void*> in Chrome, to start using AVX2 instructions. Boy this is annoying...
Change-Id: I2c4837ad70fdef8096db904022b0703b88c6fd6c
Reviewed-on: https://skia-review.googlesource.com/6549
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Mike Klein <mtklein@chromium.org>
| |
The overhead of a stage today is 3 x86 instructions, typically looking something like this:
- movq (%rdi), %rax // Load the next stage function pointer.
- addq $0x10, %rdi // Step our progress ahead 16 bytes to that next stage.
- jmpq *%rax // Transfer control to that stage.
But if we make sure the pointer's in esi/rsi, we can use lodsd/lodsq to do those first two steps in one instruction:
- lodsq (%rsi), %rax (≈ movq (%rsi), %rax; addq $0x8, %rsi).
- jmpq *%rax
This CL rearranges things so that we can take advantage of this and generally trim off an instruction of overhead. Instead of a vector of {Fn, ctx} pairs, we'll flatten it down into a single interlaced program vector of void*, basically just omitting any null context pointers. We pass the pointer to the program as the second argument to Fn, putting it in rsi. These two changes together make getting the next Fn to call or the current context the same cheap lodsq instruction, encapsulated as load_and_increment().
Here's how the simple "modulate" blend stage changes:
vmulps %ymm4, %ymm0, %ymm0
vmulps %ymm5, %ymm1, %ymm1
vmulps %ymm6, %ymm2, %ymm2
vmulps %ymm7, %ymm3, %ymm3
movq (%rdi), %rax
addq $0x10, %rdi
jmpq *%rax
~~~~~~~~>
vmulps %ymm4, %ymm0, %ymm0
vmulps %ymm5, %ymm1, %ymm1
vmulps %ymm6, %ymm2, %ymm2
vmulps %ymm7, %ymm3, %ymm3
lodsq (%rsi), %rax
jmpq *%rax
This does make getting the current context a one-time, destructive operation. It's switched from referring to ctx as a void* directly to using ctx() as a thunk that returns a void*. No stage so far has ever referred to ctx twice, and it all appears to inline, so this seems harmless. "matrix_2x3" is a good example of what stages that use context pointers end up looking like:
lodsq (%rsi), %rax
vbroadcastss (%rax), %ymm9
vbroadcastss 0x8(%rax), %ymm10
vbroadcastss 0x10(%rax), %ymm8
vfmadd231ps %ymm10, %ymm1, %ymm8
vfmadd231ps %ymm9, %ymm0, %ymm8
vbroadcastss 0x4(%rax), %ymm10
vbroadcastss 0xc(%rax), %ymm11
vbroadcastss 0x14(%rax), %ymm9
vfmadd231ps %ymm11, %ymm1, %ymm9
vfmadd231ps %ymm10, %ymm0, %ymm9
lodsq (%rsi), %rax
vmovaps %ymm8, %ymm0
vmovaps %ymm9, %ymm1
jmpq *%rax
We can't do this with MSVC, as there's no intrinsic for it I can find, and they disallow inline assembly, and rsi is not used to pass arguments to functions there anyway. ARM doesn't need it... it does this in two instructions naturally anyway. We could do this for 32-bit x86 but I'd just rather focus on x86-64.
It's unclear to me that this makes things any faster, but it doesn't appear to make things any slower, and I think it makes both the code and disassembly simpler.
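The interlaced program layout can be sketched portably as follows. This is an illustration under stated assumptions, not the code from the CL: Pixel is a hypothetical scalar stand-in for the SIMD registers, scale_by and swap_rb are made-up stages, and load_and_increment() here is plain C++ rather than the lodsq-based version.

```cpp
#include <cassert>

// Portable stand-in for the lodsq helper: read one slot, then advance.
inline void* load_and_increment(void**& program) {
    return *program++;
}

struct Pixel { float r, g, b, a; };  // hypothetical scalar "registers"

using Fn = void (*)(Pixel&, void**);

// Fetch the next stage from the program and transfer control to it.
// Casting between void* and function pointers is conditionally supported
// in C++; it works on the usual POSIX toolchains.
inline void next(Pixel& p, void** program) {
    auto fn = reinterpret_cast<Fn>(load_and_increment(program));
    fn(p, program);
}

// A stage with a context consumes exactly one extra slot, exactly once.
void scale_by(Pixel& p, void** program) {
    float s = *static_cast<const float*>(load_and_increment(program));
    p.r *= s; p.g *= s; p.b *= s; p.a *= s;
    next(p, program);
}

// A stage without a context occupies a single slot.
void swap_rb(Pixel& p, void** program) {
    float t = p.r; p.r = p.b; p.b = t;
    next(p, program);
}

// Terminates the chain.
void just_return(Pixel&, void**) {}

inline void run(Pixel& p, void** program) { next(p, program); }
```

A program for "scale by 0.5, then swap red and blue" would then be the four slots {scale_by, &half, swap_rb, just_return}, with no null context slot for swap_rb.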
CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
Change-Id: Ia7b543a6718c75a33095371924003c5402b3445a
Reviewed-on: https://skia-review.googlesource.com/6271
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
| |
The ICC errata supports the opposite of what we do.
http://www.color.org/icc_specs2.xalter
TBR=reed@google.com
BUG=skia:
CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
Change-Id: I18ace7f312926b264e624c30d8cb983eff5c434b
Reviewed-on: https://skia-review.googlesource.com/6277
Commit-Queue: Matt Sarett <msarett@google.com>
Reviewed-by: Brian Osman <brianosman@google.com>
| |
Hooked up existing to/from srgb, and to_2dot2 stages into
SkColorSpaceXform_A2B. Added a from_2dot2 stage to the raster pipeline
to complete the other direction.
BUG=skia:
CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
Change-Id: I3887af3f59f67329d7e843e7355ff54e22cc4ed0
Reviewed-on: https://skia-review.googlesource.com/5840
Commit-Queue: Robert Aftias <raftias@google.com>
Reviewed-by: Mike Klein <mtklein@chromium.org>
Reviewed-by: Matt Sarett <msarett@google.com>
| |
This reverts commit 2e018f548d76b0688f9873c683cffc681fec40ec.
Reason for revert: doesn't appear to have been the roll problem.
Original change's description:
> Revert "clamp to premul when reading premul sRGB"
>
> This reverts commit 04e10da8362a0dcabd795a4ad53f617719ca0d20.
>
> Reason for revert: roll?
>
> Change-Id: Id0a8dcd62763bd6eddde120c513ca97e098a4268
> Reviewed-on: https://skia-review.googlesource.com/6022
> Commit-Queue: Mike Klein <mtklein@chromium.org>
> Reviewed-by: Mike Klein <mtklein@chromium.org>
>
TBR=mtklein@chromium.org,reviews@skia.org,brianosman@google.com
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
Change-Id: I399ca5e728ce6766c6707682c4c6b685681ffdeb
Reviewed-on: https://skia-review.googlesource.com/6025
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Mike Klein <mtklein@chromium.org>
| |
This reverts commit 04e10da8362a0dcabd795a4ad53f617719ca0d20.
Reason for revert: roll?
Change-Id: Id0a8dcd62763bd6eddde120c513ca97e098a4268
Reviewed-on: https://skia-review.googlesource.com/6022
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Mike Klein <mtklein@chromium.org>
| |
It's pretty easy to start with sound premultiplied linear floats, pack those to sRGB encoded bytes, then read them back to linear floats and find them not quite premultiplied, with a color channel just a smidge greater than the alpha channel. This can happen basically any time we have different transfer functions for alpha and colors... sRGB being the only one we draw into.
This is an annoying problem with no known good solution. So apply the clamp hammer.
These new calls on SkRasterPipeline should make it impossible to get wrong.
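As a rough illustration of the clamp, here is a scalar sketch. The struct and function names are hypothetical, and the real stage operates on SIMD registers; the point is only that each color channel is limited to [0, a] after alpha is limited to [0, 1].

```cpp
#include <algorithm>
#include <cassert>

struct Premul { float r, g, b, a; };  // hypothetical scalar pixel

// Force a pixel back into valid premultiplied form:
// alpha in [0, 1], and every color channel in [0, alpha].
inline Premul clamp_to_premul(Premul c) {
    c.a = std::min(std::max(c.a, 0.0f), 1.0f);
    c.r = std::min(std::max(c.r, 0.0f), c.a);
    c.g = std::min(std::max(c.g, 0.0f), c.a);
    c.b = std::min(std::max(c.b, 0.0f), c.a);
    return c;
}
```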
CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
Change-Id: I4c974f4a7b151f3f684946f1e83d06b1b288fd01
Reviewed-on: https://skia-review.googlesource.com/5945
Reviewed-by: Brian Osman <brianosman@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
| |
This is a precursor to using mask load, mask store, and gather instructions for f16. This is a slight performance win too, through slightly simpler code generation. Having done this, it now makes sense to give a name to f16->f32 conversion, from_f16().
Finally, while we're at this, also send store_f32 through store(), so that now all formats use load, gather, and store uniformly.
CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
Change-Id: I403f16f712936e2bcf3294e72c863cb6c6fbcf0c
Reviewed-on: https://skia-review.googlesource.com/5731
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
| |
The storage cost is the same, so might as well do this when building the pipeline instead of when running it. This also avoids the awkward cvtsi2ss instruction that screws with register renaming.
CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
Change-Id: I1c7d5bad558870256a31e3da969eee5d80fb93a8
Reviewed-on: https://skia-review.googlesource.com/5782
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
| |
- explicitly separate bilinear_ stages in x and y too
BUG=skia:
CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
Change-Id: Ib7b4f9d26ea6abe9171068e92424479d811ee606
Reviewed-on: https://skia-review.googlesource.com/5636
Reviewed-by: Brian Osman <brianosman@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
| |
Clamping stages were also removed from SkColorSpace_A2B as they are now
not needed.
BUG=skia:
CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
Change-Id: I24e2e411e12b463854e980cb10c0e6bafb4a7c42
Reviewed-on: https://skia-review.googlesource.com/5546
Reviewed-by: Matt Sarett <msarett@google.com>
Commit-Queue: Robert Aftias <raftias@google.com>
| |
1) rename to bilerp_xy, for x,y in {n[egative], p[ositive]};
2) pull out a save_xy stage to save off the original x,y;
3) also calculate the fractional x,y fx,fy once instead of 4 times.
1) is a pure refactor;
2) adds a stage but otherwise is nothing different;
3) changes images a little bit (fractional parts can vary a bit around powers of two).
This extends naturally to naive bicubic using 16 bicubic_xy stages.
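The following scalar sketch shows one way the saved fractions could feed the four corner weights. It assumes that samples sit at pixel centers (hence the 0.5 offset) and that the negative corner is weighted by 1-f and the positive corner by f. SavedXY, save_xy and corner_weight are illustrative names, not the stage signatures.

```cpp
#include <cassert>
#include <cmath>

// What a save_xy-style stage might stash: the original coordinates
// plus the fractional parts, computed once.
struct SavedXY { float x, y, fx, fy; };

inline SavedXY save_xy(float x, float y) {
    // Assumption: fractions are measured relative to pixel centers.
    float fx = (x + 0.5f) - std::floor(x + 0.5f);
    float fy = (y + 0.5f) - std::floor(y + 0.5f);
    return SavedXY{x, y, fx, fy};
}

// Weight of one corner; px/py select the positive side in x and y.
// Each of the four bilerp_xy stages would use one such weight.
inline float corner_weight(const SavedXY& s, bool px, bool py) {
    float wx = px ? s.fx : 1.0f - s.fx;
    float wy = py ? s.fy : 1.0f - s.fy;
    return wx * wy;
}
```

The four weights always sum to 1, so accumulating weight * sample over the corners yields the bilinear blend.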
CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
Change-Id: I666de5c21e978abb4feb6e3225e5b5920ba6c5b9
Reviewed-on: https://skia-review.googlesource.com/5550
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Herb Derby <herb@google.com>
| |
Bicubic is going to blow right past 48. At this point the fixed preallocation strategy is starting to look naive... at 64 we'd allocate just over 1K for every pipeline (and every compiled pipeline).
CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
Change-Id: Ib2944ead1217123aba2b6347fd9d5315217540c9
Reviewed-on: https://skia-review.googlesource.com/5551
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
| |
This is a follow-up to reviews.skia.org/5540, which did float -> byte.
We use the same trick here exploiting 32768.0f / 0x47000000.
The benefit here is smaller than the other CL, but still measurable.
The exchange here is:
before: int->float, multiply
after: OR, FMA
The cost of an FMA is the same as a multiply, so we're basically just replacing int->float conversion with a bitwise OR.
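A scalar sketch of the OR-then-FMA conversion, assuming IEEE-754 single precision. The function name and the exact constants are illustrative choices; the commit only states that an OR and an FMA replace the int->float conversion and the multiply.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Convert a byte to a float in [0, 1] without an int->float instruction.
inline float byte_to_unit_float(std::uint8_t b) {
    // OR the byte into the low mantissa bits of 32768.0f (0x47000000).
    // The resulting float equals 32768.0f + b/256.0f.
    std::uint32_t bits = 0x47000000u | b;
    float f;
    std::memcpy(&f, &bits, sizeof f);

    // One FMA maps [32768, 32768 + 255/256] onto [0, 1]:
    // (f - 32768) * (256/255) == f * scale + (-32768 * scale).
    const float scale = 256.0f / 255.0f;
    return std::fma(f, scale, -32768.0f * scale);
}
```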
CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
Change-Id: Ieac2247664afa3ff415aec2b48c21505905bee23
Reviewed-on: https://skia-review.googlesource.com/5542
Reviewed-by: Matt Sarett <msarett@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
| |
In IEEE, for each byte BB, the float 0x470000BB equals 32768.0f + BB*(1/256.0f).
So to turn a [0,1] float into a byte, we can
- multiply by (255/256.0f) to get into [0,255/256.0f] range,
- add 32768.0f to get into [32768.0f, 32768.0f + 255/256.0f] range,
- look at the low byte.
Those first two of course are an FMA.
Using this trick here makes store_8888 measurably faster. Instead of a FMA then float->int trunc, we do an FMA then a bitwise AND. Overall the math goes from 4 FMA + 4 trunc + 3 shift to 4 FMA + 3 AND + 3 shift (we can skip the shift for red and the AND for alpha). As you might guess, AND is cheaper than trunc, so this is a net win.
I should be able to follow up with the same trick in reverse in from_8888().
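The steps above can be sketched in scalar form as follows, assuming IEEE-754 single precision and the default round-to-nearest mode. The function name is hypothetical; the real stage works on SIMD lanes.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Convert a float in [0, 1] to a byte using an FMA and a bitwise AND.
inline std::uint8_t unit_float_to_byte(float x) {
    assert(x >= 0.0f && x <= 1.0f);

    // Scale into [0, 255/256] and add 32768.0f in a single FMA.
    // Near 32768 a float's spacing is 1/256, so the addition itself
    // rounds the value to the nearest byte step.
    float biased = std::fma(x, 255.0f / 256.0f, 32768.0f);

    // The byte is now the low 8 bits of the float's representation.
    std::uint32_t bits;
    std::memcpy(&bits, &biased, sizeof bits);
    return static_cast<std::uint8_t>(bits & 0xFFu);
}
```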
CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
Change-Id: I42c8f4a6ea0b6c22160517cf5f9c048f01c9a330
Reviewed-on: https://skia-review.googlesource.com/5540
Reviewed-by: Matt Sarett <msarett@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
| |
This gives us a place to bottleneck this sort of conversion. Every time I try to use the rounding float -> int instructions, they're just a little slower than working the 1/2 into the scale with FMA. Weird.
Change-Id: I7718112b234b4b38ba6af8fef59a47642021839a
Reviewed-on: https://skia-review.googlesource.com/5483
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Matt Sarett <msarett@google.com>
| |
I think we just happened not to here. This improves Adobe -> sRGB pipeline conversion by about 3-4%.
While at it, unify all the fma() lambdas into SkNf_fma(). I'd have called it fma(), but IIRC there was some sort of name conflict there with type-generic fma() functions from the C math.h or something silly like that.
CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
Change-Id: Id176671fec27c984efa4703c5be2fb63b7f0b11f
Reviewed-on: https://skia-review.googlesource.com/5474
Reviewed-by: Matt Sarett <msarett@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
| |
(1) Clamping
If we're going to clamp (8888 outputs), we need to clamp properly
to alpha (not 1) when we premultiply. This fix is made in
SkColorSpaceXform_XYZ.
An alternative fix would move all clamping out of the store
functions, to before the gamma encoding. This generally makes sense,
but the "to 2.2 conversion" may introduce NaNs and always needs a
clamp. So another fix is to just have an extra clamp in the store 2.2
function. Since we have two pipelines, let's try this one in
SkColorSpaceXform_Pipeline :).
(2) Correctly handle the memcpy() case.
This is not changed from a previous (reverted) CL.
Looks like this only ever worked for RGBA inputs,
never got updated when we added BGRA inputs.
This probably flew under the radar because the
clients are smart enough to avoid performing a
color xform altogether when the color spaces
match.
BUG=skia:
CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
Change-Id: I0b59239d2488ce9fdbe11efbd96567e420bb9813
Reviewed-on: https://skia-review.googlesource.com/5464
Commit-Queue: Matt Sarett <msarett@google.com>
Reviewed-by: Mike Klein <mtklein@chromium.org>
| |
This is initially turned on for Linux debug builds,
which allows us to start testing.
Chrome for Android is a really good candidate for
this (will appreciate the code size savings), but
I'd first like to run some tests to understand the
performance/size tradeoffs a little better.
BUG:660416
CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
Change-Id: Ifc80e663767df6bb767abb8b12b1ec5cec644ec5
Reviewed-on: https://skia-review.googlesource.com/5452
Reviewed-by: Matt Sarett <msarett@google.com>
Commit-Queue: Matt Sarett <msarett@google.com>
| |
This avoids a malloc/free per SkRasterPipeline::run(), with no downside.
$ out/nanobench --benchType skcolorcodec --colorImages images/colorspace/201293.jpg --skps noskps --xform_only --srgb --ms 10000
target: 273µs
current: 395µs
this CL: 375µs
CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
Change-Id: Icd62f505f555ebf4ca66ee77a476f59cab68433d
Reviewed-on: https://skia-review.googlesource.com/5447
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Matt Sarett <msarett@google.com>
| |
Changed ICC parsing/SkGammas/SkColorLookUpTable to handle non-3-channel
inputs. Parsed CMYK A2B ICC profiles. Integrated this with SkJpegCodec
(the only file that supports CMYK) and SkColorSpaceXform_A2B to allow
parsing and color xforming of ICC CMYK images.
CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
Change-Id: I11e3d17180244281be3eb43fd608609925a7f71e
Reviewed-on: https://skia-review.googlesource.com/5444
Reviewed-by: Matt Sarett <msarett@google.com>
Commit-Queue: Matt Sarett <msarett@google.com>
| |
This reverts commit dd19ac7d10c7c00dd6e9b1f4c4c6aae729c7e6d4.
Reason for revert: ASAN
Change-Id: I59aacc092398c4db40696a8343d657a5ad7c0f66
Reviewed-on: https://skia-review.googlesource.com/5448
Commit-Queue: Brian Osman <brianosman@google.com>
Reviewed-by: Brian Osman <brianosman@google.com>
| |
This is initially turned on for Linux debug builds,
which allows us to start testing.
Chrome for Android is a really good candidate for
this (will appreciate the code size savings), but
I'd first like to run some tests to understand the
performance/size tradeoffs a little better.
BUG:660416
CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
Change-Id: I0fb2512216dfc0bda2e5388f9865318eec22291e
Reviewed-on: https://skia-review.googlesource.com/5348
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Mike Klein <mtklein@chromium.org>
| |
This reverts commit 51c3fcd376c5c9972d9476b5532f6164375a38d1.
Reason for revert: ASAN, MSAN both take issue with parse_and_load_gamma()
Original change's description:
> Added CMYK support for ICC profiles.
>
> Changed ICC parsing/SkGammas/SkColorLookUpTable to handle non-3-channel
> inputs. Parsed CMYK A2B ICC profiles. Integrated this with SkJpegCodec
> (the only file that supports CMYK) and SkColorSpaceXform_A2B to allow
> parsing and color xforming of ICC CMYK images.
>
> BUG=skia:
>
> GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=5197
> CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
>
>
> Change-Id: Id6619f63f04071f79cd2d84321857dfa269ad3aa
> Reviewed-on: https://skia-review.googlesource.com/5197
> Commit-Queue: Mike Klein <mtklein@chromium.org>
> Reviewed-by: Matt Sarett <msarett@google.com>
> Reviewed-by: Mike Klein <mtklein@chromium.org>
> Reviewed-by: Leon Scroggins <scroggo@google.com>
>
TBR=mtklein@chromium.org,mtklein@google.com,msarett@google.com,scroggo@google.com,brianosman@google.com,raftias@google.com,reviews@skia.org
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
Change-Id: Ib43fef00bc233c0b4fa47ed29040d69601def267
Reviewed-on: https://skia-review.googlesource.com/5423
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Mike Klein <mtklein@chromium.org>
| |
Changed ICC parsing/SkGammas/SkColorLookUpTable to handle non-3-channel
inputs. Parsed CMYK A2B ICC profiles. Integrated this with SkJpegCodec
(the only file that supports CMYK) and SkColorSpaceXform_A2B to allow
parsing and color xforming of ICC CMYK images.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=5197
CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
Change-Id: Id6619f63f04071f79cd2d84321857dfa269ad3aa
Reviewed-on: https://skia-review.googlesource.com/5197
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Matt Sarett <msarett@google.com>
Reviewed-by: Mike Klein <mtklein@chromium.org>
Reviewed-by: Leon Scroggins <scroggo@google.com>
| |
It's cute in compile_pipeline(), but as before, clearer and simpler in the blitter.
CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
Change-Id: Ib83ff097e4e057e72aed785797e6ac0029ca5dbf
Reviewed-on: https://skia-review.googlesource.com/5399
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
| |
No change in behavior. This just moves the responsibility for this optimization to the blitter (which knows what it's doing) rather than to compile_pipeline(), which sort of has to guess.
CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
Change-Id: I93ad0ac896075deab995b865b188b42de637f0f7
Reviewed-on: https://skia-review.googlesource.com/5398
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Herb Derby <herb@google.com>
| |
Strip all the "constant" verbiage out of stages that really just mean 1, single, scalar.
CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
Change-Id: I3d71202b348fadc3ced8ecb6c18c939cf92d7243
Reviewed-on: https://skia-review.googlesource.com/5396
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
| |
CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
Change-Id: I308b6d75f2987a667eead9a55760a2ff6aec2984
Reviewed-on: https://skia-review.googlesource.com/5353
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Mike Klein <mtklein@chromium.org>
| |
https://skia-review.googlesource.com/c/5275/ removed it, and perf noticed.
This is obviously not very pretty or scalable. I plan to follow up with a more thorough and principled way to do this sort of constant-color + invariant-stage == constant-color optimization.
BUG=skia:6013
CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
Change-Id: I377386f67e66169cce6e0cb0831f3b7154496840
Reviewed-on: https://skia-review.googlesource.com/5338
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Mike Klein <mtklein@chromium.org>
| |
Most of this is plumbing through the full paint to shaders instead of just the filter quality.
CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
Change-Id: I6afde07566afa3a4391c24dca7017a9a4f5ec700
Reviewed-on: https://skia-review.googlesource.com/5317
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Mike Reed <reed@google.com>
| |
There is a very telling FIXME in the MSAN source code:
// FIXME: detect and handle SSE maskstore/maskload
For now, just tell MSAN (correctly) that it's initialized.
CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Debug-MSAN,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
Change-Id: I6aec67b99e4d930cb72e438458b33ed116535009
Reviewed-on: https://skia-review.googlesource.com/5311
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Brian Osman <brianosman@google.com>
| |
CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
Change-Id: Ia7a133f515e29e16700aabc0633c77a703425f41
Reviewed-on: https://skia-review.googlesource.com/5239
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
| |
They're really similar, so let's make them look that way.
Finally use mask load, mask store, and gather instructions for 8888.
We avoid mask load and store when tail == 0. It's faster (one memory load instead of two) and a cheap test.
For gather, the intrinsics make it look like we could do the same, but it really all boils down to the same masked instruction in the end.
There's probably a better way to implement mask() with math instead of memory loads, but this works for now.
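A portable sketch of the tail handling, with plain loops standing in for the AVX2 mask load/store instructions. It assumes N = 8 lanes and that tail == 0 means a full group of N pixels; mask() and store() here are illustrative, not the intrinsics-based versions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t N = 8;  // assumed lane count

// Lane i is active when i < tail; all-ones marks an active lane.
inline std::array<std::uint32_t, N> mask(std::size_t tail) {
    assert(tail > 0 && tail < N);
    std::array<std::uint32_t, N> m{};
    for (std::size_t i = 0; i < tail; i++) m[i] = 0xFFFFFFFFu;
    return m;
}

// Store N pixels when tail == 0, otherwise only the first `tail` pixels.
inline void store(std::uint32_t* dst,
                  const std::array<std::uint32_t, N>& px,
                  std::size_t tail) {
    if (tail == 0) {
        // Cheap test, and no mask needs to be built or loaded.
        for (std::size_t i = 0; i < N; i++) dst[i] = px[i];
        return;
    }
    const auto m = mask(tail);
    for (std::size_t i = 0; i < N; i++) {
        if (m[i]) dst[i] = px[i];  // inactive lanes leave memory untouched
    }
}
```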
CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
Change-Id: I578f47d4562ea19d983057bf2f4c3e21d0ab9a0e
Reviewed-on: https://skia-review.googlesource.com/5234
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
| |
CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
Change-Id: Id9a6d00e68a7cf7464c6a561bd97e63abf6886c4
Reviewed-on: https://skia-review.googlesource.com/5307
Reviewed-by: Herb Derby <herb@google.com>
| |
Was just reading the disassembly and noticed the opportunity.
Change-Id: I25d4b70802f9a9563491f3126da69829611a9b28
Reviewed-on: https://skia-review.googlesource.com/5235
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
| |
For stages that have {r,g,b,a} and {dr,dg,db,da} versions, name the {r,g,b,a} one "foo" and the {dr,dg,db,da} one "foo_d". The {r,g,b,a} registers are the ones most commonly used and fastest, so they get short ordinary names, and the d-registers are less commonly used and sometimes slower, so they get a suffix.
Some stages naturally operate on all 8 registers (the xfermodes, accumulate). The names for those look fine and aren't ambiguous.
Also, a bit more re-arrangement in _opts.h.
CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
Change-Id: Ia20029247642798a60a2566e8a26b84ed101dbd0
Reviewed-on: https://skia-review.googlesource.com/5291
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
| |
This file is getting complicated. We're still in early days and I want to keep it nimble and easy to refactor.
Simplifications:
- go back to one stage function type, packing x and tail together in one size_t
(x_tail = x*N+tail; x = x_tail/N, tail = x_tail%N, all cheap for power of 2 N);
- all stages call next(), ending in just_return;
- stop coddling MSVC with kIsTail.
These simplifications should all make things a little slower, some only when using subpar compilers.
On a positive note, this should cut code size by about half.
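The x_tail packing described above can be sketched as below, assuming N = 8 lanes (any power of two works the same way). The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t N = 8;  // assumed lane count, a power of two

// Pack the starting x and the tail count into one size_t.
inline std::size_t pack_x_tail(std::size_t x, std::size_t tail) {
    assert(tail < N);
    return x * N + tail;
}

// For power-of-two N, the divide and modulo compile to a shift and a mask.
inline std::size_t unpack_x(std::size_t x_tail)    { return x_tail / N; }
inline std::size_t unpack_tail(std::size_t x_tail) { return x_tail % N; }
```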
CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
Change-Id: I7de87c5a3cb0cbbf1e0ed0588f1ccb860a498e66
Reviewed-on: https://skia-review.googlesource.com/5285
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
| |
Every sRGB GM changes, none noticeably.
CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
Change-Id: I632845aea0f40751639cccbcfde8fa270cae0301
Reviewed-on: https://skia-review.googlesource.com/5275
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
| |
By stashing the scales in the context (i.e. on the stack), we can free up enough registers to really simplify how the bitmap sample stages interact.
Nearest neighbor is straightforward now: just call the appropriate gather_ function, and you're done. The source pixels end up in the source registers. If they're sRGB encoded, follow up with from_srgb_s.
To bilerp, we bracket those 1 or 2 gather+from_srgb_s stages with a stage setting up each corner (x += dx, y += dy, save off scale) and a stage that accumulates into the d-registers (load saved scale, dr += scale * r, etc.). When all the samples are accumulated, copy the d-registers into the s-registers.
from_srgb_d and to_srgb are lightly sketched here and will be used in the next CL, where I apply this same factoring to non-bitmap loads and stores. This is a little tricky, because we don't actually have a float->float to_srgb yet.
CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
Change-Id: I272a1f278f0ea1b29a2f07ac225f753faa8dae81
Reviewed-on: https://skia-review.googlesource.com/5271
Reviewed-by: Mike Klein <mtklein@chromium.org>
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
| |
The existing invert() logic explodes when a == 0.
Less terribly, invert() also does not turn 1.0f into 1.0f, so we now use a float divide. This will cause a small diff in the matrix color filter GM due to increased unpremul precision.
There's an alternative to try if this stage turns out to be speed critical:
auto scale = (a == 0.0f).thenElse(0.0f,
a.invert() * (1.0f / SkNf(1.0f).invert()));
The (1.0f / SkNf(1.0f).invert()) bit there is a constant, scaling a bit to make 1.0f produce 1.0f.
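For reference, the float-divide behavior with the a == 0 guard can be sketched in scalar form. Color and unpremul are hypothetical names, and the actual stage operates on SkNf vectors rather than single floats.

```cpp
#include <cassert>

struct Color { float r, g, b, a; };  // hypothetical scalar pixel

// Unpremultiply with a true divide. Transparent pixels (a == 0) map to
// a scale of 0 instead of dividing by zero, and a == 1 gives exactly 1.
inline Color unpremul(Color c) {
    float scale = (c.a == 0.0f) ? 0.0f : 1.0f / c.a;
    return Color{c.r * scale, c.g * scale, c.b * scale, c.a};
}
```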
CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
Change-Id: I9db72eda108d3d28583a4357f90a0dcd7e4d8a6f
Reviewed-on: https://skia-review.googlesource.com/5227
Reviewed-by: Mike Reed <reed@google.com>
| |
This is a batch of little tweaks that all preserve the existing logical behavior:
- rename dst to move_dst_src to parallel move_src_dst
- remove unused swap_src_dst
- move swap_rb up with the other utility stages
- factor out from_8888() to parallel from_565() and from_4444()
- factor out gather() from the accum_* stages
This changes the order of the math in accum_8888[_srgb] ever so slightly, from (scale * C) * (1/255.0f) to scale * (1/255.0f * C). It causes a few pixel diffs, but nothing noticeable. This makes the 8888 bilerp logic consistent with the other formats, which all convert to [0,1] float first before being scaled.
CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
Change-Id: Id37857b91be3086565169dcc9b1a537574e532aa
Reviewed-on: https://skia-review.googlesource.com/5226
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
| |
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=5147
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Change-Id: Id08804803b2bbeab4fa88538491e99e53d5c2efe
Reviewed-on: https://skia-review.googlesource.com/5147
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
| |
It looks like I'm not going to be able to avoid supporting sRGB G8, I8, 565, 4444, 8888.
(A8 and F16 will always be linear.) This fixes 565, and lays out the rest of the accum_*.
I did a little reorganization to keep things in ascending bit depth, just for sanity.
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=5145
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Change-Id: Ib0508e5a4ee1bab2044a76bcabc367841d634cd2
Reviewed-on: https://skia-review.googlesource.com/5145
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
| |
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=5125
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Change-Id: I2e338ae14db0068d9a09e16a0678dd2ee9f97efd
Reviewed-on: https://skia-review.googlesource.com/5125
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Herb Derby <herb@google.com>
| |
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=5107
Change-Id: I5c30105501cbdb57896d9ec35737494eabd5998b
Reviewed-on: https://skia-review.googlesource.com/5107
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
| |
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=5105
Change-Id: Ic692b5faf2d33fee31b119ff8d3653118b25b7c2
Reviewed-on: https://skia-review.googlesource.com/5105
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
| |
We don't need to do any of this funky swap, load, srcin nonsense. We've got a perfectly good scale_constant_float stage just perfect to be used instead.
While we're at it, we only need to modulate by paint alpha if the paint's not opaque. x*1 == x...
This puts the (x,y) inputs to shaders in (r,g) where they expect them. It also frees (dr,dg,db,da) for use by the shader. Might be handy for bilerp.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=5100
Change-Id: Ief60c469ecac8300798b67cc68817cc1d127cf17
Reviewed-on: https://skia-review.googlesource.com/5100
Reviewed-by: Mike Klein <mtklein@chromium.org>
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|