| Commit message (Collapse) | Author | Age |
|
|
|
|
|
|
|
|
|
| |
This was 6-8% faster than the previous code on my Trashcan.
Change-Id: I70081009e233c83226d6d302f871fb7e86cdc438
Reviewed-on: https://skia-review.googlesource.com/16986
Reviewed-by: Matt Sarett <msarett@google.com>
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Dither can bump color values above alpha (duh), or below
zero (duh), so clamp back to premul after dithering.
BUG=skia:6644,skia:6643
Change-Id: Ida107e866380e06130af0d01467117bca929ba44
Reviewed-on: https://skia-review.googlesource.com/17070
Reviewed-by: Herb Derby <herb@google.com>
Reviewed-by: Florin Malita <fmalita@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
rcp(rsqrt(x)) doesn't have enough precision when x is a coordinate.
(It's fine when x is a color, like in the softlight blend mode.)
Adds a GM to test this. It used to look quite ugly.
Change-Id: Icec295c2e2f50ae7a5e3e33c62270f632a58f65c
Reviewed-on: https://skia-review.googlesource.com/16914
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
| |
Change-Id: I17ac13b9d1ea6765e2c1a2b53aa6975eab408856
Reviewed-on: https://skia-review.googlesource.com/16713
Commit-Queue: Herb Derby <herb@google.com>
Reviewed-by: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
needed to add two helper stages for composeshader
load_rgba, store_rgba
These just read/write the r,g,b,a registers to context memory, making no promise as to how the
memory is formatted (e.g. interleaved -vs- planar).
Note that we have similar existing stages, but they did not seem to suit:
constant_color
This guy loads 4 floats from memory, and splats them into registers. I need to load 4 entire
registers.
load_f32, store_f32
These offset where they read/write based on the 'x' register, plus they guarantee that the memory
will be interleaved ala SkPM4f.
Bug: skia:
Change-Id: Iaa81f950660b837bdb34416ab3e342d56a92239b
Reviewed-on: https://skia-review.googlesource.com/16716
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Reed <reed@google.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I took a new, unprincipled approach here, which is to just mimic the
legacy code path exactly (e.g. hue_modeproc in SkXfermode.cpp).
This fixes how we handle alpha in these blend modes, and ought to be
faster by avoiding the unpremul.
BUG=skia:6621
Change-Id: I21635290985ff42d9421d2718f7a88cf44a85d7f
Reviewed-on: https://skia-review.googlesource.com/16711
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is just a name refactor and I'm happy to delay it until we're done
with the current wave of gradient CLs. The main ideas:
- we use the "linear_gradient" stages for all gradients,
so cut the "linear" and just call them "gradient";
- remind ourselves that the 2-stop stage requires even spacing, i.e.
stops at 0 and 1. This name should harmonize with Herb's new
general evenly spaced gradient stage, currently
"evenly_spaced_linear_gradient", and after it lands and I rebase,
"evenly_spaced_gradient"
- remind ourselves which polar coordinate xy_to_polar_unit returns,
the angle.
Change-Id: I0fd0c8bd4c1ead7d2d0fff45a199d318b71f34ac
Reviewed-on: https://skia-review.googlesource.com/16500
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Mike Reed <reed@google.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit 892501d09bc8608704362235c73a59bb23a386b3.
Reason for revert:
https://bugs.chromium.org/p/chromium/issues/detail?id=721682
:(
Original change's description:
> Evenly space gradient stage.
>
> This seems like an experiment at this point because I don't know how to do
> this kind of thing on arm.
>
>
> Numbers from Skylake...
> Before:
> ./out/Release/nanobench --config srgb \
> --match gradient_linear_clamp_3color gradient_linear_clamp_hicolor -q 19:48:13
> Timer overhead: 36.7ns
> ! -> high variance, ? -> moderate variance
> micros bench
> 439.92 ? gradient_linear_clamp_3color srgb
> 2697.60 gradient_linear_clamp_hicolor srgb
> 437.28 gradient_linear_clamp_3color_4f srgb
> 2700.50 gradient_linear_clamp_hicolor_4f srgb
>
>
> After:
> micros bench
> 382.35 gradient_linear_clamp_3color srgb
> 593.49 gradient_linear_clamp_hicolor srgb
> 382.36 gradient_linear_clamp_3color_4f srgb
> 565.60 gradient_linear_clamp_hicolor_4f srgb
>
>
> Numbers on my Mac Trashcan are about even; there is no
> speedup or slowdown between master and this change.
>
> Change-Id: I04402452e23c0888512362fd1d6d5436cea61719
> Reviewed-on: https://skia-review.googlesource.com/15960
> Commit-Queue: Herb Derby <herb@google.com>
> Reviewed-by: Mike Klein <mtklein@chromium.org>
>
TBR=mtklein@chromium.org,mtklein@google.com,herb@google.com,fmalita@google.com
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
Change-Id: Ic6a064c66686b6f238ca1417ba1abd9ce25de1b4
Reviewed-on: https://skia-review.googlesource.com/16660
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This seems like an experiment at this point because I don't know how to do
this kind of thing on arm.
Numbers from Skylake...
Before:
./out/Release/nanobench --config srgb \
--match gradient_linear_clamp_3color gradient_linear_clamp_hicolor -q 19:48:13
Timer overhead: 36.7ns
! -> high variance, ? -> moderate variance
micros bench
439.92 ? gradient_linear_clamp_3color srgb
2697.60 gradient_linear_clamp_hicolor srgb
437.28 gradient_linear_clamp_3color_4f srgb
2700.50 gradient_linear_clamp_hicolor_4f srgb
After:
micros bench
382.35 gradient_linear_clamp_3color srgb
593.49 gradient_linear_clamp_hicolor srgb
382.36 gradient_linear_clamp_3color_4f srgb
565.60 gradient_linear_clamp_hicolor_4f srgb
Numbers on my Mac Trashcan are about even; there is no
speedup or slowdown between master and this change.
Change-Id: I04402452e23c0888512362fd1d6d5436cea61719
Reviewed-on: https://skia-review.googlesource.com/15960
Commit-Queue: Herb Derby <herb@google.com>
Reviewed-by: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
| |
Change-Id: I5821f823a4c0df54d4388a2f455767f58ae646b8
Reviewed-on: https://skia-review.googlesource.com/16547
Reviewed-by: Florin Malita <fmalita@chromium.org>
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
| |
Like three gray with alpha blends and taking the max alpha.
Change-Id: I104c84f784979030744127f8f66905ad9d1bdf0e
Reviewed-on: https://skia-review.googlesource.com/15898
Commit-Queue: Ben Wagner <bungeman@google.com>
Reviewed-by: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
| |
Change-Id: Ie1f9640f5149f21bd8b3b864ff8b176232e1b0a9
Reviewed-on: https://skia-review.googlesource.com/15461
Reviewed-by: Florin Malita <fmalita@chromium.org>
Commit-Queue: Herb Derby <herb@google.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I've decided to ignore our existing CPU implementations and start from
scratch, mostly referencing the GL ES 3.2 spec and w3 spec.
This implementation ought to look a lot like the reference
implementation I've written in gm/hsl.cpp, with the addition of
handling alpha: unpremul, blend, re-premul with a simple SrcOver alpha.
Change-Id: I38cf6be2dc66a6f46d7b18b91847f6933d2fab62
Reviewed-on: https://skia-review.googlesource.com/15316
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There's no single dither rate that we can use in linear space if we're
using a non-linear transfer function... if it's too high (like today)
we'll dither too much around zero (e.g. 0 -> 5), and if it's too low we
won't dither near one.
We were thinking it'd be a good idea to move the dither later in the
pipeline anyway. This has to be the right spot!
Now that we're moving dither from operating in linear space to operating
in bit-space (after transfer function, aware of bit-depth of
destination) we have to start clamping [0,1] instead of [0,a]...
But, I think I've rewritten things to make sure we don't need to clamp
at all. The main idea is to make sure 0-dither and 1+dither round to 0
and 1 respectively. We can do this by making the dither span exclusive,
switching from [-0.5,+0.5] to (-0.5,+0.5). In practice I'm doing that
as [-0.4921875,+0.4921875], a maximum dither of 63/128 of a bit.
Similarly, I don't think it makes sense to fold in the multiply by alpha
anymore if we're after the transfer function.
Change-Id: I55857bca80377c639fcdd29acc9b362931dd9d12
Reviewed-on: https://skia-review.googlesource.com/15254
Reviewed-by: Florin Malita <fmalita@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I think we can dither generically as a pipeline stage.
I'm not married to where the dither happens, or the implementation,
which is mostly cribbed from
https://en.wikipedia.org/wiki/Ordered_dithering.
BUG=skia:3302,skia:6224
Change-Id: If7f6b22a523ca0b34cb03c0aa97b6734c34e0133
Reviewed-on: https://skia-review.googlesource.com/15161
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Florin Malita <fmalita@chromium.org>
Reviewed-by: Herb Derby <herb@google.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a prototype. I have remove the tiling,
but maybe I should add it back in.
I thinking about factoring out the common code with
linear gradient in its own CL.
I think radial gradient will be very close to this.
Change-Id: I1dfcb4f944138ee623afdf10b2a8befde797c604
Reviewed-on: https://skia-review.googlesource.com/13766
Reviewed-by: Mike Klein <mtklein@chromium.org>
Reviewed-by: Florin Malita <fmalita@chromium.org>
Commit-Queue: Herb Derby <herb@google.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For whatever reason, if I swap the condition in the if_then_else tests
from < to >= and swap the then/else values, I can use constants in
hsl_to_rgb. Still don't understand why, but I'll take it. I suspect it
has something to do with SSE, IEEE, and NaN, but I don't care enough to
speculate any more concretely.
This does that, removes C() and _f, updates some comments, and adds a
guard in build_stages.py to yell if it sees trouble like LCPI40_4...
This reminds me to try -ffast-math soon. I think that was mostly held
back by constants.
Change-Id: I3f8a37a4d4642f77422ce3261b750061e9e604a3
Reviewed-on: https://skia-review.googlesource.com/14942
Reviewed-by: Herb Derby <herb@google.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This rewrites the existing logic to expose more of the symmetries,
and especially to make them clearly identical subexpressions.
I think it's clear that the intent in hue_to_rgb is to wrap the t value
back into 0-1... that's t = fract(t).
No GM diffs.
Change-Id: I9d62d8f80bcb45711ee334f953d3f6410e068ce4
Reviewed-on: https://skia-review.googlesource.com/14940
Reviewed-by: Herb Derby <herb@google.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I think the original[1] SkRasterPipeline_opts.h version had a typo that
I faithfully copied over to hsl_to_rgb. In Hue2RGB[2], the scalar
equivalent of hue_to_rgb, we mutate t to keep it in 0-1 range[3]. In
the SkRasterPipeline_opts.h code we introduced a new value t2 instead,
and then used it everywhere, but accidentally typed 't' in the t < 1/6 case.
The expression "p + (q - p)*6.0f*t" should have been "p + (q - p)*6.0f*t2".
This fixes things by changing t in place, much like Hue2RGB does.
The GM doesn't change anywhere, which is troubling.
[1] https://skia-review.googlesource.com/c/7460/21/src/opts/SkRasterPipeline_opts.h#808
[2] https://skia-review.googlesource.com/c/7460/21/src/effects/SkHighContrastFilter.cpp#26
[3] I think this whole clamp should probably become "t = fract(t)".
Will follow up.
Change-Id: I3dcc1a79ffae46830178d931844ee3113f8bdfd1
Reviewed-on: https://skia-review.googlesource.com/14910
Reviewed-by: Brian Osman <brianosman@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I think I'm now down to just the constants used in comparisons in
hsl_to_rgb... so weird. I don't get what's special about this code.
CQ_INCLUDE_TRYBOTS=skia.primary:Test-Win7-MSVC-Golo-CPU-AVX-x86_64-Release,Test-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Release,Test-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Release-SK_CPU_LIMIT_SSE41,Test-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Release-SK_CPU_LIMIT_SSE2
Change-Id: I398139f00371bdbc45281a49bca56d02ba59cba9
Reviewed-on: https://skia-review.googlesource.com/14909
Reviewed-by: Mike Klein <mtklein@chromium.org>
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Trying to go slowly to find where problems arise.
Weirdly, I think I got everything except hsl_to_rgb.
CQ_INCLUDE_TRYBOTS=skia.primary:Test-Win7-MSVC-Golo-CPU-AVX-x86_64-Release,Test-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Release,Test-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Release-SK_CPU_LIMIT_SSE41,Test-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Release-SK_CPU_LIMIT_SSE2
Change-Id: I4d85a4c1f40bd87e7cb18fc9b5ce020812dc31db
Reviewed-on: https://skia-review.googlesource.com/14905
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
| |
This finishes off integer constants... they should all be normal now.
CQ_INCLUDE_TRYBOTS=skia.primary:Test-Win7-MSVC-Golo-CPU-AVX-x86_64-Release,Test-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Release,Test-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Release-SK_CPU_LIMIT_SSE41,Test-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Release-SK_CPU_LIMIT_SSE2
Change-Id: I66ecc6533807fc59bb5ac9d3c5f7ab9e6e1f0d7f
Reviewed-on: https://skia-review.googlesource.com/14528
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
So far I only seem to be encountering constant pools with float
constants, so integer constants should be easy to make normal.
This just removes _i. There might be a couple integer constants
generated with C() too... they'll be the next CL.
CQ_INCLUDE_TRYBOTS=skia.primary:Test-Win7-MSVC-Golo-CPU-AVX-x86_64-Release,Test-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Release,Test-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Release-SK_CPU_LIMIT_SSE41,Test-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Release-SK_CPU_LIMIT_SSE2
Change-Id: Icc82cbc660d1e33bcdb5282072fb86cb5190d901
Reviewed-on: https://skia-review.googlesource.com/14527
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
I think this is just gonna be a bunch of baby steps for now.
This gets everything in SkJumper_vectors.h.
CQ_INCLUDE_TRYBOTS=skia.primary:Test-Win7-MSVC-Golo-CPU-AVX-x86_64-Release,Test-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Release,Test-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Release-SK_CPU_LIMIT_SSE41,Test-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Release-SK_CPU_LIMIT_SSE2
Change-Id: Ic87faa9bf6380a4fc9a577936dad8c3a9c48472e
Reviewed-on: https://skia-review.googlesource.com/14441
Commit-Queue: Mike Klein <mtklein@google.com>
Reviewed-by: Herb Derby <herb@google.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- Add -z to print zero bytes instead of ...
- avx+hsw will create 32-byte constants in .const,
so we should disassemble those too, and align to 32 bytes.
- The default _text section on Windows is 16-byte aligned,
so we make a new one that's 32-byte aligned.
Change-Id: Icb2a962baa4c3735e98a992f2285eaf5cb1680fd
Reviewed-on: https://skia-review.googlesource.com/14364
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Herb Derby <herb@google.com>
|
|
|
|
|
|
|
|
|
|
| |
The parametric_{r,g,b} stages are just as good now;
under the hood it's all going through approx_powf.
Change-Id: If7f3ae1e24fcee2ddb201c1d66ce1dd64820c89a
Reviewed-on: https://skia-review.googlesource.com/14320
Reviewed-by: Matt Sarett <msarett@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As long as everything is laid out the same way they were originally, I don't
think there's any reason we can't just use %rip-relative addressing on x86-64.
Basically, we just need to keep all the sections together in order.
Somewhat subtly we cannot just use -D to disassemble all sections. -D will
double-disassemble[1] some bytes, which throws off our %rip-relative addressing
of constants. You can see this in PS1. So we whitelist sections instead.
[1], from man objdump:
This option also has a subtle effect on the disassembly of instructions in code
sections. When option -d is in effect objdump will assume that any symbols
present in a code section occur on the boundary between instructions and it will
refuse to disassemble across such a boundary. When option -D is in effect however
this assumption is supressed. This means that it is possible for the output of -d
and -D to differ if, for example, data is stored in code sections.
Change-Id: Idbcfe08e67113b3f7d75749931c640ff90aa0bf4
Reviewed-on: https://skia-review.googlesource.com/14029
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Herb Derby <herb@google.com>
|
|
|
|
|
|
|
|
|
|
| |
My main interest is getting rid of weird code, but it's also faster.
The new bench drops from 667 to 412.
Change-Id: Ibf889601284cf925780320c828394f79937dc705
Reviewed-on: https://skia-review.googlesource.com/14035
Reviewed-by: Matt Sarett <msarett@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
| |
I don't suppose you know any existing test coverage of this?
I can't seem to trigger any...
Change-Id: I7244053e2fb665888c9443d2bc52c5a23725ec53
Reviewed-on: https://skia-review.googlesource.com/13970
Reviewed-by: Matt Sarett <msarett@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Looks like the color-space images have this well tested (even without
lab_to_xyz) and the diffs look like rounding/FMA.
The old plan to keep loads and stores outside callback was:
1) awkward, with too many pointers and pointers to pointers to track
2) misguided... load and store stages march ahead by x,
working at ptr+0, ptr+8, ptr+16, etc. while callback
always wants to be working at the same spot in the buffer.
I spent a frustrating day in lldb to understood 2). :/
So now the stage always store4's its pixels to a buffer in the context
before the callback, and when the callback returns it load4's them back
from a pointer in the context, defaulting to that same buffer.
Instead of passing a void* into the callback, we pass the context
itself. This lets us subclass the context and add our own data...
C-compatible object-oriented programming.
Change-Id: I7a03439b3abd2efb000a6973631a9336452e9a43
Reviewed-on: https://skia-review.googlesource.com/13985
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Tweaks to make the parallels between from_half and to_half stand out.
We can logically do the `auto denorm = em < ...;` comparisons as either
U32 or I32. U32 would read more naturally, but we do I32 because some
instruction sets have direct signed comparison but must synthesize an
unsigned comparison.
CQ_INCLUDE_TRYBOTS=skia.primary:Test-Android-Clang-PixelC-CPU-TegraX1-arm64-Release-Android,Test-Android-Clang-Ci20-CPU-IngenicJZ4780-mipsel-Release-Android,Test-Android-Clang-Nexus10-CPU-Exynos5250-arm-Release-Android,Test-Mac-Clang-MacMini6.2-CPU-AVX-x86_64-Release,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86-Debug,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Debug
Change-Id: Ic74fe5b3b850f5bb7fd00fd4435bc32b8628eecd
Reviewed-on: https://skia-review.googlesource.com/13963
Reviewed-by: Matt Sarett <msarett@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This refactors from_half() and to_half() a bit, totally
reimplementing the non-hardware cases to be more clearly correct.
CQ_INCLUDE_TRYBOTS=skia.primary:Test-Android-Clang-PixelC-CPU-TegraX1-arm64-Release-Android,Test-Android-Clang-Ci20-CPU-IngenicJZ4780-mipsel-Release-Android,Test-Android-Clang-Nexus10-CPU-Exynos5250-arm-Release-Android,Test-Mac-Clang-MacMini6.2-CPU-AVX-x86_64-Release,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86-Debug,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Debug
Change-Id: I439463cf90935c5e8fe2369cbcf45e07f3af62c7
Reviewed-on: https://skia-review.googlesource.com/13921
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Matt Sarett <msarett@google.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
- Move to SkJumper_vectors.h
- Fold the -127 and +2.774485010.
- approx_powf(F,F) instead of approx_powf(F,float) for consistency.
- A little layout reformatting.
Change-Id: If9cb3d62a097cb6ecf89f157a1dde672c1516371
Reviewed-on: https://skia-review.googlesource.com/13865
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I've tried a couple of ideas for approx_powf():
1) accumulate integer powers of x, then 4th roots, then 16th roots
2) continue 1) all the way to 256th roots
3) decompose into pow2 and log2, exploiting IEEE float layout
4) slightly tune constants used in 3)
5) accumulate integer powers of x, then 3+4) with different tuning
6) follow a source online, basically 5 with finesse
7) a new source quoting and improving on the method in 6).
7) seems perfect, enough that maybe we can explore improving its speed
at cost of precision. Might be nice to get rid of those divides. If we
allow a small tolerance (2-5) in our tests, we could use the very simple
fast forms from 3) (e.g. PS 5). I wish I had some images to look at!
Anything involving roots seems to be subverted by poor rsqrt precision.
This change of course affects the pipelines created by the tests for
exponential and full parametric gamma curves. What's less obvious is
that it also means SkJumper can now for the first time run the pipeline
created by the mixed gamma curves test. This means we now need to relax
our tolerance for the table-based channel, just like we did when
implementing table_{r,g,b,a}.
This took me an embarassingly long time to figure out. *face palm*
Change-Id: I451ee3c970a0a4a4e285f8aa8f6ef709a654d247
Reviewed-on: https://skia-review.googlesource.com/13656
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Matt Sarett <msarett@google.com>
Reviewed-by: Herb Derby <herb@google.com>
|
|
|
|
|
|
|
|
|
|
|
| |
In testing, it didn't really seem like we're getting anything out of
doing an interpolated lookup, so this just does a single rounded lookup.
Change-Id: If85ba68675945b442076519dd7f1bf7540d1628d
Reviewed-on: https://skia-review.googlesource.com/13646
Commit-Queue: Matt Sarett <msarett@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Matt Sarett <msarett@google.com>
|
|
|
|
|
|
|
| |
Change-Id: I738da1dac2ef5b74ef34cca14938c0b6cce2fe16
Reviewed-on: https://skia-review.googlesource.com/13596
Reviewed-by: Matt Sarett <msarett@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
| |
testing: out/dm --src colorImage --colorImages images/colorspace/ --config srgb
Change-Id: I3481e1fb4a070cfd5d95361329fd95af5fcfbd1f
Reviewed-on: https://skia-review.googlesource.com/13590
Commit-Queue: Mike Klein <mtklein@google.com>
Reviewed-by: Matt Sarett <msarett@google.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This lets us temporarily escape to piece of code outside
SkRasterPipeline. We should be able to use this to replace
- parametric_{r,g,b,a}
- table_{r,g,b,a}
- color_lookup_table
- shader_adapter*
* We want to obsolete shader_adapter for other reasons anyway,
but we _could_ replace it with this if we want to.
Change-Id: I42b657b3c19c679796ed1876856cae0c8471307e
Reviewed-on: https://skia-review.googlesource.com/12102
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Herb Derby <herb@google.com>
Reviewed-by: Matt Sarett <msarett@google.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This splits SkImageShaderContext into three parts:
- SkJumper_GatherCtx: always, already done
- SkJumper_SamplerCtx: when bilinear or bicubic
- MiscCtx: other little bits (the matrix, paint color, tiling limits)
Thanks for the snazzy allocator that allows this Herb!
Both SkJumper and SkRasterPipeline_opts.h should be speaking all the
same types now.
I've copied the comments about bilinear/bicubic to SkJumper with little
typo fixes and clarifications.
Change-Id: I4ba7b7c02feba3f65f5292169a22c060e34933c6
Reviewed-on: https://skia-review.googlesource.com/13269
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I've rearranged while porting, I hope making the logic clearer.
Exactly one gm is affected, highcontrastfilter.
The most interesting line is this, from hsl_to_rgb:
F t2 = if_then_else(t < 0.0_f, t + 1.0_f,
I had to write 0.0_f (instead of the usual 0) to force Clang to compare
against a zero register instead of a 16-byte zero constant in memory.
Register pressure is high in hsl_to_rgb, so something must have kicked
in to prefer memory over zeroing a register.
No big deal. It makes the code read more symmetrically anyway.
Change-Id: I1a5ced72216234587760c6f803fb69315d18fae0
Reviewed-on: https://skia-review.googlesource.com/13242
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Herb Derby <herb@google.com>
|
|
|
|
|
|
|
|
|
|
| |
CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
Change-Id: I954d02638a785bec284d2fdf8f46abfccd474e7a
Reviewed-on: https://skia-review.googlesource.com/10211
Commit-Queue: Herb Derby <herb@google.com>
Reviewed-by: Mike Klein <mtklein@chromium.org>
Reviewed-by: Florin Malita <fmalita@chromium.org>
|
|
|
|
|
|
|
|
|
| |
Factors out a function F from_byte(U8) too.
Change-Id: Ib739ccbd509ddf25d2bfb7751ba6eaf51b16c12f
Reviewed-on: https://skia-review.googlesource.com/11791
Reviewed-by: Matt Sarett <msarett@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
| |
Here we use 64-bit gather instructions for HSW,
which I think we haven't done before.
Change-Id: I7b22b3cc0b7a151952518bb9afb90624ebdb4a22
Reviewed-on: https://skia-review.googlesource.com/11602
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
| |
Change-Id: Iefa8044bac0555c5fff370217a6270b4f3c64300
Reviewed-on: https://skia-review.googlesource.com/11582
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
| |
This is all the gathers except index 8 and f16, which aren't conceptually hard
but I want to land separately.
Change-Id: I525f2496e55451041bd6ea07985858fda7b56a40
Reviewed-on: https://skia-review.googlesource.com/11524
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
| |
Change-Id: I70bd64d114a2460534bcb51d356e13d9bc3b8603
Reviewed-on: https://skia-review.googlesource.com/11491
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
| |
Change-Id: I71d85ffe29bc11678ff1e696fa4a2c93d0b4fcbe
Reviewed-on: https://skia-review.googlesource.com/11446
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Its alignment (sometimes 4, sometimes 16) has proven to be error-prone.
This also means we don't really need LazyCtx::load().
I think I only had it there to make sure we were doing unaligned loads
of F4; the better way is to just never declare the data as aligned...
The generated code isn't quite as good, but I can live with it.
Change-Id: I5d57a580ca12c94ca84a5e8b72a66cf8d0c829eb
Reviewed-on: https://skia-review.googlesource.com/11406
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
| |
Nothing too tricky here.
Change-Id: I2a10548efc75a6fd875fcb242790880d9b9a28fd
Reviewed-on: https://skia-review.googlesource.com/11388
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Matt Sarett <msarett@google.com>
|
|
|
|
|
|
|
| |
Change-Id: I2d58538ab071b217d8dbbf2d802493d9045eabf2
Reviewed-on: https://skia-review.googlesource.com/11384
Reviewed-by: Matt Sarett <msarett@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|