| Commit message | Author | Age |
|
|
|
|
|
|
|
| |
Bug: skia:7459
Change-Id: Iccc2588f80e22b13ed5d23656b8c75d7b7058a36
Reviewed-on: https://skia-review.googlesource.com/92700
Reviewed-by: Florin Malita <fmalita@chromium.org>
Commit-Queue: Yuqian Li <liyuqian@google.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The updated algorithm matches our new GPU algorithm
(https://skia.org/dev/design/conical) and it brings
about 7%-26% speedup. In the next CL, I'll simplify
the GPU code by reusing the CPU code in this CL.
7.20% faster in gradient_conical_clamp_hicolor
8.94% faster in gradient_conicalZero_clamp_hicolor
10.00% faster in gradient_conicalOut_clamp_hicolor
11.72% faster in gradient_conicalOutZero_clamp_hicolor
13.62% faster in gradient_conical_clamp_3color
16.52% faster in gradient_conicalZero_clamp_3color
17.48% faster in gradient_conical_clamp
17.70% faster in gradient_conical_clamp_shallow
20.60% faster in gradient_conicalOut_clamp_3color
20.98% faster in gradient_conicalOutZero_clamp_3color
21.79% faster in gradient_conicalZero_clamp
22.48% faster in gradient_conicalOut_clamp
26.13% faster in gradient_conicalOutZero_clamp
Bug: skia:
Change-Id: Ia159495e1c77658cb28e48c9edf84938464e501c
Reviewed-on: https://skia-review.googlesource.com/90262
Commit-Queue: Yuqian Li <liyuqian@google.com>
Reviewed-by: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
| |
This brings a little more symmetry to _stages.cpp and _stages_lowp.cpp.
Change-Id: Icfcbd3f264ab97d8445ad8e14c25b4a07c780aea
Reviewed-on: https://skia-review.googlesource.com/90030
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It looks like we can specialize hot image shaders into their
own single stages for a good speedup on both x86 and ARM.
I've started here with bilerp_clamp_8888, and will
follow up with bgra and 565, and lowp versions of those,
and probably also the same for nearest neighbors.
All pixels are identical in GMs.
This time, rewrite the loop over sample points to be a little
friendlier to 32-bit x86 code generation. The previous version
created an object file indirection feature build_stages.py can't handle.
CQ_INCLUDE_TRYBOTS=skia.primary:Test-Android-Clang-NexusPlayer-CPU-Moorefield-x86-Release-All-Android,Test-Android-Clang-NexusPlayer-GPU-PowerVR-x86-Release-All-Android
Change-Id: I150b6af4a5b89e009dc04ca69e1857892e173deb
Reviewed-on: https://skia-review.googlesource.com/89180
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is more consistent with our other SK_BUILD_FOR_... macros,
and less likely to collide with other preprocessor logic.
(Luckily, this was defined in public.bzl, so we can do this
all in one CL in the Skia repo.)
Change-Id: I5f232888288c9c53fad445545d983d0fb0b4add8
Reviewed-on: https://skia-review.googlesource.com/86940
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit 8a64e52a98d178be13fd137b3b3a3c6aff457d85.
Reason for revert:
Test-Android-Clang-NexusPlayer-CPU-Moorefield-x86-Release-All-Android
Test-Android-Clang-NexusPlayer-GPU-PowerVR-x86-Release-All-Android
Original change's description:
> attempt 2: add experimental bilerp_clamp_8888 stage
>
> It looks like we can specialize hot image shaders into their
> own single stages for a good speedup on both x86 and ARM.
>
> I've started here with bilerp_clamp_8888, and will
> follow up with bgra and 565, and lowp versions of those,
> and probably also the same for nearest neighbors.
>
> All pixels are identical in GMs.
>
> Change-Id: Ib5ed6e528efd9e3eed96ba67d02fbec2e8133a81
> Reviewed-on: https://skia-review.googlesource.com/86860
> Reviewed-by: Mike Klein <mtklein@chromium.org>
> Commit-Queue: Mike Klein <mtklein@chromium.org>
TBR=mtklein@chromium.org,liyuqian@google.com
Change-Id: I34409a7b4aee4fd54baee44f7fc53bd0982500fe
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Reviewed-on: https://skia-review.googlesource.com/86601
Reviewed-by: Mike Klein <mtklein@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It looks like we can specialize hot image shaders into their
own single stages for a good speedup on both x86 and ARM.
I've started here with bilerp_clamp_8888, and will
follow up with bgra and 565, and lowp versions of those,
and probably also the same for nearest neighbors.
All pixels are identical in GMs.
Change-Id: Ib5ed6e528efd9e3eed96ba67d02fbec2e8133a81
Reviewed-on: https://skia-review.googlesource.com/86860
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Instead of trying to carefully manage the in-gamut / out-of-gamut state
of the pipeline, let's do what a GPU would do, clamping to representable
range in any float -> integer conversion.
Most effects doing table lookups now clamp themselves internally, and
the store_foo() methods clamp when the destination is fixed point. In
turn the from_srgb() conversions and all future transfer function stages
can care less about this stuff.
If I'm thinking right, the _lowp side of things need not change at all,
and that will soften the performance impact of this change. Anything
that was fast to begin with was probably running a _lowp pipeline.
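
A minimal sketch of the clamp-on-store idea, assuming an 8-bit fixed-point
destination. to_unorm8() is a hypothetical helper name for illustration,
not one of the actual store_foo() stages:

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical helper: clamp a float channel to the representable [0,1]
// range, then scale and round to 8 bits, as a GPU would on store.
inline uint8_t to_unorm8(float v) {
    // std::fmax/std::fmin return the non-NaN operand, so NaN maps to 0.
    float clamped = std::fmin(std::fmax(v, 0.0f), 1.0f);
    return static_cast<uint8_t>(std::lrint(clamped * 255.0f));
}
```

With the clamp living in the store, earlier stages are free to produce
out-of-range values without tracking whether they did.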
Bug: skia:7419
Change-Id: Id2e080ac240a97b900a1ac131c85d9e15f70af32
Reviewed-on: https://skia-review.googlesource.com/85740
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Brian Osman <brianosman@google.com>
|
|
|
|
|
|
|
|
|
|
|
| |
The path involving approx_log2() and approx_pow2() does not produce 0.
And it's probably not a good idea to think about what approx_log2(0) is
anyway.
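
Presumably the fix is to special-case zero before taking the log. A sketch
under that assumption, with std::log2()/std::exp2() standing in for
approx_log2()/approx_pow2() (the std versions do reach 0 on their own, the
approximations are the ones that do not), and a hypothetical function name:

```cpp
#include <cmath>

// Hypothetical wrapper: x^y computed through log2/exp2, with an explicit
// early-out so x == 0 never reaches the log. Assumes y > 0.
inline float powf_via_log2(float x, float y) {
    if (x == 0.0f) {
        return 0.0f;
    }
    return std::exp2(std::log2(x) * y);
}
```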
Change-Id: If5f48298c5bd5565ae808ebdfbd02649f4dd3046
Reviewed-on: https://skia-review.googlesource.com/85840
Reviewed-by: Brian Osman <brianosman@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some interesting things are starting to fall out already,
like the fact that I needed to add a gamma_dst stage to
be able to draw into gamma-transfer-fn destinations.
I've also had to pass an SkAlphaType through to the linearize
functions so that they can maintain premul invariants. I'm not
sure this is actually a good idea... if you can, please double-
check my logic at SkRasterPipeline.cpp:128?
If it's correct logic, I'm going to need to do it all over the place.
But I imagine you don't do this and somehow get away with it.
Change-Id: I42cd9b161b54287d674225103ad9e19f8b388959
Reviewed-on: https://skia-review.googlesource.com/84680
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Brian Osman <brianosman@google.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
We're still working out why these intrinsics aren't available.
Long term options:
- get the intrinsics to work
- convert to inline asm
Change-Id: I07edf1944daf01842f01b26ad874f62314d0f68f
Reviewed-on: https://skia-review.googlesource.com/84222
Reviewed-by: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We need to be a bit more pedantic here to support builds that may be
using AVX2 as part of their baseline but perhaps not enabling all the
related features SkJumper would like to use.
E.g. we've seen Tensorflow build with AVX2 and FMA, but not F16C.
So check all three {AVX2,FMA,F16C}, and only then build stages in HSW
mode. I've updated the define as a reminder.
This only affects builds using these features for their _baseline_
stages... the offline-compiled stages in SkJumper_generated.S are
not affected.
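
Roughly, the check looks like the sketch below. __AVX2__, __FMA__ and
__F16C__ are the compiler's usual predefined feature macros;
SKETCH_BASELINE_IS_HSW is a made-up name, since this message does not
spell out the real define:

```cpp
// Only treat the baseline as Haswell-class when all three features are on.
#if defined(__AVX2__) && defined(__FMA__) && defined(__F16C__)
    #define SKETCH_BASELINE_IS_HSW 1
#else
    #define SKETCH_BASELINE_IS_HSW 0
#endif

constexpr bool baseline_is_hsw() { return SKETCH_BASELINE_IS_HSW != 0; }
```

A build with AVX2 and FMA but without F16C then falls through to a
narrower mode instead of failing on missing conversions.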
Change-Id: I9bfb3bae3589d35043b748782cefa8c213726d6a
Reviewed-on: https://skia-review.googlesource.com/84221
Reviewed-by: Florin Malita <fmalita@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I noticed these little bits while working on that old-Clang fix.
- We can force-inline anytime we've got Clang,
not just when JUMPER_IS_OFFLINE.
- The _aarch64 and _vfp4 WRAP functions are dead code,
as they're never compiled offline now.
Change-Id: I5850daded2ffcfe50ceeadc43f89fa8597df3387
Reviewed-on: https://skia-review.googlesource.com/84060
Commit-Queue: Mike Klein <mtklein@chromium.org>
Commit-Queue: Florin Malita <fmalita@chromium.org>
Reviewed-by: Florin Malita <fmalita@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Older versions of Clang, at least vanilla 3.9 and "Apple LLVM version
8.1.0 (clang-802.0.42)" seem to crash when compiling SkJumper_stages.cpp
with NEON for ARMv7 without at least -O1.
So detect that case, and fall back to scalar code.
Change-Id: I3c1595da491bef38c18f47f96690700c67fdc70e
Reviewed-on: https://skia-review.googlesource.com/83980
Reviewed-by: Florin Malita <fmalita@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
VFPv4 gives us two interesting features:
- FMA
- f16<->f32 conversions
Even without FMAs, NEON still has non-fused MLA instructions. We don't
really care about the fusedness of those mul-adds, so losing FMA here is
kind of no big deal.
We already maintain portable code to do f16<->f32 conversions, so it's
not much of a maintenance hit to use that instead of the native
instructions. To my knowledge software F16 rendering is not a
performance critical mode of operation for any of our users.
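
For reference, a scalar sketch of what portable f16<->f32 conversion can
look like. This is not the code in SkJumper_vectors.h; it assumes denormal
halfs flush to zero, truncates instead of rounding, and does not handle
infinity, NaN, or overflow:

```cpp
#include <cstdint>
#include <cstring>

// half -> float: move sign to bit 31, shift exponent+mantissa up by 13,
// and rebias the exponent from 15 to 127.
inline float from_half(uint16_t h) {
    uint32_t sign = (h & 0x8000u) << 16;
    uint32_t em   =  h & 0x7fffu;
    uint32_t bits = (em < 0x0400u) ? sign  // denormal or zero: flush to 0
                                   : sign + (em << 13) + ((127u - 15u) << 23);
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// float -> half: the inverse, truncating the low 13 mantissa bits.
inline uint16_t to_half(float f) {
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    uint32_t sign = (bits >> 31) << 15;
    uint32_t em   =  bits & 0x7fffffffu;
    uint32_t h    = (em < 0x38800000u) ? sign  // too small for a normal half
                                       : sign + (em >> 13) - ((127u - 15u) << 10);
    return static_cast<uint16_t>(h);
}
```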
This drops our minimum requirement to basically just having NEON.
Devices like the Nexus 7 2012 will now take SkJumper fast paths
instead of portable code. (Though actually, we've only ever
required NEON for _lowp... only the float code also needed vfpv4).
The main file to look at here is actually SkJumper_vectors.h,
where you will see all the substantive changes. The rest just
kind of tears down most of the old complexity, and adds ABI
to put just a little of it back. :)
Change-Id: Ia9237117698729c91e5fa51126baf80748093bf4
Bug: skia:
Reviewed-on: https://skia-review.googlesource.com/83521
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Florin Malita <fmalita@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Here's the tiny performance gain:
$ python tools/calmbench/calmbench.py firstsecond --extraarg "-m conic"
firstsecond (compared to master) is likely
4.23% faster in gradient_conicalOut_clamp_3color
4.23% faster in gradient_conicalOutZero_clamp_3color
4.79% faster in gradient_conical_clamp_shallow_dither
6.04% faster in gradient_conical_clamp_3color
6.04% faster in gradient_conicalZero_clamp_3color
6.42% faster in gradient_conicalOut_clamp
6.43% faster in gradient_conicalOutZero_clamp
6.74% faster in gradient_conical_clamp
6.98% faster in gradient_conical_clamp_shallow
6.98% faster in gradient_conicalZero_clamp
Bug: skia:
Change-Id: Id74866908b99753ed8b16a657d3f67c9255d0043
Reviewed-on: https://skia-review.googlesource.com/76561
Commit-Queue: Yuqian Li <liyuqian@google.com>
Reviewed-by: Mike Klein <mtklein@chromium.org>
Reviewed-by: Florin Malita <fmalita@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit a7fa3377d24643d86117159f8a58d2ee66880a4d.
Reason for revert: lots of crashing GPU bots.
Original change's description:
> add experimental bilerp_clamp_8888 stage
>
> It looks like we can specialize hot image shaders into their
> own single stages for a good speedup on both x86 and ARM.
>
> I've started here with bilerp_clamp_8888, and will
> follow up with bgra and 565, and lowp versions of those,
> and probably also the same for nearest neighbors.
>
> All pixels are identical in GMs.
>
> Change-Id: I2f6995767cd38053d670b8d0bfdb71b687803d70
> Reviewed-on: https://skia-review.googlesource.com/82100
> Reviewed-by: Yuqian Li <liyuqian@google.com>
> Commit-Queue: Mike Klein <mtklein@chromium.org>
TBR=mtklein@chromium.org,mtklein@google.com,liyuqian@google.com
Change-Id: If70abb91b69bcd781e395dd3ac05ff1eebb1169f
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Reviewed-on: https://skia-review.googlesource.com/83340
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It looks like we can specialize hot image shaders into their
own single stages for a good speedup on both x86 and ARM.
I've started here with bilerp_clamp_8888, and will
follow up with bgra and 565, and lowp versions of those,
and probably also the same for nearest neighbors.
All pixels are identical in GMs.
Change-Id: I2f6995767cd38053d670b8d0bfdb71b687803d70
Reviewed-on: https://skia-review.googlesource.com/82100
Reviewed-by: Yuqian Li <liyuqian@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Turns out I was defining JUMPER_HAS_NEON_LOWP,
but checking JUMPER_NEON_HAS_LOWP.
🤦
Change-Id: Ib328190ce35a367bf3d08d8e66f0ab8791ccb8b2
Reviewed-on: https://skia-review.googlesource.com/82320
Reviewed-by: Brian Osman <brianosman@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
| |
705.32 -> 457.76 gradient_sweep_clamp_3color
609.38 -> 345.34 gradient_radial1_clamp_3color
Change-Id: I0165ac8f004ee095ada4f12b33db0a94ae39fca3
Reviewed-on: https://skia-review.googlesource.com/69902
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Florin Malita <fmalita@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit a3dd5ec3a769fb833ce77878cd4e551c15e5074d.
Reason for revert: breaking build on Build-Debian9-Clang-x86_64_Release-Fast
Original change's description:
> more powerful map()
>
> Change-Id: Icbae002999a295e3a9d1d2e6046e686784d5f608
> Reviewed-on: https://skia-review.googlesource.com/69901
> Reviewed-by: Florin Malita <fmalita@chromium.org>
> Commit-Queue: Mike Klein <mtklein@chromium.org>
TBR=mtklein@chromium.org,fmalita@chromium.org
Change-Id: Ice989dd6a6b2786f318791dd91f2c06f689cb979
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Reviewed-on: https://skia-review.googlesource.com/70105
Reviewed-by: Greg Daniel <egdaniel@google.com>
Commit-Queue: Greg Daniel <egdaniel@google.com>
|
|
|
|
|
|
|
| |
Change-Id: Icbae002999a295e3a9d1d2e6046e686784d5f608
Reviewed-on: https://skia-review.googlesource.com/69901
Reviewed-by: Florin Malita <fmalita@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
| |
Change-Id: I15f83a72645fed0ed8dca9c9aad66c5db5eb247a
Reviewed-on: https://skia-review.googlesource.com/69920
Commit-Queue: Florin Malita <fmalita@chromium.org>
Reviewed-by: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I was originally going to add these to help test a lowp dither, but
after looking at diffs I don't think lowp dither is a good idea.
Non-dithered lowp gradients look fine to me so far.
I'd have done conics, but they scare me.
Change-Id: I8f5e75aec726983186214845ca38cfa0d54496b3
Reviewed-on: https://skia-review.googlesource.com/66460
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Florin Malita <fmalita@chromium.org>
|
|
|
|
|
|
|
|
|
|
| |
The 4444 image in all_bitmap_configs now draws slightly differently before
and after serialization. (It's serialized as 8888.) Still looks fine.
Change-Id: I1396cf1550b6769a1734ed25d59bd5b1866dfacd
Reviewed-on: https://skia-review.googlesource.com/65960
Reviewed-by: Florin Malita <fmalita@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
1) Move a couple stages around in the enum to places
that make more sense, and gauss_a_to_rgba in the code too.
2) mirror the SkRasterPipeline stage enum with either:
LOWP(st): the stage is implemented in low precision
TODO(st): the stage should be lowp, but isn't
NOPE(st): the stage shouldn't be done in lowp.
3) statically enforce that all stages are covered by one of
LOWP, TODO, or NOPE.
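
One way the static coverage check in 3) could be written is sketched
below. The stage list, the assignments, and every macro name other than
LOWP/TODO/NOPE are invented for the example; the real mechanism may differ:

```cpp
// Toy stage list standing in for the real SkRasterPipeline stage enum.
#define ALL_STAGES(M) M(load_8888) M(store_8888) M(srcover) M(dither)

#define AS_ENUMERATOR(st) st,
enum class Stage { ALL_STAGES(AS_ENUMERATOR) };
#undef AS_ENUMERATOR

// Every stage is listed once under LOWP, TODO, or NOPE.
#define CLASSIFY(LOWP, TODO, NOPE) \
    LOWP(load_8888)                \
    LOWP(store_8888)               \
    TODO(srcover)                  \
    NOPE(dither)

// Naming Stage::st makes a misspelled stage a compile error.
#define ONE(st)  + (static_cast<int>(Stage::st) >= 0 ? 1 : 0)
#define ZERO(st) + 0 * static_cast<int>(Stage::st)

constexpr int kStages     = 0 ALL_STAGES(ONE);
constexpr int kClassified = 0 CLASSIFY(ONE, ONE, ONE);
constexpr int kLowp       = 0 CLASSIFY(ONE, ZERO, ZERO);

static_assert(kStages == kClassified,
              "every stage must be covered by LOWP, TODO, or NOPE");
```

A count comparison like this catches a forgotten stage but not a stage
listed twice alongside a missing one, so it is only a first line of defense.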
Change-Id: I06c7a7e470663ef73bf652c1b65c0d3c89f0d767
Reviewed-on: https://skia-review.googlesource.com/63800
Reviewed-by: Florin Malita <fmalita@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
| |
Change-Id: I5629e74c4c13ddb9217fd3c2df3388030fa03f0c
Reviewed-on: https://skia-review.googlesource.com/63780
Reviewed-by: Florin Malita <fmalita@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Chrome generally uses BGRA buffers, so srcover_rgba_8888 isn't really
doing them any good. Probably a good idea to cover both kN32 options
any time we specialize like this?
There's one small diff, so I've lazily guarded this by
SK_LEGACY_LOWP_STAGES, which I want to rebaseline today anyway.
Change-Id: Ice672aa01a3fc83be0798580d6730a54df075478
Reviewed-on: https://skia-review.googlesource.com/63301
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Mike Reed <reed@google.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fills out a couple more matrix and gather stages.
Deletes a not particularly important unit test that was using a
scale matrix in a weird, non-lowp compatible way.
This will require guards for Blink layout tests.
Change-Id: I54cb228ff541f771e8f4758f07d26c5161d48af3
Reviewed-on: https://skia-review.googlesource.com/62520
Reviewed-by: Mike Reed <reed@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This method is a little simpler macro-wise,
and makes it easier to guard new lowp stages:
LOWP(foo)
LOWP(bar)
#ifndef SK_LEGACY_LOWP_BAZ
LOWP(baz)
#endif
Change-Id: I06392f5cf7a04651e7bf47e79f10f7da8520f5ab
Reviewed-on: https://skia-review.googlesource.com/63141
Reviewed-by: Florin Malita <fmalita@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a no-op refactor.
It's just always surprised me that the matrix_scale_translate
stage expects [tx ty sx sy], when scales precede the translates
in the names and in both normal row-major and column-major matrix
layouts.
This switches to [sx sy tx ty], scale then translate.
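
In scalar form, with a plain float[4] standing in for the stage's context
(the function signature is illustrative only), the new order reads:

```cpp
// m = [sx sy tx ty]: scale first, then translate.
inline void matrix_scale_translate(const float m[4], float* x, float* y) {
    *x = *x * m[0] + m[2];
    *y = *y * m[1] + m[3];
}
```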
Change-Id: I2d88701121ae8013facd5a28bb0ff520211db5a6
Reviewed-on: https://skia-review.googlesource.com/62541
Reviewed-by: Mike Reed <reed@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We're going to want to assign types to the stages depending on their
inputs and outputs:
GG: x,y -> x,y
GP: x,y -> r,g,b,a
PP: r,g,b,a -> r,g,b,a
(There are a couple other degenerate cases here, where a stage ignores
its inputs or creates no outputs, but we can always just pretend their
null input or output is one type or the other arbitrarily.)
The GG stages will be pretty much entirely float code, and the GP stages
a mix of float math and byte stuff.
Since we've chosen U16 to match our register size in _lowp land,
we'll unpack each F register across two of those for transport between
stages. This is a notional, free operation in both directions.
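
A scalar analogy of that unpacking, assuming nothing about the real vector
types: one 32-bit float is carried as two 16-bit halves and reassembled
bit-for-bit, so no arithmetic is involved in either direction. The names
here are hypothetical:

```cpp
#include <cstdint>
#include <cstring>

struct U16Pair { uint16_t lo, hi; };

// Reinterpret the float's bits and carry them in two 16-bit slots.
inline U16Pair split(float f) {
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return { static_cast<uint16_t>(bits & 0xffffu),
             static_cast<uint16_t>(bits >> 16) };
}

// Put the two halves back together into the original float.
inline float join(U16Pair p) {
    uint32_t bits = static_cast<uint32_t>(p.lo) |
                    (static_cast<uint32_t>(p.hi) << 16);
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}
```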
Change-Id: I605311d0dc327a1a3a9d688173d9498c1658e715
Reviewed-on: https://skia-review.googlesource.com/60800
Reviewed-by: Herb Derby <herb@google.com>
Reviewed-by: Florin Malita <fmalita@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
| |
As this array grows longer it causes troublesome code generation
when we're compiling offline, but it's easy as an argument.
Change-Id: I53526443f534f29d3bff17c3aec24a9e916c9b86
Reviewed-on: https://skia-review.googlesource.com/60564
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Herb Derby <herb@google.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Today (x,y) are the integer coordinates of the first destination pixel
we're working on. By renaming them (dx,dy), we free up the names (x,y)
for working (i.e. _source_) x and y.
Until now we've generally just been continuing to call those (r,g), but
in the _lowp code that won't be possible (r+g hold x together, b+a y)
but we'll have the ability to just give them proper names x and y.
Change-Id: Id5faa09c4406116df5df7494efc6cb23659e9a2f
Reviewed-on: https://skia-review.googlesource.com/60820
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It's properly 16 today because of HSW/lowp stages handling 16 pixels at
a time, but it hasn't yet had an effect on lowp so we didn't notice.
As we add lowp shader stages this will start to matter,
so might as well bump it up to 16 now.
(One day _skx lowp stages could bump this up to 32.)
Change-Id: Idd8185c08e12dc657389a35bf659662c9670f98a
Reviewed-on: https://skia-review.googlesource.com/60565
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are non-zero values of a that make infinite 1.0f/a.
Let's just check for the real thing we care about, that
scale is finite.
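
A small sketch of the difference, assuming the scale is used to undo
premultiplication (that context and the helper name are assumptions).
A denormal a such as 1e-40f is non-zero, yet 1.0f/a overflows to infinity:

```cpp
#include <cmath>

// Hypothetical helper: the factor to multiply r,g,b by, or 0 when
// 1/a is not finite. Tests the scale itself rather than a != 0.
inline float unpremul_scale(float a) {
    float scale = 1.0f / a;
    return std::isfinite(scale) ? scale : 0.0f;
}
```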
Bug: skia:7123
Change-Id: If97574c9f3f2f0b73c749d0bea9aa19e6114f4d1
Reviewed-on: https://skia-review.googlesource.com/58460
Reviewed-by: Mike Reed <reed@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Today gradient mirror and repeat don't explicitly clamp. They work fine for
normal float values, but blow up with inputs like infinity and NaN, and
those aren't hard to construct with a combination of a funky matrix and
some squaring for xy -> radius.
So explicitly clamp in each of the three matrix tilers.
This should fix the fuzz at the associated bug.
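
To illustrate, here are scalar repeat and mirror tilers over [0,1] with the
explicit clamp added. These are sketches with hypothetical names, not the
stage code itself:

```cpp
#include <cmath>

// Clamp to [0, limit]. std::fmax/std::fmin ignore a NaN operand,
// so a NaN input comes out as 0.
inline float clamp_0_to(float v, float limit) {
    return std::fmin(std::fmax(v, 0.0f), limit);
}

inline float repeat_01(float t) {
    float r = t - std::floor(t);  // NaN when t is NaN or +/-infinity
    return clamp_0_to(r, 1.0f);
}

inline float mirror_01(float t) {
    float r = std::fabs((t - 1.0f)
                        - 2.0f * std::floor((t - 1.0f) * 0.5f)
                        - 1.0f);  // also NaN for NaN or infinite t
    return clamp_0_to(r, 1.0f);
}
```

Without the clamp, the NaN from infinity minus infinity would flow straight
into whatever indexes by t next.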
Bug: skia:7093
Change-Id: Idd44e3c7a1ed95e2b1ace8eb953b62eddeb4e00e
Reviewed-on: https://skia-review.googlesource.com/55702
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
| |
Change-Id: I9a140e342e7b12b1cbb09503ca8fc03016717784
Reviewed-on: https://skia-review.googlesource.com/55701
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This just makes sure all the plumbing is in place to use the Skylake
Xeon subset of AVX-512 instructions. So far,
- no Windows
- no lowp
- nothing explicitly making use of AVX-512 registers or instructions
This initial pass should run essentially identically to the _hsw AVX2
code we've been using previously. Clang _does_ use AVX-512-only
instructions to implement some of the higher-level concepts we've coded,
but it's really a pretty subtle difference.
Next steps will bump N from 8 to 16 and start threading through an
AVX-512-friendly mask instead of tail. I'll also want to take a harder
look at how we do blending like if_then_else()... the default codegen
here doesn't really take advantage of AVX-512 the way I'd like here.
CQ_INCLUDE_TRYBOTS=skia.primary:Test-Debian9-Clang-GCE-CPU-AVX512-x86_64-Debug
Change-Id: I6c9442488a449ea4770617bb22b2669859cc92e2
Reviewed-on: https://skia-review.googlesource.com/54062
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Herb Derby <herb@google.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is something I came up with while writing _lowp.cpp.
This should all be a logical no-op, but there are some code generation
changes. I'm not exactly sure why.
Change-Id: Iaad36b5298b37fe26ebd375a147a48852f98e1e4
Reviewed-on: https://skia-review.googlesource.com/52003
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Herb Derby <herb@google.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The lowp start_pipeline() always zeros, and with floats we always zero
when compiled as part of Skia, so this just makes the offline float
consistent with the others.
It's getting confusing to think about which code zeros and which
doesn't, and it'd be nicer to be able to rely on zeros.
This should change code generation only to the start_pipelines in
the .S files.
Change-Id: I1178b83c01e609e40dc7912d8d56df8e36eb339d
Reviewed-on: https://skia-review.googlesource.com/52001
Reviewed-by: Florin Malita <fmalita@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We look at t to create a mask in mask_2pt_conical_degenerates to be
applied later to the colors after the normal gradient stages have run.
But if t itself is NaN, that will wreak havoc in the normal gradient
stages. So in addition to building the mask to kill off degenerate
colors, let's also set degenerate t to zero, which should be a safe
value.
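
In scalar terms the idea is roughly the following. The struct and function
names are hypothetical, and treating only NaN as degenerate is an
assumption made to keep the sketch short:

```cpp
#include <cmath>
#include <cstdint>

struct MaskedT { float t; uint32_t mask; };

// Build the mask used later to kill degenerate colors, and also replace
// the degenerate t with 0 so the gradient stages never see a NaN.
inline MaskedT mask_degenerate(float t) {
    bool degenerate = std::isnan(t);  // assumed degeneracy test
    return { degenerate ? 0.0f : t, degenerate ? 0u : ~0u };
}
```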
This fixes the fuzz mentioned in this bug.
BUG=skia:7078
Change-Id: I8301450c707bdbf941abd0339959f9e60d46d955
Reviewed-on: https://skia-review.googlesource.com/52763
Reviewed-by: Florin Malita <fmalita@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a no-op in terms of generated code.
There is no longer a tail call here to be disabled,
not since we changed start_pipeline() to operate in 2D.
Change-Id: Ife92590eb059e28e4a84e3729180c7410a93b410
Reviewed-on: https://skia-review.googlesource.com/52020
Reviewed-by: Mike Klein <mtklein@chromium.org>
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
| |
This is a no-op refactor to make SkJumper_stages.cpp and
SkJumper_stages_lowp.cpp more similar.
Change-Id: Icb5dd415d105fbdc58ce0b9b63058c0a66ed4a13
Reviewed-on: https://skia-review.googlesource.com/52000
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
All three image tile modes go through exclusive_clamp() and then a
gather today, so we can move the work of exclusive_clamp() into each
gather_ stage, eliminating the need for clamp_{x,y} stages.
Luckily, we've got a convenient place to bottleneck this, ptr_and_ix(),
which works out the pointer and vector of indices to load for gathers.
This deletes SkRasterPipeline_repeat_tiling unit test, which now
no longer exactly makes sense. It tests that repeat_x does that
clamp, but that's now done automatically outside that stage.
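
A scalar sketch of folding the clamp into the index computation. GatherCtx
and its fields are hypothetical, and this is only loosely modeled on what
ptr_and_ix() is described as doing:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical gather context: pixels, row stride in pixels, and size.
struct GatherCtx {
    const uint32_t* pixels;
    int stride, width, height;
};

// Clamp (x,y) to the last valid pixel, then form the index to load.
inline int clamped_index(const GatherCtx& ctx, float x, float y) {
    assert(ctx.width > 0 && ctx.height > 0);
    // std::fmax/std::fmin drop a NaN operand, so NaN coordinates land on 0.
    float cx = std::fmin(std::fmax(x, 0.0f), static_cast<float>(ctx.width  - 1));
    float cy = std::fmin(std::fmax(y, 0.0f), static_cast<float>(ctx.height - 1));
    return static_cast<int>(cy) * ctx.stride + static_cast<int>(cx);
}

inline uint32_t gather(const GatherCtx& ctx, float x, float y) {
    return ctx.pixels[clamped_index(ctx, x, y)];
}
```

Because every gather goes through the one index helper, no tile mode can
forget the clamp.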
Change-Id: I24637ef60921bec7aa00082984c0c6a49dd86ca9
Reviewed-on: https://skia-review.googlesource.com/50260
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Mike Reed <reed@google.com>
Reviewed-by: Florin Malita <fmalita@chromium.org>
|
|
|
|
|
|
|
| |
Change-Id: I2e24c990983ea93cbd7983c9c4e88120c2b7f358
Reviewed-on: https://skia-review.googlesource.com/49768
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
| |
This makes loading into 16-bit channels more natural in _lowp.cpp.
Update a unit test to stop using out-of-range "colors".
Change-Id: I494687aac87948b60a40de447aa1527cf7167b2d
Cq-Include-Trybots: skia.primary:Test-Debian9-Clang-GCE-CPU-AVX2-x86_64-Release-UBSAN_float_cast_overflow
Reviewed-on: https://skia-review.googlesource.com/47580
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- load_565 allows 565-src sprite blits
- scale_565 / lerp_565 allow subpixel text
- luminance_to_alpha is a color filter, and lets us write grey 8
And update CachedDecodingPixelRefTest with a yet more robust color.
Change-Id: I8af499c43f0f28093744d9c2993af553e36c9526
Reviewed-on: https://skia-review.googlesource.com/47021
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Mike Reed <reed@google.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit d286bfbd96f8b7ccf1cbce74f07d2f3917dbec30.
Reason for revert:
../../../src/core/SkRasterPipeline.cpp:98:34: runtime error: 4.87906e+09 is outside the range of representable values of type 'unsigned short'
Excellent new bot!
Original change's description:
> Bump stored lowp uniform color to 16-bit storage.
>
> This makes loading into 16-bit channels more natural in _lowp.cpp.
>
> Change-Id: I1ed393873654060ef52f4632d670465528006bbd
> Reviewed-on: https://skia-review.googlesource.com/47261
> Reviewed-by: Mike Reed <reed@google.com>
> Commit-Queue: Mike Klein <mtklein@chromium.org>
TBR=mtklein@chromium.org,reed@google.com
Change-Id: Ia65645c1261a7b31588c4ddaf2b1b3b327d265b0
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Reviewed-on: https://skia-review.googlesource.com/47540
Reviewed-by: Mike Klein <mtklein@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
|
|
|
|
|
|
|
|
| |
Bug: chromium:762167
Change-Id: Ia23f6dbfc0466aef4ca9d1a5b9ff343d79dc83bb
Reviewed-on: https://skia-review.googlesource.com/47460
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|