| Commit message (Collapse) | Author | Age |
| |
This is a reland of 78cb579f33943421afc8423a39867fcfd69fed44
This time, lowp stages are controlled by !defined(JUMPER_IS_SCALAR), not
by defined(__clang__). The two are usually the same, except when we opt
Clang builds into JUMPER_IS_SCALAR artificially.
Some Google3 builds use compilers old enough that they barf when
compiling our NEON code. It's conceivably also possible to define
JUMPER_IS_SCALAR yourself, but I don't think anyone does that.
Original change's description:
> Reland "make SkJumper stages normal Skia code"
>
> This is a reland of 22e536e3a1a09405d1c0e6f071717a726d86e8d4
>
> Now with fixed #include paths in SkRasterPipeline_opts.h,
> and -ffp-contract=fast for the :hsw target to minimize
> diffs on non-Windows Clang AVX2/AVX-512 bots.
>
> Original change's description:
> > make SkJumper stages normal Skia code
> >
> > Enough clients are using Clang now that we can say, use Clang to build
> > if you want these software pipeline stages to go fast.
> >
> > This lets us drop the offline build aspect of SkJumper stages, instead
> > building as part of Skia using the SkOpts framework.
> >
> > I think everything should work, except I've (temporarily) removed
> > AVX-512 support. I will put this back in a follow up.
> >
> > I have had to drop Windows down to __vectorcall and our narrower
> > stage calling convention that keeps the d-registers on the stack.
> > I tried forcing sysv_abi, but that crashed Clang. :/
> >
> > Added a TODO to up the same narrower stage calling convention
> > for lowp stages... we just *don't* today, for no good reason.
> >
> > Change-Id: Iaaa792ffe4deab3508d2dc5d0008c163c24b3383
> > Reviewed-on: https://skia-review.googlesource.com/110641
> > Commit-Queue: Mike Klein <mtklein@chromium.org>
> > Reviewed-by: Herb Derby <herb@google.com>
> > Reviewed-by: Florin Malita <fmalita@chromium.org>
>
> Change-Id: I44f2c03d33958e3807747e40904b6351957dd448
> Reviewed-on: https://skia-review.googlesource.com/112742
> Reviewed-by: Mike Klein <mtklein@chromium.org>
Change-Id: I3d71197d4bbb19ca4a94961a97fa2e54d5cbfb0d
Reviewed-on: https://skia-review.googlesource.com/112744
Reviewed-by: Mike Klein <mtklein@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
| |
This reverts commit 78cb579f33943421afc8423a39867fcfd69fed44.
Reason for revert: lowp should be controlled by defined(JUMPER_IS_SCALAR), not defined(__clang__). So close.
Original change's description:
> Reland "make SkJumper stages normal Skia code"
>
> This is a reland of 22e536e3a1a09405d1c0e6f071717a726d86e8d4
>
> Now with fixed #include paths in SkRasterPipeline_opts.h,
> and -ffp-contract=fast for the :hsw target to minimize
> diffs on non-Windows Clang AVX2/AVX-512 bots.
>
> Original change's description:
> > make SkJumper stages normal Skia code
> >
> > Enough clients are using Clang now that we can say, use Clang to build
> > if you want these software pipeline stages to go fast.
> >
> > This lets us drop the offline build aspect of SkJumper stages, instead
> > building as part of Skia using the SkOpts framework.
> >
> > I think everything should work, except I've (temporarily) removed
> > AVX-512 support. I will put this back in a follow up.
> >
> > I have had to drop Windows down to __vectorcall and our narrower
> > stage calling convention that keeps the d-registers on the stack.
> > I tried forcing sysv_abi, but that crashed Clang. :/
> >
> > Added a TODO to up the same narrower stage calling convention
> > for lowp stages... we just *don't* today, for no good reason.
> >
> > Change-Id: Iaaa792ffe4deab3508d2dc5d0008c163c24b3383
> > Reviewed-on: https://skia-review.googlesource.com/110641
> > Commit-Queue: Mike Klein <mtklein@chromium.org>
> > Reviewed-by: Herb Derby <herb@google.com>
> > Reviewed-by: Florin Malita <fmalita@chromium.org>
>
> Change-Id: I44f2c03d33958e3807747e40904b6351957dd448
> Reviewed-on: https://skia-review.googlesource.com/112742
> Reviewed-by: Mike Klein <mtklein@chromium.org>
TBR=mtklein@chromium.org,herb@google.com,fmalita@chromium.org
Change-Id: Ie64da98f5187d44e03c0ce05d7cb189d4a6e6663
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Reviewed-on: https://skia-review.googlesource.com/112743
Reviewed-by: Mike Klein <mtklein@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
| |
This is a reland of 22e536e3a1a09405d1c0e6f071717a726d86e8d4
Now with fixed #include paths in SkRasterPipeline_opts.h,
and -ffp-contract=fast for the :hsw target to minimize
diffs on non-Windows Clang AVX2/AVX-512 bots.
Original change's description:
> make SkJumper stages normal Skia code
>
> Enough clients are using Clang now that we can say, use Clang to build
> if you want these software pipeline stages to go fast.
>
> This lets us drop the offline build aspect of SkJumper stages, instead
> building as part of Skia using the SkOpts framework.
>
> I think everything should work, except I've (temporarily) removed
> AVX-512 support. I will put this back in a follow up.
>
> I have had to drop Windows down to __vectorcall and our narrower
> stage calling convention that keeps the d-registers on the stack.
> I tried forcing sysv_abi, but that crashed Clang. :/
>
> Added a TODO to up the same narrower stage calling convention
> for lowp stages... we just *don't* today, for no good reason.
>
> Change-Id: Iaaa792ffe4deab3508d2dc5d0008c163c24b3383
> Reviewed-on: https://skia-review.googlesource.com/110641
> Commit-Queue: Mike Klein <mtklein@chromium.org>
> Reviewed-by: Herb Derby <herb@google.com>
> Reviewed-by: Florin Malita <fmalita@chromium.org>
Change-Id: I44f2c03d33958e3807747e40904b6351957dd448
Reviewed-on: https://skia-review.googlesource.com/112742
Reviewed-by: Mike Klein <mtklein@chromium.org>
| |
This reverts commit 22e536e3a1a09405d1c0e6f071717a726d86e8d4.
Reason for revert: wrong include path :/
Original change's description:
> make SkJumper stages normal Skia code
>
> Enough clients are using Clang now that we can say, use Clang to build
> if you want these software pipeline stages to go fast.
>
> This lets us drop the offline build aspect of SkJumper stages, instead
> building as part of Skia using the SkOpts framework.
>
> I think everything should work, except I've (temporarily) removed
> AVX-512 support. I will put this back in a follow up.
>
> I have had to drop Windows down to __vectorcall and our narrower
> stage calling convention that keeps the d-registers on the stack.
> I tried forcing sysv_abi, but that crashed Clang. :/
>
> Added a TODO to up the same narrower stage calling convention
> for lowp stages... we just *don't* today, for no good reason.
>
> Change-Id: Iaaa792ffe4deab3508d2dc5d0008c163c24b3383
> Reviewed-on: https://skia-review.googlesource.com/110641
> Commit-Queue: Mike Klein <mtklein@chromium.org>
> Reviewed-by: Herb Derby <herb@google.com>
> Reviewed-by: Florin Malita <fmalita@chromium.org>
TBR=mtklein@chromium.org,herb@google.com,fmalita@chromium.org
Change-Id: I2bdc709c80cdfa6b13ff24e024b3721bef887f46
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Reviewed-on: https://skia-review.googlesource.com/112741
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
| |
Enough clients are using Clang now that we can say, use Clang to build
if you want these software pipeline stages to go fast.
This lets us drop the offline build aspect of SkJumper stages, instead
building as part of Skia using the SkOpts framework.
I think everything should work, except I've (temporarily) removed
AVX-512 support. I will put this back in a follow up.
I have had to drop Windows down to __vectorcall and our narrower
stage calling convention that keeps the d-registers on the stack.
I tried forcing sysv_abi, but that crashed Clang. :/
Added a TODO to up the same narrower stage calling convention
for lowp stages... we just *don't* today, for no good reason.
Change-Id: Iaaa792ffe4deab3508d2dc5d0008c163c24b3383
Reviewed-on: https://skia-review.googlesource.com/110641
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Herb Derby <herb@google.com>
Reviewed-by: Florin Malita <fmalita@chromium.org>
| |
Bug: skia:
Change-Id: If6481d202bf22a95f1dea0c5bf7d84698b63869a
Reviewed-on: https://skia-review.googlesource.com/109241
Commit-Queue: Mike Reed <reed@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Mike Klein <mtklein@chromium.org>
| |
Plenty more to follow-up:
- gradients
- gpu impl
Bug: skia:7638
Change-Id: I8e54fd0e24921f040f178c793b36c7fb855b136e
Reviewed-on: https://skia-review.googlesource.com/107420
Commit-Queue: Mike Reed <reed@google.com>
Reviewed-by: Florin Malita <fmalita@chromium.org>
Reviewed-by: Mike Klein <mtklein@chromium.org>
| |
Same sort of deal as before, now with all three new formats.
While I was at it, I made sure RGBA 8888 and BGRA 8888 both work too.
We don't want the 101010's in lowp, but 888x should be fine.
After looking at the DM images on monitors at work, I decided to
re-enable dither even on 10-bit images.
Looking at the GMs in 888x or 101010x is interesting... I think we must
not be clearing the memory allocated for layers? Seems like we want to
allocate layers as 8888?
Change-Id: I3a85b4f00877792a6425a7e7eb31eacb04ae9218
Reviewed-on: https://skia-review.googlesource.com/101640
Reviewed-by: Mike Reed <reed@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
| |
dx and dy are already size_t, so no need to demote them to int,
and demoting to int gets dicey in terms of wrap-around.
Change-Id: I98eb31ef7aa35fa2c2aa5be27cdc0b4dc7dfd008
Reviewed-on: https://skia-review.googlesource.com/99500
Reviewed-by: Brian Osman <brianosman@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
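The wrap-around concern above can be sketched in isolation. This is a minimal illustration, not code from the change; `survives_int_round_trip` is a hypothetical helper, and it assumes a 32-bit `int`.

```cpp
#include <cstddef>

// Hypothetical check (not from the Skia source): does a size_t coordinate
// survive being demoted to int and promoted back?
inline bool survives_int_round_trip(std::size_t v) {
    // Out-of-range narrowing wraps on two's-complement targets
    // (implementation-defined before C++20).
    int narrowed = static_cast<int>(v);
    return narrowed >= 0 && static_cast<std::size_t>(narrowed) == v;
}
```

Small offsets round-trip, but anything at or above 2^31 does not, which is one reason to leave dx and dy as size_t throughout.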
| |
Bug: skia:7459
Change-Id: Iccc2588f80e22b13ed5d23656b8c75d7b7058a36
Reviewed-on: https://skia-review.googlesource.com/92700
Reviewed-by: Florin Malita <fmalita@chromium.org>
Commit-Queue: Yuqian Li <liyuqian@google.com>
| |
The updated algorithm matches our new GPU algorithm
(https://skia.org/dev/design/conical) and it brings
about 7%-26% speedup. In the next CL, I'll simplify
the GPU code by reusing the CPU code in this CL.
7.20% faster in gradient_conical_clamp_hicolor
8.94% faster in gradient_conicalZero_clamp_hicolor
10.00% faster in gradient_conicalOut_clamp_hicolor
11.72% faster in gradient_conicalOutZero_clamp_hicolor
13.62% faster in gradient_conical_clamp_3color
16.52% faster in gradient_conicalZero_clamp_3color
17.48% faster in gradient_conical_clamp
17.70% faster in gradient_conical_clamp_shallow
20.60% faster in gradient_conicalOut_clamp_3color
20.98% faster in gradient_conicalOutZero_clamp_3color
21.79% faster in gradient_conicalZero_clamp
22.48% faster in gradient_conicalOut_clamp
26.13% faster in gradient_conicalOutZero_clamp
Bug: skia:
Change-Id: Ia159495e1c77658cb28e48c9edf84938464e501c
Reviewed-on: https://skia-review.googlesource.com/90262
Commit-Queue: Yuqian Li <liyuqian@google.com>
Reviewed-by: Mike Klein <mtklein@chromium.org>
| |
It looks like we can specialize hot image shaders into their
own single stages for a good speedup on both x86 and ARM.
I've started here with bilerp_clamp_8888, and will
follow up with bgra and 565, and lowp versions of those,
and probably also the same for nearest neighbors.
All pixels are identical in GMs.
This time, rewrite the loop over sample points to be a little
friendlier to 32-bit x86 code generation. The previous version
created an object file indirection feature build_stages.py can't handle.
CQ_INCLUDE_TRYBOTS=skia.primary:Test-Android-Clang-NexusPlayer-CPU-Moorefield-x86-Release-All-Android,Test-Android-Clang-NexusPlayer-GPU-PowerVR-x86-Release-All-Android
Change-Id: I150b6af4a5b89e009dc04ca69e1857892e173deb
Reviewed-on: https://skia-review.googlesource.com/89180
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
| |
This reverts commit 8a64e52a98d178be13fd137b3b3a3c6aff457d85.
Reason for revert:
Test-Android-Clang-NexusPlayer-CPU-Moorefield-x86-Release-All-Android
Test-Android-Clang-NexusPlayer-GPU-PowerVR-x86-Release-All-Android
Original change's description:
> attempt 2: add experimental bilerp_clamp_8888 stage
>
> It looks like we can specialize hot image shaders into their
> own single stages for a good speedup on both x86 and ARM.
>
> I've started here with bilerp_clamp_8888, and will
> follow up with bgra and 565, and lowp versions of those,
> and probably also the same for nearest neighbors.
>
> All pixels are identical in GMs.
>
> Change-Id: Ib5ed6e528efd9e3eed96ba67d02fbec2e8133a81
> Reviewed-on: https://skia-review.googlesource.com/86860
> Reviewed-by: Mike Klein <mtklein@chromium.org>
> Commit-Queue: Mike Klein <mtklein@chromium.org>
TBR=mtklein@chromium.org,liyuqian@google.com
Change-Id: I34409a7b4aee4fd54baee44f7fc53bd0982500fe
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Reviewed-on: https://skia-review.googlesource.com/86601
Reviewed-by: Mike Klein <mtklein@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
| |
It looks like we can specialize hot image shaders into their
own single stages for a good speedup on both x86 and ARM.
I've started here with bilerp_clamp_8888, and will
follow up with bgra and 565, and lowp versions of those,
and probably also the same for nearest neighbors.
All pixels are identical in GMs.
Change-Id: Ib5ed6e528efd9e3eed96ba67d02fbec2e8133a81
Reviewed-on: https://skia-review.googlesource.com/86860
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
| |
Instead of trying to carefully manage the in-gamut / out-of-gamut state
of the pipeline, let's do what a GPU would do, clamping to representable
range in any float -> integer conversion.
Most effects doing table lookups now clamp themselves internally, and
the store_foo() methods clamp when the destination is fixed point. In
turn the from_srgb() conversions and all future transfer function stages
can care less about this stuff.
If I'm thinking right, the _lowp side of things need not change at all,
and that will soften the performance impact of this change. Anything
that was fast to begin with was probably running a _lowp pipeline.
Bug: skia:7419
Change-Id: Id2e080ac240a97b900a1ac131c85d9e15f70af32
Reviewed-on: https://skia-review.googlesource.com/85740
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Brian Osman <brianosman@google.com>
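A scalar sketch of the clamp-at-conversion idea described above. `to_unorm8` is a hypothetical stand-in for what a store stage does per channel; the real stages are vectorized and are not reproduced here.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Hypothetical per-channel store: clamp to the representable range [0,1]
// at the float -> integer conversion, instead of tracking gamut upstream.
inline uint8_t to_unorm8(float v) {
    // (v > 0) is false for NaN, so NaN lands on 0 in this sketch.
    float clamped = (v > 0.0f) ? std::min(v, 1.0f) : 0.0f;
    // clamped * 255 is in [0, 255], so the narrowing cast is safe.
    return static_cast<uint8_t>(std::lround(clamped * 255.0f));
}
```

With the clamp living in the store, earlier stages are free to produce out-of-range floats without any bookkeeping.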
| |
The path involving approx_log2() and approx_pow2() does not produce 0.
And it's probably not a good idea to think about what approx_log2(0) is
anyway.
Change-Id: If5f48298c5bd5565ae808ebdfbd02649f4dd3046
Reviewed-on: https://skia-review.googlesource.com/85840
Reviewed-by: Brian Osman <brianosman@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
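To see why a log2/pow2 path can miss zero, here is a sketch using crude bit-trick approximations. These are written for this note only and are not Skia's approx_log2() or approx_pow2(); the guarded wrapper is an assumed shape for the fix, not the actual code.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>

// Crude stand-in: reinterpret the float's bits as a fixed-point log2.
inline float approx_log2(float x) {
    uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    return static_cast<float>(bits) * (1.0f / (1 << 23)) - 127.0f;
}

// Crude stand-in: the inverse bit trick.
inline float approx_pow2(float v) {
    // Clamp so the float -> integer conversion below stays in range.
    float biased = std::min(std::max(v + 127.0f, 0.0f), 254.0f);
    uint32_t bits = static_cast<uint32_t>(biased * (1 << 23));
    float out;
    std::memcpy(&out, &bits, sizeof out);
    return out;
}

inline float powf_unguarded(float x, float y) {
    return approx_pow2(approx_log2(x) * y);
}

// Assumed shape of the fix: answer x == 0 directly and skip the log.
inline float powf_guarded(float x, float y) {
    return x == 0.0f ? 0.0f : powf_unguarded(x, y);
}
```

With these stand-ins, powf_unguarded(0, 0.5f) comes out as a tiny positive number rather than 0, while the guarded version returns exactly 0.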
| |
Some interesting things are starting to fall out already,
like the fact that I needed to add a gamma_dst stage to
be able to draw into gamma-transfer-fn destinations.
I've also had to pass an SkAlphaType through to the linearize
functions so that they can maintain premul invariants. I'm not
sure this is actually a good idea... if you can, please double-
check my logic at SkRasterPipeline.cpp:128?
If it's correct logic, I'm going to need to do it all over the place.
But I imagine you don't do this and somehow get away with it.
Change-Id: I42cd9b161b54287d674225103ad9e19f8b388959
Reviewed-on: https://skia-review.googlesource.com/84680
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Brian Osman <brianosman@google.com>
| |
Here's the tiny performance gain:
$python tools/calmbench/calmbench.py firstsecond --extraarg "-m conic"
firstsecond (compared to master) is likely
4.23% faster in gradient_conicalOut_clamp_3color
4.23% faster in gradient_conicalOutZero_clamp_3color
4.79% faster in gradient_conical_clamp_shallow_dither
6.04% faster in gradient_conical_clamp_3color
6.04% faster in gradient_conicalZero_clamp_3color
6.42% faster in gradient_conicalOut_clamp
6.43% faster in gradient_conicalOutZero_clamp
6.74% faster in gradient_conical_clamp
6.98% faster in gradient_conical_clamp_shallow
6.98% faster in gradient_conicalZero_clamp
Bug: skia:
Change-Id: Id74866908b99753ed8b16a657d3f67c9255d0043
Reviewed-on: https://skia-review.googlesource.com/76561
Commit-Queue: Yuqian Li <liyuqian@google.com>
Reviewed-by: Mike Klein <mtklein@chromium.org>
Reviewed-by: Florin Malita <fmalita@chromium.org>
| |
This reverts commit a7fa3377d24643d86117159f8a58d2ee66880a4d.
Reason for revert: lots of crashing GPU bots.
Original change's description:
> add experimental bilerp_clamp_8888 stage
>
> It looks like we can specialize hot image shaders into their
> own single stages for a good speedup on both x86 and ARM.
>
> I've started here with bilerp_clamp_8888, and will
> follow up with bgra and 565, and lowp versions of those,
> and probably also the same for nearest neighbors.
>
> All pixels are identical in GMs.
>
> Change-Id: I2f6995767cd38053d670b8d0bfdb71b687803d70
> Reviewed-on: https://skia-review.googlesource.com/82100
> Reviewed-by: Yuqian Li <liyuqian@google.com>
> Commit-Queue: Mike Klein <mtklein@chromium.org>
TBR=mtklein@chromium.org,mtklein@google.com,liyuqian@google.com
Change-Id: If70abb91b69bcd781e395dd3ac05ff1eebb1169f
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Reviewed-on: https://skia-review.googlesource.com/83340
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
| |
It looks like we can specialize hot image shaders into their
own single stages for a good speedup on both x86 and ARM.
I've started here with bilerp_clamp_8888, and will
follow up with bgra and 565, and lowp versions of those,
and probably also the same for nearest neighbors.
All pixels are identical in GMs.
Change-Id: I2f6995767cd38053d670b8d0bfdb71b687803d70
Reviewed-on: https://skia-review.googlesource.com/82100
Reviewed-by: Yuqian Li <liyuqian@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
| |
705.32 -> 457.76 gradient_sweep_clamp_3color
609.38 -> 345.34 gradient_radial1_clamp_3color
Change-Id: I0165ac8f004ee095ada4f12b33db0a94ae39fca3
Reviewed-on: https://skia-review.googlesource.com/69902
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Florin Malita <fmalita@chromium.org>
| |
Change-Id: I15f83a72645fed0ed8dca9c9aad66c5db5eb247a
Reviewed-on: https://skia-review.googlesource.com/69920
Commit-Queue: Florin Malita <fmalita@chromium.org>
Reviewed-by: Mike Klein <mtklein@chromium.org>
| |
I was originally going to add these to help test a lowp dither, but
after looking at diffs I don't think lowp dither is a good idea.
Non-dithered lowp gradients look fine to me so far.
I'd have done conics, but they scare me.
Change-Id: I8f5e75aec726983186214845ca38cfa0d54496b3
Reviewed-on: https://skia-review.googlesource.com/66460
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Florin Malita <fmalita@chromium.org>
| |
The 4444 image in all_bitmap_configs now draws slightly differently before
and after serialization. (It's serialized as 8888.) Still looks fine.
Change-Id: I1396cf1550b6769a1734ed25d59bd5b1866dfacd
Reviewed-on: https://skia-review.googlesource.com/65960
Reviewed-by: Florin Malita <fmalita@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
| |
1) Move a couple stages around in the enum to places
that make more sense, and gauss_a_to_rgba in the code too.
2) mirror the SkRasterPipeline stage enum with either:
LOWP(st): the stage is implemented in low precision
TODO(st): the stage should be lowp, but isn't
NOPE(st): the stage shouldn't be done in lowp.
3) statically enforce that all stages are covered by one of
LOWP, TODO, or NOPE.
Change-Id: I06c7a7e470663ef73bf652c1b65c0d3c89f0d767
Reviewed-on: https://skia-review.googlesource.com/63800
Reviewed-by: Florin Malita <fmalita@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
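One way the mirroring and static enforcement in points 2) and 3) could look, as a sketch. The stage list is a small hypothetical subset, the LOWP/TODO/NOPE assignments are arbitrary, and the names `ALL_STAGES`, `kMirror` and `mirrors_enum` are invented for this note. Assumes C++14 or later.

```cpp
#include <cstddef>

// Hypothetical subset of stages; the real enum is much longer.
#define ALL_STAGES(M) \
    M(load_565) M(srcover_rgba_8888) M(luminance_to_alpha) M(matrix_scale_translate)

enum Stage {
#define M(st) st,
    ALL_STAGES(M)
#undef M
    kNumStages
};

enum class Lowp { kYes, kTodo, kNope };
struct Entry { Stage stage; Lowp status; };

#define LOWP(st) {st, Lowp::kYes },
#define TODO(st) {st, Lowp::kTodo},
#define NOPE(st) {st, Lowp::kNope},

// The mirror: one line per stage, in enum order. Assignments are arbitrary.
constexpr Entry kMirror[] = {
    LOWP(load_565)
    LOWP(srcover_rgba_8888)
    TODO(luminance_to_alpha)
    NOPE(matrix_scale_translate)
};

#undef LOWP
#undef TODO
#undef NOPE

constexpr std::size_t kMirrorCount = sizeof(kMirror) / sizeof(kMirror[0]);

// True when entry i names stage i, i.e. nothing is missing or reordered.
constexpr bool mirrors_enum() {
    for (std::size_t i = 0; i < kMirrorCount; i++) {
        if (kMirror[i].stage != static_cast<Stage>(i)) { return false; }
    }
    return true;
}

static_assert(kMirrorCount == kNumStages, "every stage must be LOWP, TODO, or NOPE");
static_assert(mirrors_enum(), "mirror must follow the enum order");
```

Adding a stage to the list without classifying it then fails the build rather than silently falling through.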
| |
Chrome generally uses BGRA buffers, so srcover_rgba_8888 isn't really
doing them any good. Probably a good idea to cover both kN32 options
any time we specialize like this?
There's one small diff, so I've lazily guarded this by
SK_LEGACY_LOWP_STAGES, which I want to rebaseline today anyway.
Change-Id: Ice672aa01a3fc83be0798580d6730a54df075478
Reviewed-on: https://skia-review.googlesource.com/63301
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Mike Reed <reed@google.com>
| |
This fills out a couple more matrix and gather stages.
Deletes a not particularly important unit test that was using a
scale matrix in a weird, non-lowp compatible way.
This will require guards for Blink layout tests.
Change-Id: I54cb228ff541f771e8f4758f07d26c5161d48af3
Reviewed-on: https://skia-review.googlesource.com/62520
Reviewed-by: Mike Reed <reed@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
| |
This is a no-op refactor.
It's just always surprised me that the matrix_scale_translate
stage expects [tx ty sx sy], when scales precede the translates
in the names and in both normal row-major and column-major matrix
layouts.
This switches to [sx sy tx ty], scale then translate.
Change-Id: I2d88701121ae8013facd5a28bb0ff520211db5a6
Reviewed-on: https://skia-review.googlesource.com/62541
Reviewed-by: Mike Reed <reed@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
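For reference, a scalar sketch of a stage reading the new [sx sy tx ty] layout. `Point` and `scale_translate` are hypothetical names for this note; the real stage works on vectors of x and y.

```cpp
struct Point { float x, y; };

// m is laid out as [sx, sy, tx, ty]: scale first, then translate.
inline Point scale_translate(Point p, const float m[4]) {
    return { p.x * m[0] + m[2],
             p.y * m[1] + m[3] };
}
```

For example, m = {2, 3, 10, 20} maps (1, 1) to (12, 23).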
| |
We're going to want to assign types to the stages depending on their
inputs and outputs:
GG: x,y -> x,y
GP: x,y -> r,g,b,a
PP: r,g,b,a -> r,g,b,a
(There are a couple other degenerate cases here, where a stage ignores
its inputs or creates no outputs, but we can always just pretend their
null input or output is one type or the other arbitrarily.)
The GG stages will be pretty much entirely float code, and the GP stages
a mix of float math and byte stuff.
Since we've chosen U16 to match our register size in _lowp land,
we'll unpack each F register across two of those for transport between
stages. This is a notional, free operation in both directions.
Change-Id: I605311d0dc327a1a3a9d688173d9498c1658e715
Reviewed-on: https://skia-review.googlesource.com/60800
Reviewed-by: Herb Derby <herb@google.com>
Reviewed-by: Florin Malita <fmalita@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
| |
As this array grows longer it causes troublesome code generation
when we're compiling offline, but it's easy as an argument.
Change-Id: I53526443f534f29d3bff17c3aec24a9e916c9b86
Reviewed-on: https://skia-review.googlesource.com/60564
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Herb Derby <herb@google.com>
| |
It's properly 16 today because of HSW/lowp stages handling 16 pixels at
a time, but it hasn't yet had an effect on lowp so we didn't notice.
As we add lowp shader stages this will start to matter,
so might as well bump it up to 16 now.
(One day _skx lowp stages could bump this up to 32.)
Change-Id: Idd8185c08e12dc657389a35bf659662c9670f98a
Reviewed-on: https://skia-review.googlesource.com/60565
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
| |
There are non-zero values of a that make infinite 1.0f/a.
Let's just check for the real thing we care about, that
scale is finite.
Bug: skia:7123
Change-Id: If97574c9f3f2f0b73c749d0bea9aa19e6114f4d1
Reviewed-on: https://skia-review.googlesource.com/58460
Reviewed-by: Mike Reed <reed@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
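A small sketch of the point above: testing a != 0 is not enough, because a denormal a is non-zero yet 1.0f/a overflows to infinity. `finite_reciprocal_or_zero` and `kSmallestPositive` are hypothetical names, and falling back to 0 is an assumption made for illustration. Assumes default IEEE float behavior (no fast-math, no flush-to-zero).

```cpp
#include <cmath>
#include <limits>

// Smallest positive float: non-zero, but its reciprocal is not finite.
constexpr float kSmallestPositive = std::numeric_limits<float>::denorm_min();

// Check the thing we actually care about: is the scale finite?
inline float finite_reciprocal_or_zero(float a) {
    float scale = 1.0f / a;
    return std::isfinite(scale) ? scale : 0.0f;
}
```

Checking the result also covers a == 0 for free, so a single test handles both cases.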
| |
Today gradient mirror and repeat don't explicitly clamp. They work fine for
normal float values, but blow up with inputs like infinity and NaN, and
those aren't hard to construct with a combination of a funky matrix and
some squaring for xy -> radius.
So explicitly clamp in each of the three matrix tilers.
This should fix the fuzz at the associated bug.
Bug: skia:7093
Change-Id: Idd44e3c7a1ed95e2b1ace8eb953b62eddeb4e00e
Reviewed-on: https://skia-review.googlesource.com/55702
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
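A scalar sketch of why the explicit clamp matters for a repeat tiler. `clamp01` and `repeat_unit` are hypothetical names, and the unit range [0,1] is an assumption for illustration; where NaN lands after the clamp is an artifact of this sketch, not a statement about the real stages.

```cpp
#include <cmath>

// fmin/fmax return the non-NaN operand, so NaN cannot pass through.
inline float clamp01(float v) {
    return std::fmax(0.0f, std::fmin(v, 1.0f));
}

// Repeat tiling: keep the fractional part, then clamp explicitly.
// For t = infinity, t - floor(t) is inf - inf = NaN; the clamp absorbs it.
inline float repeat_unit(float t) {
    return clamp01(t - std::floor(t));
}
```

Ordinary inputs are unaffected by the clamp; only infinity and NaN take the extra path.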
| |
This is something I came up with while writing _lowp.cpp.
This should all be a logical no-op, but there are some code generation
changes. I'm not exactly sure why.
Change-Id: Iaad36b5298b37fe26ebd375a147a48852f98e1e4
Reviewed-on: https://skia-review.googlesource.com/52003
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Herb Derby <herb@google.com>
| |
The lowp start_pipeline() always zeros, and with floats we always zero
when compiled as part of Skia, so this just makes the offline float
consistent with the others.
It's getting confusing to think about which code zeros and which
doesn't, and it'd be nicer to be able to rely on zeros.
This should change code generation only to the start_pipelines in
the .S files.
Change-Id: I1178b83c01e609e40dc7912d8d56df8e36eb339d
Reviewed-on: https://skia-review.googlesource.com/52001
Reviewed-by: Florin Malita <fmalita@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
| |
We look at t to create a mask in mask_2pt_conical_degenerates to be
applied later to the colors after the normal gradient stages have run.
But if t itself is NaN, that will wreak havoc in the normal gradient
stages. So in addition to building the mask to kill off degenerate
colors, let's also set degenerate t to zero, which should be a safe
value.
This fixes the fuzz mentioned in this bug.
BUG=skia:7078
Change-Id: I8301450c707bdbf941abd0339959f9e60d46d955
Reviewed-on: https://skia-review.googlesource.com/52763
Reviewed-by: Florin Malita <fmalita@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
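The two-part fix above, sketched for a single value. `MaskedT` and `mask_degenerate` are hypothetical names, and treating only NaN as degenerate is an assumption; the real stage may test more conditions and works on vectors.

```cpp
#include <cmath>

struct MaskedT {
    float t;     // safe to feed to the normal gradient stages
    bool  keep;  // false: kill this pixel's color afterwards
};

// Build the mask and, at the same time, replace a degenerate t with 0.
inline MaskedT mask_degenerate(float t) {
    const bool degenerate = std::isnan(t);  // assumption: NaN only
    return { degenerate ? 0.0f : t, !degenerate };
}
```

The color for a masked pixel is discarded anyway, so the choice of 0 only needs to be safe, not meaningful.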
| |
All three image tile modes go through exclusive_clamp() and then a
gather today, so we can move the work of exclusive_clamp() into each
gather_ stage, eliminating the need for clamp_{x,y} stages.
Luckily, we've got a convenient place to bottleneck this, ptr_and_ix(),
which works out the pointer and vector of indices to load for gathers.
This deletes SkRasterPipeline_repeat_tiling unit test, which now
no longer exactly makes sense. It tests that repeat_x does that
clamp, but that's now done automatically outside that stage.
Change-Id: I24637ef60921bec7aa00082984c0c6a49dd86ca9
Reviewed-on: https://skia-review.googlesource.com/50260
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Mike Reed <reed@google.com>
Reviewed-by: Florin Malita <fmalita@chromium.org>
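A scalar sketch of folding the exclusive clamp into the index computation. `clamp_to_index` and `gather_index` are hypothetical names; the real ptr_and_ix() works on vectors and its exact signature is not shown in this log.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Clamp v to [0, limit) and truncate to an integer index.
inline int clamp_to_index(float v, int limit) {
    assert(limit > 0);
    // Largest float strictly below limit, so truncation never yields limit.
    const float last = std::nextafter(static_cast<float>(limit), 0.0f);
    // fmax/fmin return the non-NaN operand, so NaN lands on index 0.
    const float clamped = std::fmin(std::fmax(v, 0.0f), last);
    return static_cast<int>(clamped);
}

// One bottleneck for all gathers: clamp both coordinates, then index.
// stride is in pixels per row.
inline std::size_t gather_index(float x, float y, int width, int height,
                                std::size_t stride) {
    const int ix = clamp_to_index(x, width);
    const int iy = clamp_to_index(y, height);
    return static_cast<std::size_t>(iy) * stride + static_cast<std::size_t>(ix);
}
```

Because every gather goes through the same function, no tiling stage has to remember to clamp before it.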
| |
This makes loading into 16-bit channels more natural in _lowp.cpp.
Update a unit test to stop using out-of-range "colors".
Change-Id: I494687aac87948b60a40de447aa1527cf7167b2d
Cq-Include-Trybots: skia.primary:Test-Debian9-Clang-GCE-CPU-AVX2-x86_64-Release-UBSAN_float_cast_overflow
Reviewed-on: https://skia-review.googlesource.com/47580
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
| |
- load_565 allows 565-src sprite blits
- scale_565 / lerp_565 allow subpixel text
- luminance_to_alpha is a color filter, and lets us write grey 8
And update CachedDecodingPixelRefTest with a yet more robust color.
Change-Id: I8af499c43f0f28093744d9c2993af553e36c9526
Reviewed-on: https://skia-review.googlesource.com/47021
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Mike Reed <reed@google.com>
| |
This reverts commit d286bfbd96f8b7ccf1cbce74f07d2f3917dbec30.
Reason for revert:
../../../src/core/SkRasterPipeline.cpp:98:34: runtime error: 4.87906e+09 is outside the range of representable values of type 'unsigned short'
Excellent new bot!
Original change's description:
> Bump stored lowp uniform color to 16-bit storage.
>
> This makes loading into 16-bit channels more natural in _lowp.cpp.
>
> Change-Id: I1ed393873654060ef52f4632d670465528006bbd
> Reviewed-on: https://skia-review.googlesource.com/47261
> Reviewed-by: Mike Reed <reed@google.com>
> Commit-Queue: Mike Klein <mtklein@chromium.org>
TBR=mtklein@chromium.org,reed@google.com
Change-Id: Ia65645c1261a7b31588c4ddaf2b1b3b327d265b0
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Reviewed-on: https://skia-review.googlesource.com/47540
Reviewed-by: Mike Klein <mtklein@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
| |
This makes loading into 16-bit channels more natural in _lowp.cpp.
Change-Id: I1ed393873654060ef52f4632d670465528006bbd
Reviewed-on: https://skia-review.googlesource.com/47261
Reviewed-by: Mike Reed <reed@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
| |
Change-Id: I4d4093fcfc839f6e7468b7d9f89bb903186ab68d
Reviewed-on: https://skia-review.googlesource.com/46761
Reviewed-by: Florin Malita <fmalita@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
| |
Guarding loads of 8-15 with defined(__AVX2__) should prevent errors
like these:
external/skia/src/jumper/SkJumper_stages_lowp.cpp:287:46: error:
'memcpy' called with size bigger than buffer
case 12: memcpy(&v, ptr, 12*sizeof(T)); break;
The loads of 8-15 were of course unreachable, given the &(N-1) == &7.
Change-Id: Ifcb5c177c6909e1df55cb564779a4d6610ff7b32
Reviewed-on: https://skia-review.googlesource.com/46521
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
| |
I have text_16_AA_FF -> 8888 (forcing RP) faster than head now on my
laptop. I'm feeling confident that we can make this perform well.
After looking at performance a bit more today, it looks like everything
is within what I'd consider comparable in performance, especially on
ARM. On x86-64 it looks like big bulk blits get a little slower and
small mask blits get a little faster.
Quality looks good, and maybe improved for 565.
There are fewer platform-specific differences now in _lowp, and I think
they're few enough now that we could even consider completing the
unification by folding the 8-bit and float code together. Rename
"div255()" to "rebias()", slap on a few coats of paint...
Guarded for Chrome with SK_JUMPER_LEGACY_LOWP.
Change-Id: I36309c07cf736f3cb31952cca66030ad56026318
Reviewed-on: https://skia-review.googlesource.com/45982
Reviewed-by: Herb Derby <herb@google.com>
Reviewed-by: Florin Malita <fmalita@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
| |
To continue building stages, update Clang and update your GN args:
$ brew update
$ brew upgrade llvm
$ find . | grep args.gn | xargs sed -ie 's/clang-4.0/clang-5.0/g'
Some interesting codegen changes I noticed:
- ARMv7: generally better register assignment, tighter code
- ARMv7: dropped the 128-bit alignment hint when loading and storing dst "registers",
unclear why.
- HSW: now clearing the destination register before vgatherdps,
to break a dependency on the previous value
Change-Id: I4f804a4cbfcde530fad5ed535438174e852a9593
Reviewed-on: https://skia-review.googlesource.com/44241
Reviewed-by: Florin Malita <fmalita@chromium.org>
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
| |
Because floats are fun, the compiler cannot merge x + 0.5f +
[0,1,2,3,4...] into x + [0.5,1.5,2.5,3.5,4.5,...]. But we can.
Change-Id: I03b46c1ea0653877f35f6c888f29371b5f73d813
Reviewed-on: https://skia-review.googlesource.com/42480
Reviewed-by: Florin Malita <fmalita@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
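Why the compiler has to keep both additions: float addition is not associative, so the two forms can round differently. A sketch with hypothetical helper names; the volatile temporaries only force each intermediate to be rounded to float.

```cpp
// x + 0.5f + i, evaluated as two separate float additions.
inline float center_two_adds(float x, float i) {
    volatile float s = x + 0.5f;
    volatile float r = s + i;
    return r;
}

// x + (i + 0.5f), with the constant folded into the iota by hand.
inline float center_one_add(float x, float i) {
    volatile float k = i + 0.5f;
    volatile float r = x + k;
    return r;
}
```

For everyday coordinates the two agree, but at x = 2^24 with i = 1 the first form gives 16777216 and the second 16777218. Merging by hand is a decision that this difference is acceptable, which the compiler is not allowed to make on its own.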
| |
approx 2.5x faster on arm64 for sprite 8888 --> 565 blits
Bug: skia:
Change-Id: I524f993fee16196385dc07cbec39ef378b1301e5
Reviewed-on: https://skia-review.googlesource.com/41162
Reviewed-by: Florin Malita <fmalita@chromium.org>
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Reed <reed@google.com>
| |
Shouldn't be anything tricky here.
Guarded by SK_JUMPER_LEGACY_X86_8BIT for (Win) layout tests.
Change-Id: I7580c7c18d1721f1301904c049ea2e59e9bda5d9
Reviewed-on: https://skia-review.googlesource.com/40692
Reviewed-by: Herb Derby <herb@google.com>
Reviewed-by: Florin Malita <fmalita@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
| |
I'm not sure why I wrote this to use a Params struct originally, but we
should have plenty of registers in _8bit to pass everything directly and
avoid the stack. Even once we enable the 8-bit pipeline on 32-bit x86,
we'll have 4 general purpose registers and 4 vector registers to use,
precisely what we're using here.
Change-Id: I3e51ab73186edcdcb8bfaa6cc99d9516db7c032a
Reviewed-on: https://skia-review.googlesource.com/40771
Reviewed-by: Florin Malita <fmalita@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
| |
The only reason we were keeping SkJumper_constants around is that it was
hard to get float/integer iota vectors on arm64 without relocations.
Now that we're compiling arm64 normally as part of Skia, we don't have
to worry about relocations.
This means we can kill the struct and stop passing around that pointer.
Change-Id: I013c6a735947f3db2bc87f2bfa38b7520d2e2fce
Reviewed-on: https://skia-review.googlesource.com/40200
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Herb Derby <herb@google.com>
|