| Commit message (Collapse) | Author | Age |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a reland of 339133f82c30cd3080672db28e6f72c894cba05a
Original change's description:
> start cleaning up non-skcms SkColorSpaceXforms
>
> I think this gets rid of
> - SkColorSpaceXform_Base
> - SkColorSpaceXform_XYZ
> - SkColorSpaceXform_A2B
> and lots of support code. Might be more left to clean up?
>
> Change-Id: I560d974d1e879dfd6a63ee2244a3dd88bd495c8a
> Reviewed-on: https://skia-review.googlesource.com/129512
> Commit-Queue: Brian Osman <brianosman@google.com>
> Auto-Submit: Mike Klein <mtklein@chromium.org>
> Reviewed-by: Brian Osman <brianosman@google.com>
Change-Id: I33ee0d8bcfd72c401823a2e7d5168c9ecc9a5181
Reviewed-on: https://skia-review.googlesource.com/129624
Reviewed-by: Mike Klein <mtklein@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit 339133f82c30cd3080672db28e6f72c894cba05a.
Reason for revert: broke NinePatchDrawableTest.testGetPadding? stranger things have happened.
Original change's description:
> start cleaning up non-skcms SkColorSpaceXforms
>
> I think this gets rid of
> - SkColorSpaceXform_Base
> - SkColorSpaceXform_XYZ
> - SkColorSpaceXform_A2B
> and lots of support code. Might be more left to clean up?
>
> Change-Id: I560d974d1e879dfd6a63ee2244a3dd88bd495c8a
> Reviewed-on: https://skia-review.googlesource.com/129512
> Commit-Queue: Brian Osman <brianosman@google.com>
> Auto-Submit: Mike Klein <mtklein@chromium.org>
> Reviewed-by: Brian Osman <brianosman@google.com>
TBR=mtklein@chromium.org,brianosman@google.com
Change-Id: I9e76195481b8658b34936aeece278d81c286c0fa
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Reviewed-on: https://skia-review.googlesource.com/129680
Reviewed-by: Mike Klein <mtklein@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I think this gets rid of
- SkColorSpaceXform_Base
- SkColorSpaceXform_XYZ
- SkColorSpaceXform_A2B
and lots of support code. Might be more left to clean up?
Change-Id: I560d974d1e879dfd6a63ee2244a3dd88bd495c8a
Reviewed-on: https://skia-review.googlesource.com/129512
Commit-Queue: Brian Osman <brianosman@google.com>
Auto-Submit: Mike Klein <mtklein@chromium.org>
Reviewed-by: Brian Osman <brianosman@google.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a reland of 78cb579f33943421afc8423a39867fcfd69fed44
This time, lowp stages are controlled by !defined(JUMPER_IS_SCALAR), not
by defined(__clang__). The two are usually the same, except when we opt
Clang builds into JUMPER_IS_SCALAR artificially.
Some Google3 builds use compilers old enough that they barf when
compiling our NEON code. It's conceivably also possible to define
JUMPER_IS_SCALAR yourself, but I don't think anyone does that.
Original change's description:
> Reland "make SkJumper stages normal Skia code"
>
> This is a reland of 22e536e3a1a09405d1c0e6f071717a726d86e8d4
>
> Now with fixed #include paths in SkRasterPipeline_opts.h,
> and -ffp-contract=fast for the :hsw target to minimize
> diffs on non-Windows Clang AVX2/AVX-512 bots.
>
> Original change's description:
> > make SkJumper stages normal Skia code
> >
> > Enough clients are using Clang now that we can say, use Clang to build
> > if you want these software pipeline stages to go fast.
> >
> > This lets us drop the offline build aspect of SkJumper stages, instead
> > building as part of Skia using the SkOpts framework.
> >
> > I think everything should work, except I've (temporarily) removed
> > AVX-512 support. I will put this back in a follow up.
> >
> > I have had to drop Windows down to __vectorcall and our narrower
> > stage calling convention that keeps the d-registers on the stack.
> > I tried forcing sysv_abi, but that crashed Clang. :/
> >
> > Added a TODO to up the same narrower stage calling convention
> > for lowp stages... we just *don't* today, for no good reason.
> >
> > Change-Id: Iaaa792ffe4deab3508d2dc5d0008c163c24b3383
> > Reviewed-on: https://skia-review.googlesource.com/110641
> > Commit-Queue: Mike Klein <mtklein@chromium.org>
> > Reviewed-by: Herb Derby <herb@google.com>
> > Reviewed-by: Florin Malita <fmalita@chromium.org>
>
> Change-Id: I44f2c03d33958e3807747e40904b6351957dd448
> Reviewed-on: https://skia-review.googlesource.com/112742
> Reviewed-by: Mike Klein <mtklein@chromium.org>
Change-Id: I3d71197d4bbb19ca4a94961a97fa2e54d5cbfb0d
Reviewed-on: https://skia-review.googlesource.com/112744
Reviewed-by: Mike Klein <mtklein@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit 78cb579f33943421afc8423a39867fcfd69fed44.
Reason for revert: lowp should be controlled by defined(JUMPER_IS_SCALAR), not defined(__clang__). So close.
Original change's description:
> Reland "make SkJumper stages normal Skia code"
>
> This is a reland of 22e536e3a1a09405d1c0e6f071717a726d86e8d4
>
> Now with fixed #include paths in SkRasterPipeline_opts.h,
> and -ffp-contract=fast for the :hsw target to minimize
> diffs on non-Windows Clang AVX2/AVX-512 bots.
>
> Original change's description:
> > make SkJumper stages normal Skia code
> >
> > Enough clients are using Clang now that we can say, use Clang to build
> > if you want these software pipeline stages to go fast.
> >
> > This lets us drop the offline build aspect of SkJumper stages, instead
> > building as part of Skia using the SkOpts framework.
> >
> > I think everything should work, except I've (temporarily) removed
> > AVX-512 support. I will put this back in a follow up.
> >
> > I have had to drop Windows down to __vectorcall and our narrower
> > stage calling convention that keeps the d-registers on the stack.
> > I tried forcing sysv_abi, but that crashed Clang. :/
> >
> > Added a TODO to up the same narrower stage calling convention
> > for lowp stages... we just *don't* today, for no good reason.
> >
> > Change-Id: Iaaa792ffe4deab3508d2dc5d0008c163c24b3383
> > Reviewed-on: https://skia-review.googlesource.com/110641
> > Commit-Queue: Mike Klein <mtklein@chromium.org>
> > Reviewed-by: Herb Derby <herb@google.com>
> > Reviewed-by: Florin Malita <fmalita@chromium.org>
>
> Change-Id: I44f2c03d33958e3807747e40904b6351957dd448
> Reviewed-on: https://skia-review.googlesource.com/112742
> Reviewed-by: Mike Klein <mtklein@chromium.org>
TBR=mtklein@chromium.org,herb@google.com,fmalita@chromium.org
Change-Id: Ie64da98f5187d44e03c0ce05d7cb189d4a6e6663
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Reviewed-on: https://skia-review.googlesource.com/112743
Reviewed-by: Mike Klein <mtklein@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a reland of 22e536e3a1a09405d1c0e6f071717a726d86e8d4
Now with fixed #include paths in SkRasterPipeline_opts.h,
and -ffp-contract=fast for the :hsw target to minimize
diffs on non-Windows Clang AVX2/AVX-512 bots.
Original change's description:
> make SkJumper stages normal Skia code
>
> Enough clients are using Clang now that we can say, use Clang to build
> if you want these software pipeline stages to go fast.
>
> This lets us drop the offline build aspect of SkJumper stages, instead
> building as part of Skia using the SkOpts framework.
>
> I think everything should work, except I've (temporarily) removed
> AVX-512 support. I will put this back in a follow up.
>
> I have had to drop Windows down to __vectorcall and our narrower
> stage calling convention that keeps the d-registers on the stack.
> I tried forcing sysv_abi, but that crashed Clang. :/
>
> Added a TODO to up the same narrower stage calling convention
> for lowp stages... we just *don't* today, for no good reason.
>
> Change-Id: Iaaa792ffe4deab3508d2dc5d0008c163c24b3383
> Reviewed-on: https://skia-review.googlesource.com/110641
> Commit-Queue: Mike Klein <mtklein@chromium.org>
> Reviewed-by: Herb Derby <herb@google.com>
> Reviewed-by: Florin Malita <fmalita@chromium.org>
Change-Id: I44f2c03d33958e3807747e40904b6351957dd448
Reviewed-on: https://skia-review.googlesource.com/112742
Reviewed-by: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit 22e536e3a1a09405d1c0e6f071717a726d86e8d4.
Reason for revert: wrong include path :/
Original change's description:
> make SkJumper stages normal Skia code
>
> Enough clients are using Clang now that we can say, use Clang to build
> if you want these software pipeline stages to go fast.
>
> This lets us drop the offline build aspect of SkJumper stages, instead
> building as part of Skia using the SkOpts framework.
>
> I think everything should work, except I've (temporarily) removed
> AVX-512 support. I will put this back in a follow up.
>
> I have had to drop Windows down to __vectorcall and our narrower
> stage calling convention that keeps the d-registers on the stack.
> I tried forcing sysv_abi, but that crashed Clang. :/
>
> Added a TODO to up the same narrower stage calling convention
> for lowp stages... we just *don't* today, for no good reason.
>
> Change-Id: Iaaa792ffe4deab3508d2dc5d0008c163c24b3383
> Reviewed-on: https://skia-review.googlesource.com/110641
> Commit-Queue: Mike Klein <mtklein@chromium.org>
> Reviewed-by: Herb Derby <herb@google.com>
> Reviewed-by: Florin Malita <fmalita@chromium.org>
TBR=mtklein@chromium.org,herb@google.com,fmalita@chromium.org
Change-Id: I2bdc709c80cdfa6b13ff24e024b3721bef887f46
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Reviewed-on: https://skia-review.googlesource.com/112741
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Enough clients are using Clang now that we can say, use Clang to build
if you want these software pipeline stages to go fast.
This lets us drop the offline build aspect of SkJumper stages, instead
building as part of Skia using the SkOpts framework.
I think everything should work, except I've (temporarily) removed
AVX-512 support. I will put this back in a follow up.
I have had to drop Windows down to __vectorcall and our narrower
stage calling convention that keeps the d-registers on the stack.
I tried forcing sysv_abi, but that crashed Clang. :/
Added a TODO to up the same narrower stage calling convention
for lowp stages... we just *don't* today, for no good reason.
Change-Id: Iaaa792ffe4deab3508d2dc5d0008c163c24b3383
Reviewed-on: https://skia-review.googlesource.com/110641
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Herb Derby <herb@google.com>
Reviewed-by: Florin Malita <fmalita@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Remove SkColorSpaceXform_base from the inheritance chain. Move some
enums only used within XYZ to be class scoped. Eliminate redundant
context structs, and just use the SkJumper types directly.
SkColorSpaceXform_Base still exists, but just to hold a couple static
functions. Trying to untangle the dst gamma table mess next.
Bug: skia:
Change-Id: I6d2b7807c33e61a0d7df74e334356567d8a2c0e0
Reviewed-on: https://skia-review.googlesource.com/112601
Commit-Queue: Brian Osman <brianosman@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Plenty more to follow-up:
- gradients
- gpu impl
Bug: skia:7638
Change-Id: I8e54fd0e24921f040f178c793b36c7fb855b136e
Reviewed-on: https://skia-review.googlesource.com/107420
Commit-Queue: Mike Reed <reed@google.com>
Reviewed-by: Florin Malita <fmalita@chromium.org>
Reviewed-by: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
| |
Bug: skia:7459
Change-Id: Iccc2588f80e22b13ed5d23656b8c75d7b7058a36
Reviewed-on: https://skia-review.googlesource.com/92700
Reviewed-by: Florin Malita <fmalita@chromium.org>
Commit-Queue: Yuqian Li <liyuqian@google.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The updated algorithm matches our new GPU algorithm
(https://skia.org/dev/design/conical) and it brings
about 7%-26% speedup. In the next CL, I'll simplify
the GPU code by reusing the CPU code in this CL.
7.20% faster in gradient_conical_clamp_hicolor
8.94% faster in gradient_conicalZero_clamp_hicolor
10.00% faster in gradient_conicalOut_clamp_hicolor
11.72% faster in gradient_conicalOutZero_clamp_hicolor
13.62% faster in gradient_conical_clamp_3color
16.52% faster in gradient_conicalZero_clamp_3color
17.48% faster in gradient_conical_clamp
17.70% faster in gradient_conical_clamp_shallow
20.60% faster in gradient_conicalOut_clamp_3color
20.98% faster in gradient_conicalOutZero_clamp_3color
21.79% faster in gradient_conicalZero_clamp
22.48% faster in gradient_conicalOut_clamp
26.13% faster in gradient_conicalOutZero_clamp
Bug: skia:
Change-Id: Ia159495e1c77658cb28e48c9edf84938464e501c
Reviewed-on: https://skia-review.googlesource.com/90262
Commit-Queue: Yuqian Li <liyuqian@google.com>
Reviewed-by: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It looks like we can specialize hot image shaders into their
own single stages for a good speedup on both x86 and ARM.
I've started here with bilerp_clamp_8888, and will
follow up with bgra and 565, and lowp versions of those,
and probably also the same for nearest neighbors.
All pixels are identical in GMs.
This time, rewrite the loop over sample points to be a little
friendlier to 32-bit x86 code generation. The previous version
created an object file indirection feature build_stages.py can't handle.
CQ_INCLUDE_TRYBOTS=skia.primary:Test-Android-Clang-NexusPlayer-CPU-Moorefield-x86-Release-All-Android,Test-Android-Clang-NexusPlayer-GPU-PowerVR-x86-Release-All-Android
Change-Id: I150b6af4a5b89e009dc04ca69e1857892e173deb
Reviewed-on: https://skia-review.googlesource.com/89180
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit 8a64e52a98d178be13fd137b3b3a3c6aff457d85.
Reason for revert:
Test-Android-Clang-NexusPlayer-CPU-Moorefield-x86-Release-All-Android
Test-Android-Clang-NexusPlayer-GPU-PowerVR-x86-Release-All-Android
Original change's description:
> attempt 2: add experimental bilerp_clamp_8888 stage
>
> It looks like we can specialize hot image shaders into their
> own single stages for a good speedup on both x86 and ARM.
>
> I've started here with bilerp_clamp_8888, and will
> follow up with bgra and 565, and lowp versions of those,
> and probably also the same for nearest neighbors.
>
> All pixels are identical in GMs.
>
> Change-Id: Ib5ed6e528efd9e3eed96ba67d02fbec2e8133a81
> Reviewed-on: https://skia-review.googlesource.com/86860
> Reviewed-by: Mike Klein <mtklein@chromium.org>
> Commit-Queue: Mike Klein <mtklein@chromium.org>
TBR=mtklein@chromium.org,liyuqian@google.com
Change-Id: I34409a7b4aee4fd54baee44f7fc53bd0982500fe
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Reviewed-on: https://skia-review.googlesource.com/86601
Reviewed-by: Mike Klein <mtklein@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It looks like we can specialize hot image shaders into their
own single stages for a good speedup on both x86 and ARM.
I've started here with bilerp_clamp_8888, and will
follow up with bgra and 565, and lowp versions of those,
and probably also the same for nearest neighbors.
All pixels are identical in GMs.
Change-Id: Ib5ed6e528efd9e3eed96ba67d02fbec2e8133a81
Reviewed-on: https://skia-review.googlesource.com/86860
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
VFPv4 gives us two interesting features:
- FMA
- f16<->f32 conversions
Even without FMAs, NEON still has non-fused MLA instructions. We don't
really care about the fusedness of those mul-adds, so losing FMA here is
kind of no big deal.
We already maintain portable code to do f16<->f32 conversions, so it's
not much of a maintanence hit to use that instead of the native
instructions. To my knowledge software F16 rendering is not a
performance critical mode of operation for any of our users.
This drops our minimum requirement to basically just having NEON.
Devices like the Nexus 7 2012 will now take SkJumper fast paths
instead of portable code. (Though actually, we've only ever
required NEON for _lowp... only the float code also needed vfpv4).
The main file to look at here is actually SkJumper_vectors.h,
where you will see all the substantive changes. The rest just
kind of tears down most of the old complexity, add adds ABI
to put just a little of it back. :)
Change-Id: Ia9237117698729c91e5fa51126baf80748093bf4
Bug: skia:
Reviewed-on: https://skia-review.googlesource.com/83521
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Florin Malita <fmalita@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit a7fa3377d24643d86117159f8a58d2ee66880a4d.
Reason for revert: lots of crashing GPU bots.
Original change's description:
> add experimental bilerp_clamp_8888 stage
>
> It looks like we can specialize hot image shaders into their
> own single stages for a good speedup on both x86 and ARM.
>
> I've started here with bilerp_clamp_8888, and will
> follow up with bgra and 565, and lowp versions of those,
> and probably also the same for nearest neighbors.
>
> All pixels are identical in GMs.
>
> Change-Id: I2f6995767cd38053d670b8d0bfdb71b687803d70
> Reviewed-on: https://skia-review.googlesource.com/82100
> Reviewed-by: Yuqian Li <liyuqian@google.com>
> Commit-Queue: Mike Klein <mtklein@chromium.org>
TBR=mtklein@chromium.org,mtklein@google.com,liyuqian@google.com
Change-Id: If70abb91b69bcd781e395dd3ac05ff1eebb1169f
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Reviewed-on: https://skia-review.googlesource.com/83340
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It looks like we can specialize hot image shaders into their
own single stages for a good speedup on both x86 and ARM.
I've started here with bilerp_clamp_8888, and will
follow up with bgra and 565, and lowp versions of those,
and probably also the same for nearest neighbors.
All pixels are identical in GMs.
Change-Id: I2f6995767cd38053d670b8d0bfdb71b687803d70
Reviewed-on: https://skia-review.googlesource.com/82100
Reviewed-by: Yuqian Li <liyuqian@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It's properly 16 today because of HSW/lowp stages handling 16 pixels at
a time, but it hasn't yet had an effect on lowp so we didn't notice.
As we add lowp shader stages this will start to matter,
so might as well bump it up to 16 now.
(One day _skx lowp stages could bump this up to 32.)
Change-Id: Idd8185c08e12dc657389a35bf659662c9670f98a
Reviewed-on: https://skia-review.googlesource.com/60565
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
All three image tile modes go through exclusive_clamp() and then a
gather today, so we can move the work of exclusive_clamp() into eac
gather_ stage, eliminating the need for clamp_{x,y} stages.
Luckily, we've got a convenient place to bottleneck this, ptr_and_ix(),
which works out the pointer and vector of indices to load for gathers.
This deletes SkRasterPipeline_repeat_tiling unit test, which now
no longer exactly makes sense. It tests that repeat_x does that
clamp, but that's now done automatically outside that stage.
Change-Id: I24637ef60921bec7aa00082984c0c6a49dd86ca9
Reviewed-on: https://skia-review.googlesource.com/50260
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Mike Reed <reed@google.com>
Reviewed-by: Florin Malita <fmalita@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
| |
This makes loading into 16-bit channels more natural in _lowp.cpp.
Update a unit test to stop using out-of-range "colors".
Change-Id: I494687aac87948b60a40de447aa1527cf7167b2d
Cq-Include-Trybots: skia.primary:Test-Debian9-Clang-GCE-CPU-AVX2-x86_64-Release-UBSAN_float_cast_overflow
Reviewed-on: https://skia-review.googlesource.com/47580
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit d286bfbd96f8b7ccf1cbce74f07d2f3917dbec30.
Reason for revert:
../../../src/core/SkRasterPipeline.cpp:98:34: runtime error: 4.87906e+09 is outside the range of representable values of type 'unsigned short'
Excellent new bot!
Original change's description:
> Bump stored lowp uniform color to 16-bit storage.
>
> This makes loading into 16-bit channels more natural in _lowp.cpp.
>
> Change-Id: I1ed393873654060ef52f4632d670465528006bbd
> Reviewed-on: https://skia-review.googlesource.com/47261
> Reviewed-by: Mike Reed <reed@google.com>
> Commit-Queue: Mike Klein <mtklein@chromium.org>
TBR=mtklein@chromium.org,reed@google.com
Change-Id: Ia65645c1261a7b31588c4ddaf2b1b3b327d265b0
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Reviewed-on: https://skia-review.googlesource.com/47540
Reviewed-by: Mike Klein <mtklein@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
|
|
|
|
|
|
|
|
|
| |
This makes loading into 16-bit channels more natural in _lowp.cpp.
Change-Id: I1ed393873654060ef52f4632d670465528006bbd
Reviewed-on: https://skia-review.googlesource.com/47261
Reviewed-by: Mike Reed <reed@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I have text_16_AA_FF -> 8888 (forcing RP) faster than head now on my
laptop. I'm feeling confident that we can make this perform well.
After looking at performance a bit more today, it looks like everything
is within what I'd consider comparable in performance, especially on
ARM. On x86-64 it looks like big bulk blits get a little slower and
small mask blits get a little faster.
Quality looks good, and maybe improved for 565.
There are fewer platform-specific differences now in _lowp, and I think
they're few enough now that we could even consider completing the
unification by folding the 8-bit and float code together. Rename
"div255()" to "rebias()", slap on a few coats of paint...
Guarded for Chrome with SK_JUMPER_LEGACY_LOWP.
Change-Id: I36309c07cf736f3cb31952cca66030ad56026318
Reviewed-on: https://skia-review.googlesource.com/45982
Reviewed-by: Herb Derby <herb@google.com>
Reviewed-by: Florin Malita <fmalita@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The only reason we were keeping SkJumper_constants around is that it was
hard to get float/integer iota vectors on arm64 without relocations.
Now that we're compiling arm64 normally as part of Skia, we don't have
to worry about relocations.
This means we can kill the struct and stop passing around that pointer.
Change-Id: I013c6a735947f3db2bc87f2bfa38b7520d2e2fce
Reviewed-on: https://skia-review.googlesource.com/40200
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Herb Derby <herb@google.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We don't really use anything very ARMv8 specific in the 8-bit NEON
stages, so we can just naturally extend what we're doing to ARMv7 too.
Note that unlike the float stages, we're not requiring VFPv4 either,
just NEON. VFPv4 is for FMA and F16<->F32 conversion, both of which are
unnecessary for the integer pipeline.
GMs and perf improvement are similar to the previous ARMv8 change.
Change-Id: Id618801ea1920564c1deee144a640a4133c4505f
Reviewed-on: https://skia-review.googlesource.com/39840
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Mike Klein <mtklein@chromium.org>
Reviewed-by: Herb Derby <herb@google.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Whether JUMPER is defined is starting to get a little overloaded:
- are we compiling offline (defined) or as part of Skia (!defined)?
- are we using Clang vector extensions (defined) or scalars (!defined)?
This splits JUMPER into these two separate concerns:
- JUMPER_IS_OFFLINE
- JUMPER_IS_SCALAR, JUMPER_IS_NEON, JUMPER_IS_AVX2, etc.
The upshot is that we'll now use Clang vector extensions when available
for our "portable" baseline. On x86-64 and ARMv8 compiled by Clang,
we're guaranteed to pick up SSE2 and NEON respectively. Our -Fast
bot should even get all the way to AVX2.
Another CL will do some refactoring in SkJumper to remove the redundant
copies of guaranteed vector code on x86-64 and ARMv8. I didn't want to
do that here yet to demonstrate that there is zero effect on the .S
files from this CL.
Change-Id: Ib5e8f00b35e8721b2cc7180e294840ffaf9dddce
Reviewed-on: https://skia-review.googlesource.com/39500
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I tried to follow exactly the same strategy as a start.
(Though I did fix the off-by-one dimensions.)
It does rather look like we only need 3D and 4D now
that I've looked at the call sites.
Looks like about a 20% speedup.
Change-Id: I8b1af64750ad1750716ee1ab0767e64591c7206a
Reviewed-on: https://skia-review.googlesource.com/32842
Commit-Queue: Mike Klein <mtklein@google.com>
Reviewed-by: Brian Osman <brianosman@google.com>
|
|
|
|
|
|
|
|
|
| |
This makes loading them much simpler in 8-bit mode.
Change-Id: I35ff34ebd0b93425c4e39e055bf4ade8cf8561e1
Reviewed-on: https://skia-review.googlesource.com/30621
Reviewed-by: Florin Malita <fmalita@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
gather_i8 is now unused, so we can remove it.
That in turn makes the ctable field of SkJumper_GatherCtx unused.
After removing ctable, SkJumper_GatherCtx and SkJumper_PtrStride look
identical, so I've now fused them into SkJumper_MemoryCtx, which will
eventually be used by everything loading from, gathering from, or
storing to memory.
Change-Id: Ia882d2dbd54c9fcf9a8250a1ce83304389dd284a
Reviewed-on: https://skia-review.googlesource.com/24085
Reviewed-by: Mike Reed <reed@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
| |
- Add run_2d(x,y,w,h) and start_pipeline_2d().
- Add and test a 2d-compatible store_8888_2d stage.
Change-Id: Ib9c225d1b8cb40471ae4333df1d06eec4d506f8a
Reviewed-on: https://skia-review.googlesource.com/24401
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Florin Malita <fmalita@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The most interesting part of this is getting the call to start_pipeline
to work. From there it should be just like the other x86 backend.
The 32-bit calling conventions are the same across Linux/Mac and
Windows, so that's nice. The tricky bit is that Linux and Mac
align the stack to 16 bytes, while Windows only to 4. I think
this force_align_arg_pointer attribute on start_pipeline does the trick.
This needs a guard for layout tests.
CQ_INCLUDE_TRYBOTS=skia.primary:Test-Win2k8-MSVC-GCE-CPU-AVX2-x86-Debug;master.tryserver.blink:win10_blink_rel,win7_blink_rel;master.tryserver.chromium.win:win_chromium_rel_ng
Change-Id: Ia74d22e5a4ce5483c9817b8a8f89dd21885bbd14
Reviewed-on: https://skia-review.googlesource.com/20968
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Florin Malita <fmalita@chromium.org>
Reviewed-by: Mike Reed <reed@google.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A couple of annoyances here:
1) the prev vector_scale stage is not usable for masking, as NaN values can propagate through
=> switch to actual masking
2) for the outside case, we must select the min root when the gradient is flipped
=> split into two templated stages (_min, _max)
(I'm not convinced that we need to flip the gradient for RP at all; we can investigate later)
Change-Id: I0283812d613a53124f2987d1aea1f26e4533655e
Reviewed-on: https://skia-review.googlesource.com/21162
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Florin Malita <fmalita@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When the focal point is on the edge of the end circle, the quadratic
equation devolves to linear. Add a stage to handle this case.
As a complication, this case can produce "degenerate" values:
1) t == NaN
2) R(t) < 0
For these, we're supposed to draw transparent black - which means
overwriting the color from the gradient stage. To support this, build
a 0/1 vector mask in the context, and apply it post-gradient-stage.
Change-Id: Ice4e3243abfd8c784bb810f6c310aed7a4ac7dc8
Reviewed-on: https://skia-review.googlesource.com/21111
Commit-Queue: Florin Malita <fmalita@chromium.org>
Reviewed-by: Mike Klein <mtklein@google.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Initial impl, for the well-behaved case (focal point inside).
MBP numbers -
Before:
3365.87 ! gradient_conical_clamp_shallow srgb
3590.88 ! gradient_conical_clamp_shallow_dither srgb
3376.91 ! gradient_conical_clamp_3color srgb
3351.64 ! gradient_conical_clamp_hicolor srgb
3379.35 ! gradient_conical_clamp srgb
After:
648.93 ! gradient_conical_clamp_shallow srgb
665.12 ! gradient_conical_clamp_shallow_dither srgb
773.98 ! gradient_conical_clamp_3color srgb
1175.35 ! gradient_conical_clamp_hicolor srgb
619.17 ! gradient_conical_clamp srgb
Change-Id: I07b22a758363e1f340a6041bca53bdef74229eb9
Reviewed-on: https://skia-review.googlesource.com/20906
Commit-Queue: Florin Malita <fmalita@chromium.org>
Reviewed-by: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
| |
Bug: skia:
Change-Id: I5f88e7923fe204faba8dc5d87454805a4d470d52
Reviewed-on: https://skia-review.googlesource.com/20688
Reviewed-by: Mike Klein <mtklein@chromium.org>
Reviewed-by: Florin Malita <fmalita@chromium.org>
Commit-Queue: Mike Reed <reed@google.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There'll still be a little more refactoring after this, but this is the
main thing we want to do.
This makes y available in a general-purpose register in pipeline stages,
just like x. Stages that need y (seed_shader and dither) can just use
it rather than pulling it off a context pointer. seed_shader loses its
context pointer, and dither's gets simpler.
Change-Id: Ic2d1e13b03fb45b73e308b38aafbb3a14c29cf7f
Reviewed-on: https://skia-review.googlesource.com/18383
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit 5373609d90d8f84b75718b15f3522f9d2f4226cb.
Reason for revert: doesn't look like we'll need this.
Original change's description:
> add knob to turn off fancy SkJumper features
>
> This is a new public API for testing (layout tests).
>
> Change-Id: I10345231bad373c741b1e9656e546000538121b3
> Reviewed-on: https://skia-review.googlesource.com/17712
> Reviewed-by: Florin Malita <fmalita@chromium.org>
> Commit-Queue: Mike Klein <mtklein@chromium.org>
>
Change-Id: Ieed2576d7fc06528384b7476508610e0e29b894f
Reviewed-on: https://skia-review.googlesource.com/17719
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
| |
This is a new public API for testing (layout tests).
Change-Id: I10345231bad373c741b1e9656e546000538121b3
Reviewed-on: https://skia-review.googlesource.com/17712
Reviewed-by: Florin Malita <fmalita@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
| |
Using the float iota was just an expedient to write the stage...
this CL adds the U32 iota that dither really wants.
Change-Id: I7990b10afd0c5277186b6b8e730245d291bcef0c
Reviewed-on: https://skia-review.googlesource.com/17441
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
| |
Change-Id: I17ac13b9d1ea6765e2c1a2b53aa6975eab408856
Reviewed-on: https://skia-review.googlesource.com/16713
Commit-Queue: Herb Derby <herb@google.com>
Reviewed-by: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I think we can dither generically as a pipeline stage.
I'm not married to where the dither happens, or the implementation,
which is mostly cribbed from
https://en.wikipedia.org/wiki/Ordered_dithering.
BUG=skia:3302,skia:6224
Change-Id: If7f6b22a523ca0b34cb03c0aa97b6734c34e0133
Reviewed-on: https://skia-review.googlesource.com/15161
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Florin Malita <fmalita@chromium.org>
Reviewed-by: Herb Derby <herb@google.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For whatever reason, if I swap the condition in the if_then_else tests
from < to >= and swap the then/else values, I can use constants in
hsl_to_rgb. Still don't understand why, but I'll take it. I suspect it
has something to do with SSE, IEEE, and NaN, but I don't care enough to
speculate any more concretely.
This does that, removes C() and _f, updates some comments, and adds a
guard in build_stages.py to yell if it sees trouble like LCPI40_4...
This reminds me to try -ffast-math soon. I think that was mostly held
back by constants.
Change-Id: I3f8a37a4d4642f77422ce3261b750061e9e604a3
Reviewed-on: https://skia-review.googlesource.com/14942
Reviewed-by: Herb Derby <herb@google.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Looks like the color-space images have this well tested (even without
lab_to_xyz) and the diffs look like rounding/FMA.
The old plan to keep loads and stores outside callback was:
1) awkward, with too many pointers and pointers to pointers to track
2) misguided... load and store stages march ahead by x,
working at ptr+0, ptr+8, ptr+16, etc. while callback
always wants to be working at the same spot in the buffer.
I spent a frustrating day in lldb to understood 2). :/
So now the stage always store4's its pixels to a buffer in the context
before the callback, and when the callback returns it load4's them back
from a pointer in the context, defaulting to that same buffer.
Instead of passing a void* into the callback, we pass the context
itself. This lets us subclass the context and add our own data...
C-compatible object-oriented programming.
Change-Id: I7a03439b3abd2efb000a6973631a9336452e9a43
Reviewed-on: https://skia-review.googlesource.com/13985
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I've tried a couple of ideas for approx_powf():
1) accumulate integer powers of x, then 4th roots, then 16th roots
2) continue 1) all the way to 256th roots
3) decompose into pow2 and log2, exploiting IEEE float layout
4) slightly tune constants used in 3)
5) accumulate integer powers of x, then 3+4) with different tuning
6) follow a source online, basically 5 with finesse
7) a new source quoting and improving on the method in 6).
7) seems perfect, enough that maybe we can explore improving its speed
at cost of precision. Might be nice to get rid of those divides. If we
allow a small tolerance (2-5) in our tests, we could use the very simple
fast forms from 3) (e.g. PS 5). I wish I had some images to look at!
Anything involving roots seems to be subverted by poor rsqrt precision.
This change of course affects the pipelines created by the tests for
exponential and full parametric gamma curves. What's less obvious is
that it also means SkJumper can now for the first time run the pipeline
created by the mixed gamma curves test. This means we now need to relax
our tolerance for the table-based channel, just like we did when
implementing table_{r,g,b,a}.
This took me an embarassingly long time to figure out. *face palm*
Change-Id: I451ee3c970a0a4a4e285f8aa8f6ef709a654d247
Reviewed-on: https://skia-review.googlesource.com/13656
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Matt Sarett <msarett@google.com>
Reviewed-by: Herb Derby <herb@google.com>
|
|
|
|
|
|
|
|
|
|
|
| |
In testing, it didn't really seem like we're getting anything out of
doing an interpolated lookup, so this just does a single rounded lookup.
Change-Id: If85ba68675945b442076519dd7f1bf7540d1628d
Reviewed-on: https://skia-review.googlesource.com/13646
Commit-Queue: Matt Sarett <msarett@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Matt Sarett <msarett@google.com>
|
|
|
|
|
|
|
| |
Change-Id: I738da1dac2ef5b74ef34cca14938c0b6cce2fe16
Reviewed-on: https://skia-review.googlesource.com/13596
Reviewed-by: Matt Sarett <msarett@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This lets us temporarily escape to piece of code outside
SkRasterPipeline. We should be able to use this to replace
- parametric_{r,g,b,a}
- table_{r,g,b,a}
- color_lookup_table
- shader_adapter*
* We want to obsolete shader_adapter for other reasons anyway,
but we _could_ replace it with this if we want to.
Change-Id: I42b657b3c19c679796ed1876856cae0c8471307e
Reviewed-on: https://skia-review.googlesource.com/12102
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Herb Derby <herb@google.com>
Reviewed-by: Matt Sarett <msarett@google.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This splits SkImageShaderContext into three parts:
- SkJumper_GatherCtx: always, already done
- SkJumper_SamplerCtx: when bilinear or bicubic
- MiscCtx: other little bits (the matrix, paint color, tiling limits)
Thanks for the snazzy allocator that allows this Herb!
Both SkJumper and SkRasterPipeline_opts.h should be speaking all the
same types now.
I've copied the comments about bilinear/bicubic to SkJumper with little
typo fixes and clarifications.
Change-Id: I4ba7b7c02feba3f65f5292169a22c060e34933c6
Reviewed-on: https://skia-review.googlesource.com/13269
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
| |
SkJumper_GatherCtx is a prefix of SkImageShaderContext, so
this is a no-op. It helps to keep things straight, and I
do want to split apart the GatherCtx from a new SamplingCtx.
Change-Id: I9c5f436b096624c2809e1f810e9bcd6c6b00b883
Reviewed-on: https://skia-review.googlesource.com/13264
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|