| Commit message (Collapse) | Author | Age |
| |
created new file src/core/SkColorData.h for
internal consumption. Note that many of the
functions there are unused as well.
Bug: skia: 6898
R: reed@google.com
Change-Id: I25bfd5a9c21f53558c4ca65a77eb5d322d897c6d
Reviewed-on: https://skia-review.googlesource.com/46848
Commit-Queue: Cary Clark <caryclark@google.com>
Reviewed-by: Mike Reed <reed@google.com>
| |
When Skia's built with an interestingly advanced instruction set
baseline like SSSE3 or SSE4.1, we end up with two distinct copies of
some SkOpts functions, one default in SkOpts.o and one specialization
from SkOpts_{ssse3,sse41}.o. These functions are static, and so are
technically unrelated, even though they're the same code compiled with
the same instructions available. They're going to be identical.
What we want here is to remove static but mark them as inline instead.
In this case inline means "if the linker sees multiple copies of this,
that's cool, just pick any one arbitrarily". That's just what we want.
Now, when I disassemble a binary before and after this change, I do see
the redundant routines removed. However, the file size change is
minimal... I suspect that this must mean the linker has noticed that we
had identical code and physically folded the two logically independent
routines. I don't know how prevalent this optimization is, though, so
it doesn't hurt to give it more of a "one copy please" hint with inline.
There may also be a difference here between the binary size (~unchanged)
and the in-memory layout of that binary?
Change-Id: Id9c8f0ffc84aa1c9a066c22b623d34adab281857
Reviewed-on: https://skia-review.googlesource.com/37501
Commit-Queue: Mike Klein <mtklein@google.com>
Reviewed-by: Ben Wagner <bungeman@google.com>
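
The static-versus-inline distinction described in the message above can be sketched roughly as follows. The function names and the byte-swap body are hypothetical stand-ins, not the actual SkOpts routines, and the sketch assumes the header is compiled identically in every translation unit that includes it (otherwise sharing one inline definition would not be safe).

```cpp
// Hypothetical header shared by several .cpp files (names are illustrative only).
#include <cstdint>

// `static`: every translation unit that includes this header gets its own
// private copy. To the linker these copies are unrelated symbols, so they
// are only merged if the linker happens to fold identical code.
static uint32_t swap_rb_static(uint32_t p) {
    return (p & 0xFF00FF00u) | ((p & 0x00FF0000u) >> 16) | ((p & 0x000000FFu) << 16);
}

// `inline`: every translation unit may still emit a copy, but all copies
// share one name with external linkage, and the linker is allowed to keep
// any single one of them and discard the rest.
inline uint32_t swap_rb_inline(uint32_t p) {
    return (p & 0xFF00FF00u) | ((p & 0x00FF0000u) >> 16) | ((p & 0x000000FFu) << 16);
}
```

Both functions compute the same result; the difference is only in how many copies may survive linking.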
| |
On my Mac (so, immintrin), this improves compile time, both wall and cpu,
by about 16%. To test I ran this on an SSD with files hot in their caches:
$ env CC=/usr/bin/clang CXX=/usr/bin/clang++ ./gyp_skia && \
ninja -C out/Release -t clean && \
time ninja -C out/Release
Before: 159 wall / 3367 cpu
159 wall / 3368 cpu
After: 137 wall / 2860 cpu
136 wall / 2863 cpu
I also tried further refining immintrin down to emmintrin / tmmintrin / smmintrin etc.
That made no significant difference, so I've kept immintrin for its simplicity.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2045633002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
TBR=reed@google.com
No public API changes.
Committed: https://skia.googlesource.com/skia/+/12dfaaa53c23f3d03050bde8f64136ac1f44164a
Review-Url: https://codereview.chromium.org/2045633002
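
As a rough illustration of the idea in the message above, a source file that needs SIMD intrinsics can include the intrinsics header itself, so that files which never touch intrinsics do not pay to parse it. The function `add4_sketch` and the choice of feature macros are assumptions made for this sketch, not code from the change.

```cpp
#include <cstddef>

// Include the heavyweight intrinsics header only in the file that uses it,
// rather than from a widely included common header.
#if defined(__SSE2__)
    #include <immintrin.h>
#elif defined(__ARM_NEON) || defined(__ARM_NEON__)
    #include <arm_neon.h>
#endif

// Hypothetical helper: adds two groups of four floats.
void add4_sketch(const float* a, const float* b, float* out) {
#if defined(__SSE2__)
    _mm_storeu_ps(out, _mm_add_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
#elif defined(__ARM_NEON) || defined(__ARM_NEON__)
    vst1q_f32(out, vaddq_f32(vld1q_f32(a), vld1q_f32(b)));
#else
    // Portable fallback when neither instruction set is available.
    for (size_t i = 0; i < 4; i++) {
        out[i] = a[i] + b[i];
    }
#endif
}
```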
| |
#2 id:20001 of https://codereview.chromium.org/2045633002/ )
Reason for revert:
Appears to have broken the ARMv7 aspect of the Google3 roll in bizarre seemingly-unrelated ways.
Original issue's description:
> Move immintrin/arm_neon includes to where they are used.
>
> On my Mac (so, immintrin), this improves compile time, both wall and cpu,
> by about 16%. To test I ran this on an SSD with files hot in their caches:
>
> $ env CC=/usr/bin/clang CXX=/usr/bin/clang++ ./gyp_skia && \
> ninja -C out/Release -t clean && \
> time ninja -C out/Release
>
> Before: 159 wall / 3367 cpu
> 159 wall / 3368 cpu
>
> After: 137 wall / 2860 cpu
> 136 wall / 2863 cpu
>
> I also tried further refining immintrin down to emmintrin / tmmintrin / smmintrin etc.
> That made no significant difference, so I've kept immintrin for its simplicity.
>
> BUG=skia:
> GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2045633002
> CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
>
> TBR=reed@google.com
> No public API changes.
>
> Committed: https://skia.googlesource.com/skia/+/12dfaaa53c23f3d03050bde8f64136ac1f44164a
TBR=herb@google.com,mtklein@chromium.org
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:
Review-Url: https://codereview.chromium.org/2046213002
| |
On my Mac (so, immintrin), this improves compile time, both wall and cpu,
by about 16%. To test I ran this on an SSD with files hot in their caches:
$ env CC=/usr/bin/clang CXX=/usr/bin/clang++ ./gyp_skia && \
ninja -C out/Release -t clean && \
time ninja -C out/Release
Before: 159 wall / 3367 cpu
159 wall / 3368 cpu
After: 137 wall / 2860 cpu
136 wall / 2863 cpu
I also tried further refining immintrin down to emmintrin / tmmintrin / smmintrin etc.
That made no significant difference, so I've kept immintrin for its simplicity.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2045633002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
TBR=reed@google.com
No public API changes.
Review-Url: https://codereview.chromium.org/2045633002
| |
Swizzle Bench Runtime
Nexus 6P 0.14x
Dell Venue 8 0.12x
CMYK Jpeg Decode Runtime
Nexus 6P 0.81x
Dell Venue 8 0.85x
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1676773003
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review URL: https://codereview.chromium.org/1676773003
| |
Swizzle Runtime (Dell Venue 8)
Unpremul 0.17x
Premul 0.20x
PNG Decode Runtime on GrayAlpha Encoded PNGs (Dell Venue 8)
Unpremul Regular 0.91x
Unpremul ZeroInit 0.92x
Premul Regular 0.84x
Premul ZeroInit 0.85x
BUG=skia:4767
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1666853002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review URL: https://codereview.chromium.org/1666853002
| |
PNG Decode Time Nexus 6P (for a test set of GrayAlpha encoded PNGs)
Regular Unpremul 0.91x
Zero Init Unpremul 0.92x
Regular Premul 0.84x
Zero Init Premul 0.86x
BUG=skia:4767
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1663623002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review URL: https://codereview.chromium.org/1663623002
| |
Swizzle Bench Runtime
Dell Venue 8 0.16x
HP z620 0.47x
PNG Decode Time (for test set of gray encoded PNGs)
Dell Venue 8 0.80x
HP z620 0.96x
BUG=skia:4767
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1657393002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review URL: https://codereview.chromium.org/1657393002
| |
Swizzle Bench Runtime
Nexus 6P 0.32x
Nexus 9 0.89x
PNG Decode Time (for test set of gray encoded PNGs)
Nexus 6P 0.88x
Nexus 9 0.91x
BUG=skia:4767
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1656383002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review URL: https://codereview.chromium.org/1656383002
| |
Swizzle Bench Runtime
z620 0.21x
Dell Venue 8 0.26x
RGB PNGs Decode Runtime
z620 0.91x
Dell Venue 8 0.96x
BUG=skia:4767
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1618603003
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review URL: https://codereview.chromium.org/1618603003
| |
Swizzle Bench Runtime Nexus 6P
xxx_xxxa 0.32x
xxx_swaprb_xxxa 0.31x
Swizzle Bench Runtime Nexus 9
xxx_xxxa 1.11x
xxx_swaprb_xxxa 1.14x
(This is a slowdown.)
Swizzle Bench Runtime Nexus 5
xxx_xxxa 0.12x
xxx_swaprb 0.12x
RGB PNG Decode Runtime
Nexus 6P 0.94x
Nexus 9 0.98x
I don't know how to explain the fact that the Swizzle Bench was
slower on Nexus 9, but the decode times got faster.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1618003002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review URL: https://codereview.chromium.org/1618003002
| |
- Plant a flag to say "pretend all the inputs are RGBA".
This is how libpng thinks.
This is the opposite of what the implementation had been doing,
so I've rearranged everything to reflect the new orientation.
- Rewrite the names to be less mysterious looking. No more Xs.
- Make the src type uniformly const void*, to allow for 888 (RGB) srcs.
This should be performance and pixel neutral. (Please revert if it's not.)
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1626463002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review URL: https://codereview.chromium.org/1626463002
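
A minimal sketch of what the conventions in the message above could look like: the source is always described as R, G, B(, A) bytes in that order, the name spells out the conversion, and the source pointer is `const void*` so that 3-byte RGB input fits the same signature. The function names and bodies here are hypothetical, and the comments about memory order assume a little-endian target.

```cpp
#include <cstdint>

// Hypothetical: source is RGBA bytes, destination pixels are BGRA in memory
// (on little-endian, B ends up in the lowest byte of each 32-bit pixel).
void RGBA_to_BGRA_sketch(uint32_t* dst, const void* vsrc, int count) {
    const uint8_t* src = static_cast<const uint8_t*>(vsrc);
    for (int i = 0; i < count; i++) {
        uint8_t r = src[0], g = src[1], b = src[2], a = src[3];
        src += 4;
        dst[i] = (uint32_t)a << 24 | (uint32_t)r << 16 | (uint32_t)g << 8 | (uint32_t)b;
    }
}

// Hypothetical: source is 3-byte RGB (888); destination is RGBA with opaque alpha.
void RGB_to_RGB1_sketch(uint32_t* dst, const void* vsrc, int count) {
    const uint8_t* src = static_cast<const uint8_t*>(vsrc);
    for (int i = 0; i < count; i++) {
        uint8_t r = src[0], g = src[1], b = src[2];
        src += 3;
        dst[i] = (uint32_t)0xFF << 24 | (uint32_t)b << 16 | (uint32_t)g << 8 | (uint32_t)r;
    }
}
```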
| |
Improves decode performance for RGBA PNGs.
Swizzler Time on z620 (clang):
SwapPremul 0.24x
Premul 0.24x
Swap 0.37x
Decode Time on z620 (clang):
Premul ZeroInit Decodes 0.88x
Unpremul ZeroInit Decodes 0.94x
Premul Regular Decodes 0.91x
Unpremul Regular Decodes 0.98x
Swizzler Time on Dell Venue 8 (gcc):
SwapPremul 0.14x
Premul 0.14x
Swap 0.08x
Decode Time on Dell Venue 8 (gcc):
Premul ZeroInit Decodes 0.79x
Premul Regular Decodes 0.77x
Note:
ZeroInit means memory is zero initialized, and we do not write to
memory for large sections of zero pixels (memory use opt for Android).
BUG=skia:4767
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1601883002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review URL: https://codereview.chromium.org/1601883002
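
The ZeroInit note above can be illustrated with a simplified sketch: when the destination is known to be zero-initialized, zero pixels need not be stored at all, which leaves those memory regions untouched. The function name is hypothetical, and this version checks pixel by pixel for clarity, whereas the note speaks of skipping large sections of zero pixels.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical: copies src into dst, assuming dst already holds zeros.
// Zero pixels are skipped so the corresponding memory is never written.
// Returns the number of pixels actually written.
size_t copy_skipping_zeros_sketch(uint32_t* dst, const uint32_t* src, size_t count) {
    size_t written = 0;
    for (size_t i = 0; i < count; i++) {
        if (src[i] != 0) {
            dst[i] = src[i];
            written++;
        }
    }
    return written;
}
```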
| |
All RGBA, RGBX, BGRA, BGRX routines in SkSwizzler now use fast
options (with the exception of conversions to 565).
Swizzle Time for swap_rb
0.94x Nexus 9
0.81x Nexus 6P
Unpremul Decode Time for RGBA PNGs***
ZeroInit 0.93x Nexus 9
Regular 0.94x Nexus 9
ZeroInit 0.97x Nexus 6P
ZeroInit 0.95x Nexus 6P
***Two Notes:
The improvements here are actually due to taking advantage of
memcpy() (no need to swap, the bytes are already in the proper
order).
ZeroInit skips writing zeros to zero initialized memory. This
is a memory use opt in Android.
BMP decodes should also benefit from these improvements.
I am relying on Gold to help test all possible cases.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1581933006
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review URL: https://codereview.chromium.org/1581933006
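
The first note above, that the unpremul gains come from using memcpy() when the bytes are already in the proper order, might look something like this in outline. The enum and function names are hypothetical and the swap path is a plain scalar loop, not an optimized routine.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical description of a 32-bit pixel's channel order.
enum class ByteOrder { kRGBA, kBGRA };

// Hypothetical row conversion between the two orders (no premultiplication).
void convert_row_sketch(uint32_t* dst, const uint32_t* src, size_t count,
                        ByteOrder srcOrder, ByteOrder dstOrder) {
    if (srcOrder == dstOrder) {
        // Bytes are already in the proper order: a plain copy is enough.
        std::memcpy(dst, src, count * sizeof(uint32_t));
        return;
    }
    // Otherwise swap the first and third channels of every pixel.
    for (size_t i = 0; i < count; i++) {
        uint32_t p = src[i];
        dst[i] = (p & 0xFF00FF00u) | ((p & 0x00FF0000u) >> 16) | ((p & 0x000000FFu) << 16);
    }
}
```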
|
|
Improves decode performance for RGBA encoded PNGs.
Swizzle Time on Nexus 9 (with clang):
SwapPremul 0.44x
Premul 0.44x
Decode Time On Nexus 9 (with clang):
ZeroInit Decodes 0.85x
Regular Decodes 0.86x
Swizzle Time on Nexus 6P (with clang)
SwapPremul 0.14x
Premul 0.14x
Decode Time On Nexus 6P (with clang):
ZeroInit Decodes 0.93x
Regular Decodes 0.95x
Notes:
ZeroInit means memory is zero initialized, and we do not write to
memory for large sections of zero pixels (memory use opt for Android).
A profile on Nexus 9 shows that the premultiplication step of PNG
decoding is now ~5% of decode time (down from ~20%).
BUG=skia:4767
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1577703006
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review URL: https://codereview.chromium.org/1577703006
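
For context on the premultiplication step mentioned above, here is a scalar sketch of premultiplying RGBA bytes into 32-bit pixels. The function name is hypothetical, and the rounding `(x * a + 127) / 255` is one common choice assumed for this sketch; the message does not say which rounding the optimized routines use.

```cpp
#include <cstdint>

// Hypothetical: source is unpremultiplied RGBA bytes; destination is
// premultiplied RGBA packed as a<<24 | b<<16 | g<<8 | r.
void RGBA_to_rgbA_sketch(uint32_t* dst, const void* vsrc, int count) {
    const uint8_t* src = static_cast<const uint8_t*>(vsrc);
    for (int i = 0; i < count; i++) {
        uint32_t r = src[0], g = src[1], b = src[2], a = src[3];
        src += 4;
        // Scale each color channel by alpha/255, rounding to nearest.
        r = (r * a + 127) / 255;
        g = (g * a + 127) / 255;
        b = (b * a + 127) / 255;
        dst[i] = a << 24 | b << 16 | g << 8 | r;
    }
}
```

With this rounding, fully opaque pixels pass through unchanged and fully transparent pixels become zero.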
|