| Commit message (Collapse) | Author | Age |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This trims the SkPM4fPriv methods down to just foolproof methods.
(Anything trying to build these itself is probably wrong.)
Things like Sk4f srgb_to_linear(Sk4f) can't really exist anymore,
at least not efficiently, so this refactor is somewhat more invasive
than you might think. Generally this means things using to_4f() are
also making a misstep... that's gone too.
It also does not make sense to try to play games with linear floats
with 255 bias any more. That hack can't work with real sRGB coding.
Rather than update them, I've removed a couple of L32 xfermode fast
paths. I'd even rather drop it entirely...
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2163683002
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/2163683002
|
|
|
|
|
|
|
|
|
|
|
|
| |
I basically just ran a big 5-deep for-loop over the five constants here.
This is the first set of coefficients I found that round trips all bytes.
I suspect there are many such sets.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2162063003
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/2162063003
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This should give us a good baseline to explore using SkRasterPipeline.
A particular colorxform to half float drops from 425us to 282us on my desktop.
Color Xform to Half Float (HP z620)
Original 425us
Trans16 (not 32) 355us
Vector Trans16 378us
Trans16 + Keep Halfs in Vector 335us
Vector Trans16 + Keep Halfs in Vector 282us
Final 282us
Color Xform to Half Float (Nexus 5X)
Original 556us
Final 472us
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2159993003
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/2159993003
|
|
|
|
|
|
|
|
| |
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2147763002
CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/2147763002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I measured relative runtimes on my laptop:
pack_int_uint16_t_ss…
1036 …e41 1x …se3 1.01x …e2_b 3.01x …e2_a 3.02x
I've run into Clang problems with the actual _mm_packus_epi32 instruction, I think,
so I'm going to exercise a little cowardice and leave that option disabled for now.
The ssse3 version probably looks a little faster than it will be in practice.
We'll usually need to load its mask, which here is hoisted out of the bench loop.
The two sse2 variants are close enough in speed that I'm tie breaking them on other
concerns: the <<16, >>16 version doesn't need any scratch registers or to load any
constants, so it wins.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2150343002
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-Fast-Trybot
Review-Url: https://codereview.chromium.org/2150343002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It's become clear we need to sometimes deal with values <0 or >1.
I'm not yet convinced we care about NaN or +-inf.
We had some fairly clever tricks and optimizations here for NEON
and SSE. I've thrown them out in favor of a single implementation.
If we find the specializations mattered, we can certainly figure out
how to extend them to this new range/domain.
This happens to add a vectorized float -> half for ARMv7, which was
missing from the _01 version. (The SSE strategy was not portable to
platforms that flush denorm floats to zero.)
I've tested the full float range for FloatToHalf on my desktop and a 5x.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2145663003
CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-Fast-Trybot
Committed: https://skia.googlesource.com/skia/+/3296bee70d074bb8094b3229dbe12fa016657e90
Review-Url: https://codereview.chromium.org/2145663003
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(patchset #7 id:120001 of https://codereview.chromium.org/2145663003/ )
Reason for revert:
Unit tests fail on Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-Fast
Original issue's description:
> Expand _01 half<->float limitation to _finite. Simplify.
>
> It's become clear we need to sometimes deal with values <0 or >1.
> I'm not yet convinced we care about NaN or +-inf.
>
> We had some fairly clever tricks and optimizations here for NEON
> and SSE. I've thrown them out in favor of a single implementation.
> If we find the specializations mattered, we can certainly figure out
> how to extend them to this new range/domain.
>
> This happens to add a vectorized float -> half for ARMv7, which was
> missing from the _01 version. (The SSE strategy was not portable to
> platforms that flush denorm floats to zero.)
>
> I've tested the full float range for FloatToHalf on my desktop and a 5x.
>
> BUG=skia:
> GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2145663003
> CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-Fast-Trybot
>
> Committed: https://skia.googlesource.com/skia/+/3296bee70d074bb8094b3229dbe12fa016657e90
TBR=msarett@google.com,mtklein@chromium.org
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:
Review-Url: https://codereview.chromium.org/2151023003
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It's become clear we need to sometimes deal with values <0 or >1.
I'm not yet convinced we care about NaN or +-inf.
We had some fairly clever tricks and optimizations here for NEON
and SSE. I've thrown them out in favor of a single implementation.
If we find the specializations mattered, we can certainly figure out
how to extend them to this new range/domain.
This happens to add a vectorized float -> half for ARMv7, which was
missing from the _01 version. (The SSE strategy was not portable to
platforms that flush denorm floats to zero.)
I've tested the full float range for FloatToHalf on my desktop and a 5x.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2145663003
CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/2145663003
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If we make sure all SkOpts functions are static, we can give the namespaces any
name we like. This lets us drop the sk_ prefix and give a real indication of
the default SIMD instruction set rather than just saying sk_default.
Both of these changes help debugger, profiler, and crash report readability.
Perhaps more importantly, keeping these functions static helps prevent
accidentally linking in unused versions of functions, as you see here with
sk_avx::srcover_srgb_srgb().
This requires we update SkBlend_opts tests and benches to call SkOpts functions
through SkOpts rather than declaring the methods externally. In practice this
drops testing of the SSE2 version on machines with SSE4. If we still really
need to test/bench the compile time best SIMD level version of this method
against the runtime detected best, we can include SkBlend_opts.h into the tests
or benches directly, similar to what we do for the trivial, brute-force, or best
non-SIMD versions.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2145833002
CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/2145833002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Re-uploading to see if I can get a CL number < 2^31.
patch from issue 2147533002 at patchset 240001 (http://crrev.com/2147533002#ps240001)
Already reviewed at the other crrev link.
TBR=
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2147533002
CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/2144573004
|
|
|
|
|
|
|
|
| |
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2130183003
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/2130183003
|
|
|
|
|
|
|
|
| |
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2134753006
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/2134753006
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
id:140001 of https://codereview.chromium.org/2133413002/ )
Reason for revert:
Breaking the roll...
https://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_rel_ng/builds/253294/steps/compile%20%28with%20patch%29/logs/stdio
Original issue's description:
> try to speed-up maprect + round2i + contains
>
> We call roundOut in a few places. If we can get SkNx::Ceil we could efficiently implement that as well.
>
>
> BUG=skia:
> GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2133413002
> CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
>
> Committed: https://skia.googlesource.com/skia/+/b42b785d1cbc98bd34aceae338060831b974f9c5
TBR=mtklein@google.com,reed@google.com
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:
Review-Url: https://codereview.chromium.org/2136343002
|
|
|
|
|
|
|
|
|
|
| |
We call roundOut in a few places. If we can get SkNx::Ceil we could efficiently implement that as well.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2133413002
CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/2133413002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This removes the code paths where we make SkCpu::Supports() calls
from within a tight loop. It keeps code paths using SkCpu::Supports()
to choose entire routines from src/opts/.
We can't rely on these hyper-local checks to be hoisted up reliably enough.
It worked pretty well with the first couple platforms we tried (e.g. Clang
on Linux/Mac) but we can't gaurantee it works everywhere.
Further, I'm not able to actually do anything fancy with those tests
outside of x86... I've not found a way to get, say, NEON+F16 conversion
code embedded into ordinary NEON code outside writing then entire function
in external assembly.
This whole idea becomes less important now that we've got a way to chain
separate function calls together efficiently. We can now, e.g., use an
AVX+F16C method to load some pixels, then chain that into an ordinary AVX
method to color filter them.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2138073002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/2138073002
|
|
|
|
|
|
|
|
|
|
|
| |
This refactors opt code to handle arbitrary src and dst
gammas that are specified by tables.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2130013002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/2130013002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This makes them a little easier to use outside SkColorXform code.
I've added some notes about how best to use them and their eccentricities, and added a test.
Ultimately any software sRGB <-> linear conversion should funnel somehow through here.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2128893002
CQ_EXTRA_TRYBOTS=client.skia.android:Test-Android-GCC-Nexus5-CPU-NEON-Arm7-Release-Trybot;client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Committed: https://skia.googlesource.com/skia/+/45e58c8807179638980aae8503573b950b844e4c
Review-Url: https://codereview.chromium.org/2128893002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(patchset #5 id:80001 of https://codereview.chromium.org/2128893002/ )
Reason for revert:
Monotonicity assert is failing on ARM. (Different rsqrt() and invert() precision?) Will investigate a bit tomorrow... might reland with the test TODO.
Original issue's description:
> Move sRGB <-> linear conversion components to their own files.
>
> This makes them a little easier to use outside SkColorXform code.
>
> I've added some notes about how best to use them and their eccentricities, and added a test.
>
> Ultimately any software sRGB <-> linear conversion should funnel somehow through here.
>
> BUG=skia:
> GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2128893002
> CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
>
> Committed: https://skia.googlesource.com/skia/+/45e58c8807179638980aae8503573b950b844e4c
TBR=reed@google.com,msarett@google.com,mtklein@chromium.org
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:
Review-Url: https://codereview.chromium.org/2131793002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This makes them a little easier to use outside SkColorXform code.
I've added some notes about how best to use them and their eccentricities, and added a test.
Ultimately any software sRGB <-> linear conversion should funnel somehow through here.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2128893002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/2128893002
|
|
|
|
|
|
|
| |
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2087343002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/2087343002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Note that baselines have changed a little since I
recently started using clang.
201295.jpg on HP z620 (300x280)
Skia Xform sRGB Dst Before 0.378 ms
Skia Xform sRGB Dst After 0.322 ms
1.17x
Skia Xform 2.2 Dst Before 0.428 ms
Skia Xform 2.2 Dst After 0.395 ms
1.08x
QCMS Xform 0.418 ms
sRGB Dst vs QCMS 1.30x
2.2 Dst vs QCMS 1.06x
--------------------------------------------
Nexus 6P:
Skia Xform sRGB Dst Before 1.58 ms
Skia Xform sRGB Dst After 1.43 ms
Skia Xform 2.2 Dst Before 2.69 ms
Skia Xform 2.2 Dst After 2.62 ms
Dell Venue 8:
Skia Xform sRGB Dst Before 2.78 ms
Skia Xform sRGB Dst After 2.74 ms
Skia Xform 2.2 Dst Before 3.73 ms
Skia Xform 2.2 Dst After 3.64 ms
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2081933005
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/2081933005
|
|
|
|
|
|
|
|
| |
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2084673002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/2084673002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
201295.jpg on HP z620 (300x280)
QCMS Xform 0.418 ms
Skia NEW Xform 0.378 ms
Vs QCMS 1.11x
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2078623002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/2078623002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We're somehow receiving non-premultiplied src inputs like 0x00ffffff to this
SrcOver blend. That's a bug I intend to follow up on. But for a quick
compatibility fix, go back to treating values like 0x00ffffff as transparent,
like we used to before crrev.com/1820313002.
This will not affect the correctness of code paths using properly premultiplied colors.
This should not change performance in any meaningful way.
The SIMD code paths (handling strides of 16 pixels at a time) happen to treat
invalid colors like 0x00fffff as transparent already.
BUG=chromium:611002
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2075173002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/2075173002
|
|
|
|
|
|
|
|
|
|
| |
I have tested that this compiles.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2078913003
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/2078913003
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This code was running on our bots but never in Chrome.
That's a bad state to be in.
My plan here use to be to redesign how our 8888 blits worked in SSE 4.1, mainly
for perfect correctness but also for speed, then to spread what I learned there
to SSE2, AVX+, and NEON.
I have since lost interest in changing any aspect of how our legacy 8888 blits
work. There's not much point in making them a bit or two more correct when the
math is fundamentally wrong.
This will cause many diffs in Gold, none perceptible.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2062853002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Committed: https://skia.googlesource.com/skia/+/6e472093009bf2fc4a8e53010b51040efcb71213
Review-Url: https://codereview.chromium.org/2062853002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
201295.jpg on HP z620
(300x280, most common form of sRGB profile)
QCMS Xform 0.495 ms
Skia Old Xform 0.235 ms
Skia NEW Xform 0.423 ms
Vs Old Code 0.56x
Vs QCMS 1.17x
So to summarize, we are now much slower than before,
but still a bit faster than QCMS. And now we are also
far more accurate than QCMS :).
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2060823003
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/2060823003
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
https://codereview.chromium.org/2062853002/ )
Reason for revert:
Breaks a couple Google3 goldens. I need to rebaseline google3 with -DSK_SUPPORT_LEGACY_X86_BLITS first, then reland this.
Original issue's description:
> Clean up two unlaunched SSE 4.1 8888 blits.
>
> This code was running on our bots but never in Chrome.
> That's a bad state to be in.
>
> My plan here use to be to redesign how our 8888 blits worked in SSE 4.1, mainly
> for perfect correctness but also for speed, then to spread what I learned there
> to SSE2, AVX+, and NEON.
>
> I have since lost interest in changing any aspect of how our legacy 8888 blits
> work. There's not much point in making them a bit or two more correct when the
> math is fundamentally wrong.
>
> This will cause many diffs in Gold, none perceptible.
>
> BUG=skia:
> GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2062853002
> CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
>
> Committed: https://skia.googlesource.com/skia/+/6e472093009bf2fc4a8e53010b51040efcb71213
TBR=reed@google.com
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:
Review-Url: https://codereview.chromium.org/2066453003
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This code was running on our bots but never in Chrome.
That's a bad state to be in.
My plan here use to be to redesign how our 8888 blits worked in SSE 4.1, mainly
for perfect correctness but also for speed, then to spread what I learned there
to SSE2, AVX+, and NEON.
I have since lost interest in changing any aspect of how our legacy 8888 blits
work. There's not much point in making them a bit or two more correct when the
math is fundamentally wrong.
This will cause many diffs in Gold, none perceptible.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2062853002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/2062853002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On my Mac (so, immintrin), this improves compile time, both wall and cpu,
by about 16%. To test I ran this on an SSD with files hot in their caches:
$ env CC=/usr/bin/clang CXX=/usr/bin/clang++ ./gyp_skia && \
ninja -C out/Release -t clean && \
time ninja -C out/Release
Before: 159 wall / 3367 cpu
159 wall / 3368 cpu
After: 137 wall / 2860 cpu
136 wall / 2863 cpu
I also tried further refining immintrin down to emmintrin / tmmintrin / smmintrin etc.
That made no signficant difference, so I've kept immintrin for its simplicity.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2045633002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
TBR=reed@google.com
No public API changes.
Committed: https://skia.googlesource.com/skia/+/12dfaaa53c23f3d03050bde8f64136ac1f44164a
Review-Url: https://codereview.chromium.org/2045633002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Because we recognize commonly used gamma tables and
parameters as 2.2f, about 98% of jpegs with color profiles
will pass through this xform (assuming the dst is also
2.2f). Sample size is 10,322 jpegs.
I won't go crazy with performance numbers because this is
a work in progress, particularly in terms of correctness.
201295.jpg on HP z620
(300x280, most common form of sRGB profile)
Decode Time + QCMS Xform 1.28 ms
QCMS Xform Only 0.495 ms
Decode Time + Skia Opt Xform 1.01 ms
Skia Opt Xform Only 0.235 ms
Decode Time + Xform Speed-up 1.27x
Xform Only Speed-up 2.11x
FWIW, Skia xform time before these optimizations was
41.1 ms. But we expected that code to be slow.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2046013002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/2046013002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
#2 id:20001 of https://codereview.chromium.org/2045633002/ )
Reason for revert:
Appears to have broken the ARMv7 aspect of the Google3 roll in bizarre seemingly-unrelated ways.
Original issue's description:
> Move immintrin/arm_neon includes to where they are used.
>
> On my Mac (so, immintrin), this improves compile time, both wall and cpu,
> by about 16%. To test I ran this on an SSD with files hot in their caches:
>
> $ env CC=/usr/bin/clang CXX=/usr/bin/clang++ ./gyp_skia && \
> ninja -C out/Release -t clean && \
> time ninja -C out/Release
>
> Before: 159 wall / 3367 cpu
> 159 wall / 3368 cpu
>
> After: 137 wall / 2860 cpu
> 136 wall / 2863 cpu
>
> I also tried further refining immintrin down to emmintrin / tmmintrin / smmintrin etc.
> That made no signficant difference, so I've kept immintrin for its simplicity.
>
> BUG=skia:
> GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2045633002
> CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
>
> TBR=reed@google.com
> No public API changes.
>
> Committed: https://skia.googlesource.com/skia/+/12dfaaa53c23f3d03050bde8f64136ac1f44164a
TBR=herb@google.com,mtklein@chromium.org
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:
Review-Url: https://codereview.chromium.org/2046213002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On my Mac (so, immintrin), this improves compile time, both wall and cpu,
by about 16%. To test I ran this on an SSD with files hot in their caches:
$ env CC=/usr/bin/clang CXX=/usr/bin/clang++ ./gyp_skia && \
ninja -C out/Release -t clean && \
time ninja -C out/Release
Before: 159 wall / 3367 cpu
159 wall / 3368 cpu
After: 137 wall / 2860 cpu
136 wall / 2863 cpu
I also tried further refining immintrin down to emmintrin / tmmintrin / smmintrin etc.
That made no signficant difference, so I've kept immintrin for its simplicity.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2045633002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
TBR=reed@google.com
No public API changes.
Review-Url: https://codereview.chromium.org/2045633002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
addition, I found a stall on an execution unit for the lea instruction and rearranged to code to avoid that.
Before
1,362.01 LinearSrcOvericonstrip.pngVSkOptsSSE41
2,132.54 LinearSrcOvericonstrip.pngVSkOptsDefault
1,717.77 LinearSrcOvericonstrip.pngVSkOptsNonSimdCore
3,525.14 LinearSrcOvericonstrip.pngVSkOptsTrivial
11,181.78 LinearSrcOvericonstrip.pngVSkOptsBruteForce
644.77 LinearSrcOvermandrill_512.pngVSkOptsSSE41
682.51 LinearSrcOvermandrill_512.pngVSkOptsDefault
1,169.65 LinearSrcOvermandrill_512.pngVSkOptsNonSimdCore
2,486.45 LinearSrcOvermandrill_512.pngVSkOptsTrivial
11,635.94 LinearSrcOvermandrill_512.pngVSkOptsBruteForce
217.76 LinearSrcOverplane.pngVSkOptsSSE41
437.09 LinearSrcOverplane.pngVSkOptsDefault
275.91 LinearSrcOverplane.pngVSkOptsNonSimdCore
481.70 LinearSrcOverplane.pngVSkOptsTrivial
1,504.66 LinearSrcOverplane.pngVSkOptsBruteForce
323.90 LinearSrcOverbaby_tux.pngVSkOptsSSE41
497.49 LinearSrcOverbaby_tux.pngVSkOptsDefault
456.08 LinearSrcOverbaby_tux.pngVSkOptsNonSimdCore
786.46 LinearSrcOverbaby_tux.pngVSkOptsTrivial
2,554.65 LinearSrcOverbaby_tux.pngVSkOptsBruteForce
484.83 LinearSrcOveryellow_rose.pngVSkOptsSSE41
821.86 LinearSrcOveryellow_rose.pngVSkOptsDefault
655.37 LinearSrcOveryellow_rose.pngVSkOptsNonSimdCore
1,323.80 LinearSrcOveryellow_rose.pngVSkOptsTrivial
5,802.61 LinearSrcOveryellow_rose.pngVSkOptsBruteForce
After changes to sse2 and sse4.1
1,343.12 LinearSrcOvericonstrip.pngVSkOptsSSE41
1,441.17 LinearSrcOvericonstrip.pngVSkOptsDefault
1,679.97 LinearSrcOvericonstrip.pngVSkOptsNonSimdCore
3,481.05 LinearSrcOvericonstrip.pngVSkOptsTrivial
10,979.99 LinearSrcOvericonstrip.pngVSkOptsBruteForce
574.17 LinearSrcOvermandrill_512.pngVSkOptsSSE41
641.40 LinearSrcOvermandrill_512.pngVSkOptsDefault
1,169.44 LinearSrcOvermandrill_512.pngVSkOptsNonSimdCore
2,359.84 LinearSrcOvermandrill_512.pngVSkOptsTrivial
12,106.02 LinearSrcOvermandrill_512.pngVSkOptsBruteForce
209.95 LinearSrcOverplane.pngVSkOptsSSE41
249.12 LinearSrcOverplane.pngVSkOptsDefault
270.36 LinearSrcOverplane.pngVSkOptsNonSimdCore
466.30 LinearSrcOverplane.pngVSkOptsTrivial
1,431.14 LinearSrcOverplane.pngVSkOptsBruteForce
309.70 LinearSrcOverbaby_tux.pngVSkOptsSSE41
354.86 LinearSrcOverbaby_tux.pngVSkOptsDefault
442.69 LinearSrcOverbaby_tux.pngVSkOptsNonSimdCore
764.12 LinearSrcOverbaby_tux.pngVSkOptsTrivial
2,756.16 LinearSrcOverbaby_tux.pngVSkOptsBruteForce
457.70 LinearSrcOveryellow_rose.pngVSkOptsSSE41
500.50 LinearSrcOveryellow_rose.pngVSkOptsDefault
677.84 LinearSrcOveryellow_rose.pngVSkOptsNonSimdCore
1,301.50 LinearSrcOveryellow_rose.pngVSkOptsTrivial
5,786.40 LinearSrcOveryellow_rose.pngVSkOptsBruteForce
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1998373002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/1998373002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
1,370.85 LinearSrcOvericonstrip.pngVSkOptsSSE41
2,359.69 LinearSrcOvericonstrip.pngVSkOptsDefault
1,828.72 LinearSrcOvericonstrip.pngVSkOptsNonSimdCore
3,277.40 LinearSrcOvericonstrip.pngVSkOptsTrivial
9,862.34 LinearSrcOvericonstrip.pngVSkOptsBruteForce
633.55 LinearSrcOvermandrill_512.pngVSkOptsSSE41
684.29 LinearSrcOvermandrill_512.pngVSkOptsDefault
1,201.88 LinearSrcOvermandrill_512.pngVSkOptsNonSimdCore
2,382.63 LinearSrcOvermandrill_512.pngVSkOptsTrivial
10,888.74 LinearSrcOvermandrill_512.pngVSkOptsBruteForce
209.14 LinearSrcOverplane.pngVSkOptsSSE41
562.24 LinearSrcOverplane.pngVSkOptsDefault
272.64 LinearSrcOverplane.pngVSkOptsNonSimdCore
436.46 LinearSrcOverplane.pngVSkOptsTrivial
1,327.23 LinearSrcOverplane.pngVSkOptsBruteForce
318.01 LinearSrcOverbaby_tux.pngVSkOptsSSE41
529.05 LinearSrcOverbaby_tux.pngVSkOptsDefault
441.33 LinearSrcOverbaby_tux.pngVSkOptsNonSimdCore
720.50 LinearSrcOverbaby_tux.pngVSkOptsTrivial
2,191.10 LinearSrcOverbaby_tux.pngVSkOptsBruteForce
479.68 LinearSrcOveryellow_rose.pngVSkOptsSSE41
1,095.03 LinearSrcOveryellow_rose.pngVSkOptsDefault
668.60 LinearSrcOveryellow_rose.pngVSkOptsNonSimdCore
1,257.19 LinearSrcOveryellow_rose.pngVSkOptsTrivial
4,970.25 LinearSrcOveryellow_rose.pngVSkOptsBruteForce
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1939513002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Committed: https://skia.googlesource.com/skia/+/554784cd85029c05d9ed04b1aeb71520d196153a
Committed: https://skia.googlesource.com/skia/+/bc927548db17accec2195af6e15053f7918bb3f5
Review-Url: https://codereview.chromium.org/1939513002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
https://codereview.chromium.org/1939513002/ )
Reason for revert:
broke some debug bots:
Running LinearSrcOvericonstrip.pngVSkOptsSSE41 nonrendering
../../../bench/SkBlend_optsBench.cpp:118: fatal error: ""fPixmap.colorType() == kRGBA_8888_SkColorType""
Original issue's description:
> Add tests and benches to support the sRGB blitter for SkOpts
>
> 1,370.85 LinearSrcOvericonstrip.pngVSkOptsSSE41
> 2,359.69 LinearSrcOvericonstrip.pngVSkOptsDefault
> 1,828.72 LinearSrcOvericonstrip.pngVSkOptsNonSimdCore
> 3,277.40 LinearSrcOvericonstrip.pngVSkOptsTrivial
> 9,862.34 LinearSrcOvericonstrip.pngVSkOptsBruteForce
>
> 633.55 LinearSrcOvermandrill_512.pngVSkOptsSSE41
> 684.29 LinearSrcOvermandrill_512.pngVSkOptsDefault
> 1,201.88 LinearSrcOvermandrill_512.pngVSkOptsNonSimdCore
> 2,382.63 LinearSrcOvermandrill_512.pngVSkOptsTrivial
> 10,888.74 LinearSrcOvermandrill_512.pngVSkOptsBruteForce
>
> 209.14 LinearSrcOverplane.pngVSkOptsSSE41
> 562.24 LinearSrcOverplane.pngVSkOptsDefault
> 272.64 LinearSrcOverplane.pngVSkOptsNonSimdCore
> 436.46 LinearSrcOverplane.pngVSkOptsTrivial
> 1,327.23 LinearSrcOverplane.pngVSkOptsBruteForce
>
> 318.01 LinearSrcOverbaby_tux.pngVSkOptsSSE41
> 529.05 LinearSrcOverbaby_tux.pngVSkOptsDefault
> 441.33 LinearSrcOverbaby_tux.pngVSkOptsNonSimdCore
> 720.50 LinearSrcOverbaby_tux.pngVSkOptsTrivial
> 2,191.10 LinearSrcOverbaby_tux.pngVSkOptsBruteForce
>
> 479.68 LinearSrcOveryellow_rose.pngVSkOptsSSE41
> 1,095.03 LinearSrcOveryellow_rose.pngVSkOptsDefault
> 668.60 LinearSrcOveryellow_rose.pngVSkOptsNonSimdCore
> 1,257.19 LinearSrcOveryellow_rose.pngVSkOptsTrivial
> 4,970.25 LinearSrcOveryellow_rose.pngVSkOptsBruteForce
>
>
>
>
> BUG=skia:
> GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1939513002
> CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
>
> Committed: https://skia.googlesource.com/skia/+/554784cd85029c05d9ed04b1aeb71520d196153a
>
> Committed: https://skia.googlesource.com/skia/+/bc927548db17accec2195af6e15053f7918bb3f5
TBR=mtklein@google.com,fmalita@chromium.org,herb@google.com
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:
Review-Url: https://codereview.chromium.org/1986763002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
1,370.85 LinearSrcOvericonstrip.pngVSkOptsSSE41
2,359.69 LinearSrcOvericonstrip.pngVSkOptsDefault
1,828.72 LinearSrcOvericonstrip.pngVSkOptsNonSimdCore
3,277.40 LinearSrcOvericonstrip.pngVSkOptsTrivial
9,862.34 LinearSrcOvericonstrip.pngVSkOptsBruteForce
633.55 LinearSrcOvermandrill_512.pngVSkOptsSSE41
684.29 LinearSrcOvermandrill_512.pngVSkOptsDefault
1,201.88 LinearSrcOvermandrill_512.pngVSkOptsNonSimdCore
2,382.63 LinearSrcOvermandrill_512.pngVSkOptsTrivial
10,888.74 LinearSrcOvermandrill_512.pngVSkOptsBruteForce
209.14 LinearSrcOverplane.pngVSkOptsSSE41
562.24 LinearSrcOverplane.pngVSkOptsDefault
272.64 LinearSrcOverplane.pngVSkOptsNonSimdCore
436.46 LinearSrcOverplane.pngVSkOptsTrivial
1,327.23 LinearSrcOverplane.pngVSkOptsBruteForce
318.01 LinearSrcOverbaby_tux.pngVSkOptsSSE41
529.05 LinearSrcOverbaby_tux.pngVSkOptsDefault
441.33 LinearSrcOverbaby_tux.pngVSkOptsNonSimdCore
720.50 LinearSrcOverbaby_tux.pngVSkOptsTrivial
2,191.10 LinearSrcOverbaby_tux.pngVSkOptsBruteForce
479.68 LinearSrcOveryellow_rose.pngVSkOptsSSE41
1,095.03 LinearSrcOveryellow_rose.pngVSkOptsDefault
668.60 LinearSrcOveryellow_rose.pngVSkOptsNonSimdCore
1,257.19 LinearSrcOveryellow_rose.pngVSkOptsTrivial
4,970.25 LinearSrcOveryellow_rose.pngVSkOptsBruteForce
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1939513002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Committed: https://skia.googlesource.com/skia/+/554784cd85029c05d9ed04b1aeb71520d196153a
Review-Url: https://codereview.chromium.org/1939513002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit 554784cd85029c05d9ed04b1aeb71520d196153a and
1956b4ae1c9a47833b174f31c054d347ea04db09
Reason for revert - ASAN failures, e.g. from https://uberchromegw.corp.google.com/i/client.skia/builders/Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Debug-MSAN/builds/2233/steps/perf_skia%20on%20Ubuntu/logs/stdio :
Uninitialized value was created by a heap allocation
0 0x7f69aa96f799 in operator new[](unsigned long) /b/work/skia/third_party/externals/llvm/out/../projects/compiler-rt/lib/msan/msan_new_delete.cc:37
1 0x7f69aaa315c1 in SkAutoTArray<unsigned int>::reset(int) /b/work/skia/out/Build-Ubuntu-GCC-x86_64-Debug-MSAN/Debug/../../../include/private/../private/SkTemplates.h:137:22
2 0x7f69aaa34ee9 in LinearSrcOverBench<SrcOverVSkOptsSSE41>::LinearSrcOverBench(char const*) /b/work/skia/out/Build-Ubuntu-GCC-x86_64-Debug-MSAN/Debug/../../../bench/SkBlend_optsBench.cpp:108:9
3 0x7f69aaa30cf2 in $_24::operator()(void*) const /b/work/skia/out/Build-Ubuntu-GCC-x86_64-Debug-MSAN/Debug/../../../bench/SkBlend_optsBench.cpp:167:1
4 0x7f69aaa30c87 in $_24::__invoke(void*) /b/work/skia/out/Build-Ubuntu-GCC-x86_64-Debug-MSAN/Debug/../../../bench/SkBlend_optsBench.cpp:167:1
5 0x7f69aaa68856 in BenchmarkStream::rawNext() /b/work/skia/out/Build-Ubuntu-GCC-x86_64-Debug-MSAN/Debug/../../../bench/nanobench.cpp:653:32
6 0x7f69aaa61467 in BenchmarkStream::next() /b/work/skia/out/Build-Ubuntu-GCC-x86_64-Debug-MSAN/Debug/../../../bench/nanobench.cpp:642:25
7 0x7f69aaa5b703 in nanobench_main() /b/work/skia/out/Build-Ubuntu-GCC-x86_64-Debug-MSAN/Debug/../../../bench/nanobench.cpp:1119:27
8 0x7f69aaa5e10d in main /b/work/skia/out/Build-Ubuntu-GCC-x86_64-Debug-MSAN/Debug/../../../bench/nanobench.cpp:1290:12
9 0x7f69a8c95ec4 in __libc_start_main /build/buildd/eglibc-2.19/csu/libc-start.c:287
TBR=herb@google.com
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1969803002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/1969803002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
1,370.85 LinearSrcOvericonstrip.pngVSkOptsSSE41
2,359.69 LinearSrcOvericonstrip.pngVSkOptsDefault
1,828.72 LinearSrcOvericonstrip.pngVSkOptsNonSimdCore
3,277.40 LinearSrcOvericonstrip.pngVSkOptsTrivial
9,862.34 LinearSrcOvericonstrip.pngVSkOptsBruteForce
633.55 LinearSrcOvermandrill_512.pngVSkOptsSSE41
684.29 LinearSrcOvermandrill_512.pngVSkOptsDefault
1,201.88 LinearSrcOvermandrill_512.pngVSkOptsNonSimdCore
2,382.63 LinearSrcOvermandrill_512.pngVSkOptsTrivial
10,888.74 LinearSrcOvermandrill_512.pngVSkOptsBruteForce
209.14 LinearSrcOverplane.pngVSkOptsSSE41
562.24 LinearSrcOverplane.pngVSkOptsDefault
272.64 LinearSrcOverplane.pngVSkOptsNonSimdCore
436.46 LinearSrcOverplane.pngVSkOptsTrivial
1,327.23 LinearSrcOverplane.pngVSkOptsBruteForce
318.01 LinearSrcOverbaby_tux.pngVSkOptsSSE41
529.05 LinearSrcOverbaby_tux.pngVSkOptsDefault
441.33 LinearSrcOverbaby_tux.pngVSkOptsNonSimdCore
720.50 LinearSrcOverbaby_tux.pngVSkOptsTrivial
2,191.10 LinearSrcOverbaby_tux.pngVSkOptsBruteForce
479.68 LinearSrcOveryellow_rose.pngVSkOptsSSE41
1,095.03 LinearSrcOveryellow_rose.pngVSkOptsDefault
668.60 LinearSrcOveryellow_rose.pngVSkOptsNonSimdCore
1,257.19 LinearSrcOveryellow_rose.pngVSkOptsTrivial
4,970.25 LinearSrcOveryellow_rose.pngVSkOptsBruteForce
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1939513002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/1939513002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It's always nice to kill off a synchronization primitive.
And while less terse, I think this new code reads more clearly.
... and, SkOncePtr's tests were the only thing now using sk_num_cores()
outside of SkTaskGroup, so I've hidden it as static inside SkTaskGroup.cpp.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1953533002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Committed: https://skia.googlesource.com/skia/+/9bd3fc7fa9bb7cad050bd619aa93d4c48ebb5c02
Review-Url: https://codereview.chromium.org/1953533002
|
|
|
|
|
|
|
|
| |
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1952953004
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/1952953004
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
https://codereview.chromium.org/1949313003/ )
Reason for revert:
ok, I have no idea what's going on
Original issue's description:
> Reland of SkOncePtr -> SkOnce (patchset #1 id:1 of https://codereview.chromium.org/1945293004/ )
>
> Reason for revert:
> previous revert unneeded (I think)
>
> Original issue's description:
> > Revert of SkOncePtr -> SkOnce (patchset #4 id:60001 of https://codereview.chromium.org/1953533002/ )
> >
> > Reason for revert:
> > broken the Mac and Linux builders, e.g.:
> >
> > https://build.chromium.org/p/chromium/builders/Mac/builds/15151
> > https://build.chromium.org/p/chromium/builders/Linux%20x64/builds/19052
> >
> > Original issue's description:
> > > SkOncePtr -> SkOnce
> > >
> > > It's always nice to kill off a synchronization primitive.
> > > And while less terse, I think this new code reads more clearly.
> > >
> > > ... and, SkOncePtr's tests were the only thing now using sk_num_cores()
> > > outside of SkTaskGroup, so I've hidden it as static inside SkTaskGroup.cpp.
> > >
> > > BUG=skia:
> > > GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1953533002
> > > CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
> > >
> > > Committed: https://skia.googlesource.com/skia/+/9bd3fc7fa9bb7cad050bd619aa93d4c48ebb5c02
> >
> > TBR=herb@google.com,mtklein@chromium.org
> > # Skipping CQ checks because original CL landed less than 1 days ago.
> > NOPRESUBMIT=true
> > NOTREECHECKS=true
> > NOTRY=true
> > BUG=skia:
> >
> > Committed: https://skia.googlesource.com/skia/+/7eb33da7eede34c050b865dbb1b60c3dcea7191b
>
> TBR=herb@google.com,mtklein@chromium.org
> # Skipping CQ checks because original CL landed less than 1 days ago.
> NOPRESUBMIT=true
> NOTREECHECKS=true
> NOTRY=true
> BUG=skia:
>
> Committed: https://skia.googlesource.com/skia/+/82da9a8aa0ce648f36882830765b42e0ada6c0fa
TBR=herb@google.com,mtklein@chromium.org
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:
Review-Url: https://codereview.chromium.org/1948313002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
https://codereview.chromium.org/1945293004/ )
Reason for revert:
previous revert unneeded (I think)
Original issue's description:
> Revert of SkOncePtr -> SkOnce (patchset #4 id:60001 of https://codereview.chromium.org/1953533002/ )
>
> Reason for revert:
> broken the Mac and Linux builders, e.g.:
>
> https://build.chromium.org/p/chromium/builders/Mac/builds/15151
> https://build.chromium.org/p/chromium/builders/Linux%20x64/builds/19052
>
> Original issue's description:
> > SkOncePtr -> SkOnce
> >
> > It's always nice to kill off a synchronization primitive.
> > And while less terse, I think this new code reads more clearly.
> >
> > ... and, SkOncePtr's tests were the only thing now using sk_num_cores()
> > outside of SkTaskGroup, so I've hidden it as static inside SkTaskGroup.cpp.
> >
> > BUG=skia:
> > GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1953533002
> > CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
> >
> > Committed: https://skia.googlesource.com/skia/+/9bd3fc7fa9bb7cad050bd619aa93d4c48ebb5c02
>
> TBR=herb@google.com,mtklein@chromium.org
> # Skipping CQ checks because original CL landed less than 1 days ago.
> NOPRESUBMIT=true
> NOTREECHECKS=true
> NOTRY=true
> BUG=skia:
>
> Committed: https://skia.googlesource.com/skia/+/7eb33da7eede34c050b865dbb1b60c3dcea7191b
TBR=herb@google.com,mtklein@chromium.org
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:
Review-Url: https://codereview.chromium.org/1949313003
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
https://codereview.chromium.org/1953533002/ )
Reason for revert:
broken the Mac and Linux builders, e.g.:
https://build.chromium.org/p/chromium/builders/Mac/builds/15151
https://build.chromium.org/p/chromium/builders/Linux%20x64/builds/19052
Original issue's description:
> SkOncePtr -> SkOnce
>
> It's always nice to kill off a synchronization primitive.
> And while less terse, I think this new code reads more clearly.
>
> ... and, SkOncePtr's tests were the only thing now using sk_num_cores()
> outside of SkTaskGroup, so I've hidden it as static inside SkTaskGroup.cpp.
>
> BUG=skia:
> GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1953533002
> CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
>
> Committed: https://skia.googlesource.com/skia/+/9bd3fc7fa9bb7cad050bd619aa93d4c48ebb5c02
TBR=herb@google.com,mtklein@chromium.org
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:
Review-Url: https://codereview.chromium.org/1945293004
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It's always nice to kill off a synchronization primitive.
And while less terse, I think this new code reads more clearly.
... and, SkOncePtr's tests were the only thing now using sk_num_cores()
outside of SkTaskGroup, so I've hidden it as static inside SkTaskGroup.cpp.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1953533002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/1953533002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Herb's really starting to get serious about tweaking this, which becomes
a lot easier when you've got SkOpts' runtime CPU detection. We should be
able to optimize this usefully for SSSE3, SSE4.1, AVX, AVX2, or NEON.
(We can of course implement a subset.)
This function takes two counts to give us flexibility to write src patterns:
nsrc >= ndst -> the usual srcover function
nsrc < ndst -> repeat src until it fills dst
nsrc << ndst -> possibly preprocess src into registers
nsrc == 1 -> equivalent of blitrow_color32, srcover_1, etc.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1939783003
Review-Url: https://codereview.chromium.org/1939783003
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- floor with roundps is about 4.5x faster when available
- f16 srcover_n is similar to but a little faster than the version in https://codereview.chromium.org/1884683002. This new one fuses the dst load/stores into the f16<->f32 conversions:
+0x180 movups (%r15), %xmm1
+0x184 vcvtph2ps (%rbx), %xmm2
+0x189 movaps %xmm1, %xmm3
+0x18c shufps $255, %xmm3, %xmm3
+0x190 movaps %xmm0, %xmm4
+0x193 subps %xmm3, %xmm4
+0x196 mulps %xmm2, %xmm4
+0x199 addps %xmm1, %xmm4
+0x19c vcvtps2ph $0, %xmm4, (%rbx)
+0x1a2 addq $16, %r15
+0x1a6 addq $8, %rbx
+0x1aa decl %r14d
+0x1ad jne +0x180
If we decide to land this it'd be a good idea to convert most or all users of SkFloatToHalf_01 and SkHalfToFloat_01 over to the pointer-based versions.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1891513002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Committed: https://skia.googlesource.com/skia/+/cbe3c1af987d622ea67ef560d855b41bb14a0ce9
Committed: https://skia.googlesource.com/skia/+/3faf74b8364491ca806f523fbb1d8a97be592663
Review URL: https://codereview.chromium.org/1891513002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- Moves CPU feature detection to its own file.
- Cleans up some redundant feature detection scattered around core/ and opts/.
- Can now detect a few new CPU features:
* F16C -> Intel f16<->f32 instructions, added between AVX and AVX2
* FMA -> Intel FMA instructions, added at the same time as AVX2
* VFP_FP16 -> ARM f16<->f32 instructions, quite common
* NEON_FMA -> ARM FMA instructions, also quite common
* SSE and SSE3... why not?
This new internal API makes it very cheap to do fine-grained runtime CPU
feature detection. Redundant calls to SkCpu::Supports() should be eliminated
and it's hoistable out of loops. It compiles away entirely when we have the
appropriate instructions available at compile time.
This means we can call it to guard even a little snippet of 1 or 2 instructions
right where needed and let inlining hoist the check (if any at all) up to
somewhere that doesn't hurt performance. I've explained how I made this work
in the private section of the new header.
Once this lands and bakes a bit, I'll start following up with CLs to use it more
and to add a bunch of those little 1-2 instruction snippets we've been wanting,
e.g. cvtps2ph, cvtph2ps, ptest, pmulld, pmovzxbd, blendvps, pshufb, roundps
(for floor) on x86, and vcvt.f32.f16, vcvt.f16.f32 on ARM.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1890483002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Committed: https://skia.googlesource.com/skia/+/872ea29357439f05b1f6995dd300fc054733e607
Review URL: https://codereview.chromium.org/1890483002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
of https://codereview.chromium.org/1890483002/ )
Reason for revert:
many unexpected GM diffs across GPU+CPU configs on Windows (hopefully just text masks on GPU?). seems like we pick a different srcover variant in some places.
Original issue's description:
> Move CPU feature detection to its own file.
>
> - Moves CPU feature detection to its own file.
> - Cleans up some redundant feature detection scattered around core/ and opts/.
> - Can now detect a few new CPU features:
> * F16C -> Intel f16<->f32 instructions, added between AVX and AVX2
> * FMA -> Intel FMA instructions, added at the same time as AVX2
> * VFP_FP16 -> ARM f16<->f32 instructions, quite common
> * NEON_FMA -> ARM FMA instructions, also quite common
> * SSE and SSE3... why not?
>
> This new internal API makes it very cheap to do fine-grained runtime CPU
> feature detection. Redundant calls to SkCpu::Supports() should be eliminated
> and it's hoistable out of loops. It compiles away entirely when we have the
> appropriate instructions available at compile time.
>
> This means we can call it to guard even a little snippet of 1 or 2 instructions
> right where needed and let inlining hoist the check (if any at all) up to
> somewhere that doesn't hurt performance. I've explained how I made this work
> in the private section of the new header.
>
> Once this lands and bakes a bit, I'll start following up with CLs to use it more
> and to add a bunch of those little 1-2 instruction snippets we've been wanting,
> e.g. cvtps2ph, cvtph2ps, ptest, pmulld, pmovzxbd, blendvps, pshufb, roundps
> (for floor) on x86, and vcvt.f32.f16, vcvt.f16.f32 on ARM.
>
> BUG=skia:
> GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1890483002
> CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
>
> Committed: https://skia.googlesource.com/skia/+/872ea29357439f05b1f6995dd300fc054733e607
TBR=fmalita@chromium.org,herb@google.com,reed@google.com,mtklein@chromium.org
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:
Review URL: https://codereview.chromium.org/1892643003
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
https://codereview.chromium.org/1891513002/ )
Reason for revert:
this depends on a CL I want to revert
Original issue's description:
> skcpu: sse4.1 floor, f16c f16<->f32
>
> - floor with roundps is about 4.5x faster when available
> - f16 srcover_n is similar to but a little faster than the version in https://codereview.chromium.org/1884683002. This new one fuses the dst load/stores into the f16<->f32 conversions:
>
> +0x180 movups (%r15), %xmm1
> +0x184 vcvtph2ps (%rbx), %xmm2
> +0x189 movaps %xmm1, %xmm3
> +0x18c shufps $255, %xmm3, %xmm3
> +0x190 movaps %xmm0, %xmm4
> +0x193 subps %xmm3, %xmm4
> +0x196 mulps %xmm2, %xmm4
> +0x199 addps %xmm1, %xmm4
> +0x19c vcvtps2ph $0, %xmm4, (%rbx)
> +0x1a2 addq $16, %r15
> +0x1a6 addq $8, %rbx
> +0x1aa decl %r14d
> +0x1ad jne +0x180
>
> If we decide to land this it'd be a good idea to convert most or all users of SkFloatToHalf_01 and SkHalfToFloat_01 over to the pointer-based versions.
>
> BUG=skia:
> GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1891513002
> CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
>
> Committed: https://skia.googlesource.com/skia/+/cbe3c1af987d622ea67ef560d855b41bb14a0ce9
>
> Committed: https://skia.googlesource.com/skia/+/3faf74b8364491ca806f523fbb1d8a97be592663
TBR=fmalita@chromium.org,herb@google.com,reed@google.com,mtklein@chromium.org
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:
Review URL: https://codereview.chromium.org/1897433002
|