| Commit message (Collapse) | Author | Age |
... | |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Every type now nominally has Load4() and Store4() methods.
The ones that we use are implemented.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=3046
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Change-Id: I7984f0c2063ef8acbc322bd2e968f8f7eaa0d8fd
Reviewed-on: https://skia-review.googlesource.com/3046
Reviewed-by: Matt Sarett <msarett@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Adds Float32 support to SkColorSpaceXform
* Changes API to allows clients to ask for F32, updates clients to
new API
* Adds Sk4f_load4 and Sk4f_store4 to SkNx
* Make use of new xform in SkGr.cpp
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2339233003
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Committed: https://skia.googlesource.com/skia/+/43d6651111374b5d1e4ddd9030dcf079b448ec47
Review-Url: https://codereview.chromium.org/2339233003
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
id:140001 of https://codereview.chromium.org/2339233003/ )
Reason for revert:
Hitting an assert
Original issue's description:
> Support Float32 output from SkColorSpaceXform
>
> * Adds Float32 support to SkColorSpaceXform
> * Changes API to allows clients to ask for F32, updates clients to
> new API
> * Adds Sk4f_load4 and Sk4f_store4 to SkNx
> * Make use of new xform in SkGr.cpp
>
> BUG=skia:
> GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2339233003
> CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
>
> Committed: https://skia.googlesource.com/skia/+/43d6651111374b5d1e4ddd9030dcf079b448ec47
TBR=brianosman@google.com,mtklein@google.com,scroggo@google.com,mtklein@chromium.org,bsalomon@google.com
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:
Review-Url: https://codereview.chromium.org/2347473007
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Adds Float32 support to SkColorSpaceXform
* Changes API to allows clients to ask for F32, updates clients to
new API
* Adds Sk4f_load4 and Sk4f_store4 to SkNx
* Make use of new xform in SkGr.cpp
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2339233003
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/2339233003
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(1) Performance is better or stays the same.
(2) Code is split into functions (RasterPipeline-ish
design). IMO, it's not really more or less readable.
But I think it's now much easier add capabilities,
apply optimizations, or do more refactors. Or to
actually use RasterPipeline. I help back from trying
any of these to try to keep this CL sane.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2194303002
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/2194303002
|
|
|
|
|
|
|
|
|
|
| |
This lets us get at logical >> in a nicely principled way.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2197683002
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/2197683002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Should feel very similar to Sk4h_store4:
NEON uses its native instruction, SSE unpacks manually.
Since we'll have our F16s in 4 Sk4h by the time we're done here,
this also extracts an Sk4h->Sk4f routine from the old uint64_t->Sk4f one.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2184753002
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/2184753002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This should give us a good baseline to explore using SkRasterPipeline.
A particular colorxform to half float drops from 425us to 282us on my desktop.
Color Xform to Half Float (HP z620)
Original 425us
Trans16 (not 32) 355us
Vector Trans16 378us
Trans16 + Keep Halfs in Vector 335us
Vector Trans16 + Keep Halfs in Vector 282us
Final 282us
Color Xform to Half Float (Nexus 5X)
Original 556us
Final 472us
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2159993003
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/2159993003
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I measured relative runtimes on my laptop:
pack_int_uint16_t_ss…
1036 …e41 1x …se3 1.01x …e2_b 3.01x …e2_a 3.02x
I've run into Clang problems with the actual _mm_packus_epi32 instruction, I think,
so I'm going to exercise a little cowardice and leave that option disabled for now.
The ssse3 version probably looks a little faster than it will be in practice.
We'll usually need to load its mask, which here is hoisted out of the bench loop.
The two sse2 variants are close enough in speed that I'm tie breaking them on other
concerns: the <<16, >>16 version doesn't need any scratch registers or to load any
constants, so it wins.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2150343002
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-Fast-Trybot
Review-Url: https://codereview.chromium.org/2150343002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It's become clear we need to sometimes deal with values <0 or >1.
I'm not yet convinced we care about NaN or +-inf.
We had some fairly clever tricks and optimizations here for NEON
and SSE. I've thrown them out in favor of a single implementation.
If we find the specializations mattered, we can certainly figure out
how to extend them to this new range/domain.
This happens to add a vectorized float -> half for ARMv7, which was
missing from the _01 version. (The SSE strategy was not portable to
platforms that flush denorm floats to zero.)
I've tested the full float range for FloatToHalf on my desktop and a 5x.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2145663003
CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-Fast-Trybot
Committed: https://skia.googlesource.com/skia/+/3296bee70d074bb8094b3229dbe12fa016657e90
Review-Url: https://codereview.chromium.org/2145663003
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(patchset #7 id:120001 of https://codereview.chromium.org/2145663003/ )
Reason for revert:
Unit tests fail on Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-Fast
Original issue's description:
> Expand _01 half<->float limitation to _finite. Simplify.
>
> It's become clear we need to sometimes deal with values <0 or >1.
> I'm not yet convinced we care about NaN or +-inf.
>
> We had some fairly clever tricks and optimizations here for NEON
> and SSE. I've thrown them out in favor of a single implementation.
> If we find the specializations mattered, we can certainly figure out
> how to extend them to this new range/domain.
>
> This happens to add a vectorized float -> half for ARMv7, which was
> missing from the _01 version. (The SSE strategy was not portable to
> platforms that flush denorm floats to zero.)
>
> I've tested the full float range for FloatToHalf on my desktop and a 5x.
>
> BUG=skia:
> GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2145663003
> CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-Fast-Trybot
>
> Committed: https://skia.googlesource.com/skia/+/3296bee70d074bb8094b3229dbe12fa016657e90
TBR=msarett@google.com,mtklein@chromium.org
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:
Review-Url: https://codereview.chromium.org/2151023003
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It's become clear we need to sometimes deal with values <0 or >1.
I'm not yet convinced we care about NaN or +-inf.
We had some fairly clever tricks and optimizations here for NEON
and SSE. I've thrown them out in favor of a single implementation.
If we find the specializations mattered, we can certainly figure out
how to extend them to this new range/domain.
This happens to add a vectorized float -> half for ARMv7, which was
missing from the _01 version. (The SSE strategy was not portable to
platforms that flush denorm floats to zero.)
I've tested the full float range for FloatToHalf on my desktop and a 5x.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2145663003
CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/2145663003
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Re-uploading to see if I can get a CL number < 2^31.
patch from issue 2147533002 at patchset 240001 (http://crrev.com/2147533002#ps240001)
Already reviewed at the other crrev link.
TBR=
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2147533002
CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/2144573004
|
|
|
|
|
|
|
|
| |
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2134753006
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/2134753006
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
id:140001 of https://codereview.chromium.org/2133413002/ )
Reason for revert:
Breaking the roll...
https://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_rel_ng/builds/253294/steps/compile%20%28with%20patch%29/logs/stdio
Original issue's description:
> try to speed-up maprect + round2i + contains
>
> We call roundOut in a few places. If we can get SkNx::Ceil we could efficiently implement that as well.
>
>
> BUG=skia:
> GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2133413002
> CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
>
> Committed: https://skia.googlesource.com/skia/+/b42b785d1cbc98bd34aceae338060831b974f9c5
TBR=mtklein@google.com,reed@google.com
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:
Review-Url: https://codereview.chromium.org/2136343002
|
|
|
|
|
|
|
|
|
|
| |
We call roundOut in a few places. If we can get SkNx::Ceil we could efficiently implement that as well.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2133413002
CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/2133413002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This removes the code paths where we make SkCpu::Supports() calls
from within a tight loop. It keeps code paths using SkCpu::Supports()
to choose entire routines from src/opts/.
We can't rely on these hyper-local checks to be hoisted up reliably enough.
It worked pretty well with the first couple platforms we tried (e.g. Clang
on Linux/Mac) but we can't gaurantee it works everywhere.
Further, I'm not able to actually do anything fancy with those tests
outside of x86... I've not found a way to get, say, NEON+F16 conversion
code embedded into ordinary NEON code outside writing then entire function
in external assembly.
This whole idea becomes less important now that we've got a way to chain
separate function calls together efficiently. We can now, e.g., use an
AVX+F16C method to load some pixels, then chain that into an ordinary AVX
method to color filter them.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2138073002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/2138073002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
201295.jpg on HP z620 (300x280)
QCMS Xform 0.418 ms
Skia NEW Xform 0.378 ms
Vs QCMS 1.11x
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2078623002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/2078623002
|
|
|
|
|
|
|
|
|
|
| |
I have tested that this compiles.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2078913003
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/2078913003
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On my Mac (so, immintrin), this improves compile time, both wall and cpu,
by about 16%. To test I ran this on an SSD with files hot in their caches:
$ env CC=/usr/bin/clang CXX=/usr/bin/clang++ ./gyp_skia && \
ninja -C out/Release -t clean && \
time ninja -C out/Release
Before: 159 wall / 3367 cpu
159 wall / 3368 cpu
After: 137 wall / 2860 cpu
136 wall / 2863 cpu
I also tried further refining immintrin down to emmintrin / tmmintrin / smmintrin etc.
That made no signficant difference, so I've kept immintrin for its simplicity.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2045633002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
TBR=reed@google.com
No public API changes.
Committed: https://skia.googlesource.com/skia/+/12dfaaa53c23f3d03050bde8f64136ac1f44164a
Review-Url: https://codereview.chromium.org/2045633002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
#2 id:20001 of https://codereview.chromium.org/2045633002/ )
Reason for revert:
Appears to have broken the ARMv7 aspect of the Google3 roll in bizarre seemingly-unrelated ways.
Original issue's description:
> Move immintrin/arm_neon includes to where they are used.
>
> On my Mac (so, immintrin), this improves compile time, both wall and cpu,
> by about 16%. To test I ran this on an SSD with files hot in their caches:
>
> $ env CC=/usr/bin/clang CXX=/usr/bin/clang++ ./gyp_skia && \
> ninja -C out/Release -t clean && \
> time ninja -C out/Release
>
> Before: 159 wall / 3367 cpu
> 159 wall / 3368 cpu
>
> After: 137 wall / 2860 cpu
> 136 wall / 2863 cpu
>
> I also tried further refining immintrin down to emmintrin / tmmintrin / smmintrin etc.
> That made no signficant difference, so I've kept immintrin for its simplicity.
>
> BUG=skia:
> GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2045633002
> CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
>
> TBR=reed@google.com
> No public API changes.
>
> Committed: https://skia.googlesource.com/skia/+/12dfaaa53c23f3d03050bde8f64136ac1f44164a
TBR=herb@google.com,mtklein@chromium.org
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:
Review-Url: https://codereview.chromium.org/2046213002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On my Mac (so, immintrin), this improves compile time, both wall and cpu,
by about 16%. To test I ran this on an SSD with files hot in their caches:
$ env CC=/usr/bin/clang CXX=/usr/bin/clang++ ./gyp_skia && \
ninja -C out/Release -t clean && \
time ninja -C out/Release
Before: 159 wall / 3367 cpu
159 wall / 3368 cpu
After: 137 wall / 2860 cpu
136 wall / 2863 cpu
I also tried further refining immintrin down to emmintrin / tmmintrin / smmintrin etc.
That made no signficant difference, so I've kept immintrin for its simplicity.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2045633002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
TBR=reed@google.com
No public API changes.
Review-Url: https://codereview.chromium.org/2045633002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- floor with roundps is about 4.5x faster when available
- f16 srcover_n is similar to but a little faster than the version in https://codereview.chromium.org/1884683002. This new one fuses the dst load/stores into the f16<->f32 conversions:
+0x180 movups (%r15), %xmm1
+0x184 vcvtph2ps (%rbx), %xmm2
+0x189 movaps %xmm1, %xmm3
+0x18c shufps $255, %xmm3, %xmm3
+0x190 movaps %xmm0, %xmm4
+0x193 subps %xmm3, %xmm4
+0x196 mulps %xmm2, %xmm4
+0x199 addps %xmm1, %xmm4
+0x19c vcvtps2ph $0, %xmm4, (%rbx)
+0x1a2 addq $16, %r15
+0x1a6 addq $8, %rbx
+0x1aa decl %r14d
+0x1ad jne +0x180
If we decide to land this it'd be a good idea to convert most or all users of SkFloatToHalf_01 and SkHalfToFloat_01 over to the pointer-based versions.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1891513002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Committed: https://skia.googlesource.com/skia/+/cbe3c1af987d622ea67ef560d855b41bb14a0ce9
Committed: https://skia.googlesource.com/skia/+/3faf74b8364491ca806f523fbb1d8a97be592663
Review URL: https://codereview.chromium.org/1891513002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
https://codereview.chromium.org/1891513002/ )
Reason for revert:
this depends on a CL I want to revert
Original issue's description:
> skcpu: sse4.1 floor, f16c f16<->f32
>
> - floor with roundps is about 4.5x faster when available
> - f16 srcover_n is similar to but a little faster than the version in https://codereview.chromium.org/1884683002. This new one fuses the dst load/stores into the f16<->f32 conversions:
>
> +0x180 movups (%r15), %xmm1
> +0x184 vcvtph2ps (%rbx), %xmm2
> +0x189 movaps %xmm1, %xmm3
> +0x18c shufps $255, %xmm3, %xmm3
> +0x190 movaps %xmm0, %xmm4
> +0x193 subps %xmm3, %xmm4
> +0x196 mulps %xmm2, %xmm4
> +0x199 addps %xmm1, %xmm4
> +0x19c vcvtps2ph $0, %xmm4, (%rbx)
> +0x1a2 addq $16, %r15
> +0x1a6 addq $8, %rbx
> +0x1aa decl %r14d
> +0x1ad jne +0x180
>
> If we decide to land this it'd be a good idea to convert most or all users of SkFloatToHalf_01 and SkHalfToFloat_01 over to the pointer-based versions.
>
> BUG=skia:
> GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1891513002
> CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
>
> Committed: https://skia.googlesource.com/skia/+/cbe3c1af987d622ea67ef560d855b41bb14a0ce9
>
> Committed: https://skia.googlesource.com/skia/+/3faf74b8364491ca806f523fbb1d8a97be592663
TBR=fmalita@chromium.org,herb@google.com,reed@google.com,mtklein@chromium.org
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:
Review URL: https://codereview.chromium.org/1897433002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- floor with roundps is about 4.5x faster when available
- f16 srcover_n is similar to but a little faster than the version in https://codereview.chromium.org/1884683002. This new one fuses the dst load/stores into the f16<->f32 conversions:
+0x180 movups (%r15), %xmm1
+0x184 vcvtph2ps (%rbx), %xmm2
+0x189 movaps %xmm1, %xmm3
+0x18c shufps $255, %xmm3, %xmm3
+0x190 movaps %xmm0, %xmm4
+0x193 subps %xmm3, %xmm4
+0x196 mulps %xmm2, %xmm4
+0x199 addps %xmm1, %xmm4
+0x19c vcvtps2ph $0, %xmm4, (%rbx)
+0x1a2 addq $16, %r15
+0x1a6 addq $8, %rbx
+0x1aa decl %r14d
+0x1ad jne +0x180
If we decide to land this it'd be a good idea to convert most or all users of SkFloatToHalf_01 and SkHalfToFloat_01 over to the pointer-based versions.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1891513002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Committed: https://skia.googlesource.com/skia/+/cbe3c1af987d622ea67ef560d855b41bb14a0ce9
Review URL: https://codereview.chromium.org/1891513002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
https://codereview.chromium.org/1891513002/ )
Reason for revert:
Need to change around my #if guards so that clang-cl is treated like GCC and Clang, rather than MSVC.
Original issue's description:
> skcpu: sse4.1 floor, f16c f16<->f32
>
> - floor with roundps is about 4.5x faster when available
> - f16 srcover_n is similar to but a little faster than the version in https://codereview.chromium.org/1884683002. This new one fuses the dst load/stores into the f16<->f32 conversions:
>
> +0x180 movups (%r15), %xmm1
> +0x184 vcvtph2ps (%rbx), %xmm2
> +0x189 movaps %xmm1, %xmm3
> +0x18c shufps $255, %xmm3, %xmm3
> +0x190 movaps %xmm0, %xmm4
> +0x193 subps %xmm3, %xmm4
> +0x196 mulps %xmm2, %xmm4
> +0x199 addps %xmm1, %xmm4
> +0x19c vcvtps2ph $0, %xmm4, (%rbx)
> +0x1a2 addq $16, %r15
> +0x1a6 addq $8, %rbx
> +0x1aa decl %r14d
> +0x1ad jne +0x180
>
> If we decide to land this it'd be a good idea to convert most or all users of SkFloatToHalf_01 and SkHalfToFloat_01 over to the pointer-based versions.
>
> BUG=skia:
> GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1891513002
> CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
>
> Committed: https://skia.googlesource.com/skia/+/cbe3c1af987d622ea67ef560d855b41bb14a0ce9
TBR=fmalita@chromium.org,herb@google.com,reed@google.com,mtklein@chromium.org
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:
Review URL: https://codereview.chromium.org/1891993002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- floor with roundps is about 4.5x faster when available
- f16 srcover_n is similar to but a little faster than the version in https://codereview.chromium.org/1884683002. This new one fuses the dst load/stores into the f16<->f32 conversions:
+0x180 movups (%r15), %xmm1
+0x184 vcvtph2ps (%rbx), %xmm2
+0x189 movaps %xmm1, %xmm3
+0x18c shufps $255, %xmm3, %xmm3
+0x190 movaps %xmm0, %xmm4
+0x193 subps %xmm3, %xmm4
+0x196 mulps %xmm2, %xmm4
+0x199 addps %xmm1, %xmm4
+0x19c vcvtps2ph $0, %xmm4, (%rbx)
+0x1a2 addq $16, %r15
+0x1a6 addq $8, %rbx
+0x1aa decl %r14d
+0x1ad jne +0x180
If we decide to land this it'd be a good idea to convert most or all users of SkFloatToHalf_01 and SkHalfToFloat_01 over to the pointer-based versions.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1891513002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review URL: https://codereview.chromium.org/1891513002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- rearrange a bit
- fewer macros
- hooks for all operators
- add left and right scalar operator overrides
- add +=, &=, <<=, etc.
- add SkNx_split() and SkNx_join()
- simplify the many rsqrt() and invert() options to just what we actually use
This refactoring pointed out that our float <-> int NEON conversions are not specialized, so I've implemented them. It seems nice that this is an error rather than silently falling back to serial code.
It's unclear to me if split/join want to be external, static methods, or non-static methods (SkNx_join(), Sk4f::Join(), x.join()). Time will tell?
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1812233003
CQ_EXTRA_TRYBOTS=client.skia.android:Test-Android-GCC-Nexus5-CPU-NEON-Arm7-Release-Trybot;client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review URL: https://codereview.chromium.org/1812233003
|
|
|
|
|
|
|
|
| |
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1746153002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review URL: https://codereview.chromium.org/1746153002
|
|
|
|
|
|
|
|
|
|
| |
Just some syntax cleanup. No real change: kth<...>() was calling [...] already.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1714363002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review URL: https://codereview.chromium.org/1714363002
|
|
|
|
|
|
|
|
| |
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1707673002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review URL: https://codereview.chromium.org/1707673002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
https://codereview.chromium.org/1690633003/ )
Reason for revert:
Precautionary revert for chromium:586487
Original issue's description:
> SkNx refactoring
>
> - add back Sk4i typedef
> - define SSE casts in terms of Sk4i
> * uint8 <-> float becomes uint8 <-> int <-> float
> * uint16 <-> float becomes uint16 <-> int <-> float
>
> This has the nice side effect of specializing uint8 <-> int
> and uint16 <-> int, which are useful in their own right.
>
> There are many cast specializations now, some of which call each other.
> I have tried to arrange them in some sort of sensible order, subject to
> the constraint that those called must precede those who call.
>
> BUG=skia:
> GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1690633003
> CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
>
> Committed: https://skia.googlesource.com/skia/+/c1eb311f4e98934476f1b2ad5d6de772cf140d60
TBR=herb@google.com,mtklein@chromium.org
# Not skipping CQ checks because original CL landed more than 1 days ago.
BUG=chromium:586487
Review URL: https://codereview.chromium.org/1696903002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- add back Sk4i typedef
- define SSE casts in terms of Sk4i
* uint8 <-> float becomes uint8 <-> int <-> float
* uint16 <-> float becomes uint16 <-> int <-> float
This has the nice side effect of specializing uint8 <-> int
and uint16 <-> int, which are useful in their own right.
There are many cast specializations now, some of which call each other.
I have tried to arrange them in some sort of sensible order, subject to
the constraint that those called must precede those who call.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1690633003
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review URL: https://codereview.chromium.org/1690633003
|
|
|
|
|
|
|
|
|
|
| |
About 25% faster on both x86 and ARMv7.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1682953002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review URL: https://codereview.chromium.org/1682953002
|
|
|
|
|
|
|
|
|
|
|
|
| |
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1685773002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Committed: https://skia.googlesource.com/skia/+/86c6c4935171a1d2d6a9ffbff37ec6dac1326614
CQ_EXTRA_TRYBOTS=client.skia.android:Test-Android-GCC-Nexus7-GPU-Tegra3-Arm7-Release-Trybot,Test-Android-GCC-Nexus9-GPU-TegraK1-Arm64-Release-Trybot
Review URL: https://codereview.chromium.org/1685773002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
https://codereview.chromium.org/1685773002/ )
Reason for revert:
build break must be this, right?
Original issue's description:
> Sk4f: add floor()
>
> BUG=skia:
> GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1685773002
> CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
>
> Committed: https://skia.googlesource.com/skia/+/86c6c4935171a1d2d6a9ffbff37ec6dac1326614
TBR=herb@google.com,mtklein@chromium.org
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:
Review URL: https://codereview.chromium.org/1679343004
|
|
|
|
|
|
|
|
| |
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1685773002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review URL: https://codereview.chromium.org/1685773002
|
|
|
|
|
|
|
|
| |
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1679343003
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review URL: https://codereview.chromium.org/1679343003
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- trim unused specializations (Sk4i, Sk2d) and apis (SkNx_dup)
- expand apis a little
* v[0] == v.kth<0>()
* SkNx_shuffle can now convert to different-sized vectors, e.g. Sk2f <-> Sk4f
- remove anonymous namespace
I believe it's safe to remove the anonymous namespace right now.
We're worried about violating the One Definition Rule; the anonymous namespace protected us from that.
In Release builds, this is mostly moot, as everything tends to inline completely.
In Debug builds, violating the ODR is at worst an inconvenience, time spent trying to figure out why the bot is broken.
Now that we're building with SSE2/NEON everywhere, very few bots have even a chance about getting confused by two definitions of the same type or function. Where we do compile variants depending on, e.g., SSSE3, we do so in static inline functions. These are not subject to the ODR.
I plan to follow up with a tedious .kth<...>() -> [...] auto-replace.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1683543002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review URL: https://codereview.chromium.org/1683543002
|
|
|
|
|
|
|
|
|
|
| |
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1679713002
patch from issue 1679713002 at patchset 1 (http://crrev.com/1679713002#ps1)
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review URL: https://codereview.chromium.org/1676893002
|
|
|
|
|
|
|
|
|
|
|
| |
- generalizes the bench to float <--> {u8,u16}
- must remember to implement NEON version at some point
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1676853002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review URL: https://codereview.chromium.org/1676853002
|
|
|
|
|
|
|
|
|
|
| |
This means we can remove a lot of explicit casts in code that uses SkNx.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1650653002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review URL: https://codereview.chromium.org/1650653002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
All sizes approximately twice as fast.
Before:
micros bench
1649.35 mipmap_build_512x512 nonrendering
1824.42 mipmap_build_511x512 nonrendering
2100.66 ? mipmap_build_512x511 nonrendering
2375.94 mipmap_build_511x511 nonrendering
After:
micros bench
730.32 ! mipmap_build_512x512 nonrendering
922.12 mipmap_build_511x512 nonrendering
999.07 mipmap_build_512x511 nonrendering
1342.93 ! mipmap_build_511x511 nonrendering
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1606013003
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Committed: https://skia.googlesource.com/skia/+/3bd5aba2a0e165997f683cf3aa306661e71464f6
Review URL: https://codereview.chromium.org/1606013003
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
https://codereview.chromium.org/1606013003/ )
Reason for revert:
Paranoid revert to see if it helps skia:4823
Original issue's description:
> SkNx miplevel building
>
> All sizes approximately twice as fast.
>
> Before:
> micros bench
> 1649.35 mipmap_build_512x512 nonrendering
> 1824.42 mipmap_build_511x512 nonrendering
> 2100.66 ? mipmap_build_512x511 nonrendering
> 2375.94 mipmap_build_511x511 nonrendering
>
> After:
> micros bench
> 730.32 ! mipmap_build_512x512 nonrendering
> 922.12 mipmap_build_511x512 nonrendering
> 999.07 mipmap_build_512x511 nonrendering
> 1342.93 ! mipmap_build_511x511 nonrendering
>
>
> BUG=skia:
> GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1606013003
> CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
>
> Committed: https://skia.googlesource.com/skia/+/3bd5aba2a0e165997f683cf3aa306661e71464f6
TBR=reed@google.com,mtklein@chromium.org
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:
Review URL: https://codereview.chromium.org/1607323002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
All sizes approximately twice as fast.
Before:
micros bench
1649.35 mipmap_build_512x512 nonrendering
1824.42 mipmap_build_511x512 nonrendering
2100.66 ? mipmap_build_512x511 nonrendering
2375.94 mipmap_build_511x511 nonrendering
After:
micros bench
730.32 ! mipmap_build_512x512 nonrendering
922.12 mipmap_build_511x512 nonrendering
999.07 mipmap_build_512x511 nonrendering
1342.93 ! mipmap_build_511x511 nonrendering
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1606013003
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review URL: https://codereview.chromium.org/1606013003
|
|
|
|
|
|
|
|
|
|
|
| |
There's no reason we couldn't implement this for all ints and floats;
just don't want to land unused code.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1590843003
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review URL: https://codereview.chromium.org/1590843003
|
|
|
|
|
|
|
|
|
|
| |
Given the autovectorization we've seen, I wouldn't expect big speedups
from this, but it does give us a point of control over what's going on.
BUG=skia:
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review URL: https://codereview.chromium.org/1526923003
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- one base case and one N=1 case instead of two each (or three with doubles)
- use SkNx_cast instead of FromBytes/toBytes
- 4-at-a-time Sk4f::ToBytes becomes a special standalone Sk4f_ToBytes
If I did everything right, this'll be perf- and pixel- neutral.
https://gold.skia.org/search2?issue=1526523003&unt=true&query=source_type%3Dgm&master=false
BUG=skia:
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review URL: https://codereview.chromium.org/1526523003
|
|
|
|
|
|
|
|
|
|
|
| |
Also remove the SK_SUPPORT_LEGACY_LINEAR_GRADIENT_TABLE guard since it is no
longer used in Chromium.
BUG=chromium:563492
R=reed@google.com,mtklein@google.com
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review URL: https://codereview.chromium.org/1489233005
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a big speedup for float -> byte. E.g. gradient_linear_clamp_3color:
x86-64 147µs -> 103µs (Broadwell MBP)
arm64 2.03ms -> 648µs (Galaxy S6)
armv7 1.12ms -> 489µs (Galaxy S6, same device!)
BUG=skia:
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;client.skia.android:Test-Android-GCC-Nexus9-CPU-Denver-Arm64-Debug-Trybot
Review URL: https://codereview.chromium.org/1483953002
|