aboutsummaryrefslogtreecommitdiffhomepage
path: root/src/opts
Commit message (Collapse)AuthorAge
* Correct sRGB <-> linear everywhere.Gravatar mtklein2016-07-20
| | | | | | | | | | | | | | | | | | | | | | This trims the SkPM4fPriv methods down to just foolproof methods. (Anything trying to build these itself is probably wrong.) Things like Sk4f srgb_to_linear(Sk4f) can't really exist anymore, at least not efficiently, so this refactor is somewhat more invasive than you might think. Generally this means things using to_4f() are also making a misstep... that's gone too. It also does not make sense to try to play games with linear floats with 255 bias any more. That hack can't work with real sRGB coding. Rather than update them, I've removed a couple of L32 xfermode fast paths. I'd even rather drop it entirely... BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2163683002 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2163683002
* Tune linear->sRGB constants to round-trip all bytes.Gravatar mtklein2016-07-20
| | | | | | | | | | | | I basically just ran a big 5-deep for-loop over the five constants here. This is the first set of coefficients I found that round trips all bytes. I suspect there are many such sets. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2162063003 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2162063003
* Improve naive SkColorXform to half floatsGravatar msarett2016-07-19
| | | | | | | | | | | | | | | | | | | | | | | | This should give us a good baseline to explore using SkRasterPipeline. A particular colorxform to half float drops from 425us to 282us on my desktop. Color Xform to Half Float (HP z620) Original 425us Trans16 (not 32) 355us Vector Trans16 378us Trans16 + Keep Halfs in Vector 335us Vector Trans16 + Keep Halfs in Vector 282us Final 282us Color Xform to Half Float (Nexus 5X) Original 556us Final 472us BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2159993003 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2159993003
* Add capability for SkColorXform to output half floatsGravatar msarett2016-07-15
| | | | | | | | BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2147763002 CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2147763002
* Add a bench to measure the best way to pack from int to uint16_t with SSE.Gravatar mtklein2016-07-15
| | | | | | | | | | | | | | | | | | | | | | | I measured relative runtimes on my laptop: pack_int_uint16_t_ss… 1036 …e41 1x …se3 1.01x …e2_b 3.01x …e2_a 3.02x I've run into Clang problems with the actual _mm_packus_epi32 instruction, I think, so I'm going to exercise a little cowardice and leave that option disabled for now. The ssse3 version probably looks a little faster than it will be in practice. We'll usually need to load its mask, which here is hoisted out of the bench loop. The two sse2 variants are close enough in speed that I'm tie breaking them on other concerns: the <<16, >>16 version doesn't need any scratch registers or to load any constants, so it wins. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2150343002 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-Fast-Trybot Review-Url: https://codereview.chromium.org/2150343002
* Expand _01 half<->float limitation to _finite. Simplify.Gravatar mtklein2016-07-15
| | | | | | | | | | | | | | | | | | | | | | | It's become clear we need to sometimes deal with values <0 or >1. I'm not yet convinced we care about NaN or +-inf. We had some fairly clever tricks and optimizations here for NEON and SSE. I've thrown them out in favor of a single implementation. If we find the specializations mattered, we can certainly figure out how to extend them to this new range/domain. This happens to add a vectorized float -> half for ARMv7, which was missing from the _01 version. (The SSE strategy was not portable to platforms that flush denorm floats to zero.) I've tested the full float range for FloatToHalf on my desktop and a 5x. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2145663003 CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-Fast-Trybot Committed: https://skia.googlesource.com/skia/+/3296bee70d074bb8094b3229dbe12fa016657e90 Review-Url: https://codereview.chromium.org/2145663003
* Revert of Expand _01 half<->float limitation to _finite. Simplify. ↵Gravatar mtklein2016-07-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | (patchset #7 id:120001 of https://codereview.chromium.org/2145663003/ ) Reason for revert: Unit tests fail on Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-Fast Original issue's description: > Expand _01 half<->float limitation to _finite. Simplify. > > It's become clear we need to sometimes deal with values <0 or >1. > I'm not yet convinced we care about NaN or +-inf. > > We had some fairly clever tricks and optimizations here for NEON > and SSE. I've thrown them out in favor of a single implementation. > If we find the specializations mattered, we can certainly figure out > how to extend them to this new range/domain. > > This happens to add a vectorized float -> half for ARMv7, which was > missing from the _01 version. (The SSE strategy was not portable to > platforms that flush denorm floats to zero.) > > I've tested the full float range for FloatToHalf on my desktop and a 5x. > > BUG=skia: > GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2145663003 > CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-Fast-Trybot > > Committed: https://skia.googlesource.com/skia/+/3296bee70d074bb8094b3229dbe12fa016657e90 TBR=msarett@google.com,mtklein@chromium.org # Skipping CQ checks because original CL landed less than 1 days ago. NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia: Review-Url: https://codereview.chromium.org/2151023003
* Expand _01 half<->float limitation to _finite. Simplify.Gravatar mtklein2016-07-14
| | | | | | | | | | | | | | | | | | | | | | It's become clear we need to sometimes deal with values <0 or >1. I'm not yet convinced we care about NaN or +-inf. We had some fairly clever tricks and optimizations here for NEON and SSE. I've thrown them out in favor of a single implementation. If we find the specializations mattered, we can certainly figure out how to extend them to this new range/domain. This happens to add a vectorized float -> half for ARMv7, which was missing from the _01 version. (The SSE strategy was not portable to platforms that flush denorm floats to zero.) I've tested the full float range for FloatToHalf on my desktop and a 5x. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2145663003 CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2145663003
* Update SkOpts namespaces.Gravatar mtklein2016-07-13
| | | | | | | | | | | | | | | | | | | | | | | | | If we make sure all SkOpts functions are static, we can give the namespaces any name we like. This lets us drop the sk_ prefix and give a real indication of the default SIMD instruction set rather than just saying sk_default. Both of these changes help debugger, profiler, and crash report readability. Perhaps more importantly, keeping these functions static helps prevent accidentally linking in unused versions of functions, as you see here with sk_avx::srcover_srgb_srgb(). This requires we update SkBlend_opts tests and benches to call SkOpts functions through SkOpts rather than declaring the methods externally. In practice this drops testing of the SSE2 version on machines with SSE4. If we still really need to test/bench the compile time best SIMD level version of this method against the runtime detected best, we can include SkBlend_opts.h into the tests or benches directly, similar to what we do for the trivial, brute-force, or best non-SIMD versions. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2145833002 CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2145833002
* SkRasterPipeline preliminariesGravatar mtklein2016-07-12
| | | | | | | | | | | | | | Re-uploading to see if I can get a CL number < 2^31. patch from issue 2147533002 at patchset 240001 (http://crrev.com/2147533002#ps240001) Already reviewed at the other crrev link. TBR= BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2147533002 CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2144573004
* Remove bloat from SkBlend_opts.Gravatar herb2016-07-12
| | | | | | | | BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2130183003 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2130183003
* Add Sk4f_RoundToIntGravatar msarett2016-07-12
| | | | | | | | BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2134753006 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2134753006
* Revert of try to speed-up maprect + round2i + contains (patchset #8 ↵Gravatar msarett2016-07-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | id:140001 of https://codereview.chromium.org/2133413002/ ) Reason for revert: Breaking the roll... https://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_rel_ng/builds/253294/steps/compile%20%28with%20patch%29/logs/stdio Original issue's description: > try to speed-up maprect + round2i + contains > > We call roundOut in a few places. If we can get SkNx::Ceil we could efficiently implement that as well. > > > BUG=skia: > GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2133413002 > CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot > > Committed: https://skia.googlesource.com/skia/+/b42b785d1cbc98bd34aceae338060831b974f9c5 TBR=mtklein@google.com,reed@google.com # Skipping CQ checks because original CL landed less than 1 days ago. NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia: Review-Url: https://codereview.chromium.org/2136343002
* try to speed-up maprect + round2i + containsGravatar reed2016-07-11
| | | | | | | | | | We call roundOut in a few places. If we can get SkNx::Ceil we could efficiently implement that as well. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2133413002 CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2133413002
* Clean up hyper-local SkCpu feature test experiment.Gravatar mtklein2016-07-11
| | | | | | | | | | | | | | | | | | | | | | | | | | This removes the code paths where we make SkCpu::Supports() calls from within a tight loop. It keeps code paths using SkCpu::Supports() to choose entire routines from src/opts/. We can't rely on these hyper-local checks to be hoisted up reliably enough. It worked pretty well with the first couple platforms we tried (e.g. Clang on Linux/Mac) but we can't gaurantee it works everywhere. Further, I'm not able to actually do anything fancy with those tests outside of x86... I've not found a way to get, say, NEON+F16 conversion code embedded into ordinary NEON code outside writing then entire function in external assembly. This whole idea becomes less important now that we've got a way to chain separate function calls together efficiently. We can now, e.g., use an AVX+F16C method to load some pixels, then chain that into an ordinary AVX method to color filter them. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2138073002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2138073002
* Make all color xforms 'fast' (step 1)Gravatar msarett2016-07-11
| | | | | | | | | | | This refactors opt code to handle arbitrary src and dst gammas that are specified by tables. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2130013002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2130013002
* Move sRGB <-> linear conversion components to their own files.Gravatar mtklein2016-07-08
| | | | | | | | | | | | | | | This makes them a little easier to use outside SkColorXform code. I've added some notes about how best to use them and their eccentricities, and added a test. Ultimately any software sRGB <-> linear conversion should funnel somehow through here. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2128893002 CQ_EXTRA_TRYBOTS=client.skia.android:Test-Android-GCC-Nexus5-CPU-NEON-Arm7-Release-Trybot;client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Committed: https://skia.googlesource.com/skia/+/45e58c8807179638980aae8503573b950b844e4c Review-Url: https://codereview.chromium.org/2128893002
* Revert of Move sRGB <-> linear conversion components to their own files. ↵Gravatar mtklein2016-07-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | (patchset #5 id:80001 of https://codereview.chromium.org/2128893002/ ) Reason for revert: Monotonicity assert is failing on ARM. (Different rsqrt() and invert() precision?) Will investigate a bit tomorrow... might reland with the test TODO. Original issue's description: > Move sRGB <-> linear conversion components to their own files. > > This makes them a little easier to use outside SkColorXform code. > > I've added some notes about how best to use them and their eccentricities, and added a test. > > Ultimately any software sRGB <-> linear conversion should funnel somehow through here. > > BUG=skia: > GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2128893002 > CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot > > Committed: https://skia.googlesource.com/skia/+/45e58c8807179638980aae8503573b950b844e4c TBR=reed@google.com,msarett@google.com,mtklein@chromium.org # Skipping CQ checks because original CL landed less than 1 days ago. NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia: Review-Url: https://codereview.chromium.org/2131793002
* Move sRGB <-> linear conversion components to their own files.Gravatar mtklein2016-07-07
| | | | | | | | | | | | | | This makes them a little easier to use outside SkColorXform code. I've added some notes about how best to use them and their eccentricities, and added a test. Ultimately any software sRGB <-> linear conversion should funnel somehow through here. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2128893002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2128893002
* Add stub for avx.Gravatar herb2016-06-23
| | | | | | | GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2087343002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2087343002
* Do loads and math in parallel in SkColorXform_optsGravatar msarett2016-06-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Note that baselines have changed a little since I recently started using clang. 201295.jpg on HP z620 (300x280) Skia Xform sRGB Dst Before 0.378 ms Skia Xform sRGB Dst After 0.322 ms 1.17x Skia Xform 2.2 Dst Before 0.428 ms Skia Xform 2.2 Dst After 0.395 ms 1.08x QCMS Xform 0.418 ms sRGB Dst vs QCMS 1.30x 2.2 Dst vs QCMS 1.06x -------------------------------------------- Nexus 6P: Skia Xform sRGB Dst Before 1.58 ms Skia Xform sRGB Dst After 1.43 ms Skia Xform 2.2 Dst Before 2.69 ms Skia Xform 2.2 Dst After 2.62 ms Dell Venue 8: Skia Xform sRGB Dst Before 2.78 ms Skia Xform sRGB Dst After 2.74 ms Skia Xform 2.2 Dst Before 3.73 ms Skia Xform 2.2 Dst After 3.64 ms BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2081933005 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2081933005
* Use a table-based implementation of SkDefaultXformGravatar msarett2016-06-22
| | | | | | | | BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2084673002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2084673002
* Support sRGB dsts in opt codeGravatar msarett2016-06-20
| | | | | | | | | | | | | | | 201295.jpg on HP z620 (300x280) QCMS Xform 0.418 ms Skia NEW Xform 0.378 ms Vs QCMS 1.11x BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2078623002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2078623002
* Quick bandaid for chromium:611002.Gravatar mtklein2016-06-17
| | | | | | | | | | | | | | | | | | | | We're somehow receiving non-premultiplied src inputs like 0x00ffffff to this SrcOver blend. That's a bug I intend to follow up on. But for a quick compatibility fix, go back to treating values like 0x00ffffff as transparent, like we used to before crrev.com/1820313002. This will not affect the correctness of code paths using properly premultiplied colors. This should not change performance in any meaningful way. The SIMD code paths (handling strides of 16 pixels at a time) happen to treat invalid colors like 0x00fffff as transparent already. BUG=chromium:611002 GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2075173002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2075173002
* port SkColorXform_opts to Sk4fGravatar mtklein2016-06-17
| | | | | | | | | | I have tested that this compiles. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2078913003 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2078913003
* Clean up two unlaunched SSE 4.1 8888 blits.Gravatar mtklein2016-06-16
| | | | | | | | | | | | | | | | | | | | | | This code was running on our bots but never in Chrome. That's a bad state to be in. My plan here use to be to redesign how our 8888 blits worked in SSE 4.1, mainly for perfect correctness but also for speed, then to spread what I learned there to SSE2, AVX+, and NEON. I have since lost interest in changing any aspect of how our legacy 8888 blits work. There's not much point in making them a bit or two more correct when the math is fundamentally wrong. This will cause many diffs in Gold, none perceptible. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2062853002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Committed: https://skia.googlesource.com/skia/+/6e472093009bf2fc4a8e53010b51040efcb71213 Review-Url: https://codereview.chromium.org/2062853002
* Implement fast, correct gamma conversion for color xformsGravatar msarett2016-06-16
| | | | | | | | | | | | | | | | | | | | | | 201295.jpg on HP z620 (300x280, most common form of sRGB profile) QCMS Xform 0.495 ms Skia Old Xform 0.235 ms Skia NEW Xform 0.423 ms Vs Old Code 0.56x Vs QCMS 1.17x So to summarize, we are now much slower than before, but still a bit faster than QCMS. And now we are also far more accurate than QCMS :). BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2060823003 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2060823003
* Revert of Clean up two unlaunched SSE 4.1 8888 blits. (patchset #1 id:1 of ↵Gravatar mtklein2016-06-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | https://codereview.chromium.org/2062853002/ ) Reason for revert: Breaks a couple Google3 goldens. I need to rebaseline google3 with -DSK_SUPPORT_LEGACY_X86_BLITS first, then reland this. Original issue's description: > Clean up two unlaunched SSE 4.1 8888 blits. > > This code was running on our bots but never in Chrome. > That's a bad state to be in. > > My plan here use to be to redesign how our 8888 blits worked in SSE 4.1, mainly > for perfect correctness but also for speed, then to spread what I learned there > to SSE2, AVX+, and NEON. > > I have since lost interest in changing any aspect of how our legacy 8888 blits > work. There's not much point in making them a bit or two more correct when the > math is fundamentally wrong. > > This will cause many diffs in Gold, none perceptible. > > BUG=skia: > GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2062853002 > CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot > > Committed: https://skia.googlesource.com/skia/+/6e472093009bf2fc4a8e53010b51040efcb71213 TBR=reed@google.com # Skipping CQ checks because original CL landed less than 1 days ago. NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia: Review-Url: https://codereview.chromium.org/2066453003
* Clean up two unlaunched SSE 4.1 8888 blits.Gravatar mtklein2016-06-13
| | | | | | | | | | | | | | | | | | | | | This code was running on our bots but never in Chrome. That's a bad state to be in. My plan here use to be to redesign how our 8888 blits worked in SSE 4.1, mainly for perfect correctness but also for speed, then to spread what I learned there to SSE2, AVX+, and NEON. I have since lost interest in changing any aspect of how our legacy 8888 blits work. There's not much point in making them a bit or two more correct when the math is fundamentally wrong. This will cause many diffs in Gold, none perceptible. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2062853002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2062853002
* Move immintrin/arm_neon includes to where they are used.Gravatar mtklein2016-06-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | On my Mac (so, immintrin), this improves compile time, both wall and cpu, by about 16%. To test I ran this on an SSD with files hot in their caches: $ env CC=/usr/bin/clang CXX=/usr/bin/clang++ ./gyp_skia && \ ninja -C out/Release -t clean && \ time ninja -C out/Release Before: 159 wall / 3367 cpu 159 wall / 3368 cpu After: 137 wall / 2860 cpu 136 wall / 2863 cpu I also tried further refining immintrin down to emmintrin / tmmintrin / smmintrin etc. That made no signficant difference, so I've kept immintrin for its simplicity. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2045633002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot TBR=reed@google.com No public API changes. Committed: https://skia.googlesource.com/skia/+/12dfaaa53c23f3d03050bde8f64136ac1f44164a Review-Url: https://codereview.chromium.org/2045633002
* Optimize color xforms with 2.2 gammas for SSE2Gravatar msarett2016-06-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Because we recognize commonly used gamma tables and parameters as 2.2f, about 98% of jpegs with color profiles will pass through this xform (assuming the dst is also 2.2f). Sample size is 10,322 jpegs. I won't go crazy with performance numbers because this is a work in progress, particularly in terms of correctness. 201295.jpg on HP z620 (300x280, most common form of sRGB profile) Decode Time + QCMS Xform 1.28 ms QCMS Xform Only 0.495 ms Decode Time + Skia Opt Xform 1.01 ms Skia Opt Xform Only 0.235 ms Decode Time + Xform Speed-up 1.27x Xform Only Speed-up 2.11x FWIW, Skia xform time before these optimizations was 41.1 ms. But we expected that code to be slow. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2046013002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2046013002
* Revert of Move immintrin/arm_neon includes to where they are used. (patchset ↵Gravatar mtklein2016-06-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | #2 id:20001 of https://codereview.chromium.org/2045633002/ ) Reason for revert: Appears to have broken the ARMv7 aspect of the Google3 roll in bizarre seemingly-unrelated ways. Original issue's description: > Move immintrin/arm_neon includes to where they are used. > > On my Mac (so, immintrin), this improves compile time, both wall and cpu, > by about 16%. To test I ran this on an SSD with files hot in their caches: > > $ env CC=/usr/bin/clang CXX=/usr/bin/clang++ ./gyp_skia && \ > ninja -C out/Release -t clean && \ > time ninja -C out/Release > > Before: 159 wall / 3367 cpu > 159 wall / 3368 cpu > > After: 137 wall / 2860 cpu > 136 wall / 2863 cpu > > I also tried further refining immintrin down to emmintrin / tmmintrin / smmintrin etc. > That made no signficant difference, so I've kept immintrin for its simplicity. > > BUG=skia: > GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2045633002 > CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot > > TBR=reed@google.com > No public API changes. > > Committed: https://skia.googlesource.com/skia/+/12dfaaa53c23f3d03050bde8f64136ac1f44164a TBR=herb@google.com,mtklein@chromium.org # Skipping CQ checks because original CL landed less than 1 days ago. NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia: Review-Url: https://codereview.chromium.org/2046213002
* Move immintrin/arm_neon includes to where they are used.Gravatar mtklein2016-06-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | On my Mac (so, immintrin), this improves compile time, both wall and cpu, by about 16%. To test I ran this on an SSD with files hot in their caches: $ env CC=/usr/bin/clang CXX=/usr/bin/clang++ ./gyp_skia && \ ninja -C out/Release -t clean && \ time ninja -C out/Release Before: 159 wall / 3367 cpu 159 wall / 3368 cpu After: 137 wall / 2860 cpu 136 wall / 2863 cpu I also tried further refining immintrin down to emmintrin / tmmintrin / smmintrin etc. That made no signficant difference, so I've kept immintrin for its simplicity. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2045633002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot TBR=reed@google.com No public API changes. Review-Url: https://codereview.chromium.org/2045633002
* I have found a more efficient way of detecting 1 and 0 alpha in SSE2. In ↵Gravatar herb2016-05-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | addition, I found a stall on an execution unit for the lea instruction and rearranged to code to avoid that. Before 1,362.01 LinearSrcOvericonstrip.pngVSkOptsSSE41 2,132.54 LinearSrcOvericonstrip.pngVSkOptsDefault 1,717.77 LinearSrcOvericonstrip.pngVSkOptsNonSimdCore 3,525.14 LinearSrcOvericonstrip.pngVSkOptsTrivial 11,181.78 LinearSrcOvericonstrip.pngVSkOptsBruteForce 644.77 LinearSrcOvermandrill_512.pngVSkOptsSSE41 682.51 LinearSrcOvermandrill_512.pngVSkOptsDefault 1,169.65 LinearSrcOvermandrill_512.pngVSkOptsNonSimdCore 2,486.45 LinearSrcOvermandrill_512.pngVSkOptsTrivial 11,635.94 LinearSrcOvermandrill_512.pngVSkOptsBruteForce 217.76 LinearSrcOverplane.pngVSkOptsSSE41 437.09 LinearSrcOverplane.pngVSkOptsDefault 275.91 LinearSrcOverplane.pngVSkOptsNonSimdCore 481.70 LinearSrcOverplane.pngVSkOptsTrivial 1,504.66 LinearSrcOverplane.pngVSkOptsBruteForce 323.90 LinearSrcOverbaby_tux.pngVSkOptsSSE41 497.49 LinearSrcOverbaby_tux.pngVSkOptsDefault 456.08 LinearSrcOverbaby_tux.pngVSkOptsNonSimdCore 786.46 LinearSrcOverbaby_tux.pngVSkOptsTrivial 2,554.65 LinearSrcOverbaby_tux.pngVSkOptsBruteForce 484.83 LinearSrcOveryellow_rose.pngVSkOptsSSE41 821.86 LinearSrcOveryellow_rose.pngVSkOptsDefault 655.37 LinearSrcOveryellow_rose.pngVSkOptsNonSimdCore 1,323.80 LinearSrcOveryellow_rose.pngVSkOptsTrivial 5,802.61 LinearSrcOveryellow_rose.pngVSkOptsBruteForce After changes to sse2 and sse4.1 1,343.12 LinearSrcOvericonstrip.pngVSkOptsSSE41 1,441.17 LinearSrcOvericonstrip.pngVSkOptsDefault 1,679.97 LinearSrcOvericonstrip.pngVSkOptsNonSimdCore 3,481.05 LinearSrcOvericonstrip.pngVSkOptsTrivial 10,979.99 LinearSrcOvericonstrip.pngVSkOptsBruteForce 574.17 LinearSrcOvermandrill_512.pngVSkOptsSSE41 641.40 LinearSrcOvermandrill_512.pngVSkOptsDefault 1,169.44 LinearSrcOvermandrill_512.pngVSkOptsNonSimdCore 2,359.84 LinearSrcOvermandrill_512.pngVSkOptsTrivial 12,106.02 LinearSrcOvermandrill_512.pngVSkOptsBruteForce 209.95 LinearSrcOverplane.pngVSkOptsSSE41 249.12 LinearSrcOverplane.pngVSkOptsDefault 270.36 LinearSrcOverplane.pngVSkOptsNonSimdCore 466.30 LinearSrcOverplane.pngVSkOptsTrivial 1,431.14 LinearSrcOverplane.pngVSkOptsBruteForce 309.70 LinearSrcOverbaby_tux.pngVSkOptsSSE41 354.86 LinearSrcOverbaby_tux.pngVSkOptsDefault 442.69 LinearSrcOverbaby_tux.pngVSkOptsNonSimdCore 764.12 LinearSrcOverbaby_tux.pngVSkOptsTrivial 2,756.16 LinearSrcOverbaby_tux.pngVSkOptsBruteForce 457.70 LinearSrcOveryellow_rose.pngVSkOptsSSE41 500.50 LinearSrcOveryellow_rose.pngVSkOptsDefault 677.84 LinearSrcOveryellow_rose.pngVSkOptsNonSimdCore 1,301.50 LinearSrcOveryellow_rose.pngVSkOptsTrivial 5,786.40 LinearSrcOveryellow_rose.pngVSkOptsBruteForce BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1998373002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/1998373002
* Add tests and benches to support the sRGB blitter for SkOptsGravatar herb2016-05-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 1,370.85 LinearSrcOvericonstrip.pngVSkOptsSSE41 2,359.69 LinearSrcOvericonstrip.pngVSkOptsDefault 1,828.72 LinearSrcOvericonstrip.pngVSkOptsNonSimdCore 3,277.40 LinearSrcOvericonstrip.pngVSkOptsTrivial 9,862.34 LinearSrcOvericonstrip.pngVSkOptsBruteForce 633.55 LinearSrcOvermandrill_512.pngVSkOptsSSE41 684.29 LinearSrcOvermandrill_512.pngVSkOptsDefault 1,201.88 LinearSrcOvermandrill_512.pngVSkOptsNonSimdCore 2,382.63 LinearSrcOvermandrill_512.pngVSkOptsTrivial 10,888.74 LinearSrcOvermandrill_512.pngVSkOptsBruteForce 209.14 LinearSrcOverplane.pngVSkOptsSSE41 562.24 LinearSrcOverplane.pngVSkOptsDefault 272.64 LinearSrcOverplane.pngVSkOptsNonSimdCore 436.46 LinearSrcOverplane.pngVSkOptsTrivial 1,327.23 LinearSrcOverplane.pngVSkOptsBruteForce 318.01 LinearSrcOverbaby_tux.pngVSkOptsSSE41 529.05 LinearSrcOverbaby_tux.pngVSkOptsDefault 441.33 LinearSrcOverbaby_tux.pngVSkOptsNonSimdCore 720.50 LinearSrcOverbaby_tux.pngVSkOptsTrivial 2,191.10 LinearSrcOverbaby_tux.pngVSkOptsBruteForce 479.68 LinearSrcOveryellow_rose.pngVSkOptsSSE41 1,095.03 LinearSrcOveryellow_rose.pngVSkOptsDefault 668.60 LinearSrcOveryellow_rose.pngVSkOptsNonSimdCore 1,257.19 LinearSrcOveryellow_rose.pngVSkOptsTrivial 4,970.25 LinearSrcOveryellow_rose.pngVSkOptsBruteForce BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1939513002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Committed: https://skia.googlesource.com/skia/+/554784cd85029c05d9ed04b1aeb71520d196153a Committed: https://skia.googlesource.com/skia/+/bc927548db17accec2195af6e15053f7918bb3f5 Review-Url: https://codereview.chromium.org/1939513002
* Revert of Add specialized sRGB blitter for SkOpts (patchset #21 id:400001 of ↵Gravatar reed2016-05-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | https://codereview.chromium.org/1939513002/ ) Reason for revert: broke some debug bots: Running LinearSrcOvericonstrip.pngVSkOptsSSE41 nonrendering ../../../bench/SkBlend_optsBench.cpp:118: fatal error: ""fPixmap.colorType() == kRGBA_8888_SkColorType"" Original issue's description: > Add tests and benches to support the sRGB blitter for SkOpts > > 1,370.85 LinearSrcOvericonstrip.pngVSkOptsSSE41 > 2,359.69 LinearSrcOvericonstrip.pngVSkOptsDefault > 1,828.72 LinearSrcOvericonstrip.pngVSkOptsNonSimdCore > 3,277.40 LinearSrcOvericonstrip.pngVSkOptsTrivial > 9,862.34 LinearSrcOvericonstrip.pngVSkOptsBruteForce > > 633.55 LinearSrcOvermandrill_512.pngVSkOptsSSE41 > 684.29 LinearSrcOvermandrill_512.pngVSkOptsDefault > 1,201.88 LinearSrcOvermandrill_512.pngVSkOptsNonSimdCore > 2,382.63 LinearSrcOvermandrill_512.pngVSkOptsTrivial > 10,888.74 LinearSrcOvermandrill_512.pngVSkOptsBruteForce > > 209.14 LinearSrcOverplane.pngVSkOptsSSE41 > 562.24 LinearSrcOverplane.pngVSkOptsDefault > 272.64 LinearSrcOverplane.pngVSkOptsNonSimdCore > 436.46 LinearSrcOverplane.pngVSkOptsTrivial > 1,327.23 LinearSrcOverplane.pngVSkOptsBruteForce > > 318.01 LinearSrcOverbaby_tux.pngVSkOptsSSE41 > 529.05 LinearSrcOverbaby_tux.pngVSkOptsDefault > 441.33 LinearSrcOverbaby_tux.pngVSkOptsNonSimdCore > 720.50 LinearSrcOverbaby_tux.pngVSkOptsTrivial > 2,191.10 LinearSrcOverbaby_tux.pngVSkOptsBruteForce > > 479.68 LinearSrcOveryellow_rose.pngVSkOptsSSE41 > 1,095.03 LinearSrcOveryellow_rose.pngVSkOptsDefault > 668.60 LinearSrcOveryellow_rose.pngVSkOptsNonSimdCore > 1,257.19 LinearSrcOveryellow_rose.pngVSkOptsTrivial > 4,970.25 LinearSrcOveryellow_rose.pngVSkOptsBruteForce > > > > > BUG=skia: > GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1939513002 > CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot > > Committed: https://skia.googlesource.com/skia/+/554784cd85029c05d9ed04b1aeb71520d196153a > > Committed: https://skia.googlesource.com/skia/+/bc927548db17accec2195af6e15053f7918bb3f5 TBR=mtklein@google.com,fmalita@chromium.org,herb@google.com # Skipping CQ checks because original CL landed less than 1 days ago. NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia: Review-Url: https://codereview.chromium.org/1986763002
* Add tests and benches to support the sRGB blitter for SkOptsGravatar herb2016-05-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 1,370.85 LinearSrcOvericonstrip.pngVSkOptsSSE41 2,359.69 LinearSrcOvericonstrip.pngVSkOptsDefault 1,828.72 LinearSrcOvericonstrip.pngVSkOptsNonSimdCore 3,277.40 LinearSrcOvericonstrip.pngVSkOptsTrivial 9,862.34 LinearSrcOvericonstrip.pngVSkOptsBruteForce 633.55 LinearSrcOvermandrill_512.pngVSkOptsSSE41 684.29 LinearSrcOvermandrill_512.pngVSkOptsDefault 1,201.88 LinearSrcOvermandrill_512.pngVSkOptsNonSimdCore 2,382.63 LinearSrcOvermandrill_512.pngVSkOptsTrivial 10,888.74 LinearSrcOvermandrill_512.pngVSkOptsBruteForce 209.14 LinearSrcOverplane.pngVSkOptsSSE41 562.24 LinearSrcOverplane.pngVSkOptsDefault 272.64 LinearSrcOverplane.pngVSkOptsNonSimdCore 436.46 LinearSrcOverplane.pngVSkOptsTrivial 1,327.23 LinearSrcOverplane.pngVSkOptsBruteForce 318.01 LinearSrcOverbaby_tux.pngVSkOptsSSE41 529.05 LinearSrcOverbaby_tux.pngVSkOptsDefault 441.33 LinearSrcOverbaby_tux.pngVSkOptsNonSimdCore 720.50 LinearSrcOverbaby_tux.pngVSkOptsTrivial 2,191.10 LinearSrcOverbaby_tux.pngVSkOptsBruteForce 479.68 LinearSrcOveryellow_rose.pngVSkOptsSSE41 1,095.03 LinearSrcOveryellow_rose.pngVSkOptsDefault 668.60 LinearSrcOveryellow_rose.pngVSkOptsNonSimdCore 1,257.19 LinearSrcOveryellow_rose.pngVSkOptsTrivial 4,970.25 LinearSrcOveryellow_rose.pngVSkOptsBruteForce BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1939513002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Committed: https://skia.googlesource.com/skia/+/554784cd85029c05d9ed04b1aeb71520d196153a Review-Url: https://codereview.chromium.org/1939513002
* Revert "Add tests and benches to support the sRGB blitter for SkOpts"Gravatar scroggo2016-05-11
| | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 554784cd85029c05d9ed04b1aeb71520d196153a and 1956b4ae1c9a47833b174f31c054d347ea04db09 Reason for revert - ASAN failures, e.g. from https://uberchromegw.corp.google.com/i/client.skia/builders/Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Debug-MSAN/builds/2233/steps/perf_skia%20on%20Ubuntu/logs/stdio : Uninitialized value was created by a heap allocation 0 0x7f69aa96f799 in operator new[](unsigned long) /b/work/skia/third_party/externals/llvm/out/../projects/compiler-rt/lib/msan/msan_new_delete.cc:37 1 0x7f69aaa315c1 in SkAutoTArray<unsigned int>::reset(int) /b/work/skia/out/Build-Ubuntu-GCC-x86_64-Debug-MSAN/Debug/../../../include/private/../private/SkTemplates.h:137:22 2 0x7f69aaa34ee9 in LinearSrcOverBench<SrcOverVSkOptsSSE41>::LinearSrcOverBench(char const*) /b/work/skia/out/Build-Ubuntu-GCC-x86_64-Debug-MSAN/Debug/../../../bench/SkBlend_optsBench.cpp:108:9 3 0x7f69aaa30cf2 in $_24::operator()(void*) const /b/work/skia/out/Build-Ubuntu-GCC-x86_64-Debug-MSAN/Debug/../../../bench/SkBlend_optsBench.cpp:167:1 4 0x7f69aaa30c87 in $_24::__invoke(void*) /b/work/skia/out/Build-Ubuntu-GCC-x86_64-Debug-MSAN/Debug/../../../bench/SkBlend_optsBench.cpp:167:1 5 0x7f69aaa68856 in BenchmarkStream::rawNext() /b/work/skia/out/Build-Ubuntu-GCC-x86_64-Debug-MSAN/Debug/../../../bench/nanobench.cpp:653:32 6 0x7f69aaa61467 in BenchmarkStream::next() /b/work/skia/out/Build-Ubuntu-GCC-x86_64-Debug-MSAN/Debug/../../../bench/nanobench.cpp:642:25 7 0x7f69aaa5b703 in nanobench_main() /b/work/skia/out/Build-Ubuntu-GCC-x86_64-Debug-MSAN/Debug/../../../bench/nanobench.cpp:1119:27 8 0x7f69aaa5e10d in main /b/work/skia/out/Build-Ubuntu-GCC-x86_64-Debug-MSAN/Debug/../../../bench/nanobench.cpp:1290:12 9 0x7f69a8c95ec4 in __libc_start_main /build/buildd/eglibc-2.19/csu/libc-start.c:287 TBR=herb@google.com GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1969803002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/1969803002
* Add tests and benches to support the sRGB blitter for SkOptsGravatar herb2016-05-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 1,370.85 LinearSrcOvericonstrip.pngVSkOptsSSE41 2,359.69 LinearSrcOvericonstrip.pngVSkOptsDefault 1,828.72 LinearSrcOvericonstrip.pngVSkOptsNonSimdCore 3,277.40 LinearSrcOvericonstrip.pngVSkOptsTrivial 9,862.34 LinearSrcOvericonstrip.pngVSkOptsBruteForce 633.55 LinearSrcOvermandrill_512.pngVSkOptsSSE41 684.29 LinearSrcOvermandrill_512.pngVSkOptsDefault 1,201.88 LinearSrcOvermandrill_512.pngVSkOptsNonSimdCore 2,382.63 LinearSrcOvermandrill_512.pngVSkOptsTrivial 10,888.74 LinearSrcOvermandrill_512.pngVSkOptsBruteForce 209.14 LinearSrcOverplane.pngVSkOptsSSE41 562.24 LinearSrcOverplane.pngVSkOptsDefault 272.64 LinearSrcOverplane.pngVSkOptsNonSimdCore 436.46 LinearSrcOverplane.pngVSkOptsTrivial 1,327.23 LinearSrcOverplane.pngVSkOptsBruteForce 318.01 LinearSrcOverbaby_tux.pngVSkOptsSSE41 529.05 LinearSrcOverbaby_tux.pngVSkOptsDefault 441.33 LinearSrcOverbaby_tux.pngVSkOptsNonSimdCore 720.50 LinearSrcOverbaby_tux.pngVSkOptsTrivial 2,191.10 LinearSrcOverbaby_tux.pngVSkOptsBruteForce 479.68 LinearSrcOveryellow_rose.pngVSkOptsSSE41 1,095.03 LinearSrcOveryellow_rose.pngVSkOptsDefault 668.60 LinearSrcOveryellow_rose.pngVSkOptsNonSimdCore 1,257.19 LinearSrcOveryellow_rose.pngVSkOptsTrivial 4,970.25 LinearSrcOveryellow_rose.pngVSkOptsBruteForce BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1939513002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/1939513002
* SkOncePtr -> SkOnceGravatar mtklein2016-05-05
| | | | | | | | | | | | | | | | It's always nice to kill off a synchronization primitive. And while less terse, I think this new code reads more clearly. ... and, SkOncePtr's tests were the only thing now using sk_num_cores() outside of SkTaskGroup, so I've hidden it as static inside SkTaskGroup.cpp. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1953533002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Committed: https://skia.googlesource.com/skia/+/9bd3fc7fa9bb7cad050bd619aa93d4c48ebb5c02 Review-Url: https://codereview.chromium.org/1953533002
* Remove NEON runtime detection support.Gravatar mtklein2016-05-05
| | | | | | | | BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1952953004 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/1952953004
* Revert of SkOncePtr -> SkOnce (patchset #1 id:1 of ↵Gravatar reed2016-05-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | https://codereview.chromium.org/1949313003/ ) Reason for revert: ok, I have no idea what's going on Original issue's description: > Reland of SkOncePtr -> SkOnce (patchset #1 id:1 of https://codereview.chromium.org/1945293004/ ) > > Reason for revert: > previous revert unneeded (I think) > > Original issue's description: > > Revert of SkOncePtr -> SkOnce (patchset #4 id:60001 of https://codereview.chromium.org/1953533002/ ) > > > > Reason for revert: > > broken the Mac and Linux builders, e.g.: > > > > https://build.chromium.org/p/chromium/builders/Mac/builds/15151 > > https://build.chromium.org/p/chromium/builders/Linux%20x64/builds/19052 > > > > Original issue's description: > > > SkOncePtr -> SkOnce > > > > > > It's always nice to kill off a synchronization primitive. > > > And while less terse, I think this new code reads more clearly. > > > > > > ... and, SkOncePtr's tests were the only thing now using sk_num_cores() > > > outside of SkTaskGroup, so I've hidden it as static inside SkTaskGroup.cpp. > > > > > > BUG=skia: > > > GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1953533002 > > > CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot > > > > > > Committed: https://skia.googlesource.com/skia/+/9bd3fc7fa9bb7cad050bd619aa93d4c48ebb5c02 > > > > TBR=herb@google.com,mtklein@chromium.org > > # Skipping CQ checks because original CL landed less than 1 days ago. > > NOPRESUBMIT=true > > NOTREECHECKS=true > > NOTRY=true > > BUG=skia: > > > > Committed: https://skia.googlesource.com/skia/+/7eb33da7eede34c050b865dbb1b60c3dcea7191b > > TBR=herb@google.com,mtklein@chromium.org > # Skipping CQ checks because original CL landed less than 1 days ago. > NOPRESUBMIT=true > NOTREECHECKS=true > NOTRY=true > BUG=skia: > > Committed: https://skia.googlesource.com/skia/+/82da9a8aa0ce648f36882830765b42e0ada6c0fa TBR=herb@google.com,mtklein@chromium.org # Skipping CQ checks because original CL landed less than 1 days ago. NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia: Review-Url: https://codereview.chromium.org/1948313002
* Reland of SkOncePtr -> SkOnce (patchset #1 id:1 of ↵Gravatar reed2016-05-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | https://codereview.chromium.org/1945293004/ ) Reason for revert: previous revert unneeded (I think) Original issue's description: > Revert of SkOncePtr -> SkOnce (patchset #4 id:60001 of https://codereview.chromium.org/1953533002/ ) > > Reason for revert: > broken the Mac and Linux builders, e.g.: > > https://build.chromium.org/p/chromium/builders/Mac/builds/15151 > https://build.chromium.org/p/chromium/builders/Linux%20x64/builds/19052 > > Original issue's description: > > SkOncePtr -> SkOnce > > > > It's always nice to kill off a synchronization primitive. > > And while less terse, I think this new code reads more clearly. > > > > ... and, SkOncePtr's tests were the only thing now using sk_num_cores() > > outside of SkTaskGroup, so I've hidden it as static inside SkTaskGroup.cpp. > > > > BUG=skia: > > GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1953533002 > > CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot > > > > Committed: https://skia.googlesource.com/skia/+/9bd3fc7fa9bb7cad050bd619aa93d4c48ebb5c02 > > TBR=herb@google.com,mtklein@chromium.org > # Skipping CQ checks because original CL landed less than 1 days ago. > NOPRESUBMIT=true > NOTREECHECKS=true > NOTRY=true > BUG=skia: > > Committed: https://skia.googlesource.com/skia/+/7eb33da7eede34c050b865dbb1b60c3dcea7191b TBR=herb@google.com,mtklein@chromium.org # Skipping CQ checks because original CL landed less than 1 days ago. NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia: Review-Url: https://codereview.chromium.org/1949313003
* Revert of SkOncePtr -> SkOnce (patchset #4 id:60001 of ↵Gravatar reed2016-05-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | https://codereview.chromium.org/1953533002/ ) Reason for revert: broken the Mac and Linux builders, e.g.: https://build.chromium.org/p/chromium/builders/Mac/builds/15151 https://build.chromium.org/p/chromium/builders/Linux%20x64/builds/19052 Original issue's description: > SkOncePtr -> SkOnce > > It's always nice to kill off a synchronization primitive. > And while less terse, I think this new code reads more clearly. > > ... and, SkOncePtr's tests were the only thing now using sk_num_cores() > outside of SkTaskGroup, so I've hidden it as static inside SkTaskGroup.cpp. > > BUG=skia: > GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1953533002 > CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot > > Committed: https://skia.googlesource.com/skia/+/9bd3fc7fa9bb7cad050bd619aa93d4c48ebb5c02 TBR=herb@google.com,mtklein@chromium.org # Skipping CQ checks because original CL landed less than 1 days ago. NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia: Review-Url: https://codereview.chromium.org/1945293004
* SkOncePtr -> SkOnceGravatar mtklein2016-05-04
| | | | | | | | | | | | | | It's always nice to kill off a synchronization primitive. And while less terse, I think this new code reads more clearly. ... and, SkOncePtr's tests were the only thing now using sk_num_cores() outside of SkTaskGroup, so I've hidden it as static inside SkTaskGroup.cpp. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1953533002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/1953533002
* Add a hook for CPU-optimized sRGB-sRGB srcover.Gravatar mtklein2016-05-02
| | | | | | | | | | | | | | | | | | Herb's really starting to get serious about tweaking this, which becomes a lot easier when you've got SkOpts' runtime CPU detection. We should be able to optimize this usefully for SSSE3, SSE4.1, AVX, AVX2, or NEON. (We can of course implement a subset.) This function takes two counts to give us flexibility to write src patterns: nsrc >= ndst -> the usual srcover function nsrc < ndst -> repeat src until it fills dst nsrc << ndst -> possibly preprocess src into registers nsrc == 1 -> equivalent of blitrow_color32, srcover_1, etc. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1939783003 Review-Url: https://codereview.chromium.org/1939783003
* skcpu: sse4.1 floor, f16c f16<->f32Gravatar mtklein2016-04-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - floor with roundps is about 4.5x faster when available - f16 srcover_n is similar to but a little faster than the version in https://codereview.chromium.org/1884683002. This new one fuses the dst load/stores into the f16<->f32 conversions: +0x180 movups (%r15), %xmm1 +0x184 vcvtph2ps (%rbx), %xmm2 +0x189 movaps %xmm1, %xmm3 +0x18c shufps $255, %xmm3, %xmm3 +0x190 movaps %xmm0, %xmm4 +0x193 subps %xmm3, %xmm4 +0x196 mulps %xmm2, %xmm4 +0x199 addps %xmm1, %xmm4 +0x19c vcvtps2ph $0, %xmm4, (%rbx) +0x1a2 addq $16, %r15 +0x1a6 addq $8, %rbx +0x1aa decl %r14d +0x1ad jne +0x180 If we decide to land this it'd be a good idea to convert most or all users of SkFloatToHalf_01 and SkHalfToFloat_01 over to the pointer-based versions. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1891513002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Committed: https://skia.googlesource.com/skia/+/cbe3c1af987d622ea67ef560d855b41bb14a0ce9 Committed: https://skia.googlesource.com/skia/+/3faf74b8364491ca806f523fbb1d8a97be592663 Review URL: https://codereview.chromium.org/1891513002
* Move CPU feature detection to its own file.Gravatar mtklein2016-04-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - Moves CPU feature detection to its own file. - Cleans up some redundant feature detection scattered around core/ and opts/. - Can now detect a few new CPU features: * F16C -> Intel f16<->f32 instructions, added between AVX and AVX2 * FMA -> Intel FMA instructions, added at the same time as AVX2 * VFP_FP16 -> ARM f16<->f32 instructions, quite common * NEON_FMA -> ARM FMA instructions, also quite common * SSE and SSE3... why not? This new internal API makes it very cheap to do fine-grained runtime CPU feature detection. Redundant calls to SkCpu::Supports() should be eliminated and it's hoistable out of loops. It compiles away entirely when we have the appropriate instructions available at compile time. This means we can call it to guard even a little snippet of 1 or 2 instructions right where needed and let inlining hoist the check (if any at all) up to somewhere that doesn't hurt performance. I've explained how I made this work in the private section of the new header. Once this lands and bakes a bit, I'll start following up with CLs to use it more and to add a bunch of those little 1-2 instruction snippets we've been wanting, e.g. cvtps2ph, cvtph2ps, ptest, pmulld, pmovzxbd, blendvps, pshufb, roundps (for floor) on x86, and vcvt.f32.f16, vcvt.f16.f32 on ARM. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1890483002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Committed: https://skia.googlesource.com/skia/+/872ea29357439f05b1f6995dd300fc054733e607 Review URL: https://codereview.chromium.org/1890483002
* Revert of Move CPU feature detection to its own file. (patchset #7 id:120001 ↵Gravatar mtklein2016-04-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | of https://codereview.chromium.org/1890483002/ ) Reason for revert: many unexpected GM diffs across GPU+CPU configs on Windows (hopefully just text masks on GPU?). seems like we pick a different srcover variant in some places. Original issue's description: > Move CPU feature detection to its own file. > > - Moves CPU feature detection to its own file. > - Cleans up some redundant feature detection scattered around core/ and opts/. > - Can now detect a few new CPU features: > * F16C -> Intel f16<->f32 instructions, added between AVX and AVX2 > * FMA -> Intel FMA instructions, added at the same time as AVX2 > * VFP_FP16 -> ARM f16<->f32 instructions, quite common > * NEON_FMA -> ARM FMA instructions, also quite common > * SSE and SSE3... why not? > > This new internal API makes it very cheap to do fine-grained runtime CPU > feature detection. Redundant calls to SkCpu::Supports() should be eliminated > and it's hoistable out of loops. It compiles away entirely when we have the > appropriate instructions available at compile time. > > This means we can call it to guard even a little snippet of 1 or 2 instructions > right where needed and let inlining hoist the check (if any at all) up to > somewhere that doesn't hurt performance. I've explained how I made this work > in the private section of the new header. > > Once this lands and bakes a bit, I'll start following up with CLs to use it more > and to add a bunch of those little 1-2 instruction snippets we've been wanting, > e.g. cvtps2ph, cvtph2ps, ptest, pmulld, pmovzxbd, blendvps, pshufb, roundps > (for floor) on x86, and vcvt.f32.f16, vcvt.f16.f32 on ARM. > > BUG=skia: > GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1890483002 > CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot > > Committed: https://skia.googlesource.com/skia/+/872ea29357439f05b1f6995dd300fc054733e607 TBR=fmalita@chromium.org,herb@google.com,reed@google.com,mtklein@chromium.org # Skipping CQ checks because original CL landed less than 1 days ago. NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia: Review URL: https://codereview.chromium.org/1892643003
* Revert of skcpu: sse4.1 floor, f16c f16<->f32 (patchset #11 id:200001 of ↵Gravatar mtklein2016-04-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | https://codereview.chromium.org/1891513002/ ) Reason for revert: this depends on a CL I want to revert Original issue's description: > skcpu: sse4.1 floor, f16c f16<->f32 > > - floor with roundps is about 4.5x faster when available > - f16 srcover_n is similar to but a little faster than the version in https://codereview.chromium.org/1884683002. This new one fuses the dst load/stores into the f16<->f32 conversions: > > +0x180 movups (%r15), %xmm1 > +0x184 vcvtph2ps (%rbx), %xmm2 > +0x189 movaps %xmm1, %xmm3 > +0x18c shufps $255, %xmm3, %xmm3 > +0x190 movaps %xmm0, %xmm4 > +0x193 subps %xmm3, %xmm4 > +0x196 mulps %xmm2, %xmm4 > +0x199 addps %xmm1, %xmm4 > +0x19c vcvtps2ph $0, %xmm4, (%rbx) > +0x1a2 addq $16, %r15 > +0x1a6 addq $8, %rbx > +0x1aa decl %r14d > +0x1ad jne +0x180 > > If we decide to land this it'd be a good idea to convert most or all users of SkFloatToHalf_01 and SkHalfToFloat_01 over to the pointer-based versions. > > BUG=skia: > GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1891513002 > CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot > > Committed: https://skia.googlesource.com/skia/+/cbe3c1af987d622ea67ef560d855b41bb14a0ce9 > > Committed: https://skia.googlesource.com/skia/+/3faf74b8364491ca806f523fbb1d8a97be592663 TBR=fmalita@chromium.org,herb@google.com,reed@google.com,mtklein@chromium.org # Skipping CQ checks because original CL landed less than 1 days ago. NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia: Review URL: https://codereview.chromium.org/1897433002