aboutsummaryrefslogtreecommitdiffhomepage
path: root/src/opts
Commit message (Collapse)AuthorAge
* Support Float32 output from SkColorSpaceXformGravatar msarett2016-09-16
| | | | | | | | | | | | | | | * Adds Float32 support to SkColorSpaceXform * Changes API to allows clients to ask for F32, updates clients to new API * Adds Sk4f_load4 and Sk4f_store4 to SkNx * Make use of new xform in SkGr.cpp BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2339233003 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Committed: https://skia.googlesource.com/skia/+/43d6651111374b5d1e4ddd9030dcf079b448ec47 Review-Url: https://codereview.chromium.org/2339233003
* Revert of Support Float32 output from SkColorSpaceXform (patchset #7 ↵Gravatar msarett2016-09-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | id:140001 of https://codereview.chromium.org/2339233003/ ) Reason for revert: Hitting an assert Original issue's description: > Support Float32 output from SkColorSpaceXform > > * Adds Float32 support to SkColorSpaceXform > * Changes API to allows clients to ask for F32, updates clients to > new API > * Adds Sk4f_load4 and Sk4f_store4 to SkNx > * Make use of new xform in SkGr.cpp > > BUG=skia: > GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2339233003 > CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot > > Committed: https://skia.googlesource.com/skia/+/43d6651111374b5d1e4ddd9030dcf079b448ec47 TBR=brianosman@google.com,mtklein@google.com,scroggo@google.com,mtklein@chromium.org,bsalomon@google.com # Skipping CQ checks because original CL landed less than 1 days ago. NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia: Review-Url: https://codereview.chromium.org/2347473007
* Support Float32 output from SkColorSpaceXformGravatar msarett2016-09-16
| | | | | | | | | | | | | | * Adds Float32 support to SkColorSpaceXform * Changes API to allows clients to ask for F32, updates clients to new API * Adds Sk4f_load4 and Sk4f_store4 to SkNx * Make use of new xform in SkGr.cpp BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2339233003 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2339233003
* Apple devices do not support CRC32 instructions. Don't believe Clang's lies.Gravatar mtklein2016-09-08
| | | | | | | | BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2322033002 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2322033002
* clean up use of register storage classGravatar mtklein2016-08-30
| | | | | | | | | | | | | This fixes this warning-as-error, so we don't have to stifle it any more: error: 'register' storage class specifier is deprecated and incompatible with C++1z [-Werror,-Wdeprecated-register] Tested by building for mipsel via GN. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2289293002 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2289293002
* compress_r11eac_blocks() required more alignment than dst has.Gravatar mtklein2016-08-22
| | | | | | | | | | | | | | This shouldn't change any behavior except that the stores to dst will no longer require 8-byte alignment. Empirically it seems like we can use 4-byte alignment here, but u8 (i.e. 1-byte alignment) is always safe. BUG=skia:5637 GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2264103002 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2264103002
* Use ARMv8 CRC32 instructions for SkOpts::hash().Gravatar mtklein2016-08-22
| | | | | | | | | | | | | | | | | | | | | | | For large inputs, this runs ~11x faster than Murmur3. My bench drops from 1µs to 88ns. Like x86-64, this runs fastest if we work in 24 byte chunks. 16 byte chunks run at about 0.75x this speed, 8 byte chunks at about 0.4x (which would still be about 5x faster than Murmur3). This'll require plumbing support for opts_crc32 into Chrome first before it can roll. perf.skia.org charts we want to watch: https://perf.skia.org/#5490 Seach for compute_hash in these logs to see the difference: baseline: https://luci-milo.appspot.com/swarming/task/30ba22f3dfe30e10/steps/nanobench/0/stdout trybot: https://luci-milo.appspot.com/swarming/task/30bbc406cbf62d10/steps/nanobench/0/stdout BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2260823002 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2260823002
* 32-bit fast hash, tidy up murmur3 a bitGravatar mtklein2016-08-16
| | | | | | | | | | | | | | Nothing too surprising in the new 32-bit x86 hash. It's about half speed of the 64-bit variant, just as you'd expect. Using unaligned_load like the others makes the may_alias parts of murmur3 moot. I've updated some of the terms in the murmur hash to read consistently with the others. The existing hashes are the same speed and produce the same hashes. In case this is not obvious, all three hash functions produce different hashes. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2251773002 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2251773002
* Use sse4.2 CRC32 instructions to hash when available.Gravatar mtklein2016-08-08
| | | | | | | | | | | | About 9x faster than Murmur3 for long inputs. Most of this is a mechanical change from SkChecksum::Murmur3(...) to SkOpts::hash(...). BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2208903002 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;master.client.skia.compile:Build-Ubuntu-GCC-x86_64-Release-CMake-Trybot,Build-Mac-Clang-x86_64-Release-CMake-Trybot Review-Url: https://codereview.chromium.org/2208903002
* SkBlendARGB32 and S32[A]_Blend_BlitRow32 are currently formulated as: ↵Gravatar lsalzman2016-08-05
| | | | | | | | | | | | | | | | | | | | | | SkAlphaMulQ(src, src_scale) + SkAlphaMulQ(dst, dst_scale), which boils down to ((src*src_scale)>>8) + ((dst*dst_scale)>>8). In particular, note that the intermediate precision is discarded before the two parts are added together, causing the final result to possibly inaccurate. In Firefox, we use SkCanvas::saveLayer in combination with a backdrop that initializes the layer to the background. When this is blended back onto background using transparency, where the source and destination pixel colors are the same, the resulting color after the blend is not preserved due to the lost precision mentioned above. In cases where this operation is repeatedly performed, this causes substantially noticeable differences in color as evidenced in this downstream Firefox bug report: https://bugzilla.mozilla.org/show_bug.cgi?id=1200684 In the test-case in the downstream report, essentially it does blend(src=0xFF2E3338, dst=0xFF2E3338, scale=217), which gives the result 0xFF2E3237, while we would expect to get back 0xFF2E3338. This problem goes away if the blend is instead reformulated to effectively do (src*src_scale + dst*dst_scale)>>8, which keeps the intermediate precision during the addition before shifting it off. This modifies the blending operations thusly. The performance should remain mostly unchanged, or possibly improve slightly, so there should be no real downside to doing this, with the benefit of making the results more accurate. Without this, it is currently unsafe for Firefox to blend a layer back onto itself that was initialized with a copy of its background. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2097883002 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot [mtklein adds...] No public API changes. TBR=reed@google.com Review-Url: https://codereview.chromium.org/2097883002
* SkRTConf: eliminateGravatar halcanary2016-08-04
| | | | | | | | | | | | GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2212473002 DOCS_PREVIEW= https://skia.org/?cl=2212473002 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot [mtklein] TBR=reed@google.com Only removing unused public API. Review-Url: https://codereview.chromium.org/2212473002
* Revert of SkRTConf: reduce functionality to what we use, increase simplicity ↵Gravatar mtklein2016-08-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | (patchset #8 id:150001 of https://codereview.chromium.org/2212473002/ ) Reason for revert: missed GrVkPipelineStateCache Original issue's description: > SkRTConf: reduce functionality to what we use, increase simplicity > > GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2212473002 > DOCS_PREVIEW= https://skia.org/?cl=2212473002 > CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot > > [mtklein] > TBR=reed@google.com > Only removing unused public API. > > Committed: https://skia.googlesource.com/skia/+/ef59974708dade6fa72fb0218d4f8a9590175c47 TBR=halcanary@google.com # Skipping CQ checks because original CL landed less than 1 days ago. NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true Review-Url: https://codereview.chromium.org/2215433003
* SkRTConf: reduce functionality to what we use, increase simplicityGravatar halcanary2016-08-03
| | | | | | | | | | | | GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2212473002 DOCS_PREVIEW= https://skia.org/?cl=2212473002 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot [mtklein] TBR=reed@google.com Only removing unused public API. Review-Url: https://codereview.chromium.org/2212473002
* Refactor of SkColorSpaceXformOptsGravatar msarett2016-08-02
| | | | | | | | | | | | | | | | | (1) Performance is better or stays the same. (2) Code is split into functions (RasterPipeline-ish design). IMO, it's not really more or less readable. But I think it's now much easier add capabilities, apply optimizations, or do more refactors. Or to actually use RasterPipeline. I help back from trying any of these to try to keep this CL sane. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2194303002 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2194303002
* Revert of Tidy up SkNx_neon. (patchset #3 id:40001 of ↵Gravatar mtklein2016-07-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | https://codereview.chromium.org/2196773002/ ) Reason for revert: https://luci-milo.appspot.com/swarming/task/3055149a25621b10 Not Nexus 5 specific. Reproduces on Pixel C with --gcc -t Debug -d arm_v7_neon. Not sure about other configs yet. Original issue's description: > Tidy up SkNx_neon. > > This takes advantage of the fact that all the compilers we use that > support NEON implement it with their own vector extensions. This means > normal things like c = a + b work on the underlying vector types already. > Odd instructions like min or saturated add need to stay intrinsics. > > Also, rearrange functions to a more consistent order. > > BUG=skia: > GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2196773002 > CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot > > Committed: https://skia.googlesource.com/skia/+/6ad22315eb6eacfcd35497cd118440a619d05b18 TBR=msarett@google.com,mtklein@chromium.org # Not skipping CQ checks because original CL landed more than 1 days ago. BUG=skia: Review-Url: https://codereview.chromium.org/2196953002
* simplify neon shiftsGravatar mtklein2016-07-29
| | | | | | | | | | | | These still generate vshr/vshl with immediates with both GCC and Clang. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2194953002 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Based on https://codereview.chromium.org/2196773002 Review-Url: https://codereview.chromium.org/2194953002
* Tidy up SkNx_neon.Gravatar mtklein2016-07-29
| | | | | | | | | | | | | | | This takes advantage of the fact that all the compilers we use that support NEON implement it with their own vector extensions. This means normal things like c = a + b work on the underlying vector types already. Odd instructions like min or saturated add need to stay intrinsics. Also, rearrange functions to a more consistent order. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2196773002 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2196773002
* SkNx: add Sk4uGravatar mtklein2016-07-29
| | | | | | | | | | This lets us get at logical >> in a nicely principled way. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2197683002 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2197683002
* Add color space xform support to SkJpegCodec (includes F16!)Gravatar msarett2016-07-29
| | | | | | | | | | | | | | | | | Also changes SkColorXform to support: RGBA->RGBA RGBA->BGRA Instead of: RGBA->SkPMColor TBR=reed@google.com BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2174493002 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Committed: https://skia.googlesource.com/skia/+/73d55332e2846dd05e9efdaa2f017bcc3872884b Review-Url: https://codereview.chromium.org/2174493002
* Revert of Add color space xform support to SkJpegCodec (includes F16!) ↵Gravatar msarett2016-07-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | (patchset #9 id:260001 of https://codereview.chromium.org/2174493002/ ) Reason for revert: Breaking MSAN Original issue's description: > Add color space xform support to SkJpegCodec (includes F16!) > > Also changes SkColorXform to support: > RGBA->RGBA > RGBA->BGRA > > Instead of: > RGBA->SkPMColor > > TBR=reed@google.com > BUG=skia: > GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2174493002 > CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot > > Committed: https://skia.googlesource.com/skia/+/73d55332e2846dd05e9efdaa2f017bcc3872884b TBR=mtklein@google.com,reed@google.com,herb@google.com,brianosman@google.com # Skipping CQ checks because original CL landed less than 1 days ago. NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia: Review-Url: https://codereview.chromium.org/2195523002
* Add color space xform support to SkJpegCodec (includes F16!)Gravatar msarett2016-07-28
| | | | | | | | | | | | | | | | Also changes SkColorXform to support: RGBA->RGBA RGBA->BGRA Instead of: RGBA->SkPMColor TBR=reed@google.com BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2174493002 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2174493002
* Fix alignment problems in NEON Sk4b.Gravatar mtklein2016-07-26
| | | | | | | | | | | | | | | As written at head, the compiler can assume these loads and stores are 4 byte aligned [1]. We want Sk4b to load from any 1-byte aligned address, to prevent crashes like [2]. [1] https://llvm.org/bugs/show_bug.cgi?id=24421 [2] https://luci-milo.appspot.com/swarming/task/304079e125b1b910/steps/nanobench/0/stdout BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2183133002 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2183133002
* Add Sk4h_load4 for loading F16.Gravatar mtklein2016-07-26
| | | | | | | | | | | | | | Should feel very similar to Sk4h_store4: NEON uses its native instruction, SSE unpacks manually. Since we'll have our F16s in 4 Sk4h by the time we're done here, this also extracts an Sk4h->Sk4f routine from the old uint64_t->Sk4f one. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2184753002 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2184753002
* Use sk_srgb_to_linear_trunc in SkColorXform_optsGravatar msarett2016-07-25
| | | | | | | | | | | | | | | This gives us a little more control over instruction order, allowing us to pipeline the muls and get better performance. Technically, clang should be able to do this for us anyway... Performance on HP z620 (201295.jpg): toSRGB: 371us -> 356us BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2175413002 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2175413002
* Delete SkDefaultXform, handle edge cases in SkColorSpaceXform_BaseGravatar msarett2016-07-25
| | | | | | | | | | | | | | | | | | | | | | | | | "Edge" cases include: (1) Matrices with translation (2) colorLUTs Performance on HP z620: 201295.jpg to2Dot2: 386us -> 414us toSRGB: 346us -> 371us toHalf: 282us -> 302us strange0-translate.jpg toSRGB: 1060us -> 244us strange1-colorLUT.jpg toSRGB: 2.74ms -> 2.00ms BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2177173003 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2177173003
* Add clamp to sk_linear_to_srgb, reorder instructionsGravatar msarett2016-07-22
| | | | | | | | | | | | | | | | | | | Improves performance for xforms toSRGB and to2Dot2. Seems more optimal to save clamping until the end. That way we don't stall the mul pipeline with a min/max. toSRGB: 371us -> 346us to2Dot2: 404us -> 387us FWIW, it probably makes sense to clamp inside sk_linear_to_srgb anyway. If not, we should potentially provide two versions (one that clamps and one that doesn't). BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2173803002 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2173803002
* Miscellaneous color space refactorsGravatar msarett2016-07-21
| | | | | | | | | | | | | | | | (1) Use float matrix[16] everywhere (enables future code sharing). (2) SkColorLookUpTable refactors *** Store in a single allocation (like SkGammas) *** Eliminate fOutputChannels (we always require 3, and probably always will) (3) Change names of read_big_endian_* helpers BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2166093003 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2166093003
* Correct sRGB <-> linear everywhere.Gravatar mtklein2016-07-20
| | | | | | | | | | | | | | | | | | | | | | This trims the SkPM4fPriv methods down to just foolproof methods. (Anything trying to build these itself is probably wrong.) Things like Sk4f srgb_to_linear(Sk4f) can't really exist anymore, at least not efficiently, so this refactor is somewhat more invasive than you might think. Generally this means things using to_4f() are also making a misstep... that's gone too. It also does not make sense to try to play games with linear floats with 255 bias any more. That hack can't work with real sRGB coding. Rather than update them, I've removed a couple of L32 xfermode fast paths. I'd even rather drop it entirely... BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2163683002 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2163683002
* Tune linear->sRGB constants to round-trip all bytes.Gravatar mtklein2016-07-20
| | | | | | | | | | | | I basically just ran a big 5-deep for-loop over the five constants here. This is the first set of coefficients I found that round trips all bytes. I suspect there are many such sets. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2162063003 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2162063003
* Improve naive SkColorXform to half floatsGravatar msarett2016-07-19
| | | | | | | | | | | | | | | | | | | | | | | | This should give us a good baseline to explore using SkRasterPipeline. A particular colorxform to half float drops from 425us to 282us on my desktop. Color Xform to Half Float (HP z620) Original 425us Trans16 (not 32) 355us Vector Trans16 378us Trans16 + Keep Halfs in Vector 335us Vector Trans16 + Keep Halfs in Vector 282us Final 282us Color Xform to Half Float (Nexus 5X) Original 556us Final 472us BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2159993003 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2159993003
* Add capability for SkColorXform to output half floatsGravatar msarett2016-07-15
| | | | | | | | BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2147763002 CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2147763002
* Add a bench to measure the best way to pack from int to uint16_t with SSE.Gravatar mtklein2016-07-15
| | | | | | | | | | | | | | | | | | | | | | | I measured relative runtimes on my laptop: pack_int_uint16_t_ss… 1036 …e41 1x …se3 1.01x …e2_b 3.01x …e2_a 3.02x I've run into Clang problems with the actual _mm_packus_epi32 instruction, I think, so I'm going to exercise a little cowardice and leave that option disabled for now. The ssse3 version probably looks a little faster than it will be in practice. We'll usually need to load its mask, which here is hoisted out of the bench loop. The two sse2 variants are close enough in speed that I'm tie breaking them on other concerns: the <<16, >>16 version doesn't need any scratch registers or to load any constants, so it wins. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2150343002 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-Fast-Trybot Review-Url: https://codereview.chromium.org/2150343002
* Expand _01 half<->float limitation to _finite. Simplify.Gravatar mtklein2016-07-15
| | | | | | | | | | | | | | | | | | | | | | | It's become clear we need to sometimes deal with values <0 or >1. I'm not yet convinced we care about NaN or +-inf. We had some fairly clever tricks and optimizations here for NEON and SSE. I've thrown them out in favor of a single implementation. If we find the specializations mattered, we can certainly figure out how to extend them to this new range/domain. This happens to add a vectorized float -> half for ARMv7, which was missing from the _01 version. (The SSE strategy was not portable to platforms that flush denorm floats to zero.) I've tested the full float range for FloatToHalf on my desktop and a 5x. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2145663003 CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-Fast-Trybot Committed: https://skia.googlesource.com/skia/+/3296bee70d074bb8094b3229dbe12fa016657e90 Review-Url: https://codereview.chromium.org/2145663003
* Revert of Expand _01 half<->float limitation to _finite. Simplify. ↵Gravatar mtklein2016-07-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | (patchset #7 id:120001 of https://codereview.chromium.org/2145663003/ ) Reason for revert: Unit tests fail on Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-Fast Original issue's description: > Expand _01 half<->float limitation to _finite. Simplify. > > It's become clear we need to sometimes deal with values <0 or >1. > I'm not yet convinced we care about NaN or +-inf. > > We had some fairly clever tricks and optimizations here for NEON > and SSE. I've thrown them out in favor of a single implementation. > If we find the specializations mattered, we can certainly figure out > how to extend them to this new range/domain. > > This happens to add a vectorized float -> half for ARMv7, which was > missing from the _01 version. (The SSE strategy was not portable to > platforms that flush denorm floats to zero.) > > I've tested the full float range for FloatToHalf on my desktop and a 5x. > > BUG=skia: > GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2145663003 > CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-Fast-Trybot > > Committed: https://skia.googlesource.com/skia/+/3296bee70d074bb8094b3229dbe12fa016657e90 TBR=msarett@google.com,mtklein@chromium.org # Skipping CQ checks because original CL landed less than 1 days ago. NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia: Review-Url: https://codereview.chromium.org/2151023003
* Expand _01 half<->float limitation to _finite. Simplify.Gravatar mtklein2016-07-14
| | | | | | | | | | | | | | | | | | | | | | It's become clear we need to sometimes deal with values <0 or >1. I'm not yet convinced we care about NaN or +-inf. We had some fairly clever tricks and optimizations here for NEON and SSE. I've thrown them out in favor of a single implementation. If we find the specializations mattered, we can certainly figure out how to extend them to this new range/domain. This happens to add a vectorized float -> half for ARMv7, which was missing from the _01 version. (The SSE strategy was not portable to platforms that flush denorm floats to zero.) I've tested the full float range for FloatToHalf on my desktop and a 5x. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2145663003 CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2145663003
* Update SkOpts namespaces.Gravatar mtklein2016-07-13
| | | | | | | | | | | | | | | | | | | | | | | | | If we make sure all SkOpts functions are static, we can give the namespaces any name we like. This lets us drop the sk_ prefix and give a real indication of the default SIMD instruction set rather than just saying sk_default. Both of these changes help debugger, profiler, and crash report readability. Perhaps more importantly, keeping these functions static helps prevent accidentally linking in unused versions of functions, as you see here with sk_avx::srcover_srgb_srgb(). This requires we update SkBlend_opts tests and benches to call SkOpts functions through SkOpts rather than declaring the methods externally. In practice this drops testing of the SSE2 version on machines with SSE4. If we still really need to test/bench the compile time best SIMD level version of this method against the runtime detected best, we can include SkBlend_opts.h into the tests or benches directly, similar to what we do for the trivial, brute-force, or best non-SIMD versions. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2145833002 CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2145833002
* SkRasterPipeline preliminariesGravatar mtklein2016-07-12
| | | | | | | | | | | | | | Re-uploading to see if I can get a CL number < 2^31. patch from issue 2147533002 at patchset 240001 (http://crrev.com/2147533002#ps240001) Already reviewed at the other crrev link. TBR= BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2147533002 CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2144573004
* Remove bloat from SkBlend_opts.Gravatar herb2016-07-12
| | | | | | | | BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2130183003 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2130183003
* Add Sk4f_RoundToIntGravatar msarett2016-07-12
| | | | | | | | BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2134753006 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2134753006
* Revert of try to speed-up maprect + round2i + contains (patchset #8 ↵Gravatar msarett2016-07-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | id:140001 of https://codereview.chromium.org/2133413002/ ) Reason for revert: Breaking the roll... https://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_rel_ng/builds/253294/steps/compile%20%28with%20patch%29/logs/stdio Original issue's description: > try to speed-up maprect + round2i + contains > > We call roundOut in a few places. If we can get SkNx::Ceil we could efficiently implement that as well. > > > BUG=skia: > GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2133413002 > CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot > > Committed: https://skia.googlesource.com/skia/+/b42b785d1cbc98bd34aceae338060831b974f9c5 TBR=mtklein@google.com,reed@google.com # Skipping CQ checks because original CL landed less than 1 days ago. NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia: Review-Url: https://codereview.chromium.org/2136343002
* try to speed-up maprect + round2i + containsGravatar reed2016-07-11
| | | | | | | | | | We call roundOut in a few places. If we can get SkNx::Ceil we could efficiently implement that as well. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2133413002 CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2133413002
* Clean up hyper-local SkCpu feature test experiment.Gravatar mtklein2016-07-11
| | | | | | | | | | | | | | | | | | | | | | | | | | This removes the code paths where we make SkCpu::Supports() calls from within a tight loop. It keeps code paths using SkCpu::Supports() to choose entire routines from src/opts/. We can't rely on these hyper-local checks to be hoisted up reliably enough. It worked pretty well with the first couple platforms we tried (e.g. Clang on Linux/Mac) but we can't gaurantee it works everywhere. Further, I'm not able to actually do anything fancy with those tests outside of x86... I've not found a way to get, say, NEON+F16 conversion code embedded into ordinary NEON code outside writing then entire function in external assembly. This whole idea becomes less important now that we've got a way to chain separate function calls together efficiently. We can now, e.g., use an AVX+F16C method to load some pixels, then chain that into an ordinary AVX method to color filter them. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2138073002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2138073002
* Make all color xforms 'fast' (step 1)Gravatar msarett2016-07-11
| | | | | | | | | | | This refactors opt code to handle arbitrary src and dst gammas that are specified by tables. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2130013002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2130013002
* Move sRGB <-> linear conversion components to their own files.Gravatar mtklein2016-07-08
| | | | | | | | | | | | | | | This makes them a little easier to use outside SkColorXform code. I've added some notes about how best to use them and their eccentricities, and added a test. Ultimately any software sRGB <-> linear conversion should funnel somehow through here. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2128893002 CQ_EXTRA_TRYBOTS=client.skia.android:Test-Android-GCC-Nexus5-CPU-NEON-Arm7-Release-Trybot;client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Committed: https://skia.googlesource.com/skia/+/45e58c8807179638980aae8503573b950b844e4c Review-Url: https://codereview.chromium.org/2128893002
* Revert of Move sRGB <-> linear conversion components to their own files. ↵Gravatar mtklein2016-07-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | (patchset #5 id:80001 of https://codereview.chromium.org/2128893002/ ) Reason for revert: Monotonicity assert is failing on ARM. (Different rsqrt() and invert() precision?) Will investigate a bit tomorrow... might reland with the test TODO. Original issue's description: > Move sRGB <-> linear conversion components to their own files. > > This makes them a little easier to use outside SkColorXform code. > > I've added some notes about how best to use them and their eccentricities, and added a test. > > Ultimately any software sRGB <-> linear conversion should funnel somehow through here. > > BUG=skia: > GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2128893002 > CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot > > Committed: https://skia.googlesource.com/skia/+/45e58c8807179638980aae8503573b950b844e4c TBR=reed@google.com,msarett@google.com,mtklein@chromium.org # Skipping CQ checks because original CL landed less than 1 days ago. NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia: Review-Url: https://codereview.chromium.org/2131793002
* Move sRGB <-> linear conversion components to their own files.Gravatar mtklein2016-07-07
| | | | | | | | | | | | | | This makes them a little easier to use outside SkColorXform code. I've added some notes about how best to use them and their eccentricities, and added a test. Ultimately any software sRGB <-> linear conversion should funnel somehow through here. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2128893002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2128893002
* Add stub for avx.Gravatar herb2016-06-23
| | | | | | | GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2087343002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2087343002
* Do loads and math in parallel in SkColorXform_optsGravatar msarett2016-06-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Note that baselines have changed a little since I recently started using clang. 201295.jpg on HP z620 (300x280) Skia Xform sRGB Dst Before 0.378 ms Skia Xform sRGB Dst After 0.322 ms 1.17x Skia Xform 2.2 Dst Before 0.428 ms Skia Xform 2.2 Dst After 0.395 ms 1.08x QCMS Xform 0.418 ms sRGB Dst vs QCMS 1.30x 2.2 Dst vs QCMS 1.06x -------------------------------------------- Nexus 6P: Skia Xform sRGB Dst Before 1.58 ms Skia Xform sRGB Dst After 1.43 ms Skia Xform 2.2 Dst Before 2.69 ms Skia Xform 2.2 Dst After 2.62 ms Dell Venue 8: Skia Xform sRGB Dst Before 2.78 ms Skia Xform sRGB Dst After 2.74 ms Skia Xform 2.2 Dst Before 3.73 ms Skia Xform 2.2 Dst After 3.64 ms BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2081933005 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2081933005
* Use a table-based implementation of SkDefaultXformGravatar msarett2016-06-22
| | | | | | | | BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2084673002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2084673002
* Support sRGB dsts in opt codeGravatar msarett2016-06-20
| | | | | | | | | | | | | | | 201295.jpg on HP z620 (300x280) QCMS Xform 0.418 ms Skia NEW Xform 0.378 ms Vs QCMS 1.11x BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2078623002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2078623002