aboutsummaryrefslogtreecommitdiffhomepage
path: root/src/opts/SkNx_neon.h
Commit message (Collapse)AuthorAge
* Reland "Respect full precision for RGB16 PNGs" (part 2)Gravatar Matt Sarett2017-01-19
| | | | | | | | | | | | | | | | | | | | | This lands all the new xform hooks but no change to src/codec. So the new decode features are turned off. I'm relanding this in pieces to try to bisect a strange MSAN error. Original CL: https://skia-review.googlesource.com/c/7085/ BUG=skia: CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Debug-MSAN,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD,Build-Ubuntu-Clang-x86_64-Release-Fast Change-Id: I451a2a29c73ca475e9e7a5ded58d4948d6b8be19 Reviewed-on: https://skia-review.googlesource.com/7277 Reviewed-by: Matt Sarett <msarett@google.com> Commit-Queue: Matt Sarett <msarett@google.com>
* Revert "Respect full precision for RGB16 PNGs"Gravatar Matt Sarett2017-01-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 7a090c403da1dad6a2e19f2011158bd894a62d91. Reason for revert: <INSERT REASONING HERE> Original change's description: > Respect full precision for RGB16 PNGs > > BUG=skia: > > CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD > > Change-Id: If58d201daae97bce2f8efbc453c2ec452e682493 > Reviewed-on: https://skia-review.googlesource.com/7085 > Commit-Queue: Matt Sarett <msarett@google.com> > Reviewed-by: Mike Klein <mtklein@chromium.org> > Reviewed-by: Leon Scroggins <scroggo@google.com> > Reviewed-by: Mike Reed <reed@google.com> > TBR=mtklein@chromium.org,mtklein@google.com,msarett@google.com,scroggo@google.com,reed@google.com NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia: CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD Change-Id: Ibd9879bc4f65ca0c2457dd0bfb5eb008d9a8f672 Reviewed-on: https://skia-review.googlesource.com/7183 Commit-Queue: Matt Sarett <msarett@google.com> Reviewed-by: Matt Sarett <msarett@google.com>
* Respect full precision for RGB16 PNGsGravatar Matt Sarett2017-01-18
| | | | | | | | | | | | | BUG=skia: CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD Change-Id: If58d201daae97bce2f8efbc453c2ec452e682493 Reviewed-on: https://skia-review.googlesource.com/7085 Commit-Queue: Matt Sarett <msarett@google.com> Reviewed-by: Mike Klein <mtklein@chromium.org> Reviewed-by: Leon Scroggins <scroggo@google.com> Reviewed-by: Mike Reed <reed@google.com>
* Use RasterPipeline to support full precision on 16-bit RGBA pngsGravatar Matt Sarett2017-01-13
| | | | | | | | | | | | Reland of Original Change: https://skia-review.googlesource.com/6260 CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD Change-Id: I809984dd9af225103bfbe83492a17c19da7c5e40 Reviewed-on: https://skia-review.googlesource.com/6980 Reviewed-by: Matt Sarett <msarett@google.com> Commit-Queue: Matt Sarett <msarett@google.com>
* Revert "Use RasterPipeline to support full precision on 16-bit RGBA pngs"Gravatar Matt Sarett2017-01-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit bb2339da39ab3ee59121acd911920dafcd4a2f72. Reason for revert: Breaks MSAN Original change's description: > Use RasterPipeline to support full precision on 16-bit RGBA pngs > > TODO: Support more precision on 16-bit RGB pngs > > BUG=skia: > > CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD > > Change-Id: I89dfef3b4887b9c4895c17309933883ab90ffa4d > Reviewed-on: https://skia-review.googlesource.com/6260 > Reviewed-by: Mike Reed <reed@google.com> > Reviewed-by: Leon Scroggins <scroggo@google.com> > Reviewed-by: Mike Klein <mtklein@chromium.org> > Commit-Queue: Matt Sarett <msarett@google.com> > TBR=mtklein@chromium.org,mtklein@google.com,msarett@google.com,scroggo@google.com,reed@google.com,reviews@skia.org BUG=skia: NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true Change-Id: I47579c20af033a75883e2b35567cb9c690ce54b0 Reviewed-on: https://skia-review.googlesource.com/6975 Commit-Queue: Matt Sarett <msarett@google.com> Reviewed-by: Matt Sarett <msarett@google.com>
* Use RasterPipeline to support full precision on 16-bit RGBA pngsGravatar Matt Sarett2017-01-12
| | | | | | | | | | | | | | | TODO: Support more precision on 16-bit RGB pngs BUG=skia: CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD Change-Id: I89dfef3b4887b9c4895c17309933883ab90ffa4d Reviewed-on: https://skia-review.googlesource.com/6260 Reviewed-by: Mike Reed <reed@google.com> Reviewed-by: Leon Scroggins <scroggo@google.com> Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Matt Sarett <msarett@google.com>
* Revert "Revert "SkNx basically always is fast now.""Gravatar Mike Klein2016-12-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 8ba64d1996ba6c9ecfb12132cdab7d5d99af7456. Reason for revert: does not appear to have been blocking the roll. Original change's description: > Revert "SkNx basically always is fast now." > > This reverts commit 21f783829619186442041de6008f7f58f4f6250d. > > Reason for revert: roll? > > Original change's description: > > SkNx basically always is fast now. > > > > We had this SKNX_IS_FAST hanging around from before Chrome always built with NEON. > > > > CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD > > > > Change-Id: Ia5cc0323b3ef052192e2903f961aee11eb3f82d8 > > Reviewed-on: https://skia-review.googlesource.com/5946 > > Commit-Queue: Mike Klein <mtklein@chromium.org> > > Reviewed-by: Mike Reed <reed@google.com> > > Reviewed-by: Florin Malita <fmalita@chromium.org> > > > > TBR=mtklein@chromium.org,fmalita@chromium.org,reed@google.com,reviews@skia.org > NOPRESUBMIT=true > NOTREECHECKS=true > NOTRY=true > > Change-Id: I0e57285c68eae0a64213fe29ea4cca5519777954 > Reviewed-on: https://skia-review.googlesource.com/6040 > Commit-Queue: Mike Klein <mtklein@chromium.org> > Reviewed-by: Mike Klein <mtklein@chromium.org> > TBR=mtklein@chromium.org,reviews@skia.org,fmalita@chromium.org,reed@google.com NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true Change-Id: I230dd4c2abb2d14ffc302be5376b9eaacbbeafcc Reviewed-on: https://skia-review.googlesource.com/6026 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Mike Klein <mtklein@chromium.org>
* Revert "SkNx basically always is fast now."Gravatar Mike Klein2016-12-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 21f783829619186442041de6008f7f58f4f6250d. Reason for revert: roll? Original change's description: > SkNx basically always is fast now. > > We had this SKNX_IS_FAST hanging around from before Chrome always built with NEON. > > CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD > > Change-Id: Ia5cc0323b3ef052192e2903f961aee11eb3f82d8 > Reviewed-on: https://skia-review.googlesource.com/5946 > Commit-Queue: Mike Klein <mtklein@chromium.org> > Reviewed-by: Mike Reed <reed@google.com> > Reviewed-by: Florin Malita <fmalita@chromium.org> > TBR=mtklein@chromium.org,fmalita@chromium.org,reed@google.com,reviews@skia.org NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true Change-Id: I0e57285c68eae0a64213fe29ea4cca5519777954 Reviewed-on: https://skia-review.googlesource.com/6040 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Mike Klein <mtklein@chromium.org>
* SkNx basically always is fast now.Gravatar Mike Klein2016-12-13
| | | | | | | | | | | | We had this SKNX_IS_FAST hanging around from before Chrome always built with NEON. CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD Change-Id: Ia5cc0323b3ef052192e2903f961aee11eb3f82d8 Reviewed-on: https://skia-review.googlesource.com/5946 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Mike Reed <reed@google.com> Reviewed-by: Florin Malita <fmalita@chromium.org>
* SkNx_abi is unused.Gravatar Mike Klein2016-11-29
| | | | | | | | | CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD Change-Id: I082c34a1f484715cd2dca55a8d23101235755e6a Reviewed-on: https://skia-review.googlesource.com/5233 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
* Be careful about types in SkNx_neon.h.Gravatar Mike Klein2016-11-17
| | | | | | | | | | | | The Google3 Android ARM build is still using GCC, which isn't so loosey-goosey as Clang about bit-casting back and forth between int32x4_t and uint32x4_t. BUG=skia: Change-Id: I0d54f6859b8e03be4936c51bbaa1967db4261bd4 Reviewed-on: https://skia-review.googlesource.com/4974 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Mike Klein <mtklein@chromium.org> Reviewed-by: Herb Derby <herb@google.com>
* Support SkImageShader in SkRasterPipeline blitterGravatar Mike Klein2016-11-17
| | | | | | | | | | | | | | First of many CLs, I'm sure. This handles 8888 or sRGB sources with an affine matrix, clamp/clamp tiling, and nearest-neighbor sampling only. GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=4906 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Change-Id: I99f7508852b3d44b6f52f7a0bee29a793af35c48 Reviewed-on: https://skia-review.googlesource.com/4906 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* Implement SkNx_fma() for Sk4f on ARMv8.Gravatar Mike Klein2016-11-03
| | | | | | | | | | | | | | I was looking at the disassembly of matrix_4x5() and noticed it didn't have any FMAs. This makes things that call SkNx_fma() actually use the FMA instruction. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=4400 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Change-Id: Ia353a77b0ca14385a43b564997b05586f9472996 Reviewed-on: https://skia-review.googlesource.com/4400 Reviewed-by: Matt Sarett <msarett@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* SkNx: use SK_ALWAYS_INLINE thoroughly.Gravatar Mike Klein2016-10-19
| | | | | | | | | | | | | | | | | | MSVC's not so good at inlining. So tell it where to. It won't hurt the others. This has nothing directly to do with ODR safety. The anonymous namespaces and 'static' on freestanding functions provide the correctness we need there. But this change can help to mechanically prevent the sort of problems ODR violations can lead to. I may follow up by extending this strategy further to Sk4px, which is used to implement a lot of the legacy xfermodes. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=3608 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Change-Id: I927334c40910ce43da1fbabdf243c9cd5438bea6 Reviewed-on: https://skia-review.googlesource.com/3608 Reviewed-by: Matt Sarett <msarett@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* SkNx_abi for passing Sk4f as function arguments, etc.Gravatar Mike Klein2016-10-15
| | | | | | | | | | | | CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot,Test-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Debug-ASAN-Trybot BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=3422 Change-Id: Idc0a192faa7ff843aef023229186580c69baf1f7 Reviewed-on: https://skia-review.googlesource.com/3422 Reviewed-by: Mike Klein <mtklein@chromium.org>
* Wrap SkNx types in anonymous namespace again.Gravatar Mike Klein2016-10-14
| | | | | | | | | | | | | | | | | | | | | This should make each compilation unit's SkNx types distinct from each other's as far as C++ cares. This keeps us from violating the One Definition Rule with different implementations for the same function. Here's an example I like. Sk4i SkNx_cast(Sk4b) has at least 4 different sensible implementations: - SSE2: punpcklbw xmm, zero; punpcklbw xmm, zero - SSSE3: load mask; pshufb xmm, mask - SSE4.1: pmovzxbd - AVX2: vpmovzxbd We really want all these to inline, but if for some reason they don't (Debug build, poor inliner) and they're compiled in SkOpts.cpp, SkOpts_ssse3.cpp, SkOpts_sse41.cpp, SkOpts_hsw.cpp... boom! BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=3461 Change-Id: I0088ebfd7640c1b0de989738ed43c81b530dc0d9 Reviewed-on: https://skia-review.googlesource.com/3461 Reviewed-by: Matt Sarett <msarett@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* Make load4 and store4 part of SkNx properly.Gravatar Mike Klein2016-10-06
| | | | | | | | | | | | | | | Every type now nominally has Load4() and Store4() methods. The ones that we use are implemented. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=3046 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Change-Id: I7984f0c2063ef8acbc322bd2e968f8f7eaa0d8fd Reviewed-on: https://skia-review.googlesource.com/3046 Reviewed-by: Matt Sarett <msarett@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* Support Float32 output from SkColorSpaceXformGravatar msarett2016-09-16
| | | | | | | | | | | | | | | * Adds Float32 support to SkColorSpaceXform * Changes API to allows clients to ask for F32, updates clients to new API * Adds Sk4f_load4 and Sk4f_store4 to SkNx * Make use of new xform in SkGr.cpp BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2339233003 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Committed: https://skia.googlesource.com/skia/+/43d6651111374b5d1e4ddd9030dcf079b448ec47 Review-Url: https://codereview.chromium.org/2339233003
* Revert of Support Float32 output from SkColorSpaceXform (patchset #7 ↵Gravatar msarett2016-09-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | id:140001 of https://codereview.chromium.org/2339233003/ ) Reason for revert: Hitting an assert Original issue's description: > Support Float32 output from SkColorSpaceXform > > * Adds Float32 support to SkColorSpaceXform > * Changes API to allows clients to ask for F32, updates clients to > new API > * Adds Sk4f_load4 and Sk4f_store4 to SkNx > * Make use of new xform in SkGr.cpp > > BUG=skia: > GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2339233003 > CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot > > Committed: https://skia.googlesource.com/skia/+/43d6651111374b5d1e4ddd9030dcf079b448ec47 TBR=brianosman@google.com,mtklein@google.com,scroggo@google.com,mtklein@chromium.org,bsalomon@google.com # Skipping CQ checks because original CL landed less than 1 days ago. NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia: Review-Url: https://codereview.chromium.org/2347473007
* Support Float32 output from SkColorSpaceXformGravatar msarett2016-09-16
| | | | | | | | | | | | | | * Adds Float32 support to SkColorSpaceXform * Changes API to allows clients to ask for F32, updates clients to new API * Adds Sk4f_load4 and Sk4f_store4 to SkNx * Make use of new xform in SkGr.cpp BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2339233003 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2339233003
* Refactor of SkColorSpaceXformOptsGravatar msarett2016-08-02
| | | | | | | | | | | | | | | | | (1) Performance is better or stays the same. (2) Code is split into functions (RasterPipeline-ish design). IMO, it's not really more or less readable. But I think it's now much easier add capabilities, apply optimizations, or do more refactors. Or to actually use RasterPipeline. I help back from trying any of these to try to keep this CL sane. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2194303002 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2194303002
* Revert of Tidy up SkNx_neon. (patchset #3 id:40001 of ↵Gravatar mtklein2016-07-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | https://codereview.chromium.org/2196773002/ ) Reason for revert: https://luci-milo.appspot.com/swarming/task/3055149a25621b10 Not Nexus 5 specific. Reproduces on Pixel C with --gcc -t Debug -d arm_v7_neon. Not sure about other configs yet. Original issue's description: > Tidy up SkNx_neon. > > This takes advantage of the fact that all the compilers we use that > support NEON implement it with their own vector extensions. This means > normal things like c = a + b work on the underlying vector types already. > Odd instructions like min or saturated add need to stay intrinsics. > > Also, rearrange functions to a more consistent order. > > BUG=skia: > GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2196773002 > CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot > > Committed: https://skia.googlesource.com/skia/+/6ad22315eb6eacfcd35497cd118440a619d05b18 TBR=msarett@google.com,mtklein@chromium.org # Not skipping CQ checks because original CL landed more than 1 days ago. BUG=skia: Review-Url: https://codereview.chromium.org/2196953002
* simplify neon shiftsGravatar mtklein2016-07-29
| | | | | | | | | | | | These still generate vshr/vshl with immediates with both GCC and Clang. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2194953002 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Based on https://codereview.chromium.org/2196773002 Review-Url: https://codereview.chromium.org/2194953002
* Tidy up SkNx_neon.Gravatar mtklein2016-07-29
| | | | | | | | | | | | | | | This takes advantage of the fact that all the compilers we use that support NEON implement it with their own vector extensions. This means normal things like c = a + b work on the underlying vector types already. Odd instructions like min or saturated add need to stay intrinsics. Also, rearrange functions to a more consistent order. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2196773002 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2196773002
* SkNx: add Sk4uGravatar mtklein2016-07-29
| | | | | | | | | | This lets us get at logical >> in a nicely principled way. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2197683002 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2197683002
* Fix alignment problems in NEON Sk4b.Gravatar mtklein2016-07-26
| | | | | | | | | | | | | | | As written at head, the compiler can assume these loads and stores are 4 byte aligned [1]. We want Sk4b to load from any 1-byte aligned address, to prevent crashes like [2]. [1] https://llvm.org/bugs/show_bug.cgi?id=24421 [2] https://luci-milo.appspot.com/swarming/task/304079e125b1b910/steps/nanobench/0/stdout BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2183133002 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2183133002
* Add Sk4h_load4 for loading F16.Gravatar mtklein2016-07-26
| | | | | | | | | | | | | | Should feel very similar to Sk4h_store4: NEON uses its native instruction, SSE unpacks manually. Since we'll have our F16s in 4 Sk4h by the time we're done here, this also extracts an Sk4h->Sk4f routine from the old uint64_t->Sk4f one. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2184753002 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2184753002
* Improve naive SkColorXform to half floatsGravatar msarett2016-07-19
| | | | | | | | | | | | | | | | | | | | | | | | This should give us a good baseline to explore using SkRasterPipeline. A particular colorxform to half float drops from 425us to 282us on my desktop. Color Xform to Half Float (HP z620) Original 425us Trans16 (not 32) 355us Vector Trans16 378us Trans16 + Keep Halfs in Vector 335us Vector Trans16 + Keep Halfs in Vector 282us Final 282us Color Xform to Half Float (Nexus 5X) Original 556us Final 472us BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2159993003 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2159993003
* Expand _01 half<->float limitation to _finite. Simplify.Gravatar mtklein2016-07-15
| | | | | | | | | | | | | | | | | | | | | | | It's become clear we need to sometimes deal with values <0 or >1. I'm not yet convinced we care about NaN or +-inf. We had some fairly clever tricks and optimizations here for NEON and SSE. I've thrown them out in favor of a single implementation. If we find the specializations mattered, we can certainly figure out how to extend them to this new range/domain. This happens to add a vectorized float -> half for ARMv7, which was missing from the _01 version. (The SSE strategy was not portable to platforms that flush denorm floats to zero.) I've tested the full float range for FloatToHalf on my desktop and a 5x. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2145663003 CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-Fast-Trybot Committed: https://skia.googlesource.com/skia/+/3296bee70d074bb8094b3229dbe12fa016657e90 Review-Url: https://codereview.chromium.org/2145663003
* Revert of Expand _01 half<->float limitation to _finite. Simplify. ↵Gravatar mtklein2016-07-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | (patchset #7 id:120001 of https://codereview.chromium.org/2145663003/ ) Reason for revert: Unit tests fail on Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-Fast Original issue's description: > Expand _01 half<->float limitation to _finite. Simplify. > > It's become clear we need to sometimes deal with values <0 or >1. > I'm not yet convinced we care about NaN or +-inf. > > We had some fairly clever tricks and optimizations here for NEON > and SSE. I've thrown them out in favor of a single implementation. > If we find the specializations mattered, we can certainly figure out > how to extend them to this new range/domain. > > This happens to add a vectorized float -> half for ARMv7, which was > missing from the _01 version. (The SSE strategy was not portable to > platforms that flush denorm floats to zero.) > > I've tested the full float range for FloatToHalf on my desktop and a 5x. > > BUG=skia: > GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2145663003 > CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-Fast-Trybot > > Committed: https://skia.googlesource.com/skia/+/3296bee70d074bb8094b3229dbe12fa016657e90 TBR=msarett@google.com,mtklein@chromium.org # Skipping CQ checks because original CL landed less than 1 days ago. NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia: Review-Url: https://codereview.chromium.org/2151023003
* Expand _01 half<->float limitation to _finite. Simplify.Gravatar mtklein2016-07-14
| | | | | | | | | | | | | | | | | | | | | | It's become clear we need to sometimes deal with values <0 or >1. I'm not yet convinced we care about NaN or +-inf. We had some fairly clever tricks and optimizations here for NEON and SSE. I've thrown them out in favor of a single implementation. If we find the specializations mattered, we can certainly figure out how to extend them to this new range/domain. This happens to add a vectorized float -> half for ARMv7, which was missing from the _01 version. (The SSE strategy was not portable to platforms that flush denorm floats to zero.) I've tested the full float range for FloatToHalf on my desktop and a 5x. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2145663003 CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2145663003
* SkRasterPipeline preliminariesGravatar mtklein2016-07-12
| | | | | | | | | | | | | | Re-uploading to see if I can get a CL number < 2^31. patch from issue 2147533002 at patchset 240001 (http://crrev.com/2147533002#ps240001) Already reviewed at the other crrev link. TBR= BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2147533002 CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2144573004
* Add Sk4f_RoundToIntGravatar msarett2016-07-12
| | | | | | | | BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2134753006 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2134753006
* Revert of try to speed-up maprect + round2i + contains (patchset #8 ↵Gravatar msarett2016-07-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | id:140001 of https://codereview.chromium.org/2133413002/ ) Reason for revert: Breaking the roll... https://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_rel_ng/builds/253294/steps/compile%20%28with%20patch%29/logs/stdio Original issue's description: > try to speed-up maprect + round2i + contains > > We call roundOut in a few places. If we can get SkNx::Ceil we could efficiently implement that as well. > > > BUG=skia: > GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2133413002 > CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot > > Committed: https://skia.googlesource.com/skia/+/b42b785d1cbc98bd34aceae338060831b974f9c5 TBR=mtklein@google.com,reed@google.com # Skipping CQ checks because original CL landed less than 1 days ago. NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia: Review-Url: https://codereview.chromium.org/2136343002
* try to speed-up maprect + round2i + containsGravatar reed2016-07-11
| | | | | | | | | | We call roundOut in a few places. If we can get SkNx::Ceil we could efficiently implement that as well. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2133413002 CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2133413002
* port SkColorXform_opts to Sk4fGravatar mtklein2016-06-17
| | | | | | | | | | I have tested that this compiles. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2078913003 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2078913003
* Move immintrin/arm_neon includes to where they are used.Gravatar mtklein2016-06-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | On my Mac (so, immintrin), this improves compile time, both wall and cpu, by about 16%. To test I ran this on an SSD with files hot in their caches: $ env CC=/usr/bin/clang CXX=/usr/bin/clang++ ./gyp_skia && \ ninja -C out/Release -t clean && \ time ninja -C out/Release Before: 159 wall / 3367 cpu 159 wall / 3368 cpu After: 137 wall / 2860 cpu 136 wall / 2863 cpu I also tried further refining immintrin down to emmintrin / tmmintrin / smmintrin etc. That made no signficant difference, so I've kept immintrin for its simplicity. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2045633002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot TBR=reed@google.com No public API changes. Committed: https://skia.googlesource.com/skia/+/12dfaaa53c23f3d03050bde8f64136ac1f44164a Review-Url: https://codereview.chromium.org/2045633002
* Revert of Move immintrin/arm_neon includes to where they are used. (patchset ↵Gravatar mtklein2016-06-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | #2 id:20001 of https://codereview.chromium.org/2045633002/ ) Reason for revert: Appears to have broken the ARMv7 aspect of the Google3 roll in bizarre seemingly-unrelated ways. Original issue's description: > Move immintrin/arm_neon includes to where they are used. > > On my Mac (so, immintrin), this improves compile time, both wall and cpu, > by about 16%. To test I ran this on an SSD with files hot in their caches: > > $ env CC=/usr/bin/clang CXX=/usr/bin/clang++ ./gyp_skia && \ > ninja -C out/Release -t clean && \ > time ninja -C out/Release > > Before: 159 wall / 3367 cpu > 159 wall / 3368 cpu > > After: 137 wall / 2860 cpu > 136 wall / 2863 cpu > > I also tried further refining immintrin down to emmintrin / tmmintrin / smmintrin etc. > That made no signficant difference, so I've kept immintrin for its simplicity. > > BUG=skia: > GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2045633002 > CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot > > TBR=reed@google.com > No public API changes. > > Committed: https://skia.googlesource.com/skia/+/12dfaaa53c23f3d03050bde8f64136ac1f44164a TBR=herb@google.com,mtklein@chromium.org # Skipping CQ checks because original CL landed less than 1 days ago. NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia: Review-Url: https://codereview.chromium.org/2046213002
* Move immintrin/arm_neon includes to where they are used.Gravatar mtklein2016-06-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | On my Mac (so, immintrin), this improves compile time, both wall and cpu, by about 16%. To test I ran this on an SSD with files hot in their caches: $ env CC=/usr/bin/clang CXX=/usr/bin/clang++ ./gyp_skia && \ ninja -C out/Release -t clean && \ time ninja -C out/Release Before: 159 wall / 3367 cpu 159 wall / 3368 cpu After: 137 wall / 2860 cpu 136 wall / 2863 cpu I also tried further refining immintrin down to emmintrin / tmmintrin / smmintrin etc. That made no signficant difference, so I've kept immintrin for its simplicity. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2045633002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot TBR=reed@google.com No public API changes. Review-Url: https://codereview.chromium.org/2045633002
* SkNx refreshGravatar mtklein2016-03-21
| | | | | | | | | | | | | | | | | | | | - rearrange a bit - fewer macros - hooks for all operators - add left and right scalar operator overrides - add +=, &=, <<=, etc. - add SkNx_split() and SkNx_join() - simplify the many rsqrt() and invert() options to just what we actually use This refactoring pointed out that our float <-> int NEON conversions are not specialized, so I've implemented them. It seems nice that this is an error rather than silently falling back to serial code. It's unclear to me if split/join want to be external, static methods, or non-static methods (SkNx_join(), Sk4f::Join(), x.join()). Time will tell? BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1812233003 CQ_EXTRA_TRYBOTS=client.skia.android:Test-Android-GCC-Nexus5-CPU-NEON-Arm7-Release-Trybot;client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review URL: https://codereview.chromium.org/1812233003
* Add swizzle for rgb8888.Gravatar herb2016-03-01
| | | | | | | | BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1746153002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review URL: https://codereview.chromium.org/1746153002
* SkNx: kth<...>() -> [...]Gravatar mtklein2016-02-21
| | | | | | | | | | Just some syntax cleanup. No real change: kth<...>() was calling [...] already. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1714363002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review URL: https://codereview.chromium.org/1714363002
* NEON f32 <-> f16 and f32 <-> u16Gravatar mtklein2016-02-19
| | | | | | | | | | | | | | | | | | Adds f32 <-> f16 ARMv7 and ARMv8 NEON code. Also adds NEON f32 <-> u16 code to make the comparison fair. The NDK GCC does not support the ARMv8 NEON intrinsics needed to go fastest, so we use a tiny amount of inline assembly. The ARMv7 half -> float is different enough from the SSE version that it does not make sense to use SkNx. Still TODO: ARMv7 float -> half. Naively translating the SSE version results in 0x0000 where we'd expect a denormal output. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1700473003 CQ_EXTRA_TRYBOTS=client.skia.android:Test-Android-GCC-Nexus5-CPU-NEON-Arm7-Release-Trybot,Test-Android-GCC-Nexus9-CPU-Denver-Arm64-Release-Trybot;client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review URL: https://codereview.chromium.org/1700473003
* Sk4f: floor() via int32_t roundtrip.Gravatar mtklein2016-02-10
| | | | | | | | | | About 25% faster on both x86 and ARMv7. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1682953002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review URL: https://codereview.chromium.org/1682953002
* Sk4f: add floor()Gravatar mtklein2016-02-09
| | | | | | | | | | | | BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1685773002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Committed: https://skia.googlesource.com/skia/+/86c6c4935171a1d2d6a9ffbff37ec6dac1326614 CQ_EXTRA_TRYBOTS=client.skia.android:Test-Android-GCC-Nexus7-GPU-Tegra3-Arm7-Release-Trybot,Test-Android-GCC-Nexus9-GPU-TegraK1-Arm64-Release-Trybot Review URL: https://codereview.chromium.org/1685773002
* Revert of Sk4f: add floor() (patchset #3 id:40001 of ↵Gravatar mtklein2016-02-09
| | | | | | | | | | | | | | | | | | | | | | | | | https://codereview.chromium.org/1685773002/ ) Reason for revert: build break must be this, right? Original issue's description: > Sk4f: add floor() > > BUG=skia: > GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1685773002 > CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot > > Committed: https://skia.googlesource.com/skia/+/86c6c4935171a1d2d6a9ffbff37ec6dac1326614 TBR=herb@google.com,mtklein@chromium.org # Skipping CQ checks because original CL landed less than 1 days ago. NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia: Review URL: https://codereview.chromium.org/1679343004
* Sk4f: add floor()Gravatar mtklein2016-02-09
| | | | | | | | BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1685773002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review URL: https://codereview.chromium.org/1685773002
* sknx refactoringGravatar mtklein2016-02-09
| | | | | | | | | | | | | | | | | | | | | | | | - trim unused specializations (Sk4i, Sk2d) and apis (SkNx_dup) - expand apis a little * v[0] == v.kth<0>() * SkNx_shuffle can now convert to different-sized vectors, e.g. Sk2f <-> Sk4f - remove anonymous namespace I believe it's safe to remove the anonymous namespace right now. We're worried about violating the One Definition Rule; the anonymous namespace protected us from that. In Release builds, this is mostly moot, as everything tends to inline completely. In Debug builds, violating the ODR is at worst an inconvenience, time spent trying to figure out why the bot is broken. Now that we're building with SSE2/NEON everywhere, very few bots have even a chance about getting confused by two definitions of the same type or function. Where we do compile variants depending on, e.g., SSSE3, we do so in static inline functions. These are not subject to the ODR. I plan to follow up with a tedious .kth<...>() -> [...] auto-replace. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1683543002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review URL: https://codereview.chromium.org/1683543002
* SkNx Load/store: take any pointer.Gravatar mtklein2016-01-31
| | | | | | | | | | This means we can remove a lot of explicit casts in code that uses SkNx. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1650653002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review URL: https://codereview.chromium.org/1650653002
* SkNx miplevel buildingGravatar mtklein2016-01-20
| | | | | | | | | | | | | | | | | | | | | | | | | | All sizes approximately twice as fast. Before: micros bench 1649.35 mipmap_build_512x512 nonrendering 1824.42 mipmap_build_511x512 nonrendering 2100.66 ? mipmap_build_512x511 nonrendering 2375.94 mipmap_build_511x511 nonrendering After: micros bench 730.32 ! mipmap_build_512x512 nonrendering 922.12 mipmap_build_511x512 nonrendering 999.07 mipmap_build_512x511 nonrendering 1342.93 ! mipmap_build_511x511 nonrendering BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1606013003 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Committed: https://skia.googlesource.com/skia/+/3bd5aba2a0e165997f683cf3aa306661e71464f6 Review URL: https://codereview.chromium.org/1606013003