aboutsummaryrefslogtreecommitdiffhomepage
path: root/src/opts/SkOpts_hsw.cpp
Commit message (Collapse)AuthorAge
* Reland "Reland "make SkJumper stages normal Skia code""Gravatar Mike Klein2018-03-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a reland of 78cb579f33943421afc8423a39867fcfd69fed44 This time, lowp stages are controlled by !defined(JUMPER_IS_SCALAR), not by defined(__clang__). The two are usually the same, except when we opt Clang builds into JUMPER_IS_SCALAR artificially. Some Google3 builds use compilers old enough that they barf when compiling our NEON code. It's conceivably also possible to define JUMPER_IS_SCALAR yourself, but I don't think anyone does that. Original change's description: > Reland "make SkJumper stages normal Skia code" > > This is a reland of 22e536e3a1a09405d1c0e6f071717a726d86e8d4 > > Now with fixed #include paths in SkRasterPipeline_opts.h, > and -ffp-contract=fast for the :hsw target to minimize > diffs on non-Windows Clang AVX2/AVX-512 bots. > > Original change's description: > > make SkJumper stages normal Skia code > > > > Enough clients are using Clang now that we can say, use Clang to build > > if you want these software pipeline stages to go fast. > > > > This lets us drop the offline build aspect of SkJumper stages, instead > > building as part of Skia using the SkOpts framework. > > > > I think everything should work, except I've (temporarily) removed > > AVX-512 support. I will put this back in a follow up. > > > > I have had to drop Windows down to __vectorcall and our narrower > > stage calling convention that keeps the d-registers on the stack. > > I tried forcing sysv_abi, but that crashed Clang. :/ > > > > Added a TODO to up the same narrower stage calling convention > > for lowp stages... we just *don't* today, for no good reason. > > > > Change-Id: Iaaa792ffe4deab3508d2dc5d0008c163c24b3383 > > Reviewed-on: https://skia-review.googlesource.com/110641 > > Commit-Queue: Mike Klein <mtklein@chromium.org> > > Reviewed-by: Herb Derby <herb@google.com> > > Reviewed-by: Florin Malita <fmalita@chromium.org> > > Change-Id: I44f2c03d33958e3807747e40904b6351957dd448 > Reviewed-on: https://skia-review.googlesource.com/112742 > Reviewed-by: Mike Klein <mtklein@chromium.org> Change-Id: I3d71197d4bbb19ca4a94961a97fa2e54d5cbfb0d Reviewed-on: https://skia-review.googlesource.com/112744 Reviewed-by: Mike Klein <mtklein@google.com> Commit-Queue: Mike Klein <mtklein@google.com>
* Revert "Reland "make SkJumper stages normal Skia code""Gravatar Mike Klein2018-03-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 78cb579f33943421afc8423a39867fcfd69fed44. Reason for revert: lowp should be controlled by defined(JUMPER_IS_SCALAR), not defined(__clang__). So close. Original change's description: > Reland "make SkJumper stages normal Skia code" > > This is a reland of 22e536e3a1a09405d1c0e6f071717a726d86e8d4 > > Now with fixed #include paths in SkRasterPipeline_opts.h, > and -ffp-contract=fast for the :hsw target to minimize > diffs on non-Windows Clang AVX2/AVX-512 bots. > > Original change's description: > > make SkJumper stages normal Skia code > > > > Enough clients are using Clang now that we can say, use Clang to build > > if you want these software pipeline stages to go fast. > > > > This lets us drop the offline build aspect of SkJumper stages, instead > > building as part of Skia using the SkOpts framework. > > > > I think everything should work, except I've (temporarily) removed > > AVX-512 support. I will put this back in a follow up. > > > > I have had to drop Windows down to __vectorcall and our narrower > > stage calling convention that keeps the d-registers on the stack. > > I tried forcing sysv_abi, but that crashed Clang. :/ > > > > Added a TODO to up the same narrower stage calling convention > > for lowp stages... we just *don't* today, for no good reason. > > > > Change-Id: Iaaa792ffe4deab3508d2dc5d0008c163c24b3383 > > Reviewed-on: https://skia-review.googlesource.com/110641 > > Commit-Queue: Mike Klein <mtklein@chromium.org> > > Reviewed-by: Herb Derby <herb@google.com> > > Reviewed-by: Florin Malita <fmalita@chromium.org> > > Change-Id: I44f2c03d33958e3807747e40904b6351957dd448 > Reviewed-on: https://skia-review.googlesource.com/112742 > Reviewed-by: Mike Klein <mtklein@chromium.org> TBR=mtklein@chromium.org,herb@google.com,fmalita@chromium.org Change-Id: Ie64da98f5187d44e03c0ce05d7cb189d4a6e6663 No-Presubmit: true No-Tree-Checks: true No-Try: true Reviewed-on: https://skia-review.googlesource.com/112743 Reviewed-by: Mike Klein <mtklein@google.com> Commit-Queue: Mike Klein <mtklein@google.com>
* Reland "make SkJumper stages normal Skia code"Gravatar Mike Klein2018-03-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a reland of 22e536e3a1a09405d1c0e6f071717a726d86e8d4 Now with fixed #include paths in SkRasterPipeline_opts.h, and -ffp-contract=fast for the :hsw target to minimize diffs on non-Windows Clang AVX2/AVX-512 bots. Original change's description: > make SkJumper stages normal Skia code > > Enough clients are using Clang now that we can say, use Clang to build > if you want these software pipeline stages to go fast. > > This lets us drop the offline build aspect of SkJumper stages, instead > building as part of Skia using the SkOpts framework. > > I think everything should work, except I've (temporarily) removed > AVX-512 support. I will put this back in a follow up. > > I have had to drop Windows down to __vectorcall and our narrower > stage calling convention that keeps the d-registers on the stack. > I tried forcing sysv_abi, but that crashed Clang. :/ > > Added a TODO to up the same narrower stage calling convention > for lowp stages... we just *don't* today, for no good reason. > > Change-Id: Iaaa792ffe4deab3508d2dc5d0008c163c24b3383 > Reviewed-on: https://skia-review.googlesource.com/110641 > Commit-Queue: Mike Klein <mtklein@chromium.org> > Reviewed-by: Herb Derby <herb@google.com> > Reviewed-by: Florin Malita <fmalita@chromium.org> Change-Id: I44f2c03d33958e3807747e40904b6351957dd448 Reviewed-on: https://skia-review.googlesource.com/112742 Reviewed-by: Mike Klein <mtklein@chromium.org>
* Revert "make SkJumper stages normal Skia code"Gravatar Mike Klein2018-03-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 22e536e3a1a09405d1c0e6f071717a726d86e8d4. Reason for revert: wrong include path :/ Original change's description: > make SkJumper stages normal Skia code > > Enough clients are using Clang now that we can say, use Clang to build > if you want these software pipeline stages to go fast. > > This lets us drop the offline build aspect of SkJumper stages, instead > building as part of Skia using the SkOpts framework. > > I think everything should work, except I've (temporarily) removed > AVX-512 support. I will put this back in a follow up. > > I have had to drop Windows down to __vectorcall and our narrower > stage calling convention that keeps the d-registers on the stack. > I tried forcing sysv_abi, but that crashed Clang. :/ > > Added a TODO to up the same narrower stage calling convention > for lowp stages... we just *don't* today, for no good reason. > > Change-Id: Iaaa792ffe4deab3508d2dc5d0008c163c24b3383 > Reviewed-on: https://skia-review.googlesource.com/110641 > Commit-Queue: Mike Klein <mtklein@chromium.org> > Reviewed-by: Herb Derby <herb@google.com> > Reviewed-by: Florin Malita <fmalita@chromium.org> TBR=mtklein@chromium.org,herb@google.com,fmalita@chromium.org Change-Id: I2bdc709c80cdfa6b13ff24e024b3721bef887f46 No-Presubmit: true No-Tree-Checks: true No-Try: true Reviewed-on: https://skia-review.googlesource.com/112741 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
* make SkJumper stages normal Skia codeGravatar Mike Klein2018-03-07
| | | | | | | | | | | | | | | | | | | | | | | | Enough clients are using Clang now that we can say, use Clang to build if you want these software pipeline stages to go fast. This lets us drop the offline build aspect of SkJumper stages, instead building as part of Skia using the SkOpts framework. I think everything should work, except I've (temporarily) removed AVX-512 support. I will put this back in a follow up. I have had to drop Windows down to __vectorcall and our narrower stage calling convention that keeps the d-registers on the stack. I tried forcing sysv_abi, but that crashed Clang. :/ Added a TODO to up the same narrower stage calling convention for lowp stages... we just *don't* today, for no good reason. Change-Id: Iaaa792ffe4deab3508d2dc5d0008c163c24b3383 Reviewed-on: https://skia-review.googlesource.com/110641 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Herb Derby <herb@google.com> Reviewed-by: Florin Malita <fmalita@chromium.org>
* Assume HQ is handled by pipeline, delete legacy code-pathGravatar Mike Reed2017-07-20
| | | | | | | | | CQ_INCLUDE_TRYBOTS=skia.primary:Test-Debian9-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD Bug: skia: Change-Id: If6f0d0a57463bf99a66d674e65a62ce3931d0116 Reviewed-on: https://skia-review.googlesource.com/24644 Commit-Queue: Mike Reed <reed@google.com> Reviewed-by: Mike Klein <mtklein@chromium.org>
* Really use vpmaddwd in hsw::convolve_vertical().Gravatar Mike Klein2017-01-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | No pixel diffs. Performance on 8888 looks like an overall win. Before: micros bench 222.41 bitmap_scale_filter_64_256 40.06 bitmap_scale_filter_256_64 8.17 bitmap_scale_filter_90_10 10.32 bitmap_scale_filter_90_30 22.50 bitmap_scale_filter_90_80 1.80 bitmap_scale_filter_90_90 57.51 bitmap_scale_filter_80_90 41.99 bitmap_scale_filter_30_90 31.51 bitmap_scale_filter_10_90 After: micros bench 193.60 bitmap_scale_filter_64_256 46.26 bitmap_scale_filter_256_64 7.81 bitmap_scale_filter_90_10 9.99 bitmap_scale_filter_90_30 22.05 bitmap_scale_filter_90_80 1.96 bitmap_scale_filter_90_90 52.07 bitmap_scale_filter_80_90 37.73 bitmap_scale_filter_30_90 27.63 bitmap_scale_filter_10_90 Change-Id: I2f29366b0fd503176c5af4d825fa524e632da21b Reviewed-on: https://skia-review.googlesource.com/7630 Reviewed-by: Matt Sarett <msarett@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* Graceful degredation for SkOpts_hsw.Gravatar Mike Klein2017-01-26
| | | | | | | | | | | | | __AVX2__ will not be defined if you omit -mavx2. Android does this intentionally for x86 builds. (No mobile CPU supports AVX2 AFAIK.) This should fix the Android roll. Change-Id: Ib94c862641abc11fbb46863afc53bcc049f362ad Reviewed-on: https://skia-review.googlesource.com/7633 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Mike Klein <mtklein@chromium.org> Reviewed-by: Matt Sarett <msarett@google.com>
* Fix variable names in convolve_vertical().Gravatar Mike Klein2017-01-26
| | | | | | | | | These new names reflect the actual pixels stored in each register. Change-Id: I8e626196cd8bcbef622e4fb87ac3566a79d3573a Reviewed-on: https://skia-review.googlesource.com/7624 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Matt Sarett <msarett@google.com>
* SkOpts_hsw ODR paranoiaGravatar Mike Klein2017-01-26
| | | | | | | | | | | | | | | | | | I'm warming back up to the idea of very careful use of SkOpts_hsw. But if we're going to do that, we need a strict header discipline. No header can be assumed to be safe without vetting, and most aren't. Today there's only one function defined in SkOpts_hsw, so this CL mostly rewrites that convolve_vertically() to use no headers beyond immintrin.h and stdint.h, both safe. It shared very little code with the others anyway, so we're not losing anything by putting it directly into SkOpts_hsw.cpp. I have also streamlined the implementation considerably to improve maintainability and readability. Change-Id: Ia03daae660e54125a0d2e2988464cfc930349e80 Reviewed-on: https://skia-review.googlesource.com/7611 Reviewed-by: Matt Sarett <msarett@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* disable runtime detected AVX2 raster pipelinesGravatar Mike Klein2017-01-12
| | | | | | | | | | | | | It's proving too difficult to keep on top of all the ways we might cause ODR violations that crash Chrome. I'd rather focus on other ways of running the pipelines that won't have that particular problem. Our -Fast bots will keep testing and benchmarking AVX2 raster pipelines. BUG=chromium:679147,chromium:654213,chromium:664864,chromium:666707,etc. CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD Change-Id: I35ba8f5f4303107237fd78a6ce442d7c26e5fbef Reviewed-on: https://skia-review.googlesource.com/6827 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Mike Klein <mtklein@chromium.org>
* Retry trim another instruction off SkRasterPipeline overheadGravatar Mike Klein2017-01-04
| | | | | | | | | | | | | | | | | | | This time, with manual program memory management instead of std::vector<void*>. Using STL types from SkOpts_hsw.cpp is not safe. Things like std::vector<void*> are inlined but not anonymous, so they're deduped by the linker arbitrarily. This is bad when we pick the version compiled with AVX instructions on a machine that doesn't support AVX... std::vector<Stage> was safe before because Stage itself was anonymous. While not anonymous, std::vector<Stage> is unique to the compilation unit, because you can only refer to the anonymous Stage in the compilation unit. CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.linux:linux_chromium_asan_rel_ng;skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD Change-Id: I015e27583b6b6ff06b5e9f63e3f40ee6b27d6dbd Reviewed-on: https://skia-review.googlesource.com/6550 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
* Add AVX2 version of ConvolveVerticallyGravatar xiangze.zhang2016-12-07
| | | | | | | | | | | | | | | | ConvolveVertically time is reduced about 60% using haswell cpu. Nanobench results: before after bitmap_scale_filter_64_256 611us 302us bitmap_scale_filter_80_90 101us 64.9us bitmap_scale_filter_30_90 82.3us 51.4us bitmap_scale_filter_10_90 73.6us 42.4us BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2526733002 CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD Review-Url: https://codereview.chromium.org/2526733002
* Bring back SkRasterPipeline::run() for one-off uses.Gravatar Mike Klein2016-11-30
| | | | | | | | | CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD Change-Id: I308b6d75f2987a667eead9a55760a2ff6aec2984 Reviewed-on: https://skia-review.googlesource.com/5353 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Mike Klein <mtklein@chromium.org>
* Guard against buggy ucrt\math.h.Gravatar Mike Klein2016-11-28
| | | | | | | | | | | | | | BUG=666707 GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=5089 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Change-Id: I3bebfdf635d541d92fb84236f0f6fae2da39d691 Reviewed-on: https://skia-review.googlesource.com/5089 Reviewed-by: Bruce Dawson <brucedawson@google.com> Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* SkRasterPipeline::compile().Gravatar Mike Klein2016-10-25
| | | | | | | | | | | I'm not yet caching these in the blitter, and speed is essentially unchanged in the bench where I am now building and compiling the pipeline only once. This may not be able to stay a simple std::function after I figure out caching, but for now it's a nice fit. GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=3911 Change-Id: I9545af589f73baf9f17cb4e6ace9a814c2478fe9 Reviewed-on: https://skia-review.googlesource.com/3911 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* Move SkRasterPipeline further into SkOpts.Gravatar Mike Klein2016-10-25
| | | | | | | | | | | | | | | | | | The portable code now becomes entirely focused on enum+ptr descriptions, leaving the concrete implementation of the pipeline to SkOpts::run_pipeline(). As implemented, the concrete implementation is basically the same, with a little more type safety. Speed is essentially unchanged on my laptop, and that's having run_pipeline() rebuild its concrete state every call. There's room for improvement there if we split this into a compile_pipeline() / run_pipeline() sort of thing, which is my next planned CL. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=3920 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Change-Id: Ie4c554f51040426de7c5c144afa5d9d9d8938012 Reviewed-on: https://skia-review.googlesource.com/3920 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* SkRasterPipeline refactorGravatar Mike Klein2016-10-20
| | | | | | | | | | | | | | | | | | | | | | - Give body and tail functions separate types. This frees a register in body functions, especially important for Windows. - Fill out default, SSE4.1, and HSW versions of all functions. This means we don't have to mess around with SkNf_abi... all functions come from the same compilation unit where SkNf is a single consistent type. - Move Stage::next() into SkRasterPipeline_opts.h as a static inline function. - Remove Stage::ctx() entirely... fCtx is literally the same thing. This is a step along the way toward building the entire pipeline in src/opts, removing the need for all the stages to be functions living in SkOpts. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=3680 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot,Test-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Debug-ASAN-Trybot Change-Id: I7de78ffebc15b9bad4eda187c9f50369cd7e5e42 Reviewed-on: https://skia-review.googlesource.com/3680 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* Try again with SkOpts_hsw and 8x pipelines, attempt 2.Gravatar Mike Klein2016-10-19
| | | | | | | | | | | | | | | Originally reviewed: https://skia-review.googlesource.com/3667 This time around, don't forget swap_src_dst. CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Debug-ASAN-Trybot GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=3676 Change-Id: I127e7fb2bf9d3bfee61c3749fc1c334c9476cb4e Reviewed-on: https://skia-review.googlesource.com/3676 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
* Revert "Try again with SkOpts_hsw and 8x pipelines."Gravatar Mike Klein2016-10-19
| | | | | | | | | | | | | | | | | This reverts commit 4f02ce7995554d899cdde2b7d82f600fc8017fcc. Reason for revert: missed a stage TBR=mtklein@chromium.org,msarett@google.com,herb@google.com,reviews@skia.org NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true Change-Id: I1dc1229183d67fe72977e492977a97b19dc630d2 Reviewed-on: https://skia-review.googlesource.com/3675 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Mike Klein <mtklein@chromium.org>
* Try again with SkOpts_hsw and 8x pipelines.Gravatar Mike Klein2016-10-19
| | | | | | | | | | | | | | | All SkNx are now in anonymous namespaces and all their methods are force-inlined. We should not have any ODR problems. This is still a near 2x speedup, more so for f16. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=3667 Change-Id: I6db9a46f7164f49827ab4d7983e80bf8cea99995 Reviewed-on: https://skia-review.googlesource.com/3667 Reviewed-by: Matt Sarett <msarett@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* Revert "SkRasterPipeline: 8x pipelines, attempt 2"Gravatar Mike Klein2016-10-10
| | | | | | | | | | | | This reverts commit Id0ba250037e271a9475fe2f0989d64f0aa909bae. crbug.com/654213 Looks like Chrome Canary's picking up Haswell code on non-Haswell machines. Change-Id: I16f976da24db86d5c99636c472ffad56db213a2a Reviewed-on: https://skia-review.googlesource.com/3108 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Mike Klein <mtklein@chromium.org>
* SkRasterPipeline: 8x pipelines, attempt 2Gravatar Mike Klein2016-10-07
| | | | | | | | | | | | | | | | | | | Original review here: https://skia-review.googlesource.com/c/2990/ Changes since: - simpler implementations of load_tail() / store_tail(): slower, but more obviously correct to all compilers - fleshed out math ops on Sk8i and Sk8u to make unit tests happy on -Fast bot (where we always have AVX2) - now storing stage functions as void(*)() to avoid undefined behavior and/or linker problems. This restores 32-bit Windows. - all AVX2 Sk8x methods are marked always-inline, to avoid linking the "wrong" version on Debug builds. CQ_INCLUDE_TRYBOTS=master.client.skia:Perf-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Debug-ASAN-Trybot,Perf-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Debug-GN,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-Fast-Trybot;master.client.skia.compile:Build-Win-MSVC-x86_64-Debug-Trybot GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=3064 Change-Id: Id0ba250037e271a9475fe2f0989d64f0aa909bae Reviewed-on: https://skia-review.googlesource.com/3064 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
* Revert "SkRasterPipeline: 8x pipelines"Gravatar Mike Klein2016-10-07
| | | | | | | | | | | | | | | | This reverts commit I1c82e5755d8e44cc0b9c6673d04b117f85d71a3a. Reason for revert: lots of failing bots. TBR=mtklein@chromium.org,msarett@google.com,reviews@skia.org NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true Change-Id: I653bed3905187f43196504f19424985fa2a765b5 Reviewed-on: https://skia-review.googlesource.com/3063 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Mike Klein <mtklein@chromium.org>
* SkRasterPipeline: 8x pipelinesGravatar Mike Klein2016-10-07
| | | | | | | | | | | | | | | | | | | | | | Bench runtime changes: sRGB: 7194 -> 3735 = 1.93x faster F16: 6531 -> 2559 = 2.55x faster Instead of building 4x and 1-3x pipelines and then maybe 8x and 1-7x, instead build either the short ones or the long ones, but not both. If we just take care to use a compatible run_pipeline(), there's some cross-module type disagreement but everything works out in the end. Oddly, a few places that looked like they'd be faster using SkNx_fma() or Sk4f_round()/Sk8f_round() are actually faster the long way, e.g. multiply, add 0.5, truncate. Curious! In all the other places you see here that I've used SkNx_fma(), it's been a significant speedup. This folds in a couple refactors and cleanups that I've been meaning to do. Hope you don't mind... if find the new code considerably easier to read than the old code. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2990 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Change-Id: I1c82e5755d8e44cc0b9c6673d04b117f85d71a3a Reviewed-on: https://skia-review.googlesource.com/2990 Reviewed-by: Matt Sarett <msarett@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* Add an SkOpts target for Haswell+ Intel chips.Gravatar Mike Klein2016-09-30
Haswell brought a whole slew of handy new instructions for us (AVX2, FMA, BMI1+BMI2) and also feature F16C, which came one generation earlier on Ivybridge. We work with integers often enough that we really want to target AVX2 instead of AVX, and this means it's pretty practical to ask for all those other goodies along with it. Chrome's GN files and Google3's BUILD file will need an update, before or after this CL. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2840 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Change-Id: I826daf77b5104664c5d31ddaabee347e287b87a2 Reviewed-on: https://skia-review.googlesource.com/2840 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Herb Derby <herb@google.com>