path: root/src/splicer/SkSplicer_stages.cpp
Commit message (Author, Date)
* SkSplicer: move armv7 ABI settings into build_stages.py. (Mike Klein, 2017-01-21)

  This just tidies up the .cpp files a bit, and makes it easier to make sure all exported functions use the aapcs-vfp calling convention, which hard-float implies.

  As a small simplification, fold -march=armv7-a into --target.

  No generated code changes.

  Change-Id: I2694970a6e48bd69c41dd280a44ddd0029e52ae8
  Reviewed-on: https://skia-review.googlesource.com/7371
  Commit-Queue: Mike Klein <mtklein@chromium.org>
  Reviewed-by: Mike Klein <mtklein@chromium.org>
* SkSplicer stage parity (Mike Klein, 2017-01-20)

  I noticed scale_u8 is implemented in SkSplicer_stages but not _lowp. That's not for any good reason... scale_u8 makes fine sense in _lowp.

  All other stages missing in _lowp are nuts to attempt without floats.

  This also renames the to_fixed15 lambdas to from_u8 functions. Everything in the file converts to or from fixed15; the interesting question is the other format. Similarly, from_fixed15 becomes to_u8.

  Change-Id: I10616b6772c65bd1acb9857f4f5b5f70a4f01bf4
  Reviewed-on: https://skia-review.googlesource.com/7323
  Reviewed-by: Herb Derby <herb@google.com>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
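To make the from_u8 / to_u8 naming concrete, here is a minimal scalar sketch of the two conversions. Only the names from_u8 and to_u8 come from the commit message; the bodies are an assumption for illustration (plain round-to-nearest scaling between 0..255 and the 0x0000..0x8000 range), and the real stages work on vectors and may round differently.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar stand-ins for the vector helpers named in the commit.
// "fixed15" here means a 16-bit channel where 0x0000 is 0.0 and 0x8000 is 1.0.
static uint16_t from_u8(uint8_t v) {
    // Scale by 32768/255, rounding to nearest.
    return static_cast<uint16_t>((static_cast<uint32_t>(v) * 32768u + 127u) / 255u);
}

static uint8_t to_u8(uint16_t f) {
    assert(f <= 0x8000);  // values above 1.0 are outside this sketch
    // Scale by 255/32768, rounding to nearest.
    return static_cast<uint8_t>((static_cast<uint32_t>(f) * 255u + 16384u) >> 15);
}
```

With this rounding, every 8-bit value survives a round trip through fixed15 unchanged, which is the property one would want from any real implementation as well.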
* SkSplicer: lowp hacking (Mike Klein, 2017-01-19)

  Add lowp variants for most stages in SkSplicer.

  These double the number of pixels handled by representing each channel with 16 bits, ranging from 0x0000 as 0 to 0x8000 as 1. This format lets us use the Q15 multiply instructions available in NEON and SSSE3 at full register width, with a little platform-specific fix up to smooth over the fact that these aren't quite Q15 values.

  When a lowp stage is unavailable, the entire pipeline upgrades to floats. So by simply not implementing sRGB, f16, matrix multiplication, etc, we naturally express that they're best handled with floats.

  These lowp stages ended up different enough that I've found it clearer to have them live in their own files, noting where they differ from the float stages.

  HSW, aarch64, and armv7 are all supported. I've seen very good things performance-wise on all platforms.

  Change-Id: Ib4f820c6665f2c9020f7449a2b51bbaf6c408a63
  Reviewed-on: https://skia-review.googlesource.com/7098
  Reviewed-by: Herb Derby <herb@google.com>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
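As a rough illustration of the 0x0000..0x8000 format described above, the following scalar sketch multiplies two channels with rounding. The name mul_fixed15 is hypothetical and this is not the vector code from the commit; it only models the arithmetic. The SSSE3 instruction pmulhrsw computes the same (a*b + 0x4000) >> 15 per lane, but on signed 16-bit values, so 0x8000 would read as -1.0 there; that mismatch is presumably what the platform-specific fix up addresses.

```cpp
#include <cassert>
#include <cstdint>

// Scalar model of a rounding multiply in the 0x0000 (0.0) .. 0x8000 (1.0) format.
// Hypothetical helper; the real stages do this across a full vector register.
static uint16_t mul_fixed15(uint16_t a, uint16_t b) {
    assert(a <= 0x8000 && b <= 0x8000);  // inputs must lie in [0.0, 1.0]
    // Widen to 32 bits, add half an output unit, then drop the 15 fractional bits.
    return static_cast<uint16_t>((static_cast<uint32_t>(a) * b + 0x4000u) >> 15);
}
```

Because 1.0 is exactly 0x8000 rather than 0x7fff, multiplying by 1.0 returns the other operand unchanged, which a true Q15 format cannot do.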
* SkSplicer: no need for AI. Clang is good at this. (Mike Klein, 2017-01-17)

  Change-Id: I1d5b82c0c2748b4d206d8d104fdd5dc04dc2693b
  Reviewed-on: https://skia-review.googlesource.com/7116
  Commit-Queue: Mike Klein <mtklein@chromium.org>
  Reviewed-by: Mike Klein <mtklein@chromium.org>
* SkSplicer: fix a typo in srcover stage. (Mike Klein, 2017-01-17)

  Change-Id: Iafd23c860395587c77cd412a3b522ba851b4570d
  Reviewed-on: https://skia-review.googlesource.com/7107
  Commit-Queue: Mike Klein <mtklein@chromium.org>
  Reviewed-by: Mike Klein <mtklein@chromium.org>
  Reviewed-by: Herb Derby <herb@google.com>
* SkSplicer: implement load_tables and matrix_3x4 (Mike Klein, 2017-01-13)

  These are enough to splice interesting SkColorSpaceXform pipelines.

  SkSplicer_stages.cpp is similar to but still intentionally distinct from SkRasterPipeline_opts. I hope to unify them next week.

  unaligned_load() is nothing tricky... just a little refactor.

  Change-Id: I05d0fc38dac985aa351d88776ecc14d2457f2124
  Reviewed-on: https://skia-review.googlesource.com/7022
  Reviewed-by: Matt Sarett <msarett@google.com>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* some armv7 hacking (Mike Klein, 2017-01-13)

  We can splice these stages if we drop them down to 2 at a time.

  Turns out this is significantly (2-3x) faster than the status quo.

      SkRasterPipeline_…
        …f16_compile   1x
        …srgb_compile  2.06x
        …f16_run       3.08x
        …srgb_run      4.61x

  Added a couple ways to detect (likely) the required VFPv4 support:
    - use hwcap when available (NDK ≥21, Android framework)
    - use cpu-features when not (NDK <21)

  The code in SkSplicer_generated.h is ARM, not Thumb2. SkSplicer seems to be blx'ing into it, so that's great, and we bx lr out. There's no point in attempting to use Thumb2 in vector heavy code... it'll all be 4 byte anyway.

  Follow ups:
    - vpush {d8-d9} before the loop, vpop {d8-d9} afterwards, skip these instructions when splicing;
    - (probably) drop jumping stages down to 2-at-a-time also.

  Change-Id: If151394ec10e8cbd6a05e2d81808488d743bfe15
  Reviewed-on: https://skia-review.googlesource.com/6940
  Reviewed-by: Herb Derby <herb@google.com>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
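The hwcap route mentioned above might look roughly like the sketch below. The helper name cpu_has_vfpv4 is hypothetical and this is not the code from the commit; it assumes a 32-bit ARM Linux or Android target where getauxval() from <sys/auxv.h> and HWCAP_VFPv4 from <asm/hwcap.h> are available. On any other target the sketch simply reports false, and the cpu-features fallback is not shown.

```cpp
#if defined(__linux__) && defined(__arm__)
    #include <asm/hwcap.h>   // HWCAP_VFPv4
    #include <sys/auxv.h>    // getauxval, AT_HWCAP
#endif

// Hypothetical helper: true when the kernel reports VFPv4 support.
static bool cpu_has_vfpv4() {
#if defined(__linux__) && defined(__arm__)
    // The kernel exposes CPU feature bits through the auxiliary vector.
    return (getauxval(AT_HWCAP) & HWCAP_VFPv4) != 0;
#else
    return false;  // not 32-bit ARM Linux: nothing to detect in this sketch
#endif
}
```

A caller would check this once and fall back to the non-spliced path when it returns false.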
* SkSplicer: no need to set_k() manually. (Mike Klein, 2017-01-10)

  set_k() is simulating the effect of loading a pointer constant into the register used to pass the 4th function argument. An easier way to do this is to pass that pointer constant as the 4th function argument... duh.

  Change-Id: I5604d6bbadd86eaaa82f8c4391080f6489b1927f
  Reviewed-on: https://skia-review.googlesource.com/6843
  Reviewed-by: Herb Derby <herb@google.com>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* SkSplicer: test and fix loop logic (Mike Klein, 2017-01-10)

  The new test would fail without the change in SkSplicer.cpp to call fSpliced(x,x+body) instead of fSpliced(x,body).

  The rest of the changes are cosmetic, mostly renaming n to limit.

  Change-Id: Iae28802d0adb91e962ed3ee60fa5a4334bd140f9
  Reviewed-on: https://skia-review.googlesource.com/6837
  Reviewed-by: Herb Derby <herb@google.com>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
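The difference between fSpliced(x,body) and fSpliced(x,x+body) can be sketched as follows, assuming the second argument is an absolute end index (the "limit" the rename suggests). Everything here is a stand-in for illustration: kStride, spliced, run, and the hits vector are hypothetical, and the real spliced code is machine code rather than a C++ loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kStride = 8;  // hypothetical number of pixels per iteration

// Stand-in for the spliced loop: handles [x, limit) kStride pixels at a time.
static void spliced(std::size_t x, std::size_t limit, std::vector<int>& hits) {
    assert(x <= limit && (limit - x) % kStride == 0);
    for (; x < limit; x += kStride) {
        for (std::size_t i = 0; i < kStride; i++) { hits[x + i]++; }
    }
}

// Hypothetical driver: process n pixels starting at x.
static void run(std::size_t x, std::size_t n, std::vector<int>& hits) {
    std::size_t body = n / kStride * kStride;  // whole strides only
    spliced(x, x + body, hits);                // limit is absolute, hence x + body
    for (std::size_t i = x + body; i < x + n; i++) { hits[i]++; }  // leftover tail
}
```

In this model, with x = 20 and n = 19, body is 16; passing body alone as the limit would make the loop condition 20 < 16 false on entry, so no body pixels would be processed at all.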
* SkSplicer: start on arm64 (Mike Klein, 2017-01-10)

  Seems to be working. The jump to loop_start might be a little off, but not by much. Correctness is really still a big TODO.

      $ adb shell 'cd /data/local/tmp; ./monobench SkRasterPipeline 200'
      SkRasterPipeline_…  200
        …f16_compile   1x
        …f16_run       1.42x
        …srgb_compile  2.21x
        …srgb_run      2.59x

  Change-Id: I0e1acc6404cf3ce8084d9ef8011cbe0b5f1fd6e3
  Reviewed-on: https://skia-review.googlesource.com/6811
  Reviewed-by: Herb Derby <herb@google.com>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* SkSplicer (Mike Klein, 2017-01-09)

  I think I may have cracked the compile-ahead-of-time-splice-at-runtime nut.

  This compiles stages ahead of time using clang, then splices them together at runtime. This means the stages can be written in simple C++, with some mild restrictions.

  This performs identically to our Xbyak experiment, and already supports more stages. As written this stands alone from SkRasterPipeline_opts.h, but I'm fairly confident that the bulk (the STAGE implementations) can ultimately be shared.

  As of PS 25 or so, this also supports all the stages used by bench/SkRasterPipelineBench.cpp:

      SkRasterPipeline_…  400
        …f16_compile   1x
        …f16_run       1.38x
        …srgb_compile  1.89x
        …srgb_run      2.21x

  That is, ~30% faster than baseline for f16, ~15% faster for sRGB.

  Change-Id: I1ec7dcb769613713ce56978c58038f606f87d63d
  Reviewed-on: https://skia-review.googlesource.com/6733
  Reviewed-by: Herb Derby <herb@google.com>
  Commit-Queue: Mike Klein <mtklein@chromium.org>