This just tidies up the .cpp files a bit, and makes it easier to make
sure all exported functions use the aapcs-vfp calling convention, which
hard-float implies.
As a small simplification, fold -march=armv7-a into --target.
No generated code changes.
Change-Id: I2694970a6e48bd69c41dd280a44ddd0029e52ae8
Reviewed-on: https://skia-review.googlesource.com/7371
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Mike Klein <mtklein@chromium.org>
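One way to make "all exported functions use aapcs-vfp" easy to enforce is to route every export through a single macro. This is a minimal sketch under that assumption: the macro name SPLICER_EXPORT and the function scale_by are hypothetical; only the pcs("aapcs-vfp") attribute itself is a real GCC/Clang attribute for 32-bit ARM.

```cpp
#include <cassert>

// Hypothetical macro: on 32-bit ARM, give every exported stage the
// aapcs-vfp calling convention; on other targets it is plain extern "C".
#if defined(__arm__)
    #define SPLICER_EXPORT extern "C" __attribute__((pcs("aapcs-vfp")))
#else
    #define SPLICER_EXPORT extern "C"
#endif

// Under aapcs-vfp, float arguments and results are passed in VFP
// registers (s0, s1, ...) instead of core registers (r0-r3).
SPLICER_EXPORT float scale_by(float x, float k) {
    return x * k;
}
```

With one macro, a stage that forgets the attribute stands out in review, and changing the convention later is a one-line edit.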
I noticed scale_u8 is implemented in SkSplicer_stages but not _lowp.
That's not for any good reason... scale_u8 makes fine sense in _lowp.
All other stages missing in _lowp are nuts to attempt without floats.
This also renames the to_fixed15 lambdas to from_u8 functions.
Everything in the file converts to or from fixed15; the interesting
question is the other format. Similarly, from_fixed15 becomes to_u8.
Change-Id: I10616b6772c65bd1acb9857f4f5b5f70a4f01bf4
Reviewed-on: https://skia-review.googlesource.com/7323
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
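A scalar sketch of what from_u8, to_u8 and a lowp scale_u8 compute, assuming the lowp fixed15 format (0x0000 as 0, 0x8000 as 1). The rounding choices and the PixelFixed15 struct are illustrative assumptions, not Skia's code, which operates on whole vectors at once.

```cpp
#include <cassert>
#include <cstdint>

// u8 [0,255] -> fixed15 [0,0x8000], approximating v * 32768/255 as
// v*128 + v/2 + (v+1)/256; the last term makes 255 map exactly to 0x8000.
static uint16_t from_u8(uint8_t u8) {
    uint32_t v = u8;
    return static_cast<uint16_t>((v << 7) + (v >> 1) + ((v + 1) >> 8));
}

// fixed15 [0,0x8000] -> u8 [0,255], rounding v * 255/32768 to nearest.
static uint8_t to_u8(uint16_t v) {
    assert(v <= 0x8000);
    return static_cast<uint8_t>((uint32_t(v) * 255 + 0x4000) >> 15);
}

// Rounded fixed15 multiply: a*b / 0x8000, rounded to nearest.
static uint16_t mul(uint16_t a, uint16_t b) {
    return static_cast<uint16_t>((uint32_t(a) * b + 0x4000) >> 15);
}

// One pixel, four fixed15 channels (hypothetical layout for this sketch).
struct PixelFixed15 { uint16_t r, g, b, a; };

// scale_u8: scale every channel by an 8-bit coverage value.
static PixelFixed15 scale_u8(PixelFixed15 px, uint8_t coverage) {
    uint16_t c = from_u8(coverage);
    return { mul(px.r, c), mul(px.g, c), mul(px.b, c), mul(px.a, c) };
}
```

Every 8-bit value survives a from_u8/to_u8 round trip, since from_u8 is off by less than one fixed15 step, far below half a u8 step.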
Add lowp variants for most stages in SkSplicer. These double the number
of pixels handled per register by representing each channel with 16 bits, ranging from
0x0000 as 0 to 0x8000 as 1. This format lets us use the Q15 multiply
instructions available in NEON and SSSE3 at full register width, with
a little platform-specific fix up to smooth over the fact that these
aren't quite Q15 values.
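The mismatch with true Q15 can be modeled one lane at a time. This sketch assumes the x86 path pairs pmulhrsw (_mm_mulhrs_epi16) with an absolute value; that pairing is an assumption, not taken from the commit. pmulhrsw reads 0x8000 as -1.0, so any product with 1.0 comes out negated, and abs() restores it. NEON's vqrdmulh instead saturates 0x8000 * 0x8000 to 0x7fff, so ARM would need a different tweak.

```cpp
#include <cassert>
#include <cstdint>

// Reinterpret a 16-bit lane as signed, the way Q15 instructions see it.
static int32_t as_signed(uint16_t v) {
    return v >= 0x8000 ? int32_t(v) - 0x10000 : int32_t(v);
}

// Scalar model of one pmulhrsw lane: ((a*b >> 14) + 1) >> 1 on signed
// inputs, keeping only the low 16 bits of the result.
static uint16_t mulhrs_lane(uint16_t a, uint16_t b) {
    int32_t p = as_signed(a) * as_signed(b);
    int32_t r = ((p >> 14) + 1) >> 1;   // in [-32768, 32768]
    return static_cast<uint16_t>(r);    // 32768 wraps to 0x8000
}

// Scalar model of one _mm_abs_epi16 lane (0x8000 stays 0x8000).
static uint16_t abs_lane(uint16_t v) {
    int32_t s = as_signed(v);
    return static_cast<uint16_t>(s < 0 ? -s : s);
}

// fixed15 multiply built from the Q15 instruction plus the abs fix-up.
static uint16_t mul_fixed15(uint16_t a, uint16_t b) {
    assert(a <= 0x8000 && b <= 0x8000);  // the fix-up relies on this range
    return abs_lane(mulhrs_lane(a, b));
}

// Exact reference: a*b / 0x8000, rounded to nearest.
static uint16_t mul_exact(uint16_t a, uint16_t b) {
    return static_cast<uint16_t>((uint32_t(a) * b + 0x4000) >> 15);
}
```

For inputs in [0, 0x8000], mul_fixed15 matches mul_exact: when neither input is 0x8000 the product is positive and unchanged by abs, and when one is 0x8000 the raw result is exactly the negated answer.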
When a lowp stage is unavailable, the entire pipeline upgrades to
floats. So by simply not implementing sRGB, f16, matrix multiplication,
etc., we naturally express that they're best handled with floats.
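The upgrade rule is an all-or-nothing check over the pipeline. A sketch with a hypothetical Stage enum and lowp set; the stage names are invented for illustration, apart from scale_u8.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical stages for this sketch.
enum class Stage { load_8888, scale_u8, from_srgb, load_f16, matrix_3x4, store_8888 };

enum class Precision { kLowp, kFloat };

// Stages with a lowp version; sRGB, f16 and matrix math are left out on purpose.
static const std::set<Stage> kLowpStages = {
    Stage::load_8888, Stage::scale_u8, Stage::store_8888,
};

// If any stage lacks a lowp version, the whole pipeline runs in floats.
static Precision choose_precision(const std::vector<Stage>& pipeline) {
    for (Stage s : pipeline) {
        if (kLowpStages.count(s) == 0) {
            return Precision::kFloat;
        }
    }
    return Precision::kLowp;
}
```

Because there is no per-stage mixing, omitting a lowp stage is all it takes to route a pipeline to floats.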
These lowp stages ended up different enough that I've found it clearer
to have them live in their own files, noting where they differ from the
float stages. HSW, aarch64, and armv7 are all supported.
I've seen very good things performance-wise on all platforms.
Change-Id: Ib4f820c6665f2c9020f7449a2b51bbaf6c408a63
Reviewed-on: https://skia-review.googlesource.com/7098
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
Change-Id: I1d5b82c0c2748b4d206d8d104fdd5dc04dc2693b
Reviewed-on: https://skia-review.googlesource.com/7116
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Mike Klein <mtklein@chromium.org>
Change-Id: Iafd23c860395587c77cd412a3b522ba851b4570d
Reviewed-on: https://skia-review.googlesource.com/7107
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Mike Klein <mtklein@chromium.org>
Reviewed-by: Herb Derby <herb@google.com>
These are enough to splice interesting SkColorSpaceXform pipelines.
SkSplicer_stages.cpp is similar to but still intentionally distinct from
SkRasterPipeline_opts. I hope to unify them next week.
unaligned_load() is nothing tricky... just a little refactor.
Change-Id: I05d0fc38dac985aa351d88776ecc14d2457f2124
Reviewed-on: https://skia-review.googlesource.com/7022
Reviewed-by: Matt Sarett <msarett@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
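unaligned_load() is presumably the usual memcpy idiom. This sketch assumes that shape; the template parameters are an assumption, not copied from SkSplicer_stages.cpp.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Load a T from memory that may not be aligned for T. memcpy is the
// well-defined way to do this; compilers typically lower it to a single
// unaligned load instruction on targets that support one.
template <typename T, typename P>
static T unaligned_load(const P* p) {
    T v;
    std::memcpy(&v, p, sizeof(v));
    return v;
}
```

Casting the pointer and dereferencing it instead would be undefined behavior for misaligned addresses, even on CPUs that tolerate unaligned loads.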
We can splice these stages if we drop them down to 2 at a time.
Turns out this is significantly (2-3x) faster than the status quo.
SkRasterPipeline_…
…f16_compile 1x …srgb_compile 2.06x …f16_run 3.08x …srgb_run 4.61x
Added a couple ways to detect (likely) the required VFPv4 support:
- use hwcap when available (NDK ≥21, Android framework)
- use cpu-features when not (NDK <21)
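A sketch of the hwcap path, assuming getauxval() and AT_HWCAP are available. HWCAP_VFPv4 is bit 16 in the Linux 32-bit ARM hwcap ABI; the fallback #define covers headers that don't expose that name. The cpu-features path for older NDKs is not shown.

```cpp
#include <cassert>

#if defined(__arm__) && defined(__linux__)
    #include <sys/auxv.h>   // getauxval, AT_HWCAP
    #ifndef HWCAP_VFPv4
        #define HWCAP_VFPv4 (1 << 16)   // bit from the Linux ARM <asm/hwcap.h>
    #endif
#endif

// True if the kernel reports VFPv4 (which adds fused multiply-add).
static bool has_vfpv4() {
#if defined(__arm__) && defined(__linux__)
    return (getauxval(AT_HWCAP) & HWCAP_VFPv4) != 0;
#else
    return false;   // not 32-bit ARM Linux/Android: this check doesn't apply
#endif
}
```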
The code in SkSplicer_generated.h is ARM, not Thumb2. SkSplicer seems
to be blx'ing into it, so that's great, and we bx lr out. There's no
point in attempting to use Thumb2 in vector heavy code... it'll all be
4 byte anyway.
Follow ups:
- vpush {d8-d9} before the loop, vpop {d8-d9} afterwards,
skip these instructions when splicing;
- (probably) drop jumping stages down to 2-at-a-time also.
Change-Id: If151394ec10e8cbd6a05e2d81808488d743bfe15
Reviewed-on: https://skia-review.googlesource.com/6940
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
set_k() is simulating the effect of loading a pointer constant into the register used to pass the 4th function argument. An easier way to do this is to pass that pointer constant as the 4th function argument... duh.
Change-Id: I5604d6bbadd86eaaa82f8c4391080f6489b1927f
Reviewed-on: https://skia-review.googlesource.com/6843
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
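A sketch of the calling-convention point, with a hypothetical Constants struct and stage signature. Under the standard ABIs the 4th integer/pointer argument arrives in a fixed register (rcx on x86-64 System V, r3 on 32-bit ARM, x3 on AArch64), so passing the pointer there puts it exactly where set_k() was arranging for it to be.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical constants every stage may need.
struct Constants {
    float one;
    float half;
};

// Hypothetical stage signature: constants pointer as the 4th argument.
using StageFn = void (*)(size_t x, size_t limit, void* ctx, const Constants* k);

// Example stage: halve the float that ctx points to.
static void halve_ctx(size_t x, size_t limit, void* ctx, const Constants* k) {
    assert(x <= limit);
    *static_cast<float*>(ctx) *= k->half;
}
```

Usage: `fn(0, 4, &value, &kConstants)` hands the stage its constants through an ordinary function call, no register poking required.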
The new test would fail without the change in SkSplicer.cpp to call fSpliced(x,x+body) instead of fSpliced(x,body). The rest of the changes are cosmetic, mostly renaming n to limit.
Change-Id: Iae28802d0adb91e962ed3ee60fa5a4334bd140f9
Reviewed-on: https://skia-review.googlesource.com/6837
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
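Why x+body matters: the spliced code treats its second argument as an end index, not a count, so passing body only works when x is 0. A sketch assuming a hypothetical stride of 4 pixels and a scalar tail loop; the function names are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr size_t kStride = 4;  // hypothetical pixels per spliced iteration

// Stand-in for spliced code: processes [x, limit) in steps of kStride.
static void spliced(size_t x, size_t limit, std::vector<int>& hits) {
    assert(x <= limit && (limit - x) % kStride == 0);
    for (; x < limit; x += kStride) {
        for (size_t i = 0; i < kStride; ++i) hits[x + i]++;
    }
}

// Leftover pixels, one at a time.
static void tail(size_t x, size_t limit, std::vector<int>& hits) {
    for (; x < limit; ++x) hits[x]++;
}

// Run n pixels starting at x.
static void run(size_t x, size_t n, std::vector<int>& hits) {
    size_t body = n / kStride * kStride;
    spliced(x, x + body, hits);   // end index, not spliced(x, body)
    tail(x + body, x + n, hits);
}
```

With x = 3 and n = 10, the buggy call spliced(3, 8) would cover the wrong range (and trip the assert); spliced(3, 11) covers exactly the 8-pixel body.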
Seems to be working. The jump to loop_start might be a little off, but not by much. Correctness is really still a big TODO.
$ adb shell 'cd /data/local/tmp; ./monobench SkRasterPipeline 200'
SkRasterPipeline_…
200 …f16_compile 1x …f16_run 1.42x …srgb_compile 2.21x …srgb_run 2.59x
Change-Id: I0e1acc6404cf3ce8084d9ef8011cbe0b5f1fd6e3
Reviewed-on: https://skia-review.googlesource.com/6811
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
I think I may have cracked the compile-ahead-of-time-splice-at-runtime nut.
This compiles stages ahead of time using clang, then splices them together at runtime. This means the stages can be written in simple C++, with some mild restrictions.
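The splice-at-runtime half, reduced to its data flow. StageCode, splice() and the byte values below are hypothetical placeholders, not real instructions. Making the buffer executable (for example with mmap and mprotect on POSIX) and calling into it are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One stage's machine code, compiled ahead of time, assumed already trimmed
// of its trailing return so execution falls through to the next stage.
struct StageCode {
    const uint8_t* bytes;
    size_t size;
};

// Concatenate the chosen stages, then append the loop-control code.
static std::vector<uint8_t> splice(const std::vector<StageCode>& stages,
                                   const std::vector<uint8_t>& loop_tail) {
    std::vector<uint8_t> buf;
    for (const StageCode& s : stages) {
        buf.insert(buf.end(), s.bytes, s.bytes + s.size);
    }
    buf.insert(buf.end(), loop_tail.begin(), loop_tail.end());
    // A real splicer would copy buf into executable memory before calling it.
    return buf;
}
```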
This performs identically to our Xbyak experiment, and already supports more stages. As written this stands alone from SkRasterPipeline_opts.h, but I'm fairly confident that the bulk (the STAGE implementations) can ultimately be shared.
As of PS 25 or so, this also supports all the stages used by bench/SkRasterPipelineBench.cpp:
SkRasterPipeline_…
400 …f16_compile 1x …f16_run 1.38x …srgb_compile 1.89x …srgb_run 2.21x
That is, ~30% faster than baseline for f16, ~15% faster for sRGB.
Change-Id: I1ec7dcb769613713ce56978c58038f606f87d63d
Reviewed-on: https://skia-review.googlesource.com/6733
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>