| Commit message (Collapse) | Author | Age |
|
histogram of test skps:
black: 1/7
white: 2/7
other: 4/7
Bug: skia:
Change-Id: I3a092899d31ce87837e66e5c8ea9ec5e0f239361
Reviewed-on: https://skia-review.googlesource.com/21408
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Reed <reed@google.com>
|
This is a start to eliminating swap_rb as a stage.
I've just hit the main hot spots here. Going to look into
the ~dozen other spots to see how they should work next.
Change-Id: I26fb46a042facf7bd6fff3b47c9fcee86d7142fd
Reviewed-on: https://skia-review.googlesource.com/20982
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Mike Reed <reed@google.com>
|
Change-Id: I25619f010f8ac6441529cfe8dff2d8c42d7400cf
Reviewed-on: https://skia-review.googlesource.com/20988
Reviewed-by: Mike Reed <reed@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
Bug: skia:
Change-Id: I75d82ef2226c5f116b7de2208c4e914739414b6d
Reviewed-on: https://skia-review.googlesource.com/20984
Commit-Queue: Mike Reed <reed@google.com>
Reviewed-by: Mike Klein <mtklein@chromium.org>
|
Generally stages take care of state setup themselves, either with
seed_shader, constant_color, a load, etc. I think these zeros may
be unnecessarily cautious.
This can't make anything draw more correctly, but it could make things
- draw wrong
- draw more slowly
- draw more quickly
so it's an interesting thing to try and keep an eye on.
Change-Id: I7e5ea3cd79e55a65e1dbd214601e147ba3815b87
Reviewed-on: https://skia-review.googlesource.com/20976
Reviewed-by: Mike Reed <reed@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
CQ_INCLUDE_TRYBOTS=skia.primary:Build-Ubuntu-Clang-x86_64-Debug-MSAN
Change-Id: Id53279c17589b3434629bb644358ee238af8649f
Reviewed-on: https://skia-review.googlesource.com/20269
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Herb Derby <herb@google.com>
Reviewed-by: Mike Reed <reed@google.com>
|
No reason to keep going one at a time when we know there are generally
better ways to handle loading a power-of-two number of low lanes.
This strategy scales up too, with quick answers for 8 (one 8 byte load),
12 (one 8 byte, one 4 byte), etc.
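The chunked tail load described above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the function name `load_tail`, the 16-byte vector width, and the use of bytes rather than pixel lanes as the unit are all assumptions made for the example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: copy the first n bytes (n < 16) of src into a zeroed
// 16-byte buffer. Instead of n single-byte loads, it issues one copy per set
// bit of n, in chunks of 8, 4, 2, then 1 bytes.
std::array<uint8_t, 16> load_tail(const uint8_t* src, size_t n) {
    assert(n < 16);
    std::array<uint8_t, 16> v{};  // bytes past the tail stay zero
    size_t offset = 0;
    for (size_t chunk = 8; chunk >= 1; chunk /= 2) {
        if (n & chunk) {
            std::memcpy(v.data() + offset, src + offset, chunk);
            offset += chunk;
        }
    }
    return v;
}
```

With this shape, a tail of 8 costs one 8-byte copy and a tail of 12 costs one 8-byte copy plus one 4-byte copy, matching the cases mentioned in the message.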
$ ninja -C out monobench; and out/monobench SkRasterPipeline_compile 300
Before: 46.946ns
After: 43.341ns
(This happens to be _lowp. Expect similar small speedups elsewhere.)
Change-Id: I08f87769ea3c9f06ad13d2b1d5326e542b9b63a8
Reviewed-on: https://skia-review.googlesource.com/20903
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
This refactors {from,to}_{byte,8888} to lean a bit more on the compiler,
and to share code between the two. The algorithm is not exactly the
same, but it's comparable, and the results of course are identical.
This new algorithm is a lot easier to generalize to AVX2, and parallels
the full-precision {from,to}_{byte,8888} functions in _stages.cpp.
Change-Id: I31ea90d65967bf4ede2497d1e2197cb0e7648bf8
Reviewed-on: https://skia-review.googlesource.com/20828
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
This doesn't change the generated code (no .S files change),
but it does rephrase what we're trying to do to make it
generalize to AVX2 better:
- load 4 floats
- add 256.0f to each
- splat out the low 2 bytes of each 4 byte lane as r,g,b,a
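A scalar sketch of why the three steps above work, assuming the inputs are floats in [0,1] and the target is a 16-bit fixed-point value where 0x8000 represents 1.0. The function name `float_to_fixed15` is hypothetical. Floats in [256, 512) are spaced exactly 2^-15 apart, so after adding 256.0f the low 16 bits of the float's bit pattern hold the input scaled by 32768, rounded by the float addition itself.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical scalar version of the trick: convert x in [0,1] to 16-bit
// fixed point (0x8000 == 1.0) with one float add and no int conversion.
uint16_t float_to_fixed15(float x) {
    assert(x >= 0.0f && x <= 1.0f);
    float biased = x + 256.0f;            // the add rounds to the nearest 2^-15
    uint32_t bits;
    std::memcpy(&bits, &biased, sizeof bits);
    return static_cast<uint16_t>(bits & 0xffff);  // low 2 bytes of the lane
}
```

For example, 0.5f becomes 256.5f, whose bit pattern is 0x43804000, so the low two bytes are 0x4000.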
Change-Id: Iadc5bc1f2a268679d1ccadd31cd24949a71e0aa4
Reviewed-on: https://skia-review.googlesource.com/20270
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
JUMPER is always defined in that file;
we never use it as a portable fallback.
Change-Id: Ic7caf726191599d4058adbf80084ede9f80676ee
Reviewed-on: https://skia-review.googlesource.com/20271
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
I have figured out how to implement lowp clamp_1/clamp_a, and
implementing clamp_1 would make lowp plus active.
But... the way we have factored blend modes requires us to be able to
lerp between the dst and possibly-out-of-range src values. This is not
possible in lowp. If we try to multiply with values in [0x8001,0xffff],
we'll just get garbage. We'll clamp them back in range, but sadly
clamped garbage is still garbage.
So the simplest thing to do is keep plus blends in floats. This CL
doesn't even change that... we'd use floats before and after it. It
just removes the lowp plus stage code that is both dead and buggy.
As far as I can tell, no other drawing is currently gated by lowp
missing clamp_1 or clamp_a.
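The out-of-range multiply problem can be illustrated with a scalar model. The message does not spell out how lowp multiplies, so this sketch assumes a signed 16-bit rounding multiply (in the style of pmulhrsw) followed by an absolute value, so that 0x8000 behaves as 1.0. The name `lowp_mul` is hypothetical.

```cpp
#include <cstdint>
#include <cstdlib>

// Hypothetical model of a lowp multiply on 16-bit fixed point values where
// 0x8000 == 1.0. Assumes a signed rounding multiply plus abs(); relies on
// two's complement casts and arithmetic right shift (guaranteed since C++20,
// and the behavior of mainstream compilers before that).
uint16_t lowp_mul(uint16_t x, uint16_t y) {
    int32_t a = static_cast<int16_t>(x);   // values above 0x7fff turn negative
    int32_t b = static_cast<int16_t>(y);
    int32_t p = ((a * b >> 14) + 1) >> 1;  // rounded high half of the product
    return static_cast<uint16_t>(std::abs(p));
}
```

Under these assumptions, in-range inputs behave: 0x4000 * 0x4000 (0.5 * 0.5) gives 0x2000, and 0x8000 * 0x4000 (1.0 * 0.5) gives 0x4000. An out-of-range 0xC000 (nominally 1.5) is read as -0.5, so 0xC000 * 0x4000 yields 0x2000 rather than the 0x6000 that 0.75 would require, and no later clamp can recover the right answer.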
Change-Id: I55b73c840614f1bff9cd610dff90ca5e2b5c73e5
Reviewed-on: https://skia-review.googlesource.com/19909
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
|
This is the same logic from constant_color, covering all the other
places where we convert from float to fixed, e.g. scale_1_float.
This isn't quite ideal yet. We replace mulss+cvttss2si with addss+movd,
which is great, but this leads to a silly sequence of code:
    addss %xmm2, %xmm0
    movd %xmm0, %r9d
    movd %r9d, %xmm0
    pshuflw $0x0, %xmm0, %xmm0
Those two movd are pointless...
Again, all diffs due to switching from truncation to rounding.
Change-Id: Icf6f3b6eb370fe41cea0cebcfda0b8907e055f41
Reviewed-on: https://skia-review.googlesource.com/18846
Reviewed-by: Mike Reed <reed@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
This is as good as we can get without switching away from float inputs.
All diffs due to rounding (from the +256.0f).
Change-Id: I0d314f111d313577ce9078660178be17e865f11e
Reviewed-on: https://skia-review.googlesource.com/18845
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Mike Reed <reed@google.com>
|
Change-Id: I8a292bc98135b41ceedb4242451436c3657616fc
Reviewed-on: https://skia-review.googlesource.com/18722
Reviewed-by: Mike Reed <reed@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
Change-Id: Id62e989d4278f273c040b159ed4d2fd6a2f209e0
Reviewed-on: https://skia-review.googlesource.com/18627
Reviewed-by: Herb Derby <herb@google.com>
Reviewed-by: Mike Reed <reed@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
srcover_rgba_8888, lerp_u8, lerp_1_float, scale_u8, scale_1_float...
this is enough for _lots_ of drawing.
Change-Id: Ibe42adb8b1da6c66db3085851561dc9070556ee3
Reviewed-on: https://skia-review.googlesource.com/18622
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
This is enough for us to do some really simple draws.
Also add some debug tools to help prioritize porting.
Change-Id: I334f8fd2133be1aeec3f3406371a81aa6c184776
Reviewed-on: https://skia-review.googlesource.com/18597
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Herb Derby <herb@google.com>
|
This is enough to run the bench SkRasterPipeline_compile.
$ ninja -C out monobench; and out/monobench SkRasterPipeline_compile 300
Before: 300 SkRasterPipeline_compile 48.4858ns
After: 300 SkRasterPipeline_compile 37.5801ns
Change-Id: Icb80348908dfb016826700a44566222c9f7a853c
Reviewed-on: https://skia-review.googlesource.com/18595
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
We can use 2 pshufb to replace 4 unpacks when deinterlacing the colors.
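To show the general idea of deinterlacing with a byte shuffle, here is a scalar model of pshufb applied to four interleaved RGBA pixels. The commit does not show its shuffle masks or register layout, so the mask and the names `shuffle_bytes` and `deinterlace_rgba` are hypothetical; only the pshufb semantics (index by the low 4 bits, zero the byte when the mask's high bit is set) are taken from the instruction's definition.

```cpp
#include <array>
#include <cstdint>

using Bytes16 = std::array<uint8_t, 16>;

// Scalar model of SSSE3 pshufb: each output byte picks src[mask & 0x0f],
// or becomes zero when the mask byte has its high bit set.
Bytes16 shuffle_bytes(const Bytes16& src, const Bytes16& mask) {
    Bytes16 out{};
    for (int i = 0; i < 16; i++) {
        out[i] = (mask[i] & 0x80) ? uint8_t{0} : src[mask[i] & 0x0f];
    }
    return out;
}

// Hypothetical mask: turn rgba rgba rgba rgba into rrrr gggg bbbb aaaa
// with a single shuffle.
Bytes16 deinterlace_rgba(const Bytes16& pixels) {
    const Bytes16 by_channel = {0, 4, 8, 12, 1, 5, 9, 13,
                                2, 6, 10, 14, 3, 7, 11, 15};
    return shuffle_bytes(pixels, by_channel);
}
```

One shuffle per 16-byte register does the regrouping that would otherwise take a chain of unpack instructions, which is presumably where the saving described in the message comes from.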
Change-Id: I713fbbc94f5cb9eaf14f85323b0ec76dc2246e98
Reviewed-on: https://skia-review.googlesource.com/18531
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Herb Derby <herb@google.com>
|
This is spooky.
I don't quite yet understand why, but this makes things much faster.
Performance regressed across the board when we no longer needed the
value and changed it to return void:
https://perf.skia.org/e/?begin=1496176469&keys=6994&xbaroffset=28513
You can see similar regressions following this Chromium bug link.
BUG=chromium:729237
Change-Id: I68371b0456014f909acf819aca52aa4f4f187460
Reviewed-on: https://skia-review.googlesource.com/18580
Reviewed-by: Herb Derby <herb@google.com>
Reviewed-by: Mike Reed <reed@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
Just 3 stages implemented so far:
load_8888
swap_rb
store_8888
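As a rough picture of what such a three-stage pipeline does, here is a toy scalar version. It is not Skia's API: the `Lanes` struct, the `StageFn` type, and `run_pipeline` are hypothetical, and the sketch assumes 8888 means bytes in r,g,b,a order in memory, read here as a little-endian 32-bit word.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical per-pixel working state shared by all stages.
struct Lanes {
    uint8_t r, g, b, a;
    const uint32_t* src;
    uint32_t* dst;
};
using StageFn = void (*)(Lanes&);

// Unpack one pixel, assuming r is the lowest byte of the 32-bit word.
void load_8888(Lanes& l) {
    uint32_t px = *l.src;
    l.r = static_cast<uint8_t>(px);
    l.g = static_cast<uint8_t>(px >> 8);
    l.b = static_cast<uint8_t>(px >> 16);
    l.a = static_cast<uint8_t>(px >> 24);
}

void swap_rb(Lanes& l) { std::swap(l.r, l.b); }

// Repack the channels in the same byte order that load_8888 assumed.
void store_8888(Lanes& l) {
    *l.dst = uint32_t(l.r) | uint32_t(l.g) << 8 |
             uint32_t(l.b) << 16 | uint32_t(l.a) << 24;
}

// Run every stage in order on each of the n pixels.
void run_pipeline(const std::vector<StageFn>& stages,
                  const uint32_t* src, uint32_t* dst, size_t n) {
    for (size_t i = 0; i < n; i++) {
        Lanes l{0, 0, 0, 0, src + i, dst + i};
        for (StageFn stage : stages) stage(l);
    }
}
```

Calling `run_pipeline({load_8888, swap_rb, store_8888}, src, dst, n)` copies n pixels from src to dst with the red and blue channels exchanged.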
That's enough to make the shortest non-trivial pipeline
that you see in the new unit test.
Change-Id: Iabf90866ab452f7183d8c8dec1405ece2db695dc
Reviewed-on: https://skia-review.googlesource.com/18458
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Herb Derby <herb@google.com>
|