| Commit message (Collapse) | Author | Age |
|
|
|
|
|
|
|
|
|
| |
The 4444 image in all_bitmap_configs now draws slightly different before
and after serialization. (It's serialized as 8888.) Still looks fine.
Change-Id: I1396cf1550b6769a1734ed25d59bd5b1866dfacd
Reviewed-on: https://skia-review.googlesource.com/65960
Reviewed-by: Florin Malita <fmalita@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
1) Move a couple stages around in the enum to places
that make more sense, and guass_a_to_rbga in the code too.
2) mirror the SkRasterPipeline stage enum with either:
LOWP(st): the stage is implemented in low precision
TODO(st): the stage should be lowp, but isn't
NOPE(st): the stage shouldn't be done in lowp.
3) statically enforce that all stages are covered by one of
LOWP, TODO, or NOPE.
Change-Id: I06c7a7e470663ef73bf652c1b65c0d3c89f0d767
Reviewed-on: https://skia-review.googlesource.com/63800
Reviewed-by: Florin Malita <fmalita@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
| |
Change-Id: I5629e74c4c13ddb9217fd3c2df3388030fa03f0c
Reviewed-on: https://skia-review.googlesource.com/63780
Reviewed-by: Florin Malita <fmalita@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Chrome generally uses BGRA buffers, so srcover_rgba_8888 isn't really
doing them any good. Probably a good idea to cover both kN32 options
any time we specialize like this?
There's one small diff, so I've lazily guarded this by
SK_LEGACY_LOWP_STAGES, which I want to rebaseline today anyway.
Change-Id: Ice672aa01a3fc83be0798580d6730a54df075478
Reviewed-on: https://skia-review.googlesource.com/63301
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Mike Reed <reed@google.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fills out a couple more matrix and gather stages.
Deletes a not particularly important unit test that was using a
scale matrix in a weird, non-lowp compatible way.
This will require guards for Blink layout tests.
Change-Id: I54cb228ff541f771e8f4758f07d26c5161d48af3
Reviewed-on: https://skia-review.googlesource.com/62520
Reviewed-by: Mike Reed <reed@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This method is a little simpler macro-wise,
and makes it easier to guard new lowp stages:
LOWP(foo)
LOWP(bar)
#ifndef SK_LEGACY_LOWP_BAZ
LOWP(baz)
#endif
Change-Id: I06392f5cf7a04651e7bf47e79f10f7da8520f5ab
Reviewed-on: https://skia-review.googlesource.com/63141
Reviewed-by: Florin Malita <fmalita@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a no-op refactor.
It's just always surprised me that the matrix_scale_translate
stage expects [tx ty sx sy], when scales precede the translates
in the names and in both normal row-major and column-major matrix
layouts.
This switches to [sx sy tx ty], scale then translate.
Change-Id: I2d88701121ae8013facd5a28bb0ff520211db5a6
Reviewed-on: https://skia-review.googlesource.com/62541
Reviewed-by: Mike Reed <reed@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We're going to want to assign types to the stages depending on their
inputs and outputs:
GG: x,y -> x,y
GP: x,y -> r,g,b,a
PP: r,g,b,a -> r,g,b,a
(There are a couple other degenerate cases here, where a stage ignores
its inputs or creates no outputs, but we can always just pretend their
null input or output is one type or the other arbitrarily.)
The GG stages will be pretty much entirely float code, and the GP stages
a mix of float math and byte stuff.
Since we've chosen U16 to match our register size in _lowp land,
we'll unpack each F register across two of those for transport between
stages. This is a notional, free operation in both directions.
Change-Id: I605311d0dc327a1a3a9d688173d9498c1658e715
Reviewed-on: https://skia-review.googlesource.com/60800
Reviewed-by: Herb Derby <herb@google.com>
Reviewed-by: Florin Malita <fmalita@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
| |
As this array grows longer it causes troublesome code generation
when we're compiling offline, but it's easy as an argument.
Change-Id: I53526443f534f29d3bff17c3aec24a9e916c9b86
Reviewed-on: https://skia-review.googlesource.com/60564
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Herb Derby <herb@google.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Today (x,y) are the integer coordinates of the first destination pixel
we're working on. By renaming them (dx,dy), we free up the names (x,y)
for working (i.e. _source_) x and y.
Until now we've generally just been continuing to call those (r,g), but
in the _lowp code that won't be possible (r+g hold x together, b+a y)
but we'll have the ability to just give them proper names x and y.
Change-Id: Id5faa09c4406116df5df7494efc6cb23659e9a2f
Reviewed-on: https://skia-review.googlesource.com/60820
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It's properly 16 today because of HSW/lowp stages handling 16 pixels at
a time, but it hasn't yet had an effect on lowp so we didn't notice.
As we add lowp shader stages this will start to matter,
so might as well bump it up to 16 now.
(One day _skx lowp stages could bump this up to 32.)
Change-Id: Idd8185c08e12dc657389a35bf659662c9670f98a
Reviewed-on: https://skia-review.googlesource.com/60565
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are non-zero values of a that make infinite 1.0f/a.
Let's just check for the real thing we care about, that
scale is finite.
Bug: skia:7123
Change-Id: If97574c9f3f2f0b73c749d0bea9aa19e6114f4d1
Reviewed-on: https://skia-review.googlesource.com/58460
Reviewed-by: Mike Reed <reed@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Today gradient mirror and repeat don't explicitly clamp. They work fine for
normal float values, but blow up with inputs like infinity and NaN, and
those aren't hard to construct with a combination of a funky matrix and
some squaring for xy -> radius.
So explicitly clamp in each of the three matrix tilers.
This should fix the fuzz at the associated bug.
Bug: skia:7093
Change-Id: Idd44e3c7a1ed95e2b1ace8eb953b62eddeb4e00e
Reviewed-on: https://skia-review.googlesource.com/55702
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
| |
Change-Id: I9a140e342e7b12b1cbb09503ca8fc03016717784
Reviewed-on: https://skia-review.googlesource.com/55701
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This just makes sure all the plumbing is in place to use the Skylake
Xeon subset of AVX-512 instructions. So far,
- no Windows
- no lowp
- nothing explicitly making use of AVX-512 registers or instructions
This initial pass should run essentially identically to the _hsw AVX2
code we've been using previously. Clang _does_ use AVX-512-only
instructions to implement some of the higher-level concepts we've coded,
but it's really a pretty subtle difference.
Next steps will bump N from 8 to 16 and start threading through an
AVX-512-friendly mask instead of tail. I'll also want to take a harder
look at how we do blending like if_then_else()... the default codegen
here doesn't really take advantage of AVX-512 the way I'd like here.
CQ_INCLUDE_TRYBOTS=skia.primary:Test-Debian9-Clang-GCE-CPU-AVX512-x86_64-Debug
Change-Id: I6c9442488a449ea4770617bb22b2669859cc92e2
Reviewed-on: https://skia-review.googlesource.com/54062
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Herb Derby <herb@google.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is something I came up with while writing _lowp.cpp.
This should all be a logical no-op, but there are some code generation
changes. I'm not exactly sure why.
Change-Id: Iaad36b5298b37fe26ebd375a147a48852f98e1e4
Reviewed-on: https://skia-review.googlesource.com/52003
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Herb Derby <herb@google.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The lowp start_pipeline() always zeros, and with floats we always zero
when compiled as part of Skia, so this just makes the offline float
consistent with the others.
It's getting confusing to think about which code zeros and which
doesn't, and it'd be nicer to be able to rely on zeros.
This should change code generation only to the start_pipelines in
the .S files.
Change-Id: I1178b83c01e609e40dc7912d8d56df8e36eb339d
Reviewed-on: https://skia-review.googlesource.com/52001
Reviewed-by: Florin Malita <fmalita@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We look at t to create a mask in mask_2pt_conical_degenerates to be
applied later to the colors after the normal gradient stages have run.
But if t itself is NaN, that will wreak havoc in the normal gradient
stages. So in addition to building the mask to kill off degenerate
colors, let's also set degenerate t to zero, which should be a safe
value.
This fixes the fuzz mentioned in this bug.
BUG=skia:7078
Change-Id: I8301450c707bdbf941abd0339959f9e60d46d955
Reviewed-on: https://skia-review.googlesource.com/52763
Reviewed-by: Florin Malita <fmalita@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a no-op in terms of generated code.
There is no longer a tail call here to be disabled,
not since we changed start_pipeline() to operate in 2D.
Change-Id: Ife92590eb059e28e4a84e3729180c7410a93b410
Reviewed-on: https://skia-review.googlesource.com/52020
Reviewed-by: Mike Klein <mtklein@chromium.org>
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
| |
This is a no-op refactor to make SkJumper_stages.cpp and
SkJumper_stages_lowp.cpp more similar.
Change-Id: Icb5dd415d105fbdc58ce0b9b63058c0a66ed4a13
Reviewed-on: https://skia-review.googlesource.com/52000
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
All three image tile modes go through exclusive_clamp() and then a
gather today, so we can move the work of exclusive_clamp() into eac
gather_ stage, eliminating the need for clamp_{x,y} stages.
Luckily, we've got a convenient place to bottleneck this, ptr_and_ix(),
which works out the pointer and vector of indices to load for gathers.
This deletes SkRasterPipeline_repeat_tiling unit test, which now
no longer exactly makes sense. It tests that repeat_x does that
clamp, but that's now done automatically outside that stage.
Change-Id: I24637ef60921bec7aa00082984c0c6a49dd86ca9
Reviewed-on: https://skia-review.googlesource.com/50260
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Mike Reed <reed@google.com>
Reviewed-by: Florin Malita <fmalita@chromium.org>
|
|
|
|
|
|
|
| |
Change-Id: I2e24c990983ea93cbd7983c9c4e88120c2b7f358
Reviewed-on: https://skia-review.googlesource.com/49768
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
| |
This makes loading into 16-bit channels more natural in _lowp.cpp.
Update a unit test to stop using out-of-range "colors".
Change-Id: I494687aac87948b60a40de447aa1527cf7167b2d
Cq-Include-Trybots: skia.primary:Test-Debian9-Clang-GCE-CPU-AVX2-x86_64-Release-UBSAN_float_cast_overflow
Reviewed-on: https://skia-review.googlesource.com/47580
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- load_565 allows 565-src sprite blits
- scale_565 / lerp_565 allow subpixel text
- luminance_to_alpha is a color filter, and lets us write grey 8
And update CachedDecodingPixelRefTest with a yet more robust color.
Change-Id: I8af499c43f0f28093744d9c2993af553e36c9526
Reviewed-on: https://skia-review.googlesource.com/47021
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Mike Reed <reed@google.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit d286bfbd96f8b7ccf1cbce74f07d2f3917dbec30.
Reason for revert:
../../../src/core/SkRasterPipeline.cpp:98:34: runtime error: 4.87906e+09 is outside the range of representable values of type 'unsigned short'
Excellent new bot!
Original change's description:
> Bump stored lowp uniform color to 16-bit storage.
>
> This makes loading into 16-bit channels more natural in _lowp.cpp.
>
> Change-Id: I1ed393873654060ef52f4632d670465528006bbd
> Reviewed-on: https://skia-review.googlesource.com/47261
> Reviewed-by: Mike Reed <reed@google.com>
> Commit-Queue: Mike Klein <mtklein@chromium.org>
TBR=mtklein@chromium.org,reed@google.com
Change-Id: Ia65645c1261a7b31588c4ddaf2b1b3b327d265b0
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Reviewed-on: https://skia-review.googlesource.com/47540
Reviewed-by: Mike Klein <mtklein@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
|
|
|
|
|
|
|
|
| |
Bug: chromium:762167
Change-Id: Ia23f6dbfc0466aef4ca9d1a5b9ff343d79dc83bb
Reviewed-on: https://skia-review.googlesource.com/47460
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
| |
This makes loading into 16-bit channels more natural in _lowp.cpp.
Change-Id: I1ed393873654060ef52f4632d670465528006bbd
Reviewed-on: https://skia-review.googlesource.com/47261
Reviewed-by: Mike Reed <reed@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We've got independent definitions of SI, LazyCtx/Ctx, and load_and_inc()
in _stages.cpp and _lowp.cpp. It's a good time to centralize them,
taking _stages.cpp's SI and load_and_inc(), and _lowp's Ctx.
SI and load_and_inc() are uninterestingly different. But using _lowp's
Ctx will let us get its prettier typed stage definitions into
_stages.cpp, but that is not not done here.
This is a pure refactor with no generated code changes.
Change-Id: I53260b0fdc71a77bf9e3ed6f3df3a2a4cbd2392b
Reviewed-on: https://skia-review.googlesource.com/47181
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
created new file src/core/SkColorData.h for
internal consumption. Note that many of the
functions there are unused as well.
Bug: skia: 6898
R: reed@google.com
Change-Id: I25bfd5a9c21f53558c4ca65a77eb5d322d897c6d
Reviewed-on: https://skia-review.googlesource.com/46848
Commit-Queue: Cary Clark <caryclark@google.com>
Reviewed-by: Mike Reed <reed@google.com>
|
|
|
|
|
|
|
| |
Change-Id: I4d4093fcfc839f6e7468b7d9f89bb903186ab68d
Reviewed-on: https://skia-review.googlesource.com/46761
Reviewed-by: Florin Malita <fmalita@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Guarding loads of 8-15 with defined(__AVX2__) should prevent errors
like these:
external/skia/src/jumper/SkJumper_stages_lowp.cpp:287:46: error:
'memcpy' called with size bigger than buffer
case 12: memcpy(&v, ptr, 12*sizeof(T)); break;
The loads of 8-15 were of course unreachable, given the &(N-1) == &7.
Change-Id: Ifcb5c177c6909e1df55cb564779a4d6610ff7b32
Reviewed-on: https://skia-review.googlesource.com/46521
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I have text_16_AA_FF -> 8888 (forcing RP) faster than head now on my
laptop. I'm feeling confident that we can make this perform well.
After looking at performance a bit more today, it looks like everything
is within what I'd consider comparable in performance, especially on
ARM. On x86-64 it looks like big bulk blits get a little slower and
small mask blits get a little faster.
Quality looks good, and maybe improved for 565.
There are fewer platform-specific differences now in _lowp, and I think
they're few enough now that we could even consider completing the
unification by folding the 8-bit and float code together. Rename
"div255()" to "rebias()", slap on a few coats of paint...
Guarded for Chrome with SK_JUMPER_LEGACY_LOWP.
Change-Id: I36309c07cf736f3cb31952cca66030ad56026318
Reviewed-on: https://skia-review.googlesource.com/45982
Reviewed-by: Herb Derby <herb@google.com>
Reviewed-by: Florin Malita <fmalita@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
To continue building stages, update Clang and update your GN args:
$ brew update
$ brew upgrade llvm
$ find . | grep args.gn | xargs sed -ie 's/clang-4.0/clang-5.0/g'
Some interesting codegen changes I noticed:
- ARMv7: generally better register assignment, tighter code
- ARMv7: dropped the 128-bit alignment hint when loading and storing dst "registers",
unclear why.
- HSW: now clearing the destination register before vgatherdps,
to break a dependency on the previous value
Change-Id: I4f804a4cbfcde530fad5ed535438174e852a9593
Reviewed-on: https://skia-review.googlesource.com/44241
Reviewed-by: Florin Malita <fmalita@chromium.org>
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
| |
Change-Id: I26c22c085efd70b65de927a9a8a041d03c170f2a
Reviewed-on: https://skia-review.googlesource.com/42760
Commit-Queue: Florin Malita <fmalita@chromium.org>
Reviewed-by: Florin Malita <fmalita@chromium.org>
|
|
|
|
|
|
|
|
|
|
| |
Because floats are fun, the compiler cannot merge x + 0.5f +
[0,1,2,3,4...] into x + [0.5,1.5,2.5,3.5,4.5,...]. But we can.
Change-Id: I03b46c1ea0653877f35f6c888f29371b5f73d813
Reviewed-on: https://skia-review.googlesource.com/42480
Reviewed-by: Florin Malita <fmalita@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
| |
approx 2.5x faster on arm64 for sprite 8888 --> 565 blits
Bug: skia:
Change-Id: I524f993fee16196385dc07cbec39ef378b1301e5
Reviewed-on: https://skia-review.googlesource.com/41162
Reviewed-by: Florin Malita <fmalita@chromium.org>
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Reed <reed@google.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Shouldn't be anything tricky here.
Guarded by SK_JUMPER_LEGACY_X86_8BIT for (Win) layout tests.
Change-Id: I7580c7c18d1721f1301904c049ea2e59e9bda5d9
Reviewed-on: https://skia-review.googlesource.com/40692
Reviewed-by: Herb Derby <herb@google.com>
Reviewed-by: Florin Malita <fmalita@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I'm not sure why I wrote this to use a Params struct originally, but we
should have plenty of registers in _8bit to pass everything directly and
avoid the stack. Even once we enable the 8-bit pipeline on 32-bit x86,
we'll have 4 general purpose registers and 4 vector registers to use,
precisely what we're using here.
Change-Id: I3e51ab73186edcdcb8bfaa6cc99d9516db7c032a
Reviewed-on: https://skia-review.googlesource.com/40771
Reviewed-by: Florin Malita <fmalita@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We need to make two changes to keep all values in registers:
1) pass raw U8 values instead of V structs that wrap them
2) switch to aapcs-vfp, which allows exactly 8x U8 arguments
Code generation goes from ridiculous looking to lovely.
Change-Id: Ibd53bdd86345b59bd987a1f79205645d80c5cbc3
Reviewed-on: https://skia-review.googlesource.com/40021
Commit-Queue: Mike Klein <mtklein@google.com>
Reviewed-by: Florin Malita <fmalita@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We can approximate (xy + 127) / 255 with (xy + 255) / 256.
On ARM this divide-by-255 is a single instruction, one of the two we use today
to do a perfect divide-by-255 (#if 0). This cuts div-255 in half, or a full
mul-div-255 by a third. The U16(255) constant can even be created in a single
instruction without hitting memory, which is as good as it gets.
Here's a nice little example:
0000000000000000 <sk_premul_8bit>:
0: f8408404 ldr x4, [x0], #8 // Load the next stage.
4: 2e23c000 umull v0.8h, v0.8b, v3.8b // r = r * a
8: 6f02e6b0 movi v16.2d, #0xff00ff00ff00ff // create U16(255)
c: 2e23c021 umull v1.8h, v1.8b, v3.8b // g = g * a
10: 2e23c042 umull v2.8h, v2.8b, v3.8b // b = b * a
14: 0e304000 addhn v0.8b, v0.8h, v16.8h // r = div255(r)
18: 0e304021 addhn v1.8b, v1.8h, v16.8h // g = div255(g)
1c: 0e304042 addhn v2.8b, v2.8h, v16.8h // b = div255(b)
20: d61f0080 br x4 // JUMP!
Change-Id: I4224ed3844abf6c67d9e42b67444a60f4aee8f08
Reviewed-on: https://skia-review.googlesource.com/40121
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Florin Malita <fmalita@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The only reason we were keeping SkJumper_constants around is that it was
hard to get float/integer iota vectors on arm64 without relocations.
Now that we're compiling arm64 normally as part of Skia, we don't have
to worry about relocations.
This means we can kill the struct and stop passing around that pointer.
Change-Id: I013c6a735947f3db2bc87f2bfa38b7520d2e2fce
Reviewed-on: https://skia-review.googlesource.com/40200
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Herb Derby <herb@google.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Our interlaced approach works pretty well for x86, but on ARM we're a
lot better off deinterlacing in loads and reinterlacing in stores.
This leaves the stages mostly looking like the float stages, and cuts
out some awkward parts from the code generation.
Diffs are all invisible. Performance is noticeably better for some
blend modes like Overlay.
Change-Id: Ie599e823602bfd14552de78df44a621aea66e1a2
Reviewed-on: https://skia-review.googlesource.com/40100
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Florin Malita <fmalita@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We don't really use anything very ARMv8 specific in the 8-bit NEON
stages, so we can just naturally extend what we're doing to ARMv7 too.
Note that unlike the float stages, we're not requiring VFPv4 either,
just NEON. VFPv4 is for FMA and F16<->F32 conversion, both of which are
unnecessary for the integer pipeline.
GMs and perf improvement are similar to the previous ARMv8 change.
Change-Id: Id618801ea1920564c1deee144a640a4133c4505f
Reviewed-on: https://skia-review.googlesource.com/39840
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Mike Klein <mtklein@chromium.org>
Reviewed-by: Herb Derby <herb@google.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit 6d13575108299951ecdfba6d85c915fcec2bc028.
Now with guards for "errors" like this:
external/skia/src/jumper/SkJumper_stages_8bit.cpp:240:50: error:
'memcpy' called with size bigger than buffer
case 12: memcpy(&v, src, 12*sizeof(T)); break;
This code is unreachable and generally removed by Clang's optimizer
anyway... as far as I can tell the code generation diff is arbitrary.
Change-Id: I6216567caaa6166f71258bd25343a09e93892a10
Reviewed-on: https://skia-review.googlesource.com/39961
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit 08133583d5e1cdfdcc41b4bb078fcfb64137f058.
Reason for revert: Blocking Android Autoroller on compile error.
Original change's description:
> 8-bit jumper on armv8
>
> The GM diffs are all minor and what you'd expect.
>
> I did a quick performance sanity check, which also looks fine.
>
> $ out/ok bench rp filter:search=Modulate
> [blendmode_rect_Modulate] 30.2ms @0 32ms @95 32ms @100
> [blendmode_mask_Modulate] 12.6ms @0 12.6ms @95 14.5ms @100
> ~~~>
> [blendmode_rect_Modulate] 11.2ms @0 11.7ms @95 12.4ms @100
> [blendmode_mask_Modulate] 10.5ms @0 23.6ms @95 23.9ms @100
>
> This isn't even really the fastest we can make 8-bit go on ARMv8;
> it's actually much more natural to work de-interlaced there. Lots
> of room to follow up.
>
> Change-Id: I86b1099f6742bcb0b8b4fa153e85eaba9567cbf7
> Reviewed-on: https://skia-review.googlesource.com/39740
> Reviewed-by: Florin Malita <fmalita@chromium.org>
> Commit-Queue: Mike Klein <mtklein@chromium.org>
TBR=mtklein@chromium.org,herb@google.com,fmalita@chromium.org,reed@google.com
Change-Id: I71425d8b7fbb66be5cb50025871dd81358111da4
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Reviewed-on: https://skia-review.googlesource.com/39980
Reviewed-by: Derek Sollenberger <djsollen@google.com>
Commit-Queue: Derek Sollenberger <djsollen@google.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The GM diffs are all minor and what you'd expect.
I did a quick performance sanity check, which also looks fine.
$ out/ok bench rp filter:search=Modulate
[blendmode_rect_Modulate] 30.2ms @0 32ms @95 32ms @100
[blendmode_mask_Modulate] 12.6ms @0 12.6ms @95 14.5ms @100
~~~>
[blendmode_rect_Modulate] 11.2ms @0 11.7ms @95 12.4ms @100
[blendmode_mask_Modulate] 10.5ms @0 23.6ms @95 23.9ms @100
This isn't even really the fastest we can make 8-bit go on ARMv8;
it's actually much more natural to work de-interlaced there. Lots
of room to follow up.
Change-Id: I86b1099f6742bcb0b8b4fa153e85eaba9567cbf7
Reviewed-on: https://skia-review.googlesource.com/39740
Reviewed-by: Florin Malita <fmalita@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
1) Replace a couple commas with semicolons.
2) Make sure to zero a couple vectors.
1) has no effect on code generation.
2) does add a bunch of self-vxorps, but they're cheap and we already do
the equivalent for <AVX SSE code, and they're in not very
performance-critical routines. We could circle back and guard these
with !defined(JUMPER_IS_OFFLINE) if we really need the vectors to start
uninitialized for speed.
CQ_INCLUDE_TRYBOTS=skia.primary:Build-Debian9-Clang-x86_64-Release-Fast
Change-Id: I1a13f3eb28d664dbc345d71c3adbc62be5ff7c45
Reviewed-on: https://skia-review.googlesource.com/39661
Reviewed-by: Mike Reed <reed@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The baseline compiled into Skia is now pretty much identical.
Minor diffs due to the offline code using -ffp-contract=fast, and the
baseline not. Explicit calls to fma() are still FMAs, but we're no
longer letting the compiler uncover FMAs we didn't explicitly call out.
If this goes well, we should be able to turn on the 8-bit pipeline.
Change-Id: I8f73157cfce7373574c20f6435fe86b46477afa9
Reviewed-on: https://skia-review.googlesource.com/39520
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Whether JUMPER is defined is starting to get a little overloaded:
- are we compiling offline (defined) or as part of Skia (!defined)?
- are we using Clang vector extensions (defined) or scalars (!defined)?
This splits JUMPER into these two separate concerns:
- JUMPER_IS_OFFLINE
- JUMPER_IS_SCALAR, JUMPER_IS_NEON, JUMPER_IS_AVX2, etc.
The upshot is that we'll now use Clang vector extensions when available
for our "portable" baseline. On x86-64 and ARMv8 compiled by Clang,
we're guaranteed to pick up SSE2 and NEON respectively. Our -Fast
bot should even get all the way to AVX2.
Another CL will do some refactoring in SkJumper to remove the redundant
copies of guaranteed vector code on x86-64 and ARMv8. I didn't want to
do that here yet to demonstrate that there is zero effect on the .S
files from this CL.
Change-Id: Ib5e8f00b35e8721b2cc7180e294840ffaf9dddce
Reviewed-on: https://skia-review.googlesource.com/39500
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The most interesting parts of this are how plus interacts with partial
coverage. Plus needs its clamp to happen after the lerp.
Luckily, some of its math folds away:
d' = clamp[ d*(1-c) + (s+d)*c ] ==
clamp[ d - dc + sc + dc ] ==
clamp[ d + sc ]
What's nice there is that coverage can be folded into the src term.
This suggests that we can re-write the plus stage to clamp internally
(and thus, be viable for 8-bit) if we always pre-scale with coverage.
We don't have a way to pre-scale with 565 coverage until now, but
it's only a step or two away from there. We can use the alternate
formulation we derived for alpha for lerp_565, calculating the alpha
coverage from red, green, and blue coverages _and_ the values of src
and dst alpha.
While we already pre-scale srcover today for 8-bit or constant coverage,
we cannot do the same for 565. When evaluating the expression
d' = s + (1-a)d
we need the a term to be pre-scaled with red's coverage when calculating
dr', with blue's when calculating db', etc. Essentially we need to
carry around a bunch of extra values, and we've got no way to do that.
So instead, we'll just carefully pre-scale plus with any coverage, and
keep post-lerping srcover when we have 565 coverage.
Change-Id: I7a7a52eec7d482e1b98bb8a01ea0a3d5e67bef65
Reviewed-on: https://skia-review.googlesource.com/38300
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Florin Malita <fmalita@chromium.org>
|