| Commit message (Collapse) | Author | Age |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
https://codereview.chromium.org/1317233005/ )
Reason for revert:
http://build.chromium.org/p/client.skia.compile/builders/Build-Mac10.8-Clang-Arm7-Debug-Android/builds/4627
Original issue's description:
> SkPx: new approach to fixed-point SIMD
>
> SkPx is like Sk4px, except each platform implementation of SkPx can declare
> a different sweet spot of N pixels, with extra loads and stores to handle the
> ragged edge of 0<n<N pixels.
>
> In this case, _sse's sweet spot remains 4 pixels. _neon jumps up to 8 so
> we can now use NEON's transposing loads and stores, and _none is just 1.
> This makes operations involving alpha considerably more efficient on NEON,
> as alpha is its own distinct 8x8 bit plane that's easy to toss around.
>
> This incorporates a few other improvements I've been wanting:
> - no requirement that we're dealing with SkPMColor. SkColor works too.
> - no anonymous namespace hack to differentiate implementations.
>
> Codegen and perf look good on Clang/x86-64 and GCC/ARMv7.
> The NEON code looks very similar to the old NEON code, as intended.
> No .skp or GM diffs on my laptop. Don't expect any.
>
> I intend this to replace Sk4px. Plan after landing:
> - port SkXfermode_opts.h
> - port Color32 in SkBlitRow_D32.cpp (and move to SkBlitRow_opts.h like other
> SkOpts code)
> - delete all Sk4px-related code
> - clean up evolutionary dead ends in SkNx (Sk16b, Sk16h, Sk4i, Sk4d, etc.)
> leaving Sk2f, Sk4f (and Sk2s, Sk4s).
> - find a machine with AVX2 to work on, write SkPx_avx2.h handling 8 pixels
> at a time.
>
> In the end we'll have Sk4f for float pixels, SkPx for fixed-point pixels.
>
> BUG=skia:4117
>
> Committed: https://skia.googlesource.com/skia/+/82c93b45ed6ac0b628adb8375389c202d1f586f9
TBR=mtklein@google.com,msarett@google.com
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:4117
Review URL: https://codereview.chromium.org/1336423002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
SkPx is like Sk4px, except each platform implementation of SkPx can declare
a different sweet spot of N pixels, with extra loads and stores to handle the
ragged edge of 0<n<N pixels.
In this case, _sse's sweet spot remains 4 pixels. _neon jumps up to 8 so
we can now use NEON's transposing loads and stores, and _none is just 1.
This makes operations involving alpha considerably more efficient on NEON,
as alpha is its own distinct 8x8 bit plane that's easy to toss around.
This incorporates a few other improvements I've been wanting:
- no requirement that we're dealing with SkPMColor. SkColor works too.
- no anonymous namespace hack to differentiate implementations.
Codegen and perf look good on Clang/x86-64 and GCC/ARMv7.
The NEON code looks very similar to the old NEON code, as intended.
No .skp or GM diffs on my laptop. Don't expect any.
I intend this to replace Sk4px. Plan after landing:
- port SkXfermode_opts.h
- port Color32 in SkBlitRow_D32.cpp (and move to SkBlitRow_opts.h like other
SkOpts code)
- delete all Sk4px-related code
- clean up evolutionary dead ends in SkNx (Sk16b, Sk16h, Sk4i, Sk4d, etc.)
leaving Sk2f, Sk4f (and Sk2s, Sk4s).
- find a machine with AVX2 to work on, write SkPx_avx2.h handling 8 pixels
at a time.
In the end we'll have Sk4f for float pixels, SkPx for fixed-point pixels.
BUG=skia:4117
Review URL: https://codereview.chromium.org/1317233005
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
id:40001 of https://codereview.chromium.org/1333983002/ )
Reason for revert:
Unexpected perf impact, and a whole bunch of new images in gold (mostly invisibly different).
Original issue's description:
> use new shuffle to speed up affine matrix mappts
>
> sse: 25 -> 18
> neon: 95 -> 86
>
> BUG=skia:
>
> Committed: https://skia.googlesource.com/skia/+/e70afc9f48d00828ee6b707899a8ff542b0e8b98
TBR=reed@google.com,mtklein@chromium.org
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:
Review URL: https://codereview.chromium.org/1335003002
|
|
|
|
|
|
|
|
|
| |
sse: 25 -> 18
neon: 95 -> 86
BUG=skia:
Review URL: https://codereview.chromium.org/1333983002
|
|
|
|
|
|
|
|
|
|
|
| |
This allows us to express shuffles more directly in code while also giving us a
convenient point to platform-specify particular shuffles for particular types.
No specializations yet. Everyone just uses the (pretty good) default option.
BUG=skia:
Review URL: https://codereview.chromium.org/1301413006
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
No changes to the code, just moved around.
This will have the effect of enabling vectorized code on ARMv7.
Should be no effect on ARMv8 or x86, which would have been vectorized already.
nanobench --match mappoints changes on Nexus 5 (ARMv7):
_affine: 132 -> 95
_scale: 118 -> 47
_trans: 60 -> 37
A teaser:
We should next look at the ABCD->BADC shuffle we've noted that we need in _affine. A quick hack showed doing that optimally is another ~35% speedup on x86. Got to figure out how to do it best on ARM though: that same quick hack was a 2x slowdown there. Good reason to resurrect that SkNx_shuffle() CL!
(I believe the answers are vrev64q_f32(v) and _mm_shuffle_ps(v,v, _MM_SHUFFLE(2,3,0,1), but we should probably find out in another CL.)
BUG=skia:4117
Review URL: https://codereview.chromium.org/1320673014
|
|
|
|
|
|
|
|
|
|
| |
This was a pre-SkOpts attempt that we can bring under its wing now.
This should be a perf no-op, deo volente.
BUG=skia:4117
Review URL: https://codereview.chromium.org/1314863006
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This gives SkOncePtr a non-trivial destructor that uses std::default_delete
by default. This is overrideable, as seen in SkColorTable.
SK_DECLARE_STATIC_ONCE_PTR still just leaves its pointers hanging at EOP.
BUG=skia:
No public API changes.
TBR=reed@google.com
Committed: https://skia.googlesource.com/skia/+/a1254acdb344174e761f5061c820559dab64a74c
Review URL: https://codereview.chromium.org/1322933005
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
https://codereview.chromium.org/1322933005/ )
Reason for revert:
Breaks Chrome roll.
obj/skia/ext/skia_chrome.skia_memory_dump_provider.o
does not have -I include/private on its include path, but transitively includes SkMessageBus.h.
Original issue's description:
> Port uses of SkLazyPtr to SkOncePtr.
>
> This gives SkOncePtr a non-trivial destructor that uses std::default_delete
> by default. This is overrideable, as seen in SkColorTable.
>
> SK_DECLARE_STATIC_ONCE_PTR still just leaves its pointers hanging at EOP.
>
> BUG=skia:
>
> No public API changes.
> TBR=reed@google.com
>
> Committed: https://skia.googlesource.com/skia/+/a1254acdb344174e761f5061c820559dab64a74c
TBR=herb@google.com,mtklein@chromium.org
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:
Review URL: https://codereview.chromium.org/1334523002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This gives SkOncePtr a non-trivial destructor that uses std::default_delete
by default. This is overrideable, as seen in SkColorTable.
SK_DECLARE_STATIC_ONCE_PTR still just leaves its pointers hanging at EOP.
BUG=skia:
No public API changes.
TBR=reed@google.com
Review URL: https://codereview.chromium.org/1322933005
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As you'll see from the BUG line, we have a strong indication that the new Sk4px
methods regress some devices. This restores the old code back as literally as possible
while still fitting in SkOpts framework.
This is ideally temporary breathing room.
We should get an early indication of if those bugs will improve by watching https://perf.skia.org/#4004
BUG=skia:4117,525844,519596,524149
Review URL: https://codereview.chromium.org/1312763009
|
|
|
|
|
|
|
|
| |
(Sk4f(float) is statically initializable, unlike the old SkPMFlor(SkPMColor).)
BUG=skia:4117
Review URL: https://codereview.chromium.org/1317593007
|
|
|
|
|
|
|
|
| |
BUG=skia:4117
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;client.skia.android:Test-Android-GCC-Nexus9-CPU-Denver-Arm64-Release-Trybot
Review URL: https://codereview.chromium.org/1312053004
|
|
|
|
|
|
|
|
|
|
|
|
| |
This switches over SkXfermodes_opts.h and SkColorMatrixFilter to use Sk4f,
and converts the SkPMFloat benches to Sk4f benches.
No pixels should change here, and no code beyond the Sk4f_ benches should change speed.
The benches are faster than the old versions.
BUG=skia:4117
Review URL: https://codereview.chromium.org/1324743002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This lets us avoid conversions to [0.0, 1.0] space and rounding that aren't necessary
for SkColorCubeFilter_opts.h.
Dropping rounding on the way back to bytes means we'll see a bunch of off-by-1 diffs.
Rough perf effect:
SSSE3: 110 -> 93 (~15%)
NEON: 465 -> 375 (~20%)
This is the beginning of the end for SkPMFloat as an entity distinct from Sk4f.
I've kept it for now so I can convert sites one by one and think about how things
that really want to keep PM color order will work.
BUG=skia:4117
Review URL: https://codereview.chromium.org/1319413003
|
|
|
|
|
|
| |
DOCS_PREVIEW= https://skia.org/?cl=1316233002
Review URL: https://codereview.chromium.org/1316233002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
SkPMFloat(0) and SkPMFloat(0,0,0,0) end up with the same value,
but the first goes through math to get there. The second is a lot more
transparent to the compiler, and should compile all the way down to
just `xorps xmmN,xmmN` or even be optimized away.
Didn't measure any additional benefit from hoisting the zero outside
the loop and writing `SkPMFloat color = zero;`.
Perf win is <2%.
BUG=skia:
Review URL: https://codereview.chromium.org/1314763007
|
|
|
|
|
|
| |
DOCS_PREVIEW= https://skia.org/?cl=1316123003
Review URL: https://codereview.chromium.org/1316123003
|
|
|
|
|
|
|
|
|
|
|
| |
We used to split the NEON code this way, and just had one path for SSE.
It's unclear to me testing locally if there's any major win here, but there's at least a small one.
No pixel diffs or even any math changes, just folding constants through.
BUG=skia:4117
Review URL: https://codereview.chromium.org/1304373006
|
|
|
|
|
|
|
|
| |
This is about a 12% improvement on my desktop, from 134 to 118ms on our bench.
BUG=skia:
Review URL: https://codereview.chromium.org/1295873004
|
|
|
|
|
|
|
|
|
| |
Annoyingly our test bot that forces SkPMFloat_none is a Linux bot using
BGRA SkPMColors, so we'd never notice the bug there.
BUG=skia:
Review URL: https://codereview.chromium.org/1296383006
|
|
|
|
|
|
|
|
|
|
| |
patch from issue 1273033005 at patchset 120001 (http://crrev.com/1273033005#ps120001)
BUG=skia:
Committed: https://skia.googlesource.com/skia/+/2d141ba2df8f7506848aa9369f502944e837cd09
Review URL: https://codereview.chromium.org/1288323004
|
|
|
|
|
|
|
|
| |
Remember failed attempt https://codereview.chromium.org/1286093004/ ? I think this one is simpler and safer and even technically legal C++.
BUG=skia:4117
Review URL: https://codereview.chromium.org/1296183004
|
|
|
|
|
|
|
|
| |
portable -> default, and everyone gets an sk_ prefix.
BUG=skia:4117
Review URL: https://codereview.chromium.org/1299013003
|
|
|
|
|
|
|
|
| |
patch from issue 1273033005 at patchset 120001 (http://crrev.com/1273033005#ps120001)
BUG=skia:
Review URL: https://codereview.chromium.org/1288323004
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It's sort of pointless: all our clients that will have SSE2 at runtime have it
unconditionally at compile time, so the functions in namespace portable will
pick up the SSE2 code. The procs in SkOpts_sse2.o were just duplicate code.
A couple of the procs we had in _sse2.cpp can benefit slightly when compiled
with SSSE3. I've moved those to _ssse3.cpp. This should lead to small speedups
on platforms like Linux and Windows that have a baseline of SSE2.
Similarly, I've removed the call to Init_neon() when NEON is available globally... it's a no-op.
Renaming namespace portable to something clearer is TBD.
BUG=skia:4117
Review URL: https://codereview.chromium.org/1294213002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
At head they're s,d[,aa] in SkXfermode_opts.h but Sk4px::Map* expect d,s[,aa]
so we ended up having to write weird little lambda shims to match impedance.
There's no reason for these to disagree, and d,s[,aa] is the One True Order
(because no matter what you're doing in graphics, there's always a dst).
Should be no perf or image diff, though I'm suspicious it might help MSVC code generation.
BUG=skia:4117
Committed: https://skia.googlesource.com/skia/+/6028a8476504022fe40b6870b1460b5e4a80969f
CQ_EXTRA_TRYBOTS=client.skia:Test-Win8-MSVC-ShuttleB-CPU-AVX2-x86-Release-Trybot
Review URL: https://codereview.chromium.org/1289903002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
#1 id:1 of https://codereview.chromium.org/1289903002/ )
Reason for revert:
?
Original issue's description:
> Normalize SkXfermode_opts.h argument order as d,s[,aa].
>
> At head they're s,d[,aa] in SkXfermode_opts.h but Sk4px::Map* expect d,s[,aa]
> so we ended up having to write weird little lambda shims to match impedance.
>
> There's no reason for these to disagree, and d,s[,aa] is the One True Order
> (because no matter what you're doing in graphics, there's always a dst).
>
> Should be no perf or image diff, though I'm suspicious it might help MSVC code generation.
>
> BUG=skia:4117
>
> Committed: https://skia.googlesource.com/skia/+/6028a8476504022fe40b6870b1460b5e4a80969f
TBR=djsollen@google.com,mtklein@chromium.org
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:4117
Review URL: https://codereview.chromium.org/1284363002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
id:1 of https://codereview.chromium.org/1286093004/ )
Reason for revert:
Maybe causing test / gold problems?
Original issue's description:
> Refactor to put SkXfermode_opts inside SK_OPTS_NS.
>
> Without this refactor I was getting warnings previously about having code
> inside namespace SK_OPTS_NS (e.g. namespace sse2, namespace neon) referring to
> code inside an anonymous namespace (Sk4px, SkPMFloat, Sk4f, etc) [1].
>
> That low-level code was in an anonymous namespace to allow multiple independent
> copies of its methods to be instantiated without the linker getting confused /
> offended about violating the One Definition Rule. This was only happening in
> Debug mode where the methods were not being inlined.
>
> To fix this all, I've force-inlined the methods of the low-level code and
> removed the anonymous namespace.
>
> BUG=skia:4117
>
>
> [1] Here is what those errors looked like:
>
> In file included from ../../../../src/core/SkOpts.cpp:18:0:
> ../../../../src/opts/SkXfermode_opts.h:193:7: error: 'portable::Sk4pxXfermode' has a field 'portable::Sk4pxXfermode::fProc4' whose type uses the anonymous namespace [-Werror]
> class Sk4pxXfermode : public SkProcCoeffXfermode {
> ^
> ../../../../src/opts/SkXfermode_opts.h:193:7: error: 'portable::Sk4pxXfermode' has a field 'portable::Sk4pxXfermode::fAAProc4' whose type uses the anonymous namespace [-Werror]
> ../../../../src/opts/SkXfermode_opts.h:235:7: error: 'portable::SkPMFloatXfermode' has a field 'portable::SkPMFloatXfermode::fProcF' whose type uses the anonymous namespace [-Werror]
> class SkPMFloatXfermode : public SkProcCoeffXfermode {
> ^
> cc1plus: all warnings being treated as errors
>
> Committed: https://skia.googlesource.com/skia/+/b07bee3121680b53b98b780ac08d14d374dd4c6f
TBR=djsollen@google.com,mtklein@chromium.org
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:4117
Review URL: https://codereview.chromium.org/1284333002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
At head they're s,d[,aa] in SkXfermode_opts.h but Sk4px::Map* expect d,s[,aa]
so we ended up having to write weird little lambda shims to match impedance.
There's no reason for these to disagree, and d,s[,aa] is the One True Order
(because no matter what you're doing in graphics, there's always a dst).
Should be no perf or image diff, though I'm suspicious it might help MSVC code generation.
BUG=skia:4117
Review URL: https://codereview.chromium.org/1289903002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Without this refactor I was getting warnings previously about having code
inside namespace SK_OPTS_NS (e.g. namespace sse2, namespace neon) referring to
code inside an anonymous namespace (Sk4px, SkPMFloat, Sk4f, etc) [1].
That low-level code was in an anonymous namespace to allow multiple independent
copies of its methods to be instantiated without the linker getting confused /
offended about violating the One Definition Rule. This was only happening in
Debug mode where the methods were not being inlined.
To fix this all, I've force-inlined the methods of the low-level code and
removed the anonymous namespace.
BUG=skia:4117
[1] Here is what those errors looked like:
In file included from ../../../../src/core/SkOpts.cpp:18:0:
../../../../src/opts/SkXfermode_opts.h:193:7: error: 'portable::Sk4pxXfermode' has a field 'portable::Sk4pxXfermode::fProc4' whose type uses the anonymous namespace [-Werror]
class Sk4pxXfermode : public SkProcCoeffXfermode {
^
../../../../src/opts/SkXfermode_opts.h:193:7: error: 'portable::Sk4pxXfermode' has a field 'portable::Sk4pxXfermode::fAAProc4' whose type uses the anonymous namespace [-Werror]
../../../../src/opts/SkXfermode_opts.h:235:7: error: 'portable::SkPMFloatXfermode' has a field 'portable::SkPMFloatXfermode::fProcF' whose type uses the anonymous namespace [-Werror]
class SkPMFloatXfermode : public SkProcCoeffXfermode {
^
cc1plus: all warnings being treated as errors
Review URL: https://codereview.chromium.org/1286093004
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Local SKP nanobenching ranges SSE between 1.05x and 0.87x, much more heavily weighted toward <1.0x ratios (speedups).
I profiled the top five regressions (1.05x-1.01x) and they look like noise. Will follow up after broad bot results.
NEON looks similar but less extreme than SSE changes, ranging between 1.02x and 0.95x, again mostly speedups in 0.99x-0.97x range.
The old code trifurcated into black, opaque-but-not-black, and general versions as a function of the constant src color. I did not see a significant difference between general and opaque-but-not-black, and I don't think a black version would be faster using SIMD. So we have here just one version of the code, the general version.
Somewhat fantastically, I see no pixel diffs on GMs or SKPs.
I will be following up with more CLs for the other procs called by SkBlitMask.
BUG=skia:
Review URL: https://codereview.chromium.org/1278253003
|
|
|
|
|
|
| |
BUG=skia:4117
Review URL: https://codereview.chromium.org/1273203002
|
|
|
|
|
|
|
|
| |
As I begin to wade in here, it's nice to remove as much code as possible.
BUG=skia:4117
Review URL: https://codereview.chromium.org/1277953002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Pretty vanilla translation. I cleaned up who calls whom a little.
Used to be utils -> opts -> utils, now it's just utils -> opts.
I may follow up with a pass over the NEON code for readability
and to clean up dead code.
This turns on NEON A8->R11EAC conversion for ARMv8.
Unit tests which now hit the NEON code still pass.
I can't find any related bench.
BUG=skia:4117
Review URL: https://codereview.chromium.org/1273103002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Procedure:
$ platform_tools/android/toolchains/ndk-r10c-mips-darwin_v14/bin/mipsel-linux-android-gcc -dM -E - < /dev/null | sort > vanilla.mips
$ platform_tools/android/toolchains/ndk-r10c-mips-darwin_v14/bin/mipsel-linux-android-gcc -mdsp -dM -E - < /dev/null | sort > dsp.mips
$ platform_tools/android/toolchains/ndk-r10c-mips-darwin_v14/bin/mipsel-linux-android-gcc -mdspr2 -dM -E - < /dev/null | sort > dspr2.mips
$ diff vanilla.mips dsp.mips
239a240,241
> #define __mips_dsp 1
> #define __mips_dsp_rev 1
$ diff vanilla.mips dspr2.mips
239a240,242
> #define __mips_dsp 1
> #define __mips_dsp_rev 2
> #define __mips_dspr2 1
So, defined(__mips_dsp) -> SK_MIPS_HAS_DSP, defined(__mips_dspr2) -> SK_MIPS_HAS_DSPR2.
BUG=skia:
Review URL: https://codereview.chromium.org/1274873002
|
|
|
|
|
|
|
|
|
|
|
|
| |
Nothing too fancy.
Direction enums become enum classes so they don't get all confused. An
alternative is to create one single Direction enum that both blur and
morphology opts use.
BUG=skia:4117
Review URL: https://codereview.chromium.org/1267343004
|
|
|
|
|
|
|
|
|
| |
This organizes memset16, memset32, and rsqrt the same way as the other code. No functional change.
BUG=skia:4117
R=djsollen@google.com
Review URL: https://codereview.chromium.org/1264423002 .
|
|
|
|
|
|
|
| |
BUG=skia:4117
R=mtklein@google.com
Review URL: https://codereview.chromium.org/1262213005 .
|
|
|
|
|
|
|
|
|
|
|
|
| |
+268 -535 lines
I also rearranged the code a little bit to encapsulate itself better,
mostly replacing static helper functions with lambdas. This also
let me merge the SSE2 and SSE4.1 code paths.
BUG=skia:4117
Review URL: https://codereview.chromium.org/1264103004
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Renames Sk4pxXfermode.h to SkXfermode_opts.h,
and refactors it a tiny bit internally.
This moves xfermode optimization from being "compile-time everywhere but NEON"
to simply "runtime everywhere". I don't anticipate any effect on perf or
correctness.
BUG=skia:4117
Review URL: https://codereview.chromium.org/1264543006
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With this new arrangement, the benefits of inlining sk_memset16/32 have changed.
On x86, they're not significantly different, except for small N<=10 where the inlined code is significantly slower.
On ARMv7 with NEON, our custom code is still significantly faster for N>10 (up to 2x faster). For small N<=10 inlining is still significantly faster.
On ARMv7 without NEON, our custom code is still ridiculously faster (up to 10x) than inlining for N>10, though for small N<=10 inlining is still a little faster.
We were not using the NEON memset16 and memset32 procs on ARMv8. At first blush, that seems to be an oversight, but if so it's an extremely lucky one. The ARMv8 code generation for our memset16/32 procs is total garbage, leaving those methods ~8x slower than just inlining the memset, using the compiler's autovectorization.
So, no need to inline any more on x86, and still inline for N<=10 on ARMv7. Always inline for ARMv8.
BUG=skia:4117
Review URL: https://codereview.chromium.org/1270573002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This enables the NEON sk_float_rsqrt() code for configurations that have NEON at run-time but not compile-time.
These devices will see about a 2x (1.26 -> 2.33) slowdown in sk_float_rsqrt(), but it should be more precise than our portable fallback.
(When inlined, the portable fallback and the NEON code are almost identical in speed. The only difference is precision. Going through a function pointer is causing all this slowdown. This is a good example of a place where Skia really benefits from compile-time NEON.)
BUG=skia:4117,skia:4114
No public API changes.
TBR=reed@google.com
Review URL: https://codereview.chromium.org/1264893002
|
|
|
|
|
|
|
|
|
|
| |
This doesn't really do anything yet. It's just the CPU detection code, skeleton new .cpp files, and a few little .gyp tweaks.
BUG=skia:4117
Committed: https://skia.googlesource.com/skia/+/ce2c5055cee5d5d3c9fc84c1b3eeed4b4d84a827
Review URL: https://codereview.chromium.org/1255193002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(patchset #2 id:20001 of https://codereview.chromium.org/1229673008/)
Reason for revert:
This doesn't draw correctly, e.g. our GM test named dashcubics.
Good: https://gold.skia.org/img/images/0f7e8e226379afbad8a700e0a80fd8f1.png
Bad:
https://gold.skia.org/img/images/56ce15fc67436065a3db4b8ee31f13ae.png
Original issue's description:
> Optimize RGB16 blitH functions with NEON for ARM platform.
>
> Here are some performance resultsi on Nexus 9:
> SkRGB16BlitterBlitH_neon:
> +--------+-----------+
> |height | C/NEON |
> +--------+-----------+
> |1 | 0.888531 |
> +--------+-----------+
> |8 | 1.231800 |
> +--------+-----------+
> |18 | 1.073327 |
> +--------+-----------+
> |32 | 1.136991 |
> +--------+-----------+
> |76 | 1.174638 |
> +--------+-----------+
> |85 | 1.188551 |
> +--------+-----------+
> |120 | 1.180261 |
> +--------+-----------+
> |128 | 1.183726 |
> +--------+-----------+
> |512 | 1.220806 |
> +--------+-----------+
>
> BUG=skia:
>
> Committed: https://skia.googlesource.com/skia/+/6c72d5740231f47c664a8e765a8df05cd124c88c
TBR=djsollen@google.com,caryclark@google.com,reed@google.com,bero@linaro.com,yang.zhang@linaro.org
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:
Review URL: https://codereview.chromium.org/1268513003
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Here are some performance resultsi on Nexus 9:
SkRGB16BlitterBlitH_neon:
+--------+-----------+
|height | C/NEON |
+--------+-----------+
|1 | 0.888531 |
+--------+-----------+
|8 | 1.231800 |
+--------+-----------+
|18 | 1.073327 |
+--------+-----------+
|32 | 1.136991 |
+--------+-----------+
|76 | 1.174638 |
+--------+-----------+
|85 | 1.188551 |
+--------+-----------+
|120 | 1.180261 |
+--------+-----------+
|128 | 1.183726 |
+--------+-----------+
|512 | 1.220806 |
+--------+-----------+
BUG=skia:
Review URL: https://codereview.chromium.org/1229673008
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
https://codereview.chromium.org/1255193002/)
Reason for revert:
Chromium doesn't call SkGraphics::Init(). This setup won't work.
Original issue's description:
> Lay groundwork for SkOpts.
>
> This doesn't really do anything yet. It's just the CPU detection code, skeleton new .cpp files, and a few little .gyp tweaks.
>
> BUG=skia:4117
>
> Committed: https://skia.googlesource.com/skia/+/ce2c5055cee5d5d3c9fc84c1b3eeed4b4d84a827
TBR=djsollen@google.com
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:4117
Review URL: https://codereview.chromium.org/1261743002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It's only implemented on x86, where the exisiting benchmark says memcpy() is
faster for all cases:
Timer overhead: 24ns
curr/maxrss loops min median mean max stddev samples config bench
10/10 MB 1 35.9µs 36.2µs 36.2µs 36.6µs 1% ▁▂▄▅▅▃█▄▄▅ nonrendering sk_memcpy32_100000
10/10 MB 13 2.27µs 2.28µs 2.28µs 2.29µs 0% █▄▃▅▃▁▃▅▁▄ nonrendering sk_memcpy32_10000
11/11 MB 677 91.6ns 95.9ns 94.5ns 99.4ns 3% ▅▅▅▅▅█▁▁▁▁ nonrendering sk_memcpy32_1000
11/11 MB 1171 20ns 20.9ns 21.3ns 23.4ns 6% ▁▁▇▃▃▃█▇▃▃ nonrendering sk_memcpy32_100
11/11 MB 1952 14ns 14ns 14.3ns 15.2ns 3% ▁▁██▁▁▁▁▁▁ nonrendering sk_memcpy32_10
11/11 MB 5 33.6µs 33.7µs 34.1µs 35.2µs 2% ▆▇█▁▁▁▁▁▁▁ nonrendering memcpy32_memcpy_100000
11/11 MB 18 2.12µs 2.22µs 2.24µs 2.39µs 5% ▂█▄▇█▄▇▁▁▁ nonrendering memcpy32_memcpy_10000
11/11 MB 1112 87.3ns 87.3ns 89.1ns 93.7ns 3% ▄██▄▁▁▁▁▁▁ nonrendering memcpy32_memcpy_1000
11/11 MB 2124 12.8ns 13.3ns 13.5ns 14.8ns 6% ▁▁▁█▃▃█▇▃▃ nonrendering memcpy32_memcpy_100
11/11 MB 3077 9ns 9.41ns 9.52ns 10.2ns 4% ▃█▁█▃▃▃▃▃▃ nonrendering memcpy32_memcpy_10
(Why? One fewer thing to port to SkOpts.)
BUG=skia:4117
Review URL: https://codereview.chromium.org/1256763003
|
|
|
|
|
|
|
|
| |
This doesn't really do anything yet. It's just the CPU detection code, skeleton new .cpp files, and a few little .gyp tweaks.
BUG=skia:4117
Review URL: https://codereview.chromium.org/1255193002
|
|
|
|
|
|
|
|
| |
Nothing seems to run any faster or slower, but it is terser.
BUG=skia:
Review URL: https://codereview.chromium.org/1255913004
|