skia - 2D graphics library

	Commit message (Collapse)	Author	Age
*	Color dodge and burn with SkPMFloat.	mtklein	2015-06-26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Both 25-35% faster with SSE. With NEON, Burn measures as a ~10% regression, Dodge a huge 2.9x improvement. The Burn regression is somewhat artificial: we're drawing random colored rects onto an opaque white dst, so we're heavily biased toward the (d==da) fast path in the serial code. In the vector code there's no short-circuiting and we always pay a fixed cost for ColorBurn regardless of src or dst content. Dodge's fast paths, in contrast, only trigger when (s==sa) or (d==0), neither of which happens any more than randomly in our benchmark. I don't think (d==0) should happen at all. Similarly, the (s==0) Burn fast path is really only going to happen as often as SkRandom allows. In practice, the existing Burn benchmark is hitting its fast path 100% of the time. So I actually feel really great that this only dings the benchmark by 10%. Chrome's still guarded by SK_SUPPORT_LEGACY_XFERMODES, which I'll lift after finishing the last xfermode, SoftLight. BUG=skia: Review URL: https://codereview.chromium.org/1214443002
*	Convert SkPMFloat to [0,1] range and prune its API.	mtklein	2015-06-25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Now that Sk4px exists, there's a lot less sense in eeking out every cycle of speed from SkPMFloat: if we need to go _really_ fast, we should use Sk4px. SkPMFloat's going to be used for things that are already slow: large-range intermediates, divides, sqrts, etc. A [0,1] range is easier to work with, and can even be faster if we eliminate enough 255 and 1/255 steps. This is particularly true on ARM, where NEON can do the *255 and /255 steps for us while converting float<->int. We have lots of experimental SkPMFloat <-> SkPMColor APIs that I'm now removing. Of the existing APIs, roundClamp() is the sanest, so I've kept only that, now called round(). The 4-at-a-time APIs never panned out, so they're gone. There will be small diffs on: colormatrix coloremoji colorfilterimagefilter fadefilter imagefilters_xfermodes imagefilterscropexpand imagefiltersgraph tileimagefilter BUG=skia: Review URL: https://codereview.chromium.org/1201343004
*	Everyone gets a namespace {}.	mtklein	2015-05-22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If we include Sk4px.h, SkPMFloat.h, or SkNx.h into files with different SIMD flags, that could cause different definitions of the same method. Normally that's moot, because all the code inlines, but in Debug it tends not to. So in Debug, the linker picks one definition for us. That breaks _someone_. Wrapping everything in a namespace {} keeps the definitions separate. Tested locally, it fixes this bug. BUG=skia:3861 This code is not yet enabled in Chrome, so shouldn't affect the roll. NOTREECHECKS=true Review URL: https://codereview.chromium.org/1154523004
*	Code's more readable when SkPMFloat is an Sk4f.	mtklein	2015-04-03
\| \| \| \| \| \| \| \| \| \| \| \| \|	#floats BUG=skia: BUG=skia:3592 Committed: https://skia.googlesource.com/skia/+/6b5dab889579f1cc9e1b5278f4ecdc4c63fe78c9 CQ_EXTRA_TRYBOTS=client.skia.compile:Build-Ubuntu-GCC-Arm64-Debug-Android-Trybot Review URL: https://codereview.chromium.org/1061603002
*	Revert of Code's more readable when SkPMFloat is an Sk4f. (patchset #3 ↵	mtklein	2015-04-03
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	id:40001 of https://codereview.chromium.org/1061603002/) Reason for revert: missed some neon code Original issue's description: > Code's more readable when SkPMFloat is an Sk4f. > #floats > > BUG=skia: > BUG=skia:3592 > > Committed: https://skia.googlesource.com/skia/+/6b5dab889579f1cc9e1b5278f4ecdc4c63fe78c9 TBR=reed@google.com,mtklein@chromium.org NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia: Review URL: https://codereview.chromium.org/1056143004
*	Code's more readable when SkPMFloat is an Sk4f.	mtklein	2015-04-03
\| \| \| \| \| \| \| \| \|	#floats BUG=skia: BUG=skia:3592 Review URL: https://codereview.chromium.org/1061603002
*	New names for SkPMFloat methods.	mtklein	2015-04-03
\| \| \| \| \| \|	BUG=skia: Review URL: https://codereview.chromium.org/1055123002
*	Use switch operator[](int) to kth<int>() so we can use vget_lane.	mtklein	2015-04-03
\| \| \| \| \| \| \| \| \|	#floats BUG=skia: BUG=skia:3592 Review URL: https://codereview.chromium.org/1059743002
*	SkPMFloat: fewer internal this->isValid() assertions.	mtklein	2015-04-02
\| \| \| \| \| \| \| \| \| \| \| \| \|	Each of these conversion functions now only asserts is output is valid. For SkPMColor -> SkPMFloat, we assert isValid(). For SkPMFloat -> SkPMColor, we SkPMColorAssert. #floats BUG=skia: BUG=skia:3592 Review URL: https://codereview.chromium.org/1055093002
*	back to Sk4f for SkPMColor	mtklein	2015-03-31
\| \| \| \| \| \| \| \| \|	#floats BUG=skia: BUG=skia:3592 Review URL: https://codereview.chromium.org/1047823002
*	Refactor Sk2x<T> + Sk4x<T> into SkNf<N,T> and SkNi<N,T>	mtklein	2015-03-30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The primary feature this delivers is SkNf and SkNd for arbitrary power-of-two N. Non-specialized types or types larger than 128 bits should now Just Work (and we can drop in a specialization to make them faster). Sk4s is now just a typedef for SkNf<4, SkScalar>; Sk4d is SkNf<4, double>, Sk2f SkNf<2, float>, etc. This also makes implementing new specializations easier and more encapsulated. We're now using template specialization, which means the specialized versions don't have to leak out so much from SkNx_sse.h and SkNx_neon.h. This design leaves us room to grow up, e.g to SkNf<8, SkScalar> == Sk8s, and to grown down too, to things like SkNi<8, uint16_t> == Sk8h. To simplify things, I've stripped away most APIs (swizzles, casts, reinterpret_casts) that no one's using yet. I will happily add them back if they seem useful. You shouldn't feel bad about using any of the typedef Sk4s, Sk4f, Sk4d, Sk2s, Sk2f, Sk2d, Sk4i, etc. Here's how you should feel: - Sk4f, Sk4s, Sk2d: feel awesome - Sk2f, Sk2s, Sk4d: feel pretty good No public API changes. TBR=reed@google.com BUG=skia:3592 Review URL: https://codereview.chromium.org/1048593002
*	SkPMFloat::trunc()	mtklein	2015-03-26
\| \| \| \| \| \| \| \| \| \| \| \| \|	Add and test trunc(), which is what get() used to be before rounding. Using trunc() is a ~40% speedup on our linear gradient bench. #neon #floats BUG=skia:3592 #n5 #n9 CQ_INCLUDE_TRYBOTS=client.skia.android:Test-Android-Nexus5-Adreno330-Arm7-Debug-Trybot;client.skia.android:Test-Android-Nexus9-TegraK1-Arm64-Release-Trybot Review URL: https://codereview.chromium.org/1032243002
*	Update 4-at-a-time APIs.	mtklein	2015-03-25
\| \| \| \| \| \| \| \| \| \| \|	There is no reason to require the 4 SkPMFloats (registers) to be adjacent. The only potential win in loads and stores comes from the SkPMColors being adjacent. Makes no difference to existing bench. BUG=skia: Review URL: https://codereview.chromium.org/1035583002
*	Go back to storeAligned / LoadAligned for SkPMFloat <->Sk4f.	mtklein	2015-03-24
\| \| \| \| \| \| \| \| \| \|	This seems to fix the miscompilation bug on ARM64 / Release / GCC 4.9. We switched this over originally for perf issues with NEON, but I can't see any now. Will keep an eye out. BUG=skia:3570 Review URL: https://codereview.chromium.org/1026403002
*	Specialize Sk2d for ARM64	mtklein	2015-03-20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The implementation is nearly identical to Sk2f, with these changes: - float32x2_t -> float64x2_t - vfoo -> vfooq - one extra Newton's method step in sqrt(). Also, generally fix NEON detection to be defined(SK_ARM_HAS_NEON). SK_ARM_HAS_NEON is not being set on ARM64 bots right now (nor does the compiler seem to set __ARM_NEON__), so this CL fixes everything up. BUG=skia: Committed: https://skia.googlesource.com/skia/+/e57b5cab261a243dcbefa74c91c896c28959bf09 CQ_EXTRA_TRYBOTS=client.skia.compile:Build-Mac10.7-Clang-Arm7-Debug-iOS-Trybot,Build-Ubuntu-GCC-Arm64-Release-Android-Trybot Review URL: https://codereview.chromium.org/1020963002
*	Revert of Specialize Sk2d for ARM64 (patchset #3 id:40001 of ↵	mtklein	2015-03-20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	https://codereview.chromium.org/1020963002/) Reason for revert: https://uberchromegw.corp.google.com/i/client.skia.compile/builders/Build-Mac10.7-Clang-Arm7-Debug-iOS/builds/2441/steps/build%20most/logs/stdio https://uberchromegw.corp.google.com/i/client.skia.compile/builders/Build-Mac10.7-Clang-Arm7-Release-iOS/builds/2424/steps/build%20most/logs/stdio https://uberchromegw.corp.google.com/i/client.skia.compile/builders/Build-Ubuntu-GCC-Arm64-Release-Android/builds/8/steps/build%20most/logs/stdio Original issue's description: > Specialize Sk2d for ARM64 > > The implementation is nearly identical to Sk2f, with these changes: > - float32x2_t -> float64x2_t > - vfoo -> vfooq > - one extra Newton's method step in sqrt(). > > Also, generally fix NEON detection to be defined(SK_ARM_HAS_NEON). > SK_ARM_HAS_NEON is not being set on ARM64 bots right now (nor does the compiler > seem to set __ARM_NEON__), so this CL fixes everything up. > > BUG=skia: > > Committed: https://skia.googlesource.com/skia/+/e57b5cab261a243dcbefa74c91c896c28959bf09 TBR=msarett@google.com,reed@google.com,mtklein@chromium.org NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia: Review URL: https://codereview.chromium.org/1028523003
*	Specialize Sk2d for ARM64	mtklein	2015-03-20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The implementation is nearly identical to Sk2f, with these changes: - float32x2_t -> float64x2_t - vfoo -> vfooq - one extra Newton's method step in sqrt(). Also, generally fix NEON detection to be defined(SK_ARM_HAS_NEON). SK_ARM_HAS_NEON is not being set on ARM64 bots right now (nor does the compiler seem to set __ARM_NEON__), so this CL fixes everything up. BUG=skia: Review URL: https://codereview.chromium.org/1020963002
*	SkPMFloat: avoid loads and stores where possible.	mtklein	2015-03-18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	A store/load pair like this is a redundant no-op: store simd_register_a, memory_address load memory_address, simd_register_a Everyone seems to be good at removing those when using SSE, but GCC and Clang are pretty terrible at this for NEON. We end up issuing both redundant commands, usually to and from the stack. That's slow. Let's not do that. This CL unions in the native SIMD register type into SkPMFloat, so that we can assign to and from it directly, which is generating a lot better NEON code. On my Nexus 5, the benchmarks improve from 36ns to 23ns. SSE is just as fast either way, but I paralleled the NEON code for consistency. It's a little terser. And because it needed the platform headers anyway, I moved all includes into SkPMFloat.h, again only for consistency. I'd union in Sk4f too to make its conversion methods a little clearer, but MSVC won't let me (it has a copy constructor... they're apparently not up to speed with C++11 unrestricted unions). BUG=skia: Review URL: https://codereview.chromium.org/1015083004
*	SKPMFloat: we can beat the naive loops when clamping	mtklein	2015-03-06
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Clamping 4 at a time is now about 15% faster than 1 at a time with SSSE3. Clamping 4 at a time is now about 20% faster with SSE2, and this applies to non-clamping too (we still just clamp there). In all cases, 4 at a time is never worse than 1 at a time, and not clamping is never slower than clamping. Here's all the bench results, with the numbers for portable code as a fun point of reference: SSSE3: maxrss loops min median mean max stddev samples config bench 10M 2291 4.66ns 4.66ns 4.66ns 4.68ns 0% ▆█▁▁▁▇▁▇▁▃ nonrendering SkPMFloat_get_1x 10M 2040 5.29ns 5.3ns 5.3ns 5.32ns 0% ▃▆▃▃▁▁▆▃▃█ nonrendering SkPMFloat_clamp_1x 10M 7175 4.62ns 4.62ns 4.62ns 4.63ns 0% ▁▄▃████▃▄▇ nonrendering SkPMFloat_get_4x 10M 5801 4.89ns 4.89ns 4.89ns 4.91ns 0% █▂▄▃▁▃▄█▁▁ nonrendering SkPMFloat_clamp_4x SSE2: maxrss loops min median mean max stddev samples config bench 10M 1601 6.02ns 6.05ns 6.04ns 6.08ns 0% █▅▄▅▄▂▁▂▂▂ nonrendering SkPMFloat_get_1x 10M 2918 6.05ns 6.06ns 6.05ns 6.06ns 0% ▂▇▁▇▇▁▇█▇▂ nonrendering SkPMFloat_clamp_1x 10M 3569 5.43ns 5.45ns 5.44ns 5.45ns 0% ▄█▂██▇▁▁▇▇ nonrendering SkPMFloat_get_4x 10M 4168 5.43ns 5.43ns 5.43ns 5.44ns 0% █▄▇▁▇▄▁▁▁▁ nonrendering SkPMFloat_clamp_4x Portable: maxrss loops min median mean max stddev samples config bench 10M 500 27.8ns 28.1ns 28ns 28.2ns 0% ▃█▆▃▇▃▆▁▇▂ nonrendering SkPMFloat_get_1x 10M 770 40.1ns 40.2ns 40.2ns 40.3ns 0% ▅▁▃▂▆▄█▂▅▂ nonrendering SkPMFloat_clamp_1x 10M 1269 28.4ns 28.8ns 29.1ns 32.7ns 4% ▂▂▂█▂▁▁▂▁▁ nonrendering SkPMFloat_get_4x 10M 1439 40.2ns 40.4ns 40.4ns 40.5ns 0% ▆▆▆█▁▆▅█▅▆ nonrendering SkPMFloat_clamp_4x SkPMFloat_neon.h is still one big TODO as far as 4-at-a-time APIs go. BUG=skia: Review URL: https://codereview.chromium.org/982123002
*	4-at-a-time SkPMColor -> SkPMFloat API.	mtklein	2015-03-05
\| \| \| \| \| \| \| \| \| \| \| \|	Please see if this looks usable. It may even give a perf boost if you use it, even without custom implementations for each instruction set. I've been trying this morning to beat this naive loop implementation, but so far no luck with either _SSE2.h or _SSSE3.h. It's possible this is an artifact of the microbenchmark, because we're not doing anything between the conversions. I'd like to see how this fits into real code, what assembly's generated, what the hot spots are, etc. I've updated the tests to test these new APIs, and splintered off a pair of new benchmarks that use the new APIs. This required some minor rejiggering in the benches. BUG=skia: Review URL: https://codereview.chromium.org/978213003
*	Update SkPMFloat API a bit.	mtklein	2015-03-04
\| \| \| \| \| \| \| \| \| \| \|	Instead of set(SkPMColor), add a constructor SkPMFloat(SkPMColor). Replace setA(), setR(), etc. with a 4 float constructor. And, promise to stick to SkPMColor order. BUG=skia: Review URL: https://codereview.chromium.org/977773002
*	Add SSSE3 implementation for SkPMFloat, with faster get() and set().	mtklein	2015-03-04
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	With SSSE3, we can use the Swiss Army Knife byte shuffler pshufb, a.k.a. _mm_shuffle_epi8(), to jump directly between 32 and 128 bits. In microbench isolation, this looks like an additional 10-15% speedup: SkPMFloat_get: 2.35ns -> 1.98ns SkPMFloat_clamp: 2.35ns -> 2.18ns Before this CL, get() and clamp() were identical code. The _get benchmark improves because both set() and get() become faster; the _clamp benchmark shows the improvement from set() getting faster with clamp() staying the same. BUG=skia: Review URL: https://codereview.chromium.org/976493002
*	Test and fix SkPMFloat rounding.	mtklein	2015-03-03
\| \| \| \| \| \| \| \| \| \| \|	SSE rounds for free (that was a happy accident: they also have a truncating version). NEON does not, nor obviously the portable code, so they add 0.5 before truncating. NOPRESUBMIT=true BUG=skia: Review URL: https://codereview.chromium.org/974643002
*	Make SkPMFloats store floats in [0,255] instead of [0,1].	mtklein	2015-03-03
\| \| \| \| \| \| \| \| \| \| \| \| \|	This pushes the cost of the 255 and 1/255 conversions onto only those code paths that need it. We're not doing it any more efficiently than can be done with Sk4f. In microbenchmark isolation, this is about a 15% speedup. BUG=skia: NOPRESUBMIT=true Review URL: https://codereview.chromium.org/973603002
*	Don't guarantee any particular color order for SkPMFloat. Hide fColor.	mtklein	2015-03-02
\| \| \| \| \| \| \| \|	Also add a little note that get() may incidentally clamp. BUG=skia: Review URL: https://codereview.chromium.org/973553004
*	add auto SkPMFloat <-> Sk4f conversion	mtklein	2015-02-26
\| \| \| \| \| \|	BUG=skia: Review URL: https://codereview.chromium.org/954323002
*	Spin off some fixes to land right away.	mtklein	2015-02-26
\| \| \| \| \| \|	BUG=skia: Review URL: https://codereview.chromium.org/960023002
*	Sketch SkPMFloat	mtklein	2015-02-23
\| \| \| \| \| \| \| \| \| \| \| \|	BUG=skia: Committed: https://skia.googlesource.com/skia/+/50d2b3114b3e59dc84811881591bf25b2c1ecb9f CQ_EXTRA_TRYBOTS=client.skia.compile:Build-Ubuntu13.10-GCC4.8-Arm7-Release-Android_Neon-Trybot http://build.chromium.org/p/client.skia.compile/builders/Build-Ubuntu13.10-GCC4.8-Arm7-Release-Android_Neon/builds/2120/steps/build%20most/logs/stdio Review URL: https://codereview.chromium.org/936633002
*	Revert of Sketch SkPMFloat (patchset #15 id:270001 of ↵	mtklein	2015-02-23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	https://codereview.chromium.org/936633002/) Reason for revert: http://build.chromium.org/p/client.skia.compile/builders/Build-Ubuntu13.10-GCC4.8-Arm7-Release-Android_Neon/builds/2120/steps/build%20most/logs/stdio Original issue's description: > Sketch SkPMFloat > > BUG=skia: > > Committed: https://skia.googlesource.com/skia/+/50d2b3114b3e59dc84811881591bf25b2c1ecb9f TBR=reed@google.com,msarrett@google.com,mtklein@chromium.org NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia: Review URL: https://codereview.chromium.org/952453004
*	Sketch SkPMFloat	mtklein	2015-02-23
	BUG=skia: Review URL: https://codereview.chromium.org/936633002