author mtklein <mtklein@chromium.org> 2015-06-26 10:46:31 -0700
committer Commit bot <commit-bot@chromium.org> 2015-06-26 10:46:31 -0700
commit 2aab22a58a366df4752c1cf0f004092c6e7be335
tree bc4026ca98f28068b99ca6394c05a0129f0dc4d6 /src/opts/SkNx_neon.h
parent cdb42bb55c3bdbbd6682dcd50b5c77322bb6e565
Color dodge and burn with SkPMFloat.

Both 25-35% faster with SSE.

With NEON, Burn measures as a ~10% regression, Dodge a huge 2.9x improvement.

The Burn regression is somewhat artificial: we're drawing random colored rects onto an opaque white dst, so we're heavily biased toward the (d==da) fast path in the serial code. In the vector code there's no short-circuiting and we always pay a fixed cost for ColorBurn regardless of src or dst content.

Dodge's fast paths, in contrast, only trigger when (s==sa) or (d==0), neither of which happens any more than randomly in our benchmark. I don't think (d==0) should happen at all. Similarly, the (s==0) Burn fast path is really only going to happen as often as SkRandom allows.

In practice, the existing Burn benchmark is hitting its fast path 100% of the time. So I actually feel really great that this only dings the benchmark by 10%.

Chrome's still guarded by SK_SUPPORT_LEGACY_XFERMODES, which I'll lift after finishing the last xfermode, SoftLight.

BUG=skia:

Review URL: https://codereview.chromium.org/1214443002
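
Editorial sketch, not part of the commit: the trade-off described above can be modeled on a single channel. The code below assumes the usual premultiplied ColorBurn formula with channels in [0, 1]; the names burn_serial and burn_branchless are hypothetical, and the scalar ternaries only stand in for a per-lane vector select. The serial form returns early on (d==da) or (s==0), while the branchless form evaluates every candidate and then picks one, which is why its cost does not depend on pixel content.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical one-channel ColorBurn on premultiplied floats in [0, 1].
// s, sa: source channel and alpha; d, da: destination channel and alpha.

// Serial form: the early returns skip the division whenever a fast path applies.
static float burn_serial(float s, float sa, float d, float da) {
    assert(0 <= s && s <= sa && 0 <= d && d <= da);   // premultiplied inputs
    const float rest = s * (1 - da) + d * (1 - sa);
    if (d == da) return sa * da + rest;               // fast path (d==da)
    if (s == 0)  return d * (1 - sa);                 // fast path (s==0)
    return sa * (da - std::min(da, (da - d) * sa / s)) + rest;
}

// Branchless form: compute all three candidates, then select.
// Each ternary models a lane-wise select such as cond.thenElse(t, e).
static float burn_branchless(float s, float sa, float d, float da) {
    const float rest    = s * (1 - da) + d * (1 - sa);
    const float safe_s  = (s == 0) ? 1.0f : s;        // keep the unused candidate finite
    const float full    = sa * da + rest;
    const float zero    = d * (1 - sa);
    const float general = sa * (da - std::min(da, (da - d) * sa / safe_s)) + rest;
    const float inner   = (s == 0) ? zero : general;
    return (d == da) ? full : inner;
}
```

With an opaque white destination every channel satisfies (d==da), so burn_serial never reaches the division, whereas burn_branchless always performs it.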
Diffstat (limited to 'src/opts/SkNx_neon.h')
-rw-r--r-- src/opts/SkNx_neon.h 7
1 files changed, 7 insertions, 0 deletions
diff --git a/src/opts/SkNx_neon.h b/src/opts/SkNx_neon.h
index b319807779..ccba163e56 100644
--- a/src/opts/SkNx_neon.h
+++ b/src/opts/SkNx_neon.h
@@ -297,6 +297,13 @@ public:
                || vgetq_lane_u32(v,2) || vgetq_lane_u32(v,3);
     }
+    SkNf thenElse(const SkNf& t, const SkNf& e) const {
+        uint32x4_t ci = vreinterpretq_u32_f32(fVec),
+                   ti = vreinterpretq_u32_f32(t.fVec),
+                   ei = vreinterpretq_u32_f32(e.fVec);
+        return vreinterpretq_f32_u32(vorrq_u32(vandq_u32(ti, ci), vbicq_u32(ei, ci)));
+    }
+
     float32x4_t fVec;
 };
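
Editorial sketch, not part of the commit: the added thenElse is a bitwise select, (t & c) | (e & ~c), applied to the raw bits of each float lane (vbicq_u32(ei, ci) computes ei & ~ci). A portable scalar model may make that easier to follow. Float4, Mask4, less_than and then_else below are hypothetical names, not Skia API, and the model assumes each mask lane is either all ones or all zeros, as a NEON lane comparison produces.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical four-lane stand-ins for float32x4_t and uint32x4_t.
struct Float4 { float    v[4]; };
struct Mask4  { uint32_t m[4]; };

// Reinterpret bits without changing them, like vreinterpretq_u32_f32 and back.
static uint32_t bits_of(float f)     { uint32_t u; std::memcpy(&u, &f, sizeof u); return u; }
static float    float_of(uint32_t u) { float f;    std::memcpy(&f, &u, sizeof f); return f; }

// Lane-wise a < b, yielding all-ones for true and all-zeros for false.
static Mask4 less_than(const Float4& a, const Float4& b) {
    Mask4 r;
    for (int i = 0; i < 4; i++) r.m[i] = (a.v[i] < b.v[i]) ? 0xFFFFFFFFu : 0u;
    return r;
}

// Lane-wise select: take t where the mask is set, e where it is clear.
static Float4 then_else(const Mask4& c, const Float4& t, const Float4& e) {
    Float4 r;
    for (int i = 0; i < 4; i++) {
        assert(c.m[i] == 0u || c.m[i] == 0xFFFFFFFFu);   // partial masks would mix bits
        r.v[i] = float_of((bits_of(t.v[i]) & c.m[i]) | (bits_of(e.v[i]) & ~c.m[i]));
    }
    return r;
}
```

For example, then_else(less_than(a, b), a, b) yields the lane-wise minimum of a and b with no branch on the lane contents.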