author | mtklein <mtklein@chromium.org> | 2015-05-18 07:03:01 -0700 |
---|---|---|
committer | Commit bot <commit-bot@chromium.org> | 2015-05-18 07:03:01 -0700 |
commit | 9b777967b1e531d0ebdb3349c4bd149fdb86589f (patch) | |
tree | 1dc9b030c103e7df4e60f9ce948eb31585a4da96 /src/core/Sk4px.h | |
parent | d8b544cd04668f130e30a6a29eab8ec43b0bbb8b (diff) |
sk4px the rest of the easy xfermodes.
Adds and uses fastMulDiv255Round() where possible,
which approximates x*y/255 as (x*y+x)/256. Seems like a sizeable
speedup, as seen below on Exclusion, Screen, and Modulate. The
existing NEON code uses this approximation for
{Src,Dst}x{In,Out,Over}, and without it we'd regress speed there.
This will require rebaselines whether or not we use this
approximation: the x86 bots change if we do, the ARM bots change
if we don't. None of the diffs are significant.
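A minimal scalar sketch of the approximation described above, for checking how far (x*y+x)/256 can stray from a correctly rounded x*y/255 over all 8-bit inputs. The function names are hypothetical and not part of Skia; the exact reference here is a plain integer round-to-nearest rather than Skia's own div255 routine.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>

// Reference: x*y/255 rounded to nearest. Ties cannot occur because 255 is odd.
// (Hypothetical helper, not Skia code.)
static int exact_mul_div255_round(int x, int y) {
    return (2 * x * y + 255) / 510;
}

// The approximation from the commit message: (x*y + x) / 256, truncated.
// (Hypothetical helper, not Skia code.)
static int approx_mul_div255(int x, int y) {
    return (x * y + x) >> 8;
}

// Largest absolute difference between the two over all 8-bit x and y.
static int max_abs_error() {
    int worst = 0;
    for (int x = 0; x <= 255; x++) {
        for (int y = 0; y <= 255; y++) {
            int err = std::abs(approx_mul_div255(x, y) - exact_mul_div255_round(x, y));
            worst = std::max(worst, err);
        }
    }
    return worst;
}
```

Calling max_abs_error() from a small driver is one way to see the size of the rebaseline diffs: under these assumptions the two results never differ by more than one, and they agree whenever either input is 0 or 255.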
Desktop:
Xfermode_Screen_aa 5.82ms -> 5.54ms 0.95x
Xfermode_Modulate_aa 5.67ms -> 5.36ms 0.95x
Xfermode_Exclusion_aa 6.18ms -> 5.81ms 0.94x
Xfermode_Exclusion 5.03ms -> 4.24ms 0.84x
Xfermode_Screen 4.51ms -> 3.59ms 0.8x
Xfermode_Modulate 4.2ms -> 3.19ms 0.76x
Xfermode_DstOver 6.73ms -> 3.88ms 0.58x
Xfermode_SrcOut 6.47ms -> 3.48ms 0.54x
Xfermode_SrcIn 6.46ms -> 3.46ms 0.54x
Xfermode_DstOut 6.49ms -> 3.41ms 0.52x
Xfermode_DstIn 6.5ms -> 3.32ms 0.51x
Xfermode_Src_aa 9.53ms -> 4.75ms 0.5x
Xfermode_Clear_aa 9.65ms -> 4.8ms 0.5x
Xfermode_DstIn_aa 11.5ms -> 5.57ms 0.49x
Xfermode_DstOver_aa 11.6ms -> 5.63ms 0.49x
Xfermode_SrcOut_aa 11.6ms -> 5.5ms 0.47x
Xfermode_SrcIn_aa 11.7ms -> 5.51ms 0.47x
Xfermode_DstOut_aa 11.7ms -> 5.4ms 0.46x
N7 performance is close enough to 1x that I'm not sure whether
this is a net win, net loss, or truly neutral. I figure the bots will
show that.
I experimented with another approximation,
(x*(255-y))/255 ≈ (x*(256-y))/256. This was inconclusive, so I'm
leaving it out for now.
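For reference, the gap between the two expressions in that experiment can be worked out directly. This treats both sides as real-valued quotients before any rounding or truncation, since how the experiment rounded is not stated here.

```latex
\[
\frac{x(256-y)}{256} - \frac{x(255-y)}{255}
= \frac{x\,\bigl[255(256-y) - 256(255-y)\bigr]}{255 \cdot 256}
= \frac{xy}{65280},
\qquad
0 \le \frac{xy}{65280} \le \frac{255^2}{65280} < 1
\quad (0 \le x, y \le 255).
\]
```

So the unrounded values differ by less than one, and the gap vanishes when x or y is 0.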
The remaining modes are the complicated conditional ones.
BUG=skia:
Review URL: https://codereview.chromium.org/1141953004
Diffstat (limited to 'src/core/Sk4px.h')
-rw-r--r-- | src/core/Sk4px.h | 9 |
1 files changed, 9 insertions, 0 deletions
diff --git a/src/core/Sk4px.h b/src/core/Sk4px.h
index 028630d100..48e09e1c92 100644
--- a/src/core/Sk4px.h
+++ b/src/core/Sk4px.h
@@ -65,6 +65,15 @@ public:
         return this->mulWiden(Sk16b(255));
     }
 
+    // Generally faster than this->mulWiden(other).div255RoundNarrow().
+    // May be incorrect by +-1, but is always exactly correct when *this or other is 0 or 255.
+    Sk4px fastMulDiv255Round(const Sk16b& other) const {
+        // (x*y + x) / 256 meets these criteria. (As of course does (x*y + y) / 256 by symmetry.)
+        Sk4px::Wide x = this->widenLo(),
+                    xy = this->mulWiden(other);
+        return x.addNarrowHi(xy);
+    }
+
     // A generic driver that maps fn over a src array into a dst array.
     // fn should take an Sk4px (4 src pixels) and return an Sk4px (4 dst pixels).
     template <typename Fn>
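The new method can be read lane by lane. Below is a rough scalar model of it, assuming widenLo() zero-extends each 8-bit lane to 16 bits, mulWiden() forms the 16-bit product of two 8-bit lanes, and addNarrowHi() adds 16-bit lanes and keeps the high byte. Those semantics are inferred from the names and from the comment in the diff, not taken from the Skia headers, and every identifier below is hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// 16 byte lanes (4 pixels x 4 channels) and their widened 16-bit form.
using Lanes8  = std::array<uint8_t, 16>;
using Lanes16 = std::array<uint16_t, 16>;

// Assumed model of widenLo(): zero-extend each byte to 16 bits.
static Lanes16 widen_lo(const Lanes8& a) {
    Lanes16 r{};
    for (std::size_t i = 0; i < r.size(); i++) r[i] = a[i];
    return r;
}

// Assumed model of mulWiden(): 8-bit x 8-bit -> 16-bit product per lane.
static Lanes16 mul_widen(const Lanes8& a, const Lanes8& b) {
    Lanes16 r{};
    for (std::size_t i = 0; i < r.size(); i++) r[i] = uint16_t(a[i] * b[i]);
    return r;
}

// Assumed model of addNarrowHi(): add 16-bit lanes, keep the high byte.
// The sum is at most 255 + 255*255 = 65280, so it fits in 16 bits.
static Lanes8 add_narrow_hi(const Lanes16& a, const Lanes16& b) {
    Lanes8 r{};
    for (std::size_t i = 0; i < r.size(); i++) r[i] = uint8_t((a[i] + b[i]) >> 8);
    return r;
}

// Scalar model of fastMulDiv255Round(): (x*y + x) >> 8 in every lane.
static Lanes8 fast_mul_div255_round_lanes(const Lanes8& x, const Lanes8& y) {
    return add_narrow_hi(widen_lo(x), mul_widen(x, y));
}
```

In this model, multiplying by a lane vector of all 255s returns the other operand unchanged, which matches the "exactly correct when *this or other is 0 or 255" comment in the diff.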