author mtklein <mtklein@chromium.org> 2015-07-14 10:54:19 -0700
committer Commit bot <commit-bot@chromium.org> 2015-07-14 10:54:19 -0700
commit 4be181e304d2b280c6801bd13369cfba236d1a66
tree ae0510f8a6504c3333582fa004e961a8771a2d99 /src/opts/Sk4px_SSE2.h
parent a5517e2b190a8083b38964972b031c13e99f1012
3-15% speedup to HardLight / Overlay xfermodes.
While investigating my bug (skia:4052) I saw this TODO and figured it'd make me feel better about an otherwise unsuccessful investigation.

This speeds up HardLight and Overlay (same code) by about 15% with SSE, mostly by rewriting the logic from 1 cheap comparison and 2 expensive div255() calls to 2 cheap comparisons and 1 expensive div255(). NEON speeds up by a more modest ~3%.

BUG=skia:
Review URL: https://codereview.chromium.org/1230663005
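The message above describes trading a second div255() for an extra comparison. The following is a minimal scalar sketch of that idea. It assumes the common premultiplied HardLight per-channel formula, in which the result is a branch term plus sc·(255−da) + dc·(255−sa) and the branch is chosen by 2·sc ≤ sa. The function names are hypothetical, and this is not Skia's Sk4px code. The sketch computes both candidate numerators and picks one with an all-ones/all-zeros mask, the way a SIMD compare would. It then divides by 255 only once.

```cpp
// Rounded x / 255, exact for 0 <= x <= 255*255 (shift-and-add form).
static inline int div255_round(int x) {
    return (x + 128 + ((x + 128) >> 8)) >> 8;
}

// Reference: premultiplied HardLight for one channel, with a real branch.
// sc, dc are premultiplied channel values; sa, da are alphas (all 0..255).
static inline int hardlight_branchy(int sc, int dc, int sa, int da) {
    int rc = (2 * sc <= sa) ? 2 * sc * dc
                            : sa * da - 2 * (da - dc) * (sa - sc);
    return div255_round(rc + sc * (255 - da) + dc * (255 - sa));
}

// SIMD-style: compute both candidate numerators, select one with a mask
// (0 or all ones, like a vector compare result), then divide once.
static inline int hardlight_select(int sc, int dc, int sa, int da) {
    int dark    = 2 * sc * dc;
    int lite    = sa * da - 2 * (da - dc) * (sa - sc);  // may be < 0 when unused
    int srcover = sc * (255 - da) + dc * (255 - sa);
    int mask    = -static_cast<int>(2 * sc > sa);      // -1 (all ones) or 0
    int picked  = (lite & mask) | (dark & ~mask);
    return div255_round(srcover + picked);
}
```

Both versions return the same value for valid premultiplied inputs (sc ≤ sa, dc ≤ da). The mask only decides which numerator reaches the shared rounded division.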
Diffstat (limited to 'src/opts/Sk4px_SSE2.h')
-rw-r--r-- src/opts/Sk4px_SSE2.h 5
1 files changed, 5 insertions, 0 deletions
diff --git a/src/opts/Sk4px_SSE2.h b/src/opts/Sk4px_SSE2.h
index 74ccffc277..3809c5e47b 100644
--- a/src/opts/Sk4px_SSE2.h
+++ b/src/opts/Sk4px_SSE2.h
@@ -31,6 +31,11 @@ inline Sk4px::Wide Sk4px::widenHi() const {
_mm_unpackhi_epi8(_mm_setzero_si128(), this->fVec));
}
+inline Sk4px::Wide Sk4px::widenLoHi() const {
+ return Sk16h(_mm_unpacklo_epi8(this->fVec, this->fVec),
+ _mm_unpackhi_epi8(this->fVec, this->fVec));
+}
+
inline Sk4px::Wide Sk4px::mulWiden(const Sk16b& other) const {
return this->widenLo() * Sk4px(other).widenLo();
}
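The added widenLoHi() interleaves the register with itself, so each byte ends up in both halves of a 16-bit lane. The following portable scalar model shows that behavior. The function names are hypothetical. The model assumes x86's little-endian lane layout, in which _mm_unpacklo_epi8 and _mm_unpackhi_epi8 place the first operand's byte in the low half of each 16-bit lane. For comparison, it also models widenHi(). The context lines show widenHi() passing the zero vector as the first operand, which leaves each byte only in the high half.

```cpp
#include <array>
#include <cstdint>

// Model of widenLoHi(): unpacklo(v, v) covers bytes 0..7 and unpackhi(v, v)
// covers bytes 8..15; each byte b lands in both halves of its 16-bit lane,
// i.e. (b << 8) | b == b * 257.
std::array<uint16_t, 16> widen_lo_hi_scalar(const std::array<uint8_t, 16>& v) {
    std::array<uint16_t, 16> out{};
    for (int i = 0; i < 16; ++i) {
        out[i] = static_cast<uint16_t>(v[i] | (v[i] << 8));
    }
    return out;
}

// Model of widenHi(): unpack(zero, v) puts zero in the low half and the
// byte in the high half, i.e. b << 8.
std::array<uint16_t, 16> widen_hi_scalar(const std::array<uint8_t, 16>& v) {
    std::array<uint16_t, 16> out{};
    for (int i = 0; i < 16; ++i) {
        out[i] = static_cast<uint16_t>(v[i] << 8);
    }
    return out;
}
```

A comparison mask byte of 0xFF widens to 0xFFFF, and 0x00 widens to 0x0000. This means widenLoHi() can turn a per-byte mask into a per-16-bit-lane mask. The HardLight rewrite may use it this way, but that is our reading of the commit, not something this diff shows.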