aboutsummaryrefslogtreecommitdiffhomepage
path: root/src/core/SkNx.h
diff options
context:
space:
mode:
authorGravatar Mike Klein <mtklein@chromium.org>2016-10-06 15:06:38 -0400
committerGravatar Skia Commit-Bot <skia-commit-bot@chromium.org>2016-10-07 12:52:29 +0000
commit1aebdaee0e2aa4324509fd3ad4c40c21703ae4a2 (patch)
treec5ffae6c59217f3d228891177e1d50d7f784801a /src/core/SkNx.h
parent2766cc567d5c939730fadd2d865e4bdf05477263 (diff)
SkRasterPipeline: 8x pipelines
Bench runtime changes: sRGB: 7194 -> 3735 = 1.93x faster F16: 6531 -> 2559 = 2.55x faster Instead of building 4x and 1-3x pipelines and then maybe 8x and 1-7x, instead build either the short ones or the long ones, but not both. If we just take care to use a compatible run_pipeline(), there's some cross-module type disagreement but everything works out in the end. Oddly, a few places that looked like they'd be faster using SkNx_fma() or Sk4f_round()/Sk8f_round() are actually faster the long way, e.g. multiply, add 0.5, truncate. Curious! In all the other places you see here that I've used SkNx_fma(), it's been a significant speedup. This folds in a couple refactors and cleanups that I've been meaning to do. Hope you don't mind... if find the new code considerably easier to read than the old code. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2990 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Change-Id: I1c82e5755d8e44cc0b9c6673d04b117f85d71a3a Reviewed-on: https://skia-review.googlesource.com/2990 Reviewed-by: Matt Sarett <msarett@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
Diffstat (limited to 'src/core/SkNx.h')
-rw-r--r--src/core/SkNx.h6
1 files changed, 6 insertions, 0 deletions
diff --git a/src/core/SkNx.h b/src/core/SkNx.h
index 383f2aaae0..6b63199a08 100644
--- a/src/core/SkNx.h
+++ b/src/core/SkNx.h
@@ -307,6 +307,11 @@ SI SkNx<1,Dst> SkNx_cast(const SkNx<1,Src>& v) {
return static_cast<Dst>(v.fVal);
}
+template <int N, typename T>
+SI SkNx<N,T> SkNx_fma(const SkNx<N,T>& f, const SkNx<N,T>& m, const SkNx<N,T>& a) {
+ return f*m+a;
+}
+
typedef SkNx<2, float> Sk2f;
typedef SkNx<4, float> Sk4f;
typedef SkNx<8, float> Sk8f;
@@ -326,6 +331,7 @@ typedef SkNx<8, uint16_t> Sk8h;
typedef SkNx<16, uint16_t> Sk16h;
typedef SkNx<4, int32_t> Sk4i;
+typedef SkNx<8, int32_t> Sk8i;
typedef SkNx<4, uint32_t> Sk4u;
// Include platform specific specializations if available.