path: root/src/core/Sk4x4f.h
Commit message | Author | Age
* Wrap SkNx types in anonymous namespace again. | Mike Klein | 2016-10-14
This should make each compilation unit's SkNx types distinct from each other's as far as C++ cares. This keeps us from violating the One Definition Rule with different implementations of the same function.

Here's an example I like. Sk4i SkNx_cast(Sk4b) has at least 4 different sensible implementations:
- SSE2: punpcklbw xmm, zero; punpcklbw xmm, zero
- SSSE3: load mask; pshufb xmm, mask
- SSE4.1: pmovzxbd
- AVX2: vpmovzxbd

We really want all of these to inline, but if for some reason they don't (Debug build, poor inliner) and they're compiled in SkOpts.cpp, SkOpts_ssse3.cpp, SkOpts_sse41.cpp, SkOpts_hsw.cpp... boom! The linker may keep any one of those out-of-line copies, say the AVX2 one, for every caller, even on CPUs without AVX2.

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=3461

Change-Id: I0088ebfd7640c1b0de989738ed43c81b530dc0d9
Reviewed-on: https://skia-review.googlesource.com/3461
Reviewed-by: Matt Sarett <msarett@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
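To make the ODR concern concrete, here is a minimal sketch of one such per-ISA function, using a hypothetical `widen_u8_to_i32()` in place of `SkNx_cast`. Which body is compiled depends on the flags of the translation unit that includes it. The anonymous namespace gives each copy internal linkage, so an SSE2 build and an AVX2 build of the same header no longer collide at link time. Only the SSE2 double-`punpcklbw` path and a scalar fallback are shown; this is an illustration, not Skia's code.

```cpp
#include <cstdint>
#include <cstring>
#if defined(__SSE2__)
  #include <emmintrin.h>
#endif

namespace {  // internal linkage: each translation unit gets its own distinct copy

// Hypothetical helper: zero-extend four unsigned bytes to four 32-bit ints.
inline void widen_u8_to_i32(const uint8_t in[4], int32_t out[4]) {
#if defined(__SSE2__)
    int32_t packed;
    std::memcpy(&packed, in, 4);                 // four bytes into one 32-bit value
    __m128i v    = _mm_cvtsi32_si128(packed);    // bytes land in the low 32 bits
    __m128i zero = _mm_setzero_si128();
    v = _mm_unpacklo_epi8(v, zero);              // punpcklbw: each byte now followed by 1 zero byte
    v = _mm_unpacklo_epi8(v, zero);              // punpcklbw again: followed by 3 zero bytes
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), v);
#else
    // Portable fallback with the same result.
    for (int i = 0; i < 4; ++i) {
        out[i] = static_cast<int32_t>(in[i]);
    }
#endif
}

}  // namespace
```

Each unpack interleaves the low bytes with zeros, so after two of them every source byte is followed by three zero bytes. That is a 32-bit zero-extension, which is why the same instruction can appear twice in the SSE2 line above.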
* Sk4x4f: NEON impl. | mtklein | 2016-03-24
Notable tricks:
- v{ld,st}4q_f32 handle transposing loads and stores of floats in one step
- vcvtq_n_{f32_u32,u32_f32} let us convert to and from floats without shifts: the _n forms treat each lane as fixed point, folding the shift into the conversion

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1828613002

Review URL: https://codereview.chromium.org/1828613002
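A sketch of the `vcvtq_n_f32_u32` trick, assuming 8-bit channels packed into 32-bit lanes; `byte_lane_to_float` is a hypothetical helper, not Skia code. The channel is masked in place, and the lane is then read as fixed point with `Shift` fractional bits, so the conversion itself divides by 2^Shift and no shift instruction is needed. Non-NEON builds fall back to the ordinary shift-and-mask form. It assumes C++17: `if constexpr` keeps the invalid `n = 0` form of the intrinsic from being instantiated.

```cpp
#include <cstdint>
#if defined(__ARM_NEON)
  #include <arm_neon.h>
#endif

// Hypothetical helper: take the byte at bit offset Shift (0, 8, 16 or 24) of
// each of four packed 32-bit pixels and convert it to a float in [0, 255].
template <int Shift>
void byte_lane_to_float(const uint32_t px[4], float out[4]) {
    static_assert(Shift == 0 || Shift == 8 || Shift == 16 || Shift == 24,
                  "Shift must be a byte offset");
#if defined(__ARM_NEON)
    // Isolate the channel without moving it.
    uint32x4_t v = vandq_u32(vld1q_u32(px), vdupq_n_u32(0xFFu << Shift));
    if constexpr (Shift == 0) {
        vst1q_f32(out, vcvtq_f32_u32(v));
    } else {
        // Fixed-point conversion with Shift fractional bits: divides by 2^Shift.
        vst1q_f32(out, vcvtq_n_f32_u32(v, Shift));
    }
#else
    // Portable reference: shift the byte down, mask it, convert.
    for (int i = 0; i < 4; ++i) {
        out[i] = static_cast<float>((px[i] >> Shift) & 0xFFu);
    }
#endif
}
```

The division by a power of two is exact here, so both paths produce the same floats.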
* Sk4x4f: Simplify x86 down to SSE2. | mtklein | 2016-03-23
- This drops the minimum requirement for Sk4x4f on x86 to SSE2 by removing calls to _mm_shuffle_epi8() (pshufb, an SSSE3 instruction). Instead we use good old shifting and masking.
- Performance is very similar to SSSE3, close enough that I'm having trouble telling which is faster. I think we can circle back later on whether we need an SSSE3 version. When possible it's nice to stick to SSE2: it's the most widely available, and it performs most uniformly across different chips.

This makes Sk4x4f fast on Windows and Linux, and may help mobile x86.

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1817353005

Review URL: https://codereview.chromium.org/1817353005
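A minimal SSE2 sketch of the shift-and-mask approach, assuming four RGBA8888 pixels with red in the low byte of each 32-bit value; `split_channels` is a hypothetical name, not Sk4x4f's API. Instead of a `pshufb` per channel, each channel is shifted down with `_mm_srli_epi32`, masked with `0xFF`, and converted with `_mm_cvtepi32_ps`, all of which are SSE2.

```cpp
#include <cstdint>
#if defined(__SSE2__)
  #include <emmintrin.h>
#endif

// Hypothetical helper: split four packed RGBA8888 pixels (red in the low byte)
// into four planar float vectors, one per channel.
inline void split_channels(const uint32_t px[4],
                           float r[4], float g[4], float b[4], float a[4]) {
#if defined(__SSE2__)
    __m128i v    = _mm_loadu_si128(reinterpret_cast<const __m128i*>(px));
    __m128i mask = _mm_set1_epi32(0xFF);
    // Shift each channel down to the low byte, mask off the rest, convert to float.
    _mm_storeu_ps(r, _mm_cvtepi32_ps(_mm_and_si128(v, mask)));
    _mm_storeu_ps(g, _mm_cvtepi32_ps(_mm_and_si128(_mm_srli_epi32(v, 8), mask)));
    _mm_storeu_ps(b, _mm_cvtepi32_ps(_mm_and_si128(_mm_srli_epi32(v, 16), mask)));
    // Alpha is the top byte: the logical shift fills with zeros, so no mask is needed.
    _mm_storeu_ps(a, _mm_cvtepi32_ps(_mm_srli_epi32(v, 24)));
#else
    for (int i = 0; i < 4; ++i) {
        r[i] = static_cast<float>(px[i] & 0xFFu);
        g[i] = static_cast<float>((px[i] >> 8) & 0xFFu);
        b[i] = static_cast<float>((px[i] >> 16) & 0xFFu);
        a[i] = static_cast<float>(px[i] >> 24);
    }
#endif
}
```

Each channel costs a shift and an AND, where an SSSE3 version would use one `pshufb` with a per-channel mask; the commit reports the two perform about the same.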
* Sk4x4f | mtklein | 2016-03-22
An API for loading and storing 4 Sk4f with transpose, i.e. converting between interleaved (RGBA RGBA ...) and planar (RRRR GGGG ...) layouts.

This has SSSE3+ and portable versions. SSE2 and NEON versions to follow.

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1825663002

Review URL: https://codereview.chromium.org/1825663002
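What loading and storing 4 Sk4f with transpose amounts to can be shown with a portable sketch: 16 interleaved floats (four RGBA pixels) become four planar vectors, and back. `Planar4x4`, `transpose_load`, and `transpose_store` are hypothetical names, not Sk4x4f's actual API, and `std::array<float, 4>` stands in for Sk4f. On NEON, `vld4q_f32` and `vst4q_f32` perform each direction in a single instruction.

```cpp
#include <array>

// Hypothetical stand-in for four Sk4f vectors: one per channel, four pixels each.
struct Planar4x4 {
    std::array<float, 4> r, g, b, a;
};

// Load four interleaved pixels (r0 g0 b0 a0 r1 g1 ...) and transpose them
// into planar form (r0 r1 r2 r3, g0 g1 g2 g3, ...).
inline Planar4x4 transpose_load(const float fs[16]) {
    Planar4x4 p{};
    for (int i = 0; i < 4; ++i) {
        p.r[i] = fs[4 * i + 0];
        p.g[i] = fs[4 * i + 1];
        p.b[i] = fs[4 * i + 2];
        p.a[i] = fs[4 * i + 3];
    }
    return p;
}

// The inverse: write the planar vectors back out in interleaved order.
inline void transpose_store(const Planar4x4& p, float fs[16]) {
    for (int i = 0; i < 4; ++i) {
        fs[4 * i + 0] = p.r[i];
        fs[4 * i + 1] = p.g[i];
        fs[4 * i + 2] = p.b[i];
        fs[4 * i + 3] = p.a[i];
    }
}
```

Loading 0, 1, ..., 15 gives `r = {0, 4, 8, 12}` and `a = {3, 7, 11, 15}`, and storing that result reproduces the original 16 floats. Once in planar form, each channel is a full vector, so per-channel math runs on four pixels at once.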