aboutsummaryrefslogtreecommitdiffhomepage
path: root/src/core/SkHalf.h
Commit message (Collapse)AuthorAge
* partial revert of https://skia-review.googlesource.com/c/skia/+/90026 for ↵Gravatar Mike Reed2018-01-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | android Some errors: frameworks/base/core/jni/android/graphics/Bitmap.cpp:385: error: undefined reference to 'gDitherMatrix_4Bit_16' frameworks/base/core/jni/android/graphics/Bitmap.cpp:397: error: undefined reference to 'gDitherMatrix_4Bit_16' frameworks/base/core/jni/android/graphics/Bitmap.cpp:373: error: undefined reference to 'gDitherMatrix_3Bit_16' frameworks/base/core/jni/android/graphics/Bitmap.cpp:322: error: undefined reference to 'SkPM4f::toF16() const' frameworks/base/core/jni/android/graphics/Bitmap.cpp:333: error: undefined reference to 'SkFloatToHalf(float)' frameworks/base/core/jni/android/graphics/Bitmap.cpp:334: error: undefined reference to 'SkFloatToHalf(float)' frameworks/base/core/jni/android/graphics/Bitmap.cpp:335: error: undefined reference to 'SkFloatToHalf(float)' frameworks/base/core/jni/android/graphics/Bitmap.cpp:336: error: undefined reference to 'SkFloatToHalf(float)' frameworks/base/core/jni/android/graphics/Bitmap.cpp:639: error: undefined reference to 'SkPM4f::toF16() const' frameworks/base/core/jni/android/graphics/Bitmap.cpp:494: error: undefined reference to 'SkPM4f::FromF16(unsigned short const*)' frameworks/base/core/jni/android/graphics/Bitmap.cpp:494: error: undefined reference to 'SkPM4f::unpremul() const' external/skia/src/core/SkUtils.h:24: error: undefined reference to 'SkOpts::memset32' frameworks/base/core/jni/android/graphics/NinePatch.cpp:112: error: undefined reference to 'SkLatticeIter::Valid(int, int, SkCanvas::Lattice const&)' frameworks/base/core/jni/android/graphics/NinePatch.cpp:113: error: undefined reference to 'SkLatticeIter::SkLatticeIter(SkCanvas::Lattice const&, SkRect const&)' frameworks/base/core/jni/android/graphics/NinePatch.cpp:118: error: undefined reference to 'SkLatticeIter::next(SkRect*, SkRect*, bool*, unsigned int*)' frameworks/base/core/jni/android/graphics/NinePatch.cpp:118: error: undefined reference to 'SkLatticeIter::next(SkRect*, SkRect*, bool*, unsigned int*)' frameworks/base/core/jni/android/graphics/NinePatch.cpp:118: error: undefined reference to 'SkLatticeIter::next(SkRect*, SkRect*, bool*, unsigned int*)' external/skia/src/core/SkGeometry.h:427: error: undefined reference to 'SkConic::computeQuadPOW2(float) const' external/skia/src/core/SkGeometry.h:430: error: undefined reference to 'SkConic::chopIntoQuadsPOW2(SkPoint*, int) const' frameworks/base/core/jni/android/graphics/Shader.cpp:77: error: undefined reference to 'SkMakeImageFromRasterBitmap(SkBitmap const&, SkCopyPixelsMode)' Bug: skia:7454 Change-Id: I49b7c0890ac38b576b39e1ce76b22e2420c99889 Reviewed-on: https://skia-review.googlesource.com/90524 Reviewed-by: Mike Reed <reed@google.com> Commit-Queue: Mike Reed <reed@google.com>
* remove unused declarationsGravatar Mike Reed2018-01-03
| | | | | | | | Bug: skia: Change-Id: If8ca5e3d649dab3cf8b2bdb1cf072ff23cea9465 Reviewed-on: https://skia-review.googlesource.com/90026 Reviewed-by: Mike Reed <reed@google.com> Commit-Queue: Mike Reed <reed@google.com>
* Add SK_API to APIs used by the android framework.Gravatar Derek Sollenberger2017-09-21
| | | | | | | | | | | | | This CL enables us to set the default visibility of the symbols on Android to hidden. It is the intent that all of he SK_APIs that have been added to /src directies should be removed as soon as we can remove their callers within Android. Bug: b/31971097 Change-Id: Ic787f94df0fb0c2b8d941aa7095a12b317c4b5de Reviewed-on: https://skia-review.googlesource.com/49501 Reviewed-by: Mike Reed <reed@google.com> Commit-Queue: Derek Sollenberger <djsollen@google.com>
* only SkHalfToFloat_finite_ftz(uint64_t hs) is calledGravatar Mike Klein2017-06-12
| | | | | | | | | This just cleans up an unused half->float entry point. Change-Id: I7b869d3fd049d807453745c3cca4eab1162848d8 Reviewed-on: https://skia-review.googlesource.com/19451 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
* test and fix f16<->f32 conversion stagesGravatar Mike Klein2017-04-20
| | | | | | | | | | | | This refactors from_half() and to_half() a bit, totally reimplementing the non-hardware cases to be more clearly correct. CQ_INCLUDE_TRYBOTS=skia.primary:Test-Android-Clang-PixelC-CPU-TegraX1-arm64-Release-Android,Test-Android-Clang-Ci20-CPU-IngenicJZ4780-mipsel-Release-Android,Test-Android-Clang-Nexus10-CPU-Exynos5250-arm-Release-Android,Test-Mac-Clang-MacMini6.2-CPU-AVX-x86_64-Release,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86-Debug,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Debug Change-Id: I439463cf90935c5e8fe2369cbcf45e07f3af62c7 Reviewed-on: https://skia-review.googlesource.com/13921 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Matt Sarett <msarett@google.com>
* remove SkNx AVX codeGravatar Mike Klein2017-04-11
| | | | | | | | | | | | | | We can't realistically use AVX and SkNx together because of ODR problems, so remove the code that may tempt us to try. Remaining code paths using AVX: - one intrinsics-only routine in SkOpts_hsw.cpp - SkJumper Change-Id: I0d2d03b47ea4a0eec27f2de2b28a4c3d1ff8376f Reviewed-on: https://skia-review.googlesource.com/13121 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* SkRasterPipeline: 8x pipelines, without any 8x code enabled.Gravatar Mike Klein2016-10-12
| | | | | | | | | | | | | | | | | Original review here: https://skia-review.googlesource.com/c/2990/ Second attempt here: https://skia-review.googlesource.com/c/3064/ This is the same as the second attempt, but with the change to SkOpts_hsw.cpp left out. That omitted part is the key piece... this just lands the refactoring. CQ_INCLUDE_TRYBOTS=master.client.skia:Perf-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Debug-ASAN-Trybot,Perf-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Debug-GN,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-Fast-Trybot;master.client.skia.compile:Build-Win-MSVC-x86_64-Debug-Trybot GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=3242 Change-Id: Iaafa793a4854c2c9cd7e85cca3701bf871253f71 Reviewed-on: https://skia-review.googlesource.com/3242 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
* Revert "SkRasterPipeline: 8x pipelines, attempt 2"Gravatar Mike Klein2016-10-10
| | | | | | | | | | | | This reverts commit Id0ba250037e271a9475fe2f0989d64f0aa909bae. crbug.com/654213 Looks like Chrome Canary's picking up Haswell code on non-Haswell machines. Change-Id: I16f976da24db86d5c99636c472ffad56db213a2a Reviewed-on: https://skia-review.googlesource.com/3108 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Mike Klein <mtklein@chromium.org>
* SkRasterPipeline: 8x pipelines, attempt 2Gravatar Mike Klein2016-10-07
| | | | | | | | | | | | | | | | | | | Original review here: https://skia-review.googlesource.com/c/2990/ Changes since: - simpler implementations of load_tail() / store_tail(): slower, but more obviously correct to all compilers - fleshed out math ops on Sk8i and Sk8u to make unit tests happy on -Fast bot (where we always have AVX2) - now storing stage functions as void(*)() to avoid undefined behavior and/or linker problems. This restores 32-bit Windows. - all AVX2 Sk8x methods are marked always-inline, to avoid linking the "wrong" version on Debug builds. CQ_INCLUDE_TRYBOTS=master.client.skia:Perf-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Debug-ASAN-Trybot,Perf-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Debug-GN,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-Fast-Trybot;master.client.skia.compile:Build-Win-MSVC-x86_64-Debug-Trybot GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=3064 Change-Id: Id0ba250037e271a9475fe2f0989d64f0aa909bae Reviewed-on: https://skia-review.googlesource.com/3064 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>
* Revert "SkRasterPipeline: 8x pipelines"Gravatar Mike Klein2016-10-07
| | | | | | | | | | | | | | | | This reverts commit I1c82e5755d8e44cc0b9c6673d04b117f85d71a3a. Reason for revert: lots of failing bots. TBR=mtklein@chromium.org,msarett@google.com,reviews@skia.org NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true Change-Id: I653bed3905187f43196504f19424985fa2a765b5 Reviewed-on: https://skia-review.googlesource.com/3063 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Mike Klein <mtklein@chromium.org>
* SkRasterPipeline: 8x pipelinesGravatar Mike Klein2016-10-07
| | | | | | | | | | | | | | | | | | | | | | Bench runtime changes: sRGB: 7194 -> 3735 = 1.93x faster F16: 6531 -> 2559 = 2.55x faster Instead of building 4x and 1-3x pipelines and then maybe 8x and 1-7x, instead build either the short ones or the long ones, but not both. If we just take care to use a compatible run_pipeline(), there's some cross-module type disagreement but everything works out in the end. Oddly, a few places that looked like they'd be faster using SkNx_fma() or Sk4f_round()/Sk8f_round() are actually faster the long way, e.g. multiply, add 0.5, truncate. Curious! In all the other places you see here that I've used SkNx_fma(), it's been a significant speedup. This folds in a couple refactors and cleanups that I've been meaning to do. Hope you don't mind... if find the new code considerably easier to read than the old code. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2990 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Change-Id: I1c82e5755d8e44cc0b9c6673d04b117f85d71a3a Reviewed-on: https://skia-review.googlesource.com/2990 Reviewed-by: Matt Sarett <msarett@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>
* f16<->f32 ftz is an optional thing for speed.Gravatar mtklein2016-08-23
| | | | | | | | | | | | | The ARMv8 asm path actually does it right... that should be okay. My Nexus 5x fails `dm -m _finite_ftz` before this and passes after it. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2276533002 TBR=msarett@google.com Review-Url: https://codereview.chromium.org/2276533002
* Flush denorm half floats to zero.Gravatar mtklein2016-08-22
| | | | | | | | | | | | | | | | | | I think we convinced ourselves that denorms, while a good chunk of half floats, cover a rather small fraction of the representable range, which is always close enough to zero to flush. This makes both paths of the conversion to or from float considerably simpler. These functions now work for zero-or-normal half floats (excluding infinite, NaN). I'm not aware of a term for this class so I've called them "ordinary". A handful of GMs and SKPs draw differently in --config f16, but all imperceptibly. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2256023002 Review-Url: https://codereview.chromium.org/2256023002
* Add Sk4h_load4 for loading F16.Gravatar mtklein2016-07-26
| | | | | | | | | | | | | | Should feel very similar to Sk4h_store4: NEON uses its native instruction, SSE unpacks manually. Since we'll have our F16s in 4 Sk4h by the time we're done here, this also extracts an Sk4h->Sk4f routine from the old uint64_t->Sk4f one. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2184753002 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2184753002
* Improve naive SkColorXform to half floatsGravatar msarett2016-07-19
| | | | | | | | | | | | | | | | | | | | | | | | This should give us a good baseline to explore using SkRasterPipeline. A particular colorxform to half float drops from 425us to 282us on my desktop. Color Xform to Half Float (HP z620) Original 425us Trans16 (not 32) 355us Vector Trans16 378us Trans16 + Keep Halfs in Vector 335us Vector Trans16 + Keep Halfs in Vector 282us Final 282us Color Xform to Half Float (Nexus 5X) Original 556us Final 472us BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2159993003 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2159993003
* Expand _01 half<->float limitation to _finite. Simplify.Gravatar mtklein2016-07-15
| | | | | | | | | | | | | | | | | | | | | | | It's become clear we need to sometimes deal with values <0 or >1. I'm not yet convinced we care about NaN or +-inf. We had some fairly clever tricks and optimizations here for NEON and SSE. I've thrown them out in favor of a single implementation. If we find the specializations mattered, we can certainly figure out how to extend them to this new range/domain. This happens to add a vectorized float -> half for ARMv7, which was missing from the _01 version. (The SSE strategy was not portable to platforms that flush denorm floats to zero.) I've tested the full float range for FloatToHalf on my desktop and a 5x. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2145663003 CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-Fast-Trybot Committed: https://skia.googlesource.com/skia/+/3296bee70d074bb8094b3229dbe12fa016657e90 Review-Url: https://codereview.chromium.org/2145663003
* Revert of Expand _01 half<->float limitation to _finite. Simplify. ↵Gravatar mtklein2016-07-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | (patchset #7 id:120001 of https://codereview.chromium.org/2145663003/ ) Reason for revert: Unit tests fail on Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-Fast Original issue's description: > Expand _01 half<->float limitation to _finite. Simplify. > > It's become clear we need to sometimes deal with values <0 or >1. > I'm not yet convinced we care about NaN or +-inf. > > We had some fairly clever tricks and optimizations here for NEON > and SSE. I've thrown them out in favor of a single implementation. > If we find the specializations mattered, we can certainly figure out > how to extend them to this new range/domain. > > This happens to add a vectorized float -> half for ARMv7, which was > missing from the _01 version. (The SSE strategy was not portable to > platforms that flush denorm floats to zero.) > > I've tested the full float range for FloatToHalf on my desktop and a 5x. > > BUG=skia: > GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2145663003 > CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-Fast-Trybot > > Committed: https://skia.googlesource.com/skia/+/3296bee70d074bb8094b3229dbe12fa016657e90 TBR=msarett@google.com,mtklein@chromium.org # Skipping CQ checks because original CL landed less than 1 days ago. NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia: Review-Url: https://codereview.chromium.org/2151023003
* Expand _01 half<->float limitation to _finite. Simplify.Gravatar mtklein2016-07-14
| | | | | | | | | | | | | | | | | | | | | | It's become clear we need to sometimes deal with values <0 or >1. I'm not yet convinced we care about NaN or +-inf. We had some fairly clever tricks and optimizations here for NEON and SSE. I've thrown them out in favor of a single implementation. If we find the specializations mattered, we can certainly figure out how to extend them to this new range/domain. This happens to add a vectorized float -> half for ARMv7, which was missing from the _01 version. (The SSE strategy was not portable to platforms that flush denorm floats to zero.) I've tested the full float range for FloatToHalf on my desktop and a 5x. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2145663003 CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2145663003
* Clean up hyper-local SkCpu feature test experiment.Gravatar mtklein2016-07-11
| | | | | | | | | | | | | | | | | | | | | | | | | | This removes the code paths where we make SkCpu::Supports() calls from within a tight loop. It keeps code paths using SkCpu::Supports() to choose entire routines from src/opts/. We can't rely on these hyper-local checks to be hoisted up reliably enough. It worked pretty well with the first couple platforms we tried (e.g. Clang on Linux/Mac) but we can't gaurantee it works everywhere. Further, I'm not able to actually do anything fancy with those tests outside of x86... I've not found a way to get, say, NEON+F16 conversion code embedded into ordinary NEON code outside writing then entire function in external assembly. This whole idea becomes less important now that we've got a way to chain separate function calls together efficiently. We can now, e.g., use an AVX+F16C method to load some pixels, then chain that into an ordinary AVX method to color filter them. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2138073002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2138073002
* skcpu: sse4.1 floor, f16c f16<->f32Gravatar mtklein2016-04-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - floor with roundps is about 4.5x faster when available - f16 srcover_n is similar to but a little faster than the version in https://codereview.chromium.org/1884683002. This new one fuses the dst load/stores into the f16<->f32 conversions: +0x180 movups (%r15), %xmm1 +0x184 vcvtph2ps (%rbx), %xmm2 +0x189 movaps %xmm1, %xmm3 +0x18c shufps $255, %xmm3, %xmm3 +0x190 movaps %xmm0, %xmm4 +0x193 subps %xmm3, %xmm4 +0x196 mulps %xmm2, %xmm4 +0x199 addps %xmm1, %xmm4 +0x19c vcvtps2ph $0, %xmm4, (%rbx) +0x1a2 addq $16, %r15 +0x1a6 addq $8, %rbx +0x1aa decl %r14d +0x1ad jne +0x180 If we decide to land this it'd be a good idea to convert most or all users of SkFloatToHalf_01 and SkHalfToFloat_01 over to the pointer-based versions. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1891513002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Committed: https://skia.googlesource.com/skia/+/cbe3c1af987d622ea67ef560d855b41bb14a0ce9 Committed: https://skia.googlesource.com/skia/+/3faf74b8364491ca806f523fbb1d8a97be592663 Review URL: https://codereview.chromium.org/1891513002
* Revert of skcpu: sse4.1 floor, f16c f16<->f32 (patchset #11 id:200001 of ↵Gravatar mtklein2016-04-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | https://codereview.chromium.org/1891513002/ ) Reason for revert: this depends on a CL I want to revert Original issue's description: > skcpu: sse4.1 floor, f16c f16<->f32 > > - floor with roundps is about 4.5x faster when available > - f16 srcover_n is similar to but a little faster than the version in https://codereview.chromium.org/1884683002. This new one fuses the dst load/stores into the f16<->f32 conversions: > > +0x180 movups (%r15), %xmm1 > +0x184 vcvtph2ps (%rbx), %xmm2 > +0x189 movaps %xmm1, %xmm3 > +0x18c shufps $255, %xmm3, %xmm3 > +0x190 movaps %xmm0, %xmm4 > +0x193 subps %xmm3, %xmm4 > +0x196 mulps %xmm2, %xmm4 > +0x199 addps %xmm1, %xmm4 > +0x19c vcvtps2ph $0, %xmm4, (%rbx) > +0x1a2 addq $16, %r15 > +0x1a6 addq $8, %rbx > +0x1aa decl %r14d > +0x1ad jne +0x180 > > If we decide to land this it'd be a good idea to convert most or all users of SkFloatToHalf_01 and SkHalfToFloat_01 over to the pointer-based versions. > > BUG=skia: > GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1891513002 > CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot > > Committed: https://skia.googlesource.com/skia/+/cbe3c1af987d622ea67ef560d855b41bb14a0ce9 > > Committed: https://skia.googlesource.com/skia/+/3faf74b8364491ca806f523fbb1d8a97be592663 TBR=fmalita@chromium.org,herb@google.com,reed@google.com,mtklein@chromium.org # Skipping CQ checks because original CL landed less than 1 days ago. NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia: Review URL: https://codereview.chromium.org/1897433002
* skcpu: sse4.1 floor, f16c f16<->f32Gravatar mtklein2016-04-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | - floor with roundps is about 4.5x faster when available - f16 srcover_n is similar to but a little faster than the version in https://codereview.chromium.org/1884683002. This new one fuses the dst load/stores into the f16<->f32 conversions: +0x180 movups (%r15), %xmm1 +0x184 vcvtph2ps (%rbx), %xmm2 +0x189 movaps %xmm1, %xmm3 +0x18c shufps $255, %xmm3, %xmm3 +0x190 movaps %xmm0, %xmm4 +0x193 subps %xmm3, %xmm4 +0x196 mulps %xmm2, %xmm4 +0x199 addps %xmm1, %xmm4 +0x19c vcvtps2ph $0, %xmm4, (%rbx) +0x1a2 addq $16, %r15 +0x1a6 addq $8, %rbx +0x1aa decl %r14d +0x1ad jne +0x180 If we decide to land this it'd be a good idea to convert most or all users of SkFloatToHalf_01 and SkHalfToFloat_01 over to the pointer-based versions. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1891513002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Committed: https://skia.googlesource.com/skia/+/cbe3c1af987d622ea67ef560d855b41bb14a0ce9 Review URL: https://codereview.chromium.org/1891513002
* Revert of skcpu: sse4.1 floor, f16c f16<->f32 (patchset #10 id:180001 of ↵Gravatar mtklein2016-04-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | https://codereview.chromium.org/1891513002/ ) Reason for revert: Need to change around my #if guards so that clang-cl is treated like GCC and Clang, rather than MSVC. Original issue's description: > skcpu: sse4.1 floor, f16c f16<->f32 > > - floor with roundps is about 4.5x faster when available > - f16 srcover_n is similar to but a little faster than the version in https://codereview.chromium.org/1884683002. This new one fuses the dst load/stores into the f16<->f32 conversions: > > +0x180 movups (%r15), %xmm1 > +0x184 vcvtph2ps (%rbx), %xmm2 > +0x189 movaps %xmm1, %xmm3 > +0x18c shufps $255, %xmm3, %xmm3 > +0x190 movaps %xmm0, %xmm4 > +0x193 subps %xmm3, %xmm4 > +0x196 mulps %xmm2, %xmm4 > +0x199 addps %xmm1, %xmm4 > +0x19c vcvtps2ph $0, %xmm4, (%rbx) > +0x1a2 addq $16, %r15 > +0x1a6 addq $8, %rbx > +0x1aa decl %r14d > +0x1ad jne +0x180 > > If we decide to land this it'd be a good idea to convert most or all users of SkFloatToHalf_01 and SkHalfToFloat_01 over to the pointer-based versions. > > BUG=skia: > GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1891513002 > CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot > > Committed: https://skia.googlesource.com/skia/+/cbe3c1af987d622ea67ef560d855b41bb14a0ce9 TBR=fmalita@chromium.org,herb@google.com,reed@google.com,mtklein@chromium.org # Skipping CQ checks because original CL landed less than 1 days ago. NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia: Review URL: https://codereview.chromium.org/1891993002
* skcpu: sse4.1 floor, f16c f16<->f32Gravatar mtklein2016-04-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | - floor with roundps is about 4.5x faster when available - f16 srcover_n is similar to but a little faster than the version in https://codereview.chromium.org/1884683002. This new one fuses the dst load/stores into the f16<->f32 conversions: +0x180 movups (%r15), %xmm1 +0x184 vcvtph2ps (%rbx), %xmm2 +0x189 movaps %xmm1, %xmm3 +0x18c shufps $255, %xmm3, %xmm3 +0x190 movaps %xmm0, %xmm4 +0x193 subps %xmm3, %xmm4 +0x196 mulps %xmm2, %xmm4 +0x199 addps %xmm1, %xmm4 +0x19c vcvtps2ph $0, %xmm4, (%rbx) +0x1a2 addq $16, %r15 +0x1a6 addq $8, %rbx +0x1aa decl %r14d +0x1ad jne +0x180 If we decide to land this it'd be a good idea to convert most or all users of SkFloatToHalf_01 and SkHalfToFloat_01 over to the pointer-based versions. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1891513002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review URL: https://codereview.chromium.org/1891513002
* NEON f32 <-> f16 and f32 <-> u16Gravatar mtklein2016-02-19
| | | | | | | | | | | | | | | | | | Adds f32 <-> f16 ARMv7 and ARMv8 NEON code. Also adds NEON f32 <-> u16 code to make the comparison fair. The NDK GCC does not support the ARMv8 NEON intrinsics needed to go fastest, so we use a tiny amount of inline assembly. The ARMv7 half -> float is different enough from the SSE version that it does not make sense to use SkNx. Still TODO: ARMv7 float -> half. Naively translating the SSE version results in 0x0000 where we'd expect a denormal output. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1700473003 CQ_EXTRA_TRYBOTS=client.skia.android:Test-Android-GCC-Nexus5-CPU-NEON-Arm7-Release-Trybot,Test-Android-GCC-Nexus9-CPU-Denver-Arm64-Release-Trybot;client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review URL: https://codereview.chromium.org/1700473003
* new version of SkHalfToFloat_01Gravatar mtklein2016-02-11
| | | | | | | | | This is a little faster than the previous version, and much better explained. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1688233002 Review URL: https://codereview.chromium.org/1688233002
* SkHalfToFloat_01 / SkFloatToHalf_01Gravatar mtklein2016-02-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | These are basically inlined, 4-at-a-time versions of our existing functions, but cut down to avoid any work that's only necessary outside [0,1]. Both f16 and f32 denorms should work fine modulo the usual ARMv7 NEON denorm==zero caveat. In exchange for a little speed, f32->f16 does not round properly. Instead it truncates, so it's never off by more than 1 bit. Support for finite values >1 or <0 is straightforward to add back. >1 might already work as-is. Getting close to _u16 performance: micros bench 261.13 xferu64_bw_1_opaque_u16 1833.51 xferu64_bw_1_alpha_u16 2762.32 ? xferu64_aa_1_opaque_u16 3334.29 xferu64_aa_1_alpha_u16 249.78 xferu64_bw_1_opaque_f16 3383.18 xferu64_bw_1_alpha_f16 4214.72 xferu64_aa_1_opaque_f16 4701.19 xferu64_aa_1_alpha_f16 BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1685133005 Committed: https://skia.googlesource.com/skia/+/9ea11a4235b3e3521cc8bf914a27c2d0dc062db9 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review URL: https://codereview.chromium.org/1685133005
* Revert of SkHalfToFloat_01 / SkFloatToHalf_01 (patchset #11 id:200001 of ↵Gravatar mtklein2016-02-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | https://codereview.chromium.org/1685133005/ ) Reason for revert: Gotta fix Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD Original issue's description: > SkHalfToFloat_01 / SkFloatToHalf_01 > > These are basically inlined, 4-at-a-time versions of our existing functions, > but cut down to avoid any work that's only necessary outside [0,1]. > > Both f16 and f32 denorms should work fine modulo the usual ARMv7 NEON denorm==zero caveat. > > In exchange for a little speed, f32->f16 does not round properly. > Instead it truncates, so it's never off by more than 1 bit. > > Support for finite values >1 or <0 is straightforward to add back. > >1 might already work as-is. > > Getting close to _u16 performance: > micros bench > 261.13 xferu64_bw_1_opaque_u16 > 1833.51 xferu64_bw_1_alpha_u16 > 2762.32 ? xferu64_aa_1_opaque_u16 > 3334.29 xferu64_aa_1_alpha_u16 > 249.78 xferu64_bw_1_opaque_f16 > 3383.18 xferu64_bw_1_alpha_f16 > 4214.72 xferu64_aa_1_opaque_f16 > 4701.19 xferu64_aa_1_alpha_f16 > > > BUG=skia: > GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1685133005 > > Committed: https://skia.googlesource.com/skia/+/9ea11a4235b3e3521cc8bf914a27c2d0dc062db9 TBR=jvanverth@google.com,reed@google.com,mtklein@chromium.org # Skipping CQ checks because original CL landed less than 1 days ago. NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia: Review URL: https://codereview.chromium.org/1693443003
* SkHalfToFloat_01 / SkFloatToHalf_01Gravatar mtklein2016-02-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | These are basically inlined, 4-at-a-time versions of our existing functions, but cut down to avoid any work that's only necessary outside [0,1]. Both f16 and f32 denorms should work fine modulo the usual ARMv7 NEON denorm==zero caveat. In exchange for a little speed, f32->f16 does not round properly. Instead it truncates, so it's never off by more than 1 bit. Support for finite values >1 or <0 is straightforward to add back. >1 might already work as-is. Getting close to _u16 performance: micros bench 261.13 xferu64_bw_1_opaque_u16 1833.51 xferu64_bw_1_alpha_u16 2762.32 ? xferu64_aa_1_opaque_u16 3334.29 xferu64_aa_1_alpha_u16 249.78 xferu64_bw_1_opaque_f16 3383.18 xferu64_bw_1_alpha_f16 4214.72 xferu64_aa_1_opaque_f16 4701.19 xferu64_aa_1_alpha_f16 BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1685133005 Review URL: https://codereview.chromium.org/1685133005
* Add support for half float alpha textures.Gravatar jvanverth2014-12-05
| | | | | | | | | This allows us to create distance field textures with better precision, which may help text quality. BUG=skia:3103 Review URL: https://codereview.chromium.org/762923003
* Add float-to-half (binary16) conversion functions.Gravatar jvanverth2014-11-26
Based on code by Fabian Giesen at https://fgiesen.wordpress.com/2012/03/28/half-to-float-done-quic/. These will be needed for creating binary16 textures from floating point data. BUG=skia:3103 Review URL: https://codereview.chromium.org/760753003