aboutsummaryrefslogtreecommitdiffhomepage
path: root/tests/Float16Test.cpp
Commit message (Collapse)AuthorAge
* f16<->f32 ftz is an optional thing for speed.Gravatar mtklein2016-08-23
| | | | | | | | | | | | | The ARMv8 asm path actually does it right... that should be okay. My Nexus 5x fails `dm -m _finite_ftz` before this and passes after it. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2276533002 TBR=msarett@google.com Review-Url: https://codereview.chromium.org/2276533002
* Flush denorm half floats to zero.Gravatar mtklein2016-08-22
| | | | | | | | | | | | | | | | | | I think we convinced ourselves that denorms, while a good chunk of half floats, cover a rather small fraction of the representable range, which is always close enough to zero to flush. This makes both paths of the conversion to or from float considerably simpler. These functions now work for zero-or-normal half floats (excluding infinite, NaN). I'm not aware of a term for this class so I've called them "ordinary". A handful of GMs and SKPs draw differently in --config f16, but all imperceptibly. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2256023002 Review-Url: https://codereview.chromium.org/2256023002
* Improve naive SkColorXform to half floatsGravatar msarett2016-07-19
| | | | | | | | | | | | | | | | | | | | | | | | This should give us a good baseline to explore using SkRasterPipeline. A particular colorxform to half float drops from 425us to 282us on my desktop. Color Xform to Half Float (HP z620) Original 425us Trans16 (not 32) 355us Vector Trans16 378us Trans16 + Keep Halfs in Vector 335us Vector Trans16 + Keep Halfs in Vector 282us Final 282us Color Xform to Half Float (Nexus 5X) Original 556us Final 472us BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2159993003 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2159993003
* Expand _01 half<->float limitation to _finite. Simplify.Gravatar mtklein2016-07-15
| | | | | | | | | | | | | | | | | | | | | | | It's become clear we need to sometimes deal with values <0 or >1. I'm not yet convinced we care about NaN or +-inf. We had some fairly clever tricks and optimizations here for NEON and SSE. I've thrown them out in favor of a single implementation. If we find the specializations mattered, we can certainly figure out how to extend them to this new range/domain. This happens to add a vectorized float -> half for ARMv7, which was missing from the _01 version. (The SSE strategy was not portable to platforms that flush denorm floats to zero.) I've tested the full float range for FloatToHalf on my desktop and a 5x. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2145663003 CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-Fast-Trybot Committed: https://skia.googlesource.com/skia/+/3296bee70d074bb8094b3229dbe12fa016657e90 Review-Url: https://codereview.chromium.org/2145663003
* Revert of Expand _01 half<->float limitation to _finite. Simplify. ↵Gravatar mtklein2016-07-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | (patchset #7 id:120001 of https://codereview.chromium.org/2145663003/ ) Reason for revert: Unit tests fail on Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-Fast Original issue's description: > Expand _01 half<->float limitation to _finite. Simplify. > > It's become clear we need to sometimes deal with values <0 or >1. > I'm not yet convinced we care about NaN or +-inf. > > We had some fairly clever tricks and optimizations here for NEON > and SSE. I've thrown them out in favor of a single implementation. > If we find the specializations mattered, we can certainly figure out > how to extend them to this new range/domain. > > This happens to add a vectorized float -> half for ARMv7, which was > missing from the _01 version. (The SSE strategy was not portable to > platforms that flush denorm floats to zero.) > > I've tested the full float range for FloatToHalf on my desktop and a 5x. > > BUG=skia: > GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2145663003 > CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-Fast-Trybot > > Committed: https://skia.googlesource.com/skia/+/3296bee70d074bb8094b3229dbe12fa016657e90 TBR=msarett@google.com,mtklein@chromium.org # Skipping CQ checks because original CL landed less than 1 days ago. NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia: Review-Url: https://codereview.chromium.org/2151023003
* Expand _01 half<->float limitation to _finite. Simplify.Gravatar mtklein2016-07-14
| | | | | | | | | | | | | | | | | | | | | | It's become clear we need to sometimes deal with values <0 or >1. I'm not yet convinced we care about NaN or +-inf. We had some fairly clever tricks and optimizations here for NEON and SSE. I've thrown them out in favor of a single implementation. If we find the specializations mattered, we can certainly figure out how to extend them to this new range/domain. This happens to add a vectorized float -> half for ARMv7, which was missing from the _01 version. (The SSE strategy was not portable to platforms that flush denorm floats to zero.) I've tested the full float range for FloatToHalf on my desktop and a 5x. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2145663003 CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2145663003
* Remove bulk float <-> half routines. These are dead code.Gravatar mtklein2016-07-13
| | | | | | | BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2152583002 Review-Url: https://codereview.chromium.org/2152583002
* Change SkColor4f to RGBA channel orderGravatar brianosman2016-06-24
| | | | | | | BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2093763003 Review-Url: https://codereview.chromium.org/2093763003
* Add SkSpecialImage::extractSubset & NewFromPixmapGravatar robertphillips2016-03-17
| | | | | | | | | | | | This is calved off of: https://codereview.chromium.org/1785643003/ (Switch SkBlurImageFilter over to new onFilterImage interface) This now relies on: https://codereview.chromium.org/1813483002/ (ImagePixelLocker now manually allocates SkPixmap) to clean up the uses of SkAutoPixmapStorage in Chromium GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1787883002 Committed: https://skia.googlesource.com/skia/+/250581493a0859987e482810879e85e5ac2dc002 Review URL: https://codereview.chromium.org/1787883002
* Revert of Add SkSpecialImage::extractSubset & NewFromPixmap (patchset #5 ↵Gravatar robertphillips2016-03-16
| | | | | | | | | | | | | | | | | | | | | | | | id:80001 of https://codereview.chromium.org/1787883002/ ) Reason for revert: Need to wean ImagePixelLocker.h off of SkAutoPixmapStorage :( Original issue's description: > Add SkSpecialImage::extractSubset & NewFromPixmap > > This is calved off of: https://codereview.chromium.org/1785643003/ (Switch SkBlurImageFilter over to new onFilterImage interface) > > GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1787883002 > > Committed: https://skia.googlesource.com/skia/+/250581493a0859987e482810879e85e5ac2dc002 TBR=bsalomon@google.com,reed@google.com # Skipping CQ checks because original CL landed less than 1 days ago. NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true Review URL: https://codereview.chromium.org/1808833002
* Add SkSpecialImage::extractSubset & NewFromPixmapGravatar robertphillips2016-03-16
| | | | | | | | This is calved off of: https://codereview.chromium.org/1785643003/ (Switch SkBlurImageFilter over to new onFilterImage interface) GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1787883002 Review URL: https://codereview.chromium.org/1787883002
* make SkPM4f privateGravatar reed2016-02-18
| | | | | | | BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1713653002 Review URL: https://codereview.chromium.org/1713653002
* new version of SkHalfToFloat_01Gravatar mtklein2016-02-11
| | | | | | | | | This is a little faster than the previous version, and much better explained. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1688233002 Review URL: https://codereview.chromium.org/1688233002
* SkHalfToFloat_01 / SkFloatToHalf_01Gravatar mtklein2016-02-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | These are basically inlined, 4-at-a-time versions of our existing functions, but cut down to avoid any work that's only necessary outside [0,1]. Both f16 and f32 denorms should work fine modulo the usual ARMv7 NEON denorm==zero caveat. In exchange for a little speed, f32->f16 does not round properly. Instead it truncates, so it's never off by more than 1 bit. Support for finite values >1 or <0 is straightforward to add back. >1 might already work as-is. Getting close to _u16 performance: micros bench 261.13 xferu64_bw_1_opaque_u16 1833.51 xferu64_bw_1_alpha_u16 2762.32 ? xferu64_aa_1_opaque_u16 3334.29 xferu64_aa_1_alpha_u16 249.78 xferu64_bw_1_opaque_f16 3383.18 xferu64_bw_1_alpha_f16 4214.72 xferu64_aa_1_opaque_f16 4701.19 xferu64_aa_1_alpha_f16 BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1685133005 Committed: https://skia.googlesource.com/skia/+/9ea11a4235b3e3521cc8bf914a27c2d0dc062db9 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review URL: https://codereview.chromium.org/1685133005
* Revert of SkHalfToFloat_01 / SkFloatToHalf_01 (patchset #11 id:200001 of ↵Gravatar mtklein2016-02-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | https://codereview.chromium.org/1685133005/ ) Reason for revert: Gotta fix Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD Original issue's description: > SkHalfToFloat_01 / SkFloatToHalf_01 > > These are basically inlined, 4-at-a-time versions of our existing functions, > but cut down to avoid any work that's only necessary outside [0,1]. > > Both f16 and f32 denorms should work fine modulo the usual ARMv7 NEON denorm==zero caveat. > > In exchange for a little speed, f32->f16 does not round properly. > Instead it truncates, so it's never off by more than 1 bit. > > Support for finite values >1 or <0 is straightforward to add back. > >1 might already work as-is. > > Getting close to _u16 performance: > micros bench > 261.13 xferu64_bw_1_opaque_u16 > 1833.51 xferu64_bw_1_alpha_u16 > 2762.32 ? xferu64_aa_1_opaque_u16 > 3334.29 xferu64_aa_1_alpha_u16 > 249.78 xferu64_bw_1_opaque_f16 > 3383.18 xferu64_bw_1_alpha_f16 > 4214.72 xferu64_aa_1_opaque_f16 > 4701.19 xferu64_aa_1_alpha_f16 > > > BUG=skia: > GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1685133005 > > Committed: https://skia.googlesource.com/skia/+/9ea11a4235b3e3521cc8bf914a27c2d0dc062db9 TBR=jvanverth@google.com,reed@google.com,mtklein@chromium.org # Skipping CQ checks because original CL landed less than 1 days ago. NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia: Review URL: https://codereview.chromium.org/1693443003
* SkHalfToFloat_01 / SkFloatToHalf_01Gravatar mtklein2016-02-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | These are basically inlined, 4-at-a-time versions of our existing functions, but cut down to avoid any work that's only necessary outside [0,1]. Both f16 and f32 denorms should work fine modulo the usual ARMv7 NEON denorm==zero caveat. In exchange for a little speed, f32->f16 does not round properly. Instead it truncates, so it's never off by more than 1 bit. Support for finite values >1 or <0 is straightforward to add back. >1 might already work as-is. Getting close to _u16 performance: micros bench 261.13 xferu64_bw_1_opaque_u16 1833.51 xferu64_bw_1_alpha_u16 2762.32 ? xferu64_aa_1_opaque_u16 3334.29 xferu64_aa_1_alpha_u16 249.78 xferu64_bw_1_opaque_f16 3383.18 xferu64_bw_1_alpha_f16 4214.72 xferu64_aa_1_opaque_f16 4701.19 xferu64_aa_1_alpha_f16 BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1685133005 Review URL: https://codereview.chromium.org/1685133005
* skeleton for float <-> half optimized procsGravatar mtklein2016-02-09
| | | | | | | | | | | | | | | | Nothing fancy yet, just calls the serial code in a loop. I will try to folow this up with at least some of: - SSE2 version of serial code - NEON version of serial code - NEON version using vcvt.f32.f16/vcvt.f16.f32 - F16C (between AVX and AVX2) version using vcvtph2ps/vcvtps2ph The last two are fastest but need runtime detection. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1686543003 Review URL: https://codereview.chromium.org/1686543003
* add kRGBA_F16_SkColorTypeGravatar reed2016-02-05
BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1666343002 Review URL: https://codereview.chromium.org/1666343002