aboutsummaryrefslogtreecommitdiffhomepage
path: root/src/opts
Commit message (Collapse)AuthorAge
* Enable highQualityFilter_SSE2Gravatar qiankun.miao2014-09-04
| | | | | | | | | | | | | | | | With SSE2, bitmap_BGRA_8888_A_scale_rotate_bicubic gains about 40% performance improvement on desktop i7-3770. BUG=skia: Committed: https://skia.googlesource.com/skia/+/b381fa10d8079c58928058bb8a6db32b39f05e51 CQ_EXTRA_TRYBOTS=tryserver.skia:Test-Mac10.6-MacMini4.1-GeForce320M-x86_64-Release-Trybot R=mtklein@google.com, humper@google.com Author: qiankun.miao@intel.com Review URL: https://codereview.chromium.org/525283002
* Revert of Enable highQualityFilter_SSE2 (patchset #1 id:1 of ↵Gravatar mtklein2014-09-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | https://codereview.chromium.org/525283002/) Reason for revert: Color order looks wrong on Macs: Before: http://chromium-skia-gm.commondatastorage.googleapis.com/gm/bitmap-64bitMD5/filterbitmap_image_mandrill_16.png/12823183142873462143.png After: http://chromium-skia-gm.commondatastorage.googleapis.com/gm/bitmap-64bitMD5/filterbitmap_image_mandrill_16.png/13683040204546320578.png Original issue's description: > Enable highQualityFilter_SSE2 > > With SSE2, bitmap_BGRA_8888_A_scale_rotate_bicubic gains about 40% > performance improvement on desktop i7-3770. > > BUG=skia: > > Committed: https://skia.googlesource.com/skia/+/b381fa10d8079c58928058bb8a6db32b39f05e51 R=humper@google.com, qiankun.miao@intel.com TBR=humper@google.com, qiankun.miao@intel.com NOTREECHECKS=true NOTRY=true BUG=skia: Author: mtklein@google.com Review URL: https://codereview.chromium.org/539523002
* Enable highQualityFilter_SSE2Gravatar qiankun.miao2014-09-03
| | | | | | | | | | | | With SSE2, bitmap_BGRA_8888_A_scale_rotate_bicubic gains about 40% performance improvement on desktop i7-3770. BUG=skia: R=mtklein@google.com, humper@google.com Author: qiankun.miao@intel.com Review URL: https://codereview.chromium.org/525283002
* Disable SSE4 S32A_Opaque blit.Gravatar mtklein2014-09-03
| | | | | | | | | | | This code sometimes generates a build warning that bothers Chrome. BUG=399842,skia:2906 R=reed@google.com, mtklein@google.com Author: mtklein@chromium.org Review URL: https://codereview.chromium.org/538463003
* Remove dead code in SkBitmapFilter_opts_SSE2.h/cppGravatar qiankun.miao2014-09-03
| | | | | | | | | BUG=skia: R=mtklein@google.com, humper@google.com Author: qiankun.miao@intel.com Review URL: https://codereview.chromium.org/530673002
* Disable NEON procs for box blur as it produces invalid resultsGravatar djsollen2014-09-02
| | | | | | | | | | R=reed@google.com, mtklein@google.com, senorblanco@google.com TBR=reed@google.com BUG=skia:2845 Author: djsollen@google.com Review URL: https://codereview.chromium.org/527973002
* Revert of Disable NEON procs for box blur as it produces invalid results ↵Gravatar djsollen2014-09-02
| | | | | | | | | | | | | | | | | | | | | | | | (patchset #1 id:1 of https://codereview.chromium.org/520963002/) Reason for revert: failing more GMs than expected. Original issue's description: > Disable NEON procs for box blur as it produces invalid results > > BUG=skia:2845 > > Committed: https://skia.googlesource.com/skia/+/4a1764688c990fb926aaeab538497dad52768d99 R=senorblanco@google.com, mtklein@google.com TBR=mtklein@google.com, senorblanco@google.com NOTREECHECKS=true NOTRY=true BUG=skia:2845 Author: djsollen@google.com Review URL: https://codereview.chromium.org/531023002
* Disable NEON procs for box blur as it produces invalid resultsGravatar djsollen2014-09-02
| | | | | | | | | BUG=skia:2845 R=senorblanco@google.com, mtklein@google.com Author: djsollen@google.com Review URL: https://codereview.chromium.org/520963002
* Disable Neon optimization of bad S32A/D565 blend.Gravatar mtklein2014-08-22
| | | | | | | | | | | | BUG=skia:2797 Committed: https://skia.googlesource.com/skia/+/84cab93186fbe3e87d931fea73cb31b70ff5017b R=mtklein@google.com Author: mtklein@chromium.org Review URL: https://codereview.chromium.org/497823002
* Disable Neon optimization of bad S32A/D565 blend.Gravatar mtklein2014-08-22
| | | | | | | | | BUG=skia:2797 R=mtklein@google.com Author: mtklein@chromium.org Review URL: https://codereview.chromium.org/497823002
* disable neon proc that is triggering assertsGravatar reed2014-08-22
| | | | | | | | | BUG=skia:2845 R=mtklein@google.com Author: reed@google.com Review URL: https://codereview.chromium.org/498733002
* Simplify flattening to just write enough to call the ↵Gravatar reed2014-08-21
| | | | | | | | | | | | | | | | | | factory/public-constructor for the class. We want to *not* rely on private constructors, and not rely on calling through the inheritance hierarchy for either flattening or unflattening(CreateProc). Refactoring pattern: 1. guard the existing constructor(readbuffer) with the legacy build-flag 2. If you are a instancable subclass, implement CreateProc(readbuffer) to create a new instances from the buffer params (or return NULL). If you're a shader subclass 1. You must read/write the local matrix if your class accepts that in its factory/constructor, else ignore it. R=robertphillips@google.com, mtklein@google.com, senorblanco@google.com, senorblanco@chromium.org, sugoi@chromium.org Author: reed@google.com Review URL: https://codereview.chromium.org/395603002
* Turn off NEON SkBoxBlurGetPlatformProcs for ARM64 (for now)Gravatar halcanary2014-08-20
| | | | | | | | | BUG=skia:2845 R=djsollen@google.com, senorblanco@google.com, senorblanco@chromium.org Author: halcanary@google.com Review URL: https://codereview.chromium.org/491973002
* Let skia build with clang's integrated assembler.Gravatar thakis2014-08-11
| | | | | | | | | | | | | | | | 1. vuzpq is a gcc instruction. Replace it with the equivalent vuzp (see http://llvm.org/PR20423) 2. .func / .endfunc only have an effect with -gstabs, which we don't use. As it's unused and clang doesn't support it, remove .func / .endfunc (also see http://llvm.org/20424) BUG=chromium:124610 R=mtklein@google.com Author: thakis@chromium.org Review URL: https://codereview.chromium.org/461693004
* Replace a pre-UAL instruction with its modern form.Gravatar thakis2014-08-11
| | | | | | | | | | | See the notes in the Chromium bug, and http://llvm.org/20427 BUG=chromium:124610,skia:900 R=djsollen@google.com, mtklein@google.com Author: thakis@chromium.org Review URL: https://codereview.chromium.org/455903002
* Fix S32A_D565_Opaque for RGBA on arm64Gravatar kevin.petit2014-08-09
| | | | | | | | | | | Signed-off-by: Kévin PETIT <kevin.petit@arm.com> BUG=skia:2813 R=halcanary@google.com, djsollen@google.com, mtklein@google.com Author: kevin.petit@arm.com Review URL: https://codereview.chromium.org/458453002
* Disable suspect NEON function for 64-bit AndroidGravatar djsollen2014-08-07
| | | | | | | | R=halcanary@google.com, mtklein@google.com, kevin.petit@arm.com Author: djsollen@google.com Review URL: https://codereview.chromium.org/451633006
* Add query for block dimensions of a given formatGravatar krajcevski2014-07-29
| | | | | | | | R=robertphillips@google.com Author: krajcevski@google.com Review URL: https://codereview.chromium.org/422023006
* Enable the SSSE3 compile time check on all platforms (4th attempt)Gravatar djsollen2014-07-24
| | | | | | | | | BUG=skia:2746 R=bungeman@google.com, robertphillips@google.com, mtklein@google.com Author: djsollen@google.com Review URL: https://codereview.chromium.org/414033002
* Revert of Enable the SSSE3 compile time check on all platforms. ↵Gravatar bungeman2014-07-23
| | | | | | | | | | | | | | | | | | | | | | | | (https://codereview.chromium.org/403583002/) Reason for revert: This is blocking the roll. Chromium Windows trybots (like win_chromium_x64_rel) are crashing in the SSSE3 code (for example SkCanvasVideoRenderTest.CroppedFrame). Original issue's description: > Enable the SSSE3 compile time check on all platforms (3rd attempt) > > BUG=skia:2746 > > Committed: https://skia.googlesource.com/skia/+/933834851f9d48fbd85b728cc92e1f0134bfaa4e R=halcanary@google.com, mtklein@google.com, djsollen@google.com TBR=djsollen@google.com, halcanary@google.com, mtklein@google.com NOTREECHECKS=true NOTRY=true BUG=skia:2746 Author: bungeman@google.com Review URL: https://codereview.chromium.org/418523002
* Enable the SSSE3 compile time check on all platforms (3rd attempt)Gravatar djsollen2014-07-22
| | | | | | | | | BUG=skia:2746 R=halcanary@google.com, mtklein@google.com Author: djsollen@google.com Review URL: https://codereview.chromium.org/403583002
* Add support for NEON intrinsics to speed up texture compression. We canGravatar krajcevski2014-07-14
| | | | | | | | | | | | | now convert the time that we would have spent uploading the texture to compressing it giving a net 50% memory savings for these things. Committed: https://skia.googlesource.com/skia/+/bc9205be0a1094e312da098348601398c210dc5a R=robertphillips@google.com, mtklein@google.com, kevin.petit@arm.com Author: krajcevski@google.com Review URL: https://codereview.chromium.org/390453002
* Revert of Enable the SSSE3 compile time check on all platforms. ↵Gravatar halcanary2014-07-14
| | | | | | | | | | | | | | | | | | | | | | | | (https://codereview.chromium.org/391693004/) Reason for revert: windows fail Original issue's description: > Enable the SSSE3 compile time check on all platforms. > > BUG=skia:2746 > > Committed: https://skia.googlesource.com/skia/+/ee349531446ae2a8336b0903e05d0b2150d2131f R=mtklein@google.com, djsollen@google.com TBR=djsollen@google.com, mtklein@google.com NOTREECHECKS=true NOTRY=true BUG=skia:2746 Author: halcanary@google.com Review URL: https://codereview.chromium.org/390063002
* Enable the SSSE3 compile time check on all platforms.Gravatar djsollen2014-07-14
| | | | | | | | | BUG=skia:2746 R=mtklein@google.com Author: djsollen@google.com Review URL: https://codereview.chromium.org/391693004
* MIPS: added optimization for SkRGB16_Opaque_Blitter::blitMaskGravatar djordje.pesut2014-07-14
| | | | | | | | | | gaint is ~30% R=djsollen@google.com Author: djordje.pesut@imgtec.com Review URL: https://codereview.chromium.org/357693002
* Revert of Add support for NEON intrinsics to speed up texture compression. ↵Gravatar krajcevski2014-07-11
| | | | | | | | | | | | | | | | | | | | | | | We can (https://codereview.chromium.org/390453002/) Reason for revert: Breaking chrome. Original issue's description: > Add support for NEON intrinsics to speed up texture compression. We can > now convert the time that we would have spent uploading the texture to > compressing it giving a net 50% memory savings for these things. > > Committed: https://skia.googlesource.com/skia/+/bc9205be0a1094e312da098348601398c210dc5a R=robertphillips@google.com, mtklein@google.com, kevin.petit@arm.com TBR=kevin.petit@arm.com, mtklein@google.com, robertphillips@google.com NOTREECHECKS=true NOTRY=true Author: krajcevski@google.com Review URL: https://codereview.chromium.org/384053003
* Add support for NEON intrinsics to speed up texture compression. We canGravatar krajcevski2014-07-11
| | | | | | | | | | | now convert the time that we would have spent uploading the texture to compressing it giving a net 50% memory savings for these things. R=robertphillips@google.com, mtklein@google.com, kevin.petit@arm.com Author: krajcevski@google.com Review URL: https://codereview.chromium.org/390453002
* MIPS: added optimizations for functions from SkBitmapProcStateGravatar djordje.pesut2014-07-08
| | | | | | | | | | | | | | gain is ~30% following functions are optimized: SI8_D16_nofilter_DX SI8_opaque_D32_nofilter_DX R=djsollen@google.com, teodora.petrovic@gmail.com Author: djordje.pesut@imgtec.com Review URL: https://codereview.chromium.org/336533003
* Add return to SkBoxBlurGetPlatformProcs_SSE4.Gravatar scroggo2014-07-07
| | | | | | | | | | This fixes Android build. R=reed@google.com, mtklein@google.com Author: scroggo@google.com Review URL: https://codereview.chromium.org/378613002
* Add SSE4 version of BlurImage optimizations.Gravatar henrik.smiding2014-07-07
| | | | | | | | | | | | | | | | Adds an SSE4.1 version of the existing BlurImage optimizations. Performance of blur_image_filter_* benchmarks show a 10-50% improvement on Linux/Ubuntu Core i7. Signed-off-by: Henrik Smiding <henrik.smiding@intel.com> Committed: https://skia.googlesource.com/skia/+/2830632ce93c97ed7647b13348365ea92e4ea665 R=mtklein@google.com, reed@chromium.org Author: henrik.smiding@intel.com Review URL: https://codereview.chromium.org/366593004
* Revert of Add SSE4 version of BlurImage optimizations. ↵Gravatar reed2014-07-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | (https://codereview.chromium.org/366593004/) Reason for revert: breaks linker on chrome [04:36:09.966000] [503/5965] LIB obj\chrome\installer_util.lib [04:36:10.466000] FAILED: C:\Users\chrome-bot\buildbot\third_party\depot_tools\python276_bin\python.exe gyp-win-tool link-with-manifests environment.x86 True skia.dll "C:\Users\chrome-bot\buildbot\third_party\depot_tools\python276_bin\python.exe gyp-win-tool link-wrapper environment.x86 False link.exe /nologo /IMPLIB:skia.dll.lib /DLL /OUT:skia.dll @skia.dll.rsp" 2 mt.exe rc.exe "obj\skia\skia.skia.dll.intermediate.manifest" obj\skia\skia.skia.dll.generated.manifest [04:36:10.466000] skia.opts_check_x86.obj : error LNK2019: unresolved external symbol "bool __cdecl SkBoxBlurGetPlatformProcs_SSE4(void (__cdecl**)(unsigned int const *,int,unsigned int *,int,int,int,int,int),void (__cdecl**)(unsigned int const *,int,unsigned int *,int,int,int,int,int),void (__cdecl**)(unsigned int const *,int,unsigned int *,int,int,int,int,int),void (__cdecl**)(unsigned int const *,int,unsigned int *,int,int,int,int,int))" (?SkBoxBlurGetPlatformProcs_SSE4@@YA_NPAP6AXPBIHPAIHHHHH@Z222@Z) referenced in function "bool __cdecl SkBoxBlurGetPlatformProcs(void (__cdecl**)(unsigned int const *,int,unsigned int *,int,int,int,int,int),void (__cdecl**)(unsigned int const *,int,unsigned int *,int,int,int,int,int),void (__cdecl**)(unsigned int const *,int,unsigned int *,int,int,int,int,int),void (__cdecl**)(unsigned int const *,int,unsigned int *,int,int,int,int,int))" (?SkBoxBlurGetPlatformProcs@@YA_NPAP6AXPBIHPAIHHHHH@Z222@Z) [04:36:10.466000] [04:36:10.466000] skia.dll : fatal error LNK1120: 1 unresolved externals Original issue's description: > Add SSE4 version of BlurImage optimizations. > > Adds an SSE4.1 version of the existing BlurImage optimizations. > Performance of blur_image_filter_* benchmarks show a 10-50% > improvement on Linux/Ubuntu Core i7. > > Signed-off-by: Henrik Smiding <henrik.smiding@intel.com> > > Committed: https://skia.googlesource.com/skia/+/2830632ce93c97ed7647b13348365ea92e4ea665 R=mtklein@google.com, henrik.smiding@intel.com TBR=henrik.smiding@intel.com, mtklein@google.com NOTREECHECKS=true NOTRY=true Author: reed@chromium.org Review URL: https://codereview.chromium.org/375503003
* Add SSE4 version of BlurImage optimizations.Gravatar henrik.smiding2014-07-04
| | | | | | | | | | | | | | Adds an SSE4.1 version of the existing BlurImage optimizations. Performance of blur_image_filter_* benchmarks show a 10-50% improvement on Linux/Ubuntu Core i7. Signed-off-by: Henrik Smiding <henrik.smiding@intel.com> R=mtklein@google.com Author: henrik.smiding@intel.com Review URL: https://codereview.chromium.org/366593004
* Exclude Clang on Windows too. Comment this up a bit.Gravatar mtklein2014-07-02
| | | | | | | | | BUG=391016 R=tomhudson@chromium.org, mtklein@google.com, rnk@chromium.org, thakis@chromium.org Author: mtklein@chromium.org Review URL: https://codereview.chromium.org/363983004
* Disable assembly code in MemorySanitizer builds.Gravatar Mike Klein2014-07-02
| | | | | | | | | | MemorySanitizer is an unitialized memory use detector which is used in Chromium, and does not presently support assembly code. BUG=chromium:344505, chromium:373739 R=mtklein@google.com Review URL: https://codereview.chromium.org/367973005
* Hide symbols in S32A_Opaque_BlitRow32_SSE4Gravatar henrik.smiding2014-07-01
| | | | | | | | | | | | | | Marks the symbols in the S32A_Opaque_BlitRow32_SSE4 files as hidden, so Chromium can build. Also enables the optimizations. Signed-off-by: Henrik Smiding <henrik.smiding@intel.com> R=mtklein@google.com, joakim.landberg@intel.com Author: henrik.smiding@intel.com Review URL: https://codereview.chromium.org/368573002
* Revert of Re-enable SSE4. (https://codereview.chromium.org/357593003/)Gravatar mtklein2014-06-30
| | | | | | | | | | | | | | | | | | | | | | | | Reason for revert: Mac Chrome builders still failing. Original issue's description: > Re-enable SSE4. > > I will roll this into Chrome with https://codereview.chromium.org/332393003. > > BUG=skia: > > Committed: https://skia.googlesource.com/skia/+/a75b0fadbdec4214afec6dd727fd224d34ed164f R=reed@google.com, mtklein@chromium.org TBR=mtklein@chromium.org, reed@google.com NOTREECHECKS=true NOTRY=true BUG=skia: Author: mtklein@google.com Review URL: https://codereview.chromium.org/337093004
* Re-enable SSE4.Gravatar mtklein2014-06-30
| | | | | | | | | | | I will roll this into Chrome with https://codereview.chromium.org/332393003. BUG=skia: R=reed@google.com, mtklein@google.com Author: mtklein@chromium.org Review URL: https://codereview.chromium.org/357593003
* ARM Skia NEON patches - 41 - arm64: SkXfermode::xfer32Gravatar kevin.petit2014-06-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently the NEON code for Xfermodes performs well on arm64 targets except for dstout and dstin which are significantly slower than the C code. This patch fixes this and gives further improvements on other modes. Here are some perf results: +------------+------------+------------+ | mode | Cortex-A53 | Cortex-A57 | +------------+------------+------------+ | multiply | +24.58% | +23.71% | +------------+------------+------------+ | exclusion | +22.72% | +22.05% | +------------+------------+------------+ | difference | +34.67% | +36.82% | +------------+------------+------------+ | hardlight | +17.07% | +14.74% | +------------+------------+------------+ | lighten | +38.21% | +32.87% | +------------+------------+------------+ | darken | +37.59% | +32.99% | +------------+------------+------------+ | overlay | +17.36% | +16.88% | +------------+------------+------------+ | screen | +52.56% | +54.43% | +------------+------------+------------+ | modulate | +62.85% | +61.32% | +------------+------------+------------+ | plus | +91.52% | +117.41% | +------------+------------+------------+ | xor | +42.86% | +43.38% | +------------+------------+------------+ | dstatop | +48.46% | +48.99% | +------------+------------+------------+ | srcatop | +50.50% | +48.51% | +------------+------------+------------+ | dstout | +67.83% | +78.09% | +------------+------------+------------+ | srcout | +69.02% | +78.26% | +------------+------------+------------+ | dstin | +70.92% | +79.24% | +------------+------------+------------+ | srcin | +68.90% | +78.23% | +------------+------------+------------+ | dstover | +73.80% | +68.10% | +------------+------------+------------+ Signed-off-by: Kévin PETIT <kevin.petit@arm.com> BUG=skia R=mtklein@google.com, djsollen@google.com Author: kevin.petit@arm.com Review URL: https://codereview.chromium.org/350343002
* Disable SSE4 code.Gravatar mtklein2014-06-27
| | | | | | | | | | | | | Chrome canary failing to link chrome: http://108.170.220.120:10115/builders/Canary-Chrome-Ubuntu13.10-Ninja-x86_64-ToT/builds/1009/steps/BuildChrome/logs/stdio BUG=skia: NOTRY=true R=mtklein@google.com, rmistry@google.com Author: mtklein@chromium.org Review URL: https://codereview.chromium.org/361493002
* Refactor bitmap scaler to make it easier to migrate rest of chrome to use itGravatar humper2014-06-27
| | | | | | | | | | | | | | Previously, the set of platform-specific function pointers to do fast convolution (e.g., neon, SSE) were passed in a structure to the scaler. I refactored this so that the scaler fills in these function pointers after it's called, so the caller doesn't have to worry about it. R=mtklein@google.com TBR=mtklein NOTRY=True Author: humper@google.com Review URL: https://codereview.chromium.org/354193002
* Add SSE4 optimization of S32A_Opaque_BlitrowGravatar henrik.smiding2014-06-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | Adds optimization of Skia S32A_Opaque_Blitrow blitter using SSE4.2 SIMD instruction set. Special case for when alpha is zero or opaque. Performance increase of 10%-400% compared to the existing SSE2 optimization (measured on Silvermont architecture). Noticeable in ~25 different skia bench subtests, especially in bitmap_8888_*, repeatTile_*, and morph_*. bitmap_8888_A - 100% faster bitmap_8888_A_source_transparent - 250% faster bitmap_8888_A_source_opaque - 25% faster bitmap_8888_A_scale_bicubic - 75% faster Signed-off-by: Henrik Smiding <henrik.smiding@intel.com> Committed: https://skia.googlesource.com/skia/+/e2527b147679b0c43019fae7d59cc3777d2d097e Committed: https://skia.googlesource.com/skia/+/b5c281e1e06af3be804309877de1dac6145686b9 R=reed@google.com, mtklein@google.com, tomhudson@google.com, djsollen@google.com, joakim.landberg@intel.com Author: henrik.smiding@intel.com Review URL: https://codereview.chromium.org/289473009
* ARM Skia NEON patches - 40 - arm64: S32A_D565_OpaqueGravatar kevin.petit2014-06-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Here are some perf results: +-------+------------+------------+ | count | Cortex-A53 | Cortex-A57 | +-------+------------+------------+ | 1 | -2.54% | -5.39% | +-------+------------+------------+ | 2 | -0.66% | -2.08% | +-------+------------+------------+ | 4 | -11.13% | 0.00% | +-------+------------+------------+ | 8 | -5.79% | -1.30% | +-------+------------+------------+ | 16 | 71.60% | 93.27% | +-------+------------+------------+ | 64 | 30.99% | 57.35% | +-------+------------+------------+ | 256 | 25.41% | 52.59% | +-------+------------+------------+ | 1024 | 25.56% | 53.76% | +-------+------------+------------+ Signed-off-by: Kévin PETIT <kevin.petit@arm.com> BUG=skia: R=mtklein@google.com, djsollen@google.com Author: kevin.petit@arm.com Review URL: https://codereview.chromium.org/346843003
* Fix SkBlitRow_opts_arm so that it works on ARM v4t.Gravatar george2014-06-20
| | | | | | | | | | | Original Mozilla bug: https://bugzilla.mozilla.org/show_bug.cgi?id=901208 R=reed@google.com, mtklein@google.com, reed1 BUG=skia: Author: george@mozilla.com Review URL: https://codereview.chromium.org/337853003
* Revert of Add SSE4 optimization of S32A_Opaque_Blitrow ↵Gravatar mtklein2014-06-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | (https://codereview.chromium.org/289473009/) NOTREECHECKS=true NOTRY=true Reason for revert: Valgrind bot's seeing this code use uninitialized memory, and it's somehow blocking our roll into Chrome too: > ld: warning: could not create compact unwind for S32A_Opaque_BlitRow32_SSE4_asm: > stack subq instruction is too different from dwarf stack size > [10339/10982 | 3247.792] PACKAGE FRAMEWORK "Chromium Framework.framework", > POSTBUILDS > FAILED: ./gyp-mac-tool package-framework "Chromium Framework.framework" A && > (export > BUILT_PRODUCTS_DIR=/Volumes/data/b/build/slave/mac_gpu/build/src/out/Release; > export CONFIGURATION=Release; export CONTENTS_FOLDER_PATH="Chromium > Framework.framework/Versions/A"; export > DYLIB_INSTALL_NAME_BASE=@executable_path/../Versions/37.0.2056.0; export > EXECUTABLE_NAME="Chromium Framework"; export EXECUTABLE_PATH="Chromium > Framework.framework/Versions/A/Chromium Framework"; export > FULL_PRODUCT_NAME="Chromium Framework.framework"; export > INFOPLIST_PATH="Chromium Framework.framework/Versions/A/Resources/Info.plist"; > export LD_DYLIB_INSTALL_NAME="@executable_path/../Versions/37.0.2056.0/Chromium > Framework.framework/Chromium Framework"; export MACH_O_TYPE=mh_dylib; export > PRODUCT_NAME="Chromium Framework"; export > PRODUCT_TYPE=com.apple.product-type.framework; export > SDKROOT=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.6.sdk; > export > SRCROOT=/Volumes/data/b/build/slave/mac_gpu/build/src/out/Release/../../chrome; > export SOURCE_ROOT="${SRCROOT}"; export > TARGET_BUILD_DIR=/Volumes/data/b/build/slave/mac_gpu/build/src/out/Release; > export TEMP_DIR="${TMPDIR}"; export UNLOCALIZED_RESOURCES_FOLDER_PATH="Chromium > Framework.framework/Versions/A/Resources"; export WRAPPER_NAME="Chromium > Framework.framework"; (cd ../../chrome && ../build/mac/tweak_info_plist.py > "--breakpad=1" "--breakpad_uploads=0" "--keystone=0" "--scm=1" > "--branding=Chromium" && ln -fns Versions/Current/Libraries > "${BUILT_PRODUCTS_DIR}/${WRAPPER_NAME}/Libraries" && > tools/build/mac/verify_order _ChromeMain > "${BUILT_PRODUCTS_DIR}/${EXECUTABLE_PATH}"); G=$?; ((exit $G) || rm -rf > 'Chromium Framework.framework') && exit $G) && touch "Chromium > Framework.framework" > tools/build/mac/verify_order: unordered symbols in > /Volumes/data/b/build/slave/mac_gpu/build/src/out/Release/Chromium > Framework.framework/Versions/A/Chromium Framework: > S32A_Opaque_BlitRow32_SSE4_asm > _S32A_Opaque_BlitRow32_SSE4_asm > ninja: build stopped: subcommand failed. Original issue's description: > Add SSE4 optimization of S32A_Opaque_Blitrow > > Adds optimization of Skia S32A_Opaque_Blitrow blitter using SSE4.2 SIMD > instruction set. Special case for when alpha is zero or opaque. > > Performance increase of 10%-400% compared to the existing SSE2 > optimization (measured on Silvermont architecture). > Noticeable in ~25 different skia bench subtests, especially in > bitmap_8888_*, repeatTile_*, and morph_*. > > bitmap_8888_A - 100% faster > bitmap_8888_A_source_transparent - 250% faster > bitmap_8888_A_source_opaque - 25% faster > bitmap_8888_A_scale_bicubic - 75% faster > > Signed-off-by: Henrik Smiding <henrik.smiding@intel.com> > > Committed: https://skia.googlesource.com/skia/+/e2527b147679b0c43019fae7d59cc3777d2d097e > > Committed: https://skia.googlesource.com/skia/+/b5c281e1e06af3be804309877de1dac6145686b9 R=reed@google.com, tomhudson@google.com, djsollen@google.com, joakim.landberg@intel.com, henrik.smiding@intel.com, mtklein@chromium.org Author: mtklein@google.com Review URL: https://codereview.chromium.org/336413007
* Add SSE4 optimization of S32A_Opaque_BlitrowGravatar henrik.smiding2014-06-17
| | | | | | | | | | | | | | | | | | | | | | | | | Adds optimization of Skia S32A_Opaque_Blitrow blitter using SSE4.2 SIMD instruction set. Special case for when alpha is zero or opaque. Performance increase of 10%-400% compared to the existing SSE2 optimization (measured on Silvermont architecture). Noticeable in ~25 different skia bench subtests, especially in bitmap_8888_*, repeatTile_*, and morph_*. bitmap_8888_A - 100% faster bitmap_8888_A_source_transparent - 250% faster bitmap_8888_A_source_opaque - 25% faster bitmap_8888_A_scale_bicubic - 75% faster Signed-off-by: Henrik Smiding <henrik.smiding@intel.com> Committed: https://skia.googlesource.com/skia/+/e2527b147679b0c43019fae7d59cc3777d2d097e R=reed@google.com, mtklein@google.com, tomhudson@google.com, djsollen@google.com, joakim.landberg@intel.com Author: henrik.smiding@intel.com Review URL: https://codereview.chromium.org/289473009
* Revert of Temporarily limit x86 SIMD to SSE2 only, to see effect on all ↵Gravatar mtklein2014-06-16
| | | | | | | | | | | | | | | | | | | | | | | | benches and bots. (https://codereview.chromium.org/331193004/) Reason for revert: Experiment is over: disabling SSSE3 is a 25-50% perf regression for bitmap scaling on every machine we've got. Original issue's description: > Temporarily limit x86 SIMD to SSE2 only, to see effect on all benches and bots. > > BUG=372232 > > Committed: https://skia.googlesource.com/skia/+/f1e5a04832e4d350f9ebf5d556c6d3897345f883 R=reed@google.com, mtklein@chromium.org TBR=mtklein@chromium.org, reed@google.com NOTREECHECKS=true NOTRY=true BUG=372232 Author: mtklein@google.com Review URL: https://codereview.chromium.org/332213005
* Temporarily limit x86 SIMD to SSE2 only, to see effect on all benches and bots.Gravatar mtklein2014-06-16
| | | | | | | | | BUG=372232 R=reed@google.com, mtklein@google.com Author: mtklein@chromium.org Review URL: https://codereview.chromium.org/331193004
* MIPS: added optimization for functions from SkBlitRow.Gravatar djordje.pesut2014-06-11
| | | | | | | | | | | | | | | | | | | gain is ~40% following function are optimized: S32_D565_Blend S32A_D565_Opaque_Dither S32_D565_Opaque_Dither S32_D565_Blend_Dither S32A_D565_Opaque S32A_D565_Blend S32_Blend_BlitRow32 R=djsollen@google.com, teodora.petrovic@gmail.com Author: djordje.pesut@imgtec.com Review URL: https://codereview.chromium.org/326913004
* ARM Skia NEON patches - 39 - arm64 565 blittersGravatar kevin.petit2014-06-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This enables all 565 blitters except S32A_D565_Opaque. Here are some performance results: S32_D565_Opaque: ================ +-------+------------+------------+ | count | Cortex-A53 | Cortex-A57 | +-------+------------+------------+ | 1 | -18.37% | -13.04% | +-------+------------+------------+ | 2 | -9.90% | -13.78% | +-------+------------+------------+ | 4 | -8.28% | -6.77% | +-------+------------+------------+ | 8 | 157.63% | 78.15% | +-------+------------+------------+ | 16 | 72.67% | 44.81% | +-------+------------+------------+ | 64 | 76.78% | 40.89% | +-------+------------+------------+ | 256 | 73.85% | 36.05% | +-------+------------+------------+ | 1024 | 75.73% | 36.70% | +-------+------------+------------+ S32_D565_Blend: =============== +-------+------------+------------+ | count | Cortex-A53 | Cortex-A57 | +-------+------------+------------+ | 1 | -9.99% | -13.79% | +-------+------------+------------+ | 2 | -9.17% | -6.74% | +-------+------------+------------+ | 4 | -6.73% | -4.42% | +-------+------------+------------+ | 8 | 163.31% | 112.82% | +-------+------------+------------+ | 16 | 55.21% | 44.68% | +-------+------------+------------+ | 64 | 54.09% | 41.99% | +-------+------------+------------+ | 256 | 52.63% | 40.64% | +-------+------------+------------+ | 1024 | 52.46% | 40.45% | +-------+------------+------------+ S32A_D565_Blend: ================ +-------+------------+------------+ | count | Cortex-A53 | Cortex-A57 | +-------+------------+------------+ | 1 | -5.88% | -6.06% | +-------+------------+------------+ | 2 | -4.74% | -0.01% | +-------+------------+------------+ | 4 | -5.42% | -3.03% | +-------+------------+------------+ | 8 | 78.78% | 77.96% | +-------+------------+------------+ | 16 | 98.19% | 79.61% | +-------+------------+------------+ | 64 | 111.56% | 72.60% | +-------+------------+------------+ | 256 | 113.80% | 69.96% | +-------+------------+------------+ | 1024 | 114.42% | 70.85% | +-------+------------+------------+ S32_D565_Opaque_Dither: ======================= +-------+------------+------------+ | count | Cortex-A53 | Cortex-A57 | +-------+------------+------------+ | 1 | -4.18% | -0.93% | +-------+------------+------------+ | 2 | -2.43% | -2.04% | +-------+------------+------------+ | 4 | -1.09% | -1.23% | +-------+------------+------------+ | 8 | 184.89% | 136.53% | +-------+------------+------------+ | 16 | 128.64% | 89.11% | +-------+------------+------------+ | 64 | 132.68% | 100.98% | +-------+------------+------------+ | 256 | 157.02% | 100.86% | +-------+------------+------------+ | 1024 | 163.85% | 103.62% | +-------+------------+------------+ S32_D565_Blend_Dither: ====================== +-------+------------+------------+ | count | Cortex-A53 | Cortex-A57 | +-------+------------+------------+ | 1 | -4.87% | 0.01% | +-------+------------+------------+ | 2 | -2.71% | 2.97% | +-------+------------+------------+ | 4 | -2.20% | 0.28% | +-------+------------+------------+ | 8 | 149.76% | 146.80% | +-------+------------+------------+ | 16 | 85.69% | 95.77% | +-------+------------+------------+ | 64 | 88.81% | 101.39% | +-------+------------+------------+ | 256 | 97.32% | 107.22% | +-------+------------+------------+ | 1024 | 98.08% | 115.71% | +-------+------------+------------+ S32A_D565_Opaque_Dither: ======================== +-------+------------+------------+ | count | Cortex-A53 | Cortex-A57 | +-------+------------+------------+ | 1 | -1.86% | 0.02% | +-------+------------+------------+ | 2 | -0.58% | -1.52% | +-------+------------+------------+ | 4 | -0.75% | 1.16% | +-------+------------+------------+ | 8 | 240.74% | 155.16% | +-------+------------+------------+ | 16 | 181.97% | 132.15% | +-------+------------+------------+ | 64 | 203.11% | 136.48% | +-------+------------+------------+ | 256 | 223.45% | 133.05% | +-------+------------+------------+ | 1024 | 225.96% | 134.05% | +-------+------------+------------+ Signed-off-by: Kévin PETIT <kevin.petit@arm.com> BUG=skia: R=djsollen@google.com, mtklein@google.com Author: kevin.petit@arm.com Review URL: https://codereview.chromium.org/317193003
* Revert of Add SSE4 optimization of S32A_Opaque_Blitrow ↵Gravatar jvanverth2014-06-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | (https://codereview.chromium.org/289473009/) Reason for revert: Buildbot failures on Mac 10.6 and Mac 10.7. R=reed@google.com, mtklein@google.com, tomhudson@google.com, djsollen@google.com, joakim.landberg@intel.com, henrik.smiding@intel.com TBR=reed@google.com NOTRY=True Original issue's description: > Add SSE4 optimization of S32A_Opaque_Blitrow > > Adds optimization of Skia S32A_Opaque_Blitrow blitter using SSE4.2 SIMD > instruction set. Special case for when alpha is zero or opaque. > > Performance increase of 10%-400% compared to the existing SSE2 > optimization (measured on Silvermont architecture). > Noticeable in ~25 different skia bench subtests, especially in > bitmap_8888_*, repeatTile_*, and morph_*. > > bitmap_8888_A - 100% faster > bitmap_8888_A_source_transparent - 250% faster > bitmap_8888_A_source_opaque - 25% faster > bitmap_8888_A_scale_bicubic - 75% faster > > Signed-off-by: Henrik Smiding <henrik.smiding@intel.com> > > Committed: https://skia.googlesource.com/skia/+/e2527b147679b0c43019fae7d59cc3777d2d097e Author: jvanverth@google.com Review URL: https://codereview.chromium.org/311053009