path: root/src/opts/SkBlitRow_opts_SSE2.cpp
Commit message | Author | Age
* Cleanup of SSE optimization files. | commit-bot@chromium.org | 2014-04-30
  General cleanup of optimization files for x86/SSEx. Renamed opts_check_SSE2.cpp to opts_check_x86.cpp, since it is not specific to SSE2. Commented out the ColorRect32 optimization, which is disabled anyway, to make that more visible. Also fixed indentation, include guards, spelling, copyright headers, braces, whitespace, and include ordering.
  Author: henrik.smiding@intel.com
  Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>
  R=reed@google.com, mtklein@google.com, tomhudson@google.com, djsollen@google.com, joakim.landberg@intel.com
  Review URL: https://codereview.chromium.org/264603002
  git-svn-id: http://skia.googlecode.com/svn/trunk@14464 2bbb7eff-a529-9590-31e7-b0007b416f81
* Xfermode: SSE2 implementation of multiply_modeproc | commit-bot@chromium.org | 2014-04-09
  This patch implements the basic infrastructure for SSE optimization of Xfermode and, on top of it, an SSE2 implementation of multiply_modeproc. SSE2 implementations of other modes will follow. With this patch, Xfermode_Multiply performance improves by about 45%. Data on a desktop i7-3770 (cmsecs):
    Xfermode_Multiply 8888: before 33.30, after 17.18
    Xfermode_Multiply 565:  before 45.65, after 24.87
  BUG=
  Committed: http://code.google.com/p/skia/source/detail?r=14006
  Committed: http://code.google.com/p/skia/source/detail?r=14050
  R=mtklein@google.com, robertphillips@google.com
  Author: qiankun.miao@intel.com
  Review URL: https://codereview.chromium.org/202903004
  git-svn-id: http://skia.googlecode.com/svn/trunk@14107 2bbb7eff-a529-9590-31e7-b0007b416f81
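For reference, the per-channel arithmetic that an SSE2 multiply_modeproc has to vectorize can be written in scalar C++. This is a sketch assuming premultiplied 8-bit channels, the usual separable multiply formula (sc * (255 - da) + dc * (255 - sa) + sc * dc) / 255, and src-over for alpha; the pixel layout (A, R, G, B from high to low byte) and the helper names div255_round, multiply_alpha, multiply_channel and multiply_pixel are hypothetical, not Skia's identifiers.

```cpp
#include <algorithm>
#include <cstdint>

// Rounded n / 255, exact for 0 <= n <= 255 * 255.
static inline unsigned div255_round(unsigned n) {
    unsigned prod = n + 128;
    return (prod + (prod >> 8)) >> 8;
}

// Result alpha (src-over): sa + da - sa * da / 255.
static inline unsigned multiply_alpha(unsigned sa, unsigned da) {
    return sa + da - div255_round(sa * da);
}

// Result color channel for multiply, clamped to 255.
static inline unsigned multiply_channel(unsigned sc, unsigned dc, unsigned sa, unsigned da) {
    unsigned sum = sc * (255 - da) + dc * (255 - sa) + sc * dc;
    return std::min(div255_round(sum), 255u);
}

// One premultiplied pixel; assumption: alpha in bits 24-31, then R, G, B.
static inline uint32_t multiply_pixel(uint32_t src, uint32_t dst) {
    unsigned sa = src >> 24, da = dst >> 24;
    uint32_t out = multiply_alpha(sa, da) << 24;
    for (int shift = 0; shift < 24; shift += 8) {
        unsigned sc = (src >> shift) & 0xFF, dc = (dst >> shift) & 0xFF;
        out |= multiply_channel(sc, dc, sa, da) << shift;
    }
    return out;
}
```

An SSE2 version would typically unpack several pixels into 16-bit lanes and evaluate the same expression for all channels at once; a scalar reference like this is useful for checking it.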
* Revert of Xfermode: SSE2 implementation of multiply_modeproc (https://codereview.chromium.org/202903004/) | commit-bot@chromium.org | 2014-04-03
  Reason for revert:
  It looks like serialization is broken. The serialize and pipe-cross-process tests are failing and turning bots red (at least Ubuntu12 and Win7).
  Original issue's description:
  > Xfermode: SSE2 implementation of multiply_modeproc
  >
  > This patch implements basics for Xfermode SSE optimization. Based on
  > these basics, SSE2 implementation of multiply_modeproc is provided. SSE2
  > implementation for other modes will come in future. With this patch
  > performance of Xfermode_Multiply will improve about 45%. Here are the
  > data on desktop i7-3770.
  > before:
  > Xfermode_Multiply 8888: cmsecs = 33.30 565: cmsecs = 45.65
  > after:
  > Xfermode_Multiply 8888: cmsecs = 17.18 565: cmsecs = 24.87
  >
  > BUG=
  >
  > Committed: http://code.google.com/p/skia/source/detail?r=14006
  >
  > Committed: http://code.google.com/p/skia/source/detail?r=14050
  R=mtklein@google.com, qiankun.miao@intel.com
  TBR=mtklein@google.com, qiankun.miao@intel.com
  NOTREECHECKS=true
  NOTRY=true
  BUG=
  Author: robertphillips@google.com
  Review URL: https://codereview.chromium.org/224253003
  git-svn-id: http://skia.googlecode.com/svn/trunk@14053 2bbb7eff-a529-9590-31e7-b0007b416f81
* Xfermode: SSE2 implementation of multiply_modeproc | commit-bot@chromium.org | 2014-04-03
  This patch implements the basic infrastructure for SSE optimization of Xfermode and, on top of it, an SSE2 implementation of multiply_modeproc. SSE2 implementations of other modes will follow. With this patch, Xfermode_Multiply performance improves by about 45%. Data on a desktop i7-3770 (cmsecs):
    Xfermode_Multiply 8888: before 33.30, after 17.18
    Xfermode_Multiply 565:  before 45.65, after 24.87
  BUG=
  Committed: http://code.google.com/p/skia/source/detail?r=14006
  R=mtklein@google.com, robertphillips@google.com
  Author: qiankun.miao@intel.com
  Review URL: https://codereview.chromium.org/202903004
  git-svn-id: http://skia.googlecode.com/svn/trunk@14050 2bbb7eff-a529-9590-31e7-b0007b416f81
* Revert of Xfermode: SSE2 implementation of multiply_modeproc (https://codereview.chromium.org/202903004/) | commit-bot@chromium.org | 2014-04-01
  Reason for revert:
  Breaking builds
  Original issue's description:
  > Xfermode: SSE2 implementation of multiply_modeproc
  >
  > This patch implements basics for Xfermode SSE optimization. Based on
  > these basics, SSE2 implementation of multiply_modeproc is provided. SSE2
  > implementation for other modes will come in future. With this patch
  > performance of Xfermode_Multiply will improve about 45%. Here are the
  > data on desktop i7-3770.
  > before:
  > Xfermode_Multiply 8888: cmsecs = 33.30 565: cmsecs = 45.65
  > after:
  > Xfermode_Multiply 8888: cmsecs = 17.18 565: cmsecs = 24.87
  >
  > BUG=
  >
  > Committed: http://code.google.com/p/skia/source/detail?r=14006
  R=mtklein@google.com, qiankun.miao@intel.com
  TBR=mtklein@google.com, qiankun.miao@intel.com
  NOTREECHECKS=true
  NOTRY=true
  BUG=
  Author: robertphillips@google.com
  Review URL: https://codereview.chromium.org/219243009
  git-svn-id: http://skia.googlecode.com/svn/trunk@14007 2bbb7eff-a529-9590-31e7-b0007b416f81
* Xfermode: SSE2 implementation of multiply_modeproc | commit-bot@chromium.org | 2014-04-01
  This patch implements the basic infrastructure for SSE optimization of Xfermode and, on top of it, an SSE2 implementation of multiply_modeproc. SSE2 implementations of other modes will follow. With this patch, Xfermode_Multiply performance improves by about 45%. Data on a desktop i7-3770 (cmsecs):
    Xfermode_Multiply 8888: before 33.30, after 17.18
    Xfermode_Multiply 565:  before 45.65, after 24.87
  BUG=
  R=mtklein@google.com
  Author: qiankun.miao@intel.com
  Review URL: https://codereview.chromium.org/202903004
  git-svn-id: http://skia.googlecode.com/svn/trunk@14006 2bbb7eff-a529-9590-31e7-b0007b416f81
* SSE2 implementation of S32A_D565_Opaque_Dither | commit-bot@chromium.org | 2014-03-07
  With the command-line options "--forceDither true --forceBlend 1", most benchmarks that exercise S32A_D565_Opaque_Dither improve by about 20%-70%. Data on an i7-3770 (benchmark: before -> after, improvement):
    verts: 4314.81 -> 3627.64 (15.93%)
    constXTile_MM_filter_trans: 1434.22 -> 432.82 (69.82%)
    constXTile_CC_filter_trans_scale: 1440.17 -> 437.00 (69.66%)
    constXTile_RR_filter_trans: 1436.96 -> 431.93 (69.94%)
    constXTile_MM_trans_scale: 1436.33 -> 435.77 (69.66%)
    constXTile_CC_trans: 1433.12 -> 431.36 (69.90%)
    constXTile_RR_trans_scale: 1436.13 -> 436.06 (69.64%)
    constXTile_MM_filter: 1411.55 -> 408.06 (71.09%)
    constXTile_CC_filter_scale: 1416.68 -> 414.18 (70.76%)
    constXTile_RR_filter: 1429.46 -> 409.81 (71.33%)
    constXTile_MM_scale: 1415.00 -> 412.56 (70.84%)
    constXTile_CC: 1410.32 -> 408.36 (71.04%)
    constXTile_RR_scale: 1413.26 -> 413.16 (70.77%)
    repeatTile_4444_A: 1922.01 -> 879.03 (54.27%)
    repeatTile_4444_A: 1430.68 -> 818.34 (42.80%)
    repeatTile_4444_X: 1817.43 -> 816.63 (55.07%)
    maskshader: 5911.09 -> 5895.46 (0.26%)
    gradient_create_alpha: 4.41 -> 4.41 (-0.15%)
    gradient_conical_clamp_3color: 35298.71 -> 27574.34 (21.88%)
    gradient_conical_clamp_hicolor: 35262.15 -> 27538.99 (21.90%)
    gradient_conical_clamp: 35276.21 -> 27599.80 (21.76%)
    gradient_radial2_mirror: 20846.74 -> 12969.39 (37.79%)
    gradient_radial2_clamp_hicolor: 21848.12 -> 13967.57 (36.07%)
    gradient_radial2_clamp: 21829.95 -> 13978.57 (35.97%)
    bitmap_4444_A_scale_rotate_bicubic: 105.31 -> 87.13 (17.26%)
    bitmap_4444_A_scale_bicubic: 73.69 -> 47.76 (35.20%)
    bitmap_4444_update_scale_rotate_bilerp: 125.65 -> 87.86 (30.08%)
    bitmap_4444_update_volatile_scale_rotate_bilerp: 125.50 -> 87.65 (30.16%)
    bitmap_4444_scale_rotate_bilerp: 124.46 -> 87.91 (29.37%)
    bitmap_4444_A_scale_rotate_bilerp: 105.09 -> 87.27 (16.96%)
    bitmap_4444_update_scale_bilerp: 106.78 -> 63.28 (40.74%)
    bitmap_4444_update_volatile_scale_bilerp: 106.66 -> 63.66 (40.32%)
    bitmap_4444_scale_bilerp: 106.70 -> 63.19 (40.78%)
    bitmap_4444_A_scale_bilerp: 83.05 -> 62.25 (25.04%)
    bitmap_a8: 98.11 -> 52.76 (46.22%)
    bitmap_a8_A: 98.24 -> 52.85 (46.20%)
  BUG=
  R=mtklein@google.com
  Author: qiankun.miao@intel.com
  Review URL: https://codereview.chromium.org/179443003
  git-svn-id: http://skia.googlecode.com/svn/trunk@13699 2bbb7eff-a529-9590-31e7-b0007b416f81
* SSE2 implementation of S32_D565_Opaque_Dither | commit-bot@chromium.org | 2014-03-07
  With the command-line option "--forceDither true", all benchmarks that exercise S32_D565_Opaque_Dither benefit from this SSE2 optimization. Data on an i7-3770 (benchmark: before -> after, improvement):
    constXTile_MM_filter: 900.93 -> 217.75 (75.83%)
    constXTile_CC_filter_scale: 907.59 -> 225.65 (75.14%)
    constXTile_RR_filter: 903.33 -> 219.41 (75.71%)
    constXTile_MM_scale: 902.45 -> 221.46 (75.46%)
    constXTile_CC: 898.55 -> 218.37 (75.70%)
    constXTile_RR_scale: 902.69 -> 222.35 (75.37%)
    repeatTile_4444_X: 938.53 -> 240.49 (74.38%)
    gradient_radial2_mirror: 16999.49 -> 11540.39 (32.11%)
    gradient_radial2_clamp_hicolor: 17943.38 -> 12501.71 (30.33%)
    gradient_radial2_clamp: 17816.36 -> 12492.04 (29.88%)
    bitmaprect_FF_filter_trans: 47.81 -> 10.98 (77.03%)
    bitmaprect_FF_nofilter_trans: 47.79 -> 10.91 (77.18%)
    bitmaprect_FF_filter_identity: 47.74 -> 10.89 (77.18%)
    bitmaprect_FF_nofilter_identity: 47.83 -> 10.89 (77.24%)
    bitmap_4444_update_scale_rotate_bilerp: 100.45 -> 76.84 (23.50%)
    bitmap_4444_update_volatile_scale_rotate_bilerp: 100.80 -> 76.70 (23.91%)
    bitmap_4444_scale_rotate_bilerp: 100.43 -> 77.18 (23.15%)
    bitmap_4444_update_scale_bilerp: 79.00 -> 49.03 (37.93%)
    bitmap_4444_update_volatile_scale_bilerp: 78.90 -> 48.87 (38.06%)
    bitmap_4444_scale_bilerp: 78.92 -> 48.81 (38.16%)
    bitmap_4444_update: 42.19 -> 11.53 (72.68%)
    bitmap_4444_update_volatile: 42.28 -> 11.49 (72.82%)
    bitmap_a8: 60.37 -> 29.75 (50.72%)
    bitmap_4444: 42.19 -> 11.52 (72.69%)
  BUG=
  R=mtklein@google.com
  Author: qiankun.miao@intel.com
  Review URL: https://codereview.chromium.org/181293002
  git-svn-id: http://skia.googlecode.com/svn/trunk@13698 2bbb7eff-a529-9590-31e7-b0007b416f81
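Dithering while reducing 8-bit channels to 565 adds a small position-dependent offset before truncation, so the rounding error is spread across neighboring pixels instead of forming visible bands. The sketch below uses a 4x4 ordered-dither (Bayer) matrix scaled to 0..7. The matrix, the (c + d - (c >> 5)) adjustment that keeps each sum within 0..255, the A, R, G, B byte order and the name pixel32_to_565_dither are illustrative assumptions, not necessarily Skia's exact dither tables.

```cpp
#include <cstdint>

// Hypothetical 4x4 Bayer matrix scaled to 0..7 (Skia has its own dither tables).
static const uint8_t kDither4x4[4][4] = {
    {0, 4, 1, 5},
    {6, 2, 7, 3},
    {1, 5, 0, 4},
    {7, 3, 6, 2},
};

// Convert one opaque 32-bit pixel at (x, y) to RGB565 with ordered dithering.
// Assumption: the pixel stores A, R, G, B from the most to the least significant byte.
static inline uint16_t pixel32_to_565_dither(uint32_t c, int x, int y) {
    unsigned d = kDither4x4[y & 3][x & 3];
    unsigned r = (c >> 16) & 0xFF, g = (c >> 8) & 0xFF, b = c & 0xFF;
    // Subtracting c >> 5 (c >> 6 for the 6-bit green channel) keeps each sum <= 255.
    r = (r + d - (r >> 5)) >> 3;
    g = (g + (d >> 1) - (g >> 6)) >> 2;
    b = (b + d - (b >> 5)) >> 3;
    return static_cast<uint16_t>((r << 11) | (g << 5) | b);
}
```

With this scheme, mid-gray 0xFF808080 becomes 0x7BEF at (0, 0) but 0x8410 at (2, 1), so a flat 8-bit gray alternates between neighboring 565 levels instead of collapsing to one.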
* SSE2 implementation of S32_D565_Opaque | commit-bot@chromium.org | 2014-02-24
  Benchmarks that hit this path benefit from this patch. Data (benchmark: before -> after, improvement):
    gradient_radial2_mirror: 10885.52 -> 10849.48 (0.33%)
    gradient_radial2_clamp_hicolor: 11819.69 -> 11644.83 (1.48%)
    gradient_radial2_clamp: 11816.10 -> 11649.91 (1.41%)
    bitmaprect_FF_filter_trans: 6.27 -> 4.88 (22.17%)
    bitmaprect_FF_nofilter_trans: 6.27 -> 4.88 (22.17%)
    bitmaprect_FF_filter_identity: 6.31 -> 4.86 (22.98%)
    bitmaprect_FF_nofilter_identity: 6.25 -> 4.86 (22.24%)
    bitmap_4444_update: 6.26 -> 5.05 (19.33%)
    bitmap_4444_update_volatile: 6.21 -> 5.06 (18.52%)
    bitmap_4444: 6.22 -> 5.06 (18.65%)
  BUG=
  R=mtklein@google.com
  Author: qiankun.miao@intel.com
  Review URL: https://codereview.chromium.org/172083003
  git-svn-id: http://skia.googlecode.com/svn/trunk@13556 2bbb7eff-a529-9590-31e7-b0007b416f81
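S32_D565_Opaque converts opaque 32-bit pixels to 16-bit RGB565 by keeping the top 5, 6 and 5 bits of red, green and blue. A scalar sketch of that conversion, assuming the pixel stores A, R, G, B from the most to the least significant byte (Skia's actual component order is configurable per platform); pixel32_to_565 and s32_d565_opaque_row are hypothetical names:

```cpp
#include <cstdint>

// Pack 8-bit R, G, B into RGB565 by truncating to 5, 6 and 5 bits.
// Assumption: A, R, G, B from the most to the least significant byte.
static inline uint16_t pixel32_to_565(uint32_t c) {
    unsigned r = (c >> 16) & 0xFF;
    unsigned g = (c >> 8) & 0xFF;
    unsigned b = c & 0xFF;
    return static_cast<uint16_t>(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

// Scalar row loop; an SSE2 version would convert several pixels per iteration.
void s32_d565_opaque_row(uint16_t* dst, const uint32_t* src, int count) {
    for (int i = 0; i < count; ++i) {
        dst[i] = pixel32_to_565(src[i]);
    }
}
```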
* SSE2 implementation of S32A_D565_Opaque | commit-bot@chromium.org | 2014-02-19
  A microbenchmark of S32A_D565_Opaque() shows a 3x speedup from the SSE optimization for various counts on an i7-3770.
  BUG=
  R=mtklein@google.com, reed@google.com
  Author: qiankun.miao@intel.com
  Review URL: https://codereview.chromium.org/138163013
  git-svn-id: http://skia.googlecode.com/svn/trunk@13495 2bbb7eff-a529-9590-31e7-b0007b416f81
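S32A_D565_Opaque blends premultiplied 32-bit source pixels over a 565 destination (src-over). A scalar sketch of the per-pixel work: expand the 565 pixel to 8 bits per channel, compute src + dst * (255 - sa) / 255, and repack. The A, R, G, B byte order, the bit-replicating 565 expansion and the names srcover_32_onto_565 and s32a_d565_opaque_row are assumptions for illustration.

```cpp
#include <cstdint>

// Rounded n / 255, exact for 0 <= n <= 255 * 255.
static inline unsigned div255_round(unsigned n) {
    unsigned prod = n + 128;
    return (prod + (prod >> 8)) >> 8;
}

// Src-over of one premultiplied 32-bit pixel onto one RGB565 pixel.
// Assumption: src stores A, R, G, B from the most to the least significant byte.
static inline uint16_t srcover_32_onto_565(uint32_t src, uint16_t dst) {
    unsigned sa = src >> 24;
    // Expand 565 to 8-bit channels by replicating the high bits into the low bits.
    unsigned dr = (dst >> 11) & 0x1F, dg = (dst >> 5) & 0x3F, db = dst & 0x1F;
    dr = (dr << 3) | (dr >> 2);
    dg = (dg << 2) | (dg >> 4);
    db = (db << 3) | (db >> 2);
    unsigned inv = 255 - sa;
    // Premultiplied input (channel <= sa) keeps every sum <= 255, so no clamp is needed.
    unsigned r = ((src >> 16) & 0xFF) + div255_round(dr * inv);
    unsigned g = ((src >> 8) & 0xFF) + div255_round(dg * inv);
    unsigned b = (src & 0xFF) + div255_round(db * inv);
    return static_cast<uint16_t>(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

void s32a_d565_opaque_row(uint16_t* dst, const uint32_t* src, int count) {
    for (int i = 0; i < count; ++i) {
        if (src[i] != 0) {  // a fully transparent source leaves dst unchanged
            dst[i] = srcover_32_onto_565(src[i], dst[i]);
        }
    }
}
```

Because the expansion replicates the high bits, a fully transparent source round-trips the destination exactly, so skipping zero pixels is only a shortcut and does not change results.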
* Commented SSE blend functions and cleaned up variable naming. | commit-bot@chromium.org | 2013-07-02
  R=senorblanco@chromium.org, alokp@chromium.org, reed@google.com, bungeman@google.com
  Author: ernstm@chromium.org
  Review URL: https://chromiumcodereview.appspot.com/17847010
  git-svn-id: http://skia.googlecode.com/svn/trunk@9870 2bbb7eff-a529-9590-31e7-b0007b416f81
* Result of running tools/sanitize_source_files.py (which was added in https://codereview.appspot.com/6465078/) | rmistry@google.com | 2012-08-23
  This CL is part I of IV (I broke down the 1280 files into 4 CLs).
  Review URL: https://codereview.appspot.com/6485054
  git-svn-id: http://skia.googlecode.com/svn/trunk@5262 2bbb7eff-a529-9590-31e7-b0007b416f81
* Force opaque in SkBlendLCD16Opaque_SSE2 to match SkBlendLCD16. | bungeman@google.com | 2012-08-21
  https://codereview.appspot.com/6460123/
  git-svn-id: http://skia.googlecode.com/svn/trunk@5218 2bbb7eff-a529-9590-31e7-b0007b416f81
* Revert r4799-r4801: red and blue are reversed on Windows and Linux | reed@google.com | 2012-07-27
  git-svn-id: http://skia.googlecode.com/svn/trunk@4803 2bbb7eff-a529-9590-31e7-b0007b416f81
* use SK_RESTRICT instead of __restrict__ | reed@google.com | 2012-07-27
  git-svn-id: http://skia.googlecode.com/svn/trunk@4801 2bbb7eff-a529-9590-31e7-b0007b416f81
* use intptr_t to cast from ptr to int for masking low bits | reed@google.com | 2012-07-27
  git-svn-id: http://skia.googlecode.com/svn/trunk@4800 2bbb7eff-a529-9590-31e7-b0007b416f81
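Masking the low bits of a pointer is the usual way to find how many leading pixels must be processed one at a time before aligned 16-byte SSE2 loads and stores (such as _mm_load_si128 and _mm_store_si128) become legal. A sketch under assumptions: 32-bit pixels, a 16-byte alignment target, and a hypothetical helper name pixels_until_aligned16.

```cpp
#include <cstdint>

// Number of 32-bit pixels to handle individually before ptr reaches a
// 16-byte boundary. Assumption: ptr itself is at least 4-byte aligned.
static inline int pixels_until_aligned16(const uint32_t* ptr) {
    intptr_t misalign = reinterpret_cast<intptr_t>(ptr) & 0xF;  // low 4 bits of the address
    return misalign == 0 ? 0 : static_cast<int>((16 - misalign) / 4);  // 4 bytes per pixel
}
```

A typical row loop then runs a scalar prologue for that many pixels, an aligned SIMD body, and a scalar tail for the remaining count % 4 pixels. Casting through an integer type of pointer width (intptr_t) avoids truncating the address on 64-bit builds.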
* land http://codereview.appspot.com/6327044/ | reed@google.com | 2012-07-27
  SSE optimization for 565 pixel format -- by Lei
  git-svn-id: http://skia.googlecode.com/svn/trunk@4799 2bbb7eff-a529-9590-31e7-b0007b416f81
* Fix SkBlendLCD16_SSE2 for non-ARGB platforms. | bungeman@google.com | 2012-07-09
  http://codereview.appspot.com/6356062/
  git-svn-id: http://skia.googlecode.com/svn/trunk@4481 2bbb7eff-a529-9590-31e7-b0007b416f81
* fix warnings on Mac in src/opts | caryclark@google.com | 2012-06-06
  Fix these classes of warnings:
  - unused functions
  - unused locals
  - sign mismatch
  - missing function prototypes
  - missing newline at end of file
  - 64-to-32-bit truncation
  The changes prefer keeping dead code linked into the debug build behind 'if (false)' over commenting it out, but trivial cases are commented out, or sometimes deleted if they appear to be copy/paste errors.
  Review URL: https://codereview.appspot.com/6303045
  git-svn-id: http://skia.googlecode.com/svn/trunk@4184 2bbb7eff-a529-9590-31e7-b0007b416f81
* Improve SSE2 code for blending BlitRow functions, producing a 10% speedup. | tomhudson@google.com | 2012-02-28
  Courtesy of Evan Nier.
  http://codereview.appspot.com/5518045/
  git-svn-id: http://skia.googlecode.com/svn/trunk@3273 2bbb7eff-a529-9590-31e7-b0007b416f81
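The blending BlitRow functions mix src and dst under a global alpha. A sketch of an SSE2 inner loop for such a blend, assuming the per-channel formula (src * scale + dst * (256 - scale)) >> 8 with scale = alpha + 1. It illustrates the general technique (unpack bytes to 16-bit lanes, multiply, add, shift, repack), not the code from this commit; blend_row and blend_pixel are hypothetical names, and the scalar path is used when __SSE2__ is not defined.

```cpp
#include <cstdint>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

// Scalar reference: every byte becomes (s * scale + d * (256 - scale)) >> 8.
static inline uint32_t blend_pixel(uint32_t src, uint32_t dst, unsigned scale) {
    uint32_t out = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t s = (src >> shift) & 0xFF, d = (dst >> shift) & 0xFF;
        out |= (((s * scale + d * (256 - scale)) >> 8) & 0xFF) << shift;
    }
    return out;
}

// Blend src into dst with a global alpha in 0..255.
void blend_row(uint32_t* dst, const uint32_t* src, int count, unsigned alpha) {
    unsigned src_scale = alpha + 1;        // 1..256
    unsigned dst_scale = 256 - src_scale;  // 0..255
    int i = 0;
#if defined(__SSE2__)
    const __m128i zero = _mm_setzero_si128();
    const __m128i vs = _mm_set1_epi16(static_cast<short>(src_scale));
    const __m128i vd = _mm_set1_epi16(static_cast<short>(dst_scale));
    for (; i + 4 <= count; i += 4) {
        __m128i s = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src + i));
        __m128i d = _mm_loadu_si128(reinterpret_cast<const __m128i*>(dst + i));
        // Widen bytes to 16-bit lanes; the sum is at most 255 * 256, so it fits.
        __m128i lo = _mm_add_epi16(_mm_mullo_epi16(_mm_unpacklo_epi8(s, zero), vs),
                                   _mm_mullo_epi16(_mm_unpacklo_epi8(d, zero), vd));
        __m128i hi = _mm_add_epi16(_mm_mullo_epi16(_mm_unpackhi_epi8(s, zero), vs),
                                   _mm_mullo_epi16(_mm_unpackhi_epi8(d, zero), vd));
        lo = _mm_srli_epi16(lo, 8);
        hi = _mm_srli_epi16(hi, 8);
        _mm_storeu_si128(reinterpret_cast<__m128i*>(dst + i), _mm_packus_epi16(lo, hi));
    }
#endif
    for (; i < count; ++i) {  // scalar tail (or whole row without SSE2)
        dst[i] = blend_pixel(src[i], dst[i], src_scale);
    }
}
```

Using scale = alpha + 1 lets alpha 255 map to a shift-friendly 256, so a fully opaque blend reproduces src exactly.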
* SSE2 version of blit_lcd16, courtesy of Jin Yang. | tomhudson@google.com | 2012-02-14
  Yields a 25-30% speedup on Windows (32-bit) and 4-7% on Linux (64-bit, less register pressure); not invoked on Mac (LCD text is 32-bit there instead of 16-bit).
  Followup: GDI system settings on Windows can suppress LCD text for small fonts, interfering with our benchmarks (http://code.google.com/p/skia/issues/detail?id=483).
  http://codereview.appspot.com/5617058/
  git-svn-id: http://skia.googlecode.com/svn/trunk@3189 2bbb7eff-a529-9590-31e7-b0007b416f81
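In an LCD16 blit, each 16-bit mask pixel carries separate red, green and blue coverage in 565 layout, and each color channel is interpolated between dst and src by its own coverage. A scalar sketch under stated assumptions: coverage is reduced to 5 bits (green drops its low bit) and upscaled to 0..32 with v + (v >> 4), the source color is opaque, pixels are A, R, G, B from high to low byte, and the output alpha is forced to 0xFF. The names blend_lcd16_opaque, upscale_31_to_32 and lerp32 are hypothetical.

```cpp
#include <cstdint>

// Map 5-bit coverage 0..31 to 0..32 so that full coverage selects src exactly.
static inline unsigned upscale_31_to_32(unsigned v) { return v + (v >> 4); }

// Linear interpolation between dst and src with a 0..32 weight.
static inline unsigned lerp32(unsigned src, unsigned dst, unsigned scale) {
    return (src * scale + dst * (32 - scale)) >> 5;
}

// Blend an opaque color into dst using per-channel 565 LCD coverage.
// Assumption: A, R, G, B from the most to the least significant byte.
static inline uint32_t blend_lcd16_opaque(uint32_t color, uint32_t dst, uint16_t mask) {
    unsigned maskR = upscale_31_to_32(mask >> 11);
    unsigned maskG = upscale_31_to_32(((mask >> 5) & 0x3F) >> 1);  // 6 bits -> 5 bits
    unsigned maskB = upscale_31_to_32(mask & 0x1F);
    unsigned r = lerp32((color >> 16) & 0xFF, (dst >> 16) & 0xFF, maskR);
    unsigned g = lerp32((color >> 8) & 0xFF, (dst >> 8) & 0xFF, maskG);
    unsigned b = lerp32(color & 0xFF, dst & 0xFF, maskB);
    return 0xFF000000u | (r << 16) | (g << 8) | b;  // result forced opaque
}
```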
* don't blend with zero in colorproc (forgot to return after memcpy check). | reed@google.com | 2011-10-25
  git-svn-id: http://skia.googlecode.com/svn/trunk@2527 2bbb7eff-a529-9590-31e7-b0007b416f81
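The fix above is about an early return: when the constant color is fully transparent, a color proc only has to copy src to dst and must stop there. A minimal sketch of such a proc, assuming premultiplied pixels with alpha in the top byte and the blend dst = color + src * (255 - alpha) / 256; color32_row and scale_pixel are hypothetical names, not the functions from the commit.

```cpp
#include <cstdint>
#include <cstring>

// Per-channel (c * scale) >> 8 on a packed 32-bit pixel, scale in 0..256.
static inline uint32_t scale_pixel(uint32_t c, unsigned scale) {
    uint32_t out = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        out |= ((((c >> shift) & 0xFF) * scale) >> 8) << shift;
    }
    return out;
}

// Blend a constant premultiplied color over src, writing to dst.
// Assumption: alpha is stored in bits 24-31.
void color32_row(uint32_t* dst, const uint32_t* src, int count, uint32_t color) {
    if (count <= 0) return;
    if (color == 0) {  // transparent color: the result is just src
        if (src != dst) {
            std::memcpy(dst, src, static_cast<size_t>(count) * sizeof(uint32_t));
        }
        return;  // without this return, the blend loop below would still run
    }
    unsigned alpha = color >> 24;
    unsigned scale = 256 - (alpha + 1);  // 0..255: how much of src shows through
    for (int i = 0; i < count; ++i) {
        dst[i] = color + scale_pixel(src[i], scale);  // no carry for premultiplied input
    }
}
```

Without the early return, a zero color would fall through to the loop with scale 255 and turn every 0xFF channel into 0xFE, so "blending with zero" would subtly darken the row.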
* move LCD blits into opts, so they can have assembly versions | reed@google.com | 2011-10-18
  git-svn-id: http://skia.googlecode.com/svn/trunk@2484 2bbb7eff-a529-9590-31e7-b0007b416f81
* Automatic update of all copyright notices to reflect new license terms. | epoger@google.com | 2011-07-28
  I have manually examined all of these diffs and restored a few files that seem to require manual adjustment.
  The following files still need to be modified manually, in a separate CL:
    android_sample/SampleApp/AndroidManifest.xml
    android_sample/SampleApp/res/layout/layout.xml
    android_sample/SampleApp/res/menu/sample.xml
    android_sample/SampleApp/res/values/strings.xml
    android_sample/SampleApp/src/com/skia/sampleapp/SampleApp.java
    android_sample/SampleApp/src/com/skia/sampleapp/SampleView.java
    experimental/CiCarbonSampleMain.c
    experimental/CocoaDebugger/main.m
    experimental/FileReaderApp/main.m
    experimental/SimpleCocoaApp/main.m
    experimental/iOSSampleApp/Shared/SkAlertPrompt.h
    experimental/iOSSampleApp/Shared/SkAlertPrompt.m
    experimental/iOSSampleApp/SkiOSSampleApp-Base.xcconfig
    experimental/iOSSampleApp/SkiOSSampleApp-Debug.xcconfig
    experimental/iOSSampleApp/SkiOSSampleApp-Release.xcconfig
    gpu/src/android/GrGLDefaultInterface_android.cpp
    gyp/common.gypi
    gyp_skia
    include/ports/SkHarfBuzzFont.h
    include/views/SkOSWindow_wxwidgets.h
    make.bat
    make.py
    src/opts/memset.arm.S
    src/opts/memset16_neon.S
    src/opts/memset32_neon.S
    src/opts/opts_check_arm.cpp
    src/ports/SkDebug_brew.cpp
    src/ports/SkMemory_brew.cpp
    src/ports/SkOSFile_brew.cpp
    src/ports/SkXMLParser_empty.cpp
    src/utils/ios/SkImageDecoder_iOS.mm
    src/utils/ios/SkOSFile_iOS.mm
    src/utils/ios/SkStream_NSData.mm
    tests/FillPathTest.cpp
  Review URL: http://codereview.appspot.com/4816058
  git-svn-id: http://skia.googlecode.com/svn/trunk@1982 2bbb7eff-a529-9590-31e7-b0007b416f81
* Correct blitmask procs to recognize that we pass them an SkColor, and if they want an SkPMColor, they need to call SkPreMultiplyColor() | reed@google.com | 2011-03-09
  Add Opaque and Black optimizations for blitmask_d32.
  git-svn-id: http://skia.googlecode.com/svn/trunk@911 2bbb7eff-a529-9590-31e7-b0007b416f81
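SkColor is unpremultiplied ARGB, while SkPMColor stores each color channel already multiplied by alpha; that is why a proc that receives an SkColor has to premultiply it before using it as an SkPMColor. A scalar sketch of that conversion, assuming A, R, G, B from high to low byte in both forms (the real SkPMColor byte order is platform-dependent) and a hypothetical name premultiply_argb; Skia's exact rounding may differ.

```cpp
#include <cstdint>

// Rounded a * b / 255 for 8-bit a and b.
static inline unsigned mul_div255_round(unsigned a, unsigned b) {
    unsigned prod = a * b + 128;
    return (prod + (prod >> 8)) >> 8;
}

// Unpremultiplied ARGB -> premultiplied ARGB (byte order is an assumption).
static inline uint32_t premultiply_argb(uint32_t color) {
    unsigned a = color >> 24;
    unsigned r = mul_div255_round((color >> 16) & 0xFF, a);
    unsigned g = mul_div255_round((color >> 8) & 0xFF, a);
    unsigned b = mul_div255_round(color & 0xFF, a);
    return (a << 24) | (r << 16) | (g << 8) | b;
}
```

Opaque colors pass through unchanged, which may be one reason opaque (and black) colors make good candidates for dedicated fast paths.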
* http://codereview.appspot.com/3980041/ | reed@google.com | 2011-03-09
  Add blitmask procs (with optional platform acceleration). Patch by yaojie.yan.
  git-svn-id: http://skia.googlecode.com/svn/trunk@910 2bbb7eff-a529-9590-31e7-b0007b416f81
* SSE2 optimizations for 32bit Color operation. | senorblanco@chromium.org | 2010-12-13
  [Patch from weiwei.li@intel.com]
  Stephen White previously added SSE2 optimizations that improve Skia performance on SSE2-capable platforms (see the issues below):
    Issue 171055: More SSE2ification
    Issue 157141: More SSE2ification
    Issue 150060: minor tweaks to SSE2 code for -fPIC
    Issue 144072: SSE2 optimizations for 32bit blending blitters
  This CL implements SSE2 optimizations for the 32-bit Color operation. Like the issues above, it uses CPUID to detect SSE2 and switches the platform procs at runtime.
  The 32-bit Color operation is heavily used by Chrome's HTML5 canvas. For example, when the Microsoft IE Test Drive "Pulsating Bubbles" demo (http://ie.microsoft.com/testdrive/Performance/PulsatingBubbles/Default.xhtml) runs in Chrome, the 32-bit Color operation accounts for about 40-50% of the overhead. This CL therefore improves Skia performance and, in turn, Chrome HTML5 canvas performance.
  In addition, this CL passes Skia's bench and test validation with good results. We also applied this CL to the latest Chromium and re-ran Pulsating Bubbles; performance improved by almost 9-10%.
  git-svn-id: http://skia.googlecode.com/svn/trunk@633 2bbb7eff-a529-9590-31e7-b0007b416f81
* SSE2-ified S32_alpha_D32_filter_DX (refactoring to come). Also shaved a few cycles off the SSE2 blends. | senorblanco@chromium.org | 2009-12-10
  Review URL: http://codereview.appspot.com/171055
  git-svn-id: http://skia.googlecode.com/svn/trunk@456 2bbb7eff-a529-9590-31e7-b0007b416f81
* More SSE2 optimizations. This CL implements an SSE2 version of S32_bitmap_D32_filter_DX and uses aligned loads and stores for dst in all blending. | senorblanco@chromium.org | 2009-11-30
  Review URL: http://codereview.appspot.com/157141
  git-svn-id: http://skia.googlecode.com/svn/trunk@448 2bbb7eff-a529-9590-31e7-b0007b416f81
* More SSE2-ification; fix for gcc -msse2. | senorblanco@chromium.org | 2009-11-16
  Review URL: http://codereview.appspot.com/154163
  git-svn-id: http://skia.googlecode.com/svn/trunk@428 2bbb7eff-a529-9590-31e7-b0007b416f81
* remove const modifiers on function return types (unneeded, and caused an error on some versions of gcc). | reed@android.com | 2009-11-13
  git-svn-id: http://skia.googlecode.com/svn/trunk@425 2bbb7eff-a529-9590-31e7-b0007b416f81
* Fix for gcc -fPIC build. | senorblanco@chromium.org | 2009-11-09
  http://codereview.appspot.com/150060
  git-svn-id: http://skia.googlecode.com/svn/trunk@421 2bbb7eff-a529-9590-31e7-b0007b416f81
* SSE2 optimizations for 32bit blending blitters. | senorblanco@chromium.org | 2009-11-04
  This CL implements SSE2 optimizations for 3 of the 32-bit blending blitters. It uses CPUID to detect SSE2 at runtime. To accommodate runtime detection, it changes the platform procs from static arrays to static functions. It also includes an implementation of SkTime for Win32.
  http://codereview.appspot.com/144072
  git-svn-id: http://skia.googlecode.com/svn/trunk@418 2bbb7eff-a529-9590-31e7-b0007b416f81
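Runtime detection with platform procs returned from functions (rather than read from static arrays) can be sketched as below: a factory checks the CPU once and hands back the proc to use. This assumes GCC or Clang on x86, where __builtin_cpu_supports("sse2") is available; the proc type and the names copy_row_portable, copy_row_sse2 and platform_copy_row_proc are hypothetical, and on other compilers or architectures the sketch simply returns the portable version.

```cpp
#include <cstdint>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

typedef void (*CopyRowProc)(uint32_t* dst, const uint32_t* src, int count);

static void copy_row_portable(uint32_t* dst, const uint32_t* src, int count) {
    for (int i = 0; i < count; ++i) dst[i] = src[i];
}

#if defined(__SSE2__)
// Copies four pixels per iteration with unaligned 128-bit loads and stores.
static void copy_row_sse2(uint32_t* dst, const uint32_t* src, int count) {
    int i = 0;
    for (; i + 4 <= count; i += 4) {
        __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src + i));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(dst + i), v);
    }
    for (; i < count; ++i) dst[i] = src[i];  // scalar tail
}
#endif

// Chooses an implementation at runtime instead of from a static table.
CopyRowProc platform_copy_row_proc() {
#if defined(__SSE2__) && (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    if (__builtin_cpu_supports("sse2")) {
        return copy_row_sse2;
    }
#endif
    return copy_row_portable;
}
```

In a real build the SSE2 code would usually live in a separately compiled file (for example, one built with -msse2 on 32-bit x86) so that the rest of the binary still runs on CPUs without SSE2; on x86-64, SSE2 is always present, so the check only matters for 32-bit targets.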