skia - 2D graphics library

	Commit message	Author	Age
*	Fix SkBlitRow_opts_arm so that it works on ARM v4t.	george	2014-06-20
	Original Mozilla bug: https://bugzilla.mozilla.org/show_bug.cgi?id=901208
	R=reed@google.com, mtklein@google.com, reed1
	BUG=skia:
	Author: george@mozilla.com
	Review URL: https://codereview.chromium.org/337853003
*	Revert of Add SSE4 optimization of S32A_Opaque_Blitrow (https://codereview.chromium.org/289473009/)	mtklein	2014-06-17
	NOTREECHECKS=true
	NOTRY=true

	Reason for revert:
	Valgrind bot's seeing this code use uninitialized memory, and it's somehow blocking our roll into Chrome too:

	> ld: warning: could not create compact unwind for S32A_Opaque_BlitRow32_SSE4_asm:
	> stack subq instruction is too different from dwarf stack size
	> [10339/10982 | 3247.792] PACKAGE FRAMEWORK "Chromium Framework.framework",
	> POSTBUILDS
	> FAILED: ./gyp-mac-tool package-framework "Chromium Framework.framework" A &&
	> (export
	> BUILT_PRODUCTS_DIR=/Volumes/data/b/build/slave/mac_gpu/build/src/out/Release;
	> export CONFIGURATION=Release; export CONTENTS_FOLDER_PATH="Chromium
	> Framework.framework/Versions/A"; export
	> DYLIB_INSTALL_NAME_BASE=@executable_path/../Versions/37.0.2056.0; export
	> EXECUTABLE_NAME="Chromium Framework"; export EXECUTABLE_PATH="Chromium
	> Framework.framework/Versions/A/Chromium Framework"; export
	> FULL_PRODUCT_NAME="Chromium Framework.framework"; export
	> INFOPLIST_PATH="Chromium Framework.framework/Versions/A/Resources/Info.plist";
	> export LD_DYLIB_INSTALL_NAME="@executable_path/../Versions/37.0.2056.0/Chromium
	> Framework.framework/Chromium Framework"; export MACH_O_TYPE=mh_dylib; export
	> PRODUCT_NAME="Chromium Framework"; export
	> PRODUCT_TYPE=com.apple.product-type.framework; export
	> SDKROOT=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.6.sdk;
	> export
	> SRCROOT=/Volumes/data/b/build/slave/mac_gpu/build/src/out/Release/../../chrome;
	> export SOURCE_ROOT="${SRCROOT}"; export
	> TARGET_BUILD_DIR=/Volumes/data/b/build/slave/mac_gpu/build/src/out/Release;
	> export TEMP_DIR="${TMPDIR}"; export UNLOCALIZED_RESOURCES_FOLDER_PATH="Chromium
	> Framework.framework/Versions/A/Resources"; export WRAPPER_NAME="Chromium
	> Framework.framework"; (cd ../../chrome && ../build/mac/tweak_info_plist.py
	> "--breakpad=1" "--breakpad_uploads=0" "--keystone=0" "--scm=1"
	> "--branding=Chromium" && ln -fns Versions/Current/Libraries
	> "${BUILT_PRODUCTS_DIR}/${WRAPPER_NAME}/Libraries" &&
	> tools/build/mac/verify_order _ChromeMain
	> "${BUILT_PRODUCTS_DIR}/${EXECUTABLE_PATH}"); G=$?; ((exit $G) || rm -rf
	> 'Chromium Framework.framework') && exit $G) && touch "Chromium
	> Framework.framework"
	> tools/build/mac/verify_order: unordered symbols in
	> /Volumes/data/b/build/slave/mac_gpu/build/src/out/Release/Chromium
	> Framework.framework/Versions/A/Chromium Framework:
	> S32A_Opaque_BlitRow32_SSE4_asm
	> _S32A_Opaque_BlitRow32_SSE4_asm
	> ninja: build stopped: subcommand failed.

	Original issue's description:
	> Add SSE4 optimization of S32A_Opaque_Blitrow
	>
	> Adds optimization of Skia S32A_Opaque_Blitrow blitter using SSE4.2 SIMD
	> instruction set. Special case for when alpha is zero or opaque.
	>
	> Performance increase of 10%-400% compared to the existing SSE2
	> optimization (measured on Silvermont architecture).
	> Noticeable in ~25 different skia bench subtests, especially in
	> bitmap_8888_, repeatTile_, and morph_*.
	>
	> bitmap_8888_A - 100% faster
	> bitmap_8888_A_source_transparent - 250% faster
	> bitmap_8888_A_source_opaque - 25% faster
	> bitmap_8888_A_scale_bicubic - 75% faster
	>
	> Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>
	>
	> Committed: https://skia.googlesource.com/skia/+/e2527b147679b0c43019fae7d59cc3777d2d097e
	>
	> Committed: https://skia.googlesource.com/skia/+/b5c281e1e06af3be804309877de1dac6145686b9

	R=reed@google.com, tomhudson@google.com, djsollen@google.com, joakim.landberg@intel.com, henrik.smiding@intel.com, mtklein@chromium.org
	Author: mtklein@google.com
	Review URL: https://codereview.chromium.org/336413007
*	Add SSE4 optimization of S32A_Opaque_Blitrow	henrik.smiding	2014-06-17
	Adds optimization of Skia S32A_Opaque_Blitrow blitter using SSE4.2 SIMD instruction set. Special case for when alpha is zero or opaque.

	Performance increase of 10%-400% compared to the existing SSE2 optimization (measured on Silvermont architecture). Noticeable in ~25 different skia bench subtests, especially in bitmap_8888_, repeatTile_, and morph_*.

	bitmap_8888_A - 100% faster
	bitmap_8888_A_source_transparent - 250% faster
	bitmap_8888_A_source_opaque - 25% faster
	bitmap_8888_A_scale_bicubic - 75% faster

	Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>
	Committed: https://skia.googlesource.com/skia/+/e2527b147679b0c43019fae7d59cc3777d2d097e
	R=reed@google.com, mtklein@google.com, tomhudson@google.com, djsollen@google.com, joakim.landberg@intel.com
	Author: henrik.smiding@intel.com
	Review URL: https://codereview.chromium.org/289473009
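The "special case for when alpha is zero or opaque" can be illustrated with a scalar model. This is a minimal sketch, not the SSE4 assembly from the patch: the function names are hypothetical, and it assumes premultiplied 32-bit pixels with alpha in the top byte. A SIMD version would presumably apply the same test to several pixels at once, which this sketch does not attempt.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: scale each 8-bit channel of a packed pixel by scale/256.
static inline uint32_t scale_channels(uint32_t c, unsigned scale) {
    uint32_t out = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t ch = (c >> shift) & 0xFF;
        out |= ((ch * scale) >> 8) << shift;
    }
    return out;
}

// Hypothetical scalar model of a source-over row blit for premultiplied pixels
// (assumed layout: alpha in bits 24..31).
void blit_row_srcover(uint32_t* dst, const uint32_t* src, int count) {
    assert(count >= 0);
    for (int i = 0; i < count; ++i) {
        uint32_t s = src[i];
        unsigned a = s >> 24;
        if (a == 0) {
            continue;        // fully transparent source: dst is left untouched
        }
        if (a == 255) {
            dst[i] = s;      // fully opaque source: plain copy, no blend math
            continue;
        }
        dst[i] = s + scale_channels(dst[i], 256 - a);  // general blend
    }
}
```

Both shortcuts skip the multiply entirely, which fits the larger gains the message reports for the transparent-source bench.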
*	Revert of Temporarily limit x86 SIMD to SSE2 only, to see effect on all benches and bots. (https://codereview.chromium.org/331193004/)	mtklein	2014-06-16
	Reason for revert:
	Experiment is over: disabling SSSE3 is a 25-50% perf regression for bitmap scaling on every machine we've got.

	Original issue's description:
	> Temporarily limit x86 SIMD to SSE2 only, to see effect on all benches and bots.
	>
	> BUG=372232
	>
	> Committed: https://skia.googlesource.com/skia/+/f1e5a04832e4d350f9ebf5d556c6d3897345f883

	R=reed@google.com, mtklein@chromium.org
	TBR=mtklein@chromium.org, reed@google.com
	NOTREECHECKS=true
	NOTRY=true
	BUG=372232
	Author: mtklein@google.com
	Review URL: https://codereview.chromium.org/332213005
*	Temporarily limit x86 SIMD to SSE2 only, to see effect on all benches and bots.	mtklein	2014-06-16
	BUG=372232
	R=reed@google.com, mtklein@google.com
	Author: mtklein@chromium.org
	Review URL: https://codereview.chromium.org/331193004
*	MIPS: added optimization for functions from SkBlitRow.	djordje.pesut	2014-06-11
	Gain is ~40%. The following functions are optimized:
	S32_D565_Blend
	S32A_D565_Opaque_Dither
	S32_D565_Opaque_Dither
	S32_D565_Blend_Dither
	S32A_D565_Opaque
	S32A_D565_Blend
	S32_Blend_BlitRow32

	R=djsollen@google.com, teodora.petrovic@gmail.com
	Author: djordje.pesut@imgtec.com
	Review URL: https://codereview.chromium.org/326913004
*	ARM Skia NEON patches - 39 - arm64 565 blitters	kevin.petit	2014-06-06
	This enables all 565 blitters except S32A_D565_Opaque.

	Here are some performance results:

	S32_D565_Opaque:
	================
	+-------+------------+------------+
	| count | Cortex-A53 | Cortex-A57 |
	+-------+------------+------------+
	|     1 |    -18.37% |    -13.04% |
	|     2 |     -9.90% |    -13.78% |
	|     4 |     -8.28% |     -6.77% |
	|     8 |    157.63% |     78.15% |
	|    16 |     72.67% |     44.81% |
	|    64 |     76.78% |     40.89% |
	|   256 |     73.85% |     36.05% |
	|  1024 |     75.73% |     36.70% |
	+-------+------------+------------+

	S32_D565_Blend:
	===============
	+-------+------------+------------+
	| count | Cortex-A53 | Cortex-A57 |
	+-------+------------+------------+
	|     1 |     -9.99% |    -13.79% |
	|     2 |     -9.17% |     -6.74% |
	|     4 |     -6.73% |     -4.42% |
	|     8 |    163.31% |    112.82% |
	|    16 |     55.21% |     44.68% |
	|    64 |     54.09% |     41.99% |
	|   256 |     52.63% |     40.64% |
	|  1024 |     52.46% |     40.45% |
	+-------+------------+------------+

	S32A_D565_Blend:
	================
	+-------+------------+------------+
	| count | Cortex-A53 | Cortex-A57 |
	+-------+------------+------------+
	|     1 |     -5.88% |     -6.06% |
	|     2 |     -4.74% |     -0.01% |
	|     4 |     -5.42% |     -3.03% |
	|     8 |     78.78% |     77.96% |
	|    16 |     98.19% |     79.61% |
	|    64 |    111.56% |     72.60% |
	|   256 |    113.80% |     69.96% |
	|  1024 |    114.42% |     70.85% |
	+-------+------------+------------+

	S32_D565_Opaque_Dither:
	=======================
	+-------+------------+------------+
	| count | Cortex-A53 | Cortex-A57 |
	+-------+------------+------------+
	|     1 |     -4.18% |     -0.93% |
	|     2 |     -2.43% |     -2.04% |
	|     4 |     -1.09% |     -1.23% |
	|     8 |    184.89% |    136.53% |
	|    16 |    128.64% |     89.11% |
	|    64 |    132.68% |    100.98% |
	|   256 |    157.02% |    100.86% |
	|  1024 |    163.85% |    103.62% |
	+-------+------------+------------+

	S32_D565_Blend_Dither:
	======================
	+-------+------------+------------+
	| count | Cortex-A53 | Cortex-A57 |
	+-------+------------+------------+
	|     1 |     -4.87% |      0.01% |
	|     2 |     -2.71% |      2.97% |
	|     4 |     -2.20% |      0.28% |
	|     8 |    149.76% |    146.80% |
	|    16 |     85.69% |     95.77% |
	|    64 |     88.81% |    101.39% |
	|   256 |     97.32% |    107.22% |
	|  1024 |     98.08% |    115.71% |
	+-------+------------+------------+

	S32A_D565_Opaque_Dither:
	========================
	+-------+------------+------------+
	| count | Cortex-A53 | Cortex-A57 |
	+-------+------------+------------+
	|     1 |     -1.86% |      0.02% |
	|     2 |     -0.58% |     -1.52% |
	|     4 |     -0.75% |      1.16% |
	|     8 |    240.74% |    155.16% |
	|    16 |    181.97% |    132.15% |
	|    64 |    203.11% |    136.48% |
	|   256 |    223.45% |    133.05% |
	|  1024 |    225.96% |    134.05% |
	+-------+------------+------------+

	Signed-off-by: Kévin PETIT <kevin.petit@arm.com>
	BUG=skia:
	R=djsollen@google.com, mtklein@google.com
	Author: kevin.petit@arm.com
	Review URL: https://codereview.chromium.org/317193003
*	Revert of Add SSE4 optimization of S32A_Opaque_Blitrow (https://codereview.chromium.org/289473009/)	jvanverth	2014-06-05
	Reason for revert:
	Buildbot failures on Mac 10.6 and Mac 10.7.

	R=reed@google.com, mtklein@google.com, tomhudson@google.com, djsollen@google.com, joakim.landberg@intel.com, henrik.smiding@intel.com
	TBR=reed@google.com
	NOTRY=True

	Original issue's description:
	> Add SSE4 optimization of S32A_Opaque_Blitrow
	>
	> Adds optimization of Skia S32A_Opaque_Blitrow blitter using SSE4.2 SIMD
	> instruction set. Special case for when alpha is zero or opaque.
	>
	> Performance increase of 10%-400% compared to the existing SSE2
	> optimization (measured on Silvermont architecture).
	> Noticeable in ~25 different skia bench subtests, especially in
	> bitmap_8888_, repeatTile_, and morph_*.
	>
	> bitmap_8888_A - 100% faster
	> bitmap_8888_A_source_transparent - 250% faster
	> bitmap_8888_A_source_opaque - 25% faster
	> bitmap_8888_A_scale_bicubic - 75% faster
	>
	> Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>
	>
	> Committed: https://skia.googlesource.com/skia/+/e2527b147679b0c43019fae7d59cc3777d2d097e

	Author: jvanverth@google.com
	Review URL: https://codereview.chromium.org/311053009
*	Add SSE4 optimization of S32A_Opaque_Blitrow	henrik.smiding	2014-06-05
	Adds optimization of Skia S32A_Opaque_Blitrow blitter using SSE4.2 SIMD instruction set. Special case for when alpha is zero or opaque.

	Performance increase of 10%-400% compared to the existing SSE2 optimization (measured on Silvermont architecture). Noticeable in ~25 different skia bench subtests, especially in bitmap_8888_, repeatTile_, and morph_*.

	bitmap_8888_A - 100% faster
	bitmap_8888_A_source_transparent - 250% faster
	bitmap_8888_A_source_opaque - 25% faster
	bitmap_8888_A_scale_bicubic - 75% faster

	Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>
	R=reed@google.com, mtklein@google.com, tomhudson@google.com, djsollen@google.com, joakim.landberg@intel.com
	Author: henrik.smiding@intel.com
	Review URL: https://codereview.chromium.org/289473009
*	SK_CPU_ARM --> SK_CPU_ARM32	mtklein	2014-06-03
	That's what it means. It keeps confusing us as named today.
	BUG=skia:
	R=djsollen@google.com, mtklein@google.com, reed@google.com
	Author: mtklein@chromium.org
	Review URL: https://codereview.chromium.org/314643004
*	ARM Skia NEON patches - 38 - arm64 8888 blitters	kevin.petit	2014-06-03
	Enable NEON on arm64 for most 8888 blitters

	This patch enables NEON optimisation for the Color32, S32_Blend, S32A_Opaque blitters on arm64.

	Here are the perf improvements vs the existing code:

	Color32:
	========
	+-------+------------+------------+
	| count | Cortex-A53 | Cortex-A57 |
	+-------+------------+------------+
	|     1 |     -2.39% |     23.78% |
	|     2 |     -5.46% |      8.88% |
	|     4 |     -4.74% |      4.89% |
	|     8 |     67.74% |    107.12% |
	|    16 |     40.03% |    101.20% |
	|    64 |     11.09% |     98.40% |
	|   256 |     -2.20% |     74.81% |
	|  1024 |     -4.28% |     78.90% |
	+-------+------------+------------+

	S32_Blend:
	==========
	+-------+------------+------------+
	| count | Cortex-A53 | Cortex-A57 |
	+-------+------------+------------+
	|     1 |      7.84% |     -6.75% |
	|     2 |     28.95% |     39.77% |
	|     4 |      5.80% |      8.26% |
	|     8 |      1.35% |     33.80% |
	|    16 |     -2.13% |     41.13% |
	|    64 |     -4.91% |     42.84% |
	|   256 |     -6.53% |     48.72% |
	|  1024 |     -6.65% |     46.66% |
	+-------+------------+------------+

	S32A_Opaque:
	============
	+-------+------------+------------+
	| count | Cortex-A53 | Cortex-A57 |
	+-------+------------+------------+
	|     1 |     -7.51% |    -19.06% |
	|     2 |     -5.02% |    -27.70% |
	|     4 |     15.38% |    -21.66% |
	|     8 |     -0.98% |      1.05% |
	|    16 |     -7.35% |      3.34% |
	|    64 |     50.53% |     94.63% |
	|   256 |     71.17% |    164.10% |
	|  1024 |     79.58% |    197.60% |
	+-------+------------+------------+

	Signed-off-by: Kevin PETIT <kevin.petit@arm.com>
	BUG=skia:
	R=djsollen@google.com, mtklein@google.com
	Author: kevin.petit@arm.com
	Review URL: https://codereview.chromium.org/302283003
*	use colortype instead of config	reed	2014-06-02
	clone of https://codereview.chromium.org/305133006/
	TBR=
	BUG=skia:
	Author: reed@google.com
	Review URL: https://codereview.chromium.org/301233011
*	Fixing clusterfuzz issue	commit-bot@chromium.org	2014-05-30
	When reading an SkSSE2ProcCoeffXfermode object, fProcSIMD should never be NULL. The reason for this is that it's not possible to create such an object through SkPlatformXfermodeFactory_impl_SSE2(), which is the only function used to create these objects, so if we're reading one, it's clearly invalid.
	BUG=379181
	R=reed@google.com, mtklein@google.com
	Author: sugoi@chromium.org
	Review URL: https://codereview.chromium.org/306183002
	git-svn-id: http://skia.googlecode.com/svn/trunk@15000 2bbb7eff-a529-9590-31e7-b0007b416f81
*	replace config() with colorType()	commit-bot@chromium.org	2014-05-29
	BUG=skia:
	R=robertphillips@google.com
	Author: reed@google.com
	Review URL: https://codereview.chromium.org/303543009
	git-svn-id: http://skia.googlecode.com/svn/trunk@14959 2bbb7eff-a529-9590-31e7-b0007b416f81
*	SSE2 implementation of memcpy32	commit-bot@chromium.org	2014-05-21
	With SSE2 version memcpy32, S32_Opaque_BlitRow32() in SkBlitRow_D32.cpp has about 30% performance improvement. Here are the data on desktop i7-3770.

	before:
	bitmap_scale_filter_90_90 8888: cmsecs = 2.01
	bitmaprect_FF_filter_trans 8888: cmsecs = 3.61
	bitmaprect_FF_nofilter_trans 8888: cmsecs = 3.57
	bitmaprect_FF_filter_identity 8888: cmsecs = 3.53
	bitmaprect_FF_nofilter_identity 8888: cmsecs = 3.53
	bitmap_4444_update 8888: cmsecs = 4.84
	bitmap_4444_update_volatile 8888: cmsecs = 4.81
	bitmap_4444 8888: cmsecs = 4.81

	after:
	bitmap_scale_filter_90_90 8888: cmsecs = 1.83
	bitmaprect_FF_filter_trans 8888: cmsecs = 2.36
	bitmaprect_FF_nofilter_trans 8888: cmsecs = 2.36
	bitmaprect_FF_filter_identity 8888: cmsecs = 2.60
	bitmaprect_FF_nofilter_identity 8888: cmsecs = 2.63
	bitmap_4444_update 8888: cmsecs = 3.30
	bitmap_4444_update_volatile 8888: cmsecs = 3.30
	bitmap_4444 8888: cmsecs = 3.29

	BUG=skia:
	R=mtklein@google.com, reed@google.com, bsalomon@google.com
	Author: qiankun.miao@intel.com
	Review URL: https://codereview.chromium.org/285313002
	git-svn-id: http://skia.googlecode.com/svn/trunk@14822 2bbb7eff-a529-9590-31e7-b0007b416f81
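For orientation, an SSE2 copy of 32-bit values might look like the sketch below. The function name is hypothetical and the structure is an assumption, not the code from the patch: four values move per iteration through unaligned 128-bit loads and stores, with a scalar loop for the tail and for builds without SSE2.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

// Hypothetical sketch: copy `count` 32-bit values from src to dst.
// The buffers are assumed not to overlap.
void memcpy32_sketch(uint32_t* dst, const uint32_t* src, int count) {
    assert(count >= 0);
#if defined(__SSE2__)
    // Four 32-bit values fit in one 128-bit register.
    while (count >= 4) {
        __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(dst), v);
        src += 4;
        dst += 4;
        count -= 4;
    }
#endif
    // Scalar tail (and the whole copy when SSE2 is unavailable).
    while (count-- > 0) {
        *dst++ = *src++;
    }
}
```

A tuned version would likely align the destination first and unroll further; the sketch leaves both out.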
*	Undo troublesome SSE 4.1 change for now to unblock Skia -> Chrome roll.	commit-bot@chromium.org	2014-05-20
	BUG=chromium:374796
	NOTREECHECKS=true
	R=fmalita@chromium.org, mtklein@google.com
	Author: mtklein@chromium.org
	Review URL: https://codereview.chromium.org/292563005
	git-svn-id: http://skia.googlecode.com/svn/trunk@14816 2bbb7eff-a529-9590-31e7-b0007b416f81
*	Add missing include in SkBlurImage optimization	commit-bot@chromium.org	2014-05-19
	Adds the missing include for smmintrin.h in the SkBlurImage_opts_SSE2.cpp file.
	Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>
	BUG=chromium:374796
	TEST=Unknown
	R=tomhudson@chromium.org, vapier@chromium.org, reed@chromium.org, bsalomon@chromium.org, dgreid@chromium.org, dgarrett@chromium.org, michaelpg@chromium.org, vandebo@chromium.org
	Author: henrik.smiding@intel.com
	Review URL: https://codereview.chromium.org/290923002
	git-svn-id: http://skia.googlecode.com/svn/trunk@14792 2bbb7eff-a529-9590-31e7-b0007b416f81
*	Add SSE4 check to BlurImage optimization.	commit-bot@chromium.org	2014-05-15
	Adds a build-time SSE4 check to SkBlurImage_opts_SSE2.cpp in the SkBoxBlur_SSE2 function.
	Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>
	R=reed@google.com, mtklein@google.com, tomhudson@google.com, djsollen@google.com, joakim.landberg@intel.com, senorblanco@chromium.org
	Author: henrik.smiding@intel.com
	Review URL: https://codereview.chromium.org/281963002
	git-svn-id: http://skia.googlecode.com/svn/trunk@14750 2bbb7eff-a529-9590-31e7-b0007b416f81
*	Always inline Filter_32_*_neon functions	commit-bot@chromium.org	2014-05-13
	The functions are rather performance critical and already marked 'inline'. However, Chrome for Android will not have these functions inlined due to it being compiled with -Os and a small -finline-limit.

	This avoids one call in the filtering functions. Does not increase the library size.

	BUG=chromium:363073
	R=mtklein@google.com
	Author: kkinnunen@nvidia.com
	Review URL: https://codereview.chromium.org/280403005
	git-svn-id: http://skia.googlecode.com/svn/trunk@14709 2bbb7eff-a529-9590-31e7-b0007b416f81
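Forcing inlining regardless of -Os or -finline-limit is usually done with a compiler attribute. The sketch below is illustrative only: the macro and function names are hypothetical, and the arithmetic is a stand-in rather than the NEON filter code the commit touches.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical macro: ask the compiler to inline even when its size
// heuristics would otherwise decline.
#if defined(_MSC_VER)
#define ALWAYS_INLINE_SKETCH __forceinline
#else
#define ALWAYS_INLINE_SKETCH __attribute__((always_inline)) inline
#endif

// Stand-in for a small, hot filter helper: blend two channel values with
// a 4-bit weight t in [0, 16].
ALWAYS_INLINE_SKETCH uint32_t filter_pair(uint32_t a, uint32_t b, unsigned t) {
    assert(t <= 16);
    return (a * (16 - t) + b * t) >> 4;
}
```

With plain `inline`, the compiler treats the keyword as a hint and may still emit a call; the attribute removes that discretion for the marked function.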
*	Make gMask_00FF00FF a constant	commit-bot@chromium.org	2014-05-12
	This is to optimize SkAlphaMulQ() in PIC mode. With the visibility=default symbol the constant is not known at compile time (and is not a constant), but instead is fetched through a double indirection through GOT. The function is quite hot on one of the chromium benchmarks: rasterize_and_record_micro.key_silk_cases.

	This change replaces the symbol with a compile-time constant. As a bonus the variable is not exported from the dynamic library, i.e. a cleaner library interface.

	See specific performance improvements on Android here: http://goo.gl/iMuTDt

	R=skyostil@chromium.org, tomhudson@chromium.org, mtklein@google.com, reed@google.com, tomhudson@google.com
	Author: pasko@chromium.org
	Review URL: https://codereview.chromium.org/270473003
	git-svn-id: http://skia.googlecode.com/svn/trunk@14696 2bbb7eff-a529-9590-31e7-b0007b416f81
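A sketch of the idea, assuming the multiply has the usual two-channels-at-a-time shape: when the mask is a compile-time constant with internal linkage, the compiler can fold it into the instruction stream instead of loading an exported global. The names below are hypothetical stand-ins for gMask_00FF00FF and SkAlphaMulQ().

```cpp
#include <cassert>
#include <cstdint>

// Compile-time constant with internal linkage: nothing is exported from the
// library and no GOT lookup is needed to read it.
static constexpr uint32_t kMask_00FF00FF = 0x00FF00FF;

// Hypothetical stand-in for SkAlphaMulQ(): scale all four 8-bit channels of
// a packed pixel by scale/256, handling two channels per multiply.
static inline uint32_t alpha_mul_q(uint32_t c, unsigned scale) {
    assert(scale <= 256);
    uint32_t rb = ((c & kMask_00FF00FF) * scale) >> 8;  // channels 0 and 2
    uint32_t ag = ((c >> 8) & kMask_00FF00FF) * scale;  // channels 1 and 3
    return (rb & kMask_00FF00FF) | (ag & ~kMask_00FF00FF);
}
```

For example, `alpha_mul_q(0xFF804020, 128)` halves every channel and yields `0x7F402010`.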
*	Improved x86 SSE build and run-time checks.	commit-bot@chromium.org	2014-05-12
	Replaces the current build/run-time checks for SSE level in opts_check_x86.cpp with a simpler and more future-proof version. Also adds SSE versions 4.1 and 4.2 to the config file.

	Author: henrik.smiding@intel.com
	Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>
	Committed: http://code.google.com/p/skia/source/detail?r=14644
	R=reed@google.com, mtklein@google.com, tomhudson@google.com, djsollen@google.com, joakim.landberg@intel.com
	Author: henrik.smiding@intel.com
	Review URL: https://codereview.chromium.org/272503006
	git-svn-id: http://skia.googlecode.com/svn/trunk@14693 2bbb7eff-a529-9590-31e7-b0007b416f81
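One way to combine build-time and run-time SSE checks is to order the levels numerically and consult the CPU only when the requested level exceeds what the build already guarantees. The sketch below is an assumption about the general shape, not the contents of opts_check_x86.cpp: the enum values and function names are hypothetical, and the run-time probe uses the GCC/Clang `__builtin_cpu_supports` builtin rather than whatever CPUID wrapper Skia uses.

```cpp
#include <cassert>

// Hypothetical ordered levels; the numeric values are arbitrary but increasing.
enum SimdLevel { kNoSimd = 0, kSSE2 = 20, kSSE3 = 30, kSSSE3 = 31, kSSE41 = 41, kSSE42 = 42 };

// Build-time floor, taken from the compiler's predefined macros.
constexpr SimdLevel build_time_level() {
#if defined(__SSE4_2__)
    return kSSE42;
#elif defined(__SSE4_1__)
    return kSSE41;
#elif defined(__SSSE3__)
    return kSSSE3;
#elif defined(__SSE3__)
    return kSSE3;
#elif defined(__SSE2__)
    return kSSE2;
#else
    return kNoSimd;
#endif
}

// Run-time probe; falls back to the build-time floor where the builtin is unavailable.
SimdLevel run_time_level() {
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("sse4.2")) return kSSE42;
    if (__builtin_cpu_supports("sse4.1")) return kSSE41;
    if (__builtin_cpu_supports("ssse3")) return kSSSE3;
    if (__builtin_cpu_supports("sse3")) return kSSE3;
    if (__builtin_cpu_supports("sse2")) return kSSE2;
    return kNoSimd;
#else
    return build_time_level();
#endif
}

bool supports_simd(SimdLevel wanted) {
    assert(wanted >= kNoSimd && wanted <= kSSE42);
    if (wanted <= build_time_level()) {
        return true;  // guaranteed by the compile flags, no CPU query needed
    }
    static const SimdLevel detected = run_time_level();  // probed once
    return wanted <= detected;
}
```

Supporting a newer instruction set then means adding one enum value and one probe line, which is one reading of "more future-proof".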
*	Revert of Improved x86 SSE build and run-time checks. (https://codereview.chromium.org/272503006/)	commit-bot@chromium.org	2014-05-08
	Reason for revert:
	Windows builders breaking. :(

	Original issue's description:
	> Improved x86 SSE build and run-time checks.
	>
	> Replaces the current build/run-time checks for SSE level in
	> opts_check_x86.cpp with a simpler and more future-proof version.
	> Also adds SSE versions 4.1 and 4.2 to the config file.
	>
	> Author: henrik.smiding@intel.com
	>
	> Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>
	>
	> Committed: http://code.google.com/p/skia/source/detail?r=14644

	R=reed@google.com, tomhudson@google.com, djsollen@google.com, joakim.landberg@intel.com, henrik.smiding@intel.com
	TBR=djsollen@google.com, henrik.smiding@intel.com, joakim.landberg@intel.com, reed@google.com, tomhudson@google.com
	NOTREECHECKS=true
	NOTRY=true
	Author: mtklein@google.com
	Review URL: https://codereview.chromium.org/277593004
	git-svn-id: http://skia.googlecode.com/svn/trunk@14646 2bbb7eff-a529-9590-31e7-b0007b416f81
*	Improved x86 SSE build and run-time checks.	commit-bot@chromium.org	2014-05-08
	Replaces the current build/run-time checks for SSE level in opts_check_x86.cpp with a simpler and more future-proof version. Also adds SSE versions 4.1 and 4.2 to the config file.

	Author: henrik.smiding@intel.com
	Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>
	R=reed@google.com, mtklein@google.com, tomhudson@google.com, djsollen@google.com, joakim.landberg@intel.com
	Author: henrik.smiding@intel.com
	Review URL: https://codereview.chromium.org/272503006
	git-svn-id: http://skia.googlecode.com/svn/trunk@14644 2bbb7eff-a529-9590-31e7-b0007b416f81
*	Cleanup of SSE optimization files.	commit-bot@chromium.org	2014-04-30
	General cleanup of optimization files for x86/SSEx. Renamed the opts_check_SSE2.cpp file to _x86, since it's not specific to SSE2. Commented out the ColorRect32 optimization, since it's disabled anyway, to make it more visible. Also fixed a lot of indentation, inclusion guards, spelling, copyright headers, braces, whitespace, and sorting of includes.

	Author: henrik.smiding@intel.com
	Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>
	R=reed@google.com, mtklein@google.com, tomhudson@google.com, djsollen@google.com, joakim.landberg@intel.com
	Author: henrik.smiding@intel.com
	Review URL: https://codereview.chromium.org/264603002
	git-svn-id: http://skia.googlecode.com/svn/trunk@14464 2bbb7eff-a529-9590-31e7-b0007b416f81
*	Remove the unused SkCachePreload_arm	commit-bot@chromium.org	2014-04-30
	Signed-off-by: Kévin PETIT <kevin.petit@arm.com>
	BUG=skia:
	R=djsollen@google.com, mtklein@google.com
	Author: kevin.petit@arm.com
	Review URL: https://codereview.chromium.org/263553008
	git-svn-id: http://skia.googlecode.com/svn/trunk@14456 2bbb7eff-a529-9590-31e7-b0007b416f81
*	ARM Skia NEON patches - 36 - Color32	commit-bot@chromium.org	2014-04-29
	Convert Color32 to intrinsics

	This change is performance-neutral for high values of count and is a big improvement for values smaller than 64.

	Signed-off-by: Kévin PETIT <kevin.petit@arm.com>
	BUG=skia:
	R=djsollen@google.com, mtklein@google.com, borenet@google.com
	Author: kevin.petit@arm.com
	Review URL: https://codereview.chromium.org/258173005
	git-svn-id: http://skia.googlecode.com/svn/trunk@14435 2bbb7eff-a529-9590-31e7-b0007b416f81
*	Properly enable S32_D16_filter_DX_SSE2 optimization.	commit-bot@chromium.org	2014-04-28
	Currently, the S32_D16_filter_DX_SSE2 optimization is only used in configurations where the maximum SSE level is SSE2. This patch enables it for higher levels, as well as fixing a color conversion bug when the subpixels are converted into RGB565 format. Also, refactored the function a bit, to make future modifications less error-prone.

	Author: henrik.smiding@intel.com
	Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>
	Committed: http://code.google.com/p/skia/source/detail?r=14333
	R=reed@google.com, mtklein@google.com, tomhudson@google.com, djsollen@google.com, joakim.landberg@intel.com
	Author: henrik.smiding@intel.com
	Review URL: https://codereview.chromium.org/239453010
	git-svn-id: http://skia.googlecode.com/svn/trunk@14403 2bbb7eff-a529-9590-31e7-b0007b416f81
*	Xfermode: SSE2 implementation of darken&lighten mode	commit-bot@chromium.org	2014-04-28
	With SSE2 optimization, performance of the related two benchmarks will improve about 45% on desktop i7-3770. Here are the data:

	before:
	Xfermode_Lighten 8888: cmsecs = 33.60 565: cmsecs = 48.84
	Xfermode_Darken 8888: cmsecs = 34.16 565: cmsecs = 48.99

	after:
	Xfermode_Lighten 8888: cmsecs = 18.71 565: cmsecs = 25.41
	Xfermode_Darken 8888: cmsecs = 18.39 565: cmsecs = 25.40

	BUG=skia:
	R=mtklein@google.com
	Author: qiankun.miao@intel.com
	Review URL: https://codereview.chromium.org/234653002
	git-svn-id: http://skia.googlecode.com/svn/trunk@14395 2bbb7eff-a529-9590-31e7-b0007b416f81
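For reference, the per-channel arithmetic behind darken and lighten on premultiplied 8-bit values can be written in scalar form as below. This is a sketch with hypothetical function names, following the standard premultiplied formulas (darken subtracts the larger cross product, lighten the smaller); it is not the SSE2 code from the patch, which would presumably evaluate the same expressions on several channels at once.

```cpp
#include <cassert>

// Rounded division by 255 for products of two 8-bit values.
static inline int div255_round(int prod) {
    prod += 128;
    return (prod + (prod >> 8)) >> 8;
}

// sc/dc: premultiplied source/destination channel; sa/da: source/destination alpha.
static inline int darken_channel(int sc, int dc, int sa, int da) {
    assert(sc >= 0 && sc <= sa && sa <= 255 && dc >= 0 && dc <= da && da <= 255);
    int sd = sc * da;
    int ds = dc * sa;
    return sc + dc - div255_round(sd > ds ? sd : ds);  // subtract the max
}

static inline int lighten_channel(int sc, int dc, int sa, int da) {
    assert(sc >= 0 && sc <= sa && sa <= 255 && dc >= 0 && dc <= da && da <= 255);
    int sd = sc * da;
    int ds = dc * sa;
    return sc + dc - div255_round(sd < ds ? sd : ds);  // subtract the min
}
```

With both pixels opaque (sa = da = 255) these reduce to a plain per-channel min and max.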
*	Xfermode: SSE2 implementation of colordodge&colorburn modes	commit-bot@chromium.org	2014-04-25
	With SSE2 optimization, performance of the related benchmarks will improve about 45% for Xfermode_ColorDodge and little for Xfermode_ColorBurn on desktop i7-3770. The small improvement for Xfermode_ColorBurn arises because the portable version may mostly take the fast if branch, while the SSE2 version does the calculation for all three if-else branches. Here are the data:

	before:
	Xfermode_ColorDodge 8888: cmsecs = 73.71 565: cmsecs = 82.88
	Xfermode_ColorBurn 8888: cmsecs = 46.46 565: cmsecs = 52.23

	after:
	Xfermode_ColorDodge 8888: cmsecs = 39.70 565: cmsecs = 47.45
	Xfermode_ColorBurn 8888: cmsecs = 45.02 565: cmsecs = 51.15

	BUG=skia:
	R=mtklein@google.com
	Author: qiankun.miao@intel.com
	Review URL: https://codereview.chromium.org/224823004
	git-svn-id: http://skia.googlecode.com/svn/trunk@14377 2bbb7eff-a529-9590-31e7-b0007b416f81
*	Xfermode: SSE2 implementation of softlight_modeproc	commit-bot@chromium.org	2014-04-25
	With SSE2 optimization, performance of Xfermode_SoftLight will improve about 30% on desktop i7-3770. Here are the data:

	before:
	Xfermode_SoftLight 8888: cmsecs = 379.44 565: cmsecs = 387.74

	after:
	Xfermode_SoftLight 8888: cmsecs = 272.29 565: cmsecs = 284.31

	BUG=skia:
	R=mtklein@google.com
	Author: qiankun.miao@intel.com
	Review URL: https://codereview.chromium.org/236363012
	git-svn-id: http://skia.googlecode.com/svn/trunk@14376 2bbb7eff-a529-9590-31e7-b0007b416f81
*	Xfermode: SSE2 implementation of difference_modeproc	commit-bot@chromium.org	2014-04-25
	With SSE2 optimization, performance of Xfermode_Difference will improve about 60% on desktop i7-3770. Here are the data:

	before:
	Xfermode_Difference 8888: cmsecs = 51.10 565: cmsecs = 66.39

	after:
	Xfermode_Difference 8888: cmsecs = 21.10 565: cmsecs = 29.33

	BUG=skia:
	R=mtklein@google.com
	Author: qiankun.miao@intel.com
	Review URL: https://codereview.chromium.org/234433003
	git-svn-id: http://skia.googlecode.com/svn/trunk@14375 2bbb7eff-a529-9590-31e7-b0007b416f81
*	Xfermode: SSE2 implementation of hardlight mode	commit-bot@chromium.org	2014-04-25
	With SSE2 optimization, performance of Xfermode_HardLight will improve about 45% on desktop i7-3770. Here are the data:

	before:
	Xfermode_HardLight 8888: cmsecs = 48.43 565: cmsecs = 63.11

	after:
	Xfermode_HardLight 8888: cmsecs = 25.71 565: cmsecs = 33.46

	BUG=skia:
	R=mtklein@google.com
	Author: qiankun.miao@intel.com
	Review URL: https://codereview.chromium.org/229003004
	git-svn-id: http://skia.googlecode.com/svn/trunk@14373 2bbb7eff-a529-9590-31e7-b0007b416f81
*	Xfermode: SSE2 implementation of exclusion_modeproc	commit-bot@chromium.org	2014-04-25
	With SSE2 optimization, performance of Xfermode_Exclusion will improve about 50% on desktop i7-3770. Here are the data:

	before:
	Xfermode_Exclusion 8888: cmsecs = 40.17 565: cmsecs = 55.22

	after:
	Xfermode_Exclusion 8888: cmsecs = 18.53 565: cmsecs = 26.55

	BUG=skia:
	R=mtklein@google.com
	Author: qiankun.miao@intel.com
	Review URL: https://codereview.chromium.org/233733005
	git-svn-id: http://skia.googlecode.com/svn/trunk@14371 2bbb7eff-a529-9590-31e7-b0007b416f81
*	Xfermode: SSE2 implementation of overlay_modeproc	commit-bot@chromium.org	2014-04-25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	With SSE2 optimization, performance of Xfermode_Overlay will improve about 35% on desktop i7-3770. Here are the data: before: Xfermode_Overlay 8888: cmsecs = 44.17 565: cmsecs = 59.27 after: Xfermode_Overlay 8888: cmsecs = 28.30 565: cmsecs = 35.84 BUG=skia: R=mtklein@google.com Author: qiankun.miao@intel.com Review URL: https://codereview.chromium.org/232783002 git-svn-id: http://skia.googlecode.com/svn/trunk@14370 2bbb7eff-a529-9590-31e7-b0007b416f81
*	fix x86 emulator for Android framework.	commit-bot@chromium.org	2014-04-23
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The emulator is the one case where the android framework can be compiled without SSSE3 but be expected to run on a device with SSSE3. In that case we just disable all SSSE3 options to be safe. R=scroggo@google.com Author: djsollen@google.com Review URL: https://codereview.chromium.org/249883004 git-svn-id: http://skia.googlecode.com/svn/trunk@14342 2bbb7eff-a529-9590-31e7-b0007b416f81
*	Revert of Properly enable S32_D16_filter_DX_SSE2 optimization. ↵	commit-bot@chromium.org	2014-04-23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	(https://codereview.chromium.org/239453010/) Reason for revert: Broke GMs in 565 mode. To repro: out/Debug/gm --match filterbitmap_image_mandrill -w . --config 565 open filterbitmap_image_mandrill_512.png_565.png Original issue's description: > Properly enable S32_D16_filter_DX_SSE2 optimization. > > Currently, the S32_D16_filter_DX_SSE2 optimization is only used in > configurations where the maximum SSE level is SSE2. > This patch enables it for higher levels, as well. > Also, refactored the function a bit, to make future modifications > less error-prone. > > Author: henrik.smiding@intel.com > > Signed-off-by: Henrik Smiding <henrik.smiding@intel.com> > > Committed: http://code.google.com/p/skia/source/detail?r=14333 R=reed@google.com, mtklein@google.com, tomhudson@google.com, djsollen@google.com, joakim.landberg@intel.com, henrik.smiding@intel.com TBR=djsollen@google.com, henrik.smiding@intel.com, joakim.landberg@intel.com, mtklein@google.com, reed@google.com, tomhudson@google.com NOTREECHECKS=true NOTRY=true Author: bsalomon@google.com Review URL: https://codereview.chromium.org/246393013 git-svn-id: http://skia.googlecode.com/svn/trunk@14336 2bbb7eff-a529-9590-31e7-b0007b416f81
*	Properly enable S32_D16_filter_DX_SSE2 optimization.	commit-bot@chromium.org	2014-04-23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, the S32_D16_filter_DX_SSE2 optimization is only used in configurations where the maximum SSE level is SSE2. This patch enables it for higher levels, as well. Also, refactored the function a bit, to make future modifications less error-prone. Author: henrik.smiding@intel.com Signed-off-by: Henrik Smiding <henrik.smiding@intel.com> R=reed@google.com, mtklein@google.com, tomhudson@google.com, djsollen@google.com, joakim.landberg@intel.com Author: henrik.smiding@intel.com Review URL: https://codereview.chromium.org/239453010 git-svn-id: http://skia.googlecode.com/svn/trunk@14333 2bbb7eff-a529-9590-31e7-b0007b416f81
*	Xfermode: SSE2 implementation of a number of simple transfer modes	commit-bot@chromium.org	2014-04-14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	These modes share some common code and are not very complex, so group them together. This CL yields about 50% performance improvement on desktop i7-3770. Here are the data: before: Xfermode_Screen 8888: cmsecs = 30.25 565: cmsecs = 46.81 Xfermode_Modulate 8888: cmsecs = 22.48 565: cmsecs = 40.06 Xfermode_Plus 8888: cmsecs = 21.04 565: cmsecs = 37.51 Xfermode_Xor 8888: cmsecs = 37.18 565: cmsecs = 52.53 Xfermode_DstATop 8888: cmsecs = 28.97 565: cmsecs = 46.42 Xfermode_SrcATop 8888: cmsecs = 29.74 565: cmsecs = 46.25 Xfermode_DstOut 8888: cmsecs = 5.34 565: cmsecs = 24.53 Xfermode_SrcOut 8888: cmsecs = 12.25 565: cmsecs = 24.39 Xfermode_DstIn 8888: cmsecs = 5.30 565: cmsecs = 24.50 Xfermode_SrcIn 8888: cmsecs = 12.05 565: cmsecs = 25.40 Xfermode_DstOver 8888: cmsecs = 12.45 565: cmsecs = 0.15 Xfermode_SrcOver 8888: cmsecs = 2.68 565: cmsecs = 4.42 after: Xfermode_Screen 8888: cmsecs = 13.68 565: cmsecs = 21.73 Xfermode_Modulate 8888: cmsecs = 13.25 565: cmsecs = 20.97 Xfermode_Plus 8888: cmsecs = 9.77 565: cmsecs = 16.71 Xfermode_Xor 8888: cmsecs = 17.64 565: cmsecs = 25.62 Xfermode_DstATop 8888: cmsecs = 15.99 565: cmsecs = 23.74 Xfermode_SrcATop 8888: cmsecs = 15.69 565: cmsecs = 23.40 Xfermode_DstOut 8888: cmsecs = 4.77 565: cmsecs = 11.85 Xfermode_SrcOut 8888: cmsecs = 4.98 565: cmsecs = 11.84 Xfermode_DstIn 8888: cmsecs = 4.68 565: cmsecs = 11.72 Xfermode_SrcIn 8888: cmsecs = 4.93 565: cmsecs = 11.79 Xfermode_DstOver 8888: cmsecs = 5.04 565: cmsecs = 0.15 Xfermode_SrcOver 8888: cmsecs = 2.69 565: cmsecs = 4.42 BUG=skia: R=mtklein@google.com Author: qiankun.miao@intel.com Review URL: https://codereview.chromium.org/232793002 git-svn-id: http://skia.googlecode.com/svn/trunk@14176 2bbb7eff-a529-9590-31e7-b0007b416f81
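The "about 50%" figure in the message above can be checked against the quoted timings. The following is a minimal sketch, assuming "improvement" means the relative reduction in cmsecs; the 8888 numbers are copied from the commit message, and the names `timings_8888` and `time_reduction_percent` are made up for this illustration.

```python
# (before, after) timings in cmsecs for the 8888 config,
# copied from the commit message above.
timings_8888 = {
    "Screen": (30.25, 13.68),
    "Modulate": (22.48, 13.25),
    "Plus": (21.04, 9.77),
    "Xor": (37.18, 17.64),
    "DstATop": (28.97, 15.99),
    "SrcATop": (29.74, 15.69),
    "DstOut": (5.34, 4.77),
    "SrcOut": (12.25, 4.98),
    "DstIn": (5.30, 4.68),
    "SrcIn": (12.05, 4.93),
    "DstOver": (12.45, 5.04),
    "SrcOver": (2.68, 2.69),
}


def time_reduction_percent(before, after):
    """Relative reduction in run time, in percent (assumed meaning of 'improvement')."""
    return 100.0 * (before - after) / before


reductions = {
    mode: time_reduction_percent(before, after)
    for mode, (before, after) in timings_8888.items()
}

# Print the modes from largest to smallest reduction.
for mode, pct in sorted(reductions.items(), key=lambda kv: kv[1], reverse=True):
    print(f"Xfermode_{mode:<9} {pct:6.1f}%")
```

Under that reading, most modes in the 8888 config land between roughly 40% and 60%, while DstOut and DstIn gain much less and SrcOver is essentially unchanged.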
*	Real fix for SK_API / Windows shared lib problems.	mtklein@google.com	2014-04-09
\| \| \| \| \| \| \|	Ben reviewed this over my shoulder, and we tested on his machine. git-svn-id: http://skia.googlecode.com/svn/trunk@14122 2bbb7eff-a529-9590-31e7-b0007b416f81
*	SK_API for SkXfermode_opts_SSE2 so Chrome can initialize flattenables.	commit-bot@chromium.org	2014-04-09
\| \| \| \| \| \| \| \| \| \| \|	BUG=skia:2401 R=bungeman@google.com, mtklein@google.com Author: mtklein@chromium.org Review URL: https://codereview.chromium.org/231423003 git-svn-id: http://skia.googlecode.com/svn/trunk@14120 2bbb7eff-a529-9590-31e7-b0007b416f81
*	Xfermode: SSE2 implementation of multiply_modeproc	commit-bot@chromium.org	2014-04-09
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch implements basics for Xfermode SSE optimization. Based on these basics, SSE2 implementation of multiply_modeproc is provided. SSE2 implementation for other modes will come in future. With this patch performance of Xfermode_Multiply will improve about 45%. Here are the data on desktop i7-3770. before: Xfermode_Multiply 8888: cmsecs = 33.30 565: cmsecs = 45.65 after: Xfermode_Multiply 8888: cmsecs = 17.18 565: cmsecs = 24.87 BUG= Committed: http://code.google.com/p/skia/source/detail?r=14006 Committed: http://code.google.com/p/skia/source/detail?r=14050 R=mtklein@google.com, robertphillips@google.com Author: qiankun.miao@intel.com Review URL: https://codereview.chromium.org/202903004 git-svn-id: http://skia.googlecode.com/svn/trunk@14107 2bbb7eff-a529-9590-31e7-b0007b416f81
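For reference, the multiply transfer mode that this patch vectorizes can be sketched in scalar form. This is illustrative Python, not the code from the patch: it assumes premultiplied 8-bit channels and the usual formula Sc*(1-Da) + Dc*(1-Sa) + Sc*Dc for color with Sa + Da - Sa*Da for alpha, and the helper names (`div255_round`, `multiply_channel`, `multiply_pixel`) are hypothetical.

```python
def div255_round(x):
    """Divide by 255, rounding to the nearest integer."""
    return (x + 127) // 255


def multiply_channel(sc, dc, sa, da):
    """Multiply blend for one premultiplied 8-bit color channel."""
    value = sc * (255 - da) + dc * (255 - sa) + sc * dc
    # The clamp is a safety net; for valid premultiplied input
    # (sc <= sa, dc <= da) the result already fits in 8 bits.
    return min(255, div255_round(value))


def multiply_alpha(sa, da):
    """Resulting alpha: Sa + Da - Sa*Da, in 8-bit fixed point."""
    return sa + da - div255_round(sa * da)


def multiply_pixel(src, dst):
    """Blend two premultiplied (a, r, g, b) tuples of 8-bit ints."""
    sa, da = src[0], dst[0]
    alpha = multiply_alpha(sa, da)
    rgb = tuple(
        multiply_channel(sc, dc, sa, da)
        for sc, dc in zip(src[1:], dst[1:])
    )
    return (alpha,) + rgb


# Opaque source over opaque destination reduces to a plain product per channel.
print(multiply_pixel((255, 255, 128, 0), (255, 128, 255, 64)))
```

A SIMD version performs the same per-channel arithmetic on several pixels at once, which is where a speedup of the kind quoted above would come from.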
*	ARM Skia NEON patches - 22 - S32_D565_Blend	commit-bot@chromium.org	2014-04-09
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	BlitRow565: new NEON version of S32_D565_Blend This new implementation brings a good speedup in most cases and gives exact results (removes one mismatch in gm). Here are the benchmark results (speedup vs. existing S32A_D565_Blend):
+-------+-----------+------------+
\| count \| Cortex-A9 \| Cortex-A15 \|
+-------+-----------+------------+
\| 1     \| -26,7%    \| -27,5%     \|
+-------+-----------+------------+
\| 2     \| 0%        \| +53%       \|
+-------+-----------+------------+
\| 4     \| +38,3%    \| +26,5%     \|
+-------+-----------+------------+
\| 8     \| +10,9%    \| -4,5%      \|
+-------+-----------+------------+
\| 16    \| +18,2%    \| +1,6%      \|
+-------+-----------+------------+
\| 64    \| +22,3%    \| +8,75%     \|
+-------+-----------+------------+
\| 256   \| +12,3%    \| +11,2%     \|
+-------+-----------+------------+
\| 1024  \| +79,2%    \| +10,9%     \|
+-------+-----------+------------+
Signed-off-by: Kévin PETIT <kevin.petit@arm.com> BUG=skia: R=djsollen@google.com, mtklein@google.com Author: kevin.petit@arm.com Review URL: https://codereview.chromium.org/181523002 git-svn-id: http://skia.googlecode.com/svn/trunk@14103 2bbb7eff-a529-9590-31e7-b0007b416f81
*	Revert of Xfermode: SSE2 implementation of multiply_modeproc ↵	commit-bot@chromium.org	2014-04-03
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	(https://codereview.chromium.org/202903004/) Reason for revert: It looks like serialization is broken. The serialize and pipe-cross-process tests are failing and turning (at least the Ubuntu12 and Win7) bots red Original issue's description: > Xfermode: SSE2 implementation of multiply_modeproc > > This patch implements basics for Xfermode SSE optimization. Based on > these basics, SSE2 implementation of multiply_modeproc is provided. SSE2 > implementation for other modes will come in future. With this patch > performance of Xfermode_Multiply will improve about 45%. Here are the > data on desktop i7-3770. > before: > Xfermode_Multiply 8888: cmsecs = 33.30 565: cmsecs = 45.65 > after: > Xfermode_Multiply 8888: cmsecs = 17.18 565: cmsecs = 24.87 > > BUG= > > Committed: http://code.google.com/p/skia/source/detail?r=14006 > > Committed: http://code.google.com/p/skia/source/detail?r=14050 R=mtklein@google.com, qiankun.miao@intel.com TBR=mtklein@google.com, qiankun.miao@intel.com NOTREECHECKS=true NOTRY=true BUG= Author: robertphillips@google.com Review URL: https://codereview.chromium.org/224253003 git-svn-id: http://skia.googlecode.com/svn/trunk@14053 2bbb7eff-a529-9590-31e7-b0007b416f81
*	Xfermode: SSE2 implementation of multiply_modeproc	commit-bot@chromium.org	2014-04-03
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch implements basics for Xfermode SSE optimization. Based on these basics, SSE2 implementation of multiply_modeproc is provided. SSE2 implementation for other modes will come in future. With this patch performance of Xfermode_Multiply will improve about 45%. Here are the data on desktop i7-3770. before: Xfermode_Multiply 8888: cmsecs = 33.30 565: cmsecs = 45.65 after: Xfermode_Multiply 8888: cmsecs = 17.18 565: cmsecs = 24.87 BUG= Committed: http://code.google.com/p/skia/source/detail?r=14006 R=mtklein@google.com, robertphillips@google.com Author: qiankun.miao@intel.com Review URL: https://codereview.chromium.org/202903004 git-svn-id: http://skia.googlecode.com/svn/trunk@14050 2bbb7eff-a529-9590-31e7-b0007b416f81
*	ARM Skia NEON patches - 35 - First AArch64 support	commit-bot@chromium.org	2014-04-02
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Aarch64 support This change contains the necessary modifications to have Skia build and run properly on an ARMv8 processor in aarch64 execution state. Here's a list of the changes: - add an arm64 target to the build system + SK_CPU_ARM64 flag - MatrixTest was failing when built in Release mode. Fused MAC instructions were generated which made some intermediate results more accurate. As the test relies on result comparison, the more precise results when compared to others led to a gap bigger than what was tolerated. As I don't know if some actual skia code relies on results being comparable, I've disabled fused MAC instruction with -ffp-contract=off for arm64. - Modify include/core/SkOnce.h to have barriers work. - SK_CPU_ARM64 implies SK_ARM_NEON_MODE_ALWAYS. - use existing Xfermode optimisations with modifications that can be removed in the future when toolchains are ready. Also save a few instructions in two Xfermodes (will apply to ARM too). - use existing SkBoxBlur and SkMorphology optimisations. - use existing SkBlitMask optimisations - use existing BitmapProcState and Convolution optimisations. Future changes will include: - Blitters (only partially merged upstream) - SkUtils (there's little value in sending asm optimisations without having them benchmarked on real hardware). Signed-off-by: Kevin PETIT <kevin.petit@arm.com> BUG=skia: Committed: http://code.google.com/p/skia/source/detail?r=13980 R=djsollen@google.com, reed@google.com, mtklein@google.com, halcanary@google.com Author: kevin.petit@arm.com Review URL: https://codereview.chromium.org/143423004 git-svn-id: http://skia.googlecode.com/svn/trunk@14025 2bbb7eff-a529-9590-31e7-b0007b416f81
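The MatrixTest note above turns on the fact that a fused multiply-accumulate rounds once, while a separate multiply and add round twice. The following is a small sketch of that effect in Python, not anything taken from Skia: the fused result is modelled with exact rational arithmetic followed by a single rounding, and the function names and input values are chosen purely for illustration.

```python
from fractions import Fraction


def mac_unfused(a, b, c):
    """Two roundings: a*b is rounded to a double, then the sum is rounded again."""
    return a * b + c


def mac_fused(a, b, c):
    """One rounding: a*b + c is computed exactly, then rounded once.

    This models the value a fused multiply-add instruction produces.
    """
    return float(Fraction(a) * Fraction(b) + Fraction(c))


# Inputs chosen so the exact product, 1 - 2**-60, is not representable
# as a double and rounds to 1.0 when the multiply is done on its own.
a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0

print(mac_unfused(a, b, c))
print(mac_fused(a, b, c))
```

The two results differ even though the inputs are identical, which is the kind of gap a test comparing fused and unfused code paths against a fixed tolerance can trip over. Disabling contraction with -ffp-contract=off, as the message describes, keeps the compiler on the two-rounding path.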
*	Revert of Xfermode: SSE2 implementation of multiply_modeproc ↵	commit-bot@chromium.org	2014-04-01
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	(https://codereview.chromium.org/202903004/) Reason for revert: Breaking builds Original issue's description: > Xfermode: SSE2 implementation of multiply_modeproc > > This patch implements basics for Xfermode SSE optimization. Based on > these basics, SSE2 implementation of multiply_modeproc is provided. SSE2 > implementation for other modes will come in future. With this patch > performance of Xfermode_Multiply will improve about 45%. Here are the > data on desktop i7-3770. > before: > Xfermode_Multiply 8888: cmsecs = 33.30 565: cmsecs = 45.65 > after: > Xfermode_Multiply 8888: cmsecs = 17.18 565: cmsecs = 24.87 > > BUG= > > Committed: http://code.google.com/p/skia/source/detail?r=14006 R=mtklein@google.com, qiankun.miao@intel.com TBR=mtklein@google.com, qiankun.miao@intel.com NOTREECHECKS=true NOTRY=true BUG= Author: robertphillips@google.com Review URL: https://codereview.chromium.org/219243009 git-svn-id: http://skia.googlecode.com/svn/trunk@14007 2bbb7eff-a529-9590-31e7-b0007b416f81
*	Xfermode: SSE2 implementation of multiply_modeproc	commit-bot@chromium.org	2014-04-01
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch implements basics for Xfermode SSE optimization. Based on these basics, SSE2 implementation of multiply_modeproc is provided. SSE2 implementation for other modes will come in future. With this patch performance of Xfermode_Multiply will improve about 45%. Here are the data on desktop i7-3770. before: Xfermode_Multiply 8888: cmsecs = 33.30 565: cmsecs = 45.65 after: Xfermode_Multiply 8888: cmsecs = 17.18 565: cmsecs = 24.87 BUG= R=mtklein@google.com Author: qiankun.miao@intel.com Review URL: https://codereview.chromium.org/202903004 git-svn-id: http://skia.googlecode.com/svn/trunk@14006 2bbb7eff-a529-9590-31e7-b0007b416f81
*	Revert of ARM Skia NEON patches - 35 - First AArch64 support ↵	commit-bot@chromium.org	2014-03-28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	(https://codereview.chromium.org/143423004/) Reason for revert: GYP's failing on most (all?) bots. Original issue's description: > ARM Skia NEON patches - 35 - First AArch64 support > > Aarch64 support > > This change contains the necessary modifications to have Skia build and > run properly on an ARMv8 processor in aarch64 execution state. > > Here's a list of the changes: > > - add an arm64 target to the build system + SK_CPU_ARM64 flag > > - MatrixTest was failing when built in Release mode. Fused MAC > instructions were generated which made some intermediate results > more accurate. As the test relies on result comparison, the more > precise results when compared to others led to a gap bigger than > what was tolerated. As I don't know if some actual skia code relies > on results being comparable, I've disabled fused MAC instruction > with -ffp-contract=off for arm64. > > - Modify include/core/SkOnce.h to have barriers work. > > - SK_CPU_ARM64 implies SK_ARM_NEON_MODE_ALWAYS. > > - use existing Xfermode optimisations with modifications that can be > removed in the future when toolchains are ready. Also save a few > instructions in two Xfermodes (will apply to ARM too). > > - use existing SkBoxBlur and SkMorphology optimisations. > > - use existing SkBlitMask optimisations > > - use existing BitmapProcState and Convolution optimisations. > > Future changes will include: > > - Blitters (only partially merged upstream) > > - SkUtils (there's little value in sending asm optimisations without > having them benchmarked on real hardware). 
> > Signed-off-by: Kevin PETIT <kevin.petit@arm.com> > > BUG=skia: > > Committed: http://code.google.com/p/skia/source/detail?r=13980 R=djsollen@google.com, reed@google.com, halcanary@google.com, kevin.petit@arm.com TBR=djsollen@google.com, halcanary@google.com, kevin.petit@arm.com, reed@google.com NOTREECHECKS=true NOTRY=true BUG=skia: Author: mtklein@google.com Review URL: https://codereview.chromium.org/216113005 git-svn-id: http://skia.googlecode.com/svn/trunk@13983 2bbb7eff-a529-9590-31e7-b0007b416f81
*	ARM Skia NEON patches - 35 - First AArch64 support	commit-bot@chromium.org	2014-03-28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Aarch64 support This change contains the necessary modifications to have Skia build and run properly on an ARMv8 processor in aarch64 execution state. Here's a list of the changes: - add an arm64 target to the build system + SK_CPU_ARM64 flag - MatrixTest was failing when built in Release mode. Fused MAC instructions were generated which made some intermediate results more accurate. As the test relies on result comparison, the more precise results when compared to others led to a gap bigger than what was tolerated. As I don't know if some actual skia code relies on results being comparable, I've disabled fused MAC instruction with -ffp-contract=off for arm64. - Modify include/core/SkOnce.h to have barriers work. - SK_CPU_ARM64 implies SK_ARM_NEON_MODE_ALWAYS. - use existing Xfermode optimisations with modifications that can be removed in the future when toolchains are ready. Also save a few instructions in two Xfermodes (will apply to ARM too). - use existing SkBoxBlur and SkMorphology optimisations. - use existing SkBlitMask optimisations - use existing BitmapProcState and Convolution optimisations. Future changes will include: - Blitters (only partially merged upstream) - SkUtils (there's little value in sending asm optimisations without having them benchmarked on real hardware). Signed-off-by: Kevin PETIT <kevin.petit@arm.com> BUG=skia: R=djsollen@google.com, reed@google.com, mtklein@google.com, halcanary@google.com Author: kevin.petit@arm.com Review URL: https://codereview.chromium.org/143423004 git-svn-id: http://skia.googlecode.com/svn/trunk@13980 2bbb7eff-a529-9590-31e7-b0007b416f81
*	Allow toString capability to be toggled independent of developer mode.	commit-bot@chromium.org	2014-03-13
\| \| \| \| \| \| \| \| \| \| \| \|	This change is motivated by the desire to see the text information in the debugger when not in developer mode. It is structured so users can disable it if the capability is not wanted. R=bsalomon@google.com Author: robertphillips@google.com Review URL: https://codereview.chromium.org/197763008 git-svn-id: http://skia.googlecode.com/svn/trunk@13795 2bbb7eff-a529-9590-31e7-b0007b416f81