skia - 2D graphics library

	Commit message	Author	Age
*	Re-enable SSE4.	mtklein	2014-06-30
	I will roll this into Chrome with https://codereview.chromium.org/332393003.
	BUG=skia:
	R=reed@google.com, mtklein@google.com
	Author: mtklein@chromium.org
	Review URL: https://codereview.chromium.org/357593003
*	ARM Skia NEON patches - 41 - arm64: SkXfermode::xfer32	kevin.petit	2014-06-30
	Currently the NEON code for Xfermodes performs well on arm64 targets except for dstout and dstin which are significantly slower than the C code. This patch fixes this and gives further improvements on other modes.
	Here are some perf results:
	+------------+------------+------------+
	| mode       | Cortex-A53 | Cortex-A57 |
	+------------+------------+------------+
	| multiply   | +24.58%    | +23.71%    |
	| exclusion  | +22.72%    | +22.05%    |
	| difference | +34.67%    | +36.82%    |
	| hardlight  | +17.07%    | +14.74%    |
	| lighten    | +38.21%    | +32.87%    |
	| darken     | +37.59%    | +32.99%    |
	| overlay    | +17.36%    | +16.88%    |
	| screen     | +52.56%    | +54.43%    |
	| modulate   | +62.85%    | +61.32%    |
	| plus       | +91.52%    | +117.41%   |
	| xor        | +42.86%    | +43.38%    |
	| dstatop    | +48.46%    | +48.99%    |
	| srcatop    | +50.50%    | +48.51%    |
	| dstout     | +67.83%    | +78.09%    |
	| srcout     | +69.02%    | +78.26%    |
	| dstin      | +70.92%    | +79.24%    |
	| srcin      | +68.90%    | +78.23%    |
	| dstover    | +73.80%    | +68.10%    |
	+------------+------------+------------+
	Signed-off-by: Kévin PETIT <kevin.petit@arm.com>
	BUG=skia
	R=mtklein@google.com, djsollen@google.com
	Author: kevin.petit@arm.com
	Review URL: https://codereview.chromium.org/350343002
*	Disable SSE4 code.	mtklein	2014-06-27
	Chrome canary failing to link chrome:
	http://108.170.220.120:10115/builders/Canary-Chrome-Ubuntu13.10-Ninja-x86_64-ToT/builds/1009/steps/BuildChrome/logs/stdio
	BUG=skia:
	NOTRY=true
	R=mtklein@google.com, rmistry@google.com
	Author: mtklein@chromium.org
	Review URL: https://codereview.chromium.org/361493002
*	Refactor bitmap scaler to make it easier to migrate rest of chrome to use it	humper	2014-06-27
	Previously, the set of platform-specific function pointers to do fast convolution (e.g., NEON, SSE) was passed in a structure to the scaler. I refactored this so that the scaler fills in these function pointers after it's called, so the caller doesn't have to worry about it.
	R=mtklein@google.com
	TBR=mtklein
	NOTRY=True
	Author: humper@google.com
	Review URL: https://codereview.chromium.org/354193002
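The refactoring described above can be pictured with a minimal sketch. Every name here (`ConvolveProcs`, `platform_convolve_procs`, `scale_row`) is hypothetical and is not Skia's actual API, and the 3-tap filter is only a stand-in for real convolution code. The point is the ownership change: the scaler looks up its own function pointers instead of receiving them from the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// All names below are hypothetical; none of them are Skia's real API.
using ConvolveRowProc = void (*)(const float* src, std::size_t n, float* dst);

struct ConvolveProcs {
    ConvolveRowProc convolveRow = nullptr;
};

// Portable 3-tap smoothing filter (weights 1/4, 1/2, 1/4) with clamped edges.
static void convolve_row_portable(const float* src, std::size_t n, float* dst) {
    for (std::size_t i = 0; i < n; ++i) {
        const float left  = src[i == 0 ? 0 : i - 1];
        const float right = src[i + 1 == n ? i : i + 1];
        dst[i] = 0.25f * left + 0.5f * src[i] + 0.25f * right;
    }
}

// Stand-in for the platform hook; a real one would pick NEON or SSE variants.
static void platform_convolve_procs(ConvolveProcs* procs) {
    procs->convolveRow = convolve_row_portable;
}

// The scaler selects its own procs, so callers pass only pixel data.
std::vector<float> scale_row(const std::vector<float>& src) {
    ConvolveProcs procs;
    platform_convolve_procs(&procs);
    assert(procs.convolveRow != nullptr);

    std::vector<float> dst(src.size());
    if (!src.empty()) {
        procs.convolveRow(src.data(), src.size(), dst.data());
    }
    return dst;
}
```

A caller would simply write `scale_row(row)` and never touch `ConvolveProcs`, which is the simplification the commit message is after.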
*	Add SSE4 optimization of S32A_Opaque_Blitrow	henrik.smiding	2014-06-27
	Adds optimization of Skia S32A_Opaque_Blitrow blitter using SSE4.2 SIMD instruction set. Special case for when alpha is zero or opaque.
	Performance increase of 10%-400% compared to the existing SSE2 optimization (measured on Silvermont architecture). Noticeable in ~25 different skia bench subtests, especially in bitmap_8888_, repeatTile_, and morph_*.
	bitmap_8888_A - 100% faster
	bitmap_8888_A_source_transparent - 250% faster
	bitmap_8888_A_source_opaque - 25% faster
	bitmap_8888_A_scale_bicubic - 75% faster
	Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>
	Committed: https://skia.googlesource.com/skia/+/e2527b147679b0c43019fae7d59cc3777d2d097e
	Committed: https://skia.googlesource.com/skia/+/b5c281e1e06af3be804309877de1dac6145686b9
	R=reed@google.com, mtklein@google.com, tomhudson@google.com, djsollen@google.com, joakim.landberg@intel.com
	Author: henrik.smiding@intel.com
	Review URL: https://codereview.chromium.org/289473009
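The "special case for when alpha is zero or opaque" can be illustrated with a scalar sketch. This is not the SSE4 code from the patch, which is hand-written assembly working on several pixels at once. It is a plain C++ illustration that assumes premultiplied pixels with alpha in the top byte and a hypothetical function name `blit_row_srcover`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Blend one 8-bit channel: src + dst * (256 - srcAlpha) / 256.
static inline uint32_t blend_channel(uint32_t s, uint32_t d, uint32_t invAlpha256) {
    return s + ((d * invAlpha256) >> 8);
}

// Source-over for premultiplied 32-bit pixels (assumption: alpha in bits 24-31,
// and every colour channel is <= alpha, as premultiplication guarantees).
void blit_row_srcover(uint32_t* dst, const uint32_t* src, std::size_t count) {
    assert(count == 0 || (dst != nullptr && src != nullptr));
    for (std::size_t i = 0; i < count; ++i) {
        const uint32_t s = src[i];
        const uint32_t a = s >> 24;
        if (a == 0) {
            continue;        // fully transparent source: dst is left unchanged
        }
        if (a == 255) {
            dst[i] = s;      // opaque source: plain copy, no multiply needed
            continue;
        }
        const uint32_t inv = 256 - a;
        const uint32_t d = dst[i];
        uint32_t out = 0;
        for (int shift = 0; shift < 32; shift += 8) {
            const uint32_t sc = (s >> shift) & 0xFF;
            const uint32_t dc = (d >> shift) & 0xFF;
            out |= (blend_channel(sc, dc, inv) & 0xFF) << shift;
        }
        dst[i] = out;
    }
}
```

Both fast paths give the same result as the general formula for valid premultiplied input, so they only save work. That is consistent with the larger gains reported for the `_source_transparent` bench above.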
*	ARM Skia NEON patches - 40 - arm64: S32A_D565_Opaque	kevin.petit	2014-06-26
	Here are some perf results:
	+-------+------------+------------+
	| count | Cortex-A53 | Cortex-A57 |
	+-------+------------+------------+
	| 1     | -2.54%     | -5.39%     |
	| 2     | -0.66%     | -2.08%     |
	| 4     | -11.13%    | 0.00%      |
	| 8     | -5.79%     | -1.30%     |
	| 16    | 71.60%     | 93.27%     |
	| 64    | 30.99%     | 57.35%     |
	| 256   | 25.41%     | 52.59%     |
	| 1024  | 25.56%     | 53.76%     |
	+-------+------------+------------+
	Signed-off-by: Kévin PETIT <kevin.petit@arm.com>
	BUG=skia:
	R=mtklein@google.com, djsollen@google.com
	Author: kevin.petit@arm.com
	Review URL: https://codereview.chromium.org/346843003
*	Fix SkBlitRow_opts_arm so that it works on ARM v4t.	george	2014-06-20
	Original Mozilla bug: https://bugzilla.mozilla.org/show_bug.cgi?id=901208
	R=reed@google.com, mtklein@google.com, reed1
	BUG=skia:
	Author: george@mozilla.com
	Review URL: https://codereview.chromium.org/337853003
*	Revert of Add SSE4 optimization of S32A_Opaque_Blitrow ↵	mtklein	2014-06-17
	(https://codereview.chromium.org/289473009/)
	NOTREECHECKS=true
	NOTRY=true
	Reason for revert:
	Valgrind bot's seeing this code use uninitialized memory, and it's somehow blocking our roll into Chrome too:
	> ld: warning: could not create compact unwind for S32A_Opaque_BlitRow32_SSE4_asm:
	> stack subq instruction is too different from dwarf stack size
	> [10339/10982 | 3247.792] PACKAGE FRAMEWORK "Chromium Framework.framework",
	> POSTBUILDS
	> FAILED: ./gyp-mac-tool package-framework "Chromium Framework.framework" A &&
	> (export
	> BUILT_PRODUCTS_DIR=/Volumes/data/b/build/slave/mac_gpu/build/src/out/Release;
	> export CONFIGURATION=Release; export CONTENTS_FOLDER_PATH="Chromium
	> Framework.framework/Versions/A"; export
	> DYLIB_INSTALL_NAME_BASE=@executable_path/../Versions/37.0.2056.0; export
	> EXECUTABLE_NAME="Chromium Framework"; export EXECUTABLE_PATH="Chromium
	> Framework.framework/Versions/A/Chromium Framework"; export
	> FULL_PRODUCT_NAME="Chromium Framework.framework"; export
	> INFOPLIST_PATH="Chromium Framework.framework/Versions/A/Resources/Info.plist";
	> export LD_DYLIB_INSTALL_NAME="@executable_path/../Versions/37.0.2056.0/Chromium
	> Framework.framework/Chromium Framework"; export MACH_O_TYPE=mh_dylib; export
	> PRODUCT_NAME="Chromium Framework"; export
	> PRODUCT_TYPE=com.apple.product-type.framework; export
	> SDKROOT=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.6.sdk;
	> export
	> SRCROOT=/Volumes/data/b/build/slave/mac_gpu/build/src/out/Release/../../chrome;
	> export SOURCE_ROOT="${SRCROOT}"; export
	> TARGET_BUILD_DIR=/Volumes/data/b/build/slave/mac_gpu/build/src/out/Release;
	> export TEMP_DIR="${TMPDIR}"; export UNLOCALIZED_RESOURCES_FOLDER_PATH="Chromium
	> Framework.framework/Versions/A/Resources"; export WRAPPER_NAME="Chromium
	> Framework.framework"; (cd ../../chrome && ../build/mac/tweak_info_plist.py
	> "--breakpad=1" "--breakpad_uploads=0" "--keystone=0" "--scm=1"
	> "--branding=Chromium" && ln -fns Versions/Current/Libraries
	> "${BUILT_PRODUCTS_DIR}/${WRAPPER_NAME}/Libraries" &&
	> tools/build/mac/verify_order _ChromeMain
	> "${BUILT_PRODUCTS_DIR}/${EXECUTABLE_PATH}"); G=$?; ((exit $G) || rm -rf
	> 'Chromium Framework.framework') && exit $G) && touch "Chromium
	> Framework.framework"
	> tools/build/mac/verify_order: unordered symbols in
	> /Volumes/data/b/build/slave/mac_gpu/build/src/out/Release/Chromium
	> Framework.framework/Versions/A/Chromium Framework:
	> S32A_Opaque_BlitRow32_SSE4_asm
	> _S32A_Opaque_BlitRow32_SSE4_asm
	> ninja: build stopped: subcommand failed.
	Original issue's description:
	> Add SSE4 optimization of S32A_Opaque_Blitrow
	>
	> Adds optimization of Skia S32A_Opaque_Blitrow blitter using SSE4.2 SIMD
	> instruction set. Special case for when alpha is zero or opaque.
	>
	> Performance increase of 10%-400% compared to the existing SSE2
	> optimization (measured on Silvermont architecture).
	> Noticeable in ~25 different skia bench subtests, especially in
	> bitmap_8888_, repeatTile_, and morph_*.
	>
	> bitmap_8888_A - 100% faster
	> bitmap_8888_A_source_transparent - 250% faster
	> bitmap_8888_A_source_opaque - 25% faster
	> bitmap_8888_A_scale_bicubic - 75% faster
	>
	> Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>
	>
	> Committed: https://skia.googlesource.com/skia/+/e2527b147679b0c43019fae7d59cc3777d2d097e
	>
	> Committed: https://skia.googlesource.com/skia/+/b5c281e1e06af3be804309877de1dac6145686b9
	R=reed@google.com, tomhudson@google.com, djsollen@google.com, joakim.landberg@intel.com, henrik.smiding@intel.com, mtklein@chromium.org
	Author: mtklein@google.com
	Review URL: https://codereview.chromium.org/336413007
*	Add SSE4 optimization of S32A_Opaque_Blitrow	henrik.smiding	2014-06-17
	Adds optimization of Skia S32A_Opaque_Blitrow blitter using SSE4.2 SIMD instruction set. Special case for when alpha is zero or opaque.
	Performance increase of 10%-400% compared to the existing SSE2 optimization (measured on Silvermont architecture). Noticeable in ~25 different skia bench subtests, especially in bitmap_8888_, repeatTile_, and morph_*.
	bitmap_8888_A - 100% faster
	bitmap_8888_A_source_transparent - 250% faster
	bitmap_8888_A_source_opaque - 25% faster
	bitmap_8888_A_scale_bicubic - 75% faster
	Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>
	Committed: https://skia.googlesource.com/skia/+/e2527b147679b0c43019fae7d59cc3777d2d097e
	R=reed@google.com, mtklein@google.com, tomhudson@google.com, djsollen@google.com, joakim.landberg@intel.com
	Author: henrik.smiding@intel.com
	Review URL: https://codereview.chromium.org/289473009
*	Revert of Temporarily limit x86 SIMD to SSE2 only, to see effect on all ↵	mtklein	2014-06-16
	benches and bots. (https://codereview.chromium.org/331193004/)
	Reason for revert:
	Experiment is over: disabling SSSE3 is a 25-50% perf regression for bitmap scaling on every machine we've got.
	Original issue's description:
	> Temporarily limit x86 SIMD to SSE2 only, to see effect on all benches and bots.
	>
	> BUG=372232
	>
	> Committed: https://skia.googlesource.com/skia/+/f1e5a04832e4d350f9ebf5d556c6d3897345f883
	R=reed@google.com, mtklein@chromium.org
	TBR=mtklein@chromium.org, reed@google.com
	NOTREECHECKS=true
	NOTRY=true
	BUG=372232
	Author: mtklein@google.com
	Review URL: https://codereview.chromium.org/332213005
*	Temporarily limit x86 SIMD to SSE2 only, to see effect on all benches and bots.	mtklein	2014-06-16
	BUG=372232
	R=reed@google.com, mtklein@google.com
	Author: mtklein@chromium.org
	Review URL: https://codereview.chromium.org/331193004
*	MIPS: added optimization for functions from SkBlitRow.	djordje.pesut	2014-06-11
	Gain is ~40%. The following functions are optimized:
	S32_D565_Blend
	S32A_D565_Opaque_Dither
	S32_D565_Opaque_Dither
	S32_D565_Blend_Dither
	S32A_D565_Opaque
	S32A_D565_Blend
	S32_Blend_BlitRow32
	R=djsollen@google.com, teodora.petrovic@gmail.com
	Author: djordje.pesut@imgtec.com
	Review URL: https://codereview.chromium.org/326913004
*	ARM Skia NEON patches - 39 - arm64 565 blitters	kevin.petit	2014-06-06
	This enables all 565 blitters except S32A_D565_Opaque.
	Here are some performance results:
	S32_D565_Opaque:
	================
	+-------+------------+------------+
	| count | Cortex-A53 | Cortex-A57 |
	+-------+------------+------------+
	| 1     | -18.37%    | -13.04%    |
	| 2     | -9.90%     | -13.78%    |
	| 4     | -8.28%     | -6.77%     |
	| 8     | 157.63%    | 78.15%     |
	| 16    | 72.67%     | 44.81%     |
	| 64    | 76.78%     | 40.89%     |
	| 256   | 73.85%     | 36.05%     |
	| 1024  | 75.73%     | 36.70%     |
	+-------+------------+------------+
	S32_D565_Blend:
	===============
	+-------+------------+------------+
	| count | Cortex-A53 | Cortex-A57 |
	+-------+------------+------------+
	| 1     | -9.99%     | -13.79%    |
	| 2     | -9.17%     | -6.74%     |
	| 4     | -6.73%     | -4.42%     |
	| 8     | 163.31%    | 112.82%    |
	| 16    | 55.21%     | 44.68%     |
	| 64    | 54.09%     | 41.99%     |
	| 256   | 52.63%     | 40.64%     |
	| 1024  | 52.46%     | 40.45%     |
	+-------+------------+------------+
	S32A_D565_Blend:
	================
	+-------+------------+------------+
	| count | Cortex-A53 | Cortex-A57 |
	+-------+------------+------------+
	| 1     | -5.88%     | -6.06%     |
	| 2     | -4.74%     | -0.01%     |
	| 4     | -5.42%     | -3.03%     |
	| 8     | 78.78%     | 77.96%     |
	| 16    | 98.19%     | 79.61%     |
	| 64    | 111.56%    | 72.60%     |
	| 256   | 113.80%    | 69.96%     |
	| 1024  | 114.42%    | 70.85%     |
	+-------+------------+------------+
	S32_D565_Opaque_Dither:
	=======================
	+-------+------------+------------+
	| count | Cortex-A53 | Cortex-A57 |
	+-------+------------+------------+
	| 1     | -4.18%     | -0.93%     |
	| 2     | -2.43%     | -2.04%     |
	| 4     | -1.09%     | -1.23%     |
	| 8     | 184.89%    | 136.53%    |
	| 16    | 128.64%    | 89.11%     |
	| 64    | 132.68%    | 100.98%    |
	| 256   | 157.02%    | 100.86%    |
	| 1024  | 163.85%    | 103.62%    |
	+-------+------------+------------+
	S32_D565_Blend_Dither:
	======================
	+-------+------------+------------+
	| count | Cortex-A53 | Cortex-A57 |
	+-------+------------+------------+
	| 1     | -4.87%     | 0.01%      |
	| 2     | -2.71%     | 2.97%      |
	| 4     | -2.20%     | 0.28%      |
	| 8     | 149.76%    | 146.80%    |
	| 16    | 85.69%     | 95.77%     |
	| 64    | 88.81%     | 101.39%    |
	| 256   | 97.32%     | 107.22%    |
	| 1024  | 98.08%     | 115.71%    |
	+-------+------------+------------+
	S32A_D565_Opaque_Dither:
	========================
	+-------+------------+------------+
	| count | Cortex-A53 | Cortex-A57 |
	+-------+------------+------------+
	| 1     | -1.86%     | 0.02%      |
	| 2     | -0.58%     | -1.52%     |
	| 4     | -0.75%     | 1.16%      |
	| 8     | 240.74%    | 155.16%    |
	| 16    | 181.97%    | 132.15%    |
	| 64    | 203.11%    | 136.48%    |
	| 256   | 223.45%    | 133.05%    |
	| 1024  | 225.96%    | 134.05%    |
	+-------+------------+------------+
	Signed-off-by: Kévin PETIT <kevin.petit@arm.com>
	BUG=skia:
	R=djsollen@google.com, mtklein@google.com
	Author: kevin.petit@arm.com
	Review URL: https://codereview.chromium.org/317193003
*	Revert of Add SSE4 optimization of S32A_Opaque_Blitrow ↵	jvanverth	2014-06-05
	(https://codereview.chromium.org/289473009/)
	Reason for revert:
	Buildbot failures on Mac 10.6 and Mac 10.7.
	R=reed@google.com, mtklein@google.com, tomhudson@google.com, djsollen@google.com, joakim.landberg@intel.com, henrik.smiding@intel.com
	TBR=reed@google.com
	NOTRY=True
	Original issue's description:
	> Add SSE4 optimization of S32A_Opaque_Blitrow
	>
	> Adds optimization of Skia S32A_Opaque_Blitrow blitter using SSE4.2 SIMD
	> instruction set. Special case for when alpha is zero or opaque.
	>
	> Performance increase of 10%-400% compared to the existing SSE2
	> optimization (measured on Silvermont architecture).
	> Noticeable in ~25 different skia bench subtests, especially in
	> bitmap_8888_, repeatTile_, and morph_*.
	>
	> bitmap_8888_A - 100% faster
	> bitmap_8888_A_source_transparent - 250% faster
	> bitmap_8888_A_source_opaque - 25% faster
	> bitmap_8888_A_scale_bicubic - 75% faster
	>
	> Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>
	>
	> Committed: https://skia.googlesource.com/skia/+/e2527b147679b0c43019fae7d59cc3777d2d097e
	Author: jvanverth@google.com
	Review URL: https://codereview.chromium.org/311053009
*	Add SSE4 optimization of S32A_Opaque_Blitrow	henrik.smiding	2014-06-05
	Adds optimization of Skia S32A_Opaque_Blitrow blitter using SSE4.2 SIMD instruction set. Special case for when alpha is zero or opaque.
	Performance increase of 10%-400% compared to the existing SSE2 optimization (measured on Silvermont architecture). Noticeable in ~25 different skia bench subtests, especially in bitmap_8888_, repeatTile_, and morph_*.
	bitmap_8888_A - 100% faster
	bitmap_8888_A_source_transparent - 250% faster
	bitmap_8888_A_source_opaque - 25% faster
	bitmap_8888_A_scale_bicubic - 75% faster
	Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>
	R=reed@google.com, mtklein@google.com, tomhudson@google.com, djsollen@google.com, joakim.landberg@intel.com
	Author: henrik.smiding@intel.com
	Review URL: https://codereview.chromium.org/289473009
*	SK_CPU_ARM --> SK_CPU_ARM32	mtklein	2014-06-03
	That's what it means. It keeps confusing us as named today.
	BUG=skia:
	R=djsollen@google.com, mtklein@google.com, reed@google.com
	Author: mtklein@chromium.org
	Review URL: https://codereview.chromium.org/314643004
*	ARM Skia NEON patches - 38 - arm64 8888 blitters	kevin.petit	2014-06-03
	Enable NEON on arm64 for most 8888 blitters
	This patch enables NEON optimisation for the Color32, S32_Blend, S32A_Opaque blitters on arm64.
	Here are the perf improvements vs the existing code:
	Color32:
	========
	+-------+------------+------------+
	| count | Cortex-A53 | Cortex-A57 |
	+-------+------------+------------+
	| 1     | -2.39%     | 23.78%     |
	| 2     | -5.46%     | 8.88%      |
	| 4     | -4.74%     | 4.89%      |
	| 8     | 67.74%     | 107.12%    |
	| 16    | 40.03%     | 101.20%    |
	| 64    | 11.09%     | 98.40%     |
	| 256   | -2.20%     | 74.81%     |
	| 1024  | -4.28%     | 78.90%     |
	+-------+------------+------------+
	S32_Blend:
	==========
	+-------+------------+------------+
	| count | Cortex-A53 | Cortex-A57 |
	+-------+------------+------------+
	| 1     | 7.84%      | -6.75%     |
	| 2     | 28.95%     | 39.77%     |
	| 4     | 5.80%      | 8.26%      |
	| 8     | 1.35%      | 33.80%     |
	| 16    | -2.13%     | 41.13%     |
	| 64    | -4.91%     | 42.84%     |
	| 256   | -6.53%     | 48.72%     |
	| 1024  | -6.65%     | 46.66%     |
	+-------+------------+------------+
	S32A_Opaque:
	============
	+-------+------------+------------+
	| count | Cortex-A53 | Cortex-A57 |
	+-------+------------+------------+
	| 1     | -7.51%     | -19.06%    |
	| 2     | -5.02%     | -27.70%    |
	| 4     | 15.38%     | -21.66%    |
	| 8     | -0.98%     | 1.05%      |
	| 16    | -7.35%     | 3.34%      |
	| 64    | 50.53%     | 94.63%     |
	| 256   | 71.17%     | 164.10%    |
	| 1024  | 79.58%     | 197.60%    |
	+-------+------------+------------+
	Signed-off-by: Kevin PETIT <kevin.petit@arm.com>
	BUG=skia:
	R=djsollen@google.com, mtklein@google.com
	Author: kevin.petit@arm.com
	Review URL: https://codereview.chromium.org/302283003
*	use colortype instead of config	reed	2014-06-02
	clone of https://codereview.chromium.org/305133006/
	TBR=
	BUG=skia:
	Author: reed@google.com
	Review URL: https://codereview.chromium.org/301233011
*	Fixing clusterfuzz issue	commit-bot@chromium.org	2014-05-30
	When reading an SkSSE2ProcCoeffXfermode object, fProcSIMD should never be NULL. The reason for this is that it's not possible to create such an object through SkPlatformXfermodeFactory_impl_SSE2(), which is the only function used to create these objects, so if we're reading one, it's clearly invalid.
	BUG=379181
	R=reed@google.com, mtklein@google.com
	Author: sugoi@chromium.org
	Review URL: https://codereview.chromium.org/306183002
	git-svn-id: http://skia.googlecode.com/svn/trunk@15000 2bbb7eff-a529-9590-31e7-b0007b416f81
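The invariant argued for above (a deserialized object must never carry a NULL SIMD proc) can be sketched as a reader that rejects such input instead of constructing the object. The types and functions below are hypothetical stand-ins rather than the real `SkSSE2ProcCoeffXfermode` code, and the mode numbering is invented for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Hypothetical stand-ins; these are not Skia's types or functions.
using SimdProc = uint32_t (*)(uint32_t src, uint32_t dst);

static uint32_t keep_src(uint32_t src, uint32_t) { return src; }
static uint32_t keep_dst(uint32_t, uint32_t dst) { return dst; }

// Modes without a SIMD implementation map to nullptr.
static SimdProc simd_proc_for_mode(uint32_t mode) {
    switch (mode) {
        case 0:  return keep_src;
        case 1:  return keep_dst;
        default: return nullptr;
    }
}

struct SimdXfermode {
    uint32_t mode;
    SimdProc proc;
};

// Reads a serialized mode; input that would produce a null proc is treated
// as invalid and yields no object at all.
std::unique_ptr<SimdXfermode> read_simd_xfermode(uint32_t serializedMode) {
    SimdProc proc = simd_proc_for_mode(serializedMode);
    if (proc == nullptr) {
        return nullptr;
    }
    std::unique_ptr<SimdXfermode> xfer(new SimdXfermode{serializedMode, proc});
    assert(xfer->proc != nullptr);
    return xfer;
}
```

Callers then only need to check the returned pointer, and code that later invokes `proc` can rely on it being non-null.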
*	replace config() with colorType()	commit-bot@chromium.org	2014-05-29
	BUG=skia:
	R=robertphillips@google.com
	Author: reed@google.com
	Review URL: https://codereview.chromium.org/303543009
	git-svn-id: http://skia.googlecode.com/svn/trunk@14959 2bbb7eff-a529-9590-31e7-b0007b416f81
*	SSE2 implementation of memcpy32	commit-bot@chromium.org	2014-05-21
	With the SSE2 version of memcpy32, S32_Opaque_BlitRow32() in SkBlitRow_D32.cpp shows about a 30% performance improvement. Here are the data on desktop i7-3770.
	before:
	bitmap_scale_filter_90_90 8888: cmsecs = 2.01
	bitmaprect_FF_filter_trans 8888: cmsecs = 3.61
	bitmaprect_FF_nofilter_trans 8888: cmsecs = 3.57
	bitmaprect_FF_filter_identity 8888: cmsecs = 3.53
	bitmaprect_FF_nofilter_identity 8888: cmsecs = 3.53
	bitmap_4444_update 8888: cmsecs = 4.84
	bitmap_4444_update_volatile 8888: cmsecs = 4.81
	bitmap_4444 8888: cmsecs = 4.81
	after:
	bitmap_scale_filter_90_90 8888: cmsecs = 1.83
	bitmaprect_FF_filter_trans 8888: cmsecs = 2.36
	bitmaprect_FF_nofilter_trans 8888: cmsecs = 2.36
	bitmaprect_FF_filter_identity 8888: cmsecs = 2.60
	bitmaprect_FF_nofilter_identity 8888: cmsecs = 2.63
	bitmap_4444_update 8888: cmsecs = 3.30
	bitmap_4444_update_volatile 8888: cmsecs = 3.30
	bitmap_4444 8888: cmsecs = 3.29
	BUG=skia:
	R=mtklein@google.com, reed@google.com, bsalomon@google.com
	Author: qiankun.miao@intel.com
	Review URL: https://codereview.chromium.org/285313002
	git-svn-id: http://skia.googlecode.com/svn/trunk@14822 2bbb7eff-a529-9590-31e7-b0007b416f81
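For readers unfamiliar with the idea, a 32-bit-element copy using SSE2 might look roughly like the sketch below. It is not the code from this patch. The function name `memcpy32_sketch` is made up, it uses unaligned loads and stores for simplicity where a tuned version would likely handle alignment explicitly, and it falls back to `std::memcpy` when the compiler does not define `__SSE2__`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

// Copies `count` 32-bit values from src to dst (buffers must not overlap).
void memcpy32_sketch(uint32_t* dst, const uint32_t* src, std::size_t count) {
    assert(count == 0 || (dst != nullptr && src != nullptr));
#if defined(__SSE2__)
    // Move four 32-bit values (16 bytes) per iteration.
    while (count >= 4) {
        const __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(dst), v);
        src += 4;
        dst += 4;
        count -= 4;
    }
#endif
    // Remaining tail, or the whole buffer when SSE2 is unavailable.
    if (count > 0) {
        std::memcpy(dst, src, count * sizeof(uint32_t));
    }
}
```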
*	Undo troublesome SSE 4.1 change for now to unblock Skia -> Chrome roll.	commit-bot@chromium.org	2014-05-20
	BUG=chromium:374796
	NOTREECHECKS=true
	R=fmalita@chromium.org, mtklein@google.com
	Author: mtklein@chromium.org
	Review URL: https://codereview.chromium.org/292563005
	git-svn-id: http://skia.googlecode.com/svn/trunk@14816 2bbb7eff-a529-9590-31e7-b0007b416f81
*	Add missing include in SkBlurImage optimization	commit-bot@chromium.org	2014-05-19
	Adds the missing include for smmintrin.h in the SkBlurImage_opts_SSE2.cpp file.
	Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>
	BUG=chromium:374796
	TEST=Unknown
	R=tomhudson@chromium.org, vapier@chromium.org, reed@chromium.org, bsalomon@chromium.org, dgreid@chromium.org, dgarrett@chromium.org, michaelpg@chromium.org, vandebo@chromium.org
	Author: henrik.smiding@intel.com
	Review URL: https://codereview.chromium.org/290923002
	git-svn-id: http://skia.googlecode.com/svn/trunk@14792 2bbb7eff-a529-9590-31e7-b0007b416f81
*	Add SSE4 check to BlurImage optimization.	commit-bot@chromium.org	2014-05-15
	Adds a build-time SSE4 check to SkBlurImage_opts_SSE2.cpp in the SkBoxBlur_SSE2 function.
	Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>
	R=reed@google.com, mtklein@google.com, tomhudson@google.com, djsollen@google.com, joakim.landberg@intel.com, senorblanco@chromium.org
	Author: henrik.smiding@intel.com
	Review URL: https://codereview.chromium.org/281963002
	git-svn-id: http://skia.googlecode.com/svn/trunk@14750 2bbb7eff-a529-9590-31e7-b0007b416f81
*	Always inline Filter_32_*_neon functions	commit-bot@chromium.org	2014-05-13
	The functions are rather performance critical and already marked 'inline'. However, Chrome for Android will not have these functions inlined due to it being compiled with -Os and a small -finline-limit.
	This avoids one call in the filtering functions. Does not increase the library size.
	BUG=chromium:363073
	R=mtklein@google.com
	Author: kkinnunen@nvidia.com
	Review URL: https://codereview.chromium.org/280403005
	git-svn-id: http://skia.googlecode.com/svn/trunk@14709 2bbb7eff-a529-9590-31e7-b0007b416f81
*	Make gMask_00FF00FF a constant	commit-bot@chromium.org	2014-05-12
	This is to optimize SkAlphaMulQ() in PIC mode. With the visibility=default symbol the constant is not known at compile time (and is not a constant), but instead is fetched through a double indirection through GOT.
	The function is quite hot on one of the chromium benchmarks: rasterize_and_record_micro.key_silk_cases.
	This change replaces the symbol with a compile-time constant. As a bonus the variable is not exported from the dynamic library, i.e. a cleaner library interface.
	See specific performance improvements on Android here: http://goo.gl/iMuTDt
	R=skyostil@chromium.org, tomhudson@chromium.org, mtklein@google.com, reed@google.com, tomhudson@google.com
	Author: pasko@chromium.org
	Review URL: https://codereview.chromium.org/270473003
	git-svn-id: http://skia.googlecode.com/svn/trunk@14696 2bbb7eff-a529-9590-31e7-b0007b416f81
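To make the point about the mask concrete, here is a sketch of a packed alpha multiply in the style of SkAlphaMulQ(), written with a file-local compile-time constant. The names `kMask_00FF00FF` and `alpha_mul_q` are illustrative and not necessarily those used in Skia, and `scale` is assumed to be in the range 0..256.

```cpp
#include <cassert>
#include <cstdint>

// Internal linkage and a constant initializer: the compiler can fold the value
// into the instructions instead of loading it through the GOT at run time.
static const uint32_t kMask_00FF00FF = 0x00FF00FFu;

// Multiplies all four 8-bit channels of `c` by scale/256, two channels at a time.
static inline uint32_t alpha_mul_q(uint32_t c, unsigned scale) {
    assert(scale <= 256);
    const uint32_t rb = ((c & kMask_00FF00FF) * scale) >> 8;   // bits 0-7, 16-23
    const uint32_t ag = ((c >> 8) & kMask_00FF00FF) * scale;   // bits 8-15, 24-31
    return (rb & kMask_00FF00FF) | (ag & ~kMask_00FF00FF);
}
```

With the mask known at compile time, the two `&` operations take an immediate operand, which is the saving the commit describes for position-independent code.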
*	Improved x86 SSE build and run-time checks.	commit-bot@chromium.org	2014-05-12
	Replaces the current build/run-time checks for SSE level in opts_check_x86.cpp with a simpler and more future-proof version. Also adds SSE versions 4.1 and 4.2 to the config file.
	Author: henrik.smiding@intel.com
	Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>
	Committed: http://code.google.com/p/skia/source/detail?r=14644
	R=reed@google.com, mtklein@google.com, tomhudson@google.com, djsollen@google.com, joakim.landberg@intel.com
	Author: henrik.smiding@intel.com
	Review URL: https://codereview.chromium.org/272503006
	git-svn-id: http://skia.googlecode.com/svn/trunk@14693 2bbb7eff-a529-9590-31e7-b0007b416f81
*	Revert of Improved x86 SSE build and run-time checks. ↵	commit-bot@chromium.org	2014-05-08
	(https://codereview.chromium.org/272503006/)
	Reason for revert:
	Windows builders breaking. :(
	Original issue's description:
	> Improved x86 SSE build and run-time checks.
	>
	> Replaces the current build/run-time checks for SSE level in
	> opts_check_x86.cpp with a simpler and more future-proof version.
	> Also adds SSE versions 4.1 and 4.2 to the config file.
	>
	> Author: henrik.smiding@intel.com
	>
	> Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>
	>
	> Committed: http://code.google.com/p/skia/source/detail?r=14644
	R=reed@google.com, tomhudson@google.com, djsollen@google.com, joakim.landberg@intel.com, henrik.smiding@intel.com
	TBR=djsollen@google.com, henrik.smiding@intel.com, joakim.landberg@intel.com, reed@google.com, tomhudson@google.com
	NOTREECHECKS=true
	NOTRY=true
	Author: mtklein@google.com
	Review URL: https://codereview.chromium.org/277593004
	git-svn-id: http://skia.googlecode.com/svn/trunk@14646 2bbb7eff-a529-9590-31e7-b0007b416f81
*	Improved x86 SSE build and run-time checks.	commit-bot@chromium.org	2014-05-08
	Replaces the current build/run-time checks for SSE level in opts_check_x86.cpp with a simpler and more future-proof version. Also adds SSE versions 4.1 and 4.2 to the config file.
	Author: henrik.smiding@intel.com
	Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>
	R=reed@google.com, mtklein@google.com, tomhudson@google.com, djsollen@google.com, joakim.landberg@intel.com
	Author: henrik.smiding@intel.com
	Review URL: https://codereview.chromium.org/272503006
	git-svn-id: http://skia.googlecode.com/svn/trunk@14644 2bbb7eff-a529-9590-31e7-b0007b416f81
*	Cleanup of SSE optimization files.	commit-bot@chromium.org	2014-04-30
	General cleanup of optimization files for x86/SSEx. Renamed the opts_check_SSE2.cpp file to _x86, since it's not specific to SSE2. Commented out the ColorRect32 optimization, since it's disabled anyway, to make it more visible. Also fixed a lot of indentation, inclusion guards, spelling, copyright headers, braces, whitespace, and sorting of includes.
	Author: henrik.smiding@intel.com
	Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>
	R=reed@google.com, mtklein@google.com, tomhudson@google.com, djsollen@google.com, joakim.landberg@intel.com
	Author: henrik.smiding@intel.com
	Review URL: https://codereview.chromium.org/264603002
	git-svn-id: http://skia.googlecode.com/svn/trunk@14464 2bbb7eff-a529-9590-31e7-b0007b416f81
*	Remove the unused SkCachePreload_arm	commit-bot@chromium.org	2014-04-30
	Signed-off-by: Kévin PETIT <kevin.petit@arm.com>
	BUG=skia:
	R=djsollen@google.com, mtklein@google.com
	Author: kevin.petit@arm.com
	Review URL: https://codereview.chromium.org/263553008
	git-svn-id: http://skia.googlecode.com/svn/trunk@14456 2bbb7eff-a529-9590-31e7-b0007b416f81
*	ARM Skia NEON patches - 36 - Color32	commit-bot@chromium.org	2014-04-29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Convert Color32 to intrinsics This change is performance-neutral for high values of count and is a big improvement for values smaller than 64. Signed-off-by: Kévin PETIT <kevin.petit@arm.com> BUG=skia: R=djsollen@google.com, mtklein@google.com, borenet@google.com Author: kevin.petit@arm.com Review URL: https://codereview.chromium.org/258173005 git-svn-id: http://skia.googlecode.com/svn/trunk@14435 2bbb7eff-a529-9590-31e7-b0007b416f81
*	Properly enable S32_D16_filter_DX_SSE2 optimization.	commit-bot@chromium.org	2014-04-28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, the S32_D16_filter_DX_SSE2 optimization is only used in configurations where the maximum SSE level is SSE2. This patch enables it for higher levels, as well as fixing a color conversion bug when the subpixels are converted into RGB565 format. Also, refactored the function a bit, to make future modifications less error-prone. Author: henrik.smiding@intel.com Signed-off-by: Henrik Smiding <henrik.smiding@intel.com> Committed: http://code.google.com/p/skia/source/detail?r=14333 R=reed@google.com, mtklein@google.com, tomhudson@google.com, djsollen@google.com, joakim.landberg@intel.com Author: henrik.smiding@intel.com Review URL: https://codereview.chromium.org/239453010 git-svn-id: http://skia.googlecode.com/svn/trunk@14403 2bbb7eff-a529-9590-31e7-b0007b416f81
*	Xfermode: SSE2 implementation of darken&lighten mode	commit-bot@chromium.org	2014-04-28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	With SSE2 optimization, performance of the related two benchmarks will improve about 45% on desktop i7-3770. Here are the data: before: Xfermode_Lighten 8888: cmsecs = 33.60 565: cmsecs = 48.84 Xfermode_Darken 8888: cmsecs = 34.16 565: cmsecs = 48.99 after: Xfermode_Lighten 8888: cmsecs = 18.71 565: cmsecs = 25.41 Xfermode_Darken 8888: cmsecs = 18.39 565: cmsecs = 25.40 BUG=skia: R=mtklein@google.com Author: qiankun.miao@intel.com Review URL: https://codereview.chromium.org/234653002 git-svn-id: http://skia.googlecode.com/svn/trunk@14395 2bbb7eff-a529-9590-31e7-b0007b416f81
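The "about 45%" figure can be checked against the numbers quoted in this message. The sketch below assumes that "improve" means the relative drop in the reported cmsecs value (lower being better), which the message does not state explicitly. `time_reduction_percent` is a hypothetical helper.

```cpp
#include <cassert>

// Percentage by which a measured time dropped: 100 * (before - after) / before.
// Assumption: the benchmark value is a time, so a smaller value is better.
static double time_reduction_percent(double before, double after) {
    assert(before > 0.0);
    return 100.0 * (before - after) / before;
}
```

For example, `time_reduction_percent(33.60, 18.71)` evaluates the Xfermode_Lighten 8888 pair from the message above.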
*	Xfermode: SSE2 implementation of colordodge&colorburn modes	commit-bot@chromium.org	2014-04-25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	With SSE2 optimization, performance of the related benchmarks will improve about 45% for Xfermode_ColorDodge and only slightly for Xfermode_ColorBurn on desktop i7-3770. The small improvement for Xfermode_ColorBurn is because the portable version may mostly take the fast if branch, while the SSE2 version does the calculation for all three if-else branches. Here are the data: before: Xfermode_ColorDodge 8888: cmsecs = 73.71 565: cmsecs = 82.88 Xfermode_ColorBurn 8888: cmsecs = 46.46 565: cmsecs = 52.23 after: Xfermode_ColorDodge 8888: cmsecs = 39.70 565: cmsecs = 47.45 Xfermode_ColorBurn 8888: cmsecs = 45.02 565: cmsecs = 51.15 BUG=skia: R=mtklein@google.com Author: qiankun.miao@intel.com Review URL: https://codereview.chromium.org/224823004 git-svn-id: http://skia.googlecode.com/svn/trunk@14377 2bbb7eff-a529-9590-31e7-b0007b416f81
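The point about evaluating every branch can be illustrated with a generic SSE2 pattern: compute the result of each branch for all lanes, then pick per lane with comparison masks. The sketch below is not the ColorBurn code; `three_way_scalar` and `three_way_sse2` are hypothetical stand-ins for a three-branch function, chosen only to show the technique.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstdint>

// Scalar reference with three branches; usually only one of them does any work.
static int32_t three_way_scalar(int32_t x, int32_t y) {
    if (x == 0) return 0;
    else if (x > y) return 255;
    else return x + y;
}

// Per-lane select: take a where the mask bits are set, otherwise b.
static inline __m128i select_si128(__m128i mask, __m128i a, __m128i b) {
    return _mm_or_si128(_mm_and_si128(mask, a), _mm_andnot_si128(mask, b));
}

// Same function for four lanes at once. All three branch results are
// computed for every lane, and the masks decide which one is kept.
static void three_way_sse2(const int32_t x[4], const int32_t y[4], int32_t out[4]) {
    __m128i vx = _mm_loadu_si128(reinterpret_cast<const __m128i*>(x));
    __m128i vy = _mm_loadu_si128(reinterpret_cast<const __m128i*>(y));

    __m128i r_zero = _mm_setzero_si128();      // branch 1 result
    __m128i r_max  = _mm_set1_epi32(255);      // branch 2 result
    __m128i r_sum  = _mm_add_epi32(vx, vy);    // branch 3 result

    __m128i is_zero = _mm_cmpeq_epi32(vx, _mm_setzero_si128());
    __m128i is_gt   = _mm_cmpgt_epi32(vx, vy);

    // Apply the masks in reverse priority so the first branch wins.
    __m128i r = select_si128(is_gt, r_max, r_sum);
    r = select_si128(is_zero, r_zero, r);
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), r);
}
```

When the scalar code nearly always exits through a cheap first branch, the vector version still pays for all three, which is consistent with the small ColorBurn gain reported above.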
*	Xfermode: SSE2 implementation of softlight_modeproc	commit-bot@chromium.org	2014-04-25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	With SSE2 optimization, performance of Xfermode_SoftLight will improve about 30% on desktop i7-3770. Here are the data: before: Xfermode_SoftLight 8888: cmsecs = 379.44 565: cmsecs = 387.74 after: Xfermode_SoftLight 8888: cmsecs = 272.29 565: cmsecs = 284.31 BUG=skia: R=mtklein@google.com Author: qiankun.miao@intel.com Review URL: https://codereview.chromium.org/236363012 git-svn-id: http://skia.googlecode.com/svn/trunk@14376 2bbb7eff-a529-9590-31e7-b0007b416f81
*	Xfermode: SSE2 implementation of difference_modeproc	commit-bot@chromium.org	2014-04-25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	With SSE2 optimization, performance of Xfermode_Difference will improve about 60% on desktop i7-3770. Here are the data: before: Xfermode_Difference 8888: cmsecs = 51.10 565: cmsecs = 66.39 after: Xfermode_Difference 8888: cmsecs = 21.10 565: cmsecs = 29.33 BUG=skia: R=mtklein@google.com Author: qiankun.miao@intel.com Review URL: https://codereview.chromium.org/234433003 git-svn-id: http://skia.googlecode.com/svn/trunk@14375 2bbb7eff-a529-9590-31e7-b0007b416f81
*	Xfermode: SSE2 implementation of hardlight mode	commit-bot@chromium.org	2014-04-25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	With SSE2 optimization, performance of Xfermode_HardLight will improve about 45% on desktop i7-3770. Here are the data: before: Xfermode_HardLight 8888: cmsecs = 48.43 565: cmsecs = 63.11 after: Xfermode_HardLight 8888: cmsecs = 25.71 565: cmsecs = 33.46 BUG=skia: R=mtklein@google.com Author: qiankun.miao@intel.com Review URL: https://codereview.chromium.org/229003004 git-svn-id: http://skia.googlecode.com/svn/trunk@14373 2bbb7eff-a529-9590-31e7-b0007b416f81
*	Xfermode: SSE2 implementation of exclusion_modeproc	commit-bot@chromium.org	2014-04-25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	With SSE2 optimization, performance of Xfermode_Exclusion will improve about 50% on desktop i7-3770. Here are the data: before: Xfermode_Exclusion 8888: cmsecs = 40.17 565: cmsecs = 55.22 after: Xfermode_Exclusion 8888: cmsecs = 18.53 565: cmsecs = 26.55 BUG=skia: R=mtklein@google.com Author: qiankun.miao@intel.com Review URL: https://codereview.chromium.org/233733005 git-svn-id: http://skia.googlecode.com/svn/trunk@14371 2bbb7eff-a529-9590-31e7-b0007b416f81
*	Xfermode: SSE2 implementation of overlay_modeproc	commit-bot@chromium.org	2014-04-25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	With SSE2 optimization, performance of Xfermode_Overlay will improve about 35% on desktop i7-3770. Here are the data: before: Xfermode_Overlay 8888: cmsecs = 44.17 565: cmsecs = 59.27 after: Xfermode_Overlay 8888: cmsecs = 28.30 565: cmsecs = 35.84 BUG=skia: R=mtklein@google.com Author: qiankun.miao@intel.com Review URL: https://codereview.chromium.org/232783002 git-svn-id: http://skia.googlecode.com/svn/trunk@14370 2bbb7eff-a529-9590-31e7-b0007b416f81
*	fix x86 emulator for Android framework.	commit-bot@chromium.org	2014-04-23
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The emulator is the one case where the android framework can be compiled without SSSE3 but be expected to run on a device with SSSE3. In that case we just disable all SSSE3 options to be safe. R=scroggo@google.com Author: djsollen@google.com Review URL: https://codereview.chromium.org/249883004 git-svn-id: http://skia.googlecode.com/svn/trunk@14342 2bbb7eff-a529-9590-31e7-b0007b416f81
*	Revert of Properly enable S32_D16_filter_DX_SSE2 optimization. ↵	commit-bot@chromium.org	2014-04-23
\|	(https://codereview.chromium.org/239453010/)
\|
\|	Reason for revert:
\|	Broke GMs in 565 mode. To repro:
\|
\|	out/Debug/gm --match filterbitmap_image_mandrill -w . --config 565
\|	open filterbitmap_image_mandrill_512.png_565.png
\|
\|	Original issue's description:
\|	> Properly enable S32_D16_filter_DX_SSE2 optimization.
\|	>
\|	> Currently, the S32_D16_filter_DX_SSE2 optimization is only used in
\|	> configurations where the maximum SSE level is SSE2.
\|	> This patch enables it for higher levels, as well.
\|	> Also, refactored the function a bit, to make future modifications
\|	> less error-prone.
\|	>
\|	> Author: henrik.smiding@intel.com
\|	>
\|	> Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>
\|	>
\|	> Committed: http://code.google.com/p/skia/source/detail?r=14333
\|
\|	R=reed@google.com, mtklein@google.com, tomhudson@google.com, djsollen@google.com, joakim.landberg@intel.com, henrik.smiding@intel.com
\|	TBR=djsollen@google.com, henrik.smiding@intel.com, joakim.landberg@intel.com, mtklein@google.com, reed@google.com, tomhudson@google.com
\|	NOTREECHECKS=true
\|	NOTRY=true
\|
\|	Author: bsalomon@google.com
\|
\|	Review URL: https://codereview.chromium.org/246393013
\|
\|	git-svn-id: http://skia.googlecode.com/svn/trunk@14336 2bbb7eff-a529-9590-31e7-b0007b416f81
*	Properly enable S32_D16_filter_DX_SSE2 optimization.	commit-bot@chromium.org	2014-04-23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, the S32_D16_filter_DX_SSE2 optimization is only used in configurations where the maximum SSE level is SSE2. This patch enables it for higher levels, as well. Also, refactored the function a bit, to make future modifications less error-prone. Author: henrik.smiding@intel.com Signed-off-by: Henrik Smiding <henrik.smiding@intel.com> R=reed@google.com, mtklein@google.com, tomhudson@google.com, djsollen@google.com, joakim.landberg@intel.com Author: henrik.smiding@intel.com Review URL: https://codereview.chromium.org/239453010 git-svn-id: http://skia.googlecode.com/svn/trunk@14333 2bbb7eff-a529-9590-31e7-b0007b416f81
*	Xfermode: SSE2 implementation of a number of simple transfer modes	commit-bot@chromium.org	2014-04-14
\|	These modes share some common code and are not very complex, so group them together.
\|	This CL yields about 50% performance improvement on desktop i7-3770. Here are the data:
\|
\|	before:
\|	Xfermode_Screen 8888: cmsecs = 30.25 565: cmsecs = 46.81
\|	Xfermode_Modulate 8888: cmsecs = 22.48 565: cmsecs = 40.06
\|	Xfermode_Plus 8888: cmsecs = 21.04 565: cmsecs = 37.51
\|	Xfermode_Xor 8888: cmsecs = 37.18 565: cmsecs = 52.53
\|	Xfermode_DstATop 8888: cmsecs = 28.97 565: cmsecs = 46.42
\|	Xfermode_SrcATop 8888: cmsecs = 29.74 565: cmsecs = 46.25
\|	Xfermode_DstOut 8888: cmsecs = 5.34 565: cmsecs = 24.53
\|	Xfermode_SrcOut 8888: cmsecs = 12.25 565: cmsecs = 24.39
\|	Xfermode_DstIn 8888: cmsecs = 5.30 565: cmsecs = 24.50
\|	Xfermode_SrcIn 8888: cmsecs = 12.05 565: cmsecs = 25.40
\|	Xfermode_DstOver 8888: cmsecs = 12.45 565: cmsecs = 0.15
\|	Xfermode_SrcOver 8888: cmsecs = 2.68 565: cmsecs = 4.42
\|
\|	after:
\|	Xfermode_Screen 8888: cmsecs = 13.68 565: cmsecs = 21.73
\|	Xfermode_Modulate 8888: cmsecs = 13.25 565: cmsecs = 20.97
\|	Xfermode_Plus 8888: cmsecs = 9.77 565: cmsecs = 16.71
\|	Xfermode_Xor 8888: cmsecs = 17.64 565: cmsecs = 25.62
\|	Xfermode_DstATop 8888: cmsecs = 15.99 565: cmsecs = 23.74
\|	Xfermode_SrcATop 8888: cmsecs = 15.69 565: cmsecs = 23.40
\|	Xfermode_DstOut 8888: cmsecs = 4.77 565: cmsecs = 11.85
\|	Xfermode_SrcOut 8888: cmsecs = 4.98 565: cmsecs = 11.84
\|	Xfermode_DstIn 8888: cmsecs = 4.68 565: cmsecs = 11.72
\|	Xfermode_SrcIn 8888: cmsecs = 4.93 565: cmsecs = 11.79
\|	Xfermode_DstOver 8888: cmsecs = 5.04 565: cmsecs = 0.15
\|	Xfermode_SrcOver 8888: cmsecs = 2.69 565: cmsecs = 4.42
\|
\|	BUG=skia:
\|	R=mtklein@google.com
\|
\|	Author: qiankun.miao@intel.com
\|
\|	Review URL: https://codereview.chromium.org/232793002
\|
\|	git-svn-id: http://skia.googlecode.com/svn/trunk@14176 2bbb7eff-a529-9590-31e7-b0007b416f81
*	Real fix for SK_API / Windows shared lib problems.	mtklein@google.com	2014-04-09
\| \| \| \| \| \| \|	Ben reviewed this over my shoulder, and we tested on his machine. git-svn-id: http://skia.googlecode.com/svn/trunk@14122 2bbb7eff-a529-9590-31e7-b0007b416f81
*	SK_API for SkXfermode_opts_SSE2 so Chrome can initialize flattenables.	commit-bot@chromium.org	2014-04-09
\| \| \| \| \| \| \| \| \| \| \|	BUG=skia:2401 R=bungeman@google.com, mtklein@google.com Author: mtklein@chromium.org Review URL: https://codereview.chromium.org/231423003 git-svn-id: http://skia.googlecode.com/svn/trunk@14120 2bbb7eff-a529-9590-31e7-b0007b416f81
*	Xfermode: SSE2 implementation of multiply_modeproc	commit-bot@chromium.org	2014-04-09
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch implements basics for Xfermode SSE optimization. Based on these basics, SSE2 implementation of multiply_modeproc is provided. SSE2 implementation for other modes will come in future. With this patch performance of Xfermode_Multiply will improve about 45%. Here are the data on desktop i7-3770. before: Xfermode_Multiply 8888: cmsecs = 33.30 565: cmsecs = 45.65 after: Xfermode_Multiply 8888: cmsecs = 17.18 565: cmsecs = 24.87 BUG= Committed: http://code.google.com/p/skia/source/detail?r=14006 Committed: http://code.google.com/p/skia/source/detail?r=14050 R=mtklein@google.com, robertphillips@google.com Author: qiankun.miao@intel.com Review URL: https://codereview.chromium.org/202903004 git-svn-id: http://skia.googlecode.com/svn/trunk@14107 2bbb7eff-a529-9590-31e7-b0007b416f81
*	ARM Skia NEON patches - 22 - S32_D565_Blend	commit-bot@chromium.org	2014-04-09
\|	BlitRow565: new NEON version of S32_D565_Blend
\|
\|	This new implementation brings a good speedup in most cases and gives
\|	exact results (removes one mismatch in gm).
\|
\|	Here are the benchmark results (speedup vs. existing S32A_D565_Blend):
\|
\|	+-------+-----------+------------+
\|	\| count \| Cortex-A9 \| Cortex-A15 \|
\|	+-------+-----------+------------+
\|	\|     1 \|    -26,7% \|     -27,5% \|
\|	+-------+-----------+------------+
\|	\|     2 \|        0% \|       +53% \|
\|	+-------+-----------+------------+
\|	\|     4 \|    +38,3% \|     +26,5% \|
\|	+-------+-----------+------------+
\|	\|     8 \|    +10,9% \|      -4,5% \|
\|	+-------+-----------+------------+
\|	\|    16 \|    +18,2% \|      +1,6% \|
\|	+-------+-----------+------------+
\|	\|    64 \|    +22,3% \|     +8,75% \|
\|	+-------+-----------+------------+
\|	\|   256 \|    +12,3% \|     +11,2% \|
\|	+-------+-----------+------------+
\|	\|  1024 \|    +79,2% \|     +10,9% \|
\|	+-------+-----------+------------+
\|
\|	Signed-off-by: Kévin PETIT <kevin.petit@arm.com>
\|
\|	BUG=skia:
\|	R=djsollen@google.com, mtklein@google.com
\|
\|	Author: kevin.petit@arm.com
\|
\|	Review URL: https://codereview.chromium.org/181523002
\|
\|	git-svn-id: http://skia.googlecode.com/svn/trunk@14103 2bbb7eff-a529-9590-31e7-b0007b416f81
*	Revert of Xfermode: SSE2 implementation of multiply_modeproc ↵	commit-bot@chromium.org	2014-04-03
\|	(https://codereview.chromium.org/202903004/)
\|
\|	Reason for revert:
\|	It looks like serialization is broken. The serialize and pipe-cross-process tests are failing and turning (at least the Ubuntu12 and Win7) bots red
\|
\|	Original issue's description:
\|	> Xfermode: SSE2 implementation of multiply_modeproc
\|	>
\|	> This patch implements basics for Xfermode SSE optimization. Based on
\|	> these basics, SSE2 implementation of multiply_modeproc is provided. SSE2
\|	> implementation for other modes will come in future. With this patch
\|	> performance of Xfermode_Multiply will improve about 45%. Here are the
\|	> data on desktop i7-3770.
\|	> before:
\|	> Xfermode_Multiply 8888: cmsecs = 33.30 565: cmsecs = 45.65
\|	> after:
\|	> Xfermode_Multiply 8888: cmsecs = 17.18 565: cmsecs = 24.87
\|	>
\|	> BUG=
\|	>
\|	> Committed: http://code.google.com/p/skia/source/detail?r=14006
\|	>
\|	> Committed: http://code.google.com/p/skia/source/detail?r=14050
\|
\|	R=mtklein@google.com, qiankun.miao@intel.com
\|	TBR=mtklein@google.com, qiankun.miao@intel.com
\|	NOTREECHECKS=true
\|	NOTRY=true
\|	BUG=
\|
\|	Author: robertphillips@google.com
\|
\|	Review URL: https://codereview.chromium.org/224253003
\|
\|	git-svn-id: http://skia.googlecode.com/svn/trunk@14053 2bbb7eff-a529-9590-31e7-b0007b416f81
*	Xfermode: SSE2 implementation of multiply_modeproc	commit-bot@chromium.org	2014-04-03
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch implements basics for Xfermode SSE optimization. Based on these basics, SSE2 implementation of multiply_modeproc is provided. SSE2 implementation for other modes will come in future. With this patch performance of Xfermode_Multiply will improve about 45%. Here are the data on desktop i7-3770. before: Xfermode_Multiply 8888: cmsecs = 33.30 565: cmsecs = 45.65 after: Xfermode_Multiply 8888: cmsecs = 17.18 565: cmsecs = 24.87 BUG= Committed: http://code.google.com/p/skia/source/detail?r=14006 R=mtklein@google.com, robertphillips@google.com Author: qiankun.miao@intel.com Review URL: https://codereview.chromium.org/202903004 git-svn-id: http://skia.googlecode.com/svn/trunk@14050 2bbb7eff-a529-9590-31e7-b0007b416f81