path: root/src/opts
Commit message | Author | Age
* Re-enable SSE4. | mtklein | 2014-06-30
  I will roll this into Chrome with https://codereview.chromium.org/332393003.

  BUG=skia:
  R=reed@google.com, mtklein@google.com
  Author: mtklein@chromium.org
  Review URL: https://codereview.chromium.org/357593003
* ARM Skia NEON patches - 41 - arm64: SkXfermode::xfer32 | kevin.petit | 2014-06-30
  Currently the NEON code for Xfermodes performs well on arm64 targets
  except for dstout and dstin which are significantly slower than the C
  code. This patch fixes this and gives further improvements on other modes.

  Here are some perf results:

  +------------+------------+------------+
  | mode       | Cortex-A53 | Cortex-A57 |
  +------------+------------+------------+
  | multiply   |    +24.58% |    +23.71% |
  | exclusion  |    +22.72% |    +22.05% |
  | difference |    +34.67% |    +36.82% |
  | hardlight  |    +17.07% |    +14.74% |
  | lighten    |    +38.21% |    +32.87% |
  | darken     |    +37.59% |    +32.99% |
  | overlay    |    +17.36% |    +16.88% |
  | screen     |    +52.56% |    +54.43% |
  | modulate   |    +62.85% |    +61.32% |
  | plus       |    +91.52% |   +117.41% |
  | xor        |    +42.86% |    +43.38% |
  | dstatop    |    +48.46% |    +48.99% |
  | srcatop    |    +50.50% |    +48.51% |
  | dstout     |    +67.83% |    +78.09% |
  | srcout     |    +69.02% |    +78.26% |
  | dstin      |    +70.92% |    +79.24% |
  | srcin      |    +68.90% |    +78.23% |
  | dstover    |    +73.80% |    +68.10% |
  +------------+------------+------------+

  Signed-off-by: Kévin PETIT <kevin.petit@arm.com>

  BUG=skia
  R=mtklein@google.com, djsollen@google.com
  Author: kevin.petit@arm.com
  Review URL: https://codereview.chromium.org/350343002
* Disable SSE4 code. | mtklein | 2014-06-27
  Chrome canary failing to link chrome:
  http://108.170.220.120:10115/builders/Canary-Chrome-Ubuntu13.10-Ninja-x86_64-ToT/builds/1009/steps/BuildChrome/logs/stdio

  BUG=skia:
  NOTRY=true
  R=mtklein@google.com, rmistry@google.com
  Author: mtklein@chromium.org
  Review URL: https://codereview.chromium.org/361493002
* Refactor bitmap scaler to make it easier to migrate rest of chrome to use it | humper | 2014-06-27
  Previously, the set of platform-specific function pointers to do fast
  convolution (e.g., NEON, SSE) was passed in a structure to the scaler.
  I refactored this so that the scaler fills in these function pointers
  after it's called, so the caller doesn't have to worry about it.

  R=mtklein@google.com
  TBR=mtklein
  NOTRY=True
  Author: humper@google.com
  Review URL: https://codereview.chromium.org/354193002
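The refactor described in the commit above can be pictured with a small sketch. It is not the patch's code: `ConvolveProcs`, `install_platform_procs`, and `Scaler` are hypothetical names, and the platform hook here simply leaves the portable routine in place. The point is only that the scaler, not its caller, owns the function-pointer table.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical table of convolution routines (illustrative, not Skia's types).
struct ConvolveProcs {
    void (*convolveRow)(const float* src, const float* filter,
                        std::size_t filterLen, float* out,
                        std::size_t outLen) = nullptr;
};

// Portable fallback: out[i] = sum over k of src[i + k] * filter[k].
static void convolve_row_portable(const float* src, const float* filter,
                                  std::size_t filterLen, float* out,
                                  std::size_t outLen) {
    for (std::size_t i = 0; i < outLen; ++i) {
        float sum = 0.0f;
        for (std::size_t k = 0; k < filterLen; ++k) {
            sum += src[i + k] * filter[k];
        }
        out[i] = sum;
    }
}

// Hypothetical platform hook. A real build would overwrite entries with
// SSE2 or NEON variants when the CPU supports them; this sketch changes nothing.
static void install_platform_procs(ConvolveProcs* procs) {
    (void)procs;
}

class Scaler {
public:
    // The scaler fills in its own procs, so callers never see the table.
    Scaler() {
        fProcs.convolveRow = convolve_row_portable;
        install_platform_procs(&fProcs);
        assert(fProcs.convolveRow != nullptr);
    }

    void convolve(const float* src, const float* filter, std::size_t filterLen,
                  float* out, std::size_t outLen) const {
        fProcs.convolveRow(src, filter, filterLen, out, outLen);
    }

private:
    ConvolveProcs fProcs;
};
```

A caller only constructs a `Scaler` and calls `convolve`; which routine runs is decided inside the constructor.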
* Add SSE4 optimization of S32A_Opaque_Blitrow | henrik.smiding | 2014-06-27
  Adds optimization of Skia S32A_Opaque_Blitrow blitter using SSE4.2 SIMD
  instruction set. Special case for when alpha is zero or opaque.

  Performance increase of 10%-400% compared to the existing SSE2
  optimization (measured on Silvermont architecture).
  Noticeable in ~25 different skia bench subtests, especially in
  bitmap_8888_*, repeatTile_*, and morph_*.

  bitmap_8888_A - 100% faster
  bitmap_8888_A_source_transparent - 250% faster
  bitmap_8888_A_source_opaque - 25% faster
  bitmap_8888_A_scale_bicubic - 75% faster

  Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>

  Committed: https://skia.googlesource.com/skia/+/e2527b147679b0c43019fae7d59cc3777d2d097e
  Committed: https://skia.googlesource.com/skia/+/b5c281e1e06af3be804309877de1dac6145686b9

  R=reed@google.com, mtklein@google.com, tomhudson@google.com, djsollen@google.com, joakim.landberg@intel.com
  Author: henrik.smiding@intel.com
  Review URL: https://codereview.chromium.org/289473009
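The "special case for when alpha is zero or opaque" can be shown in scalar form. This is a minimal sketch, not the SSE4 assembly from the patch: it assumes premultiplied 32-bit pixels with alpha in the top byte, and the names `src_over` and `blit_row_src_over` are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// General src-over for one premultiplied pixel (alpha assumed in bits 24..31).
// Each channel becomes s + d * (256 - srcAlpha) / 256.
static inline uint32_t src_over(uint32_t src, uint32_t dst) {
    const unsigned scale = 256 - (src >> 24);  // in 1..256
    uint32_t result = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        const unsigned s = (src >> shift) & 0xFF;
        const unsigned d = (dst >> shift) & 0xFF;
        result |= static_cast<uint32_t>(s + ((d * scale) >> 8)) << shift;
    }
    return result;
}

// Row blitter with the two fast paths the commit message mentions.
static void blit_row_src_over(uint32_t* dst, const uint32_t* src, int count) {
    assert(count >= 0);
    for (int i = 0; i < count; ++i) {
        const unsigned alpha = src[i] >> 24;
        if (alpha == 0) {
            continue;             // fully transparent source: dst is unchanged
        }
        if (alpha == 0xFF) {
            dst[i] = src[i];      // fully opaque source: plain copy
            continue;
        }
        dst[i] = src_over(src[i], dst[i]);  // everything else: full blend
    }
}
```

The fast paths give the same result as the general formula for valid premultiplied input, so they only save work; a SIMD version would typically test a whole group of pixels at once.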
* ARM Skia NEON patches - 40 - arm64: S32A_D565_Opaque | kevin.petit | 2014-06-26
  Here are some perf results:

  +-------+------------+------------+
  | count | Cortex-A53 | Cortex-A57 |
  +-------+------------+------------+
  |     1 |     -2.54% |     -5.39% |
  |     2 |     -0.66% |     -2.08% |
  |     4 |    -11.13% |      0.00% |
  |     8 |     -5.79% |     -1.30% |
  |    16 |     71.60% |     93.27% |
  |    64 |     30.99% |     57.35% |
  |   256 |     25.41% |     52.59% |
  |  1024 |     25.56% |     53.76% |
  +-------+------------+------------+

  Signed-off-by: Kévin PETIT <kevin.petit@arm.com>

  BUG=skia:
  R=mtklein@google.com, djsollen@google.com
  Author: kevin.petit@arm.com
  Review URL: https://codereview.chromium.org/346843003
* Fix SkBlitRow_opts_arm so that it works on ARM v4t. | george | 2014-06-20
  Original Mozilla bug: https://bugzilla.mozilla.org/show_bug.cgi?id=901208

  R=reed@google.com, mtklein@google.com, reed1
  BUG=skia:
  Author: george@mozilla.com
  Review URL: https://codereview.chromium.org/337853003
* Revert of Add SSE4 optimization of S32A_Opaque_Blitrow (https://codereview.chromium.org/289473009/) | mtklein | 2014-06-17
  NOTREECHECKS=true
  NOTRY=true

  Reason for revert:
  Valgrind bot's seeing this code use uninitialized memory, and it's
  somehow blocking our roll into Chrome too:

  > ld: warning: could not create compact unwind for S32A_Opaque_BlitRow32_SSE4_asm:
  > stack subq instruction is too different from dwarf stack size
  > [10339/10982 | 3247.792] PACKAGE FRAMEWORK "Chromium Framework.framework",
  > POSTBUILDS
  > FAILED: ./gyp-mac-tool package-framework "Chromium Framework.framework" A &&
  > (export
  > BUILT_PRODUCTS_DIR=/Volumes/data/b/build/slave/mac_gpu/build/src/out/Release;
  > export CONFIGURATION=Release; export CONTENTS_FOLDER_PATH="Chromium
  > Framework.framework/Versions/A"; export
  > DYLIB_INSTALL_NAME_BASE=@executable_path/../Versions/37.0.2056.0; export
  > EXECUTABLE_NAME="Chromium Framework"; export EXECUTABLE_PATH="Chromium
  > Framework.framework/Versions/A/Chromium Framework"; export
  > FULL_PRODUCT_NAME="Chromium Framework.framework"; export
  > INFOPLIST_PATH="Chromium Framework.framework/Versions/A/Resources/Info.plist";
  > export LD_DYLIB_INSTALL_NAME="@executable_path/../Versions/37.0.2056.0/Chromium
  > Framework.framework/Chromium Framework"; export MACH_O_TYPE=mh_dylib; export
  > PRODUCT_NAME="Chromium Framework"; export
  > PRODUCT_TYPE=com.apple.product-type.framework; export
  > SDKROOT=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.6.sdk;
  > export
  > SRCROOT=/Volumes/data/b/build/slave/mac_gpu/build/src/out/Release/../../chrome;
  > export SOURCE_ROOT="${SRCROOT}"; export
  > TARGET_BUILD_DIR=/Volumes/data/b/build/slave/mac_gpu/build/src/out/Release;
  > export TEMP_DIR="${TMPDIR}"; export UNLOCALIZED_RESOURCES_FOLDER_PATH="Chromium
  > Framework.framework/Versions/A/Resources"; export WRAPPER_NAME="Chromium
  > Framework.framework"; (cd ../../chrome && ../build/mac/tweak_info_plist.py
  > "--breakpad=1" "--breakpad_uploads=0" "--keystone=0" "--scm=1"
  > "--branding=Chromium" && ln -fns Versions/Current/Libraries
  > "${BUILT_PRODUCTS_DIR}/${WRAPPER_NAME}/Libraries" &&
  > tools/build/mac/verify_order _ChromeMain
  > "${BUILT_PRODUCTS_DIR}/${EXECUTABLE_PATH}"); G=$?; ((exit $G) || rm -rf
  > 'Chromium Framework.framework') && exit $G) && touch "Chromium
  > Framework.framework"
  > tools/build/mac/verify_order: unordered symbols in
  > /Volumes/data/b/build/slave/mac_gpu/build/src/out/Release/Chromium
  > Framework.framework/Versions/A/Chromium Framework:
  > S32A_Opaque_BlitRow32_SSE4_asm
  > _S32A_Opaque_BlitRow32_SSE4_asm
  > ninja: build stopped: subcommand failed.

  Original issue's description:
  > Add SSE4 optimization of S32A_Opaque_Blitrow
  >
  > Adds optimization of Skia S32A_Opaque_Blitrow blitter using SSE4.2 SIMD
  > instruction set. Special case for when alpha is zero or opaque.
  >
  > Performance increase of 10%-400% compared to the existing SSE2
  > optimization (measured on Silvermont architecture).
  > Noticeable in ~25 different skia bench subtests, especially in
  > bitmap_8888_*, repeatTile_*, and morph_*.
  >
  > bitmap_8888_A - 100% faster
  > bitmap_8888_A_source_transparent - 250% faster
  > bitmap_8888_A_source_opaque - 25% faster
  > bitmap_8888_A_scale_bicubic - 75% faster
  >
  > Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>
  >
  > Committed: https://skia.googlesource.com/skia/+/e2527b147679b0c43019fae7d59cc3777d2d097e
  >
  > Committed: https://skia.googlesource.com/skia/+/b5c281e1e06af3be804309877de1dac6145686b9

  R=reed@google.com, tomhudson@google.com, djsollen@google.com, joakim.landberg@intel.com, henrik.smiding@intel.com, mtklein@chromium.org
  Author: mtklein@google.com
  Review URL: https://codereview.chromium.org/336413007
* Add SSE4 optimization of S32A_Opaque_Blitrow | henrik.smiding | 2014-06-17
  Adds optimization of Skia S32A_Opaque_Blitrow blitter using SSE4.2 SIMD
  instruction set. Special case for when alpha is zero or opaque.

  Performance increase of 10%-400% compared to the existing SSE2
  optimization (measured on Silvermont architecture).
  Noticeable in ~25 different skia bench subtests, especially in
  bitmap_8888_*, repeatTile_*, and morph_*.

  bitmap_8888_A - 100% faster
  bitmap_8888_A_source_transparent - 250% faster
  bitmap_8888_A_source_opaque - 25% faster
  bitmap_8888_A_scale_bicubic - 75% faster

  Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>

  Committed: https://skia.googlesource.com/skia/+/e2527b147679b0c43019fae7d59cc3777d2d097e

  R=reed@google.com, mtklein@google.com, tomhudson@google.com, djsollen@google.com, joakim.landberg@intel.com
  Author: henrik.smiding@intel.com
  Review URL: https://codereview.chromium.org/289473009
* Revert of Temporarily limit x86 SIMD to SSE2 only, to see effect on all benches and bots. (https://codereview.chromium.org/331193004/) | mtklein | 2014-06-16
  Reason for revert:
  Experiment is over: disabling SSSE3 is a 25-50% perf regression for
  bitmap scaling on every machine we've got.

  Original issue's description:
  > Temporarily limit x86 SIMD to SSE2 only, to see effect on all benches and bots.
  >
  > BUG=372232
  >
  > Committed: https://skia.googlesource.com/skia/+/f1e5a04832e4d350f9ebf5d556c6d3897345f883

  R=reed@google.com, mtklein@chromium.org
  TBR=mtklein@chromium.org, reed@google.com
  NOTREECHECKS=true
  NOTRY=true
  BUG=372232
  Author: mtklein@google.com
  Review URL: https://codereview.chromium.org/332213005
* Temporarily limit x86 SIMD to SSE2 only, to see effect on all benches and bots. | mtklein | 2014-06-16
  BUG=372232
  R=reed@google.com, mtklein@google.com
  Author: mtklein@chromium.org
  Review URL: https://codereview.chromium.org/331193004
* MIPS: added optimization for functions from SkBlitRow. | djordje.pesut | 2014-06-11
  Gain is ~40%.

  The following functions are optimized:
  S32_D565_Blend
  S32A_D565_Opaque_Dither
  S32_D565_Opaque_Dither
  S32_D565_Blend_Dither
  S32A_D565_Opaque
  S32A_D565_Blend
  S32_Blend_BlitRow32

  R=djsollen@google.com, teodora.petrovic@gmail.com
  Author: djordje.pesut@imgtec.com
  Review URL: https://codereview.chromium.org/326913004
* ARM Skia NEON patches - 39 - arm64 565 blitters | kevin.petit | 2014-06-06
  This enables all 565 blitters except S32A_D565_Opaque.

  Here are some performance results:

  S32_D565_Opaque:
  ================

  +-------+------------+------------+
  | count | Cortex-A53 | Cortex-A57 |
  +-------+------------+------------+
  |     1 |    -18.37% |    -13.04% |
  |     2 |     -9.90% |    -13.78% |
  |     4 |     -8.28% |     -6.77% |
  |     8 |    157.63% |     78.15% |
  |    16 |     72.67% |     44.81% |
  |    64 |     76.78% |     40.89% |
  |   256 |     73.85% |     36.05% |
  |  1024 |     75.73% |     36.70% |
  +-------+------------+------------+

  S32_D565_Blend:
  ===============

  +-------+------------+------------+
  | count | Cortex-A53 | Cortex-A57 |
  +-------+------------+------------+
  |     1 |     -9.99% |    -13.79% |
  |     2 |     -9.17% |     -6.74% |
  |     4 |     -6.73% |     -4.42% |
  |     8 |    163.31% |    112.82% |
  |    16 |     55.21% |     44.68% |
  |    64 |     54.09% |     41.99% |
  |   256 |     52.63% |     40.64% |
  |  1024 |     52.46% |     40.45% |
  +-------+------------+------------+

  S32A_D565_Blend:
  ================

  +-------+------------+------------+
  | count | Cortex-A53 | Cortex-A57 |
  +-------+------------+------------+
  |     1 |     -5.88% |     -6.06% |
  |     2 |     -4.74% |     -0.01% |
  |     4 |     -5.42% |     -3.03% |
  |     8 |     78.78% |     77.96% |
  |    16 |     98.19% |     79.61% |
  |    64 |    111.56% |     72.60% |
  |   256 |    113.80% |     69.96% |
  |  1024 |    114.42% |     70.85% |
  +-------+------------+------------+

  S32_D565_Opaque_Dither:
  =======================

  +-------+------------+------------+
  | count | Cortex-A53 | Cortex-A57 |
  +-------+------------+------------+
  |     1 |     -4.18% |     -0.93% |
  |     2 |     -2.43% |     -2.04% |
  |     4 |     -1.09% |     -1.23% |
  |     8 |    184.89% |    136.53% |
  |    16 |    128.64% |     89.11% |
  |    64 |    132.68% |    100.98% |
  |   256 |    157.02% |    100.86% |
  |  1024 |    163.85% |    103.62% |
  +-------+------------+------------+

  S32_D565_Blend_Dither:
  ======================

  +-------+------------+------------+
  | count | Cortex-A53 | Cortex-A57 |
  +-------+------------+------------+
  |     1 |     -4.87% |      0.01% |
  |     2 |     -2.71% |      2.97% |
  |     4 |     -2.20% |      0.28% |
  |     8 |    149.76% |    146.80% |
  |    16 |     85.69% |     95.77% |
  |    64 |     88.81% |    101.39% |
  |   256 |     97.32% |    107.22% |
  |  1024 |     98.08% |    115.71% |
  +-------+------------+------------+

  S32A_D565_Opaque_Dither:
  ========================

  +-------+------------+------------+
  | count | Cortex-A53 | Cortex-A57 |
  +-------+------------+------------+
  |     1 |     -1.86% |      0.02% |
  |     2 |     -0.58% |     -1.52% |
  |     4 |     -0.75% |      1.16% |
  |     8 |    240.74% |    155.16% |
  |    16 |    181.97% |    132.15% |
  |    64 |    203.11% |    136.48% |
  |   256 |    223.45% |    133.05% |
  |  1024 |    225.96% |    134.05% |
  +-------+------------+------------+

  Signed-off-by: Kévin PETIT <kevin.petit@arm.com>

  BUG=skia:
  R=djsollen@google.com, mtklein@google.com
  Author: kevin.petit@arm.com
  Review URL: https://codereview.chromium.org/317193003
* Revert of Add SSE4 optimization of S32A_Opaque_Blitrow (https://codereview.chromium.org/289473009/) | jvanverth | 2014-06-05
  Reason for revert:
  Buildbot failures on Mac 10.6 and Mac 10.7.

  R=reed@google.com, mtklein@google.com, tomhudson@google.com, djsollen@google.com, joakim.landberg@intel.com, henrik.smiding@intel.com
  TBR=reed@google.com
  NOTRY=True

  Original issue's description:
  > Add SSE4 optimization of S32A_Opaque_Blitrow
  >
  > Adds optimization of Skia S32A_Opaque_Blitrow blitter using SSE4.2 SIMD
  > instruction set. Special case for when alpha is zero or opaque.
  >
  > Performance increase of 10%-400% compared to the existing SSE2
  > optimization (measured on Silvermont architecture).
  > Noticeable in ~25 different skia bench subtests, especially in
  > bitmap_8888_*, repeatTile_*, and morph_*.
  >
  > bitmap_8888_A - 100% faster
  > bitmap_8888_A_source_transparent - 250% faster
  > bitmap_8888_A_source_opaque - 25% faster
  > bitmap_8888_A_scale_bicubic - 75% faster
  >
  > Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>
  >
  > Committed: https://skia.googlesource.com/skia/+/e2527b147679b0c43019fae7d59cc3777d2d097e

  Author: jvanverth@google.com
  Review URL: https://codereview.chromium.org/311053009
* Add SSE4 optimization of S32A_Opaque_Blitrow | henrik.smiding | 2014-06-05
  Adds optimization of Skia S32A_Opaque_Blitrow blitter using SSE4.2 SIMD
  instruction set. Special case for when alpha is zero or opaque.

  Performance increase of 10%-400% compared to the existing SSE2
  optimization (measured on Silvermont architecture).
  Noticeable in ~25 different skia bench subtests, especially in
  bitmap_8888_*, repeatTile_*, and morph_*.

  bitmap_8888_A - 100% faster
  bitmap_8888_A_source_transparent - 250% faster
  bitmap_8888_A_source_opaque - 25% faster
  bitmap_8888_A_scale_bicubic - 75% faster

  Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>

  R=reed@google.com, mtklein@google.com, tomhudson@google.com, djsollen@google.com, joakim.landberg@intel.com
  Author: henrik.smiding@intel.com
  Review URL: https://codereview.chromium.org/289473009
* SK_CPU_ARM --> SK_CPU_ARM32 | mtklein | 2014-06-03
  That's what it means. It keeps confusing us as named today.

  BUG=skia:
  R=djsollen@google.com, mtklein@google.com, reed@google.com
  Author: mtklein@chromium.org
  Review URL: https://codereview.chromium.org/314643004
* ARM Skia NEON patches - 38 - arm64 8888 blitters | kevin.petit | 2014-06-03
  Enable NEON on arm64 for most 8888 blitters

  This patch enables NEON optimisation for the Color32, S32_Blend,
  S32A_Opaque blitters on arm64.

  Here are the perf improvements vs the existing code:

  Color32:
  ========

  +-------+------------+------------+
  | count | Cortex-A53 | Cortex-A57 |
  +-------+------------+------------+
  |     1 |     -2.39% |     23.78% |
  |     2 |     -5.46% |      8.88% |
  |     4 |     -4.74% |      4.89% |
  |     8 |     67.74% |    107.12% |
  |    16 |     40.03% |    101.20% |
  |    64 |     11.09% |     98.40% |
  |   256 |     -2.20% |     74.81% |
  |  1024 |     -4.28% |     78.90% |
  +-------+------------+------------+

  S32_Blend:
  ==========

  +-------+------------+------------+
  | count | Cortex-A53 | Cortex-A57 |
  +-------+------------+------------+
  |     1 |      7.84% |     -6.75% |
  |     2 |     28.95% |     39.77% |
  |     4 |      5.80% |      8.26% |
  |     8 |      1.35% |     33.80% |
  |    16 |     -2.13% |     41.13% |
  |    64 |     -4.91% |     42.84% |
  |   256 |     -6.53% |     48.72% |
  |  1024 |     -6.65% |     46.66% |
  +-------+------------+------------+

  S32A_Opaque:
  ============

  +-------+------------+------------+
  | count | Cortex-A53 | Cortex-A57 |
  +-------+------------+------------+
  |     1 |     -7.51% |    -19.06% |
  |     2 |     -5.02% |    -27.70% |
  |     4 |     15.38% |    -21.66% |
  |     8 |     -0.98% |      1.05% |
  |    16 |     -7.35% |      3.34% |
  |    64 |     50.53% |     94.63% |
  |   256 |     71.17% |    164.10% |
  |  1024 |     79.58% |    197.60% |
  +-------+------------+------------+

  Signed-off-by: Kevin PETIT <kevin.petit@arm.com>

  BUG=skia:
  R=djsollen@google.com, mtklein@google.com
  Author: kevin.petit@arm.com
  Review URL: https://codereview.chromium.org/302283003
* use colortype instead of config | reed | 2014-06-02
  clone of https://codereview.chromium.org/305133006/

  TBR=
  BUG=skia:
  Author: reed@google.com
  Review URL: https://codereview.chromium.org/301233011
* Fixing clusterfuzz issue | commit-bot@chromium.org | 2014-05-30
  When reading an SkSSE2ProcCoeffXfermode object, fProcSIMD should never
  be NULL. The reason for this is that it's not possible to create such an
  object through SkPlatformXfermodeFactory_impl_SSE2(), which is the only
  function used to create these objects, so if we're reading one, it's
  clearly invalid.

  BUG=379181
  R=reed@google.com, mtklein@google.com
  Author: sugoi@chromium.org
  Review URL: https://codereview.chromium.org/306183002

  git-svn-id: http://skia.googlecode.com/svn/trunk@15000 2bbb7eff-a529-9590-31e7-b0007b416f81
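The reasoning in the commit above (an object whose SIMD proc is null could never have been created by the factory, so a serialized one must be invalid) can be sketched as a validation step during reading. This is an illustration only: `SimdXfermode`, `kProcs`, and `read_xfermode` are hypothetical stand-ins, not the Skia classes named in the message, and the "stream" is reduced to a single mode index.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

using SimdProc = uint32_t (*)(uint32_t src, uint32_t dst);

static uint32_t keep_src(uint32_t src, uint32_t) { return src; }
static uint32_t keep_dst(uint32_t, uint32_t dst) { return dst; }

// Hypothetical per-mode table; a null entry means "no SIMD routine exists".
static const SimdProc kProcs[] = { keep_src, keep_dst, nullptr };

struct SimdXfermode {
    SimdProc proc;
};

// Stand-in for deserialization: the untrusted input is just a mode index.
// Any input that would produce an object with a null proc is rejected.
static std::unique_ptr<SimdXfermode> read_xfermode(uint32_t mode) {
    const std::size_t n = sizeof(kProcs) / sizeof(kProcs[0]);
    if (mode >= n || kProcs[mode] == nullptr) {
        return nullptr;  // invalid data: no such object can legitimately exist
    }
    return std::unique_ptr<SimdXfermode>(new SimdXfermode{kProcs[mode]});
}

// Users of a successfully read object can rely on the invariant.
static uint32_t apply(const SimdXfermode& xfer, uint32_t src, uint32_t dst) {
    assert(xfer.proc != nullptr);
    return xfer.proc(src, dst);
}
```

Rejecting the data at read time keeps the null check out of the hot blending path.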
* replace config() with colorType() | commit-bot@chromium.org | 2014-05-29
  BUG=skia:
  R=robertphillips@google.com
  Author: reed@google.com
  Review URL: https://codereview.chromium.org/303543009

  git-svn-id: http://skia.googlecode.com/svn/trunk@14959 2bbb7eff-a529-9590-31e7-b0007b416f81
* SSE2 implementation of memcpy32 | commit-bot@chromium.org | 2014-05-21
  With the SSE2 version of memcpy32, S32_Opaque_BlitRow32() in
  SkBlitRow_D32.cpp is about 30% faster. Here is the data on a desktop
  i7-3770.

  before:
  bitmap_scale_filter_90_90 8888: cmsecs = 2.01
  bitmaprect_FF_filter_trans 8888: cmsecs = 3.61
  bitmaprect_FF_nofilter_trans 8888: cmsecs = 3.57
  bitmaprect_FF_filter_identity 8888: cmsecs = 3.53
  bitmaprect_FF_nofilter_identity 8888: cmsecs = 3.53
  bitmap_4444_update 8888: cmsecs = 4.84
  bitmap_4444_update_volatile 8888: cmsecs = 4.81
  bitmap_4444 8888: cmsecs = 4.81

  after:
  bitmap_scale_filter_90_90 8888: cmsecs = 1.83
  bitmaprect_FF_filter_trans 8888: cmsecs = 2.36
  bitmaprect_FF_nofilter_trans 8888: cmsecs = 2.36
  bitmaprect_FF_filter_identity 8888: cmsecs = 2.60
  bitmaprect_FF_nofilter_identity 8888: cmsecs = 2.63
  bitmap_4444_update 8888: cmsecs = 3.30
  bitmap_4444_update_volatile 8888: cmsecs = 3.30
  bitmap_4444 8888: cmsecs = 3.29

  BUG=skia:
  R=mtklein@google.com, reed@google.com, bsalomon@google.com
  Author: qiankun.miao@intel.com
  Review URL: https://codereview.chromium.org/285313002

  git-svn-id: http://skia.googlecode.com/svn/trunk@14822 2bbb7eff-a529-9590-31e7-b0007b416f81
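For readers unfamiliar with what an SSE2 copy of 32-bit pixels looks like, here is a minimal sketch. It is not the committed implementation (which may handle alignment and larger unrolling differently); `memcpy32_sketch` is a made-up name, and the code falls back to a plain loop when SSE2 is not available at build time.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define SKETCH_HAVE_SSE2 1
#endif

// Copies `count` 32-bit values from src to dst (buffers must not overlap).
static void memcpy32_sketch(uint32_t* dst, const uint32_t* src, int count) {
    assert(count >= 0);
#ifdef SKETCH_HAVE_SSE2
    // Move four pixels (16 bytes) per iteration using unaligned loads/stores.
    while (count >= 4) {
        const __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(dst), v);
        src += 4;
        dst += 4;
        count -= 4;
    }
#endif
    // Scalar tail (or the whole copy when SSE2 is unavailable).
    while (count-- > 0) {
        *dst++ = *src++;
    }
}
```

A tuned version would usually align the destination first and use aligned stores; this sketch skips that to stay short.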
* Undo troublesome SSE 4.1 change for now to unblock Skia -> Chrome roll. | commit-bot@chromium.org | 2014-05-20
  BUG=chromium:374796
  NOTREECHECKS=true
  R=fmalita@chromium.org, mtklein@google.com
  Author: mtklein@chromium.org
  Review URL: https://codereview.chromium.org/292563005

  git-svn-id: http://skia.googlecode.com/svn/trunk@14816 2bbb7eff-a529-9590-31e7-b0007b416f81
* Add missing include in SkBlurImage optimization | commit-bot@chromium.org | 2014-05-19
  Adds the missing include for smmintrin.h in the SkBlurImage_opts_SSE2.cpp file.

  Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>

  BUG=chromium:374796
  TEST=Unknown
  R=tomhudson@chromium.org, vapier@chromium.org, reed@chromium.org, bsalomon@chromium.org, dgreid@chromium.org, dgarrett@chromium.org, michaelpg@chromium.org, vandebo@chromium.org
  Author: henrik.smiding@intel.com
  Review URL: https://codereview.chromium.org/290923002

  git-svn-id: http://skia.googlecode.com/svn/trunk@14792 2bbb7eff-a529-9590-31e7-b0007b416f81
* Add SSE4 check to BlurImage optimization. | commit-bot@chromium.org | 2014-05-15
  Adds a build-time SSE4 check to SkBlurImage_opts_SSE2.cpp in the
  SkBoxBlur_SSE2 function.

  Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>

  R=reed@google.com, mtklein@google.com, tomhudson@google.com, djsollen@google.com, joakim.landberg@intel.com, senorblanco@chromium.org
  Author: henrik.smiding@intel.com
  Review URL: https://codereview.chromium.org/281963002

  git-svn-id: http://skia.googlecode.com/svn/trunk@14750 2bbb7eff-a529-9590-31e7-b0007b416f81
* Always inline Filter_32_*_neon functions | commit-bot@chromium.org | 2014-05-13
  The functions are rather performance critical and already marked 'inline'.
  However, Chrome for Android will not have these functions inlined due to
  it being compiled with -Os and a small -finline-limit.

  This avoids one call in the filtering functions.

  Does not increase the library size.

  BUG=chromium:363073
  R=mtklein@google.com
  Author: kkinnunen@nvidia.com
  Review URL: https://codereview.chromium.org/280403005

  git-svn-id: http://skia.googlecode.com/svn/trunk@14709 2bbb7eff-a529-9590-31e7-b0007b416f81
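As an illustration of the difference between a plain `inline` hint and forced inlining, here is a small sketch. The macro name `ALWAYS_INLINE_SKETCH` and the helper `lerp_4bit` are invented for this example; the patch itself presumably uses Skia's own macro, which is not shown in the log.

```cpp
#include <cassert>

// Plain 'inline' is only a hint, which -Os with a small -finline-limit may
// ignore. The attributes below ask the compiler to inline regardless.
#if defined(__GNUC__) || defined(__clang__)
#define ALWAYS_INLINE_SKETCH inline __attribute__((always_inline))
#elif defined(_MSC_VER)
#define ALWAYS_INLINE_SKETCH __forceinline
#else
#define ALWAYS_INLINE_SKETCH inline
#endif

// Hypothetical hot helper: blend a and b with a 4-bit weight t in 0..16.
static ALWAYS_INLINE_SKETCH unsigned lerp_4bit(unsigned a, unsigned b, unsigned t) {
    assert(t <= 16);
    return (a * (16 - t) + b * t) >> 4;
}
```

Forced inlining trades code size for call overhead, so it fits best on small helpers called from tight loops, which matches the commit's note that library size did not grow.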
* Make gMask_00FF00FF a constant | commit-bot@chromium.org | 2014-05-12
  This is to optimize SkAlphaMulQ() in PIC mode. With the
  visibility=default symbol the constant is not known at compile time (and
  is not a constant), but instead is fetched through a double indirection
  through the GOT. The function is quite hot on one of the chromium
  benchmarks: rasterize_and_record_micro.key_silk_cases.

  This change replaces the symbol with a compile-time constant. As a bonus
  the variable is not exported from the dynamic library, i.e. a cleaner
  library interface.

  See specific performance improvements on Android here: http://goo.gl/iMuTDt

  R=skyostil@chromium.org, tomhudson@chromium.org, mtklein@google.com, reed@google.com, tomhudson@google.com
  Author: pasko@chromium.org
  Review URL: https://codereview.chromium.org/270473003

  git-svn-id: http://skia.googlecode.com/svn/trunk@14696 2bbb7eff-a529-9590-31e7-b0007b416f81
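The effect of the change can be sketched with a self-contained version of the masked multiply. This is written from the general technique (scaling two channels at a time under a 0x00FF00FF mask), not copied from Skia; `kMask_00FF00FF` and `alpha_mul_q` are names chosen for the example.

```cpp
#include <cassert>
#include <cstdint>

// Before (sketch of the problem): an exported global such as
//     extern const uint32_t gMask_00FF00FF;
// must be loaded through the GOT in position-independent code.
// After: a constant the compiler can fold straight into the instructions.
static const uint32_t kMask_00FF00FF = 0x00FF00FFu;

// Scales all four 8-bit channels of c by scale/256, two channels per multiply.
static inline uint32_t alpha_mul_q(uint32_t c, unsigned scale) {
    assert(scale <= 256);
    const uint32_t mask = kMask_00FF00FF;
    const uint32_t rb = ((c & mask) * scale) >> 8;  // channels in bits 0-7 and 16-23
    const uint32_t ag = ((c >> 8) & mask) * scale;  // channels in bits 8-15 and 24-31
    return (rb & mask) | (ag & ~mask);
}
```

With the mask visible at compile time the function needs no memory load at all, which is why the change matters in a hot inner loop.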
* Improved x86 SSE build and run-time checks. | commit-bot@chromium.org | 2014-05-12
  Replaces the current build/run-time checks for SSE level in
  opts_check_x86.cpp with a simpler and more future-proof version.
  Also adds SSE versions 4.1 and 4.2 to the config file.

  Author: henrik.smiding@intel.com

  Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>

  Committed: http://code.google.com/p/skia/source/detail?r=14644

  R=reed@google.com, mtklein@google.com, tomhudson@google.com, djsollen@google.com, joakim.landberg@intel.com
  Author: henrik.smiding@intel.com
  Review URL: https://codereview.chromium.org/272503006

  git-svn-id: http://skia.googlecode.com/svn/trunk@14693 2bbb7eff-a529-9590-31e7-b0007b416f81
* Revert of Improved x86 SSE build and run-time checks. (https://codereview.chromium.org/272503006/) | commit-bot@chromium.org | 2014-05-08
  Reason for revert:
  Windows builders breaking. :(

  Original issue's description:
  > Improved x86 SSE build and run-time checks.
  >
  > Replaces the current build/run-time checks for SSE level in
  > opts_check_x86.cpp with a simpler and more future-proof version.
  > Also adds SSE versions 4.1 and 4.2 to the config file.
  >
  > Author: henrik.smiding@intel.com
  >
  > Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>
  >
  > Committed: http://code.google.com/p/skia/source/detail?r=14644

  R=reed@google.com, tomhudson@google.com, djsollen@google.com, joakim.landberg@intel.com, henrik.smiding@intel.com
  TBR=djsollen@google.com, henrik.smiding@intel.com, joakim.landberg@intel.com, reed@google.com, tomhudson@google.com
  NOTREECHECKS=true
  NOTRY=true
  Author: mtklein@google.com
  Review URL: https://codereview.chromium.org/277593004

  git-svn-id: http://skia.googlecode.com/svn/trunk@14646 2bbb7eff-a529-9590-31e7-b0007b416f81
* Improved x86 SSE build and run-time checks. | commit-bot@chromium.org | 2014-05-08
  Replaces the current build/run-time checks for SSE level in
  opts_check_x86.cpp with a simpler and more future-proof version.
  Also adds SSE versions 4.1 and 4.2 to the config file.

  Author: henrik.smiding@intel.com

  Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>

  R=reed@google.com, mtklein@google.com, tomhudson@google.com, djsollen@google.com, joakim.landberg@intel.com
  Author: henrik.smiding@intel.com
  Review URL: https://codereview.chromium.org/272503006

  git-svn-id: http://skia.googlecode.com/svn/trunk@14644 2bbb7eff-a529-9590-31e7-b0007b416f81
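One way to combine build-time and run-time SSE checks is to map both to a single ordered level and compare. The sketch below is an assumption about the general shape, not the contents of opts_check_x86.cpp: the enum values and function names are illustrative, and the CPUID register values are passed in as parameters so the logic stays portable. The bit positions are the documented CPUID leaf 1 feature flags.

```cpp
#include <cassert>
#include <cstdint>

// Ordered levels, so "at least SSSE3" is a plain integer comparison.
enum SseLevel { kSseNone = 0, kSSE2 = 20, kSSE3 = 30, kSSSE3 = 31, kSSE41 = 41, kSSE42 = 42 };

// Decodes the highest level from CPUID leaf 1 outputs (ECX and EDX).
static SseLevel level_from_cpuid(uint32_t ecx, uint32_t edx) {
    if (ecx & (1u << 20)) return kSSE42;  // ECX bit 20: SSE4.2
    if (ecx & (1u << 19)) return kSSE41;  // ECX bit 19: SSE4.1
    if (ecx & (1u << 9))  return kSSSE3;  // ECX bit 9:  SSSE3
    if (ecx & (1u << 0))  return kSSE3;   // ECX bit 0:  SSE3
    if (edx & (1u << 26)) return kSSE2;   // EDX bit 26: SSE2
    return kSseNone;
}

// Level guaranteed by the compiler flags (GCC/Clang predefined macros).
#if defined(__SSE4_2__)
static const SseLevel kBuildLevel = kSSE42;
#elif defined(__SSE4_1__)
static const SseLevel kBuildLevel = kSSE41;
#elif defined(__SSSE3__)
static const SseLevel kBuildLevel = kSSSE3;
#elif defined(__SSE3__)
static const SseLevel kBuildLevel = kSSE3;
#elif defined(__SSE2__)
static const SseLevel kBuildLevel = kSSE2;
#else
static const SseLevel kBuildLevel = kSseNone;
#endif

// A routine may be used if the build already guarantees the level,
// or if the CPU reported it at run time.
static bool supports(SseLevel wanted, SseLevel runtimeLevel) {
    return kBuildLevel >= wanted || runtimeLevel >= wanted;
}
```

Adding a future level then means one new enum value and one new bit test, which is the kind of "future-proof" property the commit message alludes to.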
* Cleanup of SSE optimization files. | commit-bot@chromium.org | 2014-04-30
  General cleanup of optimization files for x86/SSEx.

  Renamed the opts_check_SSE2.cpp file to _x86, since it's not specific to SSE2.
  Commented out the ColorRect32 optimization, since it's disabled anyway,
  to make it more visible.
  Also fixed a lot of indentation, inclusion guards, spelling, copyright
  headers, braces, whitespace, and sorting of includes.

  Author: henrik.smiding@intel.com

  Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>

  R=reed@google.com, mtklein@google.com, tomhudson@google.com, djsollen@google.com, joakim.landberg@intel.com
  Author: henrik.smiding@intel.com
  Review URL: https://codereview.chromium.org/264603002

  git-svn-id: http://skia.googlecode.com/svn/trunk@14464 2bbb7eff-a529-9590-31e7-b0007b416f81
* Remove the unused SkCachePreload_arm | commit-bot@chromium.org | 2014-04-30
  Signed-off-by: Kévin PETIT <kevin.petit@arm.com>

  BUG=skia:
  R=djsollen@google.com, mtklein@google.com
  Author: kevin.petit@arm.com
  Review URL: https://codereview.chromium.org/263553008

  git-svn-id: http://skia.googlecode.com/svn/trunk@14456 2bbb7eff-a529-9590-31e7-b0007b416f81
* ARM Skia NEON patches - 36 - Color32 | commit-bot@chromium.org | 2014-04-29
  Convert Color32 to intrinsics

  This change is performance-neutral for high values of count and is
  a big improvement for values smaller than 64.

  Signed-off-by: Kévin PETIT <kevin.petit@arm.com>

  BUG=skia:
  R=djsollen@google.com, mtklein@google.com, borenet@google.com
  Author: kevin.petit@arm.com
  Review URL: https://codereview.chromium.org/258173005

  git-svn-id: http://skia.googlecode.com/svn/trunk@14435 2bbb7eff-a529-9590-31e7-b0007b416f81
* Properly enable S32_D16_filter_DX_SSE2 optimization. | commit-bot@chromium.org | 2014-04-28
  Currently, the S32_D16_filter_DX_SSE2 optimization is only used in
  configurations where the maximum SSE level is SSE2. This patch enables
  it for higher levels, as well as fixing a color conversion bug when the
  subpixels are converted into RGB565 format.
  Also, refactored the function a bit, to make future modifications less
  error-prone.

  Author: henrik.smiding@intel.com

  Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>

  Committed: http://code.google.com/p/skia/source/detail?r=14333

  R=reed@google.com, mtklein@google.com, tomhudson@google.com, djsollen@google.com, joakim.landberg@intel.com
  Author: henrik.smiding@intel.com
  Review URL: https://codereview.chromium.org/239453010

  git-svn-id: http://skia.googlecode.com/svn/trunk@14403 2bbb7eff-a529-9590-31e7-b0007b416f81
* Xfermode: SSE2 implementation of darken&lighten mode (commit-bot@chromium.org, 2014-04-28)
| | | | | | | | | | | | | | | | | | | | With SSE2 optimization, performance of the related two benchmarks will improve about 45% on desktop i7-3770. Here are the data: before: Xfermode_Lighten 8888: cmsecs = 33.60 565: cmsecs = 48.84 Xfermode_Darken 8888: cmsecs = 34.16 565: cmsecs = 48.99 after: Xfermode_Lighten 8888: cmsecs = 18.71 565: cmsecs = 25.41 Xfermode_Darken 8888: cmsecs = 18.39 565: cmsecs = 25.40 BUG=skia: R=mtklein@google.com Author: qiankun.miao@intel.com Review URL: https://codereview.chromium.org/234653002 git-svn-id: http://skia.googlecode.com/svn/trunk@14395 2bbb7eff-a529-9590-31e7-b0007b416f81
* Xfermode: SSE2 implementation of colordodge&colorburn modes (commit-bot@chromium.org, 2014-04-25)
| With SSE2 optimization, performance of the related benchmarks will
| improve about 45% for Xfermode_ColorDodge and only slightly for
| Xfermode_ColorBurn on desktop i7-3770. The improvement for
| Xfermode_ColorBurn is small because the portable version may mostly take
| the fast if branch, while the SSE2 version does the calculation for all
| three if-else branches.
| Here are the data:
| before:
| Xfermode_ColorDodge 8888: cmsecs = 73.71 565: cmsecs = 82.88
| Xfermode_ColorBurn 8888: cmsecs = 46.46 565: cmsecs = 52.23
| after:
| Xfermode_ColorDodge 8888: cmsecs = 39.70 565: cmsecs = 47.45
| Xfermode_ColorBurn 8888: cmsecs = 45.02 565: cmsecs = 51.15
|
| BUG=skia:
| R=mtklein@google.com
|
| Author: qiankun.miao@intel.com
|
| Review URL: https://codereview.chromium.org/224823004
|
| git-svn-id: http://skia.googlecode.com/svn/trunk@14377 2bbb7eff-a529-9590-31e7-b0007b416f81
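The trade-off described in this commit (branchy scalar code evaluates one case, SIMD-style code evaluates every case and then selects) can be sketched in portable C++. This is only an illustration under stated assumptions: `case_a`, `case_b`, `case_c` and their conditions are hypothetical stand-ins, not the actual colorburn math, and plain integers stand in for SSE2 lanes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the three cases of a branchy blend function.
inline uint32_t case_a(uint32_t s, uint32_t d) { return s + d; }
inline uint32_t case_b(uint32_t /*s*/, uint32_t d) { return d; }
inline uint32_t case_c(uint32_t s, uint32_t d) { return s * d; }

// Portable style: exactly one case is evaluated per input.
inline uint32_t blend_branchy(uint32_t s, uint32_t d) {
    if (d == 255) {
        return case_a(s, d);
    } else if (s == 0) {
        return case_b(s, d);
    }
    return case_c(s, d);
}

// All-ones mask when cond is true, all-zeros otherwise.
inline uint32_t mask_of(bool cond) { return 0u - static_cast<uint32_t>(cond); }

// Bitwise select: a where the mask is set, b elsewhere.
inline uint32_t mask_select(uint32_t mask, uint32_t a, uint32_t b) {
    return (mask & a) | (~mask & b);
}

// SIMD style: all three cases are always computed, then one is selected.
inline uint32_t blend_select(uint32_t s, uint32_t d) {
    const uint32_t a = case_a(s, d);
    const uint32_t b = case_b(s, d);
    const uint32_t c = case_c(s, d);
    return mask_select(mask_of(d == 255), a, mask_select(mask_of(s == 0), b, c));
}

// Both styles must agree for every input in the 8-bit range.
inline void check_blend_styles() {
    for (uint32_t s = 0; s < 256; ++s) {
        for (uint32_t d = 0; d < 256; ++d) {
            assert(blend_branchy(s, d) == blend_select(s, d));
        }
    }
}
```

When most inputs hit the cheapest branch, the branchy version does very little work per pixel, while the select version always pays for all three cases. That is consistent with the small ColorBurn gain reported above.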
* Xfermode: SSE2 implementation of softlight_modeproc (commit-bot@chromium.org, 2014-04-25)
| | | | | | | | | | | | | | | | | | With SSE2 optimization, performance of Xfermode_SoftLight will improve about 30% on desktop i7-3770. Here are the data: before: Xfermode_SoftLight 8888: cmsecs = 379.44 565: cmsecs = 387.74 after: Xfermode_SoftLight 8888: cmsecs = 272.29 565: cmsecs = 284.31 BUG=skia: R=mtklein@google.com Author: qiankun.miao@intel.com Review URL: https://codereview.chromium.org/236363012 git-svn-id: http://skia.googlecode.com/svn/trunk@14376 2bbb7eff-a529-9590-31e7-b0007b416f81
* Xfermode: SSE2 implementation of difference_modeproc (commit-bot@chromium.org, 2014-04-25)
| | | | | | | | | | | | | | | | | | With SSE2 optimization, performance of Xfermode_Difference will improve about 60% on desktop i7-3770. Here are the data: before: Xfermode_Difference 8888: cmsecs = 51.10 565: cmsecs = 66.39 after: Xfermode_Difference 8888: cmsecs = 21.10 565: cmsecs = 29.33 BUG=skia: R=mtklein@google.com Author: qiankun.miao@intel.com Review URL: https://codereview.chromium.org/234433003 git-svn-id: http://skia.googlecode.com/svn/trunk@14375 2bbb7eff-a529-9590-31e7-b0007b416f81
* Xfermode: SSE2 implementation of hardlight mode (commit-bot@chromium.org, 2014-04-25)
| | | | | | | | | | | | | | | | | | With SSE2 optimization, performance of Xfermode_HardLight will improve about 45% on desktop i7-3770. Here are the data: before: Xfermode_HardLight 8888: cmsecs = 48.43 565: cmsecs = 63.11 after: Xfermode_HardLight 8888: cmsecs = 25.71 565: cmsecs = 33.46 BUG=skia: R=mtklein@google.com Author: qiankun.miao@intel.com Review URL: https://codereview.chromium.org/229003004 git-svn-id: http://skia.googlecode.com/svn/trunk@14373 2bbb7eff-a529-9590-31e7-b0007b416f81
* Xfermode: SSE2 implementation of exclusion_modeproc (commit-bot@chromium.org, 2014-04-25)
| | | | | | | | | | | | | | | | | | With SSE2 optimization, performance of Xfermode_Exclusion will improve about 50% on desktop i7-3770. Here are the data: before: Xfermode_Exclusion 8888: cmsecs = 40.17 565: cmsecs = 55.22 after: Xfermode_Exclusion 8888: cmsecs = 18.53 565: cmsecs = 26.55 BUG=skia: R=mtklein@google.com Author: qiankun.miao@intel.com Review URL: https://codereview.chromium.org/233733005 git-svn-id: http://skia.googlecode.com/svn/trunk@14371 2bbb7eff-a529-9590-31e7-b0007b416f81
* Xfermode: SSE2 implementation of overlay_modeproc (commit-bot@chromium.org, 2014-04-25)
| | | | | | | | | | | | | | | | | | With SSE2 optimization, performance of Xfermode_Overlay will improve about 35% on desktop i7-3770. Here are the data: before: Xfermode_Overlay 8888: cmsecs = 44.17 565: cmsecs = 59.27 after: Xfermode_Overlay 8888: cmsecs = 28.30 565: cmsecs = 35.84 BUG=skia: R=mtklein@google.com Author: qiankun.miao@intel.com Review URL: https://codereview.chromium.org/232783002 git-svn-id: http://skia.googlecode.com/svn/trunk@14370 2bbb7eff-a529-9590-31e7-b0007b416f81
* fix x86 emulator for Android framework. (commit-bot@chromium.org, 2014-04-23)
| The emulator is the one case where the Android framework can be compiled
| without SSSE3 but be expected to run on a device with SSSE3. In that case
| we just disable all SSSE3 options to be safe.
|
| R=scroggo@google.com
|
| Author: djsollen@google.com
|
| Review URL: https://codereview.chromium.org/249883004
|
| git-svn-id: http://skia.googlecode.com/svn/trunk@14342 2bbb7eff-a529-9590-31e7-b0007b416f81
* Revert of Properly enable S32_D16_filter_DX_SSE2 optimization. (https://codereview.chromium.org/239453010/) (commit-bot@chromium.org, 2014-04-23)
| Reason for revert:
| Broke GMs in 565 mode. To repro:
|
| out/Debug/gm --match filterbitmap_image_mandrill -w . --config 565
| open filterbitmap_image_mandrill_512.png_565.png
|
| Original issue's description:
| > Properly enable S32_D16_filter_DX_SSE2 optimization.
| >
| > Currently, the S32_D16_filter_DX_SSE2 optimization is only used in
| > configurations where the maximum SSE level is SSE2.
| > This patch enables it for higher levels, as well.
| > Also, refactored the function a bit, to make future modifications
| > less error-prone.
| >
| > Author: henrik.smiding@intel.com
| >
| > Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>
| >
| > Committed: http://code.google.com/p/skia/source/detail?r=14333
|
| R=reed@google.com, mtklein@google.com, tomhudson@google.com, djsollen@google.com, joakim.landberg@intel.com, henrik.smiding@intel.com
| TBR=djsollen@google.com, henrik.smiding@intel.com, joakim.landberg@intel.com, mtklein@google.com, reed@google.com, tomhudson@google.com
| NOTREECHECKS=true
| NOTRY=true
|
| Author: bsalomon@google.com
|
| Review URL: https://codereview.chromium.org/246393013
|
| git-svn-id: http://skia.googlecode.com/svn/trunk@14336 2bbb7eff-a529-9590-31e7-b0007b416f81
* Properly enable S32_D16_filter_DX_SSE2 optimization. (commit-bot@chromium.org, 2014-04-23)
| | | | | | | | | | | | | | | | | | | | Currently, the S32_D16_filter_DX_SSE2 optimization is only used in configurations where the maximum SSE level is SSE2. This patch enables it for higher levels, as well. Also, refactored the function a bit, to make future modifications less error-prone. Author: henrik.smiding@intel.com Signed-off-by: Henrik Smiding <henrik.smiding@intel.com> R=reed@google.com, mtklein@google.com, tomhudson@google.com, djsollen@google.com, joakim.landberg@intel.com Author: henrik.smiding@intel.com Review URL: https://codereview.chromium.org/239453010 git-svn-id: http://skia.googlecode.com/svn/trunk@14333 2bbb7eff-a529-9590-31e7-b0007b416f81
* Xfermode: SSE2 implementation of a number of simple transfer modes (commit-bot@chromium.org, 2014-04-14)
| These modes share some common code and are not very complex, so group
| them together. This CL yields about 50% performance improvement on
| desktop i7-3770. Here are the data:
| before:
| Xfermode_Screen 8888: cmsecs = 30.25 565: cmsecs = 46.81
| Xfermode_Modulate 8888: cmsecs = 22.48 565: cmsecs = 40.06
| Xfermode_Plus 8888: cmsecs = 21.04 565: cmsecs = 37.51
| Xfermode_Xor 8888: cmsecs = 37.18 565: cmsecs = 52.53
| Xfermode_DstATop 8888: cmsecs = 28.97 565: cmsecs = 46.42
| Xfermode_SrcATop 8888: cmsecs = 29.74 565: cmsecs = 46.25
| Xfermode_DstOut 8888: cmsecs = 5.34 565: cmsecs = 24.53
| Xfermode_SrcOut 8888: cmsecs = 12.25 565: cmsecs = 24.39
| Xfermode_DstIn 8888: cmsecs = 5.30 565: cmsecs = 24.50
| Xfermode_SrcIn 8888: cmsecs = 12.05 565: cmsecs = 25.40
| Xfermode_DstOver 8888: cmsecs = 12.45 565: cmsecs = 0.15
| Xfermode_SrcOver 8888: cmsecs = 2.68 565: cmsecs = 4.42
| after:
| Xfermode_Screen 8888: cmsecs = 13.68 565: cmsecs = 21.73
| Xfermode_Modulate 8888: cmsecs = 13.25 565: cmsecs = 20.97
| Xfermode_Plus 8888: cmsecs = 9.77 565: cmsecs = 16.71
| Xfermode_Xor 8888: cmsecs = 17.64 565: cmsecs = 25.62
| Xfermode_DstATop 8888: cmsecs = 15.99 565: cmsecs = 23.74
| Xfermode_SrcATop 8888: cmsecs = 15.69 565: cmsecs = 23.40
| Xfermode_DstOut 8888: cmsecs = 4.77 565: cmsecs = 11.85
| Xfermode_SrcOut 8888: cmsecs = 4.98 565: cmsecs = 11.84
| Xfermode_DstIn 8888: cmsecs = 4.68 565: cmsecs = 11.72
| Xfermode_SrcIn 8888: cmsecs = 4.93 565: cmsecs = 11.79
| Xfermode_DstOver 8888: cmsecs = 5.04 565: cmsecs = 0.15
| Xfermode_SrcOver 8888: cmsecs = 2.69 565: cmsecs = 4.42
|
| BUG=skia:
| R=mtklein@google.com
|
| Author: qiankun.miao@intel.com
|
| Review URL: https://codereview.chromium.org/232793002
|
| git-svn-id: http://skia.googlecode.com/svn/trunk@14176 2bbb7eff-a529-9590-31e7-b0007b416f81
* Real fix for SK_API / Windows shared lib problems. (mtklein@google.com, 2014-04-09)
| | | | | | | Ben reviewed this over my shoulder, and we tested on his machine. git-svn-id: http://skia.googlecode.com/svn/trunk@14122 2bbb7eff-a529-9590-31e7-b0007b416f81
* SK_API for SkXfermode_opts_SSE2 so Chrome can initialize flattenables. (commit-bot@chromium.org, 2014-04-09)
| | | | | | | | | | | BUG=skia:2401 R=bungeman@google.com, mtklein@google.com Author: mtklein@chromium.org Review URL: https://codereview.chromium.org/231423003 git-svn-id: http://skia.googlecode.com/svn/trunk@14120 2bbb7eff-a529-9590-31e7-b0007b416f81
* Xfermode: SSE2 implementation of multiply_modeproc (commit-bot@chromium.org, 2014-04-09)
| | | | | | | | | | | | | | | | | | | | | | | | | | This patch implements basics for Xfermode SSE optimization. Based on these basics, SSE2 implementation of multiply_modeproc is provided. SSE2 implementation for other modes will come in future. With this patch performance of Xfermode_Multiply will improve about 45%. Here are the data on desktop i7-3770. before: Xfermode_Multiply 8888: cmsecs = 33.30 565: cmsecs = 45.65 after: Xfermode_Multiply 8888: cmsecs = 17.18 565: cmsecs = 24.87 BUG= Committed: http://code.google.com/p/skia/source/detail?r=14006 Committed: http://code.google.com/p/skia/source/detail?r=14050 R=mtklein@google.com, robertphillips@google.com Author: qiankun.miao@intel.com Review URL: https://codereview.chromium.org/202903004 git-svn-id: http://skia.googlecode.com/svn/trunk@14107 2bbb7eff-a529-9590-31e7-b0007b416f81
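The "about 45%" figure can be checked against the before/after numbers quoted in this commit. The sketch below assumes that `cmsecs` is a time measurement where lower is better and that the improvement is expressed as the relative reduction in time; `percent_time_saved` is a hypothetical helper name, not part of the benchmark tooling.

```cpp
#include <cassert>

// Hypothetical helper: relative reduction in time, in percent, when a
// benchmark goes from `before` to `after` (both in the same time unit).
inline double percent_time_saved(double before, double after) {
    return (before - after) / before * 100.0;
}

// Numbers taken from the commit message above (Xfermode_Multiply).
inline void check_multiply_numbers() {
    const double saved_8888 = percent_time_saved(33.30, 17.18);  // about 48.4
    const double saved_565  = percent_time_saved(45.65, 24.87);  // about 45.5
    assert(saved_8888 > 48.0 && saved_8888 < 49.0);
    assert(saved_565 > 45.0 && saved_565 < 46.0);
}
```

Under these assumptions both configurations land in the 45% to 49% range, in line with the rounded figure in the commit message.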
* ARM Skia NEON patches - 22 - S32_D565_Blend (commit-bot@chromium.org, 2014-04-09)
| BlitRow565: new NEON version of S32_D565_Blend
|
| This new implementation brings a good speedup in most cases and gives
| exact results (removes one mismatch in gm).
|
| Here are the benchmark results (speedup vs. existing S32A_D565_Blend):
|
| +-------+-----------+------------+
| | count | Cortex-A9 | Cortex-A15 |
| +-------+-----------+------------+
| | 1     | -26,7%    | -27,5%     |
| +-------+-----------+------------+
| | 2     | 0%        | +53%       |
| +-------+-----------+------------+
| | 4     | +38,3%    | +26,5%     |
| +-------+-----------+------------+
| | 8     | +10,9%    | -4,5%      |
| +-------+-----------+------------+
| | 16    | +18,2%    | +1,6%      |
| +-------+-----------+------------+
| | 64    | +22,3%    | +8,75%     |
| +-------+-----------+------------+
| | 256   | +12,3%    | +11,2%     |
| +-------+-----------+------------+
| | 1024  | +79,2%    | +10,9%     |
| +-------+-----------+------------+
|
| Signed-off-by: Kévin PETIT <kevin.petit@arm.com>
|
| BUG=skia:
| R=djsollen@google.com, mtklein@google.com
|
| Author: kevin.petit@arm.com
|
| Review URL: https://codereview.chromium.org/181523002
|
| git-svn-id: http://skia.googlecode.com/svn/trunk@14103 2bbb7eff-a529-9590-31e7-b0007b416f81
* Revert of Xfermode: SSE2 implementation of multiply_modeproc (https://codereview.chromium.org/202903004/) (commit-bot@chromium.org, 2014-04-03)
| Reason for revert:
| It looks like serialization is broken. The serialize and
| pipe-cross-process tests are failing and turning (at least the Ubuntu12
| and Win7) bots red
|
| Original issue's description:
| > Xfermode: SSE2 implementation of multiply_modeproc
| >
| > This patch implements basics for Xfermode SSE optimization. Based on
| > these basics, SSE2 implementation of multiply_modeproc is provided. SSE2
| > implementation for other modes will come in future. With this patch
| > performance of Xfermode_Multiply will improve about 45%. Here are the
| > data on desktop i7-3770.
| > before:
| > Xfermode_Multiply 8888: cmsecs = 33.30 565: cmsecs = 45.65
| > after:
| > Xfermode_Multiply 8888: cmsecs = 17.18 565: cmsecs = 24.87
| >
| > BUG=
| >
| > Committed: http://code.google.com/p/skia/source/detail?r=14006
| >
| > Committed: http://code.google.com/p/skia/source/detail?r=14050
|
| R=mtklein@google.com, qiankun.miao@intel.com
| TBR=mtklein@google.com, qiankun.miao@intel.com
| NOTREECHECKS=true
| NOTRY=true
| BUG=
|
| Author: robertphillips@google.com
|
| Review URL: https://codereview.chromium.org/224253003
|
| git-svn-id: http://skia.googlecode.com/svn/trunk@14053 2bbb7eff-a529-9590-31e7-b0007b416f81
* Xfermode: SSE2 implementation of multiply_modeproc (commit-bot@chromium.org, 2014-04-03)
| | | | | | | | | | | | | | | | | | | | | | | | This patch implements basics for Xfermode SSE optimization. Based on these basics, SSE2 implementation of multiply_modeproc is provided. SSE2 implementation for other modes will come in future. With this patch performance of Xfermode_Multiply will improve about 45%. Here are the data on desktop i7-3770. before: Xfermode_Multiply 8888: cmsecs = 33.30 565: cmsecs = 45.65 after: Xfermode_Multiply 8888: cmsecs = 17.18 565: cmsecs = 24.87 BUG= Committed: http://code.google.com/p/skia/source/detail?r=14006 R=mtklein@google.com, robertphillips@google.com Author: qiankun.miao@intel.com Review URL: https://codereview.chromium.org/202903004 git-svn-id: http://skia.googlecode.com/svn/trunk@14050 2bbb7eff-a529-9590-31e7-b0007b416f81