Original Mozilla bug: https://bugzilla.mozilla.org/show_bug.cgi?id=901208
R=reed@google.com, mtklein@google.com, reed1
BUG=skia:
Author: george@mozilla.com
Review URL: https://codereview.chromium.org/337853003
(https://codereview.chromium.org/289473009/)
NOTREECHECKS=true
NOTRY=true
Reason for revert:
Valgrind bot's seeing this code use uninitialized memory, and it's somehow blocking our roll into Chrome too:
> ld: warning: could not create compact unwind for
> S32A_Opaque_BlitRow32_SSE4_asm:
> stack subq instruction is too different from dwarf stack size
> [10339/10982 | 3247.792] PACKAGE FRAMEWORK "Chromium Framework.framework",
> POSTBUILDS
> FAILED: ./gyp-mac-tool package-framework "Chromium Framework.framework" A &&
> (export
> BUILT_PRODUCTS_DIR=/Volumes/data/b/build/slave/mac_gpu/build/src/out/Release;
> export CONFIGURATION=Release; export CONTENTS_FOLDER_PATH="Chromium
> Framework.framework/Versions/A"; export
> DYLIB_INSTALL_NAME_BASE=@executable_path/../Versions/37.0.2056.0; export
> EXECUTABLE_NAME="Chromium Framework"; export EXECUTABLE_PATH="Chromium
> Framework.framework/Versions/A/Chromium Framework"; export
> FULL_PRODUCT_NAME="Chromium Framework.framework"; export
> INFOPLIST_PATH="Chromium Framework.framework/Versions/A/Resources/Info.plist";
> export
> LD_DYLIB_INSTALL_NAME="@executable_path/../Versions/37.0.2056.0/Chromium
> Framework.framework/Chromium Framework"; export MACH_O_TYPE=mh_dylib; export
> PRODUCT_NAME="Chromium Framework"; export
> PRODUCT_TYPE=com.apple.product-type.framework; export
> SDKROOT=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.6.sdk;
> export
> SRCROOT=/Volumes/data/b/build/slave/mac_gpu/build/src/out/Release/../../chrome;
> export SOURCE_ROOT="${SRCROOT}"; export
> TARGET_BUILD_DIR=/Volumes/data/b/build/slave/mac_gpu/build/src/out/Release;
> export TEMP_DIR="${TMPDIR}"; export
> UNLOCALIZED_RESOURCES_FOLDER_PATH="Chromium
> Framework.framework/Versions/A/Resources"; export WRAPPER_NAME="Chromium
> Framework.framework"; (cd ../../chrome && ../build/mac/tweak_info_plist.py
> "--breakpad=1" "--breakpad_uploads=0" "--keystone=0" "--scm=1"
> "--branding=Chromium" && ln -fns Versions/Current/Libraries
> "${BUILT_PRODUCTS_DIR}/${WRAPPER_NAME}/Libraries" &&
> tools/build/mac/verify_order _ChromeMain
> "${BUILT_PRODUCTS_DIR}/${EXECUTABLE_PATH}"); G=$?; ((exit $G) || rm -rf
> 'Chromium Framework.framework') && exit $G) && touch "Chromium
> Framework.framework"
> tools/build/mac/verify_order: unordered symbols in
> /Volumes/data/b/build/slave/mac_gpu/build/src/out/Release/Chromium
> Framework.framework/Versions/A/Chromium Framework:
> S32A_Opaque_BlitRow32_SSE4_asm
> _S32A_Opaque_BlitRow32_SSE4_asm
> ninja: build stopped: subcommand failed.
Original issue's description:
> Add SSE4 optimization of S32A_Opaque_Blitrow
>
> Adds optimization of Skia S32A_Opaque_Blitrow blitter using SSE4.2 SIMD
> instruction set. Special case for when alpha is zero or opaque.
>
> Performance increase of 10%-400% compared to the existing SSE2
> optimization (measured on Silvermont architecture).
> Noticeable in ~25 different skia bench subtests, especially in
> bitmap_8888_*, repeatTile_*, and morph_*.
>
> bitmap_8888_A - 100% faster
> bitmap_8888_A_source_transparent - 250% faster
> bitmap_8888_A_source_opaque - 25% faster
> bitmap_8888_A_scale_bicubic - 75% faster
>
> Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>
>
> Committed: https://skia.googlesource.com/skia/+/e2527b147679b0c43019fae7d59cc3777d2d097e
>
> Committed: https://skia.googlesource.com/skia/+/b5c281e1e06af3be804309877de1dac6145686b9
R=reed@google.com, tomhudson@google.com, djsollen@google.com, joakim.landberg@intel.com, henrik.smiding@intel.com, mtklein@chromium.org
Author: mtklein@google.com
Review URL: https://codereview.chromium.org/336413007
Adds optimization of Skia S32A_Opaque_Blitrow blitter using SSE4.2 SIMD
instruction set. Special case for when alpha is zero or opaque.
Performance increase of 10%-400% compared to the existing SSE2
optimization (measured on Silvermont architecture).
Noticeable in ~25 different skia bench subtests, especially in
bitmap_8888_*, repeatTile_*, and morph_*.
bitmap_8888_A - 100% faster
bitmap_8888_A_source_transparent - 250% faster
bitmap_8888_A_source_opaque - 25% faster
bitmap_8888_A_scale_bicubic - 75% faster
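A scalar sketch of the per-pixel work an S32A_Opaque row blit does. It assumes premultiplied 32-bit pixels with alpha in the top byte. The function names are hypothetical, and this is not the SSE4.2 code from the patch. The sketch only shows where the zero-alpha and opaque special cases skip the general src-over blend `dst = src + dst * (256 - alpha) / 256`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: scales the four 8-bit channels of a packed pixel by
// scale/256 (scale in 0..256), processing two channels per multiply.
static inline uint32_t alpha_mul_q(uint32_t c, unsigned scale) {
    const uint32_t mask = 0x00FF00FFu;
    uint32_t rb = ((c & mask) * scale) >> 8;
    uint32_t ag = ((c >> 8) & mask) * scale;
    return (rb & mask) | (ag & ~mask);
}

// Hypothetical scalar reference for a premultiplied src-over row blit.
void s32a_opaque_row(uint32_t* dst, const uint32_t* src, int count) {
    assert(count >= 0);
    for (int i = 0; i < count; ++i) {
        uint32_t s = src[i];
        unsigned a = s >> 24;
        if (a == 0xFF) {
            dst[i] = s;                                   // opaque: plain copy
        } else if (a != 0) {
            dst[i] = s + alpha_mul_q(dst[i], 256 - a);    // general src-over
        }
        // a == 0: a premultiplied pixel is fully zero, so dst is unchanged.
    }
}
```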
Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>
Committed: https://skia.googlesource.com/skia/+/e2527b147679b0c43019fae7d59cc3777d2d097e
R=reed@google.com, mtklein@google.com, tomhudson@google.com, djsollen@google.com, joakim.landberg@intel.com
Author: henrik.smiding@intel.com
Review URL: https://codereview.chromium.org/289473009
benches and bots. (https://codereview.chromium.org/331193004/)
Reason for revert:
Experiment is over: disabling SSSE3 is a 25-50% perf regression for bitmap scaling on every machine we've got.
Original issue's description:
> Temporarily limit x86 SIMD to SSE2 only, to see effect on all benches and bots.
>
> BUG=372232
>
> Committed: https://skia.googlesource.com/skia/+/f1e5a04832e4d350f9ebf5d556c6d3897345f883
R=reed@google.com, mtklein@chromium.org
TBR=mtklein@chromium.org, reed@google.com
NOTREECHECKS=true
NOTRY=true
BUG=372232
Author: mtklein@google.com
Review URL: https://codereview.chromium.org/332213005
BUG=372232
R=reed@google.com, mtklein@google.com
Author: mtklein@chromium.org
Review URL: https://codereview.chromium.org/331193004
gain is ~40%
The following functions are optimized:
S32_D565_Blend
S32A_D565_Opaque_Dither
S32_D565_Opaque_Dither
S32_D565_Blend_Dither
S32A_D565_Opaque
S32A_D565_Blend
S32_Blend_BlitRow32
R=djsollen@google.com, teodora.petrovic@gmail.com
Author: djordje.pesut@imgtec.com
Review URL: https://codereview.chromium.org/326913004
This enables all 565 blitters except S32A_D565_Opaque.
Here are some performance results:
S32_D565_Opaque:
================
+-------+------------+------------+
| count | Cortex-A53 | Cortex-A57 |
+-------+------------+------------+
| 1 | -18.37% | -13.04% |
+-------+------------+------------+
| 2 | -9.90% | -13.78% |
+-------+------------+------------+
| 4 | -8.28% | -6.77% |
+-------+------------+------------+
| 8 | 157.63% | 78.15% |
+-------+------------+------------+
| 16 | 72.67% | 44.81% |
+-------+------------+------------+
| 64 | 76.78% | 40.89% |
+-------+------------+------------+
| 256 | 73.85% | 36.05% |
+-------+------------+------------+
| 1024 | 75.73% | 36.70% |
+-------+------------+------------+
S32_D565_Blend:
===============
+-------+------------+------------+
| count | Cortex-A53 | Cortex-A57 |
+-------+------------+------------+
| 1 | -9.99% | -13.79% |
+-------+------------+------------+
| 2 | -9.17% | -6.74% |
+-------+------------+------------+
| 4 | -6.73% | -4.42% |
+-------+------------+------------+
| 8 | 163.31% | 112.82% |
+-------+------------+------------+
| 16 | 55.21% | 44.68% |
+-------+------------+------------+
| 64 | 54.09% | 41.99% |
+-------+------------+------------+
| 256 | 52.63% | 40.64% |
+-------+------------+------------+
| 1024 | 52.46% | 40.45% |
+-------+------------+------------+
S32A_D565_Blend:
================
+-------+------------+------------+
| count | Cortex-A53 | Cortex-A57 |
+-------+------------+------------+
| 1 | -5.88% | -6.06% |
+-------+------------+------------+
| 2 | -4.74% | -0.01% |
+-------+------------+------------+
| 4 | -5.42% | -3.03% |
+-------+------------+------------+
| 8 | 78.78% | 77.96% |
+-------+------------+------------+
| 16 | 98.19% | 79.61% |
+-------+------------+------------+
| 64 | 111.56% | 72.60% |
+-------+------------+------------+
| 256 | 113.80% | 69.96% |
+-------+------------+------------+
| 1024 | 114.42% | 70.85% |
+-------+------------+------------+
S32_D565_Opaque_Dither:
=======================
+-------+------------+------------+
| count | Cortex-A53 | Cortex-A57 |
+-------+------------+------------+
| 1 | -4.18% | -0.93% |
+-------+------------+------------+
| 2 | -2.43% | -2.04% |
+-------+------------+------------+
| 4 | -1.09% | -1.23% |
+-------+------------+------------+
| 8 | 184.89% | 136.53% |
+-------+------------+------------+
| 16 | 128.64% | 89.11% |
+-------+------------+------------+
| 64 | 132.68% | 100.98% |
+-------+------------+------------+
| 256 | 157.02% | 100.86% |
+-------+------------+------------+
| 1024 | 163.85% | 103.62% |
+-------+------------+------------+
S32_D565_Blend_Dither:
======================
+-------+------------+------------+
| count | Cortex-A53 | Cortex-A57 |
+-------+------------+------------+
| 1 | -4.87% | 0.01% |
+-------+------------+------------+
| 2 | -2.71% | 2.97% |
+-------+------------+------------+
| 4 | -2.20% | 0.28% |
+-------+------------+------------+
| 8 | 149.76% | 146.80% |
+-------+------------+------------+
| 16 | 85.69% | 95.77% |
+-------+------------+------------+
| 64 | 88.81% | 101.39% |
+-------+------------+------------+
| 256 | 97.32% | 107.22% |
+-------+------------+------------+
| 1024 | 98.08% | 115.71% |
+-------+------------+------------+
S32A_D565_Opaque_Dither:
========================
+-------+------------+------------+
| count | Cortex-A53 | Cortex-A57 |
+-------+------------+------------+
| 1 | -1.86% | 0.02% |
+-------+------------+------------+
| 2 | -0.58% | -1.52% |
+-------+------------+------------+
| 4 | -0.75% | 1.16% |
+-------+------------+------------+
| 8 | 240.74% | 155.16% |
+-------+------------+------------+
| 16 | 181.97% | 132.15% |
+-------+------------+------------+
| 64 | 203.11% | 136.48% |
+-------+------------+------------+
| 256 | 223.45% | 133.05% |
+-------+------------+------------+
| 1024 | 225.96% | 134.05% |
+-------+------------+------------+
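In every table above, counts below 8 regress slightly and counts of 8 or more improve sharply. One plausible reading, which the commit does not state, is that the NEON path works on blocks of 8 pixels and shorter rows fall back to scalar code. The sketch below shows that loop shape for an opaque 8888-to-565 conversion, with plain scalar code standing in for the vector body. The names and the channel layout (R in bits 16-23, G in 8-15, B in 0-7) are assumptions.

```cpp
#include <cassert>
#include <cstdint>

// Packs 8-bit R, G, B into RGB565 by keeping the top 5/6/5 bits.
static inline uint16_t pack_565(unsigned r, unsigned g, unsigned b) {
    return static_cast<uint16_t>(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

// Assumed layout: R in bits 16-23, G in bits 8-15, B in bits 0-7.
static inline uint16_t pixel_to_565(uint32_t c) {
    return pack_565((c >> 16) & 0xFF, (c >> 8) & 0xFF, c & 0xFF);
}

// Hypothetical row converter: blocks of 8 first, then a scalar tail.
void s32_d565_opaque_row(uint16_t* dst, const uint32_t* src, int count) {
    assert(count >= 0);
    int i = 0;
    for (; i + 8 <= count; i += 8) {
        for (int j = 0; j < 8; ++j) {          // stand-in for one 8-lane vector step
            dst[i + j] = pixel_to_565(src[i + j]);
        }
    }
    for (; i < count; ++i) {                   // scalar tail for the last count % 8 pixels
        dst[i] = pixel_to_565(src[i]);
    }
}
```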
Signed-off-by: Kévin PETIT <kevin.petit@arm.com>
BUG=skia:
R=djsollen@google.com, mtklein@google.com
Author: kevin.petit@arm.com
Review URL: https://codereview.chromium.org/317193003
(https://codereview.chromium.org/289473009/)
Reason for revert:
Buildbot failures on Mac 10.6 and Mac 10.7.
R=reed@google.com, mtklein@google.com, tomhudson@google.com, djsollen@google.com, joakim.landberg@intel.com, henrik.smiding@intel.com
TBR=reed@google.com
NOTRY=True
Original issue's description:
> Add SSE4 optimization of S32A_Opaque_Blitrow
>
> Adds optimization of Skia S32A_Opaque_Blitrow blitter using SSE4.2 SIMD
> instruction set. Special case for when alpha is zero or opaque.
>
> Performance increase of 10%-400% compared to the existing SSE2
> optimization (measured on Silvermont architecture).
> Noticeable in ~25 different skia bench subtests, especially in
> bitmap_8888_*, repeatTile_*, and morph_*.
>
> bitmap_8888_A - 100% faster
> bitmap_8888_A_source_transparent - 250% faster
> bitmap_8888_A_source_opaque - 25% faster
> bitmap_8888_A_scale_bicubic - 75% faster
>
> Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>
>
> Committed: https://skia.googlesource.com/skia/+/e2527b147679b0c43019fae7d59cc3777d2d097e
Author: jvanverth@google.com
Review URL: https://codereview.chromium.org/311053009
Adds optimization of Skia S32A_Opaque_Blitrow blitter using SSE4.2 SIMD
instruction set. Special case for when alpha is zero or opaque.
Performance increase of 10%-400% compared to the existing SSE2
optimization (measured on Silvermont architecture).
Noticeable in ~25 different skia bench subtests, especially in
bitmap_8888_*, repeatTile_*, and morph_*.
bitmap_8888_A - 100% faster
bitmap_8888_A_source_transparent - 250% faster
bitmap_8888_A_source_opaque - 25% faster
bitmap_8888_A_scale_bicubic - 75% faster
Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>
R=reed@google.com, mtklein@google.com, tomhudson@google.com, djsollen@google.com, joakim.landberg@intel.com
Author: henrik.smiding@intel.com
Review URL: https://codereview.chromium.org/289473009
That's what it means. It keeps confusing us as named today.
BUG=skia:
R=djsollen@google.com, mtklein@google.com, reed@google.com
Author: mtklein@chromium.org
Review URL: https://codereview.chromium.org/314643004
Enable NEON on arm64 for most 8888 blitters
This patch enables NEON optimisation for the Color32, S32_Blend,
S32A_Opaque blitters on arm64.
Here are the perf improvements vs the existing code:
Color32:
========
+-------+------------+------------+
| count | Cortex-A53 | Cortex-A57 |
+-------+------------+------------+
| 1 | -2.39% | 23.78% |
+-------+------------+------------+
| 2 | -5.46% | 8.88% |
+-------+------------+------------+
| 4 | -4.74% | 4.89% |
+-------+------------+------------+
| 8 | 67.74% | 107.12% |
+-------+------------+------------+
| 16 | 40.03% | 101.20% |
+-------+------------+------------+
| 64 | 11.09% | 98.40% |
+-------+------------+------------+
| 256 | -2.20% | 74.81% |
+-------+------------+------------+
| 1024 | -4.28% | 78.90% |
+-------+------------+------------+
S32_Blend:
==========
+-------+------------+------------+
| count | Cortex-A53 | Cortex-A57 |
+-------+------------+------------+
| 1 | 7.84% | -6.75% |
+-------+------------+------------+
| 2 | 28.95% | 39.77% |
+-------+------------+------------+
| 4 | 5.80% | 8.26% |
+-------+------------+------------+
| 8 | 1.35% | 33.80% |
+-------+------------+------------+
| 16 | -2.13% | 41.13% |
+-------+------------+------------+
| 64 | -4.91% | 42.84% |
+-------+------------+------------+
| 256 | -6.53% | 48.72% |
+-------+------------+------------+
| 1024 | -6.65% | 46.66% |
+-------+------------+------------+
S32A_Opaque:
============
+-------+------------+------------+
| count | Cortex-A53 | Cortex-A57 |
+-------+------------+------------+
| 1 | -7.51% | -19.06% |
+-------+------------+------------+
| 2 | -5.02% | -27.70% |
+-------+------------+------------+
| 4 | 15.38% | -21.66% |
+-------+------------+------------+
| 8 | -0.98% | 1.05% |
+-------+------------+------------+
| 16 | -7.35% | 3.34% |
+-------+------------+------------+
| 64 | 50.53% | 94.63% |
+-------+------------+------------+
| 256 | 71.17% | 164.10% |
+-------+------------+------------+
| 1024 | 79.58% | 197.60% |
+-------+------------+------------+
Signed-off-by: Kevin PETIT <kevin.petit@arm.com>
BUG=skia:
R=djsollen@google.com, mtklein@google.com
Author: kevin.petit@arm.com
Review URL: https://codereview.chromium.org/302283003
clone of https://codereview.chromium.org/305133006/
TBR=
BUG=skia:
Author: reed@google.com
Review URL: https://codereview.chromium.org/301233011
When reading an SkSSE2ProcCoeffXfermode object, fProcSIMD should never be NULL. SkPlatformXfermodeFactory_impl_SSE2() is the only function that creates these objects, and it never creates one with a NULL fProcSIMD. So if we read one with a NULL fProcSIMD, the data is clearly invalid.
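A minimal sketch of this kind of read-time validation. Every name is hypothetical, and none of it is Skia's API. The SIMD proc is looked up from a serialized mode index. A mode that has no SIMD proc, or an out-of-range mode, marks the reader invalid instead of producing an object the factory could never have created.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

using BlendProc = uint32_t (*)(uint32_t src, uint32_t dst);

// Placeholder standing in for a real SSE2 blend routine.
static uint32_t placeholder_proc(uint32_t src, uint32_t) { return src; }

// Hypothetical per-mode table: a null entry means "no SIMD version", so the
// factory would never have built an SSE2 object for that mode.
static const BlendProc kSimdProcs[] = {placeholder_proc, nullptr, placeholder_proc};
constexpr std::size_t kModeCount = sizeof(kSimdProcs) / sizeof(kSimdProcs[0]);

// Hypothetical reader state: once invalid, it stays invalid.
struct Reader {
    bool valid = true;
    void validate(bool ok) { valid = valid && ok; }
};

// Rebuilds the SIMD proc from a serialized mode index; returns nullptr and
// marks the reader invalid if the result breaks the factory's invariant.
BlendProc read_simd_proc(Reader& reader, uint32_t mode) {
    BlendProc proc = mode < kModeCount ? kSimdProcs[mode] : nullptr;
    reader.validate(proc != nullptr);
    return reader.valid ? proc : nullptr;
}
```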
BUG=379181
R=reed@google.com, mtklein@google.com
Author: sugoi@chromium.org
Review URL: https://codereview.chromium.org/306183002
git-svn-id: http://skia.googlecode.com/svn/trunk@15000 2bbb7eff-a529-9590-31e7-b0007b416f81
BUG=skia:
R=robertphillips@google.com
Author: reed@google.com
Review URL: https://codereview.chromium.org/303543009
git-svn-id: http://skia.googlecode.com/svn/trunk@14959 2bbb7eff-a529-9590-31e7-b0007b416f81
With an SSE2 version of memcpy32, S32_Opaque_BlitRow32() in SkBlitRow_D32.cpp
is about 30% faster. Here are the data from a desktop
i7-3770:
before:
bitmap_scale_filter_90_90 8888: cmsecs = 2.01
bitmaprect_FF_filter_trans 8888: cmsecs = 3.61
bitmaprect_FF_nofilter_trans 8888: cmsecs = 3.57
bitmaprect_FF_filter_identity 8888: cmsecs = 3.53
bitmaprect_FF_nofilter_identity 8888: cmsecs = 3.53
bitmap_4444_update 8888: cmsecs = 4.84
bitmap_4444_update_volatile 8888: cmsecs = 4.81
bitmap_4444 8888: cmsecs = 4.81
after:
bitmap_scale_filter_90_90 8888: cmsecs = 1.83
bitmaprect_FF_filter_trans 8888: cmsecs = 2.36
bitmaprect_FF_nofilter_trans 8888: cmsecs = 2.36
bitmaprect_FF_filter_identity 8888: cmsecs = 2.60
bitmaprect_FF_nofilter_identity 8888: cmsecs = 2.63
bitmap_4444_update 8888: cmsecs = 3.30
bitmap_4444_update_volatile 8888: cmsecs = 3.30
bitmap_4444 8888: cmsecs = 3.29
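A sketch of an SSE2 32-bit copy loop in the spirit of memcpy32. It uses unaligned 128-bit loads and stores (`_mm_loadu_si128` / `_mm_storeu_si128` from `<emmintrin.h>`) and finishes with a scalar tail. The function name is hypothetical and this is not the committed implementation. Builds without `__SSE2__` defined (including MSVC, which does not define that macro) simply take the scalar loop.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

// Copies count 32-bit values from src to dst (buffers must not overlap).
// SSE2 builds move 16 values per iteration as four unaligned 128-bit
// loads and stores; the remainder, and non-SSE2 builds, use a scalar loop.
void memcpy32_sse2_sketch(uint32_t* dst, const uint32_t* src, int count) {
    assert(count >= 0);
    int i = 0;
#if defined(__SSE2__)
    for (; i + 16 <= count; i += 16) {
        const __m128i* s = reinterpret_cast<const __m128i*>(src + i);
        __m128i* d = reinterpret_cast<__m128i*>(dst + i);
        __m128i a = _mm_loadu_si128(s + 0);
        __m128i b = _mm_loadu_si128(s + 1);
        __m128i c = _mm_loadu_si128(s + 2);
        __m128i e = _mm_loadu_si128(s + 3);
        _mm_storeu_si128(d + 0, a);
        _mm_storeu_si128(d + 1, b);
        _mm_storeu_si128(d + 2, c);
        _mm_storeu_si128(d + 3, e);
    }
#endif
    for (; i < count; ++i) {
        dst[i] = src[i];
    }
}
```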
BUG=skia:
R=mtklein@google.com, reed@google.com, bsalomon@google.com
Author: qiankun.miao@intel.com
Review URL: https://codereview.chromium.org/285313002
git-svn-id: http://skia.googlecode.com/svn/trunk@14822 2bbb7eff-a529-9590-31e7-b0007b416f81
BUG=chromium:374796
NOTREECHECKS=true
R=fmalita@chromium.org, mtklein@google.com
Author: mtklein@chromium.org
Review URL: https://codereview.chromium.org/292563005
git-svn-id: http://skia.googlecode.com/svn/trunk@14816 2bbb7eff-a529-9590-31e7-b0007b416f81
Adds the missing include for smmintrin.h in the
SkBlurImage_opts_SSE2.cpp file.
Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>
BUG=chromium:374796
TEST=Unknown
R=tomhudson@chromium.org, vapier@chromium.org, reed@chromium.org, bsalomon@chromium.org, dgreid@chromium.org, dgarrett@chromium.org, michaelpg@chromium.org, vandebo@chromium.org
Author: henrik.smiding@intel.com
Review URL: https://codereview.chromium.org/290923002
git-svn-id: http://skia.googlecode.com/svn/trunk@14792 2bbb7eff-a529-9590-31e7-b0007b416f81
Adds a build-time SSE4 check to SkBlurImage_opts_SSE2.cpp
in the SkBoxBlur_SSE2 function.
Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>
R=reed@google.com, mtklein@google.com, tomhudson@google.com, djsollen@google.com, joakim.landberg@intel.com, senorblanco@chromium.org
Author: henrik.smiding@intel.com
Review URL: https://codereview.chromium.org/281963002
git-svn-id: http://skia.googlecode.com/svn/trunk@14750 2bbb7eff-a529-9590-31e7-b0007b416f81
The functions are rather performance critical and already marked
'inline'. However, Chrome for Android does not inline them, because it
is compiled with -Os and a small -finline-limit.
This avoids one call in the filtering functions.
Does not increase the library size.
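One common way to force inlining despite -Os and a small -finline-limit is the GCC/Clang `always_inline` attribute; MSVC's equivalent is `__forceinline`. The macro, helper and loop below are hypothetical stand-ins, not the functions this commit touches.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical macro: request inlining even when size heuristics
// (-Os, -finline-limit) would decline.
#if defined(__GNUC__) || defined(__clang__)
#define FORCE_INLINE inline __attribute__((always_inline))
#elif defined(_MSC_VER)
#define FORCE_INLINE __forceinline
#else
#define FORCE_INLINE inline
#endif

// Hypothetical per-pixel helper: rounded average of two values.
static FORCE_INLINE uint32_t avg_round(uint32_t a, uint32_t b) {
    return (a + b + 1) >> 1;
}

// Hypothetical filter loop writing count - 1 outputs; the helper is
// requested inline regardless of the compiler's size heuristics.
void horizontal_avg(uint32_t* dst, const uint32_t* src, int count) {
    assert(count >= 1);
    for (int i = 0; i + 1 < count; ++i) {
        dst[i] = avg_round(src[i], src[i + 1]);
    }
}
```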
BUG=chromium:363073
R=mtklein@google.com
Author: kkinnunen@nvidia.com
Review URL: https://codereview.chromium.org/280403005
git-svn-id: http://skia.googlecode.com/svn/trunk@14709 2bbb7eff-a529-9590-31e7-b0007b416f81
This is to optimize SkAlphaMulQ() in PIC mode. With the symbol at
visibility=default, the constant's value is not known at compile time (so it is
not really a constant); instead it is fetched through a double indirection via
the GOT. The function is quite hot on one of the Chromium benchmarks:
rasterize_and_record_micro.key_silk_cases.
This change replaces the symbol with a compile-time constant. As a bonus, the
variable is no longer exported from the dynamic library, i.e. the library
interface is cleaner.
See specific performance improvements on Android here:
http://goo.gl/iMuTDt
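A sketch of the idea. It assumes the mask was previously an exported global, something like a hypothetical `extern const uint32_t gMask;` defined in a .cpp file. A `constexpr` value instead lets the compiler fold the mask into an immediate operand. PIC code then needs no GOT lookup, and no symbol is exported. The multiply follows the usual two-channels-at-a-time layout; its name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Compile-time constant: usable as an immediate, never loaded through the GOT.
constexpr uint32_t kMask_00FF00FF = 0x00FF00FFu;

// SkAlphaMulQ-style helper (hypothetical name): scales all four 8-bit
// channels of c by scale/256, with scale in 0..256.
static inline uint32_t alpha_mul_q(uint32_t c, unsigned scale) {
    uint32_t rb = ((c & kMask_00FF00FF) * scale) >> 8;   // red and blue
    uint32_t ag = ((c >> 8) & kMask_00FF00FF) * scale;   // alpha and green
    return (rb & kMask_00FF00FF) | (ag & ~kMask_00FF00FF);
}
```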
R=skyostil@chromium.org, tomhudson@chromium.org, mtklein@google.com, reed@google.com, tomhudson@google.com
Author: pasko@chromium.org
Review URL: https://codereview.chromium.org/270473003
git-svn-id: http://skia.googlecode.com/svn/trunk@14696 2bbb7eff-a529-9590-31e7-b0007b416f81
Replaces the current build/run-time checks for SSE level in
opts_check_x86.cpp with a simpler and more future-proof version.
Also adds SSE versions 4.1 and 4.2 to the config file.
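A portable sketch of a future-proof level check. It assumes levels are encoded as ordered integers, so callers can test "at least SSSE3" with `>=`. The enum values and function names are hypothetical. The bit positions are the documented CPUID leaf 1 feature flags: EDX bit 25 is SSE, bit 26 is SSE2; ECX bit 0 is SSE3, bit 9 SSSE3, bit 19 SSE4.1 and bit 20 SSE4.2. Executing CPUID itself is left out so the mapping compiles on any architecture. The build-time level relies on GCC/Clang predefined macros.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical ordered encoding; new levels slot in without touching callers.
enum SseLevel : int {
    kNoSSE = 0, kSSE1 = 10, kSSE2 = 20, kSSE3 = 30, kSSSE3 = 31, kSSE41 = 41, kSSE42 = 42,
};

// Maps CPUID leaf 1 ECX/EDX feature words to the highest level, newest first.
SseLevel sse_level_from_cpuid1(uint32_t ecx, uint32_t edx) {
    if (ecx & (1u << 20)) return kSSE42;
    if (ecx & (1u << 19)) return kSSE41;
    if (ecx & (1u << 9))  return kSSSE3;
    if (ecx & (1u << 0))  return kSSE3;
    if (edx & (1u << 26)) return kSSE2;
    if (edx & (1u << 25)) return kSSE1;
    return kNoSSE;
}

// Build-time level from GCC/Clang predefined macros (MSVC does not define
// these, so it would report kNoSSE here).
constexpr SseLevel kBuildSseLevel =
#if defined(__SSE4_2__)
    kSSE42;
#elif defined(__SSE4_1__)
    kSSE41;
#elif defined(__SSSE3__)
    kSSSE3;
#elif defined(__SSE3__)
    kSSE3;
#elif defined(__SSE2__)
    kSSE2;
#elif defined(__SSE__)
    kSSE1;
#else
    kNoSSE;
#endif
```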
Author: henrik.smiding@intel.com
Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>
Committed: http://code.google.com/p/skia/source/detail?r=14644
R=reed@google.com, mtklein@google.com, tomhudson@google.com, djsollen@google.com, joakim.landberg@intel.com
Author: henrik.smiding@intel.com
Review URL: https://codereview.chromium.org/272503006
git-svn-id: http://skia.googlecode.com/svn/trunk@14693 2bbb7eff-a529-9590-31e7-b0007b416f81
(https://codereview.chromium.org/272503006/)
Reason for revert:
Windows builders breaking. :(
Original issue's description:
> Improved x86 SSE build and run-time checks.
>
> Replaces the current build/run-time checks for SSE level in
> opts_check_x86.cpp with a simpler and more future-proof version.
> Also adds SSE versions 4.1 and 4.2 to the config file.
>
> Author: henrik.smiding@intel.com
>
> Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>
>
> Committed: http://code.google.com/p/skia/source/detail?r=14644
R=reed@google.com, tomhudson@google.com, djsollen@google.com, joakim.landberg@intel.com, henrik.smiding@intel.com
TBR=djsollen@google.com, henrik.smiding@intel.com, joakim.landberg@intel.com, reed@google.com, tomhudson@google.com
NOTREECHECKS=true
NOTRY=true
Author: mtklein@google.com
Review URL: https://codereview.chromium.org/277593004
git-svn-id: http://skia.googlecode.com/svn/trunk@14646 2bbb7eff-a529-9590-31e7-b0007b416f81
Replaces the current build/run-time checks for SSE level in
opts_check_x86.cpp with a simpler and more future-proof version.
Also adds SSE versions 4.1 and 4.2 to the config file.
Author: henrik.smiding@intel.com
Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>
R=reed@google.com, mtklein@google.com, tomhudson@google.com, djsollen@google.com, joakim.landberg@intel.com
Author: henrik.smiding@intel.com
Review URL: https://codereview.chromium.org/272503006
git-svn-id: http://skia.googlecode.com/svn/trunk@14644 2bbb7eff-a529-9590-31e7-b0007b416f81
General cleanup of optimization files for x86/SSEx.
Renamed opts_check_SSE2.cpp to opts_check_x86.cpp, since it is not specific
to SSE2. Commented out the ColorRect32 optimization, which is
disabled anyway, to make that more visible.
Also fixed a lot of indentation, inclusion guards, spelling,
copyright headers, braces, whitespace, and sorting of includes.
Author: henrik.smiding@intel.com
Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>
R=reed@google.com, mtklein@google.com, tomhudson@google.com, djsollen@google.com, joakim.landberg@intel.com
Author: henrik.smiding@intel.com
Review URL: https://codereview.chromium.org/264603002
git-svn-id: http://skia.googlecode.com/svn/trunk@14464 2bbb7eff-a529-9590-31e7-b0007b416f81
Signed-off-by: Kévin PETIT <kevin.petit@arm.com>
BUG=skia:
R=djsollen@google.com, mtklein@google.com
Author: kevin.petit@arm.com
Review URL: https://codereview.chromium.org/263553008
git-svn-id: http://skia.googlecode.com/svn/trunk@14456 2bbb7eff-a529-9590-31e7-b0007b416f81
Convert Color32 to intrinsics
This change is performance-neutral for high values of count and is
a big improvement for values smaller than 64.
Signed-off-by: Kévin PETIT <kevin.petit@arm.com>
BUG=skia:
R=djsollen@google.com, mtklein@google.com, borenet@google.com
Author: kevin.petit@arm.com
Review URL: https://codereview.chromium.org/258173005
git-svn-id: http://skia.googlecode.com/svn/trunk@14435 2bbb7eff-a529-9590-31e7-b0007b416f81
Currently, the S32_D16_filter_DX_SSE2 optimization is only used in
configurations where the maximum SSE level is SSE2.
This patch enables it for higher levels as well, and fixes a color
conversion bug when the subpixels are converted into RGB565 format.
Also, refactored the function a bit, to make future modifications
less error-prone.
Author: henrik.smiding@intel.com
Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>
Committed: http://code.google.com/p/skia/source/detail?r=14333
R=reed@google.com, mtklein@google.com, tomhudson@google.com, djsollen@google.com, joakim.landberg@intel.com
Author: henrik.smiding@intel.com
Review URL: https://codereview.chromium.org/239453010
git-svn-id: http://skia.googlecode.com/svn/trunk@14403 2bbb7eff-a529-9590-31e7-b0007b416f81
With SSE2 optimization, the two related benchmarks improve by
about 45% on a desktop i7-3770. Here are the data:
before:
Xfermode_Lighten 8888: cmsecs = 33.60 565: cmsecs = 48.84
Xfermode_Darken 8888: cmsecs = 34.16 565: cmsecs = 48.99
after:
Xfermode_Lighten 8888: cmsecs = 18.71 565: cmsecs = 25.41
Xfermode_Darken 8888: cmsecs = 18.39 565: cmsecs = 25.40
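For reference, the per-channel math for these two modes on premultiplied values can be written branch-free with min/max. That is the form a SIMD version needs, because every lane runs the same instructions. This is a scalar sketch of the standard separable-blend formulas, not the SSE2 code from the patch, and the helper names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// Rounded x / 255 (nearest integer) for non-negative x.
static inline int div255_round(int x) {
    return (x + 127) / 255;
}

// Premultiplied, per channel (sc/dc: color, sa/da: alpha, all 0..255):
//   darken  = sc + dc - max(sc*da, dc*sa) / 255
//   lighten = sc + dc - min(sc*da, dc*sa) / 255
static inline int darken_channel(int sc, int dc, int sa, int da) {
    return sc + dc - div255_round(std::max(sc * da, dc * sa));
}

static inline int lighten_channel(int sc, int dc, int sa, int da) {
    return sc + dc - div255_round(std::min(sc * da, dc * sa));
}
```

With both alphas at 255, these reduce to min(sc, dc) and max(sc, dc).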
BUG=skia:
R=mtklein@google.com
Author: qiankun.miao@intel.com
Review URL: https://codereview.chromium.org/234653002
git-svn-id: http://skia.googlecode.com/svn/trunk@14395 2bbb7eff-a529-9590-31e7-b0007b416f81
With SSE2 optimization, Xfermode_ColorDodge improves by about 45% and
Xfermode_ColorBurn improves only slightly on a desktop i7-3770. The small
improvement for Xfermode_ColorBurn is because the portable version probably
takes the fast if-branch most of the time, while the SSE2 version computes
all three if-else branches. Here are the data:
before:
Xfermode_ColorDodge 8888: cmsecs = 73.71 565: cmsecs = 82.88
Xfermode_ColorBurn 8888: cmsecs = 46.46 565: cmsecs = 52.23
after:
Xfermode_ColorDodge 8888: cmsecs = 39.70 565: cmsecs = 47.45
Xfermode_ColorBurn 8888: cmsecs = 45.02 565: cmsecs = 51.15
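The ColorBurn explanation comes down to how SIMD code handles an if/else chain. Each lane computes every branch, and a mask then selects the result. The cost is therefore the sum of all branches, even when scalar code would almost always take the cheap one. The branch bodies below are hypothetical placeholders, not ColorBurn's math. The sketch only shows the mask-select pattern and that it agrees with the branchy version.

```cpp
#include <cassert>
#include <cstdint>

// Placeholder branch bodies (hypothetical, not ColorBurn's real math).
static int32_t cheap_case(int32_t) { return 255; }
static int32_t zero_case(int32_t) { return 0; }
static int32_t costly_case(int32_t x) { return (x * 7) / 3; }

// Scalar form: only one branch body runs per input.
int32_t blend_branchy(int32_t x) {
    if (x == 255) return cheap_case(x);
    if (x == 0) return zero_case(x);
    return costly_case(x);
}

// SIMD-style form: every branch body runs, then all-ones/all-zeros masks
// keep exactly one result (the and/andnot/or pattern, one lane at a time).
int32_t blend_masked(int32_t x) {
    int32_t r_cheap = cheap_case(x);
    int32_t r_zero = zero_case(x);
    int32_t r_costly = costly_case(x);
    int32_t m_cheap = -static_cast<int32_t>(x == 255);            // -1 or 0
    int32_t m_zero = -static_cast<int32_t>(x == 0) & ~m_cheap;
    int32_t m_costly = ~(m_cheap | m_zero);
    return (r_cheap & m_cheap) | (r_zero & m_zero) | (r_costly & m_costly);
}
```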
BUG=skia:
R=mtklein@google.com
Author: qiankun.miao@intel.com
Review URL: https://codereview.chromium.org/224823004
git-svn-id: http://skia.googlecode.com/svn/trunk@14377 2bbb7eff-a529-9590-31e7-b0007b416f81
With SSE2 optimization, Xfermode_SoftLight improves by about
30% on a desktop i7-3770. Here are the data:
before:
Xfermode_SoftLight 8888: cmsecs = 379.44 565: cmsecs = 387.74
after:
Xfermode_SoftLight 8888: cmsecs = 272.29 565: cmsecs = 284.31
BUG=skia:
R=mtklein@google.com
Author: qiankun.miao@intel.com
Review URL: https://codereview.chromium.org/236363012
git-svn-id: http://skia.googlecode.com/svn/trunk@14376 2bbb7eff-a529-9590-31e7-b0007b416f81
With SSE2 optimization, Xfermode_Difference improves by about
60% on a desktop i7-3770. Here are the data:
before:
Xfermode_Difference 8888: cmsecs = 51.10 565: cmsecs = 66.39
after:
Xfermode_Difference 8888: cmsecs = 21.10 565: cmsecs = 29.33
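A scalar sketch of the per-channel Difference formula on premultiplied values, following the standard separable-blend definition. The helper names are hypothetical and this is not the SSE2 code from the patch. The only data-dependent choice is a `min`, so the expression maps naturally onto SIMD lanes.

```cpp
#include <algorithm>
#include <cassert>

// Rounded x / 255 (nearest integer) for non-negative x.
static inline int div255_round(int x) { return (x + 127) / 255; }

// Premultiplied difference, per channel (all inputs 0..255):
//   sc + dc - 2 * min(sc*da, dc*sa) / 255, clamped to 0..255
static inline int difference_channel(int sc, int dc, int sa, int da) {
    int r = sc + dc - 2 * div255_round(std::min(sc * da, dc * sa));
    return std::min(std::max(r, 0), 255);
}
```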
BUG=skia:
R=mtklein@google.com
Author: qiankun.miao@intel.com
Review URL: https://codereview.chromium.org/234433003
git-svn-id: http://skia.googlecode.com/svn/trunk@14375 2bbb7eff-a529-9590-31e7-b0007b416f81
With SSE2 optimization, Xfermode_HardLight improves by about
45% on a desktop i7-3770. Here are the data:
before:
Xfermode_HardLight 8888: cmsecs = 48.43 565: cmsecs = 63.11
after:
Xfermode_HardLight 8888: cmsecs = 25.71 565: cmsecs = 33.46
BUG=skia:
R=mtklein@google.com
Author: qiankun.miao@intel.com
Review URL: https://codereview.chromium.org/229003004
git-svn-id: http://skia.googlecode.com/svn/trunk@14373 2bbb7eff-a529-9590-31e7-b0007b416f81
With SSE2 optimization, Xfermode_Exclusion improves by about
50% on a desktop i7-3770. Here are the data:
before:
Xfermode_Exclusion 8888: cmsecs = 40.17 565: cmsecs = 55.22
after:
Xfermode_Exclusion 8888: cmsecs = 18.53 565: cmsecs = 26.55
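In the standard separable formula for Exclusion on premultiplied values, the alpha terms cancel. The result is `sc + dc - 2*sc*dc/255` per channel, which makes it one of the simpler modes to vectorize. A scalar sketch with a hypothetical name, not the patch's SSE2 code:

```cpp
#include <cassert>

// Premultiplied exclusion, per channel (inputs 0..255):
//   sc + dc - 2*sc*dc/255  ==  (255*(sc + dc) - 2*sc*dc) / 255
// The numerator equals sc*(255 - dc) + dc*(255 - sc), so it is never
// negative and one rounded division covers the whole expression.
static inline int exclusion_channel(int sc, int dc) {
    int numerator = 255 * (sc + dc) - 2 * sc * dc;
    return (numerator + 127) / 255;
}
```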
BUG=skia:
R=mtklein@google.com
Author: qiankun.miao@intel.com
Review URL: https://codereview.chromium.org/233733005
git-svn-id: http://skia.googlecode.com/svn/trunk@14371 2bbb7eff-a529-9590-31e7-b0007b416f81
With SSE2 optimization, Xfermode_Overlay improves by about
35% on a desktop i7-3770. Here are the data:
before:
Xfermode_Overlay 8888: cmsecs = 44.17 565: cmsecs = 59.27
after:
Xfermode_Overlay 8888: cmsecs = 28.30 565: cmsecs = 35.84
BUG=skia:
R=mtklein@google.com
Author: qiankun.miao@intel.com
Review URL: https://codereview.chromium.org/232783002
git-svn-id: http://skia.googlecode.com/svn/trunk@14370 2bbb7eff-a529-9590-31e7-b0007b416f81
The emulator is the one case where the Android framework can be
compiled without SSSE3 but be expected to run on a device with
SSSE3. In that case we just disable all SSSE3 options to be safe.
R=scroggo@google.com
Author: djsollen@google.com
Review URL: https://codereview.chromium.org/249883004
git-svn-id: http://skia.googlecode.com/svn/trunk@14342 2bbb7eff-a529-9590-31e7-b0007b416f81
(https://codereview.chromium.org/239453010/)
Reason for revert:
Broke GMs in 565 mode. To repro:
out/Debug/gm --match filterbitmap_image_mandrill -w . --config 565
open filterbitmap_image_mandrill_512.png_565.png
Original issue's description:
> Properly enable S32_D16_filter_DX_SSE2 optimization.
>
> Currently, the S32_D16_filter_DX_SSE2 optimization is only used in
> configurations where the maximum SSE level is SSE2.
> This patch enables it for higher levels, as well.
> Also, refactored the function a bit, to make future modifications
> less error-prone.
>
> Author: henrik.smiding@intel.com
>
> Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>
>
> Committed: http://code.google.com/p/skia/source/detail?r=14333
R=reed@google.com, mtklein@google.com, tomhudson@google.com, djsollen@google.com, joakim.landberg@intel.com, henrik.smiding@intel.com
TBR=djsollen@google.com, henrik.smiding@intel.com, joakim.landberg@intel.com, mtklein@google.com, reed@google.com, tomhudson@google.com
NOTREECHECKS=true
NOTRY=true
Author: bsalomon@google.com
Review URL: https://codereview.chromium.org/246393013
git-svn-id: http://skia.googlecode.com/svn/trunk@14336 2bbb7eff-a529-9590-31e7-b0007b416f81
Currently, the S32_D16_filter_DX_SSE2 optimization is only used in
configurations where the maximum SSE level is SSE2.
This patch enables it for higher levels, as well.
Also, refactored the function a bit, to make future modifications
less error-prone.
Author: henrik.smiding@intel.com
Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>
R=reed@google.com, mtklein@google.com, tomhudson@google.com, djsollen@google.com, joakim.landberg@intel.com
Author: henrik.smiding@intel.com
Review URL: https://codereview.chromium.org/239453010
git-svn-id: http://skia.googlecode.com/svn/trunk@14333 2bbb7eff-a529-9590-31e7-b0007b416f81
These modes share some common code and are not very complex, so they are
grouped together. This CL yields roughly a 50% improvement for most of these
modes on a desktop i7-3770. Here are the data:
before:
Xfermode_Screen 8888: cmsecs = 30.25 565: cmsecs = 46.81
Xfermode_Modulate 8888: cmsecs = 22.48 565: cmsecs = 40.06
Xfermode_Plus 8888: cmsecs = 21.04 565: cmsecs = 37.51
Xfermode_Xor 8888: cmsecs = 37.18 565: cmsecs = 52.53
Xfermode_DstATop 8888: cmsecs = 28.97 565: cmsecs = 46.42
Xfermode_SrcATop 8888: cmsecs = 29.74 565: cmsecs = 46.25
Xfermode_DstOut 8888: cmsecs = 5.34 565: cmsecs = 24.53
Xfermode_SrcOut 8888: cmsecs = 12.25 565: cmsecs = 24.39
Xfermode_DstIn 8888: cmsecs = 5.30 565: cmsecs = 24.50
Xfermode_SrcIn 8888: cmsecs = 12.05 565: cmsecs = 25.40
Xfermode_DstOver 8888: cmsecs = 12.45 565: cmsecs = 0.15
Xfermode_SrcOver 8888: cmsecs = 2.68 565: cmsecs = 4.42
after:
Xfermode_Screen 8888: cmsecs = 13.68 565: cmsecs = 21.73
Xfermode_Modulate 8888: cmsecs = 13.25 565: cmsecs = 20.97
Xfermode_Plus 8888: cmsecs = 9.77 565: cmsecs = 16.71
Xfermode_Xor 8888: cmsecs = 17.64 565: cmsecs = 25.62
Xfermode_DstATop 8888: cmsecs = 15.99 565: cmsecs = 23.74
Xfermode_SrcATop 8888: cmsecs = 15.69 565: cmsecs = 23.40
Xfermode_DstOut 8888: cmsecs = 4.77 565: cmsecs = 11.85
Xfermode_SrcOut 8888: cmsecs = 4.98 565: cmsecs = 11.84
Xfermode_DstIn 8888: cmsecs = 4.68 565: cmsecs = 11.72
Xfermode_SrcIn 8888: cmsecs = 4.93 565: cmsecs = 11.79
Xfermode_DstOver 8888: cmsecs = 5.04 565: cmsecs = 0.15
Xfermode_SrcOver 8888: cmsecs = 2.69 565: cmsecs = 4.42
BUG=skia:
R=mtklein@google.com
Author: qiankun.miao@intel.com
Review URL: https://codereview.chromium.org/232793002
git-svn-id: http://skia.googlecode.com/svn/trunk@14176 2bbb7eff-a529-9590-31e7-b0007b416f81
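The grouped modes share one building block: multiplying two 8-bit values and dividing by 255 with rounding. The sketch below shows that shared step and three of the per-channel formulas (Modulate, Screen, Plus) on premultiplied 8-bit channels. The helper names are hypothetical, and the formulas are the standard ones for these modes rather than a copy of Skia's code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// round(a * b / 255) for a, b in [0, 255]: the step the modes have in common.
static inline unsigned mul255(unsigned a, unsigned b) {
    assert(a <= 255 && b <= 255);
    unsigned prod = a * b + 128;
    return (prod + (prod >> 8)) >> 8;
}

// Per-channel formulas on premultiplied values (alpha is treated like any channel).
unsigned modulate_channel(unsigned s, unsigned d) { return mul255(s, d); }
unsigned screen_channel(unsigned s, unsigned d)   { return s + d - mul255(s, d); }
unsigned plus_channel(unsigned s, unsigned d)     { return std::min(s + d, 255u); }

// Apply a per-channel function to all four bytes of a 32-bit pixel.
template <typename ChannelFn>
uint32_t apply_per_channel(uint32_t src, uint32_t dst, ChannelFn fn) {
    uint32_t out = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        out |= fn((src >> shift) & 0xFF, (dst >> shift) & 0xFF) << shift;
    }
    return out;
}
```

Screen never underflows here because `mul255(s, d)` is at most `min(s, d)`.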
Ben reviewed this over my shoulder, and we tested on his machine.
git-svn-id: http://skia.googlecode.com/svn/trunk@14122 2bbb7eff-a529-9590-31e7-b0007b416f81
BUG=skia:2401
R=bungeman@google.com, mtklein@google.com
Author: mtklein@chromium.org
Review URL: https://codereview.chromium.org/231423003
git-svn-id: http://skia.googlecode.com/svn/trunk@14120 2bbb7eff-a529-9590-31e7-b0007b416f81
This patch implements the basics for Xfermode SSE optimization. Based on
these basics, an SSE2 implementation of multiply_modeproc is provided. SSE2
implementations for other modes will come in the future. With this patch,
the performance of Xfermode_Multiply improves by about 45%. Here are the
data from a desktop i7-3770.
before:
Xfermode_Multiply 8888: cmsecs = 33.30 565: cmsecs = 45.65
after:
Xfermode_Multiply 8888: cmsecs = 17.18 565: cmsecs = 24.87
BUG=
Committed: http://code.google.com/p/skia/source/detail?r=14006
Committed: http://code.google.com/p/skia/source/detail?r=14050
R=mtklein@google.com, robertphillips@google.com
Author: qiankun.miao@intel.com
Review URL: https://codereview.chromium.org/202903004
git-svn-id: http://skia.googlecode.com/svn/trunk@14107 2bbb7eff-a529-9590-31e7-b0007b416f81
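A sketch of what SSE2 "basics" for an Xfermode can look like, assuming 32-bit pixels processed four at a time: bytes are widened to 16-bit lanes, multiplied, divided by 255 with rounding, and packed back. For brevity it implements Modulate (a plain per-byte multiply) rather than Skia's multiply_modeproc, which needs extra terms; all function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <emmintrin.h>

// Scalar reference: round(a * b / 255) for a, b in [0, 255].
static inline uint32_t mul_div255_round(uint32_t a, uint32_t b) {
    uint32_t prod = a * b + 128;
    return (prod + (prod >> 8)) >> 8;
}

static inline uint32_t modulate_pixel(uint32_t s, uint32_t d) {
    uint32_t out = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        out |= mul_div255_round((s >> shift) & 0xFF, (d >> shift) & 0xFF) << shift;
    }
    return out;
}

// The same rounding divide on eight 16-bit lanes; every intermediate fits in 16 bits.
static inline __m128i mul_div255_round_sse2(__m128i a, __m128i b) {
    __m128i prod = _mm_mullo_epi16(a, b);                 // <= 65025
    prod = _mm_add_epi16(prod, _mm_set1_epi16(128));      // <= 65153
    prod = _mm_add_epi16(prod, _mm_srli_epi16(prod, 8));  // <= 65407
    return _mm_srli_epi16(prod, 8);
}

// Modulate four pixels: widen bytes to 16 bits, multiply, narrow back.
static inline __m128i modulate4_sse2(__m128i src, __m128i dst) {
    const __m128i zero = _mm_setzero_si128();
    __m128i lo = mul_div255_round_sse2(_mm_unpacklo_epi8(src, zero), _mm_unpacklo_epi8(dst, zero));
    __m128i hi = mul_div255_round_sse2(_mm_unpackhi_epi8(src, zero), _mm_unpackhi_epi8(dst, zero));
    return _mm_packus_epi16(lo, hi);  // lanes are <= 255, so no saturation occurs
}

void modulate_row_sse2(uint32_t* dst, const uint32_t* src, int count) {
    assert(count >= 0);
    int i = 0;
    for (; i + 4 <= count; i += 4) {
        __m128i s = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src + i));
        __m128i d = _mm_loadu_si128(reinterpret_cast<const __m128i*>(dst + i));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(dst + i), modulate4_sse2(s, d));
    }
    for (; i < count; ++i) {  // scalar tail for the last 0-3 pixels
        dst[i] = modulate_pixel(src[i], dst[i]);
    }
}
```

Keeping a scalar tail that uses the same formula makes the SIMD path easy to check against it.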
BlitRow565: new NEON version of S32_D565_Blend
This new implementation brings a good speedup in most cases and
gives exact results (removes one mismatch in gm).
Here are the benchmark results (speedup vs. existing S32A_D565_Blend):
+-------+-----------+------------+
| count | Cortex-A9 | Cortex-A15 |
+-------+-----------+------------+
|     1 |    -26.7% |     -27.5% |
+-------+-----------+------------+
|     2 |        0% |       +53% |
+-------+-----------+------------+
|     4 |    +38.3% |     +26.5% |
+-------+-----------+------------+
|     8 |    +10.9% |      -4.5% |
+-------+-----------+------------+
|    16 |    +18.2% |      +1.6% |
+-------+-----------+------------+
|    64 |    +22.3% |     +8.75% |
+-------+-----------+------------+
|   256 |    +12.3% |     +11.2% |
+-------+-----------+------------+
|  1024 |    +79.2% |     +10.9% |
+-------+-----------+------------+
Signed-off-by: Kévin PETIT <kevin.petit@arm.com>
BUG=skia:
R=djsollen@google.com, mtklein@google.com
Author: kevin.petit@arm.com
Review URL: https://codereview.chromium.org/181523002
git-svn-id: http://skia.googlecode.com/svn/trunk@14103 2bbb7eff-a529-9590-31e7-b0007b416f81
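For reference, a scalar sketch of what an S32_D565_Blend-style proc computes: each opaque 32-bit source pixel is truncated to 565 precision and blended into the RGB565 destination with a global alpha. The source layout (0xAARRGGBB), the `alpha + 1` scale mapping, and the function names are assumptions for illustration. The NEON version in the commit processes several pixels per instruction and is not reproduced here.

```cpp
#include <cassert>
#include <cstdint>

// Blend one 565 channel: dst + (src - dst) * scale / 256, written without signed shifts.
static inline unsigned blend_channel(unsigned src, unsigned dst, unsigned scale256) {
    return (src * scale256 + dst * (256 - scale256)) >> 8;
}

// Assumed layouts: src is 0xAARRGGBB with alpha ignored ("S32" = opaque source),
// dst is RGB565 (5 bits red, 6 bits green, 5 bits blue).
void s32_d565_blend_reference(uint16_t* dst, const uint32_t* src, int count, unsigned alpha) {
    assert(alpha < 255);               // this sketch only covers the blended (non-opaque) case
    const unsigned scale = alpha + 1;  // map [0, 254] to [1, 255]
    for (int i = 0; i < count; ++i) {
        const uint32_t c = src[i];
        const uint16_t d = dst[i];
        // Truncate the 8-bit source channels to 565 precision.
        const unsigned sr = ((c >> 16) & 0xFF) >> 3;
        const unsigned sg = ((c >> 8) & 0xFF) >> 2;
        const unsigned sb = (c & 0xFF) >> 3;
        // Unpack the destination channels.
        const unsigned dr = d >> 11;
        const unsigned dg = (d >> 5) & 0x3F;
        const unsigned db = d & 0x1F;
        dst[i] = static_cast<uint16_t>((blend_channel(sr, dr, scale) << 11) |
                                       (blend_channel(sg, dg, scale) << 5) |
                                        blend_channel(sb, db, scale));
    }
}
```

Because each blended channel lies between its source and destination values, the results always fit back into their 5- or 6-bit fields.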
(https://codereview.chromium.org/202903004/)
Reason for revert:
It looks like serialization is broken. The serialize and pipe-cross-process tests are failing and turning bots red (at least the Ubuntu12 and Win7 bots).
Original issue's description:
> Xfermode: SSE2 implementation of multiply_modeproc
>
> This patch implements the basics for Xfermode SSE optimization. Based on
> these basics, an SSE2 implementation of multiply_modeproc is provided. SSE2
> implementations for other modes will come in the future. With this patch,
> the performance of Xfermode_Multiply improves by about 45%. Here are the
> data from a desktop i7-3770.
> before:
> Xfermode_Multiply 8888: cmsecs = 33.30 565: cmsecs = 45.65
> after:
> Xfermode_Multiply 8888: cmsecs = 17.18 565: cmsecs = 24.87
>
> BUG=
>
> Committed: http://code.google.com/p/skia/source/detail?r=14006
>
> Committed: http://code.google.com/p/skia/source/detail?r=14050
R=mtklein@google.com, qiankun.miao@intel.com
TBR=mtklein@google.com, qiankun.miao@intel.com
NOTREECHECKS=true
NOTRY=true
BUG=
Author: robertphillips@google.com
Review URL: https://codereview.chromium.org/224253003
git-svn-id: http://skia.googlecode.com/svn/trunk@14053 2bbb7eff-a529-9590-31e7-b0007b416f81
This patch implements the basics for Xfermode SSE optimization. Based on
these basics, an SSE2 implementation of multiply_modeproc is provided. SSE2
implementations for other modes will come in the future. With this patch,
the performance of Xfermode_Multiply improves by about 45%. Here are the
data from a desktop i7-3770.
before:
Xfermode_Multiply 8888: cmsecs = 33.30 565: cmsecs = 45.65
after:
Xfermode_Multiply 8888: cmsecs = 17.18 565: cmsecs = 24.87
BUG=
Committed: http://code.google.com/p/skia/source/detail?r=14006
R=mtklein@google.com, robertphillips@google.com
Author: qiankun.miao@intel.com
Review URL: https://codereview.chromium.org/202903004
git-svn-id: http://skia.googlecode.com/svn/trunk@14050 2bbb7eff-a529-9590-31e7-b0007b416f81
Aarch64 support
This change contains the necessary modifications to have Skia build and
run properly on an ARMv8 processor in aarch64 execution state.
Here's a list of the changes:
- add an arm64 target to the build system + SK_CPU_ARM64 flag
- MatrixTest was failing when built in Release mode. Fused multiply-accumulate
(MAC) instructions were generated, which made some intermediate results
more accurate. As the test relies on comparing results, the more
precise values differed from the expected ones by more than the test
tolerates. As I don't know whether any actual Skia code relies
on results being comparable, I've disabled fused MAC instructions
with -ffp-contract=off for arm64.
- Modify include/core/SkOnce.h to have barriers work.
- SK_CPU_ARM64 implies SK_ARM_NEON_MODE_ALWAYS.
- use existing Xfermode optimisations with modifications that can be
removed in the future when toolchains are ready. Also save a few
instructions in two Xfermodes (will apply to ARM too).
- use existing SkBoxBlur and SkMorphology optimisations.
- use existing SkBlitMask optimisations
- use existing BitmapProcState and Convolution optimisations.
Future changes will include:
- Blitters (only partially merged upstream)
- SkUtils (there's little value in sending asm optimisations without
having them benchmarked on real hardware).
Signed-off-by: Kevin PETIT <kevin.petit@arm.com>
BUG=skia:
Committed: http://code.google.com/p/skia/source/detail?r=13980
R=djsollen@google.com, reed@google.com, mtklein@google.com, halcanary@google.com
Author: kevin.petit@arm.com
Review URL: https://codereview.chromium.org/143423004
git-svn-id: http://skia.googlecode.com/svn/trunk@14025 2bbb7eff-a529-9590-31e7-b0007b416f81
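The MatrixTest failure comes down to the difference between one rounding and two. This small example, assuming IEEE-754 doubles, computes `a * b - c` with a separately rounded multiply (what -ffp-contract=off enforces) and with `std::fma`, which rounds only once. The `volatile` store exists only to stop the compiler from contracting the first version into an FMA itself.

```cpp
#include <cassert>
#include <cmath>

// Two roundings: a*b is rounded to double before c is subtracted.
double mul_then_sub(double a, double b, double c) {
    volatile double prod = a * b;  // forces the rounded product to be materialized
    return prod - c;
}

// One rounding: what a fused multiply-subtract instruction computes.
double fused_mul_sub(double a, double b, double c) {
    return std::fma(a, b, -c);
}
```

With a = 1 + 2^-27 and c = 1 + 2^-26, the exact product a * a is 1 + 2^-26 + 2^-54. Rounding it to a double drops the 2^-54 term, so `mul_then_sub(a, a, c)` returns 0 while `fused_mul_sub(a, a, c)` returns 2^-54. A test comparing such results against a tight tolerance can flip depending on whether the compiler fuses the operations.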
(https://codereview.chromium.org/202903004/)
Reason for revert:
Breaking builds
Original issue's description:
> Xfermode: SSE2 implementation of multiply_modeproc
>
> This patch implements the basics for Xfermode SSE optimization. Based on
> these basics, an SSE2 implementation of multiply_modeproc is provided. SSE2
> implementations for other modes will come in the future. With this patch,
> the performance of Xfermode_Multiply improves by about 45%. Here are the
> data from a desktop i7-3770.
> before:
> Xfermode_Multiply 8888: cmsecs = 33.30 565: cmsecs = 45.65
> after:
> Xfermode_Multiply 8888: cmsecs = 17.18 565: cmsecs = 24.87
>
> BUG=
>
> Committed: http://code.google.com/p/skia/source/detail?r=14006
R=mtklein@google.com, qiankun.miao@intel.com
TBR=mtklein@google.com, qiankun.miao@intel.com
NOTREECHECKS=true
NOTRY=true
BUG=
Author: robertphillips@google.com
Review URL: https://codereview.chromium.org/219243009
git-svn-id: http://skia.googlecode.com/svn/trunk@14007 2bbb7eff-a529-9590-31e7-b0007b416f81
This patch implements the basics for Xfermode SSE optimization. Based on
these basics, an SSE2 implementation of multiply_modeproc is provided. SSE2
implementations for other modes will come in the future. With this patch,
the performance of Xfermode_Multiply improves by about 45%. Here are the
data from a desktop i7-3770.
before:
Xfermode_Multiply 8888: cmsecs = 33.30 565: cmsecs = 45.65
after:
Xfermode_Multiply 8888: cmsecs = 17.18 565: cmsecs = 24.87
BUG=
R=mtklein@google.com
Author: qiankun.miao@intel.com
Review URL: https://codereview.chromium.org/202903004
git-svn-id: http://skia.googlecode.com/svn/trunk@14006 2bbb7eff-a529-9590-31e7-b0007b416f81
(https://codereview.chromium.org/143423004/)
Reason for revert:
GYP's failing on most (all?) bots.
Original issue's description:
> ARM Skia NEON patches - 35 - First AArch64 support
>
> Aarch64 support
>
> This change contains the necessary modifications to have Skia build and
> run properly on an ARMv8 processor in aarch64 execution state.
>
> Here's a list of the changes:
>
> - add an arm64 target to the build system + SK_CPU_ARM64 flag
>
> - MatrixTest was failing when built in Release mode. Fused multiply-accumulate
> (MAC) instructions were generated, which made some intermediate results
> more accurate. As the test relies on comparing results, the more
> precise values differed from the expected ones by more than the test
> tolerates. As I don't know whether any actual Skia code relies
> on results being comparable, I've disabled fused MAC instructions
> with -ffp-contract=off for arm64.
>
> - Modify include/core/SkOnce.h to have barriers work.
>
> - SK_CPU_ARM64 implies SK_ARM_NEON_MODE_ALWAYS.
>
> - use existing Xfermode optimisations with modifications that can be
> removed in the future when toolchains are ready. Also save a few
> instructions in two Xfermodes (will apply to ARM too).
>
> - use existing SkBoxBlur and SkMorphology optimisations.
>
> - use existing SkBlitMask optimisations
>
> - use existing BitmapProcState and Convolution optimisations.
>
> Future changes will include:
>
> - Blitters (only partially merged upstream)
>
> - SkUtils (there's little value in sending asm optimisations without
> having them benchmarked on real hardware).
>
> Signed-off-by: Kevin PETIT <kevin.petit@arm.com>
>
> BUG=skia:
>
> Committed: http://code.google.com/p/skia/source/detail?r=13980
R=djsollen@google.com, reed@google.com, halcanary@google.com, kevin.petit@arm.com
TBR=djsollen@google.com, halcanary@google.com, kevin.petit@arm.com, reed@google.com
NOTREECHECKS=true
NOTRY=true
BUG=skia:
Author: mtklein@google.com
Review URL: https://codereview.chromium.org/216113005
git-svn-id: http://skia.googlecode.com/svn/trunk@13983 2bbb7eff-a529-9590-31e7-b0007b416f81
Aarch64 support
This change contains the necessary modifications to have Skia build and
run properly on an ARMv8 processor in aarch64 execution state.
Here's a list of the changes:
- add an arm64 target to the build system + SK_CPU_ARM64 flag
- MatrixTest was failing when built in Release mode. Fused multiply-accumulate
(MAC) instructions were generated, which made some intermediate results
more accurate. As the test relies on comparing results, the more
precise values differed from the expected ones by more than the test
tolerates. As I don't know whether any actual Skia code relies
on results being comparable, I've disabled fused MAC instructions
with -ffp-contract=off for arm64.
- Modify include/core/SkOnce.h to have barriers work.
- SK_CPU_ARM64 implies SK_ARM_NEON_MODE_ALWAYS.
- use existing Xfermode optimisations with modifications that can be
removed in the future when toolchains are ready. Also save a few
instructions in two Xfermodes (will apply to ARM too).
- use existing SkBoxBlur and SkMorphology optimisations.
- use existing SkBlitMask optimisations
- use existing BitmapProcState and Convolution optimisations.
Future changes will include:
- Blitters (only partially merged upstream)
- SkUtils (there's little value in sending asm optimisations without
having them benchmarked on real hardware).
Signed-off-by: Kevin PETIT <kevin.petit@arm.com>
BUG=skia:
R=djsollen@google.com, reed@google.com, mtklein@google.com, halcanary@google.com
Author: kevin.petit@arm.com
Review URL: https://codereview.chromium.org/143423004
git-svn-id: http://skia.googlecode.com/svn/trunk@13980 2bbb7eff-a529-9590-31e7-b0007b416f81
This change is motivated by the desire to see the text information in the debugger when not in developer mode. It is structured so users can disable it if the capability is not wanted.
R=bsalomon@google.com
Author: robertphillips@google.com
Review URL: https://codereview.chromium.org/197763008
git-svn-id: http://skia.googlecode.com/svn/trunk@13795 2bbb7eff-a529-9590-31e7-b0007b416f81