| Commit message (Collapse) | Author | Age |
|
|
|
|
|
|
|
|
| |
BUG=391016
R=tomhudson@chromium.org, mtklein@google.com, rnk@chromium.org, thakis@chromium.org
Author: mtklein@chromium.org
Review URL: https://codereview.chromium.org/363983004
|
|
|
|
|
|
|
|
|
|
| |
MemorySanitizer is an unitialized memory use detector which is used in
Chromium, and does not presently support assembly code.
BUG=chromium:344505, chromium:373739
R=mtklein@google.com
Review URL: https://codereview.chromium.org/367973005
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Marks the symbols in the S32A_Opaque_BlitRow32_SSE4 files as hidden,
so Chromium can build.
Also enables the optimizations.
Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>
R=mtklein@google.com, joakim.landberg@intel.com
Author: henrik.smiding@intel.com
Review URL: https://codereview.chromium.org/368573002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Reason for revert:
Mac Chrome builders still failing.
Original issue's description:
> Re-enable SSE4.
>
> I will roll this into Chrome with https://codereview.chromium.org/332393003.
>
> BUG=skia:
>
> Committed: https://skia.googlesource.com/skia/+/a75b0fadbdec4214afec6dd727fd224d34ed164f
R=reed@google.com, mtklein@chromium.org
TBR=mtklein@chromium.org, reed@google.com
NOTREECHECKS=true
NOTRY=true
BUG=skia:
Author: mtklein@google.com
Review URL: https://codereview.chromium.org/337093004
|
|
|
|
|
|
|
|
|
|
|
| |
I will roll this into Chrome with https://codereview.chromium.org/332393003.
BUG=skia:
R=reed@google.com, mtklein@google.com
Author: mtklein@chromium.org
Review URL: https://codereview.chromium.org/357593003
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently the NEON code for Xfermodes performs well on arm64
targets except for dstout and dstin which are significantly
slower than the C code. This patch fixes this and gives
further improvements on other modes.
Here are some perf results:
+------------+------------+------------+
| mode | Cortex-A53 | Cortex-A57 |
+------------+------------+------------+
| multiply | +24.58% | +23.71% |
+------------+------------+------------+
| exclusion | +22.72% | +22.05% |
+------------+------------+------------+
| difference | +34.67% | +36.82% |
+------------+------------+------------+
| hardlight | +17.07% | +14.74% |
+------------+------------+------------+
| lighten | +38.21% | +32.87% |
+------------+------------+------------+
| darken | +37.59% | +32.99% |
+------------+------------+------------+
| overlay | +17.36% | +16.88% |
+------------+------------+------------+
| screen | +52.56% | +54.43% |
+------------+------------+------------+
| modulate | +62.85% | +61.32% |
+------------+------------+------------+
| plus | +91.52% | +117.41% |
+------------+------------+------------+
| xor | +42.86% | +43.38% |
+------------+------------+------------+
| dstatop | +48.46% | +48.99% |
+------------+------------+------------+
| srcatop | +50.50% | +48.51% |
+------------+------------+------------+
| dstout | +67.83% | +78.09% |
+------------+------------+------------+
| srcout | +69.02% | +78.26% |
+------------+------------+------------+
| dstin | +70.92% | +79.24% |
+------------+------------+------------+
| srcin | +68.90% | +78.23% |
+------------+------------+------------+
| dstover | +73.80% | +68.10% |
+------------+------------+------------+
Signed-off-by: Kévin PETIT <kevin.petit@arm.com>
BUG=skia
R=mtklein@google.com, djsollen@google.com
Author: kevin.petit@arm.com
Review URL: https://codereview.chromium.org/350343002
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Chrome canary failing to link chrome:
http://108.170.220.120:10115/builders/Canary-Chrome-Ubuntu13.10-Ninja-x86_64-ToT/builds/1009/steps/BuildChrome/logs/stdio
BUG=skia:
NOTRY=true
R=mtklein@google.com, rmistry@google.com
Author: mtklein@chromium.org
Review URL: https://codereview.chromium.org/361493002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, the set of platform-specific function pointers to do fast convolution (e.g., neon, SSE) were passed in a structure to the scaler.
I refactored this so that the scaler fills in these function pointers after it's called, so the caller doesn't have to worry about it.
R=mtklein@google.com
TBR=mtklein
NOTRY=True
Author: humper@google.com
Review URL: https://codereview.chromium.org/354193002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Adds optimization of Skia S32A_Opaque_Blitrow blitter using SSE4.2 SIMD
instruction set. Special case for when alpha is zero or opaque.
Performance increase of 10%-400% compared to the existing SSE2
optimization (measured on Silvermont architecture).
Noticeable in ~25 different skia bench subtests, especially in
bitmap_8888_*, repeatTile_*, and morph_*.
bitmap_8888_A - 100% faster
bitmap_8888_A_source_transparent - 250% faster
bitmap_8888_A_source_opaque - 25% faster
bitmap_8888_A_scale_bicubic - 75% faster
Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>
Committed: https://skia.googlesource.com/skia/+/e2527b147679b0c43019fae7d59cc3777d2d097e
Committed: https://skia.googlesource.com/skia/+/b5c281e1e06af3be804309877de1dac6145686b9
R=reed@google.com, mtklein@google.com, tomhudson@google.com, djsollen@google.com, joakim.landberg@intel.com
Author: henrik.smiding@intel.com
Review URL: https://codereview.chromium.org/289473009
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Here are some perf results:
+-------+------------+------------+
| count | Cortex-A53 | Cortex-A57 |
+-------+------------+------------+
| 1 | -2.54% | -5.39% |
+-------+------------+------------+
| 2 | -0.66% | -2.08% |
+-------+------------+------------+
| 4 | -11.13% | 0.00% |
+-------+------------+------------+
| 8 | -5.79% | -1.30% |
+-------+------------+------------+
| 16 | 71.60% | 93.27% |
+-------+------------+------------+
| 64 | 30.99% | 57.35% |
+-------+------------+------------+
| 256 | 25.41% | 52.59% |
+-------+------------+------------+
| 1024 | 25.56% | 53.76% |
+-------+------------+------------+
Signed-off-by: Kévin PETIT <kevin.petit@arm.com>
BUG=skia:
R=mtklein@google.com, djsollen@google.com
Author: kevin.petit@arm.com
Review URL: https://codereview.chromium.org/346843003
|
|
|
|
|
|
|
|
|
|
|
| |
Original Mozilla bug: https://bugzilla.mozilla.org/show_bug.cgi?id=901208
R=reed@google.com, mtklein@google.com, reed1
BUG=skia:
Author: george@mozilla.com
Review URL: https://codereview.chromium.org/337853003
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(https://codereview.chromium.org/289473009/)
NOTREECHECKS=true
NOTRY=true
Reason for revert:
Valgrind bot's seeing this code use uninitialized memory, and it's somehow blocking our roll into Chrome too:
> ld: warning: could not create compact unwind for
S32A_Opaque_BlitRow32_SSE4_asm:
> stack subq instruction is too different from dwarf stack size
> [10339/10982 | 3247.792] PACKAGE FRAMEWORK "Chromium Framework.framework",
> POSTBUILDS
> FAILED: ./gyp-mac-tool package-framework "Chromium Framework.framework" A &&
> (export
> BUILT_PRODUCTS_DIR=/Volumes/data/b/build/slave/mac_gpu/build/src/out/Release;
> export CONFIGURATION=Release; export CONTENTS_FOLDER_PATH="Chromium
> Framework.framework/Versions/A"; export
> DYLIB_INSTALL_NAME_BASE=@executable_path/../Versions/37.0.2056.0; export
> EXECUTABLE_NAME="Chromium Framework"; export EXECUTABLE_PATH="Chromium
> Framework.framework/Versions/A/Chromium Framework"; export
> FULL_PRODUCT_NAME="Chromium Framework.framework"; export
> INFOPLIST_PATH="Chromium Framework.framework/Versions/A/Resources/Info.plist";
> export
LD_DYLIB_INSTALL_NAME="@executable_path/../Versions/37.0.2056.0/Chromium
> Framework.framework/Chromium Framework"; export MACH_O_TYPE=mh_dylib; export
> PRODUCT_NAME="Chromium Framework"; export
> PRODUCT_TYPE=com.apple.product-type.framework; export
>
SDKROOT=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.6.sdk;
> export
>
SRCROOT=/Volumes/data/b/build/slave/mac_gpu/build/src/out/Release/../../chrome;
> export SOURCE_ROOT="${SRCROOT}"; export
> TARGET_BUILD_DIR=/Volumes/data/b/build/slave/mac_gpu/build/src/out/Release;
> export TEMP_DIR="${TMPDIR}"; export
UNLOCALIZED_RESOURCES_FOLDER_PATH="Chromium
> Framework.framework/Versions/A/Resources"; export WRAPPER_NAME="Chromium
> Framework.framework"; (cd ../../chrome && ../build/mac/tweak_info_plist.py
> "--breakpad=1" "--breakpad_uploads=0" "--keystone=0" "--scm=1"
> "--branding=Chromium" && ln -fns Versions/Current/Libraries
> "${BUILT_PRODUCTS_DIR}/${WRAPPER_NAME}/Libraries" &&
> tools/build/mac/verify_order _ChromeMain
> "${BUILT_PRODUCTS_DIR}/${EXECUTABLE_PATH}"); G=$?; ((exit $G) || rm -rf
> 'Chromium Framework.framework') && exit $G) && touch "Chromium
> Framework.framework"
> tools/build/mac/verify_order: unordered symbols in
> /Volumes/data/b/build/slave/mac_gpu/build/src/out/Release/Chromium
> Framework.framework/Versions/A/Chromium Framework:
> S32A_Opaque_BlitRow32_SSE4_asm
> _S32A_Opaque_BlitRow32_SSE4_asm
> ninja: build stopped: subcommand failed.
Original issue's description:
> Add SSE4 optimization of S32A_Opaque_Blitrow
>
> Adds optimization of Skia S32A_Opaque_Blitrow blitter using SSE4.2 SIMD
> instruction set. Special case for when alpha is zero or opaque.
>
> Performance increase of 10%-400% compared to the existing SSE2
> optimization (measured on Silvermont architecture).
> Noticeable in ~25 different skia bench subtests, especially in
> bitmap_8888_*, repeatTile_*, and morph_*.
>
> bitmap_8888_A - 100% faster
> bitmap_8888_A_source_transparent - 250% faster
> bitmap_8888_A_source_opaque - 25% faster
> bitmap_8888_A_scale_bicubic - 75% faster
>
> Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>
>
> Committed: https://skia.googlesource.com/skia/+/e2527b147679b0c43019fae7d59cc3777d2d097e
>
> Committed: https://skia.googlesource.com/skia/+/b5c281e1e06af3be804309877de1dac6145686b9
R=reed@google.com, tomhudson@google.com, djsollen@google.com, joakim.landberg@intel.com, henrik.smiding@intel.com, mtklein@chromium.org
Author: mtklein@google.com
Review URL: https://codereview.chromium.org/336413007
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Adds optimization of Skia S32A_Opaque_Blitrow blitter using SSE4.2 SIMD
instruction set. Special case for when alpha is zero or opaque.
Performance increase of 10%-400% compared to the existing SSE2
optimization (measured on Silvermont architecture).
Noticeable in ~25 different skia bench subtests, especially in
bitmap_8888_*, repeatTile_*, and morph_*.
bitmap_8888_A - 100% faster
bitmap_8888_A_source_transparent - 250% faster
bitmap_8888_A_source_opaque - 25% faster
bitmap_8888_A_scale_bicubic - 75% faster
Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>
Committed: https://skia.googlesource.com/skia/+/e2527b147679b0c43019fae7d59cc3777d2d097e
R=reed@google.com, mtklein@google.com, tomhudson@google.com, djsollen@google.com, joakim.landberg@intel.com
Author: henrik.smiding@intel.com
Review URL: https://codereview.chromium.org/289473009
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
benches and bots. (https://codereview.chromium.org/331193004/)
Reason for revert:
Experiment is over: disabling SSSE3 is a 25-50% perf regression for bitmap scaling on every machine we've got.
Original issue's description:
> Temporarily limit x86 SIMD to SSE2 only, to see effect on all benches and bots.
>
> BUG=372232
>
> Committed: https://skia.googlesource.com/skia/+/f1e5a04832e4d350f9ebf5d556c6d3897345f883
R=reed@google.com, mtklein@chromium.org
TBR=mtklein@chromium.org, reed@google.com
NOTREECHECKS=true
NOTRY=true
BUG=372232
Author: mtklein@google.com
Review URL: https://codereview.chromium.org/332213005
|
|
|
|
|
|
|
|
|
| |
BUG=372232
R=reed@google.com, mtklein@google.com
Author: mtklein@chromium.org
Review URL: https://codereview.chromium.org/331193004
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
gain is ~40%
following function are optimized:
S32_D565_Blend
S32A_D565_Opaque_Dither
S32_D565_Opaque_Dither
S32_D565_Blend_Dither
S32A_D565_Opaque
S32A_D565_Blend
S32_Blend_BlitRow32
R=djsollen@google.com, teodora.petrovic@gmail.com
Author: djordje.pesut@imgtec.com
Review URL: https://codereview.chromium.org/326913004
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This enables all 565 blitters except S32A_D565_Opaque.
Here are some performance results:
S32_D565_Opaque:
================
+-------+------------+------------+
| count | Cortex-A53 | Cortex-A57 |
+-------+------------+------------+
| 1 | -18.37% | -13.04% |
+-------+------------+------------+
| 2 | -9.90% | -13.78% |
+-------+------------+------------+
| 4 | -8.28% | -6.77% |
+-------+------------+------------+
| 8 | 157.63% | 78.15% |
+-------+------------+------------+
| 16 | 72.67% | 44.81% |
+-------+------------+------------+
| 64 | 76.78% | 40.89% |
+-------+------------+------------+
| 256 | 73.85% | 36.05% |
+-------+------------+------------+
| 1024 | 75.73% | 36.70% |
+-------+------------+------------+
S32_D565_Blend:
===============
+-------+------------+------------+
| count | Cortex-A53 | Cortex-A57 |
+-------+------------+------------+
| 1 | -9.99% | -13.79% |
+-------+------------+------------+
| 2 | -9.17% | -6.74% |
+-------+------------+------------+
| 4 | -6.73% | -4.42% |
+-------+------------+------------+
| 8 | 163.31% | 112.82% |
+-------+------------+------------+
| 16 | 55.21% | 44.68% |
+-------+------------+------------+
| 64 | 54.09% | 41.99% |
+-------+------------+------------+
| 256 | 52.63% | 40.64% |
+-------+------------+------------+
| 1024 | 52.46% | 40.45% |
+-------+------------+------------+
S32A_D565_Blend:
================
+-------+------------+------------+
| count | Cortex-A53 | Cortex-A57 |
+-------+------------+------------+
| 1 | -5.88% | -6.06% |
+-------+------------+------------+
| 2 | -4.74% | -0.01% |
+-------+------------+------------+
| 4 | -5.42% | -3.03% |
+-------+------------+------------+
| 8 | 78.78% | 77.96% |
+-------+------------+------------+
| 16 | 98.19% | 79.61% |
+-------+------------+------------+
| 64 | 111.56% | 72.60% |
+-------+------------+------------+
| 256 | 113.80% | 69.96% |
+-------+------------+------------+
| 1024 | 114.42% | 70.85% |
+-------+------------+------------+
S32_D565_Opaque_Dither:
=======================
+-------+------------+------------+
| count | Cortex-A53 | Cortex-A57 |
+-------+------------+------------+
| 1 | -4.18% | -0.93% |
+-------+------------+------------+
| 2 | -2.43% | -2.04% |
+-------+------------+------------+
| 4 | -1.09% | -1.23% |
+-------+------------+------------+
| 8 | 184.89% | 136.53% |
+-------+------------+------------+
| 16 | 128.64% | 89.11% |
+-------+------------+------------+
| 64 | 132.68% | 100.98% |
+-------+------------+------------+
| 256 | 157.02% | 100.86% |
+-------+------------+------------+
| 1024 | 163.85% | 103.62% |
+-------+------------+------------+
S32_D565_Blend_Dither:
======================
+-------+------------+------------+
| count | Cortex-A53 | Cortex-A57 |
+-------+------------+------------+
| 1 | -4.87% | 0.01% |
+-------+------------+------------+
| 2 | -2.71% | 2.97% |
+-------+------------+------------+
| 4 | -2.20% | 0.28% |
+-------+------------+------------+
| 8 | 149.76% | 146.80% |
+-------+------------+------------+
| 16 | 85.69% | 95.77% |
+-------+------------+------------+
| 64 | 88.81% | 101.39% |
+-------+------------+------------+
| 256 | 97.32% | 107.22% |
+-------+------------+------------+
| 1024 | 98.08% | 115.71% |
+-------+------------+------------+
S32A_D565_Opaque_Dither:
========================
+-------+------------+------------+
| count | Cortex-A53 | Cortex-A57 |
+-------+------------+------------+
| 1 | -1.86% | 0.02% |
+-------+------------+------------+
| 2 | -0.58% | -1.52% |
+-------+------------+------------+
| 4 | -0.75% | 1.16% |
+-------+------------+------------+
| 8 | 240.74% | 155.16% |
+-------+------------+------------+
| 16 | 181.97% | 132.15% |
+-------+------------+------------+
| 64 | 203.11% | 136.48% |
+-------+------------+------------+
| 256 | 223.45% | 133.05% |
+-------+------------+------------+
| 1024 | 225.96% | 134.05% |
+-------+------------+------------+
Signed-off-by: Kévin PETIT <kevin.petit@arm.com>
BUG=skia:
R=djsollen@google.com, mtklein@google.com
Author: kevin.petit@arm.com
Review URL: https://codereview.chromium.org/317193003
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(https://codereview.chromium.org/289473009/)
Reason for revert:
Buildbot failures on Mac 10.6 and Mac 10.7.
R=reed@google.com, mtklein@google.com, tomhudson@google.com, djsollen@google.com, joakim.landberg@intel.com, henrik.smiding@intel.com
TBR=reed@google.com
NOTRY=True
Original issue's description:
> Add SSE4 optimization of S32A_Opaque_Blitrow
>
> Adds optimization of Skia S32A_Opaque_Blitrow blitter using SSE4.2 SIMD
> instruction set. Special case for when alpha is zero or opaque.
>
> Performance increase of 10%-400% compared to the existing SSE2
> optimization (measured on Silvermont architecture).
> Noticeable in ~25 different skia bench subtests, especially in
> bitmap_8888_*, repeatTile_*, and morph_*.
>
> bitmap_8888_A - 100% faster
> bitmap_8888_A_source_transparent - 250% faster
> bitmap_8888_A_source_opaque - 25% faster
> bitmap_8888_A_scale_bicubic - 75% faster
>
> Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>
>
> Committed: https://skia.googlesource.com/skia/+/e2527b147679b0c43019fae7d59cc3777d2d097e
Author: jvanverth@google.com
Review URL: https://codereview.chromium.org/311053009
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Adds optimization of Skia S32A_Opaque_Blitrow blitter using SSE4.2 SIMD
instruction set. Special case for when alpha is zero or opaque.
Performance increase of 10%-400% compared to the existing SSE2
optimization (measured on Silvermont architecture).
Noticeable in ~25 different skia bench subtests, especially in
bitmap_8888_*, repeatTile_*, and morph_*.
bitmap_8888_A - 100% faster
bitmap_8888_A_source_transparent - 250% faster
bitmap_8888_A_source_opaque - 25% faster
bitmap_8888_A_scale_bicubic - 75% faster
Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>
R=reed@google.com, mtklein@google.com, tomhudson@google.com, djsollen@google.com, joakim.landberg@intel.com
Author: henrik.smiding@intel.com
Review URL: https://codereview.chromium.org/289473009
|
|
|
|
|
|
|
|
|
|
|
| |
That's what it means. It keeps confusing us as named today.
BUG=skia:
R=djsollen@google.com, mtklein@google.com, reed@google.com
Author: mtklein@chromium.org
Review URL: https://codereview.chromium.org/314643004
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Enable NEON on arm64 for most 8888 blitters
This patch enables NEON optimisation for the Color32, S32_Blend,
S32A_Opaque blitters on arm64.
Here are the perf improvements vs the existing code:
Color32:
========
+-------+------------+------------+
| count | Cortex-A53 | Cortex-A57 |
+-------+------------+------------+
| 1 | -2.39% | 23.78% |
+-------+------------+------------+
| 2 | -5.46% | 8.88% |
+-------+------------+------------+
| 4 | -4.74% | 4.89% |
+-------+------------+------------+
| 8 | 67.74% | 107.12% |
+-------+------------+------------+
| 16 | 40.03% | 101.20% |
+-------+------------+------------+
| 64 | 11.09% | 98.40% |
+-------+------------+------------+
| 256 | -2.20% | 74.81% |
+-------+------------+------------+
| 1024 | -4.28% | 78.90% |
+-------+------------+------------+
S32_Blend:
==========
+-------+------------+------------+
| count | Cortex-A53 | Cortex-A57 |
+-------+------------+------------+
| 1 | 7.84% | -6.75% |
+-------+------------+------------+
| 2 | 28.95% | 39.77% |
+-------+------------+------------+
| 4 | 5.80% | 8.26% |
+-------+------------+------------+
| 8 | 1.35% | 33.80% |
+-------+------------+------------+
| 16 | -2.13% | 41.13% |
+-------+------------+------------+
| 64 | -4.91% | 42.84% |
+-------+------------+------------+
| 256 | -6.53% | 48.72% |
+-------+------------+------------+
| 1024 | -6.65% | 46.66% |
+-------+------------+------------+
S32A_Opaque:
============
+-------+------------+------------+
| count | Cortex-A53 | Cortex-A57 |
+-------+------------+------------+
| 1 | -7.51% | -19.06% |
+-------+------------+------------+
| 2 | -5.02% | -27.70% |
+-------+------------+------------+
| 4 | 15.38% | -21.66% |
+-------+------------+------------+
| 8 | -0.98% | 1.05% |
+-------+------------+------------+
| 16 | -7.35% | 3.34% |
+-------+------------+------------+
| 64 | 50.53% | 94.63% |
+-------+------------+------------+
| 256 | 71.17% | 164.10% |
+-------+------------+------------+
| 1024 | 79.58% | 197.60% |
+-------+------------+------------+
Signed-off-by: Kevin PETIT <kevin.petit@arm.com>
BUG=skia:
R=djsollen@google.com, mtklein@google.com
Author: kevin.petit@arm.com
Review URL: https://codereview.chromium.org/302283003
|
|
|
|
|
|
|
|
|
|
|
|
| |
clone of https://codereview.chromium.org/305133006/
TBR=
BUG=skia:
Author: reed@google.com
Review URL: https://codereview.chromium.org/301233011
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When reading an SkSSE2ProcCoeffXfermode object, fProcSIMD should never be NULL. The reason for this is that it's not possible to create such an object through SkPlatformXfermodeFactory_impl_SSE2(), which is the only function used to create these objects, so if we're reading one, it's clearly invalid.
BUG=379181
R=reed@google.com, mtklein@google.com
Author: sugoi@chromium.org
Review URL: https://codereview.chromium.org/306183002
git-svn-id: http://skia.googlecode.com/svn/trunk@15000 2bbb7eff-a529-9590-31e7-b0007b416f81
|
|
|
|
|
|
|
|
|
|
|
| |
BUG=skia:
R=robertphillips@google.com
Author: reed@google.com
Review URL: https://codereview.chromium.org/303543009
git-svn-id: http://skia.googlecode.com/svn/trunk@14959 2bbb7eff-a529-9590-31e7-b0007b416f81
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With SSE2 version memcpy32, S32_Opaque_BlitRow32() in SkBlitRow_D32.cpp
has about 30% performance improvement. Here are the data on desktop
i7-3770.
before:
bitmap_scale_filter_90_90 8888: cmsecs = 2.01
bitmaprect_FF_filter_trans 8888: cmsecs = 3.61
bitmaprect_FF_nofilter_trans 8888: cmsecs = 3.57
bitmaprect_FF_filter_identity 8888: cmsecs = 3.53
bitmaprect_FF_nofilter_identity 8888: cmsecs = 3.53
bitmap_4444_update 8888: cmsecs = 4.84
bitmap_4444_update_volatile 8888: cmsecs = 4.81
bitmap_4444 8888: cmsecs = 4.81
after:
bitmap_scale_filter_90_90 8888: cmsecs = 1.83
bitmaprect_FF_filter_trans 8888: cmsecs = 2.36
bitmaprect_FF_nofilter_trans 8888: cmsecs = 2.36
bitmaprect_FF_filter_identity 8888: cmsecs = 2.60
bitmaprect_FF_nofilter_identity 8888: cmsecs = 2.63
bitmap_4444_update 8888: cmsecs = 3.30
bitmap_4444_update_volatile 8888: cmsecs = 3.30
bitmap_4444 8888: cmsecs = 3.29
BUG=skia:
R=mtklein@google.com, reed@google.com, bsalomon@google.com
Author: qiankun.miao@intel.com
Review URL: https://codereview.chromium.org/285313002
git-svn-id: http://skia.googlecode.com/svn/trunk@14822 2bbb7eff-a529-9590-31e7-b0007b416f81
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
BUG=chromium:374796
NOTREECHECKS=true
R=fmalita@chromium.org, mtklein@google.com
Author: mtklein@chromium.org
Review URL: https://codereview.chromium.org/292563005
git-svn-id: http://skia.googlecode.com/svn/trunk@14816 2bbb7eff-a529-9590-31e7-b0007b416f81
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Adds the missing include for smmintrin.h in the
SkBlurImage_opts_SSE2.cpp file.
Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>
BUG=chromium:374796
TEST=Unknown
R=tomhudson@chromium.org, vapier@chromium.org, reed@chromium.org, bsalomon@chromium.org, dgreid@chromium.org, dgarrett@chromium.org, michaelpg@chromium.org, vandebo@chromium.org
Author: henrik.smiding@intel.com
Review URL: https://codereview.chromium.org/290923002
git-svn-id: http://skia.googlecode.com/svn/trunk@14792 2bbb7eff-a529-9590-31e7-b0007b416f81
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Adds a build-time SSE4 check to SkBlurImage_opts_SSE2.cpp
in the SkBoxBlur_SSE2 function.
Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>
R=reed@google.com, mtklein@google.com, tomhudson@google.com, djsollen@google.com, joakim.landberg@intel.com, senorblanco@chromium.org
Author: henrik.smiding@intel.com
Review URL: https://codereview.chromium.org/281963002
git-svn-id: http://skia.googlecode.com/svn/trunk@14750 2bbb7eff-a529-9590-31e7-b0007b416f81
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The functions are rather performance critical and already marked
'inline'. However, Chrome for Android will not have these functions
inlined due to it being compiled with -Os and a small -finline-limit.
This avoids one call in the filtering functions.
Does not increase the library size.
BUG=chromium:363073
R=mtklein@google.com
Author: kkinnunen@nvidia.com
Review URL: https://codereview.chromium.org/280403005
git-svn-id: http://skia.googlecode.com/svn/trunk@14709 2bbb7eff-a529-9590-31e7-b0007b416f81
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is to optimize SkAlphaMulQ() in PIC mode. With the visibility=default
symbol the constant is not known at compile time (and is not a constant), but
instead is fetched through a double indirection through GOT. The function is
quite hot on one of the chromium benchmarks:
rasterize_and_record_micro.key_silk_cases.
This change replaces the symbol with a compile-time constant. As a bonus the
variable is not exported from the dynamic library, i. e. a cleaner library
interface.
See specific performance improvements on Android here:
http://goo.gl/iMuTDt
R=skyostil@chromium.org, tomhudson@chromium.org, mtklein@google.com, reed@google.com, tomhudson@google.com
Author: pasko@chromium.org
Review URL: https://codereview.chromium.org/270473003
git-svn-id: http://skia.googlecode.com/svn/trunk@14696 2bbb7eff-a529-9590-31e7-b0007b416f81
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Replaces the current build/run-time checks for SSE level in
opts_check_x86.cpp with a simpler and more future-proof version.
Also adds SSE versions 4.1 and 4.2 to the config file.
Author: henrik.smiding@intel.com
Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>
Committed: http://code.google.com/p/skia/source/detail?r=14644
R=reed@google.com, mtklein@google.com, tomhudson@google.com, djsollen@google.com, joakim.landberg@intel.com
Author: henrik.smiding@intel.com
Review URL: https://codereview.chromium.org/272503006
git-svn-id: http://skia.googlecode.com/svn/trunk@14693 2bbb7eff-a529-9590-31e7-b0007b416f81
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(https://codereview.chromium.org/272503006/)
Reason for revert:
Windows builders breaking. :(
Original issue's description:
> Improved x86 SSE build and run-time checks.
>
> Replaces the current build/run-time checks for SSE level in
> opts_check_x86.cpp with a simpler and more future-proof version.
> Also adds SSE versions 4.1 and 4.2 to the config file.
>
> Author: henrik.smiding@intel.com
>
> Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>
>
> Committed: http://code.google.com/p/skia/source/detail?r=14644
R=reed@google.com, tomhudson@google.com, djsollen@google.com, joakim.landberg@intel.com, henrik.smiding@intel.com
TBR=djsollen@google.com, henrik.smiding@intel.com, joakim.landberg@intel.com, reed@google.com, tomhudson@google.com
NOTREECHECKS=true
NOTRY=true
Author: mtklein@google.com
Review URL: https://codereview.chromium.org/277593004
git-svn-id: http://skia.googlecode.com/svn/trunk@14646 2bbb7eff-a529-9590-31e7-b0007b416f81
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Replaces the current build/run-time checks for SSE level in
opts_check_x86.cpp with a simpler and more future-proof version.
Also adds SSE versions 4.1 and 4.2 to the config file.
Author: henrik.smiding@intel.com
Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>
R=reed@google.com, mtklein@google.com, tomhudson@google.com, djsollen@google.com, joakim.landberg@intel.com
Author: henrik.smiding@intel.com
Review URL: https://codereview.chromium.org/272503006
git-svn-id: http://skia.googlecode.com/svn/trunk@14644 2bbb7eff-a529-9590-31e7-b0007b416f81
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
General cleanup of optimization files for x86/SSEx.
Renamed the opts_check_SSE2.cpp file to _x86, since it's not specific
to SSE2. Commented out the ColorRect32 optimization, since it's
disabled anyway, to make it more visible.
Also fixed a lot of indentation, inclusion guards, spelling,
copyright headers, braces, whitespace, and sorting of includes.
Author: henrik.smiding@intel.com
Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>
R=reed@google.com, mtklein@google.com, tomhudson@google.com, djsollen@google.com, joakim.landberg@intel.com
Author: henrik.smiding@intel.com
Review URL: https://codereview.chromium.org/264603002
git-svn-id: http://skia.googlecode.com/svn/trunk@14464 2bbb7eff-a529-9590-31e7-b0007b416f81
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Signed-off-by: Kévin PETIT <kevin.petit@arm.com>
BUG=skia:
R=djsollen@google.com, mtklein@google.com
Author: kevin.petit@arm.com
Review URL: https://codereview.chromium.org/263553008
git-svn-id: http://skia.googlecode.com/svn/trunk@14456 2bbb7eff-a529-9590-31e7-b0007b416f81
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Convert Color32 to intrinsics
This change is performance-neutral for high values of count and is
a big improvement for values smaller than 64.
Signed-off-by: Kévin PETIT <kevin.petit@arm.com>
BUG=skia:
R=djsollen@google.com, mtklein@google.com, borenet@google.com
Author: kevin.petit@arm.com
Review URL: https://codereview.chromium.org/258173005
git-svn-id: http://skia.googlecode.com/svn/trunk@14435 2bbb7eff-a529-9590-31e7-b0007b416f81
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, the S32_D16_filter_DX_SSE2 optimization is only used in
configurations where the maximum SSE level is SSE2.
This patch enables it for higher levels, as well as fixing a color
conversion bug when the subpixels are converted into RGB565 format.
Also, refactored the function a bit, to make future modifications
less error-prone.
Author: henrik.smiding@intel.com
Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>
Committed: http://code.google.com/p/skia/source/detail?r=14333
R=reed@google.com, mtklein@google.com, tomhudson@google.com, djsollen@google.com, joakim.landberg@intel.com
Author: henrik.smiding@intel.com
Review URL: https://codereview.chromium.org/239453010
git-svn-id: http://skia.googlecode.com/svn/trunk@14403 2bbb7eff-a529-9590-31e7-b0007b416f81
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With SSE2 optimization, performance of the related two benchmarks will
improve about 45% on desktop i7-3770. Here are the data:
before:
Xfermode_Lighten 8888: cmsecs = 33.60 565: cmsecs = 48.84
Xfermode_Darken 8888: cmsecs = 34.16 565: cmsecs = 48.99
after:
Xfermode_Lighten 8888: cmsecs = 18.71 565: cmsecs = 25.41
Xfermode_Darken 8888: cmsecs = 18.39 565: cmsecs = 25.40
BUG=skia:
R=mtklein@google.com
Author: qiankun.miao@intel.com
Review URL: https://codereview.chromium.org/234653002
git-svn-id: http://skia.googlecode.com/svn/trunk@14395 2bbb7eff-a529-9590-31e7-b0007b416f81
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With SSE2 optimization, performance of the related benchmarks will improve
about 45% for Xfermode_ColorDodge and little for Xfermode_ColorBurn on
desktop i7-3770. The little performance improvement for
Xfermode_ColorBurn is due to the portable version may mostly go the fast
if branch while the SSE2 version do the calculation for all the three
if-else branches. Here are the data:
before:
Xfermode_ColorDodge 8888: cmsecs = 73.71 565: cmsecs = 82.88
Xfermode_ColorBurn 8888: cmsecs = 46.46 565: cmsecs = 52.23
after:
Xfermode_ColorDodge 8888: cmsecs = 39.70 565: cmsecs = 47.45
Xfermode_ColorBurn 8888: cmsecs = 45.02 565: cmsecs = 51.15
BUG=skia:
R=mtklein@google.com
Author: qiankun.miao@intel.com
Review URL: https://codereview.chromium.org/224823004
git-svn-id: http://skia.googlecode.com/svn/trunk@14377 2bbb7eff-a529-9590-31e7-b0007b416f81
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With SSE2 optimization, performance of Xfermode_SoftLight will improve
about 30% on desktop i7-3770. Here are the data:
before:
Xfermode_SoftLight 8888: cmsecs = 379.44 565: cmsecs = 387.74
after:
Xfermode_SoftLight 8888: cmsecs = 272.29 565: cmsecs = 284.31
BUG=skia:
R=mtklein@google.com
Author: qiankun.miao@intel.com
Review URL: https://codereview.chromium.org/236363012
git-svn-id: http://skia.googlecode.com/svn/trunk@14376 2bbb7eff-a529-9590-31e7-b0007b416f81
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With SSE2 optimization, performance of Xfermode_Difference will improve
about 60% on desktop i7-3770. Here are the data:
before:
Xfermode_Difference 8888: cmsecs = 51.10 565: cmsecs = 66.39
after:
Xfermode_Difference 8888: cmsecs = 21.10 565: cmsecs = 29.33
BUG=skia:
R=mtklein@google.com
Author: qiankun.miao@intel.com
Review URL: https://codereview.chromium.org/234433003
git-svn-id: http://skia.googlecode.com/svn/trunk@14375 2bbb7eff-a529-9590-31e7-b0007b416f81
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With SSE2 optimization, performance of Xfermode_HardLight will improve
about 45% on desktop i7-3770. Here are the data:
before:
Xfermode_HardLight 8888: cmsecs = 48.43 565: cmsecs = 63.11
after:
Xfermode_HardLight 8888: cmsecs = 25.71 565: cmsecs = 33.46
BUG=skia:
R=mtklein@google.com
Author: qiankun.miao@intel.com
Review URL: https://codereview.chromium.org/229003004
git-svn-id: http://skia.googlecode.com/svn/trunk@14373 2bbb7eff-a529-9590-31e7-b0007b416f81
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With SSE2 optimization, performance of Xfermode_Exclusion will improve
about 50% on desktop i7-3770. Here are the data:
before:
Xfermode_Exclusion 8888: cmsecs = 40.17 565: cmsecs = 55.22
after:
Xfermode_Exclusion 8888: cmsecs = 18.53 565: cmsecs = 26.55
BUG=skia:
R=mtklein@google.com
Author: qiankun.miao@intel.com
Review URL: https://codereview.chromium.org/233733005
git-svn-id: http://skia.googlecode.com/svn/trunk@14371 2bbb7eff-a529-9590-31e7-b0007b416f81
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With SSE2 optimization, performance of Xfermode_Overlay will improve
about 35% on desktop i7-3770. Here are the data:
before:
Xfermode_Overlay 8888: cmsecs = 44.17 565: cmsecs = 59.27
after:
Xfermode_Overlay 8888: cmsecs = 28.30 565: cmsecs = 35.84
BUG=skia:
R=mtklein@google.com
Author: qiankun.miao@intel.com
Review URL: https://codereview.chromium.org/232783002
git-svn-id: http://skia.googlecode.com/svn/trunk@14370 2bbb7eff-a529-9590-31e7-b0007b416f81
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The emulator is the one case where the android framework can be
compiled without SSSE3 but be expected to run on a device with
SSS3. In that case we just disable all SSSE3 options to be safe.
R=scroggo@google.com
Author: djsollen@google.com
Review URL: https://codereview.chromium.org/249883004
git-svn-id: http://skia.googlecode.com/svn/trunk@14342 2bbb7eff-a529-9590-31e7-b0007b416f81
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(https://codereview.chromium.org/239453010/)
Reason for revert:
Broke GMs in 565 mode. To repro:
out/Debug/gm --match filterbitmap_image_mandrill -w . --config 565
open filterbitmap_image_mandrill_512.png_565.png
Original issue's description:
> Properly enable S32_D16_filter_DX_SSE2 optimization.
>
> Currently, the S32_D16_filter_DX_SSE2 optimization is only used in
> configurations where the maximum SSE level is SSE2.
> This patch enables it for higher levels, as well.
> Also, refactored the function a bit, to make future modifications
> less error-prone.
>
> Author: henrik.smiding@intel.com
>
> Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>
>
> Committed: http://code.google.com/p/skia/source/detail?r=14333
R=reed@google.com, mtklein@google.com, tomhudson@google.com, djsollen@google.com, joakim.landberg@intel.com, henrik.smiding@intel.com
TBR=djsollen@google.com, henrik.smiding@intel.com, joakim.landberg@intel.com, mtklein@google.com, reed@google.com, tomhudson@google.com
NOTREECHECKS=true
NOTRY=true
Author: bsalomon@google.com
Review URL: https://codereview.chromium.org/246393013
git-svn-id: http://skia.googlecode.com/svn/trunk@14336 2bbb7eff-a529-9590-31e7-b0007b416f81
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, the S32_D16_filter_DX_SSE2 optimization is only used in
configurations where the maximum SSE level is SSE2.
This patch enables it for higher levels, as well.
Also, refactored the function a bit, to make future modifications
less error-prone.
Author: henrik.smiding@intel.com
Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>
R=reed@google.com, mtklein@google.com, tomhudson@google.com, djsollen@google.com, joakim.landberg@intel.com
Author: henrik.smiding@intel.com
Review URL: https://codereview.chromium.org/239453010
git-svn-id: http://skia.googlecode.com/svn/trunk@14333 2bbb7eff-a529-9590-31e7-b0007b416f81
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
These modes share some common code and not very complex, so group them
together. This CL yields about 50% performance improvement on desktop
i7-3770. Here are the data:
before:
Xfermode_Screen 8888: cmsecs = 30.25 565: cmsecs = 46.81
Xfermode_Modulate 8888: cmsecs = 22.48 565: cmsecs = 40.06
Xfermode_Plus 8888: cmsecs = 21.04 565: cmsecs = 37.51
Xfermode_Xor 8888: cmsecs = 37.18 565: cmsecs = 52.53
Xfermode_DstATop 8888: cmsecs = 28.97 565: cmsecs = 46.42
Xfermode_SrcATop 8888: cmsecs = 29.74 565: cmsecs = 46.25
Xfermode_DstOut 8888: cmsecs = 5.34 565: cmsecs = 24.53
Xfermode_SrcOut 8888: cmsecs = 12.25 565: cmsecs = 24.39
Xfermode_DstIn 8888: cmsecs = 5.30 565: cmsecs = 24.50
Xfermode_SrcIn 8888: cmsecs = 12.05 565: cmsecs = 25.40
Xfermode_DstOver 8888: cmsecs = 12.45 565: cmsecs = 0.15
Xfermode_SrcOver 8888: cmsecs = 2.68 565: cmsecs = 4.42
after:
Xfermode_Screen 8888: cmsecs = 13.68 565: cmsecs = 21.73
Xfermode_Modulate 8888: cmsecs = 13.25 565: cmsecs = 20.97
Xfermode_Plus 8888: cmsecs = 9.77 565: cmsecs = 16.71
Xfermode_Xor 8888: cmsecs = 17.64 565: cmsecs = 25.62
Xfermode_DstATop 8888: cmsecs = 15.99 565: cmsecs = 23.74
Xfermode_SrcATop 8888: cmsecs = 15.69 565: cmsecs = 23.40
Xfermode_DstOut 8888: cmsecs = 4.77 565: cmsecs = 11.85
Xfermode_SrcOut 8888: cmsecs = 4.98 565: cmsecs = 11.84
Xfermode_DstIn 8888: cmsecs = 4.68 565: cmsecs = 11.72
Xfermode_SrcIn 8888: cmsecs = 4.93 565: cmsecs = 11.79
Xfermode_DstOver 8888: cmsecs = 5.04 565: cmsecs = 0.15
Xfermode_SrcOver 8888: cmsecs = 2.69 565: cmsecs = 4.42
BUG=skia:
R=mtklein@google.com
Author: qiankun.miao@intel.com
Review URL: https://codereview.chromium.org/232793002
git-svn-id: http://skia.googlecode.com/svn/trunk@14176 2bbb7eff-a529-9590-31e7-b0007b416f81
|
|
|
|
|
|
|
| |
Ben reviewed this over my shoulder, and we tested on his machine.
git-svn-id: http://skia.googlecode.com/svn/trunk@14122 2bbb7eff-a529-9590-31e7-b0007b416f81
|
|
|
|
|
|
|
|
|
|
|
| |
BUG=skia:2401
R=bungeman@google.com, mtklein@google.com
Author: mtklein@chromium.org
Review URL: https://codereview.chromium.org/231423003
git-svn-id: http://skia.googlecode.com/svn/trunk@14120 2bbb7eff-a529-9590-31e7-b0007b416f81
|