| Commit message | Author | Age |
https://codereview.chromium.org/109403004/)
git-svn-id: http://skia.googlecode.com/svn/trunk@12581 2bbb7eff-a529-9590-31e7-b0007b416f81
https://codereview.chromium.org/109403004) due to image quality regressions on the N4.
git-svn-id: http://skia.googlecode.com/svn/trunk@12578 2bbb7eff-a529-9590-31e7-b0007b416f81
Improve a little on Blur
Grouping operations gives a 5-15% speed improvement on a Cortex-A15 based Chromebook.
before:
running bench [640 480] blur_image_filter_large_10.00_10.00 8888: cmsecs = 30887.69
running bench [640 480] blur_image_filter_small_10.00_10.00 8888: cmsecs = 30751.35
running bench [640 480] blur_image_filter_large_1.00_1.00 8888: cmsecs = 30757.92
running bench [640 480] blur_image_filter_small_1.00_1.00 8888: cmsecs = 30673.88
running bench [640 480] blur_image_filter_large_0.00_1.00 8888: cmsecs = 19602.17
running bench [640 480] blur_image_filter_large_0.00_10.00 8888: cmsecs = 20613.81
running bench [640 480] blur_image_filter_large_1.00_0.00 8888: cmsecs = 17855.46
running bench [640 480] blur_image_filter_large_10.00_0.00 8888: cmsecs = 17957.79
after:
running bench [640 480] blur_image_filter_large_10.00_10.00 8888: cmsecs = 27015.75
running bench [640 480] blur_image_filter_small_10.00_10.00 8888: cmsecs = 27148.02
running bench [640 480] blur_image_filter_large_1.00_1.00 8888: cmsecs = 27241.60
running bench [640 480] blur_image_filter_small_1.00_1.00 8888: cmsecs = 27077.44
running bench [640 480] blur_image_filter_large_0.00_1.00 8888: cmsecs = 18458.10
running bench [640 480] blur_image_filter_large_0.00_10.00 8888: cmsecs = 19643.42
running bench [640 480] blur_image_filter_large_1.00_0.00 8888: cmsecs = 16176.73
running bench [640 480] blur_image_filter_large_10.00_0.00 8888: cmsecs = 16450.50
Signed-off-by: Kévin PETIT <kevin.petit@arm.com>
BUG=
R=senorblanco@chromium.org, mtklein@google.com, luisjoseromeroesclusa@hotmail.com
Author: kevin.petit.arm@gmail.com
Review URL: https://codereview.chromium.org/109403004
git-svn-id: http://skia.googlecode.com/svn/trunk@12568 2bbb7eff-a529-9590-31e7-b0007b416f81
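The quoted 5-15% range can be checked against the timings above. The sketch below assumes "speedup" means before/after minus 1 (the commit does not define the term) and recomputes it from the copied cmsecs values. `speedup` and `printSpeedups` are illustrative names only.

```cpp
#include <cstddef>
#include <cstdio>

// Assumed definition: speedup = before / after - 1, i.e. how much more work
// fits in the same time after the change.
double speedup(double beforeMs, double afterMs) {
    return beforeMs / afterMs - 1.0;
}

// cmsecs values copied from the blur_image_filter "before" and "after" lists,
// in the same order (large_10_10, small_10_10, large_1_1, small_1_1,
// large_0_1, large_0_10, large_1_0, large_10_0).
const double kBefore[] = {30887.69, 30751.35, 30757.92, 30673.88,
                          19602.17, 20613.81, 17855.46, 17957.79};
const double kAfter[]  = {27015.75, 27148.02, 27241.60, 27077.44,
                          18458.10, 19643.42, 16176.73, 16450.50};
constexpr std::size_t kNumBenches = sizeof(kBefore) / sizeof(kBefore[0]);

void printSpeedups() {
    for (std::size_t i = 0; i < kNumBenches; ++i) {
        std::printf("bench %zu: %+.1f%%\n", i, 100.0 * speedup(kBefore[i], kAfter[i]));
    }
}
```

Under that definition the eight benches range from about 4.9% (large_0.00_10.00) to about 14.3% (large_10.00_10.00), which is consistent with the stated figure.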
In some cases, it's easy to provide a NEON version of the 1-pixel modeprocs.
Combined with https://codereview.chromium.org/23724013/ (merged) it allows
up to 35% speed improvement on Xfermodes when aa is non-NULL.
Signed-off-by: Kévin PETIT <kevin.petit@arm.com>
BUG=
R=djsollen@google.com, reed@google.com, mtklein@google.com, luisjoseromeroesclusa@hotmail.com
Author: kevin.petit.arm@gmail.com
Review URL: https://codereview.chromium.org/104883004
git-svn-id: http://skia.googlecode.com/svn/trunk@12525 2bbb7eff-a529-9590-31e7-b0007b416f81
R=mtklein@google.com, mtklein
BUG=
Review URL: https://codereview.chromium.org/105423002
git-svn-id: http://skia.googlecode.com/svn/trunk@12493 2bbb7eff-a529-9590-31e7-b0007b416f81
TBR=mtklein
BUG=
Review URL: https://codereview.chromium.org/98373003
git-svn-id: http://skia.googlecode.com/svn/trunk@12490 2bbb7eff-a529-9590-31e7-b0007b416f81
speedup on Nexus-10.
R=mtklein@google.com, mtklein
before:
running bench [640 480] blur_image_filter_large_10.00_10.00 8888: cmsecs = 33063.23
running bench [640 480] blur_image_filter_small_10.00_10.00 8888: cmsecs = 32800.25
running bench [640 480] blur_image_filter_large_1.00_1.00 8888: cmsecs = 33017.88
running bench [640 480] blur_image_filter_small_1.00_1.00 8888: cmsecs = 32743.35
running bench [640 480] blur_image_filter_large_0.00_1.00 8888: cmsecs = 21024.04
running bench [640 480] blur_image_filter_large_0.00_10.00 8888: cmsecs = 22904.15
running bench [640 480] blur_image_filter_large_1.00_0.00 8888: cmsecs = 18738.08
running bench [640 480] blur_image_filter_large_10.00_0.00 8888: cmsecs = 18798.98
after:
running bench [640 480] blur_image_filter_large_10.00_10.00 8888: cmsecs = 30180.96
running bench [640 480] blur_image_filter_small_10.00_10.00 8888: cmsecs = 29861.90
running bench [640 480] blur_image_filter_large_1.00_1.00 8888: cmsecs = 30178.98
running bench [640 480] blur_image_filter_small_1.00_1.00 8888: cmsecs = 29911.25
running bench [640 480] blur_image_filter_large_0.00_1.00 8888: cmsecs = 19344.35
running bench [640 480] blur_image_filter_large_0.00_10.00 8888: cmsecs = 19957.07
running bench [640 480] blur_image_filter_large_1.00_0.00 8888: cmsecs = 17158.84
running bench [640 480] blur_image_filter_large_10.00_0.00 8888: cmsecs = 17330.73
Review URL: https://codereview.chromium.org/99933004
git-svn-id: http://skia.googlecode.com/svn/trunk@12486 2bbb7eff-a529-9590-31e7-b0007b416f81
Xfermode: add a NEON version of SkFourByteInterp
Brings a modest performance improvement on its own in
ProcXfermodes when aa is neither zero nor FF. Combined
with 1-pixel NEON modeprocs, it brings up to 35% speed
improvement on the aa case.
Signed-off-by: Kévin PETIT <kevin.petit@arm.com>
BUG=
R=djsollen@google.com, mtklein@google.com, reed@google.com
Author: kevin.petit.arm@gmail.com
Review URL: https://codereview.chromium.org/23724013
git-svn-id: http://skia.googlecode.com/svn/trunk@12448 2bbb7eff-a529-9590-31e7-b0007b416f81
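The aa handling these commits target (coverage that is neither zero nor FF) can be sketched as a scalar reference in portable C++. This is not Skia's code. It assumes 32-bit pixels with one byte per channel, uses exact rounded division by 255 in the interpolation (Skia's own rounding may differ), and uses a saturating "plus" mode as a stand-in for an arbitrary 1-pixel modeproc. `fourByteInterp`, `xfer32WithCoverage` and `plusProc` are hypothetical names. Coverage 0 leaves the destination untouched, 0xFF writes the mode result, and anything in between blends the two. That middle case is where a SkFourByteInterp-style helper is needed.

```cpp
#include <cstdint>

// Hypothetical 1-pixel modeproc signature: (src, dst) -> result.
using ModeProc = uint32_t (*)(uint32_t src, uint32_t dst);

// Example modeproc (assumption): per-byte saturating add, i.e. a "plus" mode.
uint32_t plusProc(uint32_t src, uint32_t dst) {
    uint32_t out = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t sum = ((src >> shift) & 0xFF) + ((dst >> shift) & 0xFF);
        out |= (sum > 0xFF ? 0xFFu : sum) << shift;
    }
    return out;
}

// Per-byte interpolation: weight 255 returns src, weight 0 returns dst.
// Uses rounded division by 255 (an assumption, not Skia's exact arithmetic).
uint32_t fourByteInterp(uint32_t src, uint32_t dst, uint8_t srcWeight) {
    uint32_t out = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t s = (src >> shift) & 0xFF;
        uint32_t d = (dst >> shift) & 0xFF;
        uint32_t v = (s * srcWeight + d * (255u - srcWeight) + 127u) / 255u;
        out |= v << shift;
    }
    return out;
}

// Apply `proc` to a row, modulated by per-pixel coverage `aa`.
void xfer32WithCoverage(uint32_t* dst, const uint32_t* src, int count,
                        const uint8_t* aa, ModeProc proc) {
    for (int i = 0; i < count; ++i) {
        const uint8_t a = aa[i];
        if (a == 0) continue;                              // no coverage: keep dst
        uint32_t result = proc(src[i], dst[i]);
        if (a != 0xFF) {
            result = fourByteInterp(result, dst[i], a);    // partial coverage
        }
        dst[i] = result;
    }
}
```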
git-svn-id: http://skia.googlecode.com/svn/trunk@12428 2bbb7eff-a529-9590-31e7-b0007b416f81
git-svn-id: http://skia.googlecode.com/svn/trunk@12427 2bbb7eff-a529-9590-31e7-b0007b416f81
git-svn-id: http://skia.googlecode.com/svn/trunk@12425 2bbb7eff-a529-9590-31e7-b0007b416f81
Blitmask: NEON optimised version of the D32_A8 functions
Here are the microbenchmark results I got for the D32_A8
functions:
Cortex-A9:
==========
+-------+--------+--------+--------+
| count | Black  | Opaque | Color  |
+-------+--------+--------+--------+
| 1     | -14%   | -39.5% | -37.5% |
+-------+--------+--------+--------+
| 2     | -3%    | -29.9% | -25%   |
+-------+--------+--------+--------+
| 4     | -11.3% | -22%   | -14.5% |
+-------+--------+--------+--------+
| 8     | +128%  | +66.6% | +105%  |
+-------+--------+--------+--------+
| 16    | +159%  | +102%  | +149%  |
+-------+--------+--------+--------+
| 64    | +189%  | +136%  | +189%  |
+-------+--------+--------+--------+
| 256   | +126%  | +102%  | +149%  |
+-------+--------+--------+--------+
| 1024  | +67.5% | +81.4% | +123%  |
+-------+--------+--------+--------+
Cortex-A15:
===========
+-------+--------+--------+--------+
| count | Black  | Opaque | Color  |
+-------+--------+--------+--------+
| 1     | -24%   | -46.5% | -37.5% |
+-------+--------+--------+--------+
| 2     | -18.5% | -35.5% | -28%   |
+-------+--------+--------+--------+
| 4     | -5.2%  | -17.5% | -15.5% |
+-------+--------+--------+--------+
| 8     | +72%   | +65.8% | +84.7% |
+-------+--------+--------+--------+
| 16    | +168%  | +117%  | +149%  |
+-------+--------+--------+--------+
| 64    | +165%  | +110%  | +145%  |
+-------+--------+--------+--------+
| 256   | +106%  | +99.6% | +141%  |
+-------+--------+--------+--------+
| 1024  | +93.7% | +94.7% | +130%  |
+-------+--------+--------+--------+
Blitmask: add NEON optimised PlatformBlitRowProcs16
Here are the microbenchmark results (speedup vs. C code):
+-------+-----------------+-----------------+
|       |    Cortex-A9    |   Cortex-A15    |
| count +--------+--------+--------+--------+
|       | Blend  | Opaque | Blend  | Opaque |
+-------+--------+--------+--------+--------+
| 1     | -19.2% | -36.7% | -33.6% | -44.7% |
+-------+--------+--------+--------+--------+
| 2     | -12.6% | -27.8% | -39%   | -48%   |
+-------+--------+--------+--------+--------+
| 4     | -11.5% | -21.6% | -37.7% | -44.3% |
+-------+--------+--------+--------+--------+
| 8     | +141%  | +59.7% | +123%  | +48.7% |
+-------+--------+--------+--------+--------+
| 16    | +213%  | +119%  | +214%  | +121%  |
+-------+--------+--------+--------+--------+
| 64    | +212%  | +105%  | +242%  | +167%  |
+-------+--------+--------+--------+--------+
| 256   | +289%  | +167%  | +249%  | +207%  |
+-------+--------+--------+--------+--------+
| 1024  | +273%  | +169%  | +146%  | +220%  |
+-------+--------+--------+--------+--------+
Signed-off-by: Kévin PETIT <kevin.petit@arm.com>
BUG=
R=djsollen@google.com, mtklein@google.com, reed@google.com
Author: kevin.petit.arm@gmail.com
Review URL: https://codereview.chromium.org/23719002
git-svn-id: http://skia.googlecode.com/svn/trunk@12420 2bbb7eff-a529-9590-31e7-b0007b416f81
R=mtklein@google.com, mtklein, reed@google.com
BUG=
Review URL: https://codereview.chromium.org/66413007
git-svn-id: http://skia.googlecode.com/svn/trunk@12227 2bbb7eff-a529-9590-31e7-b0007b416f81
Tegra3.
R=mtklein@google.com, mtklein, reed@google.com
Review URL: https://codereview.chromium.org/68123003
git-svn-id: http://skia.googlecode.com/svn/trunk@12219 2bbb7eff-a529-9590-31e7-b0007b416f81
Xeon E5-2690.
R=mtklein@google.com
Review URL: https://codereview.chromium.org/61643011
git-svn-id: http://skia.googlecode.com/svn/trunk@12204 2bbb7eff-a529-9590-31e7-b0007b416f81
Xfermode: xfer16
This adds support for 16-bit Xfermodes. It also tunes the gcc test
macros in xfer32() to add compatibility for gcc > 4.
Signed-off-by: Kévin PETIT <kevin.petit@arm.com>
BUG=
R=djsollen@google.com, mtklein@google.com, reed@google.com
Author: kevin.petit.arm@gmail.com
Review URL: https://codereview.chromium.org/33063002
git-svn-id: http://skia.googlecode.com/svn/trunk@12192 2bbb7eff-a529-9590-31e7-b0007b416f81
git-svn-id: http://skia.googlecode.com/svn/trunk@12186 2bbb7eff-a529-9590-31e7-b0007b416f81
NEON version of the convolutionProcs
The bitmap_scale benchmark is now twice as fast on ARM.
Signed-off-by: Kévin PETIT <kevin.petit@arm.com>
BUG=
Committed: http://code.google.com/p/skia/source/detail?r=12154
R=djsollen@google.com, mtklein@google.com, humper@google.com, epoger@google.com
Author: kevin.petit.arm@gmail.com
Review URL: https://codereview.chromium.org/27533004
git-svn-id: http://skia.googlecode.com/svn/trunk@12166 2bbb7eff-a529-9590-31e7-b0007b416f81
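The convolution procs apply a 1-D filter (a set of taps per output pixel) along a row or column. The scalar sketch below shows one horizontal pass under stated assumptions: RGBA8 pixels, filter taps stored in fixed point with 14 fractional bits (assumed here, not taken from the commit), one filter shared by every output pixel, and edge pixels repeated. `convolveRowFixed`, `toFixedTap` and `clampInt` are hypothetical names. The real procs take per-pixel filter offsets, and those are what the NEON code vectorises.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Assumed fixed-point format for filter taps: 14 fractional bits, so 1.0 == 16384.
constexpr int kShiftBits = 14;
using FixedTap = int16_t;

// Round to nearest; valid for |f| <= 1.99 so the result fits in int16_t.
FixedTap toFixedTap(float f) {
    return static_cast<FixedTap>(f * (1 << kShiftBits) + (f >= 0 ? 0.5f : -0.5f));
}

inline int clampInt(int v, int lo, int hi) { return std::min(std::max(v, lo), hi); }

// Horizontal pass over one RGBA8 row: each output channel is the weighted sum of
// taps.size() neighbouring input pixels centred on x (edge pixels are repeated).
std::vector<uint8_t> convolveRowFixed(const std::vector<uint8_t>& rgba, int width,
                                      const std::vector<FixedTap>& taps) {
    const int numTaps = static_cast<int>(taps.size());
    const int radius = numTaps / 2;
    std::vector<uint8_t> out(rgba.size());
    for (int x = 0; x < width; ++x) {
        int32_t acc[4] = {0, 0, 0, 0};
        for (int k = 0; k < numTaps; ++k) {
            const int sx = clampInt(x + k - radius, 0, width - 1);
            for (int c = 0; c < 4; ++c) acc[c] += taps[k] * rgba[4 * sx + c];
        }
        for (int c = 0; c < 4; ++c) {
            // Round, drop the fractional bits, then saturate to 0..255.
            const int v = (acc[c] + (1 << (kShiftBits - 1))) >> kShiftBits;
            out[4 * x + c] = static_cast<uint8_t>(clampInt(v, 0, 255));
        }
    }
    return out;
}
```

A vertical pass has the same structure with the taps walking down a column. Running both passes gives the separable 2-D filter that a bitmap_scale-style resample uses.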
BUG=skia:1807
git-svn-id: http://skia.googlecode.com/svn/trunk@12156 2bbb7eff-a529-9590-31e7-b0007b416f81
NEON version of the convolutionProcs
The bitmap_scale benchmark is now twice as fast on ARM.
Signed-off-by: Kévin PETIT <kevin.petit@arm.com>
BUG=
R=djsollen@google.com, mtklein@google.com, humper@google.com
Author: kevin.petit.arm@gmail.com
Review URL: https://codereview.chromium.org/27533004
git-svn-id: http://skia.googlecode.com/svn/trunk@12154 2bbb7eff-a529-9590-31e7-b0007b416f81
TBR=robertphillips
Review URL: https://codereview.chromium.org/45963007
git-svn-id: http://skia.googlecode.com/svn/trunk@12039 2bbb7eff-a529-9590-31e7-b0007b416f81
erode). This gives a 3-5X speedup over the naive implementation, and also mitigates a timing-based security attack in Chrome (https://code.google.com/p/chromium/issues/detail?id=251711).
NOTE: this will require a corresponding GYP change on the Skia roll into Chrome: https://codereview.chromium.org/52453004/
R=mtklein@google.com, reed@google.com
Review URL: https://codereview.chromium.org/52603004
git-svn-id: http://skia.googlecode.com/svn/trunk@12038 2bbb7eff-a529-9590-31e7-b0007b416f81
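The subject of this commit is truncated here. The body does refer to morphology filters (erode) and to a speedup over "the naive implementation". For orientation, a naive 1-D dilate over one 8-bit channel looks like the sketch below. The radius parameter, the clipped edges and the name `dilateRow` are assumptions. This is a baseline for comparison only, not the committed code.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Naive 1-D dilate of one 8-bit channel: out[x] = max(in[x - r .. x + r]),
// with the window clipped at the row edges. Erode is identical with std::min
// (and an initial value of 255). Cost is O(width * radius).
std::vector<uint8_t> dilateRow(const std::vector<uint8_t>& in, int radius) {
    const int width = static_cast<int>(in.size());
    std::vector<uint8_t> out(in.size());
    for (int x = 0; x < width; ++x) {
        const int lo = std::max(0, x - radius);
        const int hi = std::min(width - 1, x + radius);
        uint8_t m = 0;
        for (int i = lo; i <= hi; ++i) m = std::max(m, in[i]);
        out[x] = m;
    }
    return out;
}
```

A 2-D dilate runs this once along rows and once along columns.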
Before:
$ objdump -x out/Release/libskia_opts.a | grep "\.data" | c++filt
1 .data 00000000 0000000000000000 0000000000000000 000004ec 2**2
0000000000000000 l d .data 0000000000000000 .data
1 .data 00000000 0000000000000000 0000000000000000 00000f58 2**2
0000000000000000 l d .data 0000000000000000 .data
2 .data 00000008 0000000000000000 0000000000000000 00001774 2**2
0000000000000000 l d .data 0000000000000000 .data
0000000000000000 g O .data 0000000000000004 debug_x
0000000000000004 g O .data 0000000000000004 debug_y
1 .data 00000000 0000000000000000 0000000000000000 00001d8c 2**2
0000000000000000 l d .data 0000000000000000 .data
1 .data 00000000 0000000000000000 0000000000000000 00000054 2**2
0000000000000000 l d .data 0000000000000000 .data
1 .data 00000000 0000000000000000 0000000000000000 000001f0 2**2
0000000000000000 l d .data 0000000000000000 .data
1 .data 00000000 0000000000000000 0000000000000000 00000044 2**2
0000000000000000 l d .data 0000000000000000 .data
After:
$ objdump -x out/Release/libskia_opts.a | grep "\.data" | c++filt
1 .data 00000000 0000000000000000 0000000000000000 000004ec 2**2
0000000000000000 l d .data 0000000000000000 .data
1 .data 00000000 0000000000000000 0000000000000000 00000f58 2**2
0000000000000000 l d .data 0000000000000000 .data
2 .data 00000000 0000000000000000 0000000000000000 00001774 2**2
0000000000000000 l d .data 0000000000000000 .data
1 .data 00000000 0000000000000000 0000000000000000 00001d8c 2**2
0000000000000000 l d .data 0000000000000000 .data
1 .data 00000000 0000000000000000 0000000000000000 00000054 2**2
0000000000000000 l d .data 0000000000000000 .data
1 .data 00000000 0000000000000000 0000000000000000 000001f0 2**2
0000000000000000 l d .data 0000000000000000 .data
1 .data 00000000 0000000000000000 0000000000000000 00000044 2**2
0000000000000000 l d .data 0000000000000000 .data
Not sure why clang didn't catch them.
R=mtklein@google.com
BUG=
Author: tfarina@chromium.org
Review URL: https://codereview.chromium.org/50013002
git-svn-id: http://skia.googlecode.com/svn/trunk@11999 2bbb7eff-a529-9590-31e7-b0007b416f81
Xfermode: NEON implementation of SIMD procs
This patch contains a NEON implementation for a number of Xfermodes.
It provides a big speedup on Xfermode benchmarks (currently up to 3x
with gcc 4.7 but up to 10x when gcc produces optimal code for it).
Signed-off-by: Kévin PETIT <kevin.petit@arm.com>
BUG=
Committed: http://code.google.com/p/skia/source/detail?r=11777
Committed: http://code.google.com/p/skia/source/detail?r=11813
R=djsollen@google.com, mtklein@google.com, reed@google.com, robertphillips@google.com
Author: kevin.petit.arm@gmail.com
Review URL: https://codereview.chromium.org/26627004
git-svn-id: http://skia.googlecode.com/svn/trunk@11843 2bbb7eff-a529-9590-31e7-b0007b416f81
https://codereview.chromium.org/26627004) due to Chromium compilation failures.
git-svn-id: http://skia.googlecode.com/svn/trunk@11833 2bbb7eff-a529-9590-31e7-b0007b416f81
Xfermode: NEON implementation of SIMD procs
This patch contains a NEON implementation for a number of Xfermodes.
It provides a big speedup on Xfermode benchmarks (currently up to 3x
with gcc 4.7 but up to 10x when gcc produces optimal code for it).
Signed-off-by: Kévin PETIT <kevin.petit@arm.com>
BUG=
Committed: http://code.google.com/p/skia/source/detail?r=11777
R=djsollen@google.com, mtklein@google.com, reed@google.com, robertphillips@google.com
Author: kevin.petit.arm@gmail.com
Review URL: https://codereview.chromium.org/26627004
git-svn-id: http://skia.googlecode.com/svn/trunk@11813 2bbb7eff-a529-9590-31e7-b0007b416f81
to Chromium compilation failure
git-svn-id: http://skia.googlecode.com/svn/trunk@11799 2bbb7eff-a529-9590-31e7-b0007b416f81
Xfermode: NEON implementation of SIMD procs
This patch contains a NEON implementation for a number of Xfermodes.
It provides a big speedup on Xfermode benchmarks (currently up to 3x
with gcc 4.7 but up to 10x when gcc produces optimal code for it).
Signed-off-by: Kévin PETIT <kevin.petit@arm.com>
BUG=
R=djsollen@google.com, mtklein@google.com, reed@google.com
Author: kevin.petit.arm@gmail.com
Review URL: https://codereview.chromium.org/26627004
git-svn-id: http://skia.googlecode.com/svn/trunk@11777 2bbb7eff-a529-9590-31e7-b0007b416f81
This reverts commit b8162cb840f4cb6002ef68d5ac775c6a122c52a9.
Fixed call sites in benches that used the (now removed) setIsOpaque API.
R=scroggo@google.com
Review URL: https://codereview.chromium.org/26572006
git-svn-id: http://skia.googlecode.com/svn/trunk@11695 2bbb7eff-a529-9590-31e7-b0007b416f81
This reverts commit 1c0ff422868b3badf5ffe0790a5d051d1896e2f7.
BUG=
Review URL: https://codereview.chromium.org/26709002
git-svn-id: http://skia.googlecode.com/svn/trunk@11677 2bbb7eff-a529-9590-31e7-b0007b416f81
BUG=
R=scroggo@google.com
Review URL: https://codereview.chromium.org/25353002
git-svn-id: http://skia.googlecode.com/svn/trunk@11676 2bbb7eff-a529-9590-31e7-b0007b416f81
Xfermode: allow for SIMD modeprocs
This patch introduces the ability to have SIMD Xfermode modeprocs.
In the NEON implementation, SIMD modeprocs will process 8 pixels
at a time.
Signed-off-by: Kévin PETIT <kevin.petit@arm.com>
BUG=
Committed: http://code.google.com/p/skia/source/detail?r=11654
R=djsollen@google.com, mtklein@google.com, reed@google.com
Author: kevin.petit.arm@gmail.com
Review URL: https://codereview.chromium.org/23644006
git-svn-id: http://skia.googlecode.com/svn/trunk@11669 2bbb7eff-a529-9590-31e7-b0007b416f81
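The commit lets an Xfermode supply a SIMD modeproc that processes 8 pixels per call. The sketch below shows one way a row loop can use such a proc and fall back to the 1-pixel proc for the remainder. The commit does not say that Skia dispatches exactly this way. `ModeProc8`, `xferRow` and the toy bitwise-XOR procs are hypothetical, and the 8-wide proc is a plain scalar loop standing in for NEON code.

```cpp
#include <cstdint>

using ModeProc1 = uint32_t (*)(uint32_t src, uint32_t dst);
// Hypothetical 8-wide proc: reads src[0..7] and dst[0..7], writes out[0..7].
// `out` may alias `dst`, so a proc must read each dst pixel before writing it.
using ModeProc8 = void (*)(const uint32_t* src, const uint32_t* dst, uint32_t* out);

void xferRow(uint32_t* dst, const uint32_t* src, int count,
             ModeProc8 proc8, ModeProc1 proc1) {
    int i = 0;
    for (; i + 8 <= count; i += 8) {   // full blocks of 8 pixels
        proc8(src + i, dst + i, dst + i);
    }
    for (; i < count; ++i) {           // tail: fewer than 8 pixels left
        dst[i] = proc1(src[i], dst[i]);
    }
}

// Toy procs (not a real Xfermode): bitwise XOR of the two pixels.
uint32_t bitwiseXor1(uint32_t src, uint32_t dst) { return src ^ dst; }
void bitwiseXor8(const uint32_t* src, const uint32_t* dst, uint32_t* out) {
    for (int k = 0; k < 8; ++k) out[k] = src[k] ^ dst[k];
}
```

Keeping a 1-pixel proc alongside the 8-wide one means rows shorter than 8 pixels, and the last `count % 8` pixels of longer rows, still go through a correct path.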
This reverts http://code.google.com/p/skia/source/detail?r=11654
Review URL: https://codereview.chromium.org/26340010
git-svn-id: http://skia.googlecode.com/svn/trunk@11655 2bbb7eff-a529-9590-31e7-b0007b416f81
Xfermode: allow for SIMD modeprocs
This patch introduces the ability to have SIMD Xfermode modeprocs.
In the NEON implementation, SIMD modeprocs will process 8 pixels
at a time.
Signed-off-by: Kévin PETIT <kevin.petit@arm.com>
BUG=
R=djsollen@google.com, mtklein@google.com, reed@google.com
Author: kevin.petit.arm@gmail.com
Review URL: https://codereview.chromium.org/23644006
git-svn-id: http://skia.googlecode.com/svn/trunk@11654 2bbb7eff-a529-9590-31e7-b0007b416f81
BlitRow565: S32_D565_Blend_Dither, slight speedup + bugfix
This patch adds a rewrite of S32_D565_Blend_Dither in intrinsics.
The newer version is faster (10-20% depending on the value of count)
and supports ARGB as well as ABGR. It also adds the missing
assert at the beginning of the function.
Signed-off-by: Kévin PETIT <kevin.petit@arm.com>
BUG=
R=djsollen@google.com, mtklein@google.com
Author: kevin.petit.arm@gmail.com
Review URL: https://chromiumcodereview.appspot.com/22566002
git-svn-id: http://skia.googlecode.com/svn/trunk@11473 2bbb7eff-a529-9590-31e7-b0007b416f81
git-svn-id: http://skia.googlecode.com/svn/trunk@11426 2bbb7eff-a529-9590-31e7-b0007b416f81
BlitRow565: NEON version of S32_D565_Opaque
Here's a new implementation of S32_D565_Opaque in NEON. It
dramatically improves speed compared to S32A_D565_Opaque.
Here are the benchmark results (speedup vs. existing NEON):
+-------+-----------+------------+
| count | Cortex-A9 | Cortex-A15 |
+-------+-----------+------------+
| 1     | +130%     | +139%      |
+-------+-----------+------------+
| 2     | +65.2%    | +51%       |
+-------+-----------+------------+
| 4     | -25.5%    | +10.2%     |
+-------+-----------+------------+
| 8     | +63.8%    | +32.1%     |
+-------+-----------+------------+
| 16    | +110%     | +49.2%     |
+-------+-----------+------------+
| 64    | +153%     | +123.5%    |
+-------+-----------+------------+
| 256   | +151%     | +144.7%    |
+-------+-----------+------------+
| 1024  | +272%     | +157.2%    |
+-------+-----------+------------+
Signed-off-by: Kévin PETIT <kevin.petit@arm.com>
BUG=
R=djsollen@google.com, mtklein@google.com
Author: kevin.petit.arm@gmail.com
Review URL: https://chromiumcodereview.appspot.com/22351006
git-svn-id: http://skia.googlecode.com/svn/trunk@11415 2bbb7eff-a529-9590-31e7-b0007b416f81
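S32_D565_Opaque writes 32-bit source pixels into a 16-bit RGB565 destination. The core per-pixel step is a truncating pack, sketched below under several assumptions. The source is treated as opaque, so alpha is ignored. No dithering is applied. The red and blue byte positions are parameters, because the neighbouring commits note that both ARGB and ABGR orders must be handled. `packTo565` and `rowS32ToD565Opaque` are hypothetical helpers, not Skia's.

```cpp
#include <cstdint>

// Pack one opaque 32-bit pixel into RGB565 by keeping the top 5/6/5 bits.
// rShift/bShift select the byte holding red/blue (16/0 for ARGB, 0/16 for ABGR).
inline uint16_t packTo565(uint32_t c, int rShift, int bShift) {
    const uint32_t r = (c >> rShift) & 0xFF;
    const uint32_t g = (c >> 8) & 0xFF;
    const uint32_t b = (c >> bShift) & 0xFF;
    return static_cast<uint16_t>(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

// Row loop for an "opaque 32-bit source to 565 destination" blit.
void rowS32ToD565Opaque(uint16_t* dst, const uint32_t* src, int count,
                        int rShift = 16, int bShift = 0) {
    for (int i = 0; i < count; ++i) dst[i] = packTo565(src[i], rShift, bShift);
}
```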
BlitRow565: S32_D565_Opaque_Dither: cleaning / bugfix
This patch brings some code cleanup (spaces/comments) and a small
speed improvement (by using post-increment addressing in the asm), but more
importantly it fixes a bug on Linux. The new code now supports ARGB
as well as ABGR.
I removed the comment because I have confirmed with benchmarks that this
code brings a *massive* (3x-7x) speedup compared to the C code.
Signed-off-by: Kévin PETIT <kevin.petit@arm.com>
BUG=
R=djsollen@google.com, mtklein@google.com
Author: kevin.petit.arm@gmail.com
Review URL: https://chromiumcodereview.appspot.com/22269003
git-svn-id: http://skia.googlecode.com/svn/trunk@11339 2bbb7eff-a529-9590-31e7-b0007b416f81
BitmapProcState: translate the filtering routines to intrinsics
Signed-off-by: Kévin PETIT <kevin.petit@arm.com>
BUG=
R=djsollen@google.com, mtklein@google.com
Author: kevin.petit.arm@gmail.com
Review URL: https://chromiumcodereview.appspot.com/21915004
git-svn-id: http://skia.googlecode.com/svn/trunk@11246 2bbb7eff-a529-9590-31e7-b0007b416f81
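The filtering routines translated here perform bilinear interpolation of four neighbouring pixels. The scalar sketch below shows one per-channel bilerp. It assumes 4-bit subpixel fractions (0-15) for x and y and integer weights that sum to 256. The helper name and this exact fixed-point layout are assumptions, not a copy of Skia's filter procs.

```cpp
#include <cstdint>

// Bilinear blend of four 32-bit pixels (a00 top-left, a01 top-right,
// a10 bottom-left, a11 bottom-right) with subpixel fractions x, y in [0, 15].
// The weights (16-x)(16-y), x(16-y), (16-x)y and xy always sum to 256.
uint32_t bilerp4bit(uint32_t a00, uint32_t a01, uint32_t a10, uint32_t a11,
                    unsigned x, unsigned y) {
    const unsigned w00 = (16 - x) * (16 - y);
    const unsigned w01 = x * (16 - y);
    const unsigned w10 = (16 - x) * y;
    const unsigned w11 = x * y;
    uint32_t out = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        const unsigned sum = ((a00 >> shift) & 0xFF) * w00 + ((a01 >> shift) & 0xFF) * w01 +
                             ((a10 >> shift) & 0xFF) * w10 + ((a11 >> shift) & 0xFF) * w11;
        out |= ((sum >> 8) & 0xFF) << shift;  // divide by 256, truncating
    }
    return out;
}
```

Because every channel uses the same four weights, a SIMD version can compute the weights once and apply them to all channels of several pixels at once.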
git-svn-id: http://skia.googlecode.com/svn/trunk@11120 2bbb7eff-a529-9590-31e7-b0007b416f81
BUG=
R=humper@google.com
Review URL: https://codereview.chromium.org/23796005
git-svn-id: http://skia.googlecode.com/svn/trunk@11118 2bbb7eff-a529-9590-31e7-b0007b416f81
BitmapProcState: clean a little and get rid of some asm
Replacing the apparently naive dx+dx+dx leads to more instructions
being generated.
Signed-off-by: Kévin PETIT <kevin.petit@arm.com>
BitmapProcState: move code common to C and NEON to a separate header
Signed-off-by: Kévin PETIT <kevin.petit@arm.com>
BUG=
R=djsollen@google.com
Author: kevin.petit.arm@gmail.com
Review URL: https://chromiumcodereview.appspot.com/21931002
git-svn-id: http://skia.googlecode.com/svn/trunk@11109 2bbb7eff-a529-9590-31e7-b0007b416f81
git-svn-id: http://skia.googlecode.com/svn/trunk@10992 2bbb7eff-a529-9590-31e7-b0007b416f81
Blitmask: copy empty factory functions to a new file
Signed-off-by: Kévin PETIT <kevin.petit@arm.com>
BUG=
R=djsollen@google.com
Author: kevin.petit.arm@gmail.com
Review URL: https://chromiumcodereview.appspot.com/21120007
git-svn-id: http://skia.googlecode.com/svn/trunk@10980 2bbb7eff-a529-9590-31e7-b0007b416f81
Testing consisted of:
1) ninja -C out/Debug gm && gm -i resources --match mandrill_512 -w /tmp/gm
2) notice that gm didn't segfault
3) look in /tmp/gm and see a bunch of handsome monkeys
BUG=skia:1517
R=humper@google.com
Review URL: https://codereview.chromium.org/22801016
git-svn-id: http://skia.googlecode.com/svn/trunk@10917 2bbb7eff-a529-9590-31e7-b0007b416f81
R=mtklein@google.com
Review URL: https://codereview.chromium.org/22229002
git-svn-id: http://skia.googlecode.com/svn/trunk@10652 2bbb7eff-a529-9590-31e7-b0007b416f81
Using explicitly indexed references allows some compilers to generate more efficient loops. For gcc 4.6.3:
613c18: 83 ea 10 sub $0x10,%edx
613c1b: 66 0f 7f 07 movdqa %xmm0,(%rdi)
613c1f: 66 0f 7f 47 10 movdqa %xmm0,0x10(%rdi)
613c24: 66 0f 7f 47 20 movdqa %xmm0,0x20(%rdi)
613c29: 66 0f 7f 47 30 movdqa %xmm0,0x30(%rdi)
613c2e: 48 83 c7 40 add $0x40,%rdi
613c32: 83 fa 0f cmp $0xf,%edx
613c35: 7f e1 jg 613c18 <_Z16sk_memset32_SSE2Pjji+0x38>
vs. previous:
613c18: 83 ea 10 sub $0x10,%edx
613c1b: 66 0f 7f 07 movdqa %xmm0,(%rdi)
613c1f: 66 0f 7f 47 10 movdqa %xmm0,0x10(%rdi)
613c24: 66 0f 7f 47 20 movdqa %xmm0,0x20(%rdi)
613c29: 48 83 c7 40 add $0x40,%rdi
613c2d: 83 fa 0f cmp $0xf,%edx
613c30: 66 0f 7f 47 f0 movdqa %xmm0,-0x10(%rdi)
613c35: 7f e1 jg 613c18 <_Z16sk_memset32_SSE2Pjji+0x38>
This yields a 0.2% - 1% improvement with the memset micro benchmarks, presumably due to avoiding a stall on the next store after the %rdi increment.
R=reed@google.com, senorblanco@chromium.org
Author: fmalita@chromium.org
Review URL: https://chromiumcodereview.appspot.com/21703003
git-svn-id: http://skia.googlecode.com/svn/trunk@10545 2bbb7eff-a529-9590-31e7-b0007b416f81
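The change concerns only how the stores in the fill loop are addressed: all four 16-byte stores use explicit offsets from one base pointer, which is bumped once per iteration. The sketch below writes an SSE2 fill loop in that style. It assumes a 16-pixel (64-byte) unrolled body as in the disassembly and a scalar prologue to reach 16-byte alignment for the aligned stores. It also includes a scalar fallback for non-SSE2 builds. `memset32Sketch` is a hypothetical name, and a compiler may still schedule the loop differently from the listing.

```cpp
#include <cstdint>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

// Fill `count` 32-bit values at `dst` with `value`.
void memset32Sketch(uint32_t* dst, uint32_t value, int count) {
#if defined(__SSE2__)
    // Scalar prologue until dst is 16-byte aligned (needed by _mm_store_si128).
    while (count > 0 && (reinterpret_cast<uintptr_t>(dst) & 15) != 0) {
        *dst++ = value;
        --count;
    }
    __m128i* d = reinterpret_cast<__m128i*>(dst);
    const __m128i v = _mm_set1_epi32(static_cast<int>(value));
    // 16 pixels per iteration; every store is indexed from the same base
    // pointer, which is advanced once, after the stores.
    while (count >= 16) {
        _mm_store_si128(d + 0, v);
        _mm_store_si128(d + 1, v);
        _mm_store_si128(d + 2, v);
        _mm_store_si128(d + 3, v);
        d += 4;
        count -= 16;
    }
    dst = reinterpret_cast<uint32_t*>(d);
#endif
    // Remaining pixels (or the whole fill on non-SSE2 builds).
    while (count-- > 0) *dst++ = value;
}
```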
Blitrow32: S32A_Blend new NEON version
Adding a NEON version of S32A_Blend_BlitRow32. Here are the
benchmark results:
+-------+--------------------------+--------------------------+
|       |      Speedup vs. C       |   Speedup vs. ARM asm    |
| count +------------+-------------+------------+-------------+
|       | Cortex-A9  | Cortex-A15  | Cortex-A9  | Cortex-A15  |
+-------+------------+-------------+------------+-------------+
| 1     | +8.5%      | +18.5%      | +0.9%      | +2.9%       |
+-------+------------+-------------+------------+-------------+
| 2     | +65.6%     | +94%        | +70.3%     | +80%        |
+-------+------------+-------------+------------+-------------+
| 4     | +42.4%     | +87.8%      | +56.8%     | +84.4%      |
+-------+------------+-------------+------------+-------------+
| 8     | +30%       | +90%        | +49.9%     | +82.7%      |
+-------+------------+-------------+------------+-------------+
| 16    | +23.1%     | +95.4%      | +46.6%     | +87.6%      |
+-------+------------+-------------+------------+-------------+
| 64    | +23.1%     | +95.7%      | +46.1%     | +89.4%      |
+-------+------------+-------------+------------+-------------+
| 256   | +35.5%     | +122%       | +53.6%     | +99.2%      |
+-------+------------+-------------+------------+-------------+
| 1024  | +61.8%     | +101%       | +64.2%     | +91.2%      |
+-------+------------+-------------+------------+-------------+
BUG=
R=djsollen@google.com
Author: kevin.petit.arm@gmail.com
Review URL: https://chromiumcodereview.appspot.com/18614010
git-svn-id: http://skia.googlecode.com/svn/trunk@10480 2bbb7eff-a529-9590-31e7-b0007b416f81
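S32A_Blend composites 32-bit source pixels that carry their own alpha over the destination, with an extra global alpha. The scalar sketch below uses the usual premultiplied src-over math, out = src·α + dst·(1 − srcA·α), computed per channel in 8-bit fixed point. It assumes premultiplied pixels with alpha in the top byte and rounded division by 255. `blendRowS32A` and `mulDiv255Round` are hypothetical names, and Skia's own fixed-point rounding may differ.

```cpp
#include <cstdint>

// Rounded (a * b) / 255 for 8-bit inputs.
inline uint32_t mulDiv255Round(uint32_t a, uint32_t b) { return (a * b + 127) / 255; }

// Per channel: out = src * alpha / 255 + dst * (255 - srcA * alpha / 255) / 255.
void blendRowS32A(uint32_t* dst, const uint32_t* src, int count, uint8_t alpha) {
    for (int i = 0; i < count; ++i) {
        const uint32_t s = src[i];
        const uint32_t d = dst[i];
        const uint32_t dstScale = 255 - mulDiv255Round(s >> 24, alpha);
        uint32_t out = 0;
        for (int shift = 0; shift < 32; shift += 8) {
            const uint32_t v = mulDiv255Round((s >> shift) & 0xFF, alpha) +
                               mulDiv255Round((d >> shift) & 0xFF, dstScale);
            out |= (v > 255 ? 255u : v) << shift;  // clamp guards against rounding overflow
        }
        dst[i] = out;
    }
}
```

With alpha 255 and an opaque source this reduces to a copy. With alpha 0 the destination is left unchanged.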
$ compare-android.sh bench --match bitmap_ --repeat 30
master -> ssse3
N=30 p=0.001000 (corrected to 0.000033)
sig? speedup bench
n -1.16% bitmap_scale_filter_256_64
y -0.72% bitmap_8888_A_scale_bicubic
y -0.21% bitmap_index8_A
n -0.00% bitmap_565
n -0.00% bitmap_scale_filter_90_80
n 0.03% bitmap_8888_A_source_transparent
y 0.06% bitmap_index8
y 0.30% bitmap_8888_A_source_stripes_two
n 0.34% bitmap_scale_filter_80_90
y 0.42% bitmap_8888_A
y 0.44% bitmap_8888_A_source_opaque
n 0.53% bitmap_scale_filter_90_10
y 0.71% bitmap_8888_A_source_stripes_three
y 0.91% bitmap_8888_A_scale_rotate_bicubic
y 1.04% bitmap_8888_update
n 1.19% bitmap_scale_filter_10_90
n 1.39% bitmap_scale_filter_90_90
y 1.77% bitmap_8888_update_volatile
y 1.89% bitmap_8888
y 2.37% bitmap_scale_filter_30_90
y 9.57% bitmap_scale_filter_64_256
n 17.86% bitmap_scale_filter_90_30
y 25.40% bitmap_8888_A_scale_rotate_bilerp
y 27.19% bitmap_8888_scale_rotate_bilerp
y 27.23% bitmap_8888_update_scale_rotate_bilerp
y 27.29% bitmap_8888_update_volatile_scale_rotate_bilerp
y 55.08% bitmap_8888_A_scale_bilerp
y 58.75% bitmap_8888_update_volatile_scale_bilerp
y 58.90% bitmap_8888_scale_bilerp
y 58.92% bitmap_8888_update_scale_bilerp
Overall speedup: 10.52%
BUG=skia:1111
R=djsollen@google.com
Review URL: https://codereview.chromium.org/21203005
git-svn-id: http://skia.googlecode.com/svn/trunk@10474 2bbb7eff-a529-9590-31e7-b0007b416f81
git-svn-id: http://skia.googlecode.com/svn/trunk@10449 2bbb7eff-a529-9590-31e7-b0007b416f81