aboutsummaryrefslogtreecommitdiffhomepage
path: root/src/core/SkBlitRow_D32.cpp
Commit message (Collapse)AuthorAge
* make most of SkColorPriv.h privateGravatar Cary Clark2017-09-15
| | | | | | | | | | | | | created new file src/core/SkColorData.h for internal consumption. Note that many of the functions there are unused as well. Bug: skia: 6898 R: reed@google.com Change-Id: I25bfd5a9c21f53558c4ca65a77eb5d322d897c6d Reviewed-on: https://skia-review.googlesource.com/46848 Commit-Queue: Cary Clark <caryclark@google.com> Reviewed-by: Mike Reed <reed@google.com>
* SkBlendARGB32 and S32[A]_Blend_BlitRow32 are currently formulated as: ↵Gravatar lsalzman2016-08-05
| | | | | | | | | | | | | | | | | | | | | | SkAlphaMulQ(src, src_scale) + SkAlphaMulQ(dst, dst_scale), which boils down to ((src*src_scale)>>8) + ((dst*dst_scale)>>8). In particular, note that the intermediate precision is discarded before the two parts are added together, causing the final result to possibly inaccurate. In Firefox, we use SkCanvas::saveLayer in combination with a backdrop that initializes the layer to the background. When this is blended back onto background using transparency, where the source and destination pixel colors are the same, the resulting color after the blend is not preserved due to the lost precision mentioned above. In cases where this operation is repeatedly performed, this causes substantially noticeable differences in color as evidenced in this downstream Firefox bug report: https://bugzilla.mozilla.org/show_bug.cgi?id=1200684 In the test-case in the downstream report, essentially it does blend(src=0xFF2E3338, dst=0xFF2E3338, scale=217), which gives the result 0xFF2E3237, while we would expect to get back 0xFF2E3338. This problem goes away if the blend is instead reformulated to effectively do (src*src_scale + dst*dst_scale)>>8, which keeps the intermediate precision during the addition before shifting it off. This modifies the blending operations thusly. The performance should remain mostly unchanged, or possibly improve slightly, so there should be no real downside to doing this, with the benefit of making the results more accurate. Without this, it is currently unsafe for Firefox to blend a layer back onto itself that was initialized with a copy of its background. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2097883002 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot [mtklein adds...] No public API changes. TBR=reed@google.com Review-Url: https://codereview.chromium.org/2097883002
* Port S32A_opaque blit row to SkOpts.Gravatar mtklein2016-03-23
| | | | | | | | | | This should be a pixel-for-pixel (i.e. bug-for-bug) port. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1820313002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review URL: https://codereview.chromium.org/1820313002
* Port SkBlitRow::Color32 to SkOpts.Gravatar mtklein2015-09-10
| | | | | | | | | | This was a pre-SkOpts attempt that we can bring under its wing now. This should be a perf no-op, deo volente. BUG=skia:4117 Review URL: https://codereview.chromium.org/1314863006
* Style Change: NULL->nullptrGravatar halcanary2015-08-27
| | | | | | DOCS_PREVIEW= https://skia.org/?cl=1316233002 Review URL: https://codereview.chromium.org/1316233002
* Remove sk_memcpy32Gravatar mtklein2015-07-27
| | | | | | | | | | | | | | | | | | | | | | | | It's only implemented on x86, where the exisiting benchmark says memcpy() is faster for all cases: Timer overhead: 24ns curr/maxrss loops min median mean max stddev samples config bench 10/10 MB 1 35.9µs 36.2µs 36.2µs 36.6µs 1% ▁▂▄▅▅▃█▄▄▅ nonrendering sk_memcpy32_100000 10/10 MB 13 2.27µs 2.28µs 2.28µs 2.29µs 0% █▄▃▅▃▁▃▅▁▄ nonrendering sk_memcpy32_10000 11/11 MB 677 91.6ns 95.9ns 94.5ns 99.4ns 3% ▅▅▅▅▅█▁▁▁▁ nonrendering sk_memcpy32_1000 11/11 MB 1171 20ns 20.9ns 21.3ns 23.4ns 6% ▁▁▇▃▃▃█▇▃▃ nonrendering sk_memcpy32_100 11/11 MB 1952 14ns 14ns 14.3ns 15.2ns 3% ▁▁██▁▁▁▁▁▁ nonrendering sk_memcpy32_10 11/11 MB 5 33.6µs 33.7µs 34.1µs 35.2µs 2% ▆▇█▁▁▁▁▁▁▁ nonrendering memcpy32_memcpy_100000 11/11 MB 18 2.12µs 2.22µs 2.24µs 2.39µs 5% ▂█▄▇█▄▇▁▁▁ nonrendering memcpy32_memcpy_10000 11/11 MB 1112 87.3ns 87.3ns 89.1ns 93.7ns 3% ▄██▄▁▁▁▁▁▁ nonrendering memcpy32_memcpy_1000 11/11 MB 2124 12.8ns 13.3ns 13.5ns 14.8ns 6% ▁▁▁█▃▃█▇▃▃ nonrendering memcpy32_memcpy_100 11/11 MB 3077 9ns 9.41ns 9.52ns 10.2ns 4% ▃█▁█▃▃▃▃▃▃ nonrendering memcpy32_memcpy_10 (Why? One fewer thing to port to SkOpts.) BUG=skia:4117 Review URL: https://codereview.chromium.org/1256763003
* Update some Sk4px APIs.Gravatar mtklein2015-06-22
| | | | | | | | | | | | | | | Mostly this is about ergonomics, making it easier to do good operations and hard / impossible to do bad ones. - SkAlpha / SkPMColor constructors become static factories. - Remove div255TruncNarrow(), rename div255RoundNarrow() to div255(). In practice we always want to round, and the narrowing to 8-bit is contextually obvious. - Rename fastMulDiv255Round() approxMulDiv255() to stress it's approximate-ness over its speed. Drop Round for the same reason as above... we should always round. - Add operator overloads so we don't have to keep throwing in seemingly-random Sk4px() or Sk4px::Wide() casts. - use operator*() for 8-bit x 8-bit -> 16-bit math. It's always what we want, and there's generally no 8x8->8 alternative. - MapFoo can take a const Func&. Don't think it makes a big difference, but nice to do. BUG=skia: Review URL: https://codereview.chromium.org/1202013002
* Re-proc SkBlitRow::Color32 for ARM.Gravatar mtklein2015-05-22
| | | | | | | | | | This is a spiritual revert of http://crrev.com/1104183004. BUG=skia: Committed: https://skia.googlesource.com/skia/+/4e13a23d8f720e17660f26657b45b89fe4339004 Review URL: https://codereview.chromium.org/1145283003
* Revert of Re-proc SkBlitRow::Color32 for ARM. (patchset #3 id:40001 of ↵Gravatar mtklein2015-05-22
| | | | | | | | | | | | | | | | | | | | | | | | https://codereview.chromium.org/1145283003/) Reason for revert: http://build.chromium.org/p/tryserver.chromium.mac/builders/ios_rel_device_ninja/builds/70016/steps/compile%20%28with%20patch%29/logs/stdio Original issue's description: > Re-proc SkBlitRow::Color32 for ARM. > > This is a spiritual revert of http://crrev.com/1104183004. > > BUG=skia: > > Committed: https://skia.googlesource.com/skia/+/4e13a23d8f720e17660f26657b45b89fe4339004 TBR=reed@google.com,mtklein@chromium.org NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia: Review URL: https://codereview.chromium.org/1157633003
* Re-proc SkBlitRow::Color32 for ARM.Gravatar mtklein2015-05-21
| | | | | | | | This is a spiritual revert of http://crrev.com/1104183004. BUG=skia: Review URL: https://codereview.chromium.org/1145283003
* Sk4pxGravatar mtklein2015-05-12
| | | | | | | | | | Xfermode_SrcOver: SSE: 2.08ms -> 2.03ms (~2% faster) NEON: my N5 is noisy, but there appears to be no perf change BUG=skia: Review URL: https://codereview.chromium.org/1132273004
* De-proc Color32Gravatar mtklein2015-04-27
| | | | | | | | | | | | | | | | | | | | | Also strips SK_SUPPORT_LEGACY_COLOR32_MATH, which is no longer needed. Seems handy to have SkTypes include the relevant intrinsics when we know we've got them, but I'm not married to it. Locally this looks like a pointlessly small perf win, but I'm mostly keen to get all the code together. BUG=skia: Committed: https://skia.googlesource.com/skia/+/376e9bc206b69d9190f38dfebb132a8769bbd72b Committed: https://skia.googlesource.com/skia/+/d65dc0cedd5b50dd407b6ff8fdc39123f11511cc CQ_EXTRA_TRYBOTS=client.skia.compile:Build-Ubuntu-GCC-Mips-Debug-Android-Trybot Review URL: https://codereview.chromium.org/1104183004
* Revert of De-proc Color32 (patchset #5 id:80001 of ↵Gravatar mtklein2015-04-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | https://codereview.chromium.org/1104183004/) Reason for revert: duh Original issue's description: > De-proc Color32 > > Also strips SK_SUPPORT_LEGACY_COLOR32_MATH, > which is no longer needed. > > Seems handy to have SkTypes include the relevant intrinsics when > we know we've got them, but I'm not married to it. > > Locally this looks like a pointlessly small perf win, but I'm mostly > keen to get all the code together. > > BUG=skia: > > Committed: https://skia.googlesource.com/skia/+/376e9bc206b69d9190f38dfebb132a8769bbd72b > > Committed: https://skia.googlesource.com/skia/+/d65dc0cedd5b50dd407b6ff8fdc39123f11511cc TBR=reed@google.com,mtklein@chromium.org NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia: Review URL: https://codereview.chromium.org/1102363006
* De-proc Color32Gravatar mtklein2015-04-27
| | | | | | | | | | | | | | | | | Also strips SK_SUPPORT_LEGACY_COLOR32_MATH, which is no longer needed. Seems handy to have SkTypes include the relevant intrinsics when we know we've got them, but I'm not married to it. Locally this looks like a pointlessly small perf win, but I'm mostly keen to get all the code together. BUG=skia: Committed: https://skia.googlesource.com/skia/+/376e9bc206b69d9190f38dfebb132a8769bbd72b Review URL: https://codereview.chromium.org/1104183004
* Revert of De-proc Color32 (patchset #4 id:60001 of ↵Gravatar mtklein2015-04-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | https://codereview.chromium.org/1104183004/) Reason for revert: MIPS Original issue's description: > De-proc Color32 > > Also strips SK_SUPPORT_LEGACY_COLOR32_MATH, > which is no longer needed. > > Seems handy to have SkTypes include the relevant intrinsics when > we know we've got them, but I'm not married to it. > > Locally this looks like a pointlessly small perf win, but I'm mostly > keen to get all the code together. > > BUG=skia: > > Committed: https://skia.googlesource.com/skia/+/376e9bc206b69d9190f38dfebb132a8769bbd72b TBR=reed@google.com,mtklein@chromium.org NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia: Review URL: https://codereview.chromium.org/1108163002
* De-proc Color32Gravatar mtklein2015-04-27
| | | | | | | | | | | | | | | Also strips SK_SUPPORT_LEGACY_COLOR32_MATH, which is no longer needed. Seems handy to have SkTypes include the relevant intrinsics when we know we've got them, but I'm not married to it. Locally this looks like a pointlessly small perf win, but I'm mostly keen to get all the code together. BUG=skia: Review URL: https://codereview.chromium.org/1104183004
* Revert of Convert Color32 code to perfect blend. (patchset #6 id:100001 of ↵Gravatar mtklein2015-04-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | https://codereview.chromium.org/1098913002/) Reason for revert: Xfermode_SrcOver not looking encouraging. Up to 50% regressions. https://perf.skia.org/#3242 Original issue's description: > Convert Color32 code to perfect blend. > > Before we commit to blend_256_round_alt, let's make sure blend_perfect is > really slower in practice (i.e. regresses on perf.skia.org). > > blend_perfect is really the most desirable algorithm if we can afford it. Not > only is it correct, but it's easy to think about and break into correct pieces: > for instance, its div255() doesn't require any coordination with the multiply. > > This looks like a 30% hit according to microbenches. That said, microbenches > said my previous change would be a 20-25% perf improvement, but it didn't end > up showing a significant effect at a high level. > > As for correctness, I see a bunch of off-by-1 compared to blend_256_round_alt > (exactly what we'd expect), and one off-by-3 in a GM that looks like it has a > bunch of overdraw. > > BUG=skia: > > Committed: https://skia.googlesource.com/skia/+/61221e7f87a99765b0e034020e06bb018e2a08c2 TBR=reed@google.com,fmalita@chromium.org,mtklein@chromium.org NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia: Review URL: https://codereview.chromium.org/1083923006
* Convert Color32 code to perfect blend.Gravatar mtklein2015-04-20
| | | | | | | | | | | | | | | | | | | | | Before we commit to blend_256_round_alt, let's make sure blend_perfect is really slower in practice (i.e. regresses on perf.skia.org). blend_perfect is really the most desirable algorithm if we can afford it. Not only is it correct, but it's easy to think about and break into correct pieces: for instance, its div255() doesn't require any coordination with the multiply. This looks like a 30% hit according to microbenches. That said, microbenches said my previous change would be a 20-25% perf improvement, but it didn't end up showing a significant effect at a high level. As for correctness, I see a bunch of off-by-1 compared to blend_256_round_alt (exactly what we'd expect), and one off-by-3 in a GM that looks like it has a bunch of overdraw. BUG=skia: Review URL: https://codereview.chromium.org/1098913002
* Rework SSE and NEON Color32 algorithms to be more correct and faster.Gravatar mtklein2015-04-17
| | | | | | | | | | | | | | | | | | | | This algorithm changes the blend math, guarded by SK_LEGACY_COLOR32_MATH. The new math is more correct: it's never off by more than 1, and correct in all the interesting 0x00 and 0xFF edge cases, where the old math was never off by more than 2, and not always correct on the edges. If you look at tests/BlendTest.cpp, the old code was using the `blend_256_plus1_trunc` algorithm, while the new code uses `blend_256_round_alt`. Neither uses `blend_perfect`, which is about ~35% slower than `blend_256_round_alt`. This will require an unfathomable number of rebaselines, first to Skia, then to Blink when I remove the guard. I plan to follow up with some integer SIMD abstractions that can unify these two implementations into a single algorithm. This was originally what I was working on here, but the correctness gains seem to be quite compelling. The only places these two algorithms really differ greatly now is the kernel function, and even there they can really both be expressed abstractly as: - multiply 8-bits and 8-bits producing 16-bits - add 16-bits to 16-bits, returning the top 8 bits. All the constants are the same, except SSE is a little faster to keep 8 16-bit inverse alphas, NEON's a little faster to keep 8 8-bit inverse alphas. I may need to take this small speed win back to unify the two. We should expect a ~25% speedup on Intel (mostly from unrolling to 8 pixels) and a ~20% speedup on ARM (mostly from using vaddhn to add `color`, round, and narrow back down to 8-bit all into one instruction. (I am probably missing several more related bugs here.) BUG=skia:3738,skia:420,chromium:111470 Review URL: https://codereview.chromium.org/1092433002
* Clean up ColorRectProc plumbing.Gravatar mtklein2015-02-25
| | | | | | | | | | | | | | | | | | | We've always been using the portable ColorRect32, so we don't need the ColorRectProc plumbing. Furthermore, ColorRect32 doesn't seem to be very important (we're only using it in the opaque case, which our row-by-row procs already specialize for). Remove that too. If we find we want specialization for really narrow rects again, let's put it in blitRect() directly. It's pretty unlikely we're going to get platform-specific speedup for blits to non-contiguous memory. My local SKP comparison is +- 3%... most neutral I've ever seen. BUG=skia: Review URL: https://codereview.chromium.org/959873002
* SSE2 implementation of memcpy32Gravatar commit-bot@chromium.org2014-05-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With SSE2 version memcpy32, S32_Opaque_BlitRow32() in SkBlitRow_D32.cpp has about 30% performance improvement. Here are the data on desktop i7-3770. before: bitmap_scale_filter_90_90 8888: cmsecs = 2.01 bitmaprect_FF_filter_trans 8888: cmsecs = 3.61 bitmaprect_FF_nofilter_trans 8888: cmsecs = 3.57 bitmaprect_FF_filter_identity 8888: cmsecs = 3.53 bitmaprect_FF_nofilter_identity 8888: cmsecs = 3.53 bitmap_4444_update 8888: cmsecs = 4.84 bitmap_4444_update_volatile 8888: cmsecs = 4.81 bitmap_4444 8888: cmsecs = 4.81 after: bitmap_scale_filter_90_90 8888: cmsecs = 1.83 bitmaprect_FF_filter_trans 8888: cmsecs = 2.36 bitmaprect_FF_nofilter_trans 8888: cmsecs = 2.36 bitmaprect_FF_filter_identity 8888: cmsecs = 2.60 bitmaprect_FF_nofilter_identity 8888: cmsecs = 2.63 bitmap_4444_update 8888: cmsecs = 3.30 bitmap_4444_update_volatile 8888: cmsecs = 3.30 bitmap_4444 8888: cmsecs = 3.29 BUG=skia: R=mtklein@google.com, reed@google.com, bsalomon@google.com Author: qiankun.miao@intel.com Review URL: https://codereview.chromium.org/285313002 git-svn-id: http://skia.googlecode.com/svn/trunk@14822 2bbb7eff-a529-9590-31e7-b0007b416f81
* Sanitizing source files in Skia_Periodic_House_KeepingGravatar skia.committer@gmail.com2013-01-26
| | | | git-svn-id: http://skia.googlecode.com/svn/trunk@7406 2bbb7eff-a529-9590-31e7-b0007b416f81
* remove outdated test code for TEST_SRC_ALPHAGravatar reed@google.com2012-07-30
| | | | | | Review URL: https://codereview.appspot.com/6457056 git-svn-id: http://skia.googlecode.com/svn/trunk@4840 2bbb7eff-a529-9590-31e7-b0007b416f81
* special-case filling narrow rects, where we can be faster than the SSE2 asmGravatar reed@google.com2012-05-15
| | | | git-svn-id: http://skia.googlecode.com/svn/trunk@3932 2bbb7eff-a529-9590-31e7-b0007b416f81
* (SSE2) acceleration for rectangular opaque erases.Gravatar tomhudson@google.com2012-03-19
| | | | | | | | | | 15% speedup for rectangles < 31 px wide, 5% for larger. http://codereview.appspot.com/5843050/ git-svn-id: http://skia.googlecode.com/svn/trunk@3423 2bbb7eff-a529-9590-31e7-b0007b416f81
* don't blend with zero in colorproc (forgot to return after memcpy check).Gravatar reed@google.com2011-10-25
| | | | git-svn-id: http://skia.googlecode.com/svn/trunk@2527 2bbb7eff-a529-9590-31e7-b0007b416f81
* move LCD blits into opts, so they can have assembly versionsGravatar reed@google.com2011-10-18
| | | | git-svn-id: http://skia.googlecode.com/svn/trunk@2484 2bbb7eff-a529-9590-31e7-b0007b416f81
* separate SkBlitMask decl into its own headerGravatar reed@google.com2011-10-12
| | | | git-svn-id: http://skia.googlecode.com/svn/trunk@2461 2bbb7eff-a529-9590-31e7-b0007b416f81
* add SK_RESTRICT to mask procsGravatar reed@google.com2011-09-13
| | | | | | | | separate out opaque and non for lcd16 blits git-svn-id: http://skia.googlecode.com/svn/trunk@2253 2bbb7eff-a529-9590-31e7-b0007b416f81
* Automatic update of all copyright notices to reflect new license terms.Gravatar epoger@google.com2011-07-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I have manually examined all of these diffs and restored a few files that seem to require manual adjustment. The following files still need to be modified manually, in a separate CL: android_sample/SampleApp/AndroidManifest.xml android_sample/SampleApp/res/layout/layout.xml android_sample/SampleApp/res/menu/sample.xml android_sample/SampleApp/res/values/strings.xml android_sample/SampleApp/src/com/skia/sampleapp/SampleApp.java android_sample/SampleApp/src/com/skia/sampleapp/SampleView.java experimental/CiCarbonSampleMain.c experimental/CocoaDebugger/main.m experimental/FileReaderApp/main.m experimental/SimpleCocoaApp/main.m experimental/iOSSampleApp/Shared/SkAlertPrompt.h experimental/iOSSampleApp/Shared/SkAlertPrompt.m experimental/iOSSampleApp/SkiOSSampleApp-Base.xcconfig experimental/iOSSampleApp/SkiOSSampleApp-Debug.xcconfig experimental/iOSSampleApp/SkiOSSampleApp-Release.xcconfig gpu/src/android/GrGLDefaultInterface_android.cpp gyp/common.gypi gyp_skia include/ports/SkHarfBuzzFont.h include/views/SkOSWindow_wxwidgets.h make.bat make.py src/opts/memset.arm.S src/opts/memset16_neon.S src/opts/memset32_neon.S src/opts/opts_check_arm.cpp src/ports/SkDebug_brew.cpp src/ports/SkMemory_brew.cpp src/ports/SkOSFile_brew.cpp src/ports/SkXMLParser_empty.cpp src/utils/ios/SkImageDecoder_iOS.mm src/utils/ios/SkOSFile_iOS.mm src/utils/ios/SkStream_NSData.mm tests/FillPathTest.cpp Review URL: http://codereview.appspot.com/4816058 git-svn-id: http://skia.googlecode.com/svn/trunk@1982 2bbb7eff-a529-9590-31e7-b0007b416f81
* re-enable SSE2 blitmask procs, only excluding if we're black (in which caseGravatar reed@google.com2011-07-07
| | | | | | | | the protable version is still faster) git-svn-id: http://skia.googlecode.com/svn/trunk@1819 2bbb7eff-a529-9590-31e7-b0007b416f81
* Correct blitmask procs to recognize that we pass them an SkColor, and if theyGravatar reed@google.com2011-03-09
| | | | | | | | | | want a SkPMColor, they need to call SkPreMultiplyColor() Add Opaque and Black optimizations for blitmask_d32 git-svn-id: http://skia.googlecode.com/svn/trunk@911 2bbb7eff-a529-9590-31e7-b0007b416f81
* http://codereview.appspot.com/3980041/Gravatar reed@google.com2011-03-09
| | | | | | | | | Add blitmask procs (with optional platform acceleration) patch by yaojie.yan git-svn-id: http://skia.googlecode.com/svn/trunk@910 2bbb7eff-a529-9590-31e7-b0007b416f81
* merge from android tree:Gravatar djsollen@google.com2011-02-23
| | | | | | | | | | | | | - optional parameters added to descriptorProc and allocPixels - clip options to image decoders - check for xfermode in blitter_a8 - UNROLL loops in blitrow reviewed by reed@google.com git-svn-id: http://skia.googlecode.com/svn/trunk@841 2bbb7eff-a529-9590-31e7-b0007b416f81
* Fix perf regression in Color32.Gravatar senorblanco@chromium.org2010-12-16
| | | | | | | | | | | | | | | | The regression was due to the fact that we were calling PlatformColorProc() for every span (which in turns makes CPUID, a fairly expensive call). Since we draw a lot of rects, and rects have 1-pixel wide spans for the vertical segments, that's a lot of CPUID. Fixed by cacheing the result of PlatformColorProc(), as is done for the other platform-specific blitters. Review URL: http://codereview.appspot.com/3669042/ git-svn-id: http://skia.googlecode.com/svn/trunk@636 2bbb7eff-a529-9590-31e7-b0007b416f81
* SSE2 optimizations for 32bit Color operation.Gravatar senorblanco@chromium.org2010-12-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | [Patch from weiwei.li@intel.com] SSE2 optimization has been added by Stephen White before, this improves the skia performance on SSE2-supporting platform. (please refer to below issues) Issue 171055: More SSE2ification Issue 157141: More SSE2ification Issue 150060: minor tweaks to SSE2 code for -fPIC Issue 144072: SSE2 optimizations for 32bit blending blitters This CL implements SSE2 optimizations for the 32bit Color operation. Like above issues, it uses CPUID to detect for SSE2 and changes the platform procs at runtime as well. The 32bit Color operation is heavily used on Chrome HTML5 canvas operations. Take Microsoft IE test drives Pulsating Bubbles as example (http://ie.microsoft.com/testdrive/Performance/PulsatingBubbles/Default.xhtml), if running this cases on Chrome, the overhead of 32bit Color operation is about 40~50%. So this CL will make skia performance more better, and also make Chrome HTML5 canvas performance more better. Additional, this CL has passed the skia bench & tests validation, the result is pretty good. We also apply this CL to the latest chromium, and re-run Microsoft IE test drives Pulsating Bubbles, the performance is improved by almost 9~10%. git-svn-id: http://skia.googlecode.com/svn/trunk@633 2bbb7eff-a529-9590-31e7-b0007b416f81
* SSE2 optimizations for 32bit blending blitters.Gravatar senorblanco@chromium.org2009-11-04
| | | | | | | | | | | | | | | This CL implements SSE2 optimizations for 3 of the 32bit blending blitters. It uses CPUID to detect for SSE2 at runtime. In order to accomodate runtime detection, it changes the platform procs from static arrays to static functions. It also includes an implementation of SkTime for Win32. http://codereview.appspot.com/144072 git-svn-id: http://skia.googlecode.com/svn/trunk@418 2bbb7eff-a529-9590-31e7-b0007b416f81
* add BlitRow procs for 32->32, to allow for neon and other optimizations.Gravatar reed@android.com2009-09-23
call these new procs in (nearly) all the places we had inlined loops before. In once instance (blitter_argb32::blitAntiH) we get different results by a tiny bit. The new code is more accurate, and exactly inline with all of the other like-minded blits, so I think the change is good going forward. git-svn-id: http://skia.googlecode.com/svn/trunk@366 2bbb7eff-a529-9590-31e7-b0007b416f81