path: root/src
Commit message | Author | Age
...
* Pass the destination pointer to next() in SkSwizzler | msarett | 2015-07-27
  Per our discussion, we can make the swizzler simpler and more usable for SkCodec and SkScanlineDecoder by only having a single version of next() which takes a pointer to the srcRow and a pointer to the dstRow.
  BUG=skia:
  Review URL: https://codereview.chromium.org/1256373002
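As a rough illustration of the single-entry-point idea described in this commit, here is a minimal sketch. `RowSwizzler` is a hypothetical stand-in, not Skia's actual SkSwizzler, and the channel swap it performs is an assumed example conversion. The only point it tries to show is that the caller hands in both the source row and the destination row on every call, so the swizzler keeps no row cursor of its own.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a row swizzler; NOT Skia's actual SkSwizzler.
class RowSwizzler {
public:
    explicit RowSwizzler(std::size_t width) : fWidth(width) {}

    // Single entry point: the caller supplies both rows on every call,
    // so no internal "current row" state is needed.
    void next(std::uint8_t* dstRow, const std::uint8_t* srcRow) const {
        assert(dstRow != nullptr && srcRow != nullptr);
        for (std::size_t x = 0; x < fWidth; ++x) {
            // Assumed example conversion: swap channels 0 and 2
            // (e.g. BGRA -> RGBA), keep channels 1 and 3.
            dstRow[4 * x + 0] = srcRow[4 * x + 2];
            dstRow[4 * x + 1] = srcRow[4 * x + 1];
            dstRow[4 * x + 2] = srcRow[4 * x + 0];
            dstRow[4 * x + 3] = srcRow[4 * x + 3];
        }
    }

private:
    std::size_t fWidth;  // pixels per row
};
```

With this shape, a decoder that owns the row iteration can call `next(dst + y * dstRowBytes, srcRow)` for each decoded row, and a scanline decoder can pass whatever destination the client provides.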
* Fixing src rect constraint support for drawImage with SkPicture | junov | 2015-07-27
  Follow-up to https://codereview.chromium.org/1228083004
  BUG=skia:
  Review URL: https://codereview.chromium.org/1256853004
* remove pixel assert from ctable validator | reed | 2015-07-27
  BUG=514143
  Review URL: https://codereview.chromium.org/1252973003
* Cleanup Default Geo Proc API | joshualitt | 2015-07-27
  TBR=bsalomon@google.com
  BUG=skia:
  Review URL: https://codereview.chromium.org/1253393002
* No one calls SkXfermode::GetProc16 | mtklein | 2015-07-27
  BUG=skia:
  Review URL: https://codereview.chromium.org/1253493002
* Revert of Lay groundwork for SkOpts. (patchset #3 id:40001 of https://codereview.chromium.org/1255193002/) | mtklein | 2015-07-27
  Reason for revert:
  Chromium doesn't call SkGraphics::Init(). This setup won't work.

  Original issue's description:
  > Lay groundwork for SkOpts.
  >
  > This doesn't really do anything yet. It's just the CPU detection code, skeleton new .cpp files, and a few little .gyp tweaks.
  >
  > BUG=skia:4117
  >
  > Committed: https://skia.googlesource.com/skia/+/ce2c5055cee5d5d3c9fc84c1b3eeed4b4d84a827

  TBR=djsollen@google.com
  NOPRESUBMIT=true
  NOTREECHECKS=true
  NOTRY=true
  BUG=skia:4117
  Review URL: https://codereview.chromium.org/1261743002
* fix up GrImmediateDrawTarget.cpp | joshualitt | 2015-07-27
  TBR=bsalomon@google.com
  BUG=skia:
  Review URL: https://codereview.chromium.org/1260023003
* Remove sk_memcpy32 | mtklein | 2015-07-27
  It's only implemented on x86, where the existing benchmark says memcpy() is faster for all cases:

  Timer overhead: 24ns
  curr/maxrss  loops  min     median  mean    max     stddev  samples     config        bench
  10/10 MB     1      35.9µs  36.2µs  36.2µs  36.6µs  1%      ▁▂▄▅▅▃█▄▄▅  nonrendering  sk_memcpy32_100000
  10/10 MB     13     2.27µs  2.28µs  2.28µs  2.29µs  0%      █▄▃▅▃▁▃▅▁▄  nonrendering  sk_memcpy32_10000
  11/11 MB     677    91.6ns  95.9ns  94.5ns  99.4ns  3%      ▅▅▅▅▅█▁▁▁▁  nonrendering  sk_memcpy32_1000
  11/11 MB     1171   20ns    20.9ns  21.3ns  23.4ns  6%      ▁▁▇▃▃▃█▇▃▃  nonrendering  sk_memcpy32_100
  11/11 MB     1952   14ns    14ns    14.3ns  15.2ns  3%      ▁▁██▁▁▁▁▁▁  nonrendering  sk_memcpy32_10
  11/11 MB     5      33.6µs  33.7µs  34.1µs  35.2µs  2%      ▆▇█▁▁▁▁▁▁▁  nonrendering  memcpy32_memcpy_100000
  11/11 MB     18     2.12µs  2.22µs  2.24µs  2.39µs  5%      ▂█▄▇█▄▇▁▁▁  nonrendering  memcpy32_memcpy_10000
  11/11 MB     1112   87.3ns  87.3ns  89.1ns  93.7ns  3%      ▄██▄▁▁▁▁▁▁  nonrendering  memcpy32_memcpy_1000
  11/11 MB     2124   12.8ns  13.3ns  13.5ns  14.8ns  6%      ▁▁▁█▃▃█▇▃▃  nonrendering  memcpy32_memcpy_100
  11/11 MB     3077   9ns     9.41ns  9.52ns  10.2ns  4%      ▃█▁█▃▃▃▃▃▃  nonrendering  memcpy32_memcpy_10

  (Why? One fewer thing to port to SkOpts.)
  BUG=skia:4117
  Review URL: https://codereview.chromium.org/1256763003
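For readers wondering what the replacement looks like at a call site, the sketch below copies a run of 32-bit values with plain `std::memcpy`. The helper name `copy32` is made up for illustration and is not a Skia function; the only assumption taken from the commit is that a count of 32-bit elements is being copied.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not part of Skia): copy `count` 32-bit values
// using the standard library's memcpy instead of a hand-written routine.
inline void copy32(std::uint32_t* dst, const std::uint32_t* src, std::size_t count) {
    assert(count == 0 || (dst != nullptr && src != nullptr));
    if (count > 0) {
        // memcpy takes a byte count, so scale the element count.
        std::memcpy(dst, src, count * sizeof(std::uint32_t));
    }
}
```

As with `memcpy` itself, the source and destination ranges must not overlap.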
* Lay groundwork for SkOpts. | mtklein | 2015-07-27
  This doesn't really do anything yet. It's just the CPU detection code, skeleton new .cpp files, and a few little .gyp tweaks.
  BUG=skia:4117
  Review URL: https://codereview.chromium.org/1255193002
* Make peekPixels() usable with raster surface snapshots | fmalita | 2015-07-27
  SkSurface_Raster snapshots do not lock their backing bitmaps when the pixel ref is shared - they only lock on deep-copy. But since for raster surfaces the pixels are always in memory, I think it would be OK to also lock in the former case.
  This allows for optimized (zero-copy) reads of raster surface snapshot data.
  R=reed@google.com
  Review URL: https://codereview.chromium.org/1256993002
* Steal refs from other TextBatch in onCombineIfPossible | bsalomon | 2015-07-27
  Review URL: https://codereview.chromium.org/1257683004
* NEON has a ternary instruction. | mtklein | 2015-07-27
  Nothing seems to run any faster or slower, but it is terser.
  BUG=skia:
  Review URL: https://codereview.chromium.org/1255913004
* Make allocation count in TextBatch implicit | bsalomon | 2015-07-27
  Review URL: https://codereview.chromium.org/1254903002
* Added GrGLBlend.h|cpp with helper function AppendPorterDuffBlend() in preparation for SkComposeShader gpu backend | wangyix | 2015-07-24
  BUG=skia:
  Review URL: https://codereview.chromium.org/1254833003
* mixed text blobs really draws LCD | joshualitt | 2015-07-24
  TBR=bsalomon@google.com
  BUG=skia:
  Review URL: https://codereview.chromium.org/1261483002
* fix for GrAtlasTextContext occasionally crashes on mixed runs | joshualitt | 2015-07-24
  TBR=bsalomon@google.com
  BUG=510931
  Review URL: https://codereview.chromium.org/1252423002
* Minimize retrieving SkGlyph in GrTextContext | joshualitt | 2015-07-24
  BUG=skia:
  Review URL: https://codereview.chromium.org/1257603005
* Fix Ganesh drawAtlas bug with quad colors | robertphillips | 2015-07-24
  Ganesh was not expanding the quad colors to vertex colors before calling drawVertices.
  The new GM would've caught this bug and reveals Ganesh's limitations re the various xfer modes used with drawAtlas (i.e., w/o AA Ganesh only supports kModulate, w/ AA Ganesh only supports the coefficient-based xfer modes).
  Review URL: https://codereview.chromium.org/1254943002
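The expansion step mentioned in this commit can be pictured with the sketch below. It is not Ganesh's code: the function name is hypothetical, and it assumes each quad is drawn with four vertices that all share the quad's single color.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical illustration (not Ganesh's implementation): replicate each
// per-quad color once per vertex so a per-vertex draw call sees one color
// for every vertex it consumes.
inline std::vector<std::uint32_t> expandQuadColors(
        const std::vector<std::uint32_t>& quadColors) {
    constexpr std::size_t kVertsPerQuad = 4;  // assumption: 4 vertices per quad
    std::vector<std::uint32_t> vertexColors;
    vertexColors.reserve(quadColors.size() * kVertsPerQuad);
    for (std::uint32_t color : quadColors) {
        // Append kVertsPerQuad copies of this quad's color.
        vertexColors.insert(vertexColors.end(), kVertsPerQuad, color);
    }
    assert(vertexColors.size() == quadColors.size() * kVertsPerQuad);
    return vertexColors;
}
```

Passing the unexpanded array to a per-vertex API would leave most vertices reading colors that belong to other quads, or reading past the end of the array.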
* Set preallocated TextBatch geometries to 4 rather than 32 | bsalomon | 2015-07-24
  Review URL: https://codereview.chromium.org/1256843003
* skia: wrong calling convention on eglGetPlatformDisplayEXT | hendrikw | 2015-07-24
  When attempting to run the release compile of nanobench on windows, I would immediately crash due to c++'s buffer security check. This was caused by calling the function with the wrong calling convention. I'm not sure how this ever worked for anyone.
  Anyway, fix is to use eglext.h's version of the function definition.
  Review URL: https://codereview.chromium.org/1250383002
* fix path ops fuzz buster | caryclark | 2015-07-23
  Mark collapsed segments as done and remove collapsed segment references from the coincidence array.
  Also add test names to global debugging.
  R=fmalita@chromium.org
  BUG=512592
  Review URL: https://codereview.chromium.org/1250293002
* Fix ImageNewSurface test on S4. | bsalomon | 2015-07-23
  This still leaves the SkImage_NewFromTexture broken.
  Review URL: https://codereview.chromium.org/1253513004
* Fix variable shadowing in SkMorphologyImageFilter | robertphillips | 2015-07-23
  Review URL: https://codereview.chromium.org/1245883005
* fix comment on GrBatchTextStrike | joshualitt | 2015-07-23
  TBR=bsalomon@google.com
  BUG=skia:
  Review URL: https://codereview.chromium.org/1252783002
* Attempt to somewhat simplify GrContext::readSurfacePixels interaction with GrGpu. | bsalomon | 2015-07-23
  Review URL: https://codereview.chromium.org/1255483005
* Name of primitive processor will now be printed in generated shader code | wangyix | 2015-07-23
  BUG=skia:
  Review URL: https://codereview.chromium.org/1253513003
* Added GrGLFragmentProcessor::EmitArgs struct for use with emitCode() | wangyix | 2015-07-22
  BUG=skia:
  Review URL: https://codereview.chromium.org/1251173002
* Misc cleanup | robertphillips | 2015-07-22
  This is split off of https://codereview.chromium.org/1225923010/ (Start tightening correspondence between GrDrawContext and GrRenderTarget).
  It:
  - fixes some style nits
  - replaces some passing of GrContext with GrTextureProvider & GrDrawContext
  - does a bit of the finer grained creation of GrDrawContexts
  Review URL: https://codereview.chromium.org/1245183002
* 565 support for SIMD xfermodes | mtklein | 2015-07-22
  This uses the most basic approach possible:
  - to load an Sk4px from 565, convert to SkPMColors on the stack serially then load those SkPMColors.
  - to store an Sk4px to 565, store to SkPMColors on the stack then convert to 565 serially.

  Clearly, we can optimize these loads and stores. That's a TODO.

  The code using SkPMFloat is the same idea but a little more long-term viable, as we're only operating on one pixel at a time anyway. We could probably write 565 <-> SkPMFloat methods, but I'd rather not until it's really compelling.

  The speedups are varied but similar across SSE and NEON: a few uninteresting, many 50% faster, some 2x faster, and SoftLight ~4x faster.

  This will cause minor GM diffs, but I don't think any layout test changes.

  BUG=skia:
  Committed: https://skia.googlesource.com/skia/+/942930dcaa51f66d82cdaf46ae62efebd16c8cd0
  Committed: https://skia.googlesource.com/skia/+/860dcaa2ddfdadc050af4f943a84a9d499315066
  Review URL: https://codereview.chromium.org/1245673002
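The "convert serially on the stack" approach described here can be sketched as follows. This is a scalar illustration only, not Skia's code: the function names are hypothetical, the 32-bit layout is assumed to be 0xAARRGGBB with opaque alpha (SkPMColor's real channel order is platform dependent), and the widening rule (replicating high bits into low bits) is one common choice rather than a confirmed detail of the commit.

```cpp
#include <cassert>
#include <cstdint>

// Widen one RGB565 pixel to 8 bits per channel, packed as 0xAARRGGBB
// (assumed layout) with opaque alpha.
inline std::uint32_t rgb565ToArgb8888(std::uint16_t c) {
    const std::uint32_t r5 = (c >> 11) & 0x1F;
    const std::uint32_t g6 = (c >> 5) & 0x3F;
    const std::uint32_t b5 = c & 0x1F;
    // Replicate the high bits into the low bits so full scale maps to 0xFF.
    const std::uint32_t r8 = (r5 << 3) | (r5 >> 2);
    const std::uint32_t g8 = (g6 << 2) | (g6 >> 4);
    const std::uint32_t b8 = (b5 << 3) | (b5 >> 2);
    return 0xFF000000u | (r8 << 16) | (g8 << 8) | b8;
}

// Narrow a 0xAARRGGBB pixel back to RGB565 by truncating the low bits.
inline std::uint16_t argb8888ToRgb565(std::uint32_t c) {
    const std::uint32_t r5 = ((c >> 16) & 0xFF) >> 3;
    const std::uint32_t g6 = ((c >> 8) & 0xFF) >> 2;
    const std::uint32_t b5 = (c & 0xFF) >> 3;
    return static_cast<std::uint16_t>((r5 << 11) | (g6 << 5) | b5);
}

// "Load": convert four 565 pixels one at a time into a small buffer of
// 32-bit pixels, which a vector unit could then load in one go.
inline void load4From565(const std::uint16_t src[4], std::uint32_t dst[4]) {
    assert(src != nullptr && dst != nullptr);
    for (int i = 0; i < 4; ++i) {
        dst[i] = rgb565ToArgb8888(src[i]);
    }
}

// "Store": the reverse direction, again one pixel at a time.
inline void store4To565(const std::uint32_t src[4], std::uint16_t dst[4]) {
    assert(src != nullptr && dst != nullptr);
    for (int i = 0; i < 4; ++i) {
        dst[i] = argb8888ToRgb565(src[i]);
    }
}
```

The serial loops are the part the commit flags as a TODO: a vectorized version would unpack and repack all four pixels at once instead of going through scalar conversions.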
* Add drawImage{Rect,} support to SkDebugCanvas | fmalita | 2015-07-22
  R=robertphillips@google.com,mtklein@google.com
  Review URL: https://codereview.chromium.org/1253473002
* Moved GrGLFragmentProcessor definition to its own file | wangyix | 2015-07-22
  BUG=skia:
  Review URL: https://codereview.chromium.org/1246193002
* Remove some redundant fields from BitmapTextBatch (and rename to TextBatch). | bsalomon | 2015-07-22
  Review URL: https://codereview.chromium.org/1244093004
* skia: GrGLAssembleGLInterface update load chromium extension functions | hendrikw | 2015-07-22
  Command buffer will expose GL_CHROMIUM_framebuffer_multisample and GL_CHROMIUM_map_sub, added support for these to enable interface validation to succeed.
  Review URL: https://codereview.chromium.org/1248853003
* Fix tile drop-out on S4 for texture decal mode. | jvanverth | 2015-07-22
  Switch to use highp on interpolants.
  Also removes some unnecessary formatting.
  BUG=skia:3381
  Review URL: https://codereview.chromium.org/1245703004
* Add the ability to decode a subset to SkCodec | scroggo | 2015-07-22
  This allows codecs that support subsets natively (i.e. WEBP) to do so.

  Add a field on SkCodec::Options representing the subset.

  Add a method on SkCodec to find a valid subset which approximately matches a desired subset.

  Implement subset decodes in SkWebpCodec.

  Add a test in DM for decoding subsets. Notice that we only start on even boundaries. This is due to the way libwebp's API works. SkWEBPImageDecoder does not take this into account, which results in visual artifacts.

  FIXME: Subsets with scaling are not pixel identical, but close. (This may be fine, though - they are not perceptually different. We'll just need to mark another set of images in gold as valid, once https://skbug.com/4038 is fixed, so we can test scaled webp without generating new images on each run.)

  Review URL: https://codereview.chromium.org/1240143002
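One way to picture "find a valid subset which approximately matches a desired subset" under the even-boundary constraint is the sketch below. It is an assumption-laden illustration, not the SkCodec method: the `Subset` struct and `snapToEvenOrigin` name are hypothetical, and rounding the origin down (so the adjusted subset still covers the requested region) is an assumed policy.

```cpp
#include <cassert>

// Hypothetical rectangle type; right and bottom are exclusive.
struct Subset {
    int left;
    int top;
    int right;
    int bottom;
};

// Hypothetical helper (not the SkCodec API): adjust `desired` so its origin
// lies on even coordinates. Returns false if the requested subset is empty
// or does not fit inside the image, in which case `desired` is unchanged.
inline bool snapToEvenOrigin(Subset* desired, int imageWidth, int imageHeight) {
    assert(desired != nullptr);
    const bool inBounds = desired->left >= 0 && desired->top >= 0 &&
                          desired->right <= imageWidth &&
                          desired->bottom <= imageHeight;
    const bool nonEmpty = desired->left < desired->right &&
                          desired->top < desired->bottom;
    if (!inBounds || !nonEmpty) {
        return false;
    }
    // Clear the low bit: rounds non-negative coordinates down to even, so
    // the adjusted subset is a superset of the requested one.
    desired->left &= ~1;
    desired->top &= ~1;
    return true;
}
```

A caller would then compare the adjusted subset with the original and crop or offset the decoded pixels if the origin moved.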
* Fix SkCanvas::wouldOverwriteEntireSurface() contains test | fmalita | 2015-07-22
  R=reed@google.com,robertphillips@google.com,bsalomon@google.com
  Review URL: https://codereview.chromium.org/1244093005
* Revert of 565 support for SIMD xfermodes (patchset #4 id:60001 of https://codereview.chromium.org/1245673002/) | mtklein | 2015-07-21
  Reason for revert:
  NEON 565 gold images have gone ugly. This is what I get for writing and testing SSE and just writing NEON.

  E.g. colortype_xfermodes, dstreadshuffle, bigbitmaprect, pictures, textbloblooper, aaxfermodes (only Plus)

  Original issue's description:
  > 565 support for SIMD xfermodes
  >
  > This uses the most basic approach possible:
  > - to load an Sk4px from 565, convert to SkPMColors on the stack serially then load those SkPMColors.
  > - to store an Sk4px to 565, store to SkPMColors on the stack then convert to 565 serially.
  >
  > Clearly, we can optimize these loads and stores. That's a TODO.
  >
  > The code using SkPMFloat is the same idea but a little more long-term viable, as we're only operating on one pixel at a time anyway. We could probably write 565 <-> SkPMFloat methods, but I'd rather not until it's really compelling.
  >
  > The speedups are varied but similar across SSE and NEON: a few uninteresting, many 50% faster, some 2x faster, and SoftLight ~4x faster.
  >
  > This will cause minor GM diffs, but I don't think any layout test changes.
  >
  > BUG=skia:
  >
  > Committed: https://skia.googlesource.com/skia/+/942930dcaa51f66d82cdaf46ae62efebd16c8cd0
  >
  > Committed: https://skia.googlesource.com/skia/+/860dcaa2ddfdadc050af4f943a84a9d499315066

  TBR=msarett@google.com,mtklein@chromium.org
  NOPRESUBMIT=true
  NOTREECHECKS=true
  NOTRY=true
  BUG=skia:
  Review URL: https://codereview.chromium.org/1248893004
* 565 support for SIMD xfermodes | mtklein | 2015-07-21
  This uses the most basic approach possible:
  - to load an Sk4px from 565, convert to SkPMColors on the stack serially then load those SkPMColors.
  - to store an Sk4px to 565, store to SkPMColors on the stack then convert to 565 serially.

  Clearly, we can optimize these loads and stores. That's a TODO.

  The code using SkPMFloat is the same idea but a little more long-term viable, as we're only operating on one pixel at a time anyway. We could probably write 565 <-> SkPMFloat methods, but I'd rather not until it's really compelling.

  The speedups are varied but similar across SSE and NEON: a few uninteresting, many 50% faster, some 2x faster, and SoftLight ~4x faster.

  This will cause minor GM diffs, but I don't think any layout test changes.

  BUG=skia:
  Committed: https://skia.googlesource.com/skia/+/942930dcaa51f66d82cdaf46ae62efebd16c8cd0
  Review URL: https://codereview.chromium.org/1245673002
* Change the GlyphCache to use a hash table instead of doing its own ad-hoc hashing. | herb | 2015-07-21
  This change appears to be performance neutral.
  BUG=skia:
  Review URL: https://codereview.chromium.org/1216983003
* ANGLE deps roll | hendrikw | 2015-07-21
  If we ever want to allow the command buffer as a skia gles2 backend, we need a more up to date version of ANGLE, specifically there are 4 defines that differ between newer and older versions of ANGLE which we use in skia, I've updated these in this change.

  I'm not quite sure if what I've done for the 'angle_path' is correct, I tried setting it to a path relative to skia, and to '<(DEPTH)', both of which do not compile correctly, only '../' worked.

  Committed: https://skia.googlesource.com/skia/+/db0b1e796ddbd08e6be8a666537318b1c0e2ce56
  Review URL: https://codereview.chromium.org/1244843003
* Revert of Bilinear optimization for 1D convolution. (patchset #5 id:200001 of https://codereview.chromium.org/1216623003/) | ericrk | 2015-07-21
  Reason for revert:
  Ok, I am now seeing a couple issues. going to revert and investigate further.

  Original issue's description:
  > Bilinear optimization for 1D convolution.
  >
  > Splits GrGLConvolutionEffect into GrGLBilerpConvolutionEffect and
  > GrGLBoundedConvolutionEffect. When doing a non-bounded convolution we now
  > always use the GrGLBilerpConvolutionEffect which uses bilinear filtering to
  > perform half as many samples in the texture.
  >
  > BUG=skia:3986
  >
  > Committed: https://skia.googlesource.com/skia/+/91abe10af417148939548551e210c001022d3bda
  >
  > Committed: https://skia.googlesource.com/skia/+/0f38612b0facf585854aba4556433b858cbf7da8

  TBR=bsalomon@google.com,senorblanco@chromium.org
  NOPRESUBMIT=true
  NOTREECHECKS=true
  NOTRY=true
  BUG=skia:3986
  Review URL: https://codereview.chromium.org/1247063005
* Revert of skia: ANGLE deps roll (patchset #1 id:1 of https://codereview.chromium.org/1244843003/) | hendrikw | 2015-07-21
  Reason for revert:
  Compile error that the try bots didn't catch :(

  Original issue's description:
  > ANGLE deps roll
  >
  > If we ever want to allow the command buffer as a skia gles2 backend,
  > we need a more up to date version of ANGLE, specifically there are
  > 4 defines that differ between newer and older versions of ANGLE which
  > we use in skia, I've updated these in this change.
  >
  > I'm not quite sure if what I've done for the 'angle_path' is correct,
  > I tried setting it to a path relative to skia, and to '<(DEPTH)', both
  > of which do not compile correctly, only '../' worked.
  >
  > Committed: https://skia.googlesource.com/skia/+/db0b1e796ddbd08e6be8a666537318b1c0e2ce56

  TBR=bsalomon@google.com
  NOPRESUBMIT=true
  NOTREECHECKS=true
  NOTRY=true
  Review URL: https://codereview.chromium.org/1245223007
* ANGLE deps roll | hendrikw | 2015-07-21
  If we ever want to allow the command buffer as a skia gles2 backend, we need a more up to date version of ANGLE, specifically there are 4 defines that differ between newer and older versions of ANGLE which we use in skia, I've updated these in this change.

  I'm not quite sure if what I've done for the 'angle_path' is correct, I tried setting it to a path relative to skia, and to '<(DEPTH)', both of which do not compile correctly, only '../' worked.

  Review URL: https://codereview.chromium.org/1244843003
* Bilinear optimization for 1D convolution. | ericrk | 2015-07-21
  Splits GrGLConvolutionEffect into GrGLBilerpConvolutionEffect and GrGLBoundedConvolutionEffect. When doing a non-bounded convolution we now always use the GrGLBilerpConvolutionEffect which uses bilinear filtering to perform half as many samples in the texture.
  BUG=skia:3986
  Committed: https://skia.googlesource.com/skia/+/91abe10af417148939548551e210c001022d3bda
  Review URL: https://codereview.chromium.org/1216623003
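The general trick behind "half as many samples" can be modeled on the CPU as below. This is a sketch of the well-known technique of folding two adjacent kernel taps into one linearly filtered fetch, not the shader code from this change: all names are hypothetical, the kernel weights are assumed to be non-negative, and texel positions are plain array indices (GPU texel-center offsets are ignored).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One merged tap: a fractional sample position and a combined weight.
struct BilerpTap {
    float offset;
    float weight;
};

// Fold each pair of adjacent taps (w0 at i, w1 at i+1) into a single tap of
// weight w0+w1 placed at i + w1/(w0+w1). A linear fetch at that position
// returns (w0*a + w1*b)/(w0+w1), so scaling by the combined weight gives
// exactly w0*a + w1*b. Assumes non-negative weights.
inline std::vector<BilerpTap> mergeTaps(const std::vector<float>& kernel) {
    std::vector<BilerpTap> taps;
    for (std::size_t i = 0; i < kernel.size(); i += 2) {
        const float w0 = kernel[i];
        const float w1 = (i + 1 < kernel.size()) ? kernel[i + 1] : 0.0f;
        const float sum = w0 + w1;
        const float frac = (sum != 0.0f) ? w1 / sum : 0.0f;
        taps.push_back(BilerpTap{static_cast<float>(i) + frac, sum});
    }
    return taps;
}

// CPU stand-in for a linearly filtered 1D texture fetch, clamped at the edges.
inline float sampleLinear(const std::vector<float>& texels, float pos) {
    assert(!texels.empty());
    const float maxPos = static_cast<float>(texels.size() - 1);
    pos = std::min(std::max(pos, 0.0f), maxPos);
    const std::size_t i0 = static_cast<std::size_t>(pos);
    const std::size_t i1 = std::min(i0 + 1, texels.size() - 1);
    const float t = pos - static_cast<float>(i0);
    return (1.0f - t) * texels[i0] + t * texels[i1];
}

// Evaluate the convolution with the merged taps.
inline float convolveMerged(const std::vector<float>& texels,
                            const std::vector<BilerpTap>& taps) {
    float result = 0.0f;
    for (const BilerpTap& tap : taps) {
        result += tap.weight * sampleLinear(texels, tap.offset);
    }
    return result;
}
```

For a four-tap kernel this produces two fetches instead of four. The commit restricts the approach to non-bounded convolutions; this sketch does not model the bounded case at all.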
* Clean up more SkXfermode.cpp dead code. | mtklein | 2015-07-21
  These handwritten xfermodes for Clear, Src, DstIn, and DstOut are actually dead code: they're all covered by Sk4pxXfermode, which we'd already have returned.

  Tidies up the xfermode creation logic to make this clearer.

  This cuts 20-40K off SkXfermode.o, depending on the platform.

  BUG=skia:
  Review URL: https://codereview.chromium.org/1249773004
* adding assert to GrAtlasTextContext | joshualitt | 2015-07-21
  TBR=bsalomon@google.com
  BUG=skia:
  Review URL: https://codereview.chromium.org/1241263003
* De-templatize Sk4pxXfermode code a bit. | mtklein | 2015-07-21
  This deduplicates a few pieces of code:
  - we end up with one copy of each xfer32() driver loop instead of one per xfermode;
  - we end up with two* copies of each xfermode implementation instead of ten**.

  * For a given Mode: Mode() itself and xfer_aa<Mode>().
  ** From unrolling: twice at a stride of 8, once at 4, once at 2, and once at 1, then all again for when we have AA.

  This decreases the size of SkXfermode.o from 1.5M to 620K on x86-64 and from 1.3M to 680K on ARMv7+NEON.

  If we wanted to, we could eliminate the xfer_aa<Mode>() copy by tagging each Mode() function as __attribute__((noinline)) or its equivalent. This would result in another ~100K space savings.

  Performance is affected in proportion to the original xfermode speed: fast modes like Plus take the largest proportional hit, and slow modes like HardLight or SoftLight see essentially no hit at all.

  This adds SK_VECTORCALL to help keep this code fast on ARMv7 and Windows. I've looked at the ARMv7 generated code... it looks good, even pretty.

  For compatibility with SK_VECTORCALL, we now pass the vector-sized arguments by value instead of by reference. Some refactoring now allows us to declare each mode as just a static function instead of a struct, which simplifies things.

  TBR=reed@google.com
  No public API changes.

  BUG=skia:
  Committed: https://skia.googlesource.com/skia/+/e617e1525916d7ee684142728c0905828caf49da
  CQ_EXTRA_TRYBOTS=client.skia.compile:Build-Ubuntu-GCC-Arm7-Debug-Android_NoNeon-Trybot
  Review URL: https://codereview.chromium.org/1242743004
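The code-size argument in this commit comes from sharing one driver loop across modes instead of stamping out a loop per mode. A scalar sketch of that shape follows; it is not the Sk4pxXfermode code. The names are hypothetical, pixels are reduced to single 8-bit channels, and there is no SIMD, unrolling, AA path, or SK_VECTORCALL here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Each mode is a plain static function taking its arguments by value,
// rather than a struct used as a template parameter.
using ModeFn = std::uint8_t (*)(std::uint8_t src, std::uint8_t dst);

// Saturating add (a "Plus"-style mode on one channel).
static std::uint8_t plusMode(std::uint8_t src, std::uint8_t dst) {
    const unsigned sum = static_cast<unsigned>(src) + dst;
    return static_cast<std::uint8_t>(sum > 255u ? 255u : sum);
}

// Rounded multiply (a "Modulate"-style mode on one channel).
static std::uint8_t modulateMode(std::uint8_t src, std::uint8_t dst) {
    return static_cast<std::uint8_t>((static_cast<unsigned>(src) * dst + 127u) / 255u);
}

// One shared driver loop: the mode is a runtime function pointer, so the
// loop body is compiled once rather than once per mode. The indirect call
// costs proportionally more for cheap modes than for expensive ones.
inline void xferRow(ModeFn mode, std::uint8_t* dst, const std::uint8_t* src,
                    std::size_t n) {
    assert(mode != nullptr);
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = mode(src[i], dst[i]);
    }
}
```

A templated equivalent, `template <typename Mode> void xferRow(...)`, would let the compiler inline each mode into its own copy of the loop, which is faster per pixel but multiplies the generated code by the number of modes.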
* Revert of De-templatize Sk4pxXfermode code a bit. (patchset #2 id:20001 of https://codereview.chromium.org/1242743004/) | mtklein | 2015-07-21
  Reason for revert:
  http://build.chromium.org/p/client.skia.compile/builders/Build-Ubuntu-GCC-Arm7-Debug-Android_NoNeon/builds/1168/steps/build%20most/logs/stdio

  Original issue's description:
  > De-templatize Sk4pxXfermode code a bit.
  >
  > This deduplicates a few pieces of code:
  > - we end up with one copy of each xfer32() driver loop instead of one per xfermode;
  > - we end up with two* copies of each xfermode implementation instead of ten**.
  >
  > * For a given Mode: Mode() itself and xfer_aa<Mode>().
  > ** From unrolling: twice at a stride of 8, once at 4, once at 2, and once at 1, then all again for when we have AA.
  >
  > This decreases the size of SkXfermode.o from 1.5M to 620K on x86-64 and from 1.3M to 680K on ARMv7+NEON.
  >
  > If we wanted to, we could eliminate the xfer_aa<Mode>() copy by tagging each Mode() function as __attribute__((noinline)) or its equivalent. This would result in another ~100K space savings.
  >
  > Performance is affected in proportion to the original xfermode speed:
  > fast modes like Plus take the largest proportional hit, and slow modes
  > like HardLight or SoftLight see essentially no hit at all.
  >
  > This adds SK_VECTORCALL to help keep this code fast on ARMv7 and Windows. I've looked at the ARMv7 generated code... it looks good, even pretty.
  >
  > For compatibility with SK_VECTORCALL, we now pass the vector-sized arguments by value instead of by reference. Some refactoring now allows us to declare each mode as just a static function instead of a struct, which simplifies things.
  >
  > TBR=reed@google.com
  > No public API changes.
  >
  > BUG=skia:
  >
  > Committed: https://skia.googlesource.com/skia/+/e617e1525916d7ee684142728c0905828caf49da

  TBR=msarett@google.com,mtklein@chromium.org
  NOPRESUBMIT=true
  NOTREECHECKS=true
  NOTRY=true
  BUG=skia:
  Review URL: https://codereview.chromium.org/1245273005
* Possible fix Moto E compilation failure | robertphillips | 2015-07-21
  It appears that the Adreno compiler is even more twitchy about gl_FragCoord handling than expected.
  BUG=skia:4078
  Review URL: https://codereview.chromium.org/1246773003
* De-templatize Sk4pxXfermode code a bit. | mtklein | 2015-07-21
  This deduplicates a few pieces of code:
  - we end up with one copy of each xfer32() driver loop instead of one per xfermode;
  - we end up with two* copies of each xfermode implementation instead of ten**.

  * For a given Mode: Mode() itself and xfer_aa<Mode>().
  ** From unrolling: twice at a stride of 8, once at 4, once at 2, and once at 1, then all again for when we have AA.

  This decreases the size of SkXfermode.o from 1.5M to 620K on x86-64 and from 1.3M to 680K on ARMv7+NEON.

  If we wanted to, we could eliminate the xfer_aa<Mode>() copy by tagging each Mode() function as __attribute__((noinline)) or its equivalent. This would result in another ~100K space savings.

  Performance is affected in proportion to the original xfermode speed: fast modes like Plus take the largest proportional hit, and slow modes like HardLight or SoftLight see essentially no hit at all.

  This adds SK_VECTORCALL to help keep this code fast on ARMv7 and Windows. I've looked at the ARMv7 generated code... it looks good, even pretty.

  For compatibility with SK_VECTORCALL, we now pass the vector-sized arguments by value instead of by reference. Some refactoring now allows us to declare each mode as just a static function instead of a struct, which simplifies things.

  TBR=reed@google.com
  No public API changes.

  BUG=skia:
  Review URL: https://codereview.chromium.org/1242743004