path: root/src/jumper/build_stages.py
Commit message (author, date)
* add a guide to contributing to SkJumper (Mike Klein, 2017-05-08)
  Change-Id: Icd40cf5eff3d2156a3ca00d7950059d5b77f48bf
  Reviewed-on: https://skia-review.googlesource.com/15890
  Reviewed-by: Ben Wagner <bungeman@google.com>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* finish up constants (Mike Klein, 2017-05-01)
  For whatever reason, if I swap the condition in the if_then_else tests from < to >= and swap the then/else values, I can use constants in hsl_to_rgb. Still don't understand why, but I'll take it. I suspect it has something to do with SSE, IEEE, and NaN, but I don't care enough to speculate any more concretely.
  This does that, removes C() and _f, updates some comments, and adds a guard in build_stages.py to yell if it sees trouble like LCPI40_4...
  This reminds me to try -ffast-math soon. I think that was mostly held back by constants.
  Change-Id: I3f8a37a4d4642f77422ce3261b750061e9e604a3
  Reviewed-on: https://skia-review.googlesource.com/14942
  Reviewed-by: Herb Derby <herb@google.com>
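The guard mentioned in this commit could look roughly like the sketch below. It is not the code from build_stages.py; the function names and the exact label pattern are assumptions, based only on the `LCPI40_4` example in the message. The idea is that a compiler-generated constant-pool label showing up in the disassembly means a stage still refers to data the generated file does not carry.

```python
import re

# Assumption: constant-pool labels look like LCPI<number>_<number>,
# as in the LCPI40_4 example from the commit message.
CONSTANT_POOL_LABEL = re.compile(r"LCPI\d+_\d+")


def find_constant_pool_refs(disassembly):
    """Return the sorted, de-duplicated constant-pool labels found in the text."""
    return sorted(set(CONSTANT_POOL_LABEL.findall(disassembly)))


def check_disassembly(disassembly):
    """Raise if the disassembly still refers to a constant-pool label."""
    refs = find_constant_pool_refs(disassembly)
    if refs:
        raise ValueError("unexpected constant-pool references: " + ", ".join(refs))
```

A caller would run `check_disassembly()` over each object file's disassembly before writing any generated output, so a bad build fails loudly instead of producing broken stages.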
* some float constants (Mike Klein, 2017-05-01)
  Trying to go slowly to find where problems arise. Weirdly, I think I got everything except hsl_to_rgb.
  CQ_INCLUDE_TRYBOTS=skia.primary:Test-Win7-MSVC-Golo-CPU-AVX-x86_64-Release,Test-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Release,Test-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Release-SK_CPU_LIMIT_SSE41,Test-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Release-SK_CPU_LIMIT_SSE2
  Change-Id: I4d85a4c1f40bd87e7cb18fc9b5ce020812dc31db
  Reviewed-on: https://skia-review.googlesource.com/14905
  Reviewed-by: Mike Klein <mtklein@chromium.org>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* prep for more constants (Mike Klein, 2017-04-26)
  - Add -z to print zero bytes instead of ...
  - avx+hsw will create 32-byte constants in .const, so we should disassemble those too, and align to 32 bytes.
  - The default _text section on Windows is 16-byte aligned, so we make a new one that's 32-byte aligned.
  Change-Id: Icb2a962baa4c3735e98a992f2285eaf5cb1680fd
  Reviewed-on: https://skia-review.googlesource.com/14364
  Commit-Queue: Mike Klein <mtklein@chromium.org>
  Reviewed-by: Herb Derby <herb@google.com>
* jumper, maybe we can just use constants (Mike Klein, 2017-04-21)
  As long as everything is laid out the same way it was originally, I don't think there's any reason we can't just use %rip-relative addressing on x86-64. Basically, we just need to keep all the sections together in order.
  Somewhat subtly, we cannot just use -D to disassemble all sections. -D will double-disassemble[1] some bytes, which throws off our %rip-relative addressing of constants. You can see this in PS1. So we whitelist sections instead.
  [1], from man objdump: This option also has a subtle effect on the disassembly of instructions in code sections. When option -d is in effect objdump will assume that any symbols present in a code section occur on the boundary between instructions and it will refuse to disassemble across such a boundary. When option -D is in effect however this assumption is suppressed. This means that it is possible for the output of -d and -D to differ if, for example, data is stored in code sections.
  Change-Id: Idbcfe08e67113b3f7d75749931c640ff90aa0bf4
  Reviewed-on: https://skia-review.googlesource.com/14029
  Commit-Queue: Mike Klein <mtklein@chromium.org>
  Reviewed-by: Herb Derby <herb@google.com>
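A minimal sketch of the "whitelist sections instead of -D" idea: build an objdump command line that uses `-d` together with one `-j <section>` per allowed section. The helper name, the object file name and the section list in the usage note are assumptions for illustration; `-d`, `-j` and `-z` are standard objdump options. The function only builds the argument list and does not run objdump.

```python
def disassemble_command(obj_path, sections, show_zeroes=False, objdump="objdump"):
    """Build an objdump argument list that disassembles only the listed sections.

    Using -d (not -D) keeps objdump from disassembling across symbol
    boundaries; each -j restricts output to one named section.
    """
    cmd = [objdump, "-d"]
    if show_zeroes:
        # -z prints runs of zero bytes instead of eliding them as "...".
        cmd.append("-z")
    for name in sections:
        cmd += ["-j", name]
    cmd.append(obj_path)
    return cmd
```

For example, `disassemble_command("hsw.o", [".text", ".rodata"])` yields a list that can be handed to `subprocess.check_output`; which sections belong on the list depends on the object format and is not specified here.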
* refine .type ...,%function (Mike Klein, 2017-04-13)
  .type is an ELF thing, not understood by Mach. So do the same sort of #define trick we do for HIDDEN.
  This expands the use of .type ...,%function to everywhere that supports it, rather than just where we needed it. Feels cozier this way.
  CQ_INCLUDE_TRYBOTS=skia.primary:Build-Mac-Clang-arm-Debug-iOS,Build-Mac-Clang-arm-Release-iOS,Test-ChromeOS-Clang-Chromebook_C100p-GPU-MaliT764-arm-Release,Test-ChromeOS-Clang-Chromebook_C100p-GPU-MaliT764-arm-Debug
  Change-Id: Iaff01b0f3f70ceedf743d7a553915792cdd7e569
  Reviewed-on: https://skia-review.googlesource.com/13469
  Commit-Queue: Mike Klein <mtklein@chromium.org>
  Reviewed-by: Herb Derby <herb@google.com>
* jumper, explicitly tag functions as functions (Mike Klein, 2017-04-13)
  This helps the linker decide to rewrite bl -> blx when linking Thumb2 SkJumper.o code with ARM SkJumper_generated.o.
  The reason Android wasn't failing is because it somehow figured out to do this without these .type directives. We use a different toolchain for ChromeOS builds that I guess needs more handholding.
  BUG=skia:6471
  CQ_INCLUDE_TRYBOTS=skia.primary:Test-ChromeOS-Clang-Chromebook_C100p-GPU-MaliT764-arm-Release,Test-ChromeOS-Clang-Chromebook_C100p-GPU-MaliT764-arm-Debug
  Change-Id: I4a5c50b6ab7683512776c70aec6e9a75a0999787
  Reviewed-on: https://skia-review.googlesource.com/13464
  Reviewed-by: Herb Derby <herb@google.com>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* jumper, turn off a few fancy features (Mike Klein, 2017-04-06)
  This doesn't change any of the generated .S files, but it does cut a few misc. sections from the intermediate .o files. It's nice to get those sections out of the way, and one day we might be able to find ways to cut everything but .text... that'd allow us to switch the suspicious section sniffing code from a blacklist (no .const, no .literal, etc.) to a more foolproof whitelist (.text or bust).
  The remaining sections are only in ELF objects (aarch64.o, vfp4.o):
    .comment (notes the version of Clang/LLVM that compiled it)
    .note.GNU-stack (we manually add this back in build_stages.py)
  and vfp4.o has two more sections that I don't understand yet:
    .ARM.exidx (I'd have thought -fno-unwind-tables would cut this)
    .ARM.attributes
  While doing this, I've tried to make the ARM flags a bit more compact.
  Change-Id: I30ef6acb2a917ec938c5358c3f970fe04b6d7afa
  Reviewed-on: https://skia-review.googlesource.com/11485
  Reviewed-by: Mike Klein <mtklein@chromium.org>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
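The blacklist-versus-whitelist distinction in this message can be illustrated with a small sketch. The function names and the exact prefix list are assumptions; the section names come from this log (.const, .literal, .rodata, .text). A blacklist accepts anything it has not been told to distrust, while a whitelist rejects anything it has not been told to allow.

```python
# Assumption: these prefixes mark data sections that stage code must not use.
SUSPICIOUS_PREFIXES = (".const", ".literal", ".rodata")


def passes_blacklist(sections):
    """True if no section name starts with a known-suspicious prefix."""
    return not any(name.startswith(SUSPICIOUS_PREFIXES) for name in sections)


def passes_whitelist(sections, allowed=(".text",)):
    """True only if every section is explicitly allowed (".text or bust")."""
    return all(name in allowed for name in sections)
```

With the sections the message says remain in the ELF objects, `passes_blacklist([".text", ".comment", ".note.GNU-stack"])` is true while `passes_whitelist` on the same list is false, which is why the whitelist only becomes usable once everything but .text has been cut.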
* jumper, drop Android NDK dependency (Mike Klein, 2017-04-03)
  We don't _really_ need the Android NDK. We just need <arm_neon.h> (which comes from Clang, not the NDK) and a smattering of <stdint.h> ([u]intN_t), <string.h> (memcpy) and <stddef.h> (size_t).
  The idea here is solely to make it easier to run build_stages.py. If this becomes a pain to maintain, I'm happy to go back to the NDK.
  Change-Id: Ic6bb287646b6160ac42ac6e4d5290a66a7e92425
  Reviewed-on: https://skia-review.googlesource.com/10980
  Reviewed-by: Mike Klein <mtklein@chromium.org>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* jumper, only ignore red zone on Windows (Mike Klein, 2017-03-31)
  On Linux and Mac there's always a red zone of 128 bytes of stack space for us to use without touching the stack pointer. We'd been generating stage code as if that's not there because it's not there on Windows.
  We have a separate .S file for Windows anyway, so there's no need to ignore the red zone when we know it's there.
  Change-Id: I81a7841020bb8aad68bf35feac851727ef1d0758
  Reviewed-on: https://skia-review.googlesource.com/10965
  Commit-Queue: Mike Klein <mtklein@chromium.org>
  Reviewed-by: Mike Klein <mtklein@chromium.org>
  Reviewed-by: Herb Derby <herb@google.com>
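One way to express "ignore the red zone only on Windows" in a build script is to add the compiler's no-red-zone flag to the Windows flag set alone. The sketch below is an assumption about shape, not the script's actual code: `stage_cflags` and the `common` default are hypothetical, `-DWIN` is the define this log mentions for the Windows build, and `-mno-red-zone` is the usual Clang/GCC spelling of the flag.

```python
def stage_cflags(for_windows, common=("-Os",)):
    """Return compiler flags for stage code, hypothetical defaults included."""
    flags = list(common)
    if for_windows:
        # Windows has no red zone, so the compiler must not rely on one there.
        flags += ["-DWIN", "-mno-red-zone"]
    return flags
```

Since the Windows stages already live in their own .S file, the Linux and Mac builds can keep using the 128-byte red zone without affecting the Windows output.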
* make build_stages a little less mtklein-centric (Mike Klein, 2017-03-31)
  Change-Id: I69069aeaefd1c8c90de83eb86bb935e82a74bc9f
  Reviewed-on: https://skia-review.googlesource.com/10923
  Reviewed-by: Herb Derby <herb@google.com>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* tell Google3 we do not need executable stack (Mike Klein, 2017-03-31)
  Change-Id: I5ff3599448d027fcac43a53e98a801ce672ce5ee
  Reviewed-on: https://skia-review.googlesource.com/10861
  Commit-Queue: Mike Klein <mtklein@chromium.org>
  Reviewed-by: Mike Klein <mtklein@chromium.org>
* make _win.S know if it's 64-bit (Mike Klein, 2017-03-30)
  I think this is the root of my Windows / Chrome problems. Even on 32-bit builds, Chrome compiles nacl64.exe in 64-bit mode.
  So to make things simple, always put _win.S in the sources, and no-op it away when assembling for 32-bit.
  Change-Id: I19f163491739a6c0cbdedd0ce353f1d2289907ae
  Reviewed-on: https://skia-review.googlesource.com/10637
  Reviewed-by: Matt Sarett <msarett@google.com>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* Don't export stage symbols. (Mike Klein, 2017-03-30)
  Chromium Mac bots are getting tripped up by stages being visible.
  .hidden and .private_extern are the equivalents of -fvisibility=hidden for ELF and Mach-O, respectively.
  CQ_INCLUDE_TRYBOTS=skia.primary:Build-Mac-Clang-arm-Debug-iOS
  Change-Id: I8dbb04f514eead4ab480664f2674db4b57611b84
  Reviewed-on: https://skia-review.googlesource.com/10622
  Reviewed-by: Herb Derby <herb@google.com>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* jumper, revert to generating .S files (Mike Klein, 2017-03-29)
  I went with the unified-in-one-.cpp approach mostly to make it easy to roll out SkJumper. I no longer see any difficulty rolling out the assembly files, and it's possible the unified .cpp approach just makes things harder.
  Let's see if it's any easier to get Chrome's official build to work with normal assembly files. It's not going to be a problem to roll out.
  This is a partial revert of https://skia-review.googlesource.com/c/9336.
  CQ_INCLUDE_TRYBOTS=skia.primary:Test-Win2k8-MSVC-GCE-CPU-AVX2-x86_64-Debug,Test-Mac-Clang-MacMini6.2-CPU-AVX-x86_64-Debug,Test-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Debug,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Debug
  Change-Id: Idfdbd2d322452b44bc0adaf6dc299cc7649bc51e
  Reviewed-on: https://skia-review.googlesource.com/10561
  Reviewed-by: Herb Derby <herb@google.com>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* SkJumper: update to Clang 4.0 (Mike Klein, 2017-03-15)
  This Clang makes some new decisions about what (not) to inline. Luckily, liberal use of the 'inline' keyword steers it back in the right direction.
  This new code draws the same, and generally looks improved.
  Change-Id: I0ab6e1c884e6b339d01ae46a08a848e36dcc535a
  Reviewed-on: https://skia-review.googlesource.com/9702
  Reviewed-by: Mike Klein <mtklein@chromium.org>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* SkJumper: more constants, _f and _i literals. (Mike Klein, 2017-03-14)
  Generalize section types to avoid, adding another type (.rodata).
  I've kept K for iota only. Maybe one day...
  Change-Id: Ie5678a2ea00fefe550bc0e6dcab32f98c31d3fae
  Reviewed-on: https://skia-review.googlesource.com/9403
  Commit-Queue: Mike Klein <mtklein@chromium.org>
  Reviewed-by: Herb Derby <herb@google.com>
* Back to code as data arrays, this time in .text. (Mike Klein, 2017-03-07)
  This technique lets us generate a single source file, use the C++ preprocessor, and avoid the pain of working with assemblers. By using the section attribute or declspec allocate, we can put these data arrays into the .text section, making them ordinary code.
  This is like the previous solution, except it should actually run.
  CQ_INCLUDE_TRYBOTS=skia.primary:Test-Win2k8-MSVC-GCE-CPU-AVX2-x86_64-Debug,Test-Mac-Clang-MacMini6.2-CPU-AVX-x86_64-Debug,Test-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Debug,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Debug
  Change-Id: Ide7675f6cf32eb4831ff02906acbdc3faaeaa684
  Reviewed-on: https://skia-review.googlesource.com/9336
  Reviewed-by: Mike Klein <mtklein@chromium.org>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* SkJumper: be more precise by rejecting data sections. (Mike Klein, 2017-03-02)
  This allows %rip addressing as long as it's not going into a data section. This lets us use switch tables, avoiding loops and stack.
  On HSW:
    SkRasterPipeline_f16: 90 -> 63
    SkRasterPipeline_srgb: 170 -> 97
  Change-Id: I3ca2e4ff819b70beea78be75579f9d80c06979e8
  Reviewed-on: https://skia-review.googlesource.com/9146
  Reviewed-by: Herb Derby <herb@google.com>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* SkJumper: allow the compiler to generate FMAs (Mike Klein, 2017-03-02)
  Today we use mad() to get FMAs where possible. -ffp-contract=fast lets the compiler generate them if it spots an opportunity. It looks like it's found a mix of FMAs and FMSs.
  I will follow up by seeing if we can relax the use of mad(). Quick experiments say no, but less quick experiments may say otherwise.
  Change-Id: I5228811cfbf11cccc0d715672a464fd1e1cea3b0
  Reviewed-on: https://skia-review.googlesource.com/9136
  Reviewed-by: Mike Klein <mtklein@chromium.org>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* Some small SkJumper refactoring. (Mike Klein, 2017-03-02)
  No generated code changes.
  Change-Id: I2d480b5391f8246a01118766a9522d528a87f75a
  Reviewed-on: https://skia-review.googlesource.com/9129
  Reviewed-by: Mike Klein <mtklein@chromium.org>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* SkJumper: upgrade to Clang 3.9 (Mike Klein, 2017-03-01)
  Mostly I think this will help me handle the AVX tails better. But there are some wins here already, particularly in AVX and ARM code.
  Change-Id: Ie79b4c2c4ab455277c313f15d360cbf8e4bb7836
  Reviewed-on: https://skia-review.googlesource.com/9126
  Reviewed-by: Mike Klein <mtklein@chromium.org>
  Reviewed-by: Herb Derby <herb@google.com>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* SkJumper: reformat .S files (Mike Klein, 2017-02-23)
  Decimal byte encoding makes more horizontal space for comments, which are the only thing you really want to read.
  No code change here.
  Change-Id: I674d78c898976063b0d89b747af41c62dc294303
  Reviewed-on: https://skia-review.googlesource.com/8899
  Reviewed-by: Mike Klein <mtklein@chromium.org>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
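To make the space argument concrete, here is a small sketch of emitting one instruction as a `.byte` line with decimal values and a trailing comment. The function name, the column width and the `//` comment style are assumptions, not the generated files' actual layout; the bytes c5 f8 77 are the encoding of `vzeroupper`.

```python
def byte_line(code, comment, width=40):
    """Format instruction bytes as a decimal .byte directive plus a comment."""
    directive = ".byte  " + ",".join(str(b) for b in code)
    # Pad the directive so comments line up in a fixed column.
    return "  " + directive.ljust(width) + "// " + comment
```

Each byte costs at most three characters in decimal against four in `0x..` hex, so `byte_line(bytes.fromhex("c5f877"), "vzeroupper")` carries `197,248,119` where hex would need `0xc5,0xf8,0x77`, leaving the rest of the line for the disassembly comment.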
* Add AVX to the SkJumper mix. (Mike Klein, 2017-02-23)
  AVX is a nice little halfway point between SSE4.1 and HSW, in terms of instructions available, performance, and availability.
  Intel chips have had AVX since ~2011, compared to ~2013 for HSW and ~2007 for SSE4.1. Like HSW it's got 8-wide 256-bit float vectors, but integer (and double) operations are essentially still only 128-bit. It also doesn't have F16 conversion or FMA instructions.
  It doesn't look like this is going to be a burden to maintain, and only adds a few KB of code size. In exchange, we now run 8x wide on 45% to 70% of x86 machines, depending on the OS.
  In my brief testing, speed eerily resembles exact geometric progression:
    SSE4.1: 1x speed (baseline)
    AVX: ~sqrt(2)x speed
    HSW: ~2x speed
  This adds all the basic plumbing for AVX but leaves it disabled. I'll flip it on once I've implemented the f16 TODOs.
  Change-Id: I1c378dabb8a06386646371bf78ade9e9432b006f
  Reviewed-on: https://skia-review.googlesource.com/8898
  Reviewed-by: Mike Klein <mtklein@chromium.org>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* SkJumper: Windows (Mike Klein, 2017-02-21)
  - Compile stages with -DWIN to pick up MS-specific start_pipeline().
  - Add SkJumper_generated_win.S with MS-specific assembly.
  - Add a minimal asm tool to our GN Windows toolchain.
  The SkRasterPipeline_f16 benchmark runs ~4x faster on my desktop.
  Change-Id: Ia45afb4ecb6a055e2c0e43f0f54f59e081c23b7f
  Reviewed-on: https://skia-review.googlesource.com/8778
  Reviewed-by: Mike Klein <mtklein@chromium.org>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* SkJumper: aarch64 and armv7 (Mike Klein, 2017-02-18)
  Change-Id: Ie356b062372af3516a437d27bafa20d98e28edd6
  Reviewed-on: https://skia-review.googlesource.com/8678
  Commit-Queue: Mike Klein <mtklein@chromium.org>
  Reviewed-by: Mike Klein <mtklein@chromium.org>
* SkJumper: start on asm (Mike Klein, 2017-02-17)
  Will follow up with Linux, then Android aarch64 and armv7, then iOS, then Windows.
  I took some opportunities to refactor.
  CQ_INCLUDE_trybots=skia.primary:Test-Mac-Clang-MacMini6.2-CPU-AVX-x86_64-Debug,Perf-Mac-Clang-MacMini6.2-CPU-AVX-x86_64-Debug
  Change-Id: Ifcf1edabdfe5df0a91bd089f09523aba95cdf5ef
  Reviewed-on: https://skia-review.googlesource.com/8611
  Commit-Queue: Mike Klein <mtklein@chromium.org>
  Reviewed-by: Herb Derby <herb@google.com>
* SkJumper: make some room for wider instructions. (Mike Klein, 2017-02-16)
  No real change here.
  Change-Id: I56449c292585038901d78902e6aeb68203e36351
  Reviewed-on: https://skia-review.googlesource.com/8476
  Reviewed-by: Mike Klein <mtklein@chromium.org>
  Commit-Queue: Mike Klein <mtklein@chromium.org>
* SkJumper (Mike Klein, 2017-02-16)
  Change-Id: If9f73e712e429564fef58ccb838c212ec8d2e68c
  Reviewed-on: https://skia-review.googlesource.com/8525
  Commit-Queue: Mike Klein <mtklein@chromium.org>
  Reviewed-by: Herb Derby <herb@google.com>