| Commit message | Author | Age |
| |
Change-Id: Icd40cf5eff3d2156a3ca00d7950059d5b77f48bf
Reviewed-on: https://skia-review.googlesource.com/15890
Reviewed-by: Ben Wagner <bungeman@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
| |
For whatever reason, if I swap the condition in the if_then_else tests
from < to >= and swap the then/else values, I can use constants in
hsl_to_rgb. Still don't understand why, but I'll take it. I suspect it
has something to do with SSE, IEEE, and NaN, but I don't care enough to
speculate any more concretely.
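One concrete way the two forms differ, offered as a sketch rather than a diagnosis of the codegen change: every ordered comparison against NaN is false, so "x < y ? t : e" and "x >= y ? e : t" pick different values when an operand is NaN. The Python below models a scalar if_then_else; the function names are illustrative and are not taken from SkJumper.

```python
import math

def select_lt(x, y, t, e):
    # Original form: take t when x < y, otherwise e.
    return t if x < y else e

def select_ge_swapped(x, y, t, e):
    # Rewritten form: condition flipped to >= and the then/else values swapped.
    return e if x >= y else t

nan = math.nan
# Ordinary inputs: both forms agree.
assert select_lt(1.0, 2.0, "then", "else") == select_ge_swapped(1.0, 2.0, "then", "else")
# NaN input: x < y and x >= y are both false, so the two forms disagree.
print(select_lt(nan, 2.0, "then", "else"))          # else
print(select_ge_swapped(nan, 2.0, "then", "else"))  # then
```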
This does that, removes C() and _f, updates some comments, and adds a
guard in build_stages.py to yell if it sees trouble like LCPI40_4...
This reminds me to try -ffast-math soon. I think that was mostly held
back by constants.
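A guard of this kind could look roughly like the sketch below. It assumes constant-pool labels follow the LCPI<function>_<index> pattern seen in the LCPI40_4 example; the function name and the error format are hypothetical and not the actual build_stages.py code.

```python
import re

# Assumption: compiler-generated constant-pool labels look like LCPI<fn>_<n>,
# as in the LCPI40_4 example from the commit message.
CONSTANT_POOL_LABEL = re.compile(r"LCPI\d+_\d+")

def check_no_constant_pool_refs(disassembly_lines):
    """Raise ValueError if any disassembly line refers to a constant-pool label."""
    for number, line in enumerate(disassembly_lines, start=1):
        match = CONSTANT_POOL_LABEL.search(line)
        if match:
            raise ValueError(
                "line %d references %s: %s" % (number, match.group(0), line.strip()))

# A line without such a label passes silently.
check_no_constant_pool_refs(["vmulps %ymm1,%ymm0,%ymm0"])
```

Failing loudly at generation time is the point of the design: a stray constant-pool reference is far easier to catch here than in a miscompiled stage.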
Change-Id: I3f8a37a4d4642f77422ce3261b750061e9e604a3
Reviewed-on: https://skia-review.googlesource.com/14942
Reviewed-by: Herb Derby <herb@google.com>
| |
Trying to go slowly to find where problems arise.
Weirdly, I think I got everything except hsl_to_rgb.
CQ_INCLUDE_TRYBOTS=skia.primary:Test-Win7-MSVC-Golo-CPU-AVX-x86_64-Release,Test-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Release,Test-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Release-SK_CPU_LIMIT_SSE41,Test-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Release-SK_CPU_LIMIT_SSE2
Change-Id: I4d85a4c1f40bd87e7cb18fc9b5ce020812dc31db
Reviewed-on: https://skia-review.googlesource.com/14905
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
| |
- Add -z to print zero bytes instead of ...
- avx+hsw will create 32-byte constants in .const,
so we should disassemble those too, and align to 32 bytes.
- The default _text section on Windows is 16-byte aligned,
so we make a new one that's 32-byte aligned.
Change-Id: Icb2a962baa4c3735e98a992f2285eaf5cb1680fd
Reviewed-on: https://skia-review.googlesource.com/14364
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Herb Derby <herb@google.com>
| |
As long as everything is laid out the same way it was originally, I don't
think there's any reason we can't just use %rip-relative addressing on x86-64.
Basically, we just need to keep all the sections together in order.
Somewhat subtly we cannot just use -D to disassemble all sections. -D will
double-disassemble[1] some bytes, which throws off our %rip-relative addressing
of constants. You can see this in PS1. So we whitelist sections instead.
[1], from man objdump:
This option also has a subtle effect on the disassembly of instructions in code
sections. When option -d is in effect objdump will assume that any symbols
present in a code section occur on the boundary between instructions and it will
refuse to disassemble across such a boundary. When option -D is in effect however
this assumption is suppressed. This means that it is possible for the output of -d
and -D to differ if, for example, data is stored in code sections.
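To illustrate why the layout has to be preserved, here is a small sketch of the arithmetic behind %rip-relative addressing. The addresses and the instruction length are made up for the example, and the helper names are hypothetical.

```python
def rip_relative_disp(instr_addr, instr_len, target_addr):
    # x86-64 measures %rip-relative displacements from the end of the
    # instruction, i.e. from the address of the next instruction.
    return target_addr - (instr_addr + instr_len)

def resolve(instr_addr, instr_len, disp):
    # The address the CPU actually reads for a given encoded displacement.
    return instr_addr + instr_len + disp

# Made-up layout: an 8-byte instruction at 0x40 loading a constant at 0x100.
disp = rip_relative_disp(0x40, 8, 0x100)
assert resolve(0x40, 8, disp) == 0x100

# Moving code and constant together by the same amount keeps disp valid.
assert resolve(0x40 + 0x1000, 8, disp) == 0x100 + 0x1000

# If 4 extra bytes end up between them (say, from bytes emitted twice),
# the constant sits at 0x104 but the same disp still resolves to 0x100.
assert resolve(0x40, 8, disp) != 0x100 + 4
```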
Change-Id: Idbcfe08e67113b3f7d75749931c640ff90aa0bf4
Reviewed-on: https://skia-review.googlesource.com/14029
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Herb Derby <herb@google.com>
| |
.type is an ELF thing, not understood by Mach.
So do the same sort of #define trick we do for HIDDEN.
This expands the use of .type ...,%function to everywhere
that supports it, rather than just where we needed it.
Feels cozier this way.
CQ_INCLUDE_TRYBOTS=skia.primary:Build-Mac-Clang-arm-Debug-iOS,Build-Mac-Clang-arm-Release-iOS,Test-ChromeOS-Clang-Chromebook_C100p-GPU-MaliT764-arm-Release,Test-ChromeOS-Clang-Chromebook_C100p-GPU-MaliT764-arm-Debug
Change-Id: Iaff01b0f3f70ceedf743d7a553915792cdd7e569
Reviewed-on: https://skia-review.googlesource.com/13469
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Herb Derby <herb@google.com>
| |
This helps the linker decide to rewrite bl -> blx when linking
Thumb2 SkJumper.o code with ARM SkJumper_generated.o.
The reason Android wasn't failing is because it somehow figured
out to do this without these .type directives. We use a different
toolchain for ChromeOS builds that I guess needs more handholding.
BUG=skia:6471
CQ_INCLUDE_TRYBOTS=skia.primary:Test-ChromeOS-Clang-Chromebook_C100p-GPU-MaliT764-arm-Release,Test-ChromeOS-Clang-Chromebook_C100p-GPU-MaliT764-arm-Debug
Change-Id: I4a5c50b6ab7683512776c70aec6e9a75a0999787
Reviewed-on: https://skia-review.googlesource.com/13464
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
| |
This doesn't change any of the generated .S files, but it does cut a few
misc. sections from the intermediate .o files. It's nice to get those
sections out of the way, and one day we might be able to find ways to
cut everything but .text... that'd allow us to switch the suspicious
section sniffing code from a blacklist (no .const, no .literal, etc.) to
a more foolproof whitelist (.text or bust).
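The difference between the two checks can be sketched as follows. The section names are assumptions based on the examples in this message, and the helper names are hypothetical; the real lists in build_stages.py may differ.

```python
# Assumption: these names mirror the examples in the commit message.
BLACKLIST = (".const", ".literal", ".rodata")
WHITELIST = (".text",)

def ok_by_blacklist(section):
    # Rejects only sections whose names contain a known-bad substring.
    return not any(bad in section for bad in BLACKLIST)

def ok_by_whitelist(section):
    # Accepts only sections that are explicitly known to be fine.
    return section in WHITELIST

for name in (".text", ".rodata.cst16", ".data.rel.ro"):
    print(name, ok_by_blacklist(name), ok_by_whitelist(name))
```

A section nobody anticipated, such as .data.rel.ro here, slips past the blacklist but is rejected by the whitelist, which is what makes the whitelist the more foolproof of the two.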
The remaining sections are only in ELF objects (aarch64.o, vfp4.o):
.comment (notes the version of Clang/LLVM that compiled it)
.note.GNU-stack (we manually add this back in build_stages.py)
and vfp4.o has two more sections that I don't understand yet:
.ARM.exidx (I'd have thought -fno-unwind-tables would cut this)
.ARM.attributes
While doing this, I've tried to make the ARM flags a bit more compact.
Change-Id: I30ef6acb2a917ec938c5358c3f970fe04b6d7afa
Reviewed-on: https://skia-review.googlesource.com/11485
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
| |
We don't _really_ need the Android NDK. We just need <arm_neon.h>
(which comes from Clang, not the NDK) and a smattering of <stdint.h>
([u]intN_t), <string.h> (memcpy) and <stddef.h> (size_t).
The idea here is solely to make it easier to run build_stages.py.
If this becomes a pain to maintain, I'm happy to go back to the NDK.
Change-Id: Ic6bb287646b6160ac42ac6e4d5290a66a7e92425
Reviewed-on: https://skia-review.googlesource.com/10980
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
| |
On Linux and Mac there's always a red zone of 128 bytes of stack space
for us to use without touching the stack pointer. We'd been generating
stage code as if that's not there because it's not there on Windows.
We have a separate .S file for Windows anyway, so there's no need to
ignore the red zone when we know it's there.
Change-Id: I81a7841020bb8aad68bf35feac851727ef1d0758
Reviewed-on: https://skia-review.googlesource.com/10965
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Mike Klein <mtklein@chromium.org>
Reviewed-by: Herb Derby <herb@google.com>
| |
Change-Id: I69069aeaefd1c8c90de83eb86bb935e82a74bc9f
Reviewed-on: https://skia-review.googlesource.com/10923
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
| |
Change-Id: I5ff3599448d027fcac43a53e98a801ce672ce5ee
Reviewed-on: https://skia-review.googlesource.com/10861
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Mike Klein <mtklein@chromium.org>
| |
I think this is the root of my Windows / Chrome problems.
Even on 32-bit builds, Chrome compiles nacl64.exe in 64-bit mode.
So to make things simple, always put _win.S in the sources,
and no-op it away when assembling for 32-bit.
Change-Id: I19f163491739a6c0cbdedd0ce353f1d2289907ae
Reviewed-on: https://skia-review.googlesource.com/10637
Reviewed-by: Matt Sarett <msarett@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
| |
Chromium Mac bots are getting tripped up by stages being visible.
.hidden and .private_extern are the assembler equivalents of -fvisibility=hidden for ELF and Mach-O, respectively.
CQ_INCLUDE_TRYBOTS=skia.primary:Build-Mac-Clang-arm-Debug-iOS
Change-Id: I8dbb04f514eead4ab480664f2674db4b57611b84
Reviewed-on: https://skia-review.googlesource.com/10622
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
| |
I went with the unified-in-one-.cpp approach mostly to make it easy to
roll out SkJumper. I no longer see any difficulty rolling out the
assembly files, and it's possible the unified .cpp approach just makes
things harder.
Let's see if it's any easier to get Chrome's official build to work with
normal assembly files. It's not going to be a problem to roll out.
This is a partial revert of https://skia-review.googlesource.com/c/9336.
CQ_INCLUDE_TRYBOTS=skia.primary:Test-Win2k8-MSVC-GCE-CPU-AVX2-x86_64-Debug,Test-Mac-Clang-MacMini6.2-CPU-AVX-x86_64-Debug,Test-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Debug,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Debug
Change-Id: Idfdbd2d322452b44bc0adaf6dc299cc7649bc51e
Reviewed-on: https://skia-review.googlesource.com/10561
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
| |
This Clang makes some new decisions about what (not) to inline.
Luckily, liberal use of the 'inline' keyword steers it back in
the right direction.
This new code draws the same, and generally looks improved.
Change-Id: I0ab6e1c884e6b339d01ae46a08a848e36dcc535a
Reviewed-on: https://skia-review.googlesource.com/9702
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
| |
Generalize section types to avoid, adding another type (.rodata).
I've kept K for iota only. Maybe one day...
Change-Id: Ie5678a2ea00fefe550bc0e6dcab32f98c31d3fae
Reviewed-on: https://skia-review.googlesource.com/9403
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Herb Derby <herb@google.com>
| |
This technique lets us generate a single source file, use the C++
preprocessor, and avoid the pain of working with assemblers.
By using the section attribute or declspec allocate, we can put these
data arrays into the .text section, making them ordinary code.
This is like the previous solution, except it should actually run.
CQ_INCLUDE_TRYBOTS=skia.primary:Test-Win2k8-MSVC-GCE-CPU-AVX2-x86_64-Debug,Test-Mac-Clang-MacMini6.2-CPU-AVX-x86_64-Debug,Test-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Debug,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Debug
Change-Id: Ide7675f6cf32eb4831ff02906acbdc3faaeaa684
Reviewed-on: https://skia-review.googlesource.com/9336
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
| |
This allows %rip addressing as long as it's not going into a data
section. This lets us use switch tables, avoiding loops and stack.
On HSW,
SkRasterPipeline_f16: 90 -> 63
SkRasterPipeline_srgb: 170 -> 97
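For scale, the two results can be turned into ratios. This sketch assumes the figures are times in the same unit, with lower being better; the message does not state the unit.

```python
def speedup(before, after):
    # Ratio of old time to new time; values above 1 mean the new code is faster.
    return before / after

for name, before, after in [("SkRasterPipeline_f16", 90, 63),
                            ("SkRasterPipeline_srgb", 170, 97)]:
    print("%s: %.2fx faster, %.0f%% less time"
          % (name, speedup(before, after), 100 * (1 - after / before)))
```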
Change-Id: I3ca2e4ff819b70beea78be75579f9d80c06979e8
Reviewed-on: https://skia-review.googlesource.com/9146
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
| |
Today we use mad() to get FMAs where possible.
-ffp-contract=fast lets the compiler generate them if it spots an opportunity.
It looks like it's found a mix of FMAs and FMSs.
I will follow up by seeing if we can relax the use of mad().
Quick experiments say no, but less quick experiments may say otherwise.
Change-Id: I5228811cfbf11cccc0d715672a464fd1e1cea3b0
Reviewed-on: https://skia-review.googlesource.com/9136
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
| |
No generated code changes.
Change-Id: I2d480b5391f8246a01118766a9522d528a87f75a
Reviewed-on: https://skia-review.googlesource.com/9129
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
| |
Mostly I think this will help me handle the AVX tails better.
But there are some wins here already, particularly in AVX and ARM code.
Change-Id: Ie79b4c2c4ab455277c313f15d360cbf8e4bb7836
Reviewed-on: https://skia-review.googlesource.com/9126
Reviewed-by: Mike Klein <mtklein@chromium.org>
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
| |
Decimal byte encoding makes more horizontal space for comments,
which are the only thing you really want to read.
No code change here.
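As a rough illustration of the encoding change, the sketch below turns a disassembler's hex byte column into a decimal .byte directive with the instruction as a trailing comment. The helper name, the column width, and the exact output layout are assumptions, not the format build_stages.py emits.

```python
def byte_directive(hex_bytes, comment):
    # hex_bytes: the space-separated hex column from a disassembler, e.g. "48 ad".
    values = [int(b, 16) for b in hex_bytes.split()]
    # Decimal values are shorter than 0x-prefixed hex, leaving room for the comment.
    return "  .byte  %-20s // %s" % (",".join(str(v) for v in values), comment)

print(byte_directive("48 ad", "lods %ds:(%rsi),%rax"))
print(byte_directive("c5 fc 59 c1", "vmulps %ymm1,%ymm0,%ymm0"))
```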
Change-Id: I674d78c898976063b0d89b747af41c62dc294303
Reviewed-on: https://skia-review.googlesource.com/8899
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
| |
AVX is a nice little halfway point between SSE4.1 and HSW, in terms
of instructions available, performance, and availability.
Intel chips have had AVX since ~2011, compared to ~2013 for HSW and
~2007 for SSE4.1. Like HSW it's got 8-wide 256-bit float vectors,
but integer (and double) operations are essentially still only 128-bit.
It also doesn't have F16 conversion or FMA instructions.
It doesn't look like this is going to be a burden to maintain, and only
adds a few KB of code size. In exchange, we now run 8x wide on 45% to
70% of x86 machines, depending on the OS.
In my brief testing, speed eerily resembles exact geometric progression:
SSE4.1: 1x speed (baseline)
AVX: ~sqrt(2)x speed
HSW: ~2x speed
This adds all the basic plumbing for AVX but leaves it disabled.
I'll flip it on once I've implemented the f16 TODOs.
Change-Id: I1c378dabb8a06386646371bf78ade9e9432b006f
Reviewed-on: https://skia-review.googlesource.com/8898
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
| |
- Compile stages with -DWIN to pick up MS-specific start_pipeline().
- Add SkJumper_generated_win.S with MS-specific assembly.
- Add a minimal asm tool to our GN Windows toolchain.
The SkRasterPipeline_f16 benchmark runs ~4x faster on my desktop.
Change-Id: Ia45afb4ecb6a055e2c0e43f0f54f59e081c23b7f
Reviewed-on: https://skia-review.googlesource.com/8778
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
| |
Change-Id: Ie356b062372af3516a437d27bafa20d98e28edd6
Reviewed-on: https://skia-review.googlesource.com/8678
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Mike Klein <mtklein@chromium.org>
| |
Will follow up with Linux, then Android aarch64 and armv7, then iOS, then Windows.
I took some opportunities to refactor.
CQ_INCLUDE_trybots=skia.primary:Test-Mac-Clang-MacMini6.2-CPU-AVX-x86_64-Debug,Perf-Mac-Clang-MacMini6.2-CPU-AVX-x86_64-Debug
Change-Id: Ifcf1edabdfe5df0a91bd089f09523aba95cdf5ef
Reviewed-on: https://skia-review.googlesource.com/8611
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Herb Derby <herb@google.com>
| |
No real change here.
Change-Id: I56449c292585038901d78902e6aeb68203e36351
Reviewed-on: https://skia-review.googlesource.com/8476
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
Change-Id: If9f73e712e429564fef58ccb838c212ec8d2e68c
Reviewed-on: https://skia-review.googlesource.com/8525
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Herb Derby <herb@google.com>
|