| Commit message (Collapse) | Author | Age |
|
|
|
|
|
|
|
|
|
|
|
|
| |
This Clang makes some new decisions about what (not) to inline.
Luckily, liberal use of the 'inline' keyword steers it back in
the right direction.
This new code draws the same, and generally looks improved.
Change-Id: I0ab6e1c884e6b339d01ae46a08a848e36dcc535a
Reviewed-on: https://skia-review.googlesource.com/9702
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
| |
Generalize section types to avoid, adding another type (.rodata).
I've kept K for iota only. Maybe one day...
Change-Id: Ie5678a2ea00fefe550bc0e6dcab32f98c31d3fae
Reviewed-on: https://skia-review.googlesource.com/9403
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Herb Derby <herb@google.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This technique lets us generate a single source file, use the C++
preprocessor, and avoid the pain of working with assemblers.
By using the section attribute or declspec allocate, we can put these
data arrays into the .text section, making them ordinary code.
This is like the previous solution, except it should actually run.
CQ_INCLUDE_TRYBOTS=skia.primary:Test-Win2k8-MSVC-GCE-CPU-AVX2-x86_64-Debug,Test-Mac-Clang-MacMini6.2-CPU-AVX-x86_64-Debug,Test-Ubuntu-Clang-GCE-CPU-AVX2-x86_64-Debug,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Debug
Change-Id: Ide7675f6cf32eb4831ff02906acbdc3faaeaa684
Reviewed-on: https://skia-review.googlesource.com/9336
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This allows %rip addressing as long as it's not going into a data
section. This lets us use switch tables, avoiding loops and stack.
On HSW,
SkRasterPipeline_f16: 90 -> 63
SkRasterPipeline_srgb: 170 -> 97
Change-Id: I3ca2e4ff819b70beea78be75579f9d80c06979e8
Reviewed-on: https://skia-review.googlesource.com/9146
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Today we use mad() to get FMAs where possible.
-ffp-contract=fast lets the compiler generate them if it spots an opportunity.
It looks like it's found a mix of FMAs and FMSs.
I will follow up by seeing if we can relax the use of mad().
Quick experiments say no, but less quick experiments may say otherwise.
Change-Id: I5228811cfbf11cccc0d715672a464fd1e1cea3b0
Reviewed-on: https://skia-review.googlesource.com/9136
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
| |
No generated code changes.
Change-Id: I2d480b5391f8246a01118766a9522d528a87f75a
Reviewed-on: https://skia-review.googlesource.com/9129
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
| |
Mostly I think this will help me handle the AVX tails better.
But there are some wins here already, particularly in AVX and ARM code.
Change-Id: Ie79b4c2c4ab455277c313f15d360cbf8e4bb7836
Reviewed-on: https://skia-review.googlesource.com/9126
Reviewed-by: Mike Klein <mtklein@chromium.org>
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Decimal byte encoding makes more horizontal space for comments,
which are the only thing you really want to read.
No code change here.
Change-Id: I674d78c898976063b0d89b747af41c62dc294303
Reviewed-on: https://skia-review.googlesource.com/8899
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
AVX is a nice little halfway point between SSE4.1 and HSW, in terms
of instructions available, performance, and availability.
Intel chips have had AVX since ~2011, compared to ~2013 for HSW and
~2007 for SSE4.1. Like HSW it's got 8-wide 256-bit float vectors,
but integer (and double) operations are essentially still only 128-bit.
It also doesn't have F16 conversion or FMA instructions.
It doesn't look like this is going to be a burden to maintain, and only
adds a few KB of code size. In exchange, we now run 8x wide on 45% to
70% of x86 machines, depending on the OS.
In my brief testing, speed eerily resembles exact geometric progression:
SSE4.1: 1x speed (baseline)
AVX: ~sqrt(2)x speed
HSW: ~2x speed
This adds all the basic plumbing for AVX but leaves it disabled.
I'll flip it on once I've implemented the f16 TODOs.
Change-Id: I1c378dabb8a06386646371bf78ade9e9432b006f
Reviewed-on: https://skia-review.googlesource.com/8898
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- Compile stages with -DWIN to pick up MS-specific start_pipeline().
- Add SkJumper_generated_win.S with MS-specific assembly.
- Add a minimal asm tool to our GN Windows toolchain.
The SkRasterPipeline_f16 benchmark run ~4x faster on my desktop.
Change-Id: Ia45afb4ecb6a055e2c0e43f0f54f59e081c23b7f
Reviewed-on: https://skia-review.googlesource.com/8778
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
| |
Change-Id: Ie356b062372af3516a437d27bafa20d98e28edd6
Reviewed-on: https://skia-review.googlesource.com/8678
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Mike Klein <mtklein@chromium.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Will follow up with Linux, then Android aarch64 and armv7, then iOS, then Windows.
I took some opportunities to refactor.
CQ_INCLUDE_trybots=skia.primary:Test-Mac-Clang-MacMini6.2-CPU-AVX-x86_64-Debug,Perf-Mac-Clang-MacMini6.2-CPU-AVX-x86_64-Debug
Change-Id: Ifcf1edabdfe5df0a91bd089f09523aba95cdf5ef
Reviewed-on: https://skia-review.googlesource.com/8611
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Herb Derby <herb@google.com>
|
|
|
|
|
|
|
|
|
| |
No real change here.
Change-Id: I56449c292585038901d78902e6aeb68203e36351
Reviewed-on: https://skia-review.googlesource.com/8476
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
|
|
Change-Id: If9f73e712e429564fef58ccb838c212ec8d2e68c
Reviewed-on: https://skia-review.googlesource.com/8525
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Herb Derby <herb@google.com>
|