| Commit message (Collapse) | Author | Age |
|
The old code ensured that failed sub-streams would not be re-used, but
had two flaws:
1) It only checked for failed sub-streams during Return.
2) It didn't actually remove the failed sub-streams from our state.
The new code fixes these two flaws, and adds an extra test that
explains why (1) is insufficient.
PiperOrigin-RevId: 207333296
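A minimal Python sketch of the two fixes described above; it is not the actual C++ change. The names `FakeStream`, `SubStreamPool`, `borrow`, and `give_back` are hypothetical. It also assumes, as one plausible reason why checking only on return is insufficient, that a sub-stream can fail after it has already been returned in a healthy state.

```python
class FakeStream:
    """Hypothetical stand-in for a device stream; `ok` flips to False on failure."""

    def __init__(self, name):
        self.name = name
        self.ok = True


class SubStreamPool:
    """Hands out reusable sub-streams and drops any that have failed."""

    def __init__(self):
        self._free = []    # idle sub-streams available for re-use
        self._counter = 0  # used only to name newly created streams

    def borrow(self):
        # Check at borrow time too (assumption: a stream may fail while idle,
        # after it was returned in a healthy state).
        while self._free:
            stream = self._free.pop()
            if stream.ok:
                return stream
            # A failed stream is dropped here, so it leaves our state for good.
        self._counter += 1
        return FakeStream(f"sub-{self._counter}")

    def give_back(self, stream):
        # Only healthy streams go back on the free list.
        if stream.ok:
            self._free.append(stream)

    def idle_count(self):
        return len(self._free)
```

With this shape, a stream that fails between `give_back` and the next `borrow` is never handed out again, and it no longer lingers in the pool's bookkeeping.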
|
|
|
|
|
|
| |
RELNOTES: n/a
PiperOrigin-RevId: 207333246
|
As a follow-on cleanup for cl/206980796 ("Overhaul XLA:CPU's calling
convention.") I want to introduce a BufferInfo class that encapsulates whether a
buffer is a constant, an entry parameter or a temp without using the fragile
"size < 0" scheme I have today. To do this efficiently I need a place to put
the BufferInfo class that will be visible to MallocContiguousBuffers. Instead
of creating (what seemed to me) an odd layering with BufferInfo in aot/runtime.h
I decided to pull in the runtime into xla_compiled_cpu_function since that's the
only user.
PiperOrigin-RevId: 207333245
|
Looks like:
5624727 cycles (100.% 100Σ) :: 3865.8 usec [...] TOTAL
2121832 cycles (37.72% 38Σ) :: 1458.3 usec
1932379 cycles (34.36% 72Σ) :: 1328.1 usec
264366 cycles ( 4.70% 77Σ) :: 181.7 usec
The first line with the total is a little weird, but I figured it was
better to do it this way than to waste a precious character of
horizontal space.
I also considered rendering it as e.g. "Σ38%". This is slightly more
expressive, but it gets hard to read pretty fast with two characters
smushed against both of the numbers.
I put the sigma at the end because I find it easier to read: With the
sigma at the beginning, its tips often blend in with the first number;
e.g. I find "Σ77" less readable than "77Σ".
Similarly I considered displaying more than two significant figures in
the percent, but since it's cumulative *anyway*, I didn't think these
were relevant.
This formatting is somewhat inconsistent with how we do the categories
tables:
258 ( 6.68% Σ87.81%) non-fusion elementwise (12 ops)
I can change these to match if we want, but I sort of think of them as a
different case. The categories tables have a lot more whitespace in
between entries (namely, one line per instruction in the category), so
noisiness is not nearly as significant a concern.
PiperOrigin-RevId: 207329731
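The per-entry lines above can be reproduced with a short Python sketch. This is illustrative only and is not the profile printer's real code: the function name `format_profile_lines` and the 1455 MHz clock rate are assumptions, the latter back-derived from the TOTAL line (5624727 cycles in 3865.8 usec).

```python
def format_profile_lines(cycle_counts, total_cycles, clock_mhz):
    """Render one line per entry: own percent, cumulative percent, and time."""
    lines = []
    cumulative = 0
    for cycles in cycle_counts:
        cumulative += cycles
        pct = 100.0 * cycles / total_cycles          # this entry's share
        cum_pct = 100.0 * cumulative / total_cycles  # running total so far
        usec = cycles / clock_mhz                    # cycles / (cycles per usec)
        # Two decimals for the entry itself, whole percent for the running
        # total, with the sigma placed after the number.
        lines.append(
            f"{cycles:>8} cycles ({pct:5.2f}% {cum_pct:2.0f}Σ) :: {usec:7.1f} usec"
        )
    return lines


for line in format_profile_lines([2121832, 1932379, 264366], 5624727, 1455.0):
    print(line)
```

Keeping the cumulative column at whole percents is what saves the horizontal space discussed above.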
|
|\
| |
| |
| | |
PiperOrigin-RevId: 207329479
|
| |
| |
| |
| | |
PiperOrigin-RevId: 207326276
|
|\ \
| | |
| | |
| | | |
PiperOrigin-RevId: 207325536
|
|\ \ \
| | | |
| | | |
| | | | |
PiperOrigin-RevId: 207325529
|
| | | |
| | | |
| | | |
| | | | |
PiperOrigin-RevId: 207325109
|
Instead of
********** microseconds above estimated optimum report **********
[...]
********** categories table **********
The left hand side numbers are microseconds above estimated optimum.
[...]
we now print
********** microseconds above estimated optimum report **********
[...]
********** categories table for microseconds above estimated optimum **********
[...]
which I think is more explicit and harder to misread.
PiperOrigin-RevId: 207325046
|
| | | |
| | | |
| | | |
| | | | |
PiperOrigin-RevId: 207323298
|
| | | |
| | | |
| | | |
| | | | |
PiperOrigin-RevId: 207320100
|
|\ \ \ \
| | | | |
| | | | |
| | | | | |
PiperOrigin-RevId: 207319780
|
| | | | |
| | | | |
| | | | |
| | | | | |
PiperOrigin-RevId: 207319608
|
| | | | |
| | | | |
| | | | |
| | | | | |
PiperOrigin-RevId: 207317857
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
I choose round-half-to-even, which matches
cudnnConvolutionBiasActivationForward and cudnnTransformTensor. I can add an
attribute for the rounding mode in the future if necessary.
PiperOrigin-RevId: 207316630
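For readers unfamiliar with the rounding mode named above, here is a small Python sketch contrasting round-half-to-even with round-half-away-from-zero. Both helper names are hypothetical, and the sketch shows only the rounding rule, not the op this commit changes.

```python
import math


def round_half_to_even(x):
    """Round to the nearest integer; exact .5 ties go to the even neighbour."""
    # Python 3's built-in round() already applies this rule to floats.
    return round(x)


def round_half_away_from_zero(x):
    """For comparison: exact .5 ties go to the neighbour farther from zero."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


for value in (0.5, 1.5, 2.5, -0.5, -2.5):
    print(value, round_half_to_even(value), round_half_away_from_zero(value))
```

The two modes differ only on exact ties: 2.5 becomes 2 under half-to-even and 3 under half-away-from-zero.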
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
- Fixes divide by zero error when all batch weights are 0.
- Unifies the logic between the existing keras metrics and the new metrics module.
- This change is not backward compatible (since logic is different).
PiperOrigin-RevId: 207311700
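A minimal sketch of guarding a weighted mean against an all-zero weight vector. The function name `weighted_mean` is hypothetical, and returning 0.0 when the weights sum to zero is an assumption made for illustration; the commit message does not say which value the real metrics code produces in that case.

```python
def weighted_mean(values, weights):
    """Weighted average that does not divide by zero when all weights are 0."""
    total_weight = sum(weights)
    if total_weight == 0:
        # Assumption: report 0.0 rather than raising or returning NaN.
        return 0.0
    weighted_sum = sum(v * w for v, w in zip(values, weights))
    return weighted_sum / total_weight
```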
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Any time that the server def is updated, the context is effectively "reset" by clearing all the caches.
- Check that the FLR returned is not a nullptr instead of seg faulting.
- Consolidate caches within the context object.
PiperOrigin-RevId: 207308086
|
|\ \ \ \ \
| | | | | |
| | | | | |
| | | | | | |
PiperOrigin-RevId: 207306967
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
ops only).
PiperOrigin-RevId: 207306198
|
| | | | | | |
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Do not add placeholders to the function body as XLA cannot compile them.
PiperOrigin-RevId: 207299427
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
implementation
Also increase test coverage for C64 a bit.
PiperOrigin-RevId: 207297946
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
--xla_hlo_profile is enabled.
I often find myself searching for the last profiling run in a
ReplayComputation log. This makes it much easier to find.
PiperOrigin-RevId: 207294644
|
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
PiperOrigin-RevId: 207294037
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
I don't think adding whitebox tests is necessary for the code that's checked in
today, but I'm working on a CL for which I'd prefer writing whitebox tests.
Also fix a minor issue with SymbolPredicate::ToString() where we were dropping
the must_be_true() bit.
PiperOrigin-RevId: 207289695
|
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
PiperOrigin-RevId: 207289283
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
It's unfortunate that this was only added in 9.1, but I haven't found a good
way of emulating the behavior on 9.0 without falling back to non-batched gemms.
PiperOrigin-RevId: 207286575
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
This is mostly a huge amount of plumbing just to call into the cublas functions.
blasGemmStridedBatched has been available since CUDA 8.0.
For autotuning we'd need cublasGemmStridedBatchedEx, which is new in CUDA 9.2
so I didn't wire that up yet.
PiperOrigin-RevId: 207285707
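As a reference for what a strided batched GEMM computes, here is a pure-Python sketch. It is not the cuBLAS API: the function name and parameter list are hypothetical, and it simplifies by using row-major storage with tightly packed matrices and no alpha/beta scaling, whereas cuBLAS is column-major and takes leading dimensions and scaling factors.

```python
def gemm_strided_batched(a, b, m, n, k, stride_a, stride_b, stride_c, batch_count):
    """Compute C[i] = A[i] @ B[i] for each batch i over flat row-major buffers.

    Matrix i of each operand starts at offset i * stride in its flat buffer,
    so one call covers the whole batch without an array of per-matrix pointers.
    """
    c = [0.0] * (stride_c * batch_count)
    for i in range(batch_count):
        a_off, b_off, c_off = i * stride_a, i * stride_b, i * stride_c
        for row in range(m):
            for col in range(n):
                acc = 0.0
                for p in range(k):
                    acc += a[a_off + row * k + p] * b[b_off + p * n + col]
                c[c_off + row * n + col] = acc
    return c
```

For two 2x2 batches stored back to back, each stride is 4, which is the common case where the batch is one contiguous allocation.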
|
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
PiperOrigin-RevId: 207284323
|
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
PiperOrigin-RevId: 207283527
|
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
PiperOrigin-RevId: 207282495
|
|\ \ \ \ \ \
| | | | | | |
| | | | | | |
| | | | | | | |
PiperOrigin-RevId: 207278109
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
PiperOrigin-RevId: 207268708
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
addition to loop fusions.
PiperOrigin-RevId: 207253181
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
This gives a huge speedup for users of batchdot. This is a minimal implementation without autotuning and without support for strided batch gemm.
PiperOrigin-RevId: 207247740
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
PiperOrigin-RevId: 207238096
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
PiperOrigin-RevId: 207215423
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
PiperOrigin-RevId: 207215039
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
PiperOrigin-RevId: 207213865
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
PiperOrigin-RevId: 207210333
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
* Add link to updating scope on a running VM
* Add code formatting and Python syntax highlighting
* Clarify kwargs argument formatting
* Fix method name in docstring
PiperOrigin-RevId: 207204628
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
This became unnecessary with cl/206243319 "Implement constant buffer allocation
for XLA:GPU".
PiperOrigin-RevId: 207204478
|
| | | | | | | |
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
non-public TF APIs
PiperOrigin-RevId: 207197647
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
PiperOrigin-RevId: 207195679
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
to record latency on each edge of dataset input pipeline.
PiperOrigin-RevId: 207190025
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
For logging copies, we can set the device_policy to DEVICE_PLACEMENT_WARN.
PiperOrigin-RevId: 207186848
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
signal group_size_tensor_ready_ immediately, without initialization.
PiperOrigin-RevId: 207184621
|
|\ \ \ \ \ \ \
| | | | | | | |
| | | | | | | |
| | | | | | | | |
PiperOrigin-RevId: 207183550
|