| Commit message (Collapse) | Author | Age |
|
|
|
| |
PiperOrigin-RevId: 216525613
|
|
|
|
|
|
|
|
|
|
| |
RemoveInstructionAndUnusedOperands
If the caller explicitly asks to remove a side-effecting instruction
(e.g. all-reduce), then we should respect it instead of silently ignoring
the request.
PiperOrigin-RevId: 216505133
|
|
|
|
|
| |
absl::flat_hash_set has better performance than std::unordered_set, which can improve overall compile time.
PiperOrigin-RevId: 216498767
|
|
|
|
|
|
|
| |
Previously we emitted xla::Add, which isn't supported by some XLA backends
on PRED types.
PiperOrigin-RevId: 216497939
|
|
|
|
|
|
| |
No support in any of the backends, and not yet exposed through XlaBuilder.
PiperOrigin-RevId: 216465753
|
|
|
|
|
|
| |
No functional change.
PiperOrigin-RevId: 216451881
|
|
|
|
|
|
|
| |
So that when resolving some global data, we don't have to worry whether
"Resolve" is going to mutate the real data.
PiperOrigin-RevId: 216448145
|
|
|
|
|
|
|
|
| |
We have a 1-element thunk sequence if we're not copying. That's still two
thunks and hlo profiling gets confused if it sees two thunks for the same
instruction and one of them claims to be the whole instruction.
PiperOrigin-RevId: 216448063
|
|
|
|
|
|
| |
from proto and verifying it with HloVerifier.
PiperOrigin-RevId: 216447947
|
|
|
|
|
|
| |
This avoids a copy.
PiperOrigin-RevId: 216437329
|
|
|
|
|
|
| |
Otherwise we'd emit a CAS loop.
PiperOrigin-RevId: 216421161
|
|
|
|
|
|
|
|
|
|
|
|
| |
This simply has a kernel that runs on every element of the updates tensor,
figures out the right indices to perform the update, and applies it with an
atomic operation.
Currently we emit a CAS for plain (i.e. non-add) updates, which is inefficient.
Also TuplePointsToAnalysis doesn't know that it should alias the operand and
output buffers of a scatter, which would avoid a copy.
PiperOrigin-RevId: 216412467
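The per-update-element scheme described in the commit above can be sketched on the CPU as follows. This is a minimal illustration for a 1-D scatter-add only, not the emitted GPU code: `ScatterAddSketch` is a hypothetical name, each loop iteration stands in for one GPU thread, and skipping out-of-bounds indices is an assumption of this sketch.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Each loop iteration plays the role of one GPU thread that handles a single
// element of `updates` (hypothetical 1-D scatter-add, for illustration only).
void ScatterAddSketch(std::vector<std::atomic<int>>& operand,
                      const std::vector<std::size_t>& indices,
                      const std::vector<int>& updates) {
  assert(indices.size() == updates.size());
  for (std::size_t i = 0; i < updates.size(); ++i) {
    // Step 1: figure out which operand element this update targets.
    const std::size_t target = indices[i];
    // Assumption of this sketch: out-of-bounds updates are skipped.
    if (target >= operand.size()) continue;
    // Step 2: apply the update atomically, so that several updates hitting
    // the same element cannot lose each other's contribution.
    operand[target].fetch_add(updates[i], std::memory_order_relaxed);
  }
}
```

An add maps onto a single atomic read-modify-write here; a plain (non-add) update has no such direct counterpart in this sketch.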
|
|
|
|
| |
PiperOrigin-RevId: 216375421
|
|
|
|
|
|
|
|
|
|
| |
Previously we pre-reserved the visit state based on the number of
instructions but then indexed it with the instruction unique ID,
which can be larger than the instruction count. This resulted in some
very expensive re-allocations, which can be eliminated by reserving a
correctly sized buffer.
PiperOrigin-RevId: 216369849
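The sizing issue in the commit above can be sketched as follows. This is a minimal illustration rather than the XLA code: `Instr`, `unique_id`, `VisitState`, and `MakeVisitState` are hypothetical names, and a plain `std::vector` stands in for the visit-state container.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an instruction; only the ID matters here.
struct Instr {
  int unique_id;
};

enum class VisitState { kNotVisited, kVisiting, kVisited };

// Sizes the table by the largest unique ID rather than by the number of
// instructions, so that indexing by unique_id never forces a later resize.
std::vector<VisitState> MakeVisitState(const std::vector<Instr>& instrs) {
  int max_id = -1;
  for (const Instr& instr : instrs) {
    assert(instr.unique_id >= 0);
    max_id = std::max(max_id, instr.unique_id);
  }
  return std::vector<VisitState>(static_cast<std::size_t>(max_id + 1),
                                 VisitState::kNotVisited);
}
```

With three instructions whose IDs are 3, 42, and 7, a table sized by count would hold 3 entries, while indexing by ID needs 43.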
|
|
|
|
| |
PiperOrigin-RevId: 216350134
|
|
|
|
| |
PiperOrigin-RevId: 216315110
|
|
|
|
|
|
|
|
| |
- This CL introduces an input/output alias config in the HLO module that any HLO pass can configure. Once the alias_config is set, each backend needs to follow the contract at execution time to make sure the input and output are indeed aliased.
- Copy insertion, buffer assignment, and alias analysis have been updated to correctly honor the config and avoid any possible liveness interference.
PiperOrigin-RevId: 216299501
|
|
|
|
| |
PiperOrigin-RevId: 216263039
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The calculation of a spatial coordinate in the kernel and activations is not
dependent on which part of the contracted dimension (input feature) we are in.
Rather than nesting the loops, the loops can be siblings:
- One loop over spatial dimensions
- One loop over the input feature group
This reduces the nesting depth, which makes the code a little more readable and
might be slightly faster due to invariant work in the spatial loop getting hoisted
out.
PiperOrigin-RevId: 216255839
|
|
|
|
| |
PiperOrigin-RevId: 216252980
|
|
|
|
|
|
| |
Add a variant of CustomCall which specifies arbitrary layout constraints on the operands and result. The existing non-layout-constrained CustomCall is changed to have no layout preference and can now be assigned arbitrary layouts by layout assignment.
PiperOrigin-RevId: 216249615
|
|\
| |
| |
| | |
PiperOrigin-RevId: 216228494
|
| |
| |
| |
| | |
PiperOrigin-RevId: 216189458
|
| |
| |
| |
| | |
PiperOrigin-RevId: 216041507
|
| |
| |
| |
| |
| |
| |
| | |
Heuristic NCHW/NHWC layout assignment works great; we've never had to flip this
flag. Might as well remove it and simplify things a bit.
PiperOrigin-RevId: 215989807
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
CudnnConvolutionAlgorithmPicker::PickBestAlgorithm.
Using a struct lets us return additional data -- namely, the elapsed time to
run the best algo -- without adding a fourth entry to the tuple, which would be
confusing.
No functional change.
PiperOrigin-RevId: 215987795
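The struct-versus-tuple point in the commit above can be illustrated with a small sketch. The type and field names here (`AutotuneResult`, `algorithm`, `tensor_ops_enabled`, `scratch_bytes`, `runtime_ms`) and the helper `PickFastest` are assumptions made for illustration, not the actual XLA definitions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical result type. Named fields document themselves at the call
// site, unlike std::get<3>(tuple).
struct AutotuneResult {
  std::int64_t algorithm;
  bool tensor_ops_enabled;
  std::int64_t scratch_bytes;
  double runtime_ms;  // elapsed time of this algorithm
};

// Returns the candidate with the smallest runtime. Requires at least one
// candidate.
AutotuneResult PickFastest(const std::vector<AutotuneResult>& candidates) {
  assert(!candidates.empty());
  AutotuneResult best = candidates[0];
  for (std::size_t i = 1; i < candidates.size(); ++i) {
    if (candidates[i].runtime_ms < best.runtime_ms) best = candidates[i];
  }
  return best;
}
```

A caller reads `result.runtime_ms` directly, and adding another field later does not change the meaning of any existing access.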
|
| |
| |
| |
| | |
PiperOrigin-RevId: 215957327
|
| |
| |
| |
| | |
PiperOrigin-RevId: 215946205
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
preserve operand layouts.
Add an std::function member to the HloVerifier for a backend to specify the
function object used to determine whether an instruction can change layouts.
Use the function object to find the non-layout-changing instructions and
check that such instructions produce results with the same layouts as
their operands.
Add test cases.
PiperOrigin-RevId: 215941282
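The verifier hook described in the commit above might look roughly like the sketch below. Everything here is a toy model under stated assumptions: `ToyInstr`, `ToyVerifier`, and `ToyCanChangeLayout` are hypothetical names, layouts are modeled as plain strings, and the example backend policy (only "transpose" and "copy" may change layouts) is invented for illustration.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy instruction: a layout is modeled as a plain string such as "NCHW".
struct ToyInstr {
  std::string opcode;
  std::string result_layout;
  std::vector<std::string> operand_layouts;
};

class ToyVerifier {
 public:
  using CanChangeLayoutFn = std::function<bool(const ToyInstr&)>;

  // The backend supplies the policy as a function object.
  explicit ToyVerifier(CanChangeLayoutFn can_change_layout)
      : can_change_layout_(std::move(can_change_layout)) {
    assert(can_change_layout_ != nullptr);
  }

  // Returns true if every instruction that may not change layouts produces
  // a result whose layout matches the layout of each of its operands.
  bool Verify(const std::vector<ToyInstr>& instrs) const {
    for (const ToyInstr& instr : instrs) {
      if (can_change_layout_(instr)) continue;
      for (const std::string& layout : instr.operand_layouts) {
        if (layout != instr.result_layout) return false;
      }
    }
    return true;
  }

 private:
  CanChangeLayoutFn can_change_layout_;
};

// Example backend policy (an assumption of this sketch).
bool ToyCanChangeLayout(const ToyInstr& instr) {
  return instr.opcode == "transpose" || instr.opcode == "copy";
}
```

Keeping the policy in a `std::function` lets each backend plug in its own rule without the verifier depending on backend code.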
|
| |
| |
| |
| | |
PiperOrigin-RevId: 215917470
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
library.
Change XlaBinaryOp::Broadcast to use the BroadcastTo lowering, since it produces fewer extraneous reshapes and transposes. Even if the reshapes and transposes would later optimize away, this yields more readable output and makes life easier for HLO rewrites that run early.
Change in preparation for removing reshapes from SoftmaxCrossEntropyWithLogits.
PiperOrigin-RevId: 215906847
|
| |
| |
| |
| | |
PiperOrigin-RevId: 215905026
|
| |
| |
| |
| |
| |
| |
| |
| | |
This was completely broken for CUDA versions > 9 and resulted in spurious warnings.
Reported in #22706#issuecomment-426861394 -- thank you!
PiperOrigin-RevId: 215841354
|
| |
| |
| |
| | |
PiperOrigin-RevId: 215819072
|
| |
| |
| |
| |
| |
| |
| |
| | |
* Add kernels for TensorListReserve, EmptyTensorList, TensorListElementShape, TensorListPushBack, TensorListPopBack;
* Treat list type pretty much identical to Stack in the bridge for now;
* Support variant output by treating variant like a uint8 and leaving the interpretation up to the XlaExpression (variant type does not support tensor_data());
PiperOrigin-RevId: 215809335
|
| |
| |
| |
| | |
PiperOrigin-RevId: 215801897
|
| |
| |
| |
| |
| |
| | |
avoid double-counting.
PiperOrigin-RevId: 215795640
|
| |
| |
| |
| | |
PiperOrigin-RevId: 215795518
|
| |
| |
| |
| | |
PiperOrigin-RevId: 215794086
|
| |
| |
| |
| | |
PiperOrigin-RevId: 215783391
|
| |
| |
| |
| |
| |
| | |
Switch or Merge node.".
PiperOrigin-RevId: 215772272
|
| |
| |
| |
| | |
PiperOrigin-RevId: 215764305
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Instead, return a friendlier failed Status from the following two methods, which
used to CHECK-fail: GetIncomingPreds, FindUniqueBackedge.
While at it, also rename GetIncomingPreds to GetInputPreds to be consistent with
the variable names.
PiperOrigin-RevId: 215758757
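The difference between aborting and returning a failed status, as in the commit above, can be sketched like this. `ToyStatus` is a minimal stand-in and not the TensorFlow Status class, and `FindUniqueBackedgeSketch` with its boolean-vector input is a hypothetical simplification of the real method.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Minimal stand-in for a status type (not the TensorFlow class).
struct ToyStatus {
  bool ok;
  std::string message;
};

// Finds the single back edge among a node's inputs. Malformed input yields
// a failed status that the caller can report, instead of aborting the
// process the way a CHECK would.
ToyStatus FindUniqueBackedgeSketch(const std::vector<bool>& input_is_backedge,
                                   int* backedge_index) {
  assert(backedge_index != nullptr);  // programming error, not bad input
  int found = -1;
  for (int i = 0; i < static_cast<int>(input_is_backedge.size()); ++i) {
    if (!input_is_backedge[i]) continue;
    if (found != -1) return {false, "multiple back edges"};
    found = i;
  }
  if (found == -1) return {false, "no back edge"};
  *backedge_index = found;
  return {true, ""};
}
```

The remaining `assert` guards only against a caller bug (a null output pointer); conditions that depend on the input graph are reported through the return value.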
|
| |
| |
| |
| | |
PiperOrigin-RevId: 215757701
|
| |
| |
| |
| |
| |
| |
| |
| | |
Move these checks to RET_CHECKs in the HloVerifier. Added a new visitor class
InstructionVerifier inside of hlo_verifier.cc for handling these random
non-result-shape verifications.
PiperOrigin-RevId: 215745043
|
| |
| |
| |
| | |
PiperOrigin-RevId: 215724324
|
| |
| |
| |
| |
| |
| |
| |
| | |
Also clear "_kernel" attributes of nodes if they are set to "host".
This is not meaningful when processing the graph for XLA, and it
would prevent finding the registered XLA kernel.
PiperOrigin-RevId: 215722216
|
| |
| |
| |
| |
| |
| |
| |
| | |
This CL replaces a std::unordered_map with an absl::flat_hash_map and
removes an unnecessary map lookup. These two changes can improve the
performance of the scheduler on large graphs by up to 2x.
PiperOrigin-RevId: 215707921
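The kind of redundant lookup the commit above refers to can be illustrated with the sketch below. It is not the scheduler code: `ReadyTimes` and both function names are hypothetical, and `std::unordered_map` is used only so the example depends on nothing beyond the standard library. The same `find` pattern applies to `absl::flat_hash_map`.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

using ReadyTimes = std::unordered_map<std::string, int>;

// Two lookups: count() hashes and probes for the key, then at() does it
// all over again.
int ReadyTimeTwoLookups(const ReadyTimes& m, const std::string& key) {
  if (m.count(key) == 0) return -1;
  return m.at(key);
}

// One lookup: find() returns an iterator that already points at the entry.
int ReadyTimeOneLookup(const ReadyTimes& m, const std::string& key) {
  ReadyTimes::const_iterator it = m.find(key);
  return it == m.end() ? -1 : it->second;
}
```

Both functions return the same values; the second simply does half the hashing work on a hit.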
|
| |
| |
| |
| | |
PiperOrigin-RevId: 215687800
|
| |
| |
| |
| | |
PiperOrigin-RevId: 215681153
|