| Commit message | Author | Age |
|
|
|
| |
PiperOrigin-RevId: 216525613
|
|
|
|
|
|
|
|
|
|
| |
RemoveInstructionAndUnusedOperands
If the caller explicitly asks to remove a side-effecting instruction
(e.g. all-reduce) then we should respect it instead of silently ignoring
the request.
PiperOrigin-RevId: 216505133
|
|
|
|
|
| |
absl::flat_hash_set has better performance than std::unordered_set, which can improve overall compile time.
PiperOrigin-RevId: 216498767
|
|
|
|
|
|
| |
No support in any of the backends, and not yet exposed through XlaBuilder.
PiperOrigin-RevId: 216465753
|
|
|
|
|
|
| |
No functional change.
PiperOrigin-RevId: 216451881
|
|
|
|
|
|
|
| |
So that when resolving some global data, we don't have to worry whether
"Resolve" is going to mutate the real data.
PiperOrigin-RevId: 216448145
|
|
|
|
|
|
|
|
| |
We have a 1-element thunk sequence if we're not copying. That's still two
thunks and hlo profiling gets confused if it sees two thunks for the same
instruction and one of them claims to be the whole instruction.
PiperOrigin-RevId: 216448063
|
|
|
|
|
|
| |
from proto and verifying it with HloVerifier.
PiperOrigin-RevId: 216447947
|
|
|
|
|
|
| |
This avoids a copy.
PiperOrigin-RevId: 216437329
|
|
|
|
|
|
| |
Otherwise we'd emit a CAS loop.
PiperOrigin-RevId: 216421161
|
|
|
|
|
|
|
|
|
|
|
|
| |
This simply has a kernel that runs on every element of the updates tensor,
figures out the right indices to perform the update, and applies it with an
atomic operation.
Currently we emit a CAS for plain (i.e. non-add) updates, which is inefficient.
Also TuplePointsToAnalysis doesn't know that it should alias the operand and
output buffers of a scatter, which would avoid a copy.
PiperOrigin-RevId: 216412467
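The scatter scheme described above can be sketched on the host with std::atomic. This is only an illustration, not the emitted GPU code: ScatterAdd, ScatterWith and AtomicApplyCas are hypothetical names, the tensors are flattened to one dimension, and the element type is int so that fetch_add is available. The add case uses a native atomic add, while any other combiner falls back to a compare-and-swap loop.

```cpp
#include <atomic>
#include <cstddef>
#include <vector>

// Hypothetical helper: atomically stores combine(old, update) into `cell`
// with a compare-and-swap loop. This is the general (non-add) path.
template <typename Combine>
void AtomicApplyCas(std::atomic<int>& cell, int update, Combine combine) {
  int expected = cell.load();
  // On failure compare_exchange_weak refreshes `expected`, so the combined
  // value is recomputed from the latest contents on every retry.
  while (!cell.compare_exchange_weak(expected, combine(expected, update))) {
  }
}

// Scatter-add: one iteration per element of `updates` (one GPU thread each
// in the real kernel). A native atomic add needs no CAS loop.
void ScatterAdd(std::vector<std::atomic<int>>& operand,
                const std::vector<std::size_t>& indices,
                const std::vector<int>& updates) {
  for (std::size_t i = 0; i < updates.size(); ++i) {
    operand[indices[i]].fetch_add(updates[i]);
  }
}

// Scatter with an arbitrary combiner: every update goes through the CAS loop.
template <typename Combine>
void ScatterWith(std::vector<std::atomic<int>>& operand,
                 const std::vector<std::size_t>& indices,
                 const std::vector<int>& updates, Combine combine) {
  for (std::size_t i = 0; i < updates.size(); ++i) {
    AtomicApplyCas(operand[indices[i]], updates[i], combine);
  }
}
```

A plain overwrite could use a single atomic store instead of the CAS loop, which is the inefficiency the message points out.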
|
|
|
|
| |
PiperOrigin-RevId: 216375421
|
|
|
|
|
|
|
|
|
|
| |
Previously we pre-reserved the visit state based on the number of
instructions but then indexed it with the instruction unique ID,
which can be larger than the instruction count. This resulted in some
very expensive re-allocations, which can be eliminated by reserving a
correctly sized buffer.
PiperOrigin-RevId: 216369849
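A minimal sketch of the sizing fix, assuming a visit table indexed directly by unique ID. Instruction, VisitState, VisitTableSize and MakeVisitTable are hypothetical stand-ins, not the real XLA types. The point is that the table must be sized by the largest ID plus one rather than by the number of instructions.

```cpp
#include <cstddef>
#include <vector>

enum class VisitState { kNotVisited, kVisiting, kVisited };

// Hypothetical stand-in for an HLO instruction: only the unique ID matters.
struct Instruction {
  int unique_id;
};

// Returns the table size that makes every unique_id a valid index.
// IDs may be sparse, so this can exceed instructions.size().
std::size_t VisitTableSize(const std::vector<Instruction>& instructions) {
  int max_id = -1;
  for (const Instruction& instruction : instructions) {
    if (instruction.unique_id > max_id) max_id = instruction.unique_id;
  }
  return static_cast<std::size_t>(max_id + 1);
}

// Allocates the whole table once, so marking an instruction never has to
// grow the buffer.
std::vector<VisitState> MakeVisitTable(
    const std::vector<Instruction>& instructions) {
  return std::vector<VisitState>(VisitTableSize(instructions),
                                 VisitState::kNotVisited);
}
```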
|
|
|
|
| |
PiperOrigin-RevId: 216350134
|
|
|
|
| |
PiperOrigin-RevId: 216315110
|
|
|
|
|
|
|
|
| |
- This CL introduces an input/output alias config in the HLO module that any HLO pass can configure. Once the alias_config is set, each backend needs to follow the contract at execution time to make sure the input and output are indeed aliased.
- Copy insertion, buffer assignment and alias analysis have been updated to correctly honor the config and avoid any possible liveness interference.
PiperOrigin-RevId: 216299501
|
|
|
|
| |
PiperOrigin-RevId: 216263039
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The calculation of a spatial coordinate in the kernel and activations is not
dependent on which part of the contracted dimension (input feature) we are in.
Rather than nesting the loops, the loops can be siblings:
- One loop over spatial dimensions
- One loop over the input feature group
This reduces the nesting depth, which makes the code a little more readable and
might be slightly faster due to work that is invariant in the spatial loop
getting hoisted out.
PiperOrigin-RevId: 216255839
|
|
|
|
| |
PiperOrigin-RevId: 216252980
|
|
|
|
|
|
| |
Add a variant of CustomCall which specifies arbitrary layout constraints on the operands and result. The existing non-layout-constrained CustomCall is changed to have no layout preference and can now be assigned arbitrary layouts by layout assignment.
PiperOrigin-RevId: 216249615
|
|\
| |
| |
| | |
PiperOrigin-RevId: 216228494
|
| |
| |
| |
| | |
PiperOrigin-RevId: 216189458
|
| |
| |
| |
| | |
PiperOrigin-RevId: 216041507
|
| |
| |
| |
| |
| |
| |
| | |
Heuristic NCHW/NHWC layout assignment works great; we've never had to flip this
flag. Might as well remove it and simplify things a bit.
PiperOrigin-RevId: 215989807
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
CudnnConvolutionAlgorithmPicker::PickBestAlgorithm.
Using a struct lets us return additional data -- namely, the elapsed time to
run the best algo -- without adding a fourth entry to the tuple, which would be
confusing.
No functional change.
PiperOrigin-RevId: 215987795
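The motivation for a struct over a tuple can be sketched as follows. AlgorithmResult, its field names and PickFastest are hypothetical and are not the real return type of PickBestAlgorithm. The point is only that named fields let an extra value such as the elapsed time be added without callers having to remember tuple positions.

```cpp
#include <chrono>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical result type: each field is named, so adding `runtime` does
// not change how existing callers read the other fields.
struct AlgorithmResult {
  std::int64_t algorithm_id;
  bool tensor_ops_enabled;
  std::int64_t scratch_bytes;
  std::chrono::microseconds runtime;  // elapsed time of this algorithm
};

// Returns the candidate with the smallest runtime, or nullopt if there are
// no candidates.
std::optional<AlgorithmResult> PickFastest(
    const std::vector<AlgorithmResult>& candidates) {
  std::optional<AlgorithmResult> best;
  for (const AlgorithmResult& candidate : candidates) {
    if (!best.has_value() || candidate.runtime < best->runtime) {
      best = candidate;
    }
  }
  return best;
}
```

With a tuple, the same caller would write std::get<3>(result) and have to know which position holds the time.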
|
| |
| |
| |
| | |
PiperOrigin-RevId: 215957327
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
preserve operand layouts.
Add an std::function member to the HloVerifier for a backend to specify the
function object used to determine whether an instruction can change layouts.
Use the function object to find the non-layout-changing instructions and
check that such instructions produce results with the same layouts as
their operands.
Add test cases.
PiperOrigin-RevId: 215941282
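The std::function hook described above can be sketched like this. Instr, LayoutVerifier and CanChangeLayoutFn are hypothetical simplifications: layouts are plain strings and each instruction has a single operand, which is not how HloVerifier models them. The backend supplies the predicate, and the verifier rejects any instruction that changes layout without being allowed to.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified instruction: one operand, layouts as text.
struct Instr {
  std::string opcode;
  std::string operand_layout;
  std::string result_layout;
};

class LayoutVerifier {
 public:
  // Backend-provided predicate: may this instruction change layouts?
  using CanChangeLayoutFn = std::function<bool(const Instr&)>;

  explicit LayoutVerifier(CanChangeLayoutFn can_change_layout)
      : can_change_layout_(std::move(can_change_layout)) {}

  // Returns false if any non-layout-changing instruction produces a result
  // whose layout differs from its operand's layout.
  bool Verify(const std::vector<Instr>& instructions) const {
    for (const Instr& instr : instructions) {
      if (!can_change_layout_(instr) &&
          instr.operand_layout != instr.result_layout) {
        return false;
      }
    }
    return true;
  }

 private:
  CanChangeLayoutFn can_change_layout_;
};
```

A backend in which only copies may change layout would construct the verifier with a lambda that returns `instr.opcode == "copy"`.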
|
| |
| |
| |
| | |
PiperOrigin-RevId: 215917470
|
| |
| |
| |
| | |
PiperOrigin-RevId: 215905026
|
| |
| |
| |
| |
| |
| |
| |
| | |
This was completely broken for CUDA versions > 9 and resulted in spurious warnings.
Reported in #22706#issuecomment-426861394 -- thank you!
PiperOrigin-RevId: 215841354
|
| |
| |
| |
| | |
PiperOrigin-RevId: 215801897
|
| |
| |
| |
| |
| |
| | |
avoid double-counting.
PiperOrigin-RevId: 215795640
|
| |
| |
| |
| | |
PiperOrigin-RevId: 215794086
|
| |
| |
| |
| | |
PiperOrigin-RevId: 215783391
|
| |
| |
| |
| | |
PiperOrigin-RevId: 215764305
|
| |
| |
| |
| | |
PiperOrigin-RevId: 215757701
|
| |
| |
| |
| |
| |
| |
| |
| | |
Move these checks to RET_CHECKs in the HloVerifier. Added a new visitor class
InstructionVerifier inside of hlo_verifier.cc for handling these random
non-result-shape verifications.
PiperOrigin-RevId: 215745043
|
| |
| |
| |
| |
| |
| |
| |
| | |
This CL replaces a std::unordered_map with an absl::flat_hash_map and
removes an unnecessary map lookup. These two changes can improve the
performance of the scheduler on large graphs by up to 2x.
PiperOrigin-RevId: 215707921
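The message does not say which lookup was removed, so the following is only a generic illustration of the pattern, with hypothetical function names. std::unordered_map is used so the sketch compiles without Abseil; absl::flat_hash_map offers the same find/count/at interface.

```cpp
#include <string>
#include <unordered_map>

// Two hash lookups per call: one in count(), a second one in at().
int PriorityTwoLookups(const std::unordered_map<std::string, int>& priorities,
                       const std::string& name) {
  if (priorities.count(name) > 0) {
    return priorities.at(name);
  }
  return 0;
}

// One hash lookup per call: find() returns an iterator that is reused.
int PriorityOneLookup(const std::unordered_map<std::string, int>& priorities,
                      const std::string& name) {
  auto it = priorities.find(name);
  return it == priorities.end() ? 0 : it->second;
}
```

Both functions return the same values; only the number of hash computations and probes differs.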
|
| |
| |
| |
| | |
PiperOrigin-RevId: 215687800
|
| |
| |
| |
| | |
PiperOrigin-RevId: 215681153
|
| |
| |
| |
| | |
PiperOrigin-RevId: 215676675
|
| |
| |
| |
| |
| |
| | |
constants during compilation.
PiperOrigin-RevId: 215663002
|
| |
| |
| |
| |
| |
| |
| |
| | |
Subtract the size of the aliased buffers from the subcomputation estimate instead of from the current computation. This way, the memory estimate for the current computation is more accurate.
For the newly added test, the heap simulation calculates 48 bytes at head instead of the correct 64 bytes.
PiperOrigin-RevId: 215653047
|
| |
| |
| |
| |
| |
| |
| | |
Use #ifdef XLA_TEST_BACKEND_CPU to protect the test instead of disabling it for
all the other backends except for the CPU backend.
PiperOrigin-RevId: 215651036
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Rename the test to make it obvious that it is for testing the codegen
correctness in handling layout changing elementwise operations.
Keep the test only for the CPU backend.
PiperOrigin-RevId: 215630611
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
If the layout of a single tensor in a tuple is different from its use, then
CreateCopyWithNewLayout will do a deep copy of the entire tuple. Not only does
this operation create unnecessary copies of elements where the layout is the
same, it will throw an error if the tuple contains elements like token[] that
cannot be copied. As a result, layout assignment on TPU occasionally causes
mysterious compilation failures for code that runs correctly on CPU and GPU.
PiperOrigin-RevId: 215615731
|
| |
| |
| |
| |
| |
| | |
Otherwise, when parsing a single instruction, the parsed module doesn't have a name, so it fails the HLO verifier check.
PiperOrigin-RevId: 215519412
|
| |
| |
| |
| | |
PiperOrigin-RevId: 215517752
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
layout so that it can be used by the HLO verifier.
Change the function to a static member function of the LayoutAssignment class.
Add an std::function member to LayoutAssignment to store the function object
passed down from the backend compiler class and use it to decide whether an
instruction can change layouts.
Fix affected test cases.
PiperOrigin-RevId: 215515611
|
| |
| |
| |
| |
| |
| | |
one function.
PiperOrigin-RevId: 215501702
|