* Rename the test to make it obvious that it tests codegen correctness in
  handling layout-changing elementwise operations. Keep the test only for
  the CPU backend.
  PiperOrigin-RevId: 215630611
* PiperOrigin-RevId: 215624875
* If the layout of a single tensor in a tuple differs from its use,
  CreateCopyWithNewLayout will deep-copy the entire tuple. Not only does
  this create unnecessary copies of elements whose layout is already the
  same, it throws an error if the tuple contains elements such as token[]
  that cannot be copied. As a result, layout assignment on TPU occasionally
  caused mysterious compilation failures for code that runs correctly on
  CPU and GPU.
  PiperOrigin-RevId: 215615731
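  A minimal sketch of the per-element alternative, using made-up stand-in
  types rather than the real XLA classes: copy only array elements whose
  layout actually differs, and pass through uncopyable elements such as
  tokens.

      #include <stdexcept>
      #include <vector>

      enum class Kind { kArray, kToken };

      struct Element {
        Kind kind;
        int layout;  // stand-in for a real Layout
      };

      // Copy an array element into the requested layout; tokens carry no
      // data that could be copied.
      Element CopyWithLayout(const Element& e, int layout) {
        if (e.kind == Kind::kToken)
          throw std::runtime_error("tokens cannot be copied");
        return Element{Kind::kArray, layout};
      }

      // Instead of deep-copying the whole tuple, rebuild it element by
      // element, copying only where the layout changes.
      std::vector<Element> CopyTupleWithNewLayout(
          const std::vector<Element>& tuple, const std::vector<int>& wanted) {
        std::vector<Element> result;
        for (size_t i = 0; i < tuple.size(); ++i) {
          if (tuple[i].kind == Kind::kToken || tuple[i].layout == wanted[i]) {
            result.push_back(tuple[i]);  // unchanged: no copy needed
          } else {
            result.push_back(CopyWithLayout(tuple[i], wanted[i]));
          }
        }
        return result;
      }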
* PiperOrigin-RevId: 215608349
* PiperOrigin-RevId: 215580891
* Otherwise, when parsing a single instruction, the parsed module has no
  name and fails the HLO verifier check.
  PiperOrigin-RevId: 215519412
* PiperOrigin-RevId: 215517752
* layout so that it can be used by the HLO verifier.
  Change the function to a static member function of the LayoutAssignment
  class. Add an std::function member to LayoutAssignment to store the
  function object passed down from the backend compiler class, and use it
  to decide whether an instruction can change layouts.
  Fix affected test cases.
  PiperOrigin-RevId: 215515611
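  A minimal sketch of the resulting shape of the class, with invented names
  (the real pass lives in XLA's layout assignment code): a static default
  predicate that the HLO verifier can call without an instance, plus a
  stored std::function that a backend compiler can override.

      #include <functional>
      #include <utility>

      struct HloInstruction;  // stand-in; only used through a pointer here

      class LayoutAssignment {
       public:
        using CanChangeLayoutFn = std::function<bool(const HloInstruction*)>;

        // Default rule; also usable by the verifier as a plain function.
        static bool InstructionCanChangeLayout(const HloInstruction*) {
          return true;  // placeholder for the real per-opcode logic
        }

        // The backend passes its own predicate down; fall back to the
        // static default when none is provided.
        explicit LayoutAssignment(CanChangeLayoutFn fn = nullptr)
            : can_change_layout_(
                  fn ? std::move(fn)
                     : CanChangeLayoutFn(
                           &LayoutAssignment::InstructionCanChangeLayout)) {}

        bool CanChangeLayout(const HloInstruction* inst) const {
          return can_change_layout_(inst);
        }

       private:
        CanChangeLayoutFn can_change_layout_;
      };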
* one function.
  PiperOrigin-RevId: 215501702
* Also stop truncating operands in the canonical format.
  PiperOrigin-RevId: 215466465
* consistent HLO instructions.
  Fix a dot test that disables the layout assignment pass so that it does
  not generate layout-inconsistent HLO instructions. This means only adding
  the dot result to an addend with the same layout, and disabling algebraic
  simplification, which may transform a dot into a multiplication with
  inconsistent layouts.
  PiperOrigin-RevId: 215463477
* PiperOrigin-RevId: 215460064
* If one misreads the semantics of this instruction, it is easy to cause an
  out-of-bounds access into the dimensions here. Add an extra check so that
  this case returns a proper error to the user rather than crashing.
  Ref #22130
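  A minimal sketch of the kind of guard this implies, with stand-in types
  (the real code returns an XLA status rather than this toy struct):
  validate every dimension index against the rank before using it.

      #include <cstdint>
      #include <string>
      #include <vector>

      struct Status {  // stand-in for the real error-status type
        bool ok;
        std::string message;
      };

      Status CheckDimensionIndices(const std::vector<int64_t>& dims,
                                   int64_t rank) {
        for (int64_t d : dims) {
          // Report a descriptive error instead of indexing out of bounds.
          if (d < 0 || d >= rank) {
            return {false, "dimension index " + std::to_string(d) +
                               " is out of range for rank " +
                               std::to_string(rank)};
          }
        }
        return {true, ""};
      }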
* Many of the HLO parser tests verify that the text form of an HLO module
  preserves all information when run through ToString and then parsed. It
  makes sense to also use these tests to exercise proto
  serialization/deserialization, which is done by adding additional
  instantiations of the parameterized parsing tests. This caught several
  bugs which are fixed in this CL:
  (1) Domain instructions were not being serialized properly.
  (2) Host send/recv instructions did not preserve the is_host_transfer bit.
  (3) Sparse literals could not be serialized or deserialized.
  PiperOrigin-RevId: 215445200
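  A minimal sketch of reusing one parameterized googletest suite for both
  round-trip paths; the two helpers are trivial stand-ins here (the real
  ones would go through ToString/the parser and ToProto/CreateFromProto).

      #include <string>
      #include "gtest/gtest.h"

      // Stand-ins: each should parse, round-trip, and reprint the module.
      std::string RoundTripViaText(const std::string& hlo) { return hlo; }
      std::string RoundTripViaProto(const std::string& hlo) { return hlo; }

      class RoundTripTest : public ::testing::TestWithParam<
                                std::string (*)(const std::string&)> {};

      TEST_P(RoundTripTest, PreservesModule) {
        const std::string hlo = "ENTRY e { ROOT c = f32[] constant(42) }";
        // Whichever round trip we take, the text must come back unchanged.
        EXPECT_EQ(GetParam()(hlo), hlo);
      }

      // One test body, two instantiations: text and proto round trips.
      INSTANTIATE_TEST_SUITE_P(Text, RoundTripTest,
                               ::testing::Values(&RoundTripViaText));
      INSTANTIATE_TEST_SUITE_P(Proto, RoundTripTest,
                               ::testing::Values(&RoundTripViaProto));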
* A hash map for 18 pointers is just a waste of space.
  PiperOrigin-RevId: 215428176
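  For a map that small, a flat vector of pairs with a linear scan is
  usually both smaller and faster; a sketch of the pattern (not the actual
  container from this change):

      #include <utility>
      #include <vector>

      // For ~18 entries a vector beats a hash map: no buckets, no
      // per-node allocations, and the scan stays in cache.
      template <typename K, typename V>
      class SmallMap {
       public:
        void Insert(K key, V value) {
          entries_.emplace_back(std::move(key), std::move(value));
        }

        const V* Find(const K& key) const {
          for (const auto& e : entries_) {
            if (e.first == key) return &e.second;
          }
          return nullptr;
        }

       private:
        std::vector<std::pair<K, V>> entries_;
      };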
* Also convert unordered_map to flat/node_hash_map where the comments allow.
  PiperOrigin-RevId: 215410566
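  A minimal example of the drop-in conversion, assuming Abseil is
  available: flat_hash_map stores entries inline and invalidates pointers
  on rehash, while node_hash_map keeps the pointer stability that
  unordered_map callers may rely on.

      #include <string>

      #include "absl/container/flat_hash_map.h"
      #include "absl/container/node_hash_map.h"

      int main() {
        // flat_hash_map: use when nothing holds pointers into the map.
        absl::flat_hash_map<std::string, int> counts;
        counts["a"] = 1;

        // node_hash_map: use when code keeps pointers/references to
        // values across inserts.
        absl::node_hash_map<std::string, int> stable;
        int* p = &stable["b"];
        stable["c"] = 3;  // p remains valid even if the table rehashes
        *p = 2;
        return 0;
      }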
* stateless_random_normal.
  Fixes #22611
  PiperOrigin-RevId: 215385610
* can use to accelerate transfers.
  PiperOrigin-RevId: 215362667
* Previously we could have ended up with the different HLOs being assigned
  different layouts, which made lowering impossible. This change enforces a
  consistent layout between the communicating nodes, the same way it is
  done for send/recv pairs.
  PiperOrigin-RevId: 215359420
* PiperOrigin-RevId: 215331087
|
| | |
| | |
| | |
| | | |
* PiperOrigin-RevId: 215324035
|
| | |
| | |
| | |
| | |
| | |
| | | |
* EffectiveOperandPrecisionIsOutputPrecision list.
  PiperOrigin-RevId: 215311766
|
| | |
| | |
| | |
| | |
| | |
| | | |
* The previous version was hitting a very slow path in `GetNodeAttr()`,
  which is expensive when the named attr is not found. This change inlines
  the logic of finding the two relevant attrs inside `GetFunctionNameAttr()`
  and avoids constructing a status object with a serialized `NodeDef` when
  the attr can't be found.
  PiperOrigin-RevId: 215298411
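  A sketch of the fast-path idea with stand-in types (the attr names below
  are placeholders, not the real ones): probe the attr map directly so a
  miss costs one lookup instead of an error status carrying a serialized
  NodeDef.

      #include <map>
      #include <string>

      struct AttrValue { std::string s; };  // stand-in
      using AttrMap = std::map<std::string, AttrValue>;

      // A miss returns nullptr; no Status object, no NodeDef dump.
      const AttrValue* TryFindAttr(const AttrMap& attrs,
                                   const std::string& name) {
        auto it = attrs.find(name);
        return it == attrs.end() ? nullptr : &it->second;
      }

      std::string GetFunctionNameAttr(const AttrMap& attrs) {
        // Check the two relevant attrs in order; empty means "absent".
        for (const char* name : {"f", "function"}) {  // names assumed
          if (const AttrValue* v = TryFindAttr(attrs, name)) return v->s;
        }
        return "";
      }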
|
| | |
| | |
| | |
| | | |
* PiperOrigin-RevId: 215294817
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
* This cleanup will make the future CL implementing lazy compilation
  simpler. Includes some supporting changes:
  - Teach NewInternalScope to create a scope that doesn't do shape
    inference. We need this because we don't have a ShapeRefiner that has
    been run over the entire graph available in the build_xla_ops pass.
  - Add a WithAssignedDevice modifier to tensorflow::Scope.
  - Make cc_op_gen write out an Operation field for nodes which may not
    necessarily have any outputs. We already did this in most cases, but we
    weren't doing it for nodes that have possibly-empty list outputs.
  - Minor change renaming ops/xla_jit_op.cc to ops/xla_jit_ops.cc, now that
    we have more than one XLA JIT op.
  PiperOrigin-RevId: 215293817
|
| | |
| | |
| | |
| | | |
* PiperOrigin-RevId: 215272497
|
| | |
| | |
| | |
| | | |
* PiperOrigin-RevId: 215252408
|
| | |
| | |
| | |
| | |
| | |
| | | |
* requested device placement of the XlaLaunch op must be derived from the
  subgraph.
  PiperOrigin-RevId: 215239672
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
* Where "X" is the parameter number. Previously, fusion parameter names
  included the name of the original instruction which produced the value,
  which was confusing.
  PiperOrigin-RevId: 215238171
|
| | |
| | |
| | |
| | | |
* PiperOrigin-RevId: 215183847
|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
* - Make parameters always defined before other instructions.
  - Add extra indentation to the predecessor field in the ToString() method
    to make it clearer.
  PiperOrigin-RevId: 215162840
|
| | |
| | |
| | |
| | | |
* PiperOrigin-RevId: 215003704
|
| | |
| | |
| | |
| | |
| | |
| | | |
* function.
  PiperOrigin-RevId: 214945748
|
| | |
| | |
| | |
| | | |
* PiperOrigin-RevId: 214848216
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
* Even with this bug we were accidentally doing the right thing (so the
  test case doesn't actually fail without the fix): deleting an Edge sets
  its input and output indices to kControlSlot-1, so we'd normally expect
  to fail when there is a control edge out of the TF cluster (because a
  control edge would be recognized as a data edge). But AddEdge(x, -1, y, -1)
  seems to do the right thing for both control and data edges.
  PiperOrigin-RevId: 214831204
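  A sketch of moving edges without relying on that coincidence, using the
  tensorflow::Graph edge API as commonly written (treat the exact calls as
  assumptions): handle control and data edges explicitly, and snapshot the
  edge list before mutating it.

      #include <vector>

      #include "tensorflow/core/graph/graph.h"

      void MoveOutEdges(tensorflow::Graph* g, tensorflow::Node* from,
                        tensorflow::Node* to) {
        // Copy first: RemoveEdge invalidates iteration over out_edges().
        std::vector<const tensorflow::Edge*> edges(from->out_edges().begin(),
                                                   from->out_edges().end());
        for (const tensorflow::Edge* e : edges) {
          if (e->IsControlEdge()) {
            g->AddControlEdge(to, e->dst());
          } else {
            g->AddEdge(to, e->src_output(), e->dst(), e->dst_input());
          }
          g->RemoveEdge(e);
        }
      }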
|
| | |
| | |
| | |
| | | |
* PiperOrigin-RevId: 214824023
|
| | |
| | |
| | |
| | |
| | |
| | | |
* The intent of this field is to enable more information to be encoded in
  the custom call and passed through to the backend.
  PiperOrigin-RevId: 214800539
|
| | |
| | |
| | |
| | |
| | |
| | | |
* specified separately from the compute stream in ServiceRunOptions.
  PiperOrigin-RevId: 214778267
|
| | |
| | |
| | |
| | | |
* PiperOrigin-RevId: 214711381
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
* functionalization.
  If we want to evaluate the SymbolicGradient op in constant folding, we
  need to construct a Device object and attach it to FunctionLibraryRuntime.
  In the graph rewriting pass, we do not have a Device object created yet;
  it will only be created in XlaCompiler.
  PiperOrigin-RevId: 214702943
|
| | |
| | |
| | |
| | | |
* PiperOrigin-RevId: 214700693
|
| | |
| | |
| | |
| | | |
* PiperOrigin-RevId: 214675055
|
| | |
| | |
| | |
| | |
| | |
| | | |
* DeconstructTuple doesn't support nested tuples yet, so
  MakeFakeArgumentsOrDie failed if any of the arguments were tuple-shaped.
  But we don't really need it here anyway; just build the arguments one by
  one.
  PiperOrigin-RevId: 214671374
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
* The purpose of these ops is to fix a latency problem observed in an
  inference benchmark. Often an inference step starts by reading the values
  of many (hundreds of) weights. For a resource variable, this requires a
  VarHandleOp and a ReadVariableOp per variable. Running hundreds of
  trivial ops can add hundreds of microseconds of latency to the critical
  path of an inference step. The inter-op latency of the executor can be
  hundreds of nanoseconds, which rapidly adds up.
  This change introduces two fused ops, _VarHandlesOp and _ReadVariablesOp,
  that allow us to read many variables in a pair of larger ops rather than
  many tiny ops.
  PiperOrigin-RevId: 214662338
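  For a rough sense of scale (illustrative figures, not measurements from
  the change): reading 300 variables through separate handle/read ops means
  roughly 600 tiny ops; at ~300 ns of executor overhead each, that is
  ~180 microseconds on the critical path, versus a single pair of fused ops.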
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
* It used to be a reasonable proxy, but that's no longer the case. This is
  because GetUniqueId() in XlaBuilder uses a *global* (rather than a
  module-global) counter. Since HloModule::CreateFromProto no longer
  uniquifies ids coming in from protos, the potentially very high IDs
  coming from GetUniqueId() become the module's next_unique_id.
  There is another case of this in TuplePointsTo that will be handled
  separately.
  PiperOrigin-RevId: 214614576
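  A sketch of a safer way to derive the counter, with a stand-in
  instruction type (the real code would walk the module's computations):
  take the maximum id actually present rather than the instruction count.

      #include <algorithm>
      #include <cstdint>
      #include <vector>

      struct Instruction { int64_t unique_id; };  // stand-in

      // Instruction count is no longer a valid proxy for the next id, so
      // compute it from the ids themselves.
      int64_t NextUniqueId(const std::vector<Instruction>& instructions) {
        int64_t next = 0;
        for (const Instruction& inst : instructions) {
          next = std::max(next, inst.unique_id + 1);
        }
        return next;
      }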
|
| | |
| | |
| | |
| | |
| | |
| | | |
* data_format attr.
  PiperOrigin-RevId: 214608039
|
| | |
| | |
| | |
| | |
| | |
| | | |
* are assumed to be the same for HandleImag and HandleReal, when in fact
  they should be float and complex64 (or float in HandleReal's case),
  respectively.
  PiperOrigin-RevId: 214548051
|
| | |
| | |
| | |
| | | |
* PiperOrigin-RevId: 214532043
|
| | |
| | |
| | |
| | |
| | |
| | | |
* match convolution forward + relu.
  PiperOrigin-RevId: 214521083
|
| | |
| | |
| | |
| | | |
* PiperOrigin-RevId: 214515610