| Commit message | Author | Age |
| |
RemoveInstructionAndUnusedOperands
If the caller explicitly asks to remove a side-effecting instruction
(e.g. all-reduce), then we should respect the request instead of silently
ignoring it.
PiperOrigin-RevId: 216505133
| |
PiperOrigin-RevId: 215460064
| |
PiperOrigin-RevId: 215324035
| |
PiperOrigin-RevId: 215272497
| |
Where "X" is the parameter number. Previously, fusion parameter names included
the name of the original instruction that produced the value, which was
confusing.
PiperOrigin-RevId: 215238171
| |
* Use a FlatMap for instruction_iterators_, and actually remove elements from it (which is cheap for a FlatMap).
* Use the size of the map (which is O(1)) rather than the size of the list (which is O(n)) for instruction_count().
PiperOrigin-RevId: 214459259
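The two changes above (an iterator index that really drops removed entries, and a count taken from that index) can be sketched in standard C++. This is a minimal sketch, not the HloComputation code: `Instruction` and `InstructionList` are hypothetical names, and `std::unordered_map` stands in for FlatMap because both give average O(1) lookup, erase, and size.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for an HLO instruction.
struct Instruction {
  std::string name;
};

// Owns instructions in insertion order and indexes each one's list position
// in a hash map (std::unordered_map standing in for FlatMap).
class InstructionList {
 public:
  using List = std::list<std::unique_ptr<Instruction>>;

  Instruction* Add(std::string name) {
    auto owned = std::make_unique<Instruction>();
    owned->name = std::move(name);
    Instruction* raw = owned.get();
    instructions_.push_back(std::move(owned));
    iterators_[raw] = std::prev(instructions_.end());
    return raw;
  }

  // Finds the list position via the map instead of scanning the list, then
  // erases both the list node and the map entry.
  void Remove(Instruction* instruction) {
    auto found = iterators_.find(instruction);
    assert(found != iterators_.end() && "instruction not in this list");
    instructions_.erase(found->second);
    iterators_.erase(found);
  }

  // Because Remove() keeps the map in sync with the list, the map's size is
  // the instruction count, and unordered_map::size() is constant time.
  std::size_t instruction_count() const { return iterators_.size(); }

 private:
  List instructions_;
  std::unordered_map<Instruction*, List::iterator> iterators_;
};
```

The key design point is that erasing from the map is what makes its size trustworthy; if removed entries were left behind, the map could no longer serve as the counter.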
| |
PiperOrigin-RevId: 213316504
| |
PiperOrigin-RevId: 213191899
| |
Re-assigning unique IDs broke serialization of HloSchedule, and keeping IDs stable improves the fidelity of the proto serialization. This change requires that instructions in HLO module protos have valid, module-scope-unique ids, so the XLA builder is changed to hand out module-scope-unique ids. Previously, instruction ids were only unique in the computation scope.
PiperOrigin-RevId: 212692339
| |
Add HloSchedule as a field on HloModule. This will enable scheduling to be a normal HLO pass and enable some passes such as copy insertion to more easily use tighter instruction live ranges based on the schedule. This change required adding HloSchedule to the "hlo" library because of circular dependencies.
Nothing except tests actually sets the schedule at the moment, but follow-up CLs will add a scheduling pass that does so.
PiperOrigin-RevId: 211815293
| |
PiperOrigin-RevId: 210998142
| |
Previously it used a std::map containing std::vectors, which added a
large overhead to HloComputation::MakeInstructionPostOrder when a model
contained a large number of channels.
The new implementation replaces it with a FlatMap and an InlinedVector,
which eliminates a large number of allocations and significantly improves
performance.
PiperOrigin-RevId: 210531816
| |
Unlike Printf, StrFormat does not require type-length qualifiers (e.g.
%z, %ll), nor does it require calling c_str() to print strings, so these
uses are fixed up here as well.
PiperOrigin-RevId: 210435915
| |
Also move 'using' statements into namespaces.
PiperOrigin-RevId: 210055083
| |
PiperOrigin-RevId: 210049592
| |
PiperOrigin-RevId: 210018843
| |
Unfortunately this has to be one big patch, because e.g. absl::StrCat
doesn't accept a TF StringPiece, but as soon as we switch to
absl::string_view, we have to switch away from all of the TF functions.
PiperOrigin-RevId: 209957896
| |
PiperOrigin-RevId: 209819199
| |
Send&recv instructions and cross-replica-sum instructions impose
extra dependencies via the channel id or all-reduce id. This CL teaches
the reachability calculation logic in HloComputation to correctly
account for these "invisible" dependencies.
The main purpose is to stop multi-output fusion from generating
dependency cycles via communicating instructions.
PiperOrigin-RevId: 209593997
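One way to picture the "invisible" dependencies is to give every instruction that shares a channel id the operands of its peers as extra inputs before computing reachability. The sketch below uses that simplified model; it is an assumption for illustration, not the rule HloComputation actually applies, and `Node`, `ComputeReachability`, and the dense `std::vector<bool>` matrix are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <vector>

// Hypothetical node: data operands plus an optional channel id that pairs
// communicating instructions (e.g. a send with its recv).
struct Node {
  std::vector<int> operands;
  std::optional<int> channel_id;
};

// reachable[i][j] is true if node j is node i or an effective transitive
// operand of node i.
using ReachabilityMatrix = std::vector<std::vector<bool>>;

namespace {

// Memoized DFS over the effective inputs; assumes they form a DAG.
void Visit(int node, const std::vector<std::vector<int>>& inputs,
           std::vector<bool>& visited, ReachabilityMatrix& reachable) {
  if (visited[node]) return;
  visited[node] = true;
  reachable[node][node] = true;
  for (int input : inputs[node]) {
    Visit(input, inputs, visited, reachable);
    for (std::size_t j = 0; j < reachable.size(); ++j) {
      if (reachable[input][j]) reachable[node][j] = true;
    }
  }
}

}  // namespace

ReachabilityMatrix ComputeReachability(const std::vector<Node>& nodes) {
  const std::size_t n = nodes.size();
  std::map<int, std::vector<int>> channel_groups;
  for (std::size_t i = 0; i < n; ++i) {
    if (nodes[i].channel_id) {
      channel_groups[*nodes[i].channel_id].push_back(static_cast<int>(i));
    }
  }
  // Effective inputs: a node's own operands plus the operands of every other
  // node sharing its channel id (the "invisible" dependencies).
  std::vector<std::vector<int>> inputs(n);
  for (std::size_t i = 0; i < n; ++i) {
    inputs[i] = nodes[i].operands;
    if (!nodes[i].channel_id) continue;
    for (int peer : channel_groups[*nodes[i].channel_id]) {
      if (peer == static_cast<int>(i)) continue;
      const std::vector<int>& peer_operands = nodes[peer].operands;
      inputs[i].insert(inputs[i].end(), peer_operands.begin(),
                       peer_operands.end());
    }
  }
  ReachabilityMatrix reachable(n, std::vector<bool>(n, false));
  std::vector<bool> visited(n, false);
  for (std::size_t i = 0; i < n; ++i) {
    Visit(static_cast<int>(i), inputs, visited, reachable);
  }
  return reachable;
}
```

For a graph where node 1 sends node 0 on channel 1, node 2 receives on channel 1, and node 3 uses node 2, this map reports node 0 as reachable from node 3 even though no data edge connects them, so a fusion pass consulting it would not treat the two as independent.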
| |
Same for WrapUnique.
PiperOrigin-RevId: 209531124
| |
PiperOrigin-RevId: 209502513
| |
PiperOrigin-RevId: 209248552
| |
PiperOrigin-RevId: 209247783
| |
Right now we give these parameters an additional allocation and then don't use
it.
This is interesting for the CPU backend because we use call instructions to
represent fork-join parallelism (i.e. a specially annotated kCall instruction
tells the CPU IR emitter to shard the called computation across CPU threads).
Moreover, I need this for a principled fix to b/111116907.
PiperOrigin-RevId: 204820965
| |
The while loop input and output alias each other, so if an input is also used by other ops that cannot use BF16, the propagation pass cannot change that input/output to BF16, even if all uses inside the while loop could use BF16. This CL adds a copy for each while loop operand. This increases the chance of propagating BF16 through the while loop; copies that do not help remain same-shape copies and are removed at the end.
This can sometimes increase HBM usage, because both the BF16 and F32 copies are live, and can sometimes reduce it.
PiperOrigin-RevId: 203848348
| |
We have two versions of HloReachabilityMap::SetReachabilityToUnion, one of
which is slightly more efficient because it does not report whether the
reachability has changed. This change migrates the callers that do not
care about the return value to the faster variant.
PiperOrigin-RevId: 203256625
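The cost difference between the two variants can be illustrated with plain bit rows. In this sketch, `Row`, `FastSetToUnion`, and `SetToUnion` are hypothetical names, and `std::vector<bool>` stands in for whatever bit-vector HloReachabilityMap really uses; the point is only that reporting a change requires keeping and comparing the old contents.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reachability row: bit j is set if instruction j is reachable.
using Row = std::vector<bool>;

// Overwrites `target` with the union of `inputs`. Does no extra work to
// detect whether anything changed. All rows must have the same size.
void FastSetToUnion(const std::vector<Row>& inputs, Row& target) {
  std::fill(target.begin(), target.end(), false);
  for (const Row& input : inputs) {
    assert(input.size() == target.size());
    for (std::size_t j = 0; j < target.size(); ++j) {
      if (input[j]) target[j] = true;
    }
  }
}

// Same update, but reports whether `target` changed. The extra cost is a copy
// of the old row plus a comparison, which callers that ignore the result
// do not need to pay.
bool SetToUnion(const std::vector<Row>& inputs, Row& target) {
  const Row previous = target;
  FastSetToUnion(inputs, target);
  return previous != target;
}
```

Callers iterating to a fixed point need the boolean; one-shot callers can use the fast form.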
| |
This is a follow up to cl/202069017 which added tokens as operands to Send and Recv.
PiperOrigin-RevId: 203145403
| |
PiperOrigin-RevId: 202049336
| |
Cycles were not handled correctly when computing the postorder of an HLO computation.
Add methods to multi-output fusion that allow subclasses to recompute and query the
current reachability map.
PiperOrigin-RevId: 201274181
| |
std::list is just hilariously inefficient, and the postorder list creation has
been rewritten so that it no longer depends on splicing, so there's no need for the
list anymore. While there, remove the old unused postorder list creation code.
PiperOrigin-RevId: 200743677
| |
TOKENs will be used for ordering side-effecting operations. They are not materialized but can be contained in tuples and flow into and out of computations. This CL adds a trivial representation for the CPU and GPU backends to support TOKENs and modifies copy insertion to avoid making copies of tokens.
This also adds a Literal TOKEN which is required for the interpreter backend.
PiperOrigin-RevId: 200623120
| |
PiperOrigin-RevId: 200555862
| |
A TOKEN primitive type was added with cl/199215963, and XLA also has an OPAQUE primitive type. However, in many places in XLA we assume either a tuple or an array. This CL fixes many of those instances, but some may remain. Instances were found by searching for IsTuple or IsArray, so the set of fixes is not exhaustive.
Also opportunistically addressed a couple of potential points of confusion in the ShapeUtil interface:
(1) Rename ShapeUtil::HasZeroElements to ShapeUtil::IsZeroElementArray. The point of confusion here is that tuples can also have zero elements, and HasZeroElements would CHECK-fail on tuple shapes. The method no longer CHECK-fails if the given shape is not an array.
(2) ShapeUtil::IsNil now returns true only for empty tuples. Previously it also returned true for zero-element array types, which was confusing because ShapeUtil::MakeNil creates an empty tuple.
PiperOrigin-RevId: 200452672
| |
PiperOrigin-RevId: 200189642
| |
PiperOrigin-RevId: 200110003
| |
Previously it used the same infrastructure as HloInstruction::Accept,
which caused a high overhead for large models due to the excess amount of
work it has to do to support modifying the graph under iteration, and due
to the lack of caching on graphs with multiple sinks.
The new code is a very simple implementation of an iterative DFS-based
topological sort.
PiperOrigin-RevId: 199606688
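An iterative DFS-based topological sort of the kind described above can be sketched with an explicit stack and a three-state marker per node. This is a generic sketch on a hypothetical adjacency list (`operands[i]` holds the operand indices of node i), not the actual MakeInstructionPostOrder code, and it assumes the graph is acyclic.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the nodes in post order: every node appears after all of its
// operands. operands[i] lists the operand indices of node i (a hypothetical
// adjacency-list stand-in for HLO instructions). Assumes the graph is acyclic.
std::vector<int> PostOrder(const std::vector<std::vector<int>>& operands) {
  enum class State { kUnvisited, kVisiting, kDone };
  const std::size_t n = operands.size();
  std::vector<State> state(n, State::kUnvisited);
  std::vector<int> order;
  order.reserve(n);
  std::vector<int> stack;  // explicit stack instead of recursion

  // Start a DFS from every node so graphs with multiple sinks are covered;
  // nodes already emitted from an earlier root are skipped.
  for (std::size_t root = 0; root < n; ++root) {
    if (state[root] != State::kUnvisited) continue;
    stack.push_back(static_cast<int>(root));
    while (!stack.empty()) {
      const int node = stack.back();
      if (state[node] == State::kDone) {
        // Duplicate stack entry for a node finished via another path.
        stack.pop_back();
      } else if (state[node] == State::kVisiting) {
        // Back on top of the stack: all operands are already emitted.
        state[node] = State::kDone;
        order.push_back(node);
        stack.pop_back();
      } else {
        // First visit: leave the node on the stack and push its operands.
        state[node] = State::kVisiting;
        for (int operand : operands[node]) {
          if (state[operand] == State::kUnvisited) stack.push_back(operand);
        }
      }
    }
  }
  return order;
}
```

Each node is emitted exactly once and the per-node state doubles as the cache, so shared operands reachable from several sinks are not re-walked.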
| |
instructions with similar attributes (i.e., sharding).
This CL simply adds the infrastructure, but leaves the wire-on to a separate CL.
PiperOrigin-RevId: 198503625
| |
and HloComputation.
PiperOrigin-RevId: 196334340
| |
creating from proto.
Also, delete the HloModule parameter of HloInstruction::CreateFromProto; it's not used anywhere.
Also, in ToProto, set the sharding in the proto if the instruction has sharding.
PiperOrigin-RevId: 196049173
| |
intentionally not exposed in ComputationBuilder and is not intended for use or to be set at all prior to the last backend-specific part of compilation.
PiperOrigin-RevId: 195493500
| |
Preprocess the input module to reassign reserved devices (like the host compute one) to new, sequentially increasing device numbers, and track those in the GlobalState.
This avoids having to spread the is-special-device logic across many places within the HLO partitioner and its related components.
Added handling for kCall, which was missing from the previous implementation.
PiperOrigin-RevId: 191601831
| |
PiperOrigin-RevId: 189831057
| |
HloInstructionProto.
PiperOrigin-RevId: 189811729
| |
PiperOrigin-RevId: 188100425
| |
PiperOrigin-RevId: 186038783
| |
PiperOrigin-RevId: 185623948
| |
PiperOrigin-RevId: 185598764
| |
PiperOrigin-RevId: 184239740
| |
PiperOrigin-RevId: 179138523
| |
PiperOrigin-RevId: 179132435