| Commit message | Author | Age |
PiperOrigin-RevId: 215324035
PiperOrigin-RevId: 215272497
* Use a FlatMap for instruction_iterators_, and actually remove elements from it (which is cheap for a FlatMap).
* Use the size of the map (which is O(1)) rather than the size of the list (which is O(n)) for instruction_count().
PiperOrigin-RevId: 214459259
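As a rough illustration of the pattern above, here is a minimal sketch with std::unordered_map standing in for FlatMap; InstructionIndex and its members are hypothetical names, not the actual HloComputation API:

```cpp
#include <iterator>
#include <list>
#include <string>
#include <unordered_map>

// Keep a map from instruction to its list iterator, erase map entries
// eagerly, and report the count from the O(1) map size instead of walking
// the O(n) list.
class InstructionIndex {
 public:
  void Add(const std::string& name) {
    instructions_.push_back(name);
    iterators_[name] = std::prev(instructions_.end());
  }
  void Remove(const std::string& name) {
    auto it = iterators_.find(name);
    if (it == iterators_.end()) return;
    instructions_.erase(it->second);  // O(1) list erase via stored iterator.
    iterators_.erase(it);             // Cheap erase from the hash map.
  }
  // O(1): map size, not list traversal.
  int instruction_count() const { return static_cast<int>(iterators_.size()); }

 private:
  std::list<std::string> instructions_;
  std::unordered_map<std::string, std::list<std::string>::iterator> iterators_;
};
```

The list keeps iteration order stable while the map provides both cheap removal and a constant-time count.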
PiperOrigin-RevId: 213316504
PiperOrigin-RevId: 213191899
Add HloSchedule as a field on HloModule. This will enable scheduling to be a normal HLO pass and enable some passes such as copy insertion to more easily use tighter instruction live ranges based on the schedule. This change required adding HloSchedule to the "hlo" library because of circular dependencies.
Nothing except for tests actually sets the schedule at the moment, but follow-up CLs will add a scheduling pass which will do so.
PiperOrigin-RevId: 211815293
dependencies as well.
PiperOrigin-RevId: 211038094
PiperOrigin-RevId: 210998142
Previously it used a std::map containing std::vectors, which added a
large overhead to HloComputation::MakeInstructionPostOrder when a model
contained a large number of channels.
The new implementation replaces it with a FlatMap and an InlineVector,
which eliminates a large number of allocations and significantly
improves performance.
PiperOrigin-RevId: 210531816
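A hedged sketch of the data-structure swap, with std::unordered_map standing in for FlatMap (hash table, no per-node allocation like std::map's red-black tree) and a reserved std::vector standing in for InlineVector's small-size optimization; GroupByChannel is a hypothetical name:

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using ChannelGroups = std::unordered_map<int64_t, std::vector<std::string>>;

// Groups channel instructions by channel id, minimizing allocations.
ChannelGroups GroupByChannel(
    const std::vector<std::pair<int64_t, std::string>>& instrs) {
  ChannelGroups groups;
  groups.reserve(instrs.size());  // One up-front allocation for the table.
  for (const auto& [channel_id, name] : instrs) {
    auto& group = groups[channel_id];
    if (group.empty()) group.reserve(2);  // Typical send/recv pair per channel.
    group.push_back(name);
  }
  return groups;
}
```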
Unfortunately this has to be one big patch, because, e.g., absl::StrCat
doesn't accept a TF StringPiece, and as soon as we switch to
absl::string_view, we have to switch away from all of the TF functions.
PiperOrigin-RevId: 209957896
Send&recv instructions and cross-replica-sum instructions impose
extra dependencies via the channel id or all-reduce id. This CL teaches
the reachability calculation logic in HloComputation to correctly
account for these "invisible" dependencies.
The main purpose is to stop multi-output fusion from generating
dependency cycles via communicating instructions.
PiperOrigin-RevId: 209593997
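The idea can be sketched as follows (hypothetical Graph/Reaches/WithChannelEdges names, not the real HloReachabilityMap API): materialize the implicit channel edges before answering reachability queries, so a fusion decision that would create a cycle through a send/recv pair becomes visible:

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <utility>
#include <vector>

struct Graph {
  int num_nodes;
  std::vector<std::pair<int, int>> edges;  // Explicit data dependencies.
};

// Simple DFS reachability over the explicit edge list.
bool Reaches(const Graph& g, int from, int to) {
  std::vector<std::vector<int>> adj(g.num_nodes);
  for (auto [a, b] : g.edges) adj[a].push_back(b);
  std::vector<int> stack = {from};
  std::set<int> seen;
  while (!stack.empty()) {
    int n = stack.back();
    stack.pop_back();
    if (n == to) return true;
    if (!seen.insert(n).second) continue;
    for (int next : adj[n]) stack.push_back(next);
  }
  return false;
}

// Adds the "invisible" edge from each channel's sender to its receiver.
Graph WithChannelEdges(Graph g,
                       const std::map<int64_t, std::pair<int, int>>& channels) {
  for (const auto& [id, pair] : channels) g.edges.push_back(pair);
  return g;
}
```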
Right now we give these parameters an additional allocation and then don't use
it.
This is interesting for the CPU backend because we use call instructions to
represent fork-join parallelism (i.e. a specially annotated kCall instruction
tells the CPU IR emitter to shard the called computation across CPU threads).
Moreover, I need this for a principled fix to b/111116907.
PiperOrigin-RevId: 204820965
The while loop input and output alias each other, so as long as an input is also used by other ops that cannot use BF16, the propagation pass cannot change such an input/output to BF16, even if all uses inside the while loop could use BF16. Add copies for each while loop operand. This increases the chance of propagating BF16 through the while loop; if some of these copies do not help, they will remain same-shape copies and be removed at the end.
This can sometimes increase HBM usage because both the BF16 and F32 copies are alive, and can sometimes reduce HBM usage.
PiperOrigin-RevId: 203848348
PiperOrigin-RevId: 202049336
std::list is just hilariously inefficient, and the postorder list creation has
been rewritten to no longer depend on splicing, so there's no need for the
list. While there, remove the old unused postorder list creation code.
PiperOrigin-RevId: 200743677
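A minimal sketch of splice-free post-order construction, with an explicit visit stack and a single output vector replacing std::list splicing; MakePostOrder is a hypothetical stand-in for HloComputation::MakeInstructionPostOrder, with nodes represented by operand-index lists:

```cpp
#include <vector>

// Iterative DFS post-order: operands are emitted before their users, and
// the result accumulates into one contiguous std::vector.
std::vector<int> MakePostOrder(const std::vector<std::vector<int>>& operands,
                               int root) {
  const int kVisiting = 0, kDone = 1, kUnvisited = -1;
  std::vector<int> state(operands.size(), kUnvisited);
  std::vector<int> post_order;
  std::vector<int> stack = {root};
  while (!stack.empty()) {
    int node = stack.back();
    if (state[node] == kDone) {  // Already emitted via another path.
      stack.pop_back();
      continue;
    }
    if (state[node] == kVisiting) {  // All operands emitted; emit node.
      state[node] = kDone;
      post_order.push_back(node);
      stack.pop_back();
      continue;
    }
    state[node] = kVisiting;
    for (int op : operands[node]) {
      if (state[op] == kUnvisited) stack.push_back(op);
    }
  }
  return post_order;
}
```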
instructions with similar attributes (i.e., sharding).
This CL simply adds the infrastructure, but leaves the wire-on to a separate CL.
PiperOrigin-RevId: 198503625
PiperOrigin-RevId: 196915982
creating from proto.
Also, delete the HloModule parameter of HloInstruction::CreateFromProto; it's not used anywhere.
Also, in ToProto, set sharding on the proto if there is sharding.
PiperOrigin-RevId: 196049173
intentionally not exposed in ComputationBuilder and is not intended for use or to be set at all prior to the last backend-specific part of compilation.
PiperOrigin-RevId: 195493500
PiperOrigin-RevId: 189831057
HloInstructionProto.
PiperOrigin-RevId: 189811729
PiperOrigin-RevId: 188100425
PiperOrigin-RevId: 186038783
PiperOrigin-RevId: 185623948
PiperOrigin-RevId: 185598764
PiperOrigin-RevId: 184239740
a default param.
No functional change.
The motivation for this is that GDB ignores default params, but resolves
overloads just fine.
PiperOrigin-RevId: 179125588
PiperOrigin-RevId: 179079727
and moved the condition to the hlo_dce pass.
PiperOrigin-RevId: 177215395
names. The names are already unique, and uniquifying them again would mutate them, resulting in inconsistent names between the proto and the constructed HLO.
PiperOrigin-RevId: 176035108
Also,
- Add a HloInstruction::CreateFusion interface that creates a fusion instruction with given fusion computation. Add a HloComputation::SetFusionInstruction interface to help do that.
- Change how we print fusion kind. Before this change we print fusion kind together with the opcode, e.g., fusion:kLoop, which is not easy to parse. Now we append fusion kind as an attribute.
- Print fusion computation the same way as other computations, instead of nested in an instruction.
PiperOrigin-RevId: 175621768
internal model.
END_PUBLIC
BEGIN_PUBLIC
Automated g4 rollback of changelist 174423881
PiperOrigin-RevId: 174505237
Add missing const overloads of Accept methods.
PiperOrigin-RevId: 174500495
(number of copies inserted) is roughly similar to the existing implementation, but the new implementation is much more general. The new implementation can handle entry argument buffer reuse with minimal modification, for example.
Some unnecessary copies are still added due to deficiencies in buffer assignment (b/62548313), but these can be removed when buffer assignment also uses HloAliasAnalysis.
Also address a few issues uncovered with this cl:
(1) For in-place dynamic slice in the LLVM backends, truncate, do not wrap, the slice index. This matches the behavior of the non-in-place variant.
(2) Disable the SelectBetweenPredTuples test on GPU. The test introduces top-level buffer ambiguity, which is not tolerated by the GPU backend.
(3) When deserializing HLO from a proto, do not uniquify instruction names in fused computations.
(4) In dataflow analysis, don't deallocate deleted HloValues during propagation.
(5) In dataflow analysis, fix an issue with the live_out_of_computation property.
PiperOrigin-RevId: 174423881
Specifically, if a while loop has a tuple element that
- is not used by the while condition, and
- is not used by the while body, except to pass it along to the next
iteration of the loop,
then we can reshape the while loop's computations to eliminate this
tuple element.
PiperOrigin-RevId: 174413683
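The removability condition above can be sketched abstractly (a hypothetical representation; the real pass inspects HLO graphs, not precomputed index sets): an element is removable when the condition never reads it and the body only forwards it unchanged.

```cpp
#include <set>

// Returns the indices of tuple elements that the while-loop simplification
// may eliminate: not read by the condition and not modified (only passed
// along) by the body.
std::set<int> RemovableTupleElements(int tuple_size,
                                     const std::set<int>& used_by_condition,
                                     const std::set<int>& modified_by_body) {
  std::set<int> removable;
  for (int i = 0; i < tuple_size; ++i) {
    if (used_by_condition.count(i) == 0 && modified_by_body.count(i) == 0) {
      removable.insert(i);
    }
  }
  return removable;
}
```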
Use it in the HLO cost analysis pass.
PiperOrigin-RevId: 174411043
cloned instructions and computations to still have a live link to their parent original modules and computations.
PiperOrigin-RevId: 174271432
PiperOrigin-RevId: 174257660
Don't uniquify names when creating an HLO module from a proto. This preserves instruction names across serialization/deserialization.
PiperOrigin-RevId: 173498734
be serialized to HLO protos and deserialized without any information loss.
As part of this change, a bug is fixed in NameUniquer. Previously, passing names with numeric suffixes could result in name collisions.
PiperOrigin-RevId: 172161360
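One way such a numeric-suffix fix can work is sketched below (SuffixAwareUniquer is a hypothetical stand-in, not the actual NameUniquer API): a trailing ".N" is parsed into a root and a suffix, so a requested suffix is recorded and never handed out twice.

```cpp
#include <algorithm>
#include <cstdlib>
#include <map>
#include <string>

class SuffixAwareUniquer {
 public:
  std::string GetUniqueName(const std::string& name) {
    // Split "root.N" into root and numeric suffix, if present.
    std::string root = name;
    long suffix = -1;
    auto dot = name.rfind('.');
    if (dot != std::string::npos && dot + 1 < name.size()) {
      const std::string tail = name.substr(dot + 1);
      if (tail.find_first_not_of("0123456789") == std::string::npos) {
        root = name.substr(0, dot);
        suffix = std::atol(tail.c_str());
      }
    }
    auto it = next_suffix_.find(root);
    if (it == next_suffix_.end()) {
      // First time we see this root: honor the requested name.
      next_suffix_[root] = std::max<long>(suffix + 1, 1);
      return suffix >= 0 ? root + "." + std::to_string(suffix) : root;
    }
    // Root already taken: assign the next free suffix (at least as large
    // as any suffix seen so far, so "foo.2" can never collide with "foo").
    long assigned = std::max(it->second, suffix);
    it->second = assigned + 1;
    return root + "." + std::to_string(assigned);
  }

 private:
  std::map<std::string, long> next_suffix_;  // root -> next free suffix
};
```

This is conservative (it may skip unused suffixes) but guarantees no collisions.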
Currently it returns a view of unique_ptr<HloInstruction>s. But the
fact that these are unique_ptrs is an implementation detail, and it's
ugly to leak it everywhere.
PiperOrigin-RevId: 170445375
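A sketch of the intent with hypothetical types; the real change likely uses an unwrapping iterator adaptor rather than copying into a new vector, which this simplification does for brevity:

```cpp
#include <memory>
#include <string>
#include <vector>

struct Instruction {
  std::string name;
};

class Computation {
 public:
  Instruction* AddInstruction(std::string name) {
    instructions_.push_back(
        std::make_unique<Instruction>(Instruction{std::move(name)}));
    return instructions_.back().get();
  }
  // Callers see plain pointers; the unique_ptr ownership stays an
  // implementation detail of the computation.
  std::vector<Instruction*> instructions() const {
    std::vector<Instruction*> view;
    view.reserve(instructions_.size());
    for (const auto& instr : instructions_) view.push_back(instr.get());
    return view;
  }

 private:
  std::vector<std::unique_ptr<Instruction>> instructions_;
};
```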
HloInstruction::ReplaceAllUsesWith.
RAUW used to be *almost* synonymous with RUOI, except RAUW didn't update
the computation's root. This was a dangerous footgun -- if you
accidentally called RAUW when you wanted RUOI (which you almost always
did), your code would work perfectly, except when the relevant node
happened to be the root of a computation.
This change simplifies our APIs so there's just one Right Way To Do It,
by making RAUW update the computation's root as well.
PiperOrigin-RevId: 170290230
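The post-change semantics can be sketched with hypothetical types: ReplaceAllUsesWith rewrites every user's operand list and swaps the computation root, so the "worked everywhere except at the root" footgun is gone.

```cpp
#include <vector>

struct Instr {
  int id;
  std::vector<Instr*> operands;
};

struct Computation {
  std::vector<Instr*> instructions;
  Instr* root = nullptr;

  void ReplaceAllUsesWith(Instr* old_instr, Instr* new_instr) {
    for (Instr* user : instructions) {
      for (Instr*& op : user->operands) {
        if (op == old_instr) op = new_instr;
      }
    }
    if (root == old_instr) root = new_instr;  // The part old RAUW forgot.
  }
};
```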
HloInstruction.
Presently, each instruction inside a fusion computation contains a pointer to the fusion instruction that contains the computation, which is redundant since this is common across the entire computation. This leads to lots of places where this pointer must be set when adding an instruction to the fusion computation (and bugs such as b/65177535 when one is missed), as well as code to check that it's set correctly. In addition, this is simply unnecessary data bloat.
Moreover, the computation itself does not contain a pointer to the fusion instruction that references it, which leads to odd circumlocutions in the HloComputation code that retrieve the fusion instruction from the computation's root instruction.
Thus, this CL moves this pointer into the HloComputation class (replacing the is_fusion_computation_ bool value) and refactors the uses as necessary.
PiperOrigin-RevId: 167039280
certain indices. Also, add mechanism for returning the kCopy instructions
added to create the deep copy.
PiperOrigin-RevId: 166521917
END_PUBLIC
---
Commit b30ce4714 authored by James Qin<jamesqin@google.com>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
Revamp CudnnRNN Saveables
1. Use a lossy way to save/restore cudnn biases during checkpointing.
Cudnn uses two biases per gate for all RNNs, while TF uses one. To allow cudnn checkpoints
to be compatible with both Cudnn and platform-independent impls, previously both the
individual biases and the summed bias for each gate were stored.
The new way stores only the bias sum for each gate, and splits it half-and-half when
restoring it for a cudnn graph. Doing this does not cause problems, since RNNs do not use
weight decay to regularize.
2. Use inheritance instead of branching
* Split RNNParamsSaveable into 1 base class and 4 subclasses.
* Extract common routines and only override rnn-type-specific pieces in subclasses.
PiperOrigin-RevId: 166413989
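A sketch of the lossy save/restore scheme described in point 1 (hypothetical function names; real checkpoints store tensors, not std::vectors): the two cudnn biases per gate are summed at save time, and the sum is split half-and-half at restore time. The original pair is not recovered, but the gate's effective bias b1 + b2 is preserved.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Save: collapse the two per-gate cudnn biases into one canonical bias.
std::vector<double> SaveBias(const std::vector<double>& cudnn_b1,
                             const std::vector<double>& cudnn_b2) {
  std::vector<double> summed(cudnn_b1.size());
  for (size_t i = 0; i < summed.size(); ++i) {
    summed[i] = cudnn_b1[i] + cudnn_b2[i];
  }
  return summed;
}

// Restore: split the stored sum half-and-half between the two cudnn biases.
std::pair<std::vector<double>, std::vector<double>> RestoreBias(
    const std::vector<double>& summed) {
  std::vector<double> half(summed.size());
  for (size_t i = 0; i < half.size(); ++i) half[i] = summed[i] / 2.0;
  return {half, half};
}
```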
---
Commit ebc421daf authored by Alan Yee<alyee@ucsd.edu>
Committed by Jonathan Hseu<vomjom@vomjom.net>:
Update documentation for contrib (#12424)
* Update __init__.py
Remove ## for standardization of api docs
* Create README.md
Add README to define this directory's purpose
* Update __init.py
Markdown styling does not show up well in api docs
* Update README.md
Add short mention of describing what to deprecate
* Update README.md
Capitalize title
* Update README.md
Revert README change
* Delete README.md
---
Commit fd295394d authored by A. Unique TensorFlower<gardener@tensorflow.org>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
Use latest version of nsync library, which now allows use of cmake on MacOS.
PiperOrigin-RevId: 166411437
---
Commit 587d728e0 authored by A. Unique TensorFlower<gardener@tensorflow.org>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
[XLA] Refactor reduce-precision-insertion filters, add several more options.
In particular, this adds the ability to add reduce-precision operations after fusion nodes based on the contents of those fusion nodes, and the ability to filter operations based on the "op_name" metadata.
PiperOrigin-RevId: 166408392
---
Commit 3142f8ef5 authored by Ali Yahya<alive@google.com>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
Steps toward making ResourceVariables compatible with Eager.
This change forces the value of the reuse flag in variable scopes to be tf.AUTO_REUSE when in Eager mode.
This change also adds comprehensive Eager tests for ResourceVariable.
PiperOrigin-RevId: 166408161
---
Commit b2ce45150 authored by Igor Ganichev<iga@google.com>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
Make Graph::IsValidNode public
It can be reimplemented with existing public APIs, but instead of doing so,
making this one public seems better.
PiperOrigin-RevId: 166407897
---
Commit 0a2f40e92 authored by A. Unique TensorFlower<gardener@tensorflow.org>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
[XLA::CPU] Fix HLO profiling in parallel CPU backend.
PiperOrigin-RevId: 166400211
---
Commit c4a58e3fd authored by Yao Zhang<yaozhang@google.com>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
Identify frame ids for all nodes in a graph.
PiperOrigin-RevId: 166397615
---
Commit 989713f26 authored by A. Unique TensorFlower<gardener@tensorflow.org>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
BEGIN_PUBLIC
Automated g4 rollback of changelist 166294015
PiperOrigin-RevId: 166521502
PiperOrigin-RevId: 164920220
Updating is possible if operands/uses or computation roots change in
the graph. Updating is not possible if instructions are deleted or if
new instructions are added.
Specific changes:
* Add verification methods for asserting invariants and checking the
analysis after updating.
* Always add phi values at while instructions. Previously these were
added only if the phi had different inputs. The advantage of using
phis unconditionally is that the set of values is fixed for a
module. Updates due to changing operands/uses in the graph do not
create new values.
* Store values in a vector rather than a map. With unconditional phi
values, the number of HloValues is fixed so the values can be held
in a vector with stable references to elements.
PiperOrigin-RevId: 164778750
removable from a computation. This is to prevent DCE from removing a
while instruction that includes a send/recv instruction.
PiperOrigin-RevId: 164722478
As part of the CL, change the underlying representation in the reachability map to BitVectors, which allows efficient updates by OR'ing the vectors together.
PiperOrigin-RevId: 160591849
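The BitVector representation can be sketched as follows (a hypothetical Reachability class, not the real HloReachabilityMap API; like the real map, rows must be built in a topological order so a predecessor's row is complete before it is OR'd in):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Row i holds, as a bit set, the nodes that can reach node i.
class Reachability {
 public:
  explicit Reachability(int n)
      : words_((n + 63) / 64),
        bits_(n, std::vector<uint64_t>(words_, 0)) {
    for (int i = 0; i < n; ++i) Set(i, i);  // Every node reaches itself.
  }
  // Adding edge pred -> node: OR pred's whole row into node's row,
  // word by word, propagating transitive reachability in one pass.
  void AddEdge(int pred, int node) {
    for (size_t w = 0; w < words_; ++w) bits_[node][w] |= bits_[pred][w];
  }
  bool IsReachable(int from, int to) const {
    return (bits_[to][from / 64] >> (from % 64)) & 1;
  }

 private:
  void Set(int row, int bit) {
    bits_[row][bit / 64] |= uint64_t{1} << (bit % 64);
  }
  size_t words_;
  std::vector<std::vector<uint64_t>> bits_;
};
```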