that isn't part of their public API.
PiperOrigin-RevId: 212123326
dependencies as well.
PiperOrigin-RevId: 211038094
1. Propose a new API with the ability to do input/output.
2. Start to enable ABSL in TF's codebase.
PiperOrigin-RevId: 210783617
PiperOrigin-RevId: 209686671
Same for WrapUnique.
PiperOrigin-RevId: 209531124
- Use a separate IndexTable for lookups. This reduces the number of cachelines needed for storing ShapeTreeNodes.
- Set up benchmark for flat tuples, useful as a benchmark for future optimizations.
name old time/op new time/op delta
BM_Construct/2/8 8.34µs ± 1% 7.57µs ± 2% -9.26% (p=0.008 n=5+5)
BM_Construct/1/1000 143µs ± 1% 132µs ± 1% -7.29% (p=0.008 n=5+5)
BM_ConstructUnowned/2/8 2.18µs ± 4% 1.31µs ± 1% -39.99% (p=0.008 n=5+5)
BM_ConstructUnowned/1/1000 23.0µs ± 7% 15.1µs ± 1% -34.47% (p=0.008 n=5+5)
BM_Copy/2/8 1.52µs ± 5% 0.37µs ± 1% -76.01% (p=0.008 n=5+5)
BM_Copy/1/1000 18.7µs ± 3% 4.9µs ± 2% -73.85% (p=0.008 n=5+5)
BM_Move/2/8 0.03ns ± 2% 13.42ns ± 1% +40877.10% (p=0.016 n=4+5)
BM_Move/1/1000 0.03ns ± 0% 13.54ns ± 3% +40930.30% (p=0.016 n=4+5)
BM_ForEach/2/8 26.4ns ± 1% 27.9ns ± 2% +5.77% (p=0.008 n=5+5)
BM_ForEach/1/1000 271ns ± 1% 273ns ± 0% +0.81% (p=0.016 n=5+4)
BM_Iterate/2/8 25.5ns ± 3% 23.9ns ± 8% ~ (p=0.151 n=5+5)
BM_Iterate/1/1000 272ns ± 2% 271ns ± 1% ~ (p=0.984 n=5+5)
name old allocs/op new allocs/op delta
BM_Construct/2/8 373 ± 0% 276 ± 0% -26.01% (p=0.008 n=5+5)
BM_Construct/1/1000 5.00k ± 0% 4.00k ± 0% -20.00% (p=0.008 n=5+5)
BM_ConstructUnowned/2/8 99.0 ± 0% 2.0 ± 0% -97.98% (p=0.008 n=5+5)
BM_ConstructUnowned/1/1000 1.00k ± 0% 0.00k ± 0% -99.80% (p=0.008 n=5+5)
BM_Copy/2/8 105 ± 0% 19 ± 0% -81.90% (p=0.008 n=5+5)
BM_Copy/1/1000 1.31k ± 0% 0.25k ± 0% -80.84% (p=0.008 n=5+5)
BM_Move/2/8 23.0 ± 0% 17.0 ± 0% -26.09% (p=0.008 n=5+5)
BM_Move/1/1000 313 ± 0% 250 ± 0% -20.13% (p=0.008 n=5+5)
BM_ForEach/2/8 23.0 ± 0% 17.0 ± 0% -26.09% (p=0.008 n=5+5)
BM_ForEach/1/1000 313 ± 0% 250 ± 0% -20.13% (p=0.008 n=5+5)
BM_Iterate/2/8 23.0 ± 0% 17.0 ± 0% -26.09% (p=0.008 n=5+5)
BM_Iterate/1/1000 313 ± 0% 250 ± 0% -20.13% (p=0.008 n=5+5)
PiperOrigin-RevId: 205154861
Automated rollback of commit 81161f9d9987a8eb70793d95048c20be34292859 due to internal breakage.
PiperOrigin-RevId: 205107655
- Use a separate IndexTable for lookups. This reduces the number of cachelines needed for storing ShapeTreeNodes.
- Set up benchmark for flat tuples, useful as a benchmark for future optimizations.
name old time/op new time/op delta
BM_Construct/2/8 8.34µs ± 1% 7.57µs ± 2% -9.26% (p=0.008 n=5+5)
BM_Construct/1/1000 143µs ± 1% 132µs ± 1% -7.29% (p=0.008 n=5+5)
BM_ConstructUnowned/2/8 2.18µs ± 4% 1.31µs ± 1% -39.99% (p=0.008 n=5+5)
BM_ConstructUnowned/1/1000 23.0µs ± 7% 15.1µs ± 1% -34.47% (p=0.008 n=5+5)
BM_Copy/2/8 1.52µs ± 5% 0.37µs ± 1% -76.01% (p=0.008 n=5+5)
BM_Copy/1/1000 18.7µs ± 3% 4.9µs ± 2% -73.85% (p=0.008 n=5+5)
BM_Move/2/8 0.03ns ± 2% 13.42ns ± 1% +40877.10% (p=0.016 n=4+5)
BM_Move/1/1000 0.03ns ± 0% 13.54ns ± 3% +40930.30% (p=0.016 n=4+5)
BM_ForEach/2/8 26.4ns ± 1% 27.9ns ± 2% +5.77% (p=0.008 n=5+5)
BM_ForEach/1/1000 271ns ± 1% 273ns ± 0% +0.81% (p=0.016 n=5+4)
BM_Iterate/2/8 25.5ns ± 3% 23.9ns ± 8% ~ (p=0.151 n=5+5)
BM_Iterate/1/1000 272ns ± 2% 271ns ± 1% ~ (p=0.984 n=5+5)
name old allocs/op new allocs/op delta
BM_Construct/2/8 373 ± 0% 276 ± 0% -26.01% (p=0.008 n=5+5)
BM_Construct/1/1000 5.00k ± 0% 4.00k ± 0% -20.00% (p=0.008 n=5+5)
BM_ConstructUnowned/2/8 99.0 ± 0% 2.0 ± 0% -97.98% (p=0.008 n=5+5)
BM_ConstructUnowned/1/1000 1.00k ± 0% 0.00k ± 0% -99.80% (p=0.008 n=5+5)
BM_Copy/2/8 105 ± 0% 19 ± 0% -81.90% (p=0.008 n=5+5)
BM_Copy/1/1000 1.31k ± 0% 0.25k ± 0% -80.84% (p=0.008 n=5+5)
BM_Move/2/8 23.0 ± 0% 17.0 ± 0% -26.09% (p=0.008 n=5+5)
BM_Move/1/1000 313 ± 0% 250 ± 0% -20.13% (p=0.008 n=5+5)
BM_ForEach/2/8 23.0 ± 0% 17.0 ± 0% -26.09% (p=0.008 n=5+5)
BM_ForEach/1/1000 313 ± 0% 250 ± 0% -20.13% (p=0.008 n=5+5)
BM_Iterate/2/8 23.0 ± 0% 17.0 ± 0% -26.09% (p=0.008 n=5+5)
BM_Iterate/1/1000 313 ± 0% 250 ± 0% -20.13% (p=0.008 n=5+5)
PiperOrigin-RevId: 205024687
Avoids creating temporary std::vectors on the consumer side. Also pushes ShapeIndexViews
through the GPU backend a bit.
PiperOrigin-RevId: 201529722
PiperOrigin-RevId: 200327849
instructions with similar attributes (i.e., sharding).
This CL simply adds the infrastructure, but leaves wiring it on to a separate CL.
PiperOrigin-RevId: 198503625
This optimizes ShapeTree quite significantly. In particular this optimizes for the common case of querying/iterating, copying and moving ShapeTrees.
* Allocate all ShapeTreeNodes inside a single, owned, vector. This reduces the number of memory allocations and improves cache performance.
* Instead of storing children nodes as unique_ptrs, store them as indices into the owning container's vector. This allows cheap copy-construction (a std::vector POD copy) and doesn't change the fast path (dereferencing a pointer is just as fast as dereferencing a base + offset).
* Instead of a unique_ptr<Shape>, use a shared_ptr<Shape>. This removes a load of copy-construction overhead at the cost of a shared_ptr over a unique_ptr (one extra allocation).
* Instead of computing ShapeIndexes on-demand in the iterators/ForEach*, precompute them during construction time. This adds a few more bytes per ShapeTree, but now we can...
* ... store a std::pair<ShapeIndex, T> as the ShapeTreeNode's data element. This allows us to provide a std::pair<K,V>&, STL-like interface from iterators without going through any of the previous unique_ptr hacks around storage lifetimes.
* Because we no longer need to iterate from the beginning to build up the ShapeIndex, we can now offer a ::find() function to return an iterator for a ShapeIndex in O(K) time. As the iteration order is guaranteed to be pre-order, this can be used (and will be, later) to speed up the fast-path of mutating a subtree of a ShapeTree from tf2xla::ExtractSubBuffers.
* Similarly because we now have a very standard, cheap STL interface with no performance cliffs, we can hopefully improve ShapedBuffer's copy and move constructors to be cheaper.
PiperOrigin-RevId: 197548717
PiperOrigin-RevId: 187520283
moving memory ownership instead of copying.
PiperOrigin-RevId: 185871648
PiperOrigin-RevId: 179953488
PiperOrigin-RevId: 179260538
PiperOrigin-RevId: 179258973
Add reverse iteration to ShapeTree.
PiperOrigin-RevId: 175341255
of each individual subtensor individually. Tuples are essentially just
containers - the tensors they contain should be able to be sharded
differently.
Tuples are hierarchically structured, but shardings were designed to
not contain the sharded type (the sharded type is inferred from the
output type of the instruction the sharding is applied to). Therefore,
shardings for tuples contain shardings for each subtensor as a
non-structured list.
This list is ordered as a preorder walk of the tuple shape, and of
course only the leaf nodes of the tuple shape are stored. The
structure is reapplied when the sharded instruction's shape is known.
PiperOrigin-RevId: 175132692
* Added CompactPointerSet<T>, which is optimized for set size <= 1.
* Changed expensive CHECKs to DCHECKs in buffer_assignment.cc
* Reserve space in DFS state array before starting DFS.
* Use unsigned arithmetic in DFS state maintenance.
* HloInstruction:
- Moved frequently used fields to start for better cache locality.
- Use InlinedVector instead of vector for operand array.
- Use InlinedVector instead of vector for DFS stack.
* Pre-compute "is array" and "is tuple" for LogicalBuffer.
* PointsToSet:
- Combine two ShapeTrees into one.
- Use CompactPointerSet instead of std::set to hold sources.
- Use CompactPointerSet instead of std::set to hold flattened buffers.
* ShapeTree: use unique_ptr instead of optional for shape storage
(reduces size and destruction overhead).
* Add proper const qualifiers to some FlatSet iterator methods.
Co-author=jeff
PiperOrigin-RevId: 165759117
because const Foo<T>* and Foo<const T>* are not convertible in C++, which broke
the internal machinery of the iterator in the const case.
PiperOrigin-RevId: 164276236
This allows it to be iterated over by a for-loop.
PiperOrigin-RevId: 163950729
This analysis is one of the most expensive parts of the HLO optimization
pipeline.
- Avoid one or two unnecessary hashtable lookups in
PopulateDefinedBuffersAndAliases.
- Add a mode to ShapeTree wherein we avoid copying Shapes.
- Use templated functors rather than std::function in ShapeTree's
iterators, thus avoiding the overhead of std::function.
PiperOrigin-RevId: 160487485
Simplify shape traversal visitors in ShapeUtil and ShapeTree. Add a non-Status form, because most uses of the traversal methods do not need it, and remove the is_leaf parameter from ShapeTree::ForEach* as it is not frequently used.
PiperOrigin-RevId: 158201574
Add support for holding non-copyable types, operator==, and a
CopySubtreeFrom method for copying a subtree from one ShapeTree to
another.
PiperOrigin-RevId: 157777636
Change: 153385325
1) Ensure PODs are zero-initialized via the shape-constructor; i.e.
ShapeTree<bool>(shape) has all data items initialized to false.
2) Add a default constructor, which makes it possible to use operator[] in a
FlatMap<_, ShapeTree<_>>. This creates a tree where !ShapeTree::IsValid.
3) Change the representation so that the root node holds the shape by value,
while non-root nodes don't hold a shape. This is "cleaner" than holding an
unused shape in non-root nodes, and also simplifies the implementation.
There is now a struct internal::ShapeTreeNode only used by ShapeTree.
4) Simplify constructors to delegate to a single implementation.
5) Fix bug in ShapeTree(Shape shape, T init_value) constructor, which caused
the existing code to only work for T=bool.
6) Fix documentation to indicate that a T data value is held for every node in
the tree, not just leaf nodes. That was the original behavior, which is
unchanged in this CL; it's just the documentation that was wrong.
Change: 153305130
XLA is a compiler-based linear algebra execution engine that targets CPUs, GPUs and custom accelerators.
XLA is still experimental; we are releasing it early to get the community involved.
Change: 143990941