| Commit message (Collapse) | Author | Age |
|
|
|
|
|
| |
Same for WrapUnique.
PiperOrigin-RevId: 209531124
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* lambda capture 'builder' is not used
* using decl 'Printf' is unused
* lambda capture 'this' is not used (17 times)
* lambda capture 'buffer_liveness' is not used
* lambda capture 'computation' is not used
* lambda capture 'operand_to_generator' is not used
* lambda capture 'M' is not used
* using decl 'InvalidParameterArgument' is unused
* lambda capture 'sum' is not used
* lambda capture 's' is not used
* lambda capture 'epsilon' is not used
PiperOrigin-RevId: 207542895
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- Use a separate IndexTable for lookups. This reduces the number of cachelines needed for storing ShapeTreeNodes.
- Set up benchmark for flat tuples, useful as a benchmark for future optimizations.
name old time/op new time/op delta
BM_Construct/2/8 8.34?s ? 1% 7.57?s ? 2% -9.26% (p=0.008 n=5+5)
BM_Construct/1/1000 143?s ? 1% 132?s ? 1% -7.29% (p=0.008 n=5+5)
BM_ConstructUnowned/2/8 2.18?s ? 4% 1.31?s ? 1% -39.99% (p=0.008 n=5+5)
BM_ConstructUnowned/1/1000 23.0?s ? 7% 15.1?s ? 1% -34.47% (p=0.008 n=5+5)
BM_Copy/2/8 1.52?s ? 5% 0.37?s ? 1% -76.01% (p=0.008 n=5+5)
BM_Copy/1/1000 18.7?s ? 3% 4.9?s ? 2% -73.85% (p=0.008 n=5+5)
BM_Move/2/8 0.03ns ? 2% 13.42ns ? 1% +40877.10% (p=0.016 n=4+5)
BM_Move/1/1000 0.03ns ? 0% 13.54ns ? 3% +40930.30% (p=0.016 n=4+5)
BM_ForEach/2/8 26.4ns ? 1% 27.9ns ? 2% +5.77% (p=0.008 n=5+5)
BM_ForEach/1/1000 271ns ? 1% 273ns ? 0% +0.81% (p=0.016 n=5+4)
BM_Iterate/2/8 25.5ns ? 3% 23.9ns ? 8% ~ (p=0.151 n=5+5)
BM_Iterate/1/1000 272ns ? 2% 271ns ? 1% ~ (p=0.984 n=5+5)
name old allocs/op new allocs/op delta
BM_Construct/2/8 373 ? 0% 276 ? 0% -26.01% (p=0.008 n=5+5)
BM_Construct/1/1000 5.00k ? 0% 4.00k ? 0% -20.00% (p=0.008 n=5+5)
BM_ConstructUnowned/2/8 99.0 ? 0% 2.0 ? 0% -97.98% (p=0.008 n=5+5)
BM_ConstructUnowned/1/1000 1.00k ? 0% 0.00k ? 0% -99.80% (p=0.008 n=5+5)
BM_Copy/2/8 105 ? 0% 19 ? 0% -81.90% (p=0.008 n=5+5)
BM_Copy/1/1000 1.31k ? 0% 0.25k ? 0% -80.84% (p=0.008 n=5+5)
BM_Move/2/8 23.0 ? 0% 17.0 ? 0% -26.09% (p=0.008 n=5+5)
BM_Move/1/1000 313 ? 0% 250 ? 0% -20.13% (p=0.008 n=5+5)
BM_ForEach/2/8 23.0 ? 0% 17.0 ? 0% -26.09% (p=0.008 n=5+5)
BM_ForEach/1/1000 313 ? 0% 250 ? 0% -20.13% (p=0.008 n=5+5)
BM_Iterate/2/8 23.0 ? 0% 17.0 ? 0% -26.09% (p=0.008 n=5+5)
BM_Iterate/1/1000 313 ? 0% 250 ? 0% -20.13% (p=0.008 n=5+5)
PiperOrigin-RevId: 205154861
|
|
|
|
|
|
|
|
|
| |
Rollback of 81161f9d9987a8eb70793d95048c20be34292859 due internal breakage.
END_PUBLIC
Automated rollback of commit 81161f9d9987a8eb70793d95048c20be34292859
PiperOrigin-RevId: 205107655
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- Use a separate IndexTable for lookups. This reduces the number of cachelines needed for storing ShapeTreeNodes.
- Set up benchmark for flat tuples, useful as a benchmark for future optimizations.
name old time/op new time/op delta
BM_Construct/2/8 8.34?s ? 1% 7.57?s ? 2% -9.26% (p=0.008 n=5+5)
BM_Construct/1/1000 143?s ? 1% 132?s ? 1% -7.29% (p=0.008 n=5+5)
BM_ConstructUnowned/2/8 2.18?s ? 4% 1.31?s ? 1% -39.99% (p=0.008 n=5+5)
BM_ConstructUnowned/1/1000 23.0?s ? 7% 15.1?s ? 1% -34.47% (p=0.008 n=5+5)
BM_Copy/2/8 1.52?s ? 5% 0.37?s ? 1% -76.01% (p=0.008 n=5+5)
BM_Copy/1/1000 18.7?s ? 3% 4.9?s ? 2% -73.85% (p=0.008 n=5+5)
BM_Move/2/8 0.03ns ? 2% 13.42ns ? 1% +40877.10% (p=0.016 n=4+5)
BM_Move/1/1000 0.03ns ? 0% 13.54ns ? 3% +40930.30% (p=0.016 n=4+5)
BM_ForEach/2/8 26.4ns ? 1% 27.9ns ? 2% +5.77% (p=0.008 n=5+5)
BM_ForEach/1/1000 271ns ? 1% 273ns ? 0% +0.81% (p=0.016 n=5+4)
BM_Iterate/2/8 25.5ns ? 3% 23.9ns ? 8% ~ (p=0.151 n=5+5)
BM_Iterate/1/1000 272ns ? 2% 271ns ? 1% ~ (p=0.984 n=5+5)
name old allocs/op new allocs/op delta
BM_Construct/2/8 373 ? 0% 276 ? 0% -26.01% (p=0.008 n=5+5)
BM_Construct/1/1000 5.00k ? 0% 4.00k ? 0% -20.00% (p=0.008 n=5+5)
BM_ConstructUnowned/2/8 99.0 ? 0% 2.0 ? 0% -97.98% (p=0.008 n=5+5)
BM_ConstructUnowned/1/1000 1.00k ? 0% 0.00k ? 0% -99.80% (p=0.008 n=5+5)
BM_Copy/2/8 105 ? 0% 19 ? 0% -81.90% (p=0.008 n=5+5)
BM_Copy/1/1000 1.31k ? 0% 0.25k ? 0% -80.84% (p=0.008 n=5+5)
BM_Move/2/8 23.0 ? 0% 17.0 ? 0% -26.09% (p=0.008 n=5+5)
BM_Move/1/1000 313 ? 0% 250 ? 0% -20.13% (p=0.008 n=5+5)
BM_ForEach/2/8 23.0 ? 0% 17.0 ? 0% -26.09% (p=0.008 n=5+5)
BM_ForEach/1/1000 313 ? 0% 250 ? 0% -20.13% (p=0.008 n=5+5)
BM_Iterate/2/8 23.0 ? 0% 17.0 ? 0% -26.09% (p=0.008 n=5+5)
BM_Iterate/1/1000 313 ? 0% 250 ? 0% -20.13% (p=0.008 n=5+5)
PiperOrigin-RevId: 205024687
|
|
|
|
| |
PiperOrigin-RevId: 200327849
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This optimizes ShapeTree quite significantly. In particular this optimizes for the common case of querying/iterating, copying and moving ShapeTrees.
* Allocate all ShapeTreeNodes inside a single, owned, vector. This reduces the number of memory allocations and improves cache performance.
* Instead of storing children nodes as unique_ptrs, store them as indices into the owning container's vector. This allows cheap copy-construction (a std::vector POD copy) and doesn't change the fast path (dereferencing a pointer is just as fast as dereferencing a base + offset).
* Instead of a unique_ptr<Shape>, use a shared_ptr<Shape>. This removes a load of copy-construction overhead at the cost of a shared_ptr over a unique_ptr (one extra allocation).
* Instead of computing ShapeIndexes on-demand in the iterators/ForEach*, precompute them during construction time. This adds a few more bytes per ShapeTree, but now we can...
* ... store a std::pair<ShapeIndex, T> as the ShapeTreeNode's data element. This allows us to provide a std::pair<K,V>&, STL-like interface from iterators without going through any of the previous unique_ptr hacks around storage lifetimes.
* Because we no longer need to iterate from the beginning to build up the ShapeIndex, we can now offer a ::find() function to return an iterator for a ShapeIndex in O(K) time. As the iteration order is guaranteed to be pre-order, this can be used (and will be, later) to speed up the fast-path of mutating a subtree of a ShapeTree from tf2xla::ExtractSubBuffers.
* Similarly because we now have a very standard, cheap STL interface with no performance cliffs, we can hopefully improve ShapedBuffer's copy and move constructors to be cheaper.
PiperOrigin-RevId: 197548717
|
|
|
|
|
|
| |
Add reverse iteration to ShapeTree.
PiperOrigin-RevId: 175341255
|
|
|
|
|
|
|
| |
because const Foo<T>* and Foo<const T>* are not convertible in C++ which broke
the internal machinery of the iterator in the const case.
PiperOrigin-RevId: 164276236
|
|
|
|
|
|
| |
This allows it to be iterated over by a for-loop.
PiperOrigin-RevId: 163950729
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This analysis is one of the most expensive parts of the HLO optimization
pipeline.
- Avoid one or two unnecessary hashtable lookups in
PopulateDefinedBuffersAndAliases.
- Add a mode to ShapeTree wherein we avoid copying Shapes.
- Use templated functors rather than std::function in ShapeTree's
iterators, thus avoiding the overhead of std::function.
PiperOrigin-RevId: 160487485
|
|
|
|
|
|
| |
Simplify shape traversal visitors in ShapeUtil and ShapeTree. Add a non-Status form because most uses of the traversal methods do not use it, and remove is_leaf parameter from ShapeTree.ForEach* as it is not frequently used.
PiperOrigin-RevId: 158201574
|
|
|
|
|
|
|
|
| |
Add support for holding non-copyable types, operator==, and a
CopySubtreeFrom method for copying a subtree from one ShapeTree to
another.
PiperOrigin-RevId: 157777636
|
|
|
|
| |
Change: 153385325
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
1) Ensure PODs are zero-initialized via the shape-constructor; i.e.
ShapeTree<bool>(shape) has all data items initialized to false.
2) Add a default constructor, which makes it possible to use operator[] in a
FlatMap<_, ShapeTree<_>>. This creates a tree where !ShapeTree::IsValid.
3) Change the representation so that the root node holds the shape by value,
while non-root nodes don't hold a shape. This is "cleaner" than holding an
unused shape in non-root nodes, and also simplifies the implementation.
There is now a struct internal::ShapeTreeNode only used by ShapeTree.
4) Simplify constructors to delegate to a single implementation.
5) Fix bug in ShapeTree(Shape shape, T init_value) constructor, which caused
the existing code to only work for T=bool.
6) Fix documentation to indicate that a T data value is held for every node in
the tree, not just leaf nodes. That was the original behavior, which is
unchanged in this CL; it's just the documentation that was wrong.
Change: 153305130
|
|
|
|
| |
Change: 152823724
|
|
XLA is a compiler-based linear algebra execution engine that targets CPUs, GPUs and custom accelerators.
XLA is still experimental; we are releasing it early to get the community involved.
Change: 143990941
|