path: root/tensorflow/compiler/xla/shape_tree_test.cc
Commit message · Author · Age
* [XLA] Use absl::make_unique instead of xla::MakeUnique. (Justin Lebar, 2018-08-20)
  Same for WrapUnique. PiperOrigin-RevId: 209531124
* [XLA] Clean up clang tidy readability warnings in compiler/xla (Benjamin Kramer, 2018-08-06)
  * lambda capture 'builder' is not used
  * using decl 'Printf' is unused
  * lambda capture 'this' is not used (17 times)
  * lambda capture 'buffer_liveness' is not used
  * lambda capture 'computation' is not used
  * lambda capture 'operand_to_generator' is not used
  * lambda capture 'M' is not used
  * using decl 'InvalidParameterArgument' is unused
  * lambda capture 'sum' is not used
  * lambda capture 's' is not used
  * lambda capture 'epsilon' is not used
  PiperOrigin-RevId: 207542895
* - Use InlinedVector for ShapeIndex. (Yunxing Dai, 2018-07-18)
  - Use a separate IndexTable for lookups. This reduces the number of cachelines needed for storing ShapeTreeNodes.
  - Set up benchmark for flat tuples, useful as a benchmark for future optimizations.

  name                        old time/op    new time/op    delta
  BM_Construct/2/8            8.34µs ± 1%    7.57µs ± 2%      -9.26%  (p=0.008 n=5+5)
  BM_Construct/1/1000          143µs ± 1%     132µs ± 1%      -7.29%  (p=0.008 n=5+5)
  BM_ConstructUnowned/2/8     2.18µs ± 4%    1.31µs ± 1%     -39.99%  (p=0.008 n=5+5)
  BM_ConstructUnowned/1/1000  23.0µs ± 7%    15.1µs ± 1%     -34.47%  (p=0.008 n=5+5)
  BM_Copy/2/8                 1.52µs ± 5%    0.37µs ± 1%     -76.01%  (p=0.008 n=5+5)
  BM_Copy/1/1000              18.7µs ± 3%     4.9µs ± 2%     -73.85%  (p=0.008 n=5+5)
  BM_Move/2/8                 0.03ns ± 2%   13.42ns ± 1%  +40877.10%  (p=0.016 n=4+5)
  BM_Move/1/1000              0.03ns ± 0%   13.54ns ± 3%  +40930.30%  (p=0.016 n=4+5)
  BM_ForEach/2/8              26.4ns ± 1%    27.9ns ± 2%      +5.77%  (p=0.008 n=5+5)
  BM_ForEach/1/1000            271ns ± 1%     273ns ± 0%      +0.81%  (p=0.016 n=5+4)
  BM_Iterate/2/8              25.5ns ± 3%    23.9ns ± 8%        ~     (p=0.151 n=5+5)
  BM_Iterate/1/1000            272ns ± 2%     271ns ± 1%        ~     (p=0.984 n=5+5)

  name                        old allocs/op  new allocs/op  delta
  BM_Construct/2/8               373 ± 0%       276 ± 0%      -26.01%  (p=0.008 n=5+5)
  BM_Construct/1/1000          5.00k ± 0%     4.00k ± 0%      -20.00%  (p=0.008 n=5+5)
  BM_ConstructUnowned/2/8       99.0 ± 0%       2.0 ± 0%      -97.98%  (p=0.008 n=5+5)
  BM_ConstructUnowned/1/1000   1.00k ± 0%     0.00k ± 0%      -99.80%  (p=0.008 n=5+5)
  BM_Copy/2/8                    105 ± 0%        19 ± 0%      -81.90%  (p=0.008 n=5+5)
  BM_Copy/1/1000               1.31k ± 0%     0.25k ± 0%      -80.84%  (p=0.008 n=5+5)
  BM_Move/2/8                   23.0 ± 0%      17.0 ± 0%      -26.09%  (p=0.008 n=5+5)
  BM_Move/1/1000                 313 ± 0%       250 ± 0%      -20.13%  (p=0.008 n=5+5)
  BM_ForEach/2/8                23.0 ± 0%      17.0 ± 0%      -26.09%  (p=0.008 n=5+5)
  BM_ForEach/1/1000              313 ± 0%       250 ± 0%      -20.13%  (p=0.008 n=5+5)
  BM_Iterate/2/8                23.0 ± 0%      17.0 ± 0%      -26.09%  (p=0.008 n=5+5)
  BM_Iterate/1/1000              313 ± 0%       250 ± 0%      -20.13%  (p=0.008 n=5+5)

  PiperOrigin-RevId: 205154861
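  A rough illustration of the layout this change describes; the names and fields below are hypothetical, not the actual XLA declarations. The idea is that ShapeIndex gains inline storage so shallow indices avoid heap allocation, and the lookup table is kept apart from node payloads so index traversal touches fewer cachelines.

    // Hypothetical sketch only, not the real xla::ShapeTree internals.
    #include <cstdint>
    #include <utility>
    #include <vector>
    #include "absl/container/inlined_vector.h"

    // Most ShapeIndexes are shallow, so a small inline capacity keeps the
    // common case off the heap.
    using ShapeIndex = absl::InlinedVector<int64_t, 2>;

    template <typename T>
    class ShapeTreeSketch {
      // Node payloads, stored contiguously.
      std::vector<std::pair<ShapeIndex, T>> nodes_;

      // Lookup-only table kept separate from the payloads, so walking a
      // ShapeIndex pulls in fewer cachelines than if each node carried its
      // child links inline next to its data.
      struct IndexTableEntry {
        uint32_t node_id;         // position in nodes_
        uint32_t children_start;  // first child entry in index_table_
      };
      std::vector<IndexTableEntry> index_table_;
    };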
* Automated rollback of commit 81161f9d9987a8eb70793d95048c20be34292859 (A. Unique TensorFlower, 2018-07-18)
  Rollback of 81161f9d9987a8eb70793d95048c20be34292859 due to internal breakage. PiperOrigin-RevId: 205107655
* - Use InlinedVector for ShapeIndex. (Yunxing Dai, 2018-07-17)
  (Original landing of the change re-landed above; the description and benchmark results are identical to that entry.) PiperOrigin-RevId: 205024687
* Do not count empty tuples as having one leaf node. (A. Unique TensorFlower, 2018-06-12)
  PiperOrigin-RevId: 200327849
* [XLA] Optimize ShapeTree<T> (A. Unique TensorFlower, 2018-05-22)
  This optimizes ShapeTree quite significantly. In particular this optimizes for the common case of querying/iterating, copying and moving ShapeTrees.
  * Allocate all ShapeTreeNodes inside a single, owned, vector. This reduces the number of memory allocations and improves cache performance.
  * Instead of storing children nodes as unique_ptrs, store them as indices into the owning container's vector. This allows cheap copy-construction (a std::vector POD copy) and doesn't change the fast path (dereferencing a pointer is just as fast as dereferencing a base + offset).
  * Instead of a unique_ptr<Shape>, use a shared_ptr<Shape>. This removes a load of copy-construction overhead at the cost of a shared_ptr over a unique_ptr (one extra allocation).
  * Instead of computing ShapeIndexes on-demand in the iterators/ForEach*, precompute them during construction time. This adds a few more bytes per ShapeTree, but now we can...
  * ... store a std::pair<ShapeIndex, T> as the ShapeTreeNode's data element. This allows us to provide a std::pair<K,V>&, STL-like interface from iterators without going through any of the previous unique_ptr hacks around storage lifetimes.
  * Because we no longer need to iterate from the beginning to build up the ShapeIndex, we can now offer a ::find() function to return an iterator for a ShapeIndex in O(K) time. As the iteration order is guaranteed to be pre-order, this can be used (and will be, later) to speed up the fast-path of mutating a subtree of a ShapeTree from tf2xla::ExtractSubBuffers.
  * Similarly, because we now have a very standard, cheap STL interface with no performance cliffs, we can hopefully improve ShapedBuffer's copy and move constructors to be cheaper.
  PiperOrigin-RevId: 197548717
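  A hedged sketch of the node layout those bullets describe, using stand-in types and hypothetical member names: children are vector indices rather than owning pointers, each node carries its precomputed ShapeIndex alongside its payload, and find() resolves a K-element index in O(K) hops.

    #include <cstdint>
    #include <memory>
    #include <utility>
    #include <vector>

    struct Shape {};                          // stand-in for xla::Shape
    using ShapeIndex = std::vector<int64_t>;  // stand-in for xla::ShapeIndex

    template <typename T>
    class OptimizedShapeTreeSketch {
      struct Node {
        std::pair<ShapeIndex, T> data;  // precomputed index + payload
        std::vector<size_t> children;   // indices into nodes_, not unique_ptrs
      };
      std::vector<Node> nodes_;       // single owned vector, pre-order layout
      std::shared_ptr<Shape> shape_;  // shared: copying the tree doesn't clone the Shape

     public:
      // O(K) lookup: follow one child offset per ShapeIndex element.
      Node* find(const ShapeIndex& index) {
        size_t node = 0;  // pre-order guarantees the root is element 0
        for (int64_t i : index) {
          const auto& kids = nodes_[node].children;
          if (i < 0 || static_cast<size_t>(i) >= kids.size()) return nullptr;
          node = kids[i];
        }
        return &nodes_[node];
      }
    };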
* Change HloSharding to allow getting a ShapeTree for non-tuple types. (A. Unique TensorFlower, 2017-11-10)
  Add reverse iteration to ShapeTree. PiperOrigin-RevId: 175341255
* Fix const_iterator in ShapeTree. (Mark Heffernan, 2017-08-04)
  The existing implementation didn't work because const Foo<T>* and Foo<const T>* are not convertible in C++, which broke the internal machinery of the iterator in the const case. PiperOrigin-RevId: 164276236
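  The C++ rule behind this fix, shown with generic names rather than the real iterator types: a template instantiated over T and one instantiated over const T are unrelated classes, so pointers between them don't convert.

    template <typename T>
    struct Foo {};

    void Demo(const Foo<int>* p) {
      // Foo<const int>* q = p;  // would not compile: Foo<int> and
      //                         // Foo<const int> are unrelated types, so no
      //                         // conversion exists; the const iterator needs
      //                         // its own machinery instead of this cast.
      (void)p;
    }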
* Add an iterator implementing std::forward_iterator_tag to ShapeTree. (A. Unique TensorFlower, 2017-08-02)
  This allows it to be iterated over by a for-loop. PiperOrigin-RevId: 163950729
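  A generic sketch of what implementing std::forward_iterator_tag involves (not ShapeTree's actual iterator): the standard member aliases plus operator++/*/==/!= are what let range-based for loops and <algorithm> functions accept the type.

    #include <cstddef>
    #include <iterator>

    template <typename T>
    class NodeIter {
     public:
      // The five aliases std::iterator_traits expects.
      using iterator_category = std::forward_iterator_tag;
      using value_type = T;
      using difference_type = std::ptrdiff_t;
      using pointer = T*;
      using reference = T&;

      explicit NodeIter(T* p) : p_(p) {}
      reference operator*() const { return *p_; }
      NodeIter& operator++() { ++p_; return *this; }
      NodeIter operator++(int) { NodeIter tmp = *this; ++p_; return tmp; }
      bool operator==(const NodeIter& o) const { return p_ == o.p_; }
      bool operator!=(const NodeIter& o) const { return p_ != o.p_; }

     private:
      T* p_;  // a real ShapeTree iterator would walk tree nodes instead
    };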
* Speed up TuplePointsToAnalysis. (Justin Lebar, 2017-06-28)
  This analysis is one of the most expensive parts of the HLO optimization pipeline.
  - Avoid one or two unnecessary hashtable lookups in PopulateDefinedBuffersAndAliases.
  - Add a mode to ShapeTree wherein we avoid copying Shapes.
  - Use templated functors rather than std::function in ShapeTree's iterators, thus avoiding the overhead of std::function.
  PiperOrigin-RevId: 160487485
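  A small illustration of the std::function-to-templated-functor point, using a generic visitor rather than the actual ShapeTree API: with a template parameter the callable's concrete type is visible to the compiler, so the per-element call can be inlined instead of going through std::function's type-erased dispatch.

    #include <cstdint>
    #include <functional>
    #include <vector>

    // Type-erased: every element visit is an indirect call, and constructing
    // the std::function may heap-allocate for large captures.
    void VisitErased(const std::vector<int64_t>& v,
                     const std::function<void(int64_t)>& fn) {
      for (int64_t x : v) fn(x);
    }

    // Templated: fn keeps its concrete lambda type, so the call can inline.
    template <typename Fn>
    void VisitTemplated(const std::vector<int64_t>& v, Fn&& fn) {
      for (int64_t x : v) fn(x);
    }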
* [XLA] Simplify Shape traversal visitors. (Mark Heffernan, 2017-06-06)
  Simplify shape traversal visitors in ShapeUtil and ShapeTree. Add a non-Status form because most uses of the traversal methods do not use it, and remove the is_leaf parameter from ShapeTree.ForEach* as it is not frequently used. PiperOrigin-RevId: 158201574
* [XLA] Various improvements to ShapeTree. (Mark Heffernan, 2017-06-01)
  Add support for holding non-copyable types, operator==, and a CopySubtreeFrom method for copying a subtree from one ShapeTree to another. PiperOrigin-RevId: 157777636
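  A hypothetical usage sketch of the two additions; the CopySubtreeFrom parameter list below is an assumption inferred from the description above, not quoted from shape_tree.h.

    // Hypothetical usage, shown as comments because the exact signature is
    // assumed rather than confirmed by this log:
    //
    //   ShapeTree<int> src(nested_shape, /*init_value=*/1);
    //   ShapeTree<int> dst(nested_shape, /*init_value=*/0);
    //
    //   // Copy src's subtree rooted at index {0} over dst's subtree at {1}:
    //   dst.CopySubtreeFrom(src, /*source_base_index=*/{0},
    //                       /*target_base_index=*/{1});
    //
    //   // The new operator== then lets tests compare whole trees:
    //   EXPECT_EQ(expected_tree, dst);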
* Fix ShapeTree operator= to actually work. Also added some tests. (A. Unique TensorFlower, 2017-04-17)
  Change: 153385325
* [XLA] Improve ShapeTree implementation. (A. Unique TensorFlower, 2017-04-16)
  1) Ensure PODs are zero-initialized via the shape-constructor; i.e. ShapeTree<bool>(shape) has all data items initialized to false.
  2) Add a default constructor, which makes it possible to use operator[] in a FlatMap<_, ShapeTree<_>>. This creates a tree where !ShapeTree::IsValid.
  3) Change the representation so that the root node holds the shape by value, while non-root nodes don't hold a shape. This is "cleaner" than holding an unused shape in non-root nodes, and also simplifies the implementation. There is now a struct internal::ShapeTreeNode only used by ShapeTree.
  4) Simplify constructors to delegate to a single implementation.
  5) Fix bug in the ShapeTree(Shape shape, T init_value) constructor, which caused the existing code to only work for T=bool.
  6) Fix documentation to indicate that a T data value is held for every node in the tree, not just leaf nodes. That was the original behavior, which is unchanged in this CL; it's just the documentation that was wrong.
  Change: 153305130
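  A toy analog of items 1) and 2), not the real ShapeTree: value-initializing the element storage (T{}) is what makes a ShapeTree<bool> start all-false, and a default constructor producing an invalid tree is what a map's operator[] needs to create a placeholder value.

    #include <cassert>
    #include <vector>

    template <typename T>
    struct ToyTree {
      ToyTree() = default;           // invalid/empty tree; enables map[key]
      explicit ToyTree(int num_nodes)
          : data(num_nodes, T{}) {}  // T{} value-initializes PODs (false, 0)
      bool IsValid() const { return !data.empty(); }
      std::vector<T> data;
    };

    int main() {
      ToyTree<bool> t(8);
      assert(t.IsValid());
      assert(t.data[3] == false);  // zero-initialized, matching item 1)
      ToyTree<bool> d;             // default-constructed, matching item 2)
      assert(!d.IsValid());
    }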
* Using GMock matchers in XLA tests. (A. Unique TensorFlower, 2017-04-11)
  Change: 152823724
* Initial open-source release of XLA: Accelerated Linear Algebra. (Peter Hawkins, 2017-01-09)
  XLA is a compiler-based linear algebra execution engine that targets CPUs, GPUs and custom accelerators. XLA is still experimental; we are releasing it early to get the community involved. Change: 143990941