path: root/tensorflow/compiler/xla/service/heap_simulator_test.cc
Commit message (author, date)
* [TF:XLA] Improve the accounting for subcomputations in the heap simulator. (Dimitris Vardoulakis, 2018-10-03)
  Subtract the size of the aliased buffers from the subcomputation estimate instead of from the current computation. This way, the memory estimate for the current computation is more accurate. For the newly added test, the heap simulation calculates 48 bytes at head instead of the correct 64 bytes.
  PiperOrigin-RevId: 215653047
* [XLA] Migrate from gtl::FlatMap to absl::flat_hash_map (Benjamin Kramer, 2018-10-01)
  PiperOrigin-RevId: 215272497
* [XLA] Add a global decreasing size best-fit buffer allocation algorithm, which sorts buffers by size regardless of their alloc/free time. (Yuanzhong Xu, 2018-09-21)
  It uses an interval tree to avoid conflicting allocations. Also changed to choose the best result from the new algorithm and the old one.
  PiperOrigin-RevId: 214032637
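The idea in the commit above can be sketched as follows. This is a minimal illustration and not the XLA implementation: `Buffer` and `PackDecreasingSizeBestFit` are hypothetical names, and a linear scan over already placed buffers stands in for the interval tree the commit mentions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical buffer record: live over the closed time interval [start, end].
struct Buffer {
  int64_t start;
  int64_t end;
  int64_t size;
  int64_t offset;  // Assigned by the packer below.
};

// Places buffers largest-first and returns the resulting heap size.
int64_t PackDecreasingSizeBestFit(std::vector<Buffer>& buffers) {
  std::vector<Buffer*> order;
  for (Buffer& b : buffers) {
    assert(b.size > 0 && b.start <= b.end);
    order.push_back(&b);
  }
  // Sort by size only; alloc/free times play no role in the ordering.
  std::stable_sort(order.begin(), order.end(),
                   [](const Buffer* a, const Buffer* b) { return a->size > b->size; });

  std::vector<Buffer*> placed;
  int64_t heap_size = 0;
  for (Buffer* b : order) {
    // Offsets occupied by placed buffers whose live ranges overlap b's.
    // (An interval tree would answer this query faster than a linear scan.)
    std::vector<std::pair<int64_t, int64_t>> busy;
    for (const Buffer* p : placed) {
      if (p->start <= b->end && b->start <= p->end) {
        busy.emplace_back(p->offset, p->offset + p->size);
      }
    }
    std::sort(busy.begin(), busy.end());

    // Walk the gaps between busy chunks and keep the tightest one that fits.
    int64_t best_offset = -1;
    int64_t best_gap = std::numeric_limits<int64_t>::max();
    int64_t cursor = 0;
    for (const auto& chunk : busy) {
      const int64_t gap = chunk.first - cursor;
      if (gap >= b->size && gap < best_gap) {
        best_gap = gap;
        best_offset = cursor;
      }
      cursor = std::max(cursor, chunk.second);
    }
    // No gap fits: place the buffer above every conflicting chunk.
    if (best_offset < 0) best_offset = cursor;

    b->offset = best_offset;
    heap_size = std::max(heap_size, b->offset + b->size);
    placed.push_back(b);
  }
  return heap_size;
}
```

For three buffers with (start, end, size) of (0, 1, 10), (2, 3, 10) and (0, 3, 20), the 20-byte buffer is placed first at offset 0 and the two 10-byte buffers, which are never live at the same time, share offset 20, for a heap of 30 bytes.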
* [TF:XLA] Migrate unit tests to use the HLO verifier (only tests where the conversion is mostly automated). (Dimitris Vardoulakis, 2018-09-10)
  PiperOrigin-RevId: 212303594
* Rollforward of cl/211656888 after fixing failing unit test. (Mark Heffernan, 2018-09-05)
  *** Original change description ***
  Add HloSchedule class representing a sequential order of an HloModule. Currently we represent a sequential schedule of a module using a SequentialHloOrdering::HloModuleSequence which is a type alias of a bare map from HloComputation* to std::vector<HloInstruction*>. This CL replaces this with a proper class which results in better encap... ***
  PiperOrigin-RevId: 211726890
* BEGIN_PUBLIC (Mark Heffernan, 2018-09-05)
  Automated rollback of commit 7fa693209fe238478739b3982f652a7e35be91f3
  PiperOrigin-RevId: 211681957
* [TF:XLA] Define DefaultPrecisionConfig in HloTestBase and delete multiple duplicate definitions. (Dimitris Vardoulakis, 2018-09-05)
  PiperOrigin-RevId: 211662523
* Add HloSchedule class representing a sequential order of an HloModule. (Mark Heffernan, 2018-09-05)
  Currently we represent a sequential schedule of a module using a SequentialHloOrdering::HloModuleSequence which is a type alias of a bare map from HloComputation* to std::vector<HloInstruction*>. This CL replaces this with a proper class which results in better encapsulation of code which deals with schedules and better enforcement of invariants.
  This CL also fixes a corner-case bug in dataflow analysis, where values of instructions which are live out of the computation erroneously did not interfere with the values of instructions scheduled after the root instruction.
  PiperOrigin-RevId: 211656888
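To illustrate the motivation for replacing a bare map with a class, here is a minimal sketch. `Instruction`, `Computation`, `ModuleSequence` and `Schedule` are simplified stand-ins invented for this example; they are not the real HloInstruction, HloComputation or HloSchedule definitions, and the invariant checked in `Verify` is an assumption about what such a class might enforce.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

// Simplified stand-ins; the real HLO classes are much richer.
struct Instruction { std::string name; };
struct Computation {
  std::string name;
  std::vector<Instruction*> instructions;
};

// Before: a bare alias that any caller can leave in an inconsistent state.
using ModuleSequence =
    std::unordered_map<const Computation*, std::vector<Instruction*>>;

// After: a small class that owns the map and can check its own invariants.
class Schedule {
 public:
  void set_sequence(const Computation* computation,
                    std::vector<Instruction*> sequence) {
    sequences_[computation] = std::move(sequence);
  }
  bool is_computation_scheduled(const Computation* computation) const {
    return sequences_.count(computation) > 0;
  }
  const std::vector<Instruction*>& sequence(const Computation* computation) const {
    assert(is_computation_scheduled(computation));
    return sequences_.at(computation);
  }
  // True if every scheduled computation lists each of its instructions
  // exactly once (an assumed invariant, chosen for illustration).
  bool Verify() const {
    for (const auto& entry : sequences_) {
      const Computation* computation = entry.first;
      const std::vector<Instruction*>& sequence = entry.second;
      if (sequence.size() != computation->instructions.size()) return false;
      std::unordered_set<const Instruction*> seen(sequence.begin(), sequence.end());
      if (seen.size() != sequence.size()) return false;
      for (const Instruction* instruction : computation->instructions) {
        if (seen.count(instruction) == 0) return false;
      }
    }
    return true;
  }

 private:
  std::unordered_map<const Computation*, std::vector<Instruction*>> sequences_;
};
```

With the class, a sequence that repeats or omits an instruction can be rejected in one place (`Verify`) instead of being checked ad hoc by every consumer of the map.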
* [XLA] Make kConvolution, kDot HLO attributes mandatory (David Majnemer, 2018-09-04)
  HLO transformations would forget to propagate the feature depth attribute. Making these attributes mandatory, while slightly less convenient for tests, makes HLO transformations more robust.
  PiperOrigin-RevId: 211490160
* [XLA] Use absl::make_unique instead of xla::MakeUnique. (Justin Lebar, 2018-08-20)
  Same for WrapUnique.
  PiperOrigin-RevId: 209531124
* [TF:XLA] Split literal_util into {literal, literal_util}. (Kay Zhu, 2018-07-03)
  Currently the Literal classes sit in literal_util.{h,cc} instead of literal.{h,cc}. That file also contains helper functions that would fit better in their own separate class/namespace. This change starts this process by moving most static factory methods to the LiteralUtil namespace.
  PiperOrigin-RevId: 203217065
* Enable multioutput fusion operand buffer reuse. (Yunxing Dai, 2018-06-21)
  - Enable multioutput fusion operand buffer reuse.
  - Fix a bug in heap simulator where a buffer can be reused twice.
  - Add unit test.
  PiperOrigin-RevId: 201567720
* [TF:XLA] Account for subcomputations in heap simulator during scheduling. (Dimitris Vardoulakis, 2018-06-14)
  PiperOrigin-RevId: 200646674
* [TF:XLA] Move methods MinimumMemoryFor... from hlo_scheduling to heap_simulator. (Dimitris Vardoulakis, 2018-06-12)
  These methods have nothing to do with scheduling. Also, rename methods CreateMemoryMinimizingSequence in hlo_scheduling.
  PiperOrigin-RevId: 200254100
* Update HeapSimulator to use BufferValue. (Jeremy Lau, 2018-05-11)
  PiperOrigin-RevId: 196293610
* BufferValue is a new base class for LogicalBuffer and HloValue. This makes it easier to migrate from TuplePointsToAnalysis/LogicalBuffer to HloDataflowAnalysis/HloValue. (Jeremy Lau, 2018-05-02)
  No functional changes.
  PiperOrigin-RevId: 195179676
* [TF:XLA] (Dimitris Vardoulakis, 2018-04-28)
  - Require a module config when creating an HloModule.
  - All tests using HloTestBase create a module using CreateNewModule.
  PiperOrigin-RevId: 194684585
* [XLA] GTE of a certain element of the tuple does not need to keep other elements alive. (Michael Kuperstein, 2018-02-26)
  This achieves two things:
  1. Heap simulation runtime is no longer quadratic in the number of tuple elements (as we don't add each GetTupleElement to the liveset of each buffer defined by the tuple).
  2. A reduction in the heap memory footprint.
  PiperOrigin-RevId: 187079787
* [XLA] Adds Dot with DotDimensionNumbers proto for specifying arbitrary contracting and batch dimensions. (A. Unique TensorFlower, 2017-11-30)
  PiperOrigin-RevId: 177481231
* Use xla/tests:xla_internal_test_main for all tests under tf/compiler/xla and remove any main() definitions in tests. (Mark Heffernan, 2017-09-12)
  This enables use of flags in all tests.
  PiperOrigin-RevId: 168424796
* Add flag parsing to more tests in xla/service, specifically those which build HLO graphs. (Mark Heffernan, 2017-09-12)
  This enables, for example, dumping of the graphs with --xla_generate_hlo_graph. Also remove some superfluous tensorflow test_main dependencies.
  PiperOrigin-RevId: 168406746
* Reduce XLA compile time by ~7% for a convolutional image model: (A. Unique TensorFlower, 2017-08-18)
  * Added CompactPointerSet<T>, which is optimized for set size <= 1.
  * Changed expensive CHECKs to DCHECKS in buffer_assignment.cc
  * Reserve space in DFS state array before starting DFS.
  * Use unsigned arithmetic in DFS state maintenance.
  * HloInstruction:
    - Moved frequently used fields to start for better cache locality.
    - Use InlinedVector instead of vector for operand array.
    - Use InlinedVector instead of vector for DFS stack.
  * Pre-compute "is array" and "is tuple" for LogicalBuffer.
  * PointsToSet:
    - Combine two ShapeTrees into one.
    - Use CompactPointerSet instead of std::set to hold sources.
    - Use CompactPointerSet instead of std::set to hold flattened buffers.
  * ShapeTree: use unique_ptr instead of optional for shape storage (reduces size and destruction overhead).
  * Add proper const qualifiers to some FlatSet iterator methods.
  Co-author=jeff
  PiperOrigin-RevId: 165759117
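A set that is "optimized for set size <= 1" can be sketched along these lines. `TinyPointerSet` is a hypothetical class written for this example; it is not the CompactPointerSet from the commit, whose actual layout and interface are not shown in this log.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <set>

// Illustrative only: a pointer set that needs no heap allocation while it
// holds at most one element.
template <typename T>
class TinyPointerSet {
 public:
  // Returns true if `p` was newly inserted.
  bool insert(T* p) {
    assert(p != nullptr);
    if (overflow_) return overflow_->insert(p).second;
    if (single_ == nullptr) {
      single_ = p;
      return true;
    }
    if (single_ == p) return false;
    // Second distinct element: spill both into a std::set.
    overflow_.reset(new std::set<T*>{single_, p});
    single_ = nullptr;
    return true;
  }

  bool contains(T* p) const {
    if (overflow_) return overflow_->count(p) > 0;
    return p != nullptr && single_ == p;
  }

  std::size_t size() const {
    if (overflow_) return overflow_->size();
    return single_ == nullptr ? 0 : 1;
  }

 private:
  T* single_ = nullptr;                     // Inline storage for the common case.
  std::unique_ptr<std::set<T*>> overflow_;  // Allocated only for 2+ elements.
};
```

The trade-off is one extra branch per operation in exchange for avoiding a node allocation in the common empty and single-element cases.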
* [XLA] Make logical buffer coloring run on output of points-to analysis. (A. Unique TensorFlower, 2017-06-27)
  PiperOrigin-RevId: 160354095
* Remove class xla::LiteralUtil. NFC (mind-numbingly so). (A. Unique TensorFlower, 2017-06-19)
  This patch removes class xla::LiteralUtil and rewrites every call to use class xla::Literal instead.
  PiperOrigin-RevId: 159446373
* [XLA] Add support for logical buffer coloring. (A. Unique TensorFlower, 2017-06-09)
  Buffers can now be assigned "color" tags. Buffers that have different colors must live in separate allocations.
  PiperOrigin-RevId: 158575828
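The coloring rule stated above amounts to partitioning buffers by color before any allocation sharing is considered. A minimal sketch, with `ColoredBuffer`, `GroupByColor` and `MayShareAllocation` as hypothetical names that do not come from the XLA source:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical record: a buffer id, its size in bytes, and its color tag.
struct ColoredBuffer {
  int id;
  int64_t size;
  int color;
};

// Groups buffer ids by color; each group would be packed into allocations of
// its own, never mixed with buffers of another color.
std::map<int, std::vector<int>> GroupByColor(
    const std::vector<ColoredBuffer>& buffers) {
  std::map<int, std::vector<int>> groups;
  for (const ColoredBuffer& b : buffers) {
    assert(b.size >= 0);
    groups[b.color].push_back(b.id);
  }
  return groups;
}

// Two buffers may share an allocation only if their colors match. Matching
// colors are necessary, not sufficient: liveness still has to permit sharing.
bool MayShareAllocation(const ColoredBuffer& a, const ColoredBuffer& b) {
  return a.color == b.color;
}
```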
* [XLA:HLO] Run HeapSimulator on whole-module if all computations are sequential. (A. Unique TensorFlower, 2017-05-02)
  Previously the HeapSimulator was only run on a per-computation basis. This meant that if you had many sub-computations in your module (e.g. many While loops), the temporary buffers inside the conditions and bodies of the loops were all placed in distinct memory ranges. This is overly pessimistic if all computations in the module are sequential.
  This CL changes the HeapSimulator to also run whole-module simulation, calling Alloc and Free on sub-computation buffers at the appropriate nested spot, right next to the calling instruction. The BufferAssigner is updated to take advantage of this when possible, as is MinimumMemoryForSequence.
  Change: 154908856
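The difference between per-computation and whole-module simulation can be shown with a toy model. This sketch only tracks byte counts, not offsets, and `Step`, `Steps`, `PeakWholeModule` and `PeakPerComputation` are names made up for the example rather than HeapSimulator APIs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// One event in a sequential computation.
struct Step {
  enum Kind { kAlloc, kFree, kCall };
  Kind kind;
  int64_t value;  // Bytes for kAlloc/kFree, callee index for kCall.
};
using Steps = std::vector<Step>;

// Whole-module view: a callee's buffers are allocated and freed at the call
// site, so they only add to what is live in the caller at that moment.
int64_t PeakWholeModule(const std::vector<Steps>& module, int index) {
  int64_t live = 0;
  int64_t peak = 0;
  for (const Step& step : module[index]) {
    switch (step.kind) {
      case Step::kAlloc:
        live += step.value;
        break;
      case Step::kFree:
        live -= step.value;
        assert(live >= 0);
        break;
      case Step::kCall:
        peak = std::max(
            peak, live + PeakWholeModule(module, static_cast<int>(step.value)));
        break;
    }
    peak = std::max(peak, live);
  }
  return peak;
}

// Per-computation view: every computation gets its own memory range, so the
// total is the sum of the individual peaks.
int64_t PeakPerComputation(const std::vector<Steps>& module) {
  int64_t total = 0;
  for (const Steps& steps : module) {
    int64_t live = 0;
    int64_t peak = 0;
    for (const Step& step : steps) {
      if (step.kind == Step::kAlloc) live += step.value;
      if (step.kind == Step::kFree) live -= step.value;
      peak = std::max(peak, live);
    }
    total += peak;
  }
  return total;
}
```

For an entry computation that holds 100 bytes while calling two sub-computations in turn, each needing 50 bytes of temporaries, the per-computation total is 200 bytes while the whole-module peak is 150 bytes, because the two callees are never live at the same time.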
* [TF:XLA] Reduce sequential memory usage via better ordering and simulated heap. (A. Unique TensorFlower, 2017-03-02)
  The choice of instruction ordering, and the minimization of fragmentation once we've chosen an order, are two large inter-related factors with respect to overall memory usage. The approach in this CL uses heuristics to do better on both, but neither problem is completely solved.

  To pick a better ordering (the larger factor), the approach is to try the original list-scheduler based ordering, and to also try a DFS based ordering. We pick the ordering that yields a smaller minimum memory, computed with the simulated heap, ignoring fragmentation. Note that this is the absolute minimum memory for a given ordering.

  To minimize fragmentation, the approach is to run a heap simulation on temporary buffers. We still try to re-use existing allocations when possible, but instead of creating new allocations for temp buffers, we collect all the leftovers and use a heap to pack them. The heap algorithm that gave the best results is "lazy best-fit"; a variant of traditional best-fit that sometimes delays offset assignment until Free is called, in the hopes of yielding larger free chunks.

  Here are some measurements of the temp buffer sizes for GNMT encoder training (a stacked LSTM). Lower is better. I've tried various combinations of instruction ordering and heap simulation, to show the joint impact of these two factors.

    List-scheduler order, no heap simulation      33.33GiB
    List-scheduler order, with heap simulation    25.09GiB
    Minimized DFS order, no heap simulation       16.59GiB
    Arbitrary DFS order, no heap simulation       15.05GiB (old)
    Arbitrary DFS order, with heap simulation     12.57GiB
    Minimized DFS order, with heap simulation     11.71GiB (new)

  Note that the original list scheduler order is much worse than DFS on stacked LSTMs, but (not shown here) is much better than DFS on convolutions like Inception. Also note that heap simulation packs things tighter for all instruction orders in this example, but to varying degrees.
  Change: 149049028
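Picking the ordering with the smaller minimum memory, ignoring fragmentation, can be sketched as below. The model is an assumption made for illustration: each instruction produces one buffer that is allocated when the instruction runs and freed after its last user runs, and a buffer with no users is kept live to the end. `Node`, `MinimumMemoryForOrder` and `PickOrder` are hypothetical names, not the functions added by the commit.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical instruction: the size of its output and the indices of the
// instructions whose outputs it reads.
struct Node {
  int64_t output_bytes;
  std::vector<int> operands;
};

// Peak of the sum of live buffer sizes when running `graph` in `order`.
// Offsets are never assigned, so fragmentation is ignored by construction.
int64_t MinimumMemoryForOrder(const std::vector<Node>& graph,
                              const std::vector<int>& order) {
  const int n = static_cast<int>(graph.size());
  assert(static_cast<int>(order.size()) == n);
  std::vector<int> position(n, -1);
  for (int pos = 0; pos < n; ++pos) {
    assert(order[pos] >= 0 && order[pos] < n);
    position[order[pos]] = pos;
  }

  // last_use[i]: position of the last instruction reading i's output, or -1.
  std::vector<int> last_use(n, -1);
  for (int i = 0; i < n; ++i) {
    for (int operand : graph[i].operands) {
      assert(position[operand] < position[i]);  // Operands run before users.
      last_use[operand] = std::max(last_use[operand], position[i]);
    }
  }

  int64_t live = 0;
  int64_t peak = 0;
  for (int pos = 0; pos < n; ++pos) {
    live += graph[order[pos]].output_bytes;
    peak = std::max(peak, live);
    // Free every buffer whose last user just ran; unused buffers die at the end.
    for (int i = 0; i < n; ++i) {
      const int dies_at = last_use[i] < 0 ? n - 1 : last_use[i];
      if (dies_at == pos) live -= graph[i].output_bytes;
    }
  }
  return peak;
}

// Returns whichever candidate order needs less memory (ties go to `a`).
const std::vector<int>& PickOrder(const std::vector<Node>& graph,
                                  const std::vector<int>& a,
                                  const std::vector<int>& b) {
  return MinimumMemoryForOrder(graph, a) <= MinimumMemoryForOrder(graph, b) ? a : b;
}
```

On a graph with two chains (a 10-byte producer feeding a 1-byte consumer, twice) joined by a final 1-byte add, running both producers first peaks at 21 bytes, while finishing one chain before starting the other peaks at 12 bytes, so the second order is picked.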