path: root/tensorflow/compiler/xla/service/buffer_liveness.h
Commit message    Author    Date
* [XLA] Migrate from gtl::FlatSet to absl::flat_hash_set (Benjamin Kramer, 2018-10-01)
  PiperOrigin-RevId: 215324035
* [XLA] Migrate from gtl::FlatMap to absl::flat_hash_map (Benjamin Kramer, 2018-10-01)
  PiperOrigin-RevId: 215272497
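A minimal sketch of the migration pattern these two commits apply; the absl container API is real, but the types and call sites below are illustrative, not code from buffer_liveness.h:

    #include <cstdint>
    #include <string>

    #include "absl/container/flat_hash_map.h"
    #include "absl/container/flat_hash_set.h"

    int main() {
      // Before: tensorflow::gtl::FlatSet<int64_t> and gtl::FlatMap<...>.
      // After: the absl equivalents, which are near drop-in replacements.
      absl::flat_hash_set<int64_t> live_buffer_ids;  // Illustrative name.
      absl::flat_hash_map<std::string, int64_t> id_by_name;

      live_buffer_ids.insert(42);
      id_by_name["param0"] = 42;
      return live_buffer_ids.contains(42) ? 0 : 1;
    }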
* [XLA] s/tensorflow::Status/Status/. (Justin Lebar, 2018-05-11)
  These are type aliases of one another; we'd like to be consistent and use the shorter one.
  PiperOrigin-RevId: 196322955
* [XLA] Separate logical buffer creation and ownership out of tuple pointer analysis. (A. Unique TensorFlower, 2017-08-21)
  PiperOrigin-RevId: 165963591
* Reduce XLA compile time by ~7% for a convolutional image model. (A. Unique TensorFlower, 2017-08-18)
  * Added CompactPointerSet<T>, which is optimized for set size <= 1.
  * Changed expensive CHECKs to DCHECKs in buffer_assignment.cc.
  * Reserve space in DFS state array before starting DFS.
  * Use unsigned arithmetic in DFS state maintenance.
  * HloInstruction:
    - Moved frequently used fields to start for better cache locality.
    - Use InlinedVector instead of vector for operand array.
    - Use InlinedVector instead of vector for DFS stack.
  * Pre-compute "is array" and "is tuple" for LogicalBuffer.
  * PointsToSet:
    - Combine two ShapeTrees into one.
    - Use CompactPointerSet instead of std::set to hold sources.
    - Use CompactPointerSet instead of std::set to hold flattened buffers.
  * ShapeTree: use unique_ptr instead of optional for shape storage (reduces size and destruction overhead).
  * Add proper const qualifiers to some FlatSet iterator methods.
  Co-author=jeff
  PiperOrigin-RevId: 165759117
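The CompactPointerSet<T> above is the main new data structure. A minimal sketch of a small-size-optimized pointer set in that spirit, assuming a one-inline-slot-plus-overflow representation (the real class's layout may differ):

    #include <cstddef>
    #include <unordered_set>

    template <typename T>
    class SmallPointerSet {
     public:
      // Inserts p; the common size<=1 case touches only the inline slot.
      void insert(T* p) {
        if (single_ == nullptr && overflow_.empty()) {
          single_ = p;
        } else if (p != single_) {
          if (single_ != nullptr) {
            overflow_.insert(single_);
            single_ = nullptr;
          }
          overflow_.insert(p);
        }
      }
      bool contains(T* p) const {
        return p == single_ || overflow_.count(p) > 0;
      }
      std::size_t size() const {
        return (single_ != nullptr ? 1 : 0) + overflow_.size();
      }

     private:
      T* single_ = nullptr;              // Fast path: holds the sole element.
      std::unordered_set<T*> overflow_;  // Spill storage for size > 1.
    };

    int main() {
      int a = 0, b = 0;
      SmallPointerSet<int> s;
      s.insert(&a);  // Stays in the inline slot.
      s.insert(&b);  // Spills both elements to the overflow set.
      return (s.size() == 2 && s.contains(&a)) ? 0 : 1;
    }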
* Sped up TuplePointsToAnalysis module by consolidating three hash tables and a vector into two vectors. (Jeffrey A. Dean, 2017-08-03)
  One vector holds per-instruction information (TuplePointsToAnalysis::PerInstruction) and is indexed by HloInstruction::unique_id(); the other (TuplePointsToAnalysis::PerLogicalBuffer) holds per-logical-buffer information, indexed by LogicalBuffer::Id.

  This speeds up compilation of a convolutional image model from 44.4s to 37.39s (15.8% improvement), and reduces the profiling samples in TuplePointsToAnalysis-related code from 17.38s to 11.87s (31.7% improvement).

  Changed the interface used to access the logical buffers collected by the TuplePointsToAnalysis module. Instead of a logical_buffers() accessor that gives access to a whole vector of buffers, we instead expose a num_logical_buffers() method and require clients to iterate sequentially over the buffer id space, calling a logical_buffer(id) accessor to get each individual buffer. (This is necessary because we no longer keep the buffers in a flat std::vector, and we'd rather not leak this kind of representation information through the interface anyway.)

  PiperOrigin-RevId: 164199505
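A minimal sketch of the id-indexed storage and the num_logical_buffers()/logical_buffer(id) accessor pattern described above; LogicalBuffer here is a stand-in struct, and the internals are assumptions rather than the real TuplePointsToAnalysis code:

    #include <cstdint>
    #include <iostream>
    #include <vector>

    struct LogicalBuffer {
      using Id = int64_t;
      Id id;
      // ... shape, defining instruction, etc. in the real class.
    };

    class PointsToAnalysisSketch {
     public:
      LogicalBuffer::Id NewLogicalBuffer() {
        LogicalBuffer::Id id = static_cast<LogicalBuffer::Id>(buffers_.size());
        buffers_.push_back(LogicalBuffer{id});
        return id;
      }
      // Clients iterate the dense id space instead of receiving a vector.
      int64_t num_logical_buffers() const { return buffers_.size(); }
      const LogicalBuffer& logical_buffer(LogicalBuffer::Id id) const {
        return buffers_[id];
      }

     private:
      // One vector indexed by LogicalBuffer::Id replaces several hash tables.
      std::vector<LogicalBuffer> buffers_;
    };

    int main() {
      PointsToAnalysisSketch analysis;
      analysis.NewLogicalBuffer();
      analysis.NewLogicalBuffer();
      for (int64_t id = 0; id < analysis.num_logical_buffers(); ++id) {
        std::cout << "buffer " << analysis.logical_buffer(id).id << "\n";
      }
    }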
* Allow access to the module BufferLiveness applies to. (A. Unique TensorFlower, 2017-07-20)
  PiperOrigin-RevId: 162690202
* [XLA] Allow logical buffer coloring to use liveness information. (A. Unique TensorFlower, 2017-07-17)
  PiperOrigin-RevId: 162275084
* [XLA] Make logical buffer coloring run on output of points-to analysis. (A. Unique TensorFlower, 2017-06-27)
  PiperOrigin-RevId: 160354095
* [XLA] Add support for logical buffer coloring. (A. Unique TensorFlower, 2017-06-09)
  Buffers can now be assigned "color" tags. Buffers that have different colors must live in separate allocations.
  PiperOrigin-RevId: 158575828
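A minimal sketch of the coloring constraint, assuming an illustrative Buffer type (not the real LogicalBuffer API): buffers are partitioned by color, and each color gets its own allocation group:

    #include <cstdint>
    #include <map>
    #include <vector>

    using Color = int64_t;

    struct Buffer {
      int64_t id;
      Color color;  // Different colors may never share an allocation.
    };

    // Partition buffers into per-color groups; a real assigner would then run
    // offset assignment within each group independently.
    std::map<Color, std::vector<Buffer>> GroupByColor(
        const std::vector<Buffer>& buffers) {
      std::map<Color, std::vector<Buffer>> groups;
      for (const Buffer& b : buffers) groups[b.color].push_back(b);
      return groups;
    }

    int main() {
      std::vector<Buffer> buffers = {{0, 0}, {1, 1}, {2, 0}};
      auto groups = GroupByColor(buffers);
      return groups.size() == 2 ? 0 : 1;  // Two colors -> two separate pools.
    }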
* [TF:XLA] Reduce sequential memory usage via better ordering and simulated heap. (A. Unique TensorFlower, 2017-03-02)
  The choice of instruction ordering, and the minimization of fragmentation once we've chosen an order, are two large inter-related factors wrt overall memory usage. The approach in this CL uses heuristics to do better on both, but neither problem is completely solved.

  To pick a better ordering (the larger factor), the approach is to try the original list-scheduler based ordering, and to also try a DFS based ordering. We pick the ordering that yields a smaller minimum memory, computed with the simulated heap, ignoring fragmentation. Note that this is the absolute minimum memory for a given ordering.

  To minimize fragmentation, the approach is to run a heap simulation on temporary buffers. We still try to re-use existing allocations when possible, but instead of creating new allocations for temp buffers, we collect all the leftovers and use a heap to pack them. The heap algorithm that gave the best results is "lazy best-fit": a variant of traditional best-fit that sometimes delays offset assignment until Free is called, in the hopes of yielding larger free chunks.

  Here are some measurements of the temp buffer sizes for GNMT encoder training (a stacked LSTM), trying various combinations of instruction ordering and heap simulation to show their joint impact. Lower is better.

    List-scheduler order, no heap simulation    33.33GiB
    List-scheduler order, with heap simulation  25.09GiB
    Minimized DFS order, no heap simulation     16.59GiB
    Arbitrary DFS order, no heap simulation     15.05GiB (old)
    Arbitrary DFS order, with heap simulation   12.57GiB
    Minimized DFS order, with heap simulation   11.71GiB (new)

  Note that the original list scheduler order is much worse than DFS on stacked LSTMs, but (not shown here) is much better than DFS on convolutions like Inception. Also note that heap simulation packs things tighter for all instruction orders in this example, but to varying degrees.

  Change: 149049028
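A minimal sketch of the heap-simulation idea, using plain (eager) best-fit for brevity; the commit's "lazy best-fit" additionally defers some offset choices until Free time, which this sketch does not model:

    #include <algorithm>
    #include <cstdint>
    #include <iostream>
    #include <map>

    class BestFitHeapSim {
     public:
      // Returns the chosen offset: the smallest free gap that fits the
      // request, else the current end of the heap.
      int64_t Alloc(int64_t id, int64_t size) {
        int64_t best_offset = -1, best_gap = INT64_MAX, prev_end = 0;
        for (const auto& [offset, chunk] : chunks_) {
          int64_t gap = offset - prev_end;
          if (gap >= size && gap < best_gap) {
            best_gap = gap;
            best_offset = prev_end;
          }
          prev_end = offset + chunk.size;
        }
        if (best_offset < 0) best_offset = prev_end;  // Extend the heap.
        chunks_[best_offset] = {id, size};
        peak_ = std::max(peak_, best_offset + size);
        return best_offset;
      }
      void Free(int64_t id) {
        for (auto it = chunks_.begin(); it != chunks_.end(); ++it) {
          if (it->second.id == id) { chunks_.erase(it); return; }
        }
      }
      int64_t peak() const { return peak_; }

     private:
      struct Chunk { int64_t id; int64_t size; };
      std::map<int64_t, Chunk> chunks_;  // offset -> live chunk, by offset.
      int64_t peak_ = 0;
    };

    int main() {
      BestFitHeapSim sim;
      sim.Alloc(/*id=*/0, /*size=*/64);
      sim.Alloc(1, 32);
      sim.Free(0);       // Leaves a 64-byte hole at offset 0.
      sim.Alloc(2, 48);  // Best fit reuses the hole instead of growing.
      std::cout << "peak bytes: " << sim.peak() << "\n";  // 96, not 144.
    }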
* Break out HloOrdering classes into separate files. (Mark Heffernan, 2017-01-25)
  Add CreateMemoryMinimizingSequence, which constructs a sequence of the instructions in an HLO module that heuristically minimizes the total size of live buffers containing HLO outputs.
  Change: 145599747
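A minimal sketch of a greedy ordering in the spirit of CreateMemoryMinimizingSequence, under the simplifying assumption that "schedule the ready node with the smallest output" approximates minimizing live bytes; the real implementation's heuristics differ:

    #include <algorithm>
    #include <cstdint>
    #include <vector>

    struct Node {
      int64_t output_bytes;       // Size of the buffer this node defines.
      std::vector<int> operands;  // Indices of producer nodes.
    };

    // Greedy topological ordering: repeatedly schedule the ready node with
    // the smallest output, a crude proxy for minimizing live buffer growth.
    std::vector<int> MemoryMinimizingSequence(const std::vector<Node>& nodes) {
      int n = nodes.size();
      std::vector<int> pending(n);  // Unscheduled operand counts.
      std::vector<std::vector<int>> users(n);
      for (int i = 0; i < n; ++i) {
        pending[i] = nodes[i].operands.size();
        for (int op : nodes[i].operands) users[op].push_back(i);
      }
      std::vector<int> ready, order;
      for (int i = 0; i < n; ++i) if (pending[i] == 0) ready.push_back(i);
      while (!ready.empty()) {
        auto best = std::min_element(
            ready.begin(), ready.end(), [&](int a, int b) {
              return nodes[a].output_bytes < nodes[b].output_bytes;
            });
        int next = *best;
        ready.erase(best);
        order.push_back(next);
        for (int u : users[next]) if (--pending[u] == 0) ready.push_back(u);
      }
      return order;  // A valid topological order of all nodes.
    }

    int main() {
      // Diamond: 0 feeds 1 and 2; both feed 3. The smaller producer (1)
      // is scheduled before the larger one (2).
      std::vector<Node> g = {{8, {}}, {4, {0}}, {16, {0}}, {8, {1, 2}}};
      return MemoryMinimizingSequence(g).size() == 4 ? 0 : 1;
    }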
* Initial open-source release of XLA: Accelerated Linear Algebra. (Peter Hawkins, 2017-01-09)
  XLA is a compiler-based linear algebra execution engine that targets CPUs, GPUs and custom accelerators. XLA is still experimental; we are releasing it early to get the community involved.
  Change: 143990941