Subtract the size of the aliased buffers from the subcomputation estimate instead of from the current computation. This way, the memory estimate for the current computation is more accurate.
For the newly added test, the heap simulation at HEAD computes 48 bytes instead of the correct 64 bytes.
PiperOrigin-RevId: 215653047
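The adjustment described above amounts to simple arithmetic, sketched below with hypothetical names (this is not the actual XLA API): bytes for buffers aliased with the caller are charged once, to the caller, so they are removed from the callee's peak estimate.

```python
def adjusted_subcomputation_bytes(subcomputation_peak, aliased_buffer_bytes):
    """Charge aliased buffers to the caller: subtract their sizes from the
    subcomputation's peak estimate (never going below zero)."""
    return max(0, subcomputation_peak - sum(aliased_buffer_bytes))

# A 64-byte subcomputation peak with one 16-byte aliased buffer contributes
# only 48 bytes on top of the caller's own live set.
print(adjusted_subcomputation_bytes(64, [16]))  # -> 48
```

The parent's estimate then adds this adjusted child peak on top of the buffers live at the call site, instead of double-counting the aliased bytes.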
PiperOrigin-RevId: 215324035
PiperOrigin-RevId: 215272497
which sorts buffers by size regardless of their alloc/free times. It uses an interval tree to avoid conflicting allocations.
Also changed the code to choose the best result from the new algorithm and the old one.
PiperOrigin-RevId: 214032637
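A minimal sketch of the size-sorted placement idea (hypothetical names; a linear scan over placed buffers stands in for the interval tree): buffers are placed largest-first, and two buffers may share address space only when their live intervals do not overlap.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Buffer:
    name: str
    size: int
    start: int  # first program point where the buffer is live
    end: int    # last program point where the buffer is live (inclusive)

def assign_offsets_by_size(buffers):
    """Place buffers in decreasing size order; bump a buffer's offset past any
    already-placed buffer that overlaps it both in time and in address space.
    (A real implementation would query an interval tree instead of scanning.)"""
    placed = []  # (offset, Buffer) pairs
    for buf in sorted(buffers, key=lambda b: -b.size):
        offset, moved = 0, True
        while moved:
            moved = False
            for o, p in placed:
                time_overlap = not (buf.end < p.start or p.end < buf.start)
                addr_overlap = offset < o + p.size and o < offset + buf.size
                if time_overlap and addr_overlap:
                    offset, moved = o + p.size, True  # bump past it and rescan
        placed.append((offset, buf))
    heap_size = max(o + b.size for o, b in placed)
    return {b.name: o for o, b in placed}, heap_size
```

For example, a small buffer whose lifetime starts after a large buffer dies can reuse the large buffer's offset, keeping the overall heap extent down.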
*** Original change description ***
Add HloSchedule class representing a sequential order of an HloModule.
Currently we represent a sequential schedule of a module using a SequentialHloOrdering::HloModuleSequence which is a type alias of a bare map from HloComputation* to std::vector<HloInstruction*>. This CL replaces this with a proper class which results in better encap...
***
PiperOrigin-RevId: 211726890
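The encapsulation win can be illustrated with a toy schedule class (a Python sketch with hypothetical names, not the C++ HloSchedule API): the bare map is hidden behind accessors, and invariants are checked in one place rather than at every use site.

```python
class Schedule:
    """Toy stand-in for a schedule class wrapping {computation: [instructions]}."""

    def __init__(self):
        self._sequences = {}

    def set_sequence(self, computation, instructions):
        self._sequences[computation] = list(instructions)

    def sequence(self, computation):
        return self._sequences[computation]

    def verify(self, module):
        """Check invariants a bare map cannot enforce: every scheduled
        computation exists in the module, and each sequence is a permutation
        of exactly that computation's instructions."""
        for comp, seq in self._sequences.items():
            if comp not in module:
                raise ValueError(f"{comp} not in module")
            if sorted(seq) != sorted(module[comp]):
                raise ValueError(f"sequence for {comp} out of sync")
```

With a bare map, nothing stops a pass from leaving a stale sequence behind after rewriting a computation; a `verify()` entry point makes that a loud failure.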
Automated rollback of commit 7fa693209fe238478739b3982f652a7e35be91f3
PiperOrigin-RevId: 211681957
Currently we represent a sequential schedule of a module using a SequentialHloOrdering::HloModuleSequence which is a type alias of a bare map from HloComputation* to std::vector<HloInstruction*>. This CL replaces this with a proper class which results in better encapsulation of code which deals with schedules and better enforcement of invariants.
This CL also fixes a corner-case bug in dataflow analysis, where values of instructions which are live out of the computation erroneously did not interfere with the values of instructions scheduled after the root instruction.
PiperOrigin-RevId: 211656888
HeapSimulator.
The new modeling of subcomputations is still not entirely accurate, but probably not worth putting more work into, since TuplePointsToAnalysis will be removed from HeapSimulator soon.
PiperOrigin-RevId: 209646234
PiperOrigin-RevId: 200646674
These methods have nothing to do with scheduling.
Also, rename the CreateMemoryMinimizingSequence methods in hlo_scheduling.
PiperOrigin-RevId: 200254100
PiperOrigin-RevId: 196293610
issue Alloc/Free for constants, and enable buffer sharing.
PiperOrigin-RevId: 183836922
PiperOrigin-RevId: 179953488
Also add flags to dump this data in JSON format, for each backend.
This is useful for upcoming debugging tools.
PiperOrigin-RevId: 157178357
Previously the HeapSimulator was only run on a per-computation basis. This meant
that if you had many sub-computations in your module (e.g. many While loops),
the temporary buffers inside the conditions and bodies of the loops were placed
in distinct memory ranges. This is overly pessimistic if all computations in the
module are sequential.
This CL changes the HeapSimulator to also run whole-module simulation, calling
Alloc and Free on sub-computation buffers at the appropriate nested spot, right
next to the calling instruction. The BufferAssigner is updated to take advantage
of this when possible, as is MinimumMemoryForSequence.
Change: 154908856
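The nesting can be sketched with a toy simulator (hypothetical data model, not the real HeapSimulator API): a `('call', name)` entry in a sequence recurses into the callee, so the callee's temporaries are allocated and freed inside the caller's footprint rather than in a disjoint range.

```python
def whole_module_peak(schedule, computation, sizes, state=None):
    """Return peak bytes when sub-computation buffers share the caller's heap.
    For simplicity every buffer is freed when its computation finishes; the
    real simulator frees each buffer at its last use."""
    state = state if state is not None else {"current": 0, "peak": 0}
    local = []
    for instr in schedule[computation]:
        if isinstance(instr, tuple) and instr[0] == "call":
            whole_module_peak(schedule, instr[1], sizes, state)  # nested Alloc/Free
        else:
            state["current"] += sizes[instr]
            state["peak"] = max(state["peak"], state["current"])
            local.append(instr)
    for instr in local:  # Free this computation's temporaries on exit
        state["current"] -= sizes[instr]
    return state["peak"]
```

For a main sequence `["a", ("call", "body"), "b"]` with `a`/`b` at 4 bytes each and body temporaries `t1`/`t2` at 8 bytes each, the whole-module peak is 20 bytes, whereas reserving disjoint per-computation ranges would need 8 + 16 = 24.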
The choice of instruction ordering, and the minimization of fragmentation once
we've chosen an order, are two large inter-related factors wrt overall memory
usage. The approach in this CL uses heuristics to do better on both, but
neither problem is completely solved.
To pick a better ordering (the larger factor), the approach is to try the
original list-scheduler based ordering, and to also try a DFS based ordering.
We pick the ordering that yields a smaller minimum memory, computed with the
simulated heap, ignoring fragmentation. Note that this is the absolute minimum
memory for a given ordering.
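Under simplifying assumptions (each instruction defines one buffer, freed right after its last consumer runs; names hypothetical), the "absolute minimum memory for a given ordering" and the choice between candidate orderings can be sketched as:

```python
def peak_memory(order, sizes, users):
    """Minimum bytes for this order, ignoring fragmentation: allocate a
    buffer at its definition, free it after its last consumer executes."""
    pos = {instr: i for i, instr in enumerate(order)}
    free_at = {instr: max([pos[u] for u in users.get(instr, [])] + [pos[instr]])
               for instr in order}
    current = peak = 0
    for i, instr in enumerate(order):
        current += sizes[instr]
        peak = max(peak, current)
        current -= sum(sizes[d] for d in order if free_at[d] == i)
    return peak

def pick_ordering(candidates, sizes, users):
    """Keep the candidate (e.g. list-scheduler order vs. DFS order) whose
    simulated-heap minimum is smaller."""
    return min(candidates, key=lambda o: peak_memory(o, sizes, users))
```

For instance, with two 10-byte producers each feeding a 1-byte consumer, the order that finishes one producer/consumer pair before starting the next peaks at 11 bytes, while interleaving the producers peaks at 21.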
To minimize fragmentation, the approach is to run a heap simulation on temporary
buffers. We still try to re-use existing allocations when possible, but instead
of creating new allocations for temp buffers, we collect all the leftovers and
use a heap to pack them. The heap algorithm that gave the best results is "lazy
best-fit"; a variant of traditional best-fit that sometimes delays offset
assignment until Free is called, in the hopes of yielding larger free chunks.
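For contrast with the lazy variant, a minimal eager best-fit heap is easy to sketch (hypothetical API, not the actual implementation): Alloc takes the tightest free chunk that fits, or grows the heap; Free returns the chunk and coalesces adjacent free space. Lazy best-fit additionally postpones some offset decisions until Free, by which time coalescing may have produced larger chunks.

```python
class BestFitHeap:
    """Toy eager best-fit heap over a single address range."""

    def __init__(self):
        self.free_chunks = []  # (offset, size) tuples of free space
        self.heap_end = 0
        self.offsets = {}

    def alloc(self, buf, size):
        fits = [c for c in self.free_chunks if c[1] >= size]
        if fits:
            off, sz = min(fits, key=lambda c: c[1])  # best fit: tightest chunk
            self.free_chunks.remove((off, sz))
            if sz > size:
                self.free_chunks.append((off + size, sz - size))
        else:
            off = self.heap_end  # nothing fits: grow the heap
            self.heap_end += size
        self.offsets[buf] = off
        return off

    def free(self, buf, size):
        self.free_chunks.append((self.offsets.pop(buf), size))
        self._coalesce()

    def _coalesce(self):
        merged = []
        for off, sz in sorted(self.free_chunks):
            if merged and merged[-1][0] + merged[-1][1] == off:
                merged[-1][1] += sz  # adjacent: extend the previous chunk
            else:
                merged.append([off, sz])
        self.free_chunks = [(o, s) for o, s in merged]
```

Eager assignment commits to an offset at Alloc time even when the current free chunks are a poor match; delaying that commitment is exactly the degree of freedom the lazy variant exploits.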
Here's some measurements of the temp buffer sizes for GNMT encoder training (a
stacked LSTM). Lower is better. I've tried various combinations of instruction
ordering and heap simulation, to show the joint impact of these two factors.
List-scheduler order, no heap simulation 33.33GiB
List-scheduler order, with heap simulation 25.09GiB
Minimized DFS order, no heap simulation 16.59GiB
Arbitrary DFS order, no heap simulation 15.05GiB (old)
Arbitrary DFS order, with heap simulation 12.57GiB
Minimized DFS order, with heap simulation 11.71GiB (new)
Note that the original list scheduler order is much worse than DFS on stacked
LSTMs, but (not shown here) is much better than DFS on convolutions like
Inception. Also note that heap simulation packs things tighter for all
instruction orders in this example, but to varying degrees.
Change: 149049028