Add ScopedAllocatorOptimizer in support of CollectiveReduce.

The efficiency of CollectiveReduce is greatly improved by merging multiple parallel reductions over smaller tensors into a single reduction over a larger tensor that is the concatenation of the smaller tensors. Because CollectiveReduce is essentially an element-wise array operation that operates on a 1-D reshape of the input tensor, it is eligible for a ScopedAllocation optimization.

The optimization works by looking for serially independent instances of CollectiveReduce that lie within the same name-scope tier and have the same control-flow (e.g. loop) embedding structure. Where two or more such nodes are found, the upstream nodes that generate their inputs are modified to write their outputs into consecutive regions of a single tensor buffer maintained by a ScopedAllocator. The multiple CollectiveReduce nodes are then replaced by a single CollectiveReduce that operates in-place on the backing buffer.

The effectiveness of the optimization depends on there being candidate CollectiveReduce nodes with these characteristics that become eligible for execution at close to the same time. If the name scope is too large and includes nodes that become eligible for execution at very different times, this graph rewrite could result in a slowdown.

Note that this optimization is experimental: it is not guaranteed to work, especially for ops other than CollectiveReduce.

PiperOrigin-RevId: 198089642
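To illustrate the buffer-merging idea described in this message, here is a minimal pure-Python sketch. It is a toy model and not the TensorFlow implementation: `all_reduce_sum`, `pack`, and `unpack` are hypothetical helpers, and plain lists stand in for tensors. It only shows that, for an element-wise reduction, one reduction over consecutive regions of a single buffer gives the same result as separate reductions over each small tensor.

```python
from typing import List, Tuple

Region = Tuple[int, int]  # (offset, length) of one tensor inside the buffer


def all_reduce_sum(per_worker: List[List[float]]) -> List[float]:
    """Hypothetical stand-in for one collective: element-wise sum across workers."""
    return [sum(values) for values in zip(*per_worker)]


def pack(tensors: List[List[float]]) -> Tuple[List[float], List[Region]]:
    """Place 1-D tensors in consecutive regions of a single flat buffer."""
    buffer: List[float] = []
    regions: List[Region] = []
    for tensor in tensors:
        regions.append((len(buffer), len(tensor)))
        buffer.extend(tensor)
    return buffer, regions


def unpack(buffer: List[float], regions: List[Region]) -> List[List[float]]:
    """Slice the per-tensor results back out of the backing buffer."""
    return [buffer[offset:offset + length] for offset, length in regions]


# Two workers, each holding three small tensors (already flattened to 1-D).
worker_tensors = [
    [[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]],
    [[10.0, 20.0], [30.0], [40.0, 50.0, 60.0]],
]

# Unmerged: one reduction per small tensor, so three collectives.
separate = [
    all_reduce_sum([worker[i] for worker in worker_tensors]) for i in range(3)
]

# Merged: each worker packs its tensors, then a single collective runs
# over the whole backing buffer.
packed = [pack(worker) for worker in worker_tensors]
regions = packed[0][1]  # every worker uses the same layout
merged_buffer = all_reduce_sum([buffer for buffer, _ in packed])
merged = unpack(merged_buffer, regions)

assert merged == separate
```

In this sketch `pack` copies data into the buffer. The optimization described above instead has the upstream nodes write their outputs directly into the shared buffer, so no extra copy is needed.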
author: A. Unique TensorFlower <gardener@tensorflow.org> 2018-05-25 12:54:49 -0700
committer: TensorFlower Gardener <gardener@tensorflow.org> 2018-05-25 12:57:18 -0700
commit: 0b522fd22b986704d1056254961cc7988ae182eb (patch)
tree: 472c18f77c5e6b2c1dae0f1aacd6234f5e53436b /tensorflow/core/protobuf
parent: ae0eb1b7f81f6d98e0503b9568c72feaa805e655 (diff)
1 files changed, 10 insertions, 0 deletions
diff --git a/tensorflow/core/protobuf/rewriter_config.proto b/tensorflow/core/protobuf/rewriter_config.proto
index 45e57594e4..bbb25d6f3f 100644
--- a/tensorflow/core/protobuf/rewriter_config.proto
+++ b/tensorflow/core/protobuf/rewriter_config.proto
@@ -14,6 +14,11 @@ message AutoParallelOptions {
   int32 num_replicas = 2;
 }
 
+message ScopedAllocatorOptions {
+  // If present, only perform optimization for these ops.
+  repeated string enable_op = 1;
+}
+
 message RewriterConfig {
   // Graph rewriting is experimental and subject to change, not covered by any
   // API stability guarantees.
@@ -67,6 +72,9 @@ message RewriterConfig {
   Toggle debug_stripper = 11;
   // If true, don't remove unnecessary ops from the graph
   bool disable_model_pruning = 2;
+  // Try to allocate some independent Op outputs contiguously in order to
+  // merge or eliminate downstream Ops (off by default).
+  Toggle scoped_allocator_optimization = 15;
 
   // Controls how many times we run the optimizers in meta optimizer (default
   // is once).
@@ -115,6 +123,8 @@ message RewriterConfig {
   // meta-optimizer or when manually specified through the optimizers field.
   AutoParallelOptions auto_parallel = 5;
 
+  ScopedAllocatorOptions scoped_allocator_opts = 16;
+
   // If non-empty, will use this as an alternative way to specify a list of
   // optimizations to turn on and the order of the optimizations (replacing the
   // meta-optimizer).