diff options
author | 2016-10-07 12:53:06 -0800
---|---
committer | 2016-10-07 14:03:39 -0700
commit | ecdee38a534133ecd7ba18e58527cc4120277190 (patch)
tree | 5b76e2e8a3038cb3b11539121360c062c2719154 /tensorflow/python/training/training.py
parent | 2c8d270735176df1a59b5a80885b2e14b4f06953 (diff)
Switch to the new accumulators in the sync_replicas optimizer (currently called V2). Please note that the gradients from replicas are now averaged instead of summed (as in the old sync_replicas_optimizer), so you need to increase the learning rate according to the number of replicas. This change is introduced to make cross-replica aggregation consistent with how gradients are aggregated (averaged) within a batch on a single replica.
As shown in the code change, the switch results in:
1. Much cleaner and simpler code.
2. A much more efficient and reliable staleness check. It is now 100% strict, with no extra contention on the PS servers.
3. No need for a clean_up op, so we can get rid of the abort_op, which could confuse users.
4. The number of replicas can be changed without complaints from checkpointing, since local_step is now just a local variable instead of a global vector variable. This has been tried with manual restarts of workers (chief or non-chief) and PS, and seems to be quite robust.
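The learning-rate adjustment mentioned above follows from simple arithmetic: averaging N replica gradients and multiplying the learning rate by N reproduces the update that summing produced. A minimal sketch with made-up gradient values (this is plain Python illustrating the math, not the TensorFlow API):

```python
def summed_update(grads, lr):
    # Old sync_replicas_optimizer behavior: apply lr to the sum of replica gradients.
    return lr * sum(grads)

def averaged_update(grads, lr):
    # New (V2) behavior: apply lr to the mean of replica gradients.
    return lr * sum(grads) / len(grads)

grads = [0.2, 0.4, 0.6]       # gradients from 3 replicas (hypothetical values)
old_lr = 0.1
new_lr = old_lr * len(grads)  # scale lr by the number of replicas

print(summed_update(grads, old_lr))    # old update
print(averaged_update(grads, new_lr))  # identical update under V2
```

So a job that was tuned for the summing behavior keeps the same effective step size by multiplying its learning rate by the replica count.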
Change: 135513399
Diffstat (limited to 'tensorflow/python/training/training.py')
-rw-r--r-- | tensorflow/python/training/training.py | 1 |
1 file changed, 1 insertion(+), 0 deletions(-)
diff --git a/tensorflow/python/training/training.py b/tensorflow/python/training/training.py
index a814eb99ce..284cc43bc4 100644
--- a/tensorflow/python/training/training.py
+++ b/tensorflow/python/training/training.py
@@ -182,6 +182,7 @@
 from tensorflow.python.training.rmsprop import RMSPropOptimizer
 from tensorflow.python.training.gradient_descent import GradientDescentOptimizer
 from tensorflow.python.training.proximal_gradient_descent import ProximalGradientDescentOptimizer
 from tensorflow.python.training.sync_replicas_optimizer import SyncReplicasOptimizer
+from tensorflow.python.training.sync_replicas_optimizer import SyncReplicasOptimizerV2
 # Utility classes for training.
 from tensorflow.python.training.coordinator import Coordinator