From ecdee38a534133ecd7ba18e58527cc4120277190 Mon Sep 17 00:00:00 2001
From: Jianmin Chen
Date: Fri, 7 Oct 2016 12:53:06 -0800
Subject: Switch to the new accumulators in the sync_rep optimizer (currently
 called V2).

Please note that the gradients from replicas are now averaged instead of
summed (as in the old sync_replicas_optimizer), so you need to increase the
learning rate according to the number of replicas. This change is introduced
to be consistent with how gradients are aggregated (averaged) within a batch
in a replica.

As shown in the code change, the switch results in:

1. Much cleaner and simpler code.
2. A much more efficient and reliable staleness check. It is now 100% strict,
   with no extra contention on the PS servers.
3. No need for a clean_up op, so we can get rid of the abort_op, which can
   confuse users.
4. The number of replicas can be changed without complaints from the
   checkpoint, as local_step is now just a local variable instead of a global
   vector variable.

This has been tried with manual restarts of workers (chief or non-chief) and
PS tasks and seems to be quite robust.

Change: 135513399
---
 tensorflow/python/training/training.py | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tensorflow/python/training/training.py b/tensorflow/python/training/training.py
index a814eb99ce..284cc43bc4 100644
--- a/tensorflow/python/training/training.py
+++ b/tensorflow/python/training/training.py
@@ -182,6 +182,7 @@ from tensorflow.python.training.rmsprop import RMSPropOptimizer
 from tensorflow.python.training.gradient_descent import GradientDescentOptimizer
 from tensorflow.python.training.proximal_gradient_descent import ProximalGradientDescentOptimizer
 from tensorflow.python.training.sync_replicas_optimizer import SyncReplicasOptimizer
+from tensorflow.python.training.sync_replicas_optimizer import SyncReplicasOptimizerV2

 # Utility classes for training.
 from tensorflow.python.training.coordinator import Coordinator
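
For reference, a minimal usage sketch follows (not part of the patch). It
illustrates the learning-rate scaling mentioned in the commit message, and it
assumes the V2 wrapper takes the same constructor arguments as the original
SyncReplicasOptimizer (opt, replicas_to_aggregate, total_num_replicas); the
replica count and base learning rate below are hypothetical.

# Sketch, not part of this change: compensating for averaged (rather than
# summed) replica gradients by scaling the learning rate with the replica
# count.
import tensorflow as tf

num_replicas = 8          # hypothetical number of worker replicas
base_learning_rate = 0.1  # rate previously tuned for summed gradients

# Averaging shrinks the aggregated gradient by a factor of num_replicas
# compared to summing, so scale the learning rate up to compensate.
learning_rate = base_learning_rate * num_replicas

opt = tf.train.GradientDescentOptimizer(learning_rate)
# Assumes the V2 constructor mirrors SyncReplicasOptimizer's signature.
sync_opt = tf.train.SyncReplicasOptimizerV2(
    opt,
    replicas_to_aggregate=num_replicas,
    total_num_replicas=num_replicas)

# loss and global_step are assumed to be defined elsewhere in the graph:
# train_op = sync_opt.minimize(loss, global_step=global_step)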