From ecdee38a534133ecd7ba18e58527cc4120277190 Mon Sep 17 00:00:00 2001
From: Jianmin Chen
Date: Fri, 7 Oct 2016 12:53:06 -0800
Subject: Switch to the new accumulators in the sync_rep optimizer (currently
 called V2).

Please note that the gradients from replicas are now averaged instead of
summed (as in the old sync_replicas_optimizer), so you need to increase the
learning rate according to the number of replicas. This change is introduced
to be consistent with how gradients are aggregated (averaged) within a batch
in a replica.

As shown in the code change, the switch results in:

1. Much cleaner and simpler code.
2. A much more efficient and reliable staleness check. It is now 100% strict,
   with no extra contention on the PS servers.
3. No need for a clean_up op, so we can get rid of the abort_op, which can
   confuse users.
4. The number of replicas can be changed without complaints from the
   checkpoint, as local_step is now just a local variable instead of a global
   vector variable.

This has been tried with manual restarts of workers (chief or non-chief) and
PS tasks and seems to be quite robust.

Change: 135513399
---
 tensorflow/python/training/training.py | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tensorflow/python/training/training.py b/tensorflow/python/training/training.py
index a814eb99ce..284cc43bc4 100644
--- a/tensorflow/python/training/training.py
+++ b/tensorflow/python/training/training.py
@@ -182,6 +182,7 @@ from tensorflow.python.training.rmsprop import RMSPropOptimizer
 from tensorflow.python.training.gradient_descent import GradientDescentOptimizer
 from tensorflow.python.training.proximal_gradient_descent import ProximalGradientDescentOptimizer
 from tensorflow.python.training.sync_replicas_optimizer import SyncReplicasOptimizer
+from tensorflow.python.training.sync_replicas_optimizer import SyncReplicasOptimizerV2

 # Utility classes for training.
 from tensorflow.python.training.coordinator import Coordinator
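
For reference, a minimal usage sketch follows (not part of the patch). It
illustrates the learning-rate scaling mentioned in the commit message, and it
assumes the V2 wrapper takes the same constructor arguments as the original
SyncReplicasOptimizer (opt, replicas_to_aggregate, total_num_replicas); the
replica count and base learning rate below are hypothetical.

# Sketch, not part of this change: compensating for averaged (rather than
# summed) replica gradients by scaling the learning rate with the replica
# count.
import tensorflow as tf

num_replicas = 8          # hypothetical number of worker replicas
base_learning_rate = 0.1  # rate previously tuned for summed gradients

# Averaging shrinks the aggregated gradient by a factor of num_replicas
# compared to summing, so scale the learning rate up to compensate.
learning_rate = base_learning_rate * num_replicas

opt = tf.train.GradientDescentOptimizer(learning_rate)
# Assumes the V2 constructor mirrors SyncReplicasOptimizer's signature.
sync_opt = tf.train.SyncReplicasOptimizerV2(
    opt,
    replicas_to_aggregate=num_replicas,
    total_num_replicas=num_replicas)

# loss and global_step are assumed to be defined elsewhere in the graph:
# train_op = sync_opt.minimize(loss, global_step=global_step)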