path: root/tensorflow/contrib/optimizer_v2
author: Patrick Nguyen <drpng@google.com> 2018-05-01 14:28:36 -0700
committer: TensorFlower Gardener <gardener@tensorflow.org> 2018-05-01 14:33:20 -0700
commit 325d0ef21a48bea1cc618a2bd24a9776de417ce5 (patch)
tree d41cf6304071e95bebd5747ca87dfca571e98634 /tensorflow/contrib/optimizer_v2
parent 46bf1e8934b3bc8edeff3f218a50b0ee5806e96b (diff)
Merge changes from github.
PiperOrigin-RevId: 194997009
Diffstat (limited to 'tensorflow/contrib/optimizer_v2')
-rw-r--r--  tensorflow/contrib/optimizer_v2/adam.py | 20
1 file changed, 8 insertions(+), 12 deletions(-)
diff --git a/tensorflow/contrib/optimizer_v2/adam.py b/tensorflow/contrib/optimizer_v2/adam.py
index 42b7f92a76..d538ad0fb0 100644
--- a/tensorflow/contrib/optimizer_v2/adam.py
+++ b/tensorflow/contrib/optimizer_v2/adam.py
@@ -40,23 +40,19 @@ class AdamOptimizer(optimizer_v2.OptimizerV2):
Initialization:
- ```
- m_0 <- 0 (Initialize initial 1st moment vector)
- v_0 <- 0 (Initialize initial 2nd moment vector)
- t <- 0 (Initialize timestep)
- ```
+ $$m_0 := 0 (Initialize initial 1st moment vector)$$
+ $$v_0 := 0 (Initialize initial 2nd moment vector)$$
+ $$t := 0 (Initialize timestep)$$
The update rule for `variable` with gradient `g` uses an optimization
described at the end of section 2 of the paper:
- ```
- t <- t + 1
- lr_t <- learning_rate * sqrt(1 - beta2^t) / (1 - beta1^t)
+ $$t := t + 1$$
+ $$lr_t := \text{learning_rate} * \sqrt{(1 - beta_2^t) / (1 - beta_1^t)}$$
- m_t <- beta1 * m_{t-1} + (1 - beta1) * g
- v_t <- beta2 * v_{t-1} + (1 - beta2) * g * g
- variable <- variable - lr_t * m_t / (sqrt(v_t) + epsilon)
- ```
+ $$m_t := beta_1 * m_{t-1} + (1 - beta_1) * g$$
+ $$v_t := beta_2 * v_{t-1} + (1 - beta_2) * g * g$$
+ $$variable := variable - lr_t * m_t / (\sqrt{v_t} + \epsilon)$$
The default value of 1e-8 for epsilon might not be a good default in
general. For example, when training an Inception network on ImageNet a
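
The update rule in the docstring above can be sketched in plain Python for a single scalar variable. This is a minimal illustration of the bias-corrected Adam step, not the TensorFlow implementation; `adam_step` is a hypothetical helper, and the hyperparameter defaults are the ones quoted in the docstring (`epsilon=1e-8`):

```python
import math

def adam_step(var, g, m, v, t, learning_rate=0.001,
              beta1=0.9, beta2=0.999, epsilon=1e-8):
    """One Adam update for a scalar variable; returns (var, m, v, t).

    Follows the docstring's update rule: the bias correction is folded
    into an effective step size lr_t rather than applied to m and v.
    """
    t += 1
    lr_t = learning_rate * math.sqrt(1 - beta2 ** t) / (1 - beta1 ** t)
    m = beta1 * m + (1 - beta1) * g        # 1st moment (mean) estimate
    v = beta2 * v + (1 - beta2) * g * g    # 2nd moment (uncentered variance) estimate
    var = var - lr_t * m / (math.sqrt(v) + epsilon)
    return var, m, v, t
```

One consequence worth noting: on the first step with any constant gradient, the terms cancel so the update magnitude is approximately `learning_rate` itself, independent of the gradient's scale.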