author | TensorFlower Gardener &lt;gardener@tensorflow.org&gt; | 2018-08-27 10:51:36 -0700
committer | TensorFlower Gardener &lt;gardener@tensorflow.org&gt; | 2018-08-27 10:51:36 -0700
commit | d9b33f24106fa389ea0f94b583efb27a3b8dc79a (patch)
tree | 2ce2a456c5a513104c72a70bdcc77a2abda80357 /tensorflow/contrib/optimizer_v2
parent | dfa007e4562fb85fd5320a0c7ca8a00e50e8b34d (diff)
parent | 6ded4099af9527d224fbf6837aa928f501e80b0f (diff)
Merge pull request #21552 from sbrodehl:patch-1
PiperOrigin-RevId: 210392464
Diffstat (limited to 'tensorflow/contrib/optimizer_v2')
-rw-r--r-- | tensorflow/contrib/optimizer_v2/adam.py | 9
1 file changed, 4 insertions(+), 5 deletions(-)
```diff
diff --git a/tensorflow/contrib/optimizer_v2/adam.py b/tensorflow/contrib/optimizer_v2/adam.py
index 631d4f44df..04b1552b61 100644
--- a/tensorflow/contrib/optimizer_v2/adam.py
+++ b/tensorflow/contrib/optimizer_v2/adam.py
@@ -40,15 +40,14 @@ class AdamOptimizer(optimizer_v2.OptimizerV2):
   Initialization:
 
-  $$m_0 := 0 (Initialize initial 1st moment vector)$$
-  $$v_0 := 0 (Initialize initial 2nd moment vector)$$
-  $$t := 0 (Initialize timestep)$$
-
+  $$m_0 := 0 \text{(Initialize initial 1st moment vector)}$$
+  $$v_0 := 0 \text{(Initialize initial 2nd moment vector)}$$
+  $$t := 0 \text{(Initialize timestep)}$$
   The update rule for `variable` with gradient `g` uses an optimization
   described at the end of section2 of the paper:
 
   $$t := t + 1$$
-  $$lr_t := \text{learning_rate} * \sqrt{(1 - beta_2^t) / (1 - beta_1^t)}$$
+  $$lr_t := \text{learning\_rate} * \sqrt{1 - beta_2^t} / (1 - beta_1^t)$$
 
   $$m_t := beta_1 * m_{t-1} + (1 - beta_1) * g$$
   $$v_t := beta_2 * v_{t-1} + (1 - beta_2) * g * g$$
 
```
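The substantive change here is the bias-correction factor: the old docstring put the square root over the whole quotient, while the corrected form applies it only to the second-moment term, `lr_t = learning_rate * sqrt(1 - beta_2^t) / (1 - beta_1^t)`. A minimal scalar sketch of one Adam step using the corrected formula (the function name `adam_step` and its signature are illustrative, not part of the TensorFlow API):

```python
import math

def adam_step(param, g, m, v, t, learning_rate=0.001,
              beta_1=0.9, beta_2=0.999, epsilon=1e-8):
    """One Adam update for a scalar parameter; returns (param, m, v, t).

    Hypothetical helper mirroring the docstring's update rule, with the
    corrected lr_t = learning_rate * sqrt(1 - beta_2^t) / (1 - beta_1^t).
    """
    t += 1
    lr_t = learning_rate * math.sqrt(1 - beta_2 ** t) / (1 - beta_1 ** t)
    m = beta_1 * m + (1 - beta_1) * g          # 1st moment estimate
    v = beta_2 * v + (1 - beta_2) * g * g      # 2nd moment estimate
    param = param - lr_t * m / (math.sqrt(v) + epsilon)
    return param, m, v, t
```

With this form, the very first step moves the parameter by roughly `learning_rate` for a unit gradient, which is the behavior described in the paper the docstring cites.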