path: root/tensorflow/contrib/mixed_precision
author James Qin <jamesqin@google.com> 2018-06-10 22:15:46 -0700
committer TensorFlower Gardener <gardener@tensorflow.org> 2018-06-10 22:18:40 -0700
commit 3a1d8bd815b5216bc9515801e4d59cf3ebd1126d (patch)
tree c0fb2c362712d62b2e8ccaba12ae700812e145ed /tensorflow/contrib/mixed_precision
parent 119db15241e29587e0b6ab3912bff5ff63d123eb (diff)
Improve the loss_scale_optimizer docstring.
PiperOrigin-RevId: 200001771
Diffstat (limited to 'tensorflow/contrib/mixed_precision')
-rw-r--r--  tensorflow/contrib/mixed_precision/python/loss_scale_optimizer.py  42
1 file changed, 24 insertions, 18 deletions
diff --git a/tensorflow/contrib/mixed_precision/python/loss_scale_optimizer.py b/tensorflow/contrib/mixed_precision/python/loss_scale_optimizer.py
index e4e5ccc334..ef34f7bf7b 100644
--- a/tensorflow/contrib/mixed_precision/python/loss_scale_optimizer.py
+++ b/tensorflow/contrib/mixed_precision/python/loss_scale_optimizer.py
@@ -26,26 +26,32 @@ from tensorflow.python.training import optimizer
class LossScaleOptimizer(optimizer.Optimizer):
+ # TODO(jamesqin): move mixed precision training explanation to __init__
+ # docstring.
"""An optimizer that applies loss scaling in backprop.
- This class is useful for mixed precision training on GPUs (or other potential
- accelerators), which is an approach to improve compute throughput without loss
- of model quality.
-
- The commmon configuration of mixed precision models is the following:
- * variables are kept in high precision (e.g. float32).
- * computations are done in lower precision (e.g. float16). variables are
- casted to lower precision before they're used.
- * (in training), final gradients are casted back to variable precision and get
- applied.
-
- Because computations happen in lower precision, gradients in the backprop pass
- might underflow in the smaller dynamic range, causing a model to converge at a
- suboptimal level. This optimizer multiplies the loss by a factor before
- backprop starts to prevent underflow. Before gradients are applied, they are
- casted to higher precision and down-scaled by the same factor, so
- mathematically the variable updates are no different from regular
- same-precision training.
+ This class is useful for "mixed precision training" on GPUs (or other
+ potential accelerators), an approach to improve compute throughput without
+ compromising model quality.
+
+ The canonical way to perform mixed precision training is the following:
+ * Model variables are kept in high precision (e.g. float32).
+ * Computations are done in lower precision (e.g. float16), which enjoys a
+ performance speedup by virtue of hardware support. Variables are cast to
+ lower precision before they're used.
+ * Final gradients are cast back to the high-precision dtype, then used to
+ update variables.
+
+ The side effect of performing computation in lower precision is a smaller
+ numerical range. During backprop, small gradients might underflow in the
+ reduced numerical range, causing the model to converge at a suboptimal
+ level.
+
+ To prevent underflow, this optimizer multiplies the loss by a factor before
+ backprop starts. Consequently, the gradients are linearly scaled up by the
+ same factor, keeping them out of the underflow zone. After that, to preserve
+ the correctness of backprop, the gradients are down-scaled by the same factor,
+ cast to the (higher) variable precision, then applied to the variables.
See [Nvidia's manual on mixed precision training](
https://docs.nvidia.com/deeplearning/sdk/mixed-precision-training/index.html)
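
As a concrete illustration of the recipe the new docstring describes, below is a minimal sketch of manual loss scaling written against the TF 1.x graph API. The fixed loss_scale value, the toy matmul model, and the placeholder shapes are illustrative assumptions rather than part of this commit; the LossScaleOptimizer wrapper in this file is meant to do this bookkeeping inside the optimizer instead of in user code.

import tensorflow as tf

# Assumed fixed scaling factor: large enough to lift small float16 gradients
# above the underflow threshold, small enough to avoid overflow.
loss_scale = 128.0

# Master variable kept in high precision (float32).
w = tf.get_variable("w", shape=[1024, 1024], dtype=tf.float32)

x = tf.placeholder(tf.float16, shape=[None, 1024])
labels = tf.placeholder(tf.float16, shape=[None, 1024])

# Compute in lower precision: cast the variable down before using it.
logits = tf.matmul(x, tf.cast(w, tf.float16))
loss = tf.reduce_mean(tf.square(logits - labels))

opt = tf.train.GradientDescentOptimizer(0.01)

# Scale the loss before backprop so every float16 gradient in the backward
# pass is multiplied by the same factor and stays out of the underflow zone.
scaled_loss = tf.cast(loss, tf.float32) * loss_scale
grads_and_vars = opt.compute_gradients(scaled_loss, var_list=[w])

# Gradients w.r.t. the float32 variable arrive in float32, so the unscaling
# happens in high precision before the update is applied.
unscaled = [(g / loss_scale, v) for g, v in grads_and_vars]
train_op = opt.apply_gradients(unscaled)

Wrapping the base optimizer, as this class does, keeps the scale/unscale steps out of the training loop, so existing code that calls compute_gradients() and apply_gradients() does not need to change.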