author:    Suharsh Sivakumar <suharshs@google.com>  2018-04-06 16:51:05 -0700
committer: TensorFlower Gardener <gardener@tensorflow.org>  2018-04-06 16:54:53 -0700
commit:    ba17f2a81949a0b35a92a4d6f7704d0fb2917bd3 (patch)
tree:      dedd65b00bc9252b291313af8207009d2c4fdf9e /tensorflow/contrib/quantize
parent:    d8d7d8ba35b9de83fbc983f753acf53e5185dfc0 (diff)
Update docs to include the most relevant paper.
PiperOrigin-RevId: 191959657
Diffstat (limited to 'tensorflow/contrib/quantize')
-rw-r--r--  tensorflow/contrib/quantize/README.md | 21
1 file changed, 13 insertions, 8 deletions
diff --git a/tensorflow/contrib/quantize/README.md b/tensorflow/contrib/quantize/README.md
index 348c824a40..c83623ec94 100644
--- a/tensorflow/contrib/quantize/README.md
+++ b/tensorflow/contrib/quantize/README.md
@@ -2,14 +2,17 @@
 tf.contrib.quantize provides tools for transforming graphs to include ops to
 model quantization of weights, biases and activations during both training and
-inference. This is done using the
+inference. The details of the transformation implemented in this package is
+described here [1].
+
+This is done using the
 [fake quantization op](https://www.tensorflow.org/versions/r0.12/api_docs/python/array_ops/fake_quantization).
 
-Recent literature has shown that fixed point networks provide comparable
-performance to floating point networks [1]. This is achieved by modeling the
-quantization operation during training in both the forward and backward passes.
+Literature has shown that fixed point networks provide comparable performance to
+floating point networks [2]. This is achieved by modeling the quantization
+operation during training in both the forward and backward passes.
 
 The fake quantization operator achieves this by modeling the quantizer as a pass
-through estimator [2]. Note that during back propagation, the parameters are
+through estimator [3]. Note that during back propagation, the parameters are
 updated at high precision as this is needed to ensure sufficient precision in
 accumulating tiny adjustments to the parameters. However, for the forward pass,
 the parameters and activations are quantized to the desired lower precision.
@@ -61,9 +64,11 @@
 These rewrites are an active area of research and experimentation, so the
 rewrites and quantized training will likely not work across all models, though
 we hope to work towards generalizing these techniques.
 
+[1] B.Jacob et al., "Quantization and Training of Neural Networks for Efficient
+Integer-Arithmetic-Only Inference", https://arxiv.org/abs/1712.05877
-[1] P.Gysel, "HARDWARE-ORIENTED APPROXIMATION OF CONVOLUTIONAL
+[2] P.Gysel et al., "HARDWARE-ORIENTED APPROXIMATION OF CONVOLUTIONAL
 NEURAL NETWORKS", https://arxiv.org/pdf/1604.03168.pdf
 
-[2] Y.Bengio, "Estimating or Propagating Gradients Through Stochastic Neurons
-for Conditional Computation", https://arxiv.org/abs/1308.3432
+[3] Y.Bengio et al., "Estimating or Propagating Gradients Through Stochastic
+Neurons for Conditional Computation", https://arxiv.org/abs/1308.3432
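The README text in this diff describes fake quantization (a quantize-dequantize step in the forward pass) and the straight-through estimator used in the backward pass. As a rough illustration, here is a minimal NumPy sketch of both steps; this is an assumption for illustration only, not the actual TensorFlow fake-quantization op, and the function names and the simple affine quantization scheme are hypothetical.

```python
import numpy as np

def fake_quant(x, qmin, qmax, num_bits=8):
    """Forward pass: quantize x onto the num_bits grid spanning
    [qmin, qmax], then dequantize, so training observes the same
    rounding and clamping error that low-precision inference will."""
    levels = 2 ** num_bits - 1
    scale = (qmax - qmin) / levels
    # Quantize to integer levels, clamping to the representable range.
    q = np.clip(np.round((x - qmin) / scale), 0, levels)
    # Dequantize back to float.
    return q * scale + qmin

def fake_quant_grad(x, grad_out, qmin, qmax):
    """Backward pass (straight-through estimator): treat the rounding
    step as the identity, passing gradients through unchanged, except
    where the input was clamped outside [qmin, qmax], where the
    gradient is zeroed."""
    return grad_out * ((x >= qmin) & (x <= qmax))

x = np.array([-0.5, 0.25, 0.5, 1.5])
y = fake_quant(x, 0.0, 1.0)                         # snaps to the 8-bit grid
g = fake_quant_grad(x, np.ones_like(x), 0.0, 1.0)   # zero where x was clamped
```

The straight-through estimator matters because the true derivative of rounding is zero almost everywhere; pretending it is the identity lets gradients flow so the high-precision parameters can still accumulate small updates, as the README notes.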