{ "cells": [ { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "6Y8E0lw5eYWm" }, "source": [ "# Post Training Quantization" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "CIGrZZPTZVeO" }, "source": [ "\u003ctable class=\"tfo-notebook-buttons\" align=\"left\"\u003e\n", " \u003ctd\u003e\n", " \u003ca target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/tensorflow/blob/master/tensorflow/contrib/lite/tutorials/post_training_quant.ipynb\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" /\u003eRun in Google Colab\u003c/a\u003e\n", " \u003c/td\u003e\n", " \u003ctd\u003e\n", " \u003ca target=\"_blank\" href=\"https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/lite/tutorials/post_training_quant.ipynb\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" /\u003eView source on GitHub\u003c/a\u003e\n", " \u003c/td\u003e\n", "\u003c/table\u003e" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "BTC1rDAuei_1" }, "source": [ "## Overview\n", "\n", "[TensorFlow Lite](https://www.tensorflow.org/lite/) now supports\n", "converting weights to 8 bit precision as part of model conversion from\n", "tensorflow graphdefs to TFLite's flat buffer format. Weight quantization\n", "achieves a 4x reduction in the model size. In addition, TFLite supports on the\n", "fly quantization and dequantization of activations to allow for:\n", "\n", "1. Using quantized kernels for faster implementation when available.\n", "\n", "2. Mixing of floating-point kernels with quantized kernels for different parts\n", " of the graph.\n", "\n", "Note that the activations are always stored in floating point. For ops that\n", "support quantized kernels, the activations are quantized to 8 bits of precision\n", "dynamically prior to processing and are de-quantized to float precision after\n", "processing. Depending on the model being converted, this can give a speedup over\n", "pure floating point computation.\n", "\n", "In contrast to\n", "[quantization aware training](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/quantize)\n", ", the weights are quantized post training and the activations are quantized dynamically \n", "at inference in this method.\n", "Therefore, the model weights are not retrained to compensate for quantization\n", "induced errors. It is important to check the accuracy of the quantized model to\n", "ensure that the degradation is acceptable.\n", "\n", "In this tutorial, we train an MNIST model from scratch, check its accuracy in\n", "tensorflow and then convert the saved model into a Tensorflow Lite flatbuffer\n", "with weight quantization. We finally check the\n", "accuracy of the converted model and compare it to the original saved model. We\n", "run the training script mnist.py from\n", "[Tensorflow official mnist tutorial](https://github.com/tensorflow/models/tree/master/official/mnist).\n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "2XsEP17Zelz9" }, "source": [ "## Building an MNIST model" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "dDqqUIZjZjac" }, "source": [ "### Setup" ] }, { "cell_type": "code", "execution_count": 0, "metadata": { "colab": {}, "colab_type": "code", "id": "gyqAw1M9lyab" }, "outputs": [], "source": [ "! pip uninstall -y tensorflow\n", "! 
{ "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "2XsEP17Zelz9" }, "source": [ "## Building an MNIST model" ] },
{ "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "dDqqUIZjZjac" }, "source": [ "### Setup" ] },
{ "cell_type": "code", "execution_count": 0, "metadata": { "colab": {}, "colab_type": "code", "id": "gyqAw1M9lyab" }, "outputs": [], "source": [ "! pip uninstall -y tensorflow\n", "! pip install -U tf-nightly" ] },
{ "cell_type": "code", "execution_count": 0, "metadata": { "colab": {}, "colab_type": "code", "id": "WsN6s5L1ieNl" }, "outputs": [], "source": [ "import tensorflow as tf\n", "tf.enable_eager_execution()" ] },
{ "cell_type": "code", "execution_count": 0, "metadata": { "colab": {}, "colab_type": "code", "id": "00U0taBoe-w7" }, "outputs": [], "source": [ "! git clone --depth 1 https://github.com/tensorflow/models" ] },
{ "cell_type": "code", "execution_count": 0, "metadata": { "colab": {}, "colab_type": "code", "id": "4XZPtSh-fUOc" }, "outputs": [], "source": [ "import sys\n", "import os\n", "\n", "if sys.version_info.major \u003e= 3:\n", "  import pathlib\n", "else:\n", "  import pathlib2 as pathlib\n", "\n", "# Add `models` to the Python path.\n", "models_path = os.path.join(os.getcwd(), \"models\")\n", "sys.path.append(models_path)" ] },
{ "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "eQ6Q0qqKZogR" }, "source": [ "### Train and export the model" ] },
{ "cell_type": "code", "execution_count": 0, "metadata": { "colab": {}, "colab_type": "code", "id": "eMsw_6HujaqM" }, "outputs": [], "source": [ "saved_models_root = \"/tmp/mnist_saved_model\"" ] },
{ "cell_type": "code", "execution_count": 0, "metadata": { "colab": {}, "colab_type": "code", "id": "hWSAjQWagIHl" }, "outputs": [], "source": [ "# The above path addition is not visible to subprocesses; add the path for the subprocess as well.\n", "# Note: channels_last is required here or the conversion may fail.\n", "!PYTHONPATH={models_path} python models/official/mnist/mnist.py --train_epochs=1 --export_dir {saved_models_root} --data_format=channels_last" ] },
{ "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "5NMaNZQCkW9X" }, "source": [ "For this example, we trained the model for only a single epoch, so it only reaches ~96% accuracy.\n", "\n" ] },
{ "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "xl8_fzVAZwOh" }, "source": [ "### Convert to a TFLite model\n", "\n", "The `SavedModel` directory is named with a timestamp. Select the most recent one:" ] },
{ "cell_type": "code", "execution_count": 0, "metadata": { "colab": {}, "colab_type": "code", "id": "Xp5oClaZkbtn" }, "outputs": [], "source": [ "saved_model_dir = str(sorted(pathlib.Path(saved_models_root).glob(\"*\"))[-1])\n", "saved_model_dir" ] },
{ "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "AT8BgkKmljOy" }, "source": [ "Using the Python `TocoConverter`, the saved model can be converted into a TFLite model.\n", "\n", "First load the model using the `TocoConverter`:" ] },
{ "cell_type": "code", "execution_count": 0, "metadata": { "colab": {}, "colab_type": "code", "id": "_i8B2nDZmAgQ" }, "outputs": [], "source": [ "import tensorflow as tf\n", "tf.enable_eager_execution()\n", "converter = tf.contrib.lite.TocoConverter.from_saved_model(saved_model_dir)\n", "tflite_model = converter.convert()" ] },
{ "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "F2o2ZfF0aiCx" }, "source": [ "Write it out to a `.tflite` file:" ] },
{ "cell_type": "code", "execution_count": 0, "metadata": { "colab": {}, "colab_type": "code", "id": "vptWZq2xnclo" }, "outputs": [], "source": [ "tflite_models_dir = pathlib.Path(\"/tmp/mnist_tflite_models/\")\n", "tflite_models_dir.mkdir(exist_ok=True, parents=True)" ] },
{ "cell_type": "code", "execution_count": 0, "metadata": { "colab": {}, "colab_type": "code", "id": "Ie9pQaQrn5ue" }, "outputs": [], "source": [ "tflite_model_file = tflite_models_dir/\"mnist_model.tflite\"\n", "tflite_model_file.write_bytes(tflite_model)" ] },
{ "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "7BONhYtYocQY" }, "source": [ "To quantize the model on export, set the `post_training_quantize` flag:" ] },
{ "cell_type": "code", "execution_count": 0, "metadata": { "colab": {}, "colab_type": "code", "id": "g8PUvLWDlmmz" }, "outputs": [], "source": [ "# Note: If you don't have a recent tf-nightly installed, the\n", "# \"post_training_quantize\" line will have no effect.\n", "tf.logging.set_verbosity(tf.logging.INFO)\n", "converter.post_training_quantize = True\n", "tflite_quant_model = converter.convert()\n", "tflite_model_quant_file = tflite_models_dir/\"mnist_model_quant.tflite\"\n", "tflite_model_quant_file.write_bytes(tflite_quant_model)" ] },
{ "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "PhMmUTl4sbkz" }, "source": [ "Note how the resulting file, with `post_training_quantize` set, is approximately `1/4` the size." ] },
{ "cell_type": "code", "execution_count": 0, "metadata": { "colab": {}, "colab_type": "code", "id": "JExfcfLDscu4" }, "outputs": [], "source": [ "!ls -lh {tflite_models_dir}" ] },
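{ "cell_type": "markdown", "metadata": { "colab_type": "text" }, "source": [ "We can also compare the sizes directly in Python. Since the weights dominate this model's size, the ratio should be roughly 4x (this check is an extra illustration and assumes the two `.tflite` files written above):" ] },
{ "cell_type": "code", "execution_count": 0, "metadata": { "colab": {}, "colab_type": "code" }, "outputs": [], "source": [ "# Compare the on-disk sizes of the float and weight-quantized models.\n", "float_size = tflite_model_file.stat().st_size\n", "quant_size = tflite_model_quant_file.stat().st_size\n", "print(\"Float model:     %.2f MB\" % (float_size / 1e6))\n", "print(\"Quantized model: %.2f MB\" % (quant_size / 1e6))\n", "print(\"Size ratio:      %.2fx\" % (float_size / float(quant_size)))" ] },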
\n", "\n", "### load the test data\n", "\n", "First let's load the mnist test data to feed to it:" ] }, { "cell_type": "code", "execution_count": 0, "metadata": { "colab": {}, "colab_type": "code", "id": "eTIuU07NuKFL" }, "outputs": [], "source": [ "import numpy as np\n", "mnist_train, mnist_test = tf.keras.datasets.mnist.load_data()\n", "images, labels = tf.to_float(mnist_test[0])/255.0, mnist_test[1]\n", "\n", "# Note: If you change the batch size, then use \n", "# `tf.contrib.lite.Interpreter.resize_tensor_input` to also change it for\n", "# the interpreter.\n", "mnist_ds = tf.data.Dataset.from_tensor_slices((images, labels)).batch(1)" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "Ap_jE7QRvhPf" }, "source": [ "### Load the model into an interpreter" ] }, { "cell_type": "code", "execution_count": 0, "metadata": { "colab": {}, "colab_type": "code", "id": "Jn16Rc23zTss" }, "outputs": [], "source": [ "interpreter = tf.contrib.lite.Interpreter(model_path=str(tflite_model_file))\n", "interpreter.allocate_tensors()\n", "input_index = interpreter.get_input_details()[0][\"index\"]\n", "output_index = interpreter.get_output_details()[0][\"index\"]" ] }, { "cell_type": "code", "execution_count": 0, "metadata": { "colab": {}, "colab_type": "code", "id": "J8Pztk1mvNVL" }, "outputs": [], "source": [ "tf.logging.set_verbosity(tf.logging.DEBUG)\n", "interpreter_quant = tf.contrib.lite.Interpreter(model_path=str(tflite_model_quant_file))" ] }, { "cell_type": "code", "execution_count": 0, "metadata": { "colab": {}, "colab_type": "code", "id": "Afl6yGvWyqAr" }, "outputs": [], "source": [ "interpreter_quant.allocate_tensors()\n", "input_index = interpreter_quant.get_input_details()[0][\"index\"]\n", "output_index = interpreter_quant.get_output_details()[0][\"index\"]\n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "2opUt_JTdyEu" }, "source": [ "### Test the model on one image" ] }, { "cell_type": "code", "execution_count": 0, "metadata": { "colab": {}, "colab_type": "code", "id": "AKslvo2kwWac" }, "outputs": [], "source": [ "for img, label in mnist_ds.take(1):\n", " break\n", "\n", "interpreter.set_tensor(input_index, img)\n", "interpreter.invoke()\n", "predictions = interpreter.get_tensor(output_index)" ] }, { "cell_type": "code", "execution_count": 0, "metadata": { "colab": {}, "colab_type": "code", "id": "XZClM2vo3_bm" }, "outputs": [], "source": [ "import matplotlib.pylab as plt\n", "\n", "plt.imshow(img[0])\n", "template = \"True:{true}, predicted:{predict}\"\n", "_ = plt.title(template.format(true= str(label[0].numpy()),\n", " predict=str(predictions[0,0])))\n", "plt.grid(False)" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "LwN7uIdCd8Gw" }, "source": [ "### Evaluate the models" ] }, { "cell_type": "code", "execution_count": 0, "metadata": { "colab": {}, "colab_type": "code", "id": "05aeAuWjvjPx" }, "outputs": [], "source": [ "def eval_model(interpreter, mnist_ds):\n", " total_seen = 0\n", " num_correct = 0\n", "\n", " for img, label in mnist_ds:\n", " total_seen += 1\n", " interpreter.set_tensor(input_index, img)\n", " interpreter.invoke()\n", " predictions = interpreter.get_tensor(output_index)\n", " if predictions == label.numpy():\n", " num_correct += 1\n", "\n", " if total_seen % 500 == 0:\n", " print(\"Accuracy after %i images: %f\" %\n", " (total_seen, float(num_correct) / float(total_seen)))\n", "\n", " return float(num_correct) / float(total_seen)" ] }, { "cell_type": "code", "execution_count": 0, 
"metadata": { "colab": {}, "colab_type": "code", "id": "DqXBnDfJ7qxL" }, "outputs": [], "source": [ "print(eval_model(interpreter, mnist_ds))" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "Km3cY9ry8ZlG" }, "source": [ "We can repeat the evaluation on the weight quantized model to obtain:\n" ] }, { "cell_type": "code", "execution_count": 0, "metadata": { "colab": {}, "colab_type": "code", "id": "-9cnwiPp6EGm" }, "outputs": [], "source": [ "print(eval_model(interpreter_quant, mnist_ds))\n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "L7lfxkor8pgv" }, "source": [ "\n", "In this example, we have compressed model with no difference in the accuracy." ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "M0o1FtmWeKZm" }, "source": [ "\n", "\n", "## Optimizing an existing model\n", "\n", "We now consider another example. Resnets with pre-activation layers (Resnet-v2) are widely used for vision applications.\n", " Pre-trained frozen graph for resnet-v2-101 is available at the\n", " [Tensorflow Lite model repository](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/lite/g3doc/models.md).\n", "\n", "We can convert the frozen graph to a TFLite flatbuffer with quantization by:\n" ] }, { "cell_type": "code", "execution_count": 0, "metadata": { "colab": {}, "colab_type": "code", "id": "v5p5VcNPjILQ" }, "outputs": [], "source": [ "archive_path = tf.keras.utils.get_file(\"resnet_v2_101.tgz\", \"https://storage.googleapis.com/download.tensorflow.org/models/tflite_11_05_08/resnet_v2_101.tgz\", extract=True)\n", "archive_path = pathlib.Path(archive_path)\n", "archive_dir = str(archive_path.parent)" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "-sxnXQuC4ThD" }, "source": [ "The `info.txt` file lists the input and output names. You can also find them using TensorBoard to visually inspect the graph." ] }, { "cell_type": "code", "execution_count": 0, "metadata": { "colab": {}, "colab_type": "code", "id": "g_Q_OMEJ4LIc" }, "outputs": [], "source": [ "! cat {archive_dir}/resnet_v2_101_299_info.txt" ] }, { "cell_type": "code", "execution_count": 0, "metadata": { "colab": {}, "colab_type": "code", "id": "ujCAFhqm-C6H" }, "outputs": [], "source": [ "graph_def_file = pathlib.Path(archive_path).parent/\"resnet_v2_101_299_frozen.pb\"\n", "input_arrays = [\"input\"] \n", "output_arrays = [\"output\"]\n", "converter = tf.contrib.lite.TocoConverter.from_frozen_graph(\n", " str(graph_def_file), input_arrays, output_arrays, input_shapes={\"input\":[1,299,299,3]})\n", "converter.post_training_quantize = True\n", "resnet_tflite_file = graph_def_file.parent/\"resnet_v2_101_quantized.tflite\"\n", "resnet_tflite_file.write_bytes(converter.convert())\n" ] }, { "cell_type": "code", "execution_count": 0, "metadata": { "colab": {}, "colab_type": "code", "id": "vhOjeg1x9Knp" }, "outputs": [], "source": [ "\n", "!ls -lh {archive_dir}/*.tflite" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "qqHLaqFMCjRZ" }, "source": [ "\n", "The model size reduces from 171 MB to 43 MB.\n", "The accuracy of this model on imagenet can be evaluated using the scripts provided for [TFLite accuracy measurement](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/lite/tools/accuracy/ilsvrc).\n", "\n", "The optimized model top-1 accuracy is 76.8, the same as the floating point model." 
] } ], "metadata": { "colab": { "collapsed_sections": [], "name": "post-training-quant.ipynb", "private_outputs": true, "provenance": [], "toc_visible": true, "version": "0.3.2" }, "kernelspec": { "display_name": "Python 2", "name": "python2" } }, "nbformat": 4, "nbformat_minor": 0 }