author | 2017-01-09 17:55:52 -0800 | |
---|---|---|
committer | 2017-01-09 18:07:30 -0800 | |
commit | 3afdb06d64509ae99a157ce07955331352518160 (patch) | |
tree | 633d37ab6b83b3a83cd7abf1178c6ff777157b7c /tensorflow/g3doc/tutorials | |
parent | 61d0f1572c7701aa0e73c0bedd5c57b3f80f22e7 (diff) |
Layers tutorial and corresponding example code.
Change: 144031391
Diffstat (limited to 'tensorflow/g3doc/tutorials')
-rw-r--r-- | tensorflow/g3doc/tutorials/layers/index.md | 755 |
1 files changed, 755 insertions, 0 deletions
diff --git a/tensorflow/g3doc/tutorials/layers/index.md b/tensorflow/g3doc/tutorials/layers/index.md new file mode 100644 index 0000000000..2d0071a31a --- /dev/null +++ b/tensorflow/g3doc/tutorials/layers/index.md @@ -0,0 +1,755 @@ +# A Guide to TF Layers: Building a Convolutional Neural Network + +The TensorFlow [`layers` +module](https://www.tensorflow.org/code/tensorflow/python/layers/layers.py) +provides a high-level API that makes it easy to construct a neural network. It +provides methods that facilitate the creation of dense (fully connected) layers +and convolutional layers, adding activation functions, and applying dropout +regularization. In this tutorial, you'll learn how to use `layers` to build a +convolutional neural network model to recognize the handwritten digits in the +MNIST data set. + +![handwritten digits 0–9 from the MNIST data set](../../images/mnist_0-9.png) +**The [MNIST dataset](http://yann.lecun.com/exdb/mnist/) comprises 60,000 +training examples and 10,000 test examples of the handwritten digits 0–9, +formatted as 28x28-pixel monochrome images.** + +## Getting Started + +Let's set up the skeleton for our TensorFlow program. Create a file called +`cnn_mnist.py`, and add the following code: + +```python +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +# Imports +import numpy as np +import tensorflow as tf + +from tensorflow.contrib import learn +from tensorflow.contrib.learn.python.learn.estimators import model_fn as model_fn_lib + +tf.logging.set_verbosity(tf.logging.INFO) + +# Our application logic will be added here + +if __name__ == "__main__": + tf.app.run() +``` + +As you work through the tutorial, you'll add code to construct, train, and +evaluate the convolutional neural network. The complete, final code can be +[found +here](https://www.tensorflow.org/code/tensorflow/examples/tutorials/layers/cnn_mnist.py). 
+ +<p class="note"><b>NOTE:</b> Before proceeding, make sure you've +<a href="https://www.tensorflow.org/get_started/os_setup">installed the latest +version of TensorFlow</a> on your machine.</p> + +## Intro to Convolutional Neural Networks + +Convolutional neural networks (CNNs) are the current state-of-the-art model +architecture for image classification tasks. CNNs apply a series of filters to +the raw pixel data of an image to extract and learn higher-level features, which +the model can then use for classification. CNNs contain three components: + +* **Convolutional layers**, which apply a specified number of convolution + filters to the image. For each subregion, the layer performs a set of + mathematical operations to produce a single value in the output feature map. + Convolutional layers then typically apply a [ReLU activation + function](https://en.wikipedia.org/wiki/Rectifier_\(neural_networks\)) to + the output to introduce nonlinearities into the model. + +* **Pooling layers**, which [downsample the image + data](https://en.wikipedia.org/wiki/Convolutional_neural_network#Pooling_layer) + extracted by the convolutional layers to reduce the dimensionality of the + feature map in order to decrease processing time. A commonly used pooling + algorithm is max pooling, which extracts subregions of the feature map + (e.g., 2x2-pixel tiles), keeps their maximum value, and discards all other + values. + +* **Dense (fully connected) layers**, which perform classification on the + features extracted by the convolutional layers and downsampled by the + pooling layers. In a dense layer, every node in the layer is connected to + every node in the preceding layer. + +Typically, a CNN is composed of a stack of convolutional modules that perform +feature extraction. Each module consists of a convolutional layer followed by a +pooling layer. The last convolutional module is followed by one or more dense +layers that perform classification. 
The final dense layer in a CNN contains a +single node for each target class in the model (all the possible classes the +model may predict), with a +[softmax](https://en.wikipedia.org/wiki/Softmax_function) activation function to +generate a value between 0–1 for each node (the sum of all these softmax values +is equal to 1). We can interpret the softmax values for a given image as +relative measurements of how likely it is that the image falls into each target +class. + +NOTE: For a more comprehensive walkthrough of CNN architecture, see Stanford +University's [Convolutional Neural Networks for Visual Recognition course +materials](http://cs231n.github.io/convolutional-networks/). + +## Building the CNN MNIST Classifier {#building-cnn-classifier} + +Let's build a model to classify the images in the MNIST dataset using the +following CNN architecture: + +1. **Convolutional Layer #1**: Applies 32 5x5 filters (extracting 5x5-pixel + subregions), with ReLU activation function +2. **Pooling Layer #1**: Performs max pooling with a 2x2 filter and stride of 2 + (which specifies that pooled regions do not overlap) +3. **Convolutional Layer #2**: Applies 64 5x5 filters, with ReLU activation + function +4. **Pooling Layer #2**: Again, performs max pooling with a 2x2 filter and + stride of 2 +5. **Dense Layer #1**: 1,024 neurons, with dropout regularization rate of 0.4 + (probability of 0.4 that any given element will be dropped during training) +6. **Dense Layer #2 (Logits Layer)**: 10 neurons, one for each digit target + class (0–9). + +The `tf.layers` module contains methods to create each of the three layer types +above: + +* `conv2d()`. Constructs a two-dimensional convolutional layer. Takes number + of filters, filter kernel size, padding, and activation function as + arguments. +* `max_pooling2d()`. Constructs a two-dimensional pooling layer using the + max-pooling algorithm. Takes pooling filter size and stride as arguments. +* `dense()`. Constructs a dense layer. 
Takes number of neurons and activation + function as arguments. + +Each of these methods accepts a tensor as input and returns a transformed tensor +as output. This makes it easy to connect one layer to another: just take the +output from one layer-creation method and supply it as input to another. + +Open `cnn_mnist.py` and add the following `cnn_model_fn` function, which +conforms to the interface expected by TensorFlow's Estimator API (more on this +later in [Create the Estimator](#create-the-estimator)). `cnn_model_fn` takes +MNIST feature data, labels, and [model +mode](../../api_docs/python/contrib.learn.md#ModeKeys) (`TRAIN`, `EVAL`, +`INFER`) as arguments; configures the CNN; and returns predictions, loss, and a +training operation: + +```python +def cnn_model_fn(features, labels, mode): + """Model function for CNN.""" + # Input Layer + input_layer = tf.reshape(features, [-1, 28, 28, 1]) + + # Convolutional Layer #1 + conv1 = tf.layers.conv2d( + inputs=input_layer, + filters=32, + kernel_size=[5, 5], + padding="same", + activation=tf.nn.relu) + + # Pooling Layer #1 + pool1 = tf.layers.max_pooling2d(inputs=conv1, pool_size=[2, 2], strides=2) + + # Convolutional Layer #2 and Pooling Layer #2 + conv2 = tf.layers.conv2d( + inputs=pool1, + filters=64, + kernel_size=[5, 5], + padding="same", + activation=tf.nn.relu) + pool2 = tf.layers.max_pooling2d(inputs=conv2, pool_size=[2, 2], strides=2) + + # Dense Layer + pool2_flat = tf.reshape(pool2, [-1, 7 * 7 * 64]) + dense = tf.layers.dense(inputs=pool2_flat, units=1024, activation=tf.nn.relu) + dropout = tf.layers.dropout( + inputs=dense, rate=0.4, training=mode == learn.ModeKeys.TRAIN) + + # Logits Layer + logits = tf.layers.dense(inputs=dropout, units=10) + + loss = None + train_op = None + + # Calculate Loss (for both TRAIN and EVAL modes) + if mode != learn.ModeKeys.INFER: + onehot_labels = tf.one_hot(indices=tf.cast(labels, tf.int32), depth=10) + loss = tf.losses.softmax_cross_entropy( 
onehot_labels=onehot_labels, logits=logits) + + # Configure the Training Op (for TRAIN mode) + if mode == learn.ModeKeys.TRAIN: + train_op = tf.contrib.layers.optimize_loss( + loss=loss, + global_step=tf.contrib.framework.get_global_step(), + learning_rate=0.001, + optimizer="SGD") + + # Generate Predictions + predictions = { + "classes": tf.argmax( + input=logits, axis=1), + "probabilities": tf.nn.softmax( + logits, name="softmax_tensor") + } + + # Return a ModelFnOps object + return model_fn_lib.ModelFnOps( + mode=mode, predictions=predictions, loss=loss, train_op=train_op) +``` + +The following sections (with headings corresponding to each code block above) +dive deeper into the `tf.layers` code used to create each layer, as well as how +to calculate loss, configure the training op, and generate predictions. If +you're already experienced with CNNs and [TensorFlow +`Estimator`s](../estimators/index.md), and find the above code intuitive, you +may want to skim these sections or just skip ahead to ["Training and Evaluating +the CNN MNIST Classifier"](#training-evaluating). + +### Input Layer + +The methods in the `layers` module for creating convolutional and pooling layers +for two-dimensional image data expect input tensors to have a shape of +<code>[<em>batch_size</em>, <em>image_width</em>, <em>image_height</em>, +<em>channels</em>]</code>, defined as follows: + +* _`batch_size`_. Size of the subset of examples to use when performing + gradient descent during training. +* _`image_width`_. Width of the example images. +* _`image_height`_. Height of the example images. +* _`channels`_. Number of color channels in the example images. For color + images, the number of channels is 3 (red, green, blue). For monochrome + images, there is just 1 channel (black). + +Here, our MNIST dataset is composed of monochrome 28x28 pixel images, so the +desired shape for our input layer is <code>[<em>batch_size</em>, 28, 28, +1]</code>. 
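The batch dimension in that shape is left open, so it has to follow from how many pixel values arrive. As a rough illustration (plain Python rather than TensorFlow; `infer_reshape` is a hypothetical helper, not part of any TensorFlow API), the sketch below resolves a single `-1` placeholder the way reshape-style operations commonly do: divide the total number of values by the product of the fixed dimensions.

```python
def infer_reshape(num_values, target_shape):
    """Resolve a single -1 entry in target_shape from the total value count.

    Hypothetical helper for illustration only; it mimics the usual
    "-1 means infer this dimension" convention of reshape operations.
    """
    if target_shape.count(-1) != 1:
        raise ValueError("expected exactly one -1 dimension")
    known = 1
    for dim in target_shape:
        if dim != -1:
            known *= dim
    if num_values % known != 0:
        raise ValueError("values do not divide evenly into the fixed dimensions")
    return [num_values // known if dim == -1 else dim for dim in target_shape]


# Each 28x28 monochrome image contributes 28 * 28 * 1 = 784 values.
shape_for_5 = infer_reshape(5 * 784, [-1, 28, 28, 1])
shape_for_100 = infer_reshape(100 * 784, [-1, 28, 28, 1])
print(shape_for_5)
print(shape_for_100)
```

With 3,920 values the inferred batch size is 5, and with 78,400 values it is 100, so the same model code works for any batch size.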
+ +To convert our input feature map (`features`) to this shape, we can perform the +following `reshape` operation: + +```python +input_layer = tf.reshape(features, [-1, 28, 28, 1]) +``` + +Note that we've indicated `-1` for batch size, which specifies that this +dimension should be dynamically computed based on the number of input values in +`features`, holding the size of all other dimensions constant. This allows us to +treat `batch_size` as a hyperparameter that we can tune. For example, if we feed +examples into our model in batches of 5, `features` will contain 3,920 values +(one value for each pixel in each image), and `input_layer` will have a shape of +`[5, 28, 28, 1]`. Similarly, if we feed examples in batches of 100, `features` +will contain 78,400 values, and `input_layer` will have a shape of `[100, 28, +28, 1]`. + +### Convolutional Layer #1 + +In our first convolutional layer, we want to apply 32 5x5 filters to the input +layer, with a ReLU activation function. We can use the `conv2d()` method in the +`layers` module to create this layer as follows: + +```python +conv1 = tf.layers.conv2d( + inputs=input_layer, + filters=32, + kernel_size=[5, 5], + padding="same", + activation=tf.nn.relu) +``` + +The `inputs` argument specifies our input tensor, which must have the shape +<code>[<em>batch_size</em>, <em>image_width</em>, <em>image_height</em>, +<em>channels</em>]</code>. Here, we're connecting our first convolutional layer +to `input_layer`, which has the shape <code>[<em>batch_size</em>, 28, 28, +1]</code>. 
+ +<p class="note"><b>NOTE:</b> <code>conv2d()</code> will instead accept a shape of <code>[<em>batch_size</em>, +<em>channels</em>, <em>image_width</em>, <em>image_height</em>]</code> when +passed the argument <code>data_format=channels_first</code>.</p> + +The `filters` argument specifies the number of filters to apply (here, 32), and +`kernel_size` specifies the dimensions of the filters as <code>[<em>width</em>, +<em>height</em>]</code> (here, <code>[5, 5]</code>). + +<p class="tip"><b>TIP:</b> If filter width and height have the same value, you can instead specify a +single integer for <code>kernel_size</code> (e.g., <code>kernel_size=5</code>).</p> + +The `padding` argument specifies one of two enumerated values +(case-insensitive): `valid` (default value) or `same`. To specify that the +output tensor should have the same width and height values as the input tensor, +we set `padding=same` here, which instructs TensorFlow to add 0 values to the +edges of the input tensor to preserve width and height of 28. (Without padding, +a 5x5 convolution over a 28x28 tensor will produce a 24x24 tensor, as there are +24x24 locations to extract a 5x5 tile from a 28x28 grid.) + +The `activation` argument specifies the activation function to apply to the +output of the convolution. Here, we specify ReLU activation with +[`tf.nn.relu`](../../api_docs/python/nn.md#relu). + +Our output tensor produced by `conv2d()` has a shape of +<code>[<em>batch_size</em>, 28, 28, 32]</code>: the same width and height +dimensions as the input, but now with 32 channels holding the output from each +of the filters. + +### Pooling Layer #1 + +Next, we connect our first pooling layer to the convolutional layer we just +created. 
We can use the `max_pooling2d()` method in `layers` to construct a +layer that performs max pooling with a 2x2 filter and stride of 2: + +```python +pool1 = tf.layers.max_pooling2d(inputs=conv1, pool_size=[2, 2], strides=2) +``` + +Again, `inputs` specifies the input tensor, with a shape of +<code>[<em>batch_size</em>, <em>image_width</em>, <em>image_height</em>, +<em>channels</em>]</code>. Here, our input tensor is `conv1`, the output from +the first convolutional layer, which has a shape of <code>[<em>batch_size</em>, +28, 28, 32]</code>. + +<p class="note"><b>NOTE:</b> As with <code>conv2d()</code>, <code>max_pooling2d()</code> will instead accept a shape of +<code>[<em>batch_size</em>, <em>channels</em>, <em>image_width</em>, +<em>image_height</em>]</code> when passed the argument +<code>data_format=channels_first</code>.</p> + +The `pool_size` argument specifies the size of the max pooling filter as +<code>[<em>width</em>, <em>height</em>]</code> (here, `[2, 2]`). If both +dimensions have the same value, you can instead specify a single integer (e.g., +`pool_size=2`). + +The `strides` argument specifies the size of the stride. Here, we set a stride +of 2, which indicates that the subregions extracted by the filter should be +separated by 2 pixels in both the width and height dimensions (for a 2x2 filter, +this means that none of the regions extracted will overlap). If you want to set +different stride values for width and height, you can instead specify a tuple or +list (e.g., `strides=[3, 6]`). + +Our output tensor produced by `max_pooling2d()` (`pool1`) has a shape of +<code>[<em>batch_size</em>, 14, 14, 32]</code>: the 2x2 filter reduces width and +height by 50%, while the 32 channels are unchanged. + +### Convolutional Layer #2 and Pooling Layer #2 + +We can connect a second convolutional and pooling layer to our CNN using +`conv2d()` and `max_pooling2d()` as before. 
For convolutional layer #2, we +configure 64 5x5 filters with ReLU activation, and for pooling layer #2, we use +the same specs as pooling layer #1 (a 2x2 max pooling filter with stride of 2): + +```python +conv2 = tf.layers.conv2d( + inputs=pool1, + filters=64, + kernel_size=[5, 5], + padding="same", + activation=tf.nn.relu) + +pool2 = tf.layers.max_pooling2d(inputs=conv2, pool_size=[2, 2], strides=2) +``` + +Note that convolutional layer #2 takes the output tensor of our first pooling +layer (`pool1`) as input, and produces the tensor `conv2` as output. `conv2` +has a shape of <code>[<em>batch_size</em>, 14, 14, 64]</code>, the same width +and height as `pool1` (due to `padding="same"`), and 64 channels for the 64 +filters applied. + +Pooling layer #2 takes `conv2` as input, producing `pool2` as output. `pool2` +has shape <code>[<em>batch_size</em>, 7, 7, 64]</code> (50% reduction of width +and height from `conv2`). + +### Dense Layer + +Next, we want to add a dense layer (with 1,024 neurons and ReLU activation) to +our CNN to perform classification on the features extracted by the +convolution/pooling layers. Before we connect the layer, however, we'll flatten +our feature map (`pool2`) to shape <code>[<em>batch_size</em>, +<em>features</em>]</code>, so that our tensor has only two dimensions: + +```python +pool2_flat = tf.reshape(pool2, [-1, 7 * 7 * 64]) +``` + +In the `reshape()` operation above, the `-1` signifies that the *`batch_size`* +dimension will be dynamically calculated based on the number of examples in our +input data. Each example has 7 (`pool2` width) * 7 (`pool2` height) * 64 +(`pool2` channels) features, so we want the `features` dimension to have a value +of 7 * 7 * 64 (3136 in total). The output tensor, `pool2_flat`, has shape +<code>[<em>batch_size</em>, 3136]</code>. 
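To double-check where the 7 * 7 * 64 figure comes from, the spatial sizes can be traced through the stack by hand. The sketch below is plain Python, not TensorFlow; `conv_output_size` and `pool_output_size` are hypothetical helper names, and the formulas are the standard ones for `same` and `valid` padding, assumed here to match what the layers in this tutorial do.

```python
import math


def conv_output_size(size, kernel, stride=1, padding="same"):
    """Output width (or height) of a convolution along one spatial dimension."""
    padding = padding.lower()
    if padding == "same":
        # Zero padding keeps the size, up to the stride.
        return math.ceil(size / stride)
    if padding == "valid":
        # Only positions where the whole kernel fits are used.
        return math.ceil((size - kernel + 1) / stride)
    raise ValueError("padding must be 'same' or 'valid'")


def pool_output_size(size, pool, stride):
    """Output width (or height) of a pooling window applied without padding."""
    return (size - pool) // stride + 1


size = 28                                      # MNIST images are 28x28
size = conv_output_size(size, kernel=5)        # conv1, padding="same"
after_conv1 = size
size = pool_output_size(size, pool=2, stride=2)
after_pool1 = size
size = conv_output_size(size, kernel=5)        # conv2, padding="same"
size = pool_output_size(size, pool=2, stride=2)
after_pool2 = size

flat_features = after_pool2 * after_pool2 * 64  # 64 filters in conv2
print(after_conv1, after_pool1, after_pool2, flat_features)
```

The trace gives 28, 14, and 7 for the successive spatial sizes and 3136 flattened features. With `padding="valid"`, the first convolution alone would shrink 28 to 24, so the flattened size would have to change as well.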
+ +Now, we can use the `dense()` method in `layers` to connect our dense layer as +follows: + +```python +dense = tf.layers.dense(inputs=pool2_flat, units=1024, activation=tf.nn.relu) +``` + +The `inputs` argument specifies the input tensor: our flattened feature map, +`pool2_flat`. The `units` argument specifies the number of neurons in the dense +layer (1,024). The `activation` argument takes the activation function; again, +we'll use `tf.nn.relu` to add ReLU activation. + +To help improve the results of our model, we also apply dropout regularization +to our dense layer, using the `dropout` method in `layers`: + +```python +dropout = tf.layers.dropout( + inputs=dense, rate=0.4, training=mode == learn.ModeKeys.TRAIN) +``` + +Again, `inputs` specifies the input tensor, which is the output tensor from our +dense layer (`dense`). + +The `rate` argument specifies the dropout rate; here, we use `0.4`, which means +40% of the elements will be randomly dropped out during training. + +The `training` argument takes a boolean specifying whether or not the model is +currently being run in training mode; dropout will only be performed if +`training` is `True`. Here, we check if the `mode` passed to our model function +`cnn_model_fn` is `TRAIN` mode. + +Our output tensor `dropout` has shape <code>[<em>batch_size</em>, 1024]</code>. + +### Logits Layer + +The final layer in our neural network is the logits layer, which will return the +raw values for our predictions. We create a dense layer with 10 neurons (one for +each target class 0–9), with linear activation (the default): + +```python +logits = tf.layers.dense(inputs=dropout, units=10) +``` + +Our final output tensor of the CNN, `logits`, has shape +<code>[<em>batch_size</em>, 10]</code>. + +### Calculate Loss {#calculating-loss} + +For both training and evaluation, we need to define a [loss +function](https://en.wikipedia.org/wiki/Loss_function) that measures how closely +the model's predictions match the target classes. 
For multiclass classification +problems like MNIST, [cross +entropy](https://en.wikipedia.org/wiki/Cross_entropy) is typically used as the +loss metric. The following code calculates cross entropy when the model runs in +either `TRAIN` or `EVAL` mode: + +```python +loss = None +train_op = None + +# Calculate loss for both TRAIN and EVAL modes +if mode != learn.ModeKeys.INFER: + onehot_labels = tf.one_hot(indices=tf.cast(labels, tf.int32), depth=10) + loss = tf.losses.softmax_cross_entropy( + onehot_labels=onehot_labels, logits=logits) +``` + +Let's take a closer look at what's happening above. + +Our `labels` tensor contains a list of target classes for our examples, e.g. `[1, +9, ...]`. In order to calculate cross-entropy, first we need to convert `labels` +to the corresponding [one-hot +encoding](https://www.quora.com/What-is-one-hot-encoding-and-when-is-it-used-in-data-science): + +```none +[[0, 1, 0, 0, 0, 0, 0, 0, 0, 0], + [0, 0, 0, 0, 0, 0, 0, 0, 0, 1], + ...] +``` + +We use the [`tf.one_hot()`](../../api_docs/python/array_ops.md#one_hot) function +to perform this conversion. `tf.one_hot()` has two required arguments: + +* `indices`. The locations in the one-hot tensor that will have "on + values", i.e., the locations of `1` values in the tensor shown above. +* `depth`. The depth of the one-hot tensor, i.e., the number of target classes. + Here, the depth is `10`. + +The following code creates the one-hot tensor for our labels, `onehot_labels`: + +```python +onehot_labels = tf.one_hot(indices=tf.cast(labels, tf.int32), depth=10) +``` + +Because `labels` contains a series of values from 0–9, `indices` is just our +`labels` tensor, with values cast to integers. The `depth` is `10` because we +have 10 possible target classes, one for each digit. + +Next, we compute cross-entropy of `onehot_labels` and the softmax of the +predictions from our logits layer. 
`tf.losses.softmax_cross_entropy()` takes +`onehot_labels` and `logits` as arguments, performs softmax activation on +`logits`, calculates cross-entropy, and returns our `loss` as a scalar `Tensor`: + +```python +loss = tf.losses.softmax_cross_entropy( + onehot_labels=onehot_labels, logits=logits) +``` + +### Configure the Training Op + +In the previous section, we defined loss for our CNN as the softmax +cross-entropy of the logits layer and our labels. Let's configure our model to +optimize this loss value during training, using the +[`optimize_loss()`](../../api_docs/python/contrib.layers.md#optimize_loss) +method in `tf.contrib.layers`. We'll use a learning rate of 0.001 and +[stochastic gradient +descent](https://en.wikipedia.org/wiki/Stochastic_gradient_descent) as the +optimization algorithm: + +```python +# Configure the Training Op (for TRAIN mode) +if mode == learn.ModeKeys.TRAIN: + train_op = tf.contrib.layers.optimize_loss( + loss=loss, + global_step=tf.contrib.framework.get_global_step(), + learning_rate=0.001, + optimizer="SGD") +``` + +<p class="note"><b>NOTE:</b> For a more in-depth look at configuring training ops for Estimator model +functions, see <a href="../estimators/index.md#defining_the_training_op_for_the_model">"Defining the training op for the +model"</a> in the +<a href="../estimators/index.md">"Creating Estimators in tf.contrib.learn"</a> tutorial.</p> + +### Generate Predictions {#generate-predictions} + +The logits layer of our model returns our predictions as raw values in a +<code>[<em>batch_size</em>, 10]</code>-dimensional tensor. Let's convert these +raw values into two different formats that our model function can return: + +* The **predicted class** for each example: a digit from 0–9. +* The **probabilities** for each possible target class for each example: the + probability that the example is a 0, is a 1, is a 2, etc. 
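To make the two formats concrete, here is a small plain-Python sketch (not TensorFlow code) that turns one row of logits into a predicted class and a probability per class. The logit values are made up for illustration, and `softmax` and `argmax` are hand-written stand-ins for the corresponding tensor operations.

```python
import math


def softmax(row):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(row)
    # Subtracting the maximum first avoids overflow in exp() and does not
    # change the result.
    exps = [math.exp(value - peak) for value in row]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(row):
    """Index of the largest value in the row."""
    return max(range(len(row)), key=lambda i: row[i])


# One made-up row of logits: one raw value per digit class 0-9.
logits_row = [0.1, 2.5, -1.0, 0.3, 0.0, 1.2, -0.5, 0.7, 0.2, -2.0]

predicted_class = argmax(logits_row)
probabilities = softmax(logits_row)

print("predicted class:", predicted_class)
print("probability of that class:", round(probabilities[predicted_class], 3))
```

Because softmax preserves the ordering of its inputs, taking the argmax of the raw logits and of the probabilities yields the same class.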
+ +For a given example, our predicted class is the element in the corresponding row +of the logits tensor with the highest raw value. We can find the index of this +element using the [`tf.argmax()`](../../api_docs/python/math_ops.md#argmax) +function: + +```python +tf.argmax(input=logits, axis=1) +``` + +The `input` argument specifies the tensor from which to extract maximum +values (here, `logits`). The `axis` argument specifies the axis of the `input` +tensor along which to find the greatest value. Here, we want to find the largest +value along the dimension with index of 1, which corresponds to our predictions +(recall that our logits tensor has shape <code>[<em>batch_size</em>, +10]</code>). + +We can derive probabilities from our logits layer by applying softmax activation +using [`tf.nn.softmax()`](../../api_docs/python/nn.md#softmax): + +```python +tf.nn.softmax(logits, name="softmax_tensor") +``` + +<p class="note"><b>NOTE:</b> We use the <code>name</code> argument to explicitly name this operation <code>softmax_tensor</code>, so we can reference it later. (We'll set up logging for the softmax values in <a href="#set-up-a-logging-hook">Set Up a Logging Hook</a>.)</p> + +We compile our predictions in a dict as follows: + +```python +predictions = { + "classes": tf.argmax( + input=logits, axis=1), + "probabilities": tf.nn.softmax( + logits, name="softmax_tensor") +} +``` + +Finally, now that we've got our `predictions`, `loss`, and `train_op`, we can +return them, along with our `mode` argument, in a +[`ModelFnOps`](https://www.tensorflow.org/code/tensorflow/contrib/learn/python/learn/estimators/model_fn.py) +object: + +```python +# Return a ModelFnOps object +return model_fn_lib.ModelFnOps( + mode=mode, predictions=predictions, loss=loss, train_op=train_op) +``` + +## Training and Evaluating the CNN MNIST Classifier {#training-evaluating} + +We've coded our MNIST CNN model function; now we're ready to train and evaluate +it. 
+ +### Load Training and Test Data + +First, let's load our training and test data. Add a `main()` function to +`cnn_mnist.py` with the following code: + +```python +def main(unused_argv): + # Load training and eval data + mnist = learn.datasets.load_dataset("mnist") + train_data = mnist.train.images # Returns np.array + train_labels = np.asarray(mnist.train.labels, dtype=np.int32) + eval_data = mnist.test.images # Returns np.array + eval_labels = np.asarray(mnist.test.labels, dtype=np.int32) +``` + +We store the training feature data (the raw pixel values for 55,000 images of +hand-drawn digits) and training labels (the corresponding value from 0–9 for +each image) as [numpy +arrays](https://docs.scipy.org/doc/numpy/reference/generated/numpy.array.html) +in `train_data` and `train_labels`, respectively. Similarly, we store the +evaluation feature data (10,000 images) and evaluation labels in `eval_data` +and `eval_labels`, respectively. + +### Create the Estimator {#create-the-estimator} + +Next, let's create an `Estimator` (a TensorFlow class for performing high-level +model training, evaluation, and inference) for our model. Add the following code +to `main()`: + +```python +# Create the Estimator +mnist_classifier = learn.Estimator( + model_fn=cnn_model_fn, model_dir="/tmp/mnist_convnet_model") +``` + +The `model_fn` argument specifies the model function to use for training, +evaluation, and inference; we pass it the `cnn_model_fn` we created in ["Building +the CNN MNIST Classifier."](#building-cnn-classifier) The `model_dir` argument +specifies the directory where model data (checkpoints) will be saved (here, we +specify the temp directory `/tmp/mnist_convnet_model`, but feel free to change +to another directory of your choice). 
+ +<p class="note"><b>NOTE:</b> For an in-depth walkthrough of the TensorFlow <code>Estimator</code> API, see the tutorial <a href="../estimators/index.md">"Creating Estimators in tf.contrib.learn."</a></p> + +### Set Up a Logging Hook {#set-up-a-logging-hook} + +Since CNNs can take a while to train, let's set up some logging so we can track +progress during training. We can use TensorFlow's [`SessionRunHook` +API](../../api_docs/python/train.md#SessionRunHook) to create a +[`LoggingTensorHook`](https://tensorflow.org/api_docs/python/train.html?cl=head#LoggingTensorHook) +that will log the probability values from the softmax layer of our CNN. Add the +following to `main()`: + +```python +# Set up logging for predictions + tensors_to_log = {"probabilities": "softmax_tensor"} + logging_hook = tf.train.LoggingTensorHook( + tensors=tensors_to_log, every_n_iter=50) +``` + +We store a dict of the tensors we want to log in `tensors_to_log`. Each key is a +label of our choice that will be printed in the log output, and the +corresponding value is the name of a `Tensor` in the TensorFlow graph. Here, our +`probabilities` can be found in `softmax_tensor`, the name we gave our softmax +operation earlier when we generated the probabilities in `cnn_model_fn`. + +<p class="note"><b>NOTE:</b> If you don't explicitly assign a name to an operation via +the <code>name</code> argument, TensorFlow will assign a default name. A couple of easy ways to discover +the names applied to operations are to visualize your graph on <a href="../../how_tos/graph_viz/index.md">TensorBoard</a> +or to enable the <a href="../../how_tos/debugger/index.md">TensorFlow Debugger (tfdbg)</a>.</p> + +Next, we create the `LoggingTensorHook`, passing `tensors_to_log` to the +`tensors` argument. We set `every_n_iter=50`, which specifies that probabilities +should be logged after every 50 steps of training. 
+ +### Train the Model + +Now we're ready to train our model, which we can do by calling `fit()` on +`mnist_classifier`. Add the following to `main()`: + +```python +# Train the model +mnist_classifier.fit( + x=train_data, + y=train_labels, + batch_size=100, + steps=20000, + monitors=[logging_hook]) +``` + +In the `fit` call, we pass the training feature data and labels to `x` and `y`, +respectively. We set a `batch_size` of `100` (which means that the model will +train on minibatches of 100 examples at each step), and `steps` of `20000` +(which means the model will train for 20,000 steps total). We pass our +`logging_hook` to the `monitors` argument, so that it will be triggered during +training. + +### Evaluate the Model + +Once training is complete, we want to evaluate our model to determine its +accuracy on the MNIST test set. To set up the accuracy metric for our model, we +need to create a metrics dict with a +[`MetricSpec`](https://www.tensorflow.org/code/tensorflow/contrib/learn/python/learn/metric_spec.py) +that calculates accuracy. Add the following to `main()`: + +```python +# Configure the accuracy metric for evaluation +metrics = { + "accuracy": + learn.metric_spec.MetricSpec( + metric_fn=tf.metrics.accuracy, prediction_key="classes"), +} +``` + +We create our `MetricSpec`s with the following two arguments: + +* `metric_fn`. The function that calculates and returns the value of our + metric. Here, we can use the predefined `accuracy` function in the + [`tf.metrics`](https://www.tensorflow.org/code/tensorflow/python/ops/metrics_impl.py) + module. +* `prediction_key`. The key of the tensor that contains the predictions + returned by the model function. Here, because we're building a + classification model, the prediction key is `"classes"`, which we specified + back in ["Generate Predictions."](#generate-predictions) + +Now that we've set up our `metrics` dict, we can evaluate the model. 
Add the +following code, which performs evaluation and prints the results: + +```python +# Evaluate the model and print results +eval_results = mnist_classifier.evaluate( + x=eval_data, y=eval_labels, metrics=metrics) +print(eval_results) +``` + +We pass our evaluation feature data and labels to `evaluate()` in the `x` and +`y` arguments, respectively. The `metrics` argument takes the metrics dict we +just defined. + +### Run the Model + +We've coded the CNN model function, `Estimator`, and the training/evaluation +logic; now let's see the results. Run `cnn_mnist.py`. + +<p class="note"><b>NOTE:</b> Training CNNs is quite computationally intensive. Estimated completion time of <code>cnn_mnist.py</code> will vary depending on your processor, but will likely be upwards of 1 hour on CPU. To train more quickly, you can decrease the number of <code>steps</code> passed to <code>fit()</code>, but note that this will affect accuracy.</p> + +As the model trains, you'll see log output like the following: + +```python +INFO:tensorflow:loss = 2.36026, step = 1 +INFO:tensorflow:probabilities = [[ 0.07722801 0.08618255 0.09256398, ...]] +... +INFO:tensorflow:loss = 2.13119, step = 101 +INFO:tensorflow:global_step/sec: 5.44132 +... +INFO:tensorflow:Loss for final step: 0.553216. +``` + +When training is complete, you'll see the results of the model evaluation, e.g.: + +```python +INFO:tensorflow:Restored model from /tmp/mnist_convnet_model +INFO:tensorflow:Eval steps [0,inf) for training step 20000. +INFO:tensorflow:Input iterator is exhausted. +INFO:tensorflow:Saving evaluation summary for step 20000: accuracy = 0.9733, loss = 0.0902271 +{'loss': 0.090227105, 'global_step': 20000, 'accuracy': 0.97329998} +``` + +Here, we've achieved an accuracy of 97.3% on our test data set. + +## Additional Resources + +To learn more about TensorFlow Estimators and CNNs in TensorFlow, see the +following resources: + +* [Creating Estimators in tf.contrib.learn](../estimators/index.md). 
An + introduction to the TensorFlow Estimator API, which walks through + configuring an Estimator, writing a model function, calculating loss, and + defining a training op. +* [Deep MNIST for Experts: Building a Multilayer + CNN](../mnist/pros/index.md#build_a_multilayer_convolutional_network). Walks + through how to build an MNIST CNN classification model *without layers* using + lower-level TensorFlow operations. |