Diffstat (limited to 'tensorflow/g3doc/tutorials/mnist')
-rwxr-xr-x  tensorflow/g3doc/tutorials/mnist/__init__.py                 0
-rw-r--r--  tensorflow/g3doc/tutorials/mnist/beginners/index.md        420
-rw-r--r--  tensorflow/g3doc/tutorials/mnist/download/index.md          85
-rw-r--r--  tensorflow/g3doc/tutorials/mnist/fully_connected_feed.py   219
-rw-r--r--  tensorflow/g3doc/tutorials/mnist/input_data.py             175
-rw-r--r--  tensorflow/g3doc/tutorials/mnist/mnist.py                  148
-rw-r--r--  tensorflow/g3doc/tutorials/mnist/mnist_softmax.py           33
-rw-r--r--  tensorflow/g3doc/tutorials/mnist/pros/index.md             390
-rw-r--r--  tensorflow/g3doc/tutorials/mnist/tf/index.md               513
9 files changed, 1983 insertions(+), 0 deletions(-)
diff --git a/tensorflow/g3doc/tutorials/mnist/__init__.py b/tensorflow/g3doc/tutorials/mnist/__init__.py
new file mode 100755
index 0000000000..e69de29bb2
--- /dev/null
+++ b/tensorflow/g3doc/tutorials/mnist/__init__.py
diff --git a/tensorflow/g3doc/tutorials/mnist/beginners/index.md b/tensorflow/g3doc/tutorials/mnist/beginners/index.md
new file mode 100644
index 0000000000..8ccb69d977
--- /dev/null
+++ b/tensorflow/g3doc/tutorials/mnist/beginners/index.md
@@ -0,0 +1,420 @@
# MNIST Softmax Regression (For Beginners)

*This tutorial is intended for readers who are new to both machine learning and
TensorFlow. If you already know what MNIST is, and what softmax (multinomial
logistic) regression is, you might prefer this
[faster-paced tutorial](../pros/index.md).*

When one learns how to program, there's a tradition that the first thing you do
is print "Hello World." Just like programming has Hello World, machine learning
has MNIST.

MNIST is a simple computer vision dataset. It consists of images of handwritten
digits like these:

<div style="width:40%; margin:auto; margin-bottom:10px; margin-top:20px;">
<img style="width:100%" src="img/MNIST.png">
</div>

It also includes labels for each image, telling us which digit it is. For
example, the labels for the above images are 5, 0, 4, and 1.

In this tutorial, we're going to train a model to look at images and predict
what digits they are. Our goal isn't to train a really elaborate model that
achieves state-of-the-art performance -- although we'll give you code to do that
later! -- but rather to dip a toe into using TensorFlow. As such, we're going
to start with a very simple model, called a softmax regression.

The actual code for this tutorial is very short, and all the interesting
stuff happens in just three lines. However, it is very important to understand
the ideas behind it: both how TensorFlow works and the core machine learning
concepts. Because of this, we are going to work through the code very
carefully.

## The MNIST Data

The MNIST data is hosted on
[Yann LeCun's website](http://yann.lecun.com/exdb/mnist/).
For your convenience, we've included some Python code to download and install
the data automatically. You can either download [the code](../input_data.py) and
import it as below, or simply copy and paste it in.

```python
import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
```

The downloaded data is split into three parts: 55,000 data points of training
data (`mnist.train`), 10,000 points of test data (`mnist.test`), and 5,000
points of validation data (`mnist.validation`). This split is very important:
it's essential in machine learning that we have separate data which we don't
learn from, so that we can make sure that what we've learned actually
generalizes!

As mentioned earlier, every MNIST data point has two parts: an image of a
handwritten digit and a corresponding label. We will call the images "xs" and
the labels "ys". Both the training set and test set contain xs and ys; for
example, the training images are `mnist.train.images` and the training labels
are `mnist.train.labels`.

Each image is 28 pixels by 28 pixels. We can interpret this as a big array of
numbers:

<div style="width:50%; margin:auto; margin-bottom:10px; margin-top:20px;">
<img style="width:100%" src="img/MNIST-Matrix.png">
</div>

We can flatten this array into a vector of 28x28 = 784 numbers. It doesn't
matter how we flatten the array, as long as we're consistent between images.
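As a concrete aside (a minimal NumPy sketch of our own, not part of the
tutorial code), flattening is just a reshape:

```python
import numpy as np

image = np.random.rand(28, 28)  # a stand-in for one 28x28 MNIST image
vector = image.reshape(784)     # row-major flattening; any consistent order works
print vector.shape              # (784,)
```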
From this perspective, the MNIST images are just a bunch of points in a
784-dimensional vector space, with a
[very rich structure](http://colah.github.io/posts/2014-10-Visualizing-MNIST/)
(warning: computationally intensive visualizations).

Flattening the data throws away information about the 2D structure of the image.
Isn't that bad? Well, the best computer vision methods do exploit this
structure, and we will in later tutorials. But the simple method we will be
using here, a softmax regression, won't.

The result is that `mnist.train.images` is a tensor (an n-dimensional array) with a
shape of `[55000, 784]`. The first dimension indexes the images and the second
dimension indexes the pixels in each image. Each entry in the tensor is the
pixel intensity between 0 and 1, for a particular pixel in a particular image.

<div style="width:40%; margin:auto; margin-bottom:10px; margin-top:20px;">
<img style="width:100%" src="img/mnist-train-xs.png">
</div>

The corresponding labels in MNIST are numbers between 0 and 9, describing
which digit a given image is of.
For the purposes of this tutorial, we're going to want our labels as
"one-hot vectors". A one-hot vector is a vector which is 0 in most
dimensions, and 1 in a single dimension. In this case, the $$n$$th digit will be
represented as a vector which is 1 in the $$n$$th dimension. For example, 0
would be $$[1,0,0,0,0,0,0,0,0,0]$$.
Consequently, `mnist.train.labels` is a
`[55000, 10]` array of floats.

<div style="width:40%; margin:auto; margin-bottom:10px; margin-top:20px;">
<img style="width:100%" src="img/mnist-train-ys.png">
</div>

We're now ready to actually make our model!

## Softmax Regressions

We know that every image in MNIST is a digit, whether it's a zero or a nine. We
want to be able to look at an image and give probabilities for it being each
digit. For example, our model might look at a picture of a nine and be 80% sure
it's a nine, but give a 5% chance to it being an eight (because of the top loop)
and a bit of probability to all the others, because it isn't sure.

This is a classic case where a softmax regression is a natural, simple model.
If you want to assign probabilities to an object being one of several different
things, softmax is the thing to do. Even later on, when we train more
sophisticated models, the final step will be a layer of softmax.

A softmax regression has two steps: first we add up the evidence of our input
being in certain classes, and then we convert that evidence into probabilities.

To tally up the evidence that a given image is in a particular class, we do a
weighted sum of the pixel intensities. The weight is negative if that pixel
having a high intensity is evidence against the image being in that class,
and positive if it is evidence in favor.

The following diagram shows the weights one model learned for each of these
classes. Red represents negative weights, while blue represents positive
weights.

<div style="width:40%; margin:auto; margin-bottom:10px; margin-top:20px;">
<img style="width:100%" src="img/softmax-weights.png">
</div>

We also add some extra evidence called a bias. Basically, we want to be able
to say that some things are more likely independent of the input.
The result is that the evidence for a class $$i$$ given an input $$x$$ is:

$$\text{evidence}_i = \sum_j W_{i,~ j} x_j + b_i$$

where $$W_i$$ are the weights and $$b_i$$ is the bias for class $$i$$, and $$j$$
is an index for summing over the pixels in our input image $$x$$. We then
convert the evidence tallies into our predicted probabilities
$$y$$ using the "softmax" function:

$$y = \text{softmax}(\text{evidence})$$

Here softmax is serving as an "activation" or "link" function, shaping
the output of our linear function into the form we want -- in this case, a
probability distribution over 10 cases.
You can think of it as converting tallies
of evidence into probabilities of our input being in each class.
It's defined as:

$$\text{softmax}(x) = \text{normalize}(\exp(x))$$

If you expand that equation out, you get:

$$\text{softmax}(x)_i = \frac{\exp(x_i)}{\sum_j \exp(x_j)}$$

But it's often more helpful to think of softmax the first way:
exponentiating its inputs and then normalizing them. The exponentiation
means that one more unit of evidence increases the weight given to any hypothesis
multiplicatively. And conversely, having one less unit of evidence means that a
hypothesis gets a fraction of its earlier weight. No hypothesis ever has zero
or negative weight. Softmax then normalizes these weights, so that they add up
to one, forming a valid probability distribution. (To get more intuition about
the softmax function, check out the
[section](http://neuralnetworksanddeeplearning.com/chap3.html#softmax)
on it in Michael Nielsen's book, complete with an interactive visualization.)


You can picture our softmax regression as looking something like the following,
although with a lot more $$x$$s. For each output, we compute a weighted sum of
the $$x$$s, add a bias, and then apply softmax.

<div style="width:55%; margin:auto; margin-bottom:10px; margin-top:20px;">
<img style="width:100%" src="img/softmax-regression-scalargraph.png">
</div>

If we write that out as equations, we get:

<div style="width:52%; margin-left:25%; margin-bottom:10px; margin-top:20px;">
<img style="width:100%" src="img/softmax-regression-scalarequation.png">
</div>

We can "vectorize" this procedure, turning it into a matrix multiplication
and vector addition. This is helpful for computational efficiency. (It's also
a useful way to think.)

<div style="width:50%; margin:auto; margin-bottom:10px; margin-top:20px;">
<img style="width:100%" src="img/softmax-regression-vectorequation.png">
</div>

More compactly, we can just write:

$$y = \text{softmax}(Wx + b)$$


## Implementing the Regression


To do efficient numerical computing in Python, we typically use libraries like
NumPy that do expensive operations such as matrix multiplication outside Python,
using highly efficient code implemented in another language.
Unfortunately, there can still be a lot of overhead from switching back to
Python every operation. This overhead is especially bad if you want to run
computations on GPUs or in a distributed manner, where there can be a high cost
to transferring data.

TensorFlow also does its heavy lifting outside Python,
but it takes things a step further to avoid this overhead.
Instead of running a single expensive operation independently
from Python, TensorFlow lets us describe a graph of interacting operations that
run entirely outside Python. (Approaches like this can be seen in a few
machine learning libraries.)
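Before we switch to the TensorFlow implementation, here is the softmax function
from above in plain NumPy -- a minimal sketch of our own, showing the
exponentiate-then-normalize view (the max-subtraction is a standard
numerical-stability trick we add; it doesn't change the result):

```python
import numpy as np

def softmax(x):
  # Exponentiate, then normalize so the entries sum to one.
  e = np.exp(x - np.max(x))  # subtracting the max avoids overflow
  return e / np.sum(e)

print softmax(np.array([2.0, 1.0, 0.1]))  # -> [ 0.659  0.242  0.099]
```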
To run computations, TensorFlow needs to connect to its backend. This connection
is called a `Session`. To use TensorFlow, we need to import it and create a
session.

```python
import tensorflow as tf
sess = tf.InteractiveSession()
```

(Using an `InteractiveSession` makes TensorFlow a bit more flexible about how
you structure your code. In particular, it's helpful for work in interactive
contexts like IPython.)

We describe these interacting operations by manipulating symbolic variables.
Let's create one:

```python
x = tf.placeholder("float", [None, 784])
```

`x` isn't a specific value. It's a `placeholder`, a value that we'll input when
we ask TensorFlow to run a computation. We want to be able to input any number
of MNIST images, each flattened into a 784-dimensional vector. We represent
this as a 2d tensor of floating point numbers, with a shape `[None, 784]`.
(Here `None` means that a dimension can be of any length.)

We also need the weights and biases for our model. We could imagine treating
these like additional inputs, but TensorFlow has an even better way to handle
them: `Variable`.
A `Variable` is a modifiable tensor that lives in TensorFlow's graph of
interacting operations. It can be used and even modified by the computation.
For machine learning applications, one generally has the model parameters be
`Variable`s.

```python
W = tf.Variable(tf.zeros([784,10]))
b = tf.Variable(tf.zeros([10]))
```

We create these `Variable`s by giving `tf.Variable` the initial value of the
`Variable`: in this case, we initialize both `W` and `b` as tensors full of
zeros. Since we are going to learn `W` and `b`, it doesn't matter very much
what they initially are.

Notice that `W` has a shape of `[784, 10]` because we want to multiply the
784-dimensional image vectors by it to produce 10-dimensional vectors of
evidence for the different classes. `b` has a shape of `[10]` so we can add it
to the output.

We can now implement our model. It only takes one line!

```python
y = tf.nn.softmax(tf.matmul(x,W) + b)
```

First, we multiply `x` by `W` with the expression `tf.matmul(x,W)`. This is
flipped from when we multiplied them in our equation, where we had $$Wx$$, as a
small trick to deal with `x` being a 2D tensor with multiple inputs. We then
add `b`, and finally apply `tf.nn.softmax`.

That's it. It only took us one line to define our model, after a couple of
short lines of setup. That isn't because TensorFlow is designed to make a
softmax regression particularly easy: it's just a very flexible way to describe
many kinds of numerical computations, from machine learning models to physics
simulations. And once defined, our model can be run on different devices:
your computer's CPU, GPUs, and even phones!


## Training

In order to train our model, we need to define what it means for the model to
be good. Well, actually, in machine learning we typically define what it means
for a model to be bad, called the cost or loss, and then try to minimize how bad
it is. But the two are equivalent.

One very common, very nice cost function is "cross-entropy." Surprisingly,
cross-entropy arises from thinking about information-compressing codes in
information theory, but it winds up being an important idea in lots of areas,
from gambling to machine learning. It's defined as:

$$H_{y'}(y) = -\sum_i y'_i \log(y_i)$$

where $$y$$ is our predicted probability distribution, and $$y'$$ is the true
distribution (the one-hot vector we'll input).
In some rough sense, the
cross-entropy measures how inefficient our predictions are for describing
the truth. Going into more detail about cross-entropy is beyond the scope of
this tutorial, but it's well worth
[understanding](http://colah.github.io/posts/2015-09-Visual-Information/).

To implement cross-entropy we first need to add a new placeholder to input
the correct answers:

```python
y_ = tf.placeholder("float", [None,10])
```

Then we can implement the cross-entropy, $$-\sum y'\log(y)$$:

```python
cross_entropy = -tf.reduce_sum(y_*tf.log(y))
```

First, `tf.log` computes the logarithm of each element of `y`. Next, we multiply
each element of `y_` with the corresponding element of `tf.log(y)`. Finally,
`tf.reduce_sum` adds all the elements of the tensor. (Note that this isn't
just the cross-entropy of the truth with a single prediction, but the sum of the
cross-entropies for all 100 images we looked at. How well we are doing on 100
data points is a much better description of how good our model is than a single
data point.)

Now that we know what we want our model to do, it's very easy to have TensorFlow
train it to do so.
Because TensorFlow knows the entire graph of your computations, it
can automatically use the [backpropagation
algorithm](http://colah.github.io/posts/2015-08-Backprop/)
to efficiently determine how your variables affect the cost you ask it to
minimize. Then it can apply your choice of optimization algorithm to modify the
variables and reduce the cost.

```python
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)
```

In this case, we ask TensorFlow to minimize `cross_entropy` using the gradient
descent algorithm with a learning rate of 0.01. Gradient descent is a simple
procedure, where TensorFlow simply shifts each variable a little bit in the
direction that reduces the cost. But TensorFlow also provides
[many other optimization algorithms](../../../api_docs/python/train.md?#optimizers):
using one is as simple as tweaking one line.

What TensorFlow actually does here, behind the scenes, is add new operations
to your graph which implement backpropagation and gradient descent. Then it
gives you back a single operation which, when run, will do a step of gradient
descent training, slightly tweaking your variables to reduce the cost.

Now we have our model set up to train. But before we start, we need to
initialize the variables we created:

```python
tf.initialize_all_variables().run()
```

Let's train -- we'll run the training step 1000 times!

```python
for i in range(1000):
  batch_xs, batch_ys = mnist.train.next_batch(100)
  train_step.run({x: batch_xs, y_: batch_ys})
```

In each step of the loop, we get a "batch" of one hundred random data points
from our training set. We run `train_step`, feeding in the batch data to
replace the `placeholder`s.

Using small batches of random data is called stochastic training -- in
this case, stochastic gradient descent. Ideally, we'd like to use all our data
for every step of training because that would give us a better sense of what
we should be doing, but that's expensive. So, instead, we use a different subset
every time. Doing this is cheap and has much of the same benefit.



## Evaluating Our Model

How well does our model do?

Well, first let's figure out where we predicted the correct label.
`tf.argmax` +is an extremely useful function which gives you the index of the highest entry +in a tensor along some axis. For example, `tf.argmax(y,1)` is the label our +model thinks is most likely for each input, while `tf.argmax(y_,1)` is the +correct label. We can use `tf.equal` to check if our prediction matches the +truth. + +```python +correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1)) +``` + +That gives us a list of booleans. To determine what fraction are correct, we +cast to floating point numbers and then take the mean. For example, +`[True, False, True, True]` would become `[1,0,1,1]` which would become `0.75`. + +```python +accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float")) +``` + +Finally, we ask for our accuracy on our test data. + +```python +print accuracy.eval({x: mnist.test.images, y_: mnist.test.labels}) +``` + +This should be about 91%. + +Is that good? Well, not really. In fact, it's pretty bad. This is because we're +using a very simple model. With some small changes, we can get to +97%. The best models can get to over 99.7% accuracy! (For more information, have +a look at this +[list of results](http://rodrigob.github.io/are_we_there_yet/build/classification_datasets_results.html).) + +What matters is that we learned from this model. Still, if you're feeling a bit +down about these results, check out [the next tutorial](../../index.md) where we +do a lot better, and learn how to build more sophisticated models using +TensorFlow! diff --git a/tensorflow/g3doc/tutorials/mnist/download/index.md b/tensorflow/g3doc/tutorials/mnist/download/index.md new file mode 100644 index 0000000000..dc11e727d8 --- /dev/null +++ b/tensorflow/g3doc/tutorials/mnist/download/index.md @@ -0,0 +1,85 @@ +# Downloading MNIST + +Code: [tensorflow/g3doc/tutorials/mnist/](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/g3doc/tutorials/mnist/) + +The goal of this tutorial is to show how to download the dataset files required +for handwritten digit classification using the (classic) MNIST data set. + +## Tutorial Files + +This tutorial references the following files: + +File | Purpose +--- | --- +[`input_data.py`](../input_data.py) | The code to download the MNIST dataset for training and evaluation. + +## Prepare the Data + +MNIST is a classic problem in machine learning. The problem is to look at +greyscale 28x28 pixel images of handwritten digits and determine which digit +the image represents, for all the digits from zero to nine. + +![MNIST Digits](../tf/mnist_digits.png "MNIST Digits") + +For more information, refer to [Yann LeCun's MNIST page](http://yann.lecun.com/exdb/mnist/) +or [Chris Olah's visualizations of MNIST](http://colah.github.io/posts/2014-10-Visualizing-MNIST/). + +### Download + +[Yann LeCun's MNIST page](http://yann.lecun.com/exdb/mnist/) +also hosts the training and test data for download. 
File | Purpose
--- | ---
[`train-images-idx3-ubyte.gz`](http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz) | training set images - 55000 training images, 5000 validation images
[`train-labels-idx1-ubyte.gz`](http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz) | training set labels matching the images
[`t10k-images-idx3-ubyte.gz`](http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz) | test set images - 10000 images
[`t10k-labels-idx1-ubyte.gz`](http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz) | test set labels matching the images

In the `input_data.py` file, the `maybe_download()` function will ensure these
files are downloaded into a local data folder for training.

The folder name is specified in a flag variable at the top of the
`fully_connected_feed.py` file and may be changed to fit your needs.

### Unpack and Reshape

The files themselves are not in any standard image format and are manually
unpacked (following the instructions available at the website) by the
`extract_images()` and `extract_labels()` functions in `input_data.py`.

The image data is extracted into a 2d tensor of: `[image index, pixel index]`
where each entry is the intensity value of a specific pixel in a specific
image, rescaled from `[0, 255]` to `[0.0, 1.0]`. The "image index" corresponds
to an image in the dataset, counting up from zero to the size of the dataset.
And the "pixel index" corresponds to a specific pixel in that image, ranging
from zero to the number of pixels in the image.

The 60000 examples in the `train-*` files are then split into 55000 examples
for training and 5000 examples for validation. Each 28x28 pixel greyscale
image is flattened into 784 values, so the output tensor for the training set
images is of shape `[55000, 784]`.

The label data is extracted into a 1d tensor of: `[image index]`
with the class identifier for each example as the value. For the training set
labels, this would then be of shape `[55000]`.

### DataSet Object

The underlying code will download, unpack, and reshape images and labels for
the following datasets:

Dataset | Purpose
--- | ---
`data_sets.train` | 55000 images and labels, for primary training.
`data_sets.validation` | 5000 images and labels, for iterative validation of training accuracy.
`data_sets.test` | 10000 images and labels, for final testing of trained accuracy.

The `read_data_sets()` function will return an object with a `DataSet`
instance for each of these three sets of data. The `DataSet.next_batch()`
method can be used to fetch a tuple consisting of `batch_size` lists of images
and labels to be fed into the running TensorFlow session.

```python
images_feed, labels_feed = data_set.next_batch(FLAGS.batch_size)
```
diff --git a/tensorflow/g3doc/tutorials/mnist/fully_connected_feed.py b/tensorflow/g3doc/tutorials/mnist/fully_connected_feed.py
new file mode 100644
index 0000000000..618c8f47cb
--- /dev/null
+++ b/tensorflow/g3doc/tutorials/mnist/fully_connected_feed.py
@@ -0,0 +1,219 @@
"""Trains and evaluates the MNIST network using a feed dictionary.
TensorFlow install instructions:
https://tensorflow.org/get_started/os_setup.html

MNIST tutorial:
https://tensorflow.org/tutorials/mnist/tf/index.html

"""
# pylint: disable=missing-docstring
import os.path
import time

import tensorflow.python.platform
import numpy
import tensorflow as tf

from tensorflow.g3doc.tutorials.mnist import input_data
from tensorflow.g3doc.tutorials.mnist import mnist


# Basic model parameters as external flags.
flags = tf.app.flags
FLAGS = flags.FLAGS
flags.DEFINE_float('learning_rate', 0.01, 'Initial learning rate.')
flags.DEFINE_integer('max_steps', 2000, 'Number of steps to run trainer.')
flags.DEFINE_integer('hidden1', 128, 'Number of units in hidden layer 1.')
flags.DEFINE_integer('hidden2', 32, 'Number of units in hidden layer 2.')
flags.DEFINE_integer('batch_size', 100, 'Batch size. '
                     'Must divide evenly into the dataset sizes.')
flags.DEFINE_string('train_dir', 'data', 'Directory to put the training data.')
flags.DEFINE_boolean('fake_data', False, 'If true, uses fake data '
                     'for unit testing.')


def placeholder_inputs(batch_size):
  """Generate placeholder variables to represent the input tensors.

  These placeholders are used as inputs by the rest of the model building
  code and will be fed from the downloaded data in the .run() loop, below.

  Args:
    batch_size: The batch size will be baked into both placeholders.

  Returns:
    images_placeholder: Images placeholder.
    labels_placeholder: Labels placeholder.
  """
  # Note that the shapes of the placeholders match the shapes of the full
  # image and label tensors, except the first dimension is now batch_size
  # rather than the full size of the train or test data sets.
  images_placeholder = tf.placeholder(tf.float32, shape=(batch_size,
                                                         mnist.IMAGE_PIXELS))
  labels_placeholder = tf.placeholder(tf.int32, shape=(batch_size))
  return images_placeholder, labels_placeholder


def fill_feed_dict(data_set, images_pl, labels_pl):
  """Fills the feed_dict for training the given step.

  A feed_dict takes the form of:
  feed_dict = {
      <placeholder>: <tensor of values to be passed for placeholder>,
      ....
  }

  Args:
    data_set: The set of images and labels, from input_data.read_data_sets()
    images_pl: The images placeholder, from placeholder_inputs().
    labels_pl: The labels placeholder, from placeholder_inputs().

  Returns:
    feed_dict: The feed dictionary mapping from placeholders to values.
  """
  # Create the feed_dict for the placeholders filled with the next
  # `batch_size` examples.
  images_feed, labels_feed = data_set.next_batch(FLAGS.batch_size,
                                                 FLAGS.fake_data)
  feed_dict = {
      images_pl: images_feed,
      labels_pl: labels_feed,
  }
  return feed_dict


def do_eval(sess,
            eval_correct,
            images_placeholder,
            labels_placeholder,
            data_set):
  """Runs one evaluation against the full epoch of data.

  Args:
    sess: The session in which the model has been trained.
    eval_correct: The Tensor that returns the number of correct predictions.
    images_placeholder: The images placeholder.
    labels_placeholder: The labels placeholder.
    data_set: The set of images and labels to evaluate, from
      input_data.read_data_sets().
  """
  # And run one epoch of eval.
  true_count = 0  # Counts the number of correct predictions.
+ steps_per_epoch = int(data_set.num_examples / FLAGS.batch_size) + num_examples = steps_per_epoch * FLAGS.batch_size + for step in xrange(steps_per_epoch): + feed_dict = fill_feed_dict(data_set, + images_placeholder, + labels_placeholder) + true_count += sess.run(eval_correct, feed_dict=feed_dict) + precision = float(true_count) / float(num_examples) + print ' Num examples: %d Num correct: %d Precision @ 1: %0.04f' % ( + num_examples, true_count, precision) + + +def run_training(): + """Train MNIST for a number of steps.""" + # Get the sets of images and labels for training, validation, and + # test on MNIST. + data_sets = input_data.read_data_sets(FLAGS.train_dir, FLAGS.fake_data) + + # Tell TensorFlow that the model will be built into the default Graph. + with tf.Graph().as_default(): + # Generate placeholders for the images and labels. + images_placeholder, labels_placeholder = placeholder_inputs( + FLAGS.batch_size) + + # Build a Graph that computes predictions from the inference model. + logits = mnist.inference(images_placeholder, + FLAGS.hidden1, + FLAGS.hidden2) + + # Add to the Graph the Ops for loss calculation. + loss = mnist.loss(logits, labels_placeholder) + + # Add to the Graph the Ops that calculate and apply gradients. + train_op = mnist.training(loss, FLAGS.learning_rate) + + # Add the Op to compare the logits to the labels during evaluation. + eval_correct = mnist.evaluation(logits, labels_placeholder) + + # Build the summary operation based on the TF collection of Summaries. + summary_op = tf.merge_all_summaries() + + # Create a saver for writing training checkpoints. + saver = tf.train.Saver() + + # Create a session for running Ops on the Graph. + sess = tf.Session() + + # Run the Op to initialize the variables. + init = tf.initialize_all_variables() + sess.run(init) + + # Instantiate a SummaryWriter to output summaries and the Graph. + summary_writer = tf.train.SummaryWriter(FLAGS.train_dir, + graph_def=sess.graph_def) + + # And then after everything is built, start the training loop. + for step in xrange(FLAGS.max_steps): + start_time = time.time() + + # Fill a feed dictionary with the actual set of images and labels + # for this particular training step. + feed_dict = fill_feed_dict(data_sets.train, + images_placeholder, + labels_placeholder) + + # Run one step of the model. The return values are the activations + # from the `train_op` (which is discarded) and the `loss` Op. To + # inspect the values of your Ops or variables, you may include them + # in the list passed to sess.run() and the value tensors will be + # returned in the tuple from the call. + _, loss_value = sess.run([train_op, loss], + feed_dict=feed_dict) + + duration = time.time() - start_time + + # Write the summaries and print an overview fairly often. + if step % 100 == 0: + # Print status to stdout. + print 'Step %d: loss = %.2f (%.3f sec)' % (step, + loss_value, + duration) + # Update the events file. + summary_str = sess.run(summary_op, feed_dict=feed_dict) + summary_writer.add_summary(summary_str, step) + + # Save a checkpoint and evaluate the model periodically. + if (step + 1) % 1000 == 0 or (step + 1) == FLAGS.max_steps: + saver.save(sess, FLAGS.train_dir, global_step=step) + # Evaluate against the training set. + print 'Training Data Eval:' + do_eval(sess, + eval_correct, + images_placeholder, + labels_placeholder, + data_sets.train) + # Evaluate against the validation set. 
+ print 'Validation Data Eval:' + do_eval(sess, + eval_correct, + images_placeholder, + labels_placeholder, + data_sets.validation) + # Evaluate against the test set. + print 'Test Data Eval:' + do_eval(sess, + eval_correct, + images_placeholder, + labels_placeholder, + data_sets.test) + + +def main(_): + run_training() + + +if __name__ == '__main__': + tf.app.run() diff --git a/tensorflow/g3doc/tutorials/mnist/input_data.py b/tensorflow/g3doc/tutorials/mnist/input_data.py new file mode 100644 index 0000000000..88892027ff --- /dev/null +++ b/tensorflow/g3doc/tutorials/mnist/input_data.py @@ -0,0 +1,175 @@ +"""Functions for downloading and reading MNIST data.""" +import gzip +import os +import urllib + +import numpy + +SOURCE_URL = 'http://yann.lecun.com/exdb/mnist/' + + +def maybe_download(filename, work_directory): + """Download the data from Yann's website, unless it's already here.""" + if not os.path.exists(work_directory): + os.mkdir(work_directory) + filepath = os.path.join(work_directory, filename) + if not os.path.exists(filepath): + filepath, _ = urllib.urlretrieve(SOURCE_URL + filename, filepath) + statinfo = os.stat(filepath) + print 'Succesfully downloaded', filename, statinfo.st_size, 'bytes.' + return filepath + + +def _read32(bytestream): + dt = numpy.dtype(numpy.uint32).newbyteorder('>') + return numpy.frombuffer(bytestream.read(4), dtype=dt) + + +def extract_images(filename): + """Extract the images into a 4D uint8 numpy array [index, y, x, depth].""" + print 'Extracting', filename + with gzip.open(filename) as bytestream: + magic = _read32(bytestream) + if magic != 2051: + raise ValueError( + 'Invalid magic number %d in MNIST image file: %s' % + (magic, filename)) + num_images = _read32(bytestream) + rows = _read32(bytestream) + cols = _read32(bytestream) + buf = bytestream.read(rows * cols * num_images) + data = numpy.frombuffer(buf, dtype=numpy.uint8) + data = data.reshape(num_images, rows, cols, 1) + return data + + +def dense_to_one_hot(labels_dense, num_classes=10): + """Convert class labels from scalars to one-hot vectors.""" + num_labels = labels_dense.shape[0] + index_offset = numpy.arange(num_labels) * num_classes + labels_one_hot = numpy.zeros((num_labels, num_classes)) + labels_one_hot.flat[index_offset + labels_dense.ravel()] = 1 + return labels_one_hot + + +def extract_labels(filename, one_hot=False): + """Extract the labels into a 1D uint8 numpy array [index].""" + print 'Extracting', filename + with gzip.open(filename) as bytestream: + magic = _read32(bytestream) + if magic != 2049: + raise ValueError( + 'Invalid magic number %d in MNIST label file: %s' % + (magic, filename)) + num_items = _read32(bytestream) + buf = bytestream.read(num_items) + labels = numpy.frombuffer(buf, dtype=numpy.uint8) + if one_hot: + return dense_to_one_hot(labels) + return labels + + +class DataSet(object): + + def __init__(self, images, labels, fake_data=False): + if fake_data: + self._num_examples = 10000 + else: + assert images.shape[0] == labels.shape[0], ( + "images.shape: %s labels.shape: %s" % (images.shape, + labels.shape)) + self._num_examples = images.shape[0] + + # Convert shape from [num examples, rows, columns, depth] + # to [num examples, rows*columns] (assuming depth == 1) + assert images.shape[3] == 1 + images = images.reshape(images.shape[0], + images.shape[1] * images.shape[2]) + # Convert from [0, 255] -> [0.0, 1.0]. 
+ images = images.astype(numpy.float32) + images = numpy.multiply(images, 1.0 / 255.0) + self._images = images + self._labels = labels + self._epochs_completed = 0 + self._index_in_epoch = 0 + + @property + def images(self): + return self._images + + @property + def labels(self): + return self._labels + + @property + def num_examples(self): + return self._num_examples + + @property + def epochs_completed(self): + return self._epochs_completed + + def next_batch(self, batch_size, fake_data=False): + """Return the next `batch_size` examples from this data set.""" + if fake_data: + fake_image = [1.0 for _ in xrange(784)] + fake_label = 0 + return [fake_image for _ in xrange(batch_size)], [ + fake_label for _ in xrange(batch_size)] + start = self._index_in_epoch + self._index_in_epoch += batch_size + if self._index_in_epoch > self._num_examples: + # Finished epoch + self._epochs_completed += 1 + # Shuffle the data + perm = numpy.arange(self._num_examples) + numpy.random.shuffle(perm) + self._images = self._images[perm] + self._labels = self._labels[perm] + # Start next epoch + start = 0 + self._index_in_epoch = batch_size + assert batch_size <= self._num_examples + end = self._index_in_epoch + return self._images[start:end], self._labels[start:end] + + +def read_data_sets(train_dir, fake_data=False, one_hot=False): + class DataSets(object): + pass + data_sets = DataSets() + + if fake_data: + data_sets.train = DataSet([], [], fake_data=True) + data_sets.validation = DataSet([], [], fake_data=True) + data_sets.test = DataSet([], [], fake_data=True) + return data_sets + + TRAIN_IMAGES = 'train-images-idx3-ubyte.gz' + TRAIN_LABELS = 'train-labels-idx1-ubyte.gz' + TEST_IMAGES = 't10k-images-idx3-ubyte.gz' + TEST_LABELS = 't10k-labels-idx1-ubyte.gz' + VALIDATION_SIZE = 5000 + + local_file = maybe_download(TRAIN_IMAGES, train_dir) + train_images = extract_images(local_file) + + local_file = maybe_download(TRAIN_LABELS, train_dir) + train_labels = extract_labels(local_file, one_hot=one_hot) + + local_file = maybe_download(TEST_IMAGES, train_dir) + test_images = extract_images(local_file) + + local_file = maybe_download(TEST_LABELS, train_dir) + test_labels = extract_labels(local_file, one_hot=one_hot) + + validation_images = train_images[:VALIDATION_SIZE] + validation_labels = train_labels[:VALIDATION_SIZE] + train_images = train_images[VALIDATION_SIZE:] + train_labels = train_labels[VALIDATION_SIZE:] + + data_sets.train = DataSet(train_images, train_labels) + data_sets.validation = DataSet(validation_images, validation_labels) + data_sets.test = DataSet(test_images, test_labels) + + return data_sets diff --git a/tensorflow/g3doc/tutorials/mnist/mnist.py b/tensorflow/g3doc/tutorials/mnist/mnist.py new file mode 100644 index 0000000000..acf4d01dd1 --- /dev/null +++ b/tensorflow/g3doc/tutorials/mnist/mnist.py @@ -0,0 +1,148 @@ +"""Builds the MNIST network. + +Implements the inference/loss/training pattern for model building. + +1. inference() - Builds the model as far as is required for running the network +forward to make predictions. +2. loss() - Adds to the inference model the layers required to generate loss. +3. training() - Adds to the loss model the Ops required to generate and +apply gradients. + +This file is used by the various "fully_connected_*.py" files and not meant to +be run. 
TensorFlow install instructions:
https://tensorflow.org/get_started/os_setup.html

MNIST tutorial:
https://tensorflow.org/tutorials/mnist/tf/index.html
"""
import math

import tensorflow.python.platform
import tensorflow as tf

# The MNIST dataset has 10 classes, representing the digits 0 through 9.
NUM_CLASSES = 10

# The MNIST images are always 28x28 pixels.
IMAGE_SIZE = 28
IMAGE_PIXELS = IMAGE_SIZE * IMAGE_SIZE


def inference(images, hidden1_units, hidden2_units):
  """Build the MNIST model up to where it may be used for inference.

  Args:
    images: Images placeholder, from inputs().
    hidden1_units: Size of the first hidden layer.
    hidden2_units: Size of the second hidden layer.

  Returns:
    softmax_linear: Output tensor with the computed logits.
  """
  # Hidden 1
  with tf.name_scope('hidden1') as scope:
    weights = tf.Variable(
        tf.truncated_normal([IMAGE_PIXELS, hidden1_units],
                            stddev=1.0 / math.sqrt(float(IMAGE_PIXELS))),
        name='weights')
    biases = tf.Variable(tf.zeros([hidden1_units]),
                         name='biases')
    hidden1 = tf.nn.relu(tf.matmul(images, weights) + biases)
  # Hidden 2
  with tf.name_scope('hidden2') as scope:
    weights = tf.Variable(
        tf.truncated_normal([hidden1_units, hidden2_units],
                            stddev=1.0 / math.sqrt(float(hidden1_units))),
        name='weights')
    biases = tf.Variable(tf.zeros([hidden2_units]),
                         name='biases')
    hidden2 = tf.nn.relu(tf.matmul(hidden1, weights) + biases)
  # Linear
  with tf.name_scope('softmax_linear') as scope:
    weights = tf.Variable(
        tf.truncated_normal([hidden2_units, NUM_CLASSES],
                            stddev=1.0 / math.sqrt(float(hidden2_units))),
        name='weights')
    biases = tf.Variable(tf.zeros([NUM_CLASSES]),
                         name='biases')
    logits = tf.matmul(hidden2, weights) + biases
  return logits


def loss(logits, labels):
  """Calculates the loss from the logits and the labels.

  Args:
    logits: Logits tensor, float - [batch_size, NUM_CLASSES].
    labels: Labels tensor, int32 - [batch_size].

  Returns:
    loss: Loss tensor of type float.
  """
  # Convert from sparse integer labels in the range [0, NUM_CLASSES)
  # to 1-hot dense float vectors (that is, we will have batch_size vectors,
  # each with NUM_CLASSES values, all of which are 0.0 except there will
  # be a 1.0 in the entry corresponding to the label).
  batch_size = tf.size(labels)
  labels = tf.expand_dims(labels, 1)
  indices = tf.expand_dims(tf.range(0, batch_size, 1), 1)
  concated = tf.concat(1, [indices, labels])
  onehot_labels = tf.sparse_to_dense(
      concated, tf.pack([batch_size, NUM_CLASSES]), 1.0, 0.0)
  cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits,
                                                          onehot_labels,
                                                          name='xentropy')
  loss = tf.reduce_mean(cross_entropy, name='xentropy_mean')
  return loss


def training(loss, learning_rate):
  """Sets up the training Ops.

  Creates a summarizer to track the loss over time in TensorBoard.

  Creates an optimizer and applies the gradients to all trainable variables.

  The Op returned by this function is what must be passed to the
  `sess.run()` call to cause the model to train.

  Args:
    loss: Loss tensor, from loss().
    learning_rate: The learning rate to use for gradient descent.

  Returns:
    train_op: The Op for training.
  """
  # Add a scalar summary for the snapshot loss.
  tf.scalar_summary(loss.op.name, loss)
  # Create the gradient descent optimizer with the given learning rate.
  optimizer = tf.train.GradientDescentOptimizer(learning_rate)
  # Create a variable to track the global step.
  global_step = tf.Variable(0, name='global_step', trainable=False)
  # Use the optimizer to apply the gradients that minimize the loss
  # (and also increment the global step counter) as a single training step.
  train_op = optimizer.minimize(loss, global_step=global_step)
  return train_op


def evaluation(logits, labels):
  """Evaluate the quality of the logits at predicting the label.

  Args:
    logits: Logits tensor, float - [batch_size, NUM_CLASSES].
    labels: Labels tensor, int32 - [batch_size], with values in the
      range [0, NUM_CLASSES).

  Returns:
    A scalar int32 tensor with the number of examples (out of batch_size)
    that were predicted correctly.
  """
  # For a classifier model, we can use the in_top_k Op.
  # It returns a bool tensor with shape [batch_size] that is true for
  # the examples where the label was in the top k (here k=1)
  # of all logits for that example.
  correct = tf.nn.in_top_k(logits, labels, 1)
  # Return the number of true entries.
  return tf.reduce_sum(tf.cast(correct, tf.int32))
diff --git a/tensorflow/g3doc/tutorials/mnist/mnist_softmax.py b/tensorflow/g3doc/tutorials/mnist/mnist_softmax.py
new file mode 100644
index 0000000000..640ea29dac
--- /dev/null
+++ b/tensorflow/g3doc/tutorials/mnist/mnist_softmax.py
@@ -0,0 +1,33 @@
"""A very simple MNIST classifier.

See extensive documentation at ??????? (insert public URL)
"""

# Import data
import input_data
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)

import tensorflow as tf
sess = tf.InteractiveSession()

# Create the model
x = tf.placeholder("float", [None, 784])
W = tf.Variable(tf.zeros([784,10]))
b = tf.Variable(tf.zeros([10]))
y = tf.nn.softmax(tf.matmul(x,W) + b)

# Define loss and optimizer
y_ = tf.placeholder("float", [None,10])
cross_entropy = -tf.reduce_sum(y_*tf.log(y))
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)

# Train
tf.initialize_all_variables().run()
for i in range(1000):
  batch_xs, batch_ys = mnist.train.next_batch(100)
  train_step.run({x: batch_xs, y_: batch_ys})

# Test trained model
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
print accuracy.eval({x: mnist.test.images, y_: mnist.test.labels})
diff --git a/tensorflow/g3doc/tutorials/mnist/pros/index.md b/tensorflow/g3doc/tutorials/mnist/pros/index.md
new file mode 100644
index 0000000000..17696712b0
--- /dev/null
+++ b/tensorflow/g3doc/tutorials/mnist/pros/index.md
@@ -0,0 +1,390 @@
# MNIST Deep Learning Example (For Experts)

TensorFlow is a powerful library for doing large-scale numerical computation.
One of the tasks at which it excels is implementing and training deep neural
networks. In this tutorial we will learn the basic building blocks of a
TensorFlow model while constructing a deep convolutional MNIST classifier.

*This introduction assumes familiarity with neural networks and the MNIST
dataset. If you don't have a background with them, check out the
[introduction for beginners](../beginners/index.md).*

## Setup

Before we create our model, we will first load the MNIST dataset, and start a
TensorFlow session.

### Load MNIST Data

For your convenience, we've included [a script](../input_data.py) which
automatically downloads and imports the MNIST dataset. It will create a
directory `'MNIST_data'` in which to store the data files.
```python
import input_data
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)
```

Here `mnist` is a lightweight class which stores the training, validation, and
testing sets as NumPy arrays. It also provides a function for iterating through
data minibatches, which we will use below.

### Start TensorFlow Session

TensorFlow relies on a highly efficient C++ backend to do its computation. The
connection to this backend is called a session. We will need to create a session
before we can do any computation.

```python
import tensorflow as tf
sess = tf.InteractiveSession()
```

Using an `InteractiveSession` makes TensorFlow more flexible about how you
structure your code. It allows you to interleave operations which build a
[computation graph](../../../get_started/basic_usage.md#the-computation-graph)
with ones that run the graph. This is particularly convenient when working in
interactive contexts like IPython. If you are not using an
`InteractiveSession`, then you should build the entire computation graph before
starting a session and [launching the
graph](../../../get_started/basic_usage.md#launching-the-graph-in-a-session).

#### Computation Graph

To do efficient numerical computing in Python, we typically use libraries like
NumPy that do expensive operations such as matrix multiplication outside Python,
using highly efficient code implemented in another language.
Unfortunately, there can still be a lot of overhead from switching back to
Python every operation. This overhead is especially bad if you want to run
computations on GPUs or in a distributed manner, where there can be a high cost
to transferring data.

TensorFlow also does its heavy lifting outside Python,
but it takes things a step further to avoid this overhead.
Instead of running a single expensive operation independently
from Python, TensorFlow lets us describe a graph of interacting operations that
run entirely outside Python.
This approach is similar to that used in Theano or Torch.

The role of the Python code is therefore to build this external computation
graph, and to dictate which parts of the computation graph should be run. See
the
[Computation Graph](../../../get_started/basic_usage.md#the-computation-graph)
section of
[Basic Usage](../../../get_started/basic_usage.md)
for more detail.

## Build a Softmax Regression Model

In this section we will build a softmax regression model with a single linear
layer. In the next section, we will extend this to the case of softmax
regression with a multilayer convolutional network.

### Placeholders

We start building the computation graph by creating nodes for the
input images and target output classes.

```python
x = tf.placeholder("float", shape=[None, 784])
y_ = tf.placeholder("float", shape=[None, 10])
```

Here `x` and `y_` aren't specific values. Rather, they are each a `placeholder`
-- a value that we'll input when we ask TensorFlow to run a computation.

The input images `x` will consist of a 2d tensor of floating point numbers.
Here we assign it a `shape` of `[None, 784]`, where `784` is the dimensionality of
a single flattened MNIST image, and `None` indicates that the first dimension,
corresponding to the batch size, can be of any size.
The target output classes `y_` will also consist of a 2d tensor,
where each row is a one-hot 10-dimensional vector indicating
which digit class the corresponding MNIST image belongs to.
The `shape` argument to `placeholder` is optional, but it allows TensorFlow
to automatically catch bugs stemming from inconsistent tensor shapes.

### Variables

We now define the weights `W` and biases `b` for our model. We could imagine
treating these like additional inputs, but TensorFlow has an even better way to
handle them: `Variable`.
A `Variable` is a value that lives in TensorFlow's computation graph.
It can be used and even modified by the computation. In machine
learning applications, one generally has the model parameters be `Variable`s.

```python
W = tf.Variable(tf.zeros([784,10]))
b = tf.Variable(tf.zeros([10]))
```

We pass the initial value for each parameter in the call to `tf.Variable`.
In this case, we initialize both `W` and `b` as tensors full of
zeros. `W` is a 784x10 matrix (because we have 784 input features
and 10 outputs) and `b` is a 10-dimensional vector (because we have 10 classes).

Before `Variable`s can be used within a session, they must be initialized using
that session. This step takes the initial values (in this case tensors full of
zeros) that have already been specified, and assigns them to each `Variable`.
This can be done for all `Variable`s at once.

```python
sess.run(tf.initialize_all_variables())
```

### Predicted Class and Cost Function

We can now implement our regression model. It only takes one line!
We multiply the vectorized input images `x` by the weight matrix `W`, add
the bias `b`, and compute the softmax probabilities that are assigned to each
class.

```python
y = tf.nn.softmax(tf.matmul(x,W) + b)
```

The cost function to be minimized during training can be specified just as
easily. Our cost function will be the cross-entropy between the target and the
model's prediction.

```python
cross_entropy = -tf.reduce_sum(y_*tf.log(y))
```

Note that `tf.reduce_sum` sums across all images in the minibatch, as well as
all classes. We are computing the cross-entropy for the entire minibatch.

## Train the Model

Now that we have defined our model and training cost function, it is
straightforward to train using TensorFlow.
Because TensorFlow knows the entire computation graph, it
can use automatic differentiation to find the gradients of the cost with
respect to each of the variables.
TensorFlow has a variety of
[builtin optimization algorithms](../../../api_docs/python/train.md?#optimizers).
For this example, we will use steepest gradient descent, with a step length of
0.01, to descend the cross-entropy.

```python
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)
```

What TensorFlow actually did in that single line was to add new operations to
the computation graph. These operations included ones to compute gradients,
compute parameter update steps, and apply update steps to the parameters.

The returned operation `train_step`, when run, will apply the gradient
descent updates to the parameters. Training the model can therefore be
accomplished by repeatedly running `train_step`.

```python
for i in range(1000):
  batch = mnist.train.next_batch(50)
  train_step.run(feed_dict={x: batch[0], y_: batch[1]})
```

On each training iteration we load 50 training examples. We then run the
`train_step` operation, using `feed_dict` to replace the `placeholder` tensors
`x` and `y_` with the training examples.
Note that you can replace any tensor in your computation graph using `feed_dict`
-- it's not restricted to just `placeholder`s.
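As a quick illustration of that last point (a toy example of our own, not part
of the tutorial), we can feed a value directly for the computed tensor `y`,
bypassing the model entirely; a uniform guess over 10 classes gives a
cross-entropy of 50 * ln(10) for a batch of 50 (here `batch` is the last
minibatch from the loop above):

```python
import numpy as np

# Override the model's output `y` for one run by feeding it directly.
uniform_y = np.full((50, 10), 0.1, dtype="float32")
print cross_entropy.eval(feed_dict={y: uniform_y, y_: batch[1]})
# Prints roughly 115.13, i.e. 50 * ln(10).
```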
### Evaluate the Model

How well did our model do?

First we'll figure out where we predicted the correct label. `tf.argmax`
is an extremely useful function which gives you the index of the highest entry
in a tensor along some axis. For example, `tf.argmax(y,1)` is the label our
model thinks is most likely for each input, while `tf.argmax(y_,1)` is the
true label. We can use `tf.equal` to check if our prediction matches the
truth.

```python
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
```

That gives us a list of booleans. To determine what fraction are correct, we
cast to floating point numbers and then take the mean. For example,
`[True, False, True, True]` would become `[1,0,1,1]`, which would become `0.75`.

```python
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
```

Finally, we can evaluate our accuracy on the test data. This should be about
91% correct.

```python
print accuracy.eval(feed_dict={x: mnist.test.images, y_: mnist.test.labels})
```

## Build a Multilayer Convolutional Network

Getting 91% accuracy on MNIST is bad. It's almost embarrassingly bad. In this
section, we'll fix that, jumping from a very simple model to something
moderately sophisticated: a small convolutional neural network. This will get
us to around 99.2% accuracy -- not state of the art, but respectable.

### Weight Initialization

To create this model, we're going to need to create a lot of weights and biases.
One should generally initialize weights with a small amount of noise for
symmetry breaking, and to prevent 0 gradients. Since we're using ReLU neurons,
it is also good practice to initialize them with a slightly positive initial
bias to avoid "dead neurons." Instead of doing this repeatedly while we build
the model, let's create two handy functions to do it for us.

```python
def weight_variable(shape):
  initial = tf.truncated_normal(shape, stddev=0.1)
  return tf.Variable(initial)

def bias_variable(shape):
  initial = tf.constant(0.1, shape=shape)
  return tf.Variable(initial)
```

### Convolution and Pooling

TensorFlow also gives us a lot of flexibility in convolution and pooling
operations. How do we handle the boundaries? What is our stride size?
In this example, we're always going to choose the vanilla version.
Our convolutions use a stride of one and are zero-padded so that the
output is the same size as the input. Our pooling is plain old max pooling
over 2x2 blocks. To keep our code cleaner, let's also abstract those operations
into functions.

```python
def conv2d(x, W):
  return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
  return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                        strides=[1, 2, 2, 1], padding='SAME')
```

### First Convolutional Layer

We can now implement our first layer. It will consist of convolution, followed
by max pooling. The convolution will compute 32 features for each 5x5 patch.
Its weight tensor will have a shape of `[5, 5, 1, 32]`. The first two
dimensions are the patch size, the next is the number of input channels, and
the last is the number of output channels. We will also have a bias vector with
a component for each output channel.
+ +```python +W_conv1 = weight_variable([5, 5, 1, 32]) +b_conv1 = bias_variable([32]) +``` + +To apply the layer, we first reshape `x` to a 4d tensor, with the second and +third dimensions corresponding to image width and height, and the final +dimension corresponding to the number of color channels. + +```python +x_image = tf.reshape(x, [-1,28,28,1]) +``` + +We then convolve `x_image` with the weight tensor, add the +bias, apply the ReLU function, and finally max pool. + +```python +h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1) +h_pool1 = max_pool_2x2(h_conv1) +``` + +### Second Convolutional Layer + +In order to build a deep network, we stack several layers of this type. The +second layer will have 64 features for each 5x5 patch. + +```python +W_conv2 = weight_variable([5, 5, 32, 64]) +b_conv2 = bias_variable([64]) + +h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2) +h_pool2 = max_pool_2x2(h_conv2) +``` + +### Densely Connected Layer + +Now that the image size has been reduced to 7x7, we add a fully-connected layer +with 1024 neurons to allow processing on the entire image. We reshape the tensor +from the pooling layer into a batch of vectors, +multiply by a weight matrix, add a bias, and apply a ReLU. + +```python +W_fc1 = weight_variable([7 * 7 * 64, 1024]) +b_fc1 = bias_variable([1024]) + +h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64]) +h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1) +``` + +#### Dropout + +To reduce overfitting, we will apply dropout before the readout layer. +We create a `placeholder` for the probability that a neuron's output is kept +during dropout. This allows us to turn dropout on during training, and turn it +off during testing. +TensorFlow's `tf.nn.dropout` op automatically handles scaling neuron outputs in +addition to masking them, so dropout just works without any additional scaling. + +```python +keep_prob = tf.placeholder("float") +h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob) +``` + +### Readout Layer + +Finally, we add a softmax layer, just like for the one layer softmax regression +above. + +```python +W_fc2 = weight_variable([1024, 10]) +b_fc2 = bias_variable([10]) + +y_conv=tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2) +``` + +### Train and Evaluate the Model + +How well does this model do? +To train and evaluate it we will use code that is nearly identical to that for +the simple one layer SoftMax network above. +The differences are that: we will replace the steepest gradient descent +optimizer with the more sophisticated ADAM optimizer; we will include the +additional parameter `keep_prob` in `feed_dict` to control the dropout rate; +and we will add logging to every 100th iteration in the training process. + +```python +cross_entropy = -tf.reduce_sum(y_*tf.log(y_conv)) +train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy) +correct_prediction = tf.equal(tf.argmax(y_conv,1), tf.argmax(y_,1)) +accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float")) +sess.run(tf.initialize_all_variables()) +for i in range(20000): + batch = mnist.train.next_batch(50) + if i%100 == 0: + train_accuracy = accuracy.eval(feed_dict={ + x:batch[0], y_: batch[1], keep_prob: 1.0}) + print "step %d, training accuracy %g"%(i, train_accuracy) + train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5}) + +print "test accuracy %g"%accuracy.eval(feed_dict={ + x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}) +``` + +The final test set accuracy after running this code should be approximately 99.2%. 
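One practical aside (this helper is our own sketch, not part of the tutorial
code): evaluating all 10,000 test images in a single `eval()` call can exhaust
memory on smaller GPUs. If that happens, the same accuracy can be accumulated
over batches, assuming the `accuracy`, `x`, `y_`, and `keep_prob` tensors
defined above:

```python
# 100 batches of 100 cover the 10,000-image test set exactly once.
correct = 0.0
for i in range(100):
  batch = mnist.test.next_batch(100)
  correct += accuracy.eval(feed_dict={x: batch[0], y_: batch[1],
                                      keep_prob: 1.0}) * 100
print "test accuracy %g" % (correct / 10000)
```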
We have learned how to quickly and easily build, train, and evaluate a
fairly sophisticated deep learning model using TensorFlow.

diff --git a/tensorflow/g3doc/tutorials/mnist/tf/index.md b/tensorflow/g3doc/tutorials/mnist/tf/index.md
new file mode 100644
index 0000000000..86f3296287
--- /dev/null
+++ b/tensorflow/g3doc/tutorials/mnist/tf/index.md
@@ -0,0 +1,513 @@
# Handwritten Digit Classification

Code: [tensorflow/g3doc/tutorials/mnist/](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/g3doc/tutorials/mnist/)

The goal of this tutorial is to show how to use TensorFlow to train and
evaluate a simple feed-forward neural network for handwritten digit
classification using the (classic) MNIST data set. The intended audience for
this tutorial is experienced machine learning users interested in using
TensorFlow.

These tutorials are not intended to teach machine learning in general.

Please ensure you have followed the instructions for
[installing TensorFlow](../../../get_started/os_setup.md).

## Tutorial Files

This tutorial references the following files:

File | Purpose
--- | ---
[`mnist.py`](../mnist.py) | The code to build a fully-connected MNIST model.
[`fully_connected_feed.py`](../fully_connected_feed.py) | The main code, to train the built MNIST model against the downloaded dataset using a feed dictionary.

Simply run the `fully_connected_feed.py` file directly to start training:

`python fully_connected_feed.py`

## Prepare the Data

MNIST is a classic problem in machine learning. The problem is to look at
greyscale 28x28 pixel images of handwritten digits and determine which digit
the image represents, for all the digits from zero to nine.

![MNIST Digits](./mnist_digits.png "MNIST Digits")

For more information, refer to [Yann LeCun's MNIST page](http://yann.lecun.com/exdb/mnist/)
or [Chris Olah's visualizations of MNIST](http://colah.github.io/posts/2014-10-Visualizing-MNIST/).

### Download

At the top of the `run_training()` method, the `input_data.read_data_sets()`
function will ensure that the correct data has been downloaded to your local
training folder and then unpack that data to return a dictionary of `DataSet`
instances.

```python
data_sets = input_data.read_data_sets(FLAGS.train_dir, FLAGS.fake_data)
```

**NOTE**: The `fake_data` flag is used for unit-testing purposes and may be
safely ignored by the reader.

Dataset | Purpose
--- | ---
`data_sets.train` | 55000 images and labels, for primary training.
`data_sets.validation` | 5000 images and labels, for iterative validation of training accuracy.
`data_sets.test` | 10000 images and labels, for final testing of trained accuracy.

For more information about the data, please read the [`Download`](../download/index.md)
tutorial.

### Inputs and Placeholders

The `placeholder_inputs()` function creates two [`tf.placeholder`](../../../api_docs/python/io_ops.md#placeholder)
ops that define the shape of the inputs, including the `batch_size`, for the
rest of the graph; the actual training examples will be fed into these
placeholders.

```python
images_placeholder = tf.placeholder(tf.float32, shape=(batch_size,
                                                       IMAGE_PIXELS))
labels_placeholder = tf.placeholder(tf.int32, shape=(batch_size))
```

Further down, in the training loop, the full image and label datasets are
sliced to fit the `batch_size` for each step, matched with these placeholder
ops, and then passed into the `sess.run()` function using the `feed_dict`
parameter.
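As a minimal, self-contained illustration of how these placeholders pair with
a feed (our sketch, not code from the tutorial files; the zero arrays stand in
for real MNIST batches):

```python
import numpy as np
import tensorflow as tf

IMAGE_PIXELS = 28 * 28  # as defined in mnist.py
batch_size = 100

images_placeholder = tf.placeholder(tf.float32, shape=(batch_size,
                                                       IMAGE_PIXELS))
labels_placeholder = tf.placeholder(tf.int32, shape=(batch_size))

# Stand-in batch; in the real code this comes from data_set.next_batch().
images_feed = np.zeros((batch_size, IMAGE_PIXELS), dtype=np.float32)
labels_feed = np.zeros((batch_size,), dtype=np.int32)

with tf.Session() as sess:
  # Fetching a fed placeholder simply echoes the fed value, which is
  # enough to show how placeholders and feed_dict entries pair up.
  fed = sess.run(images_placeholder,
                 feed_dict={images_placeholder: images_feed,
                            labels_placeholder: labels_feed})
  print fed.shape  # (100, 784)
```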
## Build the Graph

After creating placeholders for the data, the graph is built from the
`mnist.py` file according to a 3-stage pattern: `inference()`, `loss()`, and
`training()`.

1. `inference()` - Builds the graph as far as is required for running
the network forward to make predictions.
1. `loss()` - Adds to the inference graph the ops required to generate
loss.
1. `training()` - Adds to the loss graph the ops required to compute
and apply gradients.

<div style="width:95%; margin:auto; margin-bottom:10px; margin-top:20px;">
  <img style="width:100%" src="./mnist_subgraph.png">
</div>

### Inference

The `inference()` function builds the graph as far as needed to
return the tensor that would contain the output predictions.

It takes the images placeholder as input and builds on top
of it a pair of fully connected layers with ReLU activation followed by a
ten-node linear layer specifying the output logits.

Each layer is created beneath a unique [`tf.name_scope`](../../../api_docs/python/framework.md#name_scope)
that acts as a prefix to the items created within that scope.

```python
with tf.name_scope('hidden1') as scope:
```

Within the defined scope, the weights and biases to be used by each of these
layers are generated into [`tf.Variable`](../../../api_docs/python/state_ops.md#Variable)
instances, with their desired shapes:

```python
weights = tf.Variable(
    tf.truncated_normal([IMAGE_PIXELS, hidden1_units],
                        stddev=1.0 / math.sqrt(float(IMAGE_PIXELS))),
    name='weights')
biases = tf.Variable(tf.zeros([hidden1_units]),
                     name='biases')
```

When, for instance, these are created under the `hidden1` scope, the unique
name given to the weights variable would be "`hidden1/weights`".

Each variable is given an initializer op as part of its construction.

In the most common case, the weights are initialized with
[`tf.truncated_normal`](../../../api_docs/python/constant_op.md#truncated_normal)
and given the shape of a 2d tensor, where
the first dim represents the number of units in the layer from which the
weights connect and the second dim represents the number of
units in the layer to which the weights connect. For the first layer, named
`hidden1`, the dimensions are `[IMAGE_PIXELS, hidden1_units]` because the
weights are connecting the image inputs to the hidden1 layer. The
`tf.truncated_normal` initializer generates values from a truncated normal
distribution with a given mean and standard deviation.

Then the biases are initialized with [`tf.zeros`](../../../api_docs/python/constant_op.md#zeros)
to ensure they start with all zero values, and their shape is simply the
number of units in the layer to which they connect.

The graph's three primary ops -- two [`tf.nn.relu`](../../../api_docs/python/nn.md#relu)
ops wrapping [`tf.matmul`](../../../api_docs/python/math_ops.md#matmul)
for the hidden layers and one extra `tf.matmul` for the logits -- are then
created, each in turn, with their `tf.Variable` instances connected to the
input placeholder or to the output tensor of the layer beneath. (In each of
the following snippets, `weights` and `biases` refer to the variables created
in that layer's own name scope.)

```python
hidden1 = tf.nn.relu(tf.matmul(images, weights) + biases)
```

```python
hidden2 = tf.nn.relu(tf.matmul(hidden1, weights) + biases)
```

```python
logits = tf.matmul(hidden2, weights) + biases
```

Finally, the `logits` tensor that will contain the output is returned.
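Putting the three layers together, a condensed sketch of `inference()` looks
like the following. It closely follows `mnist.py` (consult that file for the
authoritative version); `IMAGE_PIXELS` and `NUM_CLASSES` are the constants
defined there.

```python
import math
import tensorflow as tf

IMAGE_PIXELS = 28 * 28  # as in mnist.py
NUM_CLASSES = 10        # digits 0 through 9

def inference(images, hidden1_units, hidden2_units):
  # Hidden layer 1: images -> hidden1_units ReLU outputs.
  with tf.name_scope('hidden1') as scope:
    weights = tf.Variable(
        tf.truncated_normal([IMAGE_PIXELS, hidden1_units],
                            stddev=1.0 / math.sqrt(float(IMAGE_PIXELS))),
        name='weights')
    biases = tf.Variable(tf.zeros([hidden1_units]), name='biases')
    hidden1 = tf.nn.relu(tf.matmul(images, weights) + biases)
  # Hidden layer 2: hidden1 -> hidden2_units ReLU outputs.
  with tf.name_scope('hidden2') as scope:
    weights = tf.Variable(
        tf.truncated_normal([hidden1_units, hidden2_units],
                            stddev=1.0 / math.sqrt(float(hidden1_units))),
        name='weights')
    biases = tf.Variable(tf.zeros([hidden2_units]), name='biases')
    hidden2 = tf.nn.relu(tf.matmul(hidden1, weights) + biases)
  # Linear readout: hidden2 -> one logit per class.
  with tf.name_scope('softmax_linear') as scope:
    weights = tf.Variable(
        tf.truncated_normal([hidden2_units, NUM_CLASSES],
                            stddev=1.0 / math.sqrt(float(hidden2_units))),
        name='weights')
    biases = tf.Variable(tf.zeros([NUM_CLASSES]), name='biases')
    logits = tf.matmul(hidden2, weights) + biases
  return logits
```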
### Loss

The `loss()` function further builds the graph by adding the required loss
ops.

First, the values from the `labels_placeholder` are encoded as a tensor of
1-hot values. For example, if the class identifier is '3' the value is
converted to:
<br>`[0, 0, 0, 1, 0, 0, 0, 0, 0, 0]`

```python
batch_size = tf.size(labels)
labels = tf.expand_dims(labels, 1)
indices = tf.expand_dims(tf.range(0, batch_size, 1), 1)
# Pair each row index with its label, then scatter 1.0 into a dense
# [batch_size, NUM_CLASSES] matrix of 0.0s at those positions.
concated = tf.concat(1, [indices, labels])
onehot_labels = tf.sparse_to_dense(
    concated, tf.pack([batch_size, NUM_CLASSES]), 1.0, 0.0)
```

A [`tf.nn.softmax_cross_entropy_with_logits`](../../../api_docs/python/nn.md#softmax_cross_entropy_with_logits)
op is then added to compare the output logits from the `inference()` function
and the 1-hot labels.

```python
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits,
                                                        onehot_labels,
                                                        name='xentropy')
```

It then uses [`tf.reduce_mean`](../../../api_docs/python/math_ops.md#reduce_mean)
to average the cross entropy values across the batch dimension (the first
dimension) as the total loss.

```python
loss = tf.reduce_mean(cross_entropy, name='xentropy_mean')
```

The tensor that will contain the loss value is then returned.

> Note: Cross-entropy is an idea from information theory that allows us
> to describe how bad it is to believe the predictions of the neural network,
> given what is actually true. For more information, read the blog post
> [Visual Information Theory](http://colah.github.io/posts/2015-09-Visual-Information/).

### Training

The `training()` function adds the operations needed to minimize the loss via
gradient descent.

First, it takes the loss tensor from the `loss()` function and hands it to a
[`tf.scalar_summary`](../../../api_docs/python/train.md#scalar_summary),
an op for generating summary values into the events file when used with a
`SummaryWriter` (see below). In this case, it will emit the snapshot value of
the loss every time the summaries are written out.

```python
tf.scalar_summary(loss.op.name, loss)
```

Next, we instantiate a [`tf.train.GradientDescentOptimizer`](../../../api_docs/python/train.md#GradientDescentOptimizer)
responsible for applying gradients with the requested learning rate.

```python
optimizer = tf.train.GradientDescentOptimizer(FLAGS.learning_rate)
```

We then generate a single variable to contain a counter for the global
training step, and the [`minimize()`](../../../api_docs/python/train.md#Optimizer.minimize)
op is used to both update the trainable weights in the system and increment
the global step. This op is, by convention, known as the `train_op` and is
what must be run by a TensorFlow session in order to induce one full step of
training (see below).

```python
global_step = tf.Variable(0, name='global_step', trainable=False)
train_op = optimizer.minimize(loss, global_step=global_step)
```

Finally, the `train_op` is returned so that the training loop can run it.

## Train the Model

Once the graph is built, it can be iteratively trained and evaluated in a loop
controlled by the user code in `fully_connected_feed.py`.

### The Graph

At the top of the `run_training()` function is a Python `with` command that
indicates all of the built ops are to be associated with the default
global [`tf.Graph`](../../../api_docs/python/framework.md#Graph)
instance.

```python
with tf.Graph().as_default():
```

A `tf.Graph` is a collection of ops that may be executed together as a group.
Most TensorFlow uses will only need to rely on the single default graph. More
complicated uses with multiple graphs are possible, but beyond the scope of
this simple tutorial.
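To make "default graph" concrete, here is a tiny standalone example (ours, not
from the tutorial files): ops created inside the `with` block are registered
on that graph object, and the previous default is restored on exit.

```python
import tensorflow as tf

with tf.Graph().as_default() as g:
  c = tf.constant(42.0, name='answer')  # this op is registered on g

print c.graph is g                       # True: the op lives on g
print c.graph is tf.get_default_graph()  # False: outside the block, the
                                         # process-wide default is back
```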
### The Session

Once all of the build preparation has been completed and all of the necessary
ops generated, a [`tf.Session`](../../../api_docs/python/client.md#Session)
is created for running the graph.

```python
sess = tf.Session()
```

Alternatively, a `Session` may be generated into a `with` block for scoping:

```python
with tf.Session() as sess:
```

The empty parameter to `Session` indicates that this code will attach to
(or create, if not yet created) the default local session.

Immediately after creating the session, all of the `tf.Variable`
instances are initialized by calling `sess.run()` on their initialization op.

```python
init = tf.initialize_all_variables()
sess.run(init)
```

The [`sess.run()`](../../../api_docs/python/client.md#Session.run)
method will run the complete subset of the graph that
corresponds to the op(s) passed as parameters. In this first call, the `init`
op is a [`tf.group`](../../../api_docs/python/control_flow_ops.md#group)
that contains only the initializers for the variables. None of the rest of
the graph is run here; that happens in the training loop below.

### Train Loop

After initializing the variables with the session, training may begin.

The user code controls the training per step, and the simplest loop that
can do useful training is:

```python
for step in xrange(max_steps):
  sess.run([train_op])
```

However, this tutorial is slightly more complicated in that it must also slice
up the input data for each step to match the previously generated placeholders.

#### Feed the Graph

For each step, the code will generate a feed dictionary that will contain the
set of examples on which to train for the step, keyed by the placeholder
ops they represent.

In the `fill_feed_dict()` function, the given `DataSet` is queried for its next
`batch_size` set of images and labels, and tensors matching the placeholders
are filled with those images and labels.

```python
images_feed, labels_feed = data_set.next_batch(FLAGS.batch_size)
```

A Python dictionary object is then generated with the placeholders as keys and
the representative feed tensors as values.

```python
feed_dict = {
    images_placeholder: images_feed,
    labels_placeholder: labels_feed,
}
```

This is passed into the `sess.run()` function's `feed_dict` parameter to provide
the input examples for this step of training.

#### Check the Status

The code specifies two op-tensors in its run call: `[train_op, loss]`. It also
times each step, since the status text printed below reports a `duration`
(`fully_connected_feed.py` imports the `time` module for this):

```python
for step in xrange(FLAGS.max_steps):
  start_time = time.time()
  feed_dict = fill_feed_dict(data_sets.train,
                             images_placeholder,
                             labels_placeholder)
  _, loss_value = sess.run([train_op, loss],
                           feed_dict=feed_dict)
  duration = time.time() - start_time  # used by the status text below
```

Because there are two tensors passed as parameters, the return from
`sess.run()` is a tuple with two items, one per passed op-tensor, filled with
the values of those tensors during this step of training.

The value of the `train_op` is actually `None` and is, thus, discarded. But the
value of the `loss` tensor may become NaN if the model diverges during
training.
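If you wanted to stop on divergence rather than just observe it, a minimal
guard (our addition, not part of the tutorial code) could sit right after the
`sess.run()` call in the loop above, where `loss_value` and `step` are
defined:

```python
import math

# Stop early rather than keep training a diverged model.
if math.isnan(loss_value):
  raise RuntimeError('Model diverged with loss = NaN at step %d' % step)
```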
Assuming that the training runs fine without NaNs, the training loop also
prints a simple status text every 100 steps to let the user know the state of
training.

```python
if step % 100 == 0:
  print 'Step %d: loss = %.2f (%.3f sec)' % (step, loss_value, duration)
```

#### Visualize the Status

In order to emit the events files used by [TensorBoard](../../../how_tos/summaries_and_tensorboard/index.md),
all of the summaries (in this case, only one) are collected into a single op
during the graph building phase.

```python
summary_op = tf.merge_all_summaries()
```

And then, after the Session is generated, a [`tf.train.SummaryWriter`](../../../api_docs/python/train.md#SummaryWriter)
may be instantiated to output into the given directory the events files,
containing the Graph itself and the values of the summaries.

```python
summary_writer = tf.train.SummaryWriter(FLAGS.train_dir,
                                        graph_def=sess.graph_def)
```

Lastly, the events file will be updated with new summary values every time the
`summary_op` is run and the output passed to the writer's `add_summary()`
function.

```python
summary_str = sess.run(summary_op, feed_dict=feed_dict)
summary_writer.add_summary(summary_str, step)
```

When the events files are written, TensorBoard may be run against the training
folder to display the values from the summaries.

![MNIST TensorBoard](./mnist_tensorboard.png "MNIST TensorBoard")

**NOTE**: For more info about how to build and run TensorBoard, please see the
accompanying tutorial [TensorBoard: Visualizing Your Training](../../../how_tos/summaries_and_tensorboard/index.md).

#### Save a Checkpoint

In order to emit a checkpoint file that may be used to later restore a model
for further training or evaluation, we instantiate a
[`tf.train.Saver`](../../../api_docs/python/state_ops.md#Saver).

```python
saver = tf.train.Saver()
```

In the training loop, the [`saver.save()`](../../../api_docs/python/state_ops.md#Saver.save)
method will periodically be called to write a checkpoint file to the training
directory with the current values of all the trainable variables.

```python
saver.save(sess, FLAGS.train_dir, global_step=step)
```

At some later point, training might be resumed by using the
[`saver.restore()`](../../../api_docs/python/state_ops.md#Saver.restore)
method to reload the model parameters.

```python
saver.restore(sess, FLAGS.train_dir)
```

## Evaluate the Model

Every thousand steps, the code will attempt to evaluate the model against the
training, validation, and test datasets. The `do_eval()` function is called
three times, once for each dataset.

```python
print 'Training Data Eval:'
do_eval(sess,
        eval_correct,
        images_placeholder,
        labels_placeholder,
        data_sets.train)
print 'Validation Data Eval:'
do_eval(sess,
        eval_correct,
        images_placeholder,
        labels_placeholder,
        data_sets.validation)
print 'Test Data Eval:'
do_eval(sess,
        eval_correct,
        images_placeholder,
        labels_placeholder,
        data_sets.test)
```

> Note that more complicated usage would usually sequester the `data_sets.test`
> to only be checked after significant amounts of hyperparameter tuning. For
> the sake of a simple little MNIST problem, however, we evaluate against all of
> the data.
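For reference, the "every thousand steps" trigger sits inside the training
loop and looks essentially like the sketch below; treat the exact condition as
an approximation and see `fully_connected_feed.py` for the authoritative
version. All of the names used here were introduced above.

```python
# Periodically save a checkpoint and evaluate, and also do so on the
# very last step so the final model is always scored.
if (step + 1) % 1000 == 0 or (step + 1) == FLAGS.max_steps:
  saver.save(sess, FLAGS.train_dir, global_step=step)
  print 'Training Data Eval:'
  do_eval(sess, eval_correct, images_placeholder, labels_placeholder,
          data_sets.train)
  # ...and likewise for data_sets.validation and data_sets.test.
```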
### Build the Eval Graph

Before opening the default Graph, the test data should have been fetched by
calling the `get_data(train=False)` function with the parameter set to grab
the test dataset.

```python
test_all_images, test_all_labels = get_data(train=False)
```

Before entering the training loop, the Eval op should have been built
by calling the `evaluation()` function from `mnist.py` with the same
logits/labels parameters as the `loss()` function.

```python
eval_correct = mnist.evaluation(logits, labels_placeholder)
```

The `evaluation()` function simply generates a [`tf.nn.in_top_k`](../../../api_docs/python/nn.md#in_top_k)
op that can automatically score each model output as correct if the true label
can be found in the K most-likely predictions. In this case, we set the value
of K to 1 to only consider a prediction correct if it is for the true label.

```python
eval_correct = tf.nn.in_top_k(logits, labels, 1)
```

### Eval Output

One can then create a loop for filling a `feed_dict` and calling `sess.run()`
against the `eval_correct` op to evaluate the model on the given dataset.

```python
true_count = 0
# One full pass over the data; integer division discards any partial batch.
steps_per_epoch = data_set.num_examples // FLAGS.batch_size
for step in xrange(steps_per_epoch):
  feed_dict = fill_feed_dict(data_set,
                             images_placeholder,
                             labels_placeholder)
  true_count += sess.run(eval_correct, feed_dict=feed_dict)
```

The `true_count` variable simply accumulates all of the predictions that the
`in_top_k` op has determined to be correct. From there, the precision may be
calculated by simply dividing by the total number of examples evaluated.

```python
num_examples = steps_per_epoch * FLAGS.batch_size
precision = float(true_count) / float(num_examples)
print '  Num examples: %d  Num correct: %d  Precision @ 1: %0.02f' % (
    num_examples, true_count, precision)
```
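Finally, to see exactly what the `in_top_k` op counts, here is a standalone
toy example of ours with made-up logits:

```python
import tensorflow as tf

logits = tf.constant([[0.1, 0.8, 0.1],   # most likely class: 1
                      [0.4, 0.3, 0.3]])  # most likely class: 0
labels = tf.constant([1, 2])             # true classes

correct = tf.nn.in_top_k(logits, labels, 1)

with tf.Session() as sess:
  print sess.run(correct)  # [ True False] -> true_count would be 1
```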