Diffstat (limited to 'tensorflow/g3doc/tutorials/mnist')
-rwxr-xr-x  tensorflow/g3doc/tutorials/mnist/__init__.py                 0
-rw-r--r--  tensorflow/g3doc/tutorials/mnist/beginners/index.md        420
-rw-r--r--  tensorflow/g3doc/tutorials/mnist/download/index.md          85
-rw-r--r--  tensorflow/g3doc/tutorials/mnist/fully_connected_feed.py   219
-rw-r--r--  tensorflow/g3doc/tutorials/mnist/input_data.py             175
-rw-r--r--  tensorflow/g3doc/tutorials/mnist/mnist.py                  148
-rw-r--r--  tensorflow/g3doc/tutorials/mnist/mnist_softmax.py           33
-rw-r--r--  tensorflow/g3doc/tutorials/mnist/pros/index.md             390
-rw-r--r--  tensorflow/g3doc/tutorials/mnist/tf/index.md               513
9 files changed, 1983 insertions(+), 0 deletions(-)
diff --git a/tensorflow/g3doc/tutorials/mnist/__init__.py b/tensorflow/g3doc/tutorials/mnist/__init__.py
new file mode 100755
index 0000000000..e69de29bb2
--- /dev/null
+++ b/tensorflow/g3doc/tutorials/mnist/__init__.py
diff --git a/tensorflow/g3doc/tutorials/mnist/beginners/index.md b/tensorflow/g3doc/tutorials/mnist/beginners/index.md
new file mode 100644
index 0000000000..8ccb69d977
--- /dev/null
+++ b/tensorflow/g3doc/tutorials/mnist/beginners/index.md
@@ -0,0 +1,420 @@
# MNIST Softmax Regression (For Beginners)

*This tutorial is intended for readers who are new to both machine learning and
TensorFlow. If you already know what MNIST is, and what softmax (multinomial
logistic) regression is, you might prefer this
[faster-paced tutorial](../pros/index.md).*

When one learns how to program, there's a tradition that the first thing you do
is print "Hello World." Just like programming has Hello World, machine learning
has MNIST.

MNIST is a simple computer vision dataset. It consists of images of handwritten
digits like these:

<div style="width:40%; margin:auto; margin-bottom:10px; margin-top:20px;">
<img style="width:100%" src="img/MNIST.png">
</div>

It also includes labels for each image, telling us which digit it is. For
example, the labels for the above images are 5, 0, 4, and 1.

In this tutorial, we're going to train a model to look at images and predict
what digits they are. Our goal isn't to train a really elaborate model that
achieves state-of-the-art performance -- although we'll give you code to do that
later! -- but rather to dip a toe into using TensorFlow. As such, we're going
to start with a very simple model, called a softmax regression.

The actual code for this tutorial is very short, and all the interesting
stuff happens in just three lines. However, it is very important to understand
the ideas behind it: both how TensorFlow works and the core machine learning
concepts. Because of this, we are going to work through the code very
carefully.

## The MNIST Data

The MNIST data is hosted on
[Yann LeCun's website](http://yann.lecun.com/exdb/mnist/).
For your convenience, we've included some Python code to download and install
the data automatically. You can either download [the code](../input_data.py) and
import it as below, or simply copy and paste it in.

```python
import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
```

The downloaded data is split into three parts: 55,000 data points of training
data (`mnist.train`), 10,000 points of test data (`mnist.test`), and 5,000
points of validation data (`mnist.validation`). This split is very important:
it's essential in machine learning that we have separate data which we don't
learn from, so that we can make sure that what we've learned actually
generalizes!

As mentioned earlier, every MNIST data point has two parts: an image of a
handwritten digit and a corresponding label. We will call the images "xs" and
the labels "ys". Both the training set and test set contain xs and ys; for
example, the training images are `mnist.train.images` and the training labels
are `mnist.train.labels`.

Each image is 28 pixels by 28 pixels. We can interpret this as a big array of
numbers:

<div style="width:50%; margin:auto; margin-bottom:10px; margin-top:20px;">
<img style="width:100%" src="img/MNIST-Matrix.png">
</div>

We can flatten this array into a vector of 28x28 = 784 numbers. It doesn't
matter how we flatten the array, as long as we're consistent between images.
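As a concrete aside (a minimal NumPy sketch of our own, not part of the
tutorial code), flattening is just a reshape:

```python
import numpy as np

image = np.random.rand(28, 28)  # a stand-in for one 28x28 MNIST image
vector = image.reshape(784)     # row-major flattening; any consistent order works
print vector.shape              # (784,)
```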
From this perspective, the MNIST images are just a bunch of points in a
784-dimensional vector space, with a
[very rich structure](http://colah.github.io/posts/2014-10-Visualizing-MNIST/)
(warning: computationally intensive visualizations).

Flattening the data throws away information about the 2D structure of the image.
Isn't that bad? Well, the best computer vision methods do exploit this
structure, and we will in later tutorials. But the simple method we will be
using here, a softmax regression, won't.

The result is that `mnist.train.images` is a tensor (an n-dimensional array) with a
shape of `[55000, 784]`. The first dimension indexes the images and the second
dimension indexes the pixels in each image. Each entry in the tensor is the
pixel intensity between 0 and 1, for a particular pixel in a particular image.

<div style="width:40%; margin:auto; margin-bottom:10px; margin-top:20px;">
<img style="width:100%" src="img/mnist-train-xs.png">
</div>

The corresponding labels in MNIST are numbers between 0 and 9, describing
which digit a given image is of.
For the purposes of this tutorial, we're going to want our labels as
"one-hot vectors". A one-hot vector is a vector which is 0 in most
dimensions, and 1 in a single dimension. In this case, the $$n$$th digit will be
represented as a vector which is 1 in the $$n$$th dimension. For example, 0
would be $$[1,0,0,0,0,0,0,0,0,0]$$.
Consequently, `mnist.train.labels` is a
`[55000, 10]` array of floats.

<div style="width:40%; margin:auto; margin-bottom:10px; margin-top:20px;">
<img style="width:100%" src="img/mnist-train-ys.png">
</div>

We're now ready to actually make our model!

## Softmax Regressions

We know that every image in MNIST is a digit, whether it's a zero or a nine. We
want to be able to look at an image and give probabilities for it being each
digit. For example, our model might look at a picture of a nine and be 80% sure
it's a nine, but give a 5% chance to it being an eight (because of the top loop)
and a bit of probability to all the others, because it isn't sure.

This is a classic case where a softmax regression is a natural, simple model.
If you want to assign probabilities to an object being one of several different
things, softmax is the thing to do. Even later on, when we train more
sophisticated models, the final step will be a layer of softmax.

A softmax regression has two steps: first we add up the evidence of our input
being in certain classes, and then we convert that evidence into probabilities.

To tally up the evidence that a given image is in a particular class, we do a
weighted sum of the pixel intensities. The weight is negative if that pixel
having a high intensity is evidence against the image being in that class,
and positive if it is evidence in favor.

The following diagram shows the weights one model learned for each of these
classes. Red represents negative weights, while blue represents positive
weights.

<div style="width:40%; margin:auto; margin-bottom:10px; margin-top:20px;">
<img style="width:100%" src="img/softmax-weights.png">
</div>

We also add some extra evidence called a bias. Basically, we want to be able
to say that some things are more likely independent of the input.
The result is that the evidence for a class $$i$$ given an input $$x$$ is:

$$\text{evidence}_i = \sum_j W_{i,~ j} x_j + b_i$$

where $$W_i$$ are the weights and $$b_i$$ is the bias for class $$i$$, and $$j$$
is an index for summing over the pixels in our input image $$x$$. We then
convert the evidence tallies into our predicted probabilities
$$y$$ using the "softmax" function:

$$y = \text{softmax}(\text{evidence})$$

Here softmax is serving as an "activation" or "link" function, shaping
the output of our linear function into the form we want -- in this case, a
probability distribution over 10 cases.
You can think of it as converting tallies
of evidence into probabilities of our input being in each class.
It's defined as:

$$\text{softmax}(x) = \text{normalize}(\exp(x))$$

If you expand that equation out, you get:

$$\text{softmax}(x)_i = \frac{\exp(x_i)}{\sum_j \exp(x_j)}$$

But it's often more helpful to think of softmax the first way:
exponentiating its inputs and then normalizing them. The exponentiation
means that one more unit of evidence increases the weight given to any hypothesis
multiplicatively. And conversely, having one less unit of evidence means that a
hypothesis gets a fraction of its earlier weight. No hypothesis ever has zero
or negative weight. Softmax then normalizes these weights, so that they add up
to one, forming a valid probability distribution. (To get more intuition about
the softmax function, check out the
[section](http://neuralnetworksanddeeplearning.com/chap3.html#softmax)
on it in Michael Nielsen's book, complete with an interactive visualization.)


You can picture our softmax regression as looking something like the following,
although with a lot more $$x$$s. For each output, we compute a weighted sum of
the $$x$$s, add a bias, and then apply softmax.

<div style="width:55%; margin:auto; margin-bottom:10px; margin-top:20px;">
<img style="width:100%" src="img/softmax-regression-scalargraph.png">
</div>

If we write that out as equations, we get:

<div style="width:52%; margin-left:25%; margin-bottom:10px; margin-top:20px;">
<img style="width:100%" src="img/softmax-regression-scalarequation.png">
</div>

We can "vectorize" this procedure, turning it into a matrix multiplication
and vector addition. This is helpful for computational efficiency. (It's also
a useful way to think.)

<div style="width:50%; margin:auto; margin-bottom:10px; margin-top:20px;">
<img style="width:100%" src="img/softmax-regression-vectorequation.png">
</div>

More compactly, we can just write:

$$y = \text{softmax}(Wx + b)$$


## Implementing the Regression


To do efficient numerical computing in Python, we typically use libraries like
NumPy that do expensive operations such as matrix multiplication outside Python,
using highly efficient code implemented in another language.
Unfortunately, there can still be a lot of overhead from switching back to
Python every operation. This overhead is especially bad if you want to run
computations on GPUs or in a distributed manner, where there can be a high cost
to transferring data.

TensorFlow also does its heavy lifting outside Python,
but it takes things a step further to avoid this overhead.
Instead of running a single expensive operation independently
from Python, TensorFlow lets us describe a graph of interacting operations that
run entirely outside Python. (Approaches like this can be seen in a few
machine learning libraries.)
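Before we switch to the TensorFlow implementation, here is the softmax function
from above in plain NumPy -- a minimal sketch of our own, showing the
exponentiate-then-normalize view (the max-subtraction is a standard
numerical-stability trick we add; it doesn't change the result):

```python
import numpy as np

def softmax(x):
  # Exponentiate, then normalize so the entries sum to one.
  e = np.exp(x - np.max(x))  # subtracting the max avoids overflow
  return e / np.sum(e)

print softmax(np.array([2.0, 1.0, 0.1]))  # -> [ 0.659  0.242  0.099]
```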
To run computations, TensorFlow needs to connect to its backend. This connection
is called a `Session`. To use TensorFlow, we need to import it and create a
session.

```python
import tensorflow as tf
sess = tf.InteractiveSession()
```

(Using an `InteractiveSession` makes TensorFlow a bit more flexible about how
you structure your code. In particular, it's helpful for work in interactive
contexts like IPython.)

We describe these interacting operations by manipulating symbolic variables.
Let's create one:

```python
x = tf.placeholder("float", [None, 784])
```

`x` isn't a specific value. It's a `placeholder`, a value that we'll input when
we ask TensorFlow to run a computation. We want to be able to input any number
of MNIST images, each flattened into a 784-dimensional vector. We represent
this as a 2d tensor of floating point numbers, with a shape `[None, 784]`.
(Here `None` means that a dimension can be of any length.)

We also need the weights and biases for our model. We could imagine treating
these like additional inputs, but TensorFlow has an even better way to handle
them: `Variable`.
A `Variable` is a modifiable tensor that lives in TensorFlow's graph of
interacting operations. It can be used and even modified by the computation.
For machine learning applications, one generally has the model parameters be
`Variable`s.

```python
W = tf.Variable(tf.zeros([784,10]))
b = tf.Variable(tf.zeros([10]))
```

We create these `Variable`s by giving `tf.Variable` the initial value of the
`Variable`: in this case, we initialize both `W` and `b` as tensors full of
zeros. Since we are going to learn `W` and `b`, it doesn't matter very much
what they initially are.

Notice that `W` has a shape of `[784, 10]` because we want to multiply the
784-dimensional image vectors by it to produce 10-dimensional vectors of
evidence for the different classes. `b` has a shape of `[10]` so we can add it
to the output.

We can now implement our model. It only takes one line!

```python
y = tf.nn.softmax(tf.matmul(x,W) + b)
```

First, we multiply `x` by `W` with the expression `tf.matmul(x,W)`. This is
flipped from when we multiplied them in our equation, where we had $$Wx$$, as a
small trick to deal with `x` being a 2D tensor with multiple inputs. We then
add `b`, and finally apply `tf.nn.softmax`.

That's it. It only took us one line to define our model, after a couple of
short lines of setup. That isn't because TensorFlow is designed to make a
softmax regression particularly easy: it's just a very flexible way to describe
many kinds of numerical computations, from machine learning models to physics
simulations. And once defined, our model can be run on different devices:
your computer's CPU, GPUs, and even phones!


## Training

In order to train our model, we need to define what it means for the model to
be good. Well, actually, in machine learning we typically define what it means
for a model to be bad, called the cost or loss, and then try to minimize how bad
it is. But the two are equivalent.

One very common, very nice cost function is "cross-entropy." Surprisingly,
cross-entropy arises from thinking about information-compressing codes in
information theory, but it winds up being an important idea in lots of areas,
from gambling to machine learning. It's defined as:

$$H_{y'}(y) = -\sum_i y'_i \log(y_i)$$

where $$y$$ is our predicted probability distribution, and $$y'$$ is the true
distribution (the one-hot vector we'll input).
In some rough sense, the
cross-entropy measures how inefficient our predictions are for describing
the truth. Going into more detail about cross-entropy is beyond the scope of
this tutorial, but it's well worth
[understanding](http://colah.github.io/posts/2015-09-Visual-Information/).

To implement cross-entropy we first need to add a new placeholder to input
the correct answers:

```python
y_ = tf.placeholder("float", [None,10])
```

Then we can implement the cross-entropy, $$-\sum y'\log(y)$$:

```python
cross_entropy = -tf.reduce_sum(y_*tf.log(y))
```

First, `tf.log` computes the logarithm of each element of `y`. Next, we multiply
each element of `y_` with the corresponding element of `tf.log(y)`. Finally,
`tf.reduce_sum` adds all the elements of the tensor. (Note that this isn't
just the cross-entropy of the truth with a single prediction, but the sum of the
cross-entropies for all 100 images we looked at. How well we are doing on 100
data points is a much better description of how good our model is than a single
data point.)

Now that we know what we want our model to do, it's very easy to have TensorFlow
train it to do so.
Because TensorFlow knows the entire graph of your computations, it
can automatically use the [backpropagation
algorithm](http://colah.github.io/posts/2015-08-Backprop/)
to efficiently determine how your variables affect the cost you ask it to
minimize. Then it can apply your choice of optimization algorithm to modify the
variables and reduce the cost.

```python
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)
```

In this case, we ask TensorFlow to minimize `cross_entropy` using the gradient
descent algorithm with a learning rate of 0.01. Gradient descent is a simple
procedure, where TensorFlow simply shifts each variable a little bit in the
direction that reduces the cost. But TensorFlow also provides
[many other optimization algorithms](../../../api_docs/python/train.md?#optimizers):
using one is as simple as tweaking one line.

What TensorFlow actually does here, behind the scenes, is add new operations
to your graph which implement backpropagation and gradient descent. Then it
gives you back a single operation which, when run, will do a step of gradient
descent training, slightly tweaking your variables to reduce the cost.

Now we have our model set up to train. But before we start, we need to
initialize the variables we created:

```python
tf.initialize_all_variables().run()
```

Let's train -- we'll run the training step 1000 times!

```python
for i in range(1000):
  batch_xs, batch_ys = mnist.train.next_batch(100)
  train_step.run({x: batch_xs, y_: batch_ys})
```

In each step of the loop, we get a "batch" of one hundred random data points
from our training set. We run `train_step`, feeding in the batch data to
replace the `placeholder`s.

Using small batches of random data is called stochastic training -- in
this case, stochastic gradient descent. Ideally, we'd like to use all our data
for every step of training because that would give us a better sense of what
we should be doing, but that's expensive. So, instead, we use a different subset
every time. Doing this is cheap and has much of the same benefit.



## Evaluating Our Model

How well does our model do?

Well, first let's figure out where we predicted the correct label.
`tf.argmax` +is an extremely useful function which gives you the index of the highest entry +in a tensor along some axis. For example, `tf.argmax(y,1)` is the label our +model thinks is most likely for each input, while `tf.argmax(y_,1)` is the +correct label. We can use `tf.equal` to check if our prediction matches the +truth. + +```python +correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1)) +``` + +That gives us a list of booleans. To determine what fraction are correct, we +cast to floating point numbers and then take the mean. For example, +`[True, False, True, True]` would become `[1,0,1,1]` which would become `0.75`. + +```python +accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float")) +``` + +Finally, we ask for our accuracy on our test data. + +```python +print accuracy.eval({x: mnist.test.images, y_: mnist.test.labels}) +``` + +This should be about 91%. + +Is that good? Well, not really. In fact, it's pretty bad. This is because we're +using a very simple model. With some small changes, we can get to +97%. The best models can get to over 99.7% accuracy! (For more information, have +a look at this +[list of results](http://rodrigob.github.io/are_we_there_yet/build/classification_datasets_results.html).) + +What matters is that we learned from this model. Still, if you're feeling a bit +down about these results, check out [the next tutorial](../../index.md) where we +do a lot better, and learn how to build more sophisticated models using +TensorFlow! diff --git a/tensorflow/g3doc/tutorials/mnist/download/index.md b/tensorflow/g3doc/tutorials/mnist/download/index.md new file mode 100644 index 0000000000..dc11e727d8 --- /dev/null +++ b/tensorflow/g3doc/tutorials/mnist/download/index.md @@ -0,0 +1,85 @@ +# Downloading MNIST + +Code: [tensorflow/g3doc/tutorials/mnist/](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/g3doc/tutorials/mnist/) + +The goal of this tutorial is to show how to download the dataset files required +for handwritten digit classification using the (classic) MNIST data set. + +## Tutorial Files + +This tutorial references the following files: + +File | Purpose +--- | --- +[`input_data.py`](../input_data.py) | The code to download the MNIST dataset for training and evaluation. + +## Prepare the Data + +MNIST is a classic problem in machine learning. The problem is to look at +greyscale 28x28 pixel images of handwritten digits and determine which digit +the image represents, for all the digits from zero to nine. + +![MNIST Digits](../tf/mnist_digits.png "MNIST Digits") + +For more information, refer to [Yann LeCun's MNIST page](http://yann.lecun.com/exdb/mnist/) +or [Chris Olah's visualizations of MNIST](http://colah.github.io/posts/2014-10-Visualizing-MNIST/). + +### Download + +[Yann LeCun's MNIST page](http://yann.lecun.com/exdb/mnist/) +also hosts the training and test data for download. 
File | Purpose
--- | ---
[`train-images-idx3-ubyte.gz`](http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz) | training set images - 55000 training images, 5000 validation images
[`train-labels-idx1-ubyte.gz`](http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz) | training set labels matching the images
[`t10k-images-idx3-ubyte.gz`](http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz) | test set images - 10000 images
[`t10k-labels-idx1-ubyte.gz`](http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz) | test set labels matching the images

In the `input_data.py` file, the `maybe_download()` function will ensure these
files are downloaded into a local data folder for training.

The folder name is specified in a flag variable at the top of the
`fully_connected_feed.py` file and may be changed to fit your needs.

### Unpack and Reshape

The files themselves are not in any standard image format and are manually
unpacked (following the instructions available at the website) by the
`extract_images()` and `extract_labels()` functions in `input_data.py`.

The image data is extracted into a 2d tensor of: `[image index, pixel index]`
where each entry is the intensity value of a specific pixel in a specific
image, rescaled from `[0, 255]` to `[0.0, 1.0]`. The "image index" corresponds
to an image in the dataset, counting up from zero to the size of the dataset.
And the "pixel index" corresponds to a specific pixel in that image, ranging
from zero to the number of pixels in the image.

The 60000 examples in the `train-*` files are then split into 55000 examples
for training and 5000 examples for validation. Each 28x28 pixel greyscale
image is flattened into 784 values, so the output tensor for the training set
images is of shape `[55000, 784]`.

The label data is extracted into a 1d tensor of: `[image index]`
with the class identifier for each example as the value. For the training set
labels, this would then be of shape `[55000]`.

### DataSet Object

The underlying code will download, unpack, and reshape images and labels for
the following datasets:

Dataset | Purpose
--- | ---
`data_sets.train` | 55000 images and labels, for primary training.
`data_sets.validation` | 5000 images and labels, for iterative validation of training accuracy.
`data_sets.test` | 10000 images and labels, for final testing of trained accuracy.

The `read_data_sets()` function will return an object with a `DataSet`
instance for each of these three sets of data. The `DataSet.next_batch()`
method can be used to fetch a tuple consisting of `batch_size` lists of images
and labels to be fed into the running TensorFlow session.

```python
images_feed, labels_feed = data_set.next_batch(FLAGS.batch_size)
```
diff --git a/tensorflow/g3doc/tutorials/mnist/fully_connected_feed.py b/tensorflow/g3doc/tutorials/mnist/fully_connected_feed.py
new file mode 100644
index 0000000000..618c8f47cb
--- /dev/null
+++ b/tensorflow/g3doc/tutorials/mnist/fully_connected_feed.py
@@ -0,0 +1,219 @@
"""Trains and evaluates the MNIST network using a feed dictionary.
TensorFlow install instructions:
https://tensorflow.org/get_started/os_setup.html

MNIST tutorial:
https://tensorflow.org/tutorials/mnist/tf/index.html

"""
# pylint: disable=missing-docstring
import os.path
import time

import tensorflow.python.platform
import numpy
import tensorflow as tf

from tensorflow.g3doc.tutorials.mnist import input_data
from tensorflow.g3doc.tutorials.mnist import mnist


# Basic model parameters as external flags.
flags = tf.app.flags
FLAGS = flags.FLAGS
flags.DEFINE_float('learning_rate', 0.01, 'Initial learning rate.')
flags.DEFINE_integer('max_steps', 2000, 'Number of steps to run trainer.')
flags.DEFINE_integer('hidden1', 128, 'Number of units in hidden layer 1.')
flags.DEFINE_integer('hidden2', 32, 'Number of units in hidden layer 2.')
flags.DEFINE_integer('batch_size', 100, 'Batch size. '
                     'Must divide evenly into the dataset sizes.')
flags.DEFINE_string('train_dir', 'data', 'Directory to put the training data.')
flags.DEFINE_boolean('fake_data', False, 'If true, uses fake data '
                     'for unit testing.')


def placeholder_inputs(batch_size):
  """Generate placeholder variables to represent the input tensors.

  These placeholders are used as inputs by the rest of the model building
  code and will be fed from the downloaded data in the .run() loop, below.

  Args:
    batch_size: The batch size will be baked into both placeholders.

  Returns:
    images_placeholder: Images placeholder.
    labels_placeholder: Labels placeholder.
  """
  # Note that the shapes of the placeholders match the shapes of the full
  # image and label tensors, except the first dimension is now batch_size
  # rather than the full size of the train or test data sets.
  images_placeholder = tf.placeholder(tf.float32, shape=(batch_size,
                                                         mnist.IMAGE_PIXELS))
  labels_placeholder = tf.placeholder(tf.int32, shape=(batch_size))
  return images_placeholder, labels_placeholder


def fill_feed_dict(data_set, images_pl, labels_pl):
  """Fills the feed_dict for training the given step.

  A feed_dict takes the form of:
  feed_dict = {
      <placeholder>: <tensor of values to be passed for placeholder>,
      ....
  }

  Args:
    data_set: The set of images and labels, from input_data.read_data_sets()
    images_pl: The images placeholder, from placeholder_inputs().
    labels_pl: The labels placeholder, from placeholder_inputs().

  Returns:
    feed_dict: The feed dictionary mapping from placeholders to values.
  """
  # Create the feed_dict for the placeholders filled with the next
  # `batch_size` examples.
  images_feed, labels_feed = data_set.next_batch(FLAGS.batch_size,
                                                 FLAGS.fake_data)
  feed_dict = {
      images_pl: images_feed,
      labels_pl: labels_feed,
  }
  return feed_dict


def do_eval(sess,
            eval_correct,
            images_placeholder,
            labels_placeholder,
            data_set):
  """Runs one evaluation against the full epoch of data.

  Args:
    sess: The session in which the model has been trained.
    eval_correct: The Tensor that returns the number of correct predictions.
    images_placeholder: The images placeholder.
    labels_placeholder: The labels placeholder.
    data_set: The set of images and labels to evaluate, from
      input_data.read_data_sets().
  """
  # And run one epoch of eval.
  true_count = 0  # Counts the number of correct predictions.
+ steps_per_epoch = int(data_set.num_examples / FLAGS.batch_size) + num_examples = steps_per_epoch * FLAGS.batch_size + for step in xrange(steps_per_epoch): + feed_dict = fill_feed_dict(data_set, + images_placeholder, + labels_placeholder) + true_count += sess.run(eval_correct, feed_dict=feed_dict) + precision = float(true_count) / float(num_examples) + print ' Num examples: %d Num correct: %d Precision @ 1: %0.04f' % ( + num_examples, true_count, precision) + + +def run_training(): + """Train MNIST for a number of steps.""" + # Get the sets of images and labels for training, validation, and + # test on MNIST. + data_sets = input_data.read_data_sets(FLAGS.train_dir, FLAGS.fake_data) + + # Tell TensorFlow that the model will be built into the default Graph. + with tf.Graph().as_default(): + # Generate placeholders for the images and labels. + images_placeholder, labels_placeholder = placeholder_inputs( + FLAGS.batch_size) + + # Build a Graph that computes predictions from the inference model. + logits = mnist.inference(images_placeholder, + FLAGS.hidden1, + FLAGS.hidden2) + + # Add to the Graph the Ops for loss calculation. + loss = mnist.loss(logits, labels_placeholder) + + # Add to the Graph the Ops that calculate and apply gradients. + train_op = mnist.training(loss, FLAGS.learning_rate) + + # Add the Op to compare the logits to the labels during evaluation. + eval_correct = mnist.evaluation(logits, labels_placeholder) + + # Build the summary operation based on the TF collection of Summaries. + summary_op = tf.merge_all_summaries() + + # Create a saver for writing training checkpoints. + saver = tf.train.Saver() + + # Create a session for running Ops on the Graph. + sess = tf.Session() + + # Run the Op to initialize the variables. + init = tf.initialize_all_variables() + sess.run(init) + + # Instantiate a SummaryWriter to output summaries and the Graph. + summary_writer = tf.train.SummaryWriter(FLAGS.train_dir, + graph_def=sess.graph_def) + + # And then after everything is built, start the training loop. + for step in xrange(FLAGS.max_steps): + start_time = time.time() + + # Fill a feed dictionary with the actual set of images and labels + # for this particular training step. + feed_dict = fill_feed_dict(data_sets.train, + images_placeholder, + labels_placeholder) + + # Run one step of the model. The return values are the activations + # from the `train_op` (which is discarded) and the `loss` Op. To + # inspect the values of your Ops or variables, you may include them + # in the list passed to sess.run() and the value tensors will be + # returned in the tuple from the call. + _, loss_value = sess.run([train_op, loss], + feed_dict=feed_dict) + + duration = time.time() - start_time + + # Write the summaries and print an overview fairly often. + if step % 100 == 0: + # Print status to stdout. + print 'Step %d: loss = %.2f (%.3f sec)' % (step, + loss_value, + duration) + # Update the events file. + summary_str = sess.run(summary_op, feed_dict=feed_dict) + summary_writer.add_summary(summary_str, step) + + # Save a checkpoint and evaluate the model periodically. + if (step + 1) % 1000 == 0 or (step + 1) == FLAGS.max_steps: + saver.save(sess, FLAGS.train_dir, global_step=step) + # Evaluate against the training set. + print 'Training Data Eval:' + do_eval(sess, + eval_correct, + images_placeholder, + labels_placeholder, + data_sets.train) + # Evaluate against the validation set. 
+ print 'Validation Data Eval:' + do_eval(sess, + eval_correct, + images_placeholder, + labels_placeholder, + data_sets.validation) + # Evaluate against the test set. + print 'Test Data Eval:' + do_eval(sess, + eval_correct, + images_placeholder, + labels_placeholder, + data_sets.test) + + +def main(_): + run_training() + + +if __name__ == '__main__': + tf.app.run() diff --git a/tensorflow/g3doc/tutorials/mnist/input_data.py b/tensorflow/g3doc/tutorials/mnist/input_data.py new file mode 100644 index 0000000000..88892027ff --- /dev/null +++ b/tensorflow/g3doc/tutorials/mnist/input_data.py @@ -0,0 +1,175 @@ +"""Functions for downloading and reading MNIST data.""" +import gzip +import os +import urllib + +import numpy + +SOURCE_URL = 'http://yann.lecun.com/exdb/mnist/' + + +def maybe_download(filename, work_directory): + """Download the data from Yann's website, unless it's already here.""" + if not os.path.exists(work_directory): + os.mkdir(work_directory) + filepath = os.path.join(work_directory, filename) + if not os.path.exists(filepath): + filepath, _ = urllib.urlretrieve(SOURCE_URL + filename, filepath) + statinfo = os.stat(filepath) + print 'Succesfully downloaded', filename, statinfo.st_size, 'bytes.' + return filepath + + +def _read32(bytestream): + dt = numpy.dtype(numpy.uint32).newbyteorder('>') + return numpy.frombuffer(bytestream.read(4), dtype=dt) + + +def extract_images(filename): + """Extract the images into a 4D uint8 numpy array [index, y, x, depth].""" + print 'Extracting', filename + with gzip.open(filename) as bytestream: + magic = _read32(bytestream) + if magic != 2051: + raise ValueError( + 'Invalid magic number %d in MNIST image file: %s' % + (magic, filename)) + num_images = _read32(bytestream) + rows = _read32(bytestream) + cols = _read32(bytestream) + buf = bytestream.read(rows * cols * num_images) + data = numpy.frombuffer(buf, dtype=numpy.uint8) + data = data.reshape(num_images, rows, cols, 1) + return data + + +def dense_to_one_hot(labels_dense, num_classes=10): + """Convert class labels from scalars to one-hot vectors.""" + num_labels = labels_dense.shape[0] + index_offset = numpy.arange(num_labels) * num_classes + labels_one_hot = numpy.zeros((num_labels, num_classes)) + labels_one_hot.flat[index_offset + labels_dense.ravel()] = 1 + return labels_one_hot + + +def extract_labels(filename, one_hot=False): + """Extract the labels into a 1D uint8 numpy array [index].""" + print 'Extracting', filename + with gzip.open(filename) as bytestream: + magic = _read32(bytestream) + if magic != 2049: + raise ValueError( + 'Invalid magic number %d in MNIST label file: %s' % + (magic, filename)) + num_items = _read32(bytestream) + buf = bytestream.read(num_items) + labels = numpy.frombuffer(buf, dtype=numpy.uint8) + if one_hot: + return dense_to_one_hot(labels) + return labels + + +class DataSet(object): + + def __init__(self, images, labels, fake_data=False): + if fake_data: + self._num_examples = 10000 + else: + assert images.shape[0] == labels.shape[0], ( + "images.shape: %s labels.shape: %s" % (images.shape, + labels.shape)) + self._num_examples = images.shape[0] + + # Convert shape from [num examples, rows, columns, depth] + # to [num examples, rows*columns] (assuming depth == 1) + assert images.shape[3] == 1 + images = images.reshape(images.shape[0], + images.shape[1] * images.shape[2]) + # Convert from [0, 255] -> [0.0, 1.0]. 
+ images = images.astype(numpy.float32) + images = numpy.multiply(images, 1.0 / 255.0) + self._images = images + self._labels = labels + self._epochs_completed = 0 + self._index_in_epoch = 0 + + @property + def images(self): + return self._images + + @property + def labels(self): + return self._labels + + @property + def num_examples(self): + return self._num_examples + + @property + def epochs_completed(self): + return self._epochs_completed + + def next_batch(self, batch_size, fake_data=False): + """Return the next `batch_size` examples from this data set.""" + if fake_data: + fake_image = [1.0 for _ in xrange(784)] + fake_label = 0 + return [fake_image for _ in xrange(batch_size)], [ + fake_label for _ in xrange(batch_size)] + start = self._index_in_epoch + self._index_in_epoch += batch_size + if self._index_in_epoch > self._num_examples: + # Finished epoch + self._epochs_completed += 1 + # Shuffle the data + perm = numpy.arange(self._num_examples) + numpy.random.shuffle(perm) + self._images = self._images[perm] + self._labels = self._labels[perm] + # Start next epoch + start = 0 + self._index_in_epoch = batch_size + assert batch_size <= self._num_examples + end = self._index_in_epoch + return self._images[start:end], self._labels[start:end] + + +def read_data_sets(train_dir, fake_data=False, one_hot=False): + class DataSets(object): + pass + data_sets = DataSets() + + if fake_data: + data_sets.train = DataSet([], [], fake_data=True) + data_sets.validation = DataSet([], [], fake_data=True) + data_sets.test = DataSet([], [], fake_data=True) + return data_sets + + TRAIN_IMAGES = 'train-images-idx3-ubyte.gz' + TRAIN_LABELS = 'train-labels-idx1-ubyte.gz' + TEST_IMAGES = 't10k-images-idx3-ubyte.gz' + TEST_LABELS = 't10k-labels-idx1-ubyte.gz' + VALIDATION_SIZE = 5000 + + local_file = maybe_download(TRAIN_IMAGES, train_dir) + train_images = extract_images(local_file) + + local_file = maybe_download(TRAIN_LABELS, train_dir) + train_labels = extract_labels(local_file, one_hot=one_hot) + + local_file = maybe_download(TEST_IMAGES, train_dir) + test_images = extract_images(local_file) + + local_file = maybe_download(TEST_LABELS, train_dir) + test_labels = extract_labels(local_file, one_hot=one_hot) + + validation_images = train_images[:VALIDATION_SIZE] + validation_labels = train_labels[:VALIDATION_SIZE] + train_images = train_images[VALIDATION_SIZE:] + train_labels = train_labels[VALIDATION_SIZE:] + + data_sets.train = DataSet(train_images, train_labels) + data_sets.validation = DataSet(validation_images, validation_labels) + data_sets.test = DataSet(test_images, test_labels) + + return data_sets diff --git a/tensorflow/g3doc/tutorials/mnist/mnist.py b/tensorflow/g3doc/tutorials/mnist/mnist.py new file mode 100644 index 0000000000..acf4d01dd1 --- /dev/null +++ b/tensorflow/g3doc/tutorials/mnist/mnist.py @@ -0,0 +1,148 @@ +"""Builds the MNIST network. + +Implements the inference/loss/training pattern for model building. + +1. inference() - Builds the model as far as is required for running the network +forward to make predictions. +2. loss() - Adds to the inference model the layers required to generate loss. +3. training() - Adds to the loss model the Ops required to generate and +apply gradients. + +This file is used by the various "fully_connected_*.py" files and not meant to +be run. 
TensorFlow install instructions:
https://tensorflow.org/get_started/os_setup.html

MNIST tutorial:
https://tensorflow.org/tutorials/mnist/tf/index.html
"""
import math

import tensorflow.python.platform
import tensorflow as tf

# The MNIST dataset has 10 classes, representing the digits 0 through 9.
NUM_CLASSES = 10

# The MNIST images are always 28x28 pixels.
IMAGE_SIZE = 28
IMAGE_PIXELS = IMAGE_SIZE * IMAGE_SIZE


def inference(images, hidden1_units, hidden2_units):
  """Build the MNIST model up to where it may be used for inference.

  Args:
    images: Images placeholder, from inputs().
    hidden1_units: Size of the first hidden layer.
    hidden2_units: Size of the second hidden layer.

  Returns:
    softmax_linear: Output tensor with the computed logits.
  """
  # Hidden 1
  with tf.name_scope('hidden1') as scope:
    weights = tf.Variable(
        tf.truncated_normal([IMAGE_PIXELS, hidden1_units],
                            stddev=1.0 / math.sqrt(float(IMAGE_PIXELS))),
        name='weights')
    biases = tf.Variable(tf.zeros([hidden1_units]),
                         name='biases')
    hidden1 = tf.nn.relu(tf.matmul(images, weights) + biases)
  # Hidden 2
  with tf.name_scope('hidden2') as scope:
    weights = tf.Variable(
        tf.truncated_normal([hidden1_units, hidden2_units],
                            stddev=1.0 / math.sqrt(float(hidden1_units))),
        name='weights')
    biases = tf.Variable(tf.zeros([hidden2_units]),
                         name='biases')
    hidden2 = tf.nn.relu(tf.matmul(hidden1, weights) + biases)
  # Linear
  with tf.name_scope('softmax_linear') as scope:
    weights = tf.Variable(
        tf.truncated_normal([hidden2_units, NUM_CLASSES],
                            stddev=1.0 / math.sqrt(float(hidden2_units))),
        name='weights')
    biases = tf.Variable(tf.zeros([NUM_CLASSES]),
                         name='biases')
    logits = tf.matmul(hidden2, weights) + biases
  return logits


def loss(logits, labels):
  """Calculates the loss from the logits and the labels.

  Args:
    logits: Logits tensor, float - [batch_size, NUM_CLASSES].
    labels: Labels tensor, int32 - [batch_size].

  Returns:
    loss: Loss tensor of type float.
  """
  # Convert from sparse integer labels in the range [0, NUM_CLASSES)
  # to 1-hot dense float vectors (that is, we will have batch_size vectors,
  # each with NUM_CLASSES values, all of which are 0.0 except there will
  # be a 1.0 in the entry corresponding to the label).
  batch_size = tf.size(labels)
  labels = tf.expand_dims(labels, 1)
  indices = tf.expand_dims(tf.range(0, batch_size, 1), 1)
  concated = tf.concat(1, [indices, labels])
  onehot_labels = tf.sparse_to_dense(
      concated, tf.pack([batch_size, NUM_CLASSES]), 1.0, 0.0)
  cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits,
                                                          onehot_labels,
                                                          name='xentropy')
  loss = tf.reduce_mean(cross_entropy, name='xentropy_mean')
  return loss


def training(loss, learning_rate):
  """Sets up the training Ops.

  Creates a summarizer to track the loss over time in TensorBoard.

  Creates an optimizer and applies the gradients to all trainable variables.

  The Op returned by this function is what must be passed to the
  `sess.run()` call to cause the model to train.

  Args:
    loss: Loss tensor, from loss().
    learning_rate: The learning rate to use for gradient descent.

  Returns:
    train_op: The Op for training.
  """
  # Add a scalar summary for the snapshot loss.
  tf.scalar_summary(loss.op.name, loss)
  # Create the gradient descent optimizer with the given learning rate.
  optimizer = tf.train.GradientDescentOptimizer(learning_rate)
  # Create a variable to track the global step.
  global_step = tf.Variable(0, name='global_step', trainable=False)
  # Use the optimizer to apply the gradients that minimize the loss
  # (and also increment the global step counter) as a single training step.
  train_op = optimizer.minimize(loss, global_step=global_step)
  return train_op


def evaluation(logits, labels):
  """Evaluate the quality of the logits at predicting the label.

  Args:
    logits: Logits tensor, float - [batch_size, NUM_CLASSES].
    labels: Labels tensor, int32 - [batch_size], with values in the
      range [0, NUM_CLASSES).

  Returns:
    A scalar int32 tensor with the number of examples (out of batch_size)
    that were predicted correctly.
  """
  # For a classifier model, we can use the in_top_k Op.
  # It returns a bool tensor with shape [batch_size] that is true for
  # the examples where the label was in the top k (here k=1)
  # of all logits for that example.
  correct = tf.nn.in_top_k(logits, labels, 1)
  # Return the number of true entries.
  return tf.reduce_sum(tf.cast(correct, tf.int32))
diff --git a/tensorflow/g3doc/tutorials/mnist/mnist_softmax.py b/tensorflow/g3doc/tutorials/mnist/mnist_softmax.py
new file mode 100644
index 0000000000..640ea29dac
--- /dev/null
+++ b/tensorflow/g3doc/tutorials/mnist/mnist_softmax.py
@@ -0,0 +1,33 @@
"""A very simple MNIST classifier.

See extensive documentation at ??????? (insert public URL)
"""

# Import data
import input_data
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)

import tensorflow as tf
sess = tf.InteractiveSession()

# Create the model
x = tf.placeholder("float", [None, 784])
W = tf.Variable(tf.zeros([784,10]))
b = tf.Variable(tf.zeros([10]))
y = tf.nn.softmax(tf.matmul(x,W) + b)

# Define loss and optimizer
y_ = tf.placeholder("float", [None,10])
cross_entropy = -tf.reduce_sum(y_*tf.log(y))
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)

# Train
tf.initialize_all_variables().run()
for i in range(1000):
  batch_xs, batch_ys = mnist.train.next_batch(100)
  train_step.run({x: batch_xs, y_: batch_ys})

# Test trained model
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
print accuracy.eval({x: mnist.test.images, y_: mnist.test.labels})
diff --git a/tensorflow/g3doc/tutorials/mnist/pros/index.md b/tensorflow/g3doc/tutorials/mnist/pros/index.md
new file mode 100644
index 0000000000..17696712b0
--- /dev/null
+++ b/tensorflow/g3doc/tutorials/mnist/pros/index.md
@@ -0,0 +1,390 @@
# MNIST Deep Learning Example (For Experts)

TensorFlow is a powerful library for doing large-scale numerical computation.
One of the tasks at which it excels is implementing and training deep neural
networks. In this tutorial we will learn the basic building blocks of a
TensorFlow model while constructing a deep convolutional MNIST classifier.

*This introduction assumes familiarity with neural networks and the MNIST
dataset. If you don't have a background with them, check out the
[introduction for beginners](../beginners/index.md).*

## Setup

Before we create our model, we will first load the MNIST dataset, and start a
TensorFlow session.

### Load MNIST Data

For your convenience, we've included [a script](../input_data.py) which
automatically downloads and imports the MNIST dataset. It will create a
directory `'MNIST_data'` in which to store the data files.
```python
import input_data
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)
```

Here `mnist` is a lightweight class which stores the training, validation, and
testing sets as NumPy arrays. It also provides a function for iterating through
data minibatches, which we will use below.

### Start TensorFlow Session

TensorFlow relies on a highly efficient C++ backend to do its computation. The
connection to this backend is called a session. We will need to create a session
before we can do any computation.

```python
import tensorflow as tf
sess = tf.InteractiveSession()
```

Using an `InteractiveSession` makes TensorFlow more flexible about how you
structure your code. It allows you to interleave operations which build a
[computation graph](../../../get_started/basic_usage.md#the-computation-graph)
with ones that run the graph. This is particularly convenient when working in
interactive contexts like IPython. If you are not using an
`InteractiveSession`, then you should build the entire computation graph before
starting a session and [launching the
graph](../../../get_started/basic_usage.md#launching-the-graph-in-a-session).

#### Computation Graph

To do efficient numerical computing in Python, we typically use libraries like
NumPy that do expensive operations such as matrix multiplication outside Python,
using highly efficient code implemented in another language.
Unfortunately, there can still be a lot of overhead from switching back to
Python every operation. This overhead is especially bad if you want to run
computations on GPUs or in a distributed manner, where there can be a high cost
to transferring data.

TensorFlow also does its heavy lifting outside Python,
but it takes things a step further to avoid this overhead.
Instead of running a single expensive operation independently
from Python, TensorFlow lets us describe a graph of interacting operations that
run entirely outside Python.
This approach is similar to that used in Theano or Torch.

The role of the Python code is therefore to build this external computation
graph, and to dictate which parts of the computation graph should be run. See
the
[Computation Graph](../../../get_started/basic_usage.md#the-computation-graph)
section of
[Basic Usage](../../../get_started/basic_usage.md)
for more detail.

## Build a Softmax Regression Model

In this section we will build a softmax regression model with a single linear
layer. In the next section, we will extend this to the case of softmax
regression with a multilayer convolutional network.

### Placeholders

We start building the computation graph by creating nodes for the
input images and target output classes.

```python
x = tf.placeholder("float", shape=[None, 784])
y_ = tf.placeholder("float", shape=[None, 10])
```

Here `x` and `y_` aren't specific values. Rather, they are each a `placeholder`
-- a value that we'll input when we ask TensorFlow to run a computation.

The input images `x` will consist of a 2d tensor of floating point numbers.
Here we assign it a `shape` of `[None, 784]`, where `784` is the dimensionality of
a single flattened MNIST image, and `None` indicates that the first dimension,
corresponding to the batch size, can be of any size.
The target output classes `y_` will also consist of a 2d tensor,
where each row is a one-hot 10-dimensional vector indicating
which digit class the corresponding MNIST image belongs to.
The `shape` argument to `placeholder` is optional, but it allows TensorFlow
to automatically catch bugs stemming from inconsistent tensor shapes.

### Variables

We now define the weights `W` and biases `b` for our model. We could imagine
treating these like additional inputs, but TensorFlow has an even better way to
handle them: `Variable`.
A `Variable` is a value that lives in TensorFlow's computation graph.
It can be used and even modified by the computation. In machine
learning applications, one generally has the model parameters be `Variable`s.

```python
W = tf.Variable(tf.zeros([784,10]))
b = tf.Variable(tf.zeros([10]))
```

We pass the initial value for each parameter in the call to `tf.Variable`.
In this case, we initialize both `W` and `b` as tensors full of
zeros. `W` is a 784x10 matrix (because we have 784 input features
and 10 outputs) and `b` is a 10-dimensional vector (because we have 10 classes).

Before `Variable`s can be used within a session, they must be initialized using
that session. This step takes the initial values (in this case tensors full of
zeros) that have already been specified, and assigns them to each `Variable`.
This can be done for all `Variable`s at once.

```python
sess.run(tf.initialize_all_variables())
```

### Predicted Class and Cost Function

We can now implement our regression model. It only takes one line!
We multiply the vectorized input images `x` by the weight matrix `W`, add
the bias `b`, and compute the softmax probabilities that are assigned to each
class.

```python
y = tf.nn.softmax(tf.matmul(x,W) + b)
```

The cost function to be minimized during training can be specified just as
easily. Our cost function will be the cross-entropy between the target and the
model's prediction.

```python
cross_entropy = -tf.reduce_sum(y_*tf.log(y))
```

Note that `tf.reduce_sum` sums across all images in the minibatch, as well as
all classes. We are computing the cross-entropy for the entire minibatch.

## Train the Model

Now that we have defined our model and training cost function, it is
straightforward to train using TensorFlow.
Because TensorFlow knows the entire computation graph, it
can use automatic differentiation to find the gradients of the cost with
respect to each of the variables.
TensorFlow has a variety of
[builtin optimization algorithms](../../../api_docs/python/train.md?#optimizers).
For this example, we will use steepest gradient descent, with a step length of
0.01, to descend the cross-entropy.

```python
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)
```

What TensorFlow actually did in that single line was to add new operations to
the computation graph. These operations included ones to compute gradients,
compute parameter update steps, and apply update steps to the parameters.

The returned operation `train_step`, when run, will apply the gradient
descent updates to the parameters. Training the model can therefore be
accomplished by repeatedly running `train_step`.

```python
for i in range(1000):
  batch = mnist.train.next_batch(50)
  train_step.run(feed_dict={x: batch[0], y_: batch[1]})
```

On each training iteration we load 50 training examples. We then run the
`train_step` operation, using `feed_dict` to replace the `placeholder` tensors
`x` and `y_` with the training examples.
Note that you can replace any tensor in your computation graph using `feed_dict`
-- it's not restricted to just `placeholder`s.
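As a quick illustration of that last point (a toy example of our own, not part
of the tutorial), we can feed a value directly for the computed tensor `y`,
bypassing the model entirely; a uniform guess over 10 classes gives a
cross-entropy of 50 * ln(10) for a batch of 50 (here `batch` is the last
minibatch from the loop above):

```python
import numpy as np

# Override the model's output `y` for one run by feeding it directly.
uniform_y = np.full((50, 10), 0.1, dtype="float32")
print cross_entropy.eval(feed_dict={y: uniform_y, y_: batch[1]})
# Prints roughly 115.13, i.e. 50 * ln(10).
```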
### Evaluate the Model

How well did our model do?

First we'll figure out where we predicted the correct label. `tf.argmax`
is an extremely useful function which gives you the index of the highest entry
in a tensor along some axis. For example, `tf.argmax(y,1)` is the label our
model thinks is most likely for each input, while `tf.argmax(y_,1)` is the
true label. We can use `tf.equal` to check if our prediction matches the
truth.

```python
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
```

That gives us a list of booleans. To determine what fraction are correct, we
cast to floating point numbers and then take the mean. For example,
`[True, False, True, True]` would become `[1,0,1,1]`, which would become `0.75`.

```python
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
```

Finally, we can evaluate our accuracy on the test data. This should be about
91% correct.

```python
print accuracy.eval(feed_dict={x: mnist.test.images, y_: mnist.test.labels})
```

## Build a Multilayer Convolutional Network

Getting 91% accuracy on MNIST is bad. It's almost embarrassingly bad. In this
section, we'll fix that, jumping from a very simple model to something
moderately sophisticated: a small convolutional neural network. This will get
us to around 99.2% accuracy -- not state of the art, but respectable.

### Weight Initialization

To create this model, we're going to need to create a lot of weights and biases.
One should generally initialize weights with a small amount of noise for
symmetry breaking, and to prevent 0 gradients. Since we're using ReLU neurons,
it is also good practice to initialize them with a slightly positive initial
bias to avoid "dead neurons." Instead of doing this repeatedly while we build
the model, let's create two handy functions to do it for us.

```python
def weight_variable(shape):
  initial = tf.truncated_normal(shape, stddev=0.1)
  return tf.Variable(initial)

def bias_variable(shape):
  initial = tf.constant(0.1, shape=shape)
  return tf.Variable(initial)
```

### Convolution and Pooling

TensorFlow also gives us a lot of flexibility in convolution and pooling
operations. How do we handle the boundaries? What is our stride size?
In this example, we're always going to choose the vanilla version.
Our convolutions use a stride of one and are zero-padded so that the
output is the same size as the input. Our pooling is plain old max pooling
over 2x2 blocks. To keep our code cleaner, let's also abstract those operations
into functions.

```python
def conv2d(x, W):
  return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
  return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                        strides=[1, 2, 2, 1], padding='SAME')
```

### First Convolutional Layer

We can now implement our first layer. It will consist of convolution, followed
by max pooling. The convolution will compute 32 features for each 5x5 patch.
Its weight tensor will have a shape of `[5, 5, 1, 32]`. The first two
dimensions are the patch size, the next is the number of input channels, and
the last is the number of output channels. We will also have a bias vector with
a component for each output channel.
+ +```python +W_conv1 = weight_variable([5, 5, 1, 32]) +b_conv1 = bias_variable([32]) +``` + +To apply the layer, we first reshape `x` to a 4d tensor, with the second and +third dimensions corresponding to image width and height, and the final +dimension corresponding to the number of color channels. + +```python +x_image = tf.reshape(x, [-1,28,28,1]) +``` + +We then convolve `x_image` with the weight tensor, add the +bias, apply the ReLU function, and finally max pool. + +```python +h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1) +h_pool1 = max_pool_2x2(h_conv1) +``` + +### Second Convolutional Layer + +In order to build a deep network, we stack several layers of this type. The +second layer will have 64 features for each 5x5 patch. + +```python +W_conv2 = weight_variable([5, 5, 32, 64]) +b_conv2 = bias_variable([64]) + +h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2) +h_pool2 = max_pool_2x2(h_conv2) +``` + +### Densely Connected Layer + +Now that the image size has been reduced to 7x7, we add a fully-connected layer +with 1024 neurons to allow processing on the entire image. We reshape the tensor +from the pooling layer into a batch of vectors, +multiply by a weight matrix, add a bias, and apply a ReLU. + +```python +W_fc1 = weight_variable([7 * 7 * 64, 1024]) +b_fc1 = bias_variable([1024]) + +h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64]) +h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1) +``` + +#### Dropout + +To reduce overfitting, we will apply dropout before the readout layer. +We create a `placeholder` for the probability that a neuron's output is kept +during dropout. This allows us to turn dropout on during training, and turn it +off during testing. +TensorFlow's `tf.nn.dropout` op automatically handles scaling neuron outputs in +addition to masking them, so dropout just works without any additional scaling. + +```python +keep_prob = tf.placeholder("float") +h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob) +``` + +### Readout Layer + +Finally, we add a softmax layer, just like for the one layer softmax regression +above. + +```python +W_fc2 = weight_variable([1024, 10]) +b_fc2 = bias_variable([10]) + +y_conv=tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2) +``` + +### Train and Evaluate the Model + +How well does this model do? +To train and evaluate it we will use code that is nearly identical to that for +the simple one layer SoftMax network above. +The differences are that: we will replace the steepest gradient descent +optimizer with the more sophisticated ADAM optimizer; we will include the +additional parameter `keep_prob` in `feed_dict` to control the dropout rate; +and we will add logging to every 100th iteration in the training process. + +```python +cross_entropy = -tf.reduce_sum(y_*tf.log(y_conv)) +train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy) +correct_prediction = tf.equal(tf.argmax(y_conv,1), tf.argmax(y_,1)) +accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float")) +sess.run(tf.initialize_all_variables()) +for i in range(20000): + batch = mnist.train.next_batch(50) + if i%100 == 0: + train_accuracy = accuracy.eval(feed_dict={ + x:batch[0], y_: batch[1], keep_prob: 1.0}) + print "step %d, training accuracy %g"%(i, train_accuracy) + train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5}) + +print "test accuracy %g"%accuracy.eval(feed_dict={ + x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}) +``` + +The final test set accuracy after running this code should be approximately 99.2%. 
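One practical aside (this helper is our own sketch, not part of the tutorial
code): evaluating all 10,000 test images in a single `eval()` call can exhaust
memory on smaller GPUs. If that happens, the same accuracy can be accumulated
over batches, assuming the `accuracy`, `x`, `y_`, and `keep_prob` tensors
defined above:

```python
# 100 batches of 100 cover the 10,000-image test set exactly once.
correct = 0.0
for i in range(100):
  batch = mnist.test.next_batch(100)
  correct += accuracy.eval(feed_dict={x: batch[0], y_: batch[1],
                                      keep_prob: 1.0}) * 100
print "test accuracy %g" % (correct / 10000)
```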
We have learned how to quickly and easily build, train, and evaluate a
fairly sophisticated deep learning model using TensorFlow.

diff --git a/tensorflow/g3doc/tutorials/mnist/tf/index.md b/tensorflow/g3doc/tutorials/mnist/tf/index.md
new file mode 100644
index 0000000000..86f3296287
--- /dev/null
+++ b/tensorflow/g3doc/tutorials/mnist/tf/index.md
@@ -0,0 +1,513 @@
# Handwritten Digit Classification

Code: [tensorflow/g3doc/tutorials/mnist/](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/g3doc/tutorials/mnist/)

The goal of this tutorial is to show how to use TensorFlow to train and
evaluate a simple feed-forward neural network for handwritten digit
classification using the (classic) MNIST data set. The intended audience for
this tutorial is experienced machine learning users interested in using
TensorFlow.

These tutorials are not intended to teach machine learning in general.

Please ensure you have followed the instructions for
[installing TensorFlow](../../../get_started/os_setup.md).

## Tutorial Files

This tutorial references the following files:

File | Purpose
--- | ---
[`mnist.py`](../mnist.py) | The code to build a fully-connected MNIST model.
[`fully_connected_feed.py`](../fully_connected_feed.py) | The main code, to train the built MNIST model against the downloaded dataset using a feed dictionary.

Simply run the `fully_connected_feed.py` file directly to start training:

`python fully_connected_feed.py`

## Prepare the Data

MNIST is a classic problem in machine learning. The problem is to look at
greyscale 28x28 pixel images of handwritten digits and determine which digit
the image represents, for all the digits from zero to nine.

![MNIST Digits](./mnist_digits.png "MNIST Digits")

For more information, refer to [Yann LeCun's MNIST page](http://yann.lecun.com/exdb/mnist/)
or [Chris Olah's visualizations of MNIST](http://colah.github.io/posts/2014-10-Visualizing-MNIST/).

### Download

At the top of the `run_training()` method, the `input_data.read_data_sets()`
function will ensure that the correct data has been downloaded to your local
training folder and then unpack that data to return a dictionary of `DataSet`
instances.

```python
data_sets = input_data.read_data_sets(FLAGS.train_dir, FLAGS.fake_data)
```

**NOTE**: The `fake_data` flag is used for unit-testing purposes and may be
safely ignored by the reader.

Dataset | Purpose
--- | ---
`data_sets.train` | 55000 images and labels, for primary training.
`data_sets.validation` | 5000 images and labels, for iterative validation of training accuracy.
`data_sets.test` | 10000 images and labels, for final testing of trained accuracy.

For more information about the data, please read the [`Download`](../download/index.md)
tutorial.

### Inputs and Placeholders

The `placeholder_inputs()` function creates two [`tf.placeholder`](../../../api_docs/python/io_ops.md#placeholder)
ops that define the shape of the inputs, including the `batch_size`, for the
rest of the graph; the actual training examples will be fed into these
placeholders.

```python
images_placeholder = tf.placeholder(tf.float32, shape=(batch_size,
                                                       IMAGE_PIXELS))
labels_placeholder = tf.placeholder(tf.int32, shape=(batch_size))
```

Further down, in the training loop, the full image and label datasets are
sliced to fit the `batch_size` for each step, matched with these placeholder
ops, and then passed into the `sess.run()` function using the `feed_dict`
parameter.
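As a minimal, self-contained illustration of how these placeholders pair with
a feed (our sketch, not code from the tutorial files; the zero arrays stand in
for real MNIST batches):

```python
import numpy as np
import tensorflow as tf

IMAGE_PIXELS = 28 * 28  # as defined in mnist.py
batch_size = 100

images_placeholder = tf.placeholder(tf.float32, shape=(batch_size,
                                                       IMAGE_PIXELS))
labels_placeholder = tf.placeholder(tf.int32, shape=(batch_size))

# Stand-in batch; in the real code this comes from data_set.next_batch().
images_feed = np.zeros((batch_size, IMAGE_PIXELS), dtype=np.float32)
labels_feed = np.zeros((batch_size,), dtype=np.int32)

with tf.Session() as sess:
  # Fetching a fed placeholder simply echoes the fed value, which is
  # enough to show how placeholders and feed_dict entries pair up.
  fed = sess.run(images_placeholder,
                 feed_dict={images_placeholder: images_feed,
                            labels_placeholder: labels_feed})
  print fed.shape  # (100, 784)
```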
## Build the Graph

After creating placeholders for the data, the graph is built from the
`mnist.py` file according to a 3-stage pattern: `inference()`, `loss()`, and
`training()`.

1. `inference()` - Builds the graph as far as is required for running
the network forward to make predictions.
1. `loss()` - Adds to the inference graph the ops required to generate
loss.
1. `training()` - Adds to the loss graph the ops required to compute
and apply gradients.

<div style="width:95%; margin:auto; margin-bottom:10px; margin-top:20px;">
  <img style="width:100%" src="./mnist_subgraph.png">
</div>

### Inference

The `inference()` function builds the graph as far as needed to
return the tensor that would contain the output predictions.

It takes the images placeholder as input and builds on top
of it a pair of fully connected layers with ReLU activation followed by a
ten-node linear layer specifying the output logits.

Each layer is created beneath a unique [`tf.name_scope`](../../../api_docs/python/framework.md#name_scope)
that acts as a prefix to the items created within that scope.

```python
with tf.name_scope('hidden1') as scope:
```

Within the defined scope, the weights and biases to be used by each of these
layers are generated into [`tf.Variable`](../../../api_docs/python/state_ops.md#Variable)
instances, with their desired shapes:

```python
weights = tf.Variable(
    tf.truncated_normal([IMAGE_PIXELS, hidden1_units],
                        stddev=1.0 / math.sqrt(float(IMAGE_PIXELS))),
    name='weights')
biases = tf.Variable(tf.zeros([hidden1_units]),
                     name='biases')
```

When, for instance, these are created under the `hidden1` scope, the unique
name given to the weights variable would be "`hidden1/weights`".

Each variable is given an initializer op as part of its construction.

In the most common case, the weights are initialized with
[`tf.truncated_normal`](../../../api_docs/python/constant_op.md#truncated_normal)
and given the shape of a 2d tensor, where
the first dim represents the number of units in the layer from which the
weights connect and the second dim represents the number of
units in the layer to which the weights connect. For the first layer, named
`hidden1`, the dimensions are `[IMAGE_PIXELS, hidden1_units]` because the
weights are connecting the image inputs to the hidden1 layer. The
`tf.truncated_normal` initializer generates values from a truncated normal
distribution with a given mean and standard deviation.

Then the biases are initialized with [`tf.zeros`](../../../api_docs/python/constant_op.md#zeros)
to ensure they start with all zero values, and their shape is simply the
number of units in the layer to which they connect.

The graph's three primary ops -- two [`tf.nn.relu`](../../../api_docs/python/nn.md#relu)
ops wrapping [`tf.matmul`](../../../api_docs/python/math_ops.md#matmul)
for the hidden layers and one extra `tf.matmul` for the logits -- are then
created, each in turn, with their `tf.Variable` instances connected to the
input placeholder or to the output tensor of the layer beneath. (In each of
the following snippets, `weights` and `biases` refer to the variables created
in that layer's own name scope.)

```python
hidden1 = tf.nn.relu(tf.matmul(images, weights) + biases)
```

```python
hidden2 = tf.nn.relu(tf.matmul(hidden1, weights) + biases)
```

```python
logits = tf.matmul(hidden2, weights) + biases
```

Finally, the `logits` tensor that will contain the output is returned.
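Putting the three layers together, a condensed sketch of `inference()` looks
like the following. It closely follows `mnist.py` (consult that file for the
authoritative version); `IMAGE_PIXELS` and `NUM_CLASSES` are the constants
defined there.

```python
import math
import tensorflow as tf

IMAGE_PIXELS = 28 * 28  # as in mnist.py
NUM_CLASSES = 10        # digits 0 through 9

def inference(images, hidden1_units, hidden2_units):
  # Hidden layer 1: images -> hidden1_units ReLU outputs.
  with tf.name_scope('hidden1') as scope:
    weights = tf.Variable(
        tf.truncated_normal([IMAGE_PIXELS, hidden1_units],
                            stddev=1.0 / math.sqrt(float(IMAGE_PIXELS))),
        name='weights')
    biases = tf.Variable(tf.zeros([hidden1_units]), name='biases')
    hidden1 = tf.nn.relu(tf.matmul(images, weights) + biases)
  # Hidden layer 2: hidden1 -> hidden2_units ReLU outputs.
  with tf.name_scope('hidden2') as scope:
    weights = tf.Variable(
        tf.truncated_normal([hidden1_units, hidden2_units],
                            stddev=1.0 / math.sqrt(float(hidden1_units))),
        name='weights')
    biases = tf.Variable(tf.zeros([hidden2_units]), name='biases')
    hidden2 = tf.nn.relu(tf.matmul(hidden1, weights) + biases)
  # Linear readout: hidden2 -> one logit per class.
  with tf.name_scope('softmax_linear') as scope:
    weights = tf.Variable(
        tf.truncated_normal([hidden2_units, NUM_CLASSES],
                            stddev=1.0 / math.sqrt(float(hidden2_units))),
        name='weights')
    biases = tf.Variable(tf.zeros([NUM_CLASSES]), name='biases')
    logits = tf.matmul(hidden2, weights) + biases
  return logits
```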
### Loss

The `loss()` function further builds the graph by adding the required loss
ops.

First, the values from the `labels_placeholder` are encoded as a tensor of
1-hot values. For example, if the class identifier is '3' the value is
converted to:
<br>`[0, 0, 0, 1, 0, 0, 0, 0, 0, 0]`

```python
batch_size = tf.size(labels)
labels = tf.expand_dims(labels, 1)
indices = tf.expand_dims(tf.range(0, batch_size, 1), 1)
# Pair each row index with its label, then scatter 1.0 into a dense
# [batch_size, NUM_CLASSES] matrix of 0.0s at those positions.
concated = tf.concat(1, [indices, labels])
onehot_labels = tf.sparse_to_dense(
    concated, tf.pack([batch_size, NUM_CLASSES]), 1.0, 0.0)
```

A [`tf.nn.softmax_cross_entropy_with_logits`](../../../api_docs/python/nn.md#softmax_cross_entropy_with_logits)
op is then added to compare the output logits from the `inference()` function
and the 1-hot labels.

```python
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits,
                                                        onehot_labels,
                                                        name='xentropy')
```

It then uses [`tf.reduce_mean`](../../../api_docs/python/math_ops.md#reduce_mean)
to average the cross entropy values across the batch dimension (the first
dimension) as the total loss.

```python
loss = tf.reduce_mean(cross_entropy, name='xentropy_mean')
```

The tensor that will contain the loss value is then returned.

> Note: Cross-entropy is an idea from information theory that allows us
> to describe how bad it is to believe the predictions of the neural network,
> given what is actually true. For more information, read the blog post
> [Visual Information Theory](http://colah.github.io/posts/2015-09-Visual-Information/).

### Training

The `training()` function adds the operations needed to minimize the loss via
gradient descent.

First, it takes the loss tensor from the `loss()` function and hands it to a
[`tf.scalar_summary`](../../../api_docs/python/train.md#scalar_summary),
an op for generating summary values into the events file when used with a
`SummaryWriter` (see below). In this case, it will emit the snapshot value of
the loss every time the summaries are written out.

```python
tf.scalar_summary(loss.op.name, loss)
```

Next, we instantiate a [`tf.train.GradientDescentOptimizer`](../../../api_docs/python/train.md#GradientDescentOptimizer)
responsible for applying gradients with the requested learning rate.

```python
optimizer = tf.train.GradientDescentOptimizer(FLAGS.learning_rate)
```

We then generate a single variable to contain a counter for the global
training step, and the [`minimize()`](../../../api_docs/python/train.md#Optimizer.minimize)
op is used to both update the trainable weights in the system and increment
the global step. This op is, by convention, known as the `train_op` and is
what must be run by a TensorFlow session in order to induce one full step of
training (see below).

```python
global_step = tf.Variable(0, name='global_step', trainable=False)
train_op = optimizer.minimize(loss, global_step=global_step)
```

Finally, the `train_op` is returned so that the training loop can run it.

## Train the Model

Once the graph is built, it can be iteratively trained and evaluated in a loop
controlled by the user code in `fully_connected_feed.py`.

### The Graph

At the top of the `run_training()` function is a Python `with` command that
indicates all of the built ops are to be associated with the default
global [`tf.Graph`](../../../api_docs/python/framework.md#Graph)
instance.

```python
with tf.Graph().as_default():
```

A `tf.Graph` is a collection of ops that may be executed together as a group.
Most TensorFlow uses will only need to rely on the single default graph. More
complicated uses with multiple graphs are possible, but beyond the scope of
this simple tutorial.
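To make "default graph" concrete, here is a tiny standalone example (ours, not
from the tutorial files): ops created inside the `with` block are registered
on that graph object, and the previous default is restored on exit.

```python
import tensorflow as tf

with tf.Graph().as_default() as g:
  c = tf.constant(42.0, name='answer')  # this op is registered on g

print c.graph is g                       # True: the op lives on g
print c.graph is tf.get_default_graph()  # False: outside the block, the
                                         # process-wide default is back
```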
### The Session

Once all of the build preparation has been completed and all of the necessary
ops generated, a [`tf.Session`](../../../api_docs/python/client.md#Session)
is created for running the graph.

```python
sess = tf.Session()
```

Alternatively, a `Session` may be generated into a `with` block for scoping:

```python
with tf.Session() as sess:
```

The empty parameter to `Session` indicates that this code will attach to
(or create, if not yet created) the default local session.

Immediately after creating the session, all of the `tf.Variable`
instances are initialized by calling `sess.run()` on their initialization op.

```python
init = tf.initialize_all_variables()
sess.run(init)
```

The [`sess.run()`](../../../api_docs/python/client.md#Session.run)
method will run the complete subset of the graph that
corresponds to the op(s) passed as parameters. In this first call, the `init`
op is a [`tf.group`](../../../api_docs/python/control_flow_ops.md#group)
that contains only the initializers for the variables. None of the rest of
the graph is run here; that happens in the training loop below.

### Train Loop

After initializing the variables with the session, training may begin.

The user code controls the training per step, and the simplest loop that
can do useful training is:

```python
for step in xrange(max_steps):
  sess.run([train_op])
```

However, this tutorial is slightly more complicated in that it must also slice
up the input data for each step to match the previously generated placeholders.

#### Feed the Graph

For each step, the code will generate a feed dictionary that will contain the
set of examples on which to train for the step, keyed by the placeholder
ops they represent.

In the `fill_feed_dict()` function, the given `DataSet` is queried for its next
`batch_size` set of images and labels, and tensors matching the placeholders
are filled with those images and labels.

```python
images_feed, labels_feed = data_set.next_batch(FLAGS.batch_size)
```

A Python dictionary object is then generated with the placeholders as keys and
the representative feed tensors as values.

```python
feed_dict = {
    images_placeholder: images_feed,
    labels_placeholder: labels_feed,
}
```

This is passed into the `sess.run()` function's `feed_dict` parameter to provide
the input examples for this step of training.

#### Check the Status

The code specifies two op-tensors in its run call: `[train_op, loss]`. It also
times each step, since the status text printed below reports a `duration`
(`fully_connected_feed.py` imports the `time` module for this):

```python
for step in xrange(FLAGS.max_steps):
  start_time = time.time()
  feed_dict = fill_feed_dict(data_sets.train,
                             images_placeholder,
                             labels_placeholder)
  _, loss_value = sess.run([train_op, loss],
                           feed_dict=feed_dict)
  duration = time.time() - start_time  # used by the status text below
```

Because there are two tensors passed as parameters, the return from
`sess.run()` is a tuple with two items, one per passed op-tensor, filled with
the values of those tensors during this step of training.

The value of the `train_op` is actually `None` and is, thus, discarded. But the
value of the `loss` tensor may become NaN if the model diverges during
training.
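If you wanted to stop on divergence rather than just observe it, a minimal
guard (our addition, not part of the tutorial code) could sit right after the
`sess.run()` call in the loop above, where `loss_value` and `step` are
defined:

```python
import math

# Stop early rather than keep training a diverged model.
if math.isnan(loss_value):
  raise RuntimeError('Model diverged with loss = NaN at step %d' % step)
```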
Assuming that the training runs fine without NaNs, the training loop also
prints a simple status text every 100 steps to let the user know the state of
training.

```python
if step % 100 == 0:
  print 'Step %d: loss = %.2f (%.3f sec)' % (step, loss_value, duration)
```

#### Visualize the Status

In order to emit the events files used by [TensorBoard](../../../how_tos/summaries_and_tensorboard/index.md),
all of the summaries (in this case, only one) are collected into a single op
during the graph building phase.

```python
summary_op = tf.merge_all_summaries()
```

And then, after the Session is generated, a [`tf.train.SummaryWriter`](../../../api_docs/python/train.md#SummaryWriter)
may be instantiated to output into the given directory the events files,
containing the Graph itself and the values of the summaries.

```python
summary_writer = tf.train.SummaryWriter(FLAGS.train_dir,
                                        graph_def=sess.graph_def)
```

Lastly, the events file will be updated with new summary values every time the
`summary_op` is run and the output passed to the writer's `add_summary()`
function.

```python
summary_str = sess.run(summary_op, feed_dict=feed_dict)
summary_writer.add_summary(summary_str, step)
```

When the events files are written, TensorBoard may be run against the training
folder to display the values from the summaries.

![MNIST TensorBoard](./mnist_tensorboard.png "MNIST TensorBoard")

**NOTE**: For more info about how to build and run TensorBoard, please see the
accompanying tutorial [TensorBoard: Visualizing Your Training](../../../how_tos/summaries_and_tensorboard/index.md).

#### Save a Checkpoint

In order to emit a checkpoint file that may be used to later restore a model
for further training or evaluation, we instantiate a
[`tf.train.Saver`](../../../api_docs/python/state_ops.md#Saver).

```python
saver = tf.train.Saver()
```

In the training loop, the [`saver.save()`](../../../api_docs/python/state_ops.md#Saver.save)
method will periodically be called to write a checkpoint file to the training
directory with the current values of all the trainable variables.

```python
saver.save(sess, FLAGS.train_dir, global_step=step)
```

At some later point, training might be resumed by using the
[`saver.restore()`](../../../api_docs/python/state_ops.md#Saver.restore)
method to reload the model parameters.

```python
saver.restore(sess, FLAGS.train_dir)
```

## Evaluate the Model

Every thousand steps, the code will attempt to evaluate the model against the
training, validation, and test datasets. The `do_eval()` function is called
three times, once for each dataset.

```python
print 'Training Data Eval:'
do_eval(sess,
        eval_correct,
        images_placeholder,
        labels_placeholder,
        data_sets.train)
print 'Validation Data Eval:'
do_eval(sess,
        eval_correct,
        images_placeholder,
        labels_placeholder,
        data_sets.validation)
print 'Test Data Eval:'
do_eval(sess,
        eval_correct,
        images_placeholder,
        labels_placeholder,
        data_sets.test)
```

> Note that more complicated usage would usually sequester the `data_sets.test`
> to only be checked after significant amounts of hyperparameter tuning. For
> the sake of a simple little MNIST problem, however, we evaluate against all of
> the data.
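For reference, the "every thousand steps" trigger sits inside the training
loop and looks essentially like the sketch below; treat the exact condition as
an approximation and see `fully_connected_feed.py` for the authoritative
version. All of the names used here were introduced above.

```python
# Periodically save a checkpoint and evaluate, and also do so on the
# very last step so the final model is always scored.
if (step + 1) % 1000 == 0 or (step + 1) == FLAGS.max_steps:
  saver.save(sess, FLAGS.train_dir, global_step=step)
  print 'Training Data Eval:'
  do_eval(sess, eval_correct, images_placeholder, labels_placeholder,
          data_sets.train)
  # ...and likewise for data_sets.validation and data_sets.test.
```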
### Build the Eval Graph

Before opening the default Graph, the test data should have been fetched by
calling the `get_data(train=False)` function with the parameter set to grab
the test dataset.

```python
test_all_images, test_all_labels = get_data(train=False)
```

Before entering the training loop, the Eval op should have been built
by calling the `evaluation()` function from `mnist.py` with the same
logits/labels parameters as the `loss()` function.

```python
eval_correct = mnist.evaluation(logits, labels_placeholder)
```

The `evaluation()` function simply generates a [`tf.nn.in_top_k`](../../../api_docs/python/nn.md#in_top_k)
op that can automatically score each model output as correct if the true label
can be found in the K most-likely predictions. In this case, we set the value
of K to 1 to only consider a prediction correct if it is for the true label.

```python
eval_correct = tf.nn.in_top_k(logits, labels, 1)
```

### Eval Output

One can then create a loop for filling a `feed_dict` and calling `sess.run()`
against the `eval_correct` op to evaluate the model on the given dataset.

```python
true_count = 0
# One full pass over the data; integer division discards any partial batch.
steps_per_epoch = data_set.num_examples // FLAGS.batch_size
for step in xrange(steps_per_epoch):
  feed_dict = fill_feed_dict(data_set,
                             images_placeholder,
                             labels_placeholder)
  true_count += sess.run(eval_correct, feed_dict=feed_dict)
```

The `true_count` variable simply accumulates all of the predictions that the
`in_top_k` op has determined to be correct. From there, the precision may be
calculated by simply dividing by the total number of examples evaluated.

```python
num_examples = steps_per_epoch * FLAGS.batch_size
precision = float(true_count) / float(num_examples)
print '  Num examples: %d  Num correct: %d  Precision @ 1: %0.02f' % (
    num_examples, true_count, precision)
```
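Finally, to see exactly what the `in_top_k` op counts, here is a standalone
toy example of ours with made-up logits:

```python
import tensorflow as tf

logits = tf.constant([[0.1, 0.8, 0.1],   # most likely class: 1
                      [0.4, 0.3, 0.3]])  # most likely class: 0
labels = tf.constant([1, 2])             # true classes

correct = tf.nn.in_top_k(logits, labels, 1)

with tf.Session() as sess:
  print sess.run(correct)  # [ True False] -> true_count would be 1
```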