Diffstat (limited to 'tensorflow/g3doc/tutorials/mnist/pros')
-rw-r--r-- | tensorflow/g3doc/tutorials/mnist/pros/index.md | 390 |
1 files changed, 390 insertions, 0 deletions
diff --git a/tensorflow/g3doc/tutorials/mnist/pros/index.md b/tensorflow/g3doc/tutorials/mnist/pros/index.md
new file mode 100644
index 0000000000..17696712b0
--- /dev/null
+++ b/tensorflow/g3doc/tutorials/mnist/pros/index.md
@@ -0,0 +1,390 @@

# MNIST Deep Learning Example (For Experts)

TensorFlow is a powerful library for doing large-scale numerical computation.
One of the tasks at which it excels is implementing and training deep neural
networks. In this tutorial we will learn the basic building blocks of a
TensorFlow model while constructing a deep convolutional MNIST classifier.

*This introduction assumes familiarity with neural networks and the MNIST
dataset. If you don't have a background in them, check out the
[introduction for beginners](../beginners/index.md).*

## Setup

Before we create our model, we will first load the MNIST dataset and start a
TensorFlow session.

### Load MNIST Data

For your convenience, we've included [a script](../input_data.py) which
automatically downloads and imports the MNIST dataset. It will create a
directory `'MNIST_data'` in which to store the data files.

```python
import input_data
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)
```

Here `mnist` is a lightweight class which stores the training, validation, and
testing sets as NumPy arrays. It also provides a function for iterating through
data minibatches, which we will use below.

### Start TensorFlow Session

TensorFlow relies on a highly efficient C++ backend to do its computation. The
connection to this backend is called a session. We will need to create a
session before we can do any computation.

```python
import tensorflow as tf
sess = tf.InteractiveSession()
```

Using an `InteractiveSession` makes TensorFlow more flexible about how you
structure your code. It allows you to interleave operations which build a
[computation graph](../../../get_started/basic_usage.md#the-computation-graph)
with ones that run the graph. This is particularly convenient when working in
interactive contexts like IPython. If you are not using an
`InteractiveSession`, then you should build the entire computation graph before
starting a session and [launching the
graph](../../../get_started/basic_usage.md#launching-the-graph-in-a-session).

#### Computation Graph

To do efficient numerical computing in Python, we typically use libraries like
NumPy that do expensive operations such as matrix multiplication outside
Python, using highly efficient code implemented in another language.
Unfortunately, there can still be a lot of overhead from switching back to
Python for every operation. This overhead is especially bad if you want to run
computations on GPUs or in a distributed manner, where there can be a high cost
to transferring data.

TensorFlow also does its heavy lifting outside Python, but it takes things a
step further to avoid this overhead. Instead of running a single expensive
operation independently from Python, TensorFlow lets us describe a graph of
interacting operations that run entirely outside Python. This approach is
similar to that used in Theano or Torch.

The role of the Python code is therefore to build this external computation
graph, and to dictate which parts of the computation graph should be run. See
the
[Computation Graph](../../../get_started/basic_usage.md#the-computation-graph)
section of [Basic Usage](../../../get_started/basic_usage.md) for more detail.
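To make the build-then-run distinction concrete, here is a minimal sketch (not
part of the MNIST model): creating ops only adds nodes to the graph, and
nothing is computed until the session evaluates them.

```python
a = tf.constant(2.0)
b = tf.constant(3.0)
c = tf.mul(a, b)  # adds a multiply op to the graph; nothing is computed yet
print c.eval()    # the InteractiveSession runs the op: 2.0 * 3.0 = 6.0
```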
## Build a Softmax Regression Model

In this section we will build a softmax regression model with a single linear
layer. In the next section, we will extend this to the case of softmax
regression with a multilayer convolutional network.

### Placeholders

We start building the computation graph by creating nodes for the input images
and target output classes.

```python
x = tf.placeholder("float", shape=[None, 784])
y_ = tf.placeholder("float", shape=[None, 10])
```

Here `x` and `y_` aren't specific values. Rather, they are each a `placeholder`
-- a value that we'll input when we ask TensorFlow to run a computation.

The input images `x` will consist of a 2d tensor of floating point numbers.
Here we assign it a `shape` of `[None, 784]`, where `784` is the dimensionality
of a single flattened MNIST image, and `None` indicates that the first
dimension, corresponding to the batch size, can be of any size. The target
output classes `y_` will also consist of a 2d tensor, where each row is a
one-hot 10-dimensional vector indicating which digit class the corresponding
MNIST image belongs to.

The `shape` argument to `placeholder` is optional, but it allows TensorFlow to
automatically catch bugs stemming from inconsistent tensor shapes.

### Variables

We now define the weights `W` and biases `b` for our model. We could imagine
treating these like additional inputs, but TensorFlow has an even better way to
handle them: `Variable`. A `Variable` is a value that lives in TensorFlow's
computation graph. It can be used and even modified by the computation. In
machine learning applications, one generally has the model parameters be
`Variable`s.

```python
W = tf.Variable(tf.zeros([784,10]))
b = tf.Variable(tf.zeros([10]))
```

We pass the initial value for each parameter in the call to `tf.Variable`. In
this case, we initialize both `W` and `b` as tensors full of zeros. `W` is a
784x10 matrix (because we have 784 input features and 10 outputs) and `b` is a
10-dimensional vector (because we have 10 classes).

Before `Variable`s can be used within a session, they must be initialized using
that session. This step takes the initial values (in this case tensors full of
zeros) that have already been specified, and assigns them to each `Variable`.
This can be done for all `Variable`s at once.

```python
sess.run(tf.initialize_all_variables())
```

### Predicted Class and Cost Function

We can now implement our regression model. It only takes one line! We multiply
the vectorized input images `x` by the weight matrix `W`, add the bias `b`, and
compute the softmax probabilities that are assigned to each class.

```python
y = tf.nn.softmax(tf.matmul(x,W) + b)
```

The cost function to be minimized during training can be specified just as
easily. Our cost function will be the cross-entropy between the target and the
model's prediction.

```python
cross_entropy = -tf.reduce_sum(y_*tf.log(y))
```

Note that `tf.reduce_sum` sums across all images in the minibatch, as well as
all classes. We are computing the cross-entropy for the entire minibatch.

## Train the Model

Now that we have defined our model and training cost function, it is
straightforward to train using TensorFlow. Because TensorFlow knows the entire
computation graph, it can use automatic differentiation to find the gradients
of the cost with respect to each of the variables.
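Just to illustrate what happens behind the scenes, here is a minimal sketch
(not needed for the rest of the tutorial) that asks TensorFlow for those
gradients directly with `tf.gradients`:

```python
# Purely illustrative: build ops for d(cross_entropy)/dW and d(cross_entropy)/db.
# The optimizer introduced below constructs equivalent gradient ops and applies
# the parameter updates for us.
grad_W, grad_b = tf.gradients(cross_entropy, [W, b])
```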
TensorFlow has a variety of
[built-in optimization algorithms](../../../api_docs/python/train.md#optimizers).
For this example, we will use steepest gradient descent, with a step length of
0.01, to descend the cross entropy.

```python
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)
```

What TensorFlow actually did in that single line was to add new operations to
the computation graph. These operations included ones to compute gradients,
compute parameter update steps, and apply update steps to the parameters.

The returned operation `train_step`, when run, will apply the gradient descent
updates to the parameters. Training the model can therefore be accomplished by
repeatedly running `train_step`.

```python
for i in range(1000):
  batch = mnist.train.next_batch(50)
  train_step.run(feed_dict={x: batch[0], y_: batch[1]})
```

In each training iteration we load 50 training examples. We then run the
`train_step` operation, using `feed_dict` to replace the `placeholder` tensors
`x` and `y_` with the training examples. Note that you can replace any tensor
in your computation graph using `feed_dict` -- it's not restricted to just
`placeholder`s.

### Evaluate the Model

How well did our model do?

First we'll figure out where we predicted the correct label. `tf.argmax` is an
extremely useful function which gives you the index of the highest entry in a
tensor along some axis. For example, `tf.argmax(y,1)` is the label our model
thinks is most likely for each input, while `tf.argmax(y_,1)` is the true
label. We can use `tf.equal` to check if our prediction matches the truth.

```python
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
```

That gives us a list of booleans. To determine what fraction are correct, we
cast to floating point numbers and then take the mean. For example,
`[True, False, True, True]` would become `[1,0,1,1]`, which would become `0.75`.

```python
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
```

Finally, we can evaluate our accuracy on the test data. This should be about
91% correct.

```python
print accuracy.eval(feed_dict={x: mnist.test.images, y_: mnist.test.labels})
```

## Build a Multilayer Convolutional Network

Getting 91% accuracy on MNIST is bad. It's almost embarrassingly bad. In this
section, we'll fix that, jumping from a very simple model to something
moderately sophisticated: a small convolutional neural network. This will get
us to around 99.2% accuracy -- not state of the art, but respectable.

### Weight Initialization

To create this model, we're going to need to create a lot of weights and
biases. One should generally initialize weights with a small amount of noise
for symmetry breaking, and to prevent 0 gradients. Since we're using ReLU
neurons, it is also good practice to initialize them with a slightly positive
initial bias to avoid "dead neurons." Instead of doing this repeatedly while we
build the model, let's create two handy functions to do it for us.

```python
def weight_variable(shape):
  initial = tf.truncated_normal(shape, stddev=0.1)
  return tf.Variable(initial)

def bias_variable(shape):
  initial = tf.constant(0.1, shape=shape)
  return tf.Variable(initial)
```

### Convolution and Pooling

TensorFlow also gives us a lot of flexibility in convolution and pooling
operations. How do we handle the boundaries? What is our stride size?
In this example, we're always going to choose the vanilla version. Our
convolutions use a stride of one and are zero-padded so that the output is the
same size as the input. Our pooling is plain old max pooling over 2x2 blocks.
To keep our code cleaner, let's also abstract those operations into functions.

```python
def conv2d(x, W):
  return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
  return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                        strides=[1, 2, 2, 1], padding='SAME')
```

### First Convolutional Layer

We can now implement our first layer. It will consist of convolution, followed
by max pooling. The convolution will compute 32 features for each 5x5 patch.
Its weight tensor will have a shape of `[5, 5, 1, 32]`. The first two
dimensions are the patch size, the next is the number of input channels, and
the last is the number of output channels. We will also have a bias vector with
a component for each output channel.

```python
W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])
```

To apply the layer, we first reshape `x` to a 4d tensor, with the second and
third dimensions corresponding to image width and height, and the final
dimension corresponding to the number of color channels.

```python
x_image = tf.reshape(x, [-1,28,28,1])
```

We then convolve `x_image` with the weight tensor, add the bias, apply the ReLU
function, and finally max pool.

```python
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)
```

### Second Convolutional Layer

In order to build a deep network, we stack several layers of this type. The
second layer will have 64 features for each 5x5 patch.

```python
W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])

h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)
```

### Densely Connected Layer

Now that the image size has been reduced to 7x7, we add a fully-connected layer
with 1024 neurons to allow processing on the entire image. We reshape the
tensor from the pooling layer into a batch of vectors, multiply by a weight
matrix, add a bias, and apply a ReLU.

```python
W_fc1 = weight_variable([7 * 7 * 64, 1024])
b_fc1 = bias_variable([1024])

h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)
```

#### Dropout

To reduce overfitting, we will apply dropout before the readout layer. We
create a `placeholder` for the probability that a neuron's output is kept
during dropout. This allows us to turn dropout on during training, and turn it
off during testing. TensorFlow's `tf.nn.dropout` op automatically handles
scaling neuron outputs in addition to masking them, so dropout just works
without any additional scaling.

```python
keep_prob = tf.placeholder("float")
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
```

### Readout Layer

Finally, we add a softmax layer, just like for the one-layer softmax regression
above.

```python
W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])

y_conv = tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)
```
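Before training, it can be reassuring to confirm that the layer shapes match
the description above. This optional check (not part of the original tutorial)
uses `get_shape()` to print the statically inferred shape of each tensor, where
`?` stands for the unspecified batch dimension.

```python
# Optional sanity check on the statically inferred tensor shapes.
print h_pool1.get_shape()  # (?, 14, 14, 32)
print h_pool2.get_shape()  # (?, 7, 7, 64)
print h_fc1.get_shape()    # (?, 1024)
print y_conv.get_shape()   # (?, 10)
```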
### Train and Evaluate the Model

How well does this model do? To train and evaluate it we will use code that is
nearly identical to that for the simple one-layer softmax network above. The
differences are that we will replace the steepest gradient descent optimizer
with the more sophisticated ADAM optimizer; we will include the additional
parameter `keep_prob` in `feed_dict` to control the dropout rate; and we will
add logging to every 100th iteration in the training process.

```python
cross_entropy = -tf.reduce_sum(y_*tf.log(y_conv))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
sess.run(tf.initialize_all_variables())
for i in range(20000):
  batch = mnist.train.next_batch(50)
  if i%100 == 0:
    train_accuracy = accuracy.eval(feed_dict={
        x:batch[0], y_: batch[1], keep_prob: 1.0})
    print "step %d, training accuracy %g"%(i, train_accuracy)
  train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})

print "test accuracy %g"%accuracy.eval(feed_dict={
    x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0})
```

The final test set accuracy after running this code should be approximately
99.2%.

We have learned how to quickly and easily build, train, and evaluate a fairly
sophisticated deep learning model using TensorFlow.
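As an optional extra (a hypothetical addition, not part of the original
script), the same `accuracy` op can also be evaluated on the validation split
that `read_data_sets` loaded in the Setup section, which is useful when tuning
hyperparameters without touching the test set:

```python
# Optional: evaluate on the held-out validation split, reusing the session and
# the trained variables from the code above.
print "validation accuracy %g"%accuracy.eval(feed_dict={
    x: mnist.validation.images, y_: mnist.validation.labels, keep_prob: 1.0})
```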