path: root/tensorflow/g3doc/tutorials
author    Vijay Vasudevan <vrv@google.com>    2015-11-07 13:58:24 -0800
committer Vijay Vasudevan <vrv@google.com>    2015-11-07 13:58:24 -0800
commit    fddaed524622417900d745fe8f115562c55ac49a (patch)
tree      cabb2fc16540a27748b60329195966d535f48837 /tensorflow/g3doc/tutorials
parent    7de9099a739c9dc62b1ca55c1eeef90acbfa7be9 (diff)
TensorFlow: Upstream commits to git.

Changes:
- More documentation edits, fixes to anchors, fixes to mathjax, new images, etc.
- Add rnn models to pip install package.

Base CL: 107312343
Diffstat (limited to 'tensorflow/g3doc/tutorials')
-rw-r--r--  tensorflow/g3doc/tutorials/deep_cnn/index.md         36
-rw-r--r--  tensorflow/g3doc/tutorials/index.md                  26
-rwxr-xr-x  tensorflow/g3doc/tutorials/mandelbrot/index.md        2
-rw-r--r--  tensorflow/g3doc/tutorials/mnist/beginners/index.md  36
-rw-r--r--  tensorflow/g3doc/tutorials/mnist/download/index.md   12
-rw-r--r--  tensorflow/g3doc/tutorials/mnist/pros/index.md       40
-rw-r--r--  tensorflow/g3doc/tutorials/mnist/tf/index.md         40
-rwxr-xr-x  tensorflow/g3doc/tutorials/pdes/index.md             10
-rw-r--r--  tensorflow/g3doc/tutorials/recurrent/index.md        26
-rw-r--r--  tensorflow/g3doc/tutorials/seq2seq/index.md          17
-rw-r--r--  tensorflow/g3doc/tutorials/word2vec/index.md         58
11 files changed, 151 insertions, 152 deletions
diff --git a/tensorflow/g3doc/tutorials/deep_cnn/index.md b/tensorflow/g3doc/tutorials/deep_cnn/index.md
index 906093009e..be23e7ccaa 100644
--- a/tensorflow/g3doc/tutorials/deep_cnn/index.md
+++ b/tensorflow/g3doc/tutorials/deep_cnn/index.md
@@ -1,9 +1,9 @@
-# Convolutional Neural Networks
+# Convolutional Neural Networks <a class="md-anchor" id="AUTOGENERATED-convolutional-neural-networks"></a>
**NOTE:** This tutorial is intended for *advanced* users of TensorFlow
and assumes expertise and experience in machine learning.
-## Overview
+## Overview <a class="md-anchor" id="AUTOGENERATED-overview"></a>
CIFAR-10 classification is a common benchmark problem in machine learning. The
problem is to classify RGB 32x32 pixel images across 10 categories:
@@ -15,7 +15,7 @@ For more details refer to the [CIFAR-10 page](http://www.cs.toronto.edu/~kriz/ci
and a [Tech Report](http://www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf)
by Alex Krizhevsky.
-### Goals
+### Goals <a class="md-anchor" id="AUTOGENERATED-goals"></a>
The goal of this tutorial is to build a relatively small convolutional neural
network (CNN) for recognizing images. In the process this tutorial:
@@ -29,7 +29,7 @@ exercise much of TensorFlow's ability to scale to large models. At the same
time, the model is small enough to train fast in order to test new ideas and
experiments.
-### Highlights of the Tutorial
+### Highlights of the Tutorial <a class="md-anchor" id="AUTOGENERATED-highlights-of-the-tutorial"></a>
The CIFAR-10 tutorial demonstrates several important constructs for
designing larger and more sophisticated models in TensorFlow:
@@ -60,7 +60,7 @@ We also provide a multi-GPU version of the model which demonstrates:
We hope that this tutorial provides a launch point for building larger CNNs for
vision tasks on TensorFlow.
-### Model Architecture
+### Model Architecture <a class="md-anchor" id="AUTOGENERATED-model-architecture"></a>
The model in this CIFAR-10 tutorial is a multi-layer architecture consisting of
alternating convolutions and nonlinearities. These layers are followed by fully
@@ -74,7 +74,7 @@ of training time on a GPU. Please see [below](#evaluating-a-model) and the code
for details. It consists of 1,068,298 learnable parameters and requires about
19.5M multiply-add operations to compute inference on a single image.
-## Code Organization
+## Code Organization <a class="md-anchor" id="AUTOGENERATED-code-organization"></a>
The code for this tutorial resides in
[`tensorflow/models/image/cifar10/`](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/models/image/cifar10/).
@@ -88,7 +88,7 @@ File | Purpose
[`cifar10_eval.py`](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/models/image/cifar10/cifar10_eval.py) | Evaluates the predictive performance of a CIFAR-10 model.
-## CIFAR-10 Model
+## CIFAR-10 Model <a class="md-anchor" id="AUTOGENERATED-cifar-10-model"></a>
The CIFAR-10 network is largely contained in
[`cifar10.py`](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/models/image/cifar10/cifar10.py).
@@ -105,7 +105,7 @@ adds operations that perform inference, i.e. classification, on supplied images.
add operations that compute the loss,
gradients, variable updates and visualization summaries.
-### Model Inputs
+### Model Inputs <a class="md-anchor" id="AUTOGENERATED-model-inputs"></a>
The input part of the model is built by the functions `inputs()` and
`distorted_inputs()` which read images from the CIFAR-10 binary data files.
@@ -143,7 +143,7 @@ processing time. To prevent these operations from slowing down training, we run
them inside 16 separate threads which continuously fill a TensorFlow
[queue](../../api_docs/python/io_ops.md#shuffle_batch).
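As a rough sketch of how those threads feed training, the image/label pair they produce can be pushed through a shuffling queue along the following lines; the tensor stand-ins and the capacities are illustrative, not the actual CIFAR-10 input code:

```python
import tensorflow as tf

# Stand-ins for the per-example tensors the reader/distortion pipeline produces
# (a 24x24x3 image and an int32 label); the real ones come from the binary files.
distorted_image = tf.random_uniform([24, 24, 3])
label = tf.constant(1, dtype=tf.int32)

images, labels = tf.train.shuffle_batch(
    [distorted_image, label],
    batch_size=128,
    num_threads=16,           # the 16 preprocessing threads mentioned above
    capacity=20000,
    min_after_dequeue=10000)  # keep a large, well-mixed pool to sample from
```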
-### Model Prediction
+### Model Prediction <a class="md-anchor" id="AUTOGENERATED-model-prediction"></a>
The prediction part of the model is constructed by the `inference()` function
which adds operations to compute the *logits* of the predictions. That part of
@@ -181,7 +181,7 @@ the CIFAR-10 model specified in
layers are locally connected and not fully connected. Try editing the
architecture to exactly replicate that fully connected model.
-### Model Training
+### Model Training <a class="md-anchor" id="AUTOGENERATED-model-training"></a>
The usual method for training a network to perform N-way classification is
[multinomial logistic regression](https://en.wikipedia.org/wiki/Multinomial_logistic_regression),
@@ -199,7 +199,7 @@ loss and all these weight decay terms, as returned by the `loss()` function.
We visualize it in TensorBoard with a [scalar_summary](../../api_docs/python/train.md?#scalar_summary):
![CIFAR-10 Loss](./cifar_loss.png "CIFAR-10 Total Loss")
-###### [View this TensorBoard live! (Chrome/FF)](/tensorboard/cifar.html)
+###### [View this TensorBoard live! (Chrome/FF)](/tensorboard/cifar.html) <a class="md-anchor" id="AUTOGENERATED--view-this-tensorboard-live---chrome-ff----tensorboard-cifar.html-"></a>
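For orientation, here is a hedged sketch of what a `loss()`-style function can look like, combining the softmax cross-entropy with the collected weight decay terms. This is only a sketch, not the exact code in `cifar10.py`:

```python
import tensorflow as tf

def total_loss(logits, dense_labels, weight_decay_losses):
    # Average cross-entropy between the logits and the dense labels over the batch.
    cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits, dense_labels)
    cross_entropy_mean = tf.reduce_mean(cross_entropy, name='cross_entropy')
    # Total loss = cross-entropy plus all of the L2 weight decay terms.
    return tf.add_n([cross_entropy_mean] + weight_decay_losses, name='total_loss')
```

The weight decay terms would be collected while the layers are built, for example as `tf.nn.l2_loss` values scaled by a decay factor.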
We train the model using standard
[gradient descent](https://en.wikipedia.org/wiki/Gradient_descent)
@@ -209,7 +209,7 @@ with a learning rate that
over time.
![CIFAR-10 Learning Rate Decay](./cifar_lr_decay.png "CIFAR-10 Learning Rate Decay")
-###### [View this TensorBoard live! (Chrome/FF)](/tensorboard/cifar.html)
+###### [View this TensorBoard live! (Chrome/FF)](/tensorboard/cifar.html) <a class="md-anchor" id="AUTOGENERATED--view-this-tensorboard-live---chrome-ff----tensorboard-cifar.html-"></a>
The `train()` function adds the operations needed to minimize the objective by
calculating the gradient and updating the learned variables (see
@@ -217,7 +217,7 @@ calculating the gradient and updating the learned variables (see
for details). It returns an operation that executes all of the calculations
needed to train and update the model for one batch of images.
-## Launching and Training the Model
+## Launching and Training the Model <a class="md-anchor" id="AUTOGENERATED-launching-and-training-the-model"></a>
We have built the model; let's now launch it and run the training operation with
the script `cifar10_train.py`.
@@ -302,7 +302,7 @@ values. See how the scripts use
[ExponentialMovingAverage](../../api_docs/python/train.md#ExponentialMovingAverage)
for this purpose.
-## Evaluating a Model
+## Evaluating a Model <a class="md-anchor" id="AUTOGENERATED-evaluating-a-model"></a>
Let us now evaluate how well the trained model performs on a hold-out data set.
The model is evaluated by the script `cifar10_eval.py`. It constructs the model
@@ -346,7 +346,7 @@ the averaged parameters for the model and verify that the predictive performance
drops.
-## Training a Model Using Multiple GPU Cards
+## Training a Model Using Multiple GPU Cards <a class="md-anchor" id="AUTOGENERATED-training-a-model-using-multiple-gpu-cards"></a>
Modern workstations may contain multiple GPUs for scientific computation.
TensorFlow can leverage this environment to run the training operation
@@ -390,7 +390,7 @@ The GPUs are synchronized in operation. All gradients are accumulated from
the GPUs and averaged (see green box). The model parameters are updated with
the gradients averaged across all model replicas.
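To make that averaging step concrete, here is a hedged sketch of how per-tower gradients might be combined; the multi-GPU script has its own helper for this, and the names below are illustrative:

```python
import tensorflow as tf

def average_gradients(tower_grads):
    # tower_grads: one list of (gradient, variable) pairs per GPU tower.
    averaged = []
    for grad_and_vars in zip(*tower_grads):
        # Stack the per-tower gradients for this variable and average them.
        grads = [tf.expand_dims(g, 0) for g, _ in grad_and_vars]
        grad = tf.reduce_mean(tf.concat(0, grads), 0)
        # All towers share the variable, so take it from the first tower.
        averaged.append((grad, grad_and_vars[0][1]))
    return averaged
```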
-### Placing Variables and Operations on Devices
+### Placing Variables and Operations on Devices <a class="md-anchor" id="AUTOGENERATED-placing-variables-and-operations-on-devices"></a>
Placing operations and variables on devices requires some special
abstractions.
@@ -414,7 +414,7 @@ All variables are pinned to the CPU and accessed via
in order to share them in a multi-GPU version.
See how-to on [Sharing Variables](../../how_tos/variable_scope/index.md).
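A minimal sketch of such an abstraction, assuming a helper name of our own choosing:

```python
import tensorflow as tf

def variable_on_cpu(name, shape, initializer):
    # Pin the variable to the CPU so that every GPU tower shares a single copy,
    # fetched via tf.get_variable() under the current variable scope.
    with tf.device('/cpu:0'):
        return tf.get_variable(name, shape, initializer=initializer)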
-### Launching and Training the Model on Multiple GPU cards
+### Launching and Training the Model on Multiple GPU cards <a class="md-anchor" id="AUTOGENERATED-launching-and-training-the-model-on-multiple-gpu-cards"></a>
If you have several GPU cards installed on your machine you can use them to
train the model faster with the `cifar10_multi_gpu_train.py` script. It is a
@@ -446,7 +446,7 @@ you ask for more.
run on a batch size of 128. Try running `cifar10_multi_gpu_train.py` on 2 GPUs
with a batch size of 64 and compare the training speed.
-## Next Steps
+## Next Steps <a class="md-anchor" id="AUTOGENERATED-next-steps"></a>
[Congratulations!](https://www.youtube.com/watch?v=9bZkp7q19f0). You have
completed the CIFAR-10 tutorial.
diff --git a/tensorflow/g3doc/tutorials/index.md b/tensorflow/g3doc/tutorials/index.md
index 202b87c73c..4ee9ad0497 100644
--- a/tensorflow/g3doc/tutorials/index.md
+++ b/tensorflow/g3doc/tutorials/index.md
@@ -1,7 +1,7 @@
-# Overview
+# Overview <a class="md-anchor" id="AUTOGENERATED-overview"></a>
-## MNIST For ML Beginners
+## MNIST For ML Beginners <a class="md-anchor" id="AUTOGENERATED-mnist-for-ml-beginners"></a>
If you're new to machine learning, we recommend starting here. You'll learn
about a classic problem, handwritten digit classification (MNIST), and get a
@@ -10,7 +10,7 @@ gentle introduction to multiclass classification.
[View Tutorial](mnist/beginners/index.md)
-## Deep MNIST for Experts
+## Deep MNIST for Experts <a class="md-anchor" id="AUTOGENERATED-deep-mnist-for-experts"></a>
If you're already familiar with other deep learning software packages, and are
already familiar with MNIST, this tutorial will give you a very brief primer on
@@ -19,7 +19,7 @@ TensorFlow.
[View Tutorial](mnist/pros/index.md)
-## TensorFlow Mechanics 101
+## TensorFlow Mechanics 101 <a class="md-anchor" id="AUTOGENERATED-tensorflow-mechanics-101"></a>
This is a technical tutorial, where we walk you through the details of using
TensorFlow infrastructure to train models at scale. We again use MNIST as the
@@ -28,7 +28,7 @@ example.
[View Tutorial](mnist/tf/index.md)
-## Convolutional Neural Networks
+## Convolutional Neural Networks <a class="md-anchor" id="AUTOGENERATED-convolutional-neural-networks"></a>
An introduction to convolutional neural networks using the CIFAR-10 data set.
Convolutional neural nets are particularly tailored to images, since they
@@ -38,7 +38,7 @@ representations of visual content.
[View Tutorial](deep_cnn/index.md)
-## Vector Representations of Words
+## Vector Representations of Words <a class="md-anchor" id="AUTOGENERATED-vector-representations-of-words"></a>
This tutorial motivates why it is useful to learn to represent words as vectors
(called *word embeddings*). It introduces the word2vec model as an efficient
@@ -49,7 +49,7 @@ embeddings).
[View Tutorial](word2vec/index.md)
-## Recurrent Neural Networks
+## Recurrent Neural Networks <a class="md-anchor" id="AUTOGENERATED-recurrent-neural-networks"></a>
An introduction to RNNs, wherein we train an LSTM network to predict the next
word in an English sentence. (A task sometimes called language modeling.)
@@ -57,7 +57,7 @@ word in an English sentence. (A task sometimes called language modeling.)
[View Tutorial](recurrent/index.md)
-## Sequence-to-Sequence Models
+## Sequence-to-Sequence Models <a class="md-anchor" id="AUTOGENERATED-sequence-to-sequence-models"></a>
A follow-on to the RNN tutorial, where we assemble a sequence-to-sequence model
for machine translation. You will learn to build your own English-to-French
@@ -66,7 +66,7 @@ translator, entirely machine learned, end-to-end.
[View Tutorial](seq2seq/index.md)
-## Mandelbrot Set
+## Mandelbrot Set <a class="md-anchor" id="AUTOGENERATED-mandelbrot-set"></a>
TensorFlow can be used for computation that has nothing to do with machine
learning. Here's a naive implementation of Mandelbrot set visualization.
@@ -74,7 +74,7 @@ learning. Here's a naive implementation of Mandelbrot set visualization.
[View Tutorial](mandelbrot/index.md)
-## Partial Differential Equations
+## Partial Differential Equations <a class="md-anchor" id="AUTOGENERATED-partial-differential-equations"></a>
As another example of non-machine learning computation, we offer an example of
a naive PDE simulation of raindrops landing on a pond.
@@ -82,7 +82,7 @@ a naive PDE simulation of raindrops landing on a pond.
[View Tutorial](pdes/index.md)
-## MNIST Data Download
+## MNIST Data Download <a class="md-anchor" id="AUTOGENERATED-mnist-data-download"></a>
Details about downloading the MNIST handwritten digits data set. Exciting
stuff.
@@ -90,7 +90,7 @@ stuff.
[View Tutorial](mnist/download/index.md)
-## Visual Object Recognition
+## Visual Object Recognition <a class="md-anchor" id="AUTOGENERATED-visual-object-recognition"></a>
We will be releasing our state-of-the-art Inception object recognition model,
complete and already trained.
@@ -98,7 +98,7 @@ complete and already trained.
COMING SOON
-## Deep Dream Visual Hallucinations
+## Deep Dream Visual Hallucinations <a class="md-anchor" id="AUTOGENERATED-deep-dream-visual-hallucinations"></a>
Building on the Inception recognition model, we will release a TensorFlow
version of the [Deep Dream](https://github.com/google/deepdream) neural network
diff --git a/tensorflow/g3doc/tutorials/mandelbrot/index.md b/tensorflow/g3doc/tutorials/mandelbrot/index.md
index b3d5a185f9..fa06e6b882 100755
--- a/tensorflow/g3doc/tutorials/mandelbrot/index.md
+++ b/tensorflow/g3doc/tutorials/mandelbrot/index.md
@@ -1,4 +1,4 @@
-# Mandelbrot Set
+# Mandelbrot Set <a class="md-anchor" id="AUTOGENERATED-mandelbrot-set"></a>
```
#Import libraries for simulation
diff --git a/tensorflow/g3doc/tutorials/mnist/beginners/index.md b/tensorflow/g3doc/tutorials/mnist/beginners/index.md
index fff7484959..eddd4f324a 100644
--- a/tensorflow/g3doc/tutorials/mnist/beginners/index.md
+++ b/tensorflow/g3doc/tutorials/mnist/beginners/index.md
@@ -1,4 +1,4 @@
-# MNIST For ML Beginners
+# MNIST For ML Beginners <a class="md-anchor" id="AUTOGENERATED-mnist-for-ml-beginners"></a>
*This tutorial is intended for readers who are new to both machine learning and
TensorFlow. If you already
@@ -31,7 +31,7 @@ important to understand the ideas behind it: both how TensorFlow works and the
core machine learning concepts. Because of this, we are going to very carefully
work through the code.
-## The MNIST Data
+## The MNIST Data <a class="md-anchor" id="AUTOGENERATED-the-mnist-data"></a>
The MNIST data is hosted on
[Yann LeCun's website](http://yann.lecun.com/exdb/mnist/).
@@ -88,9 +88,9 @@ The corresponding labels in MNIST are numbers between 0 and 9, describing
which digit a given image is of.
For the purposes of this tutorial, we're going to want our labels
as "one-hot vectors". A one-hot vector is a vector which is 0 in most
-dimensions, and 1 in a single dimension. In this case, the $$n$$th digit will be
-represented as a vector which is 1 in the $$n$$th dimensions. For example, 0
-would be $$[1,0,0,0,0,0,0,0,0,0,0]$$.
+dimensions, and 1 in a single dimension. In this case, the \(n\)th digit will be
+represented as a vector which is 1 in the \(n\)th dimension. For example, 0
+would be \([1,0,0,0,0,0,0,0,0,0]\).
Consequently, `mnist.train.labels` is a
`[60000, 10]` array of floats.
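As a tiny illustration of the encoding itself (plain NumPy, not part of the tutorial code):

```python
import numpy as np

def one_hot(digit, num_classes=10):
    # A vector of zeros with a single 1.0 at the position of the digit.
    vec = np.zeros(num_classes)
    vec[digit] = 1.0
    return vec

one_hot(3)  # -> array([ 0., 0., 0., 1., 0., 0., 0., 0., 0., 0.])
```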
@@ -100,7 +100,7 @@ Consequently, `mnist.train.labels` is a
We're now ready to actually make our model!
-## Softmax Regressions
+## Softmax Regressions <a class="md-anchor" id="AUTOGENERATED-softmax-regressions"></a>
We know that every image in MNIST is a digit, whether it's a zero or a nine. We
want to be able to look at an image and give probabilities for it being each
@@ -131,14 +131,14 @@ weights.
We also add some extra evidence called a bias. Basically, we want to be able
to say that some things are more likely independent of the input. The result is
-that the evidence for a class $$i$$ given an input $$x$$ is:
+that the evidence for a class \(i\) given an input \(x\) is:
$$\text{evidence}_i = \sum_j W_{i,~ j} x_j + b_i$$
-where $$W_i$$ is the weights and $$b_i$$ is the bias for class $$i$$, and $$j$$
-is an index for summing over the pixels in our input image $$x$$. We then
+where \(W_i\) is the weights and \(b_i\) is the bias for class \(i\), and \(j\)
+is an index for summing over the pixels in our input image \(x\). We then
convert the evidence tallies into our predicted probabilities
-$$y$$ using the "softmax" function:
+\(y\) using the "softmax" function:
$$y = \text{softmax}(\text{evidence})$$
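For intuition, the two steps look like this in plain NumPy; this is just a sketch, the TensorFlow version appears later in the tutorial:

```python
import numpy as np

def softmax(evidence):
    # Subtract the max for numerical stability; the result is unchanged.
    exps = np.exp(evidence - np.max(evidence))
    return exps / np.sum(exps)

softmax(np.array([2.0, 1.0, 0.1]))  # -> roughly [0.66, 0.24, 0.10]
```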
@@ -168,8 +168,8 @@ on it in Michael Nielsen's book, complete with an interactive visualization.)
You can picture our softmax regression as looking something like the following,
-although with a lot more $$x$$s. For each output, we compute a weighted sum of
-the $$x$$s, add a bias, and then apply softmax.
+although with a lot more \(x\)s. For each output, we compute a weighted sum of
+the \(x\)s, add a bias, and then apply softmax.
<div style="width:55%; margin:auto; margin-bottom:10px; margin-top:20px;">
<img style="width:100%" src="img/softmax-regression-scalargraph.png">
@@ -194,7 +194,7 @@ More compactly, we can just write:
$$y = \text{softmax}(Wx + b)$$
-## Implementing the Regression
+## Implementing the Regression <a class="md-anchor" id="AUTOGENERATED-implementing-the-regression"></a>
To do efficient numerical computing in Python, we typically use libraries like
@@ -261,7 +261,7 @@ y = tf.nn.softmax(tf.matmul(x,W) + b)
```
First, we multiply `x` by `W` with the expression `tf.matmul(x,W)`. This is
-flipped from when we multiplied them in our equation, where we had $$Wx$$, as a
+flipped from when we multiplied them in our equation, where we had \(Wx\), as a
small trick
to deal with `x` being a 2D tensor with multiple inputs. We then add `b`, and
finally apply `tf.nn.softmax`.
@@ -274,7 +274,7 @@ simulations. And once defined, our model can be run on different devices:
your computer's CPU, GPUs, and even phones!
-## Training
+## Training <a class="md-anchor" id="AUTOGENERATED-training"></a>
In order to train our model, we need to define what it means for the model to
be good. Well, actually, in machine learning we typically define what it means
@@ -288,7 +288,7 @@ from gambling to machine learning. It's defined:
$$H_{y'}(y) = -\sum_i y'_i \log(y_i)$$
-Where $$y$$ is our predicted probability distribution, and $$y'$$ is the true
+Where \(y\) is our predicted probability distribution, and \(y'\) is the true
distribution (the one-hot vector we'll input). In some rough sense, the
cross-entropy is measuring how inefficient our predictions are for describing
the truth. Going into more detail about cross-entropy is beyond the scope of
@@ -302,7 +302,7 @@ the correct answers:
y_ = tf.placeholder("float", [None,10])
```
-Then we can implement the cross-entropy, $$-\sum y'\log(y)$$:
+Then we can implement the cross-entropy, \(-\sum y'\log(y)\):
```python
cross_entropy = -tf.reduce_sum(y_*tf.log(y))
@@ -378,7 +378,7 @@ every time. Doing this is cheap and has much of the same benefit.
-## Evaluating Our Model
+## Evaluating Our Model <a class="md-anchor" id="AUTOGENERATED-evaluating-our-model"></a>
How well does our model do?
diff --git a/tensorflow/g3doc/tutorials/mnist/download/index.md b/tensorflow/g3doc/tutorials/mnist/download/index.md
index df6245df78..e985a2204d 100644
--- a/tensorflow/g3doc/tutorials/mnist/download/index.md
+++ b/tensorflow/g3doc/tutorials/mnist/download/index.md
@@ -1,11 +1,11 @@
-# MNIST Data Download
+# MNIST Data Download <a class="md-anchor" id="AUTOGENERATED-mnist-data-download"></a>
Code: [tensorflow/g3doc/tutorials/mnist/](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/g3doc/tutorials/mnist/)
The goal of this tutorial is to show how to download the dataset files required
for handwritten digit classification using the (classic) MNIST data set.
-## Tutorial Files
+## Tutorial Files <a class="md-anchor" id="AUTOGENERATED-tutorial-files"></a>
This tutorial references the following files:
@@ -13,7 +13,7 @@ File | Purpose
--- | ---
[`input_data.py`](../input_data.py) | The code to download the MNIST dataset for training and evaluation.
-## Prepare the Data
+## Prepare the Data <a class="md-anchor" id="AUTOGENERATED-prepare-the-data"></a>
MNIST is a classic problem in machine learning. The problem is to look at
greyscale 28x28 pixel images of handwritten digits and determine which digit
@@ -24,7 +24,7 @@ the image represents, for all the digits from zero to nine.
For more information, refer to [Yann LeCun's MNIST page](http://yann.lecun.com/exdb/mnist/)
or [Chris Olah's visualizations of MNIST](http://colah.github.io/posts/2014-10-Visualizing-MNIST/).
-### Download
+### Download <a class="md-anchor" id="AUTOGENERATED-download"></a>
[Yann LeCun's MNIST page](http://yann.lecun.com/exdb/mnist/)
also hosts the training and test data for download.
@@ -42,7 +42,7 @@ files are downloaded into a local data folder for training.
The folder name is specified in a flag variable at the top of the
`fully_connected_feed.py` file and may be changed to fit your needs.
-### Unpack and Reshape
+### Unpack and Reshape <a class="md-anchor" id="AUTOGENERATED-unpack-and-reshape"></a>
The files themselves are not in any standard image format and are manually
unpacked (following the instructions available at the website) by the
@@ -64,7 +64,7 @@ The label data is extracted into a 1d tensor of: `[image index]`
with the class identifier for each example as the value. For the training set
labels, this would then be of shape `[55000]`.
-### DataSet Object
+### DataSet Object <a class="md-anchor" id="AUTOGENERATED-dataset-object"></a>
The underlying code will download, unpack, and reshape images and labels for
the following datasets:
diff --git a/tensorflow/g3doc/tutorials/mnist/pros/index.md b/tensorflow/g3doc/tutorials/mnist/pros/index.md
index 34853ccf66..15892a957d 100644
--- a/tensorflow/g3doc/tutorials/mnist/pros/index.md
+++ b/tensorflow/g3doc/tutorials/mnist/pros/index.md
@@ -1,4 +1,4 @@
-# Deep MNIST for Experts
+# Deep MNIST for Experts <a class="md-anchor" id="AUTOGENERATED-deep-mnist-for-experts"></a>
TensorFlow is a powerful library for doing large-scale numerical computation.
One of the tasks at which it excels is implementing and training deep neural
@@ -11,12 +11,12 @@ dataset. If you don't have
a background with them, check out the
[introduction for beginners](../beginners/index.md).*
-## Setup
+## Setup <a class="md-anchor" id="AUTOGENERATED-setup"></a>
Before we create our model, we will first load the MNIST dataset, and start a
TensorFlow session.
-### Load MNIST Data
+### Load MNIST Data <a class="md-anchor" id="AUTOGENERATED-load-mnist-data"></a>
For your convenience, we've included [a script](../input_data.py) which
automatically downloads and imports the MNIST dataset. It will create a
@@ -32,7 +32,7 @@ testing sets as NumPy arrays.
It also provides a function for iterating through data minibatches, which we
will use below.
-### Start TensorFlow InteractiveSession
+### Start TensorFlow InteractiveSession <a class="md-anchor" id="AUTOGENERATED-start-tensorflow-interactivesession"></a>
TensorFlow relies on a highly efficient C++ backend to do its computation. The
connection to this backend is called a session. The common usage for TensorFlow
@@ -55,7 +55,7 @@ import tensorflow as tf
sess = tf.InteractiveSession()
```
-#### Computation Graph
+#### Computation Graph <a class="md-anchor" id="AUTOGENERATED-computation-graph"></a>
To do efficient numerical computing in Python, we typically use libraries like
NumPy that do expensive operations such as matrix multiplication outside Python,
@@ -80,13 +80,13 @@ section of
[Basic Usage](../../../get_started/basic_usage.md)
for more detail.
-## Build a Softmax Regression Model
+## Build a Softmax Regression Model <a class="md-anchor" id="AUTOGENERATED-build-a-softmax-regression-model"></a>
In this section we will build a softmax regression model with a single linear
layer. In the next section, we will extend this to the case of softmax
regression with a multilayer convolutional network.
-### Placeholders
+### Placeholders <a class="md-anchor" id="AUTOGENERATED-placeholders"></a>
We start building the computation graph by creating nodes for the
input images and target output classes.
@@ -110,7 +110,7 @@ which digit class the corresponding MNIST image belongs to.
The `shape` argument to `placeholder` is optional, but it allows TensorFlow
to automatically catch bugs stemming from inconsistent tensor shapes.
-### Variables
+### Variables <a class="md-anchor" id="AUTOGENERATED-variables"></a>
We now define the weights `W` and biases `b` for our model. We could imagine treating
these like additional inputs, but TensorFlow has an even better way to handle
@@ -139,7 +139,7 @@ done for all `Variables` at once.
sess.run(tf.initialize_all_variables())
```
-### Predicted Class and Cost Function
+### Predicted Class and Cost Function <a class="md-anchor" id="AUTOGENERATED-predicted-class-and-cost-function"></a>
We can now implement our regression model. It only takes one line!
We multiply the vectorized input images `x` by the weight matrix `W`, add
@@ -161,7 +161,7 @@ cross_entropy = -tf.reduce_sum(y_*tf.log(y))
Note that `tf.reduce_sum` sums across all images in the minibatch, as well as
all classes. We are computing the cross entropy for the entire minibatch.
-## Train the Model
+## Train the Model <a class="md-anchor" id="AUTOGENERATED-train-the-model"></a>
Now that we have defined our model and training cost function, it is
straightforward to train using TensorFlow.
@@ -198,7 +198,7 @@ Each training iteration we load 50 training examples. We then run the
Note that you can replace any tensor in your computation graph using `feed_dict`
-- it's not restricted to just `placeholder`s.
-### Evaluate the Model
+### Evaluate the Model <a class="md-anchor" id="AUTOGENERATED-evaluate-the-model"></a>
How well did our model do?
@@ -228,14 +228,14 @@ Finally, we can evaluate our accuracy on the test data. This should be about
print accuracy.eval(feed_dict={x: mnist.test.images, y_: mnist.test.labels})
```
-## Build a Multilayer Convolutional Network
+## Build a Multilayer Convolutional Network <a class="md-anchor" id="AUTOGENERATED-build-a-multilayer-convolutional-network"></a>
Getting 91% accuracy on MNIST is bad. It's almost embarrassingly bad. In this
section, we'll fix that, jumping from a very simple model to something moderately
sophisticated: a small convolutional neural network. This will get us to around
99.2% accuracy -- not state of the art, but respectable.
-### Weight Initialization
+### Weight Initialization <a class="md-anchor" id="AUTOGENERATED-weight-initialization"></a>
To create this model, we're going to need to create a lot of weights and biases.
One should generally initialize weights with a small amount of noise for
@@ -254,7 +254,7 @@ def bias_variable(shape):
return tf.Variable(initial)
```
-### Convolution and Pooling
+### Convolution and Pooling <a class="md-anchor" id="AUTOGENERATED-convolution-and-pooling"></a>
TensorFlow also gives us a lot of flexibility in convolution and pooling
operations. How do we handle the boundaries? What is our stride size?
@@ -273,7 +273,7 @@ def max_pool_2x2(x):
strides=[1, 2, 2, 1], padding='SAME')
```
-### First Convolutional Layer
+### First Convolutional Layer <a class="md-anchor" id="AUTOGENERATED-first-convolutional-layer"></a>
We can now implement our first layer. It will consist of convolution, followed
by max pooling. The convolution will compute 32 features for each 5x5 patch.
@@ -303,7 +303,7 @@ h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)
```
-### Second Convolutional Layer
+### Second Convolutional Layer <a class="md-anchor" id="AUTOGENERATED-second-convolutional-layer"></a>
In order to build a deep network, we stack several layers of this type. The
second layer will have 64 features for each 5x5 patch.
@@ -316,7 +316,7 @@ h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)
```
-### Densely Connected Layer
+### Densely Connected Layer <a class="md-anchor" id="AUTOGENERATED-densely-connected-layer"></a>
Now that the image size has been reduced to 7x7, we add a fully-connected layer
with 1024 neurons to allow processing on the entire image. We reshape the tensor
@@ -331,7 +331,7 @@ h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)
```
-#### Dropout
+#### Dropout <a class="md-anchor" id="AUTOGENERATED-dropout"></a>
To reduce overfitting, we will apply dropout before the readout layer.
We create a `placeholder` for the probability that a neuron's output is kept
@@ -345,7 +345,7 @@ keep_prob = tf.placeholder("float")
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
```
-### Readout Layer
+### Readout Layer <a class="md-anchor" id="AUTOGENERATED-readout-layer"></a>
Finally, we add a softmax layer, just like for the one layer softmax regression
above.
@@ -357,7 +357,7 @@ b_fc2 = bias_variable([10])
y_conv=tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)
```
-### Train and Evaluate the Model
+### Train and Evaluate the Model <a class="md-anchor" id="AUTOGENERATED-train-and-evaluate-the-model"></a>
How well does this model do?
To train and evaluate it we will use code that is nearly identical to that for
diff --git a/tensorflow/g3doc/tutorials/mnist/tf/index.md b/tensorflow/g3doc/tutorials/mnist/tf/index.md
index 5ce996af12..c1fc07e373 100644
--- a/tensorflow/g3doc/tutorials/mnist/tf/index.md
+++ b/tensorflow/g3doc/tutorials/mnist/tf/index.md
@@ -1,4 +1,4 @@
-# TensorFlow Mechanics 101
+# TensorFlow Mechanics 101 <a class="md-anchor" id="AUTOGENERATED-tensorflow-mechanics-101"></a>
Code: [tensorflow/g3doc/tutorials/mnist/](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/g3doc/tutorials/mnist/)
@@ -12,7 +12,7 @@ These tutorials are not intended for teaching Machine Learning in general.
Please ensure you have followed the instructions to [`Install TensorFlow`](../../../get_started/os_setup.md).
-## Tutorial Files
+## Tutorial Files <a class="md-anchor" id="AUTOGENERATED-tutorial-files"></a>
This tutorial references the following files:
@@ -25,7 +25,7 @@ Simply run the `fully_connected_feed.py` file directly to start training:
`python fully_connected_feed.py`
-## Prepare the Data
+## Prepare the Data <a class="md-anchor" id="AUTOGENERATED-prepare-the-data"></a>
MNIST is a classic problem in machine learning. The problem is to look at
greyscale 28x28 pixel images of handwritten digits and determine which digit
@@ -36,7 +36,7 @@ the image represents, for all the digits from zero to nine.
For more information, refer to [Yann LeCun's MNIST page](http://yann.lecun.com/exdb/mnist/)
or [Chris Olah's visualizations of MNIST](http://colah.github.io/posts/2014-10-Visualizing-MNIST/).
-### Download
+### Download <a class="md-anchor" id="AUTOGENERATED-download"></a>
At the top of the `run_training()` method, the `input_data.read_data_sets()`
function will ensure that the correct data has been downloaded to your local
@@ -59,7 +59,7 @@ Dataset | Purpose
For more information about the data, please read the [`Download`](../download/index.md)
tutorial.
-### Inputs and Placeholders
+### Inputs and Placeholders <a class="md-anchor" id="AUTOGENERATED-inputs-and-placeholders"></a>
The `placeholder_inputs()` function creates two [`tf.placeholder`](../../../api_docs/python/io_ops.md#placeholder)
ops that define the shape of the inputs, including the `batch_size`, to the
@@ -76,7 +76,7 @@ sliced to fit the `batch_size` for each step, matched with these placeholder
ops, and then passed into the `sess.run()` function using the `feed_dict`
parameter.
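A hedged sketch of such a helper, assuming flattened 28x28 images; the exact shapes and dtypes in `mnist.py` may differ:

```python
import tensorflow as tf

IMAGE_PIXELS = 28 * 28

def placeholder_inputs(batch_size):
    # Including batch_size in the shapes lets TensorFlow catch mismatched feeds.
    images_placeholder = tf.placeholder(tf.float32,
                                        shape=(batch_size, IMAGE_PIXELS))
    labels_placeholder = tf.placeholder(tf.int32, shape=(batch_size,))
    return images_placeholder, labels_placeholder
```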
-## Build the Graph
+## Build the Graph <a class="md-anchor" id="AUTOGENERATED-build-the-graph"></a>
After creating placeholders for the data, the graph is built from the
`mnist.py` file according to a 3-stage pattern: `inference()`, `loss()`, and
@@ -93,7 +93,7 @@ and apply gradients.
<img style="width:100%" src="./mnist_subgraph.png">
</div>
-### Inference
+### Inference <a class="md-anchor" id="AUTOGENERATED-inference"></a>
The `inference()` function builds the graph as far as needed to
return the tensor that would contain the output predictions.
@@ -162,7 +162,7 @@ logits = tf.matmul(hidden2, weights) + biases
Finally, the `logits` tensor that will contain the output is returned.
-### Loss
+### Loss <a class="md-anchor" id="AUTOGENERATED-loss"></a>
The `loss()` function further builds the graph by adding the required loss
ops.
@@ -205,7 +205,7 @@ And the tensor that will then contain the loss value is returned.
> given what is actually true. For more information, read the blog post Visual
> Information Theory (http://colah.github.io/posts/2015-09-Visual-Information/)
-### Training
+### Training <a class="md-anchor" id="AUTOGENERATED-training"></a>
The `training()` function adds the operations needed to minimize the loss via
gradient descent.
@@ -241,12 +241,12 @@ train_op = optimizer.minimize(loss, global_step=global_step)
The tensor containing the outputs of the training op is returned.
-## Train the Model
+## Train the Model <a class="md-anchor" id="AUTOGENERATED-train-the-model"></a>
Once the graph is built, it can be iteratively trained and evaluated in a loop
controlled by the user code in `fully_connected_feed.py`.
-### The Graph
+### The Graph <a class="md-anchor" id="AUTOGENERATED-the-graph"></a>
At the top of the `run_training()` function is a Python `with` command that
indicates all of the built ops are to be associated with the default
@@ -263,7 +263,7 @@ Most TensorFlow uses will only need to rely on the single default graph.
More complicated uses with multiple graphs are possible, but beyond the scope of
this simple tutorial.
-### The Session
+### The Session <a class="md-anchor" id="AUTOGENERATED-the-session"></a>
Once all of the build preparation has been completed and all of the necessary
ops generated, a [`tf.Session`](../../../api_docs/python/client.md#Session)
@@ -297,7 +297,7 @@ op is a [`tf.group`](../../../api_docs/python/control_flow_ops.md#group)
that contains only the initializers for the variables. None of the rest of the
graph is run here, that happens in the training loop below.
-### Train Loop
+### Train Loop <a class="md-anchor" id="AUTOGENERATED-train-loop"></a>
After initializing the variables with the session, training may begin.
@@ -312,7 +312,7 @@ for step in xrange(max_steps):
However, this tutorial is slightly more complicated in that it must also slice
up the input data for each step to match the previously generated placeholders.
-#### Feed the Graph
+#### Feed the Graph <a class="md-anchor" id="AUTOGENERATED-feed-the-graph"></a>
For each step, the code will generate a feed dictionary that will contain the
set of examples on which to train for the step, keyed by the placeholder
@@ -339,7 +339,7 @@ feed_dict = {
This is passed into the `sess.run()` function's `feed_dict` parameter to provide
the input examples for this step of training.
-#### Check the Status
+#### Check the Status <a class="md-anchor" id="AUTOGENERATED-check-the-status"></a>
The code specifies two op-tensors in its run call: `[train_op, loss]`:
@@ -369,7 +369,7 @@ if step % 100 == 0:
print 'Step %d: loss = %.2f (%.3f sec)' % (step, loss_value, duration)
```
-#### Visualize the Status
+#### Visualize the Status <a class="md-anchor" id="AUTOGENERATED-visualize-the-status"></a>
In order to emit the events files used by [TensorBoard](../../../how_tos/summaries_and_tensorboard/index.md),
all of the summaries (in this case, only one) are collected into a single op
@@ -404,7 +404,7 @@ folder to display the values from the summaries.
**NOTE**: For more info about how to build and run TensorBoard, please see the accompanying tutorial [TensorBoard: Visualizing Your Training](../../../how_tos/summaries_and_tensorboard/index.md).
-#### Save a Checkpoint
+#### Save a Checkpoint <a class="md-anchor" id="AUTOGENERATED-save-a-checkpoint"></a>
In order to emit a checkpoint file that may be used to later restore a model
for further training or evaluation, we instantiate a
@@ -430,7 +430,7 @@ method to reload the model parameters.
saver.restore(sess, FLAGS.train_dir)
```
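For completeness, the saving side inside the training loop might be sketched as follows; the step interval below is illustrative:

```python
# Assumes `saver = tf.train.Saver()` was created after the graph was built,
# and that `step`, `sess` and FLAGS.train_dir are the ones used above.
if step % 1000 == 0:
    saver.save(sess, FLAGS.train_dir, global_step=step)
```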
-## Evaluate the Model
+## Evaluate the Model <a class="md-anchor" id="AUTOGENERATED-evaluate-the-model"></a>
Every thousand steps, the code will attempt to evaluate the model against both
the training and test datasets. The `do_eval()` function is called thrice, for
@@ -462,7 +462,7 @@ do_eval(sess,
> the sake of a simple little MNIST problem, however, we evaluate against all of
> the data.
-### Build the Eval Graph
+### Build the Eval Graph <a class="md-anchor" id="AUTOGENERATED-build-the-eval-graph"></a>
Before opening the default Graph, the test data should have been fetched by
calling the `get_data(train=False)` function with the parameter set to grab
@@ -489,7 +489,7 @@ of K to 1 to only consider a prediction correct if it is for the true label.
eval_correct = tf.nn.in_top_k(logits, labels, 1)
```
-### Eval Output
+### Eval Output <a class="md-anchor" id="AUTOGENERATED-eval-output"></a>
One can then create a loop for filling a `feed_dict` and calling `sess.run()`
against the `eval_correct` op to evaluate the model on the given dataset.
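Such a loop might be sketched like this, where `steps_per_epoch` and `fill_feed_dict` stand in for the helpers defined in `fully_connected_feed.py`:

```python
true_count = 0
num_examples = steps_per_epoch * FLAGS.batch_size
for step in xrange(steps_per_epoch):
    feed_dict = fill_feed_dict(data_set, images_placeholder, labels_placeholder)
    # eval_correct returns the number of correct predictions in the batch.
    true_count += sess.run(eval_correct, feed_dict=feed_dict)
precision = true_count / float(num_examples)
print '  Num examples: %d  Num correct: %d  Precision @ 1: %0.04f' % (
    num_examples, true_count, precision)
```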
diff --git a/tensorflow/g3doc/tutorials/pdes/index.md b/tensorflow/g3doc/tutorials/pdes/index.md
index 26f36d5536..a7c84ebd63 100755
--- a/tensorflow/g3doc/tutorials/pdes/index.md
+++ b/tensorflow/g3doc/tutorials/pdes/index.md
@@ -1,6 +1,6 @@
-# Partial Differential Equations
+# Partial Differential Equations <a class="md-anchor" id="AUTOGENERATED-partial-differential-equations"></a>
-## Basic Setup
+## Basic Setup <a class="md-anchor" id="AUTOGENERATED-basic-setup"></a>
```
@@ -30,7 +30,7 @@ def DisplayArray(a, fmt='jpeg', rng=[0,1]):
sess = tf.InteractiveSession()
```
-## Computational Convenience Functions
+## Computational Convenience Functions <a class="md-anchor" id="AUTOGENERATED-computational-convenience-functions"></a>
```
@@ -54,7 +54,7 @@ def laplace(x):
return simple_conv(x, laplace_k)
```
-## Define the PDE
+## Define the PDE <a class="md-anchor" id="AUTOGENERATED-define-the-pde"></a>
```
@@ -103,7 +103,7 @@ step = tf.group(
Ut.Assign(Ut_) )
```
-## Run The Simulation
+## Run The Simulation <a class="md-anchor" id="AUTOGENERATED-run-the-simulation"></a>
```
diff --git a/tensorflow/g3doc/tutorials/recurrent/index.md b/tensorflow/g3doc/tutorials/recurrent/index.md
index 29d058cd5d..c2ae1afb70 100644
--- a/tensorflow/g3doc/tutorials/recurrent/index.md
+++ b/tensorflow/g3doc/tutorials/recurrent/index.md
@@ -1,12 +1,12 @@
-# Recurrent Neural Networks
+# Recurrent Neural Networks <a class="md-anchor" id="AUTOGENERATED-recurrent-neural-networks"></a>
-## Introduction
+## Introduction <a class="md-anchor" id="AUTOGENERATED-introduction"></a>
Take a look at
[this great article](http://colah.github.io/posts/2015-08-Understanding-LSTMs/)
for an introduction to recurrent neural networks and LSTMs in particular.
-## Language Modeling
+## Language Modeling <a class="md-anchor" id="AUTOGENERATED-language-modeling"></a>
In this tutorial we will show how to train a recurrent neural network on
a challenging task of language modeling. The goal of the problem is to fit a
@@ -24,7 +24,7 @@ For the purpose of this tutorial, we will reproduce the results from
[Zaremba et al., 2014](http://arxiv.org/abs/1409.2329), which achieves very
good results on the PTB dataset.
-## Tutorial Files
+## Tutorial Files <a class="md-anchor" id="AUTOGENERATED-tutorial-files"></a>
This tutorial references the following files from `models/rnn/ptb`:
@@ -33,7 +33,7 @@ File | Purpose
`ptb_word_lm.py` | The code to train a language model on the PTB dataset.
`reader.py` | The code to read the dataset.
-## Download and Prepare the Data
+## Download and Prepare the Data <a class="md-anchor" id="AUTOGENERATED-download-and-prepare-the-data"></a>
The data required for this tutorial is in the data/ directory of the
PTB dataset from Tomas Mikolov's webpage:
@@ -44,9 +44,9 @@ including the end-of-sentence marker and a special symbol (\<unk\>) for rare
words. We convert all of them in the `reader.py` to unique integer identifiers
to make it easy for the neural network to process.
-## The Model
+## The Model <a class="md-anchor" id="AUTOGENERATED-the-model"></a>
-### LSTM
+### LSTM <a class="md-anchor" id="AUTOGENERATED-lstm"></a>
The core of the model consists of an LSTM cell that processes one word at a
time and computes probabilities of the possible continuations of the sentence.
@@ -72,7 +72,7 @@ for current_batch_of_words in words_in_dataset:
loss += loss_function(probabilities, target_words)
```
-### Truncated Backpropagation
+### Truncated Backpropagation <a class="md-anchor" id="AUTOGENERATED-truncated-backpropagation"></a>
In order to make the learning process tractable, it is a common practice to
truncate the gradients for backpropagation to a fixed number (`num_steps`)
@@ -114,7 +114,7 @@ for current_batch_of_words in words_in_dataset:
total_loss += current_loss
```
-### Inputs
+### Inputs <a class="md-anchor" id="AUTOGENERATED-inputs"></a>
The word IDs will be embedded into a dense representation (see the
[Vector Representations Tutorial](../word2vec/index.md)) before feeding to
@@ -129,7 +129,7 @@ word_embeddings = tf.nn.embedding_lookup(embedding_matrix, word_ids)
The embedding matrix will be initialized randomly and the model will learn to
differentiate the meaning of words just by looking at the data.
-### Loss Fuction
+### Loss Function <a class="md-anchor" id="AUTOGENERATED-loss-function"></a>
We want to minimize the average negative log probability of the target words:
@@ -145,7 +145,7 @@ $$e^{-\frac{1}{N}\sum_{i=1}^{N} \ln p_{\text{target}_i}} = e^{\text{loss}} $$
and we will monitor its value throughout the training process.
-### Stacking multiple LSTMs
+### Stacking multiple LSTMs <a class="md-anchor" id="AUTOGENERATED-stacking-multiple-lstms"></a>
To give the model more expressive power, we can add multiple layers of LSTMs
to process the data. The output of the first layer will become the input of
@@ -168,7 +168,7 @@ for i in range(len(num_steps)):
final_state = state
```
-## Compile and Run the Code
+## Compile and Run the Code <a class="md-anchor" id="AUTOGENERATED-compile-and-run-the-code"></a>
First, the library needs to be built. To compile it on CPU:
@@ -198,7 +198,7 @@ The larger the model, the better results it should get. The `small` model should
be able to reach perplexity below 120 on the test set and the `large` one below
80, though it might take several hours to train.
-## What Next?
+## What Next? <a class="md-anchor" id="AUTOGENERATED-what-next-"></a>
There are several tricks that we haven't mentioned that make the model better,
including:
diff --git a/tensorflow/g3doc/tutorials/seq2seq/index.md b/tensorflow/g3doc/tutorials/seq2seq/index.md
index b91688691d..ee9808a5dd 100644
--- a/tensorflow/g3doc/tutorials/seq2seq/index.md
+++ b/tensorflow/g3doc/tutorials/seq2seq/index.md
@@ -1,4 +1,4 @@
-# Sequence-to-Sequence Models
+# Sequence-to-Sequence Models <a class="md-anchor" id="AUTOGENERATED-sequence-to-sequence-models"></a>
Recurrent neural networks can learn to model language, as already discussed
in the [RNN Tutorial](../recurrent/index.md)
@@ -32,7 +32,7 @@ File | What's in it?
`translate/translate.py` | Binary that trains and runs the translation model.
-## Sequence-to-Sequence Basics
+## Sequence-to-Sequence Basics <a class="md-anchor" id="AUTOGENERATED-sequence-to-sequence-basics"></a>
A basic sequence-to-sequence model, as introduced in
[Cho et al., 2014](http://arxiv.org/pdf/1406.1078v3.pdf),
@@ -64,7 +64,7 @@ attention mechanism in the decoder looks like this.
<img style="width:100%" src="attention_seq2seq.png" />
</div>
-## TensorFlow seq2seq Library
+## TensorFlow seq2seq Library <a class="md-anchor" id="AUTOGENERATED-tensorflow-seq2seq-library"></a>
As you can see above, there are many different sequence-to-sequence
models. Each of these models can use different RNN cells, but all
@@ -141,14 +141,14 @@ more sequence-to-sequence models in `seq2seq.py`, take a look there. They all
have similar interfaces, so we will not describe them in detail. We will use
`embedding_attention_seq2seq` for our translation model below.
-## Neural Translation Model
+## Neural Translation Model <a class="md-anchor" id="AUTOGENERATED-neural-translation-model"></a>
While the core of the sequence-to-sequence model is constructed by
the functions in `models/rnn/seq2seq.py`, there are still a few tricks
worth mentioning that are used in our translation model in
`models/rnn/translate/seq2seq_model.py`.
-### Sampled softmax and output projection
+### Sampled softmax and output projection <a class="md-anchor" id="AUTOGENERATED-sampled-softmax-and-output-projection"></a>
For one, as already mentioned above, we want to use sampled softmax to
handle a large output vocabulary. To decode from it, we need to keep track
@@ -184,7 +184,7 @@ if output_projection is not None:
output_projection[1] for ...]
```
-### Bucketing and padding
+### Bucketing and padding <a class="md-anchor" id="AUTOGENERATED-bucketing-and-padding"></a>
In addition to sampled softmax, our translation model also makes use
of *bucketing*, which is a method to efficiently handle sentences of
@@ -230,8 +230,7 @@ with encoder inputs representing `[PAD PAD "." "go" "I"]` and decoder
inputs `[GO "Je" "vais" "." EOS PAD PAD PAD PAD PAD]`.
-<a name="run_it"></a>
-## Let's Run It
+## Let's Run It <a class="md-anchor" id="run_it"></a>
To train the model described above, we need a large English-French corpus.
We will use the *10^9-French-English corpus* from the
@@ -305,7 +304,7 @@ Reading model parameters from /tmp/translate.ckpt-340000
Qui est le président des États-Unis ?
```
-## What Next?
+## What Next? <a class="md-anchor" id="AUTOGENERATED-what-next-"></a>
The example above shows how you can build your own English-to-French
translator, end-to-end. Run it and see how the model performs for yourself.
diff --git a/tensorflow/g3doc/tutorials/word2vec/index.md b/tensorflow/g3doc/tutorials/word2vec/index.md
index 290ff3627f..c9b66cab88 100644
--- a/tensorflow/g3doc/tutorials/word2vec/index.md
+++ b/tensorflow/g3doc/tutorials/word2vec/index.md
@@ -1,11 +1,11 @@
-# Vector Representations of Words
+# Vector Representations of Words <a class="md-anchor" id="AUTOGENERATED-vector-representations-of-words"></a>
In this tutorial we look at the word2vec model by
[Mikolov et al.](http://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf).
This model is used for learning vector representations of words, called *word
embeddings*.
-## Highlights
+## Highlights <a class="md-anchor" id="AUTOGENERATED-highlights"></a>
This tutorial is meant to highlight the interesting, substantive parts of
building a word2vec model in TensorFlow.
@@ -32,7 +32,7 @@ But first, let's look at why we would want to learn word embeddings in the first
place. Feel free to skip this section if you're an Embedding Pro and you'd just
like to get your hands dirty with the details.
-## Motivation: Why Learn Word Embeddings?
+## Motivation: Why Learn Word Embeddings? <a class="md-anchor" id="AUTOGENERATED-motivation--why-learn-word-embeddings-"></a>
Image and audio processing systems work with rich, high-dimensional datasets
encoded as vectors of the individual raw pixel-intensities for image data, or
@@ -90,12 +90,12 @@ pair as a new observation, and this tends to do better when we have larger
datasets. We will focus on the skip-gram model in the rest of this tutorial.
-## Scaling up with Noise-Contrastive Training
+## Scaling up with Noise-Contrastive Training <a class="md-anchor" id="AUTOGENERATED-scaling-up-with-noise-contrastive-training"></a>
Neural probabilistic language models are traditionally trained using the
[maximum likelihood](https://en.wikipedia.org/wiki/Maximum_likelihood) (ML)
-principle to maximize the probability of the next word $$w_t$$ (for 'target)
-given the previous words $$h$$ (for 'history') in terms of a
+principle to maximize the probability of the next word \(w_t\) (for 'target')
+given the previous words \(h\) (for 'history') in terms of a
[*softmax* function](https://en.wikipedia.org/wiki/Softmax_function),
$$
@@ -106,8 +106,8 @@ P(w_t | h) &= \text{softmax}(\exp \{ \text{score}(w_t, h) \}) \\
\end{align}
$$
-where $$\text{score}(w_t, h)$$ computes the compatibility of word $$w_t$$ with
-the context $$h$$ (a dot product is commonly used). We train this model by
+where \(\text{score}(w_t, h)\) computes the compatibility of word \(w_t\) with
+the context \(h\) (a dot product is commonly used). We train this model by
maximizing its log-likelihood on the training set, i.e. by maximizing
$$
@@ -120,8 +120,8 @@ $$
This yields a properly normalized probabilistic model for language modeling.
However this is very expensive, because we need to compute and normalize each
-probability using the score for all other $$V$$ words $$w'$$ in the current
-context $$h$$, *at every training step*.
+probability using the score for all other \(V\) words \(w'\) in the current
+context \(h\), *at every training step*.
<div style="width:60%; margin:auto; margin-bottom:10px; margin-top:20px;">
<img style="width:100%" src="img/softmax-nplm.png" alt>
@@ -130,7 +130,7 @@ context $$h$$, *at every training step*.
On the other hand, for feature learning in word2vec we do not need a full
probabilistic model. The CBOW and skip-gram models are instead trained using a
binary classification objective (logistic regression) to discriminate the real
-target words $$w_t$$ from $$k$$ imaginary (noise) words $$\tilde w$$, in the
+target words \(w_t\) from \(k\) imaginary (noise) words \(\tilde w\), in the
same context. We illustrate this below for a CBOW model. For skip-gram the
direction is simply inverted.
@@ -144,10 +144,10 @@ $$J_\text{NEG} = \log Q_\theta(D=1 |w_t, h) +
k \mathop{\mathbb{E}}_{\tilde w \sim P_\text{noise}}
\left[ \log Q_\theta(D = 0 |\tilde w, h) \right]$$,
-where $$Q_\theta(D=1 | w, h)$$ is the binary logistic regression probability
-under the model of seeing the word $$w$$ in the context $$h$$ in the dataset
-$$D$$, calculated in terms of the learned embedding vectors $$\theta$$. In
-practice we approximate the expectation by drawing $$k$$ constrastive words
+where \(Q_\theta(D=1 | w, h)\) is the binary logistic regression probability
+under the model of seeing the word \(w\) in the context \(h\) in the dataset
+\(D\), calculated in terms of the learned embedding vectors \(\theta\). In
+practice we approximate the expectation by drawing \(k\) contrastive words
from the noise distribution (i.e. we compute a
[Monte Carlo average](https://en.wikipedia.org/wiki/Monte_Carlo_integration)).
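As a toy NumPy sketch of this objective for a single (context, target) pair, with the expectation replaced by the \(k\) sampled noise words:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def neg_objective(target_vec, context_vec, noise_vecs):
    # log Q(D=1 | target, context), with Q a logistic function of the dot product.
    positive = np.log(sigmoid(np.dot(target_vec, context_vec)))
    # Sum over the k noise words of log Q(D=0 | noise, context).
    negative = sum(np.log(sigmoid(-np.dot(w, context_vec))) for w in noise_vecs)
    return positive + negative
```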
@@ -159,14 +159,14 @@ and there is good mathematical motivation for using this loss function:
The updates it proposes approximate the updates of the softmax function in the
limit. But computationally it is especially appealing because computing the
loss function now scales only with the number of *noise words* that we
-select ($$k$$), and not *all words* in the vocabulary ($$V$$). This makes it
+select (\(k\)), and not *all words* in the vocabulary (\(V\)). This makes it
much faster to train. We will actually make use of the very similar
[noise-contrastive estimation (NCE)](http://papers.nips.cc/paper/5165-learning-word-embeddings-efficiently-with-noise-contrastive-estimation.pdf)
loss, for which TensorFlow has a handy helper function `tf.nn.nce_loss()`.
Let's get an intuitive feel for how this would work in practice!
-## The Skip-gram Model
+## The Skip-gram Model <a class="md-anchor" id="AUTOGENERATED-the-skip-gram-model"></a>
As an example, let's consider the dataset
@@ -198,21 +198,21 @@ dataset, but we typically optimize this with
where typically `16 <= batch_size <= 512`). So let's look at one step of
this process.
-Let's imagine at training step $$t$$ we observe the first training case above,
+Let's imagine at training step \(t\) we observe the first training case above,
where the goal is to predict `the` from `quick`. We select `num_noise` number
of noisy (contrastive) examples by drawing from some noise distribution,
-typically the unigram distribution, $$P(w)$$. For simplicity let's say
+typically the unigram distribution, \(P(w)\). For simplicity let's say
`num_noise=1` and we select `sheep` as a noisy example. Next we compute the
loss for this pair of observed and noisy examples, i.e. the objective at time
-step $$t$$ becomes
+step \(t\) becomes
$$J^{(t)}_\text{NEG} = \log Q_\theta(D=1 | \text{the, quick}) +
\log(Q_\theta(D=0 | \text{sheep, quick}))$$.
-The goal is to make an update to the embedding parameters $$\theta$$ to improve
+The goal is to make an update to the embedding parameters \(\theta\) to improve
(in this case, maximize) this objective function. We do this by deriving the
-gradient of the loss with respect to the embedding parameters $$\theta$$, i.e.
-$$\frac{\partial}{\partial \theta} J_\text{NEG}$$ (luckily TensorFlow provides
+gradient of the loss with respect to the embedding parameters \(\theta\), i.e.
+\(\frac{\partial}{\partial \theta} J_\text{NEG}\) (luckily TensorFlow provides
easy helper functions for doing this!). We then perform an update to the
embeddings by taking a small step in the direction of the gradient. When this
process is repeated over the entire training set, this has the effect of
@@ -243,7 +243,7 @@ NLP prediction tasks, such as part-of-speech tagging or named entity recognition
But for now, let's just use them to draw pretty pictures!
-## Building the Graph
+## Building the Graph <a class="md-anchor" id="AUTOGENERATED-building-the-graph"></a>
This is all about embeddings, so let's define our embedding matrix.
This is just a big random matrix to start. We'll initialize the values to be
@@ -307,7 +307,7 @@ gradient descent, and TensorFlow has handy helpers to make this easy.
optimizer = tf.train.GradientDescentOptimizer(learning_rate=1.0).minimize(loss)
```
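Putting the pieces of this section together, a minimal sketch of the embedding setup might look like the following; the vocabulary and embedding sizes are illustrative, and the real version lives in `word2vec_basic.py`:

```python
import tensorflow as tf

vocabulary_size = 50000
embedding_size = 128

# The embedding matrix, initialized uniformly at random.
embeddings = tf.Variable(
    tf.random_uniform([vocabulary_size, embedding_size], -1.0, 1.0))

# Integer word ids for a batch of source words.
train_inputs = tf.placeholder(tf.int32, shape=[None])
# Look up the embedding vector for each id in the batch.
embed = tf.nn.embedding_lookup(embeddings, train_inputs)
```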
-## Training the Model
+## Training the Model <a class="md-anchor" id="AUTOGENERATED-training-the-model"></a>
Training the model is then as simple as using a `feed_dict` to push data into
the placeholders and calling `session.run` with this new data in a loop.
@@ -321,7 +321,7 @@ for inputs, labels in generate_batch(...):
See the full example code in
[tensorflow/g3doc/tutorials/word2vec/word2vec_basic.py](./word2vec_basic.py).
-## Visualizing the Learned Embeddings
+## Visualizing the Learned Embeddings <a class="md-anchor" id="AUTOGENERATED-visualizing-the-learned-embeddings"></a>
After training has finished we can visualize the learned embeddings using
t-SNE.
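One way to do this is with scikit-learn's t-SNE and matplotlib; in this sketch, `final_embeddings` and `reverse_dictionary` are assumed to come from the training script:

```python
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

tsne = TSNE(n_components=2, init='pca')
low_dim = tsne.fit_transform(final_embeddings[:500, :])  # 500 most frequent words

plt.figure(figsize=(18, 18))
for i, (x, y) in enumerate(low_dim):
    plt.scatter(x, y)
    plt.annotate(reverse_dictionary[i], xy=(x, y))
plt.show()
```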
@@ -335,7 +335,7 @@ other. For a more heavyweight implementation of word2vec that showcases more of
the advanced features of TensorFlow, see the implementation in
[tensorflow/models/embedding/word2vec.py](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/models/embedding/word2vec.py).
-## Evaluating Embeddings: Analogical Reasoning
+## Evaluating Embeddings: Analogical Reasoning <a class="md-anchor" id="AUTOGENERATED-evaluating-embeddings--analogical-reasoning"></a>
Embeddings are useful for a wide variety of prediction tasks in NLP. Short of
training a full-blown part-of-speech model or named-entity model, one simple way
@@ -356,7 +356,7 @@ very large dataset, carefully tuning the hyperparameters and making use of
tricks like subsampling the data, which is out of the scope of this tutorial.
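A plain NumPy sketch of that evaluation for a single analogy, with the embedding matrix and vocabulary mappings assumed as inputs:

```python
import numpy as np

def analogy(embeddings, word_to_id, id_to_word, a, b, c):
    # Answers "a is to b as c is to ?" via the nearest neighbour of b - a + c
    # under cosine similarity.
    norms = np.sqrt((embeddings ** 2).sum(axis=1, keepdims=True))
    unit = embeddings / norms
    query = unit[word_to_id[b]] - unit[word_to_id[a]] + unit[word_to_id[c]]
    scores = unit.dot(query)
    for w in (a, b, c):                      # exclude the query words themselves
        scores[word_to_id[w]] = -np.inf
    return id_to_word[int(np.argmax(scores))]
```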
-## Optimizing the Implementation
+## Optimizing the Implementation <a class="md-anchor" id="AUTOGENERATED-optimizing-the-implementation"></a>
Our vanilla implementation showcases the flexibility of TensorFlow. For
example, changing the training objective is as simple as swapping out the call
@@ -386,7 +386,7 @@ example of this for the Skip-Gram case
Feel free to benchmark these against each other to measure performance
improvements at each stage.
-## Conclusion
+## Conclusion <a class="md-anchor" id="AUTOGENERATED-conclusion"></a>
In this tutorial we covered the word2vec model, a computationally efficient
model for learning word embeddings. We motivated why embeddings are useful,