Diffstat (limited to 'tensorflow/g3doc/how_tos')
-rwxr-xr-x  tensorflow/g3doc/how_tos/__init__.py  0
-rwxr-xr-x  tensorflow/g3doc/how_tos/adding_an_op/__init__.py  0
-rw-r--r--  tensorflow/g3doc/how_tos/adding_an_op/attr_examples.cc  31
-rw-r--r--  tensorflow/g3doc/how_tos/adding_an_op/fact_test.py  16
-rw-r--r--  tensorflow/g3doc/how_tos/adding_an_op/index.md  1015
-rw-r--r--  tensorflow/g3doc/how_tos/adding_an_op/register_kernels.cc  64
-rw-r--r--  tensorflow/g3doc/how_tos/adding_an_op/zero_out_1_test.py  18
-rw-r--r--  tensorflow/g3doc/how_tos/adding_an_op/zero_out_op_kernel_1.cc  43
-rw-r--r--  tensorflow/g3doc/how_tos/graph_viz/index.md  205
-rw-r--r--  tensorflow/g3doc/how_tos/index.md  102
-rw-r--r--  tensorflow/g3doc/how_tos/new_data_formats/index.md  225
-rwxr-xr-x  tensorflow/g3doc/how_tos/reading_data/__init__.py  0
-rw-r--r--  tensorflow/g3doc/how_tos/reading_data/convert_to_records.py  87
-rw-r--r--  tensorflow/g3doc/how_tos/reading_data/fully_connected_preloaded.py  134
-rw-r--r--  tensorflow/g3doc/how_tos/reading_data/fully_connected_preloaded_var.py  146
-rw-r--r--  tensorflow/g3doc/how_tos/reading_data/fully_connected_reader.py  180
-rw-r--r--  tensorflow/g3doc/how_tos/reading_data/index.md  495
-rw-r--r--  tensorflow/g3doc/how_tos/summaries_and_tensorboard/index.md  102
-rw-r--r--  tensorflow/g3doc/how_tos/threading_and_queues/index.md  146
-rw-r--r--  tensorflow/g3doc/how_tos/using_gpu/index.md  174
-rw-r--r--  tensorflow/g3doc/how_tos/variable_scope/index.md  372
-rw-r--r--  tensorflow/g3doc/how_tos/variables/index.md  215
22 files changed, 3770 insertions, 0 deletions
diff --git a/tensorflow/g3doc/how_tos/__init__.py b/tensorflow/g3doc/how_tos/__init__.py
new file mode 100755
index 0000000000..e69de29bb2
--- /dev/null
+++ b/tensorflow/g3doc/how_tos/__init__.py
diff --git a/tensorflow/g3doc/how_tos/adding_an_op/__init__.py b/tensorflow/g3doc/how_tos/adding_an_op/__init__.py
new file mode 100755
index 0000000000..e69de29bb2
--- /dev/null
+++ b/tensorflow/g3doc/how_tos/adding_an_op/__init__.py
diff --git a/tensorflow/g3doc/how_tos/adding_an_op/attr_examples.cc b/tensorflow/g3doc/how_tos/adding_an_op/attr_examples.cc
new file mode 100644
index 0000000000..84e54c7219
--- /dev/null
+++ b/tensorflow/g3doc/how_tos/adding_an_op/attr_examples.cc
@@ -0,0 +1,31 @@
+#include <stdio.h>
+#include "tensorflow/core/framework/op.h"
+
+REGISTER_OP("RestrictedTypeExample").Attr("t: {int32, float, bool}");
+
+REGISTER_OP("NumberType").Attr("t: numbertype");
+
+REGISTER_OP("EnumExample").Attr("e: {'apple', 'orange'}");
+
+REGISTER_OP("MinIntExample").Attr("a: int >= 2");
+
+REGISTER_OP("TypeListExample").Attr("a: list({int32, float}) >= 3");
+
+REGISTER_OP("AttrDefaultExample").Attr("i: int = 0");
+
+REGISTER_OP("AttrDefaultExampleForAllTypes")
+ .Attr("s: string = 'foo'")
+ .Attr("i: int = 0")
+ .Attr("f: float = 1.0")
+ .Attr("b: bool = true")
+ .Attr("ty: type = DT_INT32")
+ .Attr("sh: shape = { dim { size: 1 } dim { size: 2 } }")
+ .Attr("te: tensor = { dtype: DT_INT32 int_val: 5 }")
+ .Attr("l_empty: list(int) = []")
+ .Attr("l_int: list(int) = [2, 3, 5, 7]");
+
+int main(int argc, char* argv[]) {
+ printf("All registered ops:\n%s\n",
+ tensorflow::OpRegistry::Global()->DebugString(false).c_str());
+ return 0;
+}
diff --git a/tensorflow/g3doc/how_tos/adding_an_op/fact_test.py b/tensorflow/g3doc/how_tos/adding_an_op/fact_test.py
new file mode 100644
index 0000000000..17a7028d98
--- /dev/null
+++ b/tensorflow/g3doc/how_tos/adding_an_op/fact_test.py
@@ -0,0 +1,16 @@
+"""Test that user ops can be used as expected."""
+
+import tensorflow.python.platform
+
+import tensorflow as tf
+
+
+class FactTest(tf.test.TestCase):
+
+  def test(self):
+    with self.test_session():
+      print(tf.user_ops.my_fact().eval())
+
+
+if __name__ == '__main__':
+ tf.test.main()
diff --git a/tensorflow/g3doc/how_tos/adding_an_op/index.md b/tensorflow/g3doc/how_tos/adding_an_op/index.md
new file mode 100644
index 0000000000..5c6243cd9c
--- /dev/null
+++ b/tensorflow/g3doc/how_tos/adding_an_op/index.md
@@ -0,0 +1,1015 @@
+# Adding a New Op to TensorFlow
+
+PREREQUISITES:
+
+* Some familiarity with C++.
+* Must have [downloaded TensorFlow source](../../get_started/index.md#source),
+ and be able to build it.
+
+If you'd like to incorporate an operation that isn't covered by the existing
+library, you can create a custom Op. To incorporate your custom Op, you'll need
+to:
+
+* Register the new Op in a C++ file. The Op registration is independent of the
+ implementation, and describes the semantics of how the Op is invoked. For
+ example, it defines the Op name, and specifies its inputs and outputs.
+* Implement the Op in C++. This implementation is called a "kernel", and there
+ can be multiple kernels for different architectures (e.g. CPUs, GPUs) or
+ input / output types.
+* Create a Python wrapper. This wrapper is the public API to create the Op. A
+ default wrapper is generated from the Op registration, which can be used
+ directly or added to.
+* Optionally, write a function to compute gradients for the Op.
+* Optionally, write a function that describes the input and output shapes
+ for the Op. This allows shape inference to work with your Op.
+* Test the Op, typically in Python. If you define gradients, verify them with
+ the Python [`GradientChecker`](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/python/kernel_tests/gradient_checker.py).
+
+<!-- TOC-BEGIN This section is generated by neural network: DO NOT EDIT! -->
+## Contents
+* [Define the Op's interface](#define_interface)
+* [Implement the kernel for the Op](#AUTOGENERATED-implement-the-kernel-for-the-op)
+* [Generate the client wrapper](#AUTOGENERATED-generate-the-client-wrapper)
+ * [The Python Op wrapper](#AUTOGENERATED-the-python-op-wrapper)
+ * [The C++ Op wrapper](#AUTOGENERATED-the-c---op-wrapper)
+* [Verify it works](#AUTOGENERATED-verify-it-works)
+* [Validation](#validation)
+* [Op registration](#AUTOGENERATED-op-registration)
+ * [Attrs](#AUTOGENERATED-attrs)
+ * [Attr types](#AUTOGENERATED-attr-types)
+ * [Polymorphism](#polymorphism)
+ * [Inputs and Outputs](#AUTOGENERATED-inputs-and-outputs)
+ * [Backwards compatibility](#AUTOGENERATED-backwards-compatibility)
+* [GPU Support](#mult-archs)
+* [Implement the gradient in Python](#AUTOGENERATED-implement-the-gradient-in-python)
+* [Implement a shape function in Python](#AUTOGENERATED-implement-a-shape-function-in-python)
+
+
+<!-- TOC-END This section was generated by neural network, THANKS FOR READING! -->
+
+## Define the Op's interface <div class="md-anchor" id="define_interface">{#define_interface}</div>
+
+You define the interface of an Op by registering it with the TensorFlow system.
+In the registration, you specify the name of your Op, its inputs (types and
+names) and outputs (types and names), as well as [docstrings](#docstrings) and
+any [attrs](#attrs) the Op might require.
+
+To see how this works, suppose you'd like to create an Op that takes a tensor of
+`int32`s and outputs a copy of the tensor, with all but the first element set to
+zero. Create file [`tensorflow/core/user_ops`][user_ops]`/zero_out.cc` and
+add a call to the `REGISTER_OP` macro that defines the interface for such an Op:
+
+```c++
+#include "tensorflow/core/framework/op.h"
+
+REGISTER_OP("ZeroOut")
+ .Input("to_zero: int32")
+ .Output("zeroed: int32");
+```
+
+This `ZeroOut` Op takes one tensor `to_zero` of 32-bit integers as input, and
+outputs a tensor `zeroed` of 32-bit integers.
+
+> A note on naming: The name of the Op should be unique and CamelCase. Names
+> starting with an underscore (`_`) are reserved for internal use.
+
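+You can also attach documentation to the registration with `.Doc()`; the text
+is used to generate documentation for the Op (see `zero_out_op_kernel_1.cc`
+later in this change for this exact example):
+
+```c++
+REGISTER_OP("ZeroOut")
+    .Input("to_zero: int32")
+    .Output("zeroed: int32")
+    .Doc(R"doc(
+Zeros out all but the first value of a Tensor.
+
+zeroed: A Tensor whose first value is identical to `to_zero`, and 0
+  otherwise.
+)doc");
+```
+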
+## Implement the kernel for the Op <div class="md-anchor" id="AUTOGENERATED-implement-the-kernel-for-the-op">{#AUTOGENERATED-implement-the-kernel-for-the-op}</div>
+
+After you define the interface, provide one or more implementations of the Op.
+To create one of these kernels, create a class that extends `OpKernel` and
+overrides the `Compute` method. The `Compute` method provides one `context`
+argument of type `OpKernelContext*`, from which you can access useful things
+like the input and output tensors.
+
+Add your kernel to the file you created above. The kernel might look something
+like this:
+
+```c++
+#include "tensorflow/core/framework/op_kernel.h"
+
+using namespace tensorflow;
+
+class ZeroOutOp : public OpKernel {
+ public:
+  explicit ZeroOutOp(OpKernelConstruction* context) : OpKernel(context) {}
+
+  void Compute(OpKernelContext* context) override {
+    // Grab the input tensor
+    const Tensor& input_tensor = context->input(0);
+    auto input = input_tensor.flat<int32>();
+
+    // Create an output tensor
+    Tensor* output_tensor = NULL;
+    OP_REQUIRES_OK(context, context->allocate_output(0, input_tensor.shape(),
+                                                     &output_tensor));
+    auto output = output_tensor->template flat<int32>();
+
+    // Set all but the first element of the output tensor to 0.
+    const int N = input.size();
+    for (int i = 1; i < N; i++) {
+      output(i) = 0;
+    }
+
+    // Preserve the first input value if possible.
+    if (N > 0) output(0) = input(0);
+  }
+};
+```
+
+After implementing your kernel, you register it with the TensorFlow system. In
+the registration, you specify different constraints under which this kernel
+will run. For example, you might have one kernel made for CPUs, and a separate
+one for GPUs.
+
+To do this for the `ZeroOut` op, add the following to `zero_out.cc`:
+
+```c++
+REGISTER_KERNEL_BUILDER(Name("ZeroOut").Device(DEVICE_CPU), ZeroOutOp);
+```
+
+TODO: instructions or pointer to building TF
+
+At this point, the TensorFlow system can reference and use the Op when
+requested.
+
+## Generate the client wrapper <div class="md-anchor" id="AUTOGENERATED-generate-the-client-wrapper">{#AUTOGENERATED-generate-the-client-wrapper}</div>
+### The Python Op wrapper <div class="md-anchor" id="AUTOGENERATED-the-python-op-wrapper">{#AUTOGENERATED-the-python-op-wrapper}</div>
+
+Python op wrappers are created automatically in
+`bazel-genfiles/tensorflow/python/ops/gen_user_ops.py` for all ops placed in the
+[`tensorflow/core/user_ops`][user_ops] directory when you build TensorFlow.
+Those ops are imported into
+[`tensorflow/python/user_ops/user_ops.py`][python-user_ops] with the statement:
+
+```python
+from tensorflow.python.ops.gen_user_ops import *
+```
+
+You may optionally use your own function instead. To do this, you first hide
+the generated code for that op by adding its name to the `hidden` list in the
+`"user_ops"` rule in
+[`tensorflow/python/BUILD`](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/python/BUILD):
+
+```python
+tf_gen_op_wrapper_py(
+ name = "user_ops",
+ hidden = [
+ "Fact",
+ ],
+ require_shape_functions = False,
+)
+```
+
+List your op next to `"Fact"`. Next you add your replacement function to
+[`tensorflow/python/user_ops/user_ops.py`](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/python/user_ops/user_ops.py).
+Typically your function will call the generated function to actually add the op
+to the graph. The hidden version of the generated function will be in the
+`gen_user_ops` package and start with an underscore ("`_`"). For example:
+
+```python
+def my_fact():
+ """Example of overriding the generated code for an Op."""
+ return gen_user_ops._fact()
+```
+
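+The test for this wrapper (`fact_test.py` in this change) then exercises it
+like any other op:
+
+```python
+import tensorflow as tf
+
+with tf.Session():
+  print(tf.user_ops.my_fact().eval())
+```
+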
+### The C++ Op wrapper <div class="md-anchor" id="AUTOGENERATED-the-c---op-wrapper">{#AUTOGENERATED-the-c---op-wrapper}</div>
+
+C++ op wrappers are created automatically for all ops placed in the
+[`tensorflow/core/user_ops`][user_ops] directory, when you build TensorFlow. For
+example, ops in `tensorflow/core/user_ops/zero_out.cc` will generate wrappers in
+`bazel-genfiles/tensorflow/cc/ops/user_ops.{h,cc}`.
+
+All generated wrappers for user ops are automatically
+imported into [`tensorflow/cc/ops/standard_ops.h`][standard_ops-cc] with the
+statement:
+
+```c++
+#include "tensorflow/cc/ops/user_ops.h"
+```
+
+## Verify it works <div class="md-anchor" id="AUTOGENERATED-verify-it-works">{#AUTOGENERATED-verify-it-works}</div>
+
+A good way to verify that you've successfully implemented your Op is to write a
+test for it. Create the file
+`tensorflow/python/kernel_tests/zero_out_op_test.py` with the contents:
+[TODO]:# (put tests somewhere else and make sure it works)
+
+```python
+import tensorflow as tf
+
+
+class ZeroOutTest(tf.test.TestCase):
+  def testZeroOut(self):
+    with self.test_session():
+      result = tf.user_ops.zero_out([5, 4, 3, 2, 1])
+      self.assertAllEqual(result.eval(), [5, 0, 0, 0, 0])
+```
+
+Then run your test:
+
+```sh
+$ bazel test tensorflow/python:zero_out_op_test
+```
+
+## Validation <div class="md-anchor" id="validation">{#validation}</div>
+
+The example above assumed that the Op applied to a tensor of any shape. What
+if it only applied to vectors? That means adding a check to the above OpKernel
+implementation.
+
+```c++
+  void Compute(OpKernelContext* context) override {
+    // Grab the input tensor
+    const Tensor& input_tensor = context->input(0);
+
+    OP_REQUIRES(context, TensorShapeUtils::IsVector(input_tensor.shape()),
+                errors::InvalidArgument("ZeroOut expects a 1-D vector."));
+    // ...
+  }
+```
+
+This asserts that the input is a vector, and returns having set the
+`InvalidArgument` status if it isn't. The
+[OP_REQUIRES macro][validation-macros] takes three arguments:
+
+* The `context`, which can either be an `OpKernelContext` or
+  `OpKernelConstruction` pointer (see
+  [`tensorflow/core/framework/op_kernel.h`](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/framework/op_kernel.h)),
+  for its `SetStatus()` method.
+* The condition. For example, there are functions for validating the shape
+  of a tensor in [`tensorflow/core/public/tensor_shape.h`](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/public/tensor_shape.h).
+* The error itself, which is represented by a `Status` object; see
+  [`tensorflow/core/public/status.h`](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/public/status.h). A
+  `Status` has both a type (frequently `InvalidArgument`, but see the list of
+  types) and a message. Functions for constructing an error may be found in
+  [`tensorflow/core/lib/core/errors.h`][validation-macros].
+
+Alternatively, if you want to test whether a `Status` object returned from some
+function is an error, and if so return it, use
+[`OP_REQUIRES_OK`][validation-macros]. Both of these macros return from the
+function on error.
+
+## Op registration <div class="md-anchor" id="AUTOGENERATED-op-registration">{#AUTOGENERATED-op-registration}</div>
+
+### Attrs <div class="md-anchor" id="AUTOGENERATED-attrs">{#AUTOGENERATED-attrs}</div>
+
+Ops can have attrs, whose values are set when the Op is added to a graph. These
+are used to configure the Op, and their values can be accessed both within the
+kernel implementation and in the types of inputs and outputs in the Op
+registration. Prefer using an input instead of an attr when possible, since
+inputs are more flexible. They can change every step, be set using a feed, etc.
+Attrs are used for things that can't be done with inputs: any configuration
+that affects the signature (number or type of inputs or outputs) or that
+can't change from step-to-step.
+
+You define an attr when you register the Op, by specifying its name and type
+using the `Attr` method, which expects a spec of the form:
+
+```
+<name>: <attr-type-expr>
+```
+
+where `<name>` begins with a letter and can be composed of alphanumeric
+characters and underscores, and `<attr-type-expr>` is a type expression of the
+form [described below](#attr-types).
+
+For example, if you'd like the `ZeroOut` Op to preserve a user-specified index,
+instead of only the 0th element, you can register the Op like so:
+
+<code class="lang-c++"><pre>
+REGISTER\_OP("ZeroOut")
+ <b>.Attr("preserve_index: int")</b>
+ .Input("to_zero: int32")
+ .Output("zeroed: int32");
+</pre></code>
+
+Your kernel can then access this attr in its constructor via the `context`
+parameter:
+
+<code class="lang-c++"><pre>
+class ZeroOutOp : public OpKernel {
+ public:
+  explicit ZeroOutOp(OpKernelConstruction\* context) : OpKernel(context) {<b>
+    // Get the index of the value to preserve
+    OP\_REQUIRES\_OK(context,
+                   context->GetAttr("preserve\_index", &preserve\_index\_));
+  </b>}
+  void Compute(OpKernelContext\* context) override {
+    // ...
+  }
+ <b>private:
+  int preserve\_index\_;</b>
+};
+</pre></code>
+
+which can then be used in the `Compute` method:
+
+<code class="lang-c++"><pre>
+  void Compute(OpKernelContext\* context) override {
+    // ...
+    // Set all the elements of the output tensor to 0
+    const int N = input.size();
+    for (int i = 0; i < N; i++) {
+      output\_flat(i) = 0;
+    }<br>
+    <b>// Preserve the requested input value
+    output\_flat(preserve\_index\_) = input(preserve\_index\_);</b>
+  }
+</pre></code>
+
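+Because attr values are fixed when the Op is added to the graph, the
+constructor is also a natural place to validate them. A sketch (reusing the
+`preserve_index_` member from above; the range check and message text are
+illustrative):
+
+```c++
+explicit ZeroOutOp(OpKernelConstruction* context) : OpKernel(context) {
+  OP_REQUIRES_OK(context, context->GetAttr("preserve_index", &preserve_index_));
+  // Attr values are available at construction time, so range-check here
+  // rather than on every Compute() call.
+  OP_REQUIRES(context, preserve_index_ >= 0,
+              errors::InvalidArgument("preserve_index must be >= 0, got ",
+                                      preserve_index_));
+}
+```
+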
+[TODO]:# (check the code in this section in and test it)
+
+> To preserve [backwards compatibility](#backwards-compatibility), you should
+> specify a [default value](#default-values-constraints) when adding an attr to
+> an existing op:
+>
+> <code class="lang-c++"><pre>
+> REGISTER\_OP("ZeroOut")
+> <b>.Attr("preserve\_index: int = 0")</b>
+> .Input("to_zero: int32")
+> .Output("zeroed: int32");
+> </pre></code>
+
+### Attr types <div class="md-anchor" id="AUTOGENERATED-attr-types">{#AUTOGENERATED-attr-types}</div>
+
+The following types are supported in an attr:
+
+* `string`: Any sequence of bytes (not required to be UTF8).
+* `int`: A signed integer.
+* `float`: A floating point number.
+* `bool`: True or false.
+* `type`: One of the (non-ref) values of [`DataType`][DataTypeString].
+* `shape`: A [`TensorShapeProto`][TensorShapeProto].
+* `tensor`: A [`TensorProto`][TensorProto].
+* `list(<type>)`: A list of `<type>`, where `<type>` is one of the above types.
+ Note that `list(list(<type>))` is invalid.
+
+See also: [op_def_builder.cc:FinalizeAttr][FinalizeAttr] for a definitive list.
+
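+Inside a kernel, each of these attr types maps to a C++ value that you read
+with `GetAttr`. A sketch (the op and attr names here are hypothetical, and it
+assumes `GetAttr` overloads exist for the corresponding C++ types):
+
+```c++
+class AttrTypesOp : public OpKernel {
+ public:
+  explicit AttrTypesOp(OpKernelConstruction* context) : OpKernel(context) {
+    // Each GetAttr call fills a C++ value from the attr of the same name.
+    OP_REQUIRES_OK(context, context->GetAttr("s", &s_));
+    OP_REQUIRES_OK(context, context->GetAttr("i", &i_));
+    OP_REQUIRES_OK(context, context->GetAttr("l_int", &l_int_));
+  }
+  void Compute(OpKernelContext* context) override {}
+
+ private:
+  string s_;
+  int64 i_;
+  std::vector<int32> l_int_;
+};
+```
+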
+#### Default values & constraints
+
+Attrs may have default values, and some types of attrs can have constraints. To
+define an attr with constraints, you can use the following `<attr-type-expr>`s:
+
+* `{'<string1>', '<string2>'}`: The value must be a string that has either the
+ value `<string1>` or `<string2>`. The name of the type, `string`, is implied
+ when you use this syntax. This emulates an enum:
+
+  ```c++
+  REGISTER_OP("EnumExample")
+      .Attr("e: {'apple', 'orange'}");
+  ```
+
+* `{<type1>, <type2>}`: The value is of type `type`, and must be one of
+ `<type1>` or `<type2>`, where `<type1>` and `<type2>` are supported
+ [tensor types](../../resources/dims_types.md#data-types). You don't specify
+ that the type of the attr is `type`. This is implied when you have a list of
+ types in `{...}`. For example, in this case the attr `t` is a type that must
+ be an `int32`, a `float`, or a `bool`:
+
+  ```c++
+  REGISTER_OP("RestrictedTypeExample")
+      .Attr("t: {int32, float, bool}");
+  ```
+
+* There are shortcuts for common type constraints:
+  * `numbertype`: Type `type` restricted to the numeric (non-string and
+    non-bool) types.
+  * `realnumbertype`: Like `numbertype` without complex types.
+  * `quantizedtype`: Like `numbertype` but just the quantized number types.
+
+  The specific lists of types allowed by these are defined by the functions
+  (like `NumberTypes()`) in
+  [`tensorflow/core/framework/types.h`](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/framework/types.h).
+  In this example the attr `t` must be one of the numeric types:
+
+  ```c++
+  REGISTER_OP("NumberType")
+      .Attr("t: numbertype");
+  ```
+
+  For this op:
+
+  ```python
+  tf.number_type(t=tf.int32)  # Valid
+  tf.number_type(t=tf.bool)   # Invalid
+  ```
+
+* `int >= <n>`: The value must be an int whose value is greater than or equal to
+ `<n>`, where `<n>` is a natural number.
+
+  For example, the following Op registration specifies that the attr `a` must
+  have a value that is at least `2`:
+
+  ```c++
+  REGISTER_OP("MinIntExample")
+      .Attr("a: int >= 2");
+  ```
+
+* `list(<type>) >= <n>`: A list of type `<type>` whose length is greater than
+ or equal to `<n>`.
+
+  For example, the following Op registration specifies that the attr `a` is a
+  list of types (either `int32` or `float`), and that there must be at least 3
+  of them:
+
+  ```c++
+  REGISTER_OP("TypeListExample")
+      .Attr("a: list({int32, float}) >= 3");
+  ```
+
+To set a default value for an attr (making it optional in the generated code),
+add `= <default>` to the end, as in:
+
+```c++
+REGISTER_OP("AttrDefaultExample")
+ .Attr("i: int = 0");
+```
+
+The supported syntax of the default value is what would be used in the proto
+representation of the resulting GraphDef definition.
+
+Here are examples for how to specify a default for all types:
+
+```c++
+REGISTER_OP("AttrDefaultExampleForAllTypes")
+ .Attr("s: string = 'foo'")
+ .Attr("i: int = 0")
+ .Attr("f: float = 1.0")
+ .Attr("b: bool = true")
+ .Attr("ty: type = DT_INT32")
+ .Attr("sh: shape = { dim { size: 1 } dim { size: 2 } }")
+ .Attr("te: tensor = { dtype: DT_INT32 int_val: 5 }")
+ .Attr("l_empty: list(int) = []")
+ .Attr("l_int: list(int) = [2, 3, 5, 7]");
+```
+
+Note in particular that the values of type `type` use [the `DT_*` names
+for the types](../../resources/dims_types.md#data-types).
+
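+In the generated Python wrapper, attrs with defaults become optional keyword
+arguments. A sketch (the snake_case wrapper name is an assumption for
+illustration):
+
+```python
+# Any subset of the attrs may be supplied; omitted ones use the defaults.
+attr_default_example_for_all_types(s='bar', l_int=[1, 2])
+```
+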
+### Polymorphism <div class="md-anchor" id="polymorphism">{#polymorphism}</div>
+#### Type Polymorphism {#type-polymorphism}
+
+For ops that can take different types as input or produce different output
+types, you can specify [an attr](#attrs) in
+[an input or output type](#inputs-outputs) in the Op registration. Typically
+you would then register an `OpKernel` for each supported type.
+
+For instance, if you'd like the `ZeroOut` Op to work on `float`s
+in addition to `int32`s, your Op registration might look like:
+
+<code class="lang-c++"><pre>
+REGISTER\_OP("ZeroOut")
+ <b>.Attr("T: {float, int32}")</b>
+ .Input("to_zero: <b>T</b>")
+ .Output("zeroed: <b>T</b>");
+</pre></code>
+
+Your Op registration now specifies that the input's type must be `float` or
+`int32`, and that its output will be the same type, since both have type `T`.
+
+> A note on naming:{#naming} Inputs, outputs, and attrs generally should be
+> given snake_case names. The one exception is attrs that are used as the type
+> of an input or in the type of an input. Those attrs can be inferred when the
+> op is added to the graph and so don't appear in the op's function. For
+> example, this last definition of ZeroOut will generate a Python function that
+> looks like:
+>
+> ```python
+> def zero_out(to_zero, name=None):
+>   """...
+>   Args:
+>     to_zero: A `Tensor`. Must be one of the following types:
+>         `float32`, `int32`.
+>     name: A name for the operation (optional).
+>
+>   Returns:
+>     A `Tensor`. Has the same type as `to_zero`.
+>   """
+> ```
+>
+> If `to_zero` is passed an `int32` tensor, then `T` is automatically set to
+> `int32` (well, actually `DT_INT32`). Those inferred attrs are given
+> Capitalized or CamelCase names.
+>
+> Compare this with an op that has a type attr that determines the output
+> type:
+>
+> ```c++
+> REGISTER_OP("StringToNumber")
+>     .Input("string_tensor: string")
+>     .Output("output: out_type")
+>     .Attr("out_type: {float, int32}")
+>     .Doc(R"doc(
+> Converts each string in the input Tensor to the specified numeric type.
+> )doc");
+> ```
+>
+> In this case, the user has to specify the output type, as in the generated
+> Python:
+>
+> ```python
+> def string_to_number(string_tensor, out_type=None, name=None):
+>   """Converts each string in the input Tensor to the specified numeric type.
+>
+>   Args:
+>     string_tensor: A `Tensor` of type `string`.
+>     out_type: An optional `tf.DType` from: `tf.float32, tf.int32`.
+>       Defaults to `tf.float32`.
+>     name: A name for the operation (optional).
+>
+>   Returns:
+>     A `Tensor` of type `out_type`.
+>   """
+> ```
+
+<code class="lang-c++"><pre>
+\#include "tensorflow/core/framework/op_kernel.h"<br/>
+class ZeroOut<b>Int32</b>Op : public OpKernel {
+ // as before
+};<br/>
+class ZeroOut<b>Float</b>Op : public OpKernel {
+ public:
+ explicit ZeroOut<b>Float</b>Op(OpKernelConstruction\* context)
+ : OpKernel(context) {}<br/>
+ void Compute(OpKernelContext\* context) override {
+ // Grab the input tensor
+ const Tensor& input\_tensor = context-&gt;input(0);
+ auto input = input\_tensor.flat&lt;<b>float</b>&gt;();<br/>
+ // Create an output tensor
+ Tensor* output = NULL;
+ OP\_REQUIRES\_OK(context,
+ context-&gt;allocate\_output(0, input_tensor.shape(), &output));
+ auto output\_flat = output-&gt;template flat&lt;<b>float</b>&gt;();<br/>
+ // Set all the elements of the output tensor to 0
+ const int N = input.size();
+ for (int i = 0; i &lt; N; i++) {
+ output\_flat(i) = 0;
+ }<br/>
+ // Preserve the first input value
+ if (N &gt; 0) output\_flat(0) = input(0);
+ }
+};<br/><b>
+// Note that TypeConstraint&lt;int32&gt;("T") means that attr "T" (defined
+// in the Op registration above) must be "int32" to use this template
+// instantiation.</b>
+REGISTER\_KERNEL\_BUILDER(
+ Name("ZeroOut")
+ .Device(DEVICE\_CPU)
+ <b>.TypeConstraint&lt;int32&gt;("T"),</b>
+ ZeroOutOp<b>Int32</b>);
+<b>REGISTER\_KERNEL\_BUILDER(
+ Name("ZeroOut")
+ .Device(DEVICE\_CPU)
+ .TypeConstraint&lt;float&gt;("T"),
+ ZeroOutFloatOp);
+</b></pre></code>
+
+> To preserve [backwards compatibility](#backwards-compatibility), you should
+> specify a [default value](#default-values-constraints) when adding an attr to
+> an existing op:
+>
+> <code class="lang-c++"><pre>
+> REGISTER\_OP("ZeroOut")
+> <b>.Attr("T: {float, int32} = DT_INT32")</b>
+> .Input("to_zero: T")
+> .Output("zeroed: T")
+> </pre></code>
+
+Let's say you wanted to add more types, say `double`:
+
+<code class="lang-c++"><pre>
+REGISTER\_OP("ZeroOut")
+ <b>.Attr("T: {float, <b>double,</b> int32}")</b>
+ .Input("to_zero: <b>T</b>")
+ .Output("zeroed: <b>T</b>");
+</pre></code>
+
+Instead of writing another `OpKernel` with redundant code as above, you can
+often use a C++ template. You will still have one kernel registration
+(`REGISTER_KERNEL_BUILDER` call) per overload.
+
+<code class="lang-c++"><pre>
+<b>template &lt;typename T&gt;</b>
+class ZeroOutOp : public OpKernel {
+ public:
+  explicit ZeroOutOp(OpKernelConstruction\* context) : OpKernel(context) {}<br/>
+  void Compute(OpKernelContext\* context) override {
+    // Grab the input tensor
+    const Tensor& input\_tensor = context-&gt;input(0);
+    auto input = input\_tensor.flat<b>&lt;T&gt;</b>();<br/>
+    // Create an output tensor
+    Tensor* output = NULL;
+    OP\_REQUIRES\_OK(context,
+                   context-&gt;allocate\_output(0, input_tensor.shape(), &output));
+    auto output\_flat = output-&gt;template flat<b>&lt;T&gt;</b>();<br/>
+    // Set all the elements of the output tensor to 0
+    const int N = input.size();
+    for (int i = 0; i &lt; N; i++) {
+      output\_flat(i) = 0;
+    }<br/>
+    // Preserve the first input value
+    if (N &gt; 0) output\_flat(0) = input(0);
+  }
+};<br/>
+// Note that TypeConstraint&lt;int32&gt;("T") means that attr "T" (defined
+// in the Op registration above) must be "int32" to use this template
+// instantiation.
+REGISTER\_KERNEL\_BUILDER(
+    Name("ZeroOut")
+        .Device(DEVICE\_CPU)
+        .TypeConstraint&lt;int32&gt;("T"),
+    <b>ZeroOutOp&lt;int32&gt;</b>);
+REGISTER\_KERNEL\_BUILDER(
+    Name("ZeroOut")
+        .Device(DEVICE\_CPU)
+        .TypeConstraint&lt;float&gt;("T"),
+    <b>ZeroOutOp&lt;float&gt;</b>);
+<b>REGISTER\_KERNEL\_BUILDER(
+    Name("ZeroOut")
+        .Device(DEVICE\_CPU)
+        .TypeConstraint&lt;double&gt;("T"),
+    ZeroOutOp&lt;double&gt;);
+</b></pre></code>
+
+If you have more than a couple overloads, you can put the registration in a
+macro.
+
+```c++
+#include "tensorflow/core/framework/op_kernel.h"
+
+#define REGISTER_KERNEL(type)                                       \
+  REGISTER_KERNEL_BUILDER(                                          \
+      Name("ZeroOut").Device(DEVICE_CPU).TypeConstraint<type>("T"), \
+      ZeroOutOp<type>)
+
+REGISTER_KERNEL(int32);
+REGISTER_KERNEL(float);
+REGISTER_KERNEL(double);
+
+#undef REGISTER_KERNEL
+```
+
+Depending on the list of types you are registering the kernel for, you may be
+able to use a macro provided by
+[`tensorflow/core/framework/register_types.h`][register_types]:
+
+```c++
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/register_types.h"
+
+REGISTER_OP("ZeroOut")
+ .Attr("T: realnumbertypes")
+ .Input("to_zero: T")
+ .Output("zeroed: T");
+
+template <typename T>
+class ZeroOutOp : public OpKernel { ... };
+
+#define REGISTER_KERNEL(type) \
+ REGISTER_KERNEL_BUILDER( \
+ Name("ZeroOut").Device(DEVICE_CPU).TypeConstraint<type>("T"), \
+ ZeroOutOp<type>)
+
+TF_CALL_REAL_NUMBER_TYPES(REGISTER_KERNEL);
+
+#undef REGISTER_KERNEL
+```
+
+#### List Inputs and Outputs {#list-input-output}
+
+In addition to being able to accept or produce different types, ops can consume
+or produce a variable number of tensors.
+
+In the next example, the attr `T` holds a *list* of types, and is used as the
+type of both the input `in` and the output `out`. The input and output are
+lists of tensors of that type (and the number and types of tensors in the output
+are the same as the input, since both have type `T`).
+
+```c++
+REGISTER_OP("PolymorphicListExample")
+ .Attr("T: list(type)")
+ .Input("in: T")
+ .Output("out: T");
+```
+
+You can also place restrictions on what types can be specified in the list. In
+this next case, the input is a list of `float` and `double` tensors. The Op
+accepts, for example, input types `(float, double, float)` and in that case the
+output type would also be `(float, double, float)`.
+
+```c++
+REGISTER_OP("ListTypeRestrictionExample")
+ .Attr("T: list({float, double})")
+ .Input("in: T")
+ .Output("out: T");
+```
+
+If you want all the tensors in a list to be of the same type, you might do
+something like:
+
+```c++
+REGISTER_OP("IntListInputExample")
+ .Attr("N: int")
+ .Input("in: N * int32")
+ .Output("out: int32");
+```
+
+This accepts a list of `int32` tensors, and uses an `int` attr `N` to
+specify the length of the list.
+
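+In the generated Python wrapper, `N` is not passed explicitly; it is inferred
+from the length of the list argument. A sketch (the module and wrapper names
+are assumptions for illustration):
+
+```python
+# N is inferred as 3 from the length of the list.
+out = gen_user_ops.int_list_input_example([a, b, c])
+```
+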
+This can be made [type polymorphic](#type-polymorphism) as well. In the next
+example, the input is a list of tensors (with length `"N"`) of the same (but
+unspecified) type (`"T"`), and the output is a single tensor of matching type:
+
+```c++
+REGISTER_OP("SameListInputExample")
+ .Attr("N: int")
+ .Attr("T: type")
+ .Input("in: N * T")
+ .Output("out: T");
+```
+
+By default, tensor lists have a minimum length of 1. You can change that default
+using
+[a `">="` constraint on the corresponding attr](#default-values-constraints).
+In this next example, the input is a list of at least 2 `int32` tensors:
+
+```c++
+REGISTER_OP("MinLengthIntListExample")
+ .Attr("N: int >= 2")
+ .Input("in: N * int32")
+ .Output("out: int32");
+```
+
+The same syntax works with `"list(type)"` attrs:
+
+```c++
+REGISTER_OP("MinimumLengthPolymorphicListExample")
+ .Attr("T: list(type) >= 3")
+ .Input("in: T")
+ .Output("out: T");
+```
+
+### Inputs and Outputs <div class="md-anchor" id="AUTOGENERATED-inputs-and-outputs">{#AUTOGENERATED-inputs-and-outputs}</div>
+
+To summarize the above, an Op registration can have multiple inputs and outputs:
+
+```c++
+REGISTER_OP("MultipleInsAndOuts")
+ .Input("y: int32")
+ .Input("z: float")
+ .Output("a: string")
+ .Output("b: int32");
+```
+
+Each input or output spec is of the form:
+
+```
+<name>: <io-type-expr>
+```
+
+where `<name>` begins with a letter and can be composed of alphanumeric
+characters and underscores. `<io-type-expr>` is one of the following type
+expressions:
+
+* `<type>`, where `<type>` is a supported input type (e.g. `float`, `int32`,
+ `string`). This specifies a single tensor of the given type.
+
+  See
+  [the list of supported Tensor types](../../resources/dims_types.md#data-types).
+
+  ```c++
+  REGISTER_OP("BuiltInTypesExample")
+      .Input("integers: int32")
+      .Input("complex_numbers: scomplex64");
+  ```
+
+* `<attr-type>`, where `<attr-type>` is the name of an [Attr](#attrs) with type
+ `type` or `list(type)` (with a possible type restriction). This syntax allows
+ for [polymorphic ops](#polymorphism).
+
+  ```c++
+  REGISTER_OP("PolymorphicSingleInput")
+      .Attr("T: type")
+      .Input("in: T");
+
+  REGISTER_OP("RestrictedPolymorphicSingleInput")
+      .Attr("T: {int32, int64}")
+      .Input("in: T");
+  ```
+
+  Referencing an attr of type `list(type)` allows you to accept a sequence of
+  tensors.
+
+  ```c++
+  REGISTER_OP("ArbitraryTensorSequenceExample")
+      .Attr("T: list(type)")
+      .Input("in: T")
+      .Output("out: T");
+
+  REGISTER_OP("RestrictedTensorSequenceExample")
+      .Attr("T: list({int32, int64})")
+      .Input("in: T")
+      .Output("out: T");
+  ```
+
+  Note that the number and types of tensors in the output `out` are the same
+  as in the input `in`, since both are of type `T`.
+
+* For a sequence of tensors with the same type: `<number> * <type>`, where
+ `<number>` is the name of an [Attr](#attrs) with type `int`. The `<type>` can
+ either be
+ [a specific type like `int32` or `float`](../../resources/dims_types.md#data-types),
+ or the name of an attr with type `type`. As an example of the first, this
+ Op accepts a list of `int32` tensors:
+
+  ```c++
+  REGISTER_OP("Int32SequenceExample")
+      .Attr("NumTensors: int")
+      .Input("in: NumTensors * int32");
+  ```
+
+  Whereas this Op accepts a list of tensors of any type, as long as they are
+  all the same:
+
+  ```c++
+  REGISTER_OP("SameTypeSequenceExample")
+      .Attr("NumTensors: int")
+      .Attr("T: type")
+      .Input("in: NumTensors * T");
+  ```
+
+* For a reference to a tensor: `Ref(<type>)`, where `<type>` is one of the
+  previous types; see the sketch below.
+
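+A sketch of a registration with a reference input (the op name here is
+hypothetical; built-in ops such as `Assign` use this form to mutate tensors
+in place):
+
+```c++
+REGISTER_OP("RefInputExample")
+    .Input("ref_in: Ref(int32)")
+    .Output("out: int32");
+```
+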
+> A note on naming: Any attr used in the type of an input will be inferred. By
+> convention those inferred attrs use capital names (like `T` or `N`).
+> Otherwise inputs, outputs, and attrs have names like function parameters
+> (e.g. `num_outputs`). For more details, see the
+> [earlier note on naming](#naming).
+
+For more details, see
+[`tensorflow/core/framework/op_def_builder.h`][op_def_builder].
+
+### Backwards compatibility <div class="md-anchor" id="AUTOGENERATED-backwards-compatibility">{#AUTOGENERATED-backwards-compatibility}</div>
+
+In general, changes to specifications must be backwards-compatible: changing
+the specification of an Op must not break prior serialized GraphDefs
+constructed from older specifications.
+
+There are several ways to preserve backwards-compatibility.
+
+1. Any new attrs added to an operation must have default values defined, and
+ with that default value the Op must have the original behavior. To change an
+ operation from not polymorphic to polymorphic, you *must* give a default
+ value to the new type attr to preserve the original signature by default. For
+ example, if your operation was:
+
+   ```c++
+   REGISTER_OP("MyGeneralUnaryOp")
+       .Input("in: float")
+       .Output("out: float");
+   ```
+
+   you can make it polymorphic in a backwards-compatible way using:
+
+   ```c++
+   REGISTER_OP("MyGeneralUnaryOp")
+       .Input("in: T")
+       .Output("out: T")
+       .Attr("T: numbertype = DT_FLOAT");
+   ```
+
+1. You can safely make a constraint on an attr less restrictive. For example,
+ you can change from `{int32, int64}` to `{int32, int64, float}` or from
+ `{"apple", "orange"}` to `{"apple", "banana", "orange"}`.
+
+1. Namespace any new Ops you create, by prefixing the Op names with something
+   unique to your project. This avoids having your Op collide with any Ops
+   that might be included in future versions of TensorFlow.
+
+1. Plan ahead! Try to anticipate future uses for the Op. Some signature changes
+ can't be done in a compatible way (for example, adding an input, or making a
+ single input into a list).
+
+If you cannot make your change to an operation backwards compatible, then
+create a new operation with a new name and the new semantics.
+
+## GPU Support <div class="md-anchor" id="mult-archs">{#mult-archs}</div>
+
+You can implement different OpKernels and register one for CPU and another for
+GPU, just like you can [register kernels for different types](#polymorphism).
+There are several examples of kernels with GPU support in
+[tensorflow/core/kernels/](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/kernels/).
+Notice that some kernels have a CPU version in a `.cc` file, a GPU version in
+a file ending in `_gpu.cu.cc`, and shared code in a `.h` file.
+
+For example, the [`pad` op](../../api_docs/python/array_ops.md#pad) has
+everything but the GPU kernel in [`tensorflow/core/kernels/pad_op.cc`][pad_op].
+The GPU kernel is in
+[`tensorflow/core/kernels/pad_op_gpu.cu.cc`](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/kernels/pad_op_gpu.cu.cc),
+and the shared code is a templated class defined in
+[`tensorflow/core/kernels/pad_op.h`](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/kernels/pad_op.h).
+One thing to note: even when the GPU kernel version of `pad` is used, it still
+needs its `"paddings"` input in CPU memory. To mark that inputs or outputs are
+kept on the CPU, add a `HostMemory()` call to the kernel registration, e.g.:
+
+```c++
+#define REGISTER_GPU_KERNEL(T)                         \
+  REGISTER_KERNEL_BUILDER(Name("Pad")                  \
+                              .Device(DEVICE_GPU)      \
+                              .TypeConstraint<T>("T")  \
+                              .HostMemory("paddings"), \
+                          PadOp<GPUDevice, T>)
+```
+
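+The macro is then instantiated once per supported type, mirroring the
+CPU-side registrations shown earlier (a sketch of the pattern):
+
+```c++
+REGISTER_GPU_KERNEL(float);
+REGISTER_GPU_KERNEL(double);
+
+#undef REGISTER_GPU_KERNEL
+```
+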
+## Implement the gradient in Python <div class="md-anchor" id="AUTOGENERATED-implement-the-gradient-in-python">{#AUTOGENERATED-implement-the-gradient-in-python}</div>
+
+[TODO]:# (Write this!)
+
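+As a sketch (assuming the `tf.RegisterGradient` decorator): a gradient
+function receives the op and the gradient with respect to its output, and
+returns one gradient tensor per input. For `ZeroOut`, only the first input
+element reaches the output, so only it receives the incoming gradient:
+
+```python
+import tensorflow as tf
+
+@tf.RegisterGradient("ZeroOut")
+def _zero_out_grad(op, grad):
+  """Gradient sketch for the vector version of ZeroOut."""
+  flat_grad = tf.reshape(grad, [-1])
+  # The first input element passes through, so it gets the incoming
+  # gradient; every other element contributes nothing to the output.
+  first = tf.slice(flat_grad, [0], [1])
+  rest = tf.zeros_like(tf.slice(flat_grad, [1], [-1]))
+  return [tf.reshape(tf.concat(0, [first, rest]), tf.shape(op.inputs[0]))]
+```
+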
+## Implement a shape function in Python <div class="md-anchor" id="AUTOGENERATED-implement-a-shape-function-in-python">{#AUTOGENERATED-implement-a-shape-function-in-python}</div>
+
+The TensorFlow Python API has a feature called "shape inference" that provides
+information about the shapes of tensors without having to execute the
+graph. Shape inference is supported by "shape functions" that are registered for
+each op type, and perform two roles: asserting that the shapes of the inputs are
+compatible, and specifying the shapes for the outputs. A shape function is a
+Python function that takes an
+[`Operation`](../../api_docs/python/framework.md#Operation) as input, and
+returns a list of
+[`TensorShape`](../../api_docs/python/framework.md#TensorShape) objects (one per
+output of the op). To register a shape function, apply the
+[`tf.RegisterShape` decorator](../../api_docs/python/framework.md#RegisterShape)
+to a shape function. For example, the
+[ZeroOut op defined above](#define_interface) would have a shape function like
+the following:
+
+```python
+@tf.RegisterShape("ZeroOut"):
+def _zero_out_shape(op):
+ """Shape function for the ZeroOut op.
+
+ This is the unconstrained version of ZeroOut, which produces an output
+ with the same shape as its input.
+ """
+ return [op.inputs[0].get_shape()]
+```
+
+A shape function can also constrain the shape of an input. For the version of
+[ZeroOut with a vector shape constraint](#validation), the shape function
+would be as follows:
+
+```python
+@tf.RegisterShape("ZeroOut"):
+def _zero_out_shape(op):
+ """Shape function for the ZeroOut op.
+
+ This is the constrained version of ZeroOut, which requires the input to
+ have rank 1 (a vector).
+ """
+ input_shape = op.inputs[0].get_shape().with_rank(1)
+ return [input_shape]
+```
+
+If your op is [polymorphic with multiple inputs](#polymorphism), use the
+properties of the operation to determine the number of shapes to check:
+
+```python
+@tf.RegisterShape("IntListInputExample")
+def _int_list_input_example_shape(op):
+  """Shape function for the "IntListInputExample" op.
+
+  All inputs and the output are matrices of the same size.
+  """
+  output_shape = tf.TensorShape(None)
+  for input_ in op.inputs:
+    output_shape = output_shape.merge_with(input_.get_shape().with_rank(2))
+  return [output_shape]
+```
+
+Since shape inference is an optional feature, and the shapes of tensors may vary
+dynamically, shape functions must be robust to incomplete shape information for
+any of the inputs. The [`merge_with()`](../../api_docs/python/framework.md)
+method allows the caller to assert that two shapes are the same, even if either
+or both of them do not have complete information. Shape functions are defined
+for all of the
+[standard Python ops](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/python/ops/),
+and provide many different usage examples.
+
+[core-array_ops]:https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/ops/array_ops.cc
+[python-user_ops]:https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/python/user_ops/user_ops.py
+[tf-kernels]:https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/kernels/
+[user_ops]:https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/user_ops/
+[pad_op]:https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/kernels/pad_op.cc
+[standard_ops-py]:https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/python/ops/standard_ops.py
+[standard_ops-cc]:https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/cc/ops/standard_ops.h
+[python-BUILD]:https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/python/BUILD
+[validation-macros]:https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/lib/core/errors.h
+[op_def_builder]:https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/framework/op_def_builder.h
+[register_types]:https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/framework/register_types.h
+[FinalizeAttr]:https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/framework/op_def_builder.cc#FinalizeAttr
+[DataTypeString]:https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/framework/types.cc#DataTypeString
+[types-proto]:https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/framework/types.proto
+[TensorShapeProto]:https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/framework/tensor_shape.proto
+[TensorProto]:https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/framework/tensor.proto
diff --git a/tensorflow/g3doc/how_tos/adding_an_op/register_kernels.cc b/tensorflow/g3doc/how_tos/adding_an_op/register_kernels.cc
new file mode 100644
index 0000000000..3d2f50d16e
--- /dev/null
+++ b/tensorflow/g3doc/how_tos/adding_an_op/register_kernels.cc
@@ -0,0 +1,64 @@
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/register_types.h"
+
+using namespace tensorflow;
+
+template <typename T>
+class ZeroOutOp : public OpKernel {
+ public:
+  explicit ZeroOutOp(OpKernelConstruction* context) : OpKernel(context) {}
+
+  void Compute(OpKernelContext* context) override {
+    // Grab the input tensor
+    const Tensor& input_tensor = context->input(0);
+    auto input = input_tensor.flat<T>();
+
+    // Create an output tensor
+    Tensor* output = NULL;
+    OP_REQUIRES_OK(context,
+                   context->allocate_output(0, input_tensor.shape(), &output));
+    auto output_flat = output->template flat<T>();
+
+    // Set all the elements of the output tensor to 0
+    const int N = input.size();
+    for (int i = 0; i < N; i++) {
+      output_flat(i) = 0;
+    }
+
+    // Preserve the first input value
+    if (N > 0) output_flat(0) = input(0);
+  }
+};
+
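+// Three equivalent ways of registering the kernels follow; a real op would
+// use only one of them.
+
+// 1. One REGISTER_KERNEL_BUILDER call per type, written out by hand.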
+REGISTER_KERNEL_BUILDER(Name("ZeroOut")
+                            .Device(DEVICE_CPU)
+                            .TypeConstraint<float>("T"),
+                        ZeroOutOp<float>);
+REGISTER_KERNEL_BUILDER(Name("ZeroOut")
+                            .Device(DEVICE_CPU)
+                            .TypeConstraint<double>("T"),
+                        ZeroOutOp<double>);
+REGISTER_KERNEL_BUILDER(Name("ZeroOut")
+                            .Device(DEVICE_CPU)
+                            .TypeConstraint<int>("T"),
+                        ZeroOutOp<int>);
+
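+// 2. The same registrations, via a local macro.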
+#define REGISTER_KERNEL(type)                                       \
+  REGISTER_KERNEL_BUILDER(                                          \
+      Name("ZeroOut").Device(DEVICE_CPU).TypeConstraint<type>("T"), \
+      ZeroOutOp<type>)
+
+REGISTER_KERNEL(float);
+REGISTER_KERNEL(double);
+REGISTER_KERNEL(int32);
+
+#undef REGISTER_KERNEL
+
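+// 3. The same registrations, via the TF_CALL_* helper from
+// register_types.h, which covers all real number types.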
+#define REGISTER_KERNEL(type)                                       \
+  REGISTER_KERNEL_BUILDER(                                          \
+      Name("ZeroOut").Device(DEVICE_CPU).TypeConstraint<type>("T"), \
+      ZeroOutOp<type>)
+
+TF_CALL_REAL_NUMBER_TYPES(REGISTER_KERNEL);
+
+#undef REGISTER_KERNEL
diff --git a/tensorflow/g3doc/how_tos/adding_an_op/zero_out_1_test.py b/tensorflow/g3doc/how_tos/adding_an_op/zero_out_1_test.py
new file mode 100644
index 0000000000..321f603adf
--- /dev/null
+++ b/tensorflow/g3doc/how_tos/adding_an_op/zero_out_1_test.py
@@ -0,0 +1,18 @@
+"""Test for version 1 of the zero_out op."""
+
+import tensorflow.python.platform
+
+import tensorflow as tf
+from tensorflow.g3doc.how_tos.adding_an_op import gen_zero_out_op_1
+
+
+class ZeroOut1Test(tf.test.TestCase):
+
+  def test(self):
+    with self.test_session():
+      result = gen_zero_out_op_1.zero_out([5, 4, 3, 2, 1])
+      self.assertAllEqual(result.eval(), [5, 0, 0, 0, 0])
+
+
+if __name__ == '__main__':
+ tf.test.main()
diff --git a/tensorflow/g3doc/how_tos/adding_an_op/zero_out_op_kernel_1.cc b/tensorflow/g3doc/how_tos/adding_an_op/zero_out_op_kernel_1.cc
new file mode 100644
index 0000000000..e960adc047
--- /dev/null
+++ b/tensorflow/g3doc/how_tos/adding_an_op/zero_out_op_kernel_1.cc
@@ -0,0 +1,43 @@
+#include "tensorflow/core/framework/op.h"
+#include "tensorflow/core/framework/op_kernel.h"
+
+using namespace tensorflow;
+
+REGISTER_OP("ZeroOut")
+ .Input("to_zero: int32")
+ .Output("zeroed: int32")
+ .Doc(R"doc(
+Zeros out all but the first value of a Tensor.
+
+zeroed: A Tensor whose first value is identical to `to_zero`, and 0
+ otherwise.
+
+)doc");
+
+class ZeroOutOp : public OpKernel {
+ public:
+ explicit ZeroOutOp(OpKernelConstruction* context) : OpKernel(context) {}
+
+ void Compute(OpKernelContext* context) override {
+ // Grab the input tensor
+ const Tensor& input_tensor = context->input(0);
+ auto input = input_tensor.flat<int32>();
+
+ // Create an output tensor
+ Tensor* output_tensor = NULL;
+ OP_REQUIRES_OK(context, context->allocate_output(0, input_tensor.shape(),
+ &output_tensor));
+ auto output = output_tensor->template flat<int32>();
+
+ // Set all but the first element of the output tensor to 0.
+ const int N = input.size();
+ for (int i = 1; i < N; i++) {
+ output(i) = 0;
+ }
+
+ // Preserve the first input value.
+ if (N > 0) output(0) = input(0);
+ }
+};
+
+REGISTER_KERNEL_BUILDER(Name("ZeroOut").Device(DEVICE_CPU), ZeroOutOp);
diff --git a/tensorflow/g3doc/how_tos/graph_viz/index.md b/tensorflow/g3doc/how_tos/graph_viz/index.md
new file mode 100644
index 0000000000..f0a1fc2fe7
--- /dev/null
+++ b/tensorflow/g3doc/how_tos/graph_viz/index.md
@@ -0,0 +1,205 @@
+# TensorBoard: Visualizing Your Graph
+
+TensorFlow computation graphs are powerful but complicated. The graph visualization can help you understand and debug them. Here's an example of the visualization at work.
+
+![Visualization of a TensorFlow graph](./graph_vis_animation.gif "Visualization of a TensorFlow graph")
+*Visualization of a TensorFlow graph.*
+
+To see your own graph, run TensorBoard, point it at the log directory of the job, click the graph tab on the top pane, and select the appropriate run using the menu at the upper left corner. For in-depth information on how to run TensorBoard and make sure you are logging all the necessary information, see [Summaries and TensorBoard](../summaries_and_tensorboard/index.md).
+
+## Name scoping and nodes
+
+Typical TensorFlow graphs can have many thousands of nodes--far too many to see easily all at once, or even to lay out using standard graph tools. To simplify, variable names can be scoped, and the visualization uses this information to define a hierarchy on the nodes in the graph; by default, only the top of this hierarchy is shown. Here is an example that defines three operations under the `hidden` name scope using [`tf.name_scope()`](https://tensorflow.org/api_docs/python/framework.html?cl=head#name_scope):
+
+```python
+import tensorflow as tf
+
+with tf.name_scope('hidden') as scope:
+  a = tf.constant(5, name='alpha')
+  W = tf.Variable(tf.random_uniform([1, 2], -1.0, 1.0), name='weights')
+  b = tf.Variable(tf.zeros([1]), name='biases')
+```
+
+This results in the following three op names:
+
+* *hidden*/alpha
+* *hidden*/weights
+* *hidden*/biases
+
+The visualization will, by default, collapse all three into a node labeled `hidden`.
+The extra detail isn't lost. You can double-click, or click
+on the orange `+` sign in the top right to expand the node, and then you'll see
+three subnodes, for `alpha`, `weights` and `biases`.
+
+Here's a real-life example of a more complicated node in its initial and
+expanded states.
+
+<table width="100%;">
+ <tr>
+ <td style="width: 50%;">
+ <img src="./pool1_collapsed.png" alt="Unexpanded name scope" title="Unexpanded name scope" />
+ </td>
+ <td style="width: 50%;">
+ <img src="./pool1_expanded.png" alt="Expanded name scope" title="Expanded name scope" />
+ </td>
+ </tr>
+ <tr>
+ <td style="width: 50%;">
+ Initial view of top-level name scope <code>pool_1</code>. Clicking on the orange <code>+</code> button on the top right or double-clicking on the node itself will expand it.
+ </td>
+ <td style="width: 50%;">
+ Expanded view of <code>pool_1</code> name scope. Clicking on the orange <code>-</code> button on the top right or double-clicking on the node itself will collapse the name scope.
+ </td>
+ </tr>
+</table>
+
+Grouping nodes by name scopes is critical to making a legible graph. If you're
+building a model, name scopes give you control over the resulting visualization.
+**The better your name scopes, the better your visualization.**
+
+The figure above illustrates a second aspect of the visualization. TensorFlow
+graphs have two kinds of connections: data dependencies and control
+dependencies. Data dependencies show the flow of tensors between two ops and
+are shown as solid arrows, while control dependencies use dotted lines. In the
+expanded view (right side of the figure above) all the connections are data
+dependencies with the exception of the dotted line connecting `CheckNumerics`
+and `control_dependency`.
+
+There's a second trick to simplifying the layout. Most TensorFlow graphs have a
+few nodes with many connections to other nodes. For example, many nodes might
+have a control dependency on an initialization step. Drawing all edges
+between the `init` node and its dependencies would create a very cluttered
+view.
+
+To reduce clutter, the visualization separates out all high-degree nodes to an
+"auxiliary" area on the right and doesn't draw lines to represent their edges.
+Instead of lines, we draw small "node icons" to indicate the connections.
+Separating out the auxiliary nodes typically doesn't remove critical
+information since these nodes are usually related to bookkeeping functions.
+
+<table width="100%;">
+ <tr>
+ <td style="width: 50%;">
+ <img src="./conv_1.png" alt="conv_1 is part of the main graph" title="conv_1 is part of the main graph" />
+ </td>
+ <td style="width: 50%;">
+ <img src="./save.png" alt="save is extracted as auxiliary node" title="save is extracted as auxiliary node" />
+ </td>
+ </tr>
+ <tr>
+ <td style="width: 50%;">
+ Node <code>conv_1</code> is connected to <code>save</code>. Note the little <code>save</code> node icon on its right.
+ </td>
+ <td style="width: 50%;">
+ <code>save</code> has a high degree, and will appear as an auxiliary node. The connection with <code>conv_1</code> is shown as a node icon on its left. To further reduce clutter, since <code>save</code> has a lot of connections, we show the first 5 and abbreviate the others as <code>... 12 more</code>.
+ </td>
+ </tr>
+</table>
+
+One last structural simplification is "series collapsing". Sequential
+motifs--that is, nodes whose names differ by a number at the end and have
+isomorphic structures--are collapsed into a single "stack" of nodes, as shown
+below. For networks with long sequences, this greatly simplifies the view. As
+with hierarchical nodes, double-clicking expands the series.
+
+<table width="100%;">
+ <tr>
+ <td style="width: 50%;">
+ <img src="./series.png" alt="Sequence of nodes" title="Sequence of nodes" />
+ </td>
+ <td style="width: 50%;">
+ <img src="./series_expanded.png" alt="Expanded sequence of nodes" title="Expanded sequence of nodes" />
+ </td>
+ </tr>
+ <tr>
+ <td style="width: 50%;">
+ A collapsed view of a node sequence.
+ </td>
+ <td style="width: 50%;">
+ A small piece of the expanded view, after double-click.
+ </td>
+ </tr>
+</table>
+
+Finally, as one last aid to legibility, the visualization uses special icons
+for constants and summary nodes. To summarize, here's a table of node symbols:
+
+Symbol | Meaning
+--- | ---
+![Name scope](./namespace_node.png "Name scope") | "High-level" node representing a name scope. Double-click to expand a high-level node.
+![Sequence of unconnected nodes](./horizontal_stack.png "Sequence of unconnected nodes") | Sequence of numbered nodes that are not connected to each other.
+![Sequence of connected nodes](./vertical_stack.png "Sequence of connected nodes") | Sequence of numbered nodes that are connected to each other.
+![Operation node](./op_node.png "Operation node") | An individual operation node.
+![Constant node](./constant.png "Constant node") | A constant.
+![Summary node](./summary.png "Summary node") | A summary node.
+![Data flow edge](./dataflow_edge.png "Data flow edge") | Edge showing the data flow between operations.
+![Control dependency edge](./control_edge.png "Control dependency edge") | Edge showing the control dependency between operations.
+![Reference edge](./reference_edge.png "Reference edge") | A reference edge showing that the outgoing operation node can mutate the incoming tensor.
+
+## Interaction
+
+Navigate the graph by panning and zooming. Click and drag to pan, and use a
+scroll gesture to zoom. Double-click on a node, or click on its `+` button, to
+expand a name scope that represents a group of operations. To easily keep
+track of the current viewpoint when zooming and panning, there is a minimap in
+the bottom right corner.
+
+To close an open node, double-click it again or click its `-` button. You can
+also click once to select a node. It will turn a darker color, and details
+about it and the nodes it connects to will appear in the info card in the
+upper right corner of the visualization.
+
+<table width="100%;">
+ <tr>
+ <td style="width: 50%;">
+ <img src="./infocard.png" alt="Info card of a name scope" title="Info card of a name scope" />
+ </td>
+ <td style="width: 50%;">
+ <img src="./infocard_op.png" alt="Info card of operation node" title="Info card of operation node" />
+ </td>
+ </tr>
+ <tr>
+ <td style="width: 50%;">
+ Info card showing detailed information for the <code>conv2</code> name scope. The inputs and outputs are combined from the inputs and outputs of the operation nodes inside the name scope. For name scopes no attributes are shown.
+ </td>
+ <td style="width: 50%;">
+ Info card showing detailed information for the <code>DecodeRaw</code> operation node. In addition to inputs and outputs, the card shows the device and the attributes associated with the current operation.
+ </td>
+ </tr>
+</table>
+
+Selection can also be helpful in understanding high-degree nodes. Select any
+high-degree node, and the corresponding "node icons" for its other connections
+will be selected as well. This makes it easy, for example, to see which nodes
+are being saved--and which aren't.
+
+Clicking on a node name in the info card will select it. If necessary, the
+viewpoint will automatically pan so that the node is visible.
+
+Finally, you can choose two color schemes for your graph, using the color menu
+above the legend. The default "Structure View" shows structure: when two
+high-level nodes have the same structure, they appear in the same color of the
+rainbow. Uniquely structured nodes are gray. There's a second view, which shows
+what device the different operations run on. Name scopes are colored
+proportionally to the fraction of devices for the operations inside them.
+
+The images below give an illustration of a piece of a real-life graph.
+
+<table width="100%;">
+ <tr>
+ <td style="width: 50%;">
+ <img src="./colorby_structure.png" alt="Color by structure" title="Color by structure" />
+ </td>
+ <td style="width: 50%;">
+ <img src="./colorby_device.png" alt="Color by device" title="Color by device" />
+ </td>
+ </tr>
+ <tr>
+ <td style="width: 50%;">
+ Structure view: The gray nodes have unique structure. The orange <code>conv1</code> and <code>conv2</code> nodes have the same structure, and analogously for nodes with other colors.
+ </td>
+ <td style="width: 50%;">
+      Device view: Name scopes are colored proportionally to the fraction of devices of the operation nodes inside them. Here, purple means GPU and green means CPU.
+ </td>
+ </tr>
+</table>
diff --git a/tensorflow/g3doc/how_tos/index.md b/tensorflow/g3doc/how_tos/index.md
new file mode 100644
index 0000000000..f5c74715e8
--- /dev/null
+++ b/tensorflow/g3doc/how_tos/index.md
@@ -0,0 +1,102 @@
+# Overview
+
+
+## Variables: Creation, Initializing, Saving, and Restoring
+
+TensorFlow Variables are in-memory buffers containing tensors. Learn how to
+use them to hold and update model parameters during training.
+
+[View Tutorial](variables/index.md)
+
+
+## TensorFlow Mechanics 101
+
+A step-by-step walk through of the details of using TensorFlow infrastructure
+to train models at scale, using MNIST handwritten digit recognition as a toy
+example.
+
+[View Tutorial](../tutorials/mnist/tf/index.md)
+
+
+## TensorBoard: Visualizing Your Training
+
+TensorBoard is a useful tool for visualizing the training and evaluation of
+your model(s). This tutorial describes how to build and run TensorBoard as well
+as how to add Summary ops to automatically output data to the Events files that
+TensorBoard uses for display.
+
+[View Tutorial](summaries_and_tensorboard/index.md)
+
+
+## TensorBoard: Visualizing Your Graph
+
+This tutorial describes how to use the graph visualizer in TensorBoard to help
+you understand the dataflow graph and debug it.
+
+[View Tutorial](graph_viz/index.md)
+
+
+## Reading Data
+
+This tutorial describes the three main methods of getting data into your
+TensorFlow program: Feeding, Reading and Preloading.
+
+[View Tutorial](reading_data/index.md)
+
+
+## Threading and Queues
+
+This tutorial describes the various constructs implemented by TensorFlow
+to facilitate asynchronous and concurrent training.
+
+[View Tutorial](threading_and_queues/index.md)
+
+
+## Adding a New Op
+
+TensorFlow already has a large suite of node operations that you can compose
+in your graph, but here are the details of how to add your own custom Op.
+
+[View Tutorial](adding_an_op/index.md)
+
+
+## New Data Formats
+
+If you have a sizable custom data set, you may want to consider extending
+TensorFlow to read your data directly in its native format. Here's how.
+
+[View Tutorial](new_data_formats/index.md)
+
+
+## Using One or More GPUs
+
+This tutorial describes how to construct and execute models on GPU(s).
+
+[View Tutorial](using_gpu/index.md)
+
+
+## Sharing Variables
+
+When deploying large models on multiple GPUs, or when unrolling complex LSTMs
+or RNNs, it is often necessary to access the same Variable objects from
+different locations in the model construction code.
+
+The "Variable Scope" mechanism is designed to facilitate that.
+
+[View Tutorial](variable_scope/index.md)
+
+<div class='sections-order' style="display: none;">
+<!--
+<!-- variables/index.md -->
+<!-- ../tutorials/mnist/tf/index.md -->
+<!-- summaries_and_tensorboard/index.md -->
+<!-- graph_viz/index.md -->
+<!-- reading_data/index.md -->
+<!-- threading_and_queues/index.md -->
+<!-- adding_an_op/index.md -->
+<!-- new_data_formats/index.md -->
+<!-- using_gpu/index.md -->
+<!-- variable_scope/index.md -->
+-->
+</div>
+
diff --git a/tensorflow/g3doc/how_tos/new_data_formats/index.md b/tensorflow/g3doc/how_tos/new_data_formats/index.md
new file mode 100644
index 0000000000..b1b09fe1ff
--- /dev/null
+++ b/tensorflow/g3doc/how_tos/new_data_formats/index.md
@@ -0,0 +1,225 @@
+# Extending TF: Supporting new data formats
+
+PREREQUISITES:
+
+* Some familiarity with C++.
+* Must have
+ [downloaded TensorFlow source](../../get_started/os_setup.md#source), and be
+ able to build it.
+
+We divide the task of supporting a file format into two pieces:
+
+* File formats: We use a *Reader* Op to read a *record* (which can be any
+ string) from a file.
+* Record formats: We use decoder or parsing Ops to turn a string record
+ into tensors usable by TensorFlow.
+
+For example, to read a
+[CSV file](https://en.wikipedia.org/wiki/Comma-separated_values), we use
+[a Reader for text files](../../api_docs/python/io_ops.md#TextLineReader)
+followed by
+[an Op that parses CSV data from a line of text](../../api_docs/python/io_ops.md#decode_csv).
+
+<!-- TOC-BEGIN This section is generated by neural network: DO NOT EDIT! -->
+## Contents
+* [Writing a Reader for a file format](#AUTOGENERATED-writing-a-reader-for-a-file-format)
+* [Writing an Op for a record format](#AUTOGENERATED-writing-an-op-for-a-record-format)
+
+
+<!-- TOC-END This section was generated by neural network, THANKS FOR READING! -->
+
+## Writing a Reader for a file format <div class="md-anchor" id="AUTOGENERATED-writing-a-reader-for-a-file-format">{#AUTOGENERATED-writing-a-reader-for-a-file-format}</div>
+
+A `Reader` is something that reads records from a file. There are some examples
+of Reader Ops already built into TensorFlow:
+
+* [`tf.TFRecordReader`](../../api_docs/python/io_ops.md#TFRecordReader)
+ ([source in kernels/tf_record_reader_op.cc](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/kernels/tf_record_reader_op.cc))
+* [`tf.FixedLengthRecordReader`](../../api_docs/python/io_ops.md#FixedLengthRecordReader)
+ ([source in kernels/fixed_length_record_reader_op.cc](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/kernels/fixed_length_record_reader_op.cc))
+* [`tf.TextLineReader`](../../api_docs/python/io_ops.md#TextLineReader)
+ ([source in kernels/text_line_reader_op.cc](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/kernels/text_line_reader_op.cc))
+
+You can see that these all expose the same interface; the only differences
+are in their constructors. The most important method is `read()`.
+It takes a queue argument, which is where it gets filenames to
+read from whenever it needs one (e.g. when the `read` op first runs, or
+when the previous `read` returns the last record from a file). It produces
+two scalar tensors: a string key and a string value.
+
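+For example, a minimal sketch of driving a reader from Python (the filenames
+here are hypothetical; any of the readers above works the same way):
+
+```python
+import tensorflow as tf
+
+# The queue of filenames feeds the reader; read() dequeues a filename
+# internally whenever it needs to open a new file.
+filename_queue = tf.train.string_input_producer(["file0.txt", "file1.txt"])
+reader = tf.TextLineReader()
+key, value = reader.read(filename_queue)  # two scalar string tensors
+
+with tf.Session() as sess:
+  coord = tf.train.Coordinator()
+  threads = tf.train.start_queue_runners(sess=sess, coord=coord)
+  print sess.run([key, value])  # e.g. 'file0.txt:1' and the first line
+  coord.request_stop()
+  coord.join(threads)
+```
+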
+To create a new reader called `SomeReader`, you will need to:
+
+1. In C++, define a subclass of
+ [`tensorflow::ReaderBase`](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/kernels/reader_base.h)
+ called `SomeReader`.
+2. In C++, register a new reader op and kernel with the name `"SomeReader"`.
+3. In Python, define a subclass of [`tf.ReaderBase`](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/python/ops/io_ops.py) called `SomeReader`.
+
+You can put all the C++ code in a file in
+`tensorflow/core/user_ops/some_reader_op.cc`. The code to read a file will live
+in a descendant of the C++ `ReaderBase` class, which is defined in
+[tensorflow/core/kernels/reader_base.h](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/kernels/reader_base.h).
+You will need to implement the following methods:
+
+* `OnWorkStartedLocked`: open the next file
+* `ReadLocked`: read a record or report EOF/error
+* `OnWorkFinishedLocked`: close the current file
+* `ResetLocked`: get a clean slate after, e.g., an error
+
+These methods have names ending in "Locked" since `ReaderBase` makes sure
+to acquire a mutex before calling any of these methods, so you generally don't
+have to worry about thread safety (though that only protects the members of the
+class, not global state).
+
+For `OnWorkStartedLocked`, the name of the file to open is the value returned by
+the `current_work()` method. `ReadLocked()` has this signature:
+
+```c++
+Status ReadLocked(string* key, string* value, bool* produced, bool* at_end)
+```
+
+If `ReadLocked()` successfully reads a record from the file, it should fill in:
+
+* `*key`: with an identifier for the record that a human could use to find
+  the record again. For example, you might combine the filename from
+  `current_work()` with a record number.
+* `*value`: with the contents of the record.
+* `*produced`: set to `true`.
+
+If you hit the end of a file (EOF), set `*at_end` to `true`. In either case,
+return `Status::OK()`. If there is an error, simply return it using one of the
+helper functions from
+[tensorflow/core/lib/core/errors.h](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/lib/core/errors.h)
+without modifying any arguments.
+
+Next you will create the actual Reader op. It will help if you are familiar
+with [the adding an op how-to](../adding_an_op/index.md). The main steps
+are:
+
+* Register the op.
+* Define and register an `OpKernel`.
+
+To register the op, you will use a `REGISTER_OP()` call defined in
+[tensorflow/core/framework/op.h](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/framework/op.h).
+Reader ops never take any input and always have a single output with type
+`Ref(string)`. They should always call `SetIsStateful()` and have string
+`container` and `shared_name` attrs. You may optionally define additional attrs
+for configuration or include documentation in a `Doc()`. For examples, see
+[tensorflow/core/ops/io_ops.cc](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/ops/io_ops.cc),
+e.g.:
+
+```c++
+#include "tensorflow/core/framework/op.h"
+
+REGISTER_OP("TextLineReader")
+ .Output("reader_handle: Ref(string)")
+ .Attr("skip_header_lines: int = 0")
+ .Attr("container: string = ''")
+ .Attr("shared_name: string = ''")
+ .SetIsStateful()
+ .Doc(R"doc(
+A Reader that outputs the lines of a file delimited by '\n'.
+)doc");
+```
+
+To define an `OpKernel`, Readers can use the shortcut of descending from
+`ReaderOpKernel`, defined in
+[tensorflow/core/framework/reader_op_kernel.h](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/framework/reader_op_kernel.h),
+and implement a constructor that calls `SetReaderFactory()`. After defining
+your class, you will need to register it using `REGISTER_KERNEL_BUILDER(...)`.
+An example with no attrs:
+
+```c++
+#include "tensorflow/core/framework/reader_op_kernel.h"
+
+class TFRecordReaderOp : public ReaderOpKernel {
+ public:
+ explicit TFRecordReaderOp(OpKernelConstruction* context)
+ : ReaderOpKernel(context) {
+ Env* env = context->env();
+ SetReaderFactory([this, env]() { return new TFRecordReader(name(), env); });
+ }
+};
+
+REGISTER_KERNEL_BUILDER(Name("TFRecordReader").Device(DEVICE_CPU),
+ TFRecordReaderOp);
+```
+
+An example with attrs:
+
+```c++
+#include "tensorflow/core/framework/reader_op_kernel.h"
+
+class TextLineReaderOp : public ReaderOpKernel {
+ public:
+ explicit TextLineReaderOp(OpKernelConstruction* context)
+ : ReaderOpKernel(context) {
+ int skip_header_lines = -1;
+ OP_REQUIRES_OK(context,
+ context->GetAttr("skip_header_lines", &skip_header_lines));
+ OP_REQUIRES(context, skip_header_lines >= 0,
+ errors::InvalidArgument("skip_header_lines must be >= 0 not ",
+ skip_header_lines));
+ Env* env = context->env();
+ SetReaderFactory([this, skip_header_lines, env]() {
+ return new TextLineReader(name(), skip_header_lines, env);
+ });
+ }
+};
+
+REGISTER_KERNEL_BUILDER(Name("TextLineReader").Device(DEVICE_CPU),
+ TextLineReaderOp);
+```
+
+The last step is to add the Python wrapper. You will import
+`tensorflow.python.ops.io_ops` in
+[tensorflow/python/user_ops/user_ops.py](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/python/user_ops/user_ops.py)
+and add a descendant of [`io_ops.ReaderBase`](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/python/ops/io_ops.py).
+
+```python
+from tensorflow.python.framework import ops
+from tensorflow.python.ops import common_shapes
+# Assumes the standard layout for generated op wrappers.
+from tensorflow.python.ops import gen_user_ops
+from tensorflow.python.ops import io_ops
+
+class SomeReader(io_ops.ReaderBase):
+
+ def __init__(self, name=None):
+ rr = gen_user_ops.some_reader(name=name)
+ super(SomeReader, self).__init__(rr)
+
+
+ops.NoGradient("SomeReader")
+ops.RegisterShape("SomeReader")(common_shapes.scalar_shape)
+```
+
+You can see some examples in
+[`tensorflow/python/ops/io_ops.py`](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/python/ops/io_ops.py).
+
+## Writing an Op for a record format <div class="md-anchor" id="AUTOGENERATED-writing-an-op-for-a-record-format">{#AUTOGENERATED-writing-an-op-for-a-record-format}</div>
+
+Generally this is an ordinary op that takes a scalar string record as input,
+so follow [the instructions to add an Op](../adding_an_op/index.md). You may
+optionally take a scalar string key as input, and include that in error messages
+reporting improperly formatted data. That way users can more easily track down
+where the bad data came from.
+
+Examples of Ops useful for decoding records:
+
+* [`tf.parse_single_example`](../../api_docs/python/io_ops.md#parse_single_example)
+ (and
+ [`tf.parse_example`](../../api_docs/python/io_ops.md#parse_example))
+* [`tf.decode_csv`](../../api_docs/python/io_ops.md#decode_csv)
+* [`tf.decode_raw`](../../api_docs/python/io_ops.md#decode_raw)
+
+Note that it can be useful to use multiple Ops to decode a particular record
+format. For example, you may have an image saved as a string in
+[a tf.train.Example protocol buffer](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/example/example.proto).
+Depending on the format of that image, you might take the corresponding output
+from a
+[`tf.parse_single_example`](../../api_docs/python/io_ops.md#parse_single_example)
+op and call [`tf.decode_jpeg`](../../api_docs/python/image.md#decode_jpeg),
+[`tf.decode_png`](../../api_docs/python/image.md#decode_png), or
+[`tf.decode_raw`](../../api_docs/python/io_ops.md#decode_raw). It is common to
+take the output of `tf.decode_raw` and use
+[`tf.slice`](../../api_docs/python/array_ops.md#slice) and
+[`tf.reshape`](../../api_docs/python/array_ops.md#reshape) to extract pieces.
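+
+As a sketch of that last pattern (the feature name and record layout here are
+hypothetical), assuming `serialized` is the scalar string output of a reader:
+
+```python
+features = tf.parse_single_example(
+    serialized,
+    dense_keys=['record_bytes'],
+    # No default specified since the key is required.
+    dense_types=[tf.string])
+
+# decode_raw yields a flat uint8 vector; slice and reshape carve it up.
+record = tf.decode_raw(features['record_bytes'], tf.uint8)
+# Hypothetical layout: 1 label byte followed by a 32x32x3 image.
+label = tf.cast(tf.slice(record, [0], [1]), tf.int32)
+image = tf.reshape(tf.slice(record, [1], [32 * 32 * 3]), [3, 32, 32])
+```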
diff --git a/tensorflow/g3doc/how_tos/reading_data/__init__.py b/tensorflow/g3doc/how_tos/reading_data/__init__.py
new file mode 100755
index 0000000000..e69de29bb2
--- /dev/null
+++ b/tensorflow/g3doc/how_tos/reading_data/__init__.py
diff --git a/tensorflow/g3doc/how_tos/reading_data/convert_to_records.py b/tensorflow/g3doc/how_tos/reading_data/convert_to_records.py
new file mode 100644
index 0000000000..1d510cdfa9
--- /dev/null
+++ b/tensorflow/g3doc/how_tos/reading_data/convert_to_records.py
@@ -0,0 +1,87 @@
+"""Converts MNIST data to TFRecords file format with Example protos."""
+
+import os
+import tensorflow.python.platform
+
+import numpy
+import tensorflow as tf
+from tensorflow.g3doc.tutorials.mnist import input_data
+
+
+TRAIN_IMAGES = 'train-images-idx3-ubyte.gz' # MNIST filenames
+TRAIN_LABELS = 'train-labels-idx1-ubyte.gz'
+TEST_IMAGES = 't10k-images-idx3-ubyte.gz'
+TEST_LABELS = 't10k-labels-idx1-ubyte.gz'
+
+
+tf.app.flags.DEFINE_string('directory', 'data',
+ 'Directory to download data files and write the '
+ 'converted result')
+tf.app.flags.DEFINE_integer('validation_size', 5000,
+ 'Number of examples to separate from the training '
+ 'data for the validation set.')
+FLAGS = tf.app.flags.FLAGS
+
+
+def _int64_feature(value):
+ return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))
+
+
+def _bytes_feature(value):
+ return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))
+
+
+def convert_to(images, labels, name):
+ num_examples = labels.shape[0]
+ if images.shape[0] != num_examples:
+    raise ValueError("Images size %d does not match label size %d." %
+                     (images.shape[0], num_examples))
+ rows = images.shape[1]
+ cols = images.shape[2]
+ depth = images.shape[3]
+
+ filename = os.path.join(FLAGS.directory, name + '.tfrecords')
+ print 'Writing', filename
+ writer = tf.python_io.TFRecordWriter(filename)
+ for index in range(num_examples):
+ image_raw = images[index].tostring()
+ example = tf.train.Example(features=tf.train.Features(feature={
+ 'height':_int64_feature(rows),
+ 'width':_int64_feature(cols),
+ 'depth':_int64_feature(depth),
+ 'label':_int64_feature(int(labels[index])),
+ 'image_raw':_bytes_feature(image_raw)}))
+    writer.write(example.SerializeToString())
+  writer.close()
+
+
+def main(argv):
+ # Get the data.
+ train_images_filename = input_data.maybe_download(
+ TRAIN_IMAGES, FLAGS.directory)
+ train_labels_filename = input_data.maybe_download(
+ TRAIN_LABELS, FLAGS.directory)
+ test_images_filename = input_data.maybe_download(
+ TEST_IMAGES, FLAGS.directory)
+ test_labels_filename = input_data.maybe_download(
+ TEST_LABELS, FLAGS.directory)
+
+ # Extract it into numpy arrays.
+ train_images = input_data.extract_images(train_images_filename)
+ train_labels = input_data.extract_labels(train_labels_filename)
+ test_images = input_data.extract_images(test_images_filename)
+ test_labels = input_data.extract_labels(test_labels_filename)
+
+ # Generate a validation set.
+ validation_images = train_images[:FLAGS.validation_size, :, :, :]
+ validation_labels = train_labels[:FLAGS.validation_size]
+ train_images = train_images[FLAGS.validation_size:, :, :, :]
+ train_labels = train_labels[FLAGS.validation_size:]
+
+ # Convert to Examples and write the result to TFRecords.
+ convert_to(train_images, train_labels, 'train')
+ convert_to(validation_images, validation_labels, 'validation')
+ convert_to(test_images, test_labels, 'test')
+
+
+if __name__ == '__main__':
+ tf.app.run()
diff --git a/tensorflow/g3doc/how_tos/reading_data/fully_connected_preloaded.py b/tensorflow/g3doc/how_tos/reading_data/fully_connected_preloaded.py
new file mode 100644
index 0000000000..b2436cd2ab
--- /dev/null
+++ b/tensorflow/g3doc/how_tos/reading_data/fully_connected_preloaded.py
@@ -0,0 +1,134 @@
+"""Trains the MNIST network using preloaded data in a constant.
+
+Command to run this py_binary target:
+
+bazel run -c opt \
+ <...>/tensorflow/g3doc/how_tos/reading_data:fully_connected_preloaded
+"""
+import os.path
+import time
+
+import tensorflow.python.platform
+import numpy
+import tensorflow as tf
+
+from tensorflow.g3doc.tutorials.mnist import input_data
+from tensorflow.g3doc.tutorials.mnist import mnist
+
+
+# Basic model parameters as external flags.
+flags = tf.app.flags
+FLAGS = flags.FLAGS
+flags.DEFINE_float('learning_rate', 0.01, 'Initial learning rate.')
+flags.DEFINE_integer('num_epochs', 2, 'Number of epochs to run trainer.')
+flags.DEFINE_integer('hidden1', 128, 'Number of units in hidden layer 1.')
+flags.DEFINE_integer('hidden2', 32, 'Number of units in hidden layer 2.')
+flags.DEFINE_integer('batch_size', 100, 'Batch size. '
+ 'Must divide evenly into the dataset sizes.')
+flags.DEFINE_string('train_dir', 'data', 'Directory to put the training data.')
+flags.DEFINE_boolean('fake_data', False, 'If true, uses fake data '
+ 'for unit testing.')
+
+
+def run_training():
+ """Train MNIST for a number of epochs."""
+ # Get the sets of images and labels for training, validation, and
+ # test on MNIST.
+ data_sets = input_data.read_data_sets(FLAGS.train_dir, FLAGS.fake_data)
+
+ # Tell TensorFlow that the model will be built into the default Graph.
+ with tf.Graph().as_default():
+ with tf.name_scope('input'):
+ # Input data
+ input_images = tf.constant(data_sets.train.images)
+ input_labels = tf.constant(data_sets.train.labels)
+
+ image, label = tf.train.slice_input_producer(
+ [input_images, input_labels], num_epochs=FLAGS.num_epochs)
+ label = tf.cast(label, tf.int32)
+ images, labels = tf.train.batch(
+ [image, label], batch_size=FLAGS.batch_size)
+
+ # Build a Graph that computes predictions from the inference model.
+ logits = mnist.inference(images, FLAGS.hidden1, FLAGS.hidden2)
+
+ # Add to the Graph the Ops for loss calculation.
+ loss = mnist.loss(logits, labels)
+
+ # Add to the Graph the Ops that calculate and apply gradients.
+ train_op = mnist.training(loss, FLAGS.learning_rate)
+
+ # Add the Op to compare the logits to the labels during evaluation.
+ eval_correct = mnist.evaluation(logits, labels)
+
+ # Build the summary operation based on the TF collection of Summaries.
+ summary_op = tf.merge_all_summaries()
+
+ # Create a saver for writing training checkpoints.
+ saver = tf.train.Saver()
+
+ # Create the op for initializing variables.
+ init_op = tf.initialize_all_variables()
+
+ # Create a session for running Ops on the Graph.
+ sess = tf.Session()
+
+ # Run the Op to initialize the variables.
+ sess.run(init_op)
+
+ # Instantiate a SummaryWriter to output summaries and the Graph.
+ summary_writer = tf.train.SummaryWriter(FLAGS.train_dir,
+ graph_def=sess.graph_def)
+
+ # Start input enqueue threads.
+ coord = tf.train.Coordinator()
+ threads = tf.train.start_queue_runners(sess=sess, coord=coord)
+
+ # And then after everything is built, start the training loop.
+ try:
+ step = 0
+ while not coord.should_stop():
+ start_time = time.time()
+
+ # Run one step of the model.
+ _, loss_value = sess.run([train_op, loss])
+
+ duration = time.time() - start_time
+
+ # Write the summaries and print an overview fairly often.
+ if step % 100 == 0:
+ # Print status to stdout.
+ print 'Step %d: loss = %.2f (%.3f sec)' % (step,
+ loss_value,
+ duration)
+ # Update the events file.
+ summary_str = sess.run(summary_op)
+ summary_writer.add_summary(summary_str, step)
+
+ # Save a checkpoint periodically.
+ if (step + 1) % 1000 == 0:
+ print 'Saving'
+ saver.save(sess, FLAGS.train_dir, global_step=step)
+
+ step += 1
+ except tf.errors.OutOfRangeError:
+ print 'Saving'
+ saver.save(sess, FLAGS.train_dir, global_step=step)
+ print 'Done training for %d epochs, %d steps.' % (
+ FLAGS.num_epochs, step)
+ finally:
+ # When done, ask the threads to stop.
+ coord.request_stop()
+
+ # Wait for threads to finish.
+ coord.join(threads)
+ sess.close()
+
+
+def main(_):
+ run_training()
+
+
+if __name__ == '__main__':
+ tf.app.run()
diff --git a/tensorflow/g3doc/how_tos/reading_data/fully_connected_preloaded_var.py b/tensorflow/g3doc/how_tos/reading_data/fully_connected_preloaded_var.py
new file mode 100644
index 0000000000..89abd60d0e
--- /dev/null
+++ b/tensorflow/g3doc/how_tos/reading_data/fully_connected_preloaded_var.py
@@ -0,0 +1,146 @@
+"""Trains the MNIST network using preloaded data stored in a variable.
+
+Command to run this py_binary target:
+
+bazel run -c opt \
+ <...>/tensorflow/g3doc/how_tos/reading_data:fully_connected_preloaded_var
+"""
+import os.path
+import time
+
+import tensorflow.python.platform
+import numpy
+import tensorflow as tf
+
+from tensorflow.g3doc.tutorials.mnist import input_data
+from tensorflow.g3doc.tutorials.mnist import mnist
+
+
+# Basic model parameters as external flags.
+flags = tf.app.flags
+FLAGS = flags.FLAGS
+flags.DEFINE_float('learning_rate', 0.01, 'Initial learning rate.')
+flags.DEFINE_integer('num_epochs', 2, 'Number of epochs to run trainer.')
+flags.DEFINE_integer('hidden1', 128, 'Number of units in hidden layer 1.')
+flags.DEFINE_integer('hidden2', 32, 'Number of units in hidden layer 2.')
+flags.DEFINE_integer('batch_size', 100, 'Batch size. '
+ 'Must divide evenly into the dataset sizes.')
+flags.DEFINE_string('train_dir', 'data', 'Directory to put the training data.')
+flags.DEFINE_boolean('fake_data', False, 'If true, uses fake data '
+ 'for unit testing.')
+
+
+def run_training():
+ """Train MNIST for a number of epochs."""
+ # Get the sets of images and labels for training, validation, and
+ # test on MNIST.
+ data_sets = input_data.read_data_sets(FLAGS.train_dir, FLAGS.fake_data)
+
+ # Tell TensorFlow that the model will be built into the default Graph.
+ with tf.Graph().as_default():
+ with tf.name_scope('input'):
+ # Input data
+ images_initializer = tf.placeholder(
+ dtype=data_sets.train.images.dtype,
+ shape=data_sets.train.images.shape)
+ labels_initializer = tf.placeholder(
+ dtype=data_sets.train.labels.dtype,
+ shape=data_sets.train.labels.shape)
+ input_images = tf.Variable(
+ images_initializer, trainable=False, collections=[])
+ input_labels = tf.Variable(
+ labels_initializer, trainable=False, collections=[])
+
+ image, label = tf.train.slice_input_producer(
+ [input_images, input_labels], num_epochs=FLAGS.num_epochs)
+ label = tf.cast(label, tf.int32)
+ images, labels = tf.train.batch(
+ [image, label], batch_size=FLAGS.batch_size)
+
+ # Build a Graph that computes predictions from the inference model.
+ logits = mnist.inference(images, FLAGS.hidden1, FLAGS.hidden2)
+
+ # Add to the Graph the Ops for loss calculation.
+ loss = mnist.loss(logits, labels)
+
+ # Add to the Graph the Ops that calculate and apply gradients.
+ train_op = mnist.training(loss, FLAGS.learning_rate)
+
+ # Add the Op to compare the logits to the labels during evaluation.
+ eval_correct = mnist.evaluation(logits, labels)
+
+ # Build the summary operation based on the TF collection of Summaries.
+ summary_op = tf.merge_all_summaries()
+
+ # Create a saver for writing training checkpoints.
+ saver = tf.train.Saver()
+
+ # Create the op for initializing variables.
+ init_op = tf.initialize_all_variables()
+
+ # Create a session for running Ops on the Graph.
+ sess = tf.Session()
+
+ # Run the Op to initialize the variables.
+ sess.run(init_op)
+ sess.run(input_images.initializer,
+ feed_dict={images_initializer: data_sets.train.images})
+ sess.run(input_labels.initializer,
+ feed_dict={labels_initializer: data_sets.train.labels})
+
+ # Instantiate a SummaryWriter to output summaries and the Graph.
+ summary_writer = tf.train.SummaryWriter(FLAGS.train_dir,
+ graph_def=sess.graph_def)
+
+ # Start input enqueue threads.
+ coord = tf.train.Coordinator()
+ threads = tf.train.start_queue_runners(sess=sess, coord=coord)
+
+ # And then after everything is built, start the training loop.
+ try:
+ step = 0
+ while not coord.should_stop():
+ start_time = time.time()
+
+ # Run one step of the model.
+ _, loss_value = sess.run([train_op, loss])
+
+ duration = time.time() - start_time
+
+ # Write the summaries and print an overview fairly often.
+ if step % 100 == 0:
+ # Print status to stdout.
+ print 'Step %d: loss = %.2f (%.3f sec)' % (step,
+ loss_value,
+ duration)
+ # Update the events file.
+ summary_str = sess.run(summary_op)
+ summary_writer.add_summary(summary_str, step)
+
+ # Save a checkpoint periodically.
+ if (step + 1) % 1000 == 0:
+ print 'Saving'
+ saver.save(sess, FLAGS.train_dir, global_step=step)
+
+ step += 1
+ except tf.errors.OutOfRangeError:
+ print 'Saving'
+ saver.save(sess, FLAGS.train_dir, global_step=step)
+ print 'Done training for %d epochs, %d steps.' % (
+ FLAGS.num_epochs, step)
+ finally:
+ # When done, ask the threads to stop.
+ coord.request_stop()
+
+ # Wait for threads to finish.
+ coord.join(threads)
+ sess.close()
+
+
+def main(_):
+ run_training()
+
+
+if __name__ == '__main__':
+ tf.app.run()
diff --git a/tensorflow/g3doc/how_tos/reading_data/fully_connected_reader.py b/tensorflow/g3doc/how_tos/reading_data/fully_connected_reader.py
new file mode 100644
index 0000000000..f1e10ca34e
--- /dev/null
+++ b/tensorflow/g3doc/how_tos/reading_data/fully_connected_reader.py
@@ -0,0 +1,180 @@
+"""Train and Eval the MNIST network.
+
+This version is like fully_connected_feed.py but uses data converted
+to a TFRecords file containing tf.train.Example protocol buffers.
+See tensorflow/g3doc/how_tos/reading_data/index.md#reading-from-files
+for context.
+
+YOU MUST run convert_to_records before running this (but you only need to
+run it once).
+"""
+
+import os.path
+import time
+
+import tensorflow.python.platform
+import numpy
+import tensorflow as tf
+
+from tensorflow.g3doc.tutorials.mnist import mnist
+
+
+# Basic model parameters as external flags.
+flags = tf.app.flags
+FLAGS = flags.FLAGS
+flags.DEFINE_float('learning_rate', 0.01, 'Initial learning rate.')
+flags.DEFINE_integer('num_epochs', 2, 'Number of epochs to run trainer.')
+flags.DEFINE_integer('hidden1', 128, 'Number of units in hidden layer 1.')
+flags.DEFINE_integer('hidden2', 32, 'Number of units in hidden layer 2.')
+flags.DEFINE_integer('batch_size', 100, 'Batch size.')
+flags.DEFINE_string('train_dir', 'data', 'Directory with the training data.')
+
+# Constants used for dealing with the files, matches convert_to_records.
+TRAIN_FILE = 'train.tfrecords'
+VALIDATION_FILE = 'validation.tfrecords'
+
+
+def read_and_decode(filename_queue):
+ reader = tf.TFRecordReader()
+ _, serialized_example = reader.read(filename_queue)
+ features = tf.parse_single_example(
+ serialized_example,
+ dense_keys=['image_raw', 'label'],
+ # Defaults are not specified since both keys are required.
+ dense_types=[tf.string, tf.int64])
+
+ # Convert from a scalar string tensor (whose single string has
+ # length mnist.IMAGE_PIXELS) to a uint8 tensor with shape
+ # [mnist.IMAGE_PIXELS].
+ image = tf.decode_raw(features['image_raw'], tf.uint8)
+ image.set_shape([mnist.IMAGE_PIXELS])
+
+ # OPTIONAL: Could reshape into a 28x28 image and apply distortions
+ # here. Since we are not applying any distortions in this
+ # example, and the next step expects the image to be flattened
+ # into a vector, we don't bother.
+
+ # Convert from [0, 255] -> [-0.5, 0.5] floats.
+ image = tf.cast(image, tf.float32) * (1. / 255) - 0.5
+
+ # Convert label from a scalar uint8 tensor to an int32 scalar.
+ label = tf.cast(features['label'], tf.int32)
+
+ return image, label
+
+
+def inputs(train, batch_size, num_epochs):
+ """Reads input data num_epochs times.
+
+ Args:
+ train: Selects between the training (True) and validation (False) data.
+ batch_size: Number of examples per returned batch.
+ num_epochs: Number of times to read the input data, or 0/None to
+ train forever.
+
+ Returns:
+ A tuple (images, labels), where:
+ * images is a float tensor with shape [batch_size, mnist.IMAGE_PIXELS]
+ in the range [-0.5, 0.5].
+ * labels is an int32 tensor with shape [batch_size] with the true label,
+ a number in the range [0, mnist.NUM_CLASSES).
+  Note that a tf.train.QueueRunner is added to the graph, which
+ must be run using e.g. tf.train.start_queue_runners().
+ """
+ if not num_epochs: num_epochs = None
+ filename = os.path.join(FLAGS.train_dir,
+ TRAIN_FILE if train else VALIDATION_FILE)
+
+ with tf.name_scope('input'):
+ filename_queue = tf.train.string_input_producer(
+ [filename], num_epochs=num_epochs)
+
+ # Even when reading in multiple threads, share the filename
+ # queue.
+ image, label = read_and_decode(filename_queue)
+
+ # Shuffle the examples and collect them into batch_size batches.
+ # (Internally uses a RandomShuffleQueue.)
+ # We run this in two threads to avoid being a bottleneck.
+ images, sparse_labels = tf.train.shuffle_batch(
+ [image, label], batch_size=batch_size, num_threads=2,
+ capacity=1000 + 3 * batch_size,
+ # Ensures a minimum amount of shuffling of examples.
+ min_after_dequeue=1000)
+
+ return images, sparse_labels
+
+
+def run_training():
+ """Train MNIST for a number of steps."""
+
+ # Tell TensorFlow that the model will be built into the default Graph.
+ with tf.Graph().as_default():
+ # Input images and labels.
+ images, labels = inputs(train=True, batch_size=FLAGS.batch_size,
+ num_epochs=FLAGS.num_epochs)
+
+ # Build a Graph that computes predictions from the inference model.
+ logits = mnist.inference(images,
+ FLAGS.hidden1,
+ FLAGS.hidden2)
+
+ # Add to the Graph the loss calculation.
+ loss = mnist.loss(logits, labels)
+
+ # Add to the Graph operations that train the model.
+ train_op = mnist.training(loss, FLAGS.learning_rate)
+
+ # The op for initializing the variables.
+    init_op = tf.initialize_all_variables()
+
+ # Create a session for running operations in the Graph.
+ sess = tf.Session()
+
+ # Initialize the variables (the trained variables and the
+ # epoch counter).
+ sess.run(init_op)
+
+ # Start input enqueue threads.
+ coord = tf.train.Coordinator()
+ threads = tf.train.start_queue_runners(sess=sess, coord=coord)
+
+ try:
+ step = 0
+ while not coord.should_stop():
+ start_time = time.time()
+
+ # Run one step of the model. The return values are
+ # the activations from the `train_op` (which is
+ # discarded) and the `loss` op. To inspect the values
+ # of your ops or variables, you may include them in
+ # the list passed to sess.run() and the value tensors
+ # will be returned in the tuple from the call.
+ _, loss_value = sess.run([train_op, loss])
+
+ duration = time.time() - start_time
+
+ # Print an overview fairly often.
+ if step % 100 == 0:
+ print 'Step %d: loss = %.2f (%.3f sec)' % (step,
+ loss_value,
+ duration)
+ step += 1
+ except tf.errors.OutOfRangeError:
+ print 'Done training for %d epochs, %d steps.' % (
+ FLAGS.num_epochs, step)
+ finally:
+ # When done, ask the threads to stop.
+ coord.request_stop()
+
+ # Wait for threads to finish.
+ coord.join(threads)
+ sess.close()
+
+
+def main(_):
+ run_training()
+
+
+if __name__ == '__main__':
+ tf.app.run()
diff --git a/tensorflow/g3doc/how_tos/reading_data/index.md b/tensorflow/g3doc/how_tos/reading_data/index.md
new file mode 100644
index 0000000000..2b305f9333
--- /dev/null
+++ b/tensorflow/g3doc/how_tos/reading_data/index.md
@@ -0,0 +1,495 @@
+# Reading data
+
+There are three main methods of getting data into a TensorFlow program:
+
+* Feeding: Python code provides the data when running each step.
+* Reading from files: an input pipeline reads the data from files
+ at the beginning of a TensorFlow graph.
+* Preloaded data: a constant or variable in the TensorFlow graph holds
+ all the data (for small data sets).
+
+<!-- TOC-BEGIN This section is generated by neural network: DO NOT EDIT! -->
+## Contents
+* [Feeding](#Feeding)
+* [Reading from files](#AUTOGENERATED-reading-from-files)
+ * [Filenames, shuffling, and epoch limits](#AUTOGENERATED-filenames--shuffling--and-epoch-limits)
+ * [File formats](#AUTOGENERATED-file-formats)
+ * [Preprocessing](#AUTOGENERATED-preprocessing)
+ * [Batching](#AUTOGENERATED-batching)
+ * [Creating threads to prefetch using `QueueRunner` objects](#QueueRunner)
+ * [Filtering records or producing multiple examples per record](#AUTOGENERATED-filtering-records-or-producing-multiple-examples-per-record)
+ * [Sparse input data](#AUTOGENERATED-sparse-input-data)
+* [Preloaded data](#AUTOGENERATED-preloaded-data)
+* [Multiple input pipelines](#AUTOGENERATED-multiple-input-pipelines)
+
+
+<!-- TOC-END This section was generated by neural network, THANKS FOR READING! -->
+
+## Feeding <div class="md-anchor" id="Feeding">{#Feeding}</div>
+
+TensorFlow's feed mechanism lets you inject data into any Tensor in a
+computation graph. A python computation can thus feed data directly into the
+graph.
+
+Supply feed data through the `feed_dict` argument to a `run()` or `eval()`
+call that initiates computation.
+
+```python
+with tf.Session():
+ input = tf.placeholder(tf.float32)
+ classifier = ...
+ print classifier.eval(feed_dict={input: my_python_preprocessing_fn()})
+```
+
+While you can replace any Tensor with feed data, including variables and
+constants, the best practice is to use a
+[`placeholder` op](../../api_docs/python/io_ops.md#placeholder) node. A
+`placeholder` exists solely to serve as the target of feeds. It is not
+initialized and contains no data. A placeholder generates an error if
+it is executed without a feed, so you won't forget to feed it.
+
+An example using `placeholder` and feeding to train on MNIST data can be found
+in
+[tensorflow/g3doc/tutorials/mnist/fully_connected_feed.py](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/g3doc/tutorials/mnist/fully_connected_feed.py),
+and is described in the [MNIST tutorial](../../tutorials/mnist/tf/index.md).
+
+## Reading from files <div class="md-anchor" id="AUTOGENERATED-reading-from-files">{#AUTOGENERATED-reading-from-files}</div>
+
+A typical pipeline for reading records from files has the following stages:
+
+1. The list of filenames
+2. *Optional* filename shuffling
+3. *Optional* epoch limit
+4. Filename queue
+5. A Reader for the file format
+6. A decoder for a record read by the reader
+7. *Optional* preprocessing
+8. Example queue
+
+### Filenames, shuffling, and epoch limits <div class="md-anchor" id="AUTOGENERATED-filenames--shuffling--and-epoch-limits">{#AUTOGENERATED-filenames--shuffling--and-epoch-limits}</div>
+
+For the list of filenames, use either a constant string Tensor (like
+`["file0", "file1"]` or `[("file%d" % i) for i in range(2)]`) or the
+[tf.train.match_filenames_once
+function](../../api_docs/python/io_ops.md#match_filenames_once).
+
+Pass the list of filenames to the [tf.train.string_input_producer
+function](../../api_docs/python/io_ops.md#string_input_producer).
+`string_input_producer` creates a FIFO queue for holding the filenames until
+the reader needs them.
+
+`string_input_producer` has options for shuffling and setting a maximum number
+of epochs. A queue runner adds the whole list of filenames to the queue once
+for each epoch, shuffling the filenames within an epoch if `shuffle=True`.
+This procedure provides a uniform sampling of files, so that examples are not
+under- or over-sampled relative to each other.
+
+The queue runner works in a thread separate from the reader that pulls
+filenames from the queue, so the shuffling and enqueuing process does not
+block the reader.
+
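+For instance, a sketch combining these options (the filenames and the file
+pattern here are hypothetical):
+
+```python
+# Either an explicit list of filenames...
+filename_queue = tf.train.string_input_producer(
+    ["file0.csv", "file1.csv"], shuffle=True, num_epochs=10)
+
+# ...or a pattern matched when the graph runs.
+filenames = tf.train.match_filenames_once("/tmp/data/part-*")
+filename_queue = tf.train.string_input_producer(filenames, shuffle=True)
+```
+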
+### File formats <div class="md-anchor" id="AUTOGENERATED-file-formats">{#AUTOGENERATED-file-formats}</div>
+
+Select the reader that matches your input file format and pass the filename
+queue to the reader's read method. The read method outputs a key identifying
+the file and record (useful for debugging if you have some weird records), and
+a scalar string value. Use one (or more) of the decoder and conversion ops to
+decode this string into the tensors that make up an example.
+
+#### CSV files
+
+To read text files in [comma-separated value (CSV)
+format](https://tools.ietf.org/html/rfc4180), use a
+[TextLineReader](../../api_docs/python/io_ops.md#TextLineReader) with the
+[decode_csv](../../api_docs/python/io_ops.md#decode_csv) operation. For example:
+
+```python
+filename_queue = tf.train.string_input_producer(["file0.csv", "file1.csv"])
+
+reader = tf.TextLineReader()
+key, value = reader.read(filename_queue)
+
+# Default values, in case of empty columns. Also specifies the type of the
+# decoded result.
+record_defaults = [[1], [1], [1], [1], [1]]
+col1, col2, col3, col4, col5 = tf.decode_csv(
+ value, record_defaults=record_defaults)
+features = tf.pack([col1, col2, col3, col4])
+
+with tf.Session() as sess:
+ # Start populating the filename queue.
+ coord = tf.train.Coordinator()
+ threads = tf.train.start_queue_runners(coord=coord)
+
+ for i in range(1200):
+ # Retrieve a single instance:
+ example, label = sess.run([features, col5])
+
+ coord.request_stop()
+ coord.join(threads)
+```
+
+Each execution of `read()` reads a single line from the file. The
+`decode_csv()` op then parses the result into a list of tensors. The
+`record_defaults` argument determines the type of the resulting tensors and
+sets the default value to use if a value is missing in the input string.
+
+You must call `tf.train.start_queue_runners()` to populate the queue before
+you call `run()` or `eval()` to execute the `read()`. Otherwise `read()` will
+block while it waits for filenames from the queue.
+
+#### Fixed length records
+
+To read binary files in which each record is a fixed number of bytes, use
+[tf.FixedLengthRecordReader](../../api_docs/python/io_ops.md#FixedLengthRecordReader)
+with the [tf.decode_raw](../../api_docs/python/io_ops.md#decode_raw) operation.
+The `decode_raw` op converts from a string to a uint8 tensor.
+
+For example, [the CIFAR-10 dataset](http://www.cs.toronto.edu/~kriz/cifar.html)
+uses a file format where each record is represented using a fixed number of
+bytes: 1 byte for the label followed by 3072 bytes of image data. Once you have
+a uint8 tensor, standard operations can slice out each piece and reformat as
+needed. For CIFAR-10, you can see how to do the reading and decoding in
+[tensorflow/models/image/cifar10/cifar10_input.py](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/models/image/cifar10/cifar10_input.py),
+as described in
+[this tutorial](../../tutorials/deep_cnn/index.md#prepare-the-data).
+
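+As a sketch of that record layout, assuming `filename_queue` is set up as
+above (see `cifar10_input.py` for the full version, including batching):
+
+```python
+label_bytes = 1
+image_bytes = 32 * 32 * 3
+reader = tf.FixedLengthRecordReader(record_bytes=label_bytes + image_bytes)
+key, value = reader.read(filename_queue)
+
+record = tf.decode_raw(value, tf.uint8)
+label = tf.cast(tf.slice(record, [0], [label_bytes]), tf.int32)
+# CIFAR-10 stores pixels as [depth, height, width]; reorder to
+# [height, width, depth].
+image = tf.transpose(
+    tf.reshape(tf.slice(record, [label_bytes], [image_bytes]), [3, 32, 32]),
+    [1, 2, 0])
+```
+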
+#### Standard TensorFlow format
+
+Another approach is to convert whatever data you have into a supported format.
+This approach makes it easier to mix and match data sets and network
+architectures. The recommended format for TensorFlow is a TFRecords file
+containing
+[tf.train.Example protocol buffers](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/example/example.proto)
+(which contain
+[`Features`](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/example/feature.proto)
+as a field). You write a little program that gets your data, stuffs it in an
+`Example` protocol buffer, serializes the protocol buffer to a string, and then
+writes the string to a TFRecords file using the
+[tf.python_io.TFRecordWriter class](../../api_docs/python/python_io.md#TFRecordWriter).
+For example,
+[tensorflow/g3doc/how_tos/reading_data/convert_to_records.py](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/g3doc/how_tos/reading_data/convert_to_records.py)
+converts MNIST data to this format.
+
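+In miniature, the writing side looks like this (the feature names and the
+`my_examples` iterable are illustrative, not part of any API):
+
+```python
+writer = tf.python_io.TFRecordWriter("/tmp/data.tfrecords")
+for image, label in my_examples:  # hypothetical (numpy image, int label) pairs
+  example = tf.train.Example(features=tf.train.Features(feature={
+      'image_raw': tf.train.Feature(
+          bytes_list=tf.train.BytesList(value=[image.tostring()])),
+      'label': tf.train.Feature(
+          int64_list=tf.train.Int64List(value=[int(label)]))}))
+  writer.write(example.SerializeToString())
+writer.close()
+```
+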
+To read a file of TFRecords, use
+[tf.TFRecordReader](../../api_docs/python/io_ops.md#TFRecordReader) with
+the [tf.parse_single_example](../../api_docs/python/io_ops.md#parse_single_example)
+decoder. The `parse_single_example` op decodes the example protocol buffers into
+tensors. An MNIST example using the data produced by `convert_to_records` can be
+found in
+[tensorflow/g3doc/how_tos/reading_data/fully_connected_reader.py](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/g3doc/how_tos/reading_data/fully_connected_reader.py),
+which you can compare with the `fully_connected_feed` version.
+
+### Preprocessing <div class="md-anchor" id="AUTOGENERATED-preprocessing">{#AUTOGENERATED-preprocessing}</div>
+
+You can then do any preprocessing of these examples you want. This would be any
+processing that doesn't depend on trainable parameters. Examples include
+normalization of your data, picking a random slice, adding noise or distortions,
+etc. See
+[tensorflow/models/image/cifar10/cifar10.py](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/models/image/cifar10/cifar10.py)
+for an example.
+
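+For example, two typical stateless steps, assuming `image` is a 3-D uint8
+tensor from the decoding stage (the particular distortion is illustrative):
+
+```python
+# Convert from [0, 255] bytes to [-0.5, 0.5] floats.
+image = tf.cast(image, tf.float32) * (1. / 255) - 0.5
+# Randomly flip the image left to right as a simple distortion.
+image = tf.image.random_flip_left_right(image)
+```
+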
+### Batching <div class="md-anchor" id="AUTOGENERATED-batching">{#AUTOGENERATED-batching}</div>
+
+At the end of the pipeline we use another queue to batch together examples for
+training, evaluation, or inference. For this we use a queue that randomizes the
+order of examples, using the
+[tf.train.shuffle_batch function](../../api_docs/python/io_ops.md#shuffle_batch).
+
+Example:
+
+```python
+def read_my_file_format(filename_queue):
+ reader = tf.SomeReader()
+ key, record_string = reader.read(filename_queue)
+ example, label = tf.some_decoder(record_string)
+ processed_example = some_processing(example)
+ return processed_example, label
+
+def input_pipeline(filenames, batch_size, num_epochs=None):
+ filename_queue = tf.train.string_input_producer(
+ filenames, num_epochs=num_epochs, shuffle=True)
+ example, label = read_my_file_format(filename_queue)
+ # min_after_dequeue defines how big a buffer we will randomly sample
+ # from -- bigger means better shuffling but slower start up and more
+ # memory used.
+ # capacity must be larger than min_after_dequeue and the amount larger
+ # determines the maximum we will prefetch. Recommendation:
+ # min_after_dequeue + (num_threads + a small safety margin) * batch_size
+ min_after_dequeue = 10000
+ capacity = min_after_dequeue + 3 * batch_size
+ example_batch, label_batch = tf.train.shuffle_batch(
+ [example, label], batch_size=batch_size, capacity=capacity,
+ min_after_dequeue=min_after_dequeue)
+ return example_batch, label_batch
+```
+
+If you need more parallelism or shuffling of examples between files, use
+multiple reader instances using the
+[tf.train.shuffle_batch_join function](../../api_docs/python/io_ops.md#shuffle_batch_join).
+For example:
+
+```python
+def read_my_file_format(filename_queue):
+ # Same as above
+
+def input_pipeline(filenames, batch_size, read_threads, num_epochs=None):
+ filename_queue = tf.train.string_input_producer(
+ filenames, num_epochs=num_epochs, shuffle=True)
+ example_list = [read_my_file_format(filename_queue)
+ for _ in range(read_threads)]
+ min_after_dequeue = 10000
+ capacity = min_after_dequeue + 3 * batch_size
+ example_batch, label_batch = tf.train.shuffle_batch_join(
+ example_list, batch_size=batch_size, capacity=capacity,
+ min_after_dequeue=min_after_dequeue)
+ return example_batch, label_batch
+```
+
+You still only use a single filename queue that is shared by all the readers.
+That way we ensure that the different readers use different files from the same
+epoch until all the files from the epoch have been started. (It is also usually
+sufficient to have a single thread filling the filename queue.)
+
+An alternative is to use a single reader via the
+[tf.train.shuffle_batch function](../../api_docs/python/io_ops.md#shuffle_batch)
+with `num_threads` bigger than 1. This will make it read from a single file at
+a time (but faster than with 1 thread), instead of from N files at once.
+This can be important:
+
+* If you have more reading threads than input files, to avoid the risk that
+ you will have two threads reading the same example from the same file near
+ each other.
+* Or if reading N files in parallel causes too many disk seeks.
+
+How many threads do you need? The `tf.train.shuffle_batch*` functions add a
+summary to the graph that indicates how full the example queue is. If you have
+enough reading threads, that summary will stay above zero. You can
+[view your summaries as training progresses using TensorBoard](../summaries_and_tensorboard/index.md).
+
+### Creating threads to prefetch using `QueueRunner` objects <div class="md-anchor" id="QueueRunner">{#QueueRunner}</div>
+
+The short version: many of the `tf.train` functions listed above add
+[`QueueRunner`](../../api_docs/python/train.md#QueueRunner) objects to your
+graph. These require that you call
+[tf.train.start_queue_runners](../../api_docs/python/train.md#start_queue_runners)
+before running any training or inference steps, or it will hang forever. This
+will start threads that run the input pipeline, filling the example queue so
+that the dequeue to get the examples will succeed. This is best combined with a
+[tf.train.Coordinator](../../api_docs/python/train.md#Coordinator) to cleanly
+shut down these threads when there are errors. If you set a limit on the number
+of epochs, that will use an epoch counter that will need to be initialized. The
+recommended code pattern combining these is:
+
+```python
+# Create the graph, etc.
+init_op = tf.initialize_all_variables()
+
+# Create a session for running operations in the Graph.
+sess = tf.Session()
+
+# Initialize the variables (like the epoch counter).
+sess.run(init_op)
+
+# Start input enqueue threads.
+coord = tf.train.Coordinator()
+threads = tf.train.start_queue_runners(sess=sess, coord=coord)
+
+try:
+ while not coord.should_stop():
+ # Run training steps or whatever
+ sess.run(train_op)
+
+except tf.errors.OutOfRangeError:
+ print 'Done training -- epoch limit reached'
+finally:
+ # When done, ask the threads to stop.
+ coord.request_stop()
+
+# Wait for threads to finish.
+coord.join(threads)
+sess.close()
+```
+
+#### Aside: What is happening here?
+
+First we create the graph. It will have a few pipeline stages that are
+connected by queues. The first stage will generate filenames to read and enqueue
+them in the filename queue. The second stage consumes filenames (using a
+`Reader`), produces examples, and enqueues them in an example queue. Depending
+on how you have set things up, you may actually have a few independent copies of
+the second stage, so that you can read from multiple files in parallel. At the
+end of these stages is an enqueue operation, which enqueues into a queue that
+the next stage dequeues from. We want to start threads running these enqueuing
+operations, so that our training loop can dequeue examples from the example
+queue.
+
+<div style="width:70%; margin-left:12%; margin-bottom:10px; margin-top:20px;">
+<img style="width:100%" src="AnimatedFileQueues.gif">
+</div>
+
+The helpers in `tf.train` that create these queues and enqueuing operations add
+a [`tf.train.QueueRunner`](../../api_docs/python/train.md#QueueRunner) to the
+graph using the
+[tf.train.add_queue_runner](../../api_docs/python/train.md#add_queue_runner)
+function. Each `QueueRunner` is responsible for one stage, and holds the list of
+enqueue operations that need to be run in threads. Once the graph is
+constructed, the
+[tf.train.start_queue_runners](../../api_docs/python/train.md#start_queue_runners)
+function asks each QueueRunner in the graph to start its threads running the
+enqueuing operations.
+
+If all goes well, you can now run your training steps and the queues will be
+filled by the background threads. If you have set an epoch limit, at some point
+an attempt to dequeue examples will get an
+[`tf.OutOfRangeError`](../../api_docs/python/client.md#OutOfRangeError). This
+is the TensorFlow equivalent of "end of file" (EOF): the epoch
+limit has been reached and no more examples are available.
+
+The last ingredient is the
+[Coordinator](../../api_docs/python/train.md#Coordinator). This is responsible
+for letting all the threads know if anything has signaled a shutdown. Most
+commonly this is because an exception was raised, for example when one of the
+threads got an error while running some operation (or an ordinary Python
+exception).
+
+For more about threading, queues, QueueRunners, and Coordinators
+[see here](../threading_and_queues/index.md).
+
+#### Aside: How clean shutdown works when limiting epochs
+
+Imagine you have a model that has set a limit on the number of epochs to train
+on. That means that the thread generating filenames will only run that many
+times before generating an `OutOfRange` error. The QueueRunner will catch that
+error, close the filename queue, and exit the thread. Closing the queue does two
+things:
+
+* Any future attempt to enqueue in the filename queue will generate an error.
+ At this point there shouldn't be any threads trying to do that, but this
+ is helpful when queues are closed due to other errors.
+* Any current or future dequeue will either succeed (if there are enough
+ elements left) or fail (with an `OutOfRange` error) immediately. They won't
+ block waiting for more elements to be enqueued, since by the previous point
+ that can't happen.
+
+The point is that when the filename queue is closed, there will likely still be
+many filenames in that queue, so the next stage of the pipeline (with the reader
+and other preprocessing) may continue running for some time. Once the filename
+queue is exhausted, though, the next attempt to dequeue a filename (e.g. from a
+reader that has finished with the file it was working on) will trigger an
+`OutOfRange` error. In this case, though, you might have multiple threads
+associated with a single QueueRunner. If this isn't the last thread in the
+QueueRunner, the `OutOfRange` error just causes the one thread to exit. This
+allows the other threads, which are still finishing up their last file, to
+proceed until they finish as well. (Assuming you are using a
+[tf.train.Coordinator](../../api_docs/python/train.md#Coordinator),
+other types of errors will cause all the threads to stop.) Once all the reader
+threads hit the `OutOfRange` error, only then does the next queue, the example
+queue, get closed.
+
+Again, the example queue will have some elements queued, so training will
+continue until those are exhausted. If the example queue is a
+[RandomShuffleQueue](../../api_docs/python/io_ops.md#RandomShuffleQueue), say
+because you are using `shuffle_batch` or `shuffle_batch_join`, it normally
+will never have fewer than its `min_after_dequeue` attr elements
+buffered. However, once the queue is closed that restriction will be lifted and
+the queue will eventually empty. At that point the actual training threads,
+when they try to dequeue from the example queue, will start getting `OutOfRange`
+errors and exiting. Once all the training threads are done,
+[tf.train.Coordinator.join()](../../api_docs/python/train.md#Coordinator.join)
+will return and you can exit cleanly.
+
+### Filtering records or producing multiple examples per record <div class="md-anchor" id="AUTOGENERATED-filtering-records-or-producing-multiple-examples-per-record">{#AUTOGENERATED-filtering-records-or-producing-multiple-examples-per-record}</div>
+
+Instead of examples with shapes `[x, y, z]`, you will produce a batch of
+examples with shape `[batch, x, y, z]`. The batch size can be 0 if you want to
+filter this record out (maybe it is in a hold-out set?), or bigger than 1 if you
+are producing multiple examples per record. Then simply set `enqueue_many=True`
+when calling one of the batching functions (such as `shuffle_batch` or
+`shuffle_batch_join`).
+
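+A sketch, assuming `examples` is a tensor holding zero or more examples
+decoded from one record (the capacity numbers are arbitrary):
+
+```python
+# `examples` has shape [n, x, y, z]; n may be 0 (record filtered out),
+# 1, or larger (multiple examples per record).
+example_batch = tf.train.shuffle_batch(
+    [examples], batch_size=32, capacity=2000, min_after_dequeue=1000,
+    enqueue_many=True)
+```
+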
+### Sparse input data <div class="md-anchor" id="AUTOGENERATED-sparse-input-data">{#AUTOGENERATED-sparse-input-data}</div>
+
+SparseTensors don't play well with queues. If you use SparseTensors you have
+to decode the string records using
+[tf.parse_example](../../api_docs/python/io_ops.md#parse_example) **after**
+batching (instead of using `tf.parse_single_example` before batching).
+
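+A sketch of that ordering, using a hypothetical sparse feature named
+`'indices'`:
+
+```python
+# Batch the raw serialized strings first...
+serialized_batch = tf.train.shuffle_batch(
+    [serialized_example], batch_size=32, capacity=2000,
+    min_after_dequeue=1000)
+
+# ...then parse the whole batch at once, yielding a SparseTensor.
+parsed = tf.parse_example(
+    serialized_batch,
+    sparse_keys=['indices'],
+    sparse_types=[tf.int64])
+sparse_indices = parsed['indices']
+```
+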
+## Preloaded data <div class="md-anchor" id="AUTOGENERATED-preloaded-data">{#AUTOGENERATED-preloaded-data}</div>
+
+This is only used for small data sets that can be loaded entirely in memory.
+There are two approaches:
+
+* Store the data in a constant.
+* Store the data in a variable, that you initialize and then never change.
+
+Using a constant is a bit simpler, but uses more memory (since the constant is
+stored inline in the graph data structure, which may be duplicated a few times).
+
+```python
+training_data = ...
+training_labels = ...
+with tf.Session():
+ input_data = tf.constant(training_data)
+ input_labels = tf.constant(training_labels)
+ ...
+```
+
+To instead use a variable, you need to also initialize it after the graph has been built.
+
+```python
+training_data = ...
+training_labels = ...
+with tf.Session() as sess:
+ data_initializer = tf.placeholder(dtype=training_data.dtype,
+ shape=training_data.shape)
+ label_initializer = tf.placeholder(dtype=training_labels.dtype,
+ shape=training_labels.shape)
+  input_data = tf.Variable(data_initializer, trainable=False, collections=[])
+  input_labels = tf.Variable(label_initializer, trainable=False, collections=[])
+ ...
+ sess.run(input_data.initializer,
+ feed_dict={data_initializer: training_data})
+ sess.run(input_labels.initializer,
+           feed_dict={label_initializer: training_labels})
+```
+
+Setting `trainable=False` keeps the variable out of the
+`GraphKeys.TRAINABLE_VARIABLES` collection in the graph, so we won't try and
+update it when training. Setting `collections=[]` keeps the variable out of the
+`GraphKeys.VARIABLES` collection used for saving and restoring checkpoints.
+
+Either way,
+[tf.train.slice_input_producer function](../../api_docs/python/io_ops.md#slice_input_producer)
+can be used to produce a slice at a time. It shuffles the examples across an
+entire epoch, so further shuffling when batching is unnecessary. Instead of
+using the `shuffle_batch` functions, we therefore use the plain
+[tf.train.batch function](../../api_docs/python/io_ops.md#batch). To use
+multiple preprocessing threads, set the `num_threads` parameter to a number
+bigger than 1.
+
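+Putting those pieces together, a sketch of the rest of the preloaded pipeline
+(the batch size and thread count here are arbitrary):
+
+```python
+image, label = tf.train.slice_input_producer([input_data, input_labels])
+images, labels = tf.train.batch(
+    [image, label], batch_size=128, num_threads=4)
+```
+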
+An MNIST example that preloads the data using constants can be found in
+[tensorflow/g3doc/how_tos/reading_data/fully_connected_preloaded.py](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/g3doc/how_tos/reading_data/fully_connected_preloaded.py), and one that preloads the data using variables can be found in
+[tensorflow/g3doc/how_tos/reading_data/fully_connected_preloaded_var.py](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/g3doc/how_tos/reading_data/fully_connected_preloaded_var.py).
+You can compare these with the `fully_connected_feed` and
+`fully_connected_reader` versions above.
+
+## Multiple input pipelines <div class="md-anchor" id="AUTOGENERATED-multiple-input-pipelines">{#AUTOGENERATED-multiple-input-pipelines}</div>
+
+Commonly you will want to train on one dataset and evaluate (or "eval") on
+another. One way to do this is to actually have two separate processes:
+
+* The training process reads training input data and periodically writes
+ checkpoint files with all the trained variables.
+* The evaluation process restores the checkpoint files into an inference
+ model that reads validation input data.
+
+This is what is done in
+[the example CIFAR-10 model](../../tutorials/deep_cnn/index.md#save-and-restore-checkpoints). This has a couple of benefits:
+
+* The eval is performed on a single snapshot of the trained variables.
+* You can perform the eval even after training has completed and exited.
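+
+As a rough sketch of the evaluation process (the checkpoint directory here is
+illustrative):
+
+```python
+saver = tf.train.Saver()
+with tf.Session() as sess:
+  # Find and restore the most recent checkpoint written by the trainer.
+  ckpt = tf.train.get_checkpoint_state('/tmp/train_logs')
+  if ckpt and ckpt.model_checkpoint_path:
+    saver.restore(sess, ckpt.model_checkpoint_path)
+  # Run the eval against the validation inputs here.
+```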
+
+Alternatively, you can have the train and eval in the same graph in the same
+process, and share their trained variables. See
+[the shared variables tutorial](../variable_scope/index.md).
diff --git a/tensorflow/g3doc/how_tos/summaries_and_tensorboard/index.md b/tensorflow/g3doc/how_tos/summaries_and_tensorboard/index.md
new file mode 100644
index 0000000000..18f4b4260e
--- /dev/null
+++ b/tensorflow/g3doc/how_tos/summaries_and_tensorboard/index.md
@@ -0,0 +1,102 @@
+# TensorBoard: Visualizing Your Training
+
+The computations you'll use TensorBoard for - like training a massive
+deep neural network - can be complex and confusing. To make it easier to
+understand, debug, and optimize TensorFlow programs, we've included a suite of
+visualization tools called TensorBoard. You can use TensorBoard to visualize
+your TensorFlow graph, quantitative metrics about the execution of your graph,
+and even additional data like images that pass through it. When TensorBoard is
+fully configured, it looks like this:
+
+![MNIST TensorBoard](./mnist_tensorboard.png "MNIST TensorBoard")
+
+## Serializing the data
+
+TensorBoard operates by reading TensorFlow events files, which contain summary
+data that you can generate when running TensorFlow. Here's the general
+lifecycle for summary data within TensorBoard.
+
+First, create the TensorFlow graph that you'd like to collect summary
+data from, and decide which nodes you would like to annotate with
+[summary operations](../../api_docs/python/train.md#summary-operations).
+
+For example, suppose you are creating a convolutional neural network for
+recognizing MNIST digits. You'd like to record how the learning rate
+varies over time, and how the objective function is changing. Collect these by
+attaching [`scalar_summary`](../../api_docs/python/train.md#scalar_summary) ops
+to the nodes that output the learning rate and loss, respectively. Then, give
+each `scalar_summary` a meaningful `tag`, like `'learning rate'` and `'loss
+function'`.
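+
+A minimal sketch (assuming `learning_rate` and `loss` tensors already exist in
+your graph):
+
+```python
+tf.scalar_summary('learning rate', learning_rate)
+tf.scalar_summary('loss function', loss)
+```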
+
+Perhaps you'd also like to visualize the distributions of activations coming
+off a particular layer, or the distribution of gradients or weights. Collect
+this data by attaching
+[`histogram_summary`](../../api_docs/python/train.md#histogram_summary) ops to
+the gradient outputs and to the variable that holds your weights, respectively.
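+
+For instance, a sketch (assuming a `weights` variable and a `grads` tensor
+obtained from your optimizer):
+
+```python
+tf.histogram_summary('weights', weights)
+tf.histogram_summary('gradients', grads)
+```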
+
+For details on all of the summary operations available, check out the docs on
+[summary operations](../../api_docs/python/train.md#summary-operations).
+
+Operations in TensorFlow don't do anything until you run them, or run an op
+that depends on their output. The summary nodes that we've just created are
+peripheral to your graph: none of the ops you are currently running depend on
+them. So, to generate summaries, we need to run all of these summary nodes.
+Managing them by hand would be tedious, so use
+[`tf.merge_all_summaries`](../../api_docs/python/train.md#merge_all_summaries)
+to combine them into a single op that generates all the summary data.
+
+Then, you can just run the merged summary op, which will generate a serialized
+`Summary` protobuf object with all of your summary data at a given step.
+Finally, to write this summary data to disk, pass the summary protobuf to a
+[`tf.train.SummaryWriter`](../../api_docs/python/train.md#SummaryWriter).
+
+The `SummaryWriter` takes a logdir in its constructor. This logdir is
+important: it is the directory where all of the events will be written out.
+The `SummaryWriter` can also optionally take a `GraphDef` in its constructor.
+If it receives one, then TensorBoard will visualize your graph as well.
+
+Now that you've modified your graph and have a `SummaryWriter`, you're ready to
+start running your network! If you want, you could run the merged summary op
+every single step, and record a ton of training data. That's likely to be more
+data than you need, though. Instead, consider running the merged summary op
+every hundred steps or so, as in the following code example.
+
+```python
+merged_summary_op = tf.merge_all_summaries()
+summary_writer = tf.train.SummaryWriter('/tmp/mnist_logs', sess.graph)
+total_step = 0
+while training:
+  total_step += 1
+  sess.run(training_op)
+  if total_step % 100 == 0:
+    summary_str = sess.run(merged_summary_op)
+    summary_writer.add_summary(summary_str, total_step)
+```
+
+You're now all set to visualize this data using TensorBoard.
+
+
+## Launching TensorBoard
+
+To run TensorBoard, use the command
+`python tensorflow/tensorboard/tensorboard.py --logdir=path/to/logs`, where
+`logdir` points to the directory where the `SummaryWriter` serialized its data.
+If this `logdir` directory contains sub-directories which contain serialized
+data from separate runs, then TensorBoard will visualize the data from all of
+those runs. Once TensorBoard is running, navigate your web browser to
+`localhost:6006` to view TensorBoard.
+
+If you installed TensorBoard via pip, you can run it by simply typing
+`tensorboard --logdir=/path/to/logs`.
+
+When looking at TensorBoard, you will see navigation tabs in the top right
+corner. Each tab represents a set of serialized data that can be visualized.
+If the logs do not contain any data relevant to the tab you are looking at,
+TensorBoard will display a message indicating how to serialize data that is
+applicable to that tab.
+
+For in-depth information on how to use the "graph" tab to visualize your graph,
+see [TensorBoard: Visualizing your graph](../graph_viz/index.md).
diff --git a/tensorflow/g3doc/how_tos/threading_and_queues/index.md b/tensorflow/g3doc/how_tos/threading_and_queues/index.md
new file mode 100644
index 0000000000..c472de18c5
--- /dev/null
+++ b/tensorflow/g3doc/how_tos/threading_and_queues/index.md
@@ -0,0 +1,146 @@
+# Threading and Queues
+
+Queues, such as `FIFOQueue` and `RandomShuffleQueue`, are important TensorFlow
+objects for computing tensors asynchronously in a graph.
+
+For example, a typical input architecture is to use a `RandomShuffleQueue` to
+prepare inputs for training a model:
+
+* Multiple threads prepare training examples and push them onto the queue.
+* A training thread executes a training op that dequeues mini-batches from the
+  queue.
+
+This architecture has many benefits, as highlighted in the
+[Reading data how to](../reading_data), which also gives an overview of
+functions that simplify the construction of input pipelines.
+
+The TensorFlow `Session` object is multithreaded, so multiple threads can
+easily use the same session and run ops in parallel. However, it is not always
+easy to implement a Python program that drives threads as described above. All
+threads must be able to stop together, exceptions must be caught and
+reported, and queues must be properly closed when stopping.
+
+TensorFlow provides two classes to help:
+[tf.Coordinator](../../api_docs/python/train.md#Coordinator) and
+[tf.QueueRunner](../../api_docs/python/train.md#QueueRunner). These two classes
+are designed to be used together. The `Coordinator` class helps multiple threads
+stop together and report exceptions to a program that waits for them to stop.
+The `QueueRunner` class is used to create a number of threads cooperating to
+enqueue tensors in the same queue.
+
+## Coordinator
+
+The Coordinator class helps multiple threads stop together.
+
+Its key methods are:
+
+* `should_stop()`: returns True if the threads should stop.
+* `request_stop(<exception>)`: requests that threads should stop.
+* `join(<list of threads>)`: waits until the specified threads have stopped.
+
+You first create a `Coordinator` object, and then create a number of threads
+that use the coordinator. The threads typically run loops that stop when
+`should_stop()` returns `True`.
+
+Any thread can decide that the computation should stop. It only has to call
+`request_stop()` and the other threads will stop as `should_stop()` will then
+return `True`.
+
+```python
+# Thread body: loop until the coordinator indicates a stop was requested.
+# If some condition becomes true, ask the coordinator to stop.
+def MyLoop(coord):
+ while not coord.should_stop():
+ ...do something...
+ if ...some condition...:
+ coord.request_stop()
+
+# Main code: create a coordinator.
+coord = tf.train.Coordinator()
+
+# Create 10 threads that run 'MyLoop()'
+threads = [threading.Thread(target=MyLoop, args=(coord,)) for _ in xrange(10)]
+
+# Start the threads and wait for all of them to stop.
+for t in threads: t.start()
+coord.join(threads)
+```
+
+The coordinator can manage threads doing very different things; they don't all
+have to run the same loop as in the example above. The coordinator also
+supports capturing and reporting exceptions, as sketched below. See the
+[Coordinator class](../../api_docs/python/train.md#Coordinator) documentation for more details.
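+
+A minimal, runnable sketch (the work items here are purely illustrative): each
+worker reports unexpected exceptions to the coordinator so that all threads
+stop together.
+
+```python
+import threading
+import tensorflow as tf
+
+def worker(coord, work):
+  try:
+    while not coord.should_stop():
+      item = work.pop()       # Raises IndexError once the list is drained.
+      print item * item
+  except IndexError:
+    coord.request_stop()      # Normal termination: no more work.
+  except Exception as e:
+    coord.request_stop(e)     # Report unexpected errors to the coordinator.
+
+coord = tf.train.Coordinator()
+work = range(100)
+threads = [threading.Thread(target=worker, args=(coord, work))
+           for _ in xrange(4)]
+for t in threads: t.start()
+coord.join(threads)
+```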
+
+## QueueRunner
+
+The `QueueRunner` class creates a number of threads that repeatedly run an
+enqueue op. These threads can use a coordinator to stop together. In
+addition, a queue runner runs a *closer thread* that automatically closes the
+queue if an exception is reported to the coordinator.
+
+You can use a queue runner to implement the architecture described above.
+
+First build a graph that uses a `Queue` for input examples. Add ops that
+process examples and enqueue them in the queue. Add training ops that start by
+dequeueing from the queue.
+
+```python
+example = ...ops to create one example...
+# Create a queue, and an op that enqueues examples one at a time in the queue.
+queue = tf.RandomShuffleQueue(...)
+enqueue_op = queue.enqueue(example)
+# Create a training graph that starts by dequeuing a batch of examples.
+inputs = queue.dequeue_many(batch_size)
+train_op = ...use 'inputs' to build the training part of the graph...
+```
+
+In the Python training program, create a `QueueRunner` that will run a few
+threads to process and enqueue examples. Create a `Coordinator` and ask the
+queue runner to start its threads with the coordinator. Write a training loop
+that also uses the coordinator.
+
+```python
+# Create a queue runner that will run 4 threads in parallel to enqueue
+# examples.
+qr = tf.train.QueueRunner(queue, [enqueue_op] * 4)
+
+# Launch the graph.
+sess = tf.Session()
+# Create a coordinator, launch the queue runner threads.
+coord = tf.train.Coordinator()
+enqueue_threads = qr.create_threads(sess, coord=coord, start=True)
+# Run the training loop, controlling termination with the coordinator.
+for step in xrange(1000000):
+ if coord.should_stop():
+ break
+ sess.run(train_op)
+# When done, ask the threads to stop.
+coord.request_stop()
+# And wait for them to actually do it.
+coord.join(enqueue_threads)
+```
+
+## Handling Exceptions
+
+Threads started by queue runners do more than just run the enqueue ops. They
+also catch and handle exceptions generated by queues, including
+`OutOfRangeError`, which is used to report that a queue was closed.
+
+A training program that uses a coordinator must similarly catch and report
+exceptions in its main loop.
+
+Here is an improved version of the training loop above.
+
+```python
+try:
+ for step in xrange(1000000):
+ if coord.should_stop():
+ break
+ sess.run(train_op)
+except Exception as e:
+ # Report exceptions to the coordinator.
+ coord.request_stop(e)
+
+# Terminate as usual. It is safe to call request_stop() twice.
+coord.request_stop()
+coord.join(enqueue_threads)
+```
diff --git a/tensorflow/g3doc/how_tos/using_gpu/index.md b/tensorflow/g3doc/how_tos/using_gpu/index.md
new file mode 100644
index 0000000000..c0bdc5a7cb
--- /dev/null
+++ b/tensorflow/g3doc/how_tos/using_gpu/index.md
@@ -0,0 +1,174 @@
+# Using GPUs
+
+## Supported devices
+
+On a typical system, there are multiple computing devices. In TensorFlow, the
+supported device types are `CPU` and `GPU`. They are represented as
+`strings`. For example:
+
+* `"/cpu:0"`: The CPU of your machine.
+* `"/gpu:0"`: The GPU of your machine, if you have one.
+* `"/gpu:1"`: The second GPU of your machine, etc.
+
+If a TensorFlow operation has both CPU and GPU implementations, the
+GPU devices will be given priority when the operation is assigned to
+a device. For example, `matmul` has both CPU and GPU kernels. On a
+system with devices `cpu:0` and `gpu:0`, `gpu:0` will be selected to run
+`matmul`.
+
+## Logging device placement
+
+To find out which devices your operations and tensors are assigned to, create
+the session with the `log_device_placement` configuration option set to `True`.
+
+```python
+# Creates a graph.
+a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
+b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
+c = tf.matmul(a, b)
+# Creates a session with log_device_placement set to True.
+sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
+# Runs the op.
+print sess.run(c)
+```
+
+You should see the following output:
+
+```
+Device mapping:
+/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: Tesla K40c, pci bus
+id: 0000:05:00.0
+b: /job:localhost/replica:0/task:0/gpu:0
+a: /job:localhost/replica:0/task:0/gpu:0
+MatMul: /job:localhost/replica:0/task:0/gpu:0
+[[ 22. 28.]
+ [ 49. 64.]]
+
+```
+
+## Manual device placement
+
+If you would like a particular operation to run on a device of your
+choice instead of what's automatically selected for you, you can use
+`with tf.device` to create a device context such that all the operations
+within that context will have the same device assignment.
+
+```python
+# Creates a graph.
+with tf.device('/cpu:0'):
+ a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
+ b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
+c = tf.matmul(a, b)
+# Creates a session with log_device_placement set to True.
+sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
+# Runs the op.
+print sess.run(c)
+```
+
+You will see that `a` and `b` are now assigned to `cpu:0`.
+
+```
+Device mapping:
+/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: Tesla K40c, pci bus
+id: 0000:05:00.0
+b: /job:localhost/replica:0/task:0/cpu:0
+a: /job:localhost/replica:0/task:0/cpu:0
+MatMul: /job:localhost/replica:0/task:0/gpu:0
+[[ 22. 28.]
+ [ 49. 64.]]
+```
+
+## Using a single GPU on a multi-GPU system
+
+If you have more than one GPU in your system, the GPU with the lowest ID will be
+selected by default. If you would like to run on a different GPU, you will need
+to specify the preference explicitly:
+
+```python
+# Creates a graph.
+with tf.device('/gpu:2'):
+ a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
+ b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
+ c = tf.matmul(a, b)
+# Creates a session with log_device_placement set to True.
+sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
+# Runs the op.
+print sess.run(c)
+```
+
+If the device you have specified does not exist, you will get an
+`InvalidArgumentError`:
+
+```
+InvalidArgumentError: Invalid argument: Cannot assign a device to node 'b':
+Could not satisfy explicit device specification '/gpu:2'
+ [[Node: b = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [3,2]
+ values: 1 2 3...>, _device="/gpu:2"]()]]
+```
+
+If you would like TensorFlow to automatically choose an existing and
+supported device to run the operations in case the specified one doesn't
+exist, you can set `allow_soft_placement` to `True` in the configuration
+option when creating the session.
+
+```python
+# Creates a graph.
+with tf.device('/gpu:2'):
+ a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
+ b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
+ c = tf.matmul(a, b)
+# Creates a session with allow_soft_placement and log_device_placement set
+# to True.
+sess = tf.Session(config=tf.ConfigProto(
+ allow_soft_placement=True, log_device_placement=True))
+# Runs the op.
+print sess.run(c)
+```
+
+## Using multiple GPUs
+
+If you would like to run TensorFlow on multiple GPUs, you can construct your
+model in a multi-tower fashion where each tower is assigned to a different GPU.
+For example:
+
+```python
+# Creates a graph.
+c = []
+for d in ['/gpu:2', '/gpu:3']:
+ with tf.device(d):
+ a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3])
+ b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2])
+ c.append(tf.matmul(a, b))
+with tf.device('/cpu:0'):
+  total = tf.add_n(c)
+# Creates a session with log_device_placement set to True.
+sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
+# Runs the op.
+print sess.run(total)
+```
+
+You will see the following output.
+
+```
+Device mapping:
+/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: Tesla K20m, pci bus
+id: 0000:02:00.0
+/job:localhost/replica:0/task:0/gpu:1 -> device: 1, name: Tesla K20m, pci bus
+id: 0000:03:00.0
+/job:localhost/replica:0/task:0/gpu:2 -> device: 2, name: Tesla K20m, pci bus
+id: 0000:83:00.0
+/job:localhost/replica:0/task:0/gpu:3 -> device: 3, name: Tesla K20m, pci bus
+id: 0000:84:00.0
+Const_3: /job:localhost/replica:0/task:0/gpu:3
+Const_2: /job:localhost/replica:0/task:0/gpu:3
+MatMul_1: /job:localhost/replica:0/task:0/gpu:3
+Const_1: /job:localhost/replica:0/task:0/gpu:2
+Const: /job:localhost/replica:0/task:0/gpu:2
+MatMul: /job:localhost/replica:0/task:0/gpu:2
+AddN: /job:localhost/replica:0/task:0/cpu:0
+[[ 44. 56.]
+ [ 98. 128.]]
+```
+
+The [cifar10 tutorial](../../tutorials/deep_cnn/index.md) is a good example of
+how to train a model with multiple GPUs.
diff --git a/tensorflow/g3doc/how_tos/variable_scope/index.md b/tensorflow/g3doc/how_tos/variable_scope/index.md
new file mode 100644
index 0000000000..f9221b207b
--- /dev/null
+++ b/tensorflow/g3doc/how_tos/variable_scope/index.md
@@ -0,0 +1,372 @@
+# Sharing Variables
+
+You can create, initialize, save and load single variables
+in the way described in the [Variables HowTo](../variables/index.md).
+But when building complex models you often need to share large sets of
+variables and you might want to initialize all of them in one place.
+This tutorial shows how this can be done using `tf.variable_scope()` and
+`tf.get_variable()`.
+
+## The Problem
+
+Imagine you create a simple model for image filters, similar to our
+[Convolutional Neural Networks Tutorial](../../tutorials/deep_cnn/index.md)
+model but with only 2 convolutions (for simplicity of this example). If you use
+just `tf.Variable`, as explained in [Variables HowTo](../variables/index.md),
+your model might look like this.
+
+```python
+def my_image_filter(input_images):
+ conv1_weights = tf.Variable(tf.random_normal([5, 5, 32, 32]),
+ name="conv1_weights")
+ conv1_biases = tf.Variable(tf.zeros([32]), name="conv1_biases")
+ conv1 = tf.nn.conv2d(input_images, conv1_weights,
+ strides=[1, 1, 1, 1], padding='SAME')
+ relu1 = tf.nn.relu(conv1 + conv1_biases)
+
+ conv2_weights = tf.Variable(tf.random_normal([5, 5, 32, 32]),
+ name="conv2_weights")
+ conv2_biases = tf.Variable(tf.zeros([32]), name="conv2_biases")
+ conv2 = tf.nn.conv2d(relu1, conv2_weights,
+ strides=[1, 1, 1, 1], padding='SAME')
+ return tf.nn.relu(conv2 + conv2_biases)
+```
+
+As you can easily imagine, models quickly get much more complicated than
+this one, and even here we already have 4 different variables: `conv1_weights`,
+`conv1_biases`, `conv2_weights`, and `conv2_biases`.
+
+The problem arises when you want to reuse this model. Assume you want to
+apply your image filter to 2 different images, `image1` and `image2`.
+You want both images processed by the same filter with the same parameters.
+You can call `my_image_filter()` twice, but this will create two sets
+of variables:
+
+```python
+# First call creates one set of variables.
+result1 = my_image_filter(image1)
+# Another set is created in the second call.
+result2 = my_image_filter(image2)
+```
+
+A common way to share variables is to create them in a separate piece of code
+and pass them to functions that use them. For example, by using a dictionary:
+
+```python
+variables_dict = {
+    "conv1_weights": tf.Variable(tf.random_normal([5, 5, 32, 32]),
+                                 name="conv1_weights"),
+    "conv1_biases": tf.Variable(tf.zeros([32]), name="conv1_biases"),
+    ... etc. ...
+}
+
+def my_image_filter(input_images, variables_dict):
+ conv1 = tf.nn.conv2d(input_images, variables_dict["conv1_weights"],
+ strides=[1, 1, 1, 1], padding='SAME')
+ relu1 = tf.nn.relu(conv1 + variables_dict["conv1_biases"])
+
+ conv2 = tf.nn.conv2d(relu1, variables_dict["conv2_weights"],
+ strides=[1, 1, 1, 1], padding='SAME')
+ return tf.nn.relu(conv2 + variables_dict["conv2_biases"])
+
+# The 2 calls to my_image_filter() now use the same variables
+result1 = my_image_filter(image1, variables_dict)
+result2 = my_image_filter(image2, variables_dict)
+```
+
+While convenient, creating variables like this, outside of the code that uses
+them, breaks encapsulation:
+
+* The code that builds the graph must document the names, types,
+  and shapes of variables to create.
+* When the code changes, the callers may have to create more, fewer,
+  or different variables.
+
+One way to address the problem is to use classes to create a model,
+where the classes take care of managing the variables they need.
+For a lighter solution, not involving classes, TensorFlow provides
+a *Variable Scope* mechanism that makes it easy to share named variables
+while constructing a graph.
+
+## Variable Scope Example
+
+The variable scope mechanism in TensorFlow consists of two main functions:
+
+* `tf.get_variable(<name>, <shape>, <initializer>)`:
+ Creates or returns a variable with a given name.
+* `tf.variable_scope(<scope_name>)`:
+ Manages namespaces for names passed to `tf.get_variable()`.
+
+The function `tf.get_variable()` is used to get or create a variable instead
+of a direct call to `tf.Variable`. It uses an *initializer* instead of passing
+the value directly, as in `tf.Variable`. An initializer is a function that
+takes the shape and provides a tensor with that shape. Here are some
+initializers available in TensorFlow:
+
+* `tf.constant_initializer(value)` initializes everything to the provided value,
+* `tf.random_uniform_initializer(a, b)` initializes uniformly from [a, b],
+* `tf.random_normal_initializer(mean, stddev)` initializes from the normal
+ distribution with the given mean and standard deviation.
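+
+As a quick sketch of how an initializer behaves (it is simply called with a
+shape and returns a tensor to evaluate):
+
+```python
+init = tf.constant_initializer(0.5)
+with tf.Session():
+  print init([2, 3]).eval()   # A [2, 3] tensor filled with 0.5.
+```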
+
+To see how `tf.get_variable()` solves the problem discussed
+before, let's refactor the code that created one convolution into
+a separate function, named `conv_relu`:
+
+```python
+def conv_relu(input, kernel_shape, bias_shape):
+  # Create variable named "weights".
+  weights = tf.get_variable("weights", kernel_shape,
+      initializer=tf.random_normal_initializer())
+  # Create variable named "biases".
+  biases = tf.get_variable("biases", bias_shape,
+      initializer=tf.constant_initializer(0.0))
+  conv = tf.nn.conv2d(input, weights,
+      strides=[1, 1, 1, 1], padding='SAME')
+  return tf.nn.relu(conv + biases)
+```
+
+This function uses short names `"weights"` and `"biases"`.
+We'd like to use it for both `conv1` and `conv2`, but
+the variables need to have different names.
+This is where `tf.variable_scope()` comes into play:
+it pushes a namespace for variables.
+
+```python
+def my_image_filter(input_images):
+ with tf.variable_scope("conv1"):
+ # Variables created here will be named "conv1/weights", "conv1/biases".
+ relu1 = conv_relu(input_images, [5, 5, 32, 32], [32])
+ with tf.variable_scope("conv2"):
+ # Variables created here will be named "conv2/weights", "conv2/biases".
+ return conv_relu(relu1, [5, 5, 32, 32], [32])
+```
+
+Now, let's see what happens when we call `my_image_filter()` twice.
+
+```python
+result1 = my_image_filter(image1)
+result2 = my_image_filter(image2)
+# Raises ValueError(... conv1/weights already exists ...)
+```
+
+As you can see, `tf.get_variable()` checks that already existing variables
+are not shared by accident. If you want to share them, you need to specify
+it by calling `reuse_variables()` as follows.
+
+```python
+with tf.variable_scope("image_filters") as scope:
+ result1 = my_image_filter(image1)
+ scope.reuse_variables()
+ result2 = my_image_filter(image2)
+```
+
+This is a lightweight and safe way to share variables.
+
+## How Does Variable Scope Work?
+
+### Understanding `tf.get_variable()`
+
+To understand variable scope, it is necessary to first
+fully understand how `tf.get_variable()` works.
+Here is how `tf.get_variable` is usually called.
+
+```python
+v = tf.get_variable(name, shape, dtype, initializer)
+```
+
+This call does one of two things depending on the scope it is called in.
+Here are the two options.
+
+* Case 1: the scope is set for creating new variables, as evidenced by
+`tf.get_variable_scope().reuse == False`.
+
+In this case, `v` will be a newly created `tf.Variable` with the provided
+shape and data type. The full name of the created variable will be set to
+the current variable scope name + the provided `name` and a check will be
+performed to ensure that no variable with this full name exists yet.
+If a variable with this full name already exists, the function will
+raise a `ValueError`. If a new variable is created, it will be
+initialized to the value `initializer(shape)`. For example:
+
+```python
+with tf.variable_scope("foo"):
+ v = tf.get_variable("v", [1])
+assert v.name == "foo/v:0"
+```
+
+* Case 2: the scope is set for reusing variables, as evidenced by
+`tf.get_variable_scope().reuse == True`.
+
+In this case, the call will search for an already existing variable with
+name equal to the current variable scope name + the provided `name`.
+If no such variable exists, a `ValueError` will be raised. If the variable
+is found, it will be returned. For example:
+
+```python
+with tf.variable_scope("foo"):
+ v = tf.get_variable("v", [1])
+with tf.variable_scope("foo", reuse=True):
+ v1 = tf.get_variable("v", [1])
+assert v1 == v
+```
+
+### Basics of `tf.variable_scope()`
+
+Knowing how `tf.get_variable()` works makes it easy to understand variable
+scope. The primary function of variable scope is to carry a name that will
+be used as a prefix for variable names, and a reuse flag to distinguish the two
+cases described above. Nesting variable scopes appends their names in a way
+analogous to how directories work:
+
+```python
+with tf.variable_scope("foo"):
+ with tf.variable_scope("bar"):
+ v = tf.get_variable("v", [1])
+assert v.name == "foo/bar/v:0"
+```
+
+The current variable scope can be retrieved using `tf.get_variable_scope()`
+and the `reuse` flag of the current variable scope can be set to `True` by
+calling `tf.get_variable_scope().reuse_variables()`:
+
+```python
+with tf.variable_scope("foo"):
+ v = tf.get_variable("v", [1])
+ tf.get_variable_scope().reuse_variables()
+ v1 = tf.get_variable("v", [1])
+assert v1 == v
+```
+
+Note that you *cannot* set the `reuse` flag to `False`. The reason is to
+allow composing functions that create models. Imagine you write
+a function `my_image_filter(inputs)` as before. Someone calling the function
+in a variable scope with `reuse=True` would expect all inner variables to be
+reused as well. Allowing `reuse=False` to be forced inside the function would
+break this contract and make it hard to share parameters in this way.
+
+Even though you cannot set `reuse` to `False` explicitly, you can enter
+a reusing variable scope and then exit it, going back to a non-reusing one.
+This can be done using a `reuse=True` parameter when opening a variable scope.
+Note also that, for the same reason as above, the `reuse` parameter is
+inherited. So when you open a reusing variable scope, all sub-scopes will
+be reusing too.
+
+```python
+with tf.variable_scope("root"):
+ # At start, the scope is not reusing.
+ assert tf.get_variable_scope().reuse == False
+ with tf.variable_scope("foo"):
+ # Opened a sub-scope, still not reusing.
+ assert tf.get_variable_scope().reuse == False
+ with tf.variable_scope("foo", reuse=True):
+ # Explicitly opened a reusing scope.
+ assert tf.get_variable_scope().reuse == True
+ with tf.variable_scope("bar"):
+ # Now sub-scope inherits the reuse flag.
+ assert tf.get_variable_scope().reuse == True
+ # Exited the reusing scope, back to a non-reusing one.
+ assert tf.get_variable_scope().reuse == False
+```
+
+### Capturing variable scope
+
+In all examples presented above, we shared parameters only because their
+names agreed, that is, because we opened a reusing variable scope with
+exactly the same string. In more complex cases, it might be useful to pass
+a `VariableScope` object rather than rely on getting the names right.
+To this end, variable scopes can be captured and used instead of names
+when opening a new variable scope.
+
+```python
+with tf.variable_scope("foo") as foo_scope:
+ v = tf.get_variable("v", [1])
+with tf.variable_scope(foo_scope):
+ w = tf.get_variable("w", [1])
+with tf.variable_scope(foo_scope, reuse=True):
+ v1 = tf.get_variable("v", [1])
+ w1 = tf.get_variable("w", [1])
+assert v1 == v
+assert w1 == w
+```
+
+When opening a variable scope using a previously existing scope,
+we jump out of the current variable scope prefix to an entirely
+different one. This happens regardless of where we do it.
+
+```python
+with tf.variable_scope("foo") as foo_scope:
+ assert foo_scope.name == "foo"
+with tf.variable_scope("bar")
+ with tf.variable_scope("baz") as other_scope:
+ assert other_scope.name == "bar/baz"
+ with tf.variable_scope(foo_scope) as foo_scope2:
+ assert foo_scope2.name == "foo" # Not changed.
+```
+
+### Initializers in variable scope
+
+Using `tf.get_variable()` makes it possible to write functions that create or
+reuse variables and can be transparently called from outside.
+to change the initializer of the created variables? Do we need to pass an extra
+argument to every function that creates variables? What about the most common
+case, when we want to set the default initializer for all variables in one
+place, on top of all functions? To help with these cases, variable scope
+can carry a default initializer. It is inherited by sub-scopes and passed
+to each `tf.get_variable()` call. But it will be overridden if another
+initializer is specified explicitly.
+
+```python
+with tf.variable_scope("foo", initializer=tf.constant_initializer(0.4)):
+ v = tf.get_variable("v", [1])
+ assert v.eval() == 0.4 # Default initializer as set above.
+ w = tf.get_variable("w", [1], initializer=tf.constant_initializer(0.3)):
+ assert w.eval() == 0.3 # Specific initializer overrides the default.
+ with tf.variable_scope("bar"):
+ v = get_variable("v", [1])
+ assert v.eval() == 0.4 # Inherited default initializer.
+ with tf.variable_scope("baz", initializer=tf.constant_initializer(0.2)):
+ v = get_variable("v", [1])
+ assert v.eval() == 0.2 # Changed default initializer.
+```
+
+### Names of ops in `tf.variable_scope()`
+
+We discussed how `tf.variable_scope` governs the names of variables.
+But how does it influence the names of other ops in the scope?
+It is natural that ops created inside a variable scope should share its
+name prefix. For this reason, when we do `with tf.variable_scope("name")`,
+this implicitly opens a `tf.name_scope("name")`. For example:
+
+```python
+with tf.variable_scope("foo"):
+ x = 1.0 + tf.get_variable("v", [1])
+assert x.op.name == "foo/add"
+```
+
+Name scopes can be opened in addition to a variable scope, and then
+they will only affect the names of the ops, but not of variables.
+
+```python
+with tf.variable_scope("foo"):
+ with tf.name_scope("bar"):
+ v = tf.get_variable("v", [1])
+ x = 1.0 + v
+assert v.name == "foo/v:0"
+assert x.op.name == "foo/bar/add"
+```
+
+When opening a variable scope using a captured object instead of a string,
+we do not alter the current name scope for ops, as the sketch below shows.
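+
+A hedged sketch of this behavior (following the semantics described above;
+exact op names may vary with graph state):
+
+```python
+with tf.variable_scope("foo") as foo_scope:
+  v = tf.get_variable("v", [1])
+with tf.name_scope("bar"):
+  with tf.variable_scope(foo_scope, reuse=True):
+    # The variable comes from "foo", but ops keep the surrounding
+    # name scope "bar".
+    x = 1.0 + tf.get_variable("v", [1])
+assert v.name == "foo/v:0"
+assert x.op.name == "bar/add"
+```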
+
+
+## Examples of Use
+
+Here are pointers to a few files that make use of variable scope.
+In particular, it is heavily used for recurrent neural networks
+and sequence-to-sequence models.
+
+File | What's in it?
+--- | ---
+`models/image/cifar10.py` | Model for detecting objects in images.
+`models/rnn/rnn_cell.py` | Cell functions for recurrent neural networks.
+`models/rnn/seq2seq.py` | Functions for building sequence-to-sequence models.
diff --git a/tensorflow/g3doc/how_tos/variables/index.md b/tensorflow/g3doc/how_tos/variables/index.md
new file mode 100644
index 0000000000..4ad8f8a266
--- /dev/null
+++ b/tensorflow/g3doc/how_tos/variables/index.md
@@ -0,0 +1,215 @@
+# Variables: Creation, Initialization, Saving, and Loading
+
+When you train a model, you use [Variables](../../api_docs/python/state_ops.md)
+to hold and update parameters. Variables are in-memory buffers containing
+tensors. They need to be explicitly initialized and can be saved to disk during
+and after training. You can later restore saved values to exercise or analyze
+the model.
+
+This document references the following TensorFlow classes. Follow the links to
+their reference manual for a complete description of their API:
+
+* The `Variable` class [tf.Variable](../../api_docs/python/state_ops.md#Variable).
+* The `Saver` class [tf.train.Saver](../../api_docs/python/state_ops.md#Saver).
+
+
+## Creation
+
+When you create a [Variable](../../api_docs/python/state_ops.md) you pass a
+`Tensor` as its initial value to the `Variable()` constructor. TensorFlow
+provides a collection of Ops that produce tensors often used for initialization
+from [constants or random values](../../api_docs/python/constant_op.md).
+
+Note that all these Ops require you to specify the shape of the tensors. That
+shape automatically becomes the shape of the variable. Variables generally
+have a fixed shape, but TensorFlow provides advanced mechanisms to reshape
+variables.
+
+```python
+# Create two variables.
+weights = tf.Variable(tf.random_normal([784, 200], stddev=0.35),
+ name="weights")
+biases = tf.Variable(tf.zeros([200]), name="biases")
+```
+
+Calling `tf.Variable()` adds a few Ops to the graph:
+
+* A `variable` Op that holds the variable value.
+* An initializer Op that sets the variable to its initial value. This is
+ actually a `tf.assign` Op.
+* The Ops for the initial value, such as the `zeros` Op for the `biases`
+  variable in the example, are also added to the graph.
+
+The value returned by `tf.Variable()` is an instance of the Python class
+`tf.Variable`.
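+
+A quick sketch (assuming the `weights` variable above; names follow the
+default naming scheme):
+
+```python
+print weights.name              # "weights:0"
+print weights.initializer.name  # The initializer Op added by tf.Variable().
+```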
+
+## Initialization
+
+Variable initializers must be run explicitly before other Ops in your model can
+be run. The easiest way to do that is to add an Op that runs all the variable
+initializers, and run that Op before using the model.
+
+You can alternatively restore variable values from a checkpoint file; see
+below.
+
+Use `tf.initialize_all_variables()` to add an Op to run variable initializers.
+Only run that Op after you have fully constructed your model and launched it in
+a session.
+
+```python
+# Create two variables.
+weights = tf.Variable(tf.random_normal([784, 200], stddev=0.35),
+ name="weights")
+biases = tf.Variable(tf.zeros([200]), name="biases")
+...
+# Add an Op to initialize the variables.
+init_op = tf.initialize_all_variables()
+
+# Later, when launching the model
+with tf.Session() as sess:
+  # Run the init operation.
+  sess.run(init_op)
+ ...
+ # Use the model
+ ...
+```
+
+### Initialization from another Variable
+
+You sometimes need to initialize a variable from the initial value of another
+variable. As the Op added by `tf.initialize_all_variables()` initializes all
+variables in parallel, you have to be careful when this is needed.
+
+To initialize a new variable from the value of another variable use the other
+variable's `initialized_value()` method. You can use the initialized value
+directly as the initial value for the new variable, or you can use it as any
+other tensor to compute a value for the new variable.
+
+
+```python
+# Create a variable with a random value.
+weights = tf.Variable(tf.random_normal([784, 200], stddev=0.35),
+ name="weights")
+# Create another variable with the same value as 'weights'.
+w2 = tf.Variable(weights.initialized_value(), name="w2")
+# Create another variable with twice the value of 'weights'.
+w_twice = tf.Variable(weights.initialized_value() * 2.0, name="w_twice")
+```
+
+### Custom Initialization
+
+The convenience function `tf.initialize_all_variables()` adds an Op to
+initialize *all variables* in the model. To initialize only an explicit list
+of variables, use
+[`tf.initialize_variables()`](../../api_docs/python/state_ops.md#initialize_variables)
+instead. See the [Variables Documentation](../../api_docs/python/state_ops.md)
+for more options, including checking if variables are initialized.
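+
+For example, a minimal sketch (assuming the `weights` and `biases` variables
+from earlier):
+
+```python
+# Add an Op that initializes only a chosen subset of the variables.
+init_subset_op = tf.initialize_variables([weights, biases])
+```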
+
+## Saving and Restoring
+
+The easiest way to save and restore a model is to use a `tf.train.Saver`
+object. The constructor adds `save` and `restore` Ops to the graph for all, or
+a specified list, of variables. The saver object provides methods to run these
+Ops, specifying paths for the checkpoint files to write to or read from.
+
+### Checkpoint Files
+
+Variables are saved in binary files that, roughly, contain a map from variable
+names to tensors.
+
+When you create a `Saver` object, you can optionally choose names for the
+variables in the checkpoint files. By default, it uses the names passed to the
+`tf.Variable()` call.
+
+### Saving Variables
+
+Create a `Saver` with `tf.train.Saver()` to manage all variables in
+the model.
+
+```python
+# Create some variables.
+v1 = tf.Variable(..., name="v1")
+v2 = tf.Variable(..., name="v2")
+...
+# Add an Op to initialize the variables.
+init_op = tf.initialize_all_variables()
+
+# Add Ops to save and restore all the variables.
+saver = tf.train.Saver()
+
+# Later, launch the model, initialize the variables, do some work, save the
+# variables to disk.
+with tf.Session() as sess:
+  sess.run(init_op)
+  # Do some work with the model.
+  ...
+  # Save the variables to disk.
+  save_path = saver.save(sess, "/tmp/model.ckpt")
+  print "Model saved in file: ", save_path
+```
+
+### Restoring Variables
+
+The same `Saver` object is used to restore variables. Note that when you
+restore variables from a file, you do not have to initialize them beforehand.
+
+```python
+# Create some variables.
+v1 = tf.Variable(..., name="v1")
+v2 = tf.Variable(..., name="v2")
+...
+# Add Ops to save and restore all the variables.
+saver = tf.train.Saver()
+
+# Later, launch the model, use the saver to restore variables from disk, and
+# do some work with the model.
+with tf.Session() as sess:
+  # Restore variables from disk.
+  saver.restore(sess, "/tmp/model.ckpt")
+  print "Model restored."
+ # Do some work with the model
+ ...
+```
+
+### Choosing which Variables to Save and Restore
+
+If you do not pass any arguments to `tf.train.Saver()`, the saver
+handles all variables. Each one of them is saved under the name that was
+passed when the variable was created.
+
+It is sometimes useful to explicitly specify names for variables in the
+checkpoint files. For example, you may have trained a model with a variable
+named `"weights"` whose value you want to restore in a new variable named
+`"params"`.
+
+It is also sometimes useful to only save or restore a subset of the variables
+used by a model. For example, you may have trained a neural net with 5 layers,
+and you now want to train a new model with 6 layers, restoring the parameters
+from the 5 layers of the previously trained model into the first 5 layers of
+the new model.
+
+You can easily specify the names and variables to save by passing to the
+`tf.train.Saver()` constructor a Python dictionary: keys are the
+names to use, values are the variables to manage.
+
+Notes:
+
+* You can create as many saver objects as you want if you need to save and
+  restore different subsets of the model variables. The same variable can be
+  listed in multiple saver objects; its value is only changed when the saver's
+  `restore()` method is run.
+
+* If you only restore a subset of the model variables at the start
+ of a session, you have to run an initialize Op for the other variables. See
+ [`tf.initialize_variables()`](../../api_docs/python/state_ops.md#initialize_variables)
+ for more information.
+
+```python
+# Create some variables.
+v1 = tf.Variable(..., name="v1")
+v2 = tf.Variable(..., name="v2")
+...
+# Add Ops to save and restore only 'v2' using the name "my_v2"
+saver = tf.train.Saver({"my_v2": v2})
+# Use the saver object normally after that.
+...
+```