diff --git a/tensorflow/g3doc/how_tos/adding_an_op/index.md b/tensorflow/g3doc/how_tos/adding_an_op/index.md
new file mode 100644
index 0000000000..5c6243cd9c
--- /dev/null
+++ b/tensorflow/g3doc/how_tos/adding_an_op/index.md
@@ -0,0 +1,1015 @@
+# Adding a New Op to TensorFlow
+
+PREREQUISITES:
+
+* Some familiarity with C++.
+* Must have [downloaded TensorFlow source](../../get_started/index.md#source),
+ and be able to build it.
+
+If you'd like to incorporate an operation that isn't covered by the existing
+library, you can create a custom Op. To incorporate your custom Op, you'll need
+to:
+
+* Register the new Op in a C++ file. The Op registration is independent of the
+ implementation, and describes the semantics of how the Op is invoked. For
+ example, it defines the Op name, and specifies its inputs and outputs.
+* Implement the Op in C++. This implementation is called a "kernel", and there
+ can be multiple kernels for different architectures (e.g. CPUs, GPUs) or
+ input / output types.
+* Create a Python wrapper. This wrapper is the public API to create the Op. A
+ default wrapper is generated from the Op registration, which can be used
+ directly or added to.
+* Optionally, write a function to compute gradients for the Op.
+* Optionally, write a function that describes the input and output shapes
+ for the Op. This allows shape inference to work with your Op.
+* Test the Op, typically in Python. If you define gradients, verify them with
+ the Python [`GradientChecker`](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/python/kernel_tests/gradient_checker.py).
+
+<!-- TOC-BEGIN This section is generated by neural network: DO NOT EDIT! -->
+## Contents
+* [Define the Op's interface](#define_interface)
+* [Implement the kernel for the Op](#AUTOGENERATED-implement-the-kernel-for-the-op)
+* [Generate the client wrapper](#AUTOGENERATED-generate-the-client-wrapper)
+ * [The Python Op wrapper](#AUTOGENERATED-the-python-op-wrapper)
+ * [The C++ Op wrapper](#AUTOGENERATED-the-c---op-wrapper)
+* [Verify it works](#AUTOGENERATED-verify-it-works)
+* [Validation](#validation)
+* [Op registration](#AUTOGENERATED-op-registration)
+ * [Attrs](#AUTOGENERATED-attrs)
+ * [Attr types](#AUTOGENERATED-attr-types)
+ * [Polymorphism](#polymorphism)
+ * [Inputs and Outputs](#AUTOGENERATED-inputs-and-outputs)
+ * [Backwards compatibility](#AUTOGENERATED-backwards-compatibility)
+* [GPU Support](#mult-archs)
+* [Implement the gradient in Python](#AUTOGENERATED-implement-the-gradient-in-python)
+* [Implement a shape function in Python](#AUTOGENERATED-implement-a-shape-function-in-python)
+
+
+<!-- TOC-END This section was generated by neural network, THANKS FOR READING! -->
+
+## Define the Op's interface <div class="md-anchor" id="define_interface">{#define_interface}</div>
+
+You define the interface of an Op by registering it with the TensorFlow system.
+In the registration, you specify the name of your Op, its inputs (types and
+names) and outputs (types and names), as well as [docstrings](#docstrings) and
+any [attrs](#attrs) the Op might require.
+
+To see how this works, suppose you'd like to create an Op that takes a tensor of
+`int32`s and outputs a copy of the tensor, with all but the first element set to
+zero. Create file [`tensorflow/core/user_ops`][user_ops]`/zero_out.cc` and
+add a call to the `REGISTER_OP` macro that defines the interface for such an Op:
+
+```c++
+#include "tensorflow/core/framework/op.h"
+
+REGISTER_OP("ZeroOut")
+ .Input("to_zero: int32")
+ .Output("zeroed: int32");
+```
+
+This `ZeroOut` Op takes one tensor `to_zero` of 32-bit integers as input, and
+outputs a tensor `zeroed` of 32-bit integers.
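+
+The registration can also attach documentation to the Op with `.Doc()`, as the
+`StringToNumber` example later in this guide does. The wording below is only a
+sketch of what such a docstring might say, not text from the TensorFlow source:
+
+```c++
+REGISTER_OP("ZeroOut")
+    .Input("to_zero: int32")
+    .Output("zeroed: int32")
+    .Doc(R"doc(
+Zeros out all but the first element of the input tensor.
+
+to_zero: The tensor to zero out.
+zeroed: A copy of the input with all but the first element set to 0.
+)doc");
+```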
+
+> A note on naming: The name of the Op should be unique and in CamelCase. Names
+> starting with an underscore (`_`) are reserved for internal use.
+
+## Implement the kernel for the Op <div class="md-anchor" id="AUTOGENERATED-implement-the-kernel-for-the-op">{#AUTOGENERATED-implement-the-kernel-for-the-op}</div>
+
+After you define the interface, provide one or more implementations of the Op.
+To create one of these kernels, create a class that extends `OpKernel` and
+overrides the `Compute` method. The `Compute` method provides one `context`
+argument of type `OpKernelContext*`, from which you can access useful things
+like the input and output tensors.
+
+Add your kernel to the file you created above. The kernel might look something
+like this:
+
+```c++
+#include "tensorflow/core/framework/op_kernel.h"
+
+using namespace tensorflow;
+
+class ZeroOutOp : public OpKernel {
+ public:
+ explicit ZeroOutOp(OpKernelConstruction* context) : OpKernel(context) {}
+
+ void Compute(OpKernelContext* context) override {
+ // Grab the input tensor
+ const Tensor& input_tensor = context->input(0);
+ auto input = input_tensor.flat<int32>();
+
+ // Create an output tensor
+ Tensor* output_tensor = NULL;
+ OP_REQUIRES_OK(context, context->allocate_output(0, input_tensor.shape(),
+ &output_tensor));
+ auto output = output_tensor->template flat<int32>();
+
+ // Set all but the first element of the output tensor to 0.
+ const int N = input.size();
+ for (int i = 1; i < N; i++) {
+ output(i) = 0;
+ }
+
+ // Preserve the first input value if possible.
+ if (N > 0) output(0) = input(0);
+ }
+};
+```
+
+After implementing your kernel, you register it with the TensorFlow system. In
+the registration, you specify different constraints under which this kernel
+will run. For example, you might have one kernel made for CPUs, and a separate
+one for GPUs.
+
+To do this for the `ZeroOut` op, add the following to `zero_out.cc`:
+
+```c++
+REGISTER_KERNEL_BUILDER(Name("ZeroOut").Device(DEVICE_CPU), ZeroOutOp);
+```
+
+TODO: instructions or pointer to building TF
+
+At this point, the TensorFlow system can reference and use the Op when
+requested.
+
+## Generate the client wrapper <div class="md-anchor" id="AUTOGENERATED-generate-the-client-wrapper">{#AUTOGENERATED-generate-the-client-wrapper}</div>
+### The Python Op wrapper <div class="md-anchor" id="AUTOGENERATED-the-python-op-wrapper">{#AUTOGENERATED-the-python-op-wrapper}</div>
+
+Python op wrappers are created automatically in
+`bazel-genfiles/tensorflow/python/ops/gen_user_ops.py` for all ops placed in the
+[`tensorflow/core/user_ops`][user_ops] directory when you build TensorFlow.
+Those ops are imported into
+[`tensorflow/python/user_ops/user_ops.py`][python-user_ops] with the statement:
+
+```python
+from tensorflow.python.ops.gen_user_ops import *
+```
+
+You may optionally use your own function instead. To do this, you first hide
+the generated code for that op by adding its name to the `hidden` list in the
+`"user_ops"` rule in
+[`tensorflow/python/BUILD`](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/python/BUILD):
+
+```python
+tf_gen_op_wrapper_py(
+ name = "user_ops",
+ hidden = [
+ "Fact",
+ ],
+ require_shape_functions = False,
+)
+```
+
+List your op next to `"Fact"`. Next you add your replacement function to
+[`tensorflow/python/user_ops/user_ops.py`](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/python/user_ops/user_ops.py).
+Typically your function will call the generated function to actually add the op
+to the graph. The hidden version of the generated function will be in the
+`gen_user_ops` package and start with an underscore ("`_`"). For example:
+
+```python
+def my_fact():
+ """Example of overriding the generated code for an Op."""
+ return gen_user_ops._fact()
+```
+
+### The C++ Op wrapper <div class="md-anchor" id="AUTOGENERATED-the-c---op-wrapper">{#AUTOGENERATED-the-c---op-wrapper}</div>
+
+C++ op wrappers are created automatically for all ops placed in the
+[`tensorflow/core/user_ops`][user_ops] directory when you build TensorFlow. For
+example, ops in `tensorflow/core/user_ops/zero_out.cc` will generate wrappers in
+`bazel-genfiles/tensorflow/cc/ops/user_ops.{h,cc}`.
+
+All generated wrappers for user ops are automatically
+imported into [`tensorflow/cc/ops/standard_ops.h`][standard_ops-cc] with the
+statement:
+
+```c++
+#include "tensorflow/cc/ops/user_ops.h"
+```
+
+## Verify it works <div class="md-anchor" id="AUTOGENERATED-verify-it-works">{#AUTOGENERATED-verify-it-works}</div>
+
+A good way to verify that you've successfully implemented your Op is to write a
+test for it. Create the file
+`tensorflow/python/kernel_tests/zero_out_op_test.py` with the contents:
+[TODO]:# (put tests somewhere else and make sure it works)
+
+```python
+import tensorflow as tf
+
+
+class ZeroOutTest(tf.test.TestCase):
+ def testZeroOut(self):
+ with self.test_session():
+ result = tf.user_ops.zero_out([5, 4, 3, 2, 1])
+ self.assertAllEqual(result.eval(), [5, 0, 0, 0, 0])
+```
+
+Then run your test:
+
+```sh
+$ bazel test tensorflow/python:zero_out_op_test
+```
+
+## Validation <div class="md-anchor" id="validation">{#validation}</div>
+
+The example above assumed that the Op applied to a tensor of any shape. What
+if it only applied to vectors? That means adding a check to the OpKernel
+implementation above:
+
+```c++
+ void Compute(OpKernelContext* context) override {
+ // Grab the input tensor
+ const Tensor& input_tensor = context->input(0);
+
+ OP_REQUIRES(context, TensorShapeUtils::IsVector(input_tensor.shape()),
+ errors::InvalidArgument("ZeroOut expects a 1-D vector."));
+ // ...
+ }
+```
+
+This asserts that the input is a vector, and returns after setting the
+`InvalidArgument` status if it isn't. The
+[OP_REQUIRES macro][validation-macros] takes three arguments:
+
+* The `context`, which can either be an `OpKernelContext` or
+ `OpKernelConstruction` pointer (see
+ [`tensorflow/core/framework/op_kernel.h`](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/framework/op_kernel.h)),
+ for its `SetStatus()` method.
+* The condition. For example, there are functions for validating the shape
+  of a tensor in [`tensorflow/core/public/tensor_shape.h`](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/public/tensor_shape.h).
+* The error itself, which is represented by a `Status` object, see
+ [`tensorflow/core/public/status.h`](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/public/status.h). A
+ `Status` has both a type (frequently `InvalidArgument`, but see the list of
+ types) and a message. Functions for constructing an error may be found in
+ [`tensorflow/core/lib/core/errors.h`][validation-macros].
+
+Alternatively, if you want to test whether a `Status` object returned from some
+function is an error, and if so return it, use
+[`OP_REQUIRES_OK`][validation-macros]. Both of these macros return from the
+function on error.
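+
+For example, here is a minimal sketch of `OP_REQUIRES_OK` guarding a call to a
+`Status`-returning helper. `CheckNotTooBig` is hypothetical, purely for
+illustration:
+
+```c++
+// Hypothetical helper: returns an error Status for oversized inputs.
+Status CheckNotTooBig(int64 n) {
+  if (n > 1000000) {
+    return errors::InvalidArgument("too many elements: ", n);
+  }
+  return Status::OK();
+}
+
+void Compute(OpKernelContext* context) override {
+  const Tensor& input_tensor = context->input(0);
+  // On error, OP_REQUIRES_OK sets the Status on the context and returns
+  // from Compute immediately.
+  OP_REQUIRES_OK(context, CheckNotTooBig(input_tensor.NumElements()));
+  // ...
+}
+```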
+
+## Op registration <div class="md-anchor" id="AUTOGENERATED-op-registration">{#AUTOGENERATED-op-registration}</div>
+
+### Attrs <div class="md-anchor" id="AUTOGENERATED-attrs">{#AUTOGENERATED-attrs}</div>
+
+Ops can have attrs, whose values are set when the Op is added to a graph. These
+are used to configure the Op, and their values can be accessed both within the
+kernel implementation and in the types of inputs and outputs in the Op
+registration. Prefer using an input instead of an attr when possible, since
+inputs are more flexible. They can change every step, be set using a feed, etc.
+Attrs are used for things that can't be done with inputs: any configuration
+that affects the signature (number or type of inputs or outputs) or that
+can't change from step-to-step.
+
+You define an attr when you register the Op, by specifying its name and type
+using the `Attr` method, which expects a spec of the form:
+
+```
+<name>: <attr-type-expr>
+```
+
+where `<name>` begins with a letter and can be composed of alphanumeric
+characters and underscores, and `<attr-type-expr>` is a type expression of the
+form [described below](#attr-types).
+
+For example, if you'd like the `ZeroOut` Op to preserve a user-specified index,
+instead of only the 0th element, you can register the Op like so:
+
+<code class="lang-c++"><pre>
+REGISTER\_OP("ZeroOut")
+ <b>.Attr("preserve_index: int")</b>
+ .Input("to_zero: int32")
+ .Output("zeroed: int32");
+</pre></code>
+
+Your kernel can then access this attr in its constructor via the `context`
+parameter:
+
+<code class="lang-c++"><pre>
+class ZeroOutOp : public OpKernel {
+ public:
+ explicit ZeroOutOp(OpKernelConstruction\* context) : OpKernel(context) {<b>
+ // Get the index of the value to preserve
+    OP_REQUIRES_OK(context, context->GetAttr("preserve\_index", &preserve\_index\_));
+ </b>}
+ void Compute(OpKernelContext\* context) override {
+ // ...
+ }
+ <b>private:
+ int preserve\_index\_;</b>
+};
+</pre></code>
+
+which can then be used in the `Compute` method:
+
+<code class="lang-c++"><pre>
+ void Compute(OpKernelContext\* context) override {
+ // ...
+ // Set all the elements of the output tensor to 0
+ const int N = input.size();
+    for (int i = 0; i < N; i++) {
+ output\_flat(i) = 0;
+ }<br>
+ <b>// Preserve the requested input value
+ output\_flat(preserve\_index\_) = input(preserve\_index\_);</b>
+ }
+</pre></code>
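+
+Because `preserve_index_` is fixed when the Op is added to the graph, nothing
+guarantees it is in range for the tensors the kernel later receives. A minimal
+sketch of a bounds check at the top of `Compute`, reusing the macros from
+[Validation](#validation):
+
+```c++
+  void Compute(OpKernelContext* context) override {
+    // ...
+    // preserve_index_ comes from the GraphDef, so check it against the
+    // size of this particular input.
+    OP_REQUIRES(context,
+                preserve_index_ >= 0 && preserve_index_ < input.size(),
+                errors::InvalidArgument("preserve_index out of range"));
+    // ...
+  }
+```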
+
+[TODO]:# (check the code in this section in and test it)
+
+> To preserve [backwards compatibility](#backwards-compatibility), you should
+> specify a [default value](#default-values-constraints) when adding an attr to
+> an existing op:
+>
+> <code class="lang-c++"><pre>
+> REGISTER\_OP("ZeroOut")
+> <b>.Attr("preserve\_index: int = 0")</b>
+> .Input("to_zero: int32")
+> .Output("zeroed: int32");
+> </pre></code>
+
+### Attr types <div class="md-anchor" id="AUTOGENERATED-attr-types">{#AUTOGENERATED-attr-types}</div>
+
+The following types are supported in an attr:
+
+* `string`: Any sequence of bytes (not required to be UTF8).
+* `int`: A signed integer.
+* `float`: A floating point number.
+* `bool`: True or false.
+* `type`: One of the (non-ref) values of [`DataType`][DataTypeString].
+* `shape`: A [`TensorShapeProto`][TensorShapeProto].
+* `tensor`: A [`TensorProto`][TensorProto].
+* `list(<type>)`: A list of `<type>`, where `<type>` is one of the above types.
+ Note that `list(list(<type>))` is invalid.
+
+See also: [op_def_builder.cc:FinalizeAttr][FinalizeAttr] for a definitive list.
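+
+To see how a few of these types look end to end, here is a sketch built around
+a hypothetical Op (`AttrTypesExample` is not part of TensorFlow) whose kernel
+reads a `float` attr and a `list(string)` attr in its constructor:
+
+```c++
+#include <vector>
+
+REGISTER_OP("AttrTypesExample")
+    .Attr("threshold: float")
+    .Attr("labels: list(string)")
+    .Input("in: int32")
+    .Output("out: int32");
+
+class AttrTypesExampleOp : public OpKernel {
+ public:
+  explicit AttrTypesExampleOp(OpKernelConstruction* context)
+      : OpKernel(context) {
+    // Each GetAttr call copies the value recorded in the GraphDef into a
+    // matching C++ type.
+    OP_REQUIRES_OK(context, context->GetAttr("threshold", &threshold_));
+    OP_REQUIRES_OK(context, context->GetAttr("labels", &labels_));
+  }
+
+  void Compute(OpKernelContext* context) override {
+    // ... use threshold_ and labels_ ...
+  }
+
+ private:
+  float threshold_;
+  std::vector<string> labels_;
+};
+```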
+
+#### Default values & constraints
+
+Attrs may have default values, and some types of attrs can have constraints. To
+define an attr with constraints, you can use the following `<attr-type-expr>`s:
+
+* `{'<string1>', '<string2>'}`: The value must be a string that has either the
+ value `<string1>` or `<string2>`. The name of the type, `string`, is implied
+ when you use this syntax. This emulates an enum:
+
+ ```c++
+ REGISTER_OP("EnumExample")
+ .Attr("e: {'apple', 'orange'}");
+ ```
+
+* `{<type1>, <type2>}`: The value is of type `type`, and must be one of
+ `<type1>` or `<type2>`, where `<type1>` and `<type2>` are supported
+ [tensor types](../../resources/dims_types.md#data-types). You don't specify
+ that the type of the attr is `type`. This is implied when you have a list of
+ types in `{...}`. For example, in this case the attr `t` is a type that must
+ be an `int32`, a `float`, or a `bool`:
+
+ ```c++
+ REGISTER_OP("RestrictedTypeExample")
+ .Attr("t: {int32, float, bool}");
+ ```
+
+* There are shortcuts for common type constraints:
+ * `numbertype`: Type `type` restricted to the numeric (non-string and
+ non-bool) types.
+ * `realnumbertype`: Like `numbertype` without complex types.
+ * `quantizedtype`: Like `numbertype` but just the quantized number types.
+
+ The specific lists of types allowed by these are defined by the functions
+ (like `NumberTypes()`) in
+ [`tensorflow/core/framework/types.h`](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/framework/types.h).
+ In this example the attr `t` must be one of the numeric types:
+
+ ```c++
+ REGISTER_OP("NumberType")
+ .Attr("t: numbertype");
+ ```
+
+ For this op:
+
+ ```python
+ tf.number_type(t=tf.int32) # Valid
+ tf.number_type(t=tf.bool) # Invalid
+ ```
+
+* `int >= <n>`: The value must be an int greater than or equal to `<n>`,
+  where `<n>` is a natural number.
+
+ For example, the following Op registration specifies that the attr `a` must
+ have a value that is at least `2`:
+
+ ```c++
+ REGISTER_OP("MinIntExample")
+ .Attr("a: int >= 2");
+ ```
+
+* `list(<type>) >= <n>`: A list of type `<type>` whose length is greater than
+ or equal to `<n>`.
+
+ For example, the following Op registration specifies that the attr `a` is a
+ list of types (either `int32` or `float`), and that there must be at least 3
+ of them:
+
+ ```c++
+ REGISTER_OP("TypeListExample")
+ .Attr("a: list({int32, float}) >= 3");
+ ```
+
+To set a default value for an attr (making it optional in the generated code),
+add `= <default>` to the end, as in:
+
+```c++
+REGISTER_OP("AttrDefaultExample")
+ .Attr("i: int = 0");
+```
+
+The supported syntax of the default value is what would be used in the proto
+representation of the resulting GraphDef definition.
+
+Here are examples for how to specify a default for all types:
+
+```c++
+REGISTER_OP("AttrDefaultExampleForAllTypes")
+ .Attr("s: string = 'foo'")
+ .Attr("i: int = 0")
+ .Attr("f: float = 1.0")
+ .Attr("b: bool = true")
+ .Attr("ty: type = DT_INT32")
+ .Attr("sh: shape = { dim { size: 1 } dim { size: 2 } }")
+ .Attr("te: tensor = { dtype: DT_INT32 int_val: 5 }")
+ .Attr("l_empty: list(int) = []")
+ .Attr("l_int: list(int) = [2, 3, 5, 7]");
+```
+
+Note in particular that the values of type `type` use [the `DT_*` names
+for the types](../../resources/dims_types.md#data-types).
+
+### Polymorphism <div class="md-anchor" id="polymorphism">{#polymorphism}</div>
+#### Type Polymorphism {#type-polymorphism}
+
+For ops that can take different types as input or produce different output
+types, you can specify [an attr](#attrs) in
+[an input or output type](#inputs-outputs) in the Op registration. Typically
+you would then register an `OpKernel` for each supported type.
+
+For instance, if you'd like the `ZeroOut` Op to work on `float`s
+in addition to `int32`s, your Op registration might look like:
+
+<code class="lang-c++"><pre>
+REGISTER\_OP("ZeroOut")
+ <b>.Attr("T: {float, int32}")</b>
+ .Input("to_zero: <b>T</b>")
+ .Output("zeroed: <b>T</b>");
+</pre></code>
+
+Your Op registration now specifies that the input's type must be `float` or
+`int32`, and that its output will be the same type, since both have type `T`.
+
+> A note on naming:{#naming} Inputs, outputs, and attrs generally should be
+> given snake_case names. The one exception is attrs that are used as the type
+> of an input or in the type of an input. Those attrs can be inferred when the
+> op is added to the graph and so don't appear in the op's function. For
+> example, this last definition of ZeroOut will generate a Python function that
+> looks like:
+>
+> ```python
+> def zero_out(to_zero, name=None):
+> """...
+> Args:
+> to_zero: A `Tensor`. Must be one of the following types:
+> `float32`, `int32`.
+> name: A name for the operation (optional).
+>
+> Returns:
+>     A `Tensor`. Has the same type as `to_zero`.
+> """
+> ```
+>
+> If `to_zero` is passed an `int32` tensor, then `T` is automatically set to
+> `int32` (well, actually `DT_INT32`). Those inferred attrs are given
+> Capitalized or CamelCase names.
+>
+> Compare this with an op that has a type attr that determines the output
+> type:
+>
+> ```c++
+> REGISTER_OP("StringToNumber")
+> .Input("string_tensor: string")
+> .Output("output: out_type")
+> .Attr("out_type: {float, int32}");
+> .Doc(R"doc(
+> Converts each string in the input Tensor to the specified numeric type.
+> )doc");
+> ```
+>
+> In this case, the user has to specify the output type, as in the generated
+> Python:
+>
+> ```python
+> def string_to_number(string_tensor, out_type=None, name=None):
+> """Converts each string in the input Tensor to the specified numeric type.
+>
+> Args:
+> string_tensor: A `Tensor` of type `string`.
+> out_type: An optional `tf.DType` from: `tf.float32, tf.int32`.
+> Defaults to `tf.float32`.
+> name: A name for the operation (optional).
+>
+> Returns:
+> A `Tensor` of type `out_type`.
+> """
+> ```
+
+You would then write a kernel for each supported type, and register each one
+with a matching `TypeConstraint`:
+
+<code class="lang-c++"><pre>
+\#include "tensorflow/core/framework/op_kernel.h"<br/>
+class ZeroOut<b>Int32</b>Op : public OpKernel {
+ // as before
+};<br/>
+class ZeroOut<b>Float</b>Op : public OpKernel {
+ public:
+ explicit ZeroOut<b>Float</b>Op(OpKernelConstruction\* context)
+ : OpKernel(context) {}<br/>
+ void Compute(OpKernelContext\* context) override {
+ // Grab the input tensor
+ const Tensor& input\_tensor = context-&gt;input(0);
+ auto input = input\_tensor.flat&lt;<b>float</b>&gt;();<br/>
+ // Create an output tensor
+ Tensor* output = NULL;
+ OP\_REQUIRES\_OK(context,
+ context-&gt;allocate\_output(0, input_tensor.shape(), &output));
+ auto output\_flat = output-&gt;template flat&lt;<b>float</b>&gt;();<br/>
+ // Set all the elements of the output tensor to 0
+ const int N = input.size();
+ for (int i = 0; i &lt; N; i++) {
+ output\_flat(i) = 0;
+ }<br/>
+ // Preserve the first input value
+ if (N &gt; 0) output\_flat(0) = input(0);
+ }
+};<br/><b>
+// Note that TypeConstraint&lt;int32&gt;("T") means that attr "T" (defined
+// in the Op registration above) must be "int32" to use this template
+// instantiation.</b>
+REGISTER\_KERNEL\_BUILDER(
+ Name("ZeroOut")
+ .Device(DEVICE\_CPU)
+ <b>.TypeConstraint&lt;int32&gt;("T"),</b>
+    ZeroOut<b>Int32</b>Op);
+<b>REGISTER\_KERNEL\_BUILDER(
+ Name("ZeroOut")
+ .Device(DEVICE\_CPU)
+ .TypeConstraint&lt;float&gt;("T"),
+ ZeroOutFloatOp);
+</b></pre></code>
+
+> To preserve [backwards compatibility](#backwards-compatibility), you should
+> specify a [default value](#default-values-constraints) when adding an attr to
+> an existing op:
+>
+> <code class="lang-c++"><pre>
+> REGISTER\_OP("ZeroOut")
+> <b>.Attr("T: {float, int32} = DT_INT32")</b>
+> .Input("to_zero: T")
+> .Output("zeroed: T")
+> </pre></code>
+
+Let's say you wanted to add more types, say `double`:
+
+<code class="lang-c++"><pre>
+REGISTER\_OP("ZeroOut")
+    .Attr("T: {float, <b>double,</b> int32}")
+    .Input("to_zero: T")
+    .Output("zeroed: T");
+</pre></code>
+
+Instead of writing another `OpKernel` with redundant code as above, you can
+often use a C++ template instead. You will still have one kernel registration
+(`REGISTER_KERNEL_BUILDER` call) per overload.
+
+<code class="lang-c++"><pre>
+<b>template &lt;typename T&gt;</b>
+class ZeroOutOp : public OpKernel {
+ public:
+ explicit ZeroOutOp(OpKernelConstruction\* context) : OpKernel(context) {}<br/>
+ void Compute(OpKernelContext\* context) override {
+ // Grab the input tensor
+ const Tensor& input\_tensor = context-&gt;input(0);
+ auto input = input\_tensor.flat<b>&lt;T&gt;</b>();<br/>
+ // Create an output tensor
+ Tensor* output = NULL;
+ OP\_REQUIRES\_OK(context,
+ context-&gt;allocate\_output(0, input_tensor.shape(), &output));
+ auto output\_flat = output-&gt;template flat<b>&lt;T&gt;</b>();<br/>
+ // Set all the elements of the output tensor to 0
+ const int N = input.size();
+ for (int i = 0; i &lt; N; i++) {
+ output\_flat(i) = 0;
+ }<br/>
+ // Preserve the first input value
+ if (N &gt; 0) output\_flat(0) = input(0);
+ }
+};<br/>
+// Note that TypeConstraint&lt;int32&gt;("T") means that attr "T" (defined
+// in the Op registration above) must be "int32" to use this template
+// instantiation.
+REGISTER\_KERNEL\_BUILDER(
+ Name("ZeroOut")
+ .Device(DEVICE\_CPU)
+ .TypeConstraint&lt;int32&gt;("T"),
+ <b>ZeroOutOp&lt;int32&gt;</b>);
+REGISTER\_KERNEL\_BUILDER(
+ Name("ZeroOut")
+ .Device(DEVICE\_CPU)
+ .TypeConstraint&lt;float&gt;("T"),
+ <b>ZeroOutOp&lt;float&gt;</b>);
+<b>REGISTER\_KERNEL\_BUILDER(
+ Name("ZeroOut")
+ .Device(DEVICE\_CPU)
+ .TypeConstraint&lt;double&gt;("T"),
+ ZeroOutOp&lt;double&gt;);
+</b></pre></code>
+
+If you have more than a couple of overloads, you can put the registration in a
+macro:
+
+```c++
+#include "tensorflow/core/framework/op_kernel.h"
+
+#define REGISTER_KERNEL(type) \
+ REGISTER_KERNEL_BUILDER( \
+ Name("ZeroOut").Device(DEVICE_CPU).TypeConstraint<type>("T"), \
+ ZeroOutOp<type>)
+
+REGISTER_KERNEL(int32);
+REGISTER_KERNEL(float);
+REGISTER_KERNEL(double);
+
+#undef REGISTER_KERNEL
+```
+
+Depending on the list of types you are registering the kernel for, you may be
+able to use a macro provided by
+[`tensorflow/core/framework/register_types.h`][register_types]:
+
+```c++
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/register_types.h"
+
+REGISTER_OP("ZeroOut")
+ .Attr("T: realnumbertypes")
+ .Input("to_zero: T")
+ .Output("zeroed: T");
+
+template <typename T>
+class ZeroOutOp : public OpKernel { ... };
+
+#define REGISTER_KERNEL(type) \
+ REGISTER_KERNEL_BUILDER( \
+ Name("ZeroOut").Device(DEVICE_CPU).TypeConstraint<type>("T"), \
+ ZeroOutOp<type>)
+
+TF_CALL_REAL_NUMBER_TYPES(REGISTER_KERNEL);
+
+#undef REGISTER_KERNEL
+```
+
+#### List Inputs and Outputs {#list-input-output}
+
+In addition to being able to accept or produce different types, ops can consume
+or produce a variable number of tensors.
+
+In the next example, the attr `T` holds a *list* of types, and is used as the
+type of both the input `in` and the output `out`. The input and output are
+lists of tensors of that type (and the number and types of tensors in the output
+are the same as the input, since both have type `T`).
+
+```c++
+REGISTER_OP("PolymorphicListExample")
+ .Attr("T: list(type)")
+ .Input("in: T")
+ .Output("out: T");
+```
+
+You can also place restrictions on what types can be specified in the list. In
+this next case, the input is a list of `float` and `double` tensors. The Op
+accepts, for example, input types `(float, double, float)` and in that case the
+output type would also be `(float, double, float)`.
+
+```c++
+REGISTER_OP("ListTypeRestrictionExample")
+ .Attr("T: list({float, double})")
+ .Input("in: T")
+ .Output("out: T");
+```
+
+If you want all the tensors in a list to be of the same type, you might do
+something like:
+
+```c++
+REGISTER_OP("IntListInputExample")
+ .Attr("N: int")
+ .Input("in: N * int32")
+ .Output("out: int32");
+```
+
+This accepts a list of `int32` tensors, and uses an `int` attr `N` to
+specify the length of the list.
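+
+Inside the kernel, a sequence input like this arrives as a list of tensors. A
+minimal sketch of a kernel for `IntListInputExample` that sums every element
+of every tensor in the list, assuming the `OpInputList` accessor from
+`op_kernel.h`:
+
+```c++
+class IntListInputExampleOp : public OpKernel {
+ public:
+  explicit IntListInputExampleOp(OpKernelConstruction* context)
+      : OpKernel(context) {}
+
+  void Compute(OpKernelContext* context) override {
+    // Fetch all N tensors bound to the "in" input as one list.
+    OpInputList in;
+    OP_REQUIRES_OK(context, context->input_list("in", &in));
+
+    Tensor* out = NULL;
+    OP_REQUIRES_OK(context,
+                   context->allocate_output(0, TensorShape({}), &out));
+
+    int32 sum = 0;
+    for (int i = 0; i < in.size(); i++) {
+      auto flat = in[i].flat<int32>();
+      for (int j = 0; j < flat.size(); j++) sum += flat(j);
+    }
+    out->scalar<int32>()() = sum;
+  }
+};
+
+REGISTER_KERNEL_BUILDER(Name("IntListInputExample").Device(DEVICE_CPU),
+                        IntListInputExampleOp);
+```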
+
+This can be made [type polymorphic](#type-polymorphism) as well. In the next
+example, the input is a list of tensors (with length `"N"`) of the same (but
+unspecified) type (`"T"`), and the output is a single tensor of matching type:
+
+```c++
+REGISTER_OP("SameListInputExample")
+ .Attr("N: int")
+ .Attr("T: type")
+ .Input("in: N * T")
+ .Output("out: T");
+```
+
+By default, tensor lists have a minimum length of 1. You can change that default
+using
+[a `">="` constraint on the corresponding attr](#default-values-constraints).
+In this next example, the input is a list of at least 2 `int32` tensors:
+
+```c++
+REGISTER_OP("MinLengthIntListExample")
+ .Attr("N: int >= 2")
+ .Input("in: N * int32")
+ .Output("out: int32");
+```
+
+The same syntax works with `"list(type)"` attrs:
+
+```c++
+REGISTER_OP("MinimumLengthPolymorphicListExample")
+ .Attr("T: list(type) >= 3")
+ .Input("in: T")
+ .Output("out: T");
+```
+
+### Inputs and Outputs <div class="md-anchor" id="AUTOGENERATED-inputs-and-outputs">{#AUTOGENERATED-inputs-and-outputs}</div>
+
+To summarize the above, an Op registration can have multiple inputs and outputs:
+
+```c++
+REGISTER_OP("MultipleInsAndOuts")
+ .Input("y: int32")
+ .Input("z: float")
+ .Output("a: string")
+ .Output("b: int32");
+```
+
+Each input or output spec is of the form:
+
+```
+<name>: <io-type-expr>
+```
+
+where `<name>` begins with a letter and can be composed of alphanumeric
+characters and underscores. `<io-type-expr>` is one of the following type
+expressions:
+
+* `<type>`, where `<type>` is a supported input type (e.g. `float`, `int32`,
+ `string`). This specifies a single tensor of the given type.
+
+ See
+ [the list of supported Tensor types](../../resources/dims_types.md#data-types).
+
+ ```c++
+ REGISTER_OP("BuiltInTypesExample")
+ .Input("integers: int32")
+ .Input("complex_numbers: scomplex64");
+ ```
+
+* `<attr-type>`, where `<attr-type>` is the name of an [Attr](#attrs) with type
+ `type` or `list(type)` (with a possible type restriction). This syntax allows
+ for [polymorphic ops](#polymorphism).
+
+ ```c++
+ REGISTER_OP("PolymorphicSingleInput")
+ .Attr("T: type")
+ .Input("in: T);
+
+ REGISTER_OP("RestrictedPolymorphicSingleInput")
+ .Attr("T: {int32, int64}")
+ .Input("in: T);
+ ```
+
+ Referencing an attr of type `list(type)` allows you to accept a sequence of
+ tensors.
+
+ ```c++
+ REGISTER_OP("ArbitraryTensorSequenceExample")
+ .Attr("T: list(type)")
+ .Input("in: T")
+ .Output("out: T");
+
+ REGISTER_OP("RestrictedTensorSequenceExample")
+ .Attr("T: list({int32, int64})")
+ .Input("in: T")
+ .Output("out: T");
+ ```
+
+  Note that the number and types of tensors in the output `out` are the same
+  as in the input `in`, since both are of type `T`.
+
+* For a sequence of tensors with the same type: `<number> * <type>`, where
+ `<number>` is the name of an [Attr](#attrs) with type `int`. The `<type>` can
+ either be
+ [a specific type like `int32` or `float`](../../resources/dims_types.md#data-types),
+ or the name of an attr with type `type`. As an example of the first, this
+ Op accepts a list of `int32` tensors:
+
+ ```c++
+ REGISTER_OP("Int32SequenceExample")
+ .Attr("NumTensors: int")
+ .Input("in: NumTensors * int32")
+ ```
+
+ Whereas this Op accepts a list of tensors of any type, as long as they are all
+ the same:
+
+ ```c++
+ REGISTER_OP("SameTypeSequenceExample")
+ .Attr("NumTensors: int")
+ .Attr("T: type")
+ .Input("in: NumTensors * T")
+ ```
+
+* For a reference to a tensor: `Ref(<type>)`, where `<type>` is one of the
+ previous types.
+
+> A note on naming: Any attr used in the type of an input will be inferred. By
+> convention those inferred attrs use capital names (like `T` or `N`).
+> Otherwise inputs, outputs, and attrs have names like function parameters
+> (e.g. `num_outputs`). For more details, see the
+> [earlier note on naming](#naming).
+
+For more details, see
+[`tensorflow/core/framework/op_def_builder.h`][op_def_builder].
+
+### Backwards compatibility <div class="md-anchor" id="AUTOGENERATED-backwards-compatibility">{#AUTOGENERATED-backwards-compatibility}</div>
+
+In general, changes to specifications must be backwards-compatible: changing
+the specification of an Op must not break prior serialized GraphDefs
+constructed from older specifications.
+
+There are several ways to preserve backwards-compatibility.
+
+1. Any new attrs added to an operation must have default values defined, and
+ with that default value the Op must have the original behavior. To change an
+ operation from not polymorphic to polymorphic, you *must* give a default
+ value to the new type attr to preserve the original signature by default. For
+ example, if your operation was:
+
+ ```c++
+ REGISTER_OP("MyGeneralUnaryOp")
+ .Input("in: float")
+ .Output("out: float");
+ ```
+
+ you can make it polymorphic in a backwards-compatible way using:
+
+ ```c++
+ REGISTER_OP("MyGeneralUnaryOp")
+ .Input("in: T")
+ .Output("out: T")
+ .Attr("T: numerictype = float");
+ ```
+
+1. You can safely make a constraint on an attr less restrictive. For example,
+ you can change from `{int32, int64}` to `{int32, int64, float}` or from
+ `{"apple", "orange"}` to `{"apple", "banana", "orange"}`.
+
+1. Namespace any new Ops you create by prefixing the Op names with something
+   unique to your project. This prevents your Op from colliding with any Ops
+   that might be included in future versions of TensorFlow.
+
+1. Plan ahead! Try to anticipate future uses for the Op. Some signature changes
+ can't be done in a compatible way (for example, adding an input, or making a
+ single input into a list).
+
+If you cannot make your change to an operation backwards compatible, then
+create a new operation, with a new name, that implements the new semantics.
+
+## GPU Support <div class="md-anchor" id="mult-archs">{#mult-archs}</div>
+
+You can implement different OpKernels and register one for CPU and another for
+GPU, just like you can [register kernels for different types](#polymorphism).
+There are several examples of kernels with GPU support in
+[tensorflow/core/kernels/](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/kernels/).
+Notice some kernels have a CPU version in a `.cc` file, a GPU version in a file
+ending in `_gpu.cu.cc`, and shared code in a `.h` file.
+
+For example, the [`pad` op](../../api_docs/python/array_ops.md#pad) has
+everything but the GPU kernel in [`tensorflow/core/kernels/pad_op.cc`][pad_op].
+The GPU kernel is in
+[`tensorflow/core/kernels/pad_op_gpu.cu.cc`](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/kernels/pad_op_gpu.cu.cc),
+and the shared code is a templated class defined in
+[`tensorflow/core/kernels/pad_op.h`](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/kernels/pad_op.h).
+One thing to note: even when the GPU kernel version of `pad` is used, it still
+needs its `"paddings"` input in CPU memory. To mark that inputs or outputs are
+kept on the CPU, add a `HostMemory()` call to the kernel registration, e.g.:
+
+```c++
+#define REGISTER_GPU_KERNEL(T) \
+ REGISTER_KERNEL_BUILDER(Name("Pad") \
+ .Device(DEVICE_GPU) \
+ .TypeConstraint<T>("T") \
+ .HostMemory("paddings"), \
+ PadOp<GPUDevice, T>)
+```
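+
+As with the CPU macro earlier, this macro would then be instantiated once per
+supported type, for example:
+
+```c++
+REGISTER_GPU_KERNEL(float);
+REGISTER_GPU_KERNEL(double);
+
+#undef REGISTER_GPU_KERNEL
+```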
+
+## Implement the gradient in Python <div class="md-anchor" id="AUTOGENERATED-implement-the-gradient-in-python">{#AUTOGENERATED-implement-the-gradient-in-python}</div>
+
+[TODO]:# (Write this!)
+
+## Implement a shape function in Python <div class="md-anchor" id="AUTOGENERATED-implement-a-shape-function-in-python">{#AUTOGENERATED-implement-a-shape-function-in-python}</div>
+
+The TensorFlow Python API has a feature called "shape inference" that provides
+information about the shapes of tensors without having to execute the
+graph. Shape inference is supported by "shape functions" that are registered for
+each op type, and perform two roles: asserting that the shapes of the inputs are
+compatible, and specifying the shapes for the outputs. A shape function is a
+Python function that takes an
+[`Operation`](../../api_docs/python/framework.md#Operation) as input, and
+returns a list of
+[`TensorShape`](../../api_docs/python/framework.md#TensorShape) objects (one per
+output of the op). To register a shape function, apply the
+[`tf.RegisterShape` decorator](../../api_docs/python/framework.md#RegisterShape)
+to a shape function. For example, the
+[ZeroOut op defined above](#define_interface) would have a shape function like
+the following:
+
+```python
+@tf.RegisterShape("ZeroOut"):
+def _zero_out_shape(op):
+ """Shape function for the ZeroOut op.
+
+ This is the unconstrained version of ZeroOut, which produces an output
+ with the same shape as its input.
+ """
+ return [op.inputs[0].get_shape()]
+```
+
+A shape function can also constrain the shape of an input. For the version of
+[ZeroOut with a vector shape constraint](#validation), the shape function
+would be as follows:
+
+```python
+@tf.RegisterShape("ZeroOut"):
+def _zero_out_shape(op):
+ """Shape function for the ZeroOut op.
+
+ This is the constrained version of ZeroOut, which requires the input to
+ have rank 1 (a vector).
+ """
+ input_shape = op.inputs[0].get_shape().with_rank(1)
+ return [input_shape]
+```
+
+If your op is [polymorphic with multiple inputs](#polymorphism), use the
+properties of the operation to determine the number of shapes to check:
+
+```python
+@tf.RegisterShape("IntListInputExample")
+def _int_list_input_example_shape(op):
+ """Shape function for the "IntListInputExample" op.
+
+ All inputs and the output are matrices of the same size.
+ """
+ output_shape = tf.TensorShape(None)
+  for inp in op.inputs:
+    output_shape = output_shape.merge_with(inp.get_shape().with_rank(2))
+ return [output_shape]
+```
+
+Since shape inference is an optional feature, and the shapes of tensors may vary
+dynamically, shape functions must be robust to incomplete shape information for
+any of the inputs. The [`merge_with()`](../../api_docs/python/framework.md)
+method allows the caller to assert that two shapes are the same, even if either
+or both of them do not have complete information. Shape functions are defined
+for all of the
+[standard Python ops](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/python/ops/),
+and provide many different usage examples.
+
+[core-array_ops]:https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/ops/array_ops.cc
+[python-user_ops]:https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/python/user_ops/user_ops.py
+[tf-kernels]:https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/kernels/
+[user_ops]:https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/user_ops/
+[pad_op]:https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/kernels/pad_op.cc
+[standard_ops-py]:https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/python/ops/standard_ops.py
+[standard_ops-cc]:https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/cc/ops/standard_ops.h
+[python-BUILD]:https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/python/BUILD
+[validation-macros]:https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/lib/core/errors.h
+[op_def_builder]:https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/framework/op_def_builder.h
+[register_types]:https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/framework/register_types.h
+[FinalizeAttr]:https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/framework/op_def_builder.cc#FinalizeAttr
+[DataTypeString]:https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/framework/types.cc#DataTypeString
+[types-proto]:https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/framework/types.proto
+[TensorShapeProto]:https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/framework/tensor_shape.proto
+[TensorProto]:https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/framework/tensor.proto