Diffstat (limited to 'tensorflow/docs_src/extend/adding_an_op.md')
-rw-r--r--  tensorflow/docs_src/extend/adding_an_op.md  1460
1 files changed, 0 insertions, 1460 deletions
diff --git a/tensorflow/docs_src/extend/adding_an_op.md b/tensorflow/docs_src/extend/adding_an_op.md
deleted file mode 100644
index cc25ab9b45..0000000000
--- a/tensorflow/docs_src/extend/adding_an_op.md
+++ /dev/null
@@ -1,1460 +0,0 @@
-# Adding a New Op
-
-Note: By default [www.tensorflow.org](https://www.tensorflow.org) shows docs for the
-most recent stable version. The instructions in this doc require building from
-source. You will probably want to build from the `master` version of tensorflow.
-You should, as a result, be sure you are following the
-[`master` version of this doc](https://www.tensorflow.org/versions/master/extend/adding_an_op),
-in case there have been any changes.
-
-If you'd like to create an op that isn't covered by the existing TensorFlow
-library, we recommend that you first try writing the op in Python as
-a composition of existing Python ops or functions. If that isn't possible, you
-can create a custom C++ op. There are several reasons why you might want to
-create a custom C++ op:
-
-* It's not easy or possible to express your operation as a composition of
- existing ops.
-* It's not efficient to express your operation as a composition of existing
- primitives.
-* You want to hand-fuse a composition of primitives that a future compiler
- would find difficult to fuse.
-
-For example, imagine you want to implement something like "median pooling",
-similar to the "MaxPool" operator, but computing medians over sliding windows
-instead of maximum values. Doing this using a composition of operations may be
-possible (e.g., using ExtractImagePatches and TopK), but may not be as
-performance- or memory-efficient as a native operation where you can do
-something more clever in a single, fused operation. As always, it is typically
-first worth trying to express what you want using operator composition, only
-choosing to add a new operation if that proves to be difficult or inefficient.
-
-To incorporate your custom op you'll need to:
-
-1. Register the new op in a C++ file. Op registration defines an interface
- (specification) for the op's functionality, which is independent of the
- op's implementation. For example, op registration defines the op's name and
- the op's inputs and outputs. It also defines the shape function
- that is used for tensor shape inference.
-2. Implement the op in C++. The implementation of an op is known
- as a kernel, and it is the concrete implementation of the specification you
- registered in Step 1. There can be multiple kernels for different input /
- output types or architectures (for example, CPUs, GPUs).
-3. Create a Python wrapper (optional). This wrapper is the public API that's
- used to create the op in Python. A default wrapper is generated from the
- op registration, which can be used directly or added to.
-4. Write a function to compute gradients for the op (optional).
-5. Test the op. We usually do this in Python for convenience, but you can also
- test the op in C++. If you define gradients, you can verify them with the
- Python `tf.test.compute_gradient_error`.
- See
- [`relu_op_test.py`](https://www.tensorflow.org/code/tensorflow/python/kernel_tests/relu_op_test.py) as
- an example that tests the forward functions of Relu-like operators and
- their gradients.
-
-PREREQUISITES:
-
-* Some familiarity with C++.
-* Must have installed the
- [TensorFlow binary](../install/index.md), or must have
- [downloaded TensorFlow source](../install/install_sources.md),
- and be able to build it.
-
-[TOC]
-
-## Define the op's interface
-
-You define the interface of an op by registering it with the TensorFlow system.
-In the registration, you specify the name of your op, its inputs (types and
-names) and outputs (types and names), as well as docstrings and
-any [attrs](#attrs) the op might require.
-
-To see how this works, suppose you'd like to create an op that takes a tensor of
-`int32`s and outputs a copy of the tensor, with all but the first element set to
-zero. To do this, create a file named `zero_out.cc`. Then add a call to the
-`REGISTER_OP` macro that defines the interface for your op:
-
-```c++
-#include "tensorflow/core/framework/op.h"
-#include "tensorflow/core/framework/shape_inference.h"
-
-using namespace tensorflow;
-
-REGISTER_OP("ZeroOut")
- .Input("to_zero: int32")
- .Output("zeroed: int32")
- .SetShapeFn([](::tensorflow::shape_inference::InferenceContext* c) {
- c->set_output(0, c->input(0));
- return Status::OK();
- });
-```
-
-This `ZeroOut` op takes one tensor `to_zero` of 32-bit integers as input, and
-outputs a tensor `zeroed` of 32-bit integers. The op also uses a shape function
-to ensure that the output tensor is the same shape as the input tensor. For
-example, if the input is a tensor of shape [10, 20], then this shape function
-specifies that the output shape is also [10, 20].
-
-
-> A note on naming: The op name must be in CamelCase and it must be unique
-> among all other ops that are registered in the binary.
-
-## Implement the kernel for the op
-
-After you define the interface, provide one or more implementations of the op.
-To create one of these kernels, create a class that extends `OpKernel` and
-overrides the `Compute` method. The `Compute` method provides one `context`
-argument of type `OpKernelContext*`, from which you can access useful things
-like the input and output tensors.
-
-Add your kernel to the file you created above. The kernel might look something
-like this:
-
-```c++
-#include "tensorflow/core/framework/op_kernel.h"
-
-using namespace tensorflow;
-
-class ZeroOutOp : public OpKernel {
- public:
- explicit ZeroOutOp(OpKernelConstruction* context) : OpKernel(context) {}
-
- void Compute(OpKernelContext* context) override {
- // Grab the input tensor
- const Tensor& input_tensor = context->input(0);
- auto input = input_tensor.flat<int32>();
-
- // Create an output tensor
- Tensor* output_tensor = NULL;
- OP_REQUIRES_OK(context, context->allocate_output(0, input_tensor.shape(),
- &output_tensor));
- auto output_flat = output_tensor->flat<int32>();
-
- // Set all but the first element of the output tensor to 0.
- const int N = input.size();
- for (int i = 1; i < N; i++) {
- output_flat(i) = 0;
- }
-
- // Preserve the first input value if possible.
- if (N > 0) output_flat(0) = input(0);
- }
-};
-```
-
-After implementing your kernel, you register it with the TensorFlow system. In
-the registration, you specify different constraints under which this kernel
-will run. For example, you might have one kernel made for CPUs, and a separate
-one for GPUs.
-
-To do this for the `ZeroOut` op, add the following to `zero_out.cc`:
-
-```c++
-REGISTER_KERNEL_BUILDER(Name("ZeroOut").Device(DEVICE_CPU), ZeroOutOp);
-```
-
-> Important: Instances of your OpKernel may be accessed concurrently.
-> Your `Compute` method must be thread-safe. Guard any access to class
-> members with a mutex. Or better yet, don't share state via class members!
-> Consider using a [`ResourceMgr`](https://www.tensorflow.org/code/tensorflow/core/framework/resource_mgr.h)
-> to keep track of op state.
-
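-For illustration, here is a minimal sketch (not part of the original example;
-the op and member names are made up) of guarding mutable kernel state with a
-mutex:
-
-```c++
-#include "tensorflow/core/framework/op_kernel.h"
-#include "tensorflow/core/platform/mutex.h"
-
-using namespace tensorflow;
-
-class CountingOp : public OpKernel {
- public:
-  explicit CountingOp(OpKernelConstruction* context) : OpKernel(context) {}
-
-  void Compute(OpKernelContext* context) override {
-    // Compute may run concurrently on several threads, so any access to the
-    // shared counter has to hold the mutex.
-    mutex_lock l(mu_);
-    ++call_count_;
-    // ... read inputs and produce outputs as usual ...
-  }
-
- private:
-  mutex mu_;
-  int64 call_count_ = 0;
-};
-```
-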
-### Multi-threaded CPU kernels
-
-To write a multi-threaded CPU kernel, the Shard function in
-[`work_sharder.h`](https://www.tensorflow.org/code/tensorflow/core/util/work_sharder.h)
-can be used. This function shards a computation function across the
-threads configured to be used for intra-op threading (see
-intra_op_parallelism_threads in
-[`config.proto`](https://www.tensorflow.org/code/tensorflow/core/protobuf/config.proto)).
-
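-For illustration, here is a minimal sketch (not part of the original docs; the
-per-element cost estimate and the doubling computation are illustrative) of a
-`Compute` method that shards an element-wise loop across the intra-op thread
-pool:
-
-```c++
-#include "tensorflow/core/util/work_sharder.h"
-
-void Compute(OpKernelContext* context) override {
-  const Tensor& input_tensor = context->input(0);
-  auto input = input_tensor.flat<int32>();
-  Tensor* output_tensor = nullptr;
-  OP_REQUIRES_OK(context, context->allocate_output(0, input_tensor.shape(),
-                                                   &output_tensor));
-  auto output = output_tensor->flat<int32>();
-
-  // A rough estimate of the cost (in cycles) of processing one element; the
-  // sharder uses it to decide how finely to split the range.
-  const int64 cost_per_unit = 10;
-  auto worker_threads = *(context->device()->tensorflow_cpu_worker_threads());
-  // Shard calls the lambda with disjoint [start, limit) sub-ranges of
-  // [0, input.size()), possibly from several threads.
-  Shard(worker_threads.num_threads, worker_threads.workers, input.size(),
-        cost_per_unit, [&input, &output](int64 start, int64 limit) {
-          for (int64 i = start; i < limit; ++i) {
-            output(i) = 2 * input(i);
-          }
-        });
-}
-```
-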
-### GPU kernels
-
-A GPU kernel is implemented in two parts: the OpKernel and the CUDA kernel and
-its launch code.
-
-Sometimes the OpKernel implementation is common between a CPU and GPU kernel,
-such as around inspecting inputs and allocating outputs. In that case, a
-suggested implementation is to:
-
-1. Define the OpKernel templated on the Device and the primitive type of the
- tensor.
-2. To do the actual computation of the output, the Compute function calls a
- templated functor struct.
-3. The specialization of that functor for the CPUDevice is defined in the same
- file, but the specialization for the GPUDevice is defined in a .cu.cc file,
- since it will be compiled with the CUDA compiler.
-
-Here is an example implementation.
-
-```c++
-// kernel_example.h
-#ifndef KERNEL_EXAMPLE_H_
-#define KERNEL_EXAMPLE_H_
-
-template <typename Device, typename T>
-struct ExampleFunctor {
- void operator()(const Device& d, int size, const T* in, T* out);
-};
-
-#if GOOGLE_CUDA
-// Partially specialize functor for GpuDevice.
-template <typename T>
-struct ExampleFunctor<Eigen::GpuDevice, T> {
- void operator()(const Eigen::GpuDevice& d, int size, const T* in, T* out);
-};
-#endif
-
-#endif  // KERNEL_EXAMPLE_H_
-```
-
-```c++
-// kernel_example.cc
-#include "example.h"
-#include "tensorflow/core/framework/op_kernel.h"
-
-using namespace tensorflow;
-
-using CPUDevice = Eigen::ThreadPoolDevice;
-using GPUDevice = Eigen::GpuDevice;
-
-// CPU specialization of actual computation.
-template <typename T>
-struct ExampleFunctor<CPUDevice, T> {
- void operator()(const CPUDevice& d, int size, const T* in, T* out) {
- for (int i = 0; i < size; ++i) {
- out[i] = 2 * in[i];
- }
- }
-};
-
-// OpKernel definition.
-// template parameter <T> is the datatype of the tensors.
-template <typename Device, typename T>
-class ExampleOp : public OpKernel {
- public:
- explicit ExampleOp(OpKernelConstruction* context) : OpKernel(context) {}
-
- void Compute(OpKernelContext* context) override {
- // Grab the input tensor
- const Tensor& input_tensor = context->input(0);
-
- // Create an output tensor
- Tensor* output_tensor = NULL;
- OP_REQUIRES_OK(context, context->allocate_output(0, input_tensor.shape(),
- &output_tensor));
-
- // Do the computation.
- OP_REQUIRES(context, input_tensor.NumElements() <= tensorflow::kint32max,
- errors::InvalidArgument("Too many elements in tensor"));
- ExampleFunctor<Device, T>()(
- context->eigen_device<Device>(),
- static_cast<int>(input_tensor.NumElements()),
- input_tensor.flat<T>().data(),
- output_tensor->flat<T>().data());
- }
-};
-
-// Register the CPU kernels.
-#define REGISTER_CPU(T) \
- REGISTER_KERNEL_BUILDER( \
- Name("Example").Device(DEVICE_CPU).TypeConstraint<T>("T"), \
- ExampleOp<CPUDevice, T>);
-REGISTER_CPU(float);
-REGISTER_CPU(int32);
-
-// Register the GPU kernels.
-#ifdef GOOGLE_CUDA
-#define REGISTER_GPU(T) \
- /* Declare explicit instantiations in kernel_example.cu.cc. */ \
- extern template struct ExampleFunctor<GPUDevice, T>; \
- REGISTER_KERNEL_BUILDER( \
- Name("Example").Device(DEVICE_GPU).TypeConstraint<T>("T"), \
- ExampleOp<GPUDevice, T>);
-REGISTER_GPU(float);
-REGISTER_GPU(int32);
-#endif // GOOGLE_CUDA
-```
-
-```c++
-// kernel_example.cu.cc
-#ifdef GOOGLE_CUDA
-#define EIGEN_USE_GPU
-#include "example.h"
-#include "tensorflow/core/util/cuda_kernel_helper.h"
-
-using namespace tensorflow;
-
-using GPUDevice = Eigen::GpuDevice;
-
-// Define the CUDA kernel.
-template <typename T>
-__global__ void ExampleCudaKernel(const int size, const T* in, T* out) {
- for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < size;
- i += blockDim.x * gridDim.x) {
- out[i] = 2 * ldg(in + i);
- }
-}
-
-// Define the GPU implementation that launches the CUDA kernel.
-template <typename T>
-void ExampleFunctor<GPUDevice, T>::operator()(
- const GPUDevice& d, int size, const T* in, T* out) {
- // Launch the cuda kernel.
- //
- // See core/util/cuda_kernel_helper.h for example of computing
- // block count and thread_per_block count.
- int block_count = 1024;
- int thread_per_block = 20;
- ExampleCudaKernel<T>
- <<<block_count, thread_per_block, 0, d.stream()>>>(size, in, out);
-}
-
-// Explicitly instantiate functors for the types of OpKernels registered.
-template struct ExampleFunctor<GPUDevice, float>;
-template struct ExampleFunctor<GPUDevice, int32>;
-
-#endif // GOOGLE_CUDA
-```
-
-## Build the op library
-### Compile the op using your system compiler (TensorFlow binary installation)
-
-You should be able to compile `zero_out.cc` with a `C++` compiler such as `g++`
-or `clang` available on your system. The binary PIP package installs the header
-files and the library that you need to compile your op in locations that are
-system specific. However, the TensorFlow python library provides the
-`get_include` function to get the header directory, and the `get_lib` function
-to get the directory that contains the shared object to link against.
-Here are the outputs of these functions on an Ubuntu machine.
-
-```bash
-$ python
->>> import tensorflow as tf
->>> tf.sysconfig.get_include()
-'/usr/local/lib/python2.7/site-packages/tensorflow/include'
->>> tf.sysconfig.get_lib()
-'/usr/local/lib/python2.7/site-packages/tensorflow'
-```
-
-Assuming you have `g++` installed, here is the sequence of commands you can use
-to compile your op into a dynamic library.
-
-```bash
-TF_CFLAGS=( $(python -c 'import tensorflow as tf; print(" ".join(tf.sysconfig.get_compile_flags()))') )
-TF_LFLAGS=( $(python -c 'import tensorflow as tf; print(" ".join(tf.sysconfig.get_link_flags()))') )
-g++ -std=c++11 -shared zero_out.cc -o zero_out.so -fPIC ${TF_CFLAGS[@]} ${TF_LFLAGS[@]} -O2
-```
-
-On Mac OS X, the additional flag "-undefined dynamic_lookup" is required when
-building the `.so` file.
-
-> Note on `gcc` version `>=5`: gcc uses the new C++
-> [ABI](https://gcc.gnu.org/gcc-5/changes.html#libstdcxx) since version `5`. The binary pip
-> packages available on the TensorFlow website are built with `gcc4` that uses
-> the older ABI. If you compile your op library with `gcc>=5`, add
-> `-D_GLIBCXX_USE_CXX11_ABI=0` to the command line to make the library
-> compatible with the older ABI.
-> Furthermore, if you are using a TensorFlow package built from source, remember
-> to add `--cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0"` to the bazel command used to
-> build the Python package.
-
-### Compile the op using bazel (TensorFlow source installation)
-
-If you have TensorFlow sources installed, you can make use of TensorFlow's build
-system to compile your op. Place a BUILD file with the following Bazel build rule in
-the [`tensorflow/core/user_ops`][user_ops] directory.
-
-```python
-load("//tensorflow:tensorflow.bzl", "tf_custom_op_library")
-
-tf_custom_op_library(
- name = "zero_out.so",
- srcs = ["zero_out.cc"],
-)
-```
-
-Run the following command to build `zero_out.so`.
-
-```bash
-$ bazel build --config opt //tensorflow/core/user_ops:zero_out.so
-```
-
-> Note: Although you can create a shared library (a `.so` file) with the
-> standard `cc_library` rule, we strongly recommend that you use the
-> `tf_custom_op_library` macro. It adds some required dependencies, and
-> performs checks to ensure that the shared library is compatible with
-> TensorFlow's plugin loading mechanism.
-
-## Use the op in Python
-
-The TensorFlow Python API provides the
-`tf.load_op_library` function to
-load the dynamic library and register the op with the TensorFlow
-framework. `load_op_library` returns a Python module that contains the Python
-wrappers for the op and the kernel. Thus, once you have built the op, you can
-do the following to run it from Python:
-
-```python
-import tensorflow as tf
-zero_out_module = tf.load_op_library('./zero_out.so')
-with tf.Session(''):
- zero_out_module.zero_out([[1, 2], [3, 4]]).eval()
-
-# Prints
-array([[1, 0], [0, 0]], dtype=int32)
-```
-
-Keep in mind, the generated function will be given a snake\_case name (to comply
-with [PEP8](https://www.python.org/dev/peps/pep-0008/)). So, if your op is
-named `ZeroOut` in the C++ files, the python function will be called `zero_out`.
-
-To make the op available as a regular function `import`-able from a Python
-module, it may be useful to have the `load_op_library` call in a Python source
-file as follows:
-
-```python
-import tensorflow as tf
-
-zero_out_module = tf.load_op_library('./zero_out.so')
-zero_out = zero_out_module.zero_out
-```
-
-## Verify that the op works
-
-A good way to verify that you've successfully implemented your op is to write a
-test for it. Create the file
-`zero_out_op_test.py` with the contents:
-
-```python
-import tensorflow as tf
-
-class ZeroOutTest(tf.test.TestCase):
- def testZeroOut(self):
- zero_out_module = tf.load_op_library('./zero_out.so')
- with self.test_session():
- result = zero_out_module.zero_out([5, 4, 3, 2, 1])
- self.assertAllEqual(result.eval(), [5, 0, 0, 0, 0])
-
-if __name__ == "__main__":
- tf.test.main()
-```
-
-Then run your test (assuming you have tensorflow installed):
-
-```sh
-$ python zero_out_op_test.py
-```
-
-## Building advanced features into your op
-
-Now that you know how to build a basic (and somewhat restricted) op and
-implementation, we'll look at some of the more complicated things you will
-typically need to build into your op. This includes:
-
-* [Conditional checks and validation](#conditional-checks-and-validation)
-* [Op registration](#op-registration)
- * [Attrs](#attrs)
- * [Attr types](#attr-types)
- * [Polymorphism](#polymorphism)
- * [Inputs and outputs](#inputs-and-outputs)
- * [Backwards compatibility](#backwards-compatibility)
-* [GPU support](#gpu-support)
- * [Compiling the kernel for the GPU device](#compiling-the-kernel-for-the-gpu-device)
-* [Implement the gradient in Python](#implement-the-gradient-in-python)
-* [Shape functions in C++](#shape-functions-in-c)
-
-### Conditional checks and validation
-
-The example above assumed that the op applied to a tensor of any shape. What
-if it only applied to vectors? That means adding a check to the above OpKernel
-implementation.
-
-```c++
- void Compute(OpKernelContext* context) override {
- // Grab the input tensor
- const Tensor& input_tensor = context->input(0);
-
- OP_REQUIRES(context, TensorShapeUtils::IsVector(input_tensor.shape()),
- errors::InvalidArgument("ZeroOut expects a 1-D vector."));
- // ...
- }
-```
-
-This asserts that the input is a vector, and returns having set the
-`InvalidArgument` status if it isn't. The
-[`OP_REQUIRES` macro][validation-macros] takes three arguments:
-
-* The `context`, which can either be an `OpKernelContext` or
- `OpKernelConstruction` pointer (see
- [`tensorflow/core/framework/op_kernel.h`](https://www.tensorflow.org/code/tensorflow/core/framework/op_kernel.h)),
- for its `SetStatus()` method.
-* The condition. For example, there are functions for validating the shape
- of a tensor in
- [`tensorflow/core/framework/tensor_shape.h`](https://www.tensorflow.org/code/tensorflow/core/framework/tensor_shape.h)
-* The error itself, which is represented by a `Status` object, see
- [`tensorflow/core/lib/core/status.h`](https://www.tensorflow.org/code/tensorflow/core/lib/core/status.h). A
- `Status` has both a type (frequently `InvalidArgument`, but see the list of
- types) and a message. Functions for constructing an error may be found in
- [`tensorflow/core/lib/core/errors.h`][validation-macros].
-
-Alternatively, if you want to test whether a `Status` object returned from some
-function is an error, and if so return it, use
-[`OP_REQUIRES_OK`][validation-macros]. Both of these macros return from the
-function on error.
-
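-For example, here is a minimal sketch (the validation helper is hypothetical,
-not part of TensorFlow) of forwarding a `Status` returned by another function:
-
-```c++
-// A hypothetical helper that returns a Status instead of using OP_REQUIRES.
-Status CheckNotTooLarge(const Tensor& t) {
-  if (t.NumElements() > (1 << 20)) {
-    return errors::InvalidArgument("Tensor is too large: ", t.NumElements(),
-                                   " elements");
-  }
-  return Status::OK();
-}
-
-void Compute(OpKernelContext* context) override {
-  const Tensor& input_tensor = context->input(0);
-  // If the helper returns a non-OK Status, OP_REQUIRES_OK records it on the
-  // context and returns from Compute immediately.
-  OP_REQUIRES_OK(context, CheckNotTooLarge(input_tensor));
-  // ...
-}
-```
-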
-### Op registration
-
-#### Attrs
-
-Ops can have attrs, whose values are set when the op is added to a graph. These
-are used to configure the op, and their values can be accessed both within the
-kernel implementation and in the types of inputs and outputs in the op
-registration. Prefer using an input instead of an attr when possible, since
-inputs are more flexible. This is because attrs are constants and must be
-defined at graph construction time. In contrast, inputs are Tensors whose
-values can be dynamic; that is, inputs can change every step, be set using a
-feed, etc. Attrs are used for things that can't be done with inputs: any
-configuration that affects the signature (number or type of inputs or outputs)
-or that can't change from step-to-step.
-
-You define an attr when you register the op, by specifying its name and type
-using the `Attr` method, which expects a spec of the form:
-
-```
-<name>: <attr-type-expr>
-```
-
-where `<name>` begins with a letter and can be composed of alphanumeric
-characters and underscores, and `<attr-type-expr>` is a type expression of the
-form [described below](#attr_types).
-
-For example, if you'd like the `ZeroOut` op to preserve a user-specified index,
-instead of only the 0th element, you can register the op like so:
-```c++
-REGISTER_OP("ZeroOut")
- .Attr("preserve_index: int")
- .Input("to_zero: int32")
- .Output("zeroed: int32");
-```
-
-(Note that the set of [attribute types](#attr_types) is different from the
-`tf.DType` used for inputs and outputs.)
-
-Your kernel can then access this attr in its constructor via the `context`
-parameter:
-```c++
-class ZeroOutOp : public OpKernel {
- public:
- explicit ZeroOutOp(OpKernelConstruction* context) : OpKernel(context) {
- // Get the index of the value to preserve
- OP_REQUIRES_OK(context,
- context->GetAttr("preserve_index", &preserve_index_));
- // Check that preserve_index is positive
- OP_REQUIRES(context, preserve_index_ >= 0,
- errors::InvalidArgument("Need preserve_index >= 0, got ",
- preserve_index_));
- }
- void Compute(OpKernelContext* context) override {
- // ...
- }
- private:
- int preserve_index_;
-};
-```
-
-which can then be used in the `Compute` method:
-```c++
- void Compute(OpKernelContext* context) override {
- // ...
-
- // We're using the saved attr to validate a potentially dynamic input,
- // so we check that preserve_index is in range.
- OP_REQUIRES(context, preserve_index_ < input.dimension(0),
- errors::InvalidArgument("preserve_index out of range"));
-
- // Set all the elements of the output tensor to 0
- const int N = input.size();
- for (int i = 0; i < N; i++) {
- output_flat(i) = 0;
- }
-
- // Preserve the requested input value
- output_flat(preserve_index_) = input(preserve_index_);
- }
-```
-
-#### Attr types
-
-The following types are supported in an attr:
-
-* `string`: Any sequence of bytes (not required to be UTF8).
-* `int`: A signed integer.
-* `float`: A floating point number.
-* `bool`: True or false.
-* `type`: One of the (non-ref) values of [`DataType`][DataTypeString].
-* `shape`: A [`TensorShapeProto`][TensorShapeProto].
-* `tensor`: A [`TensorProto`][TensorProto].
-* `list(<type>)`: A list of `<type>`, where `<type>` is one of the above types.
- Note that `list(list(<type>))` is invalid.
-
-See also: [`op_def_builder.cc:FinalizeAttr`][FinalizeAttr] for a definitive list.
-
-##### Default values & constraints
-
-Attrs may have default values, and some types of attrs can have constraints. To
-define an attr with constraints, you can use the following `<attr-type-expr>`s:
-
-* `{'<string1>', '<string2>'}`: The value must be a string that has either the
- value `<string1>` or `<string2>`. The name of the type, `string`, is implied
- when you use this syntax. This emulates an enum:
-
- ```c++
- REGISTER_OP("EnumExample")
- .Attr("e: {'apple', 'orange'}");
- ```
-
-* `{<type1>, <type2>}`: The value is of type `type`, and must be one of
- `<type1>` or `<type2>`, where `<type1>` and `<type2>` are supported
- `tf.DType`. You don't specify
- that the type of the attr is `type`. This is implied when you have a list of
- types in `{...}`. For example, in this case the attr `t` is a type that must
- be an `int32`, a `float`, or a `bool`:
-
- ```c++
- REGISTER_OP("RestrictedTypeExample")
- .Attr("t: {int32, float, bool}");
- ```
-
-* There are shortcuts for common type constraints:
- * `numbertype`: Type `type` restricted to the numeric (non-string and
- non-bool) types.
- * `realnumbertype`: Like `numbertype` without complex types.
- * `quantizedtype`: Like `numbertype` but just the quantized number types.
-
- The specific lists of types allowed by these are defined by the functions
- (like `NumberTypes()`) in
- [`tensorflow/core/framework/types.h`](https://www.tensorflow.org/code/tensorflow/core/framework/types.h).
- In this example the attr `t` must be one of the numeric types:
-
- ```c++
- REGISTER_OP("NumberType")
- .Attr("t: numbertype");
- ```
-
- For this op:
-
- ```python
- tf.number_type(t=tf.int32) # Valid
- tf.number_type(t=tf.bool) # Invalid
- ```
-
- Lists can be combined with other lists and single types. The following
- op allows attr `t` to be any of the numeric types, or the bool type:
-
- ```c++
- REGISTER_OP("NumberOrBooleanType")
- .Attr("t: {numbertype, bool}");
- ```
-
- For this op:
-
- ```python
- tf.number_or_boolean_type(t=tf.int32) # Valid
- tf.number_or_boolean_type(t=tf.bool) # Valid
- tf.number_or_boolean_type(t=tf.string) # Invalid
- ```
-
-* `int >= <n>`: The value must be an int whose value is greater than or equal to
- `<n>`, where `<n>` is a natural number.
-
- For example, the following op registration specifies that the attr `a` must
- have a value that is at least `2`:
-
- ```c++
- REGISTER_OP("MinIntExample")
- .Attr("a: int >= 2");
- ```
-
-* `list(<type>) >= <n>`: A list of type `<type>` whose length is greater than
- or equal to `<n>`.
-
- For example, the following op registration specifies that the attr `a` is a
- list of types (either `int32` or `float`), and that there must be at least 3
- of them:
-
- ```c++
- REGISTER_OP("TypeListExample")
- .Attr("a: list({int32, float}) >= 3");
- ```
-
-To set a default value for an attr (making it optional in the generated code),
-add `= <default>` to the end, as in:
-
-```c++
-REGISTER_OP("AttrDefaultExample")
- .Attr("i: int = 0");
-```
-
-The supported syntax of the default value is what would be used in the proto
-representation of the resulting GraphDef definition.
-
-Here are examples for how to specify a default for all types:
-
-```c++
-REGISTER_OP("AttrDefaultExampleForAllTypes")
- .Attr("s: string = 'foo'")
- .Attr("i: int = 0")
- .Attr("f: float = 1.0")
- .Attr("b: bool = true")
- .Attr("ty: type = DT_INT32")
- .Attr("sh: shape = { dim { size: 1 } dim { size: 2 } }")
- .Attr("te: tensor = { dtype: DT_INT32 int_val: 5 }")
- .Attr("l_empty: list(int) = []")
- .Attr("l_int: list(int) = [2, 3, 5, 7]");
-```
-
-Note in particular that the values of type `type`
-use `tf.DType`.
-
-#### Polymorphism
-
-##### Type Polymorphism
-
-For ops that can take different types as input or produce different output
-types, you can specify [an attr](#attrs) in
-[an input or output type](#inputs-and-outputs) in the op registration. Typically
-you would then register an `OpKernel` for each supported type.
-
-For instance, if you'd like the `ZeroOut` op to work on `float`s
-in addition to `int32`s, your op registration might look like:
-```c++
-REGISTER_OP("ZeroOut")
- .Attr("T: {float, int32}")
- .Input("to_zero: T")
- .Output("zeroed: T");
-```
-
-Your op registration now specifies that the input's type must be `float` or
-`int32`, and that its output will be the same type, since both have type `T`.
-
-> <a id="naming"></a>A note on naming: Inputs, outputs, and attrs generally should be
-> given snake\_case names. The one exception is attrs that are used as the type
-> of an input or in the type of an input. Those attrs can be inferred when the
-> op is added to the graph and so don't appear in the op's function. For
-> example, this last definition of ZeroOut will generate a Python function that
-> looks like:
->
-> ```python
-> def zero_out(to_zero, name=None):
-> """...
-> Args:
-> to_zero: A `Tensor`. Must be one of the following types:
-> `float32`, `int32`.
-> name: A name for the operation (optional).
->
-> Returns:
-> A `Tensor`. Has the same type as `to_zero`.
-> """
-> ```
->
-> If `to_zero` is passed an `int32` tensor, then `T` is automatically set to
-> `int32` (well, actually `DT_INT32`). Those inferred attrs are given
-> Capitalized or CamelCase names.
->
-> Compare this with an op that has a type attr that determines the output
-> type:
->
-> ```c++
-> REGISTER_OP("StringToNumber")
-> .Input("string_tensor: string")
-> .Output("output: out_type")
-> .Attr("out_type: {float, int32} = DT_FLOAT")
-> .Doc(R"doc(
-> Converts each string in the input Tensor to the specified numeric type.
-> )doc");
-> ```
->
-> In this case, the user has to specify the output type, as in the generated
-> Python:
->
-> ```python
-> def string_to_number(string_tensor, out_type=None, name=None):
-> """Converts each string in the input Tensor to the specified numeric type.
->
-> Args:
-> string_tensor: A `Tensor` of type `string`.
-> out_type: An optional `tf.DType` from: `tf.float32, tf.int32`.
-> Defaults to `tf.float32`.
-> name: A name for the operation (optional).
->
-> Returns:
-> A `Tensor` of type `out_type`.
-> """
-> ```
-
-```c++
-#include "tensorflow/core/framework/op_kernel.h"
-
-class ZeroOutInt32Op : public OpKernel {
- // as before
-};
-
-class ZeroOutFloatOp : public OpKernel {
- public:
- explicit ZeroOutFloatOp(OpKernelConstruction* context)
- : OpKernel(context) {}
-
- void Compute(OpKernelContext* context) override {
- // Grab the input tensor
- const Tensor& input_tensor = context->input(0);
- auto input = input_tensor.flat<float>();
-
- // Create an output tensor
- Tensor* output = NULL;
- OP_REQUIRES_OK(context,
- context->allocate_output(0, input_tensor.shape(), &output));
- auto output_flat = output->template flat<float>();
-
- // Set all the elements of the output tensor to 0
- const int N = input.size();
- for (int i = 0; i < N; i++) {
- output_flat(i) = 0;
- }
-
- // Preserve the first input value
- if (N > 0) output_flat(0) = input(0);
- }
-};
-
-// Note that TypeConstraint<int32>("T") means that attr "T" (defined
-// in the op registration above) must be "int32" to use this template
-// instantiation.
-REGISTER_KERNEL_BUILDER(
- Name("ZeroOut")
- .Device(DEVICE_CPU)
- .TypeConstraint<int32>("T"),
- ZeroOutInt32Op);
-REGISTER_KERNEL_BUILDER(
- Name("ZeroOut")
- .Device(DEVICE_CPU)
- .TypeConstraint<float>("T"),
- ZeroOutFloatOp);
-```
-
-> To preserve [backwards compatibility](#backwards-compatibility), you should
-> specify a [default value](#default-values-constraints) when adding an attr to
-> an existing op:
->
-> ```c++
-> REGISTER_OP("ZeroOut")
-> .Attr("T: {float, int32} = DT_INT32")
-> .Input("to_zero: T")
-> .Output("zeroed: T")
-> ```
-
-Let's say you wanted to add more types, say `double`:
-```c++
-REGISTER_OP("ZeroOut")
- .Attr("T: {float, double, int32}")
- .Input("to_zero: T")
- .Output("zeroed: T");
-```
-
-Instead of writing another `OpKernel` with redundant code as above, often you
-will be able to use a C++ template instead. You will still have one kernel
-registration (`REGISTER_KERNEL_BUILDER` call) per overload.
-```c++
-template <typename T>
-class ZeroOutOp : public OpKernel {
- public:
- explicit ZeroOutOp(OpKernelConstruction* context) : OpKernel(context) {}
-
- void Compute(OpKernelContext* context) override {
- // Grab the input tensor
- const Tensor& input_tensor = context->input(0);
- auto input = input_tensor.flat<T>();
-
- // Create an output tensor
- Tensor* output = NULL;
- OP_REQUIRES_OK(context,
- context->allocate_output(0, input_tensor.shape(), &output));
- auto output_flat = output->template flat<T>();
-
- // Set all the elements of the output tensor to 0
- const int N = input.size();
- for (int i = 0; i < N; i++) {
- output_flat(i) = 0;
- }
-
- // Preserve the first input value
- if (N > 0) output_flat(0) = input(0);
- }
-};
-
-// Note that TypeConstraint<int32>("T") means that attr "T" (defined
-// in the op registration above) must be "int32" to use this template
-// instantiation.
-REGISTER_KERNEL_BUILDER(
- Name("ZeroOut")
- .Device(DEVICE_CPU)
- .TypeConstraint<int32>("T"),
- ZeroOutOp<int32>);
-REGISTER_KERNEL_BUILDER(
- Name("ZeroOut")
- .Device(DEVICE_CPU)
- .TypeConstraint<float>("T"),
- ZeroOutOp<float>);
-REGISTER_KERNEL_BUILDER(
- Name("ZeroOut")
- .Device(DEVICE_CPU)
- .TypeConstraint<double>("T"),
- ZeroOutOp<double>);
-```
-
-If you have more than a couple overloads, you can put the registration in a
-macro.
-
-```c++
-#include "tensorflow/core/framework/op_kernel.h"
-
-#define REGISTER_KERNEL(type) \
- REGISTER_KERNEL_BUILDER( \
- Name("ZeroOut").Device(DEVICE_CPU).TypeConstraint<type>("T"), \
- ZeroOutOp<type>)
-
-REGISTER_KERNEL(int32);
-REGISTER_KERNEL(float);
-REGISTER_KERNEL(double);
-
-#undef REGISTER_KERNEL
-```
-
-Depending on the list of types you are registering the kernel for, you may be
-able to use a macro provided by
-[`tensorflow/core/framework/register_types.h`][register_types]:
-
-```c++
-#include "tensorflow/core/framework/op_kernel.h"
-#include "tensorflow/core/framework/register_types.h"
-
-REGISTER_OP("ZeroOut")
- .Attr("T: realnumbertype")
- .Input("to_zero: T")
- .Output("zeroed: T");
-
-template <typename T>
-class ZeroOutOp : public OpKernel { ... };
-
-#define REGISTER_KERNEL(type) \
- REGISTER_KERNEL_BUILDER( \
- Name("ZeroOut").Device(DEVICE_CPU).TypeConstraint<type>("T"), \
- ZeroOutOp<type>)
-
-TF_CALL_REAL_NUMBER_TYPES(REGISTER_KERNEL);
-
-#undef REGISTER_KERNEL
-```
-
-##### List Inputs and Outputs
-
-In addition to being able to accept or produce different types, ops can consume
-or produce a variable number of tensors.
-
-In the next example, the attr `T` holds a *list* of types, and is used as the
-type of both the input `in` and the output `out`. The input and output are
-lists of tensors of that type (and the number and types of tensors in the output
-are the same as the input, since both have type `T`).
-
-```c++
-REGISTER_OP("PolymorphicListExample")
- .Attr("T: list(type)")
- .Input("in: T")
- .Output("out: T");
-```
-
-You can also place restrictions on what types can be specified in the list. In
-this next case, the input is a list of `float` and `double` tensors. The op
-accepts, for example, input types `(float, double, float)` and in that case the
-output type would also be `(float, double, float)`.
-
-```c++
-REGISTER_OP("ListTypeRestrictionExample")
- .Attr("T: list({float, double})")
- .Input("in: T")
- .Output("out: T");
-```
-
-If you want all the tensors in a list to be of the same type, you might do
-something like:
-
-```c++
-REGISTER_OP("IntListInputExample")
- .Attr("N: int")
- .Input("in: N * int32")
- .Output("out: int32");
-```
-
-This accepts a list of `int32` tensors, and uses an `int` attr `N` to
-specify the length of the list.
-
-This can be made [type polymorphic](#type-polymorphism) as well. In the next
-example, the input is a list of tensors (with length `"N"`) of the same (but
-unspecified) type (`"T"`), and the output is a single tensor of matching type:
-
-```c++
-REGISTER_OP("SameListInputExample")
- .Attr("N: int")
- .Attr("T: type")
- .Input("in: N * T")
- .Output("out: T");
-```
-
-By default, tensor lists have a minimum length of 1. You can change that default
-using
-[a `">="` constraint on the corresponding attr](#default-values-constraints).
-In this next example, the input is a list of at least 2 `int32` tensors:
-
-```c++
-REGISTER_OP("MinLengthIntListExample")
- .Attr("N: int >= 2")
- .Input("in: N * int32")
- .Output("out: int32");
-```
-
-The same syntax works with `"list(type)"` attrs:
-
-```c++
-REGISTER_OP("MinimumLengthPolymorphicListExample")
- .Attr("T: list(type) >= 3")
- .Input("in: T")
- .Output("out: T");
-```
-
-#### Inputs and Outputs
-
-To summarize the above, an op registration can have multiple inputs and outputs:
-
-```c++
-REGISTER_OP("MultipleInsAndOuts")
- .Input("y: int32")
- .Input("z: float")
- .Output("a: string")
- .Output("b: int32");
-```
-
-Each input or output spec is of the form:
-
-```
-<name>: <io-type-expr>
-```
-
-where `<name>` begins with a letter and can be composed of alphanumeric
-characters and underscores. `<io-type-expr>` is one of the following type
-expressions:
-
-* `<type>`, where `<type>` is a supported input type (e.g. `float`, `int32`,
- `string`). This specifies a single tensor of the given type.
-
- See
- `tf.DType`.
-
- ```c++
- REGISTER_OP("BuiltInTypesExample")
- .Input("integers: int32")
- .Input("complex_numbers: complex64");
- ```
-
-* `<attr-type>`, where `<attr-type>` is the name of an [Attr](#attrs) with type
- `type` or `list(type)` (with a possible type restriction). This syntax allows
- for [polymorphic ops](#polymorphism).
-
- ```c++
- REGISTER_OP("PolymorphicSingleInput")
- .Attr("T: type")
- .Input("in: T");
-
- REGISTER_OP("RestrictedPolymorphicSingleInput")
- .Attr("T: {int32, int64}")
- .Input("in: T");
- ```
-
- Referencing an attr of type `list(type)` allows you to accept a sequence of
- tensors.
-
- ```c++
- REGISTER_OP("ArbitraryTensorSequenceExample")
- .Attr("T: list(type)")
- .Input("in: T")
- .Output("out: T");
-
- REGISTER_OP("RestrictedTensorSequenceExample")
- .Attr("T: list({int32, int64})")
- .Input("in: T")
- .Output("out: T");
- ```
-
- Note that the number and types of tensors in the output `out` are the same as
- in the input `in`, since both are of type `T`.
-
-* For a sequence of tensors with the same type: `<number> * <type>`, where
- `<number>` is the name of an [Attr](#attrs) with type `int`. The `<type>` can
- either be a `tf.DType`,
- or the name of an attr with type `type`. As an example of the first, this
- op accepts a list of `int32` tensors:
-
- ```c++
- REGISTER_OP("Int32SequenceExample")
- .Attr("NumTensors: int")
- .Input("in: NumTensors * int32")
- ```
-
- Whereas this op accepts a list of tensors of any type, as long as they are all
- the same:
-
- ```c++
- REGISTER_OP("SameTypeSequenceExample")
- .Attr("NumTensors: int")
- .Attr("T: type")
- .Input("in: NumTensors * T")
- ```
-
-* For a reference to a tensor: `Ref(<type>)`, where `<type>` is one of the
- previous types.
-
-> A note on naming: Any attr used in the type of an input will be inferred. By
-> convention those inferred attrs use capital names (like `T` or `N`).
-> Otherwise inputs, outputs, and attrs have names like function parameters
-> (e.g. `num_outputs`). For more details, see the
-> [earlier note on naming](#naming).
-
-For more details, see
-[`tensorflow/core/framework/op_def_builder.h`][op_def_builder].
-
-#### Backwards compatibility
-
-Let's assume you have written a nice, custom op and shared it with others, so
-you have happy customers using your operation. However, you'd like to make
-changes to the op in some way.
-
-In general, changes to existing, checked-in specifications must be
-backwards-compatible: changing the specification of an op must not break prior
-serialized `GraphDef` protocol buffers constructed from older specifications.
-The details of `GraphDef` compatibility are
-[described here](../guide/version_compat.md#compatibility_of_graphs_and_checkpoints).
-
-There are several ways to preserve backwards-compatibility.
-
-1. Any new attrs added to an operation must have default values defined, and
- with that default value the op must have the original behavior. To change an
- operation from not polymorphic to polymorphic, you *must* give a default
- value to the new type attr to preserve the original signature by default. For
- example, if your operation was:
-
- REGISTER_OP("MyGeneralUnaryOp")
- .Input("in: float")
- .Output("out: float");
-
- you can make it polymorphic in a backwards-compatible way using:
-
- REGISTER_OP("MyGeneralUnaryOp")
- .Input("in: T")
- .Output("out: T")
- .Attr("T: numbertype = DT_FLOAT");
-
-2. You can safely make a constraint on an attr less restrictive. For example,
- you can change from `{int32, int64}` to `{int32, int64, float}` or `type`.
- Or you may change from `{"apple", "orange"}` to `{"apple", "banana",
- "orange"}` or `string`.
-
-3. You can change single inputs / outputs into list inputs / outputs, as long as
- the default for the list type matches the old signature.
-
-4. You can add a new list input / output, if it defaults to empty.
-
-5. Namespace any new ops you create, by prefixing the op names with something
- unique to your project. This avoids having your op collide with any ops
- that might be included in future versions of TensorFlow.
-
-6. Plan ahead! Try to anticipate future uses for the op. Some signature changes
- can't be done in a compatible way (for example, making a list of the same
- type into a list of varying types).
-
-The full list of safe and unsafe changes can be found in
-[`tensorflow/core/framework/op_compatibility_test.cc`](https://www.tensorflow.org/code/tensorflow/core/framework/op_compatibility_test.cc).
-If you cannot make your change to an operation backwards compatible, then create
-a new operation, with a new name, that has the new semantics.
-
-Also note that while these changes can maintain `GraphDef` compatibility, the
-generated Python code may change in a way that isn't compatible with old
-callers. The Python API may be kept compatible by careful changes in a
-hand-written Python wrapper, by keeping the old signature except possibly adding
-new optional arguments to the end. Generally incompatible changes may only be
-made when TensorFlow changes major versions, and must conform to the
-[`GraphDef` version semantics](../guide/version_compat.md#compatibility_of_graphs_and_checkpoints).
-
-### GPU Support
-
-You can implement different OpKernels and register one for CPU and another for
-GPU, just like you can [register kernels for different types](#polymorphism).
-There are several examples of kernels with GPU support in
-[`tensorflow/core/kernels/`](https://www.tensorflow.org/code/tensorflow/core/kernels/).
-Notice some kernels have a CPU version in a `.cc` file, a GPU version in a file
-ending in `_gpu.cu.cc`, and some code shared in common in a `.h` file.
-
-For example, the `tf.pad` op has
-everything but the GPU kernel in [`tensorflow/core/kernels/pad_op.cc`][pad_op].
-The GPU kernel is in
-[`tensorflow/core/kernels/pad_op_gpu.cu.cc`](https://www.tensorflow.org/code/tensorflow/core/kernels/pad_op_gpu.cu.cc),
-and the shared code is a templated class defined in
-[`tensorflow/core/kernels/pad_op.h`](https://www.tensorflow.org/code/tensorflow/core/kernels/pad_op.h).
-We organize the code this way for two reasons: it allows you to share common
-code among the CPU and GPU implementations, and it puts the GPU implementation
-into a separate file so that it can be compiled only by the GPU compiler.
-
-One thing to note is that even when the GPU kernel version of `pad` is used, it still
-needs its `"paddings"` input in CPU memory. To mark that inputs or outputs are
-kept on the CPU, add a `HostMemory()` call to the kernel registration, e.g.:
-
-```c++
-#define REGISTER_GPU_KERNEL(T) \
- REGISTER_KERNEL_BUILDER(Name("Pad") \
- .Device(DEVICE_GPU) \
- .TypeConstraint<T>("T") \
- .HostMemory("paddings"), \
- PadOp<GPUDevice, T>)
-```
-
-#### Compiling the kernel for the GPU device
-
-Look at
-[cuda_op_kernel.cu.cc](https://www.tensorflow.org/code/tensorflow/examples/adding_an_op/cuda_op_kernel.cu.cc)
-for an example that uses a CUDA kernel to implement an op. The
-`tf_custom_op_library` macro accepts a `gpu_srcs` argument in which the list of source
-files containing the CUDA kernels (`*.cu.cc` files) can be specified. For use
-with a binary installation of TensorFlow, the CUDA kernels have to be compiled
-with NVIDIA's `nvcc` compiler. Here is the sequence of commands you can use to
-compile the
-[cuda_op_kernel.cu.cc](https://www.tensorflow.org/code/tensorflow/examples/adding_an_op/cuda_op_kernel.cu.cc)
-and
-[cuda_op_kernel.cc](https://www.tensorflow.org/code/tensorflow/examples/adding_an_op/cuda_op_kernel.cc)
-into a single dynamically loadable library:
-
-```bash
-nvcc -std=c++11 -c -o cuda_op_kernel.cu.o cuda_op_kernel.cu.cc \
- ${TF_CFLAGS[@]} -D GOOGLE_CUDA=1 -x cu -Xcompiler -fPIC
-
-g++ -std=c++11 -shared -o cuda_op_kernel.so cuda_op_kernel.cc \
- cuda_op_kernel.cu.o ${TF_CFLAGS[@]} -fPIC -lcudart ${TF_LFLAGS[@]}
-```
-
-`cuda_op_kernel.so` produced above can be loaded as usual in Python, using the
-`tf.load_op_library` function.
-
-Note that if your CUDA libraries are not installed in `/usr/local/lib64`,
-you'll need to specify the path explicitly in the second (g++) command above.
-For example, add `-L /usr/local/cuda-8.0/lib64/` if your CUDA is installed in
-`/usr/local/cuda-8.0`.
-
-> Note: in some Linux settings, additional options to the `nvcc` compile step are needed. Add `-D_MWAITXINTRIN_H_INCLUDED` to the `nvcc` command line to avoid errors from `mwaitxintrin.h`.
-
-### Implement the gradient in Python
-
-Given a graph of ops, TensorFlow uses automatic differentiation
-(backpropagation) to add new ops representing gradients with respect to the
-existing ops (see
-[Gradient Computation](../api_guides/python/train.md#gradient_computation)).
-To make automatic differentiation work for new ops, you must register a gradient
-function which computes gradients with respect to the ops' inputs given
-gradients with respect to the ops' outputs.
-
-Mathematically, if an op computes \\(y = f(x)\\) the registered gradient op
-converts gradients \\(\partial L/ \partial y\\) of loss \\(L\\) with respect to
-\\(y\\) into gradients \\(\partial L/ \partial x\\) with respect to \\(x\\) via
-the chain rule:
-
-$$\frac{\partial L}{\partial x}
- = \frac{\partial L}{\partial y} \frac{\partial y}{\partial x}
- = \frac{\partial L}{\partial y} \frac{\partial f}{\partial x}.$$
-
-In the case of `ZeroOut`, only one entry in the input affects the output, so the
-gradient with respect to the input is a sparse "one hot" tensor. This is
-expressed as follows:
-
-```python
-from tensorflow.python.framework import ops
-from tensorflow.python.ops import array_ops
-from tensorflow.python.ops import sparse_ops
-
-@ops.RegisterGradient("ZeroOut")
-def _zero_out_grad(op, grad):
- """The gradients for `zero_out`.
-
- Args:
- op: The `zero_out` `Operation` that we are differentiating, which we can use
- to find the inputs and outputs of the original op.
- grad: Gradient with respect to the output of the `zero_out` op.
-
- Returns:
- Gradients with respect to the input of `zero_out`.
- """
- to_zero = op.inputs[0]
- shape = array_ops.shape(to_zero)
- index = array_ops.zeros_like(shape)
- first_grad = array_ops.reshape(grad, [-1])[0]
- to_zero_grad = sparse_ops.sparse_to_dense([index], shape, first_grad, 0)
- return [to_zero_grad] # List of one Tensor, since we have one input
-```
-
-Details about registering gradient functions with
-`tf.RegisterGradient`:
-
-* For an op with one output, the gradient function will take an
- `tf.Operation` `op` and a
- `tf.Tensor` `grad` and build new ops
- out of the tensors
- [`op.inputs[i]`](../../api_docs/python/framework.md#Operation.inputs),
- [`op.outputs[i]`](../../api_docs/python/framework.md#Operation.outputs), and `grad`. Information
- about any attrs can be found via
- `tf.Operation.get_attr`.
-
-* If the op has multiple outputs, the gradient function will take `op` and
- `grads`, where `grads` is a list of gradients with respect to each output.
- The result of the gradient function must be a list of `Tensor` objects
- representing the gradients with respect to each input.
-
-* If there is no well-defined gradient for some input, such as for integer
- inputs used as indices, the corresponding returned gradient should be
- `None`. For example, for an op taking a floating point tensor `x` and an
- integer index `i`, the gradient function would `return [x_grad, None]`.
-
-* If there is no meaningful gradient for the op at all, you often will not have
- to register any gradient, and as long as the op's gradient is never needed,
- you will be fine. In some cases, an op has no well-defined gradient but can
- be involved in the computation of the gradient. Here you can use
- `ops.NotDifferentiable` to automatically propagate zeros backwards.
-
-Note that at the time the gradient function is called, only the data flow graph
-of ops is available, not the tensor data itself. Thus, all computation must be
-performed using other TensorFlow ops, to be run at graph execution time.
-
-### Shape functions in C++
-
-The TensorFlow API has a feature called "shape inference" that provides
-information about the shapes of tensors without having to execute the
-graph. Shape inference is supported by "shape functions" that are registered for
-each op type in the C++ `REGISTER_OP` declaration, and perform two roles:
-asserting that the shapes of the inputs are compatible during graph
-construction, and specifying the shapes for the outputs.
-
-Shape functions are defined as operations on the
-`shape_inference::InferenceContext` class. For example, in the shape function
-for ZeroOut:
-
-```c++
- .SetShapeFn([](::tensorflow::shape_inference::InferenceContext* c) {
- c->set_output(0, c->input(0));
- return Status::OK();
- });
-```
-
-`c->set_output(0, c->input(0));` declares that the first output's shape should
-be set to the first input's shape. If the output is selected by its index as in
-the above example, the second parameter of `set_output` should be a
-`ShapeHandle` object. You can create an empty `ShapeHandle` object with its
-default constructor; the `ShapeHandle` object for the input with index `idx`
-can be obtained with `c->input(idx)`.
-
-There are a number of common shape functions
-that apply to many ops, such as `shape_inference::UnchangedShape` which can be
-found in [common_shape_fns.h](https://www.tensorflow.org/code/tensorflow/core/framework/common_shape_fns.h) and used as follows:
-
-```c++
-REGISTER_OP("ZeroOut")
- .Input("to_zero: int32")
- .Output("zeroed: int32")
- .SetShapeFn(::tensorflow::shape_inference::UnchangedShape);
-```
-
-A shape function can also constrain the shape of an input. For the version of
-[`ZeroOut` with a vector shape constraint](#validation), the shape function
-would be as follows:
-
-```c++
- .SetShapeFn([](::tensorflow::shape_inference::InferenceContext* c) {
- ::tensorflow::shape_inference::ShapeHandle input;
- TF_RETURN_IF_ERROR(c->WithRank(c->input(0), 1, &input));
- c->set_output(0, input);
- return Status::OK();
- });
-```
-
-The `WithRank` call validates that the input shape `c->input(0)` has
-a shape with exactly one dimension (or if the input shape is unknown,
-the output shape will be a vector with one unknown dimension).
-
-If your op is [polymorphic with multiple inputs](#polymorphism), you can use
-members of `InferenceContext` to determine the number of shapes to check, and
-`Merge` to validate that the shapes are all compatible (alternatively, access
-attributes that indicate the lengths, with `InferenceContext::GetAttr`, which
-provides access to the attributes of the op).
-
-```c++
- .SetShapeFn([](::tensorflow::shape_inference::InferenceContext* c) {
- ::tensorflow::shape_inference::ShapeHandle input;
- ::tensorflow::shape_inference::ShapeHandle output;
- for (size_t i = 0; i < c->num_inputs(); ++i) {
- TF_RETURN_IF_ERROR(c->WithRank(c->input(i), 2, &input));
- TF_RETURN_IF_ERROR(c->Merge(output, input, &output));
- }
- c->set_output(0, output);
- return Status::OK();
- });
-```
-
-Since shape inference is an optional feature, and the shapes of tensors may vary
-dynamically, shape functions must be robust to incomplete shape information for
-any of the inputs. The `Merge` method in [`InferenceContext`](https://www.tensorflow.org/code/tensorflow/core/framework/shape_inference.h)
-allows the caller to assert that two shapes are the same, even if either
-or both of them do not have complete information. Shape functions are defined
-for all of the core TensorFlow ops and provide many different usage examples.
-
-The `InferenceContext` class has a number of functions that can be used to
-define shape function manipulations. For example, you can validate that a
-particular dimension has a very specific value using `InferenceContext::Dim` and
-`InferenceContext::WithValue`; you can specify that an output dimension is the
-sum / product of two input dimensions using `InferenceContext::Add` and
-`InferenceContext::Multiply`. See the `InferenceContext` class for
-all of the various shape manipulations you can specify. The following example
-sets the shape of the first output to (n, 3), where the first input has shape
-(n, ...).
-
-```c++
-.SetShapeFn([](::tensorflow::shape_inference::InferenceContext* c) {
- c->set_output(0, c->Matrix(c->Dim(c->input(0), 0), 3));
- return Status::OK();
-});
-```
-
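-Similarly, here is a sketch (not from the original docs; it assumes a
-hypothetical op that concatenates two vectors) in which the output length is
-the sum of the two input lengths, using `InferenceContext::Add`:
-
-```c++
-.SetShapeFn([](::tensorflow::shape_inference::InferenceContext* c) {
-  ::tensorflow::shape_inference::ShapeHandle in0;
-  ::tensorflow::shape_inference::ShapeHandle in1;
-  TF_RETURN_IF_ERROR(c->WithRank(c->input(0), 1, &in0));
-  TF_RETURN_IF_ERROR(c->WithRank(c->input(1), 1, &in1));
-  ::tensorflow::shape_inference::DimensionHandle out_dim;
-  // The output dimension is the sum of the two input dimensions; if either is
-  // unknown, the sum is an unknown dimension.
-  TF_RETURN_IF_ERROR(c->Add(c->Dim(in0, 0), c->Dim(in1, 0), &out_dim));
-  c->set_output(0, c->Vector(out_dim));
-  return Status::OK();
-});
-```
-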
-If you have a complicated shape function, you should consider adding a test for
-validating that various input shape combinations produce the expected output
-shape combinations. You can see examples of how to write these tests in some of
-our
-[core ops tests](https://www.tensorflow.org/code/tensorflow/core/ops/array_ops_test.cc).
-(The syntax of `INFER_OK` and `INFER_ERROR` is a little cryptic, but it tries to
-be compact in representing input and output shape specifications in tests. For
-now, see the surrounding comments in those tests to get a sense of the shape
-string specification.)
-
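-As a rough sketch (not from the original docs; see the linked tests for the
-exact conventions), a test for the vector-constrained `ZeroOut` shape function
-above might look like:
-
-```c++
-#include "tensorflow/core/framework/shape_inference_testutil.h"
-#include "tensorflow/core/platform/test.h"
-
-TEST(ZeroOutShapeTest, VectorShapeFn) {
-  ::tensorflow::ShapeInferenceTestOp op("ZeroOut");
-  // "[10]" is the input shape; "in0" says the output must have the same shape
-  // as input 0.
-  INFER_OK(op, "[10]", "in0");
-  // A rank-2 input should be rejected by the WithRank(..., 1, ...) check.
-  INFER_ERROR("Shape must be rank 1", op, "[1,2]");
-}
-```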
-
-[core-array_ops]:https://www.tensorflow.org/code/tensorflow/core/ops/array_ops.cc
-[python-user_ops]:https://www.tensorflow.org/code/tensorflow/python/user_ops/user_ops.py
-[tf-kernels]:https://www.tensorflow.org/code/tensorflow/core/kernels/
-[user_ops]:https://www.tensorflow.org/code/tensorflow/core/user_ops/
-[pad_op]:https://www.tensorflow.org/code/tensorflow/core/kernels/pad_op.cc
-[standard_ops-py]:https://www.tensorflow.org/code/tensorflow/python/ops/standard_ops.py
-[standard_ops-cc]:https://www.tensorflow.org/code/tensorflow/cc/ops/standard_ops.h
-[python-BUILD]:https://www.tensorflow.org/code/tensorflow/python/BUILD
-[validation-macros]:https://www.tensorflow.org/code/tensorflow/core/lib/core/errors.h
-[op_def_builder]:https://www.tensorflow.org/code/tensorflow/core/framework/op_def_builder.h
-[register_types]:https://www.tensorflow.org/code/tensorflow/core/framework/register_types.h
-[FinalizeAttr]:https://www.tensorflow.org/code/tensorflow/core/framework/op_def_builder.cc
-[DataTypeString]:https://www.tensorflow.org/code/tensorflow/core/framework/types.cc
-[types-proto]:https://www.tensorflow.org/code/tensorflow/core/framework/types.proto
-[TensorShapeProto]:https://www.tensorflow.org/code/tensorflow/core/framework/tensor_shape.proto
-[TensorProto]:https://www.tensorflow.org/code/tensorflow/core/framework/tensor.proto