path: root/tensorflow/g3doc/how_tos/adding_an_op/index.md
author Manjunath Kudlur <keveman@gmail.com> 2015-11-06 16:27:58 -0800
committer Manjunath Kudlur <keveman@gmail.com> 2015-11-06 16:27:58 -0800
commit f41959ccb2d9d4c722fe8fc3351401d53bcf4900
tree ef0ca22cb2a5ac4bdec9d080d8e0788a53ed496d /tensorflow/g3doc/how_tos/adding_an_op/index.md
TensorFlow: Initial commit of TensorFlow library.
TensorFlow is an open source software library for numerical computation using data flow graphs. Base CL: 107276108
Diffstat (limited to 'tensorflow/g3doc/how_tos/adding_an_op/index.md')
1 files changed, 1015 insertions, 0 deletions
diff --git a/tensorflow/g3doc/how_tos/adding_an_op/index.md b/tensorflow/g3doc/how_tos/adding_an_op/index.md
new file mode 100644
index 0000000000..5c6243cd9c
--- /dev/null
+++ b/tensorflow/g3doc/how_tos/adding_an_op/index.md
@@ -0,0 +1,1015 @@
+# Adding a New Op to TensorFlow
+PREREQUISITES:
+* Some familiarity with C++.
+* Must have [downloaded TensorFlow source](../../get_started/index.md#source),
+ and be able to build it.
+If you'd like to incorporate an operation that isn't covered by the existing
+library, you can create a custom Op. To incorporate your custom Op, you'll need
+to:
+* Register the new Op in a C++ file. The Op registration is independent of the
+ implementation, and describes the semantics of how the Op is invoked. For
+ example, it defines the Op name, and specifies its inputs and outputs.
+* Implement the Op in C++. This implementation is called a "kernel", and there
+ can be multiple kernels for different architectures (e.g. CPUs, GPUs) or
+ input / output types.
+* Create a Python wrapper. This wrapper is the public API to create the Op. A
+ default wrapper is generated from the Op registration, which can be used
+ directly or added to.
+* Optionally, write a function to compute gradients for the Op.
+* Optionally, write a function that describes the input and output shapes
+ for the Op. This allows shape inference to work with your Op.
+* Test the Op, typically in Python. If you define gradients, verify them with
+ the Python [`GradientChecker`](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/python/kernel_tests/gradient_checker.py).
+<!-- TOC-BEGIN This section is generated by neural network: DO NOT EDIT! -->
+## Contents
+* [Define the Op's interface](#define_interface)
+* [Implement the kernel for the Op](#AUTOGENERATED-implement-the-kernel-for-the-op)
+* [Generate the client wrapper](#AUTOGENERATED-generate-the-client-wrapper)
+ * [The Python Op wrapper](#AUTOGENERATED-the-python-op-wrapper)
+ * [The C++ Op wrapper](#AUTOGENERATED-the-c---op-wrapper)
+* [Verify it works](#AUTOGENERATED-verify-it-works)
+* [Validation](#validation)
+* [Op registration](#AUTOGENERATED-op-registration)
+ * [Attrs](#AUTOGENERATED-attrs)
+ * [Attr types](#AUTOGENERATED-attr-types)
+ * [Polymorphism](#polymorphism)
+ * [Inputs and Outputs](#AUTOGENERATED-inputs-and-outputs)
+ * [Backwards compatibility](#AUTOGENERATED-backwards-compatibility)
+* [GPU Support](#mult-archs)
+* [Implement the gradient in Python](#AUTOGENERATED-implement-the-gradient-in-python)
+* [Implement a shape function in Python](#AUTOGENERATED-implement-a-shape-function-in-python)
+<!-- TOC-END This section was generated by neural network, THANKS FOR READING! -->
+## Define the Op's interface <div class="md-anchor" id="define_interface">{#define_interface}</div>
+You define the interface of an Op by registering it with the TensorFlow system.
+In the registration, you specify the name of your Op, its inputs (types and
+names) and outputs (types and names), as well as [docstrings](#docstrings) and
+any [attrs](#attrs) the Op might require.
+To see how this works, suppose you'd like to create an Op that takes a tensor of
+`int32`s and outputs a copy of the tensor, with all but the first element set to
+zero. Create file [`tensorflow/core/user_ops`][user_ops]`/zero_out.cc` and
+add a call to the `REGISTER_OP` macro that defines the interface for such an Op:
+```c++
+#include "tensorflow/core/framework/op.h"
+
+REGISTER_OP("ZeroOut")
+    .Input("to_zero: int32")
+    .Output("zeroed: int32");
+```
+This `ZeroOut` Op takes one tensor `to_zero` of 32-bit integers as input, and
+outputs a tensor `zeroed` of 32-bit integers.
+> A note on naming: The name of the Op should be unique and CamelCase. Names
+> starting with an underscore (`_`) are reserved for internal use.
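To make the intended behaviour of `ZeroOut` concrete, here is a minimal pure-Python reference model. It is only a sketch of the semantics described above: it works on plain lists rather than tensors, and the name `zero_out_reference` is hypothetical, not part of TensorFlow.

```python
def zero_out_reference(values):
    """Return a copy of `values` with every element except the first set to 0."""
    # Start from an all-zero list of the same length as the input.
    result = [0] * len(values)
    # Preserve the first element only if the input is non-empty.
    if values:
        result[0] = values[0]
    return result


print(zero_out_reference([5, 4, 3, 2, 1]))
```

A model like this can be handy as an oracle when writing the Python test for the real Op.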
+## Implement the kernel for the Op <div class="md-anchor" id="AUTOGENERATED-implement-the-kernel-for-the-op">{#AUTOGENERATED-implement-the-kernel-for-the-op}</div>
+After you define the interface, provide one or more implementations of the Op.
+To create one of these kernels, create a class that extends `OpKernel` and
+overrides the `Compute` method. The `Compute` method provides one `context`
+argument of type `OpKernelContext*`, from which you can access useful things
+like the input and output tensors.
+Add your kernel to the file you created above. The kernel might look something
+like this:
+```c++
+#include "tensorflow/core/framework/op_kernel.h"
+
+using namespace tensorflow;
+
+class ZeroOutOp : public OpKernel {
+ public:
+  explicit ZeroOutOp(OpKernelConstruction* context) : OpKernel(context) {}
+
+  void Compute(OpKernelContext* context) override {
+    // Grab the input tensor
+    const Tensor& input_tensor = context->input(0);
+    auto input = input_tensor.flat<int32>();
+
+    // Create an output tensor
+    Tensor* output_tensor = NULL;
+    OP_REQUIRES_OK(context, context->allocate_output(0, input_tensor.shape(),
+                                                     &output_tensor));
+    auto output = output_tensor->template flat<int32>();
+
+    // Set all but the first element of the output tensor to 0.
+    const int N = input.size();
+    for (int i = 1; i < N; i++) {
+      output(i) = 0;
+    }
+
+    // Preserve the first input value if possible.
+    if (N > 0) output(0) = input(0);
+  }
+};
+```
+After implementing your kernel, you register it with the TensorFlow system. In
+the registration, you specify different constraints under which this kernel
+will run. For example, you might have one kernel made for CPUs, and a separate
+one for GPUs.
+To do this for the `ZeroOut` op, add the following to `zero_out.cc`:
+TODO: instructions or pointer to building TF
+At this point, the TensorFlow system can reference and use the Op.
+## Generate the client wrapper <div class="md-anchor" id="AUTOGENERATED-generate-the-client-wrapper">{#AUTOGENERATED-generate-the-client-wrapper}</div>
+### The Python Op wrapper <div class="md-anchor" id="AUTOGENERATED-the-python-op-wrapper">{#AUTOGENERATED-the-python-op-wrapper}</div>
+Python op wrappers are created automatically in
+`bazel-genfiles/tensorflow/python/ops/gen_user_ops.py` for all ops placed in the
+[`tensorflow/core/user_ops`][user_ops] directory when you build TensorFlow.
+Those ops are imported into
+[`tensorflow/python/user_ops/user_ops.py`][python-user_ops] with the statement:
+```python
+from tensorflow.python.ops.gen_user_ops import *
+```
+You may optionally use your own function instead. To do this, you first hide
+the generated code for that op by adding its name to the `hidden` list in the
+`"user_ops"` rule in
+ name = "user_ops",
+ hidden = [
+ "Fact",
+ ],
+ require_shape_functions = False,
+List your op next to `"Fact"`. Next you add your replacement function to
+Typically your function will call the generated function to actually add the op
+to the graph. The hidden version of the generated function will be in the
+`gen_user_ops` package and start with an underscore ("`_`"). For example:
+```python
+def my_fact():
+  """Example of overriding the generated code for an Op."""
+  return gen_user_ops._fact()
+```
+### The C++ Op wrapper <div class="md-anchor" id="AUTOGENERATED-the-c---op-wrapper">{#AUTOGENERATED-the-c---op-wrapper}</div>
+C++ op wrappers are created automatically for all ops placed in the
+[`tensorflow/core/user_ops`][user_ops] directory, when you build Tensorflow. For
+example, ops in `tensorflow/core/user_ops/zero_out.cc` will generate wrappers in
+All generated wrappers for user ops are automatically
+imported into [`tensorflow/cc/ops/standard_ops.h`][standard_ops-cc] with the
+statement:
+```c++
+#include "tensorflow/cc/ops/user_ops.h"
+```
+## Verify it works <div class="md-anchor" id="AUTOGENERATED-verify-it-works">{#AUTOGENERATED-verify-it-works}</div>
+A good way to verify that you've successfully implemented your Op is to write a
+test for it. Create the file
+`tensorflow/python/kernel_tests/zero_out_op_test.py` with the contents:
+[TODO]:# (put tests somewhere else and make sure it works)
+```python
+import tensorflow as tf
+
+class ZeroOutTest(tf.test.TestCase):
+  def testZeroOut(self):
+    with self.test_session():
+      result = tf.user_ops.zero_out([5, 4, 3, 2, 1])
+      self.assertAllEqual(result.eval(), [5, 0, 0, 0, 0])
+```
+Then run your test:
+```sh
+$ bazel test tensorflow/python:zero_out_op_test
+```
+## Validation <div class="md-anchor" id="validation">{#validation}</div>
+The example above assumed that the Op applied to a tensor of any shape. What
+if it only applied to vectors? That means adding a check to the above OpKernel
+implementation:
+```c++
+  void Compute(OpKernelContext* context) override {
+    // Grab the input tensor
+    const Tensor& input_tensor = context->input(0);
+
+    OP_REQUIRES(context, TensorShapeUtils::IsVector(input_tensor.shape()),
+                errors::InvalidArgument("ZeroOut expects a 1-D vector."));
+    // ...
+  }
+```
+This asserts that the input is a vector, and returns having set the
+`InvalidArgument` status if it isn't. The
+[OP_REQUIRES macro][validation-macros] takes three arguments:
+* The `context`, which can either be an `OpKernelContext` or
+ `OpKernelConstruction` pointer (see
+ [`tensorflow/core/framework/op_kernel.h`](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/framework/op_kernel.h)),
+ for its `SetStatus()` method.
+* The condition. For example, there are functions for validating the shape
+ of a tensor in [`tensorflow/core/public/tensor_shape.h`](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/public/tensor_shape.h).
+* The error itself, which is represented by a `Status` object, see
+ [`tensorflow/core/public/status.h`](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/public/status.h). A
+ `Status` has both a type (frequently `InvalidArgument`, but see the list of
+ types) and a message. Functions for constructing an error may be found in
+ [`tensorflow/core/lib/core/errors.h`][validation-macros].
+Alternatively, if you want to test whether a `Status` object returned from some
+function is an error, and if so return it, use
+[`OP_REQUIRES_OK`][validation-macros]. Both of these macros return from the
+function on error.
+## Op registration <div class="md-anchor" id="AUTOGENERATED-op-registration">{#AUTOGENERATED-op-registration}</div>
+### Attrs <div class="md-anchor" id="AUTOGENERATED-attrs">{#AUTOGENERATED-attrs}</div>
+Ops can have attrs, whose values are set when the Op is added to a graph. These
+are used to configure the Op, and their values can be accessed both within the
+kernel implementation and in the types of inputs and outputs in the Op
+registration. Prefer using an input instead of an attr when possible, since
+inputs are more flexible. They can change every step, be set using a feed, etc.
+Attrs are used for things that can't be done with inputs: any configuration
+that affects the signature (number or type of inputs or outputs) or that
+can't change from step-to-step.
+You define an attr when you register the Op, by specifying its name and type
+using the `Attr` method, which expects a spec of the form:
+```
+<name>: <attr-type-expr>
+```
+where `<name>` begins with a letter and can be composed of alphanumeric
+characters and underscores, and `<attr-type-expr>` is a type expression of the
+form [described below](#attr-types).
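As an illustration of the `<name>: <attr-type-expr>` form, the following sketch splits a spec string into its two parts and enforces the naming rule stated above. It is a hypothetical helper (`split_attr_spec` is not a TensorFlow function) and makes no attempt to parse the type expression itself.

```python
import re

# A name begins with a letter and continues with letters, digits or underscores.
# Everything after the first colon is treated as the type expression.
_SPEC_RE = re.compile(r"^\s*([A-Za-z][A-Za-z0-9_]*)\s*:\s*(\S.*?)\s*$")


def split_attr_spec(spec):
    """Split '<name>: <attr-type-expr>' into a (name, type_expr) tuple."""
    match = _SPEC_RE.match(spec)
    if match is None:
        raise ValueError("malformed attr spec: %r" % (spec,))
    return match.group(1), match.group(2)


print(split_attr_spec("preserve_index: int"))
```

Only the name is validated here; a default value or constraint simply stays part of the returned type expression.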
+For example, if you'd like the `ZeroOut` Op to preserve a user-specified index,
+instead of only the 0th element, you can register the Op like so:
+<code class="lang-c++"><pre>
+REGISTER\_OP("ZeroOut")
+    <b>.Attr("preserve\_index: int")</b>
+    .Input("to_zero: int32")
+    .Output("zeroed: int32");
+</pre></code>
+Your kernel can then access this attr in its constructor via the `context`
+argument:
+<code class="lang-c++"><pre>
+class ZeroOutOp : public OpKernel {
+ public:
+  explicit ZeroOutOp(OpKernelConstruction\* context) : OpKernel(context) {<b>
+    // Get the index of the value to preserve
+    OP\_REQUIRES\_OK(context,
+                   context->GetAttr("preserve\_index", &preserve\_index\_));
+  </b>}
+  void Compute(OpKernelContext\* context) override {
+    // ...
+  }
+ <b>private:
+  int preserve\_index\_;</b>
+};
+</pre></code>
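+which can then be used in the `Compute` method:
+<code class="lang-c++"><pre>
+  void Compute(OpKernelContext\* context) override {
+    // ...
+    // Set all the elements of the output tensor to 0
+    const int N = input.size();
+    for (int i = 0; i < N; i++) {
+      output\_flat(i) = 0;
+    }<br>
+    <b>// Check that the requested index is in range before using it
+    OP\_REQUIRES(context, preserve\_index\_ >= 0 && preserve\_index\_ < N,
+                errors::InvalidArgument("preserve\_index out of range"));
+    // Preserve the requested input value
+    output\_flat(preserve\_index\_) = input(preserve\_index\_);</b>
+  }
+</pre></code>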
+[TODO]:# (check the code in this section in and test it)
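The effect of the `preserve_index` attr can be modelled in a few lines of plain Python. This is a sketch under assumptions: it operates on lists, the function name is hypothetical, and treating an out-of-range index as an error is a choice made for this example rather than documented behaviour.

```python
def zero_out_preserving(values, preserve_index=0):
    """Zero every element of `values` except the one at `preserve_index`."""
    # Assumption for this sketch: an index outside the input is an error.
    if not 0 <= preserve_index < len(values):
        raise ValueError("preserve_index out of range")
    result = [0] * len(values)
    result[preserve_index] = values[preserve_index]
    return result


print(zero_out_preserving([5, 4, 3, 2, 1], preserve_index=2))
```

With the default of `0` the function behaves like the original `ZeroOut` on non-empty input, which mirrors why a default value keeps existing callers working.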
+> To preserve [backwards compatibility](#backwards-compatibility), you should
+> specify a [default value](#default-values-constraints) when adding an attr to
+> an existing op:
+> <code class="lang-c++"><pre>
+> REGISTER\_OP("ZeroOut")
+> <b>.Attr("preserve\_index: int = 0")</b>
+> .Input("to_zero: int32")
+> .Output("zeroed: int32");
+> </pre></code>
+### Attr types <div class="md-anchor" id="AUTOGENERATED-attr-types">{#AUTOGENERATED-attr-types}</div>
+The following types are supported in an attr:
+* `string`: Any sequence of bytes (not required to be UTF8).
+* `int`: A signed integer.
+* `float`: A floating point number.
+* `bool`: True or false.
+* `type`: One of the (non-ref) values of [`DataType`][DataTypeString].
+* `shape`: A [`TensorShapeProto`][TensorShapeProto].
+* `tensor`: A [`TensorProto`][TensorProto].
+* `list(<type>)`: A list of `<type>`, where `<type>` is one of the above types.
+ Note that `list(list(<type>))` is invalid.
+See also: [op_def_builder.cc:FinalizeAttr][FinalizeAttr] for a definitive list.
+#### Default values & constraints
+Attrs may have default values, and some types of attrs can have constraints. To
+define an attr with constraints, you can use the following `<attr-type-expr>`s:
+* `{'<string1>', '<string2>'}`: The value must be a string that has either the
+ value `<string1>` or `<string2>`. The name of the type, `string`, is implied
+ when you use this syntax. This emulates an enum:
+ ```c++
+ REGISTER_OP("EnumExample")
+ .Attr("e: {'apple', 'orange'}");
+ ```
+* `{<type1>, <type2>}`: The value is of type `type`, and must be one of
+ `<type1>` or `<type2>`, where `<type1>` and `<type2>` are supported
+ [tensor types](../../resources/dims_types.md#data-types). You don't specify
+ that the type of the attr is `type`. This is implied when you have a list of
+ types in `{...}`. For example, in this case the attr `t` is a type that must
+ be an `int32`, a `float`, or a `bool`:
+ ```c++
+ REGISTER_OP("RestrictedTypeExample")
+ .Attr("t: {int32, float, bool}");
+ ```
+* There are shortcuts for common type constraints:
+ * `numbertype`: Type `type` restricted to the numeric (non-string and
+ non-bool) types.
+ * `realnumbertype`: Like `numbertype` without complex types.
+ * `quantizedtype`: Like `numbertype` but just the quantized number types.
+ The specific lists of types allowed by these are defined by the functions
+ (like `NumberTypes()`) in
+ [`tensorflow/core/framework/types.h`](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/framework/types.h).
+ In this example the attr `t` must be one of the numeric types:
+ ```c++
+ REGISTER_OP("NumberType")
+ .Attr("t: numbertype");
+ ```
+ For this op:
+ ```python
+ tf.number_type(t=tf.int32) # Valid
+ tf.number_type(t=tf.bool) # Invalid
+ ```
+* `int >= <n>`: The value must be an int whose value is greater than or equal to
+ `<n>`, where `<n>` is a natural number.
+ For example, the following Op registration specifies that the attr `a` must
+ have a value that is at least `2`:
+ ```c++
+ REGISTER_OP("MinIntExample")
+ .Attr("a: int >= 2");
+ ```
+* `list(<type>) >= <n>`: A list of type `<type>` whose length is greater than
+ or equal to `<n>`.
+ For example, the following Op registration specifies that the attr `a` is a
+ list of types (either `int32` or `float`), and that there must be at least 3
+ of them:
+ ```c++
+ REGISTER_OP("TypeListExample")
+ .Attr("a: list({int32, float}) >= 3");
+ ```
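The constraint forms listed above can be read as simple predicates on the attr value. The sketch below models three of them in plain Python; the function names are hypothetical and type names are represented as strings purely for illustration.

```python
def satisfies_allowed(value, allowed):
    """Models `{<a>, <b>}`: the value must be one of the listed alternatives."""
    return value in allowed


def satisfies_min_int(value, minimum):
    """Models `int >= <n>`: an integer no smaller than `minimum`."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool) and value >= minimum


def satisfies_min_list(values, allowed, min_length):
    """Models `list({...}) >= <n>`: enough entries, each one allowed."""
    return len(values) >= min_length and all(v in allowed for v in values)


print(satisfies_allowed("apple", {"apple", "orange"}))
print(satisfies_min_int(1, 2))
print(satisfies_min_list(["int32", "float", "int32"], {"int32", "float"}, 3))
```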
+To set a default value for an attr (making it optional in the generated code),
+add `= <default>` to the end, as in:
+```c++
+    .Attr("i: int = 0");
+```
+The supported syntax of the default value is what would be used in the proto
+representation of the resulting GraphDef definition.
+Here are examples for how to specify a default for all types:
+```c++
+    .Attr("s: string = 'foo'")
+    .Attr("i: int = 0")
+    .Attr("f: float = 1.0")
+    .Attr("b: bool = true")
+    .Attr("ty: type = DT_INT32")
+    .Attr("sh: shape = { dim { size: 1 } dim { size: 2 } }")
+    .Attr("te: tensor = { dtype: DT_INT32 int_val: 5 }")
+    .Attr("l_empty: list(int) = []")
+    .Attr("l_int: list(int) = [2, 3, 5, 7]");
+```
+Note in particular that the values of type `type` use [the `DT_*` names
+for the types](../../resources/dims_types.md#data-types).
+### Polymorphism <div class="md-anchor" id="polymorphism">{#polymorphism}</div>
+#### Type Polymorphism {#type-polymorphism}
+For ops that can take different types as input or produce different output
+types, you can specify [an attr](#attrs) in
+[an input or output type](#inputs-outputs) in the Op registration. Typically
+you would then register an `OpKernel` for each supported type.
+For instance, if you'd like the `ZeroOut` Op to work on `float`s
+in addition to `int32`s, your Op registration might look like:
+<code class="lang-c++"><pre>
+REGISTER\_OP("ZeroOut")
+    <b>.Attr("T: {float, int32}")</b>
+    .Input("to_zero: <b>T</b>")
+    .Output("zeroed: <b>T</b>");
+</pre></code>
+Your Op registration now specifies that the input's type must be `float`, or
+`int32`, and that its output will be the same type, since both have type `T`.
+> A note on naming:{#naming} Inputs, outputs, and attrs generally should be
+> given snake_case names. The one exception is attrs that are used as the type
+> of an input or in the type of an input. Those attrs can be inferred when the
+> op is added to the graph and so don't appear in the op's function. For
+> example, this last definition of ZeroOut will generate a Python function that
+> looks like:
+> ```python
+> def zero_out(to_zero, name=None):
+>   """...
+>   Args:
+>     to_zero: A `Tensor`. Must be one of the following types:
+>         `float32`, `int32`.
+>     name: A name for the operation (optional).
+>   Returns:
+>     A `Tensor`. Has the same type as `to_zero`.
+>   """
+> ```
+> If `to_zero` is passed an `int32` tensor, then `T` is automatically set to
+> `int32` (well, actually `DT_INT32`). Those inferred attrs are given
+> Capitalized or CamelCase names.
+> Compare this with an op that has a type attr that determines the output
+> type:
+> ```c++
+> REGISTER_OP("StringToNumber")
+>     .Input("string_tensor: string")
+>     .Output("output: out_type")
+>     .Attr("out_type: {float, int32} = DT_FLOAT")
+>     .Doc(R"doc(
+> Converts each string in the input Tensor to the specified numeric type.
+> )doc");
+> ```
+> In this case, the user has to specify the output type, as in the generated
+> Python:
+> ```python
+> def string_to_number(string_tensor, out_type=None, name=None):
+>   """Converts each string in the input Tensor to the specified numeric type.
+>   Args:
+>     string_tensor: A `Tensor` of type `string`.
+>     out_type: An optional `tf.DType` from: `tf.float32, tf.int32`.
+>       Defaults to `tf.float32`.
+>     name: A name for the operation (optional).
+>   Returns:
+>     A `Tensor` of type `out_type`.
+>   """
+> ```
+<code class="lang-c++"><pre>
+\#include "tensorflow/core/framework/op_kernel.h"<br/>
+class ZeroOut<b>Int32</b>Op : public OpKernel {
+  // as before
+};<br/>
+class ZeroOut<b>Float</b>Op : public OpKernel {
+ public:
+  explicit ZeroOut<b>Float</b>Op(OpKernelConstruction\* context)
+      : OpKernel(context) {}<br/>
+  void Compute(OpKernelContext\* context) override {
+    // Grab the input tensor
+    const Tensor& input\_tensor = context-&gt;input(0);
+    auto input = input\_tensor.flat&lt;<b>float</b>&gt;();<br/>
+    // Create an output tensor
+    Tensor* output = NULL;
+    OP\_REQUIRES\_OK(context,
+                   context-&gt;allocate\_output(0, input_tensor.shape(), &output));
+    auto output\_flat = output-&gt;template flat&lt;<b>float</b>&gt;();<br/>
+    // Set all the elements of the output tensor to 0
+    const int N = input.size();
+    for (int i = 0; i &lt; N; i++) {
+      output\_flat(i) = 0;
+    }<br/>
+    // Preserve the first input value
+    if (N &gt; 0) output\_flat(0) = input(0);
+  }
+};<br/><b>
+// Note that TypeConstraint&lt;int32&gt;("T") means that attr "T" (defined
+// in the Op registration above) must be "int32" to use this template
+// instantiation.</b>
+REGISTER\_KERNEL\_BUILDER(
+    Name("ZeroOut")
+    .Device(DEVICE\_CPU)
+    <b>.TypeConstraint&lt;int32&gt;("T"),</b>
+    ZeroOut<b>Int32</b>Op);
+<b>REGISTER\_KERNEL\_BUILDER(
+    Name("ZeroOut")
+    .Device(DEVICE\_CPU)
+    .TypeConstraint&lt;float&gt;("T"),
+    ZeroOutFloatOp);</b>
+</pre></code>
+> To preserve [backwards compatibility](#backwards-compatibility), you should
+> specify a [default value](#default-values-constraints) when adding an attr to
+> an existing op:
+> <code class="lang-c++"><pre>
+> REGISTER\_OP("ZeroOut")
+> <b>.Attr("T: {float, int32} = DT_INT32")</b>
+> .Input("to_zero: T")
+> .Output("zeroed: T")
+> </pre></code>
+Let's say you wanted to add more types, say `double`:
+<code class="lang-c++"><pre>
+REGISTER\_OP("ZeroOut")
+    .Attr("T: {float, <b>double,</b> int32}")
+    .Input("to_zero: <b>T</b>")
+    .Output("zeroed: <b>T</b>");
+</pre></code>
+Instead of writing another `OpKernel` with redundant code as above, often you
+will be able to use a C++ template instead. You will still have one kernel
+registration (`REGISTER_KERNEL_BUILDER` call) per overload.
+<code class="lang-c++"><pre>
+<b>template &lt;typename T&gt;</b>
+class ZeroOutOp : public OpKernel {
+ public:
+  explicit ZeroOutOp(OpKernelConstruction\* context) : OpKernel(context) {}<br/>
+  void Compute(OpKernelContext\* context) override {
+    // Grab the input tensor
+    const Tensor& input\_tensor = context-&gt;input(0);
+    auto input = input\_tensor.flat<b>&lt;T&gt;</b>();<br/>
+    // Create an output tensor
+    Tensor* output = NULL;
+    OP\_REQUIRES\_OK(context,
+                   context-&gt;allocate\_output(0, input_tensor.shape(), &output));
+    auto output\_flat = output-&gt;template flat<b>&lt;T&gt;</b>();<br/>
+    // Set all the elements of the output tensor to 0
+    const int N = input.size();
+    for (int i = 0; i &lt; N; i++) {
+      output\_flat(i) = 0;
+    }<br/>
+    // Preserve the first input value
+    if (N &gt; 0) output\_flat(0) = input(0);
+  }
+};<br/><b>
+// Note that TypeConstraint&lt;int32&gt;("T") means that attr "T" (defined
+// in the Op registration above) must be "int32" to use this template
+// instantiation.</b>
+REGISTER\_KERNEL\_BUILDER(
+    Name("ZeroOut")
+    .Device(DEVICE\_CPU)
+    .TypeConstraint&lt;int32&gt;("T"),
+    <b>ZeroOutOp&lt;int32&gt;</b>);
+REGISTER\_KERNEL\_BUILDER(
+    Name("ZeroOut")
+    .Device(DEVICE\_CPU)
+    .TypeConstraint&lt;float&gt;("T"),
+    <b>ZeroOutOp&lt;float&gt;</b>);
+<b>REGISTER\_KERNEL\_BUILDER(
+    Name("ZeroOut")
+    .Device(DEVICE\_CPU)
+    .TypeConstraint&lt;double&gt;("T"),
+    ZeroOutOp&lt;double&gt;);</b>
+</pre></code>
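+If you have more than a couple overloads, you can put the registration in a
+macro:
+```c++
+#include "tensorflow/core/framework/op_kernel.h"
+
+#define REGISTER_KERNEL(type)                                       \
+  REGISTER_KERNEL_BUILDER(                                          \
+      Name("ZeroOut").Device(DEVICE_CPU).TypeConstraint<type>("T"), \
+      ZeroOutOp<type>)
+
+REGISTER_KERNEL(int32);
+REGISTER_KERNEL(float);
+REGISTER_KERNEL(double);
+```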
+Depending on the list of types you are registering the kernel for, you may be
+able to use a macro provided by
+`tensorflow/core/framework/register_types.h`:
+```c++
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/register_types.h"
+
+REGISTER_OP("ZeroOut")
+    .Attr("T: realnumbertype")
+    .Input("to_zero: T")
+    .Output("zeroed: T");
+
+template <typename T>
+class ZeroOutOp : public OpKernel { ... };
+
+#define REGISTER_KERNEL(type)                                       \
+  REGISTER_KERNEL_BUILDER(                                          \
+      Name("ZeroOut").Device(DEVICE_CPU).TypeConstraint<type>("T"), \
+      ZeroOutOp<type>)
+
+TF_CALL_REAL_NUMBER_TYPES(REGISTER_KERNEL);
+```
+#### List Inputs and Outputs {#list-input-output}
+In addition to being able to accept or produce different types, ops can consume
+or produce a variable number of tensors.
+In the next example, the attr `T` holds a *list* of types, and is used as the
+type of both the input `in` and the output `out`. The input and output are
+lists of tensors of that type (and the number and types of tensors in the output
+are the same as the input, since both have type `T`).
+```c++
+    .Attr("T: list(type)")
+    .Input("in: T")
+    .Output("out: T");
+```
+You can also place restrictions on what types can be specified in the list. In
+this next case, the input is a list of `float` and `double` tensors. The Op
+accepts, for example, input types `(float, double, float)` and in that case the
+output type would also be `(float, double, float)`.
+```c++
+    .Attr("T: list({float, double})")
+    .Input("in: T")
+    .Output("out: T");
+```
+If you want all the tensors in a list to be of the same type, you might do
+something like:
+```c++
+    .Attr("N: int")
+    .Input("in: N * int32")
+    .Output("out: int32");
+```
+This accepts a list of `int32` tensors, and uses an `int` attr `N` to
+specify the length of the list.
+This can be made [type polymorphic](#type-polymorphism) as well. In the next
+example, the input is a list of tensors (with length `"N"`) of the same (but
+unspecified) type (`"T"`), and the output is a single tensor of matching type:
+```c++
+    .Attr("N: int")
+    .Attr("T: type")
+    .Input("in: N * T")
+    .Output("out: T");
+```
+By default, tensor lists have a minimum length of 1. You can change that default
+using
+[a `">="` constraint on the corresponding attr](#default-values-constraints).
+In this next example, the input is a list of at least 2 `int32` tensors:
+```c++
+    .Attr("N: int >= 2")
+    .Input("in: N * int32")
+    .Output("out: int32");
+```
+The same syntax works with `"list(type)"` attrs:
+```c++
+    .Attr("T: list(type) >= 3")
+    .Input("in: T")
+    .Output("out: T");
+```
+### Inputs and Outputs <div class="md-anchor" id="AUTOGENERATED-inputs-and-outputs">{#AUTOGENERATED-inputs-and-outputs}</div>
+To summarize the above, an Op registration can have multiple inputs and outputs:
+```c++
+    .Input("y: int32")
+    .Input("z: float")
+    .Output("a: string")
+    .Output("b: int32");
+```
+Each input or output spec is of the form:
+```
+<name>: <io-type-expr>
+```
+where `<name>` begins with a letter and can be composed of alphanumeric
+characters and underscores. `<io-type-expr>` is one of the following type
+expressions:
+* `<type>`, where `<type>` is a supported input type (e.g. `float`, `int32`,
+ `string`). This specifies a single tensor of the given type.
+ See
+ [the list of supported Tensor types](../../resources/dims_types.md#data-types).
+ ```c++
+ REGISTER_OP("BuiltInTypesExample")
+ .Input("integers: int32")
+ .Input("complex_numbers: complex64");
+ ```
+* `<attr-type>`, where `<attr-type>` is the name of an [Attr](#attrs) with type
+ `type` or `list(type)` (with a possible type restriction). This syntax allows
+ for [polymorphic ops](#polymorphism).
+ ```c++
+ REGISTER_OP("PolymorphicSingleInput")
+ .Attr("T: type")
+ .Input("in: T");
+ REGISTER_OP("RestrictedPolymorphicSingleInput")
+ .Attr("T: {int32, int64}")
+ .Input("in: T");
+ ```
+ Referencing an attr of type `list(type)` allows you to accept a sequence of
+ tensors.
+ ```c++
+ REGISTER_OP("ArbitraryTensorSequenceExample")
+ .Attr("T: list(type)")
+ .Input("in: T")
+ .Output("out: T");
+ REGISTER_OP("RestrictedTensorSequenceExample")
+ .Attr("T: list({int32, int64})")
+ .Input("in: T")
+ .Output("out: T");
+ ```
+ Note that the number and types of tensors in the output `out` is the same as
+ in the input `in`, since both are of type `T`.
+* For a sequence of tensors with the same type: `<number> * <type>`, where
+ `<number>` is the name of an [Attr](#attrs) with type `int`. The `<type>` can
+ either be
+ [a specific type like `int32` or `float`](../../resources/dims_types.md#data-types),
+ or the name of an attr with type `type`. As an example of the first, this
+ Op accepts a list of `int32` tensors:
+ ```c++
+ REGISTER_OP("Int32SequenceExample")
+ .Attr("NumTensors: int")
+ .Input("in: NumTensors * int32");
+ ```
+ Whereas this Op accepts a list of tensors of any type, as long as they are all
+ the same:
+ ```c++
+ REGISTER_OP("SameTypeSequenceExample")
+ .Attr("NumTensors: int")
+ .Attr("T: type")
+ .Input("in: NumTensors * T");
+ ```
+* For a reference to a tensor: `Ref(<type>)`, where `<type>` is one of the
+ previous types.
+> A note on naming: Any attr used in the type of an input will be inferred. By
+> convention those inferred attrs use capital names (like `T` or `N`).
+> Otherwise inputs, outputs, and attrs have names like function parameters
+> (e.g. `num_outputs`). For more details, see the
+> [earlier note on naming](#naming).
+For more details, see
+### Backwards compatibility <div class="md-anchor" id="AUTOGENERATED-backwards-compatibility">{#AUTOGENERATED-backwards-compatibility}</div>
+In general, changes to specifications must be backwards-compatible: changing the
+specification of an Op must not break prior serialized GraphDefs constructed
+from older specifications.
+There are several ways to preserve backwards-compatibility.
+1. Any new attrs added to an operation must have default values defined, and
+ with that default value the Op must have the original behavior. To change an
+ operation from not polymorphic to polymorphic, you *must* give a default
+ value to the new type attr to preserve the original signature by default. For
+ example, if your operation was:
+ ```c++
+ REGISTER_OP("MyGeneralUnaryOp")
+ .Input("in: float")
+ .Output("out: float");
+ ```
+ you can make it polymorphic in a backwards-compatible way using:
+ ```c++
+ REGISTER_OP("MyGeneralUnaryOp")
+ .Input("in: T")
+ .Output("out: T")
+ .Attr("T: numbertype = float");
+ ```
+1. You can safely make a constraint on an attr less restrictive. For example,
+ you can change from `{int32, int64}` to `{int32, int64, float}` or from
+ `{"apple", "orange"}` to `{"apple", "banana", "orange"}`.
+1. Namespace any new Ops you create, by prefixing the Op names with something
+ unique to your project. This avoids having your Op colliding with any Ops
+ that might be included in future versions of TensorFlow.
+1. Plan ahead! Try to anticipate future uses for the Op. Some signature changes
+ can't be done in a compatible way (for example, adding an input, or making a
+ single input into a list).
+If you cannot make your change to an operation backwards compatible, then
+create a new operation with a new name with the new semantics.
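The rule that a constraint may only become less restrictive amounts to a superset check on the allowed values. The sketch below shows that check in plain Python; `is_safe_relaxation` is a hypothetical helper for illustration, not something TensorFlow provides.

```python
def is_safe_relaxation(old_allowed, new_allowed):
    """True if every value accepted by the old constraint is still accepted."""
    return set(old_allowed) <= set(new_allowed)


# Adding an alternative keeps every previously valid graph valid.
print(is_safe_relaxation({"int32", "int64"}, {"int32", "int64", "float"}))
# Removing one could break graphs that already use it.
print(is_safe_relaxation({"apple", "orange"}, {"apple"}))
```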
+## GPU Support <div class="md-anchor" id="mult-archs">{#mult-archs}</div>
+You can implement different OpKernels and register one for CPU and another for
+GPU, just like you can [register kernels for different types](#polymorphism).
+There are several examples of kernels with GPU support in
+Notice some kernels have a CPU version in a `.cc` file, a GPU version in a file
+ending in `_gpu.cu.cc`, and some code shared in common in a `.h` file.
+For example, the [`pad` op](../../api_docs/python/array_ops.md#pad) has
+everything but the GPU kernel in [`tensorflow/core/kernels/pad_op.cc`][pad_op].
+The GPU kernel is in
+and the shared code is a templated class defined in
+One thing to note, even when the GPU kernel version of `pad` is used, it still
+needs its `"paddings"` input in CPU memory. To mark that inputs or outputs are
+kept on the CPU, add a `HostMemory()` call to the kernel registration, e.g.:
+```c++
+    .Device(DEVICE_GPU)      \
+    .TypeConstraint<T>("T")  \
+    .HostMemory("paddings"), \
+    PadOp<GPUDevice, T>)
+```
+## Implement the gradient in Python <div class="md-anchor" id="AUTOGENERATED-implement-the-gradient-in-python">{#AUTOGENERATED-implement-the-gradient-in-python}</div>
+[TODO]:# (Write this!)
+## Implement a shape function in Python <div class="md-anchor" id="AUTOGENERATED-implement-a-shape-function-in-python">{#AUTOGENERATED-implement-a-shape-function-in-python}</div>
+The TensorFlow Python API has a feature called "shape inference" that provides
+information about the shapes of tensors without having to execute the
+graph. Shape inference is supported by "shape functions" that are registered for
+each op type, and perform two roles: asserting that the shapes of the inputs are
+compatible, and specifying the shapes for the outputs. A shape function is a
+Python function that takes an
+[`Operation`](../../api_docs/python/framework.md#Operation) as input, and
+returns a list of
+[`TensorShape`](../../api_docs/python/framework.md#TensorShape) objects (one per
+output of the op). To register a shape function, apply the
+[`tf.RegisterShape` decorator](../../api_docs/python/framework.md#RegisterShape)
+to a shape function. For example, the
+[ZeroOut op defined above](#define_interface) would have a shape function like
+the following:
+```python
+@tf.RegisterShape("ZeroOut")
+def _zero_out_shape(op):
+  """Shape function for the ZeroOut op.
+
+  This is the unconstrained version of ZeroOut, which produces an output
+  with the same shape as its input.
+  """
+  return [op.inputs[0].get_shape()]
+```
+A shape function can also constrain the shape of an input. For the version of
+[ZeroOut with a vector shape constraint](#validation), the shape function
+would be as follows:
+```python
+@tf.RegisterShape("ZeroOut")
+def _zero_out_shape(op):
+  """Shape function for the ZeroOut op.
+
+  This is the constrained version of ZeroOut, which requires the input to
+  have rank 1 (a vector).
+  """
+  input_shape = op.inputs[0].get_shape().with_rank(1)
+  return [input_shape]
+```
+If your op is [polymorphic with multiple inputs](#polymorphism), use the
+properties of the operation to determine the number of shapes to check:
+```python
+@tf.RegisterShape("IntListInputExample")
+def _int_list_input_example_shape(op):
+  """Shape function for the "IntListInputExample" op.
+
+  All inputs and the output are matrices of the same size.
+  """
+  output_shape = tf.TensorShape(None)
+  for input in op.inputs:
+    output_shape = output_shape.merge_with(input.get_shape().with_rank(2))
+  return [output_shape]
+```
+Since shape inference is an optional feature, and the shapes of tensors may vary
+dynamically, shape functions must be robust to incomplete shape information for
+any of the inputs. The [`merge_with()`](../../api_docs/python/framework.md)
+method allows the caller to assert that two shapes are the same, even if either
+or both of them do not have complete information. Shape functions are defined
+for all of the
+[standard Python ops](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/python/ops/),
+and provide many different usage examples.
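To show what merging partially known shapes means, here is a simplified pure-Python model. It is a sketch, not TensorFlow's implementation: a shape is represented as a list of dimensions with `None` for an unknown dimension, `None` alone stands for unknown rank, and `merge_shapes` is a hypothetical name.

```python
def merge_shapes(a, b):
    """Combine two partially known shapes, or raise if they conflict."""
    # An unknown-rank shape is compatible with anything.
    if a is None:
        return None if b is None else list(b)
    if b is None:
        return list(a)
    if len(a) != len(b):
        raise ValueError("ranks differ: %d vs %d" % (len(a), len(b)))
    merged = []
    for dim_a, dim_b in zip(a, b):
        if dim_a is None:
            merged.append(dim_b)      # take whatever the other side knows
        elif dim_b is None or dim_a == dim_b:
            merged.append(dim_a)
        else:
            raise ValueError("dimensions differ: %r vs %r" % (dim_a, dim_b))
    return merged


print(merge_shapes([None, 3], [2, None]))
```

Each input contributes whatever it knows, and a conflict is reported only when two known values disagree, which is the behaviour a shape function needs when some shapes are incomplete.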