Diffstat (limited to 'tensorflow/g3doc/how_tos/variable_scope/index.md')
-rw-r--r--  tensorflow/g3doc/how_tos/variable_scope/index.md  372
1 files changed, 372 insertions, 0 deletions
diff --git a/tensorflow/g3doc/how_tos/variable_scope/index.md b/tensorflow/g3doc/how_tos/variable_scope/index.md
new file mode 100644
index 0000000000..f9221b207b
--- /dev/null
+++ b/tensorflow/g3doc/how_tos/variable_scope/index.md
@@ -0,0 +1,372 @@
+# Sharing Variables
+
+You can create, initialize, save and load single variables
+in the way described in the [Variables HowTo](../variables/index.md).
+But when building complex models you often need to share large sets of
+variables and you might want to initialize all of them in one place.
+This tutorial shows how this can be done using `tf.variable_scope()` and
+`tf.get_variable()`.
+
+## The Problem
+
+Imagine you create a simple model for image filters, similar to our
+[Convolutional Neural Networks Tutorial](../../tutorials/deep_cnn/index.md)
+model but with only 2 convolutions (to keep the example simple). If you use
+just `tf.Variable`, as explained in [Variables HowTo](../variables/index.md),
+your model might look like this.
+
+```python
+def my_image_filter(input_images):
+    conv1_weights = tf.Variable(tf.random_normal([5, 5, 32, 32]),
+                                name="conv1_weights")
+    conv1_biases = tf.Variable(tf.zeros([32]), name="conv1_biases")
+    conv1 = tf.nn.conv2d(input_images, conv1_weights,
+                         strides=[1, 1, 1, 1], padding='SAME')
+    relu1 = tf.nn.relu(conv1 + conv1_biases)
+
+    conv2_weights = tf.Variable(tf.random_normal([5, 5, 32, 32]),
+                                name="conv2_weights")
+    conv2_biases = tf.Variable(tf.zeros([32]), name="conv2_biases")
+    conv2 = tf.nn.conv2d(relu1, conv2_weights,
+                         strides=[1, 1, 1, 1], padding='SAME')
+    return tf.nn.relu(conv2 + conv2_biases)
+```
+
+As you can easily imagine, models quickly get much more complicated than
+this one, and even here we already have 4 different variables: `conv1_weights`,
+`conv1_biases`, `conv2_weights`, and `conv2_biases`.
+
+The problem arises when you want to reuse this model. Assume you want to
+apply your image filter to 2 different images, `image1` and `image2`.
+You want both images processed by the same filter with the same parameters.
+You can call `my_image_filter()` twice, but this will create two sets
+of variables:
+
+```python
+# First call creates one set of variables.
+result1 = my_image_filter(image1)
+# Another set is created in the second call.
+result2 = my_image_filter(image2)
+```
+
+A common way to share variables is to create them in a separate piece of code
+and pass them to functions that use them. For example, by using a dictionary:
+
+```python
+variables_dict = {
+    "conv1_weights": tf.Variable(tf.random_normal([5, 5, 32, 32]),
+                                 name="conv1_weights"),
+    "conv1_biases": tf.Variable(tf.zeros([32]), name="conv1_biases"),
+    # ... and so on for conv2_weights and conv2_biases ...
+}
+
+def my_image_filter(input_images, variables_dict):
+    conv1 = tf.nn.conv2d(input_images, variables_dict["conv1_weights"],
+                         strides=[1, 1, 1, 1], padding='SAME')
+    relu1 = tf.nn.relu(conv1 + variables_dict["conv1_biases"])
+
+    conv2 = tf.nn.conv2d(relu1, variables_dict["conv2_weights"],
+                         strides=[1, 1, 1, 1], padding='SAME')
+    return tf.nn.relu(conv2 + variables_dict["conv2_biases"])
+
+# The 2 calls to my_image_filter() now use the same variables
+result1 = my_image_filter(image1, variables_dict)
+result2 = my_image_filter(image2, variables_dict)
+```
+
+While convenient, creating variables like this, outside of the code that
+uses them, breaks encapsulation:
+
+* The code that builds the graph must document the names, types,
+ and shapes of variables to create.
+* When the code changes, the callers may have to create more, fewer,
+  or different variables.
+
+One way to address the problem is to use classes to create a model,
+where the classes take care of managing the variables they need.
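+
+A minimal sketch of what such a class could look like (the class name and
+`apply` method here are purely illustrative, not a TensorFlow API):
+
+```python
+class MyImageFilter(object):
+    """Owns its variables, so every call applies the same parameters."""
+
+    def __init__(self):
+        self.conv1_weights = tf.Variable(tf.random_normal([5, 5, 32, 32]),
+                                         name="conv1_weights")
+        self.conv1_biases = tf.Variable(tf.zeros([32]), name="conv1_biases")
+        # ... and similarly for the second convolution ...
+
+    def apply(self, input_images):
+        conv1 = tf.nn.conv2d(input_images, self.conv1_weights,
+                             strides=[1, 1, 1, 1], padding='SAME')
+        return tf.nn.relu(conv1 + self.conv1_biases)
+```
+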
+For a lighter solution, not involving classes, TensorFlow provides
+a *Variable Scope* mechanism that makes it easy to share named variables
+while constructing a graph.
+
+## Variable Scope Example
+
+The Variable Scope mechanism in TensorFlow consists of two main functions:
+
+* `tf.get_variable(<name>, <shape>, <initializer>)`:
+ Creates or returns a variable with a given name.
+* `tf.variable_scope(<scope_name>)`:
+ Manages namespaces for names passed to `tf.get_variable()`.
+
+The function `tf.get_variable()` is used to get or create a variable instead
+of a direct call to `tf.Variable`. Rather than passing a value directly, as you
+would with `tf.Variable`, it takes an *initializer*. An initializer is a function that
+takes the shape and provides a tensor with that shape. Here are some
+initializers available in TensorFlow:
+
+* `tf.constant_initializer(value)` initializes everything to the provided value,
+* `tf.random_uniform_initializer(a, b)` initializes uniformly from [a, b],
+* `tf.random_normal_initializer(mean, stddev)` initializes from the normal
+ distribution with the given mean and standard deviation.
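+
+For instance, a variable created with a constant initializer (a minimal
+sketch; the variable name `"b"` is just for illustration):
+
+```python
+b = tf.get_variable("b", [3], initializer=tf.constant_initializer(1.0))
+# Once initializers are run, `b` holds [1.0, 1.0, 1.0].
+```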
+
+To see how `tf.get_variable()` solves the problem discussed
+before, let's refactor the code that created one convolution into
+a separate function, named `conv_relu`:
+
+```python
+def conv_relu(input, kernel_shape, bias_shape):
+    # Create variable named "weights".
+    weights = tf.get_variable("weights", kernel_shape,
+                              initializer=tf.random_normal_initializer())
+    # Create variable named "biases".
+    biases = tf.get_variable("biases", bias_shape,
+                             initializer=tf.constant_initializer(0.0))
+    conv = tf.nn.conv2d(input, weights,
+                        strides=[1, 1, 1, 1], padding='SAME')
+    return tf.nn.relu(conv + biases)
+```
+
+This function uses short names `"weights"` and `"biases"`.
+We'd like to use it for both `conv1` and `conv2`, but
+the variables need to have different names.
+This is where `tf.variable_scope()` comes into play:
+it pushes a namespace for variables.
+
+```python
+def my_image_filter(input_images):
+    with tf.variable_scope("conv1"):
+        # Variables created here will be named "conv1/weights", "conv1/biases".
+        relu1 = conv_relu(input_images, [5, 5, 32, 32], [32])
+    with tf.variable_scope("conv2"):
+        # Variables created here will be named "conv2/weights", "conv2/biases".
+        return conv_relu(relu1, [5, 5, 32, 32], [32])
+```
+
+Now, let's see what happens when we call `my_image_filter()` twice.
+
+```python
+result1 = my_image_filter(image1)
+result2 = my_image_filter(image2)
+# Raises ValueError(... conv1/weights already exists ...)
+```
+
+As you can see, `tf.get_variable()` checks that already existing variables
+are not shared by accident. If you want to share them, you need to request it
+explicitly by calling `reuse_variables()` on the scope, as follows.
+
+```python
+with tf.variable_scope("image_filters") as scope:
+    result1 = my_image_filter(image1)
+    scope.reuse_variables()
+    result2 = my_image_filter(image2)
+```
+
+This is a good way to share variables, lightweight and safe.
+
+## How Does Variable Scope Work?
+
+### Understanding `tf.get_variable()`
+
+To understand variable scope it is necessary to first
+fully understand how `tf.get_variable()` works.
+Here is how `tf.get_variable` is usually called.
+
+```python
+v = tf.get_variable(name, shape, dtype, initializer)
+```
+
+This call does one of two things depending on the scope it is called in.
+Here are the two options.
+
+* Case 1: the scope is set for creating new variables, as evidenced by
+`tf.get_variable_scope().reuse == False`.
+
+In this case, `v` will be a newly created `tf.Variable` with the provided
+shape and data type. The full name of the created variable will be set to
+the current variable scope name + the provided `name` and a check will be
+performed to ensure that no variable with this full name exists yet.
+If a variable with this full name already exists, the function will
+raise a `ValueError`. If a new variable is created, it will be
+initialized to the value `initializer(shape)`. For example:
+
+```python
+with tf.variable_scope("foo"):
+    v = tf.get_variable("v", [1])
+assert v.name == "foo/v:0"
+```
+
+* Case 2: the scope is set for reusing variables, as evidenced by
+`tf.get_variable_scope().reuse == True`.
+
+In this case, the call will search for an already existing variable with
+name equal to the current variable scope name + the provided `name`.
+If no such variable exists, a `ValueError` will be raised. If the variable
+is found, it will be returned. For example:
+
+```python
+with tf.variable_scope("foo"):
+    v = tf.get_variable("v", [1])
+with tf.variable_scope("foo", reuse=True):
+    v1 = tf.get_variable("v", [1])
+assert v1 == v
+```
+
+### Basics of `tf.variable_scope()`
+
+Knowing how `tf.get_variable()` works makes it easy to understand variable
+scope. The primary function of variable scope is to carry a name that will
+be used as a prefix for variable names, and a reuse flag to distinguish the two
+cases described above. Nesting variable scopes appends their names in a way
+analogous to how directories work:
+
+```python
+with tf.variable_scope("foo"):
+    with tf.variable_scope("bar"):
+        v = tf.get_variable("v", [1])
+assert v.name == "foo/bar/v:0"
+```
+
+The current variable scope can be retrieved using `tf.get_variable_scope()`
+and the `reuse` flag of the current variable scope can be set to `True` by
+calling `tf.get_variable_scope().reuse_variables()`:
+
+```python
+with tf.variable_scope("foo"):
+    v = tf.get_variable("v", [1])
+    tf.get_variable_scope().reuse_variables()
+    v1 = tf.get_variable("v", [1])
+assert v1 == v
+```
+
+Note that you *cannot* set the `reuse` flag to `False`. The reason for this
+is to make it possible to compose functions that create models. Imagine you
+write a function `my_image_filter(inputs)` as before. Someone calling the
+function in a variable scope with `reuse=True` would expect all inner variables
+to be reused as well. Allowing `reuse=False` to be forced inside the function
+would break this contract and make it hard to share parameters in this way.
+
+Even though you cannot set `reuse` to `False` explicitly, you can enter
+a reusing variable scope and then exit it, going back to a non-reusing one.
+This can be done using a `reuse=True` parameter when opening a variable scope.
+Note also that, for the same reason as above, the `reuse` parameter is
+inherited. So when you open a reusing variable scope, all sub-scopes will
+be reusing too.
+
+```python
+with tf.variable_scope("root"):
+    # At start, the scope is not reusing.
+    assert tf.get_variable_scope().reuse == False
+    with tf.variable_scope("foo"):
+        # Opened a sub-scope, still not reusing.
+        assert tf.get_variable_scope().reuse == False
+    with tf.variable_scope("foo", reuse=True):
+        # Explicitly opened a reusing scope.
+        assert tf.get_variable_scope().reuse == True
+        with tf.variable_scope("bar"):
+            # Now sub-scope inherits the reuse flag.
+            assert tf.get_variable_scope().reuse == True
+    # Exited the reusing scope, back to a non-reusing one.
+    assert tf.get_variable_scope().reuse == False
+```
+
+### Capturing variable scope
+
+In all examples presented above, we shared parameters only because their
+names agreed, that is, because we opened a reusing variable scope with
+exactly the same string. In more complex cases, it might be useful to pass
+a `VariableScope` object rather than rely on getting the names right.
+To this end, variable scopes can be captured and used instead of names
+when opening a new variable scope.
+
+```python
+with tf.variable_scope("foo") as foo_scope:
+    v = tf.get_variable("v", [1])
+with tf.variable_scope(foo_scope):
+    w = tf.get_variable("w", [1])
+with tf.variable_scope(foo_scope, reuse=True):
+    v1 = tf.get_variable("v", [1])
+    w1 = tf.get_variable("w", [1])
+assert v1 == v
+assert w1 == w
+```
+
+When opening a variable scope using a previously existing scope,
+we jump out of the current variable scope prefix to an entirely
+different one. This is fully independent of where we do it.
+
+```python
+with tf.variable_scope("foo") as foo_scope:
+    assert foo_scope.name == "foo"
+with tf.variable_scope("bar"):
+    with tf.variable_scope("baz") as other_scope:
+        assert other_scope.name == "bar/baz"
+        with tf.variable_scope(foo_scope) as foo_scope2:
+            assert foo_scope2.name == "foo"  # Not changed.
+```
+
+### Initializers in variable scope
+
+Using `tf.get_variable()` makes it possible to write functions that create or
+reuse variables and can be transparently called from outside. But what if we
+wanted to change the initializer of the created variables? Do we need to pass
+an extra argument to every function that creates variables? What about the most
+common case, when we want to set the default initializer for all variables in
+one place, above all the functions? To help with these cases, variable scope
+can carry a default initializer. It is inherited by sub-scopes and passed
+to each `tf.get_variable()` call. But it will be overridden if another
+initializer is specified explicitly.
+
+```python
+with tf.variable_scope("foo", initializer=tf.constant_initializer(0.4)):
+    v = tf.get_variable("v", [1])
+    assert v.eval() == 0.4  # Default initializer as set above.
+    w = tf.get_variable("w", [1], initializer=tf.constant_initializer(0.3))
+    assert w.eval() == 0.3  # Specific initializer overrides the default.
+    with tf.variable_scope("bar"):
+        v = tf.get_variable("v", [1])
+        assert v.eval() == 0.4  # Inherited default initializer.
+    with tf.variable_scope("baz", initializer=tf.constant_initializer(0.2)):
+        v = tf.get_variable("v", [1])
+        assert v.eval() == 0.2  # Changed default initializer.
+```
+
+### Names of ops in `tf.variable_scope()`
+
+We discussed how `tf.variable_scope` governs the names of variables.
+But how does it influence the names of other ops in the scope?
+It is natural that ops created inside a variable scope should also
+share its name as a prefix. For this reason, when we do `with tf.variable_scope("name")`,
+this implicitly opens a `tf.name_scope("name")`. For example:
+
+```python
+with tf.variable_scope("foo"):
+    x = 1.0 + tf.get_variable("v", [1])
+assert x.op.name == "foo/add"
+```
+
+Name scopes can be opened in addition to a variable scope, and then
+they will only affect the names of the ops, but not of variables.
+
+```python
+with tf.variable_scope("foo"):
+    with tf.name_scope("bar"):
+        v = tf.get_variable("v", [1])
+        x = 1.0 + v
+assert v.name == "foo/v:0"
+assert x.op.name == "foo/bar/add"
+```
+
+When opening a variable scope using a captured object instead of a string,
+we do not alter the current name scope for ops.
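+
+A sketch of what this means in practice (a hedged example; the exact op names
+follow from the behavior described above):
+
+```python
+with tf.variable_scope("foo") as foo_scope:
+    v = tf.get_variable("v", [1])
+with tf.variable_scope("bar"):
+    with tf.variable_scope(foo_scope):
+        # Variables go back under the captured "foo" prefix ...
+        w = tf.get_variable("w", [1])
+        # ... but ops keep the current name scope, here "bar".
+        x = 1.0 + w
+assert w.name == "foo/w:0"
+assert x.op.name == "bar/add"
+```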
+
+
+## Examples of Use
+
+Here are pointers to a few files that make use of variable scope.
+In particular, it is heavily used for recurrent neural networks
+and sequence-to-sequence models.
+
+File | What's in it?
+--- | ---
+`models/image/cifar10.py` | Model for detecting objects in images.
+`models/rnn/rnn_cell.py` | Cell functions for recurrent neural networks.
+`models/rnn/seq2seq.py` | Functions for building sequence-to-sequence models.