# Sharing Variables

You can create, initialize, save and load single variables
in the way described in the [Variables HowTo](../variables/index.md).
But when building complex models you often need to share large sets of
variables and you might want to initialize all of them in one place.
This tutorial shows how this can be done using `tf.variable_scope()` and
`tf.get_variable()`.

## The Problem

Imagine you create a simple model for image filters, similar to our
[Convolutional Neural Networks Tutorial](../../tutorials/deep_cnn/index.md)
model, but with only two convolutions to keep the example simple. If you use
just `tf.Variable`, as explained in [Variables HowTo](../variables/index.md),
your model might look like this.

```python
def my_image_filter(input_images):
    conv1_weights = tf.Variable(tf.random_normal([5, 5, 32, 32]),
        name="conv1_weights")
    conv1_biases = tf.Variable(tf.zeros([32]), name="conv1_biases")
    conv1 = tf.nn.conv2d(input_images, conv1_weights,
        strides=[1, 1, 1, 1], padding='SAME')
    relu1 = tf.nn.relu(conv1 + conv1_biases)

    conv2_weights = tf.Variable(tf.random_normal([5, 5, 32, 32]),
        name="conv2_weights")
    conv2_biases = tf.Variable(tf.zeros([32]), name="conv2_biases")
    conv2 = tf.nn.conv2d(relu1, conv2_weights,
        strides=[1, 1, 1, 1], padding='SAME')
    return tf.nn.relu(conv2 + conv2_biases)
```

As you can easily imagine, models quickly get much more complicated than
this one, and even here we already have 4 different variables: `conv1_weights`,
`conv1_biases`, `conv2_weights`, and `conv2_biases`.

The problem arises when you want to reuse this model. Assume you want to
apply your image filter to 2 different images, `image1` and `image2`.
You want both images processed by the same filter with the same parameters.
You can call `my_image_filter()` twice, but this will create two sets
of variables:

```python
# First call creates one set of variables.
result1 = my_image_filter(image1)
# Another set is created in the second call.
result2 = my_image_filter(image2)
```

A common way to share variables is to create them in a separate piece of code
and pass them to functions that use them. For example, using a dictionary:

```python
variables_dict = {
    "conv1_weights": tf.Variable(tf.random_normal([5, 5, 32, 32]),
        name="conv1_weights")
    "conv1_biases": tf.Variable(tf.zeros([32]), name="conv1_biases")
    ... etc. ...
}

def my_image_filter(input_images, variables_dict):
    conv1 = tf.nn.conv2d(input_images, variables_dict["conv1_weights"],
        strides=[1, 1, 1, 1], padding='SAME')
    relu1 = tf.nn.relu(conv1 + variables_dict["conv1_biases"])

    conv2 = tf.nn.conv2d(relu1, variables_dict["conv2_weights"],
        strides=[1, 1, 1, 1], padding='SAME')
    return tf.nn.relu(conv2 + variables_dict["conv2_biases"])

# The 2 calls to my_image_filter() now use the same variables
result1 = my_image_filter(image1, variables_dict)
result2 = my_image_filter(image2, variables_dict)
```

While convenient, creating variables like this, outside of the code that
uses them, breaks encapsulation:

*  The code that builds the graph must document the names, types,
   and shapes of variables to create.
*  When the code changes, the callers may have to create more, fewer,
   or different variables.

One way to address the problem is to use classes to create a model,
where the classes take care of managing the variables they need; a minimal
sketch of this approach follows this paragraph. For a lighter solution, not
involving classes, TensorFlow provides a *Variable Scope* mechanism that
makes it easy to share named variables while constructing a graph.
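
Before moving on, here is a minimal sketch of the class-based alternative just
mentioned; the `ImageFilter` class below is hypothetical and only illustrates
the idea of a class owning its variables.

```python
class ImageFilter(object):
    """Hypothetical helper that owns its variables and reuses them per call."""

    def __init__(self):
        # Variables are created once, when the object is constructed.
        self.conv1_weights = tf.Variable(tf.random_normal([5, 5, 32, 32]),
            name="conv1_weights")
        self.conv1_biases = tf.Variable(tf.zeros([32]), name="conv1_biases")

    def apply(self, input_images):
        # Each call builds new ops but reuses the same variables.
        conv1 = tf.nn.conv2d(input_images, self.conv1_weights,
            strides=[1, 1, 1, 1], padding='SAME')
        return tf.nn.relu(conv1 + self.conv1_biases)

image_filter = ImageFilter()
result1 = image_filter.apply(image1)
result2 = image_filter.apply(image2)  # Shares the variables used by result1.
```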

## Variable Scope Example

The Variable Scope mechanism in TensorFlow consists of two main functions:

* `tf.get_variable(<name>, <shape>, <initializer>)`:
  Creates or returns a variable with a given name.
* `tf.variable_scope(<scope_name>)`:
  Manages namespaces for names passed to `tf.get_variable()`.

The function `tf.get_variable()` is used to get or create a variable instead
of calling `tf.Variable` directly. Unlike `tf.Variable`, which takes a value
directly, it uses an *initializer*. An initializer is a function that takes
a shape and returns a tensor of that shape. Here are some initializers
available in TensorFlow (a short usage sketch follows the list):

* `tf.constant_initializer(value)` initializes everything to the provided value,
* `tf.random_uniform_initializer(a, b)` initializes uniformly from [a, b],
* `tf.random_normal_initializer(mean, stddev)` initializes from the normal
  distribution with the given mean and standard deviation.
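
As a small illustration, an initializer is simply passed to `tf.get_variable()`
and is called with the variable's shape when the variable is created. The
variable names below are made up for this example.

```python
# "weights_example" starts from a normal distribution with mean 0.0 and
# standard deviation 0.01; "biases_example" starts as a vector of zeros.
weights = tf.get_variable("weights_example", [5, 5, 32, 32],
    initializer=tf.random_normal_initializer(0.0, 0.01))
biases = tf.get_variable("biases_example", [32],
    initializer=tf.constant_initializer(0.0))
```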

To see how `tf.get_variable()` solves the problem discussed
before, let's refactor the code that created one convolution into
a separate function, named `conv_relu`:

```python
def conv_relu(input, kernel_shape, bias_shape):
    # Create variable named "weights".
    weights = tf.get_variable("weights", kernel_shape,
        initializer=tf.random_normal_initializer())
    # Create variable named "biases".
    biases = tf.get_variable("biases", bias_shape,
        initializer=tf.constant_initializer(0.0))
    conv = tf.nn.conv2d(input, weights,
        strides=[1, 1, 1, 1], padding='SAME')
    return tf.nn.relu(conv + biases)
```

This function uses short names `"weights"` and `"biases"`.
We'd like to use it for both `conv1` and `conv2`, but
the variables need to have different names.
This is where `tf.variable_scope()` comes into play:
it pushes a namespace for variables.

```python
def my_image_filter(input_images):
    with tf.variable_scope("conv1"):
        # Variables created here will be named "conv1/weights", "conv1/biases".
        relu1 = conv_relu(input_images, [5, 5, 32, 32], [32])
    with tf.variable_scope("conv2"):
        # Variables created here will be named "conv2/weights", "conv2/biases".
        return conv_relu(relu1, [5, 5, 32, 32], [32])
```

Now, let's see what happens when we call `my_image_filter()` twice.

```python
result1 = my_image_filter(image1)
result2 = my_image_filter(image2)
# Raises ValueError(... conv1/weights already exists ...)
```

As you can see, `tf.get_variable()` checks that existing variables are not
shared by accident. If you want to share them, you need to say so explicitly
by calling `scope.reuse_variables()`, as follows.

```python
with tf.variable_scope("image_filters") as scope:
    result1 = my_image_filter(image1)
    scope.reuse_variables()
    result2 = my_image_filter(image2)
```

This is a good way to share variables: it is both lightweight and safe.

## How Does Variable Scope Work?

### Understanding `tf.get_variable()`

To understand variable scope, it is necessary to first fully understand how
`tf.get_variable()` works. Here is how `tf.get_variable()` is usually called.

```python
v = tf.get_variable(name, shape, dtype, initializer)
```

This call does one of two things depending on the scope it is called in.
Here are the two options.

* Case 1: the scope is set for creating new variables, as evidenced by
`tf.get_variable_scope().reuse == False`.

In this case, `v` will be a newly created `tf.Variable` with the provided
shape and data type. The full name of the created variable will be set to
the current variable scope name + the provided `name` and a check will be
performed to ensure that no variable with this full name exists yet.
If a variable with this full name already exists, the function will
raise a `ValueError`. If a new variable is created, it will be
initialized to the value `initializer(shape)`. For example:

```python
with tf.variable_scope("foo"):
    v = tf.get_variable("v", [1])
assert v.name == "foo/v:0"
```

* Case 2: the scope is set for reusing variables, as evidenced by
`tf.get_variable_scope().reuse == True`.

In this case, the call will search for an already existing variable with
name equal to the current variable scope name + the provided `name`.
If no such variable exists, a `ValueError` will be raised. If the variable
is found, it will be returned. For example:

```python
with tf.variable_scope("foo"):
    v = tf.get_variable("v", [1])
with tf.variable_scope("foo", reuse=True):
    v1 = tf.get_variable("v", [1])
assert v1 == v
```

### Basics of `tf.variable_scope()`

Knowing how `tf.get_variable()` works makes it easy to understand variable
scope. The primary function of variable scope is to carry a name that will
be used as a prefix for variable names, and a reuse flag to distinguish the two
cases described above. Nesting variable scopes appends their names in a way
analogous to how directories work:

```python
with tf.variable_scope("foo"):
    with tf.variable_scope("bar"):
        v = tf.get_variable("v", [1])
assert v.name == "foo/bar/v:0"
```

The current variable scope can be retrieved using `tf.get_variable_scope()`
and the `reuse` flag of the current variable scope can be set to `True` by
calling `tf.get_variable_scope().reuse_variables()`:

```python
with tf.variable_scope("foo"):
    v = tf.get_variable("v", [1])
    tf.get_variable_scope().reuse_variables()
    v1 = tf.get_variable("v", [1])
assert v1 == v
```

Note that you *cannot* set the `reuse` flag to `False`. The reason for this
is to make it possible to compose functions that create models. Imagine you
write a function `my_image_filter(inputs)` as before. Someone calling the
function in a variable scope with `reuse=True` would expect all inner variables
to be reused as well. Allowing `reuse=False` to be forced inside the function
would break this contract and make it hard to share parameters this way.
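
As a small sketch of this contract, reusing `my_image_filter()` from earlier:
creating the variables inside one scope and then reopening that scope with
`reuse=True` makes the second call reuse every inner variable.

```python
with tf.variable_scope("image_filters") as scope:
    result1 = my_image_filter(image1)    # Creates conv1/... and conv2/... variables.
with tf.variable_scope(scope, reuse=True):
    result2 = my_image_filter(image2)    # Reuses all of the inner variables.
```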

Even though you cannot set `reuse` to `False` explicitly, you can enter
a reusing variable scope and then exit it, going back to a non-reusing one.
This is done by passing `reuse=True` when opening a variable scope.
Note also that, for the same reason as above, the `reuse` parameter is
inherited. So when you open a reusing variable scope, all sub-scopes will
be reusing too.

```python
with tf.variable_scope("root"):
    # At start, the scope is not reusing.
    assert tf.get_variable_scope().reuse == False
    with tf.variable_scope("foo"):
        # Opened a sub-scope, still not reusing.
        assert tf.get_variable_scope().reuse == False
    with tf.variable_scope("foo", reuse=True):
        # Explicitly opened a reusing scope.
        assert tf.get_variable_scope().reuse == True
        with tf.variable_scope("bar"):
            # Now sub-scope inherits the reuse flag.
            assert tf.get_variable_scope().reuse == True
    # Exited the reusing scope, back to a non-reusing one.
    assert tf.get_variable_scope().reuse == False
```

### Capturing variable scope

In all examples presented above, we shared parameters only because their
names agreed, that is, because we opened a reusing variable scope with
exactly the same string. In more complex cases, it might be useful to pass
a `VariableScope` object rather than rely on getting the names right.
To this end, variable scopes can be captured and used instead of names
when opening a new variable scope.

```python
with tf.variable_scope("foo") as foo_scope:
    v = tf.get_variable("v", [1])
with tf.variable_scope(foo_scope):
    w = tf.get_variable("w", [1])
with tf.variable_scope(foo_scope, reuse=True):
    v1 = tf.get_variable("v", [1])
    w1 = tf.get_variable("w", [1])
assert v1 == v
assert w1 == w
```

When opening a variable scope using a previously captured scope,
we jump out of the current variable scope prefix to an entirely
different one, regardless of where we do it.

```python
with tf.variable_scope("foo") as foo_scope:
    assert foo_scope.name == "foo"
with tf.variable_scope("bar")
    with tf.variable_scope("baz") as other_scope:
        assert other_scope.name == "bar/baz"
        with tf.variable_scope(foo_scope) as foo_scope2:
            assert foo_scope2.name == "foo"  # Not changed.
```

### Initializers in variable scope

Using `tf.get_variable()` lets you write functions that create or reuse
variables and can be called transparently from outside. But what if we wanted
to change the initializer of the created variables? Do we need to pass an extra
argument to every function that creates variables? What about the most common
case, when we want to set the default initializer for all variables in one
place, on top of all functions? To help with these cases, variable scope
can carry a default initializer. It is inherited by sub-scopes and passed
to each `tf.get_variable()` call. But it will be overridden if another
initializer is specified explicitly.

```python
with tf.variable_scope("foo", initializer=tf.constant_initializer(0.4)):
    v = tf.get_variable("v", [1])
    assert v.eval() == 0.4  # Default initializer as set above.
    w = tf.get_variable("w", [1], initializer=tf.constant_initializer(0.3))
    assert w.eval() == 0.3  # Specific initializer overrides the default.
    with tf.variable_scope("bar"):
        v = get_variable("v", [1])
        assert v.eval() == 0.4  # Inherited default initializer.
    with tf.variable_scope("baz", initializer=tf.constant_initializer(0.2)):
        v = get_variable("v", [1])
        assert v.eval() == 0.2  # Changed default initializer.
```

### Names of ops in `tf.variable_scope()`

We discussed how `tf.variable_scope` governs the names of variables.
But how does it influence the names of other ops in the scope?
It is natural that ops created inside a variable scope should also
share its name as a prefix. For this reason, when we do `with tf.variable_scope("name")`,
this implicitly opens a `tf.name_scope("name")`. For example:

```python
with tf.variable_scope("foo"):
    x = 1.0 + tf.get_variable("v", [1])
assert x.op.name == "foo/add"
```

Name scopes can be opened in addition to a variable scope, and then
they will only affect the names of the ops, but not of the variables.

```python
with tf.variable_scope("foo"):
    with tf.name_scope("bar"):
        v = tf.get_variable("v", [1])
        x = 1.0 + v
assert v.name == "foo/v:0"
assert x.op.name == "foo/bar/add"
```

When opening a variable scope using a captured object instead of a string,
we do not alter the current name scope for ops.
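
Here is a small sketch of that behavior; since exact op names have varied
across TensorFlow versions, only the variable name is asserted.

```python
with tf.variable_scope("foo") as foo_scope:
    pass
with tf.variable_scope("bar"):
    with tf.variable_scope(foo_scope):
        # The variable is created under the captured scope "foo" ...
        v = tf.get_variable("v", [1])
        # ... while the op keeps the name scope that was already open.
        x = 1.0 + v
assert v.name == "foo/v:0"
```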


## Examples of Use

Here are pointers to a few files that make use of variable scope.
In particular, it is heavily used for recurrent neural networks
and sequence-to-sequence models.

File | What's in it?
--- | ---
`models/image/cifar10.py` | Image classification model for the CIFAR-10 dataset.
`models/rnn/rnn_cell.py` | Cell functions for recurrent neural networks.
`models/rnn/seq2seq.py` | Functions for building sequence-to-sequence models.