path: root/tensorflow/g3doc/tutorials/seq2seq/index.md
author     A. Unique TensorFlower <gardener@tensorflow.org>  2017-02-09 17:57:11 -0800
committer  TensorFlower Gardener <gardener@tensorflow.org>   2017-02-09 18:09:10 -0800
commit     1fbdd3ff8705566cbb203b7e0c926777289c36b3 (patch)
tree       a48a5239f0ccfb2740bef9738240ea1e14211f45 /tensorflow/g3doc/tutorials/seq2seq/index.md
parent     4fe968822111aed7a17f582af8caa6513785eb21 (diff)
Minor fixes to Seq2Seq tutorial.
Change: 147106298
Diffstat (limited to 'tensorflow/g3doc/tutorials/seq2seq/index.md')
-rw-r--r--  tensorflow/g3doc/tutorials/seq2seq/index.md  |  57
1 file changed, 36 insertions(+), 21 deletions(-)
diff --git a/tensorflow/g3doc/tutorials/seq2seq/index.md b/tensorflow/g3doc/tutorials/seq2seq/index.md
index 9d94167f9a..a438ffac67 100644
--- a/tensorflow/g3doc/tutorials/seq2seq/index.md
+++ b/tensorflow/g3doc/tutorials/seq2seq/index.md
@@ -16,8 +16,8 @@ python translate.py --data_dir [your_data_directory]
```
It will download English-to-French translation data from the
-[WMT'15 Website](http://www.statmt.org/wmt15/translation-task.html)
-prepare it for training and train. It takes about 20GB of disk space,
+[WMT'15 Website](http://www.statmt.org/wmt15/translation-task.html),
+prepare it for training, and train. It takes about 20GB of disk space,
and a while to download and prepare (see [later](#lets-run-it) for details),
so you can start and leave it running while reading this tutorial.
@@ -56,7 +56,7 @@ a fixed-size state vector, as that is the only thing passed to the decoder.
To allow the decoder more direct access to the input, an *attention* mechanism
was introduced in [Bahdanau et al., 2014](http://arxiv.org/abs/1409.0473)
([pdf](http://arxiv.org/pdf/1409.0473.pdf)).
-We will not go into the details of the attention mechanism (see the paper),
+We will not go into the details of the attention mechanism (see the paper);
suffice it to say that it allows the decoder to peek into the input at every
decoding step. A multi-layer sequence-to-sequence network with LSTM cells and
attention mechanism in the decoder looks like this.
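
For intuition, here is a rough sketch of the additive attention score described in Bahdanau et al. This is not code from the tutorial; the function, variable names, and sizes below are made up for illustration. At each decoding step the decoder state is compared against every encoder state, and the resulting weights determine where the decoder "peeks" into the input.

```python
import tensorflow as tf

def attention_weights(encoder_states, decoder_state, hidden_size, attn_size):
  """Sketch of additive (Bahdanau-style) attention weights.

  encoder_states: [batch, src_len, hidden_size]; decoder_state: [batch, hidden_size].
  """
  w1 = tf.get_variable("attn_w1", [hidden_size, attn_size])
  w2 = tf.get_variable("attn_w2", [hidden_size, attn_size])
  v = tf.get_variable("attn_v", [attn_size])
  # Project every encoder state to the attention space.
  shape = tf.shape(encoder_states)
  keys = tf.reshape(
      tf.matmul(tf.reshape(encoder_states, [-1, hidden_size]), w1),
      [shape[0], shape[1], attn_size])
  # Project the current decoder state and broadcast it over source positions.
  query = tf.expand_dims(tf.matmul(decoder_state, w2), 1)
  # score_t = v^T tanh(W1 h_t + W2 s); softmax over source positions.
  scores = tf.reduce_sum(v * tf.tanh(keys + query), axis=2)
  return tf.nn.softmax(scores)
```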
@@ -82,12 +82,12 @@ to the encoder, i.e., corresponding to the letters *A, B, C* in the first
picture above. Similarly, `decoder_inputs` are tensors representing inputs
to the decoder, *GO, W, X, Y, Z* on the first picture.
-The `cell` argument is an instance of the `models.rnn.rnn_cell.RNNCell` class
+The `cell` argument is an instance of the `tf.contrib.rnn.RNNCell` class
that determines which cell will be used inside the model. You can use
an existing cell, such as `GRUCell` or `LSTMCell`, or you can write your own.
-Moreover, `rnn_cell` provides wrappers to construct multi-layer cells,
-add dropout to cell inputs or outputs, or to do other transformations,
-see the [RNN Tutorial](../../tutorials/recurrent/index.md) for examples.
+Moreover, `tf.contrib.rnn` provides wrappers to construct multi-layer cells,
+add dropout to cell inputs or outputs, or to do other transformations.
+See the [RNN Tutorial](../../tutorials/recurrent/index.md) for examples.
The call to `basic_rnn_seq2seq` returns two arguments: `outputs` and `states`.
Both of them are lists of tensors of the same length as `decoder_inputs`.
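
As a concrete illustration of those wrappers, a two-layer GRU stack with dropout on the cell outputs could be built as below. This is a hypothetical snippet, not code from the tutorial; the layer count, size, and keep probability are made-up values.

```python
import tensorflow as tf

def make_cell(size=256, num_layers=2, output_keep_prob=0.8):
  # Stack `num_layers` GRU cells and apply dropout to each cell's outputs.
  cells = []
  for _ in range(num_layers):
    cell = tf.contrib.rnn.GRUCell(size)
    cell = tf.contrib.rnn.DropoutWrapper(cell, output_keep_prob=output_keep_prob)
    cells.append(cell)
  return tf.contrib.rnn.MultiRNNCell(cells)
```

The result is itself an `RNNCell`, so it can be passed as the `cell` argument to `basic_rnn_seq2seq` or any of the other seq2seq functions.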
@@ -107,7 +107,8 @@ For example, let's analyze the following use of an embedding RNN model.
outputs, states = embedding_rnn_seq2seq(
encoder_inputs, decoder_inputs, cell,
num_encoder_symbols, num_decoder_symbols,
- output_projection=None, feed_previous=False)
+ embedding_size, output_projection=None,
+ feed_previous=False)
```
In the `embedding_rnn_seq2seq` model, all inputs (both `encoder_inputs` and
@@ -140,7 +141,7 @@ in [Jean et. al., 2014](http://arxiv.org/abs/1412.2007)
([pdf](http://arxiv.org/pdf/1412.2007.pdf)).
In addition to `basic_rnn_seq2seq` and `embedding_rnn_seq2seq` there are a few
-more sequence-to-sequence models in `seq2seq.py`, take a look there. They all
+more sequence-to-sequence models in `seq2seq.py`; take a look there. They all
have similar interfaces, so we will not describe them in detail. We will use
`embedding_attention_seq2seq` for our translation model below.
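
A call to `embedding_attention_seq2seq` looks much like the `embedding_rnn_seq2seq` example above. The snippet below is only a sketch with made-up vocabulary and embedding sizes, not the tutorial's actual invocation.

```python
outputs, states = embedding_attention_seq2seq(
    encoder_inputs, decoder_inputs, cell,
    num_encoder_symbols=40000, num_decoder_symbols=40000,
    embedding_size=128, output_projection=None,
    feed_previous=False)
```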
@@ -159,16 +160,29 @@ of the output projection. Both the sampled softmax loss and the output
projections are constructed by the following code in `seq2seq_model.py`.
```python
- if num_samples > 0 and num_samples < self.target_vocab_size:
- w = tf.get_variable("proj_w", [size, self.target_vocab_size])
- w_t = tf.transpose(w)
- b = tf.get_variable("proj_b", [self.target_vocab_size])
- output_projection = (w, b)
-
- def sampled_loss(inputs, labels):
- labels = tf.reshape(labels, [-1, 1])
- return tf.nn.sampled_softmax_loss(w_t, b, inputs, labels, num_samples,
- self.target_vocab_size)
+
+if num_samples > 0 and num_samples < self.target_vocab_size:
+ w_t = tf.get_variable("proj_w", [self.target_vocab_size, size], dtype=dtype)
+ w = tf.transpose(w_t)
+ b = tf.get_variable("proj_b", [self.target_vocab_size], dtype=dtype)
+ output_projection = (w, b)
+
+ def sampled_loss(labels, inputs):
+ labels = tf.reshape(labels, [-1, 1])
+ # We need to compute the sampled_softmax_loss using 32bit floats to
+ # avoid numerical instabilities.
+ local_w_t = tf.cast(w_t, tf.float32)
+ local_b = tf.cast(b, tf.float32)
+ local_inputs = tf.cast(inputs, tf.float32)
+ return tf.cast(
+ tf.nn.sampled_softmax_loss(
+ weights=local_w_t,
+ biases=local_b,
+ labels=labels,
+ inputs=local_inputs,
+ num_sampled=num_samples,
+ num_classes=self.target_vocab_size),
+ dtype)
```
First, note that we only construct a sampled softmax if the number of samples
@@ -183,8 +197,9 @@ matrix and add the biases, as is done in lines 124-126 in `seq2seq_model.py`.
```python
if output_projection is not None:
- self.outputs[b] = [tf.matmul(output, output_projection[0]) +
- output_projection[1] for ...]
+ for b in xrange(len(buckets)):
+ self.outputs[b] = [tf.matmul(output, output_projection[0]) +
+ output_projection[1] for ...]
```
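
For context, the `sampled_loss` function defined earlier is what ultimately gets plugged into the bucketed training graph as the softmax loss, so training uses the cheap sampled loss while the model's outputs are still projected to the full vocabulary. The snippet below is only a rough sketch of that wiring, assuming the `tf.contrib.legacy_seq2seq` API; see `seq2seq_model.py` for the exact code.

```python
# Sketch only: pass the sampled loss as softmax_loss_function when building
# the per-bucket outputs and losses.
self.outputs, self.losses = tf.contrib.legacy_seq2seq.model_with_buckets(
    self.encoder_inputs, self.decoder_inputs, targets,
    self.target_weights, buckets,
    lambda x, y: seq2seq_f(x, y, False),
    softmax_loss_function=sampled_loss)
```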
### Bucketing and padding