author     A. Unique TensorFlower <gardener@tensorflow.org>   2017-02-09 17:57:11 -0800
committer  TensorFlower Gardener <gardener@tensorflow.org>    2017-02-09 18:09:10 -0800
commit     1fbdd3ff8705566cbb203b7e0c926777289c36b3 (patch)
tree       a48a5239f0ccfb2740bef9738240ea1e14211f45 /tensorflow/g3doc/tutorials/seq2seq/index.md
parent     4fe968822111aed7a17f582af8caa6513785eb21 (diff)
Minor fixes to Seq2Seq tutorial.
Change: 147106298
Diffstat (limited to 'tensorflow/g3doc/tutorials/seq2seq/index.md')
-rw-r--r--   tensorflow/g3doc/tutorials/seq2seq/index.md | 57
1 file changed, 36 insertions(+), 21 deletions(-)
diff --git a/tensorflow/g3doc/tutorials/seq2seq/index.md b/tensorflow/g3doc/tutorials/seq2seq/index.md
index 9d94167f9a..a438ffac67 100644
--- a/tensorflow/g3doc/tutorials/seq2seq/index.md
+++ b/tensorflow/g3doc/tutorials/seq2seq/index.md
@@ -16,8 +16,8 @@ python translate.py --data_dir [your_data_directory]
 ```
 
 It will download English-to-French translation data from the
-[WMT'15 Website](http://www.statmt.org/wmt15/translation-task.html)
-prepare it for training and train. It takes about 20GB of disk space,
+[WMT'15 Website](http://www.statmt.org/wmt15/translation-task.html),
+prepare it for training, and train. It takes about 20GB of disk space,
 and a while to download and prepare (see [later](#lets-run-it) for details),
 so you can start and leave it running while reading this tutorial.
 
@@ -56,7 +56,7 @@ a fixed-size state vector, as that is the only thing passed to the decoder.
 To allow the decoder more direct access to the input, an *attention* mechanism
 was introduced in [Bahdanau et al., 2014](http://arxiv.org/abs/1409.0473)
 ([pdf](http://arxiv.org/pdf/1409.0473.pdf)).
-We will not go into the details of the attention mechanism (see the paper),
+We will not go into the details of the attention mechanism (see the paper);
 suffice it to say that it allows the decoder to peek into the input at every
 decoding step. A multi-layer sequence-to-sequence network with LSTM cells and
 attention mechanism in the decoder looks like this.
@@ -82,12 +82,12 @@ to the encoder, i.e., corresponding to the letters *A, B, C* in the first
 picture above. Similarly, `decoder_inputs` are tensors representing inputs
 to the decoder, *GO, W, X, Y, Z* on the first picture.
 
-The `cell` argument is an instance of the `models.rnn.rnn_cell.RNNCell` class
+The `cell` argument is an instance of the `tf.contrib.rnn.RNNCell` class
 that determines which cell will be used inside the model. You can use
 an existing cell, such as `GRUCell` or `LSTMCell`, or you can write your own.
-Moreover, `rnn_cell` provides wrappers to construct multi-layer cells,
-add dropout to cell inputs or outputs, or to do other transformations,
-see the [RNN Tutorial](../../tutorials/recurrent/index.md) for examples.
+Moreover, `tf.contrib.rnn` provides wrappers to construct multi-layer cells,
+add dropout to cell inputs or outputs, or to do other transformations.
+See the [RNN Tutorial](../../tutorials/recurrent/index.md) for examples.
 
 The call to `basic_rnn_seq2seq` returns two arguments: `outputs` and `states`.
 Both of them are lists of tensors of the same length as `decoder_inputs`.
@@ -107,7 +107,8 @@ For example, let's analyze the following use of an embedding RNN model.
 outputs, states = embedding_rnn_seq2seq(
     encoder_inputs, decoder_inputs, cell,
     num_encoder_symbols, num_decoder_symbols,
-    output_projection=None, feed_previous=False)
+    embedding_size, output_projection=None,
+    feed_previous=False)
 ```
 
 In the `embedding_rnn_seq2seq` model, all inputs (both `encoder_inputs` and
@@ -140,7 +141,7 @@ in [Jean et. al., 2014](http://arxiv.org/abs/1412.2007)
 ([pdf](http://arxiv.org/pdf/1412.2007.pdf)).
 
 In addition to `basic_rnn_seq2seq` and `embedding_rnn_seq2seq` there are a few
-more sequence-to-sequence models in `seq2seq.py`, take a look there. They all
+more sequence-to-sequence models in `seq2seq.py`; take a look there. They all
 have similar interfaces, so we will not describe them in detail. We will use
 `embedding_attention_seq2seq` for our translation model below.
 
@@ -159,16 +160,29 @@ of the output projection.
 
 Both the sampled softmax loss and the output projections are constructed by
 the following code in `seq2seq_model.py`.
 ```python
-  if num_samples > 0 and num_samples < self.target_vocab_size:
-    w = tf.get_variable("proj_w", [size, self.target_vocab_size])
-    w_t = tf.transpose(w)
-    b = tf.get_variable("proj_b", [self.target_vocab_size])
-    output_projection = (w, b)
-
-    def sampled_loss(inputs, labels):
-      labels = tf.reshape(labels, [-1, 1])
-      return tf.nn.sampled_softmax_loss(w_t, b, inputs, labels, num_samples,
-                                        self.target_vocab_size)
+
+if num_samples > 0 and num_samples < self.target_vocab_size:
+  w_t = tf.get_variable("proj_w", [self.target_vocab_size, size], dtype=dtype)
+  w = tf.transpose(w_t)
+  b = tf.get_variable("proj_b", [self.target_vocab_size], dtype=dtype)
+  output_projection = (w, b)
+
+  def sampled_loss(labels, inputs):
+    labels = tf.reshape(labels, [-1, 1])
+    # We need to compute the sampled_softmax_loss using 32bit floats to
+    # avoid numerical instabilities.
+    local_w_t = tf.cast(w_t, tf.float32)
+    local_b = tf.cast(b, tf.float32)
+    local_inputs = tf.cast(inputs, tf.float32)
+    return tf.cast(
+        tf.nn.sampled_softmax_loss(
+            weights=local_w_t,
+            biases=local_b,
+            labels=labels,
+            inputs=local_inputs,
+            num_sampled=num_samples,
+            num_classes=self.target_vocab_size),
+        dtype)
 ```
 First, note that we only construct a sampled softmax if the number of samples
@@ -183,8 +197,9 @@ matrix and add the biases, as is done in lines 124-126 in `seq2seq_model.py`.
 
 ```python
 if output_projection is not None:
-  self.outputs[b] = [tf.matmul(output, output_projection[0]) +
-                     output_projection[1] for ...]
+  for b in xrange(len(buckets)):
+    self.outputs[b] = [tf.matmul(output, output_projection[0]) +
+                       output_projection[1] for ...]
 ```
 
 ### Bucketing and padding
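To see how the pieces in the patched snippet fit together outside of `seq2seq_model.py`, here is a minimal, self-contained sketch. It assumes the TensorFlow 1.x API and uses made-up sizes (`size`, `target_vocab_size`, `num_samples`, `batch` are illustrative, not values from this commit); variable names mirror the tutorial, but this is not the model's actual training code.

```python
import tensorflow as tf  # TensorFlow 1.x, matching the tutorial's era

# Illustrative sizes only (not taken from the commit).
size = 512                 # decoder hidden size
target_vocab_size = 40000  # target vocabulary size
num_samples = 512          # candidates drawn by the sampled softmax
batch = 32

# Output projection stored transposed, as in the patched snippet: proj_w has
# shape [vocab, hidden] so it can be passed to sampled_softmax_loss directly;
# the transpose is only needed when projecting decoder outputs to logits.
w_t = tf.get_variable("proj_w", [target_vocab_size, size])
w = tf.transpose(w_t)
b = tf.get_variable("proj_b", [target_vocab_size])

decoder_output = tf.placeholder(tf.float32, [batch, size])  # one decoder step
targets = tf.placeholder(tf.int64, [batch])                 # target word ids

# Training: sample num_samples classes instead of computing a full softmax
# over the whole target vocabulary.
loss = tf.nn.sampled_softmax_loss(
    weights=w_t,
    biases=b,
    labels=tf.reshape(targets, [-1, 1]),
    inputs=decoder_output,
    num_sampled=num_samples,
    num_classes=target_vocab_size)

# Decoding: apply the projection explicitly to recover full-vocabulary logits,
# as the tutorial does for self.outputs[b] when output_projection is set.
logits = tf.matmul(decoder_output, w) + b
```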