author     A. Unique TensorFlower <gardener@tensorflow.org>   2017-02-09 17:57:11 -0800
committer  TensorFlower Gardener <gardener@tensorflow.org>    2017-02-09 18:09:10 -0800
commit     1fbdd3ff8705566cbb203b7e0c926777289c36b3 (patch)
tree       a48a5239f0ccfb2740bef9738240ea1e14211f45 /tensorflow/g3doc/tutorials/seq2seq/index.md
parent     4fe968822111aed7a17f582af8caa6513785eb21 (diff)
Minor fixes to Seq2Seq tutorial.
Change: 147106298
Diffstat (limited to 'tensorflow/g3doc/tutorials/seq2seq/index.md')
-rw-r--r--   tensorflow/g3doc/tutorials/seq2seq/index.md | 57
1 file changed, 36 insertions(+), 21 deletions(-)
diff --git a/tensorflow/g3doc/tutorials/seq2seq/index.md b/tensorflow/g3doc/tutorials/seq2seq/index.md
index 9d94167f9a..a438ffac67 100644
--- a/tensorflow/g3doc/tutorials/seq2seq/index.md
+++ b/tensorflow/g3doc/tutorials/seq2seq/index.md
@@ -16,8 +16,8 @@ python translate.py --data_dir [your_data_directory]
 ```
 
 It will download English-to-French translation data from the
-[WMT'15 Website](http://www.statmt.org/wmt15/translation-task.html)
-prepare it for training and train. It takes about 20GB of disk space,
+[WMT'15 Website](http://www.statmt.org/wmt15/translation-task.html),
+prepare it for training, and train. It takes about 20GB of disk space,
 and a while to download and prepare (see [later](#lets-run-it) for details),
 so you can start and leave it running while reading this tutorial.
 
@@ -56,7 +56,7 @@ a fixed-size state vector, as that is the only thing passed to the decoder.
 To allow the decoder more direct access to the input, an *attention* mechanism
 was introduced in [Bahdanau et al., 2014](http://arxiv.org/abs/1409.0473)
 ([pdf](http://arxiv.org/pdf/1409.0473.pdf)).
-We will not go into the details of the attention mechanism (see the paper),
+We will not go into the details of the attention mechanism (see the paper);
 suffice it to say that it allows the decoder to peek into the input at every
 decoding step. A multi-layer sequence-to-sequence network with LSTM cells and
 attention mechanism in the decoder looks like this.
@@ -82,12 +82,12 @@ to the encoder, i.e., corresponding to the letters *A, B, C* in the first
 picture above. Similarly, `decoder_inputs` are tensors representing inputs
 to the decoder, *GO, W, X, Y, Z* on the first picture.
 
-The `cell` argument is an instance of the `models.rnn.rnn_cell.RNNCell` class
+The `cell` argument is an instance of the `tf.contrib.rnn.RNNCell` class
 that determines which cell will be used inside the model. You can use
 an existing cell, such as `GRUCell` or `LSTMCell`, or you can write your own.
-Moreover, `rnn_cell` provides wrappers to construct multi-layer cells,
-add dropout to cell inputs or outputs, or to do other transformations,
-see the [RNN Tutorial](../../tutorials/recurrent/index.md) for examples.
+Moreover, `tf.contrib.rnn` provides wrappers to construct multi-layer cells,
+add dropout to cell inputs or outputs, or to do other transformations.
+See the [RNN Tutorial](../../tutorials/recurrent/index.md) for examples.
 
 The call to `basic_rnn_seq2seq` returns two arguments: `outputs` and `states`.
 Both of them are lists of tensors of the same length as `decoder_inputs`.
@@ -107,7 +107,8 @@ For example, let's analyze the following use of an embedding RNN model.
 outputs, states = embedding_rnn_seq2seq(
     encoder_inputs, decoder_inputs, cell,
     num_encoder_symbols, num_decoder_symbols,
-    output_projection=None, feed_previous=False)
+    embedding_size, output_projection=None,
+    feed_previous=False)
 ```
 
 In the `embedding_rnn_seq2seq` model, all inputs (both `encoder_inputs` and
@@ -140,7 +141,7 @@ in [Jean et. al., 2014](http://arxiv.org/abs/1412.2007)
 ([pdf](http://arxiv.org/pdf/1412.2007.pdf)).
 
 In addition to `basic_rnn_seq2seq` and `embedding_rnn_seq2seq` there are a few
-more sequence-to-sequence models in `seq2seq.py`, take a look there. They all
+more sequence-to-sequence models in `seq2seq.py`; take a look there. They all
 have similar interfaces, so we will not describe them in detail. We will use
 `embedding_attention_seq2seq` for our translation model below.
 
@@ -159,16 +160,29 @@ of the output projection.
 
 Both the sampled softmax loss and the output projections are constructed by
 the following code in `seq2seq_model.py`.
 ```python
-  if num_samples > 0 and num_samples < self.target_vocab_size:
-    w = tf.get_variable("proj_w", [size, self.target_vocab_size])
-    w_t = tf.transpose(w)
-    b = tf.get_variable("proj_b", [self.target_vocab_size])
-    output_projection = (w, b)
-
-    def sampled_loss(inputs, labels):
-      labels = tf.reshape(labels, [-1, 1])
-      return tf.nn.sampled_softmax_loss(w_t, b, inputs, labels, num_samples,
-                                        self.target_vocab_size)
+
+if num_samples > 0 and num_samples < self.target_vocab_size:
+  w_t = tf.get_variable("proj_w", [self.target_vocab_size, size], dtype=dtype)
+  w = tf.transpose(w_t)
+  b = tf.get_variable("proj_b", [self.target_vocab_size], dtype=dtype)
+  output_projection = (w, b)
+
+  def sampled_loss(labels, inputs):
+    labels = tf.reshape(labels, [-1, 1])
+    # We need to compute the sampled_softmax_loss using 32bit floats to
+    # avoid numerical instabilities.
+    local_w_t = tf.cast(w_t, tf.float32)
+    local_b = tf.cast(b, tf.float32)
+    local_inputs = tf.cast(inputs, tf.float32)
+    return tf.cast(
+        tf.nn.sampled_softmax_loss(
+            weights=local_w_t,
+            biases=local_b,
+            labels=labels,
+            inputs=local_inputs,
+            num_sampled=num_samples,
+            num_classes=self.target_vocab_size),
+        dtype)
 ```
 First, note that we only construct a sampled softmax if the number of samples
@@ -183,8 +197,9 @@ matrix and add the biases, as is done in lines 124-126 in `seq2seq_model.py`.
 
 ```python
 if output_projection is not None:
-  self.outputs[b] = [tf.matmul(output, output_projection[0]) +
-                     output_projection[1] for ...]
+  for b in xrange(len(buckets)):
+    self.outputs[b] = [tf.matmul(output, output_projection[0]) +
+                       output_projection[1] for ...]
 ```
 
 ### Bucketing and padding
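To see how the pieces in the patched snippet fit together outside of `seq2seq_model.py`, here is a minimal, self-contained sketch. It assumes the TensorFlow 1.x API and uses made-up sizes (`size`, `target_vocab_size`, `num_samples`, `batch` are illustrative, not values from this commit); variable names mirror the tutorial, but this is not the model's actual training code.

```python
import tensorflow as tf  # TensorFlow 1.x, matching the tutorial's era

# Illustrative sizes only (not taken from the commit).
size = 512                 # decoder hidden size
target_vocab_size = 40000  # target vocabulary size
num_samples = 512          # candidates drawn by the sampled softmax
batch = 32

# Output projection stored transposed, as in the patched snippet: proj_w has
# shape [vocab, hidden] so it can be passed to sampled_softmax_loss directly;
# the transpose is only needed when projecting decoder outputs to logits.
w_t = tf.get_variable("proj_w", [target_vocab_size, size])
w = tf.transpose(w_t)
b = tf.get_variable("proj_b", [target_vocab_size])

decoder_output = tf.placeholder(tf.float32, [batch, size])  # one decoder step
targets = tf.placeholder(tf.int64, [batch])                 # target word ids

# Training: sample num_samples classes instead of computing a full softmax
# over the whole target vocabulary.
loss = tf.nn.sampled_softmax_loss(
    weights=w_t,
    biases=b,
    labels=tf.reshape(targets, [-1, 1]),
    inputs=decoder_output,
    num_sampled=num_samples,
    num_classes=target_vocab_size)

# Decoding: apply the projection explicitly to recover full-vocabulary logits,
# as the tutorial does for self.outputs[b] when output_projection is set.
logits = tf.matmul(decoder_output, w) + b
```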