Diffstat (limited to 'tensorflow/g3doc/tutorials/seq2seq/index.md')
-rw-r--r--  tensorflow/g3doc/tutorials/seq2seq/index.md | 6
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/tensorflow/g3doc/tutorials/seq2seq/index.md b/tensorflow/g3doc/tutorials/seq2seq/index.md
index 3d64bcc91b..c1673f474d 100644
--- a/tensorflow/g3doc/tutorials/seq2seq/index.md
+++ b/tensorflow/g3doc/tutorials/seq2seq/index.md
@@ -58,7 +58,7 @@ translation [Sutskever et al., 2014](http://arxiv.org/abs/1409.3215)
In the basic model depicted above, every input has to be encoded into
a fixed-size state vector, as that is the only thing passed to the decoder.
To allow the decoder more direct access to the input, an *attention* mechanism
-was introduced in [Bahdanu et al., 2014](http://arxiv.org/abs/1409.0473)
+was introduced in [Bahdanau et al., 2014](http://arxiv.org/abs/1409.0473)
([pdf](http://arxiv.org/pdf/1409.0473.pdf)).
We will not go into the details of the attention mechanism (see the paper),
suffice it to say that it allows the decoder to peek into the input at every
@@ -176,8 +176,8 @@ projections are constructed by the following code in `seq2seq_model.py`.
```
First, note that we only construct a sampled softmax if the number of samples
-(512 by default) is smaller that the target vocabulary size. For vocabularies
-smaller than 512 it might be a better idea to just use a standard softmax loss.
+(512 by default) is smaller than the target vocabulary size. For vocabularies
+smaller than 512, it might be a better idea to just use a standard softmax loss.
Then, as you can see, we construct an output projection. It is a pair,
consisting of a weight matrix and a bias vector. If used, the rnn cell
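The second hunk's context describes the sampled-softmax construction: a loss computed over a small set of sampled classes rather than the full target vocabulary, together with an output projection (a weight matrix and a bias vector). As a rough illustration of that idea only — not the tutorial's actual `seq2seq_model.py` code — here is a minimal NumPy sketch. The sizes, the variable names, and the uniform candidate sampler are simplifying assumptions; TensorFlow's `tf.nn.sampled_softmax_loss` uses a log-uniform sampler with a log-expected-count correction and removes accidental hits.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab_size = 10000   # hypothetical target vocabulary size
hidden_size = 128    # hypothetical decoder state size
num_samples = 512    # the tutorial's default sample count

# Output projection, as described in the diff context:
# a weight matrix and a bias vector mapping hidden states to vocab logits.
proj_w = rng.normal(scale=0.1, size=(hidden_size, vocab_size))
proj_b = np.zeros(vocab_size)

def sampled_softmax_loss(hidden, target):
    """Cross-entropy over the true class plus `num_samples` sampled
    classes, instead of the full vocabulary. Sampling is uniform here
    for simplicity, and accidental collisions with `target` are ignored."""
    sampled = rng.choice(vocab_size, size=num_samples, replace=False)
    classes = np.concatenate(([target], sampled))
    # Project only the selected columns -- this is the whole point:
    # cost scales with num_samples, not vocab_size.
    logits = hidden @ proj_w[:, classes] + proj_b[classes]
    logits -= logits.max()                       # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum())
    return -log_probs[0]                         # true class is index 0

hidden = rng.normal(size=hidden_size)
loss = sampled_softmax_loss(hidden, target=42)
```

This also shows why the guard in the diff context makes sense: if `num_samples` is not smaller than `vocab_size`, the sampled loss does at least as much work as a standard full softmax, so the full softmax is the better choice.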