Diffstat (limited to 'tensorflow/docs_src/api_guides/python/contrib.seq2seq.md')
-rw-r--r-- | tensorflow/docs_src/api_guides/python/contrib.seq2seq.md | 138
1 file changed, 0 insertions, 138 deletions
diff --git a/tensorflow/docs_src/api_guides/python/contrib.seq2seq.md b/tensorflow/docs_src/api_guides/python/contrib.seq2seq.md
deleted file mode 100644
index 54f2fafc71..0000000000
--- a/tensorflow/docs_src/api_guides/python/contrib.seq2seq.md
+++ /dev/null
@@ -1,138 +0,0 @@

# Seq2seq Library (contrib)
[TOC]

Module for constructing seq2seq models and dynamic decoding. Builds on top of
libraries in `tf.contrib.rnn`.

This library is composed of two primary components:

* New attention wrappers for `tf.contrib.rnn.RNNCell` objects.
* A new object-oriented dynamic decoding framework.

## Attention

Attention wrappers are `RNNCell` objects that wrap other `RNNCell` objects and
implement attention. The form of attention is determined by a subclass of
`tf.contrib.seq2seq.AttentionMechanism`. These subclasses describe the form
of attention (e.g. additive vs. multiplicative) to use when creating the
wrapper. An instance of an `AttentionMechanism` is constructed with a
`memory` tensor, from which lookup keys and values tensors are created.

### Attention Mechanisms

The two basic attention mechanisms are:

* `tf.contrib.seq2seq.BahdanauAttention` (additive attention,
  [ref.](https://arxiv.org/abs/1409.0473))
* `tf.contrib.seq2seq.LuongAttention` (multiplicative attention,
  [ref.](https://arxiv.org/abs/1508.04025))

The `memory` tensor passed to the attention mechanism's constructor is expected
to be shaped `[batch_size, memory_max_time, memory_depth]`; an additional
`memory_sequence_length` vector is often accepted as well. If provided, the
`memory` tensor's rows are masked with zeros past their true sequence lengths.

Attention mechanisms also have a concept of depth, usually determined by the
construction parameter `num_units`. For some kinds of attention (like
`BahdanauAttention`), both queries and memory are projected to tensors of depth
`num_units`. For other kinds (like `LuongAttention`), `num_units` should match
the depth of the queries; the `memory` tensor will be projected to this depth.
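As a minimal sketch of constructing a mechanism (assuming `encoder_outputs` and
`source_sequence_length` come from an encoder built elsewhere; the depth of 256
is an arbitrary illustrative value):

```python
# encoder_outputs:        [batch_size, memory_max_time, memory_depth] encoder states.
# source_sequence_length: [batch_size] true lengths, used to mask memory rows.
attention_mechanism = tf.contrib.seq2seq.BahdanauAttention(
    num_units=256,
    memory=encoder_outputs,
    memory_sequence_length=source_sequence_length)
```

The same constructor arguments apply to `LuongAttention`, with `num_units`
matching the query depth as described above.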
### Attention Wrappers

The basic attention wrapper is `tf.contrib.seq2seq.AttentionWrapper`.
This wrapper accepts an `RNNCell` instance, an instance of `AttentionMechanism`,
and an attention depth parameter (`attention_size`), as well as several
optional arguments that allow one to customize intermediate calculations.

At each time step, the basic calculation performed by this wrapper is:

```python
cell_inputs = concat([inputs, prev_state.attention], -1)
cell_output, next_cell_state = cell(cell_inputs, prev_state.cell_state)
score = attention_mechanism(cell_output)
alignments = softmax(score)
context = matmul(alignments, attention_mechanism.values)
attention = tf.layers.Dense(attention_size)(concat([cell_output, context], 1))
next_state = AttentionWrapperState(
    cell_state=next_cell_state,
    attention=attention)
output = attention
return output, next_state
```

In practice, a number of the intermediate calculations are configurable.
For example, the initial concatenation of `inputs` and `prev_state.attention`
can be replaced with another mixing function. The function `softmax` can
be replaced with alternative options when calculating `alignments` from the
`score`. Finally, the outputs returned by the wrapper can be configured to
be the value `cell_output` instead of `attention`.

The benefit of using an `AttentionWrapper` is that it plays nicely with
other wrappers and the dynamic decoder described below. For example, one can
write:

```python
cell = tf.contrib.rnn.DeviceWrapper(LSTMCell(512), "/device:GPU:0")
attention_mechanism = tf.contrib.seq2seq.LuongAttention(512, encoder_outputs)
attn_cell = tf.contrib.seq2seq.AttentionWrapper(
    cell, attention_mechanism, attention_size=256)
attn_cell = tf.contrib.rnn.DeviceWrapper(attn_cell, "/device:GPU:1")
top_cell = tf.contrib.rnn.DeviceWrapper(LSTMCell(512), "/device:GPU:1")
multi_cell = MultiRNNCell([attn_cell, top_cell])
```

The `multi_cell` will perform the bottom-layer calculations on GPU 0;
attention calculations will be performed on GPU 1 and immediately passed
up to the top layer, which is also computed on GPU 1. The attention is
also passed forward in time to the next time step and copied to GPU 0 for the
next time step of `cell`. (*Note*: This is just an example of use,
not a suggested device partitioning strategy.)

## Dynamic Decoding

Example usage:

```python
cell = ...  # an instance of RNNCell

if mode == "train":
  helper = tf.contrib.seq2seq.TrainingHelper(
      inputs=input_vectors,
      sequence_length=input_lengths)
elif mode == "infer":
  helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(
      embedding=embedding,
      start_tokens=tf.tile([GO_SYMBOL], [batch_size]),
      end_token=END_SYMBOL)

decoder = tf.contrib.seq2seq.BasicDecoder(
    cell=cell,
    helper=helper,
    initial_state=cell.zero_state(batch_size, tf.float32))
outputs, _ = tf.contrib.seq2seq.dynamic_decode(
    decoder=decoder,
    output_time_major=False,
    impute_finished=True,
    maximum_iterations=20)
```

### Decoder base class and functions

* `tf.contrib.seq2seq.Decoder`
* `tf.contrib.seq2seq.dynamic_decode`

### Basic Decoder

* `tf.contrib.seq2seq.BasicDecoderOutput`
* `tf.contrib.seq2seq.BasicDecoder`

### Decoder Helpers

* `tf.contrib.seq2seq.Helper`
* `tf.contrib.seq2seq.CustomHelper`
* `tf.contrib.seq2seq.GreedyEmbeddingHelper`
* `tf.contrib.seq2seq.ScheduledEmbeddingTrainingHelper` (usage sketch after this list)
* `tf.contrib.seq2seq.ScheduledOutputTrainingHelper`
* `tf.contrib.seq2seq.TrainingHelper`
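As one illustration of swapping helpers, the training branch of the dynamic
decoding example above could use scheduled sampling instead of pure teacher
forcing. This is a sketch under the same assumptions as that example
(`input_vectors`, `input_lengths`, and `embedding` are defined by the
surrounding model); the 0.25 sampling probability is an arbitrary illustrative
value:

```python
# Drop-in replacement for the TrainingHelper above: at each step, with
# probability `sampling_probability` the decoder embeds and feeds back its own
# sampled token instead of the ground-truth input.
helper = tf.contrib.seq2seq.ScheduledEmbeddingTrainingHelper(
    inputs=input_vectors,
    sequence_length=input_lengths,
    embedding=embedding,
    sampling_probability=0.25)
```

The rest of the example (the `BasicDecoder` construction and the call to
`dynamic_decode`) is unchanged.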