diff --git a/tensorflow/docs_src/tutorials/sequences/recurrent_quickdraw.md b/tensorflow/docs_src/tutorials/sequences/recurrent_quickdraw.md
new file mode 100644
index 0000000000..37bce5b76d
--- /dev/null
+++ b/tensorflow/docs_src/tutorials/sequences/recurrent_quickdraw.md
@@ -0,0 +1,411 @@
+# Recurrent Neural Networks for Drawing Classification
+
+[Quick, Draw!]: http://quickdraw.withgoogle.com
+
+[Quick, Draw!] is a game where a player is challenged to draw a number of
+objects and see if a computer can recognize the drawing.
+
+The recognition in [Quick, Draw!] is performed by a classifier that takes the
+user input, given as a sequence of strokes of points in x and y, and recognizes
+the object category that the user tried to draw.
+
+In this tutorial we'll show how to build an RNN-based recognizer for this
+problem. The model will use a combination of convolutional layers, LSTM layers,
+and a softmax output layer to classify the drawings:
+
+<center> ![RNN model structure](../../images/quickdraw_model.png) </center>
+
+The figure above shows the structure of the model that we will build in this
+tutorial. The input is a drawing that is encoded as a sequence of strokes of
+points in x, y, and n, where n indicates whether the point is the first point
+in a new stroke.
+
+A series of 1-dimensional convolutions is then applied, followed by LSTM
+layers. The sum of the outputs of all LSTM steps is fed into a softmax layer to
+make a classification decision among the classes of drawings that we know.
+
+This tutorial uses the data from actual [Quick, Draw!] games [that is publicly
+available](https://quickdraw.withgoogle.com/data). This dataset contains 50M
+drawings in 345 categories.
+
+## Run the tutorial code
+
+To try the code for this tutorial:
+
+1. @{$install$Install TensorFlow} if you haven't already.
+1. Download the [tutorial code](https://github.com/tensorflow/models/tree/master/tutorials/rnn/quickdraw/train_model.py).
+1. [Download the data](#download-the-data) in `TFRecord` format from
+ [here](http://download.tensorflow.org/data/quickdraw_tutorial_dataset_v1.tar.gz) and unzip it. More details about [how to
+ obtain the original Quick, Draw!
+ data](#optional_download_the_full_quick_draw_data) and [how to convert that
+   to `TFRecord` files](#optional_converting_the_data) are available below.
+
+1. Execute the tutorial code with the following command to train the RNN-based
+ model described in this tutorial. Make sure to adjust the paths to point to
+ the unzipped data from the download in step 3.
+
+```shell
+ python train_model.py \
+ --training_data=rnn_tutorial_data/training.tfrecord-?????-of-????? \
+ --eval_data=rnn_tutorial_data/eval.tfrecord-?????-of-????? \
+ --classes_file=rnn_tutorial_data/training.tfrecord.classes
+```
+
+## Tutorial details
+
+### Download the data
+
+We make the data that we use in this tutorial available as `TFRecord` files
+containing `TFExamples`. You can download the data from here:
+
+http://download.tensorflow.org/data/quickdraw_tutorial_dataset_v1.tar.gz
+
+Alternatively you can download the original data in `ndjson` format from the
+Google cloud and convert it to the `TFRecord` files containing `TFExamples`
+yourself as described in the next section.
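+
+If you just want to peek at what these records contain before training, the
+following is a minimal sketch that prints the feature keys of one record. The
+shard filename is an assumption based on the 10-shard layout described below,
+not a guaranteed name:
+
+```python
+import tensorflow as tf
+
+# Hypothetical shard name; adjust it to match a file from the unpacked archive.
+path = "rnn_tutorial_data/training.tfrecord-00000-of-00010"
+
+# Iterate over the serialized records and inspect the first one.
+for serialized in tf.python_io.tf_record_iterator(path):
+  example = tf.train.Example()
+  example.ParseFromString(serialized)
+  print(example.features.feature.keys())  # e.g. ink, shape, class_index
+  break
+```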
+
+### Optional: Download the full Quick Draw Data
+
+The full [Quick, Draw!](https://quickdraw.withgoogle.com)
+[dataset](https://quickdraw.withgoogle.com/data) is available on Google Cloud
+Storage as [ndjson](http://ndjson.org/) files separated by category. You can
+[browse the list of files in Cloud
+Console](https://console.cloud.google.com/storage/quickdraw_dataset).
+
+We recommend using
+[gsutil](https://cloud.google.com/storage/docs/gsutil_install#install) to
+download the entire dataset. Note that the original `.ndjson` files require
+downloading ~22GB.
+
+Then use the following command to check that your gsutil installation works and
+that you can access the data bucket:
+
+```shell
+gsutil ls -r "gs://quickdraw_dataset/full/simplified/*"
+```
+
+which will output a long list of files like the following:
+
+```shell
+gs://quickdraw_dataset/full/simplified/The Eiffel Tower.ndjson
+gs://quickdraw_dataset/full/simplified/The Great Wall of China.ndjson
+gs://quickdraw_dataset/full/simplified/The Mona Lisa.ndjson
+gs://quickdraw_dataset/full/simplified/aircraft carrier.ndjson
+...
+```
+
+Then create a folder and download the dataset there.
+
+```shell
+mkdir rnn_tutorial_data
+cd rnn_tutorial_data
+gsutil -m cp "gs://quickdraw_dataset/full/simplified/*" .
+```
+
+This download will take a while and fetch a bit more than 23GB of data.
+
+### Optional: Converting the data
+
+To convert the `ndjson` files to
+@{$python/python_io#TFRecords_Format_Details$TFRecord} files containing
+[`tf.train.Example`](https://www.tensorflow.org/code/tensorflow/core/example/example.proto)
+protos, run the following command.
+
+```shell
+ python create_dataset.py --ndjson_path rnn_tutorial_data \
+ --output_path rnn_tutorial_data
+```
+
+This will store the data in 10 shards of
+@{$python/python_io#TFRecords_Format_Details$TFRecord} files with 10000 items
+per class for the training data and 1000 items per class as eval data.
+
+This conversion process is described in more detail below.
+
+The original QuickDraw data is formatted as `ndjson` files where each line
+contains a JSON object like the following:
+
+```json
+{"word":"cat",
+ "countrycode":"VE",
+ "timestamp":"2017-03-02 23:25:10.07453 UTC",
+ "recognized":true,
+ "key_id":"5201136883597312",
+ "drawing":[
+ [
+ [130,113,99,109,76,64,55,48,48,51,59,86,133,154,170,203,214,217,215,208,186,176,162,157,132],
+ [72,40,27,79,82,88,100,120,134,152,165,184,189,186,179,152,131,114,100,89,76,0,31,65,70]
+ ],[
+ [76,28,7],
+ [136,128,128]
+ ],[
+ [76,23,0],
+ [160,164,175]
+ ],[
+ [87,52,37],
+ [175,191,204]
+ ],[
+ [174,220,246,251],
+ [134,132,136,139]
+ ],[
+ [175,255],
+ [147,168]
+ ],[
+ [171,208,215],
+ [164,198,210]
+ ],[
+ [130,110,108,111,130,139,139,119],
+ [129,134,137,144,148,144,136,130]
+ ],[
+ [107,106],
+ [96,113]
+ ]
+ ]
+}
+```
+
+For our purpose of building a classifier, we only care about the fields
+"`word`" and "`drawing`". While parsing the ndjson files, we process them line by line
+using a function that converts the strokes from the `drawing` field into a
+tensor of size `[number of points, 3]` containing the differences of consecutive
+points. This function also returns the class name as a string.
+
+```python
+import json
+
+import numpy as np
+
+
+def parse_line(ndjson_line):
+ """Parse an ndjson line and return ink (as np array) and classname."""
+ sample = json.loads(ndjson_line)
+ class_name = sample["word"]
+ inkarray = sample["drawing"]
+ stroke_lengths = [len(stroke[0]) for stroke in inkarray]
+ total_points = sum(stroke_lengths)
+ np_ink = np.zeros((total_points, 3), dtype=np.float32)
+ current_t = 0
+ for stroke in inkarray:
+ for i in [0, 1]:
+ np_ink[current_t:(current_t + len(stroke[0])), i] = stroke[i]
+ current_t += len(stroke[0])
+ np_ink[current_t - 1, 2] = 1 # stroke_end
+ # Preprocessing.
+ # 1. Size normalization.
+ lower = np.min(np_ink[:, 0:2], axis=0)
+ upper = np.max(np_ink[:, 0:2], axis=0)
+ scale = upper - lower
+ scale[scale == 0] = 1
+ np_ink[:, 0:2] = (np_ink[:, 0:2] - lower) / scale
+  # 2. Compute deltas and drop the first point, keeping the stroke-end flags.
+  np_ink[1:, 0:2] -= np_ink[0:-1, 0:2]
+  np_ink = np_ink[1:, :]
+ return np_ink, class_name
+```
+
+Since we want the data to be shuffled for writing, we read from each of the
+category files in random order and write to a random shard.
+
+For the training data we read the first 10000 items for each class and for the
+eval data we read the next 1000 items for each class.
+
+This data is then reformatted into a tensor of shape `[num_training_samples,
+max_length, 3]`. Then we determine the bounding box of the original drawing in
+screen coordinates and normalize the size such that the drawing has unit height.
+
+<center> ![Size normalization](../../images/quickdraw_sizenormalization.png) </center>
+
+Finally, we compute the differences between consecutive points and store these
+as a `VarLenFeature` in a
+[tensorflow.Example](https://www.tensorflow.org/code/tensorflow/core/example/example.proto)
+under the key `ink`. In addition we store the `class_index` as a single-entry
+`FixedLenFeature` and the `shape` of the `ink` as a `FixedLenFeature` of
+length 2.
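+
+As a concrete illustration, a minimal sketch of building such an example and
+writing it to one of 10 randomly chosen training shards could look like the
+following. The helper name `make_tf_example`, the dummy ink, and the shard
+filenames are assumptions for this sketch, not necessarily what
+`create_dataset.py` does:
+
+```python
+import random
+
+import numpy as np
+import tensorflow as tf
+
+
+def make_tf_example(np_ink, class_index):
+  """Packs the ink deltas, their shape, and the class index into an Example."""
+  features = {
+      # Flattened, variable-length ink; read back later as a VarLenFeature.
+      "ink": tf.train.Feature(
+          float_list=tf.train.FloatList(value=np_ink.flatten())),
+      # Original [num_points, 3] shape, stored as a length-2 feature.
+      "shape": tf.train.Feature(
+          int64_list=tf.train.Int64List(value=np_ink.shape)),
+      # Single-entry class index.
+      "class_index": tf.train.Feature(
+          int64_list=tf.train.Int64List(value=[class_index])),
+  }
+  return tf.train.Example(features=tf.train.Features(feature=features))
+
+
+# Write each example into a randomly picked shard so the data ends up shuffled.
+writers = [
+    tf.python_io.TFRecordWriter("training.tfrecord-%05d-of-00010" % i)
+    for i in range(10)]
+
+# Example usage with a dummy two-point drawing; in the real conversion the ink
+# comes from parse_line and class_index from the list of class names.
+np_ink = np.array([[0.1, 0.2, 0.0], [0.05, -0.1, 1.0]], dtype=np.float32)
+example = make_tf_example(np_ink, class_index=0)
+random.choice(writers).write(example.SerializeToString())
+
+for writer in writers:
+  writer.close()
+```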
+
+### Defining the model
+
+To define the model we create a new `Estimator`. If you want to read more about
+estimators, we recommend @{$custom_estimators$this tutorial}.
+
+To build the model, we:
+
+1. reshape the input back into the original shape, where the mini-batch is
+   padded to the maximal length of its contents. In addition to the ink data we
+ also have the lengths for each example and the target class. This happens in
+ the function [`_get_input_tensors`](#-get-input-tensors).
+
+1. pass the input through a series of convolution layers in
+ [`_add_conv_layers`](#-add-conv-layers).
+
+1. pass the output of the convolutions into a series of bidirectional LSTM
+ layers in [`_add_rnn_layers`](#-add-rnn-layers). At the end of that, the
+   outputs for each time step are summed up to produce a compact, fixed-length
+   embedding of the input.
+
+1. classify this embedding using a softmax layer in
+ [`_add_fc_layers`](#-add-fc-layers).
+
+In code this looks like:
+
+```python
+inks, lengths, targets = _get_input_tensors(features, targets)
+convolved = _add_conv_layers(inks)
+final_state = _add_rnn_layers(convolved, lengths)
+logits = _add_fc_layers(final_state)
+```
+
+### _get_input_tensors
+
+To obtain the input features, we first read the shape from the features dict
+and then create a 1D tensor of size `[batch_size]` containing the lengths of the
+input sequences. The ink is stored as a SparseTensor in the features dict, which
+we convert into a dense tensor and then reshape to `[batch_size, ?, 3]`. Finally,
+if targets were passed in, we make sure they are stored as a 1D tensor of size
+`[batch_size]`.
+
+In code this looks like this:
+
+```python
+shapes = features["shape"]
+lengths = tf.squeeze(
+ tf.slice(shapes, begin=[0, 0], size=[params["batch_size"], 1]))
+inks = tf.reshape(
+ tf.sparse_tensor_to_dense(features["ink"]),
+ [params["batch_size"], -1, 3])
+if targets is not None:
+ targets = tf.squeeze(targets)
+```
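+
+The features dict above is produced by the input function when it parses the
+serialized examples. A minimal sketch of such a parsing spec, assuming the
+feature keys `ink`, `shape`, and `class_index` described in the conversion
+section, might look like this:
+
+```python
+import tensorflow as tf
+
+
+def parse_tfexample_fn(serialized_example):
+  """Parses one serialized example into a features dict and a target."""
+  feature_spec = {
+      # Ragged ink deltas come back as a SparseTensor.
+      "ink": tf.VarLenFeature(dtype=tf.float32),
+      # Original [num_points, 3] shape of the drawing.
+      "shape": tf.FixedLenFeature([2], dtype=tf.int64),
+      # Integer class index used as the target.
+      "class_index": tf.FixedLenFeature([1], dtype=tf.int64),
+  }
+  parsed = tf.parse_single_example(serialized_example, feature_spec)
+  targets = parsed.pop("class_index")
+  return parsed, targets
+```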
+
+### _add_conv_layers
+
+The desired number of convolution layers and the lengths of the filters are
+configured through the parameters `num_conv` and `conv_len` in the `params`
+dict.
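+
+For concreteness, one hypothetical way to hold these and the other parameters
+used in later snippets is a `tf.contrib.training.HParams` object, which
+provides the attribute-style access (`params.num_conv`) seen below. The values
+here are purely illustrative, not necessarily the tutorial's defaults:
+
+```python
+import tensorflow as tf
+
+# Illustrative hyperparameters only; adjust them or take them from flags.
+model_params = tf.contrib.training.HParams(
+    num_conv=[48, 64, 96],      # filters for each 1D convolution layer
+    conv_len=[5, 5, 3],         # kernel size for each convolution layer
+    num_nodes=128,              # LSTM units per direction in each layer
+    num_layers=3,               # number of stacked bidirectional LSTM layers
+    dropout=0.3,                # dropout rate between convolution layers
+    batch_norm=False,           # whether to apply batch normalization
+    num_classes=345,            # number of drawing categories
+    batch_size=8,
+    learning_rate=0.0001,
+    gradient_clipping_norm=9.0)
+```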
+
+The input is a sequence where each point has dimensionality 3. We are going to
+use 1D convolutions where we treat the 3 input features as channels. That means
+that the input is a `[batch_size, length, 3]` tensor and the output will be a
+`[batch_size, length, number_of_filters]` tensor.
+
+```python
+convolved = inks
+for i in range(len(params.num_conv)):
+ convolved_input = convolved
+ if params.batch_norm:
+ convolved_input = tf.layers.batch_normalization(
+ convolved_input,
+ training=(mode == tf.estimator.ModeKeys.TRAIN))
+ # Add dropout layer if enabled and not first convolution layer.
+ if i > 0 and params.dropout:
+ convolved_input = tf.layers.dropout(
+ convolved_input,
+ rate=params.dropout,
+ training=(mode == tf.estimator.ModeKeys.TRAIN))
+ convolved = tf.layers.conv1d(
+ convolved_input,
+ filters=params.num_conv[i],
+ kernel_size=params.conv_len[i],
+ activation=None,
+ strides=1,
+ padding="same",
+ name="conv1d_%d" % i)
+return convolved, lengths
+```
+
+### _add_rnn_layers
+
+We pass the output from the convolutions into bidirectional LSTM layers for
+which we use a helper function from contrib.
+
+```python
+outputs, _, _ = contrib_rnn.stack_bidirectional_dynamic_rnn(
+ cells_fw=[cell(params.num_nodes) for _ in range(params.num_layers)],
+ cells_bw=[cell(params.num_nodes) for _ in range(params.num_layers)],
+ inputs=convolved,
+ sequence_length=lengths,
+ dtype=tf.float32,
+ scope="rnn_classification")
+```
+
+See the code for more details and for how to use `CUDA`-accelerated implementations.
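+
+The snippet above relies on a `cell` helper that constructs a single LSTM cell.
+A minimal sketch of such a helper, assuming a plain LSTM cell, is:
+
+```python
+def cell(num_nodes):
+  # Plain LSTM cell; a CUDA-accelerated implementation can be swapped in here.
+  return tf.nn.rnn_cell.BasicLSTMCell(num_nodes)
+```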
+
+To create a compact, fixed-length embedding, we sum up the output of the LSTMs.
+We first zero out the regions of the batch where the sequences have no data.
+
+```python
+mask = tf.tile(
+ tf.expand_dims(tf.sequence_mask(lengths, tf.shape(outputs)[1]), 2),
+ [1, 1, tf.shape(outputs)[2]])
+zero_outside = tf.where(mask, outputs, tf.zeros_like(outputs))
+outputs = tf.reduce_sum(zero_outside, axis=1)
+```
+
+### _add_fc_layers
+
+The embedding of the input is passed into a fully connected layer whose output
+we then use as the logits for the softmax classification.
+
+```python
+tf.layers.dense(final_state, params.num_classes)
+```
+
+### Loss, predictions, and optimizer
+
+Finally, we need to add a loss, a training op, and predictions to create the
+`ModelFn`:
+
+```python
+cross_entropy = tf.reduce_mean(
+ tf.nn.sparse_softmax_cross_entropy_with_logits(
+ labels=targets, logits=logits))
+# Add the optimizer.
+train_op = tf.contrib.layers.optimize_loss(
+ loss=cross_entropy,
+ global_step=tf.train.get_global_step(),
+ learning_rate=params.learning_rate,
+ optimizer="Adam",
+ # some gradient clipping stabilizes training in the beginning.
+ clip_gradients=params.gradient_clipping_norm,
+ summaries=["learning_rate", "loss", "gradients", "gradient_norm"])
+predictions = tf.argmax(logits, axis=1)
+return model_fn_lib.ModelFnOps(
+ mode=mode,
+ predictions={"logits": logits,
+ "predictions": predictions},
+ loss=cross_entropy,
+ train_op=train_op,
+ eval_metric_ops={"accuracy": tf.metrics.accuracy(targets, predictions)})
+```
+
+### Training and evaluating the model
+
+To train and evaluate the model, we can rely on the functionality of the
+`Estimator` API and easily run training and evaluation with the `Experiment`
+API:
+
+```python
+ estimator = tf.estimator.Estimator(
+ model_fn=model_fn,
+ model_dir=output_dir,
+ config=config,
+ params=model_params)
+ # Train the model.
+ tf.contrib.learn.Experiment(
+ estimator=estimator,
+ train_input_fn=get_input_fn(
+ mode=tf.contrib.learn.ModeKeys.TRAIN,
+ tfrecord_pattern=FLAGS.training_data,
+ batch_size=FLAGS.batch_size),
+ train_steps=FLAGS.steps,
+ eval_input_fn=get_input_fn(
+ mode=tf.contrib.learn.ModeKeys.EVAL,
+ tfrecord_pattern=FLAGS.eval_data,
+ batch_size=FLAGS.batch_size),
+ min_eval_frequency=1000)
+```
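+
+Note that the snippet above only constructs the `Experiment`. To actually start
+training, with evaluation interleaved every `min_eval_frequency` steps, the
+constructed object can be assigned to a variable and run, for example:
+
+```python
+# Sketch: `experiment` is the tf.contrib.learn.Experiment constructed above.
+experiment.train_and_evaluate()
+```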
+
+Note that this tutorial is just a quick example on a relatively small dataset to
+get you familiar with the APIs for recurrent neural networks and estimators. Such
+models can be even more powerful if you train them on a large dataset.
+
+When training the model for 1M steps, you can expect to get an accuracy of
+approximately 70% on the top-1 candidate. Note that this accuracy is sufficient
+to build the Quick, Draw! game: because of the game dynamics, the user will be
+able to adjust their drawing until it is recognized. Also, the game does not use
+only the top-1 candidate but accepts a drawing as correct if the target category
+shows up with a score better than a fixed threshold.
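+
+Purely as an illustration of such a threshold rule (the function and the
+threshold value below are made up, not part of the tutorial code), the
+acceptance check could look like:
+
+```python
+import numpy as np
+
+
+def is_accepted(logits, target_class, threshold=0.25):
+  """Accepts a drawing if the target class's softmax score beats a threshold."""
+  scores = np.exp(logits - np.max(logits))  # numerically stable softmax
+  scores /= scores.sum()
+  return scores[target_class] > threshold
+```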