Diffstat (limited to 'tensorflow/docs_src/tutorials/sequences/recurrent_quickdraw.md')
-rw-r--r-- | tensorflow/docs_src/tutorials/sequences/recurrent_quickdraw.md | 410
1 file changed, 0 insertions, 410 deletions
diff --git a/tensorflow/docs_src/tutorials/sequences/recurrent_quickdraw.md b/tensorflow/docs_src/tutorials/sequences/recurrent_quickdraw.md
deleted file mode 100644
index 657fab8a53..0000000000
--- a/tensorflow/docs_src/tutorials/sequences/recurrent_quickdraw.md
+++ /dev/null

# Recurrent Neural Networks for Drawing Classification

[Quick, Draw!]: http://quickdraw.withgoogle.com

[Quick, Draw!] is a game where a player is challenged to draw a number of
objects and see if a computer can recognize the drawing.

The recognition in [Quick, Draw!] is performed by a classifier that takes the
user input, given as a sequence of strokes of points in x and y, and recognizes
the object category that the user tried to draw.

In this tutorial we'll show how to build an RNN-based recognizer for this
problem. The model will use a combination of convolutional layers, LSTM layers,
and a softmax output layer to classify the drawings:

<center> ![RNN model structure](../../images/quickdraw_model.png) </center>

The figure above shows the structure of the model that we will build in this
tutorial. The input is a drawing that is encoded as a sequence of strokes of
points in x, y, and n, where n indicates whether the point is the first point
of a new stroke.

Then a series of 1-dimensional convolutions is applied, followed by LSTM
layers. The sum of the outputs of all LSTM steps is fed into a softmax layer
to make a classification decision among the classes of drawings that we know.

This tutorial uses the data from actual [Quick, Draw!] games [that is publicly
available](https://quickdraw.withgoogle.com/data). This dataset contains 50M
drawings in 345 categories.

## Run the tutorial code

To try the code for this tutorial:

1. [Install TensorFlow](../../install/index.md) if you haven't already.
1. Download the [tutorial
   code](https://github.com/tensorflow/models/tree/master/tutorials/rnn/quickdraw/train_model.py).
1. [Download the data](#download-the-data) in `TFRecord` format from
   [here](http://download.tensorflow.org/data/quickdraw_tutorial_dataset_v1.tar.gz)
   and unzip it. More details about [how to obtain the original Quick, Draw!
   data](#optional_download_the_full_quick_draw_data) and [how to convert it
   to `TFRecord` files](#optional_converting_the_data) are available below.

1. Execute the tutorial code with the following command to train the RNN-based
   model described in this tutorial. Make sure to adjust the paths to point to
   the unzipped data from the download in step 3.

```shell
python train_model.py \
  --training_data=rnn_tutorial_data/training.tfrecord-?????-of-????? \
  --eval_data=rnn_tutorial_data/eval.tfrecord-?????-of-????? \
  --classes_file=rnn_tutorial_data/training.tfrecord.classes
```

## Tutorial details

### Download the data

We make the data used in this tutorial available as `TFRecord` files
containing `TFExamples`. You can download the data from here:

<a rel="nofollow" href="http://download.tensorflow.org/data/quickdraw_tutorial_dataset_v1.tar.gz">http://download.tensorflow.org/data/quickdraw_tutorial_dataset_v1.tar.gz</a> (~1GB).

Alternatively, you can download the original data in `ndjson` format from
Google Cloud and convert it to `TFRecord` files containing `TFExamples`
yourself, as described in the next section.

### Optional: Download the full Quick Draw Data

The full [Quick, Draw!](https://quickdraw.withgoogle.com)
[dataset](https://quickdraw.withgoogle.com/data) is available on Google Cloud
Storage as [ndjson](http://ndjson.org/) files separated by category. You can
[browse the list of files in Cloud
Console](https://console.cloud.google.com/storage/quickdraw_dataset).

To download the data we recommend using
[gsutil](https://cloud.google.com/storage/docs/gsutil_install#install) to
download the entire dataset. Note that the original .ndjson files require
downloading ~22GB.
Then use the following command to check that your gsutil installation works
and that you can access the data bucket:

```shell
gsutil ls -r "gs://quickdraw_dataset/full/simplified/*"
```

which will output a long list of files like the following:

```shell
gs://quickdraw_dataset/full/simplified/The Eiffel Tower.ndjson
gs://quickdraw_dataset/full/simplified/The Great Wall of China.ndjson
gs://quickdraw_dataset/full/simplified/The Mona Lisa.ndjson
gs://quickdraw_dataset/full/simplified/aircraft carrier.ndjson
...
```

Then create a folder and download the dataset there.

```shell
mkdir rnn_tutorial_data
cd rnn_tutorial_data
gsutil -m cp "gs://quickdraw_dataset/full/simplified/*" .
```

This download will take a while and fetch a bit more than 23GB of data.

### Optional: Converting the data

To convert the `ndjson` files to
[TFRecord](../../api_guides/python/python_io.md#TFRecords_Format_Details) files containing
[`tf.train.Example`](https://www.tensorflow.org/code/tensorflow/core/example/example.proto)
protos, run the following command:

```shell
python create_dataset.py --ndjson_path rnn_tutorial_data \
  --output_path rnn_tutorial_data
```

This will store the data in 10 shards of
[TFRecord](../../api_guides/python/python_io.md#TFRecords_Format_Details) files with 10000 items
per class for the training data and 1000 items per class as eval data.

This conversion process is described in more detail below.
The original QuickDraw data is formatted as `ndjson` files where each line
contains a JSON object like the following:

```json
{"word":"cat",
 "countrycode":"VE",
 "timestamp":"2017-03-02 23:25:10.07453 UTC",
 "recognized":true,
 "key_id":"5201136883597312",
 "drawing":[
   [
     [130,113,99,109,76,64,55,48,48,51,59,86,133,154,170,203,214,217,215,208,186,176,162,157,132],
     [72,40,27,79,82,88,100,120,134,152,165,184,189,186,179,152,131,114,100,89,76,0,31,65,70]
   ],[
     [76,28,7],
     [136,128,128]
   ],[
     [76,23,0],
     [160,164,175]
   ],[
     [87,52,37],
     [175,191,204]
   ],[
     [174,220,246,251],
     [134,132,136,139]
   ],[
     [175,255],
     [147,168]
   ],[
     [171,208,215],
     [164,198,210]
   ],[
     [130,110,108,111,130,139,139,119],
     [129,134,137,144,148,144,136,130]
   ],[
     [107,106],
     [96,113]
   ]
 ]
}
```

For our purpose of building a classifier we only care about the fields
"`word`" and "`drawing`". We process the ndjson files line by line using a
function that converts the strokes from the `drawing` field into a tensor of
size `[number of points, 3]` containing the differences between consecutive
points. This function also returns the class name as a string.

```python
import json

import numpy as np


def parse_line(ndjson_line):
  """Parse an ndjson line and return ink (as np array) and classname."""
  sample = json.loads(ndjson_line)
  class_name = sample["word"]
  inkarray = sample["drawing"]
  stroke_lengths = [len(stroke[0]) for stroke in inkarray]
  total_points = sum(stroke_lengths)
  np_ink = np.zeros((total_points, 3), dtype=np.float32)
  current_t = 0
  for stroke in inkarray:
    for i in [0, 1]:
      np_ink[current_t:(current_t + len(stroke[0])), i] = stroke[i]
    current_t += len(stroke[0])
    np_ink[current_t - 1, 2] = 1  # stroke_end
  # Preprocessing.
  # 1. Size normalization.
  lower = np.min(np_ink[:, 0:2], axis=0)
  upper = np.max(np_ink[:, 0:2], axis=0)
  scale = upper - lower
  scale[scale == 0] = 1
  np_ink[:, 0:2] = (np_ink[:, 0:2] - lower) / scale
  # 2. Compute deltas in place, keeping the stroke-end bit in column 2,
  # then drop the first point (it has no predecessor).
  np_ink[1:, 0:2] -= np_ink[0:-1, 0:2]
  np_ink = np_ink[1:, :]
  return np_ink, class_name
```

Since we want the data to be shuffled for writing, we read from each of the
category files in random order and write to a random shard.

For the training data we read the first 10000 items for each class, and for
the eval data we read the next 1000 items for each class.

This data is then reformatted into a tensor of shape `[num_training_samples,
max_length, 3]`. Then we determine the bounding box of the original drawing in
screen coordinates and normalize the size such that the drawing has unit
height.

<center> ![Size normalization](../../images/quickdraw_sizenormalization.png) </center>

Finally, we compute the differences between consecutive points and store these
as a `VarLenFeature` in a
[tensorflow.Example](https://www.tensorflow.org/code/tensorflow/core/example/example.proto)
under the key `ink`. In addition we store the `class_index` as a single-entry
`FixedLenFeature` and the `shape` of the `ink` as a `FixedLenFeature` of
length 2.

### Defining the model

To define the model we create a new `Estimator`. If you want to read more about
estimators, we recommend [this tutorial](../../guide/custom_estimators.md).

To build the model, we:

1. reshape the input back into the original shape, where the mini-batch is
   padded to the maximal length of its contents. In addition to the ink data we
   also have the lengths for each example and the target class. This happens in
   the function [`_get_input_tensors`](#-get-input-tensors).

1. pass the input through a series of convolution layers in
   [`_add_conv_layers`](#-add-conv-layers).

1. pass the output of the convolutions into a series of bidirectional LSTM
   layers in [`_add_rnn_layers`](#-add-rnn-layers). At the end of that, the
   outputs for each time step are summed up to produce a compact, fixed-length
   embedding of the input.

1. classify this embedding using a softmax layer in
   [`_add_fc_layers`](#-add-fc-layers).

In code this looks like:

```python
inks, lengths, targets = _get_input_tensors(features, targets)
convolved = _add_conv_layers(inks)
final_state = _add_rnn_layers(convolved, lengths)
logits = _add_fc_layers(final_state)
```

### _get_input_tensors

To obtain the input features we first obtain the shape from the features dict
and then create a 1D tensor of size `[batch_size]` containing the lengths of
the input sequences. The ink is stored as a `SparseTensor` in the features
dict, which we convert into a dense tensor and then reshape to
`[batch_size, ?, 3]`. Finally, if targets were passed in, we make sure they
are stored as a 1D tensor of size `[batch_size]`.

In code this looks like:

```python
shapes = features["shape"]
lengths = tf.squeeze(
    tf.slice(shapes, begin=[0, 0], size=[params["batch_size"], 1]))
inks = tf.reshape(
    tf.sparse_tensor_to_dense(features["ink"]),
    [params["batch_size"], -1, 3])
if targets is not None:
  targets = tf.squeeze(targets)
```

### _add_conv_layers

The desired number of convolution layers and the lengths of the filters are
configured through the parameters `num_conv` and `conv_len` in the `params`
dict.

The input is a sequence where each point has dimensionality 3. We are going to
use 1D convolutions where we treat the 3 input features as channels. That means
that the input is a `[batch_size, length, 3]` tensor and the output will be a
`[batch_size, length, number_of_filters]` tensor.
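As a quick shape check, the stride-1, `"same"`-padded 1D convolution used here
can be sketched in plain NumPy. This is only an illustration of the tensor
shapes, not the tutorial's implementation (which uses `tf.layers.conv1d`); the
batch size, sequence length, and filter sizes below are toy values:

```python
import numpy as np

def conv1d_same(x, kernels):
    """Stride-1, 'same'-padded 1D convolution treating the last axis as channels.

    x: [batch, length, in_channels]; kernels: [width, in_channels, filters].
    Returns a [batch, length, filters] array.
    """
    width, in_ch, filters = kernels.shape
    pad = width // 2
    # Zero-pad only along the length axis so the output keeps the input length.
    xp = np.pad(x, ((0, 0), (pad, width - 1 - pad), (0, 0)))
    out = np.zeros((x.shape[0], x.shape[1], filters))
    for t in range(x.shape[1]):
        window = xp[:, t:t + width, :]  # [batch, width, in_ch]
        # Contract the width and channel axes against the kernel.
        out[:, t, :] = np.tensordot(window, kernels, axes=([1, 2], [0, 1]))
    return out

ink = np.random.randn(8, 50, 3)      # toy batch: 8 drawings, 50 points, (dx, dy, n)
kernels = np.random.randn(5, 3, 48)  # hypothetical: length-5 filters, 48 output channels
print(conv1d_same(ink, kernels).shape)  # (8, 50, 48)
```

The `[batch_size, length, 3]` input thus becomes a
`[batch_size, length, number_of_filters]` output, as stated above.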
```python
convolved = inks
for i in range(len(params.num_conv)):
  convolved_input = convolved
  if params.batch_norm:
    convolved_input = tf.layers.batch_normalization(
        convolved_input,
        training=(mode == tf.estimator.ModeKeys.TRAIN))
  # Add dropout layer if enabled and not first convolution layer.
  if i > 0 and params.dropout:
    convolved_input = tf.layers.dropout(
        convolved_input,
        rate=params.dropout,
        training=(mode == tf.estimator.ModeKeys.TRAIN))
  convolved = tf.layers.conv1d(
      convolved_input,
      filters=params.num_conv[i],
      kernel_size=params.conv_len[i],
      activation=None,
      strides=1,
      padding="same",
      name="conv1d_%d" % i)
return convolved, lengths
```

### _add_rnn_layers

We pass the output from the convolutions into bidirectional LSTM layers, for
which we use a helper function from contrib:

```python
outputs, _, _ = contrib_rnn.stack_bidirectional_dynamic_rnn(
    cells_fw=[cell(params.num_nodes) for _ in range(params.num_layers)],
    cells_bw=[cell(params.num_nodes) for _ in range(params.num_layers)],
    inputs=convolved,
    sequence_length=lengths,
    dtype=tf.float32,
    scope="rnn_classification")
```

See the code for more details, including how to use CUDA-accelerated
implementations.

To create a compact, fixed-length embedding, we sum up the outputs of the
LSTMs. We first zero out the regions of the batch where the sequences have no
data:

```python
mask = tf.tile(
    tf.expand_dims(tf.sequence_mask(lengths, tf.shape(outputs)[1]), 2),
    [1, 1, tf.shape(outputs)[2]])
zero_outside = tf.where(mask, outputs, tf.zeros_like(outputs))
outputs = tf.reduce_sum(zero_outside, axis=1)
```

### _add_fc_layers

The embedding of the input is passed into a fully connected layer, whose
output we use as the logits for the softmax.
```python
tf.layers.dense(final_state, params.num_classes)
```

### Loss, predictions, and optimizer

Finally, we need to add a loss, a training op, and predictions to create the
`ModelFn`:

```python
cross_entropy = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=targets, logits=logits))
# Add the optimizer.
train_op = tf.contrib.layers.optimize_loss(
    loss=cross_entropy,
    global_step=tf.train.get_global_step(),
    learning_rate=params.learning_rate,
    optimizer="Adam",
    # Some gradient clipping stabilizes training in the beginning.
    clip_gradients=params.gradient_clipping_norm,
    summaries=["learning_rate", "loss", "gradients", "gradient_norm"])
predictions = tf.argmax(logits, axis=1)
return model_fn_lib.ModelFnOps(
    mode=mode,
    predictions={"logits": logits,
                 "predictions": predictions},
    loss=cross_entropy,
    train_op=train_op,
    eval_metric_ops={"accuracy": tf.metrics.accuracy(targets, predictions)})
```

### Training and evaluating the model

To train and evaluate the model we can rely on the `Estimator` APIs and run
training and evaluation with the `Experiment` APIs:

```python
estimator = tf.estimator.Estimator(
    model_fn=model_fn,
    model_dir=output_dir,
    config=config,
    params=model_params)
# Train the model.
tf.contrib.learn.Experiment(
    estimator=estimator,
    train_input_fn=get_input_fn(
        mode=tf.contrib.learn.ModeKeys.TRAIN,
        tfrecord_pattern=FLAGS.training_data,
        batch_size=FLAGS.batch_size),
    train_steps=FLAGS.steps,
    eval_input_fn=get_input_fn(
        mode=tf.contrib.learn.ModeKeys.EVAL,
        tfrecord_pattern=FLAGS.eval_data,
        batch_size=FLAGS.batch_size),
    min_eval_frequency=1000)
```

Note that this tutorial is just a quick example on a relatively small dataset
to get you familiar with the APIs of recurrent neural networks and estimators.
Such models can be even more powerful if you try them on a large dataset.
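The loss defined above, `sparse_softmax_cross_entropy_with_logits` followed by
`reduce_mean`, is the batch mean of `-log(softmax(logits)[target])`. A small
NumPy sketch with toy logits (this is an illustration of the formula, not the
TF op itself):

```python
import numpy as np

def sparse_softmax_cross_entropy(labels, logits):
    """Mean of -log(softmax(logits)[label]) over the batch, computed stably."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_softmax = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick out the log-probability of each example's target class.
    return -log_softmax[np.arange(len(labels)), labels].mean()

logits = np.array([[2.0, 0.5, 0.1], [0.2, 3.0, 0.4]])  # toy values
targets = np.array([0, 1])
print(sparse_softmax_cross_entropy(targets, logits))  # ≈ 0.2217
```

With uniform logits the loss reduces to `log(num_classes)`, a useful sanity
check that a freshly initialized model is behaving as expected.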
When training the model for 1M steps you can expect to get an accuracy of
approximately 70% on the top-1 candidate. Note that this accuracy is
sufficient to build the Quick, Draw! game: because of the game dynamics, the
user will be able to adjust their drawing until it is recognized. Also, the
game does not use the top-1 candidate only, but accepts a drawing as correct
if the target category shows up with a score better than a fixed threshold.
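That acceptance rule can be sketched as follows. The softmax scores and the
0.2 threshold are hypothetical; the tutorial does not specify the game's
actual cutoff:

```python
import numpy as np

def is_accepted(logits, target_class, threshold=0.2):
    """Accept a drawing if the target category's softmax score beats a threshold."""
    shifted = logits - logits.max()          # numerical stability
    scores = np.exp(shifted) / np.exp(shifted).sum()
    return scores[target_class] > threshold

logits = np.array([1.2, 2.0, 0.3])  # toy scores for three candidate classes
print(int(np.argmax(logits)))                # 1 -- the top-1 candidate
print(bool(is_accepted(logits, target_class=0)))  # True -- accepted anyway
```

Here class 0 is not the top-1 candidate, yet its score clears the threshold,
so the drawing would still count as recognized.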