Diffstat (limited to 'tensorflow/g3doc/how_tos/reading_data/index.md')
-rw-r--r-- | tensorflow/g3doc/how_tos/reading_data/index.md | 40 |
1 files changed, 20 insertions, 20 deletions
diff --git a/tensorflow/g3doc/how_tos/reading_data/index.md b/tensorflow/g3doc/how_tos/reading_data/index.md index b37d3042e7..945af144ca 100644 --- a/tensorflow/g3doc/how_tos/reading_data/index.md +++ b/tensorflow/g3doc/how_tos/reading_data/index.md @@ -1,4 +1,4 @@ -# Reading data +# Reading data <a class="md-anchor" id="AUTOGENERATED-reading-data"></a> There are three main methods of getting data into a TensorFlow program: @@ -10,13 +10,14 @@ There are three main methods of getting data into a TensorFlow program: <!-- TOC-BEGIN This section is generated by a tool. DO NOT EDIT! --> ## Contents -* [Feeding](#AUTOGENERATED-feeding) +### [Reading data](#AUTOGENERATED-reading-data) +* [Feeding](#Feeding) * [Reading from files](#AUTOGENERATED-reading-from-files) * [Filenames, shuffling, and epoch limits](#AUTOGENERATED-filenames--shuffling--and-epoch-limits) * [File formats](#AUTOGENERATED-file-formats) * [Preprocessing](#AUTOGENERATED-preprocessing) * [Batching](#AUTOGENERATED-batching) - * [Creating threads to prefetch using `QueueRunner` objects](#AUTOGENERATED-creating-threads-to-prefetch-using--queuerunner--objects) + * [Creating threads to prefetch using `QueueRunner` objects](#QueueRunner) * [Filtering records or producing multiple examples per record](#AUTOGENERATED-filtering-records-or-producing-multiple-examples-per-record) * [Sparse input data](#AUTOGENERATED-sparse-input-data) * [Preloaded data](#AUTOGENERATED-preloaded-data) @@ -25,7 +26,7 @@ There are three main methods of getting data into a TensorFlow program: <!-- TOC-END This section was generated by a tool. --> -## Feeding <div class="md-anchor" id="AUTOGENERATED-feeding">{#AUTOGENERATED-feeding}</div> +## Feeding <a class="md-anchor" id="Feeding"></a> TensorFlow's feed mechanism lets you inject data into any Tensor in a computation graph.
A python computation can thus feed data directly into the @@ -53,7 +54,7 @@ in [tensorflow/g3doc/tutorials/mnist/fully_connected_feed.py](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/g3doc/tutorials/mnist/fully_connected_feed.py), and is described in the [MNIST tutorial](../../tutorials/mnist/tf/index.md). -## Reading from files <div class="md-anchor" id="AUTOGENERATED-reading-from-files">{#AUTOGENERATED-reading-from-files}</div> +## Reading from files <a class="md-anchor" id="AUTOGENERATED-reading-from-files"></a> A typical pipeline for reading records from files has the following stages: @@ -66,7 +67,7 @@ A typical pipeline for reading records from files has the following stages: 7. *Optional* preprocessing 8. Example queue -### Filenames, shuffling, and epoch limits <div class="md-anchor" id="AUTOGENERATED-filenames--shuffling--and-epoch-limits">{#AUTOGENERATED-filenames--shuffling--and-epoch-limits}</div> +### Filenames, shuffling, and epoch limits <a class="md-anchor" id="AUTOGENERATED-filenames--shuffling--and-epoch-limits"></a> For the list of filenames, use either a constant string Tensor (like `["file0", "file1"]` or `[("file%d" % i) for i in range(2)]`) or the @@ -88,7 +89,7 @@ The queue runner works in a thread separate from the reader that pulls filenames from the queue, so the shuffling and enqueuing process does not block the reader. -### File formats <div class="md-anchor" id="AUTOGENERATED-file-formats">{#AUTOGENERATED-file-formats}</div> +### File formats <a class="md-anchor" id="AUTOGENERATED-file-formats"></a> Select the reader that matches your input file format and pass the filename queue to the reader's read method. The read method outputs a key identifying @@ -96,7 +97,7 @@ the file and record (useful for debugging if you have some weird records), and a scalar string value. Use one (or more) of the decoder and conversion ops to decode this string into the tensors that make up an example. 
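The per-epoch shuffle and epoch limit described in the section above can be mimicked with Python's standard library. This is a minimal sketch of the behaviour, not TensorFlow's implementation: the name `filename_producer` is invented here, and the real `tf.train.string_input_producer` fills an actual FIFO queue from a background `QueueRunner` thread instead of yielding directly.

```python
import random

def filename_producer(filenames, num_epochs, shuffle=True, seed=None):
    """Yield filenames for num_epochs epochs, reshuffling each epoch.

    Stdlib sketch of the semantics of tf.train.string_input_producer:
    each epoch emits every filename exactly once, in a fresh order.
    """
    rng = random.Random(seed)
    for _ in range(num_epochs):
        order = list(filenames)
        if shuffle:
            rng.shuffle(order)  # uniform reshuffle once per epoch
        for name in order:
            yield name

names = list(filename_producer(["file0", "file1"], num_epochs=2, seed=0))
# each of the 2 epochs contains both filenames exactly once
```

Because the shuffle happens within an epoch, every file is still visited the same number of times overall, which is what makes the epoch limit meaningful.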
-#### CSV files +#### CSV files <a class="md-anchor" id="AUTOGENERATED-csv-files"></a> To read text files in [comma-separated value (CSV) format](https://tools.ietf.org/html/rfc4180), use a @@ -138,7 +139,7 @@ You must call `tf.train.start_queue_runners()` to populate the queue before you call `run()` or `eval()` to execute the `read()`. Otherwise `read()` will block while it waits for filenames from the queue. -#### Fixed length records +#### Fixed length records <a class="md-anchor" id="AUTOGENERATED-fixed-length-records"></a> To read binary files in which each record is a fixed number of bytes, use [tf.FixedLengthRecordReader](../../api_docs/python/io_ops.md#FixedLengthRecordReader) @@ -154,7 +155,7 @@ needed. For CIFAR-10, you can see how to do the reading and decoding in and described in [this tutorial](../../tutorials/deep_cnn/index.md#prepare-the-data). -#### Standard TensorFlow format +#### Standard TensorFlow format <a class="md-anchor" id="AUTOGENERATED-standard-tensorflow-format"></a> Another approach is to convert whatever data you have into a supported format. This approach makes it easier to mix and match data sets and network @@ -180,7 +181,7 @@ found in [tensorflow/g3doc/how_tos/reading_data/fully_connected_reader.py](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/g3doc/how_tos/reading_data/fully_connected_reader.py), which you can compare with the `fully_connected_feed` version. -### Preprocessing <div class="md-anchor" id="AUTOGENERATED-preprocessing">{#AUTOGENERATED-preprocessing}</div> +### Preprocessing <a class="md-anchor" id="AUTOGENERATED-preprocessing"></a> You can then do any preprocessing of these examples you want. This would be any processing that doesn't depend on trainable parameters. Examples include @@ -189,7 +190,7 @@ etc. See [tensorflow/models/image/cifar10/cifar10.py](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/models/image/cifar10/cifar10.py) for an example. 
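The `record_defaults` behaviour that `tf.decode_csv` applies in the CSV section above can be sketched with the standard `csv` module. This is an illustrative stand-in, not the TensorFlow op: each column's default value supplies both the fallback for empty fields and the type the field is converted to.

```python
import csv
import io

def decode_csv(line, record_defaults):
    """Parse one CSV line; empty fields take the column's default,
    and each column is coerced to the type of its default.

    Stdlib sketch of tf.decode_csv's record_defaults semantics.
    """
    (row,) = csv.reader(io.StringIO(line))
    out = []
    for field, default in zip(row, record_defaults):
        out.append(type(default)(field) if field != "" else default)
    return out

decode_csv("1.2,,3", record_defaults=[0.0, 0.0, 0])
# -> [1.2, 0.0, 3]  (empty middle column replaced by its float default)
```

The real op additionally produces one output tensor per column across a whole batch of lines, but the per-field default-and-type rule is the same idea.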
-### Batching <div class="md-anchor" id="AUTOGENERATED-batching">{#AUTOGENERATED-batching}</div> +### Batching <a class="md-anchor" id="AUTOGENERATED-batching"></a> At the end of the pipeline we use another queue to batch together examples for training, evaluation, or inference. For this we use a queue that randomizes the @@ -267,8 +268,7 @@ summary to the graph that indicates how full the example queue is. If you have enough reading threads, that summary will stay above zero. You can [view your summaries as training progresses using TensorBoard](../summaries_and_tensorboard/index.md). -<a name="QueueRunner"></a> -### Creating threads to prefetch using `QueueRunner` objects <div class="md-anchor" id="AUTOGENERATED-creating-threads-to-prefetch-using--queuerunner--objects">{#AUTOGENERATED-creating-threads-to-prefetch-using--queuerunner--objects}</div> +### Creating threads to prefetch using `QueueRunner` objects <a class="md-anchor" id="QueueRunner"></a> The short version: many of the `tf.train` functions listed above add [`QueueRunner`](../../api_docs/python/train.md#QueueRunner) objects to your @@ -312,7 +312,7 @@ coord.join(threads) sess.close() ``` -#### Aside: What is happening here? +#### Aside: What is happening here? <a class="md-anchor" id="AUTOGENERATED-aside--what-is-happening-here-"></a> First we create the graph. It will have a few pipeline stages that are connected by queues. The first stage will generate filenames to read and enqueue @@ -357,7 +357,7 @@ exception). For more about threading, queues, QueueRunners, and Coordinators [see here](../threading_and_queues/index.md). -#### Aside: How clean shut-down when limiting epochs works +#### Aside: How clean shut-down when limiting epochs works <a class="md-anchor" id="AUTOGENERATED-aside--how-clean-shut-down-when-limiting-epochs-works"></a> Imagine you have a model that has set a limit on the number of epochs to train on. 
That means that the thread generating filenames will only run that many @@ -400,7 +400,7 @@ errors and exiting. Once all the training threads are done, [tf.train.Coordinator.join()](../../api_docs/python/train.md#Coordinator.join) will return and you can exit cleanly. -### Filtering records or producing multiple examples per record <div class="md-anchor" id="AUTOGENERATED-filtering-records-or-producing-multiple-examples-per-record">{#AUTOGENERATED-filtering-records-or-producing-multiple-examples-per-record}</div> +### Filtering records or producing multiple examples per record <a class="md-anchor" id="AUTOGENERATED-filtering-records-or-producing-multiple-examples-per-record"></a> Instead of examples with shapes `[x, y, z]`, you will produce a batch of examples with shape `[batch, x, y, z]`. The batch size can be 0 if you want to @@ -409,14 +409,14 @@ are producing multiple examples per record. Then simply set `enqueue_many=True` when calling one of the batching functions (such as `shuffle_batch` or `shuffle_batch_join`). -### Sparse input data <div class="md-anchor" id="AUTOGENERATED-sparse-input-data">{#AUTOGENERATED-sparse-input-data}</div> +### Sparse input data <a class="md-anchor" id="AUTOGENERATED-sparse-input-data"></a> SparseTensors don't play well with queues. If you use SparseTensors you have to decode the string records using [tf.parse_example](../../api_docs/python/io_ops.md#parse_example) **after** batching (instead of using `tf.parse_single_example` before batching). -## Preloaded data <div class="md-anchor" id="AUTOGENERATED-preloaded-data">{#AUTOGENERATED-preloaded-data}</div> +## Preloaded data <a class="md-anchor" id="AUTOGENERATED-preloaded-data"></a> This is only used for small data sets that can be loaded entirely in memory. There are two approaches: @@ -475,7 +475,7 @@ An MNIST example that preloads the data using constants can be found in You can compare these with the `fully_connected_feed` and `fully_connected_reader` versions above. 
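The clean shut-down sequence in the aside above — the filename thread stops after the last epoch, the queue reports "out of range", and the coordinator joins the remaining threads — can be sketched with Python's `queue` and `threading` modules. This is an analogue, not TensorFlow's mechanism: the `STOP` sentinel stands in for the `OutOfRangeError` a closed TensorFlow queue raises, and `Thread.join` stands in for `tf.train.Coordinator.join()`.

```python
import threading
from queue import Queue

STOP = object()  # stand-in for the OutOfRangeError from a closed queue

def producer(filenames, num_epochs, q):
    """Enqueue every filename num_epochs times, then signal shutdown."""
    for _ in range(num_epochs):
        for name in filenames:
            q.put(name)
    q.put(STOP)  # analogue of closing the queue: no more data is coming

consumed = []

def consumer(q):
    while True:
        item = q.get()
        if item is STOP:
            q.put(STOP)  # re-enqueue so sibling consumers also see it
            break
        consumed.append(item)

q = Queue()
threads = [threading.Thread(target=producer, args=(["file0", "file1"], 2, q)),
           threading.Thread(target=consumer, args=(q,))]
for t in threads:
    t.start()
for t in threads:
    t.join()  # analogue of tf.train.Coordinator.join(): wait for clean exit
# consumed now holds 2 epochs * 2 filenames = 4 items
```

The key property is the same as in the document: shutdown is data-driven (the end-of-input signal flows through the queue), so consumers drain everything that was enqueued before exiting rather than being killed mid-batch.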
-## Multiple input pipelines <div class="md-anchor" id="AUTOGENERATED-multiple-input-pipelines">{#AUTOGENERATED-multiple-input-pipelines}</div> +## Multiple input pipelines <a class="md-anchor" id="AUTOGENERATED-multiple-input-pipelines"></a> Commonly you will want to train on one dataset and evaluate (or "eval") on another. One way to do this is to actually have two separate processes: