Diffstat (limited to 'tensorflow/g3doc/how_tos/reading_data/index.md')
-rw-r--r--  tensorflow/g3doc/how_tos/reading_data/index.md  |  40
1 file changed, 20 insertions(+), 20 deletions(-)
diff --git a/tensorflow/g3doc/how_tos/reading_data/index.md b/tensorflow/g3doc/how_tos/reading_data/index.md
index b37d3042e7..945af144ca 100644
--- a/tensorflow/g3doc/how_tos/reading_data/index.md
+++ b/tensorflow/g3doc/how_tos/reading_data/index.md
@@ -1,4 +1,4 @@
-# Reading data
+# Reading data <a class="md-anchor" id="AUTOGENERATED-reading-data"></a>
There are three main methods of getting data into a TensorFlow program:
@@ -10,13 +10,14 @@ There are three main methods of getting data into a TensorFlow program:
<!-- TOC-BEGIN This section is generated by neural network: DO NOT EDIT! -->
## Contents
-* [Feeding](#AUTOGENERATED-feeding)
+### [Reading data](#AUTOGENERATED-reading-data)
+* [Feeding](#Feeding)
* [Reading from files](#AUTOGENERATED-reading-from-files)
* [Filenames, shuffling, and epoch limits](#AUTOGENERATED-filenames--shuffling--and-epoch-limits)
* [File formats](#AUTOGENERATED-file-formats)
* [Preprocessing](#AUTOGENERATED-preprocessing)
* [Batching](#AUTOGENERATED-batching)
- * [Creating threads to prefetch using `QueueRunner` objects](#AUTOGENERATED-creating-threads-to-prefetch-using--queuerunner--objects)
+ * [Creating threads to prefetch using `QueueRunner` objects](#QueueRunner)
* [Filtering records or producing multiple examples per record](#AUTOGENERATED-filtering-records-or-producing-multiple-examples-per-record)
* [Sparse input data](#AUTOGENERATED-sparse-input-data)
* [Preloaded data](#AUTOGENERATED-preloaded-data)
@@ -25,7 +26,7 @@ There are three main methods of getting data into a TensorFlow program:
<!-- TOC-END This section was generated by neural network, THANKS FOR READING! -->
-## Feeding <div class="md-anchor" id="AUTOGENERATED-feeding">{#AUTOGENERATED-feeding}</div>
+## Feeding <a class="md-anchor" id="Feeding"></a>
TensorFlow's feed mechanism lets you inject data into any Tensor in a
computation graph. A Python computation can thus feed data directly into the
@@ -53,7 +54,7 @@ in
[tensorflow/g3doc/tutorials/mnist/fully_connected_feed.py](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/g3doc/tutorials/mnist/fully_connected_feed.py),
and is described in the [MNIST tutorial](../../tutorials/mnist/tf/index.md).
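
For example, a minimal feeding sketch (the placeholder name, shape, and data here are illustrative, not taken from the tutorial):

```python
import tensorflow as tf

# A placeholder must be fed whenever the graph is run; feed_dict
# substitutes concrete data for it at run time.
x = tf.placeholder(tf.float32, shape=[None, 2])
total = tf.reduce_sum(x)

with tf.Session() as sess:
  print(sess.run(total, feed_dict={x: [[1.0, 2.0], [3.0, 4.0]]}))
```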
-## Reading from files <div class="md-anchor" id="AUTOGENERATED-reading-from-files">{#AUTOGENERATED-reading-from-files}</div>
+## Reading from files <a class="md-anchor" id="AUTOGENERATED-reading-from-files"></a>
A typical pipeline for reading records from files has the following stages:
@@ -66,7 +67,7 @@ A typical pipeline for reading records from files has the following stages:
7. *Optional* preprocessing
8. Example queue
-### Filenames, shuffling, and epoch limits <div class="md-anchor" id="AUTOGENERATED-filenames--shuffling--and-epoch-limits">{#AUTOGENERATED-filenames--shuffling--and-epoch-limits}</div>
+### Filenames, shuffling, and epoch limits <a class="md-anchor" id="AUTOGENERATED-filenames--shuffling--and-epoch-limits"></a>
For the list of filenames, use either a constant string Tensor (like
`["file0", "file1"]` or `[("file%d" % i) for i in range(2)]`) or the
@@ -88,7 +89,7 @@ The queue runner works in a thread separate from the reader that pulls
filenames from the queue, so the shuffling and enqueuing process does not
block the reader.
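
A minimal sketch of this filename stage (the filenames are hypothetical):

```python
import tensorflow as tf

# shuffle=True reshuffles the list each epoch; setting num_epochs
# makes the queue eventually raise OutOfRangeError instead of
# cycling forever.
filenames = [("file%d" % i) for i in range(2)]
filename_queue = tf.train.string_input_producer(
    filenames, num_epochs=None, shuffle=True)
```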
-### File formats <div class="md-anchor" id="AUTOGENERATED-file-formats">{#AUTOGENERATED-file-formats}</div>
+### File formats <a class="md-anchor" id="AUTOGENERATED-file-formats"></a>
Select the reader that matches your input file format and pass the filename
queue to the reader's read method. The read method outputs a key identifying
@@ -96,7 +97,7 @@ the file and record (useful for debugging if you have some weird records), and
a scalar string value. Use one (or more) of the decoder and conversion ops to
decode this string into the tensors that make up an example.
-#### CSV files
+#### CSV files <a class="md-anchor" id="AUTOGENERATED-csv-files"></a>
To read text files in [comma-separated value (CSV)
format](https://tools.ietf.org/html/rfc4180), use a
@@ -138,7 +139,7 @@ You must call `tf.train.start_queue_runners()` to populate the queue before
you call `run()` or `eval()` to execute the `read()`. Otherwise `read()` will
block while it waits for filenames from the queue.
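
A sketch of a CSV pipeline along these lines, assuming a four-feature, one-label column layout (the filenames and layout are illustrative; `tf.pack` was renamed `tf.stack` in later versions):

```python
import tensorflow as tf

filename_queue = tf.train.string_input_producer(["file0.csv", "file1.csv"])

reader = tf.TextLineReader()
key, value = reader.read(filename_queue)

# record_defaults fixes the type of each column and supplies a
# fallback for missing values.
record_defaults = [[1.0], [1.0], [1.0], [1.0], [0]]
col1, col2, col3, col4, label = tf.decode_csv(
    value, record_defaults=record_defaults)
features = tf.pack([col1, col2, col3, col4])

with tf.Session() as sess:
  # Start the queue runners, or read() blocks forever.
  coord = tf.train.Coordinator()
  threads = tf.train.start_queue_runners(sess=sess, coord=coord)
  for _ in range(10):
    example, l = sess.run([features, label])
  coord.request_stop()
  coord.join(threads)
```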
-#### Fixed length records
+#### Fixed length records <a class="md-anchor" id="AUTOGENERATED-fixed-length-records"></a>
To read binary files in which each record is a fixed number of bytes, use
[tf.FixedLengthRecordReader](../../api_docs/python/io_ops.md#FixedLengthRecordReader)
@@ -154,7 +155,7 @@ needed. For CIFAR-10, you can see how to do the reading and decoding in
and described in
[this tutorial](../../tutorials/deep_cnn/index.md#prepare-the-data).
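
A sketch assuming a CIFAR-10-style record layout (one label byte followed by a depth-major 32x32x3 image):

```python
import tensorflow as tf

label_bytes = 1
image_bytes = 32 * 32 * 3

filename_queue = tf.train.string_input_producer(["data_batch_1.bin"])
reader = tf.FixedLengthRecordReader(record_bytes=label_bytes + image_bytes)
key, value = reader.read(filename_queue)

# decode_raw turns the string record into a uint8 vector, which is
# then sliced into the label and the image.
record = tf.decode_raw(value, tf.uint8)
label = tf.slice(record, [0], [label_bytes])
image = tf.reshape(tf.slice(record, [label_bytes], [image_bytes]),
                   [3, 32, 32])
```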
-#### Standard TensorFlow format
+#### Standard TensorFlow format <a class="md-anchor" id="AUTOGENERATED-standard-tensorflow-format"></a>
Another approach is to convert whatever data you have into a supported format.
This approach makes it easier to mix and match data sets and network
@@ -180,7 +181,7 @@ found in
[tensorflow/g3doc/how_tos/reading_data/fully_connected_reader.py](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/g3doc/how_tos/reading_data/fully_connected_reader.py),
which you can compare with the `fully_connected_feed` version.
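
A sketch of reading `Example` protos, assuming records written with `image_raw` and `label` features; note the `features=` form of `tf.parse_single_example` shown here has varied across API versions:

```python
import tensorflow as tf

filename_queue = tf.train.string_input_producer(["train.tfrecords"])
reader = tf.TFRecordReader()
_, serialized_example = reader.read(filename_queue)

# The feature names and types are assumptions about how the
# Example protos were written.
features = tf.parse_single_example(
    serialized_example,
    features={
        "image_raw": tf.FixedLenFeature([], tf.string),
        "label": tf.FixedLenFeature([], tf.int64),
    })
image = tf.decode_raw(features["image_raw"], tf.uint8)
label = tf.cast(features["label"], tf.int32)
```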
-### Preprocessing <div class="md-anchor" id="AUTOGENERATED-preprocessing">{#AUTOGENERATED-preprocessing}</div>
+### Preprocessing <a class="md-anchor" id="AUTOGENERATED-preprocessing"></a>
You can then do any preprocessing of these examples you want. This would be any
processing that doesn't depend on trainable parameters. Examples include
@@ -189,7 +190,7 @@ etc. See
[tensorflow/models/image/cifar10/cifar10.py](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/models/image/cifar10/cifar10.py)
for an example.
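
For instance, a parameter-free normalization step might look like this (toy shape; the placeholder stands in for a decoded `image` tensor from one of the readers above):

```python
import tensorflow as tf

# Cast decoded uint8 pixels to float and scale them to [-1, 1].
image = tf.placeholder(tf.uint8, shape=[32, 32, 3])
float_image = (tf.cast(image, tf.float32) - 128.0) / 128.0
```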
-### Batching <div class="md-anchor" id="AUTOGENERATED-batching">{#AUTOGENERATED-batching}</div>
+### Batching <a class="md-anchor" id="AUTOGENERATED-batching"></a>
At the end of the pipeline we use another queue to batch together examples for
training, evaluation, or inference. For this we use a queue that randomizes the
@@ -267,8 +268,7 @@ summary to the graph that indicates how full the example queue is. If you have
enough reading threads, that summary will stay above zero. You can
[view your summaries as training progresses using TensorBoard](../summaries_and_tensorboard/index.md).
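
A sketch of such a batching stage, reusing the CSV reader from above (filenames and column layout are illustrative; giving `capacity` a few batches of headroom beyond `min_after_dequeue` is one common rule of thumb):

```python
import tensorflow as tf

def read_my_file_format(filename_queue):
  # Stand-in for any of the readers shown above.
  reader = tf.TextLineReader()
  key, value = reader.read(filename_queue)
  feature, label = tf.decode_csv(value, record_defaults=[[0.0], [0]])
  return feature, label

filename_queue = tf.train.string_input_producer(["file0.csv", "file1.csv"])
example, label = read_my_file_format(filename_queue)

# min_after_dequeue sets how many examples stay buffered for
# shuffling; larger values shuffle better but start up slower.
example_batch, label_batch = tf.train.shuffle_batch(
    [example, label], batch_size=128,
    capacity=10000 + 3 * 128, min_after_dequeue=10000)
```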
-<a name="QueueRunner"></a>
-### Creating threads to prefetch using `QueueRunner` objects <div class="md-anchor" id="AUTOGENERATED-creating-threads-to-prefetch-using--queuerunner--objects">{#AUTOGENERATED-creating-threads-to-prefetch-using--queuerunner--objects}</div>
+### Creating threads to prefetch using `QueueRunner` objects <a class="md-anchor" id="QueueRunner"></a>
The short version: many of the `tf.train` functions listed above add
[`QueueRunner`](../../api_docs/python/train.md#QueueRunner) objects to your
@@ -312,7 +312,7 @@ coord.join(threads)
sess.close()
```
-#### Aside: What is happening here?
+#### Aside: What is happening here? <a class="md-anchor" id="AUTOGENERATED-aside--what-is-happening-here-"></a>
First we create the graph. It will have a few pipeline stages that are
connected by queues. The first stage will generate filenames to read and enqueue
@@ -357,7 +357,7 @@ exception).
For more about threading, queues, QueueRunners, and Coordinators
[see here](../threading_and_queues/index.md).
-#### Aside: How clean shut-down when limiting epochs works
+#### Aside: How clean shut-down when limiting epochs works <a class="md-anchor" id="AUTOGENERATED-aside--how-clean-shut-down-when-limiting-epochs-works"></a>
Imagine you have a model that has set a limit on the number of epochs to train
on. That means that the thread generating filenames will only run that many
@@ -400,7 +400,7 @@ errors and exiting. Once all the training threads are done,
[tf.train.Coordinator.join()](../../api_docs/python/train.md#Coordinator.join)
will return and you can exit cleanly.
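
Putting the pieces together, a runnable skeleton of this shut-down behavior (a one-epoch filename queue stands in for a real pipeline; at the time of this doc `tf.initialize_all_variables()` covered the epoch counter, though later versions split out local-variable initialization):

```python
import tensorflow as tf

# With num_epochs set, the queue closes after one pass over the
# (hypothetical) files and read() raises OutOfRangeError.
filename_queue = tf.train.string_input_producer(
    ["file0.csv", "file1.csv"], num_epochs=1)
reader = tf.TextLineReader()
_, line = reader.read(filename_queue)

init_op = tf.initialize_all_variables()
with tf.Session() as sess:
  sess.run(init_op)
  coord = tf.train.Coordinator()
  threads = tf.train.start_queue_runners(sess=sess, coord=coord)
  try:
    while not coord.should_stop():
      sess.run(line)  # stand-in for sess.run(train_op)
  except tf.errors.OutOfRangeError:
    print("Done -- epoch limit reached")
  finally:
    coord.request_stop()
  coord.join(threads)
```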
-### Filtering records or producing multiple examples per record <div class="md-anchor" id="AUTOGENERATED-filtering-records-or-producing-multiple-examples-per-record">{#AUTOGENERATED-filtering-records-or-producing-multiple-examples-per-record}</div>
+### Filtering records or producing multiple examples per record <a class="md-anchor" id="AUTOGENERATED-filtering-records-or-producing-multiple-examples-per-record"></a>
Instead of examples with shapes `[x, y, z]`, you will produce a batch of
examples with shape `[batch, x, y, z]`. The batch size can be 0 if you want to
@@ -409,14 +409,14 @@ are producing multiple examples per record. Then simply set `enqueue_many=True`
when calling one of the batching functions (such as `shuffle_batch` or
`shuffle_batch_join`).
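
A sketch of `enqueue_many=True`, under the assumption that each 8-byte record holds two 4-byte examples:

```python
import tensorflow as tf

# Decoding one record yields shape [2, 4]: a leading per-record
# batch dimension over two examples.
filename_queue = tf.train.string_input_producer(["data.bin"])
reader = tf.FixedLengthRecordReader(record_bytes=8)
_, value = reader.read(filename_queue)
examples = tf.reshape(tf.decode_raw(value, tf.uint8), [2, 4])

# enqueue_many=True tells batch() that dimension 0 indexes separate
# examples rather than being part of a single example.
batch = tf.train.batch([examples], batch_size=32, enqueue_many=True)
```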
-### Sparse input data <div class="md-anchor" id="AUTOGENERATED-sparse-input-data">{#AUTOGENERATED-sparse-input-data}</div>
+### Sparse input data <a class="md-anchor" id="AUTOGENERATED-sparse-input-data"></a>
SparseTensors don't play well with queues. If you use SparseTensors you have
to decode the string records using
[tf.parse_example](../../api_docs/python/io_ops.md#parse_example) **after**
batching (instead of using `tf.parse_single_example` before batching).
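
A sketch of this ordering, with an assumed feature spec (the `features=` form of `tf.parse_example` may differ in older API versions):

```python
import tensorflow as tf

filename_queue = tf.train.string_input_producer(["train.tfrecords"])
reader = tf.TFRecordReader()
_, serialized = reader.read(filename_queue)

# Batch the still-serialized string records first; SparseTensors
# themselves cannot pass through the batching queues.
serialized_batch = tf.train.batch([serialized], batch_size=32)

# VarLenFeature produces a single SparseTensor spanning the batch.
parsed = tf.parse_example(
    serialized_batch,
    features={"indices": tf.VarLenFeature(tf.int64)})
sparse_indices = parsed["indices"]
```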
-## Preloaded data <div class="md-anchor" id="AUTOGENERATED-preloaded-data">{#AUTOGENERATED-preloaded-data}</div>
+## Preloaded data <a class="md-anchor" id="AUTOGENERATED-preloaded-data"></a>
This is only used for small data sets that can be loaded entirely in memory.
There are two approaches:
@@ -475,7 +475,7 @@ An MNIST example that preloads the data using constants can be found in
You can compare these with the `fully_connected_feed` and
`fully_connected_reader` versions above.
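
A sketch of the constant approach with toy data:

```python
import tensorflow as tf

# The data set is baked into the graph as constants -- simple, but
# the data is stored (and serialized) as part of the graph itself.
training_data = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
training_labels = [0, 1, 0]

input_data = tf.constant(training_data)
input_labels = tf.constant(training_labels)

with tf.Session() as sess:
  print(sess.run([input_data, input_labels]))
```

The variable-based alternative instead initializes a non-trainable `tf.Variable` from a placeholder, which avoids embedding the data in the graph definition.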
-## Multiple input pipelines <div class="md-anchor" id="AUTOGENERATED-multiple-input-pipelines">{#AUTOGENERATED-multiple-input-pipelines}</div>
+## Multiple input pipelines <a class="md-anchor" id="AUTOGENERATED-multiple-input-pipelines"></a>
Commonly you will want to train on one dataset and evaluate (or "eval") on
another. One way to do this is to actually have two separate processes: