path: root/tensorflow/g3doc/how_tos/reading_data/index.md
author    Vijay Vasudevan <vrv@google.com>    2015-11-18 10:47:35 -0800
committer Vijay Vasudevan <vrv@google.com>    2015-11-18 10:47:35 -0800
commit    ab34d55ce7618e52069a2e1c9e51aac5a1ea81c3 (patch)
tree      9c79427b45ff6501e8374ceb7b4fc3bdb2828e15 /tensorflow/g3doc/how_tos/reading_data/index.md
parent    9eb88d56ab6a9a361662d73a258593d8fbf10b62 (diff)
TensorFlow: more features, performance improvements, and doc fixes.
Changes:
- Add Split/Concat() methods to TensorUtil (meant for convenience, not speed) by Chris.
- Changes to linear algebra ops interface by Rasmus
- Tests for tensorboard by Daniel
- Fix bug in histogram calculation by Cassandra
- Added tool for backwards compatibility of OpDefs. Tool checks in history of opdefs and their changes, checks for backwards-incompatible changes. All done by @josh11b
- Fix some protobuf example proto docs by Oliver
- Add derivative of MatrixDeterminant by @yaroslavvb
- Add a priority queue queue by @ebrevdo
- Doc and typo fixes by Aurelien and @dave-andersen
- Speed improvements to ConvBackwardFilter by @andydavis
- Improve speed of Alexnet on TitanX by @zheng-xq
- Add some host memory annotations to some GPU kernels by Yuan.
- Add support for doubles in histogram summary by @jmchen-g

Base CL: 108158338
Diffstat (limited to 'tensorflow/g3doc/how_tos/reading_data/index.md')
-rw-r--r--  tensorflow/g3doc/how_tos/reading_data/index.md  52
1 file changed, 18 insertions(+), 34 deletions(-)
diff --git a/tensorflow/g3doc/how_tos/reading_data/index.md b/tensorflow/g3doc/how_tos/reading_data/index.md
index 64209b8bd0..089ee4e34d 100644
--- a/tensorflow/g3doc/how_tos/reading_data/index.md
+++ b/tensorflow/g3doc/how_tos/reading_data/index.md
@@ -1,4 +1,4 @@
-# Reading data <a class="md-anchor" id="AUTOGENERATED-reading-data"></a>
+# Reading data
There are three main methods of getting data into a TensorFlow program:
@@ -8,25 +8,9 @@ There are three main methods of getting data into a TensorFlow program:
* Preloaded data: a constant or variable in the TensorFlow graph holds
all the data (for small data sets).
-<!-- TOC-BEGIN This section is generated by neural network: DO NOT EDIT! -->
-## Contents
-### [Reading data](#AUTOGENERATED-reading-data)
-* [Feeding](#Feeding)
-* [Reading from files](#AUTOGENERATED-reading-from-files)
- * [Filenames, shuffling, and epoch limits](#AUTOGENERATED-filenames--shuffling--and-epoch-limits)
- * [File formats](#AUTOGENERATED-file-formats)
- * [Preprocessing](#AUTOGENERATED-preprocessing)
- * [Batching](#AUTOGENERATED-batching)
- * [Creating threads to prefetch using `QueueRunner` objects](#QueueRunner)
- * [Filtering records or producing multiple examples per record](#AUTOGENERATED-filtering-records-or-producing-multiple-examples-per-record)
- * [Sparse input data](#AUTOGENERATED-sparse-input-data)
-* [Preloaded data](#AUTOGENERATED-preloaded-data)
-* [Multiple input pipelines](#AUTOGENERATED-multiple-input-pipelines)
+[TOC]
-
-<!-- TOC-END This section was generated by neural network, THANKS FOR READING! -->
-
-## Feeding <a class="md-anchor" id="Feeding"></a>
+## Feeding {#Feeding}
TensorFlow's feed mechanism lets you inject data into any Tensor in a
computation graph. A python computation can thus feed data directly into the
@@ -54,7 +38,7 @@ in
[`tensorflow/g3doc/tutorials/mnist/fully_connected_feed.py`](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/g3doc/tutorials/mnist/fully_connected_feed.py),
and is described in the [MNIST tutorial](../../tutorials/mnist/tf/index.md).
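As a concrete illustration of the feeding pattern this section describes, here is a minimal sketch using `tf.placeholder` and `Session.run`; the shapes and data are stand-ins, not taken from the tutorial code.

```python
import numpy as np
import tensorflow as tf

# A placeholder only receives its value when run() is called.
x = tf.placeholder(tf.float32, shape=[None, 784])   # hypothetical flattened images
y = tf.reduce_mean(x)

with tf.Session() as sess:
    batch = np.zeros((32, 784), dtype=np.float32)   # stand-in for real input data
    # feed_dict injects the numpy array into the placeholder for this one step.
    print(sess.run(y, feed_dict={x: batch}))
```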
-## Reading from files <a class="md-anchor" id="AUTOGENERATED-reading-from-files"></a>
+## Reading from files
A typical pipeline for reading records from files has the following stages:
@@ -67,7 +51,7 @@ A typical pipeline for reading records from files has the following stages:
7. *Optional* preprocessing
8. Example queue
-### Filenames, shuffling, and epoch limits <a class="md-anchor" id="AUTOGENERATED-filenames--shuffling--and-epoch-limits"></a>
+### Filenames, shuffling, and epoch limits
For the list of filenames, use either a constant string Tensor (like
`["file0", "file1"]` or `[("file%d" % i) for i in range(2)]`) or the
@@ -89,7 +73,7 @@ The queue runner works in a thread separate from the reader that pulls
filenames from the queue, so the shuffling and enqueuing process does not
block the reader.
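For concreteness, a minimal sketch of the filename-queue stage described here, assuming `tf.train.string_input_producer`; the file names and epoch count are placeholders.

```python
import tensorflow as tf

filenames = ["file%d.csv" % i for i in range(2)]    # placeholder file names

# Builds a queue of filenames, reshuffled each epoch, with an optional epoch
# limit; the QueueRunner that keeps it filled is registered on the graph.
# (When num_epochs is set, remember to initialize the counter variable it creates.)
filename_queue = tf.train.string_input_producer(
    filenames, num_epochs=5, shuffle=True)
```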
-### File formats <a class="md-anchor" id="AUTOGENERATED-file-formats"></a>
+### File formats
Select the reader that matches your input file format and pass the filename
queue to the reader's read method. The read method outputs a key identifying
@@ -97,7 +81,7 @@ the file and record (useful for debugging if you have some weird records), and
a scalar string value. Use one (or more) of the decoder and conversion ops to
decode this string into the tensors that make up an example.
-#### CSV files <a class="md-anchor" id="AUTOGENERATED-csv-files"></a>
+#### CSV files
To read text files in [comma-separated value (CSV)
format](https://tools.ietf.org/html/rfc4180), use a
@@ -139,7 +123,7 @@ You must call `tf.train.start_queue_runners` to populate the queue before
you call `run` or `eval` to execute the `read`. Otherwise `read` will
block while it waits for filenames from the queue.
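A hedged sketch of the CSV pipeline this section outlines; the column layout and defaults below are invented for illustration.

```python
import tensorflow as tf

filename_queue = tf.train.string_input_producer(["data0.csv", "data1.csv"])

reader = tf.TextLineReader()
key, value = reader.read(filename_queue)     # one text line per read()

# record_defaults fixes each column's type and fills in missing fields;
# four float features plus an integer label are assumed here.
record_defaults = [[0.0], [0.0], [0.0], [0.0], [0]]
col1, col2, col3, col4, label = tf.decode_csv(
    value, record_defaults=record_defaults)
```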
-#### Fixed length records <a class="md-anchor" id="AUTOGENERATED-fixed-length-records"></a>
+#### Fixed length records
To read binary files in which each record is a fixed number of bytes, use
[`tf.FixedLengthRecordReader`](../../api_docs/python/io_ops.md#FixedLengthRecordReader)
@@ -155,7 +139,7 @@ needed. For CIFAR-10, you can see how to do the reading and decoding in
and described in
[this tutorial](../../tutorials/deep_cnn/index.md#prepare-the-data).
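A rough sketch of the fixed-length-record pattern; the record size and byte layout below are invented, and CIFAR-10's real layout lives in the reader linked above.

```python
import tensorflow as tf

RECORD_BYTES = 1 + 32 * 32 * 3   # hypothetical: 1 label byte + a 32x32x3 image

filename_queue = tf.train.string_input_producer(["data.bin"])
reader = tf.FixedLengthRecordReader(record_bytes=RECORD_BYTES)
key, value = reader.read(filename_queue)

raw = tf.decode_raw(value, tf.uint8)                  # string -> vector of bytes
label = tf.cast(tf.slice(raw, [0], [1]), tf.int32)    # first byte is the label
image = tf.reshape(tf.slice(raw, [1], [RECORD_BYTES - 1]), [3, 32, 32])
```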
-#### Standard TensorFlow format <a class="md-anchor" id="AUTOGENERATED-standard-tensorflow-format"></a>
+#### Standard TensorFlow format
Another approach is to convert whatever data you have into a supported format.
This approach makes it easier to mix and match data sets and network
@@ -181,7 +165,7 @@ found in
[`tensorflow/g3doc/how_tos/reading_data/fully_connected_reader.py`](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/g3doc/how_tos/reading_data/fully_connected_reader.py),
which you can compare with the `fully_connected_feed` version.
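As a sketch of the round trip this section describes, writing and then re-reading `tf.train.Example` records; the feature names and values are illustrative only.

```python
import tensorflow as tf

# Writing: serialize each example as a tf.train.Example protocol buffer.
writer = tf.python_io.TFRecordWriter("examples.tfrecords")
example = tf.train.Example(features=tf.train.Features(feature={
    "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[1])),
    "raw":   tf.train.Feature(bytes_list=tf.train.BytesList(value=[b"\x00" * 4])),
}))
writer.write(example.SerializeToString())
writer.close()

# Reading: TFRecordReader yields serialized protos to be parsed downstream.
filename_queue = tf.train.string_input_producer(["examples.tfrecords"])
reader = tf.TFRecordReader()
_, serialized = reader.read(filename_queue)
```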
-### Preprocessing <a class="md-anchor" id="AUTOGENERATED-preprocessing"></a>
+### Preprocessing
You can then do any preprocessing of these examples you want. This would be any
processing that doesn't depend on trainable parameters. Examples include
@@ -190,7 +174,7 @@ etc. See
[`tensorflow/models/image/cifar10/cifar10.py`](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/models/image/cifar10/cifar10.py)
for an example.
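A small sketch of the kind of parameter-free preprocessing meant here; the specific ops chosen are just examples, not the CIFAR-10 code.

```python
import tensorflow as tf

def preprocess(image):
    # Parameter-free preprocessing: nothing here depends on trainable variables.
    image = tf.cast(image, tf.float32)
    image = tf.image.random_flip_left_right(image)            # distortion
    image = tf.image.random_brightness(image, max_delta=63)   # more distortion
    image = (image - 128.0) / 128.0                           # crude normalization
    return image
```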
-### Batching <a class="md-anchor" id="AUTOGENERATED-batching"></a>
+### Batching
At the end of the pipeline we use another queue to batch together examples for
training, evaluation, or inference. For this we use a queue that randomizes the
@@ -268,7 +252,7 @@ summary to the graph that indicates how full the example queue is. If you have
enough reading threads, that summary will stay above zero. You can
[view your summaries as training progresses using TensorBoard](../../how_tos/summaries_and_tensorboard/index.md).
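A minimal sketch of the batching stage; the capacity numbers are arbitrary, and `example` and `label` stand for the decoded tensors from the earlier stages.

```python
import tensorflow as tf

def batch_examples(example, label, batch_size=128):
    # shuffle_batch adds an example queue plus the QueueRunner threads that
    # keep it filled; min_after_dequeue controls how well the output mixes.
    return tf.train.shuffle_batch(
        [example, label],
        batch_size=batch_size,
        num_threads=4,
        capacity=10000 + 3 * batch_size,
        min_after_dequeue=10000)
```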
-### Creating threads to prefetch using `QueueRunner` objects <a class="md-anchor" id="QueueRunner"></a>
+### Creating threads to prefetch using `QueueRunner` objects {#QueueRunner}
The short version: many of the `tf.train` functions listed above add
[`QueueRunner`](../../api_docs/python/train.md#QueueRunner) objects to your
@@ -312,7 +296,7 @@ coord.join(threads)
sess.close()
```
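Since the diff only shows the tail of that template, here is the usual shape of the whole pattern for reference; this is a sketch, not a verbatim copy of the tutorial code, and `train_op` is a stand-in.

```python
import tensorflow as tf

train_op = tf.no_op()   # stand-in for a real training step built on the pipeline

sess = tf.Session()
sess.run(tf.initialize_all_variables())

# Start the QueueRunner threads registered while the graph was built.
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(sess=sess, coord=coord)

try:
    while not coord.should_stop():
        sess.run(train_op)          # loops until the input pipeline is exhausted
except tf.errors.OutOfRangeError:
    print("Done training -- epoch limit reached")
finally:
    coord.request_stop()            # ask all threads to stop

coord.join(threads)
sess.close()
```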
-#### Aside: What is happening here? <a class="md-anchor" id="AUTOGENERATED-aside--what-is-happening-here-"></a>
+#### Aside: What is happening here?
First we create the graph. It will have a few pipeline stages that are
connected by queues. The first stage will generate filenames to read and enqueue
@@ -357,7 +341,7 @@ exception).
For more about threading, queues, QueueRunners, and Coordinators
[see here](../../how_tos/threading_and_queues/index.md).
-#### Aside: How clean shut-down when limiting epochs works <a class="md-anchor" id="AUTOGENERATED-aside--how-clean-shut-down-when-limiting-epochs-works"></a>
+#### Aside: How clean shut-down when limiting epochs works
Imagine you have a model that has set a limit on the number of epochs to train
on. That means that the thread generating filenames will only run that many
@@ -400,7 +384,7 @@ errors and exiting. Once all the training threads are done,
[`tf.train.Coordinator.join`](../../api_docs/python/train.md#Coordinator.join)
will return and you can exit cleanly.
-### Filtering records or producing multiple examples per record <a class="md-anchor" id="AUTOGENERATED-filtering-records-or-producing-multiple-examples-per-record"></a>
+### Filtering records or producing multiple examples per record
Instead of examples with shapes `[x, y, z]`, you will produce a batch of
examples with shape `[batch, x, y, z]`. The batch size can be 0 if you want to
@@ -409,14 +393,14 @@ are producing multiple examples per record. Then simply set `enqueue_many=True`
when calling one of the batching functions (such as `shuffle_batch` or
`shuffle_batch_join`).
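A small sketch of the `enqueue_many=True` case; the shapes are invented, and the point is that dimension 0 of what you hand to the batching function already indexes separate examples produced from one record.

```python
import tensorflow as tf

# Suppose decoding one record yields several examples at once,
# e.g. a [num_examples, x, y, z] tensor and matching labels.
examples = tf.zeros([8, 4, 4, 1])        # stand-in for decoded examples
labels = tf.zeros([8], dtype=tf.int32)   # stand-in for decoded labels

example_batch, label_batch = tf.train.shuffle_batch(
    [examples, labels],
    batch_size=32,
    capacity=2000,
    min_after_dequeue=1000,
    enqueue_many=True)   # treat dimension 0 as separate examples
```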
-### Sparse input data <a class="md-anchor" id="AUTOGENERATED-sparse-input-data"></a>
+### Sparse input data
SparseTensors don't play well with queues. If you use SparseTensors you have
to decode the string records using
[`tf.parse_example`](../../api_docs/python/io_ops.md#parse_example) **after**
batching (instead of using `tf.parse_single_example` before batching).
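A hedged sketch of what "parse after batching" means in practice: batch the serialized strings, then parse the whole batch at once. The feature name is made up, and the features-dict spelling of `tf.parse_example` is an assumption; early releases expressed the same call with sparse_keys/sparse_types arguments.

```python
import tensorflow as tf

filename_queue = tf.train.string_input_producer(["examples.tfrecords"])
reader = tf.TFRecordReader()
_, serialized = reader.read(filename_queue)

# Batch the *serialized* strings first; SparseTensors themselves
# can't go through the batching queues.
serialized_batch = tf.train.shuffle_batch(
    [serialized], batch_size=32, capacity=2000, min_after_dequeue=1000)

# Parse the whole batch in one call, producing SparseTensors directly.
parsed = tf.parse_example(
    serialized_batch,
    features={"indices": tf.VarLenFeature(tf.int64)})
sparse_indices = parsed["indices"]   # a SparseTensor
```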
-## Preloaded data <a class="md-anchor" id="AUTOGENERATED-preloaded-data"></a>
+## Preloaded data
This is only used for small data sets that can be loaded entirely in memory.
There are two approaches:
@@ -475,7 +459,7 @@ An MNIST example that preloads the data using constants can be found in
You can compare these with the `fully_connected_feed` and
`fully_connected_reader` versions above.
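For the constant-based flavor of preloading mentioned here, a minimal sketch; the data itself is a stand-in for a small in-memory dataset.

```python
import tensorflow as tf

training_data = [[0.0, 1.0], [2.0, 3.0]]   # stand-in for a small dataset
training_labels = [0, 1]

with tf.Session():
    # The whole dataset is embedded in the graph as constants, so no feeding
    # or file reading happens at run time (at the cost of graph size).
    input_data = tf.constant(training_data)
    input_labels = tf.constant(training_labels)
```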
-## Multiple input pipelines <a class="md-anchor" id="AUTOGENERATED-multiple-input-pipelines"></a>
+## Multiple input pipelines
Commonly you will want to train on one dataset and evaluate (or "eval") on
another. One way to do this is to actually have two separate processes: