author     2015-11-18 10:47:35 -0800
committer  2015-11-18 10:47:35 -0800
commit     ab34d55ce7618e52069a2e1c9e51aac5a1ea81c3 (patch)
tree       9c79427b45ff6501e8374ceb7b4fc3bdb2828e15 /tensorflow/g3doc/how_tos/reading_data/index.md
parent     9eb88d56ab6a9a361662d73a258593d8fbf10b62 (diff)
TensorFlow: more features, performance improvements, and doc fixes.
Changes:
- Add Split/Concat() methods to TensorUtil (meant for convenience, not
speed) by Chris.
- Changes to linear algebra ops interface by Rasmus
- Tests for tensorboard by Daniel
- Fix bug in histogram calculation by Cassandra
- Added a tool for backwards compatibility of OpDefs. The tool
  checks in the history of OpDefs and their changes, and checks for
  backwards-incompatible changes. All done by @josh11b
- Fix some protobuf example proto docs by Oliver
- Add derivative of MatrixDeterminant by @yaroslavvb
- Add a priority queue by @ebrevdo
- Doc and typo fixes by Aurelien and @dave-andersen
- Speed improvements to ConvBackwardFilter by @andydavis
- Improve speed of Alexnet on TitanX by @zheng-xq
- Add some host memory annotations to some GPU kernels by Yuan.
- Add support for doubles in histogram summary by @jmchen-g
Base CL: 108158338
Diffstat (limited to 'tensorflow/g3doc/how_tos/reading_data/index.md')
-rw-r--r--   tensorflow/g3doc/how_tos/reading_data/index.md   52
1 file changed, 18 insertions(+), 34 deletions(-)
diff --git a/tensorflow/g3doc/how_tos/reading_data/index.md b/tensorflow/g3doc/how_tos/reading_data/index.md
index 64209b8bd0..089ee4e34d 100644
--- a/tensorflow/g3doc/how_tos/reading_data/index.md
+++ b/tensorflow/g3doc/how_tos/reading_data/index.md
@@ -1,4 +1,4 @@
-# Reading data <a class="md-anchor" id="AUTOGENERATED-reading-data"></a>
+# Reading data
 
 There are three main methods of getting data into a TensorFlow program:
 
@@ -8,25 +8,9 @@ There are three main methods of getting data into a TensorFlow program:
 * Preloaded data: a constant or variable in the TensorFlow graph holds
   all the data (for small data sets).
 
-<!-- TOC-BEGIN This section is generated by neural network: DO NOT EDIT! -->
-## Contents
-### [Reading data](#AUTOGENERATED-reading-data)
-* [Feeding](#Feeding)
-* [Reading from files](#AUTOGENERATED-reading-from-files)
-  * [Filenames, shuffling, and epoch limits](#AUTOGENERATED-filenames--shuffling--and-epoch-limits)
-  * [File formats](#AUTOGENERATED-file-formats)
-  * [Preprocessing](#AUTOGENERATED-preprocessing)
-  * [Batching](#AUTOGENERATED-batching)
-  * [Creating threads to prefetch using `QueueRunner` objects](#QueueRunner)
-  * [Filtering records or producing multiple examples per record](#AUTOGENERATED-filtering-records-or-producing-multiple-examples-per-record)
-  * [Sparse input data](#AUTOGENERATED-sparse-input-data)
-* [Preloaded data](#AUTOGENERATED-preloaded-data)
-* [Multiple input pipelines](#AUTOGENERATED-multiple-input-pipelines)
+[TOC]
 
-
-<!-- TOC-END This section was generated by neural network, THANKS FOR READING! -->
-
-## Feeding <a class="md-anchor" id="Feeding"></a>
+## Feeding {#Feeding}
 
 TensorFlow's feed mechanism lets you inject data into any Tensor in a
 computation graph.
 A python computation can thus feed data directly into the
@@ -54,7 +38,7 @@ in
 [`tensorflow/g3doc/tutorials/mnist/fully_connected_feed.py`](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/g3doc/tutorials/mnist/fully_connected_feed.py),
 and is described in the [MNIST tutorial](../../tutorials/mnist/tf/index.md).
 
-## Reading from files <a class="md-anchor" id="AUTOGENERATED-reading-from-files"></a>
+## Reading from files
 
 A typical pipeline for reading records from files has the following stages:
 
@@ -67,7 +51,7 @@ A typical pipeline for reading records from files has the following stages:
 7. *Optional* preprocessing
 8. Example queue
 
-### Filenames, shuffling, and epoch limits <a class="md-anchor" id="AUTOGENERATED-filenames--shuffling--and-epoch-limits"></a>
+### Filenames, shuffling, and epoch limits
 
 For the list of filenames, use either a constant string Tensor (like
 `["file0", "file1"]` or `[("file%d" % i) for i in range(2)]`) or the
@@ -89,7 +73,7 @@ The queue runner works in a thread separate from the reader that pulls
 filenames from the queue, so the shuffling and enqueuing process does not
 block the reader.
 
-### File formats <a class="md-anchor" id="AUTOGENERATED-file-formats"></a>
+### File formats
 
 Select the reader that matches your input file format and pass the filename
 queue to the reader's read method. The read method outputs a key identifying
@@ -97,7 +81,7 @@ the file and record (useful for debugging if you have some weird records), and
 a scalar string value. Use one (or more) of the decoder and conversion ops to
 decode this string into the tensors that make up an example.
 
-#### CSV files <a class="md-anchor" id="AUTOGENERATED-csv-files"></a>
+#### CSV files
 
 To read text files in [comma-separated value (CSV)
 format](https://tools.ietf.org/html/rfc4180), use a
@@ -139,7 +123,7 @@ You must call `tf.train.start_queue_runners` to populate the queue before
 you call `run` or `eval` to execute the `read`.
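As an aside on the decoding step above: the `record_defaults` behavior of `tf.decode_csv` (an empty field falls back to that column's default value and type) can be sketched in plain Python. This is a hypothetical stand-in for illustration, not the actual op:

```python
def decode_csv_line(line, record_defaults):
    """Mimic the record_defaults convention for one CSV record:
    split on commas and fill empty fields from per-column defaults."""
    fields = line.rstrip("\n").split(",")
    if len(fields) != len(record_defaults):
        raise ValueError("expected %d fields, got %d"
                         % (len(record_defaults), len(fields)))
    out = []
    for raw, default in zip(fields, record_defaults):
        if raw == "":
            out.append(default)              # empty field -> column default
        else:
            out.append(type(default)(raw))   # convert to the default's type
    return out

# Four float features and an int label; the third field is empty.
print(decode_csv_line("0.1,0.2,,0.4,7", [0.0, 0.0, 0.0, 0.0, 0]))
# -> [0.1, 0.2, 0.0, 0.4, 7]
```

The defaults serve double duty, supplying both the fill-in value and the target type for each column.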
 Otherwise `read` will block while it waits for filenames from the queue.
 
-#### Fixed length records <a class="md-anchor" id="AUTOGENERATED-fixed-length-records"></a>
+#### Fixed length records
 
 To read binary files in which each record is a fixed number of bytes, use
 [`tf.FixedLengthRecordReader`](../../api_docs/python/io_ops.md#FixedLengthRecordReader)
@@ -155,7 +139,7 @@ needed. For CIFAR-10, you can see how to do the reading and decoding in
 and described in
 [this tutorial](../../tutorials/deep_cnn/index.md#prepare-the-data).
 
-#### Standard TensorFlow format <a class="md-anchor" id="AUTOGENERATED-standard-tensorflow-format"></a>
+#### Standard TensorFlow format
 
 Another approach is to convert whatever data you have into a supported format.
 This approach makes it easier to mix and match data sets and network
@@ -181,7 +165,7 @@ found in
 [`tensorflow/g3doc/how_tos/reading_data/fully_connected_reader.py`](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/g3doc/how_tos/reading_data/fully_connected_reader.py),
 which you can compare with the `fully_connected_feed` version.
 
-### Preprocessing <a class="md-anchor" id="AUTOGENERATED-preprocessing"></a>
+### Preprocessing
 
 You can then do any preprocessing of these examples you want. This would be
 any processing that doesn't depend on trainable parameters. Examples include
@@ -190,7 +174,7 @@ etc. See
 [`tensorflow/models/image/cifar10/cifar10.py`](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/models/image/cifar10/cifar10.py)
 for an example.
 
-### Batching <a class="md-anchor" id="AUTOGENERATED-batching"></a>
+### Batching
 
 At the end of the pipeline we use another queue to batch together examples for
 training, evaluation, or inference. For this we use a queue that randomizes the
@@ -268,7 +252,7 @@ summary to the graph that indicates how full the example queue is. If you have
 enough reading threads, that summary will stay above zero.
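The batching stage described above can be sketched without queues or threads. This is a minimal pure-Python analogue of what a batching queue emits, assuming the default behavior of dropping a final partial batch (the `allow_smaller_final_batch=False` case):

```python
def batch_examples(example_iter, batch_size):
    """Group a stream of single examples into fixed-size batches.
    A trailing partial batch is dropped, matching the default
    allow_smaller_final_batch=False behavior."""
    batch = []
    for example in example_iter:
        batch.append(example)
        if len(batch) == batch_size:
            yield batch
            batch = []

# 10 examples into batches of 4: the final 2 examples are dropped.
print(list(batch_examples(range(10), batch_size=4)))
# -> [[0, 1, 2, 3], [4, 5, 6, 7]]
```

The real batching ops additionally run the upstream pipeline in background threads and can shuffle; only the grouping logic is shown here.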
 You can
 [view your summaries as training progresses using TensorBoard](../../how_tos/summaries_and_tensorboard/index.md).
 
-### Creating threads to prefetch using `QueueRunner` objects <a class="md-anchor" id="QueueRunner"></a>
+### Creating threads to prefetch using `QueueRunner` objects {#QueueRunner}
 
 The short version: many of the `tf.train` functions listed above add
 [`QueueRunner`](../../api_docs/python/train.md#QueueRunner) objects to your
@@ -312,7 +296,7 @@ coord.join(threads)
 sess.close()
 ```
 
-#### Aside: What is happening here? <a class="md-anchor" id="AUTOGENERATED-aside--what-is-happening-here-"></a>
+#### Aside: What is happening here?
 
 First we create the graph. It will have a few pipeline stages that are
 connected by queues. The first stage will generate filenames to read and enqueue
@@ -357,7 +341,7 @@ exception).
 
 For more about threading, queues, QueueRunners, and Coordinators
 [see here](../../how_tos/threading_and_queues/index.md).
 
-#### Aside: How clean shut-down when limiting epochs works <a class="md-anchor" id="AUTOGENERATED-aside--how-clean-shut-down-when-limiting-epochs-works"></a>
+#### Aside: How clean shut-down when limiting epochs works
 
 Imagine you have a model that has set a limit on the number of epochs to train
 on. That means that the thread generating filenames will only run that many
@@ -400,7 +384,7 @@ errors and exiting. Once all the training threads are done,
 [`tf.train.Coordinator.join`](../../api_docs/python/train.md#Coordinator.join)
 will return and you can exit cleanly.
 
-### Filtering records or producing multiple examples per record <a class="md-anchor" id="AUTOGENERATED-filtering-records-or-producing-multiple-examples-per-record"></a>
+### Filtering records or producing multiple examples per record
 
 Instead of examples with shapes `[x, y, z]`, you will produce a batch of
 examples with shape `[batch, x, y, z]`. The batch size can be 0 if you want to
@@ -409,14 +393,14 @@ are producing multiple examples per record.
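The producer/consumer shut-down protocol covered in these hunks can be sketched with plain Python threads. This is a hypothetical analogue only: a sentinel value plays the role of the queue closing with `OutOfRangeError`, and a `threading.Event` plays the role of the Coordinator's stop request; it is not the `tf.train` implementation:

```python
import queue
import threading

def filename_producer(q, filenames, num_epochs, stop_event):
    """Enqueue each filename num_epochs times, then enqueue a sentinel,
    analogous to the filename queue being closed after the epoch limit."""
    for _ in range(num_epochs):
        for name in filenames:
            if stop_event.is_set():   # "coordinator" requested an early stop
                return
            q.put(name)
    q.put(None)                       # sentinel: no more filenames

q = queue.Queue(maxsize=8)
stop_event = threading.Event()
t = threading.Thread(target=filename_producer,
                     args=(q, ["file0", "file1"], 2, stop_event))
t.start()

read = []
while True:
    item = q.get()
    if item is None:                  # clean shut-down once epochs run out
        break
    read.append(item)

stop_event.set()                      # analogous to coord.request_stop()
t.join()                              # analogous to coord.join(threads)
print(read)  # -> ['file0', 'file1', 'file0', 'file1']
```

The key property mirrored here is that the consumer learns the pipeline is exhausted from the queue itself, then signals and joins the producer so the process can exit cleanly.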
 Then simply set `enqueue_many=True` when calling one of the batching functions
 (such as `shuffle_batch` or `shuffle_batch_join`).
 
-### Sparse input data <a class="md-anchor" id="AUTOGENERATED-sparse-input-data"></a>
+### Sparse input data
 
 SparseTensors don't play well with queues. If you use SparseTensors you have
 to decode the string records using
 [`tf.parse_example`](../../api_docs/python/io_ops.md#parse_example) **after**
 batching (instead of using `tf.parse_single_example` before batching).
 
-## Preloaded data <a class="md-anchor" id="AUTOGENERATED-preloaded-data"></a>
+## Preloaded data
 
 This is only used for small data sets that can be loaded entirely in memory.
 There are two approaches:
@@ -475,7 +459,7 @@ An MNIST example that preloads the data using constants can be found in
 You can compare these with the `fully_connected_feed` and
 `fully_connected_reader` versions above.
 
-## Multiple input pipelines <a class="md-anchor" id="AUTOGENERATED-multiple-input-pipelines"></a>
+## Multiple input pipelines
 
 Commonly you will want to train on one dataset and evaluate (or "eval") on
 another. One way to do this is to actually have two separate processes:
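The "zero or more examples per record" idea from the filtering hunk above can also be sketched in plain Python. `decode_record` here is a hypothetical decoder invented for illustration; flattening its variable-length output into one example stream is what `enqueue_many=True` asks the batching function to do:

```python
def decode_record(record):
    """Hypothetical decoder: split a record into words and keep only
    those longer than 2 characters, so one record can yield zero,
    one, or several examples."""
    return [w for w in record.split() if len(w) > 2]

def examples_from_records(records):
    """Flatten the per-record example lists into one stream, as a
    batching function with enqueue_many=True would consume them."""
    out = []
    for record in records:
        out.extend(decode_record(record))   # may contribute 0..N examples
    return out

# The empty record and the short words are filtered out entirely.
print(examples_from_records(["a big cat", "", "on top"]))
# -> ['big', 'cat', 'top']
```

A record that decodes to an empty list is simply filtered, which is the "batch size can be 0" case mentioned above.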