author    Katherine Wu <kathywu@google.com>  2017-10-24 10:08:03 -0700
committer TensorFlower Gardener <gardener@tensorflow.org>  2017-10-24 10:11:45 -0700
commit    58b071639d97afdbc5ac5e222a4be81dcb344962 (patch)
tree      393af2e6b989038c5b59476f0a7ab0b0da42e289
parent    86895d4a87a4d2cf2e1106b3fa3c176378d1029a (diff)
Added a dataset page to the api guide
PiperOrigin-RevId: 173272637
-rw-r--r--  tensorflow/docs_src/api_guides/python/input_dataset.md  | 81
-rw-r--r--  tensorflow/docs_src/api_guides/python/reading_data.md   | 23
-rw-r--r--  tensorflow/docs_src/programmers_guide/datasets.md       |  2
3 files changed, 98 insertions(+), 8 deletions(-)
diff --git a/tensorflow/docs_src/api_guides/python/input_dataset.md b/tensorflow/docs_src/api_guides/python/input_dataset.md
new file mode 100644
index 0000000000..2798d76be9
--- /dev/null
+++ b/tensorflow/docs_src/api_guides/python/input_dataset.md
@@ -0,0 +1,81 @@
+# `Dataset` Input Pipeline
+[TOC]
+
+@{tf.data.Dataset} allows you to build complex input pipelines. See the
+@{$datasets$programmer's guide} for an in-depth explanation of how to use this
+API.
+
+## Reader classes
+
+Classes that create a dataset from input files (a short example follows the list).
+
+* @{tf.data.FixedLengthRecordDataset}
+* @{tf.data.TextLineDataset}
+* @{tf.data.TFRecordDataset}
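+
+For example, a `TextLineDataset` produces one string element per line of its
+input files (a minimal sketch; the file names are hypothetical):
+
+```python
+import tensorflow as tf
+
+# Each element of this dataset is one line of text from the input files.
+filenames = ["/tmp/file1.txt", "/tmp/file2.txt"]  # hypothetical paths
+dataset = tf.data.TextLineDataset(filenames)
+```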
+
+## Creating new datasets
+
+Static methods in `Dataset` that create new datasets (see the sketch after the list).
+
+* @{tf.data.Dataset.from_generator}
+* @{tf.data.Dataset.from_sparse_tensor_slices}
+* @{tf.data.Dataset.from_tensor_slices}
+* @{tf.data.Dataset.from_tensors}
+* @{tf.data.Dataset.list_files}
+* @{tf.data.Dataset.range}
+* @{tf.data.Dataset.zip}
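+
+For instance, @{tf.data.Dataset.from_tensor_slices} builds a dataset whose
+elements are slices of the given tensors along the first dimension (a minimal
+sketch):
+
+```python
+import tensorflow as tf
+
+# Yields three elements: [1, 2], [3, 4], and [5, 6].
+dataset = tf.data.Dataset.from_tensor_slices([[1, 2], [3, 4], [5, 6]])
+```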
+
+## Transformations on existing datasets
+
+These methods transform an existing dataset and return a new dataset. Calls
+can be chained together, as shown in the example below:
+
+```python
+train_data = train_data.shuffle(buffer_size=1000).batch(100).repeat()
+```
+
+* @{tf.data.Dataset.apply}
+* @{tf.data.Dataset.batch}
+* @{tf.data.Dataset.cache}
+* @{tf.data.Dataset.concatenate}
+* @{tf.data.Dataset.filter}
+* @{tf.data.Dataset.flat_map}
+* @{tf.data.Dataset.interleave}
+* @{tf.data.Dataset.map}
+* @{tf.data.Dataset.padded_batch}
+* @{tf.data.Dataset.prefetch}
+* @{tf.data.Dataset.repeat}
+* @{tf.data.Dataset.shard}
+* @{tf.data.Dataset.shuffle}
+* @{tf.data.Dataset.skip}
+* @{tf.data.Dataset.take}
+
+### Custom transformation functions
+
+Custom transformation functions can be applied to a `Dataset` using
+@{tf.data.Dataset.apply}. Below are custom transformation functions from
+`tf.contrib.data` (a usage sketch follows the list):
+
+* @{tf.contrib.data.batch_and_drop_remainder}
+* @{tf.contrib.data.dense_to_sparse_batch}
+* @{tf.contrib.data.enumerate_dataset}
+* @{tf.contrib.data.group_by_window}
+* @{tf.contrib.data.ignore_errors}
+* @{tf.contrib.data.rejection_resample}
+* @{tf.contrib.data.sloppy_interleave}
+* @{tf.contrib.data.unbatch}
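+
+For example, `unbatch` can be passed to `apply` to split each element of a
+batched dataset back into individual elements (a minimal sketch):
+
+```python
+import tensorflow as tf
+
+batched = tf.data.Dataset.from_tensor_slices([[1, 2], [3, 4]])
+# Yields the scalar elements 1, 2, 3, 4.
+unbatched = batched.apply(tf.contrib.data.unbatch())
+```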
+
+## Iterating over datasets
+
+These methods create a @{tf.data.Iterator} from a `Dataset`; a short example
+follows the list.
+
+* @{tf.data.Dataset.make_initializable_iterator}
+* @{tf.data.Dataset.make_one_shot_iterator}
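+
+A one-shot iterator is the simplest way to step through a dataset (a minimal
+sketch):
+
+```python
+import tensorflow as tf
+
+dataset = tf.data.Dataset.range(5)
+iterator = dataset.make_one_shot_iterator()
+next_element = iterator.get_next()
+
+with tf.Session() as sess:
+  print(sess.run(next_element))  # ==> 0
+  print(sess.run(next_element))  # ==> 1
+```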
+
+The `Iterator` class also provides static methods that create a
+@{tf.data.Iterator} which can be used with multiple `Dataset` objects (see the
+sketch after the list).
+
+* @{tf.data.Iterator.from_structure}
+* @{tf.data.Iterator.from_string_handle}
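+
+For example, @{tf.data.Iterator.from_structure} creates a reinitializable
+iterator that can be pointed at different datasets with compatible types and
+shapes (a minimal sketch; the two datasets stand in for real ones):
+
+```python
+import tensorflow as tf
+
+train_dataset = tf.data.Dataset.range(100)      # stands in for real data
+validation_dataset = tf.data.Dataset.range(10)  # stands in for real data
+
+# One iterator, two initializers: run the matching init op before each pass.
+iterator = tf.data.Iterator.from_structure(train_dataset.output_types,
+                                           train_dataset.output_shapes)
+train_init_op = iterator.make_initializer(train_dataset)
+validation_init_op = iterator.make_initializer(validation_dataset)
+```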
+
+## Extra functions from `tf.contrib.data`
+
+* @{tf.contrib.data.read_batch_features}
+
diff --git a/tensorflow/docs_src/api_guides/python/reading_data.md b/tensorflow/docs_src/api_guides/python/reading_data.md
index 8b6196ea34..7609ca91d0 100644
--- a/tensorflow/docs_src/api_guides/python/reading_data.md
+++ b/tensorflow/docs_src/api_guides/python/reading_data.md
@@ -3,16 +3,25 @@
Note: The preferred way to feed data into a TensorFlow program is using the
@{$datasets$Datasets API}.
-There are three other methods of getting data into a TensorFlow program:
+There are four methods of getting data into a TensorFlow program:
+* `Dataset` API: Easily construct a complex input pipeline. (preferred method)
* Feeding: Python code provides the data when running each step.
-* Reading from files: an input pipeline reads the data from files
+* `QueueRunner`: a queue-based input pipeline reads the data from files
at the beginning of a TensorFlow graph.
* Preloaded data: a constant or variable in the TensorFlow graph holds
all the data (for small data sets).
[TOC]
+## Dataset API
+
+See the @{$datasets$programmer's guide} for an in-depth explanation of
+@{tf.data.Dataset}. The `Dataset` API allows you to extract and preprocess data
+from different input/file formats, and apply transformations such as batching,
+shuffling, and mapping. It supersedes the older feeding and `QueueRunner`
+input methods described below.
+
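+A minimal sketch of such a pipeline (the record file and the parsing function
+are hypothetical):
+
+```python
+import tensorflow as tf
+
+def parse_fn(record):
+  # Hypothetical parser: decode one serialized tf.Example into features.
+  features = {"x": tf.FixedLenFeature([], tf.float32)}
+  return tf.parse_single_example(record, features)
+
+dataset = tf.data.TFRecordDataset(["/tmp/train.tfrecords"])  # hypothetical path
+dataset = dataset.map(parse_fn).shuffle(buffer_size=1000).batch(32)
+```
+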
## Feeding
TensorFlow's feed mechanism lets you inject data into any Tensor in a
@@ -22,7 +31,7 @@ graph.
Supply feed data through the `feed_dict` argument to a run() or eval() call
that initiates computation.
-Note: "Feeding" is the least efficient way to feed data into a tensorflow
+Warning: "Feeding" is the least efficient way to feed data into a TensorFlow
program and should only be used for small experiments and debugging.
```python
@@ -44,9 +53,9 @@ in
[`tensorflow/examples/tutorials/mnist/fully_connected_feed.py`](https://www.tensorflow.org/code/tensorflow/examples/tutorials/mnist/fully_connected_feed.py),
and is described in the @{$mechanics$MNIST tutorial}.
-## Reading from files
+## `QueueRunner`
-A typical pipeline for reading records from files has the following stages:
+A typical queue-based pipeline for reading records from files has the following stages (a minimal sketch follows the list):
1. The list of filenames
2. *Optional* filename shuffling
@@ -57,8 +66,8 @@ A typical pipeline for reading records from files has the following stages:
7. *Optional* preprocessing
8. Example queue
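
A minimal sketch of the first few stages using the queue-based APIs (the file
names are hypothetical):

```python
import tensorflow as tf

# Stages 1-3: filenames, optional shuffling, and a filename queue.
filename_queue = tf.train.string_input_producer(
    ["/tmp/file0.csv", "/tmp/file1.csv"], shuffle=True, num_epochs=5)

# Stages 4-5: a reader returns (key, record) pairs, one line at a time.
reader = tf.TextLineReader()
key, value = reader.read(filename_queue)
```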
-Note: This section discusses implementing input pipelines using the
-queue-based APIs which can be cleanly replaced by the ${$datasets$Dataset API}.
+Warning: This section discusses implementing input pipelines using the
+queue-based APIs which can be cleanly replaced by the @{$datasets$Dataset API}.
### Filenames, shuffling, and epoch limits
diff --git a/tensorflow/docs_src/programmers_guide/datasets.md b/tensorflow/docs_src/programmers_guide/datasets.md
index fd1c927539..38e5612fb4 100644
--- a/tensorflow/docs_src/programmers_guide/datasets.md
+++ b/tensorflow/docs_src/programmers_guide/datasets.md
@@ -1,6 +1,6 @@
# Importing Data
-The `Dataset` API enables you to build complex input pipelines from
+The @{tf.data.Dataset$`Dataset`} API enables you to build complex input pipelines from
simple, reusable pieces. For example, the pipeline for an image model might
aggregate data from files in a distributed file system, apply random
perturbations to each image, and merge randomly selected images into a batch