author      2017-10-24 10:08:03 -0700
committer   2017-10-24 10:11:45 -0700
commit      58b071639d97afdbc5ac5e222a4be81dcb344962 (patch)
tree        393af2e6b989038c5b59476f0a7ab0b0da42e289
parent      86895d4a87a4d2cf2e1106b3fa3c176378d1029a (diff)
Added a dataset page to the api guide
PiperOrigin-RevId: 173272637
-rw-r--r--   tensorflow/docs_src/api_guides/python/input_dataset.md | 81
-rw-r--r--   tensorflow/docs_src/api_guides/python/reading_data.md  | 23
-rw-r--r--   tensorflow/docs_src/programmers_guide/datasets.md      |  2
3 files changed, 98 insertions, 8 deletions
diff --git a/tensorflow/docs_src/api_guides/python/input_dataset.md b/tensorflow/docs_src/api_guides/python/input_dataset.md
new file mode 100644
index 0000000000..2798d76be9
--- /dev/null
+++ b/tensorflow/docs_src/api_guides/python/input_dataset.md
@@ -0,0 +1,81 @@
+# `Dataset` Input Pipeline
+[TOC]
+
+@{tf.data.Dataset} allows you to build complex input pipelines. See the
+@{$datasets$programmer's guide} for an in-depth explanation of how to use this
+API.
+
+## Reader classes
+
+Classes that create a dataset from input files.
+
+* @{tf.data.FixedLengthRecordDataset}
+* @{tf.data.TextLineDataset}
+* @{tf.data.TFRecordDataset}
+
+## Creating new datasets
+
+Static methods in `Dataset` that create new datasets.
+
+* @{tf.data.Dataset.from_generator}
+* @{tf.data.Dataset.from_sparse_tensor_slices}
+* @{tf.data.Dataset.from_tensor_slices}
+* @{tf.data.Dataset.from_tensors}
+* @{tf.data.Dataset.list_files}
+* @{tf.data.Dataset.range}
+* @{tf.data.Dataset.zip}
+
+## Transformations on existing datasets
+
+These functions transform an existing dataset, and return a new dataset. Calls
+can be chained together, as shown in the example below:
+
+```
+train_data = train_data.batch(100).shuffle(buffer_size=10).repeat()
+```
+
+* @{tf.data.Dataset.apply}
+* @{tf.data.Dataset.batch}
+* @{tf.data.Dataset.cache}
+* @{tf.data.Dataset.concatenate}
+* @{tf.data.Dataset.filter}
+* @{tf.data.Dataset.flat_map}
+* @{tf.data.Dataset.interleave}
+* @{tf.data.Dataset.map}
+* @{tf.data.Dataset.padded_batch}
+* @{tf.data.Dataset.prefetch}
+* @{tf.data.Dataset.repeat}
+* @{tf.data.Dataset.shard}
+* @{tf.data.Dataset.shuffle}
+* @{tf.data.Dataset.skip}
+* @{tf.data.Dataset.take}
+
+### Custom transformation functions
+
+Custom transformation functions can be applied to a `Dataset` using @{tf.data.Dataset.apply}. Below are custom transformation functions from `tf.contrib.data`:
+
+* @{tf.contrib.data.batch_and_drop_remainder}
+* @{tf.contrib.data.dense_to_sparse_batch}
+* @{tf.contrib.data.enumerate_dataset}
+* @{tf.contrib.data.group_by_window}
+* @{tf.contrib.data.ignore_errors}
+* @{tf.contrib.data.rejection_resample}
+* @{tf.contrib.data.sloppy_interleave}
+* @{tf.contrib.data.unbatch}
+
+## Iterating over datasets
+
+These functions make a @{tf.data.Iterator} from a `Dataset`.
+
+* @{tf.data.Dataset.make_initializable_iterator}
+* @{tf.data.Dataset.make_one_shot_iterator}
+
+The `Iterator` class also contains static methods that create a @{tf.data.Iterator} that can be used with multiple `Dataset` objects.
+
+* @{tf.data.Iterator.from_structure}
+* @{tf.data.Iterator.from_string_handle}
+
+## Extra functions from `tf.contrib.data`
+
+* @{tf.contrib.data.read_batch_features}
+
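As an illustrative aside, here is a minimal sketch of how the reader, transformation, and iterator entry points listed in the new page fit together, assuming the TensorFlow 1.4-era `tf.data` API; the `train.csv` file, its four-column layout, and the `parse_line` helper are hypothetical and only stand in for real input data.

```python
import tensorflow as tf

def parse_line(line):
    # Hypothetical parser: split a CSV line into a 3-element feature vector
    # and an integer label.
    fields = tf.decode_csv(line, record_defaults=[[0.0], [0.0], [0.0], [0]])
    return tf.stack(fields[:-1]), fields[-1]

# Reader class: one dataset element per line of the (assumed) input file.
dataset = tf.data.TextLineDataset("train.csv")

# Chained transformations; each call returns a new Dataset.
dataset = (dataset
           .map(parse_line)            # decode each text line
           .shuffle(buffer_size=1000)  # shuffle within a 1000-element buffer
           .batch(100)                 # group elements into batches of 100
           .repeat())                  # cycle through the data indefinitely

# Iterate over the dataset with a one-shot iterator.
iterator = dataset.make_one_shot_iterator()
features, labels = iterator.get_next()

with tf.Session() as sess:
    batch_features, batch_labels = sess.run([features, labels])
```

Because every transformation returns a new `Dataset`, the chain can be split or reordered freely; a one-shot iterator suits pipelines that need no explicit initialization, while `make_initializable_iterator` is the usual choice when the pipeline is parameterized by placeholders.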
diff --git a/tensorflow/docs_src/api_guides/python/reading_data.md b/tensorflow/docs_src/api_guides/python/reading_data.md
index 8b6196ea34..7609ca91d0 100644
--- a/tensorflow/docs_src/api_guides/python/reading_data.md
+++ b/tensorflow/docs_src/api_guides/python/reading_data.md
@@ -3,16 +3,25 @@
 Note: The preferred way to feed data into a tensorflow program is using the
 @{$datasets$Datasets API}.
 
-There are three other methods of getting data into a TensorFlow program:
+There are four methods of getting data into a TensorFlow program:
 
+* `Dataset` API: Easily construct a complex input pipeline. (preferred method)
 * Feeding: Python code provides the data when running each step.
-* Reading from files: an input pipeline reads the data from files
+* `QueueRunner`: a queue-based input pipeline reads the data from files
   at the beginning of a TensorFlow graph.
 * Preloaded data: a constant or variable in the TensorFlow graph holds
   all the data (for small data sets).
 
 [TOC]
 
+## Dataset API
+
+See the @{$datasets$programmer's guide} for an in-depth explanation of
+@{tf.data.Dataset}. The `Dataset` API allows you to extract and preprocess data
+from different input/file formats, and apply transformations such as batch,
+shuffle, and map to the dataset. This is an improved version of the old input
+methods, feeding and `QueueRunner`.
+
 ## Feeding
 
 TensorFlow's feed mechanism lets you inject data into any Tensor in a
@@ -22,7 +31,7 @@ graph.
 Supply feed data through the `feed_dict` argument to a run() or eval() call
 that initiates computation.
 
-Note: "Feeding" is the least efficient way to feed data into a tensorflow
+Warning: "Feeding" is the least efficient way to feed data into a tensorflow
 program and should only be used for small experiments and debugging.
 
 ```python
@@ -44,9 +53,9 @@ in
 [`tensorflow/examples/tutorials/mnist/fully_connected_feed.py`](https://www.tensorflow.org/code/tensorflow/examples/tutorials/mnist/fully_connected_feed.py),
 and is described in the @{$mechanics$MNIST tutorial}.
 
-## Reading from files
+## `QueueRunner`
 
-A typical pipeline for reading records from files has the following stages:
+A typical queue-based pipeline for reading records from files has the following stages:
 
 1. The list of filenames
 2. *Optional* filename shuffling
@@ -57,8 +66,8 @@
 7. *Optional* preprocessing
 8. Example queue
 
-Note: This section discusses implementing input pipelines using the
-queue-based APIs which can be cleanly replaced by the ${$datasets$Dataset API}.
+Warning: This section discusses implementing input pipelines using the
+queue-based APIs which can be cleanly replaced by the @{$datasets$Dataset API}.
 
 ### Filenames, shuffling, and epoch limits
 
diff --git a/tensorflow/docs_src/programmers_guide/datasets.md b/tensorflow/docs_src/programmers_guide/datasets.md
index fd1c927539..38e5612fb4 100644
--- a/tensorflow/docs_src/programmers_guide/datasets.md
+++ b/tensorflow/docs_src/programmers_guide/datasets.md
@@ -1,6 +1,6 @@
 # Importing Data
 
-The `Dataset` API enables you to build complex input pipelines from
+The @{tf.data.Dataset$`Dataset`} API enables you to build complex input pipelines from
 simple, reusable pieces. For example, the pipeline for an image model might
 aggregate data from files in a distributed file system, apply random
 perturbations to each image, and merge randomly selected images into a batch
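As a companion illustration of the feeding mechanism that the updated `reading_data.md` now discourages in favour of `tf.data`, here is a minimal sketch assuming the TensorFlow 1.x `placeholder`/`Session` API; the placeholder shape and the fed array are made up for the example.

```python
import numpy as np
import tensorflow as tf

# A placeholder receives its value at run time through feed_dict.
x = tf.placeholder(tf.float32, shape=[None, 2], name="x")
y = tf.reduce_sum(x, axis=1)

with tf.Session() as sess:
    batch = np.array([[1.0, 2.0], [3.0, 4.0]], dtype=np.float32)
    # "Feeding": the Python array is injected into the graph for this one step.
    print(sess.run(y, feed_dict={x: batch}))  # -> [3. 7.]
```

Every fed step copies the array from Python into the runtime, which is why the guide calls feeding the least efficient option and recommends it only for small experiments and debugging.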