diff options
Diffstat (limited to 'tensorflow/docs_src/guide/premade_estimators.md')
-rw-r--r-- | tensorflow/docs_src/guide/premade_estimators.md | 432 |
1 files changed, 0 insertions, 432 deletions
diff --git a/tensorflow/docs_src/guide/premade_estimators.md b/tensorflow/docs_src/guide/premade_estimators.md deleted file mode 100644 index 9b64d51b98..0000000000 --- a/tensorflow/docs_src/guide/premade_estimators.md +++ /dev/null @@ -1,432 +0,0 @@ -# Premade Estimators - -This document introduces the TensorFlow programming environment and shows you -how to solve the Iris classification problem in TensorFlow. - -## Prerequisites - -Prior to using the sample code in this document, you'll need to do the -following: - -* [Install TensorFlow](../install/index.md). -* If you installed TensorFlow with virtualenv or Anaconda, activate your - TensorFlow environment. -* Install or upgrade pandas by issuing the following command: - - pip install pandas - -## Getting the sample code - -Take the following steps to get the sample code we'll be going through: - -1. Clone the TensorFlow Models repository from GitHub by entering the following - command: - - git clone https://github.com/tensorflow/models - -1. Change directory within that branch to the location containing the examples - used in this document: - - cd models/samples/core/get_started/ - -The program described in this document is -[`premade_estimator.py`](https://github.com/tensorflow/models/blob/master/samples/core/get_started/premade_estimator.py). -This program uses -[`iris_data.py`](https://github.com/tensorflow/models/blob/master/samples/core/get_started/iris_data.py) -to fetch its training data. - -### Running the program - -You run TensorFlow programs as you would run any Python program. For example: - -``` bsh -python premade_estimator.py -``` - -The program should output training logs followed by some predictions against -the test set. For example, the first line in the following output shows that -the model thinks there is a 99.6% chance that the first example in the test -set is a Setosa. Since the test set expected Setosa, this appears to be -a good prediction. - -``` None -... -Prediction is "Setosa" (99.6%), expected "Setosa" - -Prediction is "Versicolor" (99.8%), expected "Versicolor" - -Prediction is "Virginica" (97.9%), expected "Virginica" -``` - -If the program generates errors instead of answers, ask yourself the following -questions: - -* Did you install TensorFlow properly? -* Are you using the correct version of TensorFlow? -* Did you activate the environment you installed TensorFlow in? (This is - only relevant in certain installation mechanisms.) - -## The programming stack - -Before getting into the details of the program itself, let's investigate the -programming environment. As the following illustration shows, TensorFlow -provides a programming stack consisting of multiple API layers: - -<div style="width:100%; margin:auto; margin-bottom:10px; margin-top:20px;"> -<img style="width:100%" src="../images/tensorflow_programming_environment.png"> -</div> - -We strongly recommend writing TensorFlow programs with the following APIs: - -* [Estimators](../guide/estimators.md), which represent a complete model. - The Estimator API provides methods to train the model, to judge the model's - accuracy, and to generate predictions. -* [Datasets for Estimators](../guide/datasets_for_estimators.md), which build a data input - pipeline. The Dataset API has methods to load and manipulate data, and feed - it into your model. The Dataset API meshes well with the Estimators API. - -## Classifying irises: an overview - -The sample program in this document builds and tests a model that -classifies Iris flowers into three different species based on the size of their -[sepals](https://en.wikipedia.org/wiki/Sepal) and -[petals](https://en.wikipedia.org/wiki/Petal). - -<div style="width:80%; margin:auto; margin-bottom:10px; margin-top:20px;"> -<img style="width:100%" - alt="Petal geometry compared for three iris species: Iris setosa, Iris virginica, and Iris versicolor" - src="../images/iris_three_species.jpg"> -</div> - -**From left to right, -[*Iris setosa*](https://commons.wikimedia.org/w/index.php?curid=170298) (by -[Radomil](https://commons.wikimedia.org/wiki/User:Radomil), CC BY-SA 3.0), -[*Iris versicolor*](https://commons.wikimedia.org/w/index.php?curid=248095) (by -[Dlanglois](https://commons.wikimedia.org/wiki/User:Dlanglois), CC BY-SA 3.0), -and [*Iris virginica*](https://www.flickr.com/photos/33397993@N05/3352169862) -(by [Frank Mayfield](https://www.flickr.com/photos/33397993@N05), CC BY-SA -2.0).** - -### The data set - -The Iris data set contains four features and one -[label](https://developers.google.com/machine-learning/glossary/#label). -The four features identify the following botanical characteristics of -individual Iris flowers: - -* sepal length -* sepal width -* petal length -* petal width - -Our model will represent these features as `float32` numerical data. - -The label identifies the Iris species, which must be one of the following: - -* Iris setosa (0) -* Iris versicolor (1) -* Iris virginica (2) - -Our model will represent the label as `int32` categorical data. - -The following table shows three examples in the data set: - -|sepal length | sepal width | petal length | petal width| species (label) | -|------------:|------------:|-------------:|-----------:|:---------------:| -| 5.1 | 3.3 | 1.7 | 0.5 | 0 (Setosa) | -| 5.0 | 2.3 | 3.3 | 1.0 | 1 (versicolor)| -| 6.4 | 2.8 | 5.6 | 2.2 | 2 (virginica) | - -### The algorithm - -The program trains a Deep Neural Network classifier model having the following -topology: - -* 2 hidden layers. -* Each hidden layer contains 10 nodes. - -The following figure illustrates the features, hidden layers, and predictions -(not all of the nodes in the hidden layers are shown): - -<div style="width:80%; margin:auto; margin-bottom:10px; margin-top:20px;"> -<img style="width:100%" - alt="A diagram of the network architecture: Inputs, 2 hidden layers, and outputs" - src="../images/custom_estimators/full_network.png"> -</div> - -### Inference - -Running the trained model on an unlabeled example yields three predictions, -namely, the likelihood that this flower is the given Iris species. The sum of -those output predictions will be 1.0. For example, the prediction on an -unlabeled example might be something like the following: - -* 0.03 for Iris Setosa -* 0.95 for Iris Versicolor -* 0.02 for Iris Virginica - -The preceding prediction indicates a 95% probability that the given unlabeled -example is an Iris Versicolor. - -## Overview of programming with Estimators - -An Estimator is TensorFlow's high-level representation of a complete model. It -handles the details of initialization, logging, saving and restoring, and many -other features so you can concentrate on your model. For more details see -[Estimators](../guide/estimators.md). - -An Estimator is any class derived from `tf.estimator.Estimator`. TensorFlow -provides a collection of -`tf.estimator` -(for example, `LinearRegressor`) to implement common ML algorithms. Beyond -those, you may write your own -[custom Estimators](../guide/custom_estimators.md). -We recommend using pre-made Estimators when just getting started. - -To write a TensorFlow program based on pre-made Estimators, you must perform the -following tasks: - -* Create one or more input functions. -* Define the model's feature columns. -* Instantiate an Estimator, specifying the feature columns and various - hyperparameters. -* Call one or more methods on the Estimator object, passing the appropriate - input function as the source of the data. - -Let's see how those tasks are implemented for Iris classification. - -## Create input functions - -You must create input functions to supply data for training, -evaluating, and prediction. - -An **input function** is a function that returns a `tf.data.Dataset` object -which outputs the following two-element tuple: - -* [`features`](https://developers.google.com/machine-learning/glossary/#feature) - A Python dictionary in which: - * Each key is the name of a feature. - * Each value is an array containing all of that feature's values. -* `label` - An array containing the values of the - [label](https://developers.google.com/machine-learning/glossary/#label) for - every example. - -Just to demonstrate the format of the input function, here's a simple -implementation: - -```python -def input_evaluation_set(): - features = {'SepalLength': np.array([6.4, 5.0]), - 'SepalWidth': np.array([2.8, 2.3]), - 'PetalLength': np.array([5.6, 3.3]), - 'PetalWidth': np.array([2.2, 1.0])} - labels = np.array([2, 1]) - return features, labels -``` - -Your input function may generate the `features` dictionary and `label` list any -way you like. However, we recommend using TensorFlow's Dataset API, which can -parse all sorts of data. At a high level, the Dataset API consists of the -following classes: - -<div style="width:80%; margin:auto; margin-bottom:10px; margin-top:20px;"> -<img style="width:100%" - alt="A diagram showing subclasses of the Dataset class" - src="../images/dataset_classes.png"> -</div> - -Where the individual members are: - -* `Dataset` - Base class containing methods to create and transform - datasets. Also allows you to initialize a dataset from data in memory, or from - a Python generator. -* `TextLineDataset` - Reads lines from text files. -* `TFRecordDataset` - Reads records from TFRecord files. -* `FixedLengthRecordDataset` - Reads fixed size records from binary files. -* `Iterator` - Provides a way to access one data set element at a time. - -The Dataset API can handle a lot of common cases for you. For example, -using the Dataset API, you can easily read in records from a large collection -of files in parallel and join them into a single stream. - -To keep things simple in this example we are going to load the data with -[pandas](https://pandas.pydata.org/), and build our input pipeline from this -in-memory data. - -Here is the input function used for training in this program, which is available -in [`iris_data.py`](https://github.com/tensorflow/models/blob/master/samples/core/get_started/iris_data.py): - -``` python -def train_input_fn(features, labels, batch_size): - """An input function for training""" - # Convert the inputs to a Dataset. - dataset = tf.data.Dataset.from_tensor_slices((dict(features), labels)) - - # Shuffle, repeat, and batch the examples. - return dataset.shuffle(1000).repeat().batch(batch_size) -``` - -## Define the feature columns - -A [**feature column**](https://developers.google.com/machine-learning/glossary/#feature_columns) -is an object describing how the model should use raw input data from the -features dictionary. When you build an Estimator model, you pass it a list of -feature columns that describes each of the features you want the model to use. -The `tf.feature_column` module provides many options for representing data -to the model. - -For Iris, the 4 raw features are numeric values, so we'll build a list of -feature columns to tell the Estimator model to represent each of the four -features as 32-bit floating-point values. Therefore, the code to create the -feature column is: - -```python -# Feature columns describe how to use the input. -my_feature_columns = [] -for key in train_x.keys(): - my_feature_columns.append(tf.feature_column.numeric_column(key=key)) -``` - -Feature columns can be far more sophisticated than those we're showing here. We -detail feature columns [later on](../guide/feature_columns.md) in our Getting -Started guide. - -Now that we have the description of how we want the model to represent the raw -features, we can build the estimator. - - -## Instantiate an estimator - -The Iris problem is a classic classification problem. Fortunately, TensorFlow -provides several pre-made classifier Estimators, including: - -* `tf.estimator.DNNClassifier` for deep models that perform multi-class - classification. -* `tf.estimator.DNNLinearCombinedClassifier` for wide & deep models. -* `tf.estimator.LinearClassifier` for classifiers based on linear models. - -For the Iris problem, `tf.estimator.DNNClassifier` seems like the best choice. -Here's how we instantiated this Estimator: - -```python -# Build a DNN with 2 hidden layers and 10 nodes in each hidden layer. -classifier = tf.estimator.DNNClassifier( - feature_columns=my_feature_columns, - # Two hidden layers of 10 nodes each. - hidden_units=[10, 10], - # The model must choose between 3 classes. - n_classes=3) -``` - -## Train, Evaluate, and Predict - -Now that we have an Estimator object, we can call methods to do the following: - -* Train the model. -* Evaluate the trained model. -* Use the trained model to make predictions. - -### Train the model - -Train the model by calling the Estimator's `train` method as follows: - -```python -# Train the Model. -classifier.train( - input_fn=lambda:iris_data.train_input_fn(train_x, train_y, args.batch_size), - steps=args.train_steps) -``` - -Here we wrap up our `input_fn` call in a -[`lambda`](https://docs.python.org/3/tutorial/controlflow.html) -to capture the arguments while providing an input function that takes no -arguments, as expected by the Estimator. The `steps` argument tells the method -to stop training after a number of training steps. - -### Evaluate the trained model - -Now that the model has been trained, we can get some statistics on its -performance. The following code block evaluates the accuracy of the trained -model on the test data: - -```python -# Evaluate the model. -eval_result = classifier.evaluate( - input_fn=lambda:iris_data.eval_input_fn(test_x, test_y, args.batch_size)) - -print('\nTest set accuracy: {accuracy:0.3f}\n'.format(**eval_result)) -``` - -Unlike our call to the `train` method, we did not pass the `steps` -argument to evaluate. Our `eval_input_fn` only yields a single -[epoch](https://developers.google.com/machine-learning/glossary/#epoch) of data. - -Running this code yields the following output (or something similar): - -```none -Test set accuracy: 0.967 -``` - -The `eval_result` dictionary also contains the `average_loss` (mean loss per sample), the `loss` (mean loss per mini-batch) and the value of the estimator's `global_step` (the number of training iterations it underwent). - -### Making predictions (inferring) from the trained model - -We now have a trained model that produces good evaluation results. -We can now use the trained model to predict the species of an Iris flower -based on some unlabeled measurements. As with training and evaluation, we make -predictions using a single function call: - -```python -# Generate predictions from the model -expected = ['Setosa', 'Versicolor', 'Virginica'] -predict_x = { - 'SepalLength': [5.1, 5.9, 6.9], - 'SepalWidth': [3.3, 3.0, 3.1], - 'PetalLength': [1.7, 4.2, 5.4], - 'PetalWidth': [0.5, 1.5, 2.1], -} - -predictions = classifier.predict( - input_fn=lambda:iris_data.eval_input_fn(predict_x, - batch_size=args.batch_size)) -``` - -The `predict` method returns a Python iterable, yielding a dictionary of -prediction results for each example. The following code prints a few -predictions and their probabilities: - - -``` python -template = ('\nPrediction is "{}" ({:.1f}%), expected "{}"') - -for pred_dict, expec in zip(predictions, expected): - class_id = pred_dict['class_ids'][0] - probability = pred_dict['probabilities'][class_id] - - print(template.format(iris_data.SPECIES[class_id], - 100 * probability, expec)) -``` - -Running the preceding code yields the following output: - -``` None -... -Prediction is "Setosa" (99.6%), expected "Setosa" - -Prediction is "Versicolor" (99.8%), expected "Versicolor" - -Prediction is "Virginica" (97.9%), expected "Virginica" -``` - - -## Summary - -Pre-made Estimators are an effective way to quickly create standard models. - -Now that you've gotten started writing TensorFlow programs, consider the -following material: - -* [Checkpoints](../guide/checkpoints.md) to learn how to save and restore models. -* [Datasets for Estimators](../guide/datasets_for_estimators.md) to learn more about importing - data into your model. -* [Creating Custom Estimators](../guide/custom_estimators.md) to learn how to - write your own Estimator, customized for a particular problem. |