author A. Unique TensorFlower <gardener@tensorflow.org> 2016-11-08 13:28:56 -0800
committer TensorFlower Gardener <gardener@tensorflow.org> 2016-11-08 16:31:10 -0800
commit e8e72f64dcd5401d923707063da8b23fbc5d65d8 (patch)
tree 9560ca141b0b9629a18c1a3ed8aeb01d82a591b2
parent 128ce5d5f7aa9ed4b4354fe78af1c2c686327640 (diff)
Add docs for the estimators with examples.
Change: 138557107
-rw-r--r-- tensorflow/contrib/learn/python/learn/estimators/__init__.py | 249
1 file changed, 248 insertions(+), 1 deletion(-)
diff --git a/tensorflow/contrib/learn/python/learn/estimators/__init__.py b/tensorflow/contrib/learn/python/learn/estimators/__init__.py
index 620f23d1ff..ef5a16d7dc 100644
--- a/tensorflow/contrib/learn/python/learn/estimators/__init__.py
+++ b/tensorflow/contrib/learn/python/learn/estimators/__init__.py
@@ -13,7 +13,254 @@
# limitations under the License.
# ==============================================================================
-"""Estimators."""
+"""An estimator is a rule for calculating an estimate of a given quantity.
+
+# Estimators
+
+* **Estimators** are used to train and evaluate TensorFlow models.
+They support regression and classification problems.
+* **Classifiers** are functions that have discrete outcomes.
+* **Regressors** are functions that predict continuous values.
+
+## Choosing the correct estimator
+
+* For **Regression** problems use one of the following:
+  * `LinearRegressor`: Uses a linear model.
+  * `DNNRegressor`: Uses a DNN.
+  * `DNNLinearCombinedRegressor`: Uses a Wide & Deep model.
+  * `TensorForestEstimator`: Uses a random forest. Use `.predict()` for
+    regression problems.
+  * `Estimator`: Use when you need a custom model.
+
+* For **Classification** problems use one of the following:
+  * `LinearClassifier`: Multiclass classifier using a linear model.
+  * `DNNClassifier`: Multiclass classifier using a DNN.
+  * `DNNLinearCombinedClassifier`: Multiclass classifier using a Wide & Deep
+    model.
+  * `TensorForestEstimator`: Uses a random forest. Use `.predict_proba()` for
+    binary classification problems.
+  * `SVM`: Binary classifier using linear SVMs.
+  * `LogisticRegressor`: Use when you need a custom model for binary
+    classification.
+  * `Estimator`: Use when you need a custom model for N-class classification.
+
+## Pre-canned Estimators
+
+Pre-canned estimators are machine learning estimators premade for
+general-purpose problems. If you need more customization, you can always write
+your own custom estimator, as described in the section below.
+
+Pre-canned estimators are tested and optimized for speed and quality.
+
+### Define the feature columns
+
+Here are some possible types of feature columns used as inputs to a pre-canned
+estimator.
+
+The feature columns accepted vary by estimator, so the sections below show
+which feature columns are fed to each one.
+
+```python
+sparse_feature_a = sparse_column_with_keys(
+ column_name="sparse_feature_a", keys=["AB", "CD", ...])
+
+embedding_feature_a = embedding_column(
+ sparse_id_column=sparse_feature_a, dimension=3, combiner="sum")
+
+sparse_feature_b = sparse_column_with_hash_bucket(
+ column_name="sparse_feature_b", hash_bucket_size=1000)
+
+embedding_feature_b = embedding_column(
+ sparse_id_column=sparse_feature_b, dimension=16, combiner="sum")
+
+crossed_feature_a_x_b = crossed_column(
+ columns=[sparse_feature_a, sparse_feature_b], hash_bucket_size=10000)
+
+real_feature = real_valued_column("real_feature")
+real_feature_buckets = bucketized_column(
+ source_column=real_feature,
+ boundaries=[18, 25, 30, 35, 40, 45, 50, 55, 60, 65])
+```
+
+### Create the pre-canned estimator
+
+`DNNClassifier`, `DNNRegressor`, and `DNNLinearCombinedClassifier` are all used
+in much the same way. You can plug an optimizer and/or regularization into any
+of them.
+
+#### DNNClassifier
+
+A classifier for TensorFlow DNN models.
+
+```python
+my_features = [embedding_feature_a, embedding_feature_b]
+estimator = DNNClassifier(
+ feature_columns=my_features,
+ hidden_units=[1024, 512, 256],
+ optimizer=tf.train.ProximalAdagradOptimizer(
+ learning_rate=0.1,
+ l1_regularization_strength=0.001
+ ))
+```
+
+#### DNNRegressor
+
+A regressor for TensorFlow DNN models.
+
+```python
+my_features = [embedding_feature_a, embedding_feature_b]
+
+estimator = DNNRegressor(
+    feature_columns=my_features,
+    hidden_units=[1024, 512, 256])
+
+# Or estimator using the ProximalAdagradOptimizer optimizer with
+# regularization.
+estimator = DNNRegressor(
+ feature_columns=my_features,
+ hidden_units=[1024, 512, 256],
+ optimizer=tf.train.ProximalAdagradOptimizer(
+ learning_rate=0.1,
+ l1_regularization_strength=0.001
+ ))
+```
+
+#### DNNLinearCombinedClassifier
+
+A classifier for TensorFlow Linear and DNN joined training models.
+
+* Wide and deep model
+* Multi-class (2 classes by default)
+
+```python
+my_linear_features = [crossed_feature_a_x_b]
+my_deep_features = [embedding_feature_a, embedding_feature_b]
+estimator = DNNLinearCombinedClassifier(
+ # Common settings
+ n_classes=n_classes,
+ weight_column_name=weight_column_name,
+ # Wide settings
+ linear_feature_columns=my_linear_features,
+ linear_optimizer=tf.train.FtrlOptimizer(...),
+ # Deep settings
+ dnn_feature_columns=my_deep_features,
+ dnn_hidden_units=[1000, 500, 100],
+ dnn_optimizer=tf.train.AdagradOptimizer(...))
+```
+
+#### LinearClassifier
+
+Train a linear model to classify instances into one of multiple possible
+classes. When the number of possible classes is 2, this is binary
+classification.
+
+```python
+my_features = [sparse_feature_b, crossed_feature_a_x_b]
+estimator = LinearClassifier(
+ feature_columns=my_features,
+ optimizer=tf.train.FtrlOptimizer(
+ learning_rate=0.1,
+ l1_regularization_strength=0.001
+ ))
+```
+
+#### LinearRegressor
+
+Train a linear regression model to predict label values given observations of
+feature values.
+
+```python
+my_features = [sparse_feature_b, crossed_feature_a_x_b]
+estimator = LinearRegressor(
+ feature_columns=my_features)
+```
+
+#### SVM - Support Vector Machine
+
+Support Vector Machine (SVM) model for binary classification.
+
+Currently only linear SVMs are supported.
+
+```python
+my_features = [real_feature, sparse_feature_a]
+estimator = SVM(
+ example_id_column='example_id',
+ feature_columns=my_features,
+ l2_regularization=10.0)
+```
+
+#### TensorForestEstimator
+
+Supports regression and binary classification.
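+
+The other pre-canned estimators above each come with a construction example, so
+here is a minimal sketch for `TensorForestEstimator` as well, assuming the
+contrib `tensor_forest` module's `ForestHParams` container and
+`TensorForestEstimator` constructor; the hyperparameter values are
+illustrative only.
+
+```python
+from tensorflow.contrib.tensor_forest.client import random_forest
+from tensorflow.contrib.tensor_forest.python import tensor_forest
+
+# Illustrative hyperparameters; num_features must match your input data.
+params = tensor_forest.ForestHParams(
+    num_classes=2, num_features=40, num_trees=10, max_nodes=1000)
+estimator = random_forest.TensorForestEstimator(params)
+```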
+
+### Use the estimator
+
+There are two main functions for using estimators: one for training and one
+for evaluation. You can specify a different data source for each, so that
+training and evaluation use different datasets.
+
+```python
+# Input builders
+def input_fn_train():  # returns x, y
+ ...
+estimator.fit(input_fn=input_fn_train)
+
+def input_fn_eval():  # returns x, y
+ ...
+estimator.evaluate(input_fn=input_fn_eval)
+estimator.predict(x=x)
+```
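+
+As an illustration of what such an input builder might return, here is a
+minimal sketch. It assumes a numeric feature held in numpy arrays; the
+`feature_data` and `label_data` names are hypothetical, and the dict key reuses
+the `real_feature` column defined earlier.
+
+```python
+import numpy as np
+import tensorflow as tf
+
+feature_data = np.random.rand(100, 1).astype(np.float32)  # hypothetical features
+label_data = np.random.randint(0, 2, size=100)            # hypothetical labels
+
+def input_fn_train():
+  # Return a dict mapping column names to feature tensors, plus a label tensor.
+  x = {"real_feature": tf.constant(feature_data)}
+  y = tf.constant(label_data)
+  return x, y
+```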
+
+## Creating a Custom Estimator
+
+To create a custom `Estimator`, provide a function to `Estimator`'s
+constructor that builds your model (`model_fn`, below):
+
+
+```python
+estimator = tf.contrib.learn.Estimator(
+ model_fn=model_fn,
+ model_dir=model_dir) # Where the model's data (e.g., checkpoints)
+ # are saved.
+```
+
+Here is a skeleton of this function, with comments describing the logic it
+must implement:
+
+```python
+def model_fn(features, targets, mode, params):
+ # Logic to do the following:
+ # 1. Configure the model via TensorFlow operations
+ # 2. Define the loss function for training/evaluation
+ # 3. Define the training operation/optimizer
+ # 4. Generate predictions
+ return predictions, loss, train_op
+```
+
+You can check `mode` against
+`tf.contrib.learn.ModeKeys.{TRAIN, EVAL, INFER}` to parameterize the behavior
+of `model_fn`, as the sketch below illustrates.
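+
+This is a minimal sketch of branching on `mode` inside `model_fn`. It assumes
+`features` is a single dense tensor and `params` contains a `learning_rate`
+entry; the linear model and squared-error loss are illustrative choices, not
+part of any particular library recipe.
+
+```python
+def model_fn(features, targets, mode, params):
+  # 1. Configure the model: a single linear layer (illustrative only).
+  predictions = tf.contrib.layers.linear(features, 1)
+  # 2. Define the loss for training/evaluation.
+  loss = tf.reduce_mean(
+      tf.square(predictions - tf.reshape(targets, [-1, 1])))
+  # 3. Build the training op only when training.
+  train_op = None
+  if mode == tf.contrib.learn.ModeKeys.TRAIN:
+    train_op = tf.contrib.layers.optimize_loss(
+        loss=loss,
+        global_step=tf.contrib.framework.get_global_step(),
+        learning_rate=params["learning_rate"],
+        optimizer="SGD")
+  # 4. Return predictions, loss and the training op.
+  return predictions, loss, train_op
+```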
+
+In the Further Reading section below, there is an end-to-end TensorFlow
+tutorial for building a custom estimator.
+
+## Additional Estimators
+
+There are two additional estimators under
+`tensorflow.contrib.factorization.python.ops`:
+
+* K-Means
+* Gaussian mixture model (GMM) clustering
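+
+Both follow the same `fit`/`predict` interface as the estimators above. As a
+rough sketch for the GMM estimator (the module path follows the text above; the
+`GMM` class name, the `num_clusters` argument, and `input_fn_train` are
+assumptions made for illustration):
+
+```python
+from tensorflow.contrib.factorization.python.ops import gmm
+
+# Cluster the input points into 3 mixture components (illustrative settings).
+clusterer = gmm.GMM(num_clusters=3)
+clusterer.fit(input_fn=input_fn_train, steps=100)
+```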
+
+## Further reading
+
+For further reading, there are several tutorials with relevant topics,
+including:
+
+* [Overview of linear models](../../../tutorials/linear/overview.md)
+* [Linear model tutorial](../../../tutorials/wide/index.md)
+* [Wide and deep learning tutorial](../../../tutorials/wide_and_deep/index.md)
+* [Custom estimator tutorial](../../../tutorials/estimators/index.md)
+* [Building input functions](../../../tutorials/input_fn/index.md)
+"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function