path: root/tensorflow/g3doc/tutorials
author    A. Unique TensorFlower <gardener@tensorflow.org>    2016-06-28 08:24:35 -0800
committer TensorFlower Gardener <gardener@tensorflow.org>    2016-06-28 09:33:01 -0700
commited1ab8d654020ca534bbbe2365d4a2a1517e8a2e (patch)
treec7a7509220aa6cf7fe1859bc39af0e942c25a9d7 /tensorflow/g3doc/tutorials
parent7dd3a2bb99eb682f13b670975f7f4e9ea5b7b7b4 (diff)
Publishes tutorials for tf.contrib.learn linear models and wide and deep models.
Change: 126082003
Diffstat (limited to 'tensorflow/g3doc/tutorials')
-rw-r--r--  tensorflow/g3doc/tutorials/wide/index.md          482
-rw-r--r--  tensorflow/g3doc/tutorials/wide_and_deep/index.md 275
2 files changed, 757 insertions, 0 deletions
diff --git a/tensorflow/g3doc/tutorials/wide/index.md b/tensorflow/g3doc/tutorials/wide/index.md
new file mode 100644
index 0000000000..5dd409f4e4
--- /dev/null
+++ b/tensorflow/g3doc/tutorials/wide/index.md
@@ -0,0 +1,482 @@
+# TensorFlow Linear Model Tutorial
+
+In this tutorial, we will use the TF.Learn API in TensorFlow to solve a binary
+classification problem: Given census data about a person such as age, gender,
+education and occupation (the features), we will try to predict whether or not
+the person earns more than 50,000 dollars a year (the target label). We will
+train a **logistic regression** model; given an individual's information, the
+model will output a number between 0 and 1, which can be interpreted as the
+probability that the individual has an annual income of over 50,000 dollars.
+
+## Setup
+
+To try the code for this tutorial:
+
+1. [Install TensorFlow](../../get_started/os_setup.md) if you haven't
+already.
+
+2. Download [the tutorial code](
+https://www.tensorflow.org/code/tensorflow/examples/learn/wide_n_deep_tutorial.py).
+
+3. Install the pandas data analysis library. tf.learn doesn't require pandas, but it does support it, and this tutorial uses pandas. To install pandas:
+ 1. Get `pip`:
+
+ ```shell
+ # Ubuntu/Linux 64-bit
+ $ sudo apt-get install python-pip python-dev
+
+ # Mac OS X
+ $ sudo easy_install pip
+ $ sudo easy_install --upgrade six
+ ```
+
+ 2. Use `pip` to install pandas:
+
+ ```shell
+ $ sudo pip install pandas
+ ```
+
+    If you have trouble installing pandas, consult the
+    [instructions](http://pandas.pydata.org/pandas-docs/stable/install.html)
+    on the pandas site.
+
+4. Execute the tutorial code with the following command to train the linear
+model described in this tutorial:
+
+ ```shell
+ $ python wide_n_deep_tutorial.py --model_type=wide
+ ```
+
+Read on to find out how this code builds its linear model.
+
+## Reading The Census Data
+
+The dataset we'll be using is the
+[Census Income Dataset](https://archive.ics.uci.edu/ml/datasets/Census+Income).
+You can download the
+[training data](https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data)
+and
+[test data](https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.test)
+manually or use code like this:
+
+```python
+import tempfile
+import urllib
+train_file = tempfile.NamedTemporaryFile()
+test_file = tempfile.NamedTemporaryFile()
+urllib.urlretrieve("https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data", train_file.name)
+urllib.urlretrieve("https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.test", test_file.name)
+```
+
+Once the CSV files are downloaded, let's read them into
+[pandas](http://pandas.pydata.org/) dataframes.
+
+```python
+import pandas as pd
+COLUMNS = ["age", "workclass", "fnlwgt", "education", "education_num",
+ "marital_status", "occupation", "relationship", "race", "gender",
+ "capital_gain", "capital_loss", "hours_per_week", "native_country",
+ "income_bracket"]
+df_train = pd.read_csv(train_file, names=COLUMNS, skipinitialspace=True)
+df_test = pd.read_csv(test_file, names=COLUMNS, skipinitialspace=True, skiprows=1)
+```
+
+Since the task is a binary classification problem, we'll construct a label
+column named "label" whose value is 1 if the income is over 50K, and 0
+otherwise.
+
+```python
+LABEL_COLUMN = "label"
+df_train[LABEL_COLUMN] = (df_train["income_bracket"].apply(lambda x: ">50K" in x)).astype(int)
+df_test[LABEL_COLUMN] = (df_test["income_bracket"].apply(lambda x: ">50K" in x)).astype(int)
+```
+
+Next, let's take a look at the dataframe and see which columns we can use to
+predict the target label. The columns can be grouped into two types—categorical
+and continuous columns:
+
+* A column is called **categorical** if its value can only be one of the
+ categories in a finite set. For example, the native country of a person
+ (U.S., India, Japan, etc.) or the education level (high school, college,
+ etc.) are categorical columns.
+* A column is called **continuous** if its value can be any numerical value in
+ a continuous range. For example, the capital gain of a person (e.g. $14,084)
+ is a continuous column.
+
+```python
+CATEGORICAL_COLUMNS = ["workclass", "education", "marital_status", "occupation",
+ "relationship", "race", "gender", "native_country"]
+CONTINUOUS_COLUMNS = ["age", "education_num", "capital_gain", "capital_loss", "hours_per_week"]
+```
+
+Here's a list of columns available in the Census Income dataset:
+
+| Column Name    | Type        | Description | {.sortable}
+| -------------- | ----------- | ----------- |
+| age            | Continuous  | The age of the individual. |
+| workclass      | Categorical | The type of employer the individual has (government, military, private, etc.). |
+| fnlwgt         | Continuous  | The number of people the census takers believe that observation represents (sample weight). This variable will not be used. |
+| education      | Categorical | The highest level of education achieved for that individual. |
+| education_num  | Continuous  | The highest level of education in numerical form. |
+| marital_status | Categorical | Marital status of the individual. |
+| occupation     | Categorical | The occupation of the individual. |
+| relationship   | Categorical | Wife, Own-child, Husband, Not-in-family, Other-relative, Unmarried. |
+| race           | Categorical | White, Asian-Pac-Islander, Amer-Indian-Eskimo, Other, Black. |
+| gender         | Categorical | Female, Male. |
+| capital_gain   | Continuous  | Capital gains recorded. |
+| capital_loss   | Continuous  | Capital losses recorded. |
+| hours_per_week | Continuous  | Hours worked per week. |
+| native_country | Categorical | Country of origin of the individual. |
+| income_bracket | Categorical | ">50K" or "<=50K", meaning whether the person makes more than \$50,000 annually. |
+
+## Converting Data into Tensors
+
+When building a TF.Learn model, the input data is specified by means of an Input
+Builder function. This builder function will not be called until it is later
+passed to TF.Learn methods such as `fit` and `evaluate`. The purpose of this
+function is to construct the input data, which is represented in the form of
+[Tensors](https://www.tensorflow.org/versions/r0.9/api_docs/python/framework.html#Tensor)
+or
+[SparseTensors](https://www.tensorflow.org/versions/r0.9/api_docs/python/sparse_ops.html#SparseTensor).
+In more detail, the Input Builder function returns the following pair:
+
+1. `feature_cols`: A dict from feature column names to `Tensors` or
+ `SparseTensors`.
+2. `label`: A `Tensor` containing the label column.
+
+The keys of `feature_cols` will be used to construct columns in the
+next section. Because we want to call the `fit` and `evaluate` methods with
+different data, we define two input builder functions,
+`train_input_fn` and `eval_input_fn`, which are identical except that they pass
+different dataframes to `input_fn`. Note that `input_fn` will be called while
+constructing the TensorFlow graph, not while running the graph. What it
+returns is a representation of the input data as the fundamental unit of
+TensorFlow computations, a `Tensor` (or `SparseTensor`).
+
+Our model represents the input data as *constant* tensors, meaning that the
+tensor represents a constant value, in this case the values of a particular
+column of `df_train` or `df_test`. This is the simplest way to pass data into
+TensorFlow. Another more advanced way to represent input data would be to
+construct an
+[Input Reader](https://www.tensorflow.org/versions/r0.9/api_docs/python/io_ops.html#inputs-and-readers)
+that represents a file or other data source, and iterates through the file as
+TensorFlow runs the graph. Each continuous column in the train or test dataframe
+will be converted into a `Tensor`, which in general is a good format to
+represent dense data. For categorical data, we must represent the data as a
+`SparseTensor`, which is a good format for representing sparse data.
+
+```python
+import tensorflow as tf
+
+def input_fn(df):
+ # Creates a dictionary mapping from each continuous feature column name (k) to
+ # the values of that column stored in a constant Tensor.
+ continuous_cols = {k: tf.constant(df[k].values)
+ for k in CONTINUOUS_COLUMNS}
+ # Creates a dictionary mapping from each categorical feature column name (k)
+ # to the values of that column stored in a tf.SparseTensor.
+ categorical_cols = {k: tf.SparseTensor(
+ indices=[[i, 0] for i in range(df[k].size)],
+ values=df[k].values,
+ shape=[df[k].size, 1])
+ for k in CATEGORICAL_COLUMNS}
+ # Merges the two dictionaries into one.
+ feature_cols = dict(continuous_cols.items() + categorical_cols.items())
+ # Converts the label column into a constant Tensor.
+ label = tf.constant(df[LABEL_COLUMN].values)
+ # Returns the feature columns and the label.
+ return feature_cols, label
+
+def train_input_fn():
+ return input_fn(df_train)
+
+def eval_input_fn():
+ return input_fn(df_test)
+```
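+
+To make the `SparseTensor` construction in `input_fn` more concrete, here is a
+small illustrative sketch with a hypothetical three-row column (not part of the
+tutorial code):
+
+```python
+# Illustration only: a toy 3-row categorical column represented the same way
+# input_fn represents each categorical column.
+toy_gender = ["female", "male", "female"]          # hypothetical column values
+toy_sparse = tf.SparseTensor(
+    indices=[[i, 0] for i in range(len(toy_gender))],  # [[0, 0], [1, 0], [2, 0]]
+    values=toy_gender,                                 # the raw feature strings
+    shape=[len(toy_gender), 1])                        # one value per row
+```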
+
+## Selecting and Engineering Features for the Model
+
+Selecting and crafting the right set of feature columns is key to learning an
+effective model. A **feature column** can be either one of the raw columns in
+the original dataframe (let's call them **base feature columns**), or any new
+columns created based on some transformations defined over one or multiple base
+columns (let's call them **derived feature columns**). Basically, "feature
+column" is an abstract concept of any raw or derived variable that can be used
+to predict the target label.
+
+### Base Categorical Feature Columns
+
+To define a feature column for a categorical feature, we can create a
+`SparseColumn` using the TF.Learn API. If you know the set of all possible
+feature values of a column and there are only a few of them, you can use
+`sparse_column_with_keys`. Each key in the list will be assigned an
+auto-incremented ID starting from 0. For example, for the `gender` column we can
+assign the feature string "female" to an integer ID of 0 and "male" to 1 by
+doing:
+
+```python
+gender = tf.contrib.layers.sparse_column_with_keys(
+ column_name="gender", keys=["female", "male"])
+```
+
+What if we don't know the set of possible values in advance? Not a problem. We
+can use `sparse_column_with_hash_bucket` instead:
+
+```python
+education = tf.contrib.layers.sparse_column_with_hash_bucket("education", hash_bucket_size=1000)
+```
+
+Each possible value in the feature column `education` will then be hashed to an
+integer ID as it is encountered during training. See an example illustration
+below:
+
+ID | Feature
+--- | -------------
+... |
+9 | `"Bachelors"`
+... |
+103 | `"Doctorate"`
+... |
+375 | `"Masters"`
+... |
+
+No matter which way we choose to define a `SparseColumn`, each feature string
+will be mapped into an integer ID by looking up a fixed mapping or by hashing.
+Note that hashing collisions are possible, but may not significantly impact the
+model quality. Under the hood, the `LinearModel` class is responsible for
+managing the mapping and creating `tf.Variable` to store the model parameters
+(also known as model weights) for each feature ID. The model parameters will be
+learned through the model training process we'll go through later.
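+
+The hashing variant can be pictured with a toy sketch (illustrative only;
+TensorFlow uses its own internal fingerprint hash, not Python's `hash`):
+
+```python
+# Illustration only: map a feature string to one of hash_bucket_size integer IDs.
+def toy_bucket_id(feature_value, hash_bucket_size=1000):
+    return hash(feature_value) % hash_bucket_size
+
+print toy_bucket_id("Bachelors")  # some ID in [0, 1000)
+print toy_bucket_id("Doctorate")  # usually a different ID in [0, 1000)
+```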
+
+We'll use a similar trick to define the other categorical features:
+
+```python
+race = tf.contrib.layers.sparse_column_with_keys(column_name="race", keys=[
+ "Amer-Indian-Eskimo", "Asian-Pac-Islander", "Black", "Other", "White"])
+marital_status = tf.contrib.layers.sparse_column_with_hash_bucket("marital_status", hash_bucket_size=100)
+relationship = tf.contrib.layers.sparse_column_with_hash_bucket("relationship", hash_bucket_size=100)
+workclass = tf.contrib.layers.sparse_column_with_hash_bucket("workclass", hash_bucket_size=100)
+occupation = tf.contrib.layers.sparse_column_with_hash_bucket("occupation", hash_bucket_size=1000)
+native_country = tf.contrib.layers.sparse_column_with_hash_bucket("native_country", hash_bucket_size=1000)
+```
+
+### Base Continuous Feature Columns
+
+Similarly, we can define a `RealValuedColumn` for each continuous feature column
+that we want to use in the model:
+
+```python
+age = tf.contrib.layers.real_valued_column("age")
+education_num = tf.contrib.layers.real_valued_column("education_num")
+capital_gain = tf.contrib.layers.real_valued_column("capital_gain")
+capital_loss = tf.contrib.layers.real_valued_column("capital_loss")
+hours_per_week = tf.contrib.layers.real_valued_column("hours_per_week")
+```
+
+### Making Continuous Features Categorical through Bucketization
+
+Sometimes the relationship between a continuous feature and the label is not
+linear. As a hypothetical example, a person's income may grow with age in the
+early stage of one's career, then the growth may slow at some point, and finally
+the income decreases after retirement. In this scenario, using the raw `age` as
+a real-valued feature column might not be a good choice because the model can
+only learn one of the three cases:
+
+1. Income always increases at some rate as age grows (positive correlation),
+1. Income always decreases at some rate as age grows (negative correlation), or
+1. Income stays the same no matter at what age (no correlation)
+
+If we want to learn the fine-grained correlation between income and each age
+group separately, we can leverage **bucketization**. Bucketization is the process
+of dividing the entire range of a continuous feature into a set of consecutive
+bins/buckets, and then converting the original numerical feature into a bucket
+ID (as a categorical feature) depending on which bucket that value falls into.
+So, we can define a `bucketized_column` over `age` as:
+
+```python
+age_buckets = tf.contrib.layers.bucketized_column(age, boundaries=[18, 25, 30, 35, 40, 45, 50, 55, 60, 65])
+```
+
+where `boundaries` is a list of bucket boundaries. In this case, there are
+10 boundaries, resulting in 11 age group buckets (from age 17 and below, 18-24,
+25-29, ..., to 65 and over).
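+
+To see what this mapping does, here is an illustrative sketch (arithmetic only;
+the real conversion is performed by `bucketized_column` inside the TensorFlow
+graph):
+
+```python
+import bisect
+
+boundaries = [18, 25, 30, 35, 40, 45, 50, 55, 60, 65]
+
+def toy_age_bucket(age):
+    # Bucket ID = number of boundaries that are <= age (illustration only).
+    return bisect.bisect_right(boundaries, age)
+
+print toy_age_bucket(17)  # 0  -> "17 and below"
+print toy_age_bucket(18)  # 1  -> "18-24"
+print toy_age_bucket(42)  # 5  -> "40-44"
+print toy_age_bucket(70)  # 10 -> "65 and over"
+```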
+
+### Intersecting Multiple Columns with CrossedColumn
+
+Using each base feature column separately may not be enough to explain the data.
+For example, the correlation between education and the label (earning > 50,000
+dollars) may be different for different occupations. Therefore, if we only learn
+a single model weight for `education="Bachelors"` and `education="Masters"`, we
+won't be able to capture every single education-occupation combination (e.g.
+distinguishing between `education="Bachelors" AND occupation="Exec-managerial"`
+and `education="Bachelors" AND occupation="Craft-repair"`). To learn the
+differences between different feature combinations, we can add **crossed feature
+columns** to the model.
+
+```python
+education_x_occupation = tf.contrib.layers.crossed_column([education, occupation], hash_bucket_size=int(1e4))
+```
+
+We can also create a `CrossedColumn` over more than two columns. Each
+constituent column can be either a base feature column that is categorical
+(`SparseColumn`), a bucketized real-valued feature column (`BucketizedColumn`),
+or even another `CrossedColumn`. Here's an example:
+
+```python
+age_buckets_x_race_x_occupation = tf.contrib.layers.crossed_column(
+ [age_buckets, race, occupation], hash_bucket_size=int(1e6))
+```
+
+## Defining The Logistic Regression Model
+
+After processing the input data and defining all the feature columns, we're now
+ready to put them all together and build a Logistic Regression model. In the
+previous section we've seen several types of base and derived feature columns,
+including:
+
+* `SparseColumn`
+* `RealValuedColumn`
+* `BucketizedColumn`
+* `CrossedColumn`
+
+All of these are subclasses of the abstract `FeatureColumn` class, and can be
+added to the `feature_columns` field of a model:
+
+```python
+model_dir = tempfile.mkdtemp()
+m = tf.contrib.learn.LinearClassifier(feature_columns=[
+ gender, native_country, education, occupation, workclass, marital_status, race,
+ age_buckets, education_x_occupation, age_buckets_x_race_x_occupation],
+ model_dir=model_dir)
+```
+
+The model also automatically learns a bias term, which controls the prediction
+one would make without observing any features (see the section "How Logistic
+Regression Works" for more explanations). The learned model files will be stored
+in `model_dir`.
+
+## Training and Evaluating Our Model
+
+After adding all the features to the model, now let's look at how to actually
+train the model. Training a model is just a one-liner using the TF.Learn API:
+
+```python
+m.fit(input_fn=train_input_fn, steps=200)
+```
+
+After the model is trained, we can evaluate how good our model is at predicting
+the labels of the holdout data:
+
+```python
+results = m.evaluate(input_fn=eval_input_fn, steps=1)
+for key in sorted(results):
+ print "%s: %s" % (key, results[key])
+```
+
+The first line of the output should be something like `accuracy: 0.83557522`,
+which means the accuracy is 83.6%. Feel free to try more features and
+transformations and see if you can do even better!
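+
+For example, one experiment you could try (a sketch; everything else is kept the
+same as above) is to add another crossed column, such as `native_country`
+crossed with `occupation`, and retrain:
+
+```python
+country_x_occupation = tf.contrib.layers.crossed_column(
+    [native_country, occupation], hash_bucket_size=int(1e4))
+
+m = tf.contrib.learn.LinearClassifier(feature_columns=[
+    gender, native_country, education, occupation, workclass, marital_status, race,
+    age_buckets, education_x_occupation, age_buckets_x_race_x_occupation,
+    country_x_occupation],  # the newly added derived feature column
+    model_dir=tempfile.mkdtemp())
+m.fit(input_fn=train_input_fn, steps=200)
+```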
+
+If you'd like to see a working end-to-end example, you can download our
+[example code](https://www.tensorflow.org/code/tensorflow/examples/learn/wide_n_deep_tutorial.py)
+and set the `model_type` flag to `wide`.
+
+## Adding Regularization to Prevent Overfitting
+
+Regularization is a technique used to avoid **overfitting**. Overfitting happens
+when a model does well on the data it is trained on, but worse on test data
+that it has not seen before, such as live traffic. Overfitting generally
+occurs when a model is excessively complex, such as having too many parameters
+relative to the number of observed training examples. Regularization allows you
+to control the model's complexity and makes the model generalize better to
+unseen data.
+
+In the Linear Model library, you can add L1 and L2 regularizations to the model
+as:
+
+```python
+m = tf.contrib.learn.LinearClassifier(feature_columns=[
+ gender, native_country, education, occupation, workclass, marital_status, race,
+ age_buckets, education_x_occupation, age_buckets_x_race_x_occupation],
+ optimizer=tf.train.FtrlOptimizer(
+ learning_rate=0.1,
+ l1_regularization_strength=1.0,
+ l2_regularization_strength=1.0),
+ model_dir=model_dir)
+```
+
+One important difference between L1 and L2 regularization is that L1
+regularization tends to push model weights to exactly zero, creating sparser
+models, whereas L2 regularization pushes the model weights closer to
+zero but not necessarily all the way to zero. Therefore, if you increase the strength of L1
+regularization, you will have a smaller model size because many of the model
+weights will be zero. This is often desirable when the feature space is very
+large but sparse, and when there are resource constraints that prevent you from
+serving a model that is too large.
+
+In practice, you should try various combinations of L1 and L2 regularization
+strengths and find the parameters that best control overfitting while giving
+you a desirable model size.
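+
+A minimal sketch of such a sweep (the grid of strengths and the step count below
+are arbitrary illustrative choices, not tuned recommendations):
+
+```python
+# Try a small grid of L1/L2 strengths and compare held-out accuracy.
+for l1 in [0.0, 1.0, 10.0]:
+  for l2 in [0.0, 1.0, 10.0]:
+    m = tf.contrib.learn.LinearClassifier(
+        feature_columns=[
+            gender, native_country, education, occupation, workclass,
+            marital_status, race, age_buckets, education_x_occupation,
+            age_buckets_x_race_x_occupation],
+        optimizer=tf.train.FtrlOptimizer(
+            learning_rate=0.1,
+            l1_regularization_strength=l1,
+            l2_regularization_strength=l2),
+        model_dir=tempfile.mkdtemp())
+    m.fit(input_fn=train_input_fn, steps=200)
+    results = m.evaluate(input_fn=eval_input_fn, steps=1)
+    print "l1=%s l2=%s accuracy=%s" % (l1, l2, results["accuracy"])
+```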
+
+## How Logistic Regression Works
+
+Finally, let's take a minute to talk about what the Logistic Regression model
+actually looks like in case you're not already familiar with it. We'll denote
+the label as $$Y$$, and the set of observed features as a feature vector
+$$\mathbf{x}=[x_1, x_2, ..., x_d]$$. We define $$Y=1$$ if an individual earned >
+50,000 dollars and $$Y=0$$ otherwise. In Logistic Regression, the probability of
+the label being positive ($$Y=1$$) given the features $$\mathbf{x}$$ is given
+as:
+
+$$ P(Y=1|\mathbf{x}) = \frac{1}{1+\exp(-(\mathbf{w}^T\mathbf{x}+b))}$$
+
+where $$\mathbf{w}=[w_1, w_2, ..., w_d]$$ are the model weights for the features
+$$\mathbf{x}=[x_1, x_2, ..., x_d]$$. $$b$$ is a constant that is often called
+the **bias** of the model. The equation consists of two parts: a linear model and
+a logistic function:
+
+* **Linear Model**: First, we can see that $$\mathbf{w}^T\mathbf{x}+b = b +
+ w_1x_1 + ... +w_dx_d$$ is a linear model where the output is a linear
+ function of the input features $$\mathbf{x}$$. The bias $$b$$ is the
+ prediction one would make without observing any features. The model weight
+ $$w_i$$ reflects how the feature $$x_i$$ is correlated with the positive
+ label. If $$x_i$$ is positively correlated with the positive label, the
+  learned weight $$w_i$$ will tend to be positive, pushing the probability
+  $$P(Y=1|\mathbf{x})$$ closer to 1. On the other hand, if $$x_i$$ is negatively
+  correlated with the positive label, then the learned weight $$w_i$$ will tend
+  to be negative, pushing the probability $$P(Y=1|\mathbf{x})$$ closer to 0.
+
+* **Logistic Function**: Second, we can see that there's a logistic function
+ (also known as the sigmoid function) $$S(t) = 1/(1+\exp(-t))$$ being applied
+ to the linear model. The logistic function is used to convert the output of
+ the linear model $$\mathbf{w}^T\mathbf{x}+b$$ from any real number into the
+ range of $$[0, 1]$$, which can be interpreted as a probability.
+
+Model training is an optimization problem: The goal is to find a set of model
+weights (i.e. model parameters) to minimize a **loss function** defined over the
+training data, such as logistic loss for Logistic Regression models. The loss
+function measures the discrepancy between the ground-truth label and the model's
+prediction. If the prediction is very close to the ground-truth label, the loss
+value will be low; if the prediction is very far from the label, then the loss
+value will be high.
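+
+As a small numeric illustration of the formulas above (the weights, features,
+and label below are made up, not learned from the Census data):
+
+```python
+import math
+
+w = [0.8, -0.5, 0.3]   # toy model weights
+b = -0.2               # toy bias
+x = [1.0, 2.0, 0.5]    # toy feature vector
+y = 1                  # ground-truth label
+
+t = sum(w_i * x_i for w_i, x_i in zip(w, x)) + b   # linear part: w^T x + b = -0.25
+p = 1.0 / (1.0 + math.exp(-t))                     # logistic function: about 0.438
+loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))  # logistic loss: about 0.83
+print p, loss
+```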
+
+## Learn Deeper
+
+If you're interested in learning more, check out our [Wide & Deep Learning
+Tutorial](../wide_and_deep/) where we'll show you how to combine
+the strengths of linear models and deep neural networks by jointly training them
+using the TF.Learn API.
diff --git a/tensorflow/g3doc/tutorials/wide_and_deep/index.md b/tensorflow/g3doc/tutorials/wide_and_deep/index.md
new file mode 100644
index 0000000000..910e91e1d0
--- /dev/null
+++ b/tensorflow/g3doc/tutorials/wide_and_deep/index.md
@@ -0,0 +1,275 @@
+# TensorFlow Wide & Deep Learning Tutorial
+
+In the previous [TensorFlow Linear Model Tutorial](../wide/),
+we trained a logistic regression model to predict the probability that an
+individual has an annual income of over 50,000 dollars using the [Census Income
+Dataset](https://archive.ics.uci.edu/ml/datasets/Census+Income). TensorFlow is
+great for training deep neural networks too, so you might be wondering which one
+you should choose. Well, why not both? Would it be possible to combine the
+strengths of both in one model?
+
+In this tutorial, we'll introduce how to use the TF.Learn API to jointly train a
+wide linear model and a deep feed-forward neural network. This approach combines
+the strengths of memorization and generalization. It's useful for generic
+large-scale regression and classification problems with sparse input features
+(e.g., categorical features with a large number of possible feature values). If
+you're interested in learning more about how Wide & Deep Learning works, please
+check out our [research paper](http://arxiv.org/abs/1606.07792).
+
+![Wide & Deep Spectrum of Models](../../images/wide_n_deep.svg "Wide & Deep")
+
+The figure above shows a comparison of a wide model (logistic regression with
+sparse features and transformations), a deep model (feed-forward neural network
+with an embedding layer and several hidden layers), and a Wide & Deep model
+(joint training of both). At a high level, there are only 3 steps to configure a
+wide, deep, or Wide & Deep model using the TF.Learn API:
+
+1. Select features for the wide part: Choose the sparse base columns and
+ crossed columns you want to use.
+1. Select features for the deep part: Choose the continuous columns, the
+ embedding dimension for each categorical column, and the hidden layer sizes.
+1. Put them all together in a Wide & Deep model
+ (`DNNLinearCombinedClassifier`).
+
+And that's it! Let's go through a simple example.
+
+## Setup
+
+To try the code for this tutorial:
+
+1. [Install TensorFlow](../../get_started/os_setup.md) if you haven't
+already.
+
+2. Download [the tutorial code](
+https://www.tensorflow.org/code/tensorflow/examples/learn/wide_n_deep_tutorial.py).
+
+3. Install the pandas data analysis library. tf.learn doesn't require pandas, but it does support it, and this tutorial uses pandas. To install pandas:
+ 1. Get `pip`:
+
+ ```shell
+ # Ubuntu/Linux 64-bit
+ $ sudo apt-get install python-pip python-dev
+
+ # Mac OS X
+ $ sudo easy_install pip
+ $ sudo easy_install --upgrade six
+ ```
+
+ 2. Use `pip` to install pandas:
+
+ ```shell
+ $ sudo pip install pandas
+ ```
+
+    If you have trouble installing pandas, consult the
+    [instructions](http://pandas.pydata.org/pandas-docs/stable/install.html)
+    on the pandas site.
+
+4. Execute the tutorial code with the following command to train the wide & deep
+model described in this tutorial:
+
+ ```shell
+ $ python wide_n_deep_tutorial.py --model_type=wide_n_deep
+ ```
+
+Read on to find out how this code builds its wide & deep model.
+
+
+## Define Base Feature Columns
+
+First, let's define the base categorical and continuous feature columns that
+we'll use. These base columns will be the building blocks used by both the wide
+part and the deep part of the model.
+
+```python
+import tensorflow as tf
+
+# Categorical base columns.
+gender = tf.contrib.layers.sparse_column_with_keys(column_name="gender", keys=["female", "male"])
+race = tf.contrib.layers.sparse_column_with_keys(column_name="race", keys=[
+ "Amer-Indian-Eskimo", "Asian-Pac-Islander", "Black", "Other", "White"])
+education = tf.contrib.layers.sparse_column_with_hash_bucket("education", hash_bucket_size=1000)
+marital_status = tf.contrib.layers.sparse_column_with_hash_bucket("marital_status", hash_bucket_size=100)
+relationship = tf.contrib.layers.sparse_column_with_hash_bucket("relationship", hash_bucket_size=100)
+workclass = tf.contrib.layers.sparse_column_with_hash_bucket("workclass", hash_bucket_size=100)
+occupation = tf.contrib.layers.sparse_column_with_hash_bucket("occupation", hash_bucket_size=1000)
+native_country = tf.contrib.layers.sparse_column_with_hash_bucket("native_country", hash_bucket_size=1000)
+
+# Continuous base columns.
+age = tf.contrib.layers.real_valued_column("age")
+age_buckets = tf.contrib.layers.bucketized_column(age, boundaries=[18, 25, 30, 35, 40, 45, 50, 55, 60, 65])
+education_num = tf.contrib.layers.real_valued_column("education_num")
+capital_gain = tf.contrib.layers.real_valued_column("capital_gain")
+capital_loss = tf.contrib.layers.real_valued_column("capital_loss")
+hours_per_week = tf.contrib.layers.real_valued_column("hours_per_week")
+```
+
+## The Wide Model: Linear Model with Crossed Feature Columns
+
+The wide model is a linear model with a wide set of sparse and crossed feature
+columns:
+
+```python
+wide_columns = [
+ gender, native_country, education, occupation, workclass, marital_status, relationship, age_buckets,
+ tf.contrib.layers.crossed_column([education, occupation], hash_bucket_size=int(1e4)),
+ tf.contrib.layers.crossed_column([native_country, occupation], hash_bucket_size=int(1e4)),
+ tf.contrib.layers.crossed_column([age_buckets, race, occupation], hash_bucket_size=int(1e6))]
+```
+
+Wide models with crossed feature columns can memorize sparse interactions
+between features effectively. That being said, one limitation of crossed feature
+columns is that they do not generalize to feature combinations that have not
+appeared in the training data. Let's add a deep model with embeddings to fix
+that.
+
+## The Deep Model: Neural Network with Embeddings
+
+The deep model is a feed-forward neural network, as shown in the previous
+figure. Each of the sparse, high-dimensional categorical features is first
+converted into a low-dimensional, dense real-valued vector, often referred to
+as an embedding vector. These low-dimensional dense embedding vectors are
+concatenated with the continuous features, and then fed into the hidden layers
+of a neural network in the forward pass. The embedding values are initialized
+randomly, and are trained along with all other model parameters to minimize the
+training loss. If you're interested in learning more about embeddings, check out
+the TensorFlow tutorial on
+[Vector Representations of Words](https://www.tensorflow.org/versions/r0.9/tutorials/word2vec/index.html),
+or [Word Embedding](https://en.wikipedia.org/wiki/Word_embedding) on Wikipedia.
+
+We'll configure the embeddings for the categorical columns using
+`embedding_column`, and concatenate them with the continuous columns:
+
+```python
+deep_columns = [
+ tf.contrib.layers.embedding_column(workclass, dimension=8),
+ tf.contrib.layers.embedding_column(education, dimension=8),
+ tf.contrib.layers.embedding_column(marital_status, dimension=8),
+ tf.contrib.layers.embedding_column(gender, dimension=8),
+ tf.contrib.layers.embedding_column(relationship, dimension=8),
+ tf.contrib.layers.embedding_column(race, dimension=8),
+ tf.contrib.layers.embedding_column(native_country, dimension=8),
+ tf.contrib.layers.embedding_column(occupation, dimension=8),
+ age, education_num, capital_gain, capital_loss, hours_per_week]
+```
+
+The higher the `dimension` of the embedding is, the more degrees of freedom the
+model will have to learn the representations of the features. For simplicity, we
+set the dimension to 8 for all feature columns here. Empirically, a more
+informed choice for the number of dimensions is to start with a value on the
+order of $$k\log_2(n)$$ or $$k\sqrt[4]{n}$$, where $$n$$ is the number of unique
+features in a feature column and $$k$$ is a small constant (usually smaller than
+10).
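+
+As a quick illustration of this heuristic (the value of $$n$$ below is
+hypothetical, not taken from the Census columns):
+
+```python
+import math
+
+def suggested_embedding_dims(n, k=4):
+    # Two rough starting points for the embedding dimension of a column with
+    # n unique feature values (heuristic only, not a hard rule).
+    return k * math.log(n, 2), k * n ** 0.25
+
+print suggested_embedding_dims(1000)  # roughly (39.9, 22.5)
+```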
+
+Through dense embeddings, deep models can generalize better and make predictions
+on feature pairs that were previously unseen in the training data. However, it
+is difficult to learn effective low-dimensional representations for feature
+columns when the underlying interaction matrix between two feature columns is
+sparse and high-rank. In such cases, the interaction between most feature pairs
+should be zero except for a few, but dense embeddings will lead to nonzero
+predictions for all feature pairs, and thus can over-generalize. On the other
+hand, linear models with crossed features can memorize these “exception rules”
+effectively with fewer model parameters.
+
+Now, let's see how to jointly train wide and deep models and allow them to
+complement each other’s strengths and weaknesses.
+
+## Combining Wide and Deep Models into One
+
+The wide model and deep model are combined by summing up their final output
+log odds as the prediction, then feeding the prediction to a logistic loss
+function. All the graph definition and variable allocations have already been
+handled for you under the hood, so you simply need to create a
+`DNNLinearCombinedClassifier`:
+
+```python
+import tempfile
+model_dir = tempfile.mkdtemp()
+m = tf.contrib.learn.DNNLinearCombinedClassifier(
+ model_dir=model_dir,
+ linear_feature_columns=wide_columns,
+ dnn_feature_columns=deep_columns,
+ dnn_hidden_units=[100, 50])
+```
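+
+To make the combination rule concrete, here is a toy sketch of what summing the
+output log odds means (illustrative only; the actual computation happens inside
+`DNNLinearCombinedClassifier`):
+
+```python
+import math
+
+wide_logit = 0.9    # hypothetical log odds from the linear (wide) part
+deep_logit = -0.3   # hypothetical log odds from the DNN (deep) part
+
+combined_logit = wide_logit + deep_logit     # joint prediction in log-odds space
+p = 1.0 / (1.0 + math.exp(-combined_logit))  # sigmoid -> probability, about 0.65
+print p
+```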
+
+## Training and Evaluating The Model
+
+Before we train the model, let's read in the Census dataset as we did in the
+[TensorFlow Linear Model tutorial](../wide/). The code for
+input data processing is provided here again for your convenience:
+
+```python
+import pandas as pd
+import urllib
+
+# Define the column names for the data sets.
+COLUMNS = ["age", "workclass", "fnlwgt", "education", "education_num",
+ "marital_status", "occupation", "relationship", "race", "gender",
+ "capital_gain", "capital_loss", "hours_per_week", "native_country", "income_bracket"]
+LABEL_COLUMN = 'label'
+CATEGORICAL_COLUMNS = ["workclass", "education", "marital_status", "occupation",
+ "relationship", "race", "gender", "native_country"]
+CONTINUOUS_COLUMNS = ["age", "education_num", "capital_gain", "capital_loss",
+ "hours_per_week"]
+
+# Download the training and test data to temporary files.
+# Alternatively, you can download them yourself and change train_file and
+# test_file to your own paths.
+train_file = tempfile.NamedTemporaryFile()
+test_file = tempfile.NamedTemporaryFile()
+urllib.urlretrieve("https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data", train_file.name)
+urllib.urlretrieve("https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.test", test_file.name)
+
+# Read the training and test data sets into pandas dataframes.
+df_train = pd.read_csv(train_file, names=COLUMNS, skipinitialspace=True)
+df_test = pd.read_csv(test_file, names=COLUMNS, skipinitialspace=True, skiprows=1)
+df_train[LABEL_COLUMN] = (df_train['income_bracket'].apply(lambda x: '>50K' in x)).astype(int)
+df_test[LABEL_COLUMN] = (df_test['income_bracket'].apply(lambda x: '>50K' in x)).astype(int)
+
+def input_fn(df):
+ # Creates a dictionary mapping from each continuous feature column name (k) to
+ # the values of that column stored in a constant Tensor.
+ continuous_cols = {k: tf.constant(df[k].values)
+ for k in CONTINUOUS_COLUMNS}
+ # Creates a dictionary mapping from each categorical feature column name (k)
+ # to the values of that column stored in a tf.SparseTensor.
+ categorical_cols = {k: tf.SparseTensor(
+ indices=[[i, 0] for i in range(df[k].size)],
+ values=df[k].values,
+ shape=[df[k].size, 1])
+ for k in CATEGORICAL_COLUMNS}
+ # Merges the two dictionaries into one.
+ feature_cols = dict(continuous_cols.items() + categorical_cols.items())
+ # Converts the label column into a constant Tensor.
+ label = tf.constant(df[LABEL_COLUMN].values)
+ # Returns the feature columns and the label.
+ return feature_cols, label
+
+def train_input_fn():
+ return input_fn(df_train)
+
+def eval_input_fn():
+ return input_fn(df_test)
+```
+
+After reading in the data, you can train and evaluate the model:
+
+```python
+m.fit(input_fn=train_input_fn, steps=200)
+results = m.evaluate(input_fn=eval_input_fn, steps=1)
+for key in sorted(results):
+ print "%s: %s" % (key, results[key])
+```
+
+The first line of the output should be something like `accuracy: 0.84429705`. We
+can see that the accuracy improved from about 83.6% with a wide-only linear
+model to about 84.4% with a Wide & Deep model. If you'd like to see a working
+end-to-end example, you can download our
+[example code](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/learn/wide_n_deep_tutorial.py).
+
+Note that this tutorial is just a quick example on a small dataset to get you
+familiar with the API. Wide & Deep Learning will be even more powerful if you
+try it on a large dataset with many sparse feature columns that have a large
+number of possible feature values. Again, feel free to take a look at our
+[research paper](http://arxiv.org/abs/1606.07792) for more ideas about how to
+apply Wide & Deep Learning in real-world, large-scale machine learning problems.