# Keras

Keras is a high-level API to build and train deep learning models. It's used for
fast prototyping, advanced research, and production, with three key advantages:

- *User friendly*<br>
  Keras has a simple, consistent interface optimized for common use cases. It
  provides clear and actionable feedback for user errors.
- *Modular and composable*<br>
  Keras models are made by connecting configurable building blocks together,
  with few restrictions.
- *Easy to extend*<br> Write custom building blocks to express new ideas for
  research. Create new layers, loss functions, and develop state-of-the-art
  models.

## Import tf.keras

`tf.keras` is TensorFlow's implementation of the
[Keras API specification](https://keras.io){:.external}. This is a high-level
API to build and train models that includes first-class support for
TensorFlow-specific functionality, such as [eager execution](#eager_execution),
`tf.data` pipelines, and [Estimators](./estimators.md).
`tf.keras` makes TensorFlow easier to use without sacrificing flexibility and
performance.

To get started, import `tf.keras` as part of your TensorFlow program setup:

```python
import tensorflow as tf
from tensorflow import keras
```

`tf.keras` can run any Keras-compatible code, but keep in mind:

* The `tf.keras` version in the latest TensorFlow release might not be the same
  as the latest `keras` version from PyPI. Check `tf.keras.__version__`, as shown
  below.
* When [saving a model's weights](#weights_only), `tf.keras` defaults to the
  [checkpoint format](../get_started/checkpoints.md). Pass `save_format='h5'` to
  use HDF5.
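
For example, to check the Keras API version bundled with your TensorFlow
installation:

```python
print(tf.keras.__version__)
```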

## Build a simple model

### Sequential model

In Keras, you assemble *layers* to build *models*. A model is (usually) a graph
of layers. The most common type of model is a stack of layers: the
`tf.keras.Sequential` model.

To build a simple, fully-connected network (i.e. a multi-layer perceptron):

```python
model = keras.Sequential()
# Adds a densely-connected layer with 64 units to the model:
model.add(keras.layers.Dense(64, activation='relu'))
# Add another:
model.add(keras.layers.Dense(64, activation='relu'))
# Add a softmax layer with 10 output units:
model.add(keras.layers.Dense(10, activation='softmax'))
```

### Configure the layers

There are many `tf.keras.layers` available with some common constructor
parameters:

* `activation`: Set the activation function for the layer. This parameter is
  specified by the name of a built-in function or as a callable object. By
  default, no activation is applied.
* `kernel_initializer` and `bias_initializer`: The initialization schemes
  that create the layer's weights (kernel and bias). This parameter is a name or
  a callable object. The kernel defaults to the `"Glorot uniform"` initializer,
  and the bias defaults to zeros.
* `kernel_regularizer` and `bias_regularizer`: The regularization schemes
  that apply to the layer's weights (kernel and bias), such as L1 or L2
  regularization. By default, no regularization is applied.

The following instantiates `tf.keras.layers.Dense` layers using constructor
arguments:

```python
# Create a sigmoid layer:
keras.layers.Dense(64, activation='sigmoid')
# Or:
keras.layers.Dense(64, activation=tf.sigmoid)

# A linear layer with L1 regularization of factor 0.01 applied to the kernel matrix:
keras.layers.Dense(64, kernel_regularizer=keras.regularizers.l1(0.01))
# A linear layer with L2 regularization of factor 0.01 applied to the bias vector:
keras.layers.Dense(64, bias_regularizer=keras.regularizers.l2(0.01))

# A linear layer with a kernel initialized to a random orthogonal matrix:
keras.layers.Dense(64, kernel_initializer='orthogonal')
# A linear layer with a bias vector initialized to 2.0s:
keras.layers.Dense(64, bias_initializer=keras.initializers.constant(2.0))
```

## Train and evaluate

### Set up training

After the model is constructed, configure its learning process by calling the
`compile` method:

```python
model.compile(optimizer=tf.train.AdamOptimizer(0.001),
              loss='categorical_crossentropy',
              metrics=['accuracy'])
```

`tf.keras.Model.compile` takes three important arguments:

* `optimizer`: This object specifies the training procedure. Pass it optimizer
  instances from the `tf.train` module, such as
  [`AdamOptimizer`](/api_docs/python/tf/train/AdamOptimizer),
  [`RMSPropOptimizer`](/api_docs/python/tf/train/RMSPropOptimizer), or
  [`GradientDescentOptimizer`](/api_docs/python/tf/train/GradientDescentOptimizer).
* `loss`: The function to minimize during optimization. Common choices include
  mean square error (`mse`), `categorical_crossentropy`, and
  `binary_crossentropy`. Loss functions are specified by name or by
  passing a callable object from the `tf.keras.losses` module.
* `metrics`: Used to monitor training. These are string names or callables from
  the `tf.keras.metrics` module.

The following shows a few examples of configuring a model for training:

```python
# Configure a model for mean-squared error regression.
model.compile(optimizer=tf.train.AdamOptimizer(0.01),
              loss='mse',       # mean squared error
              metrics=['mae'])  # mean absolute error

# Configure a model for categorical classification.
model.compile(optimizer=tf.train.RMSPropOptimizer(0.01),
              loss=keras.losses.categorical_crossentropy,
              metrics=[keras.metrics.categorical_accuracy])
```

### Input NumPy data

For small datasets, use in-memory [NumPy](https://www.numpy.org/){:.external}
arrays to train and evaluate a model. The model is "fit" to the training data
using the `fit` method:

```python
import numpy as np

data = np.random.random((1000, 32))
labels = np.random.random((1000, 10))

model.fit(data, labels, epochs=10, batch_size=32)
```

`tf.keras.Model.fit` takes three important arguments:

* `epochs`: Training is structured into *epochs*. An epoch is one iteration over
  the entire input data (this is done in smaller batches).
* `batch_size`: When passed NumPy data, the model slices the data into smaller
  batches and iterates over these batches during training. This integer
  specifies the size of each batch. Be aware that the last batch may be smaller
  if the total number of samples is not divisible by the batch size.
* `validation_data`: When prototyping a model, you want to easily monitor its
  performance on some validation data. Passing this argument—a tuple of inputs
  and labels—allows the model to display the loss and metrics in inference mode
  for the passed data, at the end of each epoch.

Here's an example using `validation_data`:

```python
import numpy as np

data = np.random.random((1000, 32))
labels = np.random.random((1000, 10))

val_data = np.random.random((100, 32))
val_labels = np.random.random((100, 10))

model.fit(data, labels, epochs=10, batch_size=32,
          validation_data=(val_data, val_labels))
```

### Input tf.data datasets

Use the [Datasets API](./datasets.md) to scale to large datasets
or multi-device training. Pass a `tf.data.Dataset` instance to the `fit`
method:

```python
# Instantiates a toy dataset:
dataset = tf.data.Dataset.from_tensor_slices((data, labels))
dataset = dataset.batch(32)
dataset = dataset.repeat()

# Don't forget to specify `steps_per_epoch` when calling `fit` on a dataset.
model.fit(dataset, epochs=10, steps_per_epoch=30)
```

Here, the `fit` method uses the `steps_per_epoch` argument—this is the number of
training steps the model runs before it moves to the next epoch. Since the
`Dataset` yields batches of data, this snippet does not require a `batch_size`.

Datasets can also be used for validation:

```python
dataset = tf.data.Dataset.from_tensor_slices((data, labels))
dataset = dataset.batch(32).repeat()

val_dataset = tf.data.Dataset.from_tensor_slices((val_data, val_labels))
val_dataset = val_dataset.batch(32).repeat()

model.fit(dataset, epochs=10, steps_per_epoch=30,
          validation_data=val_dataset,
          validation_steps=3)
```

### Evaluate and predict

The `tf.keras.Model.evaluate` and `tf.keras.Model.predict` methods can use NumPy
data and a `tf.data.Dataset`.

To *evaluate* the inference-mode loss and metrics for the data provided:

```python
model.evaluate(x, y, batch_size=32)

model.evaluate(dataset, steps=30)
```

And to *predict* the output of the last layer in inference mode for the
provided data, as a NumPy array:

```python
model.predict(x, batch_size=32)

model.predict(dataset, steps=30)
```


## Build advanced models

### Functional API

The `tf.keras.Sequential` model is a simple stack of layers that cannot
represent arbitrary models. Use the
[Keras functional API](https://keras.io/getting-started/functional-api-guide/){:.external}
to build complex model topologies such as:

* Multi-input models,
* Multi-output models,
* Models with shared layers (the same layer called several times),
* Models with non-sequential data flows (e.g. residual connections).

Building a model with the functional API works like this:

1. A layer instance is callable and returns a tensor.
2. Input tensors and output tensors are used to define a `tf.keras.Model`
   instance.
3. This model is trained just like the `Sequential` model.

The following example uses the functional API to build a simple, fully-connected
network:

```python
inputs = keras.Input(shape=(32,))  # Returns a placeholder tensor

# A layer instance is callable on a tensor, and returns a tensor.
x = keras.layers.Dense(64, activation='relu')(inputs)
x = keras.layers.Dense(64, activation='relu')(x)
predictions = keras.layers.Dense(10, activation='softmax')(x)

# Instantiate the model given inputs and outputs.
model = keras.Model(inputs=inputs, outputs=predictions)

# The compile step specifies the training configuration.
model.compile(optimizer=tf.train.RMSPropOptimizer(0.001),
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Trains for 5 epochs
model.fit(data, labels, batch_size=32, epochs=5)
```

### Model subclassing

Build a fully-customizable model by subclassing `tf.keras.Model` and defining
your own forward pass. Create layers in the `__init__` method and set them as
attributes of the class instance. Define the forward pass in the `call` method.

Model subclassing is particularly useful when
[eager execution](./eager.md) is enabled since the forward pass
can be written imperatively.

Key Point: Use the right API for the job. While model subclassing offers
flexibility, it comes at a cost of greater complexity and more opportunities for
user errors. If possible, prefer the functional API.

The following example shows a subclassed `tf.keras.Model` using a custom forward
pass:

```python
class MyModel(keras.Model):

  def __init__(self, num_classes=10):
    super(MyModel, self).__init__(name='my_model')
    self.num_classes = num_classes
    # Define your layers here.
    self.dense_1 = keras.layers.Dense(32, activation='relu')
    self.dense_2 = keras.layers.Dense(num_classes, activation='sigmoid')

  def call(self, inputs):
    # Define your forward pass here,
    # using layers you previously defined (in `__init__`).
    x = self.dense_1(inputs)
    return self.dense_2(x)

  def compute_output_shape(self, input_shape):
    # You need to override this function if you want to use the subclassed model
    # as part of a functional-style model.
    # Otherwise, this method is optional.
    shape = tf.TensorShape(input_shape).as_list()
    shape[-1] = self.num_classes
    return tf.TensorShape(shape)


# Instantiates the subclassed model.
model = MyModel(num_classes=10)

# The compile step specifies the training configuration.
model.compile(optimizer=tf.train.RMSPropOptimizer(0.001),
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Trains for 5 epochs.
model.fit(data, labels, batch_size=32, epochs=5)
```


### Custom layers

Create a custom layer by subclassing `tf.keras.layers.Layer` and implementing
the following methods:

* `build`: Create the weights of the layer. Add weights with the `add_weight`
  method.
* `call`: Define the forward pass.
* `compute_output_shape`: Specify how to compute the output shape of the layer
  given the input shape.
* Optionally, a layer can be serialized by implementing the `get_config` method
  and the `from_config` class method (see the round-trip sketch after the
  example below).

Here's an example of a custom layer that implements a `matmul` of an input with
a kernel matrix:

```python
class MyLayer(keras.layers.Layer):

  def __init__(self, output_dim, **kwargs):
    self.output_dim = output_dim
    super(MyLayer, self).__init__(**kwargs)

  def build(self, input_shape):
    shape = tf.TensorShape((input_shape[1], self.output_dim))
    # Create a trainable weight variable for this layer.
    self.kernel = self.add_weight(name='kernel',
                                  shape=shape,
                                  initializer='uniform',
                                  trainable=True)
    # Be sure to call this at the end
    super(MyLayer, self).build(input_shape)

  def call(self, inputs):
    return tf.matmul(inputs, self.kernel)

  def compute_output_shape(self, input_shape):
    shape = tf.TensorShape(input_shape).as_list()
    shape[-1] = self.output_dim
    return tf.TensorShape(shape)

  def get_config(self):
    base_config = super(MyLayer, self).get_config()
    base_config['output_dim'] = self.output_dim
    return base_config

  @classmethod
  def from_config(cls, config):
    return cls(**config)


# Create a model using the custom layer
model = keras.Sequential([MyLayer(10),
                          keras.layers.Activation('softmax')])

# The compile step specifies the training configuration
model.compile(optimizer=tf.train.RMSPropOptimizer(0.001),
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Trains for 5 epochs.
model.fit(data, targets, batch_size=32, epochs=5)
```
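
With `get_config` and `from_config` implemented, a layer's configuration can
round-trip through a plain Python dictionary. A minimal sketch using the
`MyLayer` class above (variable names are illustrative):

```python
layer = MyLayer(10)
config = layer.get_config()               # A dict describing the layer
same_layer = MyLayer.from_config(config)  # A fresh, equivalent layer
```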


## Callbacks

A callback is an object passed to a model to customize and extend its behavior
during training. You can write your own custom callback (see the sketch below),
or use the built-in `tf.keras.callbacks`, which include:

* `tf.keras.callbacks.ModelCheckpoint`: Save checkpoints of your model at
  regular intervals.
* `tf.keras.callbacks.LearningRateScheduler`: Dynamically change the learning
  rate.
* `tf.keras.callbacks.EarlyStopping`: Interrupt training when validation
  performance has stopped improving.
* `tf.keras.callbacks.TensorBoard`: Monitor the model's behavior using
  [TensorBoard](./summaries_and_tensorboard.md).

To use a `tf.keras.callbacks.Callback`, pass it to the model's `fit` method:

```python
callbacks = [
  # Interrupt training if `val_loss` stops improving for over 2 epochs
  keras.callbacks.EarlyStopping(patience=2, monitor='val_loss'),
  # Write TensorBoard logs to `./logs` directory
  keras.callbacks.TensorBoard(log_dir='./logs')
]
model.fit(data, labels, batch_size=32, epochs=5, callbacks=callbacks,
          validation_data=(val_data, val_targets))
```
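
To write a custom callback, subclass `tf.keras.callbacks.Callback` and override
hooks such as `on_epoch_end`. The following is a minimal sketch; `LossHistory`
is an illustrative name, not a built-in callback:

```python
class LossHistory(keras.callbacks.Callback):

  def on_train_begin(self, logs=None):
    self.losses = []

  def on_epoch_end(self, epoch, logs=None):
    # `logs` holds the metrics for the epoch that just finished.
    self.losses.append(logs.get('loss'))

# Pass it to `fit` like any built-in callback:
model.fit(data, labels, batch_size=32, epochs=5,
          callbacks=[LossHistory()])
```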


## Save and restore

### Weights only

Save and load the weights of a model using `tf.keras.Model.save_weights`:

```python
# Save weights to a TensorFlow Checkpoint file
model.save_weights('./my_model')

# Restore the model's state;
# this requires a model with the same architecture.
model.load_weights('./my_model')
```

By default, this saves the model's weights in the
[TensorFlow checkpoint](../get_started/checkpoints.md) file format. Weights can
also be saved to the Keras HDF5 format (the default for the multi-backend
implementation of Keras):

```python
# Save weights to an HDF5 file
model.save_weights('my_model.h5', save_format='h5')

# Restore the model's state
model.load_weights('my_model.h5')
```


### Configuration only

A model's configuration can be saved—this serializes the model architecture
without any weights. A saved configuration can recreate and initialize the same
model, even without the code that defined the original model. Keras supports
JSON and YAML serialization formats:

```python
# Serialize a model to JSON format
json_string = model.to_json()

# Recreate the model (freshly initialized)
fresh_model = keras.models.model_from_json(json_string)

# Serializes a model to YAML format
yaml_string = model.to_yaml()

# Recreate the model
fresh_model = keras.models.model_from_yaml(yaml_string)
```

Caution: Subclassed models are not serializable because their architecture is
defined by the Python code in the body of the `call` method.


### Entire model

The entire model can be saved to a file that contains the weight values, the
model's configuration, and even the optimizer's configuration. This allows you
to checkpoint a model and resume training later—from the exact same
state—without access to the original code.

```python
# Create a trivial model
model = keras.Sequential([
  keras.layers.Dense(10, activation='softmax', input_shape=(32,)),
  keras.layers.Dense(10, activation='softmax')
])
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(data, targets, batch_size=32, epochs=5)


# Save entire model to an HDF5 file
model.save('my_model.h5')

# Recreate the exact same model, including weights and optimizer.
model = keras.models.load_model('my_model.h5')
```


## Eager execution

[Eager execution](./eager.md) is an imperative programming
environment that evaluates operations immediately. This is not required for
Keras, but is supported by `tf.keras` and useful for inspecting your program and
debugging.

All of the `tf.keras` model-building APIs are compatible with eager execution.
While the `Sequential` and functional APIs can be used, eager execution
especially benefits *model subclassing* and building *custom layers*—the APIs
that require you to write the forward pass as code (instead of the APIs that
create models by assembling existing layers).
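
For instance, with eager execution enabled at program startup (the TensorFlow
1.x API shown here), layers run immediately and return concrete values you can
inspect. A minimal sketch:

```python
import tensorflow as tf

tf.enable_eager_execution()  # Must run before any other TensorFlow operations

layer = tf.keras.layers.Dense(4)
result = layer(tf.ones([2, 8]))  # Executes immediately; no session required
print(result.numpy())            # Inspect the output as a NumPy array
```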

See the [eager execution guide](./eager.md#build_a_model) for
examples of using Keras models with custom training loops and `tf.GradientTape`.


## Distribution

### Estimators

The [Estimators](./estimators.md) API is used to train models for
distributed environments. It targets industry use cases, such as distributed
training on large datasets, where the model is exported for production.

A `tf.keras.Model` can be trained with the `tf.estimator` API by converting the
model to a `tf.estimator.Estimator` object with
`tf.keras.estimator.model_to_estimator`. See
[Creating Estimators from Keras models](./estimators.md#creating_estimators_from_keras_models).

```python
model = keras.Sequential([keras.layers.Dense(10, activation='softmax'),
                          keras.layers.Dense(10, activation='softmax')])

model.compile(optimizer=tf.train.RMSPropOptimizer(0.001),
              loss='categorical_crossentropy',
              metrics=['accuracy'])

estimator = keras.estimator.model_to_estimator(model)
```

Note: Enable [eager execution](./eager.md) for debugging
[Estimator input functions](./premade_estimators.md#create_input_functions)
and inspecting data.

### Multiple GPUs

`tf.keras` models can run on multiple GPUs using
`tf.contrib.distribute.DistributionStrategy`. This API provides distributed
training on multiple GPUs with almost no changes to existing code.

Currently, `tf.contrib.distribute.MirroredStrategy` is the only supported
distribution strategy. `MirroredStrategy` does in-graph replication with
synchronous training using all-reduce on a single machine. To use
`DistributionStrategy` with Keras, convert the `tf.keras.Model` to a
`tf.estimator.Estimator` with `tf.keras.estimator.model_to_estimator`, then
train the estimator.

The following example distributes a `tf.keras.Model` across multiple GPUs on a
single machine.

First, define a simple model:

```python
model = keras.Sequential()
model.add(keras.layers.Dense(16, activation='relu', input_shape=(10,)))
model.add(keras.layers.Dense(1, activation='sigmoid'))

optimizer = tf.train.GradientDescentOptimizer(0.2)

model.compile(loss='binary_crossentropy', optimizer=optimizer)
model.summary()
```

Convert the Keras model to a `tf.estimator.Estimator` instance. The `config`
argument is the `tf.estimator.RunConfig` created further below:

```python
keras_estimator = keras.estimator.model_to_estimator(
  keras_model=model,
  config=config,
  model_dir='/tmp/model_dir')
```

Define an *input pipeline*. The `input_fn` returns a `tf.data.Dataset` object
used to distribute the data across multiple devices—with each device processing
a slice of the input batch.

```python
def input_fn():
  x = np.random.random((1024, 10))
  y = np.random.randint(2, size=(1024, 1))
  x = tf.cast(x, tf.float32)
  dataset = tf.data.Dataset.from_tensor_slices((x, y))
  dataset = dataset.repeat(10)
  dataset = dataset.batch(32)
  return dataset
```

Next, create a `tf.estimator.RunConfig` and set the `train_distribute` argument
to the `tf.contrib.distribute.MirroredStrategy` instance. When creating
`MirroredStrategy`, you can specify a list of devices or set the `num_gpus`
argument. By default, all available GPUs are used:

```python
strategy = tf.contrib.distribute.MirroredStrategy()
config = tf.estimator.RunConfig(train_distribute=strategy)
```

Finally, train the `Estimator` instance by providing the `input_fn` and `steps`
arguments:

```python
keras_estimator.train(input_fn=input_fn, steps=10)
```
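
After training, the estimator can be evaluated with the same input function
using the standard `tf.estimator.Estimator.evaluate` method; a short sketch:

```python
# Runs evaluation over 10 batches from `input_fn` and returns a metrics dict:
eval_results = keras_estimator.evaluate(input_fn=input_fn, steps=10)
print(eval_results)
```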