Diffstat (limited to 'tensorflow/docs_src/performance/benchmarks.md')
-rw-r--r--  tensorflow/docs_src/performance/benchmarks.md | 128
1 file changed, 77 insertions(+), 51 deletions(-)
diff --git a/tensorflow/docs_src/performance/benchmarks.md b/tensorflow/docs_src/performance/benchmarks.md
index 8c0cff138d..bfb47d9f90 100644
--- a/tensorflow/docs_src/performance/benchmarks.md
+++ b/tensorflow/docs_src/performance/benchmarks.md
@@ -1,17 +1,17 @@
-# TensorFlow Performance Benchmarks
+# Benchmarks
## Overview
A selection of image classification models was tested across multiple platforms
to create a point of reference for the TensorFlow community. The methodology,
-links to the scripts, and commands to reproduce the results are in the
-[appendix](#appendix).
+links to the benchmark scripts, and commands to reproduce the results are in the
+[Appendix](#appendix).
## Results for image classification models
-InceptionV3 ([arXiv:1512.00567](https://arxiv.org/abs/1512.00567)),
-ResNet-50 ([arXiv:1512.03385](https://arxiv.org/abs/1512.03385)),
-ResNet-152 ([arXiv:1512.03385](https://arxiv.org/abs/1512.03385)), VGG16
+InceptionV3 ([arXiv:1512.00567](https://arxiv.org/abs/1512.00567)), ResNet-50
+([arXiv:1512.03385](https://arxiv.org/abs/1512.03385)), ResNet-152
+([arXiv:1512.03385](https://arxiv.org/abs/1512.03385)), VGG16
([arXiv:1409.1556](https://arxiv.org/abs/1409.1556)), and
[AlexNet](http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf)
were tested using the [ImageNet](http://www.image-net.org/) data set. Tests were
@@ -27,32 +27,32 @@ input pipeline and the underlying disk I/O are saturating the compute units.
### Training with NVIDIA® DGX-1™ (NVIDIA® Tesla® P100)
-<div style="width:100%; margin:auto; margin-bottom:10px; margin-top:20px;">
- <img style="width:100%" src="../images/perf_summary_p100_single_server.png">
+<div style="width:95%; margin:auto; margin-bottom:10px; margin-top:20px;">
+ <img style="width:80%" src="../images/perf_summary_p100_single_server.png">
</div>
Details and additional results are in the [Details for NVIDIA® DGX-1™ (NVIDIA®
-Tesla® P100)](#details-for-nvidia®-dgx-1™-nvidia®-tesla®-p100) section.
+Tesla® P100)](#details_for_nvidia_dgx-1tm_nvidia_tesla_p100) section.
### Training with NVIDIA® Tesla® K80
<div style="width:95%; margin:auto; margin-bottom:10px; margin-top:20px;">
- <img style="width:100%" src="../images/perf_summary_k80_single_server.png">
+ <img style="width:80%" src="../images/perf_summary_k80_single_server.png">
</div>
Details and additional results are in the [Details for Google Compute Engine
-(NVIDIA® Tesla® K80)](#details-for-google-compute-engine-nvidia®-tesla®-k80) and
+(NVIDIA® Tesla® K80)](#details_for_google_compute_engine_nvidia_tesla_k80) and
[Details for Amazon EC2 (NVIDIA® Tesla®
-K80)](#details-for-amazon-ec2-nvidia®-tesla®-k80) sections.
+K80)](#details_for_amazon_ec2_nvidia_tesla_k80) sections.
### Distributed training with NVIDIA® Tesla® K80
<div style="width:95%; margin:auto; margin-bottom:10px; margin-top:20px;">
- <img style="width:100%" src="../images/perf_summary_k80_aws_distributed.png">
+ <img style="width:80%" src="../images/perf_summary_k80_aws_distributed.png">
</div>
Details and additional results are in the [Details for Amazon EC2 Distributed
-(NVIDIA® Tesla® K80)](#details-for-amazon-ec2-distributed-nvidia®-tesla®-k80)
+(NVIDIA® Tesla® K80)](#details_for_amazon_ec2_distributed_nvidia_tesla_k80)
section.
### Compare synthetic with real data training
@@ -82,12 +82,15 @@ section.
* **TensorFlow GitHub hash:** b1e174e
* **Build Command:** `bazel build -c opt --copt=-march="haswell" --config=cuda
//tensorflow/tools/pip_package:build_pip_package`
-* **Disk:** local SSD
+* **Disk:** Local SSD
* **DataSet:** ImageNet
-Batch size and optimizer used for each model.
+Batch size and optimizer used for each model are listed in the table below. In
+addition to the batch sizes listed in the table, InceptionV3, ResNet-50,
+ResNet-152, and VGG16 were tested with a batch size of 32. Those results are in
+the *other results* section.
- | InceptionV3 | ResNet-50 | ResNet-152 | Alexnet | VGG16
+Options | InceptionV3 | ResNet-50 | ResNet-152 | Alexnet | VGG16
------------------ | ----------- | --------- | ---------- | ------- | -----
Batch size per GPU | 64 | 64 | 64 | 512 | 64
Optimizer | sgd | sgd | sgd | sgd | sgd
@@ -104,10 +107,8 @@ VGG16 | replicated (with NCCL) | n/a
### Results
-Batch size and optimizer used for each model are listed in the table below.
-
<div style="width:95%; margin:auto; margin-bottom:10px; margin-top:20px;">
- <img style="width:100%" src="../images/perf_summary_p100_single_server.png">
+ <img style="width:80%" src="../images/perf_summary_p100_single_server.png">
</div>
<div style="width:95%; margin:auto; margin-bottom:10px; margin-top:20px;">
@@ -136,6 +137,28 @@ GPUs | InceptionV3 | ResNet-50 | ResNet-152 | Alexnet | VGG16
Training AlexNet with real data on 8 GPUs was excluded from the graph and table
above because it saturated the input pipeline.
+### Other Results
+
+The results below are all with a batch size of 32.
+
+**Training synthetic data**
+
+GPUs | InceptionV3 | ResNet-50 | ResNet-152 | VGG16
+---- | ----------- | --------- | ---------- | -----
+1 | 128 | 210 | 85.3 | 124
+2 | 259 | 412 | 166 | 241
+4 | 520 | 827 | 330 | 470
+8 | 995 | 1623 | 643 | 738
+
+**Training real data**
+
+GPUs | InceptionV3 | ResNet-50 | ResNet-152 | VGG16
+---- | ----------- | --------- | ---------- | -----
+1 | 130 | 208 | 85.0 | 124
+2 | 257 | 403 | 163 | 221
+4 | 507 | 814 | 325 | 401
+8 | 966 | 1525 | 641 | 619
+
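As a quick sanity check on the tables above, the 8-GPU scaling efficiency can be derived from the 1-GPU and 8-GPU synthetic-data rows. The helper below is illustrative only and is not part of the benchmark scripts:

```shell
# Scaling efficiency = (8-GPU images/sec) / (8 x 1-GPU images/sec),
# using the batch-size-32 synthetic-data numbers from the table above.
efficiency() {
  awk -v one="$1" -v eight="$2" 'BEGIN { printf "%.2f\n", eight / (one * 8) }'
}
efficiency 128 995    # InceptionV3 -> 0.97
efficiency 210 1623   # ResNet-50   -> 0.97
efficiency 85.3 643   # ResNet-152  -> 0.94
efficiency 124 738    # VGG16       -> 0.74
```

VGG16's lower figure is consistent with its large parameter count making variable updates, rather than compute, the bottleneck as GPUs are added.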
## Details for Google Compute Engine (NVIDIA® Tesla® K80)
### Environment
@@ -156,7 +179,7 @@ addition to the batch sizes listed in the table, InceptionV3 and ResNet-50 were
tested with a batch size of 32. Those results are in the *other results*
section.
- | InceptionV3 | ResNet-50 | ResNet-152 | Alexnet | VGG16
+Options | InceptionV3 | ResNet-50 | ResNet-152 | Alexnet | VGG16
------------------ | ----------- | --------- | ---------- | ------- | -----
Batch size per GPU | 64 | 64 | 32 | 512 | 32
Optimizer | sgd | sgd | sgd | sgd | sgd
@@ -184,10 +207,10 @@ GPUs | InceptionV3 | ResNet-50 | ResNet-152 | Alexnet | VGG16
GPUs | InceptionV3 | ResNet-50 | ResNet-152 | Alexnet | VGG16
---- | ----------- | --------- | ---------- | ------- | -----
-1 | 30.5 | 56.7 | 20.7 | 639 | 30.2
-2 | 57.8 | 107 | 39 | 1136 | 55.5
-4 | 115 | 211 | 77.3 | 2067 | 106
-8 | 225 | 418 | 150 | 4056 | 213
+ 1 | 30.6 | 56.7 | 20.7 | 639 | 30.2
+ 2 | 58.4 | 107 | 39.0 | 1136 | 55.5
+ 4 | 115 | 211 | 77.3 | 2067 | 106
+ 8 | 225 | 422 | 151 | 4056 | 213
### Other Results
@@ -204,10 +227,10 @@ GPUs | InceptionV3 (batch size 32) | ResNet-50 (batch size 32)
GPUs | InceptionV3 (batch size 32) | ResNet-50 (batch size 32)
---- | --------------------------- | -------------------------
-1 | 29.3 | 53.6
-2 | 55 | 102
-4 | 109 | 200
-8 | 215 | 387
+ 1 | 29.5 | 53.6
+ 2 | 55.4 | 102
+ 4 | 110 | 201
+ 8 | 216 | 387
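As a rough gauge of multi-GPU scaling on this platform, dividing the 8-GPU throughput by eight times the 1-GPU throughput from the batch-size-32 table above gives (illustrative arithmetic only, not part of the benchmark scripts):

```shell
# 8-GPU scaling efficiency = (8-GPU images/sec) / (8 x 1-GPU images/sec),
# from the batch-size-32 synthetic-data rows above.
awk 'BEGIN {
  printf "InceptionV3: %.2f\n", 216 / (29.5 * 8)
  printf "ResNet-50:   %.2f\n", 387 / (53.6 * 8)
}'
```

This works out to roughly 0.92 and 0.90, somewhat below the scaling observed on the DGX-1 results earlier in this document.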
## Details for Amazon EC2 (NVIDIA® Tesla® K80)
@@ -230,7 +253,7 @@ addition to the batch sizes listed in the table, InceptionV3 and ResNet-50 were
tested with a batch size of 32. Those results are in the *other results*
section.
- | InceptionV3 | ResNet-50 | ResNet-152 | Alexnet | VGG16
+Options | InceptionV3 | ResNet-50 | ResNet-152 | Alexnet | VGG16
------------------ | ----------- | --------- | ---------- | ------- | -----
Batch size per GPU | 64 | 64 | 32 | 512 | 32
Optimizer | sgd | sgd | sgd | sgd | sgd
@@ -289,7 +312,7 @@ GPUs | InceptionV3 (batch size 32) | ResNet-50 (batch size 32)
GPUs | InceptionV3 (batch size 32) | ResNet-50 (batch size 32)
---- | --------------------------- | -------------------------
1 | 30.0 | 53.6
-2 | 57.5 | 101
+2 | 57.5 | 102
4 | 113 | 202
8 | 212 | 379
@@ -313,7 +336,7 @@ addition to the batch sizes listed in the table, InceptionV3 and ResNet-50 were
tested with a batch size of 32. Those results are in the *other results*
section.
- | InceptionV3 | ResNet-50 | ResNet-152
+Options | InceptionV3 | ResNet-50 | ResNet-152
------------------ | ----------- | --------- | ----------
Batch size per GPU | 64 | 64 | 32
Optimizer | sgd | sgd | sgd
@@ -337,7 +360,7 @@ used with the following exceptions:
### Results
<div style="width:95%; margin:auto; margin-bottom:10px; margin-top:20px;">
- <img style="width:95%" src="../images/perf_summary_k80_aws_distributed.png">
+ <img style="width:80%" src="../images/perf_summary_k80_aws_distributed.png">
</div>
<div style="width:95%; margin:auto; margin-bottom:10px; margin-top:20px;">
@@ -374,34 +397,37 @@ GPUs | InceptionV3 (batch size 32) | ResNet-50 (batch size 32)
### Executing benchmark tests
-The code for the benchmarks was created to be both used for benchmarking
-TensorFlow as well as used as a tool to test hardware platforms. The benchmark
-code includes modes such as `trivial` that run a virtually empty model that is
-useful for testing the maximum possibly samples/sec for the input pipeline among
-other things. Not only does this test TensorFlow but also the throughput of the
-underlying systems. There are two ways to execute the benchmarks in
-[tf_cnn_benchmarks.py](TODO: LINK TO GITHUB):
+The [benchmark code](https://github.com/tensorflow/benchmarks/tree/master/scripts/tf_cnn_benchmarks)
+was created both to benchmark TensorFlow and to serve as a tool for testing
+hardware platforms. Techniques used in the benchmark scripts are detailed in
+@{$performance_models$High-Performance Models}.
+
+There are two ways to execute the benchmark code:
-1. Execute [tf_cnn_benchmarks.py](TODO: LINK TO GITHUB) directly
-2. Utilize the [small wrapper](TODO: LINK TO GITHUB) that helps pick the
- correct config
+1. Execute [tf_cnn_benchmarks.py](https://github.com/tensorflow/benchmarks/tree/master/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py)
+ directly.
+2. Utilize the [wrapper script](https://github.com/tensorflow/benchmarks/tree/master/scripts/tf_cnn_benchmarks/main.py)
+   that helps pick the correct config for each platform and then executes
+   `tf_cnn_benchmarks.py`.
The wrapper is suggested as a starting point. Then investigate the variety of
-options available in `tf_cnn_benchmarks.py`. While the wrapper extensive
-examples, below are a couple highlights.
+options available in `tf_cnn_benchmarks.py`. Below are a couple examples of
+using the wrapper.
-Run ResNet-50 on a single instance with 8 GPUs. The `system` argument is used to
-determine the optimal configuration. The supported values are gce, aws, and
-dgx1. If `system` is not passeed, the best config for the most widely available
-hardware is used.
+**Single Server**
+
+This example illustrates training ResNet-50 on a single instance with 8 GPUs.
+The `system` flag is used to determine the optimal configuration. The
+supported values are `gce`, `aws`, and `dgx1`. If `system` is not passed, the best
+config for the most widely available hardware is used.
```bash
python main.py --model=resnet50 --num_gpus=8
python main.py --system=aws --model=resnet50 --num_gpus=8
```
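For multi-host runs, each worker needs its own task index. The sketch below only prints a plausible per-host command line; the flags `--job_name`, `--task_index`, and `--worker_hosts` are assumptions modeled on common distributed-TensorFlow conventions and are not taken from this guide:

```shell
# Print a hypothetical tf_cnn_benchmarks.py invocation for each worker
# in a 2-host cluster. Flag names below are assumptions (distributed-TF
# conventions), not confirmed by this document.
WORKER_HOSTS="10.0.0.1:5000,10.0.0.2:5000"
i=0
for host in 10.0.0.1 10.0.0.2; do
  echo "[$host] python tf_cnn_benchmarks.py --model=resnet50 --num_gpus=8 \
--job_name=worker --task_index=$i --worker_hosts=$WORKER_HOSTS"
  i=$((i + 1))
done
```

Each printed command would be run on the corresponding host, with `--task_index` distinguishing the workers.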
-Run ResNet-50 on 2 hosts, e.g. host_0 (10.0.0.1) and host_1 (10.0.0.2), with 8
-GPUs each on aws.
+**Distributed**
+
+This example illustrates training ResNet-50 on 2 hosts, e.g. host_0 (10.0.0.1)
+and host_1 (10.0.0.2), with 8 GPUs each on AWS (Amazon EC2).
```bash
# Run the following commands on host_0 (10.0.0.1):