Diffstat (limited to 'tensorflow/docs_src/performance/benchmarks.md')
-rw-r--r--  tensorflow/docs_src/performance/benchmarks.md  412
1 file changed, 0 insertions(+), 412 deletions(-)
diff --git a/tensorflow/docs_src/performance/benchmarks.md b/tensorflow/docs_src/performance/benchmarks.md
deleted file mode 100644
index a5fa551dd4..0000000000
--- a/tensorflow/docs_src/performance/benchmarks.md
+++ /dev/null
@@ -1,412 +0,0 @@
-# Benchmarks
-
-## Overview
-
-A selection of image classification models was tested across multiple
-platforms to create a point of reference for the TensorFlow community. The
-[Methodology](#methodology) section details how the tests were executed and
-links to the scripts used.
-
-## Results for image classification models
-
-InceptionV3 ([arXiv:1512.00567](https://arxiv.org/abs/1512.00567)), ResNet-50
-([arXiv:1512.03385](https://arxiv.org/abs/1512.03385)), ResNet-152
-([arXiv:1512.03385](https://arxiv.org/abs/1512.03385)), VGG16
-([arXiv:1409.1556](https://arxiv.org/abs/1409.1556)), and
-[AlexNet](http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf)
-were tested using the [ImageNet](http://www.image-net.org/) data set. Tests were
-run on Google Compute Engine, Amazon Elastic Compute Cloud (Amazon EC2), and an
-NVIDIA® DGX-1™. Most of the tests were run with both synthetic and real data.
-Testing with synthetic data was done by using a `tf.Variable` set to the same
-shape as the data each model expects for ImageNet. We believe it is important
-to include real data measurements when benchmarking a platform, because this
-exercises both the underlying hardware and the framework's ability to prepare
-data for actual training. We start with synthetic data to remove disk I/O as a
-variable and to set a baseline. Real data is then used to verify that the
-TensorFlow input pipeline and the underlying disk I/O can keep the compute
-units saturated.
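-
-As a rough sketch of the synthetic-data approach, the snippet below builds an
-ImageNet-shaped batch in a `tf.Variable`; the shapes and names are assumptions
-for illustration, not the exact code used by the benchmark scripts.
-
-```python
-import tensorflow as tf
-
-# Hypothetical synthetic input: an ImageNet-shaped image batch held in
-# variables so that no disk I/O is involved in feeding the model.
-batch_size = 64
-synthetic_images = tf.Variable(
-    tf.truncated_normal([batch_size, 224, 224, 3], dtype=tf.float32),
-    trainable=False, name='synthetic_images')
-synthetic_labels = tf.Variable(
-    tf.random_uniform([batch_size], minval=0, maxval=1000, dtype=tf.int32),
-    trainable=False, name='synthetic_labels')
-```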
-
-### Training with NVIDIA® DGX-1™ (NVIDIA® Tesla® P100)
-
-<div style="width:95%; margin:auto; margin-bottom:10px; margin-top:20px;">
- <img style="width:80%" src="../images/perf_summary_p100_single_server.png">
-</div>
-
-Details and additional results are in the [Details for NVIDIA® DGX-1™ (NVIDIA®
-Tesla® P100)](#details_for_nvidia_dgx-1tm_nvidia_tesla_p100) section.
-
-### Training with NVIDIA® Tesla® K80
-
-<div style="width:95%; margin:auto; margin-bottom:10px; margin-top:20px;">
- <img style="width:80%" src="../images/perf_summary_k80_single_server.png">
-</div>
-
-Details and additional results are in the [Details for Google Compute Engine
-(NVIDIA® Tesla® K80)](#details_for_google_compute_engine_nvidia_tesla_k80) and
-[Details for Amazon EC2 (NVIDIA® Tesla®
-K80)](#details_for_amazon_ec2_nvidia_tesla_k80) sections.
-
-### Distributed training with NVIDIA® Tesla® K80
-
-<div style="width:95%; margin:auto; margin-bottom:10px; margin-top:20px;">
- <img style="width:80%" src="../images/perf_summary_k80_aws_distributed.png">
-</div>
-
-Details and additional results are in the [Details for Amazon EC2 Distributed
-(NVIDIA® Tesla® K80)](#details_for_amazon_ec2_distributed_nvidia_tesla_k80)
-section.
-
-### Compare synthetic with real data training
-
-**NVIDIA® Tesla® P100**
-
-<div style="width:95%; margin:auto; margin-bottom:10px; margin-top:20px;">
- <img style="width:35%" src="../images/perf_summary_p100_data_compare_inceptionv3.png">
- <img style="width:35%" src="../images/perf_summary_p100_data_compare_resnet50.png">
-</div>
-
-**NVIDIA® Tesla® K80**
-
-<div style="width:95%; margin:auto; margin-bottom:10px; margin-top:20px;">
- <img style="width:35%" src="../images/perf_summary_k80_data_compare_inceptionv3.png">
- <img style="width:35%" src="../images/perf_summary_k80_data_compare_resnet50.png">
-</div>
-
-## Details for NVIDIA® DGX-1™ (NVIDIA® Tesla® P100)
-
-### Environment
-
-* **Instance type**: NVIDIA® DGX-1™
-* **GPU:** 8x NVIDIA® Tesla® P100
-* **OS:** Ubuntu 16.04 LTS with tests run via Docker
-* **CUDA / cuDNN:** 8.0 / 5.1
-* **TensorFlow GitHub hash:** b1e174e
-* **Benchmark GitHub hash:** 9165a70
-* **Build Command:** `bazel build -c opt --copt=-march="haswell" --config=cuda
- //tensorflow/tools/pip_package:build_pip_package`
-* **Disk:** Local SSD
-* **DataSet:** ImageNet
-* **Test Date:** May 2017
-
-Batch size and optimizer used for each model are listed in the table below. In
-addition to the batch sizes listed in the table, InceptionV3, ResNet-50,
-ResNet-152, and VGG16 were tested with a batch size of 32. Those results are in
-the *other results* section.
-
-Options | InceptionV3 | ResNet-50 | ResNet-152 | AlexNet | VGG16
------------------- | ----------- | --------- | ---------- | ------- | -----
-Batch size per GPU | 64 | 64 | 64 | 512 | 64
-Optimizer | sgd | sgd | sgd | sgd | sgd
-
-Configuration used for each model; an illustrative launch sketch follows the
-table.
-
-Model | variable_update | local_parameter_device
------------ | ---------------------- | ----------------------
-InceptionV3 | parameter_server | cpu
-ResNet-50 | parameter_server | cpu
-ResNet-152 | parameter_server | cpu
-AlexNet | replicated (with NCCL) | n/a
-VGG16 | replicated (with NCCL) | n/a
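-
-As an illustrative sketch, a single-server run with the ResNet-50 settings
-above might be launched roughly as follows. The script path and flag names are
-assumptions based on the benchmark repository at the time and may differ in
-current versions.
-
-```python
-import subprocess
-
-# Hypothetical launch of the tf_cnn_benchmarks script with the ResNet-50
-# settings from the tables above (8 GPUs, batch size 64 per GPU, SGD,
-# parameter_server variable updates placed on the CPU).
-subprocess.check_call([
-    'python', 'tf_cnn_benchmarks.py',
-    '--model=resnet50',
-    '--num_gpus=8',
-    '--batch_size=64',
-    '--optimizer=sgd',
-    '--variable_update=parameter_server',
-    '--local_parameter_device=cpu',
-])
-```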
-
-### Results
-
-<div style="width:95%; margin:auto; margin-bottom:10px; margin-top:20px;">
- <img style="width:80%" src="../images/perf_summary_p100_single_server.png">
-</div>
-
-<div style="width:95%; margin:auto; margin-bottom:10px; margin-top:20px;">
- <img style="width:35%" src="../images/perf_dgx1_synth_p100_single_server_scaling.png">
- <img style="width:35%" src="../images/perf_dgx1_real_p100_single_server_scaling.png">
-</div>
-
-**Training synthetic data**
-
-GPUs | InceptionV3 | ResNet-50 | ResNet-152 | AlexNet | VGG16
----- | ----------- | --------- | ---------- | ------- | -----
-1 | 142 | 219 | 91.8 | 2987 | 154
-2 | 284 | 422 | 181 | 5658 | 295
-4 | 569 | 852 | 356 | 10509 | 584
-8 | 1131 | 1734 | 716 | 17822 | 1081
-
-**Training real data**
-
-GPUs | InceptionV3 | ResNet-50 | ResNet-152 | AlexNet | VGG16
----- | ----------- | --------- | ---------- | ------- | -----
-1 | 142 | 218 | 91.4 | 2890 | 154
-2 | 278 | 425 | 179 | 4448 | 284
-4 | 551 | 853 | 359 | 7105 | 534
-8 | 1079 | 1630 | 708 | N/A | 898
-
-Training AlexNet with real data on 8 GPUs was excluded from the graph and
-table above because it saturated the input pipeline.
-
-### Other Results
-
-The results below are all with a batch size of 32.
-
-**Training synthetic data**
-
-GPUs | InceptionV3 | ResNet-50 | ResNet-152 | VGG16
----- | ----------- | --------- | ---------- | -----
-1 | 128 | 195 | 82.7 | 144
-2 | 259 | 368 | 160 | 281
-4 | 520 | 768 | 317 | 549
-8 | 995 | 1485 | 632 | 820
-
-**Training real data**
-
-GPUs | InceptionV3 | ResNet-50 | ResNet-152 | VGG16
----- | ----------- | --------- | ---------- | -----
-1 | 130 | 193 | 82.4 | 144
-2 | 257 | 369 | 159 | 253
-4 | 507 | 760 | 317 | 457
-8 | 966 | 1410 | 609 | 690
-
-## Details for Google Compute Engine (NVIDIA® Tesla® K80)
-
-### Environment
-
-* **Instance type**: n1-standard-32-k80x8
-* **GPU:** 8x NVIDIA® Tesla® K80
-* **OS:** Ubuntu 16.04 LTS
-* **CUDA / cuDNN:** 8.0 / 5.1
-* **TensorFlow GitHub hash:** b1e174e
-* **Benchmark GitHub hash:** 9165a70
-* **Build Command:** `bazel build -c opt --copt=-march="haswell" --config=cuda
- //tensorflow/tools/pip_package:build_pip_package`
-* **Disk:** 1.7 TB Shared SSD persistent disk (800 MB/s)
-* **DataSet:** ImageNet
-* **Test Date:** May 2017
-
-Batch size and optimizer used for each model are listed in the table below. In
-addition to the batch sizes listed in the table, InceptionV3 and ResNet-50 were
-tested with a batch size of 32. Those results are in the *other results*
-section.
-
-Options | InceptionV3 | ResNet-50 | ResNet-152 | AlexNet | VGG16
------------------- | ----------- | --------- | ---------- | ------- | -----
-Batch size per GPU | 64 | 64 | 32 | 512 | 32
-Optimizer | sgd | sgd | sgd | sgd | sgd
-
-The configuration used for each model was `variable_update` equal to
-`parameter_server` and `local_parameter_device` equal to `cpu`.
-
-### Results
-
-<div style="width:95%; margin:auto; margin-bottom:10px; margin-top:20px;">
- <img style="width:35%" src="../images/perf_gce_synth_k80_single_server_scaling.png">
- <img style="width:35%" src="../images/perf_gce_real_k80_single_server_scaling.png">
-</div>
-
-**Training synthetic data**
-
-GPUs | InceptionV3 | ResNet-50 | ResNet-152 | AlexNet | VGG16
----- | ----------- | --------- | ---------- | ------- | -----
-1 | 30.5 | 51.9 | 20.0 | 656 | 35.4
-2 | 57.8 | 99.0 | 38.2 | 1209 | 64.8
-4 | 116 | 195 | 75.8 | 2328 | 120
-8 | 227 | 387 | 148 | 4640 | 234
-
-**Training real data**
-
-GPUs | InceptionV3 | ResNet-50 | ResNet-152 | AlexNet | VGG16
----- | ----------- | --------- | ---------- | ------- | -----
-1 | 30.6 | 51.2 | 20.0 | 639 | 34.2
-2 | 58.4 | 98.8 | 38.3 | 1136 | 62.9
-4 | 115 | 194 | 75.4 | 2067 | 118
-8 | 225 | 381 | 148 | 4056 | 230
-
-### Other Results
-
-**Training synthetic data**
-
-GPUs | InceptionV3 (batch size 32) | ResNet-50 (batch size 32)
----- | --------------------------- | -------------------------
-1 | 29.3 | 49.5
-2 | 55.0 | 95.4
-4 | 109 | 183
-8 | 216 | 362
-
-**Training real data**
-
-GPUs | InceptionV3 (batch size 32) | ResNet-50 (batch size 32)
----- | --------------------------- | -------------------------
-1 | 29.5 | 49.3
-2 | 55.4 | 95.3
-4 | 110 | 186
-8 | 216 | 359
-
-## Details for Amazon EC2 (NVIDIA® Tesla® K80)
-
-### Environment
-
-* **Instance type**: p2.8xlarge
-* **GPU:** 8x NVIDIA® Tesla® K80
-* **OS:** Ubuntu 16.04 LTS
-* **CUDA / cuDNN:** 8.0 / 5.1
-* **TensorFlow GitHub hash:** b1e174e
-* **Benchmark GitHub hash:** 9165a70
-* **Build Command:** `bazel build -c opt --copt=-march="haswell" --config=cuda
- //tensorflow/tools/pip_package:build_pip_package`
-* **Disk:** 1TB Amazon EFS (burst 100 MiB/sec for 12 hours, continuous 50
- MiB/sec)
-* **DataSet:** ImageNet
-* **Test Date:** May 2017
-
-Batch size and optimizer used for each model are listed in the table below. In
-addition to the batch sizes listed in the table, InceptionV3 and ResNet-50 were
-tested with a batch size of 32. Those results are in the *other results*
-section.
-
-Options | InceptionV3 | ResNet-50 | ResNet-152 | AlexNet | VGG16
------------------- | ----------- | --------- | ---------- | ------- | -----
-Batch size per GPU | 64 | 64 | 32 | 512 | 32
-Optimizer | sgd | sgd | sgd | sgd | sgd
-
-Configuration used for each model.
-
-Model | variable_update | local_parameter_device
------------ | ------------------------- | ----------------------
-InceptionV3 | parameter_server | cpu
-ResNet-50 | replicated (without NCCL) | gpu
-ResNet-152 | replicated (without NCCL) | gpu
-AlexNet | parameter_server | gpu
-VGG16 | parameter_server | gpu
-
-### Results
-
-<div style="width:95%; margin:auto; margin-bottom:10px; margin-top:20px;">
- <img style="width:35%" src="../images/perf_aws_synth_k80_single_server_scaling.png">
- <img style="width:35%" src="../images/perf_aws_real_k80_single_server_scaling.png">
-</div>
-
-**Training synthetic data**
-
-GPUs | InceptionV3 | ResNet-50 | ResNet-152 | AlexNet | VGG16
----- | ----------- | --------- | ---------- | ------- | -----
-1 | 30.8 | 51.5 | 19.7 | 684 | 36.3
-2 | 58.7 | 98.0 | 37.6 | 1244 | 69.4
-4 | 117 | 195 | 74.9 | 2479 | 141
-8 | 230 | 384 | 149 | 4853 | 260
-
-**Training real data**
-
-GPUs | InceptionV3 | ResNet-50 | ResNet-152 | AlexNet | VGG16
----- | ----------- | --------- | ---------- | ------- | -----
-1 | 30.5 | 51.3 | 19.7 | 674 | 36.3
-2 | 59.0 | 94.9 | 38.2 | 1227 | 67.5
-4 | 118 | 188 | 75.2 | 2201 | 136
-8 | 228 | 373 | 149 | N/A | 242
-
-Training AlexNet with real data on 8 GPUs was excluded from the graph and
-table above because our EFS setup did not provide enough throughput.
-
-### Other Results
-
-**Training synthetic data**
-
-GPUs | InceptionV3 (batch size 32) | ResNet-50 (batch size 32)
----- | --------------------------- | -------------------------
-1 | 29.9 | 49.0
-2 | 57.5 | 94.1
-4 | 114 | 184
-8 | 216 | 355
-
-**Training real data**
-
-GPUs | InceptionV3 (batch size 32) | ResNet-50 (batch size 32)
----- | --------------------------- | -------------------------
-1 | 30.0 | 49.1
-2 | 57.5 | 95.1
-4 | 113 | 185
-8 | 212 | 353
-
-## Details for Amazon EC2 Distributed (NVIDIA® Tesla® K80)
-
-### Environment
-
-* **Instance type**: p2.8xlarge
-* **GPU:** 8x NVIDIA® Tesla® K80
-* **OS:** Ubuntu 16.04 LTS
-* **CUDA / cuDNN:** 8.0 / 5.1
-* **TensorFlow GitHub hash:** b1e174e
-* **Benchmark GitHub hash:** 9165a70
-* **Build Command:** `bazel build -c opt --copt=-march="haswell" --config=cuda
- //tensorflow/tools/pip_package:build_pip_package`
-* **Disk:** 1.0 TB EFS (burst 100 MB/sec for 12 hours, continuous 50 MB/sec)
-* **DataSet:** ImageNet
-* **Test Date:** May 2017
-
-The batch size and optimizer used for the tests are listed in the table. In
-addition to the batch sizes listed in the table, InceptionV3 and ResNet-50 were
-tested with a batch size of 32. Those results are in the *other results*
-section.
-
-Options | InceptionV3 | ResNet-50 | ResNet-152
------------------- | ----------- | --------- | ----------
-Batch size per GPU | 64 | 64 | 32
-Optimizer | sgd | sgd | sgd
-
-Configuration used for each model.
-
-Model | variable_update | local_parameter_device | cross_replica_sync
------------ | ---------------------- | ---------------------- | ------------------
-InceptionV3 | distributed_replicated | n/a | True
-ResNet-50 | distributed_replicated | n/a | True
-ResNet-152 | distributed_replicated | n/a | True
-
-To simplify server setup, EC2 instances (p2.8xlarge) running worker servers
-also ran parameter servers. Equal numbers of parameter servers and worker
-servers were used, with the following exceptions (an illustrative launch
-sketch follows the list):
-
-* InceptionV3: 8 instances / 6 parameter servers
-* ResNet-50 (batch size 32): 8 instances / 4 parameter servers
-* ResNet-152: 8 instances / 4 parameter servers
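-
-As a rough sketch of this setup, each instance might launch one parameter
-server and one worker process along the lines below. The host lists, script
-path, and flag names are assumptions for illustration based on the benchmark
-repository at the time, not the exact commands used.
-
-```python
-import subprocess
-
-# Hypothetical distributed launch on one p2.8xlarge instance that hosts both a
-# parameter server and a worker (host lists shortened for illustration).
-ps_hosts = 'host1:2222,host2:2222'
-worker_hosts = 'host1:2223,host2:2223'
-
-common_flags = [
-    '--model=resnet50', '--num_gpus=8', '--batch_size=64',
-    '--variable_update=distributed_replicated', '--cross_replica_sync=True',
-    '--ps_hosts=' + ps_hosts, '--worker_hosts=' + worker_hosts,
-    '--task_index=0',
-]
-# Parameter server process on this instance.
-subprocess.Popen(
-    ['python', 'tf_cnn_benchmarks.py', '--job_name=ps'] + common_flags)
-# Worker process on this instance.
-subprocess.Popen(
-    ['python', 'tf_cnn_benchmarks.py', '--job_name=worker'] + common_flags)
-```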
-
-### Results
-
-<div style="width:95%; margin:auto; margin-bottom:10px; margin-top:20px;">
- <img style="width:80%" src="../images/perf_summary_k80_aws_distributed.png">
-</div>
-
-<div style="width:95%; margin:auto; margin-bottom:10px; margin-top:20px;">
- <img style="width:70%" src="../images/perf_aws_synth_k80_distributed_scaling.png">
-</div>
-
-**Training synthetic data**
-
-GPUs | InceptionV3 | ResNet-50 | ResNet-152
----- | ----------- | --------- | ----------
-1 | 29.7 | 52.4 | 19.4
-8 | 229 | 378 | 146
-16 | 459 | 751 | 291
-32 | 902 | 1388 | 565
-64 | 1783 | 2744 | 981
-
-### Other Results
-
-<div style="width:95%; margin:auto; margin-bottom:10px; margin-top:20px;">
- <img style="width:50%" src="../images/perf_aws_synth_k80_multi_server_batch32.png">
-</div>
-
-**Training synthetic data**
-
-GPUs | InceptionV3 (batch size 32) | ResNet-50 (batch size 32)
----- | --------------------------- | -------------------------
-1 | 29.2 | 48.4
-8 | 219 | 333
-16 | 427 | 667
-32 | 820 | 1180
-64 | 1608 | 2315
-
-## Methodology
-
-This
-[script](https://github.com/tensorflow/benchmarks/tree/master/scripts/tf_cnn_benchmarks)
-was run on the various platforms to generate the above results.
-
-To make the results as repeatable as possible, each test was run 5 times and
-the results were averaged. GPUs were run in their default state on the given
-platform. For the NVIDIA® Tesla® K80 this means leaving [GPU
-Boost](https://devblogs.nvidia.com/parallelforall/increase-performance-gpu-boost-k80-autoboost/)
-enabled. For each test, 10 warmup steps were run and then the next 100 steps
-were averaged.
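-
-As a minimal sketch of that averaging scheme (an assumed helper, not the
-actual benchmark code), per-step timings can be reduced to an images/sec
-figure like this:
-
-```python
-import numpy as np
-
-# Hypothetical post-processing that mirrors the methodology above: skip 10
-# warmup steps, average the next 100, then average the 5 repeated runs.
-def run_images_per_sec(step_times, batch_size, num_gpus,
-                       warmup_steps=10, measured_steps=100):
-    measured = step_times[warmup_steps:warmup_steps + measured_steps]
-    return (batch_size * num_gpus) / np.mean(measured)
-
-def benchmark_images_per_sec(runs, batch_size, num_gpus):
-    return np.mean([run_images_per_sec(r, batch_size, num_gpus) for r in runs])
-```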