Diffstat (limited to 'tensorflow/docs_src/performance/benchmarks.md')
-rw-r--r-- | tensorflow/docs_src/performance/benchmarks.md | 128
1 file changed, 77 insertions, 51 deletions
diff --git a/tensorflow/docs_src/performance/benchmarks.md b/tensorflow/docs_src/performance/benchmarks.md
index 8c0cff138d..bfb47d9f90 100644
--- a/tensorflow/docs_src/performance/benchmarks.md
+++ b/tensorflow/docs_src/performance/benchmarks.md
@@ -1,17 +1,17 @@
-# TensorFlow Performance Benchmarks
+# Benchmarks
 
 ## Overview
 
 A selection of image classification models were tested across multiple platforms
 to create a point of reference for the TensorFlow community. The methodology,
-links to the scripts, and commands to reproduce the results are in the
-[appendix](#appendix).
+links to the benchmark scripts, and commands to reproduce the results are in the
+[Appendix](#appendix).
 
 ## Results for image classification models
 
-InceptionV3 ([arXiv:1512.00567](https://arxiv.org/abs/1512.00567)),
-ResNet-50 ([arXiv:1512.03385](https://arxiv.org/abs/1512.03385)),
-ResNet-152 ([arXiv:1512.03385](https://arxiv.org/abs/1512.03385)), VGG16
+InceptionV3 ([arXiv:1512.00567](https://arxiv.org/abs/1512.00567)), ResNet-50
+([arXiv:1512.03385](https://arxiv.org/abs/1512.03385)), ResNet-152
+([arXiv:1512.03385](https://arxiv.org/abs/1512.03385)), VGG16
 ([arXiv:1409.1556](https://arxiv.org/abs/1409.1556)), and
 [AlexNet](http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf)
 were tested using the [ImageNet](http://www.image-net.org/) data set. Tests were
@@ -27,32 +27,32 @@ input pipeline and the underlying disk I/O are saturating the compute units.
 
 ### Training with NVIDIA® DGX-1™ (NVIDIA® Tesla® P100)
 
-<div style="width:100%; margin:auto; margin-bottom:10px; margin-top:20px;">
-  <img style="width:100%" src="../images/perf_summary_p100_single_server.png">
+<div style="width:95%; margin:auto; margin-bottom:10px; margin-top:20px;">
+  <img style="width:80%" src="../images/perf_summary_p100_single_server.png">
 </div>
 
 Details and additional results are in the [Details for NVIDIA® DGX-1™ (NVIDIA®
-Tesla® P100)](#details-for-nvidia®-dgx-1™-nvidia®-tesla®-p100) section.
+Tesla® P100)](#details_for_nvidia_dgx-1tm_nvidia_tesla_p100) section.
 
 ### Training with NVIDIA® Tesla® K80
 
 <div style="width:95%; margin:auto; margin-bottom:10px; margin-top:20px;">
-  <img style="width:100%" src="../images/perf_summary_k80_single_server.png">
+  <img style="width:80%" src="../images/perf_summary_k80_single_server.png">
 </div>
 
 Details and additional results are in the [Details for Google Compute Engine
-(NVIDIA® Tesla® K80)](#details-for-google-compute-engine-nvidia®-tesla®-k80) and
+(NVIDIA® Tesla® K80)](#details_for_google_compute_engine_nvidia_tesla_k80) and
 [Details for Amazon EC2 (NVIDIA® Tesla®
-K80)](#details-for-amazon-ec2-nvidia®-tesla®-k80) sections.
+K80)](#details_for_amazon_ec2_nvidia_tesla_k80) sections.
 
 ### Distributed training with NVIDIA® Tesla® K80
 
 <div style="width:95%; margin:auto; margin-bottom:10px; margin-top:20px;">
-  <img style="width:100%" src="../images/perf_summary_k80_aws_distributed.png">
+  <img style="width:80%" src="../images/perf_summary_k80_aws_distributed.png">
 </div>
 
 Details and additional results are in the [Details for Amazon EC2 Distributed
-(NVIDIA® Tesla® K80)](#details-for-amazon-ec2-distributed-nvidia®-tesla®-k80)
+(NVIDIA® Tesla® K80)](#details_for_amazon_ec2_distributed_nvidia_tesla_k80)
 section.
 
 ### Compare synthetic with real data training
@@ -82,12 +82,15 @@ section.
 * **TensorFlow GitHub hash:** b1e174e
 * **Build Command:** `bazel build -c opt --copt=-march="haswell" --config=cuda
   //tensorflow/tools/pip_package:build_pip_package`
-* **Disk:** local SSD
+* **Disk:** Local SSD
 * **DataSet:** ImageNet
 
-Batch size and optimizer used for each model.
+Batch size and optimizer used for each model are listed in the table below. In
+addition to the batch sizes listed in the table, InceptionV3, ResNet-50,
+ResNet-152, and VGG16 were tested with a batch size of 32. Those results are in
+the *other results* section.
 
-                   | InceptionV3 | ResNet-50 | ResNet-152 | Alexnet | VGG16
+Options            | InceptionV3 | ResNet-50 | ResNet-152 | Alexnet | VGG16
 ------------------ | ----------- | --------- | ---------- | ------- | -----
 Batch size per GPU | 64          | 64        | 64         | 512     | 64
 Optimizer          | sgd         | sgd       | sgd        | sgd     | sgd
@@ -104,10 +107,8 @@ VGG16       | replicated (with NCCL)    | n/a
 
 ### Results
 
-Batch size and optimizer used for each model are listed in the table below.
-
 <div style="width:95%; margin:auto; margin-bottom:10px; margin-top:20px;">
-  <img style="width:100%" src="../images/perf_summary_p100_single_server.png">
+  <img style="width:80%" src="../images/perf_summary_p100_single_server.png">
 </div>
 
 <div style="width:95%; margin:auto; margin-bottom:10px; margin-top:20px;">
@@ -136,6 +137,28 @@ GPUs | InceptionV3 | ResNet-50 | ResNet-152 | Alexnet | VGG16
 
 Training AlexNet with real data on 8 GPUs was excluded from the graph and table
 above due to it maxing out the input pipeline.
 
+### Other Results
+
+The results below are all with a batch size of 32.
+
+**Training synthetic data**
+
+GPUs | InceptionV3 | ResNet-50 | ResNet-152 | VGG16
+---- | ----------- | --------- | ---------- | -----
+1    | 128         | 210       | 85.3       | 124
+2    | 259         | 412       | 166        | 241
+4    | 520         | 827       | 330        | 470
+8    | 995         | 1623      | 643        | 738
+
+**Training real data**
+
+GPUs | InceptionV3 | ResNet-50 | ResNet-152 | VGG16
+---- | ----------- | --------- | ---------- | -----
+1    | 130         | 208       | 85.0       | 124
+2    | 257         | 403       | 163        | 221
+4    | 507         | 814       | 325        | 401
+8    | 966         | 1525      | 641        | 619
+
 ## Details for Google Compute Engine (NVIDIA® Tesla® K80)
 
 ### Environment
@@ -156,7 +179,7 @@ addition to the batch sizes listed in the
 table, InceptionV3 and ResNet-50 were tested with a batch size of 32. Those
 results are in the *other results* section.
 
-                   | InceptionV3 | ResNet-50 | ResNet-152 | Alexnet | VGG16
+Options            | InceptionV3 | ResNet-50 | ResNet-152 | Alexnet | VGG16
 ------------------ | ----------- | --------- | ---------- | ------- | -----
 Batch size per GPU | 64          | 64        | 32         | 512     | 32
 Optimizer          | sgd         | sgd       | sgd        | sgd     | sgd
@@ -184,10 +207,10 @@ GPUs | InceptionV3 | ResNet-50 | ResNet-152 | Alexnet | VGG16
 
 GPUs | InceptionV3 | ResNet-50 | ResNet-152 | Alexnet | VGG16
 ---- | ----------- | --------- | ---------- | ------- | -----
-1    | 30.5        | 56.7      | 20.7       | 639     | 30.2
-2    | 57.8        | 107       | 39         | 1136    | 55.5
-4    | 115         | 211       | 77.3       | 2067    | 106
-8    | 225         | 418       | 150        | 4056    | 213
+   1 | 30.6        | 56.7      | 20.7       | 639     | 30.2
+   2 | 58.4        | 107       | 39.0       | 1136    | 55.5
+   4 | 115         | 211       | 77.3       | 2067    | 106
+   8 | 225         | 422       | 151        | 4056    | 213
 
 ### Other Results
 
@@ -204,10 +227,10 @@ GPUs | InceptionV3 (batch size 32) | ResNet-50 (batch size 32)
 
 GPUs | InceptionV3 (batch size 32) | ResNet-50 (batch size 32)
 ---- | --------------------------- | -------------------------
-1    | 29.3                        | 53.6
-2    | 55                          | 102
-4    | 109                         | 200
-8    | 215                         | 387
+   1 | 29.5                        | 53.6
+   2 | 55.4                        | 102
+   4 | 110                         | 201
+   8 | 216                         | 387
 
 ## Details for Amazon EC2 (NVIDIA® Tesla® K80)
 
@@ -230,7 +253,7 @@ addition to the batch sizes listed in the
 table, InceptionV3 and ResNet-50 were tested with a batch size of 32. Those
 results are in the *other results* section.
 
-                   | InceptionV3 | ResNet-50 | ResNet-152 | Alexnet | VGG16
+Options            | InceptionV3 | ResNet-50 | ResNet-152 | Alexnet | VGG16
 ------------------ | ----------- | --------- | ---------- | ------- | -----
 Batch size per GPU | 64          | 64        | 32         | 512     | 32
 Optimizer          | sgd         | sgd       | sgd        | sgd     | sgd
@@ -289,7 +312,7 @@ GPUs | InceptionV3 (batch size 32) | ResNet-50 (batch size 32)
 
 GPUs | InceptionV3 (batch size 32) | ResNet-50 (batch size 32)
 ---- | --------------------------- | -------------------------
 1    | 30.0                        | 53.6
-2    | 57.5                        | 101
+2    | 57.5                        | 102
 4    | 113                         | 202
 8    | 212                         | 379
 
@@ -313,7 +336,7 @@ addition to the batch sizes listed in the
 table, InceptionV3 and ResNet-50 were tested with a batch size of 32. Those
 results are in the *other results* section.
 
-                   | InceptionV3 | ResNet-50 | ResNet-152
+Options            | InceptionV3 | ResNet-50 | ResNet-152
 ------------------ | ----------- | --------- | ----------
 Batch size per GPU | 64          | 64        | 32
 Optimizer          | sgd         | sgd       | sgd
@@ -337,7 +360,7 @@ used with the following exceptions:
 ### Results
 
 <div style="width:95%; margin:auto; margin-bottom:10px; margin-top:20px;">
-  <img style="width:95%" src="../images/perf_summary_k80_aws_distributed.png">
+  <img style="width:80%" src="../images/perf_summary_k80_aws_distributed.png">
 </div>
 
 <div style="width:95%; margin:auto; margin-bottom:10px; margin-top:20px;">
@@ -374,34 +397,37 @@ GPUs | InceptionV3 (batch size 32) | ResNet-50 (batch size 32)
 
 ### Executing benchmark tests
 
-The code for the benchmarks was created to be both used for benchmarking
-TensorFlow as well as used as a tool to test hardware platforms. The benchmark
-code includes modes such as `trivial` that run a virtually empty model that is
-useful for testing the maximum possibly samples/sec for the input pipeline among
-other things. Not only does this test TensorFlow but also the throughput of the
-underlying systems. There are two ways to execute the benchmarks in
-[tf_cnn_benchmarks.py](TODO: LINK TO GITHUB):
+The [benchmark code](https://github.com/tensorflow/benchmarks/tree/master/scripts/tf_cnn_benchmarks)
+was created to be used for benchmarking TensorFlow as well as used as a tool to
+test hardware platforms. Techniques used in the benchmark scripts are detailed
+in @{$performance_models$High-Performance Models}.
+
+There are two ways to execute the benchmark code:
 
-1. Execute [tf_cnn_benchmarks.py](TODO: LINK TO GITHUB) directly
-2. Utilize the [small wrapper](TODO: LINK TO GITHUB) that helps pick the
-   correct config
+1. Execute [tf_cnn_benchmarks.py](https://github.com/tensorflow/benchmarks/tree/master/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py)
+   directly.
+2. Utilize the [script](https://github.com/tensorflow/benchmarks/tree/master/scripts/tf_cnn_benchmarks/main.py)
+   that helps pick the correct config for each platform and executes
+   `tf_cnn_benchmarks.py`.
 
 The wrapper is suggested as a starting point. Then investigate the variety of
-options available in `tf_cnn_benchmarks.py`. While the wrapper extensive
-examples, below are a couple highlights.
+options available in `tf_cnn_benchmarks.py`. Below are a couple of examples of
+using the wrapper.
 
-Run ResNet-50 on a single instance with 8 GPUs. The `system` argument is used to
-determine the optimal configuration. The supported values are gce, aws, and
-dgx1. If `system` is not passeed, the best config for the most widely available
-hardware is used.
+**Single Server**
+This example illustrates training ResNet-50 on a single instance with 8 GPUs.
+The `system` flag is used to determine the optimal configuration. The
+supported values are gce, aws, and dgx1. If `system` is not passed, the best
+config for the most widely available hardware is used.
 
 ```bash
 python main.py --model=resnet50 --num_gpus=8
 python main.py --system=aws --model=resnet50 --num_gpus=8
 ```
 
-Run ResNet-50 on 2 hosts, e.g. host_0 (10.0.0.1) and host_1 (10.0.0.2), with 8
-GPUs each on aws.
+**Distributed**
+This example illustrates training ResNet-50 on 2 hosts, e.g. host_0 (10.0.0.1)
+and host_1 (10.0.0.2), with 8 GPUs each on AWS (Amazon EC2).
 
 ```bash
 # Run the following commands on host_0 (10.0.0.1):
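The throughput tables added by this patch are in images/sec, so multi-GPU scaling efficiency can be sanity-checked directly from the quoted numbers. A minimal sketch (not part of the patch; the helper name is ours) using the batch-32 synthetic-data ResNet-50 figures from the diff above:

```python
def scaling_efficiency(single_gpu_ips, multi_gpu_ips, num_gpus):
    """Fraction of perfect linear scaling achieved (1.0 = ideal)."""
    return multi_gpu_ips / (num_gpus * single_gpu_ips)

# ResNet-50, batch size 32, synthetic data (images/sec), as quoted in the diff.
resnet50_batch32 = {1: 210, 2: 412, 4: 827, 8: 1623}

for gpus, ips in resnet50_batch32.items():
    eff = scaling_efficiency(resnet50_batch32[1], ips, gpus)
    print(f"{gpus} GPU(s): {ips} images/sec, {eff:.1%} of linear scaling")
```

At 8 GPUs this works out to 1623 / (8 × 210) ≈ 96.6% of linear scaling, which is consistent with the near-linear speedup the summary graphs describe.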