aboutsummaryrefslogtreecommitdiffhomepage
path: root/tensorflow/docs_src/performance/performance_guide.md
diff options
context:
space:
mode:
Diffstat (limited to 'tensorflow/docs_src/performance/performance_guide.md')
-rw-r--r--tensorflow/docs_src/performance/performance_guide.md16
1 files changed, 8 insertions, 8 deletions
diff --git a/tensorflow/docs_src/performance/performance_guide.md b/tensorflow/docs_src/performance/performance_guide.md
index df70309568..9ea1d6a705 100644
--- a/tensorflow/docs_src/performance/performance_guide.md
+++ b/tensorflow/docs_src/performance/performance_guide.md
@@ -41,7 +41,7 @@ approaches to identifying issues:
utilization is not approaching 80-100%, then the input pipeline may be the
bottleneck.
* Generate a timeline and look for large blocks of white space (waiting). An
- example of generating a timeline exists as part of the @{$jit$XLA JIT}
+ example of generating a timeline exists as part of the [XLA JIT](../performance/xla/jit.md)
tutorial.
* Check CPU usage. It is possible to have an optimized input pipeline and lack
the CPU cycles to process the pipeline.
@@ -68,7 +68,7 @@ the CPU.
#### Using the tf.data API
-The @{$datasets$tf.data API} is replacing `queue_runner` as the recommended API
+The [tf.data API](../guide/datasets.md) is replacing `queue_runner` as the recommended API
for building input pipelines. This
[ResNet example](https://github.com/tensorflow/models/tree/master/tutorials/image/cifar10_estimator/cifar10_main.py)
([arXiv:1512.03385](https://arxiv.org/abs/1512.03385))
@@ -78,7 +78,7 @@ training CIFAR-10 illustrates the use of the `tf.data` API along with
The `tf.data` API utilizes C++ multi-threading and has a much lower overhead
than the Python-based `queue_runner` that is limited by Python's multi-threading
performance. A detailed performance guide for the `tf.data` API can be found
-@{$datasets_performance$here}.
+[here](../performance/datasets_performance.md).
While feeding data using a `feed_dict` offers a high level of flexibility, in
general `feed_dict` does not provide a scalable solution. If only a single GPU
@@ -174,7 +174,7 @@ faster using `NHWC` than the normally most efficient `NCHW`.
### Common fused Ops
Fused Ops combine multiple operations into a single kernel for improved
-performance. There are many fused Ops within TensorFlow and @{$xla$XLA} will
+performance. There are many fused Ops within TensorFlow and [XLA](../performance/xla/index.md) will
create fused Ops when possible to automatically improve performance. Collected
below are select fused Ops that can greatly improve performance and may be
overlooked.
@@ -257,7 +257,7 @@ the CPU in use. Speedups for training and inference on CPU are documented below
in [Comparing compiler optimizations](#comparing-compiler-optimizations).
To install the most optimized version of TensorFlow,
-@{$install_sources$build and install} from source. If there is a need to build
+[build and install](../install/install_sources.md) from source. If there is a need to build
TensorFlow on a platform that has different hardware than the target, then
cross-compile with the highest optimizations for the target platform. The
following command is an example of using `bazel` to compile for a specific
@@ -298,7 +298,7 @@ each of the towers. How each tower gets the updated variables and how the
gradients are applied has an impact on the performance, scaling, and convergence
of the model. The rest of this section provides an overview of variable
placement and the towering of a model on multiple GPUs.
-@{$performance_models$High-Performance Models} gets into more details regarding
+[High-Performance Models](../performance/performance_models.md) gets into more details regarding
more complex methods that can be used to share and update variables between
towers.
@@ -307,7 +307,7 @@ and even how the hardware has been configured. An example of this, is that two
systems can be built with NVIDIA Tesla P100s but one may be using PCIe and the
other [NVLink](http://www.nvidia.com/object/nvlink.html). In that scenario, the
optimal solution for each system may be different. For real world examples, read
-the @{$performance/benchmarks$benchmark} page which details the settings that
+the [benchmark](../performance/benchmarks.md) page which details the settings that
were optimal for a variety of platforms. Below is a summary of what was learned
from benchmarking various platforms and configurations:
@@ -433,7 +433,7 @@ scenarios.
## Optimizing for CPU
CPUs, which includes Intel® Xeon Phi™, achieve optimal performance when
-TensorFlow is @{$install_sources$built from source} with all of the instructions
+TensorFlow is [built from source](../install/install_sources.md) with all of the instructions
supported by the target CPU.
Beyond using the latest instruction sets, Intel® has added support for the