# Performance

Performance is an important consideration when training machine learning
models. Faster training shortens research iterations and lets experiments
scale, while faster inference gives end users near-instant predictions. This
section describes the high-level APIs to use, along with best practices for
building and training high-performance models and for quantizing models to
achieve the lowest latency and highest throughput at inference time.

* @{$performance_guide$Performance Guide} contains a collection of best
practices for optimizing your TensorFlow code.
* @{$datasets_performance$Data input pipeline guide} describes the `tf.data`
API for building efficient data input pipelines for TensorFlow.
* @{$performance/benchmarks$Benchmarks} contains a collection of
benchmark results for a variety of hardware configurations.
* For improving inference efficiency on mobile and
embedded hardware, see
@{$quantization$How to Quantize Neural Networks with TensorFlow}, which
explains how to use quantization to reduce model size, both in storage
and at runtime.
* For optimizing inference on GPUs, refer to
[NVIDIA TensorRT™ integration with TensorFlow](https://medium.com/tensorflow/speed-up-tensorflow-inference-on-gpus-with-tensorrt-13b49f3db3fa).
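
To make the quantization item above concrete, here is a minimal pure-Python
sketch of one common scheme: linear 8-bit quantization over a tensor's
min/max range. It is an illustration under stated assumptions, not
necessarily the method used in the linked guide. The helper names `quantize`
and `dequantize` and the sample weights are hypothetical.

```python
from typing import List, Tuple


def quantize(values: List[float]) -> Tuple[List[int], float, float]:
    """Map floats onto 256 evenly spaced levels between min and max."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 255
    if scale == 0.0:
        # Constant input: any non-zero scale works, every code becomes 0.
        scale = 1.0
    codes = [round((v - lo) / scale) for v in values]
    return codes, lo, scale


def dequantize(codes: List[int], lo: float, scale: float) -> List[float]:
    """Recover approximate floats from the 8-bit codes."""
    return [lo + c * scale for c in codes]


weights = [-1.0, -0.25, 0.0, 0.5, 1.0]  # hypothetical layer weights
codes, lo, scale = quantize(weights)
packed = bytes(codes)  # one byte per weight instead of a 4-byte float32
restored = dequantize(codes, lo, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Each weight is stored in one byte plus a shared offset and scale per tensor,
which is roughly a 4x reduction compared with 32-bit floats. The price is a
rounding error of at most about half a quantization step (`scale / 2`) per
value.
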

XLA (Accelerated Linear Algebra) is an experimental compiler for linear
algebra that optimizes TensorFlow computations. The following guides explore
XLA:

* @{$xla$XLA Overview}, which introduces XLA.
* @{$broadcasting$Broadcasting Semantics}, which describes XLA's
broadcasting semantics.
* @{$developing_new_backend$Developing a new back end for XLA}, which
explains how to re-target TensorFlow in order to optimize the performance
of the computational graph for particular hardware.
* @{$jit$Using JIT Compilation}, which describes the XLA JIT compiler that
compiles and runs parts of TensorFlow graphs via XLA in order to optimize
performance.
* @{$operation_semantics$Operation Semantics}, which is a reference manual
describing the semantics of operations in the `ComputationBuilder`
interface.
* @{$shapes$Shapes and Layout}, which details the `Shape` protocol buffer.
* @{$tfcompile$Using AOT compilation}, which explains `tfcompile`, a
standalone tool that compiles TensorFlow graphs into executable code in
order to optimize performance.