aboutsummaryrefslogtreecommitdiffhomepage
path: root/tensorflow/tools/tfprof/README.md
blob: 54f3cd62f283de853bb1b14e61c96c81f77702b5 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
# tfprof: TensorFlow Profiler and Beyond

### Features

* Profile model architectures
  * parameters, tensor shapes, float operations, device placement, etc.
* Profile model performance
  * execution time, memory consumption
  * Profile multiple steps.
* Auto detect and advise. (Experimental)

### Interfaces

* Python API
* Command Line
* Visualization
* C++ API (Not public, contact us if needed.)

### Views and Options

tfprof provides 4 different views to organize the statistics.

    *  code view: operations are grouped by Python codes that generate them.
    *  op view: operations are grouped by operation type (E.g. MatMul, Conv2D).
    *  scope view: operations are organized based on name scope hierarchies.
    *  graph view: operations are organized based on input/output.

tfprof provides options to help user select, filter and order statistics.
See [Options](g3doc/options.md) for detail instructions.

```
-max_depth                  10
-min_bytes                  0
-min_micros                 0
-min_params                 0
-min_float_ops              0
-min_occurrence             0
-step                       -1
-order_by                   name
-account_type_regexes       .*
-start_name_regexes         .*
-trim_name_regexes
-show_name_regexes          .*
-hide_name_regexes
-account_displayed_op_only  false
-select                     params
-output                     stdout:
```

### Tutorials

*  [Python API](g3doc/python_api.md)
*  [Command Line Interface](g3doc/command_line.md)
*  [Profile Time](g3doc/profile_time.md)
*  [Profile Memory](g3doc/profile_memory.md)
*  [Profile Model Architecture](g3doc/profile_model_architecture.md)
*  [Auto Detect and Advise](g3doc/advise.md)
*  [Options](g3doc/options.md)

## Demo

### Attribute TensorFlow graph running time to your Python codes.
```shell
tfprof> code -max_depth 1000 -show_name_regexes .*model_analyzer.*py.* -select micros -account_type_regexes .* -order_by micros
_TFProfRoot (0us/22.44ms)
  model_analyzer_test.py:149:run_filename_as_m...:none (0us/22.44ms)
    model_analyzer_test.py:33:_run_code_in_main:none (0us/22.44ms)
      model_analyzer_test.py:208:<module>:test.main() (0us/22.44ms)
        model_analyzer_test.py:132:testComplexCodeView:x = lib.BuildFull... (0us/22.44ms)
          model_analyzer_testlib.py:63:BuildFullModel:return sgd_op.min... (0us/21.83ms)
          model_analyzer_testlib.py:58:BuildFullModel:cell, array_ops.c... (0us/333us)
          model_analyzer_testlib.py:54:BuildFullModel:seq.append(array_... (0us/254us)
            model_analyzer_testlib.py:42:BuildSmallModel:x = nn_ops.conv2d... (0us/134us)
            model_analyzer_testlib.py:46:BuildSmallModel:initializer=init_... (0us/40us)
            ...
          model_analyzer_testlib.py:61:BuildFullModel:loss = nn_ops.l2_... (0us/28us)
          model_analyzer_testlib.py:60:BuildFullModel:target = array_op... (0us/0us)
        model_analyzer_test.py:134:testComplexCodeView:sess.run(variable... (0us/0us)
```

### Show your model variables and the number of parameters.
```
tfprof> scope -account_type_regexes VariableV2 -max_depth 4 -select params
_TFProfRoot (--/930.58k params)
  global_step (1/1 params)
  init/init_conv/DW (3x3x3x16, 432/864 params)
  pool_logit/DW (64x10, 640/1.28k params)
    pool_logit/DW/Momentum (64x10, 640/640 params)
  pool_logit/biases (10, 10/20 params)
    pool_logit/biases/Momentum (10, 10/10 params)
  unit_last/final_bn/beta (64, 64/128 params)
  unit_last/final_bn/gamma (64, 64/128 params)
  unit_last/final_bn/moving_mean (64, 64/64 params)
  unit_last/final_bn/moving_variance (64, 64/64 params)
```

### Show the most expensive operation types.
```
tfprof> op -select micros,bytes,occurrence -order_by micros
SoftmaxCrossEntropyWithLogits      36.58MB (100.00%, 0.05%),      1.37sec (100.00%, 23.56%),         30
MatMul                        2720.57MB (99.95%, 3.66%),      988.90ms (76.44%, 17.05%),       3450
ConcatV2                       741.37MB (96.29%, 1.00%),       421.44ms (59.38%, 7.27%),       6098
Mul                           3957.24MB (95.29%, 5.33%),       418.90ms (52.12%, 7.22%),       9427
Add                            740.05MB (89.96%, 1.00%),       335.26ms (44.89%, 5.78%),       2180
Sub                             32.46MB (88.97%, 0.04%),       216.44ms (39.11%, 3.73%),       4372
AddN                           733.21MB (88.92%, 0.99%),       208.46ms (35.38%, 3.59%),       5481
Slice                          708.07MB (87.94%, 0.95%),       205.27ms (31.78%, 3.54%),       7277
Fill                           954.27MB (86.98%, 1.28%),       154.50ms (28.24%, 2.66%),       9686
Select                         312.33MB (85.70%, 0.42%),       123.04ms (25.58%, 2.12%),       5746
Sigmoid                        152.57MB (85.28%, 0.21%),        96.66ms (23.46%, 1.67%),       2970
```

### Visualize time and memory.
<left>
[CodeTimeline](g3doc/graph_timeline.png)
</left>

### Teams

* Xin Pan (xpan@google.com, github: panyx0718)
* Jon Shlens
* Yao Zhang