# TensorFlow Lite Optimizing Converter command-line reference
This page is a complete reference of the command-line flags. It is complemented by the
following other documents:
* [README](../README.md)
* [Command-line examples](cmdline_examples.md)
Table of contents:
[TOC]
## High-level overview
A full list and detailed specification of all flags is given in the next
section. For now we focus on a higher-level description of command lines:
```
toco \
--input_format=... \
--output_format=... \
--input_file=... \
--output_file=... \
[model flags...] \
[transformation flags...] \
[logging flags...]
```
In other words, the converter requires at least the following mandatory flags:
`--input_format`, `--output_format`, `--input_file`, `--output_file`. Depending
on the input and output formats, additional flags may be allowed or mandatory:
* *Model flags* provide additional information about the model stored in the
input file.
* `--output_array` or `--output_arrays` specify which arrays in the input
file are to be considered the output activations.
* `--input_array` or `--input_arrays` specify which arrays in the input
file are to be considered the input activations.
* `--input_shape` or `--input_shapes` specify the shapes of the input
arrays.
* `--input_data_type` or `--input_data_types` specify the data types of
input arrays, which can be used if the input file does not already
specify them.
* `--mean_value` or `--mean_values`, and `--std_value` or `--std_values`,
give the dequantization parameters of the input arrays, for the case
when the output file will accept quantized input arrays.
* *Transformation flags* specify options of the transformations to be applied
to the graph, i.e. they specify requested properties that the output file
should have.
* `--inference_type` specifies the type of real-number arrays in the
output file. This only affects arrays of real numbers and allows
controlling their quantization or dequantization, effectively switching
between floating-point and quantized arithmetic for the inference
workload, as far as real numbers are concerned. Other data types are
unaffected (e.g. plain integers and strings).
* `--inference_input_type` is like `--inference_type` but specifically
controlling input arrays, separately from other arrays. If not
specified, then `--inference_type` is used. The use case for specifying
`--inference_input_type` is when one wants to perform floating-point
inference on a quantized input, as is common in image models operating
on bitmap image inputs.
* Some transformation flags make it possible to proceed with quantization
even when the input graph is not properly quantized:
`--default_ranges_min`, `--default_ranges_max`, `--drop_fake_quant`,
`--reorder_across_fake_quant`.
* *Logging flags* are described below.
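For instance, a minimal float-model conversion following the template above might look like this (the file paths, array names, and shape here are hypothetical; substitute your own):

```shell
toco \
  --input_format=TENSORFLOW_GRAPHDEF \
  --output_format=TFLITE \
  --input_file=/tmp/some_model.pb \
  --output_file=/tmp/some_model.tflite \
  --input_array=input \
  --output_array=softmax \
  --input_shape=1,224,224,3
```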
## Command-line flags complete reference
### Mandatory flags
* `--input_format`. Type: string. Specifies the format of the input file.
Allowed values:
* `TENSORFLOW_GRAPHDEF` — The TensorFlow GraphDef format. Both
binary and text proto formats are allowed.
* `TFLITE` — The TensorFlow Lite flatbuffers format.
* `--output_format`. Type: string. Specifies the format of the output file.
Allowed values:
* `TENSORFLOW_GRAPHDEF` — The TensorFlow GraphDef format. Always
produces a file in binary (not text) proto format.
* `TFLITE` — The TensorFlow Lite flatbuffers format.
* Whether a float or quantized TensorFlow Lite file will be produced
depends on the `--inference_type` flag.
* `GRAPHVIZ_DOT` — The GraphViz `.dot` format. This asks the
converter to generate a reasonable graphical representation of the graph
after simplification by a generic set of transformations.
* A typical `dot` command line to view the resulting graph might look
like: `dot -Tpdf -O file.dot`.
* Note that since passing this `--output_format` discards the
information of which output format you actually care about, and
since the converter's transformations depend on the specific output
format, the resulting visualization may not fully reflect what you
would get with your actual output format. To get a visualization of
exactly what you get with your actual output format, rather than a
merely plausible visualization of the model, consider using
`--dump_graphviz` instead and keeping your true `--output_format`.
* `--input_file`. Type: string. Specifies the path of the input file. This may
be either an absolute or a relative path.
* `--output_file`. Type: string. Specifies the path of the output file.
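As a sketch of the visualization path described above (file paths and array names are hypothetical), one could request `GRAPHVIZ_DOT` output and then render it:

```shell
toco \
  --input_format=TENSORFLOW_GRAPHDEF \
  --output_format=GRAPHVIZ_DOT \
  --input_file=/tmp/some_model.pb \
  --output_file=/tmp/some_model.dot \
  --input_array=input \
  --output_array=softmax

# Render the resulting .dot file to PDF:
dot -Tpdf -O /tmp/some_model.dot
```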
### Model flags
* `--output_array`. Type: string. Specifies a single array as the output
activations. Incompatible with `--output_arrays`.
* `--output_arrays`. Type: comma-separated list of strings. Specifies a list
of arrays as the output activations, for models with multiple outputs.
Incompatible with `--output_array`.
* `--input_array`. Type: string. Specifies a single array as the input
activations. Incompatible with `--input_arrays`.
* `--input_arrays`. Type: comma-separated list of strings. Specifies a list of
arrays as the input activations, for models with multiple inputs.
Incompatible with `--input_array`.
When `--input_array` is used, the following flags are available to provide
additional information about the single input array:
* `--input_shape`. Type: comma-separated list of integers. Specifies the shape
of the input array, in TensorFlow convention: starting with the outer-most
dimension (the dimension corresponding to the largest offset stride in the
array layout), ending with the inner-most dimension (the dimension along
which array entries are typically laid out contiguously in memory).
* For example, a typical vision model might pass
`--input_shape=1,60,80,3`, meaning a batch size of 1 (no batching), an
input image height of 60, an input image width of 80, and an input image
depth of 3, for the typical case where the input image is a RGB bitmap
(3 channels, depth=3) stored by horizontal scanlines (so 'width' is the
next innermost dimension after 'depth').
* `--mean_value` and `--std_value`. Type: floating-point. The decimal point
character is always the dot (`.`) regardless of the locale. These specify
the (de-)quantization parameters of the input array, when it is quantized.
* The meaning of mean_value and std_value is as follows: each quantized
value in the quantized input array will be interpreted as a mathematical
real number (i.e. as an input activation value) according to the
following formula:
* `real_value = (quantized_input_value - mean_value) / std_value`.
* When performing float inference (`--inference_type=FLOAT`) on a
quantized input, the quantized input would be immediately dequantized by
the inference code according to the above formula, before proceeding
with float inference.
* When performing quantized inference
(`--inference_type=QUANTIZED_UINT8`), no dequantization is ever to be
performed by the inference code; however, the quantization parameters of
all arrays, including those of the input arrays as specified by
mean_value and std_value, all participate in the determination of the
fixed-point multipliers used in the quantized inference code.
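As a small worked instance of the formula above, take `mean_value=127.5` and `std_value=127.5` (a common choice that maps uint8 values in [0, 255] to real values in roughly [-1, 1]); a quantized input value of 128 then dequantizes to about 0.0039:

```shell
# real_value = (quantized_input_value - mean_value) / std_value
awk 'BEGIN { q = 128; mean = 127.5; std = 127.5; printf "%.6f\n", (q - mean) / std }'
# prints 0.003922
```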
When `--input_arrays` is used, the following flags are available to provide
additional information about the multiple input arrays:
* `--input_shapes`. Type: colon-separated list of comma-separated lists of
integers. Each comma-separated list of integers gives the shape of one of the
input arrays specified in `--input_arrays`, in the same order. See
`--input_shape` for details.
* Example: `--input_arrays=foo,bar --input_shapes=2,3:4,5,6` means that
there are two input arrays. The first one, "foo", has shape [2,3]. The
second one, "bar", has shape [4,5,6].
* `--mean_values`, `--std_values`. Type: comma-separated lists of
floating-point numbers. Each number gives the corresponding value for one of
the input arrays specified in `--input_arrays`, in the same order. See
`--mean_value`, `--std_value` for details.
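Putting the multi-input flags together, a sketch with two input arrays (reusing the hypothetical names "foo" and "bar" from the example above, with a hypothetical output array "baz") might be:

```shell
toco \
  --input_format=TENSORFLOW_GRAPHDEF \
  --output_format=TFLITE \
  --input_file=/tmp/some_model.pb \
  --output_file=/tmp/some_model.tflite \
  --input_arrays=foo,bar \
  --input_shapes=2,3:4,5,6 \
  --mean_values=127.5,0 \
  --std_values=127.5,1 \
  --output_arrays=baz
```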
### Transformation flags
* `--inference_type`. Type: string. Sets the type of real-number arrays in the
output file, that is, controls the representation (quantization) of real
numbers in the output file, except for input arrays, which are controlled by
`--inference_input_type`.
This flag only impacts real-number arrays. By "real-number" we mean float
arrays and quantized arrays; this excludes plain integer arrays, string
arrays, and every other data type.
For real-number arrays, this flag allows the output file to use a
different real-number representation (quantization) from the one the
input file used. For any other type of array, changing the data type
would not make sense.
Specifically:
* If `FLOAT`, then real-number arrays will be of type float in the output
file. If they were quantized in the input file, then they get
dequantized.
* If `QUANTIZED_UINT8`, then real-number arrays will be quantized as
uint8 in the output file. If they were float in the input file, then
they get quantized.
* If not set, then all real-number arrays retain the same type in the
output file as they have in the input file.
* `--inference_input_type`. Type: string. Similar to `--inference_type`,
but allows controlling the quantization of input arrays specifically,
separately from other arrays.
If not set, then the value of `--inference_type` is implicitly used, i.e. by
default input arrays are quantized like other arrays.
Like `--inference_type`, this only affects real-number arrays, i.e. float
arrays and quantized arrays; it excludes plain integer arrays, string
arrays, and every other data type.
The typical use for this flag is for vision models taking a bitmap as input,
typically with uint8 channels, yet still requiring floating-point inference.
For such image models, the uint8 input is quantized, i.e. the uint8 values
are interpreted as real numbers, and the quantization parameters used for
such input arrays are their `mean_value`, `std_value` parameters.
* `--default_ranges_min`, `--default_ranges_max`. Type: floating-point. The
decimal point character is always the dot (`.`) regardless of the locale.
These flags enable what is called "dummy quantization". If set, they
define fallback (min, max) range values for all arrays that do not have a
properly specified (min, max) range in the input file, thus allowing
quantization of non-quantized or incorrectly-quantized input files to
proceed. This enables easy performance prototyping ("how fast would my
model run if I quantized it?") but should never be used in production, as
the resulting quantized arithmetic is inaccurate.
* `--drop_fake_quant`. Type: boolean. Default: false. Causes fake-quantization
nodes to be dropped from the graph. This may be used to recover a plain
float graph from a fake-quantized graph.
* `--reorder_across_fake_quant`. Type: boolean. Default: false. Normally,
fake-quantization nodes must be strict boundaries for graph transformations,
in order to ensure that quantized inference has the exact same arithmetic
behavior as quantized training --- which is the whole point of quantized
training and of FakeQuant nodes in the first place. However, that entails
subtle requirements on where exactly FakeQuant nodes must be placed in the
graph. Some quantized graphs have FakeQuant nodes at unexpected locations
that prevent the graph transformations needed to generate a well-formed
quantized representation of these graphs. Such graphs should be fixed, but
as a temporary work-around, setting `--reorder_across_fake_quant` allows
the converter to perform the necessary graph transformations on them, at
the cost of no longer faithfully matching inference and training
arithmetic.
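For example, a sketch of "dummy quantization" for performance prototyping might look like the following (the paths, array names, and fallback ranges are hypothetical; a real deployment should use a properly quantized graph instead):

```shell
toco \
  --input_format=TENSORFLOW_GRAPHDEF \
  --output_format=TFLITE \
  --input_file=/tmp/some_model.pb \
  --output_file=/tmp/some_model_quantized.tflite \
  --input_array=input \
  --output_array=softmax \
  --input_shape=1,224,224,3 \
  --inference_type=QUANTIZED_UINT8 \
  --mean_value=127.5 \
  --std_value=127.5 \
  --default_ranges_min=0 \
  --default_ranges_max=6
```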
### Logging flags
The following are standard Google logging flags:
* `--logtostderr` redirects Google logging to standard error, typically making
it visible in a terminal.
* `--v` sets verbose logging levels (for debugging purposes). Defined levels:
* `--v=1`: log all graph transformations that did make a change on the
graph.
* `--v=2`: log all graph transformations that did *not* make a change on
the graph.
The following flags allow generating graph visualizations of the actual graph
at various points during transformations:
* `--dump_graphviz=/path` enables dumping of the graphs at various stages of
processing as GraphViz `.dot` files. Generally preferred over
`--output_format=GRAPHVIZ_DOT` as this allows you to keep your actually
relevant `--output_format`.
* `--dump_graphviz_video` enables dumping of the graph after every single
graph transformation (for debugging purposes).
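For instance, combining the logging and dumping flags (with a hypothetical dump directory) might look like:

```shell
mkdir -p /tmp/toco_dumps

toco \
  --input_format=TENSORFLOW_GRAPHDEF \
  --output_format=TFLITE \
  --input_file=/tmp/some_model.pb \
  --output_file=/tmp/some_model.tflite \
  --input_array=input \
  --output_array=softmax \
  --logtostderr \
  --v=1 \
  --dump_graphviz=/tmp/toco_dumps
```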