aboutsummaryrefslogtreecommitdiffhomepage
path: root/tensorflow/docs_src/api_guides/python/python_io.md
blob: a5444408fe8f276028b6cedd5044947051043d31 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
# Data IO (Python functions)
[TOC]

A TFRecords file represents a sequence of (binary) strings.  The format is not
random access, so it is suitable for streaming large amounts of data but not
suitable if fast sharding or other non-sequential access is desired.

*   @{tf.python_io.TFRecordWriter}
*   @{tf.python_io.tf_record_iterator}
*   @{tf.python_io.TFRecordCompressionType}
*   @{tf.python_io.TFRecordOptions}

- - -

## TFRecords Format Details

A TFRecords file contains a sequence of strings with CRC hashes.  Each record
has the format

    uint64 length
    uint32 masked_crc32_of_length
    byte   data[length]
    uint32 masked_crc32_of_data

and the records are concatenated together to produce the file.  The CRC32s
are [described here](https://en.wikipedia.org/wiki/Cyclic_redundancy_check),
and the mask of a CRC is

    masked_crc = ((crc >> 15) | (crc << 17)) + 0xa282ead8ul