Reading and Parsing TFRecord Files

Created: November-22, 2018

TFRecord files are TensorFlow's native binary format for storing data (tensors). To read a file, you can use code similar to the CSV example:

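import tensorflow as tf  # the snippets below use the TensorFlow 1.x queue API

# Queue of input file names; num_epochs=1 means each file is read exactly once
filename_queue = tf.train.string_input_producer(["file.tfrecord"], num_epochs=1)
reader = tf.TFRecordReader()
# Each read yields one record: a key and the serialized example bytes
key, serialized_example = reader.read(filename_queue)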
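
To make "binary format" concrete, here is a minimal sketch of the on-disk framing that reader.read walks through. Each record is stored as an 8-byte little-endian length, a 4-byte masked CRC32C of that length, the payload bytes, and a 4-byte masked CRC32C of the payload. The function name iter_tfrecords is hypothetical and not part of TensorFlow, and the sketch skips checksum verification, so treat it as an illustration rather than a replacement for tf.TFRecordReader.

```python
import struct


def iter_tfrecords(stream):
    """Yield the raw payload bytes of each record in a TFRecord byte stream.

    Sketch only: the two checksum fields are skipped, not verified.
    """
    while True:
        header = stream.read(12)  # uint64 length + uint32 CRC of the length
        if not header:
            return  # clean end of file
        if len(header) < 12:
            raise ValueError("truncated record header")
        (length,) = struct.unpack("<Q", header[:8])
        payload = stream.read(length)
        footer = stream.read(4)  # uint32 CRC of the payload
        if len(payload) < length or len(footer) < 4:
            raise ValueError("truncated record body")
        yield payload
```

Opening a file with open("file.tfrecord", "rb") and iterating over iter_tfrecords(f) would give one bytes object per record. When the file was written from tf.train.Example messages, each payload is one serialized example, which is what the parsing functions below expect as input.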

Next, you need to parse the examples coming from the serialized_example queue. You can use either tf.parse_example, which requires batching first but is faster, or tf.parse_single_example:

# Group 100 serialized examples, then parse the whole batch in one call
batch = tf.train.batch([serialized_example], batch_size=100)
parsed_batch = tf.parse_example(batch, features={
  "feature_name_1": tf.FixedLenFeature(shape=[1], dtype=tf.int64),
  "feature_name_2": tf.FixedLenFeature(shape=[1], dtype=tf.float32)
})

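tf.train.batch joins consecutive values of the given tensors of shape [x, y, z] into tensors of shape [batch_size, x, y, z]. The features dict maps feature names to TensorFlow's feature definitions. You use parse_single_example in a similar way: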

# Parse one serialized example at a time (no batching required beforehand)
parsed_example = tf.parse_single_example(serialized_example, {
  "feature_name_1": tf.FixedLenFeature(shape=[1], dtype=tf.int64),
  "feature_name_2": tf.FixedLenFeature(shape=[1], dtype=tf.float32)
})

Both tf.parse_example and tf.parse_single_example return a dictionary that maps feature names to tensors holding the values.

To batch the examples coming from parse_single_example, extract the tensors from the dict and use tf.train.batch as before:

parsed_batch = dict(zip(parsed_example.keys(),
    tf.train.batch(list(parsed_example.values()), batch_size=100)))
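
The shape rule for tf.train.batch can be illustrated without TensorFlow. The sketch below is a plain-Python stand-in, not the real implementation: batch_examples and shape_of are hypothetical helper names, and nested lists play the role of tensors. It also mirrors the default allow_smaller_final_batch=False setting, under which leftover examples that do not fill a whole batch are dropped.

```python
def batch_examples(examples, batch_size):
    """Group consecutive examples into full batches (plain-Python stand-in).

    Mirrors the default behavior of tf.train.batch, where
    allow_smaller_final_batch is False: a trailing partial batch is dropped.
    """
    batches = []
    for start in range(0, len(examples) - batch_size + 1, batch_size):
        batches.append(examples[start:start + batch_size])
    return batches


def shape_of(nested):
    """Return the shape of a regularly nested list, outermost axis first."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        if not nested:
            break
        nested = nested[0]
    return shape


# Five examples of shape [1], as produced by FixedLenFeature(shape=[1], ...)
examples = [[0], [1], [2], [3], [4]]
batches = batch_examples(examples, batch_size=2)
print(len(batches))          # number of full batches
print(shape_of(batches[0]))  # [batch_size, 1]
```

With five examples and a batch size of 2, only two full batches come out and the fifth example is discarded. In the snippets above, the same thing happens to any remainder when the record count is not a multiple of 100, unless allow_smaller_final_batch=True is passed to tf.train.batch.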

You read the data as before, passing the list of all tensors to evaluate to sess.run:

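with tf.Session() as sess:
  # num_epochs is tracked in a local variable, so local variables must be initialized
  sess.run(tf.local_variables_initializer())
  coord = tf.train.Coordinator()
  threads = tf.train.start_queue_runners(sess=sess, coord=coord)
  try:
    while True:
      data_batch = sess.run(list(parsed_batch.values()))
      # process data
  except tf.errors.OutOfRangeError:
    pass  # raised once the single epoch is exhausted
  finally:
    # Stop the queue runner threads and wait for them to finish
    coord.request_stop()
    coord.join(threads)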