tensorflow - Saving and reading variable-size lists from TFRecord

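What is the best way to store sparse vectors in a TFRecord? My sparse vectors contain only ones and zeros, so I decided to save just the indices at which the ones are located, like this: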

example = tf.train.Example(
    features=tf.train.Features(
        feature={
            'label': self._int64_feature(label),
            'features': self._int64_feature_list(values)
        }
    )
)
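To make the encoding concrete, here is a minimal, library-free sketch of the idea described above: only the positions of the ones are kept, so the stored list varies in length and may be empty. The helper names ones_to_indices and indices_to_vector are hypothetical and are not part of TensorFlow or of the original question.

```python
def ones_to_indices(vector):
    """Return the positions of the 1 entries in a 0/1 vector."""
    return [i for i, v in enumerate(vector) if v == 1]


def indices_to_vector(indices, length):
    """Rebuild a dense 0/1 vector of the given length from stored positions."""
    vector = [0] * length
    for i in indices:
        vector[i] = 1
    return vector


dense = [0, 1, 0, 0, 1, 1]
values = ones_to_indices(dense)                    # variable-length list
restored = indices_to_vector(values, len(dense))   # needs the original length
```

Note that the original vector length is not recoverable from the index list alone, so it has to be known or stored separately.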

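Here, values is a list containing the indices of the ones. This list sometimes contains hundreds of elements and sometimes none at all. After that, I simply write the serialized example to the tfrecord. Later, I read the tfrecord like this: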

features = tf.parse_single_example(
    serialized_example,
    features={
        # 'label' is a fixed-length scalar. 'features' varies in length,
        # so it is parsed with tf.VarLenFeature, which yields a tf.SparseTensor
        'label': tf.FixedLenFeature([], dtype=tf.int64),
        'features': tf.VarLenFeature(dtype=tf.int64)
    }
)

label = features['label']
values = features['features']

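This does not work, because values is parsed as a sparse tensor and I cannot get my saved data back out of it. What is the best way to store sparse tensors in tfrecords, and how do I read them back?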

Best answer

If you are only serializing the positions of the ones, you should be able to recover the correct sparse tensor with a little trickery:

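The parsed sparse tensor features['features'] will look like this:
features['features'].indices: [[batch_id, position], ...]
where position is a useless enumeration (0, 1, 2, ... within each example).

What you actually want is for features['features'].indices to look like [[batch_id, one_position], ...]
where one_position is the actual value you stored in the sparse tensor.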

So:

# Assumes a batched parse (e.g. tf.parse_example), so indices has shape [N, 2]
indices = features['features'].indices
indices = tf.transpose(indices)
# Now looks like [[batch_id, batch_id, ...], [position, position, ...]]
indices = tf.stack([indices[0], features['features'].values])
# Now looks like [[batch_id, batch_id, ...], [one_position, one_position, ...]]
indices = tf.transpose(indices)
# Now looks like [[batch_id, one_position], [batch_id, one_position], ...]
features['features'] = tf.SparseTensor(
    indices=indices,
    # One 1.0 per stored index
    values=tf.ones(shape=tf.shape(indices)[:1]),
    # Smallest shape that fits every index: 1 + the per-column maximum
    dense_shape=1 + tf.reduce_max(indices, axis=[0])
)
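To check the index rewrite by hand, the same transformation can be sketched in plain Python on the lists a batched VarLenFeature parse would produce. This is an illustration under stated assumptions, not TensorFlow code: rewrite_indices is a hypothetical helper, and the sample batch (three examples with stored positions [3, 7], [] and [0]) is made up for the demonstration.

```python
def rewrite_indices(indices, values):
    """Replace the enumeration column of `indices` with the stored values.

    indices: [[batch_id, position], ...] as produced by a batched parse
    values:  the stored one-positions, in the same order as `indices`
    """
    new_indices = [[batch_id, one_position]
                   for (batch_id, _), one_position in zip(indices, values)]
    new_values = [1.0] * len(new_indices)   # one 1.0 per stored index
    # Smallest shape that fits every index: 1 + per-column maximum.
    # (For an empty batch this yields [], an edge case not handled here.)
    dense_shape = [1 + max(column) for column in zip(*new_indices)]
    return new_indices, new_values, dense_shape


# Batch of three examples with stored positions [3, 7], [] and [0]
parsed_indices = [[0, 0], [0, 1], [2, 0]]
parsed_values = [3, 7, 0]
new_indices, new_values, dense_shape = rewrite_indices(parsed_indices,
                                                       parsed_values)
```

The inferred dense_shape depends on the largest index present in the batch, so it can differ from batch to batch.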

Voilà! features['features'] now represents a matrix that is your batch of sparse vectors concatenated together.

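Note: if you want to treat this as a dense tensor, you will have to call tf.sparse_to_dense, and the dense tensor will have shape [None, None], which makes it difficult to work with. If you know the maximum possible vector length, you may want to hard-code it: dense_shape=[batch_size, max_vector_length]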
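As a rough illustration of why a hard-coded shape helps, the sketch below densifies a list of [batch_id, one_position] pairs into a matrix of fixed size in plain Python. The function to_dense and the sample sizes are hypothetical. It only mirrors what a sparse-to-dense conversion with a known dense_shape would produce.

```python
def to_dense(indices, batch_size, max_vector_length):
    """Build a batch_size x max_vector_length 0/1 matrix from index pairs."""
    dense = [[0] * max_vector_length for _ in range(batch_size)]
    for batch_id, one_position in indices:
        dense[batch_id][one_position] = 1
    return dense


# Three examples, vectors of length 5; the second example has no ones
matrix = to_dense([[0, 3], [0, 1], [2, 0]], batch_size=3, max_vector_length=5)
```

With fixed sizes, every batch has the same shape regardless of which positions happen to be set.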

Regarding tensorflow - saving and reading variable-size lists from TFRecord, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/37270697/