Converting a tensor to a sparse tensor for ctc_loss

Problem description

Is there a way to convert a dense tensor into a sparse tensor? Apparently, TensorFlow's Estimator.fit doesn't accept SparseTensors as labels. One reason I would like to pass SparseTensors into Estimator.fit is to be able to use TensorFlow's ctc_loss. Here's the code:

import dataset_utils
import tensorflow as tf
import numpy as np

from tensorflow.contrib import grid_rnn, learn, layers, framework

def grid_rnn_fn(features, labels, mode):
    input_layer = tf.reshape(features["x"], [-1, 48, 1596])
    indices = tf.where(tf.not_equal(labels, tf.constant(0, dtype=tf.int32)))
    values = tf.gather_nd(labels, indices)
    sparse_labels = tf.SparseTensor(indices, values, dense_shape=tf.shape(labels, out_type=tf.int64))

    cell_fw = grid_rnn.Grid2LSTMCell(num_units=128)
    cell_bw = grid_rnn.Grid2LSTMCell(num_units=128)
    bidirectional_grid_rnn = tf.nn.bidirectional_dynamic_rnn(cell_fw, cell_bw, input_layer, dtype=tf.float32)
    outputs = tf.reshape(bidirectional_grid_rnn[0], [-1, 256])

    W = tf.Variable(tf.truncated_normal([256, 80], stddev=0.1, dtype=tf.float32), name='W')
    b = tf.Variable(tf.constant(0., dtype=tf.float32, shape=[80], name='b'))

    logits = tf.matmul(outputs, W) + b
    logits = tf.reshape(logits, [tf.shape(input_layer)[0], -1, 80])
    logits = tf.transpose(logits, (1, 0, 2))

    loss = None
    train_op = None

    if mode != learn.ModeKeys.INFER:
        #Error occurs here
        loss = tf.nn.ctc_loss(inputs=logits, labels=sparse_labels, sequence_length=320)

    ... # returning ModelFnOps

def main(_):
    image_paths, labels = dataset_utils.read_dataset_list('../test/dummy_labels_file.txt')
    data_dir = "../test/dummy_data/"
    images = dataset_utils.read_images(data_dir=data_dir, image_paths=image_paths, image_extension='png')
    print('Done reading images')
    images = dataset_utils.resize(images, (1596, 48))
    images = dataset_utils.transpose(images)
    labels = dataset_utils.encode(labels)
    x_train, x_test, y_train, y_test = dataset_utils.split(features=images, test_size=0.5, labels=labels)

    train_input_fn = tf.estimator.inputs.numpy_input_fn(
        x={"x": np.array(x_train)},
        y=np.array(y_train),
        num_epochs=1,
        shuffle=True,
        batch_size=1
    )

    classifier = learn.Estimator(model_fn=grid_rnn_fn, model_dir="/tmp/grid_rnn_ocr_model")
    classifier.fit(input_fn=train_input_fn)

Update:

It turns out that this solution, from here, converts the dense tensor into a sparse one:

indices = tf.where(tf.not_equal(labels, tf.constant(0, dtype=tf.int32)))
values = tf.gather_nd(labels, indices)
sparse_labels = tf.SparseTensor(indices, values, dense_shape=tf.shape(labels, out_type=tf.int64))
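
As a rough illustration of what those three calls compute, here is a plain-Python sketch that needs no TensorFlow. It assumes, as the snippet does, that 0 is the padding value; the name dense_to_sparse is hypothetical and not part of any library.

```python
def dense_to_sparse(labels, padding=0):
    """Return (indices, values, dense_shape) for a padded 2-D list of labels.

    Hypothetical helper: mirrors the tf.where / tf.gather_nd pattern above
    in plain Python, treating `padding` as "no label".
    """
    indices = []
    values = []
    for row, seq in enumerate(labels):
        for col, label in enumerate(seq):
            if label != padding:  # same test as tf.not_equal(labels, 0)
                indices.append((row, col))
                values.append(label)
    # Dense shape is [batch_size, padded_length], like tf.shape(labels).
    width = max((len(seq) for seq in labels), default=0)
    dense_shape = (len(labels), width)
    return indices, values, dense_shape


indices, values, dense_shape = dense_to_sparse([[3, 1, 0], [2, 0, 0]])
print(indices)      # [(0, 0), (0, 1), (1, 0)]
print(values)       # [3, 1, 2]
print(dense_shape)  # (2, 3)
```

One consequence of this approach: a genuine label with id 0 is dropped along with the padding, so the label encoding has to reserve 0 for padding.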

However, ctc_loss now raises this error:

ValueError: Shape must be rank 1 but is rank 0 for 'CTCLoss' (op: 'CTCLoss') with input shapes: [?,?,80], [?,2], [?], [].

I have this code that converts dense labels to sparse:

def convert_to_sparse(labels, dtype=np.int32):
    indices = []
    values = []

    for n, seq in enumerate(labels):
        indices.extend(zip([n] * len(seq), range(len(seq))))
        values.extend(seq)

    indices = np.asarray(indices, dtype=dtype)
    values = np.asarray(values, dtype=dtype)
    shape = np.asarray([len(labels), np.asarray(indices).max(0)[1] + 1], dtype=dtype)

    return indices, values, shape

I converted y_train to sparse labels and placed the values inside a SparseTensor:

sparse_y_train = convert_to_sparse(y_train)
print(tf.SparseTensor(
    indices=sparse_y_train[0],
    values=sparse_y_train[1],
    dense_shape=sparse_y_train[2]
))

I then compared it to the SparseTensor created inside grid_rnn_fn:

indices = tf.where(tf.not_equal(labels, tf.constant(0, dtype=tf.int32)))
values = tf.gather_nd(labels, indices)
sparse_labels = tf.SparseTensor(indices, values, dense_shape=tf.shape(labels, out_type=tf.int64))

This is what I got:

For sparse_y_train:

SparseTensor(indices=Tensor("SparseTensor/indices:0", shape=(33, 2), dtype=int64), values=Tensor("SparseTensor/values:0", shape=(33,), dtype=int32), dense_shape=Tensor("SparseTensor/dense_shape:0", shape=(2,), dtype=int64))

For sparse_labels:

SparseTensor(indices=Tensor("Where:0", shape=(?, 2), dtype=int64), values=Tensor("GatherNd:0", shape=(?,), dtype=int32), dense_shape=Tensor("Shape:0", shape=(2,), dtype=int64))

This leads me to think that ctc_loss cannot handle SparseTensors with dynamic shapes as labels.
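
The shape list in the error message suggests another possible reading. The CTCLoss op takes four inputs: the logits, the label indices, the label values, and the sequence lengths. The shapes [?,?,80], [?,2] and [?] match the first three, which leaves the rank-0 shape [] for sequence_length, passed as the scalar 320 in grid_rnn_fn. tf.nn.ctc_loss documents sequence_length as a 1-D vector with one entry per batch element. The sketch below only shows that expected layout in plain Python; the helper name full_sequence_lengths is hypothetical, and it assumes every example uses all time steps.

```python
def full_sequence_lengths(batch_size, max_time):
    """Hypothetical helper: one length per example, all equal to max_time."""
    return [max_time] * batch_size


# batch_size=1 as in train_input_fn; 320 is the value used in the question.
lengths = full_sequence_lengths(batch_size=1, max_time=320)
print(lengths)  # [320], a rank-1 value of shape [batch_size]
```

Inside the graph, a vector of the same layout could probably be built from the time-major logits, for example with tf.fill([tf.shape(logits)[1]], tf.shape(logits)[0]). Whether that resolves the error in this model is an assumption, not something the question confirms.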

Recommended answer