Using Tensorflow's Connectionist Temporal Classification (CTC) implementation

Question

I'm trying to use Tensorflow's CTC implementation under the contrib package (tf.contrib.ctc.ctc_loss), without success.

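  • First of all, does anyone know where I can read a good step-by-step tutorial? Tensorflow's documentation is very poor on this topic.
  • Do I have to provide ctc_loss with labels that have the blank label interleaved?
  • I could not overfit my network even when training on a dataset of length 1 for more than 200 epochs. :(
  • How can I calculate the label error rate using tf.edit_distance?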

Here is my code:

with graph.as_default():

  max_length = X_train.shape[1]
  frame_size = X_train.shape[2]
  max_target_length = y_train.shape[1]

  # Batch size x time steps x data width
  data = tf.placeholder(tf.float32, [None, max_length, frame_size])
  data_length = tf.placeholder(tf.int32, [None])

  #  Batch size x max_target_length
  target_dense = tf.placeholder(tf.int32, [None, max_target_length])
  target_length = tf.placeholder(tf.int32, [None])

  #  Generating sparse tensor representation of target
  target = ctc_label_dense_to_sparse(target_dense, target_length)

  # Applying LSTM, returning output for each timestep (y_rnn1,
  # [batch_size, max_time, cell.output_size]) and the final state of shape
  # [batch_size, cell.state_size]
  y_rnn1, h_rnn1 = tf.nn.dynamic_rnn(
    tf.nn.rnn_cell.LSTMCell(num_hidden, state_is_tuple=True, num_proj=num_classes), #  num_proj=num_classes
    data,
    dtype=tf.float32,
    sequence_length=data_length,
  )

  #  For sequence labelling, we want a prediction for each timestamp.
  #  However, we share the weights for the softmax layer across all timesteps.
  #  How do we do that? By flattening the first two dimensions of the output tensor.
  #  This way time steps look the same as examples in the batch to the weight matrix.
  #  Afterwards, we reshape back to the desired shape


  # Transposing to time-major: [max_time, batch_size, num_classes]
  logits = tf.transpose(y_rnn1, perm=(1, 0, 2))

  #  Get the loss by calculating ctc_loss, which also calculates the gradient.
  #  ctc_loss performs the softmax operation for you, so inputs should be
  #  e.g. linear projections of outputs by an LSTM.
  loss = tf.reduce_mean(tf.contrib.ctc.ctc_loss(logits, target, data_length))

  #  Define our optimizer with learning rate
  optimizer = tf.train.RMSPropOptimizer(learning_rate).minimize(loss)

  #  Decoding using beam search
  decoded, log_probabilities = tf.contrib.ctc.ctc_beam_search_decoder(logits, data_length, beam_width=10, top_paths=1)

Thanks!

Update (06/29/2016)

Thank you, @jihyeon-seo! So, at the input of the RNN we have something like [num_batch, max_time_step, num_features]. We use dynamic_rnn to perform the recurrent computation on that input, producing a tensor of shape [num_batch, max_time_step, num_hidden]. After that, we need an affine projection at each timestep with shared weights, so we have to reshape to [num_batch*max_time_step, num_hidden], multiply by a weight matrix of shape [num_hidden, num_classes], add a bias, undo the reshape, and transpose (so that we have [max_time_steps, num_batch, num_classes]). This result is the input of the ctc_loss function. Did I do everything correctly?

Here is the code:

    cell = tf.nn.rnn_cell.MultiRNNCell([cell] * num_layers, state_is_tuple=True)

    h_rnn1, self.last_state = tf.nn.dynamic_rnn(cell, self.input_data, self.sequence_length, dtype=tf.float32)

    #  Reshaping to share weights across timesteps
    x_fc1 = tf.reshape(h_rnn1, [-1, num_hidden])

    self._logits = tf.matmul(x_fc1, self._W_fc1) + self._b_fc1

    #  Reshaping
    self._logits = tf.reshape(self._logits, [max_length, -1, num_classes])

    #  Calculating loss
    loss = tf.contrib.ctc.ctc_loss(self._logits, self._targets, self.sequence_length)

    self.cost = tf.reduce_mean(loss)

Update (07/11/2016)

Thank you, @Xiv. Here is the code after the bug fix:

    cell = tf.nn.rnn_cell.MultiRNNCell([cell] * num_layers, state_is_tuple=True)

    h_rnn1, self.last_state = tf.nn.dynamic_rnn(cell, self.input_data, self.sequence_length, dtype=tf.float32)

    #  Reshaping to share weights across timesteps
    x_fc1 = tf.reshape(h_rnn1, [-1, num_hidden])

    self._logits = tf.matmul(x_fc1, self._W_fc1) + self._b_fc1

    #  Reshaping back to batch-major, then transposing to time-major for ctc_loss
    self._logits = tf.reshape(self._logits, [-1, max_length, num_classes])
    self._logits = tf.transpose(self._logits, (1,0,2))

    #  Calculating loss
    loss = tf.contrib.ctc.ctc_loss(self._logits, self._targets, self.sequence_length)

    self.cost = tf.reduce_mean(loss)
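
The difference between the two versions can be traced without Tensorflow. The following is only a minimal pure-Python sketch: the helpers reshape_2d and transpose_2d are hypothetical stand-ins for a row-major reshape and a transpose, and each (batch, time) tuple stands for one row of logits. It suggests why reshaping the flattened output straight to [max_length, -1, ...] mixes up batch entries and timesteps, while reshaping to batch-major and then transposing keeps them aligned.

```python
def reshape_2d(flat, rows, cols):
    """Row-major reshape of a flat list into rows x cols (hypothetical helper)."""
    assert len(flat) == rows * cols
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]


def transpose_2d(grid):
    """Swap the two leading axes of a nested list (hypothetical helper)."""
    return [list(column) for column in zip(*grid)]


batch_size, max_length = 2, 3  # made-up sizes for illustration

# Flattened batch-major output, as produced by reshaping
# [batch, time, hidden] to [batch * time, hidden].
# Each tuple (b, t) stands for the logits row of batch entry b at timestep t.
flat = [(b, t) for b in range(batch_size) for t in range(max_length)]

# First version: reshape directly to [max_length, batch_size].
wrong = reshape_2d(flat, max_length, batch_size)

# Fixed version: reshape to [batch_size, max_length], then transpose.
right = transpose_2d(reshape_2d(flat, batch_size, max_length))

# In a correct time-major layout, position [t][b] holds entry (b, t).
right_is_aligned = all(
    right[t][b] == (b, t)
    for t in range(max_length)
    for b in range(batch_size)
)
wrong_is_aligned = all(
    wrong[t][b] == (b, t)
    for t in range(max_length)
    for b in range(batch_size)
)
print(right_is_aligned, wrong_is_aligned)
```

With a batch of one the two layouts coincide, so a problem of this kind would only show up with larger batches.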

Update (07/25/16)

I published part of my code on GitHub, working with one utterance. Feel free to use it! :)

Recommended answer

I'm trying to do the same thing. Here's what I found that you may be interested in.

It was really hard to find a tutorial for CTC, but this example was helpful.

As for the blank label, the CTC layer assumes that the blank index is num_classes - 1, so you need to provide one additional class for the blank label.
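
As a small sketch of that convention: the alphabet and the encode helper below are made-up for illustration and are not part of Tensorflow. Real labels take indices 0 to num_classes - 2, and the last index is reserved for the blank, so the target sequences themselves contain no blank entries.

```python
# Hypothetical label set: 26 letters plus the space character.
alphabet = "abcdefghijklmnopqrstuvwxyz "
char_to_index = {ch: i for i, ch in enumerate(alphabet)}

# One extra class is reserved for the CTC blank label.
num_classes = len(alphabet) + 1
blank_index = num_classes - 1


def encode(text):
    """Map a transcript to class indices (hypothetical helper, no blanks added)."""
    return [char_to_index[ch] for ch in text]


targets = encode("hello world")
print(num_classes, blank_index, targets)
```

The network's output layer would then need num_classes units, one more than the number of real labels.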

Also, the CTC loss performs the softmax itself. In your code, the RNN layer is connected directly to the CTC loss layer. The output of the RNN layer is already passed through an activation internally, so you need to add one more layer (it could be the output layer) without an activation function, and then add the CTC loss layer.
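
One way to see why this might matter, offered as a hedged illustration rather than a diagnosis of the code above: if activations bounded in [-1, 1] (as a tanh-style output would be) were used directly as logits, the softmax could never become confident about any class. The class count and logit values below are made-up, and softmax is a local helper written for this sketch.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats (local helper)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


num_classes = 28  # assumption: 27 real labels plus one blank

# Most confident vector a [-1, 1]-bounded activation could produce.
bounded = [1.0] + [-1.0] * (num_classes - 1)

# A linear projection has no such bound, so its logits can be far apart.
linear = [10.0] + [-10.0] * (num_classes - 1)

max_prob_bounded = softmax(bounded)[0]
max_prob_linear = softmax(linear)[0]
print(max_prob_bounded, max_prob_linear)
```

Under these assumptions the bounded case tops out well below a probability of 0.5, whereas the unbounded linear projection can approach 1.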

