This article explains how to use the return_sequences option and the TimeDistributed layer in Keras, via the question and recommended answer below.

Problem description

I have a dialog corpus like the one below, and I want to implement an LSTM model that predicts a system action. The system action is described as a bit vector, and a user input is encoded as a word embedding, which is also a bit vector.

t1: user: "Do you know an apple?", system: "no"(action=2)
t2: user: "xxxxxx", system: "yyyy" (action=0)
t3: user: "aaaaaa", system: "bbbb" (action=5)

So what I want to realize is the "many-to-many (2)" model. When my model receives a user input, it must output a system action. But I cannot understand the return_sequences option and the TimeDistributed layer after an LSTM. To realize "many-to-many (2)", are return_sequences=True and a TimeDistributed layer after the LSTMs required? I would appreciate a fuller description of them.

return_sequences: Boolean. Whether to return the last output in the output sequence, or the full sequence.

TimeDistributed: This wrapper allows to apply a layer to every temporal slice of an input.

Updated 2017/03/13 17:40

I think I understand the return_sequences option now, but I am still not sure about TimeDistributed. If I add a TimeDistributed after the LSTMs, is the model the same as my "many-to-many (2)" below? So I think a Dense layer is applied to each output.

Recommended answer

The LSTM layer and the TimeDistributed wrapper are two different ways to get the "many-to-many" relationship that you want.

  1. The LSTM will eat the words of your sentence one by one; via return_sequences you can choose to output something (the state) at each step (after each word is processed), or only after the last word has been eaten. So with return_sequences=True the output will be a sequence of the same length, and with return_sequences=False the output will be just one vector (see the sketch after this list).
  2. TimeDistributed. This wrapper allows you to apply one layer (say, a Dense) to every element of your sequence independently. That layer has exactly the same weights for every element; it is the same layer that is applied to each word, and it will, of course, return the sequence of words processed independently.
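
For concreteness, here is a minimal sketch of the first point (my own illustration, not from the original answer; the unit count and input shape are arbitrary), showing how return_sequences changes the output shape:

    # Minimal sketch (illustrative sizes): compare the two return_sequences settings.
    import numpy as np
    from keras.models import Sequential
    from keras.layers import LSTM

    x = np.random.random((1, 10, 8))  # (batch, timesteps, features)

    # return_sequences=False: one vector, produced after the last word is eaten
    last_only = Sequential([LSTM(32, return_sequences=False, input_shape=(10, 8))])
    print(last_only.predict(x).shape)  # -> (1, 32)

    # return_sequences=True: one output per timestep, same sequence length
    per_step = Sequential([LSTM(32, return_sequences=True, input_shape=(10, 8))])
    print(per_step.predict(x).shape)  # -> (1, 10, 32)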

As you can see, the difference between the two is that the LSTM propagates information through the sequence: it eats one word, updates its state, and returns it or not, then goes on to the next word while still carrying information from the previous ones. In a TimeDistributed layer, by contrast, the words are processed on their own, as if in silos, and the same layer is applied to every one of them.

So you don't have to use LSTM and TimeDistributed in a row; you can do whatever you want, just keep in mind what each of them does.

I hope it is clearer?

In your case, TimeDistributed applies a Dense layer to every element that was output by the LSTM.
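
A quick way to see the weight sharing mentioned above (again my own sketch, with made-up sizes): the parameter count of a TimeDistributed(Dense) does not depend on the number of timesteps:

    # Sketch (illustrative sizes): TimeDistributed(Dense) shares ONE set of
    # weights across all timesteps, so its parameter count is independent of
    # the sequence length.
    from keras.models import Sequential
    from keras.layers import Dense, TimeDistributed

    m = Sequential([TimeDistributed(Dense(3), input_shape=(10, 64))])
    m.summary()  # Dense params: 64*3 + 3 = 195, whether 10 timesteps or 1000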

Let's take an example:

You have a sequence of n_words words, each embedded in emb_size dimensions. So your input is a 2D tensor of shape (n_words, emb_size).

First you apply an LSTM with output dimension lstm_output and return_sequences=True. The output is still a sequence, so it will be a 2D tensor of shape (n_words, lstm_output). So you have n_words vectors of length lstm_output.

Now you apply a TimeDistributed Dense layer with, say, a 3-dimensional output as the parameter of the Dense, i.e. TimeDistributed(Dense(3)). This applies Dense(3) n_words times, independently to every vector of size lstm_output in your sequence, so they all become vectors of length 3. Your output is still a sequence, now a 2D tensor of shape (n_words, 3).
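
Put end to end, the walkthrough above might look like this minimal sketch (my own code, not from the original answer; n_words, emb_size, and lstm_output take illustrative values, and the embeddings are random):

    # Sketch of the full shape walkthrough: LSTM (return_sequences=True)
    # followed by TimeDistributed(Dense(3)); all sizes are illustrative.
    import numpy as np
    from keras.models import Sequential
    from keras.layers import LSTM, Dense, TimeDistributed

    n_words, emb_size, lstm_output = 10, 50, 64

    model = Sequential([
        # per-sample input: (n_words, emb_size)
        LSTM(lstm_output, return_sequences=True, input_shape=(n_words, emb_size)),
        # Dense(3) applied independently to each of the n_words timesteps
        TimeDistributed(Dense(3)),
    ])

    x = np.random.random((1, n_words, emb_size))  # batch of one sequence
    print(model.predict(x).shape)  # -> (1, 10, 3), i.e. (batch, n_words, 3)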

Is it clearer? :-)

