This article looks at the difference between a single LSTM with num_layers = 2 and two separate LSTMs in PyTorch; it should be a useful reference if you have run into the same question.

Problem description

I am new to deep learning and currently working on using LSTMs for language modeling. I was looking at the PyTorch documentation and was confused by it.

If I create

nn.LSTM(input_size, hidden_size, num_layers)

where hidden_size = 4 and num_layers = 2, I think I will have an architecture something like:

op0    op1 ....
LSTM -> LSTM -> h3
LSTM -> LSTM -> h2
LSTM -> LSTM -> h1
LSTM -> LSTM -> h0
x0     x1 .....

whereas if I do something like

nn.LSTM(input_size, hidden_size, 1)
nn.LSTM(input_size, hidden_size, 1)

I think the network architecture will look exactly like the one above. Am I wrong? And if yes, what is the difference between these two?

Recommended answer

A multi-layer LSTM is better known as a stacked LSTM, in which multiple LSTM layers are stacked on top of each other.

Your understanding is correct. The following two definitions of a stacked LSTM are equivalent:

nn.LSTM(input_size, hidden_size, 2)

from collections import OrderedDict

nn.Sequential(OrderedDict([
    ('LSTM1', nn.LSTM(input_size, hidden_size, 1)),
    ('LSTM2', nn.LSTM(hidden_size, hidden_size, 1))
]))

Here, the input is fed into the lowest LSTM layer, the output of the lowest layer is then forwarded to the next layer, and so on. Note that the output size of the lowest LSTM layer, and the input size of every remaining LSTM layer, is hidden_size.
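As a quick sanity check (a sketch only, with illustrative sizes input_size = 8, hidden_size = 4, seq_len = 5, batch = 3), you can copy the per-layer weights of the 2-layer LSTM into two single-layer LSTMs and verify that both compute the same output. Note that the two single-layer LSTMs have to be chained by hand, since nn.LSTM returns a tuple rather than a single tensor, so the nn.Sequential form above is best read as a schematic of the wiring:

import torch
import torch.nn as nn

torch.manual_seed(0)
input_size, hidden_size = 8, 4            # illustrative sizes

stacked = nn.LSTM(input_size, hidden_size, 2)
layer1 = nn.LSTM(input_size, hidden_size, 1)
layer2 = nn.LSTM(hidden_size, hidden_size, 1)

# copy the stacked model's per-layer weights into the two single-layer LSTMs
with torch.no_grad():
    for name in ['weight_ih_l0', 'weight_hh_l0', 'bias_ih_l0', 'bias_hh_l0']:
        getattr(layer1, name).copy_(getattr(stacked, name))
        getattr(layer2, name).copy_(getattr(stacked, name.replace('l0', 'l1')))

x = torch.randn(5, 3, input_size)               # (seq_len, batch, input_size)
out_stacked, _ = stacked(x)                     # per-timestep outputs of the top layer
out_manual, _ = layer2(layer1(x)[0])            # feed layer 1's output into layer 2
print(torch.allclose(out_stacked, out_manual))  # True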

However, you may have seen people define a stacked LSTM in the following way:

rnns = nn.ModuleList()
for i in range(nlayers):
    # the first layer takes the raw input size; every later layer takes hidden_size
    input_size = input_size if i == 0 else hidden_size
    rnns.append(nn.LSTM(input_size, hidden_size, 1))

The reason people sometimes use this approach is that, if you create a stacked LSTM using the first two approaches, you cannot get the hidden states of each individual layer. Check out what LSTM returns in PyTorch.
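For reference, a small sketch (sizes are illustrative) of what the forward call of a 2-layer nn.LSTM returns: output holds the per-timestep outputs of the top layer only, while h_n and c_n hold just the final states of each layer, so the per-timestep hidden states of the lower layers are never exposed.

import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=8, hidden_size=4, num_layers=2)
x = torch.randn(5, 3, 8)   # (seq_len, batch, input_size)

output, (h_n, c_n) = lstm(x)

print(output.shape)  # torch.Size([5, 3, 4]) - per-timestep outputs of the top layer only
print(h_n.shape)     # torch.Size([2, 3, 4]) - final hidden state of each of the 2 layers
print(c_n.shape)     # torch.Size([2, 3, 4]) - final cell state of each of the 2 layers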

So, if you want to have the intermediate layers' hidden states, you have to declare each individual LSTM layer as a single LSTM and run through a loop to mimic the multi-layer LSTM operations. For example:

outputs = []
for i in range(nlayers):
    if i != 0:
        # apply dropout between layers (not to the raw input)
        sent_variable = F.dropout(sent_variable, p=0.2, training=True)
    output, hidden = rnns[i](sent_variable)
    outputs.append(output)      # keep this layer's per-timestep hidden states
    sent_variable = output      # feed this layer's output into the next layer

In the end, outputs will contain all the per-timestep hidden states of each individual LSTM layer.
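Putting the two snippets together, a self-contained sketch might look like this (the sizes, the random input standing in for sent_variable, and the dropout setting are assumptions for illustration only):

import torch
import torch.nn as nn
import torch.nn.functional as F

input_size, hidden_size, nlayers = 8, 4, 2      # illustrative sizes
sent_variable = torch.randn(5, 3, input_size)   # stand-in input, (seq_len, batch, input_size)

# build one single-layer LSTM per layer, as above
rnns = nn.ModuleList()
for i in range(nlayers):
    input_size = input_size if i == 0 else hidden_size
    rnns.append(nn.LSTM(input_size, hidden_size, 1))

# run the layers by hand, collecting every layer's per-timestep hidden states
outputs = []
for i in range(nlayers):
    if i != 0:
        sent_variable = F.dropout(sent_variable, p=0.2, training=True)
    output, hidden = rnns[i](sent_variable)
    outputs.append(output)
    sent_variable = output

for i, out in enumerate(outputs):
    print(i, out.shape)   # each layer: torch.Size([5, 3, 4])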

That concludes this article on the difference between one LSTM with num_layers = 2 and two LSTMs in PyTorch; hopefully the recommended answer above is helpful.
