Problem description
I do understand conceptually what an LSTM or GRU should be doing (thanks to this question: What's the difference between "hidden" and "output" in PyTorch LSTM?), BUT when I inspect the output of the GRU, h_n and output are NOT the same, while they should be...
(Pdb) rnn_output
tensor([[[ 0.2663, 0.3429, -0.0415, ..., 0.1275, 0.0719, 0.1011],
[-0.1272, 0.3096, -0.0403, ..., 0.0589, -0.0556, -0.3039],
[ 0.1064, 0.2810, -0.1858, ..., 0.3308, 0.1150, -0.3348],
...,
[-0.0929, 0.2826, -0.0554, ..., 0.0176, -0.1552, -0.0427],
[-0.0849, 0.3395, -0.0477, ..., 0.0172, -0.1429, 0.0153],
[-0.0212, 0.1257, -0.2670, ..., -0.0432, 0.2122, -0.1797]]],
grad_fn=<StackBackward>)
(Pdb) hidden
tensor([[[ 0.1700, 0.2388, -0.4159, ..., -0.1949, 0.0692, -0.0630],
[ 0.1304, 0.0426, -0.2874, ..., 0.0882, 0.1394, -0.1899],
[-0.0071, 0.1512, -0.1558, ..., -0.1578, 0.1990, -0.2468],
...,
[ 0.0856, 0.0962, -0.0985, ..., 0.0081, 0.0906, -0.1234],
[ 0.1773, 0.2808, -0.0300, ..., -0.0415, -0.0650, -0.0010],
[ 0.2207, 0.3573, -0.2493, ..., -0.2371, 0.1349, -0.2982]],
[[ 0.2663, 0.3429, -0.0415, ..., 0.1275, 0.0719, 0.1011],
[-0.1272, 0.3096, -0.0403, ..., 0.0589, -0.0556, -0.3039],
[ 0.1064, 0.2810, -0.1858, ..., 0.3308, 0.1150, -0.3348],
...,
[-0.0929, 0.2826, -0.0554, ..., 0.0176, -0.1552, -0.0427],
[-0.0849, 0.3395, -0.0477, ..., 0.0172, -0.1429, 0.0153],
[-0.0212, 0.1257, -0.2670, ..., -0.0432, 0.2122, -0.1797]]],
grad_fn=<StackBackward>)
They look like some transpose of each other... why?
Recommended answer
They are not really the same. Consider that we have the following unidirectional GRU model:
import torch.nn as nn
import torch
gru = nn.GRU(input_size = 8, hidden_size = 50, num_layers = 3, batch_first = True)
Please make sure you observe the input shape carefully.
inp = torch.randn(1024, 112, 8)
out, hn = gru(inp)
And for sure:
torch.equal(out, hn)
False
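The quickest way to see why they cannot be equal is to compare their shapes (a small sanity check, reusing the out and hn computed above; the numbers follow from input_size = 8, hidden_size = 50, num_layers = 3):
# out holds the last layer's hidden state at every timestep,
# hn holds the final hidden state of every layer.
out.shape # torch.Size([1024, 112, 50]) -> (batch, seq_len, hidden_size)
hn.shape  # torch.Size([3, 1024, 50])   -> (num_layers, batch, hidden_size)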
One of the most efficient ways that helped me to understand output vs. hidden state was to view hn as hn.view(num_layers, num_directions, batch, hidden_size), where num_directions = 2 for bidirectional recurrent networks (and 1 otherwise, i.e., our case). Thus,
hn_conceptual_view = hn.view(3, 1, 1024, 50)
The documentation describes h_n as the tensor containing the hidden state for t = seq_len. In our case, this is the hidden vector for timestep t = 112, i.e. the final hidden state of each layer after the whole sequence has been processed.
Hence, one can do:
torch.equal(out[:, -1], hn_conceptual_view[-1, 0, :, :])
True
Explanation: I compare the last timestep from all batches in out[:, -1] to the last-layer hidden vectors from hn_conceptual_view[-1, 0, :, :].
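Equivalently, since num_directions = 1 here, you can skip the view and index hn directly (a small sketch reusing the out and hn from above):
# hn[-1] is the final hidden state of the last (topmost) layer,
# which is exactly what out stores at the last timestep.
torch.equal(out[:, -1], hn[-1])
True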
For a bidirectional GRU (requires reading the unidirectional part first):
gru = nn.GRU(input_size = 8, hidden_size = 50, num_layers = 3, batch_first = True, bidirectional = True)
inp = torch.randn(1024, 112, 8)
out, hn = gru(inp)
The view is changed to (since we have two directions):
hn_conceptual_view = hn.view(3, 2, 1024, 50)
If you try the exact same code:
torch.equal(out[:, -1], hn_conceptual_view[-1, 0, :, :])
False
Explanation: This is because we are comparing the wrong shapes;
out[:, 0].shape
torch.Size([1024, 100])
hn_conceptual_view[-1, 0, :, :].shape
torch.Size([1024, 50])
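For reference, these are the full shapes in the bidirectional case (again reusing the out and hn computed just above):
out.shape # torch.Size([1024, 112, 100]) -> last dim is 2 * hidden_size (forward and backward concatenated)
hn.shape  # torch.Size([6, 1024, 50])    -> first dim is num_layers * num_directions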
Remember that for bidirectional networks, the hidden states get concatenated at each timestep, where the first hidden_size elements (i.e., out[:, 0, :50]) are the hidden states of the forward network, and the remaining hidden_size elements (i.e., out[:, 0, 50:]) are those of the backward network. The correct comparison for the forward network is then:
torch.equal(out[:, -1, :50], hn_conceptual_view[-1, 0, :, :])
True
If you want the hidden states of the backward network: since a backward network processes the sequence from timestep n ... 1, you compare the first timestep of the sequence but the last hidden_size elements, and you change the hn_conceptual_view direction to 1:
torch.equal(out[:, 0, 50:], hn_conceptual_view[-1, 1, :, :])
True
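If you prefer not to build the conceptual view, the same two checks can be done by indexing hn directly: with the (num_layers * num_directions, batch, hidden_size) layout, the last two rows are the forward and backward states of the last layer (a sketch reusing the bidirectional out and hn):
# hn[-2]: last layer, forward direction; hn[-1]: last layer, backward direction
torch.equal(out[:, -1, :50], hn[-2]) # forward: last timestep, first 50 features
True
torch.equal(out[:, 0, 50:], hn[-1])  # backward: first timestep, last 50 features
True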
In a nutshell, generally speaking:
Unidirectional:
rnn_module = nn.RECURRENT_MODULE(input_size = E, hidden_size = H, num_layers = X, batch_first = True)
inp = torch.rand(B, S, E)
output, hn = rnn_module(inp)
hn_conceptual_view = hn.view(X, 1, B, H)
Where RECURRENT_MODULE is either GRU or LSTM (at the time of writing this post), B is the batch size, S the sequence length, and E the embedding size.
torch.equal(output[:, -1, :], hn_conceptual_view[-1, 0, :, :])
True
Again we take the last timestep (index S - 1, i.e. -1), since the rnn_module is forward (i.e., unidirectional) and the final hidden state is produced at the end of the sequence.
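The same relation holds for an LSTM, except that it returns a tuple of (hidden state, cell state); only the hidden state hn takes part in the comparison. A minimal sketch (the sizes here are arbitrary):
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size = 8, hidden_size = 50, num_layers = 3, batch_first = True)
inp = torch.randn(32, 20, 8)
output, (hn, cn) = lstm(inp)           # hn and cn both have shape (num_layers, batch, hidden_size)
torch.equal(output[:, -1, :], hn[-1])  # last timestep of output == last layer's final hidden state
True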
Bidirectional:
rnn_module = nn.RECURRENT_MODULE(input_size = E, hidden_size = H, num_layers = X, batch_first = True, bidirectional = True)
inp = torch.rand(B, S, E)
output, hn = rnn_module(inp)
hn_conceptual_view = hn.view(X, 2, B, H)
Comparing:
torch.equal(output[:, -1, :H], hn_conceptual_view[-1, 0, :, :])
True
Above is the comparison for the forward network; we used :H because the forward direction stores its hidden vector in the first H elements of each timestep.
For the backward network:
torch.equal(output[:, 0, H:], hn_conceptual_view[-1, 1, :, :])
True
We changed the direction in hn_conceptual_view to 1 to get the hidden vectors of the backward network.
For all examples we used hn_conceptual_view[-1, ...] because we are only interested in the last layer.
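Putting it all together, here is a self-contained sketch (assuming only that PyTorch is installed; the sizes are arbitrary) that asserts every comparison made above:
import torch
import torch.nn as nn

B, S, E, H, X = 16, 10, 8, 50, 3  # batch, seq_len, embedding, hidden, layers
inp = torch.randn(B, S, E)

# Unidirectional: out's last timestep == last layer's final hidden state
gru = nn.GRU(input_size = E, hidden_size = H, num_layers = X, batch_first = True)
out, hn = gru(inp)
hn_view = hn.view(X, 1, B, H)
assert torch.equal(out[:, -1, :], hn_view[-1, 0, :, :])

# Bidirectional: forward half at the last timestep, backward half at the first timestep
bigru = nn.GRU(input_size = E, hidden_size = H, num_layers = X, batch_first = True, bidirectional = True)
out, hn = bigru(inp)
hn_view = hn.view(X, 2, B, H)
assert torch.equal(out[:, -1, :H], hn_view[-1, 0, :, :])  # forward direction
assert torch.equal(out[:, 0, H:], hn_view[-1, 1, :, :])   # backward direction
print("all checks passed")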