Problem description
I do understand conceptually what an LSTM or GRU should be doing (thanks to this question: What's the difference between "hidden" and "output" in PyTorch LSTM?), BUT when I inspect the output of the GRU, h_n and output are NOT the same, while they should be...
(Pdb) rnn_output
tensor([[[ 0.2663, 0.3429, -0.0415, ..., 0.1275, 0.0719, 0.1011],
[-0.1272, 0.3096, -0.0403, ..., 0.0589, -0.0556, -0.3039],
[ 0.1064, 0.2810, -0.1858, ..., 0.3308, 0.1150, -0.3348],
...,
[-0.0929, 0.2826, -0.0554, ..., 0.0176, -0.1552, -0.0427],
[-0.0849, 0.3395, -0.0477, ..., 0.0172, -0.1429, 0.0153],
[-0.0212, 0.1257, -0.2670, ..., -0.0432, 0.2122, -0.1797]]],
grad_fn=<StackBackward>)
(Pdb) hidden
tensor([[[ 0.1700, 0.2388, -0.4159, ..., -0.1949, 0.0692, -0.0630],
[ 0.1304, 0.0426, -0.2874, ..., 0.0882, 0.1394, -0.1899],
[-0.0071, 0.1512, -0.1558, ..., -0.1578, 0.1990, -0.2468],
...,
[ 0.0856, 0.0962, -0.0985, ..., 0.0081, 0.0906, -0.1234],
[ 0.1773, 0.2808, -0.0300, ..., -0.0415, -0.0650, -0.0010],
[ 0.2207, 0.3573, -0.2493, ..., -0.2371, 0.1349, -0.2982]],
[[ 0.2663, 0.3429, -0.0415, ..., 0.1275, 0.0719, 0.1011],
[-0.1272, 0.3096, -0.0403, ..., 0.0589, -0.0556, -0.3039],
[ 0.1064, 0.2810, -0.1858, ..., 0.3308, 0.1150, -0.3348],
...,
[-0.0929, 0.2826, -0.0554, ..., 0.0176, -0.1552, -0.0427],
[-0.0849, 0.3395, -0.0477, ..., 0.0172, -0.1429, 0.0153],
[-0.0212, 0.1257, -0.2670, ..., -0.0432, 0.2122, -0.1797]]],
grad_fn=<StackBackward>)
They look like some transpose of each other... why?
Recommended answer
They are not really the same. Consider that we have the following unidirectional GRU model:
import torch.nn as nn
import torch
gru = nn.GRU(input_size = 8, hidden_size = 50, num_layers = 3, batch_first = True)
Please make sure you observe the input shape carefully.
inp = torch.randn(1024, 112, 8)
out, hn = gru(inp)
And for sure:
torch.equal(out, hn)
False
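The quickest way to see why they cannot be equal is to compare their shapes (a small sanity check, reusing the out and hn computed above; the numbers follow from input_size = 8, hidden_size = 50, num_layers = 3):
# out holds the last layer's hidden state at every timestep,
# hn holds the final hidden state of every layer.
out.shape # torch.Size([1024, 112, 50]) -> (batch, seq_len, hidden_size)
hn.shape  # torch.Size([3, 1024, 50])   -> (num_layers, batch, hidden_size)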
One of the most efficient ways that helped me to understand output vs. hidden state was to view hn as hn.view(num_layers, num_directions, batch, hidden_size), where num_directions = 2 for bidirectional recurrent networks (and 1 otherwise, i.e., our case). Thus,
hn_conceptual_view = hn.view(3, 1, 1024, 50)
The documentation describes h_n as the tensor containing the hidden state for t = seq_len. In our case, this is the hidden vector for timestep t = 112, i.e. the final hidden state of each layer after the whole sequence has been processed.
Hence, one can do:
torch.equal(out[:, -1], hn_conceptual_view[-1, 0, :, :])
True
Explanation: I compare the last timestep from all batches in out[:, -1] to the last-layer hidden vectors from hn_conceptual_view[-1, 0, :, :].
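Equivalently, since num_directions = 1 here, you can skip the view and index hn directly (a small sketch reusing the out and hn from above):
# hn[-1] is the final hidden state of the last (topmost) layer,
# which is exactly what out stores at the last timestep.
torch.equal(out[:, -1], hn[-1])
True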
For a bidirectional GRU (requires reading the unidirectional part first):
gru = nn.GRU(input_size = 8, hidden_size = 50, num_layers = 3, batch_first = True, bidirectional = True)
inp = torch.randn(1024, 112, 8)
out, hn = gru(inp)
The view is changed to (since we have two directions):
hn_conceptual_view = hn.view(3, 2, 1024, 50)
If you try the exact same code:
torch.equal(out[:, -1], hn_conceptual_view[-1, 0, :, :])
False
Explanation: This is because we are comparing the wrong shapes;
out[:, 0].shape
torch.Size([1024, 100])
hn_conceptual_view[-1, 0, :, :].shape
torch.Size([1024, 50])
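For reference, these are the full shapes in the bidirectional case (again reusing the out and hn computed just above):
out.shape # torch.Size([1024, 112, 100]) -> last dim is 2 * hidden_size (forward and backward concatenated)
hn.shape  # torch.Size([6, 1024, 50])    -> first dim is num_layers * num_directions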
Remember that for bidirectional networks, the hidden states get concatenated at each timestep, where the first hidden_size elements (i.e., out[:, 0, :50]) are the hidden states of the forward network, and the remaining hidden_size elements (i.e., out[:, 0, 50:]) are those of the backward network. The correct comparison for the forward network is then:
torch.equal(out[:, -1, :50], hn_conceptual_view[-1, 0, :, :])
True
If you want the hidden states of the backward network: since a backward network processes the sequence from timestep n ... 1, you compare the first timestep of the sequence but the last hidden_size elements, and you change the hn_conceptual_view direction to 1:
torch.equal(out[:, 0, 50:], hn_conceptual_view[-1, 1, :, :])
True
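If you prefer not to build the conceptual view, the same two checks can be done by indexing hn directly: with the (num_layers * num_directions, batch, hidden_size) layout, the last two rows are the forward and backward states of the last layer (a sketch reusing the bidirectional out and hn):
# hn[-2]: last layer, forward direction; hn[-1]: last layer, backward direction
torch.equal(out[:, -1, :50], hn[-2]) # forward: last timestep, first 50 features
True
torch.equal(out[:, 0, 50:], hn[-1])  # backward: first timestep, last 50 features
True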
In a nutshell, generally speaking:
Unidirectional:
rnn_module = nn.RECURRENT_MODULE(input_size = E, hidden_size = H, num_layers = X, batch_first = True)
inp = torch.rand(B, S, E)
output, hn = rnn_module(inp)
hn_conceptual_view = hn.view(X, 1, B, H)
Where RECURRENT_MODULE is either GRU or LSTM (at the time of writing this post), B is the batch size, S the sequence length, and E the embedding size.
torch.equal(output[:, -1, :], hn_conceptual_view[-1, 0, :, :])
True
Again we take the last timestep (index S - 1, i.e. -1), since the rnn_module is forward (i.e., unidirectional) and the final hidden state is produced at the end of the sequence.
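The same relation holds for an LSTM, except that it returns a tuple of (hidden state, cell state); only the hidden state hn takes part in the comparison. A minimal sketch (the sizes here are arbitrary):
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size = 8, hidden_size = 50, num_layers = 3, batch_first = True)
inp = torch.randn(32, 20, 8)
output, (hn, cn) = lstm(inp)           # hn and cn both have shape (num_layers, batch, hidden_size)
torch.equal(output[:, -1, :], hn[-1])  # last timestep of output == last layer's final hidden state
True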
Bidirectional:
rnn_module = nn.RECURRENT_MODULE(input_size = E, hidden_size = H, num_layers = X, batch_first = True, bidirectional = True)
inp = torch.rand(B, S, E)
output, hn = rnn_module(inp)
hn_conceptual_view = hn.view(X, 2, B, H)
Comparing:
torch.equal(output[:, -1, :H], hn_conceptual_view[-1, 0, :, :])
True
Above is the comparison for the forward network; we used :H because the forward direction stores its hidden vector in the first H elements of each timestep.
For the backward network:
torch.equal(output[:, 0, H:], hn_conceptual_view[-1, 1, :, :])
True
We changed the direction in hn_conceptual_view to 1 to get the hidden vectors of the backward network.
For all examples we used hn_conceptual_view[-1, ...] because we are only interested in the last layer.
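Putting it all together, here is a self-contained sketch (assuming only that PyTorch is installed; the sizes are arbitrary) that asserts every comparison made above:
import torch
import torch.nn as nn

B, S, E, H, X = 16, 10, 8, 50, 3  # batch, seq_len, embedding, hidden, layers
inp = torch.randn(B, S, E)

# Unidirectional: out's last timestep == last layer's final hidden state
gru = nn.GRU(input_size = E, hidden_size = H, num_layers = X, batch_first = True)
out, hn = gru(inp)
hn_view = hn.view(X, 1, B, H)
assert torch.equal(out[:, -1, :], hn_view[-1, 0, :, :])

# Bidirectional: forward half at the last timestep, backward half at the first timestep
bigru = nn.GRU(input_size = E, hidden_size = H, num_layers = X, batch_first = True, bidirectional = True)
out, hn = bigru(inp)
hn_view = hn.view(X, 2, B, H)
assert torch.equal(out[:, -1, :H], hn_view[-1, 0, :, :])  # forward direction
assert torch.equal(out[:, 0, H:], hn_view[-1, 1, :, :])   # backward direction
print("all checks passed")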