I am trying to train an LSTM layer in PyTorch using 4 GPUs. When initializing, I call .cuda() to move the hidden state to the GPU. But when I run the code with multiple GPUs, I get this runtime error:
RuntimeError: Input and hidden tensors are not at the same device
I tried to solve the problem by calling .cuda() in the forward function, like this:
self.hidden = (self.hidden[0].type(torch.FloatTensor).cuda(), self.hidden[1].type(torch.FloatTensor).cuda())
This line seems to solve the problem, but I am concerned about whether the updated hidden state is seen across the different GPUs. Should I move the tensors back to the CPU at the end of the forward function for each batch, or is there another way to solve the problem?
When you call .cuda() on a tensor, PyTorch moves it to the current GPU device, which is GPU 0 by default. With data parallelism, the input chunk handed to a replica can therefore live on one GPU while the hidden state you created with .cuda() sits on another, and this produces the runtime error you are facing.
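One way to avoid the mismatch is to create the hidden state inside forward on whatever device the input chunk already lives on, instead of calling .cuda(). The following is only a minimal sketch under assumptions: the class name HiddenOnInputDevice and the layer sizes are hypothetical, and the hidden state is reset to zeros on every call rather than carried across batches.

```python
import torch
import torch.nn as nn


class HiddenOnInputDevice(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, input_size=8, hidden_size=16):  # sizes are assumptions
        super().__init__()
        self.hidden_size = hidden_size
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)

    def forward(self, x):
        # x has shape [B, T, input_size]; under nn.DataParallel each replica
        # receives its own chunk of the batch, already placed on its own GPU.
        batch = x.size(0)
        # Allocate the initial states on the same device (and dtype) as x,
        # so input and hidden tensors always agree.
        h0 = torch.zeros(1, batch, self.hidden_size, device=x.device, dtype=x.dtype)
        c0 = torch.zeros(1, batch, self.hidden_size, device=x.device, dtype=x.dtype)
        output, _ = self.lstm(x, (h0, c0))
        return output


model = HiddenOnInputDevice()
x = torch.randn(4, 5, 8)
if torch.cuda.is_available():
    # Falls back to plain CPU execution when no GPU is present.
    model = nn.DataParallel(model.cuda())
    x = x.cuda()
out = model(x)
print(out.shape)
```

Note that nn.DataParallel re-creates the replicas on every forward call, so a hidden state stored on self inside forward is generally not kept on the replicas. If the state has to persist between batches, it is probably safer to return it from forward and pass it back in explicitly.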
The correct way to implement data parallelism for recurrent neural networks is as follows:
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

class MyModule(nn.Module):
    # ... __init__, other methods, etc.

    # padded_input is of shape [B x T x *] (batch_first mode) and contains
    # the sequences sorted by lengths
    #   B is the batch size
    #   T is max sequence length
    def forward(self, padded_input, input_lengths):
        total_length = padded_input.size(1)  # get the max sequence length
        packed_input = pack_padded_sequence(padded_input, input_lengths,
                                            batch_first=True)
        packed_output, _ = self.my_lstm(packed_input)
        # pad back to total_length so every replica returns the same T
        output, _ = pad_packed_sequence(packed_output, batch_first=True,
                                        total_length=total_length)
        return output


m = MyModule().cuda()
dp_m = nn.DataParallel(m)
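The total_length argument matters because each replica only sees a chunk of the batch, and the longest sequence in that chunk may be shorter than T. The small CPU-only sketch below illustrates the effect; the tensor sizes and lengths are made up for the example.

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

# A chunk as one replica might receive it: B=2, padded to T=6,
# but the longest sequence in this chunk only has 4 steps.
chunk = torch.zeros(2, 6, 3)
lengths = torch.tensor([4, 2])  # sorted in decreasing order, kept on the CPU

packed = pack_padded_sequence(chunk, lengths, batch_first=True)

# Without total_length the output is padded only to the longest
# sequence in this chunk.
short, _ = pad_packed_sequence(packed, batch_first=True)

# With total_length the output is padded back to the original T.
full, _ = pad_packed_sequence(packed, batch_first=True,
                              total_length=chunk.size(1))

print(short.shape)  # time dimension is 4
print(full.shape)   # time dimension is 6
```

If replicas returned outputs with different time dimensions, nn.DataParallel could not concatenate them along the batch dimension. Also, recent PyTorch versions expect the lengths to be a CPU tensor, so inside a replica you may need to call input_lengths.cpu() before packing.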
You also need to set the CUDA_VISIBLE_DEVICES environment variable accordingly for a multi-GPU setup.
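As a rough illustration, the variable can be set from within Python, provided this happens before CUDA is initialized. The device list "0,1,2,3" is an assumption matching the 4 GPUs mentioned in the question.

```python
import os

# Assumption: GPUs 0 to 3 are the ones to use. This has to be set
# before CUDA is initialized, so it is done before importing torch.
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3"

import torch

# Number of GPUs PyTorch can see after the restriction (0 on a CPU-only machine).
print(torch.cuda.device_count())
```

The same value can instead be set in the shell that launches the training script, which avoids any dependence on import order.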