Question
How does one construct the decoder part of a convolutional autoencoder? Suppose I have this (input -> conv2d -> maxpool2d -> maxunpool2d -> convTranspose2d -> output):
import torch.nn as nn

# CIFAR images shape = 3 x 32 x 32
class ConvDAE(nn.Module):
    def __init__(self):
        super().__init__()

        # input: batch x 3 x 32 x 32 -> output: batch x 16 x 16 x 16
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=1, padding=1),  # batch x 16 x 32 x 32
            nn.ReLU(),
            nn.BatchNorm2d(16),
            nn.MaxPool2d(2, stride=2)  # batch x 16 x 16 x 16
        )

        # input: batch x 16 x 16 x 16 -> output: batch x 3 x 32 x 32
        self.decoder = nn.Sequential(
            # this line does not work
            # nn.MaxUnpool2d(2, stride=2, padding=0),  # batch x 16 x 32 x 32
            nn.ConvTranspose2d(16, 16, 3, stride=2, padding=1, output_padding=1),  # batch x 16 x 32 x 32
            nn.ReLU(),
            nn.BatchNorm2d(16),
            nn.ConvTranspose2d(16, 3, 3, stride=1, padding=1, output_padding=0),  # batch x 3 x 32 x 32
            nn.ReLU()
        )

    def forward(self, x):
        print(x.size())
        out = self.encoder(x)
        print(out.size())
        out = self.decoder(out)
        print(out.size())
        return out
PyTorch-specific question: why can't I use MaxUnpool2d in the decoder part? It gives me the following error:
TypeError: forward() missing 1 required positional argument: 'indices'
And the conceptual question: shouldn't the decoder do the inverse of whatever the encoder did? I have seen some implementations that seem to care only about the dimensions of the decoder's input and output. Here and here are some examples.
Recommended answer
For the PyTorch part of the question: unpool modules take, as a required positional argument, the indices produced by the matching pooling module. The pooling module returns them when it is constructed with return_indices=True. Because nn.Sequential passes only a single tensor from one layer to the next, the unpool step has to be called explicitly in forward(). So you could do
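import torch.nn as nn

class ConvDAE(nn.Module):
    def __init__(self):
        super().__init__()

        # input: batch x 3 x 32 x 32 -> output: batch x 16 x 16 x 16 (plus pooling indices)
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=1, padding=1),  # batch x 16 x 32 x 32
            nn.ReLU(),
            nn.BatchNorm2d(16),
            # last layer, so the (output, indices) tuple is returned by the Sequential
            nn.MaxPool2d(2, stride=2, return_indices=True)
        )

        # kept outside the Sequential because it needs the indices as a second argument
        self.unpool = nn.MaxUnpool2d(2, stride=2, padding=0)  # batch x 16 x 32 x 32

        self.decoder = nn.Sequential(
            # stride=1 here: the unpool already restored 32 x 32, and the
            # stride=2 / output_padding=1 variant would upsample again to 64 x 64
            nn.ConvTranspose2d(16, 16, 3, stride=1, padding=1, output_padding=0),  # batch x 16 x 32 x 32
            nn.ReLU(),
            nn.BatchNorm2d(16),
            nn.ConvTranspose2d(16, 3, 3, stride=1, padding=1, output_padding=0),  # batch x 3 x 32 x 32
            nn.ReLU()
        )

    def forward(self, x):
        print(x.size())
        out, indices = self.encoder(x)
        out = self.unpool(out, indices)
        out = self.decoder(out)
        print(out.size())
        return out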
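To see what the indices actually do, here is a minimal sketch on a single random tensor, independent of the model above. The tensor shape and the variable names are illustrative assumptions; only nn.MaxPool2d and nn.MaxUnpool2d come from the answer.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

pool = nn.MaxPool2d(2, stride=2, return_indices=True)
unpool = nn.MaxUnpool2d(2, stride=2)

x = torch.randn(1, 16, 32, 32)       # stand-in for an encoder feature map
pooled, indices = pool(x)            # both are 1 x 16 x 16 x 16
restored = unpool(pooled, indices)   # back to 1 x 16 x 32 x 32

# Each max value is written back to the position it was taken from;
# every other position in `restored` is zero.
mask = restored != 0
print(pooled.shape, restored.shape)
print(torch.equal(restored[mask], x[mask]))
```

Unpooling is therefore only a partial inverse of pooling: it restores the spatial size and the locations of the maxima, but the three non-maximal values in each 2 x 2 window are lost.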
As for the general part of the question, I don't think the state of the art is to use a symmetric decoder: deconvolution (transposed convolution) has been shown to produce checkerboard artifacts, and many approaches use upsampling modules instead. You will find more info faster through PyTorch channels.
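As a rough illustration of the upsampling alternative, the sketch below resizes the feature map first and then applies ordinary convolutions. The class name UpsampleDecoder, the nearest-neighbour mode, and the layer widths are assumptions chosen to match the shapes in the question, not something prescribed by the answer.

```python
import torch
import torch.nn as nn

class UpsampleDecoder(nn.Module):
    """Hypothetical decoder: upsample, then regular (non-transposed) convolutions."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="nearest"),  # batch x 16 x 32 x 32
            nn.Conv2d(16, 16, 3, stride=1, padding=1),    # batch x 16 x 32 x 32
            nn.ReLU(),
            nn.BatchNorm2d(16),
            nn.Conv2d(16, 3, 3, stride=1, padding=1),     # batch x 3 x 32 x 32
        )

    def forward(self, z):
        return self.net(z)

decoder = UpsampleDecoder()
z = torch.randn(4, 16, 16, 16)   # stand-in for the pooled encoder output
out = decoder(z)
print(out.shape)                 # torch.Size([4, 3, 32, 32])
```

This decoder needs no pooling indices, so it can be dropped into an nn.Sequential directly; whether it reconstructs better than the unpool variant depends on the data and would have to be checked empirically.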