I'm trying to use PyTorch to predict a 1D vector (a frame of clean speech data) via regression from a 2D vector (a sequence of frames of noisy speech data), something that has been done before. The frame sequence gives the frame temporal context, so the clean frame can be predicted more accurately. You can think of the vectors as analogous to a 2D grayscale image and a 1D grayscale image.

With a batch size of 64, a window length of 5, and a frame length of 257, the input tensor has shape [64, 1, 5, 257] and the target tensor has shape [64, 1, 1, 257].
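
For concreteness, here is a minimal sketch of tensors with these shapes (random data standing in for real spectrogram frames; the variable names are my own):

import torch

batch_size, window_length, frame_length = 64, 5, 257
noisy = torch.randn(batch_size, 1, window_length, frame_length)  # input:  [64, 1, 5, 257]
clean = torch.randn(batch_size, 1, 1, frame_length)              # target: [64, 1, 1, 257]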

There are examples of this in TensorFlow, but I couldn't find any using PyTorch. This is my best attempt so far at reproducing this paper (https://www.isca-speech.org/archive/Interspeech_2017/pdfs/1465.PDF).

import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self, window_length, frame_length, batch_size):
        super(Net, self).__init__()
        self.layer1 = nn.Sequential(
            nn.Conv2d(1, 12, kernel_size=(1,13), stride=1, padding=(0,6)),
            # nn.BatchNorm2d(12),
            nn.ReLU())
        self.layer2 = nn.Sequential(
            nn.Conv2d(12, 16, kernel_size=(1,11), stride=1, padding=(0,5)),
            # nn.BatchNorm2d(16),
            nn.ReLU())
        self.layer3 = nn.Sequential(
            nn.Conv2d(16, 20, kernel_size=(1,9), stride=1, padding=(0,4)),
            # nn.BatchNorm2d(20),
            nn.ReLU())
        self.layer4 = nn.Sequential(
            nn.Conv2d(20, 24, kernel_size=(1,7), stride=1, padding=(0,3)),
            # nn.BatchNorm2d(24),
            nn.ReLU())
        self.layer5 = nn.Sequential(
            nn.Conv2d(24, 32, kernel_size=(1,7), stride=1, padding=(0,3)),
            # nn.BatchNorm2d(32),
            nn.ReLU())
        self.layer6 = nn.Sequential(
            nn.Conv2d(32, 24, kernel_size=(1,7), stride=1, padding=(0,3)),
            # nn.BatchNorm2d(24),
            nn.ReLU())
        self.layer7 = nn.Sequential(
            nn.Conv2d(24, 20, kernel_size=(1,9), stride=1, padding=(0,4)),
            # nn.BatchNorm2d(20),
            nn.ReLU())
        self.layer8 = nn.Sequential(
            nn.Conv2d(20, 16, kernel_size=(1,11), stride=1, padding=(0,5)),
            # nn.BatchNorm2d(16),
            nn.ReLU())
        self.layer9 = nn.Sequential(
            nn.Conv2d(16, 12, kernel_size=(1,13), stride=1, padding=(0,6)),
            # nn.BatchNorm2d(12),
            nn.ReLU())
        self.conv_out = nn.Sequential(
            nn.Conv2d(12, 1, kernel_size=(1,1), stride=1, padding=(0,0)),
            )
        self.fc1 = nn.Linear(batch_size * window_length * frame_length, frame_length)

    def forward(self, x):
        out = self.layer1(x)
        out = self.layer2(out)
        out = self.layer3(out)
        out = self.layer4(out)
        out = self.layer5(out)
        out = self.layer6(out)
        out = self.layer7(out)
        out = self.layer8(out)
        out = self.layer9(out)
        out = self.conv_out(out)
        out = self.fc1(out)
        return out


Calling .forward() on this network produces the following error message:

RuntimeError: size mismatch, m1: [320 x 257], m2: [82240 x 257]

How do I reduce the output layer to 1x257 per sample to match the target (a single frame of length 257)?

Best answer

Here is a working version of your code.

class Net(nn.Module):
  def __init__(self, window_length, frame_length, batch_size):
      super(Net, self).__init__()
      self.layer1 = nn.Sequential(
          nn.Conv2d(1, 12, kernel_size=(1,13), stride=1, padding=(0,6)),
          # nn.BatchNorm2d(12),
          nn.ReLU())
      self.layer2 = nn.Sequential(
          nn.Conv2d(12, 16, kernel_size=(1,11), stride=1, padding=(0,5)),
          # nn.BatchNorm2d(16),
          nn.ReLU())
      self.layer3 = nn.Sequential(
          nn.Conv2d(16, 20, kernel_size=(1,9), stride=1, padding=(0,4)),
          # nn.BatchNorm2d(20),
          nn.ReLU())
      self.layer4 = nn.Sequential(
          nn.Conv2d(20, 24, kernel_size=(1,7), stride=1, padding=(0,3)),
          # nn.BatchNorm2d(24),
          nn.ReLU())
      self.layer5 = nn.Sequential(
          nn.Conv2d(24, 32, kernel_size=(1,7), stride=1, padding=(0,3)),
          # nn.BatchNorm2d(32),
          nn.ReLU())
      self.layer6 = nn.Sequential(
          nn.Conv2d(32, 24, kernel_size=(1,7), stride=1, padding=(0,3)),
          # nn.BatchNorm2d(24),
          nn.ReLU())
      self.layer7 = nn.Sequential(
          nn.Conv2d(24, 20, kernel_size=(1,9), stride=1, padding=(0,4)),
          # nn.BatchNorm2d(20),
          nn.ReLU())
      self.layer8 = nn.Sequential(
          nn.Conv2d(20, 16, kernel_size=(1,11), stride=1, padding=(0,5)),
          # nn.BatchNorm2d(16),
          nn.ReLU())
      self.layer9 = nn.Sequential(
          nn.Conv2d(16, 12, kernel_size=(1,13), stride=1, padding=(0,6)),
          # nn.BatchNorm2d(12),
          nn.ReLU())
      self.conv_out = nn.Sequential(
          nn.Conv2d(12, 1, kernel_size=(1,1), stride=1, padding=(0,0)),
          )
      self.fc1 = nn.Linear(window_length * frame_length, frame_length)

  def forward(self, x):
      out = self.layer1(x)
      out = self.layer2(out)
      out = self.layer3(out)
      out = self.layer4(out)
      out = self.layer5(out)
      out = self.layer6(out)
      out = self.layer7(out)
      out = self.layer8(out)
      out = self.layer9(out)
      out = self.conv_out(out)
      # flatten the conv output [batch, 1, 5, 257] to [batch, 5*257]
      # (window_length * frame_length, hardcoded here) before the linear layer
      out = out.view(-1, 5*257)
      out = self.fc1(out)
      return out
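
A quick sanity check (my own snippet, not part of the original answer) to confirm the forward pass runs and the output matches one 257-point frame per sample:

net = Net(window_length=5, frame_length=257, batch_size=64)
x = torch.randn(64, 1, 5, 257)   # a batch of noisy 5-frame windows
y = net(x)
print(y.shape)                   # torch.Size([64, 257])

Note the output shape is [64, 257] while the target is [64, 1, 1, 257], so squeeze the target (or reshape the output) before computing the loss.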


The changes I made:

- Changed the definition of the Linear layer. In PyTorch you never build the batch size into the model definition; the model should handle any batch size. Basing the linear layer on batch_size is therefore wrong: nn.Linear only needs the number of input features per sample. This is also where your error comes from: the conv stack leaves the output at [64, 1, 5, 257], which gets flattened to m1 = [320 x 257] (320 = 64 * 1 * 5), while fc1's weight expects 82240 = 64 * 5 * 257 input features, giving m2 = [82240 x 257] and the size mismatch.
- Used view to reshape the output after the convolutional layers to [batch, window_length * frame_length], so each sample is flattened into the feature vector the linear layer expects.

I made these changes with reference to the following TensorFlow code, RCED. I did notice that it differs from the paper, which describes the network as fully convolutional, but I went with it since the two are largely similar.
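
For reference, a minimal sketch of the fully convolutional head the paper describes, as I read it (my own interpretation, not code from RCED): replace the 1x1 conv_out and the linear layer with a convolution whose kernel spans the whole window, so the time axis collapses to a single frame and no reshaping is needed.

# Hypothetical fully convolutional output head (assumes window_length = 5):
# a (5, 1) kernel with no padding collapses the 5-frame window to one frame,
# so the output is [batch, 1, 1, 257] and matches the target shape directly.
conv_head = nn.Conv2d(12, 1, kernel_size=(5, 1), stride=1, padding=0)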

Original question: python - Pytorch: How to predict a 1D vector from a 2D vector/image?, on Stack Overflow: https://stackoverflow.com/questions/59569971/
