Problem description
My aim is to build a five-category text classifier. I am fine-tuning BERT with a CNN-based model, but my program stops at loss.backward() without any prompt in cmd.
My program runs successfully with RNN-based models such as LSTM and RCNN.
But when I run a CNN-based model, a strange bug appears.
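A silent hang like this can sometimes be surfaced by forcing synchronous CUDA execution and turning on autograd anomaly detection (available in PyTorch 1.0+). This is a generic debugging sketch, not part of my original program:

import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"  # set before CUDA is initialized; kernel errors then surface at the failing op

import torch
torch.autograd.set_detect_anomaly(True)   # traces a backward error to the forward op that produced it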
My CNN model code:
import torch
import torch.nn as nn
import torch.nn.functional as F
# from ..Models.Conv import Conv1d
from transformers.modeling_bert import BertPreTrainedModel, BertModel

n_filters = 200
filter_sizes = [2, 3, 4]

class BertCNN(BertPreTrainedModel):
    def __init__(self, config):
        super(BertCNN, self).__init__(config)
        self.num_filters = n_filters
        self.filter_sizes = filter_sizes
        self.bert = BertModel(config)
        for param in self.bert.parameters():
            param.requires_grad = True  # fine-tune all BERT layers
        # one Conv2d per filter size, each filter spanning the full hidden dimension
        self.convs = nn.ModuleList(
            [nn.Conv2d(1, self.num_filters, (k, config.hidden_size))
             for k in self.filter_sizes])
        self.dropout = nn.Dropout(config.hidden_dropout_prob)
        self.fc_cnn = nn.Linear(self.num_filters * len(self.filter_sizes),
                                config.num_labels)

    def conv_and_pool(self, x, conv):
        x = F.relu(conv(x)).squeeze(3)             # [batch, n_filters, seq_len - k + 1]
        x = F.max_pool1d(x, x.size(2)).squeeze(2)  # max over time -> [batch, n_filters]
        return x

    def forward(self, input_ids,
                attention_mask=None, token_type_ids=None, head_mask=None):
        outputs = self.bert(input_ids,
                            attention_mask=attention_mask,
                            token_type_ids=token_type_ids,
                            head_mask=head_mask)
        encoder_out, text_cls = outputs  # sequence output and pooled [CLS] output
        out = encoder_out.unsqueeze(1)   # add channel dim for Conv2d
        out = torch.cat([self.conv_and_pool(out, conv)
                         for conv in self.convs], 1)
        out = self.dropout(out)
        out = self.fc_cnn(out)
        return out
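As a quick stand-alone shape check, the model can be run on random ids. This sketch assumes an older transformers 2.x release (matching the modeling_bert import above) and a randomly initialized config rather than the pretrained checkpoint the real run loads:

from transformers import BertConfig

config = BertConfig(num_labels=5)   # default vocab/hidden sizes; real run uses from_pretrained
model = BertCNN(config)
input_ids = torch.randint(0, config.vocab_size, (2, 80))
logits = model(input_ids)
print(logits.size())                # expected: torch.Size([2, 5])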
My training code:
# requires: from torch.nn.utils import clip_grad_norm_ and (for fp16) from apex import amp
for step, batch in enumerate(data):
    self.model.train()
    batch = tuple(t.to(self.device) for t in batch)
    input_ids, input_mask, segment_ids, label_ids = batch
    print("input_ids, input_mask, segment_ids, label_ids SIZE: \n")
    print(input_ids.size(), input_mask.size(), segment_ids.size(), label_ids.size())
    # torch.Size([2, 80]) torch.Size([2, 80]) torch.Size([2, 80]) torch.Size([2])
    # pass masks by keyword: positionally, segment_ids and input_mask would land
    # in the wrong slots of forward(input_ids, attention_mask, token_type_ids)
    logits = self.model(input_ids,
                        attention_mask=input_mask,
                        token_type_ids=segment_ids)
    print("logits and label ids size: ", logits.size(), label_ids.size())
    # torch.Size([2, 5]) torch.Size([2])
    loss = self.criterion(output=logits, target=label_ids)
    if len(self.n_gpu) >= 2:
        loss = loss.mean()  # average the per-GPU losses under DataParallel
    if self.gradient_accumulation_steps > 1:
        loss = loss / self.gradient_accumulation_steps
    if self.fp16:
        with amp.scale_loss(loss, self.optimizer) as scaled_loss:
            scaled_loss.backward()
        clip_grad_norm_(amp.master_params(self.optimizer), self.grad_clip)
    else:
        loss.backward()  # debugging shows the program stops at this line without any error prompt
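Since the hang occurs at backward, it can help to isolate the loss step from BERT and the GPU entirely. The snippet below is a CPU-only sketch that assumes self.criterion wraps nn.CrossEntropyLoss (consistent with the NllLossBackward in the step-1 printout below); if this returns immediately, the loss computation itself is healthy:

import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
logits = torch.randn(2, 5, requires_grad=True)  # same shape as the printed logits
label_ids = torch.randint(0, 5, (2,))           # same shape as the printed label_ids
loss = criterion(logits, label_ids)
loss.backward()                                 # on a healthy install this returns immediately
print(loss.item(), logits.grad.shape)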
Changing the batch size to 1, the bug still occurs.
step1 logits:
tensor([[ 0.8831, -0.0368, -0.2206, -2.3484, -1.3595]], device='cuda:1', grad_fn=<AddmmBackward>)
step1 loss:
tensor(1.5489, device='cuda:1', grad_fn=<NllLossBackward>)
But why does loss.backward() stall?
Accepted answer
I tried to run my program on a Linux platform, and it ran successfully. Therefore, the problem is very likely caused by the difference in operating systems.
Previous OS: Windows 10
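When comparing the two setups, printing the exact platform and CUDA stack on each machine makes the difference concrete; a minimal sketch:

import platform
import torch

print(platform.platform())                    # e.g. Windows-10-... vs Linux-...
print(torch.__version__, torch.version.cuda)  # PyTorch build and its CUDA toolkit
print(torch.backends.cudnn.version())         # cuDNN build
print(torch.cuda.is_available(), torch.cuda.device_count())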