Problem description
My aim is to build a five-category text classifier. I am fine-tuning BERT with a CNN-based model, but my program stops at loss.backward() without any prompt in cmd.
My program runs successfully with RNN-based models such as LSTM and RCNN.
But when I run a CNN-based model, a strange bug appears.
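A silent hang like this can sometimes be surfaced by forcing synchronous CUDA execution and turning on autograd anomaly detection (available in PyTorch 1.0+). This is a generic debugging sketch, not part of my original program:

import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"  # set before CUDA is initialized; kernel errors then surface at the failing op

import torch
torch.autograd.set_detect_anomaly(True)   # traces a backward error to the forward op that produced it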
My CNN model code:
import torch
import torch.nn as nn
import torch.nn.functional as F
# from ..Models.Conv import Conv1d
from transformers.modeling_bert import BertPreTrainedModel, BertModel

n_filters = 200
filter_sizes = [2, 3, 4]

class BertCNN(BertPreTrainedModel):
    def __init__(self, config):
        super(BertCNN, self).__init__(config)
        self.num_filters = n_filters
        self.filter_sizes = filter_sizes
        self.bert = BertModel(config)
        for param in self.bert.parameters():
            param.requires_grad = True  # fine-tune all BERT layers
        # one Conv2d per filter size, each filter spanning the full hidden dimension
        self.convs = nn.ModuleList(
            [nn.Conv2d(1, self.num_filters, (k, config.hidden_size))
             for k in self.filter_sizes])
        self.dropout = nn.Dropout(config.hidden_dropout_prob)
        self.fc_cnn = nn.Linear(self.num_filters * len(self.filter_sizes),
                                config.num_labels)

    def conv_and_pool(self, x, conv):
        x = F.relu(conv(x)).squeeze(3)             # [batch, n_filters, seq_len - k + 1]
        x = F.max_pool1d(x, x.size(2)).squeeze(2)  # max over time -> [batch, n_filters]
        return x

    def forward(self, input_ids,
                attention_mask=None, token_type_ids=None, head_mask=None):
        outputs = self.bert(input_ids,
                            attention_mask=attention_mask,
                            token_type_ids=token_type_ids,
                            head_mask=head_mask)
        encoder_out, text_cls = outputs  # sequence output and pooled [CLS] output
        out = encoder_out.unsqueeze(1)   # add channel dim for Conv2d
        out = torch.cat([self.conv_and_pool(out, conv)
                         for conv in self.convs], 1)
        out = self.dropout(out)
        out = self.fc_cnn(out)
        return out
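As a quick stand-alone shape check, the model can be run on random ids. This sketch assumes an older transformers 2.x release (matching the modeling_bert import above) and a randomly initialized config rather than the pretrained checkpoint the real run loads:

from transformers import BertConfig

config = BertConfig(num_labels=5)   # default vocab/hidden sizes; real run uses from_pretrained
model = BertCNN(config)
input_ids = torch.randint(0, config.vocab_size, (2, 80))
logits = model(input_ids)
print(logits.size())                # expected: torch.Size([2, 5])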
My training code:
# requires: from torch.nn.utils import clip_grad_norm_ and (for fp16) from apex import amp
for step, batch in enumerate(data):
    self.model.train()
    batch = tuple(t.to(self.device) for t in batch)
    input_ids, input_mask, segment_ids, label_ids = batch
    print("input_ids, input_mask, segment_ids, label_ids SIZE: \n")
    print(input_ids.size(), input_mask.size(), segment_ids.size(), label_ids.size())
    # torch.Size([2, 80]) torch.Size([2, 80]) torch.Size([2, 80]) torch.Size([2])
    # pass masks by keyword: positionally, segment_ids and input_mask would land
    # in the wrong slots of forward(input_ids, attention_mask, token_type_ids)
    logits = self.model(input_ids,
                        attention_mask=input_mask,
                        token_type_ids=segment_ids)
    print("logits and label ids size: ", logits.size(), label_ids.size())
    # torch.Size([2, 5]) torch.Size([2])
    loss = self.criterion(output=logits, target=label_ids)
    if len(self.n_gpu) >= 2:
        loss = loss.mean()  # average the per-GPU losses under DataParallel
    if self.gradient_accumulation_steps > 1:
        loss = loss / self.gradient_accumulation_steps
    if self.fp16:
        with amp.scale_loss(loss, self.optimizer) as scaled_loss:
            scaled_loss.backward()
        clip_grad_norm_(amp.master_params(self.optimizer), self.grad_clip)
    else:
        loss.backward()  # debugging shows the program stops at this line without any error prompt
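Since the hang occurs at backward, it can help to isolate the loss step from BERT and the GPU entirely. The snippet below is a CPU-only sketch that assumes self.criterion wraps nn.CrossEntropyLoss (consistent with the NllLossBackward in the step-1 printout below); if this returns immediately, the loss computation itself is healthy:

import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
logits = torch.randn(2, 5, requires_grad=True)  # same shape as the printed logits
label_ids = torch.randint(0, 5, (2,))           # same shape as the printed label_ids
loss = criterion(logits, label_ids)
loss.backward()                                 # on a healthy install this returns immediately
print(loss.item(), logits.grad.shape)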
Changing the batch size to 1, the bug still occurs.
step1 logits:
tensor([[ 0.8831, -0.0368, -0.2206, -2.3484, -1.3595]], device='cuda:1', grad_fn=<AddmmBackward>)
step1 loss:
tensor(1.5489, device='cuda:1', grad_fn=<NllLossBackward>)
But why does loss.backward() stall?
Accepted answer
I tried to run my program on a Linux platform, and it ran successfully. Therefore, the problem is very likely caused by the difference in operating systems.
Previous OS: Windows 10
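When comparing the two setups, printing the exact platform and CUDA stack on each machine makes the difference concrete; a minimal sketch:

import platform
import torch

print(platform.platform())                    # e.g. Windows-10-... vs Linux-...
print(torch.__version__, torch.version.cuda)  # PyTorch build and its CUDA toolkit
print(torch.backends.cudnn.version())         # cuDNN build
print(torch.cuda.is_available(), torch.cuda.device_count())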