Question
I'm building a ResNet-18 classification model for the Stanford Cars dataset using transfer learning. I would like to implement label smoothing to penalize overconfident predictions and improve generalization.
TensorFlow has a simple keyword argument in CrossEntropyLoss. Has anyone built a similar function for PyTorch that I could plug-and-play with?
Answer
The generalization and learning speed of a multi-class neural network can often be significantly improved by using soft targets that are a weighted average of the hard targets and the uniform distribution over labels. Smoothing the labels in this way prevents the network from becoming over-confident, and label smoothing has been used in many state-of-the-art models, including in image classification, language translation, and speech recognition.
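As a quick sketch of that weighted average (the helper name and the eps = 0.1 value here are illustrative, not from the implementations below): with smoothing factor eps and K classes, each hard target becomes (1 - eps) * one_hot + eps / K.

```python
import torch
import torch.nn.functional as F

def smooth_targets(targets, n_classes, eps=0.1):
    # Weighted average of the hard one-hot targets and the uniform distribution.
    one_hot = F.one_hot(targets, n_classes).float()
    uniform = torch.full_like(one_hot, 1.0 / n_classes)
    return (1.0 - eps) * one_hot + eps * uniform

targets = torch.tensor([2, 0])
soft = smooth_targets(targets, n_classes=5, eps=0.1)
print(soft)  # true class gets 0.92, every other class 0.02; each row still sums to 1
```

Note that some of the implementations below use a slightly different convention, spreading eps over only the K - 1 non-target classes, i.e. eps / (K - 1) per wrong class; both variants appear in practice.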
Label smoothing is already implemented in TensorFlow within the cross-entropy loss functions BinaryCrossentropy and CategoricalCrossentropy. But currently, there is no official implementation of label smoothing in PyTorch. However, there is an active discussion on it, and hopefully it will be provided as an official package. Here is that discussion thread: Issue #7455.
Here we will bring some of the best available implementations of label smoothing (LS) from PyTorch practitioners. Basically, there are many ways to implement LS; please refer to the specific discussions on this, one here and another here. We will present implementations in 2 unique ways, with two versions of each, for a total of 4.
Option 1: Manually Smoothing the Target Vector
In this way, it accepts the one-hot target vector. The user must manually smooth their target vector. This can be done within a with torch.no_grad() scope, as it temporarily sets all of the requires_grad flags to False.
import torch
import numpy as np
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
from torch.nn.modules.loss import _WeightedLoss
class LabelSmoothingLoss(nn.Module):
    def __init__(self, classes, smoothing=0.0, dim=-1, weight=None):
        """if smoothing == 0, it's one-hot method
           if 0 < smoothing < 1, it's smooth method
        """
        super(LabelSmoothingLoss, self).__init__()
        self.confidence = 1.0 - smoothing
        self.smoothing = smoothing
        self.weight = weight
        self.cls = classes
        self.dim = dim

    def forward(self, pred, target):
        assert 0 <= self.smoothing < 1
        pred = pred.log_softmax(dim=self.dim)
        if self.weight is not None:
            pred = pred * self.weight.unsqueeze(0)

        with torch.no_grad():
            true_dist = torch.zeros_like(pred)
            true_dist.fill_(self.smoothing / (self.cls - 1))
            true_dist.scatter_(1, target.data.unsqueeze(1), self.confidence)
        return torch.mean(torch.sum(-true_dist * pred, dim=self.dim))
Additionally, we've added an assertion check on self.smoothing and added loss weighting support to this implementation.
Shital already posted the answer here. We're pointing out that this implementation is similar to Devin Yang's implementation above; however, here we present his code with the syntax slightly minimized.
class SmoothCrossEntropyLoss(_WeightedLoss):
    def __init__(self, weight=None, reduction='mean', smoothing=0.0):
        super().__init__(weight=weight, reduction=reduction)
        self.smoothing = smoothing
        self.weight = weight
        self.reduction = reduction

    def k_one_hot(self, targets: torch.Tensor, n_classes: int, smoothing=0.0):
        with torch.no_grad():
            targets = torch.empty(size=(targets.size(0), n_classes),
                                  device=targets.device) \
                .fill_(smoothing / (n_classes - 1)) \
                .scatter_(1, targets.data.unsqueeze(1), 1. - smoothing)
        return targets

    def reduce_loss(self, loss):
        return loss.mean() if self.reduction == 'mean' else loss.sum() \
            if self.reduction == 'sum' else loss

    def forward(self, inputs, targets):
        assert 0 <= self.smoothing < 1
        targets = self.k_one_hot(targets, inputs.size(-1), self.smoothing)
        log_preds = F.log_softmax(inputs, -1)

        if self.weight is not None:
            log_preds = log_preds * self.weight.unsqueeze(0)

        return self.reduce_loss(-(targets * log_preds).sum(dim=-1))
Check
import torch
import numpy as np
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
from torch.nn.modules.loss import _WeightedLoss
if __name__ == "__main__":
    # 1. Devin Yang
    crit = LabelSmoothingLoss(classes=5, smoothing=0.5)
    predict = torch.FloatTensor([[0, 0.2, 0.7, 0.1, 0],
                                 [0, 0.9, 0.2, 0.2, 1],
                                 [1, 0.2, 0.7, 0.9, 1]])
    v = crit(Variable(predict),
             Variable(torch.LongTensor([2, 1, 0])))
    print(v)

    # 2. Shital Shah
    crit = SmoothCrossEntropyLoss(smoothing=0.5)
    predict = torch.FloatTensor([[0, 0.2, 0.7, 0.1, 0],
                                 [0, 0.9, 0.2, 0.2, 1],
                                 [1, 0.2, 0.7, 0.9, 1]])
    v = crit(Variable(predict),
             Variable(torch.LongTensor([2, 1, 0])))
    print(v)
tensor(1.4178)
tensor(1.4178)
Option 2: LabelSmoothingCrossEntropyLoss
In this way, it accepts the target vector, but the user does not need to manually smooth it; instead, the built-in module takes care of the label smoothing. It allows us to implement label smoothing in terms of F.nll_loss.
(a). Wangleiofficial: Source - (AFAIK), Original Poster
(b). Datasaurus: Source - Added Weighting Support
Further, we slightly minimize the coding write-up to make it more concise.
class LabelSmoothingLoss(torch.nn.Module):
    def __init__(self, smoothing: float = 0.1,
                 reduction="mean", weight=None):
        super(LabelSmoothingLoss, self).__init__()
        self.smoothing = smoothing
        self.reduction = reduction
        self.weight = weight

    def reduce_loss(self, loss):
        return loss.mean() if self.reduction == 'mean' else loss.sum() \
            if self.reduction == 'sum' else loss

    def linear_combination(self, x, y):
        return self.smoothing * x + (1 - self.smoothing) * y

    def forward(self, preds, target):
        assert 0 <= self.smoothing < 1

        if self.weight is not None:
            self.weight = self.weight.to(preds.device)

        n = preds.size(-1)
        log_preds = F.log_softmax(preds, dim=-1)
        loss = self.reduce_loss(-log_preds.sum(dim=-1))
        nll = F.nll_loss(
            log_preds, target, reduction=self.reduction, weight=self.weight
        )
        return self.linear_combination(loss / n, nll)
The second version below is the one the check section attributes to NVIDIA:
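Spelled out, the forward pass above reduces to a weighted average of the ordinary NLL loss and the cross-entropy against a uniform distribution over the classes. The variable names in this sketch are mine; it reuses the inputs from the check section below:

```python
import torch
import torch.nn.functional as F

eps = 0.3
preds = torch.FloatTensor([[0, 0.2, 0.7, 0.1, 0],
                           [0, 0.9, 0.2, 0.2, 1],
                           [1, 0.2, 0.7, 0.9, 1]])
target = torch.LongTensor([2, 1, 0])

log_preds = F.log_softmax(preds, dim=-1)
nll = F.nll_loss(log_preds, target)            # cross-entropy vs. the hard targets
uniform_ce = (-log_preds.mean(dim=-1)).mean()  # cross-entropy vs. the uniform distribution
loss = eps * uniform_ce + (1 - eps) * nll      # linear_combination(loss / n, nll) above
print(loss)  # tensor(1.3883), matching the check below
```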
class LabelSmoothing(nn.Module):
"""NLL loss with label smoothing.
"""
def __init__(self, smoothing=0.0):
"""Constructor for the LabelSmoothing module.
:param smoothing: label smoothing factor
"""
super(LabelSmoothing, self).__init__()
self.confidence = 1.0 - smoothing
self.smoothing = smoothing
def forward(self, x, target):
logprobs = torch.nn.functional.log_softmax(x, dim=-1)
nll_loss = -logprobs.gather(dim=-1, index=target.unsqueeze(1))
nll_loss = nll_loss.squeeze(1)
smooth_loss = -logprobs.mean(dim=-1)
loss = self.confidence * nll_loss + self.smoothing * smooth_loss
return loss.mean()
Check
if __name__ == "__main__":
    # Wangleiofficial
    crit = LabelSmoothingLoss(smoothing=0.3, reduction="mean")
    predict = torch.FloatTensor([[0, 0.2, 0.7, 0.1, 0],
                                 [0, 0.9, 0.2, 0.2, 1],
                                 [1, 0.2, 0.7, 0.9, 1]])
    v = crit(Variable(predict),
             Variable(torch.LongTensor([2, 1, 0])))
    print(v)

    # NVIDIA
    crit = LabelSmoothing(smoothing=0.3)
    predict = torch.FloatTensor([[0, 0.2, 0.7, 0.1, 0],
                                 [0, 0.9, 0.2, 0.2, 1],
                                 [1, 0.2, 0.7, 0.9, 1]])
    v = crit(Variable(predict),
             Variable(torch.LongTensor([2, 1, 0])))
    print(v)
tensor(1.3883)
tensor(1.3883)
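One final note, as an update rather than part of the original answer: since PyTorch 1.10, nn.CrossEntropyLoss itself accepts a label_smoothing argument implementing exactly this (1 - eps) * one-hot + eps / K mixture, so on recent versions no custom module is needed:

```python
import torch
import torch.nn as nn

# Built-in label smoothing (requires PyTorch >= 1.10).
crit = nn.CrossEntropyLoss(label_smoothing=0.3)
predict = torch.FloatTensor([[0, 0.2, 0.7, 0.1, 0],
                             [0, 0.9, 0.2, 0.2, 1],
                             [1, 0.2, 0.7, 0.9, 1]])
v = crit(predict, torch.LongTensor([2, 1, 0]))
print(v)  # tensor(1.3883), same result as the two implementations above
```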