Problem description

I am using a DenseNet model for one of my projects and have some difficulties using regularization.

Without any regularization, both validation and training loss (MSE) decrease. The training loss drops faster though, resulting in some overfitting of the final model.

So I decided to use dropout to avoid overfitting. When using Dropout, both validation and training loss decrease to about 0.13 during the first epoch and remain constant for about 10 epochs.

After that, both losses decrease in the same way as without dropout, resulting in overfitting again. The final loss values are in about the same range as without dropout.

So for me it seems like dropout is not really working.

If I switch to L2 regularization, though, I am able to avoid overfitting, but I would rather use dropout as a regularizer.
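
For reference, L2 regularization adds a penalty proportional to the sum of squared weights to the data loss. The sketch below is a minimal illustration in plain Python, assuming MSE as the data loss (as in the question) and a flat list of weights; the names `mse`, `l2_regularized_loss` and `l2_lambda` are hypothetical and not taken from the asker's code.

```python
def mse(predictions, targets):
    """Mean squared error between two equal-length sequences."""
    if len(predictions) != len(targets) or not targets:
        raise ValueError("predictions and targets must be non-empty and equal length")
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


def l2_regularized_loss(predictions, targets, weights, l2_lambda):
    """MSE plus an L2 penalty of l2_lambda * sum(w ** 2).

    A larger l2_lambda pushes the weights toward zero, which limits how
    closely the model can fit the training data.
    """
    penalty = l2_lambda * sum(w ** 2 for w in weights)
    return mse(predictions, targets) + penalty


# Usage with made-up numbers: MSE = 0.125, penalty = 0.01 * 1.25 = 0.0125.
print(l2_regularized_loss([1.0, 2.0], [1.5, 2.0], [0.5, -1.0], l2_lambda=0.01))
```

Scaling conventions differ between frameworks: TensorFlow's `tf.nn.l2_loss`, for example, computes half the sum of squares, so the same numeric `l2_lambda` does not always mean the same penalty strength.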

Now I am wondering: has anyone else experienced this kind of behaviour?

I use dropout in both the dense block (bottleneck layer) and in the transition block (dropout rate = 0.5):

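import tensorflow as tf

# Batch_Normalization, Relu, conv_layer, Drop_out and Average_pooling are
# helper wrappers defined elsewhere in the project (not TensorFlow APIs).
dropout_rate = 0.5  # used in both the dense and the transition blocks

# Bottleneck: BN-ReLU-Conv(1x1) then BN-ReLU-Conv(3x3), each followed by dropout.
def bottleneck_layer(self, x, scope):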
    with tf.name_scope(scope):
        x = Batch_Normalization(x, training=self.training, scope=scope+'_batch1')
        x = Relu(x)
        x = conv_layer(x, filter=4 * self.filters, kernel=[1,1], layer_name=scope+'_conv1')
        x = Drop_out(x, rate=dropout_rate, training=self.training)

        x = Batch_Normalization(x, training=self.training, scope=scope+'_batch2')
        x = Relu(x)
        x = conv_layer(x, filter=self.filters, kernel=[3,3], layer_name=scope+'_conv2')
        x = Drop_out(x, rate=dropout_rate, training=self.training)

        return x

# Transition: BN-ReLU-Conv(1x1) and dropout, then 2x2 average pooling with stride 2.
def transition_layer(self, x, scope):
    with tf.name_scope(scope):
        x = Batch_Normalization(x, training=self.training, scope=scope+'_batch1')
        x = Relu(x)
        x = conv_layer(x, filter=self.filters, kernel=[1,1], layer_name=scope+'_conv1')
        x = Drop_out(x, rate=dropout_rate, training=self.training)
        x = Average_pooling(x, pool_size=[2,2], stride=2)

        return x
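
To make the `rate` and `training` arguments concrete: assuming `Drop_out` wraps a standard inverted-dropout op (as TensorFlow's dropout layers are), `rate` is the probability of zeroing each unit during training, surviving units are scaled by 1 / (1 - rate), and the input passes through unchanged at inference. The plain-Python sketch below illustrates that behaviour; `inverted_dropout` is a hypothetical name, not the project's helper.

```python
import random


def inverted_dropout(values, rate, training, rng=None):
    """Inverted dropout on a flat list of activations.

    During training, each element is zeroed with probability `rate` and the
    survivors are scaled by 1 / (1 - rate) so the expected value is unchanged.
    At inference (training=False) the input is returned unchanged.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(values)
    rng = rng or random.Random()
    keep_prob = 1.0 - rate
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]


# Usage: with rate=0.5, surviving activations are doubled during training.
rng = random.Random(0)
activations = [1.0] * 10
print(inverted_dropout(activations, rate=0.5, training=True, rng=rng))
print(inverted_dropout(activations, rate=0.5, training=False))
```

With `rate = 0.5`, every training step discards roughly half of the units after each convolution it is applied to, while the rescaling keeps the expected activation the same as at inference.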

Recommended answer

This is not overfitting.

Overfitting starts when your validation loss starts increasing, while your training loss continues decreasing; here is its telltale signature:

The image is adapted from the Wikipedia entry on overfitting; different things may lie on the horizontal axis, e.g. the depth or number of boosted trees, the number of neural-net fitting iterations, etc.
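
The criterion above can be checked mechanically on per-epoch loss histories. The sketch below is a minimal, assumption-laden illustration: `overfitting_onset` is a hypothetical helper, and the loss values are made up for demonstration, not taken from the question.

```python
def overfitting_onset(train_losses, val_losses):
    """Return the first epoch index at which validation loss rises while
    training loss is still falling, or None if that never happens."""
    if len(train_losses) != len(val_losses):
        raise ValueError("loss histories must have the same length")
    for epoch in range(1, len(val_losses)):
        val_up = val_losses[epoch] > val_losses[epoch - 1]
        train_down = train_losses[epoch] < train_losses[epoch - 1]
        if val_up and train_down:
            return epoch
    return None


# Hypothetical loss histories, made up for illustration only.
train = [1.0, 0.8, 0.6, 0.5, 0.4, 0.3]
val_overfit = [1.1, 0.9, 0.75, 0.7, 0.72, 0.8]
val_both_falling = [1.1, 0.95, 0.85, 0.8, 0.78, 0.77]
print(overfitting_onset(train, val_overfit))       # → 4
print(overfitting_onset(train, val_both_falling))  # → None
```

Real loss curves are noisy, so a single uptick can trigger this check; early-stopping utilities typically wait for several epochs without improvement (a "patience" window) before acting.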

The (generally expected) difference between training and validation loss is something completely different, called the generalization gap:

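"An important concept for understanding generalization is the generalization gap, i.e., the difference between a model's performance on training data and its performance on unseen data drawn from the same distribution."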

where, practically speaking, validation data is unseen data indeed.
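
In terms of loss histories, the generalization gap at each epoch is simply validation loss minus training loss. The sketch below assumes per-epoch loss lists; `generalization_gap` is a hypothetical helper and the numbers are invented for illustration.

```python
def generalization_gap(train_losses, val_losses):
    """Per-epoch generalization gap: validation (unseen-data) loss minus
    training loss. A positive value means the model does better on the
    data it was trained on."""
    if len(train_losses) != len(val_losses):
        raise ValueError("loss histories must have the same length")
    return [v - t for t, v in zip(train_losses, val_losses)]


# Hypothetical histories: both losses keep falling, but the gap widens.
train = [0.5, 0.3, 0.2]
val = [0.55, 0.4, 0.35]
print([round(g, 2) for g in generalization_gap(train, val)])
```

A widening gap while both curves are still decreasing is a generalization gap, not overfitting in the sense defined above, because the validation loss never starts to rise.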

"So for me it seems like dropout is not really working."

It can very well be the case: dropout is not expected to work always and for every problem.
