CategoricalCrossentropy

This article looks at why from_logits=True and from_logits=False lead to different training results when using tf.losses.CategoricalCrossentropy with a UNet; it may be a useful reference for anyone running into the same problem.

Problem description

I am doing an image semantic segmentation job with UNet. If I set a softmax activation for the last layer like this:

...
conv9 = Conv2D(n_classes, (3, 3), padding='same')(conv9)   # per-pixel class scores
conv10 = (Activation('softmax'))(conv9)                     # per-pixel class probabilities
model = Model(inputs, conv10)
return model
...

and then use loss = tf.keras.losses.CategoricalCrossentropy(from_logits=False), the training will not converge, even with only one training image.

But if I do not set a softmax activation for the last layer, like this:

...
conv9 = Conv2D(n_classes, (3, 3), padding='same')(conv9)   # raw logits, no final activation
model = Model(inputs, conv9)
return model
...

and then use loss = tf.keras.losses.CategoricalCrossentropy(from_logits=True), the training will converge, even with only one training image.
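(For clarity, here is a minimal sketch of how the two setups would typically be compiled; the optimizer choice and the build_unet helper name are illustrative assumptions, not taken from the code above:)

import tensorflow as tf

# Setup A: the model ends with Activation('softmax'), so it outputs probabilities
# and the loss is told it receives probabilities.
model_probs = build_unet(with_softmax=True)    # hypothetical builder for the first model above
model_probs.compile(optimizer='adam',
                    loss=tf.keras.losses.CategoricalCrossentropy(from_logits=False))

# Setup B: no final activation, so the model outputs raw logits
# and the loss applies the softmax internally.
model_logits = build_unet(with_softmax=False)  # hypothetical builder for the second model above
model_logits.compile(optimizer='adam',
                     loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True))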

My ground-truth dataset is generated like this:

import cv2
import numpy as np

X = []
Y = []
im = cv2.imread(impath)
X.append(im)
seg_labels = np.zeros((height, width, n_classes))
for c, spath in enumerate(segpaths):        # c = class index for this mask
    mask = cv2.imread(spath, 0)             # read the class mask as grayscale
    seg_labels[:, :, c] += mask
Y.append(seg_labels.reshape(width*height, n_classes))

Why? Is there something wrong with my usage?

This is my experiment code on git: https://github.com/honeytidy/unet. You can check it out and run it (it can run on CPU). You can change the activation layer and the from_logits argument of CategoricalCrossentropy and see what I said.

Recommended answer

将"softmax"激活推入交叉熵损失层可大大简化损失计算并使其在数值上更稳定.
在您的示例中,可能存在这样的情况:数字问题足够严重,以致于训练过程对于from_logits=False选项无效.

Pushing the "softmax" activation into the cross-entropy loss layer significantly simplifies the loss computation and makes it more numerically stable.
It might be the case that in your example the numerical issues are significant enough to render the training process ineffective for the from_logits=False option.
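As a rough illustration (not part of the original answer), the snippet below compares the two configurations on made-up logits with extreme magnitudes; the exact printed numbers may vary slightly across TensorFlow versions:

import tensorflow as tf

# Made-up logits where the true class has a very negative score.
logits = tf.constant([[-1000.0, 1000.0, 0.0]])
labels = tf.constant([[1.0, 0.0, 0.0]])        # one-hot target: class 0

# from_logits=True: softmax and log are fused (log-sum-exp), so the loss keeps
# its true, very large value and a useful gradient.
loss_logits = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
print(loss_logits(labels, logits).numpy())     # ~2000.0

# from_logits=False: softmax is applied first, where the true-class probability
# underflows to 0; Keras clips it to ~1e-7, so the loss saturates near
# -log(1e-7) ~ 16.1 and no longer reflects how wrong the prediction is.
probs = tf.nn.softmax(logits)
loss_probs = tf.keras.losses.CategoricalCrossentropy(from_logits=False)
print(loss_probs(labels, probs).numpy())       # ~16.1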

You can find a derivation of the cross-entropy loss (a special case of the "info gain" loss) in this post. The derivation illustrates the numerical issues that are averted when combining softmax with the cross-entropy loss.
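In outline (a standard identity rather than the full derivation from that post): with logits $x$ and a one-hot target $y$, folding the softmax into the cross-entropy gives

$$
L = -\sum_i y_i \log\frac{e^{x_i}}{\sum_j e^{x_j}} = \log\sum_j e^{x_j} - \sum_i y_i x_i ,
$$

which can be evaluated with a numerically stable log-sum-exp, instead of first exponentiating and then taking the log of probabilities that may underflow to zero.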

That concludes this article on the different training results obtained from from_logits=True and from_logits=False with tf.losses.CategoricalCrossentropy and UNet; hopefully the answer above is helpful.