Question
I am doing image semantic segmentation with a U-Net. If I set a softmax Activation for the last layer like this:
...
conv9 = Conv2D(n_classes, (3,3), padding = 'same')(conv9)
conv10 = (Activation('softmax'))(conv9)
model = Model(inputs, conv10)
return model
...
and then use loss = tf.keras.losses.CategoricalCrossentropy(from_logits=False), the training does not converge, even for only one training image.
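For reference, the model in that case is compiled roughly like this (a sketch; unet, the optimizer and the metric are placeholder names, not the exact code from my repo):

import tensorflow as tf

# The model already ends in Activation('softmax'), so the loss is fed
# probabilities and must be constructed with from_logits=False.
model = unet(n_classes)   # hypothetical builder returning the softmax Model above
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
    loss=tf.keras.losses.CategoricalCrossentropy(from_logits=False),
    metrics=['accuracy'],
)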
But if I do not set the softmax Activation for the last layer, like this:
...
conv9 = Conv2D(n_classes, (3,3), padding = 'same')(conv9)
model = Model(inputs, conv9)
return model
...
and then use loss = tf.keras.losses.CategoricalCrossentropy(from_logits=True), the training converges for one training image.
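In this variant the network outputs raw logits, so the softmax has to be applied manually at prediction time (again a sketch with placeholder names such as unet and x_batch):

import tensorflow as tf

model = unet(n_classes)   # hypothetical builder returning the logits Model above
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
    loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True),  # softmax is fused into the loss
    metrics=['accuracy'],
)

# The model now predicts logits, not probabilities:
probs = tf.nn.softmax(model.predict(x_batch), axis=-1)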
My ground-truth dataset is generated like this:
import cv2
import numpy as np

X = []
Y = []
im = cv2.imread(impath)                                   # input image, shape (height, width, 3)
X.append(im)
seg_labels = np.zeros((height, width, n_classes))
for c, spath in enumerate(segpaths):                      # one mask file per class channel c
    mask = cv2.imread(spath, 0)                           # read mask as a single-channel image
    seg_labels[:, :, c] += mask
Y.append(seg_labels.reshape(width * height, n_classes))   # flatten to (pixels, classes)
Why? Is there something wrong with my usage?
This is my experiment code on git: https://github.com/honeytidy/unet. You can check it out and run it (it can run on CPU). You can change the Activation layer and the from_logits of CategoricalCrossentropy and see what I said.
Answer
Pushing the "softmax" activation into the cross-entropy loss layer significantly simplifies the loss computation and makes it more numerically stable. It might be the case that in your example the numerical issues are significant enough to render the training process ineffective for the from_logits=False option.
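A small sketch of this effect (assuming current tf.keras behavior, where probability inputs to the loss are clipped at the backend epsilon of roughly 1e-7):

import numpy as np
import tensorflow as tf

# One pixel, three classes; the true class is 0, but the network is very
# confident (logit 40) about class 2.
y_true = np.array([[1.0, 0.0, 0.0]], dtype=np.float32)
logits = np.array([[0.0, 0.0, 40.0]], dtype=np.float32)

# Route 1: softmax inside the model, probabilities into the loss.
probs = tf.nn.softmax(logits)   # true-class probability ~ 4e-18
loss_probs = tf.keras.losses.CategoricalCrossentropy(from_logits=False)(y_true, probs)

# Route 2: raw logits into the loss, softmax fused into the loss op.
loss_logits = tf.keras.losses.CategoricalCrossentropy(from_logits=True)(y_true, logits)

print(float(loss_probs))    # ~16.1 -- the tiny probability is clipped at ~1e-7
print(float(loss_logits))   # ~40.0 -- the exact cross-entropy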
You can find a derivation of the cross-entropy loss (a special case of "info gain" loss) in this post. This derivation illustrates the numerical issues that are averted when combining softmax with the cross-entropy loss.
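The core of that derivation can be written as a few lines of plain numpy (a sketch of the same idea, not taken from the linked post): for logits z and true class y the fused loss is logsumexp(z) - z[y], and logsumexp is stabilized by subtracting max(z), so no explicit softmax and no log of a tiny clipped probability is ever formed:

import numpy as np

def cross_entropy_from_logits(logits, true_class):
    # -log softmax(logits)[true_class], computed without forming the softmax
    z = np.asarray(logits, dtype=np.float64)
    m = z.max()                                     # shift so exp never overflows
    logsumexp = m + np.log(np.sum(np.exp(z - m)))
    return logsumexp - z[true_class]

print(cross_entropy_from_logits([0.0, 0.0, 40.0], 0))   # ~40.0, matches from_logits=True above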