One-hot encoding of multi-label images in Keras

Problem description

I am using the PASCAL VOC 2012 dataset for image classification. Some images have multiple labels, whereas others have a single label, as shown below.

    0  2007_000027.jpg               {'person'}
    1  2007_000032.jpg  {'aeroplane', 'person'}
    2  2007_000033.jpg            {'aeroplane'}
    3  2007_000039.jpg            {'tvmonitor'}
    4  2007_000042.jpg                {'train'}

I want to one-hot encode these labels to train the model. However, I couldn't use keras.utils.to_categorical, because these labels are not integers, and pandas.get_dummies does not give the result I expected. get_dummies produces the categories below, i.e. it treats each unique combination of labels as one category.

 {'aeroplane', 'bus', 'car'}  {'aeroplane', 'bus'}  {'tvmonitor', 'sofa'}  {'tvmonitor'} ...

What is the best way to one-hot encode these labels, given that the number of labels per image is not fixed?

Recommended answer

If the second column contains sets, you can use MultiLabelBinarizer:

print (df)
                 a                        b
0  2007_000027.jpg               {'person'}
1  2007_000032.jpg  {'aeroplane', 'person'}
2  2007_000033.jpg            {'aeroplane'}
3  2007_000039.jpg            {'tvmonitor'}
4  2007_000042.jpg                {'train'}

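import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Each set in column 'b' becomes one row of 0/1 indicators, one column per class
mlb = MultiLabelBinarizer()
df = pd.DataFrame(mlb.fit_transform(df['b']), columns=mlb.classes_)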
print (df)
   aeroplane  person  train  tvmonitor
0          0       1      0          0
1          1       1      0          0
2          1       0      0          0
3          0       0      0          1
4          0       0      1          0
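
The snippet above overwrites df, so the file names in column a are lost. A minimal self-contained sketch that keeps them might look like the following. The sample frame is rebuilt from the rows shown in the question, and the names y, encoded, result and decoded are illustrative only, not part of the original answer.

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Sample data copied from the question; column names 'a' and 'b' follow the answer.
df = pd.DataFrame({
    'a': ['2007_000027.jpg', '2007_000032.jpg', '2007_000033.jpg',
          '2007_000039.jpg', '2007_000042.jpg'],
    'b': [{'person'}, {'aeroplane', 'person'}, {'aeroplane'},
          {'tvmonitor'}, {'train'}],
})

mlb = MultiLabelBinarizer()
# y is a NumPy array of shape (n_images, n_classes) holding 0/1 indicators.
y = mlb.fit_transform(df['b'])

# Wrap the indicators in a DataFrame aligned with the original index,
# then join them back onto the file-name column (hypothetical layout).
encoded = pd.DataFrame(y, columns=mlb.classes_, index=df.index)
result = df[['a']].join(encoded)
print(result)

# inverse_transform maps each indicator row back to a tuple of labels.
decoded = mlb.inverse_transform(y)
print(decoded[1])
```

Keeping the fitted mlb object around is useful because mlb.classes_ records which column corresponds to which label, and inverse_transform can turn indicator rows back into label names.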

Alternatively, use Series.str.join with Series.str.get_dummies, although this should be slower on a large DataFrame:

df = df['b'].str.join('|').str.get_dummies()
print (df)

   aeroplane  person  train  tvmonitor
0          0       1      0          0
1          1       1      0          0
2          1       0      0          0
3          0       0      0          1
4          0       0      1          0
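
As a rough check that the two approaches agree, the sketch below runs both on the sample rows from the question and compares the results. The variable names are illustrative. It assumes that no label contains the '|' character, since get_dummies would split such a label into two.

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Label sets copied from the question's sample rows.
labels = pd.Series([{'person'}, {'aeroplane', 'person'}, {'aeroplane'},
                    {'tvmonitor'}, {'train'}])

# Approach 1: MultiLabelBinarizer returns a NumPy indicator array.
mlb = MultiLabelBinarizer()
y = mlb.fit_transform(labels)

# Approach 2: join each set into one '|'-separated string, then let
# Series.str.get_dummies split on '|' (its default separator).
dummies = labels.str.join('|').str.get_dummies()

# Both produce the same columns (sorted class names) and the same 0/1 values.
same_columns = list(dummies.columns) == list(mlb.classes_)
same_values = dummies.to_numpy().tolist() == y.tolist()
print(same_columns, same_values)
```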
