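python - How to assign labels to the samples generated by ImageDataGenerator

I am new to convolutional neural networks and I am about to build my first ConvNet, a multi-class image classification ConvNet.

Model description

Suppose I have two image folders: one contains thousands of images of a particular type of leaf (Leaf A) (image set X), and the other contains the same number of images of a similar type of leaf (Leaf B) (image set Y). I need to train my model to distinguish between these two types.

Problem background

Since I have two output classes, Leaf A and Leaf B, the output for a given image should be 1,0 if it is a Leaf A image and 0,1 if it is a Leaf B image.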

                            Leaves A | Leaves B
If Input is a Class A Leaf,     1         0
If Input is a Class B Leaf,     0         1
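
The table above is a one-hot encoding of the two classes. A minimal sketch in plain Python, assuming Leaf A maps to index 0 and Leaf B to index 1 (the helper `one_hot` and the `CLASS_INDEX` dictionary are hypothetical names used for illustration, not part of Keras):

```python
def one_hot(index, num_classes):
    """Return a list of length num_classes with a 1 at position index."""
    if not 0 <= index < num_classes:
        raise ValueError("index out of range")
    return [1 if i == index else 0 for i in range(num_classes)]

# Assumed class order: Leaf A -> index 0, Leaf B -> index 1.
CLASS_INDEX = {"Leaf A": 0, "Leaf B": 1}

leaf_a_label = one_hot(CLASS_INDEX["Leaf A"], len(CLASS_INDEX))
leaf_b_label = one_hot(CLASS_INDEX["Leaf B"], len(CLASS_INDEX))
print(leaf_a_label, leaf_b_label)
```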

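Question

So, to do this, I have to label image set X with the output 1,0 and image set Y with the output 0,1. Also, since I need to augment the images to get more training samples, I am using ImageDataGenerator.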

training_imGen.flow_from_directory(
                                'path/to/image_folder_X',
                                target_size=(1100,180),
                                batch_size=batchSize,
                                color_mode='rgb',
                                class_mode='categorical'
                                )

But here I cannot assign labels, unlike when I use training_imGen.flow. However, I found that flow_from_directory accepts a classes argument:

classes: optional list of class subdirectories (e.g. ['dogs', 'cats']). Default: None. If not provided, the list of classes will be automatically inferred from the subdirectory names/structure under directory, where each subdirectory will be treated as a different class (and the order of the classes, which will map to the label indices, will be alphanumeric).

But I don't know how to specify two class labels there, since I am only giving the path to the image set X folder. Any idea how to do this?

Update

training_imGen.flow_from_directory(
                                '/Users/user/database/',
                                target_size=(1100,180),
                                batch_size=batchSize,
                                color_mode='rgb',
                                class_mode='categorical',
                                classes=['Set_A', 'Set_B']
                                )

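Under the path /Users/user/database/ there are two folders named Set_A and Set_B. As mentioned earlier, each folder contains the relevant png image files.

Best answer

Take a look at how DirectoryIterator is implemented. It is a very simple class.

ImageDataGenerator#flow_from_directory is just a wrapper around the construction of a DirectoryIterator object. You don't have to specify the labels manually, because DirectoryIterator automatically assumes that each sample belongs to the class named after that sample's parent folder.
Therefore, as long as all Leaf A samples are in the same folder named A and the Leaf B samples are in a different folder, they will be assigned to their respective classes correctly.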
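
To illustrate the idea of labels being inferred from parent folders, here is a rough standard-library sketch. It is not the Keras implementation; `infer_labels` and the file names are hypothetical, and the only behavior borrowed from the documentation quoted in the question is that class folders are ordered alphanumerically.

```python
import os
import tempfile

def infer_labels(directory):
    """Map each file to the index of its parent folder (folders sorted by name)."""
    class_names = sorted(
        d for d in os.listdir(directory)
        if os.path.isdir(os.path.join(directory, d))
    )
    class_indices = {name: i for i, name in enumerate(class_names)}
    labels = {}
    for name in class_names:
        for fname in sorted(os.listdir(os.path.join(directory, name))):
            labels[os.path.join(name, fname)] = class_indices[name]
    return class_indices, labels

# Build a throwaway dataset layout with hypothetical, empty image files.
with tempfile.TemporaryDirectory() as root:
    for folder, files in {"A": ["a1.png", "a2.png"], "B": ["b1.png"]}.items():
        os.makedirs(os.path.join(root, folder))
        for fname in files:
            open(os.path.join(root, folder, fname), "w").close()
    class_indices, labels = infer_labels(root)

print(class_indices)
print(labels)
```

Every file under A ends up with label 0 and every file under B with label 1, without any label being written by hand.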

In addition, since you set class_mode to categorical, the iterator's output is already one-hot encoded:

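# Depending on your install, this may be tensorflow.keras.preprocessing.image
from keras.preprocessing.image import ImageDataGenerator

g = ImageDataGenerator()
# class_mode defaults to 'categorical', so the labels are one-hot encoded
train = g.flow_from_directory('/path/to/dataset/train/',
                              batch_size=32,
                              target_size=(1100, 180))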

x_batch, y_batch = next(train)
assert x_batch.shape == (32, 1100, 180, 3)
assert y_batch.shape == (32, 2)
print(y_batch)
[[0. 1.],
 [1. 0.],
 [1. 0.],
 ...
 [0. 1.]]
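
To read such one-hot rows back as class names, one option is to invert the class-to-index mapping. The sketch below uses plain Python lists and assumes the mapping is {'A': 0, 'B': 1}; the sample rows and the `decode` helper are made up for illustration.

```python
# Assumed mapping from folder name to label index.
class_indices = {"A": 0, "B": 1}
index_to_class = {i: name for name, i in class_indices.items()}

def decode(row):
    """Return the class name whose position holds the largest value in row."""
    best = max(range(len(row)), key=row.__getitem__)
    return index_to_class[best]

# Hypothetical one-hot rows in the same layout as the printed batch.
y_batch = [[0.0, 1.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
decoded = [decode(row) for row in y_batch]
print(decoded)
```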

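The classes parameter is not used to set the label of each sample. It specifies the list of subfolders of directory that the iterator should treat as classes (for example ['A', 'B']). If it is left at the default None, every subfolder of directory is treated as a valid class and all images inside them are potential samples of the set. This is useful when you only want to work with a subset of the labels, when debugging your code, or when holding some classes out.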
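
As a small sketch of that behavior, the function below builds a class-to-index mapping from a list of subfolders. `build_class_indices` is a hypothetical helper, not a Keras function, and `Set_C` is an invented extra folder. It assumes that an explicit classes list is indexed in the order given, while the default falls back to all subfolders in sorted order.

```python
def build_class_indices(subfolders, classes=None):
    """Return {class_name: index}.

    With classes=None every subfolder becomes a class, in sorted order.
    With an explicit list, only those names are used, in the order given.
    """
    if classes is None:
        classes = sorted(subfolders)
    return {name: i for i, name in enumerate(classes)}

# Hypothetical directory contents; Set_C is not part of the question.
subfolders = ["Set_B", "Set_A", "Set_C"]

all_classes = build_class_indices(subfolders)
subset = build_class_indices(subfolders, classes=["Set_A", "Set_B"])
print(all_classes)
print(subset)
```

Passing classes=['Set_A', 'Set_B'] therefore selects which folders count as classes; it does not attach a label to any individual image.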

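If you want to override the default labels, you can simply replace the contents of DirectoryIterator#classes, whose i-th element holds the class associated with the i-th sample. For example, suppose you want to add a third class of leaves that has no associated folder: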

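import numpy as np

train = g.flow_from_directory(...)
# One label per sample, in sample order; the inner "..." stands for the remaining labels
train.classes = np.asarray([0., 1., 2., ..., 0., 1.])
train.class_indices = {'A': 0, 'B': 1, 'C': 2}
train.num_classes = 3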

Regarding python - How to assign labels to the samples generated by ImageDataGenerator, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/50105143/