python - How does shuffling work with ImageDataGenerator in machine learning?

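I am building an image classification model with Inception V3, and there are two classes. I split my dataset and labels into numpy arrays: trainX and testX hold the images, and trainY and testY hold the corresponding labels.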

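# Imports assumed by this snippet (not shown in the original question)
import numpy as np
from tensorflow import keras
from sklearn.model_selection import train_test_split

data = np.array(data, dtype="float") / 255.0
labels = np.array(labels, dtype="uint8")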

(trainX, testX, trainY, testY) = train_test_split(
                                data,labels,
                                test_size=0.2,
                                random_state=42)

train_datagen = keras.preprocessing.image.ImageDataGenerator(
          zoom_range = 0.1,
          width_shift_range = 0.2,
          height_shift_range = 0.2,
          horizontal_flip = True,
          fill_mode ='nearest')

val_datagen = keras.preprocessing.image.ImageDataGenerator()


train_generator = train_datagen.flow(
        trainX,
        trainY,
        batch_size=batch_size,
        shuffle=True)

validation_generator = val_datagen.flow(
                testX,
                testY,
                batch_size=batch_size)

When I shuffle train_generator with ImageDataGenerator, do the images still match their corresponding labels? And should the validation dataset be shuffled too?

Best answer

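Yes, the images will still match their corresponding labels, so you can safely set shuffle to True. Under the hood, it works as follows. Calling .flow() on an ImageDataGenerator returns a NumpyArrayIterator object, which implements the following logic for shuffling the indices: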

def _set_index_array(self):
    self.index_array = np.arange(self.n)
    if self.shuffle: # if shuffle==True, shuffle the indices
        self.index_array = np.random.permutation(self.n)

self.index_array is then used to produce both the images (x) and the labels (y) (code truncated for readability):

def _get_batches_of_transformed_samples(self, index_array):
    batch_x = np.zeros(tuple([len(index_array)] + list(self.x.shape)[1:]),
                       dtype=self.dtype)
    # use index_array to get the x's
    for i, j in enumerate(index_array):
        x = self.x[j]
        ... # data augmentation is done here
        batch_x[i] = x
    ...
    # use the same index_array to fetch the labels
    output += (self.y[index_array],)

    return output
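To make the mechanism concrete, here is a minimal NumPy-only sketch of the same idea. It is not the Keras implementation: the function iterate_shuffled_batches and the toy arrays are hypothetical names made up for illustration. The point it shows is that only an index array is permuted, and the same indices are then used to slice both x and y, so the pairs cannot drift apart.

```python
import numpy as np

def iterate_shuffled_batches(x, y, batch_size, seed=None):
    """Yield (batch_x, batch_y) pairs, shuffling through a shared index array.

    Hypothetical helper that mimics the idea, not the Keras source.
    """
    rng = np.random.default_rng(seed)
    index_array = rng.permutation(len(x))  # shuffle the indices, not the data
    for start in range(0, len(x), batch_size):
        batch_idx = index_array[start:start + batch_size]
        # The same indices select both the samples and their labels.
        yield x[batch_idx], y[batch_idx]

# Toy data: "image" k is a 2x2 array filled with the value k, and its label
# is also k, so any misalignment would be easy to detect.
toy_labels = np.arange(10)
toy_images = np.repeat(toy_labels, 4).reshape(10, 2, 2).astype("float32")

batches = list(iterate_shuffled_batches(toy_images, toy_labels, batch_size=4, seed=0))
for batch_x, batch_y in batches:
    # Every pixel of each image still equals the label paired with it.
    assert np.all(batch_x == batch_y[:, None, None])
```

The same kind of check can be run against a real generator by pulling one batch and comparing it with the source arrays, provided augmentation is switched off so the pixel values stay comparable.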

Check the source code yourself; it is probably easier to follow than you might expect.

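Shuffling the validation data should not matter much, because validation metrics are computed over the whole set regardless of order. The main purpose of shuffling is to introduce some extra randomness during training.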
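A small sketch of why order is irrelevant for validation: a metric such as accuracy is an average over all samples, so permuting labels and predictions together leaves it unchanged. The arrays below are random stand-ins, not real model output. One practical caveat: .flow() shuffles by default, so if you later want to line predictions up with testY by position, passing shuffle=False to the validation generator is the safer choice.

```python
import numpy as np

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=100)  # hypothetical validation labels
y_pred = rng.integers(0, 2, size=100)  # hypothetical model predictions

def accuracy(true_labels, predicted_labels):
    """Fraction of positions where prediction and label agree."""
    return float(np.mean(true_labels == predicted_labels))

# Apply the same permutation to both arrays, as a shuffled generator would.
perm = rng.permutation(len(y_true))
acc_ordered = accuracy(y_true, y_pred)
acc_shuffled = accuracy(y_true[perm], y_pred[perm])

print(acc_ordered == acc_shuffled)
```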