Problem description
I am getting a memory error because of the huge number of images; it happens when I load all the images directly from the paths given in a dataframe.
The dataframe (df_train_data)'s format for the training set is like below:
class_id ID uu vv
Abnormal 1001 1001_05.png 1001_06.png
Abnormal 1002 1002_05.png 1002_06.png
Abnormal 1003 1003_05.png 1003_06.png
Normal 1554 1554_05.png 1554_06.png
Normal 1555 1555_05.png 1555_06.png
Normal 1556 1556_05.png 1556_06.png
...
Note that the Normal class instances come after all of the Abnormal class instances; the rows are all ordered that way.
I am reading the images and their IDs in the following form:
X_uu_train = read_imgs(df_train_data.uu.values, img_height, img_width, channels)
X_vv_train = read_imgs(df_train_data.vv.values, img_height, img_width, channels)
train_labels = df_train_data.ID.values
where read_imgs returns all of the images as a numpy array.
The MemoryError occurs at X_uu_train = read_imgs(df_train_data.uu.values, img_height, img_width, channels).
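(read_imgs is not shown in the question. Below is a hypothetical sketch of such a helper; the eager loop that decodes every file into one big float32 array is exactly what exhausts memory. All names and details here are assumptions, not the asker's actual code.)

import numpy as np
from PIL import Image

def read_imgs(paths, img_height, img_width, channels):
    # Hypothetical: allocates n_images * H * W * C * 4 bytes up front,
    # so the whole training set must fit in RAM at once.
    imgs = np.empty((len(paths), img_height, img_width, channels), dtype=np.float32)
    for i, path in enumerate(paths):
        img = Image.open(path).convert('RGB').resize((img_width, img_height))
        imgs[i] = np.asarray(img, dtype=np.float32) / 255.0
    return imgs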
I have seen some solutions that recommend using ImageDataGenerator to load the images in batches. However, I am not loading images from a directory the way most sites show. It turns out there is a way to load images from dataframes, namely .flow_from_dataframe.
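(For reference, a minimal single-input sketch of that API could look like the following; the target size, batch size, and rescaling are illustrative, not taken from the question.)

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Single-input sketch: paths come from the 'uu' column, labels from 'class_id'.
datagen = ImageDataGenerator(rescale=1. / 255)
gen_uu = datagen.flow_from_dataframe(
    df_train_data, x_col='uu', y_col='class_id',
    target_size=(224, 224), class_mode='binary',
    batch_size=32, shuffle=True)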
Here is the training phase:
hist = base_model.fit([X_uu_train, X_vv_train], train_labels,
batch_size=batch_size, epochs=epochs, verbose=1,
validation_data=([X_uu_val, X_vv_val], val_labels), shuffle=True)
preds = base_model.predict([X_uu_val, X_vv_val])
The thing is, that only works with a single input, but my generator should yield image batches for dual input.
Could someone help me construct an ImageDataGenerator so that I can hopefully load the images without running into a MemoryError?
While loading from the uu and vv columns, the images should be input to the network with their corresponding pairs, in shuffled order.
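(One common workaround, sketched here under the assumption that base_model, batch_size, epochs, and the dataframe are as in the question, is to drive two flow_from_dataframe iterators with the same seed so both shuffle in the same order, then zip their batches for the two inputs; this is the same trick often used for image/mask pairs. It is an illustration, not the accepted answer below.)

from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rescale=1. / 255)

# Same dataframe + same seed => identical shuffle order in both flows,
# so each uu/vv pair (and its label) stays aligned.
gen_uu = datagen.flow_from_dataframe(
    df_train_data, x_col='uu', y_col='class_id',
    target_size=(224, 224), class_mode='binary',
    batch_size=batch_size, shuffle=True, seed=42)
gen_vv = datagen.flow_from_dataframe(
    df_train_data, x_col='vv', y_col='class_id',
    target_size=(224, 224), class_mode='binary',
    batch_size=batch_size, shuffle=True, seed=42)

def dual_input_generator(g1, g2):
    while True:
        x1, y = next(g1)
        x2, _ = next(g2)  # labels are the same in both flows
        yield [x1, x2], y

hist = base_model.fit_generator(dual_input_generator(gen_uu, gen_vv),
                                steps_per_epoch=len(gen_uu), epochs=epochs)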
P.S. I can provide more info if necessary.
Thanks.
EDIT-1: The dataset I constructed prints as:
<BatchDataset shapes: (((None, 224, 224, 3), (None, 224, 224, 3)), (None,)), types: ((tf.float32, tf.float32), tf.int32)>
EDIT-2:
AttributeError Traceback (most recent call last)
<ipython-input-18-4ae4c12b2b76> in <module>
43
44 base_model = combined_net()
---> 45 hist = base_model.fit(ds_train, epochs=epochs, verbose=1, validation_data=ds_val, shuffle=True)
46
47 preds = base_model.predict(ds_val)
~\Anaconda3\lib\site-packages\keras\engine\training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_freq, max_queue_size, workers, use_multiprocessing, **kwargs)
1152 sample_weight=sample_weight,
1153 class_weight=class_weight,
-> 1154 batch_size=batch_size)
1155
1156 # Prepare validation data.
~\Anaconda3\lib\site-packages\keras\engine\training.py in _standardize_user_data(self, x, y, sample_weight, class_weight, check_array_lengths, batch_size)
577 feed_input_shapes,
578 check_batch_axis=False, # Don't enforce the batch size.
--> 579 exception_prefix='input')
580
581 if y is not None:
~\Anaconda3\lib\site-packages\keras\engine\training_utils.py in standardize_input_data(data, names, shapes, check_batch_axis, exception_prefix)
97 data = data.values if data.__class__.__name__ == 'DataFrame' else data
98 data = [data]
---> 99 data = [standardize_single_array(x) for x in data]
100
101 if len(data) != len(names):
~\Anaconda3\lib\site-packages\keras\engine\training_utils.py in <listcomp>(.0)
97 data = data.values if data.__class__.__name__ == 'DataFrame' else data
98 data = [data]
---> 99 data = [standardize_single_array(x) for x in data]
100
101 if len(data) != len(names):
~\Anaconda3\lib\site-packages\keras\engine\training_utils.py in standardize_single_array(x)
32 'Got tensor with shape: %s' % str(shape))
33 return x
---> 34 elif x.ndim == 1:
35 x = np.expand_dims(x, 1)
36 return x
AttributeError: 'BatchDataset' object has no attribute 'ndim'
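(An observation, not part of the original post: the traceback paths point at the standalone keras package, whose fit in that 2.2.x version cannot iterate a tf.data.Dataset; the tf.keras API can. A minimal runnable sketch with made-up shapes:)

import numpy as np
import tensorflow as tf

# Tiny dual-input dataset with placeholder data, just to show that
# tf.keras (unlike standalone keras 2.2.x) accepts a tf.data.Dataset.
ds = tf.data.Dataset.from_tensor_slices(
    ((np.zeros((8, 4), np.float32), np.zeros((8, 4), np.float32)),
     np.zeros((8,), np.int32))).batch(4)

inp1 = tf.keras.layers.Input(shape=(4,))
inp2 = tf.keras.layers.Input(shape=(4,))
out = tf.keras.layers.Dense(1, activation='sigmoid')(
    tf.keras.layers.Concatenate()([inp1, inp2]))

model = tf.keras.Model([inp1, inp2], out)
model.compile(loss='binary_crossentropy', optimizer='adam')
model.fit(ds, epochs=1)  # works here; standalone keras raises the ndim error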
Recommended answer
Rather than ImageDataGenerator, you can create a tf.data.Dataset object and use it directly for more flexibility. You can pass it a list of filenames and it will only load them iteratively, batch by batch.
import pandas as pd
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'
import tensorflow as tf

# The sample dataframe is read from the clipboard here; any dataframe
# with 'uu', 'vv' and 'class_id' columns works the same way.
df = pd.read_clipboard()
x = df.uu
y = df.vv
z = df.class_id

def load(file_path):
    # Decode and resize one PNG on demand instead of loading everything up front.
    img = tf.io.read_file(file_path)
    img = tf.image.decode_png(img, channels=3)
    img = tf.image.convert_image_dtype(img, tf.float32)
    img = tf.image.resize(img, size=(100, 100))
    return img

ds = (tf.data.Dataset.from_tensor_slices((x, y, z))
      .map(lambda xx, yy, zz: (load(xx), load(yy), zz))
      .batch(4))

next(iter(ds))
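(The question also asks for the pairs to arrive in shuffled order; with tf.data that is a .shuffle() call before batching. The buffer size below is illustrative. Pairs stay aligned because each dataset element already carries both images and the label together.)

ds = (tf.data.Dataset.from_tensor_slices((x, y, z))
      .map(lambda xx, yy, zz: (load(xx), load(yy), zz))
      .shuffle(buffer_size=256)  # illustrative; reshuffles each epoch by default
      .batch(4))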
Here's a complete example, starting from a list of files (which is easy to get when you have a dataframe) and going all the way to model training.
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'
import numpy as np
import cv2
from skimage import data
import tensorflow as tf

# Write a few dummy PNGs to disk so there are real files to load.
coffee = data.coffee()
cat = data.chelsea()

for image, name in zip([coffee, cat], ['coffee', 'cat']):
    for i in range(5):
        cv2.imwrite(f'{name}_{i}.png', image)

cat_files = list(filter(lambda x: x.startswith('cat'), os.listdir()))
coffee_files = list(filter(lambda x: x.startswith('coffee'), os.listdir()))

def load(file_path):
    # Decode and resize one image on demand.
    img = tf.io.read_file(file_path)
    img = tf.image.decode_png(img, channels=3)
    img = tf.image.convert_image_dtype(img, tf.float32)
    img = tf.image.resize(img, size=(100, 100))
    return img

def label(string):
    # 'abnormal' -> 1, anything else -> 0
    return tf.cast(tf.equal(string, 'abnormal'), tf.int32)

x = cat_files
y = coffee_files
z = np.random.choice(['normal', 'abnormal'], 5)

# Zip the paired images with the labels so every element is
# ((img_uu, img_vv), label), which matches a two-input model.
inputs = tf.data.Dataset.from_tensor_slices((x, y)).map(lambda x, y: (load(x), load(y)))
labels = tf.data.Dataset.from_tensor_slices(z).map(lambda x: label(x))

ds = tf.data.Dataset.zip((inputs, labels)).batch(4)

next(iter(ds))
# Two-input model: flatten each image, concatenate the branches,
# then feed them through a small dense head.
inputs1 = tf.keras.layers.Input(shape=(100, 100, 3), name='input1')
inputs2 = tf.keras.layers.Input(shape=(100, 100, 3), name='input2')

xx = tf.keras.layers.Flatten()(inputs1)
yy = tf.keras.layers.Flatten()(inputs2)

x = tf.keras.layers.Concatenate()([xx, yy])
x = tf.keras.layers.Dense(32, activation='relu')(x)
output = tf.keras.layers.Dense(1, activation='sigmoid')(x)

model = tf.keras.Model(inputs=[inputs1, inputs2], outputs=output)
model.compile(loss='binary_crossentropy', optimizer='adam')

history = model.fit(ds)
Train for 2 steps
1/2 [==============>...............] - ETA: 0s - loss: 0.7527
2/2 [==============================] - 1s 251ms/step - loss: 5.5188
Then you can also predict:
model.predict(ds)
array([[4.7391814e-26],
[4.7391814e-26],
[4.7391814e-26],
[4.7391814e-26],
[4.7390730e-26]], dtype=float32)