Problem Description
I'm following the TensorFlow Object Detection API instructions and trying to train an existing object-detection model ("faster_rcnn_resnet101_coco") with my own dataset, which has 50 classes.
So according to my own dataset, I created:
- TFRecord files (for training, evaluation, and testing respectively)
- labelmap.pbtxt
Next, I edited model.config only in these fields: model → faster_rcnn → num_classes (90 → 50, the number of classes in my own dataset), train_config → batch_size (1 → 10), train_config → num_steps (200000 → 100), train_input_reader → tf_record_input_reader → input_path (the path where the TFRecord files reside), and train_input_reader → label_map_path (the path where labelmap.pbtxt resides).
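For reference, those edits map onto a pipeline config roughly like the following sketch (the paths are placeholders, and all other fields of the real faster_rcnn_resnet101_coco config are omitted here):

```
model {
  faster_rcnn {
    num_classes: 50
    # ... other model fields unchanged
  }
}
train_config {
  batch_size: 10
  num_steps: 100
  # ... other training fields unchanged
}
train_input_reader {
  tf_record_input_reader {
    input_path: "PATH/TO/train.record"
  }
  label_map_path: "PATH/TO/labelmap.pbtxt"
}
```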
Finally, I ran the command:
python train.py \
--logtostderr \
--pipeline_config_path="PATH WHERE CONFIG FILE RESIDES" \
--train_dir="PATH WHERE MODEL DIRECTORY RESIDES"
and I got the following error:
InvalidArgumentError (see above for traceback): ConcatOp : Dimensions of inputs should match: shape[0] = [1,890,600,3] vs. shape[1] = [1,766,600,3] [[Node: concat_1 = ConcatV2[N=10, T=DT_FLOAT, Tidx=DT_INT32, _device="/job:localhost/replica:0/task:0/cpu:0"](Preprocessor/sub, Preprocessor_1/sub, Preprocessor_2/sub, Preprocessor_3/sub, Preprocessor_4/sub, Preprocessor_5/sub, Preprocessor_6/sub, Preprocessor_7/sub, Preprocessor_8/sub, Preprocessor_9/sub, concat_1/axis)]]
It seems related to the dimensions of the input images, so it may be caused by the raw image data not being resized.
But as far as I know, the model automatically resizes the input images for training (doesn't it?)
So I'm stuck on this issue.
If there is a solution, I'd appreciate your answer. Thanks.
Update
When I changed the batch_size field from 10 back to 1 (the original value), it seemed to train without any problem... but I don't understand why...
Recommended Answer
TaeWoo is right, you have to set batch_size to 1 in order to train Faster RCNN.
This is because FRCNN uses a keep_aspect_ratio_resizer, which in turn means that if you have images of different sizes, they will also be different sizes after preprocessing. This practically makes batching impossible, since a batch tensor has shape [num_batch, height, width, channels]. You can see this is a problem when (height, width) differ from one example to the next.
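The batching failure can be illustrated with a small plain-Python sketch (a hypothetical helper, with shapes taken from the error message above): a batch tensor requires every example to share the same (height, width), so two preprocessed images of shapes (890, 600, 3) and (766, 600, 3) cannot be stacked.

```python
def try_batch(shapes):
    """Return the batch shape if all example shapes match; otherwise raise
    an error mimicking the ConcatOp dimension check."""
    first = shapes[0]
    for i, s in enumerate(shapes[1:], start=1):
        if s != first:
            raise ValueError(
                f"Dimensions of inputs should match: "
                f"shape[0] = {list(first)} vs. shape[{i}] = {list(s)}"
            )
    return (len(shapes),) + first

# keep_aspect_ratio_resizer: the short side is scaled to a fixed size,
# so the other dimension differs per image -> batching fails
aspect_ratio_shapes = [(890, 600, 3), (766, 600, 3)]

# fixed-size resizing: every image ends up identical -> batching works
fixed_shapes = [(600, 600, 3), (600, 600, 3)]
```

With batch_size = 1 there is only one shape in the batch, so the check trivially passes, which is why the training run above succeeded after that change.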
This is in contrast to the SSD model, which uses a "normal" resizer, i.e. regardless of the input image, all preprocessed examples end up with the same size, which allows them to be batched together.
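The two resizers look roughly like this in the respective pipeline configs (a sketch: 600/1024 are the usual Faster RCNN defaults and 300x300 a common SSD default, so check the values in your own config):

```
# Faster RCNN: output size varies per image, only aspect ratio is kept
image_resizer {
  keep_aspect_ratio_resizer {
    min_dimension: 600
    max_dimension: 1024
  }
}

# SSD: every image becomes the same fixed size
image_resizer {
  fixed_shape_resizer {
    height: 300
    width: 300
  }
}
```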
Now, if you have images of different sizes, you practically have two ways of using batching:
- Use Faster RCNN and pad your images beforehand, either once before training or continuously as a preprocessing step. I'd suggest the former, since this type of preprocessing seems to slow down learning.
- Use SSD, but make sure your objects are not affected too much by the distortion. This shouldn't be a very big problem, and it can also serve as a form of data augmentation.
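The first option (padding once before training) can be sketched as follows, assuming the images are loaded as NumPy arrays. This is a hypothetical helper, not part of the Object Detection API; because it pads only at the bottom and right, pixel-coordinate box annotations stay valid, though any normalized coordinates would need rescaling.

```python
import numpy as np

def pad_to_common_size(images, fill=0):
    """Pad a list of HxWxC arrays to the largest height/width in the set,
    so they can be stacked into a single [N, H, W, C] batch."""
    max_h = max(im.shape[0] for im in images)
    max_w = max(im.shape[1] for im in images)
    padded = []
    for im in images:
        pad_h = max_h - im.shape[0]
        pad_w = max_w - im.shape[1]
        # pad bottom and right only, so the image origin is unchanged
        padded.append(np.pad(im, ((0, pad_h), (0, pad_w), (0, 0)),
                             mode="constant", constant_values=fill))
    return np.stack(padded)
```

After padding, all examples share one shape and the ConcatOp check above no longer fails.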