This article describes how to handle a Tensorflow ConcatOp error when using the Object Detection API. Hopefully it serves as a useful reference for anyone who runs into the same problem.

Problem description

I'm following the Tensorflow Object Detection API instructions and trying to train an existing object detection model ("faster_rcnn_resnet101_coco") with my own dataset, which has 50 classes.

So, based on my own dataset, I created:

  1. TFRecords (for training, evaluation, and testing respectively)
  2. labelmap.pbtxt

Next, I edited model.config, changing only model-faster_rcnn-num_classes (90 -> 50, the number of classes in my own dataset), train_config-batch_size (1 -> 10), train_config-num_steps (200000 -> 100), train_input_reader-tf_record_input_reader-input_path (the path where the TFRecord files reside), and train_input_reader-label_map_path (the path where labelmap.pbtxt resides).
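
For reference, the relevant parts of such an edited pipeline config look roughly like the sketch below (the paths are placeholders and "..." stands for fields left unchanged from the stock faster_rcnn_resnet101_coco config):

model {
  faster_rcnn {
    num_classes: 50      # was 90 in the stock COCO config
    ...
  }
}
train_config {
  batch_size: 10         # changed from 1
  num_steps: 100         # changed from 200000
  ...
}
train_input_reader {
  tf_record_input_reader {
    input_path: "PATH/TO/train.record"        # placeholder path
  }
  label_map_path: "PATH/TO/labelmap.pbtxt"    # placeholder path
}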

Finally, I ran the command:

python train.py \
--logtostderr \
--pipeline_config_path="PATH WHERE CONFIG FILE RESIDES" \
--train_dir="PATH WHERE MODEL DIRECTORY RESIDES"

and I ran into the following error:

InvalidArgumentError (see above for traceback): ConcatOp : Dimensions of inputs should match: shape[0] = [1,890,600,3] vs. shape[1] = [1,766,600,3] [[Node: concat_1 = ConcatV2[N=10, T=DT_FLOAT, Tidx=DT_INT32, _device="/job:localhost/replica:0/task:0/cpu:0"](Preprocessor/sub, Preprocessor_1/sub, Preprocessor_2/sub, Preprocessor_3/sub, Preprocessor_4/sub, Preprocessor_5/sub, Preprocessor_6/sub, Preprocessor_7/sub, Preprocessor_8/sub, Preprocessor_9/sub, concat_1/axis)]]

It seems to be about the dimensions of the input images, so it may be caused by not resizing the raw image data.

But as far as I know, the model automatically resizes the input images for training (doesn't it?).

So now I'm stuck on this issue.

If there is a solution, I'd appreciate your answer. Thanks.

Update

When I changed the batch_size field from 10 back to 1 (the original value), it seems to train without any problem... but I don't understand why...

Recommended answer

TaeWoo is right, you have to set batch_size to 1 in order to train Faster RCNN.

This is because FRCNN uses a keep_aspect_ratio_resizer, which in turn means that if you have images of different sizes, they will also have different sizes after preprocessing. This practically makes batching impossible, since a batch tensor has shape [num_batch, height, width, channels]; you can see why this is a problem when (height, width) differ from one example to the next.
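
As a tiny illustration (my own sketch, not part of the original answer), this is essentially what fails when the input pipeline tries to stack two preprocessed images with different heights into one batch tensor:

# Minimal sketch: concatenating image tensors whose (height, width) differ
# along the batch dimension fails, just like the ConcatOp error above.
import tensorflow as tf

a = tf.zeros([1, 890, 600, 3])   # first preprocessed image
b = tf.zeros([1, 766, 600, 3])   # second image: same width, different height

try:
    batch = tf.concat([a, b], axis=0)   # axis 0 is the batch dimension
except (tf.errors.InvalidArgumentError, ValueError) as e:
    # The exact exception type and message depend on the TF version,
    # but it complains that the input dimensions should match.
    print(e)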

This is in contrast to the SSD model, which uses a "normal" resizer, i.e. regardless of the input image, all preprocessed examples end up having the same size, which allows them to be batched together.
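
For reference, this is roughly what the two resizers look like in the stock configs (the numbers below are the typical defaults, not values taken from the question):

# Faster RCNN (e.g. faster_rcnn_resnet101_coco): output size varies per image
image_resizer {
  keep_aspect_ratio_resizer {
    min_dimension: 600
    max_dimension: 1024
  }
}

# SSD: every image ends up with the same fixed shape, so batching works
image_resizer {
  fixed_shape_resizer {
    height: 300
    width: 300
  }
}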

Now, if you have images of different sizes, you practically have two ways of using batching:

  • Use Faster RCNN and pad your images beforehand, either once before training or continuously as a preprocessing step (see the padding sketch after this list). I'd suggest the former, since this type of preprocessing seems to slow down learning.
  • Use SSD, but make sure your objects are not affected too much by the distortion. This shouldn't be a very big problem, and it can also act as a form of data augmentation.
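
For the first option, a minimal padding sketch (my own illustration with made-up directory names and canvas size, not code from the original answer) could look like the following; note that if your TFRecords store normalized box coordinates, they have to be recomputed for the new canvas size:

# Pad every image onto a fixed-size black canvas (top-left aligned) once,
# before generating the TFRecords, so all images end up with the same shape.
import os
from PIL import Image

TARGET_W, TARGET_H = 1024, 1024               # assumed canvas size (>= largest image)
SRC_DIR, DST_DIR = "images", "images_padded"  # hypothetical directories

os.makedirs(DST_DIR, exist_ok=True)
for name in os.listdir(SRC_DIR):
    img = Image.open(os.path.join(SRC_DIR, name)).convert("RGB")
    canvas = Image.new("RGB", (TARGET_W, TARGET_H), (0, 0, 0))
    canvas.paste(img, (0, 0))                 # top-left paste keeps pixel box coordinates valid
    canvas.save(os.path.join(DST_DIR, name))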

That's it for this article on the Tensorflow ConcatOp error with the Object Detection API. Hopefully the recommended answer above is helpful.
