How to find bounding box coordinates in the TensorFlow Object Detection API

Problem description

I'm using the TensorFlow Object Detection API code. I trained my model and got great detection percentages. I have been trying to get the bounding box coordinates, but it keeps printing out a list of 100 bizarre arrays.

After an extensive search online I found out what the numbers in the arrays mean (the bounding box coordinates are floats in [0.0, 1.0] relative to the width and height of the underlying image). Still, my arrays are very different from the ones shown in examples online. Another weird thing is that I tested my model with far fewer than 100 images, so how can there even be data for 100 bounding boxes?

The array I get:

 [[3.13721418e-01 4.65148419e-01 7.11575747e-01 6.85783863e-01]
 [9.78936195e-01 6.50490820e-03 9.97096300e-01 1.82596639e-01]
 [9.51383412e-01 0.00000000e+00 1.00000000e+00 3.88432704e-02]
 [9.85813320e-01 8.96016136e-02 9.97273505e-01 3.15960884e-01]
 [9.88873005e-01 2.13812709e-01 1.00000000e+00 4.14675951e-01]

 ......
 [4.42647263e-02 9.90755498e-01 2.57772505e-01 1.00000000e+00]
 [2.69711018e-05 5.21758199e-02 6.37509704e-01 6.62899792e-01]
 [0.00000000e+00 3.00989419e-01 9.92376506e-02 1.00000000e+00]
 [1.87531322e-01 2.66501214e-04 4.50700432e-01 1.23927500e-02]
 [9.36755657e-01 4.61095899e-01 9.92406607e-01 7.62619019e-01]]

The function that does the detection and gets the bounding box coordinates. output_dict['detection_boxes'] is where the array above is held.

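# Imports this function relies on; tf.Session is the TensorFlow 1.x API
import numpy as np
import tensorflow as tf
from object_detection.utils import ops as utils_ops

def run_inference_for_single_image(image, graph):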
  with graph.as_default():
    with tf.Session() as sess:
      # Get handles to input and output tensors
      ops = tf.get_default_graph().get_operations()
      all_tensor_names = {output.name for op in ops for output in op.outputs}
      tensor_dict = {}
      for key in [
          'num_detections', 'detection_boxes', 'detection_scores',
          'detection_classes', 'detection_masks'
      ]:
        tensor_name = key + ':0'
        if tensor_name in all_tensor_names:
          tensor_dict[key] = tf.get_default_graph().get_tensor_by_name(
              tensor_name)
      if 'detection_masks' in tensor_dict:
        # The following processing is only for single image
        detection_boxes = tf.squeeze(tensor_dict['detection_boxes'], [0])
        detection_masks = tf.squeeze(tensor_dict['detection_masks'], [0])
        # Reframe is required to translate mask from box coordinates to image coordinates and fit the image size.
        real_num_detection = tf.cast(tensor_dict['num_detections'][0], tf.int32)
        detection_boxes = tf.slice(detection_boxes, [0, 0], [real_num_detection, -1])
        detection_masks = tf.slice(detection_masks, [0, 0, 0], [real_num_detection, -1, -1])
        detection_masks_reframed = utils_ops.reframe_box_masks_to_image_masks(
            detection_masks, detection_boxes, image.shape[1], image.shape[2])
        detection_masks_reframed = tf.cast(
            tf.greater(detection_masks_reframed, 0.5), tf.uint8)
        # Follow the convention by adding back the batch dimension
        tensor_dict['detection_masks'] = tf.expand_dims(
            detection_masks_reframed, 0)
      image_tensor = tf.get_default_graph().get_tensor_by_name('image_tensor:0')

      # Run inference
      output_dict = sess.run(tensor_dict,
                             feed_dict={image_tensor: image})

      # all outputs are float32 numpy arrays, so convert types as appropriate
      output_dict['num_detections'] = int(output_dict['num_detections'][0])
      output_dict['detection_classes'] = output_dict[
          'detection_classes'][0].astype(np.int64)
      output_dict['detection_boxes'] = output_dict['detection_boxes'][0]
      output_dict['detection_scores'] = output_dict['detection_scores'][0]
      if 'detection_masks' in output_dict:
        output_dict['detection_masks'] = output_dict['detection_masks'][0]
  return output_dict

I expect the output to be regular x, y coordinates of the bounding boxes.

Recommended answer

The values in output_dict['detection_boxes'] are indeed in normalized format, with each row ordered [ymin, xmin, ymax, xmax]. The values in the array you provided all lie between 0 and 1, so they are reasonable.

There are 100 boxes because the model always outputs the same number of bounding boxes per image, regardless of how many images you test (this number equals max_total_detections in the config file). Not all of them are meaningful, so you need to filter boxes by their confidence scores, which are stored in output_dict['detection_scores'].
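As a minimal sketch of why only a few of the 100 rows matter, the snippet below builds a made-up stand-in for output_dict (the values are illustrative, not from a real model) and keeps only the detections whose score clears a threshold. The helper name filter_boxes_by_score and the fake data are assumptions introduced for this example.

```python
import numpy as np

def filter_boxes_by_score(output_dict, min_score_thresh=0.5):
    """Return (boxes, scores, classes) for detections scoring above the threshold."""
    boxes = np.asarray(output_dict['detection_boxes'])
    scores = np.asarray(output_dict['detection_scores'])
    classes = np.asarray(output_dict['detection_classes'])
    keep = scores > min_score_thresh  # boolean mask, one entry per box
    return boxes[keep], scores[keep], classes[keep]

# Made-up stand-in for a real output_dict: 100 rows, only 2 confident ones.
rng = np.random.default_rng(0)
fake_output = {
    'detection_boxes': rng.random((100, 4)).astype(np.float32),
    'detection_scores': np.zeros(100, dtype=np.float32),
    'detection_classes': np.ones(100, dtype=np.int64),
}
fake_output['detection_scores'][:2] = [0.97, 0.85]

kept_boxes, kept_scores, kept_classes = filter_boxes_by_score(fake_output, 0.8)
print(kept_boxes.shape)  # (2, 4): the other 98 rows are low-score filler
```

The right threshold depends on the model and the application, so 0.8 here is only a starting point.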

To get regular bounding boxes, you can do the following:

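import numpy as np

boxes = np.squeeze(output_dict['detection_boxes'])
scores = np.squeeze(output_dict['detection_scores'])
# Set a minimum score threshold, say 0.8
min_score_thresh = 0.8
bboxes = boxes[scores > min_score_thresh]

# Get the image size; this assumes `image` is a PIL image, whose .size is (width, height)
im_width, im_height = image.size
final_box = []
for box in bboxes:
    # Each box is normalized [ymin, xmin, ymax, xmax]
    ymin, xmin, ymax, xmax = box
    # Scale to pixels; note the stored order is [xmin, xmax, ymin, ymax]
    final_box.append([xmin * im_width, xmax * im_width, ymin * im_height, ymax * im_height])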
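The loop above stores each box as [xmin, xmax, ymin, ymax]. If you would rather have integer pixel corners in the more common [xmin, ymin, xmax, ymax] order, the following self-contained sketch shows one way to do it. The helper name to_pixel_boxes, the 640x480 image size, and the choice to round to the nearest pixel are assumptions made for illustration; the input row is the first box from the array in the question.

```python
import numpy as np

def to_pixel_boxes(norm_boxes, im_width, im_height):
    """Convert normalized [ymin, xmin, ymax, xmax] rows to integer pixel
    [xmin, ymin, xmax, ymax] rows."""
    norm_boxes = np.asarray(norm_boxes, dtype=np.float64).reshape(-1, 4)
    ymin, xmin, ymax, xmax = norm_boxes.T
    pixel = np.stack([xmin * im_width, ymin * im_height,
                      xmax * im_width, ymax * im_height], axis=1)
    return np.rint(pixel).astype(int)  # round to the nearest pixel

# First box from the question's array, on a hypothetical 640x480 image.
first_box = [3.13721418e-01, 4.65148419e-01, 7.11575747e-01, 6.85783863e-01]
pixel_boxes = to_pixel_boxes([first_box], im_width=640, im_height=480)
print(pixel_boxes)  # [[298 151 439 342]]
```

With a NumPy image array instead of a PIL image, the size would come from image.shape (height first, then width) rather than image.size.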
