Question
I am resizing an image init_input along with a bounding box of a target object g within that image to perform object localisation. I am currently resizing the bounding box by determining how many pixels will be up-sampled and adding half of them to each of the bounding box coordinates.
My code is as follows:
def next_state(init_input, b_prime, g):
    """
    Returns the observable region of the next state.

    Formats the next state's observable region, defined
    by b_prime, to be of dimension (224, 224, 3), adding 16
    additional pixels of context around the original bounding box.
    The ground truth box must be reformatted according to the
    new observable region.

    :param init_input:
        The initial input volume of the current episode.
    :param b_prime:
        The subsequent state's bounding box. RED
    :param g:
        The ground truth box of the target object. YELLOW
    """
    # Determine the pixel coordinates of the observable region for the following state
    context_pixels = 16
    x1 = max(b_prime[0] - context_pixels, 0)
    y1 = max(b_prime[1] - context_pixels, 0)
    x2 = min(b_prime[2] + context_pixels, IMG_SIZE)
    y2 = min(b_prime[3] + context_pixels, IMG_SIZE)

    # Determine observable region
    observable_region = cv2.resize(init_input[y1:y2, x1:x2], (224, 224))

    # Difference between crop region and image dimensions
    x1_diff = x1
    y1_diff = y1
    x2_diff = IMG_SIZE - x2
    y2_diff = IMG_SIZE - y2

    # Resize ground truth box
    g[0] = int(g[0] - 0.5 * x1_diff)  # x1
    g[1] = int(g[1] - 0.5 * y1_diff)  # y1
    g[2] = int(g[2] + 0.5 * x2_diff)  # x2
    g[3] = int(g[3] + 0.5 * y2_diff)  # y2

    return observable_region, g
The problem I am having is that this method is not accurate. As evident in the example below, the bounding box is still off. My idea is that this is due to the way interpolation works within the resizing of the image (therefore there is a skew between the pixels taken which is not 0.5). Any advice on how to fix this would be much appreciated.
Answer
Basically, it's a good idea to scale equally in both dimensions, to keep round and square shapes from being squashed. So first you have to find the scale. You do that by taking the largest dimension of your bounding box and adding 32 (16 pixels of context on each side):
longest = max(x_size, y_size) + 32
scale = 224.0 / longest
Then you find your corners by calculating the center of the bounding box and adding half of longest in all directions:
center_x = (x1 + x2) / 2
center_y = (y1 + y2) / 2
org_x1 = center_x - longest/2
org_x2 = center_x + longest/2
org_y1 = center_y - longest/2
org_y2 = center_y + longest/2
Then you rescale the rectangle with coordinates (org_x1, org_y1, org_x2, org_y2) into a (224, 224) rectangle, and the corners of your bounding box will be offset 16.0 * scale from the image corners.
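The steps above can be collected into a small helper. This is a minimal sketch; the function name square_crop_params and its arguments are illustrative, not from the original code, and it assumes (x1, y1, x2, y2) are the bounding-box corners in original image coordinates:

```python
def square_crop_params(x1, y1, x2, y2, out_size=224, context=16):
    # Longest side of the box plus context on both sides determines the
    # square crop, so both dimensions are scaled equally.
    x_size = x2 - x1
    y_size = y2 - y1
    longest = max(x_size, y_size) + 2 * context
    scale = float(out_size) / longest

    # Center the square crop on the bounding box.
    center_x = (x1 + x2) / 2.0
    center_y = (y1 + y2) / 2.0
    org_x1 = center_x - longest / 2.0
    org_y1 = center_y - longest / 2.0
    org_x2 = center_x + longest / 2.0
    org_y2 = center_y + longest / 2.0
    return (org_x1, org_y1, org_x2, org_y2), scale
```

Note that the returned crop corners can fall outside the image (negative or beyond the border), so in practice they would still need clamping or padding before slicing.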
OK, as far as I can see, you resize init_input[y1:y2, x1:x2] into (224, 224) and wonder where the ground truth region is going to be. Well, originally the ground truth rectangle was 16 pixels from the corners, so you have to find these new offsets and you're done.
x_offset = 16.0 * 224.0 / (x2-x1)
y_offset = 16.0 * 224.0 / (y2-y1)
Then the ground truth rectangle will have its top-left corner at (x_offset, y_offset) and its bottom-right corner at ((224 - x_offset), (224 - y_offset)).
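As a rough sketch of that offset calculation (the helper name ground_truth_in_resized is mine, and it assumes the ground truth box sits exactly 16 pixels inside the crop on every side):

```python
def ground_truth_in_resized(x1, y1, x2, y2, out_size=224.0, context=16.0):
    # The 16-pixel context margin stretches by the same factor as the crop,
    # so scale it into the resized frame separately per axis.
    x_offset = context * out_size / (x2 - x1)
    y_offset = context * out_size / (y2 - y1)
    # Top-left and bottom-right of the ground truth box in the resized image.
    return (x_offset, y_offset), (out_size - x_offset, out_size - y_offset)
```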
You may ignore the rest of my code written above the divider; it was written under the assumption that you're preserving the x/y ratio, which you are not =)
Here's the third attempt to figure out what you're doing... if you scale init_input[y1:y2, x1:x2] into (224, 224), the coordinates of any point (x, y) after the transformation can be calculated as:
x_new = (x - x1) * 224.0 / (x2 - x1)
y_new = (y - y1) * 224.0 / (y2 - y1)
It might be a good idea to min/max the new values against the image size, so you don't fall off the image border:
x_new = max(0, min(224, x_new))
y_new = max(0, min(224, y_new))
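Putting the point transformation and the clamping together, the ground truth box resize could look something like this sketch. The names transform_point and transform_box are hypothetical, and g is assumed to be [x1, y1, x2, y2] in original image coordinates, with (x1, y1, x2, y2) the crop corners:

```python
def transform_point(x, y, x1, y1, x2, y2, out_size=224.0):
    # Map a point from the crop's coordinate frame into the resized frame.
    x_new = (x - x1) * out_size / (x2 - x1)
    y_new = (y - y1) * out_size / (y2 - y1)
    # Clamp to the resized image border, as suggested above.
    x_new = max(0.0, min(out_size, x_new))
    y_new = max(0.0, min(out_size, y_new))
    return x_new, y_new

def transform_box(g, x1, y1, x2, y2, out_size=224.0):
    # Apply the same mapping to both corners of the ground truth box g.
    gx1, gy1 = transform_point(g[0], g[1], x1, y1, x2, y2, out_size)
    gx2, gy2 = transform_point(g[2], g[3], x1, y1, x2, y2, out_size)
    return [int(gx1), int(gy1), int(gx2), int(gy2)]
```

Unlike the original next_state, this scales the box by the actual crop-to-output ratio per axis rather than adding a fixed 0.5 of the clipped margins.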