

I am developing an application that uses SIFT + RANSAC and a homography to find an object (OpenCV C++, Java). The problem I am facing is that when there are many outliers, RANSAC performs poorly.


For this reason I would like to try what the author of SIFT said works pretty well: voting.


I have read that we should vote in a 4-dimensional feature space, where the 4 dimensions are:

  • Location [x, y] (some call it translation)
  • Scale
  • Orientation

While with OpenCV it is easy to get the scale and orientation of a match, I am having a hard time understanding how to calculate the location.


I have found an interesting slide where with only one match we are able to draw a bounding box:


But I don't get how I could draw that bounding box with just one match. Any help?


You are looking for the largest set of matched features that fit a geometric transformation from image 1 to image 2. In this case, it is the similarity transformation, which has 4 parameters: translation (dx, dy), scale change ds, and rotation d_theta.

Let's say you have matched two features: f1 from image 1 and f2 from image 2. Let (x1,y1) be the location of f1 in image 1, let s1 be its scale, and let theta1 be its orientation. Similarly, you have (x2,y2), s2, and theta2 for f2.


The translation between two features is (dx,dy) = (x2-x1, y2-y1).

The scale change between two features is ds = s2 / s1.

The rotation between two features is d_theta = theta2 - theta1.
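As a minimal sketch of the per-match quantities defined above, here is how one vote could be computed in plain Python. The `Feature` tuple and `similarity_vote` function are hypothetical names used only for illustration. With OpenCV you would read the same values from a `KeyPoint` (`pt`, `size`, `angle`, where `angle` is in degrees). Wrapping the angle difference into [0, 360) is my own assumption, added so that orientations on either side of 0 degrees do not produce a large negative value.

```python
from collections import namedtuple

# Hypothetical stand-in for a keypoint: position, scale, orientation in degrees.
Feature = namedtuple("Feature", "x y scale angle")


def similarity_vote(f1, f2):
    """Return (dx, dy, ds, d_theta) for one match f1 -> f2."""
    dx = f2.x - f1.x
    dy = f2.y - f1.y
    ds = f2.scale / f1.scale
    # Assumption: wrap into [0, 360) so that 350 -> 10 degrees gives 20, not -340.
    d_theta = (f2.angle - f1.angle) % 360.0
    return dx, dy, ds, d_theta


f1 = Feature(x=10.0, y=20.0, scale=2.0, angle=350.0)
f2 = Feature(x=40.0, y=60.0, scale=4.0, angle=10.0)
vote = similarity_vote(f1, f2)
```

Every match produces one such 4-tuple, and that tuple is what gets dropped into the Hough accumulator.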

So, dx, dy, ds, and d_theta are the dimensions of your Hough space. Each bin corresponds to a similarity transformation.
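To make the binning concrete, here is a rough sketch of the accumulator in plain Python. The bin sizes (20 pixels for translation, one bin per factor of 2 in scale, 30 degrees for rotation) are illustrative assumptions that you would need to tune for your images, and `bin_key` and `hough_vote` are hypothetical helper names, not OpenCV functions.

```python
import math
from collections import defaultdict


def bin_key(vote, xy_bin=20.0, angle_bin=30.0):
    """Quantize one (dx, dy, ds, d_theta) vote into a 4D bin index."""
    dx, dy, ds, d_theta = vote
    return (
        math.floor(dx / xy_bin),
        math.floor(dy / xy_bin),
        math.floor(math.log2(ds)),  # one bin per factor of 2 in scale
        int(d_theta // angle_bin) % int(360 / angle_bin),
    )


def hough_vote(votes, **bin_sizes):
    """Return the fullest bin and the indices of the matches that voted for it."""
    accumulator = defaultdict(list)
    for index, vote in enumerate(votes):
        accumulator[bin_key(vote, **bin_sizes)].append(index)
    best_key = max(accumulator, key=lambda k: len(accumulator[k]))
    return best_key, accumulator[best_key]


votes = [
    (30.0, 40.0, 2.0, 20.0),
    (32.0, 41.0, 2.1, 22.0),
    (35.0, 45.0, 2.3, 25.0),
    (-200.0, 90.0, 0.5, 170.0),  # an outlier that lands in its own bin
]
best_key, inliers = hough_vote(votes)
```

With hard bin edges like these, two nearly identical votes can fall into neighbouring bins. A common mitigation is to let each match also vote for the adjacent bins in every dimension.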


Once you have performed Hough voting, and found the maximum bin, that bin gives you a transformation from image 1 to image 2. One thing you can do is take the bounding box of image 1 and transform it using that transformation: apply the corresponding translation, rotation and scaling to the corners of the image. Typically, you pack the parameters into a transformation matrix, and use homogeneous coordinates. This will give you the bounding box in image 2 corresponding to the object you've detected.
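Here is a small sketch of that last step in plain Python, assuming you have already picked representative values of dx, dy, ds and d_theta for the winning bin (for example its centre, or an average over the matches that voted for it). The function names are hypothetical. The order of operations is also an assumption on my part: scale and rotate about the origin of image 1, then translate. If your translation was measured at the matched feature rather than at the origin, you would rotate and scale about that feature instead.

```python
import math


def similarity_matrix(dx, dy, ds, d_theta_deg):
    """3x3 homogeneous matrix: scale and rotate about the origin, then translate."""
    t = math.radians(d_theta_deg)
    c, s = ds * math.cos(t), ds * math.sin(t)
    return [
        [c, -s, dx],
        [s, c, dy],
        [0.0, 0.0, 1.0],
    ]


def transform_corners(matrix, width, height):
    """Map the four corners of a width x height image through the matrix."""
    corners = [(0, 0), (width, 0), (width, height), (0, height)]
    result = []
    for x, y in corners:
        # Homogeneous point (x, y, 1); the last row is (0, 0, 1), so w stays 1.
        new_x = matrix[0][0] * x + matrix[0][1] * y + matrix[0][2]
        new_y = matrix[1][0] * x + matrix[1][1] * y + matrix[1][2]
        result.append((new_x, new_y))
    return result


box = transform_corners(similarity_matrix(30.0, 40.0, 2.0, 90.0), 100, 50)
```

The four points in `box` are the corners of the detected object's outline in image 2, which you can then draw as a closed polygon.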

