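I have been implementing an adaptation of Viola-Jones' face detection algorithm. The technique relies on placing a 24x24-pixel subframe within an image, and then placing rectangular features inside it at every position and with every possible size.
These features can consist of two, three, or four rectangles. The following example is presented.
They claim the exhaustive set is over 180k (Section 2):
The following statements are not explicitly stated in the paper, so they are assumptions on my part:
Based on these assumptions, I counted the exhaustive set: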

    #include <stdio.h>

    int main(void)
    {
        const int frameSize = 24;
        const int features = 5;
        // All five feature types (base width, base height):
        const int feature[][2] = {{2,1}, {1,2}, {3,1}, {1,3}, {2,2}};

        int count = 0;
        // Each feature:
        for (int i = 0; i < features; i++) {
            int sizeX = feature[i][0];
            int sizeY = feature[i][1];
            // Each position:
            for (int x = 0; x <= frameSize-sizeX; x++) {
                for (int y = 0; y <= frameSize-sizeY; y++) {
                    // Each size fitting within the frameSize:
                    for (int width = sizeX; width <= frameSize-x; width+=sizeX) {
                        for (int height = sizeY; height <= frameSize-y; height+=sizeY) {
                            count++;
                        }
                    }
                }
            }
        }
        printf("%d\n", count);
        return 0;
    }

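The result is 162,336.
The only way I found to get close to the "over 180,000" that Viola & Jones speak of is to drop assumption #4 and introduce bugs into the code. This involves changing four lines respectively to: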

    for (int width = 0; width < frameSize-x; width+=sizeX)
    for (int height = 0; height < frameSize-y; height+=sizeY)

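The result is then 180,625. (Note that this effectively prevents the features from ever touching the right and/or bottom edge of the subframe.)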
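Now, of course, the question: did they make a mistake in their implementation? Does it make any sense to consider features with a surface of zero? Or am I seeing it the wrong way?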
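Best answer
On closer look, your code looks correct to me, which makes one wonder whether the original authors had an off-by-one bug. I guess someone ought to look at how OpenCV implements it!
Nonetheless, one suggestion to make it easier to understand is to flip the order of the for loops: go over all sizes first, then loop over the possible positions for the given size: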

    #include <stdio.h>

    int main(void)
    {
        int i, x, y, sizeX, sizeY, width, height, count, c;

        /* All five shape types */
        const int features = 5;
        const int feature[][2] = {{2,1}, {1,2}, {3,1}, {1,3}, {2,2}};
        const int frameSize = 24;

        count = 0;
        /* Each shape */
        for (i = 0; i < features; i++) {
            sizeX = feature[i][0];
            sizeY = feature[i][1];
            printf("%dx%d shapes:\n", sizeX, sizeY);

            /* each size (multiples of basic shapes) */
            for (width = sizeX; width <= frameSize; width+=sizeX) {
                for (height = sizeY; height <= frameSize; height+=sizeY) {
                    printf("\tsize: %dx%d => ", width, height);
                    c=count;

                    /* each possible position given size */
                    for (x = 0; x <= frameSize-width; x++) {
                        for (y = 0; y <= frameSize-height; y++) {
                            count++;
                        }
                    }
                    printf("count: %d\n", count-c);
                }
            }
        }
        printf("%d\n", count);

        return 0;
    }

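This gives the same result as before: 162336.
To verify it, I tested the case of a 4x4 window and checked all cases by hand (easy to count, since the 1x2/2x1 and 1x3/3x1 shapes are the same, only rotated by 90 degrees):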

    2x1 shapes:
        size: 2x1 => count: 12
        size: 2x2 => count: 9
        size: 2x3 => count: 6
        size: 2x4 => count: 3
        size: 4x1 => count: 4
        size: 4x2 => count: 3
        size: 4x3 => count: 2
        size: 4x4 => count: 1
    1x2 shapes:
        size: 1x2 => count: 12      +-----------------------+
        size: 1x4 => count: 4       |     |     |     |     |
        size: 2x2 => count: 9       |     |     |     |     |
        size: 2x4 => count: 3       +-----+-----+-----+-----+
        size: 3x2 => count: 6       |     |     |     |     |
        size: 3x4 => count: 2       |     |     |     |     |
        size: 4x2 => count: 3       +-----+-----+-----+-----+
        size: 4x4 => count: 1       |     |     |     |     |
    3x1 shapes:                     |     |     |     |     |
        size: 3x1 => count: 8       +-----+-----+-----+-----+
        size: 3x2 => count: 6       |     |     |     |     |
        size: 3x3 => count: 4       |     |     |     |     |
        size: 3x4 => count: 2       +-----------------------+
    1x3 shapes:
        size: 1x3 => count: 8       Total Count = 136
        size: 2x3 => count: 6
        size: 3x3 => count: 4
        size: 4x3 => count: 2
    2x2 shapes:
        size: 2x2 => count: 9
        size: 2x4 => count: 3
        size: 4x2 => count: 3
        size: 4x4 => count: 1

This question about Viola-Jones' face detection claiming 180k features originally appeared on Stack Overflow: https://stackoverflow.com/questions/1707620/