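I've been implementing an adaptation of Viola-Jones' face detection algorithm. The technique relies on placing a 24x24 pixel subframe within an image, and then placing rectangular features inside it at every position, in every possible size.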

These features can consist of two, three or four rectangles. The following examples are given:

[Figure: example rectangle features, labelled A to D]

They claim that the exhaustive set contains more than 180k features (section 2).



The following statements are not explicitly stated in the paper, so they are assumptions on my part:

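  • There are only 2 two-rectangle features, 2 three-rectangle features and 1 four-rectangle feature. The logic behind this is that we are observing the difference between the highlighted rectangles, not the color, luminance or anything of that sort explicitly.
  • We cannot define feature type A as a 1x1 pixel block; it must be at least 1x2 pixels. Likewise, type D must be at least 2x2 pixels, and this rule applies correspondingly to the other features.
  • We cannot define feature type A as a 1x3 pixel block. The middle pixel cannot be split, and subtracting it from itself makes the feature identical to a 1x2 pixel block, so this feature type is only defined for even widths. Likewise, the width of feature type C must be divisible by 3, and this rule applies correspondingly to the other features.
  • We cannot define a feature with a width and/or height of 0. Therefore, we iterate x and y up to 24 minus the size of the feature.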

Based on these assumptions, I've counted the exhaustive set:
    #include <cstdio>

    int main() {
        const int frameSize = 24;
        const int features = 5;
        // All five feature types, as {width, height} of the base shape:
        const int feature[features][2] = {{2,1}, {1,2}, {3,1}, {1,3}, {2,2}};

        int count = 0;
        // Each feature:
        for (int i = 0; i < features; i++) {
            int sizeX = feature[i][0];
            int sizeY = feature[i][1];
            // Each position:
            for (int x = 0; x <= frameSize-sizeX; x++) {
                for (int y = 0; y <= frameSize-sizeY; y++) {
                    // Each size fitting within the frameSize:
                    for (int width = sizeX; width <= frameSize-x; width+=sizeX) {
                        for (int height = sizeY; height <= frameSize-y; height+=sizeY) {
                            count++;
                        }
                    }
                }
            }
        }
        std::printf("%d\n", count);
        return 0;
    }
    

The result is 162,336.
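As a cross-check of 162,336 (a derivation, not part of the original post), the count factors per axis. The loops over width and x are independent of those over height and y. For a base dimension s in a 24-pixel frame, the k-th multiple ks fits at 25 - ks offsets (x from 0 to 24 - ks), so a base shape sx x sy contributes A(sx) * A(sy):

```latex
\begin{aligned}
A(s) &= \sum_{k=1}^{\lfloor 24/s \rfloor} (25 - ks), \qquad A(1) = 300,\quad A(2) = 144,\quad A(3) = 92 \\
N &= \underbrace{2A(2)A(1)}_{2\times1,\ 1\times2} + \underbrace{2A(3)A(1)}_{3\times1,\ 1\times3} + \underbrace{A(2)^2}_{2\times2} \\
  &= 86400 + 55200 + 20736 = 162336
\end{aligned}
```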

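The only way I found to get anywhere near Viola & Jones's "over 180,000" is to drop assumption #4 and introduce bugs into the code. This involves changing two lines, respectively, to: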
    for (int width = 0; width < frameSize-x; width+=sizeX)
    for (int height = 0; height < frameSize-y; height+=sizeY)
    

The result is 180,625. (Note that this effectively prevents the features from ever touching the right and/or bottom of the subframe.)

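Now, of course, the question: did they make a mistake in their implementation? Does it make any sense to consider features with a surface of zero? Or am I seeing it the wrong way?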

Best answer

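On closer inspection, your code looks correct to me, which makes one wonder whether the original authors had an off-by-one bug. I guess someone ought to look at how OpenCV implements it!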

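Nonetheless, one suggestion to make it easier to understand is to flip the order of the for loops. First go over all sizes, then loop over the possible positions for a given size: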

    #include <stdio.h>
    int main()
    {
        int i, x, y, sizeX, sizeY, width, height, count, c;
    
        /* All five shape types */
        const int features = 5;
        const int feature[][2] = {{2,1}, {1,2}, {3,1}, {1,3}, {2,2}};
        const int frameSize = 24;
    
        count = 0;
        /* Each shape */
        for (i = 0; i < features; i++) {
            sizeX = feature[i][0];
            sizeY = feature[i][1];
            printf("%dx%d shapes:\n", sizeX, sizeY);
    
            /* each size (multiples of basic shapes) */
            for (width = sizeX; width <= frameSize; width+=sizeX) {
                for (height = sizeY; height <= frameSize; height+=sizeY) {
                    printf("\tsize: %dx%d => ", width, height);
                    c=count;
    
                    /* each possible position given size */
                    for (x = 0; x <= frameSize-width; x++) {
                        for (y = 0; y <= frameSize-height; y++) {
                            count++;
                        }
                    }
                    printf("count: %d\n", count-c);
                }
            }
        }
        printf("%d\n", count);
    
        return 0;
    }
    

Same result as before: 162336

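To verify it, I tested the case of a 4x4 window and checked all cases by hand. This is easy to count, since the 1x2/2x1 and 1x3/3x1 shapes are the same, only rotated by 90 degrees: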
    2x1 shapes:
            size: 2x1 => count: 12
            size: 2x2 => count: 9
            size: 2x3 => count: 6
            size: 2x4 => count: 3
            size: 4x1 => count: 4
            size: 4x2 => count: 3
            size: 4x3 => count: 2
            size: 4x4 => count: 1
    1x2 shapes:
            size: 1x2 => count: 12             +-----------------------+
            size: 1x4 => count: 4              |     |     |     |     |
            size: 2x2 => count: 9              |     |     |     |     |
            size: 2x4 => count: 3              +-----+-----+-----+-----+
            size: 3x2 => count: 6              |     |     |     |     |
            size: 3x4 => count: 2              |     |     |     |     |
            size: 4x2 => count: 3              +-----+-----+-----+-----+
            size: 4x4 => count: 1              |     |     |     |     |
    3x1 shapes:                                |     |     |     |     |
            size: 3x1 => count: 8              +-----+-----+-----+-----+
            size: 3x2 => count: 6              |     |     |     |     |
            size: 3x3 => count: 4              |     |     |     |     |
            size: 3x4 => count: 2              +-----------------------+
    1x3 shapes:
            size: 1x3 => count: 8                  Total Count = 136
            size: 2x3 => count: 6
            size: 3x3 => count: 4
            size: 4x3 => count: 2
    2x2 shapes:
            size: 2x2 => count: 9
            size: 2x4 => count: 3
            size: 4x2 => count: 3
            size: 4x4 => count: 1
    

For the question "Viola-Jones' face detection claims 180k features", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/1707620/

    10-13 06:56