I am looking for a good library for stitching images of text. I tried OpenCV and OpenPano. They both work well on regular photos, but fail on text. For example, I need to stitch the following 3 images:

[input image 1]
[input image 2]
[input image 3]

The images overlap each other by about 45%.

It would be great if one of the libraries mentioned could be made to work well on text images, rather than having to look for yet another library.

  • I need the library to work on Linux ARM.
  • Best answer

    OpenPano fails at stitching text because it cannot retrieve enough feature points (or keypoints) to run the stitching process.
    Text stitching does not need a matching method that is robust to rotation; it only needs to be robust to translation. OpenCV conveniently provides such a function. It is called Template Matching.
    The solution I developed is based on this OpenCV feature.
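
    As a side note (not part of the original answer), a minimal sketch like the one below can be used to inspect what a feature detector finds on one of the input images; text produces many repetitive glyph shapes, so the resulting keypoints make for ambiguous matches, which is consistent with OpenPano failing here. The file name part1.jpg is an assumption borrowed from the main program further down.
    import cv2
    
    img = cv2.imread('part1.jpg', cv2.IMREAD_GRAYSCALE)  # hypothetical input file
    orb = cv2.ORB_create()                               # a typical feature detector
    keypoints = orb.detect(img, None)                    # detect only, descriptors are not needed here
    print(len(keypoints), "keypoints")                   # repetitive text tends to yield ambiguous keypoints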

    Pipeline
    I will now explain the main steps of my solution (for more details, see the code provided below).
    Matching process
    To match two consecutive images (done in the matchImages function, see code below):

  • We create a template image by taking the rightmost 45% (H_templ_ratio) of the first image, as shown below:

  • [image: the template taken from the first image]
    This step is done in my code by the genTemplate function.
  • We add black margins to the second image (in which we want to find the template). This step is necessary when the text is not aligned across the input images (which is in fact the case for these sample images). Here is what the image looks like after the margin process. As you can see, margins are only needed above and below the image:

  • [image: the second image after black margins were added above and below]
    In theory, the template image can then be found anywhere inside this enlarged image. This process is done in the addBlackMargins function.
  • We apply a Canny filter on both the template image and the image where we want to find it (done inside the mat2Edges function). This adds robustness to the matching process. Here is an example:

  • [image: Canny edges of the images being matched]
  • We match the template with the image using matchTemplate and retrieve the best match position with the minMaxLoc function (a condensed sketch of these matching steps follows this list).
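
    To make the four steps above concrete, here is a condensed, self-contained sketch of the matching stage (a simplified version of the matchImages function given below; the file names and the 0.45 ratio are assumptions taken from the main program):
    import cv2
    
    img1 = cv2.imread('part1.jpg', cv2.IMREAD_GRAYSCALE)  # hypothetical file names
    img2 = cv2.imread('part2.jpg', cv2.IMREAD_GRAYSCALE)
    
    h1, w1 = img1.shape[:2]
    template = cv2.Canny(img1[:, int(w1 * (1 - 0.45)):], 100, 200)  # edges of the rightmost 45%
    h_t = template.shape[0]
    
    # black margins above and below, so the template may match anywhere vertically
    enlarged = cv2.copyMakeBorder(img2, h_t, h_t, 0, 0, cv2.BORDER_CONSTANT, value=0)
    enlarged = cv2.Canny(enlarged, 100, 200)
    
    res = cv2.matchTemplate(enlarged, template, cv2.TM_CCOEFF)
    _, _, _, best = cv2.minMaxLoc(res)       # best match position (x, y) in the enlarged image
    print((best[0], best[1] - h_t))          # subtract the top margin to get the true position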

  • Computing the final image size
    This step consists of computing the size of the final matrix, in which we will stitch all the images together. It is particularly needed if the input images don't all have the same height.
    This step is done inside the calcFinalImgSize function. I won't go into much detail here because even though it looks a bit complex (to me at least), it is only simple math (additions, subtractions, multiplications). Grab a pen and paper if you want to understand the formulas; the worked example below may also help.
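    For instance, here is the same arithmetic run on made-up numbers (a 100x500 first image, a 110x520 second image, a match found at (20, 5), H_templ_ratio = 0.45); these values are purely illustrative:
    # hypothetical numbers, chosen only to illustrate the math in calcFinalImgSize
    h1, w1 = 100, 500          # first image (height, width)
    h2, w2 = 110, 520          # second image
    match_x, match_y = 20, 5   # template position found in the second image
    H_templ_ratio = 0.45
    
    margin_top = match_y                       # second image sticks out 5 px above the first
    margin_bottom = (h2 - match_y) - h1        # and (110 - 5) - 100 = 5 px below it
    h_final = h1 + margin_top + margin_bottom  # 110 rows are enough for both images
    
    templ_w = int(w1 * H_templ_ratio)          # the template is 225 px wide
    w_final = w1 + (w2 - templ_w - match_x)    # 500 + (520 - 225 - 20) = 775 columns
    print(h_final, w_final)                    # -> 110 775
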
    Stitching process
    Once we have the match position for each input image, we only need to do simple math to copy the input images into the right place in the final image. Again, I recommend checking the code for implementation details (see the stitchImages function); a short sketch of the placement formulas follows.
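
    Continuing the made-up numbers from the sizing example above, the placement of the second image inside the final matrix works out like this (a sketch of the formulas used in stitchImages, not the function itself):
    # same hypothetical numbers as in the sizing example above
    h1, w1, h2, w2 = 100, 500, 110, 520
    match_x, match_y = 20, 5
    H_templ_ratio = 0.45
    y_offset = 5                               # margin_top computed previously
    
    origin = (y_offset, 0)                     # top-left corner of the first stitched image (y, x)
    templ_x = int(w1 * (1 - H_templ_ratio))    # template's left edge inside the first image: 275
    y1 = origin[0] - match_y                   # 5 - 5 = 0: second image starts at the very top
    x1 = origin[1] + templ_x - match_x         # 0 + 275 - 20 = 255
    print((y1, x1), (y1 + h2, x1 + w2))        # -> (0, 255) (110, 775), exactly the final size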

    Results
    Here is the result with your input images:
    [image: stitched result of the three input images]
    As you can see, the result is not "pixel perfect", but it should be good enough for OCR.
    And here is another result with input images of different heights:
    [image: stitched result with input images of different heights]

    Code (Python)
    My program is written in Python and uses the cv2 (OpenCV) and numpy modules. However, it can (easily) be ported to other languages like C++, Java and C#.
    import numpy as np
    import cv2
    
    def genTemplate(img):
        global H_templ_ratio
        # we get the image's width and height
        h, w = img.shape[:2]
        # we compute the template's bounds
        x1 = int(float(w)*(1-H_templ_ratio))
        y1 = 0
        x2 = w
        y2 = h
        return(img[y1:y2,x1:x2]) # and crop the input image
    
    def mat2Edges(img): # applies a Canny filter to get the edges
        edged = cv2.Canny(img, 100, 200)
        return(edged)
    
    def addBlackMargins(img, top, bottom, left, right): # top, bottom, left, right: margins width in pixels
        h, w = img.shape[:2]
        result = np.zeros((h+top+bottom, w+left+right, 3), np.uint8)
        result[top:top+h,left:left+w] = img
        return(result)
    
    # return the y_offset of the first image to stitch and the final image size needed
    def calcFinalImgSize(imgs, loc):
        global H_templ_ratio
        y_offset = 0
        max_margin_top = 0; max_margin_bottom = 0 # maximum margins that will be needed above and below the first image in order to stitch all the images into one mat
        current_margin_top = 0; current_margin_bottom = 0
    
        h_init, w_init = imgs[0].shape[:2]
        w_final = w_init
    
        for i in range(0,len(loc)):
            h, w = imgs[i].shape[:2]
            h2, w2 = imgs[i+1].shape[:2]
            # we compute the max top/bottom margins that will be needed (relatively to the first input image) in order to stitch all the images
            current_margin_top += loc[i][1] # here, we assume that the template top-left corner Y-coordinate is 0 (relatively to its original image)
            current_margin_bottom += (h2 - loc[i][1]) - h
            if(current_margin_top > max_margin_top): max_margin_top = current_margin_top
            if(current_margin_bottom > max_margin_bottom): max_margin_bottom = current_margin_bottom
            # we compute the width needed for the final result
            w_templ = int(float(w)*H_templ_ratio) # width of the template (it covers the rightmost H_templ_ratio part of its original image)
            w_final += (w2 - w_templ - loc[i][0]) # width needed to stitch all the images into one mat
    
        h_final = h_init + max_margin_top + max_margin_bottom
        return (max_margin_top, h_final, w_final)
    
    # match each input image with its following image (1->2, 2->3)
    def matchImages(imgs, templates_loc):
        for i in range(0,len(imgs)-1):
            template = genTemplate(imgs[i])
            template = mat2Edges(template)
            h_templ, w_templ = template.shape[:2]
            # Apply template Matching
            margin_top = margin_bottom = h_templ; margin_left = margin_right = 0
            img = addBlackMargins(imgs[i+1],margin_top, margin_bottom, margin_left, margin_right) # we need to enlarge the input image prior to call matchTemplate (template needs to be strictly smaller than the input image)
            img = mat2Edges(img)
            res = cv2.matchTemplate(img,template,cv2.TM_CCOEFF) # matching function
            _, _, _, templ_pos = cv2.minMaxLoc(res) # minMaxLoc gets the best match position
            # as we added margins to the input image we need to subtract the margins width to get the template position relatively to the initial input image (without the black margins)
            rectified_templ_pos = (templ_pos[0]-margin_left, templ_pos[1]-margin_top)
            templates_loc.append(rectified_templ_pos)
            print("max_loc", rectified_templ_pos)
    
    def stitchImages(imgs, templates_loc):
        y_offset, h_final, w_final = calcFinalImgSize(imgs, templates_loc) # we calculate the "surface" needed to stitch all the images into one mat (and y_offset, the Y offset of the first image to be stitched)
        result = np.zeros((h_final, w_final, 3), np.uint8)
    
        #initial stitch
        h_init, w_init = imgs[0].shape[:2]
        result[y_offset:y_offset+h_init, 0:w_init] = imgs[0]
        origin = (y_offset, 0) # top-left corner of the last stitched image (y,x)
        # stitching loop
        for j in range(0,len(templates_loc)):
            h, w = imgs[j].shape[:2]
            h2, w2 = imgs[j+1].shape[:2]
            # we compute the coordinates where to stitch imgs[j+1]
            y1 = origin[0] - templates_loc[j][1]
            y2 = origin[0] - templates_loc[j][1] + h2
            x_templ = int(float(w)*(1-H_templ_ratio)) # x-coordinate of the template's left edge inside its original image
            x1 = origin[1] + x_templ - templates_loc[j][0]
            x2 = origin[1] + x_templ - templates_loc[j][0] + w2
            result[y1:y2, x1:x2] = imgs[j+1] # we copy the input image into the result mat
            origin = (y1,x1) # we update the origin point with the last stitched image
    
        return(result)
    
    if __name__ == '__main__':
    
        # input images
        part1 = cv2.imread('part1.jpg')
        part2 = cv2.imread('part2.jpg')
        part3 = cv2.imread('part3.jpg')
        imgs = [part1, part2, part3]
    
        H_templ_ratio = 0.45 # H_templ_ratio: horizontal ratio of the input that we will keep to create a template
        templates_loc = [] # templates location
    
        matchImages(imgs, templates_loc)
    
        result = stitchImages(imgs, templates_loc)
    
        cv2.imshow("result", result)
        cv2.waitKey(0) # wait for a key press, otherwise the window closes immediately
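
    A note on the Linux ARM requirement: cv2.imshow needs GUI support and will fail on a headless board, so writing the result to disk is a safer last step. You can replace the two display lines above with the following (the output file name is only an example):
    cv2.imwrite("result.jpg", result) # save the stitched image instead of displaying it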
    

    A similar question about panoramic stitching of text can be found on Stack Overflow: https://stackoverflow.com/questions/45612933/
