I am trying to call this C++ function from Python:

TESS_API BOOL TESS_CALL TessBaseAPIProcessPages(TessBaseAPI* handle, const char* filename,
  const char* retry_config, int timeout_millisec, TessResultRenderer* renderer)
{
    if (handle->ProcessPages(filename, retry_config, timeout_millisec, renderer))
        return TRUE;
    else
        return FALSE;
}

The last parameter of this function is a TessResultRenderer*. There is another C++ function that creates a TessResultRenderer:
TESS_API TessResultRenderer* TESS_CALL TessTextRendererCreate(const char* outputbase)
{
    return new TessTextRenderer(outputbase);
}

Now, when calling it from my Python code, I did the following:
outputbase = "stdout"
renderer = tesseract.TessTextRendererCreate(outputbase)
text_out = tesseract.TessBaseAPIProcessPages(api,
     ctypes.create_string_buffer(path),
     None, 0, renderer)  # Segmentation fault (core dumped) on this line

But I keep getting a Segmentation fault error.

My question is: how do I call TessBaseAPIProcessPages from Python?

More reference links from the code base:

referer api

Implementation of processPages(...)

Edit

After trying the suggestions from the comments, I did the following, but I now get the error: item 1 in _argtypes_ has no from_param method
PTessResultRenderer = ctypes.POINTER(TessResultRenderer)
self.tesseract.TessTextRendererCreate.restype = PTessResultRenderer
outputbase = "stdout"
self.tesseract.TessTextRendererCreate.argtypes = [outputbase] #error here
self.tesseract.TessTextRendererCreate

ReturnVal = ctypes.c_bool
self.tesseract.TessBaseAPIProcessPages.argtypes = [self.api, path, None, 0, PTessResultRenderer]
self.tesseract.TessBaseAPIProcessPages.restype = ReturnVal
self.tesseracto.TessBaseAPIProcessPages

class TessResultRenderer(ctypes.Structure):
    pass
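
For reference, this error message comes from ctypes itself rather than from Tesseract: argtypes has to be a sequence of ctypes types (such as c_char_p), not the values that will later be passed. A minimal sketch that reproduces it without Tesseract; fake_create is a hypothetical Python callback that only stands in for a C function taking a const char*:

```python
import ctypes

# Hypothetical stand-in for a C function taking a const char*. It is a Python
# callback wrapped by ctypes, not part of Tesseract.
PROTOTYPE = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_char_p)


@PROTOTYPE
def fake_create(outputbase):
    return len(outputbase)


outputbase = b"stdout"  # bytes, because c_char_p does not accept str in Python 3

# Wrong: a value inside argtypes. ctypes rejects it at assignment time.
try:
    fake_create.argtypes = [outputbase]
except TypeError as exc:
    error_message = str(exc)

# Right: declare the type once, then pass the value when calling.
fake_create.argtypes = [ctypes.c_char_p]
result = fake_create(outputbase)
```

The same applies to TessBaseAPIProcessPages: its argtypes list should name types only, and the actual handle, path, and renderer are supplied in the call.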

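Best answer

The contrib folder contains an example that uses the Tesseract C API from ctypes, although it seems to be a bit outdated: contrib/tesseract-c_api-demo.py

You need to set restype and argtypes for some of the functions. Also, don't forget to call the init function on the handle. The following example works for me. It reads the text from an English-language file named "test.bmp" into the variable text.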

from ctypes import *
from ctypes.util import find_library

lang = b"eng"
filename = b"test.bmp"
TESSDATA_PREFIX = b"/usr/local/Cellar/tesseract/3.04.01_1/share/tessdata"

# find_library expects the bare library name, without the "lib" prefix or file suffix.
path = find_library("tesseract")
tesseract = CDLL(path)

class TessBaseAPI(Structure):
    pass
class TessResultRenderer(Structure):
    pass

tesseract.TessBaseAPICreate.restype = POINTER(TessBaseAPI)
tesseract.TessBaseAPIInit3.argtypes = [POINTER(TessBaseAPI), c_char_p, c_char_p]
tesseract.TessBaseAPIInit3.restype = c_int  # the C API declares an int result: 0 on success
tesseract.TessBaseAPIProcessPages.argtypes = [POINTER(TessBaseAPI), c_char_p, c_char_p, c_int, POINTER(TessResultRenderer)]
tesseract.TessBaseAPIProcessPages.restype = c_bool
tesseract.TessBaseAPIGetUTF8Text.argtypes = [POINTER(TessBaseAPI)]
tesseract.TessBaseAPIGetUTF8Text.restype = c_char_p

api = tesseract.TessBaseAPICreate()
# TessBaseAPIInit3 returns 0 on success and a non-zero value on failure.
rc = tesseract.TessBaseAPIInit3(api, TESSDATA_PREFIX, lang)
if rc != 0:
    tesseract.TessBaseAPIDelete(api)
    print("Could not initialize tesseract.\n")
    exit(3)

success = tesseract.TessBaseAPIProcessPages(api, filename, None , 0, None)

if success:
    text = tesseract.TessBaseAPIGetUTF8Text(api)
    print("="*78)
    print(text.decode("utf-8").strip())
    print("="*78)

The output looks like this:
==============================================================================
This is a lot of 12 point text to test the
ocr code and see if it works on all types
of file format.

The quick brown dog jumped over the
lazy fox. The quick brown dog jumped
over the lazy fox. The quick brown dog
jumped over the lazy fox. The quick
brown dog jumped over the lazy fox.
==============================================================================
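
The example above passes None as the renderer. To pass a text renderer, as in the question, the two functions quoted there need prototypes as well. Without a restype, ctypes assumes a C int return value, which can truncate the returned pointer on 64-bit platforms and is a likely cause of the segmentation fault. A minimal sketch, assuming lib is the loaded CDLL and api is an initialized handle; the helper names declare_renderer_prototypes and process_with_text_renderer are made up for illustration, and how outputbase is interpreted may depend on your Tesseract version:

```python
import ctypes
from ctypes import POINTER, c_bool, c_char_p, c_int


class TessBaseAPI(ctypes.Structure):
    pass  # opaque handle, fields are never accessed from Python


class TessResultRenderer(ctypes.Structure):
    pass  # opaque handle


def declare_renderer_prototypes(lib):
    """Declare argument and return types for the renderer-related calls."""
    lib.TessTextRendererCreate.argtypes = [c_char_p]
    lib.TessTextRendererCreate.restype = POINTER(TessResultRenderer)
    # Assumption: your libtesseract build exports TessDeleteResultRenderer.
    lib.TessDeleteResultRenderer.argtypes = [POINTER(TessResultRenderer)]
    lib.TessDeleteResultRenderer.restype = None
    lib.TessBaseAPIProcessPages.argtypes = [
        POINTER(TessBaseAPI), c_char_p, c_char_p, c_int,
        POINTER(TessResultRenderer),
    ]
    lib.TessBaseAPIProcessPages.restype = c_bool


def process_with_text_renderer(lib, api, filename, outputbase=b"stdout"):
    """Run OCR on filename and write the text through a text renderer."""
    renderer = lib.TessTextRendererCreate(outputbase)
    if not renderer:  # a NULL pointer is falsy in ctypes
        raise RuntimeError("TessTextRendererCreate returned NULL")
    try:
        return bool(lib.TessBaseAPIProcessPages(api, filename, None, 0, renderer))
    finally:
        # Free the renderer even if processing fails.
        lib.TessDeleteResultRenderer(renderer)
```

With the library and handle from the example above, this would be used as declare_renderer_prototypes(tesseract) followed by ok = process_with_text_renderer(tesseract, api, filename). Note that both string arguments are bytes, not str.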

Edit: As suggested by eryksun, replaced the use of c_void_p with opaque types. Thanks!
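
One way to see what the opaque types buy: a c_void_p return value reaches Python as a plain integer, and an integer passed to a function that has no argtypes is converted to a C int, so a 64-bit address can be silently truncated. A typed POINTER(...) also lets ctypes reject a handle of the wrong kind. A small sketch that runs without Tesseract; the address and the takes_api callback are invented for the demonstration:

```python
import ctypes


class TessBaseAPI(ctypes.Structure):
    pass  # opaque


class TessResultRenderer(ctypes.Structure):
    pass  # opaque


# A made-up 64-bit address, as a c_void_p restype would hand it to Python.
address = 0x7F1234567890
# What survives if that integer is then passed on as a C int (the ctypes
# default for Python integers when no argtypes are declared).
truncated = ctypes.c_int(address).value


# Hypothetical stand-in for a C function that expects a TessBaseAPI*.
@ctypes.CFUNCTYPE(ctypes.c_int, ctypes.POINTER(TessBaseAPI))
def takes_api(handle):
    return 1


null_api = ctypes.POINTER(TessBaseAPI)()
null_renderer = ctypes.POINTER(TessResultRenderer)()

accepted = takes_api(null_api)  # matching pointer type is accepted

try:
    takes_api(null_renderer)  # wrong handle type
    rejected = False
except ctypes.ArgumentError:
    rejected = True  # ctypes refuses the call before any C code runs
```

With c_void_p in argtypes instead, both pointers would be accepted, so mixing up an API handle and a renderer would only show up as a crash inside the library.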

Regarding "python - Segmentation fault when calling a cpp function from Python", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/36871072/

10-11 22:30