Problem description
I have a TIFF file that contains text in columns separated by tabs (4 spaces). But when I extract the text from this TIFF image, I always get a single space between two columns. An example:
TIFF image:
col-a    col-b    col-c
Desired output:
col-a    col-b    col-c
But I am getting the following:
col-a col-b col-c
I tried this with multiple images of the same format, but the result is always the same. How do I fix this issue? Can I train Tesseract to understand this?
Recommended answer
Tesseract compresses consecutive spaces into one. You would need to modify baseapi.cpp to preserve the spaces. The code change can be found in the following posts:
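Separately from patching baseapi.cpp, and as an assumption not stated in the answer above: recent Tesseract releases (to my knowledge, 3.04 onward) expose a preserve_interword_spaces config variable that can be set on the command line with -c, so it may be worth trying before rebuilding from source. The sketch below builds such a command and then turns each run of two or more spaces in the OCR output into a tab. The helper names (build_tesseract_command, spaces_to_tabs, run_ocr) and the min_run threshold are hypothetical, chosen only for illustration.

```python
import re
import subprocess


def build_tesseract_command(image_path, output_base):
    """Build a CLI call asking Tesseract to keep inter-word spacing.

    Assumes the installed Tesseract supports the
    preserve_interword_spaces variable (not confirmed by the answer).
    """
    return [
        "tesseract", image_path, output_base,
        "-c", "preserve_interword_spaces=1",
    ]


def spaces_to_tabs(line, min_run=2):
    """Replace each run of at least `min_run` spaces with one tab."""
    return re.sub(r" {%d,}" % min_run, "\t", line)


def run_ocr(image_path, output_base):
    """Run Tesseract and return the output lines with columns tab-separated."""
    subprocess.run(build_tesseract_command(image_path, output_base), check=True)
    # The Tesseract CLI writes plain text to <output_base>.txt.
    with open(output_base + ".txt", encoding="utf-8") as f:
        return [spaces_to_tabs(line.rstrip("\n")) for line in f]
```

For example, run_ocr("table.tif", "table") would write table.txt and return its lines. Single spaces inside a column are left alone, so min_run may need tuning if the recognized column gaps come out narrower than expected.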