Problem description
I want to use Tesseract to recognize only numbers. The problem is that my input is a mixture of digits and letters, and when I use SetVariable("tessedit_char_whitelist", "0123456789"), Tesseract returns a wrong digit for every symbol.
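One way to picture why a whitelist produces wrong digits instead of dropping letters is the toy model below. It is not Tesseract's real classifier: the score dictionary and the function name best_whitelisted are hypothetical. The point it illustrates is that restricting the candidate set still picks the best remaining candidate, however poor that candidate's score is.

```python
from typing import Dict, Optional

# Toy model only, NOT Tesseract's classifier. `scores` maps each candidate
# character to a hypothetical similarity score for a single glyph.
def best_whitelisted(scores: Dict[str, float], whitelist: str) -> Optional[str]:
    # Discard every candidate that is not in the whitelist.
    allowed = {ch: s for ch, s in scores.items() if ch in whitelist}
    if not allowed:
        return None
    # The highest-scoring allowed character wins, even if its score is low.
    return max(allowed, key=allowed.get)

# Hypothetical scores for a glyph that is really the letter "A".
glyph_a = {"A": 0.92, "4": 0.31, "8": 0.12}
print(best_whitelisted(glyph_a, "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"))  # A
print(best_whitelisted(glyph_a, "0123456789"))  # 4
```

With the full alphabet allowed the glyph is read as "A"; with digits only, the same glyph is forced to its closest digit, "4".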
Can I set a threshold value so that Tesseract omits symbols with low resemblance?
NOTE: I set Tesseract to recognize only digits, so there is no confusion between O and 0.
Recommended answer
Recognizing only numbers is actually covered in the Tesseract FAQ. See that page for more information, but if you have the version 3 package, the config files are already set up. You just specify on the command line:
tesseract image.tif outputbase nobatch digits
As for the threshold value, I'm not sure what you mean. If your input uses an unusual font, you could retrain Tesseract on a sample of your input. An alternative is to change Tesseract's pruning threshold. Both options are also mentioned in the FAQ.
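A different technique from the two above, and not one the FAQ is cited for here, is to post-filter the recognized symbols by confidence yourself. Tesseract's C++ API can report a confidence per symbol through its result iterator (for example ResultIterator::Confidence at the RIL_SYMBOL level). The sketch below assumes you have already collected (symbol, confidence) pairs on a 0 to 100 scale. The helper name keep_confident_digits, the default cutoff of 60, and the sample values are all hypothetical. It also assumes that letters forced into digits by the whitelist tend to get low confidence, which you would need to verify on your own images.

```python
from typing import Iterable, Tuple

# Each pair is (recognized symbol, confidence), confidence assumed 0-100.
# `min_conf` is a hypothetical cutoff; tune it on real data.
def keep_confident_digits(symbols: Iterable[Tuple[str, float]],
                          min_conf: float = 60.0) -> str:
    kept = []
    for text, conf in symbols:
        # Keep only ASCII digits whose confidence reaches the threshold.
        if text.isascii() and text.isdigit() and conf >= min_conf:
            kept.append(text)
    return "".join(kept)

# Made-up results: the "4" is assumed to be a letter misread as a digit.
sample = [("1", 91.0), ("2", 88.5), ("4", 23.0), ("7", 95.2)]
print(keep_confident_digits(sample))  # 127
```

Filtering after recognition keeps the threshold logic outside Tesseract, so you can adjust the cutoff without retraining or touching internal parameters.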