How to make tesseract recognize only digits when digits and letters are mixed

Question

I want to use tesseract to recognize only numbers. The problem is that I have a mixture of numbers and letters, and when I use SetVariable("tessedit_char_whitelist", "0123456789")
tesseract returns a wrong digit for every symbol.

Can I set a threshold value so that tesseract omits the symbols with low resemblance?

NOTE: I set tesseract to recognize only digits so there is no confusion between O and 0.

Recommended answer

Recognizing only numbers is actually covered on the tesseract FAQ page. See that page for more info, but if you have the version 3 package, the config files are already set up. You just specify on the command line:

tesseract image.tif outputbase nobatch digits
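If you want to script the command above, the following Python sketch assembles the same argument list (`tesseract <image> <outputbase> nobatch digits`) and runs it with `subprocess`. It also shows how you might write your own digits-only config file. A config file is plain text with one `variable value` pair per line. The helper names and the config file name `mydigits` are hypothetical. Passing a config file by path, rather than by a name found under `tessdata/configs`, is an assumption you should verify for your tesseract version.

```python
import subprocess
from pathlib import Path


def write_digits_config(path):
    """Write a minimal config file that whitelists the digits 0-9 only.

    Hypothetical helper: the file name and location are up to you.
    """
    path = Path(path)
    path.write_text("tessedit_char_whitelist 0123456789\n")
    return path


def build_command(image, outputbase, configs=("nobatch", "digits")):
    """Build argv in the same order as the command-line example:
    tesseract <image> <outputbase> [config ...]"""
    return ["tesseract", str(image), str(outputbase), *map(str, configs)]


def run_ocr(image, outputbase, configs=("nobatch", "digits")):
    """Run tesseract (assumed to be on PATH) and return the recognized text.

    tesseract writes plain-text output to <outputbase>.txt.
    """
    subprocess.run(build_command(image, outputbase, configs), check=True)
    return Path(f"{outputbase}.txt").read_text()
```

For example, `run_ocr("image.tif", "outputbase")` runs the stock `digits` config. `run_ocr("image.tif", "outputbase", ("nobatch", "./mydigits"))` would use a config written by `write_digits_config("./mydigits")` instead.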

As for the threshold value, I'm not sure what you mean. If your input uses an unusual font, you might retrain with a sample of your input. An alternative is to change tesseract's pruning threshold. Both options are also mentioned in the FAQ.
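A different approach from the two above is to filter after recognition. With a digits-only whitelist, a letter is still forced onto some digit, but that digit may come back with a lower confidence, so you can drop symbols below a threshold you choose. This is a post-processing heuristic, not a tesseract setting. The Python sketch below assumes you already have per-symbol `(character, confidence)` pairs on tesseract's 0 to 100 scale. In the C++ API, one way to get them is to walk a `ResultIterator` at the `RIL_SYMBOL` level and read `GetUTF8Text` and `Confidence`. The sample data and the default threshold of 60 are hypothetical.

```python
from typing import Iterable, List, Tuple

# (symbol, confidence) pairs; confidence assumed to be on a 0-100 scale.
Symbol = Tuple[str, float]

DIGITS = set("0123456789")


def keep_confident_digits(symbols: Iterable[Symbol], threshold: float = 60.0) -> str:
    """Return only the ASCII digits whose confidence is at least `threshold`.

    The default threshold is an arbitrary placeholder; tune it on your own images.
    """
    kept: List[str] = []
    for ch, conf in symbols:
        # Skip anything that is not a single ASCII digit, and anything the
        # engine was not confident about (e.g. a letter forced onto a digit).
        if ch in DIGITS and conf >= threshold:
            kept.append(ch)
    return "".join(kept)
```

For example, `keep_confident_digits([("4", 91.2), ("7", 88.0), ("1", 23.5), ("0", 95.4)])` returns `"470"`: the `1` at 23.5 falls below the threshold and is dropped.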

10-16 10:21