Recognizing clear text from an image in Python

Question

I used pytesseract to recognize text in an image. First I pointed it at my Tesseract installation:

import pytesseract

pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'

Then I used the code below to recognize the text:

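from PIL import Image

# Run OCR on the image and print the recognized text
textImg = pytesseract.image_to_string(Image.open(imgLoc+"/"+imgName))
print(textImg)

# Save the recognized text next to the image; the file is closed automatically
with open(imgLoc+"/"+"oriText.txt", "w") as text_file:
    text_file.write(textImg)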

This is my input image.

This is an image of my output text file.

Is there any way to recognize the text in the image more clearly?

Recommended answer

You can try improving the results by restricting the character set, allowing only characters that are legal in your particular language (excluding digits, special characters, and so on). This answer will help.
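As a rough illustration of restricting the character set, here is a minimal sketch that passes Tesseract's tessedit_char_whitelist variable through the config argument of pytesseract.image_to_string. The helper names build_whitelist_config and ocr_letters_only, the choice of ASCII letters, and the page segmentation mode 6 are assumptions made for this example and do not come from the original answer. As far as I know, the whitelist is ignored by the LSTM engine in Tesseract 4.0 and honored again from 4.1, so results may depend on the installed version.

```python
import string


def build_whitelist_config(allowed_chars, psm=6):
    """Build a Tesseract config string that restricts output to allowed_chars.

    Hypothetical helper for this example. Whitespace is rejected because
    pytesseract splits the config string on spaces.
    """
    if not allowed_chars:
        raise ValueError("allowed_chars must not be empty")
    if any(ch.isspace() for ch in allowed_chars):
        raise ValueError("allowed_chars must not contain whitespace")
    return f"--psm {psm} -c tessedit_char_whitelist={allowed_chars}"


def ocr_letters_only(image_path):
    """Run OCR while allowing only ASCII letters (an assumed character set)."""
    # Imported here so the config helper works without the OCR stack installed.
    from PIL import Image
    import pytesseract

    config = build_whitelist_config(string.ascii_letters)
    return pytesseract.image_to_string(Image.open(image_path), config=config)
```

Calling ocr_letters_only with the path of the input image should return the recognized text with digits and punctuation suppressed, provided the installed Tesseract version honors the whitelist.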

Tesseract OCR isn't the best at recognizing characters in an image. You can try preprocessing the image a bit to improve the results. This will help:

  • Make sure the image DPI/PPI is above 250, otherwise the results may be inaccurate.
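The preprocessing advice above could be sketched with Pillow along these lines. This is only one possible approach: the function names upscale_factor and preprocess_for_ocr, the fallback of 72 DPI when the file carries no DPI metadata, the target of 300 DPI, and the threshold of 160 are all assumptions chosen for illustration and should be tuned for the actual image.

```python
def upscale_factor(current_dpi, target_dpi=300):
    """Return the enlargement needed for the effective DPI to reach target_dpi."""
    if current_dpi <= 0:
        raise ValueError("current_dpi must be positive")
    # Never shrink: images already at or above the target are left at 1.0.
    return max(1.0, target_dpi / current_dpi)


def preprocess_for_ocr(src_path, dst_path, assumed_dpi=72, target_dpi=300,
                       threshold=160):
    """Grayscale, upscale, and binarize an image before handing it to OCR."""
    # Imported here so upscale_factor can be used without Pillow installed.
    from PIL import Image, ImageOps

    img = Image.open(src_path)

    # Use the DPI stored in the file if present, otherwise the assumed value.
    dpi = float(img.info.get("dpi", (assumed_dpi, assumed_dpi))[0])
    if dpi <= 0:
        dpi = float(assumed_dpi)
    factor = upscale_factor(dpi, target_dpi)

    gray = ImageOps.grayscale(img)
    new_size = (round(gray.width * factor), round(gray.height * factor))
    gray = gray.resize(new_size, Image.LANCZOS)

    # Stretch contrast, then turn every pixel into pure black or pure white.
    gray = ImageOps.autocontrast(gray)
    binary = gray.point(lambda p: 255 if p > threshold else 0)

    binary.save(dst_path, dpi=(target_dpi, target_dpi))
    return dst_path
```

The saved file can then be opened with Image.open and passed to pytesseract.image_to_string in place of the original image.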

I generally prefer the website www.onlineocr.net for optical character recognition, as the results are almost perfect each time. You can try using their API for character recognition (it requires an internet connection to work). The results obtained with this API are far superior to those from Tesseract OCR, so you may want to give it a try.


07-23 04:45