This article covers how to recognize simple digits with pytesser.

Question

I'm learning OCR using PyTesser and Tesseract. As a first milestone, I want to write a tool that recognizes captchas consisting only of digits. I read some tutorials and wrote the following test program.

from pytesser.pytesser import image_to_string
from PIL import Image, ImageFilter, ImageEnhance

im = Image.open("test.tiff")
# Smooth out speckle noise before boosting contrast.
im = im.filter(ImageFilter.MedianFilter())
enhancer = ImageEnhance.Contrast(im)
im = enhancer.enhance(2)
# Convert to a 1-bit black-and-white image for OCR.
im = im.convert('1')
text = image_to_string(im)
print("text={}".format(text))

I tested my code with the image below, but the result is 2(T?770. I've tested some other similar images as well, and in about 80% of cases the results are incorrect.

I'm not familiar with image processing. I have two questions:

  1. Is it possible to tell PyTesser to guess digits only?

  2. I think the image is quite easy for a human to read. If it is so difficult for PyTesser to read a digits-only image, are there any alternatives that do better OCR?

Any hints are much appreciated.

Recommended answer

I think your code is fine. It recognizes 207770 for me. The problem is with the pytesser installation: the Tesseract bundled with pytesser is out of date. Download a more recent version and overwrite the corresponding files. You should also edit pytesser.py and change

tesseract_exe_name = 'tesseract'

to

import os.path
tesseract_exe_name = os.path.join(os.path.dirname(__file__), 'tesseract')
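
The point of that change, as far as I can tell, is that the executable is then looked up next to pytesser.py rather than by bare name. The sketch below illustrates this, and also touches on the first question by building a command line that asks Tesseract to consider digits only. The helper names and the /opt/pytesser path are made up for illustration and are not part of pytesser. It also assumes a Tesseract build that accepts -c tessedit_char_whitelist=...; some versions or engines may ignore the whitelist.

```python
import os.path


def bundled_exe_path(module_file, exe_name="tesseract"):
    """Return the path of exe_name in the directory that contains module_file."""
    return os.path.join(os.path.dirname(module_file), exe_name)


def digits_only_command(exe, image_path, output_base):
    """Build a Tesseract command line restricted to the characters 0-9.

    Tesseract writes the recognized text to output_base + ".txt".
    """
    return [
        exe,
        image_path,
        output_base,
        # Assumption: the installed Tesseract honors this config variable.
        "-c", "tessedit_char_whitelist=0123456789",
    ]


# Hypothetical install location, used only for illustration.
exe = bundled_exe_path("/opt/pytesser/pytesser.py")
command = digits_only_command(exe, "test.tiff", "result")
print(command)
```

If Tesseract is installed at that location, the list could be passed to subprocess.call(command) and the text read back from result.txt. Whether the whitelist actually improves accuracy on these captchas would need to be checked against real images.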


05-27 17:30