Hi, I am looking to improve my digit-recognition performance with pytesseract.
I take my raw image and split it into parts that look like this:
The size can vary.
To this I apply some pre-processing methods, like so:

import cv2
import numpy as np

image = cv2.imread(im, cv2.IMREAD_GRAYSCALE)  # im is the path to one of the split parts
image = cv2.GaussianBlur(image, (1, 1), 0)
kernel = np.ones((5, 5), np.uint8)
result_img = cv2.blur(image, (2, 2))
result_img = cv2.dilate(result_img, kernel, iterations=1)
result_img = cv2.erode(result_img, kernel, iterations=1)
and I get this:
I then pass this to pytesseract:
num = pytesseract.image_to_string(result_img, lang='eng',
config='--psm 10 --oem 3 -c tessedit_char_whitelist=0123456789')
However, this is not good enough for me, and it often gets numbers wrong.
I am looking for ways to improve. I have tried to keep this minimal and self-contained, but let me know if I've not been clear and I will elaborate.
Thank you.
You're on the right track by trying to preprocess the image before performing OCR, but you're using an incorrect approach. There is no reason to dilate or erode the image, since these operations are mainly used for removing small noise particles. In addition, your current output is not a binary image. It may look like it contains only black and white pixels, but it is actually a 3-channel BGR image, which is probably why you're getting incorrect OCR results.

If you look at Tesseract improve quality, you will notice that for Pytesseract to perform optimal OCR, the image needs to be preprocessed so that the desired text is black on a white background. To do this, we can apply Otsu's threshold to obtain a binary image, then invert it so the text is in the foreground. This gives us a preprocessed image that we can feed into image_to_string. We use the --psm 6 configuration option to assume a single uniform block of text. Take a look at configuration options for more settings.
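A quick way to confirm the channel issue is to inspect the array shape. A minimal sanity check, assuming the same '1.png' file used in the code below:

import cv2

image = cv2.imread('1.png')                     # default flag loads a 3-channel BGR image
print(image.shape)                              # e.g. (h, w, 3): looks binary, but has 3 channels
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
print(gray.shape)                               # (h, w): single channel, ready for thresholding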
Here are the results:

Input image -> Binary -> Invert
Result from Pytesseract OCR
8
Code
import cv2
import pytesseract

pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

# Load image, grayscale, Otsu's threshold, invert
image = cv2.imread('1.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
invert = 255 - thresh

# OCR
data = pytesseract.image_to_string(invert, lang='eng', config='--psm 6')
print(data)

cv2.imshow('thresh', thresh)
cv2.imshow('invert', invert)
cv2.waitKey()
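Since the question restricts the output to digits, the tessedit_char_whitelist option from the original call can be combined with this preprocessing. A minimal sketch, reusing the invert image from the code above (note that some Tesseract 4.x builds ignore the whitelist when the default LSTM engine is used):

# Sketch: digit-only OCR on the preprocessed image
num = pytesseract.image_to_string(
    invert, lang='eng',
    config='--psm 6 -c tessedit_char_whitelist=0123456789')
print(num)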